khp¶

khp.contacts module¶

khp.contacts.download_contacts(interaction_type, start_date=None, end_date=None)[source]¶

Download contacts for a given interaction type and time period, and save the data.

Parameters:

interaction_type (str) – Type of contact (e.g. IM, Voice, Email)
start_date (str, optional) – Start time, YYYY-mm-dd. Defaults to beginning of yesterday.
end_date (str, optional) – End time, YYYY-mm-dd. Defaults to end of yesterday.

khp.contacts.download_transcripts(contact_ids=None)[source]¶

Download transcripts for a list of contact_ids.

Parameters:	contact_ids (list, optional) – List of contact IDs to retrieve recordings for. If None (the default), queries for contacts that have not been parsed.

khp.contacts.enhanced_transcripts()[source]¶

Read in unprocessed transcripts from Postgres, perform a series of operations to produce metadata per contact_id, and load the results into the enhanced_transcripts table.

khp.contacts.get_contacts_to_load()[source]¶

Grab the filenames of all contact files that have not been loaded to Postgres.

Returns:	List of contact filenames to load
Return type:	list

khp.contacts.get_transcripts_to_load()[source]¶

Grab the filenames of all transcript files that have not been loaded to Postgres

Returns:	List of transcript files to load
Return type:	list

khp.contacts.load_enhanced_transcript(contact_id, summary)[source]¶

Load the transcript summary dictionary to the enhanced_transcripts table

Parameters:

contact_id (int) – Contact ID
summary (dict) – Summary dict of the transcript

khp.contacts.load_transcripts_df(contact_ids)[source]¶

Load the transcripts data associated with a set of contact_ids into a pandas DataFrame.

Parameters:	contact_ids (list) – List of contact ids to load
Returns:	DataFrame containing the loaded transcripts
Return type:	pandas.DataFrame

khp.contacts.main(interaction_type='IM', start_date=None, end_date=None)[source]¶

Run the full contacts pipeline.

Parameters:

interaction_type (str, optional) – Type of contact (e.g. IM, Voice, Email)
start_date (str, optional) – Start date, format YYYY-mm-dd
end_date (str, optional) – End date, format YYYY-mm-dd

khp.contacts.parse_contacts_file(filename)[source]¶

Parse the JSON contacts file downloaded from Icescape. Parsing includes:

Reading JSON file into python object
Running transformations on each contact dict in the contacts file
Uploading parsed/transformed contact dicts into Postgres

Parameters:	filename (str) – Full path of the contacts file

khp.contacts.parse_transcript(filename)[source]¶

Parse the transcript file downloaded from Icescape. Parsing includes:

Reading JSON file into python object
Running transformations on the raw transcript
Uploading parsed/transformed messages into Postgres

Parameters:	filename (str) – Full path of the transcript file

khp.contacts.save_data(data, filename)[source]¶

Save data from the Icescape API.

Parameters:

data (list or dict) – Data to save locally or in S3
filename (str) – Filename to save data to

khp.ftp module¶

khp.icescape module¶

class khp.icescape.Icescape[source]¶

Bases: object

Summary

headers¶: TYPE – Description

password¶: TYPE – Description

token¶: TYPE – Description

user¶: TYPE – Description

user_agent¶: TYPE – Description

get_contacts(interaction_type, start_time=None, end_time=None)[source]¶

Get results from the Icescape QueryContacts2 API.

Parameters:

interaction_type (str) – Type of contact (e.g. IM, Voice, Email)
start_time (str, optional) – Start time, accepts date formats YYYY-mm-dd or YYYY-mm-dd H:M:S. Defaults to beginning of yesterday.
end_time (str, optional) – End time, accepts date formats YYYY-mm-dd or YYYY-mm-dd H:M:S. Defaults to end of yesterday.
Returns:	Array of contact data dictionaries
Return type:	data (list)

get_recordings(contact_ids)[source]¶

Get the results from the Icescape GetRecordings API.

Parameters:	contact_ids (list) – List of Contact IDs to retrieve recordings for
Returns:	Array of contact data dictionaries
Return type:	data (list)

khp.implicitly_tls module¶

khp.transforms module¶

Module containing a set of transformations that are run on the JSON responses from the API and on the transcript dataframes.

class khp.transforms.Transformer(transforms_meta)[source]¶

Bases: object

A class that ingests a dictionary of transforms, and runs those transforms on a supplied dictionary or dataframe.

transforms¶: list – List of the transformation dictionaries.

static get_input_cols()[source]¶

Return the input columns associated with a transformation

Parameters:	transform_dict (dict) – Transform dict parsed from transforms.yml
Returns:	list of input columns
Return type:	list

static get_value(data)[source]¶

Grab the value associated with a key in a dictionary. Supports the nested key definitions in transforms.yml. For example, a key of ‘KEY1|KEY2|KEY3’ will return 5 from the following data:

 data = {
     'KEY1': {
         'KEY2': {
             'KEY3': 5,
             ...
             },
         ...
     },
     ...
 }

Parameters:

key (str) – Dictionary key of the value to return
data (dict) – Dictionary of data to return value from
Returns:	Returns the value associated with the specified key(s)
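
The nested-key lookup described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name get_nested_value and the separator argument are assumptions made for the example, and only the 'KEY1|KEY2|KEY3' convention comes from the description.

```python
from functools import reduce


def get_nested_value(key, data, separator="|"):
    """Walk a nested dict along a separator-delimited key path (hypothetical helper)."""
    # 'KEY1|KEY2|KEY3' -> ['KEY1', 'KEY2', 'KEY3']
    parts = key.split(separator)
    # Descend one level per key part; a missing part raises KeyError.
    return reduce(lambda level, part: level[part], parts, data)


data = {"KEY1": {"KEY2": {"KEY3": 5}}}
value = get_nested_value("KEY1|KEY2|KEY3", data)
print(value)  # 5
```

A key without a separator falls back to an ordinary dictionary lookup, so flat and nested definitions can share one code path.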

static parse_transforms()[source]¶

Parse the list of raw transformations. Generally, each transformation (i.e. each element of transforms_meta) will be in the following format:

 {
     'field_name': {
         'key1': 'value1',
         'key2': 'value2',
         ...
     }
 }

Parameters:	transforms_meta (list) – List of the raw transformation dictionaries.
Returns:	List of the transformation dictionaries.
Return type:	list

run_df_transforms(dataframe)[source]¶

Run the transforms on a supplied dataframe.

Parameters:	dataframe (pandas.DataFrame) – Input dataframe to run transformation on.
Returns:	Dataframe with updated and/or new columns
Return type:	pandas.DataFrame

run_meta_df_transforms(dataframe)[source]¶

Run transforms on a supplied dataframe.

Parameters:	dataframe (pandas.DataFrame) – Input dataframe to run transformation on.
Returns:	Output dictionary from the input dataframe
Return type:	dict

run_transforms(data)[source]¶

Run the transforms on a supplied dictionary of data.

Parameters:	data (dict) – dictionary of data to run transformation on
Returns:	data that has been parsed, re-mapped and transformed according to the self.transforms instructions
Return type:	(list or dict)

khp.transforms.calc_handle_time(dataframe)[source]¶

Calculate the handle time for a contact. Handle time is calculated as time elapsed between all messages with convo_ind == 1.

Parameters:

dataframe (pandas.DataFrame) – Input dataframe
parameters (dict) – Parameters associated with the transform
Returns:	Handle time for the contact, in minutes
Return type:	float

khp.transforms.calc_message_sequence(dataframe)[source]¶

Calculate the message sequence for each message, defined as: ‘prev_message_type’ - ‘message_type’, used to indicate whether a message was from counsellor to counsellor, counselee to counsellor, system to counsellor etc.

Parameters:

dataframe (pandas.DataFrame) – Input dataframe
parameters (dict) – Parameters associated with the transform
Returns:	Message sequence
Return type:	pandas.Series

khp.transforms.calc_response_time(dataframe)[source]¶

Calculate the response time for each message. Defined as time elapsed between message and previous message.

Parameters:

dataframe (pandas.DataFrame) – Input dataframe
parameters (dict) – Parameters associated with the transform
Returns:	Response time for the message
Return type:	pandas.Series
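
As an illustration of the definition above, a per-message response time can be computed with a row-wise difference. This is a sketch under assumptions: the column name timestamp and the function name response_time_sketch are hypothetical, and the rows are assumed to be sorted chronologically.

```python
import pandas as pd


def response_time_sketch(dataframe, time_col="timestamp"):
    """Time elapsed between each message and the one before it."""
    # diff() subtracts the previous row's value from each row; the first
    # message has no predecessor, so its response time is NaT.
    return dataframe[time_col].diff()


messages = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-06-14 10:00:00",
        "2018-06-14 10:00:30",
        "2018-06-14 10:02:00",
    ])
})
response_times = response_time_sketch(messages)
print(response_times)
```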

khp.transforms.calc_wait_time(dataframe)[source]¶

Calculate the wait time for a contact. Wait time is calculated as time elapsed between the start of the transcript and the first message with convo_ind == 1.

Parameters:

dataframe (pandas.DataFrame) – Input dataframe
parameters (dict) – Parameters associated with the transform
Returns:	Wait time for the contact, in minutes
Return type:	float
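
The wait-time definition above might look roughly like this in pandas. The timestamp column name, the function name wait_time_sketch, and the choice to return None when no conversation message exists are all assumptions for the example; only convo_ind and the result in minutes come from the description.

```python
import pandas as pd


def wait_time_sketch(dataframe, time_col="timestamp"):
    """Minutes between the first row and the first row with convo_ind == 1."""
    # Assumes rows are sorted chronologically, so row 0 is the transcript start.
    start = dataframe[time_col].iloc[0]
    convo_times = dataframe.loc[dataframe["convo_ind"] == 1, time_col]
    if convo_times.empty:
        # No conversation message was found (assumed behaviour).
        return None
    return (convo_times.iloc[0] - start).total_seconds() / 60.0


transcript = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-06-14 10:00:00",
        "2018-06-14 10:01:00",
        "2018-06-14 10:03:00",
    ]),
    "convo_ind": [0, 0, 1],
})
wait_minutes = wait_time_sketch(transcript)
print(wait_minutes)  # 3.0
```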

khp.transforms.clean_text(text)[source]¶

Sanitize the transcript messages. Replaces all whitespace with single spaces (since newlines break things when uploading to the DB), and removes everything that isn't a number, letter, or in a list of characters to keep.

Parameters:	text (str) – Text string to clean
Returns:	Cleaned text str
Return type:	str
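
A minimal sketch of this kind of sanitization using regular expressions is shown below. The set of characters to keep (KEEP_CHARS) and the function name clean_text_sketch are hypothetical; the actual list of kept characters is not given here.

```python
import re

# Hypothetical set of punctuation to preserve; the real list is not documented here.
KEEP_CHARS = ".,!?'"


def clean_text_sketch(text, keep=KEEP_CHARS):
    """Collapse whitespace and drop characters outside an allow-list."""
    # Replace every run of whitespace (newlines, tabs, spaces) with one space.
    text = re.sub(r"\s+", " ", text)
    # Remove anything that is not a letter, digit, space, or in `keep`.
    pattern = "[^A-Za-z0-9 " + re.escape(keep) + "]"
    return re.sub(pattern, "", text)


cleaned = clean_text_sketch("hello\n\tworld #1!")
print(cleaned)  # hello world 1!
```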

khp.transforms.column_operator(dataframe, parameters)[source]¶

Apply a numpy operator on a column in a dataframe, optionally applying filters specified in parameters.

If multiple aggregators are supplied, a dictionary will be returned instead of the float result. For example, if the following parameters are provided:

 parameters = {
     'output': 'khp_response_time',
     'aggregator': ['mean', 'max']
 }

The function will return:

 {'mean_khp_response_time': 1.2325, 'max_khp_response_time': 55.212}

Parameters:

dataframe (pandas.DataFrame) – Input dataframe
parameters (dict) – Parameters associated with the transform
Returns:	aggregator operation output. If multiple aggregators are supplied, returns a dict with the result of each aggregator. Otherwise, returns the float result of the aggregator operation.
Return type:	dict or float
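
The single-versus-multiple aggregator behaviour described above could be sketched like this. It is an assumption-laden illustration: the 'input' parameter key and the function name column_operator_sketch are hypothetical, filters are omitted, and aggregator names are assumed to map directly to numpy function names.

```python
import numpy as np
import pandas as pd


def column_operator_sketch(dataframe, parameters):
    """Apply one or more numpy aggregators to a single column."""
    column = dataframe[parameters["input"]]  # 'input' key is an assumption
    output = parameters["output"]
    aggregators = parameters["aggregator"]
    if isinstance(aggregators, str):
        # Single aggregator: return the plain float result.
        return float(getattr(np, aggregators)(column))
    # Multiple aggregators: prefix the output name with each aggregator name.
    return {
        f"{name}_{output}": float(getattr(np, name)(column))
        for name in aggregators
    }


df = pd.DataFrame({"response_time": [1.0, 2.0, 6.0]})
result = column_operator_sketch(df, {
    "input": "response_time",
    "output": "khp_response_time",
    "aggregator": ["mean", "max"],
})
print(result)  # {'mean_khp_response_time': 3.0, 'max_khp_response_time': 6.0}
```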

khp.transforms.convert_timedelta(value, unit)[source]¶

Convert a timedelta64 object to a float

Parameters:

value (numpy.timedelta64[ns]) – Timedelta value to convert
unit (str) – Datetime unit code; see https://docs.scipy.org/doc/numpy-dev/reference/arrays.datetime.html for a list of acceptable codes
Returns:	Converted timedelta value
Return type:	float
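
One common way to perform this conversion in numpy is to divide by a one-unit timedelta64. The sketch below assumes that approach; the name convert_timedelta_sketch is hypothetical and the library may implement the conversion differently.

```python
import numpy as np


def convert_timedelta_sketch(value, unit):
    """Express a timedelta64 as a float count of the given unit."""
    # Dividing two timedelta64 values yields a dimensionless float.
    return float(value / np.timedelta64(1, unit))


minutes = convert_timedelta_sketch(np.timedelta64(90, "s"), "m")
print(minutes)  # 1.5
```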

khp.transforms.convo_indicator(dataframe)[source]¶

Create an indicator for each message to signal whether it’s part of the conversation. A message is deemed part of the conversation if it appears after the convo_start_ind messages and is message type 3 or 4.

Parameters:

dataframe (pandas.DataFrame) – Input dataframe
parameters (dict) – Parameters associated with the transform
Returns:	Conversation indicator
Return type:	pandas.Series

khp.transforms.convo_start_indicator(dataframe)[source]¶

Create an indicator for each message to signal whether it’s the start of the conversation. Starting messages are detected using a regex, since the starting messages are system generated (hence filtering on message_type=1).

Parameters:

dataframe (pandas.DataFrame) – Input dataframe
parameters (dict) – Parameters associated with the transform
Returns:	Conversation start indicator
Return type:	pandas.Series

khp.transforms.filter_df(dataframe, filters)[source]¶

Filter a dataframe

Parameters:

dataframe (pandas.DataFrame) – Input dataframe
filters (list) – List of filters (dicts) to apply
Returns:	Filtered dataframe
Return type:	pandas.DataFrame

khp.transforms.parse_handlers(handlers)[source]¶

Parse the handlers associated with a contact. Split out primary handler and secondary handlers, assuming primary handler as the first handler in the list of handlers.

Parameters:	handlers (list) – List of handlers
Returns:	Dictionary containing primary and secondary handlers
Return type:	dict
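
The primary/secondary split described above can be sketched in a few lines. The dictionary keys primary_handler and secondary_handlers, and the function name parse_handlers_sketch, are hypothetical; the actual key names are not documented here.

```python
def parse_handlers_sketch(handlers):
    """Treat the first handler as primary and the rest as secondary."""
    handlers = handlers or []
    return {
        # Key names are assumptions made for this example.
        "primary_handler": handlers[0] if handlers else None,
        "secondary_handlers": handlers[1:],
    }


parsed = parse_handlers_sketch(["alice", "bob", "carol"])
print(parsed)
# {'primary_handler': 'alice', 'secondary_handlers': ['bob', 'carol']}
```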

khp.transforms.parse_html(html)[source]¶

Utilize the beautiful soup html parser to return the text from html

Parameters:	html (str) – String of html
Returns:	extracted text
Return type:	str

khp.transforms.parse_message(message_text, is_html)[source]¶

Return the text from a (potentially) html message string

Parameters:

message_text (str) – Message text to parse
is_html (bool) – Boolean indicator whether the text is html, provided upstream from the API response.
Returns:	Parsed message text
Return type:	str

khp.transforms.parse_messages(messages)[source]¶

Transformation function to parse a list of messages

Parameters:	messages (list) – List of message dicts
Returns:	List of transformed message dicts
Return type:	list

khp.transforms.row_count(dataframe, parameters)[source]¶

Count the number of rows in a dataframe, optionally applying filters specified in parameters.

Parameters:

dataframe (pandas.DataFrame) – Input dataframe
parameters (dict) – Parameters associated with the transform
Returns:	Number of rows
Return type:	int

khp.transforms.str_length(dataframe, parameters)[source]¶

Calculate the length of a string

Parameters:

dataframe (pandas.DataFrame) – Input dataframe
parameters (dict) – Parameters associated with the transform
Returns:	Number of characters for each row of a column
Return type:	pandas.Series

khp.transforms.word_count(dataframe, parameters)[source]¶

Count the number of words in a string

Parameters:

dataframe (pandas.DataFrame) – Input dataframe
parameters (dict) – Parameters associated with the transform
Returns:	Word counts for each row of a column
Return type:	pandas.Series

khp.utils module¶

Utils module, contains utility functions used throughout the codebase.

khp.utils.check_response(response)[source]¶

Check the status of a requests response. If the status code is not 200, log the error and raise an exception.

Parameters:	response (requests.models.Response) – Requests response object

Raises:	Exception – If the status code is not 200
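
The check described above amounts to a guard on the status code. The sketch below uses a stand-in object instead of a real requests response so it runs without network access; the function name check_response_sketch and the error message format are assumptions.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def check_response_sketch(response):
    """Log and raise if the response status code is not 200."""
    if response.status_code != 200:
        message = "Request failed with status %s: %s" % (
            response.status_code, response.text)
        logger.error(message)
        raise Exception(message)


# Stand-in for requests.models.Response with only the attributes used above.
ok_response = SimpleNamespace(status_code=200, text="OK")
check_response_sketch(ok_response)  # nothing raised
```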

khp.utils.chunker(seq, chunk_size)[source]¶

Break a list into a set of smaller lists with len = chunk_size

Parameters:

seq (list) – List to split up into chunks
chunk_size (int) – Size of chunks
Returns:	list of lists with len = chunk_size
Return type:	list
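
Chunking of this kind is usually done with list slicing. The sketch below assumes that approach (the name chunker_sketch is hypothetical); note that in this version the final chunk is shorter when the list length is not a multiple of chunk_size.

```python
def chunker_sketch(seq, chunk_size):
    """Split a list into consecutive slices of at most chunk_size items."""
    # Step through the list in strides of chunk_size and slice each window.
    return [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]


chunks = chunker_sketch([1, 2, 3, 4, 5], 2)
print(chunks)  # [[1, 2], [3, 4], [5]]
```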

khp.utils.clean_dir(path, prefix=None)[source]¶

Helper function to clear any folders and files in a specified path.

Parameters:

path (str) – Input path
prefix (str, optional) – File prefix

khp.utils.convert_timezone(dt, tz1, tz2)[source]¶

Convert a datetime object from one timezone to another timezone

Parameters:

dt (datetime.datetime) – Datetime object to convert
tz1 (str) – pytz-acceptable timezone that dt is in
tz2 (str) – pytz-acceptable timezone to convert to
Returns:	Datetime object in timezone 2
Return type:	datetime.datetime

khp.utils.generate_date_range(start_date, end_date)[source]¶

Generate the range of dates between start_date and end_date

Parameters:

start_date (str) – Start date, YYYY-mm-dd
end_date (str) – End date, YYYY-mm-dd

Returns:

List of dates, as datetime.datetime objects, between start_date and end_date

Return type:

list
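
A date range like the one described above can be built with the standard library. In this sketch the name date_range_sketch is hypothetical, and treating end_date as inclusive is an assumption, since the description does not say whether the end date is included.

```python
from datetime import datetime, timedelta


def date_range_sketch(start_date, end_date):
    """List of daily datetimes from start_date through end_date."""
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    # +1 makes the end date inclusive (an assumption for this example).
    day_count = (end - start).days + 1
    return [start + timedelta(days=offset) for offset in range(day_count)]


dates = date_range_sketch("2018-01-01", "2018-01-03")
print(dates[0], dates[-1])
```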

khp.utils.get_s3_keys(s3_bucket, prefix=None)[source]¶

Get a list of keys in an S3 bucket. Optionally specify a prefix to narrow down the keys returned.

Parameters:

s3_bucket (str) – Name of the S3 bucket.
prefix (str, optional) – File prefix. Defaults to None.
Returns:	List of keys in the S3 bucket.
Return type:	list

khp.utils.parse_date(str_dt)[source]¶

Convert a date string to a datetime object

Parameters:	str_dt (str) – Date in any format accepted by dateutil.parser. WARNING: read the dateutil.parser docs before using to understand the default behaviour (i.e. how str_dt values like 2018 or 2 are handled)
Returns:	Datetime object
Return type:	datetime.datetime

khp.utils.parse_s3_contents(contents, delimiter, remove_dupes=False, skip_first_line=False)[source]¶

Read the contents of an S3 object into a list of lists.

Parameters:

contents (str) – Contents of an S3 object
delimiter (str) – Delimiter to split the contents of each line with
remove_dupes (bool, optional) – Ensure each line is unique. Defaults to False.
skip_first_line (bool, optional) – Skip the first line of the S3 object. Defaults to False.
Returns:	List of lists, where each inner list is the contents of a single line.
Return type:	list
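
The parsing options above can be sketched with plain string handling. The name parse_contents_sketch is hypothetical, and the choices to skip blank lines and to keep the first occurrence of each duplicate are assumptions made for this example.

```python
def parse_contents_sketch(contents, delimiter,
                          remove_dupes=False, skip_first_line=False):
    """Split delimited text into a list of per-line field lists."""
    lines = contents.splitlines()
    if skip_first_line:
        # Drop a header row.
        lines = lines[1:]
    if remove_dupes:
        # dict.fromkeys keeps the first occurrence of each line, in order.
        lines = list(dict.fromkeys(lines))
    # Skip empty lines so a trailing newline does not yield [''].
    return [line.split(delimiter) for line in lines if line]


rows = parse_contents_sketch(
    "id,name\n1,a\n1,a\n2,b\n", ",",
    remove_dupes=True, skip_first_line=True)
print(rows)  # [['1', 'a'], ['2', 'b']]
```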

khp.utils.read_jason(filename)[source]¶

Read a json file into a python object

Parameters:	filename (str) – path of the file
Returns:	parsed data from the file
Return type:	list or dict

khp.utils.read_s3_file(s3_bucket, key)[source]¶

Read the contents of an S3 object.

Parameters:

s3_bucket (str) – Name of the S3 bucket.
key (str) – Name of the S3 object
Returns:	Contents of S3 object
Return type:	str

khp.utils.read_yaml(yaml_file)[source]¶

Read a yaml file.

Parameters:	yaml_file (str) – Full path of the yaml file.
Returns:	Dictionary of yaml_file contents.
Return type:	dict
Raises:	`Exception` – If the yaml_file cannot be opened.

khp.utils.search_path(path, like=None)[source]¶

Search a path and return all the files. Optionally specify file prefixes and/or filetypes to narrow your criteria.

Parameters:

path (str) – Input path
like (list, optional) – List of file regexes to match files on
Returns:	list of files matching the specified filetypes
Return type:	list

khp.utils.upload_to_s3(s3_bucket, files, encrypt=True)[source]¶

Upload a list of files to S3.

Parameters:

s3_bucket (str) – Name of the S3 bucket.
files (list) – List of files to upload
encrypt (bool, optional) – Use server-side AES256 encryption. Defaults to True.

khp.utils.write_jason(data, filename)[source]¶

Write a Python list or dictionary to a json file.

Parameters:

data (list or dict) – Data to write to file
filename (str) – Path of the file to write to

khp.utils.yesterdays_range()[source]¶

Generate yesterday's date range, as datetime objects

Returns:	Beginning of yesterday and end of yesterday, each as a datetime.datetime object
Return type:	datetime.datetime
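
A range covering all of yesterday can be derived from the current time with the standard library. In this sketch the name yesterdays_range_sketch and the optional now argument (added so the example is reproducible) are assumptions, as is the use of naive local datetimes.

```python
from datetime import datetime, time, timedelta


def yesterdays_range_sketch(now=None):
    """Return (start, end) datetimes spanning the whole of yesterday."""
    now = now or datetime.now()
    yesterday = (now - timedelta(days=1)).date()
    # time.min is 00:00:00 and time.max is 23:59:59.999999.
    start = datetime.combine(yesterday, time.min)
    end = datetime.combine(yesterday, time.max)
    return start, end


start, end = yesterdays_range_sketch(now=datetime(2018, 6, 15, 10, 30))
print(start, end)
```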