khp

khp.contacts module

khp.contacts.download_contacts(interaction_type, start_date=None, end_date=None)[source]

Download contacts for a given interaction type and time period, and save the data.

Parameters:
  • interaction_type (str) – Type of contact (e.g. IM, Voice, Email)
  • start_date (str, optional) – Start time, YYYY-mm-dd. Defaults to beginning of yesterday.
  • end_date (str, optional) – End time, YYYY-mm-dd. Defaults to end of yesterday.
khp.contacts.download_transcripts(contact_ids=None)[source]

Download transcripts for a list of contact_ids.

Parameters:contact_ids (list, optional) – List of Contact IDs to retrieve recordings for. If None (the default), queries for contacts that have not been parsed.
khp.contacts.enhanced_transcripts()[source]

Read un-processed transcripts from Postgres, perform a series of operations to produce metadata per contact_id, and load the results into the enhanced_transcripts table.

khp.contacts.get_contacts_to_load()[source]

Grab the filenames of all contact files that have not been loaded to Postgres.

Returns:List of contact filenames to load
Return type:list
khp.contacts.get_transcripts_to_load()[source]

Grab the filenames of all transcript files that have not been loaded to Postgres

Returns:List of transcript files to load
Return type:list
khp.contacts.load_enhanced_transcript(contact_id, summary)[source]

Load the transcript summary dictionary to the enhanced_transcripts table

Parameters:
  • contact_id (int) – contact id
  • summary (dict) – Summary dict of the transcript
khp.contacts.load_transcripts_df(contact_ids)[source]

Load the transcripts data associated with a set of contact_ids into a pandas DataFrame.

Parameters:contact_ids (list) – List of contact ids to load
Returns:DataFrame containing the loaded transcripts
Return type:pandas.DataFrame
khp.contacts.main(interaction_type='IM', start_date=None, end_date=None)[source]

Run the full contacts pipeline.

Parameters:
  • interaction_type (str, optional) – Type of contact (e.g. IM, Voice, Email)
  • start_date (str, optional) – Start date, format YYYY-mm-dd
  • end_date (str, optional) – End date, format YYYY-mm-dd
khp.contacts.parse_contacts_file(filename)[source]

Parse the JSON contacts file downloaded from Icescape. Parsing includes:

  • Reading JSON file into python object
  • Running transformations on each contact dict in the contacts file
  • Uploading parsed/transformed contact dicts into Postgres
Parameters:filename (str) – Full path of the contacts file
khp.contacts.parse_transcript(filename)[source]

Parse the transcript file downloaded from Icescape. Parsing includes:

  • Reading JSON file into python object
  • Running transformations on the raw transcript
  • Uploading parsed/transformed messages into Postgres
Parameters:filename (str) – Full path of the transcript file
khp.contacts.save_data(data, filename)[source]

Save data from icescape API.

Parameters:
  • data (list or dict) – data to save locally or in S3
  • filename (str) – filename to save data to

khp.ftp module

khp.icescape module

class khp.icescape.Icescape[source]

Bases: object

Summary

headers

TYPE – Description

password

TYPE – Description

token

TYPE – Description

user

TYPE – Description

user_agent

TYPE – Description

get_contacts(interaction_type, start_time=None, end_time=None)[source]

Get results from the Icescape QueryContacts2 API.

Parameters:
  • interaction_type (str) – Type of contact (e.g. IM, Voice, Email)
  • start_time (str, optional) – Start time, accepts date formats YYYY-mm-dd or YYYY-mm-dd H:M:S. Defaults to beginning of yesterday.
  • end_time (str, optional) – End time, accepts date formats YYYY-mm-dd or YYYY-mm-dd H:M:S. Defaults to end of yesterday.
Returns:

Array of contact data dictionaries

Return type:

data (list)

get_recordings(contact_ids)[source]

Get the results from the Icescape GetRecordings API.

Parameters:contact_ids (list) – List of Contact IDs to retrieve recordings for
Returns:Array of contact data dictionaries
Return type:data (list)

khp.implicitly_tls module

khp.transforms module

Module containing a set of transformations that are run on the JSON responses from the API, and on the transcript dataframes.

class khp.transforms.Transformer(transforms_meta)[source]

Bases: object

A class that ingests a dictionary of transforms, and runs those transforms on a supplied dictionary or dataframe.

transforms

list – List of the transformation dictionaries.

static get_input_cols()[source]

Return the input columns associated with a transformation

Parameters:transform_dict (dict) – Transform dict parsed from transforms.yml
Returns:list of input columns
Return type:list
static get_value(data)[source]

Grab the value associated with a key in a dictionary. Supports the nested key definitions in transforms.yml. For example, a key of ‘KEY1|KEY2|KEY3’ will return 5 from the following data:

 data = {
     'KEY1': {
         'KEY2': {
             'KEY3': 5,
             ...
             },
         ...
     },
     ...
 }
Parameters:
  • key (str) – Dictionary key of the value to return
  • data (dict) – Dictionary of data to return value from
Returns:

Returns the value associated with the specified key(s)
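
The nested lookup described above can be sketched as follows. This is a minimal illustration rather than the library's implementation: the name get_nested_value and the sep argument are hypothetical, and raising KeyError for a missing key is an assumption.

```python
from functools import reduce


def get_nested_value(key, data, sep="|"):
    """Hypothetical stand-in for Transformer.get_value (not the khp code)."""
    # Split 'KEY1|KEY2|KEY3' into its parts and descend one level per part.
    return reduce(lambda level, part: level[part], key.split(sep), data)


data = {"KEY1": {"KEY2": {"KEY3": 5}}}
print(get_nested_value("KEY1|KEY2|KEY3", data))  # prints 5
```

A key without a separator, such as 'KEY1', simply returns the value stored under that single key.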

static parse_transforms()[source]

Parse the list of raw transformations. Generally, each transformation (i.e. each element of transforms_meta) will be in the following format:

 {
     'field_name': {
         'key1': 'value1',
         'key2': 'value2',
         ...
     }
 }

Parameters:transforms_meta (list) – List of the raw transformation dictionaries.
Returns:List of the transformation dictionaries.
Return type:list
run_df_transforms(dataframe)[source]

Run the transforms on a supplied dataframe.

Parameters:dataframe (pandas.DataFrame) – Input dataframe to run transformation on.
Returns:DataFrame with updated and/or new columns
Return type:pandas.DataFrame
run_meta_df_transforms(dataframe)[source]

Run transforms on a supplied dataframe.

Parameters:dataframe (pandas.DataFrame) – Input dataframe to run transformation on.
Returns:Output dictionary from the input dataframe
Return type:dict
run_transforms(data)[source]

Run the transforms on a supplied dictionary of data.

Parameters:data (dict) – dictionary of data to run transformation on
Returns:
data that has been parsed, re-mapped and transformed according to the self.transforms instructions
Return type:(list or dict)
khp.transforms.calc_handle_time(dataframe)[source]

Calculate the handle time for a contact. Handle time is calculated as time elapsed between all messages with convo_ind == 1.

Parameters:
  • dataframe (pandas.DataFrame) – Input dataframe
  • parameters (dict) – Parameters associated with the transform
Returns:

Handle time for the contact, in minutes

Return type:

float

khp.transforms.calc_message_sequence(dataframe)[source]

Calculate the message sequence for each message, defined as: ‘prev_message_type’ - ‘message_type’, used to indicate whether a message was from counsellor to counsellor, counselee to counsellor, system to counsellor etc.

Parameters:
  • dataframe (pandas.DataFrame) – Input dataframe
  • parameters (dict) – Parameters associated with the transform
Returns:

Message sequence

Return type:

pandas.Series

khp.transforms.calc_response_time(dataframe)[source]

Calculate the response time for each message. Defined as time elapsed between message and previous message.

Parameters:
  • dataframe (pandas.DataFrame) – Input dataframe
  • parameters (dict) – Parameters associated with the transform
Returns:

Response time for the message

Return type:

pandas.Series
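
The "time elapsed between message and previous message" definition can be sketched on a plain list of timestamps. This is an illustration only: the name response_times, the use of seconds as the unit, and None for the first message are assumptions. The documented function returns a pandas.Series, where the same idea is typically expressed with Series.diff().

```python
from datetime import datetime


def response_times(timestamps):
    """Hypothetical helper: seconds between each message and the one before it."""
    if not timestamps:
        return []
    gaps = [None]  # the first message has no previous message
    for previous, current in zip(timestamps, timestamps[1:]):
        gaps.append((current - previous).total_seconds())
    return gaps


stamps = [
    datetime(2018, 1, 1, 10, 0, 0),
    datetime(2018, 1, 1, 10, 0, 30),
    datetime(2018, 1, 1, 10, 2, 0),
]
print(response_times(stamps))  # prints [None, 30.0, 90.0]
```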

khp.transforms.calc_wait_time(dataframe)[source]

Calculate the wait time for a contact. Wait time is calculated as time elapsed between the start of the transcript and the first message with convo_ind == 1.

Parameters:
  • dataframe (pandas.DataFrame) – Input dataframe
  • parameters (dict) – Parameters associated with the transform
Returns:

Wait time for the contact, in minutes

Return type:

float
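
A minimal sketch of the wait-time definition, using a list of message dicts instead of a dataframe. The name wait_time_minutes and the 'timestamp' key are hypothetical, the messages are assumed to be sorted by time, and returning None when no message has convo_ind == 1 is an assumption.

```python
from datetime import datetime


def wait_time_minutes(messages):
    """Hypothetical helper: minutes from the first message to the first convo message."""
    if not messages:
        return None
    start = messages[0]["timestamp"]  # start of the transcript
    for message in messages:
        if message["convo_ind"] == 1:
            return (message["timestamp"] - start).total_seconds() / 60
    return None  # assumption: no conversation message means no wait time


messages = [
    {"timestamp": datetime(2018, 1, 1, 10, 0), "convo_ind": 0},
    {"timestamp": datetime(2018, 1, 1, 10, 2), "convo_ind": 0},
    {"timestamp": datetime(2018, 1, 1, 10, 5), "convo_ind": 1},
]
print(wait_time_minutes(messages))  # prints 5.0
```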

khp.transforms.clean_text(text)[source]

Function to sanitize the transcript messages. Replaces all whitespace with single spaces (since newlines break things when uploading to the DB). Removes everything that isn’t a number, letter, or in a list of characters to keep.

Parameters:text (str) – Text string to clean
Returns:Cleaned text str
Return type:str
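
The two sanitizing steps can be sketched with the standard re module. This is not the library's code: the name sanitize, the KEEP_CHARS list, and the order of the two steps are assumptions made for illustration.

```python
import re

KEEP_CHARS = ".,!?'-"  # hypothetical list of punctuation to keep


def sanitize(text, keep=KEEP_CHARS):
    """Hypothetical sketch of the clean_text behaviour described above."""
    # Collapse every run of whitespace (including newlines and tabs) to one space.
    text = re.sub(r"\s+", " ", text)
    # Drop anything that is not a letter, digit, space, or an allowed character.
    pattern = r"[^A-Za-z0-9 " + re.escape(keep) + "]"
    return re.sub(pattern, "", text)


print(sanitize("Hi\n\tthere <b>!"))  # prints Hi there b!
```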
khp.transforms.column_operator(dataframe, parameters)[source]

Apply a numpy operator on a column in a dataframe, optionally applying filters specified in parameters.

If multiple aggregators are supplied, a dictionary will be returned instead of the float result. For example, if the following parameters are provided:

 parameters = {
     'output': 'khp_response_time',
     'aggregator': ['mean', 'max']
 }

The function will return:

 {'mean_khp_response_time': 1.2325, 'max_khp_response_time': 55.212}
Parameters:
  • dataframe (pandas.DataFrame) – Input dataframe
  • parameters (dict) – Parameters associated with the transform
Returns:

aggregator operation output. If multiple aggregators are supplied, returns a dict with the result of each aggregator. Otherwise, returns the float result of the aggregator operation.

Return type:

dict or float
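
The single-versus-multiple aggregator behaviour can be sketched as below. To keep the example self-contained it swaps the numpy operators for standard-library equivalents and works on a plain list of values. The name aggregate and the AGGREGATORS registry are hypothetical, and the filter handling mentioned above is omitted.

```python
import statistics

# Hypothetical registry mapping aggregator names to callables.
AGGREGATORS = {"mean": statistics.mean, "max": max, "min": min}


def aggregate(values, parameters):
    """Hypothetical sketch: one aggregator gives a float, several give a dict."""
    names = parameters["aggregator"]
    output = parameters["output"]
    if isinstance(names, str):
        return float(AGGREGATORS[names](values))
    # Prefix the output name with each aggregator name, e.g. 'mean_<output>'.
    return {f"{name}_{output}": float(AGGREGATORS[name](values)) for name in names}


params = {"output": "khp_response_time", "aggregator": ["mean", "max"]}
print(aggregate([1.0, 2.0, 6.0], params))
# prints {'mean_khp_response_time': 3.0, 'max_khp_response_time': 6.0}
```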

khp.transforms.convert_timedelta(value, unit)[source]

Convert a timedelta64 object to a float

Parameters:
Returns:

Converted timedelta value

Return type:

float

khp.transforms.convo_indicator(dataframe)[source]

Create an indicator for each message to signal whether it’s part of the conversation. A message is deemed part of the conversation if it appears after the convo_start_ind messages and is message type 3 or 4.

Parameters:
  • dataframe (pandas.DataFrame) – Input dataframe
  • parameters (dict) – Parameters associated with the transform
Returns:

Conversation indicator

Return type:

pandas.Series

khp.transforms.convo_start_indicator(dataframe)[source]

Create an indicator for each message to signal whether it’s the start of the conversation. Starting messages are detected using a regex, since the starting messages are system generated (hence filtering on message_type=1).

Parameters:
  • dataframe (pandas.DataFrame) – Input dataframe
  • parameters (dict) – Parameters associated with the transform
Returns:

Conversation start indicator

Return type:

pandas.Series

khp.transforms.filter_df(dataframe, filters)[source]

Filter a dataframe

Parameters:
  • dataframe (pandas.DataFrame) – Input dataframe
  • filters (list) – list of filters (dicts) to apply
Returns:

Filtered dataframe

Return type:

pandas.DataFrame

khp.transforms.parse_handlers(handlers)[source]

Parse the handlers associated with a contact. Split out primary handler and secondary handlers, assuming primary handler as the first handler in the list of handlers.

Parameters:handlers (list) – List of handlers
Returns:Dictionary containing primary and secondary handlers
Return type:dict
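
The primary/secondary split can be sketched as follows. The name split_handlers and the dictionary keys are hypothetical, as is the behaviour for an empty list.

```python
def split_handlers(handlers):
    """Hypothetical sketch: the first handler is primary, the rest are secondary."""
    if not handlers:
        # Assumption: an empty list yields no primary handler.
        return {"primary_handler": None, "secondary_handlers": []}
    return {"primary_handler": handlers[0], "secondary_handlers": handlers[1:]}


print(split_handlers(["alice", "bob", "carol"]))
# prints {'primary_handler': 'alice', 'secondary_handlers': ['bob', 'carol']}
```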
khp.transforms.parse_html(html)[source]

Utilize the beautiful soup html parser to return the text from html

Parameters:html (str) – String of html
Returns:extracted text
Return type:str
khp.transforms.parse_message(message_text, is_html)[source]

Return the text from a (potentially) html message string

Parameters:
  • message_text (str) – Description
  • is_html (bool) – Boolean indicator whether the text is html, provided upstream from the API response.
Returns:

Parsed message text

Return type:

str

khp.transforms.parse_messages(messages)[source]

Transformation function to parse a list of messages

Parameters:messages (list) – List of message dicts
Returns:List of transformed message dicts
Return type:list
khp.transforms.row_count(dataframe, parameters)[source]

Count the number of rows in a dataframe, optionally applying filters specified in parameters.

Parameters:
  • dataframe (pandas.DataFrame) – Input dataframe
  • parameters (dict) – Parameters associated with the transform
Returns:

Number of rows

Return type:

int

khp.transforms.str_length(dataframe, parameters)[source]

Calculate the length of a string

Parameters:
  • dataframe (pandas.DataFrame) – Input dataframe
  • parameters (dict) – Parameters associated with the transform
Returns:

Number of characters for each row of a column

Return type:

pandas.Series

khp.transforms.word_count(dataframe, parameters)[source]

Count the number of words in a string

Parameters:
  • dataframe (pandas.DataFrame) – Input dataframe
  • parameters (dict) – Parameters associated with the transform
Returns:

Word counts for each row of a column

Return type:

pandas.Series

khp.utils module

Utils module, contains utility functions used throughout the codebase.

khp.utils.check_response(response)[source]

Check the status of a requests response. If the status code is not 200, log the error and raise an exception.

Parameters:response (requests.models.Response) – Requests response object
Raises:Exception – If the status code is not 200
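
A minimal sketch of the check described above. The name ensure_ok and the log message are hypothetical. The example uses a SimpleNamespace stand-in so it runs without a network call, and it relies only on the status_code and text attributes that a requests response object provides.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def ensure_ok(response):
    """Hypothetical sketch: log and raise unless the status code is 200."""
    if response.status_code != 200:
        logger.error("Request failed with status %s: %s",
                     response.status_code, response.text)
        raise Exception(f"Request failed with status {response.status_code}")


# Stand-in objects that mimic the two attributes used above.
ensure_ok(SimpleNamespace(status_code=200, text="ok"))  # passes silently
try:
    ensure_ok(SimpleNamespace(status_code=500, text="server error"))
except Exception as err:
    print(err)  # prints Request failed with status 500
```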
khp.utils.chunker(seq, chunk_size)[source]

Break a list into a set of smaller lists with len = chunk_size

Parameters:
  • seq (list) – list to split up into chunks
  • chunk_size (int) – size of chunks
Returns:

list of lists with len = chunk_size

Return type:

list
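
Chunking of this kind is commonly written with slicing, as in the sketch below. The name chunk_list is hypothetical, and letting the final chunk be shorter when the list does not divide evenly is an assumption about the behaviour.

```python
def chunk_list(seq, chunk_size):
    """Hypothetical sketch: split seq into consecutive lists of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    return [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]


print(chunk_list([1, 2, 3, 4, 5], 2))  # prints [[1, 2], [3, 4], [5]]
```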

khp.utils.clean_dir(path, prefix=None)[source]

Helper function to clear any folders and files in a specified path.

Parameters:
  • path (str) – input path
  • prefix (str, optional) – File prefix
khp.utils.convert_timezone(dt, tz1, tz2)[source]

Convert a datetime object from one timezone to another timezone

Parameters:
  • dt (datetime.datetime) – Datetime object to convert
  • tz1 (str) – pytz acceptable timezone that dt is in
  • tz2 (str) – pytz acceptable timezone to convert to
Returns:

Datetime object in timezone 2

Return type:

datetime.datetime

khp.utils.generate_date_range(start_date, end_date)[source]

Generate the range of dates between start_date and end_date

Parameters:
  • start_date (str) – Start date, YYYY-mm-dd
  • end_date (str) – End date, YYYY-mm-dd
Returns:

List of dates, as datetime.datetime objects, between start_date and end_date

Return type:

list
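
A date range of this shape can be sketched with the standard library. The name date_range is hypothetical, and treating both endpoints as inclusive is an assumption, since the description does not say whether end_date is included.

```python
from datetime import datetime, timedelta


def date_range(start_date, end_date):
    """Hypothetical sketch: one datetime per day from start_date to end_date inclusive."""
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(days + 1)]


for day in date_range("2018-01-30", "2018-02-01"):
    print(day.strftime("%Y-%m-%d"))
# prints 2018-01-30, 2018-01-31 and 2018-02-01 on separate lines
```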

khp.utils.get_s3_keys(s3_bucket, prefix=None)[source]

Get a list of keys in an S3 bucket. Optionally specify a prefix to narrow down the keys returned.

Parameters:
  • s3_bucket (str) – Name of the S3 bucket.
  • prefix (str, optional) – File prefix. Defaults to None.
Returns:

List of keys in the S3 bucket.

Return type:

list

khp.utils.parse_date(str_dt)[source]

Convert a date string to a datetime object

Parameters:str_dt (str) – Date in any format accepted by dateutil.parser. WARNING: read the dateutil.parser docs before using to understand default behaviour (e.g. how str_dt values like 2018 or 2 are handled)
Returns:Datetime object
Return type:datetime.datetime
khp.utils.parse_s3_contents(contents, delimiter, remove_dupes=False, skip_first_line=False)[source]

Read the contents of an S3 object into a list of lists.

Parameters:
  • contents (str) – contents of an S3 object
  • delimiter (str) – delimiter to split the contents of each line with
  • remove_dupes (bool, optional) – ensure each line is unique. Defaults to False.
  • skip_first_line (bool, optional) – skip the first line of the S3 object. Defaults to False.
Returns:

List of lists, where each inner list is the contents of a single line.

Return type:

list
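
The parsing options can be sketched as below. The name split_lines is hypothetical. Skipping the first line before de-duplicating, keeping the first occurrence of a duplicate, and dropping empty lines are all assumptions.

```python
def split_lines(contents, delimiter, remove_dupes=False, skip_first_line=False):
    """Hypothetical sketch: split text into lines, then each line into fields."""
    lines = contents.splitlines()
    if skip_first_line:
        lines = lines[1:]  # e.g. drop a header row
    if remove_dupes:
        lines = list(dict.fromkeys(lines))  # de-duplicate, preserving order
    return [line.split(delimiter) for line in lines if line]


contents = "id,name\n1,a\n1,a\n2,b\n"
print(split_lines(contents, ",", remove_dupes=True, skip_first_line=True))
# prints [['1', 'a'], ['2', 'b']]
```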

khp.utils.read_jason(filename)[source]

Read a json file into a python object

Parameters:filename (str) – path of the file
Returns:parsed data from the file
Return type:list or dict
khp.utils.read_s3_file(s3_bucket, key)[source]

Read the contents of an S3 object.

Parameters:
  • s3_bucket (str) – Name of the S3 bucket.
  • key (str) – Name of the S3 object
Returns:

Contents of S3 object

Return type:

str

khp.utils.read_yaml(yaml_file)[source]

Read a yaml file.

Parameters:yaml_file (str) – Full path of the yaml file.
Returns:Dictionary of yaml_file contents.
Return type:dict
Raises:Exception – If the yaml_file cannot be opened.
khp.utils.search_path(path, like=None)[source]

Search a path and return all the files. Optionally specify file prefixes and/or filetypes to narrow your criteria.

Parameters:
  • path (str) – input path
  • like (list, optional) – List of file regexes to match files on
Returns:

list of files matching the specified filetypes

Return type:

list

khp.utils.upload_to_s3(s3_bucket, files, encrypt=True)[source]

Upload a list of files to S3.

Parameters:
  • s3_bucket (str) – Name of the S3 bucket.
  • files (list) – List of files to upload
  • encrypt (bool, optional) – Use serverside AES256 encryption, defaults to True.
khp.utils.write_jason(data, filename)[source]

Write a Python list or dictionary to a json file.

Parameters:
  • data (list or dict) – data to write to file
  • filename (str) – path of the file to write to
khp.utils.yesterdays_range()[source]

Generate yesterday's date range, as datetime objects

Returns:Beginning of yesterday and end of yesterday, each as a datetime.datetime
Return type:datetime.datetime
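
A minimal sketch of computing the two bounds with the standard library. The name yesterday_bounds and the now argument are hypothetical. Returning a tuple and treating 23:59:59.999999 as the "end of yesterday" are assumptions.

```python
from datetime import datetime, time, timedelta


def yesterday_bounds(now=None):
    """Hypothetical sketch: (start, end) datetimes covering all of yesterday."""
    now = now or datetime.now()
    yesterday = (now - timedelta(days=1)).date()
    start = datetime.combine(yesterday, time.min)  # 00:00:00
    end = datetime.combine(yesterday, time.max)    # 23:59:59.999999
    return start, end


start, end = yesterday_bounds(datetime(2018, 3, 1, 15, 30))
print(start, end)  # prints 2018-02-28 00:00:00 2018-02-28 23:59:59.999999
```

Passing now explicitly keeps the function easy to test; calling it with no argument uses the current local time.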