khp
khp.contacts module
-
khp.contacts.download_contacts(interaction_type, start_date=None, end_date=None)
Download contacts for a given interaction type and time period, and save the data.
Parameters: - interaction_type (str) – Type of contact (e.g. IM, Voice, Email)
- start_date (str, optional) – Start time, YYYY-mm-dd. Defaults to the beginning of yesterday.
- end_date (str, optional) – End time, YYYY-mm-dd. Defaults to the end of yesterday.
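The documented defaults (the whole of yesterday) can be pictured with the sketch below. `default_window` is a hypothetical helper, not part of khp, and it uses naive local datetimes; the real module may apply a specific timezone when computing the window.

```python
from datetime import date, datetime, time, timedelta


def default_window(today=None):
    """Return (start, end) datetimes spanning all of yesterday.

    Hypothetical helper that mirrors the documented defaults; not khp's code.
    """
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    start = datetime.combine(yesterday, time.min)  # 00:00:00 yesterday
    end = datetime.combine(yesterday, time.max)    # 23:59:59.999999 yesterday
    return start, end


start, end = default_window(date(2018, 3, 1))
print(start, end)  # 2018-02-28 00:00:00 2018-02-28 23:59:59.999999
```

Passing `today` explicitly keeps the helper deterministic and easy to test.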
-
khp.contacts.download_transcripts(contact_ids=None)
Download transcripts for a list of contact_ids.
Parameters: contact_ids (list, optional) – List of contact IDs to retrieve transcripts for. If none are provided (the default), queries the contacts that have not yet been parsed.
-
khp.contacts.enhanced_transcripts()
Read unprocessed transcripts from Postgres, perform a series of operations to produce metadata per contact_id, and load the results into the enhanced_transcripts table.
-
khp.contacts.get_contacts_to_load()
Grab the filenames of all contact files that have not been loaded to Postgres.
Returns: List of contact filenames to load
Return type: list
-
khp.contacts.get_transcripts_to_load()
Grab the filenames of all transcript files that have not been loaded to Postgres.
Returns: List of transcript files to load
Return type: list
-
khp.contacts.load_enhanced_transcript(contact_id, summary)
Load the transcript summary dictionary into the enhanced_transcripts table.
Parameters: - contact_id (int) – Contact ID
- summary (dict) – Summary dict of the transcript
-
khp.contacts.load_transcripts_df(contact_ids)
Load the transcript data associated with a set of contact_ids into a pandas DataFrame.
Parameters: contact_ids (list) – List of contact IDs to load
Returns: DataFrame containing the loaded transcripts
Return type: pandas.DataFrame
-
khp.contacts.main(interaction_type='IM', start_date=None, end_date=None)
Run the full contacts pipeline.
Parameters: - interaction_type (str, optional) – Type of contact (e.g. IM, Voice, Email)
- start_date (str, optional) – Start date, format YYYY-mm-dd
- end_date (str, optional) – End date, format YYYY-mm-dd
-
khp.contacts.parse_contacts_file(filename)
Parse the JSON contacts file downloaded from Icescape. Parsing includes:
- Reading the JSON file into a Python object
- Running transformations on each contact dict in the contacts file
- Uploading the parsed/transformed contact dicts into Postgres
Parameters: filename (str) – Full path of the contacts file
-
khp.contacts.parse_transcript(filename)
Parse the transcript file downloaded from Icescape. Parsing includes:
- Reading the JSON file into a Python object
- Running transformations on the raw transcript
- Uploading the parsed/transformed messages into Postgres
Parameters: filename (str) – Full path of the transcript file
khp.ftp module
khp.icescape module
-
class khp.icescape.Icescape
Bases: object
-
headers
-
password
-
token
-
user
-
user_agent
-
get_contacts(interaction_type, start_time=None, end_time=None)
Get results from the Icescape QueryContacts2 API.
Parameters: - interaction_type (str) – Type of contact (e.g. IM, Voice, Email)
- start_time (str, optional) – Start time; accepts the date formats YYYY-mm-dd or YYYY-mm-dd H:M:S. Defaults to the beginning of yesterday.
- end_time (str, optional) – End time; accepts the date formats YYYY-mm-dd or YYYY-mm-dd H:M:S. Defaults to the end of yesterday.
Returns: List of contact data dictionaries
Return type: list
-
khp.implicitly_tls module
khp.transforms module
Module containing a set of transformations that are run on the JSON responses from the API and on the transcript dataframes.
-
class khp.transforms.Transformer(transforms_meta)
Bases: object
A class that ingests a list of transform dictionaries and runs those transforms on a supplied dictionary or dataframe.
-
transforms
list – List of the transformation dictionaries.
-
static get_input_cols(transform_dict)
Return the input columns associated with a transformation.
Parameters: transform_dict (dict) – Transform dict parsed from transforms.yml
Returns: List of input columns
Return type: list
-
static get_value(key, data)
Grab the value associated with a key in a dictionary. Supports the nested key definitions in transforms.yml. For example, a key of 'KEY1|KEY2|KEY3' will return 5 from the following data:
data = { 'KEY1': { 'KEY2': { 'KEY3': 5, ... }, ... }, ... }
Parameters: - key (str) – Dictionary key of the value to return
- data (dict) – Dictionary of data to return the value from
Returns: The value associated with the specified key(s)
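A minimal sketch of the nested lookup, assuming '|' is the only level separator and that a missing key should simply raise `KeyError` (the module's actual error handling is not documented here):

```python
from functools import reduce


def get_value(key, data):
    """Walk nested dicts following a 'KEY1|KEY2|KEY3' style key.

    Sketch only: raises KeyError if any level is missing.
    """
    return reduce(lambda level, part: level[part], key.split("|"), data)


data = {"KEY1": {"KEY2": {"KEY3": 5, "other": 1}, "KEY4": 2}}
print(get_value("KEY1|KEY2|KEY3", data))  # 5
print(get_value("KEY1|KEY4", data))       # 2
```

A plain key without '|' degrades to an ordinary `data[key]` lookup.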
-
static parse_transforms(transforms_meta)
Parse the list of raw transformations. Generally, each transformation (i.e. each element of transforms_meta) will be in the following format:
{ 'field_name': { 'key1': 'value1', 'key2': 'value2', ... } }
Parameters: transforms_meta (list) – List of the raw transformation dictionaries.
Returns: List of the transformation dictionaries.
Return type: list
-
run_df_transforms(dataframe)
Run the transforms on a supplied dataframe.
Parameters: dataframe (pandas.DataFrame) – Input dataframe to run the transformations on.
Returns: Dataframe with updated and/or new columns
Return type: pandas.DataFrame
-
khp.transforms.calc_handle_time(dataframe)
Calculate the handle time for a contact. Handle time is calculated as the time elapsed across all messages with convo_ind == 1.
Parameters: - dataframe (pandas.DataFrame) – Input dataframe
- parameters (dict) – Parameters associated with the transform
Returns: Handle time for the contact, in minutes
Return type: float
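The handle time calculation can be sketched on plain Python data instead of a pandas DataFrame. The sketch assumes hypothetical 'timestamp' and 'convo_ind' keys and reads "elapsed across all messages" as the span from the first to the last conversation message; it is not the module's implementation.

```python
from datetime import datetime


def handle_time_minutes(messages):
    """Minutes from the first to the last message with convo_ind == 1.

    Sketch on a list of dicts; 'timestamp' and 'convo_ind' are assumed names.
    """
    times = [m["timestamp"] for m in messages if m["convo_ind"] == 1]
    if not times:
        return 0.0  # assumption: no conversation messages means zero handle time
    return (max(times) - min(times)).total_seconds() / 60


messages = [
    {"timestamp": datetime(2018, 1, 1, 10, 0), "convo_ind": 0},
    {"timestamp": datetime(2018, 1, 1, 10, 5), "convo_ind": 1},
    {"timestamp": datetime(2018, 1, 1, 10, 20), "convo_ind": 1},
]
print(handle_time_minutes(messages))  # 15.0
```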
-
khp.transforms.calc_message_sequence(dataframe)
Calculate the message sequence for each message, defined as 'prev_message_type' - 'message_type'. It indicates whether a message went from counsellor to counsellor, counselee to counsellor, system to counsellor, etc.
Parameters: - dataframe (pandas.DataFrame) – Input dataframe
- parameters (dict) – Parameters associated with the transform
Returns: Message sequence
Return type: pandas.Series
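One way to build the sequence label is sketched below on a plain list of message types rather than a DataFrame. The '<previous>-<current>' string format and the `None` for the first message are assumptions; the documentation only states that the label pairs the previous and current message types.

```python
def message_sequence(message_types):
    """Label each message with '<previous type>-<current type>'.

    Sketch on a plain list; the string format and the None for the first
    message (which has no predecessor) are assumptions.
    """
    if not message_types:
        return []
    labels = [None]  # the first message has no previous message
    for prev, cur in zip(message_types, message_types[1:]):
        labels.append(f"{prev}-{cur}")
    return labels


print(message_sequence([1, 3, 4, 4]))  # [None, '1-3', '3-4', '4-4']
```

With pandas, the previous message type is usually obtained with `Series.shift(1)` before combining the two columns.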
-
khp.transforms.calc_response_time(dataframe)
Calculate the response time for each message, defined as the time elapsed between the message and the previous message.
Parameters: - dataframe (pandas.DataFrame) – Input dataframe
- parameters (dict) – Parameters associated with the transform
Returns: Response time for the message
Return type: pandas.Series
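A sketch of the per-message difference on a list of datetimes, assuming the messages are already sorted in time order. It returns `timedelta` values because the unit of the real output is not documented here; in pandas, `Series.diff()` on a datetime column gives the equivalent result with `NaT` for the first row.

```python
from datetime import datetime


def response_times(timestamps):
    """Time elapsed between each message and the previous one.

    Sketch on a list of datetimes sorted in message order. The first message
    has no predecessor, so it gets None.
    """
    if not timestamps:
        return []
    return [None] + [cur - prev for prev, cur in zip(timestamps, timestamps[1:])]


stamps = [datetime(2018, 1, 1, 10, 0), datetime(2018, 1, 1, 10, 2), datetime(2018, 1, 1, 10, 7)]
deltas = response_times(stamps)
print([d.total_seconds() / 60 if d is not None else None for d in deltas])  # [None, 2.0, 5.0]
```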
-
khp.transforms.calc_wait_time(dataframe)
Calculate the wait time for a contact. Wait time is calculated as the time elapsed between the start of the transcript and the first message with convo_ind == 1.
Parameters: - dataframe (pandas.DataFrame) – Input dataframe
- parameters (dict) – Parameters associated with the transform
Returns: Wait time for the contact, in minutes
Return type: float
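The wait time definition can be sketched the same way, on a time-sorted list of dicts rather than a DataFrame. Treating the first message as the start of the transcript, the 'timestamp'/'convo_ind' key names, and returning `None` when no conversation starts are all assumptions.

```python
from datetime import datetime


def wait_time_minutes(messages):
    """Minutes from the first message in the transcript to the first message
    with convo_ind == 1.

    Sketch: assumes messages are sorted by time, that the first message marks
    the start of the transcript, and hypothetical 'timestamp'/'convo_ind' keys.
    """
    start = messages[0]["timestamp"]
    for message in messages:
        if message["convo_ind"] == 1:
            return (message["timestamp"] - start).total_seconds() / 60
    return None  # assumption: no conversation, so no wait time


messages = [
    {"timestamp": datetime(2018, 1, 1, 9, 50), "convo_ind": 0},
    {"timestamp": datetime(2018, 1, 1, 9, 58), "convo_ind": 0},
    {"timestamp": datetime(2018, 1, 1, 10, 2), "convo_ind": 1},
]
print(wait_time_minutes(messages))  # 12.0
```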
-
khp.transforms.clean_text(text)
Sanitize the transcript messages. Replaces all whitespace with single spaces (since newlines cause problems when uploading to the DB) and removes everything that isn't a number, a letter, or in a list of characters to keep.
Parameters: text (str) – Text string to clean
Returns: Cleaned text string
Return type: str
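A minimal sketch of this sanitizing step. `KEEP_CHARS` is a hypothetical keep-list, since the module's actual list is not documented here. The sketch filters characters first and collapses whitespace afterwards, so removed symbols do not leave double spaces behind; the real function may apply the steps in the documented order.

```python
import re

# Hypothetical keep-list: the module's actual list of characters is not documented.
KEEP_CHARS = set(".,!?'-")


def clean_text(text, keep=KEEP_CHARS):
    """Drop unwanted characters and collapse all whitespace (including
    newlines) into single spaces."""
    kept = "".join(c for c in text if c.isalnum() or c.isspace() or c in keep)
    return re.sub(r"\s+", " ", kept).strip()


print(clean_text("Hi there!!\n\nHow's it going? :) #1"))
# Hi there!! How's it going? 1
```

`str.isalnum()` is Unicode-aware, so accented letters are kept as letters.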
-
khp.transforms.column_operator(dataframe, parameters)
Apply a numpy operator to a column in a dataframe, optionally applying filters specified in parameters.
If multiple aggregators are supplied, a dictionary is returned instead of the float result. For example, if the following parameters are provided:
parameters = { 'output': 'khp_response_time', 'aggregator': ['mean', 'max'] }
the function will return:
{ 'mean_khp_response_time': 1.2325, 'max_khp_response_time': 55.212 }
Parameters: - dataframe (pandas.DataFrame) – Input dataframe
- parameters (dict) – Parameters associated with the transform
Returns: Output of the aggregator operation. If multiple aggregators are supplied, returns a dict with the result of each aggregator; otherwise, returns the float result of the aggregator operation.
Return type: dict or float
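The documented return shape (a float for one aggregator, a dict keyed '<aggregator>_<output>' for several) can be sketched with the standard library. `aggregate_column` and the `AGGREGATORS` mapping are hypothetical stand-ins for khp's numpy operators, and filtering is omitted.

```python
import statistics

# Hypothetical name-to-function mapping; khp itself applies numpy operators.
AGGREGATORS = {"mean": statistics.mean, "max": max, "min": min, "sum": sum}


def aggregate_column(values, parameters):
    """Apply one or more named aggregators to a column's values.

    Sketch of the documented return shape only; filters are not applied.
    """
    aggregators = parameters["aggregator"]
    if isinstance(aggregators, str):
        aggregators = [aggregators]
    if len(aggregators) == 1:
        return float(AGGREGATORS[aggregators[0]](values))
    output = parameters["output"]
    return {f"{name}_{output}": float(AGGREGATORS[name](values)) for name in aggregators}


values = [1.0, 2.0, 6.0]
print(aggregate_column(values, {"output": "khp_response_time", "aggregator": ["mean", "max"]}))
# {'mean_khp_response_time': 3.0, 'max_khp_response_time': 6.0}
print(aggregate_column(values, {"output": "khp_response_time", "aggregator": "max"}))  # 6.0
```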
-
khp.transforms.convert_timedelta(value, unit)
Convert a timedelta64 object to a float.
Parameters: - value (numpy.timedelta64[ns]) – Timedelta value to convert
- unit (str) – Datetime unit code; see https://docs.scipy.org/doc/numpy-dev/reference/arrays.datetime.html for a list of acceptable codes
Returns: Converted timedelta value
Return type: float
-
khp.transforms.convo_indicator(dataframe)
Create an indicator for each message to signal whether it's part of the conversation. A message is deemed part of the conversation if it appears after the convo_start_ind message(s) and has message type 3 or 4.
Parameters: - dataframe (pandas.DataFrame) – Input dataframe
- parameters (dict) – Parameters associated with the transform
Returns: Conversation indicator
Return type: pandas.Series
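A sketch of the conversation flag on parallel plain lists instead of DataFrame columns. It assumes `start_flags` holds the convo_start_ind values (1 on a starting message) and that the starting message itself is not counted; `convo_flags` is a hypothetical name.

```python
def convo_flags(start_flags, message_types, convo_types=(3, 4)):
    """Return 1 for messages after the conversation start whose type is 3 or 4.

    Sketch on plain lists; the starting message itself is assumed to be excluded.
    """
    started = False
    flags = []
    for start_flag, message_type in zip(start_flags, message_types):
        flags.append(1 if started and message_type in convo_types else 0)
        if start_flag == 1:
            started = True  # later messages may now count as conversation
    return flags


print(convo_flags([0, 1, 0, 0, 0], [1, 1, 3, 1, 4]))  # [0, 0, 1, 0, 1]
```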
-
khp.transforms.convo_start_indicator(dataframe)
Create an indicator for each message to signal whether it's the start of the conversation. Starting messages are detected using a regex; since they are system generated, only messages with message_type == 1 are considered.
Parameters: - dataframe (pandas.DataFrame) – Input dataframe
- parameters (dict) – Parameters associated with the transform
Returns: Conversation start indicator
Return type: pandas.Series
-
khp.transforms.filter_df(dataframe, filters)
Filter a dataframe.
Parameters: - dataframe (pandas.DataFrame) – Input dataframe
- filters (list) – list of filters (dicts) to apply
Returns: Filtered dataframe
Return type: pandas.DataFrame
-
khp.transforms.parse_handlers(handlers)
Parse the handlers associated with a contact. Split out the primary handler and the secondary handlers, treating the first handler in the list as the primary handler.
Parameters: handlers (list) – List of handlers
Returns: Dictionary containing the primary and secondary handlers
Return type: dict
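The primary/secondary split can be sketched as below. `split_handlers` and its output keys 'primary_handler' and 'secondary_handlers' are hypothetical; the real dictionary keys and handler format are not documented here.

```python
def split_handlers(handlers):
    """Treat the first handler as primary and the rest as secondary.

    Sketch: the output key names are hypothetical.
    """
    if not handlers:
        return {"primary_handler": None, "secondary_handlers": []}
    return {"primary_handler": handlers[0], "secondary_handlers": handlers[1:]}


print(split_handlers(["alice", "bob", "carol"]))
# {'primary_handler': 'alice', 'secondary_handlers': ['bob', 'carol']}
```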
-
khp.transforms.parse_html(html)
Use the Beautiful Soup HTML parser to return the text from HTML.
Parameters: html (str) – String of HTML
Returns: Extracted text
Return type: str
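For readers without Beautiful Soup installed, a comparable text extraction can be sketched with the standard library's `html.parser`. This swaps in a different parser than the module uses, so whitespace handling may differ from Beautiful Soup's `get_text()`; `html_to_text` is a hypothetical name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content found between tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; into &
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(html_to_text("<p>Hello <b>there</b> &amp; welcome</p>"))  # Hello there & welcome
```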
-
khp.transforms.parse_message(message_text, is_html)
Return the text from a (potentially) HTML message string.
Parameters: - message_text (str) – Message text, either plain text or HTML
- is_html (bool) – Boolean indicator whether the text is html, provided upstream from the API response.
Returns: Parsed message text
Return type: str
-
khp.transforms.parse_messages(messages)
Transformation function to parse a list of messages.
Parameters: messages (list) – List of message dicts
Returns: List of transformed message dicts
Return type: list
-
khp.transforms.row_count(dataframe, parameters)
Count the number of rows in a dataframe, optionally applying filters specified in parameters.
Parameters: - dataframe (pandas.DataFrame) – Input dataframe
- parameters (dict) – Parameters associated with the transform
Returns: Number of rows
Return type: int
khp.utils module
Utility functions used throughout the codebase.
-
khp.utils.check_response(response)
Check the status of a requests response. If the status code is not 200, log the error and raise an exception.
Parameters: response (requests.models.Response) – Requests response object
Raises: Exception – If the status code is not 200
-
khp.utils.chunker(seq, chunk_size)
Break a list into a set of smaller lists with len = chunk_size.
Parameters: - seq (list) – list to split up into chunks
- chunk_size (int) – size of chunks
Returns: list of lists with len = chunk_size
Return type: list
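A common slicing-based way to chunk a list is sketched below. It is not necessarily khp's implementation; in this sketch the final chunk is simply shorter when len(seq) is not a multiple of chunk_size.

```python
def chunker(seq, chunk_size):
    """Split seq into consecutive lists of chunk_size items (the last may be shorter)."""
    return [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]


print(chunker([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```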
-
khp.utils.clean_dir(path, prefix=None)
Helper function to clear all folders and files in a specified path.
Parameters: - path (str) – Input path
- prefix (str, optional) – File prefix
-
khp.utils.convert_timezone(dt, tz1, tz2)
Convert a datetime object from one timezone to another.
Parameters: - dt (datetime.datetime) – Datetime object to convert
- tz1 (str) – pytz-compatible timezone that dt is in
- tz2 (str) – pytz-compatible timezone to convert to
Returns: Datetime object in timezone tz2
Return type: datetime.datetime
-
khp.utils.generate_date_range(start_date, end_date)
Generate the range of dates between start_date and end_date.
Parameters: - start_date (str) – Start date, YYYY-mm-dd
- end_date (str) – End date, YYYY-mm-dd
Returns: List of dates, as datetime.datetime objects, between start_date and end_date
Return type: list
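A sketch of the date range using only the standard library. Including both endpoints is an assumption, since the documentation only says "between".

```python
from datetime import datetime, timedelta


def generate_date_range(start_date, end_date):
    """Daily datetimes from start_date to end_date, both inclusive (assumed)."""
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


dates = generate_date_range("2018-01-30", "2018-02-02")
print([d.strftime("%Y-%m-%d") for d in dates])
# ['2018-01-30', '2018-01-31', '2018-02-01', '2018-02-02']
```

If end_date precedes start_date, the range is empty.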
-
khp.utils.get_s3_keys(s3_bucket, prefix=None)
Get a list of keys in an S3 bucket. Optionally specify a prefix to narrow down the keys returned.
Parameters: - s3_bucket (str) – Name of the S3 bucket.
- prefix (str, optional) – File prefix. Defaults to None.
Returns: List of keys in the S3 bucket.
Return type: list
-
khp.utils.parse_date(str_dt)
Convert a date string to a datetime object.
Parameters: str_dt (str) – Date in any format accepted by dateutil.parser. WARNING: read the dateutil.parser docs before using this function to understand its default behaviour (e.g. how str_dt values like 2018 or 2 are handled).
Returns: Datetime object
Return type: datetime.datetime
-
khp.utils.parse_s3_contents(contents, delimiter, remove_dupes=False, skip_first_line=False)
Read the contents of an S3 object into a list of lists.
Parameters: - contents (str) – Contents of an S3 object
- delimiter (str) – Delimiter to split the contents of each line with
- remove_dupes (bool, optional) – Ensure each line is unique. Defaults to False.
- skip_first_line (bool, optional) – Skip the first line of the S3 object. Defaults to False.
Returns: List of lists, where each inner list is the contents of a single line.
Return type: list
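A sketch of the same parsing on an in-memory string. Preserving the first occurrence of each line when removing duplicates, and skipping the header before de-duplicating, are assumptions about behaviour the documentation does not specify.

```python
def parse_s3_contents(contents, delimiter, remove_dupes=False, skip_first_line=False):
    """Split text into lines, then each line into fields."""
    lines = contents.splitlines()
    if skip_first_line:
        lines = lines[1:]
    if remove_dupes:
        lines = list(dict.fromkeys(lines))  # keeps first occurrences, in order
    return [line.split(delimiter) for line in lines]


contents = "id,name\n1,a\n2,b\n1,a\n"
print(parse_s3_contents(contents, ",", remove_dupes=True, skip_first_line=True))
# [['1', 'a'], ['2', 'b']]
```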
-
khp.utils.read_jason(filename)
Read a JSON file into a Python object.
Parameters: filename (str) – Path of the file
Returns: Parsed data from the file
Return type: list or dict
-
khp.utils.read_s3_file(s3_bucket, key)
Read the contents of an S3 object.
Parameters: - s3_bucket (str) – Name of the S3 bucket.
- key (str) – Name of the S3 object
Returns: Contents of S3 object
Return type: str
-
khp.utils.read_yaml(yaml_file)
Read a YAML file.
Parameters: yaml_file (str) – Full path of the YAML file.
Returns: Dictionary of yaml_file contents.
Return type: dict
Raises: Exception – If the yaml_file cannot be opened.
-
khp.utils.search_path(path, like=None)
Search a path and return all the files. Optionally specify file prefixes and/or filetypes (as regexes in like) to narrow the results.
Parameters: - path (str) – Input path
- like (list, optional) – List of file regexes to match files on
Returns: list of files matching the specified filetypes
Return type: list
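A sketch of regex-based file search with the standard library. Walking the tree recursively, matching with `re.search` against the file name only, returning every file when `like` is None, and sorting the result are all assumptions rather than documented behaviour.

```python
import os
import re
import tempfile


def search_path(path, like=None):
    """Return files under path whose names match any regex in like."""
    patterns = [re.compile(p) for p in (like or [])]
    matches = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            if not patterns or any(p.search(name) for p in patterns):
                matches.append(os.path.join(root, name))
    return sorted(matches)


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("a.json", "b.txt", os.path.join("sub", "c.json")):
        full = os.path.join(tmp, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    found = [os.path.relpath(f, tmp) for f in search_path(tmp, like=[r"\.json$"])]
    everything = search_path(tmp)

print(found)  # ['a.json', 'sub/c.json'] on POSIX systems
```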
-
khp.utils.upload_to_s3(s3_bucket, files, encrypt=True)
Upload a list of files to S3.
Parameters: - s3_bucket (str) – Name of the S3 bucket.
- files (list) – List of files to upload
- encrypt (bool, optional) – Use server-side AES256 encryption. Defaults to True.