API Reference

pyutils.dbutils

Module that defines common database functions.

See also

genutils

module that defines many general and useful functions.

logutils

module that defines common logging functions.

dbutils.connect_db(db_path, autocommit=False)

Open a database connection to a SQLite database.

Parameters
  • db_path (str) – File path to the database file.

  • autocommit (bool, optional) – In autocommit mode, all changes to the database are committed as soon as all operations associated with the current database connection complete. The default value is False, which implies that statements that modify the database don’t take effect immediately: you have to call commit() to close the transaction.

Raises

sqlite3.Error – Raised if any SQLite-related error occurs, e.g. IntegrityError or OperationalError, since sqlite3.Error is the base class for all exceptions of the module.
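
The difference between the two modes can be observed with the standard sqlite3 module alone. The sketch below is not the source of connect_db(): connect_db_sketch is a hypothetical stand-in, and mapping autocommit=True to isolation_level=None is an assumption about how such a wrapper could be written.

```python
import sqlite3


def connect_db_sketch(db_path, autocommit=False):
    """Hypothetical stand-in for connect_db(); not the library's code."""
    if autocommit:
        # Assumption: autocommit is obtained by disabling sqlite3's implicit
        # transaction handling.
        return sqlite3.connect(db_path, isolation_level=None)
    return sqlite3.connect(db_path)


# Default mode: the INSERT opens a transaction that stays open until commit().
conn = connect_db_sketch(":memory:")
conn.execute("CREATE TABLE books (title TEXT)")
conn.execute("INSERT INTO books VALUES (?)", ("Dune",))
pending_before_commit = conn.in_transaction
conn.commit()
pending_after_commit = conn.in_transaction
rows = conn.execute("SELECT title FROM books").fetchall()
conn.close()

# Autocommit mode: no transaction is left open after the INSERT.
auto_conn = connect_db_sketch(":memory:", autocommit=True)
auto_conn.execute("CREATE TABLE books (title TEXT)")
auto_conn.execute("INSERT INTO books VALUES (?)", ("Dune",))
pending_in_autocommit = auto_conn.in_transaction
auto_conn.close()
```

Since any failure surfaces as a subclass of sqlite3.Error, a single except sqlite3.Error clause around such calls is enough to catch IntegrityError and OperationalError alike.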

Returns

dbutils.create_db(overwrite_db, db_filepath, schema_filepath, **kwargs)

Create a SQLite database.

A schema file is needed for creating the database. If an existing SQLite database will be overwritten, the user is given 5 seconds to stop the script before the database is overwritten.

Parameters
  • overwrite_db (bool) – Whether the database will be overwritten. The user is given 5 seconds to stop the script before the database is overwritten.

  • db_filepath (str) – Path to the SQLite database.

  • schema_filepath (str) – Path to the schema file.

  • **kwargs – TODO

Raises

IOError – Raised if an I/O error occurs when opening the schema file, e.g. the schema file doesn’t exist. In Python 3, IOError is an alias of OSError.
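
The behaviour described for create_db() can be approximated as follows. This is only a sketch: create_db_sketch and its pause parameter are hypothetical, and leaving an existing database untouched when overwrite_db is False is an assumption, since the reference does not say what happens in that case.

```python
import os
import sqlite3
import tempfile
import time


def create_db_sketch(overwrite_db, db_filepath, schema_filepath, pause=5):
    """Hypothetical sketch of the behaviour described for create_db()."""
    # A missing schema file raises OSError (IOError is its alias) here.
    with open(schema_filepath) as f:
        schema = f.read()
    if os.path.isfile(db_filepath):
        if not overwrite_db:
            # Assumption: an existing database is left untouched.
            return False
        # Grace period during which the user can stop the script.
        time.sleep(pause)
        os.remove(db_filepath)
    conn = sqlite3.connect(db_filepath)
    try:
        conn.executescript(schema)
    finally:
        conn.close()
    return True


with tempfile.TemporaryDirectory() as tmpdir:
    schema_path = os.path.join(tmpdir, "schema.sql")
    db_path = os.path.join(tmpdir, "example.sqlite")
    with open(schema_path, "w") as f:
        f.write("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);")
    # pause=0 keeps the example fast; the documented delay is 5 seconds.
    created = create_db_sketch(False, db_path, schema_path, pause=0)
    created_again = create_db_sketch(False, db_path, schema_path, pause=0)
    conn = sqlite3.connect(db_path)
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    ).fetchall()
    conn.close()
```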

dbutils.sql_sanity_check(sql, values)

Perform sanity checks on an SQL query.

Only SQL queries that have values to be added are checked, i.e. INSERT queries.

These are the checks performed:

  • Whether the values are of tuple type

  • Whether the SQL expression contains the right number of values

Parameters
  • sql (str) – SQL query to be executed.

  • values (tuple of str) – The values to be inserted in the database.

Raises

SQLSanityCheckError – Raised when the sanity check on a SQL query fails, e.g. the query’s values are not of tuple type or the SQL query contains the wrong number of values.
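
The two checks listed above could look like the sketch below. It assumes that the query uses sqlite3's "?" placeholders and counts them; the reference does not say how the number of values is actually determined. SQLSanityCheckError is redefined locally so that the example is self-contained.

```python
class SQLSanityCheckError(Exception):
    """Local stand-in for pyutils.exceptions.sql.SQLSanityCheckError."""


def sql_sanity_check_sketch(sql, values):
    """Hypothetical re-implementation of the two documented checks."""
    # Check 1: the values must be given as a tuple.
    if not isinstance(values, tuple):
        raise SQLSanityCheckError("The values are not of tuple type")
    # Check 2 (assumption): one "?" placeholder is expected per value.
    if sql.count("?") != len(values):
        raise SQLSanityCheckError("Wrong number of values in the SQL query")


def passes(sql, values):
    """Return True if the sketch accepts the query, False otherwise."""
    try:
        sql_sanity_check_sketch(sql, values)
    except SQLSanityCheckError:
        return False
    return True


query = "INSERT INTO books (title, year) VALUES (?, ?)"
valid = passes(query, ("Dune", "1965"))
not_a_tuple = passes(query, ["Dune", "1965"])
wrong_count = passes(query, ("Dune",))
```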

pyutils.genutils

Module that defines many general and useful functions.

You will find such functions as loading a YAML file, writing to a file on disk, and getting the local time based on the current time zone.

See also

dbutils

module that defines database-related functions.

logutils

module that defines log-related functions.

saveutils

module that defines a class for saving webpages on disk.

genutils.convert_utctime_to_local_tz(utc_time=None)

Convert a given UTC time into the local time zone.

If a UTC time is given, it is converted to the local time zone. If utc_time is None, then the current time based on the local time zone is returned.

The date and time are returned as a string with the format YYYY-MM-DD HH:MM:SS-HH:MM.

The modules pytz and tzlocal need to be installed. You can install them with pip:

$ pip install tzlocal

This will also install pytz.

Parameters

utc_time (time.struct_time) – The UTC time to be converted to the local time zone (the default value is None, which implies that the current time will be retrieved and converted to the local time zone).

Returns

local_time – The UTC time converted into the local time zone with the format YYYY-MM-DD HH:MM:SS-HH:MM

Return type

str

Raises

ImportError – Raised if the module tzlocal or pytz is not found.

See also

get_current_local_datetime()

only returns the current time based on the local time zone.

Examples

>>> import time
>>> utc_time = time.gmtime()
>>> convert_utctime_to_local_tz(utc_time)
'2019-09-05 18:17:59-04:00'
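
For comparison, the same conversion can be sketched with the standard library only. This is not how the function is implemented (it relies on pytz and tzlocal); utc_struct_to_local_str is a hypothetical name, and the sketch swaps in datetime.astimezone() to find the system's time zone.

```python
import calendar
import time
from datetime import datetime, timezone


def utc_struct_to_local_str(utc_time=None):
    """Standard-library sketch of the conversion; not the library's code."""
    if utc_time is None:
        utc_dt = datetime.now(timezone.utc)
    else:
        # calendar.timegm() interprets the struct_time as UTC.
        seconds = calendar.timegm(utc_time)
        utc_dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # astimezone() without argument converts to the system's time zone.
    local_dt = utc_dt.astimezone().replace(microsecond=0)
    # Produces YYYY-MM-DD HH:MM:SS followed by the UTC offset.
    return local_dt.isoformat(sep=" ")


example = utc_struct_to_local_str(time.gmtime(1567721879))
current = utc_struct_to_local_str()
```

The exact string depends on the time zone of the machine running the code, which is why no output is shown here.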

genutils.create_directory(dirpath)

Create a directory if it doesn’t already exist.

Parameters

dirpath (str) – Path to directory to be created.

Raises

genutils.create_timestamped_dir(parent_dirpath, new_dirname='')

Create a timestamped directory if it doesn’t already exist.

The timestamp is added to the beginning of the directory name, e.g.:

/Users/test/20190905-122929-documents
Parameters
  • parent_dirpath (str) – Path to the parent directory.

  • new_dirname (str, optional) – Name of the directory to be created (the default value is “” which implies that only the timestamp will be added as the name of the directory).

Returns

new_dirpath – Path to the newly created directory.

Return type

str

Raises
  • FileExistsError – Raised if the directory already exists.

  • PermissionError – Raised if trying to run an operation without the adequate access rights.
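
A minimal sketch of the naming scheme follows. The timestamp format is inferred from the example path above (20190905-122929), and create_timestamped_dir_sketch is a hypothetical name, not the library's code.

```python
import os
import tempfile
import time


def create_timestamped_dir_sketch(parent_dirpath, new_dirname=""):
    """Hypothetical sketch; the timestamp format is inferred from the docs."""
    timestamp = time.strftime("%Y%m%d-%H%M%S")
    if new_dirname:
        dirname = "{}-{}".format(timestamp, new_dirname)
    else:
        # Only the timestamp is used as the directory name.
        dirname = timestamp
    new_dirpath = os.path.join(parent_dirpath, dirname)
    # os.mkdir() raises FileExistsError if the directory already exists and
    # PermissionError if the access rights are insufficient.
    os.mkdir(new_dirpath)
    return new_dirpath


with tempfile.TemporaryDirectory() as parent:
    new_dirpath = create_timestamped_dir_sketch(parent, "documents")
    created = os.path.isdir(new_dirpath)
    dirname = os.path.basename(new_dirpath)
```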

genutils.dump_pickle(filepath, data)

Write data to a pickle file.

Parameters
  • filepath (str) – Path to the pickle file where data will be written.

  • data – Data to be saved on disk.

Raises

OSError – Raised if any I/O related error occurs while writing the data to disk, e.g. the file doesn’t exist.

genutils.dumps_json(filepath, data, encoding='utf8', sort_keys=True, ensure_ascii=False)

Write data to a JSON file.

The data is first serialized to a JSON formatted string and then saved to disk.

Parameters
  • filepath (str) – Path to the JSON file where the data will be saved.

  • data – Data to be written to the JSON file.

  • encoding (str, optional) – Encoding to be used for opening the JSON file.

  • sort_keys (bool, optional) – If sort_keys is true, then the output of dictionaries will be sorted by key. See the json.dumps() docstring description (the default value is True).

  • ensure_ascii (bool, optional) – If ensure_ascii is False, then the return value can contain non-ASCII characters if they appear in strings contained in data. Otherwise, all such characters are escaped in JSON strings. See the json.dumps() docstring description (the default value is False).

Raises

OSError – Raised if any I/O related error occurs while writing the data to disk, e.g. the file doesn’t exist.
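
The effect of sort_keys and ensure_ascii can be illustrated with json.dumps() directly. The function below is a hypothetical sketch that follows the description above (serialize to a string, then write it to disk); it is not the library's source.

```python
import json
import os
import tempfile


def dumps_json_sketch(filepath, data, encoding="utf8", sort_keys=True,
                      ensure_ascii=False):
    """Hypothetical sketch: serialize to a string, then write it to disk."""
    text = json.dumps(data, sort_keys=sort_keys, ensure_ascii=ensure_ascii)
    with open(filepath, "w", encoding=encoding) as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.json")
    data = {"b": "café", "a": 1}

    # Defaults: keys are sorted and non-ASCII characters are kept as-is.
    dumps_json_sketch(path, data)
    with open(path, encoding="utf8") as f:
        written = f.read()

    # ensure_ascii=True escapes the non-ASCII character instead.
    dumps_json_sketch(path, data, ensure_ascii=True)
    with open(path, encoding="utf8") as f:
        escaped = f.read()
```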

genutils.get_creation_date(filepath)

Get creation date of a file.

Try to get the date that a file was created, falling back to when it was last modified if that isn’t possible.

If the modification date is needed, use os.path.getmtime(), which is supported on all platforms.

Parameters

filepath (str) – Path to file whose creation date will be returned.

Returns

Time of creation in seconds since the epoch.

Return type

float

References

Code is from Stack Overflow’s user Mark Amery.

Examples

>>> from datetime import datetime
>>> creation = get_creation_date("/Users/test/directory")
>>> creation
1567701693.0
>>> str(datetime.fromtimestamp(creation))
'2019-09-05 12:41:33'
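
The fallback to the modification date can be sketched as below. This follows the widely circulated platform-dependent approach and is not necessarily identical to the code in genutils; get_creation_date_sketch is a hypothetical name.

```python
import os
import platform
import tempfile


def get_creation_date_sketch(filepath):
    """Sketch of a creation-date lookup with a modification-date fallback."""
    if platform.system() == "Windows":
        # On Windows, getctime() returns the creation time.
        return os.path.getctime(filepath)
    stat = os.stat(filepath)
    try:
        # Available on macOS and some BSD systems.
        return stat.st_birthtime
    except AttributeError:
        # Typically Linux: fall back to the last modification time.
        return stat.st_mtime


with tempfile.NamedTemporaryFile() as tmp:
    creation = get_creation_date_sketch(tmp.name)
```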

genutils.get_current_local_datetime()

Get the current date and time based on the system’s time zone.

The modules pytz and tzlocal need to be installed. You can install them with pip:

$ pip install tzlocal

This will also install pytz.

Returns

The date and time in the system’s time zone.

Return type

datetime.datetime

Raises

ImportError – Raised if the module tzlocal or pytz is not found.

See also

convert_utctime_to_local_tz()

converts a UTC time to the system’s time zone.

Examples

>>> datetime_with_tz = get_current_local_datetime()
>>> datetime_with_tz
datetime.datetime(2019, 9, 5, 13, 34, 0, 678836, tzinfo=<DstTzInfo
'US/Eastern' EDT-1 day, 20:00:00 DST>)
>>> str(datetime_with_tz)
'2019-09-05 13:34:18.898435-04:00'

genutils.load_json(filepath, encoding='utf8')

Load JSON data from a file on disk.

Parameters
  • filepath (str) – Path to the JSON file which will be read.

  • encoding (str, optional) – Encoding to be used for opening the JSON file.

Returns

Data loaded from the JSON file.

Return type

data

Raises

OSError – Raised if any I/O related error occurs while reading the file, e.g. the file doesn’t exist.

genutils.load_pickle(filepath)

Open a pickle file.

The function opens a pickle file and returns its content.

Parameters

filepath – Path to the pickle file.

Returns

Content of the pickle file.

Return type

data
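
A round trip through dump_pickle() and load_pickle() can be sketched with the standard pickle module. The two functions below are hypothetical stand-ins written for illustration, not the library's source.

```python
import os
import pickle
import tempfile


def dump_pickle_sketch(filepath, data):
    """Hypothetical stand-in for dump_pickle()."""
    # Pickle files are binary, hence the "wb" mode.
    with open(filepath, "wb") as f:
        pickle.dump(data, f)


def load_pickle_sketch(filepath):
    """Hypothetical stand-in for load_pickle()."""
    with open(filepath, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.pkl")
    original = {"title": "Dune", "ratings": [5, 4]}
    dump_pickle_sketch(path, original)
    restored = load_pickle_sketch(path)
```

Only unpickle files from trusted sources, since loading a pickle can execute arbitrary code.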

genutils.load_yaml(f)

Load the content of a YAML file.

The module yaml needs to be installed. It can be installed with pip:

$ pip install pyyaml
Parameters

f – File stream associated with the file read from disk.

Returns

The dictionary read from the YAML file.

Return type

dict

Raises
  • ImportError – Raised if the module yaml is not found.

  • yaml.YAMLError – Raised if there is any error in the YAML structure of the file.

Notes

Calling yaml.load() without the Loader argument triggers a YAMLLoadWarning, because the default Loader is unsafe. You must specify a loader with the Loader= argument. See PyYAML yaml.load(input) Deprecation.
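
Passing a loader explicitly avoids the warning, as the sketch below shows. Which loader load_yaml() actually uses is not stated in this reference; yaml.SafeLoader is chosen here as an assumption, and load_yaml_sketch is a hypothetical name. The import is guarded so that the example also runs when PyYAML is not installed.

```python
import io

try:
    import yaml
except ImportError:  # PyYAML is not installed
    yaml = None


def load_yaml_sketch(f):
    """Hypothetical sketch of loading YAML with an explicit loader."""
    if yaml is None:
        raise ImportError("The yaml module is missing: pip install pyyaml")
    # Assumption: SafeLoader, which only builds plain Python objects.
    return yaml.load(f, Loader=yaml.SafeLoader)


if yaml is not None:
    stream = io.StringIO("version: 1\nhandlers:\n  - console\n")
    config = load_yaml_sketch(stream)
else:
    config = None
```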

genutils.read_file(filepath)

Read a file (in text mode) from disk.

Parameters

filepath (str) – Path to the file to be read from disk.

Returns

Content of the file returned as a string.

Return type

str

Raises

OSError – Raised if any I/O related error occurs while reading the file, e.g. the file doesn’t exist.

genutils.read_yaml(filepath)

Read a YAML file.

Its content is returned as a dict.

The module yaml needs to be installed. It can be installed with pip:

$ pip install pyyaml
Parameters

filepath (str) – Path to the YAML file to be read.

Returns

The dict read from the YAML file.

Return type

dict

Raises
  • ImportError – Raised if the module yaml is not found.

  • OSError – Raised if any I/O related error occurs while reading the file, e.g. the file doesn’t exist or there is an error in the YAML structure of the file.

genutils.run_cmd(cmd)

Run a command with arguments.

The command is given as a string but the function will split it in order to get a list having the name of the command and its arguments as items.

Parameters

cmd (str) –

Command to be executed, e.g.

open -a TextEdit text.txt

Returns

retcode – Return code which is 0 if the command was successfully completed. Otherwise, the return code is non-zero.

Return type

int

Examples

TODO
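
A possible reading of the behaviour described for run_cmd() is sketched below. It assumes that the string is split with shlex.split() and executed with subprocess.call(); the reference does not confirm either choice, and run_cmd_sketch is a hypothetical name. The Python interpreter is used as the command so that the example runs on any system.

```python
import shlex
import subprocess
import sys


def run_cmd_sketch(cmd):
    """Hypothetical sketch: split the command string, then run it."""
    # "ls -l /tmp" becomes ["ls", "-l", "/tmp"].
    args = shlex.split(cmd)
    # subprocess.call() waits for the command and returns its return code.
    return subprocess.call(args)


# Quote the interpreter path in case it contains spaces.
python = shlex.quote(sys.executable)
ok_retcode = run_cmd_sketch(python + ' -c "raise SystemExit(0)"')
failed_retcode = run_cmd_sketch(python + ' -c "raise SystemExit(3)"')
```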

genutils.write_file(filepath, data, overwrite_file=True)

Write data (text mode) to a file.

Parameters
  • filepath (str) – Path to the file where the data will be written.

  • data – Data to be written.

  • overwrite_file (bool, optional) – Whether the file can be overwritten (the default value is True which implies that the file can be overwritten).

Raises
  • OSError – Raised if any I/O related error occurs while writing the file, e.g. the file doesn’t exist.

  • OverwriteFileError – Raised if an existing file is being overwritten and the flag to overwrite files is disabled.
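
The interplay between overwrite_file and OverwriteFileError can be sketched as follows. write_file_sketch is a hypothetical stand-in, and OverwriteFileError is redefined locally so that the example is self-contained.

```python
import os
import tempfile


class OverwriteFileError(Exception):
    """Local stand-in for pyutils.exceptions.files.OverwriteFileError."""


def write_file_sketch(filepath, data, overwrite_file=True):
    """Hypothetical sketch of the documented overwrite protection."""
    if os.path.isfile(filepath) and not overwrite_file:
        raise OverwriteFileError(
            "File '{}' already exists and overwrite_file is False".format(
                filepath))
    with open(filepath, "w") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "notes.txt")
    write_file_sketch(path, "first")
    write_file_sketch(path, "second")  # overwriting is allowed by default
    try:
        write_file_sketch(path, "third", overwrite_file=False)
        refused = False
    except OverwriteFileError:
        refused = True
    with open(path) as f:
        content = f.read()
```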

pyutils.logutils

Module that defines common logging functions.

See also

dbutils

module that defines common database functions.

genutils

module that defines many general and useful functions.

logutils.get_error_msg(exc)

Get an error message from an exception.

It converts an error message of type Exception (e.g. sqlite3.IntegrityError) into a string to then be logged.

Parameters

exc (Exception) – The error message as an Exception, e.g. TypeError, which will be converted to a string.

Returns

error_msg – The error message converted to a string.

Return type

str

logutils.setup_logging(logging_config)

Set up logging from a YAML configuration file or logging dictionary.

Loggers can be set up through a YAML logging configuration file or a logging dictionary, which defines the loggers, their handlers, and the formatters (how log messages get displayed).

Also, a date and time can be added to the beginning of the log filename by setting the option add_datetime to True in the logging configuration file. Thus, you can generate log filenames like this:

2019-08-29-00-58-22-debug.log
Parameters

logging_config (str or dict) – The YAML configuration file path or the logging dict that is used to set up the logging. The contents of the logging dictionary are described in Configuration dictionary schema.

Returns

config_dict – The logging configuration dict that is used to set up the logger(s). The contents of the logging dictionary are described in Configuration dictionary schema.

Return type

dict

Raises
  • KeyError – Raised if a key in the logging config dict is not found.

  • OSError – Raised if the YAML logging config file doesn’t exist.

  • ValueError – Raised if the YAML logging config dict is invalid, i.e. a key or value is invalid. Example: a logging handler’s class is written incorrectly.

Notes

For an example of a YAML logging configuration file, check Configuring Logging.
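
As an illustration of what a logging dictionary looks like, the sketch below passes a small one directly to the standard logging.config.dictConfig(). It does not call setup_logging() and leaves out the add_datetime option, whose exact place in the configuration is not described here. The logger name "example" is arbitrary.

```python
import logging
import logging.config

# Minimal logging dictionary: one formatter, one handler, one logger.
config_dict = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {
            "format": "%(asctime)s %(name)s %(levelname)s %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "simple",
        },
    },
    "loggers": {
        "example": {
            "level": "DEBUG",
            "handlers": ["console"],
            "propagate": False,
        },
    },
}

logging.config.dictConfig(config_dict)
logger = logging.getLogger("example")
logger.debug("Logging is configured")
logger_level = logger.level
handler_count = len(logger.handlers)

# A handler class written incorrectly is rejected with a ValueError.
bad_config = {
    "version": 1,
    "handlers": {"console": {"class": "logging.StreamHandlr"}},
}
try:
    logging.config.dictConfig(bad_config)
    rejected = False
except ValueError:
    rejected = True
```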

pyutils.saveutils

Module that defines a class for saving webpages on disk.

Only the HTML content of the webpage is saved on disk; other resources, such as pictures, might not get rendered when the page is viewed in a browser.

class saveutils.SaveWebpages(overwrite_webpages=False, http_get_timeout=5, delay_between_requests=8, headers={'Accept': 'text/html, application/xhtml+xml, application/xml;q=0.9, image/webp, */*;q=0.8', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36c'})

Bases: object

A class that saves webpages on disk.

The HTML content of the webpages is saved on disk. Thus, other resources (such as pictures) might not get rendered when the page is viewed in a browser.

When retrieving webpages, a certain delay is introduced between HTTP requests to the server in order to reduce its workload.

Parameters
  • overwrite_webpages (bool, optional) – Whether a webpage that is saved on disk can be overwritten (the default value is False, which implies that the webpages saved on disk are not overwritten).

  • http_get_timeout (int, optional) – Timeout when a GET request doesn’t receive any response from the server. After the timeout expires, the GET request is dropped (the default value is 5 seconds).

  • delay_between_requests (int, optional) – Delay introduced between HTTP requests to the server in order to reduce its workload (the default value is 8).

  • headers (dict, optional) –

    The information added to the HTTP GET request that a user’s browser sends to a Web server containing the details of what the browser wants and will accept back from the server. See HTTP request header (the default value is defined in headers).

    Its keys are the request headers’ field names like Accept, Cookie, User-Agent, or Referer and its values are the associated request headers’ field values. See List of all HTTP headers (Mozilla) and List of HTTP header fields (Wikipedia).
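
The delay between requests mentioned in the class description could be enforced along the lines of the sketch below. RequestThrottle is a hypothetical helper that is not part of saveutils, and the delay is assumed to be expressed in seconds. The clock and sleep functions are injectable so that the example runs instantly.

```python
import time


class RequestThrottle:
    """Hypothetical helper; not part of saveutils."""

    def __init__(self, delay_between_requests=8, clock=time.monotonic,
                 sleep=time.sleep):
        # Assumption: the delay is expressed in seconds.
        self.delay = delay_between_requests
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def wait(self):
        """Block until at least `delay` has elapsed since the last request."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self.delay - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now


# A fake clock replaces real waiting in this demonstration.
fake_now = [0.0]
naps = []


def fake_sleep(seconds):
    naps.append(seconds)
    fake_now[0] += seconds


throttle = RequestThrottle(8, clock=lambda: fake_now[0], sleep=fake_sleep)
throttle.wait()    # first request: nothing to wait for
fake_now[0] += 3   # 3 seconds pass doing other work
throttle.wait()    # second request: waits for the remaining 5 seconds
```

Calling wait() right before each GET request keeps consecutive requests at least delay_between_requests apart without sleeping longer than necessary.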

get_cached_webpage(filepath)

Load a webpage from disk.

Load the HTML content of a webpage from disk.

The webpages are cached in order to reduce the number of requests to the server.

Parameters

filepath (str) – The file path of the webpage to load from disk.

Raises

OSError – Raised if an I/O related error occurs while reading the cached HTML document, e.g. the file doesn’t exist.

Returns

html – HTML content of the webpage that is loaded from disk.

Return type

str

get_webpage(url)

Get the HTML content of a webpage.

When retrieving the webpage, a certain delay is introduced between HTTP requests to the server in order to reduce its workload.

Parameters

url (str) – URL of the webpage whose HTML content will be retrieved.

Raises
Returns

html – HTML content of the webpage that is retrieved from the server.

Return type

str

headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36c'}

The information added to the HTTP GET request that a user’s browser sends to a Web server containing the details of what the browser wants and will accept back from the server.

save_webpage(filepath, url)

Save a webpage on disk.

First, the cache is checked for the webpage. If the webpage is found in cache, then its HTML content is simply returned.

If the webpage is not found in cache, then it’s retrieved from the server and saved on disk.

IMPORTANT: the webpage found in cache might also be overwritten if the option overwrite_webpages is set to True.

Parameters
  • filepath (str) – File path of the webpage that will be saved on disk.

  • url (str) – URL to the webpage that will be saved on disk.

Raises
  • HTTP404Error – Raised if the server returns a 404 status code because the webpage is not found.

  • OverwriteFileError – Raised if an existing file is being overwritten and the flag to overwrite files is disabled.

  • OSError – Raised if an I/O related error occurs while writing the webpage on disk, e.g. the file doesn’t exist.

Returns

html – HTML content of the webpage that is saved on disk.

Return type

str
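
The cache-first logic described for save_webpage() can be sketched as follows. save_webpage_sketch is a hypothetical function, not the method's source: the HTTP request is replaced by an injectable fetch callable so that the example runs without network access, and the URL is a placeholder.

```python
import os
import tempfile


def save_webpage_sketch(filepath, url, fetch, overwrite_webpages=False):
    """Hypothetical sketch of the cache-then-fetch logic."""
    # 1. Return the cached copy if there is one and it must not be replaced.
    if os.path.isfile(filepath) and not overwrite_webpages:
        with open(filepath, encoding="utf8") as f:
            return f.read()
    # 2. Otherwise retrieve the page and save it on disk.
    html = fetch(url)
    with open(filepath, "w", encoding="utf8") as f:
        f.write(html)
    return html


requests_made = []


def fake_fetch(url):
    """Stand-in for the HTTP GET request; records each call."""
    requests_made.append(url)
    return "<html><body>Hello</body></html>"


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "page.html")
    url = "https://example.com/"
    first = save_webpage_sketch(path, url, fake_fetch)    # fetched
    second = save_webpage_sketch(path, url, fake_fetch)   # read from cache
    third = save_webpage_sketch(path, url, fake_fetch,
                                overwrite_webpages=True)  # fetched again
```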

pyutils.exceptions

exceptions.connection

Module that defines exceptions related to connection problems.

These are the exceptions that are raised when querying a server for a resource.

exception exceptions.connection.HTTP404Error

Raised if the server returns a 404 status code because the webpage is not found.

exceptions.files

Module that defines exceptions related to file problems.

These are the exceptions that are raised when reading or writing files.

exception exceptions.files.OverwriteFileError

Raised if an existing file is being overwritten and the flag to overwrite files is disabled.

exceptions.log

Module that defines exceptions related to logging problems.

These are the exceptions that are raised when logging.

exception exceptions.log.LoggingSanityCheckError

Raised if the sanity check on one of the LoggingWrapper parameters fails.

exceptions.sql

Module that defines exceptions related to SQL database problems.

These are the exceptions that are raised when querying a SQL database.

exception exceptions.sql.SQLSanityCheckError

Raised if one of the sanity checks on a SQL query fails, e.g. the query’s values are not of tuple type or the SQL query contains the wrong number of values.