API Reference
pyutils.dbutils
Module that defines common database functions.
See also
-
dbutils.connect_db(db_path, autocommit=False)
Open a database connection to a SQLite database.
- Parameters
db_path (str) – File path to the database file.
autocommit (bool, optional) – In autocommit mode, every change to the database is committed as soon as all operations associated with the current database connection complete. The default value is False, which implies that statements that modify the database don’t take effect immediately: you have to call commit() to close the transaction.
- Raises
sqlite3.Error – Raised if any SQLite-related error occurs, e.g. IntegrityError or OperationalError, since sqlite3.Error is the base class for the module’s other exceptions.
- Returns
sqlite3.Connection – Connection object that represents the SQLite database.
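The difference between the two modes can be sketched with the standard sqlite3 module alone. The helper connect_db_sketch below is a hypothetical stand-in, not the actual source of connect_db(); it assumes autocommit is obtained by passing isolation_level=None to sqlite3.connect().

```python
import sqlite3


def connect_db_sketch(db_path, autocommit=False):
    """Hypothetical stand-in for connect_db(), built only on sqlite3."""
    if autocommit:
        # isolation_level=None stops the sqlite3 module from opening
        # implicit transactions, so each statement is committed at once.
        return sqlite3.connect(db_path, isolation_level=None)
    return sqlite3.connect(db_path)


# Default mode: the INSERT opens a transaction that stays pending.
conn = connect_db_sketch(":memory:")
conn.execute("CREATE TABLE books (title TEXT)")
conn.execute("INSERT INTO books VALUES ('Dune')")
pending_before_commit = conn.in_transaction
conn.commit()  # closes the transaction
pending_after_commit = conn.in_transaction
conn.close()

# Autocommit mode: nothing is left pending after the INSERT.
auto = connect_db_sketch(":memory:", autocommit=True)
auto.execute("CREATE TABLE books (title TEXT)")
auto.execute("INSERT INTO books VALUES ('Dune')")
pending_in_autocommit = auto.in_transaction
auto.close()
```

In the default mode, forgetting commit() means the inserted row is lost when the connection is closed.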
-
dbutils.create_db(overwrite_db, db_filepath, schema_filepath, **kwargs)
Create a SQLite database.
A schema file is needed for creating the database. If an existing SQLite database will be overwritten, the user is given 5 seconds to stop the script before the database is overwritten.
- Parameters
- Raises
IOError – Raised if there is any IOError when opening the schema file, e.g. the schema file doesn’t exist (OSError).
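The flow described above (read a schema file, pause before overwriting, then build the database) might look like the following sketch. The function create_db_sketch and its pause argument are hypothetical; how the real create_db() uses **kwargs is not documented here.

```python
import os
import sqlite3
import tempfile
import time


def create_db_sketch(overwrite_db, db_filepath, schema_filepath, pause=5):
    """Hypothetical sketch: build a SQLite database from a schema file."""
    # open() raises OSError (IOError is an alias) if the schema is missing.
    with open(schema_filepath) as f:
        schema = f.read()
    if os.path.isfile(db_filepath):
        if not overwrite_db:
            return False  # keep the existing database untouched
        # Grace period so the user can still interrupt the script.
        time.sleep(pause)
        os.remove(db_filepath)
    conn = sqlite3.connect(db_filepath)
    try:
        conn.executescript(schema)
        conn.commit()
    finally:
        conn.close()
    return True


# Usage with throwaway files; pause=0 keeps the demo fast.
workdir = tempfile.mkdtemp()
schema_path = os.path.join(workdir, "schema.sql")
db_path = os.path.join(workdir, "demo.sqlite")
with open(schema_path, "w") as f:
    f.write("CREATE TABLE books (title TEXT);")
created = create_db_sketch(False, db_path, schema_path, pause=0)
skipped = create_db_sketch(False, db_path, schema_path, pause=0)
```

Reading the schema before touching the existing database means a missing schema file cannot cost you the old database.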
-
dbutils.sql_sanity_check(sql, values)
Perform sanity checks on an SQL query.
Only SQL queries that have values to be added are checked, i.e. INSERT queries.
These are the checks performed:
- Whether the values are of tuple type
- Whether the SQL expression contains the right number of values
- Parameters
sql (str) – SQL query to be executed.
values (tuple of str) – The values to be inserted in the database.
- Raises
SQLSanityCheckError – Raised when the sanity check on a SQL query fails, e.g. the query’s values are not of tuple type or the SQL query has the wrong number of values.
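The two checks can be sketched as follows. This is an illustration only: sql_sanity_check_sketch and the local SQLSanityCheckError class are hypothetical, and the placeholder count assumes the qmark style ("?") that sqlite3 accepts.

```python
class SQLSanityCheckError(Exception):
    """Local stand-in for the exception named in this reference."""


def sql_sanity_check_sketch(sql, values):
    """Hypothetical sketch of the two checks listed above."""
    # Check 1: the values must be passed as a tuple.
    if not isinstance(values, tuple):
        raise SQLSanityCheckError(
            "values must be a tuple, got {}".format(type(values).__name__))
    # Check 2: one "?" placeholder per value (assumes qmark style; a "?"
    # inside a quoted SQL string literal would be miscounted).
    expected = sql.count("?")
    if expected != len(values):
        raise SQLSanityCheckError(
            "query expects {} values, got {}".format(expected, len(values)))


# Passes silently: two placeholders, two values.
sql_sanity_check_sketch(
    "INSERT INTO books (title, author) VALUES (?, ?)",
    ("Dune", "Frank Herbert"),
)
```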
pyutils.genutils
Module that defines many general and useful functions.
You will find such functions as loading a YAML file, writing to a file on disk, and getting the local time based on the current time zone.
See also
-
genutils.convert_utctime_to_local_tz(utc_time=None)
Convert a given UTC time into the local time zone.
If a UTC time is given, it is converted to the local time zone. If utc_time is None, then the current time based on the local time zone is returned.
The date and time are returned as a string with the format YYYY-MM-DD HH:MM:SS-HH:MM
The modules pytz and tzlocal need to be installed. You can install them with pip:
$ pip install tzlocal
This will also install pytz.
- Parameters
utc_time (time.struct_time) – The UTC time to be converted into the local time zone (the default value is None, which implies that the current time will be retrieved and converted into the local time zone).
- Returns
local_time – The UTC time converted into the local time zone with the format YYYY-MM-DD HH:MM:SS-HH:MM
- Return type
- Raises
ImportError – Raised if the modules tzlocal and pytz are not found.
See also
get_current_local_datetime() only returns the current time based on the local time zone.
Examples
>>> import time
>>> utc_time = time.gmtime()
>>> convert_utctime_to_local_tz(utc_time)
'2019-09-05 18:17:59-04:00'
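For readers who want to see the conversion itself, here is a minimal sketch that swaps pytz and tzlocal for the standard library. The function utc_struct_to_local_str is hypothetical and is not how the documented function is implemented; it only reproduces the same output format.

```python
import calendar
import time
from datetime import datetime, timezone


def utc_struct_to_local_str(utc_time=None):
    """Hypothetical stdlib-only sketch of the conversion described above."""
    if utc_time is None:
        utc_time = time.gmtime()  # current time, expressed in UTC
    # timegm() interprets the struct_time as UTC (unlike time.mktime()).
    seconds = calendar.timegm(utc_time)
    aware_utc = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # astimezone() with no argument converts to the system's time zone.
    local = aware_utc.astimezone()
    # isoformat(sep=" ") yields YYYY-MM-DD HH:MM:SS followed by the offset.
    return local.isoformat(sep=" ")


local_time = utc_struct_to_local_str(time.gmtime())
```

The numeric result depends on the time zone of the machine that runs it, so no fixed output is shown.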
-
genutils.create_directory(dirpath)
Create a directory if it doesn’t already exist.
- Parameters
dirpath (str) – Path to directory to be created.
- Raises
FileExistsError – Raised if the directory already exists.
PermissionError – Raised if trying to run an operation without the adequate access rights, for example filesystem permissions (see PermissionError). On Windows, a PermissionError can also occur if you try to open a directory as a file; the error is more accurate on Linux: “[Errno 21] Is a directory” (see PermissionError Errno 13 Permission denied (stackoverflow)).
-
genutils.create_timestamped_dir(parent_dirpath, new_dirname='')
Create a timestamped directory if it doesn’t already exist.
The timestamp is added to the beginning of the directory name, e.g.:
/Users/test/20190905-122929-documents
- Parameters
- Returns
new_dirpath – Path to the newly created directory.
- Return type
- Raises
FileExistsError – Raised if the directory already exists.
PermissionError – Raised if trying to run an operation without the adequate access rights.
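A minimal sketch of the naming scheme, assuming the timestamp follows the YYYYMMDD-HHMMSS pattern seen in the example path above. The function create_timestamped_dir_sketch is hypothetical.

```python
import os
import tempfile
import time


def create_timestamped_dir_sketch(parent_dirpath, new_dirname=""):
    """Hypothetical sketch: prefix the directory name with a timestamp."""
    timestamp = time.strftime("%Y%m%d-%H%M%S")
    # With an empty name, the timestamp alone is the directory name.
    name = "{}-{}".format(timestamp, new_dirname) if new_dirname else timestamp
    new_dirpath = os.path.join(parent_dirpath, name)
    # os.makedirs() raises FileExistsError if the directory exists and
    # PermissionError if the parent is not writable.
    os.makedirs(new_dirpath)
    return new_dirpath


new_dirpath = create_timestamped_dir_sketch(tempfile.mkdtemp(), "documents")
```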
-
genutils.dumps_json(filepath, data, encoding='utf8', sort_keys=True, ensure_ascii=False)
Write data to a JSON file.
The data is first serialized to a JSON formatted string and then saved to disk.
- Parameters
filepath (str) – Path to the JSON file where the data will be saved.
data – Data to be written to the JSON file.
encoding (str, optional) – Encoding to be used for opening the JSON file.
sort_keys (bool, optional) – If sort_keys is true, then the output of dictionaries will be sorted by key. See the json.dumps() docstring description (the default value is True).
ensure_ascii (bool, optional) – If ensure_ascii is False, then the return value can contain non-ASCII characters if they appear in strings contained in data. Otherwise, all such characters are escaped in JSON strings. See the json.dumps() docstring description (the default value is False).
- Raises
OSError – Raised if any I/O related error occurs while writing the data to disk, e.g. the file doesn’t exist.
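The serialize-then-save behavior can be sketched with json.dumps() and open(). The function dumps_json_sketch is a hypothetical re-creation that mirrors the documented signature; the real implementation may differ in detail.

```python
import json
import os
import tempfile


def dumps_json_sketch(filepath, data, encoding="utf8", sort_keys=True,
                      ensure_ascii=False):
    """Hypothetical sketch: serialize to a string, then save it to disk."""
    text = json.dumps(data, sort_keys=sort_keys, ensure_ascii=ensure_ascii)
    with open(filepath, "w", encoding=encoding) as f:
        f.write(text)


json_path = os.path.join(tempfile.mkdtemp(), "data.json")
dumps_json_sketch(json_path, {"b": "café", "a": 1})
with open(json_path, encoding="utf8") as f:
    saved = f.read()
# saved == '{"a": 1, "b": "café"}': keys sorted, "é" written unescaped
```

With ensure_ascii=True the same value would be stored as "caf\u00e9", which is why the chosen encoding only matters when ensure_ascii is False.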
-
genutils.get_creation_date(filepath)
Get creation date of a file.
Try to get the date that a file was created, falling back to when it was last modified if that isn’t possible.
If modification date is needed, use os.path.getmtime() which is cross-platform supported.
- Parameters
filepath (str) – Path to file whose creation date will be returned.
- Returns
Time of creation in seconds.
- Return type
References
Code is from Stack Overflow’s user Mark Amery.
Examples
>>> from datetime import datetime
>>> creation = get_creation_date("/Users/test/directory")
>>> creation
1567701693.0
>>> str(datetime.fromtimestamp(creation))
'2019-09-05 12:41:33'
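The fallback logic described above can be sketched as follows. This follows the general shape of the referenced Stack Overflow answer, but get_creation_date_sketch is a hypothetical re-creation and may not match the library's code line for line.

```python
import os
import platform
import tempfile


def get_creation_date_sketch(filepath):
    """Hypothetical sketch: creation time, else last modification time."""
    if platform.system() == "Windows":
        # On Windows, getctime() returns the creation time.
        return os.path.getctime(filepath)
    stat = os.stat(filepath)
    try:
        # Available on macOS and some BSDs.
        return stat.st_birthtime
    except AttributeError:
        # Typically Linux: no portable creation time, so fall back to
        # the time of last modification.
        return stat.st_mtime


fd, sample_path = tempfile.mkstemp()
os.close(fd)
creation = get_creation_date_sketch(sample_path)  # seconds since the epoch
```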
-
genutils.get_current_local_datetime()
Get the current date and time based on the system’s time zone.
The modules pytz and tzlocal need to be installed. You can install them with pip:
$ pip install tzlocal
This will also install pytz.
- Returns
The date and time in the system’s time zone.
- Return type
- Raises
ImportError – Raised if the modules tzlocal and pytz are not found.
See also
convert_utctime_to_local_tz() converts a UTC time based on the system’s time zone.
Examples
>>> datetime_with_tz = get_current_local_datetime()
>>> datetime_with_tz
datetime.datetime(2019, 9, 5, 13, 34, 0, 678836, tzinfo=<DstTzInfo 'US/Eastern' EDT-1 day, 20:00:00 DST>)
>>> str(datetime_with_tz)
'2019-09-05 13:34:00.678836-04:00'
-
genutils.load_pickle(filepath)
Open a pickle file.
The function opens a pickle file and returns its content.
- Parameters
filepath – Path to the pickle file
- Returns
Content of the pickle file.
- Return type
data
-
genutils.load_yaml(f)
Load the content of a YAML file.
The module yaml needs to be installed. It can be installed with pip:
$ pip install pyyaml
- Parameters
f – File stream associated with the file read from disk.
- Returns
The dictionary read from the YAML file.
- Return type
- Raises
ImportError – Raised if the module yaml is not found.
yaml.YAMLError – Raised if there is any error in the YAML structure of the file.
Notes
I got a YAMLLoadWarning when calling yaml.load() without Loader, as the default Loader is unsafe. You must specify a loader with the Loader= argument. See PyYAML yaml.load(input) Deprecation.
-
genutils.read_yaml(filepath)
Read a YAML file.
The content of the file is returned as a dict.
The module yaml needs to be installed. It can be installed with pip:
$ pip install pyyaml
- Parameters
filepath (str) – Path to the YAML file to be read.
- Returns
The dict read from the YAML file.
- Return type
- Raises
ImportError – Raised if the module yaml is not found.
OSError – Raised if any I/O related error occurs while reading the file, e.g. the file doesn’t exist or there is an error in the YAML structure of the file.
-
genutils.run_cmd(cmd)
Run a command with arguments.
The command is given as a string but the function will split it in order to get a list having the name of the command and its arguments as items.
- Parameters
cmd (str) – Command to be executed, e.g. open -a TextEdit text.txt
- Returns
retcode – Return code which is 0 if the command was successfully completed. Otherwise, the return code is non-zero.
- Return type
Examples
TODO
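Until an official example is written, the split-then-run behavior described above can be sketched with shlex and subprocess. The function run_cmd_sketch is hypothetical; the real function may split the string or launch the process differently.

```python
import shlex
import subprocess
import sys


def run_cmd_sketch(cmd):
    """Hypothetical sketch: split the command string, run it, return its code."""
    # shlex.split() keeps quoted arguments together, e.g.
    # 'open -a TextEdit "my file.txt"' -> ['open', '-a', 'TextEdit', 'my file.txt']
    args = shlex.split(cmd)
    return subprocess.run(args).returncode


# Use the running interpreter as a command that exists on any platform.
python = shlex.quote(sys.executable)
retcode_ok = run_cmd_sketch(python + " -c pass")
retcode_fail = run_cmd_sketch(python + ' -c "import sys; sys.exit(3)"')
```

Passing a list of arguments rather than a shell string avoids shell injection, which is presumably why the command is split first.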
-
genutils.write_file(filepath, data, overwrite_file=True)
Write data (text mode) to a file.
- Parameters
- Raises
OSError – Raised if any I/O related error occurs while writing the file, e.g. the file doesn’t exist.
OverwriteFileError – Raised if an existing file is being overwritten and the flag to overwrite files is disabled.
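The interplay between overwrite_file and OverwriteFileError can be sketched as follows. Both write_file_sketch and the local OverwriteFileError class are hypothetical stand-ins for the library's own definitions.

```python
import os
import tempfile


class OverwriteFileError(Exception):
    """Local stand-in for the exception named in this reference."""


def write_file_sketch(filepath, data, overwrite_file=True):
    """Hypothetical sketch: refuse to overwrite unless explicitly allowed."""
    if os.path.isfile(filepath) and not overwrite_file:
        raise OverwriteFileError(
            "File '{}' already exists and overwrite is disabled".format(filepath))
    with open(filepath, "w") as f:  # text mode
        f.write(data)


text_path = os.path.join(tempfile.mkdtemp(), "notes.txt")
write_file_sketch(text_path, "first version")
write_file_sketch(text_path, "second version")  # allowed by default
try:
    write_file_sketch(text_path, "third version", overwrite_file=False)
    blocked = False
except OverwriteFileError:
    blocked = True
```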
pyutils.logutils
Module that defines common logging functions.
See also
-
logutils.get_error_msg(exc)
Get an error message from an exception.
It converts an exception (e.g. sqlite3.IntegrityError) into a string that can then be logged.
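One plausible way to build such a string is to combine the exception's class name with its message. The format used by get_error_msg_sketch below is an assumption; the reference does not say what the returned string looks like.

```python
import sqlite3


def get_error_msg_sketch(exc):
    """Hypothetical sketch: '[ClassName] message' (format is an assumption)."""
    return "[{}] {}".format(exc.__class__.__name__, exc)


error = sqlite3.IntegrityError("UNIQUE constraint failed: books.title")
error_msg = get_error_msg_sketch(error)
# error_msg == '[IntegrityError] UNIQUE constraint failed: books.title'
```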
-
logutils.setup_logging(logging_config)
Set up logging from a YAML configuration file or logging dictionary.
Loggers can be set up through a YAML logging configuration file or a logging dictionary that defines the loggers, their handlers, and the formatters (how log messages get displayed).
Also, a date and time can be added to the beginning of the log filename by setting the option add_datetime to True in the logging configuration file. Thus, you can generate log filenames like this: 2019-08-29-00-58-22-debug.log
- Parameters
logging_config (str or dict) – The YAML configuration file path or the logging dict that is used to set up the logging. The contents of the logging dictionary are described in Configuration dictionary schema.
- Returns
config_dict – The logging configuration dict that is used to set up the logger(s). The contents of the logging dictionary are described in Configuration dictionary schema.
- Return type
- Raises
KeyError – Raised if a key in the logging config dict is not found.
OSError – Raised if the YAML logging config file doesn’t exist.
ValueError – Raised if the YAML logging config dict is invalid, i.e. a key or value is invalid. Example: a logging handler’s class is written incorrectly.
Notes
For an example of a YAML logging configuration file, check Configuring Logging.
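The dictionary path can be sketched with logging.config.dictConfig() from the standard library. The function setup_logging_sketch is hypothetical, and the sketch assumes that add_datetime is a top-level key of the configuration and that the timestamp uses the pattern of the filename shown above; neither detail is confirmed by this reference.

```python
import copy
import logging
import logging.config
import os
import tempfile
import time


def setup_logging_sketch(logging_config):
    """Hypothetical sketch: configure logging from a dict."""
    config = copy.deepcopy(logging_config)  # leave the caller's dict intact
    # Assumption: add_datetime is a top-level key of the configuration.
    if config.pop("add_datetime", False):
        stamp = time.strftime("%Y-%m-%d-%H-%M-%S-")
        for handler in config.get("handlers", {}).values():
            if "filename" in handler:
                dirname, basename = os.path.split(handler["filename"])
                handler["filename"] = os.path.join(dirname, stamp + basename)
    # dictConfig() raises ValueError if, e.g., a handler class is misspelled.
    logging.config.dictConfig(config)
    return config


log_dir = tempfile.mkdtemp()
example_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "add_datetime": True,
    "handlers": {
        "file": {
            "class": "logging.FileHandler",
            "filename": os.path.join(log_dir, "debug.log"),
        },
    },
    "root": {"level": "DEBUG", "handlers": ["file"]},
}
config_dict = setup_logging_sketch(example_config)
logging.getLogger("demo").debug("logging is configured")
log_path = config_dict["handlers"]["file"]["filename"]
```

Working on a deep copy keeps the caller's dictionary reusable, since the timestamped filename is written only into the returned config_dict.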
pyutils.saveutils
Module that defines a class for saving webpages on disk.
Only the HTML content of the webpage is saved on disk, so other resources, such as pictures, might not get rendered when the page is viewed in a browser.
-
class saveutils.SaveWebpages(overwrite_webpages=False, http_get_timeout=5, delay_between_requests=8, headers={'Accept': 'text/html, application/xhtml+xml, application/xml;q=0.9, image/webp, */*;q=0.8', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36c'})
Bases: object
A class that saves webpages on disk.
The HTML content of the webpages is saved on disk. Thus, other resources (such as pictures) might not get rendered when the pages are viewed in a browser.
When retrieving webpages, a certain delay is introduced between HTTP requests to the server in order to reduce its workload.
- Parameters
overwrite_webpages (bool, optional) – Whether a webpage that is saved on disk can be overwritten (the default value is False, which implies that webpages saved on disk are not overwritten).
http_get_timeout (int, optional) – Timeout when a GET request doesn’t receive any response from the server. After the timeout expires, the GET request is dropped (the default value is 5 seconds).
headers (dict, optional) – The information added to the HTTP GET request that a user’s browser sends to a Web server containing the details of what the browser wants and will accept back from the server. See HTTP request header (the default value is defined in headers).
Its keys are the request headers’ field names like Accept, Cookie, User-Agent, or Referer and its values are the associated request headers’ field values. See List of all HTTP headers (Mozilla) and List of HTTP header fields (Wikipedia).
-
get_cached_webpage(filepath)
Load a webpage from disk.
Load the HTML content of a webpage from disk.
The webpages are cached in order to reduce the number of requests to the server.
-
get_webpage(url)
Get the HTML content of a webpage.
When retrieving the webpage, a certain delay is introduced between HTTP requests to the server in order to reduce its workload.
- Parameters
url (str) – URL of the webpage whose HTML content will be retrieved.
- Raises
HTTP404Error – Raised if the server returns a 404 status code because the webpage is not found.
requests.RequestException – Raised if there is a requests-related error, e.g. requests.ConnectionError if the URL is not known.
- Returns
html – HTML content of the webpage that is saved on disk.
- Return type
-
headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36c'}
The information added to the HTTP GET request that a user’s browser sends to a Web server containing the details of what the browser wants and will accept back from the server.
-
save_webpage(filepath, url)
Save a webpage on disk.
First, the cache is checked for the webpage. If it’s found in cache, then its HTML content is simply returned.
If the webpage is not found in cache, then it’s retrieved from the server and saved on disk.
IMPORTANT: the webpage found in cache might also be overwritten if the option overwrite_webpages is set to True.
- Parameters
- Raises
HTTP404Error – Raised if the server returns a 404 status code because the webpage is not found.
OverwriteFileError – Raised if an existing file is being overwritten and the flag to overwrite files is disabled.
OSError – Raised if an I/O related error occurs while writing the webpage on disk, e.g. the file doesn’t exist.
- Returns
html – HTML content of the webpage that is saved on disk.
- Return type
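The cache-first flow of save_webpage() can be sketched without any network access by injecting the download step. Everything here is hypothetical: save_webpage_sketch is a free function rather than a method, fetch stands in for get_webpage(), and the treatment of overwrite_webpages (re-download and overwrite when True) is one reading of the note above.

```python
import os
import tempfile
import time


def save_webpage_sketch(filepath, url, fetch, overwrite_webpages=False,
                        delay_between_requests=0):
    """Hypothetical sketch: return the cached page, else fetch and save it."""
    if os.path.isfile(filepath) and not overwrite_webpages:
        # Cache hit: no request is sent to the server.
        with open(filepath, encoding="utf-8") as f:
            return f.read()
    # Pause before the request to reduce the server's workload.
    time.sleep(delay_between_requests)
    html = fetch(url)
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(html)
    return html


requested_urls = []


def fake_fetch(url):
    """Stand-in for an HTTP GET; records each request it receives."""
    requested_urls.append(url)
    return "<html><body>request {}</body></html>".format(len(requested_urls))


page_path = os.path.join(tempfile.mkdtemp(), "page.html")
first = save_webpage_sketch(page_path, "https://example.com", fake_fetch)
second = save_webpage_sketch(page_path, "https://example.com", fake_fetch)
```

The second call returns the cached HTML, so only one request is recorded in requested_urls.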
pyutils.exceptions
exceptions.connection
Module that defines exceptions related to connection problems.
These are the exceptions that are raised when querying a server for a resource.
exceptions.files
Module that defines exceptions related to file problems.
These are the exceptions that are raised when reading or writing files.
exceptions.log
Module that defines exceptions related to logging problems.
These are the exceptions that are raised when logging.
exceptions.sql
Module that defines exceptions related to SQL database problems.
These are the exceptions that are raised when querying a SQL database.