PyTOA5: Utilities for TOA5 Files
This library contains routines for the processing of data files in the TOA5 format. Since this format is basically a CSV file with a specific header, this library primarily provides functions to handle the header; the rest of the file can be read with Python’s csv module. A function to read a TOA5 file into a Pandas DataFrame is also provided.
[ Source code on GitHub | Author, Copyright, and License ]
TL;DR
Code examples with csv: toa5.read_header()
Code example with pandas: toa5.read_pandas()
Documentation
TOA5 files are essentially CSV files that have four header rows:
- The “environment line”: EnvironmentLine
- The column header names: ColumnHeader.name
- The columns’ units: ColumnHeader.unit
- The columns’ “data process”: ColumnHeader.prc
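To make the four-row layout concrete, here is a minimal sketch that reads it using only Python's csv module, without this library. The sample data is hypothetical (reconstructed from the example outputs shown elsewhere in this documentation), and the literal "TOA5" marker in the first field of the environment line is an assumption about the file format. The names SAMPLE, env_fields, and data_rows are purely illustrative.

```python
import csv
import io

# Hypothetical sample file contents (assumption: the first field of the
# environment line is the literal marker "TOA5").
SAMPLE = '''"TOA5","TestLogger","CR1000X","12342","CR1000X.Std.03.02","CPU:TestLogger.CR1X","2438","Example"
"TIMESTAMP","RECORD","BattV_Min"
"TS","RN","Volts"
"","","Min"
"2021-06-19 00:00:00",0,12.99
"2021-06-20 00:00:00",1,12.96
'''

reader = csv.reader(io.StringIO(SAMPLE, newline=''), strict=True)

env_fields = next(reader)   # row 1: the "environment line"
names = next(reader)        # row 2: column header names
units = next(reader)        # row 3: column units
prcs = next(reader)         # row 4: column "data process"

# The three column rows should all have the same length.
assert len(names) == len(units) == len(prcs)

# Everything after the header is ordinary CSV data.
data_rows = list(reader)
```

This sketch performs no validation beyond the length check; the library's functions exist precisely so that you do not have to hand-roll this.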
The following two functions can be used to read files with this header:
- toa5.read_header(csv_reader: Iterator[Sequence[str]], *, allow_dupes: bool = False) -> tuple[EnvironmentLine, tuple[ColumnHeader, ...]]
Read the header of a TOA5 file.
A common use case to read a TOA5 file would be the following; as you can see, the main difference between reading a regular CSV file and a TOA5 file is the additional call to this function.
>>> import csv, toa5
>>> with open('Example.dat', encoding='ASCII', newline='') as fh:
...     csv_rd = csv.reader(fh, strict=True)
...     env_line, columns = toa5.read_header(csv_rd)
...     print([ toa5.short_name(col) for col in columns ])
...     for row in csv_rd:
...         print(row)
['TIMESTAMP', 'RECORD', 'BattV_Min[V]']
['2021-06-19 00:00:00', '0', '12.99']
['2021-06-20 00:00:00', '1', '12.96']
This also works with csv.DictReader:
>>> import csv, toa5
>>> with open('Example.dat', encoding='ASCII', newline='') as fh:
...     env_line, columns = toa5.read_header(csv.reader(fh, strict=True))
...     for row in csv.DictReader(fh, strict=True,
...             fieldnames=[toa5.short_name(col) for col in columns]):
...         print(row)
{'TIMESTAMP': '2021-06-19 00:00:00', 'RECORD': '0', 'BattV_Min[V]': '12.99'}
{'TIMESTAMP': '2021-06-20 00:00:00', 'RECORD': '1', 'BattV_Min[V]': '12.96'}
- See also:
short_name(), used in the examples above, is an alias for default_col_hdr_transform().
- Parameters:
csv_reader – The csv.reader() object to read the header rows from. Only the header is read from the file, so after you call this function, you can use the reader to read the data rows from the input file.
allow_dupes – Whether or not to allow duplicates in the ColumnHeader.name values.
- Returns:
Returns an EnvironmentLine object and a tuple of ColumnHeader objects.
- Raises:
Toa5Error – In case any error is encountered while reading the TOA5 header.
- toa5.read_pandas(filepath_or_buffer, *, encoding: str = 'UTF-8', encoding_errors: str = 'strict', col_trans: Callable[[ColumnHeader], str] = default_col_hdr_transform, **kwargs)
A helper function to read TOA5 files into a pandas.DataFrame. Uses read_header() and pandas.read_csv() internally.
>>> import toa5, pandas
>>> df = toa5.read_pandas('Example.dat', low_memory=False)
>>> print(df)
            RECORD  BattV_Min[V]
TIMESTAMP
2021-06-19       0         12.99
2021-06-20       1         12.96
>>> print(df.attrs['toa5_env_line'])
EnvironmentLine(station_name='TestLogger', logger_model='CR1000X', logger_serial='12342', logger_os='CR1000X.Std.03.02', program_name='CPU:TestLogger.CR1X', program_sig='2438', table_name='Example')
- Parameters:
filepath_or_buffer – A filename or file object from which to read the TOA5 data.
Note
Unlike pandas.read_csv(), URLs are not accepted, only such filenames that Python’s open() accepts.
col_trans – The ColumnHeaderTransformer to use to convert the ColumnHeader objects into column names. Defaults to default_col_hdr_transform().
kwargs – Any additional keyword arguments are passed through to pandas.read_csv(). It is not recommended to set header and names, since they are provided by this function. Other options that this function provides by default, such as na_values or index_col, may be overridden.
- Returns:
A pandas.DataFrame. The EnvironmentLine is stored in pandas.DataFrame.attrs under the key "toa5_env_line".
Note
At the time of writing, pandas.DataFrame.attrs is documented as being experimental.
- class toa5.EnvironmentLine(station_name: str, logger_model: str, logger_serial: str, logger_os: str, program_name: str, program_sig: str, table_name: str)
Named tuple representing a TOA5 “Environment Line”, giving details about the data logger and its program.
- class toa5.ColumnHeader(name: str, unit: str = '', prc: str = '')
Named tuple representing a column header.
This class represents a column header as it would be read from a text (CSV) file; therefore, when optional fields are empty, this is represented by empty strings, not by None.
- simple_checks(*, strict: bool = True) -> str
Validates the values in this object against some rules mostly derived from experience:
- name must start with letters, an underscore, or dollar sign, and otherwise only consist of letters, numbers, underscores, and dollar sign, optionally followed by indices (integers separated by commas) in parentheses. May not be longer than 255 characters in total.
- unit is fairly lenient and currently allows most printable ASCII characters except backslash. May not be longer than 64 characters in total.
- prc is fairly strict and currently allows only up to 32 letters, numbers, underscores, and dashes.
Important
Since these rules are derived from experience, they may be adapted in the future, and they may not accurately reflect the rules your data logger imposes on the values. This should normally not be a problem, because within this library, this function is currently only used to generate warnings in default_col_hdr_transform(), and you are free to disable its strict option.
Please feel free to suggest changes.
- Parameters:
strict – Whether or not to raise an error for invalid values.
- Returns:
Returns the empty string if there are no problems. When strict is off and problems are detected, returns a string describing the problems.
- Raises:
ValueError – When strict is on and any unusual values are detected.
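The rules listed for simple_checks() can be approximated with regular expressions. The following is only a sketch of those rules as they are worded above, not the library's actual patterns: the names NAME_RE, UNIT_RE, PRC_RE, and check_header are hypothetical, and the unit pattern assumes that every printable ASCII character except the backslash is allowed, which may be more lenient than the library.

```python
import re

# Approximations of the documented rules; NOT the library's own patterns.
# name: letter/underscore/dollar first, then letters, digits, _ and $,
# optionally followed by indices such as "(1)" or "(1,2)".
NAME_RE = re.compile(r'[A-Za-z_$][A-Za-z0-9_$]*(?:\([0-9]+(?:,[0-9]+)*\))?')
# unit: printable ASCII except backslash (assumption, see lead-in).
UNIT_RE = re.compile(r'[\x20-\x5B\x5D-\x7E]*')
# prc: up to 32 letters, digits, underscores, and dashes.
PRC_RE = re.compile(r'[A-Za-z0-9_-]{0,32}')


def check_header(name: str, unit: str = '', prc: str = '') -> str:
    """Return '' if all values look fine, else a description of the problems."""
    problems = []
    if len(name) > 255 or not NAME_RE.fullmatch(name):
        problems.append(f'unusual name {name!r}')
    if len(unit) > 64 or not UNIT_RE.fullmatch(unit):
        problems.append(f'unusual unit {unit!r}')
    if not PRC_RE.fullmatch(prc):
        problems.append(f'unusual prc {prc!r}')
    return '; '.join(problems)
```

Like the non-strict mode described above, this sketch returns a description instead of raising; a strict variant would raise ValueError whenever the returned string is non-empty.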
- toa5.write_header(env_line: EnvironmentLine, columns: Sequence[ColumnHeader]) -> Generator[Sequence[str], None, None]
Convert an EnvironmentLine and sequence of ColumnHeader objects back into the four TOA5 header rows, suitable for use in e.g. writerows().
- toa5.ColumnHeaderTransformer
A type for a function that takes a ColumnHeader and turns it into a single string. See default_col_hdr_transform().
- toa5.default_col_hdr_transform(col: ColumnHeader, *, short_units: dict[str, str] | None = None, strict: bool = True) -> str
The default function used to transform a ColumnHeader into a single string.
This conversion is slightly opinionated and will:
- strip all whitespace from ColumnHeader values,
- append ColumnHeader.prc to the name with a slash (unless the name already ends with it),
- append the units in square brackets,
- shorten some units, and
- ignore the “TS” and “RN” units on the “TIMESTAMP” and “RECORD” columns, respectively.
- Parameters:
col – The ColumnHeader to process.
short_units – A lookup table in which the keys are the original unit names as they appear in the TOA5 file, and the values are a shorter version of that unit. If not provided, defaults to SHORTER_UNITS.
strict – When this is enabled (the default), raise a ValueError if the column name contains the characters /[], which might cause duplicate column names in a table, and warn if ColumnHeader.simple_checks() fails.
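As an illustration of the conversion steps described for default_col_hdr_transform(), here is a simplified re-implementation in plain Python. It is a sketch only and not the library's code: Col, SHORT_UNITS, and sketch_transform are hypothetical names, the unit table contains a single made-up entry, whitespace is assumed to be removed everywhere in each value, the "already ends with it" test is assumed to be a plain suffix comparison, and the strict checks are left out entirely.

```python
from typing import NamedTuple, Optional


class Col(NamedTuple):
    """Hypothetical stand-in for a column header (name, unit, prc)."""
    name: str
    unit: str = ''
    prc: str = ''


# Hypothetical lookup table; the library's actual table may differ.
SHORT_UNITS = {'Volts': 'V'}


def sketch_transform(col: Col, short_units: Optional[dict] = None) -> str:
    table = SHORT_UNITS if short_units is None else short_units
    # Remove all whitespace from the values (assumption: everywhere).
    name, unit, prc = (''.join(v.split()) for v in col)
    # Ignore the "TS" and "RN" units on TIMESTAMP and RECORD.
    if (name, unit) in (('TIMESTAMP', 'TS'), ('RECORD', 'RN')):
        unit = ''
    # Append the data process with a slash unless the name already ends
    # with it (assumption: a plain suffix comparison).
    if prc and not name.endswith(prc):
        name += '/' + prc
    # Append the (possibly shortened) unit in square brackets.
    if unit:
        name += '[' + table.get(unit, unit) + ']'
    return name


print(sketch_transform(Col('BattV_Min', 'Volts', 'Min')))  # BattV_Min[V]
print(sketch_transform(Col('AirT', 'degC', 'Avg')))        # AirT/Avg[degC]
```

Because the result combines name, data process, and unit using the characters /[], a column name that already contains those characters could collide with another column, which is the situation the strict option guards against.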
- toa5.short_name(col: ColumnHeader, *, short_units: dict[str, str] | None = None, strict: bool = True) -> str
A short alias for default_col_hdr_transform().
- toa5.SHORTER_UNITS: dict[str, str]
A table of shorter versions of common units, used in default_col_hdr_transform().
- toa5.sql_col_hdr_transform(col: ColumnHeader) -> str
An alternative function that transforms a ColumnHeader to a string suitable for use in SQL.
- appends ColumnHeader.prc (unless the name already ends with it)
- any characters that are not ASCII letters or numbers are converted to underscores (and consecutive underscores are reduced to a single one)
- the returned name is all-lowercase
- units are omitted (these could be stored in an SQL column comment, for example)
Warning
This transformation can potentially result in two columns on the same table having the same name, for example, this would be the case with ColumnHeader("Test_1","Volts","Smp") and ColumnHeader("Test(1)","","Smp"), which would both result in "test_1_smp".
Therefore, it is strongly recommended that you check for duplicate column names after using this transformer. For example, see more_itertools.classify_unique().
- Parameters:
col – The ColumnHeader to process.
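The collision described in the warning, and a duplicate check that needs only the standard library, can be sketched as follows. The function sketch_sql_name is a hypothetical approximation of the rules listed above, not the library's implementation; in particular, joining the name and the data process with an underscore and comparing the suffix case-insensitively are assumptions. collections.Counter is used here as a simple alternative to more_itertools.classify_unique().

```python
import re
from collections import Counter


def sketch_sql_name(name: str, prc: str = '') -> str:
    """Rough approximation of the rules above; units are simply not used."""
    if prc and not name.lower().endswith(prc.lower()):
        name = name + '_' + prc      # assumption: joined with an underscore
    # Runs of non-alphanumeric characters become one underscore, then lowercase.
    return re.sub(r'[^A-Za-z0-9]+', '_', name).lower()


names = [
    sketch_sql_name('Test_1', 'Smp'),
    sketch_sql_name('Test(1)', 'Smp'),
    sketch_sql_name('BattV_Min', 'Min'),
]

# Any name that occurs more than once would clash in an SQL table.
duplicates = sorted(n for n, count in Counter(names).items() if count > 1)
print(duplicates)  # prints ['test_1_smp']
```

In real code the same Counter check can be applied to the list of names returned by whichever transformer you use, before creating the table.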
- exception toa5.Toa5Error
An error class for read_header().
Command-Line TOA5-to-CSV Tool
The following is a command-line interface to convert a TOA5 file’s headers to a single row, which makes it more suitable for processing in other programs that expect CSV files with a single header row.
If this module and its scripts have been installed correctly, you should be able to run toa5-to-csv --help or python -m toa5.to_csv --help for details.
usage: toa5.to_csv [-h] [-o OUT_FILE] [-l ENV_LINE_FILE]
                   [-d {excel,excel-tab,unix}] [-n] [-s] [-a] [-e IN_ENCODING]
                   [-c OUT_ENCODING] [-t] [-j]
                   [TOA5FILE]

TOA5 to CSV Converter

positional arguments:
  TOA5FILE              The TOA5 file to process ("-"=STDIN)

options:
  -h, --help            show this help message and exit
  -o, --out-file OUT_FILE
                        Output filename ("-"=STDOUT)
  -l, --env-line ENV_LINE_FILE
                        JSON file for environment line ("-"=STDOUT)
  -d, --out-dialect {excel,excel-tab,unix}
                        Output CSV dialect (see Python `csv` module)
  -n, --simple-names    Simpler column names (no units etc.)
  -s, --sql-names       Transform column names to be suitable for SQL
  -a, --allow-dupes     Allow duplicate column names (in input and output)
  -e, --in-encoding IN_ENCODING
                        Input file encoding (default UTF-8)
  -c, --out-encoding OUT_ENCODING
                        Output file encoding (default UTF-8)
  -t, --require-timestamp
                        Require first column to be TIMESTAMP
  -j, --allow-jagged    Allow rows to have differing column counts

Details can be found at https://haukex.github.io/pytoa5/
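The core idea of the tool, collapsing the four header rows into a single one, can be sketched with the standard library alone. This is not the tool's implementation: sketch_to_csv is a hypothetical function that keeps only the raw column names (in the spirit of the --simple-names option, whose exact output may differ), discards the environment line instead of writing it to a JSON file, and does none of the validation the real tool offers.

```python
import csv
import io


def sketch_to_csv(toa5_text: str) -> str:
    """Convert TOA5 text to CSV text with a single header row (sketch)."""
    reader = csv.reader(io.StringIO(toa5_text, newline=''), strict=True)
    next(reader)             # skip the environment line
    names = next(reader)     # keep the column names
    next(reader)             # skip the units row
    next(reader)             # skip the "data process" row
    out = io.StringIO(newline='')
    writer = csv.writer(out, dialect='unix')
    writer.writerow(names)   # the single header row
    writer.writerows(reader) # copy the remaining data rows unchanged
    return out.getvalue()
```

The 'unix' dialect, one of the choices the tool accepts for --out-dialect, quotes every field and ends lines with a plain newline, so numeric fields in the input appear quoted in the output of this sketch.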
Changelog
v0.9.2 - 2024-10-21
Potentially incompatible changes:
- Added strict to toa5.default_col_hdr_transform() and enabled it by default, so the characters /[] are now not allowed in column names
- toa5.default_col_hdr_transform() now strips whitespace
- toa5.default_col_hdr_transform() and toa5.sql_col_hdr_transform() now no longer drop “Smp” from toa5.ColumnHeader.prc
Therefore, temporarily marked this project as “Beta”
v0.9.1 - 2024-10-19
- Actually allow overriding toa5.read_pandas() arguments (didn’t work as documented)
- Made toa5.read_pandas() arguments more flexible: accept filename as well, and allow overriding all arguments.
- Added --sql-names and --allow-dupes to CLI
- A few documentation updates.
v0.9.0 - 2024-10-18
- Initial release