prep
Invoked by: vaxstats prep
The prep command cleans and prepares data from various experimental sources for uniform processing in vaxstats. It addresses the common challenge of inconsistent data formats by letting you specify which columns hold the date, time, and target values, and how the dates and times are formatted.
Usage
Examples
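As a rough illustration of how the arguments documented on this page fit together, the sketch below assembles a `vaxstats prep` command line in Python. The helper `build_prep_command`, the sample file name, and the column indices are hypothetical. The argument order (positional file path first, then flags) is an assumption, not something this page confirms.

```python
import shlex


def build_prep_command(file_path, date_idx, time_idx, y_idx, **options):
    """Assemble an argument list for `vaxstats prep` (hypothetical helper)."""
    cmd = [
        "vaxstats", "prep", file_path,
        "--date_idx", str(date_idx),
        "--time_idx", str(time_idx),
        "--y_idx", str(y_idx),
    ]
    # Optional flags such as output="prepared_data.csv" become "--output prepared_data.csv".
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_prep_command(
    "data/exp001.csv", date_idx=0, time_idx=1, y_idx=2,
    input_date_fmt="%Y-%m-%d", output="prepared_data.csv",
)
# shlex.join renders the list as a single shell-style command string.
command_line = shlex.join(cmd)
print(command_line)
```

If vaxstats is installed and on your PATH, the same list could be passed to `subprocess.run(cmd, check=True)`.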
Required Arguments
Argument | Description | Example |
---|---|---|
file_path | Path to the Excel or CSV file containing the data. | 2024-08-04.xlsx, data/exp001.csv |
--date_idx | Column index containing date information. | 0, 1, etc. |
--time_idx | Column index containing time information. | 0, 1, etc. |
--y_idx | Column index containing the target variable (e.g., temperatures). | 0, 1, etc. |
Note
All column indices are zero-based, meaning the first column has index 0.
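One way to look up the indices for a CSV file is to enumerate its header row. This is a minimal sketch using only the standard library; the column names and values in `sample` are hypothetical, and your file's layout will differ.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file.
sample = io.StringIO(
    "Date,Time,Temp\n"
    "08-04-24,01:15:00 PM,37.2\n"
)

header = next(csv.reader(sample))
# enumerate() starts at 0, matching the zero-based indices described above.
indices = {name: idx for idx, name in enumerate(header)}
print(indices)
```

With this layout you would pass `--date_idx 0`, `--time_idx 1`, and `--y_idx 2`.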
Optional Arguments
Option | Description | Default | Example |
---|---|---|---|
--input_date_fmt | Format of the input date strings. | %m-%d-%y | %Y-%m-%d, %d/%m/%Y |
--input_time_fmt | Format of the input time strings. | %I:%M:%S %p | %H:%M:%S, %I:%M %p |
--output_fmt | Format of the output datetime strings. | %Y-%m-%d %H:%M:%S | %Y-%m-%dT%H:%M:%S |
--output | Name of the output CSV file. | output.csv | prepared_data.csv |
Input Format Specifications
- For --input_date_fmt and --input_time_fmt, use Python's strftime format codes.
- Common format specifiers:
  - %Y: Year with century (e.g., 2024)
  - %y: Year without century as a zero-padded decimal number (e.g., 01, 02, 99)
  - %m: Month as a zero-padded number (01-12)
  - %d: Day of the month as a zero-padded number (01-31)
  - %H: Hour (24-hour clock) as a zero-padded number (00-23)
  - %I: Hour (12-hour clock) as a zero-padded number (01-12)
  - %M: Minute as a zero-padded number (00-59)
  - %S: Second as a zero-padded number (00-59)
  - %p: Locale's equivalent of AM or PM
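To see how these codes behave, the sketch below parses one date and one time string with the default input formats and re-renders them with the default output format, using Python's standard `datetime` module. The sample strings are hypothetical, and the way the two fields are joined here is an assumption for illustration, not necessarily how vaxstats combines them internally.

```python
from datetime import datetime

# Defaults listed in the Optional Arguments table.
input_date_fmt = "%m-%d-%y"
input_time_fmt = "%I:%M:%S %p"
output_fmt = "%Y-%m-%d %H:%M:%S"

date_str = "08-04-24"     # hypothetical cell from the date column
time_str = "01:15:00 PM"  # hypothetical cell from the time column

# Parse both fields in one call by joining the strings and the formats.
parsed = datetime.strptime(
    f"{date_str} {time_str}", f"{input_date_fmt} {input_time_fmt}"
)
formatted = parsed.strftime(output_fmt)
print(formatted)
```

Note that %p depends on the active locale, so AM/PM strings parse as shown only where the locale uses those markers.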
Output
The prep command will generate a CSV file (default name: output.csv) with the following columns:
- unique_id: A unique identifier for each row (always set to 0 in the current version).
- ds: The combined and formatted date and time.
- y: The target variable (from the column specified by --y_idx).
This output format is designed to be compatible with various forecasting and analysis tools.
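The following sketch shows what a file with these three columns might look like. It writes two hypothetical measurements with the standard `csv` module; the column order and the exact rendering of values are assumptions based on the description above, not output captured from vaxstats.

```python
import csv
import io
from datetime import datetime

# Hypothetical parsed measurements: (timestamp, target value).
records = [
    (datetime(2024, 8, 4, 13, 15, 0), 37.2),
    (datetime(2024, 8, 4, 13, 16, 0), 37.4),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["unique_id", "ds", "y"])
for ts, value in records:
    # unique_id is always 0; ds uses the default --output_fmt.
    writer.writerow([0, ts.strftime("%Y-%m-%d %H:%M:%S"), value])

output_text = buffer.getvalue()
print(output_text)
```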
Tip
If the dates or times in your input data do not match the defaults (%m-%d-%y and %I:%M:%S %p), use the --input_date_fmt and --input_time_fmt options to specify the correct formats. This ensures accurate parsing of your datetime information.
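The sketch below illustrates why the format matters, using Python's `datetime.strptime` directly. The same hypothetical date string yields two different dates depending on the format, and a string that does not match the format at all raises `ValueError`. How vaxstats itself reports such a mismatch is not described here.

```python
from datetime import datetime

raw = "01-02-24"  # hypothetical date cell

as_month_first = datetime.strptime(raw, "%m-%d-%y")  # default --input_date_fmt
as_day_first = datetime.strptime(raw, "%d-%m-%y")    # day-first alternative

print(as_month_first.date())  # January 2, 2024
print(as_day_first.date())    # February 1, 2024

# A string that does not fit the format cannot be parsed.
parse_failed = False
try:
    datetime.strptime("2024/01/02", "%m-%d-%y")
except ValueError as exc:
    parse_failed = True
    print(f"parse failed: {exc}")
```

The first case is the more dangerous one because both formats parse without error, so a wrong format can silently swap day and month.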