Io
clean_df(df, col_idx=0)
Cleans a DataFrame by dropping rows with null values in the specified column.
PARAMETER | DESCRIPTION |
---|---|
df | The input DataFrame. |
col_idx | The index of the column to check for null values. Defaults to 0. |

RETURNS | DESCRIPTION |
---|---|
DataFrame | pl.DataFrame: A DataFrame with rows containing null values in the specified column removed. |

RAISES | DESCRIPTION |
---|---|
IndexError | If col_idx is out of range of the DataFrame's columns. |
Examples:
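As a rough illustration of the documented behavior, the sketch below applies the same rule to a plain list of row tuples instead of a Polars DataFrame. The helper name `drop_null_rows`, the `n_cols` argument, and the row representation are hypothetical and not part of this module. Rejecting negative indices is also an assumption, since the description above does not say how they are handled.

```python
from __future__ import annotations

from typing import Any, Sequence


def drop_null_rows(
    rows: Sequence[tuple[Any, ...]], n_cols: int, col_idx: int = 0
) -> list[tuple[Any, ...]]:
    """Hypothetical stand-in for clean_df that works on row tuples."""
    # Mirror the documented IndexError for an out-of-range column index.
    # Assumption: negative indices are treated as out of range.
    if not 0 <= col_idx < n_cols:
        raise IndexError(f"col_idx {col_idx} is out of range for {n_cols} columns")
    # Keep only the rows whose value in the checked column is not None.
    return [row for row in rows if row[col_idx] is not None]


rows = [("01-01-23", 10), (None, 20), ("01-03-23", None)]
cleaned = drop_null_rows(rows, n_cols=2, col_idx=0)
# Only the row with a null in column 0 is dropped; nulls elsewhere are kept.
```

Note that only the checked column matters: the third row survives even though its second value is null.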
cli_peak(args)
cli_prep(args)
Prepare data for analysis using command-line arguments.
This function loads a file, cleans the DataFrame, prepares it for forecasting, and saves the result to a CSV file.
PARAMETER | DESCRIPTION |
---|---|
args | Command-line arguments parsed by argparse. Expected attributes: |

RETURNS | DESCRIPTION |
---|---|
None |
load_file(file_path, file_type=None, *args, **kwargs)
Loads a file into a Polars DataFrame.
PARAMETER | DESCRIPTION |
---|---|
file_path | The path to the file. |
file_type | The type of file to load. Supported types are Excel and csv. |
*args | Additional positional arguments to pass to the Polars file reading function. |
**kwargs | Additional keyword arguments to pass to the Polars file reading function. |

RETURNS | DESCRIPTION |
---|---|
DataFrame | pl.DataFrame: The loaded DataFrame. |

RAISES | DESCRIPTION |
---|---|
TypeError | If the |
Examples:
>>> import polars as pl
>>> df = load_file("data.xlsx")
>>> df = load_file("data.csv", file_type="csv")
Notes
We use either read_excel or read_csv from polars to read files. Please refer to their respective documentation for the args or kwargs that are available.
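The examples above call load_file both with and without file_type, which suggests the type can be derived from the file name when it is omitted. The following is a minimal sketch of how such a choice between the two readers might be made. The helper `resolve_file_type`, the extension mapping, and the type names "excel" and "csv" are assumptions for illustration. Raising TypeError for an unsupported type is also assumed, because the documented condition is cut off in the table above.

```python
from pathlib import Path

# Assumed mapping from file extension to type name; not taken from the source.
EXTENSION_TO_TYPE = {".xlsx": "excel", ".xls": "excel", ".csv": "csv"}


def resolve_file_type(file_path, file_type=None):
    """Hypothetical helper: decide which reader a load_file-style call would use."""
    if file_type is not None:
        # An explicitly passed file_type takes precedence over the extension.
        resolved = file_type.lower()
    else:
        # Otherwise fall back to the file extension, compared case-insensitively.
        resolved = EXTENSION_TO_TYPE.get(Path(file_path).suffix.lower())
    if resolved not in ("excel", "csv"):
        # Assumption: unsupported types raise TypeError.
        raise TypeError(f"Unsupported file type for {file_path!r}")
    return resolved


excel_kind = resolve_file_type("data.xlsx")               # "excel"
csv_kind = resolve_file_type("data.csv", file_type="csv")  # "csv"
```

In the real function, the resolved type would then select read_excel or read_csv, with *args and **kwargs forwarded to that reader.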
prep_forecast_df(df, date_idx, time_idx, y_idx, input_date_fmt='%m-%d-%y', input_time_fmt='%I:%M:%S %p', output_fmt='%Y-%m-%d %H:%M:%S')
Prepares a DataFrame for forecasting by combining date and time columns, and formatting them.
PARAMETER | DESCRIPTION |
---|---|
df | The input DataFrame. |
date_idx | The index of the date column. |
time_idx | The index of the time column. |
y_idx | The index of the target variable column. |
input_date_fmt | The format of the input date strings. Defaults to "%m-%d-%y". |
input_time_fmt | The format of the input time strings. Defaults to "%I:%M:%S %p". |
output_fmt | The format of the output datetime strings. Defaults to "%Y-%m-%d %H:%M:%S". |

RETURNS | DESCRIPTION |
---|---|
DataFrame | A DataFrame with a combined and formatted datetime column ready for forecasting. |

RAISES | DESCRIPTION |
---|---|
IndexError | If any of date_idx, time_idx, or y_idx are out of range of the DataFrame's columns. |
ValueError | If the date and time strings do not match the specified formats. |
Notes
If date_idx and time_idx are the same, we combine input_date_fmt and input_time_fmt and load from the specified column.
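To make the format handling concrete, here is a small sketch that uses Python's standard datetime module as a stand-in for the Polars parsing done by the function. It joins the default input_date_fmt and input_time_fmt, parses one combined string, and renders it with the default output_fmt. The helper `reformat_timestamp` is hypothetical, and joining the two formats with a single space is an assumption; the actual separator used by prep_forecast_df is not stated here.

```python
from datetime import datetime

# Defaults taken from the prep_forecast_df signature.
input_date_fmt = "%m-%d-%y"
input_time_fmt = "%I:%M:%S %p"
output_fmt = "%Y-%m-%d %H:%M:%S"

# Assumption: the date and time formats are joined with a single space.
combined_fmt = f"{input_date_fmt} {input_time_fmt}"


def reformat_timestamp(raw: str) -> str:
    """Hypothetical helper: parse a combined date-time string and re-render it."""
    # strptime raises ValueError when the string does not match the format,
    # which parallels the ValueError documented for prep_forecast_df.
    parsed = datetime.strptime(raw, combined_fmt)
    return parsed.strftime(output_fmt)


formatted = reformat_timestamp("01-01-23 01:00:00 PM")
# formatted == "2023-01-01 13:00:00"
```

The 12-hour input with an AM/PM marker becomes a 24-hour timestamp, and the two-digit year is expanded to four digits.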
Examples:
>>> import polars as pl
>>> data = {'date': ["01-01-23", "01-02-23"], 'time': ["01:00:00 PM", "02:00:00 PM"], 'y': [10, 20]}
>>> df = pl.DataFrame(data)
>>> prep_forecast_df(df, date_idx=0, time_idx=1, y_idx=2)
shape: (2, 3)
┌─────────────────────┬───────┬─────────────┐
│ ds                  ┆ y     ┆ unique_id   │
│ ---                 ┆ ---   ┆ ---         │
│ str                 ┆ i64   ┆ i64         │
╞═════════════════════╪═══════╪═════════════╡
│ 2023-01-01 13:00:00 ┆ 10    ┆ 0           │
│ 2023-01-02 14:00:00 ┆ 20    ┆ 0           │
└─────────────────────┴───────┴─────────────┘