Io

clean_df(df, col_idx=0)

Cleans a DataFrame by dropping rows with null values in the specified column.

PARAMETER DESCRIPTION
df

The input DataFrame.

TYPE: DataFrame

col_idx

The index of the column to check for null values. Defaults to 0.

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
DataFrame

A DataFrame with the rows that contain null values in the specified column removed.

RAISES DESCRIPTION
IndexError

If col_idx is out of range of the DataFrame's columns.

Examples:

>>> import polars as pl
>>> data = {'a': [1, 2, None], 'b': [4, None, 6]}
>>> df = pl.DataFrame(data)
>>> clean_df(df)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ null│
└─────┴─────┘

cli_peak(args)

cli_prep(args)

Prepare data for analysis using command-line arguments.

This function loads a file, cleans the DataFrame, prepares it for forecasting, and saves the result to a CSV file.

PARAMETER DESCRIPTION
args

Command-line arguments parsed by argparse. Expected attributes:

  • file_path (str): Path to the input file.
  • date_idx (int): Index of the date column.
  • time_idx (int): Index of the time column.
  • y_idx (int): Index of the target variable column.
  • input_date_fmt (str): Format of the input date strings.
  • input_time_fmt (str): Format of the input time strings.
  • output_fmt (str): Format of the output datetime strings.
  • output (str): Name of the output CSV file.

TYPE: Namespace

RETURNS DESCRIPTION

None
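
For illustration, a Namespace with the attributes listed above could be produced by an argparse parser along these lines. The flag names, which arguments are required, and the format defaults (borrowed from the prep_forecast_df signature) are all assumptions; only the attribute names come from this page.

```python
import argparse


def build_prep_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the destination attribute names are documented."""
    parser = argparse.ArgumentParser(description="Prepare data for forecasting.")
    parser.add_argument("file_path", type=str)
    parser.add_argument("--date-idx", dest="date_idx", type=int, required=True)
    parser.add_argument("--time-idx", dest="time_idx", type=int, required=True)
    parser.add_argument("--y-idx", dest="y_idx", type=int, required=True)
    # Assumed defaults, copied from the prep_forecast_df signature.
    parser.add_argument("--input-date-fmt", dest="input_date_fmt", default="%m-%d-%y")
    parser.add_argument("--input-time-fmt", dest="input_time_fmt", default="%I:%M:%S %p")
    parser.add_argument("--output-fmt", dest="output_fmt", default="%Y-%m-%d %H:%M:%S")
    parser.add_argument("--output", dest="output", type=str, required=True)
    return parser


args = build_prep_parser().parse_args(
    ["data.csv", "--date-idx", "0", "--time-idx", "1", "--y-idx", "2",
     "--output", "prepared.csv"]
)
# args now carries every attribute cli_prep expects and could be passed to it.
```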

load_file(file_path, file_type=None, *args, **kwargs)

Loads a file into a Polars DataFrame.

PARAMETER DESCRIPTION
file_path

The path to the file.

TYPE: str

file_type

The type of file to load. Supported types are Excel and CSV (for example, file_type="csv").

TYPE: str DEFAULT: None

*args

Additional positional arguments to pass to the Polars file reading function.

TYPE: Any DEFAULT: ()

**kwargs

Additional keyword arguments to pass to the Polars file reading function.

TYPE: Any DEFAULT: {}

RETURNS DESCRIPTION
DataFrame

The loaded DataFrame.

RAISES DESCRIPTION
TypeError

If the file_type is not supported.

Examples:

>>> import polars as pl
>>> df = load_file("data.xlsx")
>>> df = load_file("data.csv", file_type="csv")

Notes

Files are read with either read_excel or read_csv from Polars. Any extra args and kwargs are passed through to whichever function is used; refer to the Polars documentation for the options each one accepts.
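
The first example calls load_file without a file_type, which suggests the type can be worked out from the file name. A minimal sketch of that kind of dispatch follows, assuming extension-based inference; the helper name, the extension table, and the "excel" label are hypothetical and may not match what load_file actually does.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping; the extensions and labels load_file really accepts are not documented here.
_EXTENSION_TO_TYPE = {".csv": "csv", ".xlsx": "excel", ".xls": "excel"}


def resolve_file_type(file_path: str, file_type: Optional[str] = None) -> str:
    """Hypothetical helper: use file_type if given, else infer it from the extension."""
    if file_type is not None:
        resolved = file_type.lower()
    else:
        resolved = _EXTENSION_TO_TYPE.get(Path(file_path).suffix.lower(), "")
    if resolved not in {"csv", "excel"}:
        # Mirrors the documented TypeError for unsupported types.
        raise TypeError(f"Unsupported file type for {file_path!r}")
    return resolved
```

The resolved value would then select pl.read_excel or pl.read_csv, with *args and **kwargs forwarded unchanged.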

prep_forecast_df(df, date_idx, time_idx, y_idx, input_date_fmt='%m-%d-%y', input_time_fmt='%I:%M:%S %p', output_fmt='%Y-%m-%d %H:%M:%S')

Prepares a DataFrame for forecasting by combining the date and time columns into a single formatted datetime column.

PARAMETER DESCRIPTION
df

The input DataFrame.

TYPE: DataFrame

date_idx

The index of the date column.

TYPE: int

time_idx

The index of the time column.

TYPE: int

y_idx

The index of the target variable column.

TYPE: int

input_date_fmt

The format of the input date strings. Defaults to "%m-%d-%y".

TYPE: str DEFAULT: '%m-%d-%y'

input_time_fmt

The format of the input time strings. Defaults to "%I:%M:%S %p".

TYPE: str DEFAULT: '%I:%M:%S %p'

output_fmt

The format of the output datetime strings. Defaults to "%Y-%m-%d %H:%M:%S".

TYPE: str DEFAULT: '%Y-%m-%d %H:%M:%S'

RETURNS DESCRIPTION
DataFrame

A DataFrame with a combined and formatted datetime column ready for forecasting.

RAISES DESCRIPTION
IndexError

If any of date_idx, time_idx, or y_idx are out of range of the DataFrame's columns.

ValueError

If the date and time strings do not match the specified formats.

Notes

If date_idx and time_idx are the same, input_date_fmt and input_time_fmt are combined into a single format and the datetime is parsed from that one column.
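
To make the format handling concrete, the sketch below uses Python's standard datetime module rather than Polars, so it only illustrates what the default format strings mean. Joining the two formats with a single space is an assumption, the helper name is hypothetical, and Polars' own parser may differ in edge cases.

```python
from datetime import datetime

# Defaults taken from the prep_forecast_df signature.
input_date_fmt = "%m-%d-%y"
input_time_fmt = "%I:%M:%S %p"
output_fmt = "%Y-%m-%d %H:%M:%S"


def combine_and_format(date_str: str, time_str: str) -> str:
    """Hypothetical helper: parse a date and a time string, return one formatted string."""
    # Assumed: the two formats are joined with a single space.
    combined_fmt = f"{input_date_fmt} {input_time_fmt}"
    # strptime raises ValueError when the text does not match the format.
    parsed = datetime.strptime(f"{date_str} {time_str}", combined_fmt)
    return parsed.strftime(output_fmt)


print(combine_and_format("01-01-23", "01:00:00 PM"))  # 2023-01-01 13:00:00
```

When both values live in one column, the same combined format would be applied to that column's strings directly.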

Examples:

>>> import polars as pl
>>> data = {'date': ["01-01-23", "01-02-23"], 'time': ["01:00:00 PM", "02:00:00 PM"], 'y': [10, 20]}
>>> df = pl.DataFrame(data)
>>> prep_forecast_df(df, date_idx=0, time_idx=1, y_idx=2)
shape: (2, 3)
┌─────────────────────┬───────┬─────────────┐
│ ds                  ┆ y     ┆ unique_id   │
│ ---                 ┆ ---   ┆ ---         │
│ str                 ┆ i64   ┆ i64         │
╞═════════════════════╪═══════╪═════════════╡
│ 2023-01-01 13:00:00 ┆ 10    ┆ 0           │
│ 2023-01-02 14:00:00 ┆ 20    ┆ 0           │
└─────────────────────┴───────┴─────────────┘