
Utils

datetime_to_float(df, time_unit='days', date_column='ds', date_fmt='%Y-%m-%d %H:%M:%S')

get_baseline_df(df, date_column='ds', date_fmt='%Y-%m-%d %H:%M:%S', baseline=None)

Get baseline DataFrame if baseline is provided.

PARAMETER DESCRIPTION
df

The input DataFrame.

TYPE: DataFrame

date_column

The name of the date column in the DataFrame.

TYPE: str DEFAULT: 'ds'

date_fmt

The date format string for parsing the date column.

TYPE: str DEFAULT: '%Y-%m-%d %H:%M:%S'

baseline

The number of hours to use as baseline. If None, returns the original DataFrame.

TYPE: float | int | None DEFAULT: None

RETURNS DESCRIPTION
DataFrame

The baseline DataFrame or the original DataFrame if baseline is None.
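The reference does not say how the baseline window is anchored, so the following is only a minimal sketch of one plausible reading: keep the rows that fall within the first `baseline` hours, counted from the earliest timestamp, with an exclusive upper bound. It works on plain lists of dicts rather than a DataFrame, and `baseline_rows` is a hypothetical name, not part of the library.

```python
from datetime import datetime, timedelta

DATE_FMT = "%Y-%m-%d %H:%M:%S"  # same default as date_fmt above


def baseline_rows(rows, date_column="ds", date_fmt=DATE_FMT, baseline=None):
    """Hypothetical stand-in for get_baseline_df on a list of dict rows."""
    if baseline is None or not rows:
        # Documented behaviour: no baseline means the input comes back unchanged.
        return rows
    stamps = [datetime.strptime(r[date_column], date_fmt) for r in rows]
    # Assumption: the window starts at the earliest timestamp and its
    # upper bound is exclusive.
    cutoff = min(stamps) + timedelta(hours=baseline)
    return [r for r, t in zip(rows, stamps) if t < cutoff]


rows = [
    {"ds": "2023-01-01 00:00:00", "value": 10},
    {"ds": "2023-01-01 12:00:00", "value": 20},
    {"ds": "2023-01-02 06:00:00", "value": 30},
]
first_day = baseline_rows(rows, baseline=24)  # keeps the first two rows
```

Under this assumption a row stamped exactly 24 hours after the first one would be excluded; the real function may treat that boundary differently.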

split_df(df, hours=72.0, date_column='ds', *args, **kwargs)

Splits a DataFrame into multiple sets based on time window(s).

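This function takes a Polars DataFrame with a timestamp column (named by date_column, 'ds' by default), parses the timestamps, and splits the DataFrame into \(n\) parts, where \(n\) is one more than the number of durations given in hours.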

PARAMETER DESCRIPTION
df

The input DataFrame containing a timestamp column (named 'ds' by default).

TYPE: DataFrame

hours

Time duration(s) in hours, each an exclusive upper bound, that define the first \(n - 1\) splits; the final split holds all remaining rows. For example, a value of 24 produces two splits with the following rows:

  • Up to, but not including, 24 hours,
  • Any rows at and after 24 hours.

TYPE: float | int | Collection[float | int] DEFAULT: 72.0

date_column

The name of the column containing date information.

TYPE: str DEFAULT: 'ds'

*args

Additional positional arguments.

TYPE: tuple[Any, ...] DEFAULT: ()

**kwargs

Additional keyword arguments.

TYPE: dict[str, Any] DEFAULT: {}

RETURNS DESCRIPTION
tuple[DataFrame, ...]

A tuple containing DataFrame splits.

RAISES DESCRIPTION
ValueError

If the date column ('ds' by default) is not found in the DataFrame.

TypeError

If 'hours' is not a float, an int, or a collection of floats or ints.

Examples:

>>> import polars as pl
>>> data = {
...     'ds': ["2023-01-01 00:00:00", "2023-01-02 00:00:00", "2023-01-03 00:00:00",
...            "2023-01-04 00:00:00", "2023-01-05 00:00:00"],
...     'value': [10, 20, 30, 40, 50]
... }
>>> df = pl.DataFrame(data)
>>> splits = split_df(df, hours=[24, 48])
>>> len(splits)
3
>>> splits[0]
shape: (1, 2)
┌─────────────────────┬───────┐
│ ds                  ┆ value │
│ ---                 ┆ ---   │
│ datetime[ms]        ┆ i64   │
╞═════════════════════╪═══════╡
│ 2023-01-01 00:00:00 ┆ 10    │
└─────────────────────┴───────┘
>>> splits[1]
shape: (2, 2)
┌─────────────────────┬───────┐
│ ds                  ┆ value │
│ ---                 ┆ ---   │
│ datetime[ms]        ┆ i64   │
╞═════════════════════╪═══════╡
│ 2023-01-02 00:00:00 ┆ 20    │
│ 2023-01-03 00:00:00 ┆ 30    │
└─────────────────────┴───────┘
>>> splits[2]
shape: (2, 2)
┌─────────────────────┬───────┐
│ ds                  ┆ value │
│ ---                 ┆ ---   │
│ datetime[ms]        ┆ i64   │
╞═════════════════════╪═══════╡
│ 2023-01-04 00:00:00 ┆ 40    │
│ 2023-01-05 00:00:00 ┆ 50    │
└─────────────────────┴───────┘
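The row counts in the example above (1, 2, 2) are consistent with treating each entry of hours as the length of a consecutive window, so that [24, 48] places boundaries 24 and 72 hours after the first timestamp. That reading is an assumption, not something this reference states. The sketch below illustrates it on plain lists of dicts; `split_rows` is a hypothetical name and does not reproduce the library's implementation.

```python
from datetime import datetime, timedelta

DATE_FMT = "%Y-%m-%d %H:%M:%S"


def split_rows(rows, hours=72.0, date_column="ds"):
    """Hypothetical sketch of split_df on a list of dict rows."""
    if isinstance(hours, (int, float)):
        hours = [hours]
    stamps = [datetime.strptime(r[date_column], DATE_FMT) for r in rows]
    start = min(stamps)
    # Assumption: durations are consecutive windows, so [24, 48] yields
    # boundaries at start + 24h and start + 72h.
    bounds, elapsed = [], 0.0
    for h in hours:
        elapsed += h
        bounds.append(start + timedelta(hours=elapsed))
    splits = [[] for _ in range(len(bounds) + 1)]
    for row, t in zip(rows, stamps):
        # Count the boundaries at or before t; upper bounds are exclusive,
        # so a row exactly on a boundary moves to the next split.
        splits[sum(t >= b for b in bounds)].append(row)
    return tuple(splits)


rows = [
    {"ds": f"2023-01-0{day} 00:00:00", "value": day * 10} for day in range(1, 6)
]
sizes = [len(part) for part in split_rows(rows, hours=[24, 48])]
```

With a single value such as hours=24, the same sketch returns two parts: the rows before the 24-hour mark, then every row at or after it.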

str_to_datetime(df, date_column='ds', date_fmt='%Y-%m-%d %H:%M:%S')

Converts DataFrame datetime column strings to datetimes.
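As a rough illustration of what the default date_fmt matches, the sketch below parses a list of strings with the standard library. It stands in for the column conversion only; `parse_column` is a hypothetical helper, and the real function operates on a DataFrame column.

```python
from datetime import datetime


def parse_column(values, date_fmt="%Y-%m-%d %H:%M:%S"):
    """Parse each string with date_fmt; raises ValueError on a mismatch."""
    return [datetime.strptime(v, date_fmt) for v in values]


parsed = parse_column(["2023-01-01 00:00:00", "2023-01-02 12:30:00"])
```

A string that lacks the time part, such as "2023-01-01", does not match the default format and would need a different date_fmt (for example '%Y-%m-%d').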