Data loading¶
In plotszoo, data is organized in a plotszoo.data.DataCollection.
A plotszoo.data.DataCollection collects two data types:
- scalars: organized in a pandas DataFrame
- series: organized in a Python dict whose keys are the indices of the scalars and whose values are time series DataFrames
Classes that pull data from common services are also provided, such as plotszoo.data.WandbData.
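The layout described above can be pictured with plain pandas objects. This is only a sketch of the two containers, not plotszoo code; the run identifiers and column names are made up for illustration.

```python
import pandas as pd

# Scalars: one row per run, indexed by a (hypothetical) run identifier.
scalars = pd.DataFrame(
    {"learning_rate": [0.1, 0.01], "final_reward": [12.5, 30.0]},
    index=["run_a", "run_b"],
)

# Series: one time series DataFrame per index of the scalars.
series = {
    "run_a": pd.DataFrame({"reward": [1.0, 5.0, 12.5]}),
    "run_b": pd.DataFrame({"reward": [2.0, 30.0]}),
}

# Every index of the scalars has a matching key in the series dict.
keys_match = set(series) == set(scalars.index)
```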
-
class
plotszoo.data.DataCollection¶ Base class for data collection
- Attributes:
- scalars
DataFrame containing the scalars
- series
dict containing the time series
-
align_series(to='longest', **kwargs)¶ Align series to the longest or shortest one.
Resets all the series indices, chooses the new index according to the strategy, and
reindexes all the series.
- Args:
- to
Alignment strategy, one of longest or shortest (Default: longest)
- **kwargs
Keyword arguments for pandas reindex
- Example:
>>> data.align_series(to="longest", method="nearest")
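The three steps in the description (reset, choose an index, reindex) can be sketched with pandas alone. This is an approximation under the assumption that "longest" means the series with the most rows; it is not the library's actual implementation, and the run names are hypothetical.

```python
import pandas as pd

series = {
    "run_a": pd.DataFrame({"reward": [1.0, 2.0, 3.0, 4.0]}),
    "run_b": pd.DataFrame({"reward": [10.0, 20.0]}),
}

# Step 1: reset every index so all series count rows from 0.
series = {k: s.reset_index(drop=True) for k, s in series.items()}

# Step 2: "longest" strategy, assumed here to mean the series with most rows.
target_index = max(series.values(), key=len).index

# Step 3: reindex every series; method="nearest" is forwarded to pandas
# reindex and fills the new rows from the closest existing row.
aligned = {
    k: s.reindex(index=target_index, method="nearest")
    for k, s in series.items()
}
```

With method="nearest", the shorter series is padded by repeating its closest existing row; without a fill method, the new rows would hold NaN.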
-
are_series_aligned()¶ Returns True if all series share the same indices
-
astype(columns, type=<class 'float'>)¶ Change the type of the scalars columns by calling pandas astype.
- Args:
- columns
List of columns to convert
- type
Type to cast the columns to (Default: float)
-
create_categorical(column, new_column)¶ Creates a new categorical column (0, 1, 2, …) from a textual one (“sin”, “cos”, “tan”, …)
- Args:
- column
Column to use as input
- new_column
Name of the new categorical column
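A textual-to-integer mapping of this kind can be sketched with pd.factorize. Whether plotszoo assigns codes in order of first appearance is an assumption here, and the column names are hypothetical.

```python
import pandas as pd

scalars = pd.DataFrame({"activation": ["sin", "cos", "tan", "sin"]})

# pd.factorize assigns integer codes in order of first appearance
# (an assumption about the ordering, not confirmed for plotszoo).
codes, uniques = pd.factorize(scalars["activation"])
scalars["activation_id"] = codes
```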
-
create_scalar_from_series(scalar_name, agg_fn)¶ Create a new column of scalars using the corresponding time series
- Args:
- scalar_name
The name of the new scalar
- agg_fn
Function called on the corresponding time series to create the scalar
Example:
>>> data.create_scalar_from_series("start_time", lambda s: s["timestamp"].min())
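What this aggregation amounts to can be sketched in plain pandas: apply agg_fn to each time series and store the result under the matching scalars index. The data below is invented, and the real method may differ in details.

```python
import pandas as pd

scalars = pd.DataFrame({"lr": [0.1, 0.01]}, index=["run_a", "run_b"])
series = {
    "run_a": pd.DataFrame({"timestamp": [105, 100, 110]}),
    "run_b": pd.DataFrame({"timestamp": [200, 205]}),
}

def agg_fn(s):
    # Same aggregation as in the example above: earliest timestamp.
    return s["timestamp"].min()

# One aggregated value per run, aligned on the scalars index.
scalars["start_time"] = pd.Series(
    {key: agg_fn(series[key]) for key in scalars.index}
)
```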
-
dropna(columns)¶ Discard rows with NaN values from the scalars and also discard the corresponding time series.
- Args:
- columns
Columns to check for NaN
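The coupling between scalars and series can be sketched as follows: rows with NaN are removed from the scalars, and only the series whose key survives are kept. This is an illustrative pandas equivalent with hypothetical data, not the library's code.

```python
import pandas as pd

scalars = pd.DataFrame(
    {"final_reward": [12.5, float("nan"), 30.0]},
    index=["run_a", "run_b", "run_c"],
)
series = {key: pd.DataFrame({"reward": [0.0, 1.0]}) for key in scalars.index}

# Drop the scalars rows that have NaN in the checked columns.
scalars = scalars.dropna(subset=["final_reward"])

# Keep only the time series whose key is still in the scalars index.
series = {key: series[key] for key in scalars.index}
```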
-
dropna_series(columns)¶ Drop all rows where the given columns are NaN in the series. This will probably unalign the series.
- Args:
- columns
Columns to check for NaN
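Why dropping rows tends to unalign the series can be seen in a small pandas sketch: each series loses different rows, so the indices no longer match. The data and the alignment check below are illustrative assumptions.

```python
import pandas as pd

series = {
    "run_a": pd.DataFrame({"reward": [1.0, float("nan"), 3.0]}),
    "run_b": pd.DataFrame({"reward": [1.0, 2.0, 3.0]}),
}

# Drop the NaN rows independently in every series.
series = {k: s.dropna(subset=["reward"]) for k, s in series.items()}

# The series are aligned only if they all share the same index.
indices = [s.index for s in series.values()]
aligned = all(idx.equals(indices[0]) for idx in indices)
```

Here run_a keeps rows 0 and 2 while run_b keeps rows 0, 1 and 2, so the check fails and a call to align_series would be needed before plotting the series together.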
-
fillna(column, value=0)¶ Substitute NaN values in the scalars with a new one.
- Args:
- column
Column to check for NaN
- value
Value to use (Default: 0)
-
fillna_series(column, value=0)¶ Substitute NaN values in the series with a new one.
- Args:
- column
Series column to fill
- value
Value to use (Default: 0)
-
is_both()¶ Returns True if the DataCollection contains both scalars and time series
-
is_empty()¶ Returns True if the DataCollection is empty
-
is_scalars()¶ Returns True if the DataCollection contains scalars
-
is_series()¶ Returns True if the DataCollection contains time series
-
rolling_series(column, new_column, fn='mean', **kwargs)¶ Apply a pandas rolling function to all the series.
- Args:
- column
Series column to apply the rolling function to
- new_column
Series column in which to store the rolling function results
- fn
pandas rolling function; for example, "mean" means series[column].rolling().mean()
- **kwargs
Keyword arguments for the pandas rolling function
- Example:
>>> data.rolling_series("reward", "mean_reward", window=20, fn="mean")
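The fn lookup described above can be sketched with getattr on a pandas rolling object. This assumes the keyword arguments (such as window) are forwarded to rolling(), which the documentation does not state explicitly; the data is hypothetical.

```python
import pandas as pd

series = {"run_a": pd.DataFrame({"reward": [1.0, 3.0, 5.0, 7.0]})}
column, new_column, fn = "reward", "mean_reward", "mean"
kwargs = {"window": 2}  # assumed to be passed to pandas rolling()

for s in series.values():
    rolling = s[column].rolling(**kwargs)
    # fn="mean" resolves to rolling.mean(), fn="max" to rolling.max(), etc.
    s[new_column] = getattr(rolling, fn)()
```

The first window - 1 rows of the new column are NaN, because pandas needs a full window before it produces a value (unless min_periods is set).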
-
set_scalars(data)¶ Set the scalars
- Args:
- data
The DataFrame containing the scalars
-
set_series(series)¶ Set the series
- Args:
- series
The dict containing the time series.
series must be set after scalars, and the series dict must have a key for each index of the scalars.
-
class
plotszoo.data.WandbData(entity, project, query, cache=True, cache_dir='./.plotszoo-wandb-cache', verbose=True)¶ Retrieve scalars and time series from wandb.
- Args:
- entity
wandb entity (username or team name)
- project
wandb project
- query
MongoDB query for wandb (see the wandb documentation)
- cache
Cache retrieved data (Default: True)
- cache_dir
Directory to cache the data to (Default: ./.plotszoo-wandb-cache)
- verbose
Be verbose about pulling and caching (Default: True)
-
pull_scalars(state='finished', force_update=False)¶ Pull scalars from wandb.
- Args:
- state
Filter the runs by their state; None disables the filter (Default: "finished")
- force_update
Force cache update (Default: False)
-
pull_series(scan_history=True, force_update=False)¶ Pull series from wandb.
- Args:
- scan_history
Use wandb.Api.run.scan_history to pull the full history (Default: True)
- force_update
Force cache update (Default: False)
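A typical call sequence, using only the signatures documented above, might look like the sketch below. The entity, project, and config key are hypothetical, and the helper load_wandb_data is introduced only for illustration. Calling it requires plotszoo, wandb credentials, and network access, so it is defined but not invoked here.

```python
# Hypothetical identifiers, used only for illustration.
ENTITY = "my-team"
PROJECT = "my-project"

# MongoDB-style query; "config.experiment_name" is a made-up config key.
query = {
    "$or": [
        {"config.experiment_name": "foo"},
        {"config.experiment_name": "bar"},
    ]
}

def load_wandb_data(entity, project, query):
    """Illustrative helper: pull scalars and series for the matching runs."""
    # Imported lazily so this sketch can be read without plotszoo installed.
    from plotszoo.data import WandbData

    data = WandbData(entity=entity, project=project, query=query)
    data.pull_scalars(state="finished")
    data.pull_series(scan_history=True)
    return data
```

Passing force_update=True to either pull method should bypass the cache described above when the remote runs have changed.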
-
class
plotszoo.data.OptunaData(storage, study_name, cache=True, cache_dir='./.plotszoo-optuna-cache', verbose=True)¶ Retrieve scalars and time series from an optuna storage.
- Args:
- storage
optuna storage (example: sqlite:///example.db)
- study_name
optuna study name
- cache
Cache retrieved data (Default: True)
- cache_dir
Directory to cache the data to (Default: ./.plotszoo-optuna-cache)
- verbose
Be verbose about pulling and caching (Default: True)
-
pull_scalars(force_update=False)¶ Pull scalars from the optuna storage.
- Args:
- force_update
Force cache update (Default: False)