Data loading
In plotszoo, data is organized in a plotszoo.data.DataCollection.
A plotszoo.data.DataCollection collects two data types:
- scalars: organized in a pandas DataFrame
- series: organized in a Python dict whose keys are the indices of the scalars and whose values are time series DataFrames
Classes that pull data from common services are also provided, such as plotszoo.data.WandbData.
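The two data types described above can be pictured with plain pandas objects. This is only a sketch of the layout: the run ids and column names are made up for illustration, and nothing here calls plotszoo itself.

```python
import pandas as pd

# Scalars: one row per run, indexed by a (hypothetical) run id.
scalars = pd.DataFrame(
    {"learning_rate": [0.1, 0.01], "final_reward": [10.0, 12.5]},
    index=["run-a", "run-b"],
)

# Series: one time series DataFrame per scalars index.
series = {
    "run-a": pd.DataFrame({"reward": [1.0, 5.0, 10.0]}),
    "run-b": pd.DataFrame({"reward": [2.0, 12.5]}),
}

# Every scalars index has a matching key in the series dict.
keys_match = set(series.keys()) == set(scalars.index)
```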
- 
class plotszoo.data.DataCollection
Base class for data collection
- Attributes:
 - scalars
 DataFrame containing the scalars
 - series
 dict containing the time series
- 
align_series(to='longest', **kwargs)
Align series to the longest or shortest one.
Resets all the series indices, chooses the new index according to the strategy, and reindexes all the series.
- Args:
 - to
 Alignment strategy (one of longest or shortest) (Default: longest)
 - **kwargs
 Keyword arguments for pandas reindex
- Example:
 >>> data.align_series(to="longest", method="nearest")
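The three steps named in the description (reset, choose a target index, reindex) can be sketched in plain pandas. The helper `align` below is hypothetical and is not the library's implementation; it only assumes that the extra keyword arguments are forwarded to `DataFrame.reindex`.

```python
import pandas as pd

series = {
    "run-a": pd.DataFrame({"reward": [1.0, 2.0, 3.0, 4.0]}, index=[10, 20, 30, 40]),
    "run-b": pd.DataFrame({"reward": [5.0, 6.0]}, index=[10, 20]),
}

def align(series, to="longest", **kwargs):
    # Reset every index to 0..n-1 so the series become comparable.
    reset = {k: s.reset_index(drop=True) for k, s in series.items()}
    # Use the index of the longest (or shortest) series as the target.
    pick = max if to == "longest" else min
    target = pick((s.index for s in reset.values()), key=len)
    # Reindex every series onto the target index.
    return {k: s.reindex(index=target, **kwargs) for k, s in reset.items()}

aligned = align(series, to="longest", method="nearest")
```

With `method="nearest"`, the rows added to the shorter series are filled from its closest existing row instead of being left as NaN.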
- 
are_series_aligned()
Returns True if all series share the same indices
- 
astype(columns, type=<class 'float'>)
Change the type of the scalars columns by calling pandas astype
- Args:
 - columns
 List of columns to convert
 - type
 Type to cast the columns to (Default: float)
- 
create_categorical(column, new_column)
Creates a new categorical column (0, 1, 2, …) from a textual one ("sin", "cos", "tan", …)
- Args:
 - column
 Column to use as input
 - new_column
 Name of the new categorical column
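One way to derive such integer codes in plain pandas is `pd.factorize`, shown below as a sketch. Whether plotszoo assigns codes in the same order is an assumption; the column names are made up.

```python
import pandas as pd

scalars = pd.DataFrame({"activation": ["sin", "cos", "sin", "tan"]})

# factorize numbers the distinct labels in order of first appearance.
codes, labels = pd.factorize(scalars["activation"])
scalars["activation_id"] = codes
```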
- 
create_scalar_from_series(scalar_name, agg_fn)
Create a new column of scalars using the corresponding time series
- Args:
 - scalar_name
 The name of the new scalar
 - agg_fn
 Function called on each corresponding time series to create the scalar
- Example:
 >>> data.create_scalar_from_series("start_time", lambda s: s["timestamp"].min())
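The effect of the example above can be sketched with plain pandas: the aggregation function receives one time series DataFrame per scalars index and returns a single value. The data below is invented for illustration, and the loop is not the library's code.

```python
import pandas as pd

scalars = pd.DataFrame({"lr": [0.1, 0.01]}, index=["run-a", "run-b"])
series = {
    "run-a": pd.DataFrame({"timestamp": [100, 110, 120]}),
    "run-b": pd.DataFrame({"timestamp": [205, 200]}),
}

def agg_fn(s):
    # Reduce one time series DataFrame to a single scalar.
    return s["timestamp"].min()

# Apply agg_fn to the series that belongs to each scalars index.
scalars["start_time"] = [agg_fn(series[idx]) for idx in scalars.index]
```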
- 
dropna(columns)
Discard NaN values from the scalars and also discard the corresponding time series
- Args:
 - columns
 Columns to check for NaN
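A sketch of the behaviour described for dropna, using plain pandas on made-up data: rows with NaN in the checked columns are removed from the scalars, and the series dict is filtered to the surviving indices. This illustrates the idea only and is not taken from the library.

```python
import pandas as pd

scalars = pd.DataFrame({"score": [1.0, float("nan"), 3.0]}, index=["a", "b", "c"])
series = {k: pd.DataFrame({"reward": [0.0, 1.0]}) for k in scalars.index}

# Drop scalars rows with NaN in the checked columns.
kept = scalars.dropna(subset=["score"])
# Keep only the time series whose key survived.
kept_series = {k: s for k, s in series.items() if k in kept.index}
```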
- 
dropna_series(columns)
Drop all the rows of the series where a column is NaN. Will probably unalign the series
- Args:
 - columns
 Columns to check for NaN
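Why dropping rows can unalign the series is easy to see in a small pandas sketch (invented data, not library code): after the drop, the two series no longer share the same index.

```python
import pandas as pd

series = {
    "a": pd.DataFrame({"reward": [1.0, None, 3.0]}),
    "b": pd.DataFrame({"reward": [1.0, 2.0, 3.0]}),
}

# Drop rows with NaN in the checked column, series by series.
dropped = {k: s.dropna(subset=["reward"]) for k, s in series.items()}

# Series "a" lost its middle row, so the indices now differ.
still_aligned = all(s.index.equals(dropped["a"].index) for s in dropped.values())
```

In this situation a call to align_series would be needed before using anything that expects a shared index.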
- 
fillna(column, value=0)
Substitute NaN values in the scalars
- Args:
 - column
 Column to check for NaN
 - value
 Value to use (Default: 0)
- 
fillna_series(column, value=0)
Substitute NaN values in the series with a new value
- Args:
 - column
 Series column to fill
 - value
 Value to use (Default: 0)
- 
is_both()
Returns True if the DataCollection contains both scalars and time series
- 
is_empty()
Returns True if the DataCollection is empty
- 
is_scalars()
Returns True if the DataCollection contains scalars
- 
is_series()
Returns True if the DataCollection contains time series
- 
rolling_series(column, new_column, fn='mean', **kwargs)
Apply a pandas rolling function to all the series
- Args:
 - column
 Series column to apply the rolling function to
 - new_column
 Series column in which to store the rolling function results
 - fn
 pandas rolling function; for example, "mean" means series[column].rolling().mean()
 - **kwargs
 Keyword arguments for the pandas rolling function
- Example:
 >>> data.rolling_series("reward", "mean_reward", window=20, fn="mean")
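The description of fn suggests a name-based lookup on the pandas rolling object. The hypothetical helper below sketches that idea in plain pandas; it assumes the keyword arguments (such as window) go to `rolling()`, which matches the example above but is not confirmed against the library source.

```python
import pandas as pd

series = {"run-a": pd.DataFrame({"reward": [1.0, 3.0, 5.0, 7.0]})}

def rolling(series, column, new_column, fn="mean", **kwargs):
    for s in series.values():
        # Look up the rolling method by name, e.g. "mean" -> .rolling(...).mean()
        s[new_column] = getattr(s[column].rolling(**kwargs), fn)()

rolling(series, "reward", "mean_reward", window=2, fn="mean")
```

The first window is incomplete, so its result is NaN unless `min_periods` is passed.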
- 
set_scalars(data)
Set the scalars
- Args:
 - data
 The DataFrame containing the scalars
- 
set_series(series)
Set the series
- Args:
 - series
 The dict containing the time series
series must be set after scalars; the series dict must have a key for each index of the scalars
- 
class plotszoo.data.WandbData(entity, project, query, cache=True, cache_dir='./.plotszoo-wandb-cache', verbose=True)
Retrieve scalars and time series from wandb.
- Args:
 - entity
 wandb entity (username or team name)
 - project
 wandb project
 - query
 MongoDB query for wandb (check here.)
 - cache
 Cache retrieved data (Default: True)
 - cache_dir
 Directory to cache the data to (Default: ./.plotszoo-wandb-cache)
 - verbose
 Be verbose about pulling and caching (Default: True)
- 
pull_scalars(state='finished', force_update=False)
Pull scalars from wandb
- Args:
 - state
 Filter the runs using their state, None to disable (Default: "finished")
 - force_update
 Force cache update (Default: False)
- 
pull_series(scan_history=True, force_update=False)
Pull series from wandb
- Args:
 - scan_history
 Use wandb.Api.run.scan_history to pull the full history (Default: True)
 - force_update
 Force cache update (Default: False)
- 
class plotszoo.data.OptunaData(storage, study_name, cache=True, cache_dir='./.plotszoo-optuna-cache', verbose=True)
Retrieve scalars and time series from an optuna storage.
- Args:
 - storage
 optuna storage (example: sqlite:///example.db)
 - study_name
 optuna study name
 - cache
 Cache retrieved data (Default: True)
 - cache_dir
 Directory to cache the data to (Default: ./.plotszoo-optuna-cache)
 - verbose
 Be verbose about pulling and caching (Default: True)
- 
pull_scalars(force_update=False)
Pull scalars from the optuna storage
- Args:
 - force_update
 Force cache update (Default: False)