renate.benchmark.datasets.wild_time_data module
- class renate.benchmark.datasets.wild_time_data.WildTimeDataModule(data_path, dataset_name, src_bucket=None, src_object_name=None, time_step=0, tokenizer=None, tokenizer_kwargs=None, val_size=0.0, seed=0)
Bases: DataIncrementalDataModule
Data module wrapping the Wild-Time datasets.
Huaxiu Yao, Caroline Choi, Bochuan Cao, Yoonho Lee, Pang Wei Koh, Chelsea Finn: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. NeurIPS 2022.
- Parameters:
  - data_path (Union[Path, str]) – The path to the folder containing the dataset files.
  - dataset_name (str) – Name of the Wild-Time dataset.
  - src_bucket (Optional[str]) – The name of the S3 bucket. If not provided, the data is downloaded from the original source.
  - src_object_name (Optional[str]) – The folder path in the S3 bucket.
  - time_step (int) – Time slice to be loaded.
  - tokenizer (Optional[PreTrainedTokenizer]) – Tokenizer to apply to the dataset. See https://huggingface.co/docs/tokenizers/ for more information on tokenizers.
  - tokenizer_kwargs (Optional[Dict[str, Any]]) – Keyword arguments passed when calling the tokenizer’s __call__ function.
  - val_size (float) – Fraction of the training data to be used for validation.
  - seed (int) – Seed used to fix random number generation.
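
The following is a minimal, standard-library sketch of what the val_size, seed, tokenizer, and tokenizer_kwargs parameters describe. It is not Renate's implementation: split_train_val, apply_tokenizer, and toy_tokenizer are hypothetical helpers introduced only for illustration, and the way Renate actually shuffles or rounds the split may differ.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def split_train_val(
    items: Sequence[Any], val_size: float = 0.0, seed: int = 0
) -> Tuple[List[Any], List[Any]]:
    """Hold out a `val_size` fraction of `items`, shuffled with a fixed seed.

    Hypothetical helper: illustrates the parameter semantics only.
    """
    if not 0.0 <= val_size < 1.0:
        raise ValueError("val_size must be in [0, 1)")
    indices = list(range(len(items)))
    # A local RNG keeps the split reproducible: same seed, same split.
    random.Random(seed).shuffle(indices)
    num_val = int(len(items) * val_size)
    val = [items[i] for i in indices[:num_val]]
    train = [items[i] for i in indices[num_val:]]
    return train, val


def apply_tokenizer(
    texts: List[str],
    tokenizer: Optional[Callable[..., Any]] = None,
    tokenizer_kwargs: Optional[Dict[str, Any]] = None,
) -> List[Any]:
    """Call `tokenizer(text, **tokenizer_kwargs)` per text, if a tokenizer is given."""
    if tokenizer is None:
        return list(texts)
    kwargs = tokenizer_kwargs or {}
    return [tokenizer(text, **kwargs) for text in texts]


def toy_tokenizer(text: str, max_length: Optional[int] = None) -> List[str]:
    """Hypothetical stand-in for a PreTrainedTokenizer: whitespace split."""
    tokens = text.split()
    return tokens if max_length is None else tokens[:max_length]


if __name__ == "__main__":
    train, val = split_train_val(list(range(10)), val_size=0.2, seed=0)
    print(len(train), len(val))
    print(apply_tokenizer(["wild time data"], toy_tokenizer, {"max_length": 2}))
```

With the real class, the same values would be supplied as the val_size, seed, tokenizer, and tokenizer_kwargs constructor arguments listed above. Leaving tokenizer as None corresponds to the default in the signature.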