Renate Benchmarks#
Renate features a variety of models,
datasets and scenarios.
This allows for evaluating the different Renate updaters on many standard benchmarks.
In the following, we describe how to combine the different components into your own benchmark.
Independent of the benchmark, every experiment is started via execute_experiment_job().
The function call below demonstrates the simplest setup, in which the benchmark is run locally.
The benchmark is configured via config_space.
from renate.benchmark.experimentation import execute_experiment_job, experiment_config_file

execute_experiment_job(
    backend="local",
    config_file=experiment_config_file(),
    config_space=config_space,
    experiment_outputs_url="results/",
    mode="max",
    metric="val_accuracy",
    num_updates=5,
)
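The call above assumes that config_space already exists. It is a plain dictionary, so one way to assemble it is to keep the model, dataset, and scenario settings in separate dictionaries and merge them. The following is a minimal sketch of that idea; build_config_space is a hypothetical helper and not part of Renate, and the keys and values are the ones used elsewhere on this page.

```python
def build_config_space(*components):
    """Merge several settings dictionaries into a single config_space.

    Hypothetical helper, not part of Renate. It raises an error if two
    components define the same key with different values, which helps
    catch accidental overrides.
    """
    config_space = {}
    for component in components:
        for key, value in component.items():
            if key in config_space and config_space[key] != value:
                raise ValueError(f"Conflicting values for {key!r}")
            config_space[key] = value
    return config_space


# Settings taken from the examples on this page.
model = {"model_name": "ResNet18", "num_outputs": 10}
dataset = {"dataset_name": "CIFAR10", "val_size": 0.1}
scenario = {
    "scenario_name": "ClassIncrementalScenario",
    "groupings": ((1, 2), (3, 4)),
}

config_space = build_config_space(model, dataset, scenario)
```

A complete run also needs updater and optimizer settings, such as those in the CIFAR-10 example at the end of this page.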
Models#
You can select the model by assigning the corresponding name to config_space["model_name"].
For example, to use a ResNet-18 model, set
config_space["model_name"] = "ResNet18"
Each model may have its own arguments. The ResNet-18 model requires the number of outputs to be defined.
config_space["num_outputs"] = 10
The full list of models and model names including a short description is provided in the following table.
| Model Name | Description | Additional Inputs |
|---|---|---|
| | Neural network consisting of a sequence of dense layers. | |
| ResNet18 | 18-layer ResNet CNN architecture | num_outputs |
| | 34-layer ResNet CNN architecture | |
| | 50-layer ResNet CNN architecture | |
| ResNet18CIFAR | 18-layer ResNet CNN architecture for small image sizes (approx. 32x32) | |
| | 34-layer ResNet CNN architecture for small image sizes (approx. 32x32) | |
| | 50-layer ResNet CNN architecture for small image sizes (approx. 32x32) | |
| | Base Vision Transformer architecture for images of size 32x32 with patch size 4. | |
| | Base Vision Transformer architecture for images of size 224x224 with patch size 16. | |
| | Base Vision Transformer architecture for images of size 224x224 with patch size 32. | |
| | Large Vision Transformer architecture for images of size 224x224 with patch size 16. | |
| | Large Vision Transformer architecture for images of size 224x224 with patch size 32. | |
| | Huge Vision Transformer architecture for images of size 224x224 with patch size 14. | |
| | Wrapper around Hugging Face transformers. | |
| | Learning to Prompt Transformer. Supports both text and vision transformers. | |
Datasets#
Similarly, you select the dataset by assigning the corresponding name to config_space["dataset_name"].
For example, to use the CIFAR-10 dataset with 10% of the data held out for validation, set
config_space["dataset_name"] = "CIFAR10"
config_space["val_size"] = 0.1
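To make the effect of val_size concrete, the sketch below computes the resulting split sizes. It assumes that the validation data is taken from the training set and that the count is rounded to the nearest integer; neither detail is stated on this page. split_sizes is a hypothetical helper, not a Renate function.

```python
def split_sizes(num_train, val_size):
    """Return (train_count, val_count) for a given validation fraction.

    Hypothetical helper. Assumes the validation set is carved out of the
    training set and that the count is rounded to the nearest integer.
    """
    if not 0.0 <= val_size < 1.0:
        raise ValueError("val_size must be in [0, 1)")
    val_count = round(num_train * val_size)
    return num_train - val_count, val_count


# CIFAR-10 has 50k training images (see the dataset table).
train_count, val_count = split_sizes(50_000, 0.1)
```

Under these assumptions, 45,000 images remain for training and 5,000 are used for validation.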
The following table contains the list of supported datasets.
| Dataset Name | Task | Data Summary | Reference |
|---|---|---|---|
| arxiv | Text Classification: category recognition of arXiv papers. | ~1.9M train, ~206k test, 172 classes, years 2007-2023 | Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022. |
| CIFAR10 | Image Classification | 50k train, 10k test, 10 classes, image shape 32x32x3 | Alex Krizhevsky: Learning Multiple Layers of Features from Tiny Images. 2009. |
| CIFAR100 | Image Classification | 50k train, 10k test, 100 classes, image shape 32x32x3 | Alex Krizhevsky: Learning Multiple Layers of Features from Tiny Images. 2009. |
| CLEAR10 | Image Classification | 10 different datasets, one for each year. Each with 3,300 train, 550 test, 11 classes | Zhiqiu Lin et al.: The CLEAR Benchmark: Continual LEArning on Real-World Imagery. NeurIPS Datasets and Benchmarks 2021. |
| CLEAR100 | Image Classification | 11 different datasets, one for each year. Each with roughly 10k train, 5k test, 100 classes | Zhiqiu Lin et al.: The CLEAR Benchmark: Continual LEArning on Real-World Imagery. NeurIPS Datasets and Benchmarks 2021. |
| DomainNet | Image Classification | 6 datasets from different domains. 345 classes, number of train and test images varies | Xingchao Peng et al.: Moment Matching for Multi-Source Domain Adaptation. ICCV 2019. |
| FashionMNIST | Image Classification | 60k train, 10k test, 10 classes, image shape 28x28x1 | Han Xiao et al.: Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. 2017. |
| fmow | Image Classification: land use recognition from satellite images. | 62 classes, image shape 32x32x3 | Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022. |
| huffpost | Text Classification: category recognition of newspaper articles. | ~58k train, ~6k test, 11 classes, years 2012-2019 | Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022. |
| MNIST | Image Classification | 60k train, 10k test, 10 classes, image shape 28x28x1 | Li Deng: The MNIST Database of Handwritten Digit Images for Machine Learning Research. IEEE Signal Processing Magazine. 2012. |
| MultiText | Text Classification | 115k train, 7.6k test, access to one of four datasets: ag_news, yelp_review_full, dbpedia_14, yahoo_answers_topics | Please refer to the official documentation. |
| yearbook | Image Classification: gender identification in yearbook photos. | ~33k train, ~4k test, 2 classes, years 1930-2013, image shape 32x32x1 | Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022. |
| hfd-{dataset_name} | multiple | Any Hugging Face dataset can be used. Just prepend the prefix hfd-. | Please refer to the official documentation. |
| CDDB | Image Classification: deepfake detection | 2 classes, 5 domains, each generated using a different image generation technique: GauGAN, BigGAN, WildDeepfake, WhichFaceReal, and SAN, respectively, from the HARD evaluation scenario. Numbers vary across domains. | Li, Chuqiao, et al. A continual deepfake detection benchmark: Dataset, methods, and essentials. IEEE/CVF Winter Conference on Applications of Computer Vision. 2023. |
| Core50 | Image Classification | 50 classes, 8 (0-7) domains for training, a single test set for evaluation. | Vincenzo Lomonaco and Davide Maltoni: CORe50: a new Dataset and Benchmark for continual Object Recognition. 1st Annual Conference on Robot Learning, PMLR 78:17-26, 2017. |
Scenarios#
A scenario defines how the dataset is split into several data partitions.
While running the benchmark, the model is trained sequentially on each data partition.
The scenario can be selected by setting config_space["scenario_name"] accordingly.
Each scenario might have specific settings which require additional changes to config_space.
These settings are described in the table below.
The following example uses the class-incremental scenario, which splits the entire dataset into two parts. The first part contains all instances with classes 1 and 2, the second all instances with classes 3 and 4.
config_space["scenario_name"] = "ClassIncrementalScenario"
config_space["groupings"] = ((1, 2), (3, 4))
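To illustrate what such a grouping does, the sketch below partitions a small list of labels in plain Python. It is not Renate's implementation; partition_by_groupings and the toy labels are hypothetical and only show the idea that each group of classes yields one data partition.

```python
def partition_by_groupings(labels, groupings):
    """Return one list of instance indices per group of classes.

    Hypothetical helper for illustration only. An instance belongs to the
    partition whose group contains its class label.
    """
    return [
        [index for index, label in enumerate(labels) if label in group]
        for group in groupings
    ]


# Toy labels for six instances with classes 1 to 4.
labels = [1, 3, 2, 4, 1, 3]
partitions = partition_by_groupings(labels, ((1, 2), (3, 4)))
# partitions[0] holds the indices of instances with classes 1 and 2,
# partitions[1] those of instances with classes 3 and 4.
```

The model would then be trained on the first partition and afterwards updated with the second.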
| Scenario Name | Description | Settings |
|---|---|---|
| | Used in combination only with | |
| ClassIncrementalScenario | Creates data partitions by splitting the data according to class labels. | groupings |
| | Splits data into different tasks after sorting the data according to a specific feature. Can be used for image data as well. In that case, channels are selected and the data is sorted according to the average channel value. Random permutations may be applied to make the sorting less strict. | |
| | A specific scenario only for image data. Very similar to | |
| | Divides the dataset uniformly at random into equally-sized partitions. | |
| | Creates data partitions by rotating the images by different angles. | |
| | Creates data partitions by randomly permuting the input features. | |
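As an illustration of the scenario that divides the dataset uniformly at random into equally-sized partitions, the following sketch shuffles instance indices and distributes them round-robin. It is a plain-Python approximation of the idea rather than Renate's code; split_uniformly and its parameters are hypothetical.

```python
import random


def split_uniformly(num_instances, num_partitions, seed=0):
    """Split instance indices uniformly at random into partitions.

    Hypothetical helper for illustration only. Partition sizes differ by
    at most one when num_instances is not divisible by num_partitions.
    """
    indices = list(range(num_instances))
    random.Random(seed).shuffle(indices)
    # Partition i takes every num_partitions-th shuffled index, starting at i.
    return [indices[i::num_partitions] for i in range(num_partitions)]


partitions = split_uniformly(num_instances=10, num_partitions=5)
```

Every index appears in exactly one partition, and fixing the seed makes the split reproducible.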
Example: Class-incremental Learning on CIFAR-10#
The following example reproduces the results shown in Dark Experience for General Continual Learning: a Strong, Simple Baseline, Table 2 (Buffer 500, S-CIFAR-10, Class-IL, DER++). These settings use a class-incremental scenario in which the CIFAR-10 dataset is partitioned into 5 parts, each containing two unique classes. Dark Experience Replay++ is used as the updating method with a memory buffer size of 500, and the experiment is repeated 10 times.
from renate.benchmark.experimentation import execute_experiment_job, experiment_config_file

config_space = {
    "updater": "DER",
    "optimizer": "SGD",
    "momentum": 0.0,
    "weight_decay": 0.0,
    "learning_rate": 0.03,
    "alpha": 0.2,
    "beta": 0.5,
    "batch_size": 64,
    "batch_memory_frac": 0.5,
    "memory_size": 500,
    "max_epochs": 50,
    "loss_normalization": 0,
    "loss_weight": 1.0,
    "model_name": "ResNet18CIFAR",
    "scenario_name": "ClassIncrementalScenario",
    "dataset_name": "CIFAR10",
    "val_size": 0,
    "groupings": ((0, 1), (2, 3), (4, 5), (6, 7), (8, 9)),
    "num_outputs": 10,
}

for seed in range(10):
    execute_experiment_job(
        backend="local",
        config_file=experiment_config_file(),
        config_space=config_space,
        experiment_outputs_url=f"results/{seed}/",
        mode="max",
        metric="val_accuracy",
        num_updates=5,
        seed=seed,
    )
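The groupings in this example follow a regular pattern, so they can also be generated instead of written out by hand. The sketch below does this with a hypothetical helper, consecutive_groupings, which is not part of Renate. It also derives the number of updates from the number of groups, on the assumption that one update is performed per data partition, as the values in the example suggest.

```python
def consecutive_groupings(num_classes, classes_per_group):
    """Group class labels 0..num_classes-1 into consecutive tuples.

    Hypothetical helper for illustration only.
    """
    if num_classes % classes_per_group != 0:
        raise ValueError("num_classes must be divisible by classes_per_group")
    return tuple(
        tuple(range(start, start + classes_per_group))
        for start in range(0, num_classes, classes_per_group)
    )


groupings = consecutive_groupings(num_classes=10, classes_per_group=2)
# Assumption: one update per data partition.
num_updates = len(groupings)
```

For CIFAR-10 with two classes per group, this yields the same five groups as in the configuration above.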