Renate Benchmarks

Renate features a variety of models, datasets and scenarios, which allows you to evaluate the different Renate updaters on many standard benchmarks. In the following, we describe how to combine these components into your own benchmark. Independent of the benchmark, all experiments are started via execute_experiment_job(). The function call below demonstrates the simplest setup, in which the benchmark is run locally. The benchmark is configured by the dictionary config_space, whose entries are described in the sections below.

from renate.benchmark.experimentation import execute_experiment_job, experiment_config_file

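# config_space is a dictionary of settings; the sections below show how to fill it.
execute_experiment_job(
    backend="local",  # run the benchmark on the local machine
    config_file=experiment_config_file(),
    config_space=config_space,
    experiment_outputs_url="results/",  # location where results are written
    mode="max",  # maximize the metric given below
    metric="val_accuracy",
    num_updates=5,
)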

Models

You can select the model by assigning the corresponding name to config_space["model_name"]. For example, to use a ResNet-18 model, you use

config_space["model_name"] = "ResNet18"

Each model may have its own additional arguments. The ResNet-18 model requires you to specify the number of outputs.

config_space["num_outputs"] = 10

The full list of models and model names including a short description is provided in the following table.

Renate Model Overview
Model Name	Description	Additional Inputs
`MultiLayerPerceptron`	Neural network consisting of a sequence of dense layers.	`num_inputs`: Input dimensionality of data. `num_outputs`: Output dimensionality, for classification the number of classes. `num_hidden_layers`: Number of hidden layers. `hidden_size`: Size of hidden layers, can be `int` or `Tuple[int]`.
`ResNet18`	18 layer ResNet CNN architecture	`num_outputs`: Output dimensionality, for classification the number of classes.
`ResNet34`	34 layer ResNet CNN architecture	`num_outputs`: Output dimensionality, for classification the number of classes.
`ResNet50`	50 layer ResNet CNN architecture	`num_outputs`: Output dimensionality, for classification the number of classes.
`ResNet18CIFAR`	18 layer ResNet CNN architecture for small image sizes (approx 32x32)	`num_outputs`: Output dimensionality, for classification the number of classes.
`ResNet34CIFAR`	34 layer ResNet CNN architecture for small image sizes (approx 32x32)	`num_outputs`: Output dimensionality, for classification the number of classes.
`ResNet50CIFAR`	50 layer ResNet CNN architecture for small image sizes (approx 32x32)	`num_outputs`: Output dimensionality, for classification the number of classes.
`VisionTransformerCIFAR`	Base Vision Transformer architecture for images of size 32x32 with patch size 4.	`num_outputs`: Output dimensionality, for classification the number of classes.
`VisionTransformerB16`	Base Vision Transformer architecture for images of size 224x224 with patch size 16.	`num_outputs`: Output dimensionality, for classification the number of classes.
`VisionTransformerB32`	Base Vision Transformer architecture for images of size 224x224 with patch size 32.	`num_outputs`: Output dimensionality, for classification the number of classes.
`VisionTransformerL16`	Large Vision Transformer architecture for images of size 224x224 with patch size 16.	`num_outputs`: Output dimensionality, for classification the number of classes.
`VisionTransformerL32`	Large Vision Transformer architecture for images of size 224x224 with patch size 32.	`num_outputs`: Output dimensionality, for classification the number of classes.
`VisionTransformerH14`	Huge Vision Transformer architecture for images of size 224x224 with patch size 14.	`num_outputs`: Output dimensionality, for classification the number of classes.
`HuggingFaceSequenceClassificationTransformer`	Wrapper around Hugging Face transformers.	`pretrained_model_name_or_path`: Hugging Face transformer ID. `num_outputs`: The number of classes.
`LearningToPromptTransformer`	Learning to Prompt Transformer. Supports both text and vision transformers.	`pretrained_model_name_or_path`: Hugging Face transformer ID. `num_outputs`: The number of classes. `pool_size`: Total number of prompts in the prompt pool. `pool_selection_size`: Number of prompts to select for each input from the pool. `prompt_size`: Number of input tokens each prompt is equivalent to. `prompt_key_dim`: Dimensionality of the features used for prompt matching.
`SPromptTransformer`	S-Prompt Transformer.	`pretrained_model_name_or_path`: Hugging Face transformer ID. `num_outputs`: The number of classes. `prompt_size`: Number of input tokens each prompt is equivalent to. `clusters_per_task`: Number of clusters for K-Means in task identification. `per_task_classifier`: Flag to share or use individual classifier per task.
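
To illustrate how the "Additional Inputs" column maps onto config_space, the following is a minimal sketch for a `MultiLayerPerceptron`. The keys are taken from the table above; the concrete values (784 inputs for flattened 28x28x1 images, two hidden layers of size 64) are illustrative assumptions, not recommended settings.

```python
# Sketch: select a MultiLayerPerceptron and pass its additional inputs.
# Keys come from the model overview table; all values are illustrative assumptions.
config_space = {}
config_space["model_name"] = "MultiLayerPerceptron"
config_space["num_inputs"] = 28 * 28  # flattened 28x28x1 image, e.g., MNIST
config_space["num_outputs"] = 10  # number of classes
config_space["num_hidden_layers"] = 2
config_space["hidden_size"] = 64  # an int; a tuple of ints is also accepted per the table
```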

Datasets

Similarly, you select the dataset by assigning the corresponding name to config_space["dataset_name"]. For example, to use the CIFAR-10 dataset with 10% of the data used for validation, you use

config_space["dataset_name"] = "CIFAR10"
config_space["val_size"] = 0.1

The following table contains the list of supported datasets.

Renate Dataset Overview
Dataset Name	Task	Data Summary	Reference
arxiv	Text Classification: category recognition of arXiv papers.	~1.9M train, ~206k test, 172 classes, years 2007-2023	Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022.
CIFAR10	Image Classification	50k train, 10k test, 10 classes, image shape 32x32x3	Alex Krizhevsky: Learning Multiple Layers of Features from Tiny Images. 2009.
CIFAR100	Image Classification	50k train, 10k test, 100 classes, image shape 32x32x3	Alex Krizhevsky: Learning Multiple Layers of Features from Tiny Images. 2009.
CLEAR10	Image Classification	10 different datasets, one for each year. Each with 3,300 train, 550 test, 11 classes	Zhiqiu Lin et al.: The CLEAR Benchmark: Continual LEArning on Real-World Imagery. NeurIPS Datasets and Benchmarks 2021.
CLEAR100	Image Classification	11 different datasets, one for each year. Each with roughly 10k train, 5k test, 100 classes	Zhiqiu Lin et al.: The CLEAR Benchmark: Continual LEArning on Real-World Imagery. NeurIPS Datasets and Benchmarks 2021.
DomainNet	Image Classification	6 datasets from different domains. 345 classes, number of train and test images varies	Xingchao Peng et al.: Moment Matching for Multi-Source Domain Adaptation. ICCV 2019.
FashionMNIST	Image Classification	60k train, 10k test, 10 classes, image shape 28x28x1	Han Xiao et al.: Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. 2017.
fmow	Image Classification: land use recognition from satellite images.	62 classes, image shape 32x32x3	Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022.
huffpost	Text Classification: category recognition of news articles.	~58k train, ~6k test, 11 classes, years 2012-2019	Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022.
MNIST	Image Classification	60k train, 10k test, 10 classes, image shape 28x28x1	Li Deng: The MNIST Database of Handwritten Digit Images for Machine Learning Research. IEEE Signal Processing Magazine. 2012.
MultiText	Text Classification	115k train, 7.6k test, access to one of four datasets: ag_news, yelp_review_full, dbpedia_14, yahoo_answers_topics	Please refer to the official documentation.
yearbook	Image Classification: gender identification in yearbook photos.	~33k train, ~4k test, 2 classes, years 1930-2013, image shape 32x32x1	Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022.
hfd-{dataset_name}	multiple	Any Hugging Face dataset can be used. Just prepend the prefix `hfd-`, e.g., `hfd-rotten_tomatoes`. Select input and target columns via `config_space`, e.g., add `"input_column": "text", "target_column": "label"` for the rotten_tomatoes example.	Please refer to the official documentation.
CDDB	Image Classification: deepfake detection	2 classes, 5 domains from the HARD evaluation scenario, generated with GauGAN, BigGAN, WildDeepfake, WhichFaceReal, and SAN, respectively. Numbers vary across domains.	Chuqiao Li et al.: A Continual Deepfake Detection Benchmark: Dataset, Methods, and Essentials. IEEE/CVF Winter Conference on Applications of Computer Vision. 2023.
Core50	Image Classification	50 classes, 8 (0-7) domains for training, a single test set for evaluation.	Vincenzo Lomonaco and Davide Maltoni: CORe50: a new Dataset and Benchmark for continual Object Recognition. 1st Annual Conference on Robot Learning, PMLR 78:17-26, 2017.
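
As a sketch of the `hfd-` convention from the table, the snippet below builds the dataset-related entries for the rotten_tomatoes example. The helper `hf_dataset_config` is a hypothetical convenience function introduced only for this illustration and is not part of Renate; the keys it sets are the ones named in the table.

```python
def hf_dataset_config(dataset_id: str, input_column: str, target_column: str) -> dict:
    """Hypothetical helper: build config_space entries for a Hugging Face dataset."""
    return {
        "dataset_name": f"hfd-{dataset_id}",  # prefix tells Renate to load from Hugging Face
        "input_column": input_column,
        "target_column": target_column,
    }


config_space = {}
config_space.update(hf_dataset_config("rotten_tomatoes", "text", "label"))
```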

Scenarios

A scenario defines how the dataset is split into several data partitions. While running the benchmark, the model is trained sequentially on each data partition. The scenario is selected by setting config_space["scenario_name"] accordingly. Each scenario may have specific settings that require additional entries in config_space. These settings are described in the table below.

The following is an example that uses the class-incremental scenario which splits the entire dataset into 2 parts. The first part contains all instances with classes 1 and 2, the second with classes 3 and 4.

config_space["scenario_name"] = "ClassIncrementalScenario"
config_space["groupings"] = ((1, 2), (3, 4))
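
To make the effect of `groupings` concrete, here is a small pure-Python sketch of the partitioning described above. It is not Renate's implementation; the function name `split_by_class` and the toy labels are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def split_by_class(
    labels: Sequence[int], groupings: Tuple[Tuple[int, ...], ...]
) -> List[List[int]]:
    """Return, for each group of classes, the indices of all matching instances."""
    partitions = []
    for group in groupings:
        wanted = set(group)
        partitions.append([i for i, label in enumerate(labels) if label in wanted])
    return partitions


labels = [1, 3, 2, 4, 1, 3]  # toy class labels, one per instance
partitions = split_by_class(labels, ((1, 2), (3, 4)))
print(partitions)  # [[0, 2, 4], [1, 3, 5]]
```

The model would then be updated once per partition, in the order given by `groupings`.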

Renate Scenario Overview
Scenario Name	Description	Settings
`DataIncrementalScenario`	Used only in combination with `DataIncrementalDataModule`, e.g., Wild-Time datasets, CLEAR, MultiText, or DomainNet. Data is presented one dataset at a time, where each dataset could represent a domain or a time slice.	`num_tasks`: You can provide this argument if the different datasets are identified by ids 0 to `num_tasks`. This is the case for time-incremental datasets such as CLEAR or Wild-Time. `data_ids`: Tuple of data identifiers. Used for DomainNet to select order or subset of domains, e.g., `("clipart", "infograph", "painting")`. `groupings`: An alternative to data identifiers that, in addition to defining the sequence, allows combining different domains into one chunk, e.g., `(("clipart", ), ("infograph", "painting"))`.
`ClassIncrementalScenario`	Creates data partitions by splitting the data according to class labels.	`groupings`: Tuple of tuples containing the class labels, e.g., `((1, ), (2, 3, 4))`.
`FeatureSortingScenario`	Splits data into different tasks after sorting the data according to a specific feature. Can be used for image data as well. In that case, a channel is selected and the data is sorted by the average channel value. Random permutations may be applied to make the sorting less strict.	`num_tasks`: Number of data partitions. `feature_idx`: The feature index used for sorting. `randomness`: After sorting, `0.5 * N * randomness` random pairs in the sequence are swapped where `N` is the number of data points. This must be a value between 0 and 1. This allows for creating less strictly sorted scenarios.
`HueShiftScenario`	A specific scenario only for image data. Very similar to `FeatureSortingScenario` but this scenario sorts according to the hue value of an image. Sorting can be less strict by applying random permutations.	`num_tasks`: Number of data partitions. `randomness`: After sorting, `0.5 * N * randomness` random pairs in the sequence are swapped where `N` is the number of data points. This must be a value between 0 and 1. This allows for creating less strict sorted scenarios.
`IIDScenario`	Divides the dataset uniformly at random into equally-sized partitions.	`num_tasks`: Number of data partitions.
`ImageRotationScenario`	Creates data partitions by rotating the images by different angles.	`degrees`: Tuple of degrees, e.g., `(45, 90, 180)`.
`PermutationScenario`	Creates data partitions by randomly permuting the input features.	`num_tasks`: Number of data partitions. `input_dim`: Data dimensionality (tuple or int as string).
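
The `randomness` setting of `FeatureSortingScenario` and `HueShiftScenario` can be illustrated with a short pure-Python sketch: sort by a feature, swap `0.5 * N * randomness` random pairs, then cut the sequence into `num_tasks` chunks. This is not Renate's implementation; the function name `sorted_task_split`, the way pairs are drawn, and the chunk boundaries are assumptions made for illustration.

```python
import random
from typing import List, Sequence


def sorted_task_split(
    values: Sequence[float], num_tasks: int, randomness: float, seed: int = 0
) -> List[List[int]]:
    """Sort indices by feature value, swap random pairs, and split into tasks."""
    if not 0.0 <= randomness <= 1.0:
        raise ValueError("randomness must be a value between 0 and 1")
    rng = random.Random(seed)
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    # Swap 0.5 * N * randomness random pairs to loosen the strict ordering.
    for _ in range(int(0.5 * n * randomness)):
        a, b = rng.randrange(n), rng.randrange(n)
        order[a], order[b] = order[b], order[a]
    # Cut the sequence into num_tasks contiguous chunks of (nearly) equal size.
    return [order[k * n // num_tasks : (k + 1) * n // num_tasks] for k in range(num_tasks)]


values = [5.0, 1.0, 4.0, 2.0, 3.0, 0.0]  # toy feature values, one per data point
strict = sorted_task_split(values, num_tasks=3, randomness=0.0)
print(strict)  # [[5, 1], [3, 4], [2, 0]]
```

With `randomness=0.0` every task contains a contiguous range of feature values; larger values mix data points across task boundaries.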

Example: Class-incremental Learning on CIFAR-10

The following example reproduces the results shown in Table 2 of "Dark Experience for General Continual Learning: a Strong, Simple Baseline" (Buffer 500, S-CIFAR-10, Class-IL, DER++). These settings use a class-incremental scenario in which the CIFAR-10 dataset is partitioned into 5 parts, each containing two unique classes. Dark Experience Replay++ is used as the updating method with a memory buffer size of 500, and the experiment is repeated 10 times.

from renate.benchmark.experimentation import execute_experiment_job, experiment_config_file


config_space = {
    "updater": "DER",
    "optimizer": "SGD",
    "momentum": 0.0,
    "weight_decay": 0.0,
    "learning_rate": 0.03,
    "alpha": 0.2,
    "beta": 0.5,
    "batch_size": 64,
    "batch_memory_frac": 0.5,
    "memory_size": 500,
    "max_epochs": 50,
    "loss_normalization": 0,
    "loss_weight": 1.0,
    "model_name": "ResNet18CIFAR",
    "scenario_name": "ClassIncrementalScenario",
    "dataset_name": "CIFAR10",
    "val_size": 0,
    "groupings": ((0, 1), (2, 3), (4, 5), (6, 7), (8, 9)),
    "num_outputs": 10,
}

for seed in range(10):
    execute_experiment_job(
        backend="local",
        config_file=experiment_config_file(),
        config_space=config_space,
        experiment_outputs_url=f"results/{seed}/",
        mode="max",
        metric="val_accuracy",
        num_updates=5,
        seed=seed,
    )
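
The `groupings` tuple in the example above can also be generated programmatically, which helps when varying the number of classes per task. The helper `consecutive_groupings` below is a hypothetical function written for this sketch and is not part of Renate.

```python
from typing import Tuple


def consecutive_groupings(num_classes: int, classes_per_task: int) -> Tuple[Tuple[int, ...], ...]:
    """Group class labels 0..num_classes-1 into consecutive, equally sized tuples."""
    if num_classes % classes_per_task != 0:
        raise ValueError("num_classes must be divisible by classes_per_task")
    return tuple(
        tuple(range(start, start + classes_per_task))
        for start in range(0, num_classes, classes_per_task)
    )


groupings = consecutive_groupings(num_classes=10, classes_per_task=2)
print(groupings)  # ((0, 1), (2, 3), (4, 5), (6, 7), (8, 9))
```

The result has five groups, matching the five partitions and `num_updates=5` used in the example.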