Renate Benchmarks

Renate features a variety of models, datasets, and scenarios, which makes it possible to evaluate the different Renate updaters on many standard benchmarks. In the following, we describe how to combine these components into your own benchmark. Regardless of the configuration, every benchmark is started via execute_experiment_job(). The function call below demonstrates the simplest setup, in which the benchmark runs locally. The benchmark itself is configured through config_space.

from renate.benchmark.experimentation import execute_experiment_job, experiment_config_file

execute_experiment_job(
    backend="local",
    config_file=experiment_config_file(),
    config_space=config_space,
    experiment_outputs_url="results/",
    mode="max",
    metric="val_accuracy",
    num_updates=5,
)
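
The call above expects config_space to exist already. As a minimal sketch, assuming config_space is a plain dictionary keyed by the names used on this page (model_name, dataset_name, scenario_name and their settings), the three components can be merged as follows. The helper build_config_space is hypothetical and not part of Renate; updater and optimizer settings, as in the example at the end of this page, would still have to be added.

```python
def build_config_space(model, dataset, scenario):
    """Merge per-component settings into one flat dictionary.

    Hypothetical helper, not part of Renate. Raises ValueError if two
    components define the same key with different values.
    """
    config_space = {}
    for part in (model, dataset, scenario):
        for key, value in part.items():
            if key in config_space and config_space[key] != value:
                raise ValueError(f"conflicting values for {key!r}")
            config_space[key] = value
    return config_space


config_space = build_config_space(
    model={"model_name": "ResNet18CIFAR", "num_outputs": 10},
    dataset={"dataset_name": "CIFAR10", "val_size": 0.1},
    scenario={
        "scenario_name": "ClassIncrementalScenario",
        "groupings": ((0, 1), (2, 3), (4, 5), (6, 7), (8, 9)),
    },
)
```

Keeping the three parts separate makes it easy to swap one component, for example the scenario, while leaving the model and dataset untouched.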

Models

You can select the model by assigning the corresponding name to config_space["model_name"]. For example, to select a ResNet-18 model, set

config_space["model_name"] = "ResNet18"

Each model may take its own additional arguments. The ResNet-18 model, for example, requires the number of outputs to be defined.

config_space["num_outputs"] = 10

The full list of models and model names including a short description is provided in the following table.

Renate Model Overview

Model Name | Description | Additional Inputs

MultiLayerPerceptron

Neural network consisting of a sequence of dense layers.

  • num_inputs: Input dimensionality of data.

  • num_outputs: Output dimensionality, for classification the number of classes.

  • num_hidden_layers: Number of hidden layers.

  • hidden_size: Size of hidden layers, can be int or Tuple[int].

ResNet18

18 layer ResNet CNN architecture

  • num_outputs: Output dimensionality, for classification the number of classes.

ResNet34

34 layer ResNet CNN architecture

  • num_outputs: Output dimensionality, for classification the number of classes.

ResNet50

50 layer ResNet CNN architecture

  • num_outputs: Output dimensionality, for classification the number of classes.

ResNet18CIFAR

18 layer ResNet CNN architecture for small image sizes (approx 32x32)

  • num_outputs: Output dimensionality, for classification the number of classes.

ResNet34CIFAR

34 layer ResNet CNN architecture for small image sizes (approx 32x32)

  • num_outputs: Output dimensionality, for classification the number of classes.

ResNet50CIFAR

50 layer ResNet CNN architecture for small image sizes (approx 32x32)

  • num_outputs: Output dimensionality, for classification the number of classes.

VisionTransformerCIFAR

Base Vision Transformer architecture for images of size 32x32 with patch size 4.

  • num_outputs: Output dimensionality, for classification the number of classes.

VisionTransformerB16

Base Vision Transformer architecture for images of size 224x224 with patch size 16.

  • num_outputs: Output dimensionality, for classification the number of classes.

VisionTransformerB32

Base Vision Transformer architecture for images of size 224x224 with patch size 32.

  • num_outputs: Output dimensionality, for classification the number of classes.

VisionTransformerL16

Large Vision Transformer architecture for images of size 224x224 with patch size 16.

  • num_outputs: Output dimensionality, for classification the number of classes.

VisionTransformerL32

Large Vision Transformer architecture for images of size 224x224 with patch size 32.

  • num_outputs: Output dimensionality, for classification the number of classes.

VisionTransformerH14

Huge Vision Transformer architecture for images of size 224x224 with patch size 14.

  • num_outputs: Output dimensionality, for classification the number of classes.

HuggingFaceSequenceClassificationTransformer

Wrapper around Hugging Face transformers.

  • pretrained_model_name_or_path: Hugging Face transformer ID.

  • num_outputs: The number of classes.

LearningToPromptTransformer

Learning to Prompt Transformer. Supports both text and vision transformers.

  • pretrained_model_name_or_path: Hugging Face transformer ID.

  • num_outputs: The number of classes.

  • pool_size: Total number of prompts in the prompt pool.

  • pool_selection_size: Number of prompts to select for each input from the pool.

  • prompt_size: Number of input tokens each prompt is equivalent to.

  • prompt_key_dim: Dimensionality of the features used for prompt matching.

SPromptTransformer

S-Prompt Transformer.

  • pretrained_model_name_or_path: Hugging Face transformer ID.

  • num_outputs: The number of classes.

  • prompt_size: Number of input tokens each prompt is equivalent to.

  • clusters_per_task: Number of clusters for K-Means in task identification.

  • per_task_classifier: Flag that selects between a classifier shared across tasks and an individual classifier per task.
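
The table above can be turned into a quick sanity check before a job is launched. The sketch below transcribes the "Additional Inputs" column into a lookup and reports which inputs are absent from a config_space. It assumes that every listed input has to be set explicitly, which may not hold if Renate provides defaults for some of them; MODEL_INPUTS and missing_model_inputs are hypothetical names, not part of Renate.

```python
# Additional inputs per model, transcribed from the table above.
_NUM_OUTPUTS_ONLY = (
    "ResNet18", "ResNet34", "ResNet50",
    "ResNet18CIFAR", "ResNet34CIFAR", "ResNet50CIFAR",
    "VisionTransformerCIFAR", "VisionTransformerB16", "VisionTransformerB32",
    "VisionTransformerL16", "VisionTransformerL32", "VisionTransformerH14",
)
MODEL_INPUTS = {name: ("num_outputs",) for name in _NUM_OUTPUTS_ONLY}
MODEL_INPUTS.update({
    "MultiLayerPerceptron": (
        "num_inputs", "num_outputs", "num_hidden_layers", "hidden_size",
    ),
    "HuggingFaceSequenceClassificationTransformer": (
        "pretrained_model_name_or_path", "num_outputs",
    ),
    "LearningToPromptTransformer": (
        "pretrained_model_name_or_path", "num_outputs", "pool_size",
        "pool_selection_size", "prompt_size", "prompt_key_dim",
    ),
    "SPromptTransformer": (
        "pretrained_model_name_or_path", "num_outputs", "prompt_size",
        "clusters_per_task", "per_task_classifier",
    ),
})


def missing_model_inputs(config_space):
    """Return the table's inputs for the chosen model that are not set."""
    model_name = config_space["model_name"]
    if model_name not in MODEL_INPUTS:
        raise KeyError(f"unknown model name: {model_name!r}")
    return [key for key in MODEL_INPUTS[model_name] if key not in config_space]


print(missing_model_inputs(
    {"model_name": "MultiLayerPerceptron", "num_inputs": 784, "num_outputs": 10}
))  # ['num_hidden_layers', 'hidden_size']
```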

Datasets

Similarly, you select the dataset by assigning the corresponding name to config_space["dataset_name"]. For example, to use the CIFAR-10 dataset with 10% of the data used for validation, you use

config_space["dataset_name"] = "CIFAR10"
config_space["val_size"] = 0.1

The following table contains the list of supported datasets.

Renate Dataset Overview

| Dataset Name | Task | Data Summary | Reference |
| --- | --- | --- | --- |
| arxiv | Text Classification: category recognition of arXiv papers. | ~1.9M train, ~206k test, 172 classes, years 2007-2023 | Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022. |
| CIFAR10 | Image Classification | 50k train, 10k test, 10 classes, image shape 32x32x3 | Alex Krizhevsky: Learning Multiple Layers of Features from Tiny Images. 2009. |
| CIFAR100 | Image Classification | 50k train, 10k test, 100 classes, image shape 32x32x3 | Alex Krizhevsky: Learning Multiple Layers of Features from Tiny Images. 2009. |
| CLEAR10 | Image Classification | 10 different datasets, one for each year. Each with 3,300 train, 550 test, 11 classes | Zhiqiu Lin et al.: The CLEAR Benchmark: Continual LEArning on Real-World Imagery. NeurIPS Datasets and Benchmarks 2021. |
| CLEAR100 | Image Classification | 11 different datasets, one for each year. Each with roughly 10k train, 5k test, 100 classes | Zhiqiu Lin et al.: The CLEAR Benchmark: Continual LEArning on Real-World Imagery. NeurIPS Datasets and Benchmarks 2021. |
| DomainNet | Image Classification | 6 datasets from different domains. 345 classes, number of train and test images varies | Xingchao Peng et al.: Moment Matching for Multi-Source Domain Adaptation. ICCV 2019. |
| FashionMNIST | Image Classification | 60k train, 10k test, 10 classes, image shape 28x28x1 | Han Xiao et al.: Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. 2017. |
| fmow | Image Classification: land use recognition from satellite images. | 62 classes, image shape 32x32x3 | Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022. |
| huffpost | Text Classification: category recognition of newspaper articles. | ~58k train, ~6k test, 11 classes, years 2012-2019 | Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022. |
| MNIST | Image Classification | 60k train, 10k test, 10 classes, image shape 28x28x1 | Li Deng: The MNIST Database of Handwritten Digit Images for Machine Learning Research. IEEE Signal Processing Magazine. 2012. |
| MultiText | Text Classification | 115k train, 7.6k test, access to one of four datasets: ag_news, yelp_review_full, dbpedia_14, yahoo_answers_topics | Please refer to the official documentation. |
| yearbook | Image Classification: gender identification in yearbook photos. | ~33k train, ~4k test, 2 classes, years 1930-2013, image shape 32x32x1 | Huaxiu Yao et al.: Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time. Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2022. |
| hfd-{dataset_name} | multiple | Any Hugging Face dataset can be used. Just prepend the prefix hfd-, e.g., hfd-rotten_tomatoes. Select input and target columns via config_space, e.g., add "input_column": "text", "target_column": "label" for the rotten_tomatoes example. | Please refer to the official documentation. |
| CDDB | Image Classification: deepfake detection | 2 classes, 5 domains from the HARD evaluation scenario, generated with GauGAN, BigGAN, WildDeepfake, WhichFaceReal, and SAN, respectively. Numbers vary across domains. | Chuqiao Li et al.: A Continual Deepfake Detection Benchmark: Dataset, Methods, and Essentials. IEEE/CVF Winter Conference on Applications of Computer Vision. 2023. |
| Core50 | Image Classification | 50 classes, 8 (0-7) domains for training, a single test set for evaluation. | Vincenzo Lomonaco and Davide Maltoni: CORe50: a new Dataset and Benchmark for continual Object Recognition. 1st Annual Conference on Robot Learning, PMLR 78:17-26, 2017. |
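
The hfd- entry above translates into three config_space keys. The following sketch builds them for the rotten_tomatoes example mentioned in the table; the helper hugging_face_dataset_config is a hypothetical convenience function, not part of Renate, and only the key names and the hfd- prefix are taken from this page.

```python
def hugging_face_dataset_config(dataset_id, input_column, target_column):
    """Return the config_space entries for a Hugging Face dataset.

    Hypothetical helper: it only prepends the "hfd-" prefix and sets the
    column keys described in the dataset table.
    """
    if not dataset_id:
        raise ValueError("dataset_id must be a non-empty string")
    return {
        "dataset_name": f"hfd-{dataset_id}",
        "input_column": input_column,
        "target_column": target_column,
    }


dataset_config = hugging_face_dataset_config("rotten_tomatoes", "text", "label")
print(dataset_config["dataset_name"])  # hfd-rotten_tomatoes
```

The resulting dictionary can be merged into an existing config_space with config_space.update(dataset_config).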

Scenarios

A scenario defines how the dataset is split into several data partitions. While running the benchmark, the model is trained sequentially on each data partition. The scenario is selected by setting config_space["scenario_name"] accordingly. Each scenario may have specific settings that require additional entries in config_space; these settings are described in the table below.

The following is an example that uses the class-incremental scenario which splits the entire dataset into 2 parts. The first part contains all instances with classes 1 and 2, the second with classes 3 and 4.

config_space["scenario_name"] = "ClassIncrementalScenario"
config_space["groupings"] = ((1, 2), (3, 4))
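
To make the effect of groupings concrete, the sketch below partitions a toy list of labels the same way the description above suggests: one partition per inner tuple. It is an illustration in plain Python, not Renate's implementation, and it assumes that samples whose label appears in no group are simply left out.

```python
def split_by_class(labels, groupings):
    """Return one list of sample indices per group of class labels."""
    # Map every class label to the index of the partition it belongs to.
    label_to_part = {}
    for part, classes in enumerate(groupings):
        for class_label in classes:
            label_to_part[class_label] = part
    parts = [[] for _ in groupings]
    for index, label in enumerate(labels):
        # Assumption: labels that appear in no group are skipped.
        if label in label_to_part:
            parts[label_to_part[label]].append(index)
    return parts


labels = [1, 3, 2, 4, 1, 3]
print(split_by_class(labels, ((1, 2), (3, 4))))  # [[0, 2, 4], [1, 3, 5]]
```

The model would then be trained first on the samples of the first partition and afterwards on those of the second.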

Renate Scenario Overview

Scenario Name | Description | Settings

DataIncrementalScenario

Can only be used in combination with DataIncrementalDataModule, e.g., for the Wild-Time datasets, CLEAR, MultiText, or DomainNet. Data is presented one dataset at a time, where each dataset may represent a domain or a time slice.

  • num_tasks: You can provide this argument if the different datasets are identified by ids 0 to num_tasks. This is the case for time-incremental datasets such as CLEAR or Wild-Time.

  • data_ids: Tuple of data identifiers. Used for DomainNet to select order or subset of domains, e.g., ("clipart", "infograph", "painting").

  • groupings: An alternative to data identifiers that, in addition to defining the sequence, allows combining several domains into one chunk, e.g., (("clipart", ), ("infograph", "painting")).

ClassIncrementalScenario

Creates data partitions by splitting the data according to class labels.

  • groupings: Tuple of tuples containing the class labels, e.g., ((1, ), (2, 3, 4)).

FeatureSortingScenario

Splits data into different tasks after sorting it according to a specific feature. Can be used for image data as well; in that case, a channel is selected and the data is sorted by the average value of that channel. Random permutations may be applied to make the sorting less strict.

  • num_tasks: Number of data partitions.

  • feature_idx: The feature index used for sorting.

  • randomness: After sorting, 0.5 * N * randomness random pairs in the sequence are swapped, where N is the number of data points. The value must be between 0 and 1. This makes it possible to create less strictly sorted scenarios.

HueShiftScenario

A specific scenario only for image data. Very similar to FeatureSortingScenario but this scenario sorts according to the hue value of an image. Sorting can be less strict by applying random permutations.

  • num_tasks: Number of data partitions.

  • randomness: After sorting, 0.5 * N * randomness random pairs in the sequence are swapped, where N is the number of data points. The value must be between 0 and 1. This makes it possible to create less strictly sorted scenarios.

IIDScenario

Divides the dataset uniformly at random into equally-sized partitions.

  • num_tasks: Number of data partitions.

ImageRotationScenario

Creates data partitions by rotating the images by different angles.

  • degrees: Tuple of degrees, e.g., (45, 90, 180).

PermutationScenario

Creates data partitions by randomly permuting the input features.

  • num_tasks: Number of data partitions.

  • input_dim: Data dimensionality (tuple or int as string).
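
The randomness setting of FeatureSortingScenario and HueShiftScenario is easier to picture with a small example. The sketch below sorts samples by a scalar feature, swaps 0.5 * N * randomness random pairs, and cuts the sequence into num_tasks chunks. It illustrates the description in the table rather than Renate's code: how pairs are drawn, how a remainder is distributed, and the seed argument are all assumptions.

```python
import random


def feature_sorting_partitions(features, num_tasks, randomness, seed=0):
    """Split sample indices into num_tasks chunks after a softened sort."""
    if not 0 <= randomness <= 1:
        raise ValueError("randomness must be between 0 and 1")
    rng = random.Random(seed)
    # Sample indices ordered by increasing feature value.
    order = sorted(range(len(features)), key=lambda i: features[i])
    n = len(order)
    # Swap 0.5 * N * randomness randomly chosen pairs to soften the ordering.
    for _ in range(int(0.5 * n * randomness)):
        i, j = rng.randrange(n), rng.randrange(n)
        order[i], order[j] = order[j], order[i]
    # Cut the sequence into num_tasks contiguous chunks of near-equal size.
    bounds = [round(k * n / num_tasks) for k in range(num_tasks + 1)]
    return [order[bounds[k]:bounds[k + 1]] for k in range(num_tasks)]


features = [0.9, 0.1, 0.5, 0.3]
print(feature_sorting_partitions(features, num_tasks=2, randomness=0.0))
# [[1, 3], [2, 0]]
```

With randomness=0.0 the partitions are strictly sorted; larger values mix samples across partitions while every sample still appears exactly once.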

Example: Class-incremental Learning on CIFAR-10

The following example reproduces the results reported in "Dark Experience for General Continual Learning: a Strong, Simple Baseline" (Table 2; Buffer 500, S-CIFAR-10, Class-IL, DER++). These settings use a class-incremental scenario in which the CIFAR-10 dataset is partitioned into 5 parts, each containing two unique classes. Dark Experience Replay++ is used as the updating method with a memory buffer size of 500, and the experiment is repeated 10 times.

from renate.benchmark.experimentation import execute_experiment_job, experiment_config_file


config_space = {
    "updater": "DER",
    "optimizer": "SGD",
    "momentum": 0.0,
    "weight_decay": 0.0,
    "learning_rate": 0.03,
    "alpha": 0.2,
    "beta": 0.5,
    "batch_size": 64,
    "batch_memory_frac": 0.5,
    "memory_size": 500,
    "max_epochs": 50,
    "loss_normalization": 0,
    "loss_weight": 1.0,
    "model_name": "ResNet18CIFAR",
    "scenario_name": "ClassIncrementalScenario",
    "dataset_name": "CIFAR10",
    "val_size": 0,
    "groupings": ((0, 1), (2, 3), (4, 5), (6, 7), (8, 9)),
    "num_outputs": 10,
}

for seed in range(10):
    execute_experiment_job(
        backend="local",
        config_file=experiment_config_file(),
        config_space=config_space,
        experiment_outputs_url=f"results/{seed}/",
        mode="max",
        metric="val_accuracy",
        num_updates=5,
        seed=seed,
    )
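
Because each of the ten runs trains for five updates, it can be worth validating the configuration before launching them. The check below verifies that groupings assigns every class to exactly one partition, as the example above requires. It assumes labels run from 0 to num_outputs - 1, as in the groupings used here; check_groupings is a hypothetical helper, and Renate itself may not demand full coverage.

```python
def check_groupings(groupings, num_outputs):
    """Raise ValueError unless groupings partition range(num_outputs)."""
    seen = [label for group in groupings for label in group]
    if len(seen) != len(set(seen)):
        raise ValueError("a class label appears in more than one group")
    if set(seen) != set(range(num_outputs)):
        raise ValueError("groupings do not cover labels 0..num_outputs-1 exactly")


# Values taken from the config_space of the example above.
check_groupings(((0, 1), (2, 3), (4, 5), (6, 7), (8, 9)), num_outputs=10)
```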