renate.evaluation.metrics.classification module#

renate.evaluation.metrics.classification.average_accuracy(results, task_id, num_instances)[source]#

Compute the average accuracy of a model.

This measure is defined by:

\[\frac{1}{T} \sum_{i=1}^T a_{T,i}\]

where \(T\) is the number of tasks and \(a_{T,i}\) is the accuracy of the model on task \(i\) after having learned all tasks up to \(T\).

Parameters:
  • results (Dict[str, List[List[float]]]) – The results dictionary holding all the results with respect to all recorded metrics.

  • task_id (int) – The task index.

  • num_instances (List[int]) – Count of test data points for each update.

Return type:

float
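The definition above can be illustrated with a minimal, self-contained sketch. This is not renate's actual implementation; it assumes a hypothetical matrix `acc` where `acc[j][i]` is the accuracy on task `i` after sequentially learning tasks up to `j + 1` (0-indexed), with unseen tasks recorded as 0:

```python
# Hypothetical sketch (not renate's implementation): average accuracy over
# all T tasks, read off the last row of the accuracy matrix.
def average_accuracy_sketch(acc, T):
    """Mean accuracy over tasks 1..T after learning the final task T."""
    return sum(acc[T - 1][i] for i in range(T)) / T

acc = [
    [0.90, 0.00, 0.00],  # after learning task 1
    [0.80, 0.85, 0.00],  # after learning task 2
    [0.70, 0.80, 0.90],  # after learning task 3
]
print(average_accuracy_sketch(acc, 3))  # ≈ 0.8
```

The metric only looks at the final row: earlier rows matter for forgetting and transfer metrics below, not here.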

renate.evaluation.metrics.classification.micro_average_accuracy(results, task_id, num_instances)[source]#

Compute the micro average accuracy of a model.

This measure is defined by the number of correctly classified data points divided by the total number of data points. If the number of data points is the same in each update step, this is the same as average_accuracy.

Parameters:
  • results (Dict[str, List[List[float]]]) – The results dictionary holding all the results with respect to all recorded metrics.

  • task_id (int) – The task index.

  • num_instances (List[int]) – Count of test data points for each update.

Return type:

float
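A minimal sketch of the weighted computation, again not renate's implementation: each update's accuracy is weighted by its number of test data points, which reduces to the plain average when all counts are equal.

```python
# Hypothetical sketch (not renate's implementation): micro-average accuracy
# as (total correctly classified) / (total data points).
def micro_average_accuracy_sketch(accuracies, num_instances):
    correct = sum(a * n for a, n in zip(accuracies, num_instances))
    return correct / sum(num_instances)

# Made-up per-update accuracies and test-set sizes:
micro = micro_average_accuracy_sketch([0.7, 0.8, 0.9], [100, 100, 200])
print(micro)  # ≈ 0.825; larger third update pulls the average up
```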

renate.evaluation.metrics.classification.forgetting(results, task_id, num_instances)[source]#

Compute the forgetting measure of the model.

This measure is defined by:

\[\frac{1}{T-1} \sum_{i=1}^{T-1} f_{T,i}\]

where \(f_{j,i}\) is defined as:

\[f_{j,i} = \max_{k\in\{1, \ldots, j-1\}} a_{k,i} - a_{j,i}\]

where \(T\) is the final task index, \(a_{n,i}\) is the test classification accuracy on task \(i\) after sequentially learning the \(n\)-th task, and \(f_{j,i}\) is a measure of forgetting on task \(i\) after training up to task \(j\).

Parameters:
  • results (Dict[str, List[List[float]]]) – The results dictionary holding all the results with respect to all recorded metrics.

  • task_id (int) – The task index.

  • num_instances (List[int]) – Count of test data points for each update.

Return type:

float
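As a sanity check on the formula, here is a hypothetical sketch (not renate's implementation) using the same assumed `acc[j][i]` matrix convention as above, with unseen tasks recorded as 0:

```python
# Hypothetical sketch (not renate's implementation): forgetting after the
# final task T. For each earlier task i, compare its best accuracy over
# rows 1..T-1 against its accuracy in the final row.
def forgetting_sketch(acc, T):
    total = 0.0
    for i in range(T - 1):
        best_earlier = max(acc[k][i] for k in range(T - 1))
        total += best_earlier - acc[T - 1][i]
    return total / (T - 1)

acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.80, 0.90],
]
# Task 1: 0.90 - 0.70 = 0.20; task 2: 0.85 - 0.80 = 0.05; mean ≈ 0.125.
print(forgetting_sketch(acc, 3))  # ≈ 0.125
```

Higher values mean more forgetting; a model that never drops below its best earlier accuracy scores 0.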

renate.evaluation.metrics.classification.backward_transfer(results, task_id, num_instances)[source]#

Compute the backward transfer measure of the model.

This measure is defined by:

\[\frac{1}{T-1} \sum_{i=1}^{T-1} \left( a_{T,i} - a_{i,i} \right)\]

where \(T\) is the final task index and \(a_{n,i}\) is the test classification accuracy on task \(i\) after sequentially learning the \(n\)-th task.

Parameters:
  • results (Dict[str, List[List[float]]]) – The results dictionary holding all the results with respect to all recorded metrics.

  • task_id (int) – The task index.

  • num_instances (List[int]) – Count of test data points for each update.

Return type:

float
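With the same assumed `acc[j][i]` matrix convention, a hypothetical sketch (not renate's implementation) compares each task's final accuracy against its accuracy right after it was learned:

```python
# Hypothetical sketch (not renate's implementation): backward transfer.
# acc[i][i] is the accuracy on task i right after learning it; acc[T-1][i]
# is its accuracy after the final task.
def backward_transfer_sketch(acc, T):
    return sum(acc[T - 1][i] - acc[i][i] for i in range(T - 1)) / (T - 1)

acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.80, 0.90],
]
# Task 1: 0.70 - 0.90 = -0.20; task 2: 0.80 - 0.85 = -0.05; mean ≈ -0.125.
print(backward_transfer_sketch(acc, 3))  # ≈ -0.125
```

Negative values indicate that later training hurt earlier tasks; positive values mean later tasks actually improved earlier ones.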

renate.evaluation.metrics.classification.forward_transfer(results, task_id, num_instances)[source]#

Compute the forward transfer measure of the model.

This measure is defined by:

\[\frac{1}{T-1} \sum_{i=2}^{T} \left( a_{i-1,i} - b_{i} \right)\]

where \(T\) is the final task index and \(a_{n,i}\) is the test classification accuracy on task \(i\) after sequentially learning the \(n\)-th task. \(b_{i}\) is the accuracy on task \(i\) recorded at initialisation, prior to observing any task.

Parameters:
  • results (Dict[str, List[List[float]]]) – The results dictionary holding all the results with respect to all recorded metrics.

  • task_id (int) – The task index.

  • num_instances (List[int]) – Count of test data points for each update.

Return type:

float
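A final hypothetical sketch (not renate's implementation), using the assumed `acc[j][i]` matrix plus a vector `b` of per-task accuracies at initialisation: forward transfer measures how much learning tasks \(1, \ldots, i-1\) improves zero-shot accuracy on task \(i\) over the untrained baseline.

```python
# Hypothetical sketch (not renate's implementation): forward transfer.
# acc[i-1][i] (0-indexed) is the zero-shot accuracy on task i+1 after
# learning the first i tasks; b[i] is the accuracy at initialisation.
def forward_transfer_sketch(acc, b, T):
    return sum(acc[i - 1][i] - b[i] for i in range(1, T)) / (T - 1)

acc = [
    [0.90, 0.40, 0.30],  # after task 1: zero-shot accuracy on tasks 2, 3
    [0.80, 0.85, 0.50],  # after task 2: zero-shot accuracy on task 3
    [0.70, 0.80, 0.90],
]
b = [0.30, 0.33, 0.35]  # accuracy per task before any training
# Task 2: 0.40 - 0.33 = 0.07; task 3: 0.50 - 0.35 = 0.15; mean ≈ 0.11.
print(forward_transfer_sketch(acc, b, 3))  # ≈ 0.11
```

Positive values mean earlier tasks transfer useful knowledge to tasks not yet seen.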