Working with NLP and Large Language Models#

This example demonstrates how to use Renate to train NLP models. We will train a sequence classifier to distinguish between positive and negative movie reviews. Using Renate, we will sequentially train this model on two movie review datasets, called "imdb" and "rotten_tomatoes".


Let us take a look at the configuration file for this example. In the model_fn function, we use the Hugging Face transformers library to instantiate a sequence classification model. Since this model is static, we can easily turn it into a RenateModule by wrapping it in RenateWrapper.
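As a rough illustration of the wrapping idea (this sketch is not Renate's actual RenateWrapper implementation, just an assumption of its basic behavior), wrapping a static model amounts to forwarding calls to the underlying torch.nn.Module:

```python
import torch

# Illustrative sketch only (not Renate's actual code): a wrapper around a
# static torch.nn.Module that simply forwards calls to the wrapped model.
class StaticModelWrapper(torch.nn.Module):
    def __init__(self, model: torch.nn.Module) -> None:
        super().__init__()
        self._model = model

    def forward(self, *args, **kwargs):
        return self._model(*args, **kwargs)

# Wrapping a toy linear model; the wrapper is transparent to callers.
wrapped = StaticModelWrapper(torch.nn.Linear(4, 2))
out = wrapped(torch.randn(3, 4))
print(out.shape)  # torch.Size([3, 2])
```

Because the wrapper registers the model as a submodule, state_dict saving and loading works as usual.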

In the data_module_fn, we load the matching tokenizer from the transformers library. We then use Renate's HuggingFaceTextDataModule to access datasets from the Hugging Face datasets hub. This data module expects the name of a dataset as well as a tokenizer. Here, we load the "imdb" dataset in the first training stage (chunk_id = 0) and the "rotten_tomatoes" dataset for the subsequent model update (chunk_id = 1).
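The chunk-id-to-dataset mapping can be sketched in isolation (dataset_for_chunk is a hypothetical helper, not part of Renate):

```python
# Hypothetical helper mirroring the mapping described above:
# chunk 0 loads "imdb", any later chunk loads "rotten_tomatoes".
def dataset_for_chunk(chunk_id: int) -> str:
    return "imdb" if chunk_id == 0 else "rotten_tomatoes"

print(dataset_for_chunk(0))  # imdb
print(dataset_for_chunk(1))  # rotten_tomatoes
```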

The function loss_fn defines the appropriate loss criterion. As this is a classification problem, we use torch.nn.CrossEntropyLoss.
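Note that the criterion below is constructed with reduction="none". A small sketch shows the effect: the loss is returned per sample (shape [batch_size]) rather than reduced to a scalar, which allows the updater to weight individual examples:

```python
import torch

# With reduction="none", CrossEntropyLoss returns one loss value per sample
# instead of averaging over the batch.
criterion = torch.nn.CrossEntropyLoss(reduction="none")
logits = torch.tensor([[2.0, 0.5], [0.1, 1.5], [1.0, 1.0]])  # 3 samples, 2 classes
labels = torch.tensor([0, 1, 0])
per_sample_loss = criterion(logits, labels)
print(per_sample_loss.shape)  # torch.Size([3]) -- one value per sample
```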

The data module returns pre-tokenized data, so no further transforms are needed in this case.

from typing import Optional

import torch
import transformers

import renate.defaults as defaults
from renate.benchmark.datasets.nlp_datasets import HuggingFaceTextDataModule
from renate.data.data_module import RenateDataModule
from renate.models import RenateModule
from renate.models.renate_module import RenateWrapper

def model_fn(model_state_url: Optional[str] = None) -> RenateModule:
    """Returns a DistilBert classification model."""
    transformer_model = transformers.DistilBertForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2, return_dict=False
    )
    model = RenateWrapper(transformer_model)
    if model_state_url is not None:
        state_dict = torch.load(model_state_url)
        model.load_state_dict(state_dict)
    return model

def loss_fn() -> torch.nn.Module:
    return torch.nn.CrossEntropyLoss(reduction="none")

def data_module_fn(data_path: str, chunk_id: int, seed: int = defaults.SEED) -> RenateDataModule:
    """Returns one of two movie review datasets depending on `chunk_id`."""
    tokenizer = transformers.DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
    dataset_name = "imdb" if chunk_id == 0 else "rotten_tomatoes"
    data_module = HuggingFaceTextDataModule(
        data_path,
        dataset_name=dataset_name,
        tokenizer=tokenizer,
        val_size=0.2,
        seed=seed,
    )
    return data_module


As in previous examples, we also include a launch script. For more details, see previous examples or How to Run a Training Job.

import boto3
from syne_tune.backend.sagemaker_backend.sagemaker_utils import get_execution_role

from renate.training import run_training_job

config_space = {
    "optimizer": "SGD",
    "momentum": 0.9,
    "weight_decay": 0.0,
    "learning_rate": 0.001,
    "alpha": 0.5,
    "batch_size": 64,
    "batch_memory_frac": 0.5,
    "memory_size": 300,
    "loss_normalization": 0,
    "loss_weight": 0.5,
}

if __name__ == "__main__":
    AWS_ID = boto3.client("sts").get_caller_identity().get("Account")
    AWS_REGION = "us-west-2"  # use your preferred AWS region here

    run_training_job(
        config_space=config_space,
        mode="max",
        metric="val_accuracy",
        updater="ER",  # we train with Experience Replay
        max_epochs=5,
        # For this example, we can train on two binary movie review datasets: "rotten_tomatoes" and
        # "imdb". Set chunk_id to 0 or 1 to switch between the two.
        chunk_id=0,
        config_file="renate_config.py",
        # replace the url below with a different one if you already ran it and you want to avoid
        # overwriting
        next_state_url=f"s3://sagemaker-{AWS_REGION}-{AWS_ID}/renate-training-nlp-finetuning/",
        # uncomment the line below only if you already created a model with this script and you want
        # to update it
        # input_state_url=f"s3://sagemaker-{AWS_REGION}-{AWS_ID}/renate-training-nlp-finetuning/",
        backend="sagemaker",  # run on SageMaker, select "local" to run this locally
        role=get_execution_role(),
        instance_type="ml.g4dn.2xlarge",
        devices=1,
        strategy="deepspeed_stage_2",
        precision="32",
    )

Support for training large models#

To support training methods for larger models, we expose two arguments in run_experiment_job to enable training on multiple GPUs. For this, we exploit the strategy functionality provided by Lightning; see Lightning's large model tutorial and documentation for details. Currently, we support the following strategies:

  • "ddp_find_unused_parameters_false"

  • "ddp"

  • "deepspeed"

  • "deepspeed_stage_1"

  • "deepspeed_stage_2"

  • "deepspeed_stage_2_offload"

  • "deepspeed_stage_3"

  • "deepspeed_stage_3_offload"

  • "deepspeed_stage_3_offload_nvme"

These can be enabled by passing one of the above options to the strategy argument. The number of devices to be used for parallel training can be specified using the devices argument, which defaults to 1. We also support lower-precision training by passing the precision argument, which accepts the options "16", "32", "64", and "bf16". Note that it has to be a string and not the integer 32. "bf16" is restricted to newer hardware and thus needs slightly more attention before using it.
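The string requirement for precision can be illustrated with a small sketch (check_precision is a hypothetical helper, not part of Renate's API):

```python
# Hypothetical helper (not part of Renate) illustrating the contract of the
# precision argument: it must be one of the strings listed above, never the
# integer 32.
ALLOWED_PRECISION = {"16", "32", "64", "bf16"}

def check_precision(precision) -> str:
    if not isinstance(precision, str):
        raise TypeError("precision must be a string such as '32', not an int")
    if precision not in ALLOWED_PRECISION:
        raise ValueError(f"unsupported precision: {precision!r}")
    return precision

print(check_precision("bf16"))  # bf16
```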

See the last four lines in the previous code example.