renate.training.training module

renate.training.training.run_training_job(mode, config_space, metric, backend, updater='ER', max_epochs=50, task_id='default_task', chunk_id=None, input_state_url=None, output_state_url=None, working_directory='renate_working_dir', dependencies=None, config_file=None, requirements_file=None, role=None, instance_type='ml.c5.xlarge', instance_count=1, instance_max_time=259200, max_time=None, max_num_trials_started=None, max_num_trials_completed=None, max_num_trials_finished=None, max_cost=None, n_workers=1, scheduler=None, scheduler_kwargs=None, seed=0, accelerator='auto', devices=1, strategy='ddp', precision='32', deterministic_trainer=False, gradient_clip_val=None, gradient_clip_algorithm=None, job_name='renate')

Starts updating the model including hyperparameter optimization.

Parameters:
  • mode (Literal['min', 'max']) – Declares the type of optimization problem: min or max.

  • config_space (Dict[str, Any]) – Search space for the hyperparameters. Details on defining your own search space are provided in the Syne Tune documentation.

  • metric (str) – Name of metric to optimize.

  • backend (Literal['local', 'sagemaker']) – Whether to run jobs locally (local) or on SageMaker (sagemaker).

  • updater (str) – Updater used for model update.

  • max_epochs (int) – The maximum number of epochs used to train the model. For comparability between methods, epochs are interpreted as “finetuning-equivalent”. That is, one epoch is defined as len(current_task_dataset) / batch_size update steps.

  • task_id (str) – Unique identifier for the current task.

  • chunk_id (Optional[int]) – Unique identifier for the current data chunk.

  • input_state_url (Optional[str]) – Path to the Renate model state.

  • output_state_url (Optional[str]) – Path where Renate model state will be stored.

  • working_directory (Optional[str]) – Path to the working directory.

  • dependencies (Optional[List[str]]) – (SageMaker backend only) List of strings containing absolute or relative paths to files and directories that will be uploaded as part of the SageMaker training job.

  • config_file (Optional[str]) – File containing the definition of model_fn and data_module_fn.

  • requirements_file (Optional[str]) – (SageMaker backend only) Path to requirements.txt containing environment dependencies.

  • role (Optional[str]) – (SageMaker backend only) An AWS IAM role (either name or full ARN).

  • instance_type (str) – (SageMaker backend only) SageMaker instance type for each worker.

  • instance_count (int) – (SageMaker backend only) Number of instances for each worker.

  • instance_max_time (float) – (SageMaker backend only) Requested maximum wall-clock time for each worker.

  • max_time (Optional[float]) – Stopping criterion: wall-clock time.

  • max_num_trials_started (Optional[int]) – Stopping criterion: trials started.

  • max_num_trials_completed (Optional[int]) – Stopping criterion: trials completed.

  • max_num_trials_finished (Optional[int]) – Stopping criterion: trials finished.

  • max_cost (Optional[float]) – (SageMaker backend only) Stopping criterion: SageMaker cost.

  • n_workers (int) – Number of workers running in parallel.

  • scheduler (Union[str, Type[TrialScheduler], None]) – Scheduler used for hyperparameter optimization. The default is random search. To change it, provide either a string (random, bo, asha or rush) or a scheduler class together with its scheduler_kwargs, if required. For the latter option, see the Syne Tune documentation.

  • scheduler_kwargs (Optional[Dict]) – Only required if custom scheduler is provided.

  • seed (int) – Seed used for ensuring reproducibility.

  • accelerator (Literal['auto', 'cpu', 'gpu', 'tpu']) – Type of accelerator to use.

  • devices (int) – Number of devices to use per worker (the number of workers is set in n_workers).

  • strategy (str) – Name of the distributed training strategy to use. See the PyTorch Lightning documentation for details.

  • precision (str) – Type of bit precision to use. See the PyTorch Lightning documentation for details.

  • deterministic_trainer (bool) – When true, the Trainer behaves deterministically, including on GPU.

  • gradient_clip_val (Optional[float]) – The value at which to clip gradients. Passing None disables gradient clipping. See the PyTorch Lightning documentation for details.

  • gradient_clip_algorithm (Optional[str]) – The gradient clipping algorithm to use. Can be norm or value. See the PyTorch Lightning documentation for details.

  • job_name (str) – Prefix for the name of the SageMaker training job.
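
The "finetuning-equivalent" epoch definition given for max_epochs can be made concrete with a small calculation. The sketch below is only an illustration of the formula len(current_task_dataset) / batch_size. The helper name update_step_budget is hypothetical and not part of Renate, and the way the library rounds fractional step counts is not specified here, so the result is left as a float.

```python
def update_step_budget(num_samples: int, batch_size: int, max_epochs: int) -> float:
    """Return the number of update steps implied by max_epochs.

    Hypothetical helper, not part of Renate. One finetuning-equivalent epoch
    is len(current_task_dataset) / batch_size update steps, independent of
    any additional data (e.g. a memory buffer) an updater may draw from.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    steps_per_epoch = num_samples / batch_size
    return steps_per_epoch * max_epochs


# 1000 samples in the current task, batches of 50, 10 epochs.
budget = update_step_budget(num_samples=1000, batch_size=50, max_epochs=10)
print(budget)
```

Because the budget depends only on the size of the current task's dataset, two updaters given the same max_epochs perform the same number of update steps, which is what makes them comparable.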

Return type:

Optional[Tuner]
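
A call with the local backend might look like the following minimal sketch. It only uses parameters from the signature above. The hyperparameter names in config_space, the metric name val_accuracy, the file name renate_config.py and the state directory are assumptions chosen for illustration, and the helpers build_training_kwargs and launch are hypothetical. Renate is imported lazily so that the arguments can be inspected without the library installed.

```python
from typing import Any, Dict


def build_training_kwargs(state_dir: str) -> Dict[str, Any]:
    """Collect keyword arguments for run_training_job (local backend).

    Hypothetical helper; all concrete values below are illustrative.
    """
    # Fixed values only; valid keys depend on the chosen updater.
    # Search ranges would be defined as described in the Syne Tune documentation.
    config_space = {
        "learning_rate": 0.05,
        "batch_size": 32,
    }
    return {
        "mode": "max",                    # maximize the metric
        "config_space": config_space,
        "metric": "val_accuracy",         # assumed metric name
        "backend": "local",               # run on this machine, not SageMaker
        "updater": "ER",                  # default from the signature
        "max_epochs": 50,
        "config_file": "renate_config.py",  # defines model_fn and data_module_fn
        "output_state_url": state_dir,
        "max_time": 600,                  # stopping criterion, assumed to be seconds
        "n_workers": 1,
        "seed": 0,
    }


def launch(kwargs: Dict[str, Any]):
    """Start the update; requires Renate to be installed."""
    from renate.training.training import run_training_job

    return run_training_job(**kwargs)


kwargs = build_training_kwargs("./renate_state")
print(sorted(kwargs))
```

Calling launch(kwargs) would then start the job. Switching to the SageMaker backend would additionally require the parameters marked "SageMaker backend only", such as role and instance_type.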

renate.training.training.submit_remote_job(dependencies, role, instance_type, instance_count, instance_max_time, job_name, optional_dependencies=None, **job_kwargs)

Executes the training job on SageMaker.

See renate.training.training.run_training_job for a description of the arguments.

Return type:

str