renate.utils.file module

renate.utils.file.get_aws_region()

Returns the name of the AWS region used during the execution.

Return type:

str

renate.utils.file.get_bucket()

Returns the default S3 bucket.

Return type:

str

renate.utils.file.is_s3_uri(uri)

Checks whether the URI is an S3 URI.

Return type:

bool
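
A check of this kind can be sketched with the standard library alone. The function below is a hypothetical stand-in named looks_like_s3_uri, not Renate's implementation; it assumes that an S3 URI is any string with the s3 scheme and a non-empty bucket name.

```python
from urllib.parse import urlparse


def looks_like_s3_uri(uri: str) -> bool:
    """Hypothetical stand-in: True if `uri` has the s3 scheme and names a bucket."""
    parsed = urlparse(uri)
    # For "s3://bucket/key", the scheme is "s3" and the netloc is the bucket name.
    return parsed.scheme == "s3" and bool(parsed.netloc)
```

Under these assumptions, a local path or an https URL is rejected, and only strings of the form s3://bucket/... are accepted.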

renate.utils.file.move_to_uri(src, dst, ignore_extensions=['.sagemaker-uploading', '.sagemaker-uploaded'])

Moves a local file, or all files in a local directory, to a target directory or S3 location.

Files that already exist at the destination are overwritten. The local source files are deleted.

Parameters:
  • src (Union[Path, str]) – Local file or directory to move.

  • dst (str) – Target directory or S3 URI.

  • ignore_extensions (List[str]) – List of extensions to ignore.

Return type:

None

renate.utils.file.copy_to_uri(src, dst, ignore_extensions=['.sagemaker-uploading', '.sagemaker-uploaded'])

Copies a local file, or all files in a local directory, to a target directory or S3 location.

Files that already exist at the destination are overwritten. The local source files are preserved.

Parameters:
  • src (Union[Path, str]) – Local file or directory to copy.

  • dst (str) – Target directory or S3 URI.

  • ignore_extensions (List[str]) – List of extensions to ignore.

Return type:

None
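
The local-to-local case of such a copy can be sketched as follows. The function copy_local is hypothetical and is not Renate's code; it assumes that ignore_extensions is matched against the final suffix of each file name and that the directory layout below src is preserved under dst. The S3 case is left out.

```python
import shutil
from pathlib import Path
from typing import List, Union


def copy_local(src: Union[Path, str], dst: Union[Path, str], ignore_extensions: List[str]) -> None:
    """Hypothetical local-only sketch: copy a file or a directory tree into `dst`."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    # Collect the files up front so the listing is fixed before anything is written.
    files = [src] if src.is_file() else [p for p in src.rglob("*") if p.is_file()]
    base = src.parent if src.is_file() else src
    for path in files:
        if path.suffix in ignore_extensions:
            continue  # skip marker files such as *.sagemaker-uploaded
        target = dst / path.relative_to(base)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # overwrites an existing file at the target
```

A move with the same semantics would additionally delete each source file once it has been copied.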

renate.utils.file.maybe_download_from_s3(url, local_dir)

Tries to download a file from S3.

Return type:

str

renate.utils.file.download_folder_from_s3(src_bucket, src_object_name, dst_dir)

Downloads a folder from S3 to local disk.

Return type:

None

renate.utils.file.upload_folder_to_s3(local_dir, s3_url=None, dst_bucket=None, prefix=None, ignore_extensions=['.sagemaker-uploading', '.sagemaker-uploaded'])

Uploads all files within a local folder to S3.

Parameters:
  • local_dir (Union[Path, str]) – Folder containing files to be uploaded.

  • s3_url (Union[str, Path, None]) – Full path to the S3 location.

  • dst_bucket (Optional[str]) – Destination S3 bucket.

  • prefix (Optional[str]) – Prefix for all S3 object names.

  • ignore_extensions (List[str]) – List of extensions to ignore.

Return type:

None
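
How prefix and ignore_extensions might shape the resulting object names can be sketched without touching S3. The helper plan_uploads is hypothetical; it assumes that each object name is the file's path relative to local_dir, joined to the prefix with a forward slash, and that ignored extensions are matched against the final suffix. Renate's actual naming may differ.

```python
from pathlib import Path
from typing import Dict, Optional, Sequence, Union


def plan_uploads(
    local_dir: Union[Path, str],
    prefix: Optional[str] = None,
    ignore_extensions: Sequence[str] = (".sagemaker-uploading", ".sagemaker-uploaded"),
) -> Dict[Path, str]:
    """Hypothetical helper: map each local file to the S3 object name it would get."""
    local_dir = Path(local_dir)
    plan: Dict[Path, str] = {}
    for path in sorted(local_dir.rglob("*")):
        if not path.is_file() or path.suffix in ignore_extensions:
            continue
        # S3 object names use forward slashes regardless of the local OS.
        key = path.relative_to(local_dir).as_posix()
        if prefix:
            key = f"{prefix.rstrip('/')}/{key}"
        plan[path] = key
    return plan
```

Each entry of the returned mapping could then be passed to a single-file upload call.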

renate.utils.file.download_file_from_s3(src_bucket, src_object_name, dst)

Downloads a file from S3 to local disk.

Parameters:
  • src_bucket (str) – Source S3 bucket.

  • src_object_name (Union[Path, str]) – Source S3 object.

  • dst (Union[Path, str]) – Local destination.

Return type:

None

renate.utils.file.upload_file_to_s3(src, s3_url=None, dst_bucket=None, dst_object_name=None)

Uploads a file to an S3 bucket.

Parameters:
  • src (Union[Path, str]) – File to upload.

  • s3_url (Union[str, Path, None]) – Full path to the S3 location.

  • dst_bucket (Optional[str]) – Destination S3 bucket.

  • dst_object_name (Union[str, Path, None]) – Destination S3 object.

Return type:

bool

Returns:

True if the file was uploaded, else False.
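
The signature accepts either a full S3 location or a separate bucket and object name. One way the two forms could relate is sketched below with a hypothetical helper, split_s3_url, which splits a full location into a bucket and an object name. This is an assumption about the correspondence, not Renate's code, and it does not cover how the function behaves when both forms are given.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_url(s3_url: str) -> Tuple[str, str]:
    """Hypothetical helper: split "s3://bucket/key" into (bucket, key)."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URL: {s3_url!r}")
    # The leading slash belongs to the URL syntax, not to the object name.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, object_name = split_s3_url("s3://my-bucket/models/checkpoint.pt")
```

Object names containing characters such as ? or # would need extra care, because urlparse treats them as query and fragment delimiters.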

renate.utils.file.delete_file_from_s3(bucket, object_name)

Deletes a file from an S3 bucket.

Parameters:
  • bucket (str) – Bucket in which the object (file) is stored.

  • object_name (str) – Object to be deleted.

Return type:

None

renate.utils.file.extract_file(dataset_name, data_path, file_name)

Extracts a .zip or .tar archive, depending on the file type, into a folder named after the dataset.

Return type:

None
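
A minimal sketch of this kind of extraction with the standard library follows. The function extract_archive is hypothetical; it assumes that the archive sits at data_path/file_name, that the target folder is data_path/dataset_name, and that the format is detected by inspecting the file. None of these details are confirmed by the description above.

```python
import tarfile
import zipfile
from pathlib import Path
from typing import Union


def extract_archive(dataset_name: str, data_path: Union[Path, str], file_name: str) -> None:
    """Hypothetical sketch: unpack a .zip or .tar archive into data_path/dataset_name."""
    archive = Path(data_path) / file_name
    target = Path(data_path) / dataset_name
    target.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    elif tarfile.is_tarfile(archive):
        # tarfile.open detects compression (e.g. gzip) automatically in read mode.
        with tarfile.open(archive) as tf:
            tf.extractall(target)
    else:
        raise ValueError(f"Unsupported archive format: {archive}")
```

Archives from untrusted sources should be checked before extraction, since member paths can point outside the target folder.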

renate.utils.file.download_file(dataset_name, data_path, src_bucket, src_object_name, url, file_name)

A helper function to download data from a URL or S3.

Return type:

None

renate.utils.file.download_and_unzip_file(dataset_name, data_path, src_bucket, src_object_name, url, file_name)

A helper function to download .zip archives and uncompress them.

Return type:

None

renate.utils.file.save_pandas_df_to_csv(df, file_path)

A helper function to save a pandas DataFrame to a .csv file.

It ensures that DataFrames are saved consistently across Renate.

Return type:

DataFrame

Function to remove files and folders.

unlink works for files and rmdir works for empty folders, but neither removes a non-empty folder, hence a recursive solution.

Return type:

None
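
The recursive approach described above can be sketched with pathlib. The function name remove_recursively is hypothetical, and treating symbolic links as plain files (so that their targets are left untouched) is an assumption of this sketch.

```python
from pathlib import Path


def remove_recursively(path: Path) -> None:
    """Hypothetical sketch: delete a file, or a folder together with its contents."""
    if path.is_dir() and not path.is_symlink():
        # Empty the folder first; rmdir only succeeds on empty folders.
        for child in path.iterdir():
            remove_recursively(child)
        path.rmdir()
    else:
        path.unlink()
```

The standard library's shutil.rmtree covers the folder case as well; the explicit recursion here only mirrors the description.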