Run AWS Train Sagemaker Model

Description

This job triggers a training job for a specific Sagemaker model.

Use Cases

We recommend creating a job for each model you want to train.

This way, you can use Orchestra to trigger your model training on a cron or event-based schedule. This has a number of advantages over using AWS' built-in workflows feature.

  • You can coordinate tasks outside of AWS - these would typically be other training jobs, ingestion jobs that run prior to training, or other tasks in AWS-adjacent environments, e.g. Snowflake queries

  • You can use Orchestra to trigger jobs across AWS Accounts / Environments

  • When AWS Sagemaker training jobs run, cost is incurred. Running these operations on a schedule you set explicitly ensures these costs do not get out of hand

  • We aggregate metadata from the AWS Sagemaker task in the same place as the metadata from other operations in your Pipeline

Parameters

Orchestra uses the boto3 library and the create_training_job API. Documentation can be found here.

In Sagemaker, each training job name must be unique. Orchestra therefore requires the user to enter a prefix for the training job name and appends a unique identifier to the end of it. The identifier is the current UTC datetime in the format 'YYYY-MM-DD-HH-MM-SS'.
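
For illustration, here is a minimal Python sketch of how such a name could be derived. The helper name and the hyphen separator are assumptions for this example, not Orchestra's actual implementation:

from datetime import datetime, timezone

def unique_training_job_name(prefix: str) -> str:
    # Append the current UTC datetime so each generated name is unique.
    suffix = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H-%M-%S")
    return f"{prefix}-{suffix}"

print(unique_training_job_name("churn-model"))
# e.g. churn-model-2024-05-01-12-30-45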

The other parameters for the task must be configured in the Parameters text block as a JSON object. The keys of the JSON object must be in CamelCase to match the keys expected by the boto3 library. An example of the parameters can be found below:

{
  "HyperParameters": {},
  "AlgorithmSpecification": {
    "TrainingImage": "<ECR_IMAGE_URL>",
    "TrainingInputMode": "File"
  },
  "RoleArn": "<ROLE_ARN>",
  "InputDataConfig": [
    {
      "ChannelName": "train",
      "DataSource": {
        "S3DataSource": {
          "S3DataType": "S3Prefix",
          "S3Uri": "s3://<BUCKET>/<KEY>",
          "S3DataDistributionType": "FullyReplicated"
        }
      },
      "ContentType": "text/csv",
      "CompressionType": "None"
    }
  ],
  "OutputDataConfig": {
    "S3OutputPath": "s3://<BUCKET>/<KEY>/"
  },
  "ResourceConfig": {
    "InstanceType": "<INSTANCE_TYPE>",
    "InstanceCount": 1,
    "VolumeSizeInGB": 8
  },
  "Environment": {},
  "StoppingCondition": {
    "MaxRuntimeInSeconds": 43200
  }
}
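
Under the hood, a task configured this way corresponds to a single create_training_job call. The sketch below shows the equivalent boto3 invocation, assuming the JSON object above has been saved to a local params.json with the placeholders filled in; the file name and job name are illustrative:

import json

import boto3

# Load the parameters shown above (with <...> placeholders replaced by real values).
with open("params.json") as f:
    params = json.load(f)

client = boto3.client("sagemaker")

# The training job name is the user-supplied prefix plus the UTC datetime suffix.
response = client.create_training_job(
    TrainingJobName="churn-model-2024-05-01-12-30-45",
    **params,
)
print(response["TrainingJobArn"])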
