Run AWS Glue ETL Job

Description

This task triggers a run of an ETL job in AWS Glue.

Use Cases

We recommend creating an ETL job in AWS Glue for each ingestion or transformation task you wish to perform.

This way, you can use Orchestra to trigger your ETL jobs on a cron or event-based schedule. This has a number of advantages over using AWS's built-in workflows feature:

  • You can co-ordinate tasks outside of AWS - these would typically be other ETL jobs, other notebooks, or other tasks in AWS-adjacent environments e.g. Snowflake queries

  • You can use Orchestra to trigger jobs across AWS Accounts / Environments

  • AWS Glue jobs incur cost whenever they run. Running them on a schedule you set explicitly helps keep these costs from getting out of hand

  • We aggregate metadata from the AWS Glue Task in the same place as the metadata from other operations in your Pipeline

Parameters and setup

These parameters are required to run the Run AWS Glue ETL Job task:

Name        Data type   Restrictions   Example

Job name    String      N.A.           S3 Ingestion

Arguments   JSON        JSON format

Region      String      AWS region     us-east-1
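
For orientation, the three parameters above correspond closely to what a direct call to the AWS SDK would need. The sketch below is an illustration only and not Orchestra's implementation. It assumes Python with boto3 installed and AWS credentials configured; the helper names build_run_request and start_glue_job are hypothetical. The "--" prefix added to argument names follows the convention Glue scripts use when reading arguments by name; whether Orchestra adds it for you is not stated here, so treat that step as an assumption.

```python
import json


def build_run_request(job_name, arguments_json="{}"):
    """Map the task parameters onto keyword arguments for Glue's StartJobRun.

    Hypothetical helper for illustration; not part of Orchestra or boto3.
    """
    if not job_name:
        raise ValueError("Job name is required")

    arguments = json.loads(arguments_json)  # raises ValueError on invalid JSON
    if not isinstance(arguments, dict):
        raise ValueError("Arguments must be a JSON object")

    request = {"JobName": job_name}
    if arguments:
        # Assumption: prefix names with "--" and send values as strings.
        request["Arguments"] = {
            (key if key.startswith("--") else "--" + key): str(value)
            for key, value in arguments.items()
        }
    return request


def start_glue_job(job_name, region, arguments_json="{}"):
    """Start the job and return its run ID. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_run_request works without boto3

    glue = boto3.client("glue", region_name=region)
    response = glue.start_job_run(**build_run_request(job_name, arguments_json))
    return response["JobRunId"]
```

With the example values from the table, this would be called as start_glue_job("S3 Ingestion", "us-east-1").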

Create AWS Glue ETL job

  1. Navigate to the AWS Glue console

  2. Create an ETL job
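
If you prefer to script this step rather than use the console, a job can also be created through the AWS SDK. The following is a minimal sketch, assuming Python with boto3 and AWS credentials that are allowed to create Glue jobs. The helper names are hypothetical, and the role ARN and script location shown in the usage note are placeholders you would replace with your own values; the Glue version is likewise only an example.

```python
def build_job_definition(name, role_arn, script_location):
    """Assemble keyword arguments for Glue's CreateJob (hypothetical helper)."""
    return {
        "Name": name,
        "Role": role_arn,  # IAM role the job runs as
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,  # S3 path to the job script
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",  # example value; choose the version you need
    }


def create_glue_job(region, name, role_arn, script_location):
    """Create the job and return its name. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue", region_name=region)
    response = glue.create_job(
        **build_job_definition(name, role_arn, script_location)
    )
    return response["Name"]
```

For example, create_glue_job("us-east-1", "S3 Ingestion", "arn:aws:iam::123456789012:role/ExampleGlueRole", "s3://example-bucket/scripts/s3_ingestion.py"), where the ARN and S3 path are placeholders.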

Using Arguments

You can set arguments to send when starting your ETL job. These must be in JSON format, for example:

{
  "environment": "staging"
}
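
Inside the Glue job script, arguments are usually read by name. The sketch below assumes the argument above reaches the script as "--environment staging" on the command line, which is the convention Glue's getResolvedOptions helper expects; how Orchestra forwards the names is an assumption here. The function read_job_arguments is a hypothetical wrapper: it uses getResolvedOptions when the awsglue library is available (as it is inside Glue) and falls back to argparse so the script can be tried locally.

```python
import argparse
import sys


def read_job_arguments(argv, names):
    """Return the named job arguments as a dict (hypothetical helper)."""
    try:
        # Available inside the AWS Glue runtime.
        from awsglue.utils import getResolvedOptions

        return getResolvedOptions(argv, names)
    except ImportError:
        # Local fallback: parse "--name value" pairs, ignoring anything else.
        parser = argparse.ArgumentParser()
        for name in names:
            parser.add_argument("--" + name, required=True)
        known, _unknown = parser.parse_known_args(argv[1:])
        return vars(known)


if __name__ == "__main__" and "--environment" in sys.argv:
    args = read_job_arguments(sys.argv, ["environment"])
    print("Running against environment:", args["environment"])
```

Reading the value this way lets the same script behave differently per run, for instance writing to a staging location when environment is "staging".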
