🗒️Tasks

The smallest unit of work.

A Task is the most basic unit of execution in Orchestra. Tasks are arranged into pipelines, and upstream and downstream dependencies are set between them to express the order in which they should run.
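As a rough illustration of how dependencies determine run order, the sketch below sorts a small pipeline so that every upstream task comes before its downstream tasks. The task names and the `execution_order` helper are hypothetical and are not part of Orchestra; the sorting itself uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of upstream tasks it depends on.
dependencies = {
    "extract": set(),
    "dbt_run": {"extract"},
    "refresh_dashboard": {"dbt_run"},
}


def execution_order(deps):
    """Return task names with every upstream task ahead of its downstream tasks."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because this example is a simple chain, there is only one valid order; with branching pipelines, several orders can satisfy the same dependencies.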

Unlike in general-purpose workflow orchestration tools, tasks in Orchestra are not arbitrary pieces of code waiting to be executed: Orchestra does not currently support running arbitrary code as a task. Instead, users execute Integration Jobs, which you can think of as pre-configured tasks. The closest analog in Airflow would be Operators.

For example, users can execute a dbt run job, which triggers a job in dbt Cloud, configures monitoring, monitors the task, and fetches, stores, and cleans the resulting metadata. The only required parameter is a job ID. Doing this with a workflow orchestration tool would require multiple files and a considerable amount of code, so we think this is a significant improvement in managing complexity.

Features

Tasks come with out-of-the-box features designed for robust orchestration and a seamless user experience. These are grouped into:

  • Error handling

  • Dependency limitations

  • Cancelling

  • Efficient monitoring (Alpha)

Error handling

There are three ways a task can fail:

  1. Job failure: triggering the integration job succeeds but the job itself fails. For example, Orchestra may successfully trigger a dbt job, but the dbt job itself may fail.

  2. API failure: in this event, the triggering of the job fails. This could be due to a mis-specified configuration (e.g. an incorrect job ID) or an expired token.

  3. Internal failure: a process internal to Orchestra results in a failure.

If any task's dependencies have failed (i.e. Task A depends on Task B, and the Task run for Task B is in a "FAILED" state), then the task will not run and will be set to a "SKIPPED" state. All downstream dependencies for the skipped task will also be set to "SKIPPED".
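The skip behaviour described above can be sketched as a single pass over the tasks in dependency order. This is only an illustration, not Orchestra's implementation: the `resolve_states` helper is hypothetical, and the "SUCCEEDED" state name is an assumption (only "FAILED" and "SKIPPED" are named on this page).

```python
from graphlib import TopologicalSorter


def resolve_states(deps, failed):
    """Work out the final state of each task.

    deps:   maps each task to the set of upstream tasks it depends on.
    failed: set of tasks whose own run ends in a FAILED state.
    """
    states = {}
    # static_order() yields upstream tasks before the tasks that depend on them,
    # so every upstream state is known by the time a task is examined.
    for task in TopologicalSorter(deps).static_order():
        upstream = deps.get(task, set())
        if any(states[u] in ("FAILED", "SKIPPED") for u in upstream):
            # A failed or skipped upstream task means this task never runs.
            states[task] = "SKIPPED"
        elif task in failed:
            states[task] = "FAILED"
        else:
            states[task] = "SUCCEEDED"  # assumed name for a successful run
    return states


# Task A depends on Task B, Task C depends on Task A, Task D is independent.
example = resolve_states({"A": {"B"}, "C": {"A"}, "D": set()}, failed={"B"})
print(example)
```

Here Task B fails, so Task A is skipped, and Task C is skipped in turn because its upstream task was skipped. Task D has no failed dependencies and runs normally.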

Dependency limitations

  • Dependencies: any task can depend on any number of other tasks, but tasks cannot have circular dependencies.

  • Failure behaviour: by default, if an upstream task fails, its downstream tasks are not executed (either in Orchestra or in the underlying platform)

    • (Beta) This feature can be turned off

  • Retries: on an API failure, Orchestra will by default retry the operation 3 times. The number of retries can be set at the Pipeline or Task level

  • Timeouts: because tasks can be long-running, no timeouts are set on operations for now. We expect to introduce timeouts that vary by task type, and for users to set timeouts explicitly, as this encourages active monitoring of task duration. These can be set at the Pipeline or Task level
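Two of the rules above, rejecting circular dependencies and retrying on an API failure, can be sketched as follows. This is a minimal illustration under stated assumptions, not Orchestra's code: `ApiFailure`, `has_cycle` and `trigger_with_retries` are hypothetical names, and "retry 3 times" is interpreted here as one initial attempt plus up to three further attempts.

```python
from graphlib import CycleError, TopologicalSorter


class ApiFailure(Exception):
    """Hypothetical error standing in for a failed attempt to trigger a job."""


def has_cycle(deps):
    """Return True if the task dependencies contain a circular dependency."""
    try:
        TopologicalSorter(deps).prepare()  # raises CycleError when a cycle exists
    except CycleError:
        return True
    return False


def trigger_with_retries(trigger, retries=3):
    """Call trigger(); on ApiFailure, try again up to `retries` more times.

    Returns (result, attempts_used). Re-raises the last ApiFailure once the
    retries are exhausted. Other exceptions are not retried.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return trigger(), attempts
        except ApiFailure:
            if attempts > retries:
                raise


# Usage: a stand-in trigger that fails twice before succeeding.
calls = {"count": 0}


def flaky_trigger():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ApiFailure("simulated trigger failure")
    return "triggered"


result, attempts_used = trigger_with_retries(flaky_trigger)
print(result, attempts_used)
print(has_cycle({"A": {"B"}, "B": {"A"}}))
```

Only the assumed API failure type is retried in this sketch, which mirrors the distinction drawn earlier between a failed trigger and a job that was triggered successfully but then failed.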

Cancelling

Generally, Task runs can be cancelled if the integration supports cancelling the task. When cancelling is necessary, this lets Orchestra terminate redundant processes in underlying systems (such as Snowflake), which saves you cloud costs.

Efficient monitoring (Alpha)

Where possible, Orchestra configures webhooks when Pipelines are saved and when Pipeline Runs are created, so that tasks can be monitored efficiently.
