🗒️Tasks

The smallest unit of work.

A Task is the most basic unit of execution in Orchestra. Tasks are arranged into Pipelines, and they have upstream and downstream dependencies set between them in order to express how they should run.
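As a conceptual illustration only (this is not Orchestra's API or configuration format), a Pipeline can be pictured as a directed acyclic graph in which each Task lists its upstream Tasks. The sketch below uses Python's standard-library `graphlib` to derive a valid run order. The task names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical pipeline: each key is a task, each value is the set of
# upstream tasks it depends on. Names are illustrative only.
pipeline = {
    "extract": set(),
    "load": {"extract"},
    "dbt_run": {"load"},
    "refresh_dashboard": {"dbt_run"},
}


def execution_order(dependencies):
    """Return one valid run order, with upstream tasks first.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(pipeline)
```

With these inputs, `order` lists `extract` first and `refresh_dashboard` last. Passing a graph such as `{"a": {"b"}, "b": {"a"}}` raises `CycleError`, which is one way a scheduler could reject circular dependencies.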

Unlike in other workflow orchestration tools, Tasks are not arbitrary pieces of code waiting to be executed: Orchestra does not currently support running arbitrary code as a Task. Instead, users execute Integration Jobs, which you can think of as pre-configured tasks. The closest analog in Airflow is an Operator.

For example, users can execute a dbt run job, which triggers a job in dbt Cloud, configures monitoring, monitors the task, and fetches, stores, and cleans the resulting metadata. The only required parameter is a job ID. Doing the same in another workflow orchestration tool would require multiple files and a fair amount of code, so we think this is a significant improvement in managing complexity.

Features

Tasks come with out-of-the-box features that make orchestration robust and the user experience seamless. These are grouped into:

  • Error handling

  • Dependency limitations

  • Cancelling

  • Efficient monitoring (Alpha)

Error handling

A task can fail in three ways:

  1. Job failure: triggering the integration job succeeds but the job itself fails. For example, Orchestra may successfully trigger a dbt job, but the dbt job itself may fail.

  2. API failure: in this event, the triggering of the job fails. This could be due to a mis-specified configuration (e.g. an incorrect job ID) or an expired token.

  3. Internal failure: a process internal to Orchestra results in a failure.

If any of a task's dependencies has failed (e.g. Task A depends on Task B, and the Task run for Task B is in a "FAILED" state), the task will not run and will be set to a "SKIPPED" state. All downstream dependencies of the skipped task will also be set to "SKIPPED".
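The skip rule above can be sketched as a single pass over the tasks in dependency order. This is a minimal illustration, not Orchestra's implementation: the `resolve_states` function is hypothetical, and the "SUCCEEDED" state name is an assumption (the text only names "FAILED" and "SKIPPED").

```python
from graphlib import TopologicalSorter


def resolve_states(dependencies, failed):
    """Assign a final state to every task.

    dependencies: maps each task to the set of its upstream tasks.
    failed: the set of tasks whose run ends in "FAILED" when executed.
    """
    states = {}
    # static_order() yields upstream tasks before their dependents,
    # so every upstream state is known when a task is visited.
    for task in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(task, set())
        if any(states[u] in ("FAILED", "SKIPPED") for u in upstream):
            # A failed or skipped upstream task means this task never runs.
            states[task] = "SKIPPED"
        elif task in failed:
            states[task] = "FAILED"
        else:
            states[task] = "SUCCEEDED"  # assumed name for the success state
    return states


# Task A depends on Task B, Task C depends on Task A, Task D is independent.
example = {"B": set(), "A": {"B"}, "C": {"A"}, "D": set()}
states = resolve_states(example, failed={"B"})
```

Here Task B is "FAILED", so Task A is "SKIPPED", and Task C is "SKIPPED" in turn because its upstream task was skipped. Task D has no failed upstream task and runs normally.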

Dependency Limitations

  • Dependencies: any task can depend on any number of other tasks. Tasks cannot have circular dependencies.

  • Failure behaviour: by default, downstream tasks are not executed (either in Orchestra or in the underlying platform) if an upstream task has failed. This behaviour can be modified in Orchestra using branching.

  • Retries: on an API failure, Orchestra will by default retry the operation 3 times with an exponential backoff strategy.

    • (beta) In the future, Orchestra will allow this configuration to be set at the Pipeline or Task level.

  • Timeouts: by default, Orchestra waits 1 minute for an integration platform to respond. This is combined with the retry logic outlined above to improve reliability in your pipeline. Task runs are stopped after 6 hours, but this limit can be configured if your use case requires it.
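The retry behaviour described in the list above can be sketched as follows. This is an illustration under stated assumptions, not Orchestra's implementation: `ApiError`, `call_with_retries`, the 1-second base delay, and the doubling factor are all hypothetical, and "retry 3 times" is read here as one initial attempt plus up to three retries.

```python
import time


class ApiError(Exception):
    """Hypothetical error standing in for a failed trigger request."""


def call_with_retries(operation, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on ApiError, retry up to `retries` times.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The base delay and the factor of 2 are assumptions for illustration.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except ApiError:
            if attempt == retries:
                # All retries are used up: surface the failure to the caller.
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

The `sleep` parameter is injectable so that the backoff schedule can be inspected without actually waiting, for example by passing `delays.append` for a list named `delays`.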

Cancelling

Generally, a Task run can be cancelled if the underlying integration supports cancellation. When cancelling is necessary, Orchestra can terminate redundant processes in the underlying system (such as Snowflake), which saves you cloud costs.

Efficient Monitoring (Alpha)

Where possible, Orchestra configures webhooks when Pipelines are saved and when Pipeline Runs are created, so that tasks can be monitored efficiently.
