🗒️Tasks
The smallest unit of work.
A Task is the most basic unit of execution in Orchestra. Tasks are arranged into Pipelines, and they have upstream and downstream dependencies set between them in order to express how they should run.
Unlike in other workflow orchestration tools, Tasks are not arbitrary pieces of code waiting to be executed: Orchestra does not currently support running arbitrary code as a Task. Instead, users execute Integration Jobs, which you can think of as pre-configured tasks. The closest analog in Airflow is the Operator.
For example, users can execute a dbt run job, which triggers a dbt job in dbt Cloud, configures monitoring, monitors the task, and fetches, stores, and cleans metadata. The only parameter required is a job id. Doing this in other workflow orchestration tools would require multiple files and quite a lot of code, so we think this is a great improvement in managing complexity.
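To make the idea of upstream and downstream dependencies concrete, here is a minimal sketch in Python. It is not Orchestra's API or configuration format: the task names and the `run_order` helper are hypothetical, and the ordering logic uses the standard library's `graphlib` module (Python 3.9+) purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of upstream tasks
# it depends on. These names are illustrative only.
pipeline = {
    "extract": set(),
    "load": {"extract"},
    "transform": {"load"},
}

def run_order(dependencies):
    """Return one valid execution order in which every task appears
    after all of its upstream dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())

order = run_order(pipeline)
```

Because each task here has a single upstream dependency, the only valid order is `extract`, then `load`, then `transform`.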
Features
Tasks are equipped with out-of-the-box features that ensure extremely robust orchestration and a seamless user experience. These are grouped into:
Error handling
Dependency limitations
Cancelling
Efficient monitoring (Alpha)
Error handling
A task can fail in three ways:
Job failure: triggering the integration job succeeds but the job itself fails. For example, Orchestra may successfully trigger a dbt job, but the dbt job itself may fail.
API failure: in this event, the triggering of the job fails. This could be due to a mis-specified configuration (e.g. an incorrect job ID) or an expired token.
Internal failure: a process internal to Orchestra results in a failure.
If any task's dependencies have failed (i.e. Task A depends on Task B, and the Task run for Task B is in a "FAILED" state), then the task will not run and will be set to a "SKIPPED" state. All downstream dependencies for the skipped task will also be set to "SKIPPED".
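The skip behaviour described above can be sketched as a graph traversal. This is an illustration of the rule, not Orchestra's implementation: the `propagate_skips` function, the dictionary layout, and the task names are all assumptions made for the example.

```python
from collections import deque

def propagate_skips(dependencies, failed):
    """Return the set of tasks that end up "SKIPPED".

    dependencies: maps each task to the set of upstream tasks it depends on.
    failed: set of tasks whose runs are in a "FAILED" state.
    """
    # Invert the mapping so we can walk from a task to its downstream tasks.
    downstream = {}
    for task, upstreams in dependencies.items():
        downstream.setdefault(task, set())
        for upstream in upstreams:
            downstream.setdefault(upstream, set()).add(task)

    skipped = set()
    queue = deque(failed)
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, ()):
            # A task with a failed or skipped upstream is skipped, and
            # the skip then cascades to its own downstream tasks.
            if child not in skipped and child not in failed:
                skipped.add(child)
                queue.append(child)
    return skipped

# Hypothetical pipeline: A depends on B, C depends on A, D is independent.
deps = {"A": {"B"}, "B": set(), "C": {"A"}, "D": set()}
skipped_tasks = propagate_skips(deps, {"B"})
```

Here `B` has failed, so `A` is skipped, and `C` is skipped because it depends on `A`. `D` has no failed or skipped upstream task and is unaffected.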
Dependency Limitations
Dependencies: any task can depend on any number of other tasks. Tasks cannot have circular dependencies.
Failure behaviour: by default, downstream tasks are not executed (either in Orchestra or in the underlying platform) if an upstream task has failed. This behaviour can be modified in Orchestra using branching.
Retries: on an API failure, Orchestra will by default retry the operation 3 times with an exponential backoff strategy.
(beta) In the future, Orchestra will allow this configuration to be set at the Pipeline or Task level.
Timeouts: by default, Orchestra waits 1 minute for integration platforms to respond. This is combined with the retry logic outlined above to ensure maximum reliability in your pipeline. Task runs are stopped after 6 hours in our system, but this can be configured if required for your use case.
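As an illustration of the retry behaviour described above, here is a minimal sketch of retrying with exponential backoff. It is not Orchestra's implementation. The `ApiFailure` exception, the `trigger_with_retries` function, the 1-second base delay, and the doubling factor are all assumptions, and the sketch reads "retry 3 times" as one initial attempt followed by up to three retries.

```python
import time

class ApiFailure(Exception):
    """Hypothetical error raised when triggering a job fails."""

def trigger_with_retries(trigger, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call trigger(), retrying on ApiFailure with exponential backoff.

    The delay doubles after each failed attempt: base_delay, then
    2 * base_delay, then 4 * base_delay, and so on (assumed values).
    The last ApiFailure is re-raised once all retries are used up.
    """
    attempt = 0
    while True:
        try:
            return trigger()
        except ApiFailure:
            if attempt >= retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

The `sleep` parameter is injectable so the backoff schedule can be inspected without actually waiting, for example by passing `delays.append` for some list `delays`.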
Cancelling
Generally, a Task run can be cancelled if the integration supports cancellation. This lets Orchestra terminate redundant processes in underlying systems (such as Snowflake) when cancelling is necessary, which saves you cloud costs.
Efficient Monitoring (Alpha)
Where possible, Orchestra configures webhooks once Pipelines are saved and when Pipeline Runs are created in order to efficiently monitor tasks.