Running Pipelines

How to configure Orchestra to execute your pipelines

In addition to providing a UI for defining what pipelines you want to run, Orchestra also manages the execution of your data pipelines.

A key distinction between Orchestra and open-source workflow orchestration tools is that Orchestra focuses purely on orchestration. There is no heavy compute within the platform - we integrate out of the box with your other data tools (such as Snowflake or EC2 instances) and recommend you leverage these purpose-built services for your heavy compute.

Running Pipelines using triggers

Whether and when a pipeline runs is determined by its triggers. A pipeline can have multiple triggers, and you can read more about these here.

Retrying Pipelines

Orchestra has powerful retry logic built into the platform. If a pipeline fails, we recommend you use Orchestra to find the source of the error, then fix it in the underlying platform.

For example, if an EC2 task fails (for whatever reason), we recommend you debug it and fix the error in the repository that stores the code running on EC2. You can then head over to the Orchestra portal and "Re-run from Failed".

This approach applies to any of your Integrations. In the example below, a single test fails while the dbt Cloud Task runs. After fixing the test in the dbt repo and pushing to main, users can return to Orchestra and "Re-run from Failed".

Orchestra is intelligent enough to know where to retry from: completed nodes are started in a "SKIPPED" state, and only the underlying failed jobs are retried. In this case, only one test has failed, so Orchestra runs just that test and its downstream dependencies.

This avoids redundant compute and lets you "recover" from failed runs in the same place your Production pipelines run from. No other workflow orchestration tool offers this for third-party SaaS tools, because their plugin integrations are too lightweight.
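Conceptually, retry-from-failed is a graph computation: nodes that already completed keep their previous success (SKIPPED), while failed nodes and everything downstream of them are re-run. The sketch below is purely illustrative - the function name `plan_retry` and the data shapes are assumptions for this example, not part of Orchestra's API:

```python
from collections import deque

def plan_retry(edges, statuses):
    """Decide what each node does on a retry-from-failed.

    edges:    DAG as {node: [downstream nodes...]}
    statuses: last-run status per node ("SUCCESS" or "FAILED")

    Failed nodes and everything downstream of them are re-run;
    every other node is marked SKIPPED (its prior success stands).
    """
    failed = {n for n, s in statuses.items() if s == "FAILED"}
    to_run = set(failed)
    queue = deque(failed)
    while queue:  # BFS over downstream dependencies
        for child in edges.get(queue.popleft(), []):
            if child not in to_run:
                to_run.add(child)
                queue.append(child)
    return {n: ("RUN" if n in to_run else "SKIPPED") for n in statuses}

# One failed dbt test: only that test and its downstream node re-run.
edges = {"model": ["test_a", "test_b"], "test_b": ["publish"]}
statuses = {"model": "SUCCESS", "test_a": "SUCCESS",
            "test_b": "FAILED", "publish": "FAILED"}
print(plan_retry(edges, statuses))
# → {'model': 'SKIPPED', 'test_a': 'SKIPPED', 'test_b': 'RUN', 'publish': 'RUN'}
```

Note that `model` and `test_a` are never re-executed, which is exactly how redundant compute is avoided.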

Platform Notes

dbt Cloud

When retrying jobs that include dbt Cloud Tasks, we make use of dbt Cloud's retry endpoint. This endpoint works by looking at the last run for a given Job ID. This means that if users manually retry dbt Cloud jobs via the dbt Cloud UI while debugging DAGs in Orchestra, it can lead to unexpected behaviour. An example is below:

  • Run in Orchestra results in a failed dbt Cloud Task

  • User fixes the dbt model

  • User runs the dbt model from dbt Cloud

    • This creates a new job run ID which Orchestra is not aware of

    • This job run ID succeeds

  • User re-runs the Pipeline from failed in Orchestra

  • Rather than retrying from the failed node, Orchestra sees that the latest job run ID succeeded, so it will not run anything in dbt Cloud

  • Orchestra runs downstream dependencies
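The sequence above boils down to "only the latest run for a Job ID is consulted", so an out-of-band manual run can hide the failure from the retry. A minimal sketch of that decision logic, assuming a hypothetical `retry_outcome` helper (this simulates the behaviour; it does not call the real dbt Cloud API):

```python
def retry_outcome(run_history):
    """Mimic dbt Cloud-style retry semantics: only the *latest* run
    for a Job ID is consulted. If it succeeded, there is nothing to
    retry, even if an earlier run (e.g. the one Orchestra triggered)
    failed. Illustrative sketch only.
    """
    latest = run_history[-1]
    if latest["status"] == "SUCCESS":
        return "NOTHING_TO_RETRY"   # retry endpoint has no failed run to resume
    return "RETRY_FROM_FAILED"

# Orchestra's run fails, then the user reruns manually in the dbt Cloud UI:
history = [
    {"run_id": 101, "status": "FAILED"},   # triggered by Orchestra
    {"run_id": 102, "status": "SUCCESS"},  # manual run, unknown to Orchestra
]
print(retry_outcome(history))
# → NOTHING_TO_RETRY
```

With only the Orchestra-triggered failure in the history, the outcome would be `RETRY_FROM_FAILED` - which is why avoiding manual reruns in the dbt Cloud UI keeps the retry behaviour predictable.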
