Running Pipelines
How to configure Orchestra to execute your pipelines
In addition to providing a UI for defining what pipelines you want to run, Orchestra also manages the execution of your data pipelines.
A key distinction between Orchestra and open-source workflow orchestration tools is that Orchestra focuses on orchestration. This means that there is no heavy compute within the platform. We integrate out of the box with your other data tools (such as Snowflake or EC2 instances) and recommend you use these purpose-built services to manage your heavy compute.
Running Pipelines using triggers
A trigger determines when a pipeline runs. A pipeline can have multiple triggers, and you can read more about these here.
Retrying Pipelines
Orchestra has powerful retry logic built into the platform. If a pipeline fails, we recommend you use the Orchestra platform to find the source of the error, and then fix it in the underlying platform.
For example, if an EC2 task fails (for whatever reason), we recommend you debug it and fix the error in the repository that stores the code running in EC2. You can then head over to the Orchestra portal and "Re-run from Failed".
This approach applies to any of your Integrations. In the example below, while the dbt Cloud Task runs, a single test fails. Upon fixing the test in the dbt repo and pushing to main, users can return to Orchestra and "Re-run from Failed".
Orchestra is intelligent enough to know where to retry from: it marks already-completed nodes as "SKIPPED" and retries from the underlying failed jobs. In this case, only one test has failed, so Orchestra only runs that test and its downstream dependencies.
This helps avoid redundant compute and allows you to "recover" from failed runs in the same place where pipelines run in Production. This is not something any workflow orchestration tool offers for third-party SaaS tools, because the plugin integrations are too lightweight.
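The "Re-run from Failed" behaviour described above can be pictured with a small sketch. This is only an illustration of the idea, not Orchestra's implementation: the function name `plan_rerun`, the status strings other than "SKIPPED", and the task names are all hypothetical. It assumes a pipeline is a DAG given as a mapping from each task to its downstream tasks, and that every task that did not succeed, plus everything downstream of it, should run again.

```python
from collections import deque


def plan_rerun(downstream, last_status):
    """Return a plan mapping each task to 'RUN' or 'SKIPPED'.

    downstream:  task -> list of tasks that depend on it (hypothetical shape)
    last_status: task -> status from the previous run (hypothetical values)
    """
    # Seed with every task that did not succeed in the previous run.
    to_run = {task for task, status in last_status.items() if status != "SUCCEEDED"}
    queue = deque(to_run)
    # Walk the DAG breadth-first so all downstream dependencies are re-run too.
    while queue:
        task = queue.popleft()
        for child in downstream.get(task, []):
            if child not in to_run:
                to_run.add(child)
                queue.append(child)
    return {task: ("RUN" if task in to_run else "SKIPPED") for task in last_status}


# Hypothetical pipeline: one dbt test failed, so nothing after it ran.
downstream = {
    "extract": ["dbt_model"],
    "dbt_model": ["dbt_test"],
    "dbt_test": ["publish"],
    "publish": [],
}
last_status = {
    "extract": "SUCCEEDED",
    "dbt_model": "SUCCEEDED",
    "dbt_test": "FAILED",
    "publish": "NOT_RUN",
}
plan = plan_rerun(downstream, last_status)
```

In this sketch, `extract` and `dbt_model` are planned as "SKIPPED", while the failed test and the `publish` task that depends on it are planned to run.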
Platform Notes
dbt Cloud
When retrying jobs that include dbt Cloud Tasks, we make use of dbt Cloud's retry endpoint. The dbt Cloud retry functionality relies on looking at the last run for a given Job ID. This means that if users manually retry dbt Cloud jobs via the dbt Cloud UI while trying to debug DAGs using Orchestra, it can lead to unexpected behaviour. An example is below:
Run in Orchestra results in a failed dbt Cloud Task
User fixes the dbt model
User runs the dbt model from dbt Cloud
This creates a new job run ID which Orchestra is not aware of
This job run ID succeeds
User re-runs the Pipeline from failed in Orchestra
Because the latest job run ID succeeded, Orchestra does not retry from the failed node and will not be able to run anything in dbt Cloud
Orchestra runs downstream dependencies
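The sequence above can be mimicked with a toy model. This is a hedged sketch that assumes only what is stated here, namely that a retry inspects the last run for a given Job ID. The class `ToyDbtJob`, its methods, and the status strings are hypothetical and do not correspond to the real dbt Cloud API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDbtJob:
    """Hypothetical stand-in for a dbt Cloud job; not the real API."""

    # Each entry is (run_id, status, started_by).
    runs: list = field(default_factory=list)

    def record_run(self, status, started_by):
        """Append a run to the job's history and return its run ID."""
        run_id = len(self.runs) + 1
        self.runs.append((run_id, status, started_by))
        return run_id

    def retry_from_last_run(self):
        """Mimic a retry that only inspects the most recent run of the job."""
        run_id, status, _ = self.runs[-1]
        if status == "FAILED":
            return f"retrying failed nodes of run {run_id}"
        return f"nothing to retry: run {run_id} succeeded"


job = ToyDbtJob()
job.record_run("FAILED", started_by="orchestra")        # run triggered by Orchestra fails
job.record_run("SUCCEEDED", started_by="dbt_cloud_ui")  # manual run after the fix succeeds
outcome = job.retry_from_last_run()                     # "Re-run from Failed" in Orchestra
```

Because the manual run is now the latest run for the job, the retry in this sketch finds nothing to do, which mirrors the case where Orchestra moves straight on to downstream dependencies.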