Run Query
Description
This job runs a SQL query in Snowflake asynchronously and polls its status periodically.
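The submit-then-poll pattern can be sketched as follows. Note that `submit_query` and `get_status` are hypothetical stand-ins for the underlying Snowflake driver calls, not part of Orchestra's API:

```python
import time

def run_query_async(submit_query, get_status, poll_interval=1.0):
    """Submit a query, then poll its status until it reaches a terminal state.

    `submit_query` and `get_status` are hypothetical stand-ins for the
    Snowflake driver calls; this is a sketch of the pattern, not Orchestra's
    actual implementation.
    """
    query_id = submit_query()
    while True:
        status = get_status(query_id)
        if status in ("SUCCESS", "FAILED"):
            return status
        time.sleep(poll_interval)

# Example: a fake query that reports RUNNING twice, then SUCCESS.
statuses = iter(["RUNNING", "RUNNING", "SUCCESS"])
result = run_query_async(lambda: "qid-1", lambda qid: next(statuses), poll_interval=0)
# result == "SUCCESS"
```

Polling rather than blocking means a long-running warehouse query does not tie up the orchestrator while it executes.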
Use Cases
We recommend creating a Snowflake Run Query Task for every query you need to run within Snowflake.
In Snowflake, almost everything can be done using a query, from simple SQL statements to anything in the DDL/DML spectrum. You can also run data quality tests, dynamically assign roles, and clone tables using Snowflake queries.
Snowflake also has a number of new and exciting use cases, such as provisioning Snowpark Container Services, Dynamic Tables, and Snowflake Tasks (which can render sub-dags). These are all supported in Orchestra too. There are a number of advantages to triggering Snowflake queries in Orchestra vs. other workflow orchestration tools:

- You can coordinate tasks outside of Snowflake: typically other Snowflake jobs, data loading steps, or tasks in Snowflake-adjacent environments such as dbt or a BI tool
- You can use Orchestra to trigger jobs across Snowflake Accounts / Environments
- When Snowflake jobs run, Data Warehouse cost is incurred. Running these operations on a schedule you set explicitly ensures these costs do not get out of hand. Orchestra also fetches the cost per query, which users can monitor and access easily
- We aggregate metadata from the Snowflake Task in the same place as the metadata from other operations in your Pipeline
Parameters
These parameters are required to run the Snowflake Run Query Task:

| Name | Data type | Restrictions | Example |
| --- | --- | --- | --- |
| Query | String | N.A. | `SELECT * FROM TABLE` |
Variants
Orchestra has a variant of the Run Query Task that runs a query as a data quality test. This closely resembles how SQL tests work in popular packages like dbt, Great Expectations, and Elementary today.
To use this option, select the Run Test Task. From there, you can choose your error and warning expressions. By default, the task will fail if the number of results returned by Snowflake is greater than 0. This can be helpful for enforcing data quality tests for teams that wish to interact with Snowflake but are not comfortable using dbt.
It can also be helpful for highly custom testing. Specifying every single custom test in dbt using a macro can be arduous; where this is not preferred, Orchestra can be used as a testing layer as well.
Orchestra always evaluates errors before evaluating warnings. A summary of the logic is below:
| Rows returned | Error expression | Warn expression | Task status |
| --- | --- | --- | --- |
| 10 |  |  | Failure |
| 7 |  |  | Warning |
| 0 |  |  | Success |
| 2 |  |  | Success |
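A minimal sketch of this evaluation order, using illustrative thresholds (error if more than 8 rows, warn if more than 5; the actual expressions are user-defined and these values are assumptions for the example):

```python
def evaluate_test(rows_returned, error_expr, warn_expr):
    """Evaluate a data quality test result.

    Errors are always evaluated before warnings. The expressions here are
    illustrative predicates, not Orchestra's exact configuration format.
    """
    if error_expr(rows_returned):
        return "Failure"
    if warn_expr(rows_returned):
        return "Warning"
    return "Success"

# Illustrative expressions: error if more than 8 rows, warn if more than 5.
error_expr = lambda n: n > 8
warn_expr = lambda n: n > 5

results = [evaluate_test(n, error_expr, warn_expr) for n in (10, 7, 0, 2)]
# results == ["Failure", "Warning", "Success", "Success"]
```

Because errors are checked first, a row count that satisfies both expressions always produces a Failure rather than a Warning.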
Anomaly Detection using Snowflake Cortex
You can run an Anomaly Detection test by providing the components of the multi-statement query described below:
Train SQL Statement
This should be a query string that references the data you want to use to train the model. For example, if the model should be trained on all data on or before June 16 2024, you would run:
`SELECT timestamp_col, target_col, series_col FROM table WHERE CAST(timestamp_col AS DATE) <= '2024-06-16'`
Note: the `timestamp_col` must be in `timestamp_ntz` format, and Snowflake expects date literals in `YYYY-MM-DD` format by default.
Predict SQL Statement
This should be a query string that references the data you want to predict on. For example, if the date is now June 20 2024 and you want to test observations at that point in time, the string would be:
`SELECT timestamp_col, target_col, series_col FROM table WHERE CAST(timestamp_col AS DATE) = '2024-06-20'`
Note: the `timestamp_col` must be in `timestamp_ntz` format.
Target Column Name
This is the name of the column you are testing for anomalies. For example, if you are looking for anomalies in sales values and that column is called "sales", this value would be "sales".
Timestamp Column Name
The name of the column containing the timestamp_ntz data
Optional - Series Column name
Snowflake supports series-level testing. This means that if your data is split up not only by date but also by some other dimension, such as store, supplier, customer, or service, you can perform series-level testing by passing in the column name for that dimension.
Note: under the hood, Snowflake will generate multiple models (one per series), so this can significantly increase query times depending on how many series you have.
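Taken together, the parameters above map onto Snowflake Cortex's `SNOWFLAKE.ML.ANOMALY_DETECTION` interface. The sketch below assembles the train (CREATE) and predict (CALL) statements from those parameters; the model name is a hypothetical placeholder, and the exact Cortex syntax should be verified against Snowflake's documentation before use:

```python
def build_anomaly_statements(train_sql, predict_sql, target_col, timestamp_col,
                             series_col=None, model_name="orchestra_anomaly_model"):
    """Assemble the two SQL statements for Snowflake Cortex anomaly detection.

    The SNOWFLAKE.ML.ANOMALY_DETECTION syntax here is based on Snowflake's
    documentation and is an assumption of this sketch; real usage would also
    need to escape single quotes inside the query strings.
    """
    series_arg = f"SERIES_COLNAME => '{series_col}'," if series_col else ""
    train = f"""CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION {model_name}(
  INPUT_DATA => SYSTEM$QUERY_REFERENCE('{train_sql}'),
  {series_arg}
  TIMESTAMP_COLNAME => '{timestamp_col}',
  TARGET_COLNAME => '{target_col}',
  LABEL_COLNAME => ''
);"""
    predict = f"""CALL {model_name}!DETECT_ANOMALIES(
  INPUT_DATA => SYSTEM$QUERY_REFERENCE('{predict_sql}'),
  {series_arg}
  TIMESTAMP_COLNAME => '{timestamp_col}',
  TARGET_COLNAME => '{target_col}'
);"""
    return train, predict

# Hypothetical table and column names for illustration.
train_stmt, predict_stmt = build_anomaly_statements(
    train_sql="SELECT ts_col, sales FROM sales_table WHERE CAST(ts_col AS DATE) <= '2024-06-16'",
    predict_sql="SELECT ts_col, sales FROM sales_table WHERE CAST(ts_col AS DATE) = '2024-06-20'",
    target_col="sales",
    timestamp_col="ts_col",
)
```

When no series column is supplied, the `SERIES_COLNAME` argument is simply omitted, so a single model is trained; passing one switches Snowflake into the one-model-per-series mode described above.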
Error handling
Responses
If we receive the following error codes from the Snowflake driver (which we use rather than calling the REST API directly), we will throw an error:
| Error | Description | Behaviour |
| --- | --- | --- |
| Any Python error class | Unauthorised | We will raise an error and parse what information is available from the response as the Orchestra message |