Run Query

Description

This job runs a SQL query in Snowflake asynchronously and polls Snowflake periodically to check its status.

Use Cases

We recommend creating a Snowflake Run Query Task for every query you need to run within Snowflake.

In Snowflake, almost anything you can do can be done using a query. This ranges from simple SQL statements to anything within the DDL/DML spectrum. Furthermore, you can also run data quality tests, dynamically assign roles, and clone tables using Snowflake queries.

Snowflake also has a number of new and exciting use cases, such as provisioning Snowpark Container Services, Dynamic Tables, and Snowflake Tasks (which can render sub-DAGs). These are all supported in Orchestra too. There are a number of advantages to triggering Snowflake queries in Orchestra vs. other workflow orchestration tools:

  • You can coordinate tasks outside of Snowflake. These would typically be other Snowflake jobs, data loading steps, or tasks in Snowflake-adjacent environments, e.g. dbt or a BI tool

  • You can use Orchestra to trigger jobs across Snowflake Accounts / Environments

  • When Snowflake jobs run, Data Warehouse cost is incurred. Running these operations on a schedule you set explicitly ensures these costs do not get out of hand. Orchestra also fetches the cost per query, which users can monitor and easily access

  • We aggregate metadata from the Snowflake Task in the same place as the metadata from other operations in your Pipeline

Parameters

These parameters are required to run the Snowflake Run Query Task:

Name  | Data type | Restrictions | Example
Query | String    | N.A.         | SELECT * FROM TABLE
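
The Query parameter accepts any statement Snowflake can execute. For illustration, any of the following (with hypothetical object names) would be valid values:

-- A plain select
SELECT * FROM my_table;

-- DDL: zero-copy clone a table
CREATE TABLE my_table_clone CLONE my_table;

-- DCL: dynamically assign a role
GRANT ROLE analyst TO USER some_user;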

Variants

Orchestra has a variant of the Run Query Task that runs a query as a data quality test. This closely resembles how SQL tests work in popular packages like dbt, Great Expectations, or Elementary today.

To use this option, select the Run Test Task. From here, you can choose your error and warning expressions. By default, the task will fail if the number of results returned by Snowflake is greater than 0. This can be helpful for enforcing data quality tests for teams that wish to interact with Snowflake but are not comfortable using dbt.

It can also be helpful for highly custom testing. It can be arduous to specify every single custom test in dbt using a macro; where this is not preferred, Orchestra can be used as a testing layer as well.
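
For example, a test query would typically return the rows that violate a constraint, so the task fails whenever any violations exist. A minimal sketch, assuming a hypothetical orders table with a customer_id column that should never be null:

-- Returns one row per violation; with the default settings,
-- the task fails if this query returns more than 0 rows.
SELECT *
FROM orders
WHERE customer_id IS NULL;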

Orchestra always evaluates errors before evaluating warnings. A summary of the logic is below:

Rows returned | Error expression | Warn expression | Task status
10            | > 5              | > 8             | Failure
7             | > 10             | > 5             | Warning
0             | > 0              | > 0             | Success
2             | = 1              | = 3             | Success
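
To make the evaluation order concrete, the first row of the table above behaves like the following CASE expression (illustrative only): because the error expression is checked first, a row count of 10 fails even though it also matches the warn expression.

SELECT CASE
           WHEN 10 > 5 THEN 'Failure'  -- error expression evaluated first
           WHEN 10 > 8 THEN 'Warning'  -- only reached if the error expression does not match
           ELSE 'Success'
       END AS task_status;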

Anomaly Detection using Snowflake Cortex

You can run an Anomaly Detection test by leveraging the multi-statement query below:

BEGIN
    -- Detect anomalies using the trained model. Note the single-quoted
    -- query reference: double quotes denote identifiers in Snowflake,
    -- so inner quotes are escaped by doubling them.
    CALL orchestra_anomaly_model!DETECT_ANOMALIES(
        INPUT_DATA => SYSTEM$QUERY_REFERENCE('select sales, to_timestamp_ntz(date) as date from sales_data where date > ''2024-01-01'''),
        TIMESTAMP_COLNAME => 'date',
        TARGET_COLNAME => 'sales'
    );
    -- Capture the query ID of the CALL so RESULT_SCAN can re-read its output
    LET x := SQLID;
    CREATE OR REPLACE TEMPORARY TABLE my_anomalies AS SELECT * FROM TABLE(RESULT_SCAN(:x));
END;
SELECT * FROM my_anomalies where IS_ANOMALY = TRUE;

Train SQL Statement

This should be a query string that references the data you want to use to train the model. For example, if the model should be trained on all data up to and including June 16, 2024, you would run:

SELECT timestamp_col, target_col, series_col FROM my_table WHERE CAST(timestamp_col AS DATE) <= '2024-06-16'

Note - the timestamp_col must be of type timestamp_ntz.
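
For context, the train statement feeds the creation of the underlying Snowflake Cortex model. A minimal sketch of what this looks like, assuming the model is created with Snowflake's SNOWFLAKE.ML.ANOMALY_DETECTION class, reusing the orchestra_anomaly_model name from the example above, and dropping series_col for the single-series case:

CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION orchestra_anomaly_model(
    -- The train SQL statement is passed in as a query reference
    INPUT_DATA => SYSTEM$QUERY_REFERENCE('select timestamp_col, target_col from my_table where cast(timestamp_col as date) <= ''2024-06-16'''),
    TIMESTAMP_COLNAME => 'timestamp_col',
    TARGET_COLNAME => 'target_col',
    LABEL_COLNAME => ''  -- empty string for unsupervised anomaly detection
);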

Predict SQL Statement

This should be a query string that references the data you want to predict. For example, if the date is now June 20, 2024 and you want to test observations at that point in time, the string would be:

SELECT timestamp_col, target_col, series_col FROM my_table WHERE CAST(timestamp_col AS DATE) = '2024-06-20'

Note - the timestamp_col must be of type timestamp_ntz.
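
The predict statement is what gets passed to the model's DETECT_ANOMALIES method via SYSTEM$QUERY_REFERENCE, along the lines of the following sketch (reusing the hypothetical column and table names above):

CALL orchestra_anomaly_model!DETECT_ANOMALIES(
    INPUT_DATA => SYSTEM$QUERY_REFERENCE('select timestamp_col, target_col from my_table where cast(timestamp_col as date) = ''2024-06-20'''),
    TIMESTAMP_COLNAME => 'timestamp_col',
    TARGET_COLNAME => 'target_col'
);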

Target Column Name

This is the name of the column containing the values you are testing for anomalies. For example, if you are looking for anomalies in sales values and that column is called "sales", then this value would be "sales".

Timestamp Column Name

The name of the column containing the timestamp_ntz data

Optional - Series Column name

Snowflake supports series-level testing. This means that if you have data that is split up not only by date but also by some other dimension, like store, supplier, customer, or service, you can perform series-level testing by passing in the column name of this value.

Note - under the hood, Snowflake will generate multiple models (one per series), so this can significantly increase query times depending on how many series you have.
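
As a sketch, the multi-series variant of the hypothetical model creation above would pass the extra dimension via the SERIES_COLNAME parameter (assuming a store_id column in my_table):

CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION orchestra_anomaly_model(
    INPUT_DATA => SYSTEM$QUERY_REFERENCE('select timestamp_col, target_col, store_id from my_table'),
    TIMESTAMP_COLNAME => 'timestamp_col',
    TARGET_COLNAME => 'target_col',
    SERIES_COLNAME => 'store_id',  -- one model is trained per distinct store_id
    LABEL_COLNAME => ''
);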

Error handling

Responses

If we receive the following error codes from the Snowflake driver (which we use, rather than calling the REST API directly), we will throw an error:

Error                  | Description  | Behaviour
Any Python Error Class | Unauthorised | We will raise an error and parse what information is available from the response as the Orchestra message
