
CustomDataFlow

Figure: modeler-flow.png

A CustomDataFlow populates a SQL statement with source and sink table metadata. You can provide any number of sources (as long as they share the same columns, e.g. different tenant tables) to insert into any number of sink tables within a single CustomDataFlow.

The primary intent of CustomDataFlow is highly specific queries that DataFlows cannot perform easily. For each CustomDataFlow you provide the exact query, with placeholders for the source and sink tables.

The CustomDataFlow creates SQL queries or an Exec SQL Job by inserting the table metadata into the query you provided. The metadata in the JSON files defines Jobs that are ready to use within the orchestration, which can be run by the Orchestrator.

The CustomDataFlow does not perform any data quality checks. Best practice is to wrap the columns in your SQL statement with IFNULL or TRY_CONVERT to ensure the data quality you need.
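As an illustration, such wrapping could look like the following sketch. The table and column names are hypothetical, and the exact function names (IFNULL, TRY_CONVERT) depend on your SQL dialect:

```sql
-- Hypothetical sketch: guard against NULLs and conversion errors
-- in the SELECT that feeds the sink table.
SELECT
    IFNULL(stg.CustomerName, 'UNKNOWN')   AS CustomerName,  -- default for missing names
    TRY_CONVERT(DATE, stg.OrderDateRaw)   AS OrderDate      -- NULL instead of a cast error
FROM {{ source_table }} stg;
```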

Example

For each source/sink table pair, you need to adapt your SQL query. CustomDataFlows replace {{ sink_table }} and {{ source_table }} with the provided metadata. The spaces inside the double curly braces are required!

Here is an example SQL statement for a CustomDataFlow query (derive statement) which you can adapt for your own use case.

TRUNCATE TABLE  {{ sink_table }};
INSERT INTO     {{ sink_table }} (
        SinkTableColumnName1,
        SinkTableColumnName2
    )
SELECT DISTINCT
        stg.SourceTableColumnName1,
        stg.SourceTableColumnName2
FROM    {{ source_table }} stg;

Since the placeholders only acquire meaning during CLOE code generation, the source and sink placeholders can be interchanged.
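To illustrate the substitution, assume a TableMapping with (hypothetical) source table stg.customer_orders and sink table core.customer_orders. After code generation, the statement above would render roughly as:

```sql
-- Hypothetical rendered output; the actual generated statement
-- depends on your TableMapping and repository metadata.
TRUNCATE TABLE  core.customer_orders;
INSERT INTO     core.customer_orders (
        SinkTableColumnName1,
        SinkTableColumnName2
    )
SELECT DISTINCT
        stg.SourceTableColumnName1,
        stg.SourceTableColumnName2
FROM    stg.customer_orders stg;
```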

Supported Features

  • Execution of highly specific SQL statements (e.g. statements using CTEs)
  • Variable sink and source tables

Workflow

  1. Define (in the Repository) the source tables in the stage layer that you gather data from
  2. Define (in the Repository) the derive tables in the stage/core layer that you load the data into
  3. Create (Modeler > CustomDataFlows) a new CustomDataFlow, e.g. example_pipe
  4. Define (Modeler > CustomDataFlows > example_pipe > TableMappings) the TableMappings of example_pipe, i.e. your source and sink tables
  5. Enter (in Modeler > CustomDataFlows > example_pipe) the SQL statement for your use case
  6. Optional: If a Job is already defined in (Jobs > ExecSQL), you can add the Job information (in Modeler > CustomDataFlows > example_pipe) to your pipe.