
DataFlow

The DataFlow is primarily used to transfer and process data between the Stage/ODS layer and the core layer of the DWH. Queries have a fixed sink and one or more source tables, which enables a DataFlow to map one-to-many relationships (1:n).

The DataFlow creates SQL queries or an Exec SQL Job by inserting the metadata into the previously provided template. The metadata in the JSON files defines exec_sql jobs that are ready to use within the orchestration, which can be handled by the Orchestrator.
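To give a rough idea of the output, the following is a hypothetical example of the kind of statement such a template could produce. All object names (core.customer, stage.v_customer_dq2, etc.) are illustrative assumptions, not CLOE defaults; the actual SQL depends entirely on your metadata and template.

```sql
-- Hypothetical example of a DataFlow-generated statement.
-- All object names are illustrative, not part of CLOE itself.
INSERT INTO core.customer (customer_bk, first_name, last_name, load_ts)
SELECT
    s.customer_bk,              -- business key built in the stage view
    s.first_name,
    s.last_name,
    CURRENT_TIMESTAMP           -- load timestamp added by the template
FROM stage.v_customer_dq2 AS s  -- highest available data quality view
WHERE NOT EXISTS (              -- insert only new business keys
    SELECT 1
    FROM core.customer AS c
    WHERE c.customer_bk = s.customer_bk
);
```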

DataFlows support many features:

  • BK generation (see the sketch after this list)
  • Handling tenant logic
  • Null handling
  • Data quality 1-3 with logging
  • Lookups Type 1 & 2
  • Pre- & post-processing SQL
  • Slowly Changing Dimensions (SCD)
  • Computed columns
  • Conversion of data types and error handling
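To make the first feature concrete, BK (business key) generation is commonly implemented by casting the source key columns to text, concatenating them with a delimiter, and hashing the result. The sketch below shows this general pattern in a T-SQL flavor; the table and column names are assumptions, and it is not necessarily the exact SQL CLOE emits.

```sql
-- Generic BK generation pattern (illustrative only; names and
-- hash algorithm are assumptions, not CLOE defaults).
SELECT
    CONVERT(CHAR(32),
        HASHBYTES('MD5',
            CONCAT(CAST(s.tenant_id AS VARCHAR(10)), '|',
                   CAST(s.order_no  AS VARCHAR(20)))),
        2)  AS order_bk,   -- hashed composite key as hex string
    s.*
FROM stage.orders AS s;
```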

The data flows from the stage into the core layer in the following fashion:

DataFlow Schema

Stage layer tables are marked in blue, stage layer views in green, and core layer tables in orange. The DataFlow fills the core layer tables from the highest data quality view. There are three levels of data quality:

  1. DQ1: Is the BK unique?
  2. DQ2: Do we have any conversion errors?
  3. DQ3: Are any of the returned values from the lookup empty?

In most cases DQ3 is not needed. This way, any calculations you need to perform are only executed once, in the first data quality layer. The latest CLOE release has DQ1, DQ2, logDQ1 and logDQ2 activated by default.
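As a minimal sketch of how such stacked data quality views could look, assuming a T-SQL dialect and hypothetical names (stage.orders, stage.v_orders_dq1, stage.v_orders_dq2):

```sql
-- DQ1 view: keep only rows whose business key is unique
-- (all names are hypothetical; dialect is T-SQL-flavored).
CREATE VIEW stage.v_orders_dq1 AS
SELECT t.*
FROM (
    SELECT s.*,
           COUNT(*) OVER (PARTITION BY s.order_bk) AS bk_count
    FROM stage.orders AS s
) AS t
WHERE t.bk_count = 1;
GO

-- DQ2 view: keep only rows whose type conversions succeed.
CREATE VIEW stage.v_orders_dq2 AS
SELECT d.*
FROM stage.v_orders_dq1 AS d
WHERE TRY_CAST(d.order_date_raw AS DATE) IS NOT NULL
  AND TRY_CAST(d.amount_raw AS DECIMAL(18, 2)) IS NOT NULL;
```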

Lookups

The lookup is executed within the update/insert statement of the DataFlow. You need to fill your lookup table beforehand; see Lookup.

Generic Lookup

Setup: This is the setup for a generic lookup where the sink table core.table_2 performs a lookup on the lookup table core.lu_table and returns the "Lookup_ID" as a Lookup Return Column into the sink table.

Lookup
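The following sketch shows the shape of the resulting join, using the names from the setup above. The join columns and the insert column list are assumptions made for illustration; note how the LEFT JOIN also connects to DQ3, which checks for empty lookup returns.

```sql
-- Hypothetical shape of the lookup join inside the DataFlow's
-- insert statement; join columns are assumptions.
INSERT INTO core.table_2 (bk, some_attribute, Lookup_ID)
SELECT
    s.bk,
    s.some_attribute,
    lu.Lookup_ID               -- Lookup Return Column from core.lu_table
FROM stage.v_table_2_dq2 AS s
LEFT JOIN core.lu_table AS lu  -- LEFT JOIN: unmatched rows yield NULL,
    ON lu.bk = s.lu_bk;        -- which is exactly what DQ3 checks for
```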

Workflow

  1. Define a new DataFlow in Modeler > DataFlow
  2. Define the Sourcetable