
DataFlow

The DataFlow is primarily used to transfer and process data between the Stage/ODS layer and the core layer of the DWH. Queries have a fixed sink and one or more source tables, which enables a DataFlow to map one-to-many relationships (1:n).

The DataFlow creates SQL queries or an Exec SQL Job by inserting the metadata into the previously provided template. The metadata in the JSON files defines exec_sql jobs that are ready to use within the orchestration, which can be handled by the Orchestrator.
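To give a rough idea of the output, the following is a hypothetical example of the kind of statement such a template could produce. All object names (core.customer, stage.v_customer_dq2, etc.) are illustrative assumptions, not CLOE defaults; the actual SQL depends entirely on your metadata and template.

```sql
-- Hypothetical example of a DataFlow-generated statement.
-- All object names are illustrative, not part of CLOE itself.
INSERT INTO core.customer (customer_bk, first_name, last_name, load_ts)
SELECT
    s.customer_bk,              -- business key built in the stage view
    s.first_name,
    s.last_name,
    CURRENT_TIMESTAMP           -- load timestamp added by the template
FROM stage.v_customer_dq2 AS s  -- highest available data quality view
WHERE NOT EXISTS (              -- insert only new business keys
    SELECT 1
    FROM core.customer AS c
    WHERE c.customer_bk = s.customer_bk
);
```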

DataFlows support many features:

  • BK generation (see the sketch after this list)
  • Handling tenant logic
  • Null handling
  • Data quality 1-3 with logging
  • Lookups Type 1 & 2
  • Pre- & post-processing SQL
  • Slowly Changing Dimensions (SCD)
  • Computed columns
  • Conversion of data types and error handling
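To make the first feature concrete, BK (business key) generation is commonly implemented by casting the source key columns to text, concatenating them with a delimiter, and hashing the result. The sketch below shows this general pattern in a T-SQL flavor; the table and column names are assumptions, and it is not necessarily the exact SQL CLOE emits.

```sql
-- Generic BK generation pattern (illustrative only; names and
-- hash algorithm are assumptions, not CLOE defaults).
SELECT
    CONVERT(CHAR(32),
        HASHBYTES('MD5',
            CONCAT(CAST(s.tenant_id AS VARCHAR(10)), '|',
                   CAST(s.order_no  AS VARCHAR(20)))),
        2)  AS order_bk,   -- hashed composite key as hex string
    s.*
FROM stage.orders AS s;
```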

The data flows from the stage into the core layer in the following fashion:

DataFlow Schema

Stage layer tables are marked in blue, stage layer views in green, and core layer tables in orange. The DataFlow fills the core layer tables from the highest data quality view. There are three levels of data quality:

  1. DQ1: Is the BK unique?
  2. DQ2: Do we have any conversion errors?
  3. DQ3: Are any of the returned values from the lookup empty?

In most cases DQ3 is not needed. This way, any calculations you need to perform are only executed once, in the first data quality layer. The latest CLOE release has DQ1, DQ2, logDQ1 and logDQ2 activated by default.
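As a minimal sketch of how such stacked data quality views could look, assuming a T-SQL dialect and hypothetical names (stage.orders, stage.v_orders_dq1, stage.v_orders_dq2):

```sql
-- DQ1 view: keep only rows whose business key is unique
-- (all names are hypothetical; dialect is T-SQL-flavored).
CREATE VIEW stage.v_orders_dq1 AS
SELECT t.*
FROM (
    SELECT s.*,
           COUNT(*) OVER (PARTITION BY s.order_bk) AS bk_count
    FROM stage.orders AS s
) AS t
WHERE t.bk_count = 1;
GO

-- DQ2 view: keep only rows whose type conversions succeed.
CREATE VIEW stage.v_orders_dq2 AS
SELECT d.*
FROM stage.v_orders_dq1 AS d
WHERE TRY_CAST(d.order_date_raw AS DATE) IS NOT NULL
  AND TRY_CAST(d.amount_raw AS DECIMAL(18, 2)) IS NOT NULL;
```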

Lookups

The lookup is executed within the update/insert statement of the DataFlow. You need to fill your lookup table beforehand; see Lookup.

Generic Lookup

Setup: This is the setup for a generic lookup where the sink table core.table_2 performs a lookup on the lookup table core.lu_table and returns the "Lookup_ID" as a Lookup Return Column into the sink table.

Lookup
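The following sketch shows the shape of the resulting join, using the names from the setup above. The join columns and the insert column list are assumptions made for illustration; note how the LEFT JOIN also connects to DQ3, which checks for empty lookup returns.

```sql
-- Hypothetical shape of the lookup join inside the DataFlow's
-- insert statement; join columns are assumptions.
INSERT INTO core.table_2 (bk, some_attribute, Lookup_ID)
SELECT
    s.bk,
    s.some_attribute,
    lu.Lookup_ID               -- Lookup Return Column from core.lu_table
FROM stage.v_table_2_dq2 AS s
LEFT JOIN core.lu_table AS lu  -- LEFT JOIN: unmatched rows yield NULL,
    ON lu.bk = s.lu_bk;        -- which is exactly what DQ3 checks for
```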

Workflow

  1. Define a new DataFlow in Modeler > DataFlow
  2. Define the Sourcetable