

Operational Data Store (ODS)

An Operational Data Store (ODS) is a type of database often used as an interim area for a data warehouse (DWH). It primarily serves as a "staging area" for immediate operational or transactional data before it's integrated into the data warehouse for more complex analysis.

Purpose

  1. Data Integration: It acts as an integration point, merging data from multiple sources to provide a unified view, which helps to ensure data consistency across systems.
  2. Fallback: It can be used to load data into the DWH without having to unpack all source data from the data lake.

Benefits

  1. Prototyping: It enables fast integration of operational systems' data when developing new data products.
  2. Faster Access: It provides faster access to operational data than a DWH, supporting time-sensitive operations.
  3. Data Validation: It allows for data cleansing and transformation before data is loaded into the DWH, enhancing data quality.
  4. Performance: By providing a staging area with additional columns such as checksums or data modification records (DMR), data integration into the DWH can be accelerated.

Prerequisites

  • Download the respective SQL-Template with "(ODS)", add it to your common_sqltemplates_X.json file and choose this template for the respective data flows. (Currently not implemented)
  • Depending on the ODS feature you want to use, your tables need to have the columns:
    • "_DCR" to identify when a record was first created
    • "_DMR" to identify whether and when a record was last modified - this can also be used to accelerate updating in Core
    • "_UPDATE_HASH" if update acceleration with hashing should be used from Stage to ODS
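A minimal table definition with these columns could look as follows. The table name, the BK column and the payload columns are illustrative; only the underscore-prefixed columns are the ones named above, and exact data types depend on your platform:

```sql
-- Illustrative ODS table; only _DCR, _DMR and _UPDATE_HASH are required by the ODS features.
CREATE TABLE ODS.CUSTOMER (
    CUSTOMER_BK   VARCHAR(100) NOT NULL,  -- business key used to match Stage records
    FIRST_NAME    VARCHAR(100),
    LAST_NAME     VARCHAR(100),
    _DCR          TIMESTAMP,              -- when the record was first created
    _DMR          TIMESTAMP,              -- when the record was last modified
    _UPDATE_HASH  VARCHAR(64)             -- hash over all relevant columns, for update acceleration
);
```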

Processing

Data is loaded from the Stage layer directly into ODS. This can be done either with a combined UPDATE/INSERT or with a MERGE statement. To make processing of both existing and new records easier, it is recommended to add a BK (business key) column to ODS tables.
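A MERGE from Stage into ODS, matched on the BK column, could be sketched as follows. Table and column names are illustrative, and the exact syntax varies by platform:

```sql
-- Sketch: upsert Stage records into ODS, matching on the business key.
MERGE INTO ODS.CUSTOMER AS tgt
USING STAGE.CUSTOMER AS src
    ON tgt.CUSTOMER_BK = src.CUSTOMER_BK
WHEN MATCHED THEN UPDATE SET
    tgt.FIRST_NAME = src.FIRST_NAME,
    tgt.LAST_NAME  = src.LAST_NAME,
    tgt._DMR       = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN INSERT
    (CUSTOMER_BK, FIRST_NAME, LAST_NAME, _DCR, _DMR)
VALUES
    (src.CUSTOMER_BK, src.FIRST_NAME, src.LAST_NAME, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);
```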

Records that exist in both ODS and Stage should be updated. They can either always be updated, which is not recommended for large tables, or only when their values differ. Differing values can be identified either by a simple column-to-column comparison or by maintaining an additional hash column over all relevant columns in the ODS tables, which is then compared against the incoming hash from Stage (the incoming hash needs to be calculated on the fly in the query).
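The hash comparison can be added as an extra condition on the update, so that unchanged rows are skipped entirely. This is a sketch with illustrative names; HASH() stands in for whatever hash function your platform offers (e.g. MD5 or SHA2), and the UPDATE ... FROM form is not supported everywhere:

```sql
-- Only touch rows whose stored hash differs from the hash computed
-- on the fly over the incoming Stage columns.
UPDATE ODS.CUSTOMER tgt
SET FIRST_NAME   = src.FIRST_NAME,
    LAST_NAME    = src.LAST_NAME,
    _UPDATE_HASH = HASH(src.FIRST_NAME, src.LAST_NAME),
    _DMR         = CURRENT_TIMESTAMP
FROM STAGE.CUSTOMER src
WHERE tgt.CUSTOMER_BK = src.CUSTOMER_BK
  AND tgt._UPDATE_HASH <> HASH(src.FIRST_NAME, src.LAST_NAME);
```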

If a record is updated, the "_DMR" column is set to the current datetime to mark the change. This timestamp can then be used to compare Core and ODS data and to accelerate updates by excluding all data in Core where "_DMR" is the same as in ODS.
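Selecting only the ODS rows that need to be propagated into Core could then be sketched like this (table and column names other than "_DMR" are illustrative):

```sql
-- Pick up only new or modified records: rows missing from Core,
-- or rows whose _DMR differs between ODS and Core.
SELECT ods.*
FROM ODS.CUSTOMER ods
LEFT JOIN CORE.CUSTOMER core
    ON core.CUSTOMER_BK = ods.CUSTOMER_BK
WHERE core.CUSTOMER_BK IS NULL   -- new record
   OR core._DMR <> ods._DMR;     -- modified record
```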

Soft Delete

CLOE provides the option to enable the Soft Delete feature. Currently this can be done by selecting the corresponding SQL-Template for your Data Flow.

Prerequisites

  • Download the respective SQL-Template with "+ Soft Delete", add it to your common_sqltemplates_SF.json file and choose this template for the respective data flows.
  • Your tables need to have the corresponding columns "_DELETED_FG" and "_DELETED_ON" (if your columns are named differently, make sure to search-replace the column names in the SQL-Template with your custom names)

Process-Logic

Using the Soft Delete feature will add the following steps to your ETL-Procedure:

  • The standard UPDATE statement now includes a section that sets "_DELETED_FG" = 0 and "_DELETED_ON" = null for all rows where target and source are matched by the BK / BK-artifact.
  • The standard INSERT statement always inserts data with the values "_DELETED_FG" = 0 and "_DELETED_ON" = null.
  • An additional UPDATE statement is added that finds all rows in the target table whose BK does not exist in the source BK-artifact and whose current "_DELETED_FG" equals 0. For these rows, "_DELETED_FG" is set to 1 and "_DELETED_ON" is set to the current timestamp.
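The additional soft-delete UPDATE from the last step could be sketched as follows. Table names and the BK column are illustrative; only "_DELETED_FG" and "_DELETED_ON" come from the template:

```sql
-- Flag rows that are still active in the target but whose BK no longer
-- appears in the source BK-artifact.
UPDATE ODS.CUSTOMER tgt
SET _DELETED_FG = 1,
    _DELETED_ON = CURRENT_TIMESTAMP
WHERE tgt._DELETED_FG = 0
  AND NOT EXISTS (
      SELECT 1
      FROM STAGE.CUSTOMER_BK_ARTIFACT src
      WHERE src.CUSTOMER_BK = tgt.CUSTOMER_BK
  );
```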

Limitations

  • The Soft Delete feature is currently only available in full-load setups. It is not available for delta-load setups: if the source supplies no deletion information, deletions cannot be identified at all, and if it does, a different process is needed.