Column Dependencies

The synthetic data generator supports both intra-table (column-to-column) and inter-table dependencies, allowing you to create realistic data relationships.

Intra-table Dependencies (Column Dependencies)

Generate a column whose values depend on another column in the same table. Set depends_on to the parent column and use reference_mapping to list the allowed values for each parent value.

Basic Example

columns:
  - name: "department"
    data_type: "string"
    nullable: false
    faker_function: "random_element"
    faker_options:
      elements: ["Engineering", "Sales", "Marketing"]

  - name: "job_level"
    data_type: "string"
    nullable: false
    depends_on: "department"
    reference_mapping:
      "Engineering": ["Senior Engineer", "Engineer", "Junior Engineer"]
      "Sales": ["Sales Manager", "Account Executive", "Sales Representative"]
      "Marketing": ["Marketing Director", "Marketing Manager", "Content Specialist"]
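As a rough illustration of what this configuration expresses, the sketch below draws a parent value first and then picks the dependent value from the list mapped to it. It uses only Python's standard `random` module and is not the generator's actual implementation; `generate_row` is a hypothetical helper introduced for this example.

```python
import random

# Mirrors the YAML above: each parent value maps to its allowed child values.
DEPARTMENTS = ["Engineering", "Sales", "Marketing"]
REFERENCE_MAPPING = {
    "Engineering": ["Senior Engineer", "Engineer", "Junior Engineer"],
    "Sales": ["Sales Manager", "Account Executive", "Sales Representative"],
    "Marketing": ["Marketing Director", "Marketing Manager", "Content Specialist"],
}


def generate_row(rng: random.Random) -> dict:
    """Hypothetical helper: generate the parent first, then the dependent column."""
    department = rng.choice(DEPARTMENTS)
    # The dependent value is restricted to the list mapped to the parent value.
    job_level = rng.choice(REFERENCE_MAPPING[department])
    return {"department": department, "job_level": job_level}


rng = random.Random(42)  # seeded so repeated runs give the same rows
for _ in range(5):
    print(generate_row(rng))
```

Every generated job_level is guaranteed to belong to the row's department, which is the property the mapping exists to enforce.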

Inter-table Dependencies (Reference Tables)

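Draw a column's values from an external reference table, which enables relationships that span multiple tables. The table is identified by catalog, schema, and table, and key_column names the column that supplies the values.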

Basic Reference Table

columns:
  - name: "country_code"
    data_type: "string"
    nullable: false
    reference_table:
      catalog: "reference_data"
      schema: "geographic"
      table: "countries"
      key_column: "code"

Dependent Reference Table

columns:
  - name: "country_code"
    data_type: "string"
    nullable: false
    reference_table:
      catalog: "reference_data"
      schema: "geographic"
      table: "countries"
      key_column: "code"

  - name: "city_name"
    data_type: "string"
    nullable: false
    depends_on: "country_code"
    reference_table:
      catalog: "reference_data"
      schema: "geographic"
      table: "cities"
      key_column: "name"  # Will filter cities where country_code matches parent value
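The filtering described in the comment above can be sketched in plain Python. This is only an illustration under stated assumptions: the two lists stand in for the `countries` and `cities` reference tables with made-up rows, the `cities` table is assumed to carry a `country_code` column, and `pick_city` is a hypothetical helper rather than part of the generator.

```python
import random

# Stand-ins for reference_data.geographic.countries and .cities (made-up rows).
COUNTRIES = [{"code": "US"}, {"code": "FR"}]
CITIES = [
    {"name": "Chicago", "country_code": "US"},
    {"name": "Boston", "country_code": "US"},
    {"name": "Lyon", "country_code": "FR"},
]


def pick_city(country_code: str, rng: random.Random) -> str:
    """Hypothetical helper: keep only cities whose country_code matches the parent."""
    candidates = [c["name"] for c in CITIES if c["country_code"] == country_code]
    if not candidates:
        raise ValueError(f"no cities found for country_code={country_code!r}")
    return rng.choice(candidates)


rng = random.Random(7)
country_code = rng.choice(COUNTRIES)["code"]  # parent column value
print(country_code, pick_city(country_code, rng))  # dependent column value
```

Raising an error when no city matches is a design choice of this sketch; how the generator itself handles an empty match is not stated here.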

Validation Rules

  • Dependent columns cannot be nullable
  • Parent columns must exist and be defined before dependent columns
  • Circular dependencies are detected and prevented (dependency resolution stops after a maximum depth of 10)
  • Either reference_mapping or reference_table is required for dependent columns
  • Cannot specify both reference_mapping and reference_table for the same column
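To make the rules above concrete, here is a minimal sketch of how they could be checked against one table's column list. It is an assumption-laden illustration, not the generator's validator: `validate_columns` is a hypothetical function, a missing `nullable` key is treated as false, and the depth-limited circular-dependency check is not modeled because the "defined before" rule already rules out cycles within a single ordered list.

```python
def validate_columns(columns: list[dict]) -> list[str]:
    """Hypothetical check: return rule violations for one table's columns."""
    errors = []
    seen = set()  # names of columns defined so far, in order
    for col in columns:
        name = col["name"]
        parent = col.get("depends_on")
        has_mapping = "reference_mapping" in col
        has_table = "reference_table" in col

        if has_mapping and has_table:
            errors.append(f"{name}: cannot specify both reference_mapping and reference_table")

        if parent is not None:
            # Assumption: an absent 'nullable' key means not nullable.
            if col.get("nullable", False):
                errors.append(f"{name}: dependent columns cannot be nullable")
            if parent not in seen:
                errors.append(f"{name}: parent '{parent}' must be defined before it")
            if not (has_mapping or has_table):
                errors.append(f"{name}: reference_mapping or reference_table is required")

        seen.add(name)
    return errors


columns = [
    {"name": "department", "nullable": False},
    {"name": "job_level", "nullable": True, "depends_on": "department"},
]
for problem in validate_columns(columns):
    print(problem)
```

Collecting all violations in a list, rather than stopping at the first, lets a configuration author fix several problems in one pass.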

Table Generation Order

When processing multiple configuration files, tables are automatically sorted by dependencies to ensure referenced tables exist before dependent tables are generated.
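Dependency-based ordering of this kind is typically a topological sort. The sketch below shows the idea using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The table names and the `generation_order` function are hypothetical, and the generator may well order tables differently internally.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical map of table -> tables it references, as might be collected
# from the reference_table entries across several configuration files.
table_dependencies = {
    "countries": set(),
    "cities": {"countries"},
    "customers": {"cities", "countries"},
}


def generation_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return tables ordered so each one comes after the tables it references."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"circular table dependency: {exc.args[1]}") from exc


print(generation_order(table_dependencies))
# ['countries', 'cities', 'customers']
```

With this input there is only one valid order, so the referenced tables always come out ahead of the tables that depend on them.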