Column Dependencies
The synthetic data generator supports both intra-table (column-to-column) and inter-table dependencies, allowing you to create realistic data relationships.
Intra-table Dependencies (Column Dependencies)
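Generate columns whose values depend on another column in the same table. The depends_on key names the parent column, and reference_mapping maps each parent value to the list of values the dependent column may take.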
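As a rough illustration of what a reference mapping implies at generation time, the sketch below draws a dependent value from the list mapped to its parent value. It is not the generator's actual implementation: the function names, the use of Python's random module, and the error raised for an unmapped parent value are all assumptions made for this example. The mapping itself uses the department and job_level values from this section's example.

```python
import random

# Parent value -> allowed dependent values (same shape as reference_mapping).
REFERENCE_MAPPING = {
    "Engineering": ["Senior Engineer", "Engineer", "Junior Engineer"],
    "Sales": ["Sales Manager", "Account Executive", "Sales Representative"],
    "Marketing": ["Marketing Director", "Marketing Manager", "Content Specialist"],
}


def pick_dependent_value(parent_value, mapping, rng):
    """Hypothetical helper: pick one allowed value for the given parent value."""
    if parent_value not in mapping:
        # Assumption: an unmapped parent value is treated as an error.
        raise ValueError(f"no reference_mapping entry for {parent_value!r}")
    return rng.choice(mapping[parent_value])


def generate_rows(n, seed=0):
    """Generate n rows where job_level is always consistent with department."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        department = rng.choice(list(REFERENCE_MAPPING))
        job_level = pick_dependent_value(department, REFERENCE_MAPPING, rng)
        rows.append({"department": department, "job_level": job_level})
    return rows


if __name__ == "__main__":
    for row in generate_rows(3):
        print(row)
```

The point of the sketch is the ordering: the parent value is generated first, and the dependent value is then restricted to the list mapped to it.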
Basic Example
columns:
  - name: "department"
    data_type: "string"
    nullable: false
    faker_function: "random_element"
    faker_options:
      elements: ["Engineering", "Sales", "Marketing"]
  - name: "job_level"
    data_type: "string"
    nullable: false
    depends_on: "department"
    reference_mapping:
      "Engineering": ["Senior Engineer", "Engineer", "Junior Engineer"]
      "Sales": ["Sales Manager", "Account Executive", "Sales Representative"]
      "Marketing": ["Marketing Director", "Marketing Manager", "Content Specialist"]
Inter-table Dependencies (Reference Tables)
Reference external tables for data values, enabling complex multi-table relationships.
Basic Reference Table
columns:
  - name: "country_code"
    data_type: "string"
    nullable: false
    reference_table:
      catalog: "reference_data"
      schema: "geographic"
      table: "countries"
      key_column: "code"
Dependent Reference Table
columns:
  - name: "country_code"
    data_type: "string"
    nullable: false
    reference_table:
      catalog: "reference_data"
      schema: "geographic"
      table: "countries"
      key_column: "code"
  - name: "city_name"
    data_type: "string"
    nullable: false
    depends_on: "country_code"
    reference_table:
      catalog: "reference_data"
      schema: "geographic"
      table: "cities"
      key_column: "name" # Will filter cities where country_code matches parent value
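The filtering described in the comment on key_column can be pictured roughly as follows. This is a sketch under stated assumptions, not the generator's code: the cities table is replaced by a small in-memory list with made-up rows, the table is assumed to carry a country_code column matching the parent column's name, and candidate_values and pick_city are hypothetical helpers.

```python
import random

# Hypothetical stand-in for reference_data.geographic.cities (rows are made up).
CITIES = [
    {"name": "Berlin", "country_code": "DE"},
    {"name": "Munich", "country_code": "DE"},
    {"name": "Lyon", "country_code": "FR"},
    {"name": "Paris", "country_code": "FR"},
]


def candidate_values(rows, key_column, parent_column=None, parent_value=None):
    """Collect key_column values, keeping only rows that match the parent value."""
    if parent_column is not None:
        rows = [r for r in rows if r[parent_column] == parent_value]
    return [r[key_column] for r in rows]


def pick_city(country_code, rng):
    """Pick a city name restricted to the given country_code."""
    options = candidate_values(CITIES, "name", "country_code", country_code)
    if not options:
        # Assumption: a parent value with no matching rows is an error.
        raise ValueError(f"no cities found for country_code {country_code!r}")
    return rng.choice(options)


if __name__ == "__main__":
    print(pick_city("DE", random.Random(0)))
```

Without a parent column, candidate_values returns every key_column value, which corresponds to the basic (non-dependent) reference table case.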
Validation Rules
- Dependent columns cannot be nullable
- Parent columns must exist and be defined before dependent columns
- Circular dependencies are detected and prevented (dependency resolution stops at a maximum depth of 10)
- Either reference_mapping or reference_table is required for dependent columns
- Cannot specify both reference_mapping and reference_table for the same column
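To make the rules concrete, here is a minimal sketch of how they could be checked against a list of column definitions loaded from YAML. The validate_columns function and its error messages are hypothetical, and a missing nullable key is treated as false purely for this example. Cycle detection is left out: requiring parents to be defined earlier already rules out cycles within a single column list, and how the generator applies its depth limit is not shown here.

```python
def validate_columns(columns):
    """Hypothetical validator: return a list of rule violations (empty if valid)."""
    errors = []
    defined = set()  # names of columns seen so far, in definition order
    for col in columns:
        name = col["name"]
        has_mapping = "reference_mapping" in col
        has_table = "reference_table" in col
        if has_mapping and has_table:
            errors.append(
                f"{name}: cannot specify both reference_mapping and reference_table"
            )
        parent = col.get("depends_on")
        if parent is not None:
            # Assumption: a missing nullable key counts as nullable: false.
            if col.get("nullable", False):
                errors.append(f"{name}: dependent columns cannot be nullable")
            if parent not in defined:
                errors.append(f"{name}: parent column {parent!r} must be defined first")
            if not (has_mapping or has_table):
                errors.append(
                    f"{name}: reference_mapping or reference_table is required"
                )
        defined.add(name)
    return errors


if __name__ == "__main__":
    columns = [
        {"name": "job_level", "nullable": True, "depends_on": "department"},
        {"name": "department", "nullable": False},
    ]
    for message in validate_columns(columns):
        print(message)
```

Collecting all violations in one pass, rather than stopping at the first, is a design choice of the sketch so that a configuration author can fix every problem at once.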
Table Generation Order
When processing multiple configuration files, tables are automatically sorted by dependencies to ensure referenced tables exist before dependent tables are generated.
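Dependency-based ordering of this kind is commonly done with a topological sort. The sketch below uses Python's standard graphlib module (Python 3.9+) to illustrate the idea; whether the generator uses this module or its own sorting is not stated here, and the table names and the generation_order helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def generation_order(dependencies):
    """Return table names ordered so that referenced tables come first.

    dependencies maps each table name to the set of tables it references.
    Raises graphlib.CycleError if the tables reference each other in a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical tables: customers references countries and cities,
    # and cities references countries.
    dependencies = {
        "customers": {"countries", "cities"},
        "cities": {"countries"},
        "countries": set(),
    }
    print(generation_order(dependencies))

    try:
        generation_order({"a": {"b"}, "b": {"a"}})
    except CycleError as exc:
        print("cycle detected:", exc.args[1])
```

In this example each table depends on everything before it, so only one valid order exists; with independent tables, several orders would be equally valid.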