Skip to content

Metadata guide

Introduction

This guide presents the different forms of metadata used by nessy. It is structured by - object: the object which the metadata describes. This is currently tables, schemas, and views. - the necessity of the metadata: either required or optional.

In the following, metadata and necessity are presented by object.

Schema metadata

Schema metadata reference

Field Required? Metadata Type Description Example
catalog true string The catalog in which the schema is located. my_catalog
name true string Name of schema within a catalog. my_schema
storage_path false string Storage path of the schema. Acts as a fallback for table storage paths. "abfss://container@storageaccount.company.core.windows.net/schema_folder/schemas/"
business_properties false dictionary A dictionary containing case-specific metadata required by business or other stakeholders.
comment false string A comment added to the schema. What a wonderful schema to store data.

Example Schema metadata

catalog: "my_catalog"
name: "raw"
storage_path: "abfss://mycontainer@mystorageaccount.blob.core.windows.net/schemas/"

Info

The definition of the schema allows for dynamic values by addressing them in the format of {{env:NAME_OF_ENV_VARIABLE}}. This allows for the definition of environment-specific values, such as the storage path or catalog name.

Table metadata

Table metadata reference

Field Required? Metadata Type Description
name true string Name of the table.
columns true list of columns A list of columns containing column metadata for each column.
is_external false boolean Indicates whether the table is managed in DBX or external.
partition_by false list List of columns by which the table is partitioned or clustered (depending on the attribute liquid_clustering).
liquid_clustering false boolean Defines whether to to cluster the table by the partition_by columns (liquid, multi-dimensional clustering for a Delta Lake table) or to partition by the partition_by columns. Required if partition_by is not None.
composite_primary_key false list List of columns which denote unique rows.
properties false dictionary DBX delta table properties as described here
constraints false dictionary One or more constraint definitions as dictionary of constraint names, SQL code to filter on one or multiple columns, and a description.
foreign_keys false list A list of dictionaries containing foreign_key_columns, parent_table, parent_columns, and foreign_key_options for each foreign key in the table.
storage_path false string Storage path of the table. Acts as overwrite for schema storage path.
business_properties false dictionary A dictionary containing case-specific metadata required by business or other stakeholders.
comment false string A comment on the table.
data_source_format false string The format of the data, e.g. delta or parquet.

Foreign key metadata reference

Field Required? Metadata Type Description
foreign_key_columns false string The name of the columns in the current table that are foreign keys referencing columns in another table.
parent_table false string The name of the parent table containing the column being referenced by the foreign keys.
parent_columns false string The name of the columns in the parent table being referenced by the foreign keys.
foreign_key_option false list of strings The options for the foreign key constraint, such as "MATCH FULL", "ON UPDATE NO ACTION" or "ON DELETE NO ACTION". For details, see the Databricks documentation.

Column metadata reference

Field Required? Metadata Type Description
name true string Physical column name.
data_type true string The data types supported in Azure Databricks.
nullable true boolean Whether the column allows null values. By default it should not.
default_value false In case of non-nullable columns, the default value according to data modeling standards.
generated false string Optional generated expression for auto-generation of column values.
business_properties false dictionary A dictionary containing case-specific metadata required by business or other stakeholders.
comment false string A comment on the column.

Example table metadata

name: customers
is_external : false
partition_by:
    - "region"
liquid_clustering: true
data_source_format: "delta"
properties:
    "delta.autoOptimize.optimizeWrite": "true"
    "delta.enableDeletionVectors": "true"
constraints:
  realisticBirthConstraint:
      expression: "(birthDate > '1900-01-01')"
      description: "Ensures birthDate is realistic"
foreign_keys:
    - foreign_key_columns:
        - "service_level"
      parent_table: "service_level_lookup"
      parent_columns:
        - "service_level_name"
      foreign_key_option:
        - "MATCH FULL"
columns:
    - name: "customer_id"
      data_type: "long"
      nullable: false
      generated: "GENERATED ALWAYS AS IDENTITY"
    - name: "customer_name"
      data_type: "string"
      nullable: false
      default_value: "Max Mustermann"
business_properties:
  responsible_manager: "Dr. Michaela Mustermensch"

External vs. Managed Tables

Managed and external tables can be configured in multiple ways in the metadata. The following table provides an overview of the different possibilities:

is_external table_storage_path Result
false not set managed table
true not set external table derived from schema_storage_path
true set external table based on table_storage_path
false set no valid table