Snowflake Crawler¶
Introduction¶
The Snowflake crawler can crawl a Snowflake tenant and retrieve metadata of tables only(!). It then saves these metadata to a CLOE compliant Database metadata format. Alternatively it can also update an existing Database metadata format.
Use cases might be if there is an existing database solution in the project that should be used by CLOE. It can also be used in conjunction with the Snowflake role concept to automatically onboard new schemas/databases.
Requirements¶
- DevOps Pipeline (cloud execution) OR Python (local execution)
- See downloads section for example
-
Snowflake technical user
- With privileges to call get_ddl function on objects
Flow¶
The exact crawler flow might differ based on use case and infrastructure.
Arguments¶
| Argument | Optional | Description |
|---|---|---|
| output_json_path | Path where to output the final json to | |
| snowflake-user | The snowflake user the crawler should use. If not set is expected as CLOE_SNOWFLAKE_USER env variable. | |
| snowflake-password | The snowflake password the crawler should use. If not set is expected as CLOE_SNOWFLAKE_PASSWORD env variable. | |
| snowflake-account | The snowflake account the crawler should use. If not set is expected as CLOE_SNOWFLAKE_ACCOUNT env variable. | |
| snowflake-warehouse | The snowflake warehouse the crawler should use. If not set is expected as CLOE_SNOWFLAKE_WAREHOUSE env variable. | |
| snowflake-role | The snowflake role the crawler should use. Can also be set with the CLOE_SNOWFLAKE_ROLE env variable. If not set uses users default role. | |
| existing-repository-json-path | Reference an existing Repository json to update. | |
| ignore-columns | Ignore columns of tables and just retrieve information about the table itself. | |
| database-filter | Filters databases based on defined filter. Is used as Snowflake wildcard pattern in SHOW DATABASES. If no filter defined all databases are retrieved. Can be used to limit databases to environment. | |
| database-name-replace | Replaces parts of a name with the CLOE env placeholder. Can be regex. Can be used to remove the environment part in a database name. |
