Unity Catalog Setup & Governance
Intermediate · v1.0.0
Set up Databricks Unity Catalog for centralized data governance: metastore configuration, catalog/schema hierarchy, access controls, data lineage, and cross-workspace sharing.
Overview
Unity Catalog provides centralized governance for all data assets in Databricks. It manages access control, data lineage, auditing, and sharing across workspaces and clouds with a three-level namespace: catalog.schema.table.
Why This Matters
- Centralized governance: one place for all access control
- Data lineage: automatic tracking of data flow between tables
- Fine-grained access: column-level and row-level security
- Cross-workspace: share data between Databricks workspaces
- Compliance: audit logs for regulatory requirements
How It Works
Step 1: Create Catalog Hierarchy
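The original example for this step is not present here, so the following is a minimal sketch rather than the author's exact setup. It assumes one catalog per environment and one schema per medallion layer, as recommended under Best Practices, and it only builds the SQL statements as strings. In a Databricks notebook each statement could be passed to `spark.sql(...)`. The helper name `hierarchy_statements` is hypothetical, and creating a catalog without a managed location assumes the metastore has default storage configured.

```python
# Sketch: generate the SQL for a catalog-per-environment hierarchy.
# Environment and layer names follow the Best Practices section;
# the function name is an illustrative assumption.
ENVIRONMENTS = ["dev", "staging", "prod"]
LAYERS = ["bronze", "silver", "gold"]


def hierarchy_statements(environments=ENVIRONMENTS, layers=LAYERS):
    """Return CREATE CATALOG / CREATE SCHEMA statements in execution order."""
    statements = []
    for env in environments:
        statements.append(f"CREATE CATALOG IF NOT EXISTS {env}")
        for layer in layers:
            # Two-level name: catalog.schema
            statements.append(f"CREATE SCHEMA IF NOT EXISTS {env}.{layer}")
    return statements


# In a Databricks notebook (where `spark` is predefined) one might run:
#     for stmt in hierarchy_statements():
#         spark.sql(stmt)
for stmt in hierarchy_statements():
    print(stmt)
```

Tables created inside these schemas are then addressed with the full three-level name, for example `prod.gold.<table>`.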
Step 2: Configure Access Controls
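As an illustration of this step, the sketch below builds the grants a group typically needs to read every table in one schema. The group name `analysts` and the helper `read_grants` are assumptions for the example, not names from the original. The pattern follows the guidance later in this page: grant to groups, grant at schema level, and always include `USE CATALOG`.

```python
# Sketch: read-only access to one schema for one group.
# A principal needs USE CATALOG and USE SCHEMA on the parents
# in addition to SELECT before it can query tables.
def read_grants(catalog, schema, group):
    """Return the GRANT statements for read access to catalog.schema."""
    principal = f"`{group}`"  # backticks allow group names with spaces or dashes
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal}",
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO {principal}",
    ]


# Hypothetical usage: analysts read the gold layer in prod.
for stmt in read_grants("prod", "gold", "analysts"):
    print(stmt)  # in a notebook: spark.sql(stmt)
```

Because privileges granted on a schema are inherited by the tables in it, the `SELECT` grant here covers current and future tables in `prod.gold`.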
Step 3: External Locations
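A rough sketch of this step, assuming a storage credential has already been created by an account or metastore admin. The location name, bucket URL, credential name, and group below are all placeholders invented for the example, and the helper `external_location_statements` is hypothetical.

```python
# Sketch: register a cloud storage path as an external location
# and let a group read files from it.
def external_location_statements(name, url, credential, group):
    """Return CREATE EXTERNAL LOCATION and GRANT statements."""
    return [
        (
            f"CREATE EXTERNAL LOCATION IF NOT EXISTS {name} "
            f"URL '{url}' "
            f"WITH (STORAGE CREDENTIAL {credential})"
        ),
        f"GRANT READ FILES ON EXTERNAL LOCATION {name} TO `{group}`",
    ]


# Placeholder values; substitute your own bucket and credential.
for stmt in external_location_statements(
    name="raw_landing",
    url="s3://example-bucket/landing",
    credential="example_credential",
    group="data_engineers",
):
    print(stmt)  # in a notebook: spark.sql(stmt)
```

Without an external location covering the path, Unity Catalog will not allow reads from or writes to that cloud storage, which is the failure described under Common Mistakes.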
Step 4: Data Lineage and Auditing
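One way to inspect lineage and audit events is through Databricks system tables. The sketch below assumes the `system.access` schema is enabled for the metastore and that you have `SELECT` on it. The table name passed in and both helper functions are hypothetical, and the column selection is a starting point rather than a complete report.

```python
# Sketch: build queries against the lineage and audit system tables.
def upstream_lineage_query(table_full_name, days=7):
    """Query the tables that fed `table_full_name` in the last `days` days."""
    return (
        "SELECT DISTINCT source_table_full_name, entity_type "
        "FROM system.access.table_lineage "
        f"WHERE target_table_full_name = '{table_full_name}' "
        f"AND event_date >= date_sub(current_date(), {days})"
    )


def audit_query(days=1):
    """Query recent Unity Catalog audit events, newest first."""
    return (
        "SELECT event_time, user_identity.email, action_name "
        "FROM system.access.audit "
        "WHERE service_name = 'unityCatalog' "
        f"AND event_date >= date_sub(current_date(), {days}) "
        "ORDER BY event_time DESC"
    )


# Hypothetical usage; in a notebook: display(spark.sql(query))
print(upstream_lineage_query("prod.gold.daily_sales"))
print(audit_query())
```

Running the lineage query after a pipeline change is a quick check that a table still reads from the sources you expect.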
Best Practices
- One catalog per environment (dev, staging, prod)
- Three schemas per catalog, following the medallion architecture (bronze, silver, gold)
- Grant access at schema level, restrict at table level when needed
- Use groups, not individual users, for access grants
- Enable audit logging for compliance-sensitive data
- Tag sensitive columns for data classification
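The column-tagging practice above can be sketched as follows. The table, column, and tag key/value are invented for illustration, and `tag_column_statement` is a hypothetical helper that only builds the SQL string.

```python
# Sketch: build an ALTER TABLE statement that tags a sensitive column.
def tag_column_statement(table, column, tags):
    """Return a SET TAGS statement for one column; `tags` maps key -> value."""
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in tags.items())
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET TAGS ({pairs})"


# Hypothetical usage: mark an email column as PII.
print(tag_column_statement("prod.gold.customers", "email", {"classification": "pii"}))
```

Using a small, agreed set of tag keys and values makes it easier to search for classified columns later.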
Common Mistakes
- Granting access to individual users instead of groups
- Not creating separate dev and prod catalogs (development work can then touch production data)
- Skipping external location setup (Unity Catalog cannot access the cloud storage without it)
- Not reviewing lineage after pipeline changes
- Omitting the USE CATALOG grant (users cannot access anything in the catalog without it)