Notebook Development Standards

Beginner

Enforce consistent Databricks notebook structure — widget parameterization, markdown documentation, cell organization, error handling, and production-readiness requirements.

File Patterns

**/*.py
**/*.sql

This rule applies to files matching the patterns above.

Rule Content


# Notebook Development Standards

## Rule
All Databricks notebooks MUST use widgets for parameters, include markdown documentation, follow the standard cell structure, and handle errors for production readiness.

## Required Cell Structure
```
Cell 1: Markdown — Title, description, author, last updated
Cell 2: Python — Widget definitions (ALL configurable parameters)
Cell 3: Python — Imports and configuration
Cell 4+: Python/SQL — Transformations with markdown section headers
Cell N-1: Python — Data quality assertions
Cell N: Python/SQL — Write output to Delta table
```
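The required structure can be checked mechanically. The sketch below assumes notebooks are stored in Databricks' source export format. In that format, cells are separated by `# COMMAND ----------` (or `-- COMMAND ----------` in SQL notebooks), and markdown cells begin with `%md` behind a `MAGIC` prefix. `split_cells` and `check_cell_structure` are hypothetical helper names. The "writes to Delta" check is a rough keyword heuristic, not a real parser.

```python
import re
from typing import List

# Hypothetical checker. Assumes the Databricks "source" export format:
# cells separated by "# COMMAND ----------" (Python) or "-- COMMAND ----------" (SQL),
# markdown cells starting with "# MAGIC %md" / "-- MAGIC %md".
SEPARATOR = re.compile(r"\s*(#|--) COMMAND -{10}\s*")
MARKDOWN = re.compile(r"(#|--) MAGIC %md")
WIDGET_HINT = re.compile(r"dbutils\.widgets\.|\bCREATE\s+WIDGET\b", re.IGNORECASE)
WRITE_HINT = re.compile(
    r"saveAsTable|\.save\(|\bMERGE\s+INTO\b|\bINSERT\s+(INTO|OVERWRITE)\b", re.IGNORECASE
)


def split_cells(source: str) -> List[str]:
    """Split an exported notebook into non-empty, stripped cell bodies."""
    lines = source.splitlines()
    # Drop the "Databricks notebook source" header line if present.
    if lines and lines[0].strip().endswith("Databricks notebook source"):
        lines = lines[1:]
    cells, current = [], []
    for line in lines:
        if SEPARATOR.fullmatch(line):
            cells.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    cells.append("\n".join(current).strip())
    return [c for c in cells if c]


def check_cell_structure(source: str) -> List[str]:
    """Return a list of structure violations (an empty list means the notebook passes)."""
    cells = split_cells(source)
    if len(cells) < 3:
        return ["expected at least 3 cells (title, widgets, imports)"]
    problems = []
    if not MARKDOWN.match(cells[0]):
        problems.append("cell 1 must be a markdown title cell")
    if not WIDGET_HINT.search(cells[1]):
        problems.append("cell 2 must define widgets")
    if not WRITE_HINT.search(cells[-1]):
        problems.append("last cell should write output to a Delta table")
    return problems
```

Running this in CI against every `*.py` / `*.sql` export gives reviewers a quick structural signal. It does not replace reading the notebook.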

## Widget Requirements

### Good — Fully Parameterized
```python
from datetime import date

# All configurable values as widgets
dbutils.widgets.text("env", "dev", "Environment")
dbutils.widgets.text("run_date", "", "Run Date (YYYY-MM-DD)")
dbutils.widgets.text("catalog", "", "Catalog Name")
dbutils.widgets.dropdown("mode", "append", ["append", "overwrite"], "Write Mode")

# Read values back; empty text widgets fall back to derived defaults
env = dbutils.widgets.get("env")
run_date = dbutils.widgets.get("run_date") or date.today().isoformat()
catalog = dbutils.widgets.get("catalog") or f"{env}_catalog"
mode = dbutils.widgets.get("mode")
```
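A variation on the widget pattern validates values once, right after reading them. That way a malformed `run_date` fails at the top of the run rather than deep inside a transformation. In this sketch, `resolve_params` and `JobParams` are hypothetical names, and the accepted `env` values are an assumption. The widget getter is injected (pass `dbutils.widgets.get` in a notebook), so the logic can be tested without a cluster.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

VALID_ENVS = {"dev", "staging", "prod"}  # assumption: adjust to your environments


@dataclass(frozen=True)
class JobParams:
    env: str
    run_date: date
    catalog: str
    mode: str


def resolve_params(get: Callable[[str], str]) -> JobParams:
    """Read widget values through `get` (e.g. dbutils.widgets.get) and validate them."""
    env = get("env").strip()
    if env not in VALID_ENVS:
        raise ValueError(f"unknown env: {env!r}")
    raw_date = get("run_date").strip()
    # Empty widget -> today; otherwise the value must parse as an ISO date.
    run_date = date.fromisoformat(raw_date) if raw_date else date.today()
    catalog = get("catalog").strip() or f"{env}_catalog"
    mode = get("mode")
    if mode not in {"append", "overwrite"}:
        raise ValueError(f"invalid mode: {mode!r}")
    return JobParams(env, run_date, catalog, mode)
```

In a notebook this would be called as `params = resolve_params(dbutils.widgets.get)`, after which `params.catalog`, `params.run_date`, and so on replace the loose variables.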

### Bad — Hardcoded Values
```python
# NEVER hardcode these
catalog = "prod_catalog"  # Should be a widget
run_date = "2026-03-01"   # Should be a widget
```
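Hardcoded values like those above can be caught before review with a simple scan. This is a heuristic sketch. The regexes (quoted `YYYY-MM-DD` dates and quoted names starting with `dev_`, `staging_`, or `prod_`) are assumptions about what counts as an environment-specific literal. `find_hardcoded` is a hypothetical helper. Stripping `#` comments this way is naive: it also truncates strings that contain `#`, and it ignores SQL `--` comments.

```python
import re
from typing import List, Tuple

# Assumed heuristics: quoted ISO dates and quoted "<env>_..." names.
DATE_LITERAL = re.compile(r"""["']\d{4}-\d{2}-\d{2}["']""")
ENV_LITERAL = re.compile(r"""["'](dev|staging|prod)_\w+["']""")


def find_hardcoded(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that look like hardcoded dates or env names."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naive: drop trailing Python comments
        if DATE_LITERAL.search(code) or ENV_LITERAL.search(code):
            hits.append((lineno, line.strip()))
    return hits
```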

## Markdown Requirements
- Title cell with notebook purpose and author
- Section headers before each logical transformation step
- Comments explaining business logic (not just code mechanics)
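Title and section-header cells can be generated from a template, so every notebook carries the same metadata. The sketch assumes the Databricks source format, where a markdown cell is written as `# MAGIC %md` followed by `# MAGIC`-prefixed lines. `md_cell` and `title_cell` are hypothetical helpers, and the title layout is only a suggestion.

```python
from datetime import date
from typing import Optional


def md_cell(markdown: str) -> str:
    """Render markdown text as a Databricks source-format markdown cell."""
    lines = ["# MAGIC %md"]
    for line in markdown.strip().splitlines():
        lines.append(f"# MAGIC {line}".rstrip())  # blank lines become "# MAGIC"
    return "\n".join(lines)


def title_cell(title: str, purpose: str, author: str, updated: Optional[date] = None) -> str:
    """Cell 1: title, description, author, and last-updated date."""
    updated = updated or date.today()
    return md_cell(
        f"# {title}\n"
        f"{purpose}\n\n"
        f"- **Author:** {author}\n"
        f"- **Last updated:** {updated.isoformat()}"
    )
```

The same `md_cell` helper can emit the section header that precedes each transformation step, for example `md_cell("## Step 2: Deduplicate events")`.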

## Error Handling
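```python
# Good — explicit error handling
try:
    df = spark.read.table(f"{catalog}.bronze.events")
    # limit(1) avoids a full count just to test for emptiness, and an explicit
    # raise is not stripped under `python -O` the way `assert` is
    if df.limit(1).count() == 0:
        raise ValueError("Source table is empty")
except Exception as e:
    # dbutils.notebook.exit() would end the run as *succeeded*;
    # re-raise with context so the job task is marked as failed
    raise RuntimeError(f"FAILED reading {catalog}.bronze.events: {e}") from e
```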
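When several steps need the same handling, the try/except can be factored into a decorator. The sketch below (`fail_loudly` is a hypothetical name) logs which step failed, including the traceback, and then re-raises. It relies on the fact that an uncaught exception fails the Databricks notebook task, whereas `dbutils.notebook.exit()` completes the run successfully.

```python
import functools
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("notebook")


def fail_loudly(step: str) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Decorator: log the failing pipeline step, then re-raise the original exception."""
    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            try:
                return fn(*args, **kwargs)
            except Exception:
                log.exception("step %s failed", step)  # logs message plus traceback
                raise  # never swallow: the task must end in a failed state
        return wrapper
    return decorator
```

In a notebook, each step would be a small function, for example one decorated with `@fail_loudly("read_events")` that calls `spark.read.table(...)` and returns the DataFrame.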

## Anti-Patterns
- Hardcoded dates, paths, catalog names, or environment strings
- No markdown cells between code cells (unreadable)
- Using display() as the final output (results not persisted)
- No error handling (silent failures in production)
- collect() on large datasets without a limit (pulls every row to the driver; OOM risk)
- print() for logging instead of structured logging
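For the last point, Python's standard `logging` module can emit one JSON object per record. That output is easier to search and parse than `print()` output in driver logs. This is a minimal standard-library sketch. `JsonFormatter`, `get_logger`, and the `context` key passed through `extra=` are illustrative names, not a Databricks API.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields supplied as extra={"context": {...}} (assumed convention).
        payload.update(getattr(record, "context", {}))
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(name: str = "notebook") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers when a cell is re-run
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger
```

Usage would look like `log = get_logger()` followed by `log.info("wrote %d rows", n, extra={"context": {"table": target}})`.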
