Mastering dbt: From Core Concepts to CI/CD, Lineage, and Medallion Architecture
Modern data teams don’t struggle with collecting data anymore — they struggle with trusting it. That’s exactly the problem dbt was built to solve.
This post is a complete, end-to-end guide to dbt, covering:
What problems it solves
Core concepts (models, materialization, lineage)
Medallion architecture
CI/CD with GitHub Actions
Interview-ready mental models and scenarios
1. What is dbt?
dbt (data build tool) is a SQL-first transformation framework that runs transformations inside your data warehouse.
What dbt is for
Transforming raw data into analytics-ready tables
Applying software engineering practices to analytics
Enabling testing, documentation, and lineage
What dbt is NOT for
Data ingestion (Fivetran, NiFi, Glue)
Orchestration (Airflow)
Streaming or real-time processing
One-liner: dbt brings version control, testing, and CI/CD to SQL-based analytics.
2. The Problem dbt Solves
Before dbt:
Business logic lives in BI tools
No version control for SQL
Broken dashboards after schema changes
No visibility into downstream impact
With dbt:
All transformations live in Git
SQL is modular and testable
Automatic dependency graph (lineage)
Safe deployments via CI/CD
3. Core dbt Concepts (You Must Know)
3.1 Models
SQL
SELECTstatementsEach model becomes a table, view, or incremental table
Dependencies defined using
ref()
select *
from {{ ref('stg_orders') }}
3.2 Materialization (Very Important)
Materialization defines how a model is stored in the warehouse.
| Type | Use Case |
view | Lightweight, always fresh |
table | BI-facing, fast queries |
incremental | Large fact tables |
ephemeral | Reusable logic, no storage |
Example (incremental):
{{ config(materialized='incremental', unique_key='order_id') }}
select *
from raw.orders
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
Materialization is dbt’s cost and performance control knob.
3.3 Sources
Explicit declaration of raw tables
Enables freshness and source tests
sources:
- name: raw
tables:
- name: orders
3.4 Tests
Built-in tests:
not_nulluniquerelationships
Custom SQL tests supported
Tests fail the pipeline, not silently.
3.5 Macros
Reusable SQL using Jinja
Helps avoid duplication
Used for:
Date logic
Environment-based behavior
Custom tests
3.6 Snapshots
Track historical changes (SCD Type 2)
Useful when source systems don’t provide history
4. Data Lineage Explained Simply
Lineage = visual map of data flow from source to final model.
dbt builds lineage automatically using:
source()ref()
Example:
raw.orders
↓
stg_orders
↓
int_orders
↓
fct_orders
Why lineage matters
Impact analysis
Faster debugging
Safer CI/CD
Easier onboarding
dbt lineage shows data dependencies, not execution order.
5. Medallion Architecture (Bronze → Silver → Gold)

Bronze (Raw)
Ingested data
Minimal transformation
Append-only
Silver (Cleaned)
Deduplication
Type casting
Business logic begins
Gold (Analytics)
Aggregated
Business-ready
Used by BI tools
How dbt fits
Bronze → dbt sources
Silver → staging models
Gold → marts
6. dbt in the Modern Data Stack
| Tool | Responsibility |
| NiFi / Fivetran | Ingestion |
| AWS Glue | Heavy Spark ETL |
| Airflow | Orchestration |
| dbt | Analytics transformations |
dbt focuses on the last mile of analytics.
7. CI/CD for dbt (Production-Grade)
Why CI/CD is critical
Prevent broken SQL in prod
Catch schema changes early
Enforce data quality
CI (Pull Requests)
dbt compileSlim CI (
state:modified+)Run tests on changed models only
CD (Main branch)
Full
dbt runFull
dbt testGenerate documentation
8. GitHub Actions – Complete CI/CD Workflow
Environments
dev→ CIprod→ production deploy
CI + CD Workflow
name: dbt-ci-cd
on:
pull_request:
push:
branches: [main]
jobs:
dbt-ci:
runs-on: ubuntu-latest
environment: dev
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.10'
- run: pip install dbt-snowflake
- run: dbt deps
- run: dbt compile
- run: dbt test
dbt-prod:
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
environment: prod
needs: dbt-ci
steps:
- uses: actions/checkout@v4
- run: pip install dbt-snowflake
- run: dbt run
- run: dbt test
9. Real Interview Scenarios
Why dbt over stored procedures?
Version control
Testing
Lineage
CI/CD
Modular SQL
How do you handle large tables?
Incremental models
Partitioning
Late-arriving data logic
How do you stop bad data in prod?
CI tests
Protected branches
Mandatory PR checks
10. Final Mental Model
dbt → Transform data
Materialization → Control cost & performance
Lineage → Trust & impact analysis
Medallion architecture → Structure data layers
CI/CD → Safe, scalable deployments
Final One-Liner
dbt is a SQL-based analytics engineering framework that enables trusted, testable, and production-ready data transformations using modern software engineering practices.
