Mastering dbt: From Core Concepts to CI/CD, Lineage, and Medallion Architecture

Modern data teams don’t struggle with collecting data anymore — they struggle with trusting it. That’s exactly the problem dbt was built to solve.

This post is a complete, end-to-end guide to dbt, covering:

What problems it solves
Core concepts (models, materialization, lineage)
Medallion architecture
CI/CD with GitHub Actions
Interview-ready mental models and scenarios

1. What is dbt?

dbt (data build tool) is a SQL-first transformation framework that runs transformations inside your data warehouse.

What dbt is for

Transforming raw data into analytics-ready tables
Applying software engineering practices to analytics
Enabling testing, documentation, and lineage

What dbt is NOT for

Data ingestion (Fivetran, NiFi, Glue)
Orchestration (Airflow)
Streaming or real-time processing

One-liner: dbt brings version control, testing, and CI/CD to SQL-based analytics.

2. The Problem dbt Solves

Before dbt:

Business logic lives in BI tools
No version control for SQL
Broken dashboards after schema changes
No visibility into downstream impact

With dbt:

All transformations live in Git
SQL is modular and testable
Automatic dependency graph (lineage)
Safe deployments via CI/CD

3. Core dbt Concepts (You Must Know)

3.1 Models

SQL SELECT statements
Each model becomes a table, view, or incremental table
Dependencies defined using ref()

select *
from {{ ref('stg_orders') }}

3.2 Materialization (Very Important)

Materialization defines how a model is stored in the warehouse.

Type	Use Case
`view`	Lightweight, always fresh
`table`	BI-facing, fast queries
`incremental`	Large fact tables
`ephemeral`	Reusable logic, no storage

Example (incremental):

{{ config(materialized='incremental', unique_key='order_id') }}

select *
from raw.orders
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}

Materialization is dbt’s cost and performance control knob.

3.3 Sources

Explicit declaration of raw tables
Enables freshness and source tests

sources:
  - name: raw
    tables:
      - name: orders

3.4 Tests

Built-in tests:
- not_null
- unique
- relationships
Custom SQL tests supported

Tests fail the pipeline, not silently.

3.5 Macros

Reusable SQL using Jinja
Helps avoid duplication
Used for:
- Date logic
- Environment-based behavior
- Custom tests

3.6 Snapshots

Track historical changes (SCD Type 2)
Useful when source systems don’t provide history

4. Data Lineage Explained Simply

Lineage = visual map of data flow from source to final model.

dbt builds lineage automatically using:

source()
ref()

Example:

raw.orders
   ↓
stg_orders
   ↓
int_orders
   ↓
fct_orders

Why lineage matters

Impact analysis
Faster debugging
Safer CI/CD
Easier onboarding

dbt lineage shows data dependencies, not execution order.

5. Medallion Architecture (Bronze → Silver → Gold)

Bronze (Raw)

Ingested data
Minimal transformation
Append-only

Silver (Cleaned)

Deduplication
Type casting
Business logic begins

Gold (Analytics)

Aggregated
Business-ready
Used by BI tools

How dbt fits

Bronze → dbt sources
Silver → staging models
Gold → marts

6. dbt in the Modern Data Stack

Tool	Responsibility
NiFi / Fivetran	Ingestion
AWS Glue	Heavy Spark ETL
Airflow	Orchestration
dbt	Analytics transformations

dbt focuses on the last mile of analytics.

7. CI/CD for dbt (Production-Grade)

Why CI/CD is critical

Prevent broken SQL in prod
Catch schema changes early
Enforce data quality

CI (Pull Requests)

dbt compile
Slim CI (state:modified+)
Run tests on changed models only

CD (Main branch)

Full dbt run
Full dbt test
Generate documentation

8. GitHub Actions – Complete CI/CD Workflow

Environments

dev → CI
prod → production deploy

CI + CD Workflow

name: dbt-ci-cd

on:
  pull_request:
  push:
    branches: [main]

jobs:
  dbt-ci:
    runs-on: ubuntu-latest
    environment: dev
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - run: pip install dbt-snowflake
      - run: dbt deps
      - run: dbt compile
      - run: dbt test

  dbt-prod:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: prod
    needs: dbt-ci
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-snowflake
      - run: dbt run
      - run: dbt test

9. Real Interview Scenarios

Why dbt over stored procedures?

Version control
Testing
Lineage
CI/CD
Modular SQL

How do you handle large tables?

Incremental models
Partitioning
Late-arriving data logic

How do you stop bad data in prod?

CI tests
Protected branches
Mandatory PR checks

10. Final Mental Model

dbt → Transform data
Materialization → Control cost & performance
Lineage → Trust & impact analysis
Medallion architecture → Structure data layers
CI/CD → Safe, scalable deployments

Final One-Liner

dbt is a SQL-based analytics engineering framework that enables trusted, testable, and production-ready data transformations using modern software engineering practices.

Mastering dbt: From Core Concepts to CI/CD, Lineage, and Medallion Architecture

1. What is dbt?

What dbt is for

What dbt is NOT for

2. The Problem dbt Solves

3. Core dbt Concepts (You Must Know)

3.1 Models

3.2 Materialization (Very Important)

3.3 Sources

3.4 Tests

3.5 Macros

3.6 Snapshots

4. Data Lineage Explained Simply

Why lineage matters

5. Medallion Architecture (Bronze → Silver → Gold)

Bronze (Raw)

Silver (Cleaned)

Gold (Analytics)

How dbt fits

6. dbt in the Modern Data Stack

7. CI/CD for dbt (Production-Grade)

Why CI/CD is critical

CI (Pull Requests)

CD (Main branch)

8. GitHub Actions – Complete CI/CD Workflow

Environments

CI + CD Workflow

9. Real Interview Scenarios

Why dbt over stored procedures?

How do you handle large tables?

How do you stop bad data in prod?

10. Final Mental Model

Final One-Liner

Comments

More from this blog

Fixing AWS Billing & Cost Explorer Access for IAM Identity Center (SSO) Users

Stop Forgetting: Automate Your Study Schedule with Google Apps Script

A Plain-English Guide to Modern Data Architecture (No Jargon, I Promise)

Zero to Production: Deploying Next.js to AWS EC2 using Docker and GitHub Actions (Self-Hosted)

Command Palette

1. What is dbt?

What dbt is for

What dbt is NOT for

2. The Problem dbt Solves

3. Core dbt Concepts (You Must Know)

3.1 Models

3.2 Materialization (Very Important)

3.3 Sources

3.4 Tests

3.5 Macros

3.6 Snapshots

4. Data Lineage Explained Simply

Why lineage matters

5. Medallion Architecture (Bronze → Silver → Gold)

Bronze (Raw)

Silver (Cleaned)

Gold (Analytics)

How dbt fits

6. dbt in the Modern Data Stack

7. CI/CD for dbt (Production-Grade)

Why CI/CD is critical

CI (Pull Requests)

CD (Main branch)

8. GitHub Actions – Complete CI/CD Workflow

Environments

CI + CD Workflow

9. Real Interview Scenarios

Why dbt over stored procedures?

How do you handle large tables?

How do you stop bad data in prod?

10. Final Mental Model

Final One-Liner

Comments

More from this blog