Let's be honest: the data industry has a pricing problem.
- Tableau: ~$70/user/month (minimum 5 users = $4,200/year)
- Looker: ~$60/user/month (minimum 10 users = $7,200/year)
- Power BI Premium: ~$20/user/month (minimum 500 users = $120,000/year)
For enterprise companies with dedicated data teams, these prices are rounding errors. For the small e-commerce business doing $2M/year in revenue? They're deal-breakers.
Here's the thing: you don't need expensive tools to build analytics that actually get used.
I recently built a complete e-commerce analytics platform—customer segmentation, product performance, seller scorecards, geographic analysis—entirely with open-source tools. The licensing cost? $0.
Let me show you how.
The Open-Source Alternative Stack
Instead of proprietary platforms, we'll use three open-source tools that have been running in production for years:
| Tool | Role | Replaces |
|---|---|---|
| Supabase | Data warehouse + API | Snowflake, BigQuery (for SME workloads) |
| dbt | Transformation layer | Looker's LookML, Tableau Prep |
| Metabase | Visualization + exploration | Tableau, Looker, Power BI |
Add Docker for deployment and optionally Airflow for orchestration, and you have a complete modern data stack.
Why This Stack Works
Supabase: PostgreSQL, But Actually Pleasant to Use
People call it a "Firebase alternative," but that undersells it. Under the hood, it's PostgreSQL with a beautiful developer experience:
What you get:
- Full SQL database with proper ACID compliance
- Built-in REST API (no backend code needed)
- Row-Level Security for GDPR/CCPA compliance
- Real-time subscriptions for live dashboards
- Authentication system for access control
- Dashboard for debugging and exploration
For analytics specifically:
- All the indexing and query optimization you expect from PostgreSQL
- Extensions like `pg_stat_statements` for query profiling (see the sketch below)
- Full compatibility with dbt and any SQL-based tool
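As a quick example, here is a minimal sketch of profiling slow dashboard queries with `pg_stat_statements` (column names assume PostgreSQL 13 or later):

```sql
-- Enable the extension (ships with standard PostgreSQL builds; Supabase exposes it as well)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by total execution time: a quick way to spot the dashboard
-- queries worth indexing or pre-aggregating
SELECT
    LEFT(query, 80)                    AS query_start,
    calls,
    ROUND(total_exec_time::numeric, 1) AS total_ms,
    ROUND(mean_exec_time::numeric, 1)  AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```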
dbt: SQL on Steroids
dbt (data build tool) transforms how you write analytics SQL. Instead of sprawling stored procedures or fragile ETL scripts, you get:
Version-controlled SQL:
Every transformation is a .sql file in Git. No more "what query created this table?"
Automatic dependency management: Write models that reference other models, and dbt figures out the execution order:
```sql
-- models/mart/mart_customer_analytics.sql
SELECT
    c.customer_id,
    c.customer_unique_id,
    c.customer_city,
    c.customer_state,
    co.first_order_date,
    co.last_order_date,
    co.total_orders,
    co.total_spent,
    co.avg_order_value,
    CASE
        WHEN co.total_spent > 1000 THEN 'VIP'
        WHEN co.total_spent > 500 THEN 'Regular'
        ELSE 'New'
    END as customer_segment
FROM {{ ref('stg_olist__customers') }} c
LEFT JOIN {{ ref('int_customer_orders') }} co
    ON c.customer_id = co.customer_id
```
Built-in testing: Catch data quality issues before they hit dashboards:
```yaml
models:
  - name: mart_customer_analytics
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: total_spent
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
```
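Tests like `dbt_utils.accepted_range` (and the `dbt_expectations` test used later in this post) come from community packages declared in `packages.yml`. A minimal sketch, with version ranges as assumptions:

```yaml
# packages.yml (versions here are illustrative; pin whatever your dbt version supports)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```

Run `dbt deps` (Step 3 below) to install them.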
Auto-generated documentation:
Run dbt docs generate and get a complete data catalog with lineage graphs.
Metabase: BI Your Team Won't Hate
I've seen Tableau dashboards that cost $50,000 to build and never get used. The problem isn't the tool—it's the learning curve.
Metabase takes a different approach:
Question-based interface: Users ask questions in plain English, and Metabase suggests visualizations. No SQL required for basic exploration.
Saved questions become building blocks: Create a question once, reuse it across dashboards.
Embedding support: White-label dashboards and embed them in your product.
Self-serve without chaos: Curated collections and permissions keep things organized.
The Three-Layer Architecture
The secret to maintainable analytics is separation of concerns. Raw data, business logic, and presentation should live in different layers.
Layer 1: Staging (Raw Data Preparation)
Take messy source data and make it consistent:
```sql
-- models/staging/stg_olist__orders.sql
WITH source AS (
    SELECT * FROM {{ source('olist', 'orders') }}
),

cleaned AS (
    SELECT
        order_id,
        customer_id,
        order_status,
        CAST(order_purchase_timestamp AS TIMESTAMP) as order_purchase_at,
        CAST(order_approved_at AS TIMESTAMP) as order_approved_at,
        CAST(order_delivered_carrier_date AS TIMESTAMP) as shipped_at,
        CAST(order_delivered_customer_date AS TIMESTAMP) as delivered_at,
        CAST(order_estimated_delivery_date AS TIMESTAMP) as estimated_delivery_at
    FROM source
    WHERE order_id IS NOT NULL
)

SELECT * FROM cleaned
```
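The `{{ source('olist', 'orders') }}` reference assumes the raw tables are declared in a sources file. A minimal sketch, where the schema name is an assumption to match wherever the loader writes:

```yaml
# models/staging/_olist__sources.yml (schema name is an assumption; adjust to your loader's target)
version: 2

sources:
  - name: olist
    schema: public
    tables:
      - name: orders
      - name: order_items
      - name: customers
      - name: products
      # ...plus the remaining raw Olist tables
```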
Rules for staging:
- One staging model per source table
- Rename columns to consistent conventions
- Cast data types explicitly
- Remove obvious garbage rows
- No business logic—just cleaning
Layer 2: Intermediate (Business Logic Processing)
Join tables and compute metrics:
```sql
-- models/intermediate/int_customer_orders.sql
WITH orders AS (
    SELECT * FROM {{ ref('stg_olist__orders') }}
    WHERE order_status = 'delivered'
),

order_items AS (
    SELECT * FROM {{ ref('stg_olist__order_items') }}
),

customer_orders AS (
    SELECT
        o.customer_id,
        MIN(o.order_purchase_at) as first_order_date,
        MAX(o.order_purchase_at) as last_order_date,
        COUNT(DISTINCT o.order_id) as total_orders,
        SUM(oi.price + oi.freight_value) as total_spent,
        -- average order value = total spent / number of orders;
        -- a plain AVG over the joined rows would give average *item* value instead
        SUM(oi.price + oi.freight_value)
            / NULLIF(COUNT(DISTINCT o.order_id), 0) as avg_order_value
    FROM orders o
    LEFT JOIN order_items oi ON o.order_id = oi.order_id
    GROUP BY o.customer_id
)

SELECT * FROM customer_orders
```
Rules for intermediate:
- Join related entities
- Compute derived metrics
- No presentation logic (formatting, sorting)
- Reusable across multiple marts
Layer 3: Marts (Analytics-Ready Models)
Purpose-built tables for specific use cases:
```sql
-- models/mart/mart_product_analytics.sql
WITH products AS (
    SELECT * FROM {{ ref('stg_olist__products') }}
),

performance AS (
    SELECT * FROM {{ ref('int_product_performance') }}
),

product_analytics AS (
    SELECT
        p.product_id,
        p.product_category_name,
        p.product_weight_g,
        p.product_length_cm,
        p.product_height_cm,
        p.product_width_cm,
        perf.total_orders,
        perf.total_revenue,
        perf.avg_price,
        perf.avg_review_score,
        RANK() OVER (
            PARTITION BY p.product_category_name
            ORDER BY perf.total_revenue DESC
        ) as category_rank
    FROM products p
    LEFT JOIN performance perf ON p.product_id = perf.product_id
)

SELECT * FROM product_analytics
```
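The `int_product_performance` model referenced above isn't shown in this post; here is a plausible minimal sketch, assuming Olist order-items and order-reviews staging models with the column names the mart expects (the actual model in the repo may differ):

```sql
-- models/intermediate/int_product_performance.sql
-- (sketch only; assumes stg_olist__order_items and stg_olist__order_reviews exist)
WITH order_items AS (
    SELECT * FROM {{ ref('stg_olist__order_items') }}
),

reviews AS (
    SELECT * FROM {{ ref('stg_olist__order_reviews') }}
)

SELECT
    oi.product_id,
    COUNT(DISTINCT oi.order_id)      AS total_orders,
    SUM(oi.price + oi.freight_value) AS total_revenue,
    AVG(oi.price)                    AS avg_price,
    AVG(r.review_score)              AS avg_review_score
FROM order_items oi
LEFT JOIN reviews r ON oi.order_id = r.order_id
GROUP BY oi.product_id
```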
Rules for marts:
- One mart per business domain (customers, products, orders)
- Include all metrics a dashboard might need
- Optimized for BI tool queries (pre-joined, pre-aggregated)
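One concrete way to implement that last rule is to materialize marts as tables (and keep staging as lightweight views) in `dbt_project.yml`; a sketch, with the project name as a placeholder:

```yaml
# dbt_project.yml (excerpt; folder names match the layers above, project name is hypothetical)
models:
  ecommerce_analytics:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    mart:
      +materialized: table
```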
Setting It Up: A Step-by-Step Guide
Step 1: Start the Stack
Clone the repo and launch everything with Docker:
```bash
git clone https://github.com/dasdatasensei/supabase_ecommerce_analytics
cd supabase_ecommerce_analytics
docker compose up -d
```
Wait 60 seconds for services to initialize. Check status:
```bash
docker compose ps
```
Step 2: Load Sample Data
The project includes a Python loader for the Brazilian E-Commerce dataset:
```bash
cd data_loader
pip install -r requirements.txt
python load_data.py
```
This loads ~100K orders across 9 tables into Supabase.
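A quick sanity check once the load finishes (the table and schema names are assumptions; point the query at wherever the loader writes):

```sql
-- Row counts by order status; expect roughly 100K rows in total
SELECT order_status, COUNT(*) AS order_count
FROM orders
GROUP BY order_status
ORDER BY order_count DESC;
```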
Step 3: Run dbt Transformations
```bash
cd dbt_project
dbt deps           # Install packages
dbt build          # Run models + tests
dbt docs generate  # Create documentation
dbt docs serve     # View documentation site
```
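For these commands to reach Supabase, dbt needs a `profiles.yml` pointing at the Postgres instance. A minimal sketch follows; the profile name must match the `profile:` key in `dbt_project.yml`, and the credentials are assumed to mirror the Docker defaults:

```yaml
# ~/.dbt/profiles.yml (values assumed to match the docker compose setup above)
ecommerce_analytics:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: postgres
      password: "{{ env_var('POSTGRES_PASSWORD') }}"
      dbname: postgres
      schema: analytics
      threads: 4
```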
Step 4: Connect Metabase
- Open `http://localhost:3000`
- Create an admin account
- Add Supabase as a database:
  - Host: `supabase-db`
  - Port: `5432`
  - Database: `postgres`
  - User: `postgres`
  - Password: (from your `.env`)
Step 5: Import Dashboards
The project includes pre-built dashboard JSON files. Import them from Metabase's admin panel.
Common Mistakes (and How to Avoid Them)
Mistake 1: Skipping Dimensional Modeling
The trap: "We'll just create flat tables—it's simpler."
The reality: You'll end up with:
- Duplicate logic across dashboards
- Inconsistent metrics ("Why does marketing show different revenue?")
- Slow queries as data grows
The fix: Invest time in proper star schema design. Your future self will thank you.
Mistake 2: No Data Quality Tests
The trap: "We'll validate manually when something looks wrong."
The reality: Bad data propagates silently. By the time you notice, you've made decisions on incorrect numbers.
The fix: Add dbt tests for every critical metric:
```yaml
- name: total_revenue
  tests:
    - not_null
    - dbt_utils.accepted_range:
        min_value: 0
    - dbt_expectations.expect_column_values_to_be_of_type:
        column_type: numeric
```
Mistake 3: Over-Engineering Early
The trap: "We need real-time streaming, ML predictions, and multi-cloud deployment."
The reality: You need basic dashboards that business users will actually look at.
The fix: Start with batch processing, simple aggregations, and core metrics. Add complexity only when justified by usage.
Mistake 4: Ignoring Documentation
The trap: "The SQL is self-documenting."
The reality: In 6 months, no one (including you) will remember why customer_segment uses those specific thresholds.
The fix: Use dbt's built-in documentation. Descriptions live in the model's .yml properties file, which is also where you can spell out those thresholds:

```yaml
models:
  - name: mart_customer_analytics
    description: "Customer segmentation based on lifetime value and purchase frequency."
    columns:
      - name: customer_segment
        description: "VIP when total_spent > 1000, Regular when > 500, otherwise New."
```
When to Graduate to Paid Tools
This stack is perfect for:
- Companies with <$10M annual revenue
- Teams with <50 analytics users
- Datasets under 100GB
- Batch refresh cadence (hourly or daily)
Consider paid alternatives when:
- You need sub-second real-time analytics
- Your data team exceeds 5 full-time analysts
- Compliance requirements demand vendor SLAs
- Dataset size exceeds 1TB
Even then, dbt remains valuable—it integrates with Snowflake, BigQuery, and Redshift.
The Bottom Line
You don't need a six-figure budget to build serious analytics.
The combination of Supabase + dbt + Metabase gives you:
- ✅ Production-grade PostgreSQL database
- ✅ Version-controlled, tested transformations
- ✅ Self-serve dashboards for business users
- ✅ Docker-based deployment that actually works
- ✅ $0 in licensing costs
These tools exist, they're free, and they work. Stop paying six figures for dashboards nobody looks at.
I wrote up the full build process with architecture decisions in the E-Commerce Analytics case study. There's also a setup checklist if you want to replicate this.
Thinking about building a BI stack like this? Let me know—I'm always curious what problems people are trying to solve.



