Let's be honest: the data industry has a pricing problem.
- Tableau: ~$70/user/month (minimum 5 users = $4,200/year)
- Looker: ~$60/user/month (minimum 10 users = $7,200/year)
- Power BI Premium: ~$20/user/month (minimum 500 users = $120,000/year)
For enterprise companies with dedicated data teams, these prices are rounding errors. For the small e-commerce business doing $2M/year in revenue? They're deal-breakers.
Here's the thing: you don't need expensive tools to build analytics that actually get used.
I recently built a complete e-commerce analytics platform—customer segmentation, product performance, seller scorecards, geographic analysis—entirely with open-source tools. The licensing cost? $0.
Let me show you how.
The Open-Source Alternative Stack
Instead of proprietary platforms, we'll use three open-source tools that have been running in production for years:
| Tool | Role | Replaces |
|---|---|---|
| Supabase | Data warehouse + API | Snowflake, BigQuery (for SME workloads) |
| dbt | Transformation layer | Looker's LookML, Tableau Prep |
| Metabase | Visualization + exploration | Tableau, Looker, Power BI |
Add Docker for deployment and optionally Airflow for orchestration, and you have a complete modern data stack.
Why This Stack Works
Supabase: PostgreSQL, But Actually Pleasant to Use
People call it a "Firebase alternative," but that undersells it. Under the hood, it's PostgreSQL with a beautiful developer experience:
What you get:
- Full SQL database with proper ACID compliance
- Built-in REST API (no backend code needed)
- Row-Level Security for GDPR/CCPA compliance
- Real-time subscriptions for live dashboards
- Authentication system for access control
- Dashboard for debugging and exploration
For analytics specifically:
- All the indexing and query optimization you expect from PostgreSQL
- Extensions like `pg_stat_statements` for query profiling (see the sketch below)
- Full compatibility with dbt and any SQL-based tool
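As a quick example, here is a minimal sketch of profiling slow dashboard queries with `pg_stat_statements` (column names assume PostgreSQL 13 or later):

```sql
-- Enable the extension (ships with standard PostgreSQL builds; Supabase exposes it as well)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by total execution time: a quick way to spot the dashboard
-- queries worth indexing or pre-aggregating
SELECT
    LEFT(query, 80)                    AS query_start,
    calls,
    ROUND(total_exec_time::numeric, 1) AS total_ms,
    ROUND(mean_exec_time::numeric, 1)  AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```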
dbt: SQL on Steroids
dbt (data build tool) transforms how you write analytics SQL. Instead of sprawling stored procedures or fragile ETL scripts, you get:
Version-controlled SQL:
Every transformation is a .sql file in Git. No more "what query created this table?"
Automatic dependency management: Write models that reference other models, and dbt figures out the execution order:
```sql
-- models/mart/mart_customer_analytics.sql
SELECT
    c.customer_id,
    c.customer_unique_id,
    c.customer_city,
    c.customer_state,
    co.first_order_date,
    co.last_order_date,
    co.total_orders,
    co.total_spent,
    co.avg_order_value,
    CASE
        WHEN co.total_spent > 1000 THEN 'VIP'
        WHEN co.total_spent > 500 THEN 'Regular'
        ELSE 'New'
    END as customer_segment
FROM {{ ref('stg_olist__customers') }} c
LEFT JOIN {{ ref('int_customer_orders') }} co
    ON c.customer_id = co.customer_id
```
Built-in testing: Catch data quality issues before they hit dashboards:
```yaml
models:
  - name: mart_customer_analytics
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: total_spent
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
```
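Tests like `dbt_utils.accepted_range` (and the `dbt_expectations` test used later in this post) come from community packages declared in `packages.yml`. A minimal sketch, with version ranges as assumptions:

```yaml
# packages.yml (versions here are illustrative; pin whatever your dbt version supports)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```

Run `dbt deps` (Step 3 below) to install them.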
Auto-generated documentation:
Run dbt docs generate and get a complete data catalog with lineage graphs.
Metabase: BI Your Team Won't Hate
I've seen Tableau dashboards that cost $50,000 to build and never get used. The problem isn't the tool—it's the learning curve.
Metabase takes a different approach:
Question-based interface: Users ask questions in plain English, and Metabase suggests visualizations. No SQL required for basic exploration.
Saved questions become building blocks: Create a question once, reuse it across dashboards.
Embedding support: White-label dashboards and embed them in your product.
Self-serve without chaos: Curated collections and permissions keep things organized.
The Three-Layer Architecture
The secret to maintainable analytics is separation of concerns. Raw data, business logic, and presentation should live in different layers.
Layer 1: Staging (Raw Data Preparation)
Take messy source data and make it consistent:
```sql
-- models/staging/stg_olist__orders.sql
WITH source AS (
    SELECT * FROM {{ source('olist', 'orders') }}
),

cleaned AS (
    SELECT
        order_id,
        customer_id,
        order_status,
        CAST(order_purchase_timestamp AS TIMESTAMP) as order_purchase_at,
        CAST(order_approved_at AS TIMESTAMP) as order_approved_at,
        CAST(order_delivered_carrier_date AS TIMESTAMP) as shipped_at,
        CAST(order_delivered_customer_date AS TIMESTAMP) as delivered_at,
        CAST(order_estimated_delivery_date AS TIMESTAMP) as estimated_delivery_at
    FROM source
    WHERE order_id IS NOT NULL
)

SELECT * FROM cleaned
```
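The `{{ source('olist', 'orders') }}` reference assumes the raw tables are declared in a sources file. A minimal sketch, where the schema name is an assumption to match wherever the loader writes:

```yaml
# models/staging/_olist__sources.yml (schema name is an assumption; adjust to your loader's target)
version: 2

sources:
  - name: olist
    schema: public
    tables:
      - name: orders
      - name: order_items
      - name: customers
      - name: products
      # ...plus the remaining raw Olist tables
```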
Rules for staging:
- One staging model per source table
- Rename columns to consistent conventions
- Cast data types explicitly
- Remove obvious garbage rows
- No business logic—just cleaning
Layer 2: Intermediate (Business Logic Processing)
Join tables and compute metrics:
```sql
-- models/intermediate/int_customer_orders.sql
WITH orders AS (
    SELECT * FROM {{ ref('stg_olist__orders') }}
    WHERE order_status = 'delivered'
),

order_items AS (
    SELECT * FROM {{ ref('stg_olist__order_items') }}
),

customer_orders AS (
    SELECT
        o.customer_id,
        MIN(o.order_purchase_at) as first_order_date,
        MAX(o.order_purchase_at) as last_order_date,
        COUNT(DISTINCT o.order_id) as total_orders,
        SUM(oi.price + oi.freight_value) as total_spent,
        -- average order value = total spent / number of orders;
        -- a plain AVG over the joined rows would give average *item* value instead
        SUM(oi.price + oi.freight_value)
            / NULLIF(COUNT(DISTINCT o.order_id), 0) as avg_order_value
    FROM orders o
    LEFT JOIN order_items oi ON o.order_id = oi.order_id
    GROUP BY o.customer_id
)

SELECT * FROM customer_orders
```
Rules for intermediate:
- Join related entities
- Compute derived metrics
- No presentation logic (formatting, sorting)
- Reusable across multiple marts
Layer 3: Marts (Analytics-Ready Models)
Purpose-built tables for specific use cases:
```sql
-- models/mart/mart_product_analytics.sql
WITH products AS (
    SELECT * FROM {{ ref('stg_olist__products') }}
),

performance AS (
    SELECT * FROM {{ ref('int_product_performance') }}
),

product_analytics AS (
    SELECT
        p.product_id,
        p.product_category_name,
        p.product_weight_g,
        p.product_length_cm,
        p.product_height_cm,
        p.product_width_cm,
        perf.total_orders,
        perf.total_revenue,
        perf.avg_price,
        perf.avg_review_score,
        RANK() OVER (
            PARTITION BY p.product_category_name
            ORDER BY perf.total_revenue DESC
        ) as category_rank
    FROM products p
    LEFT JOIN performance perf ON p.product_id = perf.product_id
)

SELECT * FROM product_analytics
```
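The `int_product_performance` model referenced above isn't shown in this post; here is a plausible minimal sketch, assuming Olist order-items and order-reviews staging models with the column names the mart expects (the actual model in the repo may differ):

```sql
-- models/intermediate/int_product_performance.sql
-- (sketch only; assumes stg_olist__order_items and stg_olist__order_reviews exist)
WITH order_items AS (
    SELECT * FROM {{ ref('stg_olist__order_items') }}
),

reviews AS (
    SELECT * FROM {{ ref('stg_olist__order_reviews') }}
)

SELECT
    oi.product_id,
    COUNT(DISTINCT oi.order_id)      AS total_orders,
    SUM(oi.price + oi.freight_value) AS total_revenue,
    AVG(oi.price)                    AS avg_price,
    AVG(r.review_score)              AS avg_review_score
FROM order_items oi
LEFT JOIN reviews r ON oi.order_id = r.order_id
GROUP BY oi.product_id
```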
Rules for marts:
- One mart per business domain (customers, products, orders)
- Include all metrics a dashboard might need
- Optimized for BI tool queries (pre-joined, pre-aggregated)
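One concrete way to implement that last rule is to materialize marts as tables (and keep staging as lightweight views) in `dbt_project.yml`; a sketch, with the project name as a placeholder:

```yaml
# dbt_project.yml (excerpt; folder names match the layers above, project name is hypothetical)
models:
  ecommerce_analytics:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    mart:
      +materialized: table
```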
Setting It Up: A Step-by-Step Guide
Step 1: Start the Stack
Clone the repo and launch everything with Docker:
```bash
git clone https://github.com/dasdatasensei/supabase_ecommerce_analytics
cd supabase_ecommerce_analytics
docker compose up -d
```
Wait 60 seconds for services to initialize. Check status:
```bash
docker compose ps
```
Step 2: Load Sample Data
The project includes a Python loader for the Brazilian E-Commerce dataset:
```bash
cd data_loader
pip install -r requirements.txt
python load_data.py
```
This loads ~100K orders across 9 tables into Supabase.
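A quick sanity check once the load finishes (the table and schema names are assumptions; point the query at wherever the loader writes):

```sql
-- Row counts by order status; expect roughly 100K rows in total
SELECT order_status, COUNT(*) AS order_count
FROM orders
GROUP BY order_status
ORDER BY order_count DESC;
```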
Step 3: Run dbt Transformations
```bash
cd dbt_project
dbt deps           # Install packages
dbt build          # Run models + tests
dbt docs generate  # Create documentation
dbt docs serve     # View documentation site
```
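For these commands to reach Supabase, dbt needs a `profiles.yml` pointing at the Postgres instance. A minimal sketch follows; the profile name must match the `profile:` key in `dbt_project.yml`, and the credentials are assumed to mirror the Docker defaults:

```yaml
# ~/.dbt/profiles.yml (values assumed to match the docker compose setup above)
ecommerce_analytics:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: postgres
      password: "{{ env_var('POSTGRES_PASSWORD') }}"
      dbname: postgres
      schema: analytics
      threads: 4
```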
Step 4: Connect Metabase
- Open `http://localhost:3000`
- Create an admin account
- Add Supabase as a database:
  - Host: `supabase-db`
  - Port: `5432`
  - Database: `postgres`
  - User: `postgres`
  - Password: (from your `.env`)
Step 5: Import Dashboards
The project includes pre-built dashboard JSON files. Import them from Metabase's admin panel.
Common Mistakes (and How to Avoid Them)
Mistake 1: Skipping Dimensional Modeling
The trap: "We'll just create flat tables—it's simpler."
The reality: You'll end up with:
- Duplicate logic across dashboards
- Inconsistent metrics ("Why does marketing show different revenue?")
- Slow queries as data grows
The fix: Invest time in proper star schema design. Your future self will thank you.
Mistake 2: No Data Quality Tests
The trap: "We'll validate manually when something looks wrong."
The reality: Bad data propagates silently. By the time you notice, you've made decisions on incorrect numbers.
The fix: Add dbt tests for every critical metric:
```yaml
- name: total_revenue
  tests:
    - not_null
    - dbt_utils.accepted_range:
        min_value: 0
    - dbt_expectations.expect_column_values_to_be_of_type:
        column_type: numeric
```
Mistake 3: Over-Engineering Early
The trap: "We need real-time streaming, ML predictions, and multi-cloud deployment."
The reality: You need basic dashboards that business users will actually look at.
The fix: Start with batch processing, simple aggregations, and core metrics. Add complexity only when justified by usage.
Mistake 4: Ignoring Documentation
The trap: "The SQL is self-documenting."
The reality: In 6 months, no one (including you) will remember why customer_segment uses those specific thresholds.
The fix: Use dbt's built-in documentation. Descriptions live in the model's .yml properties file, which is also where you can spell out those thresholds:

```yaml
models:
  - name: mart_customer_analytics
    description: "Customer segmentation based on lifetime value and purchase frequency."
    columns:
      - name: customer_segment
        description: "VIP when total_spent > 1000, Regular when > 500, otherwise New."
```
When to Graduate to Paid Tools
This stack is perfect for:
- Companies with <$10M annual revenue
- Teams with <50 analytics users
- Datasets under 100GB
- Batch refresh cadence (hourly or daily)
Consider paid alternatives when:
- You need sub-second real-time analytics
- Your data team exceeds 5 full-time analysts
- Compliance requirements demand vendor SLAs
- Dataset size exceeds 1TB
Even then, dbt remains valuable—it integrates with Snowflake, BigQuery, and Redshift.
The Bottom Line
You don't need a six-figure budget to build serious analytics.
The combination of Supabase + dbt + Metabase gives you:
- ✅ Production-grade PostgreSQL database
- ✅ Version-controlled, tested transformations
- ✅ Self-serve dashboards for business users
- ✅ Docker-based deployment that actually works
- ✅ $0 in licensing costs
These tools exist, they're free, and they work. Stop paying six figures for dashboards nobody looks at.
I wrote up the full build process with architecture decisions in the E-Commerce Analytics case study. There's also a setup checklist if you want to replicate this.
Thinking about building a BI stack like this? Let me know—I'm always curious what problems people are trying to solve.



