Skip to main content

Case Study: Open Source Software Analytics Transformation

How a B2B SaaS company reduced analytics costs by 70% and gained data ownership through open-source stack migration.

Client Overview

| Attribute | Details | |-----------|---------| | Industry | B2B SaaS (Project Management) | | ARR | $8M | | Monthly Active Users | 50,000 | | Previous Stack | Segment + Amplitude + Multiple point solutions | | Annual Analytics Spend | $180,000 |

The Challenge

Existing Stack Problems

Previous Architecture:
─────────────────────────────────────────────────────────
                    Website + App
                         │
                         ▼
                ┌────────────────┐
                │    Segment     │  $48,000/year
                │   (100K MTU)   │
                └───────┬────────┘
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
   ┌─────────┐    ┌──────────┐    ┌─────────┐
   │Amplitude│    │ Mixpanel │    │  GA360  │
   │$72K/yr  │    │ $24K/yr  │    │ $36K/yr │
   └─────────┘    └──────────┘    └─────────┘

Total Annual Cost: $180,000
Data Ownership: Limited
Vendor Lock-in: High

Pain Points

| Issue | Business Impact | |-------|-----------------| | High costs | $180K/year, growing with users | | Vendor lock-in | Difficult to switch or negotiate | | Data silos | No unified customer view | | Limited customization | Couldn't build custom metrics | | Data ownership | Vendor-controlled, export limitations |

Specific Frustrations

  1. Segment pricing - Per-MTU model punishing growth
  2. Amplitude limits - Event quotas causing sampling
  3. Integration gaps - Manual work connecting tools
  4. Export restrictions - Couldn't access raw data easily
  5. Feature gaps - Waiting on vendor roadmaps

The Solution

New Architecture

Open Source Software-First Architecture:
─────────────────────────────────────────────────────────
                    Website + App
                         │
                         ▼
           ┌─────────────────────────┐
           │    Rudderstack (Open Source)    │  Self-hosted
           │    Event Collection     │  $0/year
           └───────────┬─────────────┘
                       │
           ┌───────────┴───────────┐
           ▼                       ▼
   ┌───────────────┐      ┌───────────────┐
   │   BigQuery    │      │   GA4 Free    │
   │   $300/mo     │      │   $0/year     │
   └───────┬───────┘      └───────────────┘
           │
   ┌───────┴───────┐
   ▼               ▼
┌──────┐    ┌─────────────┐
│ dbt  │    │Looker Studio│
│ $0   │    │   $0/year   │
└──────┘    └─────────────┘

Annual Cost: ~$8,000
Data Ownership: 100%
Vendor Lock-in: None

Migration Strategy

Phase 1: Data Collection (Month 1)

Replaced Segment with self-hosted Rudderstack:

// Rudderstack SDK (Segment-compatible API)
rudderanalytics.identify('user_123', {
  email: '[email protected]',
  plan: 'enterprise',
  company: 'Acme Corp'
});

rudderanalytics.track('Feature Used', {
  feature_name: 'kanban_board',
  project_id: 'proj_456'
});

Phase 2: Data Warehouse (Month 1-2)

Set up BigQuery as central data store:

  • All events streamed to BigQuery
  • Existing data migrated from Amplitude
  • Schema designed for analytics queries
  • Retention policies configured

Phase 3: Transformation Layer (Month 2-3)

Implemented dbt for data modeling:

-- models/marts/user_engagement.sql
WITH daily_activity AS (
  SELECT
    user_id,
    DATE(timestamp) as activity_date,
    COUNT(*) as events,
    COUNT(DISTINCT feature_name) as features_used
  FROM {{ ref('stg_events') }}
  WHERE event_name = 'Feature Used'
  GROUP BY user_id, DATE(timestamp)
)
SELECT
  user_id,
  COUNT(DISTINCT activity_date) as active_days,
  SUM(events) as total_events,
  AVG(features_used) as avg_features_per_day,
  MAX(activity_date) as last_active
FROM daily_activity
GROUP BY user_id

Phase 4: Visualization (Month 3)

Replaced Amplitude dashboards with Looker Studio:

  • Connected directly to BigQuery
  • Recreated key dashboards
  • Added new custom views
  • Enabled self-serve exploration

Phase 5: Parallel Running (Month 4)

Ran both systems in parallel:

  • Validated data accuracy
  • Trained team on new tools
  • Identified gaps
  • Built confidence

Phase 6: Cutover (Month 5)

  • Deprecated old tools
  • Cancelled vendor contracts
  • Full production on new stack

Results

Cost Savings

| Component | Before | After | Savings | |-----------|--------|-------|---------| | Event collection | $48,000 | $0 | 100% | | Product analytics | $72,000 | $0 | 100% | | Secondary analytics | $24,000 | $0 | 100% | | GA360 | $36,000 | $0 | 100% | | BigQuery | $0 | $3,600 | - | | Cloud Run (Rudderstack) | $0 | $2,400 | - | | Total | $180,000 | $6,000 | 97% |

Capability Improvements

| Capability | Before | After | |------------|--------|-------| | Data ownership | Vendor-controlled | 100% owned | | Raw data access | Limited exports | Full SQL access | | Custom metrics | Vendor features only | Unlimited | | Event limits | Quotas/sampling | Unlimited | | Retention | 12 months | Unlimited | | Real-time | Limited | Full streaming |

Data Quality

-- Before: Amplitude event counts (sampled)
-- After: BigQuery exact counts

Comparison (30-day period):
────────────────────────────────────────────────────────
Event               Amplitude       BigQuery      Diff
────────────────────────────────────────────────────────
Page View           1,245,000       1,312,450     +5.4%
Feature Used          892,000         923,100     +3.5%
Sign Up                 2,100           2,247     +7.0%
Subscription Start        450             478     +6.2%
────────────────────────────────────────────────────────
Note: Amplitude was sampling at ~95% due to event limits

Business Intelligence Transformation

New Capabilities:

  1. Product-Led Growth Metrics

    -- Calculate PLG funnel in BigQuery
    SELECT
      signup_cohort_week,
      COUNT(DISTINCT user_id) as signups,
      COUNT(DISTINCT CASE WHEN activated THEN user_id END) as activated,
      COUNT(DISTINCT CASE WHEN converted THEN user_id END) as converted,
      SUM(mrr) as cohort_mrr
    FROM user_cohorts
    GROUP BY signup_cohort_week
    
  2. Custom LTV Models

    • Built ML model in BigQuery ML
    • Predicts 12-month LTV
    • Updates daily automatically
  3. Real-Time Dashboards

    • Live user activity
    • Feature adoption tracking
    • Experiment monitoring

Implementation Details

Infrastructure

| Component | Technology | Cost | |-----------|------------|------| | Event Collection | Rudderstack Open Source (Cloud Run) | $200/mo | | Data Warehouse | BigQuery | $300/mo | | Transformation | dbt Cloud (free tier) | $0 | | Visualization | Looker Studio | $0 | | Orchestration | Cloud Scheduler | $5/mo |

Team Skills Required

| Skill | Level Needed | Training Provided | |-------|--------------|-------------------| | SQL | Advanced | 2 weeks | | dbt | Intermediate | 1 week | | BigQuery | Intermediate | 1 week | | Looker Studio | Basic | 2 days |

Migration Timeline

| Phase | Duration | Resources | |-------|----------|-----------| | Planning | 2 weeks | 1 analyst, 1 engineer | | Infrastructure | 1 week | 1 engineer | | Data Collection | 2 weeks | 1 engineer | | Modeling | 3 weeks | 1 analyst | | Dashboards | 2 weeks | 1 analyst | | Parallel Run | 4 weeks | Team | | Cutover | 1 week | Team | | Total | ~4 months | |

Key Learnings

Success Factors

  1. Start with data model - Design schema before migration
  2. Keep Segment compatibility - Rudderstack eased transition
  3. Document everything - Critical for team adoption
  4. Parallel run essential - Built confidence, caught issues
  5. Executive sponsorship - Enabled timeline and resources

Challenges Overcome

| Challenge | Solution | |-----------|----------| | Team SQL skills | Intensive training program | | Dashboard recreation | Prioritized top 10 views first | | Historical data | One-time migration script | | Real-time needs | BigQuery streaming + Cloud Functions |

What We'd Do Differently

  1. Start dbt earlier - Underestimated modeling time
  2. More testing - Edge cases in event schema
  3. Better docs - Team onboarding took longer than expected

ROI Summary

5-Year Analysis:
────────────────────────────────────────────────────────
Previous Stack (5 years):
  Annual cost:              $180,000
  5-year total:             $900,000
  + Annual growth (20%):    $1,620,000 projected

Open Source Software Stack (5 years):
  Implementation:           $50,000 (one-time)
  Annual cost:              $6,000
  5-year total:             $80,000

5-Year Savings:             $1,540,000
ROI:                        1,925%
Payback Period:             4 months
────────────────────────────────────────────────────────

Related: Open Source Software Migration Service | Data Warehouse Integration