Case Study: Open Source Software Analytics Transformation
How a B2B SaaS company reduced analytics costs by 70% and gained data ownership through open-source stack migration.
Client Overview
| Attribute | Details | |-----------|---------| | Industry | B2B SaaS (Project Management) | | ARR | $8M | | Monthly Active Users | 50,000 | | Previous Stack | Segment + Amplitude + Multiple point solutions | | Annual Analytics Spend | $180,000 |
The Challenge
Existing Stack Problems
Previous Architecture:
─────────────────────────────────────────────────────────
Website + App
│
▼
┌────────────────┐
│ Segment │ $48,000/year
│ (100K MTU) │
└───────┬────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌─────────┐ ┌──────────┐ ┌─────────┐
│Amplitude│ │ Mixpanel │ │ GA360 │
│$72K/yr │ │ $24K/yr │ │ $36K/yr │
└─────────┘ └──────────┘ └─────────┘
Total Annual Cost: $180,000
Data Ownership: Limited
Vendor Lock-in: High
Pain Points
| Issue | Business Impact | |-------|-----------------| | High costs | $180K/year, growing with users | | Vendor lock-in | Difficult to switch or negotiate | | Data silos | No unified customer view | | Limited customization | Couldn't build custom metrics | | Data ownership | Vendor-controlled, export limitations |
Specific Frustrations
- Segment pricing - Per-MTU model punishing growth
- Amplitude limits - Event quotas causing sampling
- Integration gaps - Manual work connecting tools
- Export restrictions - Couldn't access raw data easily
- Feature gaps - Waiting on vendor roadmaps
The Solution
New Architecture
Open Source Software-First Architecture:
─────────────────────────────────────────────────────────
Website + App
│
▼
┌─────────────────────────┐
│ Rudderstack (Open Source) │ Self-hosted
│ Event Collection │ $0/year
└───────────┬─────────────┘
│
┌───────────┴───────────┐
▼ ▼
┌───────────────┐ ┌───────────────┐
│ BigQuery │ │ GA4 Free │
│ $300/mo │ │ $0/year │
└───────┬───────┘ └───────────────┘
│
┌───────┴───────┐
▼ ▼
┌──────┐ ┌─────────────┐
│ dbt │ │Looker Studio│
│ $0 │ │ $0/year │
└──────┘ └─────────────┘
Annual Cost: ~$8,000
Data Ownership: 100%
Vendor Lock-in: None
Migration Strategy
Phase 1: Data Collection (Month 1)
Replaced Segment with self-hosted Rudderstack:
// Rudderstack SDK (Segment-compatible API)
rudderanalytics.identify('user_123', {
email: '[email protected]',
plan: 'enterprise',
company: 'Acme Corp'
});
rudderanalytics.track('Feature Used', {
feature_name: 'kanban_board',
project_id: 'proj_456'
});
Phase 2: Data Warehouse (Month 1-2)
Set up BigQuery as central data store:
- All events streamed to BigQuery
- Existing data migrated from Amplitude
- Schema designed for analytics queries
- Retention policies configured
Phase 3: Transformation Layer (Month 2-3)
Implemented dbt for data modeling:
-- models/marts/user_engagement.sql
WITH daily_activity AS (
SELECT
user_id,
DATE(timestamp) as activity_date,
COUNT(*) as events,
COUNT(DISTINCT feature_name) as features_used
FROM {{ ref('stg_events') }}
WHERE event_name = 'Feature Used'
GROUP BY user_id, DATE(timestamp)
)
SELECT
user_id,
COUNT(DISTINCT activity_date) as active_days,
SUM(events) as total_events,
AVG(features_used) as avg_features_per_day,
MAX(activity_date) as last_active
FROM daily_activity
GROUP BY user_id
Phase 4: Visualization (Month 3)
Replaced Amplitude dashboards with Looker Studio:
- Connected directly to BigQuery
- Recreated key dashboards
- Added new custom views
- Enabled self-serve exploration
Phase 5: Parallel Running (Month 4)
Ran both systems in parallel:
- Validated data accuracy
- Trained team on new tools
- Identified gaps
- Built confidence
Phase 6: Cutover (Month 5)
- Deprecated old tools
- Cancelled vendor contracts
- Full production on new stack
Results
Cost Savings
| Component | Before | After | Savings | |-----------|--------|-------|---------| | Event collection | $48,000 | $0 | 100% | | Product analytics | $72,000 | $0 | 100% | | Secondary analytics | $24,000 | $0 | 100% | | GA360 | $36,000 | $0 | 100% | | BigQuery | $0 | $3,600 | - | | Cloud Run (Rudderstack) | $0 | $2,400 | - | | Total | $180,000 | $6,000 | 97% |
Capability Improvements
| Capability | Before | After | |------------|--------|-------| | Data ownership | Vendor-controlled | 100% owned | | Raw data access | Limited exports | Full SQL access | | Custom metrics | Vendor features only | Unlimited | | Event limits | Quotas/sampling | Unlimited | | Retention | 12 months | Unlimited | | Real-time | Limited | Full streaming |
Data Quality
-- Before: Amplitude event counts (sampled)
-- After: BigQuery exact counts
Comparison (30-day period):
────────────────────────────────────────────────────────
Event Amplitude BigQuery Diff
────────────────────────────────────────────────────────
Page View 1,245,000 1,312,450 +5.4%
Feature Used 892,000 923,100 +3.5%
Sign Up 2,100 2,247 +7.0%
Subscription Start 450 478 +6.2%
────────────────────────────────────────────────────────
Note: Amplitude was sampling at ~95% due to event limits
Business Intelligence Transformation
New Capabilities:
-
Product-Led Growth Metrics
-- Calculate PLG funnel in BigQuery SELECT signup_cohort_week, COUNT(DISTINCT user_id) as signups, COUNT(DISTINCT CASE WHEN activated THEN user_id END) as activated, COUNT(DISTINCT CASE WHEN converted THEN user_id END) as converted, SUM(mrr) as cohort_mrr FROM user_cohorts GROUP BY signup_cohort_week -
Custom LTV Models
- Built ML model in BigQuery ML
- Predicts 12-month LTV
- Updates daily automatically
-
Real-Time Dashboards
- Live user activity
- Feature adoption tracking
- Experiment monitoring
Implementation Details
Infrastructure
| Component | Technology | Cost | |-----------|------------|------| | Event Collection | Rudderstack Open Source (Cloud Run) | $200/mo | | Data Warehouse | BigQuery | $300/mo | | Transformation | dbt Cloud (free tier) | $0 | | Visualization | Looker Studio | $0 | | Orchestration | Cloud Scheduler | $5/mo |
Team Skills Required
| Skill | Level Needed | Training Provided | |-------|--------------|-------------------| | SQL | Advanced | 2 weeks | | dbt | Intermediate | 1 week | | BigQuery | Intermediate | 1 week | | Looker Studio | Basic | 2 days |
Migration Timeline
| Phase | Duration | Resources | |-------|----------|-----------| | Planning | 2 weeks | 1 analyst, 1 engineer | | Infrastructure | 1 week | 1 engineer | | Data Collection | 2 weeks | 1 engineer | | Modeling | 3 weeks | 1 analyst | | Dashboards | 2 weeks | 1 analyst | | Parallel Run | 4 weeks | Team | | Cutover | 1 week | Team | | Total | ~4 months | |
Key Learnings
Success Factors
- Start with data model - Design schema before migration
- Keep Segment compatibility - Rudderstack eased transition
- Document everything - Critical for team adoption
- Parallel run essential - Built confidence, caught issues
- Executive sponsorship - Enabled timeline and resources
Challenges Overcome
| Challenge | Solution | |-----------|----------| | Team SQL skills | Intensive training program | | Dashboard recreation | Prioritized top 10 views first | | Historical data | One-time migration script | | Real-time needs | BigQuery streaming + Cloud Functions |
What We'd Do Differently
- Start dbt earlier - Underestimated modeling time
- More testing - Edge cases in event schema
- Better docs - Team onboarding took longer than expected
ROI Summary
5-Year Analysis:
────────────────────────────────────────────────────────
Previous Stack (5 years):
Annual cost: $180,000
5-year total: $900,000
+ Annual growth (20%): $1,620,000 projected
Open Source Software Stack (5 years):
Implementation: $50,000 (one-time)
Annual cost: $6,000
5-year total: $80,000
5-Year Savings: $1,540,000
ROI: 1,925%
Payback Period: 4 months
────────────────────────────────────────────────────────
Related: Open Source Software Migration Service | Data Warehouse Integration