GA4 Data Discrepancies
Understanding and resolving data differences in GA4 reporting.
Common Discrepancy Sources
Why Numbers Don't Match
┌─────────────────────────────────────────────────────────┐
│ Data Discrepancy Factors                                │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ Collection Issues          │ Processing Differences     │
│ ────────────────────────── │ ────────────────────────── │
│ • Ad blockers (10-30%)     │ • Sampling in reports      │
│ • JavaScript errors        │ • Data thresholds          │
│ • Consent blocking         │ • Attribution models       │
│ • Network failures         │ • Time zone settings       │
│ • Bot filtering            │ • Currency conversion      │
│                                                         │
│ Platform Differences       │ Implementation Errors      │
│ ────────────────────────── │ ────────────────────────── │
│ • Measurement methods      │ • Duplicate events         │
│ • Lookback windows         │ • Missing events           │
│ • Conversion counting      │ • Wrong event values       │
│ • Session definitions      │ • Inconsistent user ID     │
│                                                         │
└─────────────────────────────────────────────────────────┘
GA4 vs. Backend Data
Transaction Mismatches
Typical Range: GA4 tracks 85-95% of backend transactions
| Gap Size | Likely Cause |
|----------|--------------|
| 5-10% | Normal (ad blockers, tracking failures) |
| 10-20% | Implementation issues |
| >20% | Significant tracking problem |
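The buckets above reduce to a small calculation. This Python sketch is ours, not part of any GA4 tooling: it treats every gap up to 10% as normal (including gaps under 5%, which the table does not list) and adds a bucket for negative gaps, on the assumption that GA4 recording more purchases than the backend points to duplicates.

```python
def tracking_gap(ga4_count: int, backend_count: int) -> float:
    """Percentage of backend transactions that GA4 did not record."""
    if backend_count <= 0:
        raise ValueError("backend_count must be positive")
    return 100 * (backend_count - ga4_count) / backend_count


def classify_gap(gap_pct: float) -> str:
    """Map a gap percentage onto the Gap Size buckets."""
    if gap_pct < 0:
        # More purchases in GA4 than in the backend usually means duplicates
        return "GA4 exceeds backend (check for duplicate purchases)"
    if gap_pct <= 10:
        return "Normal (ad blockers, tracking failures)"
    if gap_pct <= 20:
        return "Implementation issues"
    return "Significant tracking problem"


# Example: 912 purchases in GA4 vs. 1,000 orders in the backend
gap = tracking_gap(ga4_count=912, backend_count=1000)
print(f"{gap:.1f}% gap: {classify_gap(gap)}")
```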
Diagnosis Query
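-- Compare yesterday's GA4 purchases to backend orders.
-- _TABLE_SUFFIX dates follow the property time zone, while DATE(created_at)
-- uses UTC for TIMESTAMP columns; pass the property's zone to DATE() if they differ.
WITH ga4_events AS (
  SELECT
    (SELECT value.string_value FROM UNNEST(event_params)
     WHERE key = 'transaction_id') AS transaction_id,
    -- 'value' lands in double_value, float_value, or int_value depending on how it was sent
    (SELECT COALESCE(value.double_value, value.float_value, CAST(value.int_value AS FLOAT64))
     FROM UNNEST(event_params)
     WHERE key = 'value') AS ga4_value
  FROM `project.analytics_XXXXXX.events_*`
  WHERE event_name = 'purchase'
    AND _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
),
ga4_transactions AS (
  -- One row per transaction_id so duplicate purchase events don't multiply the join
  SELECT
    transaction_id,
    MAX(ga4_value) AS ga4_value,
    COUNT(*) AS ga4_event_count
  FROM ga4_events
  WHERE transaction_id IS NOT NULL
  GROUP BY transaction_id
),
backend_transactions AS (
  SELECT
    CAST(order_id AS STRING) AS transaction_id,  -- match GA4's string IDs
    total_value AS backend_value
  FROM `project.backend.orders`
  WHERE DATE(created_at) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
)
SELECT
  b.transaction_id,
  b.backend_value,
  g.ga4_value,
  g.ga4_event_count,
  CASE
    WHEN g.transaction_id IS NULL THEN 'Missing in GA4'
    WHEN g.ga4_value IS NULL THEN 'Missing value in GA4'
    WHEN ABS(b.backend_value - g.ga4_value) > 1 THEN 'Value Mismatch'  -- tolerance: 1 currency unit
    WHEN g.ga4_event_count > 1 THEN 'Duplicate in GA4'
    ELSE 'Match'
  END AS status
FROM backend_transactions b
LEFT JOIN ga4_transactions g ON b.transaction_id = g.transaction_id
WHERE g.transaction_id IS NULL
   OR g.ga4_value IS NULL
   OR ABS(b.backend_value - g.ga4_value) > 1
   OR g.ga4_event_count > 1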
Common Fixes
| Issue | Solution |
|-------|----------|
| Missing transactions | Implement server-side tracking |
| Value mismatches | Verify currency, tax inclusion |
| Duplicate transactions | Add transaction_id deduplication |
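A minimal Python sketch of transaction_id deduplication, for example when reconciling purchase rows exported from BigQuery. The row shape (`transaction_id`, `value`, `timestamp` keys) is a hypothetical example, not the GA4 export schema; the sketch keeps the earliest event per ID and skips events without one, since those cannot be deduplicated.

```python
from typing import Dict, Iterable, List


def dedupe_purchases(events: Iterable[Dict]) -> List[Dict]:
    """Keep the earliest purchase per transaction_id; skip events without one."""
    first_seen: Dict[str, Dict] = {}
    for event in events:
        txn_id = event.get("transaction_id")
        if not txn_id:
            continue  # cannot be deduplicated; investigate these separately
        current = first_seen.get(txn_id)
        if current is None or event["timestamp"] < current["timestamp"]:
            first_seen[txn_id] = event
    # Sort by time so the output is easy to compare with backend order logs
    return sorted(first_seen.values(), key=lambda e: e["timestamp"])


# Hypothetical rows: T1 fired twice (confirmation page reloaded), one row lacks an ID
events = [
    {"transaction_id": "T1", "value": 50.0, "timestamp": 100},
    {"transaction_id": "T1", "value": 50.0, "timestamp": 160},
    {"transaction_id": "T2", "value": 20.0, "timestamp": 120},
    {"transaction_id": None, "value": 9.0, "timestamp": 130},
]
unique = dedupe_purchases(events)
print(len(unique), sum(e["value"] for e in unique))
```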
GA4 vs. Google Ads
Conversion Discrepancies
| Factor | GA4 | Google Ads |
|--------|-----|------------|
| Lookback window | 30-90 days | 30-90 days |
| Attribution | Data-driven | Data-driven |
| Conversion counting | Once per event | Per-click option |
| Session definition | 30 min inactivity | New each click |
| Date reported | Date of the conversion | Date of the ad click |
Expected Variance
- Within 10%: Normal differences
- 10-30%: Check import settings
- >30%: Configuration issue
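One way to apply these bands is to compute the relative difference between the two counts. This Python sketch assumes the difference is measured against the Google Ads figure and that both counts cover the same date range and conversion action; those choices are ours, not a Google convention.

```python
def ads_variance_pct(ga4_conversions: float, ads_conversions: float) -> float:
    """Absolute GA4 vs. Google Ads difference, as a percentage of the Ads count."""
    if ads_conversions <= 0:
        raise ValueError("ads_conversions must be positive")
    return 100 * abs(ga4_conversions - ads_conversions) / ads_conversions


def ads_variance_status(variance_pct: float) -> str:
    """Map a variance onto the Expected Variance bands."""
    if variance_pct <= 10:
        return "Normal differences"
    if variance_pct <= 30:
        return "Check import settings"
    return "Configuration issue"


# Example: 420 conversions in GA4 vs. 500 in Google Ads
print(ads_variance_status(ads_variance_pct(ga4_conversions=420, ads_conversions=500)))
```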
Troubleshooting Steps
- Verify linking:
  - GA4 Admin → Google Ads Links
  - Check conversion import settings
- Compare settings:
  - Attribution model match
  - Lookback window match
  - Conversion counting method
- Check data:
  - Filter by Google Ads source in GA4
  - Compare date ranges exactly
GA4 vs. Meta Ads
Attribution Differences
| Factor | GA4 | Meta Ads |
|--------|-----|----------|
| Default model | Data-driven | 7d click / 1d view |
| View-through | Not default | Included |
| Cross-device | Limited | Full |
Why Meta Shows More
Meta counts:
- View-through conversions (saw the ad, didn't click)
- Cross-device conversions, matched through logged-in Meta accounts
- Any conversion inside its own click or view window, even when GA4 credits another channel for it
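Why both platforms can claim the same purchase is easier to see with a toy journey. This Python sketch assumes Meta's 7-day click / 1-day view setting and, because GA4's default data-driven model cannot be reproduced in a few lines, substitutes a plain last-click rule for GA4; the `Touch` structure and channel labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed Meta windows: 7-day click, 1-day view
META_CLICK_WINDOW = timedelta(days=7)
META_VIEW_WINDOW = timedelta(days=1)


@dataclass
class Touch:
    channel: str   # hypothetical labels, e.g. "meta", "google_ads"
    kind: str      # "click" or "view"
    at: datetime


def meta_claims(touches: List[Touch], converted_at: datetime) -> bool:
    """True if any Meta touch falls inside Meta's click or view window."""
    for t in touches:
        if t.channel != "meta" or t.at > converted_at:
            continue
        age = converted_at - t.at
        if t.kind == "click" and age <= META_CLICK_WINDOW:
            return True
        if t.kind == "view" and age <= META_VIEW_WINDOW:
            return True
    return False


def last_click_channel(touches: List[Touch], converted_at: datetime) -> Optional[str]:
    """Simplified stand-in for GA4: the channel of the latest click before conversion.

    Views are ignored, since GA4 does not see ad impressions by default.
    """
    clicks = [t for t in touches if t.kind == "click" and t.at <= converted_at]
    return max(clicks, key=lambda t: t.at).channel if clicks else None


# A purchase after seeing (not clicking) a Meta ad, then clicking a search ad
converted = datetime(2024, 5, 10, 12, 0)
journey = [
    Touch("meta", "view", converted - timedelta(hours=5)),
    Touch("google_ads", "click", converted - timedelta(hours=1)),
]
print(meta_claims(journey, converted))         # Meta reports the conversion
print(last_click_channel(journey, converted))  # last-click credit goes elsewhere
```

The same purchase appears in Meta's report and under a different channel in the GA4-style result, which is one reason the two totals cannot simply be added or compared one-to-one.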
Reconciliation
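-- GA4 purchases from Meta traffic over the last 30 complete days.
-- traffic_source.* holds the source that first acquired the user, not the
-- session's source, so returning users first acquired elsewhere are missed.
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS date,  -- property time zone, unlike TIMESTAMP_MICROS (UTC)
  COUNT(*) AS ga4_conversions,
  SUM(value) AS ga4_revenue
FROM (
  SELECT
    event_date,
    (SELECT COALESCE(value.double_value, value.float_value, CAST(value.int_value AS FLOAT64))
     FROM UNNEST(event_params)
     WHERE key = 'value') AS value
  FROM `project.analytics_XXXXXX.events_*`
  -- BETWEEN bounds the scan and excludes events_intraday_* tables
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
                          AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
    AND event_name = 'purchase'
    AND (LOWER(traffic_source.source) LIKE '%facebook%'
      OR LOWER(traffic_source.source) LIKE '%instagram%'
      OR LOWER(traffic_source.source) LIKE '%fb%')
)
GROUP BY date
ORDER BY date DESC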
Session Discrepancies
GA4 Session Definition
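- 30 minutes of inactivity = new session (default; the timeout is adjustable in the tag settings)
- Midnight in the property time zone does not start a new session (it did in Universal Analytics)
- New campaign parameters do not start a new session either (also a Universal Analytics behavior)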
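The inactivity rule can be illustrated with a minimal sessionizer over one browser's event timestamps. This Python sketch models only the timeout (default 30 minutes, Unix-second timestamps assumed); it ignores consent, engagement, and everything else GA4 does when it assigns ga_session_id, so it explains counts rather than reproducing them.

```python
from typing import List

# GA4's default inactivity timeout, in seconds; adjustable in the tag settings
SESSION_TIMEOUT_SECONDS = 30 * 60


def sessionize(timestamps: List[int], timeout: int = SESSION_TIMEOUT_SECONDS) -> List[List[int]]:
    """Group one browser's event times (Unix seconds) into sessions.

    A new session starts whenever the gap since the previous event exceeds `timeout`.
    """
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # still active: extend the current session
        else:
            sessions.append([ts])     # first event, or inactivity gap exceeded
    return sessions


# Events at 0 and 10 minutes, then a return 35 minutes after the last event
print(sessionize([0, 600, 2700, 2800]))
```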
Why Session Counts Differ
| Scenario | Result |
|----------|--------|
| User opens 10 tabs | 1 session (tabs share the session cookie) |
| User returns after 35 min | 2 sessions |
| User clicks a different ad within 30 min | 1 session (a campaign change does not restart it) |
| Activity spans midnight | 1 session (no midnight reset in GA4) |
Debugging Sessions
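// Run in the browser console on the tracked site.
// GA4 keeps session state in a cookie named _ga_<MEASUREMENT_ID without "G-">,
// e.g. _ga_ABC123 for measurement ID G-ABC123.
document.cookie
  .split('; ')
  .filter((c) => c.startsWith('_ga_'))
  .forEach((c) => console.log(c));
// Observed format: GS1.1.SESSION_ID.SESSION_COUNT... (newer tags may use another layout)
// In the Network tab, the "sid" parameter of /g/collect requests carries
// ga_session_id; it should stay the same until 30 minutes of inactivity.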
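Reading the session fields out of a copied cookie value takes only a few lines of Python. The layout after `GS1.1.` is not officially documented, so this sketch assumes the commonly observed order (session ID, then session count) and returns None for anything else, including newer cookie formats that do not start with `GS1`. Comparing the parsed session ID across page loads shows whether the session is being preserved.

```python
from typing import Dict, Optional, Union


def parse_ga4_session_cookie(value: str) -> Optional[Dict[str, Union[str, int]]]:
    """Extract session ID and session count from a GS1-style _ga_<ID> cookie value.

    Assumes the observed layout GS1.1.<session_id>.<session_count>...; returns
    None for any other layout instead of guessing.
    """
    parts = value.split(".")
    if len(parts) < 4 or parts[0] != "GS1":
        return None
    session_id, session_count = parts[2], parts[3]
    if not (session_id.isdigit() and session_count.isdigit()):
        return None
    return {"session_id": session_id, "session_count": int(session_count)}


# Example value in the GS1 layout (made up, not from a real site)
print(parse_ga4_session_cookie("GS1.1.1700000000.5.1.1700000300.0.0.0"))
```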
User Count Differences
New vs. Returning
GA4 defines:
- New user: First-ever event from device/browser
- Returning user: Has prior visit (cookie present)
Why Counts Seem Wrong
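| Issue | Cause |
|-------|-------|
| Too many new users | Cookies deleted, incognito browsing, ITP cookie limits |
| More users than expected | One person on several devices or browsers counted separately |
| Sudden increase in new users | Tag reinstalled or cookie settings (name, domain) changed |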
Sampling Issues
When Sampling Occurs
GA4 samples when:
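- An exploration (standard reports are not sampled) covers more events than the property's quota: 10 million events per query on standard properties
- Long date ranges, segments, or comparisons push an exploration past that quota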
Detecting Sampling
Look for:
- Yellow shield icon in reports
- "Based on X% of data" message
- Inconsistent numbers between date ranges
Reducing Sampling
- Shorten date range
- Use standard reports (not sampled)
- Export to BigQuery (no sampling)
- Remove unnecessary dimensions
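Shortening the date range can be done systematically: split the period into back-to-back ranges that each stay under the exploration quota, run the exploration once per range, and combine the results (valid only for additive metrics such as event counts; user counts cannot simply be summed). This Python sketch assumes the 10 million-event per-query limit of standard properties and daily event counts taken from your own data, for example a BigQuery `COUNT(*)` grouped by `event_date`.

```python
from datetime import date, timedelta
from typing import Iterable, List, Optional, Tuple

# Assumed per-query event quota for explorations on a standard property
EXPLORATION_EVENT_QUOTA = 10_000_000


def unsampled_ranges(
    daily_events: Iterable[Tuple[date, int]],
    quota: int = EXPLORATION_EVENT_QUOTA,
) -> List[Tuple[date, date]]:
    """Split days into consecutive (start, end) ranges whose event totals stay <= quota.

    A single day above the quota gets its own range and will still be sampled.
    """
    ranges: List[Tuple[date, date]] = []
    start: Optional[date] = None
    end: Optional[date] = None
    total = 0
    for day, count in sorted(daily_events):
        if start is not None and total + count > quota:
            ranges.append((start, end))   # close the range before it overflows
            start = None
        if start is None:
            start, total = day, 0
        end = day
        total += count
    if start is not None:
        ranges.append((start, end))
    return ranges


# Five days at 4M events each: no range may cover more than two of them
days = [(date(2024, 1, 1) + timedelta(days=i), 4_000_000) for i in range(5)]
print(unsampled_ranges(days))
```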
Data Thresholds
What Gets Hidden
GA4 hides data when:
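- A row represents too few users to protect their privacy (Google does not publish the exact threshold)
- Google signals is enabled (especially for demographic data)
- Segments or filters narrow the data to a small group of users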
Signs of Thresholds
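- Missing rows in tables
- An (other) row with large values, though this usually points to cardinality limits on high-volume dimensions rather than thresholds
- Totals that don't match the sum of rows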
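The mismatch between totals and rows is easy to quantify: subtract the sum of the visible rows from the report total. This Python sketch takes values copied from a report; use it with additive metrics such as event count, because user totals are deduplicated and do not equal the sum of rows even when no threshold applies.

```python
from typing import List, Tuple


def hidden_by_thresholds(report_total: float, visible_rows: List[float]) -> Tuple[float, float]:
    """Return (amount missing from the rows, missing share of the total).

    Only meaningful for additive metrics such as event count.
    """
    if report_total <= 0:
        raise ValueError("report_total must be positive")
    missing = report_total - sum(visible_rows)
    return missing, missing / report_total


# Report total of 1,000 events, but the visible rows add up to 900
print(hidden_by_thresholds(1000, [400, 300, 200]))
```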
Solutions
- Switch reporting identity to device-based, or disable Google signals if you don't need it
- Increase date range for more data
- Use BigQuery export (no thresholds)
- Reduce dimension granularity
Reporting Delays
Processing Times
| Data Type | Typical Delay |
|-----------|---------------|
| Real-time | Minutes |
| Standard reports | 24-48 hours |
| BigQuery daily | ~24 hours |
| BigQuery streaming | Minutes |
Why Data Changes
- Late-arriving events
- Attribution recalculation
- Spam/bot filtering
- Data-driven model updates
Best Practice
Wait 48-72 hours before final reporting on any date.
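The waiting rule can be turned into a cutoff date. Dates in GA4 reports follow the property's time zone, so this Python sketch converts the current time into that zone before subtracting the wait; the 3-day default (the upper end of the 48-72 hour range) is an assumption to adjust.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo
from typing import Optional


def latest_final_date(property_tz: tzinfo, wait_days: int = 3,
                      now: Optional[datetime] = None) -> date:
    """Most recent date old enough to report on, in the property's time zone."""
    now = now or datetime.now(timezone.utc)
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    today_in_property = now.astimezone(property_tz).date()
    return today_in_property - timedelta(days=wait_days)


# 02:00 UTC on June 10 is still June 9 in a UTC-7 property
utc_minus_7 = timezone(timedelta(hours=-7))
print(latest_final_date(utc_minus_7, now=datetime(2024, 6, 10, 2, 0, tzinfo=timezone.utc)))
```

For real zone rules, including daylight saving, pass a `zoneinfo.ZoneInfo` such as `ZoneInfo("America/New_York")`; that relies on the system's time zone database.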
Debugging Checklist
Quick Diagnosis
## Data Discrepancy Checklist
### Collection
- [ ] GA4 tag firing on all pages?
- [ ] Purchase event has transaction_id?
- [ ] Values include/exclude tax consistently?
- [ ] Currency code correct?
### Processing
- [ ] Same date range compared?
- [ ] Same timezone settings?
- [ ] Sampling indicator present?
- [ ] Data thresholds applied?
### Platform Comparison
- [ ] Same attribution model?
- [ ] Same lookback window?
- [ ] Same conversion definition?
- [ ] Different deduplication rules?
Validation Query
-- Data quality check for the last 7 complete days
SELECT
  event_date,
  COUNT(*) AS events,
  COUNT(DISTINCT user_pseudo_id) AS users,
  COUNTIF(event_name = 'purchase') AS purchases,
  COUNTIF(event_name = 'purchase'
    AND (SELECT value.string_value FROM UNNEST(event_params)
         WHERE key = 'transaction_id') IS NULL) AS purchases_no_txn_id,
  COUNTIF(event_name = 'purchase'
    AND (SELECT COALESCE(value.double_value, value.float_value, CAST(value.int_value AS FLOAT64))
         FROM UNNEST(event_params)
         WHERE key = 'value') IS NULL) AS purchases_no_value
FROM `project.analytics_XXXXXX.events_*`
-- BETWEEN keeps events_intraday_* tables out: their suffix starts with "intraday_"
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
                        AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY event_date
ORDER BY event_date DESC
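Once the validation query has run, its rows can be screened automatically. This Python sketch assumes each row has been converted to a plain dict keyed by the query's column names; the 1% tolerance is an arbitrary example threshold, not a GA4 rule.

```python
from typing import Dict, List


def flag_quality_issues(rows: List[Dict], max_missing_share: float = 0.01) -> List[str]:
    """Warn about days where purchase events lack transaction_id or value."""
    warnings = []
    for row in rows:
        purchases = row["purchases"]
        if purchases == 0:
            continue  # no purchases that day, nothing to check
        checks = (("purchases_no_txn_id", "transaction_id"),
                  ("purchases_no_value", "value"))
        for column, param in checks:
            share = row[column] / purchases
            if share > max_missing_share:
                warnings.append(f"{row['event_date']}: {share:.1%} of purchases missing {param}")
    return warnings


# Rows shaped like the query output (values are made up for illustration)
sample_rows = [
    {"event_date": "20240601", "purchases": 200, "purchases_no_txn_id": 0, "purchases_no_value": 10},
    {"event_date": "20240602", "purchases": 0, "purchases_no_txn_id": 0, "purchases_no_value": 0},
]
for warning in flag_quality_issues(sample_rows):
    print(warning)
```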