Skip to main content

BigQuery for GA4

Unlock the full potential of GA4 data with BigQuery exports, custom queries, and advanced analytics.

Why BigQuery?

GA4 UI vs. BigQuery

| Capability | GA4 Interface | BigQuery | |------------|---------------|----------| | Data retention | 14 months max | Unlimited | | Custom queries | Limited | Full SQL | | Raw event access | No | Yes | | Data joining | No | Yes | | ML capabilities | Basic | Full BigQuery ML | | Export to other tools | Limited | Full | | Historical reprocessing | No | Yes |

When to Use BigQuery

  • Advanced attribution - Custom multi-touch models
  • Customer analysis - LTV, cohorts, RFM
  • Data joining - CRM, advertising, inventory
  • Machine learning - Predictions, clustering
  • Long-term storage - Beyond 14 months
  • Custom reporting - Complex business logic

Setup & Configuration

Enable BigQuery Export

  1. GA4 AdminBigQuery Links
  2. Click Link
  3. Select your GCP project
  4. Configure settings:

| Setting | Recommendation | |---------|----------------| | Data location | Same region as other data | | Frequency | Daily (free) + Streaming (paid) | | Include advertising identifiers | Yes (if needed) |

Export Types

| Type | Latency | Cost | Use Case | |------|---------|------|----------| | Daily | Next day | Free | Historical analysis | | Streaming | Real-time | ~$0.05/GB | Real-time dashboards |

Table Structure

-- Tables created by GA4 export
project.analytics_PROPERTY_ID.events_YYYYMMDD  -- Daily tables
project.analytics_PROPERTY_ID.events_intraday_YYYYMMDD  -- Streaming
project.analytics_PROPERTY_ID.pseudonymous_users_YYYYMMDD  -- User data

Query Fundamentals

Event Schema

-- Explore the schema
SELECT *
FROM `project.analytics_XXXXXX.events_*`
LIMIT 1

Key fields:

  • event_date - Date string (YYYYMMDD)
  • event_timestamp - Microseconds since epoch
  • event_name - Event name (page_view, purchase, etc.)
  • event_params - Array of key-value pairs
  • user_pseudo_id - Anonymous user identifier
  • user_id - Your user ID (if set)
  • user_properties - Array of user attributes

Extracting Event Parameters

-- Extract specific parameter values
SELECT
  event_date,
  event_name,
  -- String parameters
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page_location,
  -- Integer parameters
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') as engagement_time,
  -- Double parameters
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') as value
FROM `project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

Using UNNEST

-- Unnest event parameters for filtering
SELECT
  event_date,
  user_pseudo_id,
  params.key,
  COALESCE(params.value.string_value, CAST(params.value.int_value AS STRING)) as value
FROM `project.analytics_XXXXXX.events_*`,
UNNEST(event_params) as params
WHERE event_name = 'page_view'
  AND params.key IN ('page_location', 'page_title')

Common Queries

Session Analysis

-- Sessions by source/medium
WITH sessions AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') as session_id,
    MIN(event_timestamp) as session_start,
    MAX(event_timestamp) as session_end,
    COUNTIF(event_name = 'page_view') as pageviews,
    MAX(traffic_source.source) as source,
    MAX(traffic_source.medium) as medium
  FROM `project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY user_pseudo_id, session_id
)
SELECT
  source,
  medium,
  COUNT(*) as sessions,
  AVG(pageviews) as avg_pageviews,
  AVG(TIMESTAMP_DIFF(
    TIMESTAMP_MICROS(session_end),
    TIMESTAMP_MICROS(session_start),
    SECOND
  )) as avg_duration_seconds
FROM sessions
GROUP BY source, medium
ORDER BY sessions DESC

Conversion Analysis

-- Conversion rate by landing page
WITH sessions AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') as session_id,
    MIN(CASE
      WHEN event_name = 'page_view'
      THEN (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location')
    END) as landing_page,
    MAX(CASE WHEN event_name = 'purchase' THEN 1 ELSE 0 END) as converted
  FROM `project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
    AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY user_pseudo_id, session_id
)
SELECT
  REGEXP_EXTRACT(landing_page, r'^https?://[^/]+(/[^?]*)') as landing_path,
  COUNT(*) as sessions,
  SUM(converted) as conversions,
  SAFE_DIVIDE(SUM(converted), COUNT(*)) as conversion_rate
FROM sessions
WHERE landing_page IS NOT NULL
GROUP BY landing_path
HAVING sessions >= 100
ORDER BY conversions DESC

Funnel Analysis

-- E-commerce funnel
WITH user_events AS (
  SELECT
    user_pseudo_id,
    MAX(CASE WHEN event_name = 'view_item' THEN 1 ELSE 0 END) as viewed_product,
    MAX(CASE WHEN event_name = 'add_to_cart' THEN 1 ELSE 0 END) as added_to_cart,
    MAX(CASE WHEN event_name = 'begin_checkout' THEN 1 ELSE 0 END) as began_checkout,
    MAX(CASE WHEN event_name = 'purchase' THEN 1 ELSE 0 END) as purchased
  FROM `project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY user_pseudo_id
)
SELECT
  COUNT(*) as total_users,
  SUM(viewed_product) as viewed_product,
  SUM(added_to_cart) as added_to_cart,
  SUM(began_checkout) as began_checkout,
  SUM(purchased) as purchased,
  SAFE_DIVIDE(SUM(added_to_cart), SUM(viewed_product)) as view_to_cart,
  SAFE_DIVIDE(SUM(began_checkout), SUM(added_to_cart)) as cart_to_checkout,
  SAFE_DIVIDE(SUM(purchased), SUM(began_checkout)) as checkout_to_purchase
FROM user_events
WHERE viewed_product = 1

Cohort Retention

-- Weekly cohort retention
WITH user_cohorts AS (
  SELECT
    user_pseudo_id,
    DATE_TRUNC(PARSE_DATE('%Y%m%d', MIN(event_date)), WEEK) as cohort_week
  FROM `project.analytics_XXXXXX.events_*`
  GROUP BY user_pseudo_id
),
user_activity AS (
  SELECT DISTINCT
    user_pseudo_id,
    DATE_TRUNC(PARSE_DATE('%Y%m%d', event_date), WEEK) as activity_week
  FROM `project.analytics_XXXXXX.events_*`
)
SELECT
  c.cohort_week,
  DATE_DIFF(a.activity_week, c.cohort_week, WEEK) as weeks_since_cohort,
  COUNT(DISTINCT a.user_pseudo_id) as users
FROM user_cohorts c
JOIN user_activity a ON c.user_pseudo_id = a.user_pseudo_id
WHERE c.cohort_week >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 WEEK)
GROUP BY cohort_week, weeks_since_cohort
ORDER BY cohort_week, weeks_since_cohort

E-commerce Queries

Revenue Analysis

-- Revenue by product
SELECT
  items.item_id,
  items.item_name,
  items.item_category,
  SUM(items.quantity) as units_sold,
  SUM(items.price * items.quantity) as revenue,
  COUNT(DISTINCT user_pseudo_id) as unique_buyers
FROM `project.analytics_XXXXXX.events_*`,
UNNEST(items) as items
WHERE event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
    AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY items.item_id, items.item_name, items.item_category
ORDER BY revenue DESC
LIMIT 50

Customer Lifetime Value

-- LTV by acquisition source
WITH customer_revenue AS (
  SELECT
    user_id,
    traffic_source.source as first_source,
    traffic_source.medium as first_medium,
    SUM((SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')) as total_revenue,
    COUNT(*) as purchases,
    MIN(event_date) as first_purchase,
    MAX(event_date) as last_purchase
  FROM `project.analytics_XXXXXX.events_*`
  WHERE event_name = 'purchase'
    AND user_id IS NOT NULL
  GROUP BY user_id, first_source, first_medium
)
SELECT
  first_source,
  first_medium,
  COUNT(DISTINCT user_id) as customers,
  AVG(total_revenue) as avg_ltv,
  AVG(purchases) as avg_purchases,
  SUM(total_revenue) as total_revenue
FROM customer_revenue
GROUP BY first_source, first_medium
HAVING customers >= 10
ORDER BY avg_ltv DESC

Optimization Tips

Partitioning & Clustering

-- Create optimized table from GA4 data
CREATE TABLE `project.dataset.events_optimized`
PARTITION BY DATE(event_timestamp_dt)
CLUSTER BY event_name, user_pseudo_id
AS
SELECT
  *,
  TIMESTAMP_MICROS(event_timestamp) as event_timestamp_dt
FROM `project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX >= '20240101'

Query Cost Reduction

| Technique | Impact | |-----------|--------| | Limit date range | -80-90% | | Select specific columns | -50-70% | | Use partitioned tables | -90% | | Pre-aggregate data | -70-80% |

Best Practices

-- DO: Limit date range
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

-- DON'T: Scan all tables
WHERE _TABLE_SUFFIX LIKE '%'

-- DO: Select only needed columns
SELECT event_name, user_pseudo_id, event_timestamp

-- DON'T: Select everything
SELECT *

-- DO: Use approximate functions for large data
SELECT APPROX_COUNT_DISTINCT(user_pseudo_id)

-- DON'T: Exact count on huge datasets
SELECT COUNT(DISTINCT user_pseudo_id)

Scheduled Queries

Daily Aggregation

-- Schedule to run daily
CREATE OR REPLACE TABLE `project.dataset.daily_metrics`
AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) as date,
  COUNT(DISTINCT user_pseudo_id) as users,
  COUNT(DISTINCT CONCAT(
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
  )) as sessions,
  COUNTIF(event_name = 'purchase') as purchases,
  SUM((SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')) as revenue
FROM `project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date

Setting Up Schedules

  1. BigQueryScheduled queries
  2. Create query with parameterized dates
  3. Set schedule (daily, hourly, etc.)
  4. Configure destination table
  5. Set up email notifications

Next: Looker Studio