BigQuery for GA4
Unlock the full potential of GA4 data with BigQuery exports, custom queries, and advanced analytics.
Why BigQuery?
GA4 UI vs. BigQuery
| Capability | GA4 Interface | BigQuery |
|------------|---------------|----------|
| Data retention | 14 months max | Unlimited |
| Custom queries | Limited | Full SQL |
| Raw event access | No | Yes |
| Data joining | No | Yes |
| ML capabilities | Basic | Full BigQuery ML |
| Export to other tools | Limited | Full |
| Historical reprocessing | No | Yes |
When to Use BigQuery
- Advanced attribution - Custom multi-touch models
- Customer analysis - LTV, cohorts, RFM
- Data joining - CRM, advertising, inventory
- Machine learning - Predictions, clustering
- Long-term storage - Beyond 14 months
- Custom reporting - Complex business logic
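As an illustration of the data-joining case, GA4 purchase events can be joined to first-party data on `user_id`. A minimal sketch, where `project.crm.customers` and its `account_tier` column are hypothetical placeholders for your own CRM table:

```sql
-- Sketch: join GA4 purchases to a hypothetical CRM table.
-- `project.crm.customers` / `account_tier` are placeholders, not real tables.
SELECT
  crm.account_tier,
  COUNT(DISTINCT ga.user_id) AS buyers,
  SUM(ga.ecommerce.purchase_revenue) AS revenue
FROM `project.analytics_XXXXXX.events_*` AS ga
JOIN `project.crm.customers` AS crm
  ON ga.user_id = crm.user_id
WHERE ga.event_name = 'purchase'
  AND ga._TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY crm.account_tier
```

This only works for signed-in users where you set `user_id` in GA4; anonymous traffic has no join key.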
Setup & Configuration
Enable BigQuery Export
1. GA4 Admin → BigQuery Links
2. Click Link
3. Select your GCP project
4. Configure settings:
| Setting | Recommendation |
|---------|----------------|
| Data location | Same region as other data |
| Frequency | Daily (free) + Streaming (paid) |
| Include advertising identifiers | Yes (if needed) |
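After linking, the first daily table typically appears within 24 hours. One way to confirm the export is flowing is to list the dataset's tables via `INFORMATION_SCHEMA` (replace `XXXXXX` with your property ID):

```sql
-- List the most recent export tables in the GA4 dataset
SELECT table_name
FROM `project.analytics_XXXXXX.INFORMATION_SCHEMA.TABLES`
ORDER BY table_name DESC
LIMIT 10
```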
Export Types
| Type | Latency | Cost | Use Case |
|------|---------|------|----------|
| Daily | Next day | Free | Historical analysis |
| Streaming | Real-time | ~$0.05/GB | Real-time dashboards |
Table Structure
```sql
-- Tables created by the GA4 export
project.analytics_PROPERTY_ID.events_YYYYMMDD              -- Daily tables
project.analytics_PROPERTY_ID.events_intraday_YYYYMMDD     -- Streaming
project.analytics_PROPERTY_ID.pseudonymous_users_YYYYMMDD  -- User data
```
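Because today's streaming data lives only in the `events_intraday_*` tables until the daily table is finalized, queries that need up-to-the-minute numbers typically union both table sets. A sketch:

```sql
-- Combine finalized daily tables with today's intraday (streaming) table
SELECT
  event_date,
  COUNT(*) AS events
FROM (
  SELECT event_date FROM `project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX >= '20240101'
  UNION ALL
  SELECT event_date FROM `project.analytics_XXXXXX.events_intraday_*`
)
GROUP BY event_date
ORDER BY event_date
```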
Query Fundamentals
Event Schema
```sql
-- Explore the schema
SELECT *
FROM `project.analytics_XXXXXX.events_*`
LIMIT 1
```
Key fields:
- `event_date` - Date string (YYYYMMDD)
- `event_timestamp` - Microseconds since epoch
- `event_name` - Event name (page_view, purchase, etc.)
- `event_params` - Array of key-value pairs
- `user_pseudo_id` - Anonymous user identifier
- `user_id` - Your user ID (if set)
- `user_properties` - Array of user attributes
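Note that neither field is a native date/time type: `event_date` is a string and `event_timestamp` is an integer microsecond count. Converting them up front makes later date arithmetic simpler:

```sql
-- Convert the raw date and timestamp fields to native types
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_day,
  TIMESTAMP_MICROS(event_timestamp) AS event_ts
FROM `project.analytics_XXXXXX.events_*`
LIMIT 5
```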
Extracting Event Parameters
```sql
-- Extract specific parameter values
SELECT
  event_date,
  event_name,
  -- String parameters
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  -- Integer parameters
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time,
  -- Double parameters
  (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') AS value
FROM `project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
```
Using UNNEST
```sql
-- Unnest event parameters for filtering
SELECT
  event_date,
  user_pseudo_id,
  params.key,
  COALESCE(params.value.string_value, CAST(params.value.int_value AS STRING)) AS value
FROM `project.analytics_XXXXXX.events_*`,
  UNNEST(event_params) AS params
WHERE event_name = 'page_view'
  AND params.key IN ('page_location', 'page_title')
```
Common Queries
Session Analysis
```sql
-- Sessions by source/medium
WITH sessions AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    MIN(event_timestamp) AS session_start,
    MAX(event_timestamp) AS session_end,
    COUNTIF(event_name = 'page_view') AS pageviews,
    MAX(traffic_source.source) AS source,
    MAX(traffic_source.medium) AS medium
  FROM `project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY user_pseudo_id, session_id
)
SELECT
  source,
  medium,
  COUNT(*) AS sessions,
  AVG(pageviews) AS avg_pageviews,
  AVG(TIMESTAMP_DIFF(
    TIMESTAMP_MICROS(session_end),
    TIMESTAMP_MICROS(session_start),
    SECOND
  )) AS avg_duration_seconds
FROM sessions
GROUP BY source, medium
ORDER BY sessions DESC
```
Conversion Analysis
```sql
-- Conversion rate by landing page
WITH sessions AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    -- Note: MIN picks the alphabetically first page_view URL, a common
    -- approximation; for the true first page, order by event_timestamp instead
    MIN(CASE
      WHEN event_name = 'page_view'
      THEN (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location')
    END) AS landing_page,
    MAX(CASE WHEN event_name = 'purchase' THEN 1 ELSE 0 END) AS converted
  FROM `project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
    AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY user_pseudo_id, session_id
)
SELECT
  REGEXP_EXTRACT(landing_page, r'^https?://[^/]+(/[^?]*)') AS landing_path,
  COUNT(*) AS sessions,
  SUM(converted) AS conversions,
  SAFE_DIVIDE(SUM(converted), COUNT(*)) AS conversion_rate
FROM sessions
WHERE landing_page IS NOT NULL
GROUP BY landing_path
HAVING sessions >= 100
ORDER BY conversions DESC
```
Funnel Analysis
```sql
-- E-commerce funnel
WITH user_events AS (
  SELECT
    user_pseudo_id,
    MAX(CASE WHEN event_name = 'view_item' THEN 1 ELSE 0 END) AS viewed_product,
    MAX(CASE WHEN event_name = 'add_to_cart' THEN 1 ELSE 0 END) AS added_to_cart,
    MAX(CASE WHEN event_name = 'begin_checkout' THEN 1 ELSE 0 END) AS began_checkout,
    MAX(CASE WHEN event_name = 'purchase' THEN 1 ELSE 0 END) AS purchased
  FROM `project.analytics_XXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY user_pseudo_id
)
SELECT
  COUNT(*) AS total_users,
  SUM(viewed_product) AS viewed_product,
  SUM(added_to_cart) AS added_to_cart,
  SUM(began_checkout) AS began_checkout,
  SUM(purchased) AS purchased,
  SAFE_DIVIDE(SUM(added_to_cart), SUM(viewed_product)) AS view_to_cart,
  SAFE_DIVIDE(SUM(began_checkout), SUM(added_to_cart)) AS cart_to_checkout,
  SAFE_DIVIDE(SUM(purchased), SUM(began_checkout)) AS checkout_to_purchase
FROM user_events
WHERE viewed_product = 1
```
Cohort Retention
```sql
-- Weekly cohort retention
-- Note: the CTEs below scan all daily tables; add a _TABLE_SUFFIX
-- filter in production to limit cost
WITH user_cohorts AS (
  SELECT
    user_pseudo_id,
    DATE_TRUNC(PARSE_DATE('%Y%m%d', MIN(event_date)), WEEK) AS cohort_week
  FROM `project.analytics_XXXXXX.events_*`
  GROUP BY user_pseudo_id
),
user_activity AS (
  SELECT DISTINCT
    user_pseudo_id,
    DATE_TRUNC(PARSE_DATE('%Y%m%d', event_date), WEEK) AS activity_week
  FROM `project.analytics_XXXXXX.events_*`
)
SELECT
  c.cohort_week,
  DATE_DIFF(a.activity_week, c.cohort_week, WEEK) AS weeks_since_cohort,
  COUNT(DISTINCT a.user_pseudo_id) AS users
FROM user_cohorts c
JOIN user_activity a ON c.user_pseudo_id = a.user_pseudo_id
WHERE c.cohort_week >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 WEEK)
GROUP BY cohort_week, weeks_since_cohort
ORDER BY cohort_week, weeks_since_cohort
```
E-commerce Queries
Revenue Analysis
```sql
-- Revenue by product
SELECT
  items.item_id,
  items.item_name,
  items.item_category,
  SUM(items.quantity) AS units_sold,
  SUM(items.price * items.quantity) AS revenue,
  COUNT(DISTINCT user_pseudo_id) AS unique_buyers
FROM `project.analytics_XXXXXX.events_*`,
  UNNEST(items) AS items
WHERE event_name = 'purchase'
  AND _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY items.item_id, items.item_name, items.item_category
ORDER BY revenue DESC
LIMIT 50
```
Customer Lifetime Value
```sql
-- LTV by acquisition source
-- Note: this scans all daily tables; add a _TABLE_SUFFIX filter
-- in production to limit cost
WITH customer_revenue AS (
  SELECT
    user_id,
    traffic_source.source AS first_source,
    traffic_source.medium AS first_medium,
    SUM((SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')) AS total_revenue,
    COUNT(*) AS purchases,
    MIN(event_date) AS first_purchase,
    MAX(event_date) AS last_purchase
  FROM `project.analytics_XXXXXX.events_*`
  WHERE event_name = 'purchase'
    AND user_id IS NOT NULL
  GROUP BY user_id, first_source, first_medium
)
SELECT
  first_source,
  first_medium,
  COUNT(DISTINCT user_id) AS customers,
  AVG(total_revenue) AS avg_ltv,
  AVG(purchases) AS avg_purchases,
  SUM(total_revenue) AS total_revenue
FROM customer_revenue
GROUP BY first_source, first_medium
HAVING customers >= 10
ORDER BY avg_ltv DESC
```
Optimization Tips
Partitioning & Clustering
```sql
-- Create an optimized table from GA4 data
CREATE TABLE `project.dataset.events_optimized`
PARTITION BY DATE(event_timestamp_dt)
CLUSTER BY event_name, user_pseudo_id
AS
SELECT
  *,
  TIMESTAMP_MICROS(event_timestamp) AS event_timestamp_dt
FROM `project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX >= '20240101'
```
Query Cost Reduction
| Technique | Typical scan reduction |
|-----------|------------------------|
| Limit date range | 80-90% |
| Select specific columns | 50-70% |
| Use partitioned tables | 90% |
| Pre-aggregate data | 70-80% |
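To see where your spend is actually going, BigQuery's `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view reports bytes billed per job. A sketch, assuming the US multi-region (adjust `region-us` to your dataset's location):

```sql
-- Bytes billed per user over the last 7 days
SELECT
  user_email,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4), 2) AS tib_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
GROUP BY user_email
ORDER BY tib_billed DESC
```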
Best Practices
```sql
-- DO: Limit the date range
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'

-- DON'T: Scan all tables
WHERE _TABLE_SUFFIX LIKE '%'

-- DO: Select only the columns you need
SELECT event_name, user_pseudo_id, event_timestamp

-- DON'T: Select everything
SELECT *

-- DO: Use approximate functions for large data
SELECT APPROX_COUNT_DISTINCT(user_pseudo_id)

-- DON'T: Run exact counts on huge datasets
SELECT COUNT(DISTINCT user_pseudo_id)
```
Scheduled Queries
Daily Aggregation
```sql
-- Schedule to run daily
CREATE OR REPLACE TABLE `project.dataset.daily_metrics`
AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS date,
  COUNT(DISTINCT user_pseudo_id) AS users,
  -- CONCAT requires STRING arguments, so the session ID must be cast
  COUNT(DISTINCT CONCAT(
    user_pseudo_id,
    CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  )) AS sessions,
  COUNTIF(event_name = 'purchase') AS purchases,
  SUM((SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')) AS revenue
FROM `project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date
```
Setting Up Schedules
1. BigQuery → Scheduled queries
2. Create a query with parameterized dates
3. Set the schedule (daily, hourly, etc.)
4. Configure the destination table
5. Set up email notifications
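For step 2, scheduled queries expose a built-in `@run_date` parameter (a `DATE` resolved to each run's scheduled date), which keeps the date logic tied to the schedule rather than to `CURRENT_DATE()`. A sketch of a query body that processes the prior day's table:

```sql
-- Scheduled-query body using the built-in @run_date parameter
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS date,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `project.analytics_XXXXXX.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(@run_date, INTERVAL 1 DAY))
GROUP BY date
```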
Next: Looker Studio