4 Steps To Validate The Google 360 Reporting UI Against BigQuery

Thomas Spicer
Openbridge

--

For teams that want to establish cross-checks for the standard Google Analytics 360 reporting UI versus the data in BigQuery, we pulled together a few tests to help you along the way. Why run these tests? We published “3 Reasons Your Google Analytics 360 Reporting UI Is Undermining You which highlights some of the challenges teams face when reconciling the UI with their preferred reporting tools.

Self-Service Analytics Tools?

Teams that use self-service analytic tools can avoid some of the common traps having two different reports can create (Google UI vs your reporting tool). What are some examples of self-service analytics tools? Here are few:

For our examples, we will run these sample queries for a hypothetical company called ACME Corp.

Step 1: Validate Country and Region

A great place to start in cross-referencing the reporting UI to the 360 BigQuery data is using country and region.

This query will generate counts for Sweden in BigQuery:

SELECT
date,
fullVisitorId,
visitId
FROM
[acmecorp.ga_sessions_20170312]
WHERE
geoNetwork.country LIKE ‘%Sweden%’
GROUP BY
date,
visitId,
fullVisitorId

When running the query the results ACME Corp had a count of 3,112. When logging into the reporting UI and checking for the same time period, the count was 3,112. Success!

Let’s try the same thing with a different country (United States):

SELECT
date,
fullVisitorId,
visitId
FROM
[acmecorp.ga_sessions_20170312]
WHERE
geoNetwork.country LIKE ‘%United States%’
GROUP BY
date,
visitId,
fullVisitorId

The query had a count of 54,112. When logging into the reporting UI and checking for the same time period, the count was 54,112. We are on a roll!

Let’s try the same thing for the United States except for this time we will add a region (California):

SELECT
date,
fullVisitorId,
visitId
FROM
[acmecorp.ga_sessions_20170312]
WHERE
geoNetwork.country LIKE ‘%United States%’ and geoNetwork.region LIKE ‘%Cali%’
GROUP BY
date,
visitId,
fullVisitorId

The count in BigQuery was 22,512. Logging into the reporting UI, the count was 22,512. So far the tests are tying out nicely.

Step 2: Validate Users Counts

Ok, our location tests have passed, we can start to test counts for users:

SELECT
fullVisitorId,
SUM(totals.visits) as visits,
SUM(totals.pageviews) as pageviews,
FROM
[acmecorp.ga_sessions_20170312]
GROUP BY
fullVisitorId

We run the query and get a count of 180,999. When we log into the reporting UI the count was 180,999. Bingo! Another test passed.

Let's try something different. We will run the same query with a slight variation. This query will be counting those with visits > 0 and pageviews > 0. Let’s see what happens:

SELECT
fullVisitorId,
SUM(totals.visits) as visits,
SUM(totals.pageviews) as pageviews,
FROM
[acmecorp.ga_sessions_20170312]
GROUP BY
fullVisitorId
HAVING visits > 0 and pageviews > 0

We run the query and get a count of 174,401. When we log into the reporting UI the count was 180,999. Hmm, different numbers. Actually, this test passed too! Why ???? It is a perfect example of how a variation on a query will create two different narratives. Which is more accurate? That depends. The query above excludes certain events from the count that are included in the UI. However, if a user in Tableau or PowerBI is reporting 174,401 and an executive is seeing 180,999 in the Google reporting UI, this disconnect will lead to confusion unless cared for properly. This is why you need to make sure you are aware that variations in the UI and reporting tools can be problematic.

Step 3: Traffic Source

The following is a check for traffic by source. In this case, we are using organic as a filter:

SELECT
COUNT(totals.visits) AS visitCount
FROM
[acmecorp.ga_sessions_20170312]
WHERE
trafficSource.medium = “organic”

The results are confirmed. We have 93,043 in the data and 93,043 in the UI.

Let's run the same query but we will replace organic with % :

SELECT
COUNT(totals.visits) AS visitCount
FROM
[acmecorp.ga_sessions_20170312]
WHERE
trafficSource.medium LIKE “%”

The results are confirmed again. We have 193,444 in the data and 193,444 in the UI.

Step 4: Bounces

Our last category is bounces. Everybody loves bounces! This query will check the bounce rate:

SELECT
COUNT(totals.bounces)/COUNT(totals.visits) AS bounceRate
FROM
[acmecorp.ga_sessions_20170312]

The rate matches! We have 0.5227826319203327 in the data and 52.28 in the UI.

How about bounces by medium (organic)?

SELECT
trafficSource.medium AS medium,
COUNT(totals.bounces)/COUNT(totals.visits) AS bounceRate
FROM
[acmecorp.ga_sessions_20170312]
GROUP BY
medium

Checking (organic) the numbers we see that they align too. The data has 0.345743366 and the reporting UI is 34.57.

How about we finish our tests by using a combination of sessions and bounces by medium?

SELECT
trafficSource.source + ' / ' + trafficSource.medium AS source_medium,
count(DISTINCT CONCAT(fullVisitorId, STRING(visitId)), 100000) as sessions,
SUM(totals.bounces) as bounces,
100 * SUM(totals.bounces) / count(DISTINCT CONCAT(fullVisitorId, STRING(visitId)), 100000) as bounce_rate,
SUM(totals.transactions) as transactions,
FROM
[acmecorp.ga_sessions*]
GROUP BY source_medium
ORDER BY sessions DESC

We spot check bing (organic)for sessions. The data is 2243
and the UI at 2243. The bounce rate is .2734857849196539 in the data and 27.35 in the reporting UI.

Why Bother Validating The Numbers?

If you only use the reporting UI, then you likely don’t know what BigQuery is so it does not matter. 😄

However, if you are like ACME Corp, you likely use tools such as PowerBI, Tableau, Mode Analytics, Grow, Chartio, or Oracle BI. If you use these tools, then validating matters, a lot.

Running these queries, and others like them provide a cross-check between the data and the reporting UI. This helps build organizational confidence in work product. Why? By validating that you can re-create the numbers in the same way Google does in the reporting UI you are building a foundation of credibility for more important analytics work later on. If an executive has confidence that the basics are done right, they are more apt to trust an analytics team’s more complex analysis via their preferred self-service analytics tools.

Just be mindful of any custom reports in the Google reporting UI, they can be a source of trouble for everyone if not properly set up.

Good luck!

Did we miss anything? If you have any questions about Google Analytics, BigQuery, or data in general, feel free to leave a comment or contact us at support@openbridge.com. You can also visit us at https://www.openbridge.com to learn how we are helping other companies with their data efforts.

--

--