使用 BigQuery 提取命中级别数据时,Google Analytics Metrics 被夸大了 [英] Google Analytics Metrics are inflated when extracting hit level data using BigQuery
问题描述
我正在尝试在已链接到 bigquery 的 Google Analytics 汇总属性中显示源属性名称.问题是,当我尝试下面的一些指标时,会变得非常膨胀.我猜这与重复字段有关,但不确定该怎么做.我尝试了许多变通方法,例如使用max",但这不会显示每个属性名称.
I'm trying to display the source property name within the Google Analytics roll up property I have linked to bigquery. Issue is, is that when I try the below some of the metrics become hugely inflated. I'm guessing this is to do with repeated fields but not sure what to do about it. I've tried a number of workarounds like using "max" but this doesn't display every property name.
除用户和访问量之外的所有指标似乎都被夸大了.
All metrics except users and visits seem to be inflated.
SELECT
date,
MAX(CASE
WHEN EXISTS( SELECT 1 FROM UNNEST(hits) hits WHERE REGEXP_CONTAINS(hits.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640')) THEN 'MUG'
WHEN EXISTS (
SELECT
1
FROM
UNNEST(hits) hits
WHERE
hits.sourcePropertyInfo.sourcePropertyTrackingId = 'Social') THEN 'Social'ELSE 'Website' END) AS Property,
geoNetwork.country AS Country,
COUNT(DISTINCT CONCAT(cast(visitId AS STRING),fullVisitorId)) as visits,
sum(totals.visits) as visits2,
COUNT(DISTINCT(fullVisitorId)) AS Users,
h.sourcePropertyInfo.sourcePropertyDisplayName as display,
SUM((
SELECT
SUM(latencyTracking.pageLoadTime)
FROM
UNNEST(hits)
WHERE
page.pagePath = '/' ))/SUM((
SELECT
SUM(latencyTracking.pageLoadSample)
FROM
UNNEST(hits)
WHERE
page.pagePath = '/')) AS pageloadspeed,
SUM(totals.newVisits) AS new_,
SUM(totals.screenviews) AS PAGEVIEWS,
SUM(totals.bounces) AS BOUNCES,
sum(CASE
WHEN device.isMobile = TRUE THEN (totals.visits)
ELSE 0 END) mobilevisits,
SUM(CASE
WHEN trafficSource.medium = 'organic' THEN (totals.visits)
ELSE 0 END) organicvisits,
SUM(CASE
WHEN EXISTS( SELECT 1 FROM UNNEST(hits) hits WHERE REGEXP_CONTAINS(hits.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro')) THEN 1
ELSE 0 END) AS NewRegistrations,
SUM(CASE
WHEN EXISTS( SELECT 1 FROM UNNEST(hits) hits WHERE REGEXP_CONTAINS(hits.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::')) THEN 1
ELSE 0 END) AS ClickToBuy,
SUM(totals.transactions) AS Transactions
FROM
`project.dataset.ga_sessions_*`, UNNEST(hits) as h
WHERE
1 = 1
AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01')
AND TIMESTAMP('2017-05-01')
GROUP BY
date,
Country,
display
ORDER BY
visits DESC;
我已经尝试从 FROM 子句中简单地将 UNNEST(HITS) 命中作为 h 删除,这给了我以下错误:
I've tried simply removing UNNEST(HITS) hits as h from the FROM clause, this gave me the following error:
错误:无法在 [16:14] 处访问类型为 ARRAY> 的值上的字段 sourcePropertyInfo
Error: Cannot access field sourcePropertyInfo on a value with type ARRAY> at [16:14]
我也试过在子查询中使用它,如下所示:
I've also tried using it in a subquery as follows:
(select h.sourcePropertyInfo.sourcePropertyDisplayName from unnest(hits) h) as displayname,
并得到错误:
标量子查询产生了多个元素
Scalar subquery produced more than one element
推荐答案
由于您需要在命中级别计算多个值,因此取消嵌套字段 hits 是最好的方法.缺点是您丢失了会话级别的 totals 字段聚合,但您仍然可以解决它.
As you need to compute several values on the hits level maybe unnesting the field hits is the best approach. The downside is that you lose the totals field aggregation for the session level but still you can work it around.
举个例子:
SELECT
date,
CASE
WHEN REGEXP_CONTAINS(h.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640') THEN 'MUG'
WHEN h.sourcePropertyInfo.sourcePropertyTrackingId = 'Social' THEN 'Social'ELSE 'Website'
END AS Property,
geoNetwork.country AS Country,
COUNT(DISTINCT CONCAT(CAST(visitId AS STRING),fullVisitorId)) AS visits,
COUNT(DISTINCT(fullVisitorId)) AS Users,
h.sourcePropertyInfo.sourcePropertyDisplayName AS display,
SUM(CASE
WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadTime END) / SUM(CASE
WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadSample END) AS pageloadspeed,
COUNT(DISTINCT
CASE
WHEN totals.newVisits = 1 THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END) new_visits,
COUNT(CASE
WHEN h.type = 'PAGE' THEN h.page.pagepath END) pageviews,
SUM(CASE
WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces,
COUNT(DISTINCT (CASE
WHEN device.isMobile = TRUE THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) mobilevisits,
COUNT(DISTINCT (CASE
WHEN trafficSource.medium = 'organic' THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) organicvisits,
SUM(CASE
WHEN REGEXP_CONTAINS(h.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro') THEN 1 END) AS NewRegistrations,
SUM(CASE
WHEN REGEXP_CONTAINS(h.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::') THEN 1 END) AS ClickToBuy,
COUNT(h.transaction.transactionid) transactions
FROM
`project_id.dataset_id.ga_sessions_*`,
UNNEST(hits) AS h
WHERE
1 = 1
AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01') AND TIMESTAMP('2017-05-01')
GROUP BY
date,
Country,
display,
Property
我针对我们的数据集运行了它,它似乎工作正常.我做了一些改变:
I ran it against our dataset and it seems to be working. Some changes I did:
- 删除了Property 的
MAX
操作并将其添加到group by. - pageviews 被视为命中数,其中 hit.type = 'PAGE'.不确定这是否与屏幕浏览量相同.
- 在发生进入和退出事件时计算反弹.
- 总交易数是对交易 ID 的计数(希望您的数据集中也填写此字段).
- Removed the
MAX
operation for the Property and added it to the group by. - pageviews was considered as the count of hits where hit.type = 'PAGE'. Not sure if this is the same for screenviews though.
- bounce is computed when there's an entrance and exit event.
- Total transactions is a count on transaction ids (hopefully this field is being filled in your dataset as well).
这篇关于使用 BigQuery 提取命中级别数据时,Google Analytics Metrics 被夸大了的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!