使用 BigQuery 提取命中级别数据时,Google Analytics Metrics 被夸大了 [英] Google Analytics Metrics are inflated when extracting hit level data using BigQuery

查看:14
本文介绍了使用 BigQuery 提取命中级别数据时,Google Analytics Metrics 被夸大了的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试在已链接到 bigquery 的 Google Analytics 汇总属性中显示源属性名称.问题是,当我尝试下面的一些指标时,会变得非常膨胀.我猜这与重复字段有关,但不确定该怎么做.我尝试了许多变通方法,例如使用max",但这不会显示每个属性名称.

I'm trying to display the source property name within the Google Analytics roll up property I have linked to bigquery. Issue is, is that when I try the below some of the metrics become hugely inflated. I'm guessing this is to do with repeated fields but not sure what to do about it. I've tried a number of workarounds like using "max" but this doesn't display every property name.

除用户和访问量之外的所有指标似乎都被夸大了.

All metrics except users and visits seem to be inflated.

SELECT
  date,
  MAX(CASE
      WHEN EXISTS(  SELECT 1  FROM UNNEST(hits) hits  WHERE REGEXP_CONTAINS(hits.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640')) THEN 'MUG'
      WHEN EXISTS (
    SELECT
      1
    FROM
      UNNEST(hits) hits
    WHERE
      hits.sourcePropertyInfo.sourcePropertyTrackingId = 'Social') THEN 'Social'ELSE 'Website' END) AS Property,
  geoNetwork.country AS Country,
 COUNT(DISTINCT CONCAT(cast(visitId AS STRING),fullVisitorId)) as visits,
 sum(totals.visits) as visits2,
  COUNT(DISTINCT(fullVisitorId)) AS Users,
 h.sourcePropertyInfo.sourcePropertyDisplayName as display,
  SUM((
    SELECT
      SUM(latencyTracking.pageLoadTime)
    FROM
      UNNEST(hits)
    WHERE
      page.pagePath = '/' ))/SUM((
    SELECT
      SUM(latencyTracking.pageLoadSample)
    FROM
      UNNEST(hits)
    WHERE
      page.pagePath = '/')) AS pageloadspeed,
  SUM(totals.newVisits) AS new_,
  SUM(totals.screenviews) AS PAGEVIEWS,
  SUM(totals.bounces) AS BOUNCES,
   sum(CASE
      WHEN device.isMobile = TRUE THEN (totals.visits)
      ELSE 0 END) mobilevisits,
  SUM(CASE
      WHEN trafficSource.medium = 'organic' THEN (totals.visits)
      ELSE 0 END) organicvisits,
  SUM(CASE
      WHEN EXISTS(  SELECT 1  FROM UNNEST(hits) hits  WHERE REGEXP_CONTAINS(hits.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro')) THEN 1
      ELSE 0 END) AS NewRegistrations,
  SUM(CASE
      WHEN EXISTS(  SELECT 1  FROM UNNEST(hits) hits  WHERE REGEXP_CONTAINS(hits.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::')) THEN 1
      ELSE 0 END) AS ClickToBuy,
  SUM(totals.transactions) AS Transactions
FROM
  `project.dataset.ga_sessions_*`, UNNEST(hits) as h
WHERE
  1 = 1
  AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01')
  AND TIMESTAMP('2017-05-01')
GROUP BY
  date,
  Country,
  display
ORDER BY
  visits DESC;

我已经尝试从 FROM 子句中简单地将 UNNEST(HITS) 命中作为 h 删除,这给了我以下错误:

I've tried simply removing UNNEST(HITS) hits as h from the FROM clause, this gave me the following error:

错误:无法在 [16:14] 处访问类型为 ARRAY> 的值上的字段 sourcePropertyInfo

Error: Cannot access field sourcePropertyInfo on a value with type ARRAY> at [16:14]

我也试过在子查询中使用它,如下所示:

I've also tried using it in a subquery as follows:

(select h.sourcePropertyInfo.sourcePropertyDisplayName from unnest(hits) h) as displayname, 

并得到错误:

标量子查询产生了多个元素

Scalar subquery produced more than one element

推荐答案

由于您需要在命中级别计算多个值,因此取消嵌套字段 hits 是最好的方法.缺点是您丢失了会话级别的 totals 字段聚合,但您仍然可以解决它.

As you need to compute several values on the hits level maybe unnesting the field hits is the best approach. The downside is that you lose the totals field aggregation for the session level but still you can work it around.

举个例子:

SELECT
  date,
  CASE
    WHEN REGEXP_CONTAINS(h.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640') THEN 'MUG'
    WHEN h.sourcePropertyInfo.sourcePropertyTrackingId = 'Social' THEN 'Social'ELSE 'Website'
  END AS Property,
  geoNetwork.country AS Country,
  COUNT(DISTINCT CONCAT(CAST(visitId AS STRING),fullVisitorId)) AS visits,
  COUNT(DISTINCT(fullVisitorId)) AS Users,
  h.sourcePropertyInfo.sourcePropertyDisplayName AS display,
  SUM(CASE
      WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadTime END) / SUM(CASE
      WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadSample END) AS pageloadspeed,
  COUNT(DISTINCT
    CASE
      WHEN totals.newVisits = 1 THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END) new_visits,
  COUNT(CASE
         WHEN h.type = 'PAGE' THEN h.page.pagepath END) pageviews,
  SUM(CASE
       WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces,
  COUNT(DISTINCT (CASE
        WHEN device.isMobile = TRUE THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) mobilevisits,
  COUNT(DISTINCT (CASE
        WHEN trafficSource.medium = 'organic' THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) organicvisits,
  SUM(CASE
       WHEN REGEXP_CONTAINS(h.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro') THEN 1 END) AS NewRegistrations,
  SUM(CASE
       WHEN REGEXP_CONTAINS(h.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::') THEN 1 END) AS ClickToBuy,
  COUNT(h.transaction.transactionid) transactions
FROM
  `project_id.dataset_id.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  1 = 1
  AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01') AND TIMESTAMP('2017-05-01')
GROUP BY
  date,
  Country,
  display,
  Property

我针对我们的数据集运行了它,它似乎工作正常.我做了一些改变:

I ran it against our dataset and it seems to be working. Some changes I did:

  • 删除了PropertyMAX 操作并将其添加到group by.
  • pageviews 被视为命中数,其中 hit.type = 'PAGE'.不确定这是否与屏幕浏览量相同.
  • 在发生进入和退出事件时计算反弹.
  • 总交易数是对交易 ID 的计数(希望您的数据集中也填写此字段).
  • Removed the MAX operation for the Property and added it to the group by.
  • pageviews was considered as the count of hits where hit.type = 'PAGE'. Not sure if this is the same for screenviews though.
  • bounce is computed when there's an entrance and exit event.
  • Total transactions is a count on transaction ids (hopefully this field is being filled in your dataset as well).

这篇关于使用 BigQuery 提取命中级别数据时,Google Analytics Metrics 被夸大了的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆