Flattening Google Analytics data (with repeated fields) not working anymore


Problem description

We have a premium Google Analytics account, which gives us access to row-level event data. This data is exported daily to Google BigQuery, and a new table is created in the dataset for each day.

Until a week ago we were able to export this Google Analytics data to CSV by flattening it into a temporary staging table and then exporting that table to CSV. The query we used looked like this:

SELECT * FROM 
    flatten([xxxxxxxx.ga_sessions_20140829],hits),
    flatten([xxxxxxxx.ga_sessions_20140828],hits),
    flatten([xxxxxxxx.ga_sessions_20140827],hits),
    flatten([xxxxxxxx.ga_sessions_20140826],hits)

Yesterday I noticed that this query now throws an error:

Cannot output multiple independently repeated fields at the same time. Found customDimensions_value and hits_product_productSKU

Apparently something has changed in the flatten() function, since hits_product_productSKU is a child of the hits field.

I also tried some old queries which were in the query history, but they're broken as well. There is no release note mentioning any change, so what is happening?

How can I export everything in the Google Analytics BigQuery export files again?

Solution

This is actually the result of a bugfix I submitted last week, and is preventing you from getting incorrect results.

BigQuery by default flattens all query results before returning them, but we only want to flatten one independently repeated field to avoid a cross-product expansion of data. The bug was that our checks for multiple repeated fields failed to take into account repeatedness of parent records in some cases, which caused us to fail to flatten some independently repeated fields. This meant that we could return flat rows where independently repeated values were actually "flattened" into dependently repeated values, instead of generating the cross-product, which is actually a wrong result.
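To make the difference between a cross-product and a wrongly "paired" result concrete, here is a small Python sketch. It is only a toy model: the field names come from the error message, the values are made up, and the positional pairing in `zipped_flatten` is an assumed stand-in for "treating independent values as if they depended on each other", not a description of BigQuery's internals.

```python
from itertools import product

# A toy session row with two independently repeated fields.
# Field names mirror the error message; the values are invented.
row = {
    "visitId": 1,
    "customDimensions_value": ["a", "b"],
    "hits_product_productSKU": ["sku1", "sku2", "sku3"],
}

def cross_product_flatten(row, repeated):
    """Emit one flat row per combination of the repeated values."""
    scalars = {k: v for k, v in row.items() if k not in repeated}
    combos = product(*(row[name] for name in repeated))
    return [dict(scalars, **dict(zip(repeated, combo))) for combo in combos]

def zipped_flatten(row, repeated):
    """Hypothetical wrong shape: pair values by position as if they were related."""
    scalars = {k: v for k, v in row.items() if k not in repeated}
    pairs = zip(*(row[name] for name in repeated))
    return [dict(scalars, **dict(zip(repeated, pair))) for pair in pairs]

repeated = ["customDimensions_value", "hits_product_productSKU"]
flat = cross_product_flatten(row, repeated)
paired = zipped_flatten(row, repeated)

print(len(flat))    # one row per combination of the two lists
print(len(paired))  # positional pairing stops at the shorter list
```

The cross-product grows multiplicatively with every additional independently repeated field, which is why only one such field is flattened implicitly.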

What you're seeing here is a result of the stricter check: your output schema still contains (at least) two independently repeated fields at the point where we attempt to flatten the results.

Another important thing to note is that the FLATTEN([table-value], [field]) function only flattens the repeatedness of the field you specify as the second argument. When you say flatten([xxxxxxxx.ga_sessions_20140829],hits), you are flattening only the "hits" record. If you also want to flatten its repeated children (product, promotion, etc.) you must explicitly add another flatten for those fields, like:

FLATTEN(FLATTEN([xxxxxxxx.ga_sessions_20140829],hits),hits.product)
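The effect of flattening one level at a time can be mimicked in plain Python. This is a rough model only: the `flatten` helper and the sample `session` record are hypothetical, and the helper is not meant to reproduce every detail of BigQuery's FLATTEN (for example, how empty repeated fields are handled).

```python
def flatten(rows, path):
    """Emit one row per element of the repeated field at the dotted `path`."""
    head, _, rest = path.partition(".")
    out = []
    for row in rows:
        if rest:
            # The parent was flattened earlier, so row[head] is a single record.
            for child in flatten([row[head]], rest):
                out.append({**row, head: child})
        else:
            for item in row[head]:
                out.append({**row, head: item})
    return out

# Made-up session: two hits, each with its own repeated "product" child.
session = {
    "visitId": 1,
    "hits": [
        {"hitNumber": 1, "product": [{"productSKU": "sku1"}, {"productSKU": "sku2"}]},
        {"hitNumber": 2, "product": [{"productSKU": "sku3"}]},
    ],
}

by_hit = flatten([session], "hits")           # hits.product is still a list here
by_product = flatten(by_hit, "hits.product")  # now one row per product

print(len(by_hit), len(by_product))
```

After the first call each row still carries a list under `hits.product`, which corresponds to the repeated child that the outer FLATTEN in the example above takes care of.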

--

You have a couple of options to make your example work:

1) Select fewer fields. If you only care about getting flattened output of a few fields, you can remove the independently repeated fields from your query results by explicitly selecting only the fields you care about.

2) Add more FLATTENs. You'll need to flatten on each repeated field, which looks to include at least hits, hits.product, and customDimensions. You may find that the error message will then complain about different repeated fields: add more FLATTENs on the repeated fields in your schema until it works.
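If you generate the daily export query from a script, both options can be sketched with a small query builder. The helper names `nested_flatten` and `build_query` are hypothetical, the column list in option 1 is just an illustration, and the list of repeated fields is the one guessed at above, so expect to adjust it to your own schema.

```python
def nested_flatten(table, fields):
    """Wrap `table` in one FLATTEN per repeated field, innermost first."""
    expr = "[{}]".format(table)
    for field in fields:
        expr = "FLATTEN({}, {})".format(expr, field)
    return expr

def build_query(tables, flatten_fields, select="*"):
    """Build query text that reads every table through the same FLATTENs."""
    sources = ",\n    ".join(nested_flatten(t, flatten_fields) for t in tables)
    return "SELECT {} FROM\n    {}".format(select, sources)

tables = ["xxxxxxxx.ga_sessions_20140829", "xxxxxxxx.ga_sessions_20140828"]

# Option 1: name only the columns you need (illustrative column list).
print(build_query(tables, ["hits"], select="visitId, hits.hitNumber"))

# Option 2: flatten every repeated field mentioned above.
print(build_query(tables, ["hits", "hits.product", "customDimensions"]))
```

The builder only produces query text; whether BigQuery accepts it still depends on which repeated fields remain in your output schema, so extend the field list whenever the error message names a new one.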
