Flattening Google Analytics data (with repeated fields) not working anymore

Question

We have a premium Google Analytics account, which gives us access to row-level event data. This data is exported daily to Google BigQuery, and a new table is created in the dataset for each day.

Until a week ago we were able to export this Google Analytics data to CSV by flattening it into a temporary staging table and then exporting that table to CSV. The query we used looked like this:

SELECT * FROM 
    flatten([xxxxxxxx.ga_sessions_20140829],hits),
    flatten([xxxxxxxx.ga_sessions_20140828],hits),
    flatten([xxxxxxxx.ga_sessions_20140827],hits),
    flatten([xxxxxxxx.ga_sessions_20140826],hits)

Yesterday I noticed that this query now throws an error:

Cannot output multiple independently repeated fields at the same time. Found customDimensions_value and hits_product_productSKU

Apparently something has changed in the flatten() function, as hits_product_productSKU is a child of the hits field.

I also tried some old queries from the query history, but they are broken as well. No release note mentions any change, so what is happening?

How can I export everything in the Google Analytics BigQuery export files again?

Recommended answer

This is actually the result of a bugfix I submitted last week, and it is preventing you from getting incorrect results.

BigQuery by default flattens all query results before returning them, but we only want to flatten one independently repeated field to avoid a cross-product expansion of data. The bug was that our checks for multiple repeated fields failed to take into account repeatedness of parent records in some cases, which caused us to fail to flatten some independently repeated fields. This meant that we could return flat rows where independently repeated values were actually "flattened" into dependently repeated values, instead of generating the cross-product, which is actually a wrong result.
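To make the difference between the two outcomes concrete, here is a small Python sketch. It is only an illustration of the idea, not BigQuery's actual implementation: the record layout is loosely modeled on the Google Analytics export, the values are made up, and the function names are hypothetical. The "zipped" variant is an assumed stand-in for what treating independent fields as dependent could look like.

```python
from itertools import product

# Hypothetical session record with two independently repeated fields
# (customDimensions and hits); hits.product depends on its parent hit.
session = {
    "visitId": 1,
    "customDimensions": [{"value": "a"}, {"value": "b"}],
    "hits": [
        {"hitNumber": 1, "product": [{"productSKU": "sku1"}, {"productSKU": "sku2"}]},
        {"hitNumber": 2, "product": [{"productSKU": "sku3"}]},
    ],
}


def hit_product_pairs(record):
    """Flatten hits and their child products into (hitNumber, productSKU) pairs."""
    return [
        (hit["hitNumber"], prod["productSKU"])
        for hit in record["hits"]
        for prod in hit["product"]
    ]


def flatten_cross_product(record):
    """Pair every customDimensions value with every (hit, product) pair."""
    return [
        (record["visitId"], dim["value"], hit_number, sku)
        for dim, (hit_number, sku) in product(
            record["customDimensions"], hit_product_pairs(record)
        )
    ]


def flatten_zipped(record):
    """Pair the i-th customDimensions value with the i-th (hit, product) pair,
    as if the two fields were dependently repeated. This drops combinations."""
    return [
        (record["visitId"], dim["value"], hit_number, sku)
        for dim, (hit_number, sku) in zip(
            record["customDimensions"], hit_product_pairs(record)
        )
    ]


print(len(flatten_cross_product(session)))  # 2 dimensions x 3 pairs = 6 rows
print(len(flatten_zipped(session)))         # only 2 rows
```

The cross-product grows multiplicatively with each independently repeated field, which is why the output is limited to one such field unless you ask for more explicitly.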

What you're seeing here is a result of the stricter check: you have (at least) two repeated fields in your output schema before we attempt to flatten the results.

Another important thing to note is that the FLATTEN([table-value], [field]) function only flattens the repeatedness of the field you specify as the second argument. When you say flatten([xxxxxxxx.ga_sessions_20140829],hits), you are flattening only the "hits" record. If you also want to flatten its repeated children (product, promotion, etc.) you must explicitly add another flatten for those fields, like:

FLATTEN(FLATTEN([xxxxxxxx.ga_sessions_20140829],hits),hits.product)
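If you generate these queries from a script, the nesting can be built mechanically. The helper below is a minimal sketch: `nested_flatten` is a hypothetical name, and it only assembles the expression as a string in the same shape as the example above.

```python
def nested_flatten(table, fields):
    """Wrap `table` in one legacy-SQL FLATTEN call per repeated field.

    `fields` should list parents before children (hits before hits.product),
    so that the innermost FLATTEN handles the outermost repeated record.
    """
    expr = "[{}]".format(table)
    for field in fields:
        expr = "FLATTEN({}, {})".format(expr, field)
    return expr


print(nested_flatten("xxxxxxxx.ga_sessions_20140829", ["hits", "hits.product"]))
# FLATTEN(FLATTEN([xxxxxxxx.ga_sessions_20140829], hits), hits.product)
```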

--

You have a couple of options to make your example work:

1) Select fewer fields. If you only care about getting flattened output of a few fields, you can remove the independently repeated fields from your query results by explicitly selecting only the fields you care about.

2) Add more FLATTENs. You'll need to flatten on each repeated field, which looks to include at least hits, hits.product, and customDimensions. You may find that the error message will then complain about different repeated fields: add more FLATTENs on the repeated fields in your schema until it works.
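Both options can be combined when the export query is generated by a script. The following Python sketch assumes the daily tables follow the `ga_sessions_YYYYMMDD` naming from the question; `build_export_query` is a hypothetical helper, and the selected columns (`fullVisitorId`, `hits.hitNumber`, `hits.page.pagePath`) are only example fields that you would replace with the ones you need.

```python
def build_export_query(dataset, dates, select_fields, flatten_fields):
    """Build a legacy-SQL query over several daily tables.

    select_fields: explicit column list (option 1: select fewer fields).
    flatten_fields: repeated fields to flatten, parents first (option 2).
    """
    sources = []
    for date in dates:
        expr = "[{}.ga_sessions_{}]".format(dataset, date)
        for field in flatten_fields:
            expr = "FLATTEN({}, {})".format(expr, field)
        sources.append(expr)
    # In legacy SQL a comma between tables in FROM unions them,
    # which is how the original query combines the daily tables.
    return "SELECT\n    {}\nFROM\n    {}".format(
        ",\n    ".join(select_fields), ",\n    ".join(sources)
    )


query = build_export_query(
    "xxxxxxxx",
    ["20140829", "20140828"],
    ["fullVisitorId", "hits.hitNumber", "hits.page.pagePath"],
    ["hits"],
)
print(query)
```

Because the selected columns here all sit under `hits` or at the top level, a single FLATTEN on `hits` is enough. If you add columns from `hits.product` or `customDimensions`, extend `flatten_fields` accordingly.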
