BigQuery过滤标准sql中的记录 [英] BigQuery filtering records in standard sql

查看:292
本文介绍了BigQuery过滤标准sql中的记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在计算在我们的主页上提交邮编的所有访问者。我在传统SQL中提出了以下查询:

pre $ SELECT $ $ $ b $ TABLE_DATE_RANGE([ga_sessions_],TIMESTAMP ('2017-01-29'),CURRENT_TIMESTAMP())
where hits.page.pagePath ='/ broadband /'
and visitStartTime> 1483228800
和hits.type ='EVENT'
和hits.eventInfo.eventCategory ='Homepage'
和hits.eventInfo.eventAction ='提交邮政编码';

然后,我想将它转换为标准SQL以在CTE中使用,虽然看起来不错。

  SELECT fullVisitorId,visitStartTime 
从``ga_sessions_ *``,UNNEST(hits)h
其中

_TABLE_SUFFIX> '2017-01-29'
AND h.page.pagePath ='/ broadband /'
and visitStartTime> 1483228800
和h.type ='EVENT'
和h.eventInfo.eventCategory ='Homepage'
和h.eventInfo.eventAction ='提交邮政编码';

第一个处理327 MB并返回4117个结果,第二个处理6.98 GB并返回60745个结果。



我看了迁移指南,但它对我没有任何帮助。

ga_sessions包含标准模式的GA导入到Bigquery中。

解决方案

看起来不同之处在于标准SQL当 CROSS JOIN UNNEST(hits)放在 FROM <时, hits / code>子句,因此在结果中添加更多行。更多等价的查询将会是:

pre $#$标准SQL
SELECT fullVisitorId,visitStartTime $ b $``从`ga_sessions_ *`$
其中
_TABLE_SUFFIX> '20170129'
and visitStartTime> 1483228800
和EXISTS(
SELECT 1 FROM UNNEST(hits)h
WHERE h.type ='EVENT'
and h.page.pagePath ='/ broadband /'
和h.eventInfo.eventCategory ='主页'
和h.eventInfo.eventAction ='提交邮编');


I'm working on counting all visitors that submitted postcode on our homepage. I came up with following query in legacy SQL:

SELECT fullVisitorId, visitStartTime
FROM TABLE_DATE_RANGE([ga_sessions_], TIMESTAMP('2017-01-29'), CURRENT_TIMESTAMP())
where hits.page.pagePath = '/broadband/'
and visitStartTime > 1483228800
and hits.type   = 'EVENT'
and hits.eventInfo.eventCategory = 'Homepage'
and hits.eventInfo.eventAction = 'Submit Postcode';

I then wanted to convert it to standard SQL to use within CTE and came up with this one that doesn't seem right though.

SELECT fullVisitorId, visitStartTime
FROM ``ga_sessions_*``, UNNEST(hits) as h
where 

_TABLE_SUFFIX > '2017-01-29'
AND h.page.pagePath = '/broadband/'
and visitStartTime > 1483228800
and h.type  = 'EVENT'
and h.eventInfo.eventCategory = 'Homepage'
and h.eventInfo.eventAction = 'Submit Postcode';

The first one processes 327 MB and returns 4117 results, the second one processes 6.98 GB and returns 60745 results.

I've looked at the migration guide, but it didn't prove very helpful for me.

ga_sessions has standard schema of GA import into Bigquery.

解决方案

It looks like difference is coming from the fact that with Standard SQL you are flattening the table on hits when you CROSS JOIN UNNEST(hits) in the FROM clause, and therefore adding more rows to the result. More equivalent query would be:

#standardSQL
SELECT fullVisitorId, visitStartTime
FROM `ga_sessions_*`
where 
_TABLE_SUFFIX > '20170129'   
and visitStartTime > 1483228800
and EXISTS(
  SELECT 1 FROM UNNEST(hits) h 
  WHERE h.type  = 'EVENT'
    and h.page.pagePath = '/broadband/'
    and h.eventInfo.eventCategory = 'Homepage'
    and h.eventInfo.eventAction = 'Submit Postcode');

这篇关于BigQuery过滤标准sql中的记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆