BigQuery returned inconsistent (= different) results for exactly the same SELECT query except for execution time


Problem description


The query is SELECT COUNT(*) FROM [table.click] WHERE time >= DATE_ADD('2015-03-25 06:00:00', -9, 'HOUR') AND time < DATE_ADD('2015-03-25 07:00:00', -9, 'HOUR')

I wanted to fetch the records whose time falls between 06:00 and 07:00 JST.
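The `-9` hour shift in the query can be checked by hand. The following is a minimal sketch, assuming the `time` column is stored in UTC (the question does not say so explicitly; it is inferred from the `DATE_ADD(..., -9, 'HOUR')` calls and the JST times quoted later). The helper name `jst_to_utc` is hypothetical and only illustrates the arithmetic.

```python
from datetime import datetime, timedelta

# JST is UTC+9, so subtracting 9 hours turns a JST wall-clock time into UTC.
JST_OFFSET = timedelta(hours=9)


def jst_to_utc(local: str) -> datetime:
    """Shift a 'YYYY-MM-DD HH:MM:SS' JST string to a naive UTC datetime."""
    return datetime.strptime(local, "%Y-%m-%d %H:%M:%S") - JST_OFFSET


# Same bounds as the two DATE_ADD(..., -9, 'HOUR') expressions in the query.
start_utc = jst_to_utc("2015-03-25 06:00:00")
end_utc = jst_to_utc("2015-03-25 07:00:00")
print(start_utc, "<= time <", end_utc)
```

Under that assumption, the query counts rows whose `time` lies in the half-open UTC window from 2015-03-24 21:00:00 to 2015-03-24 22:00:00.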

[07:25:57] $ bq query "SELECT COUNT(*) FROM [table.click] WHERE time >= DATE_ADD('2015-03-25 06:00:00', -9, 'HOUR') AND time < DATE_ADD('2015-03-25 07:00:00', -9, 'HOUR')"
Waiting on bqjob_r5e92dc9eca9622ed_0000014c4de50d59_1 ... (0s) Current status: DONE
+-----+
| f0_ |
+-----+
|   0 |
+-----+

But my data import process has been running for a long time, and I confirmed that it has no problem (= the streaming import between 06:00 and 07:00 worked well).

About 10 minutes later, I executed exactly the same query.

[07:35:15]$ bq query "SELECT COUNT(*) FROM [table.click] WHERE time >= DATE_ADD('2015-03-25 06:00:00', -9, 'HOUR') AND time < DATE_ADD('2015-03-25 07:00:00', -9, 'HOUR')"
Waiting on bqjob_r338acff11f068b44_0000014c4ded45c5_1 ... (2s) Current status: DONE    
+------+
| f0_  |
+------+
| 1954 |
+------+

It worked this time! Since then, BigQuery has kept returning 1954 records, as shown below.

[10:49:59]$ bq query "SELECT COUNT(*) FROM [table.click] WHERE time >= DATE_ADD('2015-03-25 06:00:00', -9, 'HOUR') AND time < DATE_ADD('2015-03-25 07:00:00', -9, 'HOUR')"
Waiting on bqjob_r5693edc7523c1ca2_0000014c4e9f4f52_1 ... (0s) Current status: DONE    
+------+
| f0_  |
+------+
| 1954 |
+------+

"Google BigQuery same queries give different results" is a similar question, but the BigQuery team responded there that the issue had been resolved, so I decided to post this as a new question.

This problem happened twice. The first time was for the window 2015-03-24 22:00:00 JST ~ 2015-03-24 23:00:00 JST, and the second time was for 2015-03-25 06:00:00 JST ~ 2015-03-25 07:00:00 JST.

I attached a screenshot of the Google Cloud Status page taken when this happened; it showed no BigQuery system trouble globally: https://i.stack.imgur.com/qLqOB.png

Solution

As Pentium10 pointed out, it seems that you were hit by the fact that there is a delay in streaming data.

It seems your first query ran before the streaming inserts had propagated everywhere, which is normal. This does not look like a stale-data problem; as Pentium10 pointed out, it is more a matter of waiting out the delay after streaming data. Ten minutes does seem a bit long, but I don't see where the actual problem is.
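If a downstream job depends on the count for a just-finished window, one possible way to cope with the delay is to re-run the query until the result stops changing. The sketch below is not from the answer and does not call BigQuery itself: `fetch_count` is a hypothetical callable that the caller supplies (for example, a wrapper around the `bq query` command shown in the question), and the default interval and read counts are arbitrary assumptions.

```python
import time
from typing import Callable, Optional


def wait_for_stable_count(
    fetch_count: Callable[[], int],      # hypothetical: runs the COUNT(*) query
    stable_reads: int = 3,               # identical consecutive reads required
    interval_s: float = 60.0,            # pause between reads (assumed value)
    max_reads: int = 30,                 # give up after this many reads
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the same count is seen `stable_reads` times in a row."""
    last: Optional[int] = None
    streak = 0
    for _ in range(max_reads):
        current = fetch_count()
        if current == last:
            streak += 1
        else:
            # The value changed (or this is the first read): restart the streak.
            last, streak = current, 1
        if streak >= stable_reads:
            return current
        sleep(interval_s)
    raise TimeoutError("count did not stabilise within max_reads reads")
```

A stable value only shows that the result has stopped changing, not that every streamed row is visible. In the question the count stayed at 0 for a while before jumping to 1954, so the interval needs to be long relative to the delay you expect.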
