BigQuery streamed data is not in table

Problem description

I've got an ETL process which streams data from a Mongo cluster to BigQuery. It runs weekly via cron, and manually when needed. I have a separate dataset for each of our customers, with identical table structures across them.

I just ran the process, only to find that while all of my data chunks returned a "success" response ({"kind": "bigquery#tableDataInsertAllResponse"}) from the insertAll API, the table is empty for one specific dataset.
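For reference, a streaming insert of this kind looks roughly like the sketch below, using the google-cloud-bigquery Python client. The project, dataset, table, and row fields are hypothetical, since the original ETL code isn't shown; the point is that insertAll can acknowledge rows that are not yet visible to queries.

```python
from google.cloud import bigquery

# Hypothetical project/dataset/table names -- the real ETL process uses
# one dataset per customer, with identical table schemas across them.
client = bigquery.Client()
table_id = "my-project.customer_a.events"

rows = [
    {"customer_id": "a1", "event": "signup", "ts": "2015-09-01T12:00:00Z"},
    {"customer_id": "a2", "event": "login", "ts": "2015-09-01T12:05:00Z"},
]

# insert_rows_json wraps the tabledata.insertAll API; an empty error list
# means BigQuery accepted the rows, but they may sit in the streaming
# buffer for a while before they become visible to queries.
errors = client.insert_rows_json(table_id, rows)
if errors:
    raise RuntimeError(f"insertAll reported row errors: {errors}")
print("insertAll accepted all rows (kind: bigquery#tableDataInsertAllResponse)")
```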

I had seen this happen a few times before, but was never able to reproduce it. I've now run the process twice more with the same results. I know my code works, because the other datasets are populated correctly.

There's no 'streaming buffer' shown in the table details, and running a count(*) query returns 0. I've even tried disabling cached results for the query to force freshness - but nothing helps.

Edit - 10 minutes after my data stream started (I keep timestamped logs), partial data now appears in the table; however, after another 40 minutes, it doesn't look like any new data is flowing in.

Is anyone else experiencing hiccups with the streaming service?

It might be worth mentioning that part of my process is to copy the existing table to a backup table, delete the original table, and recreate it with the latest schema. Could this be affecting the inserts in some specific edge cases?
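The backup-and-recreate step described above might look something like the following sketch; the table names and schema are made up for illustration. The key detail is that the original table is deleted and a new table with the same name is created immediately afterwards.

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.customer_a.events"          # hypothetical
backup_id = "my-project.customer_a.events_backup"  # hypothetical

# 1. Copy the existing table to a backup table.
client.copy_table(table_id, backup_id).result()

# 2. Remove the original table.
client.delete_table(table_id)

# 3. Recreate it with the latest schema (the schema shown is illustrative).
schema = [
    bigquery.SchemaField("customer_id", "STRING"),
    bigquery.SchemaField("event", "STRING"),
    bigquery.SchemaField("ts", "TIMESTAMP"),
]
client.create_table(bigquery.Table(table_id, schema=schema))
```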

Recommended answer

This is probably what is happening to you: BigQuery table truncation before streaming not working

If you delete or create a table, you must wait at least 2 minutes before you start streaming data into it.
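As a simple mitigation, assuming the same hypothetical Python client flow as in the sketches above, you could pause after the table is recreated before issuing the first insertAll call. The two-minute figure comes from the answer above, not from a documented guarantee.

```python
import time

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.customer_a.events"  # hypothetical

# ... the table has just been deleted and recreated, as in the earlier sketch ...

# Give BigQuery time to propagate the new table's metadata before the
# first streaming insert; rows streamed too early can be acknowledged
# with a success response yet never show up in the new table.
time.sleep(120)

rows = [{"customer_id": "a1", "event": "signup", "ts": "2015-09-01T12:00:00Z"}]
errors = client.insert_rows_json(table_id, rows)
if errors:
    raise RuntimeError(f"insertAll reported row errors: {errors}")
```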

Since you mentioned that all the other tables work correctly, and only the table that goes through the delete-and-recreate process is not saving data, this probably explains what you are observing.

To fix this issue, you can either wait a bit longer before streaming data after the delete and create operations, or change the strategy you use to upload the data (for example, saving it to a CSV file and then using a load job to insert the data into the table).
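The batch-load alternative mentioned here could look roughly like the sketch below, again with hypothetical names and file paths. Load jobs write directly into the table rather than going through the streaming buffer, so the delete/recreate timing issue does not apply in the same way.

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.customer_a.events"  # hypothetical

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # assumes the CSV has a header row
    autodetect=True,      # or pass an explicit schema instead
)

# export.csv is whatever file the ETL step wrote out from Mongo.
with open("export.csv", "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)

job.result()  # waits for the load job to complete and raises on error
print(f"Loaded {job.output_rows} rows into {table_id}")
```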
