Delete data from BigQuery while streaming from Dataflow


Problem Description

Is it possible to delete data from a BigQuery table while loading data into it from an Apache Beam pipeline?

Our use case is such that we need to delete data older than 3 days from the table, on the basis of a timestamp field (the time when Dataflow pulls the message from the Pub/Sub topic).

Is it recommended to do something like this? If yes, is there any way to achieve this?

Thanks.

Recommended Answer

I think the best way of doing this is to set up your table as a partitioned table (based on ingestion time): https://cloud.google.com/bigquery/docs/partitioned-tables. You can then drop old partitions manually:

bq rm 'mydataset.mytable$20160301'

You can also set a partition expiration time:

bq update --time_partitioning_expiration [INTEGER] [PROJECT_ID]:[DATASET].[TABLE]
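
For example, a minimal sketch of the 3-day expiration from the question (the project, dataset, and table names here are hypothetical placeholders; the flag value is in seconds):

# 3 days = 3 * 24 * 3600 = 259200 seconds
bq update --time_partitioning_expiration 259200 myproject:mydataset.mytable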

If ingestion time does not work for you, you can look into column-based partitioning: https://cloud.google.com/bigquery/docs/creating-column-partitions. It is in beta, but it works reliably; it is your call.
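
If you go that route, here is a minimal sketch of creating a column-partitioned table with the bq CLI, assuming a hypothetical schema whose ts TIMESTAMP column holds the Pub/Sub pull time (all names are placeholders):

# Partition on the ts column and expire partitions after 3 days (259200 seconds)
bq mk --table \
  --schema 'ts:TIMESTAMP,payload:STRING' \
  --time_partitioning_field ts \
  --time_partitioning_expiration 259200 \
  mydataset.mytable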

