Delete data from BigQuery while streaming from Dataflow
Question
Is it possible to delete data from a BigQuery table while loading data into it from an Apache Beam pipeline?
Our use case is such that we need to delete data older than 3 days from the table, based on a timestamp field (the time when Dataflow pulls the message from the Pub/Sub topic).
Is it recommended to do something like this? If yes, is there any way to achieve it?
Thanks.
Answer
I think the best way to do this is to set up your table as a partitioned table (based on ingestion time): https://cloud.google.com/bigquery/docs/partitioned-tables. You can then drop old partitions manually:
bq rm 'mydataset.mytable$20160301'
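The same can be done from the google-cloud-bigquery Python client. A minimal sketch, assuming hypothetical project/dataset/table names and a two-column schema:

from google.cloud import bigquery

client = bigquery.Client()

# Create an ingestion-time partitioned table with daily partitions.
schema = [
    bigquery.SchemaField("payload", "STRING"),
    bigquery.SchemaField("event_ts", "TIMESTAMP"),
]
table = bigquery.Table("my-project.mydataset.mytable", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY)
client.create_table(table)

# Drop one day's partition via the $YYYYMMDD decorator, equivalent
# to the bq rm command above.
client.delete_table("my-project.mydataset.mytable$20160301")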
You can also set a partition expiration time:
bq update --time_partitioning_expiration [INTEGER] [PROJECT_ID]:[DATASET].[TABLE]
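If you prefer to do this from code rather than the CLI, here is a sketch using the same Python client; the 3-day retention matches the use case in the question, and the table name is again hypothetical:

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.mydataset.mytable")

# Partitions older than 3 days are deleted automatically by BigQuery.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    expiration_ms=3 * 24 * 60 * 60 * 1000,
)
client.update_table(table, ["time_partitioning"])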
If ingestion time does not work for you, you can look into column-based partitioning: https://cloud.google.com/bigquery/docs/creating-column-partitions. It is still in beta, but it works reliably; it is your call.
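If you do go with column-based partitioning, Beam's Python SDK can create the partitioned table for you at write time through WriteToBigQuery's additional_bq_parameters argument. This is a sketch, not a definitive implementation: the topic/table names, the event_ts column, and the 3-day expiration are assumptions drawn from the question.

import time
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/my-topic")
        | "ToRow" >> beam.Map(lambda msg: {
            "payload": msg.decode("utf-8"),
            "event_ts": time.time(),  # time the message was pulled
        })
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:mydataset.mytable",
            schema="payload:STRING,event_ts:TIMESTAMP",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            additional_bq_parameters={
                "timePartitioning": {
                    "type": "DAY",
                    "field": "event_ts",
                    # 3 days, as an int64 string per the BigQuery REST API
                    "expirationMs": str(3 * 24 * 60 * 60 * 1000),
                }
            },
        )
    )

With this setup BigQuery drops expired partitions on its own, so the pipeline never has to issue deletes while it is streaming inserts.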