How to query for data in streaming buffer ONLY in BigQuery?
Problem description
We have a table partitioned by day in BigQuery, which is updated by streaming inserts.
The doc says that: "when streaming to a partitioned table, data in the streaming buffer has a NULL value for the _PARTITIONTIME pseudo column"
But if I query for select count(*) from table where _PARTITIONTIME is NULL, it always returns 0, even though bq show tells me that there are a lot of rows in the streaming buffer.
Does this mean that the pseudo column is not present at all for rows in streaming buffer? In any case, how can I query for the data ONLY in the streaming buffer without it becoming a full table scan?
Thanks in advance.
Recommended answer
Data in the streaming buffer has a NULL value for the _PARTITIONTIME column.
SELECT
fields
FROM
`dataset.partitioned_table_name`
WHERE
_PARTITIONTIME IS NULL
https://cloud.google.com/bigquery/docs/partitioned-tables#copying_to_partitioned_tables
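For anyone who wants to run the same filter from a script, here is a minimal Python sketch. It assumes the table is ingestion-time partitioned and that the google-cloud-bigquery client library is installed with default credentials configured. The helper names (build_streaming_buffer_query, run_query) and the field names in the usage note are hypothetical and not part of the answer above.

```python
from typing import Sequence


def build_streaming_buffer_query(table: str, fields: Sequence[str]) -> str:
    """Build a standard SQL query that keeps only rows whose _PARTITIONTIME is NULL.

    `table` is a "dataset.table" name and `fields` are the columns to select;
    both are placeholders supplied by the caller.
    """
    columns = ", ".join(fields)
    return (
        f"SELECT {columns}\n"
        f"FROM `{table}`\n"
        "WHERE _PARTITIONTIME IS NULL"
    )


def run_query(sql: str):
    """Run the query and return the rows as a list.

    The client library is imported here so that building the query string
    does not require it to be installed.
    """
    from google.cloud import bigquery  # assumption: google-cloud-bigquery is installed

    client = bigquery.Client()  # assumption: application default credentials are set up
    return list(client.query(sql).result())
```

Calling build_streaming_buffer_query("dataset.partitioned_table_name", ["field_a", "field_b"]) produces the same query shape as the answer, and passing the result to run_query executes it against the project configured for the client.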