BigQuery streaming and partitions: when is _PARTITIONTIME really evaluated?
Problem description
_PARTITIONTIME represents the time (truncated to the day) when a row is inserted into BigQuery.
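However, a close look at the streaming mechanism (https://cloud.google.com/blog/products/gcp/life-of-a-bigquery-streaming-insert) shows three different "insertion times" when a row is inserted into BigQuery:
- the time when the row is received by the "streaming extract workers"
- the time when the row is stored in the "streaming buffer"
- the time when the row reaches the extracted state and the workers store it in the final (Capacitor) storage.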
Does anybody know which of these three moments corresponds to _PARTITIONTIME?
Recommended answer
While a row is still in the streaming buffer, its _PARTITIONTIME is NULL. After the row is extracted, its _PARTITIONTIME is the extraction time (truncated to the day). The one exception is a row streamed directly into a specific partition, for example "table$20180101": in that case _PARTITIONTIME is always "2018-01-01".
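The rule in this answer can be sketched as a small Python model. This is only an illustration of the three cases described above, not BigQuery's implementation: the function `partition_time` and its parameters `extracted_at` and `decorator` are hypothetical names, and truncating to the day in UTC is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def partition_time(
    extracted_at: Optional[datetime] = None,
    decorator: Optional[str] = None,
) -> Optional[datetime]:
    """Model the _PARTITIONTIME rule from the answer (hypothetical helper).

    extracted_at: timezone-aware time at which the row left the streaming
                  buffer, or None while the row is still buffered.
    decorator:    the YYYYMMDD suffix of "table$YYYYMMDD", or None if the
                  row was streamed to the table without a partition decorator.
    """
    if decorator is not None:
        # Row streamed directly into a partition: the decorator fixes the value.
        return datetime.strptime(decorator, "%Y%m%d").replace(tzinfo=timezone.utc)
    if extracted_at is None:
        # Row is still in the streaming buffer: _PARTITIONTIME is NULL.
        return None
    # Row has been extracted: extraction time truncated to the day (UTC assumed).
    return extracted_at.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )


if __name__ == "__main__":
    extracted = datetime(2018, 3, 5, 14, 30, tzinfo=timezone.utc)
    print(partition_time())                        # still in the streaming buffer
    print(partition_time(extracted_at=extracted))  # after extraction
    print(partition_time(decorator="20180101"))    # streamed to table$20180101
```

In practice this means that, on an ingestion-time partitioned table, a filter such as `WHERE _PARTITIONTIME IS NULL` should select the rows that are still in the streaming buffer.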