BigQuery not dealing with timestamp in milliseconds with partition column
Problem description
I have a Unix timestamp column that is expressed in milliseconds in my CSV file. When I load this data into my BigQuery table and query it, I get an error.
Now I would like to make this column the partitioning column. I have a few questions: 1) Even if I save it as INT64, how can I partition the table on this field? 2) I would like to avoid duplicate tables.
Recommended answer
If your timestamp data is represented in milliseconds, you won't be able to properly create the partitioned table. Instead, you should use a "TIMESTAMP or DATE column", as stated by @TimBiegeleisen. The TIMESTAMP type has microsecond precision. Once your column holds valid TIMESTAMP values, you can use something like the following to create the partitioned table:
bq load --schema <your-timestamp-column>:TIMESTAMP,<some-other-column>:FLOAT --skip_leading_rows=1 --source_format=CSV --time_partitioning_field=<your-timestamp-column> <your-dataset>.<your-table> <your-csv-file>
(Use --skip_leading_rows=1 if your CSV file has a header row with the column names.)
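The conversion step mentioned above could be done before loading, for instance with a small Python script. This is only a sketch under stated assumptions: the column name event_time and the helper names are hypothetical, the millisecond values are assumed to be integers counted from the Unix epoch in UTC, and the output uses the YYYY-MM-DD HH:MM:SS.ffffff text form that a TIMESTAMP column is expected to accept in a CSV load.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_bq_timestamp(millis: int) -> str:
    """Format Unix epoch milliseconds as 'YYYY-MM-DD HH:MM:SS.ffffff' (UTC)."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.strftime("%Y-%m-%d %H:%M:%S.%f")


def convert_csv(src, dst, column: str) -> None:
    """Copy a CSV with a header row from src to dst, rewriting one column.

    `column` is the (hypothetical) name of the millisecond timestamp column.
    src and dst are open text file objects.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = millis_to_bq_timestamp(int(row[column]))
        writer.writerow(row)
```

For example, calling convert_csv(open("in.csv", newline=""), open("out.csv", "w", newline=""), "event_time") would write a copy of in.csv whose event_time values are timestamp strings, and out.csv could then be passed to the bq load command in place of the original file. If the data is already loaded as INT64, converting inside BigQuery with the Standard SQL function TIMESTAMP_MILLIS is another option.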
Query your table using Standard SQL, not Legacy SQL. As the official documentation states:
You cannot use legacy SQL to query partitioned tables or to write query results to partitioned tables.