BigQuery not handling millisecond timestamps in a partition column
Problem description
I have a Unix timestamp column in my CSV file, expressed in milliseconds. When I load this data into my BigQuery table and query it, I get an error.
Now I would like to use this column as the partition column. I have a few questions:
1) Even if I save it as INT64, how can I create a partition column on this field?
2) I would like to avoid duplicate tables.
Recommended answer
If your timestamp data is stored as a raw millisecond count, you won't be able to create the partitioned table properly. Instead, partition on a "TIMESTAMP or DATE column", as stated by @TimBiegeleisen. The TIMESTAMP type has microsecond precision, so no millisecond detail is lost in the conversion. Once the column holds values that load as TIMESTAMP, you can create the partitioned table with something like the following:
bq load --schema <your-timestamp-column>:TIMESTAMP,<some-other-column>:FLOAT --skip_leading_rows=1 --source_format=CSV --time_partitioning_field=<your-timestamp-column> <your-dataset>.<your-table> <your-csv-file>
(Use --skip_leading_rows=1 if the CSV file has a header row with the column names.)
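One way to get the millisecond column into a form that loads as TIMESTAMP is to rewrite the CSV before running bq load. The following is a minimal sketch, not the answerer's method: the column name event_ts and the helper names are hypothetical. It assumes BigQuery accepts "YYYY-MM-DD HH:MM:SS.ffffff" strings for TIMESTAMP columns in CSV loads and treats values without a time zone as UTC.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_timestamp(ms: int) -> str:
    """Format Unix epoch milliseconds as a UTC 'YYYY-MM-DD HH:MM:SS.ffffff' string."""
    # Integer timedelta arithmetic keeps the conversion exact (no float rounding).
    moment = EPOCH + timedelta(milliseconds=ms)
    return moment.strftime("%Y-%m-%d %H:%M:%S.%f")


def convert_csv(src, dst, column: str) -> None:
    """Copy a CSV with a header row from src to dst, converting one millisecond column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = millis_to_timestamp(int(row[column]))
        writer.writerow(row)


# Small in-memory demonstration; "event_ts" is a hypothetical column name.
source = io.StringIO("event_ts,value\n1500000000123,1.5\n")
target = io.StringIO()
convert_csv(source, target, "event_ts")
print(target.getvalue())
```

For a real file, open the input and output with open(path, newline="") and pass the file objects to convert_csv, then point bq load at the converted file. The header row is preserved, so --skip_leading_rows=1 still applies.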
Query your table using Standard SQL, not legacy SQL. As the official documentation states:
You cannot use legacy SQL to query partitioned tables or to write query results to partitioned tables.