Exporting large file from BigQuery to Google cloud using wildcard
Question
I have an 8 GB table in BigQuery that I'm trying to export to Google Cloud Storage (GCS). If I specify the URL as-is, I get an error:
Errors:
Table gs://***.large_file.json too large to be exported to a single file. Specify a uri including a * to shard export. See 'Exporting data into one or more files' in https://cloud.google.com/bigquery/docs/exporting-data. (error code: invalid)
Okay... I'm specifying * in the file name, but it exports into 2 files: one 7.13 GB and one ~150 MB.
UPD. I thought I should get about 8 files of 1 GB each? Am I wrong, or what am I doing wrong?
P.S. I tried this in the WebUI as well as with the Java library.
Answer
For tables of a certain size or larger, BigQuery exports to multiple GCS files - that's why it asks for the "*" glob.
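When BigQuery shards an export, it replaces the `*` in the destination URI with a 12-digit, zero-padded shard counter (e.g. `large_file_000000000000.json`). Note that the shard sizes are chosen by the service and are not guaranteed to be roughly equal, which is why an 8 GB table can come out as one 7.13 GB file plus a small remainder rather than eight 1 GB files. A minimal sketch of the naming scheme (the bucket and file names here are hypothetical):

```python
# Sketch: how BigQuery turns a wildcard destination URI into shard file names.
# Bucket/table names are hypothetical. The CLI equivalent of the export itself
# would be along the lines of:
#   bq extract --destination_format NEWLINE_DELIMITED_JSON \
#       'mydataset.large_file' 'gs://my-bucket/large_file_*.json'

def shard_uri(pattern: str, index: int) -> str:
    """Replace '*' with BigQuery's 12-digit, zero-padded shard number."""
    return pattern.replace("*", f"{index:012d}")

for i in range(3):
    print(shard_uri("gs://my-bucket/large_file_*.json", i))
# gs://my-bucket/large_file_000000000000.json
# gs://my-bucket/large_file_000000000001.json
# gs://my-bucket/large_file_000000000002.json
```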
Once you have multiple files in GCS, you can join them into one with the compose operation:
gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
- https://cloud.google.com/storage/docs/gsutil/commands/compose
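One caveat: a single compose request accepts at most 32 source objects, so exports with many shards need to be composed in batches (composing the intermediate results in a second pass). Concatenation is safe here because the export format is newline-delimited JSON. A small sketch of the batch planning, with hypothetical object names:

```python
# Sketch: split shard object names into compose-sized batches
# (GCS compose accepts at most 32 source objects per call).

def compose_batches(shards: list, limit: int = 32) -> list:
    """Group shard names into batches small enough for one compose call each."""
    return [shards[i:i + limit] for i in range(0, len(shards), limit)]

shards = [f"gs://my-bucket/large_file_{i:012d}.json" for i in range(50)]
batches = compose_batches(shards)
print(len(batches))          # 2
print(len(batches[0]))       # 32
print(len(batches[1]))       # 18
# Each batch would then be passed to one `gsutil compose ... gs://my-bucket/part_N.json`
# call, and the parts composed once more into the final object.
```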