Exporting a large file from BigQuery to Google Cloud Storage using a wildcard


Question

I have an 8 GB table in BigQuery that I'm trying to export to Google Cloud Storage (GCS). If I specify the URI as is, without a wildcard, I get an error:

Errors:
Table gs://***.large_file.json too large to be exported to a single file. Specify a uri including a * to shard export. See 'Exporting data into one or more files' in https://cloud.google.com/bigquery/docs/exporting-data. (error code: invalid)

Okay, so I put a * in the file name, but the export produces only 2 files: one of 7.13 GB and one of ~150 MB.

UPD. I thought I would get about 8 files of 1 GB each. Am I wrong, or am I doing something wrong?

P.S. I tried this in the web UI as well as with the Java library.

Recommended answer

For tables above a certain size (the documented limit for a single-file export is 1 GB), BigQuery exports to multiple GCS files, which is why it asks for the "*" glob.
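To make the effect of the glob concrete: BigQuery replaces the "*" with a 12-digit, zero-padded shard number starting at 000000000000. The Python sketch below only reproduces that naming scheme. The bucket name and the helper `shard_uris` are made up for illustration, and the shard count is an input here because BigQuery, not the caller, decides how many shards to write and how large each one is.

```python
from typing import List


def shard_uris(pattern: str, shard_count: int) -> List[str]:
    """Expand a single-wildcard export URI into the expected shard object names."""
    if pattern.count("*") != 1:
        raise ValueError("expected exactly one '*' in the URI pattern")
    prefix, suffix = pattern.split("*")
    # Shard numbers are zero-padded to 12 digits and start at 0.
    return [f"{prefix}{i:012d}{suffix}" for i in range(shard_count)]


if __name__ == "__main__":
    # Hypothetical bucket; 2 shards matches what the question observed.
    for uri in shard_uris("gs://my-bucket/large_file-*.json", 2):
        print(uri)
```

This also explains the two-file result in the question: the wildcard permits sharding, but it does not promise evenly sized 1 GB pieces.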

Once you have multiple files in GCS, you can join them into one with the compose operation:

gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
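One caveat if an export yields many shards: a single compose request accepts at most 32 source objects, so a larger set has to be merged in several rounds. The following Python sketch only builds the `gsutil compose` command lines and does not run them. The helper name `compose_commands` and the object names are hypothetical, and it relies on the destination being allowed as one of the sources (the append pattern), so each later round has room for 31 new shards.

```python
from typing import List

MAX_SOURCES = 32  # compose accepts at most 32 source objects per request


def compose_commands(sources: List[str], destination: str) -> List[List[str]]:
    """Plan gsutil compose invocations that merge all sources, in order, into destination."""
    if not sources:
        raise ValueError("need at least one source object")
    commands = []
    # First round: up to 32 shards go straight into the destination.
    first, rest = sources[:MAX_SOURCES], sources[MAX_SOURCES:]
    commands.append(["gsutil", "compose", *first, destination])
    # Later rounds: the destination itself is the first source,
    # which leaves room for 31 additional shards per request.
    step = MAX_SOURCES - 1
    for start in range(0, len(rest), step):
        batch = rest[start:start + step]
        commands.append(["gsutil", "compose", destination, *batch, destination])
    return commands


if __name__ == "__main__":
    # Hypothetical shard names following the 12-digit export numbering.
    shards = [f"gs://my-bucket/large_file-{i:012d}.json" for i in range(70)]
    for cmd in compose_commands(shards, "gs://my-bucket/large_file.json"):
        print(" ".join(cmd))
```

Each printed line could be passed to `subprocess.run` once the plan looks right. For the two shards in the question, the plan is a single command equivalent to the one above.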
