Can I upload bzip2 files to storage and then use them in BigQuery?
Question
I have a bunch of largish files (about 10 GB each) in bz2 format. I would like to upload them and then run some queries on them. Does BigQuery "understand" bzip2 the way it does gzip? Should I convert them? What would be the best way to upload them?
Recommended answer
I assume the files are in CSV or JSON format. Per the BigQuery documentation (https://cloud.google.com/bigquery/preparing-data-for-loading), only gzip compression is supported. But even if bz2 were supported, it wouldn't be a good idea to work with 10 GB compressed files. The problem is that, unlike uncompressed files, compressed files can't be split into pieces by BigQuery, so it has to process each entire 10 GB file as a single unit, which will be very slow.
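One way to act on this is to convert each .bz2 file into many small gzip files, so that no single compressed file is large. The following is a minimal sketch using only the Python standard library. It assumes one record per line (newline-delimited JSON, or CSV with no header row and no quoted fields containing newlines). The function name `bz2_to_gzip_chunks`, the `.partNNNNN.gz` naming scheme and the default of one million lines per chunk are hypothetical choices for illustration, not anything BigQuery requires.

```python
import bz2
import gzip
import os


def bz2_to_gzip_chunks(src_path, out_dir, lines_per_chunk=1_000_000):
    """Stream a .bz2 file and rewrite it as several smaller .gz files.

    Hypothetical helper. Assumes one record per line, so splitting on
    line boundaries never cuts a record in half. Returns the chunk paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(src_path)
    if base.endswith(".bz2"):
        base = base[: -len(".bz2")]

    chunk_paths = []
    out = None
    written = 0
    try:
        # Binary mode: bytes are copied through unchanged, no re-encoding.
        with bz2.open(src_path, "rb") as src:
            for line in src:
                # Open a new gzip chunk when none is open or the current one is full.
                if out is None or written >= lines_per_chunk:
                    if out is not None:
                        out.close()
                    path = os.path.join(
                        out_dir, f"{base}.part{len(chunk_paths):05d}.gz"
                    )
                    out = gzip.open(path, "wb")
                    chunk_paths.append(path)
                    written = 0
                out.write(line)
                written += 1
    finally:
        # Make sure the last chunk is flushed and closed even on error.
        if out is not None:
            out.close()
    return chunk_paths
```

The resulting .gz files could then be copied to Cloud Storage (for example with `gsutil cp`) and loaded together in one load job using a wildcard URI. If you prefer uncompressed files, which BigQuery can split, swapping `gzip.open(path, "wb")` for `open(path, "wb")` (and dropping the `.gz` suffix) is enough, at the cost of more storage and upload bandwidth.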