Internal error while loading to BigQuery table


Problem description

I ran this command to load 11 files into a BigQuery table:

bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part* /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt

I got this error:

Waiting on bqjob_r46f38146351d545_00000147ef890755_1 ... (11s) Current status: DONE
BigQuery error in load operation: Error processing job 'ardent-course-601:bqjob_r46f38146351d545_00000147ef890755_1': Too many errors encountered. Limit is: 0.
Failure details:
- File: 5: Unexpected. Please try again.

I tried many times after that and still got the same error.

To debug what went wrong, I instead loaded each file one by one into the BigQuery table. For example:

/usr/local/bin/bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part-m-00011.gz /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt

There are 11 files in total, and each one loaded fine.
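
For reference, the same one-file-at-a-time check can be scripted. The following is a minimal shell sketch, not part of the original question; it assumes the eleven files are numbered part-m-00001.gz through part-m-00011.gz (only part-m-00011.gz is confirmed above):

# Sketch: load each part file individually to isolate the one that fails.
# Assumption: files are numbered part-m-00001.gz .. part-m-00011.gz.
for i in $(seq -w 1 11); do
  echo "Loading part-m-000${i}.gz ..."
  /usr/local/bin/bq load --project_id=ardent-course-601 \
    --source_format=NEWLINE_DELIMITED_JSON \
    dw_test.rome_defaults_20140819_test \
    "gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part-m-000${i}.gz" \
    /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt
done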

Could someone please help? Is this a bug on BigQuery's side?

Thank you.

Solution

There was an error reading one of the files: gs://...part-m-00005.gz

Looking at the import logs, it appears that the gzip reader encountered an error decompressing the file.

It looks like that file may not actually be compressed. BigQuery samples the header of the first file in the list to determine whether it is dealing with compressed or uncompressed files and to determine the compression type. When you import all of the files at once, it only samples the first file.
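
As an editorial aside (not from the original answer): one quick way to test that theory is to read the first two bytes of the suspect object, since every genuine gzip stream starts with the magic bytes 1f 8b. The object path below is reconstructed from the commands in the question.

# Print the first two bytes of the suspect object in hex;
# a real gzip file begins with the magic bytes 1f 8b.
gsutil cat -r 0-1 gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part-m-00005.gz | xxd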

When you run the files individually, BigQuery reads the header of the file and determines that it isn't actually compressed (despite having the suffix '.gz'), so it imports it as a normal flat file.

If you run a load that doesn't mix compressed and uncompressed files, it should work successfully.
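
As a hedged sketch of one possible fix (assuming part-m-00005.gz really is the uncompressed object): re-compress that one file in place so everything matched by the part* wildcard is genuinely gzipped, then rerun the original load command.

# Download the mislabeled file, gzip it for real, and overwrite the object.
GCS_DIR=gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a
gsutil cp "${GCS_DIR}/part-m-00005.gz" ./part-m-00005
gzip part-m-00005    # produces a genuinely compressed part-m-00005.gz
gsutil cp part-m-00005.gz "${GCS_DIR}/part-m-00005.gz"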

Please let me know if you think this is not the case and I'll dig in some more.
