Cloud Dataflow to BigQuery - too many sources

Question

I have a job that, among other things, also inserts some of the data it reads from files into a BigQuery table for later manual analysis.

It fails with the following error:

job error: Too many sources provided: 10001. Limit is 10000., error: Too many sources provided: 10001. Limit is 10000.

What does it refer to as "source"? Is it a file or a pipeline step?

Thanks, G

Answer

I'm guessing the error is coming from BigQuery and means that we are trying to upload too many files when we create your output table.

Could you provide some more details on the error / context (like a snippet of the command-line output, if using the BlockingDataflowPipelineRunner) so I can confirm? A jobId would also be helpful.

Is there something about your pipeline structure that is going to result in a large number of output files? That could either be a large amount of data or perhaps finely sharded input files without a subsequent GroupByKey operation (which would let us reshard the data into larger pieces).
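
For illustration, here is a minimal sketch of that resharding idea. The question predates Apache Beam (the BlockingDataflowPipelineRunner belongs to the Dataflow 1.x SDK), so this uses the current Beam Java SDK's Reshuffle.viaRandomKey(), which implements the random-key GroupByKey described above; the bucket path, table name, and row format are all hypothetical:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.TypeDescriptor;

public class ReshardBeforeBigQueryWrite {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadManySmallFiles", TextIO.read().from("gs://my-bucket/input/*"))  // hypothetical path
     .apply("ToTableRow", MapElements
         .into(TypeDescriptor.of(TableRow.class))
         .via((String line) -> new TableRow().set("line", line)))                // hypothetical row format
     .setCoder(TableRowJsonCoder.of())
     // Collapse the finely sharded input into fewer, larger bundles so the
     // BigQuery sink produces fewer temporary files (i.e. fewer "sources").
     .apply("Reshard", Reshuffle.viaRandomKey())
     .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
         .to("my-project:my_dataset.my_table")                                   // hypothetical table
         .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
         .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run().waitUntilFinish();
  }
}

In batch mode the BigQuery sink writes roughly one temporary file per bundle and then issues a load job over all of them, so resharding before the write is what keeps the file count under the 10,000-source limit the error reports.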
