Apache Beam with Dataflow - NullPointerException when reading from BigQuery


Question


I am running a job on Google Dataflow, written with Apache Beam, that reads from a BigQuery table and from files, transforms the data, and writes it to other BigQuery tables. The job usually succeeds, but sometimes it randomly fails with a NullPointerException when reading from the BigQuery table:

(288abb7678892196): java.lang.NullPointerException
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:98)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:261)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:209)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:184)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:161)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:47)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:341)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.doWork(DataflowWorker.java:297)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:244)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:125)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:105)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:92)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


I cannot figure out what this is connected to. When I clear the temp directory and re-upload my template, the job passes again.


The way I read from BQ is simply:

BigQueryIO.read().fromQuery()
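For context, a minimal sketch of such a read in the Beam Java SDK 2.x. The project, dataset, and query here are illustrative placeholders, not from the original post:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class ReadFromBq {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read rows via a query; table and field names are hypothetical.
    PCollection<TableRow> rows = p.apply("ReadFromBQ",
        BigQueryIO.read()
            .fromQuery("SELECT id, name FROM `my-project.my_dataset.my_table`")
            .usingStandardSql());

    p.run().waitUntilFinish();
  }
}
```

Running this requires the `beam-sdks-java-io-google-cloud-platform` dependency and valid GCP credentials.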

Any help would be greatly appreciated.

Anyone?

Answer


I ended up filing a bug in the Google issue tracker. After a longer conversation with a Google employee and their investigation, it turned out that it doesn't make sense to use templates with Dataflow batch jobs that read from BigQuery, because you can only execute them once.


To quote: "for BigQuery batch pipelines, templates can only be executed once, as the BigQuery job ID is set at template creation time. This restriction will be removed in a future release for SDK 2," but when, I cannot say. Creating templates: https://cloud.google.com/dataflow/docs/templates/creating-templates#pipeline-io-and-runtime-parameters
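For readers hitting the same limitation: later Beam 2.x SDKs (2.2+, to my knowledge) added `withTemplateCompatibility()` on the BigQuery read, which defers the BigQuery export job to execution time so a template can be run more than once. A sketch, assuming a recent SDK and an illustrative query:

```java
// Assumes Beam SDK 2.2+ where readTableRows() and
// withTemplateCompatibility() are available; query is hypothetical.
PCollection<TableRow> rows = p.apply("ReadFromBQ",
    BigQueryIO.readTableRows()
        .fromQuery("SELECT id, name FROM `my-project.my_dataset.my_table`")
        .usingStandardSql()
        .withTemplateCompatibility());
```

Check the SDK release notes for your version before relying on this.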


It would still be good if the error were clearer than a NullPointerException.


Anyway, I hope this helps someone in the future.


Here is the issue, if anyone is interested in the whole conversation: https://issuetracker.google.com/issues/63124894

