Apache Beam with Dataflow - Nullpointer when reading from BigQuery


Question

I am running a job on Google Dataflow, written with Apache Beam, that reads from a BigQuery table and from files, transforms the data, and writes it into other BigQuery tables. The job "usually" succeeds, but sometimes I randomly get a NullPointerException when reading from the BigQuery table, and the job fails:

(288abb7678892196): java.lang.NullPointerException
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:98)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:261)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:209)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:184)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:161)
at com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:47)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:341)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.doWork(DataflowWorker.java:297)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:244)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:125)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:105)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:92)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I cannot figure out what this is connected to. When I clear the temp directory and re-upload my template, the job passes again.

The way I read from BQ is simply:

BigQueryIO.read().fromQuery()

I would greatly appreciate any help.

Anyone?

Recommended answer

I ended up filing a bug in the Google issue tracker. After a longer conversation with a Google employee and their investigation, it turned out that it doesn't make sense to use templates with Dataflow batch jobs that read from BigQuery, because you can only execute them once.

To quote: "for BigQuery batch pipelines, templates can only be executed once, as the BigQuery job ID is set at template creation time. This restriction will be removed in a future release for the SDK 2, but when I cannot say. Creating Templates: https://cloud.google.com/dataflow/docs/templates/creating-templates#pipeline-io-and-runtime-parameters"
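
To make the quoted limitation more concrete, here is a minimal toy sketch in plain Java. It does not use or reproduce Beam or Dataflow internals. The classes FakeJobService and ToyTemplate are hypothetical, and the rule that a job ID is accepted only once is an assumption made purely for illustration (the actual failure in the question surfaced as a NullPointerException, not as a rejected job). The sketch only contrasts a value frozen at template creation time with a value generated on every run.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.function.Supplier;

public class TemplateJobIdSketch {

    /** Hypothetical stand-in for a backend that accepts each job ID only once. */
    static final class FakeJobService {
        private final Set<String> usedJobIds = new HashSet<>();

        /** Returns true if the job ID was new, false if it had already been used. */
        boolean submit(String jobId) {
            return usedJobIds.add(jobId);
        }
    }

    /** Toy "template": how it obtains its job ID is decided when it is built. */
    static final class ToyTemplate {
        private final Supplier<String> jobIdSource;

        ToyTemplate(Supplier<String> jobIdSource) {
            this.jobIdSource = jobIdSource;
        }

        /** Template whose job ID is chosen once, at creation time. */
        static ToyTemplate withFrozenJobId() {
            String frozen = UUID.randomUUID().toString();
            return new ToyTemplate(() -> frozen);
        }

        /** Template that picks a fresh job ID on every run. */
        static ToyTemplate withPerRunJobId() {
            return new ToyTemplate(() -> UUID.randomUUID().toString());
        }

        /** Runs the template once against the given service. */
        boolean run(FakeJobService service) {
            return service.submit(jobIdSource.get());
        }
    }

    public static void main(String[] args) {
        FakeJobService service = new FakeJobService();

        ToyTemplate frozen = ToyTemplate.withFrozenJobId();
        System.out.println("frozen, run 1 accepted: " + frozen.run(service));
        System.out.println("frozen, run 2 accepted: " + frozen.run(service));

        ToyTemplate perRun = ToyTemplate.withPerRunJobId();
        System.out.println("per-run, run 1 accepted: " + perRun.run(service));
        System.out.println("per-run, run 2 accepted: " + perRun.run(service));
    }
}
```

In this toy model the frozen template is accepted on its first run and rejected on its second, while the per-run variant is accepted both times. That matches the observation in the question that re-uploading the template (which creates it anew) lets the job pass again.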

It would still be good if the error were clearer than a NullPointerException.

Anyway, I hope this helps someone in the future.

Here is the issue, if anyone is interested in the whole conversation: https://issuetracker.google.com/issues/63124894
