带有数据流的Apache Beam Go SDK [英] Apache Beam Go SDK with Dataflow
问题描述
我一直在使用Go Beam SDK(v2.13.0),但无法获取
I've been working with the Go Beam SDK (v2.13.0) and can't get the wordcount example working on GCP Dataflow. It enters crash loop trying to start the org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness
. The example is executing correctly when run locally using the Direct runner.
该示例与上面给出的原始示例完全未经修改.
The example was completely unmodified from the original example given above.
堆栈跟踪为:
org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.InvalidProtocolBufferException: Protocol message had invalid UTF-8.
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.InvalidProtocolBufferException.invalidUtf8(InvalidProtocolBufferException.java:148)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.CodedInputStream$StreamDecoder.readStringRequireUtf8(CodedInputStream.java:2353)
at org.apache.beam.model.pipeline.v1.RunnerApi$FunctionSpec.<init>(RunnerApi.java:59611)
at org.apache.beam.model.pipeline.v1.RunnerApi$FunctionSpec.<init>(RunnerApi.java:59572)
at org.apache.beam.model.pipeline.v1.RunnerApi$FunctionSpec$1.parsePartialFrom(RunnerApi.java:60241)
at org.apache.beam.model.pipeline.v1.RunnerApi$FunctionSpec$1.parsePartialFrom(RunnerApi.java:60235)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2424)
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder.<init>(RunnerApi.java:27531)
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder.<init>(RunnerApi.java:27489)
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder$1.parsePartialFrom(RunnerApi.java:28410)
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder$1.parsePartialFrom(RunnerApi.java:28404)
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder$Builder.mergeFrom(RunnerApi.java:28028)
at org.apache.beam.model.pipeline.v1.RunnerApi$Coder$Builder.mergeFrom(RunnerApi.java:27868)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2408)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntryLite.parseField(MapEntryLite.java:128)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntryLite.parseEntry(MapEntryLite.java:184)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntry.<init>(MapEntry.java:106)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntry.<init>(MapEntry.java:50)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntry$Metadata$1.parsePartialFrom(MapEntry.java:70)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.MapEntry$Metadata$1.parsePartialFrom(MapEntry.java:64)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2424)
at org.apache.beam.model.pipeline.v1.RunnerApi$Components.<init>(RunnerApi.java:930)
at org.apache.beam.model.pipeline.v1.RunnerApi$Components.<init>(RunnerApi.java:848)
at org.apache.beam.model.pipeline.v1.RunnerApi$Components$1.parsePartialFrom(RunnerApi.java:2714)
at org.apache.beam.model.pipeline.v1.RunnerApi$Components$1.parsePartialFrom(RunnerApi.java:2708)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2424)
at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.<init>(RunnerApi.java:2892)
at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.<init>(RunnerApi.java:2850)
at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline$1.parsePartialFrom(RunnerApi.java:3981)
at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline$1.parsePartialFrom(RunnerApi.java:3975)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:221)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:239)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:244)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at org.apache.beam.vendor.grpc.v1p13p1.com.google.protobuf.GeneratedMessageV3.parseWithIOException(GeneratedMessageV3.java:311)
at org.apache.beam.model.pipeline.v1.RunnerApi$Pipeline.parseFrom(RunnerApi.java:3222)
at org.apache.beam.runners.dataflow.worker.DataflowWorkerHarnessHelper.getPipelineFromEnv(DataflowWorkerHarnessHelper.java:131)
at org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:59)
我正在使用示例中指定的docker镜像从我自己的泊坞窗尝试使用相同的标签(v2.13.0),但仍收到相同的错误.我意识到还没有量产,但我希望样品能工作.
I was using the docker image specified in the example and also tried from my own docker using the same tag (v2.13.0) but still get the same error. I Realize it's not production ready, but I am hoping the samples should work.
按照入门指南,我按如下方式进行工作:
As per the instructions on the getting starting I ran the job like this:
wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://example-bucket/counts \
--runner dataflow \
--project example-project \
--temp_location gs://example-bucket/tmp/ \
--staging_location gs://example-bucket/binaries/ \
--worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
同样,我尝试了入门中提供的docker以及使用v2.13.0构建的docker.
Again I tried that docker provided in the getting started, as well as one built using v2.13.0.
我的示例文件go.mod是:
My go.mod for the sample file is:
module example.org/wordcount
go 1.12
require (
cloud.google.com/go v0.41.0 // indirect
github.com/apache/beam v2.13.0+incompatible
github.com/pkg/errors v0.8.1 // indirect
golang.org/x/net v0.0.0-20190628185345-da137c7871d7 // indirect
google.golang.org/grpc v1.22.0 // indirect
)
这可能是什么原因造成的?
What could be causing this?
推荐答案
数据流不正式支持Apache Beam Go SDK.有些用户已经可以使用它了.我怀疑此版本可能存在问题.您也许可以尝试其他版本.
Dataflow doesn't officially support the Apache Beam Go SDK. Some users have had been able to use it though. I suspect that this release may have had issues. You may be able to try a different version.
您也许可以与 Beam邮件列表上的其他用户进行讨论哪些版本适用于它们(尽管不受支持).
You may be able to discuss with other users on the Beam mailing list about which versions have worked for them (though, unsupported).
这篇关于带有数据流的Apache Beam Go SDK的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!