Updating job by mapping PubsubIO.Read to PubsubIO.Read/PubsubUnboundedSource gives 'Coder or type for step has changed' compatibility check failure?

Question

I'm updating a currently running Google Cloud Dataflow job from the v1.8 Java Dataflow SDK to the v2.4 Java Dataflow SDK. As part of that process, and as per the release notes for the 1.x -> 2.x move (https://cloud.google.com/dataflow/release-notes/release-notes-java-2#changed_pubsubio_api), I'm changing the use of PubsubIO.Read shown below:

PCollection<String> streamData =
      pipeline
        .apply(PubsubIO.Read
                .timestampLabel(PUBSUB_TIMESTAMP_LABEL_KEY)
                .topic(options.getPubsubTopic()));

so that it uses PubsubIO.readStrings() instead, as below:

PCollection<String> streamData =
      pipeline
        .apply(PubsubIO.readStrings()
                .withTimestampAttribute(PUBSUB_TIMESTAMP_LABEL_KEY)
                .fromTopic(options.getPubsubTopic()));

This in turn means I need to use the transform name mapping command-line argument, like so:

'--transformNameMapping={\"PubsubIO.Read\": \"PubsubIO.Read/PubsubUnboundedSource\"}'
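The value of `--transformNameMapping` is a JSON object that maps old transform names to new ones, so the quoting is easy to get wrong by hand. A minimal sketch, assuming you assemble the pipeline arguments in Java: the class `TransformNameMappingArg` and its methods are hypothetical helpers, not part of the Dataflow SDK. It builds the argument from a `Map` and escapes quotes and backslashes in the names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of the Dataflow SDK) that builds a
// --transformNameMapping argument whose value is a JSON object.
public class TransformNameMappingArg {

    // Quotes a transform name as a JSON string literal. Only '"' and '\' are
    // escaped, which is enough for ordinary step names (no control characters).
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Builds one argument of the form --transformNameMapping={"old":"new",...}
    static String build(Map<String, String> oldToNew) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : oldToNew.entrySet()) {
            json.add(jsonString(e.getKey()) + ":" + jsonString(e.getValue()));
        }
        return "--transformNameMapping=" + json;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("PubsubIO.Read", "PubsubIO.Read/PubsubUnboundedSource");
        // Prints: --transformNameMapping={"PubsubIO.Read":"PubsubIO.Read/PubsubUnboundedSource"}
        System.out.println(build(mapping));
    }
}
```

The resulting string can be passed as one element of the `args` array given to `PipelineOptionsFactory.fromArgs(...)`. When the flag is typed on a command line instead, the escaping depends on how the arguments reach the JVM, for example directly from a shell or through a single string such as Maven's `-Dexec.args`. A mapping only tells the service which new step corresponds to which old one. It does not change the coder that a step uses, and the coder is what the failure below complains about.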

But I get a compatibility check failure:

Workflow failed. Causes: The new job is not compatible with 2016-12-13_15_23_40-..... The original job has not been aborted., The Coder or type for step PubsubIO.Read/PubsubUnboundedSource has changed.

This confuses me a bit, because the old code was working with strings and the new code is still using strings. Can anyone help me understand what this error message is telling me? Is there a way to add a logging statement that tells me which Coder I am using, so that I can run my tests with the old code and the new code and see what the difference is?

Recommended answer

I think the problem is that you are trying to update an existing job. The 2.x release introduced breaking changes, so streaming jobs running on the 1.x SDK cannot be updated to the 2.x SDK. A warning for users upgrading from 1.x appears at the top of that documentation page:

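  • Update Incompatibility: The Dataflow SDK 2.x for Java is update-incompatible with Dataflow 1.x. Streaming jobs using a Dataflow 1.x SDK cannot be updated to use a Dataflow 2.x SDK. Dataflow 2.x pipelines may only be updated across versions starting with SDK version 2.0.0.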
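The release-note rule says that a streaming job running on a 1.x SDK cannot be updated in place to a 2.x SDK. A deploy script could check this rule before it submits an update. The following is a minimal sketch of such a pre-flight check. The class and method names are hypothetical. The method assumes plain numeric version strings such as "1.8.0" with no leading "v", and it encodes only this one rule.

```java
// Hypothetical pre-flight check; encodes only the 1.x -> 2.x release-note rule.
public class UpdateGuard {

    // Extracts the major version from a plain version string such as "1.8.0"
    // or "2.4.0" (no leading "v").
    static int major(String version) {
        return Integer.parseInt(version.trim().split("\\.")[0]);
    }

    // Returns false when the change crosses from a 1.x SDK to a 2.x SDK, which
    // the release notes say cannot be done with an in-place update. Returning
    // true does NOT mean the service's own compatibility check (step names,
    // coders) will pass.
    static boolean sdkChangeAllowsUpdate(String runningSdk, String newSdk) {
        boolean crosses1xTo2x = major(runningSdk) < 2 && major(newSdk) >= 2;
        return !crosses1xTo2x;
    }

    public static void main(String[] args) {
        System.out.println(sdkChangeAllowsUpdate("1.8.0", "2.4.0")); // false
        System.out.println(sdkChangeAllowsUpdate("2.0.0", "2.4.0")); // true
    }
}
```

The job in the question moves from 1.8 to 2.4, so the check fails. Under this rule, the 2.x pipeline would have to be launched as a new job rather than as an update of the running one.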

Regarding the Coder changes, there is some explanation in BEAM-1415:

There's no longer a way to read/write a generic type T. Instead, there's PubsubIO.{read,write}{Strings,Protos,PubsubMessages}. Strings and protos are a very common case so they have shorthands. For everything else, use PubsubMessage and parse it yourself. In case of read, you can read them with or without attributes. This gets rid of the ugly use of Coder for decoding a message's payload (forbidden by the style guide), and since PubsubMessage is easily encodable, again the style guide also dictates to use that explicitly as the input/return type of the transforms.
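BEAM-1415 suggests one plausible reading of the error. In 2.x, the PubsubUnboundedSource step may no longer emit your String elements directly. Instead it may emit Pubsub messages, and a later step would convert them to strings. If so, the coder recorded for that step has changed, even though the pipeline's output is still a `PCollection<String>`. The toy below uses plain Java to show why a coder change matters. The message format is invented and is not Beam's real encoding, and the `"ts"` attribute is hypothetical. The same payload `"hello"` produces different bytes under a string coder and under a message coder. Data that the running job had encoded with one coder therefore could not be decoded with the other.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: neither encoding below is Beam's actual wire format.
public class CoderChangeToy {

    // Toy "old" coder: the step's element is a String, stored as raw UTF-8 bytes.
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Toy "new" coder: the step's element is a message, stored as a
    // length-prefixed payload followed by a count and its attribute pairs.
    static byte[] encodeMessage(byte[] payload, Map<String, String> attributes) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(payload.length);
            out.write(payload);
            out.writeInt(attributes.size());
            // TreeMap gives a deterministic attribute order.
            for (Map.Entry<String, String> e : new TreeMap<>(attributes).entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // Writing to a ByteArrayOutputStream does not fail in practice.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new TreeMap<>();
        attributes.put("ts", "1"); // hypothetical timestamp attribute
        byte[] asString = encodeString("hello");
        byte[] asMessage = encodeMessage("hello".getBytes(StandardCharsets.UTF_8), attributes);
        // Same logical payload, different bytes: prints "5 vs 20", then "false".
        System.out.println(asString.length + " vs " + asMessage.length);
        System.out.println(Arrays.equals(asString, asMessage));
    }
}
```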

In testing, you can, for example …
