如何在 Apache Beam 中使用 ParDo 和 DoFn 创建读取转换 [英] How to create read transform using ParDo and DoFn in Apache Beam
问题描述
根据Apache Beam 文档 推荐方式编写简单的源代码是通过使用读取转换和 ParDo
.不幸的是,Apache Beam 文档让我失望了.
According to the Apache Beam documentation the recommended way
to write simple sources is by using Read Transforms and ParDo
. Unfortunately the Apache Beam docs has let me down here.
我正在尝试编写一个简单的无界数据源,它使用 ParDo
发出事件,但编译器一直抱怨 DoFn
对象的输入类型:>
I'm trying to write a simple unbounded data source which emits events using a ParDo
but the compiler keeps complaining about the input type of the DoFn
object:
message: 'The method apply(PTransform<? super PBegin,OutputT>) in the type PBegin is not applicable for the arguments (ParDo.SingleOutput<PBegin,Event>)'
我的尝试:
public class TestIO extends PTransform<PBegin, PCollection<Event>> {
@Override
public PCollection<Event> expand(PBegin input) {
return input.apply(ParDo.of(new ReadFn()));
}
private static class ReadFn extends DoFn<PBegin, Event> {
@ProcessElement
public void process(@TimerId("poll") Timer pollTimer) {
Event testEvent = new Event(...);
//custom logic, this can happen infinitely
for(...) {
context.output(testEvent);
}
}
}
}
推荐答案
A DoFn
执行逐元素处理.如所写,ParDo.of(new ReadFn())
将具有类型 PTransform
.具体来说,ReadFn
表示它需要一个 PBegin
类型的 element 并返回 0 个或多个 Event
类型的元素.
A DoFn
performs element-wise processing. As written, ParDo.of(new ReadFn())
will have type PTransform<PCollection<PBegin>, PCollection<Event>>
. Specifically, the ReadFn
indicates it takes an element of type PBegin
and returns 0 or more elements of type Event
.
相反,您应该使用实际的Read
操作.有各种提供.如果您要使用一组特定的内存中集合,也可以使用 Create
.
Instead, you should use an actual Read
operation. There are a variety provided. You can also use Create
if you have a specific set of in-memory collections to use.
如果您需要创建自定义源,您应该使用 读取转换.由于您使用的是计时器,因此您可能希望创建一个无界源(元素流).
If you need to create a custom source you should use the Read transform. Since you're using timers, you likely want to create an Unbounded Source (a stream of elements).
这篇关于如何在 Apache Beam 中使用 ParDo 和 DoFn 创建读取转换的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!