如何在Apache Beam中使用ParDo和DoFn创建读取转换 [英] How to create read transform using ParDo and DoFn in Apache Beam

查看:353
本文介绍了如何在Apache Beam中使用ParDo和DoFn创建读取转换的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

根据 Apache Beam文档推荐方式 要编写简单的源代码,请使用读取转换"和ParDo.不幸的是,Apache Beam文档让我失望了.

According to the Apache Beam documentation the recommended way to write simple sources is by using Read Transforms and ParDo. Unfortunately the Apache Beam docs has let me down here.

我正在尝试编写一个简单的无限制数据源,该数据源使用ParDo发出事件,但是编译器一直在抱怨DoFn对象的输入类型:

I'm trying to write a simple unbounded data source which emits events using a ParDo but the compiler keeps complaining about the input type of the DoFn object:

message: 'The method apply(PTransform<? super PBegin,OutputT>) in the type PBegin is not applicable for the arguments (ParDo.SingleOutput<PBegin,Event>)'

我的尝试:

public class TestIO extends PTransform<PBegin, PCollection<Event>> {

    @Override
    public PCollection<Event> expand(PBegin input) {
        return input.apply(ParDo.of(new ReadFn()));
    }

    private static class ReadFn extends DoFn<PBegin, Event> {
        @ProcessElement
        public void process(@TimerId("poll") Timer pollTimer) {
            Event testEvent = new Event(...);

            //custom logic, this can happen infinitely
            for(...) {
                context.output(testEvent);
            }
        }
    }
}

推荐答案

DoFn执行逐元素处理.按照书面规定,ParDo.of(new ReadFn())将具有类型PTransform<PCollection<PBegin>, PCollection<Event>>.具体来说,ReadFn表示它接受类型为PBegin 元素,并返回0个或多个类型为Event的元素.

A DoFn performs element-wise processing. As written, ParDo.of(new ReadFn()) will have type PTransform<PCollection<PBegin>, PCollection<Event>>. Specifically, the ReadFn indicates it takes an element of type PBegin and returns 0 or more elements of type Event.

相反,您应该使用实际的Read操作.有提供了各种.如果要使用一组特定的内存中集合,也可以使用Create.

Instead, you should use an actual Read operation. There are a variety provided. You can also use Create if you have a specific set of in-memory collections to use.

如果您需要创建自定义来源,则应使用

If you need to create a custom source you should use the Read transform. Since you're using timers, you likely want to create an Unbounded Source (a stream of elements).

这篇关于如何在Apache Beam中使用ParDo和DoFn创建读取转换的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆