如何在 Apache Beam 中使用 ParDo 和 DoFn 创建读取转换 [英] How to create read transform using ParDo and DoFn in Apache Beam

查看:28
本文介绍了如何在 Apache Beam 中使用 ParDo 和 DoFn 创建读取转换的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

根据Apache Beam 文档 推荐方式编写简单的源代码是通过使用读取转换和 ParDo.不幸的是,Apache Beam 文档让我失望了.

According to the Apache Beam documentation the recommended way to write simple sources is by using Read Transforms and ParDo. Unfortunately the Apache Beam docs has let me down here.

我正在尝试编写一个简单的无界数据源,它使用 ParDo 发出事件,但编译器一直抱怨 DoFn 对象的输入类型:

I'm trying to write a simple unbounded data source which emits events using a ParDo but the compiler keeps complaining about the input type of the DoFn object:

message: 'The method apply(PTransform<? super PBegin,OutputT>) in the type PBegin is not applicable for the arguments (ParDo.SingleOutput<PBegin,Event>)'

我的尝试:

public class TestIO extends PTransform<PBegin, PCollection<Event>> {

    @Override
    public PCollection<Event> expand(PBegin input) {
        return input.apply(ParDo.of(new ReadFn()));
    }

    private static class ReadFn extends DoFn<PBegin, Event> {
        @ProcessElement
        public void process(@TimerId("poll") Timer pollTimer) {
            Event testEvent = new Event(...);

            //custom logic, this can happen infinitely
            for(...) {
                context.output(testEvent);
            }
        }
    }
}

推荐答案

A DoFn 执行逐元素处理.如所写,ParDo.of(new ReadFn()) 将具有类型 PTransform、PCollection>.具体来说,ReadFn 表示它需要一个 PBegin 类型的 element 并返回 0 个或多个 Event 类型的元素.

A DoFn performs element-wise processing. As written, ParDo.of(new ReadFn()) will have type PTransform<PCollection<PBegin>, PCollection<Event>>. Specifically, the ReadFn indicates it takes an element of type PBegin and returns 0 or more elements of type Event.

相反,您应该使用实际的Read 操作.有各种提供.如果您要使用一组特定的内存中集合,也可以使用 Create.

Instead, you should use an actual Read operation. There are a variety provided. You can also use Create if you have a specific set of in-memory collections to use.

如果您需要创建自定义源,您应该使用 读取转换.由于您使用的是计时器,因此您可能希望创建一个无界源(元素流).

If you need to create a custom source you should use the Read transform. Since you're using timers, you likely want to create an Unbounded Source (a stream of elements).

这篇关于如何在 Apache Beam 中使用 ParDo 和 DoFn 创建读取转换的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆