Pick elements in processElement() - Apache Beam


Question

I know that when we implement a ParDo transform, we pick up individual elements from our data (basically separated by "\n"). But what if I have an element that occupies two lines in my file? Can I apply my own condition to decide what counts as an element? Or must each element always fit on a single line?
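If each record can be recognized by a condition on its first line, grouping lines into records is straightforward once all lines of a file are available in order. The following is a minimal sketch in plain Python with no Beam dependency; `group_records` and the `ID:` record marker are hypothetical, chosen only to illustrate applying your own condition:

```python
from typing import Callable, Iterable, Iterator, List


def group_records(lines: Iterable[str],
                  is_record_start: Callable[[str], bool]) -> Iterator[str]:
    """Yield multi-line records, starting a new record whenever
    is_record_start(line) is true. Lines of a record are joined with "\n".
    Any lines before the first start line form their own record."""
    current: List[str] = []
    for line in lines:
        if is_record_start(line) and current:
            # A new record begins, so the previous one is complete.
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        # Emit the last record at the end of the input.
        yield "\n".join(current)


# Hypothetical format: every record begins with a line starting with "ID:".
text = "ID:1\nname=a\nID:2\nname=b"
records = list(group_records(text.splitlines(), lambda l: l.startswith("ID:")))
print(records)  # ['ID:1\nname=a', 'ID:2\nname=b']
```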

Recommended answer

Reading of text files is controlled by TextIO, not by ParDo; I suppose that's what you meant. Indeed, right now TextIO splits files into one element per line, but there is work in progress on changing that. You can follow it at https://issues.apache.org/jira/browse/BEAM-2802.
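Because a PCollection has no guaranteed order, a ParDo that receives one line per element cannot reliably stitch neighbouring lines back together; the splitting has to happen where the whole file is read. A minimal sketch, assuming a hypothetical format in which two-line records are separated by a blank line, of splitting a file's full content on a custom delimiter instead of on "\n" (plain Python; `split_on_delimiter` is a hypothetical helper, not a Beam API):

```python
from typing import List


def split_on_delimiter(content: str, delimiter: str) -> List[str]:
    """Split a file's full content into records on a custom delimiter
    instead of on "\n". Surrounding newlines are trimmed from each record,
    and empty fragments (e.g. from a trailing delimiter) are dropped."""
    parts = (part.strip("\n") for part in content.split(delimiter))
    return [part for part in parts if part]


# Hypothetical file: each record spans two lines, records are separated
# by a blank line, i.e. the delimiter is "\n\n".
content = "name=a\nvalue=1\n\nname=b\nvalue=2\n"
print(split_on_delimiter(content, "\n\n"))
# ['name=a\nvalue=1', 'name=b\nvalue=2']
```

Inside a pipeline, one possible approach (an assumption to verify against your SDK version) is to read each file as a single element, for example with `apache_beam.io.fileio.MatchFiles` followed by `ReadMatches` in the Python SDK, and call such a function in a DoFn. Later Beam releases also added a custom-delimiter option to the Java reader, `TextIO.read().withDelimiter(...)`.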

It would help that work if you could tell us more about your file format, so we can make sure it is in scope.
