Pick elements in processElement() - Apache Beam
Question
I know that when we implement a ParDo transform, it receives individual elements from our data (basically separated by "\n"). But what if an element occupies two lines in my file? Can I apply my own condition to decide how elements are picked? Or must every element always fit on a single line?
Recommended answer
Reading of text files is controlled by TextIO, not by ParDo - I suppose that's what you meant. Indeed, right now TextIO splits files into 1 element per line; however, there is work in progress on changing that. You can follow the work at https://issues.apache.org/jira/browse/BEAM-2802.
It would help that work if you described your file format in more detail, to make sure it is in scope.
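To make the "own condition" idea concrete, here is a minimal sketch in plain Java (no Beam classes) of merging consecutive lines into multi-line records. The class name `MultiLineRecords`, the `group` helper, and the rule "an indented line continues the previous record" are all hypothetical, chosen only for illustration; the actual file format in the question is unknown. Note that this logic relies on seeing the lines in file order. Elements of a PCollection carry no ordering guarantee, so inside a pipeline it would have to run where the contents of a whole file are available together, not across separate per-line elements produced by TextIO.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of the Beam API.
final class MultiLineRecords {

    /**
     * Merges consecutive lines into records. A line for which startsRecord
     * returns true opens a new record; any other line is appended to the
     * current record, separated by a newline.
     */
    static List<String> group(List<String> lines, Predicate<String> startsRecord) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (current == null || startsRecord.test(line)) {
                // Close the previous record (if any) and start a new one.
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            } else {
                // Continuation line: attach it to the record in progress.
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "id=1 name=foo",
                "  note=this record spans two lines",
                "id=2 name=bar");
        // Assumed rule for this example: an indented line continues the previous record.
        List<String> records = group(lines, line -> !line.startsWith(" "));
        for (String record : records) {
            System.out.println("[" + record + "]");
        }
    }
}
```

With the sample input, the first two lines end up in one record and the third line forms its own record. Passing the condition as a `Predicate` keeps the record boundary rule separate from the merging loop, so a different format only needs a different predicate.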