GroupIntoBatches for non-KV elements


Question

According to the Apache Beam 2.0.0 SDK documentation, GroupIntoBatches works only with KV collections.

My dataset contains only values and there's no need to introduce keys. However, to make use of GroupIntoBatches I had to implement "fake" keys with an empty string as the key:

static class FakeKVFn extends DoFn<String, KV<String, String>> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    // Wrap each value in a KV with an empty-string "fake" key so that
    // GroupIntoBatches (which requires KV input) can be applied.
    c.output(KV.of("", c.element()));
  }
}

So the overall pipeline looks like the following:

public static void main(String[] args) {
  PipelineOptions options = PipelineOptionsFactory.create();
  Pipeline p = Pipeline.create(options);

  long batchSize = 100L;

  p.apply("ReadLines", TextIO.read().from("./input.txt"))
      .apply("FakeKV", ParDo.of(new FakeKVFn()))
      .apply(GroupIntoBatches.<String, String>ofSize(batchSize))
      .setCoder(KvCoder.of(StringUtf8Coder.of(), IterableCoder.of(StringUtf8Coder.of())))
      .apply(ParDo.of(new DoFn<KV<String, Iterable<String>>, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
          c.output(callWebService(c.element().getValue()));
        }
      }))
      .apply("WriteResults", TextIO.write().to("./output/"));

  p.run().waitUntilFinish();
}

Is there any way to group into batches without introducing "fake" keys?

Answer

It is required to provide KV inputs to GroupIntoBatches because the transform is implemented using state and timers, which are per key-and-window.

For each key+window pair, state and timers necessarily execute serially (or observably so). You have to manually express the available parallelism by providing keys (and windows, though no runner that I know of parallelizes over windows today). The two most common approaches are:

  1. Use some natural key like a user ID
  2. Choose some fixed number of shards and key randomly. This can be harder to tune. You have to have enough shards to get enough parallelism, but each shard needs to include enough data that GroupIntoBatches is actually useful. A sketch of this approach follows the list.
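
For the second approach, here is a minimal sketch of a sharding DoFn (not part of the original answer; the class name AssignRandomShardFn and the shard count are illustrative assumptions, and it relies on java.util.concurrent.ThreadLocalRandom plus the Beam classes already imported in the question):

static class AssignRandomShardFn extends DoFn<String, KV<Integer, String>> {
  private final int numShards;

  AssignRandomShardFn(int numShards) {
    this.numShards = numShards;  // hypothetical value, e.g. 10; tune to the runner's parallelism
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    // Key each element by a random shard in [0, numShards). GroupIntoBatches can then
    // batch each shard's elements independently, so work is spread across numShards keys
    // instead of serializing on a single dummy key.
    int shard = ThreadLocalRandom.current().nextInt(numShards);
    c.output(KV.of(shard, c.element()));
  }
}

In the question's pipeline this would take the place of the "FakeKV" step, e.g. .apply("AssignShard", ParDo.of(new AssignRandomShardFn(10))), with GroupIntoBatches.<Integer, String>ofSize(batchSize) and a key coder of VarIntCoder.of() instead of StringUtf8Coder.of().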

Adding one dummy key to all elements as in your snippet will cause the transform to not execute in parallel at all. This is similar to the discussion at "Stateful indexing causes ParDo to be run single-threaded on Dataflow Runner".
