Find cartesian product of 2 lists with Apache Beam


Question

I have 2 PCollections:

PCollection<List<String>> ListA =
        pipeline.apply("getListA", ParDo.of(new getListA()));
PCollection<List<String>> ListB =
        pipeline.apply("getListB", ParDo.of(new getListB()));

ListA contains:

["1","2","3"]

ListB contains:

["A","B","C"]

How do I end up with a PCollection that contains:

[
 ["A","1"],["A","2"],["A","3"],
 ["B","1"],["B","2"],["B","3"],
 ["C","1"],["C","2"],["C","3"],
]

My search pointed me to:

How to do a cartesian product of two PCollections in Dataflow?

But that deals with KV pairs, using CoGroupByKey with 2 outputs. It's possible that CoGroupByKey can be used to create the cartesian product of 2 lists, but I am not seeing how.

Recommended answer

It looks like you have a single element in each PCollection, so you just need to join those elements, and then you can do the cartesian product yourself in a DoFn.

Something like:

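// Merge both single-element PCollections, then group them under one shared key.
PCollectionList.of(ListA).and(ListB)
    .apply(Flatten.pCollections())
    // The cast picks the constant-key overload of WithKeys.of.
    .apply(WithKeys.of((Void) null))
    .apply(GroupByKey.create())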

After that, you'll have a PCollection with a single element, which is a KV(null, Iterable(ListA, ListB)), and you can generate the cartesian product with a pair of nested for loops.
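The nested loops could look roughly like the plain-Java sketch below. It leaves out the Beam wiring, so the logic can be run on its own. The class name CartesianSketch and the helpers cartesianProduct and fromGrouped are hypothetical names chosen for illustration; in a real pipeline the body of fromGrouped would sit inside a DoFn that receives the grouped KV. One caveat: GroupByKey does not guarantee the order of the values in the Iterable, so this sketch simply treats whichever list arrives first as the outer one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class; not part of Apache Beam.
final class CartesianSketch {

    // Pairs every element of `outer` with every element of `inner`,
    // iterating `outer` in the outer loop.
    static List<List<String>> cartesianProduct(List<String> outer, List<String> inner) {
        List<List<String>> pairs = new ArrayList<>(outer.size() * inner.size());
        for (String o : outer) {
            for (String i : inner) {
                pairs.add(Arrays.asList(o, i));
            }
        }
        return pairs;
    }

    // Assumes the grouped value holds exactly two lists, as in the
    // KV(null, Iterable(ListA, ListB)) described above.
    // Throws NoSuchElementException if fewer than two lists are present.
    static List<List<String>> fromGrouped(Iterable<List<String>> grouped) {
        Iterator<List<String>> it = grouped.iterator();
        List<String> first = it.next();
        List<String> second = it.next();
        if (it.hasNext()) {
            throw new IllegalArgumentException("expected exactly two lists");
        }
        return cartesianProduct(first, second);
    }

    public static void main(String[] args) {
        // Stand-in for the Iterable produced by GroupByKey.
        List<List<String>> grouped = Arrays.asList(
                Arrays.asList("A", "B", "C"),
                Arrays.asList("1", "2", "3"));
        for (List<String> pair : fromGrouped(grouped)) {
            System.out.println(pair);
        }
    }
}
```

If the order within each pair matters (letters first, as in the desired output), one option is to tag each list with its origin before flattening, so the DoFn can tell ListA from ListB.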
