Spark complex grouping
Problem description
I have this data structure in Spark:
// requires an active SparkSession and: import spark.implicits._
val df = Seq(
  ("Package 1", Seq("address1", "address2", "address3")),
  ("Package 2", Seq("address3", "address4", "address5", "address6")),
  ("Package 3", Seq("address7", "address8")),
  ("Package 4", Seq("address9")),
  ("Package 5", Seq("address9", "address1")),
  ("Package 6", Seq("address10")),
  ("Package 7", Seq("address8"))
).toDF("Package", "Destinations")

df.show(20, false)
I need to find all the addresses that were seen together across different packages. It looks like I can't find a way to do that efficiently. I've tried grouping, mapping, etc. Ideally, the result for the given df would be:
+----+------------------------------------------------------------------------+
| Id | Addresses |
+----+------------------------------------------------------------------------+
| 1 | [address1, address2, address3, address4, address5, address6, address9] |
| 2 | [address7, address8] |
| 3 | [address10] |
+----+------------------------------------------------------------------------+
Answer
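This is essentially a connected-components problem: treat each address as a node and connect addresses that appear in the same package, then collect each component. On a real Spark cluster the scalable route is GraphX's (or GraphFrames') connectedComponents; the merging logic itself can be sketched as a plain-Scala union-find, assuming the collected destination lists fit in driver memory (the object and method names below are illustrative, not from the original answer):

```scala
object AddressGroups {
  // Groups addresses into connected components: addresses sharing any
  // package end up in the same set.
  def group(packages: Seq[Seq[String]]): Seq[Set[String]] = {
    val parent = scala.collection.mutable.Map.empty[String, String]

    // find with path compression; unseen addresses become their own root
    def find(x: String): String = {
      val p = parent.getOrElseUpdate(x, x)
      if (p == x) x
      else { val root = find(p); parent(x) = root; root }
    }

    def union(a: String, b: String): Unit = parent(find(a)) = find(b)

    // Link every address in a package to that package's first address.
    for (dests <- packages; d <- dests) union(d, dests.head)

    // Bucket all addresses by their root to obtain the components.
    parent.keys.toSeq.groupBy(find).values.map(_.toSet).toSeq
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      Seq("address1", "address2", "address3"),
      Seq("address3", "address4", "address5", "address6"),
      Seq("address7", "address8"),
      Seq("address9"),
      Seq("address9", "address1"),
      Seq("address10"),
      Seq("address8"))
    group(data).map(_.toSeq.sorted).foreach(println)
  }
}
```

For the sample df this yields the three sets from the expected output. To use it from Spark on small data, collect the Destinations column (`df.select("Destinations").as[Seq[String]].collect()`) and feed it to `group`; for data that does not fit on the driver, build an edge list (first address of each package to every other address in it) and run GraphX's `connectedComponents` instead.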