pyspark - merge 2 columns of sets
Question
I have a Spark DataFrame with 2 columns, each produced by the function collect_set. I would like to combine these 2 columns of sets into 1 column holding a single set. How should I do that? Both are sets of strings.
For instance, I have 2 columns produced by calling collect_set:
Fruits | Meat
[Apple,Orange,Pear] [Beef, Chicken, Pork]
How can I turn this into:
Food
[Apple,Orange,Pear, Beef, Chicken, Pork]
Thank you very much in advance for your help.
Recommended answer
Suppose df has
+--------------------+--------------------+
| Fruits| Meat|
+--------------------+--------------------+
|[Pear, Orange, Ap...|[Chicken, Pork, B...|
+--------------------+--------------------+
then
import itertools

# Chain the two array columns of each row into a single Python list
df.rdd.map(lambda x: list(itertools.chain(x.Fruits, x.Meat))).collect()
combines Fruits and Meat into a single list per row (the two columns are concatenated, so a value present in both would appear twice), i.e.
[[u'Pear', u'Orange', u'Apple', u'Chicken', u'Pork', u'Beef']]
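If the merged column has to behave like a true set, the per-row function also needs to drop values that occur in both columns. Below is a minimal sketch of such a function, assuming Python 3.7+; the namedtuple `Row` is only a stand-in for a pyspark Row so the example runs without a Spark session, and the name `merge_sets` and the extra "Apple" in the sample data are hypothetical. On Spark 2.4 or later, `pyspark.sql.functions.array_union` is a built-in alternative that does the same thing without leaving the DataFrame API.

```python
import itertools
from collections import namedtuple

# Stand-in for a pyspark Row, so the sketch runs without a Spark session.
Row = namedtuple("Row", ["Fruits", "Meat"])


def merge_sets(row):
    """Concatenate both columns and drop duplicates, keeping first-seen order."""
    combined = itertools.chain(row.Fruits, row.Meat)
    # dict.fromkeys keeps insertion order (Python 3.7+) and discards repeats.
    return list(dict.fromkeys(combined))


# "Apple" is deliberately repeated in Meat to show the de-duplication.
rows = [Row(Fruits=["Pear", "Orange", "Apple"],
            Meat=["Chicken", "Pork", "Beef", "Apple"])]
food = [merge_sets(r) for r in rows]
print(food)  # [['Pear', 'Orange', 'Apple', 'Chicken', 'Pork', 'Beef']]
```

Since a pyspark Row also exposes its fields as attributes, the same function could presumably be passed to the real DataFrame as `df.rdd.map(merge_sets).collect()`.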
Hope this helps!