How to create a new DataFrame with a dict
Problem description
I have a dict, like:
cMap = {"k1" : "v1", "k2" : "v1", "k3" : "v2", "k4" : "v2"}
and a DataFrame A, like:
+---+
|key|
+---+
| k1|
| k2|
| k3|
| k4|
+---+
The DataFrame above can be created with this code:
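from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each row must be a tuple: ('k1') is just a string, ('k1',) is a 1-tuple
data = [('k1',), ('k2',), ('k3',), ('k4',)]
A = spark.createDataFrame(data, ['key'])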
I want to get a new DataFrame like this:
+---+----------+----------+
|key| v1 | v2 |
+---+----------+----------+
| k1|true |false |
| k2|true |false |
| k3|false |true |
| k4|false |true |
+---+----------+----------+
Any suggestions would be appreciated, thanks!
Recommended answer
Thanks everyone for the suggestions. I figured out another way to solve my problem using pivot. The code is:
from pyspark.sql.functions import count

cMap = {"k1": "v1", "k2": "v1", "k3": "v2", "k4": "v2"}
# Turn the dict into a list of (key, val) tuples
a_cMap = list(cMap.items())
data = spark.createDataFrame(a_cMap, ['key', 'val'])
# One column per distinct val, holding the count of matching rows for each key
data = data.groupBy('key').pivot('val').agg(count('val'))
data.show()
+---+----+----+
|key| v1| v2|
+---+----+----+
| k2| 1|null|
| k4|null| 1|
| k1| 1|null|
| k3|null| 1|
+---+----+----+
data = data.na.fill(0)
data.show()
+---+---+---+
|key| v1| v2|
+---+---+---+
| k2| 1| 0|
| k4| 0| 1|
| k1| 1| 0|
| k3| 0| 1|
+---+---+---+
keys = spark.createDataFrame([('k1', '2'), ('k2', '3'), ('k3', '4'), ('k4', '5'), ('k5', '6')], ['key', 'temp'])
# Inner join: 'k5' has no match in data, so it is dropped from the result
newDF = keys.join(data, 'key')
newDF.show()
+---+----+---+---+
|key|temp| v1| v2|
+---+----+---+---+
| k2| 3| 1| 0|
| k4| 5| 0| 1|
| k1| 2| 1| 0|
| k3| 4| 0| 1|
+---+----+---+---+
However, I still can't convert 1 to true and 0 to false.