How to transform DataFrame per one column to create two new columns in pyspark?
Question
I have a dataframe "x", in which there are two columns, "x1" and "x2":
x1(status) x2
kv,true 45
bm,true 65
mp,true 75
kv,null 450
bm,null 550
mp,null 650
I want to convert this dataframe into a format in which the data is split out according to its status and value:
x1 true null
kv 45 450
bm 65 550
mp 75 650
Is there a way to do this? I am using a pyspark dataframe.
Yes, there is a way. First split the first column on "," using the split function, then split the DataFrame into two DataFrames (using where twice), and finally join these new DataFrames on the first column.
In the Spark API for Scala it would look as follows:
val x1status = Seq(
("kv,true",45),
("bm,true",65),
("mp,true",75),
("kv,null",450),
("bm,null",550),
("mp,null",650)).toDF("x1", "x2")
val x1 = x1status
.withColumn("split", split('x1, ","))
.withColumn("x1", 'split getItem 0)
.withColumn("status", 'split getItem 1)
.drop("split")
scala> x1.show
+---+---+------+
| x1| x2|status|
+---+---+------+
| kv| 45| true|
| bm| 65| true|
| mp| 75| true|
| kv|450| null|
| bm|550| null|
| mp|650| null|
+---+---+------+
val trueDF = x1.where('status === "true").withColumnRenamed("x2", "true")
val nullDF = x1.where('status === "null").withColumnRenamed("x2", "null")
val result = trueDF.join(nullDF, "x1").drop("status")
scala> result.show
+---+----+----+
| x1|true|null|
+---+----+----+
| kv| 45| 450|
| bm| 65| 550|
| mp| 75| 650|
+---+----+----+