How to transform DataFrame per one column to create two new columns in pyspark?

Problem description

I have a DataFrame "x" with two columns, "x1" and "x2":

x1(status)    x2  
kv,true       45
bm,true       65
mp,true       75
kv,null       450
bm,null       550
mp,null       650

I want to convert this DataFrame so that each x1 key gets a single row, with its x2 values placed in separate columns according to status:

x1  true  null
kv   45    450
bm   65    550
mp   75    650

Is there a way to do this? I am using a PySpark DataFrame.

Solution

Yes, there is a way. First split the first column on "," using the split function, then split the resulting DataFrame into two DataFrames (using where twice), and finally join the two new DataFrames on the first column.

In the Spark API for Scala it would be as follows:

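// split lives in the SQL functions object; toDF and the 'col syntax need spark.implicits._
// (spark-shell already imports the implicits for you)
import org.apache.spark.sql.functions.split
import spark.implicits._

val x1status = Seq(
  ("kv,true",45),
  ("bm,true",65),
  ("mp,true",75),
  ("kv,null",450),
  ("bm,null",550),
  ("mp,null",650)).toDF("x1", "x2")

// Split "kv,true" into an array, then pull the key and the status out of it
val x1 = x1status
  .withColumn("split", split('x1, ","))
  .withColumn("x1", 'split getItem 0)
  .withColumn("status", 'split getItem 1)
  .drop("split")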

scala> x1.show
+---+---+------+
| x1| x2|status|
+---+---+------+
| kv| 45|  true|
| bm| 65|  true|
| mp| 75|  true|
| kv|450|  null|
| bm|550|  null|
| mp|650|  null|
+---+---+------+

val trueDF = x1.where('status === "true").withColumnRenamed("x2", "true")
val nullDF = x1.where('status === "null").withColumnRenamed("x2", "null")

val result = trueDF.join(nullDF, "x1").drop("status")

scala> result.show
+---+----+----+
| x1|true|null|
+---+----+----+
| kv|  45| 450|
| bm|  65| 550|
| mp|  75| 650|
+---+----+----+
