How to transform a DataFrame by one column to create two new columns in PySpark?

Problem description

I have a DataFrame "x" with two columns, "x1" and "x2". The "x1" column holds a key and a status separated by a comma:

x1(status)    x2  
kv,true       45
bm,true       65
mp,true       75
kv,null       450
bm,null       550
mp,null       650

I want to convert this DataFrame into a format with one row per key, where the "x2" values are placed in separate columns according to their status:

x1  true  null
kv   45    450
bm   65    550
mp   75    650

Is there a way to do this? I am using a PySpark DataFrame.

Recommended answer

Yes, there is a way. First split the first column on "," using the split function, then split the resulting DataFrame into two DataFrames (using where twice, once per status), and finally join the two new DataFrames on the first column.

In the Spark API for Scala, it would look as follows:

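// Both imports are already in scope in spark-shell; they are needed in a standalone application.
import org.apache.spark.sql.functions.split
import spark.implicits._

// Sample data from the question: "x1" holds "<key>,<status>".
val x1status = Seq(
  ("kv,true",45),
  ("bm,true",65),
  ("mp,true",75),
  ("kv,null",450),
  ("bm,null",550),
  ("mp,null",650)).toDF("x1", "x2")

// Split "x1" on the comma, keep the key in "x1" and move the status to its own column.
val x1 = x1status
  .withColumn("split", split($"x1", ","))
  .withColumn("x1", $"split".getItem(0))
  .withColumn("status", $"split".getItem(1))
  .drop("split")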

scala> x1.show
+---+---+------+
| x1| x2|status|
+---+---+------+
| kv| 45|  true|
| bm| 65|  true|
| mp| 75|  true|
| kv|450|  null|
| bm|550|  null|
| mp|650|  null|
+---+---+------+

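// After the split, "null" is the literal string "null", not a SQL NULL, so === matches it.
val trueDF = x1.where($"status" === "true").withColumnRenamed("x2", "true")
val nullDF = x1.where($"status" === "null").withColumnRenamed("x2", "null")

// Join the two halves on the key and drop the status columns, which are no longer needed.
val result = trueDF.join(nullDF, "x1").drop("status")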

scala> result.show
+---+----+----+
| x1|true|null|
+---+----+----+
| kv|  45| 450|
| bm|  65| 550|
| mp|  75| 650|
+---+----+----+
