Add columns to a DataFrame dynamically, with column names as elements in a List

Problem description

I have a List[N] like the one below:

val check = List("a", "b", "c", "d")

where N can be any number of elements.

I have a DataFrame with only one column, called "value". Based on the contents of value, I need to create N columns, with the elements of the list as column names and substring(x,y) as the column contents.

I have tried every approach I could think of, such as withColumn and selectExpr, but nothing works. Please consider substring(X,Y), where X and Y are numbers derived from some metadata.

Below are the different versions of the code I tried; none of them worked.


import org.apache.spark.sql.functions.udf

// X and Y are placeholders for numbers taken from metadata
val df = sqlContext.read.text("xxxxx")
val coder: (String => String) = (arg: String) => {
  val param = "NULL"
  if (arg.length() > Y)
    arg.substring(X, Y)
  else
    param
}
val sqlfunc = udf(coder)
val check = List("a", "b", "c", "d")
for (name <- check) { val testDF2 = df.withColumn(name, sqlfunc(df("value"))) }

testDF2 has only the last column, d; the other columns (a, b, c) are not added to the table.


var z: Array[String] = new Array[String](check.size)
var i = 0
for (x <- check) {
  if ((i + 1) == check.size) {
    z(i) = s""""substring(a.value,X,Y) as $x""""
    i = i + 1
  } else {
    z(i) = s""""substring(a.value,X,Y) as $x","""
    i = i + 1
  }
}
val zz = z.mkString(" ")
df.alias("a").selectExpr(s"$zz").show()

This throws an error.


Please help: how can I add columns to a DataFrame dynamically, with column names taken from the elements of a List?

I am expecting a DataFrame like the one below:

--------------------------------------------
| Value | a   | b   | c   | d   | .... N
--------------------------------------------
| xxx   | xxx | xxx | xxx | xxx | xxxxxx
| xxx   | xxx | xxx | xxx | xxx | xxxxxx
| xxx   | xxx | xxx | xxx | xxx | xxxxxx
--------------------------------------------

Solution

You can dynamically add columns from your list using, for instance, this answer by user6910411 to a similar question (see their full answer for more possibilities):

val newDF = check.foldLeft(<yourdf>)((df, name) => df.withColumn(name, <yourUDF>($"value")))
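
To make the foldLeft idea concrete, here is a minimal sketch under a few assumptions that are not in the question: the object name DynamicColumns, the layout values, and the sample strings are hypothetical, and Spark's built-in substring function (1-based start position plus length) is swapped in for the UDF, so its arguments are not the same as the begin/end indices of String.substring(X,Y). The first attempt in the question kept only column d because every loop iteration called withColumn on the original df; foldLeft instead passes the DataFrame returned by one step into the next. The second attempt likely failed because selectExpr expects each expression as a separate String argument rather than one concatenated string.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, substring}

object DynamicColumns {
  // Hypothetical metadata: (column name, 1-based start position, length).
  val layout: Seq[(String, Int, Int)] = Seq(
    ("a", 1, 2),
    ("b", 3, 2),
    ("c", 5, 2),
    ("d", 7, 2)
  )

  // foldLeft starts from df and feeds the result of each withColumn call
  // into the next one, so every column in the layout ends up in the result.
  def addColumns(df: DataFrame, layout: Seq[(String, Int, Int)]): DataFrame =
    layout.foldLeft(df) { case (acc, (name, pos, len)) =>
      acc.withColumn(name, substring(col("value"), pos, len))
    }

  // Alternative: build all column expressions first and pass them to a
  // single select call as varargs.
  def addColumnsWithSelect(df: DataFrame, layout: Seq[(String, Int, Int)]): DataFrame = {
    val derived = layout.map { case (name, pos, len) =>
      substring(col("value"), pos, len).as(name)
    }
    df.select(col("value") +: derived: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("dynamic-columns")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the text file from the question.
    val df = Seq("abcdefgh", "12345678").toDF("value")

    addColumns(df, layout).show()
    addColumnsWithSelect(df, layout).show()

    spark.stop()
  }
}
```

If you prefer to stay with selectExpr, the same varargs pattern should apply: collect one expression string per column in a Seq[String] and call df.selectExpr(exprs: _*). For a long list of columns, the single select variant keeps the query plan flatter than a long chain of withColumn calls.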
