How to add new columns to a DataFrame, given their names, when they are missing?

Question

I'd like to add selected columns to a DataFrame if they are not already present.

val columns=List("Col1","Col2","Col3") 
for(i<-columns) 
 if(!df.schema.fieldNames.contains(i)==true)
 df.withColumn(i,lit(0))

When I select columns from the DataFrame afterwards, only the old columns appear; the new columns are missing.

Recommended answer

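This is more about how to do it in Scala than in Spark. A DataFrame is immutable, so withColumn returns a new DataFrame rather than changing df, and the for loop in the question throws that result away on every iteration. Threading the result of each step into the next is an excellent case for foldLeft (my favorite!).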

import org.apache.spark.sql.functions.lit

// start with an empty DataFrame, but could be anything
val df = spark.emptyDataFrame
val columns = Seq("Col1", "Col2", "Col3")
val columnsAdded = columns.foldLeft(df) { case (d, c) =>
  if (d.columns.contains(c)) {
    // column exists; skip it
    d
  } else {
    // column is not available so add it
    d.withColumn(c, lit(0))
  }
}

scala> columnsAdded.printSchema
root
 |-- Col1: integer (nullable = false)
 |-- Col2: integer (nullable = false)
 |-- Col3: integer (nullable = false)
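
The same fold can be wrapped in a small reusable function. The following is only a sketch under a few assumptions: addMissingColumns and its default parameter are hypothetical names chosen for illustration (they are not part of Spark), the session is a local one created just for the demo, and the starting DataFrame is built with spark.range so that one of the requested columns already exists.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

// Hypothetical helper (not a Spark API): returns a DataFrame in which every
// name from `names` that was missing has been added as a constant column.
def addMissingColumns(df: DataFrame, names: Seq[String], default: Any = 0): DataFrame =
  names.foldLeft(df) { (acc, name) =>
    // keep the accumulated DataFrame as-is when the column already exists
    if (acc.columns.contains(name)) acc
    else acc.withColumn(name, lit(default))
  }

// Assumption: a local session for the demo; in spark-shell, getOrCreate
// simply returns the session that is already running.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("add-missing-columns")
  .getOrCreate()

// A DataFrame that already has Col1, so only Col2 and Col3 are added.
val existing = spark.range(3).toDF("Col1")
val result = addMissingColumns(existing, Seq("Col1", "Col2", "Col3"))

result.printSchema()
```

The important detail is that the value returned by withColumn is passed on as the accumulator, so no intermediate DataFrame is lost. Note that the existing Col1 keeps its original type; only the newly added columns take the type of the default value.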
