Replacing whitespace in all column names in a Spark DataFrame


Question

I have a Spark DataFrame with whitespace in some of its column names, and the whitespace has to be replaced with underscores.

I know a single column can be renamed using withColumnRenamed() in Spark SQL, but to rename 'n' columns, this function has to be chained 'n' times (to my knowledge).

To automate this, I have tried:

val old_names = df.columns          // array of old column names (columns takes no parentheses in Scala)

// \s matches any whitespace character, so a separate contains(" ") check is not needed
val new_names = old_names.map(_.replaceAll("\\s", "_"))   // array of new column names with whitespace replaced

Now, how do I replace the DataFrame's column names with new_names?

Recommended answer

  var newDf = df
  for(col <- df.columns){
    newDf = newDf.withColumnRenamed(col,col.replaceAll("\\s", "_"))
  }

You can encapsulate this in a method so the mutable variable and the loop don't clutter the calling code.
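One way to do that encapsulation is sketched below. The object name ColumnNames and the helpers sanitize and withSanitizedColumns are hypothetical names chosen for illustration, not part of Spark. The sketch uses foldLeft in place of the var and the for loop, which is a stylistic swap; the renaming itself is still done by withColumnRenamed, as in the answer above.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical wrapper object; the name is an assumption for this sketch.
object ColumnNames {
  // Replaces every whitespace character in a column name with an underscore.
  def sanitize(name: String): String = name.replaceAll("\\s", "_")

  // Renames each column in turn, threading the DataFrame through a fold
  // so no mutable variable is needed.
  def withSanitizedColumns(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (acc, name) =>
      acc.withColumnRenamed(name, sanitize(name))
    }
}
```

Usage would be val cleaned = ColumnNames.withSanitizedColumns(df). If the array of new names from the question is already at hand, df.toDF(new_names: _*) should also apply all of them in a single call, since toDF accepts the full list of column names in order.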
