Replacing whitespace in all column names in a Spark DataFrame
Problem description
I have a Spark DataFrame in which some column names contain whitespace, and the whitespace has to be replaced with underscores.
I know a single column can be renamed with withColumnRenamed() in Spark SQL, but to rename n columns, this function has to be chained n times (to my knowledge).
To automate this, I have tried:
val old_names = df.columns // array of the old column names
// array of new column names, with each whitespace character replaced by "_"
val new_names = old_names.map(_.replaceAll("\\s", "_"))
Now, how do I replace the DataFrame's column names with new_names?
Recommended answer
// Start from the original DataFrame and rename one column per iteration.
var newDf = df
for (col <- df.columns) {
  newDf = newDf.withColumnRenamed(col, col.replaceAll("\\s", "_"))
}
You can encapsulate this in a method so that the mutable variable and the loop don't clutter the rest of your code.
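One possible way to do that encapsulation is sketched below. The object name ColumnNames and the helpers sanitize, withoutWhitespace, and withoutWhitespaceViaToDF are hypothetical names chosen for illustration; only withColumnRenamed, columns, and toDF come from Spark itself. The first method swaps the var and for loop for a foldLeft, and the second renames every column in one call, which is probably closest to what the question was attempting with new_names.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper object; the names here are illustrative, not part of Spark.
object ColumnNames {
  // Replaces every whitespace character in a column name with "_".
  def sanitize(name: String): String = name.replaceAll("\\s", "_")

  // Folds over the column names, renaming one column per step,
  // so no mutable `var` is needed.
  def withoutWhitespace(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (acc, name) =>
      acc.withColumnRenamed(name, sanitize(name))
    }

  // Alternative: pass the full list of new names to toDF in a single call.
  def withoutWhitespaceViaToDF(df: DataFrame): DataFrame =
    df.toDF(df.columns.map(sanitize): _*)
}
```

Usage would then be a single line, for example val cleaned = ColumnNames.withoutWhitespace(df). Note that if two names become identical after the replacement (say "a b" and "a_b"), the result will contain duplicate column names, so it may be worth checking for collisions first.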