Scala Spark DataFrame: dataFrame.select multiple columns given a Sequence of column names

Question

val columnName = Seq("col1", "col2", ..., "coln")

Is there a way to do a dataframe.select operation that returns a DataFrame containing only the specified column names? I know I can do dataframe.select("col1", "col2", ...), but columnName is generated at runtime. I could call dataframe.select() repeatedly for each column name in a loop. Would that have any performance overhead? Is there a simpler way to accomplish this?

Answer

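import org.apache.spark.sql.functions.col

// column names known only at runtime (the list is elided, as in the question)
val columnNames = Seq("col1", "col2" /* , ..., "coln" */)

// using the string column names: select(col: String, cols: String*)
// note: columnNames.head throws if columnNames is empty
val result = dataframe.select(columnNames.head, columnNames.tail: _*)

// or, equivalently, using Column objects: select(cols: Column*)
val resultFromColumns = dataframe.select(columnNames.map(c => col(c)): _*)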
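To see why the first form splits the list into `head` and `tail`, here is a minimal plain-Scala sketch that runs without Spark. It assumes only that the String overload has the shape `select(col: String, cols: String*)`, i.e. one required name followed by varargs. `selectStub`, `selectAll` and `selectAllSafe` are hypothetical helpers made up for illustration; they are not part of the Spark API.

```scala
// Plain-Scala sketch (no Spark needed) of the varargs expansion used in the answer.
// `selectStub` is a HYPOTHETICAL stand-in with the same parameter shape as
// Spark's `select(col: String, cols: String*)`; it just returns the names it received.
object VarargsSelectSketch {

  // One required name, then zero or more extra names (varargs).
  def selectStub(col: String, cols: String*): Seq[String] = col +: cols

  // Expand a runtime-built Seq into the varargs parameter with `: _*`.
  // `names.head` throws NoSuchElementException when `names` is empty.
  def selectAll(names: Seq[String]): Seq[String] =
    selectStub(names.head, names.tail: _*)

  // Empty-safe variant: None instead of an exception when there is nothing to select.
  def selectAllSafe(names: Seq[String]): Option[Seq[String]] =
    names.headOption.map(first => selectStub(first, names.tail: _*))

  def main(args: Array[String]): Unit = {
    val columnNames = Seq("col1", "col2", "col3")
    println(selectAll(columnNames).mkString(", ")) // col1, col2, col3
  }
}
```

With a real DataFrame the same expansion is `dataframe.select(columnNames.head, columnNames.tail: _*)`; the `Column`-based form has no required first argument, so it needs no head/tail split. On the loop question: calling `dataframe.select(name)` once per column gives you separate single-column DataFrames, not one DataFrame with all of them, and stacking `select` or `withColumn` calls in a loop adds one projection per call. The Spark `Dataset.withColumn` documentation warns that doing this in loops can build large plans with performance problems and recommends a single `select` with multiple columns, which is what both forms above do.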
