Spark Dataframe select based on column index
Question
How do I select all the columns of a dataframe that have certain indexes in Scala?
For example, if a dataframe has 100 columns and I want to extract only columns (10, 12, 13, 14, 15), how do I do that?
The following selects all the columns from dataframe df whose names appear in the Array colNames:
df = df.select(colNames.head,colNames.tail: _*)
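The head/tail split is needed because Spark's String-based select has the varargs signature select(col: String, cols: String*). A minimal pure-Scala sketch, using a hypothetical stand-in function with the same shape:

```scala
// Hypothetical stand-in for Spark's select(col: String, cols: String*):
// the first column is a required argument, the rest are varargs.
def select(col: String, cols: String*): Seq[String] = col +: cols

val colNames = Array("name", "age", "city")

// colNames.head supplies the required first argument;
// colNames.tail: _* expands the remaining elements into the varargs slot.
val result = select(colNames.head, colNames.tail: _*)

assert(result == Seq("name", "age", "city"))
```

Passing the whole array as `colNames: _*` alone would not compile, since the required first parameter would be missing.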
If there is a similar array, colNos, which holds the column indexes:
colNos = Array(10,20,25,45)
how do I transform the above df.select to fetch only the columns at those specific indexes?
Answer
You can map over columns:
import org.apache.spark.sql.functions.col
df.select(colNos map df.columns map col: _*)
或:
df.select(colNos map (df.columns andThen col): _*)
或:
df.select(colNos map (col _ compose df.columns): _*)
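These three forms agree because a Scala Array[String] can be used as a Function1[Int, String] (via its implicit Seq wrapper), so it composes with any String => T function. A pure-Scala sketch, using a hypothetical wrap function in place of org.apache.spark.sql.functions.col:

```scala
val columns = Array("_1", "_2", "_3", "_4", "_5", "_6")
val colNos  = Seq(0, 3, 5)

// Hypothetical stand-in for col: turns a column name into some wrapped value.
val wrap: String => String = name => s"col($name)"

val a = colNos map columns map wrap            // map twice: index -> name -> wrapped
val b = colNos map (columns andThen wrap)      // compose left-to-right
val c = colNos map (wrap compose columns)      // compose right-to-left

// All three produce the same result: Seq("col(_1)", "col(_4)", "col(_6)")
assert(a == b && b == c)
```

andThen and compose are just the two directions of Function1 composition, so which variant you pick is purely a matter of taste.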
All the methods shown above are equivalent and impose no performance penalty. The mapping
colNos map df.columns
is just a local Array access (constant-time access for each index), and choosing between the String-based and Column-based variants of select doesn't affect the execution plan:
val df = Seq((1, 2, 3, 4, 5, 6)).toDF
val colNos = Seq(0, 3, 5)
df.select(colNos map df.columns map col: _*).explain
== Physical Plan ==
LocalTableScan [_1#46, _4#49, _6#51]
df.select("_1", "_4", "_6").explain
== Physical Plan ==
LocalTableScan [_1#46, _4#49, _6#51]