How to select all columns that start with a common label
Question
I have a dataframe in Spark 1.6 and want to select just some columns out of it. The column names are like:
colA, colB, colC, colD, colE, colF-0, colF-1, colF-2
I know I can select specific columns like this:
df.select("colA", "colB", "colE")
but how do I select, say, "colA", "colB", and all the colF-* columns at once? Is there a way to do this like in Pandas?
Answer
First grab the column names with df.columns, then filter down to just the column names you want with .filter(_.startsWith("colF")). This gives you an Array of Strings. But select takes select(String, String*), not an array. Luckily there is also an overload select(Column*), so convert the Strings into Columns with .map(df(_)) and finally turn the Array of Columns into varargs with : _*.
df.select(df.columns.filter(_.startsWith("colF")).map(df(_)) : _*).show
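Broken into explicit steps, the same pipeline looks like this (the intermediate names are just for illustration):

val names = df.columns.filter(_.startsWith("colF"))  // Array("colF-0", "colF-1", "colF-2")
val cols = names.map(df(_))                          // Array[Column]
df.select(cols : _*).show()                          // : _* expands the Array as varargs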
This filter could be made more complex (just as in Pandas). It is, however, a rather ugly solution (IMO):
df.select(df.columns.filter(x => (x.equals("colA") || x.startsWith("colF"))).map(df(_)) : _*).show
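For a more involved pattern, a regular-expression test is one option; the colF-\d+ pattern below is an assumption about the naming scheme, not something from the question:

df.select(df.columns.filter(_.matches("colF-\\d+")).map(df(_)) : _*).show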
If the list of other columns is fixed, you could also merge a fixed array of column names with the filtered array:
df.select((Array("colA", "colB") ++ df.columns.filter(_.startsWith("colF"))).map(df(_)) : _*).show
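For completeness, here is a minimal self-contained sketch of the whole flow in a Spark 1.6 shell; the sqlContext setup and sample row are assumptions for illustration, not part of the original question:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc is the SparkContext provided by the shell
val df = sqlContext.createDataFrame(Seq(
  (1, 2, 3, 4, 5, 6, 7, 8)           // one assumed sample row
)).toDF("colA", "colB", "colC", "colD", "colE", "colF-0", "colF-1", "colF-2")

// keep the fixed columns plus everything matching colF-*
df.select((Array("colA", "colB") ++ df.columns.filter(_.startsWith("colF"))).map(df(_)) : _*).show()
// prints a table containing only colA, colB, colF-0, colF-1 and colF-2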