How to "negative select" columns in spark's dataframe
Question
I can't figure it out, but I guess it's simple. I have a Spark DataFrame df. This df has columns "A", "B" and "C". Now let's say I have an Array containing the names of this df's columns:
column_names = Array("A","B","C")
I'd like to do a df.select()
in such a way that I can specify which columns not to select.
Example: let's say I do not want to select column "B". I tried
df.select(column_names.filter(_!="B"))
but this does not work, because

org.apache.spark.sql.DataFrame cannot be applied to (Array[String])
So, here it says it should work with a Seq instead. However, trying
df.select(column_names.filter(_!="B").toSeq)
results in

org.apache.spark.sql.DataFrame cannot be applied to (Seq[String]).
What am I doing wrong?
Recommended answer
Since Spark 1.4 you can use the drop method:
Scala:
case class Point(x: Int, y: Int)
val df = sqlContext.createDataFrame(Point(0, 0) :: Point(1, 2) :: Nil)
df.drop("y")
Python:
df = sc.parallelize([(0, 0), (1, 2)]).toDF(["x", "y"])
df.drop("y")
## DataFrame[x: bigint]
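If you'd rather keep select than switch to drop, the underlying issue in the question is that select takes column names as varargs, not as a single Array/Seq argument. A minimal sketch of the idea in PySpark: the list filtering below is plain Python, and the commented df.select call assumes a DataFrame like the one above.

```python
# Negative selection: filter the column-name list, then unpack it into select.
column_names = ["A", "B", "C"]
keep = [c for c in column_names if c != "B"]
print(keep)  # ['A', 'C']

# With a PySpark DataFrame df, pass the names as varargs:
# df.select(*keep)
```

In Scala the equivalent is to expand the filtered collection into varargs, e.g. df.select(keep.head, keep.tail: _*), since select is declared as select(col: String, cols: String*).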