Dropping columns by data type in Scala Spark
Question
df1.printSchema()

prints out the column names and the data types that they possess.
df1.drop($"colName")

will drop a column by name. Is there a way to adapt this command to drop columns by data type instead?
Answer
If you are looking to drop specific columns in the DataFrame based on their types, the snippet below will help. In this example, I have a DataFrame with two columns, of type String and Int respectively. I drop my String field from the schema based on its type (all fields of type String will be dropped).
import sqlContext.implicits._

val df = sc.parallelize(('a' to 'l').map(_.toString) zip (1 to 10)).toDF("c1", "c2")

// Collect the names of all String-typed columns, then drop each one in turn.
val newDf = df.schema.fields
  .collect { case f if f.dataType.typeName == "string" => f.name }
  .foldLeft(df)((dframe, field) => dframe.drop(field))
The schema of newDf is org.apache.spark.sql.DataFrame = [c2: int].
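As a variant, on Spark 2.x+ the foldLeft can be avoided: DataFrame.drop has a varargs overload that accepts several column names at once, and the type check can compare against StringType directly instead of the typeName string. A minimal sketch, assuming a local SparkSession (the session setup and column names here are illustrative, not from the original answer):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StringType

val spark = SparkSession.builder().appName("dropByType").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("b", 2)).toDF("c1", "c2")

// Gather the names of every String-typed column in the schema.
val stringCols = df.schema.fields
  .collect { case f if f.dataType == StringType => f.name }

// Drop them all in a single call via the varargs overload.
val newDf = df.drop(stringCols: _*)
newDf.printSchema()  // only c2 (int) remains
```

Comparing against StringType is slightly more robust than matching the "string" typeName, since it uses the typed schema API rather than a string comparison.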