Spark Scala: remove columns containing only null values
Question
Is there a way to remove the columns of a Spark DataFrame that contain only null values? (I am using Scala and Spark 1.6.2.)
At the moment I am doing this:
var validCols: List[String] = List()
for (col <- df_filtered.columns) {
  val count = df_filtered
    .select(col)
    .distinct
    .count
  println(col, count)
  if (count >= 2) {
    validCols ++= List(col)
  }
}
This builds the list of columns containing at least two distinct values, which I then use in a select().
Thanks!
Recommended answer
I had the same problem and came up with a similar solution in Java. In my opinion, there is no other way of doing it at the moment.
for (String column : df.columns()) {
    long count = df.select(column).distinct().count();
    if (count == 1 && df.select(column).first().isNullAt(0)) {
        df = df.drop(column);
    }
}
I drop every column that contains exactly one distinct value and whose first value is null. This way I can be sure that I don't drop columns where all values are the same but not null.
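For comparison, here is a minimal Scala sketch of a different technique, not the approach from the answer above: it relies on the fact that count(column) ignores nulls, so a single aggregation can report the number of non-null values for every column at once. The object name DropNullColumns and the helpers columnsToKeep and dropAllNullColumns are hypothetical names chosen for illustration. The sketch also assumes that column names contain no backtick characters, since names are backtick-quoted to tolerate dots.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count}

object DropNullColumns {
  // Pure helper (hypothetical): keep the names whose non-null count is > 0.
  def columnsToKeep(names: Seq[String], nonNullCounts: Seq[Long]): Seq[String] =
    names.zip(nonNullCounts).collect { case (name, n) if n > 0 => name }

  // Returns df without the columns whose values are all null.
  // Assumption: column names contain no backticks.
  def dropAllNullColumns(df: DataFrame): DataFrame = {
    val names = df.columns.toSeq
    if (names.isEmpty) df
    else {
      // count(column) skips nulls, so an all-null column yields 0.
      // One aggregation covers all columns; results are read by position.
      val counts = df.select(names.map(c => count(col(s"`$c`"))): _*).first()
      val nonNull = names.indices.map(i => counts.getLong(i))
      df.select(columnsToKeep(names, nonNull).map(c => col(s"`$c`")): _*)
    }
  }
}
```

A call such as DropNullColumns.dropAllNullColumns(df_filtered) would then return the reduced DataFrame. Note that on an empty DataFrame every count is 0, so this sketch would drop every column.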