Spark Scala: remove columns containing only null values

Published: 2020/9/4 19:03:33 | Tags: scala, null, spark-dataframe


Question

Is there a way to remove the columns of a Spark DataFrame that contain only null values? (I am using Scala and Spark 1.6.2.)

At the moment I am doing this:

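// Collect the names of columns that have at least two distinct values.
// Note: Spark's distinct treats null as a value of its own.
var validCols: List[String] = List()
for (col <- df_filtered.columns) {
  // One distinct + count job per column.
  val count = df_filtered
    .select(col)
    .distinct
    .count
  println((col, count))
  if (count >= 2) {
    validCols :+= col
  }
}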

This builds the list of columns that contain at least two distinct values, which I then pass to select().
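The loop above launches a separate `distinct().count()` job for every column. A minimal single-pass sketch follows, assuming the goal is only to drop columns in which every value is null. Unlike the loop, it keeps columns that hold a single repeated non-null value. It relies on the SQL semantics of `count(column)`, which counts only non-null values, so one global aggregation yields the non-null count of every column. The names `DropNullColumns` and `dropAllNullColumns` are hypothetical, and the sketch assumes column names contain no dots or backticks (which `col` would otherwise parse as nested-field syntax).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count}

object DropNullColumns {
  // Hypothetical helper: returns df without the columns whose values are all null.
  // Assumes plain column names (no dots or backticks).
  def dropAllNullColumns(df: DataFrame): DataFrame = {
    val names = df.columns.toSeq

    // One aggregation row: count(col(c)) is the number of non-null values in c.
    val counts = df
      .select(names.map(c => count(col(c)).alias(c)): _*)
      .first()

    // The row's fields follow the order of `names`, so index i matches column i.
    val keep = names.zipWithIndex.collect {
      case (c, i) if counts.getLong(i) > 0 => c
    }

    df.select(keep.map(c => col(c)): _*)
  }
}
```

For example, `DropNullColumns.dropAllNullColumns(df_filtered)` would return `df_filtered` without its all-null columns, using one aggregation job instead of one job per column.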

Thanks!

Recommended answer

I had the same problem and came up with a similar solution in Java. In my opinion, there is currently no other way to do it.

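// Drop every column whose only distinct value is null.
for (String column : df.columns()) {
    // Number of distinct values in this column (null counts as one value).
    long count = df.select(column).distinct().count();

    // Exactly one distinct value, and that value is null: every row is null.
    if (count == 1 && df.select(column).first().isNullAt(0)) {
        df = df.drop(column);
    }
}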

I drop every column that contains exactly one distinct value and whose first (and therefore only) value is null. This way I can be sure I don't drop columns where all values are the same but not null.
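The difference between the question's `count >= 2` rule and this answer's rule can be illustrated without Spark. The sketch below is a plain-Scala model, not Spark code: each column is assumed to be a `Seq[Option[Any]]`, with `None` standing in for null, mirroring how Spark's `distinct` treats null as one value. The names `DropRules`, `keepByQuestionRule` and `keepByAnswerRule` are hypothetical.

```scala
// Plain-Scala model of the two rules (not Spark code).
// A column is a Seq[Option[Any]]; None stands in for SQL null.
object DropRules {
  // Question's rule: keep a column only if it has at least two distinct values.
  def keepByQuestionRule(values: Seq[Option[Any]]): Boolean =
    values.distinct.size >= 2

  // Answer's rule: drop a column only if its single distinct value is null.
  def keepByAnswerRule(values: Seq[Option[Any]]): Boolean = {
    val distinct = values.distinct
    !(distinct.size == 1 && distinct.head.isEmpty)
  }

  def main(args: Array[String]): Unit = {
    val columns: Seq[(String, Seq[Option[Any]])] = Seq(
      "allNull"  -> Seq(None, None, None),
      "constant" -> Seq(Some(5), Some(5), Some(5)),
      "mixed"    -> Seq(Some(1), None, Some(2))
    )
    for ((name, values) <- columns)
      println(s"$name: question=${keepByQuestionRule(values)}, answer=${keepByAnswerRule(values)}")
    // allNull: question=false, answer=false
    // constant: question=false, answer=true
    // mixed: question=true, answer=true
  }
}
```

Both rules drop the all-null column, but only the answer's rule keeps the `constant` column, which is the case the answer guards against.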
