Call of distinct and map together throws NPE in Spark library
Question
I am unsure if this is a bug, but if you do something like this:
// d:spark.RDD[String]
d.distinct().map(x => d.filter(_.equals(x)))
you will get a Java NPE. However, if you call collect immediately after distinct, all will be fine.
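For reference, a minimal self-contained sketch of that workaround, written against the modern org.apache.spark API rather than the 0.6.x-era spark package; the object name, app name, and local master are invented for the example:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DistinctThenCollect {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("distinct-then-collect").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val d = sc.parallelize(Seq("Hello", "World", "Hello"))

    // collect() materializes the distinct values on the driver, so the
    // map below is plain driver-side Scala: each d.filter(...) is a
    // top-level RDD operation instead of an RDD referenced inside
    // another RDD's closure, which is what triggers the NPE.
    val groups: Array[RDD[String]] =
      d.distinct().collect().map(x => d.filter(_.equals(x)))

    groups.foreach(g => println(g.collect().mkString(", ")))
    sc.stop()
  }
}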
I am using Spark 0.6.1.
Answer
Spark does not support nested RDDs or user-defined functions that refer to other RDDs, hence the NullPointerException; see this thread on the spark-users mailing list.
It looks like your current code is trying to group the elements of d by value; you can do this efficiently with the groupBy() RDD method:
scala> val d = sc.parallelize(Seq("Hello", "World", "Hello"))
d: spark.RDD[java.lang.String] = spark.ParallelCollection@55c0c66a
scala> d.groupBy(x => x).collect()
res6: Array[(java.lang.String, Seq[java.lang.String])] = Array((World,ArrayBuffer(World)), (Hello,ArrayBuffer(Hello, Hello)))
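As a side note, if only the number of occurrences per value matters rather than the grouped values themselves, recent Spark versions offer the countByValue() action, which returns the counts as a driver-side map. A sketch in the same REPL session (the res number and printed output are illustrative):

scala> d.countByValue()
res7: scala.collection.Map[String,Long] = Map(Hello -> 2, World -> 1)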