GraphX: NullPointerException inside mapVertices

Question

I want to use GraphX. For now I am just running it locally. I get a NullPointerException in these few lines: the first println works fine, but the second one fails.

..........
val graph: Graph[Int, Int] = Graph(users, relationships)
println("graph.inDegrees = " + graph.inDegrees.count) // this line works well
graph.mapVertices((id, v) => {
  println("graph.inDegrees = " + graph.inDegrees.count) // but this one fails
  42 // doesn't mean anything
}).vertices.collect

It does not matter which method of the graph object I call; the result is the same. Yet graph itself is not null inside mapVertices.

Exception failure in TID 2 on host localhost: 
java.lang.NullPointerException
org.apache.spark.graphx.impl.GraphImpl.mapReduceTriplets(GraphImpl.scala:168)
org.apache.spark.graphx.GraphOps.degreesRDD(GraphOps.scala:72)
org.apache.spark.graphx.GraphOps.inDegrees$lzycompute(GraphOps.scala:49)
org.apache.spark.graphx.GraphOps.inDegrees(GraphOps.scala:48)
ololo.MyOwnObject$$anonfun$main$1.apply$mcIJI$sp(Twitter.scala:42)

Answer

I reproduced this with GraphX (the Scala 2.10 build) on Spark 1.0.2. I'll give you a workaround first and then explain what I think is happening. This works for me:

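// Compute the count once on the driver, outside the function.
val c = graph.inDegrees.count
graph.mapVertices((id, v) => {
  println("graph.inDegrees = " + c) // only the plain value c is used here, not graph
}).vertices.collect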

In general, Spark gets prickly when you try to access an entire RDD or other distributed object (such as a Graph) from code that is meant to run in parallel on individual partitions, like the function you pass to mapVertices. Even when you can get it to work, it is usually a bad idea. (As a separate matter, as you have seen, when it does not work the failure tends to be really unhelpful.)

The vertices of a Graph are represented as an RDD, and the function you pass to mapVertices runs locally in the appropriate partitions, where it is given access to the local vertex data: id and v. You really don't want the entire graph to be copied to each partition. In this case you only need to broadcast a scalar to each partition, so pulling the count out of the function solves the problem, and the broadcast is really cheap.
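For reference, the workaround can be written out as a complete program. This is only a minimal sketch: the object name, the local[2] master and the three-vertex sample graph are assumptions made for illustration, since the original question elides how users and relationships are built.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical object name; not taken from the original question.
object InDegreeCountExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master, matching "running it locally" in the question.
    val conf = new SparkConf().setAppName("InDegreeCountExample").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Assumption: a tiny sample graph standing in for the asker's data.
      val users: RDD[(VertexId, Int)] =
        sc.parallelize(Seq((1L, 0), (2L, 0), (3L, 0)))
      val relationships: RDD[Edge[Int]] =
        sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(3L, 2L, 1), Edge(2L, 3L, 1)))
      val graph: Graph[Int, Int] = Graph(users, relationships)

      // This runs a Spark job from the driver and yields a plain Long.
      val inDegreeCount: Long = graph.inDegrees.count()

      // The function below refers only to the Long, never to `graph`,
      // so nothing distributed has to be used from inside a task.
      val labelled = graph.mapVertices((id, v) => s"vertex $id, count $inDegreeCount")
      labelled.vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The design point is simply that every action on graph (count, collect and so on) is called from the driver, and the per-vertex function receives only ordinary values.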

There are tricks in the Spark APIs for accessing more complex objects in such a situation, but if you use them carelessly they will destroy your performance, because they tend to introduce a lot of communication. People are often tempted to use them because they don't understand the computation model rather than because they really need to, although that does happen too.
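The answer does not name these tricks. As an assumption, the sketch below shows two mechanisms that exist in Spark and GraphX for getting per-vertex data (here, each vertex's in-degree) into a vertex function: a broadcast variable holding a map collected on the driver, and outerJoinVertices, which joins the degrees to the vertices while keeping everything distributed. The object name and the sample graph are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical object name; the sample data is an assumption for illustration.
object DegreeLookupExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DegreeLookupExample").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val users: RDD[(VertexId, Int)] =
        sc.parallelize(Seq((1L, 0), (2L, 0), (3L, 0)))
      val relationships: RDD[Edge[Int]] =
        sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(3L, 2L, 1), Edge(2L, 3L, 1)))
      val graph: Graph[Int, Int] = Graph(users, relationships)

      // Option 1: collect the degrees to the driver and broadcast them.
      // Only reasonable when the map comfortably fits in memory on every executor.
      val degreeById: Map[VertexId, Int] = graph.inDegrees.collect().toMap
      val degreesBc = sc.broadcast(degreeById)
      val viaBroadcast: Graph[Int, Int] =
        graph.mapVertices((id, v) => degreesBc.value.getOrElse(id, 0))

      // Option 2: join the degrees onto the vertices; nothing is collected.
      // Vertices with no incoming edges are absent from inDegrees, hence the Option.
      val viaJoin: Graph[Int, Int] =
        graph.outerJoinVertices(graph.inDegrees)((id, v, deg) => deg.getOrElse(0))

      viaBroadcast.vertices.collect().foreach(println)
      viaJoin.vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Both options move data between machines, which is the communication cost the answer warns about: the broadcast ships the whole map to every executor, while the join may shuffle the degree data to line it up with the vertex partitions.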
