How to create a VertexId in Apache Spark GraphX using a Long data type?


Question

I'm trying to create a Graph using some Google Web Graph data, which can be found here:

https://snap.stanford.edu/data/web-Google.html

import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD



val textFile = sc.textFile("hdfs://n018-data.hursley.ibm.com/user/romeo/web-Google.txt")
val arrayForm = textFile.filter(_.charAt(0)!='#').map(_.split("\\s+")).cache()
val nodes = arrayForm.flatMap(array => array).distinct().map(_.toLong)
val edges = arrayForm.map(line => Edge(line(0).toLong,line(1).toLong))

val graph = Graph(nodes,edges)

Unfortunately, I get this error:

<console>:27: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[Long]
 required: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId, ?)]
Error occurred in an application involving default arguments.
       val graph = Graph(nodes,edges)

So how can I create a VertexId object? As I understand it, passing a Long should be sufficient.

Any ideas?

Thanks a lot!

romeo

Recommended answer

Not exactly. If you look at the signature of the apply method of the Graph object, you'll see something like this (for the full signature, see the API docs):

apply[VD, ED](
    vertices: RDD[(VertexId, VD)], edges: RDD[Edge[ED]], defaultVertexAttr: VD)

As the description says:

Construct a graph from a collection of vertices and edges with attributes.

Because of that, you cannot simply pass an RDD[Long] as the vertices argument (an RDD[Edge[Nothing]] as edges won't work either). Each vertex must be a pair of an ID and an attribute, and each edge needs a concrete attribute type:

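import scala.{Option, None}

// Graph.apply expects (VertexId, attribute) pairs, so attach an empty attribute to each id
val nodes: RDD[(VertexId, Option[String])] = arrayForm.
    flatMap(array => array).
    map(id => (id.toLong, None))

// Give each edge an explicit String attribute so its type is Edge[String], not Edge[Nothing]
val edges: RDD[Edge[String]] = arrayForm.
    map(line => Edge(line(0).toLong, line(1).toLong, ""))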
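To see the whole thing work end to end, here is a minimal sketch that can be pasted into spark-shell. It assumes the shell's predefined `sc`, and it swaps the HDFS file for a small, made-up in-memory sample; the names `sample`, `vertices`, `links`, and `g` are illustrative only. It relies on VertexId being a type alias for Long in GraphX, so no special constructor is needed for the ID itself.

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the parsed lines of web-Google.txt
val sample: RDD[Array[String]] = sc.
    parallelize(Seq("# FromNodeId ToNodeId", "0\t1", "0\t2", "1\t2")).
    filter(!_.startsWith("#")).
    map(_.split("\\s+"))

// A plain Long is already a valid VertexId; Graph.apply needs (id, attribute) pairs
val vertices: RDD[(VertexId, Option[String])] = sample.
    flatMap(fields => fields).
    map(id => (id.toLong, Option.empty[String]))

// Each edge carries an explicit String attribute
val links: RDD[Edge[String]] = sample.
    map(fields => Edge(fields(0).toLong, fields(1).toLong, ""))

// The default vertex attribute is left at its default value here
val g: Graph[Option[String], String] = Graph(vertices, links)
```

Calling `g.vertices.count()` and `g.edges.count()` afterwards is a quick way to check that the graph was built from the sample as intended.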

Please note that:

Duplicate vertices are picked arbitrarily

so calling .distinct() on nodes is unnecessary.

If you want to create a Graph without attributes, you can use Graph.fromEdgeTuples.
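As a rough sketch of that alternative, the snippet below builds a graph directly from (source, destination) pairs. It assumes spark-shell with its predefined `sc`, and it uses a tiny made-up edge list in place of the real file; `rawEdges` and `simpleGraph` are illustrative names. The second argument is the value assigned to every vertex, and the resulting edge attributes are of type Int.

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Hypothetical sample edges; with the real data these would come from arrayForm
val rawEdges: RDD[(VertexId, VertexId)] = sc.
    parallelize(Seq("0\t1", "0\t2", "1\t2")).
    map(_.split("\\s+")).
    map(fields => (fields(0).toLong, fields(1).toLong))

// Vertices are derived from the edge endpoints; each gets the placeholder attribute 1
val simpleGraph: Graph[Int, Int] = Graph.fromEdgeTuples(rawEdges, 1)
```

With this approach there is no separate vertex RDD to build, which fits cases like the web graph here where only the link structure matters.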
