Apache GraphX: merge/combine multiple graphs


Question

I'm new to Apache GraphX and want to know whether I can merge (combine) graphs in GraphX. Say I have two graphs like the ones below:


graph1:     A —1—> B —1—> C —1—> D
            |
             —1—> E —1—> F

graph2:    A —1—> B —1—> C
           |
            —1—> G

and I want to get a merged result like this:



merge result: A —2—> B —2—> C —1—> D
              |
               —1—> E —1—> F
              |
               —1—> G

I can do this in an embedded Neo4j graph database by using Path objects to compare paths, accumulate edge counts, and add the missing paths.

Is there a way, or an example, that would help me do the same thing in GraphX?

Thanks

Recommended answer

You need to create a new graph from the union of the vertices and edges of both graphs, and then merge the parallel edges with groupEdges():

import org.apache.spark.graphx._
import org.apache.spark.graphx.PartitionStrategy.RandomVertexCut

val verts1 = sc.parallelize(Seq(
    (1L,"A"),
    (2L,"B"),
    (3L,"C"),
    (4L,"D"),
    (5L,"E"),
    (6L,"F")))

val edges1 = sc.parallelize(Seq(
    Edge(1L,2L,1),
    Edge(2L,3L,1),
    Edge(3L,4L,1),
    Edge(1L,5L,1),
    Edge(5L,6L,1)))

val graph1 = Graph(verts1, edges1)

val verts2 = sc.parallelize(Seq(
    (1L,"A"),
    (2L,"B"),
    (3L,"C"),
    (7L,"G")))

val edges2 = sc.parallelize(Seq(
    Edge(1L,2L,1),
    Edge(2L,3L,1),
    Edge(1L,7L,1)))

val graph2 = Graph(verts2, edges2)

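// Build one graph from the union of both vertex sets and both edge sets.
// partitionBy must come first: groupEdges only merges parallel edges that
// sit in the same partition. The merge function then sums their weights.
val graph: Graph[String,Int] = Graph(
    graph1.vertices.union(graph2.vertices),
    graph1.edges.union(graph2.edges)
).partitionBy(RandomVertexCut).
   groupEdges( (attr1, attr2) => attr1 + attr2 )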

If you now look at the edges of this new graph, you can see the merged result:

scala> graph.edges.collect
res0: Array[org.apache.spark.graphx.Edge[Int]] = 
      Array(Edge(1,2,2), Edge(2,3,2), Edge(1,5,1), 
            Edge(5,6,1), Edge(1,7,1), Edge(3,4,1))
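Since the question asks about combining multiple graphs, the same union-then-groupEdges idea can be wrapped in a small helper. The following is only a sketch under a few assumptions: `mergeGraphs` is a hypothetical name (it is not part of GraphX), every graph uses `String` vertex labels and `Int` edge weights as in the example above, and the same vertex id always carries the same label in every input graph (when ids repeat, `Graph(...)` keeps one of the entries arbitrarily).

```scala
import org.apache.spark.graphx._
import org.apache.spark.graphx.PartitionStrategy.RandomVertexCut
import org.apache.spark.rdd.RDD

// Hypothetical helper (not a GraphX API): merges any number of graphs and
// sums the weights of edges that share the same source and destination.
def mergeGraphs(graphs: Seq[Graph[String, Int]]): Graph[String, Int] = {
  require(graphs.nonEmpty, "need at least one graph to merge")

  // Widen VertexRDD / EdgeRDD to plain RDDs so they can be unioned pairwise.
  val allVertices: Seq[RDD[(VertexId, String)]] = graphs.map(_.vertices)
  val allEdges: Seq[RDD[Edge[Int]]] = graphs.map(_.edges)

  val vertices = allVertices.reduce(_ union _)
  val edges = allEdges.reduce(_ union _)

  // Repartition first so parallel edges land in the same partition,
  // then let groupEdges add their weights together.
  Graph(vertices, edges)
    .partitionBy(RandomVertexCut)
    .groupEdges(_ + _)
}

// Renders each edge in the question's notation, e.g. "A -2-> B".
def describeEdges(graph: Graph[String, Int]): Array[String] =
  graph.triplets
    .map(t => s"${t.srcAttr} -${t.attr}-> ${t.dstAttr}")
    .collect()
```

In spark-shell, with `graph1` and `graph2` defined as above, `describeEdges(mergeGraphs(Seq(graph1, graph2))).foreach(println)` should list the same six merged edges as the output above, though the order of the lines may vary between runs.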
