How to compute vertex similarity to neighbors in GraphX

Problem description

Suppose we have a simple graph like this:

val users = sc.parallelize(Array(
                 (1L, Seq("M", 2014, 40376, null, "N", 1, "Rajastan")),
                 (2L, Seq("M", 2009, 20231, null, "N", 1, "Rajastan")),
                 (3L, Seq("F", 2016, 40376, null, "N", 1, "Rajastan"))
            ))                                
val edges = sc.parallelize(Array(
                 Edge(1L, 2L, ""), 
                 Edge(1L, 3L, ""), 
                 Edge(2L, 3L, "")))
val graph = Graph(users, edges)

I'd like to compute how similar each vertex is to its neighbors on each attribute.

The ideal output (an RDD or DataFrame) would hold these results:

1L: 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0
2L: 0.5, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0
3L: 0.0, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0

For instance, the first value for 1L means that out of 2 neighbors, just 1 shares the same value...

I am playing with aggregateMessages just to count how many neighbors have a similar attribute value, but to no avail so far:

val result = graph.aggregateMessages[(Int, Seq[Any])](
    // build the message
    sendMsg = {
        // map function
        triplet =>
        // send message to destination vertex
        triplet.sendToDst(1, triplet.srcAttr)
        // send message to source vertex 
        triplet.sendToSrc(1, triplet.dstAttr)
    }, // trying to count neighbors with similar property
    { case ((cnt1, sender), (cnt2, receiver)) =>
        val prop1 = if(sender(0) == receiver(0)) 1d else 0d
        val prop2 = if(Math.abs(sender(1).asInstanceOf[Int] - receiver(1).asInstanceOf[Int])<3) 1d else 0d
        val prop3 = if(sender(2) == receiver(2)) 1d else 0d
        val prop4 = if(sender(3) == receiver(3)) 1d else 0d
        val prop5 = if(sender(4) == receiver(4)) 1d else 0d
        val prop6 = if(sender(5) == receiver(5)) 1d else 0d
        val prop7 = if(sender(6) == receiver(6)) 1d else 0d
        (cnt1 + cnt2, Seq(prop1, prop2, prop3, prop4, prop5, prop6, prop7))
    }
)

This gives me the correct neighborhood size for each vertex, but it does not sum up the values correctly:

//> (1,(2,List(0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0)))
//| (2,(2,List(0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)))
//| (3,(2,List(1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0)))

Solution

It doesn't sum the values because there is no sum in your code. Moreover, your logic is wrong: mergeMsg receives two messages, not a (message, current) pair. Try something like this:

import breeze.linalg.DenseVector

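// Element-wise comparison: 1 where the two attribute values are equal, 0 otherwise.
def compareAttrs(xs: Seq[Any], ys: Seq[Any]): DenseVector[Long] =
  DenseVector(xs.zip(ys).map { case (x, y) => if (x == y) 1L else 0L }.toArray)

val result = graph.aggregateMessages[(Long, DenseVector[Long])](
  // sendMsg: each edge sends (1, comparison vector) to both of its endpoints.
  triplet => {
    val comparedAttrs = compareAttrs(triplet.dstAttr, triplet.srcAttr)
    triplet.sendToDst((1L, comparedAttrs))
    triplet.sendToSrc((1L, comparedAttrs))
  },
  // mergeMsg: combines two messages by summing the counts and the vectors.
  { case ((cnt1, v1), (cnt2, v2)) => (cnt1 + cnt2, v1 + v2) }
)

// Divide the summed matches by the neighbor count to get per-attribute fractions.
result.mapValues(kv => (kv._2.map(_.toDouble) / kv._1.toDouble)).collect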
// Array(
//   (1,DenseVector(0.5, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0)),
//   (2,DenseVector(0.5, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0)), 
//   (3,DenseVector(0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0)))
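
Note that the second column above is 0.0 everywhere because the comparison uses strict equality, while the question treats the second attribute as similar when the values differ by less than 3. The following is a minimal plain-Scala sketch (no Spark or Breeze) that mimics the same send/merge flow on local collections and applies that tolerance. The object name NeighborSimilaritySketch, the helpers compareAttrs, mergeMsg and similarities, and the choice to hard-code the tolerance on attribute index 1 are assumptions made for illustration, not part of the GraphX API.

```scala
// Plain-Scala sketch: local stand-ins for GraphX's sendMsg / mergeMsg.
object NeighborSimilaritySketch {
  type Attrs = Seq[Any]
  type Msg = (Long, Seq[Long]) // (neighbor count, per-attribute match counts)

  val users: Map[Long, Attrs] = Map(
    1L -> Seq("M", 2014, 40376, null, "N", 1, "Rajastan"),
    2L -> Seq("M", 2009, 20231, null, "N", 1, "Rajastan"),
    3L -> Seq("F", 2016, 40376, null, "N", 1, "Rajastan")
  )
  val edges: Seq[(Long, Long)] = Seq((1L, 2L), (1L, 3L), (2L, 3L))

  // Assumed rule, taken from the question: attribute 1 matches when the two
  // Int values differ by less than 3; every other attribute uses equality.
  def compareAttrs(xs: Attrs, ys: Attrs): Seq[Long] =
    xs.zip(ys).zipWithIndex.map {
      case ((x: Int, y: Int), 1) => if (math.abs(x - y) < 3) 1L else 0L
      case ((x, y), _)           => if (x == y) 1L else 0L
    }

  // Counterpart of mergeMsg: combines two messages bound for the same vertex.
  def mergeMsg(a: Msg, b: Msg): Msg =
    (a._1 + b._1, a._2.zip(b._2).map { case (p, q) => p + q })

  def similarities: Map[Long, Seq[Double]] = {
    // Counterpart of sendMsg: each edge sends one message to both endpoints.
    val messages: Seq[(Long, Msg)] = edges.flatMap { case (src, dst) =>
      val cmp = compareAttrs(users(src), users(dst))
      Seq((dst, (1L, cmp)), (src, (1L, cmp)))
    }
    // Group messages by target vertex, merge them pairwise, then normalize.
    messages.groupBy(_._1).map { case (id, msgs) =>
      val (cnt, sums) = msgs.map(_._2).reduce(mergeMsg)
      id -> sums.map(_.toDouble / cnt)
    }
  }
}
```

With this rule, NeighborSimilaritySketch.similarities yields 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0 for vertex 1, 0.5, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0 for vertex 2, and 0.0, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0 for vertex 3, which matches the ideal output in the question. The same per-index pattern match could presumably be dropped into the compareAttrs used with aggregateMessages.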
