Random Walk on Bipartite Graph with Gremlin

Problem description



    I would like to rank items according to a given user's preferences (the items the user likes), based on a random walk over a directed bipartite graph, using Gremlin in Groovy.

    The graph has the following basic structure:

    [User1] ---'likes'---> [ItemA] <---'likes'--- [User2] ---'likes'---> [ItemB]

    Here is the query I came up with:

    def runRankQuery(def userVertex) {
        def m = [:]
        def c = 0
        while (c < 1000) {
            userVertex
                .out('likes')   // get all liked items of current or similar user
                .shuffle[0]     // select randomly one liked item
                .groupCount(m)  // update counts for selected item
                .in('likes')    // get all users who also liked item
                .shuffle[0]     // select randomly one user that liked item
                .loop(5){Math.random() < 0.5}   // follow liked edge of new user (feed new user in loop) 
                                                // OR abort query (restart from original user, outer loop)      
                .iterate()
            c++
        }
        m = m.sort {a, b -> b.value <=> a.value}
        println "intermediate result $m"
        m.keySet().removeAll(userVertex.out('likes').toList())
        // EDIT (makes no sense - remove): m.each{k,v -> m[k] = v / m.values().sum()}
        // EDIT (makes no sense - remove): m.sort {-it.value }
        return m.keySet() as List;
    }
    

    However, this code does not find new items ([ItemB] in the example above); it only finds items the given user already likes (e.g. [ItemA]).

    • What do I need to change so that the loop step feeds a new user (e.g. [User2]) back into the 'out('likes')' step and the walk continues?

    • Once this code is working, can it be seen as an implementation of 'Personalized PageRank'?


    Here is the code to run the example:

    g = new TinkerGraph()
    
    user1 = g.addVertex()
    user1.name = 'User1'
    user2 = g.addVertex()
    user2.name = 'User2'
    itemA = g.addVertex()
    itemA.name = 'ItemA'
    itemB = g.addVertex()
    itemB.name = 'ItemB'
    
    g.addEdge(user1, itemA, 'likes')
    g.addEdge(user2, itemA, 'likes')
    g.addEdge(user2, itemB, 'likes')
    
    println runRankQuery(user1)
    

    And the output:

    intermediate result [v[2]:1000]
    []
    ==>null
    gremlin> g.v(2).name
    ==>ItemA
    gremlin> 
    

    Solution

    I found this to be a really strange issue. I ran into several problems that I can't easily explain, and in the end I'm not sure what causes them. Two things in particular stand out:

    1. I'm not sure whether there is a problem with the shuffle step. It does not seem to randomize properly in your case. I can't reproduce the problem outside of this case, so I don't know whether it is related to the size of your data or to something else.
    2. I hit strange problems when using Math.random() to break out of the loop.

    Anyway, I think the changes below capture the essence of your code and seem to do what you want:

    runRankQuery = { userVertex ->
        def m = [:]
        def c = 0
        def rand = new java.util.Random()
        while (c < 1000) {
            def max = rand.nextInt(10) + 1
            userVertex._().as('x')
                .out('likes')   
                .gather.transform{it[rand.nextInt(it.size())]}
                .groupCount(m) 
                .in('likes')    
                .gather.transform{it[rand.nextInt(it.size())]}
                .loop('x'){it.loops < max}  
                .iterate()
            c++
        }
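        println "intermediate result $m"
        // Drop items the user already likes, so only new items are ranked.
        m.keySet().removeAll(userVertex.out('likes').toList())
        // Compute the total once; recomputing it inside each{} while overwriting values skews the scores.
        def total = m.values().sum()
        def ranked = m.collectEntries { k, v -> [k, v / total] }
        // Map.sort returns a new map, so keep the result instead of discarding it.
        ranked = ranked.sort { -it.value }
        return ranked.keySet() as List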
    }
    

    I replaced shuffle with my own brand of "shuffle": gather the vertices into a list and pick a single one at random. I also chose a random maximum number of loops rather than relying on Math.random(). When I run this now, I think I get the results you are looking for:

    gremlin> runRankQuery(user1)                                       
    intermediate result [v[2]:1787, v[3]:326]
    ==>v[3]
    gremlin> runRankQuery(user1)
    intermediate result [v[2]:1848, v[3]:330]
    ==>v[3]
    gremlin> runRankQuery(user1)
    intermediate result [v[2]:1899, v[3]:339]
    ==>v[3]
    gremlin> runRankQuery(user1)
    intermediate result [v[2]:1852, v[3]:360]
    ==>v[3]
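
    To make the walk itself easier to follow, here is a minimal plain-Groovy sketch of the same idea without Gremlin. It is only an illustration under assumptions: the graph is held in two hand-written adjacency maps, rankItems is a hypothetical helper name, and the inner loop does not try to reproduce the exact semantics of Gremlin's loop step. Each walk alternates user -> item -> user, picks uniformly at random at every hop, and runs for a random number of hops.

```groovy
// Hypothetical helper, not part of Gremlin: ranks items that `start` does not like yet.
// likes:   user -> list of liked items
// likedBy: item -> list of users who like it
List rankItems(Map likes, Map likedBy, String start, int walks, Random rand) {
    def counts = [:]
    walks.times {
        def user = start
        int steps = rand.nextInt(10) + 1                   // random walk length, 1..10 hops
        steps.times {
            def items = likes[user]
            def item = items[rand.nextInt(items.size())]   // uniform pick of one liked item
            counts[item] = (counts[item] ?: 0) + 1         // count the visit
            def users = likedBy[item]
            user = users[rand.nextInt(users.size())]       // uniform pick of one user who likes it
        }
    }
    likes[start].each { counts.remove(it) }                // drop items the user already likes
    counts.sort { -it.value }.keySet() as List             // most visited first
}

// The sample graph from the question, written out by hand.
def likes = [User1: ['ItemA'], User2: ['ItemA', 'ItemB']]
def likedBy = [ItemA: ['User1', 'User2'], ItemB: ['User2']]
println rankItems(likes, likedBy, 'User1', 1000, new Random(42))
```

    On this graph ItemB is the only item left once User1's own likes are removed, which matches the v[3] result above.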
    

    You might still get Math.random() to work; it did behave predictably for me in some of my attempts.
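
    For comparison, the coin-flip stopping rule from the question (continue while Math.random() < 0.5) can be sketched in plain Groovy as well. This is an assumption-laden illustration, not the answer's method: visitFrequencies and continueProb are hypothetical names, the graph is again a pair of hand-written adjacency maps, and a seeded java.util.Random stands in for Math.random() so runs are repeatable. Restarting every walk from the same user resembles the "random walk with restart" view of Personalized PageRank, but that is my reading rather than something the answer confirms.

```groovy
// Hypothetical helper: relative visit frequency of each item when every walk
// starts at `start` and continues after each hop with probability continueProb.
Map visitFrequencies(Map likes, Map likedBy, String start, int walks,
                     double continueProb, Random rand) {
    def counts = [:]
    walks.times {
        def user = start                                   // restart from the original user
        boolean keepGoing = true
        while (keepGoing) {
            def items = likes[user]
            def item = items[rand.nextInt(items.size())]   // uniform pick of one liked item
            counts[item] = (counts[item] ?: 0) + 1
            def users = likedBy[item]
            user = users[rand.nextInt(users.size())]       // uniform pick of one user who likes it
            keepGoing = rand.nextDouble() < continueProb   // same role as Math.random() < 0.5
        }
    }
    def total = counts.values().sum()                      // computed once, before dividing
    counts.collectEntries { k, v -> [k, (v as double) / total] }
}

def likes = [User1: ['ItemA'], User2: ['ItemA', 'ItemB']]
def likedBy = [ItemA: ['User1', 'User2'], ItemB: ['User2']]
println visitFrequencies(likes, likedBy, 'User1', 1000, 0.5d, new Random(42))
```

    The frequencies sum to 1 because the total is taken before any value is overwritten. Items the user already likes can then be removed and the rest sorted, as in the answer's code.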
