Gremlin: Cannot go back to a previous step after calling fold() or count()

Question

This query returns nothing, because calling fold() discards every label previously stored with as():

g.V()
.hasLabel('user')
.project("user")
.by(
    as("singleUser")
    .V()
    .fold()
    .choose(
        count(local).is(gt(1)),
        select('singleUser'),
        unfold()
    )
)

Of course, I can always work around this at the expense of performance by running the same search twice, which doubles the cost. I am looking for a better solution than that.

Replacing as() with store() gives a different output, so that is not a solution either. store() does survive fold(), but each call with the same key appends to a list, whereas a second as() with the same label replaces the first. They are not the same tool.

You can try it here: https://gremlify.com/tgq24psdfri

An example closer to my real query is this:

g.V()
.hasLabel('user')
.project("u")
.by(
    as("appUser")
    .both("friend")
    .project("result")
    .by(
        as("appUserFriend")
        .choose(
            both("friend").where(bothE('friend').where(bothV().as('appUser'))).count().is(lt(2)),
            constant("too small").fold(),
            union(
                both("friend").where(bothE('friend').where(bothV().as('appUser'))),
                select("appUserFriend")
            ).order().by("name").values("name").fold()
        )
    ).select(values).unfold()
).select(values).unfold().dedup()

This query finds all possible "groups of friends". To form a group, each member must be a friend of at least 2 other members (so the smallest group is a triangle). The query works, but it also produces candidate groups with only 2 members, which happens when the 2-friend condition is not met; those are marked "too small" and ignored.

You can run the query here: https://gremlify.com/lu64acieuw

The query runs and the output is correct, but notice that lines 11 and 14 (in gremlify) perform the same search. To improve performance I would like to call select() to go back to the earlier result instead of writing the search again, but the problem described in this question makes that impossible. Any other trick that avoids writing the same search twice is welcome.

This is a description of how it works, step by step:

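  1. Select all users of the app; call them "appUser"
  2. Select all friends of each appUser; call them "appUserFriend"
  3. Select the friends of "appUserFriend" that are also friends of "appUser" and add them to an array
  4. Include "appUserFriend" and "appUser" in the array
  5. Remove duplicates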

Recommended answer

I'm going to assume that you don't really care about the "too small" group, given the way you wrote your question, and that the question is really about the algorithm you describe in the numbered steps at the end. With that assumption in mind, I notice that you are basically detecting triangles and then trying to group them accordingly. Cycle detection is discussed in the Gremlin Recipes, and the pattern is basically:

g.V().as("a").repeat(both().simplePath()).times(2).where(both().as("a")).path()

Or, with duplicate paths removed:

g.V().as("a").repeat(both().simplePath()).times(2).where(both().as("a")).path().
  dedup().by(unfold().order().by(id).dedup().fold())

With that as a basis, you just need to convert those results into the groups you are looking for. You could do that in your own application code outside of Gremlin if you find that more efficient, but one way to do it in Gremlin is to group the triangles by each pair of vertices they contain and then combine the elements of the paths that were grouped together:

g.V().as('a').
  repeat(both().simplePath()).
    times(2).
  where(both().as('a')).
  path().
  map(unfold().limit(3).order().by(id).dedup().fold()).
  dedup().
  group('m').
    by(limit(local,2)).
  group('m').
    by(tail(local,2)).
  group('m').
    by(union(limit(local,1),tail(local,1)).fold()).
  cap('m').
  unfold().
  map(select(values).unfold().unfold().order().by(id).dedup().fold()).
  dedup().
  map(unfold().values('name').fold())

There might be a better way yet, but I think this query at least saves you from querying and re-querying the same paths again and again. I also think it is easier to follow: once the reader recognizes the triangle-detection pattern, the rest is free of backtracking and deep nesting. It would be interesting to see whether processing triangles into groups is better handled in Gremlin or in your application code. That might be worth exploring.
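To make the application-code alternative concrete, here is a minimal Python sketch of the same idea: find each triangle once, then merge the triangles that share a pair of vertices. It is not part of the original answer. The function names, the in-memory `friends` adjacency map, and the sample data are all hypothetical, and it assumes the friend relationships have already been fetched from the graph as a symmetric mapping.

```python
from itertools import combinations


def find_triangles(friends):
    """Return each triangle once, as a sorted tuple of three vertex ids.

    `friends` is assumed to be a symmetric adjacency map: id -> set of ids.
    """
    triangles = set()
    for a, neighbours in friends.items():
        for b, c in combinations(sorted(neighbours), 2):
            # a-b and a-c exist; the triangle closes only if b-c exists too.
            if c in friends.get(b, ()):
                triangles.add(tuple(sorted((a, b, c))))
    return triangles


def groups_from_triangles(triangles):
    """Merge all triangles that share a pair of vertices into one group."""
    by_pair = {}
    for triangle in triangles:
        # Each triangle contributes to the group of each of its three pairs.
        for pair in combinations(triangle, 2):
            by_pair.setdefault(pair, set()).update(triangle)
    # Several pairs can yield the same member set, so deduplicate.
    return sorted({tuple(sorted(members)) for members in by_pair.values()})


# Hypothetical sample data: ann and bob share two common friends, cat and dan.
friends = {
    "ann": {"bob", "cat", "dan", "eve"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob"},
    "dan": {"ann", "bob"},
    "eve": {"ann"},
}

triangles = find_triangles(friends)
groups = groups_from_triangles(triangles)
for group in groups:
    print(group)
```

Keying the groups by vertex pair follows the three group('m') steps in the query above, as I read them. Whether this beats doing the grouping inside Gremlin depends on how much data has to cross the wire, so it is worth measuring on your own graph.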

I'm not sure how big your graph is, but this particular query might be better suited to OLAP-style processing with Spark and a custom VertexProgram, perhaps something similar to connectedComponent().
