Performing aggregate queries using Gremlin / TitanDB


Question

I have a Titan graph database with a set of vertices connected by edges that carry a property named "property1".

Is it possible to write a Gremlin query (or anything else Titan would support) to:

Find all edges whose value for "property1" is seen 5 or fewer times.

In SQL I would use "GROUP BY"; in MongoDB I would use one of the aggregate functions.

I am thinking this may be a job for Furnace/Faunus?

Recommended answer

You can do this by iterating all edges and using groupBy. Here's an example with the toy graph, using weight in place of property1:

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.E.groupBy{it.weight}{it}.cap.next()                         
==>0.5=[e[7][1-knows->2]]
==>1.0=[e[8][1-knows->4], e[10][4-created->5]]
==>0.4=[e[11][4-created->3], e[9][1-created->3]]
==>0.2=[e[12][6-created->3]]

That groups all edges by their weight. From there you can drop down to standard Groovy functions like findAll to filter out what you don't want (here I keep only the weights that have more than one edge in them; in your case the condition would be v.size() <= 5).

gremlin> g.E.groupBy{it.weight}{it}.cap.next().findAll{k,v->v.size()>1}
==>1.0=[e[8][1-knows->4], e[10][4-created->5]]
==>0.4=[e[11][4-created->3], e[9][1-created->3]]
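
For the question as asked, only the predicate inside findAll changes. The following is a minimal sketch of the same groupBy/findAll steps in plain Groovy, with a list of maps standing in for edges so that it runs without Titan. The sample data and the names edges, maxCount, rare and rareIds are made up for illustration.

```groovy
// Hypothetical stand-ins for edges: each map carries only an id and the grouped property.
def edges = [
    [id: 1, property1: 'a'],
    [id: 2, property1: 'a'],
    [id: 3, property1: 'b'],
] + (4..9).collect { [id: it, property1: 'c'] }   // 'c' occurs 6 times

def maxCount = 5   // "seen 5 or fewer times", taken from the question

// Group the edges by property value, then keep only the groups that are small enough.
def rare = edges.groupBy { it.property1 }.findAll { k, v -> v.size() <= maxCount }

// Flatten the surviving groups back into a list of edge ids.
def rareIds = rare.values().flatten()*.id

println rare.keySet()
println rareIds
```

Against a real graph the grouping closure would read the edge property, as in the g.E.groupBy{it.weight}{it} call above, rather than a map key.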

Obviously this is a fairly expensive operation on a really large graph: you have to iterate over a lot of edges, and you have to build up a Map in memory that could be big, depending on the diversity of the values in property1. If you can find ways to limit edge iteration with other filters, that might help.
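
One way to read "limit edge iteration with other filters" is to narrow the edge set before grouping, for example by edge label. The sketch below shows the idea in plain Groovy. The list of maps mirrors the six toy-graph edges shown earlier, but the map representation and the names createdOnly and grouped are assumptions for illustration, not Titan API.

```groovy
// Stand-ins for the six toy-graph edges: id, label and weight only.
def edges = [
    [id: 7,  label: 'knows',   weight: 0.5],
    [id: 8,  label: 'knows',   weight: 1.0],
    [id: 9,  label: 'created', weight: 0.4],
    [id: 10, label: 'created', weight: 1.0],
    [id: 11, label: 'created', weight: 0.4],
    [id: 12, label: 'created', weight: 0.2],
]

// Filter first, so fewer edges are grouped and the resulting map stays smaller.
def createdOnly = edges.findAll { it.label == 'created' }
def grouped = createdOnly.groupBy { it.weight }

grouped.each { weight, es -> println "${weight}=${es*.id}" }
```

In a real traversal the equivalent would be a filter step placed before groupBy in the pipeline. Whether that actually reduces the work depends on how the graph is indexed.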

This would be a good job for Faunus if you had a really large graph. I'll go with the easy answer here and assume that you don't necessarily need the specific edges whose property1 value occurs 5 or fewer times, and that you just want to know how many times each distinct property1 value occurs. With Faunus you could get a distribution like that with:

g.E.property1.groupCount()
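
Conceptually, the result of such a count is a map from each property1 value to the number of times it occurs. As a rough sketch of that shape, and of how the rare values could be picked out afterwards, here is a plain Groovy version using the standard countBy collection method. It is a stand-in for illustration rather than Faunus itself, and the sample values are made up.

```groovy
// Hypothetical property1 values, one entry per edge.
def values = ['a', 'a', 'b'] + ['c'] * 6

// countBy builds a map of value -> number of occurrences.
def distribution = values.countBy { it }

// Keep only the values that occur 5 or fewer times.
def rareValues = distribution.findAll { value, count -> count <= 5 }.keySet()

println distribution
println rareValues
```

Once the rare values are known, a second and much more selective pass could fetch only the edges that carry them.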
