Performing aggregate queries using Gremlin / TitanDB
Question
I have a Titan graph database with a set of vertices connected by edges that have a property named "property1".
Is it possible to write a Gremlin (or anything else Titan would support) query to:
Find all edges whose "property1" value is seen 5 or fewer times.
In SQL I would use "Group By", in MongoDB I would use one of the aggregate functions.
I am thinking this may be a job for Furnace/Faunus?
Recommended answer
You can do this by iterating all edges and using groupBy. Here's an example with the toy graph, using weight in place of property1:
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.E.groupBy{it.weight}{it}.cap.next()
==>0.5=[e[7][1-knows->2]]
==>1.0=[e[8][1-knows->4], e[10][4-created->5]]
==>0.4=[e[11][4-created->3], e[9][1-created->3]]
==>0.2=[e[12][6-created->3]]
So that groups all edges by their weight. From there you can drop down to standard Groovy functions like findAll to keep only the groups you want. Here I keep the weights that have more than one edge; in your case the condition would be v.size() <= 5, since you want values seen 5 or fewer times:
gremlin> g.E.groupBy{it.weight}{it}.cap.next().findAll{k,v->v.size()>1}
==>1.0=[e[8][1-knows->4], e[10][4-created->5]]
==>0.4=[e[11][4-created->3], e[9][1-created->3]]
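Adapted to the question, the same pattern might look like the sketch below. It is meant for the Gremlin 2.x console and again uses the toy graph with weight standing in for property1. The names threshold, rare and rareEdges are made up for illustration, and the cutoff is 1 only because the toy graph is tiny (the question would use 5).

```groovy
// Gremlin 2.x console sketch on the TinkerPop toy graph (swap in your Titan graph)
g = TinkerGraphFactory.createTinkerGraph()
threshold = 1   // hypothetical cutoff for the toy graph; the question asks for 5

// group edges by property value, then keep only the groups at or below the cutoff
rare = g.E.groupBy{it.weight}{it}.cap.next().findAll{k, v -> v.size() <= threshold}

// flatten the surviving groups into a single list of edges
rareEdges = rare.values().flatten()
```

On your own graph you would replace it.weight with it.property1 and set threshold to 5.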
Obviously this is an expensive operation on a really large graph: you have to iterate over every edge and build up a Map in memory, which could be big depending on how many distinct values property1 has. If you can find ways to limit edge iteration with other filters, that might help.
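One possible way to reduce the memory cost, sketched here under assumptions, is to count first and fetch edges second: groupCount keeps only a number per value instead of a list of edges, and an extra filter narrows the edges that are iterated. The restriction to 'created' edges is only an example filter for the toy graph, and threshold, counts, rareValues and rareEdges are illustrative names. Note that the counts then describe only the filtered subset of edges.

```groovy
// Gremlin 2.x console sketch on the TinkerPop toy graph (weight stands in for property1)
g = TinkerGraphFactory.createTinkerGraph()
threshold = 1   // hypothetical cutoff for the toy graph; the question asks for 5

// pass 1: count edges per weight among 'created' edges; only counts are held in memory
counts = g.E.has('label', 'created').groupCount{it.weight}.cap.next()
rareValues = counts.findAll{k, v -> v <= threshold}.keySet()

// pass 2: walk the same edges again and keep those whose weight is one of the rare values
rareEdges = g.E.has('label', 'created').filter{rareValues.contains(it.weight)}.toList()
```

This trades a second pass over the edges for a much smaller Map, which may or may not be a good trade depending on how the graph is stored and indexed.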
If you had a really large graph, this would be a good job for Faunus. I'll go with the easy answer here and assume that you don't necessarily need the specific edges whose property1 value occurs 5 or fewer times, and that you just want to know how many times each distinct property1 value occurs. With Faunus you could get that distribution with:
g.E.property1.groupCount()