Gremlin: find the highest match


Question

I am planning to use a graph database (AWS Neptune) that can be queried with Gremlin as a sort of knowledge base (KB). The KB would be used as a classification tool for entities with multiple features. For simplicity, I am using geometric shapes to encode the properties of my entities in this example. Suppose I want to classify Points that can be related to Squares, Triangles and Circles. I have laid out the possible relationships between Points and the available Squares, Triangles and Circles in the graph depicted in the picture below.

Created with:


g.addV('Square').property(id, 'S_A')
 .addV('Square').property(id, 'S_B')
 .addV('Circle').property(id, 'C_A')
 .addV('Triangle').property(id, 'T_A')
 .addV('Triangle').property(id, 'T_B')
 .addV('Point').property(id, 'P1')
 .addV('Point').property(id, 'P2')
 .addV('Point').property(id, 'P3')

g.V('P1').addE('Has_Triangle').to(g.V('T_B'))
g.V('P2').addE('Has_Triangle').to(g.V('T_A'))
g.V('P1').addE('Has_Square').to(g.V('S_A'))
g.V('P2').addE('Has_Square').to(g.V('S_A'))
g.V('P2').addE('Has_Circle').to(g.V('C_A'))
g.V('P3').addE('Has_Circle').to(g.V('C_A'))
g.V('P3').addE('Has_Square').to(g.V('S_B'))


The different entities are, for example, Points, Squares, Triangles and Circles.

My ultimate goal is to find the Point that satisfies the highest number of conditions. For example:

g.V().hasLabel('Point').where(and(
    out('Has_Triangle').hasId('T_A'),
    out('Has_Circle').hasId('C_A'),
    out('Has_Square').hasId('S_A')
))

// ==>v[P2]

The query above works very well for classifying a Point with properties (T_A, S_A, C_A) as a Point 2 (P2) type, for example. But if I use the same query to classify a Point with properties (C_A, S_B, T_X), for example:

g.V().hasLabel('Point').where(and(
    out('Has_Triangle').hasId('T_X'),
    out('Has_Circle').hasId('C_A'),
    out('Has_Square').hasId('S_B')
))

The query fails to classify this point as Point 3 (P3), because the KB has no known Triangle property for P3.

Is there a way to express a query that returns the vertex with the highest match, which in this case would be P3?

Thanks.

EDIT

The best idea I have so far is to add sentinel values for KB properties that do not exist, then modify the query to match either the exact property or the sentinel value. But this means that if I add a new "type" of property to Points in the future, e.g. Has_Hexagon, then I need to add a sentinel Hexagon to every Point in my graph.

EDIT 2

Added the Gremlin script that creates the sample data.

Recommended answer

You can use the choose() step to increment a counter (the sack) for each match, then order by the counter value (descending) and pick the first result (the highest match).

gremlin> g.withSack(0).V().hasLabel('Point').
           choose(out('Has_Triangle').hasId('T_A'), sack(sum).by(constant(1))).
           choose(out('Has_Circle').hasId('C_A'),   sack(sum).by(constant(1))).
           choose(out('Has_Square').hasId('S_A'),   sack(sum).by(constant(1))).
           order().
             by(sack(), decr).
           limit(1)
==>v[P2]

gremlin> g.withSack(0).V().hasLabel('Point').
           choose(out('Has_Triangle').hasId('T_X'), sack(sum).by(constant(1))).
           choose(out('Has_Circle').hasId('C_A'),   sack(sum).by(constant(1))).
           choose(out('Has_Square').hasId('S_B'),   sack(sum).by(constant(1))).
           order().
             by(sack(), decr).
           limit(1)
==>v[P3]

Each choose() step in the queries above can be read as "if (condition) increment counter". Whether or not the condition is met, the choose() step emits the original vertex (Point).
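To make the counter semantics concrete, here is a minimal plain-Python model of the same scoring logic. It is not Gremlin and does not talk to Neptune. The `EDGES` dictionary mirrors the sample data from the question, and the names `score_points` and `best_match` are hypothetical helpers introduced only for this sketch. It assumes each Point has at most one edge per label, as in the sample graph.

```python
# Sample graph from the question: point id -> {edge label: target vertex id}.
EDGES = {
    "P1": {"Has_Triangle": "T_B", "Has_Square": "S_A"},
    "P2": {"Has_Triangle": "T_A", "Has_Square": "S_A", "Has_Circle": "C_A"},
    "P3": {"Has_Circle": "C_A", "Has_Square": "S_B"},
}


def score_points(wanted):
    """Return (point, match count) pairs, highest count first."""
    scores = []
    for point, edges in EDGES.items():
        counter = 0  # plays the role of withSack(0)
        for label, target in wanted.items():
            if edges.get(label) == target:  # the choose() condition
                counter += 1  # sack(sum).by(constant(1))
        # Like choose(), the point is kept whether or not anything matched.
        scores.append((point, counter))
    # Corresponds to order().by(sack(), decr).
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def best_match(wanted):
    """Corresponds to limit(1): the point with the highest counter."""
    return score_points(wanted)[0][0]


print(best_match({"Has_Triangle": "T_A", "Has_Circle": "C_A", "Has_Square": "S_A"}))
print(best_match({"Has_Triangle": "T_X", "Has_Circle": "C_A", "Has_Square": "S_B"}))
```

In this model the first call scores P2 at 3 and the second scores P3 at 2 (P2 still gets 1 for the shared circle C_A). When two points tie, which one comes first depends on the ordering; the Gremlin queries above make no promise about tie order either.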
