Gremlin on Azure CosmosDB: how to project the related vertices' properties?

Question

I use the Microsoft.Azure.Graphs library to connect to a Cosmos DB instance and query the graph database.

I'm trying to optimize my Gremlin queries so that they select only the properties I need. However, I don't know how to choose which properties to select from edges and vertices.

Let's say we start from this query:

gremlin> g.V().hasLabel('user').
   project('user', 'edges', 'relatedVertices').
     by().
     by(bothE().fold()).
     by(both().fold())

This returns the following:

{
    "user": {
        "id": "<userId>",
        "type": "vertex",
        "label": "user",
        "properties": [
            // all vertex properties
        ]
    },
    "edges": [{
        "id": "<edgeId>",
        "type": "edge",
        "label": "<edgeName>",
        "inV": "<relatedVertexId>",
        "inVLabel": "<relatedVertexLabel>",
        "outV": "<relatedVertexId>",
        "outVLabel": "<relatedVertexLabel>",
        "properties": [
            // edge properties, if any
        ]
    }],
    "relatedVertices": [{
        "id": "<vertexId>",
        "type": "vertex",
        "label": "<relatedVertexLabel>",
        "properties": [
            // all related vertex properties
        ]
    }]
}

Now let's say we take only a couple of properties from the root vertex, which we named "user":

gremlin> g.V().hasLabel('user').
   project('id', 'prop1', 'prop2', 'edges', 'relatedVertices').
     by(id).
     by('prop1').
     by('prop2').
     by(bothE().fold()).
     by(both().fold())

This makes some progress and yields something along the lines of:

{
    "id": "<userId>",
    "prop1": "value1",
    "prop2": "value2",
    "edges": [{
        "id": "<edgeId>",
        "type": "edge",
        "label": "<edgeName>",
        "inV": "<relatedVertexId>",
        "inVLabel": "<relatedVertexLabel>",
        "outV": "<relatedVertexId>",
        "outVLabel": "<relatedVertexLabel>",
        "properties": [
            // edge properties, if any
        ]
    }],
    "relatedVertices": [{
        "id": "<vertexId>",
        "type": "vertex",
        "label": "<relatedVertexLabel>",
        "properties": [
            // all related vertex properties
        ]
    }]
}

Now, is it possible to do something similar for edges and related vertices? Say, something along the lines of:

gremlin> g.V().hasLabel('user').
   project('id', 'prop1', 'prop2', 'edges', 'relatedVertices').
     by(id).
     by('prop1').
     by('prop2').
     by(bothE().fold().
         project('edgeId', 'edgeLabel', 'edgeInV', 'edgeOutV').
              by(id).
              by(label).
              by(inV).
              by(outV)).
     by(both().fold().
         project('vertexId', 'someProp1', 'someProp2').
              by(id).
              by('someProp1').
              by('someProp2'))

My aim is to get an output like this:

{
    "id": "<userId>",
    "prop1": "value1",
    "prop2": "value2",
    "edges": [{
        "edgeId": "<edgeId>",
        "edgeLabel": "<edgeName>",
        "edgeInV": "<relatedVertexId>",
        "edgeOutV": "<relatedVertexId>"
    }],
    "relatedVertices": [{
        "vertexId": "<vertexId>",
        "someProp1": "someValue1",
        "someProp2": "someValue2"
    }]
}

Recommended answer

You were very close:

gremlin> g.V().hasLabel('person').
......1>   project('name','age','edges','relatedVertices').
......2>   by('name').
......3>   by('age').
......4>   by(bothE().
......5>      project('id','inV','outV').
......6>        by(id).
......7>        by(inV().id()).
......8>        by(outV().id()).
......9>      fold()).
.....10>   by(both().
.....11>      project('id','name').
.....12>        by(id).
.....13>        by('name').
.....14>      fold())
==>[name:marko,age:29,edges:[[id:9,inV:3,outV:1],[id:7,inV:2,outV:1],[id:8,inV:4,outV:1]],relatedVertices:[[id:3,name:lop],[id:2,name:vadas],[id:4,name:josh]]]
==>[name:vadas,age:27,edges:[[id:7,inV:2,outV:1]],relatedVertices:[[id:1,name:marko]]]
==>[name:josh,age:32,edges:[[id:10,inV:5,outV:4],[id:11,inV:3,outV:4],[id:8,inV:4,outV:1]],relatedVertices:[[id:5,name:ripple],[id:3,name:lop],[id:1,name:marko]]]
==>[name:peter,age:35,edges:[[id:12,inV:3,outV:6]],relatedVertices:[[id:3,name:lop]]]

Two points you should consider when writing Gremlin:

  1. The output of each step is the input to the next, so if you don't know exactly what a particular step emits, the steps that follow it may well be wrong. In your example, in the by() that handles edges, you placed project() after fold(), which basically says "Hey, Gremlin, project that List of edges for me." But in the by() modulators of that project() you treated the input as individual edges rather than as a List, which likely led to an error. In Java, that error is: "java.util.ArrayList cannot be cast to org.apache.tinkerpop.gremlin.structure.Element". An error like that is a clue that somewhere in your Gremlin you are not properly following the outputs and inputs of your steps.
  2. fold() takes all the elements in the traversal stream and converts them to a single List. So where you had many objects, you have one after the fold(). To process them as a stream again, you would need to unfold() them so that later steps operate on them individually. In this case, we just needed to move the fold() to the end of the sub-traversal, after doing the sub-project() for each edge/vertex. But why do we need fold() at all? Because the traversal passed to a by() modulator is not iterated completely by the step it modifies (in this case project()). The step only calls next() to get the first element in the stream; this is by design. Therefore, when you want the entire stream of a by() to be processed, you must reduce the stream to a single object. You might do that with fold(), but other examples include sum(), count(), mean(), etc.
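
The difference between a by() traversal that is reduced and one that is not can be seen side by side. What follows is a minimal sketch, assuming TinkerPop's "modern" toy graph (the same 'person' data used in the query above) and a vertex that has outgoing edges. The key names 'firstOut', 'allOut' and 'outCount' are made up for illustration, and behavior on Cosmos DB may differ from TinkerGraph.

```groovy
// Sketch only: assumes the TinkerPop "modern" toy graph is bound to g.
// The projection keys below are illustrative names, not part of any API.
g.V().has('person', 'name', 'marko').
  project('name', 'firstOut', 'allOut', 'outCount').
    by('name').
    by(out().values('name')).         // no reducing step: project() takes only the first result
    by(out().values('name').fold()).  // fold() reduces the whole stream to a single List
    by(out().count())                 // count() likewise reduces the stream to a single value

// If fold() has to come first, unfold() turns the List back into a stream
// so that the nested project() sees individual edges again.
g.V().has('person', 'name', 'marko').
  project('name', 'edges').
    by('name').
    by(bothE().fold().unfold().
       project('id', 'label').
         by(id).
         by(label).
       fold())
```

In the first query, 'firstOut' should hold a single name while 'allOut' holds a list of names, which is the next()-only behavior described in point 2. The fold().unfold() pair in the second query is redundant on its own; it is shown only to illustrate how a List is turned back into a stream.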
