Getting data for d3 from ArangoDB using AQL (or arangojs)


Problem description

I'm building an app based around a d3 force-directed graph with ArangoDB on the backend, and I want to be able to load node and link data dynamically from Arango as efficiently as possible.

I'm not an expert in d3, but in general the force layout seems to want its data as an array of nodes and an array of links whose sources and targets are the actual node objects, like so:

var nodes = [
        {id: 0, reflexive: false},
        {id: 1, reflexive: true },
        {id: 2, reflexive: false}
    ],
    links = [
        {source: nodes[0], target: nodes[1], left: false, right: true },
        {source: nodes[1], target: nodes[2], left: false, right: true }
    ];

Currently I am using the following AQL query to get neighboring nodes, but it is quite cumbersome. Part of the difficulty is that I want to include edge information for nodes even when those edges are not traversed (in order to display the number of links a node has before loading those links from the database).

LET docId = "ExampleDocClass/1234567"

 // get data for all the edges
LET es = GRAPH_EDGES('EdgeClass',docId,{direction:'any',maxDepth:1,includeData:true})

// create an array of all the neighbor nodes
LET vArray = ( 
    FOR v IN GRAPH_TRAVERSAL('EdgeClass',docId[0],'any',{ maxDepth:1})
        FOR v1 IN v RETURN v1.vertex
    )

// using node array, return inbound and outbound for each node 
LET vs = (
    FOR v IN vArray
        // inbound and outbound are separate queries because I couldn't figure out
        // how to get Arango to differentiate inbound and outbound in the query results
        LET oe = (FOR oe1 IN GRAPH_EDGES('EdgeClass',v,{direction:'outbound',maxDepth:1,includeData:true}) RETURN oe1._to)
        LET ie = (FOR ie1 IN GRAPH_EDGES('EdgeClass',v,{direction:'inbound',maxDepth:1,includeData:true}) RETURN ie1._from)
        RETURN {'vertexData': v, 'outEdges': oe, 'inEdges': ie}
    )
RETURN {'edges':es,'vertices':vs}

The end output looks like this: http://pastebin.com/raw.php?i=B7uzaWxs ...which can be read almost directly into d3 (I just have to deduplicate a bit).

My graph nodes have a large number of links, so performance is important (both in terms of load on the server and client, and response size for communication between the two). I am also planning on creating a variety of commands to interact with the graph aside from simply expanding neighboring nodes. Is there a way to better structure this AQL query (e.g. by avoiding four separate graph queries) or avoid AQL altogether using arangojs functions or a Foxx app, while still structuring the response in the format I need for d3 (including link data with each node)?

Solution

Sorry for the late reply, we were busy building v2.8 ;) I would suggest doing as much as possible on the database side: copying and serializing/deserializing JSON over the network is typically expensive, so transferring as little data as possible is a good aim.
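As a rough illustration of keeping the round trip small, the sketch below sends a single AQL query from the client and reads back its single result object. It assumes `db` is an arangojs `Database` instance and that the query text has been changed to take the start document as a bind parameter `@docId` instead of a hard-coded `LET docId = ...`; the function name `fetchNeighborhood` is made up for this example.

```javascript
// Sketch only: one query per node expansion, one result object back.
// Assumption: `db` is an arangojs Database (or any object with the same
// `query(...)` -> cursor -> `all()` shape); `queryText` is an AQL string
// that references the bind parameter @docId.
async function fetchNeighborhood(db, queryText, docId) {
  // Bind variables keep the document id out of the query string itself.
  const cursor = await db.query({ query: queryText, bindVars: { docId: docId } });
  // Collect all rows of the cursor into an array.
  const rows = await cursor.all();
  // The queries in this thread RETURN exactly one object with the keys
  // `edges` and `vertices`, so the first row is the whole answer.
  return rows[0];
}
```

Because the whole response is built inside the database, the client makes one request per expansion and receives only the fields it will hand to d3.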

First of all, I ran your query on a sample dataset I created (~800 vertices and ~800 edges are hit in my dataset). As a baseline I used the execution time of your query, which in my case was ~5.0s.

So I tried to create the exact same result in AQL only. I found some possible improvements in your query:

1. GRAPH_NEIGHBORS is a bit faster than GRAPH_EDGES.
2. If possible, avoid {includeData: true} when you do not need the data. Especially if you only need the _id of the to/from vertices, GRAPH_NEIGHBORS with {includeData: false} outperforms GRAPH_EDGES by an order of magnitude.
3. GRAPH_NEIGHBORS is deduplicated, GRAPH_EDGES is not, which seems to be what you want in your case.
4. You can get rid of a couple of subqueries.
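To make the deduplication point concrete, here is a small plain-JavaScript sketch of the difference between an edge-style result and a neighbor-style result. It does not call ArangoDB at all; the sample edges and the helper names `outboundTargets` and `outboundNeighbors` are invented for illustration.

```javascript
// Hypothetical edge documents: two parallel edges a/1 -> b/2, one a/1 -> c/3.
const sampleEdges = [
  { _from: "a/1", _to: "b/2" },
  { _from: "a/1", _to: "b/2" },
  { _from: "a/1", _to: "c/3" }
];

// Edge-style result: one entry per edge, so b/2 shows up twice.
function outboundTargets(edges, vertexId) {
  return edges
    .filter(function (e) { return e._from === vertexId; })
    .map(function (e) { return e._to; });
}

// Neighbor-style result: each connected vertex id appears only once.
function outboundNeighbors(edges, vertexId) {
  return Array.from(new Set(outboundTargets(edges, vertexId)));
}
```

With the sample data, `outboundTargets(sampleEdges, "a/1")` has three entries while `outboundNeighbors(sampleEdges, "a/1")` has two, which is why a neighbor-based list cannot tell you how many parallel edges exist.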

So here is the pure AQL query I could come up with:

LET docId = "ExampleDocClass/1234567"
LET edges = GRAPH_EDGES('EdgeClass',docId,{direction:'any',maxDepth:1,includeData:true})
LET verticesTmp = (FOR v IN GRAPH_NEIGHBORS('EdgeClass', docId, {direction: 'any', maxDepth: 1, includeData: true})
  RETURN {
    vertexData: v,
    outEdges: GRAPH_NEIGHBORS('EdgeClass', v, {direction: 'outbound', maxDepth: 1, includeData: false}),
    inEdges: GRAPH_NEIGHBORS('EdgeClass', v, {direction: 'inbound', maxDepth: 1, includeData: false})
  })
LET vertices = PUSH(verticesTmp, {
  vertexData: DOCUMENT(docId),
  outEdges: GRAPH_NEIGHBORS('EdgeClass', docId, {direction: 'outbound', maxDepth: 1, includeData: false}),
  inEdges: GRAPH_NEIGHBORS('EdgeClass', docId, {direction: 'inbound', maxDepth: 1, includeData: false})
})
RETURN { edges, vertices }

This yields the same result format as your query and has the advantage that every vertex connected to docId is stored exactly once in vertices. docId itself is also stored exactly once in vertices, so no deduplication is required on the client side. However, in the outEdges / inEdges of each vertex, every connected vertex also appears exactly once; I do not know whether you need to know if there are multiple edges between vertices in these lists as well.
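On the client, a result of this shape still has to be turned into the nodes and links arrays from the question, with each link pointing at the actual node objects. The following is one possible sketch of that step; `toD3`, `linkCount` and the `data` fields are names chosen for this example, not part of d3 or ArangoDB.

```javascript
// Sketch: convert { edges, vertices } into d3-style nodes and links.
// Assumes every entry in `vertices` has vertexData._id plus outEdges/inEdges
// arrays, and every edge has _from/_to, as in the query result above.
function toD3(result) {
  const byId = new Map();
  const nodes = result.vertices.map(function (v) {
    const node = {
      id: v.vertexData._id,
      data: v.vertexData,
      // Number of listed connections, available before those links are loaded.
      // With deduplicated lists this counts neighbors, not parallel edges.
      linkCount: v.outEdges.length + v.inEdges.length
    };
    byId.set(node.id, node);
    return node;
  });
  // Only keep edges whose two endpoints are both among the loaded nodes,
  // and replace the id strings with references to the node objects.
  const links = result.edges
    .filter(function (e) { return byId.has(e._from) && byId.has(e._to); })
    .map(function (e) {
      return { source: byId.get(e._from), target: byId.get(e._to), data: e };
    });
  return { nodes: nodes, links: links };
}
```

Looking nodes up through a `Map` keyed by `_id` keeps the conversion linear in the number of vertices and edges, which matters when nodes have many links.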

This query uses ~0.06s on my dataset.

However, if you put some more effort into it, you could also consider a hand-crafted traversal inside a Foxx application. This is a bit more complicated but might be faster in your case, as you do fewer subqueries. The code for this could look like the following:

var traversal = require("org/arangodb/graph/traversal");
var result = {
  edges: [],
  vertices: {}
};
var myVisitor = function (config, result, vertex, path, connected) {
  switch (path.edges.length) {
    case 0:
      if (! result.vertices.hasOwnProperty(vertex._id)) {
        // If we visit a vertex, we store its data and prepare out/in
        result.vertices[vertex._id] = {
          vertexData: vertex,
          outEdges: [],
          inEdges: []
        };
      }

      // No further action
      break;
    case 1:
      if (! result.vertices.hasOwnProperty(vertex._id)) {
        // If we visit a vertex, we store its data and prepare out/in
        result.vertices[vertex._id] = {
          vertexData: vertex,
          outEdges: [],
          inEdges: []
        };
      }
      // First Depth, we need EdgeData
      var e = path.edges[0];
      result.edges.push(e);
      // We fill from / to for both vertices
      result.vertices[e._from].outEdges.push(e._to);
      result.vertices[e._to].inEdges.push(e._from);
      break;
    case 2:
      // Second Depth, we do not need EdgeData
      var e = path.edges[1];
      // We fill from / to for all vertices that exist
      if (result.vertices.hasOwnProperty(e._from)) {
        result.vertices[e._from].outEdges.push(e._to);
      }
      if (result.vertices.hasOwnProperty(e._to)) {
        result.vertices[e._to].inEdges.push(e._from);
      }
      break;
  }
};
var config = {
  datasource: traversal.generalGraphDatasourceFactory("EdgeClass"),
  strategy: "depthfirst",
  order: "preorder",
  visitor: myVisitor,
  expander: traversal.anyExpander,
  minDepth: 0,
  maxDepth: 2
};
var traverser = new traversal.Traverser(config);
traverser.traverse(result, {_id: "ExampleDocClass/1234567"});
return {
  edges: result.edges,
  vertices: Object.keys(result.vertices).map(function (key) {
              return result.vertices[key];
            })
};

The idea of this traversal is to visit all vertices up to two edges away from the start vertex. All vertices at depth 0 to 1 are added with their data to the vertices object. All edges originating from the start vertex are added with their data to the edges list. Vertices at depth 2 are not stored themselves; they only contribute to the outEdges / inEdges in the result.

This has the advantage that vertices is deduplicated, while outEdges / inEdges contain a connected vertex multiple times if there are multiple edges between them.
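The bookkeeping described above can be tried out without a database. The sketch below is a simplified in-memory analogue, not the Foxx traversal API: it works on a plain array of hypothetical edge objects, only knows vertex ids (so vertexData is just `{ _id }`), and the name `expandInMemory` is invented for this example.

```javascript
// Simplified analogue of the traversal's result building on an edge array.
function expandInMemory(allEdges, startId) {
  const vertices = {};
  const edges = [];
  function ensure(id) {
    // Store each vertex once and prepare its out/in lists.
    if (!Object.prototype.hasOwnProperty.call(vertices, id)) {
      vertices[id] = { vertexData: { _id: id }, outEdges: [], inEdges: [] };
    }
  }
  ensure(startId);
  // Depth 0 to 1: keep full edge data for edges touching the start vertex
  // and store both endpoints.
  allEdges.forEach(function (e) {
    if (e._from === startId || e._to === startId) {
      edges.push(e);
      ensure(e._from);
      ensure(e._to);
    }
  });
  // Depth 1 to 2: for every edge, fill outEdges / inEdges of stored vertices
  // only. Parallel edges are kept, so a neighbor can appear more than once.
  allEdges.forEach(function (e) {
    if (Object.prototype.hasOwnProperty.call(vertices, e._from)) {
      vertices[e._from].outEdges.push(e._to);
    }
    if (Object.prototype.hasOwnProperty.call(vertices, e._to)) {
      vertices[e._to].inEdges.push(e._from);
    }
  });
  return {
    edges: edges,
    vertices: Object.keys(vertices).map(function (key) { return vertices[key]; })
  };
}
```

Unlike the deduplicated neighbor lists from the AQL version, the lengths of outEdges and inEdges here reflect the actual number of edges, which is what you want if the UI shows a link count per node.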

This traversal executes on my dataset in ~0.025s, so it is roughly twice as fast as the AQL-only solution.

Hope this still helps ;)
