How to perform pagination in Gremlin


Question

In TinkerPop 3, how do I perform pagination? I want to fetch the first 10 elements of a query, then the next 10, without having to load them all into memory. For example, the query below matches 1,000,000 records. I want to fetch them 10 at a time without loading all 1,000,000 at once.

g.V().has("key", value).limit(10)

Edit

A solution that works through HttpChannelizer on Gremlin Server would be ideal.

Recommended answer

From a functional perspective, a nice-looking bit of Gremlin for paging would be:

gremlin> g.V().hasLabel('person').fold().as('persons','count').
               select('persons','count').
                 by(range(local, 0, 2)).
                 by(count(local))
==>[persons:[v[1],v[2]],count:4]
gremlin> g.V().hasLabel('person').fold().as('persons','count').
               select('persons','count').
                 by(range(local, 2, 4)).
                 by(count(local))
==>[persons:[v[4],v[6]],count:4]

In this way you get the total count of vertices along with the page of results. Unfortunately, fold() forces you to iterate all the vertices in order to count them, which brings them all into memory.
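To make that cost concrete, here is a minimal sketch in plain Python rather than Gremlin. The generator `counted_source` and the function `page_with_total` are hypothetical stand-ins (not TinkerPop APIs): the generator plays the role of the traversal and records every element it hands out, and the function mimics the fold-then-slice query above.

```python
def counted_source(n, pulled):
    """Hypothetical stand-in for a traversal: yields 0..n-1, logging each pull."""
    for i in range(n):
        pulled.append(i)
        yield i


def page_with_total(n, low, high):
    """Mimic fold() followed by range(local, low, high) and count(local)."""
    pulled = []
    everything = list(counted_source(n, pulled))  # like fold(): materialize all
    page = {"persons": everything[low:high], "count": len(everything)}
    return page, len(pulled)


page, cost = page_with_total(4, 0, 2)
print(page, "elements iterated:", cost)
```

With four elements the page holds two of them, yet all four were iterated and held in a list at once. That is the trade-off of asking for the total count together with each page.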

There really is no way to avoid iterating every matching vertex in this case as long as you intend to page by executing multiple separate traversals. For example:

gremlin> g.V().hasLabel('person').range(0,2)
==>v[1]
==>v[2]
gremlin> g.V().hasLabel('person').range(2,4)
==>v[4]
==>v[6]

The first statement is the same as if you'd terminated the traversal with limit(2). The second traversal only wants the second pair of vertices, but it is not as though you magically skip iterating the first two, because it is a new traversal. I'm not aware of any TinkerPop graph database implementation that will do that efficiently; they all have that behavior.
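The re-iteration can be illustrated with a small Python analogy (not Gremlin, and not a measurement of any particular graph database). `counted_source` and `page_by_range` are hypothetical names: each call builds a fresh generator, the way each request builds a fresh traversal, and `itertools.islice` plays the role of range(low, high).

```python
from itertools import islice


def counted_source(n, pulled):
    """Hypothetical stand-in for a fresh traversal: logs every element pulled."""
    for i in range(n):
        pulled.append(i)
        yield i


def page_by_range(n, low, high):
    """Mimic range(low, high) applied to a brand-new traversal."""
    pulled = []
    page = list(islice(counted_source(n, pulled), low, high))
    return page, len(pulled)


first_page, first_cost = page_by_range(1000, 0, 2)
second_page, second_cost = page_by_range(1000, 2, 4)
print(first_page, first_cost)
print(second_page, second_cost)
```

The second page still pulls the two skipped elements before it reaches the ones it returns, so the cost of each page grows with its offset.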

The only way to do ten vertices at a time without having them all in memory is to use the same Traversal instance as in:

gremlin> t = g.V().hasLabel('person');[]
gremlin> t.next(2)
==>v[1]
==>v[2]
gremlin> t.next(2)
==>v[4]
==>v[6]

With that model you only iterate the vertices once and don't bring them all into memory at a single point in time.
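The same idea in a plain Python sketch, again as an analogy rather than TinkerPop code: `counted_source` stands in for the traversal and the hypothetical helper `next_page` for t.next(n). The point is that one live iterator is kept between pages.

```python
from itertools import islice


def counted_source(n, pulled):
    """Hypothetical stand-in for a traversal: logs every element pulled."""
    for i in range(n):
        pulled.append(i)
        yield i


def next_page(iterator, size):
    """Rough analogue of t.next(size): take up to `size` items from a live iterator."""
    return list(islice(iterator, size))


pulled = []
t = counted_source(1000, pulled)  # created once, reused for every page
page1 = next_page(t, 2)
page2 = next_page(t, 2)
print(page1, page2, "elements iterated:", len(pulled))
```

Two pages cost four pulls in total, and only one page is held at a time. The catch is that the iterator has to stay alive between requests, which is the difficulty with a stateless HTTP endpoint.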

Some other thoughts on this topic can be found in this blog post.
