Neo4j indexing (with Lucene) - good way to organize node "types"?


Problem description

This is actually more of a Lucene question, but it's in the context of a Neo4j database.

I have a database that's divided into 50 or so node types (the equivalent of "collections" or "tables" in other kinds of databases). Each type has a subset of properties that need to be indexed; some property names are shared between types, some are not.

When searching, I always want to find nodes of a specific type, never across all nodes.

I can see three ways of organizing this:


  • One index per type, with properties mapping naturally to index fields: index 'foo', 'id'='1234'.

  • A single global index in which each field maps to a property name. To distinguish the type, either include it as part of the value ('id'='foo:1234') or check the nodes once they're returned (I expect duplicates to be very rare).

  • A single index in which the type is part of the field name: 'foo.id'='1234'.
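To make the three layouts concrete, here is a minimal sketch in Python. It models an index as a plain dictionary from (field, value) to a set of node ids. This is a toy stand-in and not the Neo4j or Lucene API; the node data and all names (`new_index`, `per_type`, `global_by_value`, `global_by_field`) are assumptions made up for illustration.

```python
from collections import defaultdict

# Toy stand-in for a Lucene-backed index: (field, value) -> set of node ids.
# This is NOT the Neo4j or Lucene API, only an illustration of the key layouts.
def new_index():
    return defaultdict(set)

# Hypothetical nodes: two types that happen to share the same 'id' value.
nodes = [
    {"node_id": 1, "type": "foo", "id": "1234"},
    {"node_id": 2, "type": "bar", "id": "1234"},
]

per_type = defaultdict(new_index)   # option 1: one index per type
global_by_value = new_index()       # option 2: type folded into the value
global_by_field = new_index()       # option 3: type folded into the field name

for n in nodes:
    per_type[n["type"]][("id", n["id"])].add(n["node_id"])
    global_by_value[("id", f'{n["type"]}:{n["id"]}')].add(n["node_id"])
    global_by_field[(f'{n["type"]}.id', n["id"])].add(n["node_id"])

# Looking up the 'foo' node with id 1234 under each layout.
hits_1 = per_type["foo"][("id", "1234")]
hits_2 = global_by_value[("id", "foo:1234")]
hits_3 = global_by_field[("foo.id", "1234")]
```

All three lookups find the same node; the layouts differ only in where the type information lives (index name, value, or field name).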

Once created, the database is read-only.

Does any one of these have benefits in terms of convenience, size/cache efficiency, or performance?

As I understand it, for the first option Neo4j will create a separate physical index for each type, which seems suboptimal. For the third, most Lucene documents end up with only a small subset of the fields; I'm not sure whether that affects anything.

Recommended answer

A single index will be smaller than several small indexes, because some data, such as the term dictionary, will be shared. However, since a term dictionary lookup is an O(lg(n)) operation, a lookup in a bigger term dictionary might be a little slower. (With 50 indexes merged into one, this would require only about 6 more comparisons, since 2^6 >= 50; you likely won't notice any difference.)
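The arithmetic behind the "6 more comparisons" estimate can be checked with a short sketch. It assumes a binary-search-style lookup and the worst case in which the merged dictionary is as large as all the separate ones combined (equal-sized indexes, no shared terms); the function name `extra_comparisons` is hypothetical.

```python
import math

def extra_comparisons(num_indexes: int) -> int:
    """Upper bound on the extra binary-search steps after merging.

    Assumes equal-sized term dictionaries with no shared terms, so the
    merged dictionary is num_indexes times larger and
    log2(k * n) - log2(n) = log2(k).
    """
    return math.ceil(math.log2(num_indexes))

steps_for_50 = extra_comparisons(50)
```

Since shared terms make the merged dictionary smaller than this worst case, the real overhead would be at most this many steps under these assumptions.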

Another advantage of a smaller index is that more of it fits in the OS cache, which is likely to make queries run faster.

Instead of your options 2 and 3, I would index two different fields, id and type, and search for (id:ID AND type:TYPE), but I don't know whether that is possible with Neo4j.
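The two-field idea can be sketched with a toy inverted index in Python. This only illustrates how an AND query over two fields narrows the result; it is not Lucene or Neo4j code, and whether Neo4j exposes such a query is left open, as in the answer. The names `postings`, `index_node`, and `search_and` are hypothetical.

```python
from collections import defaultdict

# Toy inverted index: (field, value) -> set of node ids (a "posting list").
postings = defaultdict(set)

def index_node(node_id, fields):
    """Index every field of a node as a separate term."""
    for field, value in fields.items():
        postings[(field, value)].add(node_id)

def search_and(**terms):
    """Return node ids matching ALL given field=value terms."""
    sets = [postings.get((field, value), set()) for field, value in terms.items()]
    return set.intersection(*sets) if sets else set()

# Two nodes of different types sharing the same id value.
index_node(1, {"id": "1234", "type": "foo"})
index_node(2, {"id": "1234", "type": "bar"})

# Equivalent in spirit to the query (id:1234 AND type:foo).
result = search_and(id="1234", type="foo")
```

Compared with options 2 and 3, the type stays a field of its own, so neither values nor field names need to be rewritten per type.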
