Neo4j indexing (with Lucene) - good way to organize node "types"?
Question
This is actually more of a Lucene question, but it's in the context of a Neo4j database.
I have a database that's divided into 50 or so node types (the equivalent of "collections" or "tables" in other kinds of databases). Each type has a subset of properties that need to be indexed; some of these properties share the same name across types, some don't.
When searching, I always want to find nodes of a specific type, never across all nodes.
I can see three ways of organizing this:
1. One index per type, with properties mapping naturally to index fields: index 'foo', 'id'='1234'.
2. A single global index in which each field maps to a property name. To distinguish the type, either include it as part of the value ('id'='foo:1234') or check the nodes once they're returned (I expect duplicates to be very rare).
3. A single index in which the type is part of the field name: 'foo.id'='1234'.
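To make the three layouts concrete, here is a minimal Python sketch of where one indexed property of a node ends up under each option, as an (index name, field, term) triple. The function names and the index name "global" are hypothetical labels for illustration, not Neo4j or Lucene API; the 'foo', 'id' and '1234' values are the ones used in the list.

```python
from typing import Tuple

# Hypothetical helpers: each returns (index_name, field, term) for one
# indexed property of a node of the given type.

def option1_per_type_index(node_type: str, prop: str, value: str) -> Tuple[str, str, str]:
    # One index per type; the property name is used unchanged as the field.
    return (node_type, prop, value)

def option2_type_in_value(node_type: str, prop: str, value: str) -> Tuple[str, str, str]:
    # One shared index ("global" is a made-up name); the type prefixes the value.
    return ("global", prop, f"{node_type}:{value}")

def option3_type_in_field(node_type: str, prop: str, value: str) -> Tuple[str, str, str]:
    # One shared index; the type prefixes the field name instead.
    return ("global", f"{node_type}.{prop}", value)

for layout in (option1_per_type_index, option2_type_in_value, option3_type_in_field):
    print(layout.__name__, layout("foo", "id", "1234"))
```

Option 2 multiplies the number of distinct terms in a field, while option 3 multiplies the number of distinct field names; option 1 keeps both small but splits them across separate indexes.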
Once created, the database is read-only.
Are there any benefits to one of those, in terms of convenience, size/cache efficiency, or performance?
As I understand it, for the first option Neo4j will create a separate physical index for each type, which seems suboptimal. For the third, most Lucene documents would end up with only a small subset of the fields, and I'm not sure whether that affects anything.
Recommended answer
A single index will be smaller than several small indexes, because some data, such as the term dictionary, will be shared. However, since a term dictionary lookup is an O(log n) operation, a lookup in a bigger term dictionary might be a little slower. (If you merge 50 indexes into one, the merged dictionary has at most 50 times as many terms as the largest per-type dictionary, so a lookup needs at most 6 more comparisons, because 2^6 = 64 >= 50. You are unlikely to notice any difference.)
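The "at most 6 more comparisons" figure can be checked with a small Python sketch that counts probes in a plain binary search. This models the term dictionary as a sorted array of hypothetical integer terms; Lucene's real terms index is a different structure, so this only illustrates the logarithmic arithmetic, not Lucene's actual cost. The dictionary size of 1,000 terms per type is an arbitrary assumption.

```python
def probes_to_find(sorted_terms, target) -> int:
    """Count how many terms a plain binary search compares against target."""
    lo, hi = 0, len(sorted_terms) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_terms[mid] == target:
            return probes
        if sorted_terms[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes  # target absent

def worst_case_probes(n: int) -> int:
    """Worst case for a successful search over n terms: floor(log2 n) + 1."""
    return n.bit_length()

TERMS_PER_TYPE = 1_000  # hypothetical size of one per-type term dictionary
NUM_TYPES = 50

small = list(range(TERMS_PER_TYPE))
merged = list(range(TERMS_PER_TYPE * NUM_TYPES))

# Measure the worst case empirically by searching for every term.
worst_small = max(probes_to_find(small, t) for t in small)
worst_merged = max(probes_to_find(merged, t) for t in merged)
print(worst_small, worst_merged, worst_merged - worst_small)  # 10 16 6
```

Growing the dictionary 50-fold adds log2(50), about 5.6, to the depth of the search, which rounds up to the 6 extra comparisons mentioned in the answer.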
Another advantage of a smaller index is that more of it fits in the OS cache, which is likely to make queries run faster.
Instead of your options 2 and 3, I would index two different fields, id and type, and search for (id:ID AND type:TYPE), but I don't know whether that is possible with Neo4j.
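The two-field approach can be illustrated with a toy in-memory inverted index in Python, where an AND query intersects the posting sets of its field:term clauses, in the spirit of a Lucene query such as id:1234 AND type:foo. This is a minimal sketch, not Lucene's or Neo4j's API; the ToyIndex class, its methods and the sample documents are hypothetical.

```python
from collections import defaultdict

class ToyIndex:
    """A tiny inverted index mapping (field, term) -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        # Index every field of the document as a separate (field, term) posting.
        for field, term in fields.items():
            self.postings[(field, term)].add(doc_id)

    def query_and(self, **clauses):
        # Conjunctive query: intersect the postings of every field=term clause.
        result = None
        for field, term in clauses.items():
            docs = self.postings.get((field, term), set())
            result = set(docs) if result is None else result & docs
        return result or set()

index = ToyIndex()
index.add(1, {"type": "foo", "id": "1234"})
index.add(2, {"type": "bar", "id": "1234"})  # same id value, different type
index.add(3, {"type": "foo", "id": "5678"})

print(index.query_and(id="1234", type="foo"))  # {1}
```

Because the type lives in its own field, the id field keeps its plain values and a single field name shared by all types, and the type clause filters the matches without any prefixing of fields or values.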