How to store tree data in a Lucene/Solr/Elasticsearch index or a NoSQL db?

Question

Say that, instead of documents, I have small trees that I need to store in a Lucene index. How do I go about doing that?

An example node in the tree:

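import java.util.List;

class Node
{
    String data;         // space-separated words; needs full-text search
    String type;         // a single word
    List<Node> children; // child nodes of this node
}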

In the above node, the "data" member variable is a space-separated string of words, so it needs to be full-text searchable. The "type" member variable is just a single word.

The search query will be a tree itself and will search both the data and type in each node and also the structure of the tree for a match. Before matching against a child node, the query must first match the parent node data and type. Approximate matching on the data value is acceptable.

What's the best way to index this kind of data? If Lucene does not directly support indexing it, can this be done with Solr or Elasticsearch?

I took a quick look at neo4j, but it seems to store one entire graph in the db, not a large collection (say billions or trillions) of small tree structures. Or is my understanding wrong?

Also, would a non-Lucene-based NoSQL solution be better suited for this?

Recommended answer

Another approach is to store a representation of the current node's location in the tree. For example, the 17th leaf of the 3rd 2nd-level node of the 1st 1st-level node of the 14th tree would be represented as 014.001.003.017.
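As a rough illustration of how such paths could be generated, the sketch below walks one tree and assigns a zero-padded path to every node. The class name TreePathEncoder, the encode method, the constructor on Node, and the fixed 3-digit segment width (which caps each level at 999 siblings) are assumptions made for this example, not something the answer prescribes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TreePathEncoder {
    // Same shape as the Node class in the question, plus a constructor for convenience.
    static class Node {
        String data;
        String type;
        List<Node> children = new ArrayList<>();

        Node(String data, String type) {
            this.data = data;
            this.type = type;
        }
    }

    // Walks the tree depth-first and records a zero-padded path for every node.
    // treeNumber is the 1-based position of the tree in the collection.
    static Map<String, Node> encode(Node root, int treeNumber) {
        Map<String, Node> paths = new LinkedHashMap<>();
        walk(root, segment(treeNumber), paths);
        return paths;
    }

    private static void walk(Node node, String path, Map<String, Node> paths) {
        paths.put(path, node);
        for (int i = 0; i < node.children.size(); i++) {
            // Children are numbered from 1, matching the example in the answer.
            walk(node.children.get(i), path + "." + segment(i + 1), paths);
        }
    }

    // Assumed fixed width of 3 digits so that paths sort consistently.
    private static String segment(int n) {
        return String.format(Locale.ROOT, "%03d", n);
    }

    public static void main(String[] args) {
        Node root = new Node("root words", "doc");
        Node first = new Node("first child", "section");
        Node second = new Node("second child", "section");
        root.children.add(first);
        root.children.add(second);
        second.children.add(new Node("a leaf", "para"));

        // Print each path next to the node it identifies.
        encode(root, 14).forEach((path, node) -> System.out.println(path + " -> " + node.data));
    }
}
```

Each node would then be indexed as its own document carrying data, type and this path, so the tree structure is recoverable from the path alone.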

Assuming 'treepath' is the field name of the tree location, you would query on 'treepath:014*' to find all nodes and leaves in the 14th tree. Similarly, to find everything below the root of the 14th tree (its children and all deeper descendants), you would query on 'treepath:014.*'.
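To show what these prefix queries select, here is a small stand-in that uses a sorted set of paths instead of a real index. It is not Lucene code: TreePathPrefixSearch, withPrefix and directChildren are hypothetical names for this sketch. In Lucene itself the equivalent would presumably be a prefix query against a treepath field indexed as a single untokenized term. Note that a prefix match returns every descendant, so restricting the result to direct children needs an extra depth check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class TreePathPrefixSearch {
    // Every stored path that starts with the given prefix, in sorted order.
    static List<String> withPrefix(TreeSet<String> paths, String prefix) {
        // Paths contain only digits and dots, so all matches sort between
        // the prefix itself and the prefix followed by the largest char.
        return new ArrayList<>(paths.subSet(prefix, true, prefix + Character.MAX_VALUE, false));
    }

    // Only the direct children of parent: descendants with exactly one more segment.
    static List<String> directChildren(TreeSet<String> paths, String parent) {
        List<String> result = new ArrayList<>();
        for (String path : withPrefix(paths, parent + ".")) {
            String rest = path.substring(parent.length() + 1);
            if (!rest.contains(".")) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> paths = new TreeSet<>(List.of(
                "014", "014.001", "014.001.003", "014.001.003.017", "014.002", "015", "015.001"));
        System.out.println(withPrefix(paths, "014"));     // like treepath:014*
        System.out.println(withPrefix(paths, "014."));    // like treepath:014.*
        System.out.println(directChildren(paths, "014")); // one level below the root only
    }
}
```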

The major problem with this approach is that moving branches around requires re-ordering every branch after the branch that was moved. If your trees are relatively static, that may only be a minor problem in practice.
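The cost of a move can be made concrete with a small sketch of its first half, removing a branch. Every later sibling, together with everything beneath it, gets a new path, and in a search index each rewritten path means re-indexing that node's document. The class TreePathRenumber and the method removeBranch are hypothetical names, and the code assumes 3-digit segments and a branch that is not a whole tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TreePathRenumber {
    // Removes the branch rooted at `removed` and shifts its later siblings
    // (with their descendants) down by one. Assumes 3-digit segments and
    // that `removed` is not a whole tree (it must contain a dot).
    static List<String> removeBranch(List<String> paths, String removed) {
        int lastDot = removed.lastIndexOf('.');
        if (lastDot < 0) {
            throw new IllegalArgumentException("expected a path below a tree root: " + removed);
        }
        String parent = removed.substring(0, lastDot);
        int removedIndex = Integer.parseInt(removed.substring(lastDot + 1));

        List<String> result = new ArrayList<>();
        for (String path : paths) {
            if (path.equals(removed) || path.startsWith(removed + ".")) {
                continue; // part of the removed branch
            }
            if (!path.startsWith(parent + ".")) {
                result.add(path); // outside the affected parent, unchanged
                continue;
            }
            String rest = path.substring(parent.length() + 1);
            int end = rest.indexOf('.');
            String segment = end < 0 ? rest : rest.substring(0, end);
            int index = Integer.parseInt(segment);
            if (index > removedIndex) {
                // A later sibling (or something below it): rewrite one segment.
                String shifted = String.format(Locale.ROOT, "%03d", index - 1);
                result.add(parent + "." + shifted + rest.substring(segment.length()));
            } else {
                result.add(path); // an earlier sibling, unchanged
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
                "014", "014.001", "014.001.001", "014.002", "014.002.001", "014.003");
        // Removing 014.001 forces 014.002, 014.002.001 and 014.003 to be renumbered.
        System.out.println(removeBranch(paths, "014.001"));
    }
}
```

Re-attaching the branch elsewhere would need the mirror-image shift at the destination, so a single move can touch two runs of siblings.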

(I have seen this approach called path enumeration or a 'Dewey Decimal' representation.)
