How to store tree data in a Lucene/Solr/Elasticsearch index or a NoSQL db?

Question

Say that, instead of documents, I have small trees that I need to store in a Lucene index. How do I go about doing that?

An example node in the tree:

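import java.util.List;

class Node
{
    String data;          // space-separated words; must be full-text searchable
    String type;          // a single word
    List<Node> children;  // child nodes of this node
}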

In the above node the "data" member variable is a space separated string of words, so that needs to be full-text searchable. The "type" member variable is just a single word.

The search query will be a tree itself and will search both the data and type in each node and also the structure of the tree for a match. Before matching against a child node, the query must first match the parent node data and type. Approximate matching on the data value is acceptable.
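
One possible reading of these matching rules, sketched in plain Java. The names TreeMatchSketch, matches and dataMatches are hypothetical, and word containment is used as a stand-in for "approximate" matching purely as an assumption. Child order is ignored and two query children may match the same candidate child, since the question does not say otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TreeMatchSketch {
    // Hypothetical node type mirroring the Node class from the question.
    static class Node {
        final String data;
        final String type;
        final List<Node> children = new ArrayList<>();

        Node(String data, String type) {
            this.data = data;
            this.type = type;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Assumed stand-in for approximate matching: every word of the query
    // data must occur somewhere in the candidate data (case-insensitive).
    static boolean dataMatches(String queryData, String candidateData) {
        Set<String> words = new HashSet<>(
                Arrays.asList(candidateData.toLowerCase(Locale.ROOT).split("\\s+")));
        for (String word : queryData.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty() && !words.contains(word)) {
                return false;
            }
        }
        return true;
    }

    // The parent must match on type and data before any child is examined;
    // each query child must then match at least one candidate child.
    static boolean matches(Node query, Node candidate) {
        if (!query.type.equals(candidate.type) || !dataMatches(query.data, candidate.data)) {
            return false;
        }
        for (Node queryChild : query.children) {
            boolean found = false;
            for (Node candidateChild : candidate.children) {
                if (matches(queryChild, candidateChild)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node stored = new Node("quick brown fox", "para")
                .add(new Node("lazy dog", "sentence"));
        Node query = new Node("brown", "para")
                .add(new Node("dog", "sentence"));
        System.out.println(matches(query, stored));
    }
}
```

This only pins down the semantics in memory; it says nothing about how an index would evaluate such a query efficiently.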

What's the best way to index this kind of data? If Lucene does not directly support indexing these data then can this be done by Solr or Elasticsearch?

I took a quick look at neo4j, but it seems to store one entire graph in the db rather than a large collection (say billions or trillions) of small tree structures. Or is my understanding wrong?

Also, is a non-Lucene-based NoSQL solution better suited for this?

Recommended answer

Another approach is to store a representation of the current node's location in the tree. For example, the 17th leaf under the 3rd second-level node under the 1st first-level node of the 14th tree would be represented as 014.001.003.017.
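
A minimal sketch of building such a path string in Java. The class and method names (TreePathEncoder, encode) are hypothetical, and the fixed three-digit, zero-padded segment width is an assumption taken from the example above.

```java
import java.util.Locale;
import java.util.StringJoiner;

public class TreePathEncoder {
    // Joins 1-based positions (tree number first, then the position at each
    // level) into a dot-separated path of zero-padded three-digit segments.
    static String encode(int... positions) {
        StringJoiner path = new StringJoiner(".");
        for (int position : positions) {
            if (position < 0 || position > 999) {
                // Three digits is an assumed width; wider trees need more digits.
                throw new IllegalArgumentException("position out of range: " + position);
            }
            path.add(String.format(Locale.ROOT, "%03d", position));
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // 14th tree, 1st first-level node, 3rd second-level node, 17th leaf
        System.out.println(encode(14, 1, 3, 17)); // prints 014.001.003.017
    }
}
```

Keeping every segment the same width means a plain string prefix lines up with a tree prefix, so 014 cannot accidentally match a tree numbered 0140.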

Assuming 'treepath' is the field name of the tree location, you would query on 'treepath:014*' to find all nodes and leaves in the 14th tree. Similarly, to find everything below the root of the 14th tree (its children and their descendants), you would query on 'treepath:014.*'.
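
The difference between the two queries can be illustrated in plain Java without any search library. This sketch assumes that 'treepath' is indexed as a single untokenized term, so a trailing-wildcard query behaves like a string prefix test; TreePathPrefixDemo and matchPrefix are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TreePathPrefixDemo {
    // Returns the stored paths that start with the given prefix, mimicking
    // what a prefix query such as treepath:014* would select.
    static List<String> matchPrefix(List<String> paths, String prefix) {
        return paths.stream()
                .filter(path -> path.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "014", "014.001", "014.001.003", "014.001.003.017",
                "015", "015.001");

        // Like treepath:014* : the root of tree 14 plus everything below it.
        System.out.println(matchPrefix(paths, "014"));
        // prints [014, 014.001, 014.001.003, 014.001.003.017]

        // Like treepath:014.* : everything below the root, but not the root.
        System.out.println(matchPrefix(paths, "014."));
        // prints [014.001, 014.001.003, 014.001.003.017]
    }
}
```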

The major problem with this approach is that moving branches around requires re-ordering every branch after the branch that was moved. If your trees are relatively static, that may only be a minor problem in practice.
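
To make the cost concrete, here is a rough sketch of the path rewrite that renumbering one branch implies. TreePathRebase and rebase are hypothetical names. In a real index, each rewritten path would probably also mean re-indexing the corresponding document, which is an assumption about the storage layer rather than something shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TreePathRebase {
    // Rewrites the branch rooted at oldPrefix, and everything below it, so
    // that it is rooted at newPrefix instead. Other paths are left untouched.
    static List<String> rebase(List<String> paths, String oldPrefix, String newPrefix) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            // Match the branch root itself or a descendant; the "." check
            // keeps a path like 014.0031 from matching the prefix 014.003.
            if (path.equals(oldPrefix) || path.startsWith(oldPrefix + ".")) {
                result.add(newPrefix + path.substring(oldPrefix.length()));
            } else {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Branch 014.002 has been moved elsewhere, so 014.003 must shift down.
        List<String> paths = Arrays.asList(
                "014.001", "014.003", "014.003.001", "014.003.002");
        System.out.println(rebase(paths, "014.003", "014.002"));
        // prints [014.001, 014.002, 014.002.001, 014.002.002]
    }
}
```

Every later sibling of the moved branch needs the same treatment, which is why the cost grows with the number of branches that follow it.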

(I have seen this approach called a "path enumeration" or "Dewey decimal" representation.)
