Storing relational data in a Lucene.NET index

Problem description

I'm currently trying to implement a Lucene.NET-based search on a large database, and I've hit a snag trying to search what is essentially relational data.

At a high level, the data I'm trying to search is grouped: each item belongs to one to three groups. I then need to be able to search for all items that are in a given combination of groups (e.g., every item that belongs to both group A and group B).

Each of these groups has an ID and a description in the data I'm searching, but the descriptions may be substrings of one another (e.g., one group named "Stuff" and another named "Other stuff"), and I don't want a search for one category to also match categories whose description merely contains it as a substring.

I've considered pulling the data back without this filtering and then filtering on the IDs afterwards, but I intended to paginate the results returned from Lucene for performance reasons, which filtering afterwards would undermine. I've also considered storing the IDs space-separated in a single field and doing a text search on it, but that seems like a total hack...

Does anyone have an idea how best to handle this kind of search in Lucene.NET? (To clarify before someone says I'm using the wrong tool: this is only a subset of a larger set of filters that includes full-text search. If you still think I'm using the wrong tool, though, I'd love to hear what the right one is.)

Recommended answer

I've had my share of problems storing relational data in Lucene, but the one you have should be easy to fix.

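I guess you tokenize the group fields. A tokenized field is run through the analyzer, which splits a value such as "other stuff" into the separate terms "other" and "stuff", so a search for "stuff" matches it too. Just add the field untokenized, so that the whole value is indexed as a single term, and it should work as expected.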

Please check the following small piece of code:

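using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

internal class Program {
    private static void Main(string[] args) {
        var directory = new RAMDirectory();
        var writer = new IndexWriter(directory, new StandardAnalyzer());
        // UN_TOKENIZED indexes each value verbatim as one term (no splitting, no lower-casing).
        AddDocument(writer, "group", "stuff", Field.Index.UN_TOKENIZED);
        AddDocument(writer, "group", "other stuff", Field.Index.UN_TOKENIZED);
        writer.Close(true);

        // TermQuery does not analyze its text, so it only matches the exact term "stuff".
        var searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(new TermQuery(new Term("group", "stuff")));

        for (int i = 0; i < hits.Length(); i++) {
            Console.WriteLine(hits.Doc(i).GetField("group").StringValue());
        }
        searcher.Close();
    }

    // Adds a document with a single stored field, indexed the way the caller specifies.
    private static void AddDocument(IndexWriter writer, string name, string value, Field.Index index) {
        var document = new Document();
        document.Add(new Field(name, value, Field.Store.YES, index));
        writer.AddDocument(document);
    }
}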



The sample adds two documents with an untokenized group field, searches for "stuff", and gets one hit. If you change the code to add the fields tokenized (Field.Index.TOKENIZED), you will get two hits, which is what you are seeing now.
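
To see why tokenizing produces the extra hit, here is a toy C# model. It is not Lucene's implementation, and the class ToyIndex and its methods are hypothetical: a dictionary from terms to document ids, where "tokenized" is approximated by lower-casing and splitting on whitespace instead of running StandardAnalyzer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model, not Lucene's implementation: a tiny inverted index that maps each
// term to the ids of the documents containing it.
public static class ToyIndex
{
    // Builds the index for one field. With tokenize == false the whole value is a
    // single term (like Field.Index.UN_TOKENIZED). With tokenize == true the value
    // is lower-cased and split on whitespace, a rough stand-in for an analyzer.
    public static Dictionary<string, SortedSet<int>> Build(IList<string> values, bool tokenize)
    {
        var postings = new Dictionary<string, SortedSet<int>>(StringComparer.Ordinal);
        for (int docId = 0; docId < values.Count; docId++)
        {
            IEnumerable<string> terms = tokenize
                ? values[docId].ToLowerInvariant()
                    .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                : new[] { values[docId] };

            foreach (string term in terms)
            {
                if (!postings.TryGetValue(term, out SortedSet<int> docs))
                {
                    docs = new SortedSet<int>();
                    postings[term] = docs;
                }
                docs.Add(docId);
            }
        }
        return postings;
    }

    // Exact lookup of one term, analogous to a TermQuery (the query text is not analyzed).
    public static int[] Search(Dictionary<string, SortedSet<int>> postings, string term)
    {
        return postings.TryGetValue(term, out SortedSet<int> docs)
            ? docs.ToArray()
            : Array.Empty<int>();
    }
}
```

With the values "stuff" and "other stuff", searching the untokenized index for "stuff" returns only document 0, while the tokenized index returns documents 0 and 1, mirroring the one-hit versus two-hit behaviour described above. Note that, like UN_TOKENIZED, the untokenized toy does not lower-case, so the query term must match the stored value exactly.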

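The issue with using Lucene for relational data is that one might expect wildcard and range searches to always work. That is not really the case when the index is big, because of the way Lucene resolves those queries: they are expanded into every indexed term they match, and on a big index that can be a very large number of terms.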
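
As an illustration of that expansion, here is a toy C# model; it is not Lucene code, and ToyPrefixExpansion and Expand are hypothetical names. It assumes the behaviour of Lucene 2.x-era versions like the one these samples use, where wildcard, prefix and range queries are typically rewritten into a BooleanQuery with one clause per matching term, and BooleanQuery by default refuses more than 1024 clauses by throwing BooleanQuery.TooManyClauses. The sketch only counts the matching terms and throws a stand-in exception past that limit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model, not Lucene's implementation: a prefix query is resolved by
// scanning the term dictionary and producing one OR clause per matching term.
public static class ToyPrefixExpansion
{
    // Assumption: mirrors the default BooleanQuery maximum clause count (1024)
    // of Lucene 2.x; the exception type below is a stand-in, not
    // BooleanQuery.TooManyClauses.
    public const int MaxClauseCount = 1024;

    public static List<string> Expand(IEnumerable<string> termDictionary, string prefix)
    {
        List<string> clauses = termDictionary
            .Where(term => term.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();

        // The clause count grows with the number of distinct indexed terms,
        // not with the size of the query, which is why big indexes hit the limit.
        if (clauses.Count > MaxClauseCount)
        {
            throw new InvalidOperationException(
                $"Prefix '{prefix}' expands to {clauses.Count} terms (limit {MaxClauseCount}).");
        }
        return clauses;
    }
}
```

With a term dictionary of id0 through id1999, the prefix id19 expands to 111 terms and succeeds, while id1 expands to 1,111 terms and throws. Indexing exact group IDs as untokenized terms and querying them with TermQuery, as in the samples here, avoids this expansion entirely.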

Another sample to illustrate the behavior:

using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

internal class Program {
    private static void Main(string[] args) {
        var directory = new RAMDirectory();
        var writer = new IndexWriter(directory, new StandardAnalyzer());

        // Document A belongs to both groups: the same field name can be added more than once.
        var documentA = new Document();
        documentA.Add(new Field("name", "A", Field.Store.YES, Field.Index.UN_TOKENIZED));
        documentA.Add(new Field("group", "stuff", Field.Store.YES, Field.Index.UN_TOKENIZED));
        documentA.Add(new Field("group", "other stuff", Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.AddDocument(documentA);
        var documentB = new Document();
        documentB.Add(new Field("name", "B", Field.Store.YES, Field.Index.UN_TOKENIZED));
        documentB.Add(new Field("group", "stuff", Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.AddDocument(documentB);
        var documentC = new Document();
        documentC.Add(new Field("name", "C", Field.Store.YES, Field.Index.UN_TOKENIZED));
        documentC.Add(new Field("group", "other stuff", Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.AddDocument(documentC);

        writer.Close(true);

        // Matches A and B.
        var query1 = new TermQuery(new Term("group", "stuff"));
        SearchAndDisplay("First sample", directory, query1);

        // Matches A and C.
        var query2 = new TermQuery(new Term("group", "other stuff"));
        SearchAndDisplay("Second sample", directory, query2);

        // Both clauses are MUST, so only documents in both groups match: A.
        var query3 = new BooleanQuery();
        query3.Add(new TermQuery(new Term("group", "stuff")), BooleanClause.Occur.MUST);
        query3.Add(new TermQuery(new Term("group", "other stuff")), BooleanClause.Occur.MUST);
        SearchAndDisplay("Third sample", directory, query3);
    }

    // Runs the query and prints the stored "name" field of every hit.
    private static void SearchAndDisplay(string title, Directory directory, Query query) {
        var searcher = new IndexSearcher(directory);
        Hits hits = searcher.Search(query);
        Console.WriteLine(title);
        for (int i = 0; i < hits.Length(); i++) {
            Console.WriteLine(hits.Doc(i).GetField("name").StringValue());
        }
        searcher.Close();
    }
}
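
Running this sample should print documents A and B for the first query, A and C for the second, and only A for the third (the order within each list may vary, since Hits are sorted by score). The toy C# model below, not Lucene's implementation, sketches why: each untokenized group value is one term with its own posting list, and a BooleanQuery whose clauses are all MUST returns the intersection of those lists, which is exactly the "both group A and group B" requirement from the question. ToyMustQuery and Search are hypothetical names, and postings hold document names rather than ids for readability.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model, not Lucene's implementation: a BooleanQuery whose clauses are all
// MUST returns the intersection of the posting lists of its terms.
public static class ToyMustQuery
{
    public static List<string> Search(
        Dictionary<string, SortedSet<string>> postings, params string[] mustTerms)
    {
        SortedSet<string> result = null;
        foreach (string term in mustTerms)
        {
            // A term that is not in the index matches no documents.
            SortedSet<string> docs = postings.TryGetValue(term, out SortedSet<string> found)
                ? found
                : new SortedSet<string>(StringComparer.Ordinal);

            if (result == null)
                result = new SortedSet<string>(docs, StringComparer.Ordinal); // copy, so the index is not mutated
            else
                result.IntersectWith(docs);
        }
        return result == null ? new List<string>() : result.ToList();
    }
}
```

Because each group value is matched as a whole term, "stuff" never matches "other stuff" here, and adding more MUST terms can only narrow the result.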
