Lucene.NET: How to use BlockJoinQuery?

Question

I am trying to do a relational search with Lucene.NET 4.8 (actually, I compiled it from the latest sources) by following this post. I reference Lucene.Net, Lucene.Net.Analysis.Common, Lucene.Net.Grouping, Lucene.Net.Join, and Lucene.Net.QueryParser.

The problem is that I do not get any results. In my example below, blog is the parent and comments are the children. I want to find a blog that contains first and has a comment containing like (that is, the blog with Id 1).

How can I fix this sample code?

    static void BlockJoinQueryTest(string dbFolder)
    {
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
        config.SetOpenMode(IndexWriterConfig.OpenMode_e.CREATE_OR_APPEND);

        var indexPathBlog = dbFolder + "\\blog_db";
        if (System.IO.Directory.Exists(indexPathBlog))
        {
            System.IO.Directory.Delete(indexPathBlog, true);
        }
        System.IO.Directory.CreateDirectory(indexPathBlog);
        var indexDirectoryBlog = FSDirectory.Open(new System.IO.DirectoryInfo(indexPathBlog));
        var indexWriterBlog = new IndexWriter(indexDirectoryBlog, config);

        Document comment = new Document();
        comment.Add(new TextField("BlogId", "1", Field.Store.YES));
        comment.Add(new TextField("CommentContent", "I like your first blog!", Field.Store.YES));
        comment.Add(new TextField("Type", "comment", Field.Store.YES));
        comment.Add(new TextField("Note", "child", Field.Store.YES));
        indexWriterBlog.AddDocument(comment);

        comment = new Document();
        comment.Add(new TextField("BlogId", "1", Field.Store.YES));
        comment.Add(new TextField("CommentContent", "Not that great.", Field.Store.YES));
        comment.Add(new TextField("Type", "comment", Field.Store.YES));
        comment.Add(new TextField("Note", "child", Field.Store.YES));
        indexWriterBlog.AddDocument(comment);

        Document blog = new Document();
        blog.Add(new TextField("Id", "1", Field.Store.YES));
        blog.Add(new TextField("BlogContent", "Content of first blog", Field.Store.YES));
        blog.Add(new TextField("Type", "blog", Field.Store.YES));
        blog.Add(new TextField("Note", "parent", Field.Store.YES));
        indexWriterBlog.AddDocument(blog);

        blog = new Document();
        blog.Add(new TextField("Id", "2", Field.Store.YES));
        blog.Add(new TextField("BlogContent", "This is the second blog!", Field.Store.YES));
        blog.Add(new TextField("Type", "blog", Field.Store.YES));
        blog.Add(new TextField("Note", "parent", Field.Store.YES));
        indexWriterBlog.AddDocument(blog);

        indexWriterBlog.Commit();

        var searcher = new IndexSearcher(DirectoryReader.Open(indexDirectoryBlog));

        Console.WriteLine("Begin content enumeration:");
        for (int i = 0; i < searcher.IndexReader.MaxDoc; i++)
        {
            var doc = searcher.IndexReader.Document(i);
            Console.WriteLine("Document " + i + ": " + doc.ToString());
        }
        Console.WriteLine("End content enumeration.");

        Filter blogs = new CachingWrapperFilter(
                new QueryWrapperFilter(
                  new TermQuery(
                    new Term("Type", "blog"))));
        BooleanQuery commentQuery = new BooleanQuery();
        commentQuery.Add(new TermQuery(new Term("CommentContent", "like")), BooleanClause.Occur.MUST);
        //commentQuery.Add(new TermQuery(new Term("BlogId", "1")), BooleanClause.Occur.MUST);

        var commentJoinQuery = new ToParentBlockJoinQuery(
            commentQuery,
            blogs,
            ScoreMode.None);

        BooleanQuery query = new BooleanQuery();
        query.Add(new TermQuery(new Term("BlogContent", "first")), BooleanClause.Occur.MUST);
        query.Add(commentQuery, BooleanClause.Occur.MUST);
        var c = new ToParentBlockJoinCollector(
            Sort.RELEVANCE, // sort
            10,             // numHits
            true,           // trackScores
            false           // trackMaxScore
            );
        searcher.Search(query, c);
        int maxDocsPerGroup = 10;
        var hits = c.GetTopGroups(
            commentJoinQuery,
            Sort.INDEXORDER,
            0,   // offset
            maxDocsPerGroup,  // maxDocsPerGroup
            0,   // withinGroupOffset
            true // fillSortFields
          );
        if (hits != null)
        {
            Console.WriteLine("Found " + hits.TotalGroupCount + " groups:");
            for (int i = 0; i < hits.TotalGroupCount; i++)
            {
                var group = hits.Groups[i];
                Console.WriteLine("Group " + i + ": " + group.ToString());

                for (int j = 0; j < group.TotalHits && j < maxDocsPerGroup; j++)
                {
                    Document doc = searcher.Doc(group.ScoreDocs[j].Doc);
                    Console.WriteLine("Hit " + i + ": " + doc.ToString());
                }
            }
        }
        else
        {
            Console.WriteLine("No hits.");
        }

        Console.WriteLine("Done.");
    }

Recommended answer

I also stumbled across this and managed to fix it.

  • @Ant is correct in stating that the parent document must be the last document in its block.

But there were two remaining problems with the code:

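  1. For some reason (sorry, I am not a Lucene expert), when CommentContent is a sentence ("I like your first blog!") and you search it with a TermQuery, you do not get any results. I suppose this has to do with the analysis of that field. So what I did was use just "blog" as the comment content.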

  2. Now the IndexSearcher seemed to find a result, but it threw an error: "System.InvalidOperationException: 'parentFilter must return FixedBitSet; got Lucene.Net.Search.QueryWrapperFilter+DocIdSetAnonymousInnerClassHelper'". Looking through the test cases of lucene.net (GitHub), I saw that I had to wrap the parentQuery in a FixedBitSetCachingWrapperFilter:

    Filter parentQuery = new FixedBitSetCachingWrapperFilter(
        new QueryWrapperFilter(
            new TermQuery(
                new Term("Type", "blog"))));
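
One way to check the analysis guess from the first problem is to print the tokens the analyzer produces for the comment text. A TermQuery is not analyzed, so its term has to equal an indexed token exactly. The following is only a sketch: the class name AnalysisCheck is made up for illustration, and the namespaces are assumed to match recent Lucene.NET 4.8 betas.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes; // assumption: namespace as in recent 4.8 betas
using Lucene.Net.Util;

// Hypothetical helper, not part of the original answer.
static class AnalysisCheck
{
    static void Main()
    {
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);

        // Run the comment text through the same analyzer the IndexWriter uses.
        using (TokenStream stream = analyzer.GetTokenStream("CommentContent", "I like your first blog!"))
        {
            var term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                // Each printed line is one term as it is stored in the index.
                Console.WriteLine(term.ToString());
            }
            stream.End();
        }
    }
}
```

If like shows up in the output, a TermQuery on CommentContent for like can match that comment, and the missing results more likely come from the block layout or the parent filter than from analysis.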

The complete code is:

            var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
            var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
            config.SetOpenMode(OpenMode.CREATE_OR_APPEND);

            var indexPathBlog = Path.Combine(Environment.CurrentDirectory, "index");
            if (System.IO.Directory.Exists(indexPathBlog))
            {
                System.IO.Directory.Delete(indexPathBlog, true);
            }

            System.IO.Directory.CreateDirectory(indexPathBlog);
            var indexDirectoryBlog = FSDirectory.Open(new System.IO.DirectoryInfo(indexPathBlog));
            var indexWriterBlog = new IndexWriter(indexDirectoryBlog, config);

            var one = new List<Document>();
            var two = new List<Document>();


            Document commentOne = new Document();
            commentOne.Add(new TextField("BlogId", "1", Field.Store.YES));
            commentOne.Add(new TextField("CommentContent", "blog", Field.Store.YES));
            commentOne.Add(new TextField("Type", "comment", Field.Store.YES));
            commentOne.Add(new TextField("Note", "child", Field.Store.YES));
            one.Add(commentOne);

            var blogOne = new Document();
            blogOne.Add(new TextField("Id", "1", Field.Store.YES));
            blogOne.Add(new TextField("BlogContent", "Content of first blog!", Field.Store.YES));
            blogOne.Add(new TextField("Type", "blog", Field.Store.NO));
            blogOne.Add(new TextField("Note", "parent", Field.Store.YES));
            one.Add(blogOne);

            var commentTwo = new Document();
            commentTwo.Add(new TextField("BlogId", "2", Field.Store.YES));
            commentTwo.Add(new TextField("CommentContent", "Not that great.", Field.Store.YES));
            commentTwo.Add(new TextField("Type", "comment", Field.Store.YES));
            commentTwo.Add(new TextField("Note", "child", Field.Store.YES));
            two.Add(commentTwo);

            Document blogTwo = new Document();
            blogTwo.Add(new TextField("Id", "2", Field.Store.YES));
            blogTwo.Add(new TextField("BlogContent", "This is the second blog!", Field.Store.YES));
            blogTwo.Add(new TextField("Type", "blog", Field.Store.NO));
            blogTwo.Add(new TextField("Note", "parent", Field.Store.YES));
            two.Add(blogTwo);

            indexWriterBlog.AddDocuments(one);
            indexWriterBlog.AddDocuments(two);

            indexWriterBlog.Commit();

            var searcher = new IndexSearcher(DirectoryReader.Open(indexDirectoryBlog));

            Filter parentQuery =
                new FixedBitSetCachingWrapperFilter(
                    new QueryWrapperFilter(
                        new TermQuery(
                            new Term("Type", "blog"))));

            BooleanQuery childQuery = new BooleanQuery();
            childQuery.Add(new TermQuery(new Term("CommentContent", "blog")), Occur.MUST);

            var commentJoinQuery = new ToParentBlockJoinQuery(
                childQuery,
                parentQuery,
                ScoreMode.None);

            BooleanQuery query = new BooleanQuery();
            //query.Add(new TermQuery(new Term("Type", "blog")), BooleanClause.Occur.MUST);
            query.Add(commentJoinQuery, Occur.MUST);

            var c = new ToParentBlockJoinCollector(
                Sort.RELEVANCE, // sort
                10,             // numHits
                false,           // trackScores
                false           // trackMaxScore
            );

            searcher.Search(commentJoinQuery, c);

            int maxDocsPerGroup = 10;
            var hits = c.GetTopGroups(
                commentJoinQuery,
                Sort.INDEXORDER,
                0,   // offset
                maxDocsPerGroup,  // maxDocsPerGroup
                0,   // withinGroupOffset
                true // fillSortFields
            );

            if (hits != null)
            {
                Console.WriteLine("Found " + hits.TotalGroupCount + " groups:");
                for (int i = 0; i < hits.TotalGroupCount; i++)
                {
                    var group = hits.Groups[i];
                    Console.WriteLine("Group " + i + ": " + group.ToString());

                    for (int j = 0; j < group.TotalHits && j < maxDocsPerGroup; j++)
                    {
                        Document doc = searcher.Doc(group.ScoreDocs[j].Doc);
                        Console.WriteLine("Hit " + j + ": " + doc.ToString());
                    }
                }
            }
            else
            {
                Console.WriteLine("No hits.");
            }

            Console.WriteLine("Done.");
            Console.ReadKey();
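
The code above only runs the join query, while the question asked for a blog that contains first and has a comment containing like. The sketch below shows one way the two conditions might be combined. It makes several assumptions that are not in the original answer: the class name CombinedJoinSketch and its helper methods are made up, a RAMDirectory replaces the on-disk index, Type is indexed as a StringField, and the join types are taken from the Lucene.Net.Join namespace (later 4.8 betas are assumed to use Lucene.Net.Search.Join).

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Join; // assumption: Lucene.Net.Search.Join in later 4.8 betas
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical sketch, not part of the original answer.
static class CombinedJoinSketch
{
    static Document Comment(string text)
    {
        var doc = new Document();
        doc.Add(new TextField("CommentContent", text, Field.Store.YES));
        doc.Add(new StringField("Type", "comment", Field.Store.NO));
        return doc;
    }

    static Document Blog(string id, string text)
    {
        var doc = new Document();
        doc.Add(new StringField("Id", id, Field.Store.YES));
        doc.Add(new TextField("BlogContent", text, Field.Store.YES));
        doc.Add(new StringField("Type", "blog", Field.Store.NO));
        return doc;
    }

    static void Main()
    {
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        using (var dir = new RAMDirectory())
        {
            using (var writer = new IndexWriter(dir, new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)))
            {
                // One AddDocuments call per block: children first, parent last.
                writer.AddDocuments(new List<Document>
                {
                    Comment("I like your first blog!"),
                    Comment("Not that great."),
                    Blog("1", "Content of first blog")
                });
                writer.AddDocuments(new List<Document>
                {
                    Comment("Nothing to see here."),
                    Blog("2", "This is the second blog!")
                });
                writer.Commit();
            }

            using (var reader = DirectoryReader.Open(dir))
            {
                var searcher = new IndexSearcher(reader);

                // The filter identifies parent documents and must yield a FixedBitSet.
                Filter parents = new FixedBitSetCachingWrapperFilter(
                    new QueryWrapperFilter(new TermQuery(new Term("Type", "blog"))));

                // Child condition, lifted to the parent of each matching comment.
                var join = new ToParentBlockJoinQuery(
                    new TermQuery(new Term("CommentContent", "like")),
                    parents,
                    ScoreMode.None);

                // Parent condition AND joined child condition.
                var query = new BooleanQuery();
                query.Add(new TermQuery(new Term("BlogContent", "first")), Occur.MUST);
                query.Add(join, Occur.MUST);

                TopDocs top = searcher.Search(query, 10);
                foreach (ScoreDoc hit in top.ScoreDocs)
                {
                    Console.WriteLine("Blog Id: " + searcher.Doc(hit.Doc).Get("Id"));
                }
            }
        }
    }
}
```

Note that the join query itself goes into the BooleanQuery, not the bare child query. The question's code added commentQuery to the outer query, and a child-only clause can never match on the same document as a clause on a parent field.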

Note that I used the following packages in a .NET Core console app:

<PackageReference Include="Lucene.Net" Version="4.8.0-beta00005" />
<PackageReference Include="Lucene.Net.Analysis.Common" Version="4.8.0-beta00005" />
<PackageReference Include="Lucene.Net.Join" Version="4.8.0-beta00005" />
