Sorting a String field alphabetically in Lucene 5.0

Problem description

I'm having issues sorting on string fields in Lucene 5.0. Apparently the way sorting works has changed since Lucene 4. Below is a snippet showing some of the fields that are being indexed for my documents.

@Override
public Document generateDocument(Process entity)
{
    Document doc = new Document();
    doc.add(new IntField(id, entity.getID(), Field.Store.YES));
    doc.add(new TextField(title, entity.getProcessName(), Field.Store.YES));
    doc.add(new IntField(organizationID, entity.getOrganizationID(), Field.Store.YES));
    doc.add(new StringField(versionDate, DateTools.dateToString(entity.getVersionDate(), DateTools.Resolution.SECOND), Field.Store.YES));
    doc.add(new LongField(entityDate, entity.getVersionDate().getTime(), Field.Store.YES)); 
    return doc;
}

I would like to sort on relevance first, which works just fine. The issue I have is that sorting on the title field doesn't work. I've created a SortField, which I'm trying to use with a TopFieldCollector after a chain of method calls.

public BaseSearchCore<Process, ProcessSearchResultScore>.SearchContainer search(String searchQuery, Filter filter, int page, int hitsPerPage) throws IOException, ParseException
    {
        // Sort on the title field as a string, in reverse order (third argument).
        SortField titleSort = new SortField(title, SortField.Type.STRING, true);
        return super.search(searchQuery, filter, page, hitsPerPage, titleSort);
    }

Which goes to:

public SearchContainer search(String searchQuery, Filter filter, int page, int hitsPerPage, SortField... sortfields) throws IOException, ParseException 
    {
        Query query = getQuery(searchQuery);
        TopFieldCollector paginate = getCollector(sortfields);
        int startIndex = (page -1) * hitsPerPage;
        ScoreDoc[] hits = executeSearch(query, paginate, filter, startIndex, hitsPerPage);

        return collectResults(query, filter, hitsPerPage, hits, page);
    }

And finally, the method that applies the sort field:

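private TopFieldCollector getCollector(SortField... sortFields) throws IOException
    {
        // Relevance first, then the caller-supplied sort fields as tie-breakers.
        SortField[] allFields = new SortField[sortFields.length + 1];
        allFields[0] = SortField.FIELD_SCORE;
        System.arraycopy(sortFields, 0, allFields, 1, sortFields.length);
        Sort sorter = new Sort(allFields);
        TopFieldCollector collector = TopFieldCollector.create(sorter, 25000, true, false, true);
        return collector;
    }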

Using the returned collector, a regular query is performed and a result is returned. However, if I try to sort with this SortField, I get this exception:

java.lang.IllegalStateException: unexpected docvalues type NONE for field 'title' (expected=SORTED). Use UninvertingReader or index with docvalues.

How am I supposed to index a string field so that I can sort on it alphabetically (using SortFields) in Lucene 5? Any code examples or snippets would be much appreciated.

Searching by relevancy works just fine, but when users enter empty search queries, all the results have the same relevancy. For those queries I'd rather sort by the results' titles, which is what causes issues in this iteration of Lucene.

Recommended answer

A note: it's much easier to track down bugs (both for yourself and for the people you're asking) if you first boil the problem down to the smallest example you can. Rather than sort through your architecture and classes that I don't have access to or know anything about, I'll address the problem as reproduced by this:

Sort sort = new Sort(new SortField("title", SortField.Type.STRING));
TopDocs docs = searcher.search(new TermQuery(new Term("title", "something")), 10, sort);

Where title is defined something like:

doc.add(new TextField("title", term, Field.Store.YES));

The best approach to sorting fields here is probably to take the exception's advice on docvalues. Adding DocValues to the field essentially indexes it for sorting, and is much more efficient than the typical sorting method in Lucene 4.X, as I understand it. Adding both the usual TextField and a SortedDocValuesField under the same field name seems to work rather well, and supports both searching and sorting with that one field name:

doc.add(new TextField("title", term, Field.Store.YES));
doc.add(new SortedDocValuesField("title", new BytesRef(term)));
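
One thing to keep in mind with this approach: as far as I know, a SortField.Type.STRING sort compares the raw term bytes, so the resulting order is UTF-8 byte order and not a locale-aware alphabetical order. In practice that means titles starting with an uppercase letter sort before all lowercase ones. The sketch below uses only the JDK (no Lucene) to imitate that byte-wise comparison and to show the effect of normalizing the sort key first. The class and method names are made up for illustration, and trimming plus lowercasing with Locale.ROOT is just one assumed normalization, not something the question or answer prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class: imitates byte-wise term ordering without using Lucene.
public class TitleSortOrderDemo {

    // Compares two strings by their UTF-8 bytes, treating each byte as unsigned.
    public static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // A shorter string sorts before any longer string it is a prefix of.
        return x.length - y.length;
    }

    // Assumed normalization for the value stored in the sort-only field.
    public static String sortKey(String title) {
        return title.trim().toLowerCase(Locale.ROOT);
    }

    // Order you get when the titles are compared as raw bytes.
    public static List<String> sortByRawBytes(List<String> titles) {
        List<String> copy = new ArrayList<>(titles);
        Comparator<String> byBytes = TitleSortOrderDemo::compareUtf8;
        copy.sort(byBytes);
        return copy;
    }

    // Order you get when the normalized keys are compared as raw bytes.
    public static List<String> sortByNormalizedKey(List<String> titles) {
        List<String> copy = new ArrayList<>(titles);
        Comparator<String> byKey =
                Comparator.comparing(TitleSortOrderDemo::sortKey, TitleSortOrderDemo::compareUtf8);
        copy.sort(byKey);
        return copy;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("banana", "Cherry", "apple");
        System.out.println(sortByRawBytes(titles));      // [Cherry, apple, banana]
        System.out.println(sortByNormalizedKey(titles)); // [apple, banana, Cherry]
    }
}
```

If case-insensitive ordering is what you want, the same idea would presumably carry over to the snippet above by passing new BytesRef(sortKey(term)) to the SortedDocValuesField while leaving the TextField unchanged, so searching and stored values still use the original title.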
