Sorting a String field alphabetically in Lucene 5.0

Question

I'm having issues sorting on string fields in Lucene 5.0. Apparently the way you could sort since Lucene 4 has changed. Below is a snippet of some of the fields that are indexed for my documents.

@Override
public Document generateDocument(Process entity)
{
    Document doc = new Document();
    doc.add(new IntField(id, entity.getID(), Field.Store.YES));
    doc.add(new TextField(title, entity.getProcessName(), Field.Store.YES));
    doc.add(new IntField(organizationID, entity.getOrganizationID(), Field.Store.YES));
    doc.add(new StringField(versionDate, DateTools.dateToString(entity.getVersionDate(), DateTools.Resolution.SECOND), Field.Store.YES));
    doc.add(new LongField(entityDate, entity.getVersionDate().getTime(), Field.Store.YES)); 
    return doc;
}

I would like to sort on relevance first, which works just fine. The issue is that sorting on the title field doesn't work. I've created a SortField, which I'm trying to use with a TopFieldCollector after a chain of method calls:

public BaseSearchCore<Process, ProcessSearchResultScore>.SearchContainer search(String searchQuery, Filter filter, int page, int hitsPerPage) throws IOException, ParseException
{
    // reverse = true sorts titles Z to A; pass false for A to Z
    SortField titleSort = new SortField(title, SortField.Type.STRING, true);
    // Pass the SortField itself, not the field-name constant
    return super.search(searchQuery, filter, page, hitsPerPage, titleSort);
}

Where:

public SearchContainer search(String searchQuery, Filter filter, int page, int hitsPerPage, SortField... sortfields) throws IOException, ParseException
{
    Query query = getQuery(searchQuery);
    TopFieldCollector paginate = getCollector(sortfields);
    int startIndex = (page - 1) * hitsPerPage;
    ScoreDoc[] hits = executeSearch(query, paginate, filter, startIndex, hitsPerPage);

    return collectResults(query, filter, hitsPerPage, hits, page);
}

And finally, the method that applies the sort field:

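private TopFieldCollector getCollector(SortField... sortfields) throws IOException
{
    // Relevance first, then the caller's sort fields as tie-breakers
    SortField[] sortFields = new SortField[sortfields.length + 1];
    sortFields[0] = SortField.FIELD_SCORE;
    System.arraycopy(sortfields, 0, sortFields, 1, sortfields.length);
    Sort sorter = new Sort(sortFields);
    // numHits = 25000, fillFields = true, trackDocScores = false, trackMaxScore = true
    TopFieldCollector collector = TopFieldCollector.create(sorter, 25000, true, false, true);
    return collector;
}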

Using the returned collector, a regular query is performed and a result is returned. However, if I try to sort with this SortField, I get this exception:

java.lang.IllegalStateException: unexpected docvalues type NONE for field 'title' (expected=SORTED). Use UninvertingReader or index with docvalues.

How am I supposed to index a string field so that I can sort it alphabetically (using SortFields) in Lucene 5? Any code examples or snippets would be much appreciated.

Searching by relevance works just fine, but when users enter an empty search query, all the results have the same relevance. For those queries I'd rather sort by the results' titles, which is what breaks in this version of Lucene.
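To make the intended ordering concrete, here is a plain-Java model (no Lucene involved; the Hit class, the titles and the scores are hypothetical) of what a Sort built from SortField.FIELD_SCORE followed by an ascending STRING sort on title asks for: higher score first, with the title only breaking ties. When every score is equal, as described above for empty queries, the title comparison alone decides the order. It is a sketch of the ordering rule, not of how Lucene evaluates it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ScoreThenTitle {

    // Hypothetical stand-in for a scored hit: a relevance score plus the title value.
    static final class Hit {
        final float score;
        final String title;

        Hit(float score, String title) {
            this.score = score;
            this.title = title;
        }
    }

    // Models Sort(FIELD_SCORE, SortField("title", STRING)): higher score first,
    // then ascending title for hits whose scores tie.
    static final Comparator<Hit> SCORE_THEN_TITLE =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                      .thenComparing(h -> h.title);

    // Returns the titles in the order the composite sort would produce.
    static List<String> titlesInOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(SCORE_THEN_TITLE);
        List<String> titles = new ArrayList<>();
        for (Hit h : sorted) {
            titles.add(h.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        // Scores differ, so relevance decides the order.
        List<Hit> scored = List.of(new Hit(0.4f, "Audit"), new Hit(1.2f, "Shipping"), new Hit(0.9f, "Billing"));
        System.out.println(titlesInOrder(scored)); // [Shipping, Billing, Audit]

        // All scores equal (the empty-query case), so the title breaks every tie.
        List<Hit> tied = List.of(new Hit(1.0f, "Shipping"), new Hit(1.0f, "Audit"), new Hit(1.0f, "Billing"));
        System.out.println(titlesInOrder(tied)); // [Audit, Billing, Shipping]
    }
}
```

Putting the score first keeps normal searches ranked by relevance, while the secondary title key only matters when scores tie.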

Recommended answer

A note: it's much easier to track down bugs (both for yourself and for the people you're asking) if you first boil the problem down to the smallest example you can. Rather than sort through your architecture and classes I don't have access to or know anything about, I'll address the problem as reproduced by this:

Sort sort = new Sort(new SortField("title", SortField.Type.STRING));
TopDocs docs = searcher.search(new TermQuery(new Term("title", "something")), 10, sort);

Where title is defined something like:

doc.add(new TextField("title", term, Field.Store.YES));

The best approach here is probably to take the exception's advice and use doc values. Adding DocValues to a field essentially indexes it for sorting and, as I understand it, is much more efficient than the typical sorting method in Lucene 4.X. Adding both the usual TextField and a SortedDocValuesField under the same field name seems to work rather well, and supports both searching and sorting on that one field name:

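// Tokenized and indexed for searching; stored so the title can be displayed
doc.add(new TextField("title", term, Field.Store.YES));
// Same field name, indexed as SORTED doc values so SortField.Type.STRING can sort on it
doc.add(new SortedDocValuesField("title", new BytesRef(term)));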
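One thing worth knowing about this approach: a STRING SortField over SortedDocValuesField values orders them by their raw bytes (the UTF-8 bytes of new BytesRef(term), compared as unsigned values), so the sort is case-sensitive and every uppercase ASCII letter comes before every lowercase one. The plain-Java sketch below (no Lucene; the class and method names are hypothetical) compares strings the same way and shows one common workaround, which is an assumption and not part of the answer above: put a normalized key such as term.toLowerCase(Locale.ROOT) into the SortedDocValuesField while the stored TextField keeps the original title for display.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class DocValuesByteOrder {

    // Compares two strings by their UTF-8 bytes, each byte treated as unsigned (0..255),
    // which is the ordering BytesRef values use.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // A string that is a prefix of the other sorts first.
        return x.length - y.length;
    }

    // Order you get when the doc value is the raw title, i.e. new BytesRef(term).
    static List<String> rawOrder(List<String> titles) {
        List<String> out = new ArrayList<>(titles);
        out.sort(DocValuesByteOrder::compareUtf8);
        return out;
    }

    // Order you get when the doc value is a lowercased key (hypothetical normalization),
    // while the original titles are still what gets returned, as from the stored field.
    static List<String> normalizedOrder(List<String> titles) {
        List<String> out = new ArrayList<>(titles);
        out.sort((a, b) -> compareUtf8(a.toLowerCase(Locale.ROOT), b.toLowerCase(Locale.ROOT)));
        return out;
    }

    public static void main(String[] args) {
        List<String> titles = List.of("banana", "Cherry", "apple");
        System.out.println(rawOrder(titles));        // [Cherry, apple, banana]
        System.out.println(normalizedOrder(titles)); // [apple, banana, Cherry]
    }
}
```

In Lucene terms, the assumed workaround would keep the TextField line unchanged and index new BytesRef(term.toLowerCase(Locale.ROOT)) in the SortedDocValuesField instead of the raw term.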
