How to sort by Lucene.Net field and ignore common stop words such as 'a' and 'the'?


Question

I've found how to sort query results by a given field in a Lucene.Net index instead of by score; all it takes is a field that is indexed but not tokenized. However, what I haven't been able to figure out is how to sort that field while ignoring stop words such as "a" and "the", so that the following book titles, for example, would sort in ascending order like so:

  1. The Cat in the Hat
  2. Horton Hears a Who!

Is such a thing possible, and if yes, how?

I'm using Lucene.Net 2.3.1.2.

Recommended answer

I wrap the results returned by Lucene into my own collection of custom objects. Then I can populate it with extra info/context information (and use things like the highlighter class to pull out a snippet of the matches), plus add paging. If you took a similar route you could create a "result" class/object, add something like a SortBy property and grab whatever field you wanted to sort by, strip out any stop words, then save it in this property. Now just sort the collection based on that property instead.
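
The approach above could be sketched roughly as follows. This is only an illustration, written in Java because the idea is plain string handling and ports directly to C#. The `SearchResult` class, its `sortBy` field, the `StopWordSort` helper, and the three-word stop list are all hypothetical names and values chosen for the example; none of them come from Lucene.Net, and the step that reads titles out of the actual hits is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordSort {
    // Assumed stop-word list for illustration; substitute the list your analyzer uses.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the"));

    // Lower-cases the title, splits it on whitespace, and drops every stop word.
    static String sortKey(String title) {
        StringBuilder key = new StringBuilder();
        for (String word : title.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue;
            }
            if (key.length() > 0) {
                key.append(' ');
            }
            key.append(word);
        }
        return key.toString();
    }

    // Wraps each title in a result object, then sorts the wrappers by their sortBy key.
    static List<SearchResult> sortByTitle(List<String> titles) {
        List<SearchResult> results = new ArrayList<>();
        for (String title : titles) {
            results.add(new SearchResult(title));
        }
        results.sort(Comparator.comparing((SearchResult r) -> r.sortBy));
        return results;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("Horton Hears a Who!", "The Cat in the Hat");
        for (SearchResult r : sortByTitle(titles)) {
            System.out.println(r.title);
        }
    }
}

// Hypothetical wrapper for one search hit; not a Lucene.Net type.
class SearchResult {
    final String title;   // field value as it would be read from the hit's document
    final String sortBy;  // title with stop words removed, used only for ordering

    SearchResult(String title) {
        this.title = title;
        this.sortBy = StopWordSort.sortKey(title);
    }
}
```

Because the sort runs in memory on the wrapped collection, the full set of hits has to be collected and sorted before any paging is applied.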
