Search Number Plates using Solr

Question

I am working on search for a large database of number plate records, and I plan to use Apache Solr to implement it. I don't know the proper name for the search feature I want, so let me explain my requirements:

When people search, I want Solr to substitute certain numbers for letters. For example:

12 = R

13 = B

4 = A

11 = H

and so on.

So, for example, when someone searches for "John", the results should include suggestions like the following from the available list of number plates:

JO11 NYJ - Search should substitute 11 for H!

For example, have a look at http://www.privatenumberplates.com/list/JOHN

I am not sure how to get this done in Solr, so any ideas for getting started would be great! What would be most appropriate to use: synonyms, Soundex, fuzzy matching, or something else? Which analyzers or stemming libraries should be used?

Solution

A chain of PatternReplaceCharFilterFactory char filters to convert number->letter (one per conversion you need to cover), plus a phonetic filter to match similar-sounding words, could work as a starting point.
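To make the substitution idea concrete, here is a rough Python sketch of what such a chain of pattern replacements does to the text before matching. It is not Solr code. The mapping table is taken from the question, while the function names, the whitespace stripping, and the substring match are my own assumptions for illustration.

```python
import re

# Number -> letter pairs from the question. Multi-digit groups come first so
# that a longer group is replaced before any single digit inside it.
SUBSTITUTIONS = [
    ("12", "R"),
    ("13", "B"),
    ("11", "H"),
    ("4", "A"),
]


def normalize(text: str) -> str:
    """Upper-case, drop whitespace (an assumption), then swap digits for letters."""
    result = re.sub(r"\s+", "", text.upper())
    # One replacement pass per mapping, like one char filter per conversion.
    for digits, letter in SUBSTITUTIONS:
        result = re.sub(re.escape(digits), letter, result)
    return result


def matches(query: str, plate: str) -> bool:
    """Apply the same normalization to both sides, then do a substring match."""
    return normalize(query) in normalize(plate)


print(normalize("JO11 NYJ"))
print(matches("John", "JO11 NYJ"))
```

Because the same normalization is applied to the plate and to the query, "John" and "JO11 NYJ" meet in the middle. The order of the replacements matters once mappings overlap, so the real table would need the same care.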

You should do this at both index and query time. This should work... BUT you would probably want 'john' to match 'john' with a higher score than 'jo11n', right?

So you should use copyField to populate several fields and match against them with different boosts: one with the original text, one with the number->letter conversion applied, one with the phonetic filter applied, etc. You can get as fancy as you need.
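The multi-field idea can be sketched in plain Python as follows. This only imitates the scoring behaviour and is not how Solr computes relevance. The field names `plate_exact` and `plate_letters`, the boost values, and the substring matching are all hypothetical, and the phonetic field is left out to keep the example short.

```python
import re

# Number -> letter pairs from the question (multi-digit groups first).
SUBSTITUTIONS = [("12", "R"), ("13", "B"), ("11", "H"), ("4", "A")]

# Hypothetical boosts: a hit on the untouched text counts for more.
BOOSTS = {"plate_exact": 2.0, "plate_letters": 1.0}


def squash(text: str) -> str:
    """Upper-case and remove whitespace; digits are left alone."""
    return re.sub(r"\s+", "", text.upper())


def convert(text: str) -> str:
    """Like squash(), but with the number -> letter substitutions applied."""
    result = squash(text)
    for digits, letter in SUBSTITUTIONS:
        result = result.replace(digits, letter)
    return result


def index_plate(plate: str) -> dict:
    """Mimic copying one source value into two differently analyzed fields."""
    return {"plate_exact": squash(plate), "plate_letters": convert(plate)}


def score(query: str, doc: dict) -> float:
    """Sum the boosts of every field the query matches."""
    total = 0.0
    if squash(query) in doc["plate_exact"]:
        total += BOOSTS["plate_exact"]
    if convert(query) in doc["plate_letters"]:
        total += BOOSTS["plate_letters"]
    return total


def search(query: str, plates: list) -> list:
    """Return matching plates, best score first."""
    scored = [(score(query, index_plate(p)), p) for p in plates]
    hits = [(s, p) for s, p in scored if s > 0]
    return [p for s, p in sorted(hits, key=lambda pair: -pair[0])]


print(search("john", ["JO11 NYJ", "JOHN 1", "AB12 CDE"]))
```

With these made-up boosts, a plate that literally contains JOHN outranks one that only matches after conversion, which is the ordering asked for above.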

You might also write your own Analyzer, but I would leave that for later, in case the built-in ones are not good enough.
