One-to-Many Geospatial Search Index Design in Solr


Question

I'm hoping to get some advice on the best way to design a Solr index where each document has multiple tags as well as multiple lat/lng pairs.

The JSON representation of an example document:

{
    "id": 123,
    "name": "Sample Doc",
    "tags": [
        {"tag": "example1", "weight": 0.5},
        {"tag": "example2", "weight": 1.0},
        {"tag": "example3", "weight": 1.5}
    ],
    "locations": [
        {"lat": 1.234, "lng": 5.678},
        {"lat": 9.876, "lng": 5.432}
    ]
}

Tags need to be assigned various weights at indexing time (the weights do not change between queries). A search against the index consists of a text search against the names and tags of all documents within a specific distance of a lat/lng pair. For example, a search for "Sample example3" within 5000 meters of 9.876/5.432.

In such a search, documents with more tag matches and matches against the title should rank higher (I'm not sure whether Solr does this by default), while still taking tag weights into account (so a single tag may push a document very high in the results because of its weight).

I've used Solr in the past to perform full-text search, and I've played around with its geospatial features. I'm coming from a Sphinx background, but I think Solr is a more robust product for most of my needs. I just need some help designing an index that can handle full-text + weighted + geospatial search efficiently. Any advice is greatly appreciated!

Recommended Answer

The multi-valued geospatial data is handled easily via the location_rpt field type in Solr's out-of-the-box schema.
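As a rough illustration of that point, the sketch below flattens the example document's locations into "lat,lng" strings for a multi-valued spatial field and builds a geofilt filter for the 5000-meter search from the question. The field name `locations` and both helper functions are assumptions made for this example, not part of the original answer; the field is assumed to be declared with the location_rpt type and multiValued="true" in the schema. Note that geofilt's `d` parameter is given in kilometers.

```python
def to_solr_doc(doc, location_field="locations"):
    """Flatten nested lat/lng pairs into "lat,lng" strings.

    `location_field` is a hypothetical field name, assumed to use the
    location_rpt field type with multiValued="true".
    """
    return {
        "id": str(doc["id"]),
        "name": doc["name"],
        location_field: ["%s,%s" % (loc["lat"], loc["lng"]) for loc in doc["locations"]],
    }


def geofilt(lat, lng, meters, location_field="locations"):
    """Build a geofilt filter query string; `d` is expressed in kilometers."""
    return "{!geofilt sfield=%s pt=%s,%s d=%g}" % (location_field, lat, lng, meters / 1000)


sample = {
    "id": 123,
    "name": "Sample Doc",
    "locations": [{"lat": 1.234, "lng": 5.678}, {"lat": 9.876, "lng": 5.432}],
}

solr_doc = to_solr_doc(sample)
fq = geofilt(9.876, 5.432, 5000)
print(solr_doc["locations"])
print(fq)
```

The resulting string would be passed as the `fq` parameter, so the distance restriction filters documents without affecting the relevance score.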

The trickier part here is the weighted tags. As a first cut, I'd index 3 fields, tags05, tags10, and tags15, with query-time boosts (via edismax's qf param) of 0.5, 1.0, and 1.5 respectively. This is a discretization approach in which you lose some of the weight fidelity depending on how many buckets you have (3 shown here). If you can, avoid Solr 4 JOIN queries; they are often quite slow. The IDF scores would be somewhat skewed because the data is split across fields, so you might want to try a different similarity implementation for these fields, one that doesn't consider IDF.
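The bucketing idea can be sketched as follows. This is a minimal illustration under stated assumptions: the nearest-bucket rule, the `name` boost of 1.0, and the helper names are all hypothetical choices for this example; only the three field names and their boosts come from the answer.

```python
# Bucket fields and query-time boosts, as named in the answer.
BUCKETS = {"tags05": 0.5, "tags10": 1.0, "tags15": 1.5}


def bucket_tags(tags):
    """Assign each tag to the bucket field whose boost is closest to its weight.

    The nearest-bucket rule is an assumption; on a tie, the bucket listed
    first in BUCKETS wins.
    """
    fields = {field: [] for field in BUCKETS}
    for entry in tags:
        field = min(BUCKETS, key=lambda f: abs(BUCKETS[f] - entry["weight"]))
        fields[field].append(entry["tag"])
    return fields


def edismax_params(user_query, name_boost=1.0):
    """Build edismax request parameters over name plus the bucket fields.

    `name_boost` is a hypothetical default, not taken from the answer.
    """
    qf = ["name^%s" % name_boost]
    qf += ["%s^%s" % (field, boost) for field, boost in BUCKETS.items()]
    return {"defType": "edismax", "q": user_query, "qf": " ".join(qf)}


tags = [
    {"tag": "example1", "weight": 0.5},
    {"tag": "example2", "weight": 1.0},
    {"tag": "example3", "weight": 1.5},
]

tag_fields = bucket_tags(tags)
params = edismax_params("Sample example3")
print(tag_fields)
print(params["qf"])
```

Adding more buckets recovers more of the weight fidelity, at the cost of more fields in qf and a finer split of the term statistics.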
