One-to-Many Geospatial Search Index Design in Solr

Views: 345 | Published: 2018/8/2 15:32:57 | Tags: mysql solr lucene indexing geospatial


Problem description

I'm hoping to get some advice on the best way to design a Solr index in which each document has multiple tags as well as multiple lat/lng pairs.

The JSON representation of an example document:

{
    "id": 123,
    "name": "Sample Doc",
    "tags": [
        {"tag": "example1", "weight": 0.5},
        {"tag": "example2", "weight": 1.0},
        {"tag": "example3", "weight": 1.5}
    ],
    "locations": [
        {"lat": 1.234, "lng": 5.678},
        {"lat": 9.876, "lng": 5.432}
    ]
}

Tags need to be assigned various weights at indexing time (the weights do not change between queries). A search against the index consists of a text search against the name and the tags of all documents within a specific distance of a lat/lng pair. For example, a search for "Sample example3" within 5000 meters of 9.876/5.432.

In such a search, documents with more tag matches and with matches against the name should rank higher (I'm not sure whether Solr does this by default), while tag weights are still taken into account. This means a single tag may cause a document to rank very high in the results because of its weight.

I've used Solr in the past for full-text search and I've played around with its geospatial features. I'm coming from a Sphinx background, but I think Solr is a more robust product for most of my needs. I just need some help designing an index that can handle full-text + weighted + geospatial search efficiently. Any advice is greatly appreciated!

Recommended answer

The multi-valued geospatial data is handled easily by the location_rpt field type in Solr's out-of-the-box schema.
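As a rough illustration of the point above, the sketch below flattens the example document into a Solr document with a multi-valued spatial field and builds a distance filter for the "within 5000 meters" case. The field name `locations_rpt` and the helper names are hypothetical; the sketch assumes the field is declared with type location_rpt and multiValued="true", and that the field type keeps its default distance unit of kilometers.

```python
# Hypothetical field name; assumed to be declared in the schema with
# type="location_rpt" and multiValued="true".
LOCATION_FIELD = "locations_rpt"

EXAMPLE = {
    "id": 123,
    "name": "Sample Doc",
    "locations": [
        {"lat": 1.234, "lng": 5.678},
        {"lat": 9.876, "lng": 5.432},
    ],
}


def to_solr_doc(doc):
    """Flatten the example document into a flat Solr document."""
    return {
        "id": str(doc["id"]),
        "name": doc["name"],
        # Each point is sent as a "lat,lon" string; one value per location.
        LOCATION_FIELD: [f'{loc["lat"]},{loc["lng"]}' for loc in doc["locations"]],
    }


def geofilt_fq(lat, lng, meters, field=LOCATION_FIELD):
    """Build a geofilt filter query; d is assumed to be in kilometers."""
    km = meters / 1000.0
    return f"{{!geofilt sfield={field} pt={lat},{lng} d={km:g}}}"


if __name__ == "__main__":
    print(to_solr_doc(EXAMPLE))
    print(geofilt_fq(9.876, 5.432, 5000))
```

The filter string would be passed as an `fq` parameter, so a document matches when any one of its points falls inside the circle.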

The trickier part here is the weighted tags. As a first cut, I'd index three fields, tags05, tags10, and tags15, with query-time boosts (via edismax's qf param) of 0.5, 1.0, and 1.5 respectively. This is a discretization approach in which you lose some of the weight fidelity, depending on how many buckets you have (three shown here). If you can, avoid Solr 4 JOIN queries; they are often quite slow. The IDF scores would be somewhat skewed because the data is split across fields, so you might want to try a different similarity implementation for these fields, one that doesn't consider IDF.
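The bucketing idea can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, each tag is assigned to the bucket whose boost is nearest to its weight (ties go to the lower bucket), and the text field is assumed to be called `name` as in the example document.

```python
# Bucket field names and their query-time boosts, as described in the answer.
BUCKETS = {"tags05": 0.5, "tags10": 1.0, "tags15": 1.5}


def bucket_tags(tags):
    """Assign each tag to the bucket field with the nearest boost value."""
    fields = {name: [] for name in BUCKETS}
    for t in tags:
        nearest = min(BUCKETS, key=lambda name: abs(BUCKETS[name] - t["weight"]))
        fields[nearest].append(t["tag"])
    return fields


def edismax_params(user_query):
    """Build edismax parameters that boost each bucket field via qf."""
    qf = " ".join(["name"] + [f"{name}^{boost}" for name, boost in BUCKETS.items()])
    return {"defType": "edismax", "q": user_query, "qf": qf}


if __name__ == "__main__":
    tags = [
        {"tag": "example1", "weight": 0.5},
        {"tag": "example2", "weight": 1.0},
        {"tag": "example3", "weight": 1.5},
    ]
    print(bucket_tags(tags))
    print(edismax_params("Sample example3"))
```

The bucketed fields would be added to the document at index time. More buckets preserve more of the original weight at the cost of more fields in qf.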
