How does geohash index work in Lucene

Question

In Lucene spatial 4, I'm wondering how the geohash index works behind the scenes. I understand the concept of the geohash, which basically takes the two coordinates of a point (lat, lon) and creates a single "string" hash.

Is the index just a "string" index (r-tree or quad-tree, or something along those lines, such as just indexing a last name), or is there something special about it?

For prefix-type searches, do all of the n-grams of the hash get indexed? For example, if a geohash is drgt2abc, does it get indexed as d, dr, drg, drgt, etc.?

Is there a default number of n-grams that we might want indexed?

With this type of indexing, will search queries over 100 thousand records versus 100 million records have similar performance for spatial queries (such as box/polygon or distance)? Or can I expect a general, typical slow degradation of the index as lots of records are added?

Thanks

Recommended answer

The best online explanation is my video: Lucene / Solr 4 Spatial deep dive

Is the index just a "string" index (r-tree or quad-tree, or something along those lines, such as just indexing a last name), or is there something special about it?

Lucene, fundamentally, has just one index used for text, numbers, and now spatial. You could say it's a string index: it's a sorted list of bytes/strings. From a higher-level view, using it for spatial data in this way belongs to the family of "tries", also known as "prefix trees", in computer science.
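To make the "sorted list of strings" idea concrete, here is a minimal Python sketch, not Lucene's actual term dictionary. It assumes terms are plain ASCII geohash strings held in a sorted list, and the helper name `terms_with_prefix` is hypothetical. Because the list is sorted, every term that shares a prefix sits in one contiguous run, which two binary searches can locate.

```python
from bisect import bisect_left

def terms_with_prefix(sorted_terms, prefix):
    """Return the contiguous run of sorted_terms that start with prefix."""
    lo = bisect_left(sorted_terms, prefix)
    # "\uffff" sorts after every geohash character, so this key is an
    # upper bound for all terms that begin with the prefix.
    hi = bisect_left(sorted_terms, prefix + "\uffff")
    return sorted_terms[lo:hi]

terms = sorted(["9q8yy", "drgt2abc", "drgt2abd", "drgv0000", "u4pruydq"])
print(terms_with_prefix(terms, "drgt"))  # ['drgt2abc', 'drgt2abd']
```

This is why a sorted string index can behave like a prefix tree: descending one level in the tree is the same as narrowing to a longer prefix's run of terms.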

For prefix-type searches, do all of the n-grams of the hash get indexed? For example, if a geohash is drgt2abc, does it get indexed as d, dr, drg, drgt, etc.?

Yes.
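As an illustration of what "indexing every prefix" means, the following Python sketch encodes a point with the standard geohash scheme (alternating longitude and latitude bits, five bits per base-32 character) and then lists the prefixes that would become index terms. The function names `geohash_encode` and `prefix_terms` are made up for this example and are not Lucene APIs.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, length):
    """Encode a point as a geohash string of the given length."""
    lat_rng = [-90.0, 90.0]
    lon_rng = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < length:
        rng, value = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid   # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid   # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

def prefix_terms(geohash):
    """Every prefix of the hash, shortest first: one term per tree level."""
    return [geohash[:i] for i in range(1, len(geohash) + 1)]

print(geohash_encode(42.6, -5.6, 5))  # ezs42
print(prefix_terms("drgt2abc"))
# ['d', 'dr', 'drg', 'drgt', 'drgt2', 'drgt2a', 'drgt2ab', 'drgt2abc']
```

Each prefix names a progressively smaller grid cell that contains the point, so a document indexed this way can be found by matching a cell at any level.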

Is there a default number of n-grams that we might want indexed?

You specify it conveniently in terms of your precision requirements, and it will look up how long the hash needs to be. Or you can specify the length directly.
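The idea of looking up a length from a precision requirement can be sketched as follows. This is only an approximation in Python under stated assumptions, not the lookup Lucene performs: precision is expressed here as a maximum cell size in degrees, and both function names are hypothetical. A geohash of length n carries 5n bits, split between longitude and latitude with longitude taking the extra bit when 5n is odd.

```python
def cell_size_degrees(length):
    """Width and height, in degrees, of a geohash cell of this length."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit
    lat_bits = total_bits // 2
    return 360.0 / (1 << lon_bits), 180.0 / (1 << lat_bits)

def length_for_precision(max_cell_deg, max_length=12):
    """Shortest hash length whose cell is no larger than max_cell_deg."""
    for length in range(1, max_length + 1):
        width, height = cell_size_degrees(length)
        if width <= max_cell_deg and height <= max_cell_deg:
            return length
    return max_length  # requirement is finer than the deepest level allowed

print(cell_size_degrees(1))       # (45.0, 45.0)
print(length_for_precision(1.0))  # 4
```

Longer hashes give finer cells but also more terms per document, so the length is a trade-off between precision and index size.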

With this type of indexing, will search queries over 100 thousand records versus 100 million records have similar performance for spatial queries (such as box/polygon or distance)? Or can I expect a general, typical slow degradation of the index as lots of records are added?

Indeed, this type of indexing (and more specifically the clever recursive search tree algorithm that uses it) means that you'll have scalable search performance. 100m is a ton of documents for one filter to match, so it's of course going to be slower than one that matches only 100k docs, but it's definitely sub-linear. And by next year it'll be even faster, due to work happening this summer on a new PrefixTree encoding, plus a spatial benchmark in progress that will allow me to make further tuning optimizations I have planned.
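A rough picture of how a recursive search over a prefix tree can stay sub-linear is sketched below in Python. It is a simplified model, not Lucene's algorithm: it assumes each document is a point stored as one full-length geohash in a sorted list, the query is a latitude/longitude box, `max_len` does not exceed the stored hash length, and all function names are hypothetical. Cells that miss the box are pruned, cells fully inside the box contribute all their terms without further descent, and only cells straddling the box edge are subdivided.

```python
from bisect import bisect_left

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def cell_bounds(geohash):
    """Return (lat_lo, lat_hi, lon_lo, lon_hi) of a geohash cell."""
    lat, lon = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lat[0], lat[1], lon[0], lon[1]

def search(sorted_terms, box, max_len, prefix=""):
    """Collect terms whose cells fall in box = (lat_lo, lat_hi, lon_lo, lon_hi)."""
    q_lat_lo, q_lat_hi, q_lon_lo, q_lon_hi = box
    results = []
    for ch in BASE32:
        cell = prefix + ch
        lo = bisect_left(sorted_terms, cell)
        hi = bisect_left(sorted_terms, cell + "\uffff")
        if lo == hi:
            continue  # nothing indexed under this cell: prune
        c_lat_lo, c_lat_hi, c_lon_lo, c_lon_hi = cell_bounds(cell)
        if (c_lat_hi < q_lat_lo or c_lat_lo > q_lat_hi or
                c_lon_hi < q_lon_lo or c_lon_lo > q_lon_hi):
            continue  # cell is outside the query box: prune
        within = (c_lat_lo >= q_lat_lo and c_lat_hi <= q_lat_hi and
                  c_lon_lo >= q_lon_lo and c_lon_hi <= q_lon_hi)
        if within or len(cell) == max_len:
            # Whole cell matches (or is accepted as an approximation at
            # the deepest level): take its entire run of terms at once.
            results.extend(sorted_terms[lo:hi])
        else:
            results.extend(search(sorted_terms, box, max_len, cell))
    return results

terms = sorted(["ezs42", "ezs43", "u4pru", "9q8yy"])
print(search(terms, (42.0, 43.0, -6.0, -5.0), max_len=5))  # ['ezs42', 'ezs43']
```

In this model the work depends mostly on how many cells the query boundary crosses rather than on the total number of documents, which is one way to understand why matching cost grows more slowly than the index.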
