Using Elastic Search Geo Functionality To Find Most Common Locations?


Problem Description


I have a GeoJSON file containing a list of locations, each with a longitude, latitude and timestamp. Note that the longitudes and latitudes are multiplied by 10,000,000.

{
  "locations" : [ {
    "timestampMs" : "1461820561530",
    "latitudeE7" : -378107308,
    "longitudeE7" : 1449654070,
    "accuracy" : 35,
    "junk_i_want_to_save_but_ignore" : [ { .. } ]
  }, {
    "timestampMs" : "1461820455813",
    "latitudeE7" : -378107279,
    "longitudeE7" : 1449673809,
    "accuracy" : 33
  }, {
    "timestampMs" : "1461820281089",
    "latitudeE7" : -378105184,
    "longitudeE7" : 1449254023,
    "accuracy" : 35
  }, {
    "timestampMs" : "1461820155814",
    "latitudeE7" : -378177434,
    "longitudeE7" : 1429653949,
    "accuracy" : 34
  }
  ..

Many of these locations will be the same physical location (e.g. the user's home), but obviously the longitudes and latitudes may not be exactly the same.

I would like to use Elasticsearch and its geo functionality to produce a ranked list of the most common locations, where locations are deemed to be the same if they are within, say, 100m of each other.

For each common location I'd also like a list of all the timestamps at which the user was at that location, if possible!

I'd very much appreciate a sample query to get me started!

Many thanks in advance.

Solution

In order to make it work you need to modify your mapping like this:

PUT /locations
{
  "mappings": {
    "location": {
      "properties": {
        "location": {
          "type": "geo_point"
        },
        "timestampMs": {
          "type": "long"
        },
        "accuracy": {
          "type": "long"
        }
      }
    }
  }
}
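
A side note in case you are running a newer cluster: mapping types were removed in Elasticsearch 7, so there the properties block sits directly under mappings and the equivalent mapping would look something like this:

PUT /locations
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      },
      "timestampMs": {
        "type": "long"
      },
      "accuracy": {
        "type": "long"
      }
    }
  }
}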

Then, when you index your documents, you need to divide the latitude and longitude by 10,000,000 (e.g. latitudeE7 -378107308 becomes latitude -37.8107308), and index them like this:

PUT /locations/location/1
{
  "timestampMs": "1461820561530",
  "location": {
    "lat": -37.8107308,
    "lon": 144.9654070
  },
  "accuracy": 35
}
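
If you have many points to load, the _bulk API saves you one HTTP round-trip per document. Here is a minimal sketch using the first two points from your file (the document IDs 1 and 2 are arbitrary; bulk requests are newline-delimited JSON, each action line followed by its source line):

POST /locations/location/_bulk
{ "index": { "_id": "1" } }
{ "timestampMs": "1461820561530", "location": { "lat": -37.8107308, "lon": 144.9654070 }, "accuracy": 35 }
{ "index": { "_id": "2" } }
{ "timestampMs": "1461820455813", "location": { "lat": -37.8107279, "lon": 144.9673809 }, "accuracy": 33 }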

Finally, the search query below...

POST /locations/location/_search
{
  "aggregations": {
    "zoomedInView": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": "-37, 144",
            "bottom_right": "-38, 145"
          }
        }
      },
      "aggregations": {
        "zoom1": {
          "geohash_grid": {
            "field": "location",
            "precision": 6
          },
          "aggs": {
            "ts": {
              "date_histogram": {
                "field": "timestampMs",
                "interval": "15m",
                "format": "EEE yyyy-MM-dd HH:mm"
              }
            }
          }
        }
      }
    }
  }
}

...will yield the following result:

{
  "aggregations": {
    "zoomedInView": {
      "doc_count": 1,
      "zoom1": {
        "buckets": [
          {
            "key": "r1r0fu",
            "doc_count": 1,
            "ts": {
              "buckets": [
                {
                  "key_as_string": "Thu 2016-04-28 05:15",
                  "key": 1461820500000,
                  "doc_count": 1
                }
              ]
            }
          }
        ]
      }
    }
  }
}
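
The geohash_grid buckets are sorted by doc_count in descending order, so the first bucket is your most common location, and the nested ts buckets give you the timestamps at which the user was there.

One note on your 100m requirement: geohash cells at precision 6 are roughly 1.2km x 0.6km. If you want buckets closer to 100m, bump the precision to 7, which gives cells of about 153m x 153m:

"geohash_grid": {
  "field": "location",
  "precision": 7
}

Keep in mind that a geohash grid is a fixed grid rather than a clustering algorithm, so two points within 100m of each other can still land in adjacent cells.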

UPDATE

According to our discussion, here is a solution that could work for you. Using Logstash, you can call your API and retrieve the big JSON document (using the http_poller input), extract/transform all locations and sink them to Elasticsearch (with the elasticsearch output) very easily.

Here is how it works in order to format each event as shown in my initial answer:

  1. Using http_poller you can retrieve the JSON locations (note that I've set the polling interval to 1 day, but you can change that to some other value, or simply run Logstash manually each time you want to retrieve the locations)
  2. Then we split the locations array into individual events
  3. Then we divide the latitude/longitude fields by 10,000,000 to get proper coordinates
  4. We also need to clean it up a bit by moving and removing some fields
  5. Finally, we just send each event to Elasticsearch

Logstash configuration locations.conf:

input {
  http_poller {
    urls => {
      get_locations => {
        method => get
        url => "http://your_api.com/locations.json"
        headers => {
          Accept => "application/json"
        }
      }
    }
    request_timeout => 60
    interval => 86400000
    codec => "json"
  }
}
filter {
  split {
    field => "locations" 
  }
  ruby {
    code => "
      event['location'] = {
        'lat' => event['locations']['latitudeE7'] / 10000000.0,
        'lon' => event['locations']['longitudeE7'] / 10000000.0
      }
    "
  }
  mutate {
    add_field => {
      "timestampMs" => "%{[locations][timestampMs]}"
      "accuracy" => "%{[locations][accuracy]}"
      "junk_i_want_to_save_but_ignore" => "%{[locations][junk_i_want_to_save_but_ignore]}"
    }
    remove_field => [
      "locations", "@timestamp", "@version" 
    ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "locations"
    document_type => "location"
  }
}
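
One caveat: the ruby filter above uses the event['field'] syntax from Logstash 2.x. If you are on Logstash 5 or later, the event API only exposes getters and setters, so the same logic would be written like this:

  ruby {
    code => "
      event.set('location', {
        'lat' => event.get('[locations][latitudeE7]') / 10000000.0,
        'lon' => event.get('[locations][longitudeE7]') / 10000000.0
      })
    "
  }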

You can then run with the following command:

bin/logstash -f locations.conf
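
If you want to catch syntax errors first, Logstash can validate the config file without starting the pipeline (the short -t flag should work on recent versions; older ones call it --configtest):

bin/logstash -f locations.conf -t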

When that has run, you can launch your search query and you should get what you expect.
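
For example, assuming Elasticsearch is listening on localhost:9200 and you have saved the aggregation query from above in a file called query.json, something like this should do:

curl -s -XPOST 'http://localhost:9200/locations/location/_search?pretty' -d @query.json

(On Elasticsearch 6 and later you would also need to pass -H 'Content-Type: application/json'.)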
