Using Elastic Search Geo Functionality To Find Most Common Locations?
I have a GeoJSON file containing a list of locations, each with a longitude, latitude, and timestamp. Note that the longitudes and latitudes are multiplied by 10,000,000.
{
"locations" : [ {
"timestampMs" : "1461820561530",
"latitudeE7" : -378107308,
"longitudeE7" : 1449654070,
"accuracy" : 35,
"junk_i_want_to_save_but_ignore" : [ { .. } ]
}, {
"timestampMs" : "1461820455813",
"latitudeE7" : -378107279,
"longitudeE7" : 1449673809,
"accuracy" : 33
}, {
"timestampMs" : "1461820281089",
"latitudeE7" : -378105184,
"longitudeE7" : 1449254023,
"accuracy" : 35
}, {
"timestampMs" : "1461820155814",
"latitudeE7" : -378177434,
"longitudeE7" : 1429653949,
"accuracy" : 34
}
..
Many of these locations will be the same physical location (e.g. the user's home) but obviously the longitude and latitudes may not be exactly the same.
I would like to use Elasticsearch and its geo functionality to produce a ranked list of the most common locations, where two locations are deemed to be the same if they are within, say, 100 m of each other.
For each common location, I'd also like the list of all timestamps at which the user was at that location, if possible!
I'd very much appreciate a sample query to get me started!
Many thanks in advance.
In order to make it work you need to modify your mapping like this:
PUT /locations
{
"mappings": {
"location": {
"properties": {
"location": {
"type": "geo_point"
},
"timestampMs": {
"type": "long"
},
"accuracy": {
"type": "long"
}
}
}
}
}
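If you prefer to create the index from code rather than with curl, the same mapping can be expressed as a plain dict. A minimal sketch, assuming the official elasticsearch-py client (the client call is commented out because it needs a running cluster; the index/type names "locations"/"location" are the ones used in this answer):

```python
# The mapping above expressed as a Python dict.
mapping_body = {
    "mappings": {
        "location": {
            "properties": {
                "location": {"type": "geo_point"},
                "timestampMs": {"type": "long"},
                "accuracy": {"type": "long"},
            }
        }
    }
}

# With a running cluster you could then create the index like this
# (assumption, not executed here):
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200")
# es.indices.create(index="locations", body=mapping_body)
```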
Then, when you index your documents, you need to divide the latitude and longitude by 10000000, and index like this:
PUT /locations/location/1
{
"timestampMs": "1461820561530",
"location": {
"lat": -37.8103308,
"lon": 14.4967407
},
"accuracy": 35
}
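The divide-by-10,000,000 step can be captured in a small helper before indexing. A sketch (the function name is mine, not from the answer; the sample values are made up for illustration):

```python
def e7_to_point(latitude_e7: int, longitude_e7: int) -> dict:
    """Convert E7 integer coordinates (degrees * 10^7) to a geo_point dict."""
    return {
        "lat": latitude_e7 / 10_000_000,
        "lon": longitude_e7 / 10_000_000,
    }

# e.g. 123456789 encodes 12.3456789 degrees
point = e7_to_point(123456789, 987654321)
```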
Finally, your search query below...
POST /locations/location/_search
{
"aggregations": {
"zoomedInView": {
"filter": {
"geo_bounding_box": {
"location": {
"top_left": "-37, 14",
"bottom_right": "-38, 15"
}
}
},
"aggregations": {
"zoom1": {
"geohash_grid": {
"field": "location",
"precision": 6
},
"aggs": {
"ts": {
"date_histogram": {
"field": "timestampMs",
"interval": "15m",
"format": "DDD yyyy-MM-dd HH:mm"
}
}
}
}
}
}
}
}
...will yield the following result:
{
"aggregations": {
"zoomedInView": {
"doc_count": 1,
"zoom1": {
"buckets": [
{
"key": "k362cu",
"doc_count": 1,
"ts": {
"buckets": [
{
"key_as_string": "Thu 2016-04-28 05:15",
"key": 1461820500000,
"doc_count": 1
}
]
}
}
]
}
}
}
}
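Both bucket keys in this response can be reproduced offline. The geohash_grid key is the standard geohash of the point at the given precision (precision 6 cells are roughly 1.2 km × 0.6 km; for the ~100 m grouping asked about in the question, precision 7 at roughly 150 m per cell is closer), and the date_histogram key is the epoch-millis timestamp floored to the 15-minute interval. A self-contained sketch of both, using the textbook algorithms rather than Elasticsearch's own code:

```python
GEOHASH_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    acc, nbits, even = 0, 0, True  # even-numbered bits refine longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                acc = (acc << 1) | 1
                lon_lo = mid
            else:
                acc <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                acc = (acc << 1) | 1
                lat_lo = mid
            else:
                acc <<= 1
                lat_hi = mid
        even = not even
        nbits += 1
        if nbits == 5:  # every 5 bits yield one base32 character
            chars.append(GEOHASH_BASE32[acc])
            acc, nbits = 0, 0
    return "".join(chars)

def histogram_key(timestamp_ms: int, interval_ms: int) -> int:
    """A date_histogram bucket key is the timestamp floored to the interval (UTC)."""
    return timestamp_ms - (timestamp_ms % interval_ms)

# Flooring timestampMs 1461820561530 to 15 minutes (900000 ms) gives
# 1461820500000, exactly the "key" shown in the ts buckets above.
```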
UPDATE
Following up on our discussion, here is a solution that could work for you. Using Logstash, you can call your API and retrieve the big JSON document (using the http_poller
input), extract/transform all locations, and sink them into Elasticsearch (with the elasticsearch
output) very easily.
Here is how each event gets formatted as depicted in my initial answer:
- Using http_poller you can retrieve the JSON locations (note that I've set the polling interval to 1 day, but you can change that to some other value, or simply run Logstash manually each time you want to retrieve the locations)
- Then we split the locations array into individual events
- Then we divide the latitude/longitude fields by 10,000,000 to get proper coordinates
- We also need to clean it up a bit by moving and removing some fields
- Finally, we just send each event to Elasticsearch
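The split/convert/clean steps above can also be sketched in plain Python, independent of Logstash (the field names come from the question; the function name is mine):

```python
def explode_locations(payload: dict) -> list:
    """Turn one big {"locations": [...]} document into per-location events,
    converting E7 integer coordinates and keeping the fields we care about."""
    events = []
    for loc in payload["locations"]:
        event = {
            "timestampMs": loc["timestampMs"],
            "accuracy": loc.get("accuracy"),
            "location": {
                "lat": loc["latitudeE7"] / 10_000_000,
                "lon": loc["longitudeE7"] / 10_000_000,
            },
        }
        # Carry extra fields along untouched, like the Logstash config does.
        if "junk_i_want_to_save_but_ignore" in loc:
            event["junk_i_want_to_save_but_ignore"] = loc["junk_i_want_to_save_but_ignore"]
        events.append(event)
    return events

# First record from the question:
sample = {"locations": [{"timestampMs": "1461820561530",
                         "latitudeE7": -378107308,
                         "longitudeE7": 1449654070,
                         "accuracy": 35}]}
events = explode_locations(sample)
```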
Logstash configuration locations.conf:
input {
http_poller {
urls => {
get_locations => {
method => get
url => "http://your_api.com/locations.json"
headers => {
Accept => "application/json"
}
}
}
request_timeout => 60
interval => 86400000
codec => "json"
}
}
filter {
split {
field => "locations"
}
ruby {
code => "
event['location'] = {
'lat' => event['locations']['latitudeE7'] / 10000000.0,
'lon' => event['locations']['longitudeE7'] / 10000000.0
}
"
}
mutate {
add_field => {
"timestampMs" => "%{[locations][timestampMs]}"
"accuracy" => "%{[locations][accuracy]}"
"junk_i_want_to_save_but_ignore" => "%{[locations][junk_i_want_to_save_but_ignore]}"
}
remove_field => [
"locations", "@timestamp", "@version"
]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "locations"
document_type => "location"
}
}
You can then run it with the following command:
bin/logstash -f locations.conf
When that has run, you can launch your search query and you should get what you expect.