Query the latest document of each type on Elasticsearch

Views: 184 | Published: 2017/8/6 23:43:16 | Tags: api, elasticsearch, timestamp

Problem description

I'm trying to run what at first looked like a simple query on Elasticsearch, but I can't seem to get the result I'm looking for.

Here's a brief example of what I'm trying to do:

I have a database of news. Each piece of news contains a source, a headline, a timestamp and a user.

I want to get the latest (by timestamp) headline for each available source, for a given user.

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/news" -d '{
    "mappings": {
        "news": {
            "properties": {
                "source": { "type": "string", "index": "not_analyzed" },
                "headline": { "type": "object" },
                "timestamp": { "type": "date", "format": "date_hour_minute_second_millis" },
                "user": { "type": "string", "index": "not_analyzed" }
            }
        }
    }
}'

# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "CNN", "headline": "Great news", "timestamp": "2015-07-28T00:07:29.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "CNN", "headline": "More great news", "timestamp": "2015-07-28T00:08:23.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "ESPN", "headline": "Sports news", "timestamp": "2015-07-28T00:09:32.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "ESPN", "headline": "More sports news", "timestamp": "2015-07-28T00:10:35.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "Mary", "source": "Yahoo", "headline": "More news", "timestamp": "2015-07-28T00:11:54.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "Mary", "source": "Yahoo", "headline": "Crazy news", "timestamp": "2015-07-28T00:12:31.000"}
'

So how do I get the last CNN and last ESPN headlines from John for example?

I've been looking into the multi search API, but this would mean that I would need to know all the sources beforehand (in this case CNN and ESPN).
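
For comparison, a multi search request for this case might look like the sketch below. It assumes the Elasticsearch 1.x filtered query syntax used elsewhere in this thread, and SOURCES, MSEARCH_BODY and search_known_sources are names made up for the illustration. The list of sources has to be written out up front, which is exactly the limitation described above.

```shell
# Assumption: same endpoint as in the script above.
ELASTICSEARCH_ENDPOINT="${ELASTICSEARCH_ENDPOINT:-http://localhost:9200}"

# The sources must be listed by hand; a source missing here is never queried.
SOURCES="CNN ESPN"

# Build the newline-delimited _msearch body: one (empty) header line
# followed by one search line per source.
MSEARCH_BODY=""
for src in $SOURCES; do
  MSEARCH_BODY="${MSEARCH_BODY}{}
{\"size\":1,\"query\":{\"filtered\":{\"filter\":{\"bool\":{\"must\":[{\"term\":{\"user\":\"John\"}},{\"term\":{\"source\":\"${src}\"}}]}}}},\"sort\":[{\"timestamp\":\"desc\"}]}
"
done

# Sends the request; call this only when a cluster is reachable.
search_known_sources() {
  curl -XPOST "$ELASTICSEARCH_ENDPOINT/news/_msearch" --data-binary "$MSEARCH_BODY"
}
```

Calling search_known_sources would return one response per listed source, so any source not in SOURCES is silently left out.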

Solution

First, please note that I had to change your mapping of the headline field to string, because the headlines in your sample documents are strings, not objects.
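
The changed mapping is not reproduced in this answer. A sketch of what it presumably looked like, identical to the question's mapping except for the headline type, is given below. NEWS_MAPPING and create_news_index are illustrative names, and an existing news index would have to be deleted first, since the type of an existing field cannot be changed in place.

```shell
# Assumption: same endpoint as in the question's script.
ELASTICSEARCH_ENDPOINT="${ELASTICSEARCH_ENDPOINT:-http://localhost:9200}"

# The question's mapping with only "headline" changed from object to string.
NEWS_MAPPING='{
    "mappings": {
        "news": {
            "properties": {
                "source": { "type": "string", "index": "not_analyzed" },
                "headline": { "type": "string" },
                "timestamp": { "type": "date", "format": "date_hour_minute_second_millis" },
                "user": { "type": "string", "index": "not_analyzed" }
            }
        }
    }
}'

# Creates the index; call this only when a cluster is reachable.
create_news_index() {
  curl -XPUT "$ELASTICSEARCH_ENDPOINT/news" -d "$NEWS_MAPPING"
}
```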

So, a query like the following one would retrieve what you expect:

# Filter on user=John, aggregate by source, and for each source keep only
# the latest hit (size 1, sorted by timestamp descending), returning just
# the headline and timestamp.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/news/_search" -d '{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "user": "John"
        }
      }
    }
  },
  "aggs": {
    "sources": {
      "terms": {
        "field": "source"
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "_source": [
               "headline",
               "timestamp"
            ],
            "sort": {
              "timestamp": "desc"
            }
          }
        }
      }
    }
  }
}'
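
Note that the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, so newer clusters reject the request above. A sketch of an equivalent request using a bool query with a filter clause follows. It assumes that user and source are mapped as keyword fields (the replacement for not_analyzed strings), and QUERY and latest_per_source are illustrative names. Versions 6.0 and later also require the Content-Type header.

```shell
# Assumption: same endpoint as in the question's script.
ELASTICSEARCH_ENDPOINT="${ELASTICSEARCH_ENDPOINT:-http://localhost:9200}"

# Same logic as the original request: filter on the user, bucket by source,
# and keep the single newest hit per bucket.
QUERY='{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "term": { "user": "John" }
      }
    }
  },
  "aggs": {
    "sources": {
      "terms": { "field": "source" },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "_source": ["headline", "timestamp"],
            "sort": [ { "timestamp": { "order": "desc" } } ]
          }
        }
      }
    }
  }
}'

# Sends the request; call this only when a cluster is reachable.
latest_per_source() {
  curl -XPOST "$ELASTICSEARCH_ENDPOINT/news/_search" \
       -H 'Content-Type: application/json' -d "$QUERY"
}
```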

That will yield something like this:

{
  ...
  "aggregations" : {
    "sources" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "CNN",
        "doc_count" : 2,
        "latest" : {
          "hits" : {
            "total" : 2,
            "max_score" : null,
            "hits" : [ {
              "_index" : "news",
              "_type" : "news",
              "_id" : "AU7Sh3VDGDddn2ZNuDVl",
              "_score" : null,
              "_source" : {
                "headline" : "More great news",
                "timestamp" : "2015-07-28T00:08:23.000"
              },
              "sort" : [ 1438042103000 ]
            } ]
          }
        }
      }, {
        "key" : "ESPN",
        "doc_count" : 2,
        "latest" : {
          "hits" : {
            "total" : 2,
            "max_score" : null,
            "hits" : [ {
              "_index" : "news",
              "_type" : "news",
              "_id" : "AU7Sh3VDGDddn2ZNuDVn",
              "_score" : null,
              "_source" : {
                "headline" : "More sports news",
                "timestamp" : "2015-07-28T00:10:35.000"
              },
              "sort" : [ 1438042235000 ]
            } ]
          }
        }
      } ]
    }
  }
}
