ElasticSearch Join Filter: Using subquery results as filter input possible?

Question

I have a Use Case where I want to use ElasticSearch for realtime analytics. Within that, I want to be able to calculate some simple affinity scores.

Those are currently defined using the number of transactions a filtered-by-criteria user base performs, compared with the complete user base.

From my understanding, I'd need to do the following:

  1. Get the distinct transactions of my filtered user base
  2. Query for these transaction (types) in the complete user base
  3. Do the calculation (norming etc.)

To get the "distinct transactions" for the filtered user base, I currently use a Terms Filter Query with faceting, which returns all terms (transaction types). As far as I understand, I'd need to use this result as the input of a Terms Filter Query in the second step to get the result I want.

I read that there's a pull request on GitHub which seems to implement this (https://github.com/elasticsearch/elasticsearch/pull/3278), but it's not really obvious to me whether this is already usable in a current release or not.

If not, are there any workarounds I could use to implement this?

As additional info, here is my sample mapping:

curl -XPUT 'http://localhost:9200/store/user/_mapping' -d '
{
  "user": {
    "properties": {
      "user_id": { "type": "integer" },
      "gender": { "type": "string", "index" : "not_analyzed" },
      "age": { "type": "integer" },
      "age_bracket": { "type": "string", "index" : "not_analyzed" },
      "current_city": { "type": "string", "index" : "not_analyzed" },
      "relationship_status": { "type": "string", "index" : "not_analyzed" },
      "transactions" : {
        "type": "nested",
        "properties" : {
          "t_id": { "type": "integer" },
          "t_oid": { "type": "string", "index" : "not_analyzed" },
          "t_name": { "type": "string", "index" : "not_analyzed" },
          "tt_id": { "type": "integer" },
          "tt_name": { "type": "string", "index" : "not_analyzed" }
        }
      }
    }
  }
}'

So, for my actual desired result for my example Use Case, I'd have the following:

  1. My filtered user base would have this example filter: "gender": "male" & "relationship_status": "single". For these, I want to get the distinct transaction types (field "tt_name" of the nested document) and count the number of distinct user_ids.
  2. Next, I want to query my complete user base (no filter other than the list of transaction types from 1.) and count the number of distinct user_ids
  3. Do the "affinity" calculations

Solution

Here's a link to a runnable example:

http://sense.qbox.io/gist/9da6a30fc12c36f90ae39111a08df283b56ec03c

It presumes documents that look like:

{ "transaction_type" : "some_transaction", "user_base" : "some_user_base_id" }

The query is set to return no results, since aggregations take care of computing the stats you're looking for:

{
  "size" : 0,
  "query" : {
    "match_all" : {}
  },
  "aggs" : {
    "distinct_transactions" : {
      "terms" : {
        "field" : "transaction_type",
        "size" : 20
      },
      "aggs" : {
        "by_user_base" : {
          "terms" : {
            "field" : "user_base",
            "size" : 20
          }
        }
      }
    }
  }
}
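
The query above assumes flat documents. For the nested mapping shown in the question, a variant might look like the following rough sketch. It only builds the request body and does not contact a cluster. The helper name build_filtered_nested_query is hypothetical, and the body assumes the 1.x-era "filtered" query (to match the "string" / "not_analyzed" mapping) plus a release that supports the "nested" and "reverse_nested" aggregations; later releases replaced "filtered" with a "bool" query that has a "filter" clause.

```python
import json


def build_filtered_nested_query(filters, size=20):
    """Build a request body counting user documents per nested transaction type.

    Hypothetical helper: `filters` maps a not_analyzed field to the exact
    value it must have, e.g. {"gender": "male"}.
    """
    # One exact-match term filter per criterion, in a stable order.
    term_filters = [
        {"term": {field: value}} for field, value in sorted(filters.items())
    ]
    return {
        "size": 0,  # no hits, only aggregations
        "query": {
            "filtered": {  # assumption: 1.x-era syntax
                "query": {"match_all": {}},
                "filter": {"bool": {"must": term_filters}},
            }
        },
        "aggs": {
            "transactions": {
                # Step into the nested "transactions" documents.
                "nested": {"path": "transactions"},
                "aggs": {
                    "distinct_transactions": {
                        "terms": {"field": "transactions.tt_name", "size": size},
                        # Step back out to the parent user documents, so the
                        # doc_count here counts users, not transactions.
                        "aggs": {"users": {"reverse_nested": {}}},
                    }
                },
            }
        },
    }


body = build_filtered_nested_query(
    {"gender": "male", "relationship_status": "single"}
)
print(json.dumps(body, indent=2))
```

If each user is stored as exactly one document, the doc_count of the "users" sub-aggregation would be the number of distinct users per transaction type. Passing an empty filter dictionary gives the same aggregation over the complete user base.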

And here's what the result looks like:

  "aggregations": {
      "distinct_transactions": {
         "buckets": [
            {
               "key": "subscribe",
               "doc_count": 4,
               "by_user_base": {
                  "buckets": [
                     {
                        "key": "2",
                        "doc_count": 3
                     },
                     {
                        "key": "1",
                        "doc_count": 1
                     }
                  ]
               }
            },
            {
               "key": "purchase",
               "doc_count": 3,
               "by_user_base": {
                  "buckets": [
                     {
                        "key": "1",
                        "doc_count": 2
                     },
                     {
                        "key": "2",
                        "doc_count": 1
                     }
                  ]
               }
            }
         ]
      }
   }

So, inside of "aggregations", you'll have a list of "distinct_transactions". The key will be the transaction type, and the doc_count will represent the total transactions by all users.

Inside each "distinct_transactions" bucket there's "by_user_base", which is another terms agg (a sub-aggregation, not the nested field type). Just like the transactions, the key will represent the user base name (or ID or whatever) and the doc_count will represent that user base's number of transactions.
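
To make the response structure concrete, here is a minimal sketch that flattens the buckets shown above into a dictionary and derives a ratio from them. The function names are hypothetical, and the affinity formula is only an assumption (the share of a transaction type within one user base divided by its share across all user bases); the question does not define the exact calculation. It also assumes the "size" of both terms aggregations is large enough to return every bucket.

```python
# The "aggregations" part of the response shown above.
aggregations = {
    "distinct_transactions": {
        "buckets": [
            {"key": "subscribe", "doc_count": 4, "by_user_base": {"buckets": [
                {"key": "2", "doc_count": 3}, {"key": "1", "doc_count": 1}]}},
            {"key": "purchase", "doc_count": 3, "by_user_base": {"buckets": [
                {"key": "1", "doc_count": 2}, {"key": "2", "doc_count": 1}]}},
        ]
    }
}


def counts_by_transaction(aggs):
    """Map each transaction type to its total and per-user-base counts."""
    result = {}
    for bucket in aggs["distinct_transactions"]["buckets"]:
        per_base = {
            b["key"]: b["doc_count"] for b in bucket["by_user_base"]["buckets"]
        }
        result[bucket["key"]] = {
            "total": bucket["doc_count"],
            "by_user_base": per_base,
        }
    return result


def affinity(counts, transaction, user_base):
    """Assumed definition: share within the user base / share overall."""
    in_base = counts[transaction]["by_user_base"].get(user_base, 0)
    base_total = sum(c["by_user_base"].get(user_base, 0) for c in counts.values())
    overall = counts[transaction]["total"]
    overall_total = sum(c["total"] for c in counts.values())
    if base_total == 0 or overall == 0:
        return 0.0  # nothing to compare against
    return (in_base / base_total) / (overall / overall_total)


counts = counts_by_transaction(aggregations)
for tx in counts:
    for base in sorted(counts[tx]["by_user_base"]):
        print(tx, base, round(affinity(counts, tx, base), 4))
```

Under this assumed definition, a value above 1 would mean the user base performs that transaction type more often than the overall population does.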

Is that what you were looking to do? Hope I helped.
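
If the two-step approach described in the question is still needed, one possible workaround is to chain the two requests on the client side: run the first query, read the bucket keys, and pass them to a terms filter in the second query. The sketch below is an illustration under assumptions, not a built-in join. The `search` argument is a hypothetical callable that sends a request body and returns the parsed JSON response, all function names are made up for this example, and the second body again uses the 1.x-era "filtered" syntax.

```python
def extract_terms(response, agg_name):
    """Collect the bucket keys of a top-level terms aggregation."""
    return [b["key"] for b in response["aggregations"][agg_name]["buckets"]]


def build_terms_filter_query(field, terms, size=0):
    """Build a request body that keeps documents whose field matches any term."""
    return {
        "size": size,
        "query": {
            "filtered": {  # assumption: 1.x-era syntax
                "query": {"match_all": {}},
                "filter": {"terms": {field: terms}},
            }
        },
    }


def run_two_step(search, first_body, agg_name, field):
    """Run the first query, then feed its bucket keys into a second query."""
    first_response = search(first_body)
    terms = extract_terms(first_response, agg_name)
    return search(build_terms_filter_query(field, terms))


# Stand-in for a real client call, so the sketch runs without a cluster.
sent_bodies = []


def fake_search(body):
    sent_bodies.append(body)
    return {"aggregations": {"distinct_transactions": {"buckets": [
        {"key": "subscribe", "doc_count": 4},
        {"key": "purchase", "doc_count": 3},
    ]}}}


first_body = {
    "size": 0,
    "aggs": {"distinct_transactions": {"terms": {"field": "transaction_type"}}},
}
run_two_step(fake_search, first_body, "distinct_transactions", "transaction_type")
print(sent_bodies[1])
```

In practice the second body would also carry whatever aggregation is needed to count users, and `fake_search` would be replaced by a real client call.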
