ElasticSearch Join Filter: Using subquery results as filter input possible?
Problem description
I have a Use Case where I want to use ElasticSearch for realtime analytics. Within that, I want to be able to calculate some simple affinity scores.
Those are currently defined using the number of transactions a filtered-by-criteria user base performs, compared with the complete user base.
From my understanding, I'd need to do the following:
- Get the distinct transactions of my filtered user base
- Query for these transaction (types) in the complete user base
- Do the calculation (norming etc.)
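The norming step is not spelled out above, so the following is only a sketch under an assumed definition: affinity as the share of a transaction type within the filtered user base divided by its share within the complete user base. The function name `affinity` and the numbers are made up for illustration.

```python
def affinity(filtered_count, filtered_total, overall_count, overall_total):
    """Assumed definition: share in the filtered base / share in the full base.

    A value above 1.0 means the transaction type is over-represented in the
    filtered user base. Returns None when a share cannot be computed.
    """
    if filtered_total == 0 or overall_total == 0 or overall_count == 0:
        return None
    filtered_share = filtered_count / filtered_total
    overall_share = overall_count / overall_total
    return filtered_share / overall_share


# Illustrative numbers only, not taken from any real index.
score = affinity(filtered_count=25, filtered_total=100,
                 overall_count=125, overall_total=1000)
print(score)
```

If the real scoring uses a different norm (for example a difference of shares instead of a ratio), only the last line of the function would need to change.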
To get the "distinct transactions" for the filtered user base, I currently use a Terms Filter Query with faceting which returns all terms (transaction types). As far as I understand, I'd need to use this result as the input of a Terms Filter Query for the second step to be able to receive the result I want.
I read that there's a pull request on GitHub which seems to implement this (https://github.com/elasticsearch/elasticsearch/pull/3278), but it's not really obvious to me whether this is already usable in a current release or not.
If not, are there any workarounds I could use to implement this?
As additional info, here is my sample mapping:
curl -XPUT 'http://localhost:9200/store/user/_mapping' -d '
{
  "user": {
    "properties": {
      "user_id": { "type": "integer" },
      "gender": { "type": "string", "index": "not_analyzed" },
      "age": { "type": "integer" },
      "age_bracket": { "type": "string", "index": "not_analyzed" },
      "current_city": { "type": "string", "index": "not_analyzed" },
      "relationship_status": { "type": "string", "index": "not_analyzed" },
      "transactions": {
        "type": "nested",
        "properties": {
          "t_id": { "type": "integer" },
          "t_oid": { "type": "string", "index": "not_analyzed" },
          "t_name": { "type": "string", "index": "not_analyzed" },
          "tt_id": { "type": "integer" },
          "tt_name": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}'
So, for my actual desired result for my example Use Case, I'd have the following:
- My filtered user base would have this example filter: "gender": "male" & "relationship_status": "single". For these, I want to get the distinct transaction types (field "tt_name" of the nested document) and count the number of distinct user_ids.
- Next, I want to query my complete user base (no filter other than the list of transaction types from step 1) and count the number of distinct user_ids.
- Do the "affinity" calculations
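One possible way to get both sets of counts in a single request, without feeding one query's result into another, is to run the same nested aggregation twice: once over all users and once under a filter aggregation. The sketch below only builds the request body. It assumes one document per user_id (so a parent document count equals a distinct user count) and that nested and reverse_nested aggregations are available in the release in use. The helper name `build_affinity_request` and the aggregation names are hypothetical.

```python
import json


def build_affinity_request(filters, size=20):
    """Build a search body that counts users per transaction type twice:
    for all users, and for users matching the term `filters`."""

    def users_per_type():
        # Return a fresh dict each time so the two branches share no objects.
        return {
            "nested": {"path": "transactions"},
            "aggs": {
                "types": {
                    "terms": {"field": "transactions.tt_name", "size": size},
                    # reverse_nested steps back to the parent user documents,
                    # so its doc_count counts users rather than transactions.
                    "aggs": {"users": {"reverse_nested": {}}},
                }
            },
        }

    term_filters = [{"term": {field: value}} for field, value in filters.items()]
    return {
        "size": 0,  # no hits needed, only aggregation results
        "query": {"match_all": {}},
        "aggs": {
            "all_users": users_per_type(),
            "filtered_users": {
                "filter": {"bool": {"must": term_filters}},
                "aggs": {"per_type": users_per_type()},
            },
        },
    }


body = build_affinity_request({"gender": "male", "relationship_status": "single"})
print(json.dumps(body, indent=2))
```

The printed JSON could be sent to the _search endpoint of the store index. The per-type user counts would then sit under all_users.types and filtered_users.per_type.types in the response, ready for the affinity calculation.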
Recommended answer
Here's a link to a runnable example:
http://sense.qbox.io/gist/9da6a30fc12c36f90ae39111a08df283b56ec03c
It presumes documents that look like:
{ "transaction_type" : "some_transaction", "user_base" : "some_user_base_id" }
The query is set to return no hits ("size": 0), since the aggregations take care of computing the stats you're looking for:
{
"size" : 0,
"query" : {
"match_all" : {}
},
"aggs" : {
"distinct_transactions" : {
"terms" : {
"field" : "transaction_type",
"size" : 20
},
"aggs" : {
"by_user_base" : {
"terms" : {
"field" : "user_base",
"size" : 20
}
}
}
}
}
}
And here's what the result looks like:
"aggregations": {
"distinct_transactions": {
"buckets": [
{
"key": "subscribe",
"doc_count": 4,
"by_user_base": {
"buckets": [
{
"key": "2",
"doc_count": 3
},
{
"key": "1",
"doc_count": 1
}
]
}
},
{
"key": "purchase",
"doc_count": 3,
"by_user_base": {
"buckets": [
{
"key": "1",
"doc_count": 2
},
{
"key": "2",
"doc_count": 1
}
]
}
}
]
}
}
So, inside "aggregations" you'll find the "distinct_transactions" buckets. Each bucket's key is a transaction type, and its doc_count is the total number of transactions of that type across all users.
Inside each of those buckets there's "by_user_base", another terms agg that runs as a sub-aggregation (nested in the request, not a "nested"-type field). Just like the transactions, the key is the user base name (or ID or whatever) and the doc_count is the number of transactions of that type made by that user base.
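As a rough illustration of reading that structure, the sketch below walks the sample response shown above and turns each by_user_base count into a share of its transaction type's total. The function name `shares_by_user_base` is made up for this example, and the response is pasted in as a plain dict rather than fetched from a cluster.

```python
# The sample aggregation result from above, as a Python dict.
response = {
    "aggregations": {
        "distinct_transactions": {
            "buckets": [
                {"key": "subscribe", "doc_count": 4,
                 "by_user_base": {"buckets": [
                     {"key": "2", "doc_count": 3},
                     {"key": "1", "doc_count": 1}]}},
                {"key": "purchase", "doc_count": 3,
                 "by_user_base": {"buckets": [
                     {"key": "1", "doc_count": 2},
                     {"key": "2", "doc_count": 1}]}},
            ]
        }
    }
}


def shares_by_user_base(response):
    """Map transaction type -> {user base -> share of that type's transactions}."""
    shares = {}
    for tx in response["aggregations"]["distinct_transactions"]["buckets"]:
        total = tx["doc_count"]  # all transactions of this type
        shares[tx["key"]] = {
            ub["key"]: ub["doc_count"] / total
            for ub in tx["by_user_base"]["buckets"]
        }
    return shares


shares = shares_by_user_base(response)
print(shares["subscribe"])  # {'2': 0.75, '1': 0.25}
```

Note that with "size": 20 on the inner terms agg, the per-user-base counts only add up to the outer doc_count when a transaction type has at most 20 user bases.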
Is that what you were looking to do? Hope I helped.