Get the count of users who made a specific number of purchases in ElasticSearch

Problem description

Each user in my shop can make multiple purchases, and information about each purchase is stored in ElasticSearch 6.5.

I would like to count how many users made a specific number of purchases in my shop, for example one or three. We would base our search on documents called user_purchases that look like this:

{
    "user_id" : 1,
    "total_value" : 111.56,
    "total_products" : 2
}

If we were to do this in SQL, we would do something like:

SELECT COUNT(u.id) FROM users u WHERE u.id IN (SELECT up.user_id FROM user_purchases up GROUP BY up.user_id HAVING COUNT(up.id) = 1);

SELECT COUNT(u.id) FROM users u WHERE u.id IN (SELECT up.user_id FROM user_purchases up GROUP BY up.user_id HAVING COUNT(up.id) = 3);
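For reference, an aggregate such as COUNT() cannot appear in a WHERE clause. The per-user count has to be filtered with HAVING after GROUP BY. Below is a minimal runnable sketch of that query using Python's built-in sqlite3 module. The sample rows are hypothetical: one user with three purchases and two users with one purchase each. The helper name users_with_purchase_count is also made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE user_purchases (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        total_value REAL,
        total_products INTEGER
    );
    INSERT INTO users (id) VALUES (1), (2), (3);
    -- user 1 has three purchases; users 2 and 3 have one each
    INSERT INTO user_purchases (user_id, total_value, total_products) VALUES
        (1, 111.00, 2), (1, 199.99, 1), (1, 890.56, 2),
        (2, 223.56, 2),
        (3, 4.56, 2);
    """
)


def users_with_purchase_count(n: int) -> int:
    """Count users that have exactly n rows in user_purchases (hypothetical helper)."""
    # HAVING filters the groups produced by GROUP BY, so it can use COUNT().
    sql = """
        SELECT COUNT(u.id) FROM users u
        WHERE u.id IN (
            SELECT up.user_id FROM user_purchases up
            GROUP BY up.user_id
            HAVING COUNT(up.id) = ?
        )
    """
    return conn.execute(sql, (n,)).fetchone()[0]


print(users_with_purchase_count(1))  # 2
print(users_with_purchase_count(3))  # 1
```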

I have been using ElasticSearch for some time now, but this case is giving me a serious headache. I have tried to find similar examples, but Google is not cooperating very well, and the ES documentation does not have many examples that would help me figure it out.

It seems it would be best to make a separate query for each case (one purchase, three purchases) and maybe use Painless + ctx scripting, but I just don't know where to start looking. I have worked with it a little for less complicated 'if it contains this, then do that' actions, but this one will take me a very long time to figure out.

If anyone could help out and give an example of a query using the search or multisearch API (or any other API that would work), I would be very grateful!

Recommended answer

Here is how I managed to accomplish what I need. Let's say we have an index called visitor_carts with documents like these:

{
    "visitor_id" : 1,
    "total_value" : 111,
    "total_products" : 2
}

{
    "visitor_id" : 1,
    "total_value" : 199.99,
    "total_products" : 1
}

{
    "visitor_id" : 1,
    "total_value" : 890.56,
    "total_products" : 2
}

{
    "visitor_id" : 2,
    "total_value" : 223.56,
    "total_products" : 2
}

{
    "visitor_id" : 3,
    "total_value" : 4.56,
    "total_products" : 2
}

There is a solution: the scripted metric aggregation. With it you can build pretty much anything you want. The downside is that you have to familiarize yourself with Painless scripting. The documentation in that regard is quite hard to understand. On top of that, the version-specific docs do not seem to be very well maintained: some of what is in the Painless documentation does not work with my version of ElasticSearch, 6.5, even though it should according to that documentation. So a word of warning: if something does not work, keep looking for more examples. I have found the usage examples here very useful. Anyway, here is the working solution:

POST visitor_carts/_search
{
  "size" : 0,
  "query" : {
    "match_all" : {}
  },
  "aggs": {
    "purchases": {
      "scripted_metric": {
        "init_script" : "state['visitorPurchases'] = [:]",
        "map_script" : "if (state['visitorPurchases'].containsKey(doc['visitor_id'].value)) {state['visitorPurchases'][doc['visitor_id'].value]++} else {state['visitorPurchases'][doc['visitor_id'].value] = 1}",
        "combine_script": "def combine = [:]; for (visitor in state['visitorPurchases'].entrySet()) {if (combine.containsKey(visitor.getValue().toString())) {combine[visitor.getValue().toString()]++} else {combine[visitor.getValue().toString()] = 1}} return combine",
        "reduce_script": "def reduce = [:]; for (shard in states) { for (count in shard.entrySet()) {if (reduce.containsKey(count.getKey())) {reduce[count.getKey()] += count.getValue()} else {reduce[count.getKey()] = count.getValue()}}} return reduce"
      }
    }
  }
}

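The init_script creates an empty map for each shard. In map_script, the aggregation goes through all the documents matched by the query and counts the occurrences of every visitor_id. Then, in combine_script, it takes what map_script prepared and groups the result by occurrence count. Because combine_script runs once per shard, we need the reduce_script to bring together the result sets from every shard. This assumes that all documents of a given visitor are stored on the same shard. Otherwise, that visitor is counted separately on each shard that holds some of their purchases. The result maps each purchase count to the number of visitors with exactly that many purchases (here, two visitors made one purchase and one visitor made three). It looks like this: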

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "purchases" : {
      "value" : {
        "1" : 2,
        "3" : 1
      }
    }
  }
}
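To make the four phases concrete, here is a small local simulation in Python. It is not Elasticsearch code. The function names are hypothetical, and each one mirrors one of the Painless scripts in the request above. Fed the five sample visitor_carts documents as a single shard, it produces the same counts as the response:

```python
from typing import Any, Dict, List

# Local simulation of the scripted_metric phases; names are hypothetical.


def init_script() -> Dict[str, Any]:
    # state['visitorPurchases'] = [:]
    return {"visitorPurchases": {}}


def map_script(state: Dict[str, Any], doc: Dict[str, Any]) -> None:
    # Called once per matching document: count documents per visitor_id.
    purchases = state["visitorPurchases"]
    visitor = doc["visitor_id"]
    purchases[visitor] = purchases.get(visitor, 0) + 1


def combine_script(state: Dict[str, Any]) -> Dict[str, int]:
    # Called once per shard: {visitor: purchases} -> {str(purchases): visitors}.
    combine: Dict[str, int] = {}
    for purchases in state["visitorPurchases"].values():
        key = str(purchases)
        combine[key] = combine.get(key, 0) + 1
    return combine


def reduce_script(states: List[Dict[str, int]]) -> Dict[str, int]:
    # Called once on the coordinating node: add up the per-shard maps.
    reduce: Dict[str, int] = {}
    for shard in states:
        for key, value in shard.items():
            reduce[key] = reduce.get(key, 0) + value
    return reduce


def run_aggregation(shards: List[List[Dict[str, Any]]]) -> Dict[str, int]:
    # Each inner list holds the documents stored on one shard.
    states = []
    for shard_docs in shards:
        state = init_script()
        for doc in shard_docs:
            map_script(state, doc)
        states.append(combine_script(state))
    return reduce_script(states)


docs = [
    {"visitor_id": 1, "total_value": 111, "total_products": 2},
    {"visitor_id": 1, "total_value": 199.99, "total_products": 1},
    {"visitor_id": 1, "total_value": 890.56, "total_products": 2},
    {"visitor_id": 2, "total_value": 223.56, "total_products": 2},
    {"visitor_id": 3, "total_value": 4.56, "total_products": 2},
]

# One shard, as in the response ("_shards": {"total": 1}).
print(run_aggregation([docs]))  # {'3': 1, '1': 2}
```

Python's dict.get(key, 0) + 1 plays the role of the containsKey/else branches in the Painless scripts.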

Refer to the scripted metric aggregation documentation to find out what each of the script types does, and then follow the usage examples here to construct what you need.

I am too new to ElasticSearch to tell how efficient this solution is. It works well with the few thousand documents I have tested it against, but I have no idea how it will behave with millions or billions of records. If anyone cares to test this, be my guest :)
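One thing worth checking before relying on this approach with a multi-shard index: combine_script groups by purchase count separately on each shard. If a visitor's documents end up on different shards (by default Elasticsearch routes documents by their _id), that visitor is counted once per shard. The Python sketch below is a local simulation, not Elasticsearch code, and the shard layout is made up. It compares the answer's approach with one possible shard-safe variant, in which combine_script would return the per-visitor map unchanged and reduce_script would merge those maps before grouping. Indexing each purchase with routing set to the visitor id is another way to keep a visitor's documents on one shard.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: visitor 1 has three documents, split across two shards.
# Each inner list holds the visitor_id of every document on that shard.
SHARDS: List[List[int]] = [
    [1, 1, 2],  # shard 0
    [1, 3],     # shard 1
]


def count_per_visitor(visitor_ids: Iterable[int]) -> Dict[int, int]:
    # map phase: number of documents (purchases) per visitor on one shard
    counts: Dict[int, int] = {}
    for visitor in visitor_ids:
        counts[visitor] = counts.get(visitor, 0) + 1
    return counts


def histogram(purchase_counts: Iterable[int]) -> Dict[str, int]:
    # {str(purchases): number of visitors with exactly that many purchases}
    hist: Dict[str, int] = {}
    for purchases in purchase_counts:
        key = str(purchases)
        hist[key] = hist.get(key, 0) + 1
    return hist


def group_per_shard(shards: List[List[int]]) -> Dict[str, int]:
    # Approach from the answer: group in combine (per shard), sum in reduce.
    result: Dict[str, int] = {}
    for shard in shards:
        for key, visitors in histogram(count_per_visitor(shard).values()).items():
            result[key] = result.get(key, 0) + visitors
    return result


def group_after_merge(shards: List[List[int]]) -> Dict[str, int]:
    # Shard-safe variant: combine returns per-visitor counts, and reduce
    # merges them across shards before grouping by purchase count.
    merged: Dict[int, int] = {}
    for shard in shards:
        for visitor, purchases in count_per_visitor(shard).items():
            merged[visitor] = merged.get(visitor, 0) + purchases
    return histogram(merged.values())


print(group_per_shard(SHARDS))    # {'2': 1, '1': 3}  (visitor 1 counted twice)
print(group_after_merge(SHARDS))  # {'3': 1, '1': 2}
```

The trade-off of the merged variant is that every shard sends one entry per distinct visitor to the coordinating node, so memory there grows with the number of visitors. Treat this as an untested sketch of the idea, not a drop-in Painless script.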
