How do I get an accurate sum in Elasticsearch based on source hits?


Problem description

How do I get an exact sum aggregation in Elasticsearch? For reference, I am currently using Elasticsearch 5.6, and my index mapping looks like this:

{
  "my-index":{
    "mappings":{
      "my-type":{
        "properties":{
          "id":{
            "type":"keyword"
          },
          "fieldA":{
            "type":"double"
          },
          "fieldB":{
            "type":"double"
          },
          "fieldC":{
            "type":"double"
          },
          "version":{
            "type":"long"
          }
        }
      }
    }
  }
}

The search query generated (using the Java client) is:

{
 /// ... some filters here
 "aggregations" : {
       "fieldA" : {
         "sum" : {
           "field" : "fieldA"
         }
       },
       "fieldB" : {
         "sum" : {
           "field" : "fieldB"
         }
       },
       "fieldC" : {
         "sum" : {
           "field" : "fieldC"
         }
       }
     }
}

However, the search response contains the following:

{
    "took": 10,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 3.8466966,
        "hits": [
            {
                "_index": "my-index",
                "_type": "my-type",
                "_id": "25a203b63e264fd2be13db006684b06d",
                "_score": 3.8466966,
                "_source": {
                    "fieldC": 108,
                    "fieldA": 108,
                    "fieldB": 0
                }
            },
            {
                "_index": "my-index",
                "_type": "my-type",
                "_id": "25a203b63e264fd2be13db006684b06d",
                "_score": 3.8466966,
                "_source": {
                    "fieldC": -36,
                    "fieldA": 108,
                    "fieldB": 144
                }
            },
            {
                "_index": "my-index",
                "_type": "my-type",
                "_id": "25a203b63e264fd2be13db006684b06d",
                "_score": 3.8466966,
                "_source": {
                    "fieldC": -7.2,
                    "fieldA": 1.8,
                    "fieldB": 9
                }
            },
            {
                "_index": "my-index",
                "_type": "my-type",
                "_id": "25a203b63e264fd2be13db006684b06d",
                "_score": 3.8466966,
                "_source": {
                    "fieldC": 14.85,
                    "fieldA": 18.9,
                    "fieldB": 4.05
                }
            },
            {
                "_index": "my-index",
                "_type": "my-type",
                "_id": "25a203b63e264fd2be13db006684b06d",
                "_score": 3.8466966,
                "_source": {
                    "fieldC": 36,
                    "fieldA": 36,
                    "fieldB": 0
                }
            }
        ]
    },
    "aggregations": {
        "fieldA": {
            "value": 272.70000000000005
        },
        "fieldB": {
            "value": 157.05
        },
        "fieldC": {
            "value": 115.64999999999999
        }
    }
}

Why am I getting:

115.64999999999999 instead of 115.65 in fieldC
272.70000000000005 instead of 272.7 in fieldA

Should I use float instead of double? Or is there a way to change the query without resorting to a Painless script, or to Java's BigDecimal with a specified precision and rounding mode?

Recommended answer

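This is caused by floating-point precision rather than by Elasticsearch itself. Values such as 7.2 and 14.85 have no exact binary representation as IEEE 754 doubles, so adding them accumulates a small rounding error. JavaScript numbers are the same kind of double, which makes the effect easy to reproduce there.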

Here are two ways to check this:

A. If you have node.js installed, type node at the prompt and then enter the sum of all fieldC values:

 $ node
 108 - 36 - 7.2 + 14.85 + 36
 115.64999999999999            <--- this is the answer

B. Open your browser's developer tools and pick the Console view. Then type the same sum as above:

 > 108-36-7.2+14.85+36
 < 115.64999999999999

As you can see, both results match the fieldC value in your ES response.
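The effect is not specific to JavaScript: Java (which Elasticsearch runs on), JavaScript, and Python all use IEEE 754 double-precision numbers for their default floating-point type. The following is a minimal Python sketch, not part of the original answer, that sums the fieldC values from the response both as binary doubles and as exact decimals. The variable names are illustrative only.

```python
from decimal import Decimal

# fieldC values copied from the five hits in the response
field_c = ["108", "-36", "-7.2", "14.85", "36"]

# Sum as binary doubles, in the same left-to-right order as the node example
double_sum = 0.0
for v in field_c:
    double_sum += float(v)

# Sum as exact decimals, comparable to using BigDecimal on the Java side
decimal_sum = sum(Decimal(v) for v in field_c)

print(double_sum)            # 115.64999999999999
print(decimal_sum)           # 115.65
print(double_sum == 115.65)  # False
```

The decimal sum is exact because each input is parsed from its decimal text and never passes through a binary double.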

One way to circumvent this is to store your numbers either as plain integers (i.e. 1485 instead of 14.85, 3600 instead of 36, etc.) or as scaled_float with a scaling factor of 100 (or larger, depending on the precision you need).
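As a rough illustration of the integer approach, the Python sketch below (not from the original answer) scales each fieldA value by 100 before it would be indexed, sums the integers, and converts back to a decimal only for display. The helper names `to_minor_units` and `format_minor_units` and the factor of 100 are assumptions chosen to match the two-decimal values in the question.

```python
from decimal import Decimal

SCALE = 100  # assumed scaling factor: two decimal places


def to_minor_units(value: str) -> int:
    """Convert a decimal string such as "14.85" to an integer such as 1485."""
    scaled = Decimal(value) * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more precision than scale {SCALE} allows")
    return int(scaled)


def format_minor_units(total: int) -> str:
    """Convert an integer total back to a decimal string for display."""
    return str(Decimal(total) / SCALE)


# fieldA values copied from the five hits in the response
field_a = ["108", "108", "1.8", "18.9", "36"]
stored = [to_minor_units(v) for v in field_a]  # values to index as long
total = sum(stored)                            # integer addition is exact

print(stored)                     # [10800, 10800, 180, 1890, 3600]
print(total)                      # 27270
print(format_minor_units(total))  # 272.7
```

Integers of this magnitude are exactly representable in a double, so a total such as 27270 carries no rounding error, and the division by 100 happens once on the client.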
