Getting data for histogram plot

Question

Is there a way to specify bin sizes in MySQL? Right now, I am trying the following SQL query:

select total, count(total) from faults GROUP BY total;

The data that is being generated is good enough but there are just too many rows. What I need is a way to group the data into predefined bins. I can do this from a scripting language, but is there a way to do it directly in SQL?

Example:

+-------+--------------+
| total | count(total) |
+-------+--------------+
|    30 |            1 | 
|    31 |            2 | 
|    33 |            1 | 
|    34 |            3 | 
|    35 |            2 | 
|    36 |            6 | 
|    37 |            3 | 
|    38 |            2 | 
|    41 |            1 | 
|    42 |            5 | 
|    43 |            1 | 
|    44 |            7 | 
|    45 |            4 | 
|    46 |            3 | 
|    47 |            2 | 
|    49 |            3 | 
|    50 |            2 | 
|    51 |            3 | 
|    52 |            4 | 
|    53 |            2 | 
|    54 |            1 | 
|    55 |            3 | 
|    56 |            4 | 
|    57 |            4 | 
|    58 |            2 | 
|    59 |            2 | 
|    60 |            4 | 
|    61 |            1 | 
|    63 |            2 | 
|    64 |            5 | 
|    65 |            2 | 
|    66 |            3 | 
|    67 |            5 | 
|    68 |            5 | 
+-------+--------------+

What I am looking for:

+------------+---------------+
| total      | count(total)  |
+------------+---------------+
|    30 - 40 |            23 | 
|    40 - 50 |            15 | 
|    50 - 60 |            51 | 
|    60 - 70 |            45 | 
+------------+---------------+

I guess this cannot be achieved in a straightforward manner, but a reference to any related stored procedure would be fine as well.

Recommended answer

This is a post about a super quick-and-dirty way to create a histogram in MySQL for numeric values.

There are multiple other ways to create histograms that are better and more flexible, using CASE statements and other types of complex logic. This method wins me over time and time again since it's just so easy to modify for each use case, and so short and concise. This is how you do it:

SELECT ROUND(numeric_value, -2)    AS bucket, -- round to the nearest 100
       COUNT(*)                    AS COUNT,  -- rows that fall in this bucket
       RPAD('', LN(COUNT(*)), '*') AS bar     -- bar length on a log scale
FROM   my_table
GROUP  BY bucket;

Just change numeric_value to whatever your column is, change the rounding increment, and that's it. I've put the bars on a logarithmic scale, so that they don't grow too much when you have large values.

numeric_value should be offset in the ROUNDing operation, based on the rounding increment, in order to ensure the first bucket contains as many elements as the following buckets.

For example, with ROUND(numeric_value, -1), values in the range [0,4] (5 elements) land in the first bucket, [5,14] (10 elements) in the second, and [15,24] in the third, unless numeric_value is offset appropriately via ROUND(numeric_value - 5, -1).
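The uneven first bucket can be checked with a small sketch. It is written in Python rather than run against MySQL, and it assumes that ROUND sends halves upward for non-negative integers, which is what the [0,4] / [5,14] split above implies. The helper names (round_to_tens, floor_to_tens, bucket_sizes) are made up for this illustration. Flooring to the bin's lower bound is shown as one alternative that gives equal-width bins without an offset.

```python
from collections import Counter


def round_to_tens(x):
    """Round a non-negative integer to the nearest 10, halves going up (assumed ROUND behaviour)."""
    return (x + 5) // 10 * 10


def floor_to_tens(x):
    """Lower bound of the 10-wide bin that contains the non-negative integer x."""
    return x // 10 * 10


def bucket_sizes(bucket_fn, values):
    """Count how many of the given values land in each bucket."""
    return dict(Counter(bucket_fn(v) for v in values))


if __name__ == "__main__":
    values = range(0, 30)
    print(bucket_sizes(round_to_tens, values))  # edge buckets are half-sized
    print(bucket_sizes(floor_to_tens, values))  # every bucket holds 10 values
```

For the integers 0 to 29, rounding gives buckets of 5, 10, 10 and 5 values, while flooring gives three buckets of 10 values each.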

This is an example of such a query on some random data that looks pretty sweet. Good enough for a quick evaluation of the data.

+--------+----------+-----------------+
| bucket | count    | bar             |
+--------+----------+-----------------+
|   -500 |        1 |                 |
|   -400 |        2 | *               |
|   -300 |        2 | *               |
|   -200 |        9 | **              |
|   -100 |       52 | ****            |
|      0 |  5310766 | *************** |
|    100 |    20779 | **********      |
|    200 |     1865 | ********        |
|    300 |      527 | ******          |
|    400 |      170 | *****           |
|    500 |       79 | ****            |
|    600 |       63 | ****            |
|    700 |       35 | ****            |
|    800 |       14 | ***             |
|    900 |       15 | ***             |
|   1000 |        6 | **              |
|   1100 |        7 | **              |
|   1200 |        8 | **              |
|   1300 |        5 | **              |
|   1400 |        2 | *               |
|   1500 |        4 | *               |
+--------+----------+-----------------+
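The bar lengths in the table above are consistent with the natural log of the count being rounded to a whole number of characters. The sketch below reproduces that in Python. It assumes that RPAD rounds a non-integer length, which is inferred from the table and not taken from the MySQL documentation, and the function name bar is made up for this illustration.

```python
import math


def bar(count):
    """Return a bar of '*' whose length is ln(count) rounded to the nearest integer."""
    # Assumption: the length is rounded, not truncated; this matches the sample table.
    return "*" * round(math.log(count))


if __name__ == "__main__":
    for count in (1, 2, 52, 20779, 5310766):
        print(f"{count:>8} | {bar(count)}")
```

A count of 1 yields an empty bar because ln(1) is 0, which is why the -500 bucket in the table has no stars.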

Some notes: Ranges that have no match will not appear in the count - you will not have a zero in the count column. Also, I'm using the ROUND function here. You can just as easily replace it with TRUNCATE if you feel it makes more sense to you.
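As a rough illustration of both notes, the sketch below bins the question's sample data into 10-wide ranges and then adds zero entries for any empty range. It runs the SQL through Python's built-in sqlite3 module as a stand-in for MySQL, so some details are assumptions: SQLite's integer "/" truncates, whereas in MySQL the same expression would need something like FLOOR(total / 10) * 10 or TRUNCATE(total, -1). The function names binned_counts and with_empty_bins are made up for this example.

```python
import sqlite3

# (total, occurrences) pairs copied from the sample output in the question.
SAMPLE = [
    (30, 1), (31, 2), (33, 1), (34, 3), (35, 2), (36, 6), (37, 3), (38, 2),
    (41, 1), (42, 5), (43, 1), (44, 7), (45, 4), (46, 3), (47, 2), (49, 3),
    (50, 2), (51, 3), (52, 4), (53, 2), (54, 1), (55, 3), (56, 4), (57, 4),
    (58, 2), (59, 2), (60, 4), (61, 1), (63, 2), (64, 5), (65, 2), (66, 3),
    (67, 5), (68, 5),
]


def binned_counts(pairs, width=10):
    """Count rows per fixed-width bin, keyed by the bin's lower bound."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE faults (total INTEGER)")
    # Expand each (total, occurrences) pair back into individual rows.
    conn.executemany(
        "INSERT INTO faults (total) VALUES (?)",
        [(total,) for total, n in pairs for _ in range(n)],
    )
    # SQLite integer division truncates, so for non-negative totals
    # (total / w) * w is the lower bound of the bin containing total.
    rows = conn.execute(
        "SELECT (total / :w) * :w AS bucket, COUNT(*) AS n "
        "FROM faults GROUP BY bucket ORDER BY bucket",
        {"w": width},
    ).fetchall()
    conn.close()
    return dict(rows)


def with_empty_bins(counts, width=10):
    """Add a zero entry for every bin between the lowest and highest bucket."""
    if not counts:
        return {}
    lo, hi = min(counts), max(counts)
    return {b: counts.get(b, 0) for b in range(lo, hi + width, width)}


if __name__ == "__main__":
    for bucket, n in with_empty_bins(binned_counts(SAMPLE)).items():
        print(f"{bucket} - {bucket + 10}: {n}")
```

On the sample rows this gives 20, 26, 27 and 27 for the four ranges. These are computed from the rows listed in the question and differ from the illustrative figures in its desired-output table.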

I found it here: http://blog.shlomoid.com/2011/08/how-to-quickly-create-histogram-in.html
