Quartiles in SQL query


Question

I have a very simple table like this:

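CREATE TABLE IF NOT EXISTS LuxLog (
  Id INT AUTO_INCREMENT PRIMARY KEY,  -- surrogate key: PRIMARY KEY(Sensor) would allow only one log row per sensor
  Sensor TINYINT,
  Lux INT,
  KEY (Sensor)
)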


It contains thousands of logs from different sensors.


I would like to have Q1 and Q3 for all sensors.

I could run one query per sensor, but it would be better for me to have a single query for all sensors (getting Q1 and Q3 back from one query).

I thought it would be a fairly simple operation, as quartiles are widely used and are among the main statistics in frequency analysis. In practice, I found loads of overcomplicated solutions, while I was hoping to find something neat and simple.

Can anyone give me a hint?


This is a piece of code that I found online, but it is not working for me:

SELECT  SUBSTRING_INDEX(
        SUBSTRING_INDEX(
            GROUP_CONCAT(                 -- 1) make a sorted list of values
                Lux
                ORDER BY Lux
                SEPARATOR ','
            )
        ,   ','                           -- 2) cut at the comma
        ,   75/100 * COUNT(*)        --    at the position beyond the 75% portion
        )
    ,   ','                               -- 3) cut at the comma
    ,   -1                                --    right after the desired list entry
    )                 AS `75th Percentile`
    FROM    LuxLog
    WHERE   Sensor=12
    AND     Lux<>0

I am getting 1 as the return value, whereas it should be a multiple of 10 (10, 20, 30, ..., 1000).
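
A likely explanation for the 1, offered as an assumption rather than something confirmed in this thread: the output of GROUP_CONCAT is capped by the group_concat_max_len system variable, which defaults to 1024 bytes, so with thousands of readings the sorted list is cut short (MySQL only raises a warning). The requested position then lies past the end of the truncated list, the inner SUBSTRING_INDEX returns the whole list, and the outer one returns its last, partially cut entry, for example the 1 of a 100. The sketch below raises the limit for the session, computes the position explicitly with CEIL, and replaces WHERE Sensor=12 with GROUP BY Sensor so that one query returns Q1 and Q3 for every sensor. The 1000000 limit and the nearest-rank convention (the value at position CEIL(p * n) of the sorted list) are choices made for this sketch.

```sql
-- Assumption: MySQL. Raise the GROUP_CONCAT limit for this session; the
-- default of 1024 bytes truncates a list of thousands of readings.
SET SESSION group_concat_max_len = 1000000;

SELECT  Sensor,
        -- k-th sorted value: keep the first k entries, then take the last of them
        SUBSTRING_INDEX(
            SUBSTRING_INDEX(GROUP_CONCAT(Lux ORDER BY Lux SEPARATOR ','),
                            ',', CEIL(0.25 * COUNT(*))),
            ',', -1) AS Q1,
        SUBSTRING_INDEX(
            SUBSTRING_INDEX(GROUP_CONCAT(Lux ORDER BY Lux SEPARATOR ','),
                            ',', CEIL(0.75 * COUNT(*))),
            ',', -1) AS Q3
FROM    LuxLog
WHERE   Lux <> 0          -- same filter as the snippet above
GROUP BY Sensor;          -- one row per sensor instead of WHERE Sensor=12
```

The quartiles come back as strings because they are cut out of a text list; wrap each expression in CAST(... AS SIGNED) if numeric values are needed.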

Recommended answer

See SqlFiddle: http://sqlfiddle.com/#!9/accca6/2/6

Note: for the fiddle I generated 100 rows, one for each integer between 1 and 100, inserted in random order (shuffled in Excel).

Here is the code:

SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);
SET @quartile := (ROUND(@number_of_rows*0.25));
-- Q1: sort ascending, skip the lowest @quartile rows, take the next one
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name, Lux, Sensor FROM LuxLog ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile, ')'));
-- Q3: sort descending, skip the highest @quartile rows, take the next one
SET @sql_q3 := (CONCAT('(SELECT "Q3" AS quartile_name, Lux, Sensor FROM LuxLog ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile, ')'));
SET @sql := (CONCAT(@sql_q1, ' UNION ', @sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;

-- Same computation restricted to a single sensor
SET @current_sensor := 101;
SET @quartile := (ROUND((SELECT COUNT(*) FROM LuxLog WHERE Sensor = @current_sensor)*0.25));
SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name, Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor, ' ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile, ')'));
SET @sql_q3 := (CONCAT('(SELECT "Q3" AS quartile_name, Lux, Sensor FROM LuxLog WHERE Sensor=', @current_sensor, ' ORDER BY Lux DESC LIMIT 1 OFFSET ', @quartile, ')'));
SET @sql := (CONCAT(@sql_q1, ' UNION ', @sql_q3));
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;

The underlying reasoning is as follows. The first quartile (Q1) is the value below which about 25% of the readings lie, so we first need to know how many rows there are:

SET @number_of_rows := (SELECT COUNT(*) FROM LuxLog);

Now that we know the number of rows, we compute 25% of it, rounded to a whole number of rows:

SET @quartile := (ROUND(@number_of_rows*0.25));

To find Q1 we then order the LuxLog table by Lux in ascending order and pick the row that follows the first @quartile rows: OFFSET @quartile skips those rows, and LIMIT 1 retrieves only the next one. That is:

SET @sql_q1 := (CONCAT('(SELECT "Q1" AS quartile_name, Lux, Sensor FROM LuxLog ORDER BY Lux ASC LIMIT 1 OFFSET ', @quartile, ')'));

We do (almost) the same for Q3, but rather than starting from the bottom (lowest values first) we start from the top (highest values first), which is why that query uses DESC.

At this point @sql_q1 and @sql_q3 only hold strings, so we concatenate them with UNION into a single statement, prepare it, and execute it. The query has to be built as a string because MySQL requires LIMIT and OFFSET values to be constants (or ? placeholders in a prepared statement), not user variables such as @quartile.
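
If the server is MySQL 8.0 or later, window functions offer another way to get Q1 and Q3 for all sensors in one statement, without dynamic SQL or string lists. This is a different technique from the answer above, sketched under assumptions: the view name LuxQuartiles is made up, zero readings are excluded as in the question's snippet, and quartiles use the nearest-rank convention (the reading at position CEIL(p * n) when a sensor's n readings are sorted ascending). For a sensor whose readings are the integers 1 to 100 this gives Q1 = 25 and Q3 = 75; other conventions, such as skipping ROUND(0.25 * n) rows in ascending order, can land one position away.

```sql
-- Assumes MySQL 8.0+ (window functions) and the LuxLog table from the question.
CREATE OR REPLACE VIEW LuxQuartiles AS   -- hypothetical view name
SELECT  Sensor,
        -- exactly one row per sensor matches each CASE, so MAX just picks it out
        MAX(CASE WHEN rn = CEIL(0.25 * n) THEN Lux END) AS Q1,
        MAX(CASE WHEN rn = CEIL(0.75 * n) THEN Lux END) AS Q3
FROM (
    SELECT  Sensor,
            Lux,
            -- 1-based position of each reading within its sensor, lowest Lux first
            ROW_NUMBER() OVER (PARTITION BY Sensor ORDER BY Lux) AS rn,
            -- number of readings for that sensor
            COUNT(*) OVER (PARTITION BY Sensor) AS n
    FROM    LuxLog
    WHERE   Lux <> 0          -- drop this filter if zero readings should count
) AS ranked
GROUP BY Sensor;

SELECT * FROM LuxQuartiles ORDER BY Sensor;
```

ROW_NUMBER() orders tied readings arbitrarily, but ties do not change the result because every tied row carries the same Lux value. To run it as a one-off query instead of creating a view, drop the CREATE VIEW line and the final SELECT.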
