Calculate medians for multiple columns in the same table in one query call


Problem description


StackOverflow to the rescue! I need to find the medians for five columns at once, in one query call.

The median calculations below work for single columns, but when they are combined, the multiple uses of "rownum" throw the query off. How can I update this to work for multiple columns? Thank you. It's for a web tool where nonprofits can compare their financial metrics to user-defined peer groups.

SELECT t1_wages.totalwages_pctoftotexp AS median_totalwages_pctoftotexp
  FROM (
        SELECT @rownum := @rownum + 1 AS `row_number`, d_wages.totalwages_pctoftotexp
          FROM data_990_c3 d_wages,
               (SELECT @rownum := 0) r_wages
         WHERE totalwages_pctoftotexp > 0
         ORDER BY d_wages.totalwages_pctoftotexp
       ) AS t1_wages,
       (
        SELECT COUNT(*) AS total_rows
          FROM data_990_c3 d_wages
         WHERE totalwages_pctoftotexp > 0
       ) AS t2_wages
 WHERE 1
   AND t1_wages.row_number = FLOOR(total_rows / 2) + 1

--- [that was one median, below is another] ---

SELECT t1_solvent.solvent_days AS median_solvent_days
  FROM (
        SELECT @rownum := @rownum + 1 AS `row_number`, d_solvent.solvent_days
          FROM data_990_c3 d_solvent,
               (SELECT @rownum := 0) r_solvent
         WHERE solvent_days > 0
         ORDER BY d_solvent.solvent_days
       ) AS t1_solvent,
       (
        SELECT COUNT(*) AS total_rows
          FROM data_990_c3 d_solvent
         WHERE solvent_days > 0
       ) AS t2_solvent
 WHERE 1
   AND t1_solvent.row_number = FLOOR(total_rows / 2) + 1

[those are two - there are five in total I'll eventually need to find medians for at once]

Solution

This kind of thing is a big pain in the neck in MySQL. If you're going to do a lot of this statistical ranking work, you might be wise to use the free Oracle Express Edition or PostgreSQL. Both have MEDIAN(value) aggregate functions that are either built in or available as extensions. Here's a little sqlfiddle demonstrating that. http://sqlfiddle.com/#!4/53de8/6/0

But you didn't ask about that.

In MySQL, your basic problem is the scope of variables like @rownum. You also have a pivoting problem: that is, you need to turn rows of your query into columns.

Let's tackle the pivot problem first. What you're going to do is create a union of several big fat queries. For example:

SELECT 'median_wages' AS tag, wages AS value
  FROM (big fat query making median wages) A
 UNION
SELECT 'median_volunteer_hours' AS tag, hours AS value
  FROM (big fat query making median volunteer hours) B
 UNION
SELECT 'median_solvent_days' AS tag, days AS value
  FROM (big fat query making median solvency days) C

So here are your results in a table of tag / value pairs. You can pivot that table like so, to get one row with a value in each column.

SELECT SUM( CASE tag WHEN 'median_wages' THEN value ELSE 0 END
          ) AS median_wages,
       SUM( CASE tag WHEN 'median_volunteer_hours' THEN value ELSE 0 END
          ) AS median_volunteer_hours,
       SUM( CASE tag WHEN 'median_solvent_days' THEN value ELSE 0 END
          ) AS median_solvent_days
  FROM (
        /* the above gigantic UNION query */
       ) Q

That's how you pivot up rows (from the UNION query in this case) to columns. Here's a tutorial on the topic. http://www.artfulsoftware.com/infotree/qrytip.php?id=523
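
To see the pivot step on its own, here is a minimal sketch that is not part of the original answer. It uses Python's built-in sqlite3 module only so that it runs without a MySQL server; the table name `medians` and the three numbers in it are made-up stand-ins for the output of the UNION query. The SUM(CASE ...) expressions themselves are the same ones shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tag/value rows standing in for the result of the UNION query.
conn.execute("CREATE TABLE medians (tag TEXT, value REAL)")
conn.executemany(
    "INSERT INTO medians VALUES (?, ?)",
    [
        ("median_wages", 41.5),
        ("median_volunteer_hours", 120.0),
        ("median_solvent_days", 87.0),
    ],
)

# Conditional aggregation: each SUM picks up the one row carrying its tag
# and adds 0 for every other row, so three rows collapse into one.
pivot_sql = """
SELECT SUM(CASE tag WHEN 'median_wages' THEN value ELSE 0 END)
           AS median_wages,
       SUM(CASE tag WHEN 'median_volunteer_hours' THEN value ELSE 0 END)
           AS median_volunteer_hours,
       SUM(CASE tag WHEN 'median_solvent_days' THEN value ELSE 0 END)
           AS median_solvent_days
  FROM medians
"""

cursor = conn.execute(pivot_sql)
column_names = [description[0] for description in cursor.description]
pivoted = dict(zip(column_names, cursor.fetchone()))
print(pivoted)
```

The result is a single row with one column per tag, which is the shape the web tool presumably wants to consume.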

Now we need to tackle the median-computing subqueries. The code in your question looks pretty good. I don't have your data so it's hard for me to evaluate it.
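
One thing you can check without the data is which row the condition row_number = FLOOR(total_rows / 2) + 1 selects. The small Python sketch below (my own illustration, with made-up numbers) mirrors that arithmetic. For an odd row count it picks the true middle value. For an even count it picks the upper of the two middle values rather than their average, so it can differ from a MEDIAN() function that averages them.

```python
import statistics


def picked_row_value(values):
    """Return the value at 1-based row FLOOR(n / 2) + 1 of the sorted values."""
    ordered = sorted(values)
    row_number = len(ordered) // 2 + 1  # same arithmetic as FLOOR(total_rows / 2) + 1
    return ordered[row_number - 1]      # convert the 1-based row number to a 0-based index


odd_sample = [10, 30, 20, 50, 40]   # five rows: the rule selects row 3
even_sample = [10, 30, 20, 40]      # four rows: the rule also selects row 3

print(picked_row_value(odd_sample), statistics.median(odd_sample))
print(
    picked_row_value(even_sample),
    statistics.median(even_sample),       # averages the two middle values
    statistics.median_high(even_sample),  # upper middle value, matches the rule
)
```

If the peer groups are large, the difference is usually negligible, but it is worth knowing when comparing results against another tool.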

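But you need to avoid reusing the @rownum variable, because user variables belong to the session and every subquery in the statement would be counting with the same one. Call it @rownum1 in one of your queries, @rownum2 in the next one, and so on. Here's a dinky SQL fiddle doing just one of these. http://sqlfiddle.com/#!2/2f770/1/0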

Now let's build it up a bit, doing two different medians with a UNION query. Notice that the second half of the UNION query uses @rownum2 instead of @rownum. Here's the fiddle. http://sqlfiddle.com/#!2/2f770/2/0

Finally, here's the full query with the pivoting. http://sqlfiddle.com/#!2/2f770/13/0

SELECT SUM( CASE tag WHEN 'Boston' THEN value ELSE 0 END ) AS Boston,
       SUM( CASE tag WHEN 'Bronx' THEN value ELSE 0 END ) AS Bronx
  FROM (
        -- median for Boston, counted with @rownum
        SELECT 'Boston' AS tag, pop AS value
          FROM (
                SELECT @rownum := @rownum + 1 AS `row_number`, pop
                  FROM pops,
                       (SELECT @rownum := 0) r
                 WHERE pop > 0 AND city = 'Boston'
                 ORDER BY pop
               ) AS ordered_rows,
               (
                SELECT COUNT(*) AS total_rows
                  FROM pops
                 WHERE pop > 0 AND city = 'Boston'
               ) AS rowcount
         WHERE ordered_rows.row_number = FLOOR(total_rows / 2) + 1
         UNION ALL
        -- median for Bronx, counted with its own variable @rownum2
        SELECT 'Bronx' AS tag, pop AS value
          FROM (
                SELECT @rownum2 := @rownum2 + 1 AS `row_number`, pop
                  FROM pops,
                       (SELECT @rownum2 := 0) r
                 WHERE pop > 0 AND city = 'Bronx'
                 ORDER BY pop
               ) AS ordered_rows,
               (
                SELECT COUNT(*) AS total_rows
                  FROM pops
                 WHERE pop > 0 AND city = 'Bronx'
               ) AS rowcount
         WHERE ordered_rows.row_number = FLOOR(total_rows / 2) + 1
       ) D

This is just two medians. You need five. I think it's easy to make the case that this median computation is absurdly difficult to do in MySQL in a single query.
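
If your server supports window functions (MySQL 8.0 and later do), the user-variable bookkeeping can be dropped altogether. The following is only a sketch under assumptions, not the method used in the fiddles above: it runs against an in-memory SQLite database through Python's sqlite3 module so that it is self-contained (SQLite 3.25 or newer is needed for window functions), and the table `metrics` with columns `wages`, `hours` and `days` is a hypothetical stand-in for your real table. It keeps the same "values greater than 0" filter and the same FLOOR(n / 2) + 1 row rule as the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and data; the real table and column names will differ.
conn.execute("CREATE TABLE metrics (wages REAL, hours REAL, days REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [
        (10, 5, 0),
        (30, 0, 100),
        (20, 15, 300),
        (50, 25, 200),
        (40, 35, 0),
    ],
)

median_sql = """
WITH tagged AS (
    -- Unpivot: one (tag, value) row per positive value in each column.
    SELECT 'median_wages' AS tag, wages AS value FROM metrics WHERE wages > 0
    UNION ALL
    SELECT 'median_hours', hours FROM metrics WHERE hours > 0
    UNION ALL
    SELECT 'median_days', days FROM metrics WHERE days > 0
),
ranked AS (
    -- Number the rows and count them separately for each tag.
    SELECT tag,
           value,
           ROW_NUMBER() OVER (PARTITION BY tag ORDER BY value) AS rn,
           COUNT(*) OVER (PARTITION BY tag) AS cnt
      FROM tagged
)
-- Keep the row at position FLOOR(cnt / 2) + 1 per tag, then pivot.
-- SQLite divides integers as integers, so cnt / 2 is already floored;
-- in MySQL this condition would be written rn = FLOOR(cnt / 2) + 1.
SELECT SUM(CASE tag WHEN 'median_wages' THEN value ELSE 0 END) AS median_wages,
       SUM(CASE tag WHEN 'median_hours' THEN value ELSE 0 END) AS median_hours,
       SUM(CASE tag WHEN 'median_days' THEN value ELSE 0 END) AS median_days
  FROM ranked
 WHERE rn = cnt / 2 + 1
"""

medians = conn.execute(median_sql).fetchone()
print(medians)
```

Adding a fourth or fifth column means one more UNION ALL branch and one more SUM(CASE ...) line, with no extra counters to keep apart.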
