使用 SQL 查找协方差 [英] Finding Covariance using SQL

查看:86
本文介绍了使用 SQL 查找协方差的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

# dt---------indx_nm1-----indx_val1-------indx_nm2------indx_val2
2009-06-08----ABQI------1001.2------------ACNACTR----------300.05
2009-06-09----ABQI------1002.12 ----------ACNACTR----------341.19
2009-06-10----ABQI------1011.4------------ACNACTR----------382.93
2009-06-11----ABQI------1015.43 ----------ACNACTR----------362.63

我有一个看起来像 ^ 的表(但有数百行的日期从 2009 年到 2013 年).有没有办法计算协方差: [(indx_val1 - avg(indx_val1)) * (indx_val2 - avg(indx_val2)] 除以 indx_val1indx_val2 的每个值的总行数(循环遍历整个表)并仅返回 cov(ABQI, ACNACTR)

I have a table that looks like ^ (but with hundreds of rows that dates from 2009 to 2013). Is there a way that I could calculate the covariance : [(indx_val1 - avg(indx_val1)) * (indx_val2 - avg(indx_val2)] divided by total number of rows for each value of indx_val1 and indx_val2 (loop through the entire table) and return just a simple value for cov(ABQI, ACNACTR)

推荐答案

由于您有在两个不同组上运行的聚合,您将需要两个不同的查询.主要的一组按 dt 获取每个日期的行值.另一个查询必须在 整个 行集上执行 AVG()COUNT() 聚合.

Since you have aggregates operating over two different groups, you will need two different queries. The main one groups by dt to get your row values per date. The other query has to perform AVG() and COUNT() aggregates across the whole rowset.

要同时使用它们,您需要将它们JOIN.但由于两个查询之间没有实际关系,因此它是笛卡尔积,我们将使用 CROSS JOIN.实际上,这将主查询的每一行与聚合查询检索到的单行连接起来.然后,您可以在 SELECT 列表中执行算术运算,使用两者中的值:

To use them both at the same time, you need to JOIN them together. But since there's no actual relation between the two queries, it is a cartesian product and we'll use a CROSS JOIN. Effectively, that joins every row of the main query with the single row retrieved by the aggregate query. You can then perform the arithmetic in the SELECT list, using values from both:

因此,以您之前问题中的查询为基础:

So, building on the query from your earlier question:

SELECT 
 indxs.*,
 ((indx_val2 - indx_val2_avg) * (indx_val1 - indx_val1_avg)) / total_rows AS cv
FROM (
    SELECT 
      dt,
      MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_nm ELSE NULL END) AS indx_nm1,
      MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_val ELSE NULL END) AS indx_val1,
      MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_nm ELSE NULL END) AS indx_nm2,
      MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val ELSE NULL END) AS indx_val2
    FROM table1 a
    GROUP BY dt
  ) indxs 
  CROSS JOIN (
    /* Join against a query returning the AVG() and COUNT() across all rows */
    SELECT
      'ABQI' AS indx_nm1_aname,
      AVG(CASE WHEN indx_nm = 'ABQI' THEN indx_val ELSE NULL END) AS indx_val1_avg,
      'ACNACTR' AS indx_nm2_aname,
      AVG(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val ELSE NULL END) AS indx_val2_avg,
      COUNT(*) AS total_rows
    FROM table1 b
    WHERE indx_nm IN ('ABQI','ACNACTR')
    /* And it is a cartesian product */
  ) aggs
WHERE
  indx_nm1 IS NOT NULL
  AND indx_nm2 IS NOT NULL
ORDER BY dt

这是一个演示,建立在您之前的演示之上:http://sqlfiddle.com/#!6/2ec65/14

Here's a demo, building on your earlier one: http://sqlfiddle.com/#!6/2ec65/14

这篇关于使用 SQL 查找协方差的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆