Weighted average in T-SQL (like Excel's SUMPRODUCT)


Question

I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation):

((A1*B1)+(A2*B2)+...+(An*Bn))/SUM(A1:An)

The numerator provides the same functionality as Excel's SUMPRODUCT() function.
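
As a quick sanity check of the formula, here is a minimal sketch with made-up numbers. The column names `weight` and `data` and the three sample rows are assumptions for illustration only; they are not taken from the question's table.

```sql
-- Hypothetical sample rows: each row pairs a weight (the A column in the
-- Excel notation) with the value being averaged (the B column).
SELECT
    SUM(s.weight * s.data)                 AS SumProduct,       -- 1*10 + 3*20 + 4*30 = 190
    SUM(s.weight)                          AS TotalWeight,      -- 1 + 3 + 4 = 8
    SUM(s.weight * s.data) / SUM(s.weight) AS WeightedAverage   -- 190 / 8 = 23.75
FROM (VALUES
    (1.0, 10.0),
    (3.0, 20.0),
    (4.0, 30.0)
) AS s (weight, data);
```

The literals are written as decimals on purpose: in T-SQL, dividing one integer by another truncates the result, so integer columns would need a CAST before the division.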

The catch is that I need to dynamically specify which row gets the weighted average, which row the weights come from, and a date range.

This is easier than I thought, because Excel was making me think I needed some kind of pivot. My solution so far is:

select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
    select RecordDate , Actual 
    from CalcProductionRecords 
    where KPI = 'Weighty'
) baseSeries inner join (       
    select RecordDate , Actual 
    from CalcProductionRecords 
    where KPI = 'Tons Milled'   
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate

Recommended answer

Quassnoi's answer shows how to do the SumProduct, and adding a WHERE clause lets you restrict the result by a date field...

SELECT
   SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
   [tbl]
WHERE
   [tbl].date >= '20090101'   -- yyyymmdd literals parse the same under any language or DATEFORMAT setting
   AND [tbl].date < '20100101'

The more complex part is where you want to "dynamically specify" which field is [data] and which field is [weight]. The short answer is that realistically you'd have to make use of dynamic SQL. Something along the lines of:
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string
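
The four steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table name `dbo.tbl`, its `[date]` column, the `{data}` and `{weight}` placeholders, and all variable names are hypothetical. Column names cannot be passed as parameters, so they are validated and wrapped with QUOTENAME, while the date range is passed as real parameters through sp_executesql.

```sql
-- Hypothetical inputs; in practice these would arrive as procedure arguments.
DECLARE @dataColumn   sysname = N'data';
DECLARE @weightColumn sysname = N'weight';
DECLARE @fromDate     date    = '20090101';
DECLARE @toDate       date    = '20100101';

-- Reject anything that is not an existing column of the (assumed) table dbo.tbl.
IF COL_LENGTH(N'dbo.tbl', @dataColumn) IS NULL
   OR COL_LENGTH(N'dbo.tbl', @weightColumn) IS NULL
BEGIN
    RAISERROR(N'Unknown data or weight column.', 16, 1);
    RETURN;
END;

-- Step 1: the string template, with placeholders for the two columns.
DECLARE @template nvarchar(max) = N'
SELECT SUM(t.{data} * t.{weight}) / NULLIF(SUM(t.{weight}), 0) AS WeightedAverage
FROM dbo.tbl AS t
WHERE t.[date] >= @fromDate
  AND t.[date] <  @toDate;';

-- Steps 2 and 3: substitute the quoted column names into the template.
DECLARE @sql nvarchar(max) =
    REPLACE(
        REPLACE(@template, N'{data}', QUOTENAME(@dataColumn)),
        N'{weight}', QUOTENAME(@weightColumn));

-- Step 4: execute the string; the dates stay parameters rather than concatenated text.
EXEC sys.sp_executesql
    @sql,
    N'@fromDate date, @toDate date',
    @fromDate = @fromDate,
    @toDate   = @toDate;
```

NULLIF turns a zero total weight into NULL, so the query returns NULL instead of raising a divide-by-zero error when no weights fall in the range.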

Dynamic SQL, however, carries its own overhead. If the queries are relatively infrequent, or the execution time of the query itself is relatively long, this may not matter. If they are frequent and short, however, you may find that dynamic SQL adds noticeable overhead. (Not to mention the need to guard against SQL injection attacks, etc.)

In your latest example you highlight three fields:

  • RecordDate
  • KPI
  • Actual

When the [KPI] is "Weighty", then [Actual] is the weighting factor to use.
When the [KPI] is "Tons Milled", then [Actual] is the data you want to aggregate.


Some questions I have:

  • Are there any other fields?
  • Is there only ever one Actual per date per KPI?

The reason I ask is that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resulting records.)
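
One way to check the 1:1 assumption is sketched below. The table variable and its four sample rows are made up for illustration, and the column types are guesses; against the real data you would run the same two queries on CalcProductionRecords.

```sql
-- Hypothetical sample data: two 'Tons Milled' rows and two 'Weighty' rows on one date.
DECLARE @records TABLE (RecordDate date, KPI varchar(20), Actual decimal(18, 4));

INSERT INTO @records (RecordDate, KPI, Actual)
VALUES ('20090101', 'Tons Milled', 100),
       ('20090101', 'Tons Milled', 150),
       ('20090101', 'Weighty',     2),
       ('20090101', 'Weighty',     4);

-- 1) Any row returned here means a (RecordDate, KPI) pair is not unique.
SELECT RecordDate, KPI, COUNT(*) AS RowsPerDay
FROM @records
GROUP BY RecordDate, KPI
HAVING COUNT(*) > 1;

-- 2) The self-join pairs every data row with every weight row for the date:
--    2 data rows x 2 weight rows = 4 joined rows instead of 1.
SELECT COUNT(*) AS JoinedRows
FROM @records AS d
INNER JOIN @records AS w
    ON w.RecordDate = d.RecordDate
WHERE d.KPI = 'Tons Milled'
  AND w.KPI = 'Weighty';
```

If the first query returns no rows on the real table, the plain self-join is safe; otherwise the rows need to be aggregated per date before joining.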

Regardless, a slight simplification of your query is certainly possible...

SELECT
   SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
   CalcProductionRecords AS [baseSeries]
INNER JOIN
   CalcProductionRecords AS [weightSeries]
      ON [weightSeries].RecordDate = [baseSeries].RecordDate
--    AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
   [baseSeries].KPI = 'Tons Milled'
   AND [weightSeries].KPI = 'Weighty'

The commented-out line is only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.


If you can't guarantee just one value per date, and don't have any other fields to join on, you can modify your subquery-based version slightly...

SELECT
   SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
    SELECT
        RecordDate,
        SUM(Actual) AS Actual   -- alias required: derived-table columns must be named
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Tons Milled'
    GROUP BY
        RecordDate
)
   AS [baseSeries]
INNER JOIN
(
    SELECT
        RecordDate,
        AVG(Actual) AS Actual   -- one averaged weight per date
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Weighty'
    GROUP BY
        RecordDate
)
   AS [weightSeries]
      ON [weightSeries].RecordDate = [baseSeries].RecordDate

This assumes that averaging the weights is valid when there are multiple weights for the same day.


EDIT: Someone just voted for this, so I thought I'd improve the final answer :)

SELECT
   SUM(Actual * Weight) / SUM(Weight)
FROM
(
    SELECT
        RecordDate,
        SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END)   AS Actual,
        AVG(CASE WHEN KPI = 'Weighty'     THEN Actual ELSE NULL END)   AS Weight
    FROM
        CalcProductionRecords
    WHERE
        KPI IN ('Tons Milled', 'Weighty')
    GROUP BY
        RecordDate
)
   AS pivotAggregate

This avoids the JOIN and scans the table only once.
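
Because the "dynamic" parts in this example are KPI values and dates rather than column names, the single-scan query can probably be parameterized without any dynamic SQL. The following is a sketch under assumptions: the variable names and their types are hypothetical, and the date-range filter, the NULLIF guard, and the outer WHERE clause are additions to the query above.

```sql
-- Hypothetical parameters; in practice these would be stored-procedure arguments.
DECLARE @dataKpi   varchar(50) = 'Tons Milled';
DECLARE @weightKpi varchar(50) = 'Weighty';
DECLARE @fromDate  date        = '20090101';
DECLARE @toDate    date        = '20100101';

SELECT
    -- NULLIF returns NULL instead of raising a divide-by-zero error
    -- when the weights in the range sum to zero.
    SUM(Actual * Weight) / NULLIF(SUM(Weight), 0) AS WeightedAverage
FROM
(
    SELECT
        RecordDate,
        SUM(CASE WHEN KPI = @dataKpi   THEN Actual END) AS Actual,
        AVG(CASE WHEN KPI = @weightKpi THEN Actual END) AS Weight
    FROM CalcProductionRecords
    WHERE KPI IN (@dataKpi, @weightKpi)
      AND RecordDate >= @fromDate
      AND RecordDate <  @toDate
    GROUP BY RecordDate
) AS pivotAggregate
-- Keep only dates that have both a data value and a weight, as the INNER JOIN
-- versions do; otherwise a weight without data would still count in SUM(Weight).
WHERE Actual IS NOT NULL
  AND Weight IS NOT NULL;
```

If Actual is stored as an integer type, it would need a CAST to a decimal type before the division to avoid truncation.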

It relies on the fact that NULL values are ignored when calculating AVG() (and SUM()).
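
The NULL handling can be seen in isolation with a tiny example; the three values and the names `v` and `x` are made up for illustration.

```sql
-- Hypothetical values: the NULL row is skipped, so AVG divides by 2, not 3.
SELECT
    AVG(v.x)   AS AvgIgnoringNull,  -- (1 + 3) / 2 = 2
    COUNT(v.x) AS NonNullRows,      -- 2
    COUNT(*)   AS AllRows           -- 3
FROM (VALUES
    (1.0),
    (CAST(NULL AS decimal(9, 2))),
    (3.0)
) AS v (x);
```

With ANSI_WARNINGS on, SQL Server may report that a NULL value was eliminated by an aggregate; for this pattern that message is expected.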
