将集合拆分为不均匀百分比的桶 [英] Split set into uneven percentage buckets

查看:56
本文介绍了将集合拆分为不均匀百分比的桶的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

每天我都会收到一组 x 行(在 5 到 2000 之间).

Every day I am returned a set of x rows (between 5 and 2000).

我需要根据规则更新此集中的列.我认为这个(不完全有效)示例证明了这一点

I need to update a column from this set based on rules. I think this (not exactly working) example demonstrates this

/* 
    35% a
    25% b
    30% c
    10% null
*/

WITH tally
(vals, updateThis, bucket)
AS
(
    SELECT
         DATEADD(DAY, - ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), GETDATE())
        , NULL
        , NTILE(100) OVER (ORDER BY (SELECT NULL))
    FROM
    (
        VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS a(n)
        CROSS JOIN (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS b(n)
        CROSS JOIN (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS c(n)
    )
--UPDATE
    --SET updateThis
, updated
AS
(
    SELECT
     t.vals
    , CASE
        WHEN t.bucket <= 35 THEN 'a'
        WHEN t.bucket > 35 AND t.bucket <=60 THEN 'b'
        WHEN t.bucket > 60 AND t.bucket <=90 THEN 'c'
        WHEN t.bucket > 60 AND t.bucket <=90 THEN 'NULL'
    END AS updated
    , t.bucket
    FROM tally t
)
SELECT 
    U.updated
    , COUNT(1) AS actual
FROM 
updated u
GROUP BY U.updated

此解决方案并不精确,即使 a + b + c 确实占 100%,它也可能不会更新所有行.它也不适用于小于 100 行的集合.

this solution is not precise and it might not update all the rows even if a + b + c did make up 100%. Also It wouldn't work for sets smaller than 100 rows.

我目前的工作解决方案是:

My current working solution is:

  • 计算总行数
  • 计算实际需要的行数(CEILING((@totalRows * ratio)/100)
  • 在 WHILE LOOP 中更新最终集合,选择当前值和所需的行.

是否有更好的基于集合的解决方案可以帮助我摆脱循环?

Is there a better - set based solution that would help me get rid of the loop?

推荐答案

不知道,如果我理解正确...

Don't know, if I get this correctly...

首先这里似乎有一个相当明显的错误:

First of all there seems to be a rather obvious mistake here:

    WHEN t.bucket > 60 AND t.bucket <=90 THEN 'NULL'

不应该是这样的:

    WHEN t.bucket >90 THEN 'NULL'

函数 NTILE 会将您的集合分散到相当均匀的桶中.检查我的输出并找出它在极端情况下的表现.我建议每行使用计算百分比,如下所示:

The function NTILE will spread your sets into rather even buckets. Check my output and find how this behaves in the corner-cases. I suggest to use a computed percentage per row like here:

WITH tally
(vals, bucket)
AS
(
    SELECT
         DATEADD(DAY, - ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), GETDATE())
        ,NTILE(100) OVER (ORDER BY (SELECT NULL))
    FROM
    (
        VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS a(n)
        CROSS JOIN (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS b(n)
        CROSS JOIN (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS c(n)
    )
SELECT *
INTO #tmpBuckets
FROM Tally;

--我使用这个#tmpBuckets-table 来更接近您的我有一张桌子 场景

--I use this #tmpBuckets-table to get closer to your I have a table scenario

WITH Numbered AS
(
    SELECT *
          ,ROW_NUMBER() OVER(ORDER BY vals DESC) / ((SELECT COUNT(*) FROM #tmpBuckets)/100.0)  AS RunningPercentage
    FROM #tmpBuckets
)
,ComputeBuckets AS
(
    SELECT
     t.*
    , CASE
        WHEN t.RunningPercentage <= 35 THEN 'a'
        WHEN t.RunningPercentage > 35 AND t.RunningPercentage <=60 THEN 'b'
        WHEN t.RunningPercentage > 60 AND t.RunningPercentage <=90 THEN 'c'
        WHEN t.RunningPercentage >90  THEN 'NULL'
    END AS ShnugoMethod
    , CASE
        WHEN t.bucket <= 35 THEN 'a'
        WHEN t.bucket > 35 AND t.RunningPercentage <=60 THEN 'b'
        WHEN t.bucket > 60 AND t.RunningPercentage <=90 THEN 'c'
        WHEN t.bucket > 90  THEN 'NULL'
    END AS ZikatoMethod
    FROM Numbered t
)
SELECT cb.*
FROM ComputeBuckets cb
ORDER BY cb.vals DESC

GO
DROP TABLE #tmpBuckets;

我想你知道,如何使用这样的 cte 来更新源表.否则就再问一个问题:-)

I think you know, how to use such a cte to update the source table. Otherwise just come back with another question :-)

这篇关于将集合拆分为不均匀百分比的桶的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆