使用带有 CTE 的 NEWID() 生成随机的行子集会产生奇怪的结果 [英] Using NEWID() with CTE to produce random subset of rows produces odd results

查看:34
本文介绍了使用带有 CTE 的 NEWID() 生成随机的行子集会产生奇怪的结果的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在存储过程中编写一些 SQL,以将数据集减少到我想要报告的有限随机行数.

I'm writing some SQL in a stored procedure to reduce a dataset to a limited random number of rows that I want to report on.

报告以 UsersGroup 开头,并应用过滤器来指定所需的随机行总数 (@SampleLimit).

The report starts with a Group of Users and a filter is applied to specify the total number of random rows required (@SampleLimit).

为了达到预期的结果,我首先创建一个 CTE(临时表):

To achieve the desired result, I start by creating a CTE (temp table) with:

  • 应用了 top(@SampleLimit)
  • 按 UserId 分组(因为 UserID 出现多次)
  • order by NEWID() 将结果随机排列
  • The top(@SampleLimit) applied
  • group by UserId (as the UserID appears multiple times)
  • order by NEWID() to put the results in a random order

SQL:

; with cte_temp as 
       (select top(@SampleLimit) UserId from QueryResults 
        where (GroupId = @GroupId)
        group by UserId order by NEWID()) 

获得此结果集后,我会删除 UserId 为 NOT IN 上一步中创建的 CTE 的所有结果.

Once I have this result set, I then delete any results where the UserId is NOT IN the CTE created in the previous step.

delete QueryResults 
where (GroupId = @GroupId) and (UserId not in(select UserId from cte_temp))

我遇到的问题是,有时我得到的结果多于 @SampleLimit 中指定的结果,而其他时候它完全按预期工作.

The issue that I'm having is that from time to time, I get more results than specified in the @SampleLimit and other times it works exactly as expected.

我已尝试分解 SQL 并在应用程序外部执行它,但无法重现该问题.

I've tried breaking up the SQL and executing it outside the application and I cannot reproduce the issue.

我所做的事情是否存在根本性的错误,可以解释为什么我偶尔会得到我要求的更多结果?

Is there anything fundamentally wrong with what I am doing that could explain why I occasionally get more results that I request?

为了完整性 - 我根据以下答案重构了解决方案:

select top(@SampleLimit) UserId into #T1
from  QueryResults
where (GroupId = @GroupId)
group by UserId
order by NEWID() 

delete QueryResults 
where (GroupId = @GroupId) and (UserId not in(select UserId from #T1))

推荐答案

涉及NEWID()SELECT语句会被执行多少次是不确定的.

It is undeterministic how many times the SELECT statement involving NEWID() will be executed.

如果您在 QueryResultscte_temp 之间得到一个嵌套循环反半连接,并且计划中没有假脱机,它可能会被重新评估多次是 QueryResults 中的行,这意味着对于每个外部行,与 NOT IN 进行比较的集合可能完全不同.

If you get a nested loops anti semi join between QueryResults and cte_temp and there is no spool in the plan it will likely be re-evaluated as many times as there are rows in QueryResults this means that for each outer row the set that is being compared against with NOT IN may be entirely different.

您可以将结果具体化到一个临时表中来避免这种情况,而不是使用 CTE.

Instead of using a CTE you can materialize the results into a temporary table to avoid this.

INSERT INTO #T
SELECT TOP(@SampleLimit) UserId
FROM   QueryResults
WHERE  ( GroupId = @GroupId )
GROUP  BY UserId
ORDER  BY NEWID() 

然后在 DELETE

这篇关于使用带有 CTE 的 NEWID() 生成随机的行子集会产生奇怪的结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆