Select a random sample from SQL Server quickly


Question

I have a huge table of more than 10 million rows. I need to efficiently grab a random sample of 5000 rows from it. I have some constraints that reduce the total number of rows I am interested in to about 9 million.

I tried using ORDER BY NEWID(), but that query takes too long because it has to do a table scan of all rows.

Is there a faster way to do this?

Recommended answer

If a pseudo-random sample is acceptable and you're on SQL Server 2005 or 2008, take a look at TABLESAMPLE. For instance, here is an example against the AdventureWorks2008 sample database on SQL Server 2008 that samples by row count:

USE AdventureWorks2008;
GO

SELECT FirstName, LastName
FROM Person.Person
TABLESAMPLE (100 ROWS)
WHERE EmailPromotion = 2;

The catch is that TABLESAMPLE isn't exactly random: it samples whole physical pages, so each page is either returned with all of its rows or skipped, and a ROWS value is only an approximate target. You may not get back exactly 5000 rows unless you limit with TOP as well. If you're on SQL Server 2000, you'll have to either generate a temporary table of matching primary keys or fall back on a method that uses NEWID().
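
One way to combine these ideas, as a rough sketch rather than a tuned solution: oversample with TABLESAMPLE, apply the constraints, then shuffle only the sampled rows with NEWID() and cap the result with TOP. The names dbo.BigTable, Id, and IsActive are hypothetical stand-ins for the asker's table and constraints, and the 1 PERCENT figure is an assumption chosen so that a table of about 10 million rows yields on the order of 100,000 candidates.

```sql
-- Hypothetical names: dbo.BigTable, Id, IsActive are placeholders, not from the question.
SELECT TOP (5000) Id
FROM dbo.BigTable TABLESAMPLE (1 PERCENT)  -- page-level sample; the row count is approximate
WHERE IsActive = 1                         -- the filter runs on the sampled rows only
ORDER BY NEWID();                          -- shuffles the sample, not the whole table
```

If the query returns fewer than 5000 rows, the sample percentage is too small for the filter and should be raised. The rows still come from a limited set of pages, so this remains a pseudo-random sample rather than a uniform one.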
