SQL Server 中对大型数据集的慢速不同查询 [英] Slow distinct query in SQL Server over large dataset

查看:26
本文介绍了SQL Server 中对大型数据集的慢速不同查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们使用 SQL Server 2005 来跟踪大量不断传入的数据(每秒 5-15 次更新).我们注意到,在它投入生产几个月后,其中一个表开始花费大量时间进行查询.

We're using SQL Server 2005 to track a fair amount of constantly incoming data (5-15 updates per second). We noticed after it has been in production for a couple months that one of the tables has started to take an obscene amount of time to query.

该表有 3 列:

  • id -- 自动编号(集群)
  • typeUUID -- 在插入之前生成的 GUID;用于将类型组合在一起
  • typeName -- 类型名称(废话...)
  • id -- autonumber (clustered)
  • typeUUID -- GUID generated before the insert happens; used to group the types together
  • typeName -- The type name (duh...)

我们运行的其中一个查询在 typeName 字段上是不同的:

One of the queries we run is a distinct on the typeName field:

SELECT DISTINCT [typeName] FROM [types] WITH (nolock);

typeName 字段上有一个非聚集的、非唯一的升序索引.该表目前包含大约 200M 条记录.当我们运行这个查询时,查询用了 5m 58s 返回!也许我们不了解索引是如何工作的……但我认为我们对它们的没有太多误解.

The typeName field has a non-clusted, non-unique ascending index on it. The table contains approximately 200M records at the moment. When we run this query, the query took 5m 58s to return! Perhaps we're not understanding how the indexes work... But I didn't think we mis-understood them that much.

为了进一步测试,我们运行了以下查询:

To test this a little further, we ran the following query:

SELECT DISTINCT [typeName] FROM (SELECT TOP 1000000 [typeName] FROM [types] WITH (nolock)) AS [subtbl]

这个查询在大约 10 秒后返回,正如我所料,它正在扫描表.

This query returns in about 10 seconds, as I would expect, it's scanning the table.

我们在这里遗漏了什么吗?为什么第一个查询需要这么长时间?

Is there something we're missing here? Why does the first query take so long?

啊,抱歉,第一个查询返回 76 条记录,谢谢九边.

Ah, my apologies, the first query returns 76 records, thank you ninesided.

跟进:谢谢大家的回答,现在对我来说更有意义(我不知道为什么以前没有......).没有索引,它在 200M 行上做一个表扫描,有一个索引,它在 200M 行上做一个索引扫描......

Follow up: Thank you all for your answers, it makes more sense to me now (I don't know why it didn't before...). Without an index, it's doing a table scan across 200M rows, with an index, it's doing an index scan across 200M rows...

SQL Server 确实更喜欢索引,它确实带来了一点性能提升,但没什么值得兴奋的.重建索引确实将查询时间缩短到 3m 多一点,而不是 6m,这是一个改进,但还不够.我只是要向我的老板推荐我们规范化表结构.

SQL Server does prefer the index, and it does give a little bit of a performance boost, but nothing to be excited about. Rebuilding the index did take the query time down to just over 3m instead of 6m, an improvement, but not enough. I'm just going to recommend to my boss that we normalize the table structure.

再次感谢大家的帮助!!

Once again, thank you all for your help!!

推荐答案

你确实误解了索引.即使它确实使用了索引,它仍然会跨 200M 条目进行索引扫描.这将需要很长时间,加上执行 DISTINCT(导致排序)所需的时间,而且运行起来很糟糕.在查询中看到 DISTINCT 总是会引发危险信号,并导致我仔细检查查询.在这种情况下,您可能遇到了规范化问题?

You do misunderstand the index. Even if it did use the index it would still do an index scan across 200M entries. This is going to take a long time, plus the time it takes to do the DISTINCT (causes a sort) and it's a bad thing to run. Seeing a DISTINCT in a query always raises a red flag and causes me to double check the query. In this case, perhaps you have a normalization issue?

这篇关于SQL Server 中对大型数据集的慢速不同查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆