SQL Server 2008 Partitioned Table and Parallelism

Question

My company is moving to SQL Server 2008 R2. We have a table with tons of archive data. The majority of the queries that use this table employ a DateTime value in the WHERE clause. For example:

Query 1

SELECT COUNT(*) 
FROM TableA 
WHERE 
     CreatedDate > '1/5/2010' 
     and CreatedDate < '6/20/2010'  

I'm assuming that the partitions are created on CreatedDate and each partition is spread across multiple drives, that we have 8 CPUs, and that there are 500 million records in the database, evenly spread across the dates from 1/1/2008 to 2/24/2011 (38 partitions). This data could also be partitioned into quarters of a year or other time durations, but let's keep the assumption at months.

In this case I would expect all 8 CPUs to be utilized, and only the 6 partitions covering the dates between 1/5/2010 and 6/20/2010 to be queried.
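The partition counts assumed in the question can be checked with a bit of date arithmetic. The sketch below is only that check, assuming one partition per calendar month as the question states; the helper names `month_index` and `monthly_partitions` are made up for illustration, and this is not how SQL Server itself decides which partitions to read.

```python
from datetime import date


def month_index(d: date) -> int:
    """Map a date to a running month number so months can be subtracted."""
    return d.year * 12 + (d.month - 1)


def monthly_partitions(first: date, last: date) -> int:
    """Count the calendar months touched by the range from first to last."""
    return month_index(last) - month_index(first) + 1


# Whole table: 1/1/2008 through 2/24/2011, one partition per month (assumed).
total = monthly_partitions(date(2008, 1, 1), date(2011, 2, 24))

# Query 1: CreatedDate between 1/5/2010 and 6/20/2010.
touched = monthly_partitions(date(2010, 1, 5), date(2010, 6, 20))

print(total, touched)
```

Under these assumptions the two numbers agree with the 38 partitions and 6 queried partitions quoted in the question.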

Now, what if I ran the following query under the same assumptions as above?

Query 2

SELECT COUNT(*) 
FROM TableA 
WHERE State = 'Colorado'

Questions:
1. Will all partitions be queried? Yes
2. Will all 8 CPUs be used to execute the query? Yes
3. Will performance be better than querying a table that is not partitioned? Yes
4. Is there anything else I'm missing?
5. How would a partitioned index help?

I answered the first 3 questions above based on my limited knowledge of SQL Server 2008 partitioned tables and parallelism. If my answers are incorrect, can you explain why?

Resources:

  • Video: Demo SQL Server 2008 Partitioned Table Parallelism (5 minutes long)
  • MSDN: Partitioned Tables and Indexes
  • MSDN: Designing Partitions to Manage Subsets of Data
  • MSDN: Query Processing Enhancements on Partitioned Tables and Indexes
  • MSDN: Word Doc: Partitioned Table and Index Strategies Using SQL Server 2008 white paper

BarDev

Recommended answer

Partitioning can increase performance; I have seen it many times. The reason partitioning was developed was, and is, performance, especially for inserts. Here is an example from the real world:

I have multiple tables on a SAN that, as far as we can tell, is one big ole honking disk. The SAN administrators insist that the SAN knows all, so they will not optimize the distribution of data. How can partitioning possibly help? Fact: it did and does.

We partitioned multiple tables using the same scheme (FileID%200), with 200 partitions ALL on primary. What use would that be if the only reason to have a partitioning scheme is for "swapping"? None, but the purpose of partitioning is performance. You see, each of those partitions has its own paging scheme. I can write data to all of them at once and there is no possibility of a deadlock. The pages cannot be locked because each writing process has a unique ID that equates to a partition. 200 partitions increased performance 2000x (fact) and deadlocks dropped from 7500 per hour to 3-4 per day. This is for the simple reason that page lock escalation always occurs with large amounts of data in a high-volume OLTP system, and page locks are what cause deadlocks. Partitioning, even on the same volume and filegroup, places the partitioned data on different pages, and lock escalation has no effect since processes are not attempting to access the same pages.
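The FileID%200 scheme described above can be sketched as a plain modulo mapping. This is only an illustration of the arithmetic: `PARTITION_COUNT`, `partition_for` and `group_by_partition` are hypothetical names, the bucket values run from 0 to 199 rather than following SQL Server's own partition numbering, and the answer does not show how the scheme was actually declared.

```python
from collections import defaultdict

PARTITION_COUNT = 200  # the answer's FileID % 200 scheme


def partition_for(file_id: int) -> int:
    """Return the modulo bucket (0..199) that a FileID falls into."""
    return file_id % PARTITION_COUNT


def group_by_partition(file_ids):
    """Group FileIDs by bucket to see which writers would share a partition."""
    groups = defaultdict(list)
    for fid in file_ids:
        groups[partition_for(fid)].append(fid)
    return dict(groups)


# Writers 1 and 2 land in different buckets; 1 and 201 share one.
print(group_by_partition([1, 2, 201]))
```

Writers only stay on separate partitions while their FileIDs differ modulo 200; two IDs that are a multiple of 200 apart map to the same bucket and would still write to the same partition.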

The benefit is there, but not as great, for selecting data. Typically, though, the partitioning scheme would be developed with the purpose of the DB in mind. I am betting Remus developed his scheme with incremental loading (such as daily loads) rather than transactional processing in mind. Now, if one were frequently selecting rows with locking (read committed), then deadlocks could result if processes attempted to access the same page simultaneously.

But Remus is right: in your example I see no benefit, and in fact there may be some overhead cost in finding the rows across different partitions.
