Clustered indexes on non-identity columns to speed up bulk inserts?

Question

My two questions are:


  • Can I use a clustered index to speed up bulk inserts into big tables?
  • If my IDENTITY column is not the clustered index, can I still use foreign key relationships efficiently?

To elaborate, I have a database with a couple of very big (between 100-1000 mln rows) tables containing company data. Typically there is data about 20-40 companies in such a table, each as their own "chunk" marked by "CompanyIdentifier" (INT). Also, every company has about 20 departments, each with their own "subchunk" marked by "DepartmentIdentifier" (INT).
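To make the layout concrete, here is a minimal T-SQL sketch of such a table. All object and column names beyond CompanyIdentifier and DepartmentIdentifier are assumptions for illustration, not the actual schema:

```sql
-- Illustrative sketch; dbo.CompanyData, Id, and PK_CompanyData are assumed names.
CREATE TABLE dbo.CompanyData (
    Id                   INT IDENTITY(1,1) NOT NULL,
    CompanyIdentifier    INT NOT NULL,   -- marks the "chunk"
    DepartmentIdentifier INT NOT NULL,   -- marks the "subchunk"
    -- ... the actual company data columns ...
    CONSTRAINT PK_CompanyData PRIMARY KEY CLUSTERED (Id)
);
```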

It frequently happens that a whole "chunk" or "subchunk" is added or removed from the table. My first thought was to use Table Partitioning on those chunks, but since I am using SQL Server 2008 Standard Edition I am not entitled to it. Still, most queries I have are executed on a "chunk" or "subchunk" rather than on the table as a whole.

I have been working to optimize these tables for the following functions:


  1. Queries that run on a "subchunk"

  2. Benchmarking

  3. Bulk inserting and deleting whole "chunks" of data

For 1) and 2) I haven't encountered a lot of problems. I have created several indexes on key fields (also containing CompanyIdentifier and DepartmentIdentifier where useful) and the queries are running fine.

But for 3) I have struggled to find a good solution. My first strategy was to always disable indexes, bulk insert a big chunk and rebuild indexes. This was very fast in the beginning, but now that there are a lot of companies in the database, it takes a very long time to rebuild the index each time.

At the moment my strategy has changed to just leaving the index on while inserting, since this seems to be faster now. But I want to optimize the insert speed even further.

I seem to have noticed that adding a clustered index defined on CompanyIdentifier + DepartmentIdentifier makes loading new "chunks" into the table faster. Earlier, I had abandoned this strategy in favour of a clustered index on an IDENTITY column, as several articles pointed out to me that the clustered key is contained in all other indexes and should therefore be as small as possible. But now I am thinking of reviving this old strategy to speed up the inserts. My question: would this be wise, or will I suffer performance hits in other areas? And will this really speed up my inserts, or is that just my imagination?
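One way the revived strategy could be sketched in T-SQL (table, column, and constraint names here are hypothetical):

```sql
-- Hypothetical sketch: move the clustered index to the chunk-defining
-- columns, keeping the IDENTITY column as a nonclustered primary key.
-- (Dropping the PK will fail if foreign keys still reference it.)
ALTER TABLE dbo.CompanyData DROP CONSTRAINT PK_CompanyData;

CREATE CLUSTERED INDEX CIX_CompanyData_Chunk
    ON dbo.CompanyData (CompanyIdentifier, DepartmentIdentifier, Id);

ALTER TABLE dbo.CompanyData
    ADD CONSTRAINT PK_CompanyData PRIMARY KEY NONCLUSTERED (Id);
```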

I am also not sure whether in my case an IDENTITY column is really needed. I would like to be able to establish foreign key relationships with other tables, but can I also use something like a CompanyIdentifier+DepartmentIdentifier+[uniquifier] scheme for that? Or does it have to be a table-wide, fragmented IDENTITY number?
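For what it's worth, a composite key can back foreign key relationships as long as the referenced columns are declared unique on the parent table. A hedged sketch, with made-up names (RowNo standing in for the [uniquifier]):

```sql
-- Sketch with hypothetical names: a composite UNIQUE constraint on the
-- parent table lets child tables reference it without an IDENTITY key.
ALTER TABLE dbo.CompanyData
    ADD CONSTRAINT UQ_CompanyData_Row
    UNIQUE (CompanyIdentifier, DepartmentIdentifier, RowNo);

ALTER TABLE dbo.CompanyDataDetail
    ADD CONSTRAINT FK_Detail_CompanyData
    FOREIGN KEY (CompanyIdentifier, DepartmentIdentifier, RowNo)
    REFERENCES dbo.CompanyData (CompanyIdentifier, DepartmentIdentifier, RowNo);
```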

Thanks a lot for any suggestions or explanations.

Answer

Well, I've put it to the test, and putting a clustered index on the two "chunk-defining" columns increases the performance of my table.

Inserting a chunk is now relatively fast compared to the situation where I had a clustered IDENTITY key, and about as fast as when I did not have any clustered index. Deleting a chunk is faster than it was in either of those setups.

I think the fact that all the records I want to delete or insert are guaranteed to be all together on a certain part of the harddisk makes the tables faster - it would seem logical to me.

Update: After a year of experience with this design I can say that for this approach to work, it is necessary to schedule regular rebuilding of all the indexes (we do it once a week). Otherwise, the indexes become fragmented very quickly and performance is lost. Nevertheless, we are in the process of migrating to a new database design with partitioned tables, which is better in basically every way - except for the Enterprise Server license cost, but we've already forgotten about that by now. At least I have.
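The weekly rebuild itself can be as simple as the following (table name assumed, as before):

```sql
-- Hypothetical weekly maintenance step: rebuild every index on the
-- table to remove the fragmentation left by chunk inserts/deletes.
-- (ONLINE = ON would avoid blocking, but requires Enterprise Edition.)
ALTER INDEX ALL ON dbo.CompanyData
    REBUILD WITH (SORT_IN_TEMPDB = ON);
```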
