BCP IN slow with clustered index


Problem description

We are working with Sybase ASE 15.7.

We have a huge historical table: 700 million records, with one clustered index on c1, c2, c3, c4 created with ALLOW_DUP_ROWS.

Every month we load 700,000 records into the table via BCP IN. These records contained a lot of duplicates, because fields c2 and c3 had the same value for all records. The BCP IN took 6 hours.

We changed the logic of the INSERTs into the monthly table, and now all fields of the clustered index (c1, c2, c3, c4) are different.

After this change, the BCP IN process takes 2 hours!

Why is the time lower now? We are not clear on the reasons.

Thanks a lot!

Answer

I'm guessing your table's locking scheme is allpages.

If a clustered index on an allpages table allows dup keys/rows, the dup key/row entries are maintained in what's known as overflow pages (though from an older manual, this info on overflow pages is still accurate).

Each time a dup key/row is inserted into the table, the entire chain of overflow pages is scanned from beginning to end, and then the new entry is added at the end of the chain.

As you can imagine, as this chain grows longer and longer, each successive insert takes more and more time.
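That growth can be sketched with a toy model (pure Python, not ASE internals; the list below just stands in for the overflow page chain, and the step counts are illustrative only):

```python
def total_scan_steps_dup(n):
    """Model inserting n rows that all share one duplicate key:
    each insert scans the whole overflow chain before appending."""
    chain = []
    steps = 0
    for row in range(n):
        steps += len(chain)   # scan the chain from beginning to end
        chain.append(row)     # new entry goes on the end of the chain
    return steps

def total_scan_steps_unique(n):
    """Model inserting n rows with unique keys: no overflow chain
    is built, so no per-insert chain scan is needed."""
    return 0

# Total scan work for duplicates grows quadratically:
# 0 + 1 + 2 + ... + (n-1) = n*(n-1)/2 steps.
print(total_scan_steps_dup(1000))     # 499500
print(total_scan_steps_unique(1000))  # 0
```

The quadratic sum is why the slowdown gets dramatically worse as the monthly loads accumulate, rather than staying a constant per-row cost.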

If you have a history of your data loads, I'm guessing you'll find that over time, as the volume of (dup) rows grew, so did the load times ... and this would have been attributable to the greater amount of time required to scan ever-increasing overflow page chains.

By making the new data 'unique' you've likely eliminated most of the overhead of scanning overflow page chains. I say 'likely' because it's not clear from your post whether some of the new data could contain key values that already exist in the table (i.e., the keys may be unique within a given batch of data to be inserted, but the keys aren't necessarily unique within the table).

I usually recommend that clients do NOT place non-unique clustered indexes on allpages-locked tables, solely because of the overhead of overflow page chain processing.

Chances are that if you eliminated the non-unique clustered index, you would see your data loads run even quicker, e.g.:

  • convert the table to datarows locking; a comparable clustered index with dup rows does not use the costly overflow page chains (though new data is no longer maintained in physical order by the index keys)
  • drop/replace the clustered index with a non-clustered index; non-clustered indexes do not use the costly overflow page chains
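A rough way to picture the second option (again a toy model, not actual ASE storage): a non-clustered index can keep duplicates for a key in a per-key bucket, so inserting a dup is a constant-time append rather than a full chain scan:

```python
from collections import defaultdict

def load_nonclustered(keys):
    """Model a non-clustered index: duplicates for a key are appended
    to that key's bucket; no overflow chain is ever scanned."""
    index = defaultdict(list)  # key -> list of row ids
    scan_steps = 0             # no per-insert chain scan in this model
    for row_id, key in enumerate(keys):
        index[key].append(row_id)
    return index, scan_steps

# Heavy duplication, as in the original monthly loads:
index, steps = load_nonclustered(["k1"] * 5 + ["k2"] * 3)
print(len(index["k1"]), steps)  # 5 0
```

In this model the total work stays linear in the number of rows loaded regardless of how many duplicates there are, which matches the motivation for both suggestions above.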

