分区数据库中的性能问题 [英] Performance issue in a partitioned database

查看:56
本文介绍了分区数据库中的性能问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在开发

数据库中,ETL应用程序在14秒内加载了大约3000行,而在UAT数据库中加载需要2个小时。


UAT db是分区的。

Dev db没有被分区。


应用程序查找表中的现有行...如果已经

存在,然后更新,否则插入它们。


表非常大,大约650万行。

由于RI,它正在查找进入可能位于不同节点的父表中。


此问题已将批处理窗口从2小时增加到4小时这是

当然不能接受。

如何解决这个性能问题?


干杯,

San。

The ETL application loaded around 3000 rows in 14 seconds in a Development
database while it took 2 hours to load in a UAT database.

UAT db is partitioned.
Dev db is not partitioned.

the application looks for existing rows in the table...if they already
exist then it updates otherwise inserts them.

The table is pretty large, around 6.5 million rows.
Due to RI, it is looking up into parent tables which possibly reside of
different nodes.

This issue has increased the batch window from 2 hrs to 4 hrs and this is
certainly not acceptable.
How can I get rid of this performance issue?

Cheers,
San.

推荐答案

shsandeep写道:
shsandeep wrote:

ETL应用程序加载到3000左右开发中的14秒内的行

数据在UAT数据库中加载需要2个小时。


UAT db已分区。

Dev db未分区。
The ETL application loaded around 3000 rows in 14 seconds in a Development
database while it took 2 hours to load in a UAT database.

UAT db is partitioned.
Dev db is not partitioned.



您应该将您的开发系统分区。 2逻辑

分区可以。

You should make your dev system partitioned as well. 2 logical
partitions would do.


应用程序查找表中的现有行...如果它们已经存在,那么它会更新,否则会插入它们。


表格相当大,大约650万行。

由于RI,它正在查找可能存在的父表格

不同的节点。
the application looks for existing rows in the table...if they already
exist then it updates otherwise inserts them.

The table is pretty large, around 6.5 million rows.
Due to RI, it is looking up into parent tables which possibly reside of
different nodes.



向我们展示您运行的语句和计划(db2exfmt输出)。

父表有多大?通常父表很小,应该在所有分区上复制



干杯

Serge

-

Serge Rielau

DB2解决方案开发

IBM多伦多实验室


WAIUG会议
http://www.iiug.org/waiug /present/Fo...Forum2006.html


Serge Rielau写道:
Serge Rielau wrote:

Show我们运行的语句和计划(db2exfmt输出)。

父表有多大?通常父表很小,应该在所有分区上复制



干杯

Serge

-

Serge Rielau

DB2解决方案开发

IBM多伦多实验室
Show us the statements you run and the plans (db2exfmt output).
How big are the parent tables? Often parent tables are small and should
be replicated on all partitions.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab



如果父表太大而无法在每个分区上复制,并且

您使用的是自然键,那么您可以将

分区键设置为相同父表和子表。这将

确保父母和孩子之间的连接不需要跨越分区I / O(使用DPF时的关键性能问题)。

If the parent table is too large to replicate on each partition, and
you are using natural keys, then you may be able to make the
partitioning key the same on the parent and the child table. This will
make sure that joins between parent and child do not require
cross-partition I/O (a key performance issue when using DPF).


shsandeep写道:
shsandeep wrote:

ETL应用程序在14秒内在一个开发中加载了大约3000行

数据库,而在UAT数据库中加载需要2个小时。


UAT db已分区。

Dev db未分区。


应用程序查找表中的现有行...如果它们已经存在,那么它会更新,否则会插入它们。


表格非常大,大约650万行。

由于RI,它正在查找可能位于不同节点的b / b
的父表。 />

此问题已将批处理窗口从2小时增加到4小时这是

肯定是不可接受的。

我怎样才能获得摆脱这个性能问题?


干杯,

圣。
The ETL application loaded around 3000 rows in 14 seconds in a Development
database while it took 2 hours to load in a UAT database.

UAT db is partitioned.
Dev db is not partitioned.

the application looks for existing rows in the table...if they already
exist then it updates otherwise inserts them.

The table is pretty large, around 6.5 million rows.
Due to RI, it is looking up into parent tables which possibly reside of
different nodes.

This issue has increased the batch window from 2 hrs to 4 hrs and this is
certainly not acceptable.
How can I get rid of this performance issue?

Cheers,
San.



一面注意。根据我的经验,一个偶然的失败更新比在尝试更新之前检查记录是否存在便宜得多。比b $ b更新更便宜。由于更新通常会在主键列上键入

并使用索引,因此更新将成功或立即失败并使用非常便宜的索引查找。如果正在处理的大多数记录将导致更新,那么首先检查

是否存在是非常昂贵的。

记录然后更新索引。只有在尝试主键

索引键添加时才会检测到重复,然后必须回滚

记录添加和任何其他索引键更新。尽管如此,如果处理的绝大多数记录都会导致插入,那么
仍可能比成千上万次失败的更新更便宜。


我的经验法则是,如果75%的处理记录将导致插入

首先尝试插入并在插入失败时更新。否则先尝试更新

,如果失败则插入。在任何情况下都不应该有一个

SELECT来验证行的先前存在。如果

插入和更新之间的权重随着时间的推移而变化,则任务可以是动态的,无论是否首先使用某个阈值尝试插入更新。所以说如果

最后两个操作是插入,请先尝试下次插入等等。


这与你的问题没有直接关系,但是,这些

建议将倾向于提高这些类型任务的性能

显着而且我发现,至少在Informix服务器中我有

没有理由看到为什么DB2的行为会有所不同,对于分区(IDS中的碎片)表,改进是明显更大的,除非

分区直接基于主键。无论如何它都值得测试。


Art S. Kagel

One side note. In my experience, an occassional failed update is much
cheaper than checking to see if the record exists first before trying to
update. Since the update will typically key on the primary key column(s)
and use an index the update will either succeed or instantly fail with a
very cheap index lookup. If the majority of the records being processed
will result in an update then it is very expensive to do the SELECT to check
for existence first.

Similarly, a failed insert is very expensive as the typically the data
record is written then the indexes are updated. Only when the primary key
index key add is attempted will the duplication be detected and then the
record add and any other index key updates have to be rolled back. Still,
if the vast majority of records processed will result in an insert this may
still be cheaper than the thousands of failed updates.

My rule of thumb is if 75% of the records processed will result in an insert
try the insert first and update if the insert fails. Otherwise try the
update first and insert if that fails. In neither case should there be a
SELECT to verify the prior existence of the row. If the weighting between
inserts and updates tends to vary over time the task can be dynamic about
whether to try inserts of updates first using some threshhold. So say if
the last two operations were inserts try insert first next time, etc.

This is not directly related to your question, however, following these
suggestions will tend to improve performance of these types of tasks
significantly and I have found that, at least in Informix servers and I have
not reason to see why DB2 would behave differently, the improvements are
noticably greater for partitioned (fragmented in IDS) tables unless that
partitioning is based directly on the primary key. It''s worth testing anyway.

Art S. Kagel


这篇关于分区数据库中的性能问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆