Will splitting up a table improve our select performance?


Question

[DB2 8.2.1 Workgroup on SUSE 8.1 on a dual Xeon with 2.5 GB of RAM, 6
discs in RAID 5]

One of the larger tables in our database is now 6.8 million rows (1 per
financial transaction since 2000).

Every time an amendment is made to a booking, new rows are added to the
table for each transaction, so in general we never have any call to
update old rows. Usually, we only deal with analysis on transactions
in the current financial year or the previous one, but *occasionally*
we'll want to go back further.

So I'm thinking as follows:

Put every row with transaction_date earlier than 1 April 2004 into
transactions_to_end_mar04.
Put every row with transaction_date on or after that date into
transactions_current.
Put a clustering index on each table on transaction_date.
Create a view as follows:

CREATE VIEW transactions AS
SELECT * FROM transactions_to_end_mar04
UNION ALL
SELECT * FROM transactions_current;

My analysts will 99 times out of a hundred use transaction_date
within the WHERE clause of any SQL they write against this view. Will
the clustering indexes be efficacious? Is the optimiser going to
realise it doesn't need to scan transactions_to_end_mar04 when my
analyst asks just for bookings from 1 April 05 onward?

We're trying this now and getting ready to put bits of SQL through the
explain, but I'm not sure if there's anything fundamental that I've
missed. So any comments/warnings/expressions of disdain would be
gratefully received.

Further to this, currently every table sits in the same SMS tablespace.
Would this setup be more efficient in multiple tablespaces, or would
that require DMS tablespaces? (DBA time and resource is costly to us
and we prefer not to use too much of it)

Thanks

Recommended answer


The simplest way to split the table would be to move to enterprise and
use DPF to hash the table across multiple nodes. If you want to pursue
the UNION ALL approach, see this:
http://www-128.ibm.com/developerwork...m-0202zuzarte/
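
For anyone trying the UNION ALL route, here is a minimal sketch of how the two tables might be declared. As far as I know, DB2 generally relies on CHECK constraints on the underlying tables to work out which branch of a UNION ALL view can hold matching rows; without them the optimizer has no way of knowing that transactions_to_end_mar04 holds nothing after March 2004. The column list (transaction_id, amount) and the constraint and index names are assumptions made up for illustration; only the table names, the view name and transaction_date come from the question.

```sql
-- Hypothetical, trimmed-down column list; the real tables will have more columns.
CREATE TABLE transactions_to_end_mar04 (
    transaction_id   INTEGER       NOT NULL,
    transaction_date DATE          NOT NULL,
    amount           DECIMAL(11,2) NOT NULL,
    -- Tells the optimizer this table only holds rows before 1 April 2004.
    CONSTRAINT chk_txn_old CHECK (transaction_date < '2004-04-01')
);

CREATE TABLE transactions_current (
    transaction_id   INTEGER       NOT NULL,
    transaction_date DATE          NOT NULL,
    amount           DECIMAL(11,2) NOT NULL,
    -- Complementary range: 1 April 2004 onward.
    CONSTRAINT chk_txn_cur CHECK (transaction_date >= '2004-04-01')
);

-- One clustering index per table, as proposed in the question.
CREATE INDEX ix_txn_old_date ON transactions_to_end_mar04 (transaction_date) CLUSTER;
CREATE INDEX ix_txn_cur_date ON transactions_current (transaction_date) CLUSTER;

CREATE VIEW transactions AS
SELECT * FROM transactions_to_end_mar04
UNION ALL
SELECT * FROM transactions_current;

-- A typical analyst query; the access plan should only touch transactions_current.
SELECT transaction_id, transaction_date, amount
FROM transactions
WHERE transaction_date >= '2005-04-01';
```

Whether the older table really drops out of the plan is best confirmed by running the last query through explain, as the question already intends. The constraints also guard against rows being loaded into the wrong table.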


Make sure you set INTRA_PARALLEL to YES in the database manager
configuration. This should make the best use of your processors when
processing the UNION ALL.

Also, DMS table spaces with multiple containers might be a better option.
You can have multiple containers (= directories) for SMS, but only when
you create the tablespace. With DMS you can add containers after it is
created.

--
Anton Versteeg
IBM Netherlands
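
The two suggestions above can be sketched as a DB2 command line processor script (run with something like db2 -tvf). This is an illustration under assumptions, not a tested setup: the database name MYDB, the tablespace name ts_txn, the container paths and the sizes (in pages) are all made up. Note that INTRA_PARALLEL alone does not set a degree of parallelism; the default degree is 1 unless DFT_DEGREE or CURRENT DEGREE says otherwise.

```sql
-- Enable intra-partition parallelism at the instance level.
-- The instance has to be restarted (db2stop / db2start) for this to take effect.
UPDATE DBM CFG USING INTRA_PARALLEL YES;

-- Let the optimizer choose the degree of parallelism by default.
-- MYDB is a hypothetical database name.
UPDATE DB CFG FOR MYDB USING DFT_DEGREE ANY;

-- A DMS tablespace with two file containers; paths and page counts are placeholders.
CREATE TABLESPACE ts_txn
    MANAGED BY DATABASE
    USING (FILE '/db2/cont1/ts_txn.dat' 50000,
           FILE '/db2/cont2/ts_txn.dat' 50000);

-- Unlike SMS, a DMS tablespace can be given another container later.
ALTER TABLESPACE ts_txn
    ADD (FILE '/db2/cont3/ts_txn.dat' 50000);
```

An individual session can also override the default with SET CURRENT DEGREE = 'ANY' before running a query.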


Well... Thinking purely in SQL, and not partitioning, hardware, etc.
Perhaps you can make use of a clustered table, and use year as a
cluster index. This way you'll get a 3D matrix. I don't know for sure if
this will increase overall performance, but as far as I can see, it
will improve index scan performance. I don't know from which FixPak this
feature was available.

-R-
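
The "clustered table" idea above sounds like multidimensional clustering (MDC), which as far as I know arrived with DB2 UDB Version 8.1. A minimal sketch, assuming a single dimension on a generated year column; the table name transactions_mdc and the columns other than transaction_date are made up for illustration.

```sql
-- Hypothetical MDC version of the transactions table.
CREATE TABLE transactions_mdc (
    transaction_id   INTEGER       NOT NULL,
    transaction_date DATE          NOT NULL,
    amount           DECIMAL(11,2) NOT NULL,
    -- Generated column so that rows are clustered by calendar year.
    transaction_year INTEGER GENERATED ALWAYS AS (YEAR(transaction_date))
)
ORGANIZE BY DIMENSIONS (transaction_year);

-- Filtering on the dimension column directly keeps the intent unambiguous.
SELECT transaction_id, transaction_date, amount
FROM transactions_mdc
WHERE transaction_year = 2005
  AND transaction_date >= '2005-04-01';
```

With only one dimension this is a 1D layout rather than a 3D matrix; more dimensions can be listed in ORGANIZE BY, but each extra one multiplies the number of cells. Since the financial year in the question starts on 1 April, a calendar-year dimension will not line up with it exactly, so a typical financial-year query would touch two cells.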

