Horizontal Partitioning question


Question

I recently came across a database where the data are horizontally partitioned
into 4 tables. I'm not sure if this was a poor design choice, or if it was
done for valid performance reasons. The schemas of the tables are essentially
the same; it's just that the tables are named differently and the columns are
named differently to differentiate the data from a business usage perspective.
The tables could easily be combined into one by adding a new column to the
clustered index that would be used to differentiate the business usage. I am
trying to evaluate whether combining the tables would improve performance or
if it would be better to leave them the way they are. Many queries that run
against these tables do not request records from more than one of the
tables, which is good. However, there are a number of processes that query
against all of the tables on the identical clustered index range. I am not
sure exactly how many rows are in the tables but I'm fairly certain the
entire database is less than 50 GB.

Recommended answer




Hi

You don't say if they have been set up as a partitioned view, but your
comment about business usage would tend to imply they haven't? If they
haven't, then this is the change I would look at first, especially if
the growth rate of the system indicates that federation will be necessary.

If only a small percentage of queries access all the tables, then this may
also indicate there is a performance benefit to the split. If the tables are
on different filegroups and on different disc subsystems, then performance
may have been a valid reason to split them up.

Without having been there when the decision to partition them was made, you
will not know the underlying stats or reasons for this design, and I would bet
they have not been documented!

If you are going to combine them, then create a benchmark test so that you
can compare each configuration, and test the two alternatives in a
controlled environment. If you can't do that, then unless there is a
specific reason to change what is already working (and performing well!),
I wouldn't.
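As a starting point for such a benchmark, here is a minimal sketch of a harness that builds both layouts and times the same cross-table range query against each. Everything in it is an assumption made for illustration: SQLite (through Python's `sqlite3`) stands in for the real server, and the table names, the four usage labels, and the row counts are hypothetical. Timings from it say nothing about the production system; only the shape of the comparison is meant to carry over.

```python
import sqlite3
import time

USAGES = ["retail", "wholesale", "internal", "partner"]  # hypothetical business usages
ROWS_PER_USAGE = 5_000                                   # assumed test volume


def build_split(conn):
    """Current layout: one table per business usage, same shape each."""
    for usage in USAGES:
        conn.execute(f"CREATE TABLE {usage}_items (id INTEGER PRIMARY KEY, amount REAL)")
        conn.executemany(
            f"INSERT INTO {usage}_items VALUES (?, ?)",
            ((i, float(i)) for i in range(ROWS_PER_USAGE)),
        )


def build_combined(conn):
    """Proposed layout: one table, usage column leading the primary key."""
    conn.execute(
        "CREATE TABLE items (usage TEXT NOT NULL, id INTEGER NOT NULL, amount REAL, "
        "PRIMARY KEY (usage, id)) WITHOUT ROWID"
    )
    for usage in USAGES:
        conn.executemany(
            "INSERT INTO items VALUES (?, ?, ?)",
            ((usage, i, float(i)) for i in range(ROWS_PER_USAGE)),
        )


def range_query_split(conn, lo, hi):
    """Count rows in the id range across all four tables."""
    total = 0
    for usage in USAGES:
        total += conn.execute(
            f"SELECT COUNT(*) FROM {usage}_items WHERE id BETWEEN ? AND ?", (lo, hi)
        ).fetchone()[0]
    return total


def range_query_combined(conn, lo, hi):
    """Count rows in the id range across all usages in the single table."""
    return conn.execute(
        "SELECT COUNT(*) FROM items WHERE id BETWEEN ? AND ?", (lo, hi)
    ).fetchone()[0]


def timed(fn, *args, repeats=20):
    """Return the last result and the mean wall-clock seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = fn(*args)
    return result, (time.perf_counter() - start) / repeats


split_db = sqlite3.connect(":memory:")
combined_db = sqlite3.connect(":memory:")
build_split(split_db)
build_combined(combined_db)

split_rows, split_secs = timed(range_query_split, split_db, 100, 199)
combined_rows, combined_secs = timed(range_query_combined, combined_db, 100, 199)
print(f"split:    {split_rows} rows, {split_secs:.6f} s per run")
print(f"combined: {combined_rows} rows, {combined_secs:.6f} s per run")
```

A real test would add the single-table queries as well, since those are the ones the current split is supposed to favour, and would run against a copy of the production data on comparable hardware.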

John

>> I recently came across a database where the data are horizontally
partitioned into 4 tables. I'm not sure if this was a poor design
choice, or if it was done for valid performance reasons. <<

Without knowing any more than that, the smart money would bet on poor design
...
>> The schemas of the tables are essentially the same; it's just that they
are named differently and the columns are named differently to
differentiate the data from a business usage perspective. <<

Here we MAY have a valid design reason. Is the data logically
different in each case? Not just a status change (paid versus unpaid
bills, etc.), but really different? If not, then this is a mess.

>> The tables could easily be combined into one by adding a new column to
the clustered index that would be used to differentiate the
business usage. <<

Bingo! No logical differences, no separate tables in the data model.

>> I am trying to evaluate whether combining the tables would improve
performance or if it would be better to leave them the way they are. <<

Performance is a secondary issue. Correctness and removing redundant
data element names is the first issue. Make it right, then make it
fast.

>> Many queries that run against these tables do not request records
[sic] from more than one of the tables, which is good. However, there
are a number of processes that query against all of the tables on the
identical clustered index range. I am not sure exactly how many rows
are in the tables but I'm fairly certain the entire database is less
than 50 GB. <<

Write some VIEWs on the data. Performance with a clustered index
starting on the status column will be fine.
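The suggestion above can be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's actual schema: SQLite stands in for the real server, a `WITHOUT ROWID` table (stored as a B-tree keyed on its primary key) plays the part of a clustered index, and the names `items`, `usage`, `retail_items`, and `wholesale_items` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single combined table. The discriminating column leads the primary key,
# so rows for one business usage are stored next to each other.
conn.execute(
    """
    CREATE TABLE items (
        usage  TEXT    NOT NULL,   -- hypothetical discriminator column
        id     INTEGER NOT NULL,
        amount REAL,
        PRIMARY KEY (usage, id)
    ) WITHOUT ROWID
    """
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("retail", 1, 10.0),
        ("retail", 2, 20.0),
        ("wholesale", 1, 100.0),
        ("wholesale", 3, 300.0),
    ],
)

# One view per business usage keeps the old, usage-specific column names
# available to existing queries.
conn.execute(
    "CREATE VIEW retail_items AS "
    "SELECT id AS retail_id, amount AS retail_amount "
    "FROM items WHERE usage = 'retail'"
)
conn.execute(
    "CREATE VIEW wholesale_items AS "
    "SELECT id AS wholesale_id, amount AS wholesale_amount "
    "FROM items WHERE usage = 'wholesale'"
)

retail_rows = conn.execute(
    "SELECT retail_id, retail_amount FROM retail_items ORDER BY retail_id"
).fetchall()
all_for_id_1 = conn.execute(
    "SELECT usage, amount FROM items WHERE id = 1 ORDER BY usage"
).fetchall()
print(retail_rows)
print(all_for_id_1)
```

Queries that only care about one business usage go through its view; processes that need every usage for a key range query `items` directly.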




>> You don't say if they have been set up as a partitioned view, but your
comment about business usage would tend to imply they haven't? <<

Correct. There is no partitioned view. I don't think the current design
lends itself to that, since there is currently no column that could be used
for the check constraint. There exist data spread across all tables with the
same primary key. Data with the same PK are logically related from a
business perspective. To create a check constraint, I think we'd have to add
another column like the one I described in my original post.
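To make the extra column concrete, here is a small sketch of what it and its check constraint might look like, with a `UNION ALL` view over the member tables. It is an assumption-laden illustration: SQLite stands in for the real server, only two of the four tables are shown, and the names `usage`, `retail_items`, `wholesale_items`, and `all_items` are hypothetical. On a server that supports partitioned views, constraints like these are what let the optimizer skip member tables; SQLite does not do that, so the sketch shows only the structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each member table gets the new column, pinned to one value by a CHECK
# constraint and included in the primary key, so the same id can exist
# in several tables without colliding in the view.
for usage in ("retail", "wholesale"):  # hypothetical usages
    conn.execute(
        f"""
        CREATE TABLE {usage}_items (
            usage  TEXT    NOT NULL DEFAULT '{usage}' CHECK (usage = '{usage}'),
            id     INTEGER NOT NULL,
            amount REAL,
            PRIMARY KEY (usage, id)
        )
        """
    )

conn.execute("INSERT INTO retail_items (id, amount) VALUES (7, 10.0)")
conn.execute("INSERT INTO wholesale_items (id, amount) VALUES (7, 100.0)")

# The view presents the member tables as one logical table.
conn.execute(
    "CREATE VIEW all_items AS "
    "SELECT usage, id, amount FROM retail_items "
    "UNION ALL "
    "SELECT usage, id, amount FROM wholesale_items"
)

rows = conn.execute(
    "SELECT usage, amount FROM all_items WHERE id = 7 ORDER BY usage"
).fetchall()

# A row carrying the wrong usage value is rejected by the constraint.
try:
    conn.execute(
        "INSERT INTO retail_items (usage, id, amount) VALUES ('wholesale', 8, 1.0)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```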
>> specific reason to change what is already working (and performing well!) then <<

Performance is definitely a problem, though, with operations that need to query
against all of the tables at the same time. For example, one thing that
users routinely need to do is copy a large range of rows from all of the
tables and insert them back into the same tables (with a new PK, of course).
I will try to find out if different filegroups were used for the different
tables, but I'm guessing this is not the case.
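For illustration, this is roughly what the copy operation could reduce to if the tables were combined: one `INSERT ... SELECT` instead of one per table. It is a sketch under assumptions, not the real process. SQLite stands in for the server, the `items` table and its columns are hypothetical, and new keys are produced by adding a fixed offset, which is only a placeholder for whatever rule actually assigns the new PK.

```python
import sqlite3

USAGES = ["retail", "wholesale", "internal", "partner"]  # hypothetical usages

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (usage TEXT NOT NULL, id INTEGER NOT NULL, amount REAL, "
    "PRIMARY KEY (usage, id)) WITHOUT ROWID"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(usage, i, float(i)) for usage in USAGES for i in range(1, 6)],
)


def copy_range(conn, lo, hi, offset):
    """Copy ids lo..hi for every usage in one statement; new id = old id + offset.

    Returns the number of rows found under the new ids afterwards.
    """
    with conn:  # commit on success, roll back if the insert fails
        conn.execute(
            "INSERT INTO items (usage, id, amount) "
            "SELECT usage, id + ?, amount FROM items WHERE id BETWEEN ? AND ?",
            (offset, lo, hi),
        )
    return conn.execute(
        "SELECT COUNT(*) FROM items WHERE id BETWEEN ? AND ?",
        (lo + offset, hi + offset),
    ).fetchone()[0]


copied = copy_range(conn, 2, 4, 1000)
total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(copied, total)
```

With the split layout the same job needs one such statement per table, ideally inside a single transaction so that related rows are copied together or not at all.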

In my case, since sometimes we need to access all of the tables at once, and
sometimes not, what I need to do is measure the tradeoff between the improved
performance in situations where only 1 of the tables needs to be accessed, vs. the
penalty paid when all tables need to be accessed. My gut feeling is that the
increase in time spent traversing the B-tree in the combined table should be
less significant than the penalty paid for having the data split up when we
need to access all tables at the same time. But again, I really need to
measure this.
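A back-of-the-envelope check of that gut feeling, as a rough sketch: the depth of a B-tree grows with the logarithm of the row count, so quadrupling the rows adds very little. The fanout and row counts below are assumed figures picked only to show the arithmetic, not measurements from this database.

```python
def btree_levels(rows, fanout):
    """Smallest number of levels whose capacity (fanout ** levels) covers `rows`."""
    levels, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


FANOUT = 200                  # assumed keys per index page
ROWS_PER_TABLE = 10_000_000   # assumed rows in each of the 4 tables

single = btree_levels(ROWS_PER_TABLE, FANOUT)
combined = btree_levels(4 * ROWS_PER_TABLE, FANOUT)
print(f"one table: {single} levels, combined table: {combined} levels")
```

With any fanout of 4 or more, multiplying the row count by 4 can add at most one level to the tree, so the extra traversal cost per lookup is bounded by a single additional page read. Whether that is smaller than the cost of hitting four tables still has to be measured.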

Thanks.