数据库分区 - 水平vs垂直 - 规范化和行拆分之间的差异? [英] Database partitioning - Horizontal vs Vertical - Difference between Normalization and Row Splitting?

查看:234
本文介绍了数据库分区 - 水平vs垂直 - 规范化和行拆分之间的差异?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想了解数据库分区的不同概念,这是我的理解:



水平分区/ Sharding :将表拆分到不同的表中,该表将包含初始表中的行的子集(如果按照Continent拆分Users表,我看到很多示例,例如North的子表美国,另一个为欧洲等)。每个分区在不同的物理位置(了解机器)。
根据我的理解,水平分区和分片是完全相同的事情(?)。



垂直分区:从什么了解( http://technet.microsoft.com/en-us/ library / ms178148%28v = sql.105%29.aspx ),有两种垂直分区:




  • p> 正规化(其中包括通过分割表并使用外键将它们链接而从数据库中删除冗余)。


  • 行拆分,这里是我不明白的,行拆分




我也读过这篇文章(水平和垂直缩放数据库之间的差异),水平分区和垂直之间的差异分区是,在第一个扩展通过添加更多的机器,而在第二个,你通过增加更多的电源(CPU,RAM)到现有的机器,是一个正确的定义?我认为这两种技术之间的核心 区别在于您拆分表格的方式。



对不起,问题的负载,但我有点困惑作为很多不同的网站,我遇到过说不同的事情。



任何帮助澄清将非常感谢。

解决方案

分区是一个相当普遍的概念,可以应用在许多上下文中。当它考虑关系数据数据的分区时,通常指的是逐行(水平)或逐列(垂直)分解表。



垂直分割,也称为行分割,使用与数据库规范化相同的分割技术,但是通常术语(垂直/水平)数据分割是指物理优化
,而归一化是对<



因为你要求一个简单的演示 - 假设你有一个这样的表:

 创建表数据(
id整数主键,
状态char(1)不为null,
data1 varchar2 ,
data2 varchar2(10)not null);

一种分区数据的方法 垂直:按如下方式拆分:

 创建表data_main(
id整数主键,
status char(1)not null,
data1 varchar2(10)not null);

create table data_rarely_used(
id整数主键,
data2 varchar2(10)不为null,
外键(id)引用data_main(id)

例如,当您的查询中很少需要列数据2时,可以应用这种分区。分区data_main将占用更少的空间,因此全表扫描将更快,并且它更有可能适合DBMS的页缓存。缺点:当您必须查询 data 的所有列时,您无意中必须连接表,查询原始表格将更昂贵。



请注意,以与对规范化表格时相同的方式拆分列。然而,在这种情况下, data 可能已经归一化为3NF(甚至BCNF和4NF),但是您决定进一步拆分它的物理优化的原因。



使用Oracle语法分割数据 水平的一种方法:

 创建表数据(
id整数主键,
status char(1),
data1 varchar2(10)
data2 varchar2(10))
按列表分区(状态)(
分区active_data值('A'),
分区other_data值(默认)
);

这将告诉DBMS在内部存储表 data 在两个段(类似两个表),取决于状态的值。这种分区 data 的方式可以应用于,例如,当你通常只查询一个分区的行,例如状态'A'行(我们称之为活动行) 。像以前一样,完全扫描将更快(特别是如果只有很少活动的行),活动的行(和其他行)被连续存储(它们不会分散在页面上,它们与不同的行共享状态值,并且更有可能活动的行将在页面缓存中。


I am trying to grasp the different concepts of Database Partitioning and this is what I understood of it:

Horizontal Partitioning/Sharding: Splitting a table into different table that will contain a subset of the rows that were in the initial table (an example that I have seen a lot if splitting a Users table by Continent, like a sub table for North America, another one for Europe, etc...). Each partition being in a different physical location (understand 'machine'). As I understood it, Horizontal Partitioning and Sharding are the exact same thing(?).

Vertical Partitioning: From what I understood (http://technet.microsoft.com/en-us/library/ms178148%28v=sql.105%29.aspx ), there are 2 sorts of Vertical Partitioning:

  • Normalization (which consists of removing redundancies from a the database by splitting tables and linking them with a foreign key).

  • Row Splitting, here is what I don't understand, what is the difference between Normalization and Row Splitting? In what those 2 techniques differ from each other?

I have also read in this post (Difference between scaling horizontally and vertically for databases ) that the difference between Horizontal Partitioning and Vertical Partitioning is that in the first you scale by adding more machines, while in the second one you scale by adding more power (CPU, RAM) to your existing machine, is that a correct definition? I thought that the core difference between those 2 techniques resides in the way you split your tables.

I am sorry for the load of questions but I am a bit confused as a lot of different websites that I have came across say different things.

Any help clarifying would be greatly appreciated. Any link to a clear and simple demonstration with a few tables would also be very helpful.

解决方案

Partitioning is a rather general concept and can be applied in many contexts. When it considers the partitioning of relational data, it usually refers to decomposing your tables either row-wise (horizontally) or column-wise (vertically).

Vertical partitioning, aka row splitting, uses the same splitting techniques as database normalization, but ususally the term (vertical / horizontal) data partitioning refers to a physical optimization whereas normalization is an optimization on the conceptual level.

Since you ask for a simple demonstration - assume you have a table like this:

create table data (
    id integer primary key, 
    status char(1) not null, 
    data1 varchar2(10) not null, 
    data2 varchar2(10) not null);

One way to partition data vertically: Split it as follows:

create table data_main (
    id integer primary key,
    status char(1) not null,
    data1 varchar2(10) not null );

create table data_rarely_used (
    id integer primary key,
    data2 varchar2(10) not null,
    foreign key (id) references data_main (id) );

This kind of partitioning can be applied, for example, when you rarely need column data2 in your queries. Partition data_main will take less space, hence full table scans will be faster and it is more likely that it fits into the DBMS' page cache. The downside: When you have to query all columns of data, you obivously have to join the tables, which will be more expensive that querying the original table.

Notice you are splitting the columns in the same way as you would when you normalize tables. However, in this case data could already be normalized to 3NF (and even BCNF and 4NF), but you decide to further split it for the reason of physical optimization.

One way to partition data horizontally, using Oracle syntax:

create table data (
    id integer primary key, 
    status char(1), 
    data1 varchar2(10), 
    data2 varchar2(10) )
    partition by list (status) ( 
       partition active_data values ( 'A' ),
       partition other_data values(default) 
    );

This would tell the DBMS to internally store the table data in two segments (like two tables), depending on the value of the column status. This way of partitioning data can be applied, for example, when you usually query only rows of one partition, e.g., the status 'A' rows (let's call them active rows). Like before, full scans will be faster (particularly if there are only few active rows), the active rows (and the other rows resp.) are stored contiguously (they won't be scattered around pages that they share with rows of a different status value, and it is more likely that the active rows will be in the page cache.

这篇关于数据库分区 - 水平vs垂直 - 规范化和行拆分之间的差异?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆