Optimal database structure - 'wider' table with empty fields or greater number of tables?


Question


I need to fit in additional data into a database, and I have a choice between modifying an existing table (table_existing) or creating new tables.

This is what table_existing looks like right now:

table_existing
-------------------------
| ID | SP | SV | Field1 |
| .. | WW |  1 | ...... |
| .. | WW |  1 | ...... |
-------------------------

Option (A)

table_existing
----------------------------------------------------------------------
| ID | SP | SV | Field1 | Field2 | Field3 | Field4 | Field5 | Field6 |
| .. | XX |  1 | ...... | ...... | ...... | ...... | ...... | ...... |
| .. | YY |  2 | ...... | ...... | ...... | ...... | ...... | ...... |
----------------------------------------------------------------------

Option (B)

table_existing would be converted into table_WW_1_data
---------------
| ID | Field1 |
| .. | ...... |
| .. | ...... |
---------------

table_XX_1_data
------------------------
| ID | Field1 | Field2 |
| .. | ...... | ...... |
| .. | ...... | ...... |
------------------------

table_YY_2_data
---------------------------------
| ID | Field1 | Field2 | Field3 |
| .. | ...... | ...... | ...... |
| .. | ...... | ...... | ...... |
---------------------------------

Context: The combination of SP, SV determines the "number" of fields that will be populated. For instance, (XX, 1) has 2 fields. (YY, 2) has 3 fields.

If I were to go with Option (A) I would have many empty/NULL values in the "wider" table.

If I go with Option (B), I am basically creating more tables... one for "each" combination of SP, SV - there will be perhaps 4-5 in total. But each would be fully populated with the right number of fields. table_existing would be changed as well.

What is the more optimal database structure from the speed point of view? I think that from the maintainability point of view, Option (B) might be better.
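
As a rough illustration of the two options, here is a minimal sketch using Python's built-in sqlite3 module. The TEXT column types and the sample values are assumptions made for the example; only the table and column names come from the diagrams above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option (A): one wide table. Columns that do not apply to a given
# (SP, SV) combination are left NULL. Column types are assumed.
conn.execute("""
    CREATE TABLE table_existing (
        ID     INTEGER PRIMARY KEY,
        SP     TEXT    NOT NULL,
        SV     INTEGER NOT NULL,
        Field1 TEXT, Field2 TEXT, Field3 TEXT,
        Field4 TEXT, Field5 TEXT, Field6 TEXT
    )
""")
conn.execute(
    "INSERT INTO table_existing (SP, SV, Field1, Field2) VALUES ('XX', 1, 'a', 'b')"
)

# Option (B): one table per (SP, SV) combination, every column populated.
conn.execute("""
    CREATE TABLE table_XX_1_data (
        ID     INTEGER PRIMARY KEY,
        Field1 TEXT NOT NULL,
        Field2 TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO table_XX_1_data (Field1, Field2) VALUES ('a', 'b')")

# Count how many of the six data columns are NULL in the wide row.
wide_row = conn.execute(
    "SELECT Field1, Field2, Field3, Field4, Field5, Field6 FROM table_existing"
).fetchone()
null_count = sum(value is None for value in wide_row)

narrow_row = conn.execute(
    "SELECT Field1, Field2 FROM table_XX_1_data"
).fetchone()
```

For an (XX, 1) row, the wide table carries four unused columns, while the per-combination table carries none.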


Edit1

Neither of the two Options will be the most critical / frequently used tables in my application.

In Option (B), after the data has been split up, there would be no need to JOIN them at all. If I know I need Fields for XX_1, I will go to that table.

I'm trying to understand if there are pros and cons for having ONE large table with many unused values vs having the same data split across a larger number of tables. Does a larger number of tables lead to a performance hit in the database (we've got ~80 tables already)?

Answer

What is the more optimal database structure from the speed point of view?

Well, what is correct, best practice, etc, is called Normalisation. If you do that correctly, there will be no optional columns (not fields), no Nulls. The optional columns will be in a separate table, with fewer rows. Sure, you can arrange the tables so that they are sets of optional columns, rather than (one PK plus) one column each.

Combining the rows from the sub-tables into one 5NF row is easy, do that in a view (but do not update via the view, do that directly to each sub-table, via a transactional stored proc).
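
A minimal sketch of that arrangement, using Python's built-in sqlite3 module. The names item, item_field2, item_field3, item_full and add_item are hypothetical, chosen for illustration. SQLite has no stored procedures, so a Python function wrapped in a transaction stands in for the transactional stored proc here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Parent table holds the mandatory column; each optional column
    -- lives in its own sub-table with the same primary key.
    CREATE TABLE item (
        ID     INTEGER PRIMARY KEY,
        Field1 TEXT NOT NULL
    );
    CREATE TABLE item_field2 (
        ID     INTEGER PRIMARY KEY REFERENCES item(ID),
        Field2 TEXT NOT NULL
    );
    CREATE TABLE item_field3 (
        ID     INTEGER PRIMARY KEY REFERENCES item(ID),
        Field3 TEXT NOT NULL
    );
    -- Read-only view that reassembles one combined row per item.
    CREATE VIEW item_full AS
        SELECT i.ID, i.Field1, f2.Field2, f3.Field3
        FROM item AS i
        LEFT JOIN item_field2 AS f2 ON f2.ID = i.ID
        LEFT JOIN item_field3 AS f3 ON f3.ID = i.ID;
""")


def add_item(conn, field1, field2=None, field3=None):
    """Write directly to each sub-table inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute("INSERT INTO item (Field1) VALUES (?)", (field1,))
        item_id = cur.lastrowid
        if field2 is not None:
            conn.execute(
                "INSERT INTO item_field2 (ID, Field2) VALUES (?, ?)",
                (item_id, field2),
            )
        if field3 is not None:
            conn.execute(
                "INSERT INTO item_field3 (ID, Field3) VALUES (?, ?)",
                (item_id, field3),
            )
    return item_id


first_id = add_item(conn, "a", field2="b")
second_id = add_item(conn, "c", field2="d", field3="e")

rows = conn.execute(
    "SELECT ID, Field1, Field2, Field3 FROM item_full ORDER BY ID"
).fetchall()
```

Reads go through item_full; writes go through add_item. No stored row contains a Null; a missing optional value simply has no row in its sub-table and only shows up as None in the view's output.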

More, smaller tables, are the nature of a Normalised Relational database. Get used to it. Fewer, larger tables are slower, due to lack of normalisation, duplicates and Nulls. Joining is cumbersome in SQL, but that is all we have. There is no cost in joins themselves, only in the tables being joined (rows, row width, join columns, datatypes, mismatches, indices [or not]). Databases are optimised for Normalised tables, not for data heaps. And large numbers of tables.

Which happens to be optimal re performance, no surprise. For two reasons:

  1. The tables are narrower, so there are more rows per page, you get more rows per physical I/O, and more rows in the same cache space.

  2. Since you have no Nulls, those columns are fixed length, so there is no unpacking to extract the contents of the column.
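
The first point can be made concrete with some back-of-the-envelope arithmetic. The page size and row widths below are assumed values picked for illustration, not measurements from any particular database, and the calculation ignores page headers and per-row overhead.

```python
PAGE_SIZE_BYTES = 8192  # assumed page size; real values vary by database


def rows_per_page(row_width_bytes, page_size_bytes=PAGE_SIZE_BYTES):
    """Whole rows that fit on one page, ignoring header overhead."""
    return page_size_bytes // row_width_bytes


def pages_needed(row_count, row_width_bytes):
    """Pages required to hold row_count rows (ceiling division)."""
    per_page = rows_per_page(row_width_bytes)
    return -(-row_count // per_page)


WIDE_ROW_BYTES = 200   # hypothetical wide row with many optional columns
NARROW_ROW_BYTES = 50  # hypothetical narrow, fully populated row

wide_rows = rows_per_page(WIDE_ROW_BYTES)      # 40 rows per page
narrow_rows = rows_per_page(NARROW_ROW_BYTES)  # 163 rows per page

wide_pages = pages_needed(1_000_000, WIDE_ROW_BYTES)      # 25000 pages
narrow_pages = pages_needed(1_000_000, NARROW_ROW_BYTES)  # 6135 pages
```

Under these assumptions, scanning a million narrow rows touches roughly a quarter of the pages that the same number of wide rows would, which is where the saving in physical I/O and cache space comes from.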

There are no pros for large tables with many optional (null) columns, only cons. There never is a pro for breaching standards.

The answer is unchanged regardless of whether you are contemplating 4 or 400 new tables.

  • One recommendation if you are seriously considering that many tables: you are heading in the direction of Sixth Normal Form, without realising it. So realise it, and do so formally. The 400 tables will be much better controlled. If you get a professional to do it, they will normalise that, and end up back at fewer than 100.
