如何在表上建立多个m:n关系的数据库 [英] How to model a database with many m:n relations on a table
问题描述
一个人有一些工作,工作由一些人履行。一个人有一些房屋,房屋被一些人占用。一个人有一些他喜欢的餐馆,餐厅有很多人喜欢这家餐厅。
首先我设计如下:
表:人,工作,房屋,餐厅,Person_Job,Person_House,Person_Restaurant。
关系1 - n:个人 - > Person_Job, - > Person_House,Person - > Person_Restaurant,Job - > Person_Job,House - > Person_House,Restaurant - > Person_Restaurant。
这很快导致拥挤而复杂的ER模型
试图简化这个我建模如下:
表格:人,工作,房子,餐厅,Person_Attributes
关系1 - n:Person - > Person_Attributes,Job - > Person_Attributes,House - > Person_Attributes,Restaurant - > Person_Attributes
Person_Attributes表应如下所示:
personId
jobId
houseId
restaurantId
如果一个人 - 工作关系存在,我会添加一个条目,如下所示:
P1,J1,NULL,NULL
如果存在一个人与房屋关系,我将添加一个条目,如下所示:
P1,NULL,H1,NULL
所以第二个示例中的属性表将具有与添加的第一个示例的链接表相同的条目数。
简单地说,ER模型很多,只要我为personId + jobId,personId + houseId和personId + restaurantId创建索引,我认为不会有太多的性能影响。
我的问题是:
第二种方法是正确的建模方式吗?如果没有,为什么?对于性能影响我是对的吗?如果没有,为什么?为什么?
MySQL Workbench的例子我可以在这里找到:
http://www.2shared.com/file/3GBnodEZ/example.html
您的设计违反了第四正常表格。您正试图将多个事实存储在一个表中,并导致异常。
Person_Attributes表应该是这样的:personId jobId houseId restaurantId
所以如果我联系一个工作,一个房子,但是两个餐馆,我存储以下? / p>
personId jobId houseId restaurantId
1234 42 87 5678
1234 42 87 9876
如果我添加了第三家餐厅,我会复制其他列?
personId jobId houseId restaurantId
1234 123 87 5678
1234 123 87 9876
1234 42 87 13579
完成!哦,等等,发生了什么事?在添加新餐厅的同时我改变了工作。现在我不正确地关联了两个工作,但没有办法区分正确的 正好与两个作业相关联。
另外,即使与两个作业相关联也是正确的,数据看起来不是这样吗?
personId jobId houseId restaurantId
1234 123 87 5678
1234 123 87 9876
1234 123 87 13579
1234 42 87 5678
1234 42 87 9876
1234 42 87 13579
它开始看起来像一个笛卡尔乘积。事实上,这是因为这个表试图存储多个独立的事实。
正确的关系设计需要一个单独的交叉表,用于每对多对多的关系。对不起,你还没有找到一个快捷方式。
(关于规范化的许多文章都表示,过去3NF的更高级正常格式是深奥的,而且从来没有必要担心4NF或5NF。让这个例子反驳这个说法。)
重新发表关于使用NULL的评论:那么你有一个强制执行唯一性的问题,因为一个 PRIMARY KEY
约束要求所有列不为空。
personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL 9876
1234 NULL NULL 13579
另外,如果我在上表中添加了第二个房子或第二个jobId,我将其放在哪一行?你可以这样做:
personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL 9876
1234 42 NULL 13579
现在如果我取消关联restaurantId 9876,我可以将其更新为空值。但是,这留下了一排NULL,我真的应该删除。
personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL NULL
1234 42 NULL 13579
而如果我已经解除了餐厅13579的关联,我可以将其更新为NULL,并将该行放在适当位置。
personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL 9876
1234 42 NULL NULL
但是,不应该整合行,将jobId移动到另一行,只要有空位在该列中?
personId jobId houseId restaurantId
1234 123 87 5678
1234 42 NULL 9876
麻烦的是,现在添加或删除关联越来越复杂,需要多个SQL语句进行更改。您将不得不编写大量繁琐的应用程序代码来处理这种复杂性。
但是,如果您定义一个表,所有各种更改都很容易
向餐厅添加关联只是Person_Restaurant表的 INSERT
。删除该关联只是一个 DELETE
。关于工作或房屋的协会有多少关系。并且您可以在每个这些交集表中定义主键约束以强制执行唯一性。
I am currently setting up a database which has a large number of many-to-many relations. Every relationship was modeled via a link table. Example:
A person has a number of jobs, jobs are fulfilled by a number of persons. A person has a number of houses, houses are occupied by a number of persons. A person has a number of restaurants he likes, restaurants have a number of persons who like the restaurant.
First I designed this as follows:
Tables: Person, Job, House, Restaurant, Person_Job, Person_House, Person_Restaurant.
Relationships 1 - n: Person -> Person_Job, Person -> Person_House, Person -> Person_Restaurant, Job -> Person_Job, House -> Person_House, Restaurant -> Person_Restaurant.
This leads pretty quickly to a crowded and complex ER model.
Trying to simplify this I modeled it as follows:
Tabels: Person, Job, House, Restaurant, Person_Attributes
Relationships 1 - n: Person -> Person_Attributes, Job -> Person_Attributes, House -> Person_Attributes, Restaurant -> Person_Attributes
The Person_Attributes table should look something like this: personId jobId houseId restaurantId
If a person - job relationship exists, I'll add an entry looking like:
P1, J1, NULL, NULL
If a person - house relationship exists, I'll add an entry looking like:
P1, NULL, H1, NULL
So the attributes table in the second example will have the same number of entries as the link tables of the first examples added up.
This simplyfies the ER Model a lot, and as long as I build indexes for personId + jobId, personId + houseId and personId + restaurantId, there won't be a lot of performance impact, I think.
My questions are: Is the second method a correct way of modelling this? If not, why? Am I right about performance impact? If not, why?
MySQL Workbench example of what I mean can be found here:
http://www.2shared.com/file/3GBnodEZ/example.html
Your design violates Fourth Normal Form. You're trying to store multiple "facts" in one table, and it leads to anomalies.
The Person_Attributes table should look something like this: personId jobId houseId restaurantId
So if I associate with one job, one house, but two restaurants, do I store the following?
personId jobId houseId restaurantId
1234 42 87 5678
1234 42 87 9876
And if I add a third restaurant, I copy the other columns?
personId jobId houseId restaurantId
1234 123 87 5678
1234 123 87 9876
1234 42 87 13579
Done! Oh, wait, what happened there? I changed jobs at the same time as adding the new restaurant. Now I'm incorrectly associated with two jobs, but there's no way to distinguish between that and correctly being associated with two jobs.
Also, even if it is correct to be associated with two jobs, shouldn't the data look like this?
personId jobId houseId restaurantId
1234 123 87 5678
1234 123 87 9876
1234 123 87 13579
1234 42 87 5678
1234 42 87 9876
1234 42 87 13579
It starts looking like a Cartesian product of all distinct values of jobId, houseId, and restaurantId. In fact, it is -- because this table is trying to store multiple independent facts.
Correct relational design requires a separate intersection table for each many-to-many relationship. Sorry, you have not found a shortcut.
(Many articles about normalization say the higher normal forms past 3NF are esoteric, and one never has to worry about 4NF or 5NF. Let this example disprove that claim.)
Re your comment about using NULL: Then you have a problem enforcing uniqueness, because a PRIMARY KEY
constraint requires that all columns be NOT NULL.
personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL 9876
1234 NULL NULL 13579
Also, if I add a second house or a second jobId to the above table, which row do I put it in? You could end up with this:
personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL 9876
1234 42 NULL 13579
Now if I disassociate restaurantId 9876, I could update it to NULL. But that leaves a row of all NULLs, which I really should just delete.
personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL NULL
1234 42 NULL 13579
Whereas if I had disassociated restaurant 13579, I could update it to NULL and leave the row in place.
personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL 9876
1234 42 NULL NULL
But shouldn't I consolidate rows, moving the jobId to another row, provided there's a vacancy in that column?
personId jobId houseId restaurantId
1234 123 87 5678
1234 42 NULL 9876
The trouble is, now it's getting more and more complex to add or remove associations, requiring multiple SQL statements for changes. You're going to have to write a lot of tedious application code to handle this complexity.
However, all the various changes are easy if you define one table per many-to-many relationship. You do need the complexity of having that many more tables, but by doing that you will simplify your application code.
Adding an association to a restaurant is simply an INSERT
to the Person_Restaurant table. Removing that association is simply a DELETE
. It doesn't matter how many associations there are to jobs or houses. And you can define a primary key constraint in each of these intersection tables to enforce uniqueness.
这篇关于如何在表上建立多个m:n关系的数据库的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!