Unique key with NULLs
Problem description
This question requires some hypothetical background. Let's consider an employee table that has columns name, date_of_birth, title, and salary, using MySQL as the RDBMS. If any given person has the same name and birth date as another person, they are, by definition, the same person (barring amazing coincidences where we have two people named Abraham Lincoln born on February 12, 1809), so we'll put a unique key on name and date_of_birth that means "don't store the same person twice." Now consider this data:
id  name         date_of_birth  title           salary
1   John Smith   1960-10-02     President       500,000
2   Jane Doe     1982-05-05     Accountant      80,000
3   Jim Johnson  NULL           Office Manager  40,000
4   Tim Smith    1899-04-11     Janitor         95,000
If I now try to run the following statement, it should and will fail:
INSERT INTO employee (name, date_of_birth, title, salary)
VALUES ('Tim Smith', '1899-04-11', 'Janitor', '95,000')
If I try this one, it will succeed:
INSERT INTO employee (name, title, salary)
VALUES ('Jim Johnson', 'Office Manager', '40,000')
And now my data will look like this:
id  name         date_of_birth  title           salary
1   John Smith   1960-10-02     President       500,000
2   Jane Doe     1982-05-05     Accountant      80,000
3   Jim Johnson  NULL           Office Manager  40,000
4   Tim Smith    1899-04-11     Janitor         95,000
5   Jim Johnson  NULL           Office Manager  40,000
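For anyone who wants to reproduce this, here is a minimal sketch. It makes two assumptions: Python's built-in sqlite3 module stands in for MySQL so that it runs without a server (both engines treat NULLs in a unique index as distinct from one another), and the column types are guessed, since the table definition is not given above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column types are guessed.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        date_of_birth TEXT,            -- nullable on purpose
        title TEXT,
        salary INTEGER,
        UNIQUE (name, date_of_birth)   -- "don't store the same person twice"
    )
""")

insert_sql = (
    "INSERT INTO employee (name, date_of_birth, title, salary) "
    "VALUES (?, ?, ?, ?)"
)
conn.executemany(insert_sql, [
    ("John Smith", "1960-10-02", "President", 500000),
    ("Jane Doe", "1982-05-05", "Accountant", 80000),
    ("Jim Johnson", None, "Office Manager", 40000),
    ("Tim Smith", "1899-04-11", "Janitor", 95000),
])

def try_insert(row):
    """Return True if the row was stored, False if the unique key rejected it."""
    try:
        conn.execute(insert_sql, row)
        return True
    except sqlite3.IntegrityError:
        return False

tim_again = try_insert(("Tim Smith", "1899-04-11", "Janitor", 95000))
jim_again = try_insert(("Jim Johnson", None, "Office Manager", 40000))
jim_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE name = 'Jim Johnson'"
).fetchone()[0]

print(tim_again, jim_again, jim_count)  # False True 2
```

The second Tim Smith is rejected, while the second Jim Johnson with a NULL birth date is accepted.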
This is not what I want, but I can't say I entirely disagree with what happened. If we talk in terms of mathematical sets,
{'Tim Smith', '1899-04-11'} = {'Tim Smith', '1899-04-11'} <-- TRUE
{'Tim Smith', '1899-04-11'} = {'Jane Doe', '1982-05-05'} <-- FALSE
{'Tim Smith', '1899-04-11'} = {'Jim Johnson', NULL} <-- UNKNOWN
{'Jim Johnson', NULL} = {'Jim Johnson', NULL} <-- UNKNOWN
My guess is that MySQL says, "Since I don't know that Jim Johnson with a NULL birth date isn't already in this table, I'll add him."
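The TRUE / FALSE / UNKNOWN results can be checked directly on single values. This is only a sketch, and it assumes Python's built-in sqlite3 module as a stand-in for MySQL; for plain `=` comparisons involving NULL, both return NULL rather than true or false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQL TRUE/FALSE come back to Python as 1/0, and SQL NULL comes back as None.
same, different, unknown, is_null = conn.execute(
    "SELECT 'Tim Smith' = 'Tim Smith', "
    "       'Tim Smith' = 'Jane Doe', "
    "       NULL = NULL, "
    "       NULL IS NULL"
).fetchone()

print(same, different, unknown, is_null)  # 1 0 None 1
```

`NULL = NULL` is neither true nor false, which is why a uniqueness check cannot conclude that the two Jim Johnson rows collide. `IS NULL` is the test that does return true.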
My question is: How can I prevent duplicates even though date_of_birth is not always known? The best I've come up with so far is to move date_of_birth to a different table. The problem with that, however, is that I might end up with, say, two cashiers with the same name, title and salary, different birth dates and no way to store them both without having duplicates.
Solution
A fundamental property of a unique key is that it must be unique. Making part of that key nullable destroys this property.
There are two possible solutions to your problem:
One way, the wrong way, would be to use some magic date to represent unknown. This just gets you past the DBMS "problem" but does not solve the problem in a logical sense. Expect problems with two "John Smith" entries having unknown dates of birth. Are these guys one and the same or are they unique individuals? If you know they are different then you are back to the same old problem - your Unique Key just isn't unique. Don't even think about assigning a whole range of magic dates to represent "unknown" - this is truly the road to hell.
A better way is to create an EmployeeId attribute as a surrogate key. This is just an arbitrary identifier that you assign to individuals that you know are unique. This identifier is often just an integer value. Then create an Employee table to relate the EmployeeId (unique, non-nullable key) to what you believe are the dependent attributes, in this case Name and Date of Birth (either of which may be nullable). Use the EmployeeId surrogate key everywhere that you previously used the Name/Date-of-Birth. This adds a new table to your system but solves the problem of unknown values in a robust manner.
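A rough sketch of the surrogate-key layout might look like the following. Several things here are assumptions for illustration only: Python's built-in sqlite3 module stands in for MySQL, the `job` table is a hypothetical example of "everywhere else" that refers to a person, and the cashier row and the birth date are made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,  -- surrogate key: unique, never NULL
        name TEXT,
        date_of_birth TEXT                -- may stay NULL until it is known
    );
    -- Hypothetical dependent table: it refers to people by employee_id only.
    CREATE TABLE job (
        employee_id INTEGER NOT NULL REFERENCES employee (employee_id),
        title TEXT,
        salary INTEGER
    );
""")

# Two different people who share a name and whose birth dates are unknown.
first = conn.execute(
    "INSERT INTO employee (name) VALUES ('Jim Johnson')"
).lastrowid
second = conn.execute(
    "INSERT INTO employee (name) VALUES ('Jim Johnson')"
).lastrowid
conn.execute("INSERT INTO job VALUES (?, 'Office Manager', 40000)", (first,))
conn.execute("INSERT INTO job VALUES (?, 'Cashier', 30000)", (second,))

# A birth date learned later is attached to exactly one of them (made-up value).
conn.execute(
    "UPDATE employee SET date_of_birth = '1975-03-09' WHERE employee_id = ?",
    (second,),
)

# The surrogate key itself stays unique: reusing an existing id is rejected.
try:
    conn.execute(
        "INSERT INTO employee (employee_id, name) VALUES (?, 'Someone Else')",
        (first,),
    )
    reused = True
except sqlite3.IntegrityError:
    reused = False

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(first != second, reused, employee_count)  # True False 2
```

With this layout the database no longer decides whether two rows describe the same person. Whoever assigns the identifier does, which matches the point above that the key is given to individuals you already know to be unique.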