Unique key with NULLs

Problem description

This question requires some hypothetical background. Consider an employee table with the columns name, date_of_birth, title and salary, using MySQL as the RDBMS. If any given person has the same name and birth date as another person, they are, by definition, the same person (barring amazing coincidences where we have two people named Abraham Lincoln born on February 12, 1809). So we'll put a unique key on name and date_of_birth that means "don't store the same person twice." Now consider this data:

id name        date_of_birth title          salary
 1 John Smith  1960-10-02    President      500,000
 2 Jane Doe    1982-05-05    Accountant      80,000
 3 Jim Johnson NULL          Office Manager  40,000
 4 Tim Smith   1899-04-11    Janitor         95,000

If I now try to run the following statement, it should and will fail:

INSERT INTO employee (name, date_of_birth, title, salary)
VALUES ('Tim Smith', '1899-04-11', 'Janitor', '95,000')

If I try this one, it will succeed:

INSERT INTO employee (name, title, salary)
VALUES ('Jim Johnson', 'Office Manager', '40,000')

And now my data will look like this:

id name        date_of_birth title          salary
 1 John Smith  1960-10-02    President      500,000
 2 Jane Doe    1982-05-05    Accountant      80,000
 3 Jim Johnson NULL          Office Manager  40,000
 4 Tim Smith   1899-04-11    Janitor         95,000
 5 Jim Johnson NULL          Office Manager  40,000

This is not what I want, but I can't say I entirely disagree with what happened. If we talk in terms of mathematical sets,

{'Tim Smith', '1899-04-11'} = {'Tim Smith', '1899-04-11'} <-- TRUE
{'Tim Smith', '1899-04-11'} = {'Jane Doe', '1982-05-05'} <-- FALSE
{'Tim Smith', '1899-04-11'} = {'Jim Johnson', NULL} <-- UNKNOWN
{'Jim Johnson', NULL} = {'Jim Johnson', NULL} <-- UNKNOWN

My guess is that MySQL says, "Since I don't know that Jim Johnson with a NULL birth date isn't already in this table, I'll add him."
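
The behavior described above can be reproduced in a few lines. This is only a minimal sketch, and it assumes SQLite (through Python's standard sqlite3 module) as a stand-in for MySQL so that it runs without a server. Both systems allow multiple NULLs in a unique index on a nullable column, but the column types here are guesses, and the salaries are stored as plain integers rather than the quoted strings used in the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        date_of_birth TEXT,              -- nullable on purpose
        title         TEXT,
        salary        INTEGER,
        UNIQUE (name, date_of_birth)
    )
""")

insert = (
    "INSERT INTO employee (name, date_of_birth, title, salary) "
    "VALUES (?, ?, ?, ?)"
)

# A fully known key is enforced: the second Tim Smith is rejected.
conn.execute(insert, ("Tim Smith", "1899-04-11", "Janitor", 95000))
try:
    conn.execute(insert, ("Tim Smith", "1899-04-11", "Janitor", 95000))
    tim_rejected = False
except sqlite3.IntegrityError:
    tim_rejected = True

# A key containing NULL never compares equal to another key,
# so both Jim Johnson rows are accepted.
conn.execute(insert, ("Jim Johnson", None, "Office Manager", 40000))
conn.execute(insert, ("Jim Johnson", None, "Office Manager", 40000))
jim_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE name = 'Jim Johnson'"
).fetchone()[0]

# NULL = NULL evaluates to NULL (unknown), which Python receives as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(tim_rejected, jim_count, null_equals_null)  # True 2 None
```

The last query is the UNKNOWN row from the comparison list: the database cannot say the two keys are equal, so the uniqueness check does not fire.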

My question is: How can I prevent duplicates even though date_of_birth is not always known? The best I've come up with so far is to move date_of_birth to a different table. The problem with that, however, is that I might end up with, say, two cashiers with the same name, title and salary, different birth dates and no way to store them both without having duplicates.

Solution

A fundamental property of a unique key is that it must be unique. Making part of that key nullable destroys this property.

There are two possible solutions to your problem:

  • One way, the wrong way, would be to use some magic date to represent unknown. This just gets you past the DBMS "problem" but does not solve the problem in a logical sense. Expect problems with two "John Smith" entries having unknown dates of birth. Are these guys one and the same or are they unique individuals? If you know they are different then you are back to the same old problem - your Unique Key just isn't unique. Don't even think about assigning a whole range of magic dates to represent "unknown" - this is truly the road to hell.

  • A better way is to create an EmployeeId attribute as a surrogate key. This is just an arbitrary identifier that you assign to individuals that you know are unique. This identifier is often just an integer value. Then create an Employee table to relate the EmployeeId (unique, non-nullable key) to what you believe are the dependent attributes, in this case Name and Date of Birth (any of which may be nullable). Use the EmployeeId surrogate key everywhere that you previously used the Name/Date-of-Birth. This adds a new table to your system but solves the problem of unknown values in a robust manner.
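
The surrogate-key approach might look roughly like the sketch below. It again assumes SQLite through Python's sqlite3 module in place of MySQL. The job_assignment table, the snake_case column names, and the "Cashier" row with its salary are hypothetical, made up only to show another table referring to a person by employee_id instead of by name and birth date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,   -- surrogate key: unique, never NULL
        name          TEXT,
        date_of_birth TEXT                   -- may stay NULL when unknown
    );
    -- Hypothetical dependent table: it identifies a person by employee_id only.
    CREATE TABLE job_assignment (
        employee_id INTEGER NOT NULL REFERENCES employee (employee_id),
        title       TEXT NOT NULL,
        salary      INTEGER NOT NULL
    );
""")

add_person = "INSERT INTO employee (name, date_of_birth) VALUES (?, ?)"

# Two people known to be different, sharing a name, with unknown birth dates.
first = conn.execute(add_person, ("Jim Johnson", None)).lastrowid
second = conn.execute(add_person, ("Jim Johnson", None)).lastrowid

add_job = "INSERT INTO job_assignment (employee_id, title, salary) VALUES (?, ?, ?)"
conn.execute(add_job, (first, "Office Manager", 40000))
conn.execute(add_job, (second, "Cashier", 30000))  # made-up illustrative values

# Reusing an existing employee_id is rejected, so the key itself stays unique.
try:
    conn.execute(
        "INSERT INTO employee (employee_id, name) VALUES (?, ?)",
        (first, "Someone Else"),
    )
    duplicate_id_rejected = False
except sqlite3.IntegrityError:
    duplicate_id_rejected = True

rows = conn.execute("""
    SELECT e.employee_id, e.name, j.title
    FROM employee AS e
    JOIN job_assignment AS j ON j.employee_id = e.employee_id
    ORDER BY e.employee_id
""").fetchall()
print(rows, duplicate_id_rejected)
```

With this layout the database no longer decides whether two Jim Johnsons are the same person. That decision is made by whoever assigns the employee_id, which matches the answer's point that the identifier goes to individuals you already know to be distinct.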
