How can I avoid NULLs in my database, while also representing missing data?


Question

Chapter 4 of SQL and Relational Theory (C.J. Date, 2009) advocates avoiding duplicate rows, and also avoiding NULL attributes in the data we store. While I have no trouble avoiding duplicate rows, I am struggling to see how I can model data without making use of NULL. Take the following example, which is loosely based on something from work.

We have an artist table which has, amongst other columns, a gender column. This is a foreign key to the gender table. However, for some artists we don't know their gender; for example, we've been given a list of new music that has no descriptions of the artists. How is one meant to represent this data without using NULL? The only solution I see is to add a new gender, "unknown", to the gender table.

While I am thoroughly enjoying this book, I was really disappointed when the chapter concluded with:

Of course, if nulls are prohibited, then missing information will have to be handled by some other means. Unfortunately, those other means are much too complex to be discussed in detail here.

Which is a real shame, because this was the solution I was waiting to read about! There is a pointer to the appendix, which lists lots of publications, but I was hoping for a more down-to-earth summary before I dive into reading those.

A few people have commented that they don't understand why I wish to avoid NULL, so I will quote the book again. Take the following query:

SELECT s.sno, p.pno
  FROM s, p
 WHERE s.city <> p.city
    OR p.city <> 'Paris'

Now, take the case where s.city is London and p.city is Paris. Here London <> Paris, so the WHERE clause is true. Now suppose p.city is not Paris but some other value, xyz. Then (London <> xyz) OR (xyz <> Paris) is also true. So for any non-NULL data, the WHERE clause is true. However, if p.city is NULL, the scenario changes. Both comparisons are then neither True nor False; they are in fact Unknown. And because the result is Unknown, no rows are returned.
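A minimal sketch of that behaviour, using Python's built-in sqlite3 module so it can be run as-is. The table contents (S1, P1, P2, P3) are made up for illustration; only the query itself comes from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s (sno TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE p (pno TEXT PRIMARY KEY, city TEXT);
    INSERT INTO s VALUES ('S1', 'London');
    INSERT INTO p VALUES ('P1', 'Paris');   -- London <> Paris is True
    INSERT INTO p VALUES ('P2', 'xyz');     -- both comparisons are True
    INSERT INTO p VALUES ('P3', NULL);      -- both comparisons are Unknown
""")

# The query from the book, with an ORDER BY added for a stable result.
rows = conn.execute("""
    SELECT s.sno, p.pno
      FROM s, p
     WHERE s.city <> p.city
        OR p.city <> 'Paris'
     ORDER BY p.pno
""").fetchall()

print(rows)  # the P3 pairing is silently dropped
```

The pairing with P3 is missing from the result even though, read with two-valued logic, the WHERE clause looks like it cannot fail.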

The move from two-valued logic to three-valued logic can easily introduce bugs like this. In fact, I just introduced one at work, which motivated this very post. I wanted all rows where type != 0. However, this actually ends up excluding every row where type = 0 OR type IS NULL, which is confusing behavior.
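The bug can be reproduced in a few lines. This is only a sketch: the item table and its three rows are hypothetical stand-ins for the real table at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, type INTEGER);
    INSERT INTO item VALUES (1, 0), (2, 1), (3, NULL);
""")

# Naive filter: the NULL row evaluates to Unknown and is dropped.
naive = [r[0] for r in conn.execute(
    "SELECT id FROM item WHERE type != 0 ORDER BY id")]

# Spelling out the NULL case brings the row back.
explicit = [r[0] for r in conn.execute(
    "SELECT id FROM item WHERE type != 0 OR type IS NULL ORDER BY id")]

print(naive, explicit)
```

Here naive holds only id 2, while explicit holds ids 2 and 3.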

Whether I will model my data with or without NULL in the future is unclear, but I'm very curious what the other solutions are. (I too have always been of the opinion that if you don't know, you should use NULL.)

Recommended answer

Good on you, for eliminating Nulls. I have never allowed Nulls in any of my databases.

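Of course, if nulls are prohibited, then missing information will have to be handled by some other means. Unfortunately, those other means are much too complex to be discussed in detail here.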

Actually it is not so hard at all. There are three alternatives.

  1. Here's a paper, How To Handle Missing Information Without Using NULL by H. Darwen, that may help you get your head around the problem.

1.1. Sixth Normal Form is the answer. But you do not have to normalise your entire database to 6NF. For each column that is optional, you need a child table off the main table. Its PK is also the FK to the main table, because it is a 1::0-1 relation. Other than the PK, the only column is the optional column.

Look at this Data Model; AssetSerial on page 4 is a classic case: not all Assets have SerialNumbers, but when they do, I want to store them; more importantly, I want to ensure that they are unique.

(For the OO people out there, incidentally, that is a three-level class diagram in Relational notation, a "Concrete Table Inheritance"; no big deal, we've had it for 30 years.)
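As a rough sketch of the 6NF child table from 1.1, here is what an Asset and AssetSerial pair might look like. The column names and sample rows are assumptions, since the linked data model is not reproduced here, and SQLite (via Python's sqlite3) is used only so the example is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE asset (
        asset_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    -- Child table: the PK is also the FK (1::0-1), plus the optional column.
    CREATE TABLE asset_serial (
        asset_id      INTEGER PRIMARY KEY REFERENCES asset (asset_id),
        serial_number TEXT NOT NULL UNIQUE
    );
    INSERT INTO asset VALUES (1, 'Laptop'), (2, 'Desk');
    INSERT INTO asset_serial VALUES (1, 'SN-001');  -- the desk has no serial
""")

# A second asset cannot reuse an existing serial number.
conn.execute("INSERT INTO asset VALUES (3, 'Monitor')")
try:
    conn.execute("INSERT INTO asset_serial VALUES (3, 'SN-001')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

serials = conn.execute(
    "SELECT asset_id, serial_number FROM asset_serial").fetchall()
print(duplicate_rejected, serials)
```

An asset without a serial number simply has no row in asset_serial, so no column in either table ever needs to be nullable.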

1.2. For each such table, use a View to provide the 5NF form of the table. Sure, use Null (or any value that is appropriate for the column) to identify the absence of the column for any row. But do not update via the view.

1.3. Do not use straight joins to grab the 6NF column. Do not use outer joins either (and have the server fill in a Null for the missing rows). Use a subquery to populate the column, and specify the value that you want returned for a missing value (except if you have Oracle, because its subquery processing is even worse than its set processing). For example, and this is just an example, you can convert a numeric column to a string and use "Missing" for the missing rows.
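One way 1.2 and 1.3 might fit together is sketched below: a view presents the 5NF shape, and a scalar subquery fills in the optional column. The table and view names are assumptions, and COALESCE stands in for whichever function your platform offers (ISNULL, for instance), because a scalar subquery that finds no row yields Null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE asset (asset_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE asset_serial (
        asset_id      INTEGER PRIMARY KEY REFERENCES asset (asset_id),
        serial_number TEXT NOT NULL UNIQUE
    );
    INSERT INTO asset VALUES (1, 'Laptop'), (2, 'Desk');
    INSERT INTO asset_serial VALUES (1, 'SN-001');

    -- 5NF-shaped view: no join, the subquery supplies the optional column
    -- and 'Missing' is returned when the child row does not exist.
    CREATE VIEW asset_v AS
    SELECT a.asset_id,
           a.name,
           COALESCE(
               (SELECT s.serial_number
                  FROM asset_serial s
                 WHERE s.asset_id = a.asset_id),
               'Missing'
           ) AS serial_number
      FROM asset a;
""")

view_rows = conn.execute(
    "SELECT asset_id, name, serial_number FROM asset_v ORDER BY asset_id"
).fetchall()
print(view_rows)
```

Reads go through asset_v; inserts and updates still go to the base tables, in line with the advice not to update via the view.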

When you do not want to go that far (6NF), you have two more options.

  2. You can use Null substitutes. I use CHAR(0) for character columns and 0 for numeric ones. But I do not allow that for FKs. Obviously you need a value that is outside the normal range of data. This does not allow Three Valued Logic.

  3. In addition to (2), for each Nullable column, you need a boolean Indicator. For the example of the Sex column, the Indicator would be something like SexIsMissing or SexLess (sorry). This allows very tight Three Valued Logic. Many people in that 5% like it because the db remains at 5NF (with fewer tables). The columns with missing info are loaded with substitute values that are never used; the column value is used only if the Indicator is false. If you have an enterprise db, you can wrap that in a Function, and always use the UDF, not the raw column.
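Options (2) and (3) can be sketched together as follows. Everything here is an assumption made for illustration: the artist table layout, the empty string as the substitute value, and the sex_display function, which is registered through Python's sqlite3 as a stand-in for a server-side UDF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (
        artist_id      INTEGER PRIMARY KEY,
        name           TEXT    NOT NULL,
        sex            TEXT    NOT NULL DEFAULT '',   -- Null substitute
        sex_is_missing INTEGER NOT NULL DEFAULT 1     -- boolean Indicator
                       CHECK (sex_is_missing IN (0, 1)),
        -- Keep the substitute and the Indicator consistent.
        CHECK ((sex_is_missing = 1 AND sex = '')
            OR (sex_is_missing = 0 AND sex <> ''))
    );
    INSERT INTO artist (artist_id, name, sex, sex_is_missing)
    VALUES (1, 'Alice', 'F', 0);
    INSERT INTO artist (artist_id, name) VALUES (2, 'Unknown Band');
""")

def sex_display(sex, is_missing):
    """Hypothetical wrapper: callers never read the raw column directly."""
    return "Missing" if is_missing else sex

conn.create_function("sex_display", 2, sex_display)

shown = conn.execute(
    "SELECT name, sex_display(sex, sex_is_missing) FROM artist "
    "ORDER BY artist_id").fetchall()

# Substitute alone: plain two-valued logic, so the unknown row matches.
not_f = [r[0] for r in conn.execute(
    "SELECT name FROM artist WHERE sex <> 'F'")]

# With the Indicator: only rows whose sex is actually known are compared.
known_not_f = [r[0] for r in conn.execute(
    "SELECT name FROM artist WHERE sex_is_missing = 0 AND sex <> 'F'")]

print(shown, not_f, known_not_f)
```

The difference between not_f and known_not_f is the point: with the substitute alone, the unknown row behaves like any other value, while the Indicator lets the query state explicitly how missing rows are treated.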

Of course, in all cases, you can never get away from writing the code that is required to handle the missing info, whether it is ISNULL(), a subquery for the 6NF column, an Indicator to check before using the value, or a UDF.

If Null has a specific meaning ... then it is not a Null! By definition, Null is the Unknown Value.
