建模和归一化时如何处理空值? [英] What to do with null values when modeling and normalizing?

查看:48
本文介绍了建模和归一化时如何处理空值?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是 SQL 新手(仍在学习),我必须为场地创建一个数据库.客户预订的活动房间.问题是客户并不总是提供他们的姓名、电子邮件和电话号码.大多数情况下,要么是姓名和电子邮件,要么是姓名和电话.很少是全部 3,但它发生了.我需要将这些存储在各自的属性中(姓名、电子邮件、电话).但是他们给我他们的信息的方式,我有很多空值.我可以用这些空值做什么?有人告诉我最好不要有空值.之后我还需要规范化我的表格.请提出任何建议.

I'm new in SQL(still learning) and I have to create a database for a venue. A client book for a room for an event. The problem is that the clients don't always provide their name, their email, and their phone number. Most of the time it's either name and email or name and phone. It's rarely all 3 but it happens. I need to store each of these in their respective attribute (name, email, phone). But the way they give me their info, I have a lot of null values. What can I do with these nulls? I've been told that it's better to not have nulls. I also need to normalize my table after that. Any suggestions please.

推荐答案

SQL 根据其 3VL(3 值逻辑)版本对 NULL 进行特殊处理.标准化&其他关系理论没有.但是,我们可以将 SQL 设计转换为关系设计并返回.(假设这里没有重复的行.)

SQL treats NULL specially per its version of 3VL (3-valued logic). Normalization & other relational theory does not. However, we can translate SQL designs into relational designs and back. (Assume no duplicate rows here.)

归一化发生在 关系 中,并根据不特别对待 NULL 的运算符进行定义.术语规范化"有两种最常见的不同含义:将表放入1NF"中;并转化为更高的 NF(标准形式)".NULL 不影响标准化为 1NF".标准化到更高的 NF"用自然连接回它的较小表替换表.出于规范化的目的,除了 SQL 类型的值之外,您还可以将 NULL 视为可空列的域中允许的值.如果我们的 SQL 表没有 NULL,那么我们可以将它们解释为关系 &SQL 连接等作为连接等.但是如果您分解组件之间共享可空列的位置,然后意识到要在 SQL 中重建原始列,您必须在相同名称的列上进行 SQL 连接,或两者都是 NULL.并且您不会希望在 SQL 数据库中使用此类 CK(候选密钥).例如,您不能将其声明为 SQL PK(主键),因为这意味着 UNIQUE NOT NULL.例如,涉及可空列的 UNIQUE 约束允许多行在该列中具有 NULL,即使这些行在每一列中具有相同的值.例如,SQL FK 中的 NULL 会导致它们被满足(以各种方式每个 MATCH 模式),不会因未出现在引用表中而失败.(但 DBMS 与标准 SQL 有异曲同工之处.)

Normalization happens to relations and is defined in terms of operators that don't treat NULL specially. The term "normalization" has two most common distinct meanings: putting a table into "1NF" and into "higher NFs (normal forms)". NULL doesn't affect "normalization to 1NF". "Normalization to higher NFs" replaces a table by smaller tables that natural join back to it. For purposes of normalization you could treat NULL like a value that is allowed in the domain of a nullable column in addition to the values of its SQL type. If our SQL tables have no NULLs then we can interpret them as relations & SQL join etc as join, etc. But if you decompose where a nullable column was shared between components then realize that to reconstruct the original in SQL you have to SQL join on same-named columns being equal or both NULL. And you won't want such CKs (candidate keys) in an SQL database. Eg you can't declare it as an SQL PK (primary key) because that means UNIQUE NOT NULL. Eg a UNIQUE constraint involving a nullable column allows multiple rows that have a NULL in that column, even if the rows have the same values in every column. Eg NULLs in SQL FKs cause them to be satisfied (in various ways per MATCH mode), not to fail from not appearing in the referenced table. (But DBMSs idiosyncratically differ from standard SQL.)

不幸的是,分解可能会导致表的所有 CKs 包含NULL,因此我们没有什么可声明为SQL PK 或UNIQUE NOT NULL.唯一确定的解决方案是转换为无 NULL 设计.在标准化之后,我们可能希望在组件中重新引入一些可空性.

Unfortunately decomposition might lead to a table with all CKs containing NULL, so that we have nothing to declare as SQL PK or UNIQUE NOT NULL. The only sure solution is to convert to a NULL-free design. After then normalizing we might want to reintroduce some nullability in the components.

在实践中,我们设法设计表,以便始终有一组可以通过 SQL PK 或 UNIQUE NOT NULL 声明为 CK 的无 NULL 列.然后我们可以通过从表中删除它并添加一个包含该列的表和一些无 NULL CK 的列来摆脱一个可为空的列:如果旧设计中的一行的列是非空的,那么一行它的 CK 子行和列值进入添加的表中;否则在旧设计中它是 NULL 并且在添加的表中没有相应的行.(原表是新表的自然左连接.)当然,我们也要把查询从旧的设计修改为新的设计.

In practice, we manage to design tables so that there is always a set of NULL-free columns that we can declare as CK, via SQL PK or UNIQUE NOT NULL. Then we can get rid of a nullable column by dropping it from the table and adding a table with that column and the columns of some NULL-free CK: If the column is non-NULL for a row in the old design then a row with its CK subrow and column value go in the added table; otherwise it is NULL in the old design and no corresponding row is in the added table. (The original table is a natural left join of the new ones.) Of course, we also have to modify queries from the old design to the new design.

我们总是可以通过一种设计来避免 NULL,该设计为每个旧的可空列添加一个布尔列,并使旧列 NOT NULL.新列对于一行表示旧列在旧设计中是否为 NULL,当为 true 时,旧列是我们为该类型在整个数据库中为此目的选择的某个值.当然,我们也必须将查询从旧设计修改为新设计.

We can always avoid NULLs via a design that adds a boolean column for each old nullable column and has the old column NOT NULL. The new column says for a row whether the old column was NULL in the old design and when true has the old column be some one value that we pick for that purpose for that type throughout the database. Of course, we also have to modify queries from the old design to the new design.

是否要避免 NULL 是一个单独的问题.您的数据库在某些方面可能更好"或更糟"为您的应用程序设计.避免 NULL 背后的想法是 它使查询的含义复杂化,因此与来自更多无 NULL 表的更多连接的复杂性.(通常通过在查询表达式中尽可能靠近它们出现的位置删除 NULL 来管理这种反常性.)

Whether you want to avoid NULL is a separate question. Your database might in some way be "better" or "worse" for your application with either design. The idea behind avoiding NULL is that it complicates the meanings of queries, hence complicates querying, in a perverse way, compared to the complication of more joins from more NULL-free tables. (That perversity is typically managed by removing NULLs in query expressions as close to where they appear as possible.)

PS 许多 SQL 术语,包括 PK &FK 不同于关系术语.SQL PK 的意思更像是超级键;SQL FK 的意思更像是外键;但是谈论超级密钥"甚至没有任何意义.在 SQL 中:

PS Many SQL terms including PK & FK differ from the relational terms. SQL PK means something more like superkey; SQL FK means something more like foreign superkey; but it doesn't even make sense to talk about a "superkey" in SQL:

由于 SQL 表与关系的相似性,涉及关系的术语被草率地应用于表.但是虽然你可以借用术语并赋予它们 SQL 含义——值、表、FD(函数依赖)、superkey、CK(候选键)、PK(主键)、FK(外键)、join、和、谓词、NF(标准形式)、规范化、1NF 等——你不能仅仅用那些 SQL 含义代替 RM 定义、定理或算法中的那些词,并得到一些合理或真实的东西.此外,RM 概念的 SQL 演示几乎从未实际上告诉您如何将 RM 概念完美地应用于 SQL 数据库.他们只是模仿 RM 演示文稿,不知道他们对术语的 SQL 含义的使用是否会使事情变得荒谬或无效.

Because of the resemblance of SQL tables to relations, terms that involve relations get sloppily applied to tables. But although you can borrow terms and give them SQL meanings--value, table, FD (functional dependency), superkey, CK (candidate key), PK (primary key), FK (foreign key), join, and, predicate, NF (normal form), normalize, 1NF, etc--you can't just substitute those SQL meanings for those words in RM definitions, theorems or algorithms and get something sensible or true. Moreover SQL presentations of RM notions almost never actually tell you how to soundly apply RM notions to an SQL database. They just parrot RM presentations, oblivious to whether their use of SQL meanings for terms makes things nonsensical or invalid.

这篇关于建模和归一化时如何处理空值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆