具有多列的单个固定表与灵活的抽象表 [英] single fixed table with multiple columns vs flexible abstract tables

查看:25
本文介绍了具有多列的单个固定表与灵活的抽象表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道您是否有一个网站,其中包含需要不同字段的十几种不同类型的列表(商店、餐厅、俱乐部、酒店、活动),创建一个具有这样定义的列的表格是否有好处
示例商店:

shop_id |姓名 |X |是 |城市 |区 |地区|地铁|站 |地址 |电话 |电子邮件 |网站 |营业时间

或者类似的更抽象的方法:

object_id |姓名---------------1 |凌乱的乔2 |贝茨汽车旅馆type_id |姓名---------------1 |酒店2 |餐厅object_id |type_id---------------1 |22 |1field_id |姓名 |字段类型---------------1 |地址 |文本2 |营业时间 |日期3 |专业|文本type_id |字段标识---------------1 |11 |22 |12 |3object_id |field_id |价值1 |1 |第一街....1 |3 |英式料理

当然,如果值是预定义的,它会更抽象(例如:专业可以有自己的列表)

如果我采用抽象方法,它会非常灵活,但是如果有很多连接,查询会更加复杂.但我不知道这是否会影响性能,执行这些更复杂"的查询.

我很想知道这两种方法的优缺点.我可以自己想象,但我没有经验来证实这一点.

解决方案

某些问题需要澄清和解决,然后我们才能进行合理的讨论.

先决条件

  1. 标签
    在一个要求精确的行业中,重要的是我们使用精确的标签,以避免混淆,这样我们就可以在无需使用冗长的描述和限定词的情况下进行交流.

    您作为固定表发布的内容是非规范化.公平地说,它可能是对第三范式的尝试,但实际上它是一个平面文件,未规范化(不是非规范化").您作为 AbstractTables 发布的内容准确地说是 Entity-Attribute-Value,它几乎(但不完全)是第六范式,因此比 3NF 更规范化.当然,前提是操作正确.

    • 未规范化的平面文件不是非规范化的".它充满了重复(没有做任何事情来删除重复的组和重复的列或解决依赖关系)和 Null,它在很多方面都是性能猪,并阻止了并发.

    • 为了被非规范化,它必须首先被规范化,然后出于某种充分的理由使规范化退避一点.由于它首先没有被规范化,因此它不能被非规范化.它只是非规范化.

    • 不能说它是非规范化的为了性能",因为作为一个性能猪,它是性能的对立面.好吧,他们需要为缺乏正式设计的理由],以及为了性能".是吗.即使是最细微的正式审查也会暴露出虚假陈述(但很少有人能够提供,因此它一直处于隐藏状态,直到他们让局外人解决,你猜对了,巨大的性能问题).

    • 规范化结构的性能远优于非规范化结构.归一化程度高的结构 (EAV/6NF) 比归一化程度低的结构 (3NF/5NF) 表现更好.

    • 我同意 OMG Ponies 的主旨,但不同意他们的标签和定义

    • 而不是说不要非规范化";除非你必须',我是说,'忠实地标准化,时期''如果有性能问题,你没有正确地标准化'.

  2. 维基百科
    Normal Forms 和 Normalization 的条目提供了不正确的定义;他们混淆了范式;他们缺乏规范化的过程;他们对早已被揭穿的荒谬或有问题的 NF 给予同等重视.结果是,维基百科增加了一个已经混淆且很少被理解的主题.所以不要浪费你的时间.

    但是,为了进步,没有那个参考构成障碍,让我这样说.

    • 3NF 的定义是稳定的,没有改变.
    • 3NF 和 5NF 之间的 NF 存在很多混淆.事实上,这是一个在过去 15 年里取得进展的领域;许多组织、学者和供应商的产品有局限性,纷纷创建了一种新的规范形式".验证他们的产品.所有这些都服务于商业利益并且在学术上不健全.3NF 在其原始未篡改状态下旨在并保证某些属性.
    • 总而言之,今天的 5NF 是 15 年前的 3NF,您可以跳过商业玩笑和十二个左右的特殊".(商业和伪学术)介于两者之间的 NF,其中一些在 Wikipedia 中有所标识,甚至以令人困惑的术语表示.
  3. 第五范式
    由于您已经能够在您的帖子中理解和实施 EAV,因此您理解以下内容不会有任何问题.当然,真正的关系模型是先决条件、强键等.第五范式是,因为我们跳过了第四范式:

    • 第三范式
      • 简单来说,每个表中的每个非键列都与表的主键有 1::1 关系,
      • 并且没有其他非关键列
    • 零数据重复(结果,如果规范化是勤奋推进的;不是仅靠智力或经验,或通过在没有正式流程的情况下将其作为目标而努力实现的)
    • 无更新异常(当您在某处更新一列时,您不必更新位于其他地方的同一列;该列仅存在于一处).
    • 如果您理解上述内容,那么 4NF、BCNF 和所有愚蠢的NF"可以不考虑,它们是学术界提倡的物理化记录归档系统所必需的,与关系模型 (Codd) 非常陌生.
  4. 第六范式

    • 目的是消除缺失数据(属性列),也就是消除空值
    • 这是 Null 问题(也称为处理缺失值)的一个真正解决方案,结果是一个没有 Null 的数据库.(可以使用标准和 Null 替代在 5NF 下完成,但这不是最佳选择.)如何解释和显示缺失值是另一回事.
    • 从技术上讲,它不是真正的范式,因为它没有 5NF 作为先决条件,但它具有价值
  5. EAV 与第六范式
    我写的所有数据库,除了一个,都是纯 5NF.我曾使用(管理、修复、增强)几个 EAV 数据库,并且我已经实现了许多真正的 6NF 数据库.EAV 是 6NF 的松散实现,通常由对规范化和 NF 不太了解但可以看到 EAV 的价值并需要其灵活性的人完成.你是一个完美的例子.

    区别在于:因为它是松散的,并且因为实现者没有一个可以忠实于的引用(6NF),所以他们只实现了他们需要的东西,并且全部用代码编写;最终成为一个不一致的模型.

    然而,纯 6NF 实现确实有一个纯学术参考点,因此它通常更紧密、一致.通常,这显示在两个可见元素中:

    • 6NF 有一个包含元数据的目录,一切都在元数据中定义,而不是代码.EAV 没有,一切都在代码中(实现者跟踪对象和属性).显然,目录简化了列的添加和导航,并允许形成实用程序.
    • 理解 6NF 后,它为空问题提供了真正的解决方案.EAV 实现者,因为他们没有 6NF 上下文,处理代码中丢失的数据,不一致,或者更糟,允许数据库中的空值.6NF 实现者禁止 Null,并一致且优雅地处理丢失的数据,无需代码构造(对于 Null 处理;当然,您仍然必须为丢失的数据编写代码).

例如.对于带有目录的 6NF 数据库,我有一组 procs 将 [重新] 生成执行所有 SELECT 所需的 SQL,并且我为所有用户提供 5NF 中的视图,因此他们不需要知道或了解底层的 6NF 结构.他们被赶出目录.因此,更改很容易且自动化.由于没有目录,EAV 类型会手动执行此操作.

讨论

现在,我们可以开始讨论了.

<块引用>

"当然可以更抽象,如果值是预定义的(例如:专业可以有自己的列表)"

当然.但不要太抽象".保持一致性并以与其他列表相同的 EAV(或 6NF)方式实现此类列表.

<块引用>

"如果我采取抽象的方法可以非常灵活,但查询会有很多连接会更复杂.但是不知道会不会影响性能,执行这些更多复杂的'查询."

  1. 联接在关系数据库中是行人.不是数据库的问题,问题是SQL在处理join时很麻烦,尤其是复合键.

  2. EAV 和 6NF 数据库有更多的 Join,就像行人一样,不多也不少.如果您必须手动对每个 SELECT 进行编码,当然,麻烦会变得非常麻烦.

  3. 整个问题可以通过 (a) 在 EAV 上使用 6NF 和 (b) 实现目录来消除,您可以 (c) 从中生成所有基本 SQL.还消除了一整类错误.

  4. Join 以某种方式有成本是一个普遍的神话.完全错误.

    • 连接是在编译时实现的,没有任何实质内容可以消耗"CPU 周期.
    • 问题在于要连接的表的大小,而不是这些相同表之间的连接成本.
    • 在正确的 PK⇢FK 关系上连接两个各有数百万行的表,每个表都有适当的索引
      (在父[PK]端唯一;在子端唯一[PK=父FK+某物]
      是瞬时的
    • 如果子索引不唯一,但至少前导列有效,则速度较慢;没有有用的索引,当然很慢.
    • 这与加入成本无关.
    • 返回行数多的地方,瓶颈是网络和磁盘布局;不是连接处理.
  5. 因此你可以得到复杂的"随心所欲,没有成本,SQL 可以搞定.

<块引用>

我很想知道是什么两种方法的优缺点.我只能自己想象,但我没有经验证实这个.

  1. 5NF(对于那些没有取得进展的人来说是3NF)在实施方面是最简单和最好的;易用性(开发人员和用户);和维护.

    • 缺点是,每次添加一列,都必须改变数据库结构(表DDL).在某些情况下这很好,但在大多数情况下并非如此,因为变更控制到位,相当繁重.
    • 其次,您必须更改现有代码(处理新列的代码不算数,因为这是必要的):在实施良好标准的情况下,将其最小化;如果它们不存在,则范围是不可预测的.
  2. EAV(这是您发布的内容),允许在不更改 DDL 的情况下添加列.这是人们选择它的唯一原因.(处理新列的代码不算数,因为这是必要的).如果实现得好,不会影响现有代码;如果没有,它会.

  3. 但您需要具备 EAV 能力的开发人员.

    • 当 EAV 实施不当时,它是可恶的,比 5NF 做得不好更糟,但不比大多数数据库所使用的非规范化更糟糕(被误解为性能非规范化").
    • 当然,拥有强大的事务上下文更重要(比在 5NF/3NF 中),因为列的分布要远得多.
    • 同样,保留声明式引用完整性至关重要:我所看到的混乱很大程度上是由于开发人员删除了 DRI,因为它变得太难维护",结果是,正如您想象的那样,数据堆的母体,到处都是重复的 3NF/5NF 行和列.以及不一致的 Null 处理.
  4. 假设服务器已针对预期目的进行了合理配置,则性能没有差异.(好吧,有一些特定的优化只能在 6NF 中进行,而在其他 NF 中是不可能的,但我认为这超出了本线程的范围.)而且,EAV 做得不好会导致不必要的瓶颈,仅此而已非规范化.

  5. 当然,如果你用EAV,我建议更正式;购买全价;与 6NF 一起使用;实施目录;用于生成 SQL 的实用程序;意见;始终如一地处理缺失数据;完全消除空值.这减少了您对开发人员质量的脆弱性;他们可以忘记 EAV/6NF 深奥的问题,使用视图,专注于应用逻辑.

I was wondering if you have a website with a dozen different types of listings (Shops, Restaurants, Clubs, Hotels, Events) that require different fields, is there a benefit of creating a table with columns defined like so
Example Shop:

shop_id | name | X | Y | city | district | area | metro | station | address | phone | email | website | opening_hours

Or a more abstract approach similar to this:

object_id | name        
---------------
1         | Messy Joe's  
2         | Bate's Motel 

type_id | name
---------------
1       | hotel
2       | restaurant


object_id | type_id
---------------
1         | 2
2         | 1

field_id | name           | field_type
---------------
1        | address        | text
2        | opening_hours  | date 
3        | speciality     | text

type_id | field_id
---------------
1       | 1
1       | 2
2       | 1
2       | 3

object_id | field_id | value
1         | 1        | 1st street....
1         | 3        | English Cuisine

Of course it can be more abstract if value's are predefined (Example: specialties could have their own list)

If I take the abstract approach it can be very flexible, but queries will be more complex with a lot of joins. But I don't know if this affects the performance, executing these 'more complex' queries.

I would be interested to know what are the up and downsides of both methods. I can just imagine for myself, but I don't have the experience to confirm this.

解决方案

Certain issues need to be clarified and resolved before we can enter into a reasonable discussion.

Pre-requisite Resolution

  1. Labels
    In a profession that demands precision, it is important that we use precise labels, to avoid confusion, and so that we can communicate without having to use long-winded descriptions and qualifiers.

    What you have posted as FixedTables, is Unnormalised. Fair enough, it may be an attempt at Third Normal form, but in fact it is a flat file, Unnormalised (not "denormalised). What you have posted as AbstractTables is, to be precise, Entity-Attribute-Value, which is almost, but not quite, Sixth Normal form, and is therefore more Normalised than 3NF. Assuming it is done correctly, of course.

    • The Unnormalised flat file is not "denormalised". It is chock full of duplication (nothing has been done to remove repeating groups and duplicate columns or to resolve dependencies) and Nulls, it is a performance hog in many ways, and prevents concurrency.

    • In order to be Denormalised, it has to first be Normalised, and then the Normalisation backed off a little for some good reason. Since it is not Normalised in the first place, it cannot be Denormalised. It is simply Unnormalised.

    • It cannot be said to be denormalised "for performance", because being a performance hog, it is the very antithesis of performance. Well, they need a justification for the lack of formalised design], and "for performance" is it. Even the smallest formal scrutiny exposed the misrepresentation (but very few people can provide, so it remains hidden, until they get an outsider to address, you guessed it, the massive performance problem).

    • Normalised structures perform far better than Unnormalised structures. More normalised structures (EAV/6NF) perform better than less normalised structures (3NF/5NF).

    • I am agreeing with the thrust of OMG Ponies, but not their labels and definitions

    • rather than saying 'don't "denormalise" unless you have to', I am saying, 'Normalise faithfully, period' and 'if there is a performance problem, you have not Normalised correctly'.

  2. Wikipedia
    The entries for Normal Forms and Normalisation offer definitions that are incorrect; they confuse the Normal Forms; they are lacking regarding the process of Normalisation; and they give equal weight to absurd or questionable NFs which have been debunked long ago. The result is, Wikipedia adds to an already confused and rarely understood subject. So don't waste your time.

    However, in order to progress, without that reference posing a hindrance, let me say this.

    • The definition of 3NF is stable, and has not changed.
    • There is a lot of confusion of the NFs between 3NF and 5NF. The truth is that this is an area that progressed over the last 15 years; and many orgs, academics as well as vendors with their products with limitations, jumped to create a new "Normal Form" to validate their offerings. All serving commercial interests and academically unsound. 3NF in its original untampered state intended and guaranteed certain attributes.
    • The sum total is, 5NF is today, what 3NF was intended to be 15 years ago, and you can skip the commercial banter and the twelve or so "special" (commercial and pseudo-academic) NFs in-between, some of which are identified in Wikipedia, and even that in confusing terms.
  3. Fifth Normal Form
    Since you have been able to understand and implement the EAV in your post, you will have no problem understanding the following. Of course a true Relational Model is pre-requisite, strong keys, etc. Fifth Normal Form is, since we are skipping the Fourth:

    • Third Normal Form
      • which in simple definitive terms is, every non-key column in every table has a 1::1 relationship to the Primary Key of the table,
      • and to no other non-key columns
    • Zero data duplication (the result, if Normalisation is progressed diligently; not achieved by intelligence or experience alone, or by working toward it as a goal without the formal process)
    • no Update Anomalies (when you update a column somewhere, you do not have to update the same column located somewhere else; the column exists in one and only one place).
    • If you understand the above, 4NF, BCNF, and all the silly "NFs" can be dismissed, they are required for physicalised Record Filing Systems, as promoted by academics, quite foreign to the Relational Model (Codd).
  4. Sixth Normal Form

    • The purpose is elimination of missing data (attribute columns), aka elimination of Nulls
    • This is the one true solution to the Null Problem (also called Handling Missing Values), and the result is a database without Nulls. (It can be done at 5NF with standards and Null substitutes but that is not optimal.) How you interpret and display the missing values is another story.
    • Technically, is not a true Normal Form, because it does not have 5NF as a pre-requisite, but it has a value
  5. EAV vs Sixth Normal Form
    All the databases I have written, except one, are pure 5NF. I have worked with (administered, fixed up, enhanced) a couple of EAV databases, and I have implemented many true 6NF databases. EAV is a loose implementation of 6NF, often done by people who do not have a good grasp on Normalisation and the NFs, but who can see the value in, and need the flexibility of, EAV. You are a perfect example.

    The difference is this: because it is loose, and because implementers do not have a reference (6NF) to be faithful to, they only implement what they need, and they write it all in code; that ends up being an inconsistent model.

    Whereas, a pure 6NF implementation does have a pure academic reference point, and thus it is usually tighter, and consistent. Typically this shows up in two visible elements:

    • 6NF has a catalogue to contain metadata, and everything is defined in metadata, not code. EAV does not have one, everything is in code (implementers keep track of the objects and attributes). Obviously a catalogue eases the addition of columns, navigation, and allows utilities to be formed.
    • 6NF when understood, provides the true solution to The Null Problem. EAV implementers, since they are absent the 6NF context, handle missing data in code, inconsistently, or worse, allow Nulls in the database. 6NF implementers disallow Nulls, and handle missing Data consistently and elegantly, without requiring code constructs (for Null handling; you still have to code for missing data of course).

Eg. For 6NF databases with a catalogue, I have a set of procs that will [re]generate the SQL required to perform all SELECTs, and I provide Views in 5NF for all users, so they do not need to know or understand the underlying 6NF structure. They are driven off the catalogue. Thus changes are easy and automated. EAV types do that manually, due to the absence of the catalogue.

Discussion

Now, we can start the discussion.

"Of course it can be more abstract if value's are predefined (Example: specialities could have their own list)"

Sure. But do not get too "abstract". Maintain consistency and implement such lists in the same EAV (or 6NF) manner as other lists.

"If I take the abstract approach it can be very flexible, but queries will be more complex with a lot of joins. But I don't know if this affects the performance, executing these 'more complex' queries."

  1. Joins are pedestrian in relational databases. The problem is not the database, the problem is that SQL is cumbersome when handling joins, especially compound keys.

  2. EAV and 6NF databases have more Joins, which just as pedestrian, no more, no less. If you have to code each SELECT manually, sure, the cumbersome gets really cumbersome.

  3. The entire problem can be eliminated by (a) going with 6NF over EAV and (b) implementing a catalogue, from which you can (c) generate all the basic SQL. Eliminates an entire class of errors as well.

  4. It is a common myth that Joins somehow have a cost. Totally false.

    • The join is implemented at compile time, there is nothing of substance to 'cost' CPU cycles.
    • The issue is the size of tables being joined, not the cost of the Join between those same tables.
    • Joining two tables with millions of rows each, on a correct PK⇢FK relation, each of which have the appropriate indices
      (Unique on the parent [PK] side; Unique on the Child side [PK=parent FK + something]
      is instantaneous
    • Where the Child index is not unique, but at least the leading columns are valid, it is slower; where there is no useful index, of course it is very slow.
    • None of it has to do with Join cost.
    • Where many rows are returned, the bottleneck will be the network and the disk layout; not the join processing.
  5. Therefore you can get as "complex" as you like, there is no cost, SQL can handle it.

I would be interested to know what are the up and downsides of both methods. I can just imagine for myself, but I don't have the experience to confirm this.

  1. 5NF (or 3NF for those who have not made the progression) is the easiest and best, in terms of implementation; ease of use (developers as well as users); and maintenance.

    • The drawback is, every time you add a column, you have to change the database structure (table DDL). That is fine is some cases, but not in most cases, due to change control in place, quite onerous.
    • Second, you have to change existing code (code handling the new column does not count, because that is an imperative): where good standards are implemented, that is minimised; where they are absent, the scope is unpredictable.
  2. EAV (which is what you have posted), allows columns to be added without DDL changes. That is the single reason people choose it. (code handling the new column does not count, because that is an imperative). If implemented well, it will not affect existing code; if not, it will.

  3. But you need EAV-capable developers.

    • When EAV is implemented badly, it is abominable, a worse mess than 5NF done badly, but not any worse than Unnormalised which is what most databases out there are (misrepresented as "denormalised for performance").
    • Of course, it is even more important (than in 5NF/3NF) to hold a strong Transaction context, because the columns are far more distributed.
    • Likewise, it is essential to retain Declarative Referential Integrity: the messes I have seen were due in large part to the developers removing DRI because it became "too hard to maintain", the result was, as you can imagine, one mother of a data heap with duplicate 3NF/5NF rows and columns all over the place. And inconsistent Null handling.
  4. There is no difference in performance, assuming that the server has been reasonably configured for the intended purpose. (Ok, there are specific optimisations that are possible only in 6NF, which are not possible in other NFs, but I think that is outside the scope of this thread.) And again, EAV done badly can cause unnecessary bottlenecks, no more so than Unnormalised.

  5. Of course, if you go with EAV, I am recommending more formality; buy the full quid; go with 6NF; implement a catalogue; utilities to produce SQL; Views; handle Missing Data consistently; eliminate Nulls altogether. This reduces your vulnerability to the quality of your developers; they can forget about the EAV/6NF esoteric issues, use Views, and concentrate on the app logic.

这篇关于具有多列的单个固定表与灵活的抽象表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆