单个固定表与多个列和灵活的抽象表 [英] single fixed table with multiple columns vs flexible abstract tables

查看:110
本文介绍了单个固定表与多个列和灵活的抽象表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道,如果你有一个网站有十多种不同类型的列表(商店,餐馆,俱乐部,酒店,事件),需要不同的字段,有一个好处是创建一个表,列定义如此

示例商店:

  shop_id |名称| X | Y |城市|区|区域|地铁|站|地址|电话|电子邮件|网站| open_hours 

或类似这样的更抽象的方法:

  object_id | name 
---------------
1 | Messy Joe的
2 | Bate's Motel

type_id | name
---------------
1 |酒店
2 |餐厅


object_id | type_id
---------------
1 | 2
2 | 1

field_id |名称| field_type
---------------
1 |地址| text
2 | open_hours | date
3 |专业| text

type_id | field_id
---------------
1 | 1
1 | 2
2 | 1
2 | 3

object_id | field_id | value
1 | 1 |第一街....
1 | 3 |英语菜

当然,如果价值是预先定义的,列表)



如果我采用抽象方法,它可以非常灵活,但查询会更复杂,有很多连接。
但是我不知道这是否影响性能,执行这些更复杂的查询。



我很想知道这两种方法的上下是什么。我可以想象自己,但我没有经验来证实这一点。

解决方案

某些问题需要澄清



先决条件解决


$ b


  1. 标签

    在需要精确度的专业中,重要的是我们使用精确的标签,以避免混淆,我们可以进行通信,而无需使用冗长的描述和限定符。



    您发布为FixedTables的是未归一化。公平的,它可能是第三正常形式的尝试,但实际上它是一个平面文件,非正规化(不是denormalised)。你发布为AbstractTables是,准确地说,实体属性值,这几乎是但不完全是第六普通形式,因此比3NF更标准化。假设它正确地完成了。




    • Unnormalised平面文件不是denormalised,它是充满了重复(没有做删除重复组和重复列或解决依赖关系)和Null,它是一个性能猪


    • 为了被Denormlaised,它必须首先被规范化,然后正规化退回一点一些好的原因因为它不是归一化的,它不能被去归一化,它只是未归一化。


    • 它不能被说成是 ,因为作为一个性能猪,它是性能的反对。嗯,他们需要一个理由缺乏正式的设计],和为性能是它。即使是最小的正式审查暴露了失实陈述(但很少有人可以提供,所以它仍然是隐藏的,直到他们得到一个局外人解决,你猜到,巨大的性能问题)。


    • 规范化结构的性能远优于非规范化结构。更规一化的结构(EAV / 6NF)比较不规则的结构(3NF / 5NF)表现更好。


    • 我同意OMG小马的主旨,但不同意他们的标签和定义


    • 除非你必须',我在说,如果有性能问题,我会说,正常化忠实,周期和 ,您尚未正确归一化



  2. Wiki

    正常表单和规范化是一个完整的笑话。具体来说,定义是不正确的;他们混淆了正常形式;他们无能为力的归一化过程;他们对那些早已被揭穿的荒谬或有疑问的NF给予同等的权重。结果是,维基增加了一个已经困惑和很少理解的主题。所以不要浪费你的时间。



    然而,为了进步,没有这个参考构成障碍,让我说这个。




    • 3NF的定义是稳定的,并且未发生变化。

    • NF在3NF和5NF之间的混淆。事实是,这是一个在过去15年中取得进展的领域;和许多组织,学术界以及供应商,他们的产品有限制,跳转创建一个新的正常形式来验证他们的产品。所有服务的商业利益和学术上不健全。 3NF在其原始的未篡改状态意图和保证某些属性。

    • 总和是,5NF是今天,3NF是15年前,你可以跳过商业扑克和十二个左右的特殊学术)NF之间,其中一些在Wiki中被识别,甚至是在混乱的条款。



  3. 由于您已经能够在您的帖子中了解和实施EAV,因此您不必理解以下内容。当然,一个真正的关系模型是先决条件,强键等。第五正常表单是,因为我们跳过第四:




    • 第三个正常表单

      • 在简单的定义中,每个表中的每个非键列都与主键有1 :: 1关系


      • 零数据复制

      • 没有更新异常(当您更新列时(如果正常化进行得很顺利,无法通过智能或经验独自完成,在某处,您不必更新位于其他位置的同一列;该列只存在于一个地方)。



    • 第六个正常表单当然是第五个正常表单,另外还有




      • 消除缺失的数据(列)。这是对空问题(也称为处理缺失值)的一个真正的解决方案,结果是一个没有空值的数据库。 (它可以在5NF,标准和Null替代品,但这不是最佳的。)如何解释和显示缺失值是另一个故事。



    • EAV vs第六正常表单

      我写的所有数据库,除了一个,都是纯5NF。我已经(管理,修复,增强)几个EAV数据库,我已经实现了一个真正的6NF数据库。 EAV是6NF的松散实现,通常由对标准化和NF不了解的人来完成,但是谁能看到EAV的价值,并且需要EAV的灵活性。你是一个完美的例子。区别是这样的:因为它是松散的,并且因为实现者没有引用(6NF)来忠实于它们,它们只实现他们需要的,并且它们都在代码中写;结果是一个不一致的模型。



      然而,纯6NF实现具有纯学术参考点,因此它通常更紧密,更一致。通常这显示在两个可见的元素:

      • 6NF有一个目录包含元数据,一切都定义在元数据,而不是代码。 EAV没有一个,一切都在代码中(实现者跟踪对象和属性)。显然,目录简化了添加列,导航,并允许形成实用程序。

      • 6NF当理解时,提供真空解决方案的空问题。 EAV实现者,因为他们缺少6NF上下文,处理代码中缺少的数据,不一致或更糟,允许数据库中的Null。 6NF实现者不允许Null,并且一致和优雅地处理丢失的数据,而不需要代码构造(对于Null处理;您仍然必须对丢失的数据进行编码)



        例如。对于具有目录的6NF数据库,我有一组proc,将生成执行所有SELECT所需的SQL,我为所有用户提供5NF中的Views,因此他们不需要知道或理解底层的6NF结构。他们被赶出目录。因此,改变是容易和自动的。 EAV类型由于缺少目录而手动执行。


现在,我们可以开始



讨论


当然,如果
值是预定义的(例如:
专业可以有自己的
列表)


当然可以。但不要太抽象。保持一致性,并以与其他列表相同的EAV(或6NF)方式实施此类列表。


方法它
可以非常灵活,但查询将
更复杂与许多连接
但是我不知道这是否影响
性能,执行这些'更多
复杂的查询。





  1. 关系数据库。问题不是数据库,问题是处理连接时,尤其是复合键,SQL是麻烦的。

  2. EAV和6NF数据库有更多的连接,就像行人,没有更多。

  3. 整个问题可以通过(a)使用6NF over EAV和(b)实现目录来消除,从中可以(c)生成所有的基本SQL。

  4. 这是一个常见的神话,加入会以某种方式有成本。完全假。连接在编译时实现,没有什么是实质的成本CPU周期。问题是要连接的表的大小,而不是这些表之间的连接的成本。连接两个表,每个数百万行,在一个正确的PK⇢FK关系,每个都有适当的索引(父[FK]侧唯一;唯一在孩子一侧)是即时的; ;其中Child索引不是唯一的,但至少前导列有效,它更慢;其中没有有用的指数,当然是非常缓慢。它与加入成本无关。在返回许多行的情况下,瓶颈将是网络和磁盘布局;



  • 因此,您可以随心所欲获得复杂


    我会有兴趣知道什么是
    这两种方法的上下方向。
    我可以想象为自己,但我
    没有确认
    的经验。





    1. 5NF(或3NF,对于那些没有进展)是最简单和最好的,在实施,易用性用户),维护。缺点是,每次添加列时,都必须更改数据库结构(表DDL)。这是很好的一些情况,但不是在大多数情况下,由于变化控制到位,相当繁重。第二,你必须改变现有的代码(代码处理新的列不计数,因为这是一个命令):其中实现良好的标准,这是最小化;


    2. EAV(这是您发布的)允许添加列而不进行DDL更改。这是人们选择它的唯一原因。 (代码处理新列不计算,因为这是一个命令)。如果实施得很好,它不会影响现有代码;如果没有,它会。但是你需要有EAV的开发人员。当EAV被执行得很糟糕时,它是可恶的,比5NF糟糕的糟糕,但没有比Unnormalised更糟糕,这是大多数数据库有(错误表示为Denormalised for performance)。当然,更重要的是(比在5NF / 3NF中)保持一个强的事务上下文,因为列更分散。同样,必须保留声明性引用完整性:我所看到的混乱在很大程度上是由于开发人员删除DRI,因为它变得太难以维护,结果是,你可以想象,一个数据的母亲堆与重复的3NF / 5NF行和列在所有地方。和不一致的空处理。


    3. 假设服务器已经为预期目的进行了合理配置,性能没有差别。 (好吧,有特定的优化,只有在6NF,这是不可能在其他NF,但我认为这是在这个线程的范围之外的可能)。再次,EAV做得不好可能会导致不必要的瓶颈,没有更多非正规化。


    4. 当然,如果你用EAV,我建议更多的正式;买满了;用6NF;实现目录;实用程序生成SQL;意见;处理缺失数据一致;完全消除Null。这降低了您对开发人员的质量的脆弱性;他们可以忘记EAV / 6NF秘密发行,使用视图,并专注于应用逻辑。


    长篇帖子。


    I was wondering if you have a website with a dozen different types of listings (Shops, Restaurants, Clubs, Hotels, Events) that require different fields, is there a benefit of creating a table with columns defined like so
    Example Shop:

    shop_id | name | X | Y | city | district | area | metro | station | address | phone | email | website | opening_hours
    

    Or a more abstract approach similar to this:

    object_id | name        
    ---------------
    1         | Messy Joe's  
    2         | Bate's Motel 
    
    type_id | name
    ---------------
    1       | hotel
    2       | restaurant
    
    
    object_id | type_id
    ---------------
    1         | 2
    2         | 1
    
    field_id | name           | field_type
    ---------------
    1        | address        | text
    2        | opening_hours  | date 
    3        | speciality     | text
    
    type_id | field_id
    ---------------
    1       | 1
    1       | 2
    2       | 1
    2       | 3
    
    object_id | field_id | value
    1         | 1        | 1st street....
    1         | 3        | English Cuisine
    

    Of course it can be more abstract if value's are predefined (Example: specialties could have their own list)

    If I take the abstract approach it can be very flexible, but queries will be more complex with a lot of joins. But I don't know if this affects the performance, executing these 'more complex' queries.

    I would be interested to know what are the up and downsides of both methods. I can just imagine for myself, but I don't have the experience to confirm this.

    解决方案

    Certain issues need to be clarified and resolved before we can enter into a reasonable discussion.

    Pre-requisite Resolution

    1. Labels
      In a profession that demands precision, it is important that we use precise labels, to avoid confusion, and so that we can communicate without having to use long-winded descriptions and qualifiers.
      .
      What you have posted as FixedTables, is Unnormalised. Fair enough, it may be an attempt at Third Normal form, but in fact it is a flat file, Unnormalised (not "denormalised). What you have posted as AbstractTables is, to be precise, Entity-Attribute-Value, which is almost, but not quite, Sixth Normal form, and is therefore more Normalised than 3NF. Assuming it is done correctly, of course.

      • The Unnormalised flat file is not "denormalised". It is chock full of duplication (nothing has been done to remove repeating groups and duplicate columns or to resolve dependencies) and Nulls, it is a performance hog in many ways, and prevents concurrency.

      • In order to be Denormlaised, it has to first be Normalised, and then the Normalisation backed off a little for some good reason. Since it is not Normalised in the first place, it cannot be Denormalised. It is simply Unnormalised.

      • It cannot be said to be denormalised "for performance", because being a performance hog, it is the very antithesis of performance. Well, they need a justification for the lack of formalised design], and "for performance" is it. Even the smallest formal scrutiny exposed the misrepresentation (but very few people can provide, so it remains hidden, until they get an outsider to address, you guessed it, the massive performance problem).

      • Normalised structures perform far better than Unnormalised structures. More normalised structures (EAV/6NF) perform better than less normalised structures (3NF/5NF).

      • I am agreeing with the thrust of OMG Ponies, but not their labels and definitions

      • rather than saying 'don't "denormalise" unless you have to', I am saying, 'Normalise faithfully, period' and 'if there is a performance problem, you have not Normalised correctly'.
        .
    2. Wiki
      The entries re Normal Forms and Normalisation are a complete joke. Specifically, the definitions are incorrect; they confuse the Normal Forms; they are clueless re the process of Normalisation; and they give equal weight to absurd or questionable NFs which have been debunked long ago. The result is, Wiki adds to an already confused and rarely understood subject. So don't waste your time.
      .
      However, in order to progress, without that reference posing a hindrance, let me say this.

      • The definition of 3NF is stable, and has not changed.
      • There is a lot of confusion of the NFs between 3NF and 5NF. The truth is that this is an area that progressed over the last 15 years; and many orgs, academics as well as vendors with their products with limitations, jumped to create a new "Normal Form" to validate their offerings. All serving commercial interests and academically unsound. 3NF in its original untampered state intended and guaranteed certain attributes.
      • The sum total is, 5NF is today, what 3NF was intended to be 15 years ago, and you can skip the commercial banter and the twelve or so "special" (commercial and pseudo-academic) NFs in-between, some of which are identified in Wiki, and even that in confusing terms.
        .
    3. Since you have been able to understand and implement the EAV in your post, you will have no problem understanding the following. Of course a true Relational Model is pre-requisite, strong keys, etc. Fifth Normal Form is, since we are skipping the Fourth:

      • Third Normal Form
        • which in simple definitive terms is, every non-key column in every table has a 1::1 relationship to the Primary Key of the table,
        • and to no other non-key columns
      • Zero data duplication (the result, if Normalisation is progressed diligently; not achieved by intelligence or experience alone, or by working toward it as a goal sans the formal process)
      • no Update Anomalies (when you update a column somewhere, you do not have to update the same column located somewhere else; the column exists in one and only one place).
        .
    4. Sixth Normal Form is of course Fifth Normal Form, plus:

      • Elimination of missing data (columns). This is the one true solution to the Null Problem (also called Handling Missing Values), and the result is a database without Nulls. (It can be done at 5NF with standards and Null substitutes but that is not optimal.) How you interpret and display the missing values is another story.
        .
    5. EAV vs Sixth Normal Form
      All the databases I have written, except one, are pure 5NF. I have worked with (administered, fixed up, enhanced) a couple of EAV databases, and I have implemented one true 6NF database. EAV is a loose implementation of 6NF, often done by people who do not have a good grasp on Normalisation and the NFs, but who can see the value in, and need the flexibility of, EAV. You are a perfect example. The difference is this: because it is loose, and because implementors do not have a reference (6NF) to be faithful to, they only implement what they need, and they write it all in code; that ends up being an inconsistent model.
      .
      Whereas, a pure 6NF implementation does have a pure academic reference point, and thus it is usually tighter, and consistent. Typically this shows up in two visible elements:
      • 6NF has a catalogue to contain metadata, and everything is defined in metadata, not code. EAV does not have one, everything is in code (implementers keep track of the objects and attributes). Obviously a catalogue eases the addition of columns, navigation, and allows utilities to be formed.
      • 6NF when understood, provides the true solution to The Null Problem. EAV implementers, since they are absent the 6NF context, handle missing data in code, inconsistently, or worse, allow Nulls in the database. 6NF implementers disallow Nulls, and handle missing Data consistently and elegantly, without requiring code constructs (for Null handling; you still have to code for missing data of course).
        .
        Eg. For 6NF databases with a catalogue, I have a set of procs that will [re]generate the SQL required to perform all SELECTs, and I provide Views in 5NF for all users, so they do not need to know or understand the underlying 6NF structure. They are driven off the catalogue. Thus changes are easy and automated. EAV types do that manually, due to the absence of the catalogue.

    Now, we can start the

    Discussion

    "Of course it can be more abstract if value's are predefined (Example: specialities could have their own list)"

    Sure. But do not get too "abstract". Maintain consistency and implement such lists in the same EAV (or 6NF) manner as other lists.

    "If I take the abstract approach it can be very flexible, but queries will be more complex with a lot of joins. But I don't know if this affects the performance, executing these 'more complex' queries."

    1. Joins are pedestrian in Relational databases. The problem is not the database, the problem is that SQL is cumbersome when handling joins, especially compound keys.
    2. EAV and 6NF databases have more Joins, which just as pedestrian, no more no less. If you have to code each SELECT manually, sure, the cumbersome gets really cumbersome.
    3. The entire problem can be eliminated by (a) going with 6NF over EAV and (b) implementing a catalogue, from which you can (c) generate all the basic SQL. Eliminates an entire class of errors as well.
    4. It is a common myth that Joins somehow have a cost. Totally false. The join is implemented at compile time, there is nothing of substance to 'cost' CPU cycles. The issue is the size of tables being joined, not the cost of the Join between those same tables. Joining two tables with millions of rows each, on a correct PK⇢FK relation, each of which have the appropriate indices (Unique on the parent[FK] side; Unique on the Child side) is instantaneous; ; where the Child index is not unique, but at least the leading column is valid, it is slower; where there is no useful index, of course that is very slow. None of it has to do with Join cost. Where many rows are returned, the bottleneck will be the network and the disk layout; not the join processing.
    5. Therefore you can get as "complex" as you like, there is no cost, SQL can handle it.

    I would be interested to know what are the up and downsides of both methods. I can just imagine for myself, but I don't have the experience to confirm this.

    1. 5NF (or 3NF for those who have not made the progression) is the easiest and best, in terms of implementation, ease of use (developers as well as users), maintenance. The drawback is, every time you add a column, you have to change the database structure (table DDL). That is fine is some cases, but not in most cases, due to change control in place, quite onerous. Second, you have to change existing code (code handling the new column does not count, because that is an imperative): where good standards are implemented, that is minimised; where they are absent, the scope is unpredictable.

    2. EAV (which is what you have posted), allows columns to be added without DDL changes. That is the single reason people choose it. (code handling the new column does not count, because that is an imperative). If implemented well, it will not affect existing code; if not, it will. But you need EAV-capable developers. When EAV is implemented badly, it is abominable, a worse mess than 5NF done badly, but not any worse than Unnormalised which is what most databases out there are (misrepresented as "Denormalised for performance"). of course, it is even more important (than in 5NF/3NF) to hold a strong Transaction context, because the columns are far more distributed. Likewise, it is essential to retain Declarative Referential Integrity: the messes I have seen were due in large part to the developers removing DRI because it became "too hard to maintain", the result was, as you can imagine, one mother of a data heap with duplicate 3NF/5NF rows and columns all over the place. And inconsistent Null handling.

    3. There is no difference in performance, assuming that the server has been reasonably configured for the intended purpose. (Ok, there are specific optimisations that are possible only in 6NF, which are not possible in other NFs, but I think that is outside the scope of this thread.) And again, EAV done badly can cause unnecessary bottlenecks, no more so than Unnormalised.

    4. Of course, if you go with EAV, I am recommending more formality; buy the full quid; go with 6NF; implement a catalogue; utilities to produce SQL; Views; handle Missing Data consistently; eliminate Nulls altogether. This reduces your vulnerability to the quality of your developers; they can forget about the EAV/6NF esoteric issuses, use Views, and concentrate on the app logic.

    Pardon the long post.

    这篇关于单个固定表与多个列和灵活的抽象表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

  • 查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆