Relational v Hierarchical data models

Question

When the relational model was put forward by E.F. Codd, the established databases of the time used the hierarchical model. My understanding is that the relational model was felt to be a significant improvement on the hierarchical approach.

My intuition is that this "makes sense" for a few reasons.

  • The relational model seems to be "query agnostic" in that instead of the data's shape reflecting the way that you're likely to query it, it is instead structured so that any question can be asked relatively easily.
  • The relational model also makes mutability simple. You assert or retract facts by adding rows to a table (adding tuples to a set), or removing them. In contrast, in a hierarchical setting you need to add or remove from some other object, which introduces secondary questions such as does the parent object need to be created if it doesn't exist, or removed if it is empty.
  • The relational model readily models relationships that don't easily fit into a parent-child approach, such as a relationship among three entities.
  • The relational model seems like it is better suited to schema growth as new kinds of fact can be added using new tables. Doing so with some care need not disrupt existing tables (and facts), or services that depend on them.
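
A minimal sketch of the first three points, treating a relation as a plain Python set of tuples. The `supplies` relation, the helper names, and the sample identifiers are all hypothetical and are only here to make the idea concrete:

```python
# A relation is modelled as a set of tuples. All names and values here are
# illustrative assumptions, not taken from any real system.
supplies = set()  # (supplier, part, project): a relationship among three entities


def assert_fact(relation, fact):
    """Assert a fact by adding a tuple to the set."""
    relation.add(fact)


def retract_fact(relation, fact):
    """Retract a fact by removing its tuple; no parent object to clean up."""
    relation.discard(fact)


assert_fact(supplies, ("S1", "P1", "J7"))
assert_fact(supplies, ("S2", "P1", "J9"))
assert_fact(supplies, ("S1", "P2", "J7"))

# The same relation answers questions asked from any direction.
suppliers_of_p1 = {s for (s, p, j) in supplies if p == "P1"}
parts_on_j7 = {p for (s, p, j) in supplies if j == "J7"}

retract_fact(supplies, ("S2", "P1", "J9"))
suppliers_of_p1_after = {s for (s, p, j) in supplies if p == "P1"}
```

Nothing in the shape of `supplies` privileges supplier, part, or project as the "parent", which is the sense in which the structure is query agnostic.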

However, while it feels like the relational data model has advantages, I'd like to have some insight on why it was definitely believed to be significantly superior at the time, and presumably, still is.

I'd really appreciate the arguments in some kind of distilled form, or ideally, one or more papers or other documents, or a canonical reference that go through the reasoning behind this.

For clarity, I'm not asking about actual implementations of either approach, or their relative resource use in terms of storage or compute, unless this is of great salience to the answer.

Thanks.

Answer

It's not quite right to say "established databases of the time used the hierarchical model".

Firstly (to nit-pick), it's database management systems that do or don't use some physical structure. "Databases" -- that is, database designs -- might use all sorts of abstractions. Entity-Relationship modelling was and still is popular as a design tool, irrespective of the eventual physical platform.

Secondly, at the time, whilst hierarchical models were usual for 'big iron' large databases, indexed-sequential was far more common on what used to be called 'mini-computers' (like DEC PDP-8/-11; IBM System/34,/36; ICL 1900/ME29; Honeywell DPS4/DPS7).

We might say indexed-sequential organisation on disk grew out of punched-card or magtape systems using batch update-by-copy. That's where the "sequential" comes from.

You say you don't want to ask about the actual implementation; but the answer is all about actual implementation. Reading disk sequentially is more efficient than random access (which needs the read-head to jump about). That's why, in contrast to disk, memory is called "Random Access Memory". (This was long before RAM became so cheap we could keep a whole database in memory.)

Similarly the hierarchical model was organised to provide rapid access for commonly-needed query paths. The hierarchy put closely-linked nodes on the same patch of physical disk. So it was easy to navigate from Customer to that Customer's Orders to that Order's Item Lines.

The downside was it was difficult to navigate 'across' the hierarchy -- for example to find all the Order Lines for Item P5432, irrespective of which Customer/Order. (Furthermore if you then want to retrieve the Customers who are ordering P5432, you need to work 'backwards' up the hierarchy. If it's all on the same patch of disk, hopefully you don't need to look too far/perhaps it's in the same disk bucket loaded to RAM.)
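
A rough sketch of that contrast, using nested Python dicts to stand in for a hierarchical store. The customer, order and quantity values are made up for illustration, and nothing here models the physical disk placement that made the downward path fast in practice:

```python
# Hypothetical hierarchical store: Customer -> Orders -> Item Lines.
hierarchy = {
    "C1": {"O1": [("P5432", 2), ("P1000", 1)], "O2": [("P2000", 5)]},
    "C2": {"O3": [("P5432", 7)]},
}

# Navigating down the hierarchy simply follows the stored path.
lines_for_order = hierarchy["C1"]["O1"]


def customers_ordering(item):
    """Going 'across' the hierarchy means visiting every customer and order."""
    found = set()
    for customer, orders in hierarchy.items():
        for _order, lines in orders.items():
            if any(line_item == item for line_item, _qty in lines):
                found.add(customer)
    return found


# The same facts as a flat relation: one tuple per order line.
order_lines = {
    (customer, order, item, qty)
    for customer, orders in hierarchy.items()
    for order, lines in orders.items()
    for item, qty in lines
}

# With flat tuples, no column is privileged as the way in.
customers_flat = {c for (c, o, i, q) in order_lines if i == "P5432"}
```

Both queries give the same answer; the difference is that the hierarchical version has the access path written into the program, whereas the flat version leaves the choice of path to whatever evaluates the query.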

Similarly the indexed-sequential organisation favoured one particular index -- the primary key. If you wanted to search by Customer name rather than number, that required a 'secondary index' with all sorts of ugly organisation to keep the index buckets somewhere near the data. And there was the notorious 'bucket overflow', which could stop a machine dead as you amended one teensy-weensy spelling mistake in a name, so shifting it to an entirely different alphabetic position.

(By the way, NoSQL databases, being key-value stores with only one key, seem destined to fall into all those traps to do with secondary indexing. They need a second key-value store to provide an alternative index, with all sorts of fun keeping them in sync. Back to the future!)
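
To illustrate the bookkeeping involved, here is a small sketch with two Python dicts standing in for a primary key-value store and a hand-maintained secondary index. The names `customers`, `by_name` and `put_customer` are hypothetical, and a real store would also have to cope with failures between the two writes:

```python
customers = {}  # primary store: customer number -> record
by_name = {}    # secondary index: name -> set of customer numbers


def put_customer(number, name):
    """Write a record and keep the secondary index in step by hand."""
    old = customers.get(number)
    if old is not None:
        # Remove the stale index entry before adding the new one.
        stale = by_name[old["name"]]
        stale.discard(number)
        if not stale:
            del by_name[old["name"]]
    customers[number] = {"name": name}
    by_name.setdefault(name, set()).add(number)


put_customer(101, "Smyth")
put_customer(101, "Smith")  # correcting one spelling mistake moves the index entry
```

Every writer has to remember to go through `put_customer`; forget once and the index quietly disagrees with the data.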

The biggest problem Codd had in implementing the Relational Model was to persuade IBM top brass that the model could efficiently support querying through multiple 'access paths'. You'll see a lot of his early papers talking about abstracting the 'navigation' away from the query-writer/programmer. In fact the original System R design had lots of compromises because

a) the IBM Engineers just didn't understand the mathematical abstractions Codd was talking about;

b) they were scared shitless it would perform like a dog and they'd lose their jobs.

[ahem! personal opinion: but the group got together long after, and there are some reminiscences somewhere around on the web.] Those compromises have persisted 'til today in SQL, which is frankly a pile of crud and should have been killed off as merely an interesting proof of concept.

How did Codd's model succeed (or rather the SQL model, not Codd's)?

  • Disk technology improved -- particularly seek times

  • Somebody figured out hash-indexing and B-trees, and keeping all the indexes for a table in separate memory from the actual data, rather than trying to hold it like a magtape serial store.

  • Larry Ellison sniffed out that something was afoot, and stole members of the IBM engineering team to build the same thing at Oracle. Also Michael Stonebraker formed Ingres.

The race was on! There was no time to stop and get everything right. Implement what you've got (i.e. the SQL proof of concept) and rush it to market, ready or not. (Sound like a familiar story?)

Your points about the superiority of the Relational Model are all well-made. They essentially follow from normalisation techniques. I would say, though, that they weren't well understood in the late '70s/'80s. Schema designs looked a lot like the hierarchical or indexed-sequential data models, just transposed to 'flat' tables. In particular, there was a tendency to design 'wide' tables to bring together on one patch of disk everything we know about some Customer, rather than vertically partitioning. (For fear of performance hits from joining together the partitions.) That meant a lot of not-applicable or 'not known' fields -- which is the abomination of SQL's null.
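
As a hedged illustration of the 'wide' versus vertically partitioned designs, here is a sketch using Python's standard `sqlite3` module with an in-memory database. The table and column names (`customer_wide`, `customer_vat`, `fax`, `vat_no`) and the sample rows are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'Wide' design: every attribute in one row, NULL where not applicable.
    CREATE TABLE customer_wide (
        cust_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        fax     TEXT,
        vat_no  TEXT
    );
    INSERT INTO customer_wide VALUES (1, 'Acme', NULL, 'GB123');
    INSERT INTO customer_wide VALUES (2, 'Jones', NULL, NULL);

    -- Vertically partitioned design: a row exists only when the fact is known.
    CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer_vat (
        cust_no INTEGER PRIMARY KEY REFERENCES customer(cust_no),
        vat_no  TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO customer VALUES (2, 'Jones');
    INSERT INTO customer_vat VALUES (1, 'GB123');
""")

# The wide table has to carry nulls for every fact that is absent.
rows_with_nulls = conn.execute(
    "SELECT COUNT(*) FROM customer_wide WHERE fax IS NULL OR vat_no IS NULL"
).fetchone()[0]

# The partitioned tables need a join, but store no nulls at all.
vat_customers = conn.execute(
    "SELECT c.name, v.vat_no FROM customer c JOIN customer_vat v USING (cust_no)"
).fetchall()
```

The join in the second query is exactly the cost that designers of the time were afraid of, which is why the wide form was so tempting.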

So your "improvements" are as yet only partially attained. One day perhaps we'll see a DBMS engineered to the Relational Model. For now we'll have to put up with SQL.
