Hadoop comparison to RDBMS


Problem description


I really do not understand the actual reason behind Hadoop scaling better than an RDBMS. Can anyone please explain at a granular level? Does this have something to do with the underlying data structures and algorithms?

Solution

An RDBMS has trouble handling huge data volumes in the range of terabytes and petabytes. Even with a Redundant Array of Independent/Inexpensive Disks (RAID) and data sharding, it does not scale well to huge volumes of data. You need very expensive hardware.

EDIT: To answer why an RDBMS cannot scale, have a look at the overheads of an RDBMS (http://cs-www.cs.yale.edu/homes/dna/papers/oltpperf-sigmod08.pdf):

Logging. Assembling log records and tracking down all changes in database structures slows performance. Logging may not be necessary if recoverability is not a requirement or if recoverability is provided through other means (e.g., other sites on the network).

Locking. Traditional two-phase locking poses a sizeable overhead since all accesses to database structures are governed by a separate entity, the Lock Manager.

Latching. In a multi-threaded database, many data structures have to be latched before they can be accessed. Removing this feature and going to a single-threaded approach has a noticeable performance impact.

Buffer management. A main memory database system does not need to access pages through a buffer pool, eliminating a level of indirection on every record access.
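To make the locking overhead a little more concrete, here is a minimal Python sketch of a toy lock manager. The `LockManager` class and the `transfer` function are hypothetical names invented for this illustration; they are not taken from the linked paper or from any real RDBMS. The sketch only shows that under two-phase locking every record access takes a trip through a separate, central entity before it touches the data.

```python
import threading
from collections import defaultdict


class LockManager:
    """Toy central lock manager (hypothetical, for illustration only)."""

    def __init__(self):
        self._guard = threading.Lock()             # latch protecting the lock table
        self._locks = defaultdict(threading.Lock)  # one lock per record key
        self.requests = 0                          # counts trips through the manager

    def acquire(self, key):
        with self._guard:
            self.requests += 1
            lock = self._locks[key]
        lock.acquire()

    def release(self, key):
        with self._guard:
            lock = self._locks[key]
        lock.release()


def transfer(db, lm, src, dst, amount):
    """Two-phase locking: take every lock first, do the work, then release."""
    keys = sorted({src, dst})       # fixed order avoids deadlock
    for key in keys:                # growing phase
        lm.acquire(key)
    try:
        db[src] -= amount
        db[dst] += amount
    finally:
        for key in reversed(keys):  # shrinking phase
            lm.release(key)


db = {"a": 100, "b": 0}
lm = LockManager()
for _ in range(10):
    transfer(db, lm, "a", "b", 5)
print(db, lm.requests)
```

Ten small updates cost twenty lock requests here, each of which also takes the internal latch; that bookkeeping is the kind of overhead the list above describes.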

How Hadoop handles this:

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment and can run on commodity hardware. It is useful for storing and retrieving huge volumes of data.

This scalability and efficiency are possible because of Hadoop's implementation of its storage mechanism (HDFS) and its processing jobs (MapReduce jobs running on YARN). Apart from scalability, Hadoop provides high availability of stored data.
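The storage side can be sketched in a few lines of Python. This is a simplified illustration, not HDFS code: the function names are hypothetical, the block size is tiny so the example stays readable (HDFS defaults to 128 MB blocks and a replication factor of 3), and the round-robin placement is an assumption standing in for the real rack-aware policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Round-robin placement (assumption; real HDFS uses a rack-aware policy)."""
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed):
    """Blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    ]


data = b"x" * 1000
blocks = split_into_blocks(data, block_size=256)
nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_replicas(len(blocks), nodes)

# Two of the five nodes fail, yet every block is still readable.
print(readable_blocks(placement, failed={"dn2", "dn3"}))
```

Adding capacity means adding nodes to the list, and losing a node does not lose data as long as another replica survives. That is the sense in which this design gives both scalability and availability on commodity hardware.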

Scalability, high availability, and flexible processing of huge volumes of data (structured, unstructured, and semi-structured) are key to the success of Hadoop.

Data is stored on thousands of nodes, and processing is done on the node where the data is stored (most of the time) through MapReduce jobs. Data locality on the processing front is one key area of Hadoop's success.
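As a rough illustration of the MapReduce model, the following Python sketch runs a word count over data that is already partitioned per node. It is a single-process simulation under stated assumptions: the node names and sample lines are made up, and the functions `map_phase`, `shuffle`, and `reduce_phase` are hypothetical helpers, not Hadoop APIs.

```python
from collections import defaultdict


def map_phase(lines):
    """Runs where the data lives: emit (word, 1) for every word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle(mapped_outputs):
    """Group the values from all mappers by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


# Each node only ever reads its own local lines (made-up sample data).
node_data = {
    "node1": ["hadoop scales out", "rdbms scales up"],
    "node2": ["hadoop stores data on many nodes"],
}

mapped = [list(map_phase(lines)) for lines in node_data.values()]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Only the small (word, count) pairs cross node boundaries in the shuffle step; the bulk input never moves, which is the point of data locality.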

This has been achieved with the NameNode, DataNodes, and the ResourceManager.
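How these pieces cooperate can be sketched as a locality-aware task assignment. This is a simplified sketch, not the actual YARN scheduler: `block_locations` stands in for the block metadata a NameNode keeps, `free_slots` stands in for the capacity a ResourceManager tracks, and `assign_tasks` is a hypothetical function written for this example.

```python
def assign_tasks(block_locations, free_slots):
    """Prefer a node that already holds the block; otherwise use any free node."""
    slots = dict(free_slots)  # copy so the caller's view is not mutated
    assignment = {}
    for block, holders in block_locations.items():
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            remote = [node for node, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free slots in the cluster")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


# Which DataNodes hold a replica of each block (NameNode-style metadata).
block_locations = {
    "blk_1": ["dn1", "dn2"],
    "blk_2": ["dn1", "dn3"],
    "blk_3": ["dn1"],
}
# How many tasks each node can still accept (ResourceManager-style view).
free_slots = {"dn1": 1, "dn2": 1, "dn3": 1, "dn4": 1}

assignment = assign_tasks(block_locations, free_slots)
for block, (node, is_local) in assignment.items():
    print(block, node, "local" if is_local else "remote")
```

Two of the three tasks run on a node that already stores their block. The third falls back to a remote node because its only holder is busy, which matches the "most of the time" qualifier above.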

To understand how Hadoop achieves this, you should visit these links: HDFS Architecture, YARN Architecture, and HDFS Federation.

Still, an RDBMS is good for multiple writes/reads/updates and consistent ACID transactions on gigabytes of data, but it is not good for processing terabytes and petabytes of data. NoSQL, which offers two of the three CAP theorem attributes (consistency, availability, partition tolerance), is good in some use cases.

But Hadoop is not meant for real-time transaction support with ACID properties. It is good for business intelligence reporting with batch processing, following the "write once, read many times" paradigm.


Have a look at one more related SE question:

NoSql vs Relational database
