Difference between HBase and Hadoop/HDFS


Question

This may be a naive question, but I am new to the NoSQL paradigm and don't know much about it. Could somebody help me clearly understand the difference between HBase and Hadoop, or give me some pointers that would help me understand it?

So far I have done some research, and according to my understanding, Hadoop provides a framework for working with raw chunks of data (files) in HDFS, while HBase is a database engine on top of Hadoop that works with structured data instead of raw data chunks. HBase provides a logical layer over HDFS, just as SQL does. Is that correct?

Recommended answer

Hadoop is basically three things: a file system (HDFS, the Hadoop Distributed File System), a computation framework (MapReduce), and a resource management layer (YARN, Yet Another Resource Negotiator). HDFS lets you store huge amounts of data in a distributed manner (which provides faster read/write access) and a redundant manner (which provides better availability). MapReduce lets you process that data in a distributed, parallel manner. MapReduce is not limited to HDFS, though. As a file system, HDFS lacks random read/write capability; it is good for sequential data access. This is where HBase comes into the picture. It is a NoSQL database that runs on top of your Hadoop cluster and provides random, real-time read/write access to your data.
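
To make the MapReduce part concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps in plain Python. It does not use Hadoop at all; the function names and the two-block input are hypothetical, and on a real cluster the map calls would run in parallel on the nodes holding each block.

```python
from collections import defaultdict
from itertools import chain


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(blocks):
    # Each block is mapped independently of the others; that independence
    # is what lets a real framework spread the work across machines.
    mapped = chain.from_iterable(map_phase(b) for b in blocks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input: one file that has already been split into two blocks.
blocks = ["hbase runs on hadoop", "hadoop stores data in hdfs"]
counts = word_count(blocks)
```

The whole input is read from start to finish, which is the batch-style, sequential access pattern HDFS is good at.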

You can store both structured and unstructured data in Hadoop, and in HBase as well. Both provide multiple mechanisms for accessing the data, such as a shell and other APIs. HBase stores data as key/value pairs in a columnar fashion, while HDFS stores data as flat files. Some of the salient features of the two systems are:

Hadoop

  1. Optimized for streaming access to large files.
  2. Follows the write-once, read-many ideology.
  3. Doesn't support random read/write.
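
As a rough illustration of these three points, the sketch below uses an ordinary local file as a stand-in for a file in HDFS (it is not the HDFS API, and the file name and record layout are made up). The file is written once in a single pass, and finding one record means streaming through it from the beginning.

```python
import os
import tempfile


def write_once(path, records):
    """Write the whole file in one pass; it is never edited in place."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in records:
            f.write(f"{key}\t{value}\n")


def scan_for(path, wanted_key):
    """Stream the file from the start; return (value, lines_read)."""
    lines_read = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            lines_read += 1
            key, value = line.rstrip("\n").split("\t", 1)
            if key == wanted_key:
                return value, lines_read
    return None, lines_read


# Hypothetical data set: 1000 tab-separated key/value records.
records = [(f"user{i:04d}", f"value-{i}") for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "events.tsv")
write_once(path, records)

# Looking up the last record requires reading every line before it.
value, lines_read = scan_for(path, "user0999")
```

That cost is fine when a job wants to read everything anyway, and a poor fit when an application wants one record quickly.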

HBase

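  1. Stores key/value pairs in a columnar fashion (columns are grouped together as column families).
  2. Provides low-latency access to small amounts of data from within a large data set.
  3. Provides a flexible data model.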
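
The data model behind these points can be sketched with a small in-memory class. This is only a toy under stated assumptions, not the HBase client API: `ToyTable` and its methods are hypothetical names. It assumes what HBase does in outline: column families are declared up front, column qualifiers can differ from row to row, each cell keeps timestamped versions, and rows are kept sorted by row key.

```python
import bisect
import time


class ToyTable:
    """In-memory stand-in for one HBase-style table (not the real client API)."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed up front
        self.rows = {}                 # row key -> {"family:qualifier": [(ts, value)]}
        self.sorted_keys = []          # row keys kept in sorted order for scans

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            bisect.insort(self.sorted_keys, row)
            self.rows[row] = {}
        ts = time.time_ns() if ts is None else ts
        # A write adds a new timestamped version instead of overwriting.
        self.rows[row].setdefault(column, []).append((ts, value))

    def get(self, row, column):
        """Random read: return the newest version of one cell, or None."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return max(versions, key=lambda v: v[0])[1]

    def scan(self, start, stop):
        """Range scan: row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self.sorted_keys, start)
        hi = bisect.bisect_left(self.sorted_keys, stop)
        return self.sorted_keys[lo:hi]


table = ToyTable(families=["info"])
table.put("user0001", "info:name", "Ann", ts=1)
table.put("user0001", "info:name", "Anna", ts=2)              # newer version
table.put("user0002", "info:email", "bob@example.com", ts=1)  # different column
```

Here `table.get("user0001", "info:name")` goes straight to one cell by row key instead of scanning a file, and the two rows hold different columns, which is the flexible part of the model.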

Hadoop is best suited to offline batch processing, while HBase is used when you have real-time needs.

An analogous comparison would be between MySQL and Ext4: a database and the file system underneath it.
