How Does Hadoop Handle Big Data?

Question

Suppose a user wants to run a job on a Hadoop cluster whose primary input data is 10 petabytes in size. How and when does the client node break this data into blocks?
I mean, since the client has limited resources, the user can't upload such a big file to it directly. He would have to copy it part by part, wait for the client to store those parts as blocks, and then send the next parts.
But this kind of segmentation is not mentioned in any of the documents I've read.

How is this process done?

Solution

Okay, you need to distinguish between two things:
1. Building a 1-petabyte data set
Usually, you don't build a single 1 PB database by importing one 1 PB file. A petabyte-scale database is normally built up over time, one small piece at a time.
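To address the original question directly: even when one very large file is uploaded, the HDFS client does not need to hold it in memory. It streams the data and cuts it into fixed-size blocks as it writes them to the cluster (for example during `hdfs dfs -put`). The block size is set by `dfs.blocksize`, which defaults to 128 MB in Hadoop 2.x and later. The Python below is only a minimal sketch of that idea, not the HDFS client's implementation. `iter_blocks` and `block_count` are hypothetical helpers, and `BLOCK_SIZE` is simply chosen to match the default.

```python
import io
from typing import BinaryIO, Iterator

# Assumed block size for illustration, matching the 128 MiB default of
# dfs.blocksize in Hadoop 2.x+. This is NOT read from any Hadoop config.
BLOCK_SIZE = 128 * 1024 * 1024


def iter_blocks(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield consecutive fixed-size blocks from a binary stream.

    Only one block is held in memory at a time, so the total input size
    is not limited by the client's RAM. The final block may be shorter.
    """
    while True:
        block = stream.read(block_size)
        if not block:  # end of stream
            break
        yield block


def block_count(total_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // block_size)


if __name__ == "__main__":
    # Toy demo: 10 bytes split into 4-byte blocks.
    demo = io.BytesIO(b"0123456789")
    print(list(iter_blocks(demo, block_size=4)))  # [b'0123', b'4567', b'89']
    # 10 PiB split into 128 MiB blocks.
    print(block_count(10 * 1024**5))  # 83886080
```

Because the loop keeps only one block in memory, the client's RAM does not limit the input size. Under these assumptions, 10 PiB corresponds to 83,886,080 blocks of 128 MiB each.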

2. Running an analysis over a 1-petabyte data set
Suppose the data set is stored in HDFS (Hadoop Distributed File System) across 1000 slaves, each holding 1 TB of the 1 PB total (1 PB / 1 TB = 1000 slaves), and you want to find Max(x). Hadoop does this by running the Max calculation on every slave in parallel, each on its own machine against its own local data. The client then runs a separate "max" computation over the 1000 partial results (one maximum per 1 TB slice). This way, you never need to assemble the entire 1 PB in memory.
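The Max(x) example follows the map-then-reduce pattern. A partial result is computed where the data lives, and then only the small partial results are combined. The plain-Python sketch below uses hypothetical names (`local_max`, `global_max`, `distributed_max`) and toy data. It runs everything in a single process and is not Hadoop's Java MapReduce API. It also assumes every partition is non-empty.

```python
from typing import Iterable, List


def local_max(partition: Iterable[int]) -> int:
    """'Map' side: a slave computes the maximum of the data it stores locally.

    Assumes the partition is non-empty (max() raises ValueError otherwise).
    """
    return max(partition)


def global_max(partial_maxima: List[int]) -> int:
    """'Reduce' side: combine the partial results, one per slave."""
    return max(partial_maxima)


def distributed_max(partitions: List[List[int]]) -> int:
    # In a real cluster each local_max call would run on a different machine,
    # next to its data; here they simply run one after another.
    partials = [local_max(p) for p in partitions]
    # Only len(partitions) numbers reach the final step, never the raw data.
    return global_max(partials)


if __name__ == "__main__":
    # Hypothetical toy data: 4 "slaves", each holding a small slice.
    slaves = [[3, 17, 5], [42, 8], [1, 2, 39], [40, 0]]
    print(distributed_max(slaves))  # 42
```

The final step handles one value per slave. With 1000 slaves, it combines 1000 numbers instead of 1 PB of raw records.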

This makes Hadoop highly scalable, close to linearly: adding more slaves shrinks the share of data each one has to process.
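A back-of-the-envelope calculation shows where the near-linear scaling comes from. If the data is spread evenly, each of N nodes scans total/N bytes, so doubling N roughly halves the scan time. The sketch below assumes a hypothetical per-node scan rate of 200 MB/s. It ignores network transfer, scheduling overhead, stragglers and data skew, so real clusters scale less than perfectly.

```python
# Hypothetical per-node sequential scan rate (bytes/second); real rates
# depend on disks, formats and hardware.
SCAN_RATE_BYTES_PER_S = 200 * 10**6


def ideal_scan_seconds(total_bytes: int, nodes: int,
                       rate: int = SCAN_RATE_BYTES_PER_S) -> float:
    """Idealized wall-clock time for the parallel scan.

    Each node reads total_bytes / nodes at `rate`. Ignores network transfer,
    scheduling overhead, stragglers and data skew.
    """
    return total_bytes / nodes / rate


if __name__ == "__main__":
    one_pb = 10**15  # decimal petabyte
    for nodes in (1000, 2000, 4000):
        print(nodes, ideal_scan_seconds(one_pb, nodes))
```

Under these assumptions, scanning 1 PB takes about 5000 s with 1000 nodes, 2500 s with 2000 and 1250 s with 4000.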


Maybe this link could help you:
http://www.youtube.com/watch?v=ziqx2hJY8Hg

