Amazon EMR: Configuring storage on data nodes

Question

I'm using Amazon EMR, and most of my jobs run fine. The problem starts when I load and generate more data within the EMR cluster: the cluster runs out of storage space.

Each data node is a c1.medium instance. According to the links here and here, each data node should come with 350GB of instance storage. Through the ElasticMapReduce Slave security group, I've been able to verify in my AWS Console that the c1.medium data nodes are running and are instance-store backed.

When I run hadoop dfsadmin -report on the namenode, each data node reports only about 10GB of storage. Running df -h confirms this:

hadoop@domU-xx-xx-xx-xx-xx:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.9G  2.6G  6.8G  28% /
tmpfs                 859M     0  859M   0% /lib/init/rw
udev                   10M   52K   10M   1% /dev
tmpfs                 859M  4.0K  859M   1% /dev/shm
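A check like the one above can be scripted. The following is a minimal sketch, assuming POSIX-format output from df -P -k on stdin; the helper names device_total_gb and has_expected_storage are hypothetical and are not part of Hadoop or EMR.

```shell
# Hypothetical helpers, not part of Hadoop or EMR.

# Sum the size (in whole GB, truncated) of all /dev/* filesystems
# found in `df -P -k` output read from stdin. tmpfs and udev are skipped.
device_total_gb() {
  awk 'NR > 1 && $1 ~ /^\/dev\// { kb += $2 }
       END { printf "%d\n", kb / 1024 / 1024 }'
}

# Succeed (exit status 0) only if the total size of /dev/* filesystems
# on stdin is at least the expected number of GB given as $1.
has_expected_storage() {
  expected_gb=$1
  actual_gb=$(device_total_gb)
  [ "$actual_gb" -ge "$expected_gb" ]
}
```

On a node, this could be used as: df -P -k | has_expected_storage 300 || echo "instance storage is not mounted". The 300GB threshold is an arbitrary value chosen to sit below the 350GB the instance type should provide.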

How can I configure my data nodes to launch with the full 350GB of storage? Is there a way to do this using a bootstrap action?

Recommended answer

After more research and a post on the AWS forum, I got a solution, although not a full understanding of what happened under the hood. I thought I would post it as an answer, if that's okay.

It turns out there is a bug in AMI version 2.0, which of course was the version I was trying to use. (I had switched to 2.0 because I wanted Hadoop 0.20 to be the default.) The bug prevents instance storage from being mounted on 32-bit instances, which is what c1.medium instances launch as.

Specifying "latest" as the AMI version in the CLI tool fixed the problem, and each c1.medium launched with the appropriate 350GB of storage.

For example:

./elastic-mapreduce --create --name "Job" --ami-version "latest" --other-options
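To make the AMI version an explicit choice rather than something buried in a long command line, the invocation can be wrapped in a small dry-run helper. This is only a sketch: emr_create_cmd is a hypothetical function, it prints the command instead of running it, and it assumes the CLI accepts an explicit version string such as "2.0.4" in place of "latest".

```shell
# Hypothetical helper (not part of the EMR CLI): prints the create command
# for a given job name and AMI version so it can be reviewed before running.
emr_create_cmd() {
  name=$1
  # Default to "latest", the setting that fixed the problem in this answer.
  ami_version=${2:-latest}
  printf './elastic-mapreduce --create --name "%s" --ami-version "%s"\n' \
    "$name" "$ami_version"
}

emr_create_cmd "Job"          # uses "latest"
emr_create_cmd "Job" "2.0.4"  # assumption: an explicit version is accepted
```

Any other options the job needs would still have to be appended to the printed command before running it.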

More information about AMI versions and "latest" can be found here: http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/EnvironmentConfig_AMIVersion.html. Currently, "latest" is set to AMI 2.0.4. AMI 2.0.5 is the most recent release, but it looks like it is also still a little buggy.
