How to collect Hadoop Cluster Size/Number of Cores Information


Problem Description


I am running my Hadoop jobs on a cluster consisting of multiple machines whose specs are not known to me (main memory, number of cores, storage size, etc. per machine). Without using any OS-specific library (*.so files, I mean), is there any class or tool in Hadoop itself, or some additional library, with which I could collect the following information while the Hadoop MR jobs are executing:

  1. Total number of cores / number of cores employed by the job
  2. Total available main memory / allocated available main memory
  3. Total storage space on each machine / allocated storage space

I don't have the hardware information or the specs of the cluster, which is why I want to collect this kind of information programmatically from within my Hadoop code.

How can I achieve this? I want to know this kind of information for several reasons. One reason is given by the following error: I want to know which machine ran out of space.

12/07/17 14:28:25 INFO mapred.JobClient: Task Id : attempt_201205221754_0208_m_001087_0, Status : FAILED
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill2.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:376)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1247)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1155)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:582)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:649)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.
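
This DiskErrorException is raised when none of the configured local directories has enough room for the spill file, so logging each attempt's host and local-dir free space is one way to see which machine is short. As a minimal sketch of the kind of in-job, non-native probing the question asks about, the following uses only plain JDK calls from a task's setup() method. Everything here (the ProbeMapper name, logging to stderr, the mapred.local.dir key with a /tmp fallback) is illustrative and assumes a Hadoop 1.x-era configuration, not something from the original question or answer; the JVM-level numbers describe the task's JVM and its machine, not the scheduler's allocation.

import java.io.File;
import java.io.IOException;
import java.net.InetAddress;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: logs per-node resource figures to the task's stderr log.
public class ProbeMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Which machine this task attempt landed on.
        String host = InetAddress.getLocalHost().getHostName();

        Runtime rt = Runtime.getRuntime();
        // Cores visible to this JVM (the whole machine, not just this task's slot).
        System.err.println(host + ": cores=" + rt.availableProcessors());
        // Upper bound of this task JVM's heap, as set by mapred.child.java.opts.
        System.err.println(host + ": maxHeapBytes=" + rt.maxMemory());

        // Free/total space on the volumes backing the spill directories.
        // mapred.local.dir is the Hadoop 1.x-era key; /tmp is only a fallback here.
        for (String dir : context.getConfiguration().getStrings("mapred.local.dir", "/tmp")) {
            File f = new File(dir);
            System.err.println(host + ": " + dir
                    + " totalBytes=" + f.getTotalSpace()
                    + " usableBytes=" + f.getUsableSpace());
        }
    }
    // No map() override: the inherited identity behavior passes records through.
}

The stderr output lands in each attempt's task logs, so a failing attempt such as attempt_201205221754_0208_m_001087_0 can then be matched to a host and its disk numbers.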

Solution

The master node would have ssh access to all the slaves, and the list of all the nodes should be in the slaves file. So, write a script that iterates through the list of nodes in the slaves file and copies each file to the master using scp.

Something like this script should work:

# Pull /proc/cpuinfo and /proc/meminfo from every node in the slaves file
# (the path below is the answerer's own Hadoop 0.21.0 installation).
for i in `cat /home/praveensripati/Installations/hadoop-0.21.0/conf/slaves`;
do
scp praveensripati@$i:/proc/cpuinfo cpuinfo_$i
scp praveensripati@$i:/proc/meminfo meminfo_$i
done

The host name/IP ($i) would be appended to the cpuinfo and meminfo file names. An MR job would be overkill for this task.
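
As a quick way to summarize the copied files (this is just the standard Linux /proc layout, nothing Hadoop-specific): the number of lines beginning with "processor" in each cpuinfo_* file is that node's core count, and the MemTotal line near the top of each meminfo_* file is its total RAM.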
