Hadoop streaming "GC overhead limit exceeded"


Problem description

I am running this command:

  hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>" -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"

Where <input dir> is a directory with many avro files.



And getting this error:


  Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
      at org.apache.hadoop.hdfs.protocol.DatanodeID.updateXferAddrAndInvalidateHashCode(DatanodeID.java:287)
      at org.apache.hadoop.hdfs.protocol.DatanodeID.<init>(DatanodeID.java:91)
      at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:136)
      at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:122)
      at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:633)
      at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:793)
      at org.apache.hadoop.hdfs.protocolPB.PBHelper.convertLocatedBlock(PBHelper.java:1252)
      at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1270)
      at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
      at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
      at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
      at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      at com.sun.proxy.$Proxy15.getListing(Unknown Source)
      at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
      at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNextNoFilter(DistributedFileSystem.java:888)
      at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNext(DistributedFileSystem.java:863)
      at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:267)
      at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
      at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
      at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
      at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
      at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)

How can this issue be resolved?

Solution

It took a while, but I found the solution here.

Prepending HADOOP_CLIENT_OPTS="-Xmx1024M" to the command solves the problem.



The final command line is:

  HADOOP_CLIENT_OPTS="-Xmx1024M" hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>" -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"
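
Why this works: the stack trace above shows the OutOfMemoryError is thrown during job submission, while the client lists the input directory (DFSClient.listPaths / FileInputFormat.listStatus), not inside a map or reduce task. HADOOP_CLIENT_OPTS supplies JVM options to that local client process, so raising its heap is what lets the listing of the many avro files complete.

If the larger client heap is wanted for every job rather than prefixing each command, the variable can also be exported in hadoop-env.sh. This is a minimal sketch, assuming a bash shell and a stock Hadoop 2.x layout under $HADOOP_HOME; the exact location of hadoop-env.sh varies by distribution:

  # In $HADOOP_HOME/etc/hadoop/hadoop-env.sh (path is an assumption):
  # give all Hadoop client-side commands (hadoop jar, hdfs dfs, ...) a 1 GB heap.
  export HADOOP_CLIENT_OPTS="-Xmx1024M $HADOOP_CLIENT_OPTS"

Equivalently, running export HADOOP_CLIENT_OPTS="-Xmx1024M" in the shell before invoking hadoop jar has the same effect as the one-line prefix shown above.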

