Hadoop Error: Java heap space when using big dataset

Problem description

I'm trying to run a Hadoop program over a big text dataset (~3.1 TB).

I keep getting this error, and I cannot see any logs:

15/04/29 13:31:30 INFO mapreduce.Job:  map 86% reduce 3%
15/04/29 13:33:33 INFO mapreduce.Job:  map 87% reduce 3%
15/04/29 13:35:34 INFO mapreduce.Job:  map 88% reduce 3%
15/04/29 13:37:34 INFO mapreduce.Job:  map 89% reduce 3%
15/04/29 13:39:33 INFO mapreduce.Job:  map 90% reduce 3%
15/04/29 13:41:27 INFO mapreduce.Job:  map 91% reduce 3%
15/04/29 13:42:51 INFO mapreduce.Job: Task Id : attempt_1430221604005_0004_m_018721_0, Status : FAILED
Error: Java heap space
15/04/29 13:43:03 INFO mapreduce.Job: Task Id : attempt_1430221604005_0004_m_018721_1, Status : FAILED
Error: Java heap space
15/04/29 13:43:21 INFO mapreduce.Job: Task Id : attempt_1430221604005_0004_m_018721_2, Status : FAILED
Error: Java heap space
15/04/29 13:43:23 INFO mapreduce.Job:  map 92% reduce 3%
15/04/29 13:43:53 INFO mapreduce.Job:  map 100% reduce 100%
15/04/29 13:44:00 INFO mapreduce.Job: Job job_1430221604005_0004 failed with state FAILED due to: Task failed task_1430221604005_0004_m_018721
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/04/29 13:44:00 INFO mapreduce.Job: Counters: 40
    File System Counters
        FILE: Number of bytes read=1671885418232
        FILE: Number of bytes written=3434806868906
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2421645776312
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=54123
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Failed map tasks=4
        Killed map tasks=53
        Killed reduce tasks=13
        Launched map tasks=18098
        Launched reduce tasks=13
        Other local map tasks=3
        Data-local map tasks=18095
        Total time spent by all maps in occupied slots (ms)=833322750
        Total time spent by all reduces in occupied slots (ms)=179324736
        Total time spent by all map tasks (ms)=833322750
        Total time spent by all reduce tasks (ms)=44831184
        Total vcore-seconds taken by all map tasks=833322750
        Total vcore-seconds taken by all reduce tasks=44831184
        Total megabyte-seconds taken by all map tasks=1644979108500
        Total megabyte-seconds taken by all reduce tasks=353987028864
    Map-Reduce Framework
        Map input records=4341029640
        Map output records=3718782624
        Map output bytes=1756332044946
        Map output materialized bytes=1769982618200
        Input split bytes=2694367
        Combine input records=0
        Spilled Records=7203900023
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=10688027
        CPU time spent (ms)=391899480
        Physical memory (bytes) snapshot=15069669965824
        Virtual memory (bytes) snapshot=61989010124800
        Total committed heap usage (bytes)=17448162033664
    File Input Format Counters 
        Bytes Read=2421643081945

The map process takes more than 3 hours, and it is really difficult to debug it since that is the only output I can see.

I have a cluster with 10 servers, each with 24 GB of RAM, and the configuration is:

<configuration>
<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobtracker.address</name>
    <value>computer61:8021</value>
</property>
<property>
    <name>mapreduce.map.memory.mb</name>
    <value>1974</value>
</property>

<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>7896</value>
</property>

<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1580m</value>
</property>

<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx6320m</value>
</property>

</configuration>

I added the line

export HADOOP_HEAPSIZE=8192

to the hadoop-env.sh file, but nothing changed.
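
As a side note, in Hadoop 2.x the HADOOP_HEAPSIZE setting in hadoop-env.sh sizes the heap of the Hadoop daemons, not the YARN containers that run the map and reduce tasks; the task heaps come from the mapreduce.*.java.opts values above and have to fit inside the corresponding *.memory.mb container sizes. Those container sizes are in turn checked against the limits in yarn-site.xml; a minimal sketch, with hypothetical values for a 24 GB node (not taken from the question):

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>20480</value>   <!-- hypothetical: memory YARN may hand out per node -->
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>    <!-- hypothetical: largest single container request allowed -->
</property>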

I know this is an old question, but I have applied the solutions recommended in some 50 posts without any improvement.

When I use a smaller dataset (~1 TB) with the same code, it works fine.

Do you at least know how I can keep the logs, so I can find out where the specific error occurs?
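
If it helps, one common way to keep the task logs around is to enable YARN log aggregation (or to delay deletion of the local container logs) in yarn-site.xml; a minimal sketch, assuming a Hadoop 2.x cluster, with illustrative values:

<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>      <!-- copy finished containers' logs to HDFS -->
</property>
<property>
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>86400</value>     <!-- keep local container logs/dirs for a day after the job finishes -->
</property>

With aggregation enabled, the logs of a finished application can then be fetched with yarn logs -applicationId application_1430221604005_0004 (the application id corresponding to the job id shown in the output above).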

Thanks

UPDATE:

I managed to see the log before it was deleted. Basically, the error is:

2015-04-29 18:23:45,719 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend = 25874428(103497712); length = 339969/6553600
2015-04-29 18:23:47,110 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
2015-04-29 18:23:47,676 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:201)
    at java.lang.String.substring(String.java:1956)
    at java.lang.String.trim(String.java:2865)
    at analysis.MetaDataMapper.map(MetaDataMapper.java:109)
    at analysis.MetaDataMapper.map(MetaDataMapper.java:21)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Recommended answer

Reducing the shuffle buffer size might help. By default, the shuffle allocates 70% of the reducer's heap to buffering map output before it is merged and sorted, which can be too much for large datasets. You can reduce this input buffer percentage by adding the following property to mapred-site.xml:

<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.20</value>
</property>

I have set the value to 20% here, but you may want to reduce it even further depending on your dataset and the amount of RAM available.
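
Note that on Hadoop 2.x / YARN (which the job output above suggests), mapred.job.shuffle.input.buffer.percent is a deprecated property name; the current equivalent is mapreduce.reduce.shuffle.input.buffer.percent, so the same setting can also be written as:

<property>
    <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
    <value>0.20</value>
</property>

Either name should still be honored, since Hadoop maps the deprecated keys onto the current ones, but the newer name avoids deprecation warnings in the job logs.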
