Hadoop MapReduce job I/O Exception due to premature EOF from inputStream

Problem description

I ran a MapReduce program using the command hadoop jar <jar> [mainClass] path/to/input path/to/output. However, my job was hanging at: INFO mapreduce.Job: map 100% reduce 29%.

Much later, I terminated and checked the datanode log (I am running in pseudo-distributed mode). It contained the following exception:

java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:201)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:849)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
at java.lang.Thread.run(Thread.java:745)

5 seconds later in the log was ERROR DataXceiver error processing WRITE_BLOCK operation.

What problem might be causing this exception and error?

My NodeHealthReport said:

1/1 local-dirs are bad: /home/$USER/hadoop/nm-local-dir; 
1/1 log-dirs are bad: /home/$USER/hadoop-2.7.1/logs/userlogs

I found this which indicates that dfs.datanode.max.xcievers may need to be increased. However, it is deprecated and the new property is called dfs.datanode.max.transfer.threads with default value 4096. If changing this would fix my problem, what new value should I set it to?
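For reference, dfs.datanode.max.transfer.threads is set in hdfs-site.xml on the datanodes and requires a datanode restart to take effect; the snippet below is only an illustration, and the value 8192 is a guessed example rather than a tuned recommendation.

<property>
  <!-- replaces the deprecated dfs.datanode.max.xcievers; the default is 4096 -->
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>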

This indicates that the ulimit for the datanode may need to be increased. My ulimit -n (open files) is 1024. If increasing this would fix my problem, what should I set it to?
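For reference, one common way to raise the open-file limit for the user that runs the datanode is /etc/security/limits.conf; the entries below are a sketch with example values (64000 is a frequently cited ballpark for Hadoop nodes, not a value confirmed by this post), and the daemon must be restarted from a fresh login session to pick them up.

# /etc/security/limits.conf -- example values only
$USER  soft  nofile  64000
$USER  hard  nofile  64000

# verify from a new shell before restarting the datanode
ulimit -n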

Solution

Premature EOF can occur for multiple reasons, one of which is spawning a huge number of threads to write to disk on one reducer node when using FileOutputCommitter. The MultipleOutputs class allows you to write to files with custom names, and to accomplish that it spawns one thread per file and binds a port to it to write to disk. This puts a limit on the number of files that can be written on one reducer node. I encountered this error when the number of files crossed roughly 12,000 on one reducer node: the threads got killed and the _temporary folder got deleted, leading to a plethora of these exception messages. My guess is that this is not a memory-overshoot issue, nor can it be solved by allowing the Hadoop engine to spawn more threads. Reducing the number of files being written at one time on one node solved my problem, either by reducing the actual number of files being written or by adding reducer nodes.
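To make the failure mode concrete, here is a minimal, hypothetical sketch of a reducer using MultipleOutputs (a word-count style example, not code from this question): every distinct baseOutputPath passed to write() becomes its own output file on that reducer, so deriving the path from a coarse bucket instead of the full key is one way to keep the per-node file count bounded.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class BucketedReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // Writing to "bucket-<firstChar>/part" rather than one path per key caps the
        // number of distinct files (and writer threads) created on this reducer node.
        String bucket = key.toString().isEmpty() ? "_" : key.toString().substring(0, 1);
        mos.write(key, new IntWritable(sum), "bucket-" + bucket + "/part");
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}

In the driver, this is often paired with LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) so that empty default part files are not created alongside the named outputs.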
