SocketTimeoutException when running hadoop distcp -update between clusters


Problem Description

I'm using hadoop distcp -update to copy a directory from one HDFS cluster to another. Sometimes (pretty often) I get this kind of exception:
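
For context, the invocation is roughly the following (a sketch only; the hosts and path are taken from the log below, and the exact flags are assumed):

# hypothetical invocation; HDFS1/HDFS2 and directory_X as in the log
hadoop distcp -update \
    hdfs://HDFS1:51175/directory_X \
    hdfs://HDFS2:51175/directory_X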

13/07/03 00:20:03 INFO tools.DistCp: srcPaths=[hdfs://HDFS1:51175/directory_X]
13/07/03 00:20:03 INFO tools.DistCp: destPath=hdfs://HDFS2:51175/directory_X
13/07/03 00:25:27 WARN hdfs.DFSClient: src=directory_X, datanodes[0].getName()=***.***.***.***:8550
java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/***.***.***.***:35872 remote=/***.***.***.***:8550]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
        at java.io.DataInputStream.readShort(DataInputStream.java:295)
        at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:885)
        at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:822)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:541)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:53)
        at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1230)
        at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1110)
        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
13/07/03 00:26:40 INFO tools.DistCp: sourcePathsCount=8542
13/07/03 00:26:40 INFO tools.DistCp: filesToCopyCount=0
13/07/03 00:26:40 INFO tools.DistCp: bytesToCopyCount=0.0

Does anyone have any idea what this could be? Using Hadoop 0.20.205.0.

Recommended Answer

Suggest increasing both timeouts: dfs.socket.timeout for the read timeout, and dfs.datanode.socket.write.timeout for the write timeout. (The failure above is a read timeout inside DFSClient.getFileChecksum, which distcp -update reaches via DistCp.sameFile when deciding whether a file needs recopying.)

The defaults:

// Timeouts for communicating with DataNodes for streaming writes/reads
public static int READ_TIMEOUT = 60 * 1000;      // the 69000 ms in the log is this
                                                 // 60000 ms base plus a 3000 ms
                                                 // per-datanode extension for the
                                                 // 3-node pipeline
public static int WRITE_TIMEOUT = 8 * 60 * 1000;

Add the following to hadoop-site.xml / hdfs-site.xml:

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>3000000</value>
</property>

<property>
  <name>dfs.socket.timeout</name>
  <value>3000000</value>
</property>
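
Since DistCp runs through ToolRunner (visible in the stack trace), the read timeout can likely also be raised per job with the generic -D option instead of editing the config file; a sketch, reusing the 3000000 ms value from above (the datanode-side write timeout, by contrast, is read by the datanodes themselves, so that one still needs the hdfs-site.xml change and a datanode restart):

# hypothetical per-job override via generic options
hadoop distcp \
    -D dfs.socket.timeout=3000000 \
    -D dfs.datanode.socket.write.timeout=3000000 \
    -update hdfs://HDFS1:51175/directory_X hdfs://HDFS2:51175/directory_X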

Hope this helps.

