HDFS: file is not distributed after upload


Problem Description

I've deployed Hadoop (0.20.203.0rc1) on an 8-node cluster. After uploading a file to HDFS, the file ends up on only one of the nodes instead of being uniformly distributed across all of them. What could be the issue?

$HADOOP_HOME/bin/hadoop dfs -copyFromLocal ../data/rmat-20.0 /user/frolo/input/rmat-20.0

$HADOOP_HOME/bin/hadoop dfs -stat "%b %o %r %n" /user/frolo/input/rmat-*
1220222968 67108864 1 rmat-20.0

$HADOOP_HOME/bin/hadoop dfsadmin -report 
Configured Capacity: 2536563998720 (2.31 TB)
Present Capacity: 1642543419392 (1.49 TB)
DFS Remaining: 1641312030720 (1.49 TB)
DFS Used: 1231388672 (1.15 GB)
DFS Used%: 0.07%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 8 (8 total, 0 dead)

Name: 10.10.1.15:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131536928768 (122.5 GB)
DFS Remaining: 185533546496(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.51%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.13:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131533377536 (122.5 GB)
DFS Remaining: 185537097728(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.52%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.17:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 120023924736 (111.78 GB)
DFS Remaining: 197046550528(183.51 GB)
DFS Used%: 0%
DFS Remaining%: 62.15%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.18:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 78510628864 (73.12 GB)
DFS Remaining: 238559846400(222.18 GB)
DFS Used%: 0%
DFS Remaining%: 75.24%
Last contact: Fri Feb 07 12:10:24 MSK 2014


Name: 10.10.1.14:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131537530880 (122.5 GB)
DFS Remaining: 185532944384(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.51%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.11:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 1231216640 (1.15 GB)
Non DFS Used: 84698116096 (78.88 GB)
DFS Remaining: 231141167104(215.27 GB)
DFS Used%: 0.39%
DFS Remaining%: 72.9%
Last contact: Fri Feb 07 12:10:24 MSK 2014


Name: 10.10.1.16:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131537494016 (122.5 GB)
DFS Remaining: 185532981248(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.51%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.12:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 84642578432 (78.83 GB)
DFS Remaining: 232427896832(216.47 GB)
DFS Used%: 0%
DFS Remaining%: 73.3%
Last contact: Fri Feb 07 12:10:27 MSK 2014

Solution

Your file has been written with a replication factor of 1, as evidenced by your hadoop fs -stat command output: the third field (%r) is 1. This means only one replica exists for each block of the file.

The default replication factor for writes is governed by the dfs.replication property in $HADOOP_HOME/conf/hdfs-site.xml. If it is not specified there, the default is 3, but it's likely that you have an override specified whose value is 1. Changing its value back to 3, or removing it altogether (to fall back to the default), will make all new file writes use 3 replicas by default.
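For illustration, the relevant hdfs-site.xml entry (assuming the stock configuration layout) would look like this:

<configuration>
  <property>
    <!-- Number of replicas created for each block on write -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>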

You may also pass a specific replication factor with each write command, using the -D property-passing mechanism supported by the hadoop fs utility, for example:

hadoop fs -Ddfs.replication=3 -copyFromLocal ../data/rmat-20.0 /user/frolo/input/rmat-20.0
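After re-uploading with that option, the -stat command from the question should report 3 in its replication field (%r):

hadoop fs -stat %r /user/frolo/input/rmat-20.0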

You can also change an existing file's replication factor with the hadoop fs -setrep command, for example:

hadoop fs -setrep 3 -w /user/frolo/input/rmat-20.0
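Note that the -w flag makes the command wait until every block actually reaches the requested replication, which can take a long time for a large file; without it, setrep merely schedules the re-replication and returns immediately.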

Files with an HDFS replication factor greater than 1 are automatically distributed across multiple nodes. HDFS never writes more than one replica of a block onto the same DataNode.
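One way to verify where the replicas actually landed is the fsck tool, which lists each block of the file together with the DataNodes holding its replicas:

hadoop fsck /user/frolo/input/rmat-20.0 -files -blocks -locations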
