How to check the distributed data over HDFS

Question

We know that Hadoop replicates data across several DataNodes in HDFS. Is there a command for checking how the data is distributed across the different nodes?

Answer

I think you might be looking for this command:

hdfs fsck /hdfs/path/to/data -files -blocks -locations

You'll get a report like the one below. For each file it lists every block, along with the block's length (len), its replication factor (repl), and the DataNodes (as IP:port) that hold a replica of it.

/hdfs/path/to/data/file.txt 4771082824 bytes, 36 block(s):  OK
0. BP-22525430-10.14.103.78-1355873316066:blk_-3400885615428218530_203522 len=134217728 repl=3 [10.14.103.213:50010, 10.14.102.190:50010, 10.14.102.176:50010]
1. BP-22525430-10.14.103.78-1355873316066:blk_124203196739652236_203523 len=134217728 repl=3 [10.14.103.213:50010, 10.14.102.190:50010, 10.14.102.176:50010]
2. BP-22525430-10.14.103.78-1355873316066:blk_5886188080028552249_203524 len=134217728 repl=3 [10.14.103.213:50010, 10.14.102.190:50010, 10.14.102.176:50010]
3. BP-22525430-10.14.103.78-1355873316066:blk_-3222807870390148132_203525 len=134217728 repl=3 [10.14.103.213:50010, 10.14.102.190:50010, 10.14.102.176:50010]
4. BP-22525430-10.14.103.78-1355873316066:blk_-1285830390698132620_203526 len=134217728 repl=3 [10.14.103.213:50010, 10.14.102.190:50010, 10.14.102.176:50010]
5. BP-22525430-10.14.103.78-1355873316066:blk_-2680874809037637827_203527 len=134217728 repl=3 [10.14.103.213:50010, 10.14.102.190:50010, 10.14.102.176:50010]
6. BP-22525430-10.14.103.78-1355873316066:blk_8699277646297360652_203528 len=134217728 repl=3 [10.14.103.213:50010, 10.14.102.190:50010, 10.14.102.176:50010]
7. BP-22525430-10.14.103.78-1355873316066:blk_-2195916588803548138_203529 len=134217728 repl=3 [10.14.103.213:50010, 10.14.102.190:50010, 10.14.102.176:50010]
[more]
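
To see at a glance how replicas are spread over the DataNodes, the report can be tallied per node. The following is a minimal sketch rather than a definitive tool: the names `replicas_per_datanode`, `run_fsck`, and `SAMPLE` are made up for illustration, and the parsing assumes block lines look like the ones shown above (a `repl=` field followed by IP:port locations). Other Hadoop versions may format these lines differently, so check your own output first.

```python
import re
import subprocess
from collections import Counter

# An IPv4 address followed by a port, e.g. 10.14.103.213:50010.
LOCATION = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}:\d+\b")


def replicas_per_datanode(report):
    """Count how many block replicas each DataNode holds in an fsck report."""
    counts = Counter()
    for line in report.splitlines():
        # Only block lines contain "repl="; the replica locations follow it.
        # Splitting here also skips the IP embedded in the block pool ID.
        _, sep, tail = line.partition("repl=")
        if sep:
            counts.update(LOCATION.findall(tail))
    return counts


def run_fsck(path):
    """Run the command from the answer and return its text output.

    Assumes the `hdfs` binary is on PATH and the cluster is reachable.
    """
    result = subprocess.run(
        ["hdfs", "fsck", path, "-files", "-blocks", "-locations"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


# The first two block lines of the report above, used as sample input.
SAMPLE = """\
/hdfs/path/to/data/file.txt 4771082824 bytes, 36 block(s):  OK
0. BP-22525430-10.14.103.78-1355873316066:blk_-3400885615428218530_203522 len=134217728 repl=3 [10.14.103.213:50010, 10.14.102.190:50010, 10.14.102.176:50010]
1. BP-22525430-10.14.103.78-1355873316066:blk_124203196739652236_203523 len=134217728 repl=3 [10.14.103.213:50010, 10.14.102.190:50010, 10.14.102.176:50010]
"""

if __name__ == "__main__":
    for node, n in replicas_per_datanode(SAMPLE).most_common():
        print(node, n)
```

On a real cluster you would pass `run_fsck("/hdfs/path/to/data")` to `replicas_per_datanode` instead of `SAMPLE`. A heavily uneven tally would suggest the data is not well balanced across nodes.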
