Reading remote HDFS file with Java

Problem Description

I'm having a bit of trouble with a simple Hadoop install. I've downloaded Hadoop 2.4.0 and installed it on a single CentOS Linux node (virtual machine). I've configured Hadoop for a single node with pseudo-distribution, as described on the Apache site (http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleCluster.html). It starts with no issues in the logs, and I can read and write files using the "hadoop fs" commands from the command line.

I'm attempting to read a file from HDFS on a remote machine with the Java API. The machine can connect and list directory contents. It can also determine whether a file exists with this code:

Path p=new Path("hdfs://test.server:9000/usr/test/test_file.txt");
FileSystem fs = FileSystem.get(new Configuration());
System.out.println(p.getName() + " exists: " + fs.exists(p));

The system prints "true", indicating the file exists. However, when I attempt to read the file with:

BufferedReader br = null;
try {
    Path p=new Path("hdfs://test.server:9000/usr/test/test_file.txt");
    FileSystem fs = FileSystem.get(CONFIG);
    System.out.println(p.getName() + " exists: " + fs.exists(p));

    br=new BufferedReader(new InputStreamReader(fs.open(p)));
    String line = br.readLine();

    while (line != null) {
        System.out.println(line);
        line=br.readLine();
    }
}
finally {
    if(br != null) br.close();
}

This code throws the exception:

Exception in thread "main" org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-13917963-127.0.0.1-1398476189167:blk_1073741831_1007 file=/usr/test/test_file.txt

Googling gave some possible tips, but they all checked out. The data node is connected, active, and has enough space. The admin report from hdfs dfsadmin -report shows:

Configured Capacity: 52844687360 (49.22 GB)
Present Capacity: 48507940864 (45.18 GB)
DFS Remaining: 48507887616 (45.18 GB)
DFS Used: 53248 (52 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (test.server)
Hostname: test.server
Decommission Status : Normal
Configured Capacity: 52844687360 (49.22 GB)
DFS Used: 53248 (52 KB)
Non DFS Used: 4336746496 (4.04 GB)
DFS Remaining: 48507887616 (45.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.79%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Fri Apr 25 22:16:56 PDT 2014

The client jars were copied directly from the Hadoop install, so there is no version mismatch. I can browse the file system with my Java class and read file attributes. I just can't read the file contents without getting the exception. If I try to write a file with this code:

FileSystem fs = null;
BufferedWriter br = null;

System.setProperty("HADOOP_USER_NAME", "root");

try {
    fs = FileSystem.get(new Configuration());

    //Path p = new Path(dir, file);
    Path p = new Path("hdfs://test.server:9000/usr/test/test.txt");
    br = new BufferedWriter(new OutputStreamWriter(fs.create(p,true)));
    br.write("Hello World");
}
finally {
    if(br != null) br.close();
    if(fs != null) fs.close();
}

This creates the file but doesn't write any bytes, and it throws the exception:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /usr/test/test.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

Googling for this indicated a possible space issue, but the dfsadmin report shows plenty of space. This is a plain vanilla install, and I can't get past this issue.

Environment summary:

Server:

Hadoop 2.4.0 with pseudo-distribution (http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleCluster.html)

CentOS 6.5 virtual machine, 64-bit server
Java 1.7.0_55

Client:

Windows 8 (virtual machine)
Java 1.7.0_51

Any help is greatly appreciated.

Recommended Answer

Hadoop error messages are frustrating. Often they don't say what they mean and have nothing to do with the real issue. I've seen problems like this occur when the client, namenode, and datanode cannot communicate properly. In your case I would pick one of two issues:


  • Your cluster runs inside a virtual machine, and the client's access to that virtualized network is blocked.

  • You are not consistently using fully-qualified domain names (FQDNs) that resolve identically on both the client and the host.

The host name "test.server" is very suspicious. Check all of the following:


  • Is test.server an FQDN?

  • Is that the name used everywhere in your conf files?

  • Can the client and all hosts forward- and reverse-resolve test.server and its IP address and get the same result? (See the sketch after this list.)

  • Are IP addresses used anywhere instead of the FQDN?

  • Is localhost used anywhere?
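
One quick way to answer the resolution questions is a small lookup check run from the client. This is just an illustrative sketch (the class name ResolveCheck is made up here); it assumes test.server should resolve to the server's real address and that the reverse lookup should come back to the same FQDN, not localhost or 127.0.0.1:

import java.net.InetAddress;

public class ResolveCheck {
    public static void main(String[] args) throws Exception {
        // Forward lookup: hostname -> IP address
        InetAddress addr = InetAddress.getByName("test.server");
        System.out.println("test.server -> " + addr.getHostAddress());

        // Reverse lookup: IP address -> canonical hostname; this should be the
        // same FQDN, not "localhost" or the loopback address.
        InetAddress byIp = InetAddress.getByAddress(addr.getAddress());
        System.out.println(addr.getHostAddress() + " -> " + byIp.getCanonicalHostName());
    }
}

Run the same check on the server itself; the client and the server should see identical results.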

Any inconsistency in the use of FQDNs, hostnames, numeric IPs, and localhost must be removed. Never mix them in your conf files or in your client code. Consistent use of FQDNs is preferred. Consistent use of numeric IPs usually also works. Using unqualified hostnames, localhost, or 127.0.0.1 causes problems.
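
To illustrate the consistency advice, here is a minimal client-side sketch, not the poster's exact code. It assumes test.server is the FQDN used throughout the server's conf files and that 9000 is the NameNode RPC port, as in the question, and it points fs.defaultFS at that same name so every Path resolves against it:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Use the same FQDN everywhere: in the conf and in any absolute URIs.
        conf.set("fs.defaultFS", "hdfs://test.server:9000");

        // Connect as an explicit user rather than relying on the local OS account.
        FileSystem fs = FileSystem.get(URI.create("hdfs://test.server:9000"), conf, "root");
        try {
            // Resolved against fs.defaultFS, so no hostname is repeated in the path.
            Path p = new Path("/usr/test/test_file.txt");
            System.out.println(p.getName() + " exists: " + fs.exists(p));
        } finally {
            fs.close();
        }
    }
}

Whether this clears the BlockMissingException still depends on the datanode registering under a name the client can actually reach, so treat it as a sketch of the naming advice above rather than a guaranteed fix.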
