Cannot connect to remote HDFS from Windows
Problem description
I am trying to connect to a remote HDFS instance, as follows:
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hostName:8020");
conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
FileSystem fs = FileSystem.get(conf);
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(fs.getHomeDirectory(), false);
while (ri.hasNext()) {
    LocatedFileStatus lfs = ri.next();
    //log.debug(lfs.getPath().toString());
}
fs.close();
Here are my Maven dependencies:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-examples</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.1</version>
</dependency>
And here is the output of the hadoop version command on my remote node:
hadoop version
Hadoop 2.7.1.2.3.0.0-2557
but I get:
Exception in thread "main" java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
    at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:217)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2624)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2634)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
    at filecheck.HdfsTest.main(HdfsTest.java:21)
and this is the line that causes the error:
FileSystem fs = FileSystem.get(conf);
Any idea why this might be happening?
After trying Manjunath's answer
here is what I get:
ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:356)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:371)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:364)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2807)
    at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2802)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2668)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
    at filecheck.HdfsTest.main(HdfsTest.java:27)

15/11/16 09:48:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Exception in thread "main" java.lang.IllegalArgumentException: Pathname from hdfs://hostName:8020 is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:197)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:940)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:927)
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:872)
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:868)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:886)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1694)
    at org.apache.hadoop.fs.FileSystem$6.<init>(FileSystem.java:1787)
    at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1783)
    at filecheck.HdfsTest.main(HdfsTest.java:29)
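The first error above is the usual Windows-client requirement for winutils.exe: Hadoop's Shell class resolves it from the hadoop.home.dir system property or the HADOOP_HOME environment variable. A minimal sketch of the common workaround, assuming winutils.exe has been placed under C:\hadoop\bin (that path is an assumption, not from the original post):

// Sketch only (assumed path): point Hadoop at a local directory containing
// bin\winutils.exe before the first FileSystem/Shell call on Windows.
public final class WinutilsSetup {
    public static void configure() {
        if (System.getenv("HADOOP_HOME") == null
                && System.getProperty("hadoop.home.dir") == null) {
            System.setProperty("hadoop.home.dir", "C:\\hadoop"); // expects C:\hadoop\bin\winutils.exe
        }
    }
}

The second exception ("is not a valid DFS filename") usually indicates that the Path passed to listFiles had no path component; note that the working example in the answer below lists hdfs://machine:8020/ with a trailing slash.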
Solution

The exception occurs in FileSystem.java, in the getScheme() method, which simply throws an UnsupportedOperationException:

public String getScheme() {
    throw new UnsupportedOperationException("Not implemented by the "
            + getClass().getSimpleName() + " FileSystem implementation");
}

It is calling the getScheme() method of the FileSystem class, instead of calling the getScheme() method of the DistributedFileSystem class. The getScheme() method of the DistributedFileSystem class returns:

@Override
public String getScheme() {
    return HdfsConstants.HDFS_URI_SCHEME;
}
So, to overcome this problem, you need to change the "FileSystem.get(conf)" statement, as shown below:
DistributedFileSystem fs = (DistributedFileSystem) FileSystem.get(conf);
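As a quick sanity check (a hedged sketch, not part of the original answer), once FileSystem.get(conf) succeeds you can print which implementation and scheme were actually resolved for fs.defaultFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Sketch: confirm that fs.defaultFS resolved to DistributedFileSystem and the
// "hdfs" scheme; hostName is the same placeholder used in the question.
public class FsSchemeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hostName:8020");
        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println(fs.getClass().getName()); // expected: org.apache.hadoop.hdfs.DistributedFileSystem
            System.out.println(fs.getScheme());          // expected: hdfs
        }
    }
}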
EDIT:
I tried out the program and it worked perfectly fine for me. In fact, it works both with and without the cast. Following is my code (the only difference is that I set recursive listing to "true"):
package com.hadooptests;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;

import java.io.IOException;

public class HDFSConnect {
    public static void main(String[] args)
    {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://machine:8020");
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        DistributedFileSystem fs = null;
        try {
            fs = (DistributedFileSystem) FileSystem.get(conf);
            RemoteIterator<LocatedFileStatus> ri;
            ri = fs.listFiles(new Path("hdfs://machine:8020/"), true);
            while (ri.hasNext()) {
                LocatedFileStatus lfs = ri.next();
                System.out.println(lfs.getPath().toString());
            }
            fs.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
My Maven:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.2.1</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>2.6</version>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>com.hadooptests.HDFSConnect</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
    </plugins>
</build>
I ran the program as:
java -cp "%CLASSPATH%;hadooptests-1.0-SNAPSHOT.jar" com.hadooptests.HDFSConnect
where CLASSPATH is set to:
.;%HADOOP_HOME%\etc\hadoop\;%HADOOP_HOME%\share\hadoop\common\*;%HADOOP_HOME%\share\hadoop\common\lib\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\hdfs\lib\*;%HADOOP_HOME%\share\hadoop\mapreduce\*;%HADOOP_HOME%\share\hadoop\mapreduce\lib\*;%HADOOP_HOME%\share\hadoop\tools\*;%HADOOP_HOME%\share\hadoop\tools\lib\*;%HADOOP_HOME%\share\hadoop\yarn\*;%HADOOP_HOME%\share\hadoop\yarn\lib\*
Some of the output I got:
hdfs://machine:8020/app-logs/machine/logs/application_1439815019232_0001/machine.corp.com_45454
hdfs://machine:8020/app-logs/machine/logs/application_1439815019232_0002/machine.corp.com_45454
hdfs://machine:8020/app-logs/machine/logs/application_1439817471006_0002/machine.corp.com_45454
hdfs://machine:8020/app-logs/machine/logs/application_1439817471006_0003/machine.corp.com_45454

EDIT 2:
My environment:
Hadoop 2.7.1 on Windows.
I installed HDP 2.3.0, which deploys Hadoop 2.7.1.
This concludes the article on "Cannot connect to remote HDFS from Windows". We hope the recommended answer is helpful, and thank you for supporting IT屋!