Hadoop HDFS Write Operation Programmatically
Problem Description
I asked a similar question a while back, but back then I had no idea what I was talking about. I am posting this question with further details and more to-the-point queries.

So I have set up a hadoop cluster with a namenode and 2 datanodes. I am using hadoop 2.9.0. I ran the command hdfs dfs -put "SomeRandomFile" and it seems to work fine. The only confusion I have here is: why does it store my file under the /user/hduser/ path? I didn't specify this path anywhere in the configuration, so how is it building this path on HDFS?
Furthermore, I created a small Java program to do the same thing. I created a simple Eclipse project and wrote the following lines:
//(imports needed by the method, for reference)
import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public static boolean fileWriteHDFS(InputStream input, String fileName) {
    try {
        System.setProperty("HADOOP_USER_NAME", "hduser");
        //Get Configuration of Hadoop system
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        //conf.get("fs.defaultFS");
        //Extract destination path
        URI uri = URI.create(DESTINATION_PATH + fileName);
        Path path = new Path(uri);
        //Destination file in HDFS
        FileSystem fs = FileSystem.get(uri, conf); //.get(conf);
        //Check if the file already exists
        if (fs.exists(path))
        {
            //Write appropriate error to log file and return.
            return false;
        }
        //Create an output stream to the destination path
        FSDataOutputStream out = fs.create(path);
        //Copy file from input stream to HDFS
        IOUtils.copyBytes(input, out, 4096, true);
        //Close all the file descriptors
        out.close();
        fs.close();
        //All went perfectly as planned
        return true;
    } catch (Exception e) {
        //Something went wrong
        System.out.println(e.toString());
        return false;
    }
}
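For context, the method is invoked from a small main like this (just a sketch, based on the class names in the stack trace further down; the local file name is a placeholder):

import java.io.FileInputStream;
import java.io.InputStream;
import com.ws.filewrite.fileWrite;

public class listenerService {
    public static void main(String[] args) throws Exception {
        //Stream a local file into HDFS via the helper above.
        try (InputStream in = new FileInputStream("SomeRandomFile")) {
            boolean ok = fileWrite.fileWriteHDFS(in, "SomeRandomFile");
            System.out.println("Upload succeeded: " + ok);
        }
    }
}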
And I added the following three hadoop libraries:
/home/hduser/bin/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar
/home/hduser/bin/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0-tests.jar
/home/hduser/bin/hadoop-2.9.0/share/hadoop/common/hadoop-nfs-2.9.0.jar
As you can see, my hadoop installation location is /home/hduser/bin/hadoop-2.9.0/... When I run this code it throws an exception:

Exception in thread "main" java.lang.NoClassDefFoundError: com/ctc/wstx/io/InputBootstrapper
    at com.ws.filewrite.fileWrite.fileWriteHDFS(fileWrite.java:21)
    at com.ws.main.listenerService.main(listenerService.java:21)
Caused by: java.lang.ClassNotFoundException: com.ctc.wstx.io.InputBootstrapper
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 2 more
Specifically, the exception is thrown at the line:

Configuration conf = new Configuration();

Am I missing something here? What is causing this problem? I am completely new to HDFS, so pardon me if this is an obvious problem.
Solution

Hadoop 2.9's dependencies are not the same as Hadoop 2.6's. I ran into the same situation and tried to track down the dependency jars by hand. That is difficult, and another jar may turn out to be missing the next time. So I use Maven to manage the dependencies instead. Just add these two dependencies and the problem will be solved:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.9.0</version>
    <!--<scope>provided</scope>-->
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.9.0</version>
</dependency>
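For what it's worth, the missing class from the stack trace, com.ctc.wstx.io.InputBootstrapper, lives in the Woodstox library, which Hadoop 2.9's Configuration apparently needs at runtime; with Maven it arrives automatically as a transitive dependency of hadoop-common. A quick way to confirm the classpath is complete (just a sketch; the class name is mine):

public class ClasspathCheck {
    public static void main(String[] args) throws Exception {
        //If this call does not throw ClassNotFoundException, the Woodstox jar
        //that Configuration needs at runtime is on the classpath.
        Class.forName("com.ctc.wstx.io.InputBootstrapper");
        System.out.println("Woodstox is on the classpath.");
    }
}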
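As for the other question in the post, about why hdfs dfs -put lands in /user/hduser: HDFS resolves relative paths against the user's home directory, /user/<username>, which is also the default working directory. A minimal sketch to see this (the class name and setup are mine, assuming the same cluster as in the question):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HomeDirCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
        //Relative paths (like the -put destination) resolve against this:
        System.out.println(fs.getHomeDirectory());
        //e.g. hdfs://localhost:9000/user/hduser
        System.out.println(fs.makeQualified(new Path("SomeRandomFile")));
        //e.g. hdfs://localhost:9000/user/hduser/SomeRandomFile
        fs.close();
    }
}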