Upload data to HDFS with Java API


Problem description

I've searched for some time now and none of the solutions seem to work for me.

Pretty straightforward - I want to upload data from my local file system to HDFS using the Java API. The Java program will run on a host that has already been configured to talk to a remote Hadoop cluster through the shell (e.g. hdfs dfs -ls, etc.).

I have included the below dependencies in my project:

hadoop-core:1.2.1
hadoop-common:2.7.1
hadoop-hdfs:2.7.1

I have code that looks like the following:

 File localDir = ...;
 File hdfsDir = ...;
 Path localPath = new Path(localDir.getCanonicalPath());
 Path hdfsPath = new Path(hdfsDir.getCanonicalPath());
 Configuration conf = new Configuration();
 conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
 conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
 FileSystem fs = FileSystem.get(conf);
 fs.copyFromLocalFile(localPath, hdfsPath);

The local data is not being copied to the Hadoop cluster, but no errors are reported and no exceptions are thrown. I've enabled TRACE logging for the org.apache.hadoop package. I see the following output:

 DEBUG Groups:139 -  Creating new Groups object
 DEBUG Groups:139 -  Creating new Groups object
 DEBUG Groups:59 - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
 DEBUG Groups:59 - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
 DEBUG UserGroupInformation:147 - hadoop login
 DEBUG UserGroupInformation:147 - hadoop login
 DEBUG UserGroupInformation:96 - hadoop login commit
 DEBUG UserGroupInformation:96 - hadoop login commit
 DEBUG UserGroupInformation:126 - using local user:UnixPrincipal: willra05
 DEBUG UserGroupInformation:126 - using local user:UnixPrincipal: willra05
 DEBUG UserGroupInformation:558 - UGI loginUser:<username_redacted>
 DEBUG UserGroupInformation:558 - UGI loginUser:<username_redacted>
 DEBUG FileSystem:1441 - Creating filesystem for file:///
 DEBUG FileSystem:1441 - Creating filesystem for file:///
 DEBUG FileSystem:1290 - Removing filesystem for file:///
 DEBUG FileSystem:1290 - Removing filesystem for file:///
 DEBUG FileSystem:1290 - Removing filesystem for file:///
 DEBUG FileSystem:1290 - Removing filesystem for file:///

Can anyone assist me in resolving this issue?

EDIT 1: (09/15/2015)

I've removed 2 of the Hadoop dependencies - I'm only using one now:

hadoop-core:1.2.1

My code is now the following:

File localDir = ...;
File hdfsDir = ...;
Path localPath = new Path(localDir.getCanonicalPath());
Path hdfsPath = new Path(hdfsDir.getCanonicalPath());
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(localPath, hdfsPath);

I was previously executing my application with the following command:

$ java -jar <app_name>.jar <app_arg1> <app_arg2> ...

Now I'm executing it with this command:

$ hadoop jar <app_name>.jar <app_arg1> <app_arg2> ...

With these changes, my application now interacts with HDFS as intended. To my knowledge, the hadoop jar command is meant only for MapReduce jobs packaged as an executable jar, but these changes did the trick for me.
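For reference, the difference seems to be that hadoop jar puts the cluster's configuration directory and the Hadoop jars on the classpath, so the Configuration picks up the cluster's fs.defaultFS instead of defaulting to the local file system. Below is a rough, untested sketch of loading that configuration explicitly so a plain java -jar invocation could also work; the /etc/hadoop/conf location is an assumption and depends on the installation:

// Untested sketch: load the cluster configuration explicitly instead of
// relying on "hadoop jar" to put it on the classpath.
// The /etc/hadoop/conf location is an assumption.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

FileSystem fs = FileSystem.get(conf);      // now resolves to the cluster's HDFS
fs.copyFromLocalFile(localPath, hdfsPath); // same copy call as above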

Solution

I'm not sure about the approach you are following, but below is one way data can be uploaded to HDFS using the Java libraries:

//imports required
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

//some class here .....
Configuration conf = new Configuration();
conf.set("fs.defaultFS", <hdfs write endpoint>);
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(<src>, <dst>);

Also, if you have the Hadoop conf XMLs locally, you can include them in your classpath. The HDFS details will then be picked up automatically at runtime, and you will not need to set "fs.defaultFS". If you are running against an old HDFS version, you might need to use "fs.default.name" instead of "fs.defaultFS". If you are not sure of the HDFS endpoint, it is usually the HDFS NameNode URL. Here is an example from a previous, similar question: copying directory from local system to hdfs java code
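To make that concrete, here is a self-contained sketch; the NameNode URL and file paths below are made-up placeholders, not values from your setup:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUploadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode URL - replace with your cluster's fs.defaultFS
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        FileSystem fs = FileSystem.get(conf);
        // Hypothetical paths: copy a local file into an HDFS location
        fs.copyFromLocalFile(new Path("/tmp/data.txt"), new Path("/user/someuser/data.txt"));
        fs.close();
    }
}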
