HDFS: move multiple files using Java / Scala API

Question

I need to move multiple files in HDFS that match a given pattern, using a Java / Scala program. For example, I have to move all files named *.xml from folder a to folder b.

From the shell, I can use the following command:

bin/hdfs dfs -mv a/*.xml b/

I can move a single file through the Java API with the following code (in Scala), using the rename method of the FileSystem class:

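import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Prepare initial configuration
val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://hdfs:9000/user/root")
val fs = FileSystem.get(conf)
// Move a single file; rename returns false if the move did not happen
val ok = fs.rename(new Path("a/file.xml"), new Path("b/file.xml"))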

As far as I know, the Path class represents a URI, so I can't use it like this:

val ok = fs.rename(new Path("a/*.xml"), new Path("b/"));

Is there a way to move a set of files in HDFS via the Java / Scala API?

Answer

If you want to move the entire directory, you can use fs.rename(new Path(a), new Path(b)).

But if you only want the *.xml files, expand the pattern first with FileSystem.globStatus (Hadoop also provides filters such as GlobFilter), then rename each match:

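import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(arg0[0]), conf);
// arg0[0] is the base path, arg0[1] is a glob pattern, e.g. NYSE_201[2-3]
Path path = new Path(arg0[0] + arg0[1]);
// Assumption: arg0[2] holds the destination directory
Path target = new Path(arg0[2]);

// globStatus may return null, e.g. for a non-glob path that does not exist
FileStatus[] status = fs.globStatus(path);
Path[] paths = FileUtil.stat2Paths(status); // null if status is null
if (paths != null) {
    for (Path p : paths) {
        // Move each matched source into the target directory, keeping its name
        if (!fs.rename(p, new Path(target, p.getName()))) {
            System.err.println("Failed to move " + p);
        }
    }
}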
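
To try the "expand the glob, then move each match into the target directory keeping its file name" loop without a cluster, here is a minimal sketch of the same pattern on the local filesystem. It is only an analogue: java.nio.file's Files.newDirectoryStream(dir, glob) and Files.move stand in for fs.globStatus and fs.rename, and the class and method names (MoveMatching, moveMatching) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class MoveMatching {
    /**
     * Moves every entry of sourceDir whose file name matches the glob
     * (e.g. "*.xml") into targetDir, keeping the file name.
     * Returns the new locations. Local-filesystem analogue of the
     * globStatus + rename loop used for HDFS.
     */
    public static List<Path> moveMatching(Path sourceDir, String glob, Path targetDir)
            throws IOException {
        Files.createDirectories(targetDir);

        // Collect the matches first so the directory is not modified while it is being listed
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(sourceDir, glob)) {
            for (Path p : stream) {
                matches.add(p);
            }
        }

        List<Path> moved = new ArrayList<>();
        for (Path p : matches) {
            Path dest = targetDir.resolve(p.getFileName());
            Files.move(p, dest); // throws if dest already exists
            moved.add(dest);
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway a/ directory with two .xml files and one other file
        Path root = Files.createTempDirectory("move-demo");
        Path a = Files.createDirectories(root.resolve("a"));
        Path b = root.resolve("b");
        Files.createFile(a.resolve("one.xml"));
        Files.createFile(a.resolve("two.xml"));
        Files.createFile(a.resolve("notes.txt"));

        List<Path> moved = moveMatching(a, "*.xml", b);
        System.out.println(moved.size() + " files moved"); // prints "2 files moved"
    }
}
```

Collecting the matches before moving them keeps the listing and the moves separate; the HDFS version gets the same effect because globStatus returns a complete array before the rename loop starts.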
