HDFS: move multiple files using Java / Scala API
Question
I need to move multiple files in HDFS that match a given regular expression, using a Java / Scala program. For example, I have to move all files named *.xml from folder a to folder b.
Using a shell command I can do the following:
bin/hdfs dfs -mv a/*.xml b/
I can move a single file using the Java API with the following code (written in Scala), using the rename method of the FileSystem class:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Prepare initial configuration
val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://hdfs:9000/user/root")
val fs = FileSystem.get(conf)
// Move a single file
val ok = fs.rename(new Path("a/file.xml"), new Path("b/file.xml"))
As far as I know, the Path class represents a URI, so I can't use it in the following way:
val ok = fs.rename(new Path("a/*.xml"), new Path("b/"));
Is there a way to move a set of files in HDFS via the Java / Scala API?
Recommended answer
You can use fs.rename(new Path(a), new Path(b)) to move a single path. But if you want to match *.xml, there are file filters such as GlobFilter, and a pattern can also be expanded into the matching paths with globStatus:
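// Needs java.net.URI and, from org.apache.hadoop.fs:
// FileStatus, FileSystem, FileUtil, Path
// arg0[0] is the base path
// arg0[1] is a glob pattern (not a regular expression), e.g. NYSE_201[2-3]
FileSystem fs = FileSystem.get(URI.create(arg0[0]), conf);
Path path = new Path(arg0[0] + arg0[1]);
// globStatus expands the pattern into the statuses of the matching files
FileStatus[] status = fs.globStatus(path);
Path[] paths = FileUtil.stat2Paths(status);
for (Path p : paths) {
    // loop over all the source paths;
    // the logic that renames each path with fs.rename still has to be implemented here
}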
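The loop body left open above could be filled in roughly as follows. This is only a sketch, not the answerer's code: the class name HdfsGlobMove and the helper moveMatching are hypothetical names introduced for illustration, the fs.defaultFS value is copied from the question, and the destination directory is assumed to exist already. It relies on FileSystem.globStatus to expand the pattern and on FileSystem.rename to move each match, keeping the original file name.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of Hadoop.
public class HdfsGlobMove {

    /**
     * Moves every path matching srcGlob into destDir (assumed to exist)
     * and returns the number of successful renames.
     */
    public static int moveMatching(FileSystem fs, Path srcGlob, Path destDir)
            throws IOException {
        // Expand the glob pattern; guard against a null result.
        FileStatus[] matches = fs.globStatus(srcGlob);
        if (matches == null) {
            return 0;
        }
        int moved = 0;
        for (FileStatus status : matches) {
            Path src = status.getPath();
            // Keep the file name and change only the parent directory.
            Path dst = new Path(destDir, src.getName());
            // rename reports failure by returning false, so count only successes.
            if (fs.rename(src, dst)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Value taken from the question; adjust for your cluster.
        conf.set("fs.defaultFS", "hdfs://hdfs:9000/user/root");
        FileSystem fs = FileSystem.get(conf);
        int moved = moveMatching(fs, new Path("a/*.xml"), new Path("b"));
        System.out.println("Moved " + moved + " file(s)");
    }
}
```

Because rename signals most failures through its boolean result rather than an exception, comparing the returned count with the number of expected files is a simple way to notice partial moves. From Scala the same calls can be used unchanged, for example by iterating over fs.globStatus(...) with foreach.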