How to rename files in HDFS from Spark more efficiently?

Views: 503 Published: 2020/11/22 19:23:58 Tags: scala apache-spark hdfs


Problem description

I have 450K JSON files, and I want to rename them in HDFS based on certain rules. For the sake of simplicity, I just add the suffix .finished to each of them. I managed to do this with the following code:

import org.apache.hadoop.fs._

// sc is the SparkContext; pathToJson is the directory that holds the JSON files
val hdfs = FileSystem.get(sc.hadoopConfiguration)
val files = hdfs.listStatus(new Path(pathToJson))
val originalPath = files.map(_.getPath())

// Rename the files one at a time on the driver
for (i <- originalPath.indices) {
  hdfs.rename(originalPath(i), originalPath(i).suffix(".finished"))
}

But it takes 12 minutes to rename all of them. Is there a way to make it faster (perhaps by parallelizing)? I use Spark 1.6.0.
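
One way the parallelization mentioned above might look is sketched here. This is not taken from the original post. It assumes that most of the 12 minutes is spent waiting on one rename call after another, so issuing the calls from several driver threads could overlap that latency. The helper name renameAllWithSuffix and the parallelism value are hypothetical, and whether this is actually faster depends on the cluster.

```scala
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper (not part of Hadoop or Spark): appends `suffix` to every
// path, issuing the rename calls from `parallelism` threads.
// Returns the paths for which rename() reported failure (returned false).
def renameAllWithSuffix(fs: FileSystem,
                        paths: Seq[Path],
                        suffix: String,
                        parallelism: Int): Seq[Path] = {
  val pool = Executors.newFixedThreadPool(parallelism)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try {
    // One batch per thread keeps the number of Future objects small.
    val batchSize = math.max(1, math.ceil(paths.size.toDouble / parallelism).toInt)
    val batches = paths.grouped(batchSize).toList
    val futures = batches.map { batch =>
      Future {
        // Keep only the paths whose rename did not succeed.
        batch.filterNot(p => fs.rename(p, p.suffix(suffix)))
      }
    }
    // Block until every batch is done; an exception in any batch is rethrown here.
    Await.result(Future.sequence(futures), Duration.Inf).flatten
  } finally {
    pool.shutdown()
  }
}

// Possible use from spark-shell, reusing the names from the question
// (the thread count of 16 is an arbitrary assumption):
//   val hdfs = FileSystem.get(sc.hadoopConfiguration)
//   val originalPath = hdfs.listStatus(new Path(pathToJson)).map(_.getPath).toSeq
//   val failed = renameAllWithSuffix(hdfs, originalPath, ".finished", 16)
```

The sketch uses driver-side threads rather than Spark tasks because an HDFS rename is a metadata operation on the NameNode and moves no data, so there is nothing for the executors to process. The NameNode still handles every rename, so the gain from adding threads is likely to level off at some point.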
