Specifying the output file name in Apache Spark
Question
I have a MapReduce job that I'm trying to migrate to PySpark. Is there any way of defining the name of the output file, rather than getting part-xxxxx?
In MR, I was using the org.apache.hadoop.mapred.lib.MultipleTextOutputFormat class to achieve this.
PS: I did try the saveAsTextFile() method. For example:
import re

lines = sc.textFile(filesToProcessStr)
counts = lines.flatMap(lambda x: re.split(r'[\s&]', x.strip()))
counts.saveAsTextFile("/user/itsjeevs/mymr-output")
This will create the same part-00000 files.
[13:46:25] [spark] $ hadoop fs -ls /user/itsjeevs/mymr-output/
Found 3 items
-rw-r----- 2 itsjeevs itsjeevs 0 2014-08-13 13:46 /user/itsjeevs/mymr-output/_SUCCESS
-rw-r--r-- 2 itsjeevs itsjeevs 101819636 2014-08-13 13:46 /user/itsjeevs/mymr-output/part-00000
-rw-r--r-- 2 itsjeevs itsjeevs 17682682 2014-08-13 13:46 /user/itsjeevs/mymr-output/part-00001
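A common workaround (an assumption about typical practice, not something stated in the question) is to let Spark write the part-NNNNN files as usual and rename them afterwards, e.g. with the HDFS FileSystem API or `hadoop fs -mv`. The helper below, `rename_part_files`, is a hypothetical local sketch of that idea using plain `os.rename`:

```python
import os
import tempfile

def rename_part_files(output_dir, new_name_fn):
    """Rename Hadoop-style part-NNNNN files using a caller-supplied
    naming function. Hypothetical helper, not a Spark or Hadoop API."""
    renamed = []
    for fname in sorted(os.listdir(output_dir)):
        if fname.startswith("part-"):
            index = int(fname.split("-")[1])       # "part-00001" -> 1
            new_name = new_name_fn(index)
            os.rename(os.path.join(output_dir, fname),
                      os.path.join(output_dir, new_name))
            renamed.append(new_name)
    return renamed

# Simulate a Spark output directory with two part files and a _SUCCESS marker.
out = tempfile.mkdtemp()
for i in range(2):
    with open(os.path.join(out, "part-%05d" % i), "w") as f:
        f.write("data\n")
open(os.path.join(out, "_SUCCESS"), "w").close()

result = rename_part_files(out, lambda i: "mymr-output-%d.txt" % i)
print(result)  # → ['mymr-output-0.txt', 'mymr-output-1.txt']
```

On a real cluster the same loop would go through `hadoop fs -mv` or the JVM FileSystem API rather than `os.rename`, but the naming logic is identical.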
Edit
Recently read the article (http://databricks.com/blog/2014/09/17/spark-1-1-bringing-hadoop-inputoutput-formats-to-pyspark.html) which would make life much easier for Spark users.
Answer
Spark is also using Hadoop under the hood, so you can probably get what you want. This is how saveAsTextFile is implemented:
def saveAsTextFile(path: String) {
  this.map(x => (NullWritable.get(), new Text(x.toString)))
    .saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path)
}
You could pass in a customized OutputFormat to saveAsHadoopFile. I have no idea how to do that from Python though. Sorry for the incomplete answer.
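For what it's worth, PySpark does expose saveAsHadoopFile on pair RDDs, and it accepts a Hadoop OutputFormat by fully qualified class name. The custom naming logic still has to live in a JVM class (for instance a subclass of MultipleTextOutputFormat overriding generateFileNameForKeyValue) that is already on the executor classpath; `com.example.CustomNameOutputFormat` below is a hypothetical placeholder, not a real class. A sketch under those assumptions, which needs a live SparkContext to actually run:

```python
# Hedged sketch: assumes a JVM-side OutputFormat subclass is on the classpath.
# `com.example.CustomNameOutputFormat` is a hypothetical placeholder class;
# the naming logic cannot be written in Python itself.
pairs = counts.map(lambda line: (None, line))  # saveAsHadoopFile expects (k, v) pairs
pairs.saveAsHadoopFile(
    "/user/itsjeevs/mymr-output",
    outputFormatClass="com.example.CustomNameOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.apache.hadoop.io.Text")
```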