Spark: saveAsTextFile without compression


Question

By default, newer versions of Spark use compression when saving text files. For example:

val txt = sc.parallelize(List("Hello", "world", "!"))
txt.saveAsTextFile("/path/to/output")

will create files in .deflate format. It's quite easy to change the compression algorithm, e.g. to gzip:

import org.apache.hadoop.io.compress._
val txt = sc.parallelize(List("Hello", "world", "!"))
txt.saveAsTextFile("/path/to/output", classOf[GzipCodec])

But is there a way to save an RDD as plain text files, i.e. without any compression?

Answer

With the following code, I can see the text files in HDFS without any compression.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local").setAppName("App name")
val sc = new SparkContext(conf)
// Disable output compression for files written through the Hadoop OutputFormat
sc.hadoopConfiguration.set("mapred.output.compress", "false")
val txt = sc.parallelize(List("Hello", "world", "!"))
txt.saveAsTextFile("hdfs/path/to/save/file")

You can set any Hadoop-related property this way, via hadoopConfiguration on sc.
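
As a side note beyond the original answer: on Hadoop 2.x and later, the mapred.* keys are the deprecated names, so a minimal sketch that covers both property generations sets the newer mapreduce.* equivalent as well.

// Old (mapred API) key, as used in the answer above
sc.hadoopConfiguration.set("mapred.output.compress", "false")
// Newer (mapreduce API) equivalent on Hadoop 2.x+
sc.hadoopConfiguration.set("mapreduce.output.fileoutputformat.compress", "false")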

Verified with Spark 1.5.2 (Scala 2.11).
