Decompressing LZ4 compressed data in Spark
Question
I have LZ4-compressed data in HDFS and I'm trying to decompress it into an RDD in Apache Spark. As far as I can tell, the only method on JavaSparkContext for reading data from HDFS is textFile, which reads the data exactly as it is stored. I have come across articles on CompressionCodec, but all of them explain how to compress output written to HDFS, whereas I need to decompress what is already there.
I am new to Spark, so I apologize in advance if I missed something obvious or if my conceptual understanding is incorrect, but it would be great if someone could point me in the right direction.
Answer
Spark 1.1.0 supports reading LZ4-compressed files via sc.textFile. I got it working by using a Spark build compiled against a Hadoop version with LZ4 support (2.4.1 in my case).
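With such a build, the read itself is just an ordinary textFile call; Hadoop selects the codec from the `.lz4` file extension and decompresses transparently. A minimal sketch from spark-shell (the native-library and HDFS paths are assumptions, adjust for your cluster):

```shell
# Sketch only: launch spark-shell with the Hadoop native libraries on the
# driver's library path, then read the .lz4 file like any text file.
spark-shell --driver-library-path /opt/hadoop-2.4.1/lib/native <<'EOF'
// The ".lz4" extension triggers the registered codec, so this yields
// an RDD of already-decompressed lines.
val lines = sc.textFile("hdfs:///data/events.lz4")
lines.take(5).foreach(println)
EOF
```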
After that, I built the native libraries for my platform as described in the Hadoop docs and linked them to Spark via the --driver-library-path option.
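For a packaged job, the same linking can be passed on the spark-submit command line; a hedged sketch in which the jar name, main class, and native-library path are assumptions:

```shell
# Point the driver (and, via spark.executor.extraLibraryPath, the
# executors) at the directory containing the Hadoop native libraries.
spark-submit \
  --class com.example.Lz4Job \
  --master yarn \
  --driver-library-path /opt/hadoop-2.4.1/lib/native \
  --conf spark.executor.extraLibraryPath=/opt/hadoop-2.4.1/lib/native \
  target/lz4-job.jar hdfs:///data/events.lz4
```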
Without that linking, there were "native lz4 library not loaded" exceptions.
Depending on the Hadoop distribution you are using, the native library build step may be optional.
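Before building anything, you can check whether your Hadoop installation already ships usable native LZ4 support; `hadoop checknative` reports which native libraries were loaded:

```shell
# Prints one line per native library (hadoop, zlib, snappy, lz4, ...)
# marked true/false depending on whether it could be loaded.
hadoop checknative -a
```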