Decompressing LZ4 compressed data in Spark


Question

I have LZ4 compressed data in HDFS and I'm trying to decompress it in Apache Spark into an RDD. As far as I can tell, the only method in JavaSparkContext to read data from HDFS is textFile, which only reads data as it is in HDFS. I have come across articles on CompressionCodec, but all of them explain how to compress output to HDFS, whereas I need to decompress what is already on HDFS.

I am new to Spark, so I apologize in advance if I missed something obvious or if my conceptual understanding is incorrect, but it would be great if someone could point me in the right direction.

Answer

Spark 1.1.0 supports reading LZ4 compressed files via sc.textFile. I got it working by using a Spark build that is built against a version of Hadoop that supports LZ4 (2.4.1 in my case).
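For example, reading such a file can be as simple as the sketch below. The class name and HDFS path are hypothetical, not from the original question; textFile delegates to Hadoop's TextInputFormat, which selects the decompression codec (here Lz4Codec) from the .lz4 file extension.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ReadLz4 {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("read-lz4");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // textFile transparently decompresses the input because Hadoop
        // matches the .lz4 extension to its LZ4 compression codec.
        JavaRDD<String> lines = sc.textFile("hdfs:///data/events.lz4");

        System.out.println(lines.count());
        sc.stop();
    }
}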

After that, I built native libraries for my platform as described in the Hadoop docs and linked them to Spark via the --driver-library-path option.

Without linking them, there were exceptions about the native LZ4 library not being loaded.
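A hedged example of such a launch, where the native library directory and the application class/jar names are assumptions rather than values from the original answer:

# Point the driver's library path at the directory holding Hadoop's
# native libraries (e.g. libhadoop.so).
spark-submit \
  --driver-library-path /opt/hadoop/lib/native \
  --class ReadLz4 \
  read-lz4.jar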

Depending on the Hadoop distribution you are using, the native library build step may be optional.
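One way to check, assuming a Hadoop version recent enough to ship the command, is hadoop checknative, which reports whether the local installation already provides native codec support (including LZ4):

# Lists the native libraries and codecs this Hadoop installation can load.
hadoop checknative -a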
