Native snappy-compressed data emitted by Hadoop cannot be extracted by java-snappy


Problem description


After some processing with Spark, I store the result to a file using the Snappy codec, with this simple code:

 data.saveAsTextFile("/data/2014-11-29",classOf[org.apache.hadoop.io.compress.SnappyCodec])

After that, when I use Spark to read files from this folder, everything works perfectly! But today I tried to use java-snappy (version 1.1.1.2) on my PC to decompress a file from the result folder (the file is one of the files downloaded from this folder to my PC).

Maven dependency:

<dependency>
    <groupId>org.xerial.snappy</groupId>
    <artifactId>snappy-java</artifactId>
    <version>1.1.1.2</version>
</dependency>

I use this code to decompress it:

import java.io.File;
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.xerial.snappy.Snappy;

File fileIn = new File("E:\\dt\\part-00000.snappy");
File fileOut = new File("E:\\dt\\adv1417971604684.dat");
FileOutputStream fos = new FileOutputStream(fileOut, true);
byte[] fileBytes = Files.readAllBytes(Paths.get(fileIn.getPath()));
byte[] fileBytesOut = Snappy.uncompress(fileBytes); // fails with FAILED_TO_UNCOMPRESS
fos.write(fileBytesOut);

But I immediately get this error:

java.io.IOException: FAILED_TO_UNCOMPRESS(5)
    at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
    at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
    at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:444)
    at org.xerial.snappy.Snappy.uncompress(Snappy.java:480)
    at org.xerial.snappy.Snappy.uncompress(Snappy.java:456)
    at ...

On the Spark cluster we use:

Spark 1.1.0 and Hadoop 2.5.1 (with native Hadoop snappy)

Here is the result when I run hadoop checknative -a:

    14/12/09 16:16:57 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2    library system-native
14/12/09 16:16:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/local/hadoop/hadoop2.5.1/lib/native/libhadoop.so
zlib:   true /lib64/libz.so.1
snappy: true /usr/local/hadoop/hadoop2.5.1/lib/native/libsnappy.so.1
lz4:    true revision:99
bzip2:  true /lib64/libbz2.so.1

I downloaded and built native snappy from:

https://code.google.com/p/snappy/ and the source from: https://drive.google.com/file/d/0B0xs9kK-b5nMOWIxWGJhMXd6aGs/edit?usp=sharing

Can someone please explain this strange error? Is there a difference between the data Hadoop produces with native snappy compression and what java-snappy expects?

Solution

I am the developer of snappy-java. Hadoop's SnappyCodec does not exactly follow Snappy's format specification: https://code.google.com/p/snappy/source/browse/trunk/format_description.txt

SnappyCodec in Hadoop extends this format to compress large data streams. Data is split into blocks (via BlockCompressionStream), and each block has a small header followed by the compressed data. To read the compressed data with the Snappy.uncompress method, you need to extract each block and remove its header, as in the sketch below.
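As an illustration, here is a minimal decompression sketch, assuming the block layout written by Hadoop's BlockCompressorStream: each block begins with the uncompressed length of the block as a 4-byte big-endian int, followed by one or more chunks of [4-byte compressed length][Snappy-compressed bytes]. The file paths and class name are made up for the example, and the exact framing should be verified against your Hadoop version:

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.xerial.snappy.Snappy;

public class HadoopSnappyReader {
    public static void main(String[] args) throws Exception {
        try (DataInputStream in = new DataInputStream(
                 new FileInputStream("E:\\dt\\part-00000.snappy"));
             FileOutputStream out = new FileOutputStream("E:\\dt\\part-00000.txt")) {
            while (true) {
                int uncompressedBlockLen;
                try {
                    // Block header: uncompressed size of the whole block (big-endian int).
                    uncompressedBlockLen = in.readInt();
                } catch (EOFException e) {
                    break; // clean end of file: no more blocks
                }
                int produced = 0;
                while (produced < uncompressedBlockLen) {
                    // Chunk header: compressed size of the next Snappy chunk.
                    int compressedChunkLen = in.readInt();
                    byte[] compressed = new byte[compressedChunkLen];
                    in.readFully(compressed);
                    // Each chunk is a plain Snappy unit, so Snappy.uncompress works on it.
                    byte[] raw = Snappy.uncompress(compressed);
                    out.write(raw);
                    produced += raw.length;
                }
            }
        }
    }
}

Alternatively, if the Hadoop libraries (with their native snappy) are available on the machine, letting org.apache.hadoop.io.compress.SnappyCodec create the decompressing input stream avoids parsing this framing by hand.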
