Decompressing a Hadoop Snappy File
Question
So I'm having some issues decompressing a snappy file from HDFS. If I use hadoop fs -text
I am able to uncompress and output the file just fine. However, if I use hadoop fs -copyToLocal
and try to uncompress the file with python-snappy, I get
snappy.UncompressError: Error while decompressing: invalid input
My python program is very simple and looks like this:
import snappy

with open(snappy_file, "rb") as input_file:
    data = input_file.read()

uncompressed = snappy.uncompress(data)
print(uncompressed)
This fails miserably for me. So I tried another test: I took the output from hadoop fs -text
, compressed it using the python-snappy library, and wrote it out to a file. I was then able to read that file back in and uncompress it just fine.
AFAIK snappy is backwards compatible between versions. My python code is using the latest snappy version, and I'm guessing hadoop is using an older one. Could this be the problem? Or is there something else I'm missing here?
Answer
Okay, I figured it out. It turns out I was using raw-mode decompression on a file that was compressed with Hadoop's framing format. Even when I tried the StreamDecompressor in 0.5.1, it still failed with a framing error: python-snappy 0.5.1 defaults to the new snappy framing format and therefore can't decompress Hadoop snappy files.
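As an aside, one way to tell the two containers apart up front: files in the snappy framing format begin with a fixed stream-identifier chunk, while Hadoop's block format has no magic bytes and simply opens with a big-endian length word. The sniffing helper below is a hypothetical sketch for illustration, not part of the python-snappy API:

```python
# Per the snappy framing-format spec, every stream starts with the
# stream-identifier chunk: type 0xff, 3-byte little-endian length 6, "sNaPpY".
STREAM_IDENTIFIER = b"\xff\x06\x00\x00sNaPpY"

def looks_like_snappy_framing(data: bytes) -> bool:
    """True if the buffer starts with the framing-format preamble.

    Hadoop's block format has no magic bytes (it opens with a big-endian
    uncompressed-length word), so False doesn't identify the format --
    it only rules the framing format out.
    """
    return data.startswith(STREAM_IDENTIFIER)
```

A Hadoop-compressed .snappy file fails this check, which is a quick hint that raw-mode or framing-format decompression won't work on it.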
It turns out that the master version, 0.5.2, has added support for the Hadoop framing format. Once I built and imported it, I was able to decompress the file easily:
import snappy

with open(snappy_file, "rb") as input_file:
    data = input_file.read()

decompressor = snappy.hadoop_snappy.StreamDecompressor()
uncompressed = decompressor.decompress(data)
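For reference, the container that StreamDecompressor handles can be sketched with just the standard library: Hadoop's BlockCompressorStream writes, per block, a 4-byte big-endian uncompressed length followed by one or more chunks of [4-byte big-endian compressed length, raw snappy bytes]. The walker below is a hypothetical helper, simplified to assume one chunk per block; each chunk it yields is what a raw-mode snappy decompressor would accept:

```python
import struct

def iter_hadoop_chunks(data: bytes):
    """Yield the raw snappy-compressed chunks from a Hadoop block-compressed buffer.

    Simplified sketch of the container walk (assumes one chunk per block);
    the real parsing lives in python-snappy's hadoop_snappy module.
    """
    pos = 0
    while pos < len(data):
        # 4-byte big-endian uncompressed length of the block (not needed to walk)
        (_uncompressed_len,) = struct.unpack_from(">I", data, pos)
        pos += 4
        # 4-byte big-endian length of the compressed chunk that follows
        (chunk_len,) = struct.unpack_from(">I", data, pos)
        pos += 4
        yield data[pos:pos + chunk_len]
        pos += chunk_len
```

This also makes the original error obvious: a raw-mode decompressor sees those leading length words instead of snappy data and reports "invalid input".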
Now the only issue is that this version hasn't been released on pip yet, so I guess I'll have to wait for the release or just build from source.