Decrypting Hadoop Snappy File

Question

So I'm having some issues decompressing a Snappy file from HDFS. If I use hadoop fs -text, I am able to uncompress and output the file just fine. However, if I use hadoop fs -copyToLocal and try to uncompress the file with python-snappy, I get

snappy.UncompressError: Error while decompressing: invalid input

My Python program is very simple and looks like this:

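import snappy

snappy_file = "part-00000.snappy"  # hypothetical local path of the file copied from HDFS

with open(snappy_file, "rb") as input_file:  # binary mode: compressed data is bytes
    data = input_file.read()
    uncompressed = snappy.uncompress(data)  # raw (unframed) Snappy decompression
    print(uncompressed)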

This fails miserably for me. So I tried another test: I took the output from hadoop fs -text, compressed it using the python-snappy library, and wrote the result to a file. I was then able to read this file back in and uncompress it just fine.
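
One plausible reason the round trip works while the Hadoop file does not: a raw Snappy block (what snappy.compress produces and snappy.uncompress expects) begins with its uncompressed length encoded as a little-endian base-128 varint, per the Snappy format description. The sketch below is a hypothetical standard-library helper, not part of python-snappy, that reads that preamble. A file that starts with a different header, such as a 4-byte big-endian length (which, as I understand it, is how Hadoop's block stream begins), is misread as some unrelated length, which fits the "invalid input" error.

```python
from typing import Tuple


def read_varint_preamble(data: bytes) -> Tuple[int, int]:
    """Decode the little-endian base-128 varint that starts a raw Snappy block.

    Returns (uncompressed_length, bytes_consumed). Hypothetical helper for illustration.
    """
    result = 0
    shift = 0
    for i, byte in enumerate(data[:5]):  # a 32-bit length needs at most 5 varint bytes
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte & 0x80 == 0:  # high bit clear: this was the last varint byte
            return result, i + 1
        shift += 7
    raise ValueError("no valid varint preamble in the first 5 bytes")


# 300 is encoded as the two varint bytes 0xAC 0x02
print(read_varint_preamble(b"\xac\x02rest-of-block"))  # (300, 2)
# A 4-byte big-endian header for 300 reads as a completely different length
print(read_varint_preamble((300).to_bytes(4, "big")))  # (0, 1)
```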

AFAIK Snappy is backwards compatible between versions. My Python code is using the latest Snappy version, and I'm guessing Hadoop is using an older one. Could this be a problem, or is there something else I am missing here?

Answer

Okay, I figured it out. It turns out I was using raw-mode decompression on a file that had been compressed with Hadoop's framing format. Even when I tried the StreamDecompressor in 0.5.1, it still failed with a framing error: python-snappy 0.5.1 defaults to the newer Snappy framing format, and so it can't decompress Hadoop's Snappy files.
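
To make the distinction concrete: a stream in the Snappy framing format, which python-snappy 0.5.1's StreamDecompressor expects, always opens with a fixed stream-identifier chunk, whereas Hadoop's block format opens with big-endian length headers, so the framing check fails on it. The sketch below is a hypothetical heuristic (not part of python-snappy) that inspects those first bytes. The Hadoop check is only a plausibility test on the length fields and can give false positives; raw Snappy blocks have no magic bytes at all, so they come back as "unknown".

```python
# Stream identifier chunk that opens every stream in the Snappy framing format:
# chunk type 0xff, 3-byte little-endian length 6, then the ASCII bytes "sNaPpY".
FRAMING_STREAM_ID = b"\xff\x06\x00\x00sNaPpY"


def guess_snappy_container(data: bytes) -> str:
    """Heuristically guess which Snappy container a byte string uses.

    Returns "framing", "hadoop-block", or "unknown". Hypothetical helper.
    """
    if data.startswith(FRAMING_STREAM_ID):
        return "framing"
    if len(data) >= 8:
        # Assumed Hadoop block layout: 4-byte big-endian uncompressed length,
        # then a 4-byte big-endian compressed length for the first chunk.
        compressed_len = int.from_bytes(data[4:8], "big")
        if 0 < compressed_len <= len(data) - 8:
            return "hadoop-block"
    return "unknown"


print(guess_snappy_container(FRAMING_STREAM_ID + b"..."))  # framing
print(guess_snappy_container((5).to_bytes(4, "big") + (3).to_bytes(4, "big") + b"abc"))  # hadoop-block
print(guess_snappy_container(b"hello"))  # unknown
```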

It turns out that the master version, 0.5.2, has added support for the Hadoop framing format. Once I built it and imported it, I was able to decompress the file easily:

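import snappy.hadoop_snappy  # Hadoop framing support, added in python-snappy 0.5.2

with open(snappy_file, "rb") as input_file:  # binary mode: compressed data is bytes
    data = input_file.read()
    decompressor = snappy.hadoop_snappy.StreamDecompressor()
    uncompressed = decompressor.decompress(data)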
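
For the curious, here is roughly what Hadoop-framing support has to do. This is a standard-library sketch, not python-snappy's implementation. It assumes the layout of Hadoop's BlockCompressorStream as I understand it: each block is a 4-byte big-endian uncompressed length followed by one or more chunks, each a 4-byte big-endian compressed length plus that many bytes of raw Snappy data. The raw-block decompressor is passed in as a parameter; with python-snappy it would presumably be snappy.uncompress, while the usage line below substitutes an identity function purely so the framing logic can be shown without the library.

```python
import io
import struct
from typing import Callable


def hadoop_block_decompress(data: bytes, raw_decompress: Callable[[bytes], bytes]) -> bytes:
    """Decode an assumed Hadoop block-compressed Snappy layout (hypothetical helper)."""
    stream = io.BytesIO(data)
    out = bytearray()
    while True:
        header = stream.read(4)
        if not header:
            break  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated block header")
        (block_len,) = struct.unpack(">I", header)  # uncompressed size of this block
        produced = 0
        while produced < block_len:
            chunk_header = stream.read(4)
            if len(chunk_header) < 4:
                raise ValueError("truncated chunk header")
            (chunk_len,) = struct.unpack(">I", chunk_header)  # compressed size of this chunk
            chunk = stream.read(chunk_len)
            if len(chunk) < chunk_len:
                raise ValueError("truncated chunk data")
            piece = raw_decompress(chunk)  # each chunk is an independent raw Snappy block
            out += piece
            produced += len(piece)
        if produced != block_len:
            raise ValueError("block length mismatch")
    return bytes(out)


# Identity "decompressor" stands in for snappy.uncompress to show the framing only
fake = struct.pack(">I", 5) + struct.pack(">I", 3) + b"abc" + struct.pack(">I", 2) + b"de"
print(hadoop_block_decompress(fake, lambda b: b))  # b'abcde'
```

Splitting framing from raw decompression this way keeps the container logic testable on its own and mirrors the distinction the answer draws between raw mode and Hadoop's framing.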

Now the only issue is that this version isn't available as a pip release yet, so I guess I'll have to wait or just build from source.