Decompressing a Hadoop Snappy File
Question
So I'm having some issues decompressing a snappy file from HDFS. If I use hadoop fs -text
I am able to uncompress and output the file just fine. However, if I use hadoop fs -copyToLocal
and try to uncompress the file with python-snappy, I get
snappy.UncompressError: Error while decompressing: invalid input
My python program is very simple and looks like this:
import snappy

with open(snappy_file, "rb") as input_file:
    data = input_file.read()

uncompressed = snappy.uncompress(data)
print(uncompressed)
This fails miserably for me. So I tried another test: I took the output from hadoop fs -text
, compressed it using the python-snappy library, and wrote it out to a file. I was then able to read that file back in and uncompress it just fine.
AFAIK snappy is backwards compatible between versions. My python code is using the latest snappy version, and I'm guessing hadoop is using an older one. Could this be the problem? Or is there something else I'm missing here?
Answer
Okay, I figured it out. It turns out I was using raw-mode decompression on a file that was compressed with Hadoop's framing format. Even when I tried the StreamDecompressor in 0.5.1, it still failed with a framing error: python-snappy 0.5.1 defaults to the new snappy framing format and therefore can't decompress Hadoop snappy files.
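As an aside, one way to tell the two containers apart up front: files in the snappy framing format begin with a fixed stream-identifier chunk, while Hadoop's block format has no magic bytes and simply opens with a big-endian length word. The sniffing helper below is a hypothetical sketch for illustration, not part of the python-snappy API:

```python
# Per the snappy framing-format spec, every stream starts with the
# stream-identifier chunk: type 0xff, 3-byte little-endian length 6, "sNaPpY".
STREAM_IDENTIFIER = b"\xff\x06\x00\x00sNaPpY"

def looks_like_snappy_framing(data: bytes) -> bool:
    """True if the buffer starts with the framing-format preamble.

    Hadoop's block format has no magic bytes (it opens with a big-endian
    uncompressed-length word), so False doesn't identify the format --
    it only rules the framing format out.
    """
    return data.startswith(STREAM_IDENTIFIER)
```

A Hadoop-compressed .snappy file fails this check, which is a quick hint that raw-mode or framing-format decompression won't work on it.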
It turns out that the master version, 0.5.2, has added support for the Hadoop framing format. Once I built and imported it, I was able to decompress the file easily:
import snappy

with open(snappy_file, "rb") as input_file:
    data = input_file.read()

decompressor = snappy.hadoop_snappy.StreamDecompressor()
uncompressed = decompressor.decompress(data)
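For reference, the container that StreamDecompressor handles can be sketched with just the standard library: Hadoop's BlockCompressorStream writes, per block, a 4-byte big-endian uncompressed length followed by one or more chunks of [4-byte big-endian compressed length, raw snappy bytes]. The walker below is a hypothetical helper, simplified to assume one chunk per block; each chunk it yields is what a raw-mode snappy decompressor would accept:

```python
import struct

def iter_hadoop_chunks(data: bytes):
    """Yield the raw snappy-compressed chunks from a Hadoop block-compressed buffer.

    Simplified sketch of the container walk (assumes one chunk per block);
    the real parsing lives in python-snappy's hadoop_snappy module.
    """
    pos = 0
    while pos < len(data):
        # 4-byte big-endian uncompressed length of the block (not needed to walk)
        (_uncompressed_len,) = struct.unpack_from(">I", data, pos)
        pos += 4
        # 4-byte big-endian length of the compressed chunk that follows
        (chunk_len,) = struct.unpack_from(">I", data, pos)
        pos += 4
        yield data[pos:pos + chunk_len]
        pos += chunk_len
```

This also makes the original error obvious: a raw-mode decompressor sees those leading length words instead of snappy data and reports "invalid input".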
Now the only issue is that this version hasn't been released on pip yet, so I guess I'll have to wait for the release or just build from source.