Read AVRO file using Python

Problem description

I have an AVRO file (created by Java) that seems to be some kind of compressed file for Hadoop/MapReduce. I want to "unzip" (deserialize) it to a flat file, one record per row.

I learned that there is an AVRO package for Python, and I installed it correctly. Then I ran the example to read the AVRO file. However, it raised the error below, and I am wondering what is going wrong with even the simplest example. Can anyone help me interpret it?

>>> reader = DataFileReader(open("/tmp/Stock_20130812104524.avro", "r"), DatumReader())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../python2.7/site-packages/avro/datafile.py", line 240, in __init__
    raise DataFileException('Unknown codec: %s.' % self.codec)
avro.datafile.DataFileException: Unknown codec: snappy.
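In the Python 2 avro package that produced this traceback, datafile.py only adds "snappy" to its list of valid codecs when "import snappy" succeeds (the snappy module comes from the python-snappy distribution). If that import fails, any snappy-compressed file is rejected with exactly this "Unknown codec" error. A small diagnostic sketch for checking that from the same interpreter (has_module is a hypothetical helper name, not part of avro):

```python
import importlib


def has_module(name):
    """Return True if the module `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# The avro package only enables the snappy codec when this import succeeds.
print("snappy importable:", has_module("snappy"))
```

If this prints False, the fix is to install python-snappy into the same Python environment that runs the avro code.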

By the way, if I run head on the file, or open its first few lines in vi, I can see the schema definition together with some weird characters, probably the compressed content. The start of the raw AVRO file looks like this:

Obj^A^D^Tavro.codec^Lsnappy^Vavro.schemaØ${"type":"record","name":"Stoc...
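These bytes follow the Avro object container file layout: a 4-byte magic "Obj" plus 0x01 (shown as ^A), then a metadata map in which every count and length is a zigzag-encoded varint (^D = 2 entries, ^T = a 10-byte key "avro.codec", ^L = a 6-byte value "snappy", ^V = an 11-byte key "avro.schema"). If the page rendered the next two bytes "Ø$" correctly as 0xD8 0x24, they decode to a 2348-byte schema string. In other words, the writer's schema is stored in the file itself. The following is a minimal stdlib-only sketch for decoding that header to confirm which codec a file uses; read_long, read_bytes and read_avro_header are hypothetical helper names, not functions from the avro package:

```python
import io


def read_long(stream):
    """Decode one Avro long: a zigzag-encoded, little-endian base-128 varint."""
    shift = 0
    result = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of data while reading a varint")
        b = ord(byte)
        result |= (b & 0x7F) << shift
        if not b & 0x80:  # high bit clear: this was the last byte
            break
        shift += 7
    # Undo zigzag encoding: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    return (result >> 1) ^ -(result & 1)


def read_bytes(stream):
    """Decode Avro bytes/string: a long length followed by that many bytes."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated bytes field")
    return data


def read_avro_header(stream):
    """Return the metadata map stored in an Avro object container file header."""
    if stream.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the map
            break
        if count < 0:  # negative count: the block's byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


# A hand-built header with the same codec entry as the file in the question
sample = io.BytesIO(b"Obj\x01\x02\x14avro.codec\x0csnappy\x00")
print(read_avro_header(sample))  # {'avro.codec': b'snappy'} on Python 3
```

To inspect the real file, pass read_avro_header an open("Stock_20130812104524.avro", "rb") handle and look at the "avro.codec" and "avro.schema" entries.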

I don't know whether a schema is needed to read the AVRO file, with something like the following:

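import avro.schema
from avro.datafile import DataFileReader
from avro.io import DatumReader

schema = avro.schema.parse(open("schema").read())  # optional reader schema
reader = DataFileReader(open("Stock_20130812104524.avro", "rb"), DatumReader(readers_schema=schema))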

Thanks.

Recommended answer

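The "Unknown codec: snappy" error means the file was written with the Snappy codec and the avro package cannot import the snappy module, which is provided by python-snappy. On macOS, python-snappy cannot be installed without the Xcode command line tools, because it compiles native code. You can check whether they are installed by typing gcc at the command prompt. If they are not, run xcode-select --install to install them. After that, installing python-snappy should work. Thanks, Bin!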
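Once python-snappy imports cleanly, the original goal (one record per row in a flat file) can be sketched as below. This is a minimal Python 3 sketch, not the answerer's code. records_to_csv and avro_to_csv are hypothetical names, CSV is assumed as the flat-file format, and records are assumed to be flat (nested fields would be written as their string form). DataFileReader(f, DatumReader()) is called without a schema because the writer's schema is read from the file header.

```python
import csv
import io


def records_to_csv(records, out_file):
    """Write an iterable of dict records as CSV, one record per row.

    The header is taken from the first record's keys; all records in one
    Avro file share the same schema, so the keys should not change.
    """
    writer = None
    for record in records:
        if writer is None:
            writer = csv.DictWriter(out_file, fieldnames=list(record.keys()))
            writer.writeheader()
        writer.writerow(record)


def avro_to_csv(avro_path, csv_path):
    """Deserialize an Avro container file into a CSV file (Python 3).

    Assumes the avro package is installed and, for snappy-compressed files,
    that python-snappy is importable too.
    """
    # Imported here so records_to_csv can be used without avro installed
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    with open(avro_path, "rb") as f, open(csv_path, "w", newline="") as out:
        # No schema argument: the writer's schema is read from the file header
        reader = DataFileReader(f, DatumReader())
        try:
            records_to_csv(reader, out)
        finally:
            reader.close()


# Usage with in-memory records (no Avro file needed):
buf = io.StringIO()
records_to_csv([{"symbol": "ABC", "price": 1.5}], buf)
print(buf.getvalue())
```

For the file in the question, the call would be avro_to_csv("Stock_20130812104524.avro", "Stock_20130812104524.csv"). Splitting the CSV writing from the Avro reading keeps the record-to-row logic testable without an Avro file.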