Parsing a compressed XML feed into ElementTree

Problem description

I'm trying to parse the following feed into an ElementTree in Python: "http://smarkets.s3.amazonaws.com/oddsfeed.xml" (warning: large file)

Here is what I have tried so far:

feed = urllib.urlopen("http://smarkets.s3.amazonaws.com/oddsfeed.xml")

# feed is compressed
compressed_data = feed.read()
import StringIO
compressedstream = StringIO.StringIO(compressed_data)
import gzip
gzipper = gzip.GzipFile(fileobj=compressedstream)
data = gzipper.read()

# Parse XML
tree = ET.parse(data)

but it seems to hang on compressed_data = feed.read(), perhaps indefinitely. (I know it's a big file, but it takes far too long compared to other uncompressed feeds I've parsed, and a delay this long cancels out any bandwidth gains from the gzip compression in the first place.)

Next I tried requests, with

url = "http://smarkets.s3.amazonaws.com/oddsfeed.xml"
headers = {'accept-encoding': 'gzip, deflate'}
r = requests.get(url, headers=headers, stream=True)

but now both

tree=ET.parse(r.content)

and

tree=ET.parse(r.text)

raise exceptions.

What's the proper way to do this?

Recommended answer

The ET.parse function takes "a filename or file object containing XML data". You're giving it a string full of XML. It's going to try to open a file whose name is that big chunk of XML. There is probably no such file.
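To make the failure concrete, here is a minimal sketch in Python 3 (the question's code is Python 2). The tiny XML string is a hypothetical stand-in for the real feed. Because ET.parse treats a plain string as a path, it tries to open a file with that name and fails with an OSError subclass:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the decompressed feed contents.
xml_text = "<odds><event id='1'/></odds>"

try:
    # A string is interpreted as a filename, not as XML data.
    ET.parse(xml_text)
    outcome = "parsed"
except OSError as exc:
    # Typically FileNotFoundError, or "File name too long" for a big document.
    outcome = type(exc).__name__

print(outcome)
```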

You want the fromstring function, or the XML constructor.
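A minimal sketch of the fromstring route, written for Python 3 and assuming the whole compressed payload is already in memory as bytes. The helper name parse_gzipped_xml and the sample document are illustrative, not part of the original question:

```python
import gzip
import xml.etree.ElementTree as ET


def parse_gzipped_xml(compressed_data):
    """Decompress gzip bytes and parse the result into a root Element."""
    xml_bytes = gzip.decompress(compressed_data)
    # fromstring accepts the XML data itself, unlike parse.
    return ET.fromstring(xml_bytes)


# Hypothetical stand-in for the bytes returned by feed.read().
sample = gzip.compress(b"<odds><event id='1'/></odds>")
root = parse_gzipped_xml(sample)
print(root.tag)
```

Note that fromstring returns the root Element directly rather than an ElementTree, so there is no getroot() call.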

Or, if you prefer, you've already got a file object, gzipper; you could just pass that to parse instead of reading it into a string.
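The file-object route could look like the following sketch, again in Python 3. It wraps a compressed byte stream in gzip.GzipFile and hands that object straight to ET.parse, which reads from it in chunks. The helper name parse_gzipped_stream and the in-memory sample are assumptions for illustration:

```python
import gzip
import io
import xml.etree.ElementTree as ET


def parse_gzipped_stream(fileobj):
    """Parse gzip-compressed XML from a binary file-like object."""
    with gzip.GzipFile(fileobj=fileobj) as gzipper:
        # parse accepts any object with a read() method.
        return ET.parse(gzipper)


# Hypothetical stand-in for the compressed HTTP response body.
compressed = io.BytesIO(gzip.compress(b"<odds><event id='1'/></odds>"))
tree = parse_gzipped_stream(compressed)
root = tree.getroot()
print(root.tag)
```

This avoids holding a separate decompressed copy of the document as one large string before parsing, which may help with a feed of this size.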

This is covered in the short tutorial in the docs:


We can import this data by reading from a file:



import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()




Or directly from a string:



root = ET.fromstring(country_data_as_string)
