Saving an 'lxml.etree._ElementTree' object
Question
I've spent the last couple of days getting to grips with the basics of lxml, in particular using lxml.html to parse websites and create an ElementTree of the content. Ideally, I want to save the returned ElementTree so that I can load it up and experiment with it without having to parse the website every time I modify my script. I assumed that pickling would be the way to go, but I'm now beginning to wonder. Although I am able to retrieve an ElementTree object after pickling...
type(myObject)
returns
<class 'lxml.etree._ElementTree'>
the object itself appears to be 'empty', since none of the subsequent method/attribute calls I make on it yield any output.
My guess is that pickling isn't appropriate here, but can anyone suggest an alternative?
(In case it matters, the above is happening with Python 3.2 and lxml 2.3.2 on Snow Leopard.)
Recommended answer
You are already dealing with XML, and lxml is great at parsing XML. So I think the simplest thing to do would be to serialize to XML:
To write to a file:
import lxml.etree as ET
filename = '/tmp/test.xml'
myobject.write(filename)
To call the write method, note that myobject must be an lxml.etree._ElementTree. If it is an lxml.etree._Element, then you would need myobject.getroottree().write(filename).
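As a minimal sketch of the distinction above, the helper below accepts either kind of object and writes it to disk. The name save_tree, the sample document, and the temporary path are hypothetical and exist only for illustration; the lxml calls used (iselement, getroottree, write, parse) are the library's own.

```python
import os
import tempfile

import lxml.etree as ET


def save_tree(obj, filename):
    """Write obj to filename, whether it is an _Element or an _ElementTree."""
    # An _Element has no write() method, so go through its owning tree;
    # an _ElementTree can be written directly.
    tree = obj.getroottree() if ET.iselement(obj) else obj
    tree.write(filename)


# Hypothetical sample document, built in memory for illustration.
root = ET.fromstring("<page><title>hello</title></page>")  # an _Element
path = os.path.join(tempfile.mkdtemp(), "test.xml")

save_tree(root, path)                # works for an _Element
save_tree(root.getroottree(), path)  # and for an _ElementTree

reloaded = ET.parse(path)            # an _ElementTree again
title_text = reloaded.getroot().findtext("title")
```

Checking with iselement rather than catching an AttributeError keeps the intent explicit, though either approach should work.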
To parse from a file name/path, file object, or URL:
myobject = ET.parse(file_or_url)
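To parse from a string (note that ET.fromstring returns the root lxml.etree._Element rather than an _ElementTree):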
myobject = ET.fromstring(content)
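The two parsing routes return different types, which matters when you later call write. The sketch below, using a hypothetical in-memory sample in place of a real file or website, shows how to move between the two:

```python
import io

import lxml.etree as ET

content = b"<page><title>hello</title></page>"  # hypothetical sample input

element = ET.fromstring(content)      # returns the root _Element
tree = ET.parse(io.BytesIO(content))  # returns an _ElementTree

# Either form can be converted to the other.
tree_from_element = element.getroottree()
root_from_tree = tree.getroot()

# Serialize back to bytes, ready to be written out or parsed again.
serialized = ET.tostring(root_from_tree)
round_tripped = ET.fromstring(serialized)
```

Here io.BytesIO stands in for a file object; a file name or URL can be passed to ET.parse in the same position.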