How to read CDATA from an XML file with Python
Question
I am trying to parse a large XML file with Python, but when I print the CDATA information, nothing is shown, especially for the "content" tag that holds the description.
My source code is below:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import xml.sax
import re
from cStringIO import StringIO

class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        self.item = {}
        self.CurrentData = ""
        self.url = ""
        self.description = ""
        self.price = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag

    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "url":
            self.item["url"] = self.url
        elif self.CurrentData == "content":
            print 'description: ', self.description
        elif self.CurrentData == "price":
            if self.price:
                self.price = re.sub('[^0-9]', '', self.price[0].encode('ascii', 'ignore'))
                self.item["price"] = int(self.price)
        self.CurrentData = ""
        print self.item
        self.item.clear()

    # Called when character data is read
    def characters(self, content):
        if self.CurrentData == "url":
            self.url = content
        elif self.CurrentData == "content":
            self.description = content
        elif self.CurrentData == "price":
            self.price = content

if __name__ == "__main__":
    # create an XMLReader
    parser = xml.sax.make_parser()
    # turn off namespaces
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    # override the default ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)
    parser.parse("myfile.xml")
    print "done"
The content tag looks like this:
<content><![CDATA[Jaguar XKR
new tires
perfect condition
Black LeatherInterior]]></content>
Thanks in advance.
Recommended answer
The .characters() function can be called several times for a single element, each time with a fragment of the text. You seem to be overwriting self.description with each call, so only the last fragment survives.
Try this:
def characters(self, content):
    ...
    self.description += content  # Note: '+=', not '='
    ...
And remember to reset self.description to an empty string (self.description = "", as in __init__) when you are done with it.
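As a minimal sketch of the whole pattern, written in Python 3 syntax rather than the Python 2 of the question, the handler below clears the buffer when a content element starts, appends in characters(), and reads the finished text only in endElement(). The class name ContentCollector, the descriptions list, the chunk_count counter, and the item wrapper element in the sample are assumptions made for illustration; they are not part of the original post.

```python
import xml.sax


class ContentCollector(xml.sax.ContentHandler):
    """Hypothetical helper: collects the full text of each <content> element."""

    def __init__(self):
        super().__init__()
        self.current_tag = ""
        self.description = ""
        self.descriptions = []
        self.chunk_count = 0  # number of characters() calls seen for <content>

    def startElement(self, tag, attributes):
        self.current_tag = tag
        if tag == "content":
            # Start from an empty buffer for each new <content> element.
            self.description = ""

    def characters(self, content):
        if self.current_tag == "content":
            # Append, because the parser may deliver the text in several pieces.
            self.description += content
            self.chunk_count += 1

    def endElement(self, tag):
        if tag == "content":
            # The text is complete only once the closing tag has been seen.
            self.descriptions.append(self.description)
        self.current_tag = ""


# Sample input based on the tag shown in the question; <item> is an assumed wrapper.
SAMPLE = b"""<item><content><![CDATA[Jaguar XKR
new tires
perfect condition
Black LeatherInterior]]></content></item>"""

handler = ContentCollector()
xml.sax.parseString(SAMPLE, handler)
print(handler.descriptions[0])
print("characters() calls:", handler.chunk_count)
```

How many times characters() is called depends on the parser, so the count printed here may differ between environments; the accumulated text should be the same either way.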