How to read CDATA from an XML file with Python

Views: 480 Published: 2020/9/30 1:14:53 Tags: python xml cdata

Problem description

I am trying to parse a large XML file with Python, but when I try to print the CDATA content, nothing appears, in particular for the "content" tag that holds the description.

My source code is as follows:

#!/usr/bin/python
# -*- coding: utf-8 -*-  
import xml.sax
import re
from cStringIO import StringIO

class MovieHandler( xml.sax.ContentHandler ):
   def __init__(self):
      self.item = {}
      self.CurrentData = ""
      self.url = ""
      self.description = ""
      self.price = ""



   # Call when an element starts
   def startElement(self, tag, attributes):
      self.CurrentData = tag

   # Call when an element ends
   def endElement(self, tag):
      if self.CurrentData == "url":
          self.item["url"] = self.url
      elif self.CurrentData == "content":
          print 'description: ', self.description
      elif self.CurrentData == "price":
          if self.price:
              self.price = re.sub('[^0-9]','',self.price[0].encode('ascii', 'ignore'))
              self.item["price"] = int(self.price)

      self.CurrentData = ""
      print self.item
      self.item.clear()

   # Call when a character is read
   def characters(self, content):
      if self.CurrentData == "url":
         self.url = content
      elif self.CurrentData == "content":
         self.description = content
      elif self.CurrentData == "price":
         self.price = content


if ( __name__ == "__main__"):

   # create an XMLReader
   parser = xml.sax.make_parser()
   # turn off namespaces
   parser.setFeature(xml.sax.handler.feature_namespaces, 0)

   # override the default ContentHandler
   Handler = MovieHandler()
   parser.setContentHandler(Handler)

   parser.parse("myfile.xml")
   print "done"

The content tag looks like this:

<content><![CDATA[Jaguar XKR 
new tires 
perfect condition 
Black LeatherInterior]]></content>

Thanks in advance.

Recommended answer

The .characters() function can be called several times, each time with a fragment of the text. You seem to be overwriting self.description with each call, so only the last fragment survives.
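
To observe the fragmentation directly, here is a minimal sketch written for Python 3 (the code in the question is Python 2). The class name ChunkRecorder and the inline sample document are made up for illustration. How the text gets split depends on the underlying parser, so no particular number of fragments is assumed.

```python
import xml.sax


class ChunkRecorder(xml.sax.ContentHandler):
    """Hypothetical handler that stores each fragment passed to characters()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may call this several times for a single element
        self.chunks.append(content)


# Sample document made up for illustration, modeled on the question's content tag
SAMPLE = b"<content><![CDATA[Jaguar XKR\nnew tires\nperfect condition]]></content>"

recorder = ChunkRecorder()
xml.sax.parseString(SAMPLE, recorder)

print(recorder.chunks)           # the individual fragments as delivered
print("".join(recorder.chunks))  # the complete text once the fragments are joined
```

Joining the recorded fragments recovers the full CDATA text, whereas keeping only the most recent fragment does not.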

Try the following:

def characters(self, content):
    ...
    self.description += content  # Note: '+=', not '='
    ...

And remember to set self.description = "" when you are done with it.
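
Putting both pieces of advice together, a handler might look like the sketch below. It is written for Python 3 and is only an illustration under stated assumptions, not the asker's final code: the class name ItemHandler, the wrapping item and items elements, the example.com URL, and the sample values are all hypothetical. The buffer is cleared when an element starts, appended to in characters(), and consumed when the element ends.

```python
import re
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    """Hypothetical handler that accumulates text per element."""

    def __init__(self):
        super().__init__()
        self.items = []   # finished records
        self.item = {}    # record currently being built
        self.buffer = []  # text fragments of the current element

    def startElement(self, tag, attributes):
        # Start with an empty buffer for every new element
        self.buffer = []

    def characters(self, content):
        # Append rather than overwrite: text may arrive in pieces
        self.buffer.append(content)

    def endElement(self, tag):
        text = "".join(self.buffer).strip()
        if tag == "url":
            self.item["url"] = text
        elif tag == "content":
            self.item["description"] = text
        elif tag == "price":
            digits = re.sub(r"[^0-9]", "", text)
            if digits:
                self.item["price"] = int(digits)
        elif tag == "item":
            # Assumed wrapper element: one record per <item>
            self.items.append(self.item)
            self.item = {}
        # Reset so whitespace between elements is not carried over
        self.buffer = []


# Sample document made up for illustration
SAMPLE = b"""<items>
  <item>
    <url>http://example.com/1</url>
    <content><![CDATA[Jaguar XKR
new tires
perfect condition
Black LeatherInterior]]></content>
    <price>$12,500</price>
  </item>
</items>"""

handler = ItemHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.items)
```

Collecting fragments in a list and joining them once in endElement avoids repeated string concatenation, which may matter for a large file; using += on a string as in the answer works as well.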
