Python: reporting line/column of origin of XML node

Problem description

I'm currently using xml.dom.minidom to parse some XML in Python. After parsing, I'm doing some reporting on the content, and would like to report the line (and column) where each tag started in the source XML document, but I don't see how that's possible.

I'd like to stick with xml.dom / xml.dom.minidom if possible, but if I need to use a SAX parser to get the origin info, I can do that. The ideal in that case would be to use SAX to track node location, but still end up with a DOM for my post-processing.

Any suggestions on how to do this? Hopefully I'm just overlooking something in the docs and this is extremely easy.

Solution

By monkeypatching the minidom content handler I was able to record line and column number for each node (as the 'parse_position' attribute). It's a little dirty, but I couldn't see any "officially sanctioned" way of doing it :) Here's my test script:

from xml.dom import minidom
import xml.sax

doc = """\
<File>
  <name>Name</name>
  <pos>./</pos>
</File>
"""


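def set_content_handler(dom_handler):
    # Wrap the DOM builder's startElementNS so each new element is tagged
    # with the parser position at the moment its start tag is reported.
    def startElementNS(name, tagName, attrs):
        orig_start_cb(name, tagName, attrs)
        # The element that was just created is on top of the handler's stack.
        cur_elem = dom_handler.elementStack[-1]
        cur_elem.parse_position = (
            parser._parser.CurrentLineNumber,
            parser._parser.CurrentColumnNumber
        )

    orig_start_cb = dom_handler.startElementNS
    dom_handler.startElementNS = startElementNS
    orig_set_content_handler(dom_handler)

# Intercept setContentHandler so the handler minidom installs gets patched.
parser = xml.sax.make_parser()
orig_set_content_handler = parser.setContentHandler
parser.setContentHandler = set_content_handler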

dom = minidom.parseString(doc, parser)
pos = dom.firstChild.parse_position
print("Parent: '{0}' at {1}:{2}".format(
    dom.firstChild.localName, pos[0], pos[1]))
for child in dom.firstChild.childNodes:
    # Text nodes (the whitespace between tags) have no localName; skip them.
    if child.localName is None:
        continue
    pos = child.parse_position
    print("Child: '{0}' at {1}:{2}".format(child.localName, pos[0], pos[1]))

It outputs the following:

Parent: 'File' at 1:0
Child: 'name' at 2:2
Child: 'pos' at 3:2
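
If you need this in more than one place, the same monkeypatch can be wrapped in a function so it doesn't depend on a module-level parser variable. This is only a sketch of the approach above: parse_with_positions and iter_elements are names I made up for illustration, and it still relies on undocumented internals (parser._parser on the expat-based SAX parser, and elementStack on the handler minidom installs), so it may break with another SAX driver or a future Python release.

```python
from xml.dom import minidom
import xml.sax


def parse_with_positions(xml_text):
    """Parse xml_text with minidom; each element gets a parse_position tuple.

    Hypothetical helper: it uses the same private attributes as the
    script above (parser._parser, dom_handler.elementStack).
    """
    parser = xml.sax.make_parser()
    orig_set_content_handler = parser.setContentHandler

    def set_content_handler(dom_handler):
        orig_start_cb = dom_handler.startElementNS

        def startElementNS(name, tagName, attrs):
            orig_start_cb(name, tagName, attrs)
            # The element just created is on top of the handler's stack.
            dom_handler.elementStack[-1].parse_position = (
                parser._parser.CurrentLineNumber,
                parser._parser.CurrentColumnNumber,
            )

        dom_handler.startElementNS = startElementNS
        orig_set_content_handler(dom_handler)

    # Shadow the bound method on this one parser instance only.
    parser.setContentHandler = set_content_handler
    return minidom.parseString(xml_text, parser)


def iter_elements(node):
    """Yield every element below node, in document order."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            yield child
            yield from iter_elements(child)


doc = """\
<File>
  <name>Name</name>
  <pos>./</pos>
</File>
"""

dom = parse_with_positions(doc)
positions = [(e.localName, e.parse_position) for e in iter_elements(dom)]
for name, (line, col) in positions:
    print("'{0}' at {1}:{2}".format(name, line, col))
```

For the sample document this should report the same line/column pairs as the script above, with the difference that it walks the whole tree instead of only the root's direct children.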
