Extracting text from XML node with minidom

Problem description

I've looked through several posts but I haven't quite found any answers that have solved my problem.

Sample XML:

<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/>
</TextWithNodes>

I understand that, if I had extracted TextWithNodes as a NodeList, I would usually do something like this:

nodeList = TextWithNodes[0].getElementsByTagName('Node')
for a in nodeList:
    node = a.nodeValue  # None for every element node
    print(node)

All I get is None. I've read that you must write a.childNodes.nodeValue, but the elements in the node list have no child nodes, since all the Node elements appear to be self-closing tags. If I use a.childNodes I get [].

When I get the node type for a, it is 1 (ELEMENT_NODE), whereas TEXT_NODE = 3. I'm not sure if that is helpful.

I want to extract TEXT1 and TEXT2.
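To see why a.nodeValue is None and a.childNodes is [], it helps to list the direct children of TextWithNodes together with their node types. A minimal sketch using the standard library's xml.dom.minidom; SAMPLE_XML and describe_children are illustrative names, not part of the question:

```python
from xml.dom import minidom, Node

# Sample XML from the question; SAMPLE_XML is an illustrative name.
SAMPLE_XML = """<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/>
</TextWithNodes>"""

doc = minidom.parseString(SAMPLE_XML)
root = doc.getElementsByTagName("TextWithNodes")[0]


def describe_children(parent):
    """Return (nodeType, label) pairs for each direct child of parent."""
    rows = []
    for child in parent.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:  # 1: the empty <Node/> elements
            rows.append((child.nodeType, "Node id=" + child.getAttribute("id")))
        elif child.nodeType == Node.TEXT_NODE:  # 3: the TEXTn strings and newlines
            rows.append((child.nodeType, repr(child.nodeValue)))
    return rows


for node_type, label in describe_children(root):
    print(node_type, label)
```

On the sample XML this reports alternating types 3 and 1: each Node element (ELEMENT_NODE) is empty, and each TEXTn string is its own text node (TEXT_NODE) sitting between the elements rather than inside them.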

Recommended answer

A solution with lxml right from the docs:

from io import StringIO
from lxml import etree

xml = etree.parse(StringIO('''<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/></TextWithNodes>'''))

# "//text()" selects every text node in the document, in document order
print(xml.xpath("//text()"))
# ['\n', 'TEXT1', 'TEXT2 ', 'TEXT3']
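For readers without lxml, the standard library's xml.etree.ElementTree (a swap-in here, not the library the answer uses) models the document the same way: text directly after an element's start tag is that element's text, and text after its end tag is its tail. A minimal sketch; all_text is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Same XML string as in the lxml example above.
SAMPLE_XML = """<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/></TextWithNodes>"""

root = ET.fromstring(SAMPLE_XML)


def all_text(element):
    """Collect every text fragment under element, in document order."""
    # itertext() yields element.text, then each descendant's text and tail.
    return list(element.itertext())


print(all_text(root))
# ['\n', 'TEXT1', 'TEXT2 ', 'TEXT3']

# The string after <Node id="19"/> is that element's tail, not its text.
print(repr(root.find(".//Node[@id='19']").tail))
# 'TEXT2 '
```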

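You can also extract the text that follows a specific node. lxml stores it in that element's tail (its text is None, because the element is empty):

xml.find(".//Node[@id='19']").tail  # 'TEXT2 '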

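The issue here is that the text in the XML doesn't belong to any Node element: each TEXTn is a separate text node sitting between the empty Node elements. That is why minidom gives None for nodeValue and [] for childNodes, and why lxml exposes these strings as element tails rather than as .text.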
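Since the strings are siblings of the Node elements rather than children, one way to stay with minidom, as the title asks, is to read each element's nextSibling. A minimal sketch, assuming each Node is followed by at most one text node before the next element; text_after_nodes and SAMPLE_XML are hypothetical names:

```python
from xml.dom import minidom, Node

# Sample XML from the question.
SAMPLE_XML = """<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/>
</TextWithNodes>"""


def text_after_nodes(xml_string):
    """Map each <Node> id to the stripped text that directly follows it.

    Nodes followed by no text, or only whitespace, are skipped.
    """
    doc = minidom.parseString(xml_string)
    result = {}
    for element in doc.getElementsByTagName("Node"):
        sibling = element.nextSibling
        # The text is a TEXT_NODE sibling of <Node/>, not one of its children.
        if sibling is not None and sibling.nodeType == Node.TEXT_NODE:
            text = sibling.nodeValue.strip()
            if text:
                result[element.getAttribute("id")] = text
    return result


print(text_after_nodes(SAMPLE_XML))
# {'0': 'TEXT1', '19': 'TEXT2', '20': 'TEXT3'}
```

Node 212 is followed only by a newline, so it is left out of the result.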
