Extracting text from XML node with minidom

Published: 2020/5/25 0:44:30 | Tags: python, xml, parsing, minidom

Question

I've looked through several posts but I haven't quite found any answers that have solved my problem.

Sample XML:

<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/>
</TextWithNodes>

So I understand that usually, if I had extracted TextWithNodes as a NodeList, I would do something like this:

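from xml.dom import minidom

doc = minidom.parseString(xml_string)  # xml_string (hypothetical name) holds the sample XML above
TextWithNodes = doc.getElementsByTagName('TextWithNodes')
nodeList = TextWithNodes[0].getElementsByTagName('Node')
for a in nodeList:
    node = a.nodeValue  # always None here: an Element's nodeValue is None
    print(node)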

All I get is None. I've read that you must write a.childNodes.nodeValue, but the Node elements don't seem to have any child nodes, since they all look like self-closing tags. If I use a.childNodes I get [].

When I check the node type of a, it is 1 (ELEMENT_NODE), whereas TEXT_NODE = 3. I'm not sure if that is helpful.
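A quick way to see where the text actually ends up is to list every direct child of <TextWithNodes> with its nodeType, nodeName and nodeValue. This is a minimal sketch assuming Python 3 and the standard-library xml.dom.minidom; SAMPLE is just an illustrative name holding the XML above.

```python
from xml.dom import minidom

# SAMPLE (illustrative name): the sample XML from the question
SAMPLE = """<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/>
</TextWithNodes>"""

doc = minidom.parseString(SAMPLE)
root = doc.documentElement  # the <TextWithNodes> element

# (nodeType, nodeName, nodeValue) for every direct child of <TextWithNodes>
children = [(c.nodeType, c.nodeName, c.nodeValue) for c in root.childNodes]

for entry in children:
    print(entry)
```

Each <Node/> shows up as type 1 (ELEMENT_NODE) with nodeValue None, while TEXT1, TEXT2 and TEXT3 appear as separate type-3 ('#text') children of <TextWithNodes> sitting between the <Node/> elements, not inside them.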

I want to extract TEXT1, TEXT2, etc.

Recommended answer

A solution with lxml right from the docs:

from io import StringIO

from lxml import etree

xml = etree.parse(StringIO('''<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/></TextWithNodes>'''))

# All text nodes in the document, including the newline right after <TextWithNodes>
print(xml.xpath("//text()"))
# ['\n', 'TEXT1', 'TEXT2 ', 'TEXT3']

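You can also extract the text that follows a specific node. lxml stores it in the element's tail, not in its text (which is None for an empty element like <Node/>):

xml.find(".//Node[@id='19']").tail  # 'TEXT2 '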
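The same text/tail model is also used by the standard library's xml.etree.ElementTree, so the text after every <Node/> can be collected without installing lxml. This is a minimal sketch assuming Python 3; the text_after dictionary keyed by id is just one illustrative way to organize the result.

```python
import xml.etree.ElementTree as ET

# The sample document from the question
SAMPLE = '''<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/></TextWithNodes>'''

root = ET.fromstring(SAMPLE)  # the <TextWithNodes> element

# Each <Node/> is empty, so node.text is None; the text that follows it up to
# the next tag is stored in node.tail (None when another tag follows directly)
text_after = {node.get("id"): node.tail for node in root.iter("Node")}

print(text_after)
# {'0': 'TEXT1', '19': 'TEXT2 ', '20': 'TEXT3', '212': None}
```

Node 212 maps to None because </TextWithNodes> follows it immediately, so it has no tail text.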

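The issue here is that the text in the XML isn't inside any of the <Node> elements: TEXT1, TEXT2 and TEXT3 are text nodes that are children of <TextWithNodes>, sitting between the empty <Node/> elements as their siblings. That is why a.nodeValue is None and a.childNodes is [].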
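Since the question asks about minidom, here is a minimal sketch of the same idea with Python 3's standard xml.dom.minidom. It shows two options, assumed rather than taken from the answer: filter the text nodes among the children of <TextWithNodes>, or read the nextSibling of each <Node/>. Surrounding whitespace is stripped, so 'TEXT2 ' becomes 'TEXT2'.

```python
from xml.dom import minidom

# The sample XML from the question
SAMPLE = """<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/>
</TextWithNodes>"""

doc = minidom.parseString(SAMPLE)
container = doc.getElementsByTagName("TextWithNodes")[0]

# Option 1: the non-blank text nodes that are direct children of <TextWithNodes>
texts = [
    child.data.strip()
    for child in container.childNodes
    if child.nodeType == child.TEXT_NODE and child.data.strip()
]

# Option 2: for each <Node/>, the non-blank text node that immediately follows it
text_after = {}
for node in container.getElementsByTagName("Node"):
    sibling = node.nextSibling
    if sibling is not None and sibling.nodeType == sibling.TEXT_NODE and sibling.data.strip():
        text_after[node.getAttribute("id")] = sibling.data.strip()

print(texts)       # ['TEXT1', 'TEXT2', 'TEXT3']
print(text_after)  # {'0': 'TEXT1', '19': 'TEXT2', '20': 'TEXT3'}
```

Option 2 keeps the association between each id and its text; Node 212 is left out because only a newline follows it.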
