Extracting text from XML node with minidom
Question
I've looked through several posts, but I haven't found an answer that solves my problem.
Sample XML:
<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/>
</TextWithNodes>
So I understand that usually, if I had extracted TextWithNodes as a NodeList, I would do something like:
nodeList = TextWithNodes[0].getElementsByTagName('Node')
for a in nodeList:
    # nodeValue is None for element nodes, which is what gets printed here
    node = a.nodeValue
    print(node)
All I get is None. I've read that you must write a.childNodes.nodeValue, but the nodes in the list have no child nodes, since it looks like all the Node elements are self-closing tags? If I use a.childNodes, I get [].
When I get the node type for a, it is type 1, whereas TEXT_NODE = 3. I'm not sure if that is helpful.
I want to extract TEXT1, TEXT2, etc.
Recommended answer
A solution with lxml, right from the docs:
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
from lxml import etree

xml = etree.parse(StringIO('''<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/></TextWithNodes>'''))
# Select every text node in the document, wherever it sits
xml.xpath("//text()")
Out[43]: ['\n', 'TEXT1', 'TEXT2 ', 'TEXT3']
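You can also extract the text that follows a specific node. Because each Node element is empty, its .text is None; lxml stores the text that comes after an element in that element's .tail:
xml.find(".//Node[@id='19']").tail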
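If lxml is not available, the standard library's xml.etree.ElementTree uses the same .text/.tail model, so the lookup above can be sketched with it. The dictionary name following_text is only illustrative, and this is a minimal sketch that assumes the sample XML from the question:

```python
import xml.etree.ElementTree as ET

# Sample document from the question
root = ET.fromstring('''<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/></TextWithNodes>''')

# Map each Node id to the text that follows it (its tail).
# The last Node has no trailing text, so its tail is None.
following_text = {node.get('id'): node.tail for node in root.iter('Node')}
print(following_text)
```

Each Node here acts as a marker, and the text after it is read from .tail rather than .text.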
The issue here is that the text in the XML doesn't belong to any Node element: it sits between them, directly inside TextWithNodes.
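Since the question asks about minidom specifically, the same idea can be sketched there too: the text pieces are children of TextWithNodes (siblings of the Node elements), so one option is to walk the parent's childNodes and keep the ones whose nodeType is TEXT_NODE. The variable names are illustrative, and skipping whitespace-only text nodes is an assumption about what is wanted:

```python
from xml.dom import minidom

SAMPLE = '''<TextWithNodes>
<Node id="0"/>TEXT1<Node id="19"/>TEXT2 <Node id="20"/>TEXT3<Node id="212"/>
</TextWithNodes>'''

doc = minidom.parseString(SAMPLE)
parent = doc.getElementsByTagName('TextWithNodes')[0]

# Keep only text nodes (nodeType 3), ignoring whitespace-only ones
# such as the newlines around the Node elements.
texts = [child.data for child in parent.childNodes
         if child.nodeType == child.TEXT_NODE and child.data.strip()]
print(texts)
```

To get the text after one particular Node, a.nextSibling on that element can be checked for a TEXT_NODE in the same way.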