How do I get the whole content between two XML tags in Python?

Views: 94 | Published: 2020/5/4 8:21:54 | Tags: python, xml, xml-parsing, lxml

Question

I am trying to get the whole content between an opening XML tag and its closing counterpart.

Getting the content in straightforward cases like title below is easy, but how can I get the whole content between the tags when mixed content is used and I want to preserve the inner tags?

<?xml version="1.0" encoding="UTF-8"?>
<review>
  <title>Some testing stuff</title>
  <text sometimes="attribute">Some text with <extradata>data</extradata> in it.
  It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag> 
  or more</sometag>.</text>
</review>

What I want is the content between the two text tags, including any inner tags: Some text with <extradata>data</extradata> in it. It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag> or more</sometag>.

For now I use regular expressions, but it gets kind of messy and I don't like this approach. I lean towards an XML-parser-based solution. I looked over minidom, etree, lxml and BeautifulSoup but couldn't find a solution for this case (whole content, including inner tags).

Recommended answer

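from lxml import etree

# Pass bytes: lxml rejects a str that carries an encoding declaration.
t = etree.XML(
b"""<?xml version="1.0" encoding="UTF-8"?>
<review>
  <title>Some testing stuff</title>
  <text>Some text with <extradata>data</extradata> in it.</text>
</review>"""
)
# encoding='unicode' makes tostring return str rather than bytes on Python 3.
(t.text + ''.join(etree.tostring(c, encoding='unicode') for c in t)).strip()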

The trick here is that t is iterable and, when iterated, yields all of its child elements. Because etree has no separate text nodes (text is stored on an element's .text and on each child's .tail), you also need to recover the text before the first child tag with t.text.

In [50]: (t.text + ''.join(etree.tostring(c, encoding='unicode') for c in t)).strip()
Out[50]: '<title>Some testing stuff</title>\n  <text>Some text with <extradata>data</extradata> in it.</text>'
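
To make the text/tail point concrete, here is a small sketch. It assumes the standard-library xml.etree.ElementTree instead of lxml (a substitution made only so the snippet runs without extra packages); both libraries store text on .text and .tail rather than in text nodes, and both include an element's tail when serializing it by default. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the <text> element from the question.
text_el = ET.fromstring(
    "<text>Some text with <extradata>data</extradata> in it.</text>"
)

# Text before the first child lives on the parent element.
print(repr(text_el.text))   # 'Some text with '

child = text_el[0]
# Text inside the child, and the text that follows its closing tag (the tail).
print(repr(child.text))     # 'data'
print(repr(child.tail))     # ' in it.'

# Serializing the child includes its tail, which is why joining the
# serialized children reproduces the text that sits between them.
serialized = ET.tostring(child, encoding="unicode")
print(repr(serialized))     # '<extradata>data</extradata> in it.'
```

So the only piece that serializing the children cannot supply is the parent's own leading text, which is why it is added separately.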

Or:

In [6]: e = t.xpath('//text')[0]

In [7]: (e.text + ''.join(etree.tostring(c, encoding='unicode') for c in e)).strip()
Out[7]: 'Some text with <extradata>data</extradata> in it.'
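
The same idea can be wrapped in a small helper and applied to the full mixed-content sample from the question. This is a sketch rather than a definitive implementation: inner_xml is a hypothetical name, the XML declaration is left out of the sample string, and the standard-library xml.etree.ElementTree is assumed in place of lxml so the snippet has no third-party dependency. The same function body should also work with lxml.etree, since etree.tostring accepts encoding="unicode" there as well.

```python
import xml.etree.ElementTree as ET

# The question's sample document, without the XML declaration.
SAMPLE = """<review>
  <title>Some testing stuff</title>
  <text sometimes="attribute">Some text with <extradata>data</extradata> in it.
  It spans <sometag>multiple lines: <tag>one</tag>, <tag>two</tag>
  or more</sometag>.</text>
</review>"""


def inner_xml(element):
    """Return the markup between an element's opening and closing tags."""
    # element.text is None when the element starts directly with a child tag.
    parts = [element.text or ""]
    # Each serialized child carries its tail text along with it.
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts).strip()


root = ET.fromstring(SAMPLE)
text_el = root.find("text")
result = inner_xml(text_el)
print(result)

# An element with no children just yields its text.
print(inner_xml(root.find("title")))   # Some testing stuff
```

The `or ""` guard is there because .text is None, not an empty string, when an element begins with a child tag; without it the concatenation would raise a TypeError for such elements.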
