Get all text inside a tag in lxml

Published: 2020/5/4 8:18:55. Tags: python, parsing, lxml


Question


I'd like to write a code snippet that grabs all of the text inside the <content> tag in lxml, for all three instances below, including the tags themselves. I've tried tostring(getchildren()), but that misses the text in between the tags. I didn't have much luck searching the API for a relevant function. Could you help me out?

<!--1-->
<content>
<div>Text inside tag</div>
</content>
#should return "<div>Text inside tag</div>"

<!--2-->
<content>
Text with no tag
</content>
#should return "Text with no tag"


<!--3-->
<content>
Text outside tag <div>Text inside tag</div>
</content>
#should return "Text outside tag <div>Text inside tag</div>"

Recommended answer

Try:

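from lxml.etree import tostring

def stringify_children(node):
    # Text before the first child (None if there is none)
    parts = [node.text]
    # Iterating over an element yields its children (getchildren() is deprecated).
    # tostring() already includes each child's text and, by default, its tail,
    # so child.text and child.tail must not be added again.
    # encoding='unicode' returns str instead of bytes on Python 3.
    parts += [tostring(c, encoding='unicode') for c in node]
    # node.tail is text *after* </content>, so it is deliberately left out.
    # filter drops node.text when it is None
    return ''.join(filter(None, parts))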

Example:

from lxml import etree
node = etree.fromstring("""<content>
Text outside tag <div>Text <em>inside</em> tag</div>
</content>""")
stringify_children(node)

Produces: '\nText outside tag <div>Text <em>inside</em> tag</div>\n'
