Get inner text from lxml

Question

lxml.html.fromstring insists on wrapping everything in a tag (p by default). From this tag tree,

<p>this is <b>the</b> good stuff<p>

I want to extract the string:

this is <b>the</b> good stuff

How do I do that?

Recommended answer

What you describe is usually called the "inner XML" of an element rather than its "inner text". One possible way to get it is to concatenate the element's leading text with the serialization of each of its children:

import lxml.etree as etree
import lxml.html

html = "<p>this is <b>the</b> good stuff<p>"
tree = lxml.html.fromstring(html)
node = tree.xpath("//p")[0]

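# node.text is the text before the first child; tostring() serializes each
# child together with its tail text. encoding="unicode" returns str rather
# than bytes, which the string concatenation needs on Python 3.
result = (node.text or "") + "".join(etree.tostring(e, encoding="unicode") for e in node)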
print(result)

Output:

this is <b>the</b> good stuff
