Replace text with HTML tag in LXML text element


Question

I have an lxml element:

>>> lxml_element.text
'hello BREAK world'

I need to replace the word BREAK with an HTML break tag, <br />. I tried a simple text replacement:

lxml_element.text.replace('BREAK', '<br />')

but it inserts the tag with escaped symbols, like &lt;br/&gt;. How do I solve this problem?

Recommended answer

Here's how you could do it. Set up a sample lxml element from your question:

>>> import lxml.etree  # a bare "import lxml" does not load the etree submodule
>>> some_data = "<b>hello BREAK world</b>"
>>> root = lxml.etree.fromstring(some_data)
>>> root
<Element b at 0x3f35a50>
>>> root.text
'hello BREAK world'

Next, create a <br> subelement:

>>> childbr = lxml.etree.SubElement(root, "br")
>>> childbr
<Element br at 0x3f35b40>
>>> lxml.etree.tostring(root)
'<b>hello BREAK world<br/></b>'

But that's not quite what you want: BREAK is still in the text and the <br> sits at the end. Take the text that should come before the <br> and place it in .text:

>>> root.text = "hello"
>>> lxml.etree.tostring(root)
'<b>hello<br/></b>'

Then set the .tail of the child to contain the rest of the text:

>>> childbr.tail = "world"
>>> lxml.etree.tostring(root)
'<b>hello<br/>world</b>'
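
The manual steps above could be wrapped in a small helper. What follows is a minimal sketch, not part of the original answer: the function name replace_marker_with_tag and its parameters are hypothetical, and it assumes the marker occurs only in the element's own .text, not in the tails of existing children. Unlike the session above, it keeps the spaces around the marker. The fallback import of the standard library's xml.etree.ElementTree is also an assumption, added for environments without lxml; it works here because it shares the same .text/.tail model.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib module, which exposes the same
    # Element, fromstring, tostring and insert calls used below.
    import xml.etree.ElementTree as etree


def replace_marker_with_tag(element, marker="BREAK", tag="br"):
    """Replace each `marker` in element.text with an empty <tag> child.

    Hypothetical helper: text before the first marker stays in .text,
    and each following piece becomes the .tail of a new child element.
    """
    text = element.text or ""
    if marker not in text:
        return element

    parts = text.split(marker)
    element.text = parts[0]

    # Insert the new children ahead of any existing ones, in order,
    # so the pieces of text keep their original sequence.
    for index, part in enumerate(parts[1:]):
        child = etree.Element(tag)
        child.tail = part
        element.insert(index, child)
    return element


root = etree.fromstring("<b>hello BREAK world</b>")
replace_marker_with_tag(root)
# encoding="unicode" returns str; under Python 3 the default is bytes.
print(etree.tostring(root, encoding="unicode"))
```

With lxml, this should serialize as <b>hello <br/> world</b>. Building real elements instead of writing '<br />' into .text is what avoids the escaping described in the question, since anything assigned to .text is treated as character data.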
