Python: Injecting HTML content into a tag using `lxml.html`


Question

I'm using the lxml.html library to parse an HTML document.

I located a specific tag, which I call content_tag, and I want to change its content (i.e. the text between <div> and </div>). The new content is a string with some HTML in it, say 'Hello <b>world!</b>'.

How do I do that? I tried content_tag.text = 'Hello <b>world!</b>' but then it escapes all the html tags, replacing < with &lt; etc.

I want to inject the text without escaping any HTML. How can I do that?
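For reference, the escaping described above can be reproduced with a minimal sketch like the following. It assumes Python 3, and the starting markup `<div>Goodbye.</div>` is only a placeholder for the real document:

```python
import lxml.html

content_tag = lxml.html.fromstring('<div>Goodbye.</div>')

# .text holds a plain string, so any markup characters in it
# are escaped when the element is serialized.
content_tag.text = 'Hello <b>world!</b>'

escaped = lxml.html.tostring(content_tag, encoding='unicode')
print(escaped)
```

The printed markup contains `&lt;b&gt;` rather than a real `<b>` element, which is the behaviour the question is trying to avoid.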

Solution

This is one way:

#!/usr/bin/env python3
from lxml.html import fromstring, tostring
from lxml.html import builder as E

fragment = """\
<div id="outer">
  <div id="inner">This is div.</div>
</div>"""

div = fromstring(fragment)
print(tostring(div, encoding='unicode'))
# <div id="outer">
#   <div id="inner">This is div.</div>
# </div>

# Swap the inner div for a new one built with the E-factory.
div.replace(div.get_element_by_id('inner'), E.DIV('Hello ', E.B('world!')))
print(tostring(div, encoding='unicode'))
# <div id="outer">
#   <div>Hello <b>world!</b></div></div>

See also: http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory
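If the new content arrives as a string rather than being built with the E-factory, one possible variation on the same replace() idea is to let lxml parse the string into a wrapper element. This is only a sketch, assuming Python 3 and that replacing the whole element is acceptable. `fragment_fromstring` and its `create_parent` argument are part of lxml.html; copying the old attributes across is an illustrative choice, not something the answer above does:

```python
from lxml.html import fromstring, fragment_fromstring, tostring

outer = fromstring('<div id="outer"><div id="inner">This is div.</div></div>')
inner = outer.get_element_by_id('inner')

# create_parent='div' wraps the parsed fragment, leading text included,
# in a newly created <div> element.
replacement = fragment_fromstring('Hello <b>world!</b>', create_parent='div')

# Copy the old attributes so that id="inner" survives the swap
# (illustrative choice; drop this loop to get a bare <div>).
for name, value in inner.attrib.items():
    replacement.set(name, value)

outer.replace(inner, replacement)
html_out = tostring(outer, encoding='unicode')
print(html_out)
```

Note that replace() discards the old element together with any tail text that followed it, so whitespace or text after the original inner div is not carried over.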

Edit: I should have confessed earlier that I'm not all that familiar with lxml. I looked at the docs and source briefly, but didn't find a clean solution. Perhaps someone more familiar will stop by and set us both straight.

In the meantime, this seems to work, but is not well tested:

import lxml.html

content_tag = lxml.html.fromstring('<div>Goodbye.</div>')
content_tag.text = ''  # assumes the element holds only text to start with
for elem in lxml.html.fragments_fromstring('Hello <b>world!</b>'):
    if isinstance(elem, str):  # only the first item can be a string (leading text)
        content_tag.text += elem
    else:
        content_tag.append(elem)
print(lxml.html.tostring(content_tag, encoding='unicode'))

Edit again: this version first removes the existing text and children:

somehtml = 'Hello <b>world!</b>'

# purge the element's existing text and children
content_tag.text = ''
for child in list(content_tag):
    content_tag.remove(child)

# parse the new content; leading text, if any, is the first list item
fragments = lxml.html.fragments_fromstring(somehtml)
if fragments and isinstance(fragments[0], str):
    content_tag.text = fragments.pop(0)
content_tag.extend(fragments)
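The steps above can be folded into a small reusable helper. The following is a sketch for Python 3, not a tested library routine: `set_inner_html` is a made-up name (lxml does not provide a function by that name), and the sample markup is only for illustration:

```python
import lxml.html


def set_inner_html(element, html):
    """Replace the text and children of `element` with the parsed `html` string.

    Hypothetical helper; the tag, attributes and tail of `element` are kept.
    """
    # Remove existing content. Iterate over a copy because the
    # element is modified inside the loop.
    element.text = None
    for child in list(element):
        element.remove(child)

    # fragments_fromstring returns a list of elements; if the markup
    # starts with text, that text is the first item, as a plain string.
    fragments = lxml.html.fragments_fromstring(html)
    if fragments and isinstance(fragments[0], str):
        element.text = fragments.pop(0)
    element.extend(fragments)
    return element


content_tag = lxml.html.fromstring('<div class="note">Goodbye.<i>old</i></div>')
set_inner_html(content_tag, 'Hello <b>world!</b>')
result = lxml.html.tostring(content_tag, encoding='unicode')
print(result)
```

Because only the text and children are touched, attributes such as `class="note"` stay on the element, which is the main difference from replacing the whole tag.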
