How to modify an HTML tree in Python?


Problem description

Suppose there is a variable fragment of HTML code:

<p>
    <span class="code"> string 1 </ span>
    <span class="code"> string 2 </ span>
    <span class="code"> string 3 </ span>
</ p>
<p>
    <span class="any"> Some text </ span>
</ p>

I need to modify the contents of all `<span>` tags with the class `code` by passing their content through some function, such as `foo`, which returns the modified content of the `<span>` tag. Ultimately, I should get a new HTML fragment like this:

<p>
    <span class="code"> modify string 1 </ span>
    <span class="code"> modify string 2 </ span>
    <span class="code"> modify string 3 </ span>
</ p>
<p>
    <span class="any"> Some text </ span>
</ p>

It was suggested to me that searching for specific HTML nodes is easy with the Python library BeautifulSoup4. How do I modify the content of `<span class="code">` and save the new version as a new file? I guess that to find the nodes I need to use `soup.find_all('span', class_=re.compile("code"))`, but this function returns a list (a copy) of the matching objects, and modifying them does not change the contents of `soup`. How do I solve this problem?
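One way to test that assumption directly is a minimal sketch like the following, assuming BeautifulSoup 4 with Python's built-in `html.parser` backend. The list returned by `find_all()` is a new object, but its elements are the tags that live in the parse tree, not detached copies, so modifying them does change `soup`:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p><span class="code">string 1</span></p>', 'html.parser')

# find_all() returns a ResultSet (a list subclass) holding the Tag objects
# that are part of the tree itself.
spans = soup.find_all('span', class_='code')
print(spans[0] is soup.p.span)  # True

# Assigning to .string replaces the tag's children inside the original tree.
spans[0].string = 'modify string 1'
print(soup)  # <p><span class="code">modify string 1</span></p>
```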

Recommended answer

`</ span>` is invalid HTML: because of the space after `</`, not even a web browser's lenient parser will treat it as a closing tag.
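If the markup comes from a source that can't be fixed by hand, one possible workaround (not part of the original answer) is to normalize the closing tags with a regular expression before parsing. This sketch assumes the only defect is whitespace between `</` and the tag name:

```python
import re

broken = '<p>\n    <span class="code"> string 1 </ span>\n</ p>'

# Drop any whitespace between "</" and the tag name: "</ span>" -> "</span>".
fixed = re.sub(r'</\s+(?=\w)', '</', broken)
print(fixed)
```

A regex like this would also rewrite matching text inside comments, scripts, or attribute values, so it is only reasonable for simple fragments like the one in the question.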

Once you fix your HTML, you can use `.replaceWith()` (spelled `.replace_with()` in current BeautifulSoup 4 naming):

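from bs4 import BeautifulSoup

# 'html5lib' is a separate package (pip install html5lib); 'html.parser' also works here.
soup = BeautifulSoup('''
    <p>
        <span class="code"> string 1 </span>
        <span class="code"> string 2 </span>
        <span class="code"> string 3 </span>
    </p>
    <p>
        <span class="any"> Some text </span>
    </p>
''', 'html5lib')

# find_all() returns the tags that live in the parse tree, so changes show up in soup.
for span in soup.find_all('span', class_='code'):
    # Swap the span's text node for a new string built from the old one.
    span.string.replaceWith('modified ' + span.string)

# Serialize the modified tree back to HTML.
print(soup)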
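The question also asks for running each span's text through a function such as `foo` and saving the result as a new file. A minimal sketch of that, assuming `html.parser` as the backend; `foo`, `modify_code_spans`, and the `modified.html` file name are hypothetical placeholders:

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def foo(text):
    # Hypothetical stand-in for the question's foo(): prefix the text.
    return 'modify' + text


def modify_code_spans(html, func):
    """Return html with the text of every <span class="code"> passed through func."""
    soup = BeautifulSoup(html, 'html.parser')
    for span in soup.find_all('span', class_='code'):
        # .string is None when a span has several children; skip those here.
        if span.string is not None:
            span.string.replace_with(func(span.string))
    return str(soup)


html = '<p><span class="code"> string 1 </span><span class="any"> Some text </span></p>'
result = modify_code_spans(html, foo)

# Write the new version to a separate file (hypothetical name, temp directory).
out_path = Path(tempfile.gettempdir()) / 'modified.html'
out_path.write_text(result, encoding='utf-8')
print(result)
```

Only spans with class `code` are touched; the `any` span passes through unchanged, which matches the expected output in the question.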
