BeautifulSoup removing tags

Question

I'm trying to remove the elements styled with display:none, along with their contents, from the source, but it isn't working. There are no errors; the elements simply aren't decomposed. This is what I have:

source = BeautifulSoup(open("page.html"))
getbody = source.find('body')
for child in getbody[0].children:
    try:
        if child.get('style') is not None and child.get('style') == "display:none":
            # execution gets in here
            child.decompose()
    except:
        continue
print(source)
# the display:none divs are still there.

Recommended answer

The following code does what you want and works fine. Do not use a blanket except clause, because it masks bugs:

from bs4 import BeautifulSoup

with open("page.html") as f:
    source = BeautifulSoup(f, "html.parser")
for hidden in source.body.find_all(style='display:none'):
    hidden.decompose()

Better still, use a regular expression to cast the net a little wider:

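import re
from bs4 import BeautifulSoup

with open("page.html") as f:
    source = BeautifulSoup(f, "html.parser")
# \s* also matches "display: none" written with whitespace after the colon
for hidden in source.body.find_all(style=re.compile(r'display:\s*none')):
    hidden.decompose()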
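
To illustrate what "a little wider" means here, the following is a minimal sketch comparing the exact-string filter with the regular-expression filter. The sample markup and the helper name hidden_ids are made up for illustration and are not part of the question's page.html; the html.parser backend is likewise an assumption.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question's page.html.
html = """
<body>
  <div id="a" style="display:none">exact</div>
  <div id="b" style="display: none">space after the colon</div>
  <div id="c" style="color:red; display:none;">extra declarations</div>
  <div id="d" style="display:block">visible</div>
</body>
"""

def hidden_ids(style_filter):
    """Return the ids of tags whose style attribute matches the filter."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag["id"] for tag in soup.body.find_all(style=style_filter)]

# A plain string must equal the whole attribute value.
exact_ids = hidden_ids("display:none")
# A compiled pattern is searched for anywhere inside the attribute value.
regex_ids = hidden_ids(re.compile(r"display:\s*none"))

print(exact_ids)
print(regex_ids)
```

In this sketch the string filter should pick up only the first div, while the pattern should also catch the variants with extra whitespace or additional declarations.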

Tag.children only lists the direct children of the body tag, not all nested descendants, whereas find_all() searches every descendant by default.
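
The difference between direct children and nested descendants can be seen in a small sketch. The markup below is a hypothetical example in which the hidden element sits one level below body, and the html.parser backend is an assumption.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: the hidden <p> is nested inside a <div>.
html = '<body><div id="outer"><p id="inner" style="display:none">hidden</p></div></body>'
soup = BeautifulSoup(html, "html.parser")

# .children yields only the nodes directly under <body>.
direct = [c["id"] for c in soup.body.children if isinstance(c, Tag)]
# .descendants walks the whole subtree.
nested = [d["id"] for d in soup.body.descendants if isinstance(d, Tag)]
# find_all() is recursive by default, so it reaches the nested <p>.
found = [t["id"] for t in soup.body.find_all(style="display:none")]

print(direct)
print(nested)
print(found)

# Removing the matches leaves the outer <div> in place but empty.
for tag in soup.body.find_all(style="display:none"):
    tag.decompose()
remaining = soup.find(id="inner")
print(remaining)
```

Filtering with isinstance(node, Tag) skips text nodes explicitly, which is one way to avoid wrapping the attribute lookup in a try/except.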
