BeautifulSoup removing tags
Problem description
I'm trying to remove elements styled with display:none, along with their contents, from the source, but it isn't working. There are no errors; the elements simply don't get decomposed. This is what I have:
source = BeautifulSoup(open("page.html"))
getbody = source.find('body')
for child in getbody[0].children:
    try:
        if child.get('style') is not None and child.get('style') == "display:none":
            # execution does get in here
            child.decompose()
    except:
        continue
print source
# the display:none divs are still there.
Recommended answer
The following code does what you want and works fine. Do not use blanket except handling; it only masks bugs:
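from bs4 import BeautifulSoup

source = BeautifulSoup(open("page.html"), "html.parser")
# Remove every element under <body> whose style attribute is exactly "display:none".
for hidden in source.body.find_all(style='display:none'):
    hidden.decompose()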
Or better still, use a regular expression to cast the net a little wider:
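import re
from bs4 import BeautifulSoup

source = BeautifulSoup(open("page.html"), "html.parser")
# The pattern also matches variants such as "display: none" or "display:none;".
for hidden in source.body.find_all(style=re.compile(r'display:\s*none')):
    hidden.decompose()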
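To see why the regular expression casts a wider net, here is a minimal sketch that runs both filters against the same markup. The HTML string and the names exact_ids and pattern_ids are made up for illustration and are not part of the original page.html.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
html = """
<body>
  <div id="a" style="display:none">exact</div>
  <div id="b" style="display: none;">spaced</div>
  <div id="c" style="color:red">visible</div>
</body>
"""

# A plain string filter matches only when the whole attribute value is equal.
exact = BeautifulSoup(html, "html.parser")
exact_ids = [tag["id"] for tag in exact.body.find_all(style="display:none")]

# A compiled pattern is searched for inside the attribute value, so extra
# whitespace or a trailing semicolon no longer prevents a match.
pattern = BeautifulSoup(html, "html.parser")
pattern_ids = [
    tag["id"]
    for tag in pattern.body.find_all(style=re.compile(r"display:\s*none"))
]

print(exact_ids)
print(pattern_ids)
```

With this sample, the string filter should pick up only the first div, while the pattern also catches the one written with a space and a semicolon.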
Tag.children only lists the direct children of the body tag, not all nested children, so the loop in the question never reaches hidden elements that sit deeper in the tree. find_all() searches all descendants by default.
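The difference between direct children and all descendants can be checked with a small sketch. The markup and variable names below are assumptions made up for illustration; only Tag.children, find_all() and decompose() come from the answer.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup: one hidden element is nested, one is top-level.
html = """
<body>
  <div id="outer">
    <p id="nested" style="display:none">hidden</p>
  </div>
  <p id="top" style="display:none">hidden too</p>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# .children yields only the direct children of <body>. Whitespace text nodes
# are included as well, so keep Tag objects only.
direct_ids = [
    child.get("id") for child in soup.body.children if isinstance(child, Tag)
]

# find_all() walks every descendant by default, so it also sees the nested <p>.
all_hidden_ids = [tag["id"] for tag in soup.body.find_all(style="display:none")]

# recursive=False restricts find_all() to direct children, like .children.
direct_hidden_ids = [
    tag["id"]
    for tag in soup.body.find_all(style="display:none", recursive=False)
]

# find_all() returns a list, so decomposing while looping over it is safe.
for hidden in soup.body.find_all(style="display:none"):
    hidden.decompose()

remaining_hidden = soup.body.find_all(style="display:none")
print(direct_ids, all_hidden_ids, direct_hidden_ids, remaining_hidden)
```

In this sample the nested paragraph never shows up among the direct children of body, which is the situation the question ran into.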