Python: remove everything between <div class="comment"> ... </div>

Problem description

How do you use Python 2.6 to remove everything in, and including, <div class="comment"> ... </div>?

I tried various approaches using re.sub without any success.

Thank you

Solution

This can be done easily and reliably using an HTML parser like BeautifulSoup:

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('<body><div>1</div><div class="comment"><strong>2</strong></div></body>')
>>> for div in soup.findAll('div', 'comment'):
...   div.extract()
... 
<div class="comment"><strong>2</strong></div>
>>> soup
<body><div>1</div></body>
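
If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with the standard library alone. The following is only an illustration, not part of the original answer: it assumes Python 3 and its html.parser module, and the names CommentDivStripper and strip_comment_divs are hypothetical. It counts nested <div> tags so that the removed region ends at the matching </div>. As a simplification, HTML comments and the doctype are dropped from the output.

```python
from html.parser import HTMLParser


class CommentDivStripper(HTMLParser):
    """Re-emit HTML while skipping every <div class="comment"> subtree."""

    def __init__(self):
        # Keep entities as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a div that is being removed

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            # Track nested divs so we know which </div> ends the region.
            if tag == "div":
                self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "comment" in classes:
            self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth -= 1
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append("&#%s;" % name)


def strip_comment_divs(html):
    """Return html with all <div class="comment"> elements removed."""
    parser = CommentDivStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


cleaned = strip_comment_divs(
    '<body><div>1</div><div class="comment"><strong>2</strong></div></body>'
)
print(cleaned)
```

For real-world, possibly malformed HTML, a full parser such as BeautifulSoup is likely to be more forgiving than this sketch, which relies on the <div> tags being balanced.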

See this question for examples on why parsing HTML using regular expressions is a bad idea.
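
One concrete way re.sub goes wrong here is nesting. The snippet below is a small illustration, assuming Python 3; the sample markup and the variable names are made up for the demonstration. A non-greedy pattern stops at the first </div>, which belongs to the inner div, while a greedy pattern runs on to the last </div> in the document and deletes content that should be kept.

```python
import re

# Sample markup (invented for this demo): the comment div contains a
# nested div, and an unrelated div follows later in the document.
html = ('<div class="comment">outer <div>inner</div> tail</div>'
        '<p>keep</p><div>after</div>')

# Non-greedy: the match ends at the FIRST </div>, leaving a stray
# " tail</div>" behind.
non_greedy = re.sub(r'<div class="comment">.*?</div>', '', html, flags=re.S)

# Greedy: the match ends at the LAST </div>, swallowing <p>keep</p>
# and the unrelated div as well.
greedy = re.sub(r'<div class="comment">.*</div>', '', html, flags=re.S)

print(repr(non_greedy))
print(repr(greedy))
```

Neither pattern can count opening and closing tags, which is exactly the bookkeeping a parser does.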
