Python BeautifulSoup: trying to remove HTML 'span' tags

Views: 147 Posted: 2016/8/5 19:11:12 Tags: python regex beautifulsoup


Question

I am trying to remove the tags and brackets from

[<span class="street-address">
            510 E Airline Way
           </span>]

and I have used this clean function to remove everything that sits between < and >:

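import re

def clean(val):
    # Coerce non-string input (e.g. a list of tags) to its string form
    if not isinstance(val, str):
        val = str(val)
    val = re.sub(r'<.*?>', '', val)  # strip anything between < and >
    val = re.sub(r'\s+', ' ', val)   # collapse runs of whitespace
    return val.strip()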

and it produces [ 510 E Airline Way ]

I am trying to extend the clean function so that it also removes the '[' and ']' characters. Basically, I just want to get "510 E Airline Way".

Does anyone have a clue what I can add to the clean function?

Thank you.

Recommended answer

Using re:

>>> import re
>>> s='[<span class="street-address">\n            510 E Airline Way\n           </span>]'
>>> re.sub(r'\[|\]|\s*<[^>]*>\s*', '', s)
'510 E Airline Way'
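
To connect this back to the question, here is a minimal sketch of how the asker's clean function could be extended to drop the brackets as well. It reuses the name `clean` and the sample string from the post; the extra `re.sub` step for the brackets is an illustrative assumption, not code from the answer. Note that it removes every '[' and ']' in the input, including any that belong to the text itself.

```python
import re

def clean(val):
    """Strip tags, square brackets, and surplus whitespace from val."""
    if not isinstance(val, str):
        val = str(val)  # e.g. a list of tags rendered as '[<span ...>...</span>]'
    val = re.sub(r'<[^>]*>', '', val)  # drop anything between < and >
    val = re.sub(r'[\[\]]', '', val)   # drop literal '[' and ']' characters
    val = re.sub(r'\s+', ' ', val)     # collapse runs of whitespace
    return val.strip()

s = '[<span class="street-address">\n            510 E Airline Way\n           </span>]'
print(clean(s))
```

Keeping the bracket removal as its own step leaves the tag pattern readable and makes it easy to delete again if brackets in the text ever need to be kept.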

Using BeautifulSoup:

>>> from BeautifulSoup import BeautifulSoup
>>> s='[<span class="street-address">\n            510 E Airline Way\n           </span>]'
>>> b = BeautifulSoup(s)
>>> b.find('span').getText()
u'510 E Airline Way'
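
The import above comes from the older BeautifulSoup 3 package. A rough equivalent with the newer `bs4` package might look like the sketch below, assuming `bs4` is installed. The variable names `soup`, `span`, and `address` are illustrative. Passing `strip=True` asks `get_text` to trim the surrounding whitespace.

```python
from bs4 import BeautifulSoup

s = '[<span class="street-address">\n            510 E Airline Way\n           </span>]'

# 'html.parser' is the parser bundled with Python, so no extra parser is needed
soup = BeautifulSoup(s, 'html.parser')

# Look the tag up by name and class instead of stripping markup by pattern
span = soup.find('span', class_='street-address')

# strip=True trims leading and trailing whitespace from the extracted text
address = span.get_text(strip=True)
print(address)
```

Selecting the tag through the parser avoids the stray brackets entirely, since they only appear when a result list is converted to a string.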

Using lxml:

>>> from lxml import html
>>> s='[<span class="street-address">\n            510 E Airline Way\n           </span>]'
>>> h = html.document_fromstring(s)
>>> h.cssselect('span')[0].text.strip()
'510 E Airline Way'
