Exclude an unwanted tag with BeautifulSoup in Python
Question
<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>
How can I print "I Like your face" instead of "I Like to punch your face"?
I tried this:
lala = soup.find_all('span')
for p in lala:
    if not p.find(class_='unwanted'):
        print p.text
but it gives "TypeError: find() takes no keyword arguments".
Recommended answer
You can use extract() to remove the unwanted tag before you get the text.
But it keeps all the '\n' characters and spaces, so you will need some extra work to remove them.
data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''
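from bs4 import BeautifulSoup as BS
soup = BS(data, 'html.parser')
# The first <span> in the document is the outer one
external_span = soup.find('span')
print("1 HTML:", external_span)
print("1 TEXT:", external_span.text.strip())
# The first <span> inside the outer one is the unwanted tag
unwanted = external_span.find('span')
# extract() removes the tag (and its text) from the tree
unwanted.extract()
print("2 HTML:", external_span)
print("2 TEXT:", external_span.text.strip())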
Result
1 HTML: <span>
I Like
<span class="unwanted"> to punch </span>
your face
<span></span></span>
1 TEXT: I Like
to punch
your face
2 HTML: <span>
I Like
your face
<span></span></span>
2 TEXT: I Like
your face
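To deal with the leftover newlines and spaces mentioned above, one option is to split the text on whitespace and join it back with single spaces. This is only a sketch and not part of the original answer: the names html, outer and cleaned are made up for illustration, and the sample HTML here closes the outer span properly.

```python
from bs4 import BeautifulSoup

# Sample markup from the question (illustrative variable name)
html = """<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>"""

soup = BeautifulSoup(html, "html.parser")
outer = soup.find("span")

# Remove every descendant that carries the unwanted class.
# find_all() returns a list-like result, so extracting while looping is safe.
for tag in outer.find_all(class_="unwanted"):
    tag.extract()

# str.split() with no arguments splits on any run of whitespace
# (spaces and newlines) and drops empty pieces.
cleaned = " ".join(outer.get_text().split())
print(cleaned)  # I Like your face
```

Note that this also collapses any whitespace inside the text you keep, which is usually what you want for a one-line printout.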
Alternatively, you can skip every Tag object inside the external span and keep only the NavigableString objects (the plain text in the HTML).
data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''
from bs4 import BeautifulSoup as BS
import bs4
soup = BS(data, 'html.parser')
external_span = soup.find('span')
text = []
# Iterating over a tag yields its direct children
for x in external_span:
    # Keep plain text nodes, skip nested Tag objects
    if isinstance(x, bs4.element.NavigableString):
        text.append(x.strip())
print(" ".join(text))
Result
I Like your face
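The same idea can be wrapped in a small helper. This is a sketch under a few assumptions: own_text is a hypothetical name, and it relies on find_all(string=True, recursive=False) returning only the text nodes that are direct children of the tag (the string argument needs a reasonably recent bs4; older versions call it text).

```python
from bs4 import BeautifulSoup


def own_text(tag):
    """Join only the text nodes that are direct children of `tag`."""
    # recursive=False limits the search to direct children;
    # string=True matches text nodes instead of tags.
    pieces = tag.find_all(string=True, recursive=False)
    # Strip each piece and drop the ones that were whitespace only.
    return " ".join(p.strip() for p in pieces if p.strip())


html = "<span>I Like <span class='unwanted'> to punch </span> your face</span>"
soup = BeautifulSoup(html, "html.parser")
result = own_text(soup.find("span"))
print(result)  # I Like your face
```

Unlike extract(), this leaves the parsed tree untouched, so the unwanted tag is still available if you need it later.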