Exclude unwanted tag on Beautifulsoup Python

Views: 64 | Published: 2021/12/17 13:26:48 | Tags: python html web-scraping beautifulsoup


Question

<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>

How can I print "I Like your face" instead of "I Like to punch your face"?

I tried this:

lala = soup.find_all('span')
for p in lala:
 if not p.find(class_='unwanted'):
    print p.text

But it gives "TypeError: find() takes no keyword arguments".

Recommended answer

You can use extract() to remove the unwanted tag before you get the text.

But it keeps all the newlines and spaces that surrounded the removed tag, so you will need some extra work to remove them.

from bs4 import BeautifulSoup as BS

# Sample HTML from the question
data = '''<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>'''

soup = BS(data, 'html.parser')

# The first <span> in the document is the outer one
external_span = soup.find('span')

print("1 HTML:", external_span)
print("1 TEXT:", external_span.text.strip())

# Find the nested span and remove it from the tree
unwanted = external_span.find('span', class_='unwanted')
unwanted.extract()

print("2 HTML:", external_span)
print("2 TEXT:", external_span.text.strip())

Result

1 HTML: <span>
  I Like
  <span class="unwanted"> to punch </span>
   your face
 </span>
1 TEXT: I Like
   to punch 
   your face
2 HTML: <span>
  I Like

   your face
 </span>
2 TEXT: I Like

   your face
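
The leftover newlines and spaces in the output above can be collapsed in one step. The following is a minimal sketch, not part of the original answer: it assumes the same sample HTML, and the variable name `clean_text` is only illustrative. It relies on `str.split()` with no arguments, which splits on any run of whitespace and drops empty pieces.

```python
from bs4 import BeautifulSoup

# Same sample HTML as in the question
data = """<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>"""

soup = BeautifulSoup(data, "html.parser")
external_span = soup.find("span")

# Remove the nested span from the tree
external_span.find("span", class_="unwanted").extract()

# split() without arguments splits on any whitespace run,
# so joining with a single space normalizes the text
clean_text = " ".join(external_span.get_text().split())
print(clean_text)  # I Like your face
```

This also collapses any intentional runs of spaces inside the text, so it fits best when only the words matter.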


Alternatively, you can skip every Tag object inside the external span and keep only the NavigableString objects (the plain text in the HTML).

from bs4 import BeautifulSoup as BS
from bs4.element import NavigableString

# Sample HTML from the question
data = '''<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>'''

soup = BS(data, 'html.parser')

external_span = soup.find('span')

# Iterating over a tag yields its direct children;
# keep the plain-text nodes and skip the nested tags
text = []
for x in external_span:
    if isinstance(x, NavigableString):
        text.append(x.strip())
print(" ".join(text))

Result

I Like your face
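
If the same filtering is needed in several places, the loop can be wrapped in a small helper. This is only a sketch under assumptions: the function name `own_text` and the second sample string are hypothetical and do not come from the original answer. It uses the same `isinstance` check on direct children and additionally drops pieces that are empty after stripping, so whitespace-only text nodes do not produce doubled spaces.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def own_text(tag):
    """Return the text of tag's direct string children, ignoring nested tags."""
    parts = (
        child.strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    )
    # Skip pieces that were whitespace only
    return " ".join(part for part in parts if part)


# Hypothetical sample with two nested tags to skip
html = "<p>Keep <b>drop</b> this <i>drop too</i> text</p>"
soup = BeautifulSoup(html, "html.parser")

result = own_text(soup.find("p"))
print(result)  # Keep this text
```

Unlike extract(), this approach leaves the parsed tree unchanged, which may matter if the nested tags are needed later.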
