Replace <br> with space in BeautifulSoup output

Published: 2021/4/15 19:08:28 · Tags: python, web-scraping, beautifulsoup

Problem description

I am scraping a few links with BeautifulSoup. However, it seems to completely ignore <br> tags.

Here is the relevant portion of the source code of the URL I am scraping:

    <h1 class="para-title">A quick brown fox jumps over<br>the lazy dog
    <span id="something">&#xe800;</span></h1>

Here is my BeautifulSoup code (relevant part only) to get the text within the h1 tag:

    from bs4 import BeautifulSoup

    # page holds the HTML source of the scraped URL
    soup = BeautifulSoup(page, 'html.parser')
    title_box = soup.find('h1', attrs={'class': 'para-title'})
    title = title_box.text.strip()
    print(title)

This gives the following output:

    A quick brown fox jumps overthe lazy dog

I would like it to be:

    A quick brown fox jumps over the lazy dog

How can I replace <br> with a space in my code?

Recommended answer

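How about using .get_text() with the separator parameter? The .text property is the same as .get_text() with an empty separator, so the text fragments on either side of <br> are glued together directly; passing separator=" " puts a space between every pair of fragments.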

    from bs4 import BeautifulSoup

    page = '''<h1 class="para-title">A quick brown fox jumps over<br>the lazy dog
    <span>some stuff here</span></h1>'''

    soup = BeautifulSoup(page, 'html.parser')
    title_box = soup.find('h1', attrs={'class': 'para-title'})
    title = title_box.get_text(separator=" ").strip()
    print(title)

Output:

    A quick brown fox jumps over the lazy dog
     some stuff here
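The output above still contains the newline that followed "the lazy dog" in the source, which is why "some stuff here" lands on a second line with a leading space. If you want everything on one line, or want to change only the <br> tags as the question literally asks, two variations are sketched below. This sketch assumes a recent bs4 4.x with the built-in html.parser and reuses the sample markup from the answer; stripped_strings and replace_with are standard bs4 members, but check them against the version you have installed.

```python
from bs4 import BeautifulSoup

# Sample markup reused from the answer above.
page = '''<h1 class="para-title">A quick brown fox jumps over<br>the lazy dog
<span>some stuff here</span></h1>'''

soup = BeautifulSoup(page, 'html.parser')
title_box = soup.find('h1', attrs={'class': 'para-title'})

# Variation 1: stripped_strings yields each text fragment with leading and
# trailing whitespace removed (whitespace-only fragments are skipped),
# so joining them with a single space gives one clean line.
one_line = " ".join(title_box.stripped_strings)
print(one_line)

# Variation 2: swap each <br> tag for a plain space string, then read the
# text as the question originally did. This modifies the parsed tree in place.
for br in title_box.find_all('br'):
    br.replace_with(" ")
br_replaced = title_box.get_text().strip()
print(br_replaced)
```

stripped_strings is the simpler route to a single line. The replace_with loop is closer to the literal question: it changes only the <br> tags, so any other whitespace in the markup (here, the newline before the span) survives unless you collapse it yourself, for example with " ".join(text.split()).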
