How do I use BeautifulSoup4 to get ALL text before <br> tag

Question

I'm trying to scrape some data for my app. Here is the HTML code:

<tr>
  <td>
    This
    <a class="tip info" href="blablablablabla">is a first</a>
    sentence.
    <br>
    This
    <a class="tip info" href="blablablablabla">is a second</a>
    sentence.
    <br>This
    <a class="tip info" href="blablablablabla">is a third</a>
    sentence.
    <br>
  </td>
</tr>

I want the output to look like this:

This is a first sentence.
This is a second sentence.
This is a third sentence.

Is it possible to do this?

Recommended answer

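Try this; it should give you the desired output. The content variable used in the script below is assumed to hold the HTML pasted above. Note that the script relies on every sentence containing a link with the classes tip and info, and on that link having plain text directly before and after it.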

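from bs4 import BeautifulSoup

# content holds the HTML pasted in the question
soup = BeautifulSoup(content, "lxml")
lines = []
for item in soup.select(".tip.info"):
    # text before the link, the link text, and the text after the link
    parts = [item.previous_sibling, item.text, item.next_sibling]
    # collapse runs of whitespace so each sentence sits on one line
    lines.append(' '.join(''.join(parts).split()))
data = '\n'.join(lines)
print(data)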

Output:

This is a first sentence.
This is a second sentence.
This is a third sentence.
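
The script above finds sentences through the .tip.info links, so it would miss a line that has no such link. A more general sketch, assuming the goal is to collect everything that precedes each <br> inside the cell, walks the children of the <td> and starts a new line at every <br>. The function name text_before_each_br and the use of the built-in html.parser instead of lxml are assumptions made for this illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup, Tag

# Sample markup copied from the question.
content = """
<tr>
  <td>
    This
    <a class="tip info" href="blablablablabla">is a first</a>
    sentence.
    <br>
    This
    <a class="tip info" href="blablablablabla">is a second</a>
    sentence.
    <br>This
    <a class="tip info" href="blablablablabla">is a third</a>
    sentence.
    <br>
  </td>
</tr>
"""


def text_before_each_br(html):
    """Return one whitespace-normalised string per <br> in the first <td>.

    Hypothetical helper written for this example.
    """
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.find("td")
    lines, current = [], []
    for node in cell.children:
        if isinstance(node, Tag) and node.name == "br":
            # A <br> closes the current line: join the pieces collected so far
            # and collapse all runs of whitespace into single spaces.
            lines.append(" ".join("".join(current).split()))
            current = []
        elif isinstance(node, Tag):
            # Any other tag (here the <a> links) contributes its text.
            current.append(node.get_text())
        else:
            # Plain text between tags.
            current.append(str(node))
    # Text after the last <br> is deliberately dropped.
    return lines


print("\n".join(text_before_each_br(content)))
```

Because this version splits on the <br> tags themselves, commas or extra inline tags inside a sentence should not affect where the lines break.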
