Extract content within a tag with BeautifulSoup


Problem description

I'd like to extract the content Hello world. Note that there are multiple <table> elements and similar <td colspan="2"> cells on the page as well:

<table border="0" cellspacing="2" width="800">
  <tr>
    <td colspan="2"><b>Name: </b>Hello world</td>
  </tr>
  <tr>
...

I tried the following:

hello = soup.find(text='Name: ')
hello.findPreviousSiblings

But it returned nothing.

In addition, I'm having a problem extracting My home address from the following:

<td><b>Address:</b></td>

<td>My home address</td>

I'm using the same method to search for text="Address: ", but how do I navigate to the next cell and extract the content of that <td>?

Recommended answer

The contents attribute works well for extracting text from <tag>text</tag>. It returns a list of the tag's children, so for a tag that holds only text, the text is the first element of that list.

<td>My home address</td> example:

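from bs4 import BeautifulSoup

s = '<td>My home address</td>'
soup = BeautifulSoup(s, 'html.parser')
td = soup.find('td')  # <td>My home address</td>
td.contents           # ['My home address'] (a list of child nodes)
td.contents[0]        # 'My home address'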


<td><b>Address:</b></td> example:

s = '<td><b>Address:</b></td>'
soup = BeautifulSoup(s, 'html.parser')
b = soup.find('td').find('b')  # <b>Address:</b>
b.contents[0]                  # 'Address:'
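
The examples above read text from a tag you have already located. The question also asks how to get from a label to the value next to it, so here is a minimal sketch of that step. It assumes bs4 4.4.0 or later (for the string= argument). The combined HTML is an assumption: the snippet in the question is truncated, so the two fragments are joined into one hypothetical table. In the original attempt, soup.find(text='Name: ') returns the text node inside <b>, which has no siblings of its own, and findPreviousSiblings was never called (no parentheses). Searching for the <b> tag and moving forward from it avoids both issues.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment assembled from the two snippets in the question.
html = """
<table border="0" cellspacing="2" width="800">
  <tr>
    <td colspan="2"><b>Name: </b>Hello world</td>
  </tr>
  <tr>
    <td><b>Address:</b></td>
    <td>My home address</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Case 1: the value is the text node immediately after the <b> label.
name_label = soup.find("b", string="Name: ")
name = name_label.next_sibling.strip()

# Case 2: the value is in the <td> that follows the label's own <td>.
address_label = soup.find("b", string="Address:")
label_cell = address_label.find_parent("td")
address = label_cell.find_next_sibling("td").get_text(strip=True)

print(name)
print(address)
```

Note that string= does an exact match here, so the trailing space in "Name: " matters. If the spacing on the real page varies, passing a compiled regular expression to string= may be more robust.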
