Extract content within a tag with BeautifulSoup
Question
I'd like to extract the content Hello world. Please note that there are multiple <table> elements and similar <td colspan="2"> cells on the page as well:
<table border="0" cellspacing="2" width="800">
<tr>
<td colspan="2"><b>Name: </b>Hello world</td>
</tr>
<tr>
...
I tried the following:
hello = soup.find(text='Name: ')
hello.findPreviousSiblings
But it returned nothing.
In addition, I'm having trouble extracting My home address from the following:
<td><b>Address:</b></td>
<td>My home address</td>
I'm using the same method to search for text="Address: ", but how do I navigate down to the next line and extract the content of that <td>?
Recommended answer
The contents attribute works well for extracting text from <tag>text</tag>. It returns a list of the tag's child nodes, so for a tag that holds only text, the text is the first element of that list.
Example with <td>My home address</td>:
from bs4 import BeautifulSoup

s = '<td>My home address</td>'
soup = BeautifulSoup(s, 'html.parser')
td = soup.find('td')  # <td>My home address</td>
td.contents           # ['My home address'] (a list of child nodes)
td.contents[0]        # 'My home address'
Example with <td><b>Address:</b></td>:
from bs4 import BeautifulSoup

s = '<td><b>Address:</b></td>'
soup = BeautifulSoup(s, 'html.parser')
b = soup.find('td').find('b')  # <b>Address:</b>
b.contents                     # ['Address:']
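The question also asks how to move from a label to the value next to it, which contents alone does not cover. The following is a minimal sketch of one way to do that with sibling navigation, not the original answerer's method. It assumes the two snippets from the question sit in the same table (the combined html string is made up for illustration), uses the built-in html.parser, and relies on the string= argument available in reasonably recent bs4 releases. Note that the label in the second snippet is "Address:" with no trailing space, so a search for "Address: " would not match it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample assembled from the two snippets in the question.
html = """
<table border="0" cellspacing="2" width="800">
<tr>
<td colspan="2"><b>Name: </b>Hello world</td>
</tr>
<tr>
<td><b>Address:</b></td>
<td>My home address</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Case 1: the value is a text node right after the <b> label, inside the same <td>.
name_label = soup.find("b", string="Name: ")
name = name_label.next_sibling.strip()

# Case 2: the value lives in the next <td> of the same row.
# Climb from the <b> label to its <td>, then step to the following <td> sibling.
address_label = soup.find("b", string="Address:")
address_cell = address_label.find_parent("td").find_next_sibling("td")
address = address_cell.get_text(strip=True)

print(name)
print(address)
```

Matching on the exact label text keeps the lookup independent of how many other <table> or <td colspan="2"> elements are on the page, at the cost of breaking if the label's whitespace changes.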