Find index of tag with certain text in beautifulsoup/python
Problem description
I have a simple 4x2 html table that contains information about a property.
I'm trying to extract the value 1972, which is under the column heading of Year Built. If I find all the td tags, how do I extract the index of the tag that contains the text Year Built?
Because once I find that index, I can just add 4 to get to the tag that contains the value 1972.
Here is the html:
<table>
<tbody>
<tr>
<td>Building</td>
<td>Type</td>
<td>Year Built</td>
<td>Sq. Ft.</td>
</tr>
<tr>
<td>R01</td>
<td>DWELL</td>
<td>1972</td>
<td>1166</td>
</tr>
</tbody>
</table>
For example, I know that if my input is index 2 and my output is the text of that tag, Year Built, I can just do this:
from bs4 import BeautifulSoup
# myhtml holds the HTML shown above
soup = BeautifulSoup(myhtml, "html.parser")
td_list = soup.find_all('td')
print(td_list[2].text)
But how do I use the input text Year Built to get the output index 2?
If your table has a static layout, it is better to use row and column indexes. Try this:
rows = soup.find("table").find("tbody").find_all("tr")
# second row, third cell: the value under Year Built
print(rows[1].find_all("td")[2].get_text())
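If the column order could change, one variation on the row and column approach is to read the column position from the header row instead of hard-coding 2. The following is only a sketch: the variable names `html`, `headers`, `col` and `value` are chosen here for illustration, and it assumes the first row of the table holds the headings, as in the question.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; `html` is a name chosen for this sketch.
html = """
<table>
<tbody>
<tr>
<td>Building</td>
<td>Type</td>
<td>Year Built</td>
<td>Sq. Ft.</td>
</tr>
<tr>
<td>R01</td>
<td>DWELL</td>
<td>1972</td>
<td>1166</td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")

# Assumption: the first row holds the column headings.
headers = [td.get_text(strip=True) for td in rows[0].find_all("td")]

# list.index raises ValueError if the heading is not present.
col = headers.index("Year Built")

# Read the cell in the same column of the data row.
value = rows[1].find_all("td")[col].get_text(strip=True)
print(col, value)
```

Looking the column up by its heading keeps the code working if columns are reordered, at the cost of failing loudly when the heading text changes.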
Alternatively, if you just want to find the index number of the tag containing "Year Built":
from bs4 import BeautifulSoup
soup = BeautifulSoup(myhtml, "html.parser")
td_list = soup.find_all('td')
ind = None  # stays None if no tag matches
for i, elem in enumerate(td_list):
    if elem.text == 'Year Built':
        ind = i
print(ind)
print(td_list[ind].text)
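To connect this back to the question's idea of adding 4 to the index, here is a minimal sketch that wraps the search in a helper and derives the offset from the number of cells in the first row. The helper name `index_of_text` and the variables `html` and `n_cols` are hypothetical, introduced only for this example, and the sketch assumes every row has the same number of cells.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; `html` is a name chosen for this sketch.
html = """
<table>
<tbody>
<tr>
<td>Building</td>
<td>Type</td>
<td>Year Built</td>
<td>Sq. Ft.</td>
</tr>
<tr>
<td>R01</td>
<td>DWELL</td>
<td>1972</td>
<td>1166</td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
td_list = soup.find_all("td")


def index_of_text(cells, text):
    """Return the index of the first cell whose stripped text equals `text`, or None."""
    return next(
        (i for i, cell in enumerate(cells) if cell.get_text(strip=True) == text),
        None,
    )


ind = index_of_text(td_list, "Year Built")

# Assumption: every row has as many cells as the first row,
# so the cell directly below is n_cols positions further on.
n_cols = len(soup.find("tr").find_all("td"))

if ind is not None:
    print(ind, td_list[ind + n_cols].get_text(strip=True))
```

Returning None for a missing heading lets the caller decide how to handle it, rather than hitting a NameError on an unset index.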