Find index of tag with certain text in BeautifulSoup/Python

Question

I have a simple HTML table (four columns, two rows) that contains information about a property.

I'm trying to extract the value 1972, which is under the column heading of Year Built. If I find all the tags td, how do I extract the index of the tag that contains the text Year Built?

Because once I find that index, I can just add 4 to get to the tag that contains the value 1972.

Here is the HTML:

<table>
    <tbody>
        <tr>
            <td>Building</td>
            <td>Type</td>
            <td>Year Built</td>
            <td>Sq. Ft.</td>
        </tr>
        <tr>
            <td>R01</td>
            <td>DWELL</td>
            <td>1972</td>
            <td>1166</td>
        </tr>   
    </tbody>
</table>

For example, I know that if my input is the index 2 and the output I want is the text of that tag, Year Built, I can just do this:

from bs4 import BeautifulSoup

# myhtml holds the HTML table shown above
soup = BeautifulSoup(myhtml, "html.parser")
td_list = soup.find_all('td')
print(td_list[2].text)

But how do I go the other way, using the text Year Built as input to get the index 2 as output?

Answer

If your table has a fixed layout, it is better to use row and column indexes. Try this:

# soup is the BeautifulSoup object built from the HTML in the question
rows = soup.find("table").find("tbody").find_all("tr")
print(rows[1].find_all("td")[2].get_text())
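
If the column position is not known in advance, the same row-based idea can be taken one step further by pairing each header cell with the data cell beneath it. The following is only a sketch: it assumes the table has exactly one header row followed by one data row, as in the question, and the names headers, values and record are illustrative rather than part of the original answer.

```python
from bs4 import BeautifulSoup

myhtml = """
<table>
    <tbody>
        <tr><td>Building</td><td>Type</td><td>Year Built</td><td>Sq. Ft.</td></tr>
        <tr><td>R01</td><td>DWELL</td><td>1972</td><td>1166</td></tr>
    </tbody>
</table>
"""

soup = BeautifulSoup(myhtml, "html.parser")
rows = soup.find("table").find_all("tr")

# Assumption: rows[0] is the header row and rows[1] is the only data row.
headers = [td.get_text(strip=True) for td in rows[0].find_all("td")]
values = [td.get_text(strip=True) for td in rows[1].find_all("td")]

# Map each column heading to the value in the same column.
record = dict(zip(headers, values))
print(record["Year Built"])
```

With this mapping the lookup is by heading name, so no index arithmetic is needed.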

Alternatively, if you just want to find the index of the tag containing "Year Built":

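from bs4 import BeautifulSoup

soup = BeautifulSoup(myhtml, "html.parser")
td_list = soup.find_all('td')

# Find the position of the first <td> whose text is exactly 'Year Built'
ind = None
for i, elem in enumerate(td_list):
    if elem.text == 'Year Built':
        ind = i
        break

# ind stays None if no cell matched
if ind is not None:
    print(ind, td_list[ind].text)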
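
The loop above can also be written with list.index, and the offset of 4 mentioned in the question can be derived from the table instead of being hard-coded. This is a minimal sketch under the assumption that every row has the same number of cells and that the data row immediately follows the header row; td_texts, header_index and n_cols are names chosen here for illustration.

```python
from bs4 import BeautifulSoup

myhtml = """
<table>
    <tbody>
        <tr><td>Building</td><td>Type</td><td>Year Built</td><td>Sq. Ft.</td></tr>
        <tr><td>R01</td><td>DWELL</td><td>1972</td><td>1166</td></tr>
    </tbody>
</table>
"""

soup = BeautifulSoup(myhtml, "html.parser")

# Stripped text of every <td>, in document order.
td_texts = [td.get_text(strip=True) for td in soup.find_all("td")]

# list.index raises ValueError if no cell has exactly this text.
header_index = td_texts.index("Year Built")

# Number of columns, taken from the first row (assumed equal for all rows).
n_cols = len(soup.find("tr").find_all("td"))

# The value sits one full row after the header cell.
value = td_texts[header_index + n_cols]
print(header_index, value)
```

Stripping whitespace before comparing makes the match tolerant of indentation or line breaks inside the cells.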
