BeautifulSoup getting content behind multiple <div> levels

Problem description

How can I get the time value nested inside two <div> elements with BeautifulSoup?

<div>
<div>
6:00.00
</div>
</div>

I've tried the following code:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.energystorageexchange.org/projects/2") 
soup = BeautifulSoup(page.content, 'lxml')

rows = soup.select("div.div")

for r in rows:
    print(r)

but it doesn't work that easily.
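The selector is the problem: in CSS, `div.div` means a `<div>` whose `class` attribute contains `div`, so it matches nothing in this markup. Nesting is expressed with a combinator instead: a space for any descendant, or `>` for a direct child. A minimal sketch against the two-`<div>` fragment above, parsed with the stdlib `html.parser` rather than `lxml` only to avoid the extra dependency:

```python
from bs4 import BeautifulSoup

html = """
<div>
<div>
6:00.00
</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.div" is a class selector: <div class="div">, which does not exist here
print(len(soup.select("div.div")))  # → 0

# "div > div" is a child combinator: a <div> directly inside another <div>
inner = soup.select_one("div > div")
print(inner.get_text(strip=True))  # → 6:00.00
```

On a full page, `div > div` will likely match a great many elements, so it is only practical for a fragment this small; anchoring on nearby text, such as the field's label, is more reliable.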

A larger excerpt of the page's HTML:

<div class='row'>
<hr class='border zeropadding zeromargin'>
<div class='col-md-6 zeropadding'>
<label class='new_font'>Duration at Rated Power (HH:MM)</label>
</div>
<div class='col-md-6 new_font'>
<div></div>
<div>
<div>
6:00.00
</div>
</div>

</div>
</hr>
</div>
<div class='row'>
<hr class='border zeropadding zeromargin'>
<div class='col-md-6 zeropadding new_font'>
<label class='new_font'>Weblink1</label>
</div>
<div class='col-md-6 new_font'>
<div>
<div class='show_value'>
<a href="http://www.gillsonions.com/node/192" target='_new' class='boldbluelink'>http://www.gillsonions.com/node/192</a>
</div>
</div>

It's from https://www.energystorageexchange.org/projects/2

Thanks for any help.
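Because the inner `<div>`s carry no class or id, one more robust route is to anchor on the visible label text and walk over to the neighboring value `<div>`. A sketch against the first row of the sample above, assuming the label text matches exactly and the value `<div>` is the next sibling of the label's wrapper (true in this sample; not checked against the live page):

```python
from bs4 import BeautifulSoup

# First "row" of the HTML sample from the question
html = """
<div class='row'>
<hr class='border zeropadding zeromargin'>
<div class='col-md-6 zeropadding'>
<label class='new_font'>Duration at Rated Power (HH:MM)</label>
</div>
<div class='col-md-6 new_font'>
<div></div>
<div>
<div>
6:00.00
</div>
</div>

</div>
</hr>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the field by its label text instead of by the anonymous nested <div>s
label = soup.find("label", string="Duration at Rated Power (HH:MM)")

# <label> -> its wrapping <div> -> the sibling <div> that holds the value
value_div = label.find_parent("div").find_next_sibling("div")

# strip=True drops the surrounding newlines and the empty <div>
duration = value_div.get_text(strip=True)
print(duration)  # → 6:00.00
```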

2nd Question:

I would also like to capture the size in kW from this hidden input:

<input id='size_in_kw' type='hidden' value='1500'>

I've tried this, but it seems to be incomplete:

value = soup.find('input', {'id': 'size_in_kw'}).get('value')
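For the markup shown, that line already returns the string `'1500'`. What it does not handle is a page where the `<input>` is missing (then `find()` returns `None` and `.get` raises `AttributeError`), nor the conversion to a number. A slightly more defensive sketch, assuming the value is always a whole number of kW:

```python
from bs4 import BeautifulSoup

html = "<input id='size_in_kw' type='hidden' value='1500'>"
soup = BeautifulSoup(html, "html.parser")

# find() returns None when no matching tag exists, so check before reading it
tag = soup.find("input", id="size_in_kw")
if tag is not None and tag.has_attr("value"):
    size_kw = int(tag["value"])  # assumes an integer string such as '1500'
else:
    size_kw = None

print(size_kw)  # → 1500
```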

Solution

To your second question:

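# assumes `item` is a <label> Tag (e.g. from soup.find_all("label")) and `output` is a list
if "kW" in item.text:
    # label -> its parent <div> -> the next sibling <div>, which holds the value
    itemval = item.find_parent().find_next_sibling().text.strip()
    output.append(itemval)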
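A self-contained sketch of the same parent/next-sibling walk, wrapped in a hypothetical helper `values_for()` and run on the question's HTML sample (its truncated second row is closed off here so the fragment is well-formed). Passing `"div"` to `find_parent()` and `find_next_sibling()` is an added safeguard, not part of the original answer:

```python
from bs4 import BeautifulSoup

# The question's HTML sample, with closing tags added to the cut-off second row
html = """
<div class='row'>
<hr class='border zeropadding zeromargin'>
<div class='col-md-6 zeropadding'>
<label class='new_font'>Duration at Rated Power (HH:MM)</label>
</div>
<div class='col-md-6 new_font'>
<div></div>
<div>
<div>
6:00.00
</div>
</div>

</div>
</hr>
</div>
<div class='row'>
<hr class='border zeropadding zeromargin'>
<div class='col-md-6 zeropadding new_font'>
<label class='new_font'>Weblink1</label>
</div>
<div class='col-md-6 new_font'>
<div>
<div class='show_value'>
<a href="http://www.gillsonions.com/node/192" target='_new' class='boldbluelink'>http://www.gillsonions.com/node/192</a>
</div>
</div>
</div>
</hr>
</div>
"""


def values_for(soup, needle):
    """Hypothetical helper: for every <label> whose text contains `needle`,
    return the stripped text of the value <div> next to the label's wrapper."""
    output = []
    for item in soup.find_all("label"):
        if needle in item.text:
            # <label> -> wrapping <div> -> next sibling <div> holding the value
            itemval = item.find_parent("div").find_next_sibling("div").text.strip()
            output.append(itemval)
    return output


soup = BeautifulSoup(html, "html.parser")
print(values_for(soup, "Duration"))  # → ['6:00.00']
print(values_for(soup, "Weblink"))   # → ['http://www.gillsonions.com/node/192']
print(values_for(soup, "kW"))        # → []
```

The sample contains no label with "kW" in it, so the last call returns an empty list here; whether the live page has such a label is not shown in the question.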
