BeautifulSoup: Print divs based on the content of the preceding tag
Question
I would like to select the contents of elements based on the preceding tag:
<h4>Models & Products</h4>
<div class="profile-area">...</div>
<h4>Production Capacity (year)</h4>
<div class="profile-area">...</div>
How can I get the text of each "profile-area" div based on the content of the h4 tag that precedes it?
Here is my code:
import requests
from bs4 import BeautifulSoup
import csv
import re
html_doc = """
<html>
<body>
<div class="col-md-6">
<iframe class="factory_detail_google_map" frameborder="0" src=
"https://www.google.com/maps/embed/v1/search?q=3.037787%2C101.38189&key=AIzaSyCMDADp9QHYbQ8OBGl8puAOv-16W8ziz7Y"
allowfullscreen=""></iframe>
</div>
<div class="col-md-12">
<h4>Models & Products</h4>
<div class="profile-area">
Large Buses, Trucks, Trailer-heads
</div>
<h4>Production Capacity (year)</h4>
<div class="profile-area">
Vehicle 700 units /year
</div>
<h4>Output</h4>
<div class="profile-area">
Vehicle 356 units ( 2016 )
</div>
<div class="profile-area">
Vehicle 477 units ( 2015 )
</div>
<div class="profile-area">
Vehicle 760 units ( 2014 )
</div>
<div class="profile-area">
Vehicle 647 units ( 2013 )
</div>
</div>
</body>
</html>
"""
# Parse the string defined above (it is named html_doc, not html)
soup = BeautifulSoup(html_doc, 'lxml')
#link=soup.iframe.get('src')
#print(link.split("%2C"))
for item in soup.select("div.profile-area"):
    print(item.text)
As you can see, I'm also trying to split the Google Maps link into coordinates, but I can probably figure that out on my own.
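For the coordinate part, a minimal sketch: rather than splitting the raw URL on "%2C", parse the query string with the standard library, which percent-decodes "%2C" into ",". This assumes the coordinates always sit in the q parameter as "lat,lng", as in the iframe above; coords_from_embed_url is a hypothetical helper name, and the API key is replaced with a placeholder. In the question's code, src would come from soup.iframe.get('src').

```python
from urllib.parse import urlparse, parse_qs


def coords_from_embed_url(src):
    """Hypothetical helper: return (lat, lng) floats from the q= parameter.

    Assumes q holds "lat,lng" (URL-encoded as "lat%2Clng").
    """
    # parse_qs decodes percent-escapes, so "%2C" becomes ","
    query = parse_qs(urlparse(src).query)
    lat, lng = query["q"][0].split(",")
    return float(lat), float(lng)


# Same URL shape as the iframe in the question; YOUR_API_KEY is a placeholder
src = ("https://www.google.com/maps/embed/v1/search"
       "?q=3.037787%2C101.38189&key=YOUR_API_KEY")
print(coords_from_embed_url(src))  # (3.037787, 101.38189)
```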
Thanks for your help!
Recommended answer
Use .find_previous_sibling() to explicitly find the first preceding h4 tag:
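for item in soup.select("div.profile-area"):
    # nearest h4 before this div at the same nesting level
    prev_h4 = item.find_previous_sibling('h4').text
    if 'Capacity' in prev_h4:
        # strip=True drops the newlines around the div's text
        print(item.get_text(strip=True))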
Output:
Vehicle 700 units /year
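The same find_previous_sibling() idea can collect every section at once instead of filtering for one heading. A minimal sketch, assuming each profile-area div and its heading are siblings under the same parent, as in the question's HTML (trimmed here). profile_by_heading is a hypothetical helper name, and "html.parser" is used instead of "lxml" only to avoid the extra dependency.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

html_doc = """
<div class="col-md-12">
<h4>Models &amp; Products</h4>
<div class="profile-area">Large Buses, Trucks, Trailer-heads</div>
<h4>Production Capacity (year)</h4>
<div class="profile-area">Vehicle 700 units /year</div>
<h4>Output</h4>
<div class="profile-area">Vehicle 356 units ( 2016 )</div>
<div class="profile-area">Vehicle 477 units ( 2015 )</div>
</div>
"""


def profile_by_heading(soup):
    """Hypothetical helper: map each h4 text to the profile-area texts after it."""
    sections = defaultdict(list)
    for div in soup.select("div.profile-area"):
        heading = div.find_previous_sibling("h4")
        if heading is None:  # no h4 before this div at the same level
            continue
        sections[heading.get_text(strip=True)].append(div.get_text(strip=True))
    return dict(sections)  # keeps headings in document order


soup = BeautifulSoup(html_doc, "html.parser")
data = profile_by_heading(soup)
print(data["Output"])  # ['Vehicle 356 units ( 2016 )', 'Vehicle 477 units ( 2015 )']
```

Because several divs can follow one h4 (the "Output" section has one per year), each heading maps to a list rather than a single string.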