BeautifulSoup partial div class matching
Question
I need to fetch milestone information from GitHub by scraping.
The milestone information is embedded in two types of div classes: table-list-item milestone notdue and table-list-item milestone.
How can I retrieve the information contained in both classes?
I have:
milestones = soup.find_all('div', {'class': 'table-list-item milestone'})
but this line returns nothing for the divs whose class is table-list-item milestone notdue.
Right now I am doing the following (ugly hack):
milestones = soup.find_all('div', {'class': 'table-list-item milestone'})
milestones.extend(soup.find_all('div', {'class': 'table-list-item milestone notdue'}))
Is there any elegant solution for this?
As per this question, BeautifulSoup is supposed to return all matching ones. My issue is exactly the opposite!
Recommended answer
soup.find_all('div', {'class': 'milestone'})
Or use a CSS selector:
soup.select('.milestone')
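If matching on milestone alone is too broad, a CSS selector can require several classes at once. The following is a minimal sketch, assuming markup similar to what the question describes; the HTML string and the milestone names are made up for illustration and are not taken from GitHub.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the milestones page.
html = """
<div class="table-list-item milestone notdue">Milestone A</div>
<div class="table-list-item milestone">Milestone B</div>
<div class="table-list-item">Not a milestone</div>
<span class="milestone">Wrong tag</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Select <div> tags that carry both classes, in any order,
# whether or not extra classes such as "notdue" are present.
milestones = soup.select("div.table-list-item.milestone")
names = [div.get_text(strip=True) for div in milestones]
print(names)
```

Chaining classes in the selector keeps the match independent of class order and of any additional classes on the tag.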
In bs4, class is a multi-valued attribute:
Its value is stored as a list: ['table-list-item', 'milestone', 'notdue'] and ['table-list-item', 'milestone'].
All you need to do is search for a value the two share, such as milestone.
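To see the behaviour in isolation, here is a small sketch using made-up HTML (the markup and milestone names are assumptions for illustration). It shows the class attribute parsed into a list, a single-value search matching both divs, and the multi-word string from the question matching only the div whose class attribute is exactly that string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the two class combinations from the question.
html = """
<div class="table-list-item milestone notdue">Milestone A</div>
<div class="table-list-item milestone">Milestone B</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Each class attribute is split into a list of individual values.
class_lists = [div["class"] for div in soup.find_all("div")]

# A single class value matches every tag whose list contains it.
by_shared_value = soup.find_all("div", class_="milestone")

# A multi-word string is compared against the whole attribute string,
# so the "notdue" div is not matched.
exact = soup.find_all("div", class_="table-list-item milestone")

print(class_lists)
print(len(by_shared_value), len(exact))
```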