BeautifulSoup partial div class matching

Question

I need to fetch milestone information from GitHub by scraping. The milestone information is embedded in two types of div classes: table-list-item milestone notdue and table-list-item milestone.

How can I retrieve the information contained in both classes?

I have: milestones = soup.find_all('div', {'class': 'table-list-item milestone'}), but this line returns nothing for the divs whose class is table-list-item milestone notdue.

Right now I am doing the following (ugly hack):

milestones = soup.find_all('div', {'class':'table-list-item milestone'})
milestones.extend(soup.findAll('div', {'class': 'table-list-item milestone notdue'}))

Is there a more elegant solution for this?

According to this question, BeautifulSoup is supposed to return all matching elements. My issue is exactly the opposite!

Recommended answer

soup.find_all('div', {'class': 'milestone'})

Or use a CSS selector:

soup.select('.milestone')
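Both calls can be tried on a small document. The following is a minimal sketch: the HTML is hypothetical markup standing in for the GitHub milestones page, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the GitHub milestones page.
html = """
<div class="table-list-item milestone notdue">Milestone A</div>
<div class="table-list-item milestone">Milestone B</div>
<div class="table-list-item">Not a milestone</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match any div whose class list contains "milestone".
by_class = soup.find_all("div", class_="milestone")

# CSS selector variant; requiring both classes is slightly stricter
# than '.milestone' alone.
by_selector = soup.select("div.table-list-item.milestone")

names = [div.get_text(strip=True) for div in by_class]
print(names)
```

Here both queries select the same two divs. The stricter selector is only worth using if other elements on the page also carry the milestone class.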

In bs4, class is a multi-valued attribute:

It is stored as a list: ['table-list-item', 'milestone', 'notdue'] and ['table-list-item', 'milestone'].

All you need to do is search for the shared value, such as milestone.
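The difference between the two searches can be seen in a short sketch. It assumes two hypothetical divs shaped like the ones in the question. A multi-word class string is compared against the whole attribute value, while a single class name is compared against each entry in the list.

```python
from bs4 import BeautifulSoup

# Hypothetical divs mirroring the two class combinations in the question.
html = (
    '<div class="table-list-item milestone notdue">A</div>'
    '<div class="table-list-item milestone">B</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Each div's class attribute is parsed into a list of class names.
class_lists = [div["class"] for div in soup.find_all("div")]
print(class_lists)

# A multi-word string only matches a div whose class attribute is
# exactly that string, so the "notdue" div is skipped.
exact = soup.find_all("div", {"class": "table-list-item milestone"})
print([div.get_text() for div in exact])

# A single shared class name matches both divs.
shared = soup.find_all("div", {"class": "milestone"})
print([div.get_text() for div in shared])
```

This is why the query in the question missed the notdue divs: the string it passed was matched as a whole rather than as individual class names.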
