BeautifulSoup - How to get all links inside a block with a certain class?
Problem description
I have the following HTML DOM:
<div class="meta-info meta-info-wide">
  <div class="title">Разработчик</div>
  <div class="content contains-text-link">
    <a class="dev-link" href="http://www.jourist.com&sa=D&usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg" rel="nofollow" target="_blank">Перейти на веб-сайт</a>
    <a class="dev-link" href="mailto:info@jourist.com" rel="nofollow" target="_blank">Написать: info@jourist.com</a>
    <div class="content physical-address">Diagonalstraße 41
20537 Hamburg</div>
  </div>
</div>
I need to get all links (the URLs) with class dev-link inside the block div.meta-info-wide.
I tried this obvious approach, but it does not work:
divTag = soup.find_all("div", {"class":"meta-info-wide"})
print(len(divTag))
for tag in divTag:
    tdTags = tag.find_all("a", {"class":"dev-link"})
    for tag in tdTags:
        print tag.text
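One likely reason the attempt above seems not to work, assuming the goal is the URLs rather than the visible link text: find_all() does match a div whose class attribute is "meta-info meta-info-wide" when you search for the single class "meta-info-wide", but the inner loop prints tag.text (the link text, e.g. "Перейти на веб-сайт") instead of the href attribute, and print tag.text is Python 2 syntax that fails under Python 3. A minimal Python 3 sketch of the same find_all() approach that reads href instead; the trimmed markup, the extra links and the example.com URLs are hypothetical, added only to show what gets filtered out:

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical variant of the question's markup
html = """
<div class="meta-info meta-info-wide">
  <div class="title">Разработчик</div>
  <div class="content contains-text-link">
    <a class="dev-link" href="mailto:info@jourist.com">Написать: info@jourist.com</a>
    <a class="other-link" href="https://example.com/ignored">wrong class</a>
  </div>
</div>
<a class="dev-link" href="https://example.com/outside">outside the block</a>
"""

soup = BeautifulSoup(html, "html.parser")

urls = []
# class_ matches when "meta-info-wide" is any one of the div's classes
for div in soup.find_all("div", class_="meta-info-wide"):
    # Only <a class="dev-link"> tags nested inside this div are returned
    for link in div.find_all("a", class_="dev-link"):
        urls.append(link.get("href"))  # the URL, not link.text

print(urls)
```

The link with the wrong class and the dev-link anchor outside the block are both skipped, so only the mailto URL is collected.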
Recommended answer
Try the following:
# -*- coding: utf-8 -*-
# (the coding line is required in Python 2 because the HTML contains non-ASCII text)
import bs4

html = """
<div class="meta-info meta-info-wide"> <div class="title">Разработчик</div> <div class="content contains-text-link">
<a class="dev-link" href="http://www.jourist.com&sa=D&usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg" rel="nofollow" target="_blank">Перейти на веб-сайт</a>
<a class="dev-link" href="mailto:info@jourist.com" rel="nofollow" target="_blank">Написать: info@jourist.com</a>
<div class="content physical-address">Diagonalstraße 41
20537 Hamburg</div> </div> </div>"""

soup = bs4.BeautifulSoup(html, "html.parser")

# Every <div> whose class list contains "meta-info-wide"
for div in soup.find_all("div", {"class": "meta-info-wide"}):
    # Every <a class="dev-link"> nested inside that div
    for link in div.select("a.dev-link"):
        print(link['href'])
This gives you:
http://www.jourist.com&sa=D&usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg
mailto:info@jourist.com
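select() returns all a tags inside the div that match the CSS selector a.dev-link, i.e. that have the class dev-link. A CSS selector is the recommended method when two or more CSS classes are involved: find_all() with a class filter matches a single class (or the exact, full class attribute string), whereas a selector such as div.meta-info.meta-info-wide requires both classes regardless of the order they appear in.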
Tested with BeautifulSoup 4.5.1 and Python 2.7.12.
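To illustrate the multi-class point, here is a small Python 3 sketch. It was not part of the tested answer above, and the markup is a trimmed, hypothetical stand-in for the question's HTML. It compares a single-class find_all(), an exact-string find_all(), and a CSS selector that requires both classes. It also shows that one descendant selector can replace the answer's nested loops.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one "wide" block and one plain meta-info block
html = """
<div class="meta-info meta-info-wide">
  <a class="dev-link" href="mailto:info@jourist.com">mail</a>
</div>
<div class="meta-info">
  <a class="dev-link" href="https://example.com/narrow">narrow block</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# One class: matches any div that has "meta-info-wide" among its classes
one_class = soup.find_all("div", class_="meta-info-wide")

# A string containing a space is compared with the whole class attribute,
# so it only matches when the classes are written in exactly that order
exact = soup.find_all("div", class_="meta-info meta-info-wide")
reordered = soup.find_all("div", class_="meta-info-wide meta-info")

# CSS selector: the div must carry both classes, in any order
both = soup.select("div.meta-info-wide.meta-info")

# Descendant combinator: dev-link anchors anywhere inside a meta-info-wide div
hrefs = [a["href"] for a in soup.select("div.meta-info-wide a.dev-link")]

print(len(one_class), len(exact), len(reordered), len(both))
print(hrefs)
```

The reordered string finds nothing while the two-class selector still matches, which is why a selector is the more robust choice when several classes must all be present.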