Web scraping with Python and Beautiful Soup


Problem description

I am practicing building web scrapers. The one I am working on now involves going to a site, scraping the links for the various cities listed there, then following each city's link and scraping all of the links for the properties in said cities.

I am using the following code:

import requests
from bs4 import BeautifulSoup

main_url = "http://www.chapter-living.com/"

# Getting individual cities url
re = requests.get(main_url)
soup = BeautifulSoup(re.text, "html.parser")
city_tags = soup.find_all('a', class_="nav-title")  # Bottom page not loaded dynamically
cities_links = [main_url + tag["href"] for tag in city_tags.find_all("a")]  # Links to cities

If I print out city_tags I get the HTML I want. However, when I print cities_links I get AttributeError: 'ResultSet' object has no attribute 'find_all'.

I gather from other questions on here that this error occurs when city_tags returns nothing, but that can't be the case if it is printing out the desired HTML? I have noticed that said HTML is wrapped in [] - does this make a difference?
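The brackets are a clue: find_all returns a ResultSet, which behaves like a Python list of Tag objects, so it prints inside []. This can be checked offline; the HTML snippet below is a made-up stand-in for the live page:

```python
from bs4 import BeautifulSoup

# Minimal stand-in for the page's navigation links
html = '<a class="nav-title" href="/blog/">Blog</a><a class="nav-title" href="/events/">Events</a>'

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("a", class_="nav-title")

print(type(tags))  # a bs4 ResultSet, which subclasses list
print(tags)        # prints like a list of tags, hence the surrounding []
```

So a non-empty ResultSet printing the desired HTML is entirely consistent with the AttributeError: the set holds Tag nodes, but the set itself has no find_all method.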

Answer

As the error says, city_tags is a ResultSet, which is a list of nodes, and it doesn't have a find_all method. You either have to loop through the set and apply find_all to each individual node, or, in your case, I think you can simply extract the href attribute from each node:

[tag['href'] for tag in city_tags]

#['https://www.chapter-living.com/blog/',
# 'https://www.chapter-living.com/testimonials/',
# 'https://www.chapter-living.com/events/']
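Putting that fix back into the original script, the comprehension iterates over city_tags directly. One extra caution: prefixing main_url by string concatenation breaks when an href is already absolute, so the standard library's urljoin is a safer join. The HTML below is a stand-in for the live page (in the real script it would be requests.get(main_url).text):

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

main_url = "http://www.chapter-living.com/"

# Stand-in for the fetched page: one relative and one absolute href
html = '''
<a class="nav-title" href="/blog/">Blog</a>
<a class="nav-title" href="https://www.chapter-living.com/events/">Events</a>
'''

soup = BeautifulSoup(html, "html.parser")
city_tags = soup.find_all("a", class_="nav-title")

# Iterate over the ResultSet itself; each element is a Tag with an href
cities_links = [urljoin(main_url, tag["href"]) for tag in city_tags]
print(cities_links)
```

urljoin resolves "/blog/" against the base URL but leaves the already-absolute events link untouched, so both kinds of href come out as full URLs.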

