Python scraping go to next page using BeautifulSoup

Views: 90 Published: 2020/5/17 19:59:13 Tags: python web-scraping beautifulsoup next attributeerror


Question

This is my scraping code:

import requests
from bs4 import BeautifulSoup as soup

def get_emails(_links:list):
    for i in range(len(_links)):
        new_d = soup(requests.get(_links[i]).text, 'html.parser').find_all('a', {'class':'my_modal_open'})
        if new_d:
            yield new_d[-1]['title']

start=20
while True:
    d = soup(requests.get('http://www.schulliste.eu/type/gymnasien/?bundesland=&start=20').text, 'html.parser')

    results = [i['href'] for i in d.find_all('a')][52:-9]
    results = [link for link in results if link.startswith('http://')]
    print(list(get_emails(results)))

    next_page=soup.find('div', {'class': 'paging'}, 'weiter')

    if next_page:
        d=next_page.get('href')
        start+=20
    else:
        break

And that's the error I get: AttributeError: 'str' object has no attribute 'find_all'

When you press the "weiter" (next page) button, the end of the URL changes from "...start=20" to "start=40". It goes up in steps of 20 because there are 20 results per page. Does anyone know the reason for the error?

Recommended answer

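You put the parsed page (the "soup") in a variable called 'd'. The name 'soup' is only the alias under which the BeautifulSoup class was imported, so soup.find(...) calls the method on the class itself and the string 'div' ends up where the parsed document should be, which is why the error mentions a 'str' object.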
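To see why the message talks about a 'str' object, here is a minimal sketch that reproduces the same kind of failure without any network access. The Page class is a hypothetical stand-in written for this illustration, not BeautifulSoup's actual code; it only assumes that find delegates to find_all on the instance.

```python
class Page:
    """Hypothetical stand-in for a parsed document; not BeautifulSoup's real code."""

    def find_all(self, name):
        # Pretend every lookup matches exactly one element.
        return [name]

    def find(self, name):
        # Delegate to find_all on the instance and return the first match.
        results = self.find_all(name)
        return results[0] if results else None


d = Page()
first = d.find('div')        # called on an instance: self is the Page object
print(first)

try:
    Page.find('div', 'a')    # called on the class: the string 'div' is bound to self
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Calling the method through the class means the first positional argument takes the place of self, so the lookup for find_all happens on a plain string.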

So replace the following line:

next_page=soup.find('div', {'class': 'paging'}, 'weiter')

with this:

next_page = d.find('div', {'class': 'paging'}, 'weiter')
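The change above removes the AttributeError, but the loop in the question still requests the hard-coded "start=20" URL on every pass, so it would keep fetching the same page. Below is a rough sketch of one way to step through the pages using the "start" parameter the question describes. The helper names (page_url, fetch_links, iter_pages), the max_pages cap, and the stop condition (an empty result list means there are no more pages) are assumptions made for this illustration and have not been checked against the live site.

```python
from urllib.parse import urlencode

BASE_URL = 'http://www.schulliste.eu/type/gymnasien/'
PAGE_SIZE = 20  # the question states there are 20 results per page


def page_url(start):
    """Build the listing URL for a given 'start' offset."""
    return BASE_URL + '?' + urlencode({'bundesland': '', 'start': start})


def fetch_links(url):
    """Download one listing page and return its external links."""
    # Imported here so page_url and iter_pages can be tried without network access.
    import requests
    from bs4 import BeautifulSoup

    page = BeautifulSoup(requests.get(url).text, 'html.parser')
    # The [52:-9] slice comes from the question and depends on the page layout.
    hrefs = [a['href'] for a in page.find_all('a', href=True)][52:-9]
    return [href for href in hrefs if href.startswith('http://')]


def iter_pages(fetch=fetch_links, start=20, max_pages=5):
    """Yield (url, links) page by page, advancing 'start' by PAGE_SIZE.

    Assumption: a page with no matching links means we ran past the last page.
    max_pages is a safety cap so the loop always terminates.
    """
    for _ in range(max_pages):
        url = page_url(start)
        links = fetch(url)
        if not links:
            break
        yield url, links
        start += PAGE_SIZE
```

A possible way to use it is `for url, links in iter_pages(): print(url, len(links))`, passing each list of links on to get_emails from the question. Passing a different fetch function makes it possible to try the paging logic without touching the site.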
