Scraping multiple pages in Python with BeautifulSoup


Problem description

I have managed to write code that scrapes data from the first page, and now I am stuck on writing a loop in this code to scrape the next 'n' pages. Below is the code.

I would appreciate it if someone could guide/help me write the code that would scrape the data from the remaining pages.

Thanks!

from bs4 import BeautifulSoup
import requests
import csv


url = requests.get('https://wsc.nmbe.ch/search?sFamily=Salticidae&fMt=begin&sGenus=&gMt=begin&sSpecies=&sMt=begin&multiPurpose=slsid&sMulti=&mMt=contain&searchSpec=s').text

soup = BeautifulSoup(url, 'lxml')

elements = soup.find_all('div', style="border-bottom: 1px solid #C0C0C0; padding: 10px 0;")
#print(elements)

csv_file = open('wsc_scrape.csv', 'w')

csv_writer = csv.writer(csv_file)

csv_writer.writerow(['sp_name', 'species_author', 'status', 'family'])


for element in elements:
    sp_name = element.i.text.strip()
    print(sp_name)



    status = element.find('span', class_ = ['success label', 'error label']).text.strip()
    print(status)




    author_family = element.i.next_sibling.strip().split('|')
    species_author = author_family[0].strip()
    family = author_family[1].strip()
    print(species_author)
    print(family)


    print()

    csv_writer.writerow([sp_name, species_author, status, family])

csv_file.close()
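The field extraction above hinges on `element.i.next_sibling`: the author and family live in the text node that follows the closing `</i>` tag, separated by a `|`. A minimal self-contained check of that parsing logic, using assumed markup that mirrors one result row (the live page's HTML may differ slightly):

```python
from bs4 import BeautifulSoup

# Assumed markup for a single result <div>, for illustration only
html = '''
<div style="border-bottom: 1px solid #C0C0C0; padding: 10px 0;">
  <i>Salticus scenicus</i> (Clerck, 1757) | Salticidae
  <span class="success label">accepted</span>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
element = soup.find('div')

sp_name = element.i.text.strip()                           # text inside <i>
author_family = element.i.next_sibling.strip().split('|')  # text node after </i>
species_author = author_family[0].strip()
family = author_family[1].strip()

print(sp_name, '|', species_author, '|', family)
# → Salticus scenicus | (Clerck, 1757) | Salticidae
```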

Answer

You have to pass a page= parameter in the URL and iterate over all pages:

from bs4 import BeautifulSoup
import requests
import csv

# newline='' prevents blank rows in the CSV on Windows
csv_file = open('wsc_scrape.csv', 'w', encoding='utf-8', newline='')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['sp_name', 'species_author', 'status', 'family'])

# the result set spans 151 pages at the time of writing
for page in range(1, 152):
    url = 'https://wsc.nmbe.ch/search?page={}&sFamily=Salticidae&fMt=begin&sGenus=&gMt=begin&sSpecies=&sMt=begin&multiPurpose=slsid&sMulti=&mMt=contain&searchSpec=s'.format(page)
    soup = BeautifulSoup(requests.get(url).text, 'lxml')
    elements = soup.find_all('div', style="border-bottom: 1px solid #C0C0C0; padding: 10px 0;")
    for element in elements:
        sp_name = element.i.text.strip()   # species name inside <i>
        status = element.find('span', class_=['success label', 'error label']).text.strip()
        # author and family follow </i> as "Author | Family"
        author_family = element.i.next_sibling.strip().split('|')
        species_author = author_family[0].strip()
        family = author_family[1].strip()
        csv_writer.writerow([sp_name, species_author, status, family])

csv_file.close()
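Hard-coding 151 pages will go stale as the catalogue grows. One alternative is to keep requesting pages until one comes back empty. A minimal sketch of that termination logic, written with the page fetcher injected as a callable so it can be exercised offline (in real use the fetcher would wrap `requests.get(...)` with `page=` in the query string):

```python
from bs4 import BeautifulSoup

# style attribute used by the answer above to identify result rows
ROW_STYLE = "border-bottom: 1px solid #C0C0C0; padding: 10px 0;"

def scrape_all(get_html):
    """Collect species names page by page; stop at the first empty page.

    get_html(page) must return the HTML of that results page.
    """
    rows = []
    page = 1
    while True:
        soup = BeautifulSoup(get_html(page), 'html.parser')
        elements = soup.find_all('div', style=ROW_STYLE)
        if not elements:
            break  # an empty page marks the end of the result set
        rows.extend(e.i.text.strip() for e in elements)
        page += 1
    return rows

# Fake fetcher standing in for the live site: two pages of results, then none
pages = {
    1: '<div style="%s"><i>Salticus scenicus</i></div>' % ROW_STYLE,
    2: '<div style="%s"><i>Salticus cingulatus</i></div>' % ROW_STYLE,
}
print(scrape_all(lambda p: pages.get(p, '')))
# → ['Salticus scenicus', 'Salticus cingulatus']
```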

