Web Scraping - Get to Page 2


Question

How do I get to page two of the data sets? No matter what I do, it only returns page 1.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

myURL = 'https://jobs.collinsaerospace.com/search-jobs/'

uClient = uReq(myURL)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")
# Note: findAll() only takes one attrs dict; the second dict here is passed
# positionally into the `recursive` parameter, so the "data-current-page"
# filter is silently ignored and page 1 is returned.
container = page_soup.findAll("section", {"id":"search-results"}, {"data-current-page":"4"})

for child in container:
    for heading in child.find_all('h2'):
        print(heading.text)

Answer

Try the following script to get the results from whatever pages you are interested in; all you need to do is change the range to suit your requirement. I could have written a while loop to exhaust the whole listing, but that is not the question you asked.

import requests
from bs4 import BeautifulSoup

# The site paginates through this results endpoint, which returns JSON
# whose 'results' field contains the rendered HTML for one page.
link = 'https://jobs.collinsaerospace.com/search-jobs/results?'

params = {
    'CurrentPage': '',
    'RecordsPerPage': 15,
    'Distance': 50,
    'SearchResultsModuleName': 'Search Results',
    'SearchFiltersModuleName': 'Search Filters',
    'SearchType': 5
}

# Change this range to fetch whichever pages you want (here: pages 1-4).
for page in range(1, 5):
    params['CurrentPage'] = page
    res = requests.get(link, params=params)
    soup = BeautifulSoup(res.json()['results'], "lxml")
    for name in soup.select("h2"):
        print(name.text)
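The while-loop variant the answer alludes to could be sketched like this. The function name `scrape_titles`, its `fetch` parameter, and the stop condition are assumptions: it presumes that past the last page the endpoint returns markup containing no `<h2>` headings, and it injects the page-fetching step as a callable so the loop logic stands on its own.

```python
from bs4 import BeautifulSoup

def scrape_titles(fetch, max_pages=100):
    """Collect <h2> titles page by page until a page comes back empty.

    fetch: callable taking a 1-based page number and returning that
    page's HTML fragment (hypothetical injection point; max_pages is
    a safety cap against an endpoint that never returns empty).
    """
    titles = []
    page = 1
    while page <= max_pages:
        html = fetch(page)
        soup = BeautifulSoup(html, "html.parser")
        headings = [h.text for h in soup.select("h2")]
        if not headings:          # no job titles on this page: we ran off the end
            break
        titles.extend(headings)
        page += 1
    return titles
```

With the endpoint from the answer above, `fetch` could be something like `lambda p: requests.get(link, params={**params, 'CurrentPage': p}).json()['results']`, assuming the same `link` and `params` are in scope.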
