Web Scraping - Get to Page 2
Question
How do I get to page two of the data set? No matter what I do, it only returns page 1.
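import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

myURL = 'https://jobs.collinsaerospace.com/search-jobs/'

# Download the search page; this request always returns the first page of results.
uClient = uReq(myURL)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

# Note: the third positional argument of findAll is `recursive`, so the second
# dict is not applied as an attribute filter. In any case, filtering the
# already-downloaded HTML cannot fetch a different page from the server.
container = page_soup.findAll("section", {"id":"search-results"}, {"data-current-page":"4"})

for child in container:
    for heading in child.find_all('h2'):
        print(heading.text)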
Recommended answer
Try the following script to get the results from whichever pages you are interested in. Instead of parsing the landing page, it requests the results endpoint directly and passes the page number in the CurrentPage parameter. All you need to do is change the range to suit your needs. I could have written a while loop to exhaust the whole content, but that is not what you asked.
import requests
from bs4 import BeautifulSoup

link = 'https://jobs.collinsaerospace.com/search-jobs/results?'

params = {
    'CurrentPage': '',
    'RecordsPerPage': 15,
    'Distance': 50,
    'SearchResultsModuleName': 'Search Results',
    'SearchFiltersModuleName': 'Search Filters',
    'SearchType': 5
}

for page in range(1, 5):  # change this range to fetch whichever pages you want
    params['CurrentPage'] = page
    res = requests.get(link, params=params)
    # The endpoint returns JSON; the 'results' field holds the HTML of the listing.
    soup = BeautifulSoup(res.json()['results'], "lxml")
    for name in soup.select("h2"):
        print(name.text)
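The answer mentions that a while loop could exhaust the whole result set. A minimal sketch of that idea follows. It is not part of the original answer: the helper names iter_all_titles and fetch_titles, the max_pages safety cap, and the stop condition are assumptions. In particular, it assumes that the endpoint returns a page with no h2 headings once the last page has been passed, which has not been verified against the site. It uses "html.parser" instead of "lxml" only to avoid an extra dependency.

```python
from typing import Callable, Iterator, List

LINK = 'https://jobs.collinsaerospace.com/search-jobs/results'


def fetch_titles(page: int) -> List[str]:
    """Fetch one page of results and return the text of its h2 headings."""
    # Imported here so the paging loop below can be used without these packages.
    import requests
    from bs4 import BeautifulSoup

    params = {
        'CurrentPage': page,
        'RecordsPerPage': 15,
        'Distance': 50,
        'SearchResultsModuleName': 'Search Results',
        'SearchFiltersModuleName': 'Search Filters',
        'SearchType': 5,
    }
    res = requests.get(LINK, params=params)
    res.raise_for_status()
    soup = BeautifulSoup(res.json()['results'], "html.parser")
    return [h2.text for h2 in soup.select("h2")]


def iter_all_titles(fetch: Callable[[int], List[str]],
                    max_pages: int = 100) -> Iterator[str]:
    """Yield titles page by page until a page comes back empty.

    Assumption: an empty page marks the end of the results.
    max_pages is a hypothetical safety cap so the loop always terminates.
    """
    page = 1
    while page <= max_pages:
        titles = fetch(page)
        if not titles:
            break
        yield from titles
        page += 1
```

To print every title, iterate over iter_all_titles(fetch_titles). Passing the fetch function as an argument keeps the loop independent of the network call, so it can be tried out with a stub before pointing it at the real site.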