API capture all paginated data? (Python)


Question

I'm using the requests package to hit an API (greenhouse.io). The API is paginated so I need to loop through the pages to get all the data I want. Using something like:

results = []
for i in range(1,326+1):
    response = requests.get(url, 
                            auth=(username, password), 
                            params={'page':i,'per_page':100})
    if response.status_code == 200:
        results += response.json()

I know there are 326 pages by hitting the headers attribute:

In [8]:
response.headers['link']
Out[8]:
'<https://harvest.greenhouse.io/v1/applications?page=3&per_page=100>; rel="next",<https://harvest.greenhouse.io/v1/applications?page=1&per_page=100>; rel="prev",<https://harvest.greenhouse.io/v1/applications?page=326&per_page=100>; rel="last"'

Is there any way to extract this number automatically? Using the requests package? Or do I need to use regex or something?

Alternatively, should I somehow use a while loop to get all this data? What is the best way? Any thoughts?

Answer

The Python requests library (http://docs.python-requests.org/en/latest/) can help here. The basic steps are: (1) make the request and grab the links from the header (you'll use these to know when you've reached the last page), and then (2) loop through the pages until you're at that last page.

import requests

results = []

# First request: grab the first page of results.
response = requests.get('https://harvest.greenhouse.io/v1/applications',
                        auth=('APIKEY', ''))
results += response.json()

# requests parses the Link header into response.links, a dict keyed by rel.
# Each entry is itself a dict, e.g. {'url': '...', 'rel': 'next'}.
# Keep following the "next" link; the last page has no "next" entry.
while 'next' in response.links:
    response = requests.get(response.links['next']['url'],
                            auth=('APIKEY', ''))
    results += response.json()
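As for the question of extracting the last-page number automatically: no regex is needed. requests parses the Link header into `response.links`, so the page number can be read from the `rel="last"` URL's query string with the standard library alone. A minimal sketch, using the literal URL from the header shown in the question as a stand-in for a live `response.links['last']['url']`:

```python
from urllib.parse import parse_qs, urlparse

# Stand-in for response.links['last']['url'] on a live response;
# this is the rel="last" URL from the Link header in the question.
last_url = 'https://harvest.greenhouse.io/v1/applications?page=326&per_page=100'

# Pull the "page" query parameter out of the URL's query string.
last_page = int(parse_qs(urlparse(last_url).query)['page'][0])
print(last_page)  # 326
```

With that number in hand, the fixed-range loop from the question (`range(1, last_page + 1)`) would also work, though following `rel="next"` as above avoids hard-coding anything.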

