API capture all paginated data? (Python)


Question

I'm using the requests package to hit an API (greenhouse.io). The API is paginated, so I need to loop through the pages to get all the data I want. I'm using something like:

results = []
for i in range(1,326+1):
    response = requests.get(url, 
                            auth=(username, password), 
                            params={'page':i,'per_page':100})
    if response.status_code == 200:
        results += response.json()

I know there are 326 pages from inspecting the Link header in the response's headers attribute:

In [8]:
response.headers['link']
Out[8]:
'<https://harvest.greenhouse.io/v1/applications?page=3&per_page=100>; rel="next",<https://harvest.greenhouse.io/v1/applications?page=1&per_page=100>; rel="prev",<https://harvest.greenhouse.io/v1/applications?page=326&per_page=100>; rel="last"'

Is there any way to extract this number automatically, using the requests package? Or do I need to use a regex or something similar?

Alternatively, should I somehow use a while loop to get all this data? What is the best way? Any thoughts?

Recommended answer

The Python requests library (http://docs.python-requests.org/en/latest/) can help here. It parses the Link header for you and exposes the result as response.links, a dict keyed by rel ("next", "prev", "last") whose values contain the target URL. The basic steps are (1) make the first request and read the links from the header, and then (2) keep following the "next" link until the API stops returning one, which happens on the last page.

import requests

results = []

# First request; requests parses the Link header into response.links.
response = requests.get('https://harvest.greenhouse.io/v1/applications', auth=('APIKEY', ''))
results.extend(response.json())

# Follow rel="next" until the response no longer includes one (the last page).
while 'next' in response.links:
    response = requests.get(response.links['next']['url'], auth=('APIKEY', ''))
    results.extend(response.json())
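
If you do want the page count itself, as the question asks, it can be read from the rel="last" URL without a regex. The following is a minimal sketch using only the standard library. The helper name last_page_number is hypothetical, and example_links is a hand-built stand-in for response.links built from the URLs shown in the question; it assumes the API puts the page number in a `page` query parameter, as the header in the question does.

```python
from urllib.parse import urlparse, parse_qs


def last_page_number(links):
    """Return the page number in the rel="last" URL, or None if it is absent.

    `links` is expected to have the shape of `response.links` in requests:
    a dict keyed by rel, whose values are dicts containing a 'url' entry.
    """
    last = links.get('last')
    if last is None:
        return None
    # Pull the query string out of the URL and read its 'page' parameter.
    query = parse_qs(urlparse(last['url']).query)
    pages = query.get('page')
    return int(pages[0]) if pages else None


# Hand-built stand-in for response.links, using the URLs from the question.
example_links = {
    'next': {'url': 'https://harvest.greenhouse.io/v1/applications?page=3&per_page=100', 'rel': 'next'},
    'last': {'url': 'https://harvest.greenhouse.io/v1/applications?page=326&per_page=100', 'rel': 'last'},
}
print(last_page_number(example_links))
```

With a real response you would call last_page_number(response.links) after the first request. Following the "next" link is usually the more robust choice, since it does not depend on the page count staying fixed while you iterate.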
