How to loop through a paginated API using Python


Problem description


I need to retrieve the 500 most popular films from a REST API, but the results are limited to 20 per page and I am only able to make 40 calls every 10 seconds (https://developers.themoviedb.org/3/getting-started/request-rate-limiting). I am unable to loop through the paginated results dynamically so that the 500 most popular results end up in a single list.

I can successfully return the top 20 most popular films (see below) and enumerate the number of the film, but I am getting stuck working through the loop that allows me to paginate through the top 500 without timing out due to the API rate limit.

import requests #to make TMDB API calls

#Discover API url filtered to movies >= 2004 and containing Drama genre_ID: 18
discover_api = "https://api.themoviedb.org/3/discover/movie?api_key=['my api key']&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&primary_release_year=>%3D2004&with_genres=18"

#Returning all drama films >= 2004 in popularity desc
discover_api = requests.get(discover_api).json()

most_popular_films = discover_api['results']

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])


Sample response:

{
  "page": 1,
  "total_results": 101685,
  "total_pages": 5085,
  "results": [
    {
      "vote_count": 13,
      "id": 280960,
      "video": false,
      "vote_average": 5.2,
      "title": "Catarina and the others",
      "popularity": 130.491,
      "poster_path": "/kZMCbp0o46Tsg43omSHNHJKNTx9.jpg",
      "original_language": "pt",
      "original_title": "Catarina e os Outros",
      "genre_ids": [
        18,
        9648
      ],
      "backdrop_path": "/9nDiMhvL3FtaWMsvvvzQIuq276X.jpg",
      "adult": false,
      "overview": "Outside, the first sun rays break the dawn.  Sixteen years old Catarina can't fall asleep.  Inconsequently, in the big city adults are moved by desire...  Catarina found she is HIV positive. She wants to drag everyone else along.",
      "release_date": "2011-03-01"
    },
    {
      "vote_count": 9,
      "id": 531309,
      "video": false,
      "vote_average": 4.6,
      "title": "Brightburn",
      "popularity": 127.582,
      "poster_path": "/roslEbKdY0WSgYaB5KXvPKY0bXS.jpg",
      "original_language": "en",
      "original_title": "Brightburn",
      "genre_ids": [
        27,
        878,
        18,
        53
      ],

I need the Python loop to append the paginated results into a single list until I have captured the 500 most popular films.


Desired Output:

Movie_ID  Movie_Title
280960    Catarina and the others
531309    Brightburn
438650    Cold Pursuit
537915    After
50465     Glass
457799    Extremely Wicked, Shockingly Evil and Vile

Solution

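Many paginated APIs include a next_url field (or something similar) to help you loop through all results. The sample response in the question shows page and total_pages fields but no next_url, so cases 1 and 2 are the ones that apply to it; case 3 covers APIs that do provide the field. Let's examine each case.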

1. No next_url field

You can just loop through the pages until the results field comes back empty:

import requests #to make TMDB API calls

#Discover API url filtered to movies >= 2004 and containing Drama genre_ID: 18
discover_api_url = "https://api.themoviedb.org/3/discover/movie?api_key=['my api key']&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&primary_release_year=>%3D2004&with_genres=18"

most_popular_films = []
new_results = True
page = 1
while new_results:
    discover_api = requests.get(discover_api_url + f"&page={page}").json()
    new_results = discover_api.get("results", [])
    most_popular_films.extend(new_results)
    page += 1

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])
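
For the question's specific goal the loop can stop early: 500 films at 20 per page is 25 requests, which is already below the quoted limit of 40 calls per 10 seconds. The following is a minimal sketch and not part of the original answer. The names collect_top_films and fetch_json and the limit and pause parameters are hypothetical, introduced only for illustration, and the code assumes the base URL already contains a query string, as discover_api_url does.

```python
import time


def fetch_json(url):
    """Fetch a URL and decode its JSON body (hypothetical helper)."""
    import requests  # imported here so the loop below can be tried with a stub
    return requests.get(url, timeout=10).json()


def collect_top_films(base_url, limit=500, get_json=fetch_json, pause=0.0):
    """Append page after page until `limit` films are collected.

    Stops early if a page has no "results" (last page or an error payload).
    `pause` is an optional delay in seconds between consecutive requests.
    """
    films = []
    page = 1
    while len(films) < limit:
        data = get_json(f"{base_url}&page={page}")
        results = data.get("results", [])
        if not results:
            break
        films.extend(results)
        page += 1
        if pause:
            time.sleep(pause)
    # The last page may overshoot the limit, so trim the list.
    return films[:limit]
```

A call such as collect_top_films(discover_api_url, limit=500, pause=0.25) would space requests about a quarter of a second apart, which works out to roughly 40 calls per 10 seconds; with only 25 requests the pause is a precaution rather than a necessity.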

2. Rely on the total_pages field

import requests #to make TMDB API calls

#Discover API url filtered to movies >= 2004 and containing Drama genre_ID: 18
discover_api_url = "https://api.themoviedb.org/3/discover/movie?api_key=['my api key']&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&primary_release_year=>%3D2004&with_genres=18"

discover_api = requests.get(discover_api_url).json()
most_popular_films = discover_api["results"]
for page in range(2, discover_api["total_pages"]+1):
    discover_api = requests.get(discover_api_url + f"&page={page}").json()
    most_popular_films.extend(discover_api["results"])

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])
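
With total_pages = 5085 in the sample response, the loop above would issue thousands of requests, far more than the question needs. A hedged variation is to cap the page range at the number of pages required for the target count, ceil(500 / 20) = 25. In this sketch, pages_needed and collect_by_total_pages are hypothetical names, get_json stands for any callable that returns the decoded JSON for a URL, and the page size is inferred from the length of the first page, which is an assumption.

```python
import math


def pages_needed(limit, page_size, total_pages):
    """Number of pages to request: enough for `limit` items, never more than exist."""
    return min(math.ceil(limit / page_size), total_pages)


def collect_by_total_pages(base_url, get_json, limit=500):
    """Fetch page 1, then only as many further pages as `limit` requires."""
    first = get_json(f"{base_url}&page=1")
    films = list(first["results"])
    if not films:
        return films
    # Assumption: every page holds as many items as the first one.
    last_page = pages_needed(limit, len(films), first["total_pages"])
    for page in range(2, last_page + 1):
        films.extend(get_json(f"{base_url}&page={page}")["results"])
    return films[:limit]
```

With the requests library, get_json could be as simple as lambda url: requests.get(url).json().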

3. next_url field exists! Yay!

Same idea, only now we check whether the next_url field is empty; if it is, we have reached the last page.

import requests #to make TMDB API calls

#Discover API url filtered to movies >= 2004 and containing Drama genre_ID: 18
discover_api_url = "https://api.themoviedb.org/3/discover/movie?api_key=['my api key']&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&primary_release_year=>%3D2004&with_genres=18"

discover_api = requests.get(discover_api_url).json()
most_popular_films = discover_api["results"]
#.get() returns None when next_url is missing, which also ends the loop
while discover_api.get("next_url"):
    discover_api = requests.get(discover_api["next_url"]).json()
    most_popular_films.extend(discover_api["results"])

#printing movie_id and movie_title by popularity desc
for i, film in enumerate(most_popular_films):
    print(i, film['id'], film['title'])
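
The next_url pattern also lends itself to a generator, so the caller can take just the first 500 films without fetching pages it does not need. This is a sketch under assumptions rather than the answerer's code: iter_films and first_n_films are hypothetical names, get_json is any callable that returns the decoded JSON for a URL, and the field names "results" and "next_url" are taken from the example above.

```python
from itertools import islice


def iter_films(first_url, get_json):
    """Yield films one at a time, following next_url until it is missing or empty."""
    url = first_url
    while url:
        data = get_json(url)
        yield from data.get("results", [])
        url = data.get("next_url")


def first_n_films(first_url, get_json, n=500):
    """Return at most `n` films; islice stops pulling once `n` items are taken."""
    return list(islice(iter_films(first_url, get_json), n))
```

Because the generator is lazy, a further page is requested only when the consumer asks for an item beyond the ones already fetched.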
