Python: requests.get, iterating url in a loop


Question

I am trying to get information from stats.nba.com by calling requests.get(url) in a for loop, where the url changes on every iteration. A single iteration works, but two or more give errors and I'm not sure why. I'm new to programming, so any info will be helpful. Thanks in advance. Here's my code:

import requests
import json

team_id = 1610612737

def get_data(url):
    """Return the parsed JSON for url, or None if the request fails."""
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()
    # Show what the server sent back so the failure can be diagnosed
    print(response.text)
    print(response.status_code)
    return None

base_url = "http://stats.nba.com/stats/teamdetails?teamID="

for i in range(30):  # 30 NBA teams
    team_url = base_url + str(team_id)
    data = get_data(team_url)

    ## Do stuff ##

    team_id += 1  # must be indented to match the loop body

If I use 'for i in range(1):' it works, but with any range greater than 1 I get status_code = 400 on every iteration. Thanks for the help!

Recommended answer

The website limits requests per second, so you'll need to either include specific request headers or put a delay in your script (the first option being the quicker and likely the more reliable of the two).

# Add under team_id = 1610612737
# Note the trailing spaces: adjacent string literals are joined with no separator.
HEADERS = {'user-agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) '
                          'AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/45.0.2454.101 Safari/537.36'),
           'referer': 'http://stats.nba.com/scores/'}

Then pass the headers to the requests.get call inside get_data:

response = requests.get(url, headers=HEADERS)

*You shouldn't need a delay in your script at all if you use this method.
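
Putting the pieces together, the loop from the question with the headers applied might look like the sketch below. It is only an illustration: the names get_team_details and fetch_all_teams are hypothetical, the session argument is assumed to be a requests.Session (or anything with a compatible get method), and the use of params and a timeout is an addition that is not in the original answer. It also keeps the question's assumption that the 30 team IDs are consecutive, and it has not been checked that the site still accepts these headers.

```python
BASE_URL = "http://stats.nba.com/stats/teamdetails"
FIRST_TEAM_ID = 1610612737  # starting ID taken from the question

# Browser-like headers from the answer above
HEADERS = {
    "user-agent": ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/45.0.2454.101 Safari/537.36"),
    "referer": "http://stats.nba.com/scores/",
}


def get_team_details(session, team_id):
    """Fetch one team's details; return parsed JSON, or None on a non-200 reply.

    session is assumed to be a requests.Session (hypothetical helper,
    not part of the original answer).
    """
    response = session.get(
        BASE_URL,
        params={"teamID": team_id},  # sent as ?teamID=<team_id>
        headers=HEADERS,
        timeout=10,                  # assumption: give up after 10 seconds
    )
    if response.status_code == 200:
        return response.json()
    print(response.status_code, response.text)
    return None


def fetch_all_teams(session, count=30):
    """Return {team_id: data} for count consecutive team IDs."""
    return {
        team_id: get_team_details(session, team_id)
        for team_id in range(FIRST_TEAM_ID, FIRST_TEAM_ID + count)
    }
```

To run it against the live site, one would call fetch_all_teams(requests.Session()) after import requests. Passing the session in keeps the functions easy to try with a stand-in object before sending real traffic.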

# Alternative: pause between requests instead of sending headers
import time
time.sleep(10)  # delays for 10 seconds (put this inside your loop)

Using a delay seems hit or miss, so I wouldn't recommend it unless absolutely necessary.
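
If a delay does turn out to be necessary, one possible variation is to sleep only after a failed request and then try again, rather than pausing on every iteration. The sketch below is an assumption, not part of the original answer: fetch_with_retry is a hypothetical helper, and fetch is expected to behave like get_data in the question (parsed JSON on success, None otherwise).

```python
import time


def fetch_with_retry(fetch, url, attempts=3, delay=10.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, pausing `delay` seconds between tries.

    Returns the first non-None result, or None if every attempt fails.
    The sleep parameter exists only so the pause can be swapped out when
    experimenting; by default it is time.sleep.
    """
    for attempt in range(1, attempts + 1):
        data = fetch(url)
        if data is not None:
            return data
        if attempt < attempts:
            sleep(delay)  # wait only between attempts, not after the last one
    return None
```

Inside the loop this would replace the direct call, for example data = fetch_with_retry(get_data, team_url).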
