Limiting/throttling the rate of HTTP requests in GRequests

Question

I'm writing a small script in Python 2.7.3 with GRequests and lxml that gathers collectible card prices from various websites so I can compare them. The problem is that one of the websites limits the number of requests and responds with HTTP error 429 (Too Many Requests) if I exceed it.

Is there a way to throttle requests in GRequests so that I don't exceed a number of requests per second that I specify? Also, how can I make GRequests retry after some time if an HTTP 429 occurs?

On a side note, their limit is ridiculously low: something like 8 requests per 15 seconds. I have breached it with my browser on multiple occasions just by refreshing the page while waiting for price changes.
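
For scale, a limit of 8 requests per 15 seconds averages out to one request every 15 / 8 = 1.875 seconds. The sketch below is plain Python with no GRequests involved; the function names and the `margin` parameter are hypothetical. It computes that interval and the start offsets for a batch of evenly paced requests. Spacing at exactly 1.875 s sits right at the limit, and the server's exact window rules are unknown, so in practice you would pad the interval a little.

```python
def min_interval(max_requests, per_seconds):
    """Even spacing (in seconds) that keeps the average rate exactly at the limit."""
    return per_seconds / float(max_requests)


def evenly_spaced_offsets(n_requests, max_requests, per_seconds, margin=0.0):
    """Start offsets (seconds from t=0) for n_requests sent at an even pace.

    `margin` pads each interval, since the server's exact window rules are unknown.
    """
    step = min_interval(max_requests, per_seconds) + margin
    return [i * step for i in range(n_requests)]


if __name__ == "__main__":
    print(min_interval(8, 15))              # 1.875
    print(evenly_spaced_offsets(4, 8, 15))  # [0.0, 1.875, 3.75, 5.625]
```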

Answer

I'm going to answer my own question, since I had to figure this out by myself and there seems to be very little information about it out there.

The idea is as follows. Every request object used with GRequests can take a session object as a parameter when it is created. Session objects, in turn, can have HTTP adapters mounted, which are used when making requests. By writing our own adapter we can intercept requests and rate-limit them in whatever way suits our application best. In my case I ended up with the code below.

Object used for throttling:

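import datetime

# Defaults: allow a burst of requests during the first 5 seconds of a cycle,
# then wait a further 15 seconds before the next cycle can start.
DEFAULT_BURST_WINDOW = datetime.timedelta(seconds=5)
DEFAULT_WAIT_WINDOW = datetime.timedelta(seconds=15)


class BurstThrottle(object):
    """Let up to max_hits requests through during the burst window, then block
    until burst_window + wait_window has passed since the cycle started."""
    max_hits = None
    hits = None
    burst_window = None
    total_window = None
    timestamp = None

    def __init__(self, max_hits, burst_window, wait_window):
        self.max_hits = max_hits
        self.hits = 0
        self.burst_window = burst_window
        self.total_window = burst_window + wait_window
        self.timestamp = datetime.datetime.min  # no cycle has started yet

    def throttle(self):
        """Record a hit and return timedelta(0), or return how long to wait."""
        now = datetime.datetime.utcnow()
        if now < self.timestamp + self.total_window:
            # Still inside the current cycle.
            if (now < self.timestamp + self.burst_window) and (self.hits < self.max_hits):
                self.hits += 1
                return datetime.timedelta(0)
            else:
                # Burst used up or burst window over: wait until the cycle ends.
                return self.timestamp + self.total_window - now
        else:
            # Cycle over: start a new one and count this request as its first hit.
            self.timestamp = now
            self.hits = 1
            return datetime.timedelta(0)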
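
`BurstThrottle` lets up to `max_hits` requests through at the start of each cycle and then blocks until `burst_window + wait_window` has elapsed. If you would rather mirror a limit like "8 requests per 15 seconds" directly, a sliding-window log is a common alternative; it is a swapped-in technique, not the approach used in this answer. The sketch below keeps the timestamps of recent hits and, like `throttle()`, returns how long to wait, but as float seconds instead of a `timedelta`. The class name `SlidingWindowThrottle` and the injectable `clock` argument are assumptions, added so it can be exercised without real sleeping.

```python
import collections
import time


class SlidingWindowThrottle(object):
    """Allow at most `max_hits` hits within any rolling `window_seconds` interval."""

    def __init__(self, max_hits, window_seconds, clock=time.time):
        self.max_hits = max_hits
        self.window = float(window_seconds)
        self.clock = clock                # injectable so tests can use a fake clock
        self.hits = collections.deque()   # timestamps of hits still inside the window

    def throttle(self):
        """Record a hit and return 0.0, or return the seconds to wait before trying again."""
        now = self.clock()
        # Forget hits that have slid out of the window.
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        if len(self.hits) < self.max_hits:
            self.hits.append(now)
            return 0.0
        # The oldest remaining hit leaves the window at hits[0] + window.
        return self.hits[0] + self.window - now


if __name__ == "__main__":
    fake_now = [0.0]
    t = SlidingWindowThrottle(8, 15, clock=lambda: fake_now[0])
    print([t.throttle() for _ in range(8)])  # eight 0.0 values
    print(t.throttle())                      # 15.0: the 9th hit must wait
    fake_now[0] = 15.0
    print(t.throttle())                      # 0.0: the earlier hits have expired
```

To use it in the adapter's `send()`, the wait loop would compare against `0` and pass the float straight to `gevent.sleep()` instead of calling `total_seconds()`.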

HTTP adapter:

import datetime

import gevent
import requests.adapters

# BurstThrottle, DEFAULT_BURST_WINDOW and DEFAULT_WAIT_WINDOW come from the throttling code.


class MyHttpAdapter(requests.adapters.HTTPAdapter):
    """HTTPAdapter that throttles outgoing requests and retries on HTTP 429."""
    throttle = None

    def __init__(self, pool_connections=requests.adapters.DEFAULT_POOLSIZE,
                 pool_maxsize=requests.adapters.DEFAULT_POOLSIZE, max_retries=requests.adapters.DEFAULT_RETRIES,
                 pool_block=requests.adapters.DEFAULT_POOLBLOCK, burst_window=DEFAULT_BURST_WINDOW,
                 wait_window=DEFAULT_WAIT_WINDOW):
        # The burst size is tied to the connection pool size.
        self.throttle = BurstThrottle(pool_maxsize, burst_window, wait_window)
        super(MyHttpAdapter, self).__init__(pool_connections=pool_connections, pool_maxsize=pool_maxsize,
                                            max_retries=max_retries, pool_block=pool_block)

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        request_successful = False
        response = None
        while not request_successful:
            # Yield to other greenlets until the throttle lets this request through.
            wait_time = self.throttle.throttle()
            while wait_time > datetime.timedelta(0):
                gevent.sleep(wait_time.total_seconds(), ref=True)
                wait_time = self.throttle.throttle()

            response = super(MyHttpAdapter, self).send(request, stream=stream, timeout=timeout,
                                                       verify=verify, cert=cert, proxies=proxies)

            if response.status_code != 429:
                request_successful = True
            else:
                # Too Many Requests: release the connection before retrying.
                # Note that this loop retries indefinitely.
                response.close()

        return response
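
The `send()` loop above retries a 429 as soon as the throttle allows and never gives up. A common refinement, sketched here as an assumption rather than part of the original answer, is to honor the server's `Retry-After` header when present (HTTP allows either a number of seconds or an HTTP-date) and otherwise fall back to capped exponential backoff. The helper name `retry_delay` and its defaults are hypothetical, and the site in question may not send `Retry-After` at all.

```python
import email.utils
import time


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0, now=None):
    """Seconds to wait before retry number `attempt` (0-based) after an HTTP 429.

    `retry_after` is the raw Retry-After header value, or None if it was absent.
    """
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():                       # delta-seconds form, e.g. "120"
            return min(float(value), cap)
        parsed = email.utils.parsedate_tz(value)  # HTTP-date form
        if parsed is not None:
            now = time.time() if now is None else now
            return min(max(email.utils.mktime_tz(parsed) - now, 0.0), cap)
    # No usable header: exponential backoff (1 s, 2 s, 4 s, ...) capped at `cap`.
    return min(base * (2 ** attempt), cap)
```

Inside `send()`, one could read `response.headers.get('Retry-After')`, close the 429 response, call `gevent.sleep(retry_delay(attempt, header))`, and stop after a fixed number of attempts instead of looping forever.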

Setup:

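import datetime

import grequests  # grequests monkey-patches via gevent on import, so import it before requests
import requests

import adapter  # the module containing MyHttpAdapter

# __CONCURRENT_LIMIT__, handle_response and urls are defined elsewhere in the script.
requests_adapter = adapter.MyHttpAdapter(
    pool_connections=__CONCURRENT_LIMIT__,
    pool_maxsize=__CONCURRENT_LIMIT__,  # also used as the throttle's burst size
    max_retries=0,
    pool_block=False,
    burst_window=datetime.timedelta(seconds=5),
    wait_window=datetime.timedelta(seconds=20))

# Mount the same adapter for both schemes so they share a single throttle.
requests_session = requests.Session()
requests_session.mount('http://', requests_adapter)
requests_session.mount('https://', requests_adapter)

# Build the requests lazily and send at most __CONCURRENT_LIMIT__ at a time.
unsent_requests = (grequests.get(url,
                                 hooks={'response': handle_response},
                                 session=requests_session) for url in urls)
grequests.map(unsent_requests, size=__CONCURRENT_LIMIT__)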
