Tweepy rate limit / pagination issue



I've put together a small Twitter tool to pull relevant tweets, for later latent semantic analysis. Ironically, that bit (the more complicated bit) works fine - it's pulling the tweets that's the problem. I'm using the code below to set it up.

This technically works, but not as expected - I thought the .items(200) parameter would pull 200 tweets per request, but the results come back in 15-tweet chunks (so those 200 items 'cost' me 13 requests). I understand this is the original/default RPP variable (now 'count' in the Twitter docs), but I've tried setting it on the Cursor (rpp=100, which is the maximum per the Twitter documentation), and it makes no difference.

Tweepy/Cursor docs
The other nearest similar question isn't quite the same issue

Grateful for any thoughts! I'm sure it's a minor tweak to the settings, but I've tried various settings on page and rpp, to no avail.

import tweepy
from tweepy import Cursor
from tools import read_user, read_tweet
from auth import basic

auth = tweepy.OAuthHandler(apikey, apisecret)
auth.set_access_token(access_token, access_token_secret_var)
api = tweepy.API(auth)

current_results = []
for tweet in Cursor(api.search,
                    q=search_string,
                    result_type="recent",
                    include_entities=True,
                    lang="en").items(200):
    current_user, created = read_user(tweet.author)
    current_tweet, created = read_tweet(tweet, current_user)
    current_results.append(tweet)
print current_results
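As a back-of-the-envelope check of why the page size matters: the number of API calls the Cursor makes is the requested item total divided by the per-page count, rounded up. A minimal sketch of that arithmetic (plain Python, no Twitter access needed; `requests_needed` is an illustrative helper, not part of Tweepy):

```python
import math

def requests_needed(total_items, per_page):
    # Each API call returns at most per_page results, so the Cursor
    # needs ceil(total_items / per_page) calls to yield them all.
    return int(math.ceil(float(total_items) / per_page))

print(requests_needed(200, 15))   # 14 calls at the old 15-per-page default
print(requests_needed(200, 100))  # 2 calls once a count of 100 is honoured
```

This is why getting `count` through to the actual search call matters: it shrinks the call count by an order of magnitude for the same 200 items.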

Solution

I worked it out in the end, with a little assistance from colleagues. Afaict, the rpp and items() calls are applied after the actual API call. The 'count' option from the Twitter documentation, which was formerly RPP as mentioned above and is still noted as rpp in Tweepy 2.3.0, seems to be the issue here.

What I ended up doing was modifying the Tweepy code - in api.py, I added 'count' to the search bind section (around L643 in my install, ymmv).

""" search """
search = bind_api(
    path = '/search/tweets.json',
    payload_type = 'search_results',
    allowed_param = ['q', 'count', 'lang', 'locale', 'since_id', 'geocode', 'max_id', 'since', 'until', 'result_type', **'count**', 'include_entities', 'from', 'to', 'source']
)

This allowed me to tweak the code above to:

for tweet in Cursor(api.search,
                       q=search_string,
                       count=100,
                       result_type="recent",
                       include_entities=True,
                       lang="en").items(200):

Which results in two calls, not fifteen; I've double-checked this with

print api.rate_limit_status()["resources"]

after each call, and it's only decrementing my remaining searches by 2 each time.
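For reference, the rate_limit_status payload groups endpoints under "resources", and the search bucket sits at resources → "search" → "/search/tweets", with "limit", "remaining", and "reset" fields. A sketch of reading it, using a hard-coded payload in place of a live api.rate_limit_status() call (the nesting and key names follow Twitter's v1.1 response format; the numbers are made up):

```python
# Stand-in for api.rate_limit_status() - structure mirrors Twitter's
# v1.1 rate_limit_status response, values are illustrative only.
status = {
    "resources": {
        "search": {
            "/search/tweets": {"limit": 180, "remaining": 178, "reset": 1403602426}
        }
    }
}

# Pull out how many search calls are left in the current 15-minute window.
remaining = status["resources"]["search"]["/search/tweets"]["remaining"]
print(remaining)  # 178
```

Checking this value before and after a Cursor loop is a quick way to confirm how many calls the loop actually consumed.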
