How can I retrieve all Tweets and attributes for a given user using Python?

Problem description

I am trying to retrieve data from Twitter with Tweepy for a username typed at the command line. I want to extract quite a bit of data about both the statuses and the user, so I have come up with the following:

Note that I am importing all the required modules and have the OAuth setup and keys (just not included here); the filename is correct, it has just been changed for this post:

# define user to get tweets for. accepts input from user
user = tweepy.api.get_user(input("Please enter the twitter username: "))

# Display basic details for twitter user name
print (" ")
print ("Basic information for", user.name)
print ("Screen Name:", user.screen_name)
print ("Name: ", user.name)
print ("Twitter Unique ID: ", user.id)
print ("Account created at: ", user.created_at)

timeline = api.user_timeline(screen_name=user, include_rts=True, count=100)
for tweet in timeline:
    print ("ID:", tweet.id)
    print ("User ID:", tweet.user.id)
    print ("Text:", tweet.text)
    print ("Created:", tweet.created_at)
    print ("Geo:", tweet.geo)
    print ("Contributors:", tweet.contributors)
    print ("Coordinates:", tweet.coordinates)
    print ("Favorited:", tweet.favorited)
    print ("In reply to screen name:", tweet.in_reply_to_screen_name)
    print ("In reply to status ID:", tweet.in_reply_to_status_id)
    print ("In reply to status ID str:", tweet.in_reply_to_status_id_str)
    print ("In reply to user ID:", tweet.in_reply_to_user_id)
    print ("In reply to user ID str:", tweet.in_reply_to_user_id_str)
    print ("Place:", tweet.place)
    print ("Retweeted:", tweet.retweeted)
    print ("Retweet count:", tweet.retweet_count)
    print ("Source:", tweet.source)
    print ("Truncated:", tweet.truncated)

Eventually I would like this to iterate through all of a user's tweets (up to the 3200 limit). First things first, though. So far I have two problems. I get the following error message regarding retweets:

Please enter the twitter username: barackobama
Traceback (most recent call last):
  File " usertimeline.py", line 64, in <module>
    timeline = api.user_timeline(screen_name=user, count=100, page=1)
  File "C:\Python32\lib\site-packages\tweepy-1.4-py3.2.egg\tweepy\binder.py", line 153, in _call
    raise TweepError(error_msg)
tweepy.error.TweepError: Twitter error response: status code = 401
Traceback (most recent call last):
  File "usertimeline.py", line 42, in <module>
    user = tweepy.api.get_user(input("Please enter the twitter username: "))
  File "C:\Python32\lib\site-packages\tweepy-1.4-py3.2.egg\tweepy\binder.py", line 153, in _call
    raise TweepError(error_msg)
tweepy.error.TweepError: Twitter error response: status code = 404

Passing the username as a variable also seems to be a problem:

Traceback (most recent call last):
  File " usertimleline.py", line 64, in <module>
    timeline = api.user_timeline(screen_name=user, count=100, page=1)
  File "C:\Python32\lib\site-packages\tweepy-1.4-py3.2.egg\tweepy\binder.py", line 153, in _call
    raise TweepError(error_msg)
tweepy.error.TweepError: Twitter error response: status code = 401

I've isolated the two errors: each one occurs on its own, independently of the other.

Forgive my ignorance; I am not too hot with the Twitter APIs but am learning pretty rapidly. The Tweepy documentation really does suck, and I've done loads of reading around on the net but just can't seem to get this fixed. If I can get this sorted, I'll be posting up some documentation.

I know how to transfer the data into a MySQL database once it is extracted (the script will do that rather than print to screen) and manipulate it so that I can do stuff with it; it is just getting the data out that I am having problems with. Does anyone have any ideas, or is there another method I should be considering?

Any help is really appreciated. Cheers

EDIT:

Following on from @Eric Olson's suggestion this morning, I did the following:

1) Created a completely brand new set of OAuth credentials to test.
2) Copied the code across to a new script as follows:

OAuth

consumer_key = "(removed)"
consumer_secret = "(removed)"
access_key = "88394805-(removed)"
access_secret = "(removed)"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

# confirm account being used for OAuth
print ("API NAME IS: ", api.me().name)
api.update_status("Using Tweepy from the command line")

The first time I ran the script, it worked fine: it updated my status and returned the API name as follows:

>>> 
API NAME IS:  Chris Howden

Then from that point on I get this:

Traceback (most recent call last):
  File "C:/Users/Chris/Dropbox/Uni_2012-3/6CC995 - Independent Studies/Scripts/get Api name and update status.py", line 19, in <module>
    api.update_status("Using Tweepy frm the command line")
  File "C:\Python32\lib\site-packages\tweepy-1.4-py3.2.egg\tweepy\binder.py", line 153, in _call
    raise TweepError(error_msg)
tweepy.error.TweepError: Twitter error response: status code = 403

The only reason I can see for this is that Twitter is rejecting the generated access token. I shouldn't need to renew the access token, should I?

Solution

If you're open to trying another library, you could give rauth a shot. There's already a Twitter example, but if you're feeling lazy and just want a working example, here's how I'd modify that demo script:

from rauth import OAuth1Service

# Get a real consumer key & secret from https://dev.twitter.com/apps/new
twitter = OAuth1Service(
    name='twitter',
    consumer_key='J8MoJG4bQ9gcmGh8H7XhMg',
    consumer_secret='7WAscbSy65GmiVOvMU5EBYn5z80fhQkcFWSLMJJu4',
    request_token_url='https://api.twitter.com/oauth/request_token',
    access_token_url='https://api.twitter.com/oauth/access_token',
    authorize_url='https://api.twitter.com/oauth/authorize',
    base_url='https://api.twitter.com/1/')

request_token, request_token_secret = twitter.get_request_token()

authorize_url = twitter.get_authorize_url(request_token)

# Python 3 syntax, matching the Python 3.2 setup in the question
print('Visit this URL in your browser: ' + authorize_url)
pin = input('Enter PIN from browser: ')

session = twitter.get_auth_session(request_token,
                                   request_token_secret,
                                   method='POST',
                                   data={'oauth_verifier': pin})

params = {'screen_name': 'github',  # User to pull Tweets from
          'include_rts': 1,         # Include retweets
          'count': 10}              # 10 tweets

r = session.get('statuses/user_timeline.json', params=params)

# r.json() is the decoded JSON array of tweets; number them from 1
for i, tweet in enumerate(r.json(), 1):
    handle = tweet['user']['screen_name']
    text = tweet['text']
    print('{0}. @{1} - {2}'.format(i, handle, text))

You can run this as-is, but be sure to update the credentials! These are meant for demo purposes only.
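One way to swap in your own credentials without pasting them into the script is to read them from environment variables. This is only a minimal sketch: the helper `load_twitter_credentials` and the variable names `TWITTER_CONSUMER_KEY` and `TWITTER_CONSUMER_SECRET` are made up for illustration and are not part of rauth or Twitter's tooling.

```python
import os


def load_twitter_credentials(environ=os.environ):
    """Return the consumer key and secret read from environment variables.

    TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET are hypothetical
    names chosen for this sketch; use whatever names suit your setup.
    """
    try:
        return {
            'consumer_key': environ['TWITTER_CONSUMER_KEY'],
            'consumer_secret': environ['TWITTER_CONSUMER_SECRET'],
        }
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(
            'Missing environment variable: {0}'.format(missing.args[0]))
```

The returned dict could then be unpacked into the service definition in place of the two hard-coded strings, e.g. `OAuth1Service(name='twitter', ..., **load_twitter_credentials())`.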

Full disclosure, I am the maintainer of rauth.
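The question also mentions eventually walking the whole timeline (up to the 3200-tweet limit). A rough sketch of how that could be done follows, assuming the `statuses/user_timeline` endpoint accepts `count` (up to 200 per request) and an inclusive `max_id` parameter. The generator `iter_user_timeline` and its `fetch_page` argument are hypothetical names introduced here; `fetch_page` is any callable that takes a params dict and returns a list of tweet dicts.

```python
def iter_user_timeline(fetch_page, screen_name, limit=3200, page_size=200):
    """Yield tweets newest-first until the timeline or `limit` runs out.

    `fetch_page(params)` is a hypothetical callable supplied by the caller;
    it must return a list of tweet dicts, each with an integer 'id'.
    """
    params = {'screen_name': screen_name,
              'include_rts': 1,
              'count': page_size}
    yielded = 0
    while yielded < limit:
        page = fetch_page(dict(params))  # pass a copy so callers can't mutate ours
        if not page:
            break  # no older tweets are available
        for tweet in page:
            yield tweet
            yielded += 1
            if yielded >= limit:
                return
        # max_id is inclusive, so request IDs strictly below the oldest one seen.
        params['max_id'] = min(t['id'] for t in page) - 1
```

With the rauth session from the script above, `fetch_page` might be `lambda p: session.get('statuses/user_timeline.json', params=p).json()`, and each yielded dict can then be written to a database row instead of printed.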
