How to extract data from a Tweepy object into a pandas dataframe?
Question
I am attempting to create a pandas dataframe that looks like:
| user_name | followers | following | retweets | likes | tweet date | tweet |
|:---------:|:---------:|:---------:|:--------:|:-----:|:-----------:|:------------:|
| user1 | 50 | 100 | 25 | 10 | Oct-1-2019 | lorem ipsum… |
| user1 | 50 | 100 | 25 | 10 | Oct-6-2019 | lorem ipsum… |
| user1 | 50 | 100 | 25 | 10 | Oct-19-2019 | lorem ipsum… |
| user1 | 50 | 100 | 25 | 10 | Oct-4-2019 | lorem ipsum… |
| user1 | 50 | 100 | 25 | 10 | Oct-16-2019 | lorem ipsum… |
| user2 | 321 | 12151 | 2017 | 0 | Sep-12-2018 | lorem ipsum… |
| user2 | 321 | 12151 | 2017 | 0 | Sep-15-2018 | lorem ipsum… |
| user2 | 321 | 12151 | 2017 | 0 | Sep-17-2018 | lorem ipsum… |
| user2 | 321 | 12151 | 2017 | 0 | Sep-17-2018 | lorem ipsum… |
| user2 | 321 | 12151 | 2017 | 0 | Sep-17-2019 | lorem ipsum… |
| user3 | 122 | 124 | 11 | 38337 | Nov-1-2019 | foobar |
(The values here are arbitrary.)
What I am trying to do is start with a Twitter profile, go through the followers of that profile, and extract the following features for each follower:
{username (@), follower count, following count, # of retweets, # of likes}
I am using Tweepy to try and accomplish this.
So far, my current code can grab followers, but it prints out the _json for each follower, and not the specific details I am looking for.
import tweepy
import time

# insert your Twitter keys here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
# twitter_handle = 'TimBarbalace'

auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

users = []

# verify_credentials must be called; the bare method object is always truthy
if api.verify_credentials():
    print("Logged In Successfully")
else:
    print("Error -- Could not log in with your credentials")

followers = tweepy.Cursor(api.followers).items()
i = 99
curr = 0
for follower in followers:
    if curr < i:
        print(follower)
        curr += 1
    else:
        exit()
This is the JSON:
User(_api=<tweepy.api.API object at 0x0000028E4D3C8F60>, _json={'id': 1898321922, 'id_str': '1898321922', 'name': 'Creator Support', 'screen_name': 'GamerGrowthHQ', 'location': 'Global', 'description': 'Supporting Creators through advice, shout-outs, and daily support. Managed by @adron_foe', 'url': 'https://www.twitch.tv/adron_foe', 'entities': {'url': {'urls': [{'url': 'https://www.twitch.tv/adron_foe', 'expanded_url': 'https://twitch.tv/adron_foe', 'display_url': 'twitch.tv/adron_foe', 'indices': [0, 23]}]}, 'description': {'urls': []}}, 'protected': False, 'followers_count': 130539, 'friends_count': 73691, 'listed_count': 157, 'created_at': 'Mon Sep 23 20:37:10 +0000 2013', 'favourites_count': 2001, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuses_count': 1540, 'lang': None, 'status': {'created_at': 'Sun Sep 29
23:49:54 +0000 2019', 'id': 1178456902491131909, 'id_str': '1178456902491131909', 'text': 'RT @zFakes_: Looking for an editor to make My first twitch emote', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'retweeted_status': {'created_at': 'Sun Sep 29 10:36:55 +0000 2019', 'id': 1178257339499110401, 'id_str': '1178257339499110401', 'text': 'Looking for an editor to make My first twitch emote', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []}, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 1, 'favorite_count': 23, 'favorited': False, 'retweeted': False, 'lang': 'en'}, 'is_quote_status': False, 'retweet_count': 1, 'favorite_count': 0, 'favorited': False, 'retweeted': False, 'lang': 'en'}, 'contributors_enabled': False, 'is_translator': False, 'is_translation_enabled': False, 'profile_background_color': '000000', 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme12/bg.gif', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme12/bg.gif', 'profile_background_tile': False, 'profile_image_url': 'http://pbs.twimg.com/profile_images/1120067816118521856/PxOWQ_Qe_normal.png', 'profile_image_url_https': 'https://pbs.twimg.com/profile_images/1120067816118521856/PxOWQ_Qe_normal.png', 
'profile_banner_url': 'https://pbs.twimg.com/profile_banners/1898321922/1554732991', 'profile_link_color': '1B95E0', 'profile_sidebar_border_color': '000000', 'profile_sidebar_fill_color': '000000', 'profile_text_color': '000000', 'profile_use_background_image': False, 'has_extended_profile': False, 'default_profile': False, 'default_profile_image': False, 'can_media_tag': True, 'followed_by': True, 'following': False, 'live_following': False, 'follow_request_sent': False, 'notifications': False, 'muting': False, 'blocking': False, 'blocked_by': False, 'translator_type': 'none'}, id=1898321922, id_str='1898321922', name='Creator Support', screen_name='GamerGrowthHQ', location='Global', description='Supporting Creators through advice, shout-outs, and daily support. Managed by @adron_foe', url='https://www.twitch.tv/adron_foe', entities={'url':
{'urls': [{'url': 'https://www.twitch.tv/adron_foe', 'expanded_url': 'https://twitch.tv/adron_foe', 'display_url': 'twitch.tv/adron_foe', 'indices': [0, 23]}]}, 'description': {'urls': []}}, protected=False, followers_count=130539, friends_count=73691, listed_count=157, created_at=datetime.datetime(2013, 9, 23, 20, 37, 10), favourites_count=2001, utc_offset=None, time_zone=None, geo_enabled=False, verified=False, statuses_count=1540, lang=None, status=Status(_api=<tweepy.api.API object at 0x0000028E4D3C8F60>, _json={'created_at': 'Sun Sep 29 23:49:54 +0000 2019', 'id':
1178456902491131909, 'id_str': '1178456902491131909', 'text': 'RT @zFakes_: Looking for an editor to make My first twitch emote', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []}, 'source': '<a href="http://twitter.com/download/iphone"
rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'retweeted_status': {'created_at': 'Sun Sep 29 10:36:55 +0000 2019', 'id': 1178257339499110401, 'id_str': '1178257339499110401', 'text': 'Looking for an editor to make My first twitch emote', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []}, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 1, 'favorite_count': 23, 'favorited': False, 'retweeted': False, 'lang': 'en'}, 'is_quote_status': False, 'retweet_count': 1, 'favorite_count': 0, 'favorited': False, 'retweeted': False, 'lang': 'en'}, created_at=datetime.datetime(2019, 9, 29, 23, 49, 54), id=1178456902491131909, id_str='1178456902491131909', text='RT @zFakes_: Looking for an editor to make My first twitch emote', truncated=False, entities={'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []}, source='Twitter for iPhone', source_url='http://twitter.com/download/iphone', in_reply_to_status_id=None, in_reply_to_status_id_str=None, in_reply_to_user_id=None, in_reply_to_user_id_str=None, in_reply_to_screen_name=None, geo=None, coordinates=None, place=None, contributors=None, retweeted_status=Status(_api=<tweepy.api.API object at 0x0000028E4D3C8F60>, _json={'created_at': 'Sun Sep 29 10:36:55 +0000 2019', 'id': 1178257339499110401, 'id_str': '1178257339499110401', 'text': 'Looking for an editor to make My first twitch emote', 'truncated': False, 'entities': {'hashtags': [], 
'symbols': [], 'user_mentions': [], 'urls': []}, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for
Android</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None,
'in_reply_to_screen_name': None, 'geo': None, 'coordinates': None, 'place': None, 'contributors': None, 'is_quote_status': False, 'retweet_count': 1, 'favorite_count': 23, 'favorited': False, 'retweeted': False, 'lang': 'en'}, created_at=datetime.datetime(2019, 9, 29, 10, 36, 55), id=1178257339499110401, id_str='1178257339499110401', text='Looking for an editor to make My first twitch emote', truncated=False, entities={'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []}, source='Twitter for Android', source_url='http://twitter.com/download/android',
in_reply_to_status_id=None, in_reply_to_status_id_str=None, in_reply_to_user_id=None, in_reply_to_user_id_str=None, in_reply_to_screen_name=None, geo=None, coordinates=None, place=None, contributors=None, is_quote_status=False, retweet_count=1, favorite_count=23, favorited=False, retweeted=False, lang='en'), is_quote_status=False, retweet_count=1, favorite_count=0, favorited=False, retweeted=False, lang='en'), contributors_enabled=False, is_translator=False, is_translation_enabled=False, profile_background_color='000000', profile_background_image_url='http://abs.twimg.com/images/themes/theme12/bg.gif', profile_background_image_url_https='https://abs.twimg.com/images/themes/theme12/bg.gif', profile_background_tile=False, profile_image_url='http://pbs.twimg.com/profile_images/1120067816118521856/PxOWQ_Qe_normal.png', profile_image_url_https='https://pbs.twimg.com/profile_images/1120067816118521856/PxOWQ_Qe_normal.png', profile_banner_url='https://pbs.twimg.com/profile_banners/1898321922/1554732991', profile_link_color='1B95E0', profile_sidebar_border_color='000000', profile_sidebar_fill_color='000000', profile_text_color='000000', profile_use_background_image=False, has_extended_profile=False, default_profile=False, default_profile_image=False, can_media_tag=True, followed_by=True, following=False, live_following=False, follow_request_sent=False, notifications=False, muting=False, blocking=False, blocked_by=False, translator_type='none')
I am trying to find a repeatable method that allows me to:
Take 200 followers from the signed-in Twitter account, parse their account details (including tweets), and create a (large) pandas dataframe object containing the mentioned details.
I have tried this link and this link, but I have not understood how to properly implement them to accomplish what I am looking for.
Another example is that I am able to access the location of a user account with the following:
import tweepy
import time

# insert your Twitter keys here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
# twitter_handle = 'TimBarbalace'

auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

users = []

# verify_credentials must be called; the bare method object is always truthy
if api.verify_credentials():
    print("Logged In Successfully")
else:
    print("Error -- Could not log in with your credentials")

followers = tweepy.Cursor(api.followers).items()
i = 99
curr = 0
for follower in followers:
    if curr < i:
        print(follower.screen_name, follower.location)
        curr += 1
    else:
        exit()
Result:
crzyazn888 Washington, DC
narutouz16
GamerGrowthHQ Global
pleasantemma Hell, Pennsylvania
karadise_art in a galaxy far, far away
webdivaloper
Maurer_Ranger The Internet
megliebsch Philadelphia, Pennyslvania
hoang_le_96 Philadelphia, PA
lasallephilo Philadelphia, PA
brianmaxwell33
BobbyJPolitics Philadelphia, PA
_nadcas
JPower96IsTaken
crypticsmystic
ZacharyFlair Washington, DC
thegierczaks1
KFlahertyRN
cbars68
kaitlyndmcd Philadelphia, PA
illMELt_withyou
jesskidding07
BetaRayJohn
tew_dedicatesd Baltimore, MD
hbthen3rd Redmond, WA
g_laubenstein Philadelphia, PA
tewsaucey
leahgarloff Philadelphia, PA
TheCage52
softballkenz13
zyocard
josephsilvestr5 Mays Chapel, MD
jerry_ooooo
karadevanney Point Place, Wisconsin
omgitsfranipher New Jersey, USA
PaigeBuckworth
LSU_studyabroad
jcaskerr
Process_Pete Towson, MD
lexyandiknowiit Maryland, USA
lawoqTr
sucreidesc83 Казань
LaSalleSGA Philadelphia, PA
N_Pilny1
Kaileyminkk
allyssapingul HOBY MD
cgarvss
ubertev
beckwoodworth
lmgeee22
nosayslion Philadelphia, PA
CoreyRayEid Los Angeles
s0_krispy
aimeemarierose3 La Salle University
where_is_harry_ La Salle University
OfficialDriscoe Baltimore, MD
THEchubby_messi
Sera_Numquam Philadelphia, PA
3dBeddingsets
CelanoScott
alixleto1
dzhuzham4 Missouri, USA
tayyheath D(M)V
50ShadesOfGlaze
Deidre_Mc
nicole_wickizer
Thomasmedia2019 California, USA
water2142
DurkinSays Philadelphia, PA
tavia_overton Baltimore, MD
NotKTLeu
CornHub35 West Palm Beach, FL
The0kayJosh cincinnati zoo
sherree_wale
XavierRivera_ Baltimore, MD
phinguyen_163
dannywess83
okweightlossdna
cd_somers Baltimore, MD
OscarOr85985212
LawAbidingHuman London Town
LorenzoTanoueAK Durham, NC
cdvsmith
StephanieeLynn0
MrAlphonsoJones Virginia
baltiMAURA
keondra281
yagirlmels
HBroughaha
mi_erna
mike_wieczorek
chase_brennan13
Maryjs93 Phoenixville, PA
Brady_McKinney Baltimore... UMD Alumni
akbashor Philadelphia, PA
LinzJustin
cabarca_14
013MG
B_kroner82
NOTE - After reading some Stack Overflow posts, I think only the newest 200 tweets per user will suffice.
I also found this GitHub link for extracting just tweets?
I have added a bounty to this question.
Recommended answer
Convert the tweepy object to JSON:
- Attribution to Tweepy for beginners
- `followers` is a generator containing `User(...)`, which is a `tweepy.models.User` type
- Wrap `followers` in `list()` to unpack the generator, or just iterate through `followers` without unpacking it.
- I unpacked it into a `list` in case there's some need to inspect the content
import tweepy
import json
import pandas as pd
# pandas >= 1.0 also exposes this function as pd.json_normalize
from pandas.io.json import json_normalize

# insert your Twitter keys here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# unpack the cursor into a list of tweepy.models.User objects
followers = list(tweepy.Cursor(api.followers).items())


# function to convert _json to JSON
def jsonify_tweepy(tweepy_object):
    json_str = json.dumps(tweepy_object._json)
    return json.loads(json_str)


# Call the function and unload each _json into followers_list
followers_list = [jsonify_tweepy(follower) for follower in followers]

# Convert followers_list to a pandas dataframe (nested keys become dotted columns)
df = json_normalize(followers_list)
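To narrow the normalized frame down to the profile-level columns the question asks for, something like the following may work. It is only a minimal sketch: `sample_followers` is a made-up, trimmed stand-in for the `_json` payloads (the key names match the dump shown in the question), and the output column names are chosen purely for illustration. It assumes pandas 1.0 or newer, where `pd.json_normalize` is available.

```python
import pandas as pd

# Hypothetical, trimmed-down stand-ins for follower._json payloads.
sample_followers = [
    {"screen_name": "user1", "followers_count": 50, "friends_count": 100,
     "status": {"retweet_count": 25, "favorite_count": 10}},
    {"screen_name": "user2", "followers_count": 321, "friends_count": 12151},
]

# json_normalize flattens nested dicts into dotted column names,
# e.g. status.retweet_count; keys missing from a record become NaN.
flat = pd.json_normalize(sample_followers)

# Keep only the profile-level fields and give them friendlier names.
profile_df = flat[["screen_name", "followers_count", "friends_count"]].rename(
    columns={
        "screen_name": "user_name",
        "followers_count": "followers",
        "friends_count": "following",
    }
)
print(profile_df)
```

Note that `friends_count` is Twitter's name for the number of accounts a user follows, which is why it maps to the "following" column here.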
To get follower tweets:
- Use `class TweetMiner`, as shown in the link at the top
- As already noted, I did not write this class, but I did use it and it performs as specified, to extract tweets.
- That said, bare `except` clauses are a no-no.
import tweepy
from datetime import datetime


class TweetMiner(object):

    result_limit = 20
    data = list()
    api = False

    twitter_keys = {'consumer_key': 'your consumer_key',
                    'consumer_secret': 'your consumer_secret',
                    'access_token_key': 'your access_token',
                    'access_token_secret': 'your access_token_secret'}

    def __init__(self, keys_dict=twitter_keys, api=api, result_limit=20):
        self.twitter_keys = keys_dict

        auth = tweepy.OAuthHandler(keys_dict['consumer_key'], keys_dict['consumer_secret'])
        auth.set_access_token(keys_dict['access_token_key'], keys_dict['access_token_secret'])

        self.api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
        self.result_limit = result_limit

    def mine_user_tweets(self, user, mine_retweets=False, max_pages=5):
        data = list()
        last_tweet_id = False
        page = 1

        while page <= max_pages:
            if last_tweet_id:
                # max_id is inclusive, so subtract 1 to skip the tweet already seen
                statuses = self.api.user_timeline(screen_name=user,
                                                  count=self.result_limit,
                                                  max_id=last_tweet_id - 1,
                                                  tweet_mode='extended',
                                                  include_retweets=True)
            else:
                statuses = self.api.user_timeline(screen_name=user,
                                                  count=self.result_limit,
                                                  tweet_mode='extended',
                                                  include_retweets=True)

            for item in statuses:
                mined = {'tweet_id': item.id,
                         'name': item.user.name,
                         'screen_name': item.user.screen_name,
                         'retweet_count': item.retweet_count,
                         'text': item.full_text,
                         'mined_at': datetime.now(),
                         'created_at': item.created_at,
                         'favourite_count': item.favorite_count,
                         'hashtags': item.entities['hashtags'],
                         'status_count': item.user.statuses_count,
                         'location': item.place,
                         'source_device': item.source}

                # narrowed from the original bare except: the attribute is
                # simply absent when the tweet is not a retweet
                try:
                    mined['retweet_text'] = item.retweeted_status.full_text
                except AttributeError:
                    mined['retweet_text'] = 'None'

                # the original referenced an undefined name `status` here
                try:
                    mined['quote_text'] = item.quoted_status.full_text
                    mined['quote_screen_name'] = item.quoted_status.user.screen_name
                except AttributeError:
                    mined['quote_text'] = 'None'
                    mined['quote_screen_name'] = 'None'

                last_tweet_id = item.id
                data.append(mined)

            page += 1

        return data
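The paging in `mine_user_tweets` relies on `max_id`: each request after the first asks only for tweets older than the oldest one already collected. The sketch below shows that idea in isolation. `fake_timeline` and `collect_ids` are hypothetical names, and `fake_timeline` is just a stand-in for `api.user_timeline` that returns plain integer ids, so no Twitter calls are made.

```python
def fake_timeline(count, max_id=None):
    """Hypothetical stand-in for api.user_timeline: returns tweet ids, newest first."""
    all_ids = list(range(10, 0, -1))  # ids 10..1, newest first
    if max_id is not None:
        # like the real parameter, max_id is inclusive
        all_ids = [i for i in all_ids if i <= max_id]
    return all_ids[:count]


def collect_ids(count, max_pages):
    collected = []
    last_id = None
    for _ in range(max_pages):
        if last_id is None:
            page = fake_timeline(count)
        else:
            # subtract 1 so the oldest tweet already seen is not fetched again
            page = fake_timeline(count, max_id=last_id - 1)
        if not page:
            break  # nothing older is left, so stop paging early
        collected.extend(page)
        last_id = page[-1]
    return collected


ids = collect_ids(count=4, max_pages=5)
print(ids)
```

The early `break` on an empty page is an addition in this sketch; the class above keeps requesting until `max_pages` is reached.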
Call the class
- The original object does not contain tweets
- Using `df` from above, get all the followers and use `class TweetMiner` to download the tweets for each user.
- The following code will create a dict of dataframes, `mined_tweets_dict`, where each key is a user.
miner = TweetMiner(result_limit=200)
mined_tweets_dict = dict()

for name in df['screen_name'].unique():
    try:
        mined_tweets = miner.mine_user_tweets(user=name, max_pages=17)
        mined_tweets_dict[name] = pd.DataFrame(mined_tweets)
    except tweepy.TweepError as e:
        print(f'{name} could not be processed because {e}')
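Save with `.to_csv`:
# newline='' is what pandas recommends when passing an open text file handle
with open('follower_tweets.csv', mode='a', encoding='utf-8', newline='') as f:
    # user_df avoids shadowing the followers dataframe `df` from above
    for i, user_df in enumerate(mined_tweets_dict.values()):
        # write the header row only once, with the first dataframe
        user_df.to_csv(f, header=(i == 0), index=False)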
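To get closer to the single table sketched in the question, the per-user dataframes could be stacked and joined with the follower-level columns. The following is a rough sketch under stated assumptions: `profile_df` and the contents of `mined_tweets_dict` are small made-up stand-ins (column names follow the keys used by `TweetMiner` and the follower JSON), not real API output.

```python
import pandas as pd

# Hypothetical stand-in for the follower-level frame (three columns only).
profile_df = pd.DataFrame({
    "screen_name": ["user1", "user2"],
    "followers_count": [50, 321],
    "friends_count": [100, 12151],
})

# Hypothetical stand-in for the dict of per-user tweet frames.
mined_tweets_dict = {
    "user1": pd.DataFrame({
        "screen_name": ["user1", "user1"],
        "retweet_count": [25, 3],
        "favourite_count": [10, 0],
        "created_at": pd.to_datetime(["2019-10-01", "2019-10-06"]),
        "text": ["lorem ipsum", "dolor sit"],
    }),
    "user2": pd.DataFrame({
        "screen_name": ["user2"],
        "retweet_count": [2017],
        "favourite_count": [0],
        "created_at": pd.to_datetime(["2018-09-12"]),
        "text": ["foobar"],
    }),
}

# Stack the per-user frames into one long frame, one row per tweet.
tweets_df = pd.concat(list(mined_tweets_dict.values()), ignore_index=True)

# Attach the profile-level columns to every tweet row of that user.
combined = tweets_df.merge(profile_df, on="screen_name", how="left")
combined = combined[["screen_name", "followers_count", "friends_count",
                     "retweet_count", "favourite_count", "created_at", "text"]]
print(combined)
```

In this layout the retweet and like counts are per tweet, while the follower and following counts repeat on every row for the same user.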