How to create a pandas DataFrame from the Twitter Search API?
Question
I am working with the Twitter Search API, which returns a dictionary of dictionaries. My goal is to create a DataFrame from a list of keys in the response dictionary.
An example of the API response is here: Example Response
I have a list of keys within the statuses dictionary:
keys = ["created_at", "text", "in_reply_to_screen_name", "source"]
I would like to loop through the value of each key returned in the statuses dictionary and put the values in a DataFrame with the keys as the columns.
I currently have code that loops through a single key at a time, assigns the values to a list, and then adds the list to a DataFrame, but I want a way to handle more than one key at a time. Current code below:
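import pandas as pd

# w is the word to be queried
w = 'keyword'
# count of tweets to return
count = 1000
# API call; `twitter` is an already-authenticated client (its setup is not shown)
query = twitter.search.tweets(q=w, count=count)

def data_l2(q, k1, k2):
    # Collect q[k1][i][k2] for every result i
    data = []
    for results in q[k1]:
        data.append(results[k2])
    return data

def data_l3(q, k1, k2, k3):
    # Same as data_l2, one level deeper; assumed, since the post does not show it
    data = []
    for results in q[k1]:
        data.append(results[k2][k3])
    return data

screen_names = data_l3(query, "statuses", "user", "screen_name")
# Assumed: the post uses `tweets` without showing how it is built
tweets = data_l2(query, "statuses", "text")
data = {'screen_names': screen_names,
        'tweets': tweets}
frame = pd.DataFrame(data)
frame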
Recommended answer
I will share a more generic solution that I came up with while working with the Twitter API. Let's say you have the IDs of the tweets that you want to fetch in a list called my_ids:
# Fetch tweets from the Twitter API using the following loop:
list_of_tweets = []
# IDs of tweets that can't be found are saved in the list below:
cant_find_tweets_for_those_ids = []
for each_id in my_ids:
    try:
        list_of_tweets.append(api.get_status(each_id))
    except Exception as e:
        cant_find_tweets_for_those_ids.append(each_id)
Then, in this code block, we isolate the JSON part of each tweepy Status object that we have downloaded and add them all to a list...
my_list_of_dicts = []
for each_json_tweet in list_of_tweets:
    my_list_of_dicts.append(each_json_tweet._json)
...and we write this list to a txt file:
import json

with open('tweet_json.txt', 'w') as file:
    file.write(json.dumps(my_list_of_dicts, indent=4))
Now we are going to create a DataFrame from the tweet_json.txt file (I have added some keys that were relevant to the use case I was working on, but you can use your own keys instead):
import json
import pandas as pd

my_demo_list = []
with open('tweet_json.txt', encoding='utf-8') as json_file:
    all_data = json.load(json_file)

for each_dictionary in all_data:
    tweet_id = each_dictionary['id']
    whole_tweet = each_dictionary['text']
    # Keep everything from the first 'https' to the end of the tweet text
    only_url = whole_tweet[whole_tweet.find('https'):]
    favorite_count = each_dictionary['favorite_count']
    retweet_count = each_dictionary['retweet_count']
    created_at = each_dictionary['created_at']
    whole_source = each_dictionary['source']
    # 'source' is an HTML anchor; take the text between 'rel="nofollow">' and '</a>'
    only_device = whole_source[whole_source.find('rel="nofollow">') + 15:-4]
    source = only_device
    # The 'retweeted_status' key is only present on retweets
    retweeted_status = each_dictionary.get('retweeted_status', 'Original tweet')
    if retweeted_status == 'Original tweet':
        url = only_url
    else:
        retweeted_status = 'This is a retweet'
        url = 'This is a retweet'
    my_demo_list.append({'tweet_id': str(tweet_id),
                         'favorite_count': int(favorite_count),
                         'retweet_count': int(retweet_count),
                         'url': url,
                         'created_at': created_at,
                         'source': source,
                         'retweeted_status': retweeted_status,
                         })

# Build the DataFrame once, after all rows have been collected
tweet_json = pd.DataFrame(my_demo_list, columns=['tweet_id', 'favorite_count',
                                                 'retweet_count', 'created_at',
                                                 'source', 'retweeted_status', 'url'])
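For the list of keys in the question, a shorter route might be to hand the statuses list to pandas directly and then pick the columns. The following is only a minimal sketch, assuming the search response behaves like a plain dict whose "statuses" key holds a list of tweet dicts. The sample_response data and the statuses_to_frame helper are hypothetical names made up for illustration; pd.json_normalize requires pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical response with the same shape as the Search API result:
# a dict whose "statuses" key holds a list of tweet dicts.
sample_response = {
    "statuses": [
        {
            "created_at": "Mon Jan 01 00:00:00 +0000 2018",
            "text": "first tweet",
            "in_reply_to_screen_name": None,
            "source": '<a href="http://example.com" rel="nofollow">Example App</a>',
            "user": {"screen_name": "alice"},
        },
        {
            "created_at": "Tue Jan 02 00:00:00 +0000 2018",
            "text": "second tweet",
            "in_reply_to_screen_name": "alice",
            "source": '<a href="http://example.com" rel="nofollow">Example App</a>',
            "user": {"screen_name": "bob"},
        },
    ]
}

keys = ["created_at", "text", "in_reply_to_screen_name", "source"]


def statuses_to_frame(response, keys, nested_keys=()):
    """Build a DataFrame with one row per status and one column per key.

    nested_keys uses dotted paths such as "user.screen_name".
    """
    # json_normalize flattens nested dicts into dotted column names.
    flat = pd.json_normalize(response["statuses"])
    columns = list(keys) + list(nested_keys)
    # reindex keeps the requested column order and fills absent keys with NaN.
    return flat.reindex(columns=columns)


frame = statuses_to_frame(sample_response, keys, nested_keys=["user.screen_name"])
print(frame)
```

With a real response you would pass query instead of sample_response. Using reindex rather than plain column selection is a design choice here: a key that is missing from every status becomes an empty column instead of raising a KeyError.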