Classify type of tweet (tweet/retweet/mention) based on tweet text in Python
Problem Description
Pulling from a couple of different examples, I've been able to create a simple Python script that parses the JSON output from the Twitter Streaming API and prints out the screen_name and text for each tweet. I would like to modify my code to also classify each tweet as one of the following:
(1) Retweet --> There is an "RT @anyusername" somewhere in the tweet text column
(2) Mention --> There is an "@anyusername" but no "RT @anyusername" in the tweet text column
(3) Tweet --> There is neither an "RT @anyusername" nor any "@anyusername" in the tweet text column
I can do this in Excel with the following formula, but I can't figure out how to do it in Python yet.
=IF(IFERROR(FIND("RT @",B2)>0,"False"),"Retweet",IF(IFERROR(FIND("@",B2)>0,"False"),"Mention","Tweet"))
Existing Code
import json
import sys
from csv import writer

# Usage: python script.py <input_file> <output_file>
with open(sys.argv[1], encoding='utf8') as in_file, \
        open(sys.argv[2], 'w', newline='', encoding='utf8') as out_file:
    csv = writer(out_file)
    # Header row; the tweet_type column is the one still to be filled in
    csv.writerow(['tweet_author', 'tweet_text', 'tweet_type'])
    for line in in_file:
        try:
            tweet = json.loads(line)
        except ValueError:
            # Skip lines that are not valid JSON instead of reusing a stale tweet
            continue
        tweet_text = tweet['text']
        row = (
            tweet['user']['screen_name'],
            tweet_text
        )
        csv.writerow(row)
Recommended Answer
I don't have a Python interpreter here, but it should be something similar to this:
import re

def url_match(tweet):
    # re.search scans the whole text; re.match would only match at the start
    if re.search(r'RT\s@\w+', tweet):
        return "RT"
    elif re.search(r'@\w+', tweet):
        return "mention"
    else:
        return "tweet"
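To connect this kind of classifier to the script in the question, one option is to compute the type for each parsed tweet and append it as a third value in the row. The following is a minimal, self-contained sketch rather than a tested drop-in: `classify_tweet` and `build_row` are hypothetical helper names, the patterns assume usernames consist of letters, digits, and underscores, and the sample dict only mimics the two fields the question's script reads.

```python
import re

# Assumption: a username is '@' followed by letters, digits, or underscores.
RETWEET_RE = re.compile(r'\bRT\s@\w+')
MENTION_RE = re.compile(r'@\w+')

def classify_tweet(text):
    """Return 'Retweet', 'Mention', or 'Tweet' for a tweet's text."""
    # Check for a retweet first, because a retweet also contains '@username'.
    if RETWEET_RE.search(text):
        return 'Retweet'
    if MENTION_RE.search(text):
        return 'Mention'
    return 'Tweet'

def build_row(tweet):
    """Build the (author, text, type) row for one parsed tweet dict."""
    text = tweet['text']
    return (tweet['user']['screen_name'], text, classify_tweet(text))

# Hypothetical sample shaped like the fields used in the question.
sample = {'user': {'screen_name': 'alice'}, 'text': 'RT @bob: hello'}
print(build_row(sample))
```

The tuple returned by `build_row` could then be passed to the CSV writer's `writerow` in place of the two-element row.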
Note: this will work for this classification, but if you want to retrieve the usernames (i.e. @USERNAME), you will have to tweak it a little more.
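One possible tweak for retrieving the usernames is to put a capturing group around the name and use `re.findall`. This is only a sketch under an assumption: `extract_usernames` is a hypothetical helper name, and the pattern treats a username as `@` followed by letters, digits, or underscores.

```python
import re

# Assumption: a username is '@' followed by letters, digits, or underscores.
USERNAME_RE = re.compile(r'@(\w+)')

def extract_usernames(text):
    """Return every username found in the text, without the leading '@'."""
    return USERNAME_RE.findall(text)

print(extract_usernames('RT @bob: thanks @carol_1!'))  # ['bob', 'carol_1']
```

A pattern this simple also picks up the part after `@` in an email address, so it may need tightening depending on the data.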