Exclude retweets from the Twitter streaming API using tweepy


Question

When using the Python tweepy library to pull tweets from Twitter's streaming API, is it possible to exclude retweets?

For instance, suppose I want only the tweets posted by a particular user, e.g. twitterStream.filter(follow=["20264932"]). This also returns retweets, and I would like to exclude them. How can I do this?

Thanks in advance.

Recommended answer

Just checking whether a tweet's text starts with 'RT' is not a robust solution. You need to decide what you will count as a retweet, since the definition isn't clear-cut. The Twitter API docs explain that tweets with 'RT' in the tweet text aren't officially retweets:

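Sometimes people type RT at the beginning of a tweet to indicate that they are re-posting someone else's content. This isn't an official Twitter command or feature, but it signifies that they are quoting another user's tweet.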

If you're going by the 'official' definition, filter out tweets whose retweeted attribute is True, like this:

if not tweet['retweeted']:
    # do something with standard tweets
    pass
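
One caveat, offered as an assumption about the v1.1 payload rather than something stated in the answer: the retweeted flag may only reflect whether the authenticated account has retweeted the status, while a native retweet embeds the original tweet under a retweeted_status key. A minimal sketch that treats either signal as an official retweet follows; the helper name is_official_retweet and the sample dicts are hypothetical.

```python
def is_official_retweet(tweet: dict) -> bool:
    """Return True if the payload looks like a native retweet.

    Assumption: `tweet` is a raw status dict from the v1.1 streaming API,
    where a native retweet embeds the original under 'retweeted_status'.
    """
    return bool(tweet.get("retweeted")) or "retweeted_status" in tweet


# Hand-made sample payloads (hypothetical, not real API output)
original = {"text": "hello world", "retweeted": False}
native_rt = {
    "text": "RT @someone: hello world",
    "retweeted": False,
    "retweeted_status": {"text": "hello world"},
}

print(is_official_retweet(original))
print(is_official_retweet(native_rt))
```

If this assumption holds for your data, the second payload would slip past a check on the retweeted flag alone, so it is worth inspecting a few real payloads before relying on either field.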

If you want to be more inclusive and also catch 'unofficial' retweets, check the text for the substring 'RT @' rather than merely checking whether it starts with 'RT'. The substring check is cleaner and eliminates edge cases where a tweet starts with 'RT' but isn't a retweet (with this much data out there, I'm sure that happens). Here's some code for that:

if not tweet['retweeted'] and 'RT @' not in tweet['text']:
    # do something with standard tweets
    pass

The latter conditional takes the subset of tweets in your collection that are not flagged as retweets and intersects it with the subset whose text does not contain 'RT @', leaving you with tweets that are presumably regular tweets.
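
The intersection described above can be made explicit with a small sketch. It assumes each tweet is a plain dict with id, text, and retweeted keys; the helper name is_standard_tweet and the sample data are made up for illustration.

```python
def is_standard_tweet(tweet: dict) -> bool:
    """The answer's conditional: not flagged as retweeted and no 'RT @' in the text."""
    return not tweet["retweeted"] and "RT @" not in tweet["text"]


# Hand-made sample tweets (hypothetical data, not real API output)
tweets = [
    {"id": 1, "text": "Just a normal tweet", "retweeted": False},
    {"id": 2, "text": "RT @someone: manual repost", "retweeted": False},
    {"id": 3, "text": "RTFM, people", "retweeted": False},
    {"id": 4, "text": "flagged by the API", "retweeted": True},
]

# The two subsets from the explanation, keyed by tweet id
not_flagged = {t["id"] for t in tweets if not t["retweeted"]}
no_rt_marker = {t["id"] for t in tweets if "RT @" not in t["text"]}

# Intersecting them gives the same result as the combined conditional
standard_ids = {t["id"] for t in tweets if is_standard_tweet(t)}
assert standard_ids == not_flagged & no_rt_marker
print(sorted(standard_ids))  # [1, 3]
```

Tweet 3 starts with 'RT' but is not a retweet, so it is kept. A check based on the text starting with 'RT' would have dropped it.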
