Exclude retweets from Twitter streaming API using tweepy

Question

When using the Python tweepy library to pull tweets from Twitter's streaming API, is it possible to exclude retweets?

For instance, suppose I want only the tweets posted by a particular user, e.g. twitterStream.filter(follow=["20264932"]). This also returns retweets, and I would like to exclude them. How can I do this?

Thanks in advance.

Recommended answer

Just checking whether a tweet's text starts with 'RT' is not a robust solution. You need to decide what you will consider a retweet, since it isn't exactly clear-cut. The Twitter API docs explain that tweets with 'RT' in the tweet text aren't officially retweets:

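Sometimes people type 'RT' at the beginning of a Tweet to indicate that they are re-posting someone else's content. This isn't an official Twitter command or feature, but signifies that they are quoting another user's Tweet.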

If you're going by the 'official' definition, then you want to filter out tweets that have a True value for their retweeted attribute, like this:

if not tweet['retweeted']:
    # do something with standard tweets
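
As a minimal sketch of the 'official' check, assume each tweet has already been decoded from the stream's JSON into a plain dict. The sample payloads and the helper name `is_official_retweet` are made up for illustration. Twitter's v1.1 tweet object documentation describes `retweeted` as reflecting whether the authenticating user has retweeted the tweet, and says a retweet can be recognised by the presence of a `retweeted_status` attribute, so this helper treats either signal as a retweet:

```python
def is_official_retweet(tweet):
    """Return True if the tweet dict carries an official retweet marker."""
    # `retweeted` is the attribute used in the answer above.
    # `retweeted_status` embeds the original tweet when the payload is a retweet.
    return bool(tweet.get('retweeted')) or 'retweeted_status' in tweet


# Hypothetical, heavily trimmed payloads for illustration only.
sample_tweets = [
    {'text': 'Original thought', 'retweeted': False},
    {
        'text': 'RT @someone: Original thought',
        'retweeted': False,
        'retweeted_status': {'text': 'Original thought'},
    },
]

# Keep only the tweets that are not official retweets.
standard_tweets = [t for t in sample_tweets if not is_official_retweet(t)]
print([t['text'] for t in standard_tweets])
```

If your stream only ever delivers payloads where `retweeted` is False, the `retweeted_status` test is the part doing the filtering.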

And if you want to be more inclusive and also catch 'unofficial' retweets, check the text for the substring 'RT @' rather than merely whether it starts with 'RT'. The substring check is cleaner and faster, and it eliminates more edge cases where a tweet starts with 'RT' but isn't a retweet (with this much data out there, I'm sure that is a possibility). Here's some code for that:

if not tweet['retweeted'] and 'RT @' not in tweet['text']:
    # do something with standard tweets

The second conditional takes the subset of tweets in your collection that are not official retweets and intersects it with the subset whose text does not contain 'RT @', leaving you with tweets that are presumably regular tweets.
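
The intersection described above can be checked on a handful of hand-written tweets. This is only a sketch: the dicts, their `id` values, and the helper name `is_standard_tweet` are invented for illustration, and real stream payloads carry many more fields. It also shows the edge case mentioned earlier, where a plain `startswith('RT')` test would wrongly drop a tweet that merely begins with those letters:

```python
def is_standard_tweet(tweet):
    """The combined conditional: not an official retweet and no 'RT @' marker."""
    return not tweet['retweeted'] and 'RT @' not in tweet['text']


# Hypothetical payloads covering each case.
tweets = [
    {'id': 1, 'text': 'RTFM, people', 'retweeted': False},        # starts with RT, not a retweet
    {'id': 2, 'text': 'RT @someone: hello', 'retweeted': False},  # unofficial retweet
    {'id': 3, 'text': 'hello', 'retweeted': True},                # official retweet
    {'id': 4, 'text': 'Good morning', 'retweeted': False},        # regular tweet
]

# The two subsets described in the text, and their intersection.
not_official = {t['id'] for t in tweets if not t['retweeted']}
no_rt_marker = {t['id'] for t in tweets if 'RT @' not in t['text']}
by_intersection = not_official & no_rt_marker

# The same result, computed with the single combined conditional.
by_conditional = {t['id'] for t in tweets if is_standard_tweet(t)}

# For comparison: the prefix-only test also discards tweet 1.
by_prefix = {
    t['id'] for t in tweets
    if not t['retweeted'] and not t['text'].startswith('RT')
}

print(sorted(by_intersection))  # [1, 4]
print(sorted(by_conditional))   # [1, 4]
print(sorted(by_prefix))        # [4]
```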
