Filter data in Twitter Streaming API


Question

I'm currently experimenting with the Twitter Streaming API. Everything works like a charm, but the API sends me tons of data that I don't need. Is it possible to filter the data the API sends me?

I'm using the following stream: https://stream.twitter.com/1.1/statuses/filter.json

Solution

Take a look at the filter stream of the API:

https://dev.twitter.com/docs/api/1.1/post/statuses/filter

You can pass a set of keywords as a filter for the stream to track. Under the current limits you can track up to 400 keywords.
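As a rough sketch of what that looks like on the client side, the keyword set goes into the `track` parameter as a comma-separated string. The helper names below (`build_track_param`, `build_filter_request`) are made up for illustration, and the snippet only prepares the request body; authentication and the HTTP call itself are left out.

```python
# Endpoint taken from the question; the helpers are hypothetical.
STREAM_URL = "https://stream.twitter.com/1.1/statuses/filter.json"
MAX_TRACK_KEYWORDS = 400  # limit quoted in this answer


def build_track_param(keywords):
    """Join keywords into the comma-separated value used for `track`."""
    cleaned = []
    for keyword in keywords:
        keyword = keyword.strip()
        # Skip empty entries and exact duplicates, keep the original order.
        if keyword and keyword not in cleaned:
            cleaned.append(keyword)
    if not cleaned:
        raise ValueError("at least one keyword is required")
    if len(cleaned) > MAX_TRACK_KEYWORDS:
        raise ValueError(f"too many keywords: {len(cleaned)} > {MAX_TRACK_KEYWORDS}")
    return ",".join(cleaned)


def build_filter_request(keywords):
    """Return the URL and form data for a POST to the filter stream."""
    return {"url": STREAM_URL, "data": {"track": build_track_param(keywords)}}
```

For example, `build_filter_request(["XYZ", "XYZ phone"])` yields form data with `track` set to `XYZ,XYZ phone`, which you would then send with whatever OAuth-capable HTTP client you already use.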

After retrieving the tweets, you still have to filter them yourself to remove noisy data.

So if you can describe what you are looking for with a set of keywords, you will get what you want. There will always be noise in your data, though, because it is almost impossible to define something that precisely through simple keyword filtering.

For example, let's assume you want to track all tweets related to a brand named XYZ. To get tweets about brand XYZ, you might use a one-word keyword set that contains only "XYZ". The API will give you all tweets containing XYZ, but suppose "XYZ" also has a meaning in some language: people who speak that language will tweet that word, and you will receive those tweets too. Suppose there is also a city called XYZ, where people post check-in messages. At that point you need to filter out the tweets that are not related to your topic, either by language detection or by contextual information retrieval. But the key is to choose a keyword set that covers the topic you want to follow.
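A minimal sketch of that second, client-side pass might look like the following. It assumes each stream line is a JSON object and that tweets carry `text` and `lang` fields; the term lists and function names are placeholders chosen for the XYZ example, and the plain substring checks are a crude stand-in for real language detection or contextual retrieval.

```python
import json

# Hypothetical settings for the XYZ brand example.
ALLOWED_LANGS = ("en",)
CONTEXT_TERMS = ("phone", "store", "review")  # hints that the tweet is about the brand
NOISE_TERMS = ("checked in", "check-in")      # hints that XYZ means the city


def is_relevant(tweet):
    """Return True if a keyword-matched tweet still looks on-topic."""
    text = tweet.get("text", "").lower()
    if tweet.get("lang") not in ALLOWED_LANGS:
        return False  # the word probably means something else in this language
    if any(term in text for term in NOISE_TERMS):
        return False  # likely a check-in message
    return any(term in text for term in CONTEXT_TERMS)


def filter_stream(lines):
    """Yield only the relevant tweets from an iterable of raw stream lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line
        message = json.loads(line)
        if "text" not in message:
            continue  # not a tweet (for example a delete notice)
        if is_relevant(message):
            yield message
```

You would feed `filter_stream` the lines read from the open HTTP response and tune the term lists against a sample of the tweets you actually receive.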

Cheers.
