How to grab streaming data from Twitter over a pycurl connection and parse it using NLTK and regular expressions
Question
I am new to Python, and my boss has given me the following task:
- Stream data from Twitter over a pycurl connection and output it as JSON
- Parse it using NLTK and regular expressions
- Save it to a database (MySQL) or a flat file (txt)
Note: this is the URL I want to grab ('http://search.twitter.com/search.json?geocode=-0.789275%2C113.921327%2C1.0km&q=+near%3Aindonesia+within%3A1km&result_type=recent&rpp=10')
Does anyone know how to grab streaming data from Twitter using the steps above?
Any help would be greatly appreciated :)
Recommended answer
I would look at pattern: it's a very nice web mining library, and it comes with a Twitter mining API as well. The documentation is pretty good too.
Otherwise, look at https://dev.twitter.com/docs/twitter-libraries for Twitter libraries; getting the stream with one of them should be pretty straightforward too.
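If you want to stay with pycurl as the question describes, the three steps (fetch JSON, parse with NLTK and a regular expression, save to a text file) could be sketched roughly as below. This is only a minimal sketch under several assumptions: the function names are made up for illustration, the response is assumed to carry a "results" list whose items have "from_user" and "text" keys, and the URL in the question is a search endpoint rather than a true streaming one and may no longer be served. NLTK's RegexpTokenizer is used when NLTK is installed, with a plain re fallback otherwise.

```python
import json
import re
from io import BytesIO

# Illustrative token pattern: URLs, @mentions, #hashtags, then plain words.
TOKEN_PATTERN = r"https?://\S+|[@#]\w+|\w+"


def fetch_json(url):
    """Download a URL with pycurl and decode the body as JSON."""
    import pycurl  # imported here so the parsing helpers work without pycurl

    buffer = BytesIO()
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.WRITEDATA, buffer)
    try:
        curl.perform()
    finally:
        curl.close()
    return json.loads(buffer.getvalue().decode("utf-8"))


def tokenize(text):
    """Split tweet text into tokens with NLTK, or with re if NLTK is missing."""
    try:
        from nltk.tokenize import RegexpTokenizer
    except ImportError:
        return re.findall(TOKEN_PATTERN, text)
    return RegexpTokenizer(TOKEN_PATTERN).tokenize(text)


def parse_tweets(payload):
    """Turn the decoded JSON into rows; the key names are assumptions."""
    rows = []
    for item in payload.get("results", []):
        text = item.get("text", "")
        tokens = tokenize(text)
        rows.append({
            "user": item.get("from_user", ""),
            "text": text,
            "hashtags": [t for t in tokens if t.startswith("#")],
        })
    return rows


def save_rows(rows, path):
    """Append one JSON object per line to a text file."""
    with open(path, "a", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")
```

A typical call would be save_rows(parse_tweets(fetch_json(url)), "tweets.txt"). Saving to MySQL instead would need a separate database driver, which is left out here.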