Computing Trending Topics

Question

Let's say I'm collecting tweets from Twitter based on a variety of criteria and storing them in a local MySQL database. I want to be able to compute trending topics, like Twitter does, where a topic can be anywhere from 1 to 3 words long.

Is it possible to write a script that does something like this in PHP and MySQL?

I've found answers on how to compute which terms are "hot" once you have counts for the terms, but I'm stuck on the first part. How should I store the data in the database, and how can I count the frequency of 1- to 3-word terms in it?

Recommended Answer

My recipe for trending topics:
1. Fetch the tweets.
2. Split each tweet on whitespace into an array of n-grams (up to 3-grams if you want phrases of up to 3 words).
3. Filter URLs, @usernames, common words and junk characters out of each array.
4. Count the frequency of every unique keyword/phrase.
5. Mute (ignore) specific junk words/phrases.
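The steps above can be sketched in Python (a PHP version would follow the same structure). This is a minimal sketch, not the answerer's code: the stop-word list `STOP_WORDS`, the mute list `MUTED`, the regular expressions and the function names `tokenize`, `ngrams` and `count_terms` are all hypothetical. One small reordering: URLs, @usernames and junk characters are stripped before the n-grams are built, and common words are handled by rejecting any phrase that starts or ends with one.

```python
import re
from collections import Counter

# Hypothetical lists for illustration; real ones would be much larger.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "rt"}
MUTED = {"good morning"}  # step 5: phrases to ignore entirely

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
JUNK_RE = re.compile(r"[^\w\s#']")  # keep word chars, whitespace, '#' and apostrophes


def tokenize(tweet):
    """Step 3 (partly): strip URLs, @usernames and junk characters, then split."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text)
    text = JUNK_RE.sub(" ", text)
    return text.split()


def ngrams(tokens, max_n=3):
    """Step 2: yield every contiguous run of 1..max_n tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def count_terms(tweets, max_n=3):
    """Steps 3-5: filter common words and muted phrases, count the rest."""
    counts = Counter()
    for tweet in tweets:
        for gram in ngrams(tokenize(tweet), max_n):
            # Drop phrases that begin or end with a common word ("the", "final is").
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            phrase = " ".join(gram)
            if phrase in MUTED:
                continue
            counts[phrase] += 1  # step 4
    return counts
```

For example, `count_terms(["The #WorldCup final is on!"])` counts `#worldcup`, `#worldcup final` and `final is on`, but not `the` or `is`, because those are in the assumed stop-word list. The resulting `Counter` is what a "hotness" formula would then consume.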

Yes, you can do this in PHP and MySQL ;)
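For the storage part of the question, one option (an assumption, not something the answer specifies) is to store one row per tweet/phrase pair and let the database do the counting with `GROUP BY`. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server; the table names `tweets` and `tweet_terms` and the helpers `phrases`, `store_tweet` and `term_counts` are hypothetical, and the phrase extraction skips the filtering steps for brevity.

```python
import sqlite3

# Hypothetical schema; in MySQL you would likely use DATETIME for created_at.
SCHEMA = """
CREATE TABLE tweets (
    id         INTEGER PRIMARY KEY,
    body       TEXT NOT NULL,
    created_at TEXT NOT NULL          -- ISO-8601 timestamp
);
CREATE TABLE tweet_terms (
    tweet_id   INTEGER NOT NULL REFERENCES tweets(id),
    term       TEXT    NOT NULL,      -- a 1- to 3-word phrase
    created_at TEXT    NOT NULL       -- copied from tweets so counting needs no join
);
CREATE INDEX idx_terms_time ON tweet_terms (created_at, term);
"""


def phrases(text, max_n=3):
    """Unique 1..max_n-word phrases in a tweet (no stop-word/URL filtering here)."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def store_tweet(conn, tweet_id, body, created_at):
    """Insert the tweet and one tweet_terms row per distinct phrase."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO tweets (id, body, created_at) VALUES (?, ?, ?)",
            (tweet_id, body, created_at),
        )
        conn.executemany(
            "INSERT INTO tweet_terms (tweet_id, term, created_at) VALUES (?, ?, ?)",
            [(tweet_id, p, created_at) for p in sorted(phrases(body))],
        )


def term_counts(conn, since, limit=10):
    """Most frequent phrases since `since`; ties are broken alphabetically."""
    return conn.execute(
        "SELECT term, COUNT(*) AS freq FROM tweet_terms "
        "WHERE created_at >= ? GROUP BY term "
        "ORDER BY freq DESC, term LIMIT ?",
        (since, limit),
    ).fetchall()
```

Because `phrases` returns a set, each phrase is counted at most once per tweet, so a single tweet repeating a word cannot inflate its count. Restricting `term_counts` to a recent window (and comparing it with an earlier window) gives the per-term counts that a "hot" score can be computed from.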