Twitter competition ~ saving tweets (PHP & MySQL)

Problem description

I am creating an application to help our team manage a Twitter competition. So far I've managed to interact with the API fine and return the set of tweets that I need.

I'm struggling to decide on the best way to handle storing the tweets in the database, how often to check for them, and how to ensure there are no overlaps or gaps.

You can get a maximum of 100 tweets per page. At the moment, my idea is to run a cron script, say once every 5 minutes or so, grab a full 100 tweets at a time, and loop through them, checking the db to see whether each one is already there before adding it.
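The per-tweet lookup in that loop can be pushed down into MySQL: with a UNIQUE (or primary) key on the tweet's status id, `INSERT IGNORE` makes re-seen tweets a no-op, so the cron job can blindly write all 100 tweets per run with no `SELECT` at all. A minimal sketch; the table and column names here are made up for illustration:

```php
<?php
// Assumes a table created roughly as:
//   CREATE TABLE entries (
//     tweet_id BIGINT UNSIGNED PRIMARY KEY,  -- Twitter's status id
//     username VARCHAR(20)  NOT NULL,
//     tweet    VARCHAR(255) NOT NULL,
//     created  DATETIME     NOT NULL
//   );

// Twitter's created_at format -> MySQL DATETIME, kept in UTC.
function twitterDateToSql($createdAt)
{
    return gmdate('Y-m-d H:i:s', strtotime($createdAt));
}

function saveTweets(PDO $db, array $tweets)
{
    $stmt = $db->prepare(
        'INSERT IGNORE INTO entries (tweet_id, username, tweet, created)
         VALUES (?, ?, ?, ?)'
    );
    $saved = 0;
    foreach ($tweets as $t) {
        $stmt->execute([
            $t['id_str'],
            $t['user']['screen_name'],
            $t['text'],
            twitterDateToSql($t['created_at']),
        ]);
        $saved += $stmt->rowCount(); // 0 when the row was already there
    }
    return $saved; // number of genuinely new entries this run
}
```

The same trick solves the gap/overlap worry on the storage side: overlapping fetches are harmless, and `saveTweets` returning the count of new rows tells you whether a run found anything it hadn't seen before.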

This has the obvious drawback of running 100 queries against the db every 5 minutes, plus however many INSERTs there are on top of that, which I really don't like. I would also much rather have something a little more real-time: as Twitter is a live service, it stands to reason that we should update our list of entrants as soon as they enter.

This again throws up the drawback of having to repeatedly poll Twitter, which, although it might be necessary, means hammering their API in a way I'd rather avoid.

Does anyone have any ideas for an elegant solution? I need to ensure that I capture all the tweets, don't leave anyone out, and keep users unique in the db. I have considered just adding everything and then grouping the resulting table by username, but that isn't tidy.

I'm happy to deal with the display side of things separately, as that's just a pull from MySQL and display. But the backend design is giving me a headache, as I can't see an efficient way to keep it ticking over without hammering either the API or the db.

Answer

The Twitter API offers a streaming API that is probably what you want to use to ensure you capture everything: http://dev.twitter.com/pages/streaming_api_methods

If I understand what you're looking for, you'll probably want statuses/filter, using the track parameter with whatever distinguishing characteristics (hashtags, words, phrases, locations, users) you're looking for.

Many Twitter API libraries have this built in, but basically you keep an HTTP connection open and Twitter continuously sends you tweets as they happen. See the streaming API overview (http://dev.twitter.com/pages/streaming_api_concepts#connecting) for details on this. If your library doesn't do it for you, you'll have to check for dropped connections and reconnect, check the error codes, etc. - it's all in the overview. But adding tweets as they come in will allow you to eliminate duplicates in the first place (unless you only allow one entry per user - but those are client-side restrictions you'll deal with later).
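The shape of that long-lived connection can be sketched in plain PHP. This is only an illustration of the mechanics, assuming the basic-auth statuses/filter endpoint the linked docs describe; the credentials and track term are placeholders, and real code should use a maintained library and implement the reconnect/backoff rules from the overview:

```php
<?php
// Hold one HTTP connection open to statuses/filter and hand each
// incoming tweet (one JSON object per line) to a callback.
function consumeStream($user, $pass, $track, callable $onTweet)
{
    $fp = fsockopen('ssl://stream.twitter.com', 443, $errno, $errstr, 30);
    if (!$fp) {
        throw new RuntimeException("Connect failed: $errstr ($errno)");
    }
    $body = 'track=' . urlencode($track);
    $auth = base64_encode("$user:$pass");
    fwrite($fp,
        "POST /1/statuses/filter.json HTTP/1.1\r\n" .
        "Host: stream.twitter.com\r\n" .
        "Authorization: Basic $auth\r\n" .
        "Content-Type: application/x-www-form-urlencoded\r\n" .
        "Content-Length: " . strlen($body) . "\r\n\r\n" .
        $body);

    while (!feof($fp)) {            // Twitter keeps this open indefinitely
        $line = fgets($fp);
        if ($line === false) {
            break;
        }
        $tweet = json_decode(trim($line), true);
        // skip HTTP headers, chunk sizes, and keep-alive blank lines
        if (is_array($tweet) && isset($tweet['text'])) {
            $onTweet($tweet);
        }
    }
    // reaching here means the connection dropped:
    // reconnect with the backoff schedule from the overview
}
```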

As far as not hammering your db goes, once you have Twitter just sending you stuff, you're in control on your end - you could easily have your client cache the tweets as they come in, and then write them to the db at given time or count intervals - write whatever it has gathered every 5 minutes, or write once it has 100 tweets, or both (obviously these numbers are just placeholders). That's when you could check for existing usernames if you need to - writing from a cached-up list gives you the best chance to make things as efficient as you want.
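That cache-then-flush idea can be sketched as a small buffer that hands batches to a writer callback once either threshold is hit. The class name and thresholds are made up; the writer would typically do one multi-row INSERT:

```php
<?php
// Buffer tweets in memory; flush to the writer when $maxCount tweets
// have accumulated or $maxAge seconds have passed, whichever is first.
class TweetBuffer
{
    private $writer;      // callable(array $tweets): void
    private $maxCount;
    private $maxAge;      // seconds
    private $tweets = [];
    private $lastFlush;

    public function __construct(callable $writer, $maxCount = 100, $maxAge = 300)
    {
        $this->writer    = $writer;
        $this->maxCount  = $maxCount;
        $this->maxAge    = $maxAge;
        $this->lastFlush = microtime(true);
    }

    public function add(array $tweet)
    {
        $this->tweets[] = $tweet;
        if (count($this->tweets) >= $this->maxCount
            || microtime(true) - $this->lastFlush >= $this->maxAge) {
            $this->flush();
        }
    }

    public function flush()
    {
        if ($this->tweets) {
            call_user_func($this->writer, $this->tweets); // one batched write
        }
        $this->tweets    = [];
        $this->lastFlush = microtime(true);
    }
}
```

With the streaming connection delivering tweets one at a time, each one would just be passed to `add()`, and the db only sees one write per batch.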

Update: My solution above is probably the best way to do it if you want live results (which it seems like you do). But as mentioned in another answer, it may well be possible to just use the Search API to gather entries after the contest is over, and not worry about storing them at all - you can specify pages when you ask for results (as outlined in the Search API link), but there are limits on how many results you can fetch overall, which may cause you to miss some entries. Which solution works best for your application is up to you.
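For completeness, the after-the-fact approach can be sketched as a simple paging loop over the old search.twitter.com JSON endpoint, assuming the `q`/`rpp`/`page` parameters from the Search API docs; the query is a placeholder, and the page cap is exactly the "may miss entries" limit mentioned above:

```php
<?php
// Page through Search API results until a page comes back empty or
// the overall result cap (maxPages * perPage) is reached.
function searchAll($query, $perPage = 100, $maxPages = 15)
{
    $tweets = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        $url = 'http://search.twitter.com/search.json?q=' . urlencode($query)
             . "&rpp=$perPage&page=$page";
        $data = json_decode(file_get_contents($url), true);
        if (empty($data['results'])) {
            break;                       // no more results
        }
        $tweets = array_merge($tweets, $data['results']);
    }
    return $tweets;
}

// Usage (hypothetical hashtag):
// $entries = searchAll('#mycompetition');
```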
