Twitter competition ~ saving tweets (PHP & MySQL)

Question

I am creating an application to help our team manage a Twitter competition. So far I've managed to interact with the API fine and return the set of tweets I need.

I'm struggling to decide on the best way to handle the storage of the tweets in the database, how often to check for them and how to ensure there are no overlaps or gaps.

You can get a maximum of 100 tweets per page. My current idea is to run a cron script, say once every 5 minutes or so, that grabs a full 100 tweets at a time and loops through them, looking each one up in the db before adding it.

This has the obvious drawback of running 100 queries against the db every 5 minutes, plus however many INSERTs there are, which I really don't like. I would also much rather have something closer to real time: as Twitter is a live service, it stands to reason that we should update our list of entrants as soon as they enter.
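
To make the cost concrete, here is a minimal sketch of how the cron job's per-run check could use one set-based query instead of one lookup per tweet. It assumes Python's sqlite3 as a stand-in for MySQL and a hypothetical entries table keyed by tweet id; the function name store_new_tweets is also hypothetical.

```python
import sqlite3

# Hypothetical schema: entries(tweet_id INTEGER PRIMARY KEY, username TEXT).
# SQLite stands in for MySQL in this sketch.
def store_new_tweets(conn, tweets):
    """Insert only tweets whose ids are not stored yet; return how many were added."""
    if not tweets:
        return 0
    ids = [t["id"] for t in tweets]
    placeholders = ",".join("?" * len(ids))
    # One IN (...) lookup replaces a separate SELECT per tweet.
    rows = conn.execute(
        f"SELECT tweet_id FROM entries WHERE tweet_id IN ({placeholders})", ids
    ).fetchall()
    existing = {row[0] for row in rows}
    new_rows = []
    for t in tweets:
        if t["id"] not in existing:
            existing.add(t["id"])  # also skips repeats within this batch
            new_rows.append((t["id"], t["username"]))
    # One batched INSERT for everything that is new.
    conn.executemany("INSERT INTO entries (tweet_id, username) VALUES (?, ?)", new_rows)
    conn.commit()
    return len(new_rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (tweet_id INTEGER PRIMARY KEY, username TEXT)")
page = [{"id": 1, "username": "alice"}, {"id": 2, "username": "bob"}]
print(store_new_tweets(conn, page))  # prints 2
print(store_new_tweets(conn, page + [{"id": 3, "username": "carol"}]))  # prints 1
```

The same pattern carries over to MySQL with a WHERE tweet_id IN (...) query; keeping a PRIMARY KEY or UNIQUE constraint on the id column is still worthwhile as a safety net.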

Getting closer to real time would also mean polling Twitter repeatedly, which may be necessary, but I'm not sure I want to hammer their API like that.

Does anyone have ideas for an elegant solution? I need to capture all the tweets without leaving anyone out, while keeping each user unique in the db. I have considered just adding everything and then grouping the resulting table by username, but it's not tidy.

I'm happy to deal with the display side of things separately, as that's just pulling from MySQL and displaying. But the backend design is giving me a headache, as I can't see an efficient way to keep it ticking over without hammering either the API or the db.

Answer

Twitter offers a Streaming API, which is probably what you want if you need to be sure you capture everything: http://dev.twitter.com/pages/streaming_api_methods

If I understand what you're looking for, you'll probably want statuses/filter, using the track parameter with whatever keywords, hashtags, or phrases distinguish the entries (users and locations are matched through the separate follow and locations parameters).
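
As a rough illustration, this sketch builds the form-encoded POST body for statuses/filter. The endpoint URL is an assumption based on the v1.1-era API and may no longer be available; OAuth signing and the HTTP request are left out, and the helper name build_filter_body is hypothetical.

```python
from urllib.parse import urlencode

# Assumed historical endpoint; requests were POSTs signed with OAuth (not shown).
FILTER_URL = "https://stream.twitter.com/1.1/statuses/filter.json"

def build_filter_body(track=(), follow=(), locations=()):
    """Build the POST body for statuses/filter (hypothetical helper).

    track: keywords, hashtags or phrases; commas between entries mean OR,
           spaces inside one phrase mean AND.
    follow: numeric user IDs.
    locations: bounding boxes as (sw_lon, sw_lat, ne_lon, ne_lat) tuples.
    """
    params = {}
    if track:
        params["track"] = ",".join(track)
    if follow:
        params["follow"] = ",".join(str(uid) for uid in follow)
    if locations:
        params["locations"] = ",".join(str(v) for box in locations for v in box)
    if not params:
        raise ValueError("statuses/filter needs at least one of track, follow, locations")
    return urlencode(params)

print(build_filter_body(track=["#ourcontest", "our contest"]))
# prints track=%23ourcontest%2Cour+contest
```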

Many Twitter API libraries have this built in, but essentially you keep an HTTP connection open and Twitter continuously sends you tweets as they happen; see the Streaming API overview for details. If your library doesn't do it for you, you'll have to detect dropped connections and reconnect, check the error codes, and so on; it's all in the overview. Adding tweets as they come in also lets you avoid duplicates in the first place (unless you only allow one entry per user, but that's a restriction you'll enforce on your side later).
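
If your library doesn't handle reconnection, the loop might look roughly like this sketch. It assumes the stream delivers one JSON object per line with blank keep-alive lines. open_stream and handle_tweet are hypothetical callables you would supply (open_stream would make the authenticated request, check the HTTP status, and raise OSError on failure), and the backoff values are placeholders rather than Twitter's documented ones.

```python
import json
import time

def consume_stream(open_stream, handle_tweet, max_retries=5,
                   base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Read newline-delimited JSON from a long-lived connection, reconnecting with backoff."""
    delay = base_delay
    failures = 0
    while True:
        try:
            for raw in open_stream():
                line = raw.strip()
                if not line:
                    continue  # blank lines are keep-alives; nothing to store
                handle_tweet(json.loads(line))
                failures, delay = 0, base_delay  # data is flowing again: reset backoff
            return  # connection closed without an error; this sketch simply stops
        except OSError:  # ConnectionError and socket errors are subclasses of OSError
            failures += 1
            if failures > max_retries:
                raise  # give up after too many consecutive failures
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Passing sleep in as a parameter is only there so the loop can be exercised without real waiting.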

As far as not hammering your DB goes, once Twitter is just sending you data, you're in control on your end: your client can cache tweets as they come in and write them to the db at fixed time or count intervals, for example writing whatever it has gathered every 5 minutes, or writing once it has 100 tweets, or both (these numbers are just placeholders). That's also when you can check for existing usernames if you need to; writing a cached batch gives you the best chance to make this efficient however you choose.
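
The cache-then-write idea could be sketched as below. Assumptions: SQLite stands in for MySQL, the entrants table and its columns are hypothetical, and one-entry-per-user is delegated to a UNIQUE constraint plus INSERT OR IGNORE (MySQL's equivalent is INSERT IGNORE) instead of a lookup per tweet.

```python
import sqlite3
import time

# Hypothetical schema: entrants(username TEXT UNIQUE, tweet_id INTEGER).
class TweetBuffer:
    """Collect tweets in memory and write them in batches by count or age."""

    def __init__(self, conn, max_items=100, max_age=300.0, clock=time.monotonic):
        self.conn = conn
        self.max_items = max_items  # flush once this many tweets are pending
        self.max_age = max_age      # or once this many seconds have passed
        self.clock = clock
        self.pending = []
        self.last_flush = clock()

    def add(self, tweet):
        self.pending.append((tweet["username"], tweet["id"]))
        # The age check only runs when a tweet arrives; a real client would
        # likely also call flush() from a timer to cover quiet periods.
        if (len(self.pending) >= self.max_items
                or self.clock() - self.last_flush >= self.max_age):
            self.flush()

    def flush(self):
        if self.pending:
            # INSERT OR IGNORE skips rows that would break the UNIQUE username
            # constraint, so each user is stored once (first entry wins).
            self.conn.executemany(
                "INSERT OR IGNORE INTO entrants (username, tweet_id) VALUES (?, ?)",
                self.pending,
            )
            self.conn.commit()
            self.pending = []
        self.last_flush = self.clock()
```

With this design the database, rather than a SELECT per tweet, enforces uniqueness, and each flush costs one batched statement no matter how many tweets were cached.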

Update: My solution above is probably the best way to do it if you want live results (which it seems you do). But as another answer mentions, it may well be possible to just use the Search API to gather entries after the contest is over, without storing them at all. You can specify pages when requesting results (as outlined in the Search API documentation), but there is a limit on how many results you can fetch in total, which may cause you to miss some entries. Which solution works best for your application is up to you.
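
If you take the Search API route, the paging loop might look like this sketch. fetch_page is a hypothetical stand-in for one Search API request for a given page, and max_pages stands for whatever overall limit the API imposes; the truncated flag exists because hitting that limit is exactly where entries can be missed.

```python
def collect_search_results(fetch_page, max_pages):
    """Page through search results until an empty page or the page cap.

    fetch_page(page) is hypothetical: one request for a 1-based page number,
    returning a list of results. Returns (results, truncated); truncated is
    True when the cap was reached while pages were still non-empty, meaning
    some entries may have been missed.
    """
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            return results, False  # ran out of results before the cap
        results.extend(batch)
    return results, True
```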
