What's the correct approach for a Twitter Application on Google App Engine?

Question

I am trying to develop a Twitter app on Google App Engine. The app collects all tweets from a Twitter user, his/her followers, their followers, and so on. It typically collects 500 tweets per user per run and then inserts that user's data into the database.

The tweet collection process has to run every hour. Currently, I am using cron jobs for this, but they produce a lot of Deadline exceeded errors, even for one user, which is not a good sign. I am using Python. What should I use for this? I have searched the web and learned that task queues can be used along with cron, but I have no idea how to do that. I would be very thankful if someone could help me with it. Is there any other approach I could use?

Recommended answer

To avoid DeadlineExceededErrors, use multiple deferred tasks on a push Task Queue. With Task Queues, it's easier to break the work up into smaller units, which keeps any individual task from exceeding the 10-minute threshold allocated to Task Queue requests.

With the Task Queue API, applications can perform work outside of a user request, initiated by a user request. If an app needs to execute some background work, it can use the Task Queue API to organize that work into small, discrete units, called tasks. The app adds tasks to task queues to be executed later.
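A minimal sketch of how the hourly cron request could hand work to a queue instead of doing it itself. The names `collect_tweets`, `hourly_cron_handler`, and `fake_enqueue` are hypothetical, and the in-memory list only stands in for a real queue so the example runs anywhere. On App Engine, passing `deferred.defer` as `enqueue` should give the same shape, assuming the worker is a module-level function.

```python
def collect_tweets(user_id):
    """Hypothetical worker: would fetch and store tweets for one user."""
    return "collected tweets for %s" % user_id


def hourly_cron_handler(user_ids, enqueue):
    """Body of the cron-triggered request: schedule one task per user, then return."""
    for user_id in user_ids:
        enqueue(collect_tweets, user_id)
    return len(user_ids)


# Stand-in queue so the sketch runs outside App Engine (an assumption, not a real API).
pending = []


def fake_enqueue(func, *args):
    pending.append((func, args))


scheduled = hourly_cron_handler(["alice", "bob", "carol"], fake_enqueue)

# Later, the queue would run each task in its own request; here we run them inline.
results = [func(*args) for func, args in pending]
```

The cron request itself stays short because it only schedules tasks; the slow Twitter calls happen in the tasks.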

Deferred tasks are push-queue tasks created through the deferred library, which wraps a function call so that it runs later in a background request instead of in the current one. Here is a short sample of how to create a deferred task:

import logging

from google.appengine.ext import deferred

def do_something_expensive(a, b, c=None):
    logging.info("Fetching Twitter feeds!")
    # Fetch the Twitter data here


# Somewhere else - Pass in parameters needed by the Twitter API
deferred.defer(do_something_expensive, "BobsTwitterParam1", "BobsTwitterParam2", c=True)
deferred.defer(do_something_expensive, "BobsFriendTwitterParam1", "BobsFriendTwitterParam2", c=True)

Your process of fetching data from Twitter users is recursive by nature, since you're fetching data for followers of followers and so forth, and this task as a single process can be quite expensive and would likely exceed the threshold.

A task must finish executing and return an HTTP status code in the 200–299 range within 10 minutes of the original request. This deadline is separate from user requests, which have a 60-second deadline. If your task's execution nears the limit, App Engine raises a DeadlineExceededError (from the module google.appengine.runtime) that you can catch to save your work or log progress before the deadline passes. If the task fails, App Engine retries it based on criteria that you can configure.
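One way to use that exception is to record how far the task got so that a retry can resume from there. The sketch below assumes hypothetical `fetch_one` and `save_checkpoint` callables supplied by the caller, and it defines a local stand-in for `DeadlineExceededError` when the App Engine runtime module is not importable, so that it also runs outside App Engine.

```python
try:
    from google.appengine.runtime import DeadlineExceededError
except ImportError:
    class DeadlineExceededError(Exception):
        """Local stand-in so the sketch runs outside App Engine."""


def process_users(user_ids, start_index, fetch_one, save_checkpoint):
    """Process users from start_index; on a deadline, save the position and stop."""
    index = start_index
    try:
        while index < len(user_ids):
            fetch_one(user_ids[index])
            index += 1
    except DeadlineExceededError:
        save_checkpoint(index)  # first user that still needs processing
        return False
    return True


# Demo with a fake fetch that hits the "deadline" once, on the third user.
processed = []
checkpoints = []


def flaky_fetch(user_id):
    if user_id == "carol" and not checkpoints:
        raise DeadlineExceededError()
    processed.append(user_id)


users = ["alice", "bob", "carol", "dave"]
first_run = process_users(users, 0, flaky_fetch, checkpoints.append)
second_run = process_users(users, checkpoints[-1], flaky_fetch, checkpoints.append)
```

In a real task the checkpoint would have to live somewhere durable, such as the datastore, because the retry runs in a new request.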

However, if you separate each Twitter user into a completely separate task, then each task only runs for as long as it takes to fetch the Twitter results for a single user. Not only is this more efficient, but if there is a problem fetching one user's data, only that task fails while the others continue to execute.
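The per-user split can be sketched as follows. Everything here is illustrative: `FOLLOWERS` is a made-up follower graph, the failure for "carol" is simulated, and the `deque` stands in for a task queue. The `seen` set is also only an in-memory assumption; real tasks run in separate requests, so they would need shared storage to avoid visiting a user twice.

```python
from collections import deque

# Made-up follower graph for illustration only.
FOLLOWERS = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["alice"],
    "dave": [],
}


def collect_user(user_id, depth, max_depth, seen, enqueue):
    """One task: handle a single user, then schedule one task per unseen follower."""
    if user_id == "carol":
        raise RuntimeError("simulated Twitter API failure")
    # Real code would fetch and store this user's tweets here.
    if depth < max_depth:
        for follower in FOLLOWERS.get(user_id, []):
            if follower not in seen:
                seen.add(follower)
                enqueue(follower, depth + 1)


def run_all(root, max_depth):
    """Drain the stand-in queue, isolating each task's failure from the others."""
    queue = deque([(root, 0)])
    seen = {root}
    done, failed = [], []

    def enqueue(user_id, depth):
        queue.append((user_id, depth))

    while queue:
        user_id, depth = queue.popleft()
        try:
            collect_user(user_id, depth, max_depth, seen, enqueue)
            done.append(user_id)
        except RuntimeError:
            failed.append(user_id)  # a real queue would retry this task
    return done, failed


done, failed = run_all("alice", 2)
```

The depth limit keeps the followers-of-followers traversal bounded, and the failed task for one user does not stop the tasks for the rest.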

In other words, don't try to fetch all of the data in a single Task.

Alternatively, if for whatever reason these tasks still exceed the 10-minute threshold, which is unlikely, look into Backends.