Akka for REST polling

Question

I'm trying to interface a large Scala + Akka + PlayMini application with an external REST API. The idea is to periodically poll (basically every 1 to 10 minutes) a root URL and then crawl through sub-level URLs to extract data which is then sent to a message queue.

I have come up with two ways to do this:

1. Create a hierarchy of actors that mirrors the resource path structure of the API. In the Google Latitude case, that would mean, e.g.:

  • Actor 'latitude/v1/currentLocation' polls https://www.googleapis.com/latitude/v1/currentLocation
  • Actor 'latitude/v1/location' polls https://www.googleapis.com/latitude/v1/location
  • Actor 'latitude/v1/location/1' polls https://www.googleapis.com/latitude/v1/location/1
  • Actor 'latitude/v1/location/2' polls https://www.googleapis.com/latitude/v1/location/2
  • Actor 'latitude/v1/location/3' polls https://www.googleapis.com/latitude/v1/location/3
  • etc.

In this case, each actor is responsible for polling its associated resource periodically, as well as creating / deleting child actors for next-level path resources (i.e. actor 'latitude/v1/location' creates actors 1, 2, 3, etc. for all locations it learns about through polling of https://www.googleapis.com/latitude/v1/location).
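The create/delete bookkeeping in this first approach boils down to a set difference between the children an actor already has and the ids its latest poll returned. A minimal, Akka-free sketch, assuming resource ids are plain strings (`ChildReconciler` and `reconcile` are hypothetical names, not part of the original code):

```scala
// Hypothetical helper: the reconciliation step a path-level actor
// (e.g. 'latitude/v1/location') could run after each poll of its resource.
object ChildReconciler {
  /** Returns (toCreate, toStop):
    *  - toCreate: ids returned by this poll that have no child actor yet
    *  - toStop:   ids that have a child actor but were not returned by this poll
    */
  def reconcile(current: Set[String], polled: Set[String]): (Set[String], Set[String]) =
    (polled -- current, current -- polled)
}
```

For example, `reconcile(Set("1", "2"), Set("2", "3"))` yields `(Set("3"), Set("1"))`. Inside the actor, each id in the first set would get a child via `context.actorOf(props, name = id)`, and the child for each id in the second set would be stopped with `context.stop(...)`; since stopping an actor also stops its children, this prunes the whole branch under that resource.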

2. Create a pool of identical polling actors that receive polling requests (containing the resource path) load-balanced by a router, poll the URL once, do some processing, and schedule further polling requests (both for next-level resources and for the polled URL itself). In Google Latitude, that would mean, for instance:

One router and n poller actors. The initial polling request for https://www.googleapis.com/latitude/v1/location leads to several new (immediate) polling requests for https://www.googleapis.com/latitude/v1/location/1, https://www.googleapis.com/latitude/v1/location/2, etc., and one (delayed) polling request for the same resource, i.e. https://www.googleapis.com/latitude/v1/location.
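The per-request logic of this second approach can be expressed as a pure function from one poll result to its follow-up requests. A sketch under stated assumptions: `PollRequest` and `PollPlanner.followUps` are hypothetical, and the caller is assumed to send zero-delay requests straight to the router and hand the delayed one to the scheduler's `scheduleOnce`:

```scala
import scala.concurrent.duration._

// Hypothetical message for the router-based design: a URL to poll and how long
// to wait before sending it to the router.
final case class PollRequest(url: String, delay: FiniteDuration)

object PollPlanner {
  /** After polling `url` and discovering `childUrls`, plan the follow-up work:
    * one immediate request per child resource, plus one request to poll the
    * same URL again after `interval`.
    */
  def followUps(url: String, childUrls: Seq[String], interval: FiniteDuration): Seq[PollRequest] =
    childUrls.map(PollRequest(_, Duration.Zero)) :+ PollRequest(url, interval)
}
```

Because the pollers themselves hold no state, everything needed to continue the crawl travels in the messages, which is what lets any pool member handle any request.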

I have implemented both solutions and can't immediately observe any relevant difference in performance, at least not for the API and polling frequencies I am interested in. I find the first approach somewhat easier to reason about, and perhaps easier to use with system.scheduler.schedule(...) than the second approach (where I need scheduleOnce(...)). Also, assuming resources are nested several levels deep and somewhat short-lived (e.g. several resources may be added/removed between polls), Akka's lifecycle management makes it easy to kill off a whole branch in the first case. The second approach should (theoretically) be faster, and the code is somewhat easier to write.
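The `scheduleOnce(...)` style of the second approach means each poll decides whether to arm the next one. The sketch below shows that pattern using the JDK's `ScheduledExecutorService` as a stand-in rather than Akka's scheduler; `SelfRescheduling.start` and its parameters are hypothetical. Because the next run is scheduled only when the current one finishes, the interval is measured from the end of each poll, and a resource that has disappeared is simply not rescheduled, with no cancellation handle to keep:

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

// JDK analogue (not Akka's API) of the scheduleOnce pattern: every run polls
// once and then decides whether to schedule the next run.
object SelfRescheduling {
  /** Polls `url` immediately, then again `intervalMs` milliseconds after each
    * poll completes, for as long as `poll` returns true (e.g. "resource still exists").
    */
  def start(exec: ScheduledExecutorService, url: String, intervalMs: Long)(poll: String => Boolean): Unit = {
    val task = new Runnable {
      def run(): Unit =
        // Re-arm only if the resource is still there; otherwise polling just ends.
        if (poll(url)) exec.schedule(this, intervalMs, TimeUnit.MILLISECONDS)
    }
    exec.execute(task)
  }
}
```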

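My questions are:

  1. Which approach seems to be the best (in terms of performance, scalability, code complexity, etc.)?
  2. Do you see anything wrong with the design of either approach (particularly the first one)?
  3. Has anyone tried to implement something similar? How did it go?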

Thanks!

Recommended answer

Why not create a master poller, which then kicks off asynchronous resource requests on a schedule?

I'm no expert using Akka, but I gave this a shot:

The poller object that iterates through the list of resources to fetch:

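import akka.util.duration._
import akka.actor._
import play.api.Play.current
import play.api.libs.concurrent.Akka

object Poller {
  // Single "master" poller: every scheduled tick for a resource arrives here
  // and is handed to a fresh, short-lived ActingSpider.
  val poller = Akka.system.actorOf(Props(new Actor {
    def receive = {
      // Spiders are left unnamed: a name derived from the resource would clash
      // (InvalidActorNameException) while the previous spider for it is still alive.
      case x: String => context.actorOf(Props[ActingSpider]) ! x
    }
  }))

  // Poll every resource every 3 seconds (after an initial 3-second delay);
  // keep the returned Cancellables to stop polling later.
  def start(l: List[String]): List[Cancellable] =
    l.map(Akka.system.scheduler.schedule(3 seconds, 3 seconds, poller, _))

  def stop(c: Cancellable): Unit = c.cancel()
}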

The actor that reads the resource asynchronously and triggers more asynchronous reads. If it would be kinder to the server, you could put the message dispatch on a schedule rather than sending immediately:

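import akka.actor.{Props, Actor}
import java.io.File
import scala.io.Source

// Local files stand in for REST resources here: each line of a file is the
// name of a further resource to crawl.
class ActingSpider extends Actor {
  def receive = {
    case name: String =>
      println("reading " + name)
      new File(name) match {
        case f if f.exists() => spider(f)
        case _ => println("File not found")
      }
      // One-shot spider: stop once this resource has been processed.
      context.stop(self)
  }

  def spider(file: File): Unit = {
    val source = Source.fromFile(file)
    try {
      source.getLines().foreach { l =>
        // Spawn sub-spiders at the top level rather than as children: stopping
        // this actor also stops its children, possibly before they run.
        context.system.actorOf(Props[ActingSpider]) ! l
      }
    } finally source.close()
  }
}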
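For the "kinder" variant mentioned above, the spider could give each discovered line a growing delay instead of messaging all new spiders at once. A small, Akka-free helper, assuming a fixed gap between requests (`Stagger.delays` is a hypothetical name):

```scala
import scala.concurrent.duration._

// Hypothetical helper: spread requests to the remote API out over time
// instead of firing them in a burst.
object Stagger {
  /** Pairs each resource with a delay of 0, gap, 2 * gap, ... in order. */
  def delays(resources: Seq[String], gap: FiniteDuration): Seq[(String, FiniteDuration)] =
    resources.zipWithIndex.map { case (r, i) => (r, gap * i.toLong) }
}
```

Each `(resource, delay)` pair would then be dispatched with the scheduler's one-shot `scheduleOnce` instead of `!`.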