Akka框架支持查找重复消息 [英] Akka framework support for finding duplicate messages

查看:71
本文介绍了Akka框架支持查找重复消息的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用Akka和Scala构建高性能的分布式系统。

I'm trying to build a high-performance distributed system with Akka and Scala.

如果消息要求进行昂贵的(且无副作用)计算到达,并且之前已经要求过完全相同的计算,我想避免再次计算结果。如果先前请求的计算已经完成并且结果可用,我可以对其进行缓存并重新使用。

If a message requesting an expensive (and side-effect-free) computation arrives, and the exact same computation has already been requested before, I want to avoid computing the result again. If the computation requested previously has already completed and the result is available, I can cache it and re-use it.

但是,可以在其中进行重复计算的时间窗口所要求的可能很小。例如出于所有实际目的,我可能同时收到一千或一百万条消息,要求同时进行昂贵的计算。

However, the time window in which duplicate computation can be requested may be arbitrarily small. e.g. I could get a thousand or a million messages requesting the same expensive computation at the same instant for all practical purposes.

有一种名为Gigaspaces的商业产品可以处理这种情况

There is a commercial product called Gigaspaces that supposedly handles this situation.

但是,目前似乎尚无框架支持在Akka中处理重复的工作请求。既然Akka框架已经可以访问通过该框架路由的所有消息,那么看来在这里框架解决方案可能很有意义。

However there seems to be no framework support for dealing with duplicate work requests in Akka at the moment. Given that the Akka framework already has access to all the messages being routed through the framework, it seems that a framework solution could make a lot of sense here.

这是什么我建议Akka框架执行以下操作:
1.创建一个特征以指示将要使用以下缓存方法的消息类型(例如, ExpensiveComputation或类似的消息)。
2.在用户可配置的时间窗口内,巧妙地(散列等)识别(相同或不同)参与者所接收的相同消息。其他选项:选择用于此目的的最大内存缓冲区大小,这取决于(例如,LRU)替换等。Akka还可以选择仅缓存处理成本高昂的消息结果;花费很少时间处理的消息可以根据需要再次进行处理;无需浪费宝贵的缓冲空间来缓存它们及其结果。
3.当识别出相同的消息(在该时间窗口内接收到,可能在同一时刻)时,请避免不必要的重复计算。该框架将自动执行此操作,从本质上讲,重复的消息将永远不会被新的参与者接收以进行处理;它们将无声地消失,并且处理一次的结果(无论该计算是在过去完成还是在进行中)将被发送给所有适当的接收者(如果已经可用则立即发送,如果没有则在完成计算之后)。请注意,即使回复字段不同,消息也应被视为相同,只要它们表示的语义/计算在其他方面相同。还要注意,计算应该纯粹是功能性的,即没有副作用,以使建议的缓存优化有效并且完全不改变程序语义。

Here is what I am proposing for the Akka framework to do: 1. Create a trait to indicate a type of messages (say, "ExpensiveComputation" or something similar) that are to be subject to the following caching approach. 2. Smartly (hashing etc.) identify identical messages received by (the same or different) actors within a user-configurable time window. Other options: select a maximum buffer size of memory to be used for this purpose, subject to (say LRU) replacement etc. Akka can also choose to cache only the results of messages that were expensive to process; the messages that took very little time to process can be re-processed again if needed; no need to waste precious buffer space caching them and their results. 3. When identical messages (received within that time window, possibly "at the same time instant") are identified, avoid unnecessary duplicate computations. The framework would do this automatically, and essentially, the duplicate messages would never get received by a new actor for processing; they would silently vanish and the result from processing it once (whether that computation was already done in the past, or ongoing right then) would get sent to all appropriate recipients (immediately if already available, and upon completion of the computation if not). Note that messages should be considered identical even if the "reply" fields are different, as long as the semantics/computations they represent are identical in every other respect. Also note that the computation should be purely functional, i.e. free from side-effects, for the caching optimization suggested to work and not change the program semantics at all.

我建议与Akka的处理方式不兼容,并且/或者如果您看到一些强烈的理由来说明这是一个非常糟糕的主意,请告诉我。

If what I am suggesting is not compatible with the Akka way of doing things, and/or if you see some strong reasons why this is a very bad idea, please let me know.

谢谢,
很棒,斯卡拉(Scala)

Thanks, Is Awesome, Scala

推荐答案

您要问的是不依赖于Akka框架,而这是您构造演员和消息的方式。首先,通过equals / hashCode方法确保您的消息是不变的,并且具有适当定义的身份。案例类免费提供给您,但是如果您在消息中嵌入了actorRefs以便进行回复,则您将必须覆盖identity方法。案例类参数还应具有递归的相同属性(不变且正确的标识)。

What you are asking is not dependent on the Akka framework but rather it's how you architect your actors and messages. First ensuring that your messages are immutable and have an appropriately defined identities via the equals/hashCode methods. Case classes give you both for free however if you have actorRefs embedded in the message for reply purposes you will have to override the identity methods. The case class parameters should also have the same properties recursively (immutable and proper identity).

第二,您需要弄清楚参与者如何处理存储和标识当前/过去计算。最简单的方法是将请求唯一映射到参与者。这样,该参与者(只有该参与者)将能够处理该特定请求。给定一组固定的actor和请求的hashCode,可以轻松完成此操作。 如果参与者是受监督的角色,则奖励积分:主管在其中管理负载平衡/映射并替换失败的参与者(Akka使这部分变得容易)。

Secondly you need to figure out how the actors will handle storing and identifying current/past computations. The easiest is to uniquely map requests to actors. This way that actor and only that actor will ever process that specific request. This can be done easily given a fixed set of actors and the hashCode of the request. Bonus points if the actor set is supervised where the supervisor is managing the load balancing/mapping and replacing failed actors ( Akka makes this part easy ).

最后,参与者本身可以根据您描述的条件来维持响应缓存行为。在actor上下文中,所有内容都是线程安全的,因此,通过您自己想要的任何类型的行为,由请求本身作为键控的LRU缓存(请记住良好的身份属性)都很容易。

Finally the actor itself can maintain a response caching behavior based on the criteria you described. Everything is thread safe in the context of the actor so a LRU cache keyed by the request itself ( good identity properties remember ) is easy with any type of behavior you want.

这篇关于Akka框架支持查找重复消息的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆