Coordinating singleton token between AWS Lambda instances

Question

Consider a chat bot written as an AWS Lambda function. It gets invoked via HTTP requests from a 3rd party service, the chat service into which it is integrated. Now, that 3rd party API is a bit… quirky. The main issue with it is: it requires an authentication token to interact with, but only one token can exist at any one time. The way it works is:

  • Bot creates a request to POST /auth with a bunch of secret information, and gets returned a token in response.
  • Bot then makes a request to POST /messages with Authorization: Bearer <token> to post messages to the chat.
  • Tokens expire after some hours.

The crux is that a POST /auth request resets any previous token and invalidates all other clients which hold tokens. There is no GET /auth or similar to get existing tokens.
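
To make the quirk concrete, here is a small in-memory model of the behaviour described above. `FakeChatApi` and its method names are made up for illustration, and the 401 status for a stale token is an assumption; the real API's responses are not shown in the question.

```python
import itertools


class FakeChatApi:
    """In-memory stand-in for the third-party API (hypothetical names)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._current_token = None  # only one token is valid at any time

    def post_auth(self):
        # Issuing a new token invalidates whatever token existed before.
        self._current_token = f"token-{next(self._counter)}"
        return self._current_token

    def post_message(self, token, text):
        # Assumed status codes: 200 if the token is current, 401 otherwise.
        ok = token is not None and token == self._current_token
        return 200 if ok else 401


api = FakeChatApi()
token_a = api.post_auth()  # bot A authenticates
token_b = api.post_auth()  # bot B authenticates, which resets A's token
print(api.post_message(token_a, "hi from A"))  # 401
print(api.post_message(token_b, "hi from B"))  # 200
```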

This now becomes a coordination problem between concurrent instances of randomly spawned bots. They all need to use the same token, but will generate tokens independently. I'm not terribly keen on introducing some singleton service whose only task would be to coordinate the token; I'd like to keep the independently-scaling Lambda paradigm. I'm using DynamoDB to store tokens to try to coordinate them between bot instances, but still have edge case race conditions which can require up to three retries to get tokens settled.

The worst case is an invalid token stored in DynamoDB and two bots being instantiated simultaneously:

  • both will read the invalid token
  • both will attempt requests which fail
  • both will try to read the token again, since another bot may have refreshed it in the meantime
  • both will discard the token which is known to be bad
  • both will generate a new token and store it, one of them randomly "winning"
  • the "loser" will be doing another bad request and repeat the previous steps

Ideally there would be a lockable resource which one bot can lock and generate a token, and others can wait for. But AFAIK DynamoDB doesn't have that kind of functionality.

What is a good pattern and/or AWS service to coordinate such a singleton token between independent, parallel Lambda instances?

Recommended answer

I'd like to offer a slightly different take on @Jens' answer:

  1. Use DynamoDB to store the current auth token: A DDB table with a single entry can be used as an authoritative source for the auth token. More on how to update the tokens at #3 below.

  2. Local caching for the auth token: While Lambda is supposed to be stateless, it does actually have the ability to do some local caching, so each Lambda instance should follow some logic like this:

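  • if the auth token in the local cache is NULL, read it from DDB and update the local cache
  • if the auth token is non-NULL, attempt the request using that token
  • if the request fails because the token is invalid, attempt to refresh the token (see #3 below)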
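
A minimal sketch of that caching logic, assuming the cache is a module-level variable (which persists only as long as the Lambda execution environment is reused). The three callables are hypothetical hooks, not part of any library: `send` performs the API request, `read_token_from_store` reads the DDB record, and `refresh_token` is the coordinated refresh from #3.

```python
_cached_token = None  # module-level: kept between invocations of a warm instance


def send_with_cached_token(send, read_token_from_store, refresh_token):
    """Send one request, touching the shared store only when needed.

    Hypothetical hooks:
      send(token) -> bool               True if the API accepted the token
      read_token_from_store() -> str    read the shared record (e.g. from DDB)
      refresh_token(bad_token) -> str   coordinated refresh, see #3
    """
    global _cached_token
    if _cached_token is None:
        # Cold cache: read the shared record once and remember it.
        _cached_token = read_token_from_store()
    if send(_cached_token):
        return True
    # The token was rejected: refresh it, then retry the request once.
    _cached_token = refresh_token(_cached_token)
    return send(_cached_token)
```

With this shape, a warm instance holding a valid token makes no DDB reads at all; the store is consulted only on a cold start or after a rejected request.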

  3. Take advantage of conditional writes to refresh tokens: For this, any lambda instance that finds a bad token should attempt to "designate itself" to refresh the token by doing a conditional write on the token record in the DDB table, attempting to claim itself as the self-designated updater

  • if the conditional write succeeds, the lambda continues to make the POST /auth request, gets a new token, and subsequently updates the DDB record and removes itself from the lock;
  • if the conditional write fails, the lambda should back off for some time (maybe 200 ms or so) and then attempt to re-read the record to see if another lambda successfully refreshed the token in the meantime
  • if upon reading the record the lambda determines that the token has been refreshed (i.e. the token value is different from the one it had), it can proceed to use the new token and all is good; if it finds that the record is still locked, it can wait some more and retry
  • you should probably come up with a safe back-off period of time (say 1-2 seconds) after which a lambda should "take over" the responsibility of updating the token, under the assumption that the other instance that had previously locked it has been terminated for whatever reason; this way you don't end up in a deadlock where all instances wait forever
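
The steps above can be sketched roughly as follows. This is not DynamoDB code: `TokenRecord` is a hypothetical in-memory stand-in whose `try_claim` only mimics what a conditional write would enforce. Requiring that the stored token still equals the known-bad one is an extra condition I am assuming, so that an instance cannot claim the lock right after someone else has already refreshed.

```python
import time
import uuid


class TokenRecord:
    """In-memory stand-in for the single DDB item (hypothetical)."""

    def __init__(self, token=None):
        self.token = token
        self.lock_owner = None
        self.locked_at = 0.0

    def try_claim(self, owner, expected_token, now, stale_after):
        # Mimics a conditional write: succeeds only if the token is still the
        # known-bad one AND the record is unlocked or its lock has gone stale.
        if self.token != expected_token:
            return False
        if self.lock_owner is None or now - self.locked_at > stale_after:
            self.lock_owner, self.locked_at = owner, now
            return True
        return False

    def publish(self, owner, token):
        # Store the new token and release the lock, only if we still hold it.
        if self.lock_owner != owner:
            return False
        self.token, self.lock_owner = token, None
        return True


def refresh_token(record, bad_token, post_auth, backoff=0.2, stale_after=2.0,
                  clock=time.time, sleep=time.sleep):
    me = str(uuid.uuid4())  # identifies this refresh attempt
    while True:
        if record.try_claim(me, bad_token, clock(), stale_after):
            new_token = post_auth()       # we are the designated updater
            if record.publish(me, new_token):
                return new_token
            # Our lock was taken over in the meantime: loop and re-read.
        elif record.token != bad_token:
            return record.token           # another instance already refreshed
        else:
            sleep(backoff)                # still locked elsewhere: wait, retry
```

The defaults mirror the figures suggested above (about 200 ms between re-reads, takeover after about 2 seconds). The takeover window should probably be comfortably longer than a POST /auth round trip, otherwise a slow but healthy updater could be pre-empted and two instances would end up re-authenticating.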


This solution has the advantage of reducing the need to read from DDB for each request and not requiring any other services (other than Lambda and DDB).

When performing the conditional write, the lambda can generate a UUID for the operation and store this UUID and a timestamp on the item to indicate when the item was "locked". Subsequent instances will read the UUID and timestamp and determine that some other instance must be working on the item. And if too much time has passed since the item was locked, another instance can "take over".
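
As a rough idea of what such a conditional write could look like, the function below only builds the parameters for a DynamoDB UpdateItem call. The key (`id = "chat-token"`) and the attribute names `auth_token`, `lock_owner` and `locked_at` are placeholders I made up, and the `auth_token = :bad` part of the condition is my own addition; it assumes the item already exists with an `auth_token` attribute.

```python
import time
import uuid


def build_claim_request(bad_token, stale_after_ms=2000, now_ms=None):
    """Build UpdateItem parameters that try to lock the token item."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # integer epoch milliseconds
    lock_id = str(uuid.uuid4())           # UUID identifying this lock attempt
    request = {
        "Key": {"id": "chat-token"},      # hypothetical key of the single item
        "UpdateExpression": "SET lock_owner = :me, locked_at = :now",
        # Claim only if the token is still the bad one and the item is either
        # unlocked or was locked longer ago than the takeover threshold.
        "ConditionExpression": (
            "auth_token = :bad AND "
            "(attribute_not_exists(lock_owner) OR locked_at < :stale)"
        ),
        "ExpressionAttributeValues": {
            ":me": lock_id,
            ":now": now_ms,
            ":bad": bad_token,
            ":stale": now_ms - stale_after_ms,
        },
    }
    return lock_id, request
```

With a boto3 `Table` resource this would be passed along as `table.update_item(**request)`; a failed condition surfaces as a `ClientError` whose error code is `ConditionalCheckFailedException`, which is the signal to back off and re-read. Releasing the lock would then be a second update that sets the new token and removes `lock_owner`, conditioned on `lock_owner` still being the same UUID.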
