How should I read multiple queues, pro rata by user and priority?

Question

I am currently trying to think of ways to replace a MySQL + Cron queuing system with a message queue system (AWS SQS/Beanstalkd/Iron MQ/Redis).

Let's say I have 100 users. These users are able to make API requests to me. Each API request is an SMS which I must send via a single modem which I operate.

Each SMS can have a priority of 1-3.

The problem I am facing is that the single modem is a bottleneck, so I can't simply process the queue in FIFO order: if one user sends 10,000 SMS and I add these to the queue, my other users would not see any SMS go out until all 10,000 for the first user have finished.

Right now, I am using MySQL for the task:

SELECT COUNT(*) AS `count`, `user_id` FROM `queue` GROUP BY `user_id`

This gives me a result like this:

count  | user_id
-------|--------
10000  | 1
1      | 2

I then add the counts together which gives me 10,001 SMS to process.

For each row I then calculate a percentage:

(row_count / total_count) * 100 = percentage

For example:

(10000 / 10001) * 100 = 99.9900009999%
(1 / 10001) * 100 = 0.0099990001%

I know that my modem can handle 140 SMS per second, so if my Cron runs on a 1 minute cycle, I will send 8,400 SMS in a minute.

I use these calculations to give me my selections:

ceil((8400 / 100) * 99.9900009999) = 8,400 for user #1
ceil((8400 / 100) * 0.0099990001) = 1 for user #2
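The allocation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the question, not the author's actual code: the function name `pro_rata_quotas` is hypothetical, and capping each share at the user's backlog is an added assumption (in the MySQL version the LIMIT clause has the same effect).

```python
def pro_rata_quotas(pending, capacity):
    """Split `capacity` sends across users in proportion to their backlog.

    Each share is rounded up, so every user with pending SMS gets at least
    one slot and the total may slightly exceed `capacity`.
    """
    total = sum(pending.values())
    if total == 0:
        return {}
    quotas = {}
    for user_id, count in pending.items():
        if count == 0:
            continue
        # Integer ceiling division avoids floating-point rounding surprises.
        share = (capacity * count + total - 1) // total
        # Assumption: never allocate more than the user actually has queued.
        quotas[user_id] = min(share, count)
    return quotas


quotas = pro_rata_quotas({1: 10000, 2: 1}, capacity=8400)
print(quotas)  # {1: 8400, 2: 1}
```

The two quotas add up to 8,401, which matches the figure pushed to the modem in the example.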

So in this case, I do a simple MySQL select for each user with a LIMIT, ordering by priority ASC, to give me any priority 1s first, and any priority 3s last.
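A rough sketch of that per-user select follows. It uses an in-memory SQLite database instead of MySQL so that it runs on its own, and the `id` and `priority` columns are assumptions, since the question only shows `user_id`.

```python
import sqlite3


def select_batch(conn, user_id, limit):
    """Return up to `limit` queued SMS ids for one user, priority 1 first."""
    rows = conn.execute(
        "SELECT id FROM queue WHERE user_id = ? "
        "ORDER BY priority ASC, id ASC LIMIT ?",
        (user_id, limit),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, user_id INTEGER, priority INTEGER)"
)
conn.executemany(
    "INSERT INTO queue (id, user_id, priority) VALUES (?, ?, ?)",
    [(1, 1, 3), (2, 1, 1), (3, 1, 2), (4, 2, 1)],
)
print(select_batch(conn, user_id=1, limit=2))  # [2, 3]
```

Ordering by `id` as a tie-breaker keeps messages of equal priority in insertion order, which is a choice made here rather than something the question specifies.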

It doesn't matter if we push more than 8,400 to the modem, because the excess will simply queue on the modem. However, the modem doesn't guarantee FIFO, so we need to stay as close to 8,400 per minute as possible. In this case we push 8,401 to the modem.

This is much better, because rather than sending all 10,000 for user 1 first, we only send 8,400 and also get user 2's SMS out even though they only have 1 SMS. It's still weighted towards whoever has the most SMS to process, and it keeps in line with the modem throughput too.

Given the fact that I need priorities, I am currently looking at Beanstalkd as my only option.

I figured I could create a queue for each user, and when API requests come in, add the SMS to the user queue along with the priority.

I would then have one worker, which does a count on each queue (some user queues may be empty so I wouldn't want a worker for each user constantly running).

Once the single worker has the queue count for each user, it will start to read each queue up to the maximum number I specify for each user and push to the modem in order.

So in this case, it will read 8,400 SMS for user #1 and 1 SMS for user #2 in that order.

To get SMS to the modem, I have to use HTTP. If I get a 200 OK, I can delete the job. If I get an Error 500, I will not delete the job, so it will be picked up again. For anything else, I will throw an exception and bury the job in Beanstalkd for inspection by a human.
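The response handling described above boils down to a small decision table. The sketch below is library-agnostic: `Action` and `action_for_status` are hypothetical names, and the caller would translate each action into whatever delete, release or bury call its Beanstalkd client provides.

```python
from enum import Enum


class Action(Enum):
    DELETE = "delete"  # sent successfully, remove the job
    RETRY = "retry"    # leave the job so it is picked up again
    BURY = "bury"      # park the job for a human to inspect


def action_for_status(status_code):
    """Map the modem's HTTP status code to what should happen to the job."""
    if status_code == 200:
        return Action.DELETE
    if status_code == 500:
        return Action.RETRY
    # Anything else is unexpected and needs manual inspection.
    return Action.BURY


print(action_for_status(200).value)  # delete
print(action_for_status(500).value)  # retry
print(action_for_status(404).value)  # bury
```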

My concern here is that, because I am using HTTP, this is a bottleneck in itself. Ideally I want to perform 8,400 HTTP requests in 1 minute using cURL (140/sec). I am aware that I can use the curl_multi_* functions to perform, say, 10 HTTP requests concurrently to speed this up, but are there any other options to speed things up further?

The main issue is that this is blocking: one user's SMS will go out before all of the other users' SMS. Here we will process 8,400 SMS for user #1, followed by 1 SMS for user #2.
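One possible way to soften this ordering problem, offered as a sketch rather than anything proposed in the question, is to interleave the per-user batches round-robin before pushing them to the modem. The function name `interleave` and the sample message labels are made up for illustration.

```python
from itertools import zip_longest


def interleave(batches):
    """Round-robin across per-user batches, skipping exhausted ones."""
    missing = object()  # sentinel that cannot collide with a real message
    ordered = []
    for round_ in zip_longest(*batches, fillvalue=missing):
        ordered.extend(item for item in round_ if item is not missing)
    return ordered


print(interleave([["u1-a", "u1-b", "u1-c"], ["u2-a"]]))
# ['u1-a', 'u2-a', 'u1-b', 'u1-c']
```

With this ordering user #2's single SMS goes out second instead of last, while the per-user totals stay exactly as calculated.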

For example, should I think about spawning a worker for each user once I have their total count of messages to process? If I did this, we would process SMS for user #1 and user #2 concurrently. With this option, though, I worry that I cannot control the overall number of HTTP requests going to the modem, and I do not want to overload it. What happens if I have 100 child workers all doing 10 HTTP requests concurrently to the modem?
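One way to keep the total number of in-flight requests bounded, sketched here under assumptions and not taken from the question, is to route every user's messages through a single shared pool with a fixed size instead of one worker per user. `MAX_IN_FLIGHT`, `fake_send` and `send_all` are hypothetical names, and `fake_send` merely sleeps where the real HTTP call to the modem would go.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 10  # assumed cap on concurrent requests to the modem

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0


def fake_send(sms):
    """Stand-in for the HTTP call; records how many sends overlap."""
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    time.sleep(0.001)  # simulate network latency
    with _lock:
        _in_flight -= 1
    return 200


def send_all(messages):
    # One shared pool for every user: at most MAX_IN_FLIGHT sends run at once.
    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
        return list(pool.map(fake_send, messages))


statuses = send_all([f"sms-{i}" for i in range(200)])
print(len(statuses), peak_in_flight <= MAX_IN_FLIGHT)  # 200 True
```

Because the cap lives in the pool rather than in each worker, adding more users does not increase the load on the modem.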

These workers would have to be child processes that close once finished. The parent process would need to know about this to then perform another calculation and spawn new child workers.

If anyone has any suggestions on how to handle this scenario of multiple queues with a single queue at the other end (the modem), that would be most helpful.

Recommended answer

My first thought is to use Beanstalkd priorities, and split the messages into groups, each with a different priority.

  • User 1 wants to send 10,000 msgs.
  • User 2 wants to send 101 msgs.

Messages 1-100 of user 1 are put into the queue at priority 1.

The first 100 messages of each user are sent first (which ones actually leave the gate first depends on when they were put into the queue). With no delay involved (e.g., send after 90 seconds), the messages/jobs closest to priority 0 get sent first.

To make sure that some of every user's messages are sent on every round, I'd limit the max priority that you set to the number of customers that you have. That way your biggest customer doesn't end up with a priority of 1,000,000 or more, which would mean that all the rest of their messages had to wait until everyone else had finished. Just restart the priority back at one.
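The banding scheme in this answer might be sketched as follows. The group size of 100 comes from the answer and the customer count of 100 from the question; the function name `band_priority` and the 0-based message index are assumptions made for the example.

```python
def band_priority(message_index, group_size, max_priority):
    """Map a user's Nth message (0-based) to a priority in 1..max_priority.

    Each block of `group_size` messages gets the next priority, and the
    priority wraps back to 1 once it would exceed `max_priority`.
    """
    band = message_index // group_size
    return band % max_priority + 1


# 100 customers, groups of 100 messages.
print(band_priority(0, 100, 100))      # 1
print(band_priority(99, 100, 100))     # 1
print(band_priority(100, 100, 100))    # 2
print(band_priority(9999, 100, 100))   # 100
print(band_priority(10000, 100, 100))  # 1
```

The value returned would be passed as the job priority when each message is put into Beanstalkd, where lower numbers are served first.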
