I've been thinking about strategies to balance queue processing across multiple system tenants. The goal is to prevent one tenant that's pushing too many jobs from delaying job processing of other tenants.
During my research I came across this post by Mike Perham (the creator of Sidekiq) in which he uses a technique called shuffle sharding: each bulk of jobs is sent to a random queue, and at least 2 workers process jobs from each queue.
That way, if one tenant dispatches a bulk of jobs, they will only fill one random queue while the rest of the queues are normally populated with other tenants' jobs. This technique solves the problem of bulk jobs being sent in a single request or process. However, if the tenant is so busy that it's dispatching a lot of jobs from different requests or processes, they can still fill all the queues, because each dispatch operation picks its own random queue.
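To make the idea concrete, here's a minimal in-memory sketch of the dispatch side. It's written in Python only to stay self-contained, it isn't the code from Mike's post, and the queue names and the `dispatch_bulk` helper are made up for illustration.

```python
import random
from collections import defaultdict

# Hypothetical queue names; a real setup would use whatever queues the workers consume.
QUEUE_NAMES = ["bulk-0", "bulk-1", "bulk-2", "bulk-3"]


def dispatch_bulk(queues, queue_names, tenant, jobs, rng):
    """Push every job of ONE bulk dispatch onto a single randomly chosen queue."""
    target = rng.choice(queue_names)
    for job in jobs:
        queues[target].append((tenant, job))
    return target


if __name__ == "__main__":
    rng = random.Random()
    queues = defaultdict(list)

    # A single bulk dispatch only ever lands in one queue.
    print("one bulk ->", dispatch_bulk(queues, QUEUE_NAMES, "tenant-a", range(100), rng))

    # A busy tenant dispatching from many separate requests picks a new
    # random queue every time, so it tends to spread over all of them.
    touched = {
        dispatch_bulk(queues, QUEUE_NAMES, "tenant-b", range(10), rng)
        for _ in range(50)
    }
    print("many dispatches ->", sorted(touched))
```

The second half of the demo shows the weakness described above: the random pick protects against one big bulk, not against many independent dispatches.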
I wanted to explore the possible options for dealing with this problem and published a video where I discuss 3 possible approaches:
In the video I cover an approach where we use Redis::throttle to control the number of jobs a tenant can run in a minute, and release jobs back to the queue if the tenant has hit the limit.
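The logic behind this approach can be sketched without Redis. The snippet below is an in-memory stand-in, not how Redis::throttle is implemented: it assumes a simple fixed window per tenant, and `TenantThrottle` and `work` are hypothetical names.

```python
import time
from collections import deque


class TenantThrottle:
    """Fixed-window limiter: at most `limit` jobs per tenant per `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.windows = {}  # tenant -> (window_start, jobs_run_in_window)

    def try_acquire(self, tenant):
        now = self.clock()
        start, count = self.windows.get(tenant, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the previous window expired, start a new one
        if count >= self.limit:
            self.windows[tenant] = (start, count)
            return False
        self.windows[tenant] = (start, count + 1)
        return True


def work(queue, throttle, handle):
    """Make one pass over the queue; throttled jobs are released to the back."""
    released = deque()
    while queue:
        tenant, job = queue.popleft()
        if throttle.try_acquire(tenant):
            handle(tenant, job)
        else:
            released.append((tenant, job))
    queue.extend(released)
```

A tenant that exceeds its limit keeps its jobs in the queue, but those jobs no longer occupy a worker while other tenants are waiting.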
Another approach is dispatching jobs to a random queue and configuring each worker with a random queue priority.
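As a rough illustration of the worker side, each worker could shuffle the queue list once at boot and then always drain queues in that order. This is only a sketch with made-up helper names; in a real setup the shuffled order would be handed to the worker as its queue list.

```python
import random
from collections import deque


def random_priority(queue_names, rng):
    """Return the queue names in a random order; earlier names are drained first."""
    order = list(queue_names)
    rng.shuffle(order)
    return order


def pop_by_priority(order, queues):
    """Pop from the first non-empty queue in this worker's priority order."""
    for name in order:
        if queues[name]:
            return name, queues[name].popleft()
    return None  # every queue is empty
```

Because every worker ends up with a different order, a queue that one tenant has flooded is the top priority for only some of the workers.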
The third approach is giving each tenant their own separate queue and configuring the workers to pick a random tenant queue on each worker loop.
That last approach seems pretty decent, since jobs of each tenant are isolated in a separate queue and if one tenant is pushing a ton of jobs, they won't delay processing jobs for other tenants.
The downside of that approach is that the worker may check multiple empty queues before it finds a queue with jobs to run. Also, since picking queues is dynamic, Horizon won't be able to auto-balance the worker pool and you'll have to configure a set number of workers to be always running.
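Here's a small sketch of what one worker loop iteration could look like under this approach. It assumes the worker visits the tenant queues in a freshly shuffled order and stops at the first one that has a job; `next_job` is a hypothetical helper, and the `checked` counter is only there to make the empty-queue cost visible.

```python
import random
from collections import deque


def next_job(tenant_queues, rng):
    """Visit tenant queues in random order and pop from the first non-empty one.

    Returns (tenant, job, checked), where `checked` is how many queues were
    inspected, including empty ones. tenant and job are None if all are empty.
    """
    tenants = list(tenant_queues)
    rng.shuffle(tenants)
    checked = 0
    for tenant in tenants:
        checked += 1
        if tenant_queues[tenant]:
            return tenant, tenant_queues[tenant].popleft(), checked
    return None, None, checked
```

With one tenant holding thousands of jobs and another holding a single job, the small tenant's job gets picked as soon as its queue comes up before the busy one in the shuffle, instead of waiting behind the whole backlog.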
Finally, another approach that wasn't covered in the video is configuring Horizon to auto-balance a pool of workers that consume jobs from multiple queues, while always pushing each tenant's jobs to the same fixed queue. That way, if a tenant pushes a lot of jobs, they'll only fill a single queue and Horizon will start new workers to process jobs from this busy queue.
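One way to get a fixed queue per tenant is to derive the queue name from a stable hash of the tenant ID. The sketch below assumes a pool of numbered queues; the `tenant-pool` prefix and the `queue_for` helper are invented for the example, and the Horizon side is configuration that isn't shown here.

```python
import zlib


def queue_for(tenant_id, queue_count, prefix="tenant-pool"):
    """Map a tenant to the same queue every time using a stable CRC32 hash.

    zlib.crc32 is used instead of Python's built-in hash(), which is
    randomized per process for strings and would break the fixed mapping.
    """
    index = zlib.crc32(str(tenant_id).encode("utf-8")) % queue_count
    return f"{prefix}-{index}"


if __name__ == "__main__":
    for tenant in ("acme", "globex", 42):
        print(tenant, "->", queue_for(tenant, queue_count=4))
```

Several tenants can land on the same queue with this mapping, so a busy tenant can still delay the neighbours that share its queue. The trade-off is a small, fixed set of queues to balance.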
Thoughts
I'd love to hear your thoughts. Let me know in the comments section on the video or on Twitter.