Introducing exq-scheduler, a distributed scheduler for Sidekiq

One of our customers wanted to set up a time-based job scheduling system, similar to cron, to reliably schedule critical tasks. They were already using Sidekiq to run some tasks in the background.

To improve fault tolerance, we tried running sidekiq-scheduler (a popular scheduling library) in a distributed setup. However, we found synchronization issues [1] that could lead to jobs being scheduled multiple times in some scenarios.

In our effort to fix these issues, we built exq-scheduler.

Along the way, we had to understand some finer aspects of distributed systems, in particular how processes synchronize through shared memory. We plan to write about what we learned in subsequent notes.

About Sidekiq and Exq

Sidekiq is a job processing library. It has two components, a client and a worker, and it uses a Redis LIST as storage. A job instruction is a JSON object that complies with a schema. A Sidekiq client creates a job instruction in that schema and pushes it to a queue (a Redis LIST). A Sidekiq worker listening on the queue receives the instruction and performs the corresponding task.
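
As a rough illustration of the client side, the sketch below builds a job instruction and lists the Redis commands a client would send. The module name `JobSketch` and its functions are hypothetical. The field names and the `queue:<name>` / `queues` keys follow Sidekiq's conventions as we understand them, without any Redis namespace prefix. JSON encoding is left to a library such as Jason and is not shown.

```elixir
defmodule JobSketch do
  # Hypothetical helper: builds a job instruction as a map with string keys.
  # A real client would JSON-encode this map before pushing it to Redis.
  def build(class, args, queue \\ "default") do
    # Epoch seconds as a float
    now = System.os_time(:millisecond) / 1000

    %{
      "class" => class,
      "args" => args,
      "queue" => queue,
      # 12 random bytes rendered as 24 lowercase hex characters
      "jid" => Base.encode16(:crypto.strong_rand_bytes(12), case: :lower),
      "created_at" => now,
      "enqueued_at" => now,
      "retry" => true
    }
  end

  # Redis commands, expressed as plain lists, that enqueue an encoded job.
  # `json` is assumed to be the JSON string of the map returned by build/3.
  def enqueue_commands(queue, json) do
    [
      ["SADD", "queues", queue],
      ["LPUSH", "queue:" <> queue, json]
    ]
  end
end

job = JobSketch.build("SignUpReportWorker", ["arg1", "arg2"])
IO.inspect(job)
```

A worker would typically block on the other end of the same list (for example with `BRPOP`), decode the JSON, and dispatch on the `class` field.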

The client and the worker can be written in any language, as long as they use the same schema as Sidekiq. Exq is an Elixir library that complies with the same storage schema. In other words, you could use a Sidekiq client with an Exq worker, or vice versa, or even use Exq for both client and worker.

exq-scheduler is a Sidekiq client that enqueues job instructions according to a time schedule.

These schedules can be configured using cron syntax, as follows:

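# Fragment of config/config.exs
config :exq_scheduler, :schedules,
  signup_report: %{
    # Five-field cron expression: minute 0 of every hour
    cron: "0 * * * *",
    # Worker class name placed in the job instruction
    class: "SignUpReportWorker",
    # Arguments passed to the worker
    args: ["arg1", "arg2"],
    # Queue the job instruction is pushed to
    queue: "default"
  }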
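
For context, here is a minimal sketch of a worker that could handle the job enqueued by this schedule, assuming it runs under Exq. Exq workers are plain modules with a `perform` function whose arity matches the job's `args`. The dispatch code at the bottom is only a simplified stand-in for what a worker process does with a decoded job, not Exq's actual internals. With a Ruby Sidekiq worker, `SignUpReportWorker` would be a Ruby class instead.

```elixir
defmodule SignUpReportWorker do
  # The job's "args" list arrives as positional arguments.
  def perform(arg1, arg2) do
    IO.puts("building sign-up report with #{arg1} and #{arg2}")
    :ok
  end
end

# Simplified illustration of dispatching a decoded job instruction.
job = %{"class" => "SignUpReportWorker", "args" => ["arg1", "arg2"]}

# Resolve the class name to a module, then call perform/2 with the args.
module = Module.concat([job["class"]])
result = apply(module, :perform, job["args"])
IO.inspect(result)
```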

Features

Here are some of exq-scheduler’s salient features. We might expand on implementation details in future posts.

Stability

exq-scheduler is currently used in production by one of our customers. The library is backed by tests that introduce various faults and verify that invariants are always maintained.
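
One invariant of this kind could be that a given tick of a schedule is enqueued at most once, however many scheduler nodes are running. The sketch below shows one way to get there, assuming an hourly schedule: every node derives the same key from the schedule name and the tick time, and an atomic set-if-absent decides who enqueues. This is an illustration, not exq-scheduler's actual implementation. `DedupSketch` is a hypothetical module, and the `Agent` stands in for a shared store such as Redis.

```elixir
defmodule DedupSketch do
  @hour 3600

  # Most recent tick of an hourly schedule ("0 * * * *") at or before now_unix.
  def last_tick(now_unix), do: div(now_unix, @hour) * @hour

  # Deterministic key: depends only on the schedule name and the tick,
  # never on the exact moment a node happens to wake up.
  def key(schedule, now_unix), do: "#{schedule}:#{last_tick(now_unix)}"

  # Stand-in for an atomic set-if-absent (for example Redis SET with NX).
  def claim(store, key) do
    Agent.get_and_update(store, fn seen ->
      if MapSet.member?(seen, key) do
        {:duplicate, seen}
      else
        {:enqueue, MapSet.put(seen, key)}
      end
    end)
  end
end

{:ok, store} = Agent.start_link(fn -> MapSet.new() end)

# Two nodes notice the same tick with slightly different local clocks.
node_a = DedupSketch.claim(store, DedupSketch.key("signup_report", 7_200))
node_b = DedupSketch.claim(store, DedupSketch.key("signup_report", 7_203))
IO.inspect({node_a, node_b})
# prints {:enqueue, :duplicate}
```

If the key included the node's own wake-up time instead of the tick, the two nodes would build different keys and both would enqueue the job.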

  1. The issues were caused by a lack of determinism when building the unique key for scheduled tasks, and by unhandled edge cases such as the scheduler failing after it has acquired a lock. https://github.com/moove-it/sidekiq-scheduler/issues/156, https://github.com/moove-it/sidekiq-scheduler/issues/181
