C++ Cron Class Implementation

Recently for Garlic I added a basic Cron class: User can add a cron-like schedule specification and a std::function to execute, and it will wait until next operation needs to be executed, and execute it.
It involves many problems, but on this post I will talk just about how to know how many seconds the system has to wait until next cron job.

On cron.

Cron is a scheduler on which the user can set a schedule for a job to run, setting for example that it has to run for every second on every Tuesday on 2015. Or just everyday at 2am.

Example 1: Every second on every tuesday on 2015
* * * * 2 2015

Example 2: Everyday at 2am
* 2 * * * *

There is a lot more information and examples on the internet.
Most interesting for our algorithm is that the first incarnation in the 70's it checked every seconds if any of the rules was ready to be executed, but soon cron creators realised that it was much more efficient to know how long to sleep until next job.

Our cron

The version of cron here commented is not a full version: it has no period nor ranges, only * and numbers. But the architecture allows easy adding these elements (if only I had more time...).
The basic cron class just adds the CronJobs, each with the specification and function to call, and keeps them ordered by next to execute first. The work method is where it blocks to wait for next job. Every time it executes one, it reorders the vector to have next on the head again, which can be the same. If there are no jobs it sleeps for a full minute and checks again. All this could be implemented better using thread signals and conditions.

When is the next tick?

To know when is the next tick, each CronJob has a next method that returns the Unix timestamp for the next tick (to sleep(next_t - now)).
As previously shown each cron schedule identifier has 7 fields (seconds, minutes, hour, day of month, month of year, day of week (0=7=sunday) and year). But actually they are interconnected: if you choose a day of month, it might not be the proper day of week. Actually if you choose a day and a month that may not even exist (29th feb). An important note is that it looks for dates, not jsut seconds, so we will have to convert forward and beck from seconds to dates. We will use the candidate_t struct for that. Internally is just an array with 5 elements: year, month, day, hour, minute. It also provides some conversion methods to Unix timestamps and string for easy debugging.
Initially the algorithm had two phases:
  1. Propose a candidate for all constraints
  2. Check is valid
  3. If valid, convert to seconds, if not, advance and try again.
In this initial idea I try to propose a candidate that fits all the 7 constraints, and then check validity, but  something was missing. Actually a lot was missing, so I thought a lot on the problem and finally I fomalized some ideas:
  1. There are several checks to perform, at least one per constraint, but there are more as valid date (29th February). Actually almost all the checks are the same, to be in a valid range (0-59 for minutes, for example), except the day of week. 
  2. If a check fails it can increment a value to check again if it fails or not. Sometimes incrementing a value can make the range overflow (so we have to roll back to the minimum) and then the next element has to increase, so each check must have an "overflow" part. For minutes its hours, for hours its days, for days months, and for months years. 
  3. Some checks do not work just by incrementing. For example day of week; in this case all the previous checks are invalidated and we have to check all again. Actually this idea is also good for valid dates; this date is not valid (29th Feb), so increment the day and start again.
  4. There are some dates that can not be satisfied never: 2015-02-29 just don't exist. Then throw an exception. Also I set some upper limit to dates; in this case the year 2040.
  5. If all rules satisfy, then convert this date to seconds and return it.
  6. The initial condition is next minute, so calculate current minute and increment the minute.
Then the problem moves forward to create this checkers. Each check is a subclass of the abstract class Checker:

class Check {
 std::shared_ptr overflow_part; // For minutes is hours, so at minute overflow, increment hour.

 virtual bool valid(const candidate_t &candidate) = 0;
 virtual bool incr(candidate_t &candidate) = 0; // After incr, is candidate still valid, or should I look from the begining again.
 virtual std::string to_string() = 0;
 void set_overflow_part(std::shared_ptr prev){
We have three subclasses: InRange, DayOfWeek and ValidDate.

  • ValidDate ensures the 29th Feb problem and 31th of some months, taking into care the leap years.
  • DayOfWeek is not implemented yet, but should convert the date to the proper day of week (doomsday rule) and accept it or ask incr. Incr just increments the day.
  • InRange is the generic range: It receives a valid initial range, an specifier, the part of the date it changes and in a second round the overflow part. Just now understands just "*" for any or a specific number.
Once the checkers are all set up (173-189), then the main loop is run:

bool valid=true;
 for(auto &r: rules){
   if (!valid) // Start over again, this is normally not good day of week, or 30 feb style dates.
 if (candidate[0]>=max_year)

Is it efficient?

At first I thought that it could loop over all seconds and slowly increment minutes and so on.. so for lets say "0 0 1 1 * 2020" it will take around 189216000 loop cycles to find it... but it does not. At all. Actually as the constraints are fixed, then for each constraint if it satisfies it continues, if not, it sets to the minimum to it goes the minutes to 0, the hours to 0 and so on. So somehting like this rule take 1 cycle. Just one. Something more complex like: "* * 29 2 * *" takes just as many years to the next leap year (2 cycles for 2014, max is 4 cycles).
So yes. Its pretty efficient. Of course there are cases that may cycle many times, but normally it find the answer pretty fast.


With some basic blocks I could generate a basic cron schedule rule parser and system. It still has not all functionalities implemented, but the foundations are sound and no architecture problem should appear.
There are still many things that have rough edges and they will be polished in future versions.

The code is MPL 2.0, so use it in your own projects, but future version may change that to Apache2.

Please contact me in case of any questions.

No comments:

Post a Comment