Onion C++ Bindings best practices / patterns. pt. 1.

Normally I do not even use the C API of onion myself. C is a great language for focusing on algorithms and details, but for web development it is too DIY. To use Onion I normally use C++. The bindings are not 100% complete, but they are enough to build performant (in space, size and speed) servers. As I use them more I will try to describe the best practices I find.

Use Onion::Url

Onion::Url is the basic regex-based url dispatcher. It uses onion_url under the hood. A nice property is that you can add url handlers to other url handlers, creating the onion structure:

Admin admin; // A "module".

Onion::Url other_urls;
other_urls.add("static", "Static text data", HTTP_OK); // Static data can be added just like that. Useful for small snippets.
other_urls.add("^lambda/(.*)$", [](Onion::Request &req, Onion::Response &res){ // Lambdas can be used too. Regex can be used to capture data.
  res<<"Hello world "<<req.query().get("1");
  return OCS_PROCESSED;
});

Onion::Url root(onion_server); // Get the root handler.

root.add("^admin/", admin.urls()); // And add sub urls.
root.add("^other/", other_urls); // Or a simple url.
root.add("^static/", onion_handler_export_local_new( static_path.c_str() )); // Also can add simple C handlers.

There are many, many things that can be done with urls, but I will summarize them here: if you use a regex, start it with ^. $ is a nice way to anchor the end of the string. Groups can be created using (). If it is not a regex, the full string is compared, so it is equivalent to ^text$.

c_handler to use the C bindings.

All Onion:: objects have a c_handler() method that returns the equivalent C pointer, to use with C onion functions. So everything you can do in C you can do in C++ too, even if the bindings are not complete. What's more important, almost all objects can be created from a C pointer as well, so it works in both directions.

This means that, for example, all the existing C handlers, such as the webdav handler, static directories or otemplate templates, can be used seamlessly, although some care may be needed.

Use of otemplate templates.

If you are not using otemplates, do it now.

To use it under C++, and until a better mechanism is developed, use the following pattern:

onion_connection_status handler(Onion::Request &req, Onion::Response &res){
  Onion::Dict context = globalContext(req);

  // The template frees the dict when done, so pass a duplicate.
  onion_dict *ctx=onion_dict_dup(context.c_handler());
  template_html(ctx, res.c_handler());
  return OCS_PROCESSED;
}

Actually I have a render_to_response, not yet in github, quite similar to Django's:

namespace Onion{
 typedef std::function<void (onion_dict *d, onion_response *r)> template_f;
}

onion_connection_status Onion::render_to_response(Onion::template_f fn, const Onion::Dict& context, Onion::Response &res){
 ONION_DEBUG("Context: %s", context.toJSON().c_str());

 // c_handler() is not const, so cast the constness away; the dict is not modified here.
 onion_dict *d=const_cast<Onion::Dict*>(&context)->c_handler();
 fn(d, res.c_handler());
 return OCS_PROCESSED;
}

Use a context

As seen above, a global function that returns the context can be very useful. In Django this is done by RequestContext, which calls all the TEMPLATE_CONTEXT_PROCESSORS.

There are no builtin facilities like that in Onion, so it has to be done manually, but it is a good idea. Create your own globalContext() that receives the request, and fills a dictionary with the context to pass to the otemplate.

Store state in session

This is general advice for all web developers.

The global state of the server is not the state of one particular user; do not use it for that. If you avoid this, you get simpler implementations, without mutexes or anything like that. The session is already mutexed, so you always get a valid session, and after modifying it you save a proper one as well. In the event that the user makes two requests at the same time, the state may end up being one or the other, so don't push it too hard client side. And if you do, use a proper database, not the session.


Onion Python Bindings: Fastest python HTTP server ever


Onion python bindings are blazing fast!


One of the points of making onion in C is to be able to easily create bindings for other programming languages. In onion's main branch we already provide C++ bindings, which really improve the onion experience a lot. At https://github.com/krakjoe/ponion there is a proof of concept for PHP bindings, and I also made a fast proof of concept of Python bindings. They are available at https://github.com/davidmoreno/onion/tree/python.

In this version it uses ctypes to provide the bindings, in the hope of making it usable with pypy (although that does not work right now).

A simple web server

import sys
sys.path.append('..') # Add the library to the global path; in the future it should be installed.

from onion import Onion, O_POOL, O_POLL, OCS_PROCESSED
import json

def hello(request, response):
 # Converts all the headers to a JSON object and dumps it.
 response.write( json.dumps( request.headers().copy() ) )
 return OCS_PROCESSED

def bye(request, response):
 response.write_html_safe( 'path is <%s>'% request.fullpath() )
 #response.write_html_safe( json.dumps( request.query().copy() ) )
 return OCS_PROCESSED

def main():
 o=Onion(O_POOL)
 urls=o.url() # Accessor name reconstructed; check the github source.

 urls.add("", hello)
 urls.add_static("static", "This is static text")
 urls.add(r"^.*", bye)

 o.listen()

if __name__=='__main__':
 main()
The first two lines are about adding the library to the global path. In the future it should be installed as any other python module. Then it is loaded, importing just some symbols. All of them are named as close as possible to the C and C++ versions.

Then we define a couple of handlers. Each handler receives both the request and the response. From the request, information about the request can be gathered, and the response is written through the response object. For this proof of concept I only implemented bindings for write and write_html_safe. Check the github source to see more options. Adding bindings for more methods is really easy.

The first handler converts all the headers to a json object, and dumps it. The second one just writes some safe html (properly quoting html symbols) to the response.

In the main function we just create the onion object, as a pool of threads, and add url handlers for several addresses. Normally it is just a regex-to-handler mapping, but, as in onion, there are special handlers already programmed in C which can give a huge performance boost in certain situations, such as simple static data.

Finally we just call listen to listen for connections.


This is not a full performance benchmark, just a comparison with other technologies. A fast Google search yields this blog post, which found gevent to be one of the fastest, so I will compare just with that. The test it gives just writes back Pong:


def application(environ, start_response):
    status = '200 OK'
    output = 'Pong!'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
from gevent import wsgi
wsgi.WSGIServer(('', 8088), application, spawn=None).serve_forever()

Running ./wrk http://localhost:8088/ I get:

Running 10s test @ http://localhost:8088/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.39ms  373.51us   8.30ms   85.61%
    Req/Sec     3.49k   507.99     5.33k    75.98%
  66243 requests in 10.00s, 9.92MB read
Requests/sec:   6624.38
Transfer/sec:      0.99MB

A similar program for onion-python would be:

import sys
from onion import Onion, O_POOL, OCS_PROCESSED

def application(request, response):
 response.write("Pong!")
 return OCS_PROCESSED

o=Onion(O_POOL)
urls=o.url() # Accessor name reconstructed; check the github source.

urls.add("", application)

o.listen()

And it yields these amazing results:
Running 10s test @ http://localhost:8080/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   344.00us  544.02us  15.02ms   91.09%
    Req/Sec    15.42k     3.17k   25.44k    69.41%
  292468 requests in 10.00s, 40.16MB read
Requests/sec:  29246.77
Transfer/sec:      4.02MB

But for this simple example maybe we can cheat a little bit, just by telling onion that "Pong!" is static content:

urls.add_static("", "Pong!")

Running 10s test @ http://localhost:8080/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   206.52us  351.44us   7.19ms   91.60%
    Req/Sec    23.80k     3.29k   54.33k    74.12%
  449232 requests in 10.00s, 61.69MB read
Requests/sec:  44923.53
Transfer/sec:      6.17MB


Onion is quite stable and used in production in many places. These bindings are not. Many bugs may appear. For example, Control-C does not work to stop it (I do Control-Z and kill -9 %1).

Onion-Python does not follow the WSGI protocol. WSGI allows a python application that uses that interface to use any backend. Maybe this should be explored in the future, but from my very shallow understanding of WSGI, it may mean that only the connection handling and parsing could be used, forgetting about the path dispatcher, for example.

Anyway, it really looks like, with some more work to expose all of onion's functionality, we have a huge winner here.


Twelve lines C++ INI parser


std::map<std::string, std::map<std::string,std::string>> data;
std::string group;
for(const auto &line: underscore::file(inifile)){
 auto l=line.split('#',true)[0]; // Slice out comments.
 if (l.empty())  continue;
 if (l.startswith("[") && l.endswith("]"))
  group=l.slice(1,-1); // Keep the group name, without brackets.
 else{
  auto p=l.split('=',true);
  data[group][p[0]]=l.slice(p[0].size()+1, l.size()); // From the '=' on, not just p[1].
 }
}

The power of underscore.hpp

With the use of underscore.hpp I was able to write a fairly simple, inefficient, and perfect-for-this-place parser for ini files in just 12 lines. It is for sure less efficient than a hand-crafted parser, as it reads all the data into memory and then starts slicing it, and there is no error checking. But for simple ini files, it just works.

The idea is to parse line by line (line 3), then slice out comments and empty lines (lines 4 and 5). Then we check if we are at a group header (startswith, endswith, line 6), and if so keep the group name. If not, split at =, keep the first part as the key name, and the rest (careful: that's not p[1], but everything from the = on) is the value.

As a result we have a nice map of strings to maps of strings to strings, so it is just data[group][key]=value. For parts without a group, the group name is "". If there is something that is neither a group nor a key=value line, the whole line will be stored as the key.

On top of this we would need accessor methods, methods to list all keys, and so on. All of it can be seen in the garlic source code.


C++ Cron Class Implementation

Recently for Garlic I added a basic Cron class: the user can add a cron-like schedule specification and a std::function to execute, and it will wait until the next operation needs to be executed, and execute it.
It involves many problems, but in this post I will talk just about how to know how many seconds the system has to wait until the next cron job.

On cron.

Cron is a scheduler in which the user sets a schedule for a job to run, specifying for example that it has to run every second on every Tuesday of 2015. Or just every day at 2am.

Example 1: Every second on every Tuesday of 2015
* * * * * 2 2015

Example 2: Every day at 2am
0 0 2 * * * *

There is a lot more information and examples on the internet.
Most interesting for our algorithm is that the first incarnation in the 70's checked every second whether any of the rules was ready to be executed, but soon the cron creators realised that it was much more efficient to compute how long to sleep until the next job.

Our cron

The version of cron commented here is not a full version: it has no periods nor ranges, only * and numbers. But the architecture allows these elements to be added easily (if only I had more time...).
The basic Cron class just adds the CronJobs, each with its specification and the function to call, and keeps them ordered with the next to execute first. The work method is where it blocks waiting for the next job. Every time it executes one, it reorders the vector to have the next one at the head again, which may be the same job. If there are no jobs it sleeps for a full minute and checks again. All this could be implemented better using thread signals and condition variables.

When is the next tick?

To know when the next tick is, each CronJob has a next method that returns the Unix timestamp of the next tick (to sleep(next_t - now)).
As previously shown, each cron schedule has 7 fields (seconds, minutes, hour, day of month, month of year, day of week (0=7=Sunday) and year). But they are interconnected: if you choose a day of month, it might not fall on the proper day of week. Actually, if you choose a day and a month, the date may not even exist (29th Feb). An important note is that it works on dates, not just seconds, so we will have to convert forward and back between seconds and dates. We will use the candidate_t struct for that. Internally it is just an array with 5 elements: year, month, day, hour, minute. It also provides some conversion methods to Unix timestamps and to string for easy debugging.
Initially the algorithm had these phases:
  1. Propose a candidate for all constraints
  2. Check is valid
  3. If valid, convert to seconds, if not, advance and try again.
With this initial idea I tried to propose a candidate that fits all 7 constraints, and then check validity, but something was missing. Actually a lot was missing, so I thought hard about the problem and finally formalized some ideas:
  1. There are several checks to perform, at least one per constraint, but there are more, such as being a valid date (29th February). Almost all the checks are the same, being in a valid range (0-59 for minutes, for example), except the day of week.
  2. If a check fails it can increment a value and check again. Sometimes incrementing a value makes the range overflow (so we have to roll back to the minimum) and then the next element has to increase, so each check must have an "overflow" part. For minutes it is hours, for hours days, for days months, and for months years.
  3. Some checks do not work just by incrementing. For example day of week; in this case all the previous checks are invalidated and we have to check everything again. This idea is also good for valid dates: this date is not valid (29th Feb), so increment the day and start again.
  4. There are some dates that can never be satisfied: 2015-02-29 just does not exist. In that case throw an exception. I also set an upper limit on dates; in this case the year 2040.
  5. If all rules are satisfied, convert this date to seconds and return it.
  6. The initial candidate is the next minute, so take the current minute and increment it.
Then the problem becomes creating these checkers. Each checker is a subclass of the abstract class Check:

class Check {
 std::shared_ptr<Check> overflow_part; // For minutes it is hours, so at minute overflow, increment the hour.
public:
 virtual bool valid(const candidate_t &candidate) = 0;
 virtual bool incr(candidate_t &candidate) = 0; // After incr, is the candidate still valid, or should I look from the beginning again?
 virtual std::string to_string() = 0;
 void set_overflow_part(std::shared_ptr<Check> prev){
  overflow_part=prev;
 }
};
We have three subclasses: InRange, DayOfWeek and ValidDate.

  • ValidDate handles the 29th Feb problem and the 31st of some months, taking leap years into account.
  • DayOfWeek is not implemented yet, but should convert the date to the proper day of week (doomsday rule) and accept it or ask for incr. Incr just increments the day.
  • InRange is the generic range check: it receives a valid initial range, a specifier, the part of the date it changes, and in a second round the overflow part. For now it understands just "*" for any value, or a specific number.
Once the checkers are all set up (lines 173-189), the main loop is run:

bool valid=true;
do{
 valid=true;
 for(auto &r: rules){
  if (!r->valid(candidate))
   valid=r->incr(candidate);
  if (!valid) // Start over again; this is normally a bad day of week, or 30-Feb style dates.
   break;
 }
 if (candidate[0]>=max_year)
  throw std::runtime_error("This cron rule can never be satisfied");
}while(!valid);

Is it efficient?

At first I thought that it could loop over all seconds, slowly incrementing minutes and so on... so for, let's say, "0 0 0 1 1 * 2020" it would take around 189216000 loop cycles to find it... but it does not. At all. As the constraints are fixed, for each constraint, if it is satisfied the loop continues; if not, it sets everything below to the minimum: the minutes go to 0, the hours to 0 and so on. So a rule like this takes 1 cycle. Just one. Something more complex like "* * * 29 2 * *" takes just as many cycles as years to the next leap year (2 cycles from 2014, max is 4 cycles).
So yes, it is pretty efficient. Of course there are cases that may cycle many times, but normally it finds the answer pretty fast.


With some basic blocks I could build a basic cron schedule rule parser and scheduler. It does not have all functionalities implemented yet, but the foundations are sound and no architectural problems should appear.
There are still many things that have rough edges and they will be polished in future versions.

The code is MPL 2.0, so use it in your own projects; future versions may change that to Apache2.

Please contact me in case of any questions.


Introducing underscore.hpp

Add some functional code to your C++!


Using underscore.hpp can save programming time in your C++ projects, at the cost of some extra memory. It allows easy list, string and file manipulation, using functional idioms.


Why should C++ be more difficult to program than Python, curly braces and syntax apart? Granted, with Python we have list comprehensions, generators, and generator comprehensions. But we can also just split a string, trim it and slice it.
With standard C++ these tasks are difficult, and with Boost it gets only a bit better.
I want to introduce you to underscore.hpp, an underscore.js-inspired C++ library to ease the use of lists, files and strings.
It can be downloaded/contributed to at https://github.com/davidmoreno/underscore.hpp
By lists I mean the abstract concept of something iterable, not specifically std::list nor std::vector (actually, internally std::vector has better support).

String example



#include <iostream>
#include "underscore.hpp"

using namespace std;
using namespace underscore;

int main(void){
 cout<<_("Hello world").replace(" ", ", ").lower()<<endl;
 return 0;
}

5 allocs, 5 frees, 173 bytes allocated

C++/std -- Not really equivalent


// From http://stackoverflow.com/questions/1494399/how-do-i-search-find-and-replace-in-a-standard-string
void myReplace(std::string& str, const std::string& oldStr, const std::string& newStr){
  size_t pos = 0;
  while((pos = str.find(oldStr, pos)) != std::string::npos){
     str.replace(pos, oldStr.length(), newStr);
     pos += newStr.length();
  }
}

int main(void){
 std::string hello="Hello world";
 myReplace(hello," ",", ");
 std::transform(
  std::begin(hello), std::end(hello),
  std::begin(hello), ::tolower);
 std::cout<<hello<<std::endl;
 return 0;
}

4 allocs/frees, 136 bytes


print ( ', '.join( "Hello world".split(' ') ).lower() )
26,011 allocs, 20,713 frees, 4,977,609 bytes allocated

"hello, world".
This is a simple find and replace and then to lower. Nothing magic, just some candy for string manipulation.

What about lists?

With Python, list manipulation is really simple; if you use list comprehensions, suddenly a huge class of problems becomes easy. In C++ we have similar algorithms that can be used, such as the remove/erase idiom or std::transform, but they are convoluted to use (albeit extremely efficient). For example, let's count the number of letters of each word in a list:



#include <iostream>
#include "underscore.hpp"

int main(void){
 std::cout<< '[' <<
  underscore::_("Hello, world! How do you do?")
   .split(' ')
   .map([](const std::string &s){
    return s.size();
   })
   .join(", ")
  << ']' << std::endl;
 return 0;
}
29 alloc/frees, 841 bytes allocated



#include <iostream>
#include <sstream>
#include <vector>
#include <algorithm>
#include <iterator>

int main(){
 std::string hello="Hello, world! How do you do?";

 std::vector<size_t> sizes;
 std::stringstream ss(hello);
 std::istream_iterator<std::string> begin(ss);
 std::istream_iterator<std::string> end;
 std::transform( begin, end,
  std::back_inserter(sizes), [](const std::string &w){
   return w.length();
  });
 std::cout<<'[';
 bool first=true;
 for(size_t s: sizes){
  if (!first)
   std::cout<<", ";
  first=false;
  std::cout<<s;
 }
 std::cout<<']'<<std::endl;
 return 0;
}
9 allocs, 9 frees, 253 bytes allocated


print [len(x) for x in "Hello, world! How do you do?".split(' ')]
26,049 allocs, 20,751 frees, 4,972,941 bytes allocated
So here we see a pattern: Python is easy but takes a lot of memory (and CPU time); C++/std is difficult but extremely efficient. Also, as it is so difficult, more time is spent looking for the best way, as the difference between doing it right or wrong is not important versus the time necessary to prepare the data.
Finally, coding with C++/underscore.hpp is much easier than vanilla C++, and its efficiency, although lower, is comparable to C++/std.


Let's create a map with the memory information of a Linux system. This will not use all the idioms, just do the task:



int main(void){
        std::map<std::string,int> m;

        for(auto &l: underscore::file("/proc/meminfo")){
                auto p=l.split(':');
                auto n=p[1].strip();
                if (n.endswith("kB")) // Remove the units.
                        n=n.slice(0, n.length()-2).strip();
                if (n.length()>0)
                        m[p[0]]=std::stoi(n);
        }
        std::cout<<m["MemFree"]<<std::endl;
}

512 allocs/frees, 25,316 bytes allocated

m={}
for l in open("/proc/meminfo").readlines():
        p=l.split(':')
        n=p[1].strip().rstrip('kB').strip()
        if n:
                m[p[0]]=int( n )
print m["MemFree"]
26,714 allocs, 21,416 frees, 5,020,289 bytes allocated

How it works internally

Underscore.hpp just encapsulates the data you give it into a std container. If none is given, it uses a std::vector internally. If you pass it some existing container, it is just a wrapper that provides methods to ease working with it. Again, internally it tries to use std algorithms as much as possible, so efficiency should be similar. But there is a big difference: all operations are non-mutating. This means a lot of copies, intermediary lists, and higher memory usage, but freedom on the types, thread safety, and easier reasoning.
So finally it is a tradeoff: easier C++ development or more efficient C++ development.
I, for one, prefer to implement the easier underscore version first and, if needed, go for the more efficient one later.

Future development

This is just the tip of the iceberg. With just simple map and filter, there are so many new things that suddenly become easy to do that just exploring this world is amazing. Adding string manipulation makes these two worlds fit like a puzzle, as does the file manipulation. But there is still more work to do. First of all, try to add concepts so that when, for example, the return type of a map is wrong, it is just a compilation error, not a problem to debug by itself.
Also the memory consumption is too high, as it creates the intermediary lists. In the generator.hpp header there are some ideas to allow some kind of generators, so data is passed from stage to stage, but not copied into the final container until the last generator. The idea is, instead of doing one operation on all the elements first, do all the operations on the first element first. In other words, we give up instruction cache for data cache. It should really improve everything. But it is hard, so right now there are only some ideas there.
Also the recent range concepts blog post is inspiring, so it is worth checking out.


The code is Apache2 licensed and can be used without any restriction. If you find that some functionality can be improved, or more functionalities would fit in there, please send me a line.


Dissecting onion FrameworkBench's implementation

In this article I will dissect the onion benchmark as used in the FrameworkBench benchmark suite. The version I'm dissecting is at http://bit.ly/18tdbOj. Some bugs were found in the benchmark code while writing this article, and they are commented on as they are spotted. The version used for benchmark round 4 is as described in this article, bugs included, with the exception of the new fortunes test, to be used in FrameworkBench round 5, yet to be released.


We will start our journey at main, line 263:
This is the main onion initialization. Onion users can have as many servers as desired; each one is created with onion_new, which receives one parameter that sets options for the server. Here we are using O_POOL for pool mode: some threads are created, each waiting for new requests and serving them as needed. Other options are described at http://bit.ly/18tdTLi.

Next, we initialize a test_data structure, which is necessary to keep track of the database connections. As onion does not have an ORM (yet?), in this example we access the database with the MySQL C API, creating a fixed number of connections. A semaphore is used to prevent an excess of simultaneous requests, and a mutex to change the status of the connections if necessary. This simple operation is managed with the functions get_connection and free_connection. These two functions need further improvement to avoid possible congestion at database access.

Next, at 286, we are initializing an onion_dict with the "message" : "Hello, world!" data.
onion_dict_add(data.hello, "message", "Hello, world!", 0);
onion_dict_add's final parameter decides how memory is managed. Users inform the dictionary whether the second and third parameters are to be:
  1. used as straight pointers (no copy),
  2. copied,
  3. automatically deleted (give ownership).
  4. specific type for data, for example OD_DICT to add a dictionary.
Options 1 and 3 are not exclusive. Option 2 implies option 3 on the copied data. The default for option 4 is a text string.

These options are used all through onion to save memory and improve performance.

In the testing code, we use 0 as flags, which means that the data is used as passed, and does not need to be copied nor freed.

At 288 we are creating and setting the root handler.
onion_set_root_handler(o, onion_handler_new((void*)&muxer, (void*)&data, NULL));
As requests arrive to onion they are served by this root handler. It can have subhandlers for specific conditions; as the conditions are programmed in the handler itself, there are endless possibilities: virtual hosts, IP discrimination, authentication handling, urls based on regexes a-la Django (onion_url), or, as done here for performance, manual checking of url paths. Check the documentation, as the most common options are easily handled and normally users do not have to create their own muxers, unless performance requires it. This root handler receives a data pointer to the test_data structure, and does not require any destructor call. For any request it will call the muxer function.

Finally we start the listening, and after it we deinitialize all data. The listening will continue until onion_listen_stop is called, which is called on SIGINT and SIGTERM signals.


The muxer, at 277,  is the first example of a handler function. It receives some user data and then the request and the response.

/// Multiplexes to the proper handler depending on the path.
/// As there is no proper database connection pool, takes one connection randomly, and uses it.
onion_connection_status muxer(struct test_data *data, onion_request *req, onion_response *res){
  const char *path=onion_request_get_path(req);
  if (strcmp(path, "")==0)
    return return_json(data->hello, req, res);
  if (strcmp(path, "json")==0)
    return return_json_libjson(NULL, req, res);
  /* ... the other paths are compared the same way ... */
  return OCS_INTERNAL_ERROR;
}

We extract the current path and compare it to the known paths, calling the appropriate handler. If the path is empty, it calls return_json; if it is json, return_json_libjson, and so on. Finally, if none fits, an OCS_INTERNAL_ERROR is returned. Actually it would have been better to return OCS_NOT_PROCESSED, so that if there are more handlers in the chain the next one can try, and if none processed it at all, a 404 not found (customizable) would be returned. The next version may fix this.


The first implementation of json uses the internal onion_dict_to_json, which just converts an onion_dict to a JSON string. There is no support, for the moment, for lists, integers, nor json-to-dict. But for small jsons it is more than enough. This has better performance than the libjson version, but it is more limited.

We are creating an onion_block, which is a kind of variable length string, to print the json into. Then we are setting the content-type and length, and finally we are writing it into the response:
onion_response_set_header(res, "Content-Type","application/json");
onion_response_set_length(res, size);
onion_response_write(res, onion_block_data(bl), size);

Setting the length is advised as it helps keep-alive; but since a commit on 6 May, if the response is small, a delayed header mechanism checks the size before sending anything and the content-length is automatically inserted. This helps to have keep-alive even for requests of unknown size.

After this the block is freed, and we return the marker that the request has been processed:


This version does the same as return_json, but uses libjson. It also creates the JSON object every time it is called, and copies it to a string. Then the setting of header and length, and the write, are performed the same way.


This test uses the MySQL C API as indicated before. At line 91:
const char *nqueries_str=onion_request_get_query(req,"queries");
int queries=(nqueries_str) ? atoi(nqueries_str) : 1;
We get the GET parameter "queries" and store it in nqueries_str. If the query does not exist, onion_request_get_query returns NULL, and on the next line, depending on this, we convert it to an integer or set it to 1.

On the following lines it performs the SQL query and creates the json object. Then this object is converted to a string and sent as shown in the return_json_libjson example.


For the fortunes example there were quite a lot more requirements, such as using a templating system, UTF-8, and HTML escaping. For the HTML escaping part, code was developed for onion, so now by default the templating system automatically escapes variables. There is no option, right now, to not escape variables.

Onion's templating system is otemplate. With it you can write normal HTML (or any text based code) with specific tags that allow inserting data from variables, looping, internationalization (i18n) and extending base templates. It is heavily based on Django's, and normally you will not notice the difference on the template side, if use is kept to the basics. Looking at the Makefile we can see the compilation of the template into C source code:
base_html.c: otemplate base.html
	onion/build/tools/otemplate/otemplate base.html base_html.c
The generated source file has several functions. For this example I decided to declare the function I needed manually, at line 120:
onion_connection_status fortunes_html_template(onion_dict *context, onion_request *req, onion_response *res);
This function receives a context dictionary and the normal request and response objects. It can be called at the end of your view function, and the context will be freed automatically. In the function itself we prepare the onion_dict, first creating a temporary struct for the data, with dynamic resizing, filled using the MySQL C API. Then we sort it as the benchmark requirements request, and prepare the dict itself. For the dict, the OD_DICT flag is used to embed sub-dicts. As onion_dict does not support arrays, the for loop in the template iterates over the values of the dict, in the order set by the keys. For this reason we have to insert the values as { "00000001" : { "id" : "1", "message" : "...." } }, and so on.

When everything is ready we just call fortunes_html_template; when it returns, the dict will be automatically deallocated.

On the template side we use a base.html, with blocks and a title variable. This is extended at fortunes.html, where we loop over the fortunes and print them as requested:

{% for f in fortunes %}
{% endfor %}

Closing up

This is a dissection of the onion implementation of the benchmark used in TechEmpower's FrameworkBenchmarks, and as can be seen, onion tries to help where possible to make a C web application as easy as possible to develop. The real power of onion is not performance, but ease of use. There are some parts that might need more work, such as the templating system, but just as it is now, it provides an ease of use unseen before when creating HTTP servers in C. Also performance does not hurt, but that's a matter of onion's internals, not of the interface.

If any reader would like some specifics described in more detail, or has ideas on how something might be better implemented, in this test or in onion itself, please leave a comment.


Benchmarking onion

Today a thought provoking benchmark was published at Hacker News: http://www.techempower.com/blog/2013/03/28/framework-benchmarks/. It compares several web frameworks and how they perform.

How well does onion perform?

I did a fast test program with onion to check how good the performance is compared to the other frameworks. The code is at https://gist.github.com/davidmoreno/5264730. Instructions to compile and execute the test are in the gist itself.

This test only checks the simple json code and the database tests, but using sqlite, for 1 and 20 requests. The database dump is at http://www.coralbits.com/static/world.sql. As onion is not a framework, there are no facilities for using the database comfortably, just the plain sqlite3 API, which is OK depending on what you want to do.

The results are, on my Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz, in requests per second:

56548.61 JSON
8426.56 Sqlite 1 petition
515.42 Sqlite 20 petitions
This is just one execution; the laptop is doing everything as normal, including spotify on wine playing the Inception soundtrack.

Graph from chartgo.

For comparison, I could only run the JSON test on the nodejs example they provide at https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/nodejs:

4983.75 JSON nodejs

As you can see, onion has about 10x better performance than nodejs. Extrapolating from the performance I get with nodejs, I should get about 17359.67 with netty on my laptop, the best in their benchmark. This means onion should be about 3.25 times faster than the best framework of their benchmark for this specific test. More work is needed on my computer to make the other tests work.

But onion is not a framework. Or at least not a complete one. It needs at least an ORM.

Does this make sense?

Anyway, I think comparing benchmarks this way misses several points, especially when it implies that performance is the most important thing, without even talking about the ease of developing in each framework.

For me, the balance between development speed and hardware performance is more important, normally with more weight on development speed. A full web application done in C or C++ will be really performant, but it will suffer higher development costs, longer development time and quite probably more bugs. In the end it will be very expensive. On the other hand, one made in Python or Ruby will be much easier to develop but will perform worse. It depends on the project whether the balance should be on one side or the other. What you save on hardware you pay in developers. It depends on your size, budget, and scope whether you should go in one direction or the other.

I would never ever use onion for a full web application development, but for specific services where performance is critical, it is a very good option. Also on systems where a full framework is too much overhead, a leaner alternative like onion is the way to go.