2013-05-10

Dissecting onion FrameworkBench's implementation

In this article I dissect the onion benchmark as used in the FrameworkBench benchmark suite. The version I'm dissecting is at http://bit.ly/18tdbOj. Some bugs were found in the benchmark code while writing this article, and they are noted as they come up. The version used for benchmark round 4 is as described here, bugs included, with the exception of the new fortunes test, which will be used in FrameworkBench round 5, yet to be released.

Initialization.

We will start our journey at main, line 263:
o=onion_new(O_POOL);
This is the main onion initialization. Onion users can have as many servers as desired; each one is created with onion_new, which receives one parameter that sets the server options. Here we use O_POOL for pool mode: a number of threads are created, each waiting for new requests and serving them as they arrive. Other options are described at http://bit.ly/18tdTLi.

Next, we initialize a test_data structure, which is needed to keep track of the database connections. As onion does not have an ORM (yet?), this example accesses the database through the MySQL C API, creating a fixed number of connections. A semaphore prevents an excess of simultaneous requests, and a mutex protects changes to the status of the connections. This simple bookkeeping is handled by the functions get_connection and free_connection. These two functions need further improvement to avoid possible congestion at database access.
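
To make the semaphore plus mutex idea concrete, here is a minimal sketch of such a fixed pool. It is not the benchmark's code: the names conn_pool, pool_init, pool_get and pool_release, and the pool size, are hypothetical, and the slots are plain flags instead of real MySQL connections.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define POOL_SIZE 4  /* hypothetical; the benchmark uses its own fixed count */

/* Hypothetical stand-in for test_data: slot i is busy when in_use[i] != 0. */
struct conn_pool {
    int in_use[POOL_SIZE];
    sem_t free_slots;        /* counts the connections that are still free */
    pthread_mutex_t lock;    /* protects in_use[] */
};

int pool_init(struct conn_pool *p) {
    for (int i = 0; i < POOL_SIZE; i++) p->in_use[i] = 0;
    if (sem_init(&p->free_slots, 0, POOL_SIZE) != 0) return -1;
    return pthread_mutex_init(&p->lock, NULL) == 0 ? 0 : -1;
}

/* Blocks until a slot is free, marks it busy and returns its index. */
int pool_get(struct conn_pool *p) {
    int slot = -1;
    /* Retry only when the wait was interrupted by a signal. */
    while (sem_wait(&p->free_slots) == -1 && errno == EINTR)
        continue;
    pthread_mutex_lock(&p->lock);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!p->in_use[i]) { p->in_use[i] = 1; slot = i; break; }
    }
    pthread_mutex_unlock(&p->lock);
    return slot;
}

/* Marks the slot free again and wakes one waiter, if any. */
void pool_release(struct conn_pool *p, int slot) {
    pthread_mutex_lock(&p->lock);
    p->in_use[slot] = 0;
    pthread_mutex_unlock(&p->lock);
    sem_post(&p->free_slots);
}
```

The semaphore makes callers wait when every slot is taken, while the mutex only guards the short scan of the flags, so it is never held during a database query.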

Next, at 286, we initialize an onion_dict with the "message" : "Hello, world" data.
onion_dict_add(data.hello,"message","Hello, world", 0);
The final parameter of onion_dict_add decides how memory is managed. Users tell the dictionary whether the second and third parameters (the key and the value) are to be:
  1. used as straight pointers (no copy),
  2. copied,
  3. automatically deleted (give ownership),
  4. of a specific data type, for example OD_DICT to add a dictionary.
Options 1 and 3 are not mutually exclusive. Option 2 implies option 3 for the copied data. For option 4 the default type is a text string.

These options are used all through onion to save memory and improve performance.

In the testing code, we are using 0 as flags, which means that the data can be used as passed, and does not need to be copied nor freed.

At 288 we are creating and setting the root handler.
onion_set_root_handler(o, onion_handler_new((void*)&muxer, (void*)&data, NULL));
As requests arrive at onion they are served by this root handler. It can have subhandlers for specific conditions; as the conditions are programmed in the handler itself, the possibilities are endless: virtual hosts, IP discrimination, authentication handling, regex-based URLs a la Django (onion_url), or, as done here for performance, manual checking of URL paths. Check the documentation, as the most common options are easily handled and users normally do not have to write their own muxers unless performance requires it. This root handler receives a data pointer to the test_data structure and does not require any destructor call. For every request it will call the muxer function.


Finally we start listening, and once listening ends we deinitialize all data. Listening continues until onion_listen_stop is called, which happens on the SIGINT and SIGTERM signals.

muxer

The muxer, at 277, is the first example of a handler function. It receives some user data, then the request and the response.


/// Multiplexes to the proper handler depending on the path.
/// As there is no proper database connection pool, take one connection randomly, and uses it.
onion_connection_status muxer(struct test_data *data, onion_request *req, onion_response *res){
  const char *path=onion_request_get_path(req);
  if (strcmp(path, "")==0)
    return return_json(data->hello, req, res);
  if (strcmp(path, "json")==0)
    return return_json_libjson(NULL, req, res);
  [...]
  return OCS_INTERNAL_ERROR;
}

We extract the current path and compare it to the known paths, calling the appropriate handler. If the path is empty, it calls return_json; if it is json, return_json_libjson; and so on. Finally, if none matches, OCS_INTERNAL_ERROR is returned. It would actually have been better to return OCS_NOT_PROCESSED, so that if there are more handlers in the chain the next one can try, and if none of them processes the request a 404 Not Found (customizable) is returned. The next version may fix this.
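
As a rough illustration of that fallback, here is a small self-contained sketch of a path dispatcher that reports "not processed" instead of an error. Everything in it is hypothetical: the status enum merely stands in for onion's OCS_* values, and the handlers take no arguments so the example compiles without onion.

```c
#include <string.h>

/* Hypothetical status codes standing in for onion's OCS_* values. */
enum status { ST_NOT_PROCESSED = 0, ST_PROCESSED = 1 };

typedef enum status (*handler_fn)(void);

/* Placeholder handlers; real ones would write a response. */
static enum status handle_root(void) { return ST_PROCESSED; }
static enum status handle_json(void) { return ST_PROCESSED; }

struct route { const char *path; handler_fn handler; };

static const struct route routes[] = {
    { "",     handle_root },
    { "json", handle_json },
};

/* Returns the matching handler's status, or ST_NOT_PROCESSED when no
 * path matches, so a later handler in the chain could still answer. */
enum status dispatch(const char *path) {
    for (size_t i = 0; i < sizeof routes / sizeof routes[0]; i++) {
        if (strcmp(path, routes[i].path) == 0)
            return routes[i].handler();
    }
    return ST_NOT_PROCESSED;
}
```

With only a handful of paths the linear strcmp scan stays cheap, which is the same trade-off the hand-written muxer makes.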

return_json

The first JSON implementation uses the internal onion_dict_to_json, which just converts an onion_dict to a JSON string. For the moment there is no support for lists, integers, or JSON to dict conversion, but for small JSON it is more than enough. It performs better than the libjson version, but it's more limited.

We are creating an onion_block, which is a kind of variable length string, to print the json into. Then we are setting the content-type and length, and finally we are writing it into the response:
onion_response_set_header(res, "Content-Type","application/json");
onion_response_set_length(res, size);
onion_response_write(res, onion_block_data(bl), size);

Setting the length was advised as it helps keep-alive; but since a commit on 6 May, if the response is small, a delayed header mechanism checks the size before sending anything and the Content-Length is inserted automatically. This allows keep-alive even for responses of unknown size.

After this the block is freed, and we return the marker that the request has been processed:
return OCS_PROCESSED;

return_json_libjson

This version does the same as return_json, but uses libjson. It also creates the JSON object every time it is called, and copies it to a string. Then the header, length, and write are handled in the same way.

return_db

This test uses the MySQL C API as indicated before. At line 91:
const char *nqueries_str=onion_request_get_query(req,"queries");
int queries=(nqueries_str) ? atoi(nqueries_str) : 1;
We get the GET parameter "queries" and store it in nqueries_str. If the parameter does not exist, onion_request_get_query returns NULL, and on the next line, depending on this, we either convert it to an integer or default to 1.
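
The same default-to-one logic can be tried in isolation. The sketch below wraps it in a hypothetical parse_queries helper; the extra lower bound is my own assumption and not something the benchmark code shown above does.

```c
#include <stdlib.h>

/* Mirrors the logic above: NULL (parameter absent) means one query. */
int parse_queries(const char *nqueries_str) {
    int queries = nqueries_str ? atoi(nqueries_str) : 1;
    /* Assumption: treat zero, negative or non-numeric input as 1 so the
     * caller always runs at least one query. atoi returns 0 for text. */
    if (queries < 1) queries = 1;
    return queries;
}
```

Without the lower bound, a request such as ?queries=abc would ask for zero queries, which is why the guard may be worth having.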

The following lines run the SQL queries and create the JSON object. Then this object is converted to a string and sent as shown in the return_json_libjson example.

fortunes

The fortunes example had quite a few more requirements, such as using a templating system, UTF-8, and HTML escaping. For the HTML escaping part, code was developed for onion, so now the templating system escapes variables automatically by default. There is no option, right now, to leave variables unescaped.
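
For readers who have not written one, this is roughly what an HTML escaping routine looks like. It is a generic sketch, not onion's implementation; the name html_escape and its buffer-based signature are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Writes an HTML-escaped copy of src into dst (capacity dst_size, including
 * the terminating NUL). Returns 0 on success, -1 if dst is too small.
 * Bytes outside the five special characters, UTF-8 included, pass through. */
int html_escape(const char *src, char *dst, size_t dst_size) {
    size_t used = 0;
    if (dst_size == 0) return -1;
    for (; *src; src++) {
        const char *rep;
        char single[2] = { *src, '\0' };
        switch (*src) {
            case '&':  rep = "&amp;";  break;
            case '<':  rep = "&lt;";   break;
            case '>':  rep = "&gt;";   break;
            case '"':  rep = "&quot;"; break;
            case '\'': rep = "&#39;";  break;
            default:   rep = single;   break;
        }
        size_t len = strlen(rep);
        /* Keep one byte in reserve for the terminating NUL. */
        if (used + len + 1 > dst_size) return -1;
        memcpy(dst + used, rep, len);
        used += len;
    }
    dst[used] = '\0';
    return 0;
}
```

Escaping by default, as the templating system does, means a message containing a script tag is shown as text instead of being executed by the browser.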

Onion's templating system is otemplate. With it you can write normal HTML (or any text-based format) with specific tags for inserting data from variables, looping, internationalization (i18n), and extending base templates. It is heavily based on Django's, and normally you will not notice the difference on the template side if use is kept to the basics. Looking at the Makefile we can see the compilation of the template to C source code:
base_html.c: otemplate base.html 
onion/build/tools/otemplate/otemplate base.html base_html.c
The generated source file has several functions. For this example I decided to declare the function I needed manually at line 120:
onion_connection_status fortunes_html_template(onion_dict *context, onion_request *req, onion_response *res);
This function receives a context dictionary and the normal request and response objects. It can be called at the end of your view function, and the context will be freed automatically. In the function itself we prepare the onion_dict: first we create a temporary, dynamically resized struct for the data and fill it using the MySQL C API. Then we sort it as the benchmark requires, and prepare the dict itself. For the dict, the OD_DICT flag is used to embed sub-dicts. As onion_dict does not support arrays, the for loop in the template iterates over the values of the dict, in the order set by the keys. For this reason we have to insert the values as { "00000001" : { "id" : "1", "message" : "...." } }, and so on.
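
The zero-padded keys are what keep the sorted order when the keys are compared as strings. A minimal sketch of generating them follows; the helper name make_ordered_key is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Formats a position as an 8-digit zero-padded key such as "00000001",
 * so that plain string comparison of keys matches numeric order. */
void make_ordered_key(unsigned position, char key[9]) {
    /* The modulo keeps the value within 8 digits, so it always fits. */
    snprintf(key, 9, "%08u", position % 100000000u);
}
```

Without the padding, "10" would sort before "2" as a string, and the fortunes would come out in the wrong order.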

When everything is ready we just call the fortunes_html_template; when it returns the dict will be automatically deallocated.

On the template side we use a base.html, with blocks and a title variable. This is extended at fortunes.html, where we loop over the fortunes and print them as requested:

{% for f in fortunes %}
<tr>
<td>{{f.id}}</td>
<td>{{f.message}}</td>
</tr>
{% endfor %}

Closing up

This is a dissection of the onion implementation of the benchmark used in TechEmpower's BenchmarkTest, and as can be seen, onion tries to help wherever possible to make a C web application as easy as possible to develop. The real power of onion is not performance, but ease of use. Some parts might need more work, such as the templating system, but even as it is now it offers an ease of use unseen before when creating HTTP servers in C. The performance does not hurt either, but that is a matter of onion's internals, not of the interface.

If any reader would like some specifics described in more detail, or has ideas on how something might be better implemented, in this test or in onion itself, please leave a comment.

2013-03-28

Benchmarking onion

Today a thought provoking benchmark was published at Hacker News: http://www.techempower.com/blog/2013/03/28/framework-benchmarks/. It compares several web frameworks and how they perform.

How well does onion perform?


I wrote a quick test program with onion to check how its performance compares to the other frameworks. The code is at https://gist.github.com/davidmoreno/5264730. Instructions to compile and run the test are in the gist itself.

This test only covers the simple JSON test and the database test, the latter using sqlite with 1 and 20 queries per request. The database dump is at http://www.coralbits.com/static/world.sql. As onion is not a framework, there are no facilities for using the database comfortably, just the plain sqlite3 API, which is OK depending on what you want to do.

The results are, on my Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz:

56548.61 JSON
8426.56 Sqlite 1 query
515.42 Sqlite 20 queries

This is just one run, with the laptop doing everything else as normal, including Spotify on Wine playing the Inception soundtrack.

Graph from chartgo.

For comparison, I could only run the JSON test on the nodejs example they provide at https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/nodejs:

4983.75 JSON nodejs

As you can see, onion has about 11x better performance than nodejs. Extrapolating from the performance I get with nodejs, netty, the best in their benchmark, should reach 17359.67 on my laptop. This means onion should be about 3.25 times faster than the best framework in this benchmark for this specific test. More work is needed on my computer to get the other tests working.
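
The arithmetic behind those ratios is simple enough to write down. The two helpers below are only a sketch with made-up names, and the extrapolation assumes that performance scales linearly between machines, which is a rough approximation at best.

```c
/* How many times faster `a` is than `b`, both in the same unit. */
double speedup(double a, double b) {
    return a / b;
}

/* Scales a published figure to local hardware, using a reference framework
 * that was measured in both places. Assumes linear scaling. */
double extrapolate(double published_target, double published_ref,
                   double local_ref) {
    return published_target * (local_ref / published_ref);
}
```

With the numbers from this post, speedup(56548.61, 4983.75) is roughly 11.3 and speedup(56548.61, 17359.67) is roughly 3.26.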

But onion is not a framework. Or at least not a complete one. It needs at least an ORM.

Does this make sense?


Anyway, I think comparing benchmarks this way misses several points, especially when it claims that performance is what matters most without even mentioning how easy it is to develop on each framework.

For me the balance between speed of development and performance on hardware is more important, normally with more weight on development speed. A full web application written in C or C++ will perform really well, but it will suffer from higher development costs, longer development time, and quite probably more bugs. In the end it will be very expensive. On the other hand, one written in Python or Ruby will be much easier to develop but will perform worse. Which side the balance should fall on depends on the project. What you save on hardware you pay in developers. It depends on your size, budget, and scope whether you should go in one direction or the other.

I would never ever use onion for full web application development, but for specific services where performance is critical, it's a very good option. Also, on systems where a full framework is too much overhead, a leaner alternative like onion is the way to go.

2013-03-19

Garlic automated testing

Automated testing in a sec.

Jenkins is a great piece of software. It's great in many dimensions, among them size and complexity. It also uses Java, which makes it even greater, or should I say bigger and slower?

For devices like the Raspberry Pi, or servers with limited resources such as virtual machines, running Jenkins can take a third of your available memory. And to get it running you spend some time setting up Tomcat or similar, Apache or similar, then configuring everything, users... and finally you can use it! It's actually not a lot of work, but for my taste it's overkill for simple automated testing.

So for those use cases where Jenkins is overkill, I present Garlic.

https://github.com/davidmoreno/garlic

Garlic uses the onion HTTP C library, and is developed in C++. It's actually very few lines of code and can be improved in many ways, but it does the job. Currently it allows only one testing project (although that will change as needed) and it uses only an external executable as the test. This will normally be a shell script that runs whatever programs do the real testing; if the shell script succeeds, the tests pass.

5 Minutes Long Howto

After downloading it from github and compiling, all that garlic needs to run is an ini config file, for example this one from onion:
# Garlic test config file. Check http://github.com/davidmoreno/garlic

[global]
#username=coralbits
#password=coralbits

name=Garlic test onion

[server]
port=1234
address=0.0.0.0

[scripts]
on_back_to_normal=mail -s "[Onion testing] is back to normal ✓" dmoreno@coralbits.com
on_error=mail -s "[Onion testing] error ✗" dmoreno@coralbits.com

check=git fetch && [ $( git diff origin/master | wc -l ) = 0 ]
update=git pull

test=./auto.sh
It has very simple sections. Global can set the username and password, as there can only be one (maybe in the future I will add PAM support, to use local users). This is also where the user sets a name for the server.

In server the user can set the address and port for the server. SSL support could be added quite easily in the future.

Finally, in the scripts section the user sets the scripts to be executed when the different events happen: errors, back to normal, check, update, and finally the test. Some of these are not yet implemented (I'm looking at you, update), and maybe in the future the format will change to allow several tests, and even several projects.

But this is the main format. Check the github site for more info on each parameter.

Automatic testing and running

To run the server the user must run garlic pointing to the ini file:

./garlic ~/onion/build/test/onion.ini

To do the automated testing, system facilities such as cron must be used, adding a line like:

*/5 * * * * ~/garlic/build/src/garlic ~/onion/build/test/onion.ini --check-and-run

The --check-and-run command line option runs the scripts/check script, and if it succeeds (exit code 0), then runs the tests.
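
The check-then-test flow can be sketched in a few lines of C. This is not garlic's source (which is C++); the names run_command, check_and_run and the run_outcome enum are hypothetical, and the commands are simply handed to the shell through system().

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Runs a shell command and returns its exit code, or -1 if it could not be
 * run or did not exit normally. */
static int run_command(const char *cmd) {
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}

enum run_outcome { CHECK_FAILED, TESTS_PASSED, TESTS_FAILED };

/* Runs the tests only when the check command exits with code 0. */
enum run_outcome check_and_run(const char *check_cmd, const char *test_cmd) {
    if (run_command(check_cmd) != 0) return CHECK_FAILED;
    return run_command(test_cmd) == 0 ? TESTS_PASSED : TESTS_FAILED;
}
```

Calling check_and_run with the check and test entries of the ini file above would mirror what the cron line triggers every five minutes.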

Contributions appreciated

Source code is MPL licensed: just use it.

Current source code is at github: https://github.com/davidmoreno/garlic

It is simple and functional, but more features can be added, such as multiple tests, PAM authentication, HTTPS support... Or even fewer Unicode smileys, as they don't display properly in every browser.

2012-11-27

Ajax file upload progress

For a project I work on, we needed file upload progress bars. This is hell, as everybody who has tried it knows. The first time I tried, I used an external CGI helper that saved the file to a temporary file with a special id, and polled for the progress using AJAX. Inefficient, error-prone, and it can even fill your tmp.
Of course there are more complete solutions, but the basic idea on most of them is the same: upload it on one place, sneak into the progress using AJAX on another place.
But HTML5 was here to help us. Wasn't it?

Introducing HTML5 + jQuery + Javascript upload progress!

First the how to use it:
  1. Download it from https://github.com/davidmoreno/jquery-file-upload-progress
  2. Load jQuery and uploadProgress.js in your HTML.
  3. Add this:
    $('form').uploadProgress( { onProgress: myprogressbar } )
  4. Set up your myprogressbar(progress) function, which will be called as the file is uploaded.
  5. Done
This myprogressbar function might be something as simple as a jQuery UI progress bar:
myprogressbar = function(p){
   $( ".selector" ).progressbar({ value: p });
}

How it works.

From a bird's-eye view, this plugin uploads the form using XHR, tracks the progress event (new in HTML5), and when finished, if it can, replaces the current content with the content returned by the XHR.
Many, really many, things can fail when doing the upload: the replacement of the page may fail, or the upload may fail, or it may succeed but return an error page. And last, but not least, the progress event might be missing. In all those situations we should at least make it work as in the good ol' days, and just forget about the progress. I'm looking at you, IE.

How it really works.

Hooking the progress.

jQuery does a great job of allowing expansion and customization. We take advantage of this to replace the built-in xhr method of the .ajax function and add the progress tracking. It will call the opts.onProgress method with a percentage of completion.
It also hooks other methods to the AJAX query: onBeforeSend, onComplete, and onError.
Last but not least, it's important to send the proper data: new FormData(form); is your friend here.

Getting the answer.

As you know, using XHR means AJAX, which in turn means not simply loading a page. It returns whatever the form's target wanted to send back, normally a new page to show. So I prepared a special function that takes that answer and just replaces the current content with it. It's dirty and does not always work, but it is normally what you want: just show the new page.
It can fail if the CSS is different (for example when an error page is returned), and under some other conditions, so some work could easily improve the behaviour, but normally it just works.
If it cannot replace the contents, it just does a GET on the POSTed page. If your POST just changes the state of that page, and a reload shows that change (which is quite normal), it will also work.
This is implemented at line 18.
These are just the two options I'm using, but when you decorate your form with $.uploadProgress, you can also set an onComplete or onError to do different things when completed.

Pros & cons

Pros
  • Just works
  • Easily customizable
  • If something fails, normal post behaviour is preserved
  • Uses jQuery
Cons
  • Delicate, does not work on IE
  • Hack to show the page returned by the AJAX POST. May fail and not show progress.
  • Uses jQuery

Clone it!

To clone it:
git clone git@github.com:davidmoreno/jquery-file-upload-progress.git 

Pull requests are welcome!

Thanks

2012-06-08

Patents, copyright and trademarks.

This is the first of two articles about licenses and copyright. This first one sets out the differences between patents, copyright and trademarks, as these elements are quite important for licenses, and not everybody differentiates them properly.


Disclaimer: IANAL. I Am Not a Lawyer. Specifically I am not your lawyer, and everything said here is purely informative and my opinion as a developer. If some of the stated data is wrong, or can be expanded, please use the comments and I will update the main article with the important bits.


Patents.

Patents are a monopoly right given by governments to inventors so that there is incentive to spend time and money inventing new things. The government gives you the right to decide who has the right to make new copies of your invention for a limited period of time.

For software it's not 100% clear what is patentable, where, and what exactly a patent covers. It changes from place to place: in the USA it's possible to patent software, while in the EU it's not, although the European Patent Office is accepting submissions and granting them.

If somebody contributes to your software, he/she/it may or may not give you patent rights. This is quite important, and when in doubt you have to ask, as you may get sued later.

Copyright.

Once you have created some work, you have the right to decide who can copy it.

Every single line of code has its own owner. Line 101 can belong to A and line 102 to B. Copyright can be transferred. When a programmer is hired to do some work for a company, the copyright belongs to whoever pays the developer, unless otherwise agreed. It is as if he, as developer and owner of his lines, decides to give their copyright to the company for money.

If there is no copyright transfer, the copyright belongs to whoever wrote the code, under the given license.

Whoever writes a piece of code decides the license of that snippet; if it's incompatible with the license of the rest of the program, then there is no right to redistribute the whole without breaching the license. Because of this the license of the rest of the program is normally adopted, but not always. For example, parts of the Linux kernel are GPLv2 or above, other parts are just GPLv2, and some small bits are BSD. As not everything is "or above", the Linux kernel is effectively GPLv2, and should be redistributed under those terms.

If somebody cedes ownership to another entity, that entity is the new owner. The owner of the code can relicense it.

That means that if part of your program is GPLv2 and you redistribute it, and somebody collaborates by adding some code, you cannot relicense it (to any license, including proprietary ones) without this developer ceding his copyright. Copyright can be shared, so he still has the ownership, but so do you.

That is the reason why Sun/Oracle asked contributors to sign a copyright assignment document before accepting any collaboration to OpenOffice, as they were making their own private version with some non-public add-ons, and even providing it to IBM (IBM Symphony). Because of this extra copyright assignment there were not so many developers, and some forks were made keeping, as could not be otherwise, the original LGPL license. Finally LibreOffice was created with most of the old developers, and it did not ask for any copyright assignment. Collaborations skyrocketed, although the first big change they made was code cleanup.


Trademarks.

Trademarks are the right to be identified with some name or logo.

Trademark owners decide who can use the trademark and for what. Some trademark owners are quite protective of their brands (most, actually), and others are more permissive within some limits (I can only think of Debian and Linux).

Whichever license you choose, the trademark always applies. For example Firefox is MPL, and anybody can fork it and add or remove features, but the fork cannot be called Firefox. At most, maybe, you can say it's based on Mozilla Firefox, but do it with care. Firefox is Mozilla's only, and if you change just one line of code, then it's not Firefox. For example, Debian had to rename their Firefox-based browser to Iceweasel, as they apply security updates at their own pace, normally quite close to mainline, but sometimes faster, and most importantly they want to keep that right.

This way, even if your software is BSD and somebody just compiles it, it cannot be promoted in any way under your product name, unless you give them the right to do so.
It is interesting that in some places, if your brand becomes so well known that it identifies a generic term, you may lose the trademark right; Xerox, for example, is in danger of losing its trademark.

Next

In the next instalment I will review the most common licenses: GPLv2, GPLv3, LGPL, AGPL, BSD, MPL and APL.


2011-11-30

Cooperative Oterm, part II: Sharing using UUIDs

In the previous instalment we saw how several browsers can access the same terminal, but that requires logging in as the proper user. Here we will discuss how to overcome that limitation, and the security implications of the implemented method.

Using Universal Unique IDs.

First of all we should give each terminal instance a unique id within our system, so that no matter where you come from, you can access this terminal. It can be an internal identifier, random or not.

Also, if we have a truly random ID that is very difficult to guess, it can be used as a "password" to access the terminal. This requires several things, such as a trusted network where third parties cannot see what you write, and a channel to communicate the id. The first should already be solved before using oterm, as communication goes through HTTPS; if you decide to use HTTP you are already giving everything away. For passing the UUID users must use their own means, be it an (encrypted) email, chat, or shouting out the 36 characters.

So we want a truly random ID, which cannot be guessed easily and which cannot collide. What better than UUIDs? Linux is also very kind and provides a random UUID generator at /proc/sys/kernel/random/uuid.
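
Reading a UUID from that file takes only a few lines of C. The sketch below is Linux-specific and is not Oterm's code; the function name read_kernel_uuid is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Reads one UUID from the Linux kernel generator into out (37 bytes:
 * 36 characters plus NUL). Returns 0 on success, -1 on failure. */
int read_kernel_uuid(char out[37]) {
    FILE *f = fopen("/proc/sys/kernel/random/uuid", "r");
    if (!f) return -1;
    /* fgets stops after 36 characters, leaving the trailing newline unread. */
    char *ok = fgets(out, 37, f);
    fclose(f);
    if (!ok || strlen(out) != 36) return -1;
    return 0;
}
```

Every read of the file yields a fresh value, so each new terminal can simply read it once when it is created.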

Modifying the code

Once the idea is clear we have to make the changes. First, instead of accessing the terminals at /term/[ID] we access them at /uuid/[UUID], and remove the old interface. We could keep it, but there is no need for it. Then we added a dictionary to map the UUIDs to the processes they control.

One very important point is authentication. There is now no need to authenticate to use a UUID terminal if you know the UUID, but you must still authenticate to create new terminals. So now the PAM module protects not the whole URL tree, but only /term/, as this is where we ask for the status of the sessions and create new terminals. If you cannot access some information, a dialog will tell you so and ask you to log in.

Also, as we had to change a lot to make this happen, we updated the structure to use the latest onion developments, and even created new ones to ease their use. So now we make better use of the url handlers, opack with support for directory packing, and a little restyling of the presentation.

Some pool mode bugs were also solved, such as when a connection times out and is removed but still needs to process some data.

How to share the terminal.

So now when a user creates a new terminal after logging in, the terminal is accessed using a UUID, and this UUID does not need a password, as knowledge of the UUID implies that access has been granted.

So if you want to share the terminal, just give out the address, maybe changing the server name for the proper IP or domain name.

In the next instalment we will see how to share the terminal on the wider internet using UPnP. It will not be coded in, but used externally, with minimal integration.


2011-11-02

Cooperative Oterm, part I: Sharing the ajax xterm.

Oterm, onion's web ajax-powered xterm emulator, is getting a cooperative mode. With it you can send a link to another user and connect straight to a Linux terminal. It will handle the sharing, chat and even NAT traversal where possible.

In these posts I will explain the steps that were performed to get to the cooperative Oterm.

Source code as it develops can be found at github.

The several readers one writer problem.
Shared terminal example
For several readers to be able to use the same terminal, there has to be some mechanism that allows several readers of the output of the program running in the terminal.

Basically we have one buffer with the latest data from the terminal, and several web readers, each of which can be at a different position in the stream. It is important to note that each reader normally joins at a different point of the stream, so we must first send the buffer contents so it can show something to the user, and then wait for new data as it appears; because of possible delays, one reader may get data from one point and the others from another.

Internally Oterm does not use websockets (yet?); it just reads from one URL, the out stream, and writes POSTs to another, the in stream. The out stream is blocking, which means that the browser asks for the URL but does not get the answer until some data is ready. This is done using one thread per connection.

Write to the internal buffer
So the first change needed was to stop blocking the client's connection on the terminal itself until some data was ready, and instead have the output of the spawned command (normally bash) written to an internal buffer as soon as data is ready. This is a circular buffer in which we just write at the end, overwriting old data. The default size is 16KB (a terminal of 120x136 full of data, for example).
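
A circular buffer of this kind can be sketched as follows. This is an illustration only, not Oterm's code: the struct and function names are hypothetical and the size is tiny so the wrap-around is easy to follow.

```c
#include <stddef.h>

#define RING_SIZE 16  /* tiny for illustration; the default above is 16KB */

struct ring {
    char data[RING_SIZE];
    size_t total;   /* bytes ever written; total % RING_SIZE is the next slot */
};

/* Appends len bytes, silently overwriting the oldest data once full. */
void ring_write(struct ring *r, const char *src, size_t len) {
    for (size_t i = 0; i < len; i++) {
        r->data[r->total % RING_SIZE] = src[i];
        r->total++;
    }
}

/* Copies the most recent bytes (at most RING_SIZE) into out, oldest first,
 * and returns how many were copied. out must hold RING_SIZE bytes. */
size_t ring_snapshot(const struct ring *r, char *out) {
    size_t count = r->total < RING_SIZE ? r->total : RING_SIZE;
    size_t start = r->total - count;
    for (size_t i = 0; i < count; i++)
        out[i] = r->data[(start + i) % RING_SIZE];
    return count;
}
```

Keeping a running total instead of a wrapped index also gives every byte an absolute position in the stream, which comes in handy when several readers are at different points.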

To do this we used the new poller facilities, which use an event thread that is woken up when new data arrives on a read stream. Perfect for this. Now when data is ready we use a pthread condition to signal the thread waiting on the connection to write the available data.

So now a connection no longer waits for data from the terminal, just for a pthread condition.

Clients on different stream positions
Once we have the data in the buffer, when a client is waiting for data we must know from where it needs the data. We cannot rely on a server-side marker for each connection given the connectionless nature of HTTP, and doing tricks here might not be wise, as we might lose some characters if a request is not properly sent. We also don't know when a connection is no longer in use, and using timers is not nice given that we block indefinitely.

As a solution to all this, we send the current position in the stream at the end of every packet. So now when the client asks for data, it asks for data from a specific position. If there is data, the server returns it promptly. If there is not, we wait on the condition until new data is ready. For this we use a new control command specific to Oterm. Neither users nor applications need to be aware of it, as the emitter is the server and the client intercepts and interprets it.
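
The position-based protocol can be sketched without any networking. In the example below, which is hypothetical and not Oterm's code, the client passes the position it last received and gets back the new bytes plus the position to ask for next time; a return of 0 marks the point where a real server would wait on the condition.

```c
#include <stddef.h>

#define BUF_SIZE 16  /* tiny for illustration */

struct stream_buf {
    char data[BUF_SIZE];
    size_t end;   /* absolute stream position just past the newest byte */
};

void stream_append(struct stream_buf *b, const char *src, size_t len) {
    for (size_t i = 0; i < len; i++) {
        b->data[b->end % BUF_SIZE] = src[i];
        b->end++;
    }
}

/* Copies everything after client_pos into out (capacity BUF_SIZE) and stores
 * the position the client should send next time. Returns the byte count;
 * 0 means "nothing new", where a real server would block on the condition. */
size_t stream_read_from(const struct stream_buf *b, size_t client_pos,
                        char *out, size_t *next_pos) {
    size_t oldest = b->end > BUF_SIZE ? b->end - BUF_SIZE : 0;
    if (client_pos > b->end) client_pos = b->end;   /* bogus position */
    if (client_pos < oldest) client_pos = oldest;   /* too far behind */
    size_t count = b->end - client_pos;
    for (size_t i = 0; i < count; i++)
        out[i] = b->data[(client_pos + i) % BUF_SIZE];
    *next_pos = b->end;
    return count;
}
```

A reader that has fallen further behind than the buffer holds simply resumes from the oldest byte still available, which matches the "latest data only" nature of the circular buffer.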

We moved the session problem of where in the buffer to read from to the client, where we have proper state tracking.

Future enhancements
In future articles I will talk about opening NAT for onion processes, so users from outside our network can access our Oterm. This is a security-sensitive issue, as users may open their computers straight to the World Wide Web, but UPnP gives us the tools, and in some situations it's the perfect way to work cooperatively on a Linux terminal.