Introducing underscore.hpp

Add some functional code to your C++!


Using underscore.hpp can save programming time, at the cost of higher memory requirements in your C++ projects. It allows easy list, string and file manipulation using functional idioms.


Why should C++ be more difficult to program than Python, curly braces and syntax aside? Granted, with Python we have list comprehensions, generators and generator expressions. But we can also just split a string, trim it and slice it.
With standard C++ these tasks are difficult, and with Boost it only gets a bit better.
I want to introduce you to underscore.hpp, an underscore.js-inspired C++ library that eases the use of lists, files and strings.
It can be downloaded from, and contributed to, at https://github.com/davidmoreno/underscore.hpp
By lists I mean the abstract concept of something iterable, not specifically std::list or std::vector (although internally std::vector gets preferential treatment).

String example



C++/underscore.hpp

#include <iostream>
// plus the underscore.hpp headers from the repository
using namespace std;
using namespace underscore;

int main(void){
 cout<<_("Hello world").replace(" ", ", ").lower()<<endl;
}

5 allocs, 5 frees, 173 bytes allocated

C++/std -- Not really equivalent


#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

// From http://stackoverflow.com/questions/1494399/how-do-i-search-find-and-replace-in-a-standard-string
void myReplace(std::string& str, const std::string& oldStr, const std::string& newStr){
  size_t pos = 0;
  while((pos = str.find(oldStr, pos)) != std::string::npos){
     str.replace(pos, oldStr.length(), newStr);
     pos += newStr.length(); // skip past the text just inserted
  }
}

int main(void){
 std::string hello="Hello world";
 myReplace(hello," ",", ");
 // lowercase in place
 std::transform(
  std::begin(hello), std::end(hello),
  std::begin(hello), ::tolower);
 std::cout<<hello<<std::endl;
}

4 allocs/frees, 136 bytes


Python

print ( ', '.join( "Hello world".split(' ') ).lower() )
26,011 allocs, 20,713 frees, 4,977,609 bytes allocated

All three print "hello, world".
This is a simple find and replace followed by a conversion to lowercase. Nothing magic, just some candy for string manipulation.

What about lists?

Using Python, list manipulation is really simple; if you use list comprehensions, suddenly a huge class of problems becomes easy. In C++ we have similar algorithms that can be used, such as the remove/erase idiom or std::transform, but they are convoluted to use (albeit extremely efficient). For example, let's count the number of letters in each word of a list:



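C++/underscore.hpp

#include <iostream>
#include <string>
// plus the underscore.hpp headers from the repository

int main(void){
 std::cout<< '[' <<
  underscore::_("Hello, world! How do you do?")
   .split(' ')
   .map([](const std::string &s){
    return s.size();
   })
   .join(", ")
  << ']' << std::endl;
}

29 alloc/frees, 841 bytes allocated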



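C++/std

#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

int main(){
 std::string hello="Hello, world! How do you do?";

 std::vector<size_t> sizes;
 std::stringstream ss(hello);
 std::istream_iterator<std::string> begin(ss); // yields whitespace-separated words
 std::istream_iterator<std::string> end;
 std::transform( begin, end,
  std::back_inserter(sizes), [](const std::string &w){
   return w.length();
  });
 std::cout<<'[';
 bool first=true;
 for(size_t s: sizes){
  if (!first)
   std::cout<<", ";
  first=false;
  std::cout<<s;
 }
 std::cout<<']'<<std::endl;
}

9 allocs, 9 frees, 253 bytes allocated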


Python

print([len(x) for x in "Hello, world! How do you do?".split(' ')])
26,049 allocs, 20,751 frees, 4,972,941 bytes allocated
So here we get a pattern: Python is easy but takes a lot of memory (and CPU time), while C++/std is difficult but extremely efficient. Also, because it is so difficult, more time is spent looking for the best way to do it, even when the difference between doing it right or wrong is unimportant compared with the time needed to prepare the data.
Finally, coding with C++/underscore.hpp is much easier than vanilla C++, and its efficiency, although lower, is comparable to C++/std.


Let's create a map with the memory information of a Linux system. This will not use all the idioms, it just tries to get the task done:



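C++/underscore.hpp

#include <map>
// plus the underscore.hpp headers from the repository

int main(void){
        std::map<std::string, int> m; // template arguments assumed

        for(auto &l: underscore::file("/proc/meminfo")){
                auto p=l.split(':');
                auto n=p[1];
                if (n.endswith("kB")){ /* ... drop the "kB" suffix from n ... */ }
                if (n.length()>0){ /* ... store the numeric value of n in m[p[0]] ... */ }
        }
        // ... print m["MemFree"] ...
}

512 allocs/frees, 25,316 bytes allocated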

Python

m={}
for l in open("/proc/meminfo").readlines():
        p=l.split(':')
        n=p[1].replace("kB","").strip()
        if n:
                m[p[0]]=int( n )
print(m["MemFree"])
26,714 allocs, 21,416 frees, 5,020,289 bytes allocated

How it works internally

Underscore.hpp just encapsulates the data you give it in a std container. If none is given, it uses a std::vector internally. If you pass it an existing container, it is just a wrapper that provides methods to make working with it easier. Internally it tries to use std algorithms as much as possible, so efficiency should be similar. But there is a big difference: no operation mutates its input. This means a lot of copies, intermediate lists and higher memory usage, but also freedom on the types, thread safety and easier reasoning.
So in the end it is a tradeoff: easier C++ development or more efficient C++ development.
I for one prefer to implement the easier underscore version first and, if needed, go for the more efficient one later.

Future development

This is just the tip of the iceberg. With just the simple map and filter, so many things suddenly become easy to do that simply exploring this world is amazing. Adding string manipulation makes these two worlds fit together like puzzle pieces, as does file manipulation. But there is still more work to do. First of all, I want to try adding concepts, so that needing, for example, the return type of a map is no longer a problem to solve in itself, just something for the compiler to work out.
Also, memory consumption is too high because of the intermediate lists. In the generator.hpp header there are some ideas for a kind of generator, so that data flows from stage to stage and is not copied into the final container until the last generator. The idea is that, instead of first doing one operation on all the elements, we first do all the operations on the first element. In other words, we trade instruction cache locality for data cache locality. It should improve things considerably. But it's hard, so right now there are only some ideas there.
Also, the recent blog post on range concepts is inspiring, so it is worth checking out.


The code is licensed under the Apache License 2.0 and can be used freely under its terms. If you find that some functionality can be improved, or that more functionality would fit in there, please drop me a line.
