Aug 21, 2013

HOWTO: compiling ffmpeg + x264 + MP3 + AAC + Xvid + AMR on Ubuntu 12.04

By popular demand (according to page view stats), I've decided to refresh the Dec'2008 post ("HOWTO: compiling ffmpeg + x264 + MP3 + Xvid + AMR on Ubuntu 8.x") for Ubuntu 12.04. Enjoy :)

ffmpeg is THE audio/video conversion tool. Unfortunately, the default build included in Ubuntu is usually quite outdated, as well as lacking support for many codecs.

The purpose of this article is to show you how you can build a fresh, up to date version of ffmpeg supporting (almost) all major codecs. This procedure was successfully performed on a vanilla Ubuntu 12.04 system.


0) Prerequisites

First, we need to enable the 'multiverse' repository in /etc/apt/sources.list, because that's where some of our external libraries live. All you have to do is uncomment the line similar to:
deb http://eu-west-1.ec2.archive.ubuntu.com/ubuntu/ precise multiverse

Don't forget to refresh the repos:
$ sudo apt-get update

Then, we need to make sure that git and svn are installed. We'll need them to fetch source code.
$ sudo apt-get install git subversion

Let's also add some mandatory build tools:
$ sudo apt-get install build-essential nasm pkg-config 

Now would be a good time to decide where you're going to build all those sources. Just create a temporary directory anywhere you like (you'll need about 150MB).

The x264 build requires a recent version of the yasm assembler. Let's take care of it:
$ sudo apt-get remove yasm 
$ wget http://www.tortall.net/projects/yasm/releases/yasm-1.2.0.tar.gz 
$ tar xvfz yasm-1.2.0.tar.gz 
$ cd yasm-1.2.0 
$ ./configure --prefix=/usr/local 
$ make 
$ sudo make install 
$ yasm --version
yasm 1.2.0 Compiled on Aug 21 2013. 

Done! One final note before we start building: if you have an existing installation of ffmpeg, you may run into linking issues caused by conflicting library versions. My advice is to remove all existing copies of libav* (libavcodec and so on) which may be present in /usr/lib, either by uninstalling them with APT or by deleting the .a and .so files.
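A quick way to check whether any libav packages are already installed (this is just a check; adapt the cleanup to whatever it reports):
$ dpkg -l | grep libav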


1) Fetching the ffmpeg sources

First of all, let's get the latest ffmpeg source:
$ git clone git://source.ffmpeg.org/ffmpeg.git ffmpeg

Of course, you could just go ahead with configure, make, make install and be done with it. Unfortunately, it's not that simple. Go ahead, run configure:

$ cd ffmpeg
$ ./configure --prefix=/usr/local
[output removed]
Creating config.mak and config.h...

Take a closer look at the output, especially at the 'Enabled encoders' section. A number of major formats, like AAC, MP3, x264 or Xvid, are missing. Can you live without these? Probably not...

Why, oh why are they missing? Take another look at the output of the configure command:
libfaac enabled no
libmp3lame enabled no
libx264 enabled no
libxvid enabled no

These encoders are missing because they're handled by external libraries which are not part of the ffmpeg source package. Chasing them all is a bit of a pain in the #$$ and hopefully this article will help!

2) Configuring ffmpeg... and failing

Let's go wild and enable these additional encoders: AAC, MP3, Theora, Vorbis, x264, xvid, AMR, GSM. Let's also build shared libraries, which are useful if you're running multiple copies of ffmpeg:

$ ./configure --prefix=/usr/local --enable-gpl --enable-version3 --enable-nonfree --enable-shared --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libfaac --enable-libgsm --enable-libmp3lame --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libxvid

It will inevitably lead to something like:
ERROR: libfaac not found

That's ok, we haven't installed the external libraries required by our ffmpeg build. Let's get to it!

3) Installing AAC, MP3, Theora, Vorbis, AMR, GSM

These are readily available in the Ubuntu repository, let's add them:
$ sudo apt-get install libfaac-dev libmp3lame-dev libtheora-dev libvorbis-dev libopencore-amrnb-dev libopencore-amrwb-dev libgsm1-dev zlib1g-dev libgpac1-dev

4) Installing libx264

x264 is a free library for encoding H.264/AVC video streams. It can be installed with APT using 'apt-get install libx264-dev' but let's make sure we have both the latest ffmpeg and the latest x264.

So, let's fetch the x264 sources and build them:
$ git clone git://git.videolan.org/x264.git
$ cd x264
$ ./configure --prefix=/usr/local --enable-shared
$ make
$ sudo make install


5) Installing libxvid

Let's fetch the xvid sources and build them:
$ wget http://downloads.xvid.org/downloads/xvidcore-1.3.2.tar.gz 
$ tar xvfz xvidcore-1.3.2.tar.gz 
$ cd xvidcore-1.3.2/build/generic 
$ ./configure --prefix=/usr/local 
$ make
$ sudo make install

6) Configuring ffmpeg... and succeeding!

We should have everything we need now. Let's try that configure command again:
$ ./configure --prefix=/usr/local --enable-gpl --enable-version3 --enable-nonfree --enable-shared --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libfaac --enable-libgsm --enable-libmp3lame --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libxvid
[output removed]
Creating config.mak, config.h, and doc/config.texi...

All right, let's build it. Time for coffee(s), this is a rather long build:
$ make
$ sudo make install
$ sudo ldconfig -v

Cool, huh? Now, let's check this new ffmpeg:
$ which ffmpeg
/usr/local/bin/ffmpeg

$ ffmpeg -encoders|grep -E "mp3|xvid|aac|gsm|amr|x264|theora|vorbis"
ffmpeg version N-55691-gca7f637 Copyright (c) 2000-2013 the FFmpeg developers
  built on Aug 21 2013 15:44:17 with gcc 4.6 (Ubuntu/Linaro 4.6.3-1ubuntu5)
  configuration: --prefix=/usr/local --enable-gpl --enable-version3 --enable-nonfree --enable-shared --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libfaac --enable-libgsm --enable-libmp3lame --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libxvid
  libavutil      52. 42.100 / 52. 42.100
  libavcodec     55. 29.100 / 55. 29.100
  libavformat    55. 14.100 / 55. 14.100
  libavdevice    55.  3.100 / 55.  3.100
  libavfilter     3. 82.100 /  3. 82.100
  libswscale      2.  5.100 /  2.  5.100
  libswresample   0. 17.103 /  0. 17.103
  libpostproc    52.  3.100 / 52.  3.100
 V..... libx264              libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (codec h264)
 V..... libx264rgb           libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 RGB (codec h264)
 V..... libxvid              libxvidcore MPEG-4 part 2 (codec mpeg4)
 V..... libtheora            libtheora Theora (codec theora)
 A..X.. aac                  AAC (Advanced Audio Coding)
 A..... libfaac              libfaac AAC (Advanced Audio Coding) (codec aac)
 A..... libopencore_amrnb    OpenCORE AMR-NB (Adaptive Multi-Rate Narrow-Band) (codec amr_nb)
 A..... libgsm               libgsm GSM (codec gsm)
 A..... libgsm_ms            libgsm GSM Microsoft variant (codec gsm_ms)
 A..... libmp3lame           libmp3lame MP3 (MPEG audio layer 3) (codec mp3)
 A..X.. vorbis               Vorbis
 A..... libvorbis            libvorbis (codec vorbis)

Congratulations, your brand new ffmpeg can now encode AAC, MP3, Theora, Vorbis, x264, xvid, AMR and GSM :)
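If you want a quick sanity check, a conversion along these lines should exercise a couple of the new encoders (input.avi and output.mp4 are just placeholder names, and the quality settings are a matter of taste):
$ ffmpeg -i input.avi -c:v libx264 -crf 23 -c:a libmp3lame -q:a 4 output.mp4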

7) Cleaning up (optional)
If, like me, you keep building the latest version, you will eventually end up with a lot of unnecessary libraries in /usr/local/lib. There's nothing really wrong with keeping old versions around, but for the sake of clarity, you may want to remove the old shared libraries, i.e. the ones that the lib*.so symlinks no longer point to.

That's it for today :)

Arduino: LCD thermometer

I am truly and utterly in love with the Arduino project. There is no simpler and easier way to learn both programming and electronics. Believe me, the smartest gift for a 12-year-old kid today is a C beginner book ("Beginning C with Arduino" is a perfect fit: read and approved!), an Arduino starter kit and some guidance to get going. If this had been around 30 years ago, a lot of innocent components would have been spared an untimely death at my n00b hands :D

I usually practice what I preach, so my own kids (10 and 12) are on their way. I can't believe how fast they learn; you should see them reading schematics, wiring at light speed and correcting my fat-finger mistakes: "Dad, you got this one wrong, it must go to the ground". Awesome.

Anyway, here's a cool little project I've built once I could actually lay my hands on the board again. This is a mix between two projects included in the starter kit, i.e. the Love-o-meter (temperature sensor) and the Crystal Ball (LCD screen). Fritzing schematic and code are on GitHub.

The TMP36 temperature sensor is connected to analog input A0. Its value is read every second by the program. The LCD screen is wired like in the Crystal Ball project, with a potentiometer to adjust brightness.

To avoid displaying "jumpy" temperatures, I'm storing and averaging the last 256 samples. This is implemented using a simple array-based circular buffer. Once the average temperature has been computed, it's displayed on the LCD screen.

A couple more things:

  • I'm defining a custom character (°), using a 5x7 bitmap.
  • I'm using dtostrf() to convert a float into a string. Normally, sprintf() would be used, but the Arduino library doesn't provide float support in sprintf().

Here's the full code.



Till next time... Hack at the world!

Aug 17, 2013

Node.js, part 5.1: don't you C?

Leslie Lamport once said (before many of you were even born): "You know you have a distributed system when the crash of a computer you've never heard of stops you from getting any work done". The Internet sure proved him right, as demonstrated again last night by Google and their worldwide crash...

However, when distributed systems work as intended, interesting things do happen. For instance, if one was so inclined, one could reimplement a server in a different language while preserving interfaces and things-would-still-work.

And so, as I sipped my morning coffee, I became very much fucking inclined indeed to rewrite yesterday's Node.js zeromq server in C. Call it my stupid idea of the day... or just an interesting comparison of two pieces of code standing 40 years apart :) In the real world, one good reason to do this would be to improve performance of critical parts of your application: divide, rewrite, conquer!

Here's the ANSI C89-compliant code (also on Github), the only deviation being the use of snprintf(), which only made it into C99 :)



Easy to build (no Makefile, no Maven and its XML crap):
ubuntu@ip-10-35-165-93:~$ gcc -ansi -pedantic -Wextra -Wall -o worker worker.c -lzmq
worker.c: In function ‘main’:
worker.c:29:2: warning: format ‘%d’ expects argument of type ‘int’, but argument 3 has type ‘size_t’ [-Wformat]
worker.c:34:2: warning: implicit declaration of function ‘snprintf’ [-Wimplicit-function-declaration]

Let's start this C worker and hit it with some requests coming from our Node.js app:
ubuntu@ip-10-35-165-93:~$ ./worker 
Worker ready on tcp://*:3000
received some stuff 15, size=2
sent result 30
received some stuff 8, size=1
sent result 16

Nice. In all fairness, yesterday's Node.js code is quite a bit easier to read. I did add more error management clutter in the C program, but doing without it is not an option. Love them or hate them, modern languages will spit a stack trace out when they crash: most of the time, this is enough to start investigating. On the other hand, C will say: "core dumped" and leave you crying all the way home. So yes, you HAVE to write more C code to get roughly the same level of information. You can also see the immense power of toString() and dynamic typing, although the latter does introduce huge caveats.

So, what did we learn? Breaking down your app not only allows you to scale it piece by piece: it also gives you the freedom to rewrite critical parts, whatever your motivation is. To me, this is true modularity, something that I've rarely seen in so-called modular architectures based on Java or C#. Having classes is nice, but services are what you really need.

And yeah, for ultimate control and performance, C cannot be defeated. I'm quite scared to see it disappear from many curriculums. Kids, listen to the old man: you must learn C. You'll thank me later.

End of sermon. Now go in peace and hack at the world!

Aug 16, 2013

Node.js, part 5: Zero the hero!

And you thought I was on holiday?

Time to hack our Node app again and introduce another great piece of software called zeromq. On top of having one of the coolest logos, zeromq is a high-performance, low-latency messaging system that you simply cannot afford to ignore when you need to build highly scalable apps (IMHO). Other contenders would be RabbitMQ (awesome and rock-solid, but not really lightweight) and NSQ (written by the bit.ly folks).

The basic idea will be to have our web app call a remote server through zeromq. The remote server will perform a super fancy calculation and send the result back to the web app. I can feel that you're excited already, so let's get started :)

Both our zeromq machines (the client and the server) require a little bit of setup: on top of Node.js and npm (as described in previous posts), we need to install the zeromq library. Once again, the one available in the Ubuntu repo is way too old, so let's build a fresh version:

$ wget http://download.zeromq.org/zeromq-3.2.3.tar.gz
$ tar xvfz zeromq-3.2.3.tar.gz
$ cd zeromq-3.2.3
$ ./configure
$ make
$ sudo make install

Now, let's add the zeromq bindings for Node.js, namely the zeromq.node library aka zmq:

$ sudo apt-get install build-essential (required by the zmq install)
$ npm install zmq

Now we're ready to roll. Let's keep it simple and use a nice request-reply pattern: the web app will send a request to the remote server and get a response back. Zeromq supports many other patterns; you'll find more info in the official documentation. Definitely worth a read: zeromq is a superb piece of software engineering art. Lightweight, simple, elegant... ah, love at first sight.

But I digress (that fine bottle of Leffe, probably). Let's look at our zeromq server first.

BTW, since I guess it's time to stop mourning SCCS ;), I've created a proper Github account where you'll find all pieces of code posted on this blog. Can't say I love Git dearly, but a little progress every now and then is okay, I guess.



Nice and short, huh? Create a 'reply' socket, listen on port 3000/TCP and send back twice what we got. And yes, everything in life is an 'int', so there.
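In case the embedded code above doesn't render for you (it's also on Github), here's a minimal sketch of what such a server boils down to with the zmq module. Variable names and log messages are mine, not necessarily those of the original:

// Reply socket: receive a number, send back twice its value
var zmq = require('zmq');
var responder = zmq.socket('rep');

responder.on('message', function(request) {
  var value = parseInt(request.toString(), 10);
  console.log('Received: ' + value);
  responder.send(String(value * 2));
});

responder.bind('tcp://*:3000', function(err) {
  if (err) throw err;
  console.log('Reply socket listening on port 3000');
});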

Now, let's look at the corresponding part in the client, i.e. our web app:



Just as simple. Create a 'request' socket, connect it to our remote server and, in a separate function, fire some data, read the answer and display it.
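Again, a rough sketch in case the snippet above doesn't display. The worker's address is a placeholder you would replace with your own server:

// Request socket: connect to the remote worker
var zmq = require('zmq');
var requester = zmq.socket('req');
requester.connect('tcp://WORKER_IP:3000');
console.log('Connected to zeromq server');

// Fire a value at the worker and log the reply when it comes back
function compute(value) {
  requester.once('message', function(reply) {
    console.log('Zeromq response: ' + reply.toString());
  });
  console.log('Zeromq request: ' + value);
  requester.send(String(value));
}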

Here's the full code:



Logs (or it didn't happen, right?): display all entries and get one id twice

local0.info<134>: Aug 16 17:29:52 10.37.166.19 {"message":"*** NODE APP STARTED ***"}
local0.info<134>: Aug 16 17:29:52 10.37.166.19 {"message":"Connected to zeromq server"}
local0.info<134>: Aug 16 17:29:52 10.37.166.19 {"message":"Web server started"}
local0.info<134>: Aug 16 17:29:52 10.37.166.19 {"message":"Connected to memcache"}
local0.info<134>: Aug 16 17:29:52 10.37.166.19 {"message":"Connected to database"}
local0.info<134>: Aug 16 17:30:04 10.37.166.19 {"message":"Request received, id=0"}
local0.info<134>: Aug 16 17:30:04 10.37.166.19 {"message":"Finding all documents"}
local0.info<134>: Aug 16 17:30:12 10.37.166.19 {"message":"Request received, id=51e3ce08915082db3df32c08"}
local0.info<134>: Aug 16 17:30:12 10.37.166.19 {"message":"Cache miss, key 51e3ce08915082db3df32c08. Querying..."}
local0.info<134>: Aug 16 17:30:12 10.37.166.19 {"message":"Item found: {\"_id\":\"51e3ce08915082db3df32c08\",\"x\":25}"}
local0.info<134>: Aug 16 17:30:12 10.37.166.19 {"message":"Zeromq request: 25"}
local0.info<134>: Aug 16 17:30:12 10.37.166.19 {"message":"Zeromq response: 50"}
local0.info<134>: Aug 16 17:30:12 10.37.166.19 {"message":"Stored key=51e3ce08915082db3df32c08, value=25"}
local0.info<134>: Aug 16 17:30:15 10.37.166.19 {"message":"Request received, id=51e3ce08915082db3df32c08"}
local0.info<134>: Aug 16 17:30:15 10.37.166.19 {"message":"Cache hit,  key=51e3ce08915082db3df32c08, value=25"}
local0.info<134>: Aug 16 17:30:15 10.37.166.19 {"message":"Zeromq request: 25"}
local0.info<134>: Aug 16 17:30:15 10.37.166.19 {"message":"Zeromq response: 50"}

Let's take a few steps back and look at what we built over the last posts:


  • a simple Node.js web app...
  • fetching data from an Elasticache cluster...
  • fed by a MongoDB server...
  • logging remotely to a syslog server...
  • calling a remote server through zeromq...
  • all of it open-source...
  • all of it cloud-based...
  • each part easily scalable, independently of the others.
And above everything, very little code: 159 lines (comments included), with the help of a few well-chosen libraries. 

Makes me want to be a developer again. Food for thought ;)

Till next time, my friends.

PS: In case you were wondering, "Zero the hero" is a Black Sabbath song, although one could argue that the Cannibal Corpse cover is actually even better.

Aug 5, 2013

Node.js, part 4: the Big Kahuna syslog!

To paraphrase the immortal Samuel Jackson in Pulp Fiction: "Logs! The cornerstone of any nutritious application!". And my Node.js app doesn't have any! As Mr Brando once said "Ah, the horror, the horror!". At the time, he probably didn't worry about log management, but the quote is definitely relevant for that too :D

I'm an old-fashioned guy, so the first thing that comes to my mind is "syslog". We could certainly find fancier ways (yes, I'm looking at you, scala-*), but syslog will do for now.

Once again, I'll use an Ubuntu 12.04 instance in EC2, which happens to have a running rsyslogd :) Let's take a look at the main config file, /etc/rsyslog.conf.

We'll use UDP to ship our logs. However, because of this unbelievable 2-year-old bug, we can't use port 514 (the default syslog port): we'll use 5140 instead. Don't forget to open 5140/UDP in the security group of the EC2 instance!

# provides UDP syslog reception
$ModLoad imudp
$UDPServerRun 5140

Let's now look at the default rules, located in /etc/rsyslog.d/50-default.conf. Let's keep it very simple and make sure we log everything. We'll also add a new template, because the default one misses some useful information, such as priority (duh!).

$template TraditionalFormatWithPRI,"%pri-text%: %timegenerated% %HOSTNAME% %syslogtag%%msg:::drop-last-lf%\n"
*.*    /var/log/messages;TraditionalFormatWithPRI

I hear my sysadmin team booing the ugliness of this setup. Yeah, I love you too, bitchez :-P

Anyway, this should be enough, now we just need to restart rsyslogd:
ubuntu@ip-10-34-245-176:~$ sudo service rsyslog restart
rsyslog stop/waiting
rsyslog start/running, process 4639

Now, let's move back to our Node.js instance and try to talk to our syslog server:
ubuntu@ip-10-37-166-19:~$ echo "<14>Node sez: I see you"|nc -u -v -w0 10.34.245.176 5140

If you tail /var/log/messages on the syslog server, the last line should say:
user.info<14>: Aug  5 14:40:18 Node sez: I see you

OK. Now that we have a running syslog server, how is the Node.js app going to send it data? The good news is that there's a large number of packages to do this. The bad news is, most of them don't seem to work very well... 

Ain2 worked right out of the box. If you're only looking for a syslog module, I would recommend this one. However, I ended up picking winston, an interesting logging framework supporting many different transports (file, HTTP, several databases, etc.), including syslog through the winston-syslog module.

In theory, at least. For the life of me, I couldn't get this thing to work, despite a lot of creative hacking and creative swearing too. Fortunately, winston-syslogUdp works: "It was created when the authors experienced challenges with winston-syslog". Yeah, tell me about it! Good job, guys.

winston requires a newer version of Node.js than the one available in the Ubuntu repo, so let's take care of this upgrade first:
ubuntu@ip-10-37-166-19:~$ sudo add-apt-repository ppa:chris-lea/node.js
ubuntu@ip-10-37-166-19:~$ sudo apt-get update
ubuntu@ip-10-37-166-19:~$ sudo apt-cache show nodejs|grep Version
Version: 0.10.15-1chl1~precise1
Version: 0.6.12~dfsg1-1ubuntu1
ubuntu@ip-10-37-166-19:~$ sudo apt-get install nodejs=0.10.15-1chl1~precise1
ubuntu@ip-10-37-166-19:~$ node -v
v0.10.15

Now, let's update the Node.js packages and install the winston packages:
ubuntu@ip-10-37-166-19:~$ npm update
ubuntu@ip-10-37-166-19:~$ npm install winston winston-syslogudp

After some more hacking and swearing (but no drinking, I promise), I adapted my program to use winston, with the following setup:
  • don't log to the console 
  • log to a local file
  • log to a remote syslog server
Here's how you do this, it's really simple:
// Don't log to console
winston.remove(winston.transports.Console);
// Log to local file
winston.add(winston.transports.File, { filename: 'log.txt' });
// Log to remote syslog
winston.add(winston.transports.Syslog, {"host":"10.34.245.176", "port":"5140", "localhost":"10.37.166.19"});
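
With this setup in place, a single logging call goes to every configured transport at once. For instance, the first line in the logs below comes from a call like:
winston.info("*** NODE APP STARTED ***");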

This is quite self-explanatory, as you will see below. However, winston can also do some pretty advanced stuff, I encourage you to read the documentation and go log-crazy ;)

While I was at it, I cleaned the code a little bit and also solved the MongoDB ObjectId validation issue like this. Not sure if it's the canonical way, but it works.
// Check that the ObjectId is valid
try {
    oid = require('mongodb').ObjectID.createFromHexString(query.id);
}
catch (err) {
    winston.info("Invalid OID, err="+err);
    response.end();
    return;
}

Here's the full code.



Let's run this, hit it with some requests and look at the logs :)

ubuntu@ip-10-37-166-19:~$ node www5.js &
ubuntu@ip-10-37-166-19:~$ tail -f log.txt 
{"level":"info","message":"*** NODE APP STARTED ***","timestamp":"2013-08-05T15:07:36.678Z"}
{"level":"info","message":"Web server started","timestamp":"2013-08-05T15:07:36.823Z"}
{"level":"info","message":"Connected to memcache","timestamp":"2013-08-05T15:07:36.850Z"}
{"level":"info","message":"Connected to database","timestamp":"2013-08-05T15:07:36.876Z"}

ubuntu@ip-10-34-245-176:~$ tail -f /var/log/messages 
local0.info<134>: Aug  5 15:07:34 10.37.166.19 {"message":"*** NODE APP STARTED ***"}
local0.info<134>: Aug  5 15:07:34 10.37.166.19 {"message":"Web server started"}
local0.info<134>: Aug  5 15:07:34 10.37.166.19 {"message":"Connected to memcache"}
local0.info<134>: Aug  5 15:07:34 10.37.166.19 {"message":"Connected to database"}

GET http://ec2-54-247-53-161.eu-west-1.compute.amazonaws.com:8080/?id=0
local0.info<134>: Aug  5 15:11:51 10.37.166.19 {"message":"Request received, id=0"}
local0.info<134>: Aug  5 15:11:51 10.37.166.19 {"message":"Finding all documents"}

GET http://ec2-54-247-53-161.eu-west-1.compute.amazonaws.com:8080/?id=51e3ce08915082db3df32bf7
local0.info<134>: Aug  5 15:12:19 10.37.166.19 {"message":"Request received, id=51e3ce08915082db3df32bf7"}
local0.info<134>: Aug  5 15:12:19 10.37.166.19 {"message":"Cache miss, key 51e3ce08915082db3df32bf7. Querying..."}
local0.info<134>: Aug  5 15:12:19 10.37.166.19 {"message":"Item found: {\"_id\":\"51e3ce08915082db3df32bf7\",\"x\":8}"}
local0.info<134>: Aug  5 15:12:19 10.37.166.19 {"message":"Stored key=51e3ce08915082db3df32bf7, value=8"}

The same one, right away:
local0.info<134>: Aug  5 15:12:21 10.37.166.19 {"message":"Request received, id=51e3ce08915082db3df32bf7"}
local0.info<134>: Aug  5 15:12:21 10.37.166.19 {"message":"Cache hit,  key=51e3ce08915082db3df32bf7, value=8"}

The same one, more than 60 seconds later :)
local0.info<134>: Aug  5 15:14:15 10.37.166.19 {"message":"Request received, id=51e3ce08915082db3df32bf7"}
local0.info<134>: Aug  5 15:14:15 10.37.166.19 {"message":"Cache miss, key 51e3ce08915082db3df32bf7. Querying..."}
local0.info<134>: Aug  5 15:14:15 10.37.166.19 {"message":"Item found: {\"_id\":\"51e3ce08915082db3df32bf7\",\"x\":8}"}
local0.info<134>: Aug  5 15:14:15 10.37.166.19 {"message":"Stored key=51e3ce08915082db3df32bf7, value=8"}

Now, let's try a bogus ObjectID:
GET http://ec2-54-247-53-161.eu-west-1.compute.amazonaws.com:8080/?id=666
local0.info<134>: Aug  5 15:17:22 10.37.166.19 {"message":"Request received, id=666"}
local0.info<134>: Aug  5 15:17:22 10.37.166.19 {"message":"Invalid OID, err=Error: Argument passed in must be a single String of 12 bytes or a string of 24 hex characters"}

Woohoo! This doesn't crash the app anymore. By the way, if you use Chrome, you'll certainly see a lot of annoying lines saying:
local0.info<134>: Aug  5 15:14:15 10.37.166.19 {"message":"Request received, id=undefined"}
You can ignore them: this is just Chrome trying over and over and over again to get the favorites icon (favicon.ico)... This is a known bug.

And so now, our web app haz logz, and at 131 lines of code (comments included) it's not bloated either. Pretty amazing.

That's it for today. Keep hacking :)

Aug 3, 2013

Node.js + MongoDB, part 3: exit memcached, enter ElastiCache

In a previous post, we added caching to our Node.js + MongoDB web application, thanks to an EC2 instance running memcached.

As you may know, AWS supports its own brand of caching, aka ElastiCache: "Amazon ElastiCache is protocol-compliant with Memcached, a widely adopted memory object caching system, so code, applications, and popular tools that you use today with existing Memcached environments will work seamlessly with the service".

Seamlessly? Hmmm. Alright, let's take a look :)

First, let's use the AWS console to create an ElastiCache cluster. We'll start small, with a single micro node supporting a little over 200MB of cache. More than we need... and for free :)

Wait a couple of minutes and the cluster will come online. Let's check that the single node is accessible from our EC2 instance (no need to mess with the security group, port 11211 is open by default):
ubuntu@ip-10-48-249-35:~$ echo stats| nc memcache.9vficl.0001.euw1.cache.amazonaws.com 11211

This looks like memcached stats alright. Ok, now let's change a single line in our app, to use ElastiCache instead of our homemade instance:

// MemcacheClient = new mc.Client("10.234.177.74");
MemcacheClient = new mc.Client("memcache.9vficl.0001.euw1.cache.amazonaws.com");

Now, let's run the app:
ubuntu@ip-10-48-249-35:~$ node www4.js 
Web server started
Connected to memcache
Connected to database

Nice. Time for queries and console output!
Mac:~ julien$ curl http://ec2-54-216-3-139.eu-west-1.compute.amazonaws.com:8080/?id=51e3ce08915082db3df32bfc

Request received, id=51e3ce08915082db3df32bfc
Cache miss, key 51e3ce08915082db3df32bfc. Querying...
Item found: {"_id":"51e3ce08915082db3df32bfc","x":13}
Stored key=51e3ce08915082db3df32bfc, value=13

Let's replay the request quickly:
Request received, id=51e3ce08915082db3df32bfc
Cache hit,  key=51e3ce08915082db3df32bfc, value=13

And once more after 60 seconds (the item TTL):
Request received, id=51e3ce08915082db3df32bfc
Cache miss, key 51e3ce08915082db3df32bfc. Querying...
Item found: {"_id":"51e3ce08915082db3df32bfc","x":13}
Stored key=51e3ce08915082db3df32bfc, value=13

Let's stop the web app and look at the cache stats:
ubuntu@ip-10-48-249-35:~$ echo stats| nc memcache.9vficl.cfg.euw1.cache.amazonaws.com 11211|grep [g,s]et
STAT cmd_get 3
STAT cmd_set 2
STAT cmd_config_get 134
STAT cmd_config_set 1
STAT get_hits 1
STAT get_misses 2

God knows that 'seamlessly' is one of the most overused words in the IT business, but I've got to admit, this is truly seamless :)

As usual with AWS, you also get a bunch of graphs out of the box. That's a nice plus.


So that's the 'cache' part of ElastiCache, and it does what it's supposed to do. What about the 'elastic' part?

Adding more nodes to the cluster is literally one click away: "Add node" :) After a few minutes, the new node is added to the cluster.

Let's update our web app and relaunch it:
// MemcacheClient = new mc.Client("10.234.177.74");
MemcacheClient = new mc.Client(["memcache.9vficl.0001.euw1.cache.amazonaws.com", "memcache.9vficl.0002.euw1.cache.amazonaws.com"]);

ElastiCache has its own client, which allows the app to auto-discover new nodes without having to restart. Very nice, but the client is only available for Java (grumble) and PHP (bleh). Hopefully other languages are on the way.

Now, how about we hit the 2-node cluster with a little more traffic? This ugly script will do the trick:
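If the embedded script doesn't show up for you, any loop that keeps requesting known ObjectIds will do. Here's a rough Node.js equivalent (my own sketch, reusing the hostname and ids from the earlier examples):

// Throwaway load generator: request random known ids in a tight loop
var http = require('http');

var host = 'http://ec2-54-216-3-139.eu-west-1.compute.amazonaws.com:8080';
var ids = ['51e3ce08915082db3df32bf0', '51e3ce08915082db3df32bf3',
           '51e3ce08915082db3df32bfb', '51e3ce08915082db3df32bfc'];

function hit() {
  var id = ids[Math.floor(Math.random() * ids.length)];
  http.get(host + '/?id=' + id, function(res) {
    res.resume();               // drain the response body
    res.on('end', hit);         // then fire the next request
  }).on('error', function(err) {
    console.log('Request failed: ' + err.message);
    setTimeout(hit, 1000);
  });
}

hit();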



Let's run this in 2 or 3 local shells and wait for a little while. Coffee break :)

Alright, what do the graphs say?






In the beginning, we see some traffic on node 1 only, then some on node 2 only, as I tested them independently. Then, I updated the app to support both nodes and restarted it. As you can see, traffic is pretty well balanced between the two servers, which means that the mc client works :)

One last thing to conclude: going back to the web app console output, we can clearly see the parallel asynchronous nature of Node.js at work.

Stored key=51e3ce08915082db3df32bfd, value=14
Stored key=51e3ce08915082db3df32bff, value=16
Stored key=51e3ce08915082db3df32bfe, value=15
Request received, id=51e3ce08915082db3df32bf0
Request received, id=51e3ce08915082db3df32bf2
Request received, id=51e3ce08915082db3df32bf1
Cache hit,  key=51e3ce08915082db3df32bf0, value=1
Cache hit,  key=51e3ce08915082db3df32bf2, value=3
Cache hit,  key=51e3ce08915082db3df32bf1, value=2
Request received, id=51e3ce08915082db3df32bfb
Cache hit,  key=51e3ce08915082db3df32bfb, value=12
Request received, id=51e3ce08915082db3df32bfd
Request received, id=51e3ce08915082db3df32bfc
Cache hit,  key=51e3ce08915082db3df32bfc, value=13
Request received, id=51e3ce08915082db3df32bf4
Request received, id=51e3ce08915082db3df32bf3
Cache hit,  key=51e3ce08915082db3df32bf3, value=4
Request received, id=51e3ce08915082db3df32bf5
Cache hit,  key=51e3ce08915082db3df32bfd, value=14

The two lines I've highlighted above come from the processing of the same HTTP request. However, a lot of stuff happened in between because of other requests. Did I create threads? No. Did I mess with 'worker pools' or similar atrocities? No. I just wrote my code and let the framework do the rest.

That's it for today. Next time, we'll look at... ah, you'll see :)

Happy hacking.

Node.js + MongoDB, part 2: here comes memcached!

Let's elaborate on the Node.js + MongoDB app. I lied: it's not really worth $1 billion... yet. It surely will once we've added memcached to the mix ;)

Once again, we'll use a couple of EC2 instances running Ubuntu 12.04: one for the Node.js web server and one for the memcached server. MongoDB will still be served from MongoLab.

Start your instances and let's configure our memcached server first. Since it requires port 11211 to be open, we have to add a couple of rules for 11211/UDP and 11211/TCP in the security group attached to the instance.

Then, let's install memcached:
ubuntu@ip-10-234-177-74:~$ sudo apt-get install memcached

We also need to edit /etc/memcached.conf in order to set the '-l' parameter to the correct IP address (running ifconfig eth0 will confirm the right one to use). In my case, it is 10.234.177.74.

Then, we need to restart memcached:

ubuntu@ip-10-234-177-74:~$ sudo service memcached restart

Now, let's go to the Node.js instance and check that we can access the memcached server:
ubuntu@ip-10-48-161-115:~$ echo stats|nc 10.234.177.74 11211

If you see a lot of stats like I do, you're good to go. If not, double-check the steps above (rules, config file, restart).

Now, let's install the memcached client for Node.js with npm, the Node.js package manager. There are several clients out there; mc looks pretty good and well-maintained :)

ubuntu@ip-10-48-161-115:~$ npm install mc

That's it. Now, let's write some code, yeah! Here's the idea:
  • call the web server with a MongoDB ObjectId, e.g. http://ec2-54-216-3-139.eu-west-1.compute.amazonaws.com:8080/?id=51e3ce08915082db3df32bf7
  • query the memcached server
  • if we hit, job done!
  • if we miss, query the MongoDB server and update the cache
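The heart of the app is a classic cache-aside lookup. Here's a minimal sketch of that logic (my own condensed version; it assumes MemcacheClient and the MongoDB collection are already connected, and caches values for 60 seconds):

// Look up an item by ObjectId: try the cache first, fall back to MongoDB
function getItem(id, collection, callback) {
  MemcacheClient.get(id, function(err, response) {
    if (!err && response[id]) {
      console.log("Cache hit,  key=" + id + ", value=" + response[id]);
      return callback(response[id]);
    }
    console.log("Cache miss, key " + id + ". Querying...");
    var oid = require('mongodb').ObjectID.createFromHexString(id);
    collection.findOne({ _id: oid }, function(err, item) {
      console.log("Item found: " + JSON.stringify(item));
      // Cache the value for 60 seconds
      MemcacheClient.set(id, String(item.x), { flags: 0, exptime: 60 }, function(err, status) {
        console.log("Stored key=" + id + ", value=" + item.x);
      });
      callback(item.x);
    });
  });
}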


Alright, let's run this and hit it with some requests:
Mac:~ julien$ curl http://ec2-54-216-3-139.eu-west-1.compute.amazonaws.com:8080/?id=0
{"_id":"51e3ce08915082db3df32bf0","x":1}
{"_id":"51e3ce08915082db3df32bf1","x":2}
{"_id":"51e3ce08915082db3df32bf2","x":3}
{"_id":"51e3ce08915082db3df32bf3","x":4}
{"_id":"51e3ce08915082db3df32bf4","x":5}
{"_id":"51e3ce08915082db3df32bf5","x":6}
{"_id":"51e3ce08915082db3df32bf6","x":7}
{"_id":"51e3ce08915082db3df32bf7","x":8}
{"_id":"51e3ce08915082db3df32bf8","x":9}
{"_id":"51e3ce08915082db3df32bf9","x":10}
{"_id":"51e3ce08915082db3df32bfa","x":11}
{"_id":"51e3ce08915082db3df32bfb","x":12}
{"_id":"51e3ce08915082db3df32bfc","x":13}
{"_id":"51e3ce08915082db3df32bfd","x":14}
{"_id":"51e3ce08915082db3df32bfe","x":15}
{"_id":"51e3ce08915082db3df32bff","x":16}
{"_id":"51e3ce08915082db3df32c00","x":17}
{"_id":"51e3ce08915082db3df32c01","x":18}
{"_id":"51e3ce08915082db3df32c02","x":19}
{"_id":"51e3ce08915082db3df32c03","x":20}
{"_id":"51e3ce08915082db3df32c04","x":21}
{"_id":"51e3ce08915082db3df32c05","x":22}
{"_id":"51e3ce08915082db3df32c06","x":23}
{"_id":"51e3ce08915082db3df32c07","x":24}
{"_id":"51e3ce08915082db3df32c08","x":25}

Mac:~ julien$ curl http://ec2-54-216-3-139.eu-west-1.compute.amazonaws.com:8080/?id=51e3ce08915082db3df32bfc
{"_id":"51e3ce08915082db3df32bfc","x":13}

Console output:
Request received, id=51e3ce08915082db3df32bfc
Cache miss, key 51e3ce08915082db3df32bfc. Querying...
Item found: {"_id":"51e3ce08915082db3df32bfc","x":13}
Stored key=51e3ce08915082db3df32bfc, value=13

Memcached stats:
ubuntu@ip-10-234-177-74:~$ echo stats|nc 10.234.177.74 11211|grep [s,g]et
STAT cmd_get 1
STAT cmd_set 1
STAT get_hits 0
STAT get_misses 1

Let's try the same request again (within 60 seconds!): +1 get, +1 hit!
Request received, id=51e3ce08915082db3df32bfc
Cache hit,  key=51e3ce08915082db3df32bfc, value=13

STAT cmd_get 2
STAT cmd_set 1
STAT get_hits 1
STAT get_misses 1

And 60 seconds later, the memcached item should have disappeared: +1 get, +1 miss, +1 set
Cache miss, key 51e3ce08915082db3df32bfc. Querying...
Item found: {"_id":"51e3ce08915082db3df32bfc","x":13}
Stored key=51e3ce08915082db3df32bfc, value=13

STAT cmd_get 3
STAT cmd_set 2
STAT get_hits 1
STAT get_misses 2

Pretty cool, huh? A basic Node.js + memcached + MongoDB app in less than 100 lines of code, comments and logging included.

However, the really great stuff is what you DON'T see:
  • Node.js automatically handles asynchronous requests and callbacks,
  • Building a memcached cluster will hardly have any impact on the application code, 
  • Building a MongoDB cluster (replica sets, sharding) will be completely transparent.
  • And of course, you can easily create your own pool of Node.js servers and load balance them.
That's A LOT of scalability for free as far as the application developer is concerned.

Food for thought... Something tells me this isn't the last post on these topics. I hope you're enjoying this as much as I am. Time for a drink, I'm exhausted. Cheers!