Jul 27, 2012

Under the hood of Real-Time Bidding

Originally written for boom, the latest in performance display advertising.

Within the display advertising environment, Real-Time Bidding has evolved from curiosity to mainstay over the last 18 months. Though real-time bidding can appear to be a pretty simple value proposition (advertisers bid, win and display on an impression-by-impression basis), don’t make the mistake of thinking it’s unsophisticated. The technology behind RTB is extraordinary. At Criteo, we’ve built out our real-time capabilities from scratch in the last two years, to handle 7 billion daily RTB requests on 16 exchanges covering more than 30 countries.

Big challenge, big solution.  Let’s take a look under the hood, shall we?  
In order to participate in an RTB exchange, media buyers first need to implement the exchange’s bidding protocol. Apart from the nascent OpenRTB initiative, there is not yet an industry-wide standard, so development work is required for each exchange. Getting everything set up is usually quite straightforward, especially with the right software architecture (and the right team … another story in itself). However, since the ecosystem is made up of many companies figuring this out on the fly – no names – things can occasionally get tricky and surprises can happen.
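Because each exchange speaks its own dialect, one common way to contain the per-exchange work is an adapter layer that maps every incoming request onto a single internal representation, so the bidding logic only ever sees one format. The sketch below assumes two request shapes: one loosely modelled on OpenRTB 2.x field names (`imp`, `banner`, `site`, `user.buyeruid`), and a made-up `exchange_x` format. The `BidRequest` dataclass and both parsers are hypothetical illustrations, not Criteo’s code or any exchange’s actual specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class BidRequest:
    """Internal, exchange-agnostic view of one impression (hypothetical)."""
    request_id: str
    width: int
    height: int
    domain: str
    user_id: Optional[str]  # cookie-matched ID, if the exchange sent one


def parse_openrtb_like(payload: dict) -> BidRequest:
    # Field names loosely follow OpenRTB 2.x; only the first impression is read.
    imp = payload["imp"][0]
    banner = imp.get("banner", {})
    return BidRequest(
        request_id=payload["id"],
        width=banner.get("w", 0),
        height=banner.get("h", 0),
        domain=payload.get("site", {}).get("domain", ""),
        user_id=payload.get("user", {}).get("buyeruid"),
    )


def parse_exchange_x(payload: dict) -> BidRequest:
    # Hypothetical proprietary format: size as "WxH", matched user ID under "uid".
    width, height = (int(v) for v in payload["size"].split("x"))
    return BidRequest(
        request_id=payload["auction_id"],
        width=width,
        height=height,
        domain=payload["site_domain"],
        user_id=payload.get("uid"),
    )


# One adapter per exchange; supporting a new exchange means adding one entry here.
ADAPTERS: Dict[str, Callable[[dict], BidRequest]] = {
    "openrtb_like": parse_openrtb_like,
    "exchange_x": parse_exchange_x,
}


def normalize(exchange: str, payload: dict) -> BidRequest:
    """Translate a raw exchange payload into the internal BidRequest."""
    return ADAPTERS[exchange](payload)
```

With this split, a surprise protocol change on one exchange is confined to that exchange’s adapter instead of leaking into the bidding logic.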
Now, what about the bidding process? In a trivial case, where an advertiser might simply enter a bid of ten cents across the board, the process is not a big deal. But in the non-trivial case, a bid should be the end result of the system applying sophisticated algorithms to a combination of data sent in the request and the media buyer’s own data. This data can include things like:
  • User properties: user ID (cookie matching required), user data stored by your platform
  • Zone properties: size, position (e.g. above or below the fold)
  • Website properties: domain, URL, site content.
  • Business rules: block lists, ad verification for brand safety.
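To make the list above concrete, here is a toy bid function that combines request data (zone, site) with the buyer’s own data (a user profile looked up by matched ID) and applies business rules before pricing. Everything here is an assumption for illustration: the `Impression` fields, the per-user click-rate lookup, the above-the-fold multiplier and the target cost per click are invented, and a real bidder would use a trained model rather than hand-set factors.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Impression:
    domain: str
    width: int
    height: int
    above_fold: bool
    user_id: Optional[str]  # None when cookie matching failed


def compute_bid(
    imp: Impression,
    user_profiles: Dict[str, float],   # hypothetical: user_id -> predicted click rate
    blocklist: Set[str],
    allowed_sizes: Set[Tuple[int, int]],
    target_cpc: float = 0.50,          # assumed value of a click, in dollars
    default_ctr: float = 0.0005,       # assumed click rate for unknown users
) -> Optional[float]:
    """Return a CPM bid in dollars, or None to pass on the impression."""
    # Business rules first: they are cheap and can skip all other work.
    if imp.domain in blocklist:
        return None
    if (imp.width, imp.height) not in allowed_sizes:
        return None

    # Buyer's own data: per-user click rate if we recognise the user.
    ctr = user_profiles.get(imp.user_id, default_ctr) if imp.user_id else default_ctr

    # Zone property: assume above-the-fold placements click 50% better.
    if imp.above_fold:
        ctr *= 1.5

    # Expected value of one impression, scaled to a thousand impressions (CPM).
    return round(ctr * target_cpc * 1000, 4)
```

Checking the block list and banner size before any lookup is a deliberate ordering: under a tight deadline, the cheapest reasons to pass should be evaluated first.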

For the winning bidder, it’s then time to deliver a banner, which could again range from a simple, static banner to a unique, fully personalized ad incorporating multiple products, offers, layouts, etc. 

Up to now, all of this is actually the easy part! And here’s the real catch: once a request has been sent by the exchange, the bid must be received within a very short amount of time, generally under 100 milliseconds. Since typical one-way coast-to-coast latency in the US is around 75 milliseconds, a round trip across the country alone would already blow that budget, so infrastructure becomes the alpha and omega of RTB. Get it wrong and your most clever algorithm won’t even get a chance to bid!
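The arithmetic behind this is worth spelling out: the exchange’s clock covers the request travelling to the bidder and the bid travelling back, so network time counts twice. The sketch below computes what is left for actual decision-making and wraps the bidder in a hard timeout that answers "no bid" instead of answering late. The 100 ms timeout and 75 ms latency come from the paragraph above; the 10 ms safety margin, the 5 ms "nearby" latency, the function names and the thread-pool approach are assumptions for illustration, not a description of Criteo’s stack.

```python
import concurrent.futures
from typing import Callable, Optional

# Shared pool so a timed-out computation doesn't block the caller while it finishes.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def processing_budget_ms(timeout_ms: float, one_way_latency_ms: float,
                         margin_ms: float = 10.0) -> float:
    """Time left for bidding logic once the network round trip is paid for."""
    return timeout_ms - 2 * one_way_latency_ms - margin_ms


def bid_within_deadline(compute: Callable[[], Optional[float]],
                        budget_ms: float) -> Optional[float]:
    """Run the bidder; return its bid, or None (no bid) if it misses the budget."""
    if budget_ms <= 0:
        return None  # no time at all: don't even start
    future = _POOL.submit(compute)
    try:
        return future.result(timeout=budget_ms / 1000.0)
    except concurrent.futures.TimeoutError:
        # Python threads can't be killed; the late result is simply ignored.
        return None


# Coast to coast: 100 - 2 * 75 - 10 = -60 ms, so bidding across the country is impossible.
print(processing_budget_ms(100, 75))
# Bidder hosted near the exchange, assumed 5 ms each way: 100 - 10 - 10 = 80 ms for logic.
print(processing_budget_ms(100, 5))
```

The negative budget in the first case is the whole argument for placing bidders physically close to each exchange: no amount of algorithmic optimization can recover time lost on the wire.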
The next challenge is to squeeze as much algorithmic intelligence as possible into whatever precious time is left. This leads to another problem: storing and retrieving data quickly enough within the bidding process. Then there are the issues of logging and crunching huge amounts of data, and of detecting and fixing technical issues. Quickly. Not to mention unannounced specification changes that break the protocol, but I said “no names.” Oh, and did I mention 24/7/365 availability? The list goes on and on.
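On the data side, one common pattern (assumed here, not something the post confirms) is to keep hot data such as user profiles in process memory with an expiry, so the bidding path never waits on a remote store. Below is a minimal sketch of such a time-to-live cache with a size bound; the class name, TTL and capacity are illustrative.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """In-memory cache whose entries expire after ttl_s seconds (LRU-bounded)."""

    def __init__(self, ttl_s: float, max_items: int,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_s
        self._max = max_items
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data: "OrderedDict[Hashable, tuple[float, V]]" = OrderedDict()

    def put(self, key: Hashable, value: V) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        if len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used entry

    def get(self, key: Hashable) -> Optional[V]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # stale: treat as a miss
            return None
        self._data.move_to_end(key)  # mark as recently used
        return value
```

On a miss, a bidder built this way would bid with defaults (or pass) and refresh the entry off the hot path rather than block on the slow store; that refresh logic is left out of the sketch.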
RTB is far from mundane, and it represents a tremendous technical challenge.  But it’s also the future of media buying, and so we keep working at it.

Jul 18, 2012

Learning about data centers

I've spent a lot of time visiting data centers lately, with more visits on the horizon. Supersonic growth, you see :)

Not that I'm complaining, mind you. I find data centers totally fascinating. Aiming for perfect, 100% availability. Designed to survive pretty much any natural or man-made disaster. Plain and simple at first sight, yet so complex and intricate when you scratch the surface. Secret, cold and lonely places, but throbbing with invisible energy. Are server racks a modern-day Stonehenge, erected to worship faceless gods? Mystical considerations aside, data centers are the alpha and the omega of modern civilization: our everyday life (and sometimes the difference between life and death) depends on their availability. Face it: in the 21st century, Norns are weaving the web of fate with fiber optic cables.

For any self-respecting engineer, the hunger to understand how data centers work should be irresistible. There's so much to learn: construction, power, cooling, cabling, network, servers, etc. Where do you start? Actually, the question really is: where CAN you start? The data center industry is quite opaque. The veil is rarely lifted and some questions are hardly ever answered.

Of course, you should visit as many data centers as you can. I never refuse any opportunity, even when I'm not actively looking for hosting space. The more you see, the more you learn... and the more you realize how much there is still to learn. Keep your eyes open, use your common sense (water pipes above the racks? Hmm), ask a million questions. Try to get in touch with the guys actually running the site. They may not look like much, but with a bit of social engineering, they will give you the real lowdown on the site. Chances are you'll meet VERY colorful characters too, so don't miss out.

In addition to visits, there are a few, precious engineering resources that you may find useful.

Books: most of them are a complete waste of time and money (especially if they say "Green IT" in the title). However, I can vouch for the quality of these two:
  • "Administering Data Centers: Servers, Storage, and Voice over IP" by K. Jayaswal (John Wiley & Sons, 2005, ISBN-13: 978-0471771838): a very good place to start. All the introductory concepts are there, and newcomers will learn a lot about data center infrastructure and about IT infrastructure in general (servers, network, redundancy, etc.).
  • "Maintaining Mission Critical Systems in a 24/7 Environment" by P. Curtis (Wiley-Blackwell, 2nd edition, 2011, ISBN-13: 978-0470650424): fantastic book, but definitely not recommended as an introduction. This book focuses on the daily maintenance of data center infrastructure (not IT infrastructure) and goes into A LOT of detail on how things work and what to do to keep them running. Quick example: 9 pages on how to check the quality of fuel deliveries for your generators... you get the idea :) A mine of technical information, and definitely the bedtime book for data center staff.
Online resources: sadly, most of the material out there is either watered down to brain-dead level, or just infected with marketing/PR/sales bullshit (maybe that's the same thing, hey?). If you want technical information, you'll find it here. I'm not affiliated with any vendor, in case you were wondering :)
  • Energy University @ Schneider Electric: lots of free courses on data center technology (power, cooling, etc). Tested and approved.
  • Online courses @ Siemens: power only (breakers, transformers, etc). Very good stuff too.
So there you go. Learn, learn, learn. Till next time, get rackin', boy!