And so here's my current reading list, with some early impressions.
AWS System Administration, by Mike Ryan (O'Reilly, 2015, early release version). Possibly the first book of its kind. Yes, the AWS online documentation is very good, but the humongous amount of information is sometimes a little overwhelming. This book is a nice introduction to most AWS building blocks, with lots of real-life advice and tons of examples. A useful compass to navigate the AWS ocean.
Designing Data-Intensive Applications, by Martin Kleppmann (O'Reilly, 2015, early release version). Subtitled "the big ideas behind reliable, scalable and maintainable systems", this book covers all major concepts and techniques used to build data stores, both for OLTP and analytics: data models, storage and retrieval (yes, you will understand B-trees at last), encoding, replication, etc. Lots of illustrations, lots of examples from current technologies, lots of complex stuff explained in plain English. I like it very much so far.
Learning Spark, by Matei Zaharia et al. (O'Reilly, 2015). A beginner book co-written by the creator of Spark (O'Reilly has another Spark book for advanced readers). This one delivers exactly what the title says and is another fine example of why O'Reilly books are the best: straight to the point and lots of examples (Python, Java, Scala). You'll be coding Spark jobs in no time. Some advanced topics are covered at the end of the book, including machine learning with MLlib.
Next on the pile:
User Story Mapping, Jeff Patton (O'Reilly, 2014) - Key Agile concept! Short version here.
Data Science From Scratch, Joel Grus (O'Reilly, 2015): "Anyone who has some amount of mathematical aptitude and some amount of programming skill has the necessary raw materials to do data science". Sounds pragmatic and bullshit-free :)
PS: anyone from O'Reilly reading this? If you feel so inclined, I'll gladly accept a t-shirt or something. Thank you.