Michael J. Swart

May 24, 2017

A Table Of Contents For the Data Industry

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 10:52 am

DDIAA review of Designing Data-Intensive Applications The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppman, published by O’Reilly.

Martin Kleppmann

In my time exploring the world of data, I often feel intrigued by the unfamiliar. But like most, I’m leery of buzzwords and maybe a little worried that I’m missing out on the thing that we’ll all need to learn to extend our careers beyond the next five years.

Adding to the frustration, I have difficulty evaluating new technologies when the only help I’ve got is a vendor’s brochure, or a blog titled “Why X is way way better than Y”. Balanced, unbiased help is hard to find. And I really value unbiased ideas. Personally, I try not to appear the Microsoft groupie. I value my MVP award, but I also like to stress the “independent” part of that.

I’m also wary of some academic researchers. The theory can sometimes drift too far away from the practical (dba.stackexchange only has 411 results for “normal form” and I suspect many of those are homework questions). Some academics almost seem offended whenever a vendor deviates from relational theory.

That’s why I was so thrilled (and bit relieved) to discover Martin Kleppmann’s Designing Data-Intensive Applications. Martin is an amazing writer who approached his book with a really balanced style. He’s also a researcher with real-world experience that helps him focus on the practical.

Designing Data-Intensive Applications

First let me just say that the book has a really cool wild boar on the cover. The boar reminds me of Porcellino at the University of Waterloo.

Boar

Designing Data-Intensive Applications is a book that covers database systems very comprehensively. He covers both relational systems and distributed systems. He covers data models, fault tolerance strategies and so much more. In fact he covers so many topics that the whole book seems like a table of contents for our data industry.

Here’s the thing. When I read the parts I know, he’s a hundred percent right and that helps me trust him when he talks about the parts that I don’t know about.

Martin talks a lot about distributed systems, both the benefits and drawbacks, and even though you may have no plans to write a Map-Reduce job, you’ll be equipped to talk about it intelligently with those that do. The same goes for other new systems. For example, after reading Martin’s book, you’ll be able to read the spec sheet on Cosmos DB and feel more comfortable reasoning about its benefits (but that’s a post for another day).

Event Sourcing

Martin then goes on to write about event sourcing. Martin is a fan of event sourcing (as are several of my colleagues). In fact I first learned about event sourcing when a friend sent me Martin’s video Turning the Database Inside Out With Apache Samza.
A ridiculously simplified summary is this. Make the transaction log the source of truth and make everything else a materialized view. It allows for some really powerful data solutions and it simplifies some problems you and I have learned to live with. So I wondered whether his chapter on event sourcing would sound like a commercial. Nope, that chapter is still remarkably well balanced.

By the way, when the revolution comes it won’t bother me at all. There’s no longer such thing as a one-size-fits-all data system and the fascinating work involves fitting all the pieces together. I’m working in the right place to explore this brave new world and excited to learn the best way to move from here to there.

Some Other Notes I Made

  • This feels like the RedBook put together by Michael Stonebraker but it deals with more kinds of systems. I hope Martin refreshes this book every few years with new editions as the industry changes.
  • Martin suggests that the C in ACID might have been introduced for the purpose of the acronym. I knew it!
  • Martin calls the CAP theorem unhelpful and explains why. He admits the CAP theorem “encouraged engineers to explore a wider design space” but it is probably better left behind.
  • My wife Leanne doesn’t like fantasy books and she won’t read a book that has a map in the front. My friend Paul won’t read a book without one. Martin is very professional, but his style shows and I love it. He’s got a map at the beginning of every chapter.
  • The quotes at the beginning of each chapter are really well chosen. They come from Douglas Adams, Terry Pratchett and others. But my favorite is from Thomas Aquinas “If the highest aim of a captain were to preserve his ship, he would keep it in port forever.”

Martin writes “The goal of this book is to help you navigate the fast and diverse changing landscapes of technologies for processing and storing data”. I believe he met that goal.

2 Comments »

  1. Thanks for reviewing this. I wouldn’t have heard of it otherwise most likely, and it looks super interesting.

    Comment by Erik Darling — May 24, 2017 @ 7:52 pm

  2. No problem Erik, I think a community crossover is in order now and then. It’s always good to hear what folks outside the Microsoft world think.

    Comment by Michael J. Swart — May 24, 2017 @ 9:07 pm

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress