In a recent blog post, DHH again questions the status quo, this time with a look at Basecamp's database architecture.
I was surprised to learn that Basecamp continues to use what I call the Big Ass DB architecture – which boils down to keeping all your data in one big SQL database and then building a bigger and bigger server. This goes against the increasingly common wisdom of using distributed NoSQL DBs based on cheap, replaceable hardware, to get to “internet” scale. Another notable site that has gone the Big Ass DB route is StackOverflow.
I’m starting to find wisdom in both Jeff Atwood and DHH, who have pushed these designs. I wish I had the scaling problems these guys have, but I don’t. It is almost certainly better bang for the buck for us to tune our current application and add more features than to adopt a more “advanced” database architecture.
With that said, the secret behind scaling up these architectures is that increasingly significant portions of the database are managed outside the DB itself in huge RAM caches. In fact, DHH follows up his DB post with a picture of his caching hardware.
That is 864GB of RAM.
I’m one of those old dudes who remembers when RAM was a scarce commodity, and David is right, old habits are hard to break. But as I type this in a text box on a machine with 24GB of RAM, it is hard not to notice that times have changed.
But all this fast, cheap RAM has created a whole new set of problems for programmers – namely keeping cached data consistent with the data on disk. To be honest, my own team can be a bit too cavalier when caching data and calculations.
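To make the consistency problem concrete, here is a toy sketch (names and the in-memory "disk" are invented for illustration): a read-through cache with no invalidation happily serves stale data after a write, while a write-through policy keeps cache and disk in sync.

```python
# Toy illustration of cache/disk consistency. The "disk" is just a dict;
# the point is the invalidation logic, not the storage.
disk = {"balance": 100}
cache = {}

def read_through(key):
    if key not in cache:
        cache[key] = disk[key]  # populate the cache on a miss
    return cache[key]

def write_no_invalidate(key, value):
    disk[key] = value           # disk is updated...
    # ...but the cache is not touched: readers now see stale data.

def write_through(key, value):
    disk[key] = value
    cache[key] = value          # cache updated alongside disk

read_through("balance")           # warms the cache with 100
write_no_invalidate("balance", 50)
stale = read_through("balance")   # still 100 -- the cavalier bug
write_through("balance", 50)
fresh = read_through("balance")   # 50, cache and disk agree
```

The careless version is exactly the trap: nothing crashes, the numbers are just quietly wrong until the cache entry happens to be evicted.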
There is currently no easy way to determine where the canonical data in a complex system resides, or how it is updated. My bet is we will see a new set of tools and languages evolve to create abstractions that make it possible for mere mortal programmers like myself to get these architectures right. I’m curious to hear more about DHH’s “russian-doll” architecture. Maybe this stuff will drop in Rails sooner rather than later.
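My guess at what “russian-doll” means, sketched under that assumption: nested fragments whose cache keys incorporate a record’s last-updated stamp, with each parent key derived from its children’s keys. Touching a child changes every key above it, so stale entries are simply never read again; nothing is explicitly invalidated. All names here are hypothetical.

```python
import hashlib

cache = {}

def record_key(name, updated_at):
    # A child's key changes whenever the record is touched.
    return f"{name}/{updated_at}"

def parent_key(name, updated_at, child_keys):
    # The parent's key folds in all child keys, so a child edit
    # cascades upward through the "dolls".
    digest = hashlib.md5("/".join(child_keys).encode()).hexdigest()
    return f"{name}/{updated_at}/{digest}"

def fetch(key, compute):
    # Read-through: recompute only for keys never seen before.
    if key not in cache:
        cache[key] = compute()
    return cache[key]

todos = [{"name": "todo:1", "updated_at": 1},
         {"name": "todo:2", "updated_at": 1}]

def render_project():
    child_keys = [record_key(t["name"], t["updated_at"]) for t in todos]
    key = parent_key("project:1", 1, child_keys)
    return fetch(key, lambda: f"rendered {len(child_keys)} todos")

render_project()              # computes and caches the parent fragment
todos[0]["updated_at"] = 2    # "touch" one child
render_project()              # parent key changed, fragment rebuilt
```

The appeal is that consistency falls out of key construction rather than out of programmers remembering to invalidate, which is exactly the kind of abstraction the paragraph above is hoping for.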