S3 Twitter: What is needed is quick hash append
I dug up a couple of interesting posts from 'Al' at Folknologist (sorry, I can't find Al's full name on his blog).
First is a comment on the circleshare blog regarding Twitter's database scaling issues:
The big problem is the inserts (if the backend is a db), every tweet has to be inserted. Thus even if you have a fast messaging (in memory) the write that accompanies it is relatively slow. In such cases you need some super fast hash append system rather than a database, something that literally just writes to a log like file. (Deletes can be handle by null writes on existing keys).
If somebody has a scalable appender like this in code let me know as I could do with one, especially if I can get it working with S3
Yes indeed, if someone can produce a reliable appender in a cost-effective way using S3, I'd love to see it as well. After some research into S3, though, I don't think it is feasible. Unlike GFS (the Google File System), which supports record appends, S3 does not.
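To make Al's idea a bit more concrete, here is a rough sketch of what a "hash append" store could look like on a local file: every write is appended to a log, a delete is a null write on an existing key, and an in-memory hash maps each key to the offset of its latest record. This is only my guess at what he means, not his code. The class name HashAppendLog, the JSON-lines record format, and the use of a local file are all assumptions for illustration, and it does not solve the S3 problem, since it depends on exactly the append operation that S3 lacks.

```python
import json
import os


class HashAppendLog:
    """Hypothetical append-only key/value log (illustrative names and format)."""

    def __init__(self, path):
        self.path = path
        self.index = {}   # key -> byte offset of its latest live record
        self._size = 0    # current length of the log in bytes
        # Rebuild the in-memory hash by replaying any existing log.
        if os.path.exists(path):
            with open(path, "rb") as f:
                for line in f:
                    record = json.loads(line)
                    self._apply(record["k"], record["v"], self._size)
                    self._size += len(line)
        self._out = open(path, "ab")

    def _apply(self, key, value, offset):
        # A null value is a tombstone: the key is treated as deleted.
        if value is None:
            self.index.pop(key, None)
        else:
            self.index[key] = offset

    def put(self, key, value):
        # One JSON record per line; json.dumps escapes embedded newlines.
        line = (json.dumps({"k": key, "v": value}) + "\n").encode("utf-8")
        self._out.write(line)
        self._out.flush()
        self._apply(key, value, self._size)
        self._size += len(line)

    def delete(self, key):
        # Deletes are just null writes on an existing key.
        self.put(key, None)

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["v"]

    def close(self):
        self._out.close()
```

Nothing is ever rewritten in place, so an insert costs one sequential write plus a hash update. The price is that the log grows forever unless something compacts it, and the index has to be rebuilt by replaying the log on startup.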
Second is his call for a database service for AWS:
AWS is built around an expectation that storage takes place using the highly redundant/reliable S3 infrastructure. This of course makes sense except in the case where one is using a database for storage as opposed to files.
There's the real kicker. I can't think of many significant web applications that don't need at least some database services. Even if an app can make good use of EC2, for something like mass video encoding, at some point the application must store something in a database, which makes EC2 and S3 solutions to only a subset of a web site's problems.