Software engineers go crazy for the most ridiculous things. We like to think that we’re hyper-rational, but when we have to choose a technology, we end up in a kind of frenzy, bouncing from one person’s Hacker News comment to another’s blog post, floating helplessly toward the brightest light.
Having more fault tolerance than you need might sound fine, but consider the cost: not only would you be doing much more I/O, but you might also be switching from a mature system to something relatively threadbare.
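To make "much more I/O" concrete, here is a back-of-the-envelope sketch. All of the numbers are my own illustrative assumptions, not figures from the anecdote or any benchmark:

```python
# Rough write amplification from replication alone. Real systems add
# further I/O for repair, compaction, and hinted handoff on top of this.
replication_factor = 3      # a common Cassandra-style default (assumed)
writes_per_second = 200     # illustrative workload (assumed)
bytes_per_write = 1_024     # ~1 kB per record (assumed)

single_node_io = writes_per_second * bytes_per_write
replicated_io = single_node_io * replication_factor

print(f"unreplicated: {single_node_io:,} B/s of write I/O")
print(f"replicated:   {replicated_io:,} B/s "
      f"({replication_factor}x, before serving a single read)")
```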
Having read the Dynamo paper, and knowing Cassandra to be a close derivative, I understood that these distributed databases prioritize write availability over consistency, as well as over basically every feature present in a traditional RDBMS.
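Cassandra actually makes this trade-off explicit: consistency is tunable per query. A minimal sketch, assuming the DataStax Python driver and a hypothetical `shop` keyspace with an `orders` table:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Hypothetical cluster, keyspace, and table names for illustration.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")

# ConsistencyLevel.ONE: the write succeeds as soon as a single replica
# accepts it. Maximum write availability; replicas may briefly disagree.
fast_write = SimpleStatement(
    "INSERT INTO orders (id, total) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ONE,
)
session.execute(fast_write, (42, 99.95))

# ConsistencyLevel.QUORUM: a majority of replicas must acknowledge.
# Stronger consistency, paid for in latency and availability.
safe_read = SimpleStatement(
    "SELECT total FROM orders WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(safe_read, (42,)).one()
```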
I was surprised to discover that one student’s company had chosen to architect their system around Kafka. This was surprising because, as far as I could tell, their business processed just a few dozen very high value transactions per day, not LinkedIn’s peak of 10 million messages a second.
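The gulf between those two workloads is easy to understate, so it’s worth doing the arithmetic. Here "a few dozen" is my rounding of the anecdote to 50:

```python
# How far apart are these workloads? (assumed: "a few dozen" ~ 50/day)
company_msgs_per_day = 50
linkedin_msgs_per_sec = 10_000_000
linkedin_msgs_per_day = linkedin_msgs_per_sec * 86_400  # seconds per day

ratio = linkedin_msgs_per_day / company_msgs_per_day
print(f"LinkedIn's daily peak volume is ~{ratio:.1e}x the company's")
# ~1.7e+10, i.e. about ten orders of magnitude apart
```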
By the time Amazon decided to move to service-oriented architecture, they had around 7,800 employees and did over $3 billion in sales.
Use of large-scale dataflow engines like Hadoop and Spark can be particularly funny: very often a traditional DBMS is better suited to the workload, and sometimes the volume of data is so small that it could even fit in memory. Did you know you can buy a terabyte of RAM for around $10,000? Even if you had a billion users, that would give you 1 kB of RAM per user to work with.
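That per-user figure checks out, and it’s easy to play with (the hardware price is the one assumed in the text):

```python
# Sanity-check the claim: 1 TB of RAM spread over a billion users.
ram_bytes = 10 ** 12   # 1 TB, ~$10,000 per the text
users = 10 ** 9        # one billion users

per_user = ram_bytes // users
print(f"{per_user} bytes (~1 kB) of RAM per user")
```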