Software engineers go crazy for the most ridiculous things. We like to think that we’re hyper-rational, but when we have to choose a technology, we end up in a kind of frenzy, bouncing from one person’s Hacker News comment to another’s blog post, floating helplessly toward the brightest light.
Having more fault tolerance than you need might sound fine, but consider the cost: not only would you be doing much more I/O, you might be switching from a mature system to something relatively threadbare.
Having read the Dynamo paper, and knowing Cassandra to be a close derivative, I understood that these distributed databases prioritize write availability at the expense of consistency, as well as basically every feature present in a traditional RDBMS.
I was surprised to discover that one student’s company had chosen to architect their system around Kafka. This was surprising because, as far as I could tell, their business processed just a few dozen very high-value transactions per day, not LinkedIn’s peaks of 10 million messages a second.
By the time Amazon decided to move to a service-oriented architecture, they had around 7,800 employees and did over $3 billion in sales.
Use of large-scale dataflow engines like Hadoop and Spark can be particularly funny: very often a traditional DBMS is better suited to the workload, and sometimes the volume of data is so small that it could even fit in memory. Did you know you can buy a terabyte of RAM for around $10,000? Even if you had a billion users, this would give you 1 kB of RAM per user to work with.
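If you want to check that back-of-envelope figure yourself, here is a minimal sketch. It assumes a decimal terabyte (10^12 bytes), and the names `TERABYTE` and `bytes_per_user` are made up for illustration, not taken from any library.

```python
# Back-of-envelope check: how much RAM does each user get?
# Assumption: a decimal terabyte (10**12 bytes), not a tebibyte (2**40 bytes).
TERABYTE = 10**12


def bytes_per_user(ram_bytes: int, users: int) -> float:
    """Split a fixed RAM budget evenly across a number of users."""
    return ram_bytes / users


# One terabyte shared by a billion users.
per_user = bytes_per_user(TERABYTE, 1_000_000_000)
print(f"{per_user:.0f} bytes per user")  # 1000 bytes per user, i.e. 1 kB
```

Plug in your own user count and record size before deciding that you need a cluster.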