Over the last seven years, there have been dozens of Hadoop and Spark benchmarks. The vast majority of them are a scam. Sometimes the perpetrators are simply ignorant of how to run a benchmark correctly (Hanlon's Razor at work). But too often, some zealot just wants to declare their product the winner, ignoring or adjusting the facts to fit that objective. The novice data management buyer then makes decisions based on misleading performance results.
This leads to slow-performing implementations, if not outright purchasing disasters. Business users are then told to abandon slow-running queries because they supposedly don't need them. (Yeah, right.) That company then falls behind competitors who made better purchasing decisions. Never forget: query performance is vital for employee productivity and the accuracy of results. When hundreds of employees make thousands of informed decisions every day, it adds up.
Today's misleading big data benchmarks mirror historical misbehavior. In the 1990s, poorly defined vendor benchmarks claimed superiority over competitors. Customers were bombarded with claims they could neither understand nor use for comparisons. The Transaction Processing Performance Council (TPC) was formed to define standard OLTP and decision support benchmark specifications. TPC benchmarks require auditors, full process disclosure, and public pricing. TPC-DS and TPC-H are the current decision support benchmark specifications. Ironically, TPC-DS and the older TPC-H queries are used in nearly all tainted benchmark boasts. IBM Research complained loudly about Hadoop TPC-DS benchmarks back in 2014, but the fake news continues, sometimes under the name bench-marketing. And most of today's misleading benchmarks come from academics and vendor laboratories, not marketing people.
Here are some of the ways Hadoop benchmark cheating occurs.
Grandpa’s hardware. Comparing old hardware and software to new is the oldest trick in the book. Pitting a 2012 server with four cores against 2017 servers with 22 cores is pure cheating. Each CPU generation also brings faster DRAM and PCI buses. Is it any surprise the new hardware wins every benchmark test? Yet some startup vendor publishes “NoSQL beats Relational Database X,” leaving out the inconvenient details. A variation of this cheat compares hard-disk servers to SSD-only servers. Solution: publish full configuration details, and always use the same hardware and OS for every comparison.
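Full disclosure can be automated. Here is a minimal sketch, using only the Python standard library, of capturing the hardware and OS facts that should accompany any published result; the field names are my own choices, not part of any TPC specification.

```python
# Sketch: capture the configuration details that should be published
# alongside every benchmark timing. Field names are illustrative.
import json
import os
import platform

def benchmark_environment() -> dict:
    """Collect basic hardware/OS facts for full disclosure."""
    return {
        "os": platform.platform(),          # OS name, version, kernel
        "machine": platform.machine(),      # CPU architecture
        "processor": platform.processor(),  # CPU model string (may be empty)
        "logical_cpus": os.cpu_count(),     # logical core count
        "python": platform.python_version(),
    }

if __name__ == "__main__":
    print(json.dumps(benchmark_environment(), indent=2))
```

A real disclosure would add RAM size, storage type (HDD vs. SSD), and software versions, which the standard library alone cannot report.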
SQL we don’t like. None of the SQL-on-Hadoop offerings supports all of the ANSI SQL-99 functions. The easy solution is to discard any SQL the product cannot run. While TPC-DS has roughly 100 SQL queries, no Hadoop vendor reports more than 25 query results. Worse, they modify the SQL in ways that invalidate the purpose of the test. Solution: disclose all SQL modifications.
Cherry pickers. By reporting only the query tests they win, the vendor claims to beat the competitor 100% of the time. These are easy to spot, since every bar chart shows the competitor losing by 2X to 100X. Solution: publish all TPC test results. But this will never happen, which is why customers must run their own benchmarks.
Tweaks and no tweaks. First, install the competitor’s out-of-the-box product, preferably a back-level version. Run the tests. Next, load the cheater’s product on the same system. Now tweak, tune, and cajole the cheater’s software to its highest performance. Too bad the competitor wasn’t allowed any tweaks. Solution: test out-of-the-box products only.
Amnesia cache. Run the benchmark a few times so that all the data ends up in the memory cache. Run it again and “Wow! That was fast.” Next, reboot and run the competitor’s software with empty memory caches, forcing it to read all the data from disk. Surprise! The competitor loses. Solution: clear the in-memory cache before every test.
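A fair harness enforces this mechanically. Below is a sketch of a timing loop that clears caches before every run; `run_query` and `clear_cache` are placeholder callables you supply. On Linux, a real `clear_cache` might run `sync` followed by writing `3` to `/proc/sys/vm/drop_caches` (root required) — that detail is an assumption about your platform, not part of any benchmark spec.

```python
# Sketch of a fair timing harness: every run starts with cold caches,
# so no product benefits from previously warmed data.
import time
from typing import Callable, List

def cold_timings(run_query: Callable[[], None],
                 clear_cache: Callable[[], None],
                 runs: int = 3) -> List[float]:
    """Time run_query with caches cleared before each run."""
    timings = []
    for _ in range(runs):
        clear_cache()                  # start cold, every time
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings
```

Use the same harness, with the same `clear_cache`, for every product under test; otherwise the comparison is meaningless.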
Manual optimizer (an oxymoron). Lacking an optimizer, Hadoop and Spark systems process tables in the order they appear in the SQL statement. So programmers reorder the SQL clauses for the best performance. That seems fair to me. But when a Tableau or Qlik user issues a query, those tweaks are not possible. Oops: not fair to the users. Solution: run the TPC test SQL as-is, without reordering.
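The point is that table order changes only performance, never answers. This illustrative demo (invented table and column names, SQLite standing in for a real engine) runs two queries that differ solely in FROM-clause order; an engine with an optimizer treats them identically, while an optimizer-less engine may run them at very different speeds.

```python
# Demo: two queries differing only in FROM-clause table order return
# identical results. Only the speed can differ on optimizer-less engines.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER, name TEXT);
    CREATE TABLE sales (item_id INTEGER, amount INTEGER);
    INSERT INTO items VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO sales VALUES (1, 100), (1, 50), (2, 75);
""")

q_items_first = """SELECT i.name, SUM(s.amount)
                   FROM items i, sales s
                   WHERE i.item_id = s.item_id
                   GROUP BY i.name ORDER BY i.name"""
q_sales_first = """SELECT i.name, SUM(s.amount)
                   FROM sales s, items i
                   WHERE i.item_id = s.item_id
                   GROUP BY i.name ORDER BY i.name"""

r1 = conn.execute(q_items_first).fetchall()
r2 = conn.execute(q_sales_first).fetchall()
assert r1 == r2  # same answer either way
print(r1)        # [('gadget', 75), ('widget', 150)]
```

Running the TPC SQL as-is simply means skipping the hand-reordering step that a BI tool's users could never perform.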
Ignorance is easy. This is common among university students who lack experience testing complex systems. Benchmarking is a career path, not a semester’s experiment. Controlling all the variables is much harder than most people realize, and there are many common mistakes that students don’t recognize and therefore repeat. Solution: be skeptical of university benchmarks. Sorry about that, Berzerkly.
Scalability that is not. Many benchmarks double the data volume and call it a scalability test. Wrong. That’s a scale-up stress test. Scale-out is when more servers are added to an existing cluster. Solution: double the data and double the cluster hardware. If response time is within 5% of the prior tests, it’s a good scalability test. Note to self: Teradata systems consistently scale out at 97% to 100% efficiency.
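The arithmetic behind that check can be sketched in a few lines. The efficiency formula and the 5% tolerance here follow the rule of thumb above; they are not part of any TPC specification.

```python
# Sketch of the scale-out check: with data volume AND server count both
# doubled, response time should stay nearly flat.
def scale_out_efficiency(baseline_secs: float, scaled_secs: float) -> float:
    """1.0 means perfect scale-out; lower means the cluster lost ground."""
    return baseline_secs / scaled_secs

def passes_scalability_test(baseline_secs: float, scaled_secs: float,
                            tolerance: float = 0.05) -> bool:
    """True if the doubled-data, doubled-hardware run is within 5%."""
    return scaled_secs <= baseline_secs * (1 + tolerance)

print(scale_out_efficiency(100.0, 103.0))     # ~0.971, i.e. 97% efficient
print(passes_scalability_test(100.0, 103.0))  # True: within the 5% band
print(passes_scalability_test(100.0, 125.0))  # False: 25% slower is not scale-out
```

Doubling only the data, with the hardware held constant, tells you how gracefully a system degrades under load — a useful number, but not scalability.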