Three Tales on Finding Logic Bugs in Database Management Systems

July 2, 2020

Manuel Rigger


Three Tales on Finding Logic Bugs in Database Management Systems

Time:   11:00am
Location:   Zoom7 - https://zoom.us/j/7911012202 (pass: s3)

Database Management Systems (DBMS) are used ubiquitously for storing and retrieving data. It is critical that they function correctly — incorrectly computed result sets (e.g., by omitting a row) can cause serious loss or damage. We refer to such defects as logic bugs. Despite their importance, finding logic bugs in production DBMS is a longstanding challenge. Existing techniques such as fuzzing and differential testing are ineffective in finding them. We have proposed a set of novel techniques to effectively detect logic bugs by tackling the two core technical issues: generating test queries and constructing test oracles. We approach from three distinct conceptual angles: (1) generating queries so that a given row is included in their result sets, (2) translating a given query to suppress optimizations and provide reference results, and (3) partitioning a query into multiple queries whose combined result is identical to the original query’s result. We designed, realized and evaluated these approaches on a range of widely-used, production-quality DBMS including SQLite, MySQL, PostgreSQL, CockroachDB, and TiDB. To date, we have reported over 400 unique previously unknown bugs in these systems, over 350 of which have been fixed by the developers. Notably, half of our reports were logic bugs, while the remaining errors and crash bugs. Our work has provided solid methodological and technical bases for effectively testing DBMS in practice and already started seeing industrial adoption.