All Posts

  • Published on
    Schema changes (DDL) are hard to run concurrently with transactions while preserving ACID properties. Traditional approaches use metadata locks (MDLs) to serialize DDL against DML/DQL, but this can severely hurt performance. This blog discusses the issues and explores solutions such as multi-version concurrency control that enable highly concurrent, non-blocking schema changes.
  • Published on
    This blog discusses the fatal problem of distributed metadata lock (MDL) deadlocks, which can occur when distributed transactions and data definition language (DDL) statements execute concurrently across multiple database nodes. It explains the causes and presents a solution for detecting these distributed MDL deadlocks.
  • Published on
    I explore the motivation and techniques behind building a vectorized execution engine for distributed queries. Traditional tuple-at-a-time evaluation fails to utilize hardware efficiently at scale. By expressing queries as linear algebraic operations on batches of column vectors using generated kernels, significant performance gains can be achieved through improved data locality, reduced interpretation overhead and better utilization of CPU resources like SIMD units.
  • Published on
    I explore approaches to automatically test databases for logic bugs that can severely impact applications and data integrity. Traditional testing struggles with database complexity, so I examine newer techniques: Pivoted Query Synthesis (PQS), Non-Optimizing Reference Engine Construction (NoREC) and Ternary Logic Partitioning (TLP). By generating and validating randomized queries, these methods can reveal hidden logic errors that conventional testing does not discover.
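
As a rough illustration of the partitioning idea named in the last summary, here is a minimal sketch against an in-memory SQLite database. It relies on SQL's three-valued logic: for any predicate p, every row satisfies exactly one of p, NOT p, or p IS NULL, so the three filtered counts should add up to the unfiltered count. The table `t`, the helper names `tlp_check` and `random_predicate`, and the tiny predicate generator are all hypothetical and chosen for illustration; they are not taken from the post or from any testing tool.

```python
import random
import sqlite3


def tlp_check(conn, table, predicate):
    """Return True if the three partitions of `predicate` cover every row exactly once."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    partitions = (
        f"({predicate})",          # rows where the predicate is TRUE
        f"NOT ({predicate})",      # rows where it is FALSE
        f"({predicate}) IS NULL",  # rows where it is NULL (unknown)
    )
    partitioned = sum(
        conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {p}").fetchone()[0]
        for p in partitions
    )
    return total == partitioned


def random_predicate(rng):
    """Build a small random comparison over the hypothetical columns a and b."""
    column = rng.choice(["a", "b"])
    operator = rng.choice(["<", "<=", "=", "<>", ">", ">="])
    return f"{column} {operator} {rng.randint(-5, 5)}"


# Hypothetical test table with NULLs so that all three partitions are exercised.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 2), (None, 3), (4, None), (None, None), (-2, 0)],
)

rng = random.Random(0)  # fixed seed so a failing predicate can be reproduced
failures = [
    p for p in (random_predicate(rng) for _ in range(200))
    if not tlp_check(conn, "t", p)
]
print("predicates violating the partition invariant:", failures)
```

On a correct engine the list of failures stays empty; any predicate that lands in it points at a possible logic bug worth reducing by hand. A real fuzzer would also generate random schemas, data and far richer expressions, which this sketch leaves out.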