All Posts

  • Published on
    I explore the motivation and techniques behind building a vectorized execution engine for distributed queries. Traditional tuple-at-a-time evaluation uses hardware inefficiently at scale. Expressing queries as linear algebraic operations on batches of column vectors, executed by generated kernels, can yield significant performance gains through improved data locality, reduced interpretation overhead, and better use of CPU resources such as SIMD units.
  • Published on
    I explore approaches to automatically testing databases for logic bugs, which can severely affect applications and data integrity. Traditional testing struggles with database complexity, so I examine newer techniques such as Pivoted Query Synthesis, Non-Optimizing Reference Engine Construction, and Ternary Logic Partitioning. By generating and validating randomized queries, these methods can reveal logic errors that conventional testing misses.