Progressive Execution of Big Data Processing

This project explores novel query optimization techniques for big data that execute SQL query pipelines progressively, computing query results incrementally.


Texera is a GUI-based workflow system for interactive big data analytics.

Recent Publications

Optimizing Machine Learning Inference Queries with Correlative Proxy Models. In VLDB, 2022.


Tempura: A General Cost-based Optimizer Framework for Incremental Data Processing. In VLDB, 2022.


Demonstration of Interactive Runtime Debugging of Distributed Dataflows in Texera. In VLDB Demo, 2020.


Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing. In SIGMOD Demo, 2020.


Amber: A Debuggable Dataflow System Based on the Actor Model. In VLDB, 2020.


A Demonstration of TextDB: Declarative and Scalable Text Analytics on Large Data Sets. [Best Demo Award]. In ICDE Demo, 2017.



I have been a teaching assistant for the following courses at UC Irvine:

  • Spring 2019: CS 221 - Information Retrieval
    • Developed the course project from scratch: implementing a full-text search engine in Java, similar to Lucene and Elasticsearch.
  • Winter 2018, Spring 2018, Winter 2019: CS 122B - Projects in Databases and Web Applications
    • Updated the class project to use current web technologies, such as GitHub, Maven, frontend-backend separation, and modern JavaScript.
  • Fall 2017: CS 222P - Principles of Data Management
    • Introduced Valgrind to facilitate debugging.