Generic Operational and Optimal Data Lab

泛在数据分析与优化实验室 (Ubiquitous Data Analysis and Optimization Lab)

Good Data Inspired AI Infinite System and Beyond

News

We are always looking for self-motivated undergraduate and graduate students who are ready to take on ambitious challenges to join our research group (financial support is available).

We have updated the homepage for good.

Student Interests

Functional Programming and Implementation

Functional programming is a programming paradigm in which programs are constructed by applying and composing functions. The paradigm has been widely used in compilers, high-performance computing, and other domains. We aim to leverage functional programming to build efficient systems.
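As a minimal illustration of "applying and composing functions", the sketch below builds a small text-normalization pipeline purely by composing side-effect-free functions. The `compose` helper and the three stages are hypothetical examples, not code from any lab project.

```python
from functools import reduce
from typing import Any, Callable


def compose(*fns: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose functions right-to-left: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fns, lambda x: x)


# Pure stages: each returns a new value and never mutates its input.
def tokenize(text: str) -> list[str]:
    return text.split()


def lowercase(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]


def dedupe(tokens: list[str]) -> list[str]:
    # dict keys keep insertion order, so first occurrences stay in order
    return list(dict.fromkeys(tokens))


normalize = compose(dedupe, lowercase, tokenize)
print(normalize("Good data GOOD Lab data"))  # ['good', 'data', 'lab']
```

Because every stage is pure, the stages can be reordered, tested, or reused independently; the composed `normalize` is just another function.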

Distributed Systems and Consistency

In distributed systems, ensuring data consistency across multiple nodes is challenging but crucial. We're exploring novel approaches to maintain consistency without sacrificing performance, especially in dynamic network environments.
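A classic baseline for replica consistency is quorum intersection: with N replicas, if every read contacts R nodes and every write contacts W nodes, each read is guaranteed to overlap the latest write whenever R + W > N. The brute-force check below confirms that rule for small clusters; the function `quorums_intersect` is a hypothetical textbook illustration, not one of the lab's novel approaches.

```python
from itertools import combinations


def quorums_intersect(n: int, r: int, w: int) -> bool:
    """Brute-force check that every size-r read quorum overlaps every size-w write quorum."""
    nodes = range(n)
    return all(set(rq) & set(wq)
               for rq in combinations(nodes, r)
               for wq in combinations(nodes, w))


for n, r, w in [(3, 2, 2), (3, 1, 2), (5, 2, 4), (5, 3, 3)]:
    # The exhaustive check should agree with the closed-form rule R + W > N.
    print(n, r, w, quorums_intersect(n, r, w), r + w > n)
```

Larger quorums buy consistency at the cost of latency and availability, which is exactly the trade-off between consistency and performance described above.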

Compiler Systems and Context Sensitivity

Compiler technology is the foundation of modern programming languages and systems. We are particularly interested in the context-sensitive analysis of programs, which provides deeper insights into program behavior and potential optimizations.

Main Focus Projects

Open-Source LLM Optimization

We are interested in optimizing the runtime of open-source language models, large or small. We believe that the memory wall, energy consumption, and scalability remain open problems for modern open-source AI models. We see two approaches to improving these models: (1) well-engineered underlying memory management with deduplication, and (2) better scheduling to balance the load across LM nodes. It is also worthwhile to explore how to build such systems at a small scale, on the edge, and in protected environments. This motivates our interest in how data-driven approaches apply to system optimization design.
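One common way to realize deduplicated memory management for language-model serving is to split token sequences into fixed-size blocks, store each distinct block once, and track sharing with reference counts, similar in spirit to the prefix caching found in some LLM serving engines. The sketch below assumes a hypothetical `BlockPool` class and a block size of 4 tokens; it is an illustration, not the lab's implementation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BlockPool:
    """Hypothetical deduplicating pool: each distinct block is stored once."""
    blocks: dict[str, tuple[int, ...]] = field(default_factory=dict)
    refcount: dict[str, int] = field(default_factory=dict)

    def acquire_sequence(self, seq: list[int], size: int) -> list[str]:
        """Split seq into fixed-size blocks and return one key per block.

        Each key hashes the parent key plus the block's tokens, so two blocks
        share storage only if their entire prefix matches (needed when block
        contents, e.g. attention KV state, depend on earlier tokens).
        """
        keys, parent = [], ""
        for i in range(0, len(seq), size):
            tokens = tuple(seq[i:i + size])
            key = hashlib.sha256((parent + repr(tokens)).encode()).hexdigest()
            if key not in self.blocks:
                self.blocks[key] = tokens  # first occurrence: allocate
            self.refcount[key] = self.refcount.get(key, 0) + 1
            keys.append(key)
            parent = key
        return keys

    def release(self, keys: list[str]) -> None:
        for key in keys:
            self.refcount[key] -= 1
            if self.refcount[key] == 0:  # last user gone: free the block
                del self.refcount[key]
                del self.blocks[key]


pool = BlockPool()
a = pool.acquire_sequence([1, 2, 3, 4, 5, 6, 7, 8], size=4)
b = pool.acquire_sequence([1, 2, 3, 4, 9, 9, 9, 9], size=4)  # same first block
print(len(pool.blocks))  # 3 physical blocks for 4 logical blocks
pool.release(a)
print(len(pool.blocks))  # 2: the shared block survives for b
```

Chaining each key to its parent trades some sharing (identical blocks at different positions are not merged) for correctness when a block's cached state depends on everything before it.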

Database Optimization and Correctness

We are interested in novel database design on new hardware. We believe that both storage and processing are bottlenecks for database services. We are turning to novel processing chips, such as GPUs and DCUs, and new memory architectures, such as phase-change memory, to integrate into database design. In this way, we are working toward building fast databases with a broader view of the underlying systems. Making a correct database is hard. No one can guarantee that a database is correct all the time, since it may select different execution paths even for the same query. We would like to further explore how to guarantee the correctness of how a database is built, how it processes queries, and what it outputs. Furthermore, we would like to verify that our own work is correct as well. To this end, we introduce database testing with fuzzers. Based on a fuzzy algebra, we can generate mutated samples for testing and verifying novel systems as well as AI platforms.
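The lab's fuzzy-algebra approach is not detailed here, so the sketch below substitutes one well-known metamorphic testing idea for databases: ternary logic partitioning (TLP), as popularized by the SQLancer tool. For any predicate `p`, every row makes `p` TRUE, FALSE, or NULL, so the rows returned by `WHERE p`, `WHERE NOT (p)`, and `WHERE (p) IS NULL` must together equal the full table; a mismatch signals a logic bug. SQLite (via Python's built-in `sqlite3`) stands in as the system under test, and `random_predicate` is a hypothetical generator.

```python
import random
import sqlite3
from collections import Counter


def random_predicate(rng: random.Random) -> str:
    """Hypothetical generator: a random comparison over a nullable column."""
    col = rng.choice(["a", "b"])
    op = rng.choice(["<", "<=", "=", "<>", ">", ">="])
    return f"{col} {op} {rng.randint(-2, 2)}"


def tlp_check(conn: sqlite3.Connection, pred: str) -> bool:
    """Ternary logic partitioning: the TRUE, FALSE, and NULL partitions of
    `pred` must together return exactly the rows of the unfiltered query."""
    full = Counter(conn.execute("SELECT a, b FROM t").fetchall())
    parts: Counter = Counter()
    for where in (pred, f"NOT ({pred})", f"({pred}) IS NULL"):
        parts.update(conn.execute(f"SELECT a, b FROM t WHERE {where}").fetchall())
    return full == parts


rng = random.Random(0)  # fixed seed so runs are reproducible
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
rows = [(rng.choice([None, -1, 0, 1]), rng.choice([None, 0, 2])) for _ in range(50)]
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

for _ in range(200):
    pred = random_predicate(rng)
    if not tlp_check(conn, pred):
        print("possible logic bug for predicate:", pred)
```

The appeal of such oracles is that no expected output has to be written by hand: the database is checked against itself under a semantics-preserving mutation of the query.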

An Elastic Consistency System

We are working to extend the original Raft protocol to practical scenarios, that is, to provide scalable and low-cost distributed services based on Raft. The main contribution of this research is to extend the scope of a strong consensus algorithm to highly unreliable platforms and make it work statistically in practice. Here we present eRaft, a high-performance C++ Raft library developed mainly by graduates from our GOOD lab. The Raft algorithm is credited to Dr. Diego Ongaro. At present, our project has been included in the official distribution. Building on a stable, practical Raft library, we hope to explore ways to optimize the existing algorithms. If you are interested, please join us; anyone interested may refer to the project.
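To make the consensus side concrete, the sketch below shows the leader's commit rule from the Raft paper (Figure 2, "Rules for Servers"): the leader may advance its commit index to the largest N that is replicated on a majority of servers, provided the entry at N belongs to the leader's current term. The function `advance_commit_index` and its argument layout are hypothetical and do not reflect eRaft's actual API.

```python
def advance_commit_index(commit_index: int, current_term: int,
                         log_terms: list[int], follower_match: list[int]) -> int:
    """Return the leader's new commitIndex.

    log_terms[i - 1] is the term of log entry i (Raft indices start at 1).
    follower_match holds matchIndex for each follower; the leader itself
    always matches its whole log.
    """
    cluster_size = len(follower_match) + 1
    matches = follower_match + [len(log_terms)]  # include the leader
    # Scan from the newest entry down; stop at the current commit point.
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in matches if m >= n)
        # Only entries from the current term are committed by counting replicas;
        # older entries become committed indirectly (Raft section 5.4.2).
        if replicated * 2 > cluster_size and log_terms[n - 1] == current_term:
            return n
    return commit_index


# Five servers; entry 3 is on a majority but comes from an older term (2).
print(advance_commit_index(2, 3, [1, 1, 2, 3, 3], [5, 3, 3, 1]))  # 2
# Once entry 4 (term 3) reaches a majority, entries 1 through 4 commit together.
print(advance_commit_index(2, 3, [1, 1, 2, 3, 3], [5, 4, 3, 1]))  # 4
```

The current-term check is what keeps Raft safe when leaders change; an elastic variant on unreliable platforms would still need to preserve it.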

Former Projects

Ocean Database with Spatio-temporal Features

We are interested in ocean data, which is large, diverse, and unpredictable. To this end, we are building a database for ocean data to serve applications such as weather forecasting and ocean current prediction. This is uniquely interesting because there is so much in the sea that we know little about. We aim to build knowledge on top of this data and provide an underlying database that serves fast queries, in both SQL and NewSQL, to better support this work.

System Optimizations with Multiple Objectives

We aim to optimize data services across multiple metrics of interest, including but not limited to performance, throughput, energy, and carbon emissions. Modeling and representation help us better understand the world, and data itself mirrors our ongoing lives. For database systems in particular, we would like to shape them in a better way.
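When metrics such as throughput, energy, and carbon pull in different directions, a standard first step is to discard every configuration that is beaten on all metrics at once and keep only the Pareto-optimal ones. The sketch below does this for a hypothetical `Config` record; the configuration names and numbers are made up for illustration and are not lab results.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    throughput: float  # higher is better (e.g. queries/s)
    energy: float      # lower is better (e.g. joules per query)
    carbon: float      # lower is better (e.g. grams CO2e per query)


def dominates(x: Config, y: Config) -> bool:
    """x dominates y if it is no worse on every metric and better on at least one."""
    no_worse = (x.throughput >= y.throughput and x.energy <= y.energy
                and x.carbon <= y.carbon)
    better = (x.throughput > y.throughput or x.energy < y.energy
              or x.carbon < y.carbon)
    return no_worse and better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep the configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Illustrative numbers only.
candidates = [
    Config("A", 1000, 2.0, 0.8),
    Config("B", 1200, 2.5, 1.0),
    Config("C", 900, 2.2, 0.9),   # worse than A on every metric
    Config("D", 800, 1.5, 0.6),
]
print([c.name for c in pareto_front(candidates)])  # ['A', 'B', 'D']
```

The front does not pick a single winner; choosing among A, B, and D still requires weighing the objectives, for example through a weighted score or a carbon budget.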

Teaching

Introduction to Artificial Intelligence (Fall 2024, Fall 2023, Fall 2022, Spring 2021, 2020)
Introduction to Combinatorics, graduate course (Fall 2022, 2021, 2020, 2019, 2018, 2017)
Discrete Mathematics (Fall 2022, Spring 2021, 2020)
Operating Systems (Fall 2019)
Cloud Computing (Spring 2019)
Introduction to Cloud Computing (Fall 2018)
Linux Programming (Spring 2018)
Data Structures (Fall 2017)