I am generally interested in distributed systems and networking. Currently, I am pursuing the vision of compute-network converged distributed systems to support latency-critical applications. Compute-network convergence manifests at both the system and application levels, and my work spans the following two topics accordingly.
Low-Latency Data Center In-Network Computing
With the slowdown of Moore’s law, large-scale data centers have been deploying a growing number of domain-specific accelerators (GPUs, TPUs, and FPGAs) to deliver the unprecedented performance needed by computation-intensive workloads such as machine learning model training. Under this trend, the emergence of programmable network devices (P4 switches, SmartNICs, and, more recently, DPUs) has motivated a new concept called in-network computing, where network devices programmed in languages such as P4 or NPL accelerate application-specific computations (e.g., allreduce for distributed machine learning training) in addition to running network functions. In-network computing brings substantial performance benefits to a variety of distributed workloads, but also poses new challenges for the design of data center systems. My work on in-network computing has been on the following aspects.
- Holistic resource scheduling for data center in-network computing [ASPLOS'21][ToN'22]
- Unified programming system for in-network computing [HotNets'21]
- In-network machine learning [EuroP4'22]
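To make the in-network allreduce example above concrete, the following is a minimal, illustrative sketch (not any specific system's implementation) of the core switch-side logic: the switch keeps per-slot aggregation state, adds each worker's fixed-point gradient chunk into it, and releases the reduced chunk once every worker has contributed. The class and field names here are hypothetical.

```python
# Illustrative sketch of switch-side allreduce aggregation.
# Assumptions (not from the original text): fixed-point integer values
# (programmable switches typically lack floating-point arithmetic),
# one aggregation "slot" per gradient chunk, and duplicate suppression
# to tolerate retransmissions.
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

@dataclass
class Packet:
    slot: int           # which gradient chunk this packet belongs to
    worker: int         # id of the sending worker
    values: List[int]   # fixed-point gradient chunk

class AllreduceSwitch:
    def __init__(self, num_workers: int, chunk_len: int):
        self.num_workers = num_workers
        self.chunk_len = chunk_len
        self.sums: Dict[int, List[int]] = {}   # slot -> running sums
        self.seen: Dict[int, Set[int]] = {}    # slot -> contributing workers

    def receive(self, pkt: Packet) -> Optional[List[int]]:
        """Aggregate one packet; return the reduced chunk when complete."""
        acc = self.sums.setdefault(pkt.slot, [0] * self.chunk_len)
        contributors = self.seen.setdefault(pkt.slot, set())
        if pkt.worker in contributors:
            return None  # duplicate (e.g., a retransmission): drop it
        contributors.add(pkt.worker)
        for i, v in enumerate(pkt.values):
            acc[i] += v
        if len(contributors) == self.num_workers:
            # All workers contributed: free the slot and return the result,
            # which a real switch would multicast back to the workers.
            self.seen.pop(pkt.slot)
            return self.sums.pop(pkt.slot)
        return None
```

For example, with three workers contributing chunks `[1, 2]`, `[3, 4]`, and `[5, 6]` to slot 0, the switch returns `[9, 12]` only on the third packet. Freeing the slot on completion reflects the key constraint this line of work deals with: switch memory is scarce, so aggregation state must be recycled aggressively.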
Latency-Critical Machine Learning Systems
Emerging mobile and ubiquitous applications demand real-time, reliable performance, which can benefit from decentralized, close-to-user deployments in the field. For example, augmented reality (AR) applications running on mobile devices can offload heavy, machine-learning-powered object/action recognition tasks over the mobile network to nearby compute servers at the edge. However, serving machine learning models is computationally intensive, and guaranteeing latency service-level objectives (SLOs) while achieving resource efficiency requires careful resource provisioning in the dynamic edge network. My work on machine learning systems has been focused on the following aspects.