Research

I am generally interested in distributed systems and networking. In particular, my current research focuses on the following two lines of work.

Latency-Critical Inference Serving

Emerging mobile and ubiquitous applications demand real-time, reliable performance, which decentralized, close-to-user deployments in the field can help deliver. For example, augmented reality (AR) applications on mobile devices can offload heavy, machine-learning-powered object/action recognition tasks to nearby compute servers over the mobile network at the edge. However, serving machine learning models is computationally heavy and requires careful resource provisioning in the dynamic edge network to guarantee latency service-level objectives (SLOs) while remaining resource-efficient. My work in this line has focused on the following aspects.
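
To make the SLO constraint concrete, here is a minimal, illustrative Python sketch (not code from any real serving system): a serving frontend picks the largest batch size whose profiled inference latency still fits within the latency budget of the oldest queued request. The SLO value and the latency profile are made-up numbers.

```python
# A minimal, illustrative sketch of SLO-aware batching, under assumed numbers.

from collections import deque

SLO_MS = 50.0  # hypothetical end-to-end latency SLO
PROFILE_MS = {1: 8.0, 2: 10.0, 4: 14.0, 8: 22.0, 16: 38.0}  # hypothetical profile

def pick_batch(arrivals_ms: deque, now_ms: float) -> int:
    """Largest profiled batch that still meets the SLO for the oldest
    queued request (the one with the least remaining budget)."""
    if not arrivals_ms:
        return 0
    budget_ms = SLO_MS - (now_ms - arrivals_ms[0])  # oldest request's budget
    best = 0
    for batch, latency_ms in sorted(PROFILE_MS.items()):
        if batch <= len(arrivals_ms) and latency_ms <= budget_ms:
            best = batch
    return best

# Three requests arrived at t = 0, 5, and 12 ms; it is now t = 20 ms.
print(pick_batch(deque([0.0, 5.0, 12.0]), now_ms=20.0))  # -> 2
```

The point of the sketch is the tension the paragraph describes: larger batches improve resource efficiency, but the SLO of the longest-waiting request caps how much batching the server can afford.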

Low-Latency In-Network Computing

With the slowdown of Moore’s law, large-scale data centers have been employing an increasing number of domain-specific accelerators (GPUs, TPUs, and FPGAs) to deliver the needed unprecedented performance for computation-intensive workloads like machine learning model training. Under this trend, the emergence of programmable network devices (P4 switches, SmartNICs, and DPUs more recently) has motivated a new concept called in-network computing, where programmable network devices (with P4/NPL languages) are instructed to accelerate application-specific computations (e.g., AllReduce for distributed machine learning training) in addition to running network functions. In-network computing brings tremendous performance benefits for a variety of distributed workloads, but also imposes challenges to the design of data center systems. My work on in-network computing has been on the following aspects.