Publications
2025
- Ne Wang, Wenxiang Lin, Lin Zhang, Shaohuai Shi, Ruiting Zhou, and Bo Li, “SP-MoE: Expediting Mixture-of-Experts Training with Optimized Pipelining Planning,” IEEE International Conference on Computer Communications (INFOCOM) 2025, London, United Kingdom, May 19–22, 2025.
- Xinglin Pan, Wenxiang Lin, Lin Zhang, Shaohuai Shi, Zhenheng Tang, Rui Wang, Bo Li, and Xiaowen Chu, “FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models,” ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2025, Rotterdam, The Netherlands, March 30-April 3, 2025.
2024
- Ye Liu, Shan Chang, Denghui Li, Shaohuai Shi, and Bo Li, “RoPe-Door: Towards Robust and Persistent Backdoor Data Poisoning Attacks in Federated Learning,” IEEE Network, October 25, 2024.
- Jing Peng, Zihan Li, Shaohuai Shi, and Bo Li, “Sparse Gradient Communication with AlltoAll for Accelerating Distributed Deep Learning,” 53rd International Conference on Parallel Processing (ICPP) 2024, Gotland, Sweden, August 12-15, 2024.
- Zichen Tang, Junlin Huang, Rudan Yan, Yuxin Wang, Zhenheng Tang, Shaohuai Shi, Amelie Chi Zhou, and Xiaowen Chu, “Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning,” 53rd International Conference on Parallel Processing (ICPP) 2024, Gotland, Sweden, August 12-15, 2024.
- Yizhou Luo, Qiang Wang, Shaohuai Shi, Jiaxin Lai, Shuhan Qi, Jiajia Zhang, and Xuan Wang, “Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing,” IEEE/ACM IWQoS 2024, Guangzhou, China, June 19-21, 2024.
- Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xinmei Tian, Tongliang Liu, Bo Han, and Xiaowen Chu, “FedImpro: Measuring and Improving Client Update in Federated Learning,” ICLR 2024, Vienna, Austria, May 7-11, 2024.
- Shaohuai Shi, Xinglin Pan, Qiang Wang, Chengjian Liu, Xiaozhe Ren, Zhongzhe Hu, Yu Yang, Bo Li, and Xiaowen Chu, “ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling,” EuroSys 2024, Athens, Greece, April 22-25, 2024.
- Xinglin Pan, Wenxiang Lin, Shaohuai Shi, Xiaowen Chu, Weinong Sun, and Bo Li, “Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules,” IEEE INFOCOM 2024, Vancouver, Canada, May 20-23, 2024.
- Hucheng Liu, Shaohuai Shi, Xuan Wang, Zoe Lin Jiang, and Qian Chen, “Performance Analysis and Optimizations of Matrix Multiplications on ARMv8 Processors,” Design, Automation and Test in Europe Conference (DATE), Valencia, Spain, March 25-27, 2024.
2023
- Zhenheng Tang, Yuxin Wang, Xin He, Longteng Zhang, Xinglin Pan, Qiang Wang, Rongfei Zeng, Shaohuai Shi, Bingsheng He, and Xiaowen Chu, “FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs,” Symposium on Large Language Models (LLM 2023) with IJCAI 2023, Macao, China, August 21, 2023.
- Lin Zhang, Longteng Zhang, Shaohuai Shi, Xiaowen Chu, and Bo Li, “Evaluation and Optimization of Gradient Compression for Distributed Deep Learning,” IEEE ICDCS 2023, Hong Kong, China, July 2023.
- Lin Zhang, Shaohuai Shi, Xiaowen Chu, Wei Wang, Bo Li, and Chengjian Liu, “Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining,” IEEE ICDCS 2023, Hong Kong, China, July 2023.
- Shaohuai Shi, Qing Yang, Yang Xiang, Shuhan Qi, and Xuan Wang, “An Efficient Split Fine-tuning Framework for Edge and Cloud Collaborative Learning,” The Design Automation Conference (DAC) 2023 (Poster), Moscone West, San Francisco, July 2023. [PDF, Code]
- Lin Zhang, Shaohuai Shi, and Bo Li, “Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation,” ICLR 2023, Kigali, Rwanda, May 2023. [PDF, Code]
- Lin Zhang, Shaohuai Shi, and Bo Li, “Accelerating Distributed K-FAC with Efficient Collective Communication and Scheduling,” IEEE INFOCOM 2023, New York Area, U.S.A., May 2023.
- Shaohuai Shi, Xinglin Pan, Xiaowen Chu, and Bo Li, “PipeMoE: Accelerating Mixture-of-Experts through Adaptive Pipelining,” IEEE INFOCOM 2023, New York Area, U.S.A., May 2023.
2022
- Zhenheng Tang, Shaohuai Shi, Bo Li, and Xiaowen Chu, “GossipFL: A Decentralized Federated Learning Framework with Sparsified and Adaptive Communication,” IEEE Transactions on Parallel and Distributed Systems (TPDS), December 2022.
- Lin Zhang, Shaohuai Shi, Wei Wang, and Bo Li, “Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning,” IEEE Transactions on Cloud Computing (TCC), September 2022. [PDF, Code]
- Qiang Wang, Shaohuai Shi, Kaiyong Zhao, and Xiaowen Chu, “EASNet: Searching Elastic and Accurate Network Architecture for Stereo Matching,” ECCV 2022, Tel Aviv, Israel, October 2022. [PDF, Code]
- Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xin He, Bo Han, and Xiaowen Chu, “Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning,” ICML 2022, Baltimore, Maryland, July 2022. [PDF, Code]
2021
- Zhenheng Tang, Zhikai Hu, Shaohuai Shi, Yiu-Ming Cheung, Yilun Jin, Zhenghang Ren, and Xiaowen Chu, “Data Resampling for Federated Learning with Non-IID Labels,” International Workshop on Federated and Transfer Learning for Data Sparsity and Confidentiality in Conjunction with IJCAI 2021 (FTL-IJCAI’21), Virtual Event, August 2021.
- Shaohuai Shi, Lin Zhang, and Bo Li, “Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks,” IEEE ICDCS 2021, Virtual Event, July 2021.
- Shaohuai Shi, Xiaowen Chu, and Bo Li, “Exploiting Simultaneous Communications to Accelerate Data Parallel Distributed Deep Learning,” IEEE INFOCOM 2021, Virtual Event, May 2021. (Best Paper Award, 3 out of 1266 submissions)
- Shaohuai Shi*, Xianhao Zhou*, Shutao Song*, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, and Xiaowen Chu, “Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters,” The 4th Conference on Machine Learning and Systems (MLSys) 2021, Virtual Event, April 2021. [PDF]
- Xin He, Shihao Wang, Xiaowen Chu, Shaohuai Shi, Jiangping Tang, Jiyong Zhang, Xin Liu, Chenggang Yan, and Guiguang Ding, “Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans,” AAAI 2021, Virtual Event, February 2021.
- Shaohuai Shi, Xiaowen Chu, and Bo Li, “MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning,” IEEE Transactions on Parallel and Distributed Systems (TPDS), 2021. [PDF, Code]
2020
- Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, and Bo Li, “A Quantitative Survey of Communication Optimizations in Distributed Deep Learning,” IEEE Network, 2020. [PDF, Code]
- Zhenheng Tang, Shaohuai Shi, and Xiaowen Chu, “Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection,” IEEE ICDCS 2020 (Poster), Singapore, December 2020.
- Shaohuai Shi, Qiang Wang, and Xiaowen Chu, “Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format,” IEEE ICPADS 2020, Hong Kong, December 2020. [PDF, Code]
- Yuxin Wang, Qiang Wang, Shaohuai Shi, Xin He, Zhenheng Tang, Kaiyong Zhao, and Xiaowen Chu, “Benchmarking the Performance and Power of AI Accelerators for AI Training,” The 3rd High Performance Machine Learning Workshop (HPML 2020), Melbourne, Australia, November 2020. [PDF]
- Shaohuai Shi, Qiang Wang, Xiaowen Chu, Bo Li, Yang Qin, Ruihao Liu, and Xinxiao Zhao, “Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs,” IEEE INFOCOM 2020, Toronto, Canada, July 2020. [Code]
- Shaohuai Shi, Zhenheng Tang, Qiang Wang, Kaiyong Zhao, and Xiaowen Chu, “Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees,” The 24th European Conference on Artificial Intelligence (ECAI), Santiago de Compostela, Spain, June 2020. [PDF, Code]
- Qiang Wang*, Shaohuai Shi*, Shizhen Zheng, Kaiyong Zhao, and Xiaowen Chu, “FADNet: A Fast and Accurate Network for Disparity Estimation,” International Conference on Robotics and Automation (ICRA), Paris, France, June 2020. [PDF, Code]
2019
- Xin He, Shihao Wang, Shaohuai Shi, Zhenheng Tang, Yuxin Wang, Zhihao Zhao, Jing Dai, Ronghao Ni, Xiaofeng Zhang, Xiaoming Liu, Zhili Wu, Wu Yu, and Xiaowen Chu, “Computer-Aided Clinical Skin Disease Diagnosis Using CNN and Object Detection Models,” KDDBHI Workshop 2019, IEEE BigData Conference, Los Angeles, CA, December 2019.
- Shaohuai Shi, Kaiyong Zhao, Qiang Wang, Zhenheng Tang, and Xiaowen Chu, “A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification,” IJCAI 2019, Macao, China, August 2019. [PDF, Code]
- Shaohuai Shi, Qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, and Xiaowen Chu, “A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks,” IEEE ICDCS 2019, Texas, USA, July 2019. [PDF, Code]
- Shaohuai Shi, Xiaowen Chu, and Bo Li, “MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms,” IEEE INFOCOM 2019, Paris, France, May 2019. [PDF, Code]
2018
- Xianyan Jia*, Shutao Song*, Shaohuai Shi*, Wei He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Liwei Yu, Tiegang Chen, Guangxiao Hu, and Xiaowen Chu, “Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes,” NeurIPS 2018 Workshop on Systems for ML and Open Source Software, Montreal, Canada, December 2018. [PDF]
- Shaohuai Shi, Qiang Wang, Xiaowen Chu, and Bo Li, “A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning,” IEEE ICPADS 2018, Singapore, December 2018. [PDF]
- Shaohuai Shi, Qiang Wang, and Xiaowen Chu, “Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs,” IEEE DataCom 2018, Athens, Greece, August 2018. (Best Paper Award) [PDF]
2017 and before
- Shaohuai Shi, Pengfei Xu, and Xiaowen Chu, “Supervised Learning Based Algorithm Selection for Deep Neural Networks,” IEEE ICPADS 2017, Shenzhen, China, December 2017. [PDF, Code]
- Pengfei Xu, Shaohuai Shi, and Xiaowen Chu, “Performance Evaluation of Deep Learning Tools in Docker Containers,” The 3rd International Conference on Big Data Computing and Communications (BigCom), Chengdu, China, August 2017.
- Shaohuai Shi, Qiang Wang, Pengfei Xu, and Xiaowen Chu, “Benchmarking State-of-the-art Deep Learning Software Tools,” The 7th International Conference on Cloud Computing and Big Data (CCBD), Macao, China, November 2016. [PDF, Code]
- Jingjing Chen, You Li, Xiaowen Chu, Shaohuai Shi, Tang Tao, Lin Cui, Zhiling Xu, and Jianliang Xu, “Ebanshu: An Interactivity-aware Blended Virtual Learning Environment,” The 9th International Conference on Internet and Web Applications and Services (ICIW), Paris, France, July 2014.
- Shuhan Qi, Xuan Wang, and Shaohuai Shi, “Mixed Precision Method for GPU-based FFT,” The 14th IEEE International Conference on Computational Science and Engineering, Dalian, China, August 2011.
- Jiangfeng Peng, Hu Chen, and Shaohuai Shi, “The GPU-based String Matching System in Advanced AC Algorithm,” The 10th IEEE International Conference on Computer and Information Technology, West Yorkshire, UK, June 2010.
Preprints
- Zhenheng Tang, Shaohuai Shi, Xiaowen Chu, Wei Wang, and Bo Li, “Communication-Efficient Distributed Deep Learning: A Comprehensive Survey,” March 2020. [PDF]
- Shaohuai Shi, Xiaowen Chu, Ka Chun Cheung, and Simon See, “Understanding Top-k Sparsification in Distributed Deep Learning,” 2019. [PDF, Code]
- Shaohuai Shi and Xiaowen Chu, “Speeding Up Convolutional Neural Networks by Exploiting the Sparsity of Rectifier Units,” April 2017. [PDF]