diff --git a/README.md b/README.md
index 2d2a120..6d95bfb 100644
--- a/README.md
+++ b/README.md
@@ -44,10 +44,24 @@ Excited to introduce **DeepEP** - the first open-source EP communication library
 ✅ Native FP8 dispatch support
 ✅ Flexible GPU resource control for computation-communication overlapping
 
+### Day 3 - [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM)
+
+Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.
+
+⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
+✅ No heavy dependency, as clean as a tutorial
+✅ Fully Just-In-Time compiled
+✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
+✅ Supports dense layout and two MoE layouts
+
+🔗 GitHub: https://github.com/deepseek-ai/DeepGEMM
+
+
+
 ### Ongoing Releases...
 
 ## 2024 AI Infrastructure Paper (SC24)
 ### Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
 
 📄 Paper Link
-📄 Arxiv Paper Link
\ No newline at end of file
+📄 Arxiv Paper Link