From be36f16c7f2fbebb59d4b940afe5a5a153c19d90 Mon Sep 17 00:00:00 2001
From: Huang Panpan
Date: Wed, 26 Feb 2025 08:55:30 +0800
Subject: [PATCH] add deepgemm

---
 README.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2d2a120..6d95bfb 100644
--- a/README.md
+++ b/README.md
@@ -44,10 +44,24 @@ Excited to introduce **DeepEP** - the first open-source EP communication library
 ✅ Native FP8 dispatch support
 ✅ Flexible GPU resource control for computation-communication overlapping
 
+### Day 3 - [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM)
+
+Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.
+
+⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
+✅ No heavy dependency, as clean as a tutorial
+✅ Fully Just-In-Time compiled
+✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
+✅ Supports dense layout and two MoE layouts
+
+🔗 GitHub: https://github.com/deepseek-ai/DeepGEMM
+
+
+
 ### Ongoing Releases...
 
 ## 2024 AI Infrastructure Paper (SC24)
 ### Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
 
 📄 Paper Link
-📄 Arxiv Paper Link
\ No newline at end of file
+📄 Arxiv Paper Link