Mirror of https://github.com/deepseek-ai/open-infra-index.git, synced 2025-04-02 16:44:02 +00:00
add deepgemm
parent 8d71c3e093
commit be36f16c7f
1 changed file with 15 additions and 1 deletion
README.md (16 changes)
@@ -44,10 +44,24 @@ Excited to introduce **DeepEP** - the first open-source EP communication library
✅ Native FP8 dispatch support
✅ Flexible GPU resource control for computation-communication overlapping

### Day 3 - [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM)

Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts

🔗 GitHub: https://github.com/deepseek-ai/DeepGEMM

### Ongoing Releases...

## 2024 AI Infrastructure Paper (SC24)
### Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

<a href="https://dl.acm.org/doi/10.1109/SC41406.2024.00089"><b>📄 Paper Link</b></a>
<a href="https://arxiv.org/abs/2408.14158"><b>📄 Arxiv Paper Link</b></a>