update day 2 - DeepEP

This commit is contained in:
haswelliris 2025-02-25 10:27:40 +08:00
parent 006cdcf4e2
commit 35446186f6


@@ -23,15 +23,27 @@ Daily unlocks begin soon. No ivory towers - just pure garage-energy and community-driven
Stay tuned, and let's geek out in the open together.
### Day 1 - [FlashMLA](https://github.com/deepseek-ai/FlashMLA)
**Efficient MLA Decoding Kernel for Hopper GPUs**
Optimized for variable-length sequences, battle-tested in production
🔗 <a href="https://github.com/deepseek-ai/FlashMLA"><b>FlashMLA GitHub Repo</b></a>
✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ Performance: 3000 GB/s memory-bound | BF16 580 TFLOPS compute-bound on H800
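The paged KV cache works like virtual memory for attention state: each sequence's keys and values live in fixed-size blocks (size 64 here), and a per-sequence block table maps logical block indices to physical blocks. A minimal indexing sketch, assuming a hypothetical `locate` helper (this is not FlashMLA's actual API):

```python
# Minimal sketch of paged KV-cache indexing (hypothetical, not FlashMLA's API).
# With a block size of 64, a sequence's KV entries are stored in fixed-size
# blocks, and a block table maps logical block indices to physical blocks.
BLOCK_SIZE = 64

def locate(block_table, token_pos):
    """Return the (physical_block, offset) holding a token's KV entry."""
    logical_block, offset = divmod(token_pos, BLOCK_SIZE)
    return block_table[logical_block], offset

# A 150-token sequence occupies ceil(150/64) = 3 blocks; the block table
# records which physical blocks were allocated (their order is arbitrary).
block_table = [7, 2, 9]
print(locate(block_table, 130))  # token 130 -> logical block 2, offset 2 -> (9, 2)
```

Because blocks are allocated on demand, variable-length sequences waste at most one partially filled block each instead of padding to the longest sequence in the batch.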
### Day 2 - [DeepEP](https://github.com/deepseek-ai/DeepEP)
Excited to introduce **DeepEP** - the first open-source EP communication library for MoE model training and inference.
🔗 <a href="https://github.com/deepseek-ai/DeepEP"><b>DeepEP GitHub Repo</b></a>
✅ Efficient and optimized all-to-all communication
✅ Both intranode and internode support with NVLink and RDMA
✅ High-throughput kernels for training and inference prefilling
✅ Low-latency kernels for inference decoding
✅ Native FP8 dispatch support
✅ Flexible GPU resource control for computation-communication overlapping
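At the heart of EP communication is the "dispatch" step: after the router assigns each token to experts, every rank sends its tokens to the ranks that host those experts, which is an all-to-all exchange. A conceptual sketch under assumed names (`dispatch`, a contiguous expert-to-rank layout); this illustrates the routing pattern only, not DeepEP's actual kernels or API:

```python
# Conceptual sketch of MoE all-to-all "dispatch" (hypothetical, not DeepEP's
# API): each rank sends its routed tokens to the rank owning the chosen expert.
from collections import defaultdict

def dispatch(tokens_per_rank, experts_per_rank):
    """tokens_per_rank[r] = list of (token, expert_id) routed on rank r.
    Returns recv[r] = the (src_rank, token, expert_id) triples each rank
    receives for its local experts."""
    recv = defaultdict(list)
    for src, routed in enumerate(tokens_per_rank):
        for token, expert in routed:
            dst = expert // experts_per_rank  # rank hosting this expert
            recv[dst].append((src, token, expert))
    return dict(recv)

# 2 ranks, 2 experts per rank: experts 0-1 live on rank 0, experts 2-3 on rank 1.
routed = [
    [("a", 0), ("b", 3)],  # tokens routed on rank 0
    [("c", 2), ("d", 1)],  # tokens routed on rank 1
]
print(dispatch(routed, experts_per_rank=2))
# rank 0 receives ("a", 0) and ("d", 1); rank 1 receives ("b", 3) and ("c", 2)
```

In a real system this exchange runs over NVLink within a node and RDMA across nodes, and overlapping it with expert computation is exactly what the flexible resource-control kernels above are for.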
### Ongoing Releases...
## 2024 AI Infrastructure Paper (SC24)