Mirror of https://github.com/deepseek-ai/awesome-deepseek-integration.git, synced 2025-07-26 00:55:01 -04:00
add curator

Commit 1547c531a2 (parent bd3ef90cc1)
3 changed files with 71 additions and 0 deletions
docs/curator/README.md (normal file, 30 lines added)
# [Curator](https://github.com/bespokelabsai/curator)
Curator is an open-source tool for curating large-scale datasets for post-training LLMs.

Curator was used to curate [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k), a reasoning dataset used to train [Bespoke-Stratos](https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness-of-reasoning-distillation), a fully open reasoning model.
### Curator supports:

- Calling the DeepSeek API for scalable synthetic data curation
- Easy structured data extraction
- Caching and automatic recovery
- Dataset visualization
- Saving $$$ using batch mode
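The caching and automatic recovery idea can be illustrated with a small standard-library sketch. This is not Curator's implementation: `cache_key`, `run_with_cache`, and the `call_model` callback are hypothetical names, and a real pipeline would pass in a function that actually calls the DeepSeek API. The sketch keys each request by a hash of model and prompt and saves every response as soon as it arrives, so a rerun after a crash only sends the requests that had not finished.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def cache_key(model: str, prompt: str) -> str:
    """Return a stable hash identifying one (model, prompt) request."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_with_cache(
    prompts: list[str],
    model: str,
    call_model: Callable[[str, str], str],  # hypothetical: wraps a real API call
    cache_dir: Path,
) -> list[str]:
    """Return one response per prompt, reusing responses already saved on disk."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    results: list[str] = []
    for prompt in prompts:
        path = cache_dir / f"{cache_key(model, prompt)}.json"
        if path.exists():
            # Finished in an earlier (possibly crashed) run: skip the API call.
            cached = json.loads(path.read_text(encoding="utf-8"))
            results.append(cached["response"])
            continue
        response = call_model(model, prompt)
        # Write to a temp file, then rename, so a crash mid-write never
        # leaves a half-written cache entry behind.
        tmp_path = path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps({"response": response}), encoding="utf-8")
        tmp_path.replace(path)
        results.append(response)
    return results


if __name__ == "__main__":
    calls: list[str] = []

    def fake_model(model: str, prompt: str) -> str:
        # Hypothetical stand-in for a real DeepSeek API call.
        calls.append(prompt)
        return prompt.upper()

    with tempfile.TemporaryDirectory() as tmp:
        first = run_with_cache(["a", "b"], "deepseek-chat", fake_model, Path(tmp))
        # The rerun only calls the model for the new prompt "c".
        second = run_with_cache(["a", "b", "c"], "deepseek-chat", fake_model, Path(tmp))
        print(first, second, calls)  # ['A', 'B'] ['A', 'B', 'C'] ['a', 'b', 'c']
```

Keying on the full request (model and prompt) means changing either one triggers a fresh call instead of returning a stale cached answer.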
### Call the DeepSeek API with Curator easily:

### Get started
- [Colab example](https://colab.research.google.com/drive/1Z78ciwHIl_ytACzcrslNrZP2iwK05eIF?usp=sharing)
- [GitHub repo](https://github.com/bespokelabsai/curator)
- [Documentation](https://docs.bespokelabs.ai/)
- [Discord](https://discord.com/invite/KqpXvpzVBS)