![image](https://raw.githubusercontent.com/bespokelabsai/curator/main/docs/Bespoke-Labs-Logomark-Red-crop.png)


# [Curator](https://github.com/bespokelabsai/curator)


Curator is an open-source tool to curate large scale datasets for post-training LLMs. 

Curator was used to curate [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k), a reasoning dataset to train a fully open reasoning model [Bespoke-Stratos](https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness-of-reasoning-distillation).


### Curator supports:

- Calling Deepseek API for scalable synthetic data curation
- Easy structured data extraction
- Caching and automatic recovery
- Dataset visualization
- Saving $$$ using batch mode

### Call Deepseek API with Curator easily:

![image](https://pbs.twimg.com/media/GiLHb-xasAAbs4m?format=jpg&name=4096x4096)

# Get Started here

- [Colab Example](https://colab.research.google.com/drive/1Z78ciwHIl_ytACzcrslNrZP2iwK05eIF?usp=sharing)
- [Github Repo](https://github.com/bespokelabsai/curator)
- [Documentation](https://docs.bespokelabs.ai/)
- [Discord](https://discord.com/invite/KqpXvpzVBS)