# [Curator](https://github.com/bespokelabsai/curator)

Curator is an open-source tool for curating large-scale datasets for post-training LLMs.

Curator was used to curate [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k), a reasoning dataset used to train the fully open reasoning model [Bespoke-Stratos](https://www.bespokelabs.ai/blog/bespoke-stratos-the-unreasonable-effectiveness-of-reasoning-distillation).

### Curator supports:

- Calling the DeepSeek API for scalable synthetic data curation
- Easy structured data extraction
- Caching and automatic recovery
- Dataset visualization
- Saving $$$ using batch mode

### Call the DeepSeek API with Curator easily:

# Get Started here

- [Colab Example](https://colab.research.google.com/drive/1Z78ciwHIl_ytACzcrslNrZP2iwK05eIF?usp=sharing)
- [GitHub Repo](https://github.com/bespokelabsai/curator)
- [Documentation](https://docs.bespokelabs.ai/)
- [Discord](https://discord.com/invite/KqpXvpzVBS)