@iScienceLuvr
CEO @SophontAI | PhD at 19 (2023) | Founder, ex CEO @MedARC_AI | ex Research Director Stability AI | Biomed. engineer @ 14 | TEDx talk➡bit.ly/3tpAuan
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

Introduces two openly licensed datasets:
1. SwallowCode (≈16.1 billion tokens) refines Python snippets from The-Stack-v2
2. SwallowMath (≈2.3 billion tokens) enhances Finemath-4+ by removing boilerplate, …