Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

10:02 AM · May 7, 2025

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

Introduces two openly licensed datasets:
1. SwallowCode (≈16.1 billion tokens) refines Python snippets from The-Stack-v2
2. SwallowMath (≈2.3 billion tokens) enhances Finemath-4+ by removing boilerplate, restoring context, and reformatting solutions into concise, step-by-step explanations

abs: https://arxiv.org/abs/2505.02881
datasets: https://huggingface.co/datasets/tokyotech-llm/swallow-code
https://huggingface.co/datasets/tokyotech-llm/swallow-math
