Search

Maosong Sun

Process reinforcement through implicit rewards.
Advancing LLM Reasoning Generalists with Preference Trees
Empowering private tutoring by chaining large language models.
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Sparse Low-rank Adaptation of Pre-trained Language Models

Published with Wowchemy — the free, open source website builder that empowers creators.