Scaling of Search and Learning: A Roadmap to Strawberry fields o1, o3 (forever?)
Zhiyuan Zeng, Qinyuan Cheng, Zhangyue Yin, Bo Wang, Shimin Li, Yunhua Zhou, Qipeng Guo, Xuanjing Huang, Xipeng Qiu, Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective, arXiv:2412.14135v1
Abstract: OpenAI o1 represents a significant milestone in Artificial Inteiligence, which achieves expert-level performances on many challanging tasks that require strong reasoning this http URL has claimed that the main techinique behinds o1 is the reinforcement learining. Recent works use alternative approaches like knowledge distillation to imitate o1's reasoning style, but their effectiveness is limited by the capability ceiling of the teacher ...