2024-12-07
Is this OpenAI reinforcement fine-tuning demo using o1-mini because it shows more dramatic gains than the regular o1 would with the same process? I.e., perhaps there is not much improvement from fine-tuning o1 on your own data.
OpenAI expands its Reinforcement Fine-Tuning Research Program to let developers create expert models in specific domains with very little training data
the repo we used to train Tulu 3. Expanding reinforcement learning with verifiable rewards (RLVR) to more domains and with better answer extraction (what OpenAI calls a grader, a [...