RLAIF

2024/9/6 7:48:53

文献阅读:RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

文献阅读:RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback 1. 文章简介2. 方法介绍 1. 整体方法说明 3. 实验结果 1. RLHF vs RLAIF2. Prompt的影响3. Self-Consistency4. Labeler Size的影响5. 标注数据的影响 4. 总结 & 思考 文…