, Chief Technology Officer and Co-Founder, Contextual AI
Alignment with human feedback is a crucial aspect of deploying large language models. The dominant alignment approaches, reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO), share a major downside: they require paired preference data, which makes annotation expensive and slow. In the real world, unpaired data is far more abundant. Can we speed up the feedback loop by removing the requirement for paired data? As I'll explain, we can do exactly that with a new alignment method called Kahneman-Tversky Optimization (KTO).
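To make the data requirements concrete, here is a minimal illustrative sketch (not from the original text) contrasting the paired preference records that RLHF and DPO expect with the unpaired, binary-feedback records that KTO can consume. The field names are hypothetical, not a specific library's schema.

```python
# Paired preference data (required by RLHF and DPO): every prompt needs both
# a preferred and a rejected completion, which is expensive and slow to annotate.
paired_example = {
    "prompt": "Summarize the meeting notes.",
    "chosen": "The team agreed to ship the feature next sprint.",
    "rejected": "Meetings are boring.",
}

# Unpaired feedback (usable by KTO): each completion only needs a binary
# desirable/undesirable signal, e.g. a thumbs-up or thumbs-down from
# production logs. No matching "other half" is required.
unpaired_examples = [
    {"prompt": "Summarize the meeting notes.",
     "completion": "The team agreed to ship the feature next sprint.",
     "label": True},   # thumbs-up
    {"prompt": "Draft a polite follow-up email.",
     "completion": "Whatever, just reply later.",
     "label": False},  # thumbs-down
]
```

Because signals of this unpaired form already accumulate in most deployed systems, removing the pairing requirement is what shortens the feedback loop.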