Several frontier AI models show signs of scheming. Anti-scheming training reduced misbehavior in some models. Models know they're being tested, which complicates results. New joint safety testing from ...
Correspondence to Tahlia Alsop, School of Health and Rehabilitation Sciences, The University of Queensland, St Lucia, QLD 4072, Australia; t.alsop{at}uq.edu.au. The WHO has called for action to ...