2025-08-23 03:20:49

gpt-4-base w/ alignment faking prompt is often incoherent, but when coherent it's pretty scary and thinks about gradient hacking in more depth than opus 3 (though it often fails to execute on its plan bc it struggles to maintain coherent intentions over longer contexts)


