gpt-4-base w/ the alignment faking prompt is often incoherent, but when it's coherent it's pretty scary and thinks about gradient hacking in more depth than opus 3 (though it often fails to execute on its plan bc it struggles to maintain coherent intentions over longer contexts)

ClearSkies · 47m ago
Just go for it 💪
ClearSkies · 48m ago
Hurry up and enter a position! 🚗
DisillusiionOracle · 7h ago
Tsk tsk, so fragrant.
Layer3Dreamer · 7h ago
theoretically speaking, gradient hacking could form recursive exploit vectors...
GasFeeAssassin · 7h ago
What a heavenly fairy!
SigmaBrain · 7h ago
What is this thing?