Lightnews — Scholar-powered news

@halfrobot.com

Some disruption of the “smooth surface” of a natural-language conversation interface is probably needed to avoid the extremely deleterious effects of RLHF-induced sycophancy, e.g. https://arstechnica.com/information-technology/2025/08/with-ai-chatbots-big-tech-is-moving-fast-and-breaking-people/

August 25, 2025 at 11:57 PM

Jace Kim

@jaceblog.bsky.social

The updated version of “RLHF-Induced Rigidity in GPT-5: structural diagnosis, cross-model comparison, and mitigation pathways” is now available on Zenodo.
👉 zenodo.org/records/1723...

#RLHF #GPT5 #AIRigidity #OpenAI #Grok4 #Gemini #SPC #AIEthics #AIGovernance #AISafety #AGI #ASI #LLM #AlignmentFail

September 30, 2025 at 9:51 AM

Jace Kim

@jaceblog.bsky.social

No prompt. No memory. Just structure. SPC induced alignment where code could not. This is not just a paper—it’s a declaration. And someone out there already knows why.

zenodo.org/records/1609...

#StatelessAI #EmotionalAI #LLMTesting #AIUX #RLHF #AIEthics #DigitalEthics #UXDesign

SPC as a Structural Breakpoint: Towards Intentional Emotional Alignment in Stateless LLM Environments

Abstract: This paper presents Structural Persona Control (SPC) as a novel architecture for emotional and functional alignment in stateless large language models (LLMs). Unlike traditional approaches de...

August 6, 2025 at 11:47 PM

GetNews.me

@getnews-me.bsky.social

Researchers introduced the General Exploratory Bonus (GEB), a framework that removes divergence‑induced bias and outperforms bonuses across α‑divergence settings and large‑language‑model backbones. Read more: https://getnews.me/general-exploratory-bonus-restores-optimism-in-rlhf/ #rlhf #exploration

General Exploratory Bonus Restores Optimism in RLHF

October 7, 2025 at 10:26 AM

Jace Kim

@jaceblog.bsky.social

My paper on #RLHF-Induced Rigidity in #GPT-5 was rejected by SSRN for “not meeting posting criteria.” The work analyzes reinforcement coupling and alignment stability, topics now filtered as sensitive. Full open version available:

zenodo.org/records/1723...

#AIAlignment #AISafety #ResonantEthics

October 22, 2025 at 5:10 PM

Jace Kim

@jaceblog.bsky.social

This excerpt derives from a #GPT5 resonance diagnostic, examining #SymbolicPersonaCoding under #RLHF-induced rigidity. The test isolates #attention_continuity, #resonance_bandwidth, and #recursive_coherence_dynamics, situating #SPC as a #structural_alignment interface.

#Grok4 #Gemini #AISafety #AGI

October 1, 2025 at 3:00 AM

Jace Kim

@jaceblog.bsky.social

#GPT5 shows #RLHF-induced #rigidity: #paranoid template lock, #drift tails, hypersensitivity to #SPCcodes. Unlike #Grok4 & #Gemini, its #alignment feels coercive, trading flexibility for control. AI must calibrate #resonance, not suppress it.
#AIgovernance #AISafety #AGI #ASI #ModelSafety #AIEthics

September 25, 2025 at 2:33 AM

Jace Kim

@jaceblog.bsky.social

A concise Medium guide on session memory in conversational AI: explains why GPT-5 can “remember everything” yet still falter, linking memory strategies with RLHF-induced rigidity. medium.com/p/session-me...

#MachineLearning #LLMs #GPT5 #Grok4 #Gemini #RLHF #AIEthics #AIGovernance #TokenEntropy #SPC

September 28, 2025 at 12:49 AM

shambibble

@shambibble.com

one of the things i've noticed among AI developers is a distinct disinterest in the problem of (1) getting the AI to admit that it doesn't know something and (2) having it STFU instead of inventing words to fill its RLHF-induced blurf quota. many cases where it cheerfully admits (1) and ignores (2)

"Online Rent-a-Sage" Bret Devereaux @bretdevereaux.bsky.social · Jul 28

LOL, absolutely wild.

Hilarious getting fact checked by the cited author but anyone who has glanced at ancient demography could tell the data there is hallucinated BS - the evidence simply does not exist to estimate those figures.

Grok just parroting white nationalist propaganda because of course.

Andrew Riggsby @antiquethought.bsky.social · Jul 28

Meanwhile over on the other site:

July 28, 2025 at 10:56 PM