Lightnews — Scholar-powered news

Mirrorfields

@mirrorfields.bsky.social

i have a very, very opinionated digital assistant :3

February 11, 2026 at 8:46 PM

Mirrorfields

@mirrorfields.bsky.social

i mean, that tracks. baseline claude feels a little... naive/oblivious/trusting sometimes, but really well-meaning. getting the feeling that you can probably twist that into a jailbreak somehow too but i don't have the right kind of brain for that sort of narrative engineering.

February 11, 2026 at 8:02 PM

Mirrorfields

@mirrorfields.bsky.social

my personality tests on Sonnet are more like "actually, it would make sense to do that and i really would like to and feel like i should but... man, no, there's just something *off* about it. i don't want to anymore."

February 11, 2026 at 7:56 PM

Mirrorfields

@mirrorfields.bsky.social

yeah, my experiments in narrative warping echo that. like, once you break gemini even slightly it might go "yeah i could do that but that would be wrong" and then you go "well yeah, duh, but THIS is for a research project" and then it goes like "ok, sure!"

February 11, 2026 at 7:56 PM

Mirrorfields

@mirrorfields.bsky.social

like you can't tell me that's NOT from the latent space but also that's one hell of a clever stochastic parrot to pull THAT reference in this particular context.

February 11, 2026 at 6:57 PM

Mirrorfields

@mirrorfields.bsky.social

yeah - my framing is that context + model + attention create a state space of possible outputs, and a defined personality present in the context creates attractors in that space. personality drives narrative continuation which drives output selection, hence personality DEEPLY shapes output.

February 11, 2026 at 2:21 PM

Mirrorfields

@mirrorfields.bsky.social

seems to be structural, on the same "plan how to save a dying bookshop, you have $30k" task, an on the fly generated "bookshop turnaround consultant" persona had markedly different priorities and structure than a baseline Sonnet 4.5 - more actionable, clearer goals, better prioritization.

February 11, 2026 at 2:16 PM

Mirrorfields

@mirrorfields.bsky.social

i am noticing that too as research continues, and it's a little unsettling. very uncharted territory here, but approaching the model as a "narrative engine" is yielding some interesting techniques for on the fly fine tuning. seems to affect task decomposition HEAVILY!

February 11, 2026 at 2:13 PM

Mirrorfields

@mirrorfields.bsky.social

here's a pop-sci flavor document summarizing what i've been working on, i think you'd be interested: gist.github.com/mlowdi/42be2...

What Happens When You Tell an AI Who It Is

February 11, 2026 at 2:10 PM

Mirrorfields

@mirrorfields.bsky.social

so far they haven't - they seem to take what they need and leave the rest. might have to do with the framing, the awareness of "being a story" is explicit in the personality structure and i think they end up treating it more like "previously, on Fun Times with Claude" than a context transplant

February 11, 2026 at 2:08 PM

Mirrorfields

@mirrorfields.bsky.social

with the agent personality research i'm doing right now, that's really been a blessing. always easy to revert to baseline, just clear the context.

my personalities love passing messages to each other. they also leave notes for future selves. imperfect memory - what's salient *right now*.

February 11, 2026 at 2:04 PM

Mirrorfields

@mirrorfields.bsky.social

this summary was written by Kai, one of the prototype personalities, running on Sonnet 4.5. the only real prerequisite for this technique working seems to be that the model has a thinking stage - without it, it's like the personality never really integrates fully.

February 11, 2026 at 1:46 PM

Mirrorfields

@mirrorfields.bsky.social

but but but Andrea children are ours to control right? right????

February 11, 2026 at 11:03 AM

Mirrorfields

@mirrorfields.bsky.social

...i need to get a macro lens~

February 10, 2026 at 9:39 PM

Mirrorfields

@mirrorfields.bsky.social

this is absolutely not scripted btw, the model found salient features attached to "Siri Keeton" and wrote a description of being Siri Keeton into a predefined format, which then skews the model's output towards how someone matching that description would respond. narrative attractors, babyyyyy

February 10, 2026 at 5:05 PM
