Harvey Lederman
@harveylederman.bsky.social
1.7K followers 390 following 230 posts
Professor of philosophy UTAustin. Philosophical logic, formal epistemology, philosophy of language, Wang Yangming. www.harveylederman.com
harveylederman.bsky.social
People are incredibly good at predicting each other! Seem to use “folk psych” concepts to do it
harveylederman.bsky.social
properties it talks about instantiated by the relevant systems whenever it is well-predicted by the theory? I don't have a confident answer to this question; I feel pressure in both directions, but you seem confident that the q should be answered one way, and I'm just not sure!
harveylederman.bsky.social
or having full information about mechanism. On the first point: it's a really hard question how we should think about high-level properties! For instance, statistical mechanics is a very successful theory. Is that because the properties it talks about are realized in some deep way? Or are the...
harveylederman.bsky.social
On the second point: lots of the time in science, we aren't certain something is true (e.g. is there dark matter?), but we have good evidence that it is. Interpretationism allows that we can have good evidence that a system has beliefs and desires, even without checking every possible theory...
harveylederman.bsky.social
criticism applies? Stepping back: your initial reaction was "isn't this a reductio?" My response was: even if interpretationism is false, we still learn interesting things about LLMs by taking it seriously.
harveylederman.bsky.social
That's an interesting reaction. I thought we were saying something more along the lines of "the study of attitudes as interpretationists understand them is useful". And here the thought was that it's a model of what high-level properties we might look for to predict LLM behavior. So not sure the...
harveylederman.bsky.social
Oh, I see, you are making a smaller point with the "predict" claim, where we use this word as equivalent to "entail" (i.e. interpretationism entails that ELIZA has no beliefs). I think that's a reasonably standard term of art, but I'm sorry that it was confusing!
harveylederman.bsky.social
Eh? We meant "sufficiently good theory" and "predict sufficiently well" to be equivalent. Also, why would the attribution of beliefs and desires not make testable predictions? Certain patterns of behavior are not rational in light of certain profiles of beliefs and desires; they are ruled out.
harveylederman.bsky.social
not obnoxious in the least! super helpful -- i'm embarrassed bc i think i even read one of these before but my mind is sievelike these days
harveylederman.bsky.social
Thanks! We'll think more...and thanks for the reading list -- sorry we didn't get there before this draft!
harveylederman.bsky.social
I guess Greco is also in the “relative to purposes” camp, though he is a contextualist, so attributions are true or false in context (without needing relativization)
harveylederman.bsky.social
great! I expect people will have different reactions to the terminology and some will say “objective” is a good term here, but I’ll think about whether to change — good we agree on the actual question being interesting (and a feature of our view not yours)
harveylederman.bsky.social
I guess this is telling me we're developing different views? I don't want sensitivity to this diversity of goals, because I want to say that someone either believes p or doesn't, not that it's relative to some other thing (a purpose). (Maybe you want that, too, but you're a contextualist?)
harveylederman.bsky.social
Crucially our view is not like that: on our view an ascribee has / doesn't have beliefs (simpliciter). I think that's an important distinction. (Again whether or not this was Dennett's view.)
harveylederman.bsky.social
If I said "absolute" instead of "objective" would that make you happier? Whether or not this was his view, Dennett is sometimes characterized as thinking that belief-attribution is relative to a person or a purpose. You don't just have / not have beliefs; you have them relative to an attributor's purpose...
harveylederman.bsky.social
Really appreciate the comments and references — sorry we missed these. please do self promote (by email?) other things you’d like us to check out!
harveylederman.bsky.social
Thanks for engaging Devin!! On the first point — I have to read and think. On the second point, not sure I get the move about stance or perspective. (Or rather, I feel you get what we’re saying?) and on the bigger point: do you have the same objection to best systems analyses of laws?
Reposted by Harvey Lederman
gamingthepast.bsky.social
Wow @utaustin.bsky.social, maybe @utaustinihs.bsky.social, or maybe there are other ways to shout out to the Department of East Asian Studies and the Department of History, but however that may be, they are doing some amazing stuff with in-house games for their JapanLab! Just found even more stuff here.
Projects — JapanLab
www.utjapanlab.com
harveylederman.bsky.social
Thanks for the comment! I’ll be curious what you think if you have a chance to read some of the paper. We don’t take a stand on the truth of interpretationism. In section 2.2 we explicitly discuss why interpretationism matters even if it’s not the true theory of belief and desire
harveylederman.bsky.social
This is a draft paper. We very much welcome feedback and discussion! It builds on my earlier work with @kmahowald.bsky.social, and we’re indebted to Kyle, Murray Shanahan, and quite a few others for discussion and commentary (though they're not responsible for the views in the paper). 7/7
harveylederman.bsky.social
We briefly assess the consequences of attributing interpretationist propositional attitudes (e.g. for copyright, welfare, safety, etc.). 6/7
harveylederman.bsky.social
In addition, we critically assess the view that LLMs merely “role play” or “simulate” minds. We argue that clarity is needed on what the empirical content of this view is, by contrast to one (like ours) on which LLMs do have (interpretationist) propositional attitudes. 5/7
harveylederman.bsky.social
Third claim: what we call the "HHH+0 framework" -- LLM instances want to be honest, helpful, harmless, and, in addition, may have “zero-shot” desires, acquired from the system prompt. The notion of zero-shot desires is new to the paper, and a key part of our picture. 4/7
harveylederman.bsky.social
Second claim: interpretationists have reason to think these instances have desires. Along the way we highlight a key criterion for interpretationist desire: taking a wide array of means to rationally promote a small range of ends in an array of environments. 3/7
harveylederman.bsky.social
First claim: the appropriate locus of “psychology” in LLMs is not the model but the runtime instance. This point has been in the ether, but we give new arguments for it and articulate our own version. 2/7