Guilherme Almeida
@almeida2808.bsky.social
220 followers 210 following 27 posts
XJur.
Reposted by Guilherme Almeida
lsolum.bsky.social
Almeida on Dual Character Concepts, buff.ly/QjZqcuZ - Guilherme Almeida has posted A Defense of Dual Character Concepts in Legal Philosophy and Beyond on SSRN.
almeida2808.bsky.social
I just posted a new pre-print where I argue that dual character concepts are something new and interesting in legal philosophy and beyond and that they can't be reduced to ambiguity, prototypes, or metalinguistic negotiations. Comments are very welcome! papers.ssrn.com/sol3/papers....
A defense of dual character concepts in legal philosophy and beyond
Recent work in jurisprudence claimed that central legal concepts, such as that of LEGAL VALIDITY and of a legal RULE have a dual character structure. Moreover,
papers.ssrn.com
Reposted by Guilherme Almeida
xphilosopher.bsky.social
This is a job in political theory/philosophy that is specifically for someone interested in empirical work. Looks perfect for folks doing experimental philosophy
Reposted by Guilherme Almeida
tomerullman.bsky.social
oh wow wow wow. Wow.

It's officially out, after many years:

"Loopholes: A window into value alignment and the communication of meaning"

authors.elsevier.com/c/1k~vV_Ebvv...

Read on for a brief summary, but I encourage you to read the thing itself.
almeida2808.bsky.social
That’s a great point! The study also included legal rules, like one prohibiting shooting at deer. The design doesn’t allow us to look at the two sets of rules separately, but that would be a good direction for future research.
almeida2808.bsky.social
Now out @ Journal of Research in Personality (free until June): authors.elsevier.com/a/1k%7EqWL4L...

We found evidence that trait empathy correlates with purposivism in rule violation judgments. Also: most people share a single concept that seems to have a dual character structure.
Reposted by Guilherme Almeida
lakens.bsky.social
The long awaited results of the Brazilian Reproducibility Network are in! Their final sample consists of 97 replications of 47 studies. The only coherent measure of replication, p<.05, shows a replication rate of 19%. www.biorxiv.org/content/10.1...
almeida2808.bsky.social
But the usual caveats should still be in place. For instance, even with temperature calibration, models still showed diminished diversity of thought when compared to humans.

Comments are more than welcome!
almeida2808.bsky.social
Overall, this suggests that the models are doing something more than mere memorization and that we could potentially learn about the likely reactions humans would have to novel stimuli by looking at LLM responses. 13/14
almeida2808.bsky.social
Interestingly, LLMs diverge here. GPT-4o and Llama 3.2 90b were not affected by the time pressure manipulation, but Claude 3 and Gemini Pro were. Moreover, the latter were similar to humans in that they relied more on text under forced delay than under time pressure. 12/14
almeida2808.bsky.social
The cool thing, of course, is that you can't really put LLMs under time pressure or forced delay (at least not with the public APIs). Thus, we're simply telling the model either that it should respond within 4 seconds or that it must wait at least 15 seconds. 11/14
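A minimal sketch of how such a manipulation could be attached to a vignette as plain text. The instruction wording, the condition names, and the `build_prompt` helper are hypothetical illustrations, not the prompts used in the study.

```python
# Hypothetical instruction texts: only the 4-second and 15-second figures
# come from the post; the exact wording is an assumption.
TIME_INSTRUCTIONS = {
    "time_pressure": "You must respond within 4 seconds.",
    "forced_delay": "You must wait at least 15 seconds before responding.",
}


def build_prompt(vignette, condition):
    """Prepend the (purely verbal) timing instruction to a vignette."""
    if condition not in TIME_INSTRUCTIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    return f"{TIME_INSTRUCTIONS[condition]}\n\n{vignette}"
```

The point of the sketch is that the model's actual latency is untouched: the manipulation exists only in the text of the prompt.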
almeida2808.bsky.social
That depends at least in part on whether you think that the time pressure manipulation is inducing a bias or not. But you could argue either that competent concept application requires sensitivity to time constraints, or that time constraints elicit bias by restricting processing. 10/14
almeida2808.bsky.social
For Study 2, we decided to try something different. Among humans, we know that time pressure leads to more purposivism and a forced delay leads to more textualism. This could be read as either a context-sensitive feature of the concept of rule or as a bias. What would competent LLMs do? 9/14
almeida2808.bsky.social
Even more surprisingly, the same thing was true for all the models we tested: all of them were less textualist on the new stimuli when compared to the old stimuli. We interpret this to be evidence of conceptual mastery. Even subtle differences between stimuli are tracked by current LLMs. 8/14
almeida2808.bsky.social
The human data was surprising in that it revealed a significant difference between old and new vignettes. We didn't expect there to be any difference, but participants relied on text to a lesser extent on new vignettes when compared to old vignettes. 7/14
almeida2808.bsky.social
To deal with (2), we first collected new data from humans. We then computed the standard deviation in each cell of our 2 (text) x 2 (purpose) x 4 (scenario) x 2 (new vs. old) design (total: 32 cells) and selected the temperature for each model that minimized the mean squared error between SDs. 6/14
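The selection step described in 6/14 can be sketched roughly as follows. This is an illustration under assumptions, not the paper's code: the function names, the dictionary layout, the cell labels, and the use of the sample standard deviation are all hypothetical choices.

```python
from itertools import product
from statistics import stdev

# Enumerate the 2 x 2 x 4 x 2 design cells (labels are placeholders).
DESIGN_CELLS = list(product(
    ["text_violated", "text_not_violated"],
    ["purpose_violated", "purpose_not_violated"],
    ["scenario_1", "scenario_2", "scenario_3", "scenario_4"],
    ["old", "new"],
))


def cell_sds(responses_by_cell):
    """Sample standard deviation of the responses in each design cell."""
    return {cell: stdev(values) for cell, values in responses_by_cell.items()}


def sd_mse(human_sds, model_sds):
    """Mean squared error between human and model per-cell SDs."""
    return sum((human_sds[c] - model_sds[c]) ** 2 for c in human_sds) / len(human_sds)


def select_temperature(human_by_cell, model_by_temp):
    """Return the temperature whose per-cell SDs are closest to the human SDs.

    model_by_temp maps each candidate temperature to a dict of the same
    shape as human_by_cell (cell -> list of numeric responses).
    """
    human_sds = cell_sds(human_by_cell)
    errors = {
        temp: sd_mse(human_sds, cell_sds(by_cell))
        for temp, by_cell in model_by_temp.items()
    }
    best = min(errors, key=errors.get)
    return best, errors
```

In use, one would sample each model several times per cell at every candidate temperature, pass those responses as `model_by_temp`, and keep the returned temperature for the main comparison.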
almeida2808.bsky.social
To address issue (1), we created new vignettes that were supposed to match up perfectly with those in an earlier paper (doi.org/10.1037/lhb0...), changing just the exact words used. If models were just memorizing, they wouldn't be able to generalize to the new stimuli (although that's debatable). 5/14
almeida2808.bsky.social
Temperature is a parameter controlling the extent to which models will prioritize their best answer. Some previous research set temperature to 0, driving models to nearly deterministic results, while other work varies it in somewhat arbitrary ways. We think there is a better way to do this! 4/14
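For intuition, temperature is commonly described as a divisor applied to a model's scores before they are turned into sampling probabilities. The sketch below shows that textbook temperature-scaled softmax; the example logits are made up, and how any particular API handles a temperature of exactly 0 is not covered here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature.

    Lower temperatures concentrate probability on the highest-scoring
    option; higher temperatures flatten the distribution.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate answers.
example_logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(example_logits, 0.1)  # near-deterministic
flat = softmax_with_temperature(example_logits, 2.0)   # more varied sampling
```

With a low temperature almost all of the probability lands on the top answer, which is why repeated runs look so uniform; raising it spreads probability across the alternatives.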
almeida2808.bsky.social
2) Even when the significance patterns are similar, LLMs tend to show diminished diversity of thought (see arxiv.org/abs/2302.07267), that is, different runs of the same model show much less variance in response to a fixed stimulus than a human sample. But LLM APIs allow us to adjust that. 3/14
almeida2808.bsky.social
Previous work has shown that LLMs respond to stimuli in roughly the same way as humans. Usually, those papers compare the responses generated by LLMs with previously published human results. The issue is that LLMs could achieve this result through memorization. So, we need new stimuli. 2/14