Tilman Bayer
tilmanbayer.bsky.social
AI, data, Wikipedia, co-maintainer of @wikiresearch.bsky.social
Not to engage in victim blaming (OpenAI surely invites this kind of mistake), but it's almost 2026 and people should know better than to run such a query without web search/reasoning.
With the same prompt, 5.2 Thinking (with "Extended Thinking") gives me 9 books, all real
chatgpt.com/share/6940a6...
December 16, 2025 at 12:28 AM
Big fan of ACX's aspirin vs. warfarin example www.astralcodexten.com/p/webmd-and-...
December 14, 2025 at 1:20 AM
Interesting to see the manifesto's extensive reference to Christopher Alexander, given his influence on past decades of software *development* (which, yes, is not the same as software *design*). en.wikipedia.org/wiki/Christo...
December 5, 2025 at 10:01 PM
"applies even more to social media than it did to TV" - which research result(s) are you referring to here, exactly?
The only related statements I can see in the Noy and Rao paper seem inconsistent with such claims that social media exacerbates things relative to cable TV:
November 25, 2025 at 8:21 PM
Wikipedia editors found out the hard way years ago that the former reputation of @rollingstone.com for decent non-cultural reporting is a thing of the past.

Articles such as this are now officially unwelcome as a source, by unanimous community consensus: en.wikipedia.org/wiki/Wikiped...
November 25, 2025 at 8:28 AM
5.1 Thinking argues that the regulators make the "online platform" classification at the level of "services" (recital 15), after which "Online-platform-specific rights [...] aren’t limited to public content" on that service chatgpt.com/share/691ba8...
November 17, 2025 at 11:19 PM
Apropos of this: half a year later, what's your overall sense of whether, and how much, this incident has helped increase the reach and/or reputation of Signal?
October 14, 2025 at 6:55 AM
I mean, the appendix describes in detail how they measured how much Theory of Mind a user has. E.g. saying "hello" or "thanks" to your chatbot should increase your score.
That said, it's amusing that the scoring was done by AI ("Language Model as a Research Assistant (LMRA; Eloundou et al.; 2024)").
September 25, 2025 at 6:09 PM
September 12, 2025 at 2:28 PM
Promoting 'Adult Content' on Bluesky, eh? 😉
Hope you aren't going to travel to Mississippi or the UK anytime soon ...
September 11, 2025 at 11:41 PM
That's a cool pfp idea!
September 8, 2025 at 12:26 AM
That's false: the paper explicitly states on p. 4 that interstates, freeways, and expressways were excluded.
In SF, that would mean that I-280 and U.S. Route 101 (which Waymo indeed still only does test rides on, although "doesn't go on" is false too) are not included in the comparison.
August 24, 2025 at 8:52 PM
...and applied it to the WildChat dataset... www.phylliida.dev/modelwelfare...
August 15, 2025 at 2:12 AM
How does this compare to Anthropic's Clio data? www.anthropic.com/research/clio
(or, where in these top 10 use cases might the company hide such chats 😉)
August 15, 2025 at 12:19 AM
I believe there is lots of potential there.
But it's rather peculiar that Kaurov and Oreskes highlight the Black Spatula Project as a concrete example. It launched with great fanfare in December and, according to its GitHub page, appears to have seen basically zero activity since.
July 26, 2025 at 6:11 PM
By the way, do you happen to have any idea what kind of "law enforcement requirements related to cyber-bullying prevention" Tea might be blaming here? www.teaforwomen.com/cyberincident
July 26, 2025 at 5:51 PM
I guess the fact that they apparently looked at developer fixed effects doesn't really assuage you ...
July 11, 2025 at 6:02 AM
By the way, which visionary 1990s views by Burda are you gushing about here, exactly ("breathtaking")?
Context: ...
July 2, 2025 at 5:33 PM
This thread fails to mention that the release (even though it is based on public-domain material) prohibits commercial use and comes with other unusual terms (which open-source project wants to hire lawyers to determine whether your lawyers would agree that it is "unaffiliated with commercial ... intent"?)
June 12, 2025 at 11:56 PM
4) That's why WaPo says your paper has "implications for the policy debate swirling around AI and copyright" (despite your protestations that it is "not a tech policy writeup"), e.g. re the UK bill right now. And why right after quoting pro-fair use arguments it quotes you as a counterpoint.
June 12, 2025 at 3:15 AM
3) The introduction makes it clear that the purpose of the paper is not merely to provide a new dataset, but also to shift policy discussions by finding a way to accede to the copyright-maximalist demands of IP owners (prohibiting training without "consent").
June 12, 2025 at 3:15 AM
1) You come down on the anti-fair-use side right at the start of the abstract, embracing the "unlicensed" --> "infringement" / "ethical concerns" shortcut favored by copyright industry advocates.
June 12, 2025 at 3:15 AM
Great to see a systematic evaluation of such ideas.
Small correction: It is not true that CORE-Bench (Siegel et al.) "primarily focused on ... computer-science disciplines alone". Medical and social science papers made up more than half of its dataset.
May 25, 2025 at 1:46 AM
I mean, BOLD was in fact used by Meta to debias Llama 2, e.g. successfully reducing LLMs' lamentable anti-male bias regarding the US entertainment industry 😉 ("more positive sentiment towards American female actresses than male actors")
arxiv.org/pdf/2307.09288
May 5, 2025 at 11:50 PM
The "Slaughterbots" scenario focused on autonomous decision-making, expecting this to make drone swarms "scalable weapons of mass destruction" spectrum.ieee.org/why-you-shou...
That hasn't come to pass. The current labor-intensive drone war in Ukraine still requires lots of human pilots for FPVs etc.
April 30, 2025 at 5:57 AM