But +$6k for running a 400 prompt test? lol
Branding this as 'reasoning' is... You know what, sure. I'm too tired to fight it today.
It's fine. Not perfect, not awful. Could write a short essay on my issues with it, but it's not egregious or offensive.
That said, oh boy, I get why the "I don't read books" SF bro effective altruism crowd think it's A MASTERPIECE, and I hate it for that
Millions of requests sent daily, obfuscated by shell identities. It's honestly a miracle anyone can ever backtrace bad actors.
The way it handles Sozin and Roku's relationship is straight up terrible. Sozin himself is stripped of any redeeming qualities, his companions are boring clichés, and his role in the story feels almost shoehorned.
Its diploid chromosome numbers range from 10 to 70, and it is considered "taxonomic chaos".
They challenge our definition of species and make cute noises.
Even when all the subordinate functions went dark, it just playacted like it was getting updates.
Amusing, but not a great sign for a future of codebots
Doubt this will ever be a widely acceptable way to test, but it's an impressive win for tooling
Yeah I too wish I could switch to an ultra wide, but have you ever tried opening an app as a remote desktop connection? Barely knows how to scale full screen
15 years ago, I could gift someone physical media and not worry about platforms.
Gadgets and one-off tech toys were less "what ecosystem" roulette. Could gift grandparents a Chromecast, or your aunt a cute Bluetooth speaker.
Benchmarking models based on non-oracle and expert tests.
I want to see which LLMs can parse documentation, find typos in code, scrub data and adhere to defined output instructions.
The fact that there are not 30 different benchmarks from different organizations in medicine, in law, in advice quality, etc. is a big shame. People are using these systems for those things anyway & we don’t know the implications.
I hated it so much, but it was the foundation for CS degrees back then. It pretty much derailed my enthusiasm for coding for a decade and convinced me for just as long I didn't actually want to code.
Every prompt I give it, even if it's a backend database tool, it ends up building some kind of React web frontend.
Just gave up and leaned into it. Use it for MVP porting of local scripts into small web apps 🤷
Adoption trends and concerns from companies are a revelation.
People are less worried about safety, much more worried that almost all the tools being sold/pitched are low quality.
I'm still baffled by how damn bloated and web-based modern stacks are.
Not bad, just a strange alien world where everything is JavaScript
Blind has not helped the problem of people understanding what "total comp" actually means.
Likely overdue, but means my saltier takes will need to be reined in a bit