Lightnews — Scholar-powered news

spellbanisher.bsky.social

@spellbanisher.bsky.social

This one wasn't too bad

December 1, 2025 at 12:35 AM

spellbanisher.bsky.social

@spellbanisher.bsky.social

Same prompt, but on a foggy car window instead of a chalkboard

November 29, 2025 at 2:13 PM

spellbanisher.bsky.social

@spellbanisher.bsky.social

Doubling the length of a video quadruples its cost. At 10 cents per second for a 10-second video, an 80-minute video (a short movie) would cost about $260,000. That's just for the base model at 720p. If you wanted pro at 1024p, about $1.3 million.

October 15, 2025 at 7:29 PM
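
The arithmetic behind those figures can be sketched as follows. This is a rough check, not OpenAI's pricing formula: it takes the post's premise that doubling the length quadruples the cost (quadratic scaling) at face value, and assumes the quoted per-second rate applies to a 10-second clip. The function and constant names are made up for illustration; the rates are the ones quoted elsewhere in this thread.

```python
# Per-second API prices quoted in the thread (dollars per second).
BASE_RATE_720P = 0.10   # base model at 720p
PRO_RATE_1024P = 0.50   # pro model at 1024p

# Assumption: the per-second rate holds for a 10-second reference clip.
REFERENCE_SECONDS = 10


def quadratic_cost(seconds: float, rate_per_second: float) -> float:
    """Cost if doubling the length quadruples the price (the post's premise)."""
    reference_cost = rate_per_second * REFERENCE_SECONDS
    return reference_cost * (seconds / REFERENCE_SECONDS) ** 2


nine_doublings = REFERENCE_SECONDS * 2 ** 9   # 5120 seconds, about 85 minutes
eighty_minutes = 80 * 60                      # 4800 seconds

for label, seconds in [("9 doublings", nine_doublings), ("80 minutes", eighty_minutes)]:
    base = quadratic_cost(seconds, BASE_RATE_720P)
    pro = quadratic_cost(seconds, PRO_RATE_1024P)
    print(f"{label}: base ${base:,.0f}, pro ${pro:,.0f}")
```

Under these assumptions, nine doublings of a 10-second clip (about 85 minutes) come to $262,144 on the base model and $1,310,720 on pro at 1024p, which matches the rounded figures in the post. An exact 80 minutes comes out somewhat lower, at $230,400 and $1,152,000.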

spellbanisher.bsky.social

@spellbanisher.bsky.social

It doesn't fundamentally change your point, but OpenAI released API prices for Sora 2. It is 10 cents per second for 720p video on the base model, 30 cents per second for the pro model, and 50 cents per second for the pro model at 1024p.

October 8, 2025 at 2:48 AM

spellbanisher.bsky.social

@spellbanisher.bsky.social

OpenAI doesn't create any original content, so basically Sora is trained on almost nothing but copyrighted content.

October 2, 2025 at 3:17 AM

spellbanisher.bsky.social

@spellbanisher.bsky.social

15 cents per second or 0.15 cents?

October 2, 2025 at 12:33 AM

spellbanisher.bsky.social

@spellbanisher.bsky.social

Before AGI, they should try to make AI that can reliably take drive-thru fast food orders.

gizmodo.com/taco-bell-sa...

Taco Bell Says 'No Más' to AI Drive-Thru Experiment

If you think humans get your order wrong, wait until you try AI.

gizmodo.com

August 28, 2025 at 7:39 PM

spellbanisher.bsky.social

@spellbanisher.bsky.social

Not even the fast food checkout AI is reliable enough.

gizmodo.com/taco-bell-sa...

Taco Bell Says 'No Más' to AI Drive-Thru Experiment

If you think humans get your order wrong, wait until you try AI.

gizmodo.com

August 28, 2025 at 7:36 PM

spellbanisher.bsky.social

@spellbanisher.bsky.social

Too big to fail

August 5, 2025 at 6:28 PM

spellbanisher.bsky.social

@spellbanisher.bsky.social

Using AI to write this article made it tedious to read. The 'rule of three' (where you have lists of attributes in 3 clauses) is especially egregious here, where it is used 5 times in 4 sentences.

July 5, 2025 at 1:41 PM

spellbanisher.bsky.social

@spellbanisher.bsky.social

I've seen several videos of walking humanoids being kicked and shoved without falling over. Why or how is it that they are still too unsafe?

April 11, 2025 at 2:38 AM

spellbanisher.bsky.social

@spellbanisher.bsky.social

$17-20 for the high efficiency mode. Low efficiency mode used about 172x as much compute.

December 29, 2024 at 1:36 AM

spellbanisher.bsky.social

@spellbanisher.bsky.social

I don't think they have individuals take the entire test. Rather, they'll have turkers complete like 5 tasks and after they get a sufficient sample size they can estimate what an average person would score if they took the whole test.

December 26, 2024 at 4:47 PM

spellbanisher.bsky.social

@spellbanisher.bsky.social

They have a private test, but it can only be taken after the benchmark is beaten on the public and semi-private test sets. o3 did not beat the benchmark. It met the score requirement (85% or above), but not the cost requirement (less than $10k).

December 26, 2024 at 4:42 PM

spellbanisher.bsky.social

@spellbanisher.bsky.social

Are Amazon turkers considered above or below American average?

December 25, 2024 at 7:11 PM

spellbanisher.bsky.social

@spellbanisher.bsky.social

It is actually based on the training set. Another study found that the two-shot average on the evaluation set was 60%, with a high of 98%. They also estimated that an ensemble of 10 randomly selected people online would score 100%.
arxiv.org/html/2409.01...

December 23, 2024 at 9:06 PM

spellbanisher.bsky.social

@spellbanisher.bsky.social

A smaller open source model running on less than $0.10 per task managed 56% on ARC-AGI. o3 used 30,000x as much compute to get 88%. I wouldn't be surprised if it used similar methods, with the difference being compute. OpenAI did train the model for this domain.

December 21, 2024 at 2:58 PM

spellbanisher.bsky.social

@spellbanisher.bsky.social

OpenAI did say they fine-tuned o3 on the 400 public eval questions. Since GPT-4o and o1-preview were tested on the semi-private evaluations, they would also have those questions if they wanted to fine-tune their models on that as well.

December 21, 2024 at 2:08 AM

spellbanisher.bsky.social

@spellbanisher.bsky.social

"Semi-private" refers to the test set ARC-AGI created specifically for frontier models. The problems can't be seen beforehand (hence private), but because frontier models run on the developer's API, the developers have the test set once their model takes the test.

December 21, 2024 at 2:05 AM

spellbanisher.bsky.social

@spellbanisher.bsky.social

From Google Labs ImageFX

December 18, 2024 at 2:54 AM
