AI is a tool you can use in any field, not just computer science.
For example, you can teach students to study history better using Deep Research, then compare their results and the number of iterations they ran with the machine to actually measure critical thinking.
It's not just ARC. Even in math or coding benchmarks, the performance is extraordinary. I think this series of models is truly generalist. I work with o1 every day, and the difference with Claude is insane.