#FormalMath
@rohanpaul_ai https://x.com/rohanpaul_ai/status/1922146786678124829 #x-rohanpaul_ai

This paper presents FormalMATH, a large Lean4 formal mathematical reasoning benchmark with 5,560 problems.

It was built using an innovative human-in-the-loop pipeline.

This pipeline uses LLMs for au...
May 13, 2025 at 4:46 AM
Yu, Peng, Ding, Li, Peng, Liu, Zhang, Yuan, Xin, Huang, Wen, Zhang, Liu: FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models https://arxiv.org/abs/2505.02735 https://arxiv.org/pdf/2505.02735 https://arxiv.org/html/2505.02735
May 7, 2025 at 6:02 AM
arXiv:2505.02735v1 Announce Type: new
Abstract: Formal mathematical reasoning remains a critical challenge for artificial intelligence, hindered by limitations of existing benchmarks in scope and scale. To address this, we present FormalMATH, a [1/7 of https://arxiv.org/abs/2505.02735v1]
May 7, 2025 at 6:02 AM
reasoning scenarios, suggesting that human-written informal reasoning introduces noise rather than clarity in the formal reasoning settings. We believe that FormalMATH provides a robust benchmark for benchmarking formal mathematical reasoning. [7/7 of https://arxiv.org/abs/2505.02735v1]
May 7, 2025 at 6:02 AM
The first volume of the new Open Access journal "Annals of Formalized Mathematics" was released today!

➡️ afm.episciences.org/volume/view/...

#FormalMath #Mathematics #OpenAccess
July 15, 2025 at 8:16 PM
Overview: HN discussed formalizing math with tools like Lean. Motivations include precision, collaboration, and applying software dev methods to research. Also touched on AI's role & challenges. #FormalMath 1/6
October 26, 2025 at 4:00 AM
This talk is a great introduction to Project Numina and their open-source dataset of mathematics problems and solutions. We look forward to the results from IMO 2025!

➡️ Watch the video here: youtube.com/watch?v=mSbf...

#AI #Mathematics #OpenSource #FormalMath #LeanLang
Yann Fleureau - Project Numina and AI for Theorem Proving
YouTube video by Institut des Hautes Etudes Scientifiques (IHES)
youtube.com
June 2, 2025 at 10:02 PM
FormalMATH: Benchmarking formal mathematical reasoning of large language models. ~ Zhouliang Yu et al. arxiv.org/abs/2505.02735 #LLMs #Autoformalization #Math #ITP #LeanProver
May 8, 2025 at 9:03 AM
Two great talks at #HLF25 last week: Sanjeev Arora on superhuman AI mathematicians using #LeanLang, and David Silver on AI learning through experience with #LeanProver verification.

www.youtube.com/watch?v=q9MJ...

#AI #FormalMath #ReinforcementLearning
Spark Session | September 15
YouTube video by Heidelberg Laureate Forum
www.youtube.com
September 23, 2025 at 6:00 PM
We’re pleased to announce #ItaLean2025: Bridging Formal Mathematics and AI, an international conference dedicated to @lean-lang.org, Formal Mathematics, and AI4Math.

📍 University of Bologna
🗓 9–12 December 2025

Proudly supported by #Harmonic.

#LeanLang #FormalMath #AI4Math
October 14, 2025 at 7:44 AM
EvolProver, a 7‑billion‑parameter non‑reasoning theorem prover, hit 53.8% pass@32 on the FormalMATH‑Lite benchmark, setting a new state‑of‑the‑art result for its size. Read more: https://getnews.me/evolprover-advances-automated-theorem-proving-with-symmetry/ #evolprover #automatedtheoremproving #ai
October 3, 2025 at 12:47 AM
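For readers unfamiliar with the pass@32 figure quoted above: pass@k is the probability that at least one of k sampled proof attempts is accepted by the verifier, averaged over the benchmark's problems. Here is a minimal sketch using the standard unbiased estimator 1 - C(n-c, k)/C(n, k), where n attempts are sampled per problem and c of them verify. The function names and sample counts are illustrative assumptions; the post does not say how EvolProver's score was computed.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: attempts sampled, c: attempts that verified, k: attempt budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a success.
        return 1.0
    # 1 minus the probability that a random k-subset holds only failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: two problems, 32 attempts each, 0 and 3 verified proofs.
score = benchmark_pass_at_k([(32, 0), (32, 3)], k=32)
```

When n equals k, as in this example, the estimator reduces to checking whether any of the 32 attempts succeeded, so a problem contributes either 0 or 1 to the average.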
Alternative foundations like Homotopy Type Theory (HoTT) were discussed but generally seen as too niche, complex, or controversial for a project focused on a standard undergraduate text like Tao's Analysis I. #FormalMath 5/6
June 1, 2025 at 7:00 PM