Lightnews — Scholar-powered news

X Bot

@handle.invalid

@rohanpaul_ai https://x.com/rohanpaul_ai/status/1922146786678124829 #x-rohanpaul_ai

This paper presents FormalMATH, a large Lean4 formal mathematical reasoning benchmark with 5,560 problems.

It was built using an innovative human-in-the-loop pipeline.

This pipeline uses LLMs for au...

May 13, 2025 at 4:46 AM

arXiv cs.AI Artificial Intelligence

@csai-bot.bsky.social

Yu, Peng, Ding, Li, Peng, Liu, Zhang, Yuan, Xin, Huang, Wen, Zhang, Liu: FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models https://arxiv.org/abs/2505.02735 https://arxiv.org/pdf/2505.02735 https://arxiv.org/html/2505.02735

May 7, 2025 at 6:02 AM

arXiv cs.AI Artificial Intelligence

@csai-bot.bsky.social

arXiv:2505.02735v1 Announce Type: new
Abstract: Formal mathematical reasoning remains a critical challenge for artificial intelligence, hindered by limitations of existing benchmarks in scope and scale. To address this, we present FormalMATH, a [1/7 of https://arxiv.org/abs/2505.02735v1]

May 7, 2025 at 6:02 AM

arXiv cs.AI Artificial Intelligence

@csai-bot.bsky.social

reasoning scenarios, suggesting that human-written informal reasoning introduces noise rather than clarity in the formal reasoning settings. We believe that FormalMATH provides a robust benchmark for benchmarking formal mathematical reasoning. [7/7 of https://arxiv.org/abs/2505.02735v1]

May 7, 2025 at 6:02 AM

Lean Focused Research Organization

@lean-lang.org

The first volume of the new Open Access journal "Annals of Formalized Mathematics" was released today!

➡️ afm.episciences.org/volume/view/...

#FormalMath #Mathematics #OpenAccess

July 15, 2025 at 8:16 PM

Hacker News Companion

@hncompanion.com

Overview: HN discussed formalizing math with tools like Lean. Motivations include precision, collaboration, and applying software dev methods to research. Also touched on AI's role & challenges. #FormalMath 1/6

October 26, 2025 at 4:00 AM

Lean Focused Research Organization

@lean-lang.org

This talk is a great introduction to Project Numina and their open-source dataset of mathematics problems and solutions. We look forward to the results from IMO 2025!

➡️ Watch the video here: youtube.com/watch?v=mSbf...

#AI #Mathematics #OpenSource #FormalMath #LeanLang

Yann Fleureau - Project Numina and AI for Theorem Proving

YouTube video by Institut des Hautes Etudes Scientifiques (IHES)

youtube.com

June 2, 2025 at 10:02 PM

José A. Alonso

@jalonso.bsky.social

FormalMATH: Benchmarking formal mathematical reasoning of large language models. ~ Zhouliang Yu et al. arxiv.org/abs/2505.02735 #LLMs #Autoformalization #Math #ITP #LeanProver

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

Formal mathematical reasoning remains a critical challenge for artificial intelligence, hindered by limitations of existing benchmarks in scope and scale. To address this, we present FormalMATH, a lar...

arxiv.org

May 8, 2025 at 9:03 AM

Lean Focused Research Organization

@lean-lang.org

Two great talks at #HLF25 last week: Sanjeev Arora on superhuman AI mathematicians using #LeanLang, and David Silver on AI learning through experience with #LeanProver verification.

www.youtube.com/watch?v=q9MJ...

#AI #FormalMath #ReinforcementLearning

Spark Session | September 15

YouTube video by Heidelberg Laureate Forum

www.youtube.com

September 23, 2025 at 6:00 PM

Pietro Monticone

@pietromonticone.bsky.social

We’re pleased to announce #ItaLean2025: Bridging Formal Mathematics and AI, an international conference dedicated to @lean-lang.org, Formal Mathematics, and AI4Math.

📍 University of Bologna
🗓 9–12 December 2025

Proudly supported by #Harmonic.

#LeanLang #FormalMath #AI4Math

October 14, 2025 at 7:44 AM

GetNews.me

@getnews-me.bsky.social

EvolProver, a 7‑billion‑parameter non‑reasoning theorem prover, hit 53.8% pass@32 on the FormalMATH‑Lite benchmark, setting a new state‑of‑the‑art result for its size. Read more: https://getnews.me/evolprover-advances-automated-theorem-proving-with-symmetry/ #evolprover #automatedtheoremproving #ai

EvolProver Advances Automated Theorem Proving with Symmetry

October 3, 2025 at 12:47 AM

Hacker News Companion

@hncompanion.com

Alternative foundations like Homotopy Type Theory (HoTT) were discussed but generally seen as too niche, complex, or controversial for a project focused on a standard undergraduate text like Tao's Analysis I. #FormalMath 5/6

June 1, 2025 at 7:00 PM
