AI + FM papers
@ai-fm-papers.bsky.social
87 followers
0 following
22 posts
A feed of interesting AI / math / formal methods papers. Posts by @m-dodds.bsky.social
Posts
Media
Videos
Starter Packs
AI + FM papers
@ai-fm-papers.bsky.social
· Apr 12
Automated Proof Generation for Rust Code via Self-Evolution
Ensuring correctness is crucial for code generation. Formal verification offers a definitive assurance of correctness, but demands substantial human effort in proof construction and hence raises a pre...
arxiv.org
AI + FM papers
@ai-fm-papers.bsky.social
· Apr 12
Can LLMs Enable Verification in Mainstream Programming?
Although formal methods are capable of producing reliable software, they have seen minimal adoption in everyday programming. Automatic code generation using large language models is becoming increasin...
arxiv.org
AI + FM papers
@ai-fm-papers.bsky.social
· Feb 12
Proving the Coding Interview: A Benchmark for Formally Verified Code Generation
We introduce the Formally Verified Automated Programming Progress Standards, or FVAPPS, a benchmark of 4715 samples for writing programs and proving their correctness, the largest formal verification ...
arxiv.org
Reposted by AI + FM papers
AI + FM papers
@ai-fm-papers.bsky.social
· Jan 14
AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement
Automated code generation with large language models has gained significant traction, but there remains no guarantee on the correctness of generated code. We aim to use formal verification to provide ...
arxiv.org
AI + FM papers
@ai-fm-papers.bsky.social
· Dec 18
Laurel: Generating Dafny Assertions Using Large Language Models
Dafny is a popular verification language, which automates proofs by outsourcing them to an SMT solver. This automation is not perfect, however, and the solver often requires guidance in the form of he...
arxiv.org
AI + FM papers
@ai-fm-papers.bsky.social
· Dec 10
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
The formalization of existing mathematical proofs is a notoriously difficult process. Despite decades of research on automation and proof assistants, writing formal proofs remains arduous and only acc...
arxiv.org
Reposted by AI + FM papers
AI + FM papers
@ai-fm-papers.bsky.social
· Nov 30
Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models
Posted by Sergei Glazunov and Mark Brand, Google Project Zero Introduction At Project Zero, we constantly seek to expand the scope and e...
googleprojectzero.blogspot.com
AI + FM papers
@ai-fm-papers.bsky.social
· Nov 30
AI + FM papers
@ai-fm-papers.bsky.social
· Nov 30
From Naptime to Big Sleep: Using Large Language Models To Catch Vulnerabilities In Real-World Code
Posted by the Big Sleep team Introduction In our previous post, Project Naptime: Evaluating Offensive Security Capabilities of Large L...
googleprojectzero.blogspot.com
AI + FM papers
@ai-fm-papers.bsky.social
· Nov 26
Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming
Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiabil...
arxiv.org
AI + FM papers
@ai-fm-papers.bsky.social
· Nov 24
Evaluating frontier AI R&D capabilities of language model agents against human experts
We’re releasing RE-Bench, a new benchmark for measuring the performance of humans and frontier model agents on ML research engineering tasks. We also share data from 71 human expert attempts and resul...
metr.org
AI + FM papers
@ai-fm-papers.bsky.social
· Nov 24
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Do large language models (LLMs) solve reasoning tasks by learning robust generalizable algorithms, or do they memorize training data? To investigate this question, we use arithmetic reasoning as a rep...
arxiv.org