Michael R. Bock

@michaelrbock.com

The full plan:

www.columntax.com/blog/our-se...

Column Tax’s “secret” master plan to automate tax filing

Column Tax is in a unique position at a unique moment in technology history.

www.columntax.com

October 29, 2025 at 1:48 PM

Michael R. Bock

@michaelrbock.com

We’re so confident that we’re publishing an internal roadmap document: our “secret” master plan to automate tax filing (just between you & me).

October 29, 2025 at 1:48 PM

Michael R. Bock

@michaelrbock.com

And now the combination of the latest AI progress and our expert team & large proprietary eval datasets means we’re the group that can finally fully automate tax filing and save people time & money.

October 29, 2025 at 1:48 PM

Michael R. Bock

@michaelrbock.com

The blog post in question: michaelrbock.com/hypothesis

Hypothesis Sheets: how to navigate and exit the idea maze with a (good) startup idea

In 2020, when we were at the beginning of our startup journey, I had a conversation with Erik Goldman where he shared this process, which we used to start Column Tax.

michaelrbock.com

October 23, 2025 at 3:35 PM

Michael R. Bock

@michaelrbock.com

4/ next up?

adding tool use (code execution & web search) to see how that helps models calculate tax returns

also testing Claude Opus 4.1 and GPT-5 mini & nano

follow here: github.com/column-tax/...

column-tax/tax-calc-bench

Code & data for TaxCalcBench.

github.com

September 18, 2025 at 5:39 PM

Michael R. Bock

@michaelrbock.com

3/ GPT-5 is impressive in many ways

especially because its knowledge cutoff is still September 2024

but it's not the leader in tax calculation today

(even with maximal test time compute)

September 18, 2025 at 5:39 PM

Michael R. Bock

@michaelrbock.com

2/ back in July, we published the first-ever eval for US personal income tax calculations

x.com/michaelrboc...

September 18, 2025 at 5:38 PM

Michael R. Bock

@michaelrbock.com

10/ Read more about the work, research, and results here:

www.columntax.com/blog/taxcal...

TaxCalcBench: Can AI file your taxes?

AI can’t do your taxes on its own (yet).

www.columntax.com

July 23, 2025 at 3:18 PM

Michael R. Bock

@michaelrbock.com

9/ This work wouldn’t have been possible without the hard work of our Tax Analyst team over the past 4 years & the success of our commercial product: you can’t buy this dataset on Scale or Surge.

View the dataset and testing harness here:

github.com/column-tax/...

GitHub - column-tax/tax-calc-bench: Code & data for TaxCalcBench

github.com

July 23, 2025 at 3:18 PM

Michael R. Bock

@michaelrbock.com

8/ Models are also inconsistent:

using pass^k (a measure of a model’s reliability across multiple runs on the same task), performance degrades with additional runs, meaning models mess up in new & surprising ways when calculating tax returns.

July 23, 2025 at 3:18 PM
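pass^k asks how often a model gets the same task right on every one of k runs, not just on one of them. A minimal Python sketch, assuming the combinatorial estimator popularized by the τ-bench agent benchmark (per task, C(c, k) / C(n, k), where c of n runs passed); the function name `pass_hat_k` and the run outcomes are hypothetical illustrations, not TaxCalcBench data or its harness code:

```python
from math import comb


def pass_hat_k(results: list[list[bool]], k: int) -> float:
    """Estimate pass^k: the chance that k runs on the same task ALL succeed.

    results[i] holds the pass/fail outcome of each of n runs on task i.
    Per task the estimate is C(c, k) / C(n, k), where c = number of passing runs;
    math.comb returns 0 when k > c, so a task with too few passes scores 0.
    """
    scores = []
    for runs in results:
        n, c = len(runs), sum(runs)
        if k > n:
            raise ValueError("k cannot exceed the number of runs per task")
        scores.append(comb(c, k) / comb(n, k))
    # Average the per-task estimates across the benchmark.
    return sum(scores) / len(scores)


# Hypothetical outcomes: 3 tasks, 4 runs each (made up for illustration).
example_runs = [
    [True, True, True, True],      # always right
    [True, False, True, True],     # right 3 out of 4 times
    [False, False, False, False],  # never right
]

for k in (1, 2, 4):
    print(k, round(pass_hat_k(example_runs, k), 3))
```

On this toy data the score falls from about 0.58 at k = 1 to 0.5 at k = 2 and 1/3 at k = 4: a task the model only sometimes gets right stops counting once every run has to succeed, which is the degradation pattern described in the post.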

Michael R. Bock

@michaelrbock.com

7/ For some models, performance improves with increased inference-time compute (thinking budget tokens)

but not for the best model (Gemini 2.5 Pro), suggesting alternative techniques/scaffolding/orchestration are required to get AI to do this tax calculation task.

July 23, 2025 at 3:18 PM

Michael R. Bock

@michaelrbock.com

6/ Models consistently:

1. Misuse tax tables
2. Make calculation errors

For example, models will hallucinate line numbers on Forms or use incorrect eligibility limits.

July 23, 2025 at 3:18 PM
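One concrete way to misuse tax tables: for taxable income under $100,000, the Form 1040 instructions generally direct filers to the Tax Table rather than the rate schedule, so the correct answer depends on which $50 row the income falls in. A minimal Python sketch, assuming the 2024 single-filer rates hard-coded below and assuming each table row equals the schedule tax on the row's midpoint, rounded half-up to whole dollars; `schedule_tax` and `table_tax` are hypothetical helpers, not part of TaxCalcBench:

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: 2024 single-filer rate schedule, hard-coded for illustration.
# Each entry is (upper bound of the bracket, marginal rate).
BRACKETS_2024_SINGLE = [
    (Decimal("11600"), Decimal("0.10")),
    (Decimal("47150"), Decimal("0.12")),
    (Decimal("100525"), Decimal("0.22")),
    (Decimal("191950"), Decimal("0.24")),
    (Decimal("243725"), Decimal("0.32")),
    (Decimal("609350"), Decimal("0.35")),
    (Decimal("Infinity"), Decimal("0.37")),
]


def schedule_tax(amount: Decimal) -> Decimal:
    """Apply the progressive rate schedule to an exact taxable-income amount."""
    tax, lower = Decimal(0), Decimal(0)
    for upper, rate in BRACKETS_2024_SINGLE:
        if amount <= lower:
            break
        # Only the slice of income inside this bracket is taxed at this rate.
        tax += (min(amount, upper) - lower) * rate
        lower = upper
    return tax


def table_tax(taxable_income: int) -> int:
    """Hypothetical Tax Table lookup for the $50-wide rows ($3,000 to $99,999).

    Assumes the row value is the schedule tax on the row midpoint, rounded half-up.
    """
    if not 3_000 <= taxable_income < 100_000:
        raise ValueError("this sketch only covers the $50-wide table rows")
    row_start = taxable_income // 50 * 50
    midpoint = Decimal(row_start) + 25
    return int(schedule_tax(midpoint).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
```

Under these assumptions, `table_tax(40_000)` returns 4571 while `schedule_tax(Decimal(40_000))` gives 4568, so applying the formula to the exact amount is a few dollars off, which is enough to fail an exact-match check.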

Michael R. Bock

@michaelrbock.com

5/ Takeaway: models can’t calculate tax returns reliably today.

Even on this simplified dataset, and allowing the models to output to a simplified format, the best model calculates only 32.35% of returns correctly.

July 23, 2025 at 3:18 PM

Michael R. Bock

@michaelrbock.com

4/ TaxCalcBench is a dataset of 51 pairs of user inputs and the expected tax return output + a testing harness.

We made the task easy for the models. We provide:
- all of the data (e.g. W-2s) needed to file a return
- the expected output in IRS XML format

July 23, 2025 at 3:17 PM
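A hypothetical sketch of strict per-return scoring in Python, assuming each expected return has already been parsed from the IRS XML into a mapping of line names to whole-dollar amounts. `ReturnCase`, `return_is_correct`, `percent_correct`, the optional `tolerance` knob, and the sample values are illustrative assumptions, not the actual code in column-tax/tax-calc-bench:

```python
from dataclasses import dataclass


@dataclass
class ReturnCase:
    """One hypothetical benchmark item: expected line values keyed by line name."""
    case_id: str
    expected: dict[str, int]


def return_is_correct(expected: dict[str, int], predicted: dict[str, int],
                      tolerance: int = 0) -> bool:
    """A return counts as correct only if EVERY expected line is present
    and within `tolerance` dollars of the expected value."""
    return all(
        line in predicted and abs(predicted[line] - value) <= tolerance
        for line, value in expected.items()
    )


def percent_correct(cases: list[ReturnCase], predictions: dict[str, dict[str, int]],
                    tolerance: int = 0) -> float:
    """Share of returns (in %) where the model got every line right."""
    correct = sum(
        return_is_correct(case.expected, predictions.get(case.case_id, {}), tolerance)
        for case in cases
    )
    return 100 * correct / len(cases)


# Made-up sample data for illustration only.
example_cases = [
    ReturnCase("case-1", {"agi": 52_000, "tax": 4_200}),
    ReturnCase("case-2", {"agi": 38_500, "tax": 2_900}),
]
example_predictions = {
    "case-1": {"agi": 52_000, "tax": 4_200},
    "case-2": {"agi": 38_500, "tax": 2_903},  # off by $3 on one line
}

print(percent_correct(example_cases, example_predictions))
print(percent_correct(example_cases, example_predictions, tolerance=5))
```

With the default `tolerance=0`, one wrong line fails the whole return, so the $3 miss on case-2 drops the score to 50%; allowing ±$5 per line would count it as correct.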

Michael R. Bock

@michaelrbock.com

3/ Tax calculation means taking a user’s "inputs" (W-2s, 1099s) and outputting the Form 1040 in the IRS XML format.

75k pages of English text define the transformations required to do this.

Companies like @ColumnTax use deterministic tax engines to do these calculations.

July 23, 2025 at 3:17 PM
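To illustrate what "deterministic" means here, a heavily simplified Python sketch of a few Form 1040 transformations, assuming a single filer in tax year 2024 with only W-2 wage income, no adjustments, no QBI deduction, and the $14,600 standard deduction. `W2` and `compute_1040_lines` are hypothetical names, and a real engine encodes far more rules than this:

```python
from dataclasses import dataclass

# Assumption: single filer, tax year 2024.
STANDARD_DEDUCTION_2024_SINGLE = 14_600


@dataclass
class W2:
    wages: int         # box 1: wages, tips, other compensation
    fed_withheld: int  # box 2: federal income tax withheld


def compute_1040_lines(w2s: list[W2]) -> dict[str, int]:
    """Map W-2 inputs to a handful of Form 1040 lines with fixed rules."""
    line_1a = sum(w.wages for w in w2s)   # total W-2 box 1 amounts
    line_1z = line_1a                     # no other wage-type income assumed
    line_9 = line_1z                      # total income (wages only)
    line_11 = line_9                      # AGI: no line 10 adjustments assumed
    line_12 = STANDARD_DEDUCTION_2024_SINGLE
    line_14 = line_12                     # no line 13 QBI deduction assumed
    line_15 = max(0, line_11 - line_14)   # taxable income, floored at zero
    line_25a = sum(w.fed_withheld for w in w2s)
    return {
        "1a": line_1a, "1z": line_1z, "9": line_9, "11": line_11,
        "12": line_12, "14": line_14, "15": line_15, "25a": line_25a,
    }


lines = compute_1040_lines([W2(wages=52_000, fed_withheld=5_100),
                            W2(wages=8_000, fed_withheld=600)])
print(lines["15"], lines["25a"])
```

The same W-2s always produce the same lines; that repeatability is exactly what a model-based calculator has to match.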

Michael R. Bock

@michaelrbock.com

2/ Today, we’re releasing TaxCalcBench: a first-ever benchmark dataset & eval framework for testing AI’s ability to calculate US personal income tax returns.

Tax is a secretive industry, so we’re proud to release a research paper sharing our findings:

arxiv.org/abs/2507.16126

TaxCalcBench: Evaluating Frontier Models on the Tax Calculation Task

Can AI file your taxes? Not yet. Calculating US personal income taxes is a task that requires building an understanding of vast amounts of English text and using that knowledge to carefully...

arxiv.org

July 23, 2025 at 3:17 PM