AI & Technology

Why No LLM Should Touch Your Tax Computation

By Michael Cutajar · 6 min read

There's a wave of AI tax products launching right now. They promise to "do your taxes" using large language models. Upload your documents, ask a question in natural language, get a tax return back.

It sounds great. It's also dangerous.

We build AI-powered accounting infrastructure. AI is core to what we do. But we deliberately exclude LLMs from one part of the pipeline: the computation. Here's why, and why it matters for any platform thinking about embedding tax functionality.

The difference between classification and computation

There are two fundamentally different jobs in accounting:

Classification is deciding what something is. Is this transaction business or personal? Is this invoice subject to VAT? Is this expense deductible? What category does it fall into?

Classification is fuzzy by nature. A meal could be a business expense or a personal one depending on context. A software subscription could be an office expense or cost of goods sold depending on the business. There are rules, but applying them requires judgment.

LLMs are good at this. They can read an invoice, understand the context, and make a reasonable classification decision. When they're wrong, a human reviewer catches it. The error is correctable and low-stakes.

Computation is applying the rules to produce a number. Once you know the taxable income, computing the tax liability is arithmetic. The VAT rate in Germany for standard-rated supplies is 19%. The Malta income tax for a single person earning €25,000 is calculated by applying specific brackets defined in legislation. The US self-employment tax is 15.3% of 92.35% of net earnings, with the Social Security portion capped at the first $176,100 of that base (the 2025 wage base).

These aren't judgment calls. They're deterministic. The answer is either right or wrong. There's no "reasonable" range.
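To make "deterministic" concrete, here is a minimal sketch of bracketed tax computation. The brackets and rates below are placeholders for illustration, not any jurisdiction's actual schedule:

```python
# Illustrative only: hypothetical progressive brackets,
# not any real jurisdiction's 2025 schedule.
BRACKETS = [
    (0, 10_000, 0.00),
    (10_000, 20_000, 0.15),
    (20_000, float("inf"), 0.25),
]

def income_tax(taxable: float) -> float:
    """Apply progressive brackets deterministically: the portion of
    income falling inside each band is taxed at that band's rate."""
    tax = 0.0
    for lower, upper, rate in BRACKETS:
        if taxable > lower:
            tax += (min(taxable, upper) - lower) * rate
    return round(tax, 2)

income_tax(25_000)  # → 2750.0, every run, on every machine
```

Given the same input, this function cannot produce two different answers. That property, trivial as it looks, is exactly what a probabilistic model cannot guarantee.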

Why LLMs get computation wrong

LLMs are probabilistic. They predict the most likely next token. This works brilliantly for language, classification, summarisation, and reasoning. It fails for arithmetic and rule application, for a simple reason: the model doesn't calculate — it predicts what a calculation result looks like.

Most of the time, the prediction is right. But "most of the time" isn't good enough when the output is a tax return that gets filed with a government authority.

Here's what goes wrong:

Rate application errors. An LLM might apply the 2024 tax brackets to a 2025 return. It might use the wrong rate for a specific income band. It might miss a surcharge or forget a threshold. These aren't hallucinations in the dramatic sense — they're subtle, plausible-looking errors that are hard to catch without recalculating from scratch.

Compounding errors. Tax returns are sequential. Adjusted gross income feeds into taxable income, which feeds into tax liability, which feeds into credits, which feeds into amount owed or refunded. An error in step two compounds through every subsequent step. A deterministic engine produces the same result every time. An LLM might produce slightly different numbers on different runs.
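The chain above can be sketched in a few lines. The allowance, rate, and step structure are placeholders (not a real return), but they show how one upstream error flows through every subsequent line:

```python
# Illustrative only: allowance and rate are placeholders,
# not any real jurisdiction's figures.
ALLOWANCE = 5_000
RATE = 0.20

def compute_return(gross: float, deductions: float, credit: float) -> float:
    agi = gross - deductions                # step 1: adjusted gross income
    taxable = max(agi - ALLOWANCE, 0.0)     # step 2: taxable income
    liability = taxable * RATE              # step 3: tax before credits
    return max(liability - credit, 0.0)     # step 4: amount owed

# Deterministic: identical inputs always give identical output.
correct = compute_return(50_000, 10_000, 1_000)  # → 6000.0
# A €500 error at step 1 shifts every downstream line:
drifted = compute_return(50_000, 9_500, 1_000)   # → 6100.0
```

With a deterministic engine the drift is at least reproducible and traceable to its source. With a model that produces "slightly different numbers on different runs," there is no stable baseline to debug against.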

Jurisdiction drift. When you operate across 30+ countries, the rules diverge dramatically. Malta's VAT has different exemptions from Germany's. The UK's National Insurance thresholds change annually. France's social charges have multiple components with different ceilings. An LLM trained on general tax knowledge doesn't reliably track which country's rules apply to which computation.

No audit trail. When a tax authority asks "how did you arrive at this number?", you need to show the calculation. A deterministic engine can produce a step-by-step trace: "Line 1: gross income €47,200. Line 2: deductible expenses €8,400. Line 3: net income = Line 1 – Line 2 = €38,800." An LLM can't show its working because it didn't work — it predicted.
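A step-by-step trace like the one above is cheap to produce when the computation is explicit. A minimal sketch, using the article's numbers (the helper names are mine, not a real product's API):

```python
def traced_return(gross: float, expenses: float):
    """Compute net income while recording a line-by-line audit trail."""
    trace = []

    def line(label: str, value: float) -> float:
        trace.append(f"{label}: €{value:,.2f}")
        return value

    line("Line 1: gross income", gross)
    line("Line 2: deductible expenses", expenses)
    net = line("Line 3: net income = Line 1 - Line 2", gross - expenses)
    return net, trace

net, trace = traced_return(47_200, 8_400)
# net → 38800.0; trace holds the three lines, ready to hand to an auditor
```

The trace falls out of the calculation for free, because the trace *is* the calculation. There is no equivalent artifact to extract from a token-by-token prediction.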

What a deterministic engine looks like

A deterministic tax engine is software that implements the tax code as rules. It takes classified inputs (income, expenses, deductions, jurisdiction) and produces outputs (tax liability, VAT due, SSC payable) through explicit computation.

The rules are codified from legislation. When the law changes — new rates, new thresholds, new exemptions — the rules are updated by accountants who read the legislation, not by retraining a model. The engine produces the same output for the same input every time. It can be audited, tested, and verified against known examples.
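In practice "codified rules" often means rules stored as versioned data, keyed by jurisdiction and year, so a legislative change is a data update rather than a retrain. A sketch of the shape, assuming a simple lookup table (the Germany rate matches the article; the Malta entry is shown for contrast and should be verified against current legislation):

```python
# Hypothetical rules table: jurisdiction + year → codified parameters.
# Updated by accountants reading legislation, not by retraining a model.
RULES = {
    ("DE", 2025): {"vat_standard": 0.19},
    ("MT", 2025): {"vat_standard": 0.18},
}

def vat_due(net_amount: float, country: str, year: int) -> float:
    """Look up the codified rate; fail loudly if no rule exists,
    rather than guessing a plausible-looking number."""
    rate = RULES[(country, year)]["vat_standard"]  # KeyError if missing
    return round(net_amount * rate, 2)

vat_due(100.0, "DE", 2025)  # → 19.0
```

Note the failure mode: if the rules for a jurisdiction-year pair haven't been codified, the engine raises an error instead of producing an answer. That is the opposite of a model's behavior, and it is the behavior you want in a filing pipeline.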

This is how Avalara computes sales tax. This is how payroll systems compute withholding. This is how every serious tax computation has worked for decades. The idea that you'd replace it with a probabilistic model because the model can also write poetry is, frankly, reckless.

Where AI belongs in the pipeline

AI is transformative for accounting. But it belongs in specific places:

Classification. Reading a transaction or an invoice in context and deciding what it is: business or personal, deductible or not, which category it falls into.

Document extraction. Turning unstructured inputs (invoices, receipts, bank statements) into structured data the engine can compute on.

Natural-language queries. Answering a user's question in plain language, drawing on results the deterministic engine has already computed.

In all of these cases, the AI is doing what AI is good at: handling ambiguity, processing language, making judgment calls. The computation — the actual math — stays deterministic.

What this means for platforms

If you're a platform considering how to offer tax and accounting to your users, ask your provider one question: is the tax computation deterministic or AI-generated?

If the answer is "AI handles everything," your users are getting tax returns produced by a system that predicts what the right answer looks like rather than calculating it. That might be fine for a rough estimate. It's not fine for a filing that carries legal consequences.

If the answer is "AI classifies, deterministic engines compute," you're getting the best of both worlds: the flexibility and intelligence of AI where judgment is needed, and the precision of hard-coded rules where arithmetic is needed.

Your users' tax returns should be computed, not predicted.


Accora uses AI for classification and queries. Tax computation is deterministic, built on codified law. Warranted accountants review every output before it's filed. Learn more at accora.ai


Michael Cutajar, CPA — Founder of Accora.