There is a fundamental tension at the heart of AI in tax compliance that most technology marketing carefully avoids mentioning. Tax law is deterministic. Document processing is probabilistic. Confusing the two produces expensive and potentially unlawful outcomes.
The Nature of Tax Law
Tax law is a set of rules with defined inputs and correct outputs. If a self-employed professional in Malta earns EUR 50,000 in a year, their tax liability is not approximately EUR 8,000 or probably around EUR 8,000. It is a specific, calculable number determined by the Income Tax Act (Chapter 123 of the Laws of Malta), the applicable tax rates, the allowable deductions, and the relevant exemptions.
The VAT rate on a standard-rated supply in Malta is 18%. Not roughly 18%. Not most likely 18%. Exactly 18%. The reduced rate on accommodation is 7%. The reduced rate on electricity is 5%. These are statutory facts, not predictions.
This determinism extends to every aspect of tax compliance: filing deadlines, threshold calculations, exemption criteria, penalty computations, and reporting formats. There is a correct answer, and anything that is not the correct answer is wrong. Being wrong has consequences: penalties, interest, audits, and in serious cases, prosecution.
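To make the point concrete, the statutory rates quoted above can be written as a fixed lookup table with exact decimal arithmetic. This is a minimal sketch, not an authoritative implementation: the category keys, the function name vat_due, and the half-up rounding to the cent are illustrative assumptions, and only the three rates come from the text.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates as quoted in the text; the category keys are illustrative names.
VAT_RATES = {
    "standard": Decimal("0.18"),
    "accommodation": Decimal("0.07"),
    "electricity": Decimal("0.05"),
}

CENT = Decimal("0.01")


def vat_due(net_amount: Decimal, category: str) -> Decimal:
    """Return the VAT on a net amount, rounded to the cent.

    An unknown category raises KeyError rather than guessing a rate.
    The half-up rounding rule is an assumption made for this sketch.
    """
    rate = VAT_RATES[category]
    return (net_amount * rate).quantize(CENT, rounding=ROUND_HALF_UP)


print(vat_due(Decimal("250.00"), "standard"))  # 45.00
```

The same inputs always produce the same output, and an input the table does not recognise fails loudly instead of yielding a plausible number.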
The Nature of Document Processing
Financial document processing is the opposite. When an AI system reads a receipt, it is making probabilistic assessments at every step. Is that character a 1 or a 7? Is that amount EUR 14.50 or EUR 74.50? Is this an expense for office supplies or client entertainment? Does the supplier name printed as "Mdna Off. Sup." mean "Mdina Office Supplies" or "Madonna Office Supply"?
Each of these determinations carries uncertainty. The system's confidence might be 99% on a clearly printed invoice or 82% on a faded thermal paper receipt. This uncertainty is not a flaw to be eliminated. It is an inherent characteristic of converting unstructured, real-world data into structured, machine-readable form.
The most honest and technically accurate thing an AI document processing system can say is: "I am X% confident that this document is an invoice from Supplier Y, dated Z, for amount A, with VAT at rate R." That probabilistic statement is very different from "this invoice is from Supplier Y for amount A."
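One way to keep that probabilistic statement intact in software is to store a confidence value next to every extracted field instead of a bare value. The sketch below is hypothetical: the class names, field names, and sample values are invented for illustration and do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction model


@dataclass(frozen=True)
class ExtractedInvoice:
    supplier: ExtractedField
    date: ExtractedField
    amount: ExtractedField
    vat_rate: ExtractedField

    def weakest_field(self) -> tuple[str, ExtractedField]:
        """Return the name and content of the least certain field."""
        fields = {
            "supplier": self.supplier,
            "date": self.date,
            "amount": self.amount,
            "vat_rate": self.vat_rate,
        }
        name = min(fields, key=lambda k: fields[k].confidence)
        return name, fields[name]


# Illustrative values only.
invoice = ExtractedInvoice(
    supplier=ExtractedField("Mdina Office Supplies", 0.82),
    date=ExtractedField("2025-03-14", 0.99),
    amount=ExtractedField("14.50", 0.91),
    vat_rate=ExtractedField("0.18", 0.97),
)
name, field = invoice.weakest_field()
print(f"Least certain: {name} = {field.value!r} ({field.confidence:.0%})")
```

Keeping the weakest field visible means a downstream step can decide whether the document as a whole is trustworthy enough to process automatically.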
Why Pure LLMs Are Dangerous for Tax
Large language models are probabilistic systems. They generate outputs based on statistical patterns in their training data. When you ask GPT-4 what VAT rate applies to a specific supply in Malta, it will give you an answer. That answer will usually be correct, because the training data contains enough references to Maltese VAT rates. But "usually correct" is not the standard that tax compliance demands.
The dangers are well documented:
Hallucination. LLMs generate plausible-sounding but factually incorrect information. They might cite a tax rate that existed in a previous year, apply a rule from the wrong jurisdiction, or simply fabricate a regulation that sounds reasonable but does not exist. OpenAI's own documentation acknowledges this limitation, and academic research consistently demonstrates hallucination rates that, while declining with each model generation, remain non-trivial.
Temporal confusion. Tax law changes. Malta's budget announcements regularly adjust rates, thresholds, and incentives. An LLM trained on data through a certain date will not know about subsequent changes unless it is continuously updated. A model that correctly states the Micro Invest tax credit rules as of 2024 might give outdated information for 2026.
Jurisdictional blending. LLMs are trained on data from many jurisdictions. They can conflate rules from different countries, particularly when asked about smaller jurisdictions where training data is sparse. Maltese tax law is sufficiently different from UK or US law that a model applying the wrong jurisdiction's rules produces completely invalid results.
The Deloitte warning. In 2023, reports emerged of a legal case where an AI system generated fictional case citations, a hallucination that cost the firms involved significant reputational damage and financial penalties. In the tax context, an AI hallucinating a non-existent exemption or misquoting a statutory rate creates audit risk for every return it touches.
Why Pure Rules Engines Are Insufficient
The obvious solution might seem to be abandoning AI entirely and using pure rules-based systems. Tax calculation engines have existed for decades. They take structured inputs (income amount, deduction amount, filing status) and apply deterministic rules to produce correct outputs.
The problem is upstream. Rules engines need structured, clean data as input. They cannot read a crumpled receipt photographed under a car dashboard light. They cannot parse an email from a supplier that says "here's what you owe us for last month's stuff, cheers." They cannot extract the VAT registration number from a PDF invoice where the layout puts it in an unexpected location.
The manual process of converting unstructured financial documents into structured data is exactly the bottleneck that accounting automation aims to eliminate. A rules engine that requires a human to type in every figure is just a calculator with extra steps.
The Hybrid Architecture
The systems that actually work in production combine both approaches. The architecture is conceptually straightforward even if the implementation is complex:
Layer 1: Probabilistic AI for data extraction. Computer vision and NLP models read financial documents and extract relevant data: amounts, dates, supplier names, VAT rates, line items. Each extraction carries a confidence score. The AI is doing what it does best: handling ambiguity, varied formats, poor quality inputs, and multilingual content.
Layer 2: Validation and enrichment. Extracted data is validated against known references. Is this supplier in the existing supplier database? Does the extracted VAT number match a valid format? Is the total mathematically consistent with the line items and tax amounts? Failures at this stage trigger flags for human review.
Layer 3: Deterministic rules engine for tax compliance. Validated, structured data feeds into a rules engine that applies tax law with zero tolerance for ambiguity. The rules engine knows that the extracted expense of EUR 250 for business travel in Malta is deductible under Article 14 of the Income Tax Act, that the VAT of EUR 45 at 18% is reclaimable as input tax, and that this must appear in the correct box of the VAT return.
Layer 4: Human review for exceptions. Transactions where the AI confidence is below threshold, where validation checks fail, or where the rules engine encounters edge cases are routed to a qualified human. The human does not review every transaction. They review only the exceptions, with full context about what the AI extracted and why it was flagged.
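The four layers can be sketched as a small routing function. Everything here is an assumption made for illustration: the Extraction shape, the 0.95 confidence threshold, the "MT plus eight digits" VAT number pattern, and the single standard-rate check stand in for a much larger validation and rules layer.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

CONFIDENCE_THRESHOLD = 0.95  # assumed cut-off; tune per document type
VAT_NUMBER_PATTERN = re.compile(r"MT\d{8}")  # assumed format: MT + 8 digits
STANDARD_RATE = Decimal("0.18")


@dataclass
class Extraction:  # Layer 1 output (hypothetical shape)
    supplier_vat_number: str
    net: Decimal
    vat: Decimal
    total: Decimal
    confidence: float


def validate(x: Extraction) -> list[str]:
    """Layer 2: return a list of validation failures (empty if clean)."""
    problems = []
    if not VAT_NUMBER_PATTERN.fullmatch(x.supplier_vat_number):
        problems.append("VAT number format")
    if x.net + x.vat != x.total:
        problems.append("total does not equal net plus VAT")
    return problems


def apply_rules(x: Extraction) -> dict:
    """Layer 3: deterministic check of the VAT against the standard rate."""
    expected = (x.net * STANDARD_RATE).quantize(Decimal("0.01"))
    if expected != x.vat:
        raise ValueError(f"VAT {x.vat} does not match expected {expected}")
    return {"reclaimable_input_vat": x.vat}


def process(x: Extraction) -> dict:
    """Route to the rules engine or to human review (Layer 4)."""
    reasons = validate(x)
    if x.confidence < CONFIDENCE_THRESHOLD:
        reasons.append(f"confidence {x.confidence:.2f} below threshold")
    if reasons:
        return {"status": "human_review", "reasons": reasons}
    try:
        return {"status": "auto", **apply_rules(x)}
    except ValueError as exc:
        return {"status": "human_review", "reasons": [str(exc)]}


clean = Extraction("MT12345678", Decimal("250.00"), Decimal("45.00"),
                   Decimal("295.00"), 0.99)
faded = Extraction("MT12345678", Decimal("250.00"), Decimal("45.00"),
                   Decimal("295.00"), 0.82)
print(process(clean)["status"])  # auto
print(process(faded)["status"])  # human_review
```

The design choice worth noting is that the rules layer never sees a transaction that failed validation or fell below the threshold, and every item sent to review carries the reasons it was flagged.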
Why This Matters for VAT
VAT compliance is particularly sensitive to the deterministic-probabilistic distinction because classification errors propagate. If a transaction is misclassified as standard-rated when it should be exempt, the VAT return contains an incorrect figure. If that error is systematic, affecting a category of transactions, the cumulative impact across a year of quarterly returns can be substantial.
Consider the Maltese scenario: a real estate agent receives a commission payment from a property developer. Is this a standard-rated supply at 18%? An exempt financial service? A supply subject to the margin scheme? The correct answer depends on the specific nature of the service, the status of the parties, and the applicable provisions of the VAT Act (Chapter 406).
An LLM might give you a plausible answer. But "plausible" is not the same as "correct according to the Commissioner for Revenue's interpretation as set out in Guidelines issued under Article 37 of the VAT Act." The rules engine, programmed with the actual statutory provisions and administrative guidance, will give you the correct answer every time, provided it receives the correct inputs.
The AI's job is to ensure the rules engine receives those correct inputs. The AI reads the commission statement, extracts the amount, identifies the parties, and classifies the nature of the supply. The rules engine then applies the correct VAT treatment. If the AI is uncertain about the classification, it flags the transaction for human review rather than guessing.
The Cost Calculus
There is a quantifiable economic argument for the hybrid approach. Consider a business processing 500 transactions per month:
Fully manual processing: An experienced bookkeeper takes an average of 3 minutes per transaction: reading the document, entering the data, categorising it, and determining the tax treatment. That is 25 hours per month, roughly a part-time role.
Pure AI processing: If the AI processes everything without human review and achieves 95% accuracy, that is 25 errors per month. Each error costs time to identify (typically during reconciliation or audit preparation), investigate, and correct. Error correction typically takes 10-15 minutes per error, or roughly 4 to 6 hours per month for 25 errors, and some errors are not caught until a tax audit, where the consequences are more severe.
Hybrid processing: The AI processes all transactions, flags approximately 10-15% for human review (50-75 transactions), and achieves 99%+ accuracy on the automatically processed ones. A human reviews the flagged items, taking perhaps 2 minutes each (since the AI has already extracted the data and identified the specific uncertainty). That is 100-150 minutes of review, or about 1.7-2.5 hours. Allowing for overhead, total human time is approximately 3 hours per month, an 88% reduction versus fully manual processing, with higher overall accuracy.
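The arithmetic behind the three scenarios can be reproduced in a few lines, using only the figures given above. The variable names are illustrative. The review time for flagged items alone comes out slightly below the rounded three-hour figure, which leaves room for overhead.

```python
TRANSACTIONS_PER_MONTH = 500


def hours(minutes: float) -> float:
    """Convert minutes to hours."""
    return minutes / 60


# Fully manual: 3 minutes per transaction.
manual_hours = hours(TRANSACTIONS_PER_MONTH * 3)

# Pure AI: 95% accuracy means 5% errors; 10 to 15 minutes to correct each.
errors = TRANSACTIONS_PER_MONTH * 5 // 100
correction_hours = (hours(errors * 10), hours(errors * 15))

# Hybrid: 10 to 15% flagged; 2 minutes of review per flagged item.
flagged = (TRANSACTIONS_PER_MONTH * 10 // 100, TRANSACTIONS_PER_MONTH * 15 // 100)
review_hours = tuple(hours(n * 2) for n in flagged)

# Reduction in human time versus manual, for the high and low review estimates.
reduction = tuple(1 - h / manual_hours for h in reversed(review_hours))

print(f"Manual:           {manual_hours:.2f} h")
print(f"Pure AI errors:   {errors} per month, "
      f"{correction_hours[0]:.2f} to {correction_hours[1]:.2f} h to correct")
print(f"Hybrid review:    {review_hours[0]:.2f} to {review_hours[1]:.2f} h")
print(f"Hybrid reduction: {reduction[0]:.0%} to {reduction[1]:.0%}")
```

Changing TRANSACTIONS_PER_MONTH or the per-item minutes shows how sensitive the comparison is to those assumptions.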
Under these assumptions, the mathematics favour the hybrid approach, and the same proportions hold as transaction volumes grow. It reduces human effort without sacrificing the accuracy that tax compliance demands. This is not about replacing humans with AI. It is about deploying each where they add the most value.
Michael Cutajar, CPA — Founder of Accora.