AI & Technology

How Natural Language Processing Is Transforming Document Processing in Accounting

By Michael Cutajar · 9 min read

Accountants have always been document processors. The core job involves reading financial documents, extracting the relevant numbers, categorising them correctly, and producing accurate reports. Natural Language Processing is automating each of these steps, but the gap between reading text and understanding its financial meaning is where the real transformation is happening.

From OCR to NLP: Reading vs Understanding

Optical Character Recognition (OCR) has existed for decades. It converts images of text into machine-readable characters. Modern OCR engines from Google (Vision AI), Amazon (Textract), and Microsoft (Azure AI Document Intelligence) achieve character-level accuracy rates above 99% on clean, printed documents.

But OCR only reads. It does not understand. An OCR engine scanning an invoice will output a stream of text: "Supplier: Mediterranean Office Supplies Ltd. Invoice No: INV-8847. Date: 15/03/2026. Item: A4 Paper 5 Reams. Qty: 10. Unit Price: EUR 4.50. VAT 18%: EUR 8.10. Total: EUR 53.10."

That text is accurate but unstructured. The OCR engine does not know that "EUR 53.10" is the total amount, that "18%" is the VAT rate, or that "Mediterranean Office Supplies Ltd" is the supplier name. It just sees characters on a page.

NLP bridges this gap. It takes the OCR output and applies linguistic and contextual understanding to extract meaning. It identifies that the number following "Total:" is the invoice total. It recognises that the percentage near "VAT" is the tax rate. It understands that the text at the top of the document near "Supplier:" is the vendor name.
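The simplest version of this bridging step can be done with label-anchored rules, before any machine learning enters the picture. A minimal sketch, using the invoice text above (the field names and patterns are illustrative, not a production schema):

```python
import re

# Raw OCR output from the invoice example above (one flat string).
ocr_text = (
    "Supplier: Mediterranean Office Supplies Ltd. Invoice No: INV-8847. "
    "Date: 15/03/2026. Item: A4 Paper 5 Reams. Qty: 10. "
    "Unit Price: EUR 4.50. VAT 18%: EUR 8.10. Total: EUR 53.10."
)

# Label-anchored patterns: each field is located by the text near it,
# not by its position on the page.
PATTERNS = {
    "supplier":   r"Supplier:\s*(.+?)\.\s*Invoice",
    "invoice_no": r"Invoice No:\s*([A-Z0-9-]+)",
    "date":       r"Date:\s*(\d{2}/\d{2}/\d{4})",
    "vat_rate":   r"VAT\s*(\d+(?:\.\d+)?)%",
    "total":      r"Total:\s*EUR\s*(\d+(?:[.,]\d+)*)",
}

def extract_fields(text: str) -> dict:
    """Turn flat OCR text into labelled fields."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1).strip()
    return fields

print(extract_fields(ocr_text))
```

Rules like these break the moment a supplier formats an invoice differently, which is exactly why the field moved from hand-written patterns to learned models.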

Named Entity Recognition for Financial Documents

Named Entity Recognition (NER) is the NLP technique most directly relevant to accounting document processing. Standard NER models identify entities like people, organisations, locations, and dates. Financial NER extends this to accounting-specific entities: supplier names, invoice and reference numbers, monetary amounts and currencies, VAT rates, and transaction dates.

Training financial NER models requires annotated datasets where humans have marked each entity in thousands of documents. The Stanford NER system and spaCy's entity recognition pipeline provide foundations, but financial document processing demands custom models trained on domain-specific data.
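The annotation work described above produces training examples in a span format like the one spaCy's pipeline consumes: the raw text plus character-offset labels for every entity a human has marked. A minimal sketch (the labels `INVOICE_ID`, `SUPPLIER`, `MONEY`, and `VAT_RATE` are an illustrative scheme, not a standard):

```python
# One annotated training example: raw text plus (start, end, label)
# character offsets for each entity a human annotator has marked.
text = ("Invoice INV-8847 from Mediterranean Office Supplies Ltd, "
        "total EUR 53.10, VAT 18%.")

def span(substring: str, label: str) -> tuple:
    """Locate a marked substring and record its character offsets."""
    start = text.index(substring)
    return (start, start + len(substring), label)

annotations = [
    span("INV-8847", "INVOICE_ID"),
    span("Mediterranean Office Supplies Ltd", "SUPPLIER"),
    span("EUR 53.10", "MONEY"),
    span("18%", "VAT_RATE"),
]

for start, end, label in annotations:
    print(f"{label:<10} -> {text[start:end]!r}")
```

Thousands of examples in this shape, covering the messy variety of real documents, are what "domain-specific data" means in practice.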

Research published at EMNLP and ACL conferences has shown that transformer-based NER models, particularly those that incorporate layout information alongside text, achieve F1 scores above 95% on structured financial documents. Microsoft's LayoutLM family of models, now in its third iteration, was specifically designed for document understanding tasks that require both textual and spatial features.

The Format Chaos

If every invoice followed the same template, document processing would be a solved problem. They do not. A single accounting firm might receive documents in dozens of formats: native PDF invoices, scans of paper documents, phone photographs of receipts, WhatsApp messages, and more.

Each format presents different challenges. PDF invoices are relatively clean but may use embedded fonts that OCR engines struggle with. Phone photographs introduce perspective distortion, uneven lighting, and motion blur. WhatsApp messages mix conversation with financial data in an unstructured stream.

The NLP system must handle all of these. Pre-processing pipelines correct for image quality issues before OCR. Document classification models identify what type of document has been received. Specialised extraction models then apply the appropriate processing strategy for each document type.
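The routing step in that pipeline can be sketched as a classifier dispatching to type-specific extractors. Real systems use trained classifiers; keyword scoring and placeholder extractors stand in for them here, and all names are illustrative:

```python
# Minimal classify-then-route sketch: identify the document type,
# then hand the text to the extractor built for that type.
KEYWORDS = {
    "invoice": ["invoice", "vat", "total due"],
    "receipt": ["receipt", "cash", "change"],
    "bank_statement": ["statement", "balance", "iban"],
}

def classify(text: str) -> str:
    """Score each document type by keyword hits; return the best match."""
    lowered = text.lower()
    scores = {
        doc_type: sum(kw in lowered for kw in kws)
        for doc_type, kws in KEYWORDS.items()
    }
    return max(scores, key=scores.get)

# Placeholder extractors; real ones would run type-specific models.
def extract_invoice(text):   return {"type": "invoice"}
def extract_receipt(text):   return {"type": "receipt"}
def extract_statement(text): return {"type": "bank_statement"}

EXTRACTORS = {
    "invoice": extract_invoice,
    "receipt": extract_receipt,
    "bank_statement": extract_statement,
}

def process(text: str) -> dict:
    return EXTRACTORS[classify(text)](text)

print(process("Invoice No INV-8847, VAT 18%, Total due EUR 53.10"))
```

The design point is the separation of concerns: classification decides *what* the document is so that extraction can assume a structure.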

Multilingual Financial Documents

A Maltese business operates in an inherently multilingual environment. English and Maltese are both official languages. Italian is widely understood. EU regulations and cross-border transactions introduce French, German, and other European languages.

Multilingual NLP for financial documents faces specific challenges beyond general translation: financial terminology that does not map one-to-one across languages, number and date formats that differ by locale (1.500,00 versus 1,500.00), and documents that mix two languages on a single page.

Multilingual transformer models like XLM-RoBERTa and mBERT provide cross-lingual understanding, but fine-tuning on financial documents in each target language significantly improves performance. The practical challenge is assembling sufficient annotated training data in languages with smaller digital footprints.
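Even before any model runs, the locale problem shows up in something as basic as reading an amount. A sketch of one common heuristic, treating whichever separator appears last as the decimal point:

```python
# European documents write EUR 1.500,00 where Anglophone ones write
# EUR 1,500.00. Heuristic: the separator appearing last is the decimal.
def normalise_amount(raw: str) -> float:
    s = raw.strip().replace(" ", "")
    last_dot, last_comma = s.rfind("."), s.rfind(",")
    if last_comma > last_dot:
        # Comma is the decimal separator: 1.500,00
        s = s.replace(".", "").replace(",", ".")
    else:
        # Dot is the decimal separator: 1,500.00
        s = s.replace(",", "")
    return float(s)

print(normalise_amount("1.500,00"))  # 1500.0
print(normalise_amount("1,500.00"))  # 1500.0
```

The heuristic fails on genuinely ambiguous strings like "1.500" with no decimal part, which is precisely where knowing the document's language and origin becomes part of the extraction problem.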

The Accuracy Challenge

No AI system processes financial documents with 100% accuracy. The critical question is where errors occur and how they are handled.

Common extraction errors include misreading visually similar characters (a 0 as an O, a 1 as a 7), confusing a subtotal with the final total, interpreting dates in the wrong format (15/03 versus 03/15), and attaching a value to the wrong line item.

Confidence Scores and Human Review

Well-designed systems address extraction uncertainty through confidence scoring. Each extracted field is assigned a confidence score reflecting the model's certainty. A clearly printed "Total: EUR 1,500.00" on a high-quality PDF might receive a 99% confidence score. A partially obscured amount on a crumpled receipt might receive 72%.

The system then applies a threshold. Extractions above the threshold (typically 90-95%) are accepted automatically. Those below are flagged for human review. This creates a tiered workflow: high-confidence fields flow straight through, while uncertain fields queue for a reviewer.

This approach optimises the allocation of human attention. Rather than reviewing every document, humans focus on the cases where the AI is genuinely uncertain. The corrections humans make feed back into the training data, progressively improving the model's accuracy on similar documents.
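The threshold-and-route logic described above is simple enough to sketch directly (the 0.93 threshold and field records are illustrative values within the 90-95% range mentioned earlier):

```python
# Route each extracted field by its confidence score: auto-accept
# above the threshold, queue everything else for human review.
AUTO_ACCEPT_THRESHOLD = 0.93  # within the typical 0.90-0.95 range

def route(extractions: list) -> tuple:
    accepted, review_queue = [], []
    for field in extractions:
        if field["confidence"] >= AUTO_ACCEPT_THRESHOLD:
            accepted.append(field)
        else:
            review_queue.append(field)
    return accepted, review_queue

fields = [
    {"name": "total", "value": "1500.00", "confidence": 0.99},
    {"name": "date", "value": "15/03/2026", "confidence": 0.97},
    {"name": "supplier", "value": "Mediterranean Office Supplies Ltd",
     "confidence": 0.72},
]
accepted, review_queue = route(fields)
print(f"{len(accepted)} auto-accepted, {len(review_queue)} for review")
```

In a production system the review queue would also capture the reviewer's correction, which is the feedback signal that retrains the model.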

Real-World Accuracy Rates

Published benchmarks and industry reports put field-level accuracy, meaning the percentage of individual fields (date, amount, supplier name) correctly extracted, in the mid-90s for clean, structured documents such as invoices. Document-level accuracy, where every field must be correct for the document to count as successfully processed, is naturally lower.
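The gap between the two measures follows from simple probability: if each of n fields is extracted correctly with probability p, and the errors are treated as independent, the whole document is correct with probability p to the power n. A quick worked example:

```python
# Why document-level accuracy is "naturally lower": with field-level
# accuracy p and n fields per document, a fully correct document
# occurs with probability p ** n (assuming independent errors).
field_accuracy = 0.97
for n_fields in (5, 10, 20):
    doc_accuracy = field_accuracy ** n_fields
    print(f"{n_fields:2d} fields: {doc_accuracy:.1%} of documents fully correct")
```

At 97% field accuracy, a ten-field invoice is fully correct only about 74% of the time, which is why even high field-level scores still leave a meaningful human review workload.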

The trajectory is one of consistent improvement. Google's Document AI accuracy improved by 15-20% between 2021 and 2024 on benchmark datasets. Amazon Textract's invoice processing feature reduced error rates by approximately 30% compared to its general-purpose document analysis. Each major model version brings measurable improvements, particularly on the difficult edge cases that drive accuracy from 95% toward 99%.

For the accounting profession, these accuracy rates are already sufficient to transform the workflow. The AI handles the bulk processing. Humans handle the exceptions. The result is faster, cheaper, and often more accurate than fully manual processing, where human data entry error rates of 1-4% are well documented.


Michael Cutajar, CPA — Founder of Accora.