
Case Study: Reading Human Handwriting with MVTec HALCON Deep OCR
Exploring MVTec’s Deep Learning Tool (DLT) – Version 24.12
We recently tested the capabilities of MVTec’s Deep Learning Tool following its update to version 24.12. Of particular interest was the introduction of OCR data labeling within the DLT, enabling us to explore Deep OCR.
Our test involved a dataset of handwritten words from two individuals, with the aim of training a model to recognize handwritten text. Given the small dataset and the challenging handwriting, we did not expect perfect accuracy, especially as we struggled to read some of the writing ourselves!
Dataset & Training
We labeled 3,000 words, ensuring coverage of every letter of the alphabet, all digits, and some punctuation. A single model was trained to recognize both handwriting styles.
Labeling Process
The latest DLT update introduced assisted labeling for Deep OCR, which significantly sped up the process. The tool automatically detected most words and suggested labels, reducing the need for manual box drawing. Additionally, the DLT streamlined dataset splitting and hyperparameter tuning.
Training
MVTec’s deep learning functionality supports NVIDIA cuDNN-enabled GPUs, allowing for efficient training. Using an NVIDIA RTX A2000 12GB GPU, the model trained on 3,000 words in approximately 20 minutes.
While Deep OCR models require a GPU for training, other model types can be trained on either a CPU or a GPU. For example, a global context anomaly detection model took 15 hours on a CPU but under 2 hours on an NVIDIA GPU, a speedup of more than 7x. Runtime inference, including for Deep OCR, is supported on both CPUs and GPUs.
Results & Analysis
As expected with a small dataset, the model showed signs of overfitting. It accurately recognized 45% of unseen words (i.e., the entire word was correctly identified).
A word was classified as incorrect if at least one character was misidentified.
However, manual analysis showed that 70% of the incorrect words contained only a single character error. Given this and the limited dataset size, the 45% accuracy is a promising result.
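To put those two figures together, here is a quick back-of-the-envelope calculation. It assumes the 70% figure applies to the same set of unseen words as the 45% figure, and the variable names are ours, purely for illustration.

```python
# Figures quoted above, as whole percentages.
exact_match_pct = 45         # unseen words recognized with no errors
single_error_share_pct = 70  # share of the incorrect words with exactly one wrong character

incorrect_pct = 100 - exact_match_pct                             # 55% of all words
single_error_pct = incorrect_pct * single_error_share_pct / 100  # 38.5% of all words
within_one_char_pct = exact_match_pct + single_error_pct         # 83.5% of all words

print(f"Words with at most one character error: {within_one_char_pct:.1f}%")
```

In other words, under that assumption roughly 83.5% of the unseen words were either read perfectly or were off by a single character.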
Notably, the model does not attempt to recognize English words; it simply identifies characters. A simple post-processing step (e.g., autocorrection) could significantly improve accuracy, though not all errors would be easily correctable.
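As a rough illustration of what such a post-processing step might look like, here is a minimal sketch using Python's standard difflib module. This is not something we ran in the test; the lexicon, the `autocorrect` helper, and the 0.8 similarity cutoff are all hypothetical choices for the example.

```python
import difflib

# Hypothetical lexicon: in practice this would be the vocabulary
# expected in the application (product names, field labels, and so on).
LEXICON = ["machine", "vision", "camera", "handwriting", "inspection"]

def autocorrect(word, lexicon=LEXICON, cutoff=0.8):
    """Return the closest lexicon entry, or the raw OCR output if nothing is close enough."""
    lowered = word.lower()
    if lowered in lexicon:
        return lowered
    # get_close_matches ranks candidates by difflib's similarity ratio (0.0 to 1.0).
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

print(autocorrect("visiom"))  # a single wrong character, close to a lexicon entry
print(autocorrect("xyzzy"))   # nothing similar in the lexicon, returned unchanged
```

A lexicon lookup like this only helps with dictionary words. Numbers, codes, and punctuation would need a different check, which is one reason not every error is easily correctable.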
Opportunities for Improvement
One potential enhancement to the DLT would be more granular accuracy metrics, such as:
Character-level accuracy – How many individual characters were correctly predicted?
Approximate word accuracy – Using a measure like Levenshtein distance, allowing users to define a "close enough" threshold.
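Both metrics are straightforward to compute outside the DLT from pairs of ground-truth and predicted strings. The sketch below is our own illustration, not a DLT feature: the `levenshtein` and `evaluate` helpers are hypothetical names, and character accuracy is defined here as one minus the total edit distance divided by the number of ground-truth characters, which is one common convention among several.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def evaluate(pairs, max_distance=1):
    """pairs: list of (ground_truth, prediction) strings."""
    distances = [levenshtein(truth, pred) for truth, pred in pairs]
    total_chars = sum(len(truth) for truth, _ in pairs)
    return {
        "exact_word_accuracy": sum(d == 0 for d in distances) / len(pairs),
        "approx_word_accuracy": sum(d <= max_distance for d in distances) / len(pairs),
        "character_accuracy": 1 - sum(distances) / total_chars,
    }

# Made-up example pairs, not taken from our dataset.
results = evaluate([("hello", "hello"), ("world", "w0rld"), ("cat", "dog")])
print(results)
```

The `max_distance` parameter plays the role of the "close enough" threshold: with a value of 1, a word counts as approximately correct if it is at most one edit away from the ground truth.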
Final Thoughts
MVTec’s Deep Learning Tool continues to impress. At Oculus Vision, we frequently use it for real-world industrial machine vision applications. The tool offers a wide range of model types, and MVTec has done an excellent job streamlining the process from labeling to training, evaluation, and testing.