Probabilistic Dating of Historical Manuscripts via Evidential Deep Regression on Visual Script Features
This research introduces a new way to estimate the age of historical manuscripts from their visual appearance. Traditionally, automated dating tools either sort manuscripts into broad, century-long classes or return a single-year estimate with no indication of how confident they are. This paper instead treats manuscript dating as a probabilistic regression problem. Rather than outputting a single year, the model predicts a distribution over plausible dates, which lets it quantify its own uncertainty. This matters for historians and archivists. They need an estimated date of composition, and they also need to know how far that estimate can be trusted.
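The minimal sketch below shows why a distribution is more informative than a point estimate. It reads a central 90% interval off two hypothetical date predictions that share the same mean year but differ in spread. For simplicity, it assumes a Gaussian predictive distribution and uses Python's standard-library `statistics.NormalDist`. The Normal-Inverse-Gamma head used in this work implies a heavier-tailed Student-t predictive instead. `central_interval` is a hypothetical helper, and the years are illustrative.

```python
from statistics import NormalDist


def central_interval(mean: float, std: float, level: float = 0.9) -> tuple[float, float]:
    """Return the central `level` interval of a Gaussian date prediction.

    Assumption: the predictive distribution is Gaussian (a simplification).
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    dist = NormalDist(mu=mean, sigma=std)
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Two hypothetical predictions: same point estimate, very different confidence.
for label, std in [("confident", 5.0), ("uncertain", 40.0)]:
    low, high = central_interval(1150.0, std)
    print(f"{label}: year {low:.0f} to {high:.0f}")
```

A single-year output of 1150 would hide the difference between these two cases. The interval makes it explicit.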

How the Approach Works

The researchers pair an EfficientNet-B2 backbone with a specialized Normal-Inverse-Gamma (NIG) output head. Because the NIG head predicts the parameters of a distribution directly, the model produces both a date estimate and its uncertainty in a single forward pass. This is far more efficient than older methods, which must run the model dozens of times to estimate confidence.
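The sketch below shows, under stated assumptions, what a single-pass NIG head can look like. One linear layer maps pooled backbone features to four raw outputs, and a softplus then constrains them to valid Normal-Inverse-Gamma parameters (gamma, nu, alpha, beta), following the common deep evidential regression parameterization. It uses NumPy with random stand-in weights rather than a trained network. The following are illustrative assumptions, not details taken from the paper:

- the class name `NIGHead`;
- the feature width of 1408, which is the pooled output width of torchvision's EfficientNet-B2;
- the small epsilon offsets.

```python
import numpy as np


def softplus(x: np.ndarray) -> np.ndarray:
    """Numerically stable softplus, log(1 + exp(x)), which maps reals to (0, inf)."""
    return np.logaddexp(0.0, x)


class NIGHead:
    """Hypothetical evidential head: pooled features -> (gamma, nu, alpha, beta)."""

    def __init__(self, in_features: int, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        # A single linear layer with four outputs; random weights stand in for trained ones.
        self.weight = rng.normal(scale=0.01, size=(in_features, 4))
        self.bias = np.zeros(4)

    def __call__(self, features: np.ndarray) -> dict[str, np.ndarray]:
        raw = features @ self.weight + self.bias       # shape: (batch, 4)
        eps = 1e-6
        return {
            "gamma": raw[:, 0],                        # mean date (on a normalized year scale)
            "nu": softplus(raw[:, 1]) + eps,           # nu > 0
            "alpha": softplus(raw[:, 2]) + 1.0 + eps,  # alpha > 1 keeps the variances finite
            "beta": softplus(raw[:, 3]) + eps,         # beta > 0
        }


# One forward pass over a batch of 8 hypothetical feature vectors returns all
# four parameters per patch; no repeated stochastic passes are required.
head = NIGHead(in_features=1408)
params = head(np.random.default_rng(1).normal(size=(8, 1408)))
print({name: values.shape for name, values in params.items()})
```

In a trained model, the target years would typically be normalized before training, and gamma would be mapped back to calendar years afterwards. That step is omitted here.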
The model decomposes its uncertainty into two types. Aleatoric uncertainty reflects ambiguity inherent in the input itself, such as faded ink or irregular handwriting. Epistemic uncertainty reflects the model's own lack of knowledge. Because it estimates both, the system can distinguish two kinds of hard case. A document may be hard to date because the image is blurry, or because the model has not seen that style of writing before.
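This sketch assumes the paper follows the standard deep evidential regression formulation. Under that formulation, both uncertainty types follow from the four NIG parameters in closed form. The aleatoric part is the expected noise variance, E[sigma^2] = beta / (alpha - 1). The epistemic part is the variance of the predicted mean, Var[mu] = beta / (nu * (alpha - 1)). The dataclass, the function name, and the parameter values in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NIGParams:
    gamma: float  # predicted date (mean of the Normal component)
    nu: float     # evidence supporting the mean, must be > 0
    alpha: float  # Inverse-Gamma shape, must be > 1
    beta: float   # Inverse-Gamma scale, must be > 0


def decompose(p: NIGParams) -> tuple[float, float]:
    """Return (aleatoric, epistemic) variances, in squared date units."""
    if p.nu <= 0 or p.alpha <= 1 or p.beta <= 0:
        raise ValueError("invalid NIG parameters")
    aleatoric = p.beta / (p.alpha - 1.0)            # E[sigma^2]: noise inherent in the input
    epistemic = p.beta / (p.nu * (p.alpha - 1.0))  # Var[mu]: uncertainty about the mean itself
    return aleatoric, epistemic


# Hypothetical patches: a degraded scan in a familiar style versus a clean
# image in a style with little supporting evidence (small nu).
blurry = NIGParams(gamma=1150.0, nu=10.0, alpha=2.0, beta=400.0)
unfamiliar = NIGParams(gamma=1150.0, nu=0.25, alpha=5.0, beta=40.0)
print(decompose(blurry))      # (400.0, 40.0): aleatoric dominates
print(decompose(unfamiliar))  # (10.0, 40.0): epistemic dominates
```

Since epistemic = aleatoric / nu, a small amount of evidence (low nu) inflates epistemic uncertainty even when the image itself is clean.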

Key Results and Performance

The model was evaluated on the DIVA-HisDB benchmark, which comprises 150 pages from three medieval codices. It achieved a mean absolute error (MAE) of 5.4 years, a strong result given that the training data was labeled only by century.
One of the most practical findings concerns "selective prediction." When only the 20% of manuscript patches on which the model is most confident are kept, the MAE drops to just 0.5 years. This suggests a workflow for archives. The system could date clear, consistent documents automatically while flagging ambiguous or degraded sections for expert review. The model's uncertainty scores also proved to be a reliable indicator of image quality, rising consistently when it encountered blurred or compressed images.
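Selective prediction amounts to ranking patches by predicted uncertainty and scoring only the most confident fraction. Below is a minimal sketch in plain Python. These are assumptions for illustration, not the paper's evaluation code or data:

- the function name;
- the use of a single scalar uncertainty per patch, such as the epistemic variance or the total predictive variance;
- the ten-patch toy data.

```python
def selective_mae(
    predictions: list[float],
    targets: list[float],
    uncertainties: list[float],
    coverage: float = 0.2,
) -> float:
    """MAE over the `coverage` fraction of samples with the lowest uncertainty."""
    n = len(predictions)
    if n == 0 or len(targets) != n or len(uncertainties) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must lie in (0, 1]")
    # Rank samples from most to least confident and keep the top fraction.
    ranked = sorted(range(n), key=lambda i: uncertainties[i])
    kept = ranked[: max(1, round(coverage * n))]
    return sum(abs(predictions[i] - targets[i]) for i in kept) / len(kept)


# Toy data: ten hypothetical patches whose true date is 1150.
preds = [1151, 1149, 1160, 1120, 1152, 1190, 1148, 1135, 1170, 1152]
truth = [1150] * 10
unc = [0.5, 0.8, 3.0, 9.0, 1.5, 12.0, 1.2, 5.0, 7.0, 0.6]
for cov in (1.0, 0.5, 0.2):
    print(f"coverage {cov:.0%}: MAE = {selective_mae(preds, truth, unc, cov):.1f} years")
```

At any given coverage, the excluded patches are the ones that the archival workflow would route to human experts.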

Limitations and Future Directions

While the results are promising, the study is limited by the diversity of its training data. The model performs well on the three script styles it was trained on but struggles to generalize to entirely new, unseen script families. It does not yet "know what it doesn't know." Faced with a completely unfamiliar script, it may return a confident but incorrect date. The authors suggest that future work incorporate more diverse datasets, so that the model can better recognize when a script lies outside its training experience.
