Key Takeaways

  • We introduce a probabilistic approach for dating historical manuscript pages from visual features alone.
  • Our architecture combines an EfficientNet-B2 backbone with a Normal-Inverse-Gamma (NIG) output head trained with a joint negative-log-likelihood and evidence-regularization objective.
  • Uncertainty decomposition shows that aleatoric uncertainty is a strong predictor of dating error (Spearman $\rho=0.729$), and selective prediction over the most certain 20% of patches yields an MAE of 0.5 years.
Paper Abstract

We introduce a probabilistic approach for dating historical manuscript pages from visual features alone. Instead of binning dates into century-level classes, as is standard in prior work, we pose dating as an evidential deep regression problem over a continuous year axis, allowing our neural network to output a full predictive distribution with decomposed aleatoric and epistemic uncertainty in a single forward pass. Our architecture combines an EfficientNet-B2 backbone with a Normal-Inverse-Gamma (NIG) output head trained with a joint negative-log-likelihood and evidence-regularization objective. On the DIVA-HisDB benchmark (150 pages, 3 medieval codices, 151,936 patches), our model reaches a test MAE of 5.4 years, well below the 50-year granularity of the century-label supervision, with 93% of patches within 5 years and 97% within 10 years. Our approach achieves a prediction interval coverage probability (PICP) of 92.6%, the best calibration among all compared methods, in a single forward pass, outperforming MC Dropout (PICP=88.2%, 50 passes) and Deep Ensembles (PICP=79.7%, 5 models) at $5\times$ lower inference cost. Uncertainty decomposition shows that aleatoric uncertainty is a strong predictor of dating error (Spearman $\rho=0.729$), and selective prediction over the most certain 20% of patches yields an MAE of 0.5 years. We show that predicted uncertainty increases as image degradation worsens, that spatial decomposition maps reveal which script regions drive aleatoric uncertainty, and that page-level aggregation reduces MAE to 4.5 years with $\rho=0.905$ between uncertainty and page-level error.
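
The abstract does not say how patch predictions are combined into a page-level date. One common choice, assumed here purely for illustration, is an inverse-variance weighted mean, in which patches with lower predicted variance count for more. The function name `aggregate_page` and its `(year, variance)` input format are hypothetical, not the authors' method.

```python
def aggregate_page(patch_preds):
    """Combine (year, variance) pairs from one page into a single estimate.

    Hypothetical inverse-variance weighting; the paper's actual aggregation
    rule is not stated in the abstract.
    """
    if not patch_preds:
        raise ValueError("page has no patches")
    if any(var <= 0 for _, var in patch_preds):
        raise ValueError("variances must be positive")
    weights = [1.0 / var for _, var in patch_preds]
    total_w = sum(weights)
    year = sum(w * y for w, (y, _) in zip(weights, patch_preds)) / total_w
    # 1 / sum(weights) is the variance of the weighted mean only if patch
    # errors are independent; patches from one page are likely correlated,
    # so this probably underestimates the true page-level variance.
    page_var = 1.0 / total_w
    return year, page_var


# Illustrative values only, not results from the paper.
patches = [(1100.0, 4.0), (1104.0, 4.0), (1160.0, 400.0)]
year, var = aggregate_page(patches)
print(f"page estimate: {year:.1f} (variance {var:.2f})")
```

The high-variance patch at 1160 barely moves the page estimate, which is the intended effect of weighting by confidence.
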

Probabilistic Dating of Historical Manuscripts via Evidential Deep Regression on Visual Script Features
This research introduces a new way to estimate the age of historical manuscripts by analyzing their visual appearance. Traditionally, automated dating tools either categorize manuscripts into broad century-long buckets or provide a single-year estimate without indicating how confident they are in that result. This paper proposes a "probabilistic" approach that treats manuscript dating as a regression problem over a continuous range of years. Instead of outputting a single year, the model outputs a full predictive distribution, which lets it quantify its own uncertainty. This is particularly useful for historians and archivists, who need not only a date estimate but also a sense of how reliable that estimate is.

How the Approach Works

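The researchers use a deep learning architecture that combines an EfficientNet-B2 backbone with a specialized "Normal-Inverse-Gamma" (NIG) output head. Rather than a single year, the head predicts the parameters of an NIG distribution, from which both the date estimate and its uncertainty can be read off. This design delivers a prediction and its uncertainty in a single forward pass, a significant efficiency gain over sampling-based methods such as MC Dropout (50 forward passes in the paper's comparison) or Deep Ensembles (5 separate models).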
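
To make the NIG head concrete, here is a minimal per-sample sketch in plain Python. It assumes the standard deep evidential regression setup: the head emits four unconstrained numbers that are mapped to the NIG parameters (gamma, nu, alpha, beta) with softplus constraints, and the loss is the NIG negative log-likelihood plus `lam` times an evidence regularizer |y - gamma| * (2*nu + alpha). The helper names, the softplus parameterization, and the weight `lam=0.01` are assumptions for illustration, not the authors' code; a real implementation would run batched in a deep-learning framework, typically on normalized year targets.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x))
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def nig_params(raw):
    """Map four unconstrained head outputs to NIG parameters (assumed scheme)."""
    r_gamma, r_nu, r_alpha, r_beta = raw
    gamma = r_gamma                   # predicted (normalized) year, unconstrained
    nu = softplus(r_nu)               # nu > 0: evidence for the mean
    alpha = softplus(r_alpha) + 1.0   # alpha > 1 keeps the variances finite
    beta = softplus(r_beta)           # beta > 0: scale of the noise
    return gamma, nu, alpha, beta


def nig_nll(y, gamma, nu, alpha, beta):
    """Negative log of the Student-t marginal implied by the NIG prior."""
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * math.log(math.pi / nu)
            - alpha * math.log(omega)
            + (alpha + 0.5) * math.log(nu * (y - gamma) ** 2 + omega)
            + math.lgamma(alpha)
            - math.lgamma(alpha + 0.5))


def evidence_reg(y, gamma, nu, alpha):
    # Penalize confident evidence (2*nu + alpha) on patches with large errors
    return abs(y - gamma) * (2.0 * nu + alpha)


def evidential_loss(y, raw, lam=0.01):
    gamma, nu, alpha, beta = nig_params(raw)
    return nig_nll(y, gamma, nu, alpha, beta) + lam * evidence_reg(y, gamma, nu, alpha)


# Toy example: hypothetical raw head outputs for one patch, normalized target 0.0
print(evidential_loss(0.0, (0.1, 0.5, 1.2, -0.3)))
```

The regularizer is what keeps the network from claiming high evidence on patches it dates badly; without it, the NLL alone can be lowered by inflating confidence on easy examples.
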
The model decomposes uncertainty into two types: "aleatoric" uncertainty, which reflects ambiguity inherent in the input (such as faded ink or irregular handwriting), and "epistemic" uncertainty, which reflects the model's own lack of knowledge. In principle, separating the two lets the system distinguish a document that is hard to date because it is blurry from one that is hard to date because the model has not seen that style of writing before.
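
Under the standard NIG formulation (assumed here, since the summary gives no formulas), both uncertainty types have closed forms: aleatoric uncertainty is the expected noise variance E[sigma^2] = beta / (alpha - 1), and epistemic uncertainty is the variance of the predicted mean, Var[mu] = beta / (nu * (alpha - 1)). The hypothetical helper below computes both, plus their sum, which equals the variance of the predictive Student-t distribution.

```python
import math


def nig_uncertainties(nu, alpha, beta):
    """Return (aleatoric, epistemic, total) variances for NIG parameters.

    Variances are in squared target units (e.g. years^2 if targets are years).
    """
    if nu <= 0 or beta <= 0:
        raise ValueError("nu and beta must be positive")
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for finite variances")
    aleatoric = beta / (alpha - 1.0)         # E[sigma^2]: noise in the patch itself
    epistemic = beta / (nu * (alpha - 1.0))  # Var[mu]: uncertainty about the mean
    total = aleatoric + epistemic            # variance of the Student-t predictive
    return aleatoric, epistemic, total


# Same noise level, but less evidence (smaller nu) raises only the epistemic term.
for nu in (10.0, 0.5):
    alea, epi, tot = nig_uncertainties(nu, alpha=3.0, beta=8.0)
    print(f"nu={nu}: aleatoric sd={math.sqrt(alea):.2f}, epistemic sd={math.sqrt(epi):.2f}")
```

Because epistemic variance is aleatoric variance divided by nu, the two are linked: the head can only report low epistemic uncertainty by accumulating evidence (large nu) for a given input.
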

Key Results and Performance

The model was tested on the DIVA-HisDB benchmark, which includes 150 pages (151,936 image patches) from three medieval codices. It achieved a mean absolute error (MAE) of 5.4 years, well below the 50-year granularity of its century-based training labels, with 93% of patches dated within 5 years and 97% within 10 years.
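
These headline numbers reduce to simple per-patch statistics. The sketch below computes MAE and the fraction of patches dated within a given number of years. The function name is hypothetical, "within k years" is taken to mean an absolute error of at most k, and the toy years are invented for illustration.

```python
def dating_metrics(pred_years, true_years, thresholds=(5, 10)):
    """Return (MAE, {k: fraction of patches with |error| <= k})."""
    if not pred_years or len(pred_years) != len(true_years):
        raise ValueError("need equal-length, non-empty inputs")
    errors = [abs(p - t) for p, t in zip(pred_years, true_years)]
    mae = sum(errors) / len(errors)
    within = {k: sum(e <= k for e in errors) / len(errors) for k in thresholds}
    return mae, within


# Invented toy predictions, not data from the paper.
mae, within = dating_metrics([1100, 1105, 1130, 1250], [1102, 1100, 1118, 1250])
print(mae, within)
```
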
One of the most practical findings is the model's ability to perform "selective prediction." Keeping only the 20% of manuscript patches on which the model is most confident reduces the MAE to just 0.5 years. This suggests a practical workflow for archives: the system could automatically date clear, consistent material while flagging ambiguous or degraded sections for expert review. The model's uncertainty also tracks image quality: its uncertainty scores consistently increased when images were degraded, for example by blurring or compression.
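
Selective prediction only helps if uncertainty ranks patches roughly by their error, which is what the paper's Spearman correlation between aleatoric uncertainty and error ($\rho=0.729$) measures. The sketch below computes that rank correlation and the MAE over the most certain fraction of patches. It is a plain-Python illustration with hypothetical function names and invented toy values; SciPy's `scipy.stats.spearmanr` provides an equivalent correlation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def selective_mae(abs_errors, uncertainties, coverage=0.2):
    """MAE over the `coverage` fraction of patches with the lowest uncertainty."""
    n_keep = max(1, round(coverage * len(abs_errors)))
    most_certain = sorted(range(len(abs_errors)), key=lambda i: uncertainties[i])[:n_keep]
    return sum(abs_errors[i] for i in most_certain) / n_keep


# Invented toy values: absolute dating errors (years) and predicted uncertainties.
errors = [0.5, 1.0, 2.0, 8.0, 20.0, 3.0, 0.2, 12.0, 1.5, 6.0]
uncert = [0.1, 0.3, 0.4, 0.9, 1.5, 0.2, 0.05, 1.1, 0.6, 0.7]
print("Spearman rho:", spearman_rho(uncert, errors))
print("MAE, all patches:", sum(errors) / len(errors))
print("MAE, most certain 20%:", selective_mae(errors, uncert, coverage=0.2))
```

In a review workflow, the patches outside the kept fraction are the ones that would be routed to a human expert.
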

Limitations and Future Directions

While the results are promising, the study is limited by the diversity of its training data. The model performs well on the script styles of the three codices it was trained on, but it struggles to generalize to entirely new script families. In particular, the model does not yet "know what it doesn't know" when faced with a completely unfamiliar script: it may return a confident but incorrect date for a style it has never encountered. The authors suggest that future work incorporate more diverse datasets so the model can better recognize when a script lies outside its training experience.
