Multi-Field Hybrid Retrieval-Augmented Generation for Maritime Accident Root Cause Analysis
This paper addresses the labor-intensive nature of maritime safety investigations, where experts must manually sift through decades of tribunal reports to identify the root causes of accidents. The authors propose a framework that automates this process by using Retrieval-Augmented Generation (RAG). By organizing 13,329 historical reports into structured "incident cards" and using a specialized retrieval system, the framework helps investigators find relevant past precedents and generates evidence-based root cause analysis (RCA) reports that are more consistent and accurate than those produced by standard AI models.
Structuring Historical Data
The researchers transformed raw, decades-old tribunal documents into a structured knowledge base. Each report was broken down into three distinct, retrievable fields: the Summary (incident description), the Causes (tribunal reasoning), and the Disposition (administrative outcomes). By separating these sections, the system avoids the "dilution" that occurs when a document is treated as one long block of text. Additionally, the team used rule-based tagging to categorize cases into a hierarchical taxonomy of causes, ensuring that the AI can consistently label and organize findings according to established maritime safety standards.
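To make the three-field "incident card" and the rule-based tagging concrete, here is a minimal sketch in Python. The summary does not give the paper's schema, taxonomy, or rules, so the `IncidentCard` class, the `TAG_RULES` table, and the `tag_causes` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical rules: keyword -> (top-level category, sub-category).
# The paper's real taxonomy and tagging rules are not given in this summary.
TAG_RULES = {
    "lookout": ("Human factors", "Inadequate lookout"),
    "fatigue": ("Human factors", "Fatigue"),
    "engine failure": ("Technical factors", "Machinery failure"),
    "gas leak": ("Technical factors", "Equipment failure"),
    "fog": ("Environmental factors", "Restricted visibility"),
}


@dataclass
class IncidentCard:
    """One tribunal report split into three separately retrievable fields."""
    case_id: str
    summary: str       # incident description
    causes: str        # tribunal reasoning
    disposition: str   # administrative outcome
    tags: list = field(default_factory=list)


def tag_causes(card):
    """Attach every taxonomy path whose keyword appears in the Causes field."""
    text = card.causes.lower()
    card.tags = sorted({path for kw, path in TAG_RULES.items() if kw in text})
    return card
```

Because each field is stored on its own, it can be indexed and searched separately, and the tags give every card a consistent label regardless of how the original tribunal worded its findings.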
A Field-Aware Hybrid Retrieval Strategy
To find the most relevant historical cases, the system uses a "hybrid" retrieval approach. It combines sparse retrieval (which excels at finding exact technical or legal keywords) with dense retrieval (which uses AI embeddings to understand the semantic meaning behind the text). These results are then fused using Reciprocal Rank Fusion (RRF). This method is "field-aware," meaning it specifically looks for matches across the different sections of the incident cards. This ensures that the system doesn't just find documents that look similar on the surface, but rather those that share the same underlying causal logic and administrative outcomes.
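The fusion step can be sketched as follows. Reciprocal Rank Fusion scores each case as the sum of 1 / (k + rank) over every ranked list in which it appears. This sketch assumes one ranked list per (field, retriever) pair and uses k = 60, a value commonly used for RRF; the summary does not state the paper's constant or whether fields are weighted, so treat both as assumptions. The function name `rrf_fuse` is hypothetical.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked lists of case IDs with Reciprocal Rank Fusion.

    k=60 is an assumed default, not a value taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, case_id in enumerate(ranking, start=1):
            scores[case_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by case ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Illustrative rankings: one list per (field, retriever) pair.
rankings = {
    ("summary", "sparse"): ["A", "B", "C"],
    ("summary", "dense"): ["B", "A", "D"],
    ("causes", "sparse"): ["B", "C", "A"],
    ("causes", "dense"): ["B", "D", "A"],
}
fused = rrf_fuse(list(rankings.values()))
print(fused)
```

RRF works on ranks rather than raw scores, so sparse and dense results can be combined without normalizing their score scales against each other.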
Improving Accuracy and Traceability
The study reports that grounding an AI model in these retrieved precedents improves the quality of its output. Compared to an AI model working without external evidence, the proposed framework achieved an 11.5% improvement in "LLM-as-a-judge" scores, which measure the coherence and accuracy of the generated analysis. Case studies show that the system helps prevent the AI from making "hallucinated" or speculative leaps (such as blaming alcohol for an explosion when the actual cause was a technical equipment failure) by forcing the model to rely on verified, historically similar precedents.
Considerations and Limitations
Because large-scale expert labels for these reports are difficult to obtain, the researchers developed a metadata-driven proxy score to evaluate how well the system retrieves relevant cases. While this provides a reproducible way to measure performance, the authors note that it is a heuristic approach rather than a perfect gold standard. Additionally, the use of rule-based tagging and automated evaluation means there is a potential for model-dependent bias. Despite these constraints, the framework offers a significant step forward in streamlining maritime safety workflows by providing investigators with a faster, more reliable way to synthesize complex accident data.
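The summary does not say how the proxy score is computed, so the following is only one plausible shape for a metadata-driven relevance proxy, not the authors' formula. It assumes each case carries a set of metadata tags and scores a retrieved case by its Jaccard overlap with the query case's tags, averaged over the top k results. The names `jaccard` and `proxy_score_at_k` are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def proxy_score_at_k(query_tags, retrieved_tags, k=5):
    """Mean tag overlap between the query case and its top-k retrieved cases.

    Hypothetical stand-in for the paper's proxy metric, which is not
    specified in this summary.
    """
    top = retrieved_tags[:k]
    if not top:
        return 0.0
    return sum(jaccard(query_tags, tags) for tags in top) / len(top)
```

A score like this is reproducible because it depends only on stored metadata, but it inherits any errors in the tags themselves, which is why the authors treat their proxy as a heuristic rather than a gold standard.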