Consensus-based Agentic Large Language Model Framew...

Key Takeaways

Accurate Harmonized Tariff Schedule (HTS) code classification is essential for customs clearance, duty assessment, trade statistics, and regulatory compliance in maritime logistics.
This paper proposes an agentic large language model (LLM) framework for Canadian 10-digit HTS code classification in smart-port and maritime logistics environments.
We evaluate the framework on a private dataset of 3,300 domain-expert-labeled product records collected from logistics and delivery contexts.
These findings demonstrate the need for evidence-grounded, uncertainty-aware, and human-centered classification workflows rather than fully autonomous single-step prediction.

Paper Abstract

Accurate Harmonized Tariff Schedule (HTS) code classification is essential for customs clearance, duty assessment, trade statistics, and regulatory compliance in maritime logistics. However, exact HTS classification remains challenging because product descriptions are often short, incomplete, or ambiguous, while correct classification depends on hierarchical tariff structures, legal notes, and jurisdiction-specific rules. This paper proposes an agentic large language model (LLM) framework for Canadian 10-digit HTS code classification in smart-port and maritime logistics environments. The framework integrates multi-agent information retrieval, semantic retrieval over official tariff documents, evidence-grounded reasoning, consensus-based validation, element-wise voting across hierarchical code components, confidence estimation, and human-in-the-loop escalation. We evaluate the framework on a private dataset of 3,300 domain-expert-labeled product records collected from logistics and delivery contexts. Experimental results show that exact 10-digit classification remains difficult even for advanced LLMs, with performance decreasing from coarse chapter-level prediction to fine-grained tariff and statistical suffix assignment. These findings demonstrate the need for evidence-grounded, uncertainty-aware, and human-centered classification workflows rather than fully autonomous single-step prediction. The proposed framework supports more interpretable, accountable, and compliance-oriented HTS classification for maritime logistics and smart-port operations. Our code is available at this https URL .

Consensus-based Agentic Large Language Model Framework for Harmonized Tariff Schedule Code Classification
Accurate classification of Harmonized Tariff Schedule (HTS) codes is a critical but difficult task in maritime logistics. These codes determine duty rates, ensure regulatory compliance, and facilitate trade statistics. However, because product descriptions are often vague or incomplete, and because the classification process must follow strict, hierarchical legal rules, automated systems frequently struggle to achieve high accuracy. This paper introduces an agentic framework that uses Large Language Models (LLMs) to perform this classification by mimicking the evidence-based reasoning of human experts, rather than relying on simple, single-step predictions.

How the Framework Works

The system functions as a multi-agent workflow that treats HTS classification as a structured, evidence-grounded task. Instead of asking an AI to guess a code in one go, the framework breaks the process into several intelligent steps:

Evidence Gathering: The system uses multi-agent information retrieval to search for details about the product and cross-references them with official tariff documents.
Hierarchical Reasoning: Because HTS codes are built in layers—from broad chapters down to specific statistical suffixes—the model validates each level of the code to ensure it remains consistent with the overall hierarchy.
Consensus-Based Validation: The framework uses "element-wise voting" and self-consistency checks. By comparing multiple reasoning paths, the system can identify when its own internal logic is conflicted.
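The consensus step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `elementwise_vote`, the `LEVELS` table, and the rule of discarding candidates that disagree with the winning prefix are all assumptions made for this example. It assumes a 10-digit code splits into chapter (2 digits), heading (4), subheading (6), tariff item (8), and statistical suffix (10), and that several reasoning paths have each proposed a full code.

```python
from collections import Counter

# Assumed cumulative digit boundaries for a 10-digit code.
LEVELS = [
    ("chapter", 2),
    ("heading", 4),
    ("subheading", 6),
    ("tariff_item", 8),
    ("statistical_suffix", 10),
]


def elementwise_vote(candidates):
    """Majority-vote each code segment, level by level (hypothetical helper).

    After each level, only candidates that match the winning segment stay in
    the pool, so the assembled code is always a prefix-consistent path.
    Returns the voted code and, per level, the share of all valid candidates
    that agree with the voted prefix up to that level.
    """
    pool = [c for c in candidates if len(c) == 10 and c.isdigit()]
    if not pool:
        raise ValueError("no valid 10-digit candidates")
    total = len(pool)
    code = ""
    agreement = {}
    start = 0
    for name, end in LEVELS:
        counts = Counter(c[start:end] for c in pool)
        segment, votes = counts.most_common(1)[0]
        agreement[name] = votes / total
        code += segment
        # Keep only the reasoning paths that still agree with the winner.
        pool = [c for c in pool if c[start:end] == segment]
        start = end
    return code, agreement


# Illustrative candidate codes from five hypothetical reasoning paths.
paths = ["8471300010", "8471300010", "8471300090", "8471410010", "8517120000"]
voted_code, level_agreement = elementwise_vote(paths)
```

Voting per level rather than on the whole code lets the paths agree on the chapter and heading even when they split on the suffix, and the falling agreement values show at which level the reasoning paths start to conflict.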

Managing Uncertainty and Human Oversight

A key feature of this framework is its ability to recognize its own limitations. The system calculates a confidence score for its predictions. If the model determines that the product description is too ambiguous or the legal requirements are too complex to resolve with high certainty, it triggers a "human-in-the-loop" escalation. In these cases, the system generates specific questions for a human user, asking for the missing attributes—such as material composition or intended use—needed to finalize the correct code. This ensures that the process remains accountable and auditable.
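One way to picture the escalation logic is the sketch below. It is a simplified illustration under stated assumptions, not the paper's method: the `decide` function, the `Decision` class, the 0.7 threshold, the use of the weakest per-level agreement as the confidence score, and the wording of the questions are all hypothetical. Only the two example attributes (material composition and intended use) come from the description above.

```python
from dataclasses import dataclass, field

# Hypothetical attribute-to-question map; the question wording is invented.
CLARIFYING_QUESTIONS = {
    "material_composition": "What materials is the product made of, and in what proportions?",
    "intended_use": "What is the product's intended use?",
}


@dataclass
class Decision:
    code: str
    confidence: float
    escalate: bool
    questions: list[str] = field(default_factory=list)


def decide(code, level_agreement, known_attributes, threshold=0.7):
    """Accept the code or escalate to a human (assumed decision rule).

    Confidence is taken as the weakest per-level agreement score, so one
    uncertain level is enough to trigger escalation.
    """
    if not level_agreement:
        raise ValueError("level_agreement must not be empty")
    confidence = min(level_agreement.values())
    if confidence >= threshold:
        return Decision(code, confidence, escalate=False)
    # Ask only for the attributes the product record does not already contain.
    questions = [
        question
        for attribute, question in CLARIFYING_QUESTIONS.items()
        if not known_attributes.get(attribute)
    ]
    if not questions:
        questions = ["Please review the retrieved evidence and confirm or correct the proposed code."]
    return Decision(code, confidence, escalate=True, questions=questions)


# Illustrative call: the suffix level is uncertain and the material is unknown.
uncertain = decide(
    "8471300010",
    {"chapter": 0.8, "statistical_suffix": 0.4},
    {"intended_use": "office laptop"},
)
```

Returning the questions together with the confidence score keeps a record of why a case was escalated, which fits the auditability goal described above.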

Key Findings and Performance

The researchers tested the framework on a private dataset of 3,300 expert-labeled Canadian HTS records. The results show that exact 10-digit HTS classification remains difficult even for advanced LLMs. Accuracy follows a clear trend: models predict broad categories such as chapters more reliably, and performance declines as the prediction moves toward fine-grained tariff items and statistical suffixes.
These findings suggest that fully autonomous, single-step AI classification is risky for customs compliance. Instead, the authors argue that the industry should shift toward workflows that prioritize evidence-based reasoning, uncertainty detection, and human collaboration to ensure that trade documentation remains accurate and legally sound.
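The coarse-to-fine trend in the findings can be measured by scoring predictions at each prefix length. The sketch below shows one plausible way to do that; the `prefix_accuracy` function and the 2/4/6/8/10 digit cut-offs are assumptions for illustration, and the codes in the example are made up, not results from the paper.

```python
def prefix_accuracy(predictions, labels, digits=(2, 4, 6, 8, 10)):
    """Share of predictions whose first `d` digits match the label, per `d`."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return {
        d: sum(p[:d] == t[:d] for p, t in zip(predictions, labels)) / len(labels)
        for d in digits
    }


# Made-up predictions scored against made-up labels, for illustration only.
predicted = ["8471300010", "8471300090", "8471410000", "8517120000"]
expected = ["8471300010", "8471300010", "8471300010", "8471300010"]
accuracy_by_level = prefix_accuracy(predicted, expected)
```

Because a match at a longer prefix implies a match at every shorter one, the per-level accuracy can only stay flat or fall as the prefix grows, which is the pattern the study reports from chapter level down to the statistical suffix.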
