A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions

Key Takeaways

End-to-end prompting of large language models fails characteristically by resolving one axis while ignoring the priority constraints on the others.
This design yields interpretability by construction--each decision is decomposed into stage-wise structured outputs with verbatim citation of the chapter or section notes that bear on it.
The architecture combines offline knowledge-engineering of the Chinese HS tariff with an online six-stage pipeline.
A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions

Paper Abstract

Harmonized System (HS) tariff classification is a high-stakes, expert-level task in which a free-form product description must be mapped to a specific six- or eight-digit code under the General Interpretive Rules (GIR), section notes, chapter notes, and Explanatory Notes. The difficulty lies not in knowledge volume but in *multi-dimensional rule reasoning*: a correct classification must satisfy competing priority rules along several axes simultaneously, including material, form, function, essential character, the part-versus-whole boundary, and specific listing versus residual headings. End-to-end prompting of large language models fails characteristically by resolving one axis while ignoring the priority constraints on the others. We present a *deterministic agentic workflow* in contrast to self-planning agents: the control flow is fixed, language model calls are confined to narrow stages, and reflection and verification are retained as local mechanisms. This design yields interpretability by construction--each decision is decomposed into stage-wise structured outputs with verbatim citation of the chapter or section notes that bear on it. The architecture combines offline knowledge-engineering of the Chinese HS tariff with an online six-stage pipeline. Evaluated on HSCodeComp at the six-digit level, the workflow reaches 75.0% top-1 and 91.5% top-3 at four digits, and 64.2% top-1 and 78.3% top-3 at six digits with Qwen3.6-plus; an open-weight Qwen3.6-27B-FP8 backbone in non-thinking mode achieves 84.2% four-digit and 77.4% six-digit top-1 agreement with the frontier model. A two-stage manual audit of 226 six-digit disagreements suggests that a non-trivial fraction of HSCodeComp ground-truth labels may deviate from HS general rules; full adjudication records are released in the appendix as preliminary findings for community review.

The Harmonized System (HS) is the global standard for classifying traded goods, but assigning the correct code is a complex, high-stakes task. Experts must navigate competing rules regarding a product's material, form, and function, often while adhering to strict legal notes. This paper addresses why standard AI models struggle with this task: they often focus on one aspect of a product while ignoring the complex, hierarchical priority rules required by customs law. The authors introduce a "deterministic agentic workflow" that replaces open-ended AI planning with a fixed, step-by-step process to ensure accurate and legally defensible classifications.

Why Standard AI Models Fail

The primary challenge in HS classification is "multi-dimensional rule reasoning." A product might be made of plastic (material), be in the form of a film (form), and be used for a phone screen (function). Customs rules often dictate that one of these factors must take priority over the others. When large language models are asked to classify a product in one go, they frequently resolve one dimension correctly but ignore the priority constraints of the others. Furthermore, these models often lack access to the specific, structured legal text required to make a correct decision, leading them to fabricate codes that do not exist.
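The idea that one dimension must take priority over the others can be illustrated with a very small sketch. Everything here is an assumption made for illustration: the axis names, the placeholder heading labels, and above all the priority order, which in practice comes from the legal notes and is far richer than a first-match list. It is not the authors' method.

```python
from typing import Optional


def resolve_by_priority(
    axis_votes: dict[str, Optional[str]], priority: list[str]
) -> tuple[str, str]:
    """Return (heading, deciding_axis) from the highest-priority axis with a vote."""
    for axis in priority:
        heading = axis_votes.get(axis)
        if heading is not None:
            return heading, axis
    raise ValueError("no axis produced a candidate heading")


# Each axis, taken alone, points at a different (placeholder) heading.
votes = {
    "material": "heading-for-plastics",
    "form": "heading-for-film",
    "function": "heading-for-phone-parts",
}

# Hypothetical priority order, chosen only to show the mechanism.
heading, axis = resolve_by_priority(votes, ["form", "material", "function"])
print(heading, "decided by", axis)

# A single-axis shortcut gives a different answer because it ignores priority.
print(votes["material"])
```

The point of the sketch is only that the answer depends on the ordering between axes, not on any one axis being resolved correctly in isolation.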

A Fixed, Step-by-Step Workflow

Instead of letting the model plan its own path, the authors built a fixed six-stage pipeline that mirrors the structure of the HS tariff itself. The process begins by extracting key product attributes, then moves through candidate retrieval, shortlisting, and deep ranking against the relevant chapter and section notes. Because the control flow is fixed and language model calls are confined to narrow stages, with reflection and verification kept as local mechanisms, the model works from the chapter level down to the subheading and each decision is made in its proper legal context. This makes the reasoning "interpretable by construction": every stage emits a structured output that cites, verbatim, the chapter or section notes bearing on the decision.
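A fixed control flow of this kind can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: only the four stages named above are included, each language model call is replaced by a keyword stub, and the tariff table, note texts, and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Toy knowledge base: heading -> (short description, placeholder note text).
# Illustrative entries only, not an authoritative tariff extract.
TOY_TARIFF = {
    "3919": ("self-adhesive film of plastics", "placeholder note A"),
    "3920": ("other film of plastics", "placeholder note B"),
    "8517": ("telephone sets and parts", "placeholder note C"),
}


@dataclass
class State:
    description: str
    attributes: dict = field(default_factory=dict)
    candidates: list = field(default_factory=list)
    ranking: list = field(default_factory=list)
    trace: list = field(default_factory=list)  # stage-wise structured outputs


def extract_attributes(s: State) -> dict:
    # Stand-in for a narrow language model call: keep words longer than 3 chars.
    words = s.description.lower().split()
    s.attributes = {"keywords": [w for w in words if len(w) > 3]}
    return s.attributes


def retrieve_candidates(s: State) -> list:
    # Score each heading by how many keywords occur in its description.
    kws = s.attributes["keywords"]
    scored = [(h, sum(k in desc for k in kws)) for h, (desc, _) in TOY_TARIFF.items()]
    s.candidates = [(h, sc) for h, sc in scored if sc > 0]
    return s.candidates


def shortlist(s: State, k: int = 2) -> list:
    s.candidates = sorted(s.candidates, key=lambda c: -c[1])[:k]
    return s.candidates


def deep_rank(s: State) -> list:
    # Attach the note text verbatim so each ranked item carries its citation.
    s.ranking = [
        {"heading": h, "score": sc, "cited_note": TOY_TARIFF[h][1]}
        for h, sc in s.candidates
    ]
    return s.ranking


# The order is fixed here; no stage decides which stage runs next.
STAGES: list[tuple[str, Callable[[State], object]]] = [
    ("extract_attributes", extract_attributes),
    ("retrieve_candidates", retrieve_candidates),
    ("shortlist", shortlist),
    ("deep_rank", deep_rank),
]


def classify(description: str) -> State:
    s = State(description)
    for name, stage in STAGES:
        s.trace.append({"stage": name, "output": stage(s)})
    return s


result = classify("Self-adhesive plastic film for phone screen")
for step in result.trace:
    print(step["stage"], "->", step["output"])
```

The trace list is what makes each run inspectable: every stage leaves a structured record, and the final ranking carries the note text it relied on. Which heading the toy scoring ranks first says nothing about the legally correct classification.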

Performance and Accuracy

The researchers evaluated the workflow on the HSCodeComp benchmark. With Qwen3.6-plus as the backbone, it reached 75.0% top-1 and 91.5% top-3 accuracy at the four-digit level, and 64.2% top-1 and 78.3% top-3 at the six-digit level. An open-weight Qwen3.6-27B-FP8 backbone, run in non-thinking mode, agreed with the frontier model's top-1 prediction in 84.2% of cases at four digits and 77.4% at six digits. This suggests that much of the system's accuracy comes from the structured, deterministic workflow rather than from the raw reasoning power of a single very large model.
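For readers unfamiliar with these metrics, the sketch below shows one way top-k accuracy at a given digit level and top-1 agreement between two models could be computed. It assumes codes are plain digit strings and that a hit means the gold code's prefix matches the prefix of any of the first k ranked predictions; the benchmark's exact scoring rules may differ. The function names and the code strings are hypothetical and exist only to exercise the prefix logic.

```python
def topk_accuracy(predictions: list[list[str]], gold: list[str], k: int, digits: int) -> float:
    """Share of items whose gold prefix appears among the first k predicted prefixes."""
    hits = 0
    for ranked, truth in zip(predictions, gold):
        prefixes = [code[:digits] for code in ranked[:k]]
        if truth[:digits] in prefixes:
            hits += 1
    return hits / len(gold)


def top1_agreement(model_a: list[str], model_b: list[str], digits: int) -> float:
    """Share of items on which two models' top-1 codes share the same prefix."""
    same = sum(a[:digits] == b[:digits] for a, b in zip(model_a, model_b))
    return same / len(model_a)


# Toy data: ranked predictions per item, and one gold code per item.
predictions = [["391910", "392010"], ["851770", "391990"], ["392010", "391910"]]
gold = ["391990", "851770", "401110"]

print(topk_accuracy(predictions, gold, k=1, digits=4))
print(topk_accuracy(predictions, gold, k=1, digits=6))
print(top1_agreement(["391910", "851770"], ["391990", "851770"], digits=6))
```

Four-digit scores are at least as high as six-digit scores under this definition, since any six-digit match is also a four-digit match.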

Insights from Manual Audits

A notable finding came from a two-stage manual audit of 226 six-digit cases in which the system disagreed with the benchmark's ground-truth labels. The audit suggests that a non-trivial fraction of those labels may deviate from the HS general rules. The authors present this as a preliminary finding and release the full adjudication records in the appendix for community review, a reminder that even expert-level benchmarks can contain errors that call for careful, rule-based verification.