LECTOR: Joint Optimization of Scientific Reasoning Graphs and Introduction Generation

Key Takeaways

  • AI Scientists have shown promising progress across multiple stages of the research pipeline, among which automatic scientific paper writing remains a formidable challenge.
  • Writing the Introduction is especially challenging: it demands not only linguistic fluency but also logical soundness and verifiable faithfulness.
  • Most AI-assisted methods treat the task as text generation rather than reasoning and structuring, which leads to severe drawbacks such as hallucinated citations.
  • To address this, we first formulate the Content-Conditional Introduction Generation (CCIG) task, which requires grounding the Introduction in the paper's core evidence.
Paper Abstract

AI Scientists have shown promising progress across multiple stages of the research pipeline, among which automatic scientific paper writing remains a formidable challenge. Writing the Introduction is especially challenging: it demands not only linguistic fluency but also logical soundness and verifiable faithfulness. Most AI-assisted methods treat the task as text generation rather than reasoning and structuring, which leads to severe drawbacks such as hallucinated citations. To address this, we first formulate the Content-Conditional Introduction Generation (CCIG) task, which requires grounding the Introduction in the paper's core evidence. We then propose LECTOR, a novel Logic-Expression Co-Reinforcement Learning framework that strictly follows the scientist's logic, adds high-quality citations, and keeps structured expressions. LECTOR first constructs a logic-reasoning graph from the paper's main body to serve as a verifiable logical blueprint. Subsequently, it employs a Logic-Expression Co-Rewarding mechanism to jointly optimize both the graph's structural fidelity and the final narrative's quality. We construct a dataset from Nature Communications papers to assess our method. Extensive experiments show consistent improvements in both logic fidelity and Introduction generation quality metrics, e.g., Graph Quality (+26.7%), Citation Quality (+8.6%), and Paper Consistency (+3.3%). Code and data are available at https://github.com/Xiao-Youth/LECTOR.

What it covers

LECTOR: Joint Optimization of Scientific Reasoning Graphs and Introduction Generation

Jiabei Xiao, Yizhou Wang, Chen Tang, Pengze Li, Wanli Ouyang, Shixiang Tang

Code and data are available at https://github.com/Xiao-Youth/LECTOR.

Keywords: Machine Learning, ICML

1 Introduction

Recent advances in large language models have enabled the development of AI Scientist systems, which aim to automate the scientific research process. This ambition is exemplified by the development of massive, closed-source models like OpenAI's Prism, which target scientific writing (OpenAI, 2026).
While such models may achieve high linguistic fluency, their "black-box" nature renders their internal reasoning processes unverifiable. In scientific writing, the Introduction section is particularly critical because it summarizes the entire research: a high-quality Introduction requires not only fluent language generation but also an accurate understanding of the research logic and a structured presentation of motivation, methodology, and contributions. Consequently, an AI's capacity to generate a logically sound and well-structured Introduction serves as a critical benchmark that distinguishes deep comprehension from superficial text generation.

Figure 1: Overview of the LECTOR framework and performance evaluation. (a) Existing methods treat introduction writing as a direct text generation task, often leading to logical inconsistencies and hallucinations. (b) LECTOR reformulates the task as Content-Conditional Introduction Generation (CCIG), first extracting a Reasoning Logic Graph as a verifiable logical blueprint to guide logic-aware writing. (c) Results show that our LECTOR-4B significantly outperforms the Qwen3-4B baseline. Notably, LECTOR-4B achieves superior Overall Performance compared to the state-of-the-art commercial closed-source model GPT-o3, validating the effectiveness of our logic-expression co-reinforcement learning approach.

Yet existing AI-assisted writing methods consistently fail to meet this benchmark. The core reason is that they treat Introduction writing as an ordinary text generation problem, when it is fundamentally a task of reasoning and structuring: the paper-level reasoning structure must first be abstracted from the technical content and only then transformed into a coherent high-level narrative. Most methods simply design writing prompts for general LLMs, a black-box approach that bypasses the crucial reasoning step entirely and leads to severe drawbacks that compromise academic integrity.
First, these methods can hallucinate citations, e.g., nonexistent publications or incorrect authorship. Second, and more critically, they fail to ensure logical consistency between the Introduction and the Results and Methodology that follow it. As a result, generated Introductions often exhibit logical inconsistencies, missing motivations, or misaligned contributions.

To systematically address these failures, we propose a new content-conditional introduction generation (CCIG) task, a more demanding formulation in which the model writes the Introduction section given the Methodology, Results, Analyses, and the citation list. To rigorously evaluate the logic, citations, and expression of the generated introduction, we design a set of metrics covering logic fidelity, expression fluency, and citation quality. Together, the task and our metrics provide a principled framework for developing and benchmarking models that generate introductions that are not only fluent but also logically self-contained.

To solve the CCIG task, we introduce LECTOR, a Logic-Expression Co-Reinforcement Learning framework. (Lector means "reader" in Latin, reflecting the model's goal of deep comprehension during writing.) The key innovations are two-fold. First, we leverage a logic-reasoning graph as a structured intermediate representation to regularize the logic of the generated introduction. In the logic-reasoning graph, nodes are self-contained sentences that represent information from the paper, while edges explicitly model the logical relationships that connect these claims into a coherent argument, guided by the three Peircean reasoning paradigms (Peirce, 1992). The graph acts as an explicit logical blueprint, forcing the model to first map out the paper's argumentative skeleton before generating any text.
Second, we propose a logic-expression co-rewarding mechanism, in which reward signals are computed from both the quality of the extracted reasoning structure and the generated Introduction, encouraging the model to align logic fidelity with writing quality. This joint optimization strategy enables mutual reinforcement between scientific understanding and structure-aware writing.

To validate the effectiveness of the proposed method, we construct a large-scale dataset of 10,200 scientific papers from Nature Communications, covering diverse physics-related domains and spanning publications from April 2010 to March 2025. Using this dataset, we evaluate our approach on the challenging task of logic-aware Introduction writing. LECTOR allows LLMs to bridge the gap between deep logical reasoning and high-quality narrative generation. To demonstrate this, we implement LECTOR on a 4B-parameter model, i.e., Qwen3-4B-Instruct-2507 (Yang et al., 2025), observing remarkable improvements across all metrics, including Graph Quality (+26.7%), Citation Quality (+8.6%), and Paper Consistency (+3.3%). Notably, the final performance of our lightweight model is comparable to that of strong commercial systems like GPT-o3 (OpenAI, 2025), while vastly outperforming its untrained baseline. This demonstrates that by explicitly modeling a paper's reasoning structure and jointly optimizing for logic and expression, our framework provides a more efficient path to high-fidelity scientific writing than relying on model scale alone.

Our contributions are summarized as follows:

(1) We introduce the content-conditional introduction generation (CCIG) task, a new and more rigorous task for scientific writing AI, which prioritizes verifiable logical fidelity over mere topical fluency.
(2) We propose LECTOR, a novel Logic-Expression Co-Reinforcement Learning framework designed to solve the CCIG task, which uses a logic-reasoning graph as an explicit intermediate representation and a co-rewarding mechanism to jointly optimize both structural logic and narrative quality.

(3) We construct a dataset using papers from Nature Communications and empirically validate that LECTOR improves both logic fidelity and Introduction generation quality.

2 Related Work

2.1 LLM-based Scientific Writing

Recent advances in Large Language Models (LLMs) show the potential for automating scientific writing. Although this raises many concerns about academic misconduct (Cheng and Zhang, 2025; Kwon, 2025), top-tier venues such as ICML, Nature and Science have opened the door to AI-assisted paper writing under rigorous regulations. One common use case is generating literature surveys. These methods generally leverage LLMs to automatically collect relevant papers and synthesize coherent survey articles, demonstrating the potential of LLMs in large-scale academic content generation (Wang et al., 2024b; Yan et al., 2025; Zhang et al., 2025). Beyond survey writing, recent AI Scientist systems (Lu et al., 2024; Yamada et al., 2025; Weng et al., 2025b; Yu et al., 2025; Tang et al., 2026; Weng et al., 2025a) extend this direction by generating complete academic papers in an end-to-end manner.

Despite the impressive progress in end-to-end paper generation, these systems often suffer from quality issues in writing, including logical inconsistency, unclear contribution positioning, and weak structural organization (Ivanov, 2025; Mezzadri, 2025). This indicates that directly generating papers without explicitly modeling research logic may limit the reliability and interpretability of the writing process (BaHammam, 2025; Knöchel et al., 2025), motivating a deeper investigation into how scientific writing should be guided by structured understanding.
The most relevant work is SciIG (Garg et al., 2025), which systematically benchmarks LLMs on the task of writing research paper Introductions, providing a detailed evaluation of writing quality across different models (Liu et al., 2024; Team et al., 2025). While this work offers valuable insights into the strengths and limitations of current LLMs in scientific writing, it primarily uses titles and abstracts to generate introductions. It therefore focuses on expression fluency and does not explicitly evaluate whether the generated text is grounded in a correct understanding of the underlying research logic. In contrast, our work leverages a reasoning-logic graph to regularize the flow of the introduction so that it follows the logic of the Results, Methodology, Analysis and Citations, and designs a logic-expression co-rewarding strategy to improve both the logic and the expression of the generated introduction.

2.2 Structured Representation for Scientific Documents

Structured representations of scientific documents capture the internal reasoning logic of scientific papers and benefit a variety of downstream understanding tasks. Open Research Knowledge Graph (ORKG) (Jaradeh et al., 2019) represents research contributions as semantic entities and relations to support scholarly comparison and retrieval. NLP-AKG constructs a large-scale academic knowledge graph for NLP by extracting fine-grained conceptual relations across papers, enabling structured semantic search and analysis (Lan et al., 2025). Contrastive Hierarchical Discourse Graph models scientific papers with hierarchical discourse structures for summarization (Zhang et al., 2023). These approaches demonstrate the effectiveness of structured representations for downstream tasks such as retrieval and summarization.

Recently, ARCHE (Li et al., 2026) introduced a benchmark for extracting latent reasoning chains from scientific papers, explicitly targeting the recovery of implicit reasoning structures and revealing the limitations of current LLMs in capturing formal reasoning processes. However, it focuses on reasoning extraction alone and does not connect structured reasoning representations to scientific writing. In contrast, our work leverages a reasoning logic graph to explicitly model research-level logical structure and directly uses it to guide Introduction generation.

2.3 Benchmarks for Paper Understanding

Early benchmarks mainly evaluate information-seeking question answering over scientific documents. PubMedQA focuses on biomedical literature QA (Jin et al., 2019), while QASPER extends this setting to expert-authored questions requiring multi-section evidence aggregation (Dasigi et al., 2021). Recent datasets target deeper document-level comprehension. SciDQA emphasizes cross-section reasoning for scientific reading comprehension (Singh et al., 2024). With the emergence of long-context models, LongBench and LongBench v2 benchmark LLMs on realistic long-document understanding tasks, including scientific papers (Bai et al., 2024, 2025). However, existing benchmarks primarily assess local comprehension, retrieval, or long-context reading. In contrast, our work evaluates research-level understanding by explicitly modeling scientific reasoning logic and assessing it through structure-guided Introduction generation.

3 Methodology

Figure 2: The overall architecture of LECTOR. The framework operates in two synergistic stages within a single rollout. (Top) Reasoning Logic Graph Extraction: given the main body of a scientific research article, including Methods ($\mathcal{M}$), Results ($\mathcal{R}$), and Discussion ($\mathcal{D}$) but excluding the Introduction, LECTOR extracts an explicit Reasoning Logic Graph.
This graph consists of nodes connected through deduction, abduction, and induction to derive the paper's core idea. (Bottom) Logic-Aware Introduction Writing: taking the extracted graph and a citation list $\mathcal{C}$ as input sources, the model generates a structured introduction following the CARS (Create a Research Space) move structures (e.g., Establishing a Territory/Niche). Optimization: both stages share weights and are jointly optimized through a Logic-Expression Co-Rewarding mechanism. By rigorously evaluating Graph Quality and Graph-Writing Alignment alongside Writing Quality and Citation Quality, LECTOR ensures that the high-quality reasoning logic graph effectively grounds the final introduction to be logically sound, verifiably faithful, and narratively fluent.

While Large Language Models (LLMs) can generate fluent and plausible scientific introductions, their outputs often lack deep logical coherence and verifiable fidelity to the core research narrative. This limitation arises because conventional training paradigms optimize for coherence over surface-level textual patterns, rather than explicitly modeling the underlying logic graph of scientific arguments. Existing introduction generation tasks ask the LLM to write the introduction based on the title, abstract and citations, which contain only high-level information. We consider such a setting to contradict the real academic writing scenario, where the introduction section is summarized from more detailed parts such as the results, methodology, analysis and citations. We therefore propose a Content-Conditional Introduction Generation (CCIG) task (Sec. 3.1) and teach an LLM to generate a coherent and logically faithful Introduction for scientific papers given the results, methodology, and analysis sections and the citation list. To build a simple baseline, we propose LECTOR, a Logic-Expression Co-Reinforcement Learning framework, in which a logic-reasoning graph (Sec. 3.2) serves as a versatile intermediate representation to enhance the logic of the generated introduction, and a logic-expression co-rewarding mechanism (Sec. 3.3) is designed to jointly optimize logic fidelity and narrative quality.

3.1 Content-Conditional Introduction Generation Task

Unlike existing formulations that use the title, abstract, and citation list to generate the introduction, we propose a more realistic setting: generating the introduction section from more detailed experimental information. Specifically, given a scientific paper $\mathcal{P}$, we define its main body $\mathcal{B}$ as the content excluding the Introduction $\mathcal{I}$. The main body, which comprises the Methods $\mathcal{M}$, Results $\mathcal{R}$, Analyses $\mathcal{A}$, and Citations $\mathcal{C}$, serves as the detailed, low-level evidence that substantiates the claims of the paper. The content-conditional introduction generation (CCIG) task requires the model to map this evidence to the high-level narrative of the Introduction:

$$\mathcal{I}=\mathcal{F}(\mathcal{M},\mathcal{R},\mathcal{A},\mathcal{C}), \tag{1}$$

where $\mathcal{I}$ is the introduction section of the paper, $\mathcal{M}$ is the method section, $\mathcal{R}$ is the result section, $\mathcal{A}$ is the analysis, and $\mathcal{C}$ is the citation list.

Our task formulation differs fundamentally from concurrent work like SciIG (Garg et al., 2025), which generates an Introduction conditioned on high-level summaries (e.g., Title, Abstract) and external context (e.g., Related Papers). While SciIG's setup prioritizes topical relevance and summary, our CCIG task focuses on logic grounding and writing quality.
By conditioning on detailed information in the main body, e.g., the method, results, analysis and citation list, we create a more realistic scenario that requires a model to ground its narrative in the specific methods and findings of the research paper, rather than summarizing high-level concepts.

3.2 Logic-reasoning Graph as an Intermediate Representation

Table 1: Definitions of the Six Reasoning Edge Types.

| Paradigm  | Edge Type            | Role: represents a premise that is…                                       |
|-----------|----------------------|---------------------------------------------------------------------------|
| Deduction | deduction-rule       | A general principle, law, or established rule.                            |
| Deduction | deduction-case       | A specific instance or case that falls under that general rule.           |
| Induction | induction-common     | A general pattern or commonality abstracted across multiple observations. |
| Induction | induction-case       | An individual observation or piece of evidence supporting the pattern.    |
| Abduction | abduction-phenomenon | An observation or phenomenon that requires an explanation.                |
| Abduction | abduction-knowledge  | Background knowledge that offers the best explanation for the phenomenon. |

The direct, end-to-end approach (i.e., LLM supervised finetuning) to content-conditional introduction generation is complex and ill-posed. It forces a model to simultaneously comprehend a long, unstructured body of text and compose a logically coherent narrative, leading to two critical problems. First, it lacks guidance for the LLM to focus on the pivotal elements of the research. Second, it lacks an explicit mechanism to enforce logical consistency. To overcome these issues, we argue that an intermediate representation that regularizes the writing logic is not only beneficial, but essential. An effective representation must possess two properties: (1) Compactness, which helps distinguish core arguments from the verbose details of the main body, and (2) Structuring, which helps synthesize concepts and articulate the logical connections among them.
We therefore consider the logic-reasoning graph to be the ideal intermediate representation for scientific introduction writing. Unlike unstructured summaries, a graph explicitly models the foundational components of scientific reasoning. By converting the unstructured main body $\mathcal{B}$ into a structured logic graph $\mathcal{G}$, we transform the task from a complex, implicit reasoning problem into a more tractable, logic-guided generation problem.

3.2.1 Reasoning Logic Graph Extraction

From the main body $\mathcal{B}=(\mathcal{M},\mathcal{R},\mathcal{A},\mathcal{C})$, the model constructs a reasoning logic graph $\mathcal{G}$, which serves as an explicit, formalized representation of the relationships among research problems, methods, experiments, and findings:

$$\mathcal{G}=\mathcal{F}(\mathcal{B})=\mathcal{F}(\mathcal{M},\mathcal{R},\mathcal{A}), \tag{2}$$

where $\mathcal{G}$ is the logic-reasoning graph defined below. $\mathcal{F}$ is initialized from Qwen3-4B, prompted with descriptions of the reasoning-logic graph $\mathcal{G}$, and finetuned by reinforcement learning with Logic-Expression Co-Rewarding (Sec. 3.3). To force the model to concentrate on the underlying logic of the paper, we omit bibliographic details, i.e., the Citations $\mathcal{C}$, which is why $\mathcal{C}$ does not appear on the right-hand side of Eq. (2).

Definition of Reasoning Logic Graph $\mathcal{G}$. Our graph formalism is inspired by the philosophical work of Charles S. Peirce (Peirce, 1992), who categorized all valid reasoning as deductive, inductive, or abductive, or combinations thereof. A Reasoning Logic Graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ is a single-rooted directed acyclic graph whose nodes $\mathcal{V}$ are complete, self-contained sentences, each representing an atomic unit of information extracted from the paper, i.e., a scientific claim, an experimental finding, a piece of background knowledge, or an opinion or statement derived from referenced work.
The edges of the graph $\mathcal{E}$ are designed to model the three Peircean reasoning paradigms. Each logical inference is formed by a specific pair of premise edges pointing to a single conclusion, ensuring that every reasoning step is well-founded and traceable. The roles of the six defined edge types are detailed in Table 1.

Discussion. While our reasoning graph shares a motivational origin with the one used in the ARCHE benchmark (Li et al., 2026), their purposes are fundamentally different. ARCHE employs its graph as a final output for evaluating an LLM's reasoning capabilities. In contrast, we use the reasoning graph as an intermediate representation designed to guide the learning of content-conditional introduction generation.

3.2.2 Logic-aware Introduction Writing

To produce a coherent and academically polished introduction, we borrow the guidelines from the influential CARS (Create A Research Space) framework (Swales, 1990). This framework posits that a good introduction consists of three moves:

(1) The first move is to establish the territory: describing the broader research area and its importance, and summarizing the relevant background knowledge represented in the logic-reasoning graph $\mathcal{G}$.

(2) The second move is to build the niche: identifying gaps, unresolved issues, or limitations suggested by the logic-reasoning graph $\mathcal{G}$.

(3) The last move is to present the central research idea corresponding to the root node of the graph, showing how it logically follows from the preceding reasoning steps.
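The graph formalism of Sec. 3.2.1, on which the three moves rely, can be illustrated with a small data structure. The following is a minimal sketch, not the authors' implementation: the names `ReasoningGraph`, `add_inference`, and `PARADIGM_PAIRS` are hypothetical, and the sketch assumes that edges point from premise to conclusion, so the root (the paper's core idea) is the only node never used as a premise.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Premise-edge pairs for each Peircean paradigm, following Table 1.
PARADIGM_PAIRS = {
    "deduction": ("deduction-rule", "deduction-case"),
    "induction": ("induction-common", "induction-case"),
    "abduction": ("abduction-phenomenon", "abduction-knowledge"),
}
EDGE_TYPES = {t for pair in PARADIGM_PAIRS.values() for t in pair}


@dataclass
class ReasoningGraph:
    """Hypothetical container: nodes are sentences, edges are typed premises."""
    nodes: dict[int, str] = field(default_factory=dict)
    # Each edge is (premise_id, conclusion_id, edge_type).
    edges: list[tuple[int, int, str]] = field(default_factory=list)

    def add_inference(self, paradigm, premise_a, premise_b, conclusion):
        """Add one inference as a pair of premise edges into one conclusion."""
        type_a, type_b = PARADIGM_PAIRS[paradigm]
        self.edges.append((premise_a, conclusion, type_a))
        self.edges.append((premise_b, conclusion, type_b))

    def roots(self):
        """Nodes that never act as a premise (assumed root convention)."""
        premises = {p for p, _, _ in self.edges}
        return [n for n in self.nodes if n not in premises]

    def is_valid(self):
        """Check known nodes and edge types, acyclicity, and a single root."""
        for p, c, t in self.edges:
            if p not in self.nodes or c not in self.nodes or t not in EDGE_TYPES:
                return False
        # Kahn's algorithm: a DAG lets us peel off every node in order.
        indegree = {n: 0 for n in self.nodes}
        successors = defaultdict(list)
        for p, c, _ in self.edges:
            indegree[c] += 1
            successors[p].append(c)
        queue = [n for n, d in indegree.items() if d == 0]
        visited = 0
        while queue:
            node = queue.pop()
            visited += 1
            for nxt in successors[node]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    queue.append(nxt)
        return visited == len(self.nodes) and len(self.roots()) == 1


graph = ReasoningGraph(nodes={0: "Core idea.", 1: "General rule.", 2: "Specific case."})
graph.add_inference("deduction", premise_a=1, premise_b=2, conclusion=0)
print(graph.is_valid(), graph.roots())
```

Storing each inference as two typed edges keeps the "pair of premises, one conclusion" constraint visible in the data, which is what makes each reasoning step traceable in the sense described above.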
Using the three CARS moves as prompting guidelines, we leverage a learnable large language model $\mathcal{F}$ to generate the introduction $\mathcal{I}$, i.e.,

$$\mathcal{I}=\mathcal{F}(\mathcal{G},\mathcal{C}), \tag{3}$$

where $\mathcal{I}$ is the generated introduction, $\mathcal{C}$ is the citation list, and $\mathcal{F}$ is the large language model trained with the reinforcement learning of Sec. 3.3. The citation list $\mathcal{C}$ provides all bibliographic entries, each associated with a unique index. The model is constrained to rely exclusively on $\mathcal{G}$ for the scientific narrative and on $\mathcal{C}$ for sourcing citations, which must be inserted using the required [idx] format.

3.3 Logic-Expression Co-Reinforcement Learning

While the task naturally decomposes into two stages, (i) Reasoning Logic Graph Extraction and (ii) Logic-aware Introduction Writing, training these components in isolation poses significant challenges. A disjoint two-stage training paradigm not only requires expensive annotation of intermediate graph representations (logic graphs aligned with specific introductions) but also risks catastrophic forgetting of the previous stage. Furthermore, independent optimization ignores the dependency between the two tasks, leading to error propagation.

We therefore propose a joint learning framework inspired by the Information Bottleneck (IB) principle (Tishby and Zaslavsky, 2015; Tishby et al., 2000). We treat the extracted reasoning logic graph $\mathcal{G}$ not merely as an intermediate output, but as a compressed semantic bottleneck that distills, from the paper body $\mathcal{B}=(\mathcal{M},\mathcal{R},\mathcal{A},\mathcal{C})$, the essential information required to reconstruct the introduction $\mathcal{I}$, where $\mathcal{M}$, $\mathcal{R}$, $\mathcal{A}$, and $\mathcal{C}$ are the methodology, result, analysis and citation sections of the paper, respectively.
Instead of being supervised by static labels, we cast the entire trajectory $\mathcal{B}\to\mathcal{G}\to\mathcal{I}$ as a single unified episode within an RL paradigm. The model is trained to maximize a set of carefully designed fine-grained rewards. These rewards evaluate the quality of the final generated Introduction, providing feedback that propagates back to optimize the generation and extraction policies simultaneously.

3.3.1 Simplified Reinforcement Learning

Drawing inspiration from the efficiency of Group Relative Policy Optimization (GRPO) (Shao et al., 2024), we propose a Simplified PPO architecture tailored for our joint extraction-generation task. To reduce memory overhead and training cost, we streamline the system to encompass only two active components: a Policy Model (Actor, $\pi_{\theta}$) and a Value Model (Critic, $V_{\phi}$). Specifically, we discard the learned Reward Model, as the quality of logic extraction and text generation in our domain can be evaluated through deterministic rules rather than black-box predictions. Consequently, the training objective is driven by a verifiable reward function. Let a trajectory be $\tau=(\mathcal{B},\mathcal{G},\mathcal{I})$. The optimization objective is defined as:

$$\mathcal{L}^{CLIP}(\theta)=\mathbb{E}_{\tau\sim\pi_{\theta}}\left[\min\left(\rho_{t}(\theta)\hat{A}_{t},\ \text{clip}\left(\rho_{t}(\theta),1-\epsilon,1+\epsilon\right)\hat{A}_{t}\right)\right], \tag{4}$$

where $\rho_{t}(\theta)=\frac{\pi_{\theta}(a_{t}|s_{t})}{\pi_{\theta_{old}}(a_{t}|s_{t})}$ is the probability ratio, and $\hat{A}_{t}$ is the advantage estimated by the Critic $V_{\phi}$.
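To make Eq. (4) concrete, the sketch below evaluates the clipped surrogate for a handful of tokens using only the standard library. It is an illustration under stated assumptions rather than the training code: the function name `clipped_surrogate` is hypothetical, the expectation is approximated by a plain mean over the supplied tokens, and the default $\epsilon=0.2$ is a commonly used PPO value that the text above does not specify.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Mean of min(rho * A, clip(rho, 1 - eps, 1 + eps) * A) over tokens.

    logp_new / logp_old: per-token log-probabilities under the current and
    old policies; advantages: per-token advantage estimates (A-hat).
    eps=0.2 is an assumed default, not a value taken from the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        rho = math.exp(lp_new - lp_old)                # probability ratio
        clipped = max(1.0 - eps, min(rho, 1.0 + eps))  # clip(rho, 1-eps, 1+eps)
        total += min(rho * adv, clipped * adv)         # pessimistic bound
    return total / len(advantages)


# Ratio 1.5 with a positive advantage: the gain is capped at (1 + eps) * A.
print(clipped_surrogate([math.log(0.6)], [math.log(0.4)], [1.0]))
# Ratio 0.5 with a negative advantage: the clipped (more negative) term is kept.
print(clipped_surrogate([math.log(0.2)], [math.log(0.4)], [-1.0]))
```

Taking the minimum of the two terms means the policy gets no extra credit for moving the ratio beyond the clip range in the direction the advantage favors, which is what keeps each update close to the old policy.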
Crucially, the advantage calculation relies on our verifiable reward $R(\tau)$, which is formulated as a weighted sum of feedback signals:

$$R(\tau)=R_{\text{graph}}(\mathcal{G})+R_{\text{faith}}(\mathcal{G},\mathcal{I})+R_{\text{consis}}(\mathcal{I})+R_{\text{qual}}(\mathcal{I})+R_{\text{ref}}(\mathcal{I}), \tag{5}$$

where $\mathcal{G}$ and $\mathcal{I}$ denote the extracted reasoning graph and the generated introduction within the trajectory $\tau$, respectively. Specifically, $R_{\text{graph}}$ acts as a structural regularizer for the intermediate representation, $R_{\text{faith}}$ penalizes hallucinations to ensure the text strictly follows the graph, $R_{\text{ref}}$ provides supervised guidance from the ground truth, and $R_{\text{qual}}$ enforces high-level academic writing standards. The detailed design of the rewards is as follows.

Table 2: Main experimental results comparing our method against strong proprietary LLMs and baselines. GQ: Graph Quality, GW: Graph-Writing Alignment, PC: Paper Consistency, WQ: Writing Quality, CQ: Citation Quality, OP: Overall Performance. Higher is better for all metrics (↑). The One-Step-Baseline lacks intermediate graph generation, hence GQ and GW are not applicable.

| Model                                  | GQ ↑  | GW ↑  | PC ↑  | WQ ↑  | CQ ↑  | OP ↑  |
|----------------------------------------|-------|-------|-------|-------|-------|-------|
| Proprietary SOTA Models                |       |       |       |       |       |       |
| GLM-4.7 (GLM-4.7Team, 2025)            | 0.123 | 0.088 | 0.428 | 0.734 | 0.739 | 0.466 |
| Gemini-2.5pro (Comanici et al., 2025)  | 0.357 | 0.262 | 0.458 | 0.814 | 0.710 | 0.566 |
| Grok4 (Grok4Team, 2025)                | 0.651 | 0.601 | 0.464 | 0.691 | 0.721 | 0.599 |
| Claude-haiku-4.5 (Anthropic, 2025)     | 0.707 | 0.727 | 0.470 | 0.699 | 0.529 | 0.612 |
| GPT-o3 (OpenAI, 2025)                  | 0.690 | 0.448 | 0.416 | 0.882 | 0.691 | 0.656 |
| Our Framework (Backbone: Qwen3-4B)     |       |       |       |       |       |       |
| Qwen3-4B-Instruct-2507 (Base)          | 0.478 | 0.682 | 0.453 | 0.546 | 0.444 | 0.510 |
| One-Step-Baseline                      | –     | –     | 0.476 | 0.829 | 0.477 | –     |
| LECTOR (Our Method)                    | 0.745 | 0.623 | 0.486 | 0.834 | 0.530 | 0.665 |

3.3.2 Verifiable Reward Modeling

To steer the model toward generating logically sound and rigorous introductions, we design a composite reward function $R(\tau)$ aggregated from five distinct dimensions: graph validity, generation faithfulness, paper consistency, academic quality, and reference alignment.

Graph Validity ($R_{\text{graph}}$) ensures the intermediate reasoning graph $\mathcal{G}$ is topologically valid and informative. We employ Reasoning Edge Accuracy, where an LLM verifier checks if the premise node logically supports the conclusion node for each edge, defining the reward as the ratio of validated edges. Additionally, we compute Entity Coverage by measuring the overlap between entities in $\mathcal{G}$ and key concepts extracted from the ground-truth introduction $\mathcal{I}^{*}$, encouraging the graph to capture essential research concepts.

Faithfulness Rewards ($R_{\text{faith}}$). Since $\mathcal{I}$ is generated solely from $\mathcal{G}$, strict adherence to the graph's semantics is critical to prevent hallucination. We enforce this via Bidirectional Coverage, which penalizes ungrounded content by calculating the semantic overlap of key phrases between $\mathcal{G}$ and $\mathcal{I}$. Furthermore, we assess Entailment Faithfulness using the SummaC model to compute NLI scores (treating the linearized gr
