
Key Takeaways

  • Student-generated drawings are widely used in science education to assess learners' conceptual understanding in modeling-based tasks aligned with the Next Generation Science Standards (NGSS).
  • However, scoring such drawings requires expert human judgment to interpret complex visual representations, making large-scale assessment costly to implement and sustain in classroom settings.
  • In this work, we study automated scoring of student-generated scientific drawings using a vision-based model.
  • We evaluate a Vision Transformer (ViT) with parameter-efficient adaptation and propose a confidence-aware scoring framework that derives response-level confidence from test-time predictive distributions.
Paper Abstract

Student-generated drawings are widely used in science education to assess learners' conceptual understanding in modeling-based tasks aligned with the Next Generation Science Standards (NGSS). However, scoring such drawings requires expert human judgment to interpret complex visual representations, making large-scale assessment costly to implement and sustain in classroom settings. In this work, we study automated scoring of student-generated scientific drawings using a vision-based model. We evaluate a Vision Transformer (ViT) with parameter-efficient adaptation and propose a confidence-aware scoring framework that derives response-level confidence from test-time predictive distributions. This confidence signal enables selective automation by scoring high-confidence responses automatically while deferring uncertain cases for human review. Experiments on six NGSS-aligned middle school assessment items show that the proposed approach improves scoring reliability while supporting a practical trade-off between automated coverage and scoring risk, highlighting the value of confidence-aware methods for trustworthy educational assessment.

Confidence-Aware Automated Assessment of Student-Drawn Scientific Models
In science education, students often create drawings to demonstrate their understanding of complex concepts, such as how particles behave at different temperatures. These drawings are valuable for assessment, but they are difficult to grade at scale because scoring them requires time-consuming, expert human judgment. This paper introduces a framework that scores the drawings automatically and attaches a confidence score to each prediction. The system grades clear, high-confidence responses on its own and flags ambiguous or complex drawings for human review, so that automated assessment stays both efficient and reliable.

How the System Works

The researchers used a Vision Transformer (ViT), a neural network architecture that processes an image as a sequence of patches. To keep adaptation inexpensive, they applied Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning technique in which the pretrained weights stay frozen and only small low-rank update matrices are trained. The model can therefore learn to score scientific drawings without full retraining, which substantially reduces the compute and memory required.
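To make the savings from LoRA concrete, the sketch below compares trainable parameter counts for one weight matrix and shows the general form of a LoRA forward pass, y = Wx + (alpha / r) * B(Ax). It is a pure-Python illustration, not the authors' implementation: the rank of 8, the scaling factor alpha, and the function names are assumptions, and 768 is used only because it is the hidden size of a standard ViT-Base model.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for full fine-tuning vs. a LoRA update of one matrix."""
    full = d_in * d_out            # every entry of W is trained
    lora = rank * (d_in + d_out)   # only A (rank x d_in) and B (d_out x rank) are trained
    return full, lora


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Frozen base projection plus a scaled low-rank correction."""
    base = matvec(W, x)                 # W is frozen during adaptation
    update = matvec(B, matvec(A, x))    # low-rank path: x -> A x -> B (A x)
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    # Assumed sizes: 768-dimensional projection (ViT-Base), hypothetical rank 8.
    full, lora = lora_param_counts(768, 768, 8)
    print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.2%}")
```

With these assumed sizes, the LoRA update trains 12,288 parameters per matrix instead of 589,824. Initializing B to zeros makes the adapted layer start out identical to the pretrained one, which is the usual LoRA convention.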
The core innovation is the "confidence-aware" component. When the model evaluates a drawing, it creates several slightly altered versions of the image, such as different crops or rotations, and checks whether its predicted score stays consistent across them. If the predictions agree, the system treats the score as high-confidence and assigns it automatically. If they diverge, the system treats the response as uncertain and defers the final decision to a human teacher.
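The consistency check described above can be sketched as follows. This is one plausible reading, not the paper's exact formula: it assumes the model returns a probability distribution over score levels for each augmented view, takes the mean distribution as the prediction, and uses the mean probability of the winning score as the confidence. The function names and the 0.8 threshold are hypothetical.

```python
def tta_confidence(view_probs):
    """Aggregate per-view score distributions into (label, confidence, agreement).

    view_probs: one probability distribution per augmented view of a drawing.
    """
    n_views = len(view_probs)
    n_classes = len(view_probs[0])
    # Average the predicted distributions across all augmented views.
    mean_probs = [sum(p[c] for p in view_probs) / n_views for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: mean_probs[c])
    # Fraction of views whose own top prediction matches the aggregated label.
    votes = [max(range(n_classes), key=lambda c: p[c]) for p in view_probs]
    agreement = votes.count(label) / n_views
    return label, mean_probs[label], agreement


def route(confidence, threshold=0.8):
    """Score automatically above the (assumed) threshold, otherwise defer."""
    return "auto" if confidence >= threshold else "human_review"


if __name__ == "__main__":
    clear = [[0.05, 0.90, 0.05], [0.10, 0.85, 0.05], [0.05, 0.95, 0.00]]
    ambiguous = [[0.5, 0.4, 0.1], [0.3, 0.6, 0.1], [0.4, 0.5, 0.1]]
    for name, views in [("clear", clear), ("ambiguous", ambiguous)]:
        label, conf, agree = tta_confidence(views)
        print(name, label, round(conf, 3), round(agree, 3), route(conf))
```

In this toy input the views of the clear drawing all favor the same score, so it is routed to automatic scoring, while the ambiguous drawing's views disagree and it is deferred to a human.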

Improving Scoring Reliability

To test the framework, the researchers applied it to six NGSS-aligned middle school science assessment items. The confidence-aware approach outperformed the standard models it was compared against, particularly on Cohen’s kappa, a chance-corrected measure of how closely the AI’s scores agree with those of human experts. By filtering out unreliable predictions through a process called "selective trust," the system lowers the risk of incorrect automated grades, at the cost of scoring fewer responses automatically. This makes the AI a more trustworthy partner for teachers who need to manage large volumes of student work.
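The two quantities discussed above, agreement with human scorers and the trade-off between automated coverage and scoring risk, can be computed as in the following sketch. It uses the standard unweighted form of Cohen's kappa, (p_o - p_e) / (1 - p_e); whether the paper uses a weighted variant is not stated here, so treat that choice as an assumption. The scores, confidences, and threshold are made-up illustrative data, not results from the study.

```python
from collections import Counter


def cohens_kappa(human, model):
    """Unweighted Cohen's kappa between two lists of categorical scores."""
    n = len(human)
    p_o = sum(h == m for h, m in zip(human, model)) / n   # observed agreement
    h_counts, m_counts = Counter(human), Counter(model)
    # Agreement expected by chance from each rater's marginal score frequencies.
    p_e = sum(h_counts[k] * m_counts[k] for k in h_counts) / (n * n)
    if p_e == 1.0:   # both raters always give the same single score
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def coverage_and_risk(confidences, human, model, threshold):
    """Share of responses scored automatically, and the error rate among them."""
    accepted = [(h, m) for c, h, m in zip(confidences, human, model) if c >= threshold]
    coverage = len(accepted) / len(confidences)
    if not accepted:
        return coverage, None   # risk is undefined when nothing is auto-scored
    risk = sum(h != m for h, m in accepted) / len(accepted)
    return coverage, risk


if __name__ == "__main__":
    human = [0, 1, 2, 1, 0, 2]                      # illustrative expert scores
    model = [0, 1, 2, 0, 0, 2]                      # illustrative model scores
    conf = [0.95, 0.90, 0.85, 0.55, 0.80, 0.90]     # illustrative confidences
    print("kappa:", cohens_kappa(human, model))
    print("no filtering:", coverage_and_risk(conf, human, model, 0.0))
    print("threshold 0.7:", coverage_and_risk(conf, human, model, 0.7))
```

In this toy data the single disagreement also has the lowest confidence, so raising the threshold removes the error while still scoring five of six responses automatically. Real data will rarely separate this cleanly, which is why the threshold has to be chosen as a trade-off.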

Practical Considerations

The study highlights that while this AI approach is highly efficient, it is designed to support the teacher, not replace them. The confidence score acts as a triage tool, helping educators focus their limited time on the student responses that are most difficult to interpret.
It is important to note that the performance of this system is tied to the specific context of the data used. The drawings were collected from middle school classrooms in one region of the United States, and the model’s "expert" benchmarks are based on human graders, meaning any biases present in human scoring could potentially be reflected in the AI’s output. Future research aims to test this framework across more diverse student populations and explore how it might provide more detailed feedback to help students improve their scientific modeling skills.
