WEQA: Wearable hEalth Question Answering with Query-Adaptive Agentic Reasoning
Wearable devices like smartwatches generate a constant stream of complex health data, but current AI assistants struggle to interpret this information accurately. While large language models (LLMs) are excellent at processing text, they often fail to understand the raw, high-dimensional, and temporal nature of physiological signals. This paper introduces WEQA, an agentic framework designed to bridge the gap between human-language health questions and sensor-native data analysis, allowing for more precise and clinically sound health insights.
A New Approach to Health Reasoning
Instead of relying on a single, fixed model to interpret all health data, WEQA uses a "query-adaptive" agent. When a user asks a question, an LLM controller acts as a planner, analyzing the request to determine which tools are needed. It then routes the query to the appropriate pathway: statistical analysis for questions about recent activity, for example, or specialized machine learning models for predictive tasks such as blood pressure estimation or respiratory screening. This modular design lets the system handle diverse tasks, from simple descriptive summaries to complex medical predictions.
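The routing idea can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-based `plan` function is a hypothetical stand-in for the LLM controller, and the pathway names, the `Evidence` type, and the placeholder predictive tool are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Evidence:
    pathway: str   # which pathway produced this evidence
    summary: str   # short, human-readable finding


def descriptive_stats(query: str, heart_rate: List[float]) -> Evidence:
    # Statistical pathway: summarize the recent sensor readings.
    text = f"mean heart rate {mean(heart_rate):.1f} bpm over {len(heart_rate)} samples"
    return Evidence("statistics", text)


def predictive_model(query: str, heart_rate: List[float]) -> Evidence:
    # Placeholder only: a real system would call a trained model here.
    return Evidence("prediction", "model output unavailable in this sketch")


# Hypothetical registry mapping pathway names to tools.
PATHWAYS: Dict[str, Callable[[str, List[float]], Evidence]] = {
    "statistics": descriptive_stats,
    "prediction": predictive_model,
}


def plan(query: str) -> str:
    # Stand-in for the LLM planner (assumption): a simple keyword rule.
    predictive_terms = ("blood pressure", "estimate", "risk", "screen")
    q = query.lower()
    return "prediction" if any(term in q for term in predictive_terms) else "statistics"


def answer(query: str, heart_rate: List[float]) -> Evidence:
    # Route the query to the pathway chosen by the planner.
    return PATHWAYS[plan(query)](query, heart_rate)
```

Calling `answer("How active was I?", [60.0, 70.0, 80.0])` would take the statistical pathway, while a question about estimating blood pressure would be routed to the predictive one. Keeping the planner separate from the tools means either can be swapped without touching the other.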
Evidence-Based Auditing
A critical component of the WEQA framework is its "Grounded Response Auditing" stage. After the system gathers evidence from sensor data and predictive models, it performs a verification step. This ensures that the final response provided to the user is directly supported by the underlying sensor evidence. The system also incorporates external medical knowledge to provide context and calibrates its language to reflect the level of certainty in its findings, which is essential for maintaining safety and clinical reliability.
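One way to picture the auditing step is the sketch below. It is an assumption-laden illustration, not the method described in the paper: it checks only that numbers quoted in a draft response match some value in the gathered evidence, and it picks hedging language from a confidence score. The function names, the tolerance, and the confidence thresholds are all hypothetical.

```python
import re
from typing import Dict, List


def unsupported_numbers(draft: str, evidence: Dict[str, float], tol: float = 0.5) -> List[float]:
    # Return every number in the draft that matches no evidence value within tol.
    claimed = [float(m) for m in re.findall(r"\d+(?:\.\d+)?", draft)]
    return [c for c in claimed if not any(abs(c - v) <= tol for v in evidence.values())]


def hedge(confidence: float) -> str:
    # Hypothetical thresholds for calibrating the wording to the certainty.
    if confidence >= 0.8:
        return "The data indicate"
    if confidence >= 0.5:
        return "The data suggest"
    return "There is limited evidence that"


def audit(draft: str, evidence: Dict[str, float], confidence: float) -> str:
    # Withhold the draft if any numeric claim lacks supporting evidence.
    missing = unsupported_numbers(draft, evidence)
    if missing:
        return f"Withheld: claims {missing} are not supported by sensor evidence."
    # Otherwise prefix the draft with language matched to the confidence.
    return f"{hedge(confidence)} {draft[0].lower()}{draft[1:]}"
```

With evidence `{"resting_hr": 62.0}`, a draft stating a resting heart rate of 62 bpm passes the check, while one stating 48 bpm is withheld. A real auditor would need to verify far more than numbers, such as units, time ranges, and causal claims.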
Performance and Evaluation
To test the framework, the researchers developed a new benchmark covering four health domains, including cardiovascular, respiratory, and mental health. The results show that WEQA significantly outperforms standard LLM-only and existing agentic baselines, achieving at least 24% higher accuracy across the tested tasks. Furthermore, in a blinded study, both medical experts and regular users rated the responses generated by WEQA as more useful and clinically sound than those produced by other methods.
Why Adaptability Matters
The study highlights that the framework's success comes not only from its specialized tools but also from its ability to adapt its reasoning workflow to the user's intent. When the researchers removed the query-adaptive planning component, they observed a consistent drop in performance across all metrics. This supports the claim that, for wearable health, dynamically choosing the right computational path, rather than using a one-size-fits-all approach, is central to providing high-quality, personalized health assistance.