
Key Takeaways

  • Language models are remarkably capable at medical question answering, in some cases surpassing the accuracy of general physicians.
  • The diversity of sensor modalities and user intents cannot be effectively handled by a fixed reasoning workflow or a single pretrained foundation model.
  • To address these challenges, we propose WEQA, a query-adaptive agent framework that unifies LLM reasoning with specialized wearable analytical and modeling tools.
  • We also curate a benchmark spanning four open wearable datasets comprising analytic and predictive tasks in three different health domains.
Paper Abstract

Language models are remarkably capable at medical question answering, in some cases surpassing the accuracy of general physicians. However, answering questions about wearable health data remains challenging and understudied, as these ubiquitous sensors produce continuous, high-dimensional, and longitudinal data, which is non-trivial to align with text-centric distributions in LLM pretraining. The diversity of sensor modalities and user intents cannot be effectively handled by a fixed reasoning workflow or a single pretrained foundation model. To address these challenges, we propose WEQA, a query-adaptive agent framework that unifies LLM reasoning with specialized wearable analytical and modeling tools. An LLM controller synthesizes execution plans, dynamically routes each query to the appropriate combination of sensor analysis and pretrained models, and performs grounded response auditing with external knowledge. We also curate a benchmark spanning four open wearable datasets comprising analytic and predictive tasks in three different health domains. Experiments show that our framework is 24% more accurate than LLM and agentic baselines, and a blinded study with 12 medical experts and 8 users shows substantial gains in usefulness and clinical soundness.

WEQA: Wearable hEalth Question Answering with Query-Adaptive Agentic Reasoning
Wearable devices like smartwatches generate a constant stream of complex health data, but current AI assistants struggle to interpret this information accurately. While large language models (LLMs) are excellent at processing text, they often fail to understand the raw, high-dimensional, and temporal nature of physiological signals. This paper introduces WEQA, an agentic framework designed to bridge the gap between human-language health questions and sensor-native data analysis, allowing for more precise and clinically sound health insights.

A New Approach to Health Reasoning

Instead of relying on a single, fixed model to interpret all health data, WEQA uses a "query-adaptive" agent. When a user asks a question, an LLM controller acts as a planner, analyzing the request to determine which tools are needed. It then routes the query to the appropriate combination of tools, such as statistical analysis of recent activity or specialized pretrained models for predictive tasks like blood pressure estimation or respiratory screening. This modular design lets the system handle tasks ranging from simple descriptive summaries to complex medical predictions.
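The routing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not WEQA's implementation: a keyword rule table stands in for the LLM controller's planner, and the tool names (`activity_stats`, `heart_rate_stats`, `bp_model`), the input dictionary format, and the placeholder blood-pressure stub are all hypothetical. The point is only that each query selects its own combination of tools instead of running one fixed pipeline.

```python
from statistics import mean
from typing import Callable, Dict, List

Evidence = Dict[str, object]


def summarize_activity(data: dict) -> Evidence:
    # Descriptive analysis: simple statistics over a daily step-count series.
    steps = data["steps"]
    return {"mean_daily_steps": mean(steps), "max_daily_steps": max(steps)}


def summarize_heart_rate(data: dict) -> Evidence:
    # Descriptive analysis over a heart-rate series (beats per minute).
    hr = data["heart_rate"]
    return {"mean_hr_bpm": mean(hr), "min_hr_bpm": min(hr)}


def bp_model_stub(data: dict) -> Evidence:
    # Placeholder for a pretrained blood-pressure estimator; no real model here.
    return {"bp_estimate": None, "note": "pretrained model not included"}


# Hypothetical tool registry; names are illustrative, not taken from the paper.
TOOLS: Dict[str, Callable[[dict], Evidence]] = {
    "activity_stats": summarize_activity,
    "heart_rate_stats": summarize_heart_rate,
    "bp_model": bp_model_stub,
}

# Keyword rules standing in for the LLM controller's planning step (assumption).
ROUTING_RULES = [
    (("step", "active", "walk"), "activity_stats"),
    (("heart rate", "pulse"), "heart_rate_stats"),
    (("blood pressure",), "bp_model"),
]


def plan(query: str) -> List[str]:
    """Select the combination of tools whose keywords appear in the query."""
    q = query.lower()
    return [tool for keywords, tool in ROUTING_RULES if any(k in q for k in keywords)]


def execute(query: str, data: dict) -> Dict[str, Evidence]:
    """Run only the planned tools and collect their outputs as evidence."""
    return {name: TOOLS[name](data) for name in plan(query)}


if __name__ == "__main__":
    data = {"steps": [4000, 6000, 8000], "heart_rate": [62, 70, 66]}
    print(execute("How active was I, and what was my heart rate?", data))
```

Here the example query selects `activity_stats` and `heart_rate_stats` but skips `bp_model`. In a real agent, `plan` would presumably be an LLM call returning a structured tool list; keeping the tool registry separate from the planner is what allows new sensor tools or pretrained models to be added without touching the routing logic.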

Evidence-Based Auditing

A critical component of the WEQA framework is its "Grounded Response Auditing" stage. After the system gathers evidence from sensor data and predictive models, it runs a verification step to check that the final response given to the user is directly supported by that sensor evidence. The system also incorporates external medical knowledge for context and calibrates its wording to reflect how certain its findings are, which is essential for safety and clinical reliability.
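A minimal sketch of the auditing idea, assuming the tool outputs from the planning stage are available as a flat dictionary of numbers: the auditor flags any number in a draft response that no piece of evidence supports, and a separate helper chooses hedged wording from a confidence score. The function names, the 5% tolerance, and the confidence thresholds are hypothetical choices for illustration; this is not the paper's implementation, and the external-knowledge lookup is not shown.

```python
import math
import re
from typing import Dict, List, Tuple


def extract_numbers(text: str) -> List[float]:
    # Pull unsigned numeric literals (e.g. "66", "7.5") out of a draft response.
    return [float(m) for m in re.findall(r"\d+(?:\.\d+)?", text)]


def audit(draft: str, evidence: Dict[str, float], rel_tol: float = 0.05) -> Tuple[bool, List[float]]:
    """Flag every number in the draft that no evidence value supports.

    A number counts as supported if it is within `rel_tol` (relative) of
    some value produced by the sensor-analysis or model tools.
    """
    unsupported = [
        x
        for x in extract_numbers(draft)
        if not any(math.isclose(x, v, rel_tol=rel_tol) for v in evidence.values())
    ]
    return (len(unsupported) == 0, unsupported)


def hedge(statement: str, confidence: float) -> str:
    # Map a confidence score in [0, 1] to calibrated wording.
    # The thresholds are arbitrary illustrations, not values from the paper.
    if confidence >= 0.8:
        return f"Your data indicate that {statement}."
    if confidence >= 0.5:
        return f"Your data suggest that {statement}, though this is not certain."
    return f"There is limited evidence that {statement}; consider consulting a clinician."


if __name__ == "__main__":
    evidence = {"mean_hr_bpm": 66.0, "mean_daily_steps": 6000.0}
    print(audit("Your average heart rate was 80 bpm.", evidence))
    print(hedge("your resting heart rate is in a typical range", 0.6))
```

This numeric check is deliberately crude: a number such as "7" in "over the past 7 days" would be flagged unless it also appears in the evidence, so a practical auditor would need to parse claims more carefully.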

Performance and Evaluation

To test the framework, the researchers curated a new benchmark of analytic and predictive tasks drawn from four open wearable datasets and spanning three health domains: cardiovascular, respiratory, and mental health. The results show that WEQA significantly outperforms LLM-only and existing agentic baselines, achieving at least 24% higher accuracy across the tested tasks. Furthermore, in a blinded study with 12 medical experts and 8 users, participants rated WEQA's responses as more useful and clinically sound than those produced by the other methods.

Why Adaptability Matters

The study highlights that the framework's success is not due only to its specialized tools but specifically to its ability to adapt its reasoning workflow to the user's intent. When the researchers removed the query-adaptive planning component, they observed a consistent drop in performance across all metrics. This indicates that for wearable health, the ability to dynamically choose the right computational path, rather than applying a one-size-fits-all workflow, is key to providing high-quality, personalized health assistance.
