Autonomous AI-assisted diabetic retinopathy screening at primary care is associated with increased presentation to eye care by at risk patients
Leong, Ariel; Wolf, Risa M.; Channa, Roomasa; Wang, Jiangxia; Lehmann, Harold; Abramoff, Michael D.; Liu, T. Y. Alvin
doi: 10.1038/s41746-026-02460-5 pmid: 41781569
Adult patients with diabetes (n = 3745) seen at Johns Hopkins Medicine primary care sites were referred to the Wilmer Eye Institute based on either a primary care provider referral or an autonomous AI diagnostic result (referral was made after a positive or non-diagnostic result). An inverse-probability-weighted regression, which incorporated propensity score matching on social determinants of health and relevant clinical variables, showed that implementation of an autonomous AI-assisted diabetic retinopathy (DR) screening program in a primary care clinic was associated with increased presentation to eye care specialists by African-Americans (p = 0.02). This is significant because African-Americans have traditionally been less likely to undergo annual screening exams and more likely to present with more severe forms of DR. The results suggest a potential association between office-based, AI-assisted DR screening and improved downstream ophthalmic access for African-American patients. However, given that the analysis was exploratory, this association should be interpreted cautiously and further validated.
A transformer-based survival model for prediction of all-cause mortality in patients with heart failure: a multi-cohort study
Rao, Shishir; Ahmed, Nouman; Salimi-Khorshidi, Gholamreza; Yau, Christopher; Su, Huimin; Conrad, Nathalie; Asselbergs, Folkert W.; Woodward, Mark; Jackson, Rod; Cleland, John GF; Rahimi, Kazem
doi: 10.1038/s41746-025-02296-5 pmid: 41507366
Heart failure (HF) patients have complex health profiles that existing risk models fail to capture. We developed TRisk, a Transformer-based artificial intelligence survival model for predicting mortality using routine electronic health records (EHR) in HF patients. Using UK data from 403,534 HF patients across 1418 English general practices, we trained and validated TRisk and compared it against MAGGIC-EHR, the MAGGIC model adapted for use on routine EHR by substituting variables (e.g. left-ventricular ejection fraction) that are not routinely available. External validation was conducted on 21,767 patients from USA hospitals. In the UK cohort, TRisk achieved a concordance index (C-index) of 0.845 (95% CI: 0.841, 0.849), outperforming MAGGIC-EHR (C-index: 0.728 [0.723, 0.733]) for 36-month mortality prediction. In subgroup analyses, TRisk demonstrated less variability in predictive performance by sex, age, and baseline characteristics compared to MAGGIC-EHR, suggesting less biased modelling. Evaluating TRisk in USA data via transfer learning yielded a C-index of 0.802 (0.789, 0.816). Explainability analysis revealed that TRisk captured established risk factors while identifying underappreciated ones, particularly cancers and hepatic failure, with cancers maintaining prognostic utility even a decade before baseline. TRisk provides more accurate, well-calibrated mortality prediction using routine data across international healthcare settings, demonstrating potential for improved risk stratification in patients with HF.
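For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of Harrell's C-index for right-censored data. It is not the TRisk evaluation code; the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied event times are simply skipped, which is a simplifying assumption.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Fraction of comparable pairs in which the subject with the
    shorter observed time also has the higher predicted risk.

    times  -- observed follow-up times
    events -- 1 if death was observed, 0 if censored
    risks  -- predicted risk scores (higher means earlier death expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (simplification) and pairs where the earlier
        # subject was censored, since their ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_c = harrell_c_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
print(toy_c)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.845 and 0.728 figures above are compared.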
New model, old risks: sociodemographic bias and adversarial hallucinations vulnerability in GPT-5
Omar, Mahmud; Agbareia, Reem; Apakama, Donald U.; Horowitz, Carol R.; Freeman, Robert; Charney, Alexander W.; Nadkarni, Girish N.; Klang, Eyal
doi: 10.1038/s41746-026-02584-8 pmid: 41935214
We re-evaluated GPT-5 using our published pipelines: 500 emergency vignettes across 32 sociodemographic labels for bias, and adversarial prompts with fabricated details. GPT-5 showed no measurable improvement over GPT-4o in sociodemographic-linked decision variation, with several LGBTQIA+ groups flagged for mental-health screening in 100% of cases. Adversarial hallucination rates were higher (65% vs 53% for GPT-4o); a mitigation prompt reduced this to 7.67%.
Specialized foundation models for intelligent operating rooms
Özsoy, Ege; Pellegrini, Chantal; Bani-Harouni, David; Yuan, Kun; Keicher, Matthias; Navab, Nassir
doi: 10.1038/s41746-026-02631-4 pmid: 41986551
Surgical procedures unfold in complex environments demanding coordination between surgical teams, tools, imaging, and, increasingly, intelligent robotic systems. While AI solutions like ChatGPT and Gemini have revolutionized language understanding and seen early adoption in clinical diagnosis, they fall short in the safety-critical, multimodal setting of surgery. Ensuring safety and efficiency in the operating rooms (ORs) of the future requires intelligent systems, like surgical robots, smart instruments and digital copilots, capable of understanding complex activities and hazards. We introduce ORQA, a multimodal foundation model unifying visual, auditory, and structured data for holistic surgical understanding. ORQA’s question-answering framework empowers diverse tasks, serving as an intelligence core for surgical technologies. We benchmark ORQA against generalist vision-language models and show that, while they struggle to perceive surgical scenes, ORQA delivers substantially stronger, consistent performance. To meet diverse deployment needs, we design and release a family of smaller ORQA models tailored to different computational requirements. This work establishes a foundation for the next wave of intelligent surgical solutions, enabling surgical teams and medical technology providers to create smarter and safer operating rooms.
A test-time clinically adaptive framework for detecting multiple fundus diseases harnessing ophthalmic foundation models
Jiang, Hongyang; Liu, Zirong; Gao, Mengdi; Xu, Bowen; Fang, Danqi; Zheng, Chunwen; Shen, Ruyue; Nguyen, Truong X.; Ran, An Ran; Tham, Clement C.; He, Qinghua; Szeto, Simon K. H.; Cheung, Carol Y.
doi: 10.1038/s41746-026-02480-1 pmid: 41772178
Fundus diseases are leading causes of global vision impairment, often presenting with complex comorbidities that challenge conventional artificial intelligence models. While ophthalmic foundation models (FMs) offer promising capabilities, their clinical translation for multi-disease detection remains limited by issues such as imbalanced data distribution, uncertainty in multi-label predictions, inter-disease confusion, and domain shifts. Here, we introduce RetExpert, a test-time clinically adaptive framework that enhances FMs for robust and generalizable detection of multiple fundus diseases from color fundus photographs. RetExpert incorporates adaptive knowledge units with a novel stochastic one-hot activation module to improve generalizability, alongside long-tail-aware and uncertainty-aware multi-label learning strategies. It also integrates a fundus disease co-occurrence matrix as medical prior knowledge to reduce the confusion score (C-score) between diseases. Furthermore, RetExpert employs a lightweight test-time adaptation method combining unsupervised and pseudo-supervised learning (TTUL + TTPL), enabling dynamic parameter adjustment without full retraining. Extensive evaluations on 15 public and private datasets demonstrate that RetExpert outperforms ophthalmic FMs in detection performance, reliability, and cross-domain adaptability, offering a clinically viable solution for automated multi-disease screening in real-world settings.
Economic evaluation of a digital symptom checker for endometriosis using a Markov decision process model
Xu, Yihan; Prentice, Carley; Torres-Rueda, Sergio; Meczner, András; Multmeier, Jan; Wickham, Aidan; Kelly, Laura; Klepchukova, Anna; Stsefanovich, Heorhi; Zhaunova, Liudmila
doi: 10.1038/s41746-025-02332-4 pmid: 41772033
Digital symptom checkers (SCs) are increasingly used to support early symptom recognition and care-seeking, yet evidence on their cost-effectiveness remains limited. We conducted an economic evaluation of a digital SC for endometriosis, a prevalent but underdiagnosed condition, as a case study. We developed a Markov decision process model to compare the digital SC with the standard of care from a societal perspective. Over a 40-year horizon, the digital SC reduced diagnostic delay by 4.36 years, generated 0.049 quality-adjusted life years (QALYs) per person, saved $5196.22 in costs, and produced an incremental net monetary benefit (INMB) of $10,089.00 at a $100,000/QALY threshold. Probabilistic sensitivity analysis confirmed the robustness of these findings, with an INMB of $12,398.92 (95% CI: $11,893.11–$12,904.72). Scenario analyses showed that the SC remained cost-effective under a wide range of assumptions, with the greatest value realized when sensitivity and specificity were ≥0.7, compliance exceeded 45%, and the time horizon was at least 10 years. This study provides the first economic evaluation of a digital SC for endometriosis and illustrates when and how digital SCs can deliver value to patients and health systems.
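The deterministic INMB above can be approximately reproduced from the other reported figures using the standard definition, INMB = (incremental QALYs × willingness-to-pay threshold) − incremental cost. The sketch below is only an arithmetic check, not the authors' Markov model; the function name is hypothetical, and the inputs are the rounded values quoted in the abstract.

```python
def incremental_net_monetary_benefit(delta_qalys, delta_cost, wtp_per_qaly):
    """Standard INMB: monetised health gain minus incremental cost.

    delta_cost is negative when the intervention saves money.
    """
    return delta_qalys * wtp_per_qaly - delta_cost


# Rounded inputs taken from the abstract: 0.049 QALYs gained,
# $5196.22 saved (so the incremental cost is negative),
# and a $100,000/QALY threshold.
inmb = incremental_net_monetary_benefit(
    delta_qalys=0.049,
    delta_cost=-5196.22,
    wtp_per_qaly=100_000,
)
print(f"INMB from rounded inputs: ${inmb:,.2f}")
```

With these rounded inputs the result is about $10,096, close to the reported $10,089.00. The small gap is consistent with the QALY gain having been rounded to three decimals in the abstract.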
PrysmNet a polyp refining system using salience and multimodal guidance for reproducible cross domain segmentation
Xiao, Junbo; Han, Yi; Wang, Lei; Li, Ying; Wang, Xiaotong; Li, Shizhe; Yi, Jun; Wu, Yu; Liu, Xiaowei
doi: 10.1038/s41746-026-02345-7 pmid: 41565973
Colorectal cancer prevention benefits from accurate and reproducible polyp segmentation, yet cross-domain generalization and boundary precision remain challenging in real-world deployments. We propose Prysm-Net, a ViT-based framework designed to address these issues through architectural innovation and advanced training guidance. Our model is augmented with a biologically inspired salience module (BSM) that dynamically sharpens boundary-relevant features. To further enhance robustness without increasing inference costs, we introduce two training-only strategies: (i) foundation-model distillation from SAM, which transfers knowledge at the output, boundary, and feature levels, and (ii) multi-modal guidance that injects auxiliary structural and textural cues via gated cross-attention. Extensive experiments on standard in-domain benchmarks and challenging cross-domain datasets demonstrate that Prysm-Net achieves superior segmentation accuracy and robust generalization compared to state-of-the-art methods, all while maintaining a lightweight inference process by disabling auxiliary guidance at test time.
Advancing diagnostic equity through artificial intelligence chest radiograph screening for osteoporosis in Asian populations
Chen, Shu-Han; Chang, Ray-E; Lien, Chia-En; Yang, Dun-Jhu; Yao, Pei; Wu, Meng-Lu; Chen, Kun-Hui
doi: 10.1038/s41746-026-02484-x pmid: 41857300
Early identification of abnormal bone mineral density (BMD) through opportunistic screening is critical for preventing osteoporotic fractures. We validated an AI model in 2384 asymptomatic adults (57.7% female; mean age 43.6 years) undergoing health examinations in Taiwan. Using DXA as the reference, the model identified 255 suspected abnormal BMD cases, with 94 (3.9%) DXA-confirmed positive. Population-level performance was robust, yielding an AUC of 0.95 (95% CI 0.93–0.99) and sensitivity of 79.7% (95% CI 71.3–86.5%). Although BMI distributions paralleled East Asian regional trends, intersectional subgroup analyses remain exploratory due to small event counts. Decision curve analysis indicated superior net benefit for AI-based referral over “refer all” or “refer none” strategies, particularly for women with normal BMI (18.5–23 kg/m²). This AI tool offers precise triage for Asian health examination populations, though further validation in multi-center cohorts is required to confirm broad generalizability.
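Decision curve analysis compares strategies by net benefit, defined at a threshold probability p_t as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below illustrates the calculation only and does not reproduce the study's curves. It reads the abstract's figures as 94 true positives and 161 false positives among 2384 adults, and infers roughly 118 DXA-positive adults from the reported 79.7% sensitivity; both readings, the 5% threshold, and the function name are assumptions made for illustration.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit of a referral strategy at a given threshold probability."""
    odds = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * odds


N = 2384                  # cohort size from the abstract
TP_MODEL = 94             # assumed: DXA-confirmed cases among AI-flagged adults
FP_MODEL = 255 - 94       # assumed: remaining AI-flagged adults
TOTAL_POSITIVE = 118      # assumed: inferred from 94 / 0.797
THRESHOLD = 0.05          # hypothetical threshold probability

nb_model = net_benefit(TP_MODEL, FP_MODEL, N, THRESHOLD)
# "Refer all" sends everyone to DXA: every positive is a true positive,
# every negative is a false positive.
nb_refer_all = net_benefit(TOTAL_POSITIVE, N - TOTAL_POSITIVE, N, THRESHOLD)
# "Refer none" has no true or false positives, so its net benefit is zero.
nb_refer_none = 0.0

print(nb_model, nb_refer_all, nb_refer_none)
```

Under these assumed inputs the AI-based referral has a higher net benefit than either default strategy at the 5% threshold; a full decision curve repeats this calculation across a range of thresholds.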
UroFusion-X: a unified multimodal deep learning framework for robust diagnosis, subtyping, and prognosis of urological cancers
Xiao, Yingming; Yang, Shengke; He, Mingjing; Chen, Li; Wu, Yi; Zhong, Lei
doi: 10.1038/s41746-025-02295-6 pmid: 41554842
Multimodal clinical data, including imaging, pathology, omics, and laboratory tests, are often fragmented in routine practice, leading to inconsistent decision-making in the management of urological cancers. We propose UroFusion-X, a unified multimodal framework for integrated diagnosis, molecular subtyping, and prognosis prediction of bladder, kidney, and prostate cancers, with inherent robustness to missing modalities. The system incorporates 3D imaging encoders, pathology multiple-instance learning, omics graph networks, and a TabTransformer for laboratory and clinical variables. A cross-modal co-attention mechanism combined with a gated product-of-experts fusion strategy enables effective representation alignment across heterogeneous inputs, while anatomy-pathology consistency constraints and patient-level contrastive learning further enhance interpretability and generalization. Prognostic modeling is achieved via DeepSurv and DeepHit survival heads. Evaluated on a multi-center real-world cohort with external validation and leave-one-center-out testing, UroFusion-X consistently outperformed strong unimodal and simple fusion baselines, maintained over 90% of its predictive performance under substantial modality dropout, and demonstrated higher net clinical benefit in decision curve analysis. These results indicate that the proposed framework can improve decision consistency and reduce unnecessary testing when deployed in real clinical workflows.
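To make the idea of gated product-of-experts fusion concrete, here is a minimal sketch under strong assumptions: each modality is treated as a diagonal Gaussian expert, a gate in [0, 1] scales that expert's precision, and a standard-normal prior expert keeps the result defined when every modality is missing. The abstract does not specify these details, so this is not the UroFusion-X implementation, and all names are hypothetical.

```python
def gated_product_of_experts(means, variances, gates):
    """Fuse per-modality Gaussian experts into one Gaussian.

    means, variances -- one list per modality, each of length `dim`
    gates            -- one weight per modality; 0 drops a missing modality
    Returns (fused_mean, fused_variance) as lists of length `dim`.
    """
    dim = len(means[0])
    fused_mean, fused_var = [], []
    for d in range(dim):
        precision = 1.0   # assumed standard-normal prior expert N(0, 1)
        weighted = 0.0    # prior mean is 0, so it adds nothing here
        for mu, var, gate in zip(means, variances, gates):
            p = gate / var[d]        # gated precision of this expert
            precision += p
            weighted += p * mu[d]
        fused_mean.append(weighted / precision)
        fused_var.append(1.0 / precision)
    return fused_mean, fused_var


# Two hypothetical modalities with a one-dimensional latent.
both = gated_product_of_experts([[1.0], [3.0]], [[1.0], [1.0]], gates=[1.0, 1.0])
# Second modality missing: its gate is set to zero.
one_missing = gated_product_of_experts([[1.0], [3.0]], [[1.0], [1.0]], gates=[1.0, 0.0])
print(both, one_missing)
```

Because a zero gate removes an expert's contribution without changing the formula, fusion of this kind degrades gradually when modalities are absent, which is one plausible way to obtain the robustness to modality dropout described above.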