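arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.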

2604.21573 2026-04-24 cs.CV q-bio.QM

CHRep: Cross-modal Histology Representation and Post-hoc Calibration for Spatial Gene Expression Prediction

Changfan Wang, Xinran Wang, Donghai Liu, Fei Su, Lulu Sun, Zhicheng Zhao, Zhu Meng

Abstract

Spatial transcriptomics (ST) enables spatially resolved gene profiling but remains expensive and low-throughput, limiting large-cohort studies and routine clinical use. Predicting spatial gene expression from routine hematoxylin and eosin (H&E) slides is a promising alternative, yet under realistic leave-one-slide-out evaluation, existing models often suffer from slide-level appearance shifts and regression-driven over-smoothing that suppress biologically meaningful variation. CHRep is a two-phase framework for robust histology-to-expression prediction. In the training phase, CHRep learns a structure-aware representation by jointly optimizing correlation-aware regression, symmetric image-expression alignment, and coordinate-induced spatial topology regularization. In the inference phase, cross-slide robustness is improved without backbone fine-tuning through a lightweight calibration module trained on the training slides, which combines a non-parametric estimate from a training gallery with a magnitude-regularized correction module. Unlike prior embedding-alignment or retrieval-based transfer methods that rely on a single prediction route, CHRep couples topology-preserving representation learning with post-hoc calibration, enabling stable neighborhood retrieval and controlled bias correction under slide-level shifts. Across the three cohorts, CHRep consistently improves gene-wise correlation under leave-one-slide-out evaluation, with the largest gains observed on Alex+10x. Relative to HAGE, the Pearson correlation coefficient on all considered genes [PCC(ACG)] increases by 4.0% on cSCC and 9.8% on HER2+. Relative to mclSTExp, PCC(ACG) further improves by 39.5% on Alex+10x, together with 9.7% and 9.0% reductions in mean squared error (MSE) and mean absolute error (MAE), respectively.
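
The gene-wise correlation reported above can be illustrated with a minimal sketch. It assumes that PCC(ACG) is the mean of per-gene Pearson correlations computed across spots; the abstract does not spell out the aggregation, and the function names `pearson` and `mean_genewise_pcc` are hypothetical, not taken from CHRep.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant gene contributes zero correlation.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mean_genewise_pcc(predicted, measured):
    """Average the per-gene PCC over all genes.

    Both arguments are lists of spots; each spot is a list of
    expression values, one per gene, in the same gene order.
    """
    n_genes = len(measured[0])
    per_gene = [
        pearson([spot[g] for spot in predicted], [spot[g] for spot in measured])
        for g in range(n_genes)
    ]
    return sum(per_gene) / n_genes


# Toy data: 3 spots, 2 genes. Gene 0 is predicted in the right direction,
# gene 1 in the opposite direction, so the two correlations cancel out.
measured = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
predicted = [[2.0, 6.0], [4.0, 4.0], [6.0, 2.0]]
score = mean_genewise_pcc(predicted, measured)
```

Averaging per gene rather than pooling all values keeps highly expressed genes from dominating the score, which is presumably why a gene-wise metric is reported.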

2604.21508 2026-04-24 cs.AI q-bio.BM

BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature

Jiaxian Yan, Jintao Zhu, Yuhang Yang, Qi Liu, Kai Zhang, Zaixi Zhang, Xukai Liu, Boyan Zhang, Kaiyuan Gao, Jinchuan Xiao, Enhong Chen

Comments 20 pages, 5 figures, 1 table

Abstract

Protein-ligand bioactivity data published in the literature are essential for drug discovery, yet manual curation struggles to keep pace with the rapidly growing literature. Automated bioactivity extraction remains challenging because it requires not only interpreting biochemical semantics distributed across text, tables, and figures, but also reconstructing chemically exact ligand structures (e.g., Markush structures). To address this bottleneck, we introduce BioMiner, a multi-modal extraction framework that explicitly separates bioactivity semantic interpretation from ligand structure construction. Within BioMiner, bioactivity semantics are inferred through direct reasoning, while chemical structures are resolved via a chemical-structure-grounded visual semantic reasoning paradigm, in which multi-modal large language models operate on chemically grounded visual representations to infer inter-structure relationships, and exact molecular construction is delegated to domain chemistry tools. For rigorous evaluation and method development, we further establish BioVista, a comprehensive benchmark comprising 16,457 bioactivity entries curated from 500 publications. BioMiner validates its extraction ability and provides a quantitative baseline, achieving an F1 score of 0.32 for bioactivity triplets. BioMiner's practical utility is demonstrated via three applications: (1) extracting 82,262 data points from 11,683 papers to build a pre-training database that improves downstream model performance by 3.9%; (2) enabling a human-in-the-loop workflow that doubles the number of high-quality NLRP3 bioactivity data points, contributing to a 38.6% improvement over 28 QSAR models and the identification of 16 hit candidates with novel scaffolds; and (3) accelerating protein-ligand complex bioactivity annotation, achieving a 5.59-fold speed increase and a 5.75% accuracy improvement over manual workflows on the PoseBusters dataset.
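
To make the triplet F1 figure concrete, here is a minimal sketch of a set-based F1 computation. It assumes exact matching of (protein, ligand, activity) tuples; the abstract does not state the matching criteria used in BioVista, and the name `triplet_f1` and the toy triplets are hypothetical.

```python
def triplet_f1(predicted, gold):
    """Set-based F1 over extracted triplets.

    Assumption: a prediction counts as correct only if the whole
    (protein, ligand, activity) tuple matches a gold tuple exactly.
    """
    predicted_set = set(predicted)
    gold_set = set(gold)
    true_positives = len(predicted_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example: one of two predictions is correct, one of three gold
# triplets is recovered, so precision = 1/2 and recall = 1/3.
gold = [
    ("NLRP3", "ligand_1", "IC50=12 nM"),
    ("NLRP3", "ligand_2", "IC50=40 nM"),
    ("NLRP3", "ligand_3", "IC50=7 nM"),
]
predicted = [
    ("NLRP3", "ligand_1", "IC50=12 nM"),
    ("NLRP3", "ligand_2", "IC50=400 nM"),  # wrong activity value
]
f1 = triplet_f1(predicted, gold)
```

Under exact matching a single wrong field invalidates the whole triplet, which is one reason a triplet-level F1 can be low even when individual fields are often extracted correctly.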

2604.21263 2026-04-24 cs.AI cs.PL cs.SE q-bio.QM

Trustworthy Clinical Decision Support Using Meta-Predicates and Domain-Specific Languages

Michael Bouzinier, Sergey Trifonov, Michael Chumack, Eugenia Lvova, Dmitry Etin

Abstract

Background: Regulatory frameworks for AI in healthcare, including the EU AI Act and FDA guidance on AI/ML-based medical devices, require clinical decision support to demonstrate not only accuracy but auditability. Existing formal languages for clinical logic validate syntactic and structural correctness but not whether decision rules use epistemologically appropriate evidence. Methods: Drawing on design-by-contract principles, we introduce meta-predicates (predicates about predicates) for asserting epistemological constraints on clinical decision rules expressed in a DSL. An epistemological type system classifies annotations along four dimensions: purpose, knowledge domain, scale, and method of acquisition. Meta-predicates assert which evidence types are permissible in any given rule. The framework is instantiated in AnFiSA, an open-source platform for genetic variant curation, and demonstrated using the Brigham Genomics Medicine protocol on 5.6 million variants from the Genome in a Bottle benchmark. Results: Decision trees used in variant interpretation can be reformulated as unate cascades, enabling per-variant audit trails that identify which rule classified each variant and why. Meta-predicate validation catches epistemological errors before deployment, whether rules are human-written or AI-generated. The approach complements post-hoc methods such as LIME and SHAP: where explanation reveals what evidence was used after the fact, meta-predicates constrain what evidence may be used before deployment, while preserving human readability. Conclusions: Meta-predicate validation is a step toward demonstrating not only that decisions are accurate but that they rest on appropriate evidence in ways that can be independently audited. While demonstrated in genomics, the approach generalises to any domain requiring auditable decision logic.
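
The idea of a meta-predicate can be sketched in a few lines. This is an illustration only, not AnFiSA's DSL or API: the classes `EvidenceType` and `Rule`, the function `violations`, and every annotation name and dimension value below are hypothetical. The only elements taken from the abstract are the four classification dimensions and the notion of checking rules before deployment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class EvidenceType:
    """Epistemological type of an annotation (four dimensions from the abstract)."""
    purpose: str
    domain: str
    scale: str
    acquisition: str


@dataclass(frozen=True)
class Rule:
    """A decision rule, reduced here to the annotations it reads."""
    name: str
    annotations: Tuple[str, ...]


# A meta-predicate is a predicate over the evidence a rule is allowed to use.
MetaPredicate = Callable[[EvidenceType], bool]


def violations(rule: Rule, catalog: Dict[str, EvidenceType],
               permitted: MetaPredicate) -> List[str]:
    """Return the annotations in `rule` whose evidence type is not permitted."""
    return [name for name in rule.annotations if not permitted(catalog[name])]


# Hypothetical annotation catalog with made-up dimension values.
CATALOG = {
    "population_frequency": EvidenceType(
        "filtering", "population genetics", "cohort", "measured"),
    "pathogenicity_score": EvidenceType(
        "prioritization", "protein function", "variant", "predicted"),
}


def measured_only(evidence: EvidenceType) -> bool:
    """Example meta-predicate: reject computationally predicted evidence."""
    return evidence.acquisition == "measured"


rare_variant_rule = Rule(
    "rare_variant_filter", ("population_frequency", "pathogenicity_score"))
flagged = violations(rare_variant_rule, CATALOG, measured_only)
```

Because the check inspects only which annotations a rule reads, it can run before deployment and applies equally to human-written and generated rules.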

2604.21260 2026-04-24 stat.ML cs.AI cs.LG econ.EM q-bio.QM stat.ME

Calibeating Prediction-Powered Inference

Lars van der Laan, Mark Van Der Laan

Comments Paper website: https://larsvanderlaan.github.io/ppi-aipw/

2604.21095 2026-04-24 cs.DC cs.SE q-bio.GN

TorchGWAS: GPU-accelerated GWAS for thousands of quantitative phenotypes

Xingzhong Zhao, Ziqian Xie, Sheikh Muhammad Saiful Islam, Tian Xia, Cheng Chen, Degui Zhi

2604.21011 2026-04-24 cs.CV q-bio.NC

Micro-DualNet: Dual-Path Spatio-Temporal Network for Micro-Action Recognition

Naga VS Raviteja Chappa, Evangelos Sariyanidi, Lisa Yankowitz, Gokul Nair, Casey J. Zampella, Robert T. Schultz, Birkan Tunç

Comments Accepted to International Conference on Automatic Face and Gesture Recognition (FG)

2604.20981 2026-04-24 q-bio.QM cs.CV cs.LG

PanGuide3D: Cohort-Robust Pancreas Tumor Segmentation via Probabilistic Pancreas Conditioning and a Transformer Bottleneck

Sunny Joy Ma, Xiang Ma

Abstract

Pancreatic tumor segmentation in contrast-enhanced computed tomography (CT) is clinically important yet technically challenging: lesions are often small, heterogeneous, and easily confused with surrounding soft tissue, and models that perform well on one cohort frequently degrade under cohort shift. Our goal is to improve cross-cohort generalization while keeping the model architecture simple, efficient, and practical for 3D CT segmentation. We introduce PanGuide3D, a cohort-robust architecture with a shared 3D encoder, a pancreas decoder that predicts a probabilistic pancreas map, and a tumor decoder that is explicitly conditioned on this pancreas probability at multiple scales via differentiable soft gating. To capture long-range context under distribution shift, we further add a lightweight Transformer at the U-Net bottleneck. We evaluate cohort transfer by training on the PanTS (Pancreatic Tumor Segmentation) cohort and testing both in-cohort (PanTS) and out-of-cohort on MSD (Medical Segmentation Decathlon) Task07 Pancreas, using matched preprocessing and training protocols across strong baselines. We collect voxel-level segmentation metrics, patient-level tumor detection, subgroup analyses by tumor size and anatomical location, volume-conditioned performance analyses, and calibration measurements to assess reliability. Across the evaluated models, PanGuide3D achieves the best overall tumor performance and shows improved cross-cohort generalization, particularly for small tumors and challenging anatomical locations, while reducing anatomically implausible false positives. These findings support probabilistic anatomical conditioning as a practical strategy for improving cross-cohort robustness in an end-to-end model and suggest potential utility for contouring support, treatment planning, and multi-institutional studies.
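
A toy, one-dimensional sketch of soft gating at two scales follows. It assumes the gate is a plain element-wise product between tumor-decoder features and the pancreas probability, with average pooling to bring the probability map to a coarser scale. The abstract does not give the exact gating formula, so `soft_gate` and `avg_pool_1d` are hypothetical stand-ins rather than PanGuide3D's layers.

```python
def avg_pool_1d(values, factor):
    """Downsample by averaging consecutive chunks of `factor` elements."""
    pooled = []
    for start in range(0, len(values), factor):
        chunk = values[start:start + factor]
        pooled.append(sum(chunk) / len(chunk))
    return pooled


def soft_gate(features, pancreas_prob):
    """Scale each feature by the pancreas probability at the same position.

    Multiplication by a probability is smooth, unlike a hard 0/1 mask,
    so gradients can still flow to the pancreas branch during training.
    """
    return [f * p for f, p in zip(features, pancreas_prob)]


# Fine scale: four positions; the pancreas occupies the right half.
pancreas_prob_fine = [0.0, 0.0, 1.0, 1.0]
tumor_features_fine = [2.0, 4.0, 6.0, 8.0]
gated_fine = soft_gate(tumor_features_fine, pancreas_prob_fine)

# Coarse scale: the same probability map pooled by a factor of 2.
pancreas_prob_coarse = avg_pool_1d(pancreas_prob_fine, 2)
tumor_features_coarse = [3.0, 5.0]
gated_coarse = soft_gate(tumor_features_coarse, pancreas_prob_coarse)
```

Features outside the predicted pancreas are suppressed at every scale, which matches the stated aim of reducing anatomically implausible false positives.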

2604.20942 2026-04-24 q-bio.QM q-bio.BM

VARIANT: Web Server for Decoding and Analyzing Viral Mutations at Genome and Protein Levels

Rui Wang, Xuhang Dai, Xin Cao, Changchuan Yin, Tamar Schlick, Guo-Wei Wei

Comments 15 pages, 5 figures

Abstract

A comprehensive analysis of viral mutations is essential for understanding viral evolution, disease epidemiology, diagnosis, drug resistance, etc. However, challenges remain in capturing complex mutation patterns and supporting diverse viral families with varying genome architectures. To address these needs, we present VARIANT, a web server for mutational analysis of RNA viral genomes and associated viral products across both single- and multi-segment virus genomes. The server takes as input a viral reference genome, a reference protein sequence, and/or a multiple sequence alignment, and automatically provides full annotation of mutation types, including standard categories such as point mutations (missense, silent, and nonsense), insertions, deletions, or frameshift events in both coding and non-coding regions. In addition, VARIANT detects three biologically significant mutation patterns that are overlooked by conventional software/packages: "row mutations" (consecutive substitutions within a window of 3 nts), "hot mutations" (two non-consecutive substitutions within a window of 3 nts), and potential programmed ribosomal frameshifting (PRF) regions. The server currently contains automatic analysis of major viral pathogens, including SARS-CoV-2, HIV-1, Influenza H3N2, Ebola virus, and Chikungunya virus. It also allows users to analyze customized viruses. Users can track VARIANT analysis progress in real time, visualize mutation distributions, and download structured results in ZIP format. VARIANT also incorporates dual graph topology analysis to classify frameshifting element structures from dot-bracket notation input. This feature enables systematic comparison of RNA secondary structure motifs across viral families by mapping structures to a comprehensive library of dual graph topologies. The web server is freely available at https://variant.up.railway.app.
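
The two substitution patterns defined above can be sketched as follows. This is one reading of the definitions, not VARIANT's implementation: it assumes a sliding 3-nt window on a pairwise alignment, treats a "row" as two adjacent substituted positions and a "hot" pair as substitutions at positions i and i+2 with an unchanged base between them, and ignores gap columns. The function name `classify_substitutions` is hypothetical.

```python
def classify_substitutions(reference, query):
    """Find row and hot substitution pairs in two aligned sequences.

    Returns two sorted lists of 0-based position pairs:
    - row: (i, i + 1) where both positions are substituted
    - hot: (i, i + 2) where both ends are substituted and i + 1 is not
    Columns containing a gap character ('-') are not counted as substitutions.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    substituted = {
        i
        for i, (ref_base, query_base) in enumerate(zip(reference, query))
        if ref_base != query_base and ref_base != "-" and query_base != "-"
    }
    row = sorted((i, i + 1) for i in substituted if i + 1 in substituted)
    hot = sorted(
        (i, i + 2)
        for i in substituted
        if i + 2 in substituted and i + 1 not in substituted
    )
    return row, hot


# Substitutions at positions 1, 2 (adjacent) and 6, 8 (one unchanged base between).
reference = "ACGTACGTAC"
query = "ATTTACCTGC"
row_pairs, hot_pairs = classify_substitutions(reference, query)
```

A codon-aligned window, rather than a sliding one, would be an equally plausible reading of the definition for coding regions.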

2604.20885 2026-04-24 physics.bio-ph cs.GL q-bio.PE

From Physical Difference to Meaning: A Constructor-Theoretic Framework for Prebiotic Information in Casimir-Lifshitz-Coupled Protocell Clusters

Michael Massoth

Comments 8 pages, 3 figures, The Eighteenth International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies, BIOTECHNO 2026, Valencia, Spain

2604.05004 2026-04-24 physics.hist-ph q-bio.NC

Causal Stance

Yoshiyuki Ohmura, Yasuo Kuniyoshi

2602.03875 2026-04-24 cs.LG cs.AI q-bio.QM

Reversible Deep Learning for 13C NMR in Chemoinformatics: On Structures and Spectra

Stefan Kuhn, Vandana Dwarka, Przemyslaw Karol Grenda, Eero Vainikko

Comments 10 pages, 4 figures, 4 tables

2307.00385 2026-04-24 q-bio.NC eess.IV

Sulcal Pattern Matching with the Wasserstein Distance

Zijian Chen, Soumya Das, Moo K. Chung

Comments Published in Proceedings of the 2023 IEEE International Symposium on Biomedical Imaging (ISBI)