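arXivDaily, a daily academic digest: it syncs the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.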

2604.12026 2026-04-15 cs.LG q-bio.BM q-bio.QM

TriFit: Trimodal Fusion with Protein Dynamics for Mutation Fitness Prediction

Seungik Cho

Abstract

Predicting the functional impact of single amino acid substitutions (SAVs) is central to understanding genetic disease and engineering therapeutic proteins. While protein language models and structure-based methods have achieved strong performance on this task, they systematically neglect protein dynamics: residue flexibility, correlated motions, and allosteric coupling are well-established determinants of mutational tolerance in structural biology, yet have not been incorporated into supervised variant effect predictors. We present TriFit, a multimodal framework that integrates sequence, structure, and protein dynamics through a four-expert Mixture-of-Experts (MoE) fusion module with trimodal cross-modal contrastive learning. Sequence embeddings are extracted via masked marginal scoring with ESM-2 (650M); structural embeddings from AlphaFold2-predicted C-alpha geometries; and dynamics embeddings from Gaussian Network Model (GNM) B-factors, mode shapes, and residue-residue cross-correlations. The MoE router adaptively weights modality combinations conditioned on the input, enabling protein-specific fusion without fixed modality assumptions. On the ProteinGym substitution benchmark (217 DMS assays, 696k SAVs), TriFit achieves AUROC 0.897 +/- 0.0002, outperforming all supervised baselines including Kermut (0.864) and ProteinNPT (0.844), and the best zero-shot model ESM3 (0.769). Ablation studies confirm that dynamics provides the largest marginal contribution over pairwise modality combinations, and TriFit achieves well-calibrated probabilistic outputs (ECE = 0.044) without post-hoc correction.
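The calibration claim (ECE = 0.044) refers to expected calibration error: predictions are grouped into confidence bins, and ECE is the sample-weighted average gap between accuracy and mean confidence in each bin. The abstract does not say which ECE variant or bin count TriFit uses, so the following is only a minimal sketch that assumes top-label confidence for a binary classifier and 10 equal-width bins; the function name is hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Top-label ECE for a binary classifier with equal-width confidence bins.

    Assumption (not from the paper): top-label variant, n_bins equal-width bins.
    probs  -- predicted probability of the positive class for each example
    labels -- ground-truth labels (0 or 1)
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and the same length")
    n = len(probs)
    # Each bin accumulates [count, summed confidence, summed correctness].
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p  # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 falls in the last bin
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += 1.0 if pred == y else 0.0
    ece = 0.0
    for count, conf_sum, correct_sum in bins:
        if count:
            # Weight each bin's |accuracy - confidence| gap by its share of samples.
            ece += (count / n) * abs(correct_sum / count - conf_sum / count)
    return ece
```

For example, `expected_calibration_error([0.95] * 4, [1, 1, 1, 1])` is approximately 0.05: every prediction is correct, but the model only claims 95% confidence, so it is slightly underconfident.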

2604.12025 2026-04-15 cs.AI

WiseOWL: A Methodology for Evaluating Ontological Descriptiveness and Semantic Correctness for Ontology Reuse and Ontology Recommendations

Aryan Singh Dalal, Maria Baloch, Asiyah Yu Lin, Anna Maria Masci, Kathleen M. Jagodnik, Hande Kucuk McGinty

Comments 7 pages, 2 figures. Submitted to a conference

2604.12018 2026-04-15 cs.CL cs.AI

LLMs Struggle with Abstract Meaning Comprehension More Than Expected

Hamoud Alhazmi, Jiachen Jiang

2604.12016 2026-04-15 cs.AI cs.LG

Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space

Vladimir Vasilenko

Comments 16 pages, 5 figures. Code and data: https://github.com/b102e/yar-attractor-experiment

2604.12015 2026-04-15 cs.LG cs.CL

UCS: Estimating Unseen Coverage for Improved In-Context Learning

Jiayi Xin, Xiang Li, Evan Qiang, Weiqing He, Tianqi Shang, Weijie J. Su, Qi Long

Comments ACL 2026 Findings; 17 pages, 3 figures

2604.12012 2026-04-15 cs.CV

TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment

Bingyi Cao, Koert Chen, Kevis-Kokitsi Maninis, Kaifeng Chen, Arjun Karpur, Ye Xia, Sahil Dua, Tanmaya Dabral, Guangxing Han, Bohyung Han, Joshua Ainslie, Alex Bewley, Mithun Jacob, René Wagner, Washington Ramos, Krzysztof Choromanski, Mojtaba Seyedhosseini, Howard Zhou, André Araujo

Comments CVPR2026 camera-ready + appendix

2604.12007 2026-04-15 cs.AI

When to Forget: A Memory Governance Primitive

Baris Simsek

Comments 12 pages, 5 figures

2604.12006 2026-04-15 cs.RO

A Foot Resistive Force Model for Legged Locomotion on Muddy Terrains

Xunjie Chen, Liuyin Wang, Xinyan Huang, Jerry Shan, Yantao Shen, Jingang Yi

Comments IEEE/ASME Transactions on Mechatronics (under review)

2604.12005 2026-04-15 cs.LG cs.AI

BayMOTH: Bayesian optiMizatiOn with meTa-lookahead -- a simple approacH

Rahman Ejaz, Varchas Gopalaswamy, Ricardo Luna, Aarne Lees, Vineet Gundecha, Christopher Kanan, Soumyendu Sarkar, Riccardo Betti

2604.11998 2026-04-15 cs.CV cs.AI

The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results

Xingyu Qiu, Yuqian Fu, Jiawei Geng, Bin Ren, Jiancheng Pan, Zongwei Wu, Hao Tang, Yanwei Fu, Radu Timofte, Nicu Sebe, Mohamed Elhoseiny, Lingyi Hong, Mingxi Cheng, Xingqi He, Runze Li, Xingdong Sheng, Wenqiang Zhang, Jiacong Liu, Shu Luo, Yikai Qin, Yaze Zhao, Yongwei Jiang, Yixiong Zou, Zhe Zhang, Yang Yang, Kaiyu Li, Bowen Fu, Zixuan Jiang, Ke Li, Hui Qiao, Xiangyong Cao, Xuanlong Yu, Youyang Sha, Longfei Liu, Di Yang, Xi Shen, Kyeongryeol Go, Taewoong Jang, Saiprasad Meesiyawar, Ravi Kirasur, Rakshita Kulkarni, Bhoomi Deshpande, Harsh Patil, Uma Mudenagudi, Shuming Hu, Chao Chen, Tao Wang, Wei Zhou, Qi Xu, Zhenzhao Xing, Dandan Zhao, Hanzhe Xia, Dongdong Lu, Zhe Zhang, Jingru Wang, Guangwei Huang, Jiachen Tu, Yaokun Shi, Guoyi Xu, Yaoxin Jiang, Jiajia Liu, Liwei Zhou, Bei Dou, Tao Wu, Zekang Fan, Junjie Liu, Adhémar de Senneville, Flavien Armangeon, Mengbers, Yazhe Lyu, Zhimeng Xin, Zijian Zhuang, Hongchun Zhu, Li Wang

Comments accepted by CVPRW 26 @ NTIRE

2604.11996 2026-04-15 cs.CL cs.AI

Filtered Reasoning Score: Evaluating Reasoning Quality on a Model's Most-Confident Traces

Manas Pathak, Xingyao Chen, Shuozhe Li, Amy Zhang, Liu Leqi

Abstract

Should we trust Large Language Models (LLMs) simply because they achieve high accuracy? LLMs achieve high accuracy on reasoning benchmarks, but correctness alone does not reveal the quality of the reasoning used to produce it. This highlights a fundamental limitation of outcome-based evaluation: models may arrive at correct answers through flawed reasoning, and models with substantially different reasoning capabilities can nevertheless exhibit similar benchmark accuracy, for example due to memorization or over-optimization. In this paper, we ask: given existing benchmarks, can we move beyond outcome-based evaluation to assess the quality of reasoning itself? We seek metrics that (1) differentiate models with similar accuracy and (2) are robust to variations in input prompts and generation configurations. To this end, we propose a reasoning score that evaluates reasoning traces along dimensions such as faithfulness, coherence, utility, and factuality. A remaining question is how to aggregate this score across multiple sampled traces. Naively averaging them is undesirable, particularly in long-horizon settings, where the number of possible trajectories grows rapidly and low-confidence correct traces are more likely to be coincidental. To address this, we introduce the Filtered Reasoning Score (FRS), which computes reasoning quality using only the top-K% most confident traces. Under FRS, models that are indistinguishable under standard accuracy exhibit significant differences in reasoning quality. Moreover, models with higher FRS on one benchmark tend to perform better on other reasoning benchmarks, in both accuracy and reasoning quality. Together, these findings suggest that FRS complements accuracy by capturing a model's transferable reasoning capabilities. We open source our evaluation codebase: https://github.com/Manas2006/benchmark_reproducibility.
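The aggregation step of FRS (rank sampled traces by confidence, keep the top K%, average their reasoning scores) can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the `Trace` type, how per-trace confidence is measured, and the rounding rule for the K% cutoff are all hypothetical, and the abstract does not say whether only correct traces are scored, so this sketch uses all traces.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Trace:
    confidence: float       # hypothetical per-trace confidence (e.g. mean token log-prob)
    reasoning_score: float  # hypothetical per-trace quality score (faithfulness, coherence, ...)


def filtered_reasoning_score(traces: Sequence[Trace], top_k_percent: float) -> float:
    """Average reasoning score over the top-K% most confident traces."""
    if not traces:
        raise ValueError("need at least one trace")
    if not 0 < top_k_percent <= 100:
        raise ValueError("top_k_percent must be in (0, 100]")
    # Assumed rounding rule: round the K% cutoff up and always keep at least one trace.
    keep = max(1, math.ceil(len(traces) * top_k_percent / 100))
    ranked = sorted(traces, key=lambda t: t.confidence, reverse=True)
    kept = ranked[:keep]
    return sum(t.reasoning_score for t in kept) / len(kept)
```

With four traces and K = 50, only the two most confident traces contribute, so a confident, well-reasoned pair is not diluted by low-confidence traces the way a plain mean over all four would be.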

2604.11994 2026-04-15 cs.LG math.OC stat.ML

Offline-Online Reinforcement Learning for Linear Mixture MDPs

Zhongjun Zhang, Sean R. Sinclair

Comments 72 pages, 4 figures

2604.11993 2026-04-15 cs.CV physics.optics

Ultra-low-light computer vision using trained photon correlations

Mandar M. Sohoni, Jérémie Laydevant, Mathieu Ouellet, Shi-Yuan Ma, Ryotatsu Yanagimoto, Benjamin A. Ash, Tatsuhiro Onodera, Tianyu Wang, Logan G. Wright, Peter L. McMahon

Comments 49 pages, 47 figures

2604.11992 2026-04-15 cs.RO cs.CV

ReefMapGS: Enabling Large-Scale Underwater Reconstruction by Closing the Loop Between Multimodal SLAM and Gaussian Splatting

Daniel Yang, Jungseok Hong, John J. Leonard, Yogesh Girdhar

2604.11986 2026-04-15 cs.LG

Exploring Concept Subspace for Self-explainable Text-Attributed Graph Learning

Xiaoxue Han, Libo Zhang, Zining Zhu, Yue Ning

2604.11981 2026-04-15 cs.RO

Bipedal-Walking-Dynamics Model on Granular Terrains

Xunjie Chen, Xinyan Huang, Peter Shan, Jingang Yi, Tao Liu

Comments Accepted paper in ICRA 2026

2604.11978 2026-04-15 cs.AI

The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break

Xinyu Jessica Wang, Haoyue Bai, Yiyou Sun, Haorui Wang, Shuibai Zhang, Wenjie Hu, Mya Schroder, Bilge Mutlu, Dawn Song, Robert D Nowak

2604.11975 2026-04-15 cs.RO

M2HRI: An LLM-Driven Multimodal Multi-Agent Framework for Personalized Human-Robot Interaction

Shaid Hasan, Breenice Lee, Sujan Sarker, Tariq Iqbal

2604.11972 2026-04-15 cs.LG

Multi-Head Residual-Gated DeepONet for Coherent Nonlinear Wave Dynamics

Zhiwei Fan, Yiming Pan, Daniel Coca

2604.11971 2026-04-15 cs.LG stat.AP

Classification of Epileptic iEEG using Topological Machine Learning

Sunia Tanweer, Narayan Puthanmadam Subramaniyam, Firas A. Khasawneh

2604.11970 2026-04-15 cs.CV cs.AI cs.CL cs.LG

INDOTABVQA: A Benchmark for Cross-Lingual Table Understanding in Bahasa Indonesia Documents

Somraj Gautam, Anathapindika Dravichi, Gaurav Harit

Comments Accepted in ACL 2026 (Findings)

2604.11969 2026-04-15 cs.AI

Narrative-Driven Paper-to-Slide Generation via ArcDeck

Tarik Can Ozden, Sachidanand VS, Furkan Horoz, Ozgur Kara, Junho Kim, James Matthew Rehg

Comments Project webpage: https://arcdeck.org/

2604.11961 2026-04-15 cs.CV

Fall Risk and Gait Analysis in Community-Dwelling Older Adults using World-Spaced 3D Human Mesh Recovery

Chitra Banarjee, Patrick Kwon, Ania Lipat, Rui Xie, Chen Chen, Ladda Thiamwong

Comments Work was accepted at Computer Vision for Biomechanics Workshop (CVBW) at CVPR 2026

2604.11948 2026-04-15 cs.LG cs.AR

Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores

Yixian Shen, Chaoyao Shen, Jan Deen, George Floros, Andy Pimentel, Anuj Pathania

Comments Accepted for publication at the 63rd ACM/IEEE Design Automation Conference (DAC 2026)

2604.11947 2026-04-15 cs.LG cs.AI cs.DC

ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism

Alan Aboudib, Rodrigo Lopez Portillo A., Kalei Brady, Steffen Cruz

2604.11945 2026-04-15 cs.LG cs.AI cs.MA

AutoSurrogate: An LLM-Driven Multi-Agent Framework for Autonomous Construction of Deep Learning Surrogate Models in Subsurface Flow

Jiale Liu, Nanzhe Wang

Abstract

High-fidelity numerical simulation of subsurface flow is computationally intensive, especially for many-query tasks such as uncertainty quantification and data assimilation. Deep learning (DL) surrogates can significantly accelerate forward simulations, yet constructing them, from architecture design to hyperparameter tuning, requires substantial machine learning (ML) expertise that most domain scientists do not possess. Furthermore, the process is predominantly manual and relies heavily on heuristic choices. This expertise gap remains a key barrier to the broader adoption of DL surrogate techniques. To address it, we present AutoSurrogate, a large-language-model-driven multi-agent framework that enables practitioners without ML expertise to build high-quality surrogates for subsurface flow problems through natural-language instructions. Given simulation data and optional preferences, four specialized agents collaboratively execute data profiling, architecture selection from a model zoo, Bayesian hyperparameter optimization, model training, and quality assessment against user-specified thresholds. The system also handles common failure modes autonomously, including restarting training with adjusted configurations when numerical instabilities occur and switching to alternative architectures when predictive accuracy falls short of targets. In our setting, a single natural-language sentence can be sufficient to produce a deployment-ready surrogate model, with minimal human intervention at any intermediate stage. We demonstrate the utility of AutoSurrogate on a 3D geological carbon storage modeling task, mapping permeability fields to pressure and CO₂ saturation fields over 31 timesteps. Without any manual tuning, AutoSurrogate outperforms expert-designed baselines and domain-agnostic AutoML methods, demonstrating strong potential for practical deployment.
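The failure handling described above (restart on numerical instability, switch architecture when accuracy misses the target) amounts to a retry-and-fallback control loop. The sketch below is a hypothetical illustration, not AutoSurrogate's implementation: `train_fn`, the result types, and the recovery rule of dividing the learning rate by 10 after a divergence are all assumptions, and the LLM agents and Bayesian hyperparameter search are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TrainResult:
    diverged: bool    # True if training hit a numerical instability (e.g. NaN loss)
    val_error: float  # validation error of the trained surrogate


@dataclass
class Attempt:
    architecture: str
    learning_rate: float
    result: TrainResult


def build_surrogate(
    architectures: List[str],
    train_fn: Callable[[str, float], TrainResult],  # hypothetical training callback
    error_threshold: float,
    initial_lr: float = 1e-3,
    max_restarts: int = 3,
) -> Tuple[Optional[str], List[Attempt]]:
    """Try architectures in order; restart on divergence, switch on poor accuracy."""
    attempts: List[Attempt] = []
    for arch in architectures:
        lr = initial_lr
        for _ in range(max_restarts + 1):
            result = train_fn(arch, lr)
            attempts.append(Attempt(arch, lr, result))
            if result.diverged:
                lr /= 10.0  # assumed recovery rule: retry with a smaller learning rate
                continue
            if result.val_error <= error_threshold:
                return arch, attempts  # quality gate passed
            break  # stable but not accurate enough: fall back to the next architecture
    return None, attempts  # no candidate met the user-specified threshold
```

Keeping the full attempt log alongside the selected architecture makes each automatic restart or fallback auditable afterwards, which matters when no human reviews the intermediate stages.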

2604.11944 2026-04-15 cs.LG q-bio.QM

A unified data format for managing diabetes time-series data: DIAbetes eXchange (DIAX)

Elliott C. Pryor, Marc D. Breton, Anas El Fathi

Comments 7 pages, 2 figures

2604.11932 2026-04-15 cs.CV

EigenCoin: sassanid coins classification based on Bhattacharyya distance

Rahele Allahverdi, Mohammad Mahdi Dehshibi, Azam Bastanfard, Daryoosh Akbarzadeh

Comments 2nd World Conference on Information Technology (WCIT-2011)

2604.11928 2026-04-15 cs.LG cs.CR

INTARG: Informed Real-Time Adversarial Attack Generation for Time-Series Regression

Gamze Kirman Tokgoz, Onat Gungor, Tajana Rosing, Baris Aksanli

2604.11927 2026-04-15 cs.CV

A Workflow to Efficiently Generate Dense Tissue Ground Truth Masks for Digital Breast Tomosynthesis

Tamerlan Mustafaev, Oleg Kruglov, Margarita Zuley, Luana de Mero Omena, Guilherme Muniz de Oliveira, Vitor de Sousa Franca, Bruno Barufaldi, Robert Nishikawa, Juhun Lee