arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2603.15723 2026-03-18 cs.AI

Context-Length Robustness in Question Answering Models: A Comparative Empirical Study

Trishita Dhara, Siddhesh Sheth

Comments Accepted for Oral Presentation at Math AI 2026 conference

详情

英文摘要

Large language models are increasingly deployed in settings where relevant information is embedded within long and noisy contexts. Despite this, robustness to growing context length remains poorly understood across different question answering tasks. In this work, we present a controlled empirical study of context-length robustness in large language models using two widely used benchmarks: SQuAD and HotpotQA. We evaluate model accuracy as a function of total context length by systematically increasing the amount of irrelevant context while preserving the answer-bearing signal. This allows us to isolate the effect of context length from changes in task difficulty. Our results show a consistent degradation in performance as context length increases, with substantially larger drops observed on multi-hop reasoning tasks compared to single-span extraction tasks. In particular, HotpotQA exhibits nearly twice the accuracy degradation of SQuAD under equivalent context expansions. These findings highlight task-dependent differences in robustness and suggest that multi-hop reasoning is especially vulnerable to context dilution. We argue that context-length robustness should be evaluated explicitly when assessing model reliability, especially for applications involving long documents or retrieval-augmented generation.

URL PDF HTML ☆

赞 0 踩 0

2603.15713 2026-03-18 cs.LG cs.AI cs.IR

Embedding-Aware Feature Discovery: Bridging Latent Representations and Interpretable Features in Event Sequences

Artem Sakhno, Ivan Sergeev, Alexey Shestov, Omar Zoloev, Elizaveta Kovtun, Gleb Gusev, Andrey Savchenko, Maksim Makarenko

2603.15711 2026-03-18 cs.AI cs.IR q-bio.QM

Knowledge Graph Extraction from Biomedical Literature for Alkaptonuria Rare Disease

Giang Pham, Rebecca Finetti, Caterina Graziani, Bianca Roncaglia, Asma Bendjeddou, Linda Brodo, Sara Brunetti, Moreno Falaschi, Stefano Forti, Silvia Giulia Galfré, Paolo Milazzo, Corrado Priami, Annalisa Santucci, Ottavia Spiga, Alina Sîrbu

2603.15708 2026-03-18 cs.LG cs.AI

Mastering the Minority: An Uncertainty-guided Multi-Expert Framework for Challenging-tailed Sequence Learning

Ye Wang, Zixuan Wu, Lifeng Shen, Jiang Xie, Xiaoling Wang, Hong Yu, Guoyin Wang

2603.15696 2026-03-18 cs.LG cs.AI

Tackling Over-smoothing on Hypergraphs: A Ricci Flow-guided Neural Diffusion Approach

Mengyao Zhou, Zhiheng Zhou, Xiao Han, Xingqin Qi, Guanghui Wang, Guiying Yan

2603.15689 2026-03-18 cs.LG cs.AI cs.CV

Transition Flow Matching

Chenrui Ma

2603.15688 2026-03-18 cs.SD cs.LG

PulmoVec: A Two-Stage Stacking Meta-Learning Architecture Built on the HeAR Foundation Model for Multi-Task Classification of Pediatric Respiratory Sounds

Izzet Turkalp Akbasli, Oguzhan Serin

Comments 14 pages, 2 figures, 4 tables; supplementary material included (4 tables, 3 multi-panel figures)

2603.15687 2026-03-18 cs.LG cs.AI

Evidential Domain Adaptation for Remaining Useful Life Prediction with Incomplete Degradation

Yubo Hou, Mohamed Ragab, Yucheng Wang, Min Wu, Abdulla Alseiari, Chee-Keong Kwoh, Xiaoli Li, Zhenghua Chen

详情

DOI: 10.1109/TIM.2025.3551977
Journal ref: IEEE Transactions on Instrumentation and Measurement (2025) IEEE Transactions on Instrumentation and Measurement IEEE Transactions on Instrumentation and Measurement IEEE Transactions on Instrumentation and Measurement

英文摘要

Accurate Remaining Useful Life (RUL) prediction without labeled target domain data is a critical challenge, and domain adaptation (DA) has been widely adopted to address it by transferring knowledge from a labeled source domain to an unlabeled target domain. Despite its success, existing DA methods struggle significantly when faced with incomplete degradation trajectories in the target domain, particularly due to the absence of late degradation stages. This missing data introduces a key extrapolation challenge. When applied to such incomplete RUL prediction tasks, current DA methods encounter two primary limitations. First, most DA approaches primarily focus on global alignment, which can misaligns late degradation stage in the source domain with early degradation stage in the target domain. Second, due to varying operating conditions in RUL prediction, degradation patterns may differ even within the same degradation stage, resulting in different learned features. As a result, even if degradation stages are partially aligned, simple feature matching cannot fully align two domains. To overcome these limitations, we propose a novel evidential adaptation approach called EviAdapt, which leverages evidential learning to enhance domain adaptation. The method first segments the source and target domain data into distinct degradation stages based on degradation rate, enabling stage-wise alignment that ensures samples from corresponding stages are accurately matched. To address the second limitation, we introduce an evidential uncertainty alignment technique that estimates uncertainty using evidential learning and aligns the uncertainty across matched stages.

URL PDF HTML ☆

赞 0 踩 0

2603.15681 2026-03-18 cs.LG

Flood Risk Follows Valleys, Not Grids: Graph Neural Networks for Flash Flood Susceptibility Mapping in Himachal Pradesh with Conformal Uncertainty Quantification

Paras Sharma, Swastika Sharma

Comments 28 pages, 10 figures, 2 tables. Code and data at https://github.com/Parassharmaa/flash-flood-zones-hp

2603.15677 2026-03-18 cs.CL

MedArena: Comparing LLMs for Medicine-in-the-Wild Clinician Preferences

Eric Wu, Kevin Wu, Jason Hom, Paul H. Yi, Angela Zhang, Alejandro Lozano, Jeff Nirschl, Jeff Tangney, Kevin Byram, Braydon Dymm, Narender Annapureddy, Eric Topol, David Ouyang, James Zou

2603.15668 2026-03-18 cs.AI cs.CR quant-ph

Quantum-Secure-By-Construction (QSC): A Paradigm Shift For Post-Quantum Agentic Intelligence

Arit Kumar Bishwas, Mousumi Sen, Albert Nieto-Morales, Joel Jacob Varghese

2603.15667 2026-03-18 cs.AI cs.CE

A Dynamic Survey of Fuzzy, Intuitionistic Fuzzy, Neutrosophic, Plithogenic, and Extensional Sets

Takaaki Fujita, Florentin Smarandache

Comments Book. Peer-reviewed. Publisher: Neutrosophic Science International Association (NSIA). ISBN: 978-1-59973-842-0

2603.15666 2026-03-18 cs.AI

Compiled Memory: Not More Information, but More Precise Instructions for Language Agents

James Rhodes, George Kang

2603.15665 2026-03-18 cs.AI

QV May Be Enough: Toward the Essence of Attention in LLMs

Zhang Edward

2603.15663 2026-03-18 cs.CV cs.AI

OrthoAI v2: From Single-Agent Segmentation to Dual-Agent Treatment Planning for Clear Aligners

Lansiaux Edouard, Leman Margaux

2603.15661 2026-03-18 cs.AI

DynaTrust: Defending Multi-Agent Systems Against Sleeper Agents via Dynamic Trust Graphs

Yu Li, Qiang Hu, Yao Zhang, Lili Quan, Jiongchi Yu, Junjie Wang

Comments 9 pages

2603.15658 2026-03-18 cs.AI cs.CL cs.IR

Did You Check the Right Pocket? Cost-Sensitive Store Routing for Memory-Augmented Agents

Madhava Gaikwad

Comments accepted in ICLR 2026 Workshop on Memory for LLM-Based Agentic Systems

2603.15656 2026-03-18 cs.LG cs.AI cs.CV

Attribution-Guided Model Rectification of Unreliable Neural Network Behaviors

Peiyu Yang, Naveed Akhtar, Jiantong Jiang, Ajmal Mian

Comments Accepted to CVPR 2026

2603.15655 2026-03-18 cs.LG cs.AI cs.CL cs.IT cs.MA math.IT

Beyond Reward Suppression: Reshaping Steganographic Communication Protocols in MARL via Dynamic Representational Circuit Breaking

Liu Hung Ming

Comments 38 pages, includes 5 figures and 8 tables, preliminary version, AI safety / multi-agent reinforcement learning

详情

英文摘要

In decentralized Multi-Agent Reinforcement Learning (MARL), steganographic collusion -- where agents develop private protocols to evade monitoring -- presents a critical AI safety threat. Existing defenses, limited to behavioral or reward layers, fail to detect coordination in latent communication channels. We introduce the Dynamic Representational Circuit Breaker (DRCB), an architectural defense operating at the optimization substrate. Building on the AI Mother Tongue (AIM) framework, DRCB utilizes a Vector Quantized Variational Autoencoder (VQ-VAE) bottleneck to convert unobservable messages into auditable statistical objects. DRCB monitors signals including Jensen-Shannon Divergence drift, L2-norm codebook displacement, and Randomized Observer Pool accuracy to compute an EMA-based Collusion Score. Threshold breaches trigger four escalating interventions: dynamic adaptation, gradient-space penalty injection into the Advantage function A^pi, temporal reward suppression, and full substrate circuit breaking via codebook shuffling and optimizer state reset. Experiments on a Contextual Prisoner's Dilemma with MNIST labels show that while static monitoring fails (p = 0.3517), DRCB improves observer mean accuracy from 0.858 to 0.938 (+9.3 percent) and reduces volatility by 43 percent, while preserving mean joint reward (p = 0.854). Analysis of 214,298 symbol samples confirms "Semantic Degradation," where high-frequency sequences converge to zero entropy, foreclosing complex steganographic encodings. We identify a "Transparency Paradox" where agents achieve surface-level determinism while preserving residual capacity in long-tail distributions, reflecting Goodhart's Law. This task-agnostic methodology provides a technical path toward MICA-compliant (Multi-Agent Internal Coupling Audit) pre-deployment auditing for autonomous systems.

URL PDF HTML ☆

赞 0 踩 0

2603.15654 2026-03-18 cs.LG cs.AI cs.CV

Discovering the Hidden Role of Gini Index In Prompt-based Classification

Ruixi Lin

2603.15653 2026-03-18 cs.CL cs.AI cs.LG

Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context

Keivan Alizadeh, Parshin Shojaee, Minsik Cho, Mehrdad Farajtabar

Comments preprint

详情

英文摘要

Long-context handling remains a core challenge for language models: even with extended context windows, models often fail to reliably extract, reason over, and use the information across long contexts. Recent works like Recursive Language Models (RLM) have approached this challenge by agentic way of decomposing long contexts into recursive sub-calls through programmatic interaction at inference. While promising, the success of RLM critically depends on how these context-interaction programs are selected, which has remained largely unexplored. In this paper, we study this problem and introduce SRLM, a framework that augments programmatic context interaction with uncertainty-aware Self-Reflection. SRLM leverages three intrinsic signals: self consistency, reasoning length, and verbalized confidence. These serve as complementary indicators of a model's internal uncertainty, and the model uses them to evaluate and compare candidate context-interaction programs. Extensive experiments across diverse benchmark datasets, context lengths, and backbone models, show that SRLM consistently outperforms state-of-the-art baselines, yielding up to 22% improvement over RLM under the same time budget. Our findings show that recursion itself is not the primary driver of performance in RLM, and a simple self-reflective program search can match or surpass RLM without requiring self-query or explicit recursion mechanisms. We find that for context lengths within the model's window, RLMs with recursion often degrade performance relative to the base model, whereas SRLM yields consistent gains across both short and long contexts. We also find that RLM is less effective in tasks with semantically intensive nature, where heuristic program search is insufficient and broader contextual understanding is required, while self-reflection in SRLM provides a semantic signal that better steers reasoning in these scenarios.

URL PDF HTML ☆

赞 0 踩 0

2603.15651 2026-03-18 cs.LG cs.AI

A federated learning framework with knowledge graph and temporal transformer for early sepsis prediction in multi-center ICUs

Yue Chang, Guangsen Lin, Jyun Jie Chuang, Shunqi Liu, Xinkui Li, Yaozheng Li

2603.15650 2026-03-18 cs.LG cs.CV

How to Achieve Prototypical Birth and Death for OOD Detection?

Ningkang Peng, Qianfeng Yu, Xiaoqian Peng, Linjing Qian, Yafei Liu, Canran Xiao, Xinyu Lu, Tingyu Lu, Zhichao Zheng, Yanhui Gu

2603.15648 2026-03-18 cs.CV cs.GR cs.LG cs.MM

Improving Generative Adversarial Network Generalization for Facial Expression Synthesis

Arbish Akram, Nazar Khan, Arif Mahmood

2603.15647 2026-03-18 cs.LG cs.AI

Steering Frozen LLMs: Adaptive Social Alignment via Online Prompt Routing

Zeyu Zhang, Xiangxiang Dai, Ziyi Han, Xutong Liu, John C. S. Lui

2603.15645 2026-03-18 cs.LG cs.AI

XLinear: Frequency-Enhanced MLP with CrossFilter for Robust Long-Range Forecasting

Xiang Ao

Comments 8 pages, 7 figures. Accepted and published in 2025 5th International Conference on Artificial Intelligence, Automation and High Performance Computing (AIAHPC)

2603.15644 2026-03-18 cs.LG cs.CL

Tokenization Tradeoffs in Structured EHR Foundation Models

Lin Lawrence Guo, Santiago Eduardo Arciniegas, Joseph Jihyung Lee, Adam Paul Yan, George Tomlinson, Jason Fries, Lillian Sung

2603.15642 2026-03-18 cs.AI

CraniMem: Cranial Inspired Gated and Bounded Memory for Agentic Systems

Pearl Mody, Mihir Panchal, Rishit Kar, Kiran Bhowmick, Ruhina Karani

Comments International Conference on Learning Representations 2026 Workshop on Memory for LLM-Based Agentic Systems (MemAgents)

2603.15641 2026-03-18 cs.AI cs.LG cs.NE

Form Follows Function: Recursive Stem Model

Navid Hakimi

Comments 11 pages, 9 figures

详情

英文摘要

Recursive reasoning models such as Hierarchical Reasoning Model (HRM) and Tiny Recursive Model (TRM) show that small, weight-shared networks can solve compute-heavy and NP puzzles by iteratively refining latent states, but their training typically relies on deep supervision and/or long unrolls that increase wall-clock cost and can bias the model toward greedy intermediate behavior. We introduce Recursive Stem Model (RSM), a recursive reasoning approach that keeps the TRM-style backbone while changing the training contract so the network learns a stable, depth-agnostic transition operator. RSM fully detaches the hidden-state history during training, treats early iterations as detached "warm-up" steps, and applies loss only at the final step. We further grow the outer recursion depth $H$ and inner compute depth $L$ independently and use a stochastic outer-transition scheme (stochastic depth over $H$) to mitigate instability when increasing depth. This yields two key capabilities: (i) $>20\times$ faster training than TRM while improving accuracy ($\approx 5\times$ reduction in error rate), and (ii) test-time scaling where inference can run for arbitrarily many refinement steps ($\sim 20,000 H_{\text{test}} \gg 20 H_{\text{train}}$), enabling additional "thinking" without retraining. On Sudoku-Extreme, RSM reaches 97.5% exact accuracy with test-time compute (within ~1 hour of training on a single A100), and on Maze-Hard ($30 \times 30$) it reaches ~80% exact accuracy in ~40 minutes using attention-based instantiation. Finally, because RSM implements an iterative settling process, convergence behavior provides a simple, architecture-native reliability signal: non-settling trajectories warn that the model has not reached a viable solution and can be a guard against hallucination, while stable fixed points can be paired with domain verifiers for practical correctness checks.

URL PDF HTML ☆

赞 0 踩 0

2603.15634 2026-03-18 cs.AI cs.IR cs.LG

NextMem: Towards Latent Factual Memory for LLM-based Agents

Zeyu Zhang, Rui Li, Xiaoyan Zhao, Yang Zhang, Wenjie Wang, Xu Chen, Tat-Seng Chua

Comments 17 pages, 7 figures, 4 tables