arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2604.00812 2026-04-02 cs.LG

Cost-Penalized Fitness in FMA-Orchestrated Mixture of Experts: Experimental Evidence for Molecular Memory in Domain Adaptation

Martin Jaraiz

Comments 10 pages, 3 figures, draft

详情

英文摘要

We present experimental results from seven controlled runs of nanoFMT, a Free-Market Algorithm (FMA) orchestrated transformer with dynamic Mixture-of-Experts (MoE) management. The experiments address a fundamental question for advanced LLM development: how should an MoE system manage its expert pool when operating at full capacity under changing data distributions? We demonstrate that cost-penalized fitness metrics, combined with a linear grace period for newborn experts, produce a system that accumulates domain expertise through diversification rather than replacement. The central result is a round-trip domain shift experiment showing 9-11x faster recovery when returning to a previously learned domain, with zero expert births or replacements required. This "molecular memory" effect -- where dormant experts survive and reactivate when their domain returns -- has no analogue in current MoE management approaches. A preliminary cost analysis estimates annual savings of $39.1M and 27.1 GWh energy reduction for an OpenAI-scale provider under a moderate scenario.

URL PDF HTML ☆

赞 0 踩 0

2604.00804 2026-04-02 cs.RO cs.CV

Compact Keyframe-Optimized Multi-Agent Gaussian Splatting SLAM

Monica M. Q. Li, Pierre-Yves Lajoie, Jialiang Liu, Giovanni Beltrame

2604.00801 2026-04-02 cs.LG cs.AI cs.CL

Routing-Free Mixture-of-Experts

Yilun Liu, Jinru Han, Sikuan Yan, Volker Tresp, Yunpu Ma

Comments Code is available at https://github.com/liuyilun2000/RoutingFreeMoE/tree/release

2604.00800 2026-04-02 cs.LG

MIRANDA: MId-feature RANk-adversarial Domain Adaptation toward climate change-robust ecological forecasting with deep learning

Yuchang Jiang, Jan Dirk Wegner, Vivien Sainte Fare Garnot

Comments EarthVision CVPRW 2026

2604.00795 2026-04-02 cs.AI cs.LG

Preference Guided Iterated Pareto Referent Optimisation for Accessible Route Planning

Paolo Speziali, Arno De Greef, Mehrdad Asadi, Willem Röpke, Ann Nowé, Diederik M. Roijers

2604.00792 2026-04-02 cs.CV

HICT: High-precision 3D CBCT reconstruction from a single X-ray

Wen Ma, Jiaxiang Liu, Zikai Xiao, Ziyang Wang, Feng Yang, Zuozhu Liu

2604.00790 2026-04-02 cs.AI

RefineRL: Advancing Competitive Programming with Self-Refinement Reinforcement Learning

Shaopeng Fu, Xingxing Zhang, Li Dong, Di Wang, Furu Wei

2604.00788 2026-04-02 cs.AI cs.CR

UK AISI Alignment Evaluation Case-Study

Alexandra Souly, Robert Kirk, Jacob Merizian, Abby D'Cruz, Xander Davies

2604.00785 2026-04-02 cs.LG cs.AI cs.DC

Scalable Pretraining of Large Mixture of Experts Language Models on Aurora Super Computer

Dharma Teja Vooturi, Dhiraj Kalamkar, Dipankar Das, Bharat Kaul

2604.00784 2026-04-02 cs.CV

An Approach to Enriching Surgical Video Datasets for Fine-Grained Spatial-Temporal Understanding of Vision-Language Models

Lennart Maack, Alexander Schlaefer

2604.00778 2026-04-02 cs.CL

From Early Encoding to Late Suppression: Interpreting LLMs on Character Counting Tasks

Ayan Datta, Mounika Marreddy, Alexander Mehler, Zhixue Zhao, Radhika Mamidi

详情

英文摘要

Large language models (LLMs) exhibit failures on elementary symbolic tasks such as character counting in a word, despite excelling on complex benchmarks. Although this limitation has been noted, the internal reasons remain unclear. We use character counting (e.g., "How many p's are in apple?") as a minimal, controlled probe that isolates token-level reasoning from higher-level confounds. Using this setting, we uncover a consistent phenomenon across modern architectures, including LLaMA, Qwen, and Gemma: models often compute the correct answer internally yet fail to express it at the output layer. Through mechanistic analysis combining probing classifiers, activation patching, logit lens analysis, and attention head tracing, we show that character-level information is encoded in early and mid-layer representations. However, this information is attenuated by a small set of components in later layers, especially the penultimate and final layer MLP. We identify these components as negative circuits: subnetworks that downweight correct signals in favor of higher-probability but incorrect outputs. Our results lead to two contributions. First, we show that symbolic reasoning failures in LLMs are not due to missing representations or insufficient scale, but arise from structured interference within the model's computation graph. This explains why such errors persist and can worsen under scaling and instruction tuning. Second, we provide evidence that LLM forward passes implement a form of competitive decoding, in which correct and incorrect hypotheses coexist and are dynamically reweighted, with final outputs determined by suppression as much as by amplification. These findings carry implications for interpretability and robustness: simple symbolic reasoning exposes weaknesses in modern LLMs, underscoring need for design strategies that ensure information is encoded and reliably used.

URL PDF HTML ☆

赞 0 踩 0

2604.00773 2026-04-02 cs.CL

From Baselines to Preferences: A Comparative Study of LoRA/QLoRA and Preference Optimization for Mental Health Text Classification

Mihael Arcan

2604.00770 2026-04-02 cs.LG cs.AI

Thinking Wrong in Silence: Backdoor Attacks on Continuous Latent Reasoning

Swapnil Parekh

2604.00767 2026-04-02 cs.LG

ActivityNarrated: An Open-Ended Narrative Paradigm for Wearable Human Activity Understanding

Lala Shakti Swarup Ray, Mengxi Liu, Alcina Pinto, Deepika Gurung, Daniel Geissler, Paul Lukowoicz, Bo Zhou

详情

英文摘要

Wearable HAR has improved steadily, but most progress still relies on closed-set classification, which limits real-world use. In practice, human activity is open-ended, unscripted, personalized, and often compositional, unfolding as narratives rather than instances of fixed classes. We argue that addressing this gap does not require simply scaling datasets or models. It requires a fundamental shift in how wearable HAR is formulated, supervised, and evaluated. This work shows how to model open-ended activity narratives by aligning wearable sensor data with natural-language descriptions in an open-vocabulary setting. Our framework has three core components. First, we introduce a naturalistic data collection and annotation pipeline that combines multi-position wearable sensing with free-form, time-aligned narrative descriptions of ongoing behavior, allowing activity semantics to emerge without a predefined vocabulary. Second, we define a retrieval-based evaluation framework that measures semantic alignment between sensor data and language, enabling principled evaluation without fixed classes while also subsuming closed-set classification as a special case. Third, we present a language-conditioned learning architecture that supports sensor-to-text inference over variable-length sensor streams and heterogeneous sensor placements. Experiments show that models trained with fixed-label objectives degrade sharply under real-world variability, while open-vocabulary sensor-language alignment yields robust and semantically grounded representations. Once this alignment is learned, closed-set activity recognition becomes a simple downstream task. Under cross-participant evaluation, our method achieves 65.3% Macro-F1, compared with 31-34% for strong closed-set HAR baselines. These results establish open-ended narrative modeling as a practical and effective foundation for real-world wearable HAR.

URL PDF HTML ☆

赞 0 踩 0

2604.00761 2026-04-02 cs.CV cs.CR

PrivHAR-Bench: A Graduated Privacy Benchmark Dataset for Video-Based Action Recognition

Samar Ansari

2604.00757 2026-04-02 cs.CV cs.AI

IWP: Token Pruning as Implicit Weight Pruning in Large Vision Language Models

Dong-Jae Lee, Sunghyun Baek, Junmo Kim

2604.00754 2026-04-02 cs.CL cs.LG

Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention

Zehao Jin, Yanan Sui

2604.00752 2026-04-02 cs.RO cs.HC

A wearable haptic device for edge and surface simulation

Rui Chen, Xianlong Mai, Alireza Sanaei, Domenico Chiaradia, Antonio Frisoli, Daniele Leonardis

2604.00744 2026-04-02 cs.RO

How to Train your Tactile Model: Tactile Perception with Multi-fingered Robot Hands

Christopher J. Ford, Kaichen Shi, Laura Butcher, Nathan F. Lepora, Efi Psomopoulou

Comments Accepted for publication at the International Conference on Robotics and Automation (ICRA) 2026, Vienna

2604.00739 2026-04-02 cs.LG cs.AI

BioCOMPASS: Integrating Biomarkers into Transformer-Based Immunotherapy Response Prediction

Sayed Hashim, Frank Soboczenski, Paul Cairns

2604.00738 2026-04-02 cs.RO

SoftHand Model-W: A 3D-Printed, Anthropomorphic, Underactuated Robot Hand with Integrated Wrist and Carpal Tunnel

Dhillon B. Merritt, Christopher J. Ford, Haoran Li, Malia Smith, Zhixing Chen, Efi Psomopoulou, Nathan F. Lepora

Comments Accepted for publication at the International Conference of Robotics and Automation (ICRA) 2026, Vienna

2604.00726 2026-04-02 cs.LG

Exploring Silent Data Corruption as a Reliability Challenge in LLM Training

Anton Altenbernd, Philipp Wiesner, Odej Kao

Comments 10 Pages, 4 Figures, CCGrid 2026

2604.00725 2026-04-02 cs.CV cs.LG

A Benchmark of State-Space Models vs. Transformers and BiLSTM-based Models for Historical Newspaper OCR

Merveilles Agbeti-messan, Thierry Paquet, Clément Chatelain, Pierrick Tranouez, Stéphane Nicolas

2604.00722 2026-04-02 cs.CL

LangMARL: Natural Language Multi-Agent Reinforcement Learning

Huaiyuan Yao, Longchao Da, Xiaoou Liu, Charles Fleming, Tianlong Chen, Hua Wei

Comments 20 pages, 12 figures

2604.00716 2026-04-02 cs.AI cs.LG

CircuitProbe: Predicting Reasoning Circuits in Transformers via Stability Zone Detection

Rajkiran Panuganti

Comments 11 pages, 1 figure, 3 tables. Code available at https://github.com/agenticclass/circuitprobe

2604.00715 2026-04-02 cs.CL cs.AI cs.LG

To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining

Karan Singh, Michael Yu, Varun Gangal, Zhuofu Tao, Sachin Kumar, Emmy Liu, Steven Y. Feng

Comments Code and data at https://github.com/DegenAI-Labs/RAG-scaling-laws

2604.00698 2026-04-02 cs.LG cs.AI cs.CL

Learning to Hint for Reinforcement Learning

Yu Xia, Canwen Xu, Zhewei Yao, Julian McAuley, Yuxiong He

详情

英文摘要

Group Relative Policy Optimization (GRPO) is widely used for reinforcement learning with verifiable rewards, but it often suffers from advantage collapse: when all rollouts in a group receive the same reward, the group yields zero relative advantage and thus no learning signal. For example, if a question is too hard for the reasoner, all sampled rollouts can be incorrect and receive zero reward. Recent work addresses this issue by adding hints or auxiliary scaffolds to such hard questions so that the reasoner produces mixed outcomes and recovers a non-zero update. However, existing hints are usually fixed rather than adapted to the current reasoner, and a hint that creates learning signal under the hinted input does not necessarily improve the no-hint policy used at test time. To this end, we propose Hint Learning for Reinforcement Learning (HiLL), a framework that jointly trains a hinter policy and a reasoner policy during RL. For each hard question, the hinter generates hints online conditioned on the current reasoner's incorrect rollout, allowing hint generation to adapt to the reasoner's evolving errors. We further introduce hint reliance, which measures how strongly correct hinted trajectories depend on the hint. We derive a transferability result showing that lower hint reliance implies stronger transfer from hinted success to no-hint success, and we use this result to define a transfer-weighted reward for training the hinter. Therefore, HiLL favors hints that not only recover informative GRPO groups, but also produce signals that are more likely to improve the original no-hint policy. Experiments across multiple benchmarks show that HiLL consistently outperforms GRPO and prior hint-based baselines, demonstrating the value of adaptive and transfer-aware hint learning for RL. The code is available at https://github.com/Andree-9/HiLL.

URL PDF HTML ☆

赞 0 踩 0

2604.00696 2026-04-02 cs.CV

TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning

Soumya Shamarao Jahagirdar, Edson Araujo, Anna Kukleva, M. Jehanzeb Mirza, Saurabhchand Bhati, Samuel Thomas, Brian Kingsbury, Rogerio Feris, James R. Glass, Hilde Kuehne

2604.00689 2026-04-02 cs.LG cs.NA math.NA

Performance of Neural and Polynomial Operator Surrogates

Josephine Westermann, Benno Huber, Thomas O'Leary-Roseberry, Jakob Zech

Comments 44 pages, 21 figures

2604.00686 2026-04-02 cs.LG

Full-Gradient Successor Feature Representations

Ritish Shrirao, Aditya Priyadarshi, Raghuram Bharadwaj Diddigi

Comments Submitted to IEEE CDC 2026