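arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.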

2604.27124 2026-05-01 cs.LG q-bio.QM

Better Models, Faster Training: Sigmoid Attention for single-cell Foundation Models

Vijay Sadashivaiah, Georgios Dasoulas, Judith Mueller, Soumya Ghosh

Abstract

Training stable biological foundation models requires rethinking attention mechanisms. We find that using sigmoid attention as a drop-in replacement for softmax attention yields (a) better learned representations: on six diverse single-cell datasets, sigmoid achieves 25% higher cell-type separation, better cell-type cohesion metrics, and lower validation loss; (b) faster training: models with sigmoid attention train up to 10% faster than their softmax counterparts; and (c) more stable training, by eliminating inherent sources of instability in softmax attention. We establish that sigmoid attention has globally bounded derivatives ($\leq 0.25$), as opposed to softmax, and a diagonal Jacobian structure, in contrast with softmax's dense coupling; together these help alleviate training instabilities. In stress tests on 160M-parameter bidirectional attention models trained without gradient clipping on 8K-token sequences, softmax diverges catastrophically, with gradients exploding by four orders of magnitude, while sigmoid remains stable. Finally, we implement and open-source TritonSigmoid, an efficient GPU kernel that achieves 515 TFLOPS on H100 GPUs, outperforming both FlashAttention-2 and FlashSigmoid, with native padding support, which is essential for biological sequences. Our results establish sigmoid attention as both theoretically grounded and empirically superior for biological foundation models. Code is available at https://github.com/MSDLLCpapers/triton-sigmoid
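
The two structural properties named in the abstract (an elementwise derivative bounded by 0.25 and a diagonal Jacobian) can be checked numerically on a toy score vector. The sketch below is an illustration in plain Python, not the authors' TritonSigmoid kernel; the function names and the example scores are assumptions made here, and it makes no claim about the stability results reported above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_jacobian(scores):
    # Sigmoid is applied elementwise, so d sigma(x_i) / d x_j = 0 whenever i != j.
    n = len(scores)
    return [[sigmoid(scores[i]) * (1.0 - sigmoid(scores[i])) if i == j else 0.0
             for j in range(n)] for i in range(n)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(scores):
    # d p_i / d x_j = p_i * (delta_ij - p_j): the normalizer couples every pair (i, j).
    p = softmax(scores)
    n = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical attention scores for a single query (illustrative values only).
scores = [2.0, -1.0, 0.5]
js = sigmoid_jacobian(scores)
jm = softmax_jacobian(scores)

# The sigmoid slope s(x) * (1 - s(x)) peaks at x = 0, where it equals 0.25.
max_sigmoid_slope = max(sigmoid(i / 10) * (1.0 - sigmoid(i / 10))
                        for i in range(-100, 101))

print("sigmoid Jacobian:", js)
print("softmax Jacobian:", jm)
print("max sigmoid slope on [-10, 10]:", max_sigmoid_slope)
```

In the sigmoid Jacobian every off-diagonal entry is exactly zero, whereas every entry of the softmax Jacobian is nonzero for these scores, which is the "dense coupling" the abstract contrasts it with.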

2604.27122 2026-05-01 cs.CV

InterPartAbility: Text-Guided Part Matching for Interpretable Person Re-Identification

Shakeeb Murtaza, Aryan Shukla, Rajarshi Bhattacharya, Maguelonne Heritier, Eric Granger

2604.27118 2026-05-01 cs.RO cs.AI

PALCAS: A Priority-Aware Intelligent Lane Change Advisory System for Autonomous Vehicles using Federated Reinforcement Learning

Yassine Ibork, Nhat Ha Nguyen, Myounggyu Won, Lokesh Das

2604.27115 2026-05-01 cs.CL

Exploring the Limits of Pruning: Task-Specific Neurons, Model Collapse, and Recovery in Task-Specific Large Language Models

M. K. Khalidi Siam, Md. Tausif-Ul-Islam, Md. Reshad Romim Khan, Mohammed Ali Hossain, Mushfiqul Amin, Labib Hasan Khan, Niloy Farhan, Farig Sadeque

2604.27106 2026-05-01 cs.CV cs.AI cs.LG cs.RO

Reconstruction by Generation: 3D Multi-Object Scene Reconstruction from Sparse Observations

Andrii Zadaianchuk, Leonardo Barcellona, Lennard Schuenemann, Christian Gumbsch, Zehao Wang, Muhammad Zubair Irshad, Fabien Despinoy, Rahaf Aljundi, Stratis Gavves, Sergey Zakharov

Comments Website: https://reconstruction-by-generation.github.io

2604.27105 2026-05-01 cs.CV

Automated Detection of Mutual Gaze and Joint Attention in Dual-Camera Settings via Dual-Stream Transformers

Jakub Kosmydel, Paweł Gajewski, Arkadiusz Białek

2604.27102 2026-05-01 cs.LG cs.AI physics.data-an physics.geo-ph

Anomaly Detection in Soil Heavy Metal Contamination Using Unsupervised Learning for Environmental Risk Assessment

Isaac Tettey Adjokatse, Samuel Senyo Koranteng, George Yamoah Afrifa, Theophilus Ansah-Narh, Marcellin Atemkeng, Joseph Bremang Tandoh, Kow Ahor Essel-Yorke, Richmond Opoku-Sarkodie, Rebecca Davis

Comments 7 pages, 6 figures, IEEE Conference

2604.27096 2026-05-01 cs.AI

Think it, Run it: Autonomous ML pipeline generation via self-healing multi-agent AI

Adela Bara, Gabriela Dobrita, Simona-Vasilica Oprea

2604.27095 2026-05-01 cs.RO

Interaction Forces and Internal Loads in Parallel Manipulators with Actuation Redundancy

Joshua Flight, Clément Gosselin

Comments 13 pages, 11 figures. Submitted to Mechanism and Machine Theory

2604.27093 2026-05-01 cs.CL cs.AI

Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations

Mingqian Zheng, Malia Morgan, Liwei Jiang, Carolyn Rose, Maarten Sap

Abstract

Current LLM safety alignment techniques improve model robustness against adversarial attacks, but overlook whether and how LLMs can recover helpfulness when benign users clarify their intent. We introduce CarryOnBench, the first interactive benchmark that measures whether LLMs can revise their interpretation of user intent and recover utility, while remaining safe through multi-turn conversations. Starting from 398 seemingly harmful queries with benign underlying intents, we simulate 5,970 conversations by varying user follow-up sequences, evaluating 14 models on both intent-aligned utility and safety. CarryOnBench yields 1,866 different conversation flows of 4–12 turns, totaling 23,880 model responses. We design Ben-Util, a checklist-based metric that evaluates how well each model response fulfills the user's benign information need using atomic items. At turn one, models fulfill only 10.5–37.6% of the user's benign information need. When the same query includes the benign intent upfront, models fulfill 25.1–72.1%, confirming that models withhold information due to intent misinterpretation, not limited knowledge. With benign clarifications in multi-turn conversations, 13 of 14 models approach or exceed this single-turn baseline, yet recovery cost varies across models. We identify three failure modes invisible to single-turn evaluations: utility lock-in, where a model rarely updates despite clarification; unsafe recovery, where a model updates at disproportionate safety cost; and repetitive recovery, where a model recycles prior responses rather than providing new information. Moreover, conversations converge to similar harmfulness levels regardless of how conservative the model starts. These findings expose a gap that single-turn evaluations miss: whether a model is appropriately cautious or simply unresponsive to clarified user intent.

2604.27092 2026-05-01 cs.AI physics.optics

End-to-end autonomous scientific discovery on a real optical platform

Shuxing Yang, Fujia Chen, Rui Zhao, Junyao Wu, Yize Wang, Haiyao Luo, Ning Han, Qiaolu Chen, Yuze Hu, Wenhao Li, Mingzhu Li, Hongsheng Chen, Yihao Yang

Comments 25 pages, 4 figures

2604.27089 2026-05-01 cs.LG cs.DC cs.PF

AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism

Ahan Gupta, Zhihao Wang, Neel Dani, Masahiro Tanaka, Olatunji Ruwase, Minjia Zhang

Comments 13 pages, 9 figures, 1 table

2604.27083 2026-05-01 cs.LG

Co-Evolving Policy Distillation

Naibin Gu, Chenxu Yang, Qingyi Si, Chuanyu Qin, Dingyu Yao, Peng Fu, Zheng Lin, Weiping Wang, Nan Duan, Jiaqi Wang

Comments Work in progress

2604.27082 2026-05-01 cs.AI cs.LG cs.SE

When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems

Emma Casey, David Roberts, David Sim, Ian Beaver

Comments 12 pages with appendix

2604.27063 2026-05-01 cs.LG cs.NE

Learning to Forget: Continual Learning with Adaptive Weight Decay

Aditya A. Ramesh, Alex Lewandowski, Jürgen Schmidhuber

Comments Preprint version

2604.27045 2026-05-01 cs.LG cs.AI cs.CL

Detecting Clinical Discrepancies in Health Coaching Agents: A Dual-Stream Memory and Reconciliation Architecture

Samuel L Pugh, Eric Yang, Alexander Muir Sutherland, Alessandra Breschi

2604.27043 2026-05-01 cs.CL

CL-bench Life: Can Language Models Learn from Real-Life Context?

Shihan Dou, Yujiong Shen, Chenhao Huang, Junjie Ye, Jiayi Chen, Junzhe Wang, Qianyu He, Shichun Liu, Changze Lv, Jiahang Lin, Jiazheng Zhang, Ming Zhang, Shaofan Liu, Tao Ji, Zhangyue Yin, Cheng Zhang, Huaibing Xie, Jianglu Hu, Jingcheng Deng, Lincheng Li, Minda Hu, Shaolei Wang, Syrus Zhao, Weichao Wang, Yan Lei, Yang Liu, Yanling Xiao, Yiting Liu, Zenan Xu, Zhen Guo, Ziliang Zhao, Pluto Zhou, Tao Gui, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Di Wang, Shunyu Yao

Comments 50 pages, 11 figures

2604.27039 2026-05-01 cs.CL

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

Zhen Zhang, Changyi Yang, Zijie Xia, Zhen Yang, Chengzhi Liu, Zhaotiao Weng, Yepeng Liu, Haobo Chen, Jin Pan, Chenyang Zhao, Yuheng Bu, Alkesh Patel, Zhe Gan, Xin Eric Wang

Abstract

Tokens serve as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introduce the Length Value Model (LenVM), a token-level framework that models the remaining generation length. By formulating length modeling as a value estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This formulation yields supervision that is annotation-free, dense, unbiased, and scalable. Experiments on LLMs and VLMs demonstrate that LenVM provides a highly effective signal at inference time. On the LIFEBench exact length matching task, applying LenVM to a 7B model improves the length score from 30.9 to 64.8, significantly outperforming frontier closed-source models. Furthermore, LenVM enables continuous control over the trade-off between performance and efficiency. On GSM8K at a budget of 200 tokens, LenVM maintains 63% accuracy, compared to 6% for the token-budget baseline. It also accurately predicts total generation length from the prompt boundary. Finally, LenVM's token-level values offer an interpretable view of generation dynamics, revealing how specific tokens shift reasoning toward shorter or longer regimes. These results demonstrate that LenVM supports a broad range of applications and that token length can be effectively modeled as a token-level value signal, highlighting the potential of LenVM as a general framework for length modeling and as a length-specific value signal that could support future RL training. Code is available at https://github.com/eric-ai-lab/Length-Value-Model.
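
To see why a constant negative per-token reward gives a bounded, monotone proxy for remaining length, it may help to work through the closed form of the discounted return. The sketch below is not taken from the LenVM code: the reward of -1, the discount of 0.99, and both function names are assumptions chosen here for illustration.

```python
import math

def discounted_length_return(remaining_tokens, gamma=0.99, reward=-1.0):
    """Closed form of sum_{k=0}^{n-1} gamma**k * reward for n remaining tokens."""
    return reward * (1.0 - gamma ** remaining_tokens) / (1.0 - gamma)

def remaining_from_return(value, gamma=0.99, reward=-1.0):
    """Invert the closed form to recover the remaining token count from a value."""
    return math.log(1.0 - value * (1.0 - gamma) / reward) / math.log(gamma)

# Illustrative remaining lengths (assumed values, not benchmark data).
lengths = [0, 10, 100, 1000]
values = [discounted_length_return(n) for n in lengths]

# With gamma < 1 the return never drops below reward / (1 - gamma).
lower_bound = -1.0 / (1.0 - 0.99)

for n, v in zip(lengths, values):
    print(f"remaining={n:5d}  value={v:9.4f}")
print("lower bound:", lower_bound)
print("recovered length for n=250:",
      remaining_from_return(discounted_length_return(250)))
```

Under these assumptions the value decreases strictly as the remaining length grows and stays above the lower bound, so a predicted value can be mapped back to an estimated token count.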

2604.27031 2026-05-01 cs.LG cs.AI cs.NE

NORACL: Neurogenesis for Oracle-free Resource-Adaptive Continual Learning

Karthik Charan Raghunathan, Christian Metzner, Laura Kriener, Melika Payvand

Comments 23 pages, 6 figures and 3 tables

Abstract

In a continual learning setting, we require a model to be plastic enough to learn a new task and stable enough to not disturb previously learned capabilities. We argue that this dilemma has an architectural root. A finite network has limited representational and plastic resources, yet the required capacity depends on properties of the future task stream that are unknown: how many tasks will be encountered, and how much they overlap in feature space. Regularization-based methods preserve past knowledge within fixed-capacity architectures and therefore implicitly rely on an oracle architecture sized for this unknown future. When tasks are only weakly related, fixed architectures progressively run out of plastic resources; when tasks are few or strongly overlapping, models are often over-provisioned. Inspired by neurogenesis in biology, we propose NORACL to address the stability-plasticity dilemma by tackling the oracle architecture problem through neuronal growth. Starting from a compact network, NORACL grows only when needed by monitoring two complementary signals for representational and plasticity saturation. We evaluate NORACL against oracle-sized static baselines across varying task counts and geometries. Across all settings, NORACL achieves final average accuracies that are better than or on par with oracle-provisioned static baselines while using fewer parameters. Additionally, NORACL yields architectures with interpretable growth: dissimilar tasks predominantly expand feature-extraction layers, whereas tasks that rely on common features shift growth toward later feature-combination layers. Our analysis further explains why fixed-capacity networks lose plasticity as tasks accumulate, whereas NORACL creates fresh capacity for new tasks through growth. Together, these results show that adaptive neurogenesis pushes the stability-plasticity Pareto frontier of continual learning.

2604.27014 2026-05-01 cs.LG cs.CR

Fidelity, Diversity, and Privacy: A Multi-Dimensional LLM Evaluation for Clinical Data Augmentation

Guillermo Iglesias, Gema Bello-Orgaz, María Navas-Loro, Cristian Ramirez-Atencia, Mercè Salvador Robert, Enrique Baca-Garcia

Comments 9 pages, 1 figure, 1 table

2604.27003 2026-05-01 cs.LG cs.AI

When Continual Learning Moves to Memory: A Study of Experience Reuse in LLM Agents

Qisheng Hu, Quanyu Long, Wenya Wang

Comments Working in progress

2604.26999 2026-05-01 cs.AI

Compositional Meta-Learning for Mitigating Task Heterogeneity in Physics-Informed Neural Networks

Beomchul Park, Minsu Koh, Heejo Kong, Seong-Whan Lee

Comments Accepted by Pattern Recognition

2604.26988 2026-05-01 cs.RO

Robot Planning and Situation Handling with Active Perception

Austine Oloo, Zainab Altaweel, Yohei Hayamizu, Peiqi Liu, Yan Ding, Saeid Amiri, Hao Yang, Andy Kaminski, Chad Esselink, Chris Paxton, Xiaohan Zhang, Shiqi Zhang

2604.26986 2026-05-01 cs.CL

BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task

Tosin Adewumi, Martin Karlsson, Lama Alkhaled, Marcus Liwicki

Comments 19 pages, 4 figures

2604.26984 2026-05-01 cs.LG

Monitoring Neural Training with Topology: A Footprint-Predictable Collapse Index

Alexander Kalinowski

2604.26288 2026-05-01 cs.CV cs.AI

CheXthought: A global multimodal dataset of clinical chain-of-thought reasoning and visual attention for chest X-ray interpretation

Sonali Sharma, Jin Long, George Shih, Sarah Eid, Christian Bluethgen, Francine L. Jacobson, Emily B. Tsai, Global Radiology Consortium, Ahmed M. Alaa, Curtis P. Langlotz

Comments 51 pages, 7 figures, 10 tables

2604.25367 2026-05-01 cs.CV

Self-DACE++: Robust Low-Light Enhancement via Efficient Adaptive Curve Estimation

Jianyu Wen, Jun Xie, Feng Chen, Zhepeng Wang, Chenhao Wu, Tong Zhang, Yixuan Yu, Piotr Swierczynski

2604.25186 2026-05-01 cs.CV cs.CE cs.MM

FCMBench-Video: Benchmarking Document Video Intelligence

Runze Cui, Fangxin Shang, Yehui Yang, Qing Yang, Yanwu Xu, Tao Chen

Abstract

Document understanding is a critical capability in financial credit review, onboarding, and remote verification, where both decision accuracy and evidence traceability matter. Compared with static document images, document videos present a temporally redundant and sequentially unfolding evidence stream, require evidence integration across frames, and preserve acquisition-process cues relevant to authenticity-sensitive and anti-fraud review. We introduce FCMBench-Video, a benchmark for document-video intelligence that evaluates document perception, temporal grounding, and evidence-grounded reasoning under realistic capture conditions. For privacy-compliant yet realistic data at scale, we organize construction as an atomic-acquisition and composition workflow that records reusable single-document clips, applies controlled degradations, and assembles long-form multi-document videos with prescribed temporal spans. FCMBench-Video is built from 495 atomic videos composed into 1,200 long-form videos paired with 11,322 expert-annotated question-answer instances, covering 28 document types over 20s–60s duration tiers and 5,960 Chinese / 5,362 English instances. Evaluations on nine recent Video-MLLMs show that FCMBench-Video provides meaningful separation across systems and capabilities: counting is the most duration-sensitive task, Cross-Document Validation and Evidence-Grounded Selection probe higher-level evidence integration, and Visual Prompt Injection provides a complementary robustness dimension. The overall score distribution is broad and approximately bell-shaped, indicating a benchmark that is neither saturated nor dominated by trivial cases. Together, these results position FCMBench-Video as a reproducible benchmark for tracking Video-MLLM progress on document-video understanding and probing capability boundaries in authenticity-sensitive credit-domain applications.

2604.24805 2026-05-01 cs.LG q-bio.QM

minAction.net: Energy-First Neural Architecture Design -- From Biological Principles to Systematic Validation

Martin G. Frasch

Comments v2: Abstract updated to match revised lambda-sweep results (full sweep range; both MNIST and Fashion-MNIST; ~3 orders of magnitude reduction). Updated author affiliations and emails. Embedded Zenodo data DOI 10.5281/zenodo.19840031. Notation uniformity in Sec 3.4. Fig S2(a) caption clarified. Corrected Zenodo archive size in Appendix A (95 MB compressed)

2604.24637 2026-05-01 cs.LG cs.AI q-bio.NC

Cortex-Inspired Continual Learning: Unsupervised Instantiation and Recovery of Functional Task Networks

Kevin McKee, Thomas Hazy, Yicong Zheng, Zacharie Bugaud, Thomas Miconi

Comments 16 pages, 15 figures

Abstract

Block-sequential continual learning demands that a single model both protect prior solutions from catastrophic forgetting and efficiently infer, at inference time and without task labels, which prior solution matches the current input. We present Functional Task Networks (FTN), a parameter-isolation method inspired by structural and dynamical motifs found in the mammalian neocortex. Similar to mixture-of-experts, this method uses a high-dimensional, self-organizing binary mask over a large population of small but deep networks, inspired by dendritic models of pyramidal neurons. The mask is produced by a three-stage procedure: (1) gradient descent on a continuous mask identifies task-relevant neurons, (2) a smoothing kernel biases the result toward spatial contiguity, and (3) k-winner-take-all binarizes the resulting group at a fixed capacity budget. As in mixture-of-experts, each neuron is an independent deep network, so disjoint masks give exactly disjoint gradient updates, providing structural guarantees against catastrophic forgetting. This three-stage procedure recovers the sub-network of a previously trained task in a single gradient step, providing unsupervised task segmentation at inference time. We test it on three continual-learning benchmarks: (1) a synthetic multi-task classification/regression generator, (2) MNIST with shuffled class labels (pure concept shift), and (3) Permuted MNIST (domain shift). On all three, FTN with fine-grained smoothing (FTN-Slow) results in nearly zero forgetting. FTN with a large kernel and only 2 iterations of smoothing (FTN-Fast) trades off some retention for increased speed. We show that the spatial organization mechanism reduces the effective mask search from the combinatorial top-k subset problem in O(C(H,K)) to the complexity of a near-linear scan in O(H) over compact cortical neighborhoods, which is parallelized by the gradient-based update.
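
Stages (2) and (3) of the mask procedure can be pictured with a small one-dimensional toy. The sketch below is a guess at the general shape of the idea rather than the FTN implementation: the box-filter kernel, the 1-D neuron layout, the function names, and the example scores (standing in for the output of stage (1)) are all assumptions made here.

```python
def smooth(scores, radius=1, iterations=1):
    """Box-filter smoothing over a 1-D neuron layout (edges use a truncated window)."""
    out = list(scores)
    for _ in range(iterations):
        prev = out
        out = []
        for i in range(len(prev)):
            window = prev[max(0, i - radius): i + radius + 1]
            out.append(sum(window) / len(window))
    return out

def k_winner_take_all(scores, k):
    """Binary mask with exactly k ones at the highest-scoring positions."""
    winners = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    mask = [0] * len(scores)
    for i in winners:
        mask[i] = 1
    return mask

# Hypothetical continuous mask after stage (1): one isolated spike plus a cluster.
continuous_mask = [0.1, 0.0, 0.9, 0.0, 0.0, 0.6, 0.7, 0.6, 0.0, 0.1]

raw_mask = k_winner_take_all(continuous_mask, k=3)
smoothed_mask = k_winner_take_all(smooth(continuous_mask, radius=1), k=3)

print("without smoothing:", raw_mask)
print("with smoothing:   ", smoothed_mask)
```

Without smoothing, the top-3 selection keeps the isolated spike at index 2; with smoothing, the three winners form one contiguous block at indices 5 to 7, while the capacity budget of k = 3 active neurons is unchanged.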
