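arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translation, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.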

2603.07461 2026-03-10 cs.CL cs.AI cs.LG

The Dual-Stream Transformer: Channelized Architecture for Interpretable Language Modeling

J. Clayton Kerce, Alexis Fox

Abstract

Standard transformers entangle all computation in a single residual stream, obscuring which components perform which functions. We introduce the Dual-Stream Transformer, which decomposes the residual stream into two functionally distinct components: a token stream updated by attention and a context stream updated by feed-forward networks. Information flow between attention heads is controlled through a hierarchy of mixing strategies, from fully independent (maximum interpretability) to dense (standard transformer behavior). This design exposes a tunable tradeoff between interpretability and performance. We measure this tradeoff on language modeling tasks at 29M parameters. Fully independent head mixing increases validation loss by 8% relative to dense baselines. The recommended Kronecker mixing strategy, which permits scalar communication between heads while preserving within-head structure, costs only 2.5%. All configurations maintain functional generation under attention amplification (scaling logits by factors up to 16 at inference time), with degradation ranging from 16% to 27%. This robustness suggests the architectures learn discrete algorithms that operate independently of soft probabilistic mixing. The architecture provides a foundation for interpretable language models where internal structure is exposed by design. (This work was partially supported by DARPA Contract HR001125C0302.)
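The attention-amplification test mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "scaling logits" means multiplying the pre-softmax attention scores by a constant factor at inference time; the function names and the example scores are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def amplified_attention(scores, factor):
    """Attention weights after multiplying the logits by `factor`.

    Assumption: amplification is a plain scalar multiply before the softmax.
    """
    return softmax([factor * s for s in scores])

# Hypothetical attention logits for a single query position.
scores = [2.0, 1.0, 0.5, -1.0]

for factor in (1, 4, 16):
    weights = amplified_attention(scores, factor)
    print(factor, [round(w, 3) for w in weights])
```

As the factor grows, the weights concentrate on the largest logit, so the head behaves more like a hard selection than a soft average. That is the regime in which the abstract reports generation remaining functional.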

2603.07454 2026-03-10 cs.CV cs.LG cs.RO

SLNet: A Super-Lightweight Geometry-Adaptive Network for 3D Point Cloud Recognition

Mohammad Saeid, Amir Salarpour, Pedram MohajerAnsari, Mert D. Pesé

Comments Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA 2026)

2603.07448 2026-03-10 cs.LG

Discrete Tokenization Unlocks Transformers for Calibrated Tabular Forecasting

Yael S. Elmatad

2603.07444 2026-03-10 cs.AI econ.GN q-fin.EC

HLER: Human-in-the-Loop Economic Research via Multi-Agent Pipelines for Empirical Discovery

Chen Zhu, Xiaolu Wang

Abstract

Large language models (LLMs) have enabled agent-based systems that aim to automate scientific research workflows. Most existing approaches focus on fully autonomous discovery, where AI systems generate research ideas, conduct analyses, and produce manuscripts with minimal human involvement. However, empirical research in economics and the social sciences poses additional constraints: research questions must be grounded in available datasets, identification strategies require careful design, and human judgment remains essential for evaluating economic significance. We introduce HLER (Human-in-the-Loop Economic Research), a multi-agent architecture that supports empirical research automation while preserving critical human oversight. The system orchestrates specialized agents for data auditing, data profiling, hypothesis generation, econometric analysis, manuscript drafting, and automated review. A key design principle is dataset-aware hypothesis generation, where candidate research questions are constrained by dataset structure, variable availability, and distributional diagnostics, reducing infeasible or hallucinated hypotheses. HLER further implements a two-loop architecture: a question quality loop that screens and selects feasible hypotheses, and a research revision loop where automated review triggers re-analysis and manuscript revision. Human decision gates are embedded at key stages, allowing researchers to guide the automated pipeline. Experiments on three empirical datasets show that dataset-aware hypothesis generation produces feasible research questions in 87% of cases (versus 41% under unconstrained generation), while complete empirical manuscripts can be produced at an average API cost of $0.8-$1.5 per run. These results suggest that Human-AI collaborative pipelines may provide a practical path toward scalable empirical research.
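Dataset-aware hypothesis generation, as described above, constrains candidate questions by what the data can support. The sketch below shows one way such a feasibility screen could look. It is an illustration under stated assumptions, not HLER's implementation: the `Hypothesis` class, the `is_feasible` helper, the 30% missingness cutoff, and the example columns are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """A candidate research question and the variables it needs (hypothetical schema)."""
    question: str
    outcome: str
    treatment: str
    controls: tuple = ()

def is_feasible(h, missing_share, max_missing=0.3):
    """True if every required variable exists in the dataset and is not too sparse.

    `missing_share` maps column name -> share of missing values.
    `max_missing` is an assumed cutoff, not a value from the paper.
    """
    needed = (h.outcome, h.treatment, *h.controls)
    return all(v in missing_share and missing_share[v] <= max_missing for v in needed)

# Hypothetical dataset profile produced by a data-profiling step.
profile = {"wage": 0.02, "education": 0.05, "union": 0.10, "tenure": 0.45}

candidates = [
    Hypothesis("Does education raise wages?", "wage", "education", ("union",)),
    Hypothesis("Does tenure raise wages?", "wage", "tenure"),            # too sparse
    Hypothesis("Does remote work raise wages?", "wage", "remote_work"),  # column absent
]

feasible = [h.question for h in candidates if is_feasible(h, profile)]
print(feasible)
```

Only the first candidate survives the screen: the second relies on a column with too many missing values, and the third refers to a variable the dataset does not contain.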

2603.07443 2026-03-10 cs.CV

Med-Evo: Test-time Self-evolution for Medical Multimodal Large Language Models

Dunyuan Xu, Xikai Yang, Juzheng Miao, Yaoqian Li, Jinpeng Li, Pheng-Ann Heng

2603.07442 2026-03-10 cs.RO

LITHE: Bridging Best-Effort Python and Real-Time C++ for Hot-Swapping Robotic Control Laws on Commodity Linux

He Kai Lim, Tyler R. Clites

Comments 8 pages, 5 figures. Submitted to IEEE/RSJ International Conference on Intelligent Robots & Systems (IROS) 2026

2603.07441 2026-03-10 cs.CV

DogWeave: High-Fidelity 3D Canine Reconstruction from a Single Image via Normal Fusion and Conditional Inpainting

Shufan Sun, Chenchen Wang, Zongfu Yu

2603.07437 2026-03-10 cs.LG cs.SY eess.SY math.OC stat.ML

Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II

Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

Comments 38 pages; preliminary version appeared in IEEE CDC 2023; this is the extended journal version, with an end-to-end guarantee added

2603.07432 2026-03-10 cs.CV cs.CL cs.HC cs.LG

Generalization in Online Reinforcement Learning for Mobile Agents

Li Gu, Zihuan Jiang, Zhixiang Chi, Huan Liu, Ziqiang Wang, Yuanhao Yu, Glen Berseth, Yang Wang

2603.07430 2026-03-10 cs.CV

Disentangled Textual Priors for Diffusion-based Image Super-Resolution

Lei Jiang, Xin Liu, Xinze Tong, Zhiliang Li, Jie Liu, Jie Tang, Gangshan Wu

Comments Accepted by CVPR 2026

2603.07426 2026-03-10 cs.RO

Cable-driven Continuum Robotics: Proprioception via Proximal-integrated Force Sensing

Gang Zhang, Junyan Yan, Jibiao Chen, Shing Shin Cheng

2603.07417 2026-03-10 cs.RO

Unifying Sidewinding and Rolling: A Wave-Based Framework for Self-Righting in Elongated Limbless and Multi-Legged Robots

Hangjun Liu, Jiarui Geng, Jinxuan Ding, Gengzhi He, Xiyuan Wang, Melisa Arukgoda, Joe DiGennaro, George Ubertalli, Grigoriy Blekherman, Baxi Chong

Abstract

Centipede-like robots offer unique locomotion advantages due to their small cross-sectional area for accessing confined spaces, and their redundant legs enhance robustness in cluttered environments such as search-and-rescue and pipe inspection. However, elongated robots are particularly vulnerable to tipping over when climbing large obstacles, making reliable self-righting essential for field deployment. Self-righting strategies for elongate, multi-legged systems remain poorly understood. In this study, we conduct a comparative biomechanics and robophysical investigation to address three key questions: (1) What self-righting strategies are effective for elongate, many-legged systems? (2) How should these strategies depend on morphological parameters such as leg length and leg number? (3) Is there a morphological limit beyond which reliable self-righting becomes infeasible? We compare two biological exemplars: Scolopendra subspinipes (short legs) and Scutigera coleoptrata (house centipedes with long legs). Scolopendra subspinipes reliably self-rights both during aerial phases and through ground-assisted self-righting, whereas house centipedes rely predominantly on aerial reorientation and struggle to generate effective self-righting torques during ground contact. Motivated by these observations, we construct a parameterized space of bio-inspired self-righting strategies and develop an elongate robot with adjustable leg lengths. Systematic experiments reveal that increasing leg length necessitates a shift in control strategy to prevent torque over-concentration in mid-body actuators, and we identify a critical limb-length threshold above which robust self-righting becomes challenging. These results establish morphology-strategy coupling principles for self-righting in elongate robots and provide design guidelines for centipede-like systems operating in uncertain terrain.

2603.07416 2026-03-10 cs.LG

DualSpec: Accelerating Deep Research Agents via Dual-Process Action Speculation

Shuzhang Zhong, Baotong Lu, Qi Chen, Chuanjie Liu, Fan Yang, Meng Li

2603.07415 2026-03-10 cs.LG cs.AI cs.IT math.IT

Context Channel Capacity: An Information-Theoretic Framework for Understanding Catastrophic Forgetting

Ran Cheng

Comments 39 pages

Abstract

Catastrophic forgetting remains a central challenge in continual learning (CL), yet lacks a unified information-theoretic explanation for why some architectures forget catastrophically while others do not. We introduce Context Channel Capacity ($C_\mathrm{ctx}$), the mutual information between a CL architecture's context signal and its generated parameters, and prove that zero forgetting requires $C_\mathrm{ctx} \geq H(T)$, where $H(T)$ is the task identity entropy. We establish an Impossibility Triangle -- zero forgetting, online learning, and finite parameters cannot be simultaneously satisfied by sequential state-based learners -- and show that conditional regeneration architectures (HyperNetworks) bypass this triangle by redefining parameters as function values rather than states. We validate this framework across 8 CL methods on Split-MNIST (1,130+ experiments over 86 days, 4 seeds each), showing that $C_\mathrm{ctx}$ perfectly predicts forgetting behavior: methods with $C_\mathrm{ctx} = 0$ (NaiveSGD, EWC, SI, LwF, CFlow) exhibit catastrophic forgetting (6-97%), while methods with $C_\mathrm{ctx} \approx 1$ (HyperNetwork) achieve zero forgetting (98.8% ACC). We further propose Wrong-Context Probing (P5), a practical diagnostic protocol for measuring $C_\mathrm{ctx}$, and extend the framework to CIFAR-10 via a novel Gradient Context Encoder that closes the oracle gap from 23.3pp to 0.7pp. A systematic taxonomy of 15+ closed research directions -- including the Hebbian null result (frozen random features outperform learned features), CFlow's $\theta_0$-memorizer phenomenon, and the $S_N$ symmetry barrier to column specialization -- provides the community with precisely diagnosed negative results. Our central design principle: architecture over algorithm -- the context pathway must be structurally unbypassable.
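The condition $C_\mathrm{ctx} \geq H(T)$ can be checked numerically on a toy case. The sketch below is only an illustration under simplifying assumptions: tasks are uniformly distributed, the generated parameters are discretized to one parameter set per task, and the joint probability tables are made up. None of these numbers come from the paper.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

n_tasks = 4  # hypothetical number of tasks
h_t = entropy([1 / n_tasks] * n_tasks)  # task identity entropy H(T)

# Rows: context signal values. Columns: discretized parameter sets (one per task).
# Informative context: each context value selects exactly one parameter set.
informative = [
    [1 / n_tasks if i == j else 0.0 for j in range(n_tasks)]
    for i in range(n_tasks)
]
# Uninformative context: a single constant context value for all parameter sets.
uninformative = [[1 / n_tasks] * n_tasks]

print("H(T) =", h_t)
print("C_ctx (informative)   =", mutual_information(informative))
print("C_ctx (uninformative) =", mutual_information(uninformative))
```

In this toy setting the informative context reaches the bound, while a constant context carries no information about which parameters to produce and so falls below it.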

2603.07414 2026-03-10 cs.CV

QdaVPR: A novel query-based domain-agnostic model for visual place recognition

Shanshan Wan, Lai Kang, Yingmei Wei, Tianrui Shen, Haixuan Wang, Chao Zuo

2603.07406 2026-03-10 cs.CV cs.AI

UnSCAR: Universal, Scalable, Controllable, and Adaptable Image Restoration

Debabrata Mandal, Soumitri Chattopadhyay, Yujie Wang, Marc Niethammer, Praneeth Chakravarthula

2603.07404 2026-03-10 cs.RO cs.AI

Adaptive Capacity Allocation for Vision Language Action Fine-tuning

Donghoon Kim, Minji Bae, Unghui Nam, Gyeonghun Kim, Suyun Lee, Kyuhong Shim, Byonghyo Shim

Comments ICRA 2026 (Official Code: https://github.com/dhkim-furiosa/LoRA-SP)

2603.07403 2026-03-10 cs.CV

Prompt-Based Caption Generation for Single-Tooth Dental Images Using Vision-Language Models

Anastasiia Sukhanova, Aiden Taylor, Julian Myers, Zichun Wang, Kartha Veerya Jammuladinne, Satya Sri Rajiteswari Nimmagadda, Aniruddha Maiti, Ananya Jana

Comments Accepted to IEEE International Conference on Semantic Computing (IEEE ICSC 2026)

2603.07402 2026-03-10 cs.LG

Generalizing Linear Autoencoder Recommenders with Decoupled Expected Quadratic Loss

Ruixin Guo, Xinyu Li, Hao Zhou, Yang Zhou, Ruoming Jin

Comments Accepted at ICLR 2026 (https://openreview.net/forum?id=ANH044Wdje)

2603.07401 2026-03-10 cs.CV

VIVECaption: A Split Approach to Caption Quality Improvement

Varun Ananth, Baqiao Liu, Haoran Cai

2603.07400 2026-03-10 cs.RO

Perceptive Variable-Timing Footstep Planning for Humanoid Locomotion on Disconnected Footholds

Zhaoyang Xiang, Upama Pant, Ayonga Hereid

Comments 8 pages, 5 figures, 1 table, 3 algorithms. Supplemental video at: https://youtu.be/5EeuBnSb66s

2603.07399 2026-03-10 cs.CV eess.SP

Interpretable Aneurysm Classification via 3D Concept Bottleneck Models: Integrating Morphological and Hemodynamic Clinical Features

Toqa Khaled, Ahmad Al-Kabbany

Abstract

We are concerned with the challenge of reliably classifying and assessing intracranial aneurysms using deep learning without compromising clinical transparency. While traditional black-box models achieve high predictive accuracy, their lack of inherent interpretability remains a significant barrier to clinical adoption and regulatory approval. Explainability is paramount in medical modeling to ensure that AI-driven diagnoses align with established neurosurgical principles. Unlike traditional eXplainable AI (XAI) methods -- such as saliency maps, which often provide post-hoc, non-causal visual correlations -- Concept Bottleneck Models (CBMs) offer a robust alternative by constraining the model's internal logic to human-understandable clinical indices. In this article, we propose an end-to-end 3D Concept Bottleneck framework that maps high-dimensional neuroimaging features to a discrete set of morphological and hemodynamic concepts for aneurysm identification. We implemented this pipeline using a pre-trained 3D ResNet-34 backbone and a 3D DenseNet-121 to extract features from CTA volumes, which were subsequently processed through a soft bottleneck layer representing human-interpretable clinical concepts. The model was optimized using a joint-loss function to balance diagnostic focal loss and concept mean squared error (MSE), validated via stratified five-fold cross-validation. Our results demonstrate a peak task classification accuracy of 93.33% +/- 4.5% for the ResNet-34 architecture and 91.43% +/- 5.8% for the DenseNet-121 model. Furthermore, the implementation of 8-pass Test-Time Augmentation (TTA) yielded a robust mean accuracy of 88.31%, ensuring diagnostic stability during inference. By maintaining an accuracy-generalization gap of less than 0.04, this framework proves that high predictive performance can be achieved without sacrificing interpretability.
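The joint-loss idea in the abstract, a diagnostic focal loss balanced against a concept MSE, can be sketched as follows. This is a minimal illustration for a single binary example, not the authors' training code. The weighting `lam`, the focal-loss settings `gamma=2.0` and `alpha=0.25` (common defaults from the focal-loss literature), and the example values are assumptions.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def concept_mse(pred, target):
    """Mean squared error between predicted and reference concept values."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target)

def joint_loss(p, y, concept_pred, concept_true, lam=0.5):
    """Task focal loss plus a weighted concept MSE (lam is an assumed weight)."""
    return focal_loss(p, y) + lam * concept_mse(concept_pred, concept_true)

# Hypothetical example: one positive case with two clinical concept scores.
total = joint_loss(0.9, 1, concept_pred=[0.8, 0.3], concept_true=[1.0, 0.0])
print(total)
```

The weight `lam` controls how strongly the bottleneck is pushed to match the clinical concepts relative to the diagnostic objective. The abstract does not state how this balance was set.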

2603.07394 2026-03-10 cs.CV cs.AI cs.CL

AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions

Jihyoung Jang, Hyounghun Kim

Comments ICLR 2026 (28 pages); Project website: https://aqua-iclr2026.github.io/

2603.07393 2026-03-10 cs.RO cs.SY eess.SY

Underwater Embodied Intelligence for Autonomous Robots: A Constraint-Coupled Perspective on Planning, Control, and Deployment

Jingzehua Xu, Guanwen Xie, Jiwei Tang, Shuai Zhang, Xiaofan Li

Comments This article is currently under review

Abstract

Autonomous underwater robots are increasingly deployed for environmental monitoring, infrastructure inspection, subsea resource exploration, and long-horizon exploration. Yet, despite rapid advances in learning-based planning and control, reliable autonomy in real ocean environments remains fundamentally constrained by tightly coupled physical limits. Hydrodynamic uncertainty, partial observability, bandwidth-limited communication, and energy scarcity are not independent challenges; they interact within the closed perception-planning-control loop and often amplify one another over time. This Review develops a constraint-coupled perspective on underwater embodied intelligence, arguing that planning and control must be understood within tightly coupled sensing, communication, coordination, and resource constraints in real ocean environments. We synthesize recent progress in reinforcement learning, belief-aware planning, hybrid control, multi-robot coordination, and foundation-model integration through this embodied perspective. Across representative application domains, we show how environmental monitoring, inspection, exploration, and cooperative missions expose distinct stress profiles of cross-layer coupling. To unify these observations, we introduce a cross-layer failure taxonomy spanning epistemic, dynamic, and coordination breakdowns, and analyze how errors cascade across autonomy layers under uncertainty. Building on this structure, we outline research directions toward physics-grounded world models, certifiable learning-enabled control, communication-aware coordination, and deployment-aware system design. By internalizing constraint coupling rather than treating it as an external disturbance, underwater embodied intelligence may evolve from performance-driven adaptation toward resilient, scalable, and verifiable autonomy under real ocean conditions.

2603.07392 2026-03-10 cs.CL

Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams

Jiyeon Kim, Hyunji Lee, Dylan Zhou, Sue Hyun Park, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Sungmin Cha, Minjoon Seo

2603.07390 2026-03-10 cs.LG

Deterministic Fuzzy Triage for Legal Compliance Classification and Evidence Retrieval

Rian Atri

Comments 8 pages, 5 figures. Published in the Proceedings of the AAAI Bridge between Artificial Intelligence and Law 2026 (Full papers), pages 51-58

Journal ref: Proceedings of the AAAI Bridge between Artificial Intelligence and Law 2026 (Full papers), pages 51-58, January 21, 2026, AAAI 2026 Bridge Program, Singapore

Abstract

Legal teams increasingly use machine learning to triage large volumes of contractual evidence, but many models are opaque, non-deterministic, and difficult to align with frameworks such as HIPAA or NERC-CIP. We study a simple, reproducible alternative based on deterministic dual encoders and transparent fuzzy triage bands. We train a RoBERTa-base dual encoder with a 512-dimensional projection and cosine similarity on the ACORD benchmark for graded clause retrieval, then fine-tune it on a CUAD-derived binary compliance dataset. Across five random seeds (40-44) on a single NVIDIA A100 GPU, the model achieves ACORD-style retrieval performance of NDCG@5 0.38-0.42, NDCG@10 0.45-0.50, and 4-star Precision@5 about 0.37 on the test split. On CUAD-derived binary labels, it achieves AUC 0.98-0.99 and F1 0.22-0.30 depending on positive-class weighting, outperforming majority and random baselines in a highly imbalanced setting with a positive rate of about 0.6%. We then map scalar compliance scores into three regions: auto-noncompliant, auto-compliant, and human-review. Thresholds are tuned on validation data to maximize automatic decision coverage subject to an empirical error-rate constraint of at most 2% over auto-decided examples. The result is a seed-stable system summarized by a small number of scalar parameters. We argue that deterministic encoders, calibrated fuzzy bands, and explicit error constraints provide a practical middle ground between hand-crafted rules and opaque large language models, supporting explainable evidence triage, reproducible audit trails, and concrete mappings to legal review concepts.
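The three-band triage and the threshold tuning described above can be sketched as a small grid search. This is an illustrative sketch, not the paper's code. It assumes the score is a noncompliance score (higher means more likely noncompliant) and that label 1 means noncompliant. The function names and the validation data are hypothetical.

```python
def triage(score, lo, hi):
    """Map a scalar compliance score to one of three bands."""
    if score >= hi:
        return "auto-noncompliant"
    if score <= lo:
        return "auto-compliant"
    return "human-review"

def tune_thresholds(scores, labels, max_error=0.02):
    """Grid-search (lo, hi) to maximize auto-decision coverage under an error cap.

    The error rate is measured only over auto-decided examples.
    Returns (lo, hi, coverage), or (None, None, 0.0) if no pair qualifies.
    """
    candidates = sorted(set(scores))
    best = (None, None, 0.0)
    for lo in candidates:
        for hi in candidates:
            if lo >= hi:
                continue
            decided = errors = 0
            for s, y in zip(scores, labels):
                band = triage(s, lo, hi)
                if band == "human-review":
                    continue
                decided += 1
                predicted = 1 if band == "auto-noncompliant" else 0
                errors += predicted != y
            if decided and errors / decided <= max_error:
                coverage = decided / len(scores)
                if coverage > best[2]:
                    best = (lo, hi, coverage)
    return best

# Hypothetical validation scores and labels (1 = noncompliant).
scores = [0.01, 0.02, 0.05, 0.10, 0.40, 0.55, 0.60, 0.90, 0.95, 0.99]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

lo, hi, coverage = tune_thresholds(scores, labels)
print(lo, hi, coverage)
```

On this toy data the two ambiguous examples in the middle are routed to human review and the rest are decided automatically. Because the thresholds are plain scalars, the resulting policy is easy to log and audit.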

2603.07389 2026-03-10 cs.LG math.OC

Feed m Birds with One Scone: Accelerating Multi-task Gradient Balancing via Bi-level Optimization

Xuxing Chen, Yun He, Jiayi Xu, Minhui Huang, Xiaoyi Liu, Boyang Liu, Fei Tian, Xiaohan Wei, Rong Jin, Sem Park, Bo Long, Xue Feng

2603.07379 2026-03-10 cs.AI cs.CL cs.CR cs.IR

SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions

Saroj Mishra, Suman Niroula, Umesh Yadav, Dilip Thakur, Srijan Gyawali, Shiva Gaire

2603.07372 2026-03-10 cs.CL cs.AI cs.LG

Domain-Specific Quality Estimation for Machine Translation in Low-Resource Scenarios

Namrata Patil Gurav, Akashdeep Ranu, Archchana Sindhujan, Diptesh Kanojia

Comments 21 pages, 7 tables, 7 figures

2603.07370 2026-03-10 cs.LG

Learning to Reflect: Hierarchical Multi-Agent Reinforcement Learning for CSI-Free mmWave Beam-Focusing

Hieu Le, Oguz Bedir, Mostafa Ibrahim, Jian Tao, Sabit Ekin