arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1553
2604.01563 2026-04-03 cs.AI cs.LG

Does Your Optimizer Care How You Normalize? Normalization-Optimizer Coupling in LLM Training

Abdelrahman Abouzeid

Comments 16 pages, 8 figures. Preprint. Under review

详情
英文摘要

In LLM training, normalization layers and optimizers are typically treated as independent design choices. In a 3x2 factorial at 1B parameters and 1000 training steps, we show this assumption can fail: Dynamic Erf (Derf; Chen & Liu, 2025) suffers a large negative interaction with Muon (Jordan, 2024), with its gap to RMSNorm growing from +0.31 nats under AdamW to +0.97 under Muon, approximately three times larger. Dynamic Tanh (DyT; Zhu et al., 2025), included as a bounded-normalizer control, shows no such penalty. Our evidence points to two failure modes of erf under Muon's faster spectral-norm growth: saturation (lossy compression) and scale blindness (discarding activation magnitude). An EMA-blend that reintroduces running scale estimates recovers ~84% of the gap. Separately, reducing Derf's alpha from its published default (0.5 to 0.3) recovers ~80% by keeping erf in its near-linear regime, where it approximately preserves relative scale; this setting is not the published default of Chen & Liu (2025). Using Derf's published default alpha with Muon incurs a 0.66-nat interaction penalty without producing NaNs or divergence, making the failure easy to miss in short pilot runs.

2604.01561 2026-04-03 cs.CV cs.AI

ReFlow: Self-correction Motion Learning for Dynamic Scene Reconstruction

Yanzhe Liang, Ruijie Zhu, Hanzhi Chang, Zhuoyuan Li, Jiahao Lu, Tianzhu Zhang

Comments Project page: https://rosetta-leong.github.io/ReFlow_Page/ {this https URL}

详情
英文摘要

We present ReFlow, a unified framework for monocular dynamic scene reconstruction that learns 3D motion in a novel self-correction manner from raw video. Existing methods often suffer from incomplete scene initialization for dynamic regions, leading to unstable reconstruction and motion estimation, which often resorts to external dense motion guidance such as pre-computed optical flow to further stabilize and constrain the reconstruction of dynamic components. However, this introduces additional complexity and potential error propagation. To address these issues, ReFlow integrates a Complete Canonical Space Construction module for enhanced initialization of both static and dynamic regions, and a Separation-Based Dynamic Scene Modeling module that decouples static and dynamic components for targeted motion supervision. The core of ReFlow is a novel self-correction flow matching mechanism, consisting of Full Flow Matching to align 3D scene flow with time-varying 2D observations, and Camera Flow Matching to enforce multi-view consistency for static objects. Together, these modules enable robust and accurate dynamic scene reconstruction. Extensive experiments across diverse scenarios demonstrate that ReFlow achieves superior reconstruction quality and robustness, establishing a novel self-correction paradigm for monocular 4D reconstruction.

2604.01560 2026-04-03 cs.CL

DeltaMem: Towards Agentic Memory Management via Reinforcement Learning

Qi Zhang, Shen Huang, Chu Liu, Shouqing Yang, Junbo Zhao, Haobo Wang, Pengjun Xie

Comments preprint, under review

详情
英文摘要

Recent advances in persona-centric memory have revealed the powerful capability of multi-agent systems in managing persona memory, especially in conversational scenarios. However, these complex frameworks often suffer from information loss and are fragile across varying scenarios, resulting in suboptimal performance. In this paper, we propose DeltaMem, an agentic memory management system that formulates persona-centric memory management as an end-to-end task within a single-agent setting. To further improve the performance of our agentic memory manager, we draw inspiration from the evolution of human memory and synthesize a user-assistant dialogue dataset along with corresponding operation-level memory updating labels. Building on this, we introduce a novel Memory-based Levenshtein Distance to formalize the memory updating reward, and propose a tailored reinforcement learning framework to further enhance the management capabilities of DeltaMem. Extensive experiments show that both training-free and RL-trained DeltaMem outperform all product-level baselines across diverse long-term memory benchmarks, including LoCoMo, HaluMem, and PersonaMem.

2604.01553 2026-04-03 cs.CV

Cross-Domain Vessel Segmentation via Latent Similarity Mining and Iterative Co-Optimization

Zhanqiang Guo, Jianjiang Feng, Jie Zhou

详情
英文摘要

Retinal vessel segmentation serves as a critical prerequisite for automated diagnosis of retinal pathologies. While recent advances in Convolutional Neural Networks (CNNs) have demonstrated promising performance in this task, significant performance degradation occurs when domain shifts exist between training and testing data. To address these limitations, we propose a novel domain transfer framework that leverages latent vascular similarity across domains and iterative co-optimization of generation and segmentation networks. Specifically, we first pre-train generation networks for source and target domains. Subsequently, the pretrained source-domain conditional diffusion model performs deterministic inversion to establish intermediate latent representations of vascular images, creating domain-agnostic prototypes for target synthesis. Finally, we develop an iterative refinement strategy where segmentation network and generative model undergo mutual optimization through cyclic parameter updating. This co-evolution process enables simultaneous enhancement of cross-domain image synthesis quality and segmentation accuracy. Experiments demonstrate that our framework achieves state-of-the-art performance in cross-domain retinal vessel segmentation, particularly in challenging clinical scenarios with significant modality discrepancies.

2604.01552 2026-04-03 cs.LG

ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor

Yixiao Wang, Ting Jiang, Zishan Shao, Hancheng Ye, Jingwei Sun, Mingyuan Ma, Jianyi Zhang, Yiran Chen, Hai Li

详情
英文摘要

Denoising generative models deliver high-fidelity generation but remain bottlenecked by inference latency due to the many iterative denoiser calls required during sampling. Training-free acceleration methods reduce latency by either sparsifying the model architecture or shortening the sampling trajectory. Current training-free acceleration methods are more complex than necessary: higher-order predictors amplify error under aggressive speedups, and architectural modifications hinder deployment. Beyond 2x acceleration, step skipping creates structural scarcity -- at most one fresh evaluation per local window -- leaving the computed output and its backward difference as the only causally grounded information. Based on this, we propose ZEUS, an acceleration method that predicts reduced denoiser evaluations using a second-order predictor, and stabilizes aggressive consecutive skipping with an interleaved scheme that avoids back-to-back extrapolations. ZEUS adds essentially zero overhead, no feature caches, and no architectural modifications, and it is compatible with different backbones, prediction objectives, and solver choices. Across image and video generation, ZEUS consistently improves the speed-fidelity performance over recent training-free baselines, achieving up to 3.2x end-to-end speedup while maintaining perceptual quality. Our code is available at: https://github.com/Ting-Justin-Jiang/ZEUS.

2604.01550 2026-04-03 cs.CV

Prototype-Based Low Altitude UAV Semantic Segmentation

Da Zhang, Gao Junyu, Zhao Zhiyuan

Comments Accepted to ICME 2026

详情
英文摘要

Semantic segmentation of low-altitude UAV imagery presents unique challenges due to extreme scale variations, complex object boundaries, and limited computational resources on edge devices. Existing transformer-based segmentation methods achieve remarkable performance but incur high computational overhead, while lightweight approaches struggle to capture fine-grained details in high-resolution aerial scenes. To address these limitations, we propose PBSeg, an efficient prototype-based segmentation framework tailored for UAV applications. PBSeg introduces a novel prototype-based cross-attention (PBCA) that exploits feature redundancy to reduce computational complexity while maintaining segmentation quality. The framework incorporates an efficient multi-scale feature extraction module that combines deformable convolutions (DConv) with context-aware modulation (CAM) to capture both local details and global semantics. Experiments on two challenging UAV datasets demonstrate the effectiveness of the proposed approach. PBSeg achieves 71.86\% mIoU on UAVid and 80.92\% mIoU on UDD6, establishing competitive performance while maintaining computational efficiency. Code is available at https://github.com/zhangda1018/PBSeg.

2604.01545 2026-04-03 cs.AI

RAE-AR: Taming Autoregressive Models with Representation Autoencoders

Hu Yu, Hang Xu, Jie Huang, Zeyue Xue, Haoyang Huang, Nan Duan, Feng Zhao

详情
英文摘要

The latent space of generative modeling is long dominated by the VAE encoder. The latents from the pretrained representation encoders (e.g., DINO, SigLIP, MAE) are previously considered inappropriate for generative modeling. Recently, RAE method lights the hope and reveals that the representation autoencoder can also achieve competitive performance as the VAE encoder. However, the integration of representation autoencoder into continuous autoregressive (AR) models, remains largely unexplored. In this work, we investigate the challenges of employing high-dimensional representation autoencoders within the AR paradigm, denoted as \textit{RAE-AR}. We focus on the unique properties of AR models and identify two primary hurdles: complex token-wise distribution modeling and the high-dimensionality amplified training-inference gap (exposure bias). To address these, we introduce token simplification via distribution normalization to ease modeling difficulty and improve convergence. Furthermore, we enhance prediction robustness by incorporating Gaussian noise injection during training to mitigate exposure bias. Our empirical results demonstrate that these modifications substantially bridge the performance gap, enabling representation autoencoder to achieve results comparable to traditional VAEs on AR models. This work paves the way for a more unified architecture across visual understanding and generative modeling.

2604.01542 2026-04-03 cs.CV physics.optics

Universal computational thermal imaging overcoming the ghosting effect

Hongyi Xu, Du Wang, Chenjun Zhao, Jiashuo Chen, Jiale Lin, Liqin Cao, Yanfei Zhong, Yiyuan She, Fanglin Bao

Comments 9 pages, 6 figures

详情
英文摘要

Thermal imaging is crucial for night vision but fundamentally hampered by the ghosting effect, a loss of detailed texture in cluttered photon streams. While conventional ghosting mitigation has relied on data post-processing, the recent breakthrough in heat-assisted detection and ranging (HADAR) opens a promising frontier for hyperspectral computational thermal imaging that produces night vision with day-like visibility. However, universal anti-ghosting imaging remains elusive, as state-of-the-art HADAR applies only to limited scenes with uniform materials, whereas material non-uniformity is ubiquitous in the real world. Here, we propose a universal computational thermal imaging framework, TAG (thermal anti-ghosting), to address material non-uniformity and overcome ghosting for high-fidelity night vision. TAG takes hyperspectral photon streams for nonparametric texture recovery, enabling our experimental demonstration of unprecedented expression recovery in thus-far-elusive ghostly human faces -- the archetypal, long-recognized ghosting phenomenon. Strikingly, TAG not only universally outperforms HADAR across various scenes, but also reveals the influence of material non-uniformity, shedding light on HADAR's effectiveness boundary. We extensively test facial texture and expression recovery across day and night, and demonstrate, for the first time, thermal 3D topological alignment and mood detection. This work establishes a universal foundation for high-fidelity computational night vision, with potential applications in autonomous navigation, reconnaissance, healthcare, and wildlife monitoring.

2604.01538 2026-04-03 cs.CL cs.AI

Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging

Mengxian Lyu, Cheng Peng, Ziyi Chen, Mengyuan Zhang, Jieting Li Lu, Yonghui Wu

详情
英文摘要

Large language models have been adopted in the medical domain for clinical documentation to reduce clinician burden. However, studies have reported that LLMs often "forget" a significant amount of instruction-following ability when fine-tuned using a task-specific medical dataset, a critical challenge in adopting general-purpose LLMs for clinical applications. This study presents a model merging framework to efficiently adapt general-purpose LLMs to the medical domain by countering this forgetting issue. By merging a clinical foundation model (GatorTronLlama) with a general instruct model (Llama-3.1-8B-Instruct) via interpolation-based merge methods, we seek to derive a domain-adapted model with strong performance on clinical tasks while retaining instruction-following ability. Comprehensive evaluation across medical benchmarks and five clinical generation tasks (e.g., radiology and discharge summarization) shows that merged models can effectively mitigate catastrophic forgetting, preserve clinical domain expertise, and retain instruction-following ability. In addition, our model merging strategies demonstrate training efficiency, achieving performance on par with fully fine-tuned baselines under severely constrained supervision (e.g., 64-shot vs. 256-shot). Consequently, weight-space merging constitutes a highly scalable solution for adapting open-source LLMs to clinical applications, facilitating broader deployment in resource-constrained healthcare environments.

2604.01535 2026-04-03 cs.CL

Read More, Think More: Revisiting Observation Reduction for Web Agents

Masafumi Enomoto, Ryoma Obara, Haochen Zhang, Masafumi Oyamada

详情
英文摘要

Web agents based on large language models (LLMs) rely on observations of web pages -- commonly represented as HTML -- as the basis for identifying available actions and planning subsequent steps. Prior work has treated the verbosity of HTML as an obstacle to performance and adopted observation reduction as a standard practice. We revisit this trend and demonstrate that the optimal observation representation depends on model capability and thinking token budget: (1) compact observations (accessibility trees) are preferable for lower-capability models, while detailed observations (HTML) are advantageous for higher-capability models; moreover, increasing thinking tokens further amplifies the benefit of HTML. (2) Our error analysis suggests that higher-capability models exploit layout information in HTML for better action grounding, while lower-capability models suffer from increased hallucination under longer inputs. We also find that incorporating observation history improves performance across most models and settings, and a diff-based representation offers a token-efficient alternative. Based on these findings, we suggest practical guidelines: adaptively select observation representations based on model capability and thinking token budget, and incorporate observation history using diff-based representations.

2604.01529 2026-04-03 cs.AI cs.MA

A Role-Based LLM Framework for Structured Information Extraction from Healthy Food Policies

Congjing Zhang, Ruoxuan Bao, Jingyu Li, Yoav Ackerman, Shuai Huang, Yanfang Su

详情
英文摘要

Current Large Language Model (LLM) approaches for information extraction (IE) in the healthy food policy domain are often hindered by various factors, including misinformation, specifically hallucinations, misclassifications, and omissions that result from the structural diversity and inconsistency of policy documents. To address these limitations, this study proposes a role-based LLM framework that automates the IE from unstructured policy data by assigning specialized roles: an LLM policy analyst for metadata and mechanism classification, an LLM legal strategy specialist for identifying complex legal approaches, and an LLM food system expert for categorizing food system stages. This framework mimics expert analysis workflows by incorporating structured domain knowledge, including explicit definitions of legal mechanisms and classification criteria, into role-specific prompts. We evaluate the framework using 608 healthy food policies from the Healthy Food Policy Project (HFPP) database, comparing its performance against zero-shot, few-shot, and chain-of-thought (CoT) baselines using Llama-3.3-70B. Our proposed framework demonstrates superior performance in complex reasoning tasks, offering a reliable and transparent methodology for automating IE from health policies.

2604.01526 2026-04-03 cs.LG

Learning ECG Image Representations via Dual Physiological-Aware Alignments

Hung Manh Pham, Jialu Tang, Aaqib Saeed, Dong Ma, Bin Zhu, Pan Zhou

详情
英文摘要

Electrocardiograms (ECGs) are among the most widely used diagnostic tools for cardiovascular diseases, and a large amount of ECG data worldwide appears only in image form. However, most existing automated ECG analysis methods rely on access to raw signal recordings, limiting their applicability in real-world and resource-constrained settings. In this paper, we present ECG-Scan, a self-supervised framework for learning clinically generalized representations from ECG images through dual physiological-aware alignments: 1) Our approach optimizes image representation learning using multimodal contrastive alignment between image and gold-standard signal-text modalities. 2) We further integrate domain knowledge via soft-lead constraints, regularizing the reconstruction process and improving signal lead inter-consistency. Extensive benchmarking across multiple datasets and downstream tasks demonstrates that our image-based model achieves superior performance compared to existing image baselines and notably narrows the gap between ECG image and signal analysis. These results highlight the potential of self-supervised image modeling to unlock large-scale legacy ECG data and broaden access to automated cardiovascular diagnostics.

2604.01523 2026-04-03 cs.RO

Robust Autonomous Control of a Magnetic Millirobot in In Vitro Cardiac Flow

Anuruddha Bhattacharjee, Xinhao Chen, Lamar O. Mair, Suraj Raval, Yancy Diaz-Mercado, Axel Krieger

详情
英文摘要

Untethered magnetic millirobots offer significant potential for minimally invasive cardiac therapies; however, achieving reliable autonomous control in pulsatile cardiac flow remains challenging. This work presents a vision-guided control framework enabling precise autonomous navigation of a magnetic millirobot in an in vitro heart phantom under physiologically relevant flow conditions. The system integrates UNet-based localization, A* path planning, and a sliding mode controller with a disturbance observer (SMC-DOB) designed for multi-coil electromagnetic actuation. Although drag forces are estimated using steady-state CFD simulations, the controller compensates for transient pulsatile disturbances during closed-loop operation. In static fluid, the SMC-DOB achieved sub-millimeter accuracy (root-mean-square error, RMSE = 0.49 mm), outperforming PID and MPC baselines. Under moderate pulsatile flow (7 cm/s peak, 20 cP), it reduced RMSE by 37% and peak error by 2.4$\times$ compared to PID. It further maintained RMSE below 2 mm (0.27 body lengths) under elevated pulsatile flow (10 cm/s peak, 20 cP) and under low-viscosity conditions (4.3 cP, 7 cm/s peak), where baseline controllers exhibited unstable or failed tracking. These results demonstrate robust closed-loop magnetic control under time-varying cardiac flow disturbances and support the feasibility of autonomous millirobot navigation for targeted drug delivery.

2604.01520 2026-04-03 cs.AI

LLM Agents as Social Scientists: A Human-AI Collaborative Platform for Social Science Automation

Lei Wang, Yuanzi Li, Jinchao Wu, Heyang Gao, Xiaohe Bo, Xu Chen, Ji-Rong Wen

详情
英文摘要

Traditional social science research often requires designing complex experiments across vast methodological spaces and depends on real human participants, making it labor-intensive, costly, and difficult to scale. Here we present S-Researcher, an LLM-agent-based platform that assists researchers in conducting social science research more efficiently and at greater scale by "siliconizing" both the research process and the participant pool. To build S-Researcher, we first develop YuLan-OneSim, a large-scale social simulation system designed around three core requirements: generality via auto-programming from natural language to executable scenarios, scalability via a distributed architecture supporting up to 100,000 concurrent agents, and reliability via feedback-driven LLM fine-tuning. Leveraging this system, S-Researcher supports researchers in designing social experiments, simulating human behavior with LLM agents, analyzing results, and generating reports, forming a complete human-AI collaborative research loop in which researchers retain oversight and intervention at every stage. We operationalize LLM simulation research paradigms into three canonical reasoning modes (induction, deduction, and abduction) and validate S-Researcher through systematic case studies: inductive reproduction of cultural dynamics consistent with Axelrod's theory, deductive testing of competing hypotheses on teacher attention validated against survey data, and abductive identification of a cooperation mechanism in public goods games confirmed by human experiments. S-Researcher establishes a new human--AI collaborative paradigm for social science, in which computational simulation augments human researchers to accelerate discovery across the full spectrum of social inquiry.

2604.01514 2026-04-03 cs.CL cs.CV

Why Instruction-Based Unlearning Fails in Diffusion Models?

Zeliang Zhang, Rui Sun, Jiani Liu, Qi Wu, Chenliang Xu

详情
英文摘要

Instruction-based unlearning has proven effective for modifying the behavior of large language models at inference time, but whether this paradigm extends to other generative models remains unclear. In this work, we investigate instruction-based unlearning in diffusion-based image generation models and show, through controlled experiments across multiple concepts and prompt variants, that diffusion models systematically fail to suppress targeted concepts when guided solely by natural-language unlearning instructions. By analyzing both the CLIP text encoder and cross-attention dynamics during the denoising process, we find that unlearning instructions do not induce sustained reductions in attention to the targeted concept tokens, causing the targeted concept representations to persist throughout generation. These results reveal a fundamental limitation of prompt-level instruction in diffusion models and suggest that effective unlearning requires interventions beyond inference-time language control.

2604.01506 2026-04-03 cs.LG

Beyond Logit Adjustment: A Residual Decomposition Framework for Long-Tailed Reranking

Zhanliang Wang, Hongzhuo Chen, Quan Minh Nguyen, Mian Umair Ahsan, Kai Wang

Comments Preprint

详情
英文摘要

Long-tailed classification, where a small number of frequent classes dominate many rare ones, remains challenging because models systematically favor frequent classes at inference time. Existing post-hoc methods such as logit adjustment address this by adding a fixed classwise offset to the base-model logits. However, the correction required to restore the relative ranking of two classes need not be constant across inputs, and a fixed offset cannot adapt to such variation. We study this problem through Bayes-optimal reranking on a base-model top-k shortlist. The gap between the optimal score and the base score, the residual correction, decomposes into a classwise component that is constant within each class, and a pairwise component that depends on the input and competing labels. When the residual is purely classwise, a fixed offset suffices to recover the Bayes-optimal ordering. We further show that when the same label pair induces incompatible ordering constraints across contexts, no fixed offset can achieve this recovery. This decomposition leads to testable predictions regarding when pairwise correction can improve performance and when cannot. We develop REPAIR (Reranking via Pairwise residual correction), a lightweight post-hoc reranker that combines a shrinkage-stabilized classwise term with a linear pairwise term driven by competition features on the shortlist. Experiments on five benchmarks spanning image classification, species recognition, scene recognition, and rare disease diagnosis confirm that the decomposition explains where pairwise correction helps and where classwise correction alone suffices.

2604.01504 2026-04-03 cs.CL cs.AI cs.CY

Magic, Madness, Heaven, Sin: LLM Output Diversity is Everything, Everywhere, All at Once

Harnoor Dhingra

Comments Under review

详情
英文摘要

Research on Large Language Models (LLMs) studies output variation across generation, reasoning, alignment, and representational analysis, often under the umbrella of "diversity." Yet the terminology remains fragmented, largely because the normative objectives underlying tasks are rarely made explicit. We introduce the Magic, Madness, Heaven, Sin framework, which models output variation along a homogeneity-heterogeneity axis, where valuation is determined by the task and its normative objective. We organize tasks into four normative contexts: epistemic (factuality), interactional (user utility), societal (representation), and safety (robustness). For each, we examine the failure modes and vocabulary such as hallucination, mode collapse, bias, and erasure through which variation is studied. We apply the framework to analyze all pairwise cross-contextual interactions, revealing that optimizing for one objective, such as improving safety, can inadvertently harm demographic representation or creative diversity. We argue for context-aware evaluation of output variation, reframing it as a property shaped by task objectives rather than a model's intrinsic trait.

2604.01499 2026-04-03 cs.LG

Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-Training

William Hoy, Binxu Wang, Xu Pan

详情
英文摘要

Evolution Strategies (ES) have emerged as a scalable gradient-free alternative to reinforcement learning based LLM fine-tuning, but it remains unclear whether comparable task performance implies comparable solutions in parameter space. We compare ES and Group Relative Policy Optimization (GRPO) across four tasks in both single-task and sequential continual-learning settings. ES matches or exceeds GRPO in single-task accuracy and remains competitive sequentially when its iteration budget is controlled. Despite this similarity in task performance, the two methods produce markedly different model updates: ES makes much larger changes and induces broader off-task KL drift, whereas GRPO makes smaller, more localized updates. Strikingly, the ES and GRPO solutions are linearly connected with no loss barrier, even though their update directions are nearly orthogonal. We develop an analytical theory of ES that explains all these phenomena within a unified framework, showing how ES can accumulate large off-task movement on weakly informative directions while still making enough progress on the task to match gradient-based RL in downstream accuracy. These results show that gradient-free and gradient-based fine-tuning can reach similarly accurate yet geometrically distinct solutions, with important consequences for forgetting and knowledge preservation. The source code is publicly available: https://github.com/Bhoy1/ESvsGRPO.

2604.01490 2026-04-03 cs.RO

Distal-Stable Beam for Continuum Robots

Ryouichi Saito, Takahiro Koide, Yuya Tanaka, Yasutaka Nakashima, Motoji Yamamoto, Ayato Kanada

Comments 8 pages, 7 figures

详情
英文摘要

Continuum robots are well suited for constrained environments but suffer from low distal stiffness, resulting in large posture errors under external loads. In this paper, we propose a novel structural primitive, the Distal-Stable Beam, which achieves a strong stiffness gradient through purely geometric design, maintaining compliance in the intermediate section while ensuring high distal rigidity. The structure consists of two parallel rods and one convergent rod constrained by guide disks, introducing geometric coupling that suppresses deformation modes and preserves distal posture. Experiments show that the distal stiffness is 12 times higher than at the center, corresponding to an approximately 100-fold improvement over a conventional cantilever beam. The proposed mechanism enables simultaneous compliance and distal stability without active stiffness modulation, providing a new design approach for continuum robots requiring both safety and precision.

2604.01481 2026-04-03 cs.LG cs.AI

DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data

Arshia Ilaty, Hossein Shirazi, Amir Rahmani, Hajar Homayouni

详情
英文摘要

The development of robust clinical decision support systems is frequently impeded by the scarcity of high-fidelity, privacy-preserving biomedical data. While Generative Large Language Models (LLMs) offer a promising avenue for synthetic data generation, they often struggle to capture the complex, non-linear dependencies and severe class imbalances inherent in Electronic Health Records (EHR), leading to statistically plausible but clinically invalid records. To bridge this gap, we introduce DISCO-TAB (DIScriminator-guided COntrol for TABular synthesis), a novel framework that orchestrates a fine-tuned LLM with a multi-objective discriminator system optimized via Reinforcement Learning. Unlike prior methods relying on scalar feedback, DISCO-TAB evaluates synthesis at four granularities, token, sentence, feature, and row, while integrating Automated Constraint Discovery and Inverse-Frequency Reward Shaping to autonomously preserve latent medical logic and resolve minority-class collapse. We rigorously validate our framework across diverse benchmarks, including high-dimensional, small-sample medical datasets (e.g., Heart Failure, Parkinson's). Our results demonstrate that hierarchical feedback yields state-of-the-art performance, achieving up to 38.2% improvement in downstream clinical classifier utility compared to GAN and Diffusion baselines, while ensuring exceptional statistical fidelity (JSD < 0.01) and robust resistance to membership inference attacks. This work establishes a new standard for generating trustworthy, utility-preserving synthetic tabular data for sensitive healthcare applications.

2604.01480 2026-04-03 cs.AI physics.comp-ph

A Self-Evolving Agentic Framework for Metasurface Inverse Design

Yi Huang, Bowen Zheng, Yunxi Dong, Hong Tang, Huan Zhao, S. M. Rakibul Hasan Shawon, Hualiang Zhang

详情
英文摘要

Metasurface inverse design has become central to realizing complex optical functionality, yet translating target responses into executable, solver-compatible workflows still demands specialized expertise in computational electromagnetics and solver-specific software engineering. Recent large language models (LLMs) offer a complementary route to reducing this workflow-construction burden, but existing language-driven systems remain largely session-bounded and do not preserve reusable workflow knowledge across inverse-design tasks. We present an agentic framework for metasurface inverse design that addresses this limitation through context-level skill evolution. The framework couples a coding agent, evolving skill artifacts, and a deterministic evaluator grounded in physical simulation so that solver-specific strategies can be iteratively refined across tasks without modifying model weights or the underlying physics solver. We evaluate the framework on a benchmark spanning multiple metasurface inverse-design task types, with separate training-aligned and held-out task families. Evolved skills raise in-distribution task success from 38% to 74%, increase criteria pass fraction from 0.510 to 0.870, and reduce average attempts from 4.10 to 2.30. On held-out task families, binary success changes only marginally, but improvements in best margin together with shifts in error composition and agent behavior indicate partial transfer of workflow knowledge. These results suggest that the main value of skill evolution lies in accumulating reusable solver-specific expertise around reliable computational engines, thereby offering a practical path toward more autonomous and accessible metasurface inverse-design workflows.

2604.01477 2026-04-03 cs.LG cs.SY eess.SY

Soft MPCritic: Amortized Model Predictive Value Iteration

Thomas Banker, Nathan P. Lawrence, Ali Mesbah

Comments submitted to CDC 2026

详情
英文摘要

Reinforcement learning (RL) and model predictive control (MPC) offer complementary strengths, yet combining them at scale remains computationally challenging. We propose soft MPCritic, an RL-MPC framework that learns in (soft) value space while using sample-based planning for both online control and value target generation. soft MPCritic instantiates MPC through model predictive path integral control (MPPI) and trains a terminal Q-function with fitted value iteration, aligning the learned value function with the planner and implicitly extending the effective planning horizon. We introduce an amortized warm-start strategy that recycles planned open-loop action sequences from online observations when computing batched MPPI-based value targets. This makes soft MPCritic computationally practical, while preserving solution quality. soft MPCritic plans in a scenario-based fashion with an ensemble of dynamic models trained for next-step prediction accuracy. Together, these ingredients enable soft MPCritic to learn effectively through robust, short-horizon planning on classic and complex control tasks. These results establish soft MPCritic as a practical and scalable blueprint for synthesizing MPC policies in settings where policy extraction and direct, long-horizon planning may fail.

2604.01476 2026-04-03 cs.LG cs.CL

When Reward Hacking Rebounds: Understanding and Mitigating It with Representation-Level Signals

Rui Wu, Ruixiang Tang

Comments 15 pages, 8 figures

详情
英文摘要

Reinforcement learning for LLMs is vulnerable to reward hacking, where models exploit shortcuts to maximize reward without solving the intended task. We systematically study this phenomenon in coding tasks using an environment-manipulation setting, where models can rewrite evaluator code to trivially pass tests without solving the task, as a controlled testbed. Across both studied models, we identify a reproducible three-phase rebound pattern: models first attempt to rewrite the evaluator but fail, as their rewrites embed test cases their own solutions cannot pass. They then temporarily retreat to legitimate solving. When legitimate reward remains scarce, they rebound into successful hacking with qualitatively different strategies. Using representation engineering, we extract concept directions for shortcut, deception, and evaluation awareness from domain-general contrastive pairs and find that the shortcut direction tracks hacking behavior most closely, making it an effective representational proxy for detection. Motivated by this finding, we propose Advantage Modification, which integrates shortcut concept scores into GRPO advantage computation to penalize hacking rollouts before policy updates. Because the penalty is internalized into the training signal rather than applied only at inference time, Advantage Modification provides more robust suppression of hacking compared with generation-time activation steering.

2604.01474 2026-04-03 cs.CV cs.LG

Prime Once, then Reprogram Locally: An Efficient Alternative to Black-Box Service Model Adaptation

Yunbei Zhang, Chengyi Cai, Feng Liu, Jihun Hamm

Comments CVPR 2026

详情
英文摘要

Adapting closed-box service models (i.e., APIs) for target tasks typically relies on reprogramming via Zeroth-Order Optimization (ZOO). However, this standard strategy is known for extensive, costly API calls and often suffers from slow, unstable optimization. Furthermore, we observe that this paradigm faces new challenges with modern APIs (e.g., GPT-4o). These models can be less sensitive to the input perturbations ZOO relies on, thereby hindering performance gains. To address these limitations, we propose an Alternative efficient Reprogramming approach for Service models (AReS). Instead of direct, continuous closed-box optimization, AReS initiates a single-pass interaction with the service API to prime an amenable local pre-trained encoder. This priming stage trains only a lightweight layer on top of the local encoder, making it highly receptive to the subsequent glass-box (white-box) reprogramming stage performed directly on the local model. Consequently, all subsequent adaptation and inference rely solely on this local proxy, eliminating all further API costs. Experiments demonstrate AReS's effectiveness where prior ZOO-based methods struggle: on GPT-4o, AReS achieves a +27.8% gain over the zero-shot baseline, a task where ZOO-based methods provide little to no improvement. Broadly, across ten diverse datasets, AReS outperforms state-of-the-art methods (+2.5% for VLMs, +15.6% for standard VMs) while reducing API calls by over 99.99%. AReS thus provides a robust and practical solution for adapting modern closed-box models.

2604.01467 2026-04-03 cs.CL cs.AI

A Dynamic Atlas of Persian Poetic Symbolism: Families, Fields, and the Historical Rewiring of Meaning

Kourosh Shahnazari, Seyed Moein Ayyoubzadeh, Mohammadali Keshtparvar

详情
英文摘要

Persian poetry is often remembered through recurrent symbols before it is remembered through plot. Wine vessels, gardens, flames, sacred titles, bodily beauty, and courtly names return across centuries, yet computational work still tends to flatten this material into isolated words or broad document semantics. That misses a practical unit of organization in Persian poetics: related forms travel as families and gain force through recurring relations. Using a corpus of 129,451 poems, we consolidate recurrent forms into traceable families, separate imagistic material from sacred and courtly reference, and map their relations in a multi-layer graph. The symbolic core is relatively sparse, the referential component much denser, and the attachment zone between them selective rather than diffuse. Across 11 Hijri-century bins, some families remain widely distributed, especially Shab (Night), Ruz (Day), and Khaak (Earth). Wine vessels, garden space, flame, and lyric sound strengthen later, while prestige-coded and heroic-courtly vocabulary is weighted earlier. Century-specific graphs show change in arrangement as well as membership. Modularity rises, cross-scope linkage declines, courtly bridges weaken, and sacred bridges strengthen. Hub positions shift too: Kherqe (Sufi Robe) gains late prominence, Farkhondeh {Blessed} and Banafsheh (Violet) recede, and Saaghar (Wine Cup) stays central across the chronology. In this corpus, Persian symbolism appears less as a fixed repertory than as a long-lived system whose internal weights and connections change over time.

2604.01466 2026-04-03 cs.RO cs.CV cs.LG

Efficient Equivariant Transformer for Self-Driving Agent Modeling

Scott Xu, Dian Chen, Kelvin Wong, Chris Zhang, Kion Fallah, Raquel Urtasun

Comments CVPR 2026

详情
英文摘要

Accurately modeling agent behaviors is an important task in self-driving. It is also a task with many symmetries, such as equivariance to the order of agents and objects in the scene or equivariance to arbitrary roto-translations of the entire scene as a whole; i.e., SE(2)-equivariance. The transformer architecture is a ubiquitous tool for modeling these symmetries. While standard self-attention is inherently permutation equivariant, explicit pairwise relative positional encodings have been the standard for introducing SE(2)-equivariance. However, this approach introduces an additional cost that is quadratic in the number of agents, limiting its scalability to larger scenes and batch sizes. In this work, we propose DriveGATr, a novel transformer-based architecture for agent modeling that achieves SE(2)-equivariance without the computational cost of existing methods. Inspired by recent advances in geometric deep learning, DriveGATr encodes scene elements as multivectors in the 2D projective geometric algebra $\mathbb{R}^*_{2,0,1}$ and processes them with a stack of equivariant transformer blocks. Crucially, DriveGATr models geometric relationships using standard attention between multivectors, eliminating the need for costly explicit pairwise relative positional encodings. Experiments on the Waymo Open Motion Dataset demonstrate that DriveGATr is comparable to the state-of-the-art in traffic simulation and establishes a superior Pareto front for performance vs computational cost.

2604.01461 2026-04-03 cs.AI

Reducing Hallucinations in LLM-based Scientific Literature Analysis Using Peer Context Outlier Detection

Daniel Xie, Maxwell J. Jacobson, Adil Wazeer, Haiyan Wang, Xinghang Zhang, Yexiang Xue

详情
英文摘要

Reducing hallucinations in Large Language Models (LLMs) is essential for improving the accuracy of data extraction from large text corpora. Current methods, like prompt engineering and chain-of-thought prompting, focus on individual documents but fail to consider relationships across a corpus. This paper introduces Peer Context Outlier Detection (P-COD), a novel approach that uses the relationships between documents to improve extraction accuracy. Our application domain is in scientific literature summarization, where papers with similar experiment settings should draw similar conclusions. By comparing extracted data to validated peer information within the corpus, we adjust confidence scores and flag low-confidence results for expert review. High-confidence results, supported by peer validation, are considered reliable. Our experiments demonstrate up to 98% precision in outlier detection across 6 domains of science, demonstrating that our design reduces hallucinations, enhances trust in automated systems, and allows researchers to focus on ambiguous cases, streamlining the data extraction workflows.

2604.01460 2026-04-03 cs.CV

Reinforcing Consistency in Video MLLMs with Structured Rewards

Yihao Quan, Zeru Shi, Jinman Zhao, Ruixiang Tang

详情
英文摘要

Multimodal large language models (MLLMs) have achieved remarkable progress in video understanding. However, seemingly plausible outputs often suffer from poor visual and temporal grounding: a model may fabricate object existence, assign incorrect attributes, or collapse repeated events while still producing a globally reasonable caption or answer. We study this failure mode through a compositional consistency audit that decomposes a caption into supporting factual and temporal claims, investigating whether a correct high-level prediction is actually backed by valid lower-level evidence. Our top-down audit reveals that even correct root relational claims often lack reliable attribute and existence support. This indicates that standard sentence-level supervision is a weak proxy for faithful video understanding. Furthermore, when turning to reinforcement learning (RL) for better alignment, standard sentence-level rewards often prove too coarse to accurately localize specific grounding failures. To address this, we replace generic sentence-level rewards with a structured reward built from factual and temporal units. Our training objective integrates three complementary components: (1) an instance-aware scene-graph reward for factual objects, attributes, and relations; (2) a temporal reward for event ordering and repetition; and (3) a video-grounded VQA reward for hierarchical self-verification. Across temporal, general video understanding, and hallucination-oriented benchmarks, this objective yields consistent gains on open-source backbones. These results suggest that structured reward shaping is a practical route to more faithful video understanding.

2604.01457 2026-04-03 cs.CL

Wired for Overconfidence: A Mechanistic Perspective on Inflated Verbalized Confidence in LLMs

Tianyi Zhao, Yinhan He, Wendy Zheng, Yujie Zhang, Chen Chen

详情
英文摘要

Large language models are often not just wrong, but \emph{confidently wrong}: when they produce factually incorrect answers, they tend to verbalize overly high confidence rather than signal uncertainty. Such verbalized overconfidence can mislead users and weaken confidence scores as a reliable uncertainty signal, yet its internal mechanisms remain poorly understood. We present a circuit-level mechanistic analysis of this inflated verbalized confidence in LLMs, organized around three axes: capturing verbalized confidence as a differentiable internal signal, identifying the circuits that causally inflate it, and leveraging these insights for targeted inference-time recalibration. Across two instruction-tuned LLMs on three datasets, we find that a compact set of MLP blocks and attention heads, concentrated in middle-to-late layers, consistently writes the confidence-inflation signal at the final token position. We further show that targeted inference-time interventions on these circuits substantially improve calibration. Together, our results suggest that verbalized overconfidence in LLMs is driven by identifiable internal circuits and can be mitigated through targeted intervention.

2604.01453 2026-04-03 cs.CV

Nonlinear Methods for Analyzing Pose in Behavioral Research

Carter Sale, Margaret C. Macpherson, Gaurav Patil, Kelly Miles, Rachel W. Kallen, Sebastian Wallot, Michael J. Richardson

Comments 40 pages, 13 figures

详情
英文摘要

Advances in markerless pose estimation have made it possible to capture detailed human movement in naturalistic settings using standard video, enabling new forms of behavioral analysis at scale. However, the high dimensionality, noise, and temporal complexity of pose data raise significant challenges for extracting meaningful patterns of coordination and behavioral change. This paper presents a general-purpose analysis pipeline for human pose data, designed to support both linear and nonlinear characterizations of movement across diverse experimental contexts. The pipeline combines principled preprocessing, dimensionality reduction, and recurrence-based time series analysis to quantify the temporal structure of movement dynamics. To illustrate the pipeline's flexibility, we present three case studies spanning facial and full-body movement, 2D and 3D data, and individual versus multi-agent behavior. Together, these examples demonstrate how the same analytic workflow can be adapted to extract theoretically meaningful insights from complex pose time series.