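arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.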

2603.08234 2026-03-10 cs.AI cs.LG

The Struggle Between Continuation and Refusal: A Mechanistic Analysis of the Continuation-Triggered Jailbreak in LLMs

Yonghong Deng, Zhen Yang, Ping Jian, Xinyue Zhang, Zhongbin Guo, Chengzhi Li

English abstract

With the rapid advancement of large language models (LLMs), the safety of LLMs has become a critical concern. Despite significant efforts in safety alignment, current LLMs remain vulnerable to jailbreaking attacks. However, the root causes of such vulnerabilities are still poorly understood, necessitating a rigorous investigation into jailbreak mechanisms across both academic and industrial communities. In this work, we focus on a continuation-triggered jailbreak phenomenon, whereby simply relocating a continuation-triggered instruction suffix can substantially increase jailbreak success rates. To uncover the intrinsic mechanisms of this phenomenon, we conduct a comprehensive mechanistic interpretability analysis at the level of attention heads. Through causal interventions and activation scaling, we show that this jailbreak behavior primarily arises from an inherent competition between the model's intrinsic continuation drive and the safety defenses acquired through alignment training. Furthermore, we perform a detailed behavioral analysis of the identified safety-critical attention heads, revealing notable differences in the functions and behaviors of safety heads across different model architectures. These findings provide a novel mechanistic perspective for understanding and interpreting jailbreak behaviors in LLMs, offering both theoretical insights and practical implications for improving model safety.
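
The activation scaling mentioned in the abstract can be illustrated with a toy example. This is only a minimal sketch, not the authors' code: it assumes each attention head contributes an output vector that is summed into the residual stream, and the names `combine_heads` and `scale_head` are hypothetical. Scaling one head's contribution by a factor `alpha` (0 removes it, values above 1 amplify it) is one way to probe how much a given head matters.

```python
# Toy illustration (assumption): head outputs are plain lists of floats
# that are summed to form the layer's contribution to the residual stream.

def combine_heads(head_outputs, scales=None):
    """Sum per-head output vectors, multiplying each head by its scale."""
    n_heads = len(head_outputs)
    if scales is None:
        scales = [1.0] * n_heads
    dim = len(head_outputs[0])
    combined = [0.0] * dim
    for out, s in zip(head_outputs, scales):
        for i in range(dim):
            combined[i] += s * out[i]
    return combined


def scale_head(n_heads, head_index, alpha):
    """Build a scale vector that rescales only one head by alpha."""
    scales = [1.0] * n_heads
    scales[head_index] = alpha
    return scales


# Three hypothetical heads with two-dimensional outputs.
heads = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]

baseline = combine_heads(heads)                          # all heads unchanged
ablated = combine_heads(heads, scale_head(3, 1, 0.0))    # head 1 switched off
amplified = combine_heads(heads, scale_head(3, 1, 2.0))  # head 1 doubled
```

Comparing `baseline`, `ablated`, and `amplified` shows how much of the combined output depends on head 1. In a real model the same comparison would be made on downstream behavior, such as refusal versus continuation.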

2603.08232 2026-03-10 cs.RO

A General Lie-Group Framework for Continuum Soft Robot Modeling

Lingxiao Xun, Benoît Rosa, Jérôme Szewczyk, Brahim Tamadazte

2603.08230 2026-03-10 cs.SD cs.AI eess.AS

Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction

Xiaofeng Yu, Jiaheng Dong, Jean Honorio, Abhirup Ghosh, Hong Jia, Ting Dang

Comments The paper was submitted to Interspeech for review

2603.08228 2026-03-10 cs.CV

GarmentPainter: Efficient 3D Garment Texture Synthesis with Character-Guided Diffusion Model

Jinbo Wu, Xiaobo Gao, Xing Liu, Chen Zhao, Jialun Liu

2603.08227 2026-03-10 cs.CV

SRNeRV: A Scale-wise Recursive Framework for Neural Video Representation

Jia Wang, Jun Zhu, Xinfeng Zhang

Comments Accepted by IEEE ISCAS 2026

2603.08219 2026-03-10 cs.LG

Wiener Chaos Expansion based Neural Operator for Singular Stochastic Partial Differential Equations

Dai Shi, Luke Thompson, Andi Han, Peiyan Hu, Junbin Gao, José Miguel Hernández-Lobato

2603.08211 2026-03-10 cs.LG cs.AI

Revisiting Gradient Staleness: Evaluating Distance Metrics for Asynchronous Federated Learning Aggregation

Patrick Wilhelm, Odej Kao

2603.08208 2026-03-10 cs.CV cs.AI

Alignment-Aware and Reliability-Gated Multimodal Fusion for Unmanned Aerial Vehicle Detection Across Heterogeneous Thermal-Visual Sensors

Ishrat Jahan, Molla E Majid, M Murugappan, Muhammad E. H. Chowdhury, N. B. Prakash, Saad Bin Abul Kashem, Balamurugan Balusamy, Amith Khandakar

2603.08202 2026-03-10 cs.CV cs.AI

MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data

Siarhei Sheludzko, Dhimitrios Duka, Bernt Schiele, Hilde Kuehne, Anna Kukleva

Comments 18 pages, 11 figures. Accepted at WACV 2026

2603.08199 2026-03-10 cs.CV cs.RO

Fusion-Poly: A Polyhedral Framework Based on Spatial-Temporal Fusion for 3D Multi-Object Tracking

Xian Wu, Yitao Wu, Xiaoyu Li, Zijia Li, Lijun Zhao, Lining Sun

English abstract

LiDAR-camera 3D multi-object tracking (MOT) combines rich visual semantics with accurate depth cues to improve trajectory consistency and tracking reliability. In practice, however, LiDAR and cameras operate at different sampling rates. To maintain temporal alignment, existing data pipelines usually synchronize heterogeneous sensor streams and annotate them at a reduced shared frequency, forcing most prior methods to perform spatial fusion only at synchronized timestamps through projection-based or learnable cross-sensor association. As a result, abundant asynchronous observations remain underexploited, despite their potential to support more frequent association and more robust trajectory estimation over short temporal intervals. To address this limitation, we propose Fusion-Poly, a spatial-temporal fusion framework for 3D MOT that integrates asynchronous LiDAR and camera data. Fusion-Poly associates trajectories with multi-modal observations at synchronized timestamps and with single-modal observations at asynchronous timestamps, enabling higher-frequency updates of motion and existence states. The framework contains three key components: a frequency-aware cascade matching module that adapts to synchronized and asynchronous frames according to available detection modalities; a frequency-aware trajectory estimation module that maintains trajectories through high-frequency motion prediction, differential updates, and confidence-calibrated lifecycle management; and a full-state observation alignment module that improves cross-modal consistency at synchronized timestamps by optimizing image-projection errors. On the nuScenes test set, Fusion-Poly achieves 76.5% AMOTA, establishing a new state of the art among tracking-by-detection 3D MOT methods. Extensive ablation studies further validate the effectiveness of each component. Code will be released.

2603.08195 2026-03-10 cs.CL

Supporting Workflow Reproducibility by Linking Bioinformatics Tools across Papers and Executable Code

Clémence Sebe, Olivier Ferret, Aurélie Névéol, Mahdi Esmailoghli, Ulf Leser, Sarah Cohen-Boulakia

2603.08188 2026-03-10 cs.LG

Sequential Service Region Design with Capacity-Constrained Investment and Spillover Effect

Tingting Chen, Feng Chu, Jiantong Zhang

2603.08185 2026-03-10 cs.LG

SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization

Yeonsik Park, Hyeonseong Kim, Seungkyu Choi

Comments 21 pages, 4 figures

2603.08181 2026-03-10 cs.LG

AutoAdapt: An Automated Domain Adaptation Framework for LLMs

Sidharth Sinha, Anson Bastos, Xuchao Zhang, Akshay Nambi, Chetan Bansal, Saravan Rajmohan

2603.08180 2026-03-10 cs.CV cs.LG

ALOOD: Exploiting Language Representations for LiDAR-based Out-of-Distribution Object Detection

Michael Kösel, Marcel Schreiber, Michael Ulrich, Claudius Gläser, Klaus Dietmayer

Comments Accepted for publication at the 2025 IEEE Intelligent Transportation Systems Conference (ITSC)

2603.07582 2026-03-10 cs.RO

Model-Based and Neural-Aided Approaches for Dog Dead Reckoning

Gal Versano, Itai Savin, Itzik Klein

2603.06578 2026-03-10 cs.CV

Multimodal Large Language Models as Image Classifiers

Nikita Kisel, Illia Volkov, Klara Janouskova, Jiri Matas

2603.06572 2026-03-10 cs.CV cs.LG

SCOPE: Scene-Contextualized Incremental Few-Shot 3D Segmentation

Vishal Thengane, Zhaochong An, Tianjin Huang, Son Lam Phung, Abdesselam Bouzerdoum, Lu Yin, Na Zhao, Xiatian Zhu

Comments Accepted at CVPR 2026 (Findings)

2603.05867 2026-03-10 cs.CV

TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis

Sijing Li, Zhongwei Qiu, Jiang Liu, Wenqiao Zhang, Tianwei Lin, Yihan Xie, Jianxiang An, Boxiang Yun, Chenglin Yang, Jun Xiao, Guangyu Guo, Jiawen Yao, Wei Liu, Yuan Gao, Ke Yan, Weiwei Cao, Zhilin Zheng, Tony C. W. Mok, Kai Cao, Yu Shi, Jiuyu Zhang, Jian Zhou, Beng Chin Ooi, Yingda Xia, Ling Zhang

Comments Accepted at ICLR 2026. 10 pages + appendix

English abstract

Accurate tumor analysis is central to clinical radiology and precision oncology, where early detection, reliable lesion characterization, and pathology-level risk assessment guide diagnosis and treatment planning. Chain-of-Thought (CoT) reasoning is particularly important in this setting because it enables step-by-step interpretation from imaging findings to clinical impressions and pathology conclusions, improving traceability and reducing diagnostic errors. Here, we target the clinical tumor analysis task and build a large-scale benchmark that operationalizes a multimodal reasoning pipeline, spanning findings, impressions, and pathology predictions. We curate TumorCoT, a large-scale dataset of 1.5M CoT-labeled VQA instructions paired with 3D CT scans, with step-aligned rationales and cross-modal alignments along the trajectory from findings to impression to pathology, enabling evaluation of both answer accuracy and reasoning consistency. We further propose TumorChain, a multimodal interleaved reasoning framework that tightly couples 3D imaging encoders, clinical text understanding, and organ-level vision-language alignment. Through cross-modal alignment and iterative interleaved causal reasoning, TumorChain grounds visual evidence, aggregates conclusions, and issues pathology predictions after multiple rounds of self-refinement, improving traceability and reducing hallucination risk. Experiments show consistent improvements over strong baselines in lesion detection, impression generation, and pathology classification, and demonstrate strong generalization on the DeepTumorVQA benchmark. These results highlight the potential of multimodal reasoning for reliable and interpretable tumor analysis in clinical practice. Detailed information about our project can be found on our project homepage at https://github.com/ZJU4HealthCare/TumorChain.

2603.05768 2026-03-10 cs.LG cs.AI cs.CV

Bridging Domains through Subspace-Aware Model Merging

Levy Chaves, Chao Zhou, Rebekka Burkholz, Eduardo Valle, Sandra Avila

Comments Accepted at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026 (CVPR)

2603.05576 2026-03-10 cs.RO

Task Parameter Extrapolation via Learning Inverse Tasks from Forward Demonstrations

Serdar Bahar, Fatih Dogangun, Matteo Saveriano, Yukie Nagai, Emre Ugur

Comments Corrected author affiliation

2603.05565 2026-03-10 cs.LG cs.AI

When AI Levels the Playing Field: Skill Homogenization, Asset Concentration, and Two Regimes of Inequality

Xupeng Chen, Shuchen Meng

2603.05522 2026-03-10 cs.AI cs.CV cs.LG cs.RO

RoboLayout: Differentiable 3D Scene Generation for Embodied Agents

Ali Shamsaddinlou

2603.05437 2026-03-10 cs.CV cs.AI

SAIL: Similarity-Aware Guidance and Inter-Caption Augmentation-based Learning for Weakly-Supervised Dense Video Captioning

Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, Minju Jeon, Hyungee Kim, Dong-Jin Kim

Comments Accepted to CVPR 2026

2603.05318 2026-03-10 cs.LG cs.AI

GALACTIC: Global and Local Agnostic Counterfactuals for Time-series Clustering

Christos Fragkathoulas, Eleni Psaroudaki, Themis Palpanas, Evaggelia Pitoura

2603.04384 2026-03-10 cs.CL

AgentIR: Reasoning-Aware Retrieval for Deep Research Agents

Zijian Chen, Xueguang Ma, Shengyao Zhuang, Jimmy Lin, Akari Asai, Victor Zhong

2603.02919 2026-03-10 cs.CV cs.AI cs.LG

Interpretable Motion-Attentive Maps: Spatio-Temporally Localizing Concepts in Video Diffusion Transformers

Youngjun Jun, Seil Kang, Woojung Han, Seong Jae Hwang

Comments CVPR 2026

2603.02083 2026-03-10 cs.RO cs.CV

$π$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs

Siting Wang, Xiaofeng Wang, Zheng Zhu, Minnan Pei, Xinyu Cui, Cheng Deng, Jian Zhao, Guan Huang, Haifeng Zhang, Jun Wang

2602.22555 2026-03-10 cs.LG cs.AI

Autoregressive Visual Decoding from EEG Signals

Sicheng Dai, Hongwang Xiao, Shan Yu, Qiwei Ye

Journal ref: ICLR 2026

English abstract

Electroencephalogram (EEG) signals have become a popular medium for decoding visual information due to their cost-effectiveness and high temporal resolution. However, current approaches face significant challenges in bridging the modality gap between EEG and image data. These methods typically rely on complex adaptation processes involving multiple stages, making it hard to maintain consistency and manage compounding errors. Furthermore, the computational overhead imposed by large-scale diffusion models limits their practicality in real-world brain-computer interface (BCI) applications. In this work, we present AVDE, a lightweight and efficient framework for visual decoding from EEG signals. First, we leverage LaBraM, a pre-trained EEG model, and fine-tune it via contrastive learning to align EEG and image representations. Second, we adopt an autoregressive generative framework based on a "next-scale prediction" strategy: images are encoded into multi-scale token maps using a pre-trained VQ-VAE, and a transformer is trained to autoregressively predict finer-scale tokens starting from EEG embeddings as the coarsest representation. This design enables coherent generation while preserving a direct connection between the input EEG signals and the reconstructed images. Experiments on two datasets show that AVDE outperforms previous state-of-the-art methods in both image retrieval and reconstruction tasks, while using only 10% of the parameters. In addition, visualization of intermediate outputs shows that the generative process of AVDE reflects the hierarchical nature of human visual perception. These results highlight the potential of autoregressive models as efficient and interpretable tools for practical BCI applications.
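
The contrastive alignment step in the abstract can be illustrated with a small sketch. The abstract does not say which contrastive objective AVDE uses. The code below assumes a symmetric InfoNCE-style loss over paired EEG and image embeddings, which is a common choice but not confirmed by the source. The function names and the temperature value are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, targets, temperature=0.1):
    """Mean cross-entropy of matching anchor i to target i among all targets."""
    a = [normalize(v) for v in anchors]
    t = [normalize(v) for v in targets]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_contrastive_loss(eeg, img, temperature=0.1):
    """Average the EEG-to-image and image-to-EEG losses (assumed objective)."""
    return 0.5 * (info_nce(eeg, img, temperature) + info_nce(img, eeg, temperature))


# Hypothetical two-sample batch: row i of each list is assumed to be a pair.
eeg_embeddings = [[1.0, 0.0], [0.0, 1.0]]
matched_images = [[1.0, 0.0], [0.0, 1.0]]
mismatched_images = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = symmetric_contrastive_loss(eeg_embeddings, matched_images)
misaligned_loss = symmetric_contrastive_loss(eeg_embeddings, mismatched_images)
```

The loss is small when each EEG embedding is closest to its own image embedding and large when the pairs are swapped. That gap is the signal a fine-tuning procedure of this kind would minimize.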

2602.21772 2026-03-10 cs.SD cs.AI

UniWhisper: Efficient Continual Multi-task Training for Robust Universal Audio Representation

Yuxuan Chen, Peize He, Haoyuan Yu, Junzi Zhang