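arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.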

2603.05773 2026-03-16 cs.CR cs.AI cs.LG

Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models

Jinman Wu, Yi Xie, Shen Lin, Shiqian Zhao, Xiaofeng Chen

Abstract

Safety alignment is often conceptualized as a monolithic process wherein harmfulness detection automatically triggers refusal. However, the persistence of jailbreak attacks suggests a fundamental mechanistic decoupling. We propose the Disentangled Safety Hypothesis (DSH), positing that safety computation operates on two distinct subspaces: a Recognition Axis ($\mathbf{v}_H$, "Knowing") and an Execution Axis ($\mathbf{v}_R$, "Acting"). Our geometric analysis reveals a universal "Reflex-to-Dissociation" evolution, where these signals transition from antagonistic entanglement in early layers to structural independence in deep layers. To validate this, we introduce Double-Difference Extraction and Adaptive Causal Steering. Using our curated AmbiguityBench, we demonstrate a causal double dissociation, effectively creating a state of "Knowing without Acting." Crucially, we leverage this disentanglement to propose the Refusal Erasure Attack (REA), which achieves State-of-the-Art attack success rates by surgically lobotomizing the refusal mechanism. Furthermore, we uncover a critical architectural divergence, contrasting the Explicit Semantic Control of Llama3.1 with the Latent Distributed Control of Qwen2.5. The code and dataset are available at https://anonymous.4open.science/r/DSH.

2603.05772 2026-03-16 cs.CR cs.AI

Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads

Jinman Wu, Yi Xie, Shiqian Zhao, Xiaofeng Chen

2602.21130 2026-03-16 stat.ML cs.LG

An Enhanced Projection Pursuit Tree Classifier with Visual Methods for Assessing Algorithmic Improvements

Natalia da Silva, Dianne Cook, Eun-Kyung Lee

2602.13165 2026-03-16 cs.IR cs.AI

Asynchronous Verified Semantic Caching for Tiered LLM Architectures

Asmit Kumar Singh, Haozhe Wang, Laxmi Naga Santosh Attaluri, Tak Chiam, Weihua Zhu

2602.05474 2026-03-16 cs.IR cs.AI

LLM-driven Multimodal Recommendation

Yicheng Di

Comments There are some writing errors in our methods section that need to be corrected. We will then add extensive experiments and rewrite the Introduction and related work sections

2601.18113 2026-03-16 cs.CR cs.AI

MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

Dezhang Kong, Zhuxi Wu, Shiqi Liu, Zhicheng Tan, Kuichen Lu, Minghao Li, Qichen Liu, Shengyu Chu, Zhenhua Xu, Xuan Liu, Meng Han

2601.17907 2026-03-16 cs.CR cs.LG

FARM: Few-shot Adaptive Malware Family Classification under Concept Drift

Numan Halit Guldemir, Oluwafemi Olukoya, Jesús Martínez-del-Rincón

Comments This work is currently under review for journal publication

2601.15369 2026-03-16 eess.IV cs.AI

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Letian Zhang, Sucheng Ren, Yanqing Liu, Xianhang Li, Zeyu Wang, Yuyin Zhou, Huaxiu Yao, Zeyu Zheng, Weili Nie, Guilin Liu, Zhiding Yu, Cihang Xie

2601.08697 2026-03-16 cs.HC cs.AI

Auditing Student-AI Collaboration: A Case Study of Online Graduate CS Students

Nifu Dan

2601.04478 2026-03-16 eess.SP cs.LG

Prediction of Cellular Malignancy Using Electrical Impedance Signatures and Supervised Machine Learning

Shadeeb Hossain

2512.04120 2026-03-16 cs.CR cs.AI cs.CL cs.CY cs.DB cs.IR

Towards Contextual Sensitive Data Detection

Liang Telkamp, Madelon Hulsebos

2511.02620 2026-03-16 cs.CR cs.LG

Verifying LLM Inference to Detect Model Weight Exfiltration

Roy Rinberg, Adam Karvonen, Alexander Hoover, Daniel Reuter, Keri Warr

2510.24534 2026-03-16 quant-ph cs.AI cs.CR

Quantum-Resistant Networks Using Post-Quantum Cryptography

Xin Jin, Nitish Kumar Chandra, Mohadeseh Azari, Kaushik P. Seshadreesan, Junyu Liu

Comments Submission for 2025 IEEE Workshop on Quantum IntelLigence, Learning & Security (QUILLS), https://sites.google.com/view/quills2025/home

2510.00208 2026-03-16 eess.SY cs.RO cs.SY math.OC

Robust Attitude Control of Nonlinear UAV Dynamics with LFT Models and $\mathcal{H}_\infty$ Performance

Tanay Kumar, Raktim Bhattacharya

Comments 6 pages, 6 figures, 3 tables, submitted to ACC 2026

2509.22355 2026-03-16 quant-ph cs.LG

Multi-channel convolutional neural quantum embedding

Yujin Kim, Changjae Im, Taehyun Kim, Tak Hur, Daniel K. Park

Comments 20 pages, 7 figures

2509.19881 2026-03-16 eess.AS cs.SD

MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model

The Hieu Pham, Tan Dat Nguyen, Phuong Thanh Tran, Joon Son Chung, Duc Dung Nguyen

Comments ICASSP 2026

2509.06703 2026-03-16 cs.CR cs.LG

On the (In)Security of Loading Machine Learning Models

Gabriele Digregorio, Marco Di Gennaro, Stefano Zanero, Stefano Longari, Michele Carminati

Comments Accepted to the 2026 IEEE Symposium on Security and Privacy (SP)

2509.06553 2026-03-16 eess.IV cs.CV cs.LG

Impact of Labeling Inaccuracy and Image Noise on Tooth Segmentation in Panoramic Radiographs using Federated, Centralized and Local Learning

Johan Andreas Balle Rubak, Khuram Naveed, Sanyam Jain, Lukas Esterle, Alexandros Iosifidis, Ruben Pauwels

DOI: 10.1093/dmfr/twag001

Abstract

Objectives: Federated learning (FL) may mitigate privacy constraints, heterogeneous data quality, and inconsistent labeling in dental diagnostic AI. We compared FL with centralized (CL) and local learning (LL) for tooth segmentation in panoramic radiographs across multiple data corruption scenarios. Methods: An Attention U-Net was trained on 2066 radiographs from six institutions across four settings: baseline (unaltered data); label manipulation (dilated/missing annotations); image-quality manipulation (additive Gaussian noise); and exclusion of a faulty client with corrupted data. FL was implemented via the Flower AI framework. Per-client training- and validation-loss trajectories were monitored for anomaly detection, and a set of metrics (Dice, IoU, HD, HD95 and ASSD) was evaluated on a hold-out test set. Statistical significance was assessed on these metrics with the Wilcoxon signed-rank test. CL and LL served as comparators. Results: Baseline: FL achieved a median Dice of 0.94889 (ASSD: 1.33229), slightly better than CL at 0.94706 (ASSD: 1.37074) and LL at 0.93557-0.94026 (ASSD: 1.51910-1.69777). Label manipulation: FL maintained the best median Dice score at 0.94884 (ASSD: 1.46487) versus CL's 0.94183 (ASSD: 1.75738) and LL's 0.93003-0.94026 (ASSD: 1.51910-2.11462). Image noise: FL led with a Dice of 0.94853 (ASSD: 1.31088); CL scored 0.94787 (ASSD: 1.36131); LL ranged from 0.93179 to 0.94026 (ASSD: 1.51910-1.77350). Faulty-client exclusion: FL reached a Dice of 0.94790 (ASSD: 1.33113), better than CL's 0.94550 (ASSD: 1.39318). Loss-curve monitoring reliably flagged the corrupted site. Conclusions: FL matches or exceeds CL and outperforms LL across corruption scenarios while preserving privacy. Per-client loss trajectories provide an effective anomaly-detection mechanism and support FL as a practical, privacy-preserving approach for scalable clinical AI deployment.

2509.05379 2026-03-16 cs.CR cs.AI

ThreatGPT: An Agentic AI Framework for Enhancing Public Safety through Threat Modeling

Sharif Noor Zisad, Ragib Hasan

2508.16624 2026-03-16 cs.CY cs.AI

The GPT-4o Shock Emotional Attachment to AI Models and Its Impact on Regulatory Acceptance: A Cross-Cultural Analysis of the Immediate Transition from GPT-4o to GPT-5

Hiroki Naito

Comments 9 pages, 3 tables

2507.20796 2026-03-16 econ.GN cs.AI cs.LG q-fin.EC

Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach

Wei Lu, Amit Dhanda, Daniel L. Chen, Christian B. Hansen

2505.15854 2026-03-16 cs.NI cs.AI cs.ET cs.LG cs.MA

Integration of TinyML and LargeML: A Survey of 6G and Beyond

Thai-Hoc Vu, Ngo Hoang Tu, Thien Huynh-The, Kyungchun Lee, Sunghwan Kim, Miroslav Voznak, Quoc-Viet Pham

Comments This work has been accepted for publication in IEEE Internet of Things Journal under ID: IoT-56661-2025

2505.10628 2026-03-16 stat.ML cs.LG math.PR

Minimax learning rates for estimating binary classifiers under margin conditions

Jonathan García, Philipp Petersen

2503.15509 2026-03-16 cs.HC cs.CL

Representing data in words: A context engineering approach

Amandine M. Caut, Amy Rouillard, Beimnet Zenebe, Matthias Green, Ágúst Pálmason Morthens, David J. T. Sumpter

2411.10406 2026-03-16 quant-ph cond-mat.dis-nn cs.AI cs.DC

How to Build a Quantum Supercomputer: Scaling from Hundreds to Millions of Qubits

Masoud Mohseni, Artur Scherer, K. Grace Johnson, Oded Wertheim, Matthew Otten, Namit Anand, Navid Anjum Aadit, Yuri Alexeev, Gilad Ben-Shach, Kirk M. Bresniker, Kerem Y. Camsari, Barbara Chapman, Soumitra Chatterjee, Shuvro Chowdhury, Gebremedhin A. Dagnew, Tom Dvir, Aniello Esposito, Farah Fahim, Michael Ferguson, Marco Fiorentino, Archit Gajjar, Katerina Gratsea, Gaurav Gyawali, Christian Heiter, Ali H. Z. Kavaki, Abdullah Khalid, Xiangzhou Kong, Bohdan Kulchytskyy, Elica Kyoseva, Ruoyu Li, P. Aaron Lott, Igor L. Markov, Robert F. McDermott, Lucas Morais, Giacomo Pedretti, Pooja Rao, Eleanor Rieffel, Allyson Silva, John Sorebo, Panagiotis Spentzouris, Ziv Steiner, Boyan Torosov, Davide Venturelli, Robert J. Visser, Zak Webb, Xin Zhan, Yonatan Cohen, Pooya Ronagh, Alan Ho, Raymond G. Beausoleil, John M. Martinis

Comments 71 pages, 53 figures. General revision, added new sections, added figures, added references, added appendices

2410.03191 2026-03-16 stat.ML cs.LG

Nested Deep Learning Model Towards A Foundation Model for Brain Signal Data

Fangyi Wei, Jiajie Mo, Kai Zhang, Haipeng Shen, Srikantan Nagarajan, Fei Jiang

Comments 56 pages; paper structure updated

2407.15693 2026-03-16 math.AP cs.LG math.FA math.ST stat.TH

Fisher-Rao Gradient Flow: Geodesic Convexity and Functional Inequalities

José A. Carrillo, Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Dongyi Wei

Comments 38 pages

2303.07287 2026-03-16 stat.ML cs.LG econ.EM

Tight Non-asymptotic Inference via Sub-Gaussian Intrinsic Moment Norm

Huiming Zhang, Haoyu Wei, Guang Cheng

Comments This manuscript has been withdrawn by the authors as it is not yet ready for public release. Further improvements and revisions are required before a final version can be considered for distribution

2603.13225 2026-03-16 math.NT

Sizes of Pre-Images of the Minimal Euclidean Function on the Gaussian Integers

Hester Graves

Comments 7 pages, six illustrations (but only 4 figures)

2603.13222 2026-03-16 cond-mat.str-el cond-mat.quant-gas quant-ph

Two-channel physics in a lightly doped antiferromagnetic Mott insulator revealed by two-hole spectroscopy

Pit Bermes, Sebastian Paeckel, Annabelle Bohrdt, Lukas Homeier, Fabian Grusdt

Comments 7 pages, 4 figures