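arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.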

2505.16737 2026-04-24 cs.LG cs.AI cs.CL cs.CR math.OC

Secure LLM Fine-Tuning via Safety-Aware Probing

Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Xiaokun Luan, Meng Sun

Abstract

Large language models (LLMs) have achieved remarkable success across many applications, but their ability to generate harmful content raises serious safety concerns. Although safety alignment techniques are often applied during pre-training or post-training, recent studies show that subsequent fine-tuning on adversarial or even benign data can still compromise model safety. In this paper, we revisit the fundamental question of why fine-tuning on non-harmful data may nevertheless degrade safety. We show that the safety and task-performance loss landscapes are partially decoupled, so updates that improve task-specific performance may still move the model toward unsafe regions. Based on this insight, we propose a safety-aware probing (SAP) optimization framework for mitigating safety risks during fine-tuning. Concretely, SAP uses contrastive safety signals to locate safety-correlated directions, and optimizes a lightweight probe that perturbs hidden-state propagation during fine-tuning, thereby steering parameter updates away from harmful trajectories while preserving task-specific learning. Extensive experiments show that SAP consistently improves the safety-utility tradeoff across multiple models and tasks. Averaged over multiple LLMs, SAP reduces the harmful score significantly relative to standard fine-tuning, outperforming strong baselines while maintaining competitive task-specific performance. SAP also demonstrates stronger robustness under harmful data poisoning, adversarial fine-tuning, and a dedicated post-fine-tuning adaptive attack, validating that SAP is an effective and scalable framework for preserving LLM safety during fine-tuning. Our code is available at https://github.com/ChengcanWu/SAP.
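
As a rough illustration of what locating a safety-correlated direction from contrastive signals could look like, the sketch below takes the difference between the mean hidden state of harmful prompts and that of benign prompts, then shifts a hidden state along that direction. This difference-of-means recipe is a stand-in chosen for simplicity, not the SAP algorithm itself: the abstract does not say how the direction is computed or how the probe is optimized. The function names, the `epsilon` step size, and the toy vectors are all assumptions made for illustration.

```python
from math import sqrt


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def contrastive_direction(harmful_states, benign_states):
    """Unit vector pointing from the benign mean toward the harmful mean.

    Assumption: a difference of means stands in for the paper's
    'contrastive safety signals'; the real method may differ.
    """
    mu_h = mean_vector(harmful_states)
    mu_b = mean_vector(benign_states)
    diff = [h - b for h, b in zip(mu_h, mu_b)]
    norm = sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("harmful and benign means coincide; no direction")
    return [d / norm for d in diff]


def perturb(hidden, direction, epsilon):
    """Shift one hidden state by epsilon along the given direction."""
    return [x + epsilon * d for x, d in zip(hidden, direction)]


# Toy 3-dimensional hidden states (made-up numbers, illustration only).
harmful = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
benign = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

direction = contrastive_direction(harmful, benign)
shifted = perturb([0.5, 0.5, 0.5], direction, epsilon=0.1)
```

In this toy setup the two groups differ only in the first coordinate, so the extracted direction is the first axis and the perturbation moves only that coordinate. A real probe would be trained rather than fixed, which this sketch does not attempt.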

2505.02369 2026-04-24 cs.LG cs.AI cs.CV cs.IT cs.NE math.IT

Sharpness-Aware Minimization with Z-Score Gradient Filtering

Vincent-Daniel Yun

Comments Accepted to ICASSP 2026 | NeurIPS 2025 OPT Workshop Paper

2505.00039 2026-04-24 cs.CL cs.AI cs.IR

An Ontology-Driven Graph RAG for Legal Norms: A Structural, Temporal, and Deterministic Approach

Hudson de Martim

Comments Major revision for clarity and academic precision. Updated title and abstract. Refined core terminology, contributions, related work, and shifted the implementation to a conceptual architecture. Added new arguments to strengthen the paper's thesis

DOI: 10.3233/FAIA251598
Journal ref: Legal Knowledge and Information Systems (JURIX 2025), Frontiers in Artificial Intelligence and Applications, IOS Press, 2025

Abstract

Retrieval-Augmented Generation (RAG) systems in the legal domain face a critical challenge: standard, flat-text retrieval is blind to the hierarchical, diachronic, and causal structure of law, leading to anachronistic and unreliable answers. This paper introduces the Structure-Aware Temporal Graph RAG (SAT-Graph RAG), an ontology-driven framework designed to overcome these limitations by explicitly modeling the formal structure and diachronic nature of legal norms. We ground our knowledge graph in a formal, LRMoo-inspired model that distinguishes abstract legal Works from their versioned Expressions. We model temporal states as efficient aggregations that reuse the versioned expressions (CTVs) of unchanged components, and we reify legislative events as first-class Action nodes to make causality explicit and queryable. This structured backbone enables a unified, planner-guided query strategy that applies explicit policies to deterministically resolve complex requests for (i) point-in-time retrieval, (ii) hierarchical impact analysis, and (iii) auditable provenance reconstruction. Through a case study on the Brazilian Constitution, we demonstrate how this approach provides a verifiable, temporally-correct substrate for LLMs, enabling higher-order analytical capabilities while drastically reducing the risk of factual errors. The result is a practical framework for building more trustworthy and explainable legal AI systems.
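
To make the idea of point-in-time retrieval over versioned Expressions more concrete, here is a minimal sketch. It assumes each Expression carries a half-open validity interval and a label for the Action that produced it. The class, its field names, the `expression_at` helper, and the placeholder data are all hypothetical, and none of this is taken from the SAT-Graph RAG implementation, which the abstract describes only at a conceptual level.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Expression:
    """One versioned text of a legal Work, valid on [valid_from, valid_to)."""
    work_id: str
    text: str
    valid_from: date
    valid_to: Optional[date]  # None means the version is still in force
    enacted_by: str           # hypothetical label of the originating Action


def expression_at(expressions, work_id, when):
    """Return the one Expression of work_id in force on `when`, or None.

    Overlapping versions raise an error instead of being resolved by
    guesswork, so the lookup stays deterministic.
    """
    matches = [
        e for e in expressions
        if e.work_id == work_id
        and e.valid_from <= when
        and (e.valid_to is None or when < e.valid_to)
    ]
    if len(matches) > 1:
        raise ValueError(f"overlapping versions for {work_id} on {when}")
    return matches[0] if matches else None


# Placeholder data: one Work with an original and an amended version.
versions = [
    Expression("art-1", "original text", date(2000, 1, 1), date(2010, 1, 1),
               enacted_by="action-enactment"),
    Expression("art-1", "amended text", date(2010, 1, 1), None,
               enacted_by="action-amendment"),
]

before = expression_at(versions, "art-1", date(2005, 6, 1))
after = expression_at(versions, "art-1", date(2015, 6, 1))
missing = expression_at(versions, "art-1", date(1999, 1, 1))
```

Keeping the originating Action on each version is what would let a query answer both "what did the text say on this date" and "which event changed it", which is the provenance aspect the abstract highlights.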

2504.15594 2026-04-24 cs.LG cs.CV

Analytical Softmax Temperature Setting from Feature Dimensions for Model- and Domain-Robust Classification

Tatsuhito Hasegawa, Shunsuke Sakai

Comments 22 pages, 11 figures, under review

2504.11159 2026-04-24 cs.AI

C-SHAP for time series: An approach to high-level temporal explanations

Annemarie Jutte, Faizan Ahmed, Jeroen Linssen, Maurice van Keulen

Comments 18 pages, 7 figures, improved and expanded version of the original paper

2504.07940 2026-04-24 cs.CV

Beyond the Frame: Generating 360 Panoramic Videos from Perspective Videos

Rundong Luo, Matthew Wallingford, Ali Farhadi, Noah Snavely, Wei-Chiu Ma

Comments Project page: https://red-fairy.github.io/argus/

2504.03476 2026-04-24 cs.CV

Anatomy-Aware Text-Visual Fusion with Dual-Perspective Prompts for Fine-Grained Lumbar Spine Segmentation

Sheng Lian, Jianlong Cai, Dengfeng Pan, Guang-Yong Chen, Hao Xu, Fan Zhang, Guodong Fan, Shuo Li

2503.17239 2026-04-24 cs.CL cs.AI

SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche

2502.20769 2026-04-24 cs.CV

Information Bottleneck-Guided Heterogeneous Graph Learning for Interpretable Neurodevelopmental Disorder Diagnosis

Yueyang Li, Lei Chen, Wenhao Dong, Shengyu Gong, Zijian Kang, Boyang Wei, Weiming Zeng, Hongjie Yan, Lingbin Bian, Zhiguo Zhang, Wai Ting Siok, Nizhuan Wang

DOI: 10.1016/j.neucom.2026.133721
Journal ref: Neurocomputing, 2026

Abstract

Developing interpretable models for neurodevelopmental disorders (NDDs) diagnosis presents significant challenges in effectively encoding, decoding, and integrating multimodal neuroimaging data. While many existing machine learning approaches have shown promise in brain network analysis, they typically suffer from limited interpretability, particularly in extracting meaningful biomarkers from functional magnetic resonance imaging (fMRI) data and establishing clear relationships between imaging features and demographic characteristics. Besides, current graph neural network methodologies face limitations in capturing both local and global functional connectivity patterns while simultaneously achieving theoretically principled multimodal data fusion. To address these challenges, we propose the Interpretable Information Bottleneck Heterogeneous Graph Neural Network (I2B-HGNN), a unified framework that applies information bottleneck principles to guide both brain connectivity modeling and cross-modal feature integration. This framework comprises two complementary components. The first is the Information Bottleneck Graph Transformer (IBGraphFormer), which combines transformer-based global attention mechanisms with graph neural networks through information bottleneck-guided pooling to identify sufficient biomarkers. The second is the Information Bottleneck Heterogeneous Graph Attention Network (IB-HGAN), which employs meta-path-based heterogeneous graph learning with structural consistency constraints to achieve interpretable fusion of neuroimaging and demographic data. The experimental results demonstrate that I2B-HGNN achieves superior performance in diagnosing NDDs, exhibiting both high classification accuracy and the ability to provide interpretable biomarker identification while effectively analyzing non-imaging data.
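
For readers unfamiliar with the information bottleneck principle that the abstract refers to, its classic generic form is shown below. Here X is the input, Y the target label, Z the learned representation, and β > 0 a trade-off weight. This is the textbook objective only; the abstract does not state the exact loss used by I2B-HGNN, so the symbols here should not be read as the paper's notation.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

The first term penalizes information that Z retains about the input, and the second rewards information that Z keeps about the label, so a larger β favors prediction over compression.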

2502.17751 2026-04-24 cs.LG cs.AI

Graded Neural Networks

Tony Shaska

2502.15793 2026-04-24 cs.LG cs.SY eess.SY

Anomaly Detection in Smart Power Grids with Graph-Regularized MS-SVDD: a Multimodal Subspace Learning Approach

Thomas Debelle, Fahad Sohrab, Pekka Abrahamsson, Moncef Gabbouj

Comments 23 pages, 5 figures, supplementary material

2502.04416 2026-04-24 cs.LG cs.AI

Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis

Zehua Pei, Hui-Ling Zhen, Lancheng Zou, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu

Comments Accepted by ACL 2026 Main

2501.11275 2026-04-24 cs.LG cs.NA math.NA

Higher Order Approximation Rates for ReLU CNNs in Korobov Spaces

Yuwen Li, Guozhi Zhang

2411.17061 2026-04-24 cs.CV

SCASeg: Strip Cross-Attention for Efficient Semantic Segmentation

Guoan Xu, Jiaming Chen, Wenfeng Huang, Wenjing Jia, Guangwei Gao, Guo-Jun Qi

Comments TIP

2411.11707 2026-04-24 cs.CL cs.AI

Federated Co-tuning Framework for Large and Small Language Models

Tao Fan, Yan Kang, Guoqiang Ma, Lixin Fan, Shuoling Liu, Kai Chen, Qiang Yang

2410.16698 2026-04-24 cs.LG

Hyperboloid GPLVM for Discovering Continuous Hierarchies via Nonparametric Estimation

Koshi Watanabe, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

Comments Accepted at AISTATS 2025

2410.16006 2026-04-24 cs.CL

Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model

Divyanshu Aggarwal, Sankarshan Damle, Navin Goyal, Satya Lokam, Sunayana Sitaram

Comments 19 pages, 6 tables, 4 figures, Accepted to ACL 2026 Findings

2407.19664 2026-04-24 cs.LG

Adaptive Soft Error Protection for Neural Network Processing

Xinghua Xue, Cheng Liu, Feng Min, Yinhe Han

2309.07176 2026-04-24 cs.LG stat.ML

Mind the Gap: Optimal and Equitable Encouragement Policies

Angela Zhou

Comments Updated with major new case study on SNAP recertification benefits

2305.01626 2026-04-24 cs.CL cs.AI cs.SD eess.AS

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks

Gašper Beguš, Thomas Lu, Zili Wang

2304.01844 2026-04-24 cs.AI

Grid-SD2E: A General Grid-Feedback in a System for Cognitive Learning

Jingyi Feng, Chenming Zhang

Comments 21 pages, 8 figures, 8 formulas

2106.01254 2026-04-24 cs.LG cs.HC cs.MA

Principled Evaluation with Human Labels: One Rater at a Time and Rater Equivalence

Paul Resnick, Yuqing Kong, Grant Schoenebeck, Tim Weninger

2012.10700 2026-04-24 cs.AI

Minimax Strikes Back

Quentin Cohen-Solal, Tristan Cazenave

2604.21603 2026-04-24 cs.LO cs.AI cs.DB

Using ASP(Q) to Handle Inconsistent Prioritized Data

Meghyn Bienvenu, Camille Bourgaux, Robin Jean, Giuseppe Mazzotta

Comments This is an extended version of a paper appearing at the 23rd International Conference on Principles of Knowledge Representation and Reasoning (KR 2026). 21 pages

2604.21602 2026-04-24 cs.NE cs.AI cs.AR cs.ET cs.LG

On the Role of Preprocessing and Memristor Dynamics in Reservoir Computing for Image Classification

Rishona Daniels, Duna Wattad, Ronny Ronen, David Saad, Shahar Kvatinsky

Comments Accepted for publication in Advanced Electronic Materials. Main text: pages 1-32, 11 figures. Supporting information: pages 24-32, 11 figures

2604.21599 2026-04-24 cs.SE cs.LG

Verifying Machine Learning Interpretability Requirements through Provenance

Lynn Vonderhaar, Juan Couder, Daryela Cisneros, Omar Ochoa

2604.21595 2026-04-24 stat.ML cs.LG

A Kernel Nonconformity Score for Multivariate Conformal Prediction

Louis Meyer, Wenkai Xu

2604.21579 2026-04-24 cs.SE cs.AI

A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair

Milan De Koning, Ali Asgari, Pouria Derakhshanfar, Annibale Panichella

Comments 12 pages

2604.21536 2026-04-24 cs.IR cs.AI

Pre-trained LLMs Meet Sequential Recommenders: Efficient User-Centric Knowledge Distillation

Nikita Severin, Danil Kartushov, Vladislav Urzhumov, Vladislav Kulikov, Oksana Konovalova, Alexey Grishanov, Anton Klenitskiy, Artem Fatkulin, Alexey Vasilev, Andrey Savchenko, Ilya Makarov

Comments Accepted to ECIR 2026. 7 pages. This version of the contribution has been accepted for publication, after peer review but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/978-3-032-21300-6_42

2604.21529 2026-04-24 cs.MA cs.AI

Architectures for Robust Self-Organizing Energy Systems under Information and Control Constraints

Emilie Frost, Astrid Nieße

Comments This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. The Version of Record of this contribution will be published in Agents and Artificial Intelligence, Lecture Notes in Computer Science, and available online at https://doi.org/10.1007/978-3-032-25029-2_19