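arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.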

2604.07754 2026-04-10 cs.CR cs.CL

The Art of (Mis)alignment: How Fine-Tuning Methods Effectively Misalign and Realign LLMs in Post-Training

Rui Zhang, Hongwei Li, Yun Shen, Xinyue Shen, Wenbo Jiang, Guowen Xu, Yang Liu, Michael Backes, Yang Zhang

Comments Accepted by ACL Findings 2026

Abstract

The deployment of large language models (LLMs) raises significant ethical and safety concerns. While LLM alignment techniques are adopted to improve model safety and trustworthiness, adversaries can exploit these techniques to undermine safety for malicious purposes, resulting in misalignment. Misaligned LLMs may be published on open platforms to magnify harm. To address this, additional safety alignment, referred to as realignment, is necessary before deploying untrusted third-party LLMs. This study explores the efficacy of fine-tuning methods in terms of misalignment, realignment, and the effects of their interplay. By evaluating four Supervised Fine-Tuning (SFT) and two Preference Fine-Tuning (PFT) methods across four popular safety-aligned LLMs, we reveal a mechanism asymmetry between attack and defense. While Odds Ratio Preference Optimization (ORPO) is most effective for misalignment, Direct Preference Optimization (DPO) excels in realignment, albeit at the expense of model utility. Additionally, we identify model-specific resistance, residual effects of multi-round adversarial dynamics, and other noteworthy findings. These findings highlight the need for robust safeguards and customized safety alignment strategies to mitigate potential risks in the deployment of LLMs. Our code is available at https://github.com/zhangrui4041/The-Art-of-Mis-alignment.

2604.07752 2026-04-10 cs.SE cs.AI

MIMIC-Py: An Extensible Tool for Personality-Driven Automated Game Testing with Large Language Models

Yifei Chen, Sarra Habchi, Lili Wei

Comments 10 pages, Accepted by FSE Companion '26, July 5-9, 2026, Montreal, QC, Canada

2604.07748 2026-04-10 stat.ML cs.LG

Sparse $ε$ insensitive zone bounded asymmetric elastic net support vector machines for pattern classification

Haiyan Du, Hu Yang

2604.07744 2026-04-10 stat.ML cs.LG econ.EM math.ST stat.TH

The Condition-Number Principle for Prototype Clustering

Romano Li, Jianfei Cao

2604.07739 2026-04-10 cs.IR cs.LG

Efficient Dataset Selection for Continual Adaptation of Generative Recommenders

Cathy Jiao, Juan Elenter, Praveen Ravichandran, Bernd Huber, Joseph Cauteruccio, Todd Wasson, Timothy Heath, Chenyan Xiong, Mounia Lalmas, Paul Bennett

Comments ICLR 2026 CAO Workshop (Oral)

2604.07727 2026-04-10 cs.CR cs.AI

TrajGuard: Streaming Hidden-state Trajectory Detection for Decoding-time Jailbreak Defense

Cheng Liu, Xiaolei Liu, Xingyu Li, Bangzhou Xin, Kangyi Ding

Comments Accepted to Findings of ACL 2026

2604.07695 2026-04-10 cs.CR cs.AI

AITH: A Post-Quantum Continuous Delegation Protocol for Human-AI Trust Establishment

Zhaoliang Chen

Comments 11 pages, 8 tables, 5 theorems (machine-verified via Tamarin Prover). Supplementary materials including formal verification model and reference implementation available from the author

2604.07679 2026-04-10 cs.SE cs.LG cs.SY eess.SY

Towards Counterfactual Explanation and Assertion Inference for CPS Debugging

Zaid Ghazal, Hadiza Yusuf, Khouloud Gaaloul

2604.07671 2026-04-10 stat.ML cs.LG cs.NA math.DS math.NA

On the Unique Recovery of Transport Maps and Vector Fields from Finite Measure-Valued Data

Jonah Botvinick-Greenhouse, Yunan Yang

2604.07639 2026-04-10 quant-ph cs.AI cs.CC cs.IT cs.LG math.IT

Exponential quantum advantage in processing massive classical data

Haimeng Zhao, Alexander Zlokapa, Hartmut Neven, Ryan Babbush, John Preskill, Jarrod R. McClean, Hsin-Yuan Huang

Comments 144 pages, including 9 pages of main text and 10 figures. Code available at https://github.com/haimengzhao/quantum-oracle-sketching

2604.07635 2026-04-10 stat.ML cs.LG stat.AP

Variational Approximated Restricted Maximum Likelihood Estimation for Spatial Data

Debjoy Thakur

2604.07609 2026-04-10 cs.DC cs.LG cs.OS cs.PF cs.SE

Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC

Mohammad Siavashi, Mariano Scazzariello, Gerald Q. Maguire, Dejan Kostić, Marco Chiesa

2604.07601 2026-04-10 cs.CY cs.AI

Google, AI Literacy, and the Learning Sciences: Multiple Modes of Research, Industry, and Practice Partnerships

Victor R. Lee, Michael Madaio, Ben Garside, Aimee Welch, Kristen Pilner Blair, Ibrahim Oluwajoba Adisa, Alon Harris, Kevin Holst, Liat Ben Rafael, Ronit Levavi Morad, Ben Travis, Belle Moller, Andrew Shields, Zak Brown, Lois Hinx, Marisol Diaz, Evan Patton, Selim Tezel, Robert Parks, Hal Abelson, Adam Blasioli, Jeremy Roschelle

2604.07591 2026-04-10 stat.ME cs.AI cs.CL cs.LG stat.ML

From Ground Truth to Measurement: A Statistical Framework for Human Labeling

Robert Chew, Stephanie Eckman, Christoph Kern, Frauke Kreuter

2604.07585 2026-04-10 cs.IR cs.AI

Don't Measure Once: Measuring Visibility in AI Search (GEO)

Julius Schulte, Malte Bleeker, Philipp Kaufmann

Comments 19 pages, 7 figures, 17 tables. Comments welcome!

2604.07560 2026-04-10 q-bio.QM cs.LG

Predicting Activity Cliffs for Autonomous Medicinal Chemistry

Michael Cuccarese

Comments 8 pages, 4 figures. GitHub: https://github.com/mcuccarese/Activity-cliff-prediction Web app: https://activity-cliffs-5gnirhr3k3ybhwhz7de7ua.streamlit.app/

2604.07551 2026-04-10 cs.CR cs.AI

MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security

Mehrdad Rostamzadeh, Sidhant Narula, Nahom Birhan, Mohammad Ghasemigol, Daniel Takabi

2604.07526 2026-04-10 cs.AR cs.LG

From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference

Ravindra Ganti, Steve Xu

Comments 25 pages, 12 figures, 21 tables

2604.07520 2026-04-10 hep-ph cs.LG

Lecture notes on Machine Learning applications for global fits

Jorge Alda

Comments Lecture notes for the 4th COMCHA School on Computing Challenges in Zaragoza (Spain), 8-15 April 2026. 24 pages, 10 figures, 14 code snippets, 1 appendix. Submission to SciPost Physics Lecture Notes

2604.07502 2026-04-10 cs.SE cs.AI

Beyond Human-Readable: Rethinking Software Engineering Conventions for the Agentic Development Era

Dmytro Ustynov

2604.07494 2026-04-10 cs.SE cs.AI cs.LG

Triage: Routing Software Engineering Tasks to Cost-Effective LLM Tiers via Code Quality Signals

Lech Madeyski

Comments 5 pages, 1 figure

2604.07493 2026-04-10 cs.CR cs.LG stat.AP

Differentially Private Modeling of Disease Transmission within Human Contact Networks

Shlomi Hod, Debanuj Nayak, Jason R. Gantenberg, Iden Kalemaj, Thomas A. Trikalinos, Adam Smith

Abstract

Epidemiologic studies of infectious diseases often rely on models of contact networks to capture the complex interactions that govern disease spread, and ongoing projects aim to vastly increase the scale at which such data can be collected. However, contact networks may include sensitive information, such as sexual relationships or drug use behavior. Protecting individual privacy while maintaining the scientific usefulness of the data is crucial. We propose a privacy-preserving pipeline for disease spread simulation studies based on a sensitive network that integrates differential privacy (DP) with statistical network models such as stochastic block models (SBMs) and exponential random graph models (ERGMs). Our pipeline comprises three steps: (1) compute network summary statistics using node-level DP (which corresponds to protecting individuals' contributions); (2) fit a statistical model, like an ERGM, using these summaries, which allows generating synthetic networks reflecting the structure of the original network; and (3) simulate disease spread on the synthetic networks using an agent-based model. We evaluate the effectiveness of our approach using a simple Susceptible-Infected-Susceptible (SIS) disease model under multiple configurations. We compare both numerical results, such as simulated disease incidence and prevalence, as well as qualitative conclusions such as intervention effect size, on networks generated with and without differential privacy constraints. Our experiments are based on egocentric sexual network data from the ARTNet study (a survey about HIV-related behaviors). Our results show that the noise added for privacy is small relative to other sources of error (sampling and model misspecification). This suggests that, in principle, curators of such sensitive data can provide valuable epidemiologic insights while protecting privacy.
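
The three-step pipeline described in the abstract above can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: every function name and parameter below is hypothetical, the only summary statistic released is the edge count (with Laplace noise, assuming every node's degree is already bounded by `max_degree` and the node count is public), and an Erdős-Rényi random graph stands in for the SBM or ERGM that the abstract mentions. The SIS step is a plain discrete-time simulation with per-contact infection probability `beta` and recovery probability `gamma`.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_edge_count(edges, max_degree, epsilon, rng):
    # Step 1 (simplified): release the edge count with Laplace noise.
    # ASSUMPTION: every node has degree <= max_degree, so removing one
    # node and its edges changes the count by at most max_degree.
    return len(edges) + laplace_noise(max_degree / epsilon, rng)


def fit_and_sample_network(n, noisy_edges, rng):
    # Step 2 (stand-in model): estimate an Erdos-Renyi edge probability
    # from the noisy count and sample one synthetic network.
    pairs = n * (n - 1) / 2
    p = min(1.0, max(0.0, noisy_edges / pairs))
    neighbors = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors


def simulate_sis(neighbors, beta, gamma, steps, initial_infected, rng):
    # Step 3: discrete-time SIS. Each step, an infected node recovers
    # with probability gamma; a susceptible node with k infected
    # neighbors becomes infected with probability 1 - (1 - beta)**k.
    infected = set(initial_infected)
    prevalence = [len(infected) / len(neighbors)]
    for _ in range(steps):
        next_infected = set()
        for v in neighbors:
            if v in infected:
                if rng.random() >= gamma:
                    next_infected.add(v)  # remains infected
            else:
                k = sum(1 for u in neighbors[v] if u in infected)
                if rng.random() < 1 - (1 - beta) ** k:
                    next_infected.add(v)
        infected = next_infected
        prevalence.append(len(infected) / len(neighbors))
    return prevalence


# Toy usage: a ring network of 100 nodes plays the role of the sensitive data.
rng = random.Random(0)
n = 100
ring_edges = [(i, (i + 1) % n) for i in range(n)]
m_noisy = noisy_edge_count(ring_edges, max_degree=2, epsilon=1.0, rng=rng)
synthetic = fit_and_sample_network(n, m_noisy, rng)
curve = simulate_sis(synthetic, beta=0.3, gamma=0.2, steps=30,
                     initial_infected=range(5), rng=rng)
print("final simulated prevalence:", curve[-1])
```

Only the noisy edge count touches the toy "sensitive" network; the synthetic graph and the simulation are post-processing of that released value. A realistic pipeline would release richer statistics and fit a richer model, as the abstract describes.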

2604.07473 2026-04-10 cs.NE cs.AI

When Switching Algorithms Helps: A Theoretical Study of Online Algorithm Selection

Denis Antipov, Carola Doerr

2604.07420 2026-04-10 cs.IR cs.LG

Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking

Chao Zhang, Shuai Lin, ChengLei Dai, Ye Qian, Fan Mingyang, Yi Zhang, Yi Wang, Jingwei Zhuo

2604.07415 2026-04-10 cs.IR cs.AI cs.CL

SubSearch: Intermediate Rewards for Unsupervised Guided Reasoning in Complex Retrieval

Roxana Petcu, Evangelos Kanoulas, Maarten de Rijke

2604.07414 2026-04-10 cs.LO cs.RO cs.SE cs.SY eess.SY

Formally Guaranteed Control Adaptation for ODD-Resilient Autonomous Systems

Gricel Vázquez, Calum Imrie, Sepeedeh Shahbeigi, Nawshin Mannan Proma, Tian Gan, Victoria J Hodge, John Molloy, Simos Gerasimou

2604.07404 2026-04-10 cond-mat.stat-mech cs.LG math.AP stat.ML

Score Shocks: The Burgers Equation Structure of Diffusion Generative Models

Krisanu Sarkar

Comments 41 pages, 7 figures. Introduces a Burgers equation formulation of diffusion model score dynamics and a local binary-boundary theorem for speciation

2604.07401 2026-04-10 cond-mat.dis-nn cs.LG

Geometric Entropy and Retrieval Phase Transitions in Continuous Thermal Dense Associative Memory

Tatiana Petrova, Evgeny Polyachenko, Radu State

2604.07398 2026-04-10 cs.SE cs.AI

Breaking the Illusion of Identity in LLM Tooling

Marek Miller

Comments 8 pages, 2 figures, 2 tables

2604.07396 2026-04-10 cs.AR cs.LG

SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs

Jintao Zhang, Xuanyao Fong