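arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.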

2604.09544 2026-04-13 cs.CL cs.AI cs.LG

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism

Hadas Orgad, Boyi Wei, Kaden Zheng, Martin Wattenberg, Peter Henderson, Seraphina Goldfarb-Tarrant, Yonatan Belinkov

English abstract

Large language models (LLMs) undergo alignment training to avoid harmful behaviors, yet the resulting safeguards remain brittle: jailbreaks routinely bypass them, and fine-tuning on narrow domains can induce "emergent misalignment" that generalizes broadly. Whether this brittleness reflects a fundamental lack of coherent internal organization for harmfulness remains unclear. Here we use targeted weight pruning as a causal intervention to probe the internal organization of harmfulness in LLMs. We find that harmful content generation depends on a compact set of weights that are general across harm types and distinct from benign capabilities. Aligned models exhibit a greater compression of harm generation weights than unaligned counterparts, indicating that alignment reshapes harmful representations internally, despite the brittleness of safety guardrails at the surface level. This compression explains emergent misalignment: if weights of harmful capabilities are compressed, fine-tuning that engages these weights in one domain can trigger broad misalignment. Consistent with this, pruning harm generation weights in a narrow domain substantially reduces emergent misalignment. Notably, LLMs' harmful generation capability is dissociated from how they recognize and explain such content. Together, these results reveal a coherent internal structure for harmfulness in LLMs that may serve as a foundation for more principled approaches to safety.

2604.09541 2026-04-13 cs.CR cs.IR

Trans-RAG: Query-Centric Vector Transformation for Secure Cross-Organizational Retrieval

Yu Liu, Kun Peng, Wenxiao Zhang, Fangfang Yuan, Cong Cao, Wenxuan Lu, Yanbing Liu

Comments Accepted by DASFAA 2026

2604.09540 2026-04-13 cs.ET q-bio.BM

A Physically-Informed Subgraph Isomorphism Approach to Molecular Docking Using Quantum Annealers

Francesco Micucci, Matteo Barbieri, Gabriella Bettonte, Domenico Bonanni, Anita Camillini, Anna Fava, Daniele Gregori, Andrea R. Beccari, Gianluca Palermo

Comments 7 pages, 3 figures

2604.09537 2026-04-13 cs.CL cs.AI cs.IR cs.LG

Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision

Soroosh Tayebi Arasteh, Mehdi Joodaki, Mahshad Lotfinia, Sven Nebelung, Daniel Truhn

2604.09535 2026-04-13 cs.CV

EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks

Lulin Liu, Dayou Li, Yiqing Liang, Sicong Jiang, Hitesh Vijay, Hezhen Hu, Xuhai Xu, Zirui Liu, Srinivas Shakkottai, Manling Li, Zhiwen Fan

Comments https://ego-tl.github.io/

2604.09533 2026-04-13 cs.DM cs.IT math.IT quant-ph

On Worst-Case Optimal Polynomial Intersection

Yihang Sun, Mary Wootters

Comments 45 pages, 4 figures

2604.09532 2026-04-13 cs.CV cs.AI

Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise

Zibin Geng, Xuefeng Jiang, Jia Li, Zheng Li, Tian Wen, Lvhua Wu, Sheng Sun, Yuwei Wang, Min Liu

2604.09531 2026-04-13 cs.CV cs.AI cs.CL

VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images

Guanyu Zhou, Yida Yin, Wenhao Chai, Shengbang Tong, Xingyu Fu, Zhuang Liu

Comments Project Page: https://zlab-princeton.github.io/VisionFoundry/

2604.09529 2026-04-13 cs.CV cs.AI cs.CL

VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning

Wenyi Xiao, Xinchi Xu, Leilei Gan

Comments 24 pages, ACL 2026 Main. Repository: https://github.com/Mr-Loevan/VL-Calibration

2604.09527 2026-04-13 cs.CV cs.AI cs.LG

Envisioning the Future, One Step at a Time

Stefan Andreas Baumann, Jannik Wiese, Tommaso Martorella, Mahdi M. Kalayeh, Björn Ommer

Comments CVPR 2026. For code and models, see http://compvis.github.io/myriad

2604.09523 2026-04-13 cs.LG cs.MA

Event-Driven Temporal Graph Networks for Asynchronous Multi-Agent Cyber Defense in NetForge_RL

Igor Jankowski

Comments 26 pages, 14 figures, 5 tables

2604.09522 2026-04-13 cs.DS

Packing Compact Subgraphs with Applications to Districting

Ho-Lin Chen, Po-Yu Chou, Prathamesh Dharangutte, Jie Gao, Shang-En Huang, Fang-Yi Yu

Comments To appear in FORC 2026

2604.09521 2026-04-13 cs.IT cs.AI math.IT

Semantic Rate-Distortion for Bounded Multi-Agent Communication: Capacity-Derived Semantic Spaces and the Communication Cost of Alignment

Anthony T. Nixon

Comments 34 pages, 13 figures. Code: https://github.com/alch3mistdev/semantic-rate-distortion

English abstract

When two agents of different computational capacities interact with the same environment, they need not compress a common semantic alphabet differently; they can induce different semantic alphabets altogether. We show that the quotient POMDP $Q_{m,T}(M)$ - the unique coarsest abstraction consistent with an agent's capacity - serves as a capacity-derived semantic space for any bounded agent, and that communication between heterogeneous agents exhibits a sharp structural phase transition. Below a critical rate $R_{\text{crit}}$ determined by the quotient mismatch, intent-preserving communication is structurally impossible. In the supported one-way memoryless regime, classical side-information coding then yields exponential decay above the induced benchmark. Classical coding theorems tell you the rate once the source alphabet is fixed; our contribution is to derive that alphabet from bounded interaction itself. Concretely, we prove: (1) a fixed-$\varepsilon$ structural phase-transition theorem whose lower bound is fully general on the common-history quotient comparison; (2) a one-way Wyner-Ziv benchmark identification on quotient alphabets, with exact converse, exact operational equality for memoryless quotient sources, and an ergodic long-run bridge via explicit mixing bounds; (3) an asymptotic one-way converse in the shrinking-distortion regime $\varepsilon = O(1/T)$, proved from the message stream and decoder side information; and (4) alignment traversal bounds enabling compositional communication through intermediate capacity levels. Experiments on eight POMDP environments (including RockSample(4,4)) illustrate the phase transition, a structured-policy benchmark shows the one-way rate can drop by up to $19\times$ relative to the counting bound, and a shrinking-distortion sweep matches the regime of the asymptotic converse.

2604.09520 2026-04-13 math.CO cs.DM math.PR

Random 0/1-polytopes expand rapidly

He Guo, István Tomon

Comments 21 pages

2604.09517 2026-04-13 cs.DC

Sustaining Exascale Performance: Lessons from HPL and HPL-MxP on Aurora

Kazushige Goto, Huda Ibeid, Kalyan Kumaran, Servesh Muralidharan, Anthony-Trung Nguyen, Aditya Nishtala

2604.09515 2026-04-13 cs.SE

When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation

Ahmed Nusayer Ashik, Shaowei Wang, Tse-Hsun Chen, Muhammad Asaduzzaman, Yuan Tian

2604.09514 2026-04-13 cs.CL cs.HC

Many Ways to Be Fake: Benchmarking Fake News Detection Under Strategy-Driven AI Generation

Xinyu Wang, Sai Koneru, Wenbo Zhang, Wenliang Zheng, Saksham Ranjan, Sarah Rajtmajer

2604.09512 2026-04-13 cs.LG physics.optics

Integrated electro-optic attention nonlinearities for transformers

Luis Mickeler, Kai Lion, Alfonso Nardi, Jost Kellner, Pierre Didier, Bhavin J. Shastri, Niao He, Rachel Grange

2604.09511 2026-04-13 cs.CV

RIRF: Reasoning Image Restoration Framework

Wending Yan, Rongkai Zhang, Kaihua Tang, Yu Cheng, Qiankun Liu

2604.09508 2026-04-13 cs.CV cs.AI

VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning

Yucheng Shen, Jiulong Wu, Jizhou Huang, Dawei Yin, Lingyong Yan, Min Cao

2604.09501 2026-04-13 cs.CL

You Can't Fight in Here! This is BBS!

Richard Futrell, Kyle Mahowald

Comments Accepted at Behavioral and Brain Sciences as a response to the commentaries to the accepted target article "How Linguistics Learned to Stop Worrying and Love the Language Models", whose preprint appears here: arXiv:2501.17047

2604.09499 2026-04-13 cs.RO

Physics-Informed Reinforcement Learning of Spatial Density Velocity Potentials for Map-Free Racing

Shathushan Sivashangaran, Apoorva Khairnar, Sepideh Gohari, Vihaan Dutta, Azim Eskandarian

2604.09498 2026-04-13 math.NA cs.NA

New Scheme Adaption Strategy for Hyperbolic Conservation Laws

Shaoshuai Chu, Michael Herty, Alexander Kurganov

2604.09497 2026-04-13 cs.CL cs.AI

BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation

Hippolyte Gisserot-Boukhlef, Nicolas Boizard, Emmanuel Malherbe, Céline Hudelot, Pierre Colombo

2604.09495 2026-04-13 cs.MA

Risk-seeking conservative policy iteration with agent-state based policies for Dec-POMDPs with guaranteed convergence

Amit Sinha, Matthieu Geist, Aditya Mahajan

2604.09494 2026-04-13 cs.CL cs.AI cs.IR cs.LG

RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval

Kyle Whitecross, Negin Rahimi

Comments Code, data, and models available at https://github.com/kswhitecross/RecaLLM

2604.09493 2026-04-13 cs.NI

Policy-Aware Edge LLM-RAG Framework for Internet of Battlefield Things Mission Orchestration

Om Solanki, Lopamudra Praharaj, Deepti Gupta, Maanak Gupta

Comments 10 pages, 5 figures, Accepted at AIS 2026

2604.09489 2026-04-13 cs.CR cs.AI cs.DC cs.LG

XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers

Israt Jahan Mouri, Muhammad Ridowan, Muhammad Abdullah Adnan

Comments 21 pages, 9 figures, 7 tables

2604.09484 2026-04-13 math.NA cs.NA

Asymptotic-preserving deterministic particle methods for collisional plasma models

Yan Huang, Li Wang

2604.09480 2026-04-13 cs.CV

Online3R: Online Learning for Consistent Sequential Reconstruction Based on Geometry Foundation Model

Shunkai Zhou, Zike Yan, Fei Xue, Dong Wu, Yuchen Deng, Hongbin Zha