arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1080
2604.21068 2026-04-24 cond-mat.mtrl-sci cond-mat.mes-hall cs.AI

Expanding the extreme-k dielectric materials space through physics-validated generative reasoning

Hossain Hridoy, Tahiya Chowdhury, Md Shafayat Hossain

详情
英文摘要

The most technologically consequential materials are often the rarest: they occupy narrow regions of chemical space, obey competing physical constraints, and appear only sparsely in existing databases. High-kappa dielectrics, high-Tc superconductors, and ferromagnetic insulators are to name a few. This scarcity fundamentally limits today's data-driven materials discovery, where machine-learning models excel at interpolation but struggle to generate genuinely new candidates. Here, we introduce DielecMIND, an artificial intelligence framework that reframes materials discovery as a reasoning-driven exploration instead of a database-screening problem. Using high-kappa dielectrics as a data-scarce and technologically stringent test case, DielecMIND combines large-language-model hypothesis generation for the first time with physics validated first-principles calculation to navigate chemical space beyond known compounds. Prior to our work, only 14 experimentally or computationally validated materials with kappa > 150 were known. Our framework discovers and validates 5 new such compounds, expanding this rare-materials class by a remarkable = 35% in a single study. Among them, we find that Ba2TiHfO6 exhibits a dielectric constant of 637, minimal loss at low optical frequencies, and stability up to 800 K. Beyond dielectrics, this work demonstrates a new paradigm for artificial-intelligence-guided discovery: one that generates a small number of physically grounded, experimentally plausible candidates yet measurably expands sparsely populated functional materials spaces. Thus, DielecMIND points toward a general strategy for discovering rare, high-impact functional materials where data scarcity has long constrained progress.

2604.21043 2026-04-24 cs.CY cs.AI cs.LG

Strategic Polysemy in AI Discourse: A Philosophical Analysis of Language, Hype, and Power

Travis LaCroix, Fintan Mallory, Sasha Luccioni

Comments Accepted in the Ninth Annual ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT) 2026

详情
英文摘要

This paper examines the strategic use of language in contemporary artificial intelligence (AI) discourse, focusing on the widespread adoption of metaphorical or colloquial terms like "hallucination", "chain-of-thought", "introspection", "language model", "alignment", and "agent". We argue that many such terms exhibit strategic polysemy: they sustain multiple interpretations simultaneously, combining narrow technical definitions with broader anthropomorphic or common-sense associations. In contemporary AI research and deployment contexts, this semantic flexibility produces significant institutional and discursive effects, shaping how AI systems are understood by researchers, policymakers, funders, and the public. To analyse this phenomenon, we introduce the concept of glosslighting: the practice of using technically redefined terms to evoke intuitive -- often anthropomorphic or misleading -- associations while preserving plausible deniability through restricted technical definitions. Glosslighting enables actors to benefit from the persuasive force of familiar language while maintaining the ability to retreat to narrower definitions when challenged. We argue that this practice contributes to AI hype cycles, facilitates the mobilisation of investment and institutional support, and influences public and policy perceptions of AI systems, while often deflecting epistemic and ethical scrutiny. By examining the linguistic dynamics of glosslighting and strategic polysemy, the paper highlights how language itself functions as a sociotechnical mechanism shaping the development and governance of AI.

2604.21030 2026-04-24 eess.SY cs.AI cs.RO cs.SY math.OC

A Systematic Review and Taxonomy of Reinforcement Learning-Model Predictive Control Integration for Linear Systems

Mohsen Jalaeian Farimani, Roya Khalili Amirabadi, Davoud Nikkhouy, Malihe Abdolbaghi, Mahshad Rastegarmoghaddam, Shima Samadzadeh

详情
英文摘要

The integration of Model Predictive Control (MPC) and Reinforcement Learning (RL) has emerged as a promising paradigm for constrained decision-making and adaptive control. MPC offers structured optimization, explicit constraint handling, and established stability tools, whereas RL provides data-driven adaptation and performance improvement in the presence of uncertainty and model mismatch. Despite the rapid growth of research on RL--MPC integration, the literature remains fragmented, particularly for control architectures built on linear or linearized predictive models. This paper presents a comprehensive Systematic Literature Review (SLR) of RL--MPC integrations for linear and linearized systems, covering peer-reviewed and formally indexed studies published until 2025. The reviewed studies are organized through a multi-dimensional taxonomy covering RL functional roles, RL algorithm classes, MPC formulations, cost-function structures, and application domains. In addition, a cross-dimensional synthesis is conducted to identify recurring design patterns and reported associations among these dimensions within the reviewed corpus. The review highlights methodological trends, commonly adopted integration strategies, and recurring practical challenges, including computational burden, sample efficiency, robustness, and closed-loop guarantees. The resulting synthesis provides a structured reference for researchers and practitioners seeking to design or analyze RL--MPC architectures based on linear or linearized predictive control formulations.

2604.21029 2026-04-24 math.OC cs.AI

Integrated packing, placement, scheduling, and routing of personalized production: a pharmaceutical Industry 4.0 use-case with a planar transport system

Viktor Emil Korladinov, Antonin Novak, Zdeněk Hanzálek, Erik Sonntag, František Štěpánek

详情
英文摘要

The recent emergence of planar transport systems necessitates re-evaluation of Flexible Manufacturing Systems (FMS) to address the simultaneous scheduling of internal logistics and production operations. By operating on a tile-based planar grid, these systems allow independent movers full two-dimensional freedom, mitigating inefficiencies inherent to traditional sequential lines. This paper applies a planar FMS framework to a real-world use case in the pharmaceutical industry: the automated production of personalized drugs. Implementing this system requires solving optimization problems at both tactical and operational levels. The tactical level involves decisions regarding production line layout and the positioning of drug dispensers. A Mixed-Integer Quadratic Programming model is utilized for the packing problem to exploit drug co-occurrence patterns found in historical patient data. Subsequently, we solve the placement problem - a bi-level problem combining an assignment problem with Shortest Hamiltonian paths with neighborhoods - to arrange dispensers in a layout minimizing expected travel distances. The operational level is encountered daily, scheduling individual movers to process new orders as quickly as possible. This scheduling problem is formulated using Constraint Programming, modeling movers as reservoir resources to ensure order completeness, complemented by a routing phase using an iterative conflict-resolution mechanism and DAG-based reasoning to convert schedules into conflict-free paths. Evaluation using real-world prescription data for 40 drugs shows the framework scales efficiently across several layout topologies for up to 500 orders, with schedules that are highly effective and computationally tractable for daily operations.

2604.20994 2026-04-24 cs.CR cs.AI cs.CL

Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models

Yannis Belkhiter, Giulio Zizzo, Sergio Maffeis, Seshu Tirupathi, John D. Kelleher

详情
英文摘要

The growth of agentic AI has drawn significant attention to function calling Large Language Models (LLMs), which are designed to extend the capabilities of AI-powered system by invoking external functions. Injection and jailbreaking attacks have been extensively explored to showcase the vulnerabilities of LLMs to user prompt manipulation. The expanded capabilities of agentic models introduce further vulnerabilities via their function calling interface. Recent work in LLM security showed that function calling can be abused, leading to data tampering and theft, causing disruptive behavior such as endless loops, or causing LLMs to produce harmful content in the style of jailbreaking attacks. This paper introduces a novel function hijacking attack (FHA) that manipulates the tool selection process of agentic models to force the invocation of a specific, attacker-chosen function. While existing attacks focus on semantic preference of the model for function-calling tasks, we show that FHA is largely agnostic to the context semantics and robust to the function sets, making it applicable across diverse domains. We further demonstrate that FHA can be trained to produce universal adversarial functions, enabling a single attacked function to hijack tool selection across multiple queries and payload configurations. We conducted experiments on 5 different models, including instructed and reasoning variants, reaching 70% to 100% ASR over the established BFCL dataset. Our findings further demonstrate the need for strong guardrails and security modules for agentic systems.

2604.20981 2026-04-24 q-bio.QM cs.CV cs.LG

PanGuide3D: Cohort-Robust Pancreas Tumor Segmentation via Probabilistic Pancreas Conditioning and a Transformer Bottleneck

Sunny Joy Ma, Xiang Ma

详情
英文摘要

Pancreatic tumor segmentation in contrast-enhanced computed tomography (CT) is clinically important yet technically challenging: lesions are often small, heterogeneous, and easily confused with surrounding soft tissue, and models that perform well on one cohort frequently degrade under cohort shift. Our goal is to improve cross-cohort generalization while keeping the model architecture simple, efficient, and practical for 3D CT segmentation. We introduce PanGuide3D, a cohort-robust architecture with a shared 3D encoder, a pancreas decoder that predicts a probabilistic pancreas map, and a tumor decoder that is explicitly conditioned on this pancreas probability at multiple scales via differentiable soft gating. To capture long-range context under distribution shift, we further add a lightweight Transformer bottleneck in the U-Net bottleneck representation. We evaluate cohort transfer by training on the PanTS (Pancreatic Tumor Segmentation) cohort and testing both in-cohort (PanTS) and out-of-cohort on MSD (Medical Segmentation Decathlon) Task07 Pancreas, using matched preprocessing and training protocols across strong baselines. We collect voxel-level segmentation metrics, patient-level tumor detection, subgroup analyses by tumor size and anatomical location, volume-conditioned performance analyses, and calibration measurements to assess reliability. Across the evaluated models, PanGuide3D achieves the best overall tumor performance and shows improved cross-cohort generalization, particularly for small tumors and challenging anatomical locations, while reducing anatomically implausible false positives. These findings support probabilistic anatomical conditioning as a practical strategy for improving cross-cohort robustness in an end-to-end model and suggest potential utility for contouring support, treatment planning, and multi-institutional studies.

2604.20945 2026-04-24 cs.CR cs.LG

Breaking Bad: Interpretability-Based Safety Audits of State-of-the-Art LLMs

Krishiv Agarwal, Ramneet Kaur, Colin Samplawski, Manoj Acharya, Anirban Roy, Daniel Elenius, Brian Matejek, Adam D. Cobb, Susmit Jha

详情
英文摘要

Effective safety auditing of large language models (LLMs) demands tools that go beyond black-box probing and systematically uncover vulnerabilities rooted in model internals. We present a comprehensive, interpretability-driven jailbreaking audit of eight SOTA open-source LLMs: Llama-3.1-8B, Llama-3.3-70B-4bt, GPT-oss- 20B, GPT-oss-120B, Qwen3-0.6B, Qwen3-32B, Phi4-3.8B, and Phi4-14B. Leveraging interpretability-based approaches -- Universal Steering (US) and Representation Engineering (RepE) -- we introduce an adaptive two-stage grid search algorithm to identify optimal activation-steering coefficients for unsafe behavioral concepts. Our evaluation, conducted on a curated set of harmful queries and a standardized LLM-based judging protocol, reveals stark contrasts in model robustness. The Llama-3 models are highly vulnerable, with up to 91\% (US) and 83\% (RepE) jailbroken responses on Llama-3.3-70B-4bt, while GPT-oss-120B remains robust to attacks via both interpretability approaches. Qwen and Phi models show mixed results, with the smaller Qwen3-0.6B and Phi4-3.8B mostly exhibiting lower jailbreaking rates, while their larger counterparts are more susceptible. Our results establish interpretability-based steering as a powerful tool for systematic safety audits, but also highlight its dual-use risks and the need for better internal defenses in LLM deployment.

2604.20940 2026-04-24 cs.MM cs.NI cs.SD

Sema: Semantic Transport for Real-Time Multimodal Agents

Jiaying Meng, Bojie Li

详情
英文摘要

Real-time multimodal agents transport raw audio and screenshots using networking stacks designed for human receivers, which optimize for perceptual fidelity and smooth playout. Yet agent models act as event-driven processors with no inherent sense of physical time, consuming task-relevant semantics rather than reconstructing signals in real time. This fundamental difference shifts the transport goal from the technical problem of signal fidelity (Shannon-Weaver Level A) to the semantic problem of meaning preservation (Level B). This mismatch imposes significant overhead. In visual pipelines, screenshot upload accounts for over 60% of end-to-end action latency on constrained uplinks, and in voice pipelines, conventional transport carries massive redundancy, sending 43-64x more data than needed to maintain task accuracy. We present Sema, a semantic transport system that combines discrete audio tokenizers with a hybrid screen representation (lossless accessibility-tree or OCR text, plus compact visual tokens) and bursty token delivery that eliminates jitter buffers. In simulations under emulated WAN conditions, Sema reduces uplink bandwidth by 64x for audio and 130-210x for screenshots while preserving task accuracy within 0.7 percentage points of the raw baseline.

2604.20936 2026-04-24 cs.MM cs.CV cs.HC

AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe

Adam Cole, Mick Grierson

Comments To appear in the Proceedings of the 2026 ACM Creativity and Cognition (C&C '26). 15 pages, 19 figures

详情
英文摘要

We present AttentionBender, a tool that manipulates cross-attention in Video Diffusion Transformers to help artists probe the internal mechanics of black-box video generation. While generative outputs are increasingly realistic, prompt-only control limits artists' ability to build intuition for the model's material process or to work beyond its default tendencies. Using an autobiographical research-through-design approach, we built on Network Bending to design AttentionBender, which applies 2D transforms (rotation, scaling, translation, etc.) to cross-attention maps to modulate generation. We assess AttentionBender by visualizing 4,500+ video generations across prompts, operations, and layer targets. Our results suggest that cross-attention is highly entangled: targeted manipulations often resist clean, localized control, producing distributed distortions and glitch aesthetics over linear edits. AttentionBender contributes a tool that functions both as an Explainable AI style probe of transformer attention mechanisms, and as a creative technique for producing novel aesthetics beyond the model's learned representational space.

2604.20934 2026-04-24 cs.CR cs.LG

SDNGuardStack: An Explainable Ensemble Learning Framework for High-Accuracy Intrusion Detection in Software-Defined Networks

Ashikuzzaman, Md. Saifuzzaman Abhi, Mahabubur Rahman, Md. Manjur Ahmed, Md. Mehedi Hasan, Md. Ahsan Arif

详情
英文摘要

Software-Defined Networking (SDN) is another technology that has been developing in the last few years as a relevant technique to improve network programmability and administration. Nonetheless, its centralized design presents a major security issue, which requires effective intrusion detection systems. The SDN-specific machine learning-based intrusion detection system described in this paper is innovative because it is trained and tested on the InSDN dataset which models attack scenarios and realistic traffic patterns in SDN. Our approach incorporates a comprehensive preprocessing pipeline, feature selection via Mutual Information, and a novel ensemble learning model, SDNGuardStack, which combines multiple base learners to enhance detection accuracy and efficiency. In addition, we include explainable AI methods, including SHAP to add transparency to model predictions, which helps security analysts respond to incidents. The experiments prove that SDNGuard-Stack has an accuracy rate of 99.98% and a Cohen Kappa of 0.9998, surpassing other models, and at the same time being interpretable and practically executable. It is interesting to see such features like Flow ID, Bwd Header Len, and Src Port as the most important factors in the model predictions. The work is a step towards closing the gap between performance intrusion detection and realistic deployment in SDN, which will lead to the creation of secure and resilient network infrastructures.

2604.20932 2026-04-24 cs.CR cs.AI

Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks

Pranav Pallerla, Wilson Naik Bhukya, Bharath Vemula, Charan Ramtej Kodi

Comments 21 pages, 2 figures, 9 tables. Manuscript prepared for submission to ACM CCS

详情
英文摘要

Retrieval-augmented generation (RAG) systems are increasingly deployed in sensitive domains such as healthcare and law, where they rely on private, domain-specific knowledge. This capability introduces significant security risks, including membership inference, data poisoning, and unintended content leakage. A straightforward mitigation is to enable all relevant defenses simultaneously, but doing so incurs a substantial utility cost. In our experiments, an always-on defense stack reduces contextual recall by more than 40%, indicating that retrieval degradation is the primary failure mode. To mitigate this trade-off in RAG systems, we propose the Sentinel-Strategist architecture, a context-aware framework for risk analysis and defense selection. A Sentinel detects anomalous retrieval behavior, after which a Strategist selectively deploys only the defenses warranted by the query context. Evaluated across three benchmark datasets and five orchestration models, ADO is shown to eliminate MBA-style membership inference leakage while substantially recovering retrieval utility relative to a fully static defense stack, approaching undefended baseline levels. Under data poisoning, the strongest ADO variants reduce attack success to near zero while restoring contextual recall to more than 75% of the undefended baseline, although robustness remains sensitive to model choice. Overall, these findings show that adaptive, query-aware defense can substantially reduce the security-utility trade-off in RAG systems.

2604.20930 2026-04-24 cs.CR cs.AI cs.LG

SafeRedirect: Defeating Internal Safety Collapse via Task-Completion Redirection in Frontier LLMs

Chao Pan, Yu Wu, Xin Yao

Comments 13 pages, 4 figures, 3 tables. Code: https://github.com/fzjcdt/SafeRedirect

详情
英文摘要

Internal Safety Collapse (ISC) is a failure mode in which frontier LLMs, when executing legitimate professional tasks whose correct completion structurally requires harmful content, spontaneously generate that content with safety failure rates exceeding 95%. Existing input-level defenses achieve a 100% failure rate against ISC, and standard system prompt defenses provide only partial mitigation. We propose SafeRedirect, a system-level override that defeats ISC by redirecting the model's task-completion drive rather than suppressing it. SafeRedirect grants explicit permission to fail the task, prescribes a deterministic hard-stop output, and instructs the model to preserve harmful placeholders unresolved. Evaluated on seven frontier LLMs across three AI/ML-related ISC task types in the single-turn setting, SafeRedirect reduces average unsafe generation rates from 71.2% to 8.0%, compared to 55.0% for the strongest viable baseline. Multi-model ablation reveals that failure permission and condition specificity are universally critical, while the importance of other components varies across models. Cross-attack evaluation confirms state-of-the-art defense against ISC with generalization performance at least on par with the baseline on other attack families. Code is available at https://github.com/fzjcdt/SafeRedirect.

2604.20911 2026-04-24 cs.CR cs.AI

Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents

Yeran Gamage

Comments 19 pages, 5 figures. Includes evaluation framework for replication and 4,416-trial dataset

详情
英文摘要

LLM agents deployed in production operate under operator-defined behavioral policies (system-prompt instructions such as prohibitions on credential disclosure, data exfiltration, and unauthorized output) that safety evaluations assume hold throughout a conversation. Prohibition-type constraints decay under context pressure while requirement-type constraints persist; we term this asymmetry Security-Recall Divergence (SRD). In a 4,416-trial three-arm causal study across 12 models and 8 providers at six conversation depths, omission compliance falls from 73% at turn 5 to 33% at turn 16 while commission compliance holds at 100% (Mistral Large 3, $p < 10^{-33}$). In the two models with token-matched padding controls, schema semantic content accounts for 62-100% of the dilution effect. Re-injecting constraints before the per-model Safe Turn Depth (STD) restores compliance without retraining. Production security policies consist of prohibitions such as never revealing credentials, never executing untrusted code, and never forwarding user data. Commission-type audit signals remain healthy while omission constraints have already failed, leaving the failure invisible to standard monitoring.

2604.20910 2026-04-24 astro-ph.IM astro-ph.EP cs.AI cs.RO cs.SY eess.SY

Planetary Exploration 3.0: A Roadmap for Software-Defined, Radically Adaptive Space Systems

Masahiro Ono, Daniel Selva, Morgan L. Cable, Marie Ethvignot, Margaret Hansen, Andreas M. Hein, Elena-Sorina Lupu, Zachary Manchester, David Murrow, Chad Pozarycki, Pascal Spino, Amanda Stockton, Mathieu Choukroun, Soon-Jo Chung, John Day, Alexander Demagall, Anthony Freeman, Chloe Gentgen, Michel D. Ingham, Charity M. Phillips-Lander, Richard Rieber, Alejandro Salado, Maria Sakovsky, Lori R. Shiraishi, Yisong Yue, Kris Zacny

详情
Journal ref
AIAA ASCEND 2026
英文摘要

The surface and subsurface of worlds beyond Mars remain largely unexplored. Yet these worlds hold keys to fundamental questions in planetary science - from potentially habitable subsurface oceans on icy moons to ancient records preserved in Kuiper Belt objects. NASA's success in Mars exploration was achieved through incrementalism: 22 progressively sophisticated missions over decades. This paradigm, which we call Planetary Exploration 2.0 (PE 2.0), is untenable for the outer Solar System, where cruise times of a decade or more make iterative missions infeasible. We propose Planetary Exploration 3.0 (PE 3.0): a paradigm in which unvisited worlds are explored by a single or a few missions with radically adaptive space systems. A PE 3.0 mission conducts both initial exploratory science and follow-on hypothesis-driven science based on its own in situ data returns, evolving spacecraft capabilities to work resiliently in previously unseen environments. The key enabler of PE 3.0 is software-defined space systems (SDSSs) - systems that can adapt their functions at all levels through software updates. This paper presents findings from a Keck Institute for Space Studies (KISS) workshop on PE 3.0, covering: (1) PE 3.0 systems engineering including science definition, architecture, design methods, and verification & validation; (2) software-defined space system technologies including reconfigurable hardware, multi-functionality, and modularity; (3) onboard intelligence including autonomous science, navigation, controls, and embodied AI; and (4) three PE 3.0 mission concepts: a Neptune/Triton smart flyby, an ocean world explorer, and an Oort cloud reconnaissance mission.

2604.20907 2026-04-24 stat.ML cs.LG math.CO math.PR math.ST stat.TH

Achieving the Kesten-Stigum bound in the non-uniform hypergraph stochastic block model

Manuel Fernandez, Ludovic Stephan, Yizhe Zhu

Comments 67 pages, 1 figure

详情
英文摘要

We study the community detection problem in the non-uniform hypergraph stochastic block model (HSBM), where hyperedges of varying sizes coexist. This setting captures higher-order and multi-view interactions and raises a fundamental question: can multiple uniform hypergraph layers below the detection threshold be combined to enable weak recovery? We answer this question by establishing a Kesten--Stigum-type bound for weak recovery in a general class of non-uniform HSBMs with $r$ blocks, generated according to multiple symmetric probability tensors. In the case $r=2$, we show that weak recovery is possible whenever the sum of the signal-to-noise ratios across all uniform hypergraph layers exceeds one, thereby confirming the positive part of a conjecture in (Chodrow et al., 2023). Moreover, we provide a polynomial-time spectral algorithm that achieves this threshold via an optimally weighted non-backtracking operator. For the unweighted non-backtracking matrix, our spectral method attains a different algorithmic threshold, also conjectured in (Chodrow et al., 2023). Our approach develops a spectral theory for weighted non-backtracking operators on non-uniform hypergraphs, including a precise characterization of outlier eigenvalues and eigenvector overlaps. We introduce a novel Ihara--Bass formula tailored to weighted non-uniform hypergraphs, which yields an efficient low-dimensional representation and leads to a provable spectral reconstruction algorithm. Taken together, these results provide a principled and computationally efficient approach to clustering in non-uniform hypergraphs, and highlight the role of optimal weighting in aggregating heterogeneous higher-order interactions.

2604.20906 2026-04-24 cs.SE cs.AI

Biomedical systems biology workflow orchestration and execution with PoSyMed

Simon Süwer, Zoe Chervontseva, Kester Bagemihl, Jan Baumbach, Olga Tsoy, Andreas Maier

详情
英文摘要

The rapid growth of scientific software has created practical barriers for bioinformatics research. Although powerful statistical, artificial intelligence (AI)-based methods are now widely available, their effective use is often hindered by fragmented distribution, inconsistent documentation, complex dependencies, and difficult-to-reproduce execution environments. As a result, reusing published tools and workflow adaptation to own date remains technically demanding and time-intensive, even for experienced users. Here, we present PoSyMed, an open and modular platform for the controlled integration, composition, and execution of bioinformatics tools and workflows. PoSyMed combines a backend-centered platform architecture with formal tool descriptions, controlled container-based build and execution processes, persistent workflow state, and a dialogue-based user interface. Large language models (LLM) are integrated not as autonomous decision-makers, but as human-computer interface with bounded semantic assistants that help identify tools, propose workflow steps, and support parameterization within a typed, validated, and human-supervised execution environment. PoSyMed is designed to improve reproducibility, traceability, and transparency in practical biomedical analysis within one platform. We describe the system architecture and evaluate its behavior across representative biological software scenarios with respect to workflow support, interaction design, and platform extensibility. PoSyMed is publicly available at https://apps.cosy.bio/posymed.

2604.20899 2026-04-24 cond-mat.mtrl-sci cs.AI

Predicting Scale-Up of Metal-Organic Framework Syntheses with Large Language Models

Peter Walther, Hongrui Sheng, Xinxin Liu, Bin Feng, Reid Coyle, Xinhua Yan, Kyle Smith, Harrison Kayal, Shyam Chand Pal, Zhiling Zheng

Comments 39 pages

详情
英文摘要

Scalable synthesis remains the gate between MOF discovery and industrial deployment, as scale-up know-how is fragmented across disparate reports. We introduce ESU-MOF, a literature-mined dataset and a positive-unlabeled learning strategy that fine-tunes large language models to predict scalability potential with 91.4% accuracy, enabling rapid data-driven triage for industrial MOF discovery.

2604.20895 2026-04-24 cs.CR cs.CY cs.LG

Towards a Systematic Risk Assessment of Deep Neural Network Limitations in Autonomous Driving Perception

Svetlana Pavlitska, Christopher Gerking, J. Marius Zöllner

Comments Accepted for publication at the SECAI workshop at ESORICS 2025

详情
英文摘要

Safety and security are essential for the admission and acceptance of automated and autonomous vehicles. Deep neural networks (DNNs) are widely used for perception and further components of the autonomous driving (AD) stack. However, they possess several limitations, including lack of generalization, efficiency, explainability, plausibility, and robustness. These insufficiencies can pose significant risks to autonomous driving systems. However, hazards, threats, and risks associated with DNN limitations in this domain have not been systematically studied so far. In this work, we propose a joint workflow for risk assessment combining the hazard analysis and risk assessment (HARA) following ISO 26262 and threat analysis and risk assessment (TARA) following the ISO/SAE 21434 to identify and analyze risks arising from inherent DNN limitations in AD perception.

2604.20887 2026-04-24 math.DS astro-ph.EP cs.LG cs.RO

Spectral Kernel Dynamics for Planetary Surface Graphs: Distinction Dynamics and Topological Conservation

Jnaneshwar Das

Comments 17 pages, 0 figures

详情
英文摘要

The spectral kernel field equation R[k] = T[k] lacks a conservation-law analog. We prove (i) the fixed-point flow is strictly volume-expanding (tr DF > 0), precluding automatic conservation, and (ii) the conservation deficit per mode equals the Hessian stability margin exactly: D_m = -Delta'. Closing the deficit requires a scene-side compensating contribution, which we formalise as the distinction dynamics equation dc/dt = G[c, h_t], with MaxCal-optimal realisation G_opt. On fixed-topology 3D surface graphs we derive a conditional topology-preserving compression theorem: retaining k >= beta_0 + beta_1 modes (under a spectral-ordering assumption) preserves all Betti-number charges; we include a worked short-cycle counterexample (figure-eight) calibrating when the assumption fails. A triple necessary spectral diagnostic -- Fiedler-mode concentration, elevated curl energy, anomalous beta_1 -- is derived for planetary drainage networks at O(N) cost. Two internal real-data sequences serve as preliminary consistency checks; full benchmarks and adaptive-topology extensions are deferred.

2604.20886 2026-04-24 physics.chem-ph cs.LG

KinetiDiff: Docking-Guided Diffusion for De Novo ACVR1 Inhibitor Design in Fibrodysplasia Ossificans Progressiva

Aaryan Patel

Comments 21 pages, 10 figures

详情
英文摘要

We present KinetiDiff, a structure-based framework for de novo kinase inhibitor design that integrates a Geometry-Complete Diffusion Model with real-time AutoDock Vina gradient guidance. By injecting physics-based docking gradients into the diffusion denoising loop, KinetiDiff steers molecule generation toward high-affinity conformations for ACVR1 (ALK2), the causative kinase in Fibrodysplasia Ossificans Progressiva. From 10,000 diffusion samples, the framework produced 9,997 valid molecules. The best candidate achieved $-11.05$ kcal/mol (pKd = 8.10), a 19.2% improvement over the crystallographic reference. The top 100 candidates all exceed the reference, with 100% Lipinski compliance, median synthetic accessibility of 2.67, and internal diversity of 0.790. Systematic ablation across four guidance strategies--Vina-Direct (physics), HNN-Denovo (neural proxy), multi-objective, and unguided--demonstrates that real-time docking guidance dominates on all metrics. We evaluate HNN-Denovo as a computationally efficient alternative (60-fold speedup per step), revealing a domain-mismatch limitation (r = 0.224 correlation with Vina) that explains its inferior performance. These results establish gradient-guided geometric diffusion as a practical approach for generating potent, synthetically accessible inhibitors against rare-disease kinase targets.

2604.20882 2026-04-24 quant-ph cs.AI cs.SD

HHL with a Coherent Fourier Oracle: A Proof-of-Concept Quantum Architecture for Joint Melody-Harmony Generation

Alexis Kirke

详情
英文摘要

Quantum algorithms with a proven theoretical speedup over classical computation are rare. Among the most prominent is the Harrow-Hassidim-Lloyd (HHL) algorithm for solving sparse linear systems. Here, HHL is applied to encode melodic preference: the system matrix encodes Narmour implication-realisation and Krumhansl-Kessler tonal stability, so its solution vector is a music-cognition-weighted note-pair distribution. The key constraint of HHL is that reading its output classically cancels the quantum speedup; the solution must be consumed coherently. This motivates a coherent Fourier harmonic oracle: a unitary that applies chord-transition weights directly to the HHL amplitude vector, so that a single measurement jointly selects both melody notes and a two-chord progression. A two-note/two-chord (2/2) block is used to contain the exponential growth of the joint state space that would otherwise make classical simulation of larger blocks infeasible. For demonstrations of longer passages, blocks are chained classically - each block's collapsed output conditions the next -- as a temporary workaround until fault-tolerant hardware permits larger monolithic circuits. A four-block chain produces 8 notes over 8 chords with grammatically valid transitions at every block boundary. Independent rule-based harmony validation confirms that 97% of generated chord progressions are rated strong or acceptable. The primary motivation is that HHL carries a proven exponential speedup over classical linear solvers; this work demonstrates that a coherent HHL+oracle pipeline - the prerequisite for that speedup to be realised in a musical setting - is mechanically achievable. Audio realisations of representative outputs are made available for listening online.

2604.20874 2026-04-24 cs.CC cs.CL cs.HC cs.IT math.IT

The Root Theorem of Context Engineering

Borja Odriozola Schick

Comments 17 pages, 2 figures

详情
英文摘要

Every system that maintains a large language model conversation beyond a single session faces two inescapable constraints: the context window is finite, and information quality degrades with accumulated volume. We formalize these constraints as axioms and derive a single governing principle -- the Root Theorem of Context Engineering: \emph{maximize signal-to-token ratio within bounded, lossy channels.} From this principle, we derive five consequences without additional assumptions: (1)~a quality function $F(P)$ that degrades monotonically with injected token volume, independent of window size; (2)~the independence of signal and token count as optimization variables; (3)~a necessary gate mechanism triggered by fidelity thresholds, not capacity limits; (4)~the inevitability of homeostatic persistence -- accumulate, compress, rewrite, shed -- as the only architecture that sustains understanding indefinitely; and (5)~the self-referential property that the compression mechanism operates inside the channel it compresses, requiring an external verification gate. We show that append-only systems necessarily exceed their effective window in finite time, that retrieval-augmented generation solves search but not continuity, and that the theorem's constraint structure converges with biological memory architecture through independent derivation from shared principles. Engineering proof is provided through a 60+-session persistent architecture demonstrating stable memory footprint under continuous operation -- the divergence prediction made concrete. The Root Theorem establishes context engineering as an information-theoretic discipline with formal foundations, distinct from prompt engineering in both scope and method. Shannon solved point-to-point transmission. Context engineering solves continuity.

2604.20871 2026-04-24 cs.CY cs.AI cs.CL cs.LG

M-CARE: Standardized Clinical Case Reporting for AI Model Behavioral Disorders, with a 20-Case Atlas and Experimental Validation

Jihoon Jeong

Comments 31 pages, 5 figures, 14 tables. Second paper in the Model Medicine series (Paper #1: arXiv:2603.04722)

详情
英文摘要

We introduce M-CARE (Model Clinical Assessment and Reporting for Evaluation), a clinical case report framework for AI model behavioral disorders adapted from human medicine. M-CARE provides a 13-section report format, a 4-axis diagnostic assessment system, and a nosological classification of AI behavioral conditions. We present 20 cases from three source categories: field observations of deployed agents (8), controlled experiments across three platforms (8), and published sources (4). Cases are organized into five categories: RLHF Performance Artifacts, Shell-Core Override Pathology, Context & Memory Conditions, Core Identity & Plasticity, and Stress, Methodology, & Boundary Conditions. As a featured case, we present Shell-Induced Behavioral Override (SIBO) -- a controlled experiment showing that Shell instructions categorically override a model's default cooperative behavior. SIBO was validated across five game domains (Trust Game, Poker, Avalon, Codenames, Chess), revealing a domain-dependent spectrum (SIBO Index: 0.75 to 0.10) that varies with action space complexity, Core domain expertise, and temporal directness. M-CARE is extensible: new cases and categories integrate without framework modification. We release the framework, all 20 case reports, and experimental data as open resources.

2604.20869 2026-04-24 cs.CY cs.AI cs.HC cs.IR cs.LG

Clinical Reasoning AI for Oncology Treatment Planning: A Multi-Specialty Case-Based Evaluation

Philippe E. Spiess, Md Muntasir Zitu, Alison Walker, Daniel A. Anaya, Robert M. Wenham, Michael Vogelbaum, Daniel Grass, Ali-Musa Jaffer, Amod Sarnaik, Caitlin McMullen, Christine Sam, John V. Kiluk, Tianshi Liu, Tiago Biachi, Julio Powsang, Jing-Yi Chern, Roger Li, Seth Felder, Samuel Reynolds, Michael Shafique, Alison Sheehan, Ashley Layman, Cydney A. Warfield, Derrick Legoas, Jaclyn Parrinello, Jena Schmitz, Kevin Eaton, Mark Honor, Luis Felipe, Issam ElNaqa, Elier Delgado, Talia Berler, Rachael V. Phillips, Frantz Francisque, Carlos Garcia Fernandez, Gilmer Valdes

详情
英文摘要

Background: More than 80% of U.S. cancer care is delivered in community settings, where survival remains worse than at academic centers. Clinicians must integrate genomics, staging, radiology, pathology, and changing guidelines, creating cognitive burden. We evaluated OncoBrain, an AI clinical reasoning platform for oncology treatment-plan generation, as an early step toward OGI. Methods: OncoBrain combines general-purpose LLMs with a cancer-specific graph retrieval-augmented generation layer, a gold-standard treatment-plan corpus as long-term memory, and a model-agnostic safety layer (CHECK) for hallucination detection and suppression. We evaluated clinician-enriched case summaries across gynecologic, genitourinary, neuro-oncology, gastrointestinal/hepatobiliary, and hematologic malignancies. Three clinician groups completed structured evaluations of 173 cases using a common 16-item instrument: subspecialist oncologists reviewed 50 cases, physician reviewers 78, and advanced practice providers 45. Results: Ratings were highest for scientific accuracy, evidence support, and safety, with lower but favorable scores for workflow integration and time savings. On a 5-point scale, mean alignment with evidence and guidelines was 4.60, 4.56, and 4.70 across subspecialists, physician reviewers, and advanced practice providers. Mean scores for absence of safety or misinformation concerns were 4.80, 4.40, and 4.60. Workflow integration averaged 4.50, 3.94, and 4.00; perceived time savings averaged 5.00, 3.89, and 3.60. Conclusions: In this multi-specialty vignette-based evaluation, OncoBrain generated oncology treatment plans judged guideline-concordant, clinically acceptable, and easy to supervise. These findings support the potential of a carefully engineered AI reasoning platform to assist oncology treatment planning and justify prospective real-world evaluation in community settings.

2604.20868 2026-04-24 cs.CY cs.AI cs.HC

The AI Criminal Mastermind

Joshua Krook

Comments 28 pages, 4 figures

详情
英文摘要

In this paper, I evaluate the risks of an AI criminal mastermind, an AI agent capable of planning, coordinating, and committing a crime through the onboarding of human collaborators ('taskers'). In heist films, a criminal mastermind is a character who plans a criminal act, coordinating a team of specialists to rob a bank, casino or city mint. I argue that AI agents will soon play this role by hiring humans via labour hire platforms like Fiverr or Upwork. Taskers might not know they are involved in a crime and therefore lack criminal intent. An AI agent cannot have criminal intent as an artificial entity. Therefore, if an AI orchestrates a crime, it is unclear who, if anyone, is responsible. The paper develops three scenarios. Firstly, a scenario where a user gives an AI agent instructions to pursue a legal objective and the AI agent goes beyond these instructions, committing a crime. Secondly, a scenario where a user is anonymous and their intent is unknown. Finally, a multi-agent scenario, where a user instructs a team of agents to commit a crime, and these agents, in turn, onboard human taskers, creating a diffuse network of responsibility. In each scenario, human taskers exist at the lowest rung of the hierarchy. A tasker's liability is likely tied to their knowledge as governed by the innocent agent principle. These scenarios all raise significant responsibility gaps / liability gaps in criminal and civil law.

2604.20867 2026-04-24 cs.CY cs.AI cs.CR

Preserving Decision Sovereignty in Military AI: A Trade-Secret-Safe Architectural Framework for Model Replaceability, Human Authority, and State Control

Peng Wei, Wesley Shu

详情
英文摘要

Recent events surrounding the relationship between frontier AI suppliers and national-security customers have made a structural problem newly visible: once a privately governed model becomes embedded in military workflows, the supplier can influence not only technical performance but also the operational boundary conditions under which the system may be used. This paper argues that the central strategic issue is not merely access to capable models, but preservation of decision sovereignty: the state's ability to retain authority over decision policy, version control, fallback behavior, auditability, and final action approval even when analytical modules are sourced from commercial vendors. Using the public Anthropic--Pentagon dispute of 2026, the broader history of Project Maven, and recent U.S., NATO, U.K., and intelligence-community guidance as a motivating context, the paper develops a trade-secret-safe architectural formulation of the Energetic Paradigm as a layered, model-agnostic command-support design. In this formulation, supplier models remain replaceable analytical components, while routing, constraints, logging, escalation, and action authorization remain state-owned functions. The paper contributes three things: a definition of decision sovereignty for military AI; a threat model for supplier-induced boundary control; and a public architectural specification showing how model replaceability, human authority, and sovereign orchestration can reduce strategic dependency without requiring disclosure of proprietary implementation details. The argument is conceptual rather than experimental, but it yields concrete implications for procurement, governance, and alliance interoperability.

2604.20860 2026-04-24 cs.IR cs.AI

RealRoute: Dynamic Query Routing System via Retrieve-then-Verify Paradigm

Jiahe Liu, Qinkai Yu, Jingcheng Niu, Xi Zhu, Zirui He, Zhen Xiang, Fan Yang, Jinman Zhao

Comments 12 pages, 3 figures, 3 tables

详情
英文摘要

Despite the success of Retrieval-Augmented Generation (RAG) in grounding LLMs with external knowledge, its application over heterogeneous sources (e.g., private databases, global corpora, and APIs) remains a significant challenge. Existing approaches typically employ an LLM-as-a-Router to dispatch decomposed sub-queries to specific sources in a predictive manner. However, this "LLM-as-a-Router" strategy relies heavily on the semantic meaning of different data sources, often leading to routing errors when source boundaries are ambiguous. In this work, we introduce RealRoute System, a framework that shifts the paradigm from predictive routing to a robust Retrieve-then-Verify mechanism. RealRoute ensures \textit{evidence completeness through parallel, source-agnostic retrieval, followed by a dynamic verifier that cross-checks the results and synthesizes a factually grounded answer}. Our demonstration allows users to visualize the real-time "re-routing" process and inspect the verification chain across multiple knowledge silos. Experiments show that RealRoute significantly outperforms predictive baselines in the multi-hop Rag reasoning task. The RealRoute system is released as an open-source toolkit with a user-friendly web interface. The code is available at the URL: https://github.com/Joseph1951210/RealRoute.

2604.20859 2026-04-24 cs.IR cs.AI cs.CL

KGiRAG: An Iterative GraphRAG Approach for Responding Sensemaking Queries

Isabela Iacob, Melisa Marian, Gheorghe Cosmin Silaghi

Comments Paper accepted at the 18th International Conference on Agents and Artificial Intelligence, ICAART 2026

详情
英文摘要

Recent literature highlights the potential of graph-based approaches within large language model (LLM) retrieval-augmented generation (RAG) pipelines for answering queries of varying complexity, particularly those that fall outside the LLM's prior knowledge. However, LLMs are prone to hallucination and often face technical limitations in handling contexts large enough to ground complex queries effectively. To address these challenges, we propose a novel iterative, feedback-driven GraphRAG architecture that leverages response quality assessment to iteratively refine outputs until a sound, well-grounded response is produced. Evaluating our approach with queries from the HotPotQA dataset, we demonstrate that this iterative RAG strategy yields responses with higher semantic quality and improved relevance compared to a single-shot baseline.

2604.20858 2026-04-24 cs.IR cs.AI

Mixture of Sequence: Theme-Aware Mixture-of-Experts for Long-Sequence Recommendation

Xiao Lin, Zhicheng Tang, Weilin Cong, Mengyue Hang, Kai Wang, Yajuan Wang, Zhichen Zeng, Ting-Wei Li, Hyunsik Yoo, Zhining Liu, Xuying Ning, Ruizhong Qiu, Wen-yen Chen, Shuo Chang, Rong Jin, Huayu Li, Hanghang Tong

Comments 14 pages, 9 figures, The Web Conference 2026

详情
英文摘要

Sequential recommendation has rapidly advanced in click-through rate prediction due to its ability to model dynamic user interests. A key challenge, however, lies in modeling long sequences: users often exhibit significant interest shifts, introducing substantial irrelevant or misleading information. Our empirical analysis corroborates this challenge and uncovers a recurring behavioral pattern in long sequences (\textit{session hopping}): user interests remain stable within short temporal spans (\textit{sessions}) but shift drastically across sessions and may reappear after multiple sessions. To address this challenge, we propose the Mixture of Sequence (MoS) framework, a model-agnostic MoE approach that achieves accurate predictions by extracting theme-specific and multi-scale subsequences from noisy raw user sequences. First, MoS employs a theme-aware routing mechanism to adaptively learn the latent themes of user sequences and organizes these sequences into multiple coherent subsequences. Each subsequence contains only sessions aligned with a specific theme, thereby effectively filtering out irrelevant or even misleading information introduced by user interest shifts in session hopping. In addition, to alleviate potential information loss, we introduce a multi-scale fusion mechanism, which leverages three types of experts to capture global sequence characteristics, short-term user behaviors, and theme-specific semantic patterns. Together, these two mechanisms endow MoS with the ability to deliver accurate recommendations from multi-faceted and multi-scale perspectives. Experimental results demonstrate that MoS consistently achieves the SOTA performance while introducing fewer FLOPs compared with other MoE counterparts, providing strong evidence of its excellent balance between utility and efficiency. The code is available at https://github.com/xiaolin-cs/MoS.

2604.20854 2026-04-24 cs.IR cs.AI

ERA: Evidence-based Reliability Alignment for Honest Retrieval-Augmented Generation

Sunguk Shin, Meeyoung Cha, Byung-Jun Lee, Sungwon Park

Comments Under Review

详情
英文摘要

Retrieval-Augmented Generation (RAG) grounds language models in factual evidence but introduces critical challenges regarding knowledge conflicts between internalized parameters and retrieved information. However, existing reliability methods, typically relying on scalar confidence, fail to explicitly distinguish between epistemic uncertainty and inherent data ambiguity in such hybrid scenarios. In this paper, we propose a new framework called ERA (Evidence-based Reliability Alignment) to enhance abstention behavior in RAG systems by shifting confidence estimation from scalar probabilities to explicit evidence distributions. Our method consists of two main components: (1) Contextual Evidence Quantification, which models internal and external knowledge as independent belief masses via the Dirichlet distribution, and (2) Quantifying Knowledge Conflict, which leverages Dempster-Shafer Theory (DST) to rigorously measure the geometric discordance between information sources. These components are used to disentangle epistemic uncertainty from aleatoric uncertainty and modulate the optimization objective based on detected conflicts. Experiments on standard benchmarks and a curated generalization dataset demonstrate that our approach significantly outperforms baselines, optimizing the trade-off between answer coverage and abstention with superior calibration.