arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.29181 2026-04-01 eess.IV cs.CV

Retinal Malady Classification using AI: A novel ViT-SVM combination architecture

Shashwat Jha, Vishvaditya Luhach, Raju Poddar

Journal ref 6th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 2022, pp. 1659-1664

Abstract

Macular holes, central serous retinopathy and diabetic retinopathy are among the most widespread eye maladies responsible for partial or complete vision loss, which makes early detection of these defects crucial for the well-being of the patient. This study introduces a hybrid architecture combining a Vision Transformer and a Support Vector Machine (ViT-SVM) and analyses its performance in classifying optical coherence tomography (OCT) scans, with the aim of automating the early detection of these retinal defects.

2603.29176 2026-04-01 q-bio.NC cs.AI cs.CE cs.CV

Predicting Neuromodulation Outcome for Parkinson's Disease with Generative Virtual Brain Model

Siyuan Du, Siyi Li, Shuwei Bai, Ang Li, Haolin Li, Mingqing Xiao, Yang Pan, Dongsheng Li, Weidi Xie, Yanfeng Wang, Ya Zhang, Chencheng Zhang, Jiangchao Yao

Abstract

Parkinson's disease (PD) affects over ten million people worldwide. Although temporal interference (TI) and deep brain stimulation (DBS) are promising therapies, inter-individual variability limits empirical treatment selection, adding non-negligible surgical risk and cost. Previous explorations either resort to limited statistical biomarkers that are insufficient to characterize variability, or employ AI-driven methods that are prone to overfitting and opacity. We bridge this gap with a pretraining-finetuning framework to predict outcomes directly from resting-state fMRI. Critically, a generative virtual brain foundation model, pretrained on a collective dataset (2707 subjects, 5621 sessions) to capture universal disorder patterns, was finetuned on PD cohorts receiving TI (n=51) or DBS (n=55) to yield individualized virtual brains with high fidelity to empirical functional connectivity (r=0.935). By constructing counterfactual estimations between pathological and healthy neural states within these personalized models, we predicted clinical responses (TI: AUPR=0.853; DBS: AUPR=0.915), substantially outperforming baselines. External and prospective validations (n=14, n=11) highlight the feasibility of clinical translation. Moreover, our framework provides state-dependent regional patterns linked to response, offering hypothesis-generating mechanistic insights.

2603.29140 2026-04-01 cs.SE cs.AI cs.CL cs.FL

Designing FSMs Specifications from Requirements with GPT 4.0

Omer Nguena Timo, Paul-Alexis Rodriguez, Florent Avellaneda

Abstract

Finite state machines (FSMs) are executable formal specifications of reactive systems. These machines are designed based on systems' requirements, which are often recorded in textual documents written in natural language. FSMs play a crucial role in different phases of model-driven system engineering (MDE); for example, they serve to automate testing activities. FSM quality is critical: the lower the quality of an FSM, the more faults survive the testing phase and the higher the risk of system failure in production, which could lead to catastrophic scenarios. Therefore, this paper leverages recent advances in large language models (LLMs) to propose an LLM-based framework for designing FSMs from requirements. The framework also suggests an expert-centric approach based on FSM mutation and test generation for repairing the FSMs produced by LLMs. This paper also provides an experimental analysis and evaluation of LLMs' capabilities in performing the tasks presented in the framework and in repairing FSMs via various methods. The paper presents experimental results with simulated data. These results and methods bring a new analysis and vision of LLMs that are useful for further development of machine learning technology and its applications to MDE.
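The repair approach relies on FSM mutation and test generation. A minimal sketch of one standard ingredient, finding the shortest input sequence on which a mutant Mealy machine and the specification disagree, is shown below. It assumes deterministic, completely specified Mealy machines stored as dictionaries; the encoding, function name, and toy controller are hypothetical and are not the paper's framework.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (state, input) -> (next_state, output)
Mealy = Dict[Tuple[str, str], Tuple[str, str]]


def distinguishing_sequence(spec: Mealy, mutant: Mealy, start: str,
                            inputs: List[str]) -> Optional[List[str]]:
    """Shortest input sequence on which spec and mutant emit different outputs.

    Breadth-first search over pairs of states (the product machine), so the
    first output mismatch found is reached by a shortest sequence.
    """
    queue = deque([(start, start, [])])
    seen = {(start, start)}
    while queue:
        s, m, path = queue.popleft()
        for a in inputs:
            s_next, s_out = spec[(s, a)]
            m_next, m_out = mutant[(m, a)]
            if s_out != m_out:
                return path + [a]  # a test case that "kills" this mutant
            if (s_next, m_next) not in seen:
                seen.add((s_next, m_next))
                queue.append((s_next, m_next, path + [a]))
    return None  # no reachable difference: the mutant is equivalent


# Toy request/acknowledge controller and a transfer-fault mutant.
spec = {
    ("idle", "req"): ("busy", "ack"),
    ("idle", "done"): ("idle", "ignore"),
    ("busy", "req"): ("busy", "wait"),
    ("busy", "done"): ("idle", "ok"),
}
mutant = dict(spec)
mutant[("busy", "done")] = ("busy", "ok")  # wrong target state, same output

print(distinguishing_sequence(spec, mutant, "idle", ["req", "done"]))
# ['req', 'done', 'req']
```

The transfer fault is invisible on its own transition (both machines output "ok"), so the search has to drive both machines into diverging states before an output differs, which is why the distinguishing test has length three.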

2603.29128 2026-04-01 math.OC cs.LG

Adaptive Delayed-Update Cyclic Algorithm for Variational Inequalities

Yi Wei, Xufeng Cai, Jelena Diakonikolas

Abstract

Cyclic block coordinate methods are a fundamental class of first-order algorithms, widely used in practice for their simplicity and strong empirical performance. Yet, their theoretical behavior remains challenging to explain, and setting their step sizes -- beyond classical coordinate descent for minimization -- typically requires careful tuning or line-search machinery. In this work, we develop $\texttt{ADUCA}$ (Adaptive Delayed-Update Cyclic Algorithm), a cyclic algorithm addressing a broad class of Minty variational inequalities with monotone Lipschitz operators. $\texttt{ADUCA}$ is parameter-free: it requires no global or block-wise Lipschitz constants and uses no per-epoch line search, except at initialization. A key feature of the algorithm is using operator information delayed by a full cycle, which makes the algorithm compatible with parallel and distributed implementations, and attractive due to weakened synchronization requirements across blocks. We prove that $\texttt{ADUCA}$ attains (near) optimal global oracle complexity as a function of target error $ε>0,$ scaling with $1/ε$ for monotone operators, or with $\log^2(1/ε)$ for operators that are strongly monotone.
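For contrast with ADUCA, here is a minimal sketch of classical cyclic coordinate descent on a convex quadratic f(x) = 0.5 x^T A x - b^T x. This is the minimization setting the abstract mentions, not a variational inequality. The exact coordinate step divides by A[i][i], the coordinate-wise Lipschitz constant, which is exactly the kind of per-block constant ADUCA is designed not to need. The function name and test problem are illustrative assumptions.

```python
from typing import List


def cyclic_coordinate_descent(A: List[List[float]], b: List[float],
                              epochs: int = 50) -> List[float]:
    """Minimize f(x) = 0.5 * x^T A x - b^T x for symmetric positive definite A
    by exact minimization over one coordinate at a time, in a fixed cyclic order."""
    n = len(b)
    x = [0.0] * n
    for _ in range(epochs):
        for i in range(n):  # cyclic order 0, 1, ..., n-1 every epoch
            # Setting df/dx_i = 0 gives x_i = (b_i - sum_{j != i} A_ij x_j) / A_ii;
            # 1 / A_ii is the step size tied to the coordinate Lipschitz constant.
            rest = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - rest) / A[i][i]
    return x


x = cyclic_coordinate_descent([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x)  # approaches the exact solution [1/11, 7/11]
```

Each coordinate update immediately uses the freshest values of the other coordinates, which is the tight synchronization that ADUCA's full-cycle delayed operator information relaxes.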

2603.29121 2026-04-01 econ.GN cs.AI cs.CY q-fin.EC

Economics of Human and AI Collaboration: When is Partial Automation More Attractive than Full Automation?

Wensu Li, Atin Aboutorabi, Harry Lyu, Kaizhi Qian, Martin Fleming, Brian C. Goehring, Neil Thompson

Abstract

This paper develops a unified framework for evaluating the optimal degree of task automation. Moving beyond binary automate-or-not assessments, we model automation intensity as a continuous choice in which firms minimize costs by selecting an AI accuracy level, from no automation through partial human-AI collaboration to full automation. On the supply side, we estimate an AI production function via scaling-law experiments linking performance to data, compute, and model size. Because AI systems exhibit predictable but diminishing returns to these inputs, the cost of higher accuracy is convex: good performance may be inexpensive, but near-perfect accuracy is disproportionately costly. Full automation is therefore often not cost-minimizing; partial automation, where firms retain human workers for residual tasks, frequently emerges as the equilibrium. On the demand side, we introduce an entropy-based measure of task complexity that maps model accuracy into a labor substitution ratio, quantifying human labor displacement at each accuracy level. We calibrate the framework with O*NET task data, a survey of 3,778 domain experts, and GPT-4o-derived task decompositions, implementing it in computer vision. Task complexity shapes substitution: low-complexity tasks see high substitution, while high-complexity tasks favor limited partial automation. Scale of deployment is a key determinant: AI-as-a-Service and AI agents spread fixed costs across users, sharply expanding economically viable tasks. At the firm level, cost-effective automation captures approximately 11% of computer-vision-exposed labor compensation; under economy-wide deployment, this share rises sharply. Since other AI systems exhibit similar scaling-law economics, our mechanisms extend beyond computer vision, reinforcing that partial automation is often the economically rational long-run outcome, not merely a transitional phase.
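The convex-cost argument can be illustrated with a toy calculation. The sketch below assumes a hypothetical power-law scaling law, error = a * C^(-beta) with a = 1 and beta = 0.5, a unit compute price, and a wage of 100 for the residual share (1 - q) of work left to humans. None of these numbers come from the paper; under these assumptions the cheapest accuracy q lies strictly between no automation and full automation.

```python
def ai_compute_cost(q: float, a: float = 1.0, beta: float = 0.5,
                    price: float = 1.0) -> float:
    """Compute spend needed for accuracy q under a hypothetical power law:
    error = a * C**(-beta) = 1 - q  =>  C = (a / (1 - q))**(1 / beta)."""
    return price * (a / (1.0 - q)) ** (1.0 / beta)


def total_cost(q: float, wage: float = 100.0) -> float:
    """AI cost plus human labor for the residual share (1 - q) of the task."""
    if q == 0.0:
        return wage  # no automation: humans do everything, no AI spend
    return ai_compute_cost(q) + wage * (1.0 - q)


grid = [i / 1000 for i in range(1000)]  # candidate accuracies 0.000 ... 0.999
q_star = min(grid, key=total_cost)
print(q_star, round(total_cost(q_star), 1))
print(round(total_cost(0.99)))  # near-full automation is far more expensive
```

With these parameters the minimum lies near q ≈ 0.73 at a cost of about 41, versus 100 with no automation, while pushing to q = 0.99 costs about 10,000 because compute grows like (1 - q)^(-2).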

2603.29119 2026-04-01 eess.SY cs.LG cs.SY math.OC

Sampling-Horizon Neural Operator Predictors for Nonlinear Control under Delayed Inputs

Luke Bhan, Peter Quawas, Miroslav Krstic, Yuanyuan Shi

Comments 6 pages

Abstract

Modern control systems frequently operate under input delays and sampled state measurements. A common delay-compensation strategy is predictor feedback; however, practical implementations require solving an implicit ODE online, resulting in intractable computational cost. Moreover, predictor formulations typically assume continuously available state measurements, whereas in practice measurements may be sampled, irregular, or temporarily missing due to hardware faults. In this work, we develop two neural-operator predictor-feedback designs for nonlinear systems with delayed inputs and sampled measurements. In the first design, we introduce a sampling-horizon prediction operator that maps the current measurement and input history to the predicted state trajectory over the next sampling interval. In the second design, the neural operator approximates only the delay-compensating predictor, which is then composed with the closed-loop flow between measurements. The first approach requires uniform sampling but yields residual bounds that scale directly with the operator approximation error. In contrast, the second accommodates non-uniform but bounded sampling schedules at the cost of amplified approximation error, revealing a practical tradeoff between sampling flexibility and approximation sensitivity for the control engineer. For both schemes, we establish semi-global practical stability with explicit neural operator error-dependent bounds. Numerical experiments on a 6-link nonlinear robotic manipulator demonstrate accurate tracking and a substantial 25× computational speedup over a baseline approach.
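A minimal sketch of the kind of numerical predictor such operators are meant to replace is shown below, assuming a scalar state, a constant known delay D, inputs stored at a uniform step dt, and forward-Euler integration (all assumptions for illustration, not the paper's setup). The predicted state x(t + D) is obtained by integrating the plant forward through the inputs already applied during [t - D, t), so each control update costs D / dt model evaluations.

```python
from typing import Callable, Sequence


def predict_state(x_now: float, input_history: Sequence[float], dt: float,
                  f: Callable[[float, float], float]) -> float:
    """Estimate x(t + D) for dx/dt = f(x(t), u(t - D)) by forward Euler.

    input_history holds u on [t - D, t) at spacing dt (oldest first), so
    D = len(input_history) * dt. Those inputs are exactly the ones that will
    act on the plant during [t, t + D), so the prediction needs no future data.
    """
    x = x_now
    for u in input_history:  # one Euler step per stored input sample
        x = x + dt * f(x, u)
    return x


# Example plant (hypothetical): dx/dt = -x + u(t - D) with D = 1 s, dt = 0.01 s.
history = [1.0] * 100
p = predict_state(0.0, history, 0.01, lambda x, u: -x + u)
print(p)  # close to 1 - exp(-1) ≈ 0.632, up to Euler discretization error
```

A predictor-feedback law would then apply u(t) = kappa(p) for some nominal controller kappa (hypothetical here), which is the loop whose per-step integration cost the learned operators aim to cut.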

2603.29118 2026-04-01 cs.HC cs.AI

"I Just Need GPT to Refine My Prompts": Rethinking Onboarding and Help-Seeking with Generative 3D Modeling Tools

Kanak Gautam, Poorvi Bhatia, Parmit K. Chilana

Comments 16 pages, 10 figures, CHI 2026 submission

Abstract

Learning to use feature-rich software is a persistent challenge, but generative AI tools promise to lower this barrier by replacing complex navigation with natural language prompts. We investigated how people approach prompt-based tools for 3D modeling in an observational study with 26 participants (14 casuals, 12 professionals). Consistent with earlier work, participants skipped tutorials and manuals, relying on trial and error. What differed in the generative AI context was how and why they sought support: the prompt box became the entry point for learning, collapsing onboarding into immediate action, while some casual users turned to external LLMs for prompts. Professionals used 3D expertise to refine iterations and critically evaluated outputs, often discarding models that did not meet their standards, whereas casual users settled for "good enough." We contribute empirical insights into how generative AI reshapes help-seeking, highlighting new practices of onboarding, recursive AI-for-AI support, and shifting expertise in interpreting outputs.

2603.29117 2026-04-01 eess.SY cs.LG cs.SY math.OC

Predictor-Based Output-Feedback Control of Linear Systems with Time-Varying Input and Measurement Delays via Neural-Approximated Prediction Horizons

Luke Bhan, Miroslav Krstic, Yuanyuan Shi

Comments 11 Pages. Preprint

Abstract

Owing to their simplicity and strong stability guarantees, predictor feedback methods have been a popular approach for time-delay systems since the 1950s. For time-varying delays, however, implementation requires computing a prediction horizon defined by the inverse of the delay function, which is rarely available in closed form and must be approximated. In this work, we formulate the inverse delay mapping as an operator learning problem and study predictor feedback under approximation of the prediction horizon. We propose two approaches: (i) a numerical method based on time integration of an equivalent ODE, and (ii) a data-driven method using neural operators to learn the inverse mapping. We show that both approaches achieve arbitrary approximation accuracy over compact sets, with complementary trade-offs in computational cost and scalability. Building on these approximations, we then develop an output-feedback predictor design for systems with delays in both the input and the measurement. We prove that the resulting closed-loop system is globally exponentially stable when the prediction horizon is approximated with sufficiently small error. Lastly, numerical experiments validate the proposed methods and illustrate their trade-offs between accuracy and computational efficiency.
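The prediction horizon is the inverse of phi(t) = t - D(t). As a simple numerical stand-in (bisection, which is neither of the two methods proposed in the paper), the sketch below solves s - D(s) = t. It assumes D is continuous with D'(s) < 1, so phi is strictly increasing, and 0 <= D(s) <= d_max, so the root lies in [t, t + d_max]. The function names and the example delay are illustrative.

```python
import math
from typing import Callable


def prediction_horizon(t: float, delay: Callable[[float], float],
                       d_max: float, iters: int = 200) -> float:
    """Solve phi(s) = s - D(s) = t for s, i.e. s = phi^{-1}(t), by bisection.

    Assumes D is continuous with D'(s) < 1 (phi strictly increasing) and
    0 <= D(s) <= d_max, so phi(t) <= t <= phi(t + d_max) brackets the root.
    """
    lo, hi = t, t + d_max
    for _ in range(iters):  # fixed iteration count avoids float-spacing stalls
        mid = 0.5 * (lo + hi)
        if mid - delay(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def D(s: float) -> float:
    """Hypothetical time-varying delay with |D'(s)| <= 0.25 < 1."""
    return 0.5 + 0.25 * math.sin(s)


s = prediction_horizon(2.0, D, d_max=0.75)
print(s, s - D(s))  # s - D(s) should equal 2.0 to high precision
```

Bisection needs one delay evaluation per halving at every time step, which illustrates why a precomputed or learned inverse mapping is attractive inside a real-time control loop.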

2603.29115 2026-04-01 astro-ph.GA cs.CV

Schrödinger's Seed: Purr-fect Initialization for an Impurr-fect Universe

Mi Chen, Renhao Ye

Comments 3 pages, 1 figure, 21 cats

Abstract

Context. Random seed selection in deep learning is often arbitrary -- conventionally fixed to values such as 42, a number with no known feline endorsement. Aims. We propose that cats, as liminal beings with a historically ambiguous relationship to quantum mechanics, are better suited to this task than random integers. Methods. We construct a cat-driven seed generator inspired by the first Friedmann equation, and test it by mapping 21 domestic cats' physical properties -- mass, coat pattern, eye colour, and name entropy -- via a Monte "Catlo" sampling procedure. Results. Cat-driven seeds achieve a mean accuracy of 92.58%, outperforming the baseline seed of 42 by ~2.5%. Cats from astrophysicist households perform marginally better, suggesting cosmic insight may be contagious. Conclusions. The Universe responds better to cats than to arbitrary integers. Whether cats are aware of this remains unknown.

2603.29114 2026-04-01 cs.SE cs.AI

Towards Explainable Stakeholder-Aware Requirements Prioritisation in Aged-Care Digital Health

Yuqing Xiao, John Grundy, Anuradha Madugalla, Elizabeth Manias

Abstract

Requirements engineering for aged-care digital health must account for human aspects, because requirement priorities are shaped not only by technical functionality but also by stakeholders' health conditions, socioeconomics, and lived experience. Knowing which human aspects matter most, and for whom, is critical for inclusive and evidence-based requirements prioritisation. Yet in practice, while some studies have examined human aspects in RE, they have largely relied on expert judgement or model-driven analysis rather than large-scale user studies with meaningful human-in-the-loop validation to determine which aspects matter most and why. To address this gap, we conducted a mixed-methods study with 103 older adults, 105 developers, and 41 caregivers. We first applied explainable machine learning to identify the human aspects most strongly associated with requirement priorities across 8 aged-care digital health themes, and then conducted 12 semi-structured interviews to validate and interpret the quantitative patterns. The results identify the key human aspects shaping requirement priorities, reveal their directional effects, and expose substantial misalignment across stakeholder groups. Together, these findings show that human-centric requirements analysis should engage stakeholder groups explicitly rather than collapsing their perspectives into a single aggregate view. This paper contributes an identification of the key human aspects driving requirement priorities in aged-care digital health and an explainable, human-centric RE framework that combines ML-derived importance rankings with qualitative validation to surface the stakeholder misalignments that inclusive requirements engineering must address.

2603.29109 2026-04-01 cs.SE cs.AI

SemLoc: Structured Grounding of Free-Form LLM Reasoning for Fault Localization

Zhaorui Yang, Haichao Zhu, Qian Zhang, Rajiv Gupta, Ashish Kundu

Abstract

Fault localization identifies program locations responsible for observed failures. Existing techniques rank suspicious code using syntactic spectra--signals derived from execution structure such as statement coverage, control-flow divergence, or dependency reachability. These signals collapse for semantic bugs, where failing and passing executions follow identical code paths and differ only in whether semantic intent is satisfied. Recent LLM-based approaches introduce semantic reasoning but produce stochastic, unverifiable outputs that cannot be systematically cross-referenced across tests or distinguish root causes from cascading effects. We present SemLoc, a fault localization framework based on structured semantic grounding. SemLoc converts free-form LLM reasoning into a closed intermediate representation that binds each inferred property to a typed program anchor, enabling runtime checking and attribution to program structure. It executes instrumented programs to construct a semantic violation spectrum--a constraint-by-test matrix--from which suspiciousness scores are derived analogously to coverage-based methods. A counterfactual verification step further prunes over-approximate constraints and isolates primary causal violations. We evaluate SemLoc on SemFault-250, a corpus of 250 Python programs with single semantic faults. SemLoc outperforms five coverage-, reduction-, and LLM-based baselines, achieving Top-1 accuracy of 42.8% and Top-3 of 68%, while reducing inspection to 7.6% of executable lines. Counterfactual verification provides an additional 12% accuracy gain and identifies primary causal semantic constraints.
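The abstract says suspiciousness scores are derived from the constraint-by-test matrix analogously to coverage-based methods, but it does not name a formula. As an assumption, the sketch below uses the Ochiai formula common in spectrum-based fault localization, ef / sqrt(total_failed * (ef + ep)), where ef and ep count the failing and passing tests that violate a constraint. The data structures and constraint names are hypothetical.

```python
import math
from typing import Dict, Set


def ochiai_scores(violations: Dict[str, Set[str]], failed: Set[str]) -> Dict[str, float]:
    """Ochiai suspiciousness for each constraint in a violation spectrum.

    violations[c] is the set of test ids in which constraint c was violated;
    failed is the set of failing test ids.
    """
    total_failed = len(failed)
    scores = {}
    for c, tests in violations.items():
        ef = len(tests & failed)   # failing tests that violate c
        ep = len(tests - failed)   # passing tests that violate c
        denom = math.sqrt(total_failed * (ef + ep))
        scores[c] = ef / denom if denom else 0.0
    return scores


# Hypothetical spectrum: t1, t2 fail; t3 passes.
spectrum = {
    "c_root": {"t1", "t2"},      # violated by every failing test only
    "c_cascade": {"t1", "t3"},   # also violated by a passing test
    "c_noise": {"t3"},           # violated only by a passing test
}
scores = ochiai_scores(spectrum, failed={"t1", "t2"})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
# [('c_root', 1.0), ('c_cascade', 0.5), ('c_noise', 0.0)]
```

A constraint violated in every failing test and in no passing test scores 1.0; one that also fires in passing runs is ranked lower, mirroring how coverage-based spectra demote statements executed by passing tests.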

2603.29094 2026-04-01 cs.HC cs.AI

Evaluating a Data-Driven Redesign Process for Intelligent Tutoring Systems

Qianru Lyu, Conrad Borchers, Meng Xia, Karen Xiao, Paulo F. Carvalho, Kenneth R. Koedinger, Vincent Aleven

Comments Accepted as short paper to the 27th International Conference on Artificial Intelligence in Education (AIED 2026)

Abstract

Past research has defined a general process for the data-driven redesign of educational technologies and has shown that in carefully-selected instances, this process can help make systems more effective. In the current work, we test the generality of the approach by applying it to four units of a middle-school mathematics intelligent tutoring system that were selected not based on suitability for redesign, as in previous work, but on topic. We tested whether the redesigned system was more effective than the original in a classroom study with 123 students. Although the learning gains did not differ between the conditions, students who used the Redesigned Tutor had more productive time-on-task, a larger number of skills practiced, and greater total knowledge mastery. The findings highlight the promise of data-driven redesign even when applied to instructional units *not* selected as likely to yield improvement, as evidence of the generality and wide applicability of the method.

2603.29072 2026-04-01 cond-mat.stat-mech cs.LG math.AT

How much of persistent homology is topology? A quantitative decomposition for spin model phase transitions

Matthew Loftus

Comments 7 pages, 4 figures, 2 tables

Abstract

Point-cloud persistent homology (PH) -- computing alpha or Rips complexes on spin-position point clouds -- has been widely applied to detect phase transitions in classical spin models since Donato et al. (2016), with subsequent studies attributing the detection to the topological content of the persistence diagram. We ask a simple question that has not been posed: what fraction of the PH signal is genuinely topological? We introduce f_topo, a quantitative decomposition that separates the density-driven and topological contributions to any PH statistic by comparing real spin configurations against density-matched shuffled null models. Across the 2D Ising model (system sizes L = 16-128, ten temperatures) and Potts models (q = 3, 5), we find that H_0 statistics -- total persistence, persistence entropy, feature count -- are 94-100% density-driven (f_topo < 0.07). The density-matched shuffled null detects T_c at the identical location and with comparable peak height as real configurations, showing that density alone is sufficient for phase transition detection. However, H_1 statistics are partially topological: the topological fraction grows with system size as delta(TP_{H_1}) ~ L^{0.53} and follows a finite-size scaling collapse delta(T, L) = L^{0.53} g(tL^{1/nu}) with collapse quality CV = 0.27. The longest persistence bar is strongly topological (f_topo > 1) and scales with the correlation length. A scale-resolved analysis reveals that the topological excess shifts from large-scale to small-scale features as L increases. We propose that the TDA-for-phase-transitions community adopt shuffled null models as standard practice, and that H_1 rather than H_0 statistics be used when genuine topological information is sought.
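A standard fact clarifies the H_0 statistics discussed above. In a Vietoris-Rips filtration where an edge enters at its Euclidean length, every H_0 class is born at 0, and the finite death times equal the edge lengths of a Euclidean minimum spanning tree. Alpha-complex filtrations typically report radii or squared radii instead, which changes the values but not the pairing. The sketch computes H_0 total persistence this way and adds a density-matched null that places the same number of occupied sites uniformly on an L x L lattice. That null is our assumption about what a shuffled null looks like, not the paper's code; periodic boundaries are ignored.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def h0_death_times(points: Sequence[Point]) -> List[float]:
    """Finite H_0 death times of the Vietoris-Rips filtration (an edge enters at
    its Euclidean length). These equal the edge lengths of a Euclidean minimum
    spanning tree, computed here with Prim's algorithm in O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest edge from each vertex to the current tree
    if n:
        best[0] = 0.0
    deaths = []
    for step in range(n):
        i = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[i] = True
        if step > 0:
            deaths.append(best[i])  # each MST edge records one finite H_0 death
        for k in range(n):
            if not in_tree[k]:
                best[k] = min(best[k], math.dist(points[i], points[k]))
    return sorted(deaths)


def total_persistence_h0(points: Sequence[Point]) -> float:
    return sum(h0_death_times(points))  # all H_0 classes are born at 0


def shuffled_null(n_points: int, lattice_size: int, rng: random.Random) -> List[Point]:
    """Hypothetical density-matched null: the same number of occupied sites,
    placed uniformly at random on an L x L lattice (free boundaries)."""
    sites = [(float(x), float(y)) for x in range(lattice_size) for y in range(lattice_size)]
    return rng.sample(sites, n_points)


real = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(h0_death_times(real), total_persistence_h0(real))  # [1.0, 2.0] 3.0
null = shuffled_null(len(real), lattice_size=4, rng=random.Random(0))
print(total_persistence_h0(null))
```

Because H_0 total persistence depends only on nearest-neighbour spacing, it is driven largely by point density, which is consistent with the abstract's finding that a density-matched null reproduces the H_0 signal.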

2603.29062 2026-04-01 cs.CR cs.AI

CivicShield: A Cross-Domain Defense-in-Depth Framework for Securing Government-Facing AI Chatbots Against Multi-Turn Adversarial Attacks

KrishnaSaiReddy Patil

Comments 25 pages, 17 tables, 2 figures

Abstract

LLM-based chatbots in government services face critical security gaps. Multi-turn adversarial attacks achieve over 90% success against current defenses, and single-layer guardrails are bypassed with similar rates. We present CivicShield, a cross-domain defense-in-depth framework for government-facing AI chatbots. Drawing on network security, formal verification, biological immune systems, aviation safety, and zero-trust cryptography, CivicShield introduces seven defense layers: (1) zero-trust foundation with capability-based access control, (2) perimeter input validation, (3) semantic firewall with intent classification, (4) conversation state machine with safety invariants, (5) behavioral anomaly detection, (6) multi-model consensus verification, and (7) graduated human-in-the-loop escalation. We present a formal threat model covering 8 multi-turn attack families, map the framework to NIST SP 800-53 controls across 14 families, and evaluate using ablation analysis. Theoretical analysis shows layered defenses reduce attack probability by 1-2 orders of magnitude versus single-layer approaches. Simulation against 1,436 scenarios including HarmBench (416), JailbreakBench (200), and XSTest (450) achieves 72.9% combined detection [69.5-76.0% CI] with 2.9% effective false positive rate after graduated response, while maintaining 100% detection of multi-turn crescendo and slow-drift attacks. The drop in detection on real benchmarks relative to author-generated scenarios (71.2% vs 76.7% on HarmBench, 47.0% vs 70.0% on JailbreakBench) underscores the importance of independent evaluation. CivicShield addresses an open gap at the intersection of AI safety, government compliance, and practical deployment.
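The layered-defense argument can be made concrete with a short calculation. If, as a simplifying assumption, each layer is bypassed independently with probability p_i, an attack succeeds only by bypassing all of them, so its success probability is the product of the p_i. The sketch below uses illustrative bypass rates, not figures from the paper; correlated failures, which adaptive multi-turn attacks tend to induce, make the real reduction smaller.

```python
import math
from typing import Sequence


def bypass_probability(layer_bypass: Sequence[float]) -> float:
    """Probability that an attack evades every layer, assuming each layer i is
    bypassed independently with probability layer_bypass[i]."""
    return math.prod(layer_bypass)


single = bypass_probability([0.3])
layered = bypass_probability([0.3] * 4)
print(layered, math.log10(single / layered))  # 4 layers vs 1: about 1.6 orders of magnitude
```

Four layers that are each bypassed 30% of the time give 0.3^4 = 0.0081, roughly 37 times lower than a single such layer, which falls inside the 1-2 orders of magnitude range quoted above only because independence was assumed.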

2603.29051 2026-04-01 physics.flu-dyn cs.LG

Data-informed lifting line theory

Arjun Sharma, Jonas A. Actor, Peter A. Bosler

Comments 22 pages, 6 figures

Abstract

We present a data-driven framework that extends the predictive capability of classical lifting-line theory (LLT) to a wider aerodynamic regime by incorporating higher-fidelity aerodynamic data from panel method simulations. A neural network architecture with a convolutional layer followed by fully connected layers is developed, comprising two parallel subnetworks to separately process spanwise collocation points and global geometric/aerodynamic inputs such as angle of attack, chord, twist, airfoil distribution, and sweep. Among several configurations tested, this architecture is most effective in learning corrections to LLT outputs. The trained model captures higher-order three-dimensional effects in spanwise lift and drag distributions in regimes where LLT is inaccurate, such as low aspect ratios and high sweep, and generalizes well to wing configurations outside both the LLT regime and the training data range. The method retains LLT's computational efficiency, enabling integration into aerodynamic optimization loops and early-stage aircraft design studies. This approach offers a practical path for embedding high-fidelity corrections into low-order methods and may be extended to other aerodynamic prediction tasks, such as propeller performance.
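For reference, the classical lifting-line result that such corrections build on is closed-form for an untwisted, unswept wing with elliptic spanwise loading: C_L = a0 * alpha / (1 + a0 / (pi * AR)) and C_Di = C_L^2 / (pi * e * AR), with a0 = 2*pi from thin-airfoil theory and e = 1. The sketch below evaluates these formulas. It has no sweep term and assumes a high aspect ratio, and it is not the paper's neural correction.

```python
import math
from typing import Tuple


def elliptic_wing_llt(alpha_rad: float, aspect_ratio: float,
                      a0: float = 2 * math.pi, e: float = 1.0) -> Tuple[float, float]:
    """Prandtl lifting-line lift and induced-drag coefficients for an untwisted,
    unswept wing with elliptic loading. alpha_rad is measured from zero lift;
    a0 is the 2-D section lift slope (2*pi from thin-airfoil theory)."""
    cl = a0 * alpha_rad / (1.0 + a0 / (math.pi * aspect_ratio))
    cdi = cl ** 2 / (math.pi * e * aspect_ratio)
    return cl, cdi


for ar in (4, 8, 30):
    cl, cdi = elliptic_wing_llt(math.radians(5.0), ar)
    print(f"AR={ar:>2}: CL={cl:.3f}  CDi={cdi:.4f}")
```

Lift falls and induced drag rises as the aspect ratio shrinks, and in the low-aspect-ratio limit these formulas themselves become unreliable, which is the regime where learned corrections are most useful.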

2603.29050 2026-04-01 eess.SY cs.RO cs.SY math.DS math.OC

Stable Walking for Bipedal Locomotion under Foot-Slip via Virtual Nonholonomic Constraints

Leonardo Colombo, Álvaro Rodríguez Abella, Alexandre Anahory Simoes, Anthony Bloch

Abstract

Foot slip is a major source of instability in bipedal locomotion on low-friction or uncertain terrain. Standard control approaches typically assume no-slip contact and therefore degrade when slip occurs. We propose a control framework that explicitly incorporates slip into the locomotion model through virtual nonholonomic constraints, which regulate the tangential stance-foot velocity while remaining compatible with the virtual holonomic constraints used to generate the walking gait. The resulting closed-loop system is formulated as a hybrid dynamical system with continuous swing dynamics and discrete impact events. A nonlinear feedback law enforces both classes of constraints and yields a slip-compatible hybrid zero dynamics manifold for the reduced-order locomotion dynamics. Stability of periodic walking gaits is characterized through the associated Poincaré map, and numerical results illustrate stabilization under slip conditions.

2603.29038 2026-04-01 cs.CR cs.AI cs.CL

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning

Bilgehan Sel, Xuanli He, Alwin Peng, Ming Jin, Jerry Wei

Abstract

Fine-tuning APIs offered by major AI providers create new attack surfaces where adversaries can bypass safety measures through targeted fine-tuning. We introduce Trojan-Speak, an adversarial fine-tuning method that bypasses Anthropic's Constitutional Classifiers. Our approach uses curriculum learning combined with GRPO-based hybrid reinforcement learning to teach models a communication protocol that evades LLM-based content classification. Crucially, while prior adversarial fine-tuning approaches report more than 25% capability degradation on reasoning benchmarks, Trojan-Speak incurs less than 5% degradation while achieving 99+% classifier evasion for models with 14B+ parameters. We demonstrate that fine-tuned models can provide detailed responses to expert-level CBRN (Chemical, Biological, Radiological, and Nuclear) queries from Anthropic's Constitutional Classifiers bug-bounty program. Our findings reveal that LLM-based content classifiers alone are insufficient for preventing dangerous information disclosure when adversaries have fine-tuning access, and we show that activation-level probes can substantially improve robustness to such attacks.

2603.28999 2026-04-01 math.OC cs.LG stat.ML

Transfer Learning in Bayesian Optimization for Aircraft Design

Ali Tfaily, Youssef Diouane, Nathalie Bartoli, Michael Kokkolaras

Abstract

The use of transfer learning within Bayesian optimization addresses the disadvantages of the so-called cold-start problem by using source data to aid in the optimization of a target problem. We present a method that leverages an ensemble of surrogate models using transfer learning and integrates it in a constrained Bayesian optimization framework. We identify challenges particular to aircraft design optimization related to heterogeneous design variables and constraints. We propose the use of a partial-least-squares dimension reduction algorithm to address design space heterogeneity, and a meta-data surrogate selection method to address constraint heterogeneity. Numerical benchmark problems and an aircraft conceptual design optimization problem are used to demonstrate the proposed methods. Results show significant improvement in convergence in early optimization iterations compared to standard Bayesian optimization, with improved prediction accuracy for both objective and constraint surrogate models.

2603.28998 2026-04-01 cs.CR cs.AI

Design Principles for the Construction of a Benchmark Evaluating Security Operation Capabilities of Multi-agent AI Systems

Yicheng Cai, Mitchell John DeStefano, Guodong Dong, Pulkit Handa, Peng Liu, Tejas Singhal, Peiyu Tseng, Winston Jen White

Comments 29 pages, 1 figure

Abstract

As Large Language Models (LLMs) and multi-agent AI systems are demonstrating increasing potential in cybersecurity operations, organizations, policymakers, model providers, and researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such AI systems to achieve more autonomous SOCs (security operation centers) and reduce manual effort. In particular, the AI and cybersecurity communities have recently developed several benchmarks for evaluating the red team capabilities of multi-agent AI systems. However, because the operations in SOCs are dominated by blue team operations, the capabilities of AI systems and agents to achieve more autonomous SOCs cannot be evaluated without a benchmark focused on blue team operations. To the best of our knowledge, no systematic benchmark for evaluating coordinated multi-task blue team AI has been proposed in the literature; existing blue team benchmarks focus on a particular task. The goal of this work is to develop a set of design principles for the construction of a benchmark, which is denoted as SOC-bench, to evaluate the blue team capabilities of AI. Following these design principles, we have developed a conceptual design of SOC-bench, which consists of a family of five blue team tasks in the context of large-scale ransomware attack incident response.

2603.28972 2026-04-01 cs.CR cs.AI

Privacy Guard & Token Parsimony by Prompt and Context Handling and LLM Routing

Alessio Langiu

Abstract

The large-scale adoption of Large Language Models (LLMs) forces a trade-off between operational cost (OpEx) and data privacy. Current routing frameworks reduce costs but ignore prompt sensitivity, exposing users and institutions to leakage risks towards third-party cloud providers. We formalise the "Inseparability Paradigm": advanced context management intrinsically coincides with privacy management. We propose a local "Privacy Guard" -- a holistic contextual observer powered by an on-premise Small Language Model (SLM) -- that performs abstractive summarisation and Automatic Prompt Optimisation (APO) to decompose prompts into focused sub-tasks, re-routing high-risk queries to Zero-Trust or NDA-covered models. This dual mechanism simultaneously eliminates sensitive inference vectors (Zero Leakage) and reduces cloud token payloads (OpEx Reduction). A LIFO-based context compacting mechanism further bounds working memory, limiting the emergent leakage surface. We validate the framework through a 2x2 benchmark (Lazy vs. Expert users; Personal vs. Institutional secrets) on a 1,000-sample dataset, achieving a 45% blended OpEx reduction, 100% redaction success on personal secrets, and -- via LLM-as-a-Judge evaluation -- an 85% preference rate for APO-compressed responses over raw baselines. Our results demonstrate that Token Parsimony and Zero Leakage are mathematically dual projections of the same contextual compression operator.

2603.28971 2026-04-01 eess.SY cs.LG cs.SY

A Pontryagin Method of Model-based Reinforcement Learning via Hamiltonian Actor-Critic

Chengyang Gu, Yuxin Pan, Hui Xiong, Yize Chen

Comments 18 pages, 4 figures, in submission

Abstract

Model-based reinforcement learning (MBRL) improves sample efficiency by leveraging learned dynamics models for policy optimization. However, the effectiveness of methods such as actor-critic is often limited by compounding model errors, which degrade long-horizon value estimation. Existing approaches, such as Model-Based Value Expansion (MVE), partially mitigate this issue through multi-step rollouts, but remain sensitive to rollout horizon selection and residual model bias. Motivated by the Pontryagin Maximum Principle (PMP), we propose Hamiltonian Actor-Critic (HAC), a model-based approach that eliminates explicit value function learning by directly optimizing a Hamiltonian defined over the learned dynamics and reward for deterministic systems. By avoiding value approximation, HAC reduces sensitivity to model errors while admitting convergence guarantees. Extensive experiments on continuous control benchmarks, in both online and offline RL settings, demonstrate that HAC outperforms model-free and MVE-based baselines in control performance, convergence speed, and robustness to distributional shift, including out-of-distribution (OOD) scenarios. In offline settings with limited data, HAC matches or exceeds state-of-the-art methods, highlighting its strong sample efficiency.
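Pontryagin-style methods replace value-function gradients with a costate propagated backward through the model. A minimal sketch of that bookkeeping follows, assuming a known scalar linear model x_{k+1} = A x_k + B u_k, a quadratic reward r = -(Q x^2 + R u^2), and no terminal reward. The costate obeys lam_k = dH_k/dx_k with lam_N = 0, and the gradient of the total reward with respect to u_k is dH_k/du_k, where H_k = r(x_k, u_k) + lam_{k+1} f(x_k, u_k). This is not HAC itself, which works with learned dynamics and a policy network; the constants are hypothetical.

```python
from typing import List

# Hypothetical scalar model and reward weights (known here, learned in MBRL).
A, B, Q, R = 0.9, 0.5, 1.0, 0.1


def rollout(x0: float, us: List[float]) -> List[float]:
    xs = [x0]
    for u in us:
        xs.append(A * xs[-1] + B * u)  # x_{k+1} = f(x_k, u_k)
    return xs


def total_reward(x0: float, us: List[float]) -> float:
    xs = rollout(x0, us)
    return sum(-(Q * x * x + R * u * u) for x, u in zip(xs, us))  # sum_k r(x_k, u_k)


def reward_gradient(x0: float, us: List[float]) -> List[float]:
    """dJ/du_k = dH_k/du_k, H_k = r(x_k, u_k) + lam_{k+1} * f(x_k, u_k),
    with costate lam_k = dH_k/dx_k swept backward from lam_N = 0."""
    xs = rollout(x0, us)
    lam = 0.0  # lam_N: no terminal reward
    grad = [0.0] * len(us)
    for k in reversed(range(len(us))):
        grad[k] = -2 * R * us[k] + lam * B  # dH_k/du_k, lam holds lam_{k+1}
        lam = -2 * Q * xs[k] + lam * A      # lam_k = dH_k/dx_k
    return grad


us = [0.2, -0.1, 0.4, 0.0]
g = reward_gradient(1.0, us)
eps = 1e-6
fd = [(total_reward(1.0, us[:k] + [us[k] + eps] + us[k + 1:])
       - total_reward(1.0, us[:k] + [us[k] - eps] + us[k + 1:])) / (2 * eps)
      for k in range(len(us))]
print(g)
print(fd)  # matches g up to finite-difference round-off
```

The finite-difference check confirms the adjoint gradient. An actor would ascend this gradient through its policy parameters instead of fitting a critic, which is the sense in which a Hamiltonian formulation avoids explicit value learning.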

2603.28956 2026-04-01 math.FA cs.LG math.MG math.PR math.ST stat.TH

Minimum Norm Interpolation via The Local Theory of Banach Spaces: The Role of $2$-Uniform Convexity

Gil Kur, Pierre Bizeul

Comments A preliminary version of this work, "Minimum Norm Interpolation Meets The Local Theory of Banach Spaces", appeared at the International Conference on Machine Learning (ICML) 2024 (please take this into account when citing)

Abstract

The minimum-norm interpolator (MNI) framework has recently attracted considerable attention as a tool for understanding generalization in overparameterized models, such as neural networks. In this work, we study the MNI under a $2$-uniform convexity assumption on the norm, which is weaker than requiring the norm to be induced by an inner product; in this setting, the MNI typically does not admit a closed-form solution. At a high level, we show that this condition yields an upper bound on the MNI bias in both linear and nonlinear models. We further show that this bound is sharp for overparameterized linear regression when the unit ball of the norm is in isotropic (or John's) position and the covariates are isotropic, symmetric, i.i.d. sub-Gaussian vectors, such as vectors with i.i.d. Bernoulli entries. Finally, under the same assumption on the covariates, we prove sharp generalization bounds for the $\ell_p$-MNI when $p \in \bigl(1 + C/\log d, 2\bigr]$. To the best of our knowledge, this is the first work to establish sharp bounds for non-Gaussian covariates in linear models when the norm is not induced by an inner product. This work is deeply inspired by classical work on $K$-convexity and by more recent work on the geometry of 2-uniformly convex and isotropic convex bodies.
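
For reference, the MNI is $\hat\theta = \arg\min\{\|\theta\| : X\theta = y\}$, and a norm is commonly called 2-uniformly convex with constant $\lambda > 0$ if $\|\tfrac{x+y}{2}\|^2 + \lambda\|\tfrac{x-y}{2}\|^2 \le \tfrac{1}{2}(\|x\|^2 + \|y\|^2)$ for all $x, y$ (normalizations vary, and the paper's may differ). An inner-product norm satisfies this with equality and $\lambda = 1$ (the parallelogram law), while for $1 < p \le 2$ the $\ell_p$ norm satisfies it with $\lambda = p - 1$ (Ball, Carlen and Lieb). The illustrative sketch below checks the $\ell_p$ case numerically on random vectors; the constant $p - 1$ shrinks as $p$ approaches 1.

```python
import random
from typing import Sequence


def lp_norm(v: Sequence[float], p: float) -> float:
    """The l_p norm (sum |v_i|^p)^(1/p)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def convexity_gap(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """RHS minus LHS of the 2-uniform convexity inequality with lambda = p - 1.

    A non-negative value means the inequality holds for this pair (x, y).
    """
    mid = [(a + b) / 2 for a, b in zip(x, y)]
    half_diff = [(a - b) / 2 for a, b in zip(x, y)]
    lhs = lp_norm(mid, p) ** 2 + (p - 1) * lp_norm(half_diff, p) ** 2
    rhs = (lp_norm(x, p) ** 2 + lp_norm(y, p) ** 2) / 2
    return rhs - lhs


def min_gap(p: float, dim: int = 10, trials: int = 2000, seed: int = 0) -> float:
    """Smallest gap over random pairs; should be >= 0 up to rounding."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        y = [rng.uniform(-1, 1) for _ in range(dim)]
        gaps.append(convexity_gap(x, y, p))
    return min(gaps)
```

For $p = 2$ the gap is zero up to floating-point rounding, recovering the parallelogram law.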

2603.28938 2026-04-01 eess.SY cs.LG cs.SY math.OC

Optimistic Online LQR via Intrinsic Rewards

Marcell Bartos, Bruce D. Lee, Lenart Treven, Andreas Krause, Florian Dörfler, Melanie N. Zeilinger

Abstract

Optimism in the face of uncertainty is a popular approach to balance exploration and exploitation in reinforcement learning. Here, we consider the online linear quadratic regulator (LQR) problem, i.e., to learn the LQR corresponding to an unknown linear dynamical system by adapting the control policy online based on closed-loop data collected during operation. In this work, we propose Intrinsic Rewards LQR (IR-LQR), an optimistic online LQR algorithm that applies the idea of intrinsic rewards originating from reinforcement learning and the concept of variance regularization to promote uncertainty-driven exploration. IR-LQR retains the structure of a standard LQR synthesis problem by only modifying the cost function, resulting in an intuitively pleasing, simple, computationally cheap, and efficient algorithm. This is in contrast to existing optimistic online LQR formulations that rely on more complicated iterative search algorithms or solve computationally demanding optimization problems. We show that IR-LQR achieves the optimal worst-case regret rate of $\sqrt{T}$, and compare it to various state-of-the-art online LQR algorithms via numerical experiments on an aircraft pitch-angle control example and an unmanned aerial vehicle example.
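
For a scalar system $x_{t+1} = a x_t + b u_t$ with stage cost $q x_t^2 + r u_t^2$, standard LQR synthesis solves the discrete algebraic Riccati equation $P = q + a^2 P - \frac{(abP)^2}{r + b^2 P}$ and applies $u_t = -K x_t$ with $K = \frac{abP}{r + b^2 P}$. The sketch below does this by fixed-point (value) iteration. Since IR-LQR is described as changing only the cost, a modified cost would enter through `q` and `r`; the modification itself is not specified in the abstract, so none is implemented here, and the function name is hypothetical.

```python
from typing import Tuple


def scalar_lqr(a: float, b: float, q: float, r: float,
               iters: int = 10_000, tol: float = 1e-12) -> Tuple[float, float]:
    """Solve the scalar DARE by Riccati iteration; return (P, K) with u = -K x."""
    p = q  # start from the one-step cost-to-go
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)
    return p, k
```

With an open-loop unstable plant, `scalar_lqr(1.2, 1.0, 1.0, 1.0)` gives a gain with closed-loop pole $a - bK \approx 0.41$, inside the unit circle.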

2603.28838 2026-04-01 cs.CR cs.AI

GMA-SAWGAN-GP: A Novel Data Generative Framework to Enhance IDS Detection Performance

Ziyu Mu, Xiyu Shi, Safak Dogan

Comments 13 pages, 2 figures

Abstract

Intrusion Detection Systems (IDSs) are often calibrated to known attacks and generalize poorly to unknown threats. This paper proposes GMA-SAWGAN-GP, a novel generative augmentation framework built on a Self-Attention-enhanced Wasserstein GAN with Gradient Penalty (WGAN-GP). The generator employs Gumbel-Softmax regularization to model discrete fields, while a Multilayer Perceptron (MLP)-based AutoEncoder acts as a manifold regularizer. A lightweight gating network adaptively balances adversarial and reconstruction losses via entropy regularization, improving stability and mitigating mode collapse. The self-attention mechanism enables the generator to capture both short- and long-range dependencies among features within each record while preserving categorical semantics through Gumbel-Softmax heads. Extensive experiments on NSL-KDD, UNSW-NB15, and CICIDS2017 using five representative IDS models demonstrate that GMA-SAWGAN-GP significantly improves detection performance on known attacks and enhances generalization to unknown attacks. Leave-One-Attack-type-Out (LOAO) evaluations using Area Under the Receiver Operating Characteristic (AUROC) and True Positive Rate at a 5 percent False Positive Rate confirm that IDS models trained on augmented datasets achieve higher robustness under unseen attack scenarios. Ablation studies validate the contribution of each component to performance gains. Compared with baseline models, the proposed framework improves binary classification accuracy by an average of 5.3 percent and multi-classification accuracy by 2.2 percent, while AUROC and True Positive Rate at a 5 percent False Positive Rate for unknown attacks increase by 3.9 percent and 4.8 percent, respectively, across the three datasets. Overall, GMA-SAWGAN-GP provides an effective approach to generative augmentation for mixed-type network traffic, improving IDS accuracy and resilience.
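
Gumbel-Softmax replaces sampling from a categorical field with a differentiable relaxation: add Gumbel(0, 1) noise $g_i = -\log(-\log U_i)$, $U_i \sim \mathrm{Uniform}(0, 1)$, to the logits and apply a softmax with temperature $\tau$. A minimal pure-Python sketch for one categorical column, such as a protocol-type field with three values, follows; the logits are illustrative, not taken from the paper. In an actual generator one would use a framework routine such as PyTorch's `torch.nn.functional.gumbel_softmax` so that gradients flow through the sample.

```python
import math
import random
from typing import List, Optional, Sequence


def gumbel_softmax(logits: Sequence[float], tau: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    if rng is None:
        rng = random.Random()
    # Gumbel(0, 1) noise; guard against U == 0, which would make log(U) undefined.
    gumbels = [-math.log(-math.log(rng.random() or 1e-12)) for _ in logits]
    scores = [(l + g) / tau for l, g in zip(logits, gumbels)]
    m = max(scores)  # subtract the max for numerical stability at small tau
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

Lower temperatures push the output toward a one-hot vector (closer to a discrete category), while higher temperatures give smoother samples with lower-variance gradients.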

2603.28824 2026-04-01 cs.CR cs.AI

SNEAKDOOR: Stealthy Backdoor Attacks against Distribution Matching-based Dataset Condensation

He Yang, Dongyi Lv, Song Ma, Wei Xi, Jizhong Zhao

Comments 29 pages, 5 figures, accepted to NeurIPS 2025

Abstract

Dataset condensation aims to synthesize compact yet informative datasets that retain the training efficacy of full-scale data, offering substantial gains in efficiency. Recent studies reveal that the condensation process can be vulnerable to backdoor attacks, where malicious triggers are injected into the condensed dataset, manipulating model behavior during inference. While prior approaches have made progress in balancing attack success rate and clean test accuracy, they often fall short in preserving stealthiness, especially in concealing the visual artifacts of condensed data or the perturbations introduced during inference. To address this challenge, we introduce Sneakdoor, which enhances stealthiness without compromising attack effectiveness. Sneakdoor exploits the inherent vulnerability of class decision boundaries and incorporates a generative module that constructs input-aware triggers aligned with local feature geometry, thereby minimizing detectability. This joint design enables the attack to remain imperceptible to both human inspection and statistical detection. Extensive experiments across multiple datasets demonstrate that Sneakdoor achieves a compelling balance among attack success rate, clean test accuracy, and stealthiness, substantially improving the invisibility of both the synthetic data and triggered samples while maintaining high attack efficacy. The code is available at https://github.com/XJTU-AI-Lab/SneakDoor.
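
Distribution matching (DM) condensation, the setting this attack targets, learns synthetic samples $s_j$ whose mean embedding matches that of the real samples $x_i$: $\mathcal{L} = \|\frac{1}{N}\sum_i \psi(x_i) - \frac{1}{M}\sum_j \psi(s_j)\|^2$. The minimal sketch below assumes a fixed linear embedding $\psi(x) = Wx$, whereas DM in practice samples randomly initialized networks and works class by class. It illustrates benign condensation only, not Sneakdoor's trigger construction; `W`, the data, and the helper names are hypothetical.

```python
from typing import List, Sequence

Vec = List[float]

# Hypothetical fixed embedding psi(x) = W x, mapping 2-D inputs to 3-D features.
W = [[1.0, 0.5], [0.0, 1.0], [0.5, -0.5]]


def embed(x: Sequence[float]) -> Vec:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mean(vectors: Sequence[Sequence[float]]) -> Vec:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dm_loss(real: Sequence[Vec], syn: Sequence[Vec]) -> float:
    """Squared distance between mean real and mean synthetic embeddings."""
    diff = [r - s for r, s in zip(mean([embed(x) for x in real]), mean([embed(x) for x in syn]))]
    return sum(d * d for d in diff)


def dm_step(real: Sequence[Vec], syn: Sequence[Vec], lr: float = 0.5) -> List[Vec]:
    """One gradient-descent step on every synthetic point."""
    # Linear embedding, so mean(embed(X)) == embed(mean(X)).
    e = [r - s for r, s in zip(embed(mean(real)), embed(mean(syn)))]
    m = len(syn)
    # d loss / d s_j = -(2 / M) * W^T e, identical for every synthetic point s_j.
    grad = [-(2.0 / m) * sum(W[i][k] * e[i] for i in range(len(W))) for k in range(len(syn[0]))]
    return [[s - lr * g for s, g in zip(point, grad)] for point in syn]
```

Because the embedding here is linear, only the synthetic mean is constrained and every synthetic point receives the same update; nonlinear embeddings also constrain higher-order statistics of the synthetic set.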

2603.28823 2026-04-01 cs.PF cs.AI

Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs

Yi Liu

Abstract

Scaling laws relate model quality to compute budget (FLOPs), but practitioners face wall-clock time constraints, not compute budgets. We study optimal model sizing under fixed time budgets from 5 minutes to 24 hours on consumer GPUs (RTX 4090). Across 70+ runs spanning 50M to 1031M parameters, we find: (1) at each time budget a U-shaped curve emerges where too-small models overfit and too-large models undertrain; (2) optimal model size follows $N^* \propto t^{0.60}$, growing faster than Chinchilla's $N^* \propto C^{0.50}$, with $\alpha = 0.60 \pm 0.07$ robustly exceeding the compute-optimal exponent across all sensitivity analyses; (3) a dual U-shape mechanism: short-budget U-curves arise from compute bottlenecks, while long-budget U-curves emerge from data bottlenecks (overfitting), with an intermediate regime where the U-curve temporarily disappears. These findings have immediate implications for researchers training on consumer hardware, where wall-clock time, not FLOPs, is the binding constraint. We release all code, logs, and 70+ experimental configurations.
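
Finding (2) is a power law, so its exponent can be estimated by ordinary least squares in log-log space. A minimal sketch, using synthetic noise-free points generated from an exact power law rather than the paper's measurements; the helper name `fit_power_law` and the constant 2e7 are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(ts: Sequence[float], ns: Sequence[float]) -> Tuple[float, float]:
    """Fit N* = c * t**alpha by least squares on (log t, log N*); return (c, alpha)."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(n) for n in ns]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(my - alpha * mx)
    return c, alpha


# Synthetic points from N* = 2e7 * t**0.6 with t in minutes (not the paper's data).
budgets = [5, 15, 60, 240, 1440]
sizes = [2e7 * t ** 0.6 for t in budgets]
c, alpha = fit_power_law(budgets, sizes)
```

With $\alpha = 0.60$, doubling the time budget multiplies the optimal model size by $2^{0.60} \approx 1.52$; an exponent of 0.50 would give $2^{0.50} \approx 1.41$.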

2603.28817 2026-04-01 cs.CR cs.AI

GUARD-SLM: Token Activation-Based Defense Against Jailbreak Attacks for Small Language Models

Md Jueal Mia, Joaquin Molto, Yanzhao Wu, M. Hadi Amini

Abstract

Small Language Models (SLMs) are emerging as efficient and economically viable alternatives to Large Language Models (LLMs), offering competitive performance with significantly lower computational costs and latency. These advantages make SLMs well suited to efficient deployment on resource-constrained edge devices. However, existing jailbreak defenses show limited robustness against heterogeneous attacks, largely due to an incomplete understanding of the internal representations across different layers of language models that facilitate jailbreak behaviors. In this paper, we conduct a comprehensive empirical study on 9 jailbreak attacks across 7 SLMs and 3 LLMs. Our analysis shows that SLMs remain highly vulnerable to malicious prompts that bypass safety alignment. We analyze hidden-layer activations across different layers and model architectures, revealing that different input types form distinguishable patterns in the internal representation space. Based on this observation, we propose GUARD-SLM, a lightweight token activation-based method that operates in the representation space to filter malicious prompts during inference while preserving benign ones. Our findings highlight robustness limitations across layers of language models and provide a practical direction for secure small language model deployment.
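
The abstract does not spell out GUARD-SLM's scoring rule. A minimal stand-in, assuming per-token hidden activations have already been extracted from one layer (for example with `output_hidden_states=True` in Hugging Face Transformers), is a nearest-centroid filter: mean-pool each prompt's token activations, average the pooled vectors per class, and label a new prompt by its nearest centroid. The two-dimensional vectors in the usage below are toy data, not real activations, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a list of equal-length vectors (e.g. per-token activations)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_centroids(examples: Sequence[Tuple[Sequence[Vector], str]]) -> Dict[str, Vector]:
    """One centroid of mean-pooled activations per label."""
    pooled: Dict[str, List[Vector]] = {}
    for token_acts, label in examples:
        pooled.setdefault(label, []).append(mean_pool(token_acts))
    return {label: mean_pool(vecs) for label, vecs in pooled.items()}


def classify(token_acts: Sequence[Vector], centroids: Dict[str, Vector]) -> str:
    """Label of the centroid nearest (Euclidean) to the prompt's pooled activation."""
    x = mean_pool(token_acts)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))
```

A deployed filter would block or escalate prompts classified as "malicious" and pass the rest to the model unchanged.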

2603.28815 2026-04-01 cs.CR cs.AI

SkillTester: Benchmarking Utility and Security of Agent Skills

Leye Wang, Zixing Wang, Anjie Xu

Comments Technical report, 13 pages, 2 figures, 9 tables. Project page: https://skilltester.ai. Code: https://github.com/skilltester-ai/skilltester

Abstract

This technical report presents SkillTester, a tool for evaluating the utility and security of agent skills. Its evaluation framework combines paired baseline and with-skill execution conditions with a separate security probe suite. Grounded in a comparative utility principle and a user-facing simplicity principle, the framework normalizes raw execution artifacts into a utility score, a security score, and a three-level security status label. More broadly, it can be understood as a comparative quality-assurance harness for agent skills in an agent-first world. The public service is deployed at https://skilltester.ai, and the broader project is maintained at https://github.com/skilltester-ai/skilltester.
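
The abstract does not give SkillTester's formulas. The sketch below is a hypothetical normalization in the same spirit: utility as the pass-rate difference between paired with-skill and baseline runs (the comparative principle), security as the fraction of security probes passed, and a three-level label from two assumed thresholds. All names, thresholds, and label strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class SkillReport:
    utility_score: float   # hypothetical: with-skill minus baseline pass rate, in [-1, 1]
    security_score: float  # hypothetical: fraction of security probes passed, in [0, 1]
    status: str            # hypothetical three-level label


def evaluate(baseline_passed: Sequence[bool], with_skill_passed: Sequence[bool],
             probes_passed: Sequence[bool],
             thresholds: Tuple[float, float] = (0.9, 0.6)) -> SkillReport:
    """Normalize paired task outcomes and probe outcomes into scores and a label."""
    if len(baseline_passed) != len(with_skill_passed) or not baseline_passed:
        raise ValueError("need the same non-empty task set under both conditions")
    if not probes_passed:
        raise ValueError("need at least one security probe")
    n = len(baseline_passed)
    utility = (sum(with_skill_passed) - sum(baseline_passed)) / n
    security = sum(probes_passed) / len(probes_passed)
    safe_at, warn_at = thresholds
    if security >= safe_at:
        status = "safe"
    elif security >= warn_at:
        status = "caution"
    else:
        status = "unsafe"
    return SkillReport(utility, security, status)
```

Pairing the same tasks under both conditions keeps the utility score a within-task comparison, so a skill is credited only for what it adds over the bare agent.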

2603.28813 2026-04-01 cs.MA cs.AI

The impact of multi-agent debate protocols on debate quality: a controlled case study

Ramtin Zargari Marandi

Comments 16 pages, 3 figures

Abstract

In multi-agent debate (MAD) systems, performance gains are often reported; however, because the debate protocol (e.g., number of agents, rounds, and aggregation rule) is typically held fixed while model-related factors vary, it is difficult to disentangle protocol effects from model effects. To isolate these effects, we compare three main protocols, Within-Round (WR; agents see only current-round contributions), Cross-Round (CR; full prior-round context), and a novel Rank-Adaptive Cross-Round protocol (RA-CR; dynamically reorders agents and silences one per round via an external judge model), against a No-Interaction baseline (NI; independent responses without peer visibility). In a controlled macroeconomic case study (20 diverse events, five random seeds, matched prompts and decoding), RA-CR converges faster than CR, WR shows higher peer-referencing, and NI maximizes Argument Diversity, which does not differ across the three main protocols. These results reveal a trade-off between interaction (peer-referencing rate) and convergence (consensus formation), confirming that protocol design matters. When consensus is prioritized, RA-CR outperforms the others.
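
The protocols differ mainly in what each agent may read before it speaks. The sketch below encodes those visibility rules as described in the abstract, under two stated assumptions: "full prior-round context" is read as everything said in all earlier rounds, and RA-CR's judge is assumed to silence the lowest-ranked agent. The transcript layout and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# transcript[r] holds the (agent_id, message) pairs spoken so far in round r, in order.
Transcript = Sequence[Sequence[Tuple[int, str]]]


def visible_context(protocol: str, transcript: Transcript, round_idx: int, agent_id: int) -> List[str]:
    """Peer messages an agent may read before speaking in round round_idx."""
    if protocol == "NI":
        # No-Interaction: independent responses, no peer visibility.
        return []
    if protocol == "WR":
        # Within-Round: only contributions already made in the current round.
        current = transcript[round_idx] if round_idx < len(transcript) else []
        return [msg for aid, msg in current if aid != agent_id]
    if protocol in ("CR", "RA-CR"):
        # Cross-Round (assumed reading): everything said in all earlier rounds.
        return [msg for past in transcript[:round_idx] for _, msg in past]
    raise ValueError(f"unknown protocol: {protocol}")


def ra_cr_speakers(agent_ids: Sequence[int], judge_scores: Dict[int, float]) -> List[int]:
    """Hypothetical RA-CR schedule: best-scored agent first; lowest-ranked agent is silenced."""
    ranked = sorted(agent_ids, key=lambda a: judge_scores[a], reverse=True)
    return ranked[:-1]
```

Keeping visibility in one function makes it easy to hold prompts, decoding, and models fixed while swapping only the protocol, which is the kind of control the study relies on.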

2603.28812 2026-04-01 cond-mat.mtrl-sci cond-mat.mes-hall cs.LG

Data-Driven Estimation of the interfacial Dzyaloshinskii-Moriya Interaction with Machine Learning

Davi Rodrigues, Andrea Meo, Ali Hasan, Edoardo Piccolo, Adriano Di Pietro, Alessandro Magni, Marco Madami, Giovanni Finocchio, Mario Carpentieri, Michaela Kuepferling, Vito Puliafito

Comments 13 pages, 7 figures

Abstract

Machine learning offers powerful tools to support experimental techniques, particularly for extracting latent features from large datasets. In magnetic materials, accurately estimating the strength of the interfacial Dzyaloshinskii-Moriya interaction (DMI) remains challenging, as existing experimental methods often rely on indirect measurements and can yield inconsistent results across techniques. Because this interaction is often extracted experimentally from bubble domain expansion, we investigate whether bubble textures alone contain sufficient and reliable information for data-driven DMI inference. We therefore develop a compact convolutional neural network trained on a comprehensive micromagnetic dataset of magnetic bubble domains designed to emulate magneto-optical Kerr effect (MOKE) imaging, including structural non-uniformity, additive noise, and image pixelation. The proposed network demonstrates strong robustness against sample inhomogeneities, noise, and reduced spatial resolution. Furthermore, it generalizes reliably, accurately predicting DMI values outside the training interval. These results support the use of machine learning as a fast and quantitative tool to characterize magnetic textures with interfacial DMI.
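
The training images are described as micromagnetic bubble textures degraded to resemble magneto-optical Kerr effect (MOKE) imaging through additive noise and pixelation. A toy version of those two degradations is sketched below, assuming a binary out-of-plane magnetization map (a disc with $m_z = -1$ inside a $m_z = +1$ background) rather than real micromagnetic output; the structural non-uniformity mentioned in the abstract is not modeled, and all function names and parameters are hypothetical.

```python
import random
from typing import List, Optional

Image = List[List[float]]


def bubble_image(size: int, radius: float, mz_in: float = -1.0, mz_out: float = 1.0) -> Image:
    """Toy bubble domain: m_z = mz_in inside a centered disc, mz_out outside."""
    c = (size - 1) / 2
    return [[mz_in if (i - c) ** 2 + (j - c) ** 2 <= radius ** 2 else mz_out
             for j in range(size)] for i in range(size)]


def pixelate(image: Image, block: int) -> Image:
    """Block-average to mimic reduced spatial resolution; sides must divide by block."""
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("image sides must be multiples of block")
    out = []
    for i in range(0, h, block):
        row = []
        for j in range(0, w, block):
            vals = [image[i + di][j + dj] for di in range(block) for dj in range(block)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def add_noise(image: Image, sigma: float, rng: Optional[random.Random] = None) -> Image:
    """Additive Gaussian noise with standard deviation sigma on every pixel."""
    rng = rng if rng is not None else random.Random()
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in image]
```

Applying the degradations with randomized block sizes and noise levels during dataset generation is one way to train a network to be robust to both, which matches the robustness claims above.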