arXivDaily: arXiv daily academic digest, updated Monday through Friday
2604.02017 2026-04-03 stat.ML cs.LG

Demographic Parity Tails for Regression

Naht Sinh Le, Christophe Denis, Mohamed Hebiri

Abstract

Demographic parity (DP) is a widely studied fairness criterion in regression, enforcing independence between the predictions and sensitive attributes. However, constraining the entire distribution can degrade predictive accuracy and may be unnecessary for many applications, where fairness concerns are localized to specific regions of the distribution. To overcome this issue, we propose a new framework for regression under DP that focuses on the tails of the target distribution across sensitive groups. Our methodology builds on optimal transport theory. By enforcing fairness constraints only over targeted regions of the distribution, our approach enables more nuanced and context-sensitive interventions. Drawing on recent advances, we develop an interpretable and flexible algorithm that exploits the geometric structure of optimal transport. We provide theoretical guarantees, including risk bounds and fairness properties, and validate the method through experiments in regression settings.

2604.02016 2026-04-03 cs.MA cs.AI

Optimizing Interventions for Agent-Based Infectious Disease Simulations

Anja Wolpers, Johannes Ponge, Adelinde M. Uhrmacher

Abstract

Non-pharmaceutical interventions (NPIs) are commonly used tools for controlling infectious disease transmission when pharmaceutical options are unavailable. Yet, identifying effective interventions that minimize societal disruption remains challenging. Agent-based simulation is a popular tool for analyzing the impact of possible interventions in epidemiology. However, automatically optimizing NPIs using agent-based simulations poses a complex problem because, in agent-based epidemiological models, interventions can target individuals based on multiple attributes, affect hierarchical group structures (e.g., schools, workplaces, and families), and be combined arbitrarily, resulting in a very large or even infinite search space. We aim to support decision-makers with our Agent-based Infectious Disease Intervention Optimization System (ADIOS) that optimizes NPIs for infectious disease simulations using Grammar-Guided Genetic Programming (GGGP). The core of ADIOS is a domain-specific language for expressing NPIs in agent-based simulations that structures the intervention search space through a context-free grammar. To make optimization more efficient, the search space can be further reduced by defining constraints that prevent the generation of semantically invalid intervention patterns. Using this constrained language and an interface that enables coupling with agent-based simulations, ADIOS adopts the GGGP approach for simulation-based optimization. Using the German Epidemic Micro-Simulation System (GEMS) as a case study, we demonstrate the potential of our approach to generate optimal interventions for realistic epidemiological models.

2604.01978 2026-04-03 math.PR cs.LG stat.ML

Homogenized Transformers

Hugo Koubbi, Borjan Geshkovski, Philippe Rigollet

Abstract

We study a random model of deep multi-head self-attention in which the weights are resampled independently across layers and heads, as at initialization of training. Viewing depth as a time variable, the residual stream defines a discrete-time interacting particle system on the unit sphere. We prove that, under suitable joint scalings of the depth, the residual step size, and the number of heads, this dynamics admits a nontrivial homogenized limit. Depending on the scaling, the limit is either deterministic or stochastic with common noise; in the mean-field regime, the latter leads to a stochastic nonlinear Fokker-Planck equation for the conditional law of a representative token. In the Gaussian setting, the limiting drift vanishes, making the homogenized dynamics explicit enough to study representation collapse. This yields quantitative trade-offs between dimension, context length, and temperature, and identifies regimes in which clustering can be mitigated.

2604.01977 2026-04-03 cs.CR cs.AI cs.CL cs.LG cs.SE

RuleForge: Automated Generation and Validation for Web Vulnerability Detection at Scale

Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari, Michael Gentile, Zach Reavis, David Magnotti, Wayne Fullen

Comments 11 pages, 10 figures. To be submitted to CAMLIS 2026

Abstract

Security teams face a challenge: the volume of newly disclosed Common Vulnerabilities and Exposures (CVEs) far exceeds the capacity to manually develop detection mechanisms. In 2025, the National Vulnerability Database published over 48,000 new vulnerabilities, motivating the need for automation. We present RuleForge, an AWS internal system that automatically generates detection rules--JSON-based patterns that identify malicious HTTP requests exploiting specific vulnerabilities--from structured Nuclei templates describing CVE details. Nuclei templates provide standardized, YAML-based vulnerability descriptions that serve as the structured input for our rule generation process. This paper focuses on RuleForge's architecture and operational deployment for CVE-related threat detection, with particular emphasis on our novel LLM-as-a-judge (Large Language Model as judge) confidence validation system and systematic feedback integration mechanism. This validation approach evaluates candidate rules across two dimensions--sensitivity (avoiding false negatives) and specificity (avoiding false positives)--achieving AUROC of 0.75 and reducing false positives by 67% compared to synthetic-test-only validation in production. Our 5x5 generation strategy (five parallel candidates with up to five refinement attempts each) combined with continuous feedback loops enables systematic quality improvement. We also present extensions enabling rule generation from unstructured data sources and demonstrate a proof-of-concept agentic workflow for multi-event-type detection. Our lessons learned highlight critical considerations for applying LLMs to cybersecurity tasks, including overconfidence mitigation and the importance of domain expertise in both prompt design and quality review of generated rules through human-in-the-loop validation.

2604.01944 2026-04-03 cs.NI cs.AI cs.LG

Physics-Informed Transformer for Multi-Band Channel Frequency Response Reconstruction

Anatolij Zubow, Joana Angjo, Sigrid Dimce, Falko Dressler

Comments 6 pages, 6 figures

Abstract

Wideband channel frequency response (CFR) estimation is challenging in multi-band wireless systems, especially when one or more sub-bands are temporarily blocked by co-channel interference. We present a physics-informed complex Transformer that reconstructs the full wideband CFR from such fragmented, partially observed spectrum snapshots. The interference pattern in each sub-band is modeled as an independent two-state discrete-time Markov chain, capturing realistic bursty occupancy behavior. Our model operates on the joint time-frequency grid of $T$ snapshots and $F$ frequency bins and uses a factored self-attention mechanism that separately attends along both axes, reducing the computational complexity to $O(TF^2 + FT^2)$. Complex-valued inputs and outputs are processed through a holomorphic linear layer that preserves phase relationships. Training uses a composite physics-informed loss combining spectral fidelity, power delay profile (PDP) reconstruction, channel impulse response (CIR) sparsity, and temporal smoothness. Mobility effects are incorporated through per-sample velocity randomization, enabling generalization across different mobility regimes. Evaluation against three classical baselines, namely, last-observation-carry-forward, zero-fill, and cubic-spline interpolation, shows that our approach achieves the highest PDP similarity with respect to the ground truth, reaching $\rho \geq 0.82$ compared to $\rho \geq 0.62$ for the best baseline at interference occupancy levels up to 50%. Furthermore, the model degrades smoothly across the full velocity range, consistently outperforming all baselines.
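The interference model mentioned in this abstract (an independent two-state discrete-time Markov chain per sub-band) can be illustrated with a minimal Python sketch. The transition probabilities `p_block` and `p_free`, the number of sub-bands, and all function names are hypothetical choices for illustration; none of them come from the paper.

```python
import random

def simulate_occupancy(num_steps, p_block, p_free, rng):
    """Simulate one sub-band as a two-state Markov chain.

    State 0 = free, state 1 = blocked by interference.
    p_block: probability of moving free -> blocked in one step (assumed value).
    p_free:  probability of moving blocked -> free in one step (assumed value).
    """
    state = 0  # start in the free state
    states = []
    for _ in range(num_steps):
        if state == 0:
            state = 1 if rng.random() < p_block else 0
        else:
            state = 0 if rng.random() < p_free else 1
        states.append(state)
    return states

def stationary_occupancy(p_block, p_free):
    """Long-run fraction of time a sub-band is blocked."""
    return p_block / (p_block + p_free)

rng = random.Random(0)
# Four independent sub-bands; small transition probabilities give long bursts.
masks = [simulate_occupancy(20000, 0.1, 0.1, rng) for _ in range(4)]
empirical = [sum(mask) / len(mask) for mask in masks]
print("target occupancy:", stationary_occupancy(0.1, 0.1))
print("empirical occupancy per sub-band:", empirical)
```

With equal transition probabilities the long-run occupancy is 50%, the upper end of the occupancy range quoted in the abstract. Lowering both probabilities keeps the occupancy fixed but makes the blocked periods longer.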

2604.01943 2026-04-03 stat.ML cs.LG

A Novel Theoretical Analysis for Clustering Heteroscedastic Gaussian Data without Knowledge of the Number of Clusters

Dominique Pastor, Elsa Dupraz, Ismail Hbilou, Guillaume Ansel

Comments 76 pages, submitted to JMLR

Abstract

This paper addresses the problem of clustering measurement vectors that are heteroscedastic in that they can have different covariance matrices. Starting from the assumption that the measurement vectors within a given cluster are Gaussian distributed with possibly different and unknown covariance matrices around the cluster centroid, we introduce a novel cost function to estimate the centroids. The zeros of the gradient of this cost function turn out to be the fixed points of a certain function. As such, the approach generalizes the methodology employed to derive the existing Mean-Shift algorithm. But as a main and novel theoretical result compared to Mean-Shift, this paper shows that the only fixed points of the identified function tend to be the cluster centroids if both the number of measurements per cluster and the distances between centroids are large enough. As a second contribution, this paper introduces the Wald kernel for clustering. This kernel is defined as the p-value of the Wald hypothesis test for testing the mean of a Gaussian. As such, the Wald kernel measures the plausibility that a measurement vector belongs to a given cluster and it scales better with the dimension of the measurement vectors than the usual Gaussian kernel. Finally, the proposed theoretical framework allows us to derive a new clustering algorithm called CENTRE-X that works by estimating the fixed points of the identified function. Like Mean-Shift, CENTRE-X requires no prior knowledge of the number of clusters. It relies on a Wald hypothesis test to significantly reduce the number of fixed points to calculate compared to the Mean-Shift algorithm, thus resulting in a clear reduction in complexity. Simulation results on synthetic and real data sets show that CENTRE-X has comparable or better performance than the standard clustering algorithms K-means and Mean-Shift, even when the covariance matrices are not perfectly known.

2604.01857 2026-04-03 physics.optics cs.CV

Enhanced Polarization Locking in VCSELs

Zifeng Yuan, Dewen Zhang, Lei Shi, Yutong Liu, Aaron Danner

Abstract

While optical injection locking (OIL) of vertical-cavity surface-emitting lasers (VCSELs) has been widely studied in the past, the polarization dynamics of OIL have received far less attention. Recent studies suggest that polarization locking via OIL could enable novel computational applications such as polarization-encoded Ising computers. However, the inherent polarization preference and limited polarization switchability of VCSELs hinder their use for such purposes. To address these challenges, we fabricate VCSELs with tailored oxide aperture designs and combine these with bias current tuning to study the overall impact on polarization locking. Experimental results demonstrate that this approach reduces the required injection power (to as low as 3.6 μW) and expands the locking range. To investigate the impact of the approach, the spin-flip model (SFM) is used to analyze the effects of amplitude anisotropy and bias current on polarization locking, demonstrating strong coherence with experimental results.

2604.01832 2026-04-03 eess.AS cs.SD

GAP-URGENet: A Generative-Predictive Fusion Framework for Universal Speech Enhancement

Xiaobin Rong, Yushi Wang, Zheng Wang, Jing Lu

Comments Awarded 1st place in the URGENT 2026 Challenge (objective phase), accepted by ICASSP 2026

Abstract

We introduce GAP-URGENet, a generative-predictive fusion framework developed for Track 1 of the ICASSP 2026 URGENT Challenge. The system integrates a generative branch, which performs full-stack speech restoration in a self-supervised representation domain and reconstructs the waveform via a neural vocoder, along with a predictive branch that performs spectrogram-domain enhancement, providing complementary cues. Outputs from both branches are fused by a post-processing module, which also performs bandwidth extension to generate the enhanced waveform at 48 kHz, later downsampled to the original sampling rate. This generative-predictive fusion improves robustness and perceptual quality, achieving top performance in the blind-test phase and ranking 1st in the objective evaluation. Audio examples are available at https://xiaobin-rong.github.io/gap-urgenet_demo.

2604.01805 2026-04-03 eess.SY cs.AI cs.SY

Neural Network-Assisted Model Predictive Control for Implicit Balancing

Seyed Soroush Karimi Madahi, Kenneth Bruninx, Bert Claessens, Chris Develder

Abstract

In Europe, balance responsible parties can deliberately take out-of-balance positions to support transmission system operators (TSOs) in maintaining grid stability and earn profit, a practice called implicit balancing. Model predictive control (MPC) is widely adopted as an effective approach for implicit balancing. The balancing market model accuracy in MPC is critical to decision quality. Previous studies modeled this market using either (i) a convex market clearing approximation, ignoring proactive manual actions by TSOs and the market sub-quarter-hour dynamics, or (ii) machine learning methods, which cannot be directly integrated into MPC. To address these shortcomings, we propose a data-driven balancing market model integrated into MPC using an input convex neural network to ensure convexity while capturing uncertainties. To keep the core network computationally efficient, we incorporate attention-based input gating mechanisms to remove irrelevant data. Evaluating on Belgian data shows that the proposed model both improves MPC decisions and reduces computational time.

2604.01789 2026-04-03 stat.ML cs.LG

Learning in Prophet Inequalities with Noisy Observations

Jung-hun Kim, Vianney Perchet

Comments ICLR 2026

Abstract

We study the prophet inequality, a fundamental problem in online decision-making and optimal stopping, in a practical setting where rewards are observed only through noisy realizations and reward distributions are unknown. At each stage, the decision-maker receives a noisy reward whose true value follows a linear model with an unknown latent parameter, and observes a feature vector drawn from a distribution. To address this challenge, we propose algorithms that integrate learning and decision-making via lower-confidence-bound (LCB) thresholding. In the i.i.d. setting, we establish that both an Explore-then-Decide strategy and an $\varepsilon$-Greedy variant achieve the sharp competitive ratio of $1 - 1/e$, under a mild condition on the optimal value. For non-identical distributions, we show that a competitive ratio of $1/2$ can be guaranteed against a relaxed benchmark. Moreover, with limited window access to past rewards, the tight ratio of $1/2$ against the optimal benchmark is achieved.

2604.01733 2026-04-03 cs.IR cs.CL

From BM25 to Corrective RAG: Benchmarking Retrieval Strategies for Text-and-Table Documents

Meftun Akarsu, Recep Kaan Karaman, Christopher Mierbach

Comments 11 pages, 6 figures, 6 tables

Abstract

Retrieval-Augmented Generation (RAG) systems critically depend on retrieval quality, yet no systematic comparison of modern retrieval methods exists for heterogeneous documents containing both text and tabular data. We benchmark ten retrieval strategies spanning sparse, dense, hybrid fusion, cross-encoder reranking, query expansion, index augmentation, and adaptive retrieval on a challenging financial QA benchmark of 23,088 queries over 7,318 documents with mixed text-and-table content. We evaluate retrieval quality via Recall@k, MRR, and nDCG, and end-to-end generation quality via Number Match, with paired bootstrap significance testing. Our results show that (1) a two-stage pipeline combining hybrid retrieval with neural reranking achieves Recall@5 of 0.816 and MRR@3 of 0.605, outperforming all single-stage methods by a large margin; (2) BM25 outperforms state-of-the-art dense retrieval on financial documents, challenging the common assumption that semantic search universally dominates; and (3) query expansion methods (HyDE, multi-query) and adaptive retrieval provide limited benefit for precise numerical queries, while contextual retrieval yields consistent gains. We provide ablation studies on fusion methods and reranker depth, actionable cost-accuracy recommendations, and release our full benchmark code.
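Two of the retrieval metrics named in this abstract, Recall@k and MRR@k, can be sketched in a few lines of Python. The code below uses their common textbook definitions; the paper's exact evaluation protocol may differ, and the document IDs and function names are made up for illustration.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    found = set(ranked_ids[:k]) & set(relevant_ids)
    return len(found) / len(relevant_ids)

def reciprocal_rank_at_k(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant document in the top-k, or 0 if none."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean(values):
    return sum(values) / len(values)

# Toy queries: (ranked retrieval output, set of relevant document IDs).
queries = [
    (["d3", "d1", "d7"], {"d1"}),        # first hit at rank 2
    (["d2", "d9", "d4"], {"d4", "d8"}),  # one of two relevant docs, at rank 3
    (["d5", "d6", "d0"], {"d1"}),        # no relevant doc retrieved
]

mean_recall = mean([recall_at_k(r, rel, 3) for r, rel in queries])
mrr = mean([reciprocal_rank_at_k(r, rel, 3) for r, rel in queries])
print(f"Recall@3 = {mean_recall:.3f}, MRR@3 = {mrr:.3f}")
```

Recall@k rewards retrieving every relevant document somewhere in the top k, whereas MRR@k only looks at how early the first relevant document appears, so a pipeline can score well on one and poorly on the other.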

2604.01637 2026-04-03 cs.CR cs.AI

Seclens: Role-specific Evaluation of LLMs for security vulnerability detection

Subho Halder, Siddharth Saxena, Kashinath Kadaba Shrish, Thiyagarajan M

Abstract

Existing benchmarks for LLM-based vulnerability detection compress model performance into a single metric, which fails to reflect the distinct priorities of different stakeholders. For example, a CISO may emphasize high recall of critical vulnerabilities, an engineering leader may prioritize minimizing false positives, and an AI officer may balance capability against cost. To address this limitation, we introduce SecLens-R, a multi-stakeholder evaluation framework structured around 35 shared dimensions grouped into 7 measurement categories. The framework defines five role-specific weighting profiles: CISO, Chief AI Officer, Security Researcher, Head of Engineering, and AI-as-Actor. Each profile selects 12 to 16 dimensions with weights summing to 80, yielding a composite Decision Score between 0 and 100. We apply SecLens-R to evaluate 12 frontier models on a dataset of 406 tasks derived from 93 open-source projects, covering 10 programming languages and 8 OWASP-aligned vulnerability categories. Evaluations are conducted across two settings: Code-in-Prompt (CIP) and Tool-Use (TU). Results show substantial variation across stakeholder perspectives, with Decision Scores differing by as much as 31 points for the same model. For instance, Qwen3-Coder achieves an A (76.3) under the Head of Engineering profile but a D (45.2) under the CISO profile, while GPT-5.4 shows a similar disparity. These findings demonstrate that vulnerability detection is inherently a multi-objective problem and that stakeholder-aware evaluation provides insights that single aggregated metrics obscure.

2604.01627 2026-04-03 cs.CR cs.AI

RefinementEngine: Automating Intent-to-Device Filtering Policy Deployment under Network Constraints

Davide Colaiacomo, Chiara Bonfanti, Cataldo Basile

Abstract

Translating security intent into deployable network enforcement rules and maintaining their effectiveness despite evolving cyber threats remains a largely manual process in most Security Operations Centers (SOCs). In large and heterogeneous networks, this challenge is complicated by topology-dependent reachability constraints and device-specific security control capabilities, making the process slow, error-prone, and a recurring source of misconfigurations. This paper presents RefinementEngine, an engine that automates the refinement of high-level security intents into low-level, deployment-ready configurations. Given a network topology, devices, and available security controls, along with high-level intents and Cyber Threat Intelligence (CTI) reports, RefinementEngine automatically generates settings that implement the desired intent, counter reported threats, and can be directly deployed on target security controls. The proposed approach is validated through real-world use cases on packet and web filtering policies derived from actual CTI reports, demonstrating correctness, practical applicability, and adaptability to new data.

2604.01607 2026-04-03 cs.DC cs.AI

ModTrans: Translating Real-world Models for Distributed Training Simulator

Yi Lyu

Abstract

Large-scale distributed training has been a research hot spot in machine learning systems for industry and academia in recent years. However, conducting experiments without physical machines and corresponding resources is difficult. One solution is to leverage distributed training simulators, but current ones like ASTRA-sim do not support importing real-world models, which poses challenges for ML researchers seeking to use them. To address this challenge, we developed ModTrans, a translator supporting format translation from any real-world model to the ASTRA-sim simulator's input, removing the barrier between machine learning experts and machine learning system researchers. The experimental results show that ModTrans's cost is negligible.

2604.01606 2026-04-03 stat.ML cs.LG math.OC

Random Coordinate Descent on the Wasserstein Space of Probability Measures

Yewei Xu, Qin Li

Abstract

Optimization over the space of probability measures endowed with the Wasserstein-2 geometry is central to modern machine learning and mean-field modeling. However, traditional methods relying on full Wasserstein gradients often suffer from high computational overhead in high-dimensional or ill-conditioned settings. We propose a randomized coordinate descent framework specifically designed for the Wasserstein manifold, introducing both Random Wasserstein Coordinate Descent (RWCD) and Random Wasserstein Coordinate Proximal-Gradient (RWCP) for composite objectives. By exploiting coordinate-wise structures, our methods adapt to anisotropic objective landscapes where full-gradient approaches typically struggle. We provide a rigorous convergence analysis across various landscape geometries, establishing guarantees under non-convex, Polyak-Łojasiewicz, and geodesically convex conditions. Our theoretical results mirror the classic convergence properties found in Euclidean space, revealing a compelling symmetry between coordinate descent on vectors and on probability measures. The developed techniques are inherently adaptive to the Wasserstein geometry and offer a robust analytical template that can be extended to other optimization solvers within the space of measures. Numerical experiments on ill-conditioned energies demonstrate that our framework offers significant speedups over conventional full-gradient methods.

2604.01554 2026-04-03 cs.CR cs.LG cs.SE

EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild

Yiming Fan, Jun Yeon Won, Ding Zhu, Melih Sirlanci, Mahdi Khalili, Carter Yagemann

Comments 13 pages, 7 figures. This is a technical report for the EXHIB benchmark. Code and data are available at https://github.com/fan1192/bfsd-anon-artifact

Abstract

Binary Function Similarity Detection (BFSD) is a core problem in software security, supporting tasks such as vulnerability analysis, malware classification, and patch provenance. In the past few decades, numerous models and tools have been developed for this application; however, due to the lack of a comprehensive universal benchmark in this field, researchers have struggled to compare different models effectively. Existing datasets are limited in scope, often focusing on a narrow set of transformations or types of binaries, and fail to reflect the full diversity of real-world applications. We introduce EXHIB, a benchmark comprising five realistic datasets collected from the wild, each highlighting a distinct aspect of the BFSD problem space. We evaluate 9 representative models spanning multiple BFSD paradigms on EXHIB and observe performance degradations of up to 30% on firmware and semantic datasets compared to standard settings, revealing substantial generalization gaps. Our results show that robustness to low- and mid-level binary variations does not generalize to high-level semantic differences, underscoring a critical blind spot in current BFSD evaluation practices.

2604.01525 2026-04-03 math.CA cs.LG math.HO math.OC

A Determinantal Approach to a Sharp $\ell^1-\ell^\infty-\ell^2$ Norm Inequality

Jose Antonio Lara Benitez

Abstract

We give a short linear-algebraic proof of the inequality \[ \|x\|_1\,\|x\|_\infty \le \frac{1+\sqrt{p}}{2}\,\|x\|_2^2, \] valid for every \(x\in\mathbb{R}^p\). This inequality relates three fundamental norms on finite-dimensional spaces and has applications in optimization and numerical analysis. Our proof exploits the determinantal structure of a parametrized family of quadratic forms, and we show the constant $(1+\sqrt{p})/2$ is optimal.
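The inequality stated in this abstract is easy to probe numerically. The Python sketch below checks it on random Gaussian vectors and on a candidate extremal vector $(1, a, \dots, a)$ with $a = 1/(1+\sqrt{p})$. That vector is our own calculation (maximizing the ratio over vectors of this shape) and is not quoted from the paper; the function names are likewise illustrative.

```python
import math
import random

def norm_ratio(x):
    """Return ||x||_1 * ||x||_inf / ||x||_2^2 for a nonzero vector x."""
    abs_x = [abs(v) for v in x]
    l1 = sum(abs_x)
    linf = max(abs_x)
    l2_squared = sum(v * v for v in x)
    return l1 * linf / l2_squared

def sharp_constant(p):
    """The constant (1 + sqrt(p)) / 2 from the abstract."""
    return (1.0 + math.sqrt(p)) / 2.0

def candidate_extremal_vector(p):
    """One large entry and p - 1 equal small entries (our assumption)."""
    a = 1.0 / (1.0 + math.sqrt(p))
    return [1.0] + [a] * (p - 1)

p = 9
bound = sharp_constant(p)
rng = random.Random(0)
worst_random = max(
    norm_ratio([rng.gauss(0.0, 1.0) for _ in range(p)]) for _ in range(10000)
)
extremal_ratio = norm_ratio(candidate_extremal_vector(p))
print(f"bound = {bound}, worst random ratio = {worst_random:.4f}")
print(f"ratio at candidate extremal vector = {extremal_ratio}")
```

For $p = 9$ the candidate vector has one entry equal to 1 and eight entries equal to 1/4, giving $\|x\|_1 = 3$, $\|x\|_\infty = 1$, and $\|x\|_2^2 = 3/2$, so the ratio equals $(1+\sqrt{9})/2 = 2$. This is consistent with the claim that the constant cannot be improved.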

2604.01517 2026-04-03 eess.SY cs.RO cs.SY

MorphoGuard: A Morphology-Based Whole-Body Interactive Motion Controller

Chenjin Wang, Zheng Yan, Yanmin Zhou, Runjie Shen, Bin He

Abstract

Whole-body control (WBC) has demonstrated significant advantages in complex interactive movements of high-dimensional robotic systems. However, when a robot is required to handle dynamic multi-contact combinations along a single kinematic chain, such as pushing open a door with its elbow while grasping an object, it faces major obstacles in terms of complex contact representation and joint configuration coupling. To address this, we propose a new control approach that explicitly manages arbitrary contact combinations, aiming to endow robots with whole-body interactive capabilities. We develop a morphology-constrained WBC network (MorphoGuard), which is trained on a self-constructed dual-arm physical and simulation platform. A series of model recommendation experiments are designed to systematically investigate the impact of backbone architecture, fusion strategy, and model scale on network performance. To evaluate the control performance, we adopt a multi-object interaction task as the benchmark, requiring the model to simultaneously manipulate multiple target objects to specified positions. Experimental results show that the proposed method achieves a contact point management error of approximately 1 cm, demonstrating its effectiveness in whole-body interactive control.

2604.01508 2026-04-03 cs.SE cs.AI

ToolMisuseBench: An Offline Deterministic Benchmark for Tool Misuse and Recovery in Agentic Systems

Akshey Sigdel, Rista Baral

Abstract

Tool-using agents often fail for operational reasons even when language understanding is strong. Common causes include invalid arguments, interface drift, weak recovery, and inefficient retry behavior. We introduce ToolMisuseBench, an offline deterministic benchmark for evaluating tool misuse and recovery under explicit step, call, and retry budgets. The benchmark covers CRUD, retrieval, file, and scheduling environments with replayable fault injection. It reports success, invalid-call behavior, policy violations, recovery quality, and budgeted efficiency. We release a public dataset with 6800 tasks and a reproducible evaluation pipeline. Baseline results show fault-specific recovery gains for schema-aware methods, while overall success remains limited under the released authorization and hard-failure settings.

2604.01496 2026-04-03 cs.SE cs.CL

From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents

Nikolai Ludwig, Wasi Uddin Ahmad, Somshubra Majumdar, Boris Ginsburg

Abstract

We introduce SWE-ZERO to SWE-HERO, a two-stage SFT recipe that achieves state-of-the-art results on SWE-bench by distilling open-weight frontier LLMs. Our pipeline replaces resource-heavy dependencies with an evolutionary refinement strategy: (1) SWE-ZERO utilizes large-scale, execution-free trajectories to master code semantics and repository-level reasoning, and (2) SWE-HERO applies targeted, execution-backed refinement to transition these semantic intuitions into rigorous engineering workflows. Our empirical results set a new benchmark for open-source models of comparable size. We release a dataset of 300k SWE-ZERO and 13k SWE-HERO trajectories distilled from Qwen3-Coder-480B, alongside a suite of agents based on the Qwen2.5-Coder series. Notably, SWE-HERO-32B achieves a 62.2% resolution rate on SWE-bench Verified. Furthermore, despite being trained exclusively on Python, our agents demonstrate robust zero-shot transferability on SWE-bench Multilingual, reaching 44.1% and confirming the paradigm's generalizability across diverse languages.

2604.01484 2026-04-03 cond-mat.stat-mech cs.LG

The topological gap at criticality: scaling exponent d + η, universality, and scope

Matthew Loftus

Comments 7 pages, 4 figures, 4 tables

Abstract

The topological gap $Δ= TP_{H_1}^{real} - TP_{H_1}^{shuf}$ -- the excess $H_1$ total persistence of the majority-spin alpha complex over a density-matched null -- encodes critical correlations in spin models. We establish finite-size scaling: $Δ(L,T) = A L^{d+η} G_-(L|t/T_c|)$, with $G_-(x) \sim (1+x/x_0)^{-(1+β/ν)}$. For 2D Ising, $α= 2.249 \pm 0.038$, matching $d+η= 9/4$ to $0.03σ$; the $G_-$ exponent $γ= 1.089 \pm 0.077$ is consistent with $1+β/ν= 9/8$ ($ΔR^2 < 10^{-5}$). For 2D Potts $q=3$ with $L$ up to 1024, $α= 2.272 \pm 0.024$ ($0.2σ$ from $d+η= 2.267$), with two-term corrections to scaling ($R^2 = 0.9999$). The $G_-$ exponent $γ= 1.114$ (68% CI $[1.053, 1.173]$) matches $1+β/ν= 17/15$. Scope boundaries: the law fails for 2D Potts $q=4$ ($α= 2.347 \pm 0.017$, $9.3σ$ from $d+η= 5/2$) where logarithmic corrections prevent convergence, and for raw 3D Ising ($4σ$ from $d+η$), but density normalization $Δ/|M|^{1/2}$ recovers $α= 3.06 \pm 0.04$ ($0.6σ$). The framework fails for first-order, BKT, and percolation. The criterion: $α= d+η$ holds when corrections to scaling are algebraic ($ω> 0$) but fails when logarithmic ($ω\to 0$).

2604.01483 2026-04-03 cs.LO cs.AI cs.CR

Type-Checked Compliance: Deterministic Guardrails for Agentic Financial Systems Using Lean 4 Theorem Proving

Devakh Rashie, Veda Rashi

Comments 8 pages, 1 table. Code and live demo available at https://github.com/arkanemystic/lean-agent-protocol and https://axiom.devrashie.space

Abstract

The rapid evolution of autonomous, agentic artificial intelligence within financial services has introduced an existential architectural crisis: large language models (LLMs) are probabilistic, non-deterministic systems operating in domains that demand absolute, mathematically verifiable compliance guarantees. Existing guardrail solutions -- including NVIDIA NeMo Guardrails and Guardrails AI -- rely on probabilistic classifiers and syntactic validators that are fundamentally inadequate for enforcing complex multi-variable regulatory constraints mandated by the SEC, FINRA, and OCC. This paper presents the Lean-Agent Protocol, a formal-verification-based AI guardrail platform that leverages the Aristotle neural-symbolic model developed by Harmonic AI to auto-formalize institutional policies into Lean 4 code. Every proposed agentic action is treated as a mathematical conjecture: execution is permitted if and only if the Lean 4 kernel proves that the action satisfies pre-compiled regulatory axioms. This architecture provides cryptographic-level compliance certainty at microsecond latency, directly satisfying SEC Rule 15c3-5, OCC Bulletin 2011-12, FINRA Rule 3110, and CFPB explainability mandates. A three-phase implementation roadmap from shadow verification through enterprise-scale deployment is provided.

2604.01472 2026-04-03 math.OC cs.AI cs.LG

The Newton-Muon Optimizer

Zhehang Du, Weijie Su

Abstract

The Muon optimizer has received considerable attention for its strong performance in training large language models, yet the design principle behind its matrix-gradient orthogonalization remains largely elusive. In this paper, we introduce a surrogate model that not only sheds new light on the design of Muon, but more importantly leads to a new optimizer. In the same spirit as the derivation of Newton's method, the surrogate approximates the loss as a quadratic function of the perturbation to a weight matrix $W$ using only three matrices: the gradient $G$, an output-space curvature matrix $H$, and the data matrix $Z$ that stacks the layer inputs. By minimizing this surrogate in one step and adopting a certain isotropic assumption on the weights, we obtain the closed-form update rule (up to momentum and weight decay) $W \leftarrow W - η\cdot \mathrm{msgn}(G(ZZ^\top)^{-1})$, where $η$ is the learning rate and $\mathrm{msgn}(X)=UV^\top$ if $X=USV^\top$ is a compact singular value decomposition. This new optimization method, which we refer to as Newton-Muon, shows that standard Muon can be interpreted as an implicit Newton-type method that neglects the right preconditioning induced by the input second moment. Empirically, on a reproduction of the earliest publicly released Modded-NanoGPT speedrun configuration using Muon for GPT-2 pretraining, Newton-Muon reaches the target validation loss in 6\% fewer iteration steps and reduces wall-clock training time by about 4\%.
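The closed-form update in this abstract, $W \leftarrow W - \eta \cdot \mathrm{msgn}(G(ZZ^\top)^{-1})$, can be illustrated with a dependency-free Python sketch. Several choices here are ours and not the paper's: momentum and weight decay are omitted, $\mathrm{msgn}$ is approximated with a plain cubic Newton-Schulz iteration instead of an SVD, the linear solve is a textbook Gauss-Jordan elimination, and the toy matrices and function names are made up.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def solve(a, b):
    """Solve a @ y = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def msgn(x, num_iters=30):
    """Approximate U V^T for full-rank x = U S V^T (Newton-Schulz, our choice).

    After Frobenius normalization all singular values lie in (0, 1], and the
    map s -> 1.5 s - 0.5 s^3 pushes each of them toward 1.
    """
    fro = sum(v * v for row in x for v in row) ** 0.5
    y = [[v / fro for v in row] for row in x]
    for _ in range(num_iters):
        cubic = matmul(matmul(y, transpose(y)), y)
        y = [[1.5 * u - 0.5 * c for u, c in zip(ru, rc)]
             for ru, rc in zip(y, cubic)]
    return y

def newton_muon_step(w, g, z, lr):
    """One update W <- W - lr * msgn(G (Z Z^T)^{-1}), without momentum."""
    zzt = matmul(z, transpose(z))
    # Z Z^T is symmetric, so G (Z Z^T)^{-1} = ((Z Z^T)^{-1} G^T)^T.
    preconditioned = transpose(solve(zzt, transpose(g)))
    direction = msgn(preconditioned)
    return [[wi - lr * di for wi, di in zip(rw, rd)]
            for rw, rd in zip(w, direction)]

# Toy layer: 2 outputs, 2 inputs, 3 stacked input samples (columns of z).
w = [[1.0, 0.0], [0.0, 1.0]]
g = [[0.5, -0.2], [0.1, 0.3]]
z = [[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]]
w_new = newton_muon_step(w, g, z, lr=0.1)
print(w_new)
```

Replacing `preconditioned` with `g` in `newton_muon_step` gives the orthogonalized-gradient step of standard Muon, which matches the abstract's reading of Muon as the same update with the input second-moment preconditioner dropped.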

2604.01448 2026-04-03 eess.SY cs.RO cs.SY

Neural Robust Control on Lie Groups Using Contraction Methods (Extended Version)

Yi Lok Lo, Longhao Qian, Hugh H. T. Liu

Comments: An extended version of the conference paper submitted for publication in the IEEE Conference on Decision and Control

Details
English abstract

In this paper, we propose a learning framework for synthesizing a robust controller for dynamical systems evolving on a Lie group. A robust control contraction metric (RCCM) and a neural feedback controller are jointly trained to enforce contraction conditions on the Lie group manifold. Sufficient conditions are derived for the existence of such an RCCM and neural controller, ensuring that the geometric constraints imposed by the manifold structure are respected while establishing a disturbance-dependent tube that bounds the output trajectories. As a case study, a feedback controller for a quadrotor is designed using the proposed framework. Its performance is evaluated using numerical simulations and compared with a geometric controller.

2604.01443 2026-04-03 econ.TH cs.AI cs.IT math.IT

All Substitution Is Local

Nidhish Shah, Shaurjya Mandal, Asfandyar Azhar

Details
English abstract

When does consulting one information source raise the value of another, and when does it diminish it? We study this question for Bayesian decision-makers facing finite actions. The interaction decomposes into two opposing forces: a complement force, measuring how one source moves beliefs to where the other becomes more useful, and a substitute force, measuring how much the current decision is resolved. Their balance obeys a localization principle: substitution requires an observation to cross a decision boundary, though crossing alone does not guarantee it. Whenever posteriors remain inside the current decision region, the substitute force vanishes, and sources are guaranteed to complement each other, even when one source cannot, on its own, change the decision. The results hold for arbitrarily correlated sources and are formalized in Lean 4. Substitution is confined to the thin boundaries where decisions change. Everywhere else, information cooperates. Code and proofs: https://github.com/nidhishs/all-substitution-is-local.
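
A toy calculation can make the localization claim concrete. The sketch below is not the paper's decomposition or its Lean formalization. It is the standard value-of-information computation for a binary state with two actions (guess the state, payoff 1 if correct), under the added assumption of conditionally independent symmetric signals. The function names and the numbers are hypothetical.

```python
from itertools import product

def decision_value(prior, accuracies):
    """Expected payoff of guessing a binary state after observing the signals.

    prior:      P(state = 1).
    accuracies: one value q per signal, with P(signal = state) = q.
    """
    total = 0.0
    for signals in product([0, 1], repeat=len(accuracies)):
        mass1 = prior          # joint mass of (signals, state = 1)
        mass0 = 1.0 - prior    # joint mass of (signals, state = 0)
        for s, q in zip(signals, accuracies):
            mass1 *= q if s == 1 else 1.0 - q
            mass0 *= q if s == 0 else 1.0 - q
        total += max(mass0, mass1)   # the best guess wins the larger mass
    return total

def value_of_information(prior, accuracies):
    """Gain in expected payoff over deciding on the prior alone."""
    return decision_value(prior, accuracies) - decision_value(prior, [])

voi_single = value_of_information(0.7, [0.65])        # posterior never crosses 1/2
voi_joint = value_of_information(0.7, [0.65, 0.65])   # two low signals cross 1/2
```

With these numbers, a single signal leaves every posterior above 1/2, so it has no value on its own, while the pair has positive value because two low signals push the posterior across the decision boundary. The two sources are worth more together than the sum of their separate values.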

2604.01441 2026-04-03 eess.SY cs.LG cs.OS cs.SY eess.SP stat.ML

Generative Profiling for Soft Real-Time Systems and its Applications to Resource Allocation

Georgiy A. Bondar, Abigail Eisenklam, Yifan Cai, Robert Gifford, Tushar Sial, Linh Thi Xuan Phan, Abhishek Halder

Details
English abstract

Modern real-time systems require accurate characterization of task timing behavior to ensure predictable performance, particularly on complex hardware architectures. Existing methods, such as worst-case execution time analysis, often fail to capture the fine-grained timing behaviors of a task under varying resource contexts (e.g., an allocation of cache, memory bandwidth, and CPU frequency), which is necessary to achieve efficient resource utilization. In this paper, we introduce a novel generative profiling approach that synthesizes context-dependent, fine-grained timing profiles for real-time tasks, including those for unmeasured resource allocations. Our approach leverages a nonparametric, conditional multi-marginal Schrödinger Bridge (MSB) formulation to generate accurate execution profiles for unseen resource contexts, with maximum likelihood guarantees. We demonstrate the efficiency and effectiveness of our approach through real-world benchmarks, and showcase its practical utility in a representative case study of adaptive multicore resource allocation for real-time systems.

2604.01440 2026-04-03 cs.DB cs.LG

Know Your Streams: On the Conceptualization, Characterization, and Generation of Intentional Event Streams

Andrea Maldonado, Christian Imenkamp, Hendrik Reiter, Thomas Seidl, Wilhelm Hasselbring, Martin Werner, Agnes Koschmider

Details
English abstract

The shift toward IoT-enabled, sensor-driven systems has transformed how operational data is generated, favoring continuous, real-time event streams (ES) over static event logs. This evolution presents new challenges for Streaming Process Mining (SPM), which must cope with out-of-order events, concurrent activities, incomplete cases, and concept drifts. Yet, the evaluation of SPM algorithms remains rooted in outdated practices, relying on static logs or artificially streamified data that fail to reflect the complexities of real-world streams. To address this gap, we first perform a comprehensive review of data stream literature to identify stream characteristics currently not reflected in the SPM community. Next, we use this information to extend the conceptual foundation for ES. Finally, we propose Stream of Intent, a prototype generator to produce ES with specific features. Our evaluation shows that it produces reproducible, intentional ES for targeted benchmarking and adaptive algorithm development in SPM.
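
As a rough illustration of what a seeded generator for streams with a chosen feature could look like, the sketch below produces interleaved cases whose events may arrive out of order. It is not the Stream of Intent tool. The `Event` fields, the function names, and the delay model are all hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    case_id: str
    activity: str
    event_time: float     # when the activity happened
    arrival_time: float   # when the event reaches the consumer

def generate_stream(n_cases, activities, max_delay, seed):
    """Return events ordered by arrival time; the seed makes the stream reproducible."""
    rng = random.Random(seed)
    events = []
    for c in range(n_cases):
        t = rng.uniform(0.0, 10.0)               # case start, so cases interleave
        for activity in activities:
            t += rng.uniform(0.5, 1.5)           # time between activities
            delay = rng.uniform(0.0, max_delay)  # transmission delay
            events.append(Event(f"case-{c}", activity, t, t + delay))
    return sorted(events, key=lambda e: e.arrival_time)

def count_out_of_order(stream):
    """Count events that arrive after a later event of the same case."""
    latest = {}
    count = 0
    for e in stream:
        seen = latest.get(e.case_id, float("-inf"))
        if e.event_time < seen:
            count += 1
        latest[e.case_id] = max(seen, e.event_time)
    return count

ordered = generate_stream(3, ["register", "check", "close"], max_delay=0.0, seed=1)
delayed = generate_stream(3, ["register", "check", "close"], max_delay=5.0, seed=1)
```

Setting `max_delay` to zero yields a stream that is in order within each case, and raising it introduces out-of-order arrivals whose frequency `count_out_of_order` can report. Reusing the same seed regenerates the same stream.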

2604.01437 2026-04-03 cs.SE cs.AI

Reproducible, Explainable, and Effective Evaluations of Agentic AI for Software Engineering

Jingyue Li, André Storhaug

Comments: 7 pages, 5 figures, accepted to the 2nd International Workshop on Responsible Software Engineering (ResponsibleSE 2026), co-located with FSE

Details
English abstract

With the advancement of Agentic AI, researchers are increasingly leveraging autonomous agents to address challenges in software engineering (SE). However, the large language models (LLMs) that underpin these agents often function as black boxes, making it difficult to justify the superiority of Agentic AI approaches over baselines. Furthermore, missing information in the evaluation design description frequently renders the reproduction of results infeasible. To synthesize current evaluation practices for Agentic AI in SE, this study analyzes 18 papers on the topic, published or accepted by ICSE 2026, ICSE 2025, FSE 2025, ASE 2025, and ISSTA 2025. The analysis identifies prevailing approaches and their limitations in evaluating Agentic AI for SE, both in current research and potential future studies. To address these shortcomings, this position paper proposes a set of guidelines and recommendations designed to empower reproducible, explainable, and effective evaluations of Agentic AI in software engineering. In particular, we recommend that Agentic AI researchers make their Thought-Action-Result (TAR) trajectories and LLM interaction data, or summarized versions of these artifacts, publicly accessible. Doing so will enable subsequent studies to more effectively analyze the strengths and weaknesses of different Agentic AI approaches. To demonstrate the feasibility of such comparisons, we present a proof-of-concept case study that illustrates how TAR trajectories can support systematic analysis across approaches.

2604.01433 2026-04-03 cs.ET cs.AI eess.SP

Semantically Annotated Multimodal Dataset for RF Interpretation and Prediction

Steve Blandino, Jelena Senic, Raied Caromi, Samuel Berweger, Anuraag Bodi, Camillo Gentile, Nada Golmie

Journal ref: NeurIPS 2025 AI for Science Workshop

Details
English abstract

Current limitations in wireless modeling and radio frequency (RF)-based AI are primarily driven by a lack of high-quality, measurement-based datasets that connect RF signals to their physical environments. RF heatmaps, the typical form of such data, are high-dimensional and complex but lack the geometric and semantic context needed for interpretation, constraining the development of supervised machine learning models. To address this bottleneck, we propose a new class of multimodal datasets that combines RF measurements with auxiliary modalities like high-resolution cameras and lidar to bridge the gap between RF signals and their physical causes. The proposed data collection will span diverse indoor and outdoor environments, featuring both static and dynamic scenarios, including human activities ranging from walking to subtle gestures. By achieving precise spatial and temporal co-registration and creating digital replicas for voxel-level annotation, this dataset will enable transformative AI research. Key tasks include the forward problem of predicting RF heatmaps from visual data to revolutionize wireless system design, and the inverse problem of inferring scene semantics from RF signals, creating a new form of RF-based perception.

2604.01417 2026-04-03 cs.IR cs.CL

ReFormeR: Learning and Applying Explicit Query Reformulation Patterns

Amin Bigdeli, Mert Incesu, Negar Arabzadeh, Charles L. A. Clarke, Ebrahim Bagheri

Details
English abstract

We present ReFormeR, a pattern-guided approach for query reformulation. Instead of prompting a language model to generate reformulations of a query directly, ReFormeR first elicits short reformulation patterns from pairs of initial queries and empirically stronger reformulations, consolidates them into a compact library of transferable reformulation patterns, and then selects an appropriate reformulation pattern for a new query given its retrieval context. The selected pattern constrains query reformulation to controlled operations such as sense disambiguation, vocabulary grounding, or discriminative facet addition, to name a few. As such, our proposed approach makes the reformulation policy explicit through these reformulation patterns, guiding the LLM towards targeted and effective query reformulations. Our extensive experiments on TREC DL 2019, DL 2020, and DL Hard show consistent improvements over classical feedback methods and recent LLM-based query reformulation and expansion approaches.