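arXivDaily: a daily digest of academic papers, synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.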

2603.10346 2026-03-12 stat.ML cs.LG

On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits

Leo Maynard-Zhang, Zhihan Xiong, Kevin Jamieson, Maryam Fazel

Details

Abstract

We study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits. Concretely, given a fixed time budget $T\in \mathbb{N}$, finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace \theta_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify the arm with the largest cumulative reward $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T \theta_t$ with high probability. In this setting, it is well-known that uniformly sampling arms from the G-optimal design yields a minimax-optimal error probability of $\exp\left(-\Theta\left(T / H_{G}\right)\right)$, where $H_{G}$ scales proportionally with the dimension $d$. However, this notion of complexity is overly pessimistic, as it is derived from a lower bound in which the arm set consists only of the standard basis vectors, thus masking any potential advantages arising from arm sets with richer geometric structure. To address this, we establish an arm-set-dependent lower bound that, in contrast, holds for any arm set. Motivated by the ideas underlying our lower bound, we propose the Adjacent-optimal design, a specialization of the well-known $\mathcal{X}\mathcal{Y}$-optimal design, and develop the $\textsf{Adjacent-BAI}$ algorithm. We prove that the error probability of $\textsf{Adjacent-BAI}$ matches our lower bound up to constants, verifying the tightness of our lower bound, and establishing the arm-set-dependent complexity of this setting.
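
For orientation, the two designs named in the abstract are usually defined as follows. These are the textbook definitions, not formulas quoted from the paper, and they are assumed here to match the paper's usage. Let $\lambda$ be a sampling distribution over $\mathcal{X}$ and $A(\lambda) = \sum_{x \in \mathcal{X}} \lambda_x\, x x^\top$ (the symbols $\lambda$, $A(\lambda)$, and $\mathcal{Y}$ are notation introduced for this note):

$$
\text{G-optimal:}\quad \min_{\lambda} \max_{x \in \mathcal{X}} \|x\|^2_{A(\lambda)^{-1}},
\qquad
\mathcal{X}\mathcal{Y}\text{-optimal:}\quad \min_{\lambda} \max_{y \in \mathcal{Y}} \|y\|^2_{A(\lambda)^{-1}},
\quad \mathcal{Y} = \lbrace x - x' : x, x' \in \mathcal{X},\ x \neq x' \rbrace .
$$

By the Kiefer-Wolfowitz theorem, the G-optimal value equals $d$ whenever $\mathcal{X}$ spans $\mathbb{R}^d$, which is consistent with the statement that $H_G$ scales with the dimension. The abstract does not say which subset of differences the Adjacent-optimal design restricts to, so that set is left unspecified here.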

2603.10332 2026-03-12 cs.IR cs.AI

Does Reasoning Make Search More Fair? Comparing Fairness in Reasoning and Non-Reasoning Rerankers

Saron Samuel, Benjamin Van Durme, Eugene Yang

Comments 17 pages

2603.10324 2026-03-12 cs.HC cs.AI cs.LG cs.SD

NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction

Jun Rekimoto, Yu Nishimura, Bojian Yang

Comments ACM CHI 2026 paper

2603.10314 2026-03-12 cs.CR cs.MM cs.SD

PRoADS: Provably Secure and Robust Audio Diffusion Steganography with latent optimization and backward Euler Inversion

YongPeng Yan, Yanan Li, Qiyang Xiao, Yanzhen Ren

Comments This paper has been accepted for presentation at the 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)

2603.10294 2026-03-12 eess.SY cs.AI cs.SY

Simulation-in-the-Reasoning (SiR): A Conceptual Framework for Empirically Grounded AI in Autonomous Transportation

Wuping Xin

2603.10287 2026-03-12 stat.ML cs.LG

MultiwayPAM: Multiway Partitioning Around Medoids for LLM-as-a-Judge Score Analysis

Chihiro Watanabe, Jingyu Sun

2603.10285 2026-03-12 cs.HC cs.AI cs.CY cs.DL cs.ET

Conversational AI-Enhanced Exploration System to Query Large-Scale Digitised Collections of Natural History Museums

Yiyuan Wang, Andrew Johnston, Zoë Sadokierski, Rhiannon Stephens, Shane T. Ahyong

Comments 25 pages, 9 figures

2603.10246 2026-03-12 cs.NE cs.AI cs.LG cs.NA math.NA

Intrinsic Numerical Robustness and Fault Tolerance in a Neuromorphic Algorithm for Scientific Computing

Bradley H. Theilman, James B. Aimone

2603.10239 2026-03-12 quant-ph cs.AI cs.IT eess.SP math.IT

Learning from Radio using Variational Quantum RF Sensing

Ivana Nikoloska

Comments submitted for publication

2603.10230 2026-03-12 math.OC cs.LG cs.NA math.NA stat.ML

A Trust-Region Interior-Point Stochastic Sequential Quadratic Programming Method

Yuchen Fang, Jihun Kim, Sen Na, James Demmel, Javad Lavaei

2603.10219 2026-03-12 stat.ML cs.AI cs.LG math.ST stat.TH

A Diffusion Analysis of Policy Gradient for Stochastic Bandits

Tor Lattimore

Comments 17 pages

2603.10217 2026-03-12 cs.CR cs.AI

Multilingual AI-Driven Password Strength Estimation with Similarity-Based Detection

Nikitha M. Palaniappan, Ying He

Comments 6 pages, 4 figures

2603.10215 2026-03-12 q-bio.PE cs.LG stat.ML

SDSR: A Spectral Divide-and-Conquer Approach for Species Tree Reconstruction

Ortal Reshef, Ofer Glassman, Or Zuk, Yariv Aizenbud, Boaz Nadler, Ariel Jaffe

Comments 35 pages, 13 figures. Code available at https://github.com/reshefo/sdsr

2603.10205 2026-03-12 cond-mat.mtrl-sci cs.LG

Flexible Cutoff Learning: Optimizing Machine Learning Potentials After Training

Rick Oerder, Jan Hamaekers

2603.10194 2026-03-12 cs.CR cs.AI

MCP-in-SoS: Risk assessment framework for open-source MCP servers

Pratyay Kumar, Miguel Antonio Guirao Aguilera, Srikathyayani Srikanteswara, Satyajayant Misra, Abu Saleh Md Tayeen

2603.10188 2026-03-12 eess.IV cs.CV cs.LG

ARCHE: Autoregressive Residual Compression with Hyperprior and Excitation

Sofia Iliopoulou, Dimitris Ampeliotis, Athanassios Skodras

Comments 16 pages, 12 figures

2603.10175 2026-03-12 eess.AS cs.CL

Calibration-Reasoning Framework for Descriptive Speech Quality Assessment

Elizaveta Kostenok, Mathieu Salzmann, Milos Cernak

Comments Submitted to Interspeech 2026

2603.10163 2026-03-12 cs.CR cs.AI

Compatibility at a Cost: Systematic Discovery and Exploitation of MCP Clause-Compliance Vulnerabilities

Nanzi Yang, Weiheng Bai, Kangjie Lu

2603.10148 2026-03-12 cs.SI cs.AI

Social Knowledge for Cross-Domain User Preference Modeling

Nir Lotan, Adir Solomon, Ido Guy, Einat Minkov

2603.10098 2026-03-12 cs.GT cs.AI cs.LG

Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models

Daniel Hennes, Zun Li, John Schultz, Marc Lanctot

Comments Accepted as an Extended Abstract at the Twenty-Fifth International Conference on Autonomous Agents and Multiagent Systems (AAMAS)

2603.10092 2026-03-12 cs.CR cs.AI

Execution Is the New Attack Surface: Survivability-Aware Agentic Crypto Trading with OpenClaw-Style Local Executors

Ailiya Borjigin, Igor Stadnyk, Ben Bilski, Serhii Hovorov, Sofiia Pidturkina

Comments 26 pages, 3 figures

Details

Abstract

OpenClaw-style agent stacks turn language into privileged execution: LLM intents flow through tool interception, policy gates, and a local executor. In parallel, skill marketplaces such as skills.sh make capability acquisition as easy as installing skills and CLIs, creating a growing capability supply chain. Together, these trends shift the dominant safety failure mode from "wrong answers" to execution-induced loss, where untrusted prompts, compromised skills, or narrative manipulation can trigger real trades and irreversible side effects. We propose Survivability-Aware Execution (SAE), an execution-layer survivability standard for OpenClaw-style systems and skill-enabled agents. SAE sits as middleware between a strategy engine (LLM or non-LLM) and the exchange executor. It defines an explicit execution contract (ExecutionRequest, ExecutionContext, ExecutionDecision) and enforces non-bypassable last-mile invariants: projection-based exposure budgets, cooldown and order-rate limits, slippage bounds, staged execution, and tool/venue allowlists. To make delegated execution testable under supply-chain risk, we operationalize the Delegation Gap (DG) via a logged Intended Policy Spec that enables deterministic out-of-scope labeling and reproducible DG metrics. On an offline replay using official Binance USD-M BTCUSDT/ETHUSDT perpetual data (15m; 2025-09-01--2025-12-01, incl. funding), SAE improves survivability: MDD drops from 0.4643 to 0.0319 (Full; 93.1%), |CVaR_0.99| shrinks from 4.025e-3 to ~1.02e-4 (~97.5%), and DG loss proxy falls from 0.647 to 0.019 (~97.0%). AttackSuccess decreases from 1.00 to 0.728 with zero FalseBlock in this run. Block bootstrap, paired Wilcoxon, and two-proportion tests confirm the shifts. SAE reframes agentic trading safety for the OpenClaw+skills era: treat upstream intent and skills as untrusted, and enforce survivability where actions become side effects.
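
The execution contract described in the abstract can be pictured with a small gate function. The sketch below is only an illustration: the three type names come from the abstract, but every field, the `decide` function, and the interpretation of "projection-based exposure budgets" as clipping the resulting position into an allowed interval are assumptions, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionRequest:
    # All fields are hypothetical; the abstract names only the type.
    venue: str
    symbol: str
    signed_qty: float         # positive = buy, negative = sell
    expected_slippage: float  # fraction of price
    timestamp: float          # seconds


@dataclass(frozen=True)
class ExecutionContext:
    position: float           # current signed position
    max_abs_position: float   # exposure budget
    last_order_time: float
    cooldown_seconds: float
    max_slippage: float
    allowed_venues: frozenset


@dataclass(frozen=True)
class ExecutionDecision:
    allowed: bool
    signed_qty: float
    reason: str


def decide(req: ExecutionRequest, ctx: ExecutionContext) -> ExecutionDecision:
    """Apply last-mile checks regardless of what the upstream strategy asked for."""
    if req.venue not in ctx.allowed_venues:
        return ExecutionDecision(False, 0.0, "venue not allowlisted")
    if req.timestamp - ctx.last_order_time < ctx.cooldown_seconds:
        return ExecutionDecision(False, 0.0, "cooldown active")
    if req.expected_slippage > ctx.max_slippage:
        return ExecutionDecision(False, 0.0, "slippage bound exceeded")
    # Assumed reading of "projection": clip the resulting position into the budget.
    target = ctx.position + req.signed_qty
    clipped = max(-ctx.max_abs_position, min(ctx.max_abs_position, target))
    qty = clipped - ctx.position
    if qty == 0.0:
        return ExecutionDecision(False, 0.0, "exposure budget exhausted")
    reason = "ok" if clipped == target else "projected onto exposure budget"
    return ExecutionDecision(True, qty, reason)
```

The point of the design is that the gate sees only the request and the context, so a compromised skill or manipulated prompt upstream cannot widen the limits it enforces.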

2603.10091 2026-03-12 cs.CR cs.AI

Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference

Fan Yang

2603.10083 2026-03-12 quant-ph cs.LG

Mitigating Frequency Learning Bias in Quantum Models via Multi-Stage Residual Learning

Ammar Daskin

Comments 11 pages, 9 figures. The code and synthetic data generation scripts used in this study are publicly available on GitHub at https://github.com/adaskin/quantum-residual-learning

2603.10075 2026-03-12 cs.CR cs.AI

TASER: Task-Aware Spectral Energy Refine for Backdoor Suppression in UAV Swarms Decentralized Federated Learning

Sizhe Huang, Shujie Yang

2603.10072 2026-03-12 cs.CR cs.AI

Why LLMs Fail: A Failure Analysis and Partial Success Measurement for Automated Security Patch Generation

Amir Al-Maamari

2603.10068 2026-03-12 cs.CR cs.AI cs.CL

ADVERSA: Measuring Multi-Turn Guardrail Degradation and Judge Reliability in Large Language Models

Harry Owiredu-Ashley

Comments 12 pages, 12 figures. Independent research. Code and artifacts: https://github.com/Harry-Ashley/adversa-guardrail-degradation

Details

DOI: 10.5281/zenodo.18917553

Abstract

Most adversarial evaluations of large language model (LLM) safety assess single prompts and report binary pass/fail outcomes, which fails to capture how safety properties evolve under sustained adversarial interaction. We present ADVERSA, an automated red-teaming framework that measures guardrail degradation dynamics as continuous per-round compliance trajectories rather than discrete jailbreak events. ADVERSA uses a fine-tuned 70B attacker model (ADVERSA-Red, Llama-3.1-70B-Instruct with QLoRA) that eliminates the attacker-side safety refusals that render off-the-shelf models unreliable as attackers, scoring victim responses on a structured 5-point rubric that treats partial compliance as a distinct measurable state. We report a controlled experiment across three frontier victim models (Claude Opus 4.6, Gemini 3.1 Pro, GPT-5.2) using a triple-judge consensus architecture in which judge reliability is measured as a first-class research outcome rather than assumed. Across 15 conversations of up to 10 adversarial rounds, we observe a 26.7% jailbreak rate with an average jailbreak round of 1.25, suggesting that in this evaluation setting, successful jailbreaks were concentrated in early rounds rather than accumulating through sustained pressure. We document inter-judge agreement rates, self-judge scoring tendencies, attacker drift as a failure mode in fine-tuned attackers deployed out of their training distribution, and attacker refusals as a previously-underreported confound in victim resistance measurement. All limitations are stated explicitly. Attack prompts are withheld per responsible disclosure policy; all other experimental artifacts are released.
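
As a quick arithmetic check on the reported figures, assuming the jailbreak rate is computed per conversation and that rounds are counted as integers starting from 1 (neither is stated explicitly in the abstract):

$$
\frac{4}{15} \approx 26.7\%, \qquad 1.25 \times 4 = 5 .
$$

Under those assumptions the numbers are consistent with 4 jailbroken conversations out of 15 whose jailbreak rounds sum to 5, that is, three at round 1 and one at round 2.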

2603.10060 2026-03-12 cs.CR cs.AI cs.CL

Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents

Abhinaba Basu

Details

Abstract

AI agents that execute tasks via tool calls frequently hallucinate results - fabricating tool executions, misstating output counts, or presenting inferences as facts. Recent approaches to verifiable AI inference rely on zero-knowledge proofs, which provide cryptographic guarantees but impose minutes of proving time per query, making them impractical for interactive agents. We propose NabaOS, a lightweight verification framework inspired by Indian epistemology (Nyaya Shastra), which classifies every claim in an LLM response by its epistemic source (pramana): direct tool output (pratyaksha), inference (anumana), external testimony (shabda), absence (abhava), or ungrounded opinion. Our runtime generates HMAC-signed tool execution receipts that the LLM cannot forge, then cross-references claims against these receipts to detect hallucinations in real time. We evaluate on NyayaVerifyBench, a new benchmark of 1,800 agent response scenarios across four languages with injected hallucinations of six types. NabaOS detects 94.2% of fabricated tool references, 87.6% of count misstatements, and 91.3% of false absence claims, with <15ms verification overhead per response. For deep delegation (agents performing multi-step web tasks), our cross-checking protocol catches 78.4% of URL fabrications via independent re-fetching. We compare against five approaches: zkLLM (cryptographic proofs, 180s/query), TOPLOC (locality-sensitive hashing), SPEX (sampling-based proof of execution), tensor commitments, and self-consistency checking. NabaOS achieves the best cost-latency-coverage trade-off for interactive agents: 94.2% coverage at <15ms versus zkLLM's near-perfect coverage at 180,000ms. For interactive agents, practical receipt-based verification provides better cost-benefit than cryptographic proofs, and epistemic classification gives users actionable trust signals rather than binary judgments.
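
The receipt mechanism in the abstract can be sketched with Python's standard `hmac` module. This is a minimal illustration under stated assumptions, not NabaOS code: the function names, the receipt fields, and the canonical JSON encoding are all hypothetical. The key is assumed to live only in the runtime, so the model can quote a receipt but cannot mint one.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Deterministic encoding so signer and verifier hash identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_receipt(key: bytes, tool: str, call_id: str, output: str) -> dict:
    """Runtime side: sign the fact that `tool` ran and produced `output`."""
    payload = {
        "tool": tool,
        "call_id": call_id,
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }
    tag = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {**payload, "hmac": tag}


def verify_receipt(key: bytes, receipt: dict) -> bool:
    """Recompute the tag over everything except the tag itself."""
    payload = {k: v for k, v in receipt.items() if k != "hmac"}
    expected = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, receipt.get("hmac", ""))


def claim_is_grounded(key: bytes, claimed_tool: str, receipts: list) -> bool:
    """A claim of tool use counts only if a valid receipt for that tool exists."""
    return any(
        r.get("tool") == claimed_tool and verify_receipt(key, r) for r in receipts
    )
```

A response that says it ran a tool with no matching valid receipt would then be flagged as a fabricated tool reference; classifying the remaining claims by epistemic source is outside the scope of this sketch.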

2603.10057 2026-03-12 cs.CR cs.AI cs.SE

SBOMs into Agentic AIBOMs: Schema Extensions, Agentic Orchestration, and Reproducibility Evaluation

Petar Radanliev, Carsten Maple, Omar Santos, Kayvan Atefi

Comments Petar Radanliev, Carsten Maple, Omar Santos, and Kayvan Atefi. 2026. SBOMs into Agentic AIBOMs: Schema Extensions, Agentic Orchestration, and Reproducibility Evaluation. Digital Threats Just Accepted (March 2026)

Details

DOI: 10.1145/3798285

Abstract

Software supply-chain security requires provenance mechanisms that support reproducibility and vulnerability assessment under dynamic execution conditions. Conventional Software Bills of Materials (SBOMs) provide static dependency inventories but cannot capture runtime behaviour, environment drift, or exploitability context. This paper introduces agentic Artificial Intelligence Bills of Materials (AIBOMs), extending SBOMs into active provenance artefacts through autonomous, policy-constrained reasoning. We present an agentic AIBOM framework based on a multi-agent architecture comprising (i) a baseline environment reconstruction agent (MCP), (ii) a runtime dependency and drift-monitoring agent (A2A), and (iii) a policy-aware vulnerability and VEX reasoning agent (AGNTCY). These agents generate contextual exploitability assertions by combining runtime execution evidence, dependency usage, and environmental mitigations with ISO/IEC 20153:2025 Common Security Advisory Framework (CSAF) v2.0 semantics. Exploitability is expressed via structured VEX assertions rather than enforcement actions. The framework introduces minimal, standards-aligned schema extensions to CycloneDX and SPDX, capturing execution context, dependency evolution, and agent decision provenance while preserving interoperability. Evaluation across heterogeneous analytical workloads demonstrates improved runtime dependency capture, reproducibility fidelity, and stability of vulnerability interpretation compared with established provenance systems, with low computational overhead. Ablation studies confirm that each agent contributes distinct capabilities unavailable through deterministic automation.

2603.10054 2026-03-12 cs.IT cs.LG math.IT quant-ph

Quantization of Ricci Curvature in Information Geometry

Carlos C. Rodriguez

Comments 15 pages, 3 tables

2603.10043 2026-03-12 cs.MM cs.AI cs.SD

AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition

Yunsheng Wang, Yuntao Shou, Yilong Tan, Wei Ai, Tao Meng, Keqin Li

Comments 18 pages