arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.05523 2026-05-08 stat.ML cs.LG stat.CO

Permutation-preserving Functions and Neural Vecchia Covariance Kernels

Jian Cao, Nian Liu, Ying Lin

Abstract

We introduce a novel framework for constructing scalable and flexible covariance kernels for Gaussian processes (GPs) by directly learning the covariance structure under a regression-type parameterization induced by Vecchia approximations, using deep neural architectures. Specifically, we model kriging coefficients and conditional standard deviations, deterministic quantities that uniquely characterize the covariance, providing stable and informative learning targets. Exploiting the permutation-equivariant structure of conditioning sets in the Vecchia factorization, we derive a universal representation for permutation-preserving functions and design neural architectures that respect this symmetry, leading to improved training stability and data efficiency. The proposed approach enables expressive, non-stationary kernel learning while maintaining computational scalability, thereby bridging classical GP methodology with modern deep learning.
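
As a rough illustration of the regression-type parameterization mentioned above, the sketch below computes kriging coefficients and conditional standard deviations from a known covariance matrix and checks that, with full conditioning sets, they reproduce that covariance. It is not the authors' neural architecture: the exponential kernel, the toy locations, and the helper names `vecchia_factors` and `implied_covariance` are assumptions made for this example.

```python
import numpy as np

def vecchia_factors(cov, cond_sets):
    """Kriging coefficients B (strictly lower-triangular) and conditional std devs d."""
    n = cov.shape[0]
    B = np.zeros((n, n))
    d = np.zeros(n)
    for i, c in enumerate(cond_sets):
        c = list(c)
        if c:
            # Regress x_i on its conditioning set: b = Sigma_cc^{-1} Sigma_ci.
            b = np.linalg.solve(cov[np.ix_(c, c)], cov[c, i])
            B[i, c] = b
            # Conditional variance of x_i given the conditioning set.
            d[i] = np.sqrt(cov[i, i] - cov[i, c] @ b)
        else:
            d[i] = np.sqrt(cov[i, i])
    return B, d

def implied_covariance(B, d):
    """Covariance implied by x = B x + diag(d) z with z standard normal."""
    n = len(d)
    A = np.linalg.inv(np.eye(n) - B)
    return A @ np.diag(d ** 2) @ A.T

# Toy setup (assumed): exponential kernel on six ordered 1-D locations.
x = np.linspace(0.0, 1.0, 6)
cov = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)

# Full conditioning sets (all predecessors) characterize the covariance exactly.
full_sets = [list(range(i)) for i in range(6)]
B_full, d_full = vecchia_factors(cov, full_sets)
cov_full = implied_covariance(B_full, d_full)

# Vecchia-style sparse sets: at most the two nearest predecessors.
sparse_sets = [list(range(max(0, i - 2), i)) for i in range(6)]
B_sparse, d_sparse = vecchia_factors(cov, sparse_sets)
```

In a learned-kernel setting, the rows of `B` and the entries of `d` would be the prediction targets; here they are simply computed from a fixed kernel.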

2605.05522 2026-05-08 eess.IV cs.CV

Tumor-aware augmentation with task-guided attention analysis improves rectal cancer segmentation from magnetic resonance images

Aneesh Rangnekar, Joao Miranda, Natally Horvat, Stephanie Chahwan, Samir Alrayess, Aditya Apte, Aditi Iyer, Eve LoCastro, Revathi Ravella, Marc J Gollub, Iva Petkovska, Jesse Joshua Smith, Paul Romesser, Julio Garcia-Aguilar, Harini Veeraraghavan, Joseph Deasy

Abstract

Pretraining on large-scale datasets has been shown to improve transformer generalizability, even for out-of-domain (OOD) modalities and tasks. However, two common assumptions often fail under OOD transfer: that downstream datasets can be adapted to the fixed input geometry of pretrained models and that pretrained representations transfer effectively across imaging modalities. We show that these assumptions break down through two interacting failure modes in CT-to-MRI transfer: inefficient token usage caused by zero-padding to match pretrained input dimensions and ineffective feature adaptation. These failures led to accuracy degradation despite extensive fine-tuning. We investigated these failure modes using two CT-pretrained hierarchical shifted-window transformer backbones, SMIT and Swin UNETR, pretrained with different objectives and datasets. Mechanistic analysis introduced an attention dilution index (ADI), an entropy-based metric quantifying attention diverted toward uninformative padding tokens, and centered kernel alignment (CKA) to measure feature reuse in MRI tasks. ADI increased with zero-padding, while high feature reuse did not necessarily correspond to improved accuracy. To mitigate these issues, we introduced two interventions: a tumor-aware augmentation strategy to improve tumor appearance heterogeneity coverage and an anisotropic cropping strategy to restore token efficiency. Fine-tuning on identical rectal MRI datasets improved detection rates to 224/247 (90.7%) for SMIT and 219/247 (88.7%) for Swin UNETR, demonstrating improved robustness under CT-to-MRI transfer. This study is among the first to examine when pretrained transformers fail to transfer effectively across imaging modalities and how simple mitigation strategies, motivated by mechanistic analysis of datasets, can reduce transfer limitations while improving robustness and MRI detection.
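
The abstract does not say which CKA variant is used or how ADI is defined, so the sketch below only shows the commonly used linear form of centered kernel alignment on synthetic feature matrices. The function name `linear_cka` and the random features are assumptions for illustration.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between feature matrices of shape (n_samples, n_features)."""
    # Center each feature dimension before comparing representations.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)

rng = np.random.default_rng(0)
feats_pre = rng.normal(size=(200, 16))          # stand-in for pretrained features
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
feats_rot = feats_pre @ Q                       # same representation, rotated
feats_new = rng.normal(size=(200, 16))          # unrelated representation

cka_same = linear_cka(feats_pre, feats_rot)
cka_diff = linear_cka(feats_pre, feats_new)
```

Linear CKA is invariant to orthogonal transformations of the features, so `cka_same` sits at 1 up to rounding, while unrelated features score much lower.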

2605.05514 2026-05-08 cs.IT cs.AI cs.LG cs.NI eess.SP math.IT

When Semantic Communication Meets Queueing: Cross-Layer Latency and Task Fidelity Optimization

Yalin E. Sagduyu, Tugba Erpek

Abstract

Semantic communication (SemCom) with learned encoder-decoder architectures enables end-to-end learning of compact task-oriented representations optimized for the wireless channel, reducing channel resources needed to convey task-relevant information and improving spectrum efficiency. This paper studies semantic image transmission over block Rayleigh fading with AWGN using a multi-task semantic autoencoder that jointly reconstructs images and predicts labels from the received waveform. The latent dimension (complex channel uses per source sample) serves as a cross-layer control variable governing semantic fidelity and channel resource usage. We characterize the resulting latency-task fidelity tradeoff: larger latent representations improve inference accuracy but increase service time, channel uses, and queueing delay. Building on this insight, we develop online semantic-rate controllers that adapt the latent dimension per update under a long-term semantic error constraint. A queue-aware drift-plus-penalty policy minimizes delay subject to an average semantic error cap, while a complementary age-aware policy minimizes time-average Age of Information (AoI). By adapting the semantic rate to congestion and fidelity requirements, the proposed framework improves spectrum utilization and enables timely semantic updates with significantly lower delay and AoI than fixed-rate baselines.

2605.05493 2026-05-08 stat.ME cond-mat.stat-mech cs.LG math.ST stat.TH

A renormalization-group inspired lattice-based framework for piecewise generalized linear models

Joshua C. Chang

Comments Under review

Abstract

We formally introduce a class of models inspired by renormalization group (RG) theory, built on additive hierarchical expansions analogous to those appearing in functional ANOVA and mixed-effects models. Like ReLU convolutional neural networks, they are almost everywhere locally linear; unlike ReLU networks, their partition structure is explicit, interpretable, and easy to modify or constrain. In these models, one defines a multidimensional lattice partition of the input space and uses it to scaffold variations in regression parameters. Each dimension of the lattice corresponds to an attribute by which the statistics of the problem may vary. The parameters are themselves expressed in the form of an expansion, where each term captures variations relative to a lower (coarser) interaction scale. These models admit multiple equivalent interpretations: as piecewise GLMs, as hierarchical mixed-effects regressions, or as regression trees with structured parameter sharing. Since RG motivates the design of these models, we use techniques from statistical physics -- specifically replica analysis -- to study their generalization properties. Specifically, we analyze the behavior of the Watanabe-Akaike Information Criterion (WAIC) as a proxy for generalization loss. This analysis yields two practical results: (i) guidance on the lattice design as a function of dataset size and predictor dimensionality; and (ii) a principled scaling law for the regularization prior when adding higher-order terms to the expansion so that one can increase model complexity without an expected increase in generalization loss. We evaluate the methodology on public datasets and find performance competitive against both blackbox methods and other intrinsically interpretable approaches.

2605.05472 2026-05-08 cs.CY cs.AI

The Pedagogy of AI Mistakes: Fostering Higher-Order Thinking

Hadi Hosseini

Comments Accepted to AIED-2026; includes supplementary material

Abstract

As generative AI becomes increasingly integrated into higher education, its frequent errors and hallucinations, often seen as limitations, offer a unique pedagogical opportunity. By framing AI as a "learning companion" whose imperfect outputs prompt analysis, evaluation, and reflection, we argue that instructors can engage students in the fundamental processes of higher-order thinking. This paper presents a design-oriented study in which an AI-integrated syllabus in a database design course deliberately leverages AI's limitations to foster critical thinking and higher-order cognitive skills aligned with Bloom's taxonomy of learning. Using a mixed-methods approach, we examine how structured interaction with AI-generated errors supports metacognitive engagement, reinforces disciplinary rigor, and relates to students' perceived AI literacy and subject-matter competency.

2605.05459 2026-05-08 cs.CR cs.LG

Privacy Without Losing Place: A Paradigm for Private Retrieval in Spatial RAGs

Kennedy Edemacu, Mohammad Mahdi Shokri, Vinay M. Shashidhar, Jong Wook Kim

Abstract

This work introduces PAS (Privacy Anchor Substitution), a structured mechanism for enabling user location privacy in spatial retrieval-augmented generation (RAG) systems. Unlike conventional differential privacy methods that directly perturb user locations, PAS represents a location with a relative anchor encoding consisting of an anchor, a direction bin, and a distance bin, allowing seamless integration with modern RAG pipelines. We evaluate PAS on a synthetic urban dataset and show that it achieves coarse privacy guarantees, with an adversarial location error of approximately 370-400 m, while retaining more than half of the baseline retrieval performance. Despite this drop in retrieval performance, the downstream generation quality under PAS remains comparatively robust, indicating that large language models can compensate for imperfect spatial retrieval. Furthermore, we provide empirical analysis showing that PAS exhibits a non-monotonic privacy-utility relationship with respect to privacy parameters. We attribute this to geometric bias induced by anchor discretization, which distinguishes it from continuous noise mechanisms such as geo-indistinguishability. Our results show that structured spatial representations offer a practical approach to privacy in location-based reasoning in RAG systems.

2605.05446 2026-05-08 stat.ML cs.IT cs.LG math.IT math.OC

Convexity in Disguise: A Theoretical Framework for Nonconvex Low-Rank Matrix Estimation

Chengyu Cui, Gongjun Xu

Abstract

Nonconvex methods have emerged as a dominant approach for low-rank matrix estimation, a problem that arises widely in machine learning and AI for learning and representing high-dimensional data. Existing analyses for these methods often require additional regularization to mitigate nonconvexity, even though such regularization is often unnecessary in practice. Moreover, most analyses rely on problem-specific arguments that are difficult to generalize to more complex settings. In this paper, we develop a theoretical framework for studying nonconvex procedures across a broad class of low-rank matrix estimation problems. Rather than focusing on a specific model, we reveal a fundamental mechanism that explains why nonconvex procedures can behave well in low-rank estimation. Our key device is a benign regularizer that does not alter the original update rule, but yields an equivalent locally strongly convex formulation of the algorithm. This perspective uncovers a disguised convexity inherent in the nonconvex procedure and provides a new route to theoretical guarantees for nonconvex low-rank matrix estimation.

2605.05436 2026-05-08 stat.ML cs.LG

Estimating Implicit Regularization in Deep Learning

Joseph H. Rudoler, Kevin Tan, Giles Hooker, Konrad P. Kording

Abstract

Deep learning systems are known to exhibit implicit regularization (alt. implicit bias), favoring simple solutions instead of merely minimizing the loss function. In some cases, we can analytically derive the implicit regularization -- connecting it to an equivalent penalty that augments the learning objective. However, modern deep learning systems are complex, carrying modifications to the training procedure and architecture (e.g. early stopping, minibatching, dropout) whose effects are not always directly interpretable. Although estimating the resulting implicit regularization could aid theorists in algorithm design and practitioners in interpreting their hyperparameter choices, this problem has received little direct attention. It is also tractable: regularization makes weight updates deviate from loss gradients, promising a signal for identifying implicit bias. Here we provide gradient matching methods that can be used to empirically estimate the implicit regularization. Our method works on networks with known regularization, recovering popular explicit penalties like $\ell_1$ and $\ell_2$. It also replicates known implicit effects, like the quadratic weight penalty induced by early stopping in gradient descent, demonstrating that it can be used to test theories of implicit regularization. Crucially, because our method is empirical, it can handle implicit regularization in arbitrary networks. We demonstrate this use by characterizing the effects of dropout in deep networks, showing implicit $\ell_2$ effects in this popular method. Our work shows that practitioners can use gradient matching to understand regularization in networks with implicit biases that are too complicated to derive analytically.

2605.05432 2026-05-08 math.ST cs.LG stat.ML stat.TH

Direct Estimation of Schrödinger Bridge Time-Series Drifts: Finite-Sample, Asymptotic, and Adaptive Guarantees

Othmane Mazhar, Huyên Pham

Comments 36 pages, 3 figures, 8 tables

Abstract

We study nonparametric estimation of Schrödinger bridge (SB) drifts from i.i.d. data observed on a single time interval. Starting from the conditional-ratio form of the Schrödinger bridge time-series (SBTS) drift formula, we analyze a direct Nadaraya-Watson plug-in estimator built from kernelized numerator and denominator terms. Unlike recent SB analyses based on entropic-OT potentials, Sinkhorn iterations, or iterative bridge solvers, our approach works directly at the drift level and isolates statistical error from optimization, approximation, and discretization error. Under Hölder regularity, a marginal-density floor, and bounded support, we prove a uniform non-asymptotic bound for admissible bandwidth pairs, a pointwise CLT under genuine undersmoothing, and an adaptive bandwidth selector satisfying an oracle inequality. We also prove a pivot-local minimax lower bound which, through an explicit uniform pivot, yields a global minimax lower bound under transparent compatibility conditions; hence the adaptive selector is minimax-rate optimal up to logarithmic factors. Synthetic experiments provide theorem-targeted diagnostics for finite-sample scaling, Gaussian approximation, and adaptive behavior.
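
To make the "kernelized numerator and denominator" structure concrete, the sketch below implements a generic one-dimensional Nadaraya-Watson regression estimator with a Gaussian kernel. It is not the SBTS drift estimator itself, whose formula the abstract does not give; the single bandwidth, the synthetic sine data, and the function name `nadaraya_watson` are assumptions for illustration.

```python
import numpy as np

def nadaraya_watson(x_query, x_train, y_train, bandwidth):
    """Ratio of a kernel-weighted numerator and denominator (Gaussian kernel)."""
    diffs = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    numerator = weights @ y_train        # kernel-weighted sum of responses
    denominator = weights.sum(axis=1)    # kernel-weighted count (density proxy)
    return numerator / denominator

# Synthetic data (assumed): noisy samples of sin(pi * x) on [-1, 1].
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=2000)
y_train = np.sin(np.pi * x_train) + 0.1 * rng.normal(size=2000)

x_query = np.linspace(-0.5, 0.5, 11)
estimate = nadaraya_watson(x_query, x_train, y_train, bandwidth=0.05)
max_error = np.max(np.abs(estimate - np.sin(np.pi * x_query)))
```

The denominator is where a marginal-density floor matters: if few training points fall near a query, the ratio becomes unstable.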

2605.05400 2026-05-08 cs.SE cs.AI cs.HC

Mise en Place for Agentic Coding: Deliberate Preparation as Context Engineering Methodology

Andrew Zigler

Comments 5 pages. Accepted at VibeX 2026, the 1st International Workshop on Vibe Coding and Vibe Researching, co-located with EASE 2026, Glasgow, June 9-12 2026. Camera-ready version. Research artifact: https://doi.org/10.5281/zenodo.19868258

Abstract

The rapid adoption of AI coding agents has produced a dominant workflow pattern -- often called "vibe coding" -- that prioritizes speed of implementation over deliberate preparation. We argue that this approach creates a systematic alignment problem: agents that lack sufficient context produce code requiring extensive debugging and refactoring, consuming substantial development time. Drawing on the culinary concept of mise en place (everything in its place; abbreviated MEP), we propose a three-phase preparation methodology for agentic coding: (1) contextual grounding, where domain expertise and tacit knowledge are externalized into structured documents; (2) collaborative specification, where human-agent dialogue produces detailed design artifacts; and (3) task decomposition, where specifications are converted into structured, dependency-aware task records. We report on the application of MEP during a competitive hackathon, where roughly two hours of preparation enabled a rapid parallel implementation of a full-stack educational platform by concurrent AI agents. We introduce the concept of context fluency as an emerging developer skill -- the ability to create rich, structured context that agents can act on -- and connect it to established frameworks in backward design and tacit knowledge externalization. We conclude with a research agenda for empirically validating preparation-phase methodologies in AI-assisted software development.

2605.05382 2026-05-08 math.OC cs.LG

Meta-learning for sample-efficient Bayesian optimisation of fed-batch processes

Becky Langdon, Gabriel D. Patrón, Chrysoula D. Kappatou, Robert M. Lee, Behrang Shafei, Jixiang Qing, Ruth Misener, Mark van der Wilk, Calvin Tsay

Comments 24 pages, 12 figures

Abstract

The optimisation of fed-batch (bio)chemical process recipes is subject to inherent, underlying, and unmeasurable fluctuations across batches, whose trajectories are difficult to model and costly to measure. Bayesian Optimisation (BayesOpt) is a powerful tool for sampling and optimisation of expensive-to-measure functions. Gaussian Processes (GPs), the surrogate models used in BayesOpt, are static, forecast poorly, and lack generalisation across experiments, limiting their applicability to time-varying batch processes with stochastic parameters, i.e., process fluctuations. This work investigates System-Aware Neural ODE Processes (SANODEP) as a meta-learning model to overcome the limitations of GPs and increase few-shot optimisation performance in BayesOpt. Using a penicillin batch production case study, we find that SANODEP outperforms GP-based BayesOpt in the low-data regime, resulting in improved objectives when few experimental runs are performed. These improvements are observed in both on- and off-distribution batches, highlighting the generalisation capabilities of SANODEP. Using this approach, batch process operators can accelerate the initial optimisation steps in BayesOpt by deploying meta-learning or optimise the process with fewer experiments when the experimental cost is high.

2605.05348 2026-05-08 cs.HC cs.AI

Making AI Drafts Count: A Quality Threshold in Audio Description Workflows

Lana Do, Shasta Ihorn, Charity M. Pitcher-Cooper, Sanjay Mirani, Gio Jung, Hyunjoo Shim, Zhenzhen Qin, Kien T. Nguyen, Vassilis Athitsos, Ilmi Yoon

Abstract

Audio description (AD) narrates visual elements in video for blind and low-vision audiences. Recent work has shown that giving novice describers an AI-generated draft to start from helps produce higher-quality AD and lowers the barrier to entry. What remains an open question is how draft quality shapes the editing process. We investigate this through GenAD, an AD generation pipeline that incorporates accessibility guidelines and contextual video information, and RefineAD, an editing interface for human revisions. Human-AI contributions are measured across text, timing, and delivery. In a within-subjects study, we compared authoring from scratch against editing AI drafts of varying quality. GenAD drafts cut completion time by more than half and significantly reduced cognitive load. In contrast, baseline drafts generated from simple, unguided prompts offered only modest benefits, pointing to a minimum quality threshold for effectiveness. Qualitative findings suggest this threshold is content-dependent; as visual complexity increases, so does the quality needed from AI drafts. We propose this as a design principle: effective AI assistance should clear a quality threshold suited to the target content, rather than simply be present.

2605.05287 2026-05-08 cs.CR cs.AI cs.IR cs.SE

Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use

Francisco Javier Arceo, Varsha Prasad Narsing

Comments 11 pages, 2 figures, Published in ACM Conference on AI and Agentic Systems

Journal ref ACM Conference on AI and Agentic Systems (ACM CAIS '26), May 26-29, 2026, San Jose, CA, USA
Abstract

Retrieval-Augmented Generation (RAG) and agentic AI systems are increasingly prevalent in enterprise AI deployments. However, real enterprise environments introduce challenges largely absent from academic treatments and consumer-facing APIs: multiple tenants with heterogeneous data, strict access-control requirements, regulatory compliance, and cost pressures that demand shared infrastructure. A fundamental problem underlies existing RAG architectures in these settings: retrieval systems rank documents by relevance--whether through semantic similarity, keyword matching, or hybrid approaches--not by authorization, so a query from one tenant can surface another tenant's confidential data simply because it scores highest. We formalize this gap and analyze additional shortcomings--including tool-mediated disclosure, context accumulation across turns, and client-side orchestration bypass--that arise when agentic systems conflate relevance with authorization. To address these challenges, we introduce a layered isolation architecture combining policy-aware ingestion, retrieval-time gating, and shared inference, enforced through server-side agentic orchestration. This approach centralizes security-critical operations--tool execution authorization, state isolation, and policy enforcement--on the server, creating natural enforcement points for multitenant isolation while allowing client-side frameworks to retain control over agent composition and latency-sensitive operations. We validate the proposed architecture through an open-source implementation in OGX, a vendor-neutral framework that implements an OpenAI-compatible, open-source Responses API with server-side multi-turn orchestration. We evaluate it empirically and show that ABAC gating eliminates cross-tenant leakage while introducing negligible overhead.
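
The gap between relevance and authorization can be shown with a toy retriever. The sketch below is not OGX's API and reduces attribute-based access control to a single hypothetical `tenant` attribute; the `Doc` type, the scores, and both function names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    doc_id: str
    tenant: str    # hypothetical access-control attribute
    score: float   # relevance score from any retriever

def ungated_top_k(docs, k):
    """Rank purely by relevance, ignoring who is asking."""
    return sorted(docs, key=lambda d: d.score, reverse=True)[:k]

def gated_top_k(docs, user_tenant, k):
    """Filter by authorization first, then rank the survivors by relevance."""
    allowed = [d for d in docs if d.tenant == user_tenant]
    return sorted(allowed, key=lambda d: d.score, reverse=True)[:k]

corpus = [
    Doc("a1", "acme", 0.62),
    Doc("b1", "globex", 0.97),  # most relevant, but owned by another tenant
    Doc("a2", "acme", 0.81),
    Doc("b2", "globex", 0.40),
]

leaky = ungated_top_k(corpus, k=2)                   # includes globex's b1
safe = gated_top_k(corpus, user_tenant="acme", k=2)  # acme documents only
```

Applying the gate before ranking, on the server, means a high relevance score can never pull an unauthorized document into the context.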

2605.05282 2026-05-08 cs.PL cs.CL

Beyond BLEU: A Semantic Evaluation Method for Code Translation

Julius Näumann, Sven Keidel, Amir Molzam Sharifloo, Mira Mezini

Abstract

Code translation is one of the core capabilities of LLMs. However, evaluating the correctness of translations remains difficult, as commonly used metrics such as BLEU measure only syntactic similarity, disregarding program semantics. We propose a novel evaluation methodology for code translation tasks, emphasizing semantic equivalence over surface-level string similarity. Our approach applies established compiler testing methodology to a new domain, allowing the assessment of an LLM fine-tuned for binary lifting tasks (i.e. decompiling binaries to higher-level representations). We introduce a semantic correctness score, defined as the proportion of translations that produce correct execution outcomes, and demonstrate its application by evaluating LLM-based and heuristic decompilers. Our findings show that LLM-based approaches significantly outperform heuristic ones, while BLEU scores show negligible correlation with semantic correctness (r = -0.127 to 0.354), demonstrating that syntactic metrics fail to predict functional accuracy.
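
The semantic correctness score described above is a simple proportion, and its relation to BLEU can be checked with a Pearson correlation. The sketch below uses made-up pass/fail outcomes and BLEU values purely for illustration; none of the numbers come from the paper.

```python
import numpy as np

def semantic_correctness_score(outcomes):
    """Fraction of translations whose execution outcome was correct."""
    return sum(outcomes) / len(outcomes)

# Hypothetical per-translation results (not data from the paper).
passed = [True, False, True, True, False, True, False, True]
bleu = [0.41, 0.55, 0.38, 0.62, 0.47, 0.35, 0.58, 0.44]

score = semantic_correctness_score(passed)

# Pearson correlation between BLEU and the binary correctness outcome.
r = np.corrcoef(bleu, np.array(passed, dtype=float))[0, 1]
```

A correlation near zero on real data would indicate that BLEU says little about whether a translation actually runs correctly.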

2605.05271 2026-05-08 cs.CR cs.AI

Shattering the Echo Chamber: Hidden Safeguards in Manuscripts Against the AI Takeover of Peer Review

Oubo Ma, Ruixiao Lin, Jiahao Chen, Yuan Su, Yong Yang, Shouling Ji

Comments 22 pages, 14 figures, 11 tables

Abstract

As LLMs become increasingly capable, editorial boards and program committees are growing concerned about reviewers who fully outsource peer review to commercial chatbots. This concern stems from prior findings that current chatbots lack the independent critical thinking and depth of reasoning required to assess scientific novelty. One promising direction for mitigating this concern is to embed hidden instructions into manuscripts that disrupt or alter chatbot-generated reviews. However, existing methods remain intuitive and fragile, as they typically rely on homogeneous payloads injected in an inter-stream manner, rendering them susceptible to sanitization or neutralization. In this paper, we identify End-to-End Review Outsourcing as an emerging threat and propose IntraGuard, a black-box, venue-agnostic defense framework grounded in the structural-visual decoupling inherent to the PDF. Designed for committee-side deployment, IntraGuard supports both explicit strategies that trigger refusal or warning signals, and implicit strategies that embed predefined textual markers into the generated review. These strategies can be deployed via any of three intra-stream injection mechanisms, each of which seamlessly embeds heterogeneous defensive text objects within the PDF's underlying structure without altering its visual presentation. Extensive evaluations across 7 real-world commercial chatbot settings and 12 venues spanning diverse disciplines show that IntraGuard achieves a defense success rate of up to 84%, while preserving peer-review invariance for human reviewers. IntraGuard is lightweight and hardware-independent, incurring an average overhead of only one second per manuscript on a commodity personal computer. We further evaluate 11 adaptive attacks spanning manuscript sanitization and instruction interference, and discuss the implications of constructing ensemble defenses.

2605.05270 2026-05-08 stat.ML cs.LG stat.AP

Forecasting Oncology Demand Trends with Boosting-Based Bayesian Conjugate Models

Ademir Batista dos Santos Neto, Tiago Alessandro Espinola Ferreira, Paulo Renato Alves Firmino

Comments 18 pages, 3 figures

Abstract

Accurate trend forecasting in healthcare time series is essential for planning and resource allocation. This paper proposes a Bayesian framework for predicting oncology demand trends, modeling weekly appointments as a Poisson process with a Gamma prior on the demand rate. To enhance adaptability and capture persistent directional patterns, we incorporate a residual-based boosting mechanism grounded in a Gamma-Log-Normal conjugate structure. This boosting approach allows the model to track both short- and long-term trend shifts while maintaining the analytical tractability of conjugate Bayesian updating. The methodology was evaluated on real oncology service data from Cariri, Ceara, Brazil, and compared against established baselines, including linear regression, ARIMA, naive forecasting, LSTM neural networks, and XGBoost. Results showed that the proposed model outperforms competing methods in trend detection accuracy, with gains in the percentage of correctly predicted trend directions of 38.25% over the second-best approach in some cases.
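
The Poisson-Gamma part of this setup has a closed-form update: with a Gamma(shape, rate) prior on the weekly demand rate, observing counts adds their sum to the shape and their number to the rate. The sketch below shows only that standard conjugate step, not the paper's residual-based boosting mechanism; the prior values and weekly counts are hypothetical.

```python
def gamma_poisson_update(alpha, beta, counts):
    """Posterior Gamma(shape, rate) for a Poisson rate after observing counts."""
    return alpha + sum(counts), beta + len(counts)

# Hypothetical prior and data (not from the paper).
alpha0, beta0 = 2.0, 0.1            # prior mean alpha0 / beta0 = 20 appointments per week
weekly_counts = [18, 25, 22, 30]

alpha1, beta1 = gamma_poisson_update(alpha0, beta0, weekly_counts)
posterior_mean = alpha1 / beta1     # point forecast for the weekly demand rate
```

Because the update is just two additions, it can be rerun every week as new appointment counts arrive.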

2605.05267 2026-05-08 cs.SE cs.AI

Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

Kaifeng He, Xiaojun Zhang, Peiliang Cai, Mingwei Liu, Yanlin Wang, Chong Wang, Kaifeng Huang, Bihuan Chen, Xin Peng, Zibin Zheng

Abstract

Large language models (LLMs) frequently generate defective outputs in code generation tasks, ranging from logical bugs to security vulnerabilities. While these generation failures are often treated as model-level limitations, empirical evidence increasingly traces their root causes to imperfections within the training corpora. Yet, the specific mechanisms linking training data quality issues to generated code quality issues remain largely unmapped. This paper presents a systematic literature review of 114 primary studies to investigate how training data quality issues propagate into code generation. We establish a unified taxonomy that categorizes generated code quality issues across nine dimensions and training data quality issues into code and non-code attributes. Based on this taxonomy, we formalize a causal framework detailing 18 typical propagation mapping mechanisms. Furthermore, we synthesize state-of-the-art detection and mitigation techniques across the data, model, and generation lifecycles. The reviewed literature reveals a clear methodological shift: quality assurance is transitioning from reactive, heuristic-based post-generation filtering toward proactive, data-centric governance and closed-loop repair. Finally, we identify open challenges and outline research directions for developing reliable LLMs for code through integrated data curation and continuous evaluation. Our repository is available at https://github.com/SYSUSELab/From-Data-to-Code.

2605.05266 2026-05-08 cs.CR cs.LG

Differential Privacy in the Extensive-Form Bandit Problem

Stephen Pasteris, Rahul Savani, Theodore Turocy

Abstract

We consider the extensive-form bandit problem, where on each trial the learner (a user coordinated by a server) plays an extensive-form game against an oblivious adversary, observing the information sets it finds itself in as well as the resulting payoff/loss. We give an algorithm for this problem that satisfies $ε$-local differential privacy and attains a regret of $\tilde{O}(\sqrt{A\ln(S)T}/ε)$, where $A$ is the total number of actions that the learner can possibly take, $S$ is the number of the learner's possible reduced strategies, and $T$ is the number of trials. On each trial, the time complexity of our algorithm is, up to a factor logarithmic in the maximum number of actions at an infoset, equal to the time required for the server to transmit the reduced strategy to the user. We note that local differential privacy is the strongest version of differential privacy and, to the best of our knowledge, this is the first work to study differential privacy of any form in the extensive-form bandit problem.

2605.05262 2026-05-08 stat.ML cs.AI cs.LG

Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning

Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song

Comments Preprint, 9 pages, 5 figures

Abstract

We formalize Rollout Informativeness under a Fixed Budget (RIFB) as the expected non-vanishing policy-gradient mass that a tool-use rollout set injects into Group Relative Policy Optimization (GRPO). We prove that any budget-agnostic independent sampler suffers a collapse rate bounded away from zero for hard prompts regardless of the budget. Motivated by this, we recast intermediate state selection as a monotone submodular maximization problem, where a greedy one-step selector enjoys a 1 minus 1/e approximation guarantee. Our Uncertainty-aware Upper Confidence Bound (UUCB) terms arise as closed-form marginal gains of this objective. This turns the token-level entropy bonus from an empirical trick into an analytic consequence of the formulation. We present InfoTree, a training-time tree-search framework coupling UUCB with a learned Adaptive Budget Allocator (ABA) and an asynchronous Speculative Expansion scheme. ABA rescues prompts whose initial tree is wasted on uniform outcomes, lifting the mixed-outcome ratio from 58.1 percent to 76.3 percent with less than 5 percent budget overhead. Speculative Expansion reduces wall-clock overhead from 14.3 percent to 4.8 percent by tolerating bounded staleness in UUCB scores. Across nine benchmarks spanning math reasoning (AIME 2024 and 2025, MATH-500, OlympiadBench, USAMO), web-search agents (GAIA, HLE-100, BrowseComp-lite), and tool-rich coding and OS agents (APPS-verified, AgentBench-OS), InfoTree outperforms flat GRPO, DeepSearch, Tree-GRPO, AT2PO, CW-GRPO, and RC-GRPO. Head-to-head compositions with Tree-GRPO prefix sharing and CW-GRPO contribution weights deliver further gains, confirming that our selector operates orthogonally to rollout reuse and trajectory re-weighting. A 5 by 5 by 5 robustness grid reveals that over three quarters of the hyperparameter space lies on a performance plateau, confirming UUCB robustness.
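
The 1 - 1/e guarantee cited above is the classical result for greedy maximization of a monotone submodular function under a cardinality budget. The sketch below shows that greedy rule on a toy set-coverage objective. It is not the UUCB selector or InfoTree; the candidate sets and the function name `greedy_max_coverage` are assumptions for illustration.

```python
def greedy_max_coverage(candidates, budget):
    """Greedy selection for a coverage objective under a cardinality budget."""
    chosen, covered = [], set()
    for _ in range(budget):
        # Pick the unchosen candidate with the largest marginal gain.
        best = max(
            (name for name in candidates if name not in chosen),
            key=lambda name: len(candidates[name] - covered),
            default=None,
        )
        if best is None or not candidates[best] - covered:
            break  # nothing left that adds new coverage
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered

# Toy instance (assumed): each candidate covers a set of items.
candidates = {
    "s1": {1, 2, 3, 4},
    "s2": {3, 4, 5},
    "s3": {5, 6},
    "s4": {1, 6},
}
chosen, covered = greedy_max_coverage(candidates, budget=2)
```

Coverage is monotone and submodular because adding a set never reduces the covered items and its marginal gain can only shrink as more is already covered.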

2605.05259 2026-05-08 q-bio.BM cond-mat.mtrl-sci cs.AI q-bio.QM

Enhancing Cryo-EM Density Map Segmentation in Phenix for Improved Atomic Model Building

Chenwei Zhang

Comments 10 pages, 4 figures, 2 tables

Abstract

We introduce PhenixCraft, a fully automated pipeline for building atomic models from cryo-EM density maps. By integrating AlphaFold predictions, we enhance the map-segmentation step in Phenix during model building, addressing challenges posed by noise and artifacts that traditionally hinder this step. Our results demonstrate PhenixCraft's superior performance in TM-scores and sequence accuracy, significantly improving upon the limitations and inefficiencies of traditional model building using Phenix.

2605.05257 2026-05-08 cs.IR cs.AI cs.CL

Career-Aware Resume Tailoring via Multi-Source Retrieval-Augmented Generation with Provenance Tracking: A Case Study

Kumar Abhinav

Comments 6 pages, 1 figure, 5 tables. Also available on SSRN

Abstract

AI-assisted resume tailoring systems commonly operate on a single uploaded resume, which limits their ability to recover relevant experience omitted from the current draft and makes it difficult for users to distinguish grounded edits from model-generated suggestions. This paper presents Resume Tailor, an agentic resume-tailoring system that maintains a longitudinal career vault in a vector database and uses multi-source retrieval-augmented generation (RAG) to assemble job-specific resume content from historical resumes and structured career records. The system is implemented as a 12-node LangGraph pipeline with typed state management, hybrid semantic-lexical confidence scoring, provenance-aware fallback generation, anti-hallucination guardrails, and a conditional review loop. We report a pilot evaluation on nine job descriptions (JDs) across software engineering, data analytics, and business analysis roles using a single candidate's career history. For six JDs where the candidate held at least one prior role in the same occupational category, enabling the career vault improved Applicant Tracking System (ATS)-style fit scores by an average of 7.8 points. For two JDs requiring domain-specific expertise absent from the vault, scores decreased by an average of 8.0 points. One partially overlapping role showed a modest gain of 2 points. These results suggest that longitudinal retrieval can improve resume tailoring when relevant prior experience exists, while also highlighting the need for confidence-gated retrieval when domain overlap is weak.
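The three reported groups can be combined into one overall figure. The short calculation below uses only the group sizes and averages stated in the abstract; because those averages are presumably rounded, the result is approximate.

```python
# Group sizes and average score changes as reported in the abstract.
groups = [
    (6, 7.8),   # prior role in the same occupational category: average gain
    (2, -8.0),  # domain expertise absent from the vault: average loss
    (1, 2.0),   # partially overlapping role
]

total_jds = sum(n for n, _ in groups)
overall_mean = sum(n * delta for n, delta in groups) / total_jds
print(f"{total_jds} JDs, overall mean change: {overall_mean:+.2f} points")
```

Across all nine JDs this works out to a net average gain of roughly 3.6 points, with the two domain-mismatched JDs offsetting part of the improvement.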

2605.05252 2026-05-08 cs.SE cs.AI

Automated Population-Level Audit Assurance via AI-Based Document Intelligence

Santosh Vasudevan, Velu Natarajan

Abstract

Audit transaction testing validates accuracy and completeness of customer-facing statements against internal systems of record. Traditional manual, sample-based review of unstructured PDF statements is labor-intensive and does not scale to millions of transactions. This paper presents an automated framework for large-scale audit transaction testing using AI-based document intelligence. The solution leverages Snowflake Document AI to extract structured data from unstructured PDF statements using a small labeled corpus (approximately 20 documents). Extracted data are reconciled against authoritative source-of-truth datasets to identify discrepancies at scale. Results are surfaced through interactive dashboards and automated reports. The framework enables population-level testing rather than sampling-based approaches, improving audit coverage and supporting continuous assurance objectives. Recent advances in document intelligence and analytics-driven audit frameworks enable scalable, near real-time risk identification and continuous assurance.
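A minimal sketch of the reconciliation step, assuming extraction has already produced one amount per transaction id. It does not call Snowflake Document AI; the function name, the discrepancy labels, and the tolerance are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

def reconcile(
    extracted: Dict[str, float],
    source: Dict[str, float],
    tolerance: float = 0.005,
) -> List[Tuple[str, str]]:
    """Compare every transaction in the population, not a sample."""
    discrepancies: List[Tuple[str, str]] = []
    for key in sorted(set(extracted) | set(source)):
        if key not in source:
            discrepancies.append((key, "missing_in_source"))
        elif key not in extracted:
            discrepancies.append((key, "missing_in_statement"))
        elif abs(extracted[key] - source[key]) > tolerance:
            discrepancies.append((key, "amount_mismatch"))
    return discrepancies

# Hypothetical data: amounts extracted from statements vs. the system of record.
extracted = {"T1": 100.00, "T2": 250.50, "T4": 75.00}
source = {"T1": 100.00, "T2": 250.05, "T3": 19.99}
issues = reconcile(extracted, source)
for txn_id, kind in issues:
    print(txn_id, kind)
```

Iterating over the union of keys is what makes the check population-level: transactions present on only one side are reported alongside amount mismatches.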

2605.05251 2026-05-08 cs.CR cs.LG cs.SE

Identifier-Free Code Embedding Models for Scalable Search

Eric Wolos, Michael Doyle

Abstract

Function association is a useful process for binary reverse engineers. Search tools exist to perform association at scale, but they do not utilize the full range of capabilities that AI-enabled search provides. Prior work has explored the development of embedding models for association between certain reverse engineering code representations, but that work does not cover bidirectional association between source code and decompiled, stripped code with standard preprocessing requirements. To bridge this gap, we formalize this function association problem and evaluate the extent to which embedding models can bidirectionally associate between these two representations. To improve model performance at this task, we fine-tune a Qwen3-Embedding model with contrastive learning. We find that our new model outperforms other models on all function association baselines by a substantial margin and generalizes to a constant-algorithm association task it is not explicitly trained on.
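The abstract says only that fine-tuning uses contrastive learning. One widely used contrastive objective is InfoNCE, sketched below for a single query; the choice of InfoNCE, the temperature value, and the toy similarity scores are assumptions, not details from the paper.

```python
import math
from typing import Sequence

def info_nce(sim_row: Sequence[float], positive_index: int,
             temperature: float = 0.05) -> float:
    """InfoNCE loss for one query given its similarities to all candidates."""
    logits = [s / temperature for s in sim_row]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[positive_index]

# Hypothetical similarities between one source function and three decompiled
# candidates; index 0 is the true match.
good = info_nce([0.9, 0.1, 0.1], positive_index=0)
bad = info_nce([0.1, 0.9, 0.1], positive_index=0)
print(good < bad)
```

The loss is small when the matching pair scores higher than the in-batch negatives, which pushes source and decompiled versions of the same function together in embedding space.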

2605.05250 2026-05-08 cs.IR cs.AI

Decision-aware User Simulation Agent for Evaluating Conversational Recommender Systems

Yuan-Chi Li, Li-Chi Chen, Sung-Yi Wu, Yu-Che Tsai, Shou-De Lin

Abstract

Conversational recommender systems (CRS) increasingly rely on user simulators for automated evaluation of sales agents. A key requirement for such simulators is the ability to model human decision-making. However, most existing simulation frameworks do not explicitly model the internal decision process, and LLM-based simulators often exhibit unrealistically strong information-processing capabilities and rarely exhibit the hesitation or decision deferral commonly observed in real consumer behavior, resulting in overly high acceptance probabilities. To address this limitation, we propose Hesitator, a theory-grounded user simulation framework that explicitly models human decision-making under choice overload. The framework introduces a modular Decision Module that separates utility-based item selection from overload-aware commitment decisions. Experiments across multiple user simulation frameworks, domains, sales modes, and LLM backbones show that integrating our module consistently mitigates unrealistic behaviors under increasing overload conditions. Furthermore, Hesitator reproduces established behavioral patterns from psychological economics, demonstrating its ability to model human decision behavior.

2605.05246 2026-05-08 eess.SP cs.AI

Memory-Efficient EDA Denoising via Knowledge Distillation for Wearable IoT Under Severe Motion Artifacts and Underwater Conditions

Yongbin Lee, Andrew Peitzsch, Youngsun Kong, Jarod Zizza, Dong-hee Kang, Farnoush Baghestani, Ki H. Chon

Abstract

Electrodermal activity (EDA) is widely used in wearable Internet of Medical Things (IoMT) systems for continuous health monitoring, including autonomic assessment. However, EDA signals are highly vulnerable to motion artifacts and environmental noise, limiting reliable deployment in harsh operating conditions such as underwater. This study proposes a robust, deployable EDA denoising framework that generalizes across multiple measurement locations and harsh environments. The framework integrates a hybrid CNN-Transformer teacher model with a lightweight depth-wise separable CNN student model via a knowledge distillation (KD) strategy. To further improve robustness, a realistic data augmentation scheme is introduced to simulate diverse motion artifacts and environmental distortions. The KD-based student model significantly reduces model size (7.87 MB to 0.51 MB) and computational cost (105.1M to 11.61M FLOPs) while maintaining denoising performance (MAE: 0.144, SNR improvement: 12.08 dB) in validation on the public dataset. In testing under real-world underwater conditions (UMAC dataset), the proposed method substantially improves skin conductance response reconstruction, reducing mean absolute error from 2.809 to 0.215. Furthermore, in independent testing on the CNS-OT dataset, the denoised signals enhanced downstream CNS-OT prediction performance, achieving the highest AUROC (0.806) compared to prior denoising methods. The proposed method also improved the early prediction rate (sensitivity) from 0.550 to 0.767, enabling CNS-OT prediction up to a median of 6.9 minutes before symptom onset. These results demonstrate that the proposed framework not only improves EDA signal quality but also enhances clinically relevant prediction performance while remaining suitable for deployment in resource-constrained wearable Internet of Things systems operating in harsh environments.
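The abstract does not state the distillation objective. A common form for regression-style denoisers blends the error against the clean reference with the error against the teacher's output, sketched here with plain lists; the weighting `alpha`, the use of MAE for both terms, and the sample values are assumptions.

```python
from typing import Sequence

def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equal-length signals."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def distillation_loss(student: Sequence[float], teacher: Sequence[float],
                      clean: Sequence[float], alpha: float = 0.5) -> float:
    """Blend the supervised term (vs. clean signal) with the KD term (vs. teacher)."""
    return alpha * mae(student, clean) + (1.0 - alpha) * mae(student, teacher)

# Hypothetical two-sample signals.
student_out = [1.0, 2.0]
teacher_out = [1.0, 4.0]
clean_signal = [2.0, 2.0]
loss = distillation_loss(student_out, teacher_out, clean_signal, alpha=0.5)
print(loss)
```

With `alpha=1.0` the student trains only on the clean reference; lowering `alpha` shifts weight toward imitating the larger teacher.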

2605.05244 2026-05-08 cs.IR cs.AI

Towards Dependable Retrieval-Augmented Generation Using Factual Confidence Prediction

Florian Geissler, Francesco Carella, Laura Fieback, Jakob Spiegelberg

Abstract

Incorporating specific knowledge into large language models via retrieval-augmented generation (RAG) is a widespread technique that fuels many of today's industry AI applications. A fundamental problem is to assess whether the context retrieved by a similarity search indeed provides supporting facts or instead misguides the generator with irrelevant information. It is critical to associate the generated answers with meaningful confidence measures of the factuality of the retrieval process. We present a new two-stage approach to predict the fact faithfulness of retrieval-augmented generation output. First, we employ conformal prediction to select only those retrieved chunks that have a high chance of coming from the correct source. This step alone can improve answer quality by up to 6% on some of the studied datasets. However, the associated statistical guarantees do not hold in general, since the assumption of sample exchangeability depends on the retriever setup. We present diagnostic metrics to assess whether a setup is suitable. Second, we quantify confidence in the consistency of a generated final answer with a given retrieved context, using an attention-based factuality classifier. This approach can detect inconsistent answers with a chance of up to 77%. Our work helps to establish a novel type of certified RAG system for a broad range of natural language industry applications.
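The first stage can be illustrated with split conformal prediction. The sketch below assumes a nonconformity score of 1 minus retrieval similarity, calibrated on chunks known to come from the correct source; that score definition, the function names, and the numbers are hypothetical, and the coverage guarantee holds only if calibration and test samples are exchangeable, as the abstract cautions.

```python
import math
from typing import List, Sequence, Tuple

def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this alpha: keep everything
        return float("inf")
    return sorted(cal_scores)[k - 1]

def select_chunks(chunks: Sequence[Tuple[str, float]], threshold: float) -> List[str]:
    """Keep chunks whose nonconformity (1 - similarity) is within the threshold."""
    return [text for text, sim in chunks if 1.0 - sim <= threshold]

# Hypothetical calibration scores from correctly sourced chunks.
cal_scores = [0.10, 0.20, 0.15, 0.30, 0.25, 0.05, 0.40, 0.35, 0.12]
threshold = conformal_threshold(cal_scores, alpha=0.25)

# Hypothetical retrieved chunks with similarity scores.
retrieved = [("chunk A", 0.90), ("chunk B", 0.70), ("chunk C", 0.50)]
kept = select_chunks(retrieved, threshold)
print(threshold, kept)
```

Under exchangeability, a chunk from the correct source passes this filter with probability at least 1 - alpha, which is the kind of statistical guarantee the abstract refers to.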

2605.05242 2026-05-08 cs.IR cs.AI

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

Zhuofeng Li, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie, Yi Lu, Yuyang Bai, Shangbin Feng, Hangxiao Zhu, Ming Zhong, Yuyu Zhang, Jianwen Xie, Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang

Abstract

Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic search, it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, local context checks, and multi-step hypothesis refinement are difficult to implement by calling a conventional off-the-shelf retriever, and evidence filtered out early cannot be recovered by stronger downstream reasoning. Agentic tasks further exacerbate this limitation because they require agents to orchestrate multiple steps, including discovering intermediate entities, combining weak clues, and revising the plan after observing partial evidence. To tackle this limitation, we study direct corpus interaction (DCI), where an agent searches the raw corpus directly with general-purpose terminal tools (e.g., grep, file reads, shell commands, lightweight scripts), without any embedding model, vector index, or retrieval API. This approach requires no offline indexing and adapts naturally to evolving local corpora. Across IR benchmarks and end-to-end agentic search tasks, this simple setup substantially outperforms strong sparse, dense, and reranking baselines on several BRIGHT and BEIR datasets, and attains strong accuracy on BrowseComp-Plus and multi-hop QA without relying on any conventional semantic retriever. Our results indicate that as language agents become stronger, retrieval quality depends not only on reasoning ability but also on the resolution of the interface through which the model interacts with the corpus, and that DCI opens a broader interface-design space for agentic search.
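To make the idea concrete, the sketch below imitates two of the operations named above, a grep-style line search and a conjunction of sparse clues, over a throwaway corpus of text files. It is a simplified stand-in for the terminal tools an agent would invoke; the helper names and the tiny corpus are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

def grep(root: str, pattern: str) -> List[Tuple[str, int, str]]:
    """Return (file name, line number, line) for every line matching `pattern`."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if rx.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits

def files_matching_all(root: str, patterns: List[str]) -> List[str]:
    """Clue conjunction: files that contain a match for every pattern."""
    per_pattern = [{name for name, _, _ in grep(root, p)} for p in patterns]
    return sorted(set.intersection(*per_pattern)) if per_pattern else []

# Hypothetical raw corpus written to a temporary directory; no index is built.
docs = {
    "a.txt": "Marie Curie won the Nobel Prize in Physics in 1903.\nShe was born in Warsaw.",
    "b.txt": "The Nobel Prize in Chemistry 1911 went to Curie.",
    "c.txt": "Warsaw is the capital of Poland.",
}
with tempfile.TemporaryDirectory() as corpus_dir:
    for name, text in docs.items():
        (Path(corpus_dir) / name).write_text(text, encoding="utf-8")
    nobel_hits = grep(corpus_dir, r"nobel prize")
    both = files_matching_all(corpus_dir, [r"nobel prize", r"warsaw"])

print(nobel_hits)
print(both)
```

Because each call reads the files as they currently exist, edits to the corpus are visible immediately, which matches the abstract's point about needing no offline indexing.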

2605.05240 2026-05-08 eess.SP cs.AI

PPO-Based Dynamic Positioning of HAPS-BS in Wind-Disturbed Stratospheric Maritime Networks

Azim Akhtarshenas, German Svistunov, Matteo Bernabè, Kuangyu Zheng, David López-Pérez

Abstract

High-Altitude Platform Stations (HAPS) offer a promising solution for wide-area wireless coverage in maritime regions lacking terrestrial infrastructure. However, maintaining reliable performance is challenging due to dynamic ship mobility and atmospheric disturbances, particularly stratospheric wind effects on HAPS positioning. This paper proposes a deep reinforcement learning (DRL)-based framework for dynamic positioning of wind-disturbed HAPS-mounted base stations in maritime networks. A centralized DRL agent deployed on a coordinator HAPS controls multiple serving HAPS using radio measurements and network feedback, capturing realistic channel conditions and user mobility. A Proximal Policy Optimization (PPO) algorithm is employed to learn robust positioning policies that enhance coverage stability and system throughput under wind disturbances. Simulation results show that the proposed approach effectively mitigates wind-induced positioning deviations while ensuring reliable wide-area connectivity for maritime users.
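For reference, the per-sample clipped surrogate objective that defines standard PPO is sketched below. This is the generic textbook form; the clipping range and the example values are illustrative, and the paper's state, action, and reward design for HAPS positioning are not shown.

```python
def ppo_clipped_objective(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Standard PPO surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)

# Illustrative values: probability ratio of new to old policy, and advantage.
print(ppo_clipped_objective(1.5, 1.0))   # gain is capped once the ratio exceeds 1 + eps
print(ppo_clipped_objective(0.5, -1.0))  # clipping bounds the objective on the low side too
print(ppo_clipped_objective(1.5, -1.0))  # a harmful move away from the old policy is not clipped
```

Limiting how much a single update can profit from moving away from the previous policy is what makes PPO a conservative choice for control problems with noisy feedback.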

2605.05238 2026-05-08 cs.IR cs.LG cs.SI

Dynamic Graph with Similarity-Aware Attention Graph Neural Network for Recommender Systems

Aadarsh Senapati, Neha Kujur, Vivek Yelleti

Abstract

Recommender systems are essential components of modern online platforms, presenting personalized content in various domains. Traditional collaborative filtering methods depend on static user-item interaction graphs and a limited subset of similarity measures, which fail to capture the changing nature of an individual's preferences. Recent graph neural network (GNN) based approaches focus on user-item bipartite graphs and do not use explicit user-user relational modelling or dynamic graph evolution during training. To address these limitations, this paper proposes a Dynamic Graph Similarity-Aware Attention Graph Neural Network (DG-SA-GNN) framework that integrates dynamic user similarity graph construction with multi-similarity propagation and attention-based aggregation. The proposed architecture constructs four parallel user similarity graphs using Cosine, Jaccard, Discounted Pearson Correlation Coefficient (Discount PCC), and IPIJ similarity functions, each processed by a dedicated UserGNN module. A Graph Transformer fuses the four graph views, and a CrossAttention module refines user embeddings through interaction with item embeddings. Crucially, the graphs are reconstructed at scheduled epochs during training, enabling the model to adapt to the learned embedding space; this reconstruction constitutes the dynamic graph component. Mini-batch training with hard negative sampling improves scalability and convergence. Experiments on the MovieLens100K benchmark demonstrate that DG-SA-GNN achieves a Recall@20 of 0.162 and an NDCG@20 of 0.065, outperforming the LightGCN baseline in recall. The results validate that dynamic multi-similarity graph construction coupled with attention-based fusion which produce recommendation performance
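Two of the four similarity views can be sketched directly from a user-item rating table. The code below builds cosine and Jaccard top-k user graphs from hypothetical ratings; Discount PCC and IPIJ are omitted, and the scheduled rebuilding from learned embeddings described above is not shown.

```python
import math
from typing import Callable, Dict, List

Ratings = Dict[str, float]

def cosine(u: Ratings, v: Ratings) -> float:
    """Cosine similarity over rating vectors (missing items count as 0)."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def jaccard(u: Ratings, v: Ratings) -> float:
    """Jaccard similarity over the sets of rated items."""
    a, b = set(u), set(v)
    return len(a & b) / len(a | b) if a | b else 0.0

def top_k_graph(ratings: Dict[str, Ratings],
                sim: Callable[[Ratings, Ratings], float],
                k: int) -> Dict[str, List[str]]:
    """Connect each user to its k most similar users with positive similarity."""
    graph: Dict[str, List[str]] = {}
    for u in ratings:
        scored = [(sim(ratings[u], ratings[v]), v) for v in ratings if v != u]
        scored.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first
        graph[u] = [v for s, v in scored[:k] if s > 0]
    return graph

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "u1": {"m1": 5, "m2": 3},
    "u2": {"m1": 4, "m2": 3, "m3": 1},
    "u3": {"m3": 5},
}
jaccard_graph = top_k_graph(ratings, jaccard, k=1)
cosine_graph = top_k_graph(ratings, cosine, k=1)
print(jaccard_graph)
print(cosine_graph)
```

Each similarity function yields its own edge set over the same users, which is what gives the model multiple graph views to fuse.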

2605.05231 2026-05-08 eess.AS cs.SD

Prompting Whisper for Joint Speech Transcription and Diarization

Mariia Zamyrova, Henk van den Heuvel

Comments To be presented at the Joint Workshop on HSCMA and CHiME 2026

Abstract

As part of the MediSpeech project, we aim to develop a system that transcribes and diarizes Dutch conversations between doctors and patients in real time. In this in-progress research, we explore ways of efficiently combining Whisper with speaker diarization (SD). After prompting Whisper with text that contains speaker labels, we observed that it is able to insert labels into the transcription with promising accuracy. We continued this line of research by fine-tuning Whisper with speaker-labelled prompts to generate transcriptions in a format similar to that of Serialized Output Training (SOT). Fine-tuning Whisper yielded more consistent speaker IDs across the chunks of long-form audio and improved verbatim transcription. The study also uncovered new challenges: Whisper's SD performance suffers from mistakes that are propagated through prompts and from inaccurate timestamps assigned to overlapping speech.