arXivDaily: arXiv daily academic digest, updated Monday through Friday
All subject categories: 1559
2603.03626 2026-03-05 stat.ML cs.LG cs.NA math.NA math.PR

Riemannian Langevin Dynamics: Strong Convergence of Geometric Euler-Maruyama Scheme

Zhiyuan Zhan, Masashi Sugiyama

Low-dimensional structure in real-world data plays an important role in the success of generative models, which motivates diffusion models defined on intrinsic data manifolds. Such models are driven by stochastic differential equations (SDEs) on manifolds, which raises the need for convergence theory of numerical schemes for manifold-valued SDEs. In Euclidean space, the Euler--Maruyama (EM) scheme achieves strong convergence with order $1/2$, but an analogous result for manifold discretizations is less understood in general settings. In this work, we study a geometric version of the EM scheme for SDEs on Riemannian manifolds and prove strong convergence with order $1/2$ under geometric and regularity conditions. As an application, we obtain a Wasserstein bound for sampling on manifolds via the geometric EM discretization of Riemannian Langevin dynamics.
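
A minimal sketch of one geometric EM step for Riemannian Langevin dynamics, assuming the unit sphere $S^2 \subset \mathbb{R}^3$ as the manifold and the update $X_{k+1} = \exp_{X_k}(-h\,\mathrm{grad}\,U(X_k) + \sqrt{2h}\,\xi_k)$, with $\xi_k$ a standard Gaussian in the tangent space. The sphere, the potential, and all function names are illustrative assumptions; the scheme and conditions analyzed in the paper may differ.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def project_tangent(x, v):
    """Remove the component of v along x, giving a vector in the tangent space at x."""
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]

def exp_map_sphere(x, v):
    """Exponential map on the unit sphere: move from x along the geodesic with velocity v."""
    norm_v = math.sqrt(dot(v, v))
    if norm_v < 1e-12:
        return list(x)
    y = [math.cos(norm_v) * xi + math.sin(norm_v) * vi / norm_v for xi, vi in zip(x, v)]
    norm_y = math.sqrt(dot(y, y))  # renormalize to curb floating-point drift
    return [yi / norm_y for yi in y]

def geometric_em_step(x, grad_u, h, rng):
    """One step: tangent drift plus tangent Gaussian noise, pushed through the exponential map."""
    drift = project_tangent(x, [-g for g in grad_u(x)])  # Riemannian gradient, negated
    noise = project_tangent(x, [rng.gauss(0.0, 1.0) for _ in x])
    v = [h * d + math.sqrt(2.0 * h) * n for d, n in zip(drift, noise)]
    return exp_map_sphere(x, v)

def run_chain(grad_u, x0, h=0.01, n_steps=500, seed=0):
    """Iterate the step from a unit vector x0; grad_u returns the Euclidean gradient of U."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        x = geometric_em_step(x, grad_u, h, rng)
    return x
```

For example, `run_chain(lambda x: [0.0, 0.0, -5.0], [1.0, 0.0, 0.0])` uses the constant Euclidean gradient of the hypothetical potential $U(x) = -5x_3$, whose Gibbs density concentrates near the north pole. Every iterate stays on the sphere by construction, which is the point of stepping with the exponential map instead of a Euclidean update.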

2603.03045 2026-03-05 quant-ph cs.AI

QFlowNet: Fast, Diverse, and Efficient Unitary Synthesis with Generative Flow Networks

Inhoe Koo, Hyunho Cha, Jungwoo Lee

Comments 7 pages, 6 figures, IEEE International Conference on Quantum Communications, Networking, and Computing (QCNC 2026)

Unitary synthesis, the decomposition of a unitary matrix into a sequence of quantum gates, is a fundamental challenge in quantum compilation. Prevailing reinforcement learning (RL) approaches are often hampered by sparse reward signals, which necessitate complex reward shaping or long training times, and typically converge to a single policy, lacking solution diversity. In this work, we propose QFlowNet, a novel framework that learns efficiently from sparse signals by pairing a Generative Flow Network (GFlowNet) with Transformers. Our approach addresses two key challenges. First, the GFlowNet framework is fundamentally designed to learn a diverse policy that samples solutions proportional to their reward, overcoming the single-solution limitation of RL while offering faster inference than other generative models like diffusion. Second, the Transformers act as a powerful encoder, capturing the non-local structure of unitary matrices and compressing a high-dimensional state into a dense latent representation for the policy network. Our agent achieves an overall success rate of 99.7% on a 3-qubit benchmark (lengths 1-12) and discovers a diverse set of compact circuits, establishing QFlowNet as an efficient and diverse paradigm for unitary synthesis.

2603.02887 2026-03-05 cs.GR cs.CV

Generalized non-exponential Gaussian splatting

Sébastien Speierer, Adrian Jarabo

Comments 13 pages, 6 figures, 4 tables

In this work we generalize 3D Gaussian splatting (3DGS) to a wider family of physically-based alpha-blending operators. 3DGS has become the de facto standard for radiance field rendering and reconstruction, given its flexibility and efficiency. At its core, it is based on alpha-blending sorted semitransparent primitives, which in the limit converges to the classic radiative transfer function with exponential transmittance. Inspired by recent research on non-exponential radiative transfer, we generalize the image formation model of 3DGS to non-exponential regimes. Based on this generalization, we use a quadratic transmittance to define sub-linear, linear, and super-linear versions of 3DGS, which exhibit faster-than-exponential decay. We demonstrate that these new non-exponential variants achieve quality similar to that of the original 3DGS but significantly reduce the number of overdraws, which results in speed-ups of up to $4\times$ in complex real-world captures on a ray-tracing-based renderer.
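
A minimal sketch of the standard exponential case the abstract starts from: front-to-back alpha blending of depth-sorted primitives, where per-primitive opacities of the form $\alpha_i = 1 - e^{-\sigma_i \delta_i}$ multiply out to the exponential transmittance $T = e^{-\sum_i \sigma_i \delta_i}$. Scalar colors, the function names, and the density-to-opacity mapping are illustrative assumptions; the paper's non-exponential operators are not reproduced here.

```python
import math

def composite_front_to_back(colors, alphas):
    """Blend depth-sorted primitives: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    transmittance = 1.0  # fraction of light still unblocked in front of the current primitive
    pixel = 0.0
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * c
        transmittance *= 1.0 - a
    return pixel, transmittance

def alpha_from_density(sigma, delta):
    """Opacity of a segment with extinction sigma and length delta under exponential transmittance."""
    return 1.0 - math.exp(-sigma * delta)
```

With opacities produced by `alpha_from_density`, the residual transmittance returned by `composite_front_to_back` equals the exponential of the negative accumulated optical depth, which is the limit behavior the abstract refers to.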

2603.01896 2026-03-05 cs.SE cs.AI cs.PL

Agentic Code Reasoning

Shubham Ugare, Satish Chandra

Can LLM agents explore codebases and reason about code semantics without executing the code? We study this capability, which we call agentic code reasoning, and introduce semi-formal reasoning: a structured prompting methodology that requires agents to construct explicit premises, trace execution paths, and derive formal conclusions. Unlike unstructured chain-of-thought, semi-formal reasoning acts as a certificate: the agent cannot skip cases or make unsupported claims. We evaluate across three tasks (patch equivalence verification, fault localization, and code question answering) and show that semi-formal reasoning consistently improves accuracy on all of them. For patch equivalence, accuracy improves from 78% to 88% on curated examples and reaches 93% on real-world agent-generated patches, approaching the reliability needed for execution-free RL reward signals. For code question answering on RubberDuckBench (Mohammad et al., 2026), semi-formal reasoning achieves 87% accuracy. For fault localization on Defects4J (Just et al., 2014), semi-formal reasoning improves Top-5 accuracy by 5 percentage points over standard reasoning. These results demonstrate that structured agentic reasoning enables meaningful semantic code analysis without execution, opening practical applications in RL training pipelines, code review, and static program analysis.

2603.01508 2026-03-05 cs.CY cs.AI

The Sentience Readiness Index: A Preliminary Framework for Measuring National Preparedness for the Possibility of Artificial Sentience

Tony Rost

Comments 22 pages, 4 figures

The scientific study of consciousness has begun to generate testable predictions about artificial systems. A landmark collaborative assessment evaluated current AI architectures against six leading theories of consciousness and found that none currently qualifies as a strong candidate, but that future systems might. A precautionary approach to AI sentience, which holds that credible possibility of sentience warrants governance action even without proof, has gained philosophical and institutional traction. Yet existing AI readiness indices, including the Oxford Insights Government AI Readiness Index, the IMF AI Preparedness Index, and the Stanford AI Index, measure economic, technological, and governance preparedness without assessing whether societies are prepared for the possibility that AI systems might warrant moral consideration. This paper introduces the Sentience Readiness Index (SRI), a preliminary composite index measuring national-level preparedness across six weighted categories for 31 jurisdictions. The SRI was constructed following the OECD/JRC framework for composite indicators and employs LLM-assisted expert scoring with iterative expert review to generate an initial dataset. No jurisdiction exceeds "Partially Prepared" (the United Kingdom leads at 49/100). Research Environment scores are universally the strongest category; Professional Readiness is universally the weakest. These exploratory findings suggest that if AI sentience becomes scientifically plausible, no society currently possesses adequate institutional, professional, or cultural infrastructure to respond. As a preliminary framework, the SRI provides an initial diagnostic baseline and highlights areas for future methodological refinement, including expanded expert validation, improved measurement instruments, and longitudinal data collection.

2603.00183 2026-03-05 cs.SE cs.AI

Test Case Prioritization: A Snowballing Literature Review and TCPFramework with Approach Combinators

Tomasz Chojnacki, Lech Madeyski

Comments 39 pages, 10 figures

Journal ref Information and Software Technology, vol. 194, p. 108062, 2026

Context: Test case prioritization (TCP) is a technique widely used by software development organizations to accelerate regression testing. Objectives: We aim to systematize existing TCP knowledge and to propose and empirically evaluate a new TCP approach. Methods: We conduct a snowballing review (SR) on TCP, implement a comprehensive platform for TCP research (TCPFramework), analyze existing evaluation metrics and propose two new ones (rAPFDc and ATR), and develop a family of ensemble TCP methods called approach combinators. Results: The SR helped identify 324 studies related to TCP. The techniques proposed in our study were evaluated on the RTPTorrent dataset, consistently outperforming their base approaches across the majority of subject programs, and achieving performance comparable to the current state of the art for heuristic algorithms (in terms of rAPFDc, NTR, and ATR), while using a distinct approach. Conclusions: The proposed methods can be used efficiently for TCP, reducing the time spent on regression testing by up to 2.7%. Approach combinators offer significant potential for improvements in future TCP research, due to their composability.

2602.20541 2026-03-05 cs.GT cs.AI

Maximin Share Guarantees via Limited Cost-Sensitive Sharing

Hana Salavcova, Martin Černý, Arpita Biswas

Comments In Proc. of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026), Paphos, Cyprus, May 25 - 29, 2026, IFAAMAS, 11 pages

We study the problem of fairly allocating indivisible goods when limited sharing is allowed, that is, each good may be allocated to up to $k$ agents, while incurring a cost for sharing. While classic maximin share (MMS) allocations may not exist in many instances, we demonstrate that allowing controlled sharing can restore fairness guarantees that are otherwise unattainable in certain scenarios. (1) Our first contribution shows that exact MMS allocations are guaranteed to exist whenever goods are allowed to be cost-sensitively shared among at least half of the agents and the number of agents is even; for odd numbers of agents, we obtain a slightly weaker MMS guarantee. (2) We further design a Shared Bag-Filling Algorithm that guarantees a $(1 - C)(k - 1)$-approximate MMS allocation, where $C$ is the maximum cost of sharing a good. Notably, when $(1 - C)(k - 1) \geq 1$, our algorithm recovers an exact MMS allocation. (3) We additionally introduce the Sharing Maximin Share (SMMS) fairness notion, a natural extension of MMS to the $k$-sharing setting. (4) We show that SMMS allocations always exist under identical utilities and for instances with two agents. (5) We construct a counterexample to show the impossibility of the universal existence of an SMMS allocation. (6) Finally, we establish a connection between SMMS and constrained MMS (CMMS), yielding approximation guarantees for SMMS via existing CMMS results. These contributions provide deep theoretical insights for the problem of fair resource allocation when limited sharing of resources is allowed in multi-agent environments.
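
A minimal sketch of the classic (no-sharing) maximin share that the abstract builds on: for one agent with additive, non-negative values, the MMS is the best worst-bundle value over all ways of splitting the goods into $n$ bundles. The brute-force enumeration and the function name are illustrative assumptions; the sharing cost $C$, the $k$-sharing rule, and the Shared Bag-Filling Algorithm are not modeled here.

```python
from itertools import product

def maximin_share(values, n_agents):
    """Brute-force MMS for one agent with additive values: best achievable worst-bundle value."""
    best = 0
    # Enumerate every assignment of goods to bundles (n_agents ** len(values) cases).
    for assignment in product(range(n_agents), repeat=len(values)):
        bundles = [0] * n_agents
        for value, bundle in zip(values, assignment):
            bundles[bundle] += value
        best = max(best, min(bundles))
    return best
```

For values `[6, 5, 4, 3, 2]` and two agents, the split `{6, 4}` versus `{5, 3, 2}` gives both bundles a value of 10, so the maximin share is 10. The enumeration is exponential in the number of goods and is meant only to make the definition concrete.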

2601.23157 2026-03-05 cs.CR cs.LG

No More, No Less: Least-Privilege Language Models

Paulius Rauba, Dominykas Seputis, Patrikas Vanagas, Mihaela van der Schaar

Least privilege is a core security principle: grant each request only the minimum access needed to achieve its goal. Deployed language models almost never follow it, instead being exposed through a single API endpoint that serves all users and requests. This gap exists not because least privilege would be unhelpful; deployments would benefit greatly from reducing unnecessary capability exposure. The real obstacle is definitional and mechanistic: what does "access" mean inside a language model, and how can we enforce it without retraining or deploying multiple models? We take inspiration from least privilege in computer systems and define a class of models called least-privilege language models, where privilege is reachable internal computation during the forward pass. In this view, lowering privilege literally shrinks the model's accessible function class, as opposed to denying access via learned policies. We formalize deployment-time control as a monitor-allocator-enforcer stack, separating (i) request-time signals, (ii) a decision rule that allocates privilege, and (iii) an inference-time mechanism that selects privilege. We then propose Nested Least-Privilege Networks, a shape-preserving, rank-indexed intervention that provides a smooth, reversible control knob. We show that this knob yields policy-usable privilege-utility frontiers and enables selective suppression of targeted capabilities with limited collateral degradation across various policies. Most importantly, we argue for a new deployment paradigm that challenges the premise that language models can only be controlled at the output level.

2601.13286 2026-03-05 econ.GN cs.AI q-fin.EC

AI Skills Improve Job Prospects: Causal Evidence from a Hiring Experiment

Fabian Stephany, Ole Teutloff, Angelo Leone

Comments 57 pages

The growing adoption of artificial intelligence (AI) technologies has heightened interest in the labor market value of AI-related skills, yet causal evidence on their role in hiring decisions remains scarce. This study examines whether AI skills serve as a positive hiring signal and whether they can offset conventional disadvantages such as older age or lower formal education. We conducted an experimental survey with 1,725 recruiters from the United Kingdom, the United States, and Germany. Using a paired conjoint design, recruiters evaluated hypothetical candidates represented by synthetically designed resumes. Across three occupations (graphic design, office assistance, and software engineering), AI skills significantly increase interview invitation probabilities by approximately 8 to 15 percentage points, compared with candidates without such skills. AI credentials, such as university- or company-backed skill certificates, lead to only a moderate increase in invitation probabilities compared with self-declaration of AI skills. AI skills also partially or fully offset disadvantages related to age and lower education, with effects strongest for office assistants, for whom formal AI certificates play a significant additional compensatory role. Effects are weaker for graphic designers, consistent with more skeptical recruiter attitudes toward AI in creative work. Finally, recruiters' own background and AI usage significantly moderate these effects. Overall, the findings demonstrate that AI skills function as a powerful hiring signal and can mitigate traditional labor market disadvantages, with implications for workers' skill acquisition strategies and firms' recruitment practices.

2512.18925 2026-03-05 cs.SE cs.AI cs.HC

Beyond the Prompt: An Empirical Study of Cursor Rules

Shaokang Jiang, Daye Nam

Comments To appear at MSR 2026

While Large Language Models (LLMs) have demonstrated remarkable capabilities, research shows that their effectiveness depends not only on explicit prompts but also on the broader context provided. This requirement is especially pronounced in software engineering, where the goals, architecture, and collaborative conventions of an existing project play critical roles in response quality. To support this, many AI coding assistants have introduced ways for developers to author persistent, machine-readable directives that encode a project's unique constraints. Although this practice is growing, the content of these directives remains unstudied. This paper presents a large-scale empirical study to characterize this emerging form of developer-provided context. Through a qualitative analysis of 401 open-source repositories containing cursor rules, we developed a comprehensive taxonomy of project context that developers consider essential, organized into five high-level themes: Conventions, Guidelines, Project Information, LLM Directives, and Examples. Our study also explores how this context varies across different project types and programming languages, offering implications for the next generation of context-aware AI developer tools.

2512.12594 2026-03-05 cs.CR cs.LG

ceLLMate: Sandboxing Browser AI Agents

Luoxi Meng, Henry Feng, Ilia Shumailov, Earlence Fernandes

Comments Homepage: https://cellmate-sandbox.github.io

Browser-using agents (BUAs) are an emerging class of AI agents that interact with web browsers in human-like ways, including clicking, scrolling, filling forms, and navigating across pages. While these agents help automate repetitive online tasks, they are vulnerable to prompt injection attacks that trick an agent into performing undesired actions, such as leaking private information or issuing unintended state-changing requests. We propose ceLLMate, a browser-level sandboxing framework that restricts the agent's ambient authority and reduces the blast radius of prompt injections. We address the semantic gap challenge that is fundamental to BUAs -- writing and enforcing security policies for low-level UI tools like clicks and keystrokes is brittle and error-prone. Our core insight is to perform sandboxing at the HTTP layer because all side-effecting UI operations will result in network communication to the website's backend. We implement ceLLMate as an agent-agnostic browser extension and demonstrate how it enables sandboxing policies that block prompt injection attacks in the WASP benchmark with 7.25--15% latency overhead.
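
A minimal sketch of what a policy check at the HTTP layer could look like, assuming a hypothetical policy made of an allowed-host set plus explicit rules for state-changing requests. The policy format, the `is_allowed` function, and the example hosts are illustrative assumptions, not ceLLMate's actual policy language or implementation.

```python
from urllib.parse import urlparse

# Methods treated as read-only in this sketch.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def is_allowed(method, url, policy):
    """Allow read-only requests to permitted hosts; state-changing ones need an explicit rule."""
    parsed = urlparse(url)
    if parsed.hostname not in policy["hosts"]:
        return False
    if method.upper() in SAFE_METHODS:
        return True
    return (method.upper(), parsed.path) in policy["write_rules"]

# Hypothetical policy for a shopping task: browsing is fine, only adding to the cart may change state.
policy = {
    "hosts": {"shop.example"},
    "write_rules": {("POST", "/cart/add")},
}
```

Under this policy, `is_allowed("POST", "https://shop.example/account/delete", policy)` returns `False` regardless of which click or keystroke produced the request, which illustrates why checking at the network boundary avoids writing rules over low-level UI actions.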

2512.06850 2026-03-05 cs.LO cs.AI cs.AR

Formal that "Floats" High: Formal Verification of Floating Point Arithmetic

Hansa Mohanty, Vaisakh Naduvodi Viswambharan, Deepak Narayan Gadde

Comments To appear at the 37th IEEE International Conference on Microelectronics (ICM), December 14-17, 2025, Cairo, Egypt

Formal verification of floating-point arithmetic remains challenging due to non-linear arithmetic behavior and the tight coupling between control and datapath logic. Existing approaches often rely on high-level C models for equivalence checking against Register Transfer Level (RTL) designs, but this introduces abstraction gaps, translation overhead, and limits scalability at the RTL level. To address these challenges, this paper presents a scalable methodology for verifying floating-point arithmetic using direct RTL-to-RTL model checking against a golden reference model. The approach adopts a divide-and-conquer strategy that decomposes verification into modular stages, each captured by helper assertions and lemmas that collectively prove a main correctness theorem. Counterexample (CEX)-guided refinement is used to iteratively localize and resolve implementation defects, while targeted fault injection validates the robustness of the verification process against precision-critical datapath errors. To assess scalability and practicality, the methodology is extended with agentic AI-based formal property generation, integrating large language model (LLM)-driven automation with Human-in-the-Loop (HITL) refinement. Coverage analysis evaluates the effectiveness of the approach by comparing handwritten and AI-generated properties in both RTL-to-RTL model checking and standalone RTL verification settings. Results show that direct RTL-to-RTL model checking achieves higher coverage efficiency and requires fewer assertions than standalone verification, especially when combined with AI-generated properties refined through HITL guidance.

2510.26840 2026-03-05 cs.DB cs.AI cs.FL cs.LO

SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification

Rocky Klopfenstein, Yang He, Andrew Tremante, Yuepeng Wang, Nina Narodytska, Haoze Wu

Comments Accepted at ICLR'26

Community-driven Text-to-SQL evaluation platforms play a pivotal role in tracking the state of the art of Text-to-SQL performance. The reliability of the evaluation process is critical for driving progress in the field. Current evaluation methods are largely test-based, which involves comparing the execution results of a generated SQL query and a human-labeled ground-truth on a static test database. Such an evaluation is optimistic, as two queries can coincidentally produce the same output on the test database while actually being different. In this work, we propose a new alternative evaluation pipeline, called SpotIt, where a formal bounded equivalence verification engine actively searches for a database that differentiates the generated and ground-truth SQL queries. We develop techniques to extend existing verifiers to support a richer SQL subset relevant to Text-to-SQL. A performance evaluation of ten Text-to-SQL methods on the high-profile BIRD dataset suggests that test-based methods can often overlook differences between the generated query and the ground-truth. Further analysis of the verification results reveals a more complex picture of the current Text-to-SQL evaluation.
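
A minimal sketch of the failure mode described above, using Python's built-in `sqlite3`: two non-equivalent queries return the same rows on a static test database, and a second database tells them apart. The schema, the queries, and the hand-written witness database are illustrative assumptions; SpotIt searches for such a database with a bounded equivalence verifier, which this sketch does not implement.

```python
import sqlite3

def run(rows, query):
    """Load rows into a fresh in-memory table and return the query result, order-insensitive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    result = sorted(conn.execute(query).fetchall())
    conn.close()
    return result

GOLD = "SELECT name FROM employees WHERE salary > 50000"
PRED = "SELECT name FROM employees WHERE salary >= 50000"  # differs only at the boundary

# Static test database: no row sits exactly on the boundary, so the queries agree.
test_db = [("Ann", "eng", 70000), ("Bob", "ops", 40000)]
# Differentiating database: one row with salary exactly 50000 exposes the difference.
witness_db = [("Ann", "eng", 70000), ("Cid", "ops", 50000)]

agree_on_test = run(test_db, GOLD) == run(test_db, PRED)
differ_on_witness = run(witness_db, GOLD) != run(witness_db, PRED)
```

A test-based evaluation that only sees `test_db` would mark `PRED` as correct, while the witness database shows the two queries are not equivalent.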

2510.22739 2026-03-05 cs.IR cs.AI cs.CL

REVISION: Reflective Intent Mining and Online Reasoning Auxiliary for E-commerce Visual Search System Optimization

Yiwen Tang, Qiuyu Zhao, Zenghui Sun, Jinsong Lan, Xiaoyong Zhu, Bo Zheng

In Taobao e-commerce visual search, user behavior analysis reveals a large proportion of no-click requests, suggesting diverse and implicit user intents. These intents are expressed in various forms and are difficult to mine and discover, thereby leading to the limited adaptability and lag in platform strategies. This greatly restricts users' ability to express diverse intents and hinders the scalability of the visual search system. This mismatch between user implicit intent expression and system response defines the User-SearchSys Intent Discrepancy. To alleviate the issue, we propose a novel framework REVISION. This framework integrates offline reasoning mining with online decision-making and execution, enabling adaptive strategies to solve implicit user demands. In the offline stage, we construct a periodic pipeline to mine discrepancies from historical no-click requests. Leveraging large models, we analyze implicit intent factors and infer optimal suggestions by jointly reasoning over query and product metadata. These inferred suggestions serve as actionable insights for refining platform strategies. In the online stage, REVISION-R1-3B, trained on the curated offline data, performs holistic analysis over query images and associated historical products to generate optimization plans and adaptively schedule strategies across the search pipeline. Our framework offers a streamlined paradigm for integrating large models with traditional search systems, enabling end-to-end intelligent optimization across information aggregation and user interaction. Experimental results demonstrate that our approach improves the efficiency of implicit intent mining from large-scale search logs and significantly reduces the no-click rate.

2509.25095 2026-03-05 eess.SP cs.LG

Benchmarking ECG FMs: A Reality Check Across Clinical Tasks

M A Al-Masud, Juan Miguel Lopez Alcaraz, Nils Strodthoff

Comments Accepted at ICLR 2026. OpenReview: https://openreview.net/forum?id=xXRqWpt3Xr

The 12-lead electrocardiogram (ECG) is a long-standing diagnostic tool. Yet machine learning for ECG interpretation remains fragmented, often limited to narrow tasks or datasets. Foundation models (FMs) promise broader adaptability, but fundamental questions remain: Which architectures generalize best? How do models scale with limited labels? What explains performance differences across model families? We benchmarked eight ECG FMs on 26 clinically relevant tasks using 12 public datasets comprising 1,650 regression and classification targets. Models were evaluated under fine-tuning and frozen settings, with scaling analyses across dataset sizes. Results show heterogeneous performance across domains: in adult ECG interpretation, three FMs consistently outperformed strong supervised baselines. In contrast, ECG-CPC, a compact structured state-space model, dominated 5 of 7 task categories, demonstrating that architecture matters more than scale. FMs improved label efficiency 3.3-9x over supervised baselines, though scaling behaviors varied across architectures. Representation analysis reveals that models with similar performance learn markedly different internal structures, suggesting multiple viable paths to effective ECG representation. Overall, while FMs show promise for adult ECG analysis, substantial gaps remain in cardiac structure, outcome prediction, and patient characterization. ECG-CPC's strong performance despite being orders of magnitude smaller challenges the assumption that FM quality requires massive scale, highlighting architectural inductive biases as an untapped opportunity.

2509.21091 2026-03-05 stat.ML cs.AI cs.LG

Best-of-$\infty$ -- Asymptotic Performance of Test-Time LLM Ensembling

Junpei Komiyama, Daisuke Oba, Masafumi Oyamada

Comments To appear at ICLR2026. Our code is available at https://github.com/jkomiyama/BoInf-code-publish/. Updated the title

We study best-of-$N$ for large language models (LLMs) where the selection is based on majority voting. In particular, we analyze the limit $N \to \infty$, which we denote as best-of-$\infty$. While this approach achieves impressive performance in the limit, it requires an infinite test-time budget. To address this, we propose an adaptive generation scheme that selects $N$ based on answer agreement, thereby efficiently allocating inference-time computation. Beyond adaptivity, we extend the framework to weighted ensembles of multiple LLMs, showing that such mixtures can outperform any individual model. The optimal ensemble weighting is formulated and efficiently computed as a mixed-integer linear program. Extensive experiments demonstrate the effectiveness of our approach.
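
A minimal sketch of majority voting with an agreement-based stopping rule, to make the idea of choosing $N$ adaptively concrete. The rule used here (stop once the leading answer is `margin` votes ahead of the runner-up) and all names and default values are hypothetical assumptions; the paper's actual adaptive scheme and its ensemble weighting are not reproduced.

```python
from collections import Counter

def adaptive_majority_vote(sample_answer, min_samples=3, max_samples=50, margin=3):
    """Draw answers until the leader is `margin` votes ahead of the runner-up (hypothetical rule)."""
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        ranked = counts.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        lead = ranked[0][1] - runner_up
        if n >= min_samples and lead >= margin:
            break
    # Return the current majority answer and the number of generations actually used.
    return counts.most_common(1)[0][0], n
```

Here `sample_answer` stands in for one LLM generation reduced to its final answer. When generations agree, the loop stops after a few samples; when they disagree, it spends more of the budget, up to `max_samples`.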

2509.13471 2026-03-05 cs.SE cs.AI

An LLM Agentic Approach for Legal-Critical Software: A Case Study for Tax Prep Software

Sina Gogani-Khiabani, Ashutosh Trivedi, Diptikalyan Saha, Saeid Tizpaz-Niari

Comments To appear at ICSE 26. 12 pages

Large language models (LLMs) show promise for translating natural-language statutes into executable logic, but reliability in legally critical settings remains challenging due to ambiguity and hallucinations. We present an agentic approach for developing legal-critical software, using U.S. federal tax preparation as a case study. The key challenge is test-case generation under the oracle problem, where correct outputs require interpreting law. Building on metamorphic testing, we introduce higher-order metamorphic relations that compare system outputs across structured shifts among similar individuals. Because authoring such relations is tedious and error-prone, we use an LLM-driven, role-based framework to automate test generation and code synthesis. We implement a multi-agent system that translates tax code into executable software and incorporates a metamorphic-testing agent that searches for counterexamples. In experiments, our framework using a smaller model (GPT-4o-mini) achieves a worst-case pass rate of 45%, outperforming frontier models (GPT-4o and Claude 3.5, 9-15%) on complex tax-code tasks. These results support agentic LLM methodologies as a path to robust, trustworthy legal-critical software from natural-language specifications.
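
A minimal sketch of plain metamorphic testing under the oracle problem: without knowing the correct tax for any individual, one can still check a relation between two related inputs, here that raising income must not lower the tax owed. The two-bracket schedule, the seeded bug, and all function names are hypothetical assumptions and do not reflect real tax law; the paper's higher-order metamorphic relations and its agents are not reproduced.

```python
def toy_tax(income):
    """Hypothetical two-bracket tax, not real tax law: 10% up to 10,000, 20% above."""
    if income <= 10_000:
        return 0.10 * income
    return 1_000 + 0.20 * (income - 10_000)

def buggy_tax(income):
    """Same toy schedule with a seeded bug: the first-bracket amount is dropped above the threshold."""
    if income <= 10_000:
        return 0.10 * income
    return 0.20 * (income - 10_000)

def find_monotonicity_violation(tax_fn, incomes, shift=500):
    """Metamorphic relation: raising income by `shift` must not lower the tax owed."""
    for income in incomes:
        if tax_fn(income + shift) < tax_fn(income):
            return income  # counterexample: first income at which the relation fails
    return None
```

Running `find_monotonicity_violation(buggy_tax, range(0, 20_001, 250))` returns a counterexample just below the bracket threshold, while the same search on `toy_tax` finds none. No expected output for any single input was needed, only the relation between paired inputs.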

2508.09844 2026-03-05 quant-ph cs.LG

On the Generalization Limits of Quantum Generative Adversarial Networks with Pure State Generators

Jasmin Frkatovic, Akash Malemath, Ivan Kankeu, Yannick Werner, Matthias Tschöpe, Vitor Fortes Rey, Sungho Suh, Paul Lukowicz, Nikolaos Palaiodimopoulos, Maximilian Kiefer-Emmanouilidis

Comments 20 pages, 6 figures

We investigate the capabilities of Quantum Generative Adversarial Networks (QGANs) in image generation tasks. Our analysis centers on fully quantum implementations of both the generator and discriminator. Through extensive numerical testing of current main architectures, we find that QGANs struggle to generalize across datasets, converging on merely the average representation of the training data. When the output of the generator is a pure state, we analytically derive a lower bound for the discriminator quality given by the fidelity between the pure-state output of the generator and the target data distribution, thereby providing a theoretical explanation for the limitations observed in current models. Our findings reveal fundamental challenges in the generalization capabilities of existing quantum generative models. While our analysis focuses on QGANs, the results carry broader implications for the performance of related quantum generative models.

2508.08778 2026-03-05 quant-ph cs.LG

Subsampling Factorization Machine Annealing

Yusuke Hama, Tadashi Kadowaki

Comments 29 pages and 17 figures

Journal ref Phys. Rev. Research 8, 013187 (2026)

Quantum computing and machine learning are state-of-the-art technologies that have been investigated intensively in both academia and industry. Combining the two is expected to yield a powerful tool for solving complex problems in many branches of science and engineering, such as combinatorial optimization problems, and to accelerate the creation of next-generation technologies. In this work, we develop an algorithm to solve a black-box optimization problem by improving Factorization Machine Annealing (FMA) such that the training of a machine learning model called Factorization Machine is performed not on a full dataset but on a subdataset sampled from the full dataset: Subsampling Factorization Machine Annealing (SFMA). Owing to this probabilistic training process, the performance of FMA in exploring a solution space is enhanced. As a result, SFMA exhibits balanced performance of exploration and exploitation, which we call exploration-exploitation functionality. We conduct numerical benchmarking tests to compare the performance of SFMA with that of FMA. The tests confirm that SFMA exhibits the exploration-exploitation functionality and outperforms FMA in speed and accuracy. In addition, the performance of SFMA can be further improved by sequentially using two subsampling datasets with different sizes such that the size of the latter dataset is substantially smaller than the former. Such a substantial reduction not only enhances the exploration performance of SFMA but also enables us to run it with correspondingly low computational cost even for a large-scale problem. These results indicate the effectiveness of SFMA in a certain class of black-box optimization problems of significant size: the potential scalability of SFMA in solving large-scale problems with correspondingly low computational cost.

2508.04735 2026-03-05 q-bio.QM cs.AI

ERDES: A Benchmark Video Dataset for Retinal Detachment and Macular Status Classification in Ocular Ultrasound

Yasemin Ozkut, Pouyan Navard, Srikar Adhikari, Elaine Situ-LaCasse, Josie Acuña, Adrienne Yarnish, Alper Yilmaz

Comments Under Review, https://github.com/OSUPCVLab/ERDES

Retinal detachment (RD) is a vision-threatening condition that requires prompt intervention to preserve sight. A critical factor in treatment urgency and visual prognosis is macular involvement -- whether the macula is intact or detached. Point-of-care ultrasound (POCUS) is a fast, non-invasive and cost-effective imaging tool commonly used to detect RD in various clinical settings. However, its diagnostic utility is limited by the need for expert interpretation, especially in resource-limited environments. Deep learning has the potential to automate RD detection on ultrasound, but there are no clinically available models, and prior research has not addressed macular status -- an essential distinction for surgical prioritization. Additionally, no public dataset currently supports macular-based RD classification using ultrasound video. We introduce Eye Retinal DEtachment ultraSound (ERDES), the first open-access dataset of ocular ultrasound clips labeled for (i) presence of RD and (ii) macula-detached vs. macula-intact status. ERDES enables machine learning development for RD detection. We also provide baseline benchmarks by training 40 models across eight architectures, including 3D convolutional networks and transformer-based models.

2508.00450 2026-03-05 cs.IR cs.AI

When Relevance Meets Novelty: Dual-Stable Periodic Optimization for Serendipitous Recommendation

Hongxiang Lin, Hao Guo, Zeshun Li, Erpeng Xue, Yongqian He, Zhaoyu Hu, Lei Wang, Sheng Chen, Long Zeng

English abstract

Traditional recommendation systems tend to trap users in strong feedback loops by excessively pushing content aligned with their historical preferences, thereby limiting exploration opportunities and causing content fatigue. Although large language models (LLMs) demonstrate potential with their diverse content generation capabilities, existing LLM-enhanced dual-model frameworks face two major limitations: first, they overlook long-term preferences driven by group identity, leading to biased interest modeling; second, they suffer from static optimization flaws, as a one-time alignment process fails to leverage incremental user data for closed-loop optimization. To address these challenges, we propose the Co-Evolutionary Alignment (CoEA) method. To address interest modeling bias, we introduce the Dual-Stable Interest Exploration (DSIE) module, which jointly models long-term group identity and short-term individual interests through parallel processing of behavioral sequences. To address static optimization, we design a Periodic Collaborative Optimization (PCO) mechanism. This mechanism regularly conducts preference verification on incremental data using the Relevance LLM, then guides the Novelty LLM to perform fine-tuning based on the verification results, and subsequently feeds the output of the continually fine-tuned Novelty LLM back to the Relevance LLM for re-evaluation, thereby achieving dynamic closed-loop optimization. Extensive online and offline experiments verify the effectiveness of the CoEA model in serendipitous recommendation.

2507.12686 2026-03-05 stat.ML cs.LG math.PR math.ST stat.TH

Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights

Krishnakumar Balasubramanian, Nathan Ross

Comments To appear in Bernoulli Journal

English abstract

We study the Finite-Dimensional Distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments. Specifically, we establish Gaussian approximation bounds in the Wasserstein-$1$ norm between the FDDs and their Gaussian limit assuming a Lipschitz activation function and allowing the layer widths to grow to infinity at arbitrary relative rates. In the special case where all widths are proportional to a common scale parameter $n$ and there are $L-1$ hidden layers, we obtain convergence rates of order $n^{-(1/6)^{L-1} + \epsilon}$ for any $\epsilon > 0$.
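To make the depth dependence of the stated rate concrete, the exponent can be evaluated for small depths. This is plain substitution into the formula quoted in the abstract, not an additional result from the paper:

$$
\begin{aligned}
L = 2 \ (\text{one hidden layer}) &: \quad n^{-(1/6)^{1} + \epsilon} = n^{-1/6 + \epsilon},\\
L = 3 \ (\text{two hidden layers}) &: \quad n^{-(1/6)^{2} + \epsilon} = n^{-1/36 + \epsilon},\\
L = 4 \ (\text{three hidden layers}) &: \quad n^{-(1/6)^{3} + \epsilon} = n^{-1/216 + \epsilon}.
\end{aligned}
$$

Each extra hidden layer divides the leading exponent by six, so the guaranteed rate in $n$ weakens quickly as depth grows.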

2507.06764 2026-03-05 eess.IV cs.CV cs.LG math.OC

Fast Equivariant Imaging: Acceleration for Unsupervised Learning via Augmented Lagrangian and Auxiliary PnP Denoisers

Guixian Xu, Jinglai Li, Junqi Tang

Comments 31 pages

English abstract

In this work, we propose Fast Equivariant Imaging (FEI), a novel unsupervised learning framework to rapidly and efficiently train deep imaging networks without ground-truth data. By reformulating the Equivariant Imaging (EI) optimization problem via the method of Lagrange multipliers and utilizing plug-and-play denoisers, this unsupervised scheme shows superior efficiency and performance compared to the vanilla EI paradigm. In particular, our FEI schemes achieve an order-of-magnitude (10x) acceleration over standard EI when training U-Net for X-ray CT reconstruction and image inpainting, with improved generalization performance. In addition, the proposed scheme enables efficient test-time adaptation of a pretrained model to individual samples to secure further performance improvements. Extensive experiments show that the proposed approach provides a noticeable efficiency and performance gain over existing unsupervised methods and model adaptation techniques.

2506.03317 2026-03-05 physics.optics cs.CV cs.LG physics.app-ph

Structural Vibration Monitoring with Diffractive Optical Processors

Yuntian Wang, Zafer Yilmaz, Yuhang Li, Edward Liu, Eric Ahlberg, Farid Ghahari, Ertugrul Taciroglu, Aydogan Ozcan

Comments 33 Pages, 8 Figures, 1 Table

Journal ref Science Advances (2026)

English abstract

Structural Health Monitoring (SHM) is vital for maintaining the safety and longevity of civil infrastructure, yet current solutions remain constrained by cost, power consumption, scalability, and the complexity of data processing. Here, we present a diffractive vibration monitoring system, integrating a jointly optimized diffractive layer with a shallow neural network-based backend to remotely extract 3D structural vibration spectra, offering a low-power, cost-effective and scalable solution. This architecture eliminates the need for dense sensor arrays or extensive data acquisition; instead, it uses a spatially-optimized passive diffractive layer that encodes 3D structural displacements into modulated light, captured by a minimal number of detectors and decoded in real-time by shallow and low-power neural networks to reconstruct the 3D displacement spectra of structures. The diffractive system's efficacy was demonstrated both numerically and experimentally using millimeter-wave illumination on a laboratory-scale building model with a programmable shake table. Our system achieves more than an order-of-magnitude improvement in accuracy over conventional optics or separately trained modules, establishing a foundation for high-throughput 3D monitoring of structures. Beyond SHM, the 3D vibration monitoring capabilities of this cost-effective and data-efficient framework establish a new computational sensing modality with potential applications in disaster resilience, aerospace diagnostics, and autonomous navigation, where energy efficiency, low latency, and high-throughput are critical.

2505.22554 2026-03-05 stat.ML cs.LG

A Copula Based Supervised Filter for Feature Selection in Diabetes Risk Prediction Using Machine Learning

Agnideep Aich, Md Monzur Murshed, Sameera Hewage, Amanda Mayeaux

English abstract

Effective feature selection is critical for robust and interpretable predictive modeling in medicine, especially when risk factors matter most in extreme patient strata. Many standard selectors emphasize average associations and can miss predictors whose relevance is concentrated in the distribution tails. We propose a computationally efficient supervised filter based on a Gumbel-copula implied upper-tail concordance score (lambda U), defined as a monotone transformation of Kendall's tau, to rank features by their tendency to be simultaneously extreme with the positive class. We compare against four common baselines (Mutual Information, mRMR, ReliefF, and L1/Elastic-Net) across four classifiers on two diabetes datasets: a large-scale public health survey (CDC, N=253,680) and a clinical benchmark (PIMA, N=768). Analyses include statistical testing, permutation importance, and robustness checks. On CDC, the proposed selector is the fastest and reduces 21 features to 10 (a reduction of approximately 52%). This yields a small but statistically significant trade-off relative to using all features, while performing better than standard filters (Mutual Information, mRMR) and comparably to the strong ReliefF baseline. On PIMA (8 predictors), the resulting ranking attains the highest ROC-AUC numerically, though paired DeLong tests show no significant differences versus strong baselines; PIMA therefore serves as a ranking-only sanity check in a low-dimensional setting. Across both datasets, the lambda U-based selector highlights clinically coherent predictors and provides an efficient, interpretable screening step that can complement standard feature-selection methods in public health and clinical risk prediction.
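The score described above can be illustrated with the standard Gumbel-copula identities: Kendall's tau relates to the copula parameter by theta = 1 / (1 - tau), and the upper-tail dependence coefficient is lambda U = 2 - 2^(1/theta), which gives lambda U = 2 - 2^(1 - tau). The sketch below is an assumption-laden illustration, not the authors' implementation: it uses tau-a (ties counted as neither concordant nor discordant), clips negative tau to zero because the Gumbel copula only models non-negative dependence, and the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1
        elif prod < 0:
            score -= 1
    return score / (n * (n - 1) / 2)


def gumbel_lambda_u(tau):
    """Upper-tail dependence implied by a Gumbel copula with Kendall's tau.

    Negative tau is clipped to 0 (an assumption made for this sketch).
    """
    tau = min(max(tau, 0.0), 1.0)
    return 2.0 - 2.0 ** (1.0 - tau)


def rank_features(columns, target):
    """Return feature names sorted by decreasing lambda U against the target."""
    scores = {name: gumbel_lambda_u(kendall_tau(col, target))
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because lambda U is a monotone function of tau on [0, 1], the ranking coincides with ranking by non-negative Kendall's tau; the transformation changes the scale of the score, not the order.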

2505.05489 2026-03-05 cs.NE cs.LG

Akkumula: Evidence accumulation driver models with Spiking Neural Networks

Alberto Morando

English abstract

Processes of evidence accumulation can make driver models more realistic, by explaining how drivers adjust their actions based on perceptual inputs and decision boundaries. The absence of a standard modelling approach limits their adoption; existing methods are hand-crafted, hard to adapt, and computationally inefficient. This paper presents Akkumula, an evidence accumulation modelling framework that uses Spiking Neural Networks and other deep learning techniques. Tested on data from a test-track experiment, the model can reproduce the time course of braking, accelerating, and steering. Akkumula integrates with existing machine learning architectures, scales to large datasets, adapts to different driving scenarios, and keeps its internal logic relatively transparent.
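The idea of accumulating evidence up to a decision boundary can be shown with a generic leaky accumulator. This is a textbook-style sketch, not the Akkumula model: the function name `first_crossing`, the discrete-time update, and the `leak` parameter are assumptions introduced for illustration.

```python
def first_crossing(evidence, threshold, leak=0.0):
    """Index of the first step where accumulated evidence reaches the threshold.

    At each step the activation decays by a factor (1 - leak) and the new
    evidence sample is added. Returns None if the boundary is never reached.
    """
    if not 0.0 <= leak < 1.0:
        raise ValueError("leak must be in [0, 1)")
    activation = 0.0
    for step, sample in enumerate(evidence):
        activation = (1.0 - leak) * activation + sample
        if activation >= threshold:
            return step
    return None
```

With a perceptual cue as `evidence`, the returned step would play the role of a response time (for example, the onset of braking); a larger `leak` or `threshold` delays or suppresses the response.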

2505.02211 2026-03-05 eess.IV cs.CV

Intelligent Diagnosis Using Dual-Branch Attention Network for Rare Thyroid Carcinoma Recognition with Ultrasound Imaging

Peiqi Li, Yincheng Gao, Renxing Li, Haojie Yang, Yunyun Liu, Boji Liu, Jiahui Ni, Ying Zhang, Yulu Wu, Xiaowei Fang, Lehang Guo, Liping Sun, Jiangang Chen

English abstract

Heterogeneous morphological features and data imbalance pose significant challenges in rare thyroid carcinoma classification using ultrasound imaging. To address these challenges, we propose a novel multitask learning framework, Channel-Spatial Attention Synergy Network (CSASN), which integrates a dual-branch feature extractor, combining EfficientNet for local spatial encoding and ViT for global semantic modeling, with a cascaded channel-spatial attention refinement module. A residual multiscale classifier and dynamically weighted loss function further enhance classification stability and accuracy. The framework is trained on a multicenter dataset comprising more than 2000 patients from four clinical institutions. Extensive ablation studies demonstrate that each module contributes significantly to model performance, particularly in recognizing rare subtypes such as FTC and MTC. Experimental results show that CSASN outperforms existing single-stream CNN or Transformer-based models, achieving a superior balance between precision and recall under class-imbalanced conditions. This framework provides a promising strategy for AI-assisted thyroid cancer diagnosis.

2504.10507 2026-03-05 cs.IR cs.LG

PinRec: Outcome-Conditioned, Multi-Token Generative Retrieval for Industry-Scale Recommendation Systems

Prabhat Agarwal, Anirudhan Badrinath, Laksh Bhasin, Jaewon Yang, Edoardo Botta, Jiajing Xu, Charles Rosenberg

English abstract

Generative retrieval methods utilize generative sequential modeling techniques, such as transformers, to generate candidate items for recommender systems. These methods have demonstrated promising results in academic benchmarks, surpassing traditional retrieval models like two-tower architectures. However, current generative retrieval methods lack the scalability required for industrial recommender systems, and they are insufficiently flexible to satisfy the multiple metric requirements of modern systems. This paper introduces PinRec, a novel generative retrieval model developed for applications at Pinterest. PinRec utilizes outcome-conditioned generation, enabling modelers to specify how to balance various outcome metrics, such as the number of saves and clicks, to effectively align with business goals and user exploration. Additionally, PinRec incorporates multi-token generation to enhance output diversity while optimizing generation. Our experiments demonstrate that PinRec can successfully balance performance, diversity, and efficiency, delivering a significant positive impact to users using generative models. This paper marks a significant milestone in generative retrieval, as it presents, to our knowledge, the first rigorous study on implementing generative retrieval at the scale of Pinterest.

2504.07109 2026-03-05 cs.IR cs.AI cs.CL

OSCAR: Online Soft Compression And Reranking

Maxime Louis, Thibault Formal, Hervé Dejean, Stéphane Clinchant

English abstract

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external knowledge, leading to improved accuracy and relevance. However, scaling RAG pipelines remains computationally expensive as retrieval sizes grow. To address this, we introduce OSCAR, a novel query-dependent online soft compression method that reduces computational overhead while preserving performance. Unlike traditional hard compression methods, which shorten retrieved texts, or soft compression approaches, which map documents to continuous embeddings offline, OSCAR dynamically compresses retrieved information at inference time, eliminating storage overhead and enabling higher compression rates. Additionally, we extend OSCAR to simultaneously perform reranking, further optimizing the efficiency of the RAG pipeline. Our experiments demonstrate state-of-the-art performance with a 2-5x speed-up in inference and minimal to no loss in accuracy for LLMs ranging from 1B to 24B parameters. The models are available at: https://huggingface.co/collections/naver/oscar-67d446a8e3a2551f57464295.

2503.03141 2026-03-05 eess.IV cs.CV cs.LG

Implicit U-KAN2.0: Dynamic, Efficient and Interpretable Medical Image Segmentation

Chun-Wun Cheng, Yining Zhao, Yanqi Cheng, Javier A. Montoya-Zegarra, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Comments Accepted in MICCAI 2025

English abstract

Image segmentation is a fundamental task in both image analysis and medical applications. State-of-the-art methods predominantly rely on encoder-decoder architectures with a U-shaped design, commonly referred to as U-Net. Recent advancements integrating transformers and MLPs improve performance but still face key limitations, such as poor interpretability, difficulty handling intrinsic noise, and constrained expressiveness due to discrete layer structures, often lacking a solid theoretical foundation. In this work, we introduce Implicit U-KAN 2.0, a novel U-Net variant that adopts a two-phase encoder-decoder structure. In the SONO phase, we use second-order neural ordinary differential equations (NODEs), called the SONO block, for a more efficient, expressive, and theoretically grounded modeling approach. In the SONO-MultiKAN phase, we integrate the second-order NODEs and MultiKAN layer as the core computational block to enhance interpretability and representation power. Our contributions are threefold. First, U-KAN 2.0 is an implicit deep neural network incorporating MultiKAN and second-order NODEs, improving interpretability and performance while reducing computational costs. Second, we provide a theoretical analysis demonstrating that the approximation ability of the MultiKAN block is independent of the input dimension. Third, we conduct extensive experiments on a variety of 2D datasets and a single 3D dataset, demonstrating that our model consistently outperforms existing segmentation networks. Project Website: https://math-ml-x.github.io/IUKAN2/