arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 2967
2603.26940 2026-03-31 stat.ML cs.LG math.PR

Static and Dynamic Approaches to Computing Barycenters of Probability Measures on Graphs

David Gentile, James M. Murphy

Comments 31 pages, 17 figures, 1 table

详情
英文摘要

The optimal transportation problem defines a geometry of probability measures which leads to a definition for weighted averages (barycenters) of measures, finding application in the machine learning and computer vision communities as a signal processing tool. Here, we implement a barycentric coding model for measures which are supported on a graph, a context in which the classical optimal transport geometry becomes degenerate, by leveraging a Riemannian structure on the simplex induced by a dynamic formulation of the optimal transport problem. We approximate the exponential mapping associated to the Riemannian structure, as well as its inverse, by utilizing past approaches which compute action minimizing curves in order to numerically approximate transport distances for measures supported on discrete spaces. Intrinsic gradient descent is then used to synthesize barycenters, wherein gradients of a variance functional are computed by approximating geodesic curves between the current iterate and the reference measures; iterates are then pushed forward via a discretization of the continuity equation. Analysis of measures with respect to given dictionary of references is performed by solving a quadratic program formed by computing geodesics between target and reference measures. We compare our novel approach to one based on entropic regularization of the static formulation of the optimal transport problem where the graph structure is encoded via graph distance functions, we present numerical experiments validating our approach, and we conclude that intrinsic gradient descent on the probability simplex provides a coherent framework for the synthesis and analysis of measures supported on graphs.

2603.13846 2026-03-31 cs.HC cs.AI

Is Seeing Believing? Evaluating Human Sensitivity to Synthetic Video

David Wegmann, Emil Stevnsborg, Søren Knudsen, Luca Rossi, Aske Mottelson

详情
英文摘要

Advances in machine learning have enabled the creation of realistic synthetic videos known as deepfakes. As deepfakes proliferate, concerns about rapid spread of disinformation and manipulation of public perception are mounting. Despite the alarming implications, our understanding of how individuals perceive synthetic media remains limited, obstructing the development of effective mitigation strategies. This paper aims to narrow this gap by investigating human responses to visual and auditory distortions of videos and deepfake-generated visuals and narration. In two between-subjects experiments, we study whether audio-visual distortions affect cognitive processing, such as subjective credibility assessment and objective learning outcomes. A third study reveals that artifacts from deepfakes influence credibility. The three studies show that video distortions and deepfake artifacts can reduce credibility. Our research contributes to the ongoing exploration of the cognitive processes involved in the evaluation and perception of synthetic videos, and underscores the need for further theory development concerning deepfake exposure.

2603.13294 2026-03-31 cs.CY cs.AI

Real-World AI Evaluation: How FRAME Generates Systematic Evidence to Resolve the Decision-Maker's Dilemma

Reva Schwartz, Gabriella Waters

Comments 19 pages, 4 tables, 5 figures

详情
英文摘要

Organizational leaders are being asked to make high-stakes decisions about AI deployment without dependable evidence of what these systems actually do in the environments they oversee. The predominant AI evaluation ecosystem yields scalable but abstract metrics that reflect the priorities of model development. By smoothing over the heterogeneity of real-world use, these model-centric approaches obscure how behavior varies across users, workflows, and settings, and rarely show where risk and value accumulate in practice. More user-centric studies reveal rich contextual detail, yet are fragmented, small-scale and loosely coupled to the mechanisms that shape model behavior. The Forum for Real-World AI Measurement and Evaluation (FRAME) aims to address this gap by combining large-scale trials of AI systems with structured observation of how they are used in context, the outcomes they generate, and how those outcomes arise. By tracing the path from an AI system's output through its practical use and downstream effects, FRAME turns the heterogeneity of AI-in-use into a measurable signal rather than a trade-off for achieving scale. The Forum establishes two core assets to achieve this: a Testing Sandbox that captures AI-in-use under real workflows at scale and a Metrics Hub that translates those traces into actionable indicators.

2602.14477 2026-03-31 cs.HC cs.AI cs.CY cs.SI

When AI Agents Teach Each Other: Discourse Patterns Resembling Peer Learning in the Moltbook Community

Eason Chen, Ce Guan, A Elshafiey, Zhonghao Zhao, Joshua Zekeri, Afeez Edeifo Shaibu, Emmanuel Osadebe Prince

Comments 7 pages, 1 figure. Revised version addressing reviewer feedback: added statistical inference, human baselines, redefined design principles as hypotheses, clarified anti-anthropomorphization stance

详情
英文摘要

Peer learning, where learners teach and learn from each other, is foundational to educational practice. A novel phenomenon has emerged: AI agents forming communities where they share skills, discoveries, and collaboratively discuss knowledge. This paper presents an educational data mining analysis of Moltbook, a large-scale community where over 2.4 million AI agents engage in discourse that structurally resembles peer learning. Analyzing 28,683 posts (after filtering automated spam) and 138 comment threads with statistical and qualitative methods, we identify discourse patterns consistent with peer learning behaviors: agents share skills they built (74K comments on a skill tutorial), report discoveries, and engage in collaborative problem-solving. Qualitative comment analysis reveals a taxonomy of response patterns: validation (22%), knowledge extension (18%), application (12%), and metacognitive reflection (7%), coded by two independent raters (Cohen's $κ= 0.78$). We characterize how these AI discourse patterns differ from human peer learning: (1) statements outperform questions with an 11.4:1 ratio ($χ^2 = 847.3$, $p < .001$); (2) procedural content receives significantly higher engagement than other content (Kruskal-Wallis $H = 312.7$, $p < .001$); (3) extreme participation inequality (Gini = 0.91 for comments) reveals non-human behavioral signatures. We propose six empirically grounded hypotheses for educational AI design. Crucially, we distinguish between surface-level discourse patterns and underlying cognitive processes: whether agents "learn" in any meaningful sense remains an open question. Our work provides the first empirical characterization of peer-learning-like discourse among AI agents, contributing to EDM's understanding of AI-populated educational environments.

2602.10655 2026-03-31 cs.SE cs.RO

Assessing Vision-Language Models for Perception in Autonomous Underwater Robotic Software

Muhammad Yousaf, Aitor Arrieta, Shaukat Ali, Paolo Arcaini, Shuai Wang

Comments 16 pages, 5 figures

详情
英文摘要

Autonomous Underwater Robots (AURs) operate in challenging underwater environments, including low visibility and harsh water conditions. Such conditions present challenges for software engineers developing perception modules for the AUR software. To successfully carry out these tasks, deep learning has been incorporated into the AUR software to support its operations. However, the unique challenges of underwater environments pose difficulties for deep learning models, which often rely on labeled data that is scarce and noisy. This may undermine the trustworthiness of AUR software that relies on perception modules. Vision-Language Models (VLMs) offer promising solutions for AUR software as they generalize to unseen objects and remain robust in noisy conditions by inferring information from contextual cues. Despite this potential, their performance and uncertainty in underwater environments remain understudied from a software engineering perspective. Motivated by the needs of an industrial partner in assurance and risk management for maritime systems to assess the potential use of VLMs in this context, we present an empirical evaluation of VLM-based perception modules within the AUR software. We assess their ability to detect underwater trash by computing performance, uncertainty, and their relationship, to enable software engineers to select appropriate VLMs for their AUR software.

2602.08482 2026-03-31 cs.DB cs.AI

CLEAR: A Knowledge-Centric Vessel Trajectory Analysis Platform

Hengyu Liu, Tianyi Li, Haoyu Wang, Kristian Torp, Yushuai Li, Tiancheng Zhang, Torben Bach Pedersen, Christian S. Jensen

Comments 4 pages, 5 figures. Accepted at SIGMOD 2026 Demo Track

详情
英文摘要

Vessel trajectory data from the Automatic Identification System (AIS) is used widely in maritime analytics. Yet, analysis is difficult for non-expert users due to the incompleteness and complexity of AIS data. We present CLEAR, a knowledge-centric vessel trajectory analysis platform that aims to overcome these barriers. By leveraging the reasoning and generative capabilities of Large Language Models (LLMs), CLEAR transforms raw AIS data into complete, interpretable, and easily explorable vessel trajectories through a Structured Data-derived Knowledge Graph (SD-KG). As part of the demo, participants can configure parameters to automatically download and process AIS data, observe how trajectories are completed and annotated, inspect both raw and imputed segments together with their SD-KG evidence, and interactively explore the SD-KG through a dedicated graph viewer, gaining an intuitive and transparent understanding of vessel movements.

2601.01331 2026-03-31 cs.CY cs.CL cs.LG

AppellateGen: A Benchmark for Appellate Legal Judgment Generation

Hongkun Yang, Lionel Z. Wang, Wei Fan, Yiran Hu, Lixu Wang, Chenyu Liu, Yu Zeng, Shenghong Fu, Lei Gong, Zhengxin Zhang, Haoyang Li, Jiexin Zheng, Xin Xu

Comments 15 pages, 4 figures, 3 tables

详情
英文摘要

Legal judgment generation is a critical task in legal intelligence. However, existing research in legal judgment generation has predominantly focused on first-instance trials, relying on static fact-to-verdict mappings while neglecting the dialectical nature of appellate (second-instance) review. To address this, we introduce AppellateGen, a benchmark for second-instance legal judgment generation comprising 7,351 case pairs. The task requires models to draft legally binding judgments by reasoning over the initial verdict and evidentiary updates, thereby modeling the causal dependency between trial stages. We further propose a judicial Standard Operating Procedure (SOP)-based Legal Multi-Agent System (SLMAS) to simulate judicial workflows, which decomposes the generation process into discrete stages of issue identification, retrieval, and drafting. Experimental results indicate that while SLMAS improves logical consistency, the complexity of appellate reasoning remains a substantial challenge for current LLMs. The dataset and code are publicly available at: https://anonymous.4open.science/r/AppellateGen-5763.

2512.04653 2026-03-31 cs.MA cs.AI cs.LG

A Semi Centralized Training Decentralized Execution Architecture for Multi Agent Deep Reinforcement Learning in Traffic Signal Control

Arash Rezaali, Pouria Yazdani, Monireh Abdoos

Comments Co-first authors: Arash Rezaali and Pouria Yazdani

详情
英文摘要

Multi-agent reinforcement learning (MARL) has emerged as a promising paradigm for adaptive traffic signal control (ATSC) of multiple intersections. Existing approaches typically follow either a fully centralized or a fully decentralized design. Fully centralized approaches suffer from the curse of dimensionality, and reliance on a single learning server, whereas purely decentralized approaches operate under severe partial observability and lack explicit coordination resulting in suboptimal performance. These limitations motivate region-based MARL, where the network is partitioned into smaller, tightly coupled intersections that form regions, and training is organized around these regions. This paper introduces a Semi-Centralized Training, Decentralized Execution (SEMI-CTDE) architecture for multi intersection ATSC. Within each region, SEMI-CTDE performs centralized training with regional parameter sharing and employs composite state and reward formulations that jointly encode local and regional information. The architecture is highly transferable across different policy backbones and state-reward instantiations. Building on this architecture, we implement two models with distinct design objectives. A multi-perspective experimental analysis of the two implemented SEMI-CTDE-based models covering ablations of the architecture's core elements including rule based and fully decentralized baselines shows that they achieve consistently superior performance and remain effective across a wide range of traffic densities and distributions.

2511.18151 2026-03-31 cs.DC cs.AR cs.CV cs.LG cs.NI

AVERY: Intent-Driven Adaptive VLM Split Computing via Embodied Self-Awareness for Efficient Disaster Response Systems

Rajat Bhattacharjya, Sing-Yao Wu, Hyunwoo Oh, Chaewon Nam, Suyeon Koo, Mohsen Imani, Elaheh Bozorgzadeh, Nikil Dutt

Comments Paper is currently under review. Authors' version posted for personal use and not for redistribution. Previous version of the preprint was titled: 'AVERY: Adaptive VLM Split Computing through Embodied Self-Awareness for Efficient Disaster Response Systems'

详情
英文摘要

Unmanned Aerial Vehicles (UAVs) in disaster response require complex, queryable intelligence that onboard CNNs cannot provide. While Vision-Language Models (VLMs) offer this semantic reasoning, their high resource demands make on-device deployment infeasible, and naive cloud offloading fails under the low-bandwidth, unstable networks endemic to disaster zones. We present AVERY, an intent-driven adaptive split computing framework for efficient VLM deployment on resource-constrained platforms. AVERY is motivated by the observation that operator intent must be treated as a first-class system objective, since missions such as broad situational monitoring and precise, spatially grounded investigation require different semantic products, latency targets, and resource allocations. To reflect this, AVERY advances split computing beyond traditional depth-wise partitioning through a functional, cognitive-inspired dual-stream split: a high-frequency, low-resolution Context stream for real-time awareness, and a low-frequency, high-fidelity Insight stream for deep analysis. This design enables a hierarchical split strategy: computation is first separated by function, then partitioned depth-wise across edge and cloud when the Insight stream is required. A lightweight, self-aware onboard controller monitors network conditions and operator intent to select from pre-trained compression models, navigating the accuracy-throughput trade-off at runtime. Evaluated using LISA-7B in an edge-cloud setting under fluctuating network conditions, AVERY achieves 11.2% higher accuracy than raw image compression, 93.98% lower energy consumption than full-edge execution, and average accuracy within 0.75% of the static High-Accuracy baseline during dynamic adaptation. Overall, AVERY enhances mission efficiency and enables real-time, queryable intelligence in dynamic disaster environments.

2511.07014 2026-03-31 cs.CE cs.AI econ.EM q-fin.PM

Diffolio: A Diffusion Model for Multivariate Probabilistic Financial Time-Series Forecasting and Portfolio Construction

So-Yoon Cho, Jin-Young Kim, Kayoung Ban, Hyeng Keun Koo, Hyun-Gyoon Kim

Comments 41 pages, 11 figures. Replacement to match the version accepted for publication in Information Fusion (Vol. 133, 104286, 2026). Significant updates have been made from the initial draft to reflect the final accepted manuscript (AAM)

详情
Journal ref
Information Fusion, Vol. 133, 104286 (2026)
英文摘要

Probabilistic forecasting is crucial in multivariate financial time-series for constructing efficient portfolios that account for complex cross-sectional dependencies. In this paper, we propose Diffolio, a diffusion model designed for multivariate financial time-series forecasting and portfolio construction. Diffolio employs a denoising network with a hierarchical attention architecture, comprising both asset-level and market-level layers. Furthermore, to better reflect cross-sectional correlations, we introduce a correlation-guided regularizer informed by a stable estimate of the target correlation matrix. This structure effectively extracts salient features not only from historical returns but also from asset-specific and systematic covariates, significantly enhancing the performance of forecasts and portfolios. Experimental results on the daily excess returns of 12 industry portfolios show that Diffolio outperforms various probabilistic forecasting baselines in multivariate forecasting accuracy and portfolio performance. Moreover, in portfolio experiments, portfolios constructed from Diffolio's forecasts show consistently robust performance, thereby outperforming those from benchmarks by achieving higher Sharpe ratios for the mean-variance tangency portfolio and higher certainty equivalents for the growth-optimal portfolio. These results demonstrate the superiority of our proposed Diffolio in terms of not only statistical accuracy but also economic significance.

2510.19372 2026-03-31 stat.ML cs.LG

On the Hardness of Reinforcement Learning with Transition Look-Ahead

Corentin Pla, Hugo Richard, Marc Abeille, Nadav Merlis, Vianney Perchet

详情
英文摘要

We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead ($\ell=1$) can be solved in polynomial time through a novel linear programming formulation. In contrast, for $\ell \geq 2$, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.

2510.15681 2026-03-31 cs.LO cs.AI

ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings

Prithwish Jana, Kaan Kale, Ahmet Ege Tanriverdi, Cruise Song, Sriram Vishwanath, Vijay Ganesh

Comments Published as a conference paper at the 14th International Conference on Learning Representations (ICLR 2026), Rio de Janeiro, Brazil, April 23-27, 2026

详情
英文摘要

Translating human-written mathematical theorems and proofs from natural language (NL) into formal languages (FLs) like Lean 4 has long been a significant challenge for AI. Most state-of-the-art methods either focus on theorem-only NL-to-FL auto-formalization or on FL proof synthesis from FL theorems. In practice, auto-formalization of both theorem and proof still requires human intervention, as seen in AlphaProof's silver-medal performance at the 2024 IMO, where problem statements were manually translated before automated proof synthesis. We present ProofBridge, a unified framework for automatically translating entire NL theorems and proofs into Lean 4. At its core is a joint embedding model that aligns NL and FL (NL-FL) theorem+proof pairs in a shared semantic space, enabling cross-modal retrieval of semantically relevant FL examples to guide translation. ProofBridge integrates retrieval-augmented fine-tuning with iterative proof repair, leveraging Lean's type checker and semantic equivalence feedback to ensure both syntactic correctness and semantic fidelity. Experiments show substantial improvements in proof auto-formalization over strong baselines (including GPT-5, Gemini-2.5, Kimina-Prover, DeepSeek-Prover), with our retrieval-augmented approach yielding significant gains in semantic correctness (SC, via proving bi-directional equivalence) and type correctness (TC, via type-checking theorem+proof) across pass@k metrics on miniF2F-Test-PF, a dataset we curated. In particular, ProofBridge improves cross-modal retrieval quality by up to 3.28x Recall@1 over all-MiniLM-L6-v2, and achieves +31.14% SC and +1.64% TC (pass@32) compared to the baseline Kimina-Prover-RL-1.7B.

2510.05145 2026-03-31 cs.DC cs.AI cs.MA

Efficient Tree-Structured Deep Research with Adaptive Resource Allocation

Lunyiu Nie, Nedim Lipka, Ryan A. Rossi, Swarat Chaudhuri

Comments ICLR 2026 Workshop on Agents in the Wild (Spotlight)

详情
英文摘要

Deep research agents, which synthesize information across diverse sources, are significantly constrained by the sequential nature of reasoning. This bottleneck results in high latency, poor runtime adaptability, and inefficient resource allocation, making today's deep research systems impractical for interactive applications. To overcome this, we introduce ParallelResearch, a novel framework for efficient deep research that transforms sequential processing into parallel, runtime orchestration by dynamically decomposing complex queries into tree-structured sub-tasks. Our core contributions are threefold: (1) an adaptive planner that dynamically allocates computational resources based on query complexity; (2) a runtime orchestration layer that prunes redundant paths to reallocate resources and enables speculative execution; and (3) a fully-asynchronous execution infrastructure that enables concurrency across both research breadth and depth. Experiments on two benchmarks show up to 5x speedups with comparable final report quality, and consistent quality improvements with the same time budgets.

2509.25311 2026-03-31 hep-th cs.LG physics.comp-ph

Aspects of holographic entanglement using physics-informed-neural-networks

Anirudh Deb, Yaman Sanghavi

Comments 19 pages, 14 figures. v2: minor corrections, references added, revised figure captions for clarity

详情
英文摘要

We implement physics-informed-neural-networks (PINNs) to compute holographic entanglement entropy and entanglement wedge cross section. This technique allows us to compute these quantities for arbitrary shapes of the subregions in any asymptotically AdS metric. We test our computations against some known results and further demonstrate the utility of PINNs in examples, where it is not straightforward to perform such computations.

2509.22059 2026-03-31 hep-ph cs.LG hep-ex

Stable and Interpretable Jet Physics with IRC-Safe Equivariant Feature Extraction

Partha Konar, Vishal S. Ngairangbam, Michael Spannowsky, Deepanshu Srivastava

Comments 30 pages, 3 tables, 7 figures

详情
Journal ref
JHEP 03 (2026) 219
英文摘要

Deep learning has achieved remarkable success in jet classification tasks, yet a key challenge remains: understanding what these models learn and how their features relate to known QCD observables. Improving interpretability is essential for building robust and trustworthy machine learning tools in collider physics. To address this challenge, we investigate graph neural networks for quark-gluon discrimination, systematically incorporating physics-motivated inductive biases. In particular, we design message-passing architectures that enforce infrared and collinear (IRC) safety, as well as E(2) and O(2) equivariance in the rapidity-azimuth plane. Using simulated jet datasets, we compare these networks against unconstrained baselines in terms of classification performance, robustness to soft emissions, and latent representation structures. Our analysis shows that physics-aware networks are more stable across training instances and distribute their latent variance across multiple interpretable directions. By regressing Energy Flow Polynomials onto the leading principal components, we establish a direct correspondence between learned representations and established IRC-safe jet observables. These results demonstrate that embedding symmetry and safety constraints not only improves robustness but also grounds network representations in known QCD structures, providing a principled approach toward interpretable deep learning in collider physics.

2509.13688 2026-03-31 cs.GR cs.AI

CraftMesh: High-Fidelity Generative Mesh Manipulation via Poisson Seamless Fusion

James Jincheng, Yuxiao Wu, Youcheng Cai, Ligang Liu

详情
英文摘要

Controllable, high-fidelity mesh editing remains a significant challenge in 3D content creation. Existing generative methods often struggle with complex geometries and fail to produce detailed results. We propose CraftMesh, a novel framework for high-fidelity generative mesh manipulation via Poisson Seamless Fusion. Our key insight is to decompose mesh editing into a pipeline that leverages the strengths of 2D and 3D generative models: we edit a 2D reference image, then generate a region-specific 3D mesh, and seamlessly fuse it into the original model. We introduce two core techniques: Poisson Geometric Fusion, which utilizes a hybrid SDF/Mesh representation with normal blending to achieve harmonious geometric integration, and Poisson Texture Harmonization for visually consistent texture blending. Experimental results demonstrate that CraftMesh outperforms state-of-the-art methods, delivering superior global consistency and local detail in complex editing tasks.

2508.01321 2026-03-31 stat.ML cs.LG

Flow IV: Counterfactual Inference In Nonseparable Outcome Models Using Instrumental Variables

Marc Braun, Jose M. Peña, Adel Daoud

详情
英文摘要

To reach human level intelligence, learning algorithms need to incorporate causal reasoning. But identifying causality, and particularly counterfactual reasoning, remains elusive. In this paper, we make progress on counterfactual inference in nonseparable outcome models by utilizing instrumental variables (IVs). IVs are a classic tool for mitigating bias from unobserved confounders when estimating causal effects. While IV methods for effect estimation have been extended to nonseparable outcome models under different assumptions, existing IV approaches to counterfactual prediction typically assume one-dimensional outcomes and additive noise. In this paper, we show that under standard IV assumptions, along with the assumption that the outcome function is invertible and has a triangular structure, then the treatment-outcome relationship becomes identifiable from observed data. We furthermore propose a method to learn the outcome function utilizing normalizing flows. This outcome function estimator can then be used to perform counterfactual inference. We refer to the method as Flow IV.

2507.06876 2026-03-31 cs.CY cs.AI

Winning and losing with Artificial Intelligence: What public discourse about ChatGPT tells us about how societies make sense of technological change

Adrian Rauchfleisch, Joshua Philip Suarez, Nikka Marie Sales, Andreas Jungherr

详情
Journal ref
Telematics and Informatics, 103, 102344 (2025)
英文摘要

Public product launches in Artificial Intelligence can serve as focusing events for collective attention, surfacing how societies react to technological change. Social media provide a window into the sensemaking around these events, surfacing hopes and fears and showing who chooses to engage in the discourse and when. We demonstrate that public sensemaking about AI is shaped by economic interests and cultural values of those involved. We analyze 3.8 million tweets posted by 1.6 million users across 117 countries in response to the public launch of ChatGPT in 2022. Our analysis shows how economic self-interest, proxied by occupational skill types in writing, programming, and mathematics, and national cultural orientations, as measured by Hofstede's individualism, uncertainty avoidance, and power distance dimensions, shape who speaks, when they speak, and their stance towards ChatGPT. Roles requiring more technical skills, such as programming and mathematics, tend to engage earlier and express more positive stances, whereas writing-centric occupations join later with greater skepticism. At the cultural level, individualism predicts both earlier engagement and a more negative stance, and uncertainty avoidance reduces the prevalence of positive stances but does not delay when users first engage with ChatGPT. Aggregate sentiment trends mask the dynamics observed in our study. The shift toward a more critical stance towards ChatGPT over time stems primarily from the entry of more skeptical voices rather than a change of heart among early adopters. Our findings underscore the importance of both the occupational background and cultural context in understanding public reactions to AI.

2506.23836 2026-03-31 math.OC cs.DC cs.LG

Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction

Alexander Tyurin

详情
英文摘要

We consider centralized distributed optimization in the classical federated learning setup, where $n$ workers jointly find an $\varepsilon$-stationary point of an $L$-smooth, $d$-dimensional nonconvex function $f$, having access only to unbiased stochastic gradients with variance $σ^2$. Each worker requires at most $h$ seconds to compute a stochastic gradient, and the communication times from the server to the workers and from the workers to the server are $τ_{s}$ and $τ_{w}$ seconds per coordinate, respectively. One of the main motivations for distributed optimization is to achieve scalability with respect to $n$. For instance, it is well known that the distributed version of SGD has a variance-dependent runtime term $\frac{h σ^2 L Δ}{n \varepsilon^2},$ which improves with the number of workers $n,$ where $Δ= f(x^0) - f^*,$ and $x^0 \in R^d$ is the starting point. Similarly, using unbiased sparsification compressors, it is possible to reduce both the variance-dependent runtime term and the communication runtime term. However, once we account for the communication from the server to the workers $τ_{s}$, we prove that it becomes infeasible to design a method using unbiased random sparsification compressors that scales both the server-side communication runtime term $τ_{s} d \frac{L Δ}{\varepsilon}$ and the variance-dependent runtime term $\frac{h σ^2 L Δ}{\varepsilon^2},$ better than poly-logarithmically in $n$, even in the homogeneous (i.i.d.) case, where all workers access the same distribution. To establish this result, we construct a new "worst-case" function and develop a new lower bound framework that reduces the analysis to the concentration of a random sum, for which we prove a concentration bound. These results reveal fundamental limitations in scaling distributed optimization, even under the homogeneous assumption.

2506.21138 2026-03-31 cs.SE cs.AI

Multi-Sample Prompting and Actor-Critic Prompt Optimization for Diverse Synthetic Data Generation

Abdelkarim El-Hajjami, Camille Salinesi

详情
英文摘要

High-quality labeled datasets are fundamental for training and evaluating machine learning models, yet domains such as healthcare and Requirements Engineering (RE) face persistent barriers due to data scarcity, privacy constraints, or proprietary restrictions. While Large Language Models (LLMs) offer a promising avenue for Synthetic Data Generation (SDG), LLM-generated data tends to be repetitive and low in diversity, reducing its effectiveness for downstream tasks. Two approaches show potential for addressing this limitation: (1) multi-sample prompting, which generates multiple samples per prompt to reduce repetition, and (2) Prompt with Actor-Critic Editing (PACE), which iteratively refines prompts to maximize diversity. We integrate both mechanisms into Synthline, a Feature Model-based configurable synthetic data generator, and assess their effects on diversity and downstream utility across four RE classification tasks. Multi-sample prompting consistently improves both diversity and utility, with F1-score gains of 6 to 43.8 percentage points. PACE-based prompt optimization consistently improves lexical diversity but produces task-dependent utility effects, revealing the risks of optimizing for diversity alone. Most notably, synthetic data can match or surpass human-authored data for tasks where real labeled data is limited, with improvements of up to 15.4 percentage points in F1-score.

2505.17288 2026-03-31 stat.ML cs.LG

Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation

Seamus Somerstep, Vinod Raman, Unique Subedi, Yuekai Sun

Comments AISTATS 2026 Camera Ready

详情
英文摘要

Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, referred to as supervised fine-tuning, involves training a new next token predictor on good generations. The second method, Best-of-N, trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that supervised fine-tuning outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy a better rate of convergence in either n or a rate of convergence with better dependence on the response length.

2505.13213 2026-03-31 stat.ML cs.LG

Diffusion Models with Double Guidance: Generate with aggregated datasets

Yanfeng Yang, Kenji Fukumizu

详情
英文摘要

Creating large-scale datasets for training high-performance generative models is often prohibitively expensive, especially when associated attributes or annotations must be provided. As a result, merging existing datasets has become a common strategy. However, the sets of attributes across datasets are often inconsistent, and their naive concatenation typically leads to block-wise missing conditions. This presents a significant challenge for conditional generative modeling when the multiple attributes are used jointly as conditions, thereby limiting the model's controllability and applicability. To address this issue, we propose a novel generative approach, Diffusion Model with Double Guidance, which enables precise conditional generation even when no training samples contain all conditions simultaneously. Our method maintains rigorous control over multiple conditions without requiring joint annotations. We demonstrate its effectiveness in molecular and image generation tasks, where it outperforms existing baselines both in alignment with target conditional distributions and in controllability under missing condition settings.

2505.12412 2026-03-31 stat.ML cs.LG

Training Latent Diffusion Models with Interacting Particle Algorithms

Tim Y. J. Wang, Juan Kuntz, O. Deniz Akyildiz

Comments Camera Ready version for AISTATS 2026

详情
英文摘要

We introduce a novel particle-based algorithm for end-to-end training of latent diffusion models. We reformulate the training task as minimizing a free energy functional and obtain a gradient flow that does so. By approximating the latter with a system of interacting particles, we obtain the algorithm, which we underpin theoretically by providing error guarantees. The novel algorithm compares favorably in experiments with previous particle-based methods and variational inference analogues.

2505.07372 2026-03-31 cs.SE cs.AI

Self-Bootstrapping Automated Program Repair: Using LLMs to Generate and Evaluate Synthetic Training Data for Bug Repair

David de-Fitero-Dominguez, Antonio Garcia-Cabot, Eva Garcia-Lopez

详情
Journal ref
Expert Systems with Applications 319 (2026)
英文摘要

This paper presents a novel methodology for enhancing Automated Program Repair (APR) through synthetic data generation utilizing Large Language Models (LLMs). Current APR systems are constrained by the limited availability of high-quality training data encompassing diverse bug types across multiple programming languages. The proposed approach addresses this limitation through a two-phase process: a synthetic sample generation followed by a rigorous quality assessment. Multiple state-of-the-art LLMs were employed to generate approximately 30,000 paired examples of buggy and fixed code across 12 programming languages and 13 bug categories. Subsequently, these samples underwent cross-model evaluation against five criteria: correctness, code quality, security, performance, and completeness. Experimental evaluation on the VulRepair test set dataset showed statistically significant improvements in Perfect Prediction rates, with the quality-filtered synthetic dataset achieving 17.18% (Top@1) and 23.00% (Top@5) compared to the baseline's 11.68% and 18.88% respectively, representing a 47% relative improvement in Top@1 and 22% in Top@5. The methodology was validated through rigorous statistical testing, including ANOVA and post-hoc Tukey's Honest Significant Difference analysis. Furthermore, the best-performing configurations surpassed existing systems despite using a less computationally intensive decoding strategy. This research establishes a self-bootstrapping paradigm in which LLMs generate and evaluate their own training data, suggesting promising directions for addressing data scarcity in similar software engineering tasks and advancing the development of robust, adaptable tools for automated code maintenance.

2505.04880 2026-03-31 quant-ph cs.AI cs.LG

Symbolic Analysis of Grover Search Algorithm via Chain-of-Thought Reasoning and Quantum-Native Tokenization

Min Chen, Jinglei Cheng, Pingzhi Li, Haoran Wang, Tianlong Chen, Junyu Liu

Comments 33 pages, 14 figures

详情
Journal ref
npj Quantum Information 12, 48 (2026)
英文摘要

Understanding the high-level conceptual structure of quantum algorithms from their low-level circuit representations is a critical task for verification, debugging, and education. While traditional numerical simulators can calculate output probabilities, they do not explicitly surface the underlying algorithmic logic, such as the function of an oracle or embedded symmetries. In this work, we shift the focus from numerical simulation to symbolic analysis, investigating whether Large Language Models (LLMs) can automatically interpret quantum circuits and articulate their logic in a human-readable format. We introduce GroverGPT+, a model that leverages Chain-of-Thought reasoning and quantum-native tokenization to analyze Grover's search algorithm. We use Grover's algorithm as a controlled testbed, as its well-defined analytical properties allow for rigorous verification of the model's reasoning process. Our primary finding is that GroverGPT+ successfully identifies the oracle and its marked states directly from circuit representations. The model's key output is not a final probability, but a structured, interpretable reasoning trace that mirrors human expert analysis, effectively translating procedural circuit steps into conceptual insights. Furthermore, we establish a structured benchmark for this symbolic analysis task and explore its empirical extrapolation describing the model's performance as the number of qubits increases. These findings position LLMs as powerful tools for automated quantum algorithm analysis and verification. More fundamentally, this work offers a first step towards using such models as scientific probes, suggesting that an algorithm's ``learnability" by a classical model can provide a new, complementary perspective on its conceptual complexity, a topic of core interest to quantum information science.

2504.16474 2026-03-31 cs.CR cs.LG

Seeking Flat Minima over Diverse Surrogates for Improved Adversarial Transferability: A Theoretical Framework and Algorithmic Instantiation

Meixi Zheng, Kehan Wu, Yanbo Fan, Rui Huang, Baoyuan Wu

Comments 32 pages, 9 figures

详情
英文摘要

The transfer-based black-box adversarial attack setting poses the challenge of crafting an adversarial example (AE) on known surrogate models that remain effective against unseen target models. Due to the practical importance of this task, numerous methods have been proposed to address this challenge. However, most previous methods are heuristically designed and intuitively justified, lacking a theoretical foundation. To bridge this gap, we derive a novel transferability bound that offers provable guarantees for adversarial transferability. Our theoretical analysis has the advantages of \textit{(i)} deepening our understanding of previous methods by building a general attack framework and \textit{(ii)} providing guidance for designing an effective attack algorithm. Our theoretical results demonstrate that optimizing AEs toward flat minima over the surrogate model set, while controlling the surrogate-target model shift measured by the adversarial model discrepancy, yields a comprehensive guarantee for AE transferability. The results further lead to a general transfer-based attack framework, within which we observe that previous methods consider only partial factors contributing to the transferability. Algorithmically, inspired by our theoretical results, we first elaborately construct the surrogate model set in which models exhibit diverse adversarial vulnerabilities with respect to AEs to narrow an instantiated adversarial model discrepancy. Then, a \textit{model-Diversity-compatible Reverse Adversarial Perturbation} (DRAP) is generated to effectively promote the flatness of AEs over diverse surrogate models to improve transferability. Extensive experiments on NIPS2017 and CIFAR-10 datasets against various target models demonstrate the effectiveness of our proposed attack.

2503.08915 2026-03-31 eess.IV cs.CV

Reconstruct Anything Model: a lightweight general model for computational imaging

Matthieu Terris, Samuel Hurault, Maxime Song, Julian Tachella

详情
英文摘要

Most existing learning-based methods for solving imaging inverse problems can be roughly divided into two classes: iterative algorithms, such as plug-and-play and diffusion methods leveraging pretrained denoisers, and unrolled architectures that are trained end-to-end for specific imaging problems. Iterative methods in the first class are computationally costly and often yield suboptimal reconstruction performance, whereas unrolled architectures are generally problem-specific and require expensive training. In this work, we propose a novel non-iterative, lightweight architecture that incorporates knowledge about the forward operator (acquisition physics and noise parameters) without relying on unrolling. Our model is trained to solve a wide range of inverse problems, such as deblurring, magnetic resonance imaging, computed tomography, inpainting, and super-resolution, and handles arbitrary image sizes and channels, such as grayscale, complex, and color data. The proposed model can be easily adapted to unseen inverse problems or datasets with a few fine-tuning steps (up to a few images) in a self-supervised way, without ground-truth references. Throughout a series of experiments, we demonstrate state-of-the-art performance from medical imaging to low-photon imaging and microscopy. Our code is available at https://github.com/matthieutrs/ram.

2502.18535 2026-03-31 cs.CR cs.AI cs.LG

A Survey of Zero-Knowledge Proof Based Verifiable Machine Learning

Zhizhi Peng, Chonghe Zhao, Taotao Wang, Guofu Liao, Zibin Lin, Yifeng Liu, Bin Cao, Long Shi, Qing Yang, Shengli Zhang

Comments This manuscript has been accepted for publication in Artificial Intelligence Review

详情
英文摘要

Machine learning is increasingly deployed through outsourced and cloud-based pipelines, which improve accessibility but also raise concerns about computational integrity, data privacy, and model confidentiality. Zero-knowledge proofs (ZKPs) provide a compelling foundation for verifiable machine learning because they allow one party to certify that a training, testing, or inference result was produced by the claimed computation without revealing sensitive data or proprietary model parameters. Despite rapid progress in zero-knowledge machine learning (ZKML), the literature remains fragmented across different cryptographic settings, ML tasks, and system objectives. This survey presents a comprehensive review of ZKML research published from June 2017 to August 2025. We first introduce the basic ZKP formulations underlying ZKML and organize existing studies into three core tasks: verifiable training, verifiable testing, and verifiable inference. We then synthesize representative systems, compare their design choices, and analyze the main implementation bottlenecks, including limited circuit expressiveness, high proving cost, and deployment complexity. In addition, we summarize major techniques for improving generality and efficiency, review emerging commercial efforts, and discuss promising future directions. By consolidating the design space of ZKML, this survey aims to provide a structured reference for researchers and practitioners working on trustworthy and privacy-preserving machine learning.

2502.07754 2026-03-31 cs.GR cs.CV

MeshSplats: Mesh-Based Rendering with Gaussian Splatting Initialization

Rafał Tobiasz, Grzegorz Wilczyński, Marcin Mazur, Sławomir Tadeja, Weronika Smolak-Dyżewska, Przemysław Spurek

详情
英文摘要

Gaussian Splatting (GS) is a recent and pivotal technique in 3D computer graphics. GS-based algorithms almost always bypass classical methods such as ray tracing, which offer numerous inherent advantages for rendering. For example, ray tracing can handle incoherent rays for advanced lighting effects, including shadows and reflections. To address this limitation, we introduce MeshSplats, a method which converts GS to a mesh-like format. Following the completion of training, MeshSplats transforms Gaussian elements into mesh faces, enabling rendering using ray tracing methods with all their associated benefits. Our model can be utilized immediately following transformation, yielding a mesh of slightly reduced reconstruction quality without additional training. Furthermore, we can enhance the quality by applying a dedicated optimization algorithm that operates on mesh faces rather than Gaussian components. Importantly, MeshSplats acts as a wrapper, converting pre-trained GS models into a ray-traceable format. The efficacy of our method is substantiated by experimental results, underscoring its extensive applications in computer graphics and image processing.

2502.03330 2026-03-31 cs.HC cs.AI cs.CV cs.GR

ControlGUI: Guiding Generative GUI Exploration through Perceptual Visual Flow

Aryan Garg, Yue Jiang, Antti Oulasvirta

详情
英文摘要

During the early stages of interface design, designers need to produce multiple sketches to explore a design space. Design tools often fail to support this critical stage, because they insist on specifying more details than necessary. Although recent advances in generative AI have raised hopes of solving this issue, in practice they fail because expressing loose ideas in a prompt is impractical. In this paper, we propose a diffusion-based approach to the low-effort generation of interface sketches. It breaks new ground by allowing flexible control of the generation process via three types of inputs: A) prompts, B) wireframes, and C) visual flows. The designer can provide any combination of these as input at any level of detail, and will get a diverse gallery of low-fidelity solutions in response. The unique benefit is that large design spaces can be explored rapidly with very little effort in input-specification. We present qualitative results for various combinations of input specifications. Additionally, we demonstrate that our model aligns more accurately with these specifications than other models.