arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1423
专题追踪
2602.03948 2026-02-05 stat.ML cs.CR cs.LG cs.SI math.ST stat.TH

Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks

Bibhabasu Mandal, Sagnik Nandy

详情
英文摘要

In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the available data are summarized solely through the degrees of the nodes in the network. We adopt the $β$ model, which is the prototypical statistical model adopted for this form of aggregated relational information, and study the problem of minimax-optimal parameter estimation under both local and central differential privacy constraints. We establish finite sample minimax lower bounds that characterize the precise dependence of the estimation risk on the network size and the privacy parameters, and we propose simple estimators that achieve these bounds up to constants and logarithmic factors under both local and central differential privacy frameworks. Our results provide the first comprehensive finite sample characterization of privacy utility trade offs for parameter estimation in $β$ models, addressing the classical graph case and extending the analysis to higher order hypergraph models. We further demonstrate the effectiveness of our methods through experiments on synthetic data and a real world communication network.

2602.03939 2026-02-05 eess.SY cs.LG cs.SY

C-IDS: Solving Contextual POMDP via Information-Directed Objective

Chongyang Shi, Michael Dorothy, Jie Fu

详情
英文摘要

We study the policy synthesis problem in contextual partially observable Markov decision processes (CPOMDPs), where the environment is governed by an unknown latent context that induces distinct POMDP dynamics. Our goal is to design a policy that simultaneously maximizes cumulative return and actively reduces uncertainty about the underlying context. We introduce an information-directed objective that augments reward maximization with mutual information between the latent context and the agent's observations. We develop the C-IDS algorithm to synthesize policies that maximize the information-directed objective. We show that the objective can be interpreted as a Lagrangian relaxation of the linear information ratio and prove that the temperature parameter is an upper bound on the information ratio. Based on this characterization, we establish a sublinear Bayesian regret bound over K episodes. We evaluate our approach on a continuous Light-Dark environment and show that it consistently outperforms standard POMDP solvers that treat the unknown context as a latent state variable, achieving faster context identification and higher returns.

2602.03927 2026-02-05 cond-mat.mes-hall cond-mat.str-el cs.AI

First-Principles AI finds crystallization of fractional quantum Hall liquids

Ahmed Abouelkomsan, Liang Fu

Comments 5 pages + SM

详情
英文摘要

When does a fractional quantum Hall (FQH) liquid crystallize? Addressing this question requires a framework that treats fractionalization and crystallization on equal footing, especially in strong Landau-level mixing regime. Here, we introduce MagNet, a self-attention neural-network variational wavefunction designed for quantum systems in magnetic fields on the torus geometry. We show that MagNet provides a unifying and expressive ansatz capable of describing both FQH states and electron crystals within the same architecture. Trained solely by energy minimization of the microscopic Hamiltonian, MagNet discovers topological liquid and electron crystal ground states across a broad range of Landau-level mixing. Our results highlight the power of first-principles AI for solving strongly interacting many-body problems and finding competing phases without external training data or physics pre-knowledge.

2602.03902 2026-02-05 q-bio.QM cs.AI cs.LG

All-Atom GPCR-Ligand Simulation via Residual Isometric Latent Flow

Jiying Zhang, Shuhao Zhang, Pierre Vandergheynst, Patrick Barth

Comments 36 pages

详情
英文摘要

G-protein-coupled receptors (GPCRs), primary targets for over one-third of approved therapeutics, rely on intricate conformational transitions to transduce signals. While Molecular Dynamics (MD) is essential for elucidating this transduction process, particularly within ligand-bound complexes, conventional all-atom MD simulation is computationally prohibitive. In this paper, we introduce GPCRLMD, a deep generative framework for efficient all-atom GPCR-ligand simulation.GPCRLMD employs a Harmonic-Prior Variational Autoencoder (HP-VAE) to first map the complex into a regularized isometric latent space, preserving geometric topology via physics-informed constraints. Within this latent space, a Residual Latent Flow samples evolution trajectories, which are subsequently decoded back to atomic coordinates. By capturing temporal dynamics via relative displacements anchored to the initial structure, this residual mechanism effectively decouples static topology from dynamic fluctuations. Experimental results demonstrate that GPCRLMD achieves state-of-the-art performance in GPCR-ligand dynamics simulation, faithfully reproducing thermodynamic observables and critical ligand-receptor interactions.

2602.03899 2026-02-05 stat.ML cs.AI cs.LG math.OC math.ST stat.TH

Byzantine Machine Learning: MultiKrum and an optimal notion of robustness

Gilles Bareilles, Wassim Bouaziz, Julien Fageot, El-Mahdi El-Mhamdi

详情
英文摘要

Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule has been extensively studied, and endowed with formal robustness and convergence guarantees. Yet, MultiKrum, a natural extension of Krum, is often preferred in practice for its superior empirical performance, even though no theoretical guarantees were available until now. In this work, we provide the first proof that MultiKrum is a robust aggregation rule, and bound its robustness coefficient. To do so, we introduce $κ^\star$, the optimal *robustness coefficient* of an aggregation rule, which quantifies the accuracy of mean estimation in the presence of adversaries in a tighter manner compared with previously adopted notions of robustness. We then construct an upper and a lower bound on MultiKrum's robustness coefficient. As a by-product, we also improve on the best-known bounds on Krum's robustness coefficient. We show that MultiKrum's bounds are never worse than Krum's, and better in realistic regimes. We illustrate this analysis by an experimental investigation on the quality of the lower bound.

2602.03889 2026-02-05 stat.ML cs.LG

Transcendental Regularization of Finite Mixtures:Theoretical Guarantees and Practical Limitations

Ernest Fokoué

Comments 24 pages, 6 figures, 2 tables

详情
英文摘要

Finite mixture models are widely used for unsupervised learning, but maximum likelihood estimation via EM suffers from degeneracy as components collapse. We introduce transcendental regularization, a penalized likelihood framework with analytic barrier functions that prevent degeneracy while maintaining asymptotic efficiency. The resulting Transcendental Algorithm for Mixtures of Distributions (TAMD) offers strong theoretical guarantees: identifiability, consistency, and robustness. Empirically, TAMD successfully stabilizes estimation and prevents collapse, yet achieves only modest improvements in classification accuracy-highlighting fundamental limits of mixture models for unsupervised learning in high dimensions. Our work provides both a novel theoretical framework and an honest assessment of practical limitations, implemented in an open-source R package.

2602.03887 2026-02-05 eess.IV cs.CV

To What Extent Do Token-Level Representations from Pathology Foundation Models Improve Dense Prediction?

Weiming Chen, Xitong Ling, Xidong Wang, Zhenyang Cai, Yijia Guo, Mingxi Fu, Ziyi Zeng, Minxi Ouyang, Jiawen Li, Yizhi Wang, Tian Guan, Benyou Wang, Yonghong He

详情
英文摘要

Pathology foundation models (PFMs) have rapidly advanced and are becoming a common backbone for downstream clinical tasks, offering strong transferability across tissues and institutions. However, for dense prediction (e.g., segmentation), practical deployment still lacks a clear, reproducible understanding of how different PFMs behave across datasets and how adaptation choices affect performance and stability. We present PFM-DenseBench, a large-scale benchmark for dense pathology prediction, evaluating 17 PFMs across 18 public segmentation datasets. Under a unified protocol, we systematically assess PFMs with multiple adaptation and fine-tuning strategies, and derive insightful, practice-oriented findings on when and why different PFMs and tuning choices succeed or fail across heterogeneous datasets. We release containers, configs, and dataset cards to enable reproducible evaluation and informed PFM selection for real-world dense pathology tasks. Project Website: https://m4a1tastegood.github.io/PFM-DenseBench

2602.03870 2026-02-05 eess.IV cs.CV

DINO-AD: Unsupervised Anomaly Detection with Frozen DINO-V3 Features

Jiayu Huo, Jingyuan Hong, Liyun Chen

Comments Accepted by ISBI 2026, 4 pages, 2 figures, 3 tables

详情
英文摘要

Unsupervised anomaly detection (AD) in medical images aims to identify abnormal regions without relying on pixel-level annotations, which is crucial for scalable and label-efficient diagnostic systems. In this paper, we propose a novel anomaly detection framework based on DINO-V3 representations, termed DINO-AD, which leverages self-supervised visual features for precise and interpretable anomaly localization. Specifically, we introduce an embedding similarity matching strategy to select a semantically aligned support image and a foreground-aware K-means clustering module to model the distribution of normal features. Anomaly maps are then computed by comparing the query features with clustered normal embeddings through cosine similarity. Experimental results on both the Brain and Liver datasets demonstrate that our method achieves superior quantitative performance compared with state-of-the-art approaches, achieving AUROC scores of up to 98.71. Qualitative results further confirm that our framework produces clearer and more accurate anomaly localization. Extensive ablation studies validate the effectiveness of each proposed component, highlighting the robustness and generalizability of our approach.

2602.03858 2026-02-05 eess.SP cs.LG

PENGUIN: General Vital Sign Reconstruction from PPG with Flow Matching State Space Model

Shuntaro Suzuki, Shuitsu Koyama, Shinnosuke Hirano, Shunya Nagashima

Comments Accepted for presentation at ICASSP2026

详情
英文摘要

Photoplethysmography (PPG) plays a crucial role in continuous cardiovascular health monitoring as a non-invasive and cost-effective modality. However, PPG signals are susceptible to motion artifacts and noise, making accurate estimation of vital signs such as arterial blood pressure (ABP) challenging. Existing estimation methods are often restricted to a single-task or environment, limiting their generalizability across diverse PPG decoding scenarios. Moreover, recent general-purpose approaches typically rely on predictions over multi-second intervals, discarding the morphological characteristics of vital signs. To address these challenges, we propose PENGUIN, a generative flow-matching framework that extends deep state space models, enabling fine-grained conditioning on PPG for reconstructing multiple vital signs as continuous waveforms. We evaluate PENGUIN using six real-world PPG datasets across three distinct vital sign reconstruction tasks (electrocardiogram reconstruction, respiratory monitoring, and ABP monitoring). Our method consistently outperformed both task-specific and general-purpose baselines, demonstrating PENGUIN as a general framework for robust vital sign reconstruction from PPG.

2602.03852 2026-02-05 cs.HC cs.AI cs.CY

Perceptions of AI-CBT: Trust and Barriers in Chinese Postgrads

Chan-in Sio, Alex Mann, Lingxi Fan, Andrew Cheung, Lik-hang Lee

Comments Accepted and presented in The 30th International Conference on Technologies and Applications of Artificial Intelligence in Taipei, Taiwan on 13-14 December 2025 (TAAI 2025)

详情
英文摘要

The mental well-being of graduate students is an increasing concern, yet the adoption of scalable support remains uneven. Artificial intelligence-powered cognitive behavioral therapy chatbots (AI-CBT) offer low barrier help, but little is known about how Chinese postgraduates perceive and use them. This qualitative study explored perceptions and experiences of AI-CBT chatbots among ten Chinese graduate students recruited through social media. Semi-structured Zoom interviews were conducted and analyzed using reflexive thematic analysis, with the Health Belief Model (HBM) and the Theory of Planned Behavior (TPB) as sensitizing frameworks. The findings indicate a cautious openness to AI-CBT chatbots: perceived usefulness and 24/7 access supported favorable attitudes, while data privacy, emotional safety, and uncertainty about `fit' for complex problems restricted the intention to use. Social norms (e.g., stigma and peer views) and perceived control (digital literacy, language quality) further shaped adoption. The study offers context-specific information to guide the culturally sensitive design, communication, and deployment of AI mental well-being tools for student populations in China and outlines the design implications around transparency, safeguards, and graduated care pathways.

2602.03849 2026-02-05 cs.HC cs.AI cs.CL cs.LG

HybridQuestion: Human-AI Collaboration for Identifying High-Impact Research Questions

Keyu Zhao, Fengli Xu, Yong Li, Tie-Yan Liu

Comments 16 pages, 6 figures, 4 tables

详情
英文摘要

The "AI Scientist" paradigm is transforming scientific research by automating key stages of the research process, from idea generation to scholarly writing. This shift is expected to accelerate discovery and expand the scope of scientific inquiry. However, a key question remains unclear: can AI scientists identify meaningful research questions? While Large Language Models (LLMs) have been applied successfully to task-specific ideation, their potential to conduct strategic, long-term assessments of past breakthroughs and future questions remains largely unexplored. To address this gap, we explore a human-AI hybrid solution that integrates the scalable data processing capabilities of AI with the value judgment of human experts. Our methodology is structured in three phases. The first phase, AI-Accelerated Information Gathering, leverages AI's advantage in processing vast amounts of literature to generate a hybrid information base. The second phase, Candidate Question Proposing, utilizes this synthesized data to prompt an ensemble of six diverse LLMs to propose an initial candidate pool, filtered via a cross-model voting mechanism. The third phase, Hybrid Question Selection, refines this pool through a multi-stage filtering process that progressively increases human oversight. To validate this system, we conducted an experiment aiming to identify the Top 10 Scientific Breakthroughs of 2025 and the Top 10 Scientific Questions for 2026 across five major disciplines. Our analysis reveals that while AI agents demonstrate high alignment with human experts in recognizing established breakthroughs, they exhibit greater divergence in forecasting prospective questions, suggesting that human judgment remains crucial for evaluating subjective, forward-looking challenges.

2602.03762 2026-02-05 eess.AS cs.LG

Conditional Flow Matching for Visually-Guided Acoustic Highlighting

Hugo Malard, Gael Le Lan, Daniel Wong, David Lou Alon, Yi-Chiao Wu, Sanjeel Parekh

详情
英文摘要

Visually-guided acoustic highlighting seeks to rebalance audio in alignment with the accompanying video, creating a coherent audio-visual experience. While visual saliency and enhancement have been widely studied, acoustic highlighting remains underexplored, often leading to misalignment between visual and auditory focus. Existing approaches use discriminative models, which struggle with the inherent ambiguity in audio remixing, where no natural one-to-one mapping exists between poorly-balanced and well-balanced audio mixes. To address this limitation, we reframe this task as a generative problem and introduce a Conditional Flow Matching (CFM) framework. A key challenge in iterative flow-based generation is that early prediction errors -- in selecting the correct source to enhance -- compound over steps and push trajectories off-manifold. To address this, we introduce a rollout loss that penalizes drift at the final step, encouraging self-correcting trajectories and stabilizing long-range flow integration. We further propose a conditioning module that fuses audio and visual cues before vector field regression, enabling explicit cross-modal source selection. Extensive quantitative and qualitative evaluations show that our method consistently surpasses the previous state-of-the-art discriminative approach, establishing that visually-guided audio remixing is best addressed through generative modeling.

2602.03104 2026-02-05 cs.HC cs.AI

"I'm happy even though it's not real": GenAI Photo Editing as a Remembering Experience

Yufeng Wu, Qing Li, Elise van den Hoven, A. Baki Kocaballi

详情
英文摘要

Generative Artificial Intelligence (GenAI) is increasingly integrated into photo applications on personal devices, making editing photographs easier than ever while potentially influencing the memories they represent. This study explores how and why people use GenAI to edit personal photos and how this shapes their remembering experience. We conducted a two-phase qualitative study with 12 participants: a photo editing session using a GenAI tool guided by the Remembering Experience (RX) dimensions, followed by semi-structured interviews where participants reflected on the editing process and results. Findings show that participants prioritised felt memory over factual accuracy. For different photo elements, environments were modified easily, however, editing was deemed unacceptable if it touched upon a person's identity. Editing processes brought positive and negative impacts, and itself also became a remembering experience. We further discuss potential benefits and risks of GenAI editing for remembering purposes and propose design implications for responsible GenAI.

2602.02552 2026-02-05 eess.IV cs.CV

Super-résolution non supervisée d'images hyperspectrales de télédétection utilisant un entraînement entièrement synthétique

Xinxin Xu, Yann Gousseau, Christophe Kervazo, Saïd Ladjal

Comments in French language

Journal ref GRETSI 2025: XXXe Colloque Francophone de Traitement du Signal et des Images, Strasbourg, France, August 2025

详情
英文摘要

Hyperspectral single image super-resolution (SISR) aims to enhance spatial resolution while preserving the rich spectral information of hyperspectral images. Most existing methods rely on supervised learning with high-resolution ground truth data, which is often unavailable in practice. To overcome this limitation, we propose an unsupervised learning approach based on synthetic abundance data. The hyperspectral image is first decomposed into endmembers and abundance maps through hyperspectral unmixing. A neural network is then trained to super-resolve these maps using data generated with the dead leaves model, which replicates the statistical properties of real abundances. The final super-resolution hyperspectral image is reconstructed by recombining the super-resolved abundance maps with the endmembers. Experimental results demonstrate the effectiveness of our method and the relevance of synthetic data for training.

2602.01928 2026-02-05 stat.ML cs.LG

Privacy Amplification by Missing Data

Simon Roburin, Rafaël Pinot, Erwan Scornet

详情
英文摘要

Privacy preservation is a fundamental requirement in many high-stakes domains such as medicine and finance, where sensitive personal data must be analyzed without compromising individual confidentiality. At the same time, these applications often involve datasets with missing values due to non-response, data corruption, or deliberate anonymization. Missing data is traditionally viewed as a limitation because it reduces the information available to analysts and can degrade model performance. In this work, we take an alternative perspective and study missing data from a privacy preservation standpoint. Intuitively, when features are missing, less information is revealed about individuals, suggesting that missingness could inherently enhance privacy. We formalize this intuition by analyzing missing data as a privacy amplification mechanism within the framework of differential privacy. We show, for the first time, that incomplete data can yield privacy amplification for differentially private algorithms.

2602.01589 2026-02-05 cs.GR cs.CV cs.LG math.AG

Two-chart Beltrami Optimization for Distortion-Controlled Spherical Bijection with Application to Brain Surface Registration

Zhehao Xu, Lok Ming Lui

详情
英文摘要

Many genus-0 surface mapping tasks such as landmark alignment, feature matching, and image-driven registration, can be reduced (via an initial spherical conformal map) to optimizing a spherical self-homeomorphism with controlled distortion. However, existing works lack efficient mechanisms to control the geometric distortion of the resulting mapping. To resolve this issue, we formulate this as a Beltrami-space optimization problem, where the angle distortion is encoded explicitly by the Beltrami differential and bijectivity can be enforced through the constraint $\|μ\|_{\infty}<1$. To make this practical on the sphere, we introduce the Spherical Beltrami Differential (SBD), a two-chart representation of quasiconformal self-maps of the unit sphere $\mathbb{S}^2$, together with cross-chart consistency conditions that yield a globally bijective spherical deformation (up to conformal automorphisms). Building on the Spectral Beltrami Network, we develop BOOST, a differentiable optimization framework that updates two Beltrami fields to minimize task-driven losses while regularizing distortion and enforcing consistency along the seam. Experiments on large-deformation landmark matching and intensity-based spherical registration demonstrate improved task performance meanwhile maintaining controlled distortion and robust bijective behavior. We also apply the method to cortical surface registration by aligning sulcal landmarks and matching cortical sulcal depth, achieving comparative or better registration performance without sacrificing geometric validity.

2602.00157 2026-02-05 q-bio.QM cs.AI q-bio.BM

ProDCARL: Reinforcement Learning-Aligned Diffusion Models for De Novo Antimicrobial Peptide Design

Fang Sheng, Mohammad Noaeen, Zahra Shakeri

详情
英文摘要

Antimicrobial resistance threatens healthcare sustainability and motivates low-cost computational discovery of antimicrobial peptides (AMPs). De novo peptide generation must optimize antimicrobial activity and safety through low predicted toxicity, but likelihood-trained generators do not enforce these goals explicitly. We introduce ProDCARL, a reinforcement-learning alignment framework that couples a diffusion-based protein generator (EvoDiff OA-DM 38M) with sequence property predictors for AMP activity and peptide toxicity. We fine-tune the diffusion prior on AMP sequences to obtain a domain-aware generator. Top-k policy-gradient updates use classifier-derived rewards plus entropy regularization and early stopping to preserve diversity and reduce reward hacking. In silico experiments show ProDCARL increases the mean predicted AMP score from 0.081 after fine-tuning to 0.178. The joint high-quality hit rate reaches 6.3\% with pAMP $>$0.7 and pTox $<$0.3. ProDCARL maintains high diversity, with $1-$mean pairwise identity equal to 0.929. Qualitative analyses with AlphaFold3 and ProtBERT embeddings suggest candidates show plausible AMP-like structural and semantic characteristics. ProDCARL serves as a candidate generator that narrows experimental search space, and experimental validation remains future work.

2601.18321 2026-02-05 cs.MM cs.CL cs.CV

Integrating Fine-Grained Audio-Visual Evidence for Robust Multimodal Emotion Reasoning

Zhixian Zhao, Wenjie Tian, Lei Xie

详情
英文摘要

Multimodal emotion analysis is shifting from static classification to generative reasoning. Beyond simple label prediction, robust affective reasoning must synthesize fine-grained signals such as facial micro-expressions and prosodic which shifts to decode the latent causality within complex social contexts. However, current Multimodal Large Language Models (MLLMs) face significant limitations in fine-grained perception, primarily due to data scarcity and insufficient cross-modal fusion. As a result, these models often exhibit unimodal dominance which leads to hallucinations in complex multimodal interactions, particularly when visual and acoustic cues are subtle, ambiguous, or even contradictory (e.g., in sarcastic scenery). To address this, we introduce SABER-LLM, a framework designed for robust multimodal reasoning. First, we construct SABER, a large-scale emotion reasoning dataset comprising 600K video clips, annotated with a novel six-dimensional schema that jointly captures audiovisual cues and causal logic. Second, we propose the structured evidence decomposition paradigm, which enforces a "perceive-then-reason" separation between evidence extraction and reasoning to alleviate unimodal dominance. The ability to perceive complex scenes is further reinforced by consistency-aware direct preference optimization, which explicitly encourages alignment among modalities under ambiguous or conflicting perceptual conditions. Experiments on EMER, EmoBench-M, and SABER-Test demonstrate that SABER-LLM significantly outperforms open-source baselines and achieves robustness competitive with closed-source models in decoding complex emotional dynamics. The dataset and model are available at https://github.com/zxzhao0/SABER-LLM.

2601.15313 2026-02-05 q-bio.NC cs.AI

Attention Is Not Retention: The Orthogonality Constraint in Infinite-Context Architectures

Oliver Zahn, Matt Beton, Simran Chana

Comments 32 Pages, 7 Figures

详情
英文摘要

Biological memory solves a problem that eludes current AI: storing specific episodic facts without corrupting general semantic knowledge. Complementary Learning Systems theory explains this through two subsystems - a fast hippocampal system using sparse, pattern-separated representations for episodes, and a slow neocortical system using distributed representations for statistical regularities. Current AI systems lack this separation, attempting to serve both functions through neural weights alone. We identify the Orthogonality Constraint: reliable memory requires orthogonal keys, but semantic embeddings cannot be orthogonal because training clusters similar concepts together. The result is Semantic Interference (connecting to what cognitive psychologists have long observed in human memory), where neural systems writing facts into shared continuous parameters collapse to near-random accuracy within tens of semantically related facts. Through semantic density (rho), the mean pairwise cosine similarity, we show collapse occurs at N=5 facts (rho > 0.6) or N ~ 20-75 (moderate rho). We validate across modalities: 16,309 Wikipedia facts, scientific measurements (rho = 0.96, 0.02% accuracy at N=10,000), and image embeddings (rho = 0.82, 0.05% at N=2,000). This failure is geometric - no increase in model capacity can overcome interference when keys share semantic overlap. We propose Knowledge Objects (KOs): structured facts with hash-based identity, controlled vocabularies, and explicit version chains. On Wikipedia facts, KO retrieval achieves 45.7% where Modern Hopfield Networks collapse to near-zero; hash-based retrieval maintains 100%. Production systems (Claude Memory, ChatGPT Memory) store unstructured text, causing schema drift (40-70% consistency) and version ambiguity. Knowledge Objects provide the discrete hippocampal component that enables reliable bicameral memory.

2601.12180 2026-02-05 cs.HC cs.MM cs.SD eess.AS

VidTune: Creating Video Soundtracks with Generative Music and Contextual Thumbnails

Mina Huh, C. Ailie Fraser, Dingzeyu Li, Mira Dontcheva, Bryan Wang

Comments Accepted to CHI 2026

详情
英文摘要

Music shapes the tone of videos, yet creators often struggle to find soundtracks that match their video's mood and narrative. Recent text-to-music models let creators generate music from text prompts, but our formative study (N=8) shows creators struggle to construct diverse prompts, quickly review and compare tracks, and understand their impact on the video. We present VidTune, a system that supports soundtrack creation by generating diverse music options from a creator's prompt and producing contextual thumbnails for rapid review. VidTune extracts representative video subjects to ground thumbnails in context, maps each track's valence and energy onto visual cues like color and brightness, and depicts prominent genres and instruments. Creators can refine tracks through natural language edits, which VidTune expands into new generations. In a controlled user study (N=12) and an exploratory case study (N=6), participants found VidTune helpful for efficiently reviewing and comparing music options and described the process as playful and enriching.

2601.09025 2026-02-05 eess.IV cs.LG eess.SP

Universal Latent Homeomorphic Manifolds: A Framework for Cross-Domain Representation Unification

Tong Wu, Tayab Uddin Wara, Daniel Hernandez, Sidong Lei

详情
英文摘要

We present the Universal Latent Homeomorphic Manifold (ULHM), a framework that unifies semantic representations (e.g., human descriptions, diagnostic labels) and observation-driven machine representations (e.g., pixel intensities, sensor readings) into a single latent structure. Despite originating from fundamentally different pathways, both modalities capture the same underlying reality. We establish \emph{homeomorphism}, a continuous bijection preserving topological structure, as the mathematical criterion for determining when latent manifolds induced by different semantic-observation pairs can be rigorously unified. This criterion provides theoretical guarantees for three critical applications: (1) semantic-guided sparse recovery from incomplete observations, (2) cross-domain transfer learning with verified structural compatibility, and (3) zero-shot compositional learning via valid transfer from semantic to observation space. Our framework learns continuous manifold-to-manifold transformations through conditional variational inference, avoiding brittle point-to-point mappings. We develop practical verification algorithms, including trust, continuity, and Wasserstein distance metrics, that empirically validate homeomorphic structure from finite samples. Experiments demonstrate: (1) sparse image recovery from 5% of CelebA pixels and MNIST digit reconstruction at multiple sparsity levels, (2) cross-domain classifier transfer achieving 86.73% accuracy from MNIST to Fashion-MNIST without retraining, and (3) zero-shot classification on unseen classes achieving 78.76% on CIFAR-10. Critically, the homeomorphism criterion determines when different semantic-observation pairs share compatible latent structure, enabling principled unification into universal representations and providing a mathematical foundation for decomposing general foundation models into domain-specific components.

2601.06514 2026-02-05 stat.ML cs.LG cs.NA math.NA math.OC math.ST stat.TH

Inference-Time Alignment for Diffusion Models via Variationally Stable Doob's Matching

Jinyuan Chang, Chenguang Duan, Yuling Jiao, Yi Xu, Jerry Zhijian Yang

详情
英文摘要

Inference-time alignment for diffusion models aims to adapt a pre-trained reference diffusion model toward a target distribution without retraining the reference score network, thereby preserving the generative capacity of the reference model while enforcing desired properties at the inference time. A central mechanism for achieving such alignment is guidance, which modifies the sampling dynamics through an additional drift term. In this work, we introduce variationally stable Doob's matching, a novel framework for provable guidance estimation grounded in Doob's $h$-transform. Our approach formulates guidance as the gradient of logarithm of an underlying Doob's $h$-function and employs gradient-regularized regression to simultaneously estimate both the $h$-function and its gradient, resulting in a consistent estimator of the guidance. Theoretically, we establish non-asymptotic convergence rates for the estimated guidance. Moreover, we analyze the resulting controllable diffusion processes and prove non-asymptotic convergence guarantees for the generated distributions in the 2-Wasserstein distance. Finally, we show that variationally stable guidance estimators are adaptive to unknown low dimensionality, effectively mitigating the curse of dimensionality under low-dimensional subspace assumptions.

2512.21005 2026-02-05 stat.ML cs.LG math.PR

Learning from Neighbors with PHIBP: Predicting Infectious Disease Dynamics in Data-Sparse Environments

Edwin Fong, Lancelot F. James, Juho Lee

Comments v2: Revised version incorporating peer review feedback from book chapter submission. Clarifies modeling objectives for infectious disease prediction and situates the work within a three-paper PHIBP framework, highlighting suitability for future AI/LLM plug-and-play model specification

详情
英文摘要

Modeling sparse count data, which arise across numerous scientific fields, presents significant statistical challenges. This chapter addresses these challenges in the context of infectious disease prediction, with a focus on predicting outbreaks in geographic regions that have historically reported zero cases. To this end, we present the detailed computational framework and experimental application of the Poisson Hierarchical Indian Buffet Process (PHIBP), with demonstrated success in handling sparse count data in microbiome and ecological studies. The PHIBP's architecture, grounded in the concept of absolute abundance, systematically borrows statistical strength from related regions and circumvents the known sensitivities of relative-rate methods to zero counts. Through a series of experiments on infectious disease data, we show that this principled approach provides a robust foundation for generating coherent predictive distributions and for the effective use of comparative measures such as alpha and beta diversity. The chapter's emphasis on algorithmic implementation and experimental results confirms that this unified framework delivers both accurate outbreak predictions and meaningful epidemiological insights in data-sparse settings.

2512.12742 2026-02-05 stat.ML cs.LG

A Novel Framework Using Variational Inference with Normalizing Flows to Train Transport Reversible Jump Proposals

Pingping Yin, Xiyun Jiao

详情
英文摘要

We propose a unified framework that employs variational inference (VI) with (conditional) normalizing flows (NFs) to train both between-model and within-model proposals for reversible jump Markov chain Monte Carlo, enabling efficient trans-dimensional Bayesian inference. In contrast to the transport reversible jump (TRJ) of Davies et al. (2023), which optimizes forward KL divergence using pilot samples from the complex target distribution, our approach minimizes the reverse KL divergence, requiring only samples from a simple base distribution and largely reducing computational cost. Especially, we develop a novel trans-dimensional VI method with conditional NFs to fit the conditional transport proposal of Davies et al. (2023). We use RealNVP flows to learn the model-specific transport maps used for constructing proposals so that the calculation is parallelizable. Our framework also provides accurate estimates of marginal likelihoods, which may facilitate efficient model comparison and help design rejection-free proposals. Extensive numerical studies demonstrate that the TRJ method trained under our framework achieves faster mixing compared to existing baselines.

2510.27675 2026-02-05 cs.SE cs.CR cs.LG

On the Difficulty of Selecting Few-Shot Examples for Effective LLM-based Vulnerability Detection

Md Abdul Hannan, Ronghao Ni, Chi Zhang, Limin Jia, Ravi Mangal, Corina S. Pasareanu

Comments Workshop on LLM Assisted Security and Trust Exploration (LAST-X) 2026

详情
英文摘要

Large language models (LLMs) have demonstrated impressive capabilities across a wide range of coding tasks, including summarization, translation, completion, and code generation. Despite these advances, detecting code vulnerabilities remains a challenging problem for LLMs. In-context learning (ICL) has emerged as an effective mechanism for improving model performance by providing a small number of labeled examples within the prompt. Prior work has shown, however, that the effectiveness of ICL depends critically on how these few-shot examples are selected. In this paper, we study two intuitive criteria for selecting few-shot examples for ICL in the context of code vulnerability detection. The first criterion leverages model behavior by prioritizing samples on which the LLM consistently makes mistakes, motivated by the intuition that such samples can expose and correct systematic model weaknesses. The second criterion selects examples based on semantic similarity to the query program, using k-nearest-neighbor retrieval to identify relevant contexts. We conduct extensive evaluations using open-source LLMs and datasets spanning multiple programming languages. Our results show that for Python and JavaScript, careful selection of few-shot examples can lead to measurable performance improvements in vulnerability detection. In contrast, for C and C++ programs, few-shot example selection has limited impact, suggesting that more powerful but also more expensive approaches, such as re-training or fine-tuning, may be required to substantially improve model performance.

2510.26275 2026-02-05 cs.SE cs.AI cs.ET cs.LG cs.MA

A Research Roadmap for Augmenting Software Engineering Processes and Software Products with Generative AI

Domenico Amalfitano, Andreas Metzger, Marco Autili, Tommaso Fulcini, Tobias Hey, Jan Keim, Patrizio Pelliccione, Vincenzo Scotti, Anne Koziolek, Raffaela Mirandola, Andreas Vogelsang

详情
英文摘要

Generative AI (GenAI) is rapidly transforming software engineering (SE) practices, influencing how SE processes are executed, as well as how software systems are developed, operated, and evolved. This paper applies design science research to build a roadmap for GenAI-augmented SE. The process consists of three cycles that incrementally integrate multiple sources of evidence, including collaborative discussions from the FSE 2025 "Software Engineering 2030" workshop, rapid literature reviews, and external feedback sessions involving peers. McLuhan's tetrads were used as a conceptual instrument to systematically capture the transforming effects of GenAI on SE processes and software products. The resulting roadmap identifies four fundamental forms of GenAI augmentation in SE and systematically characterizes their related research challenges and opportunities. These insights are then consolidated into a set of future research directions. By grounding the roadmap in a rigorous multi-cycle process and cross-validating it among independent author teams and peers, the study provides a transparent and reproducible foundation for analyzing how GenAI affects SE processes, methods and tools, and for framing future research within this rapidly evolving area.

2510.04192 2026-02-05 cs.MA cs.AI

Cooperative Flexibility Exchange: Fair and Comfort-Aware Decentralized Resource Allocation

Rabiya Khalid, Evangelos Pournaras

详情
英文摘要

The growing electricity demand and use of smart appliances are placing pressure on power grids, making efficient energy management more important than ever. The existing energy management systems often prioritize system efficiency (balanced energy demand and supply) at the expense of consumer comfort. This paper addresses this gap by proposing a novel decentralized multi-agent coordination-based demand-side management system. The proposed system enables individual agents to coordinate for demand-side energy optimization while improving consumer comfort and maintaining system efficiency. A key innovation of this work is the introduction of a slot exchange mechanism, where agents first receive optimized appliance-level energy consumption schedules and then coordinate with each other to adjust these schedules through slot exchanges to improve their comfort even when agents show non-altruistic behaviour. It also scales well with large populations and promotes fairness by balancing satisfaction levels across consumers. For performance evaluation, a real-world dataset is used, and the results demonstrate that the proposed slot exchange mechanism increases consumer comfort and fairness without raising system inefficiency cost, making it a practical and scalable solution for future smart grids.

2510.03723 2026-02-05 eess.AS cs.CL cs.SD

Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition

Martin Kocour, Martin Karafiat, Alexander Polok, Dominik Klement, Lukáš Burget, Jan Černocký

详情
英文摘要

We propose a speaker-attributed (SA) Whisper-based model for multi-talker speech recognition that combines target-speaker modeling with serialized output training (SOT). Our approach leverages a Diarization-Conditioned Whisper (DiCoW) encoder to extract target-speaker embeddings, which are concatenated into a single representation and passed to a shared decoder. This enables the model to transcribe overlapping speech as a serialized output stream with speaker tags and timestamps. In contrast to target-speaker ASR systems such as DiCoW, which decode each speaker separately, our approach performs joint decoding, allowing the decoder to condition on the context of all speakers simultaneously. Experiments show that the model outperforms existing SOT-based approaches and surpasses DiCoW on multi-talker mixtures (e.g., LibriMix).

2510.00282 2026-02-05 physics.plasm-ph cs.LG physics.comp-ph

Electron neural closure for turbulent magnetosheath simulations: energy channels

George Miloshevich, Luka Vranckx, Felipe Nathan de Oliveira Lopes, Pietro Dazzi, Giuseppe Arrò, Giovanni Lapenta

Comments 19 pages, 10 figures, 4 tables

Journal ref Phys. Plasmas 33, 012901 (2026)

详情
英文摘要

In this work, we introduce a non-local five-moment electron pressure tensor closure parametrized by a Fully Convolutional Neural Network (FCNN). Electron pressure plays an important role in generalized Ohm's law, competing with electron inertia. This model is used in the development of a surrogate model for a fully kinetic energy-conserving semi-implicit Particle-in-Cell simulation of decaying magnetosheath turbulence. We achieve this by training FCNN on a representative set of simulations with a smaller number of particles per cell and showing that our results generalise to a simulation with a large number of particles per cell. We evaluate the statistical properties of the learned equation of state, with a focus on pressure-strain interaction, which is crucial for understanding energy channels in turbulent plasmas. The resulting equation of state learned via FCNN significantly outperforms local closures, such as those learned by Multi-Layer Perceptron (MLP) or double adiabatic expressions. We report that the overall spatial distribution of pressure-strain and its conditional averages are reconstructed well. However, some small-scale features are missed, especially for the off-diagonal components of the pressure tensor. Nevertheless, the results are substantially improved with more training data, indicating favorable scaling and potential for improvement, which will be addressed in future work.

2509.25783 2026-02-05 stat.ML cs.LG math.OC

Sharpness of Minima in Deep Matrix Factorization

Anil Kamber, Rahul Parhi

Comments 18 pages, 7 figures

详情
英文摘要

Understanding the geometry of the loss landscape near a minimum is key to explaining the implicit bias of gradient-based methods in non-convex optimization problems such as deep neural network training and deep matrix factorization. A central quantity to characterize this geometry is the maximum eigenvalue of the Hessian of the loss. Currently, its precise role has been obfuscated because no exact expressions for this sharpness measure were known in general settings. In this paper, we present the first exact expression for the maximum eigenvalue of the Hessian of the squared-error loss at any minimizer in deep matrix factorization/deep linear neural network training problems, resolving an open question posed by Mulayoff & Michaeli (2020). This expression reveals a fundamental property of the loss landscape in deep matrix factorization: Having a constant product of the spectral norms of the left and right intermediate factors across layers is a sufficient condition for flatness. Most notably, in both depth-$2$ matrix and deep overparameterized scalar factorization, we show that this condition is both necessary and sufficient for flatness, which implies that flat minima are spectral-norm balanced even though they are not necessarily Frobenius-norm balanced. To complement our theory, we provide the first empirical characterization of an escape phenomenon during gradient-based training near a minimizer of a deep matrix factorization problem.