arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1289
专题追踪
2602.12465 2026-02-16 quant-ph cs.LG

Probabilistic Design of Parametrized Quantum Circuits through Local Gate Modifications

Grier M. Jones, Aviraj Newatia, Alexander Lao, Aditya K. Rao, Viki Kumar Prasad, Hans-Arno Jacobsen

详情
英文摘要

Within quantum machine learning, parametrized quantum circuits provide flexible quantum models, but their performance is often highly task-dependent, making manual circuit design challenging. Alternatively, quantum architecture search algorithms have been proposed to automate the discovery of task-specific parametrized quantum circuits using systematic frameworks. In this work, we propose an evolution-inspired heuristic quantum architecture search algorithm, which we refer to as the local quantum architecture search. The goal of the local quantum architecture search algorithm is to optimize parametrized quantum circuit architectures through a local, probabilistic search over a fixed set of gate-level actions applied to existing circuits. We evaluate the local quantum architecture search algorithm on two synthetic function-fitting regression tasks and two quantum chemistry regression datasets, including the BSE49 dataset of bond separation energies for first- and second-row elements and a dataset of water conformers generated using the data-driven coupled-cluster approach. Using state-vector simulation, our results highlight the applicability of local quantum architecture search algorithm for identifying competitive circuit architectures with desirable performance metrics. Lastly, we analyze the properties of the discovered circuits and demonstrate the deployment of the best-performing model on state-of-the-art quantum hardware.

2602.12463 2026-02-16 math.HO cs.AI

Correctness, Artificial Intelligence, and the Epistemic Value of Mathematical Proof

James Owen Weatherall, Jesse Wolfson

Comments 43 pages

详情
英文摘要

We argue that it is neither necessary nor sufficient for a mathematical proof to have epistemic value that it be "correct", in the sense of formalizable in a formal proof system. We then present a view on the relationship between mathematics and logic that clarifies the role of formal correctness in mathematics. Finally, we discuss the significance of these arguments for recent discussions about automated theorem provers and applications of AI to mathematics.

2602.12438 2026-02-16 math.DG cs.LG hep-th

Neural and numerical methods for $\mathrm{G}_2$-structures on contact Calabi-Yau 7-manifolds

Elli Heyes, Edward Hirst, Henrique N. Sá Earp, Tomás S. R. Silva

Comments 8+5 pages, 9 figures

详情
英文摘要

A numerical framework for approximating $\mathrm{G}_2$-structure 3-forms on contact Calabi-Yau manifolds is presented. The approach proceeds in three stages: first, existing neural network models are employed to compute an approximate Ricci-flat metric on a Calabi-Yau threefold. Second, using this metric and the explicit construction of a $\mathrm{G}_2$-structure on the associated 7-dimensional Calabi-Yau link in the 9-sphere, numerical approximations of the 3-form are generated on a large set of sampled points. Finally, a dedicated neural architecture is trained to learn the 3-form and its induced Riemannian metric directly from data, validating the learned structure and its torsion via a numerical implementation of the exterior derivative, which may be of independent interest.

2602.12426 2026-02-16 eess.SP cs.LG cs.MA

Interference-Robust Non-Coherent Over-the-Air Computation for Decentralized Optimization

Nicolò Michelusi

Comments To appear at IEEE ICC 2026

详情
英文摘要

Non-coherent over-the-air (NCOTA) computation enables low-latency and bandwidth-efficient decentralized optimization by exploiting the average energy superposition property of wireless channels. It has recently been proposed as a powerful tool for executing consensus-based optimization algorithms in fully decentralized systems. A key advantage of NCOTA is that it enables unbiased consensus estimation without channel state information at either transmitters or receivers, requires no transmission scheduling, and scales efficiently to dense network deployments. However, NCOTA is inherently susceptible to external interference, which can bias the consensus estimate and deteriorate the convergence of the underlying decentralized optimization algorithm. In this paper, we propose a novel interference-robust (IR-)NCOTA scheme. The core idea is to apply a coordinated random rotation of the frame of reference across all nodes, and transmit a pseudo-random pilot signal, allowing to transform external interference into a circularly symmetric distribution with zero mean relative to the rotated frame. This ensures that the consensus estimates remain unbiased, preserving the convergence guarantees of the underlying optimization algorithm. Through numerical results on a classification task, it is demonstrated that IR-NCOTA exhibits superior performance over the baseline NCOTA algorithm in the presence of external interference.

2602.12422 2026-02-16 cs.AR cs.AI cs.LG

CacheMind: From Miss Rates to Why -- Natural-Language, Trace-Grounded Reasoning for Cache Replacement

Kaushal Mhapsekar, Azam Ghanbari, Bita Aslrousta, Samira Mirbagher-Ajorpaz

Comments 16 pages, 13 figures, ASPLOS 2026

详情
英文摘要

Cache replacement remains a challenging problem in CPU microarchitecture, often addressed using hand-crafted heuristics, limiting cache performance. Cache data analysis requires parsing millions of trace entries with manual filtering, making the process slow and non-interactive. To address this, we introduce CacheMind, a conversational tool that uses Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) to enable semantic reasoning over cache traces. Architects can now ask natural language questions like, "Why is the memory access associated with PC X causing more evictions?", and receive trace-grounded, human-readable answers linked to program semantics for the first time. To evaluate CacheMind, we present CacheMindBench, the first verified benchmark suite for LLM-based reasoning for the cache replacement problem. Using the SIEVE retriever, CacheMind achieves 66.67% on 75 unseen trace-grounded questions and 84.80% on 25 unseen policy-specific reasoning tasks; with RANGER, it achieves 89.33% and 64.80% on the same evaluations. Additionally, with RANGER, CacheMind achieves 100% accuracy on 4 out of 6 categories in the trace-grounded tier of CacheMindBench. Compared to LlamaIndex (10% retrieval success), SIEVE achieves 60% and RANGER achieves 90%, demonstrating that existing Retrieval-Augmented Generation (RAGs) are insufficient for precise, trace-grounded microarchitectural reasoning. We provided four concrete actionable insights derived using CacheMind, wherein bypassing use case improved cache hit rate by 7.66% and speedup by 2.04%, software fix use case gives speedup of 76%, and Mockingjay replacement policy use case gives speedup of 0.7%; showing the utility of CacheMind on non-trivial queries that require a natural-language interface.

2602.12418 2026-02-16 cs.CR cs.CL cs.LG

Sparse Autoencoders are Capable LLM Jailbreak Mitigators

Yannick Assogba, Jacopo Cortellazzi, Javier Abad, Pau Rodriguez, Xavier Suau, Arno Blaas

Comments 26 pages, 14 figures, 3 tables

详情
英文摘要

Jailbreak attacks remain a persistent threat to large language model safety. We propose Context-Conditioned Delta Steering (CC-Delta), an SAE-based defense that identifies jailbreak-relevant sparse features by comparing token-level representations of the same harmful request with and without jailbreak context. Using paired harmful/jailbreak prompts, CC-Delta selects features via statistical testing and applies inference-time mean-shift steering in SAE latent space. Across four aligned instruction-tuned models and twelve jailbreak attacks, CC-Delta achieves comparable or better safety-utility tradeoffs than baseline defenses operating in dense latent space. In particular, our method clearly outperforms dense mean-shift steering on all four models, and particularly against out-of-distribution attacks, showing that steering in sparse SAE feature space offers advantages over steering in dense activation space for jailbreak mitigation. Our results suggest off-the-shelf SAEs trained for interpretability can be repurposed as practical jailbreak defenses without task-specific training.

2602.12387 2026-02-16 quant-ph cs.LG

Accelerating Feedback-based Algorithms for Quantum Optimization Using Gradient Descent

Masih Mozakka, Mohsen Heidari

Comments 10 pages, 6 figures

详情
英文摘要

Feedback-based methods have gained significant attention as an alternative training paradigm for the Quantum Approximate Optimization Algorithm (QAOA) in solving combinatorial optimization problems such as MAX-CUT. In particular, Quantum Lyapunov Control (QLC) employs feedback-driven control laws that guarantee monotonic non-decreasing objective values, can substantially reduce the training overhead of QAOA, and mitigate barren plateaus. However, these methods might require long control sequences, leading to sub-optimal convergence rates. In this work, we propose a hybrid method that incorporates per-layer gradient estimation to accelerate the convergence of QLC while preserving its low training overhead and stability guarantees. By leveraging layer-wise gradient information, the proposed approach selects near-optimal control parameters, resulting in significantly faster convergence and improved robustness. We validate the effectiveness of the method through extensive numerical experiments across a range of problem instances and optimization settings.

2602.12349 2026-02-16 cs.GR cs.LG

Variational Green's Functions for Volumetric PDEs

Joao Teixeira, Eitan Grinspun, Otman Benchekroun

详情
英文摘要

Green's functions characterize the fundamental solutions of partial differential equations; they are essential for tasks ranging from shape analysis to physical simulation, yet they remain computationally prohibitive to evaluate on arbitrary geometric discretizations. We present Variational Green's Function (VGF), a method that learns a smooth, differentiable representation of the Green's function for linear self-adjoint PDE operators, including the Poisson, the screened Poisson, and the biharmonic equations. To resolve the sharp singularities characteristic of the Green's functions, our method decomposes the Green's function into an analytic free-space component, and a learned corrector component. Our method leverages a variational foundation to impose Neumann boundary conditions naturally, and imposes Dirichlet boundary conditions via a projective layer on the output of the neural field. The resulting Green's functions are fast to evaluate, differentiable with respect to source application, and can be conditioned on other signals parameterizing our geometry.

2602.12317 2026-02-16 q-bio.QM cs.AI cs.LG

Free Lunch in Medical Image Foundation Model Pre-training via Randomized Synthesis and Disentanglement

Yuhan Wei, Yuting He, Linshan Wu, Fuxiang Huang, Junlin Hou, Hao Chen

详情
英文摘要

Medical image foundation models (MIFMs) have demonstrated remarkable potential for a wide range of clinical tasks, yet their development is constrained by the scarcity, heterogeneity, and high cost of large-scale annotated datasets. Here, we propose RaSD (Randomized Synthesis and Disentanglement), a scalable framework for pre-training MIFMs entirely on synthetic data. By modeling anatomical structures and appearance variations with randomized Gaussian distributions, RaSD exposes models to sufficient multi-scale structural and appearance perturbations, forcing them to rely on invariant and task-relevant anatomical cues rather than dataset-specific textures, thereby enabling robust and transferable representation learning. We pre-trained RaSD on 1.2 million 3D volumes and 9.6 million 2D images, and extensively evaluated the resulting models across 6 imaging modalities, 48 datasets, and 56 downstream tasks. Across all evaluated downstream tasks, RaSD consistently outperforms training-from-scratch models, achieves the best performance on 17 tasks, and remains comparable to models pre-trained on large real datasets in most others. These results demonstrate that the capacity of synthetic data alone to drive robust representation learning. Our findings establish a paradigm shift in medical AI, demonstrating that synthetic data can serve as a "free lunch" for scalable, privacy-preserving, and clinically generalizable foundation models.

2602.12315 2026-02-16 cs.IR cs.AI

AgenticShop: Benchmarking Agentic Product Curation for Personalized Web Shopping

Sunghwan Kim, Ryang Heo, Yongsik Seo, Jinyoung Yeo, Dongha Lee

Comments Accepted at WWW 2026

详情
英文摘要

The proliferation of e-commerce has made web shopping platforms key gateways for customers navigating the vast digital marketplace. Yet this rapid expansion has led to a noisy and fragmented information environment, increasing cognitive burden as shoppers explore and purchase products online. With promising potential to alleviate this challenge, agentic systems have garnered growing attention for automating user-side tasks in web shopping. Despite significant advancements, existing benchmarks fail to comprehensively evaluate how well agentic systems can curate products in open-web settings. Specifically, they have limited coverage of shopping scenarios, focusing only on simplified single-platform lookups rather than exploratory search. Moreover, they overlook personalization in evaluation, leaving unclear whether agents can adapt to diverse user preferences in realistic shopping contexts. To address this gap, we present AgenticShop, the first benchmark for evaluating agentic systems on personalized product curation in open-web environment. Crucially, our approach features realistic shopping scenarios, diverse user profiles, and a verifiable, checklist-driven personalization evaluation framework. Through extensive experiments, we demonstrate that current agentic systems remain largely insufficient, emphasizing the need for user-side systems that effectively curate tailored products across the modern web.

2602.12311 2026-02-16 cs.SE cs.AI

Perceptual Self-Reflection in Agentic Physics Simulation Code Generation

Prashant Shende, Bradley Camburn

Comments 15 pages, 2 figures, 2 tables. Introduces a multi-agent architecture for physics simulation code generation with perceptual self-reflection via vision-based validation. Includes qualitative evaluation across multiple physics domains

详情
英文摘要

We present a multi-agent framework for generating physics simulation code from natural language descriptions, featuring a novel perceptual self-reflection mechanism for validation. The system employs four specialized agents: a natural language interpreter that converts user requests into physics-based descriptions; a technical requirements generator that produces scaled simulation parameters; a physics code generator with automated self-correction; and a physics validator that implements perceptual self-reflection. The key innovation is perceptual validation, which analyzes rendered animation frames using a vision-capable language model rather than inspecting code structure directly. This approach addresses the ``oracle gap'' where syntactically correct code produces physically incorrect behavior--a limitation that conventional testing cannot detect. We evaluate the system across seven domains including classical mechanics, fluid dynamics, thermodynamics, electromagnetics, wave physics, reaction-diffusion systems, and non-physics data visualization. The perceptual self-reflection architecture demonstrates substantial improvement over single-shot generation baselines, with the majority of tested scenarios achieving target physics accuracy thresholds. The system exhibits robust pipeline stability with consistent code self-correction capability, operating at approximately \$0.20 per animation. These results validate our hypothesis that feeding visual simulation outputs back to a vision-language model for iterative refinement significantly outperforms single-shot code generation for physics simulation tasks and highlights the potential of agentic AI to support engineering workflows and physics data generation pipelines.

2602.12306 2026-02-16 eess.IV cs.AI cs.CV cs.ET cs.IT math.IT

Quantum walk inspired JPEG compression of images

Abhishek Verma, Sahil Tomar, Sandeep Kumar

Comments 8 pages

详情
英文摘要

This work proposes a quantum inspired adaptive quantization framework that enhances the classical JPEG compression by introducing a learned, optimized Qtable derived using a Quantum Walk Inspired Optimization (QWIO) search strategy. The optimizer searches a continuous parameter space of frequency band scaling factors under a unified rate distortion objective that jointly considers reconstruction fidelity and compression efficiency. The proposed framework is evaluated on MNIST, CIFAR10, and ImageNet subsets, using Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM), Bits Per Pixel (BPP), and error heatmap visual analysis as evaluation metrics. Experimental results show average gains ranging from 3 to 6 dB PSNR, along with better structural preservation of edges, contours, and luminance transitions, without modifying decoder compatibility. The structure remains JPEG compliant and can be implemented using accessible scientific packages making it ideal for deployment and practical research use.

2602.12299 2026-02-16 eess.AS cs.SD eess.SP

Acoustivision Pro: An Open-Source Interactive Platform for Room Impulse Response Analysis and Acoustic Characterization

Mandip Goswami

详情
英文摘要

Room acoustics analysis plays a central role in architectural design, audio engineering, speech intelligibility assessment, and hearing research. Despite the availability of standardized metrics such as reverberation time, clarity, and speech transmission index, accessible tools that combine rigorous signal processing with intuitive visualization remain scarce. This paper presents AcoustiVision Pro, an open-source web-based platform for comprehensive room impulse response (RIR) analysis. The system computes twelve distinct acoustic parameters from uploaded or dataset-sourced RIRs, provides interactive 3D visualizations of early reflections, generates frequency-dependent decay characteristics through waterfall plots, and checks compliance against international standards including ANSI S12.60 and ISO 3382. We introduce the accompanying RIRMega and RIRMega Speech datasets hosted on Hugging Face, containing thousands of simulated room impulse responses with full metadata. The platform supports real-time auralization through FFT-based convolution, exports detailed PDF reports suitable for engineering documentation, and provides CSV data export for further analysis. We describe the mathematical foundations underlying each acoustic metric, detail the system architecture, and present preliminary case studies demonstrating the platform's utility across diverse application domains including classroom acoustics, healthcare facility design, and recording studio evaluation.

2602.12296 2026-02-16 eess.SY cs.AI cs.SY

Adaptive traffic signal control optimization using a novel road partition and multi-channel state representation method

Maojiang Deng, Shoufeng Lu, Jiazhao Shi, Wen Zhang

详情
英文摘要

This study proposes a novel adaptive traffic signal control method leveraging a Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) to optimize signal timing by integrating variable cell length and multi-channel state representation. A road partition formula consisting of the sum of logarithmic and linear functions was proposed. The state variables are a vector composed of three channels: the number of vehicles, the average speed, and space occupancy. The set of available signal phases constitutes the action space, the selected phase is executed with a fixed green time. The reward function is formulated using the absolute values of key traffic state metrics - waiting time, speed, and fuel consumption. Each metric is normalized by a typical maximum value and assigned a weight that reflects its priority and optimization direction. The simulation results, using Sumo-TensorFlow-Python, demonstrate a cross-range transferability evaluation and show that the proposed variable cell length and multi-channel state representation method excels compared to fixed cell length in optimization performance.

2602.10271 2026-02-16 cs.IR cs.CL

MLDocRAG: Multimodal Long-Context Document Retrieval Augmented Generation

Yongyue Zhang, Yaxiong Wu

Comments 15 pages

详情
英文摘要

Understanding multimodal long-context documents that comprise multimodal chunks such as paragraphs, figures, and tables is challenging due to (1) cross-modal heterogeneity to localize relevant information across modalities, (2) cross-page reasoning to aggregate dispersed evidence across pages. To address these challenges, we are motivated to adopt a query-centric formulation that projects cross-modal and cross-page information into a unified query representation space, with queries acting as abstract semantic surrogates for heterogeneous multimodal content. In this paper, we propose a Multimodal Long-Context Document Retrieval Augmented Generation (MLDocRAG) framework that leverages a Multimodal Chunk-Query Graph (MCQG) to organize multimodal document content around semantically rich, answerable queries. MCQG is constructed via a multimodal document expansion process that generates fine-grained queries from heterogeneous document chunks and links them to their corresponding content across modalities and pages. This graph-based structure enables selective, query-centric retrieval and structured evidence aggregation, thereby enhancing grounding and coherence in multimodal long-context question answering. Experiments on datasets MMLongBench-Doc and LongDocURL demonstrate that MLDocRAG consistently improves retrieval quality and answer accuracy, demonstrating its effectiveness for multimodal long-context understanding.

2602.10168 2026-02-16 q-bio.QM cs.AI cs.LG

EVA: Towards a universal model of the immune system

Scienta Team, Ethan Bandasack, Vincent Bouget, Apolline Bruley, Yannis Cattan, Charlotte Claye, Matthew Corney, Julien Duquesne, Karim El Kanbi, Aziz Fouché, Pierre Marschall, Francesco Strozzi

详情
英文摘要

The effective application of foundation models to translational research in immune-mediated diseases requires multimodal patient-level representations that can capture complex phenotypes emerging from multicellular interactions. Yet most current biological foundation models focus only on single-cell resolution and are evaluated on technical metrics often disconnected from actual drug development tasks and challenges. Here, we introduce EVA, the first cross-species, multimodal foundation model of immunology and inflammation, a therapeutic area where shared pathogenic mechanisms create unique opportunities for transfer learning. EVA harmonizes transcriptomics data across species, platforms, and resolutions, and integrates histology data to produce rich, unified patient representations. We establish clear scaling laws, demonstrating that increasing model size and compute translates to improvements in both pretraining and downstream tasks performance. We introduce a comprehensive evaluation suite of 39 tasks spanning the drug development pipeline: zero-shot target efficacy and gene function prediction for discovery, cross-species or cross-diseases molecular perturbations for preclinical development, and patient stratification with treatment response prediction or disease activity prediction for clinical trials applications. We benchmark EVA against several state-of-the-art biological foundation models and baselines on these tasks, and demonstrate state-of-the-art results on each task category. Using mechanistic interpretability, we further identify biological meaningful features, revealing intertwined representations across species and technologies. We release an open version of EVA for transcriptomics to accelerate research on immune-mediated diseases.

2602.09394 2026-02-16 stat.ML cs.AI cs.IT cs.LG math.IT

The Critical Horizon: Inspection Design Principles for Multi-Stage Operations and Deep Reasoning

Seyed Morteza Emadi

Comments 50 pages, 5 figures

详情
英文摘要

Manufacturing lines, service journeys, supply chains, and AI reasoning chains share a common challenge: attributing a terminal outcome to the intermediate stage that caused it. We establish an information-theoretic barrier to this credit assignment problem: the signal connecting early steps to final outcomes decays exponentially with depth, creating a critical horizon beyond which reliable learning from endpoint data alone requires exponentially many samples. We prove four results. First, a Signal Decay Bound: sample complexity for attributing outcomes to early stages grows exponentially in the number of intervening steps. Second, Width Limits: parallel rollouts provide only logarithmic relief, with correlation capping the effective number of independent samples. Third, an Objective Mismatch: additive reward aggregation optimizes the wrong quantity when sequential validity requires all steps to be correct. Fourth, Optimal Inspection Design: uniform checkpoint spacing is minimax-optimal under homogeneous signal attenuation, while a greedy algorithm yields optimal non-uniform schedules under heterogeneous attenuation. Together, these results provide a common analytical foundation for inspection design in operations and supervision design in AI.

2602.09064 2026-02-16 cs.SE cs.AI cs.LG

Predicting Open Source Software Sustainability with Deep Temporal Neural Hierarchical Architectures and Explainable AI

S M Rakib Ul Karim, Wenyi Lu, Enock Kasaadha, Sean Goggins

详情
英文摘要

Open Source Software (OSS) projects follow diverse lifecycle trajectories shaped by evolving patterns of contribution, coordination, and community engagement. Understanding these trajectories is essential for stakeholders seeking to assess project organization and health at scale. However, prior work has largely relied on static or aggregated metrics, such as project age or cumulative activity, providing limited insight into how OSS sustainability unfolds over time. In this paper, we propose a hierarchical predictive framework that models OSS projects as belonging to distinct lifecycle stages grounded in established socio-technical categorizations of OSS development. Rather than treating sustainability solely as project longevity, these lifecycle stages operationalize sustainability as a multidimensional construct integrating contribution activity, community participation, and maintenance dynamics. The framework combines engineered tabular indicators with 24-month temporal activity sequences and employs a multi-stage classification pipeline to distinguish lifecycle stages associated with different coordination and participation regimes. To support transparency, we incorporate explainable AI techniques to examine the relative contribution of feature categories to model predictions. Evaluated on a large corpus of OSS repositories, the proposed approach achieves over 94\% overall accuracy in lifecycle stage classification. Attribution analyses consistently identify contribution activity and community-related features as dominant signals, highlighting the central role of collective participation dynamics.

2602.07466 2026-02-16 math.NA cs.LG cs.NA

Learned Finite Element-based Regularization of the Inverse Problem in Electrocardiographic Imaging

Manuel Haas, Thomas Grandits, Thomas Pinetz, Thomas Beiert, Simone Pezzuto, Alexander Effland

详情
英文摘要

Electrocardiographic imaging (ECGI) seeks to reconstruct cardiac electrical activity from body-surface potentials noninvasively. However, the associated inverse problem is severely ill-posed and requires robust regularization. While classical approaches primarily employ spatial smoothing, the temporal structure of cardiac dynamics remains underexploited despite its physiological relevance. We introduce a space-time regularization framework that couples spatial regularization with a learned temporal Fields-of-Experts (FoE) prior to capture complex spatiotemporal activation patterns. We derive a finite element discretization on unstructured cardiac surface meshes, prove Mosco-convergence, and develop a scalable optimization algorithm capable of handling the FoE term. Numerical experiments on synthetic epicardial data demonstrate improved denoising and inverse reconstructions compared to handcrafted spatiotemporal methods, yielding solutions that are both robust to noise and physiologically plausible.

2601.22983 2026-02-16 cs.CR cs.LG

PIDSMaker: Building and Evaluating Provenance-based Intrusion Detection Systems

Tristan Bilot, Baoxiang Jiang, Thomas Pasquier

详情
英文摘要

Recent provenance-based intrusion detection systems (PIDSs) have demonstrated strong potential for detecting advanced persistent threats (APTs) by applying machine learning to system provenance graphs. However, evaluating and comparing PIDSs remains difficult: prior work uses inconsistent preprocessing pipelines, non-standard dataset splits, and incompatible ground-truth labeling and metrics. These discrepancies undermine reproducibility, impede fair comparison, and impose substantial re-implementation overhead on researchers. We present PIDSMaker, an open-source framework for developing and evaluating PIDSs under consistent protocols. PIDSMaker consolidates eight state-of-the-art systems into a modular, extensible architecture with standardized preprocessing and ground-truth labels, enabling consistent experiments and apples-to-apples comparisons. A YAML-based configuration interface supports rapid prototyping by composing components across systems without code changes. PIDSMaker also includes utilities for ablation studies, hyperparameter tuning, multi-run instability measurement, and visualization, addressing methodological gaps identified in prior work. We demonstrate PIDSMaker through concrete use cases and release it with preprocessed datasets and labels to support shared evaluation for the PIDS community.

2601.15673 2026-02-16 cs.IR cs.AI

Enhancing guidance for missing data in diffusion-based sequential recommendation

Qilong Yan, Yifei Xing, Dugang Liu, Jingpu Duan, Jian Yin

Comments ICASSP 2026 accecpted

详情
英文摘要

Contemporary sequential recommendation methods are becoming more complex, shifting from classification to a diffusion-guided generative paradigm. However, the quality of guidance in the form of user information is often compromised by missing data in the observed sequences, leading to suboptimal generation quality. Existing methods address this by removing locally similar items, but overlook ``critical turning points'' in user interest, which are crucial for accurately predicting subsequent user intent. To address this, we propose a novel Counterfactual Attention Regulation Diffusion model (CARD), which focuses on amplifying the signal from key interest-turning-point items while concurrently identifying and suppressing noise within the user sequence. CARD consists of (1) a Dual-side Thompson Sampling method to identify sequences undergoing significant interest shift, and (2) a counterfactual attention mechanism for these sequences to quantify the importance of each item. In this manner, CARD provides the diffusion model with a high-quality guidance signal composed of dynamically re-weighted interaction vectors to enable effective generation. Experiments show our method works well on real-world data without being computationally expensive. Our code is available at https://github.com/yanqilong3321/CARD.

2511.02083 2026-02-16 cs.CR cs.AI cs.CY

Watermarking Discrete Diffusion Language Models

Avi Bagchi, Akhil Bhimaraju, Moulik Choraria, Daniel Alabi, Lav R. Varshney

详情
英文摘要

Watermarking has emerged as a promising technique to track AI-generated content and differentiate it from authentic human creations. While prior work extensively studies watermarking for autoregressive large language models (LLMs) and image diffusion models, it remains comparatively underexplored for discrete diffusion language models (DDLMs), which are becoming popular due to their high inference throughput. In this paper, we introduce one of the first watermarking methods for DDLMs. Our approach applies a distribution-preserving Gumbel-max sampling trick at every diffusion step and seeds the randomness by sequence position to enable reliable detection. We empirically demonstrate reliable detectability on LLaDA, a state-of-the-art DDLM. We also analytically prove that the watermark is distortion-free, with a false detection probability that decays exponentially in the sequence length. A key practical advantage is that our method realizes desired watermarking properties with no expensive hyperparameter tuning, making it straightforward to deploy and scale across models and benchmarks.

2511.02065 2026-02-16 eess.IV cs.CV

Direct Kernel Optimization: Efficient Design for Opto-Electronic Convolutional Neural Networks

Ali Almuallem, Harshana Weligampola, Abhiram Gnanasambandam, Wei Xu, Dilshan Godaliyadda, Hamid R. Sheikh, Stanley H. Chan, Qi Guo

详情
英文摘要

Hybrid opto-electronic neural networks combine optical front-ends with electronic back-ends to perform vision tasks, but joint end-to-end (E2E) optimization of optical and electronic components is computationally expensive due to large parameter spaces and repeated optical convolutions. We propose Direct Kernel Optimization (DKO), a two-stage training framework that first trains a conventional electronic CNN and then synthesizes optical kernels to replicate the first-layer convolutional filters, reducing optimization dimensionality and avoiding hefty simulated optical convolutions during optimization. We evaluate DKO in simulation on a monocular depth estimation model and show that it achieves twice the accuracy of E2E training under equal computational budgets while reducing training time. Given the substantial computational challenges of optimizing hybrid opto-electronic systems, our results position DKO as a scalable optimization approach to train and realize these systems.

2510.24803 2026-02-16 cs.MA cs.AI

MASPRM: Multi-Agent System Process Reward Model

Milad Yazdani, Mahdi Mostajabdaveh, Zirui Zhou, Ying Xiong

详情
英文摘要

Practical deployment of multi-agent systems (MAS) demands strong performance at test time, motivating methods that guide search during inference and selectively spend compute to improve quality. We present the Multi-Agent System Process Reward Model (MASPRM). It assigns values to partial inter-agent transcripts for each action and each agent, and acts as a controller during inference. MASPRM is trained from multi-agent Monte Carlo Tree Search (MCTS) rollouts labeled only with terminal outcome rewards, without requiring human step-level annotations, by propagating returns to local targets. During inference, MASPRM guides step-level beam search (SBS) and MCTS, focusing computation on promising branches and pruning unpromising ones. We train and test MASPRM across different tasks and domains, using GSM8K, MATH, MMLU, and LogiQA as benchmarks. Averaged across these benchmarks, MASPRM improves Hit@1 over policy likelihood by up to $+13.4$ points and improves ranking quality, reducing Hit@1$->$Hit@5 gaps by up to $10.3$ points. MASPRM complements inference-time search by scoring intermediate routed transcripts to guide rollouts in MAS with fixed schedules. Code: https://github.com/milad1378yz/MASPRM

2509.10766 2026-02-16 cs.CR cs.AI

MetaSeal: Defending Against Image Attribution Forgery Through Content-Dependent Cryptographic Watermarks

Tong Zhou, Ruyi Ding, Gaowen Liu, Charles Fleming, Ramana Rao Kompella, Yunsi Fei, Xiaolin Xu, Shaolei Ren

Comments To appear at TMLR 2026. 26 pages, 14 figures

详情
英文摘要

The rapid growth of digital and AI-generated images has amplified the need for secure and verifiable methods of image attribution. While digital watermarking offers more robust protection than metadata-based approaches--which can be easily stripped--current watermarking techniques remain vulnerable to forgery, creating risks of misattribution that can damage the reputations of AI model developers and the rights of digital artists. The vulnerabilities of digital watermarking arise from two key issues: (1) content-agnostic watermarks, which, once learned or leaked, can be transferred across images to fake attribution, and (2) reliance on detector-based verification, which is unreliable since detectors can be tricked. We present MetaSeal, a novel framework for content-dependent watermarking with cryptographic security guarantees to safeguard image attribution. Our design provides (1) \textbf{forgery resistance}, preventing unauthorized replication and enforcing cryptographic verification; (2) \textbf{robust self-contained protection}, embedding attribution directly into images while maintaining robustness against benign transformations; and (3) \textbf{evidence of tampering}, making malicious alterations visually detectable. Experiments demonstrate that MetaSeal effectively mitigates forgery attempts and applies to both natural and AI-generated images, establishing a new standard for secure image attribution. Code is available at: https://github.com/Tongzhou0101/MetaSeal.

2507.23266 2026-02-16 eess.AS cs.SD

CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025

Aemon Yat Fei Chiu, Jingyu Li, Yusheng Tian, Guangyan Zhang, Tan Lee

Comments Accepted at the 20th National Conference on Man-Machine Speech Communication (NCMMSC 2025)

Journal ref In: Man-Machine Speech Communication. NCMMSC 2025. Communications in Computer and Information Science, vol 2662. Springer, Singapore (2026)

详情
英文摘要

This paper presents the Voice Timbre Attribute Detection (vTAD) systems developed by the Digital Signal Processing & Speech Technology Laboratory (DSP&STL) of the Department of Electronic Engineering (EE) at The Chinese University of Hong Kong (CUHK) for the 20th National Conference on Human-Computer Speech Communication (NCMMSC 2025) vTAD Challenge. The proposed systems leverage WavLM-Large embeddings with attentive statistical pooling (ASTP) to extract robust speaker representations, followed by two variants of Diff-Net, i.e., Feed-Forward Neural Network (FFN) and Squeeze-and-Excitation-enhanced Residual FFN (SE-ResFFN), to compare timbre attribute intensities between utterance pairs. Experimental results demonstrate that the WavLM-Large+FFN system generalises better to unseen speakers, achieving 77.96% accuracy and 21.79% equal error rate (EER), while the WavLM-Large+SE-ResFFN model excels in the 'Seen' setting with 94.42% accuracy and 5.49% EER. These findings highlight a trade-off between model complexity and generalisation, and underscore the importance of architectural choices in fine-grained speaker modelling. Our analysis also reveals the impact of speaker identity, annotation subjectivity, and data imbalance on system performance, pointing to future directions for improving robustness and fairness in timbre attribute detection.

2507.21372 2026-02-16 cs.NI cs.LG

Load Balancing for AI Training Workloads

Sarah McClure, Evyatar Cohen, Alex Shpiner, Mark Silberstein, Sylvia Ratnasamy, Scott Shenker, Isaac Keslassy

详情
英文摘要

The extreme bandwidth demands of AI training has made load-balancing a critical component in AI fabrics, and a variety of load-balancing designs have emerged in recent work from both industry and research. However, there is currently little consensus on which design approach dominates or the conditions under which an approach dominates. We also lack an understanding of how far these approaches are from optimal. We provide a technical foundation for answering these questions by systematically evaluating leading load-balancing designs, while decoupling them from specific congestion control and loss recovery stacks. We find that load-balancing based on packet spraying dominates traditional approaches that load balance traffic at flow, flowlet, or subflow granularities. When comparing host- vs switch-based approaches to packet spraying, we find that they perform similarly in failure-free scenarios but that a host-based approach dominates under link failure because of its rapid visibility into end-to-end path conditions. We also identify that no leading approach achieves optimal O(1) queue scaling at maximum utilization. We demonstrate why a destination-based rotation (DR) discipline can reach this optimum and introduce Ofan, a switch-based implementation of DR that we show offers valuable performance gains over other packet spraying approaches.

2505.19558 2026-02-16 cs.CY cs.LG

PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives

Zhaowei Zhang, Xiaobo Wang, Minghua Yi, Mengmeng Wang, Fengshuo Bai, Zilong Zheng, Yipeng Kang, Yaodong Yang

Comments Accepted by ICLR 2026

详情
英文摘要

Achieving political consensus is crucial yet challenging for the effective functioning of social governance. However, although frontier AI systems represented by large language models (LLMs) have developed rapidly in recent years, their capabilities in this scope are still understudied. In this paper, we introduce PoliCon, a novel benchmark constructed from 2,225 high-quality deliberation records of the European Parliament over 13 years, ranging from 2009 to 2022, to evaluate the ability of LLMs to draft consensus resolutions based on divergent party positions under varying collective decision-making contexts and political requirements. Specifically, PoliCon incorporates four factors to build each task environment for finding different political consensus: specific political issues, political goals, participating parties, and power structures based on seat distribution. We also developed an evaluation framework based on social choice theory for PoliCon, which simulates the real voting outcomes of different political parties to assess whether LLM-generated resolutions meet the requirements of the predetermined political consensus. Our experimental results demonstrate that even state-of-the-art models remain undersatisfied with complex tasks like passing resolutions by a two-thirds majority and addressing security issues, while uncovering their inherent partisan biases and revealing some behaviors LLMs show to achieve the consensus, such as prioritizing the stance of the dominant party instead of uniting smaller parties, which highlights PoliCon's promise as an effective platform for studying LLMs' ability to promote political consensus. The code and dataset are released at https://zowiezhang.github.io/projects/PoliCon.

2504.20101 2026-02-16 cs.DC cs.AI

PlanetServe: A Decentralized, Scalable, and Privacy-Preserving Overlay for Democratizing Large Language Model Serving

Fei Fang, Yifan Hua, Shengze Wang, Ruilin Zhou, Yi Liu, Chen Qian, Xiaoxue Zhang

详情
英文摘要

While significant progress has been made in research and development on open-source and cost-efficient large-language models (LLMs), serving scalability remains a critical challenge, particularly for small organizations and individuals seeking to deploy and test their LLM innovations. Inspired by peer-to-peer networks that leverage decentralized overlay nodes to increase throughput and availability, we propose GenTorrent, an LLM serving overlay that harnesses computing resources from decentralized contributors. We identify four key research problems inherent to enabling such a decentralized infrastructure: 1) overlay network organization; 2) LLM communication privacy; 3) overlay forwarding for resource efficiency; and 4) verification of serving quality. This work presents the first systematic study of these fundamental problems in the context of decentralized LLM serving. Evaluation results from a prototype implemented on a set of decentralized nodes demonstrate that GenTorrent achieves a latency reduction of over 50% compared to the baseline design without overlay forwarding. Furthermore, the security features introduce minimal overhead to serving latency and throughput. We believe this work pioneers a new direction for democratizing and scaling future AI serving capabilities.

2504.12579 2026-02-16 cs.CR cs.CL

Provable Secure Steganography Based on Adaptive Dynamic Sampling

Kaiyi Pang, Minhao Bai

详情
英文摘要

The security of private communication is increasingly at risk due to widespread surveillance. Steganography, a technique for embedding secret messages within innocuous carriers, enables covert communication over monitored channels. Provably Secure Steganography (PSS), which ensures computational indistinguishability between the normal model output and steganography output, is the state-of-the-art in this field. However, current PSS methods often require obtaining the explicit distributions of the model. In this paper, we propose a provably secure steganography scheme that only requires a model API that accepts a seed as input. Our core mechanism involves sampling a candidate set of tokens and constructing a map from possible message bit strings to these tokens. The output token is selected by applying this mapping to the real secret message, which provably preserves the original model's distribution. To ensure correct decoding, we address collision cases, where multiple candidate messages map to the same token, by maintaining and strategically expanding a dynamic collision set within a bounded size range. Extensive evaluations of three real-world datasets and three large language models demonstrate that our sampling-based method is comparable with existing PSS methods in efficiency and capacity.