arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 2393
专题追踪
2506.03407 2026-02-17 cs.GR cs.AI cs.CV cs.LG

Multi-Spectral Gaussian Splatting with Neural Color Representation

Lukas Meyer, Josef Grün, Maximilian Weiherer, Bernhard Egger, Marc Stamminger, Linus Franke

Comments for project page, see https://meyerls.github.io/ms_splatting

详情
英文摘要

We present MS-Splatting -- a multi-spectral 3D Gaussian Splatting (3DGS) framework that is able to generate multi-view consistent novel views from images of multiple, independent cameras with different spectral domains. In contrast to previous approaches, our method does not require cross-modal camera calibration and is versatile enough to model a variety of different spectra, including thermal and near-infra red, without any algorithmic changes. Unlike existing 3DGS-based frameworks that treat each modality separately (by optimizing per-channel spherical harmonics) and therefore fail to exploit the underlying spectral and spatial correlations, our method leverages a novel neural color representation that encodes multi-spectral information into a learned, compact, per-splat feature embedding. A shallow multi-layer perceptron (MLP) then decodes this embedding to obtain spectral color values, enabling joint learning of all bands within a unified representation. Our experiments show that this simple yet effective strategy is able to improve multi-spectral rendering quality, while also leading to improved per-spectra rendering quality over state-of-the-art methods. We demonstrate the effectiveness of this new technique in agricultural applications to render vegetation indices, such as normalized difference vegetation index (NDVI).

2505.12185 2026-02-17 cs.SE cs.CL cs.LG

EVALOOOP: A Self-Consistency-Centered Framework for Assessing Large Language Model Robustness in Programming

Sen Fang, Weiyuan Ding, Mengshi Zhang, Zihao Chen, Bowen Xu

Comments 27 pages, 7 figures

详情
英文摘要

Evaluating the programming robustness of large language models (LLMs) is paramount for ensuring their reliability in AI-based software development. However, adversarial attacks exhibit fundamental limitations that compromise fair robustness assessment: they demonstrate contradictory evaluation outcomes where different attack strategies tend to favor different models, and more critically, they operate solely through external perturbations, failing to capture the intrinsic stability essential for autonomous coding agents where subsequent inputs are endogenously generated by the model itself. We introduce EVALOOOP, a novel assessment framework that evaluates robustness from a self-consistency perspective, leveraging the natural duality inherent in software engineering tasks (e.g., code generation and code summarization). EVALOOOP establishes a self-contained feedback loop where an LLM iteratively transforms between code and natural language until functional failure occurs, with robustness quantified by a novel Average Sustainable Loops (ASL) metric-the mean number of iterations maintaining functional correctness across benchmark tasks. This cyclical strategy intrinsically evaluates robustness without relying on external attack configurations, providing a unified metric that reveals how effectively LLMs preserve semantic integrity through sustained self-referential transformations. We evaluate 96 popular LLMs, ranging from 0.5B to 685B parameters, on EVALOOOP equipped with the MBPP Plus benchmark, and found that EVALOOOP typically induces a 2.65%-47.62% absolute drop in pass@1 accuracy within ten loops. Intriguingly, robustness does not always align with initial performance (i.e., one-time query); for instance, Qwen3-235B-A22B-Instruct-2507, despite inferior initial code generation compared to OpenAI's o-series models and DeepSeek-V3, demonstrated the superior robustness (ASL score).

2504.20903 2026-02-17 cs.MA cs.AI cs.HC

Modeling AI-Human Collaboration as a Multi-Agent Adaptation

Prothit Sen, Sai Mihir Jakkaraju

详情
英文摘要

We formalize AI-human collaboration through an agent-based simulation that distinguishes optimization-based AI search from satisficing-based human adaptation. Using an NK model, we examine how these distinct decision heuristics interact across modular and sequenced task structures. For modular tasks, AI typically substitutes for humans, yet complementarities emerge when AI explores a moderately broad search space and human task complexity remains low. In sequenced tasks, we uncover a counterintuitive result: when a high-performing human initiates search and AI subsequently refines it, joint performance is maximized, contradicting the dominant AI-first design principle. Conversely, when AI leads and human satisficing follows, complementarities attenuate as task interdependence increases. We further show that memory-less random AI, despite lacking structured adaptation, can improve outcomes when augmenting low-capability humans by enabling escape from local optima. Collectively, our findings reveal that effective AI-human collaboration depends less on industry context and more on task architecture: the division of labor, sequencing, and interdependence structure. By elevating task decomposition as the central design principle, we provide a generalizable framework for strategic decision-making involving agentic AI across diverse organizational settings.

2504.20630 2026-02-17 eess.AS cs.MM cs.SD

ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting

Yu Zhang, Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Tao Jin, Zhou Zhao

Comments Accepted by ACM Multimedia 2025

Journal ref MM '2025: Proceedings of the 33rd ACM International Conference on Multimedia Pages 9618 - 9627

详情
英文摘要

Multimodal immersive spatial drama generation focuses on creating continuous multi-speaker binaural speech with dramatic prosody based on multimodal prompts, with potential applications in AR, VR, and others. This task requires simultaneous modeling of spatial information and dramatic prosody based on multimodal inputs, with high data collection costs. To the best of our knowledge, our work is the first attempt to address these challenges. We construct MRSDrama, the first multimodal recorded spatial drama dataset, containing binaural drama audios, scripts, videos, geometric poses, and textual prompts. Then, we propose ISDrama, the first immersive spatial drama generation model through multimodal prompting. ISDrama comprises these primary components: 1) Multimodal Pose Encoder, based on contrastive learning, considering the Doppler effect caused by moving speakers to extract unified pose information from multimodal prompts. 2) Immersive Drama Transformer, a flow-based mamba-transformer model that generates high-quality drama, incorporating Drama-MOE to select proper experts for enhanced prosody and pose control. We also design a context-consistent classifier-free guidance strategy to coherently generate complete drama. Experimental results show that ISDrama outperforms baseline models on objective and subjective metrics. The demos are available at https://aaronz345.github.io/ISDramaDemo. We provide the dataset and the evaluation code at https://huggingface.co/datasets/AaronZ345/MRSDrama and https://github.com/AaronZ345/ISDrama.

2504.20532 2026-02-17 cs.MM cs.CR cs.SD eess.AS

TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Traceability

Yue Li, Weizhi Liu, Kaiqing Lin, Dongdong Lin, Kassem Kallas

详情
英文摘要

Diffusion-based speech generation has achieved remarkable fidelity, increasing the risk of misuse and unauthorized redistribution. However, most existing generative speech watermarking methods are developed for GAN-based pipelines, and watermarking for diffusion-based speech generation remains comparatively underexplored. In addition, prior work often focuses on content-level provenance, while support for model-level and user-level attribution is less mature. We propose \textbf{TriniMark}, a diffusion-based generative speech watermarking framework that targets trinity-level traceability, i.e., the ability to associate a generated speech sample with (i) the embedded watermark message (content-level provenance), (ii) the source generative model (model-level attribution), and (iii) the end user who requested generation (user-level traceability). TriniMark uses a lightweight encoder to embed watermark bits into time-domain speech features and reconstruct the waveform, and a temporal-aware gated convolutional decoder for reliable bit recovery. We further introduce a waveform-guided fine-tuning strategy to transfer watermarking capability into a diffusion model. Finally, we incorporate variable-watermark training so that a single trained model can embed different watermark messages at inference time, enabling scalable user-level traceability. Experiments on speech datasets indicate that TriniMark maintains speech quality while improving robustness to common single and compound signal-processing attacks, and it supports high-capacity watermarking for large-scale traceability.

2504.19062 2026-02-17 eess.AS cs.CL cs.SD

Versatile Framework for Song Generation with Prompt-based Control

Yu Zhang, Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Ruiqi Li, Jingyu Lu, Rongjie Huang, Ruiyuan Zhang, Zhiqing Hong, Ziyue Jiang, Zhou Zhao

Comments Accepted by Findings of EMNLP 2025

Journal ref Findings of the Association for Computational Linguistics: EMNLP 2025

详情
英文摘要

Song generation focuses on producing controllable high-quality songs based on various prompts. However, existing methods struggle to generate vocals and accompaniments with prompt-based control and proper alignment. Additionally, they fall short in supporting various tasks. To address these challenges, we introduce VersBand, a multi-task song generation framework for synthesizing high-quality, aligned songs with prompt-based control. VersBand comprises these primary models: 1) VocalBand, a decoupled model, leverages the flow-matching method for generating singing styles, pitches, and mel-spectrograms, allowing fast, high-quality vocal generation with style control. 2) AccompBand, a flow-based transformer model, incorporates the Band-MOE, selecting suitable experts for enhanced quality, alignment, and control. This model allows for generating controllable, high-quality accompaniments aligned with vocals. 3) Two generation models, LyricBand for lyrics and MelodyBand for melodies, contribute to the comprehensive multi-task song generation system, allowing for extensive control based on multiple prompts. Experimental results show that VersBand outperforms baseline models across multiple song generation tasks using objective and subjective metrics. Demos and codes are available at https://aaronz345.github.io/VersBandDemo and https://github.com/AaronZ345/VersBand.

2504.18367 2026-02-17 physics.comp-ph cs.LG physics.chem-ph q-bio.BM

A Novel 4-D Dataset Paradigm for Studying Complete Ligand-Protein Dissociation Dynamics

Maodong Li, Jiying Zhang, Zhe Wang, Bin Feng, Wenqi Zeng, Dechin Chen, Zhijun Pan, Yu Li, Zijing Liu, Yi Isaac Yang

Comments The dissociation dynamics dataset DD-13M is publicly available at https://huggingface.co/datasets/SZBL-IDEA/MD (For facilitated browsing and categorical download, a dedicated web interface is maintained at: https://aimm.szbl.ac.cn/database/ddd/#/home)

详情
英文摘要

The kinetics and dynamics of drug-protein binding and dissociation are crucial to understanding drug absorption and metabolism. Despite advances in artificial intelligence (AI) tools for drug-protein interaction studies, existing training datasets remain limited to static structures or quasi-static conformations. This paper proposes a novel computational approach for rapidly generating drug-protein dissociation trajectories and presents the inaugural dynamically time-resolved 4-D (t, x, y, z) trajectory database DD-13M. This dataset captures over 26,000 complete dissociation processes for 565 ligand-protein complexes, providing nearly 13 million frames of all-atom simulation trajectories. A deep equivariant generative model, UnbindingFlow, was trained using the DD-13M dataset. This model has the capacity to produce dissociation trajectories for novel targets whilst accurately predicting their rate constants (koff). DD-13M introduces a new type of training dataset for AI models, establishing a de novo paradigm for studying the dynamics of drug-protein interactions.

2502.16730 2026-02-17 cs.CR cs.AI

RapidPen: Fully Automated IP-to-Shell Penetration Testing with LLM-based Agents

Sho Nakatani

详情
英文摘要

We present RapidPen, a fully automated penetration testing (pentesting) framework that addresses the challenge of achieving an initial foothold (IP-to-Shell) without human intervention. Unlike prior approaches that focus primarily on post-exploitation or require a human-in-the-loop, RapidPen leverages large language models (LLMs) to autonomously discover and exploit vulnerabilities, starting from a single IP address. By integrating advanced ReAct-style task planning (Re) with retrieval-augmented knowledge bases of successful exploits, along with a command-generation and direct execution feedback loop (Act), RapidPen systematically scans services, identifies viable attack vectors, and executes targeted exploits in a fully automated manner. In our evaluation against a vulnerable target from the Hack The Box platform, RapidPen achieved shell access within 200-400 seconds at a per-run cost of approximately \$0.3-\$0.6, demonstrating a 60\% success rate when reusing prior "success-case" data. These results underscore the potential of truly autonomous pentesting for both security novices and seasoned professionals. Organizations without dedicated security teams can leverage RapidPen to quickly identify critical vulnerabilities, while expert pentesters can offload repetitive tasks and focus on complex challenges. Ultimately, our work aims to make penetration testing more accessible and cost-efficient, thereby enhancing the overall security posture of modern software ecosystems. Fore more information, visit this link: https://secdevlab.com/rapidpen

2502.01713 2026-02-17 cs.CY cs.LG

Auditing a Dutch Public Sector Risk Profiling Algorithm Using an Unsupervised Bias Detection Tool

Floris Holstege, Mackenzie Jorgensen, Kirtan Padh, Jurriaan Parie, Krsto Prorokovic, Joel Persson, Lukas Snoek

详情
英文摘要

Algorithms are increasingly used to automate or aid human decisions, yet recent research shows that these algorithms may exhibit bias across legally protected demographic groups. However, data on these groups may be unavailable to organizations or external auditors due to privacy legislation. This paper studies bias detection using an unsupervised bias detection tool when data on demographic groups are unavailable. We collaborated with the Dutch Executive Agency for Education to audit an algorithm that was used to assign risk scores to college students at the national level in the Netherlands between 2012-2023. Our audit covers more than 250,000 students across the country. The unsupervised bias detection tool highlights known disparities between students with a non-European migration background and students with a Dutch or European-migration background. Our contributions are two-fold: (1) we assess bias in a real-world, large-scale, and high-stakes decision-making process by a governmental organization; (2) we provide the unsupervised bias detection tool in an open-source library for others to use to complete bias audits. Our work serves as a starting point for a deliberative assessment by human experts to evaluate potential discrimination in algorithmic decision-making.

2410.17587 2026-02-17 cs.CE cs.LG econ.GN physics.soc-ph q-fin.EC

Predicting Company Growth using Scaling Theory informed Machine Learning

Ruyi Tao, Veronica R. Cappelli, Kaiwei Liu, Marcus J. Hamilton, Christopher P. Kempes, Geoffrey B. Wes, Jiang Zhang

Comments 28 pages, 13 figures, 3 tables

详情
英文摘要

Predicting company growth is a critical yet challenging task because observed dynamics blend an underlying structural growth trend with volatile fluctuations. Here, we propose a Scaling-Theory-Informed Machine Learning (STIML) framework that integrates a scaling-based growth model to capture the mechanism-driven average trend, together with a data-driven forecasting model to learn the residual fluctuations. Using Compustat annual financial statement data (1950--2019) for 31,553 North American companies, we extend the growth model beyond assets to multiple financial indicators, and evaluate STIML against growth model-only and purely data-driven baselines. Across 16 target variables, we show that company growth exhibits a clear separation between trend-driven predictability and fluctuation-driven predictability, with their relative importance depending strongly on company size and volatility. Interpretability analyses further show that STIML captures multivariate dependencies beyond simple autocorrelation, and that macroeconomic variables contribute significantly less to predictive performance on average. Moreover, we find the scaling-based growth model overlooks asymmetric deviations, which instead contain the structured and learnable signals, suggesting a path to refine mechanistic models.

2409.13832 2026-02-17 eess.AS cs.CL cs.SD

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

Yu Zhang, Changhao Pan, Wenxiang Guo, Ruiqi Li, Zhiyuan Zhu, Jialei Wang, Wenhao Xu, Jingyu Lu, Zhiqing Hong, Chuxin Wang, LiChao Zhang, Jinzheng He, Ziyue Jiang, Yuxin Chen, Chen Yang, Jiecheng Zhou, Xinyu Cheng, Zhou Zhao

Comments Accepted by NeurIPS 2024 (Spotlight)

Journal ref Advances in Neural Information Processing Systems 37 (NeurIPS 2024)

详情
英文摘要

The scarcity of high-quality and multi-task singing datasets significantly hinders the development of diverse controllable and personalized singing tasks, as existing singing datasets suffer from low quality, limited diversity of languages and singers, absence of multi-technique information and realistic music scores, and poor task suitability. To tackle these problems, we present GTSinger, a large global, multi-technique, free-to-use, high-quality singing corpus with realistic music scores, designed for all singing tasks, along with its benchmarks. Particularly, (1) we collect 80.59 hours of high-quality singing voices, forming the largest recorded singing dataset; (2) 20 professional singers across nine widely spoken languages offer diverse timbres and styles; (3) we provide controlled comparison and phoneme-level annotations of six commonly used singing techniques, helping technique modeling and control; (4) GTSinger offers realistic music scores, assisting real-world musical composition; (5) singing voices are accompanied by manual phoneme-to-audio alignments, global style labels, and 16.16 hours of paired speech for various singing tasks. Moreover, to facilitate the use of GTSinger, we conduct four benchmark experiments: technique-controllable singing voice synthesis, technique recognition, style transfer, and speech-to-singing conversion. The demos can be found at http://aaronz345.github.io/GTSingerDemo/. We provide the dataset and the code for processing data and conducting benchmarks at https://huggingface.co/datasets/AaronZ345/GTSinger and https://github.com/AaronZ345/GTSinger.

2409.04994 2026-02-17 math.OC cs.LG cs.NA math.NA

Learning nonnegative matrix factorizations from compressed data

Abraar Chaudhry, Elizaveta Rebrova

详情
英文摘要

We propose a flexible and theoretically supported framework for scalable nonnegative matrix factorization. The goal is to find nonnegative low-rank components directly from compressed measurements, accessing the original data only once or twice. We consider compression through randomized sketching methods that can be adapted to the data, or can be oblivious. We formulate optimization problems that only depend on the compressed data, but which can recover a nonnegative factorization which closely approximates the original matrix. The defined problems can be approached with a variety of algorithms, and in particular, we discuss variations of the popular multiplicative updates method for these compressed problems. We demonstrate the success of our approaches empirically and validate their performance in real-world applications.

2402.03931 2026-02-17 cond-mat.mes-hall cs.LG quant-ph

Fully autonomous tuning of a spin qubit

Jonas Schuff, Miguel J. Carballido, Madeleine Kotzagiannidis, Juan Carlos Calvo, Marco Caselli, Jacob Rawling, David L. Craig, Barnaby van Straaten, Brandon Severin, Federico Fedele, Simon Svab, Pierre Chevalier Kwon, Rafael S. Eggli, Taras Patlatiuk, Nathan Korda, Dominik Zumbühl, Natalia Ares

Journal ref Nature Electronics (2026)

详情
英文摘要

Spanning over two decades, the study of qubits in semiconductors for quantum computing has yielded significant breakthroughs. However, the development of large-scale semiconductor quantum circuits is still limited by challenges in efficiently tuning and operating these circuits. Identifying optimal operating conditions for these qubits is complex, involving the exploration of vast parameter spaces. This presents a real 'needle in the haystack' problem, which, until now, has resisted complete automation due to device variability and fabrication imperfections. In this study, we present the first fully autonomous tuning of a semiconductor qubit, from a grounded device to Rabi oscillations, a clear indication of successful qubit operation. We demonstrate this automation, achieved without human intervention, in a Ge/Si core/shell nanowire device. Our approach integrates deep learning, Bayesian optimization, and computer vision techniques. We expect this automation algorithm to apply to a wide range of semiconductor qubit devices, allowing for statistical studies of qubit quality metrics. As a demonstration of the potential of full automation, we characterise how the Rabi frequency and g-factor depend on barrier gate voltages for one of the qubits found by the algorithm. Twenty years after the initial demonstrations of spin qubit operation, this significant advancement is poised to finally catalyze the operation of large, previously unexplored quantum circuits.

2306.14297 2026-02-17 stat.ME cs.LG

Inference for relative sparsity

Samuel J. Weisenthal, Sally W. Thurston, Ashkan Ertefaie

Comments 66 pages, 3 figures

详情
英文摘要

In healthcare, there is much interest in estimating policies, or mappings from covariates to treatment decisions. Recently, there is also interest in constraining these estimated policies to the standard of care, which generated the observed data. A relative sparsity penalty was proposed to derive policies that have sparse, explainable differences from the standard of care, facilitating justification of the new policy. However, the developers of this penalty only considered estimation, not inference. Here, we develop inference for the relative sparsity objective function, because characterizing uncertainty is crucial to applications in medicine. Further, in the relative sparsity work, the authors only considered the single-stage decision case; here, we consider the more general, multi-stage case. Inference is difficult, because the relative sparsity objective depends on the unpenalized value function, which is unstable and has infinite estimands in the binary action case. Further, one must deal with a non-differentiable penalty. To tackle these issues, we nest a weighted Trust Region Policy Optimization function within a relative sparsity objective, implement an adaptive relative sparsity penalty, and propose a sample-splitting framework for post-selection inference. We study the asymptotic behavior of our proposed approaches, perform extensive simulations, and analyze a real, electronic health record dataset.

2305.01186 2026-02-17 cs.CY cs.AI

Deconstructing Student Perceptions of Generative AI (GenAI) through an Expectancy Value Theory (EVT)-based Instrument

Cecilia Ka Yuk Chan, Wenxin Zhou

Journal ref Smart Learn. Environ. 10, 64 (2023)

详情
英文摘要

This study examines the relationship between student perceptions and their intention to use generative AI in higher education. Drawing on Expectancy-Value Theory (EVT), a questionnaire was developed to measure students' knowledge of generative AI, perceived value, and perceived cost. A sample of 405 students participated in the study, and confirmatory factor analysis was used to validate the constructs. The results indicate a strong positive correlation between perceived value and intention to use generative AI, and a weak negative correlation between perceived cost and intention to use. As we continue to explore the implications of generative AI in education and other domains, it is crucial to carefully consider the potential long-term consequences and the ethical dilemmas that may arise from widespread adoption.

2103.14203 2026-02-17 stat.ML cs.LG

Deep Two-Way Matrix Reordering for Relational Data Analysis

Chihiro Watanabe, Taiji Suzuki

详情
英文摘要

Matrix reordering is a task to permute the rows and columns of a given observed matrix such that the resulting reordered matrix shows meaningful or interpretable structural patterns. Most existing matrix reordering techniques share the common processes of extracting some feature representations from an observed matrix in a predefined manner, and applying matrix reordering based on it. However, in some practical cases, we do not always have prior knowledge about the structural pattern of an observed matrix. To address this problem, we propose a new matrix reordering method, called deep two-way matrix reordering (DeepTMR), using a neural network model. The trained network can automatically extract nonlinear row/column features from an observed matrix, which can then be used for matrix reordering. Moreover, the proposed DeepTMR provides the denoised mean matrix of a given observed matrix as an output of the trained network. This denoised mean matrix can be used to visualize the global structure of the reordered observed matrix. We demonstrate the effectiveness of the proposed DeepTMR by applying it to both synthetic and practical datasets.

2103.02872 2026-02-17 cs.CR cs.LG

An RL-Based Adaptive Detection Strategy to Secure Cyber-Physical Systems

Ipsita Koley, Sunandan Adhikary, Soumyajit Dey

详情
英文摘要

Increased dependence on networked, software based control has escalated the vulnerabilities of Cyber Physical Systems (CPSs). Detection and monitoring components developed leveraging dynamical systems theory are often employed as lightweight security measures for protecting such safety critical CPSs against false data injection attacks. However, existing approaches do not correlate attack scenarios with parameters of detection systems. In the present work, we propose a Reinforcement Learning (RL) based framework which adaptively sets the parameters of such detectors based on experience learned from attack scenarios, maximizing detection rate and minimizing false alarms in the process while attempting performance preserving control actions.

2602.13675 2026-02-17 cs.HC cs.AI

Transferable XAI: Relating Understanding Across Domains with Explanation Transfer

Fei Wang, Yifan Zhang, Brian Y. Lim

Comments 40 pages, accepted by IUI2026

详情
英文摘要

Current Explainable AI (XAI) focuses on explaining a single application, but when encountering related applications, users may rely on their prior understanding from previous explanations. This leads to either overgeneralization and AI overreliance, or burdensome independent memorization. Indeed, related decision tasks can share explanatory factors, but with some notable differences; e.g., body mass index (BMI) affects the risks for heart disease and diabetes at the same rate, but chest pain is more indicative of heart disease. Similarly, models using different attributes for the same task still share signals; e.g., temperature and pressure affect air pollution but in opposite directions due to the ideal gas law. Leveraging transfer of learning, we propose Transferable XAI to enable users to transfer understanding across related domains by explaining the relationship between domain explanations using a general affine transformation framework applied to linear factor explanations. The framework supports explanation transfer across various domain types: translation for data subspace (subsuming prior work on Incremental XAI), scaling for decision task, and mapping for attributes. Focusing on task and attributes domain types, in formative and summative user studies, we investigated how well participants could understand AI decisions from one domain to another. Compared to single-domain and domain-independent explanations, Transferable XAI was the most helpful for understanding the second domain, leading to the best decision faithfulness, factor recall, and ability to relate explanations between domains. This framework contributes to improving the reusability of explanations across related AI applications by explaining factor relationships between subspaces, tasks, and attributes.

2602.13672 2026-02-17 cs.NI cs.LG

LEAD-Drift: Real-time and Explainable Intent Drift Detection by Learning a Data-Driven Risk Score

Md. Kamrul Hossain, Walid Aljoby

Comments Copyright 2026 IEEE. Accepted for publication in IEEE ICC 2026. This is a author version. 6 pages

详情
英文摘要

Intent-Based Networking (IBN) simplifies network management, but its reliability is challenged by "intent drift", where the network's state gradually deviates from its intended goal, often leading to silent failures. Conventional approaches struggle to detect the subtle, early stages of intent drift, raising alarms only when degradation is significant and failure is imminent, which limits their effectiveness for proactive assurance. To address this, we propose LEAD-Drift, a framework that detects intent drift in real time to enable proactive failure prevention. LEAD-Drift's core contribution is reformulating intent failure detection as a supervised learning problem by training a lightweight neural network on fixed-horizon labels to predict a future risk score. The model's raw output is then smoothed with an Exponential Moving Average (EMA) and passed through a statistically tuned threshold to generate robust, real-time alerts. Furthermore, we enhance the framework with two key features for operational intelligence: a multi-horizon modeling technique for dynamic time-to-failure estimation, and per-alert explainability using SHAP to identify root-cause KPIs. Our evaluation on a time-series dataset shows LEAD-Drift provides significantly earlier warnings, improving the average lead time by 7.3 minutes (+17.8\%) compared to a distance-based baseline. It also reduces alert noise by 80.2\% compared to a weighted-KPI heuristic, with only a minor trade-off in lead time. These results demonstrate that LEAD-Drift as a highly effective, interpretable, and operationally efficient solution for proactive network assurance in IBN.

2602.13671 2026-02-17 cs.MA cs.AI

MAS-on-the-Fly: Dynamic Adaptation of LLM-based Multi-Agent Systems at Test Time

Guangyi Liu, Haojun Lin, Huan Zeng, Heng Wang, Quanming Yao

详情
英文摘要

Large Language Model (LLM)-based multi-agent systems (MAS) have emerged as a promising paradigm for solving complex tasks. However, existing works often rely on manual designs or "one-size-fits-all" automation, lacking dynamic adaptability after deployment. Inspired by how biological systems adapt, we introduce MASFly, a novel multi-agent framework enabling dynamic adaptation at test time. To adapt system generation, MASFly employs a retrieval-augmented SOP instantiation mechanism that leverages a self-constructed repository of successful collaboration patterns, enabling the LLM to assemble customized MASs for new queries. For adaptive execution, MASFly incorporates an experience-guided supervision mechanism, where a dedicated Watcher agent monitors system behaviors with reference to a personalized experience pool and provides real-time interventions. Extensive experiments demonstrate that MASFly achieves state-of-the-art performance, most notably a 61.7% success rate on the TravelPlanner benchmark, while exhibiting strong task adaptability and robustness.

2602.13625 2026-02-17 cs.HC cs.AI

Anthropomorphism on Risk Perception: The Role of Trust and Domain Knowledge in Decision-Support AI

Manuele Reani, Xiangyang He, Zuolan Bao

详情
英文摘要

Anthropomorphic design is routinely used to make conversational agents more approachable and engaging. Yet its influence on users' perceptions remains poorly understood. Drawing on psychological theories, we propose that anthropomorphism influences risk perception via two complementary forms of trust, and that domain knowledge moderates these relationships. To test our model, we conducted a large-scale online experiment (N = 1,256) on a financial decision-support system implementing different anthropomorphic designs. We found that anthropomorphism indirectly reduces risk perception by increasing both cognitive and affective trust. Domain knowledge moderates these paths: participants with low financial knowledge experience a negative indirect effect of perceived anthropomorphism on risk perception via cognitive trust, whereas those with high financial knowledge exhibit a positive direct and indirect effect. We discuss theoretical contributions to human-AI interaction and design implications for calibrating trust in anthropomorphic decision-support systems for responsible AI.

2602.13619 2026-02-17 stat.ML cs.IT cs.LG math.IT stat.ME

Locally Private Parametric Methods for Change-Point Detection

Anuj Kumar Yadav, Cemre Cadir, Yanina Shkel, Michael Gastpar

Comments 43 pages, 20 figures

详情
英文摘要

We study parametric change-point detection, where the goal is to identify distributional changes in time series, under local differential privacy. In the non-private setting, we derive improved finite-sample accuracy guarantees for a change-point detection algorithm based on the generalized log-likelihood ratio test, via martingale methods. In the private setting, we propose two locally differentially private algorithms based on randomized response and binary mechanisms, and analyze their theoretical performance. We derive bounds on detection accuracy and validate our results through empirical evaluation. Our results characterize the statistical cost of local differential privacy in change-point detection and show how privacy degrades performance relative to a non-private benchmark. As part of this analysis, we establish a structural result for strong data processing inequalities (SDPI), proving that SDPI coefficients for Rényi divergences and their symmetric variants (Jeffreys-Rényi divergences) are achieved by binary input distributions. These results on SDPI coefficients are also of independent interest, with applications to statistical estimation, data compression, and Markov chain mixing.

2602.13611 2026-02-17 cs.SE cs.AI

From What to How: Bridging User Requirements with Software Development Using Large Language Models

Xiao He, Ru Chen, Jialun Cao

详情
英文摘要

Recently, large language models (LLMs) are extensively utilized to enhance development efficiency, leading to numerous benchmarks for evaluating their performance. However, these benchmarks predominantly focus on implementation, overlooking the equally critical aspect of software design. This gap raises two pivotal questions: (1) Can LLMs handle software design? (2) Can LLMs write code following the specific designs? To investigate these questions, this paper proposes DesBench, a design-aware benchmark for evaluating LLMs on three software design-related tasks: design-aware code generation, object-oriented modeling, and the design of acceptance test cases. DesBench comprises 30 manually crafted Java projects that include requirement documents, design models, implementations, and acceptance tests, amounting to a total of 30 design models, 194 Java classes, and 737 test cases. We evaluated seven state-of-the-art LLMs, including three DeepSeek R1, two Qwen2.5, and two GPT models, using DesBench. The results reveal that LLMs remain significantly challenged by the intricacies of software design: (1) For code generation, LLMs struggle to produce correct implementations when provided with only high-level or no designs. (2) In object-oriented modeling, while LLMs can accurately identify objects and classes, they face challenges in defining operations and inter-class relationships. (3) Acceptance test cases generated by LLMs from functional requirements achieve code coverage quality comparable to those written by humans. Our research highlights the current limitations of LLMs in managing software design and calls for further investigation into new design methodologies and languages suitable for LLM-based development.

2602.13606 2026-02-17 cs.NI cs.AI cs.ET cs.LG

Multi-Modal Sensing and Fusion in mmWave Beamforming for Connected Vehicles: A Transformer Based Framework

Muhammad Baqer Mollah, Honggang Wang, Mohammad Ataul Karim, Hua Fang

Comments 13 Pages. arXiv admin note: text overlap with arXiv:2509.11112

Journal ref IEEE Transactions on Vehicular Technology, 2026

详情
英文摘要

Millimeter wave (mmWave) communication, utilizing beamforming techniques to address the inherent path loss limitation, is considered as one of the key technologies to support ever increasing high throughput and low latency demands of connected vehicles. However, adopting standard defined beamforming approach in highly dynamic vehicular environments often incurs high beam training overheads and reduction in the available airtime for communications, which is mainly due to exchanging pilot signals and exhaustive beam measurements. To this end, we present a multi-modal sensing and fusion learning framework as a potential alternative solution to reduce such overheads. In this framework, we first extract the representative features from the sensing modalities by modality specific encoders, then, utilize multi-head cross-modal attention to learn dependencies and correlations between different modalities, and subsequently fuse the multimodal features to obtain predicted top-k beams so that the best line-of-sight links can be proactively established. To show the generalizability of the proposed framework, we perform a comprehensive experiment in four different vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) scenarios from real world multimodal and 60 GHz mmWave wireless sensing data. The experiment reveals that the proposed framework (i) achieves up to 96.72% accuracy on predicting top-15 beams correctly, (ii) incurs roughly 0.77 dB average power loss, and (iii) improves the overall latency and beam searching space overheads by 86.81% and 76.56% respectively for top-15 beams compared to standard defined approach.

2602.13576 2026-02-17 cs.CR cs.AI cs.CL

Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges

Ruomeng Ding, Yifei Pang, He Sun, Yizhong Wang, Zhiwei Steven Wu, Zhun Deng

详情
英文摘要

Evaluation and alignment pipelines for large language models increasingly rely on LLM-based judges, whose behavior is guided by natural-language rubrics and validated on benchmarks. We identify a previously under-recognized vulnerability in this workflow, which we term Rubric-Induced Preference Drift (RIPD). Even when rubric edits pass benchmark validation, they can still produce systematic and directional shifts in a judge's preferences on target domains. Because rubrics serve as a high-level decision interface, such drift can emerge from seemingly natural, criterion-preserving edits and remain difficult to detect through aggregate benchmark metrics or limited spot-checking. We further show this vulnerability can be exploited through rubric-based preference attacks, in which benchmark-compliant rubric edits steer judgments away from a fixed human or trusted reference on target domains, systematically inducing RIPD and reducing target-domain accuracy up to 9.5% (helpfulness) and 27.9% (harmlessness). When these judgments are used to generate preference labels for downstream post-training, the induced bias propagates through alignment pipelines and becomes internalized in trained policies. This leads to persistent and systematic drift in model behavior. Overall, our findings highlight evaluation rubrics as a sensitive and manipulable control interface, revealing a system-level alignment risk that extends beyond evaluator reliability alone. The code is available at: https://github.com/ZDCSlab/Rubrics-as-an-Attack-Surface. Warning: Certain sections may contain potentially harmful content that may not be appropriate for all readers.

2602.13562 2026-02-17 cs.CR cs.AI cs.CL

Mitigating the Safety-utility Trade-off in LLM Alignment via Adaptive Safe Context Learning

Yanbo Wang, Minzheng Wang, Jian Liang, Lu Wang, Yongcan Yu, Ran He

Comments Preprint. 18 pages, 6 figures

详情
英文摘要

While reasoning models have achieved remarkable success in complex reasoning tasks, their increasing power necessitates stringent safety measures. For safety alignment, the core challenge lies in the inherent trade-off between safety and utility. However, prevailing alignment strategies typically construct CoT training data with explicit safety rules via context distillation. This approach inadvertently limits reasoning capabilities by creating a rigid association between rule memorization and refusal. To mitigate the safety-utility trade-off, we propose the Adaptive Safe Context Learning (ASCL) framework to improve the reasoning given proper context. ASCL formulates safety alignment as a multi-turn tool-use process, empowering the model to autonomously decide when to consult safety rules and how to generate the ongoing reasoning. Furthermore, to counteract the preference for rule consultation during RL, we introduce Inverse Frequency Policy Optimization (IFPO) to rebalance advantage estimates. By decoupling rule retrieval and subsequent reasoning, our method achieves higher overall performance compared to baselines.

2602.13556 2026-02-17 cs.IT cs.AI eess.SP math.IT

Discrete-Space Generative AI Pipeline for Semantic Transmission of Signals

Silvija Kokalj-Filipovic, Yagna Kaasaragadda

详情
英文摘要

We introduce Discernment, a semantic communication system that transmits the meaning of physical signals (baseband radio and audio) over a technical channel using GenAI models operating in discrete spaces. Discernment dynamically adapts to channel impairments - modeled as erasure channels - by switching between an autoregressive or a diffusion-based generative algorithm, depending on the erasure pattern. Our results show that Discernment maintains semantic integrity even as channel capacity severely degrades, exhibiting very small and graceful performance decline in both classification accuracy and statistical fidelity of the reconstructed meaning. These findings demonstrate Discernment's ability to adjust to diverse physical channel conditions while maintaining spectral efficiency and low model complexity, making it well suited for IoT deployments and strongly motivating further research on this semantic channel paradigm.

2602.13554 2026-02-17 cs.ET cs.IT cs.RO math.IT

From Snapshot Sensing to Persistent EM World Modeling: A Generative-Space Perspective for ISAC

Pin-Han Ho, Haoran Mei, Limei Peng, Yiming Miao, Kairan Liang, Yan Jiao

Comments 7 pages, 6 figures/tables

详情
英文摘要

Electromagnetic (EM) world modeling is emerging as a foundational capability for environment-aware and embodiment-enabled wireless systems. However, most existing mmWave sensing solutions are designed for snapshot-based parameter estimation and rely on hardware-intensive architectures, making scalable and persistent world modeling difficult to achieve. This article rethinks mmWave sensing from a system-level perspective and introduces a generative-space framework, in which sensing is realized through controlled traversal of a low-dimensional excitation space spanning frequency, waveform, and physical embodiment. This perspective decouples spatial observability from rigid antenna arrays and transmit-time multiplexing, enabling flexible and scalable sensing-by-design radios. To illustrate the practicality of this framework, we present a representative realization called Multi-RF Chain Frequency-as-Aperture Clip-on Aperture Fabric (MRC-FaA-CAF), where multiple FMCW sources coordinate frequency-selective modules distributed along guided-wave backbones. This architecture enables interference-free excitation, preserves beat-frequency separability, and maintains low calibration overhead. Case studies show that generative-space-driven sensing can achieve update rates comparable to phased arrays while avoiding dense RF replication and the latency penalties of TDM-MIMO systems. Overall, this work positions generative-space-driven sensing as a practical architectural foundation for mmWave systems that move beyond snapshot sensing toward persistent EM world modeling.

2602.13547 2026-02-17 cs.CR cs.AI

AISA: Awakening Intrinsic Safety Awareness in Large Language Models against Jailbreak Attacks

Weiming Song, Xuan Xie, Ruiping Yin

详情
英文摘要

Large language models (LLMs) remain vulnerable to jailbreak prompts that elicit harmful or policy-violating outputs, while many existing defenses rely on expensive fine-tuning, intrusive prompt rewriting, or external guardrails that add latency and can degrade helpfulness. We present AISA, a lightweight, single-pass defense that activates safety behaviors already latent inside the model rather than treating safety as an add-on. AISA first localizes intrinsic safety awareness via spatiotemporal analysis and shows that intent-discriminative signals are broadly encoded, with especially strong separability appearing in the scaled dot-product outputs of specific attention heads near the final structural tokens before generation. Using a compact set of automatically selected heads, AISA extracts an interpretable prompt-risk score with minimal overhead, achieving detector-level performance competitive with strong proprietary baselines on small (7B) models. AISA then performs logits-level steering: it modulates the decoding distribution in proportion to the inferred risk, ranging from normal generation for benign prompts to calibrated refusal for high-risk requests -- without changing model parameters, adding auxiliary modules, or requiring multi-pass inference. Extensive experiments spanning 13 datasets, 12 LLMs, and 14 baselines demonstrate that AISA improves robustness and transfer while preserving utility and reducing false refusals, enabling safer deployment even for weakly aligned or intentionally risky model variants.

2602.13543 2026-02-17 cs.IR cs.CL cs.LG

LiveNewsBench: Evaluating LLM Web Search Capabilities with Freshly Curated News

Yunfan Zhang, Kathleen McKeown, Smaranda Muresan

Comments An earlier version of this work was publicly available on OpenReview as an ICLR 2026 submission in September 2025

详情
英文摘要

Large Language Models (LLMs) with agentic web search capabilities show strong potential for tasks requiring real-time information access and complex fact retrieval, yet evaluating such systems remains challenging. We introduce \bench, a rigorous and regularly updated benchmark designed to assess the agentic web search abilities of LLMs. \bench automatically generates fresh question-answer pairs from recent news articles, ensuring that questions require information beyond an LLM's training data and enabling clear separation between internal knowledge and search capability. The benchmark features intentionally difficult questions requiring multi-hop search queries, page visits, and reasoning, making it well-suited for evaluating agentic search behavior. Our automated data curation and question generation pipeline enables frequent benchmark updates and supports construction of a large-scale training dataset for agentic web search models, addressing the scarcity of such data in the research community. To ensure reliable evaluation, we include a subset of human-verified samples in the test set. We evaluate a broad range of systems using \bench, including commercial and open-weight LLMs as well as LLM-based web search APIs. The leaderboard, datasets, and code are publicly available at livenewsbench.com.