arXivDaily: arXiv daily academic digest, updated Monday through Friday
2501.08466 2026-02-09 cs.CY cs.AI

A Short-Term Predict-Then-Cluster Framework for Meal Delivery Services

Jingyi Cheng, Shadi Sharif Azadeh

Journal ref Data Science for Transportation, 7(3), 26 (2025)

Micro-delivery services offer promising solutions for on-demand city logistics, but their success relies on efficient real-time delivery operations and fleet management. On-demand meal delivery platforms seek to optimize real-time operations based on anticipatory insights into citywide demand distributions. To address these needs, this study proposes a short-term predict-then-cluster framework for on-demand meal delivery services. The framework utilizes ensemble-learning methods for point and distributional forecasting with multivariate features, including lagged-dependent inputs to capture demand dynamics. We introduce Constrained K-Means Clustering (CKMC) and Contiguity Constrained Hierarchical Clustering with Iterative Constraint Enforcement (CCHC-ICE) to generate dynamic clusters based on predicted demand and geographical proximity, tailored to user-defined operational constraints. Evaluations of European and Taiwanese case studies demonstrate that the proposed methods outperform traditional time series approaches in both accuracy and computational efficiency. Clustering results demonstrate that the incorporation of distributional predictions effectively addresses demand uncertainties, improving the quality of operational insights. Additionally, a simulation study demonstrates the practical value of short-term demand predictions for proactive strategies, such as idle fleet rebalancing, significantly enhancing delivery efficiency. By addressing demand uncertainties and operational constraints, our predict-then-cluster framework provides actionable insights for optimizing real-time operations. The approach is adaptable to other on-demand platform-based city logistics and passenger mobility services, promoting sustainable and efficient urban operations.

2501.00382 2026-02-09 econ.GN cs.AI q-fin.EC stat.AP stat.ML

Adventures in Demand Analysis Using AI

Philipp Bach, Victor Chernozhukov, Sven Klaassen, Martin Spindler, Jan Teichert-Kluge, Suhas Vijaykumar

Comments 35 pages, 8 figures

This paper advances empirical demand analysis by integrating multimodal product representations derived from artificial intelligence (AI). Using a detailed dataset of toy cars on Amazon.com, we combine text descriptions, images, and tabular covariates to represent each product using transformer-based embedding models. These embeddings capture nuanced attributes, such as quality, branding, and visual characteristics, that traditional methods often struggle to summarize. Moreover, we fine-tune these embeddings for causal inference tasks. We show that the resulting embeddings substantially improve the predictive accuracy of sales ranks and prices and that they lead to more credible causal estimates of price elasticity. Notably, we uncover strong heterogeneity in price elasticity driven by these product-specific features. Our findings illustrate that AI-driven representations can enrich and modernize empirical demand analysis. The insights generated may also prove valuable for applied causal inference more broadly.

2411.18220 2026-02-09 eess.SP cs.AI cs.LG

R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge

Aladin Djuhera, Vlad C. Andrei, Mohsen Pourghasemian, Haris Gacanin, Holger Boche, Walid Saad

Journal ref IEEE International Conference on Communications (ICC), 2025

Multi-task large language models (MTLLMs) are important for many applications at the wireless edge, where users demand specialized models to handle multiple tasks efficiently. However, training MTLLMs is complex and exhaustive, particularly when tasks are subject to change. Recently, the concept of model fusion via task vectors has emerged as an efficient approach for combining fine-tuning parameters to produce an MTLLM. In this paper, the problem of enabling edge users to collaboratively craft such MTLLMs via task vectors is studied, under the assumption of worst-case adversarial attacks. To this end, first the influence of adversarial noise on multi-task model fusion is investigated and a relationship between the so-called weight disentanglement error and the mean squared error (MSE) is derived. Using hypothesis testing, it is directly shown that the MSE increases interference between task vectors, thereby rendering model fusion ineffective. Then, a novel resilient MTLLM fusion (R-MTLLMF) is proposed, which leverages insights about the LLM architecture and fine-tuning process to safeguard task vector aggregation under adversarial noise by realigning the MTLLM. The proposed R-MTLLMF is then compared for both worst-case and ideal transmission scenarios to study the impact of the wireless channel. Extensive model fusion experiments with vision LLMs demonstrate R-MTLLMF's effectiveness, achieving close-to-baseline performance across eight different tasks in ideal noise scenarios and significantly outperforming unprotected model fusion in worst-case scenarios. The results further advocate for additional physical layer protection for a holistic approach to resilience, from both a wireless and LLM perspective.
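
For readers unfamiliar with task vectors, the fusion step mentioned in this abstract can be sketched as follows. This is a minimal illustration of plain task-vector addition on toy parameter lists, not the paper's R-MTLLMF method; the function names, the `scale` parameter, and the numbers are assumptions made for the example.

```python
# Toy sketch of model fusion via task vectors (names and values are hypothetical).
# A task vector is the element-wise difference between fine-tuned and
# pretrained parameters; fusion adds a scaled sum of task vectors back
# onto the pretrained parameters.

def task_vector(pretrained, finetuned):
    """Return finetuned - pretrained, element by element."""
    return [f - p for p, f in zip(pretrained, finetuned)]

def fuse(pretrained, task_vectors, scale=1.0):
    """Return pretrained + scale * (sum of all task vectors)."""
    summed = [sum(components) for components in zip(*task_vectors)]
    return [p + scale * s for p, s in zip(pretrained, summed)]

pretrained = [0.0, 1.0, 2.0]
finetuned_a = [0.5, 1.0, 2.0]   # task A only moved parameter 0
finetuned_b = [0.0, 1.0, 1.0]   # task B only moved parameter 2

vectors = [task_vector(pretrained, finetuned_a),
           task_vector(pretrained, finetuned_b)]
fused = fuse(pretrained, vectors)
```

In this toy case the two task vectors touch different parameters, so they do not interfere; the abstract's concern is the case where adversarial noise on the transmitted vectors makes them interfere.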

2408.16553 2026-02-09 eess.IV cs.LG

Downscaling Neural Network for Coastal Simulations

Zhi-Song Liu, Markus Büttner, Matthew Scarborough, Eirik Valseth, Vadym Aizinger, Bernhard Kainz, Andreas Rupp

Learning the fine-scale details of a coastal ocean simulation from a coarse representation is a challenging task. For real-world applications, high-resolution simulations are necessary to advance understanding of many coastal processes, specifically, to predict flooding resulting from tsunamis and storm surges. We propose a Downscaling Neural Network for Coastal Simulation (DNNCS) for spatiotemporal enhancement to learn the high-resolution numerical solution. Given images of coastal simulations produced on low-resolution computational meshes using low polynomial order discontinuous Galerkin discretizations and a coarse temporal resolution, the proposed DNNCS learns to produce high-resolution free surface elevation and velocity visualizations in both time and space. To model the dynamic changes over time and space, we propose grid-aware spatiotemporal attention to project the temporal features to the spatial domain for non-local feature matching. The coordinate information is also utilized via positional encoding. For the final reconstruction, we use the spatiotemporal bilinear operation to interpolate the missing frames and then expand the feature maps to the frequency domain for residual mapping. Besides data-driven losses, the proposed physics-informed loss guarantees gradient consistency and momentum changes, leading to a 24% reduction in root-mean-square error compared to the model trained with only data-driven losses. To train the proposed model, we propose a coastal simulation dataset and use it for model optimization and evaluation. Our method shows superior downscaling quality and fast computation compared to the state-of-the-art methods.

2407.07742 2026-02-09 cs.IT cs.LG cs.NI math.IT

Science-Informed Design of Deep Learning With Applications to Wireless Systems: A Tutorial

Atefeh Termehchi, Ekram Hossain, Angelo Vera-Rivera, Muhammad Ibrahim, Isaac Woungang

Recent advances in computational infrastructure and large-scale data processing have accelerated the adoption of data-driven inference methods, particularly deep learning (DL), to solve problems in many scientific and engineering domains. In wireless systems, DL has been applied to problems where analytical modeling or optimization is difficult to formulate, relies on oversimplified assumptions, or becomes computationally intractable. However, conventional DL models are often regarded as non-transparent, as their internal reasoning mechanisms are difficult to interpret even when model parameters are fully accessible. This lack of transparency undermines trust and leads to three interrelated challenges: limited interpretability, weak generalization, and the absence of a principled framework for parameter tuning. Science-informed deep learning (ScIDL) has emerged as a promising paradigm to address these limitations by integrating scientific knowledge into deep learning pipelines. This integration enables more precise characterization of model behavior and provides clearer explanations of how and why DL models succeed or fail. Despite growing interest, the existing literature remains fragmented and lacks a unifying taxonomy. This tutorial presents a structured overview of ScIDL methods and their applications in wireless systems. We introduce a structured taxonomy that organizes the ScIDL landscape, present two representative case studies illustrating its use in challenging wireless problems, and discuss key challenges and open research directions. The pedagogical structure guides readers from foundational concepts to advanced applications, making the tutorial accessible to researchers in wireless communications without requiring prior expertise in AI.

2406.01523 2026-02-09 cs.CE cs.LG

Predicting the fatigue life of asphalt concrete using neural networks

Jakub Houlík, Jan Valentin, Václav Nežerka

Comments Accepted paper

Journal ref Journal of Materials in Civil Engineering, vol. 38, no. 4, Apr. 2026

Asphalt concrete's (AC) durability and maintenance demands are strongly influenced by its fatigue life. Traditional methods for determining this characteristic are both resource-intensive and time-consuming. This study employs artificial neural networks (ANNs) to predict AC fatigue life, focusing on the impact of strain level, binder content, and air-void content. Leveraging a substantial dataset, we tailored our models to effectively handle the wide range of fatigue life data, typically represented on a logarithmic scale. The mean square logarithmic error was utilized as the loss function to enhance prediction accuracy across all levels of fatigue life. Through comparative analysis of various hyperparameters, we developed a machine-learning model that captures the complex relationships within the data. Our findings demonstrate that higher binder content significantly enhances fatigue life, while the influence of air-void content is more variable, depending on binder levels. Most importantly, this study provides insights into the intricacies of using ANNs for modeling, showcasing their potential utility with larger datasets. The codes developed and the data used in this study are provided as open source on a GitHub repository, with a link included in the paper for full access.
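
The loss named in this abstract can be illustrated with a short sketch. It assumes the common `log(1 + y)` convention for the mean squared logarithmic error; the abstract does not state which exact form the authors use, and the sample values below are invented.

```python
import math

# Mean squared logarithmic error, assuming the common log(1 + y) convention.
def msle(y_true, y_pred):
    """Average squared difference between log1p(true) and log1p(predicted)."""
    squared = [(math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)

# Predicting double the true value costs about the same at very different
# fatigue-life magnitudes, which is why a log-scale loss suits such data.
loss_short_life = msle([1e3], [2e3])
loss_long_life = msle([1e6], [2e6])
```

A plain squared error would weight the second case roughly a million times more heavily than the first, whereas here both are close to `(ln 2)^2`.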

2405.16594 2026-02-09 stat.ML cs.LG

Training-Conditional Coverage Bounds under Covariate Shift

Mehrdad Pournaderi, Yu Xiang

Comments Published in Transactions on Machine Learning Research

Conformal prediction methodology has recently been extended to the covariate shift setting, where the distribution of covariates differs between training and test data. While existing results ensure that the prediction sets from these methods achieve marginal coverage above a nominal level, their coverage rate conditional on the training dataset (referred to as training-conditional coverage) remains unexplored. In this paper, we address this gap by deriving upper bounds on the tail of the training-conditional coverage distribution, offering probably approximately correct (PAC) guarantees for these methods. Our results characterize the reliability of the prediction sets in terms of the severity of distributional changes and the size of the training dataset.
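
As background, the sketch below shows ordinary (unweighted) split conformal prediction with absolute-residual scores. It is not the covariate-shift variant the paper analyzes, which replaces the plain empirical quantile with a weighted one; the function names and calibration scores are assumptions for illustration.

```python
import math

# Unweighted split conformal prediction (background sketch only).
def conformal_radius(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this level: the set is unbounded.
        return math.inf
    return sorted(cal_scores)[k - 1]

def predict_interval(point_prediction, radius):
    """Symmetric prediction interval around a point prediction."""
    return (point_prediction - radius, point_prediction + radius)

# Hypothetical absolute residuals |y - f(x)| on a held-out calibration set.
cal_scores = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0]
radius = conformal_radius(cal_scores, alpha=0.5)
interval = predict_interval(10.0, radius)
```

The marginal guarantee averages over the draw of the calibration set; training-conditional coverage, the subject of the abstract, asks how far coverage can fall for one particular draw.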

2405.00734 2026-02-09 eess.SP cs.AI cs.LG

EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations

Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang

Comments 15 pages, 9 figures. Oral presentation at ACM MM 2024

Journal ref In Proceedings of the 32nd ACM International Conference on Multimedia (ACM MM '24), pp. 340-349, 2024

Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis of neurodegenerative diseases, which features subtler abnormal neural dynamics typically observed in small-group settings. To advance this area, in this work, we introduce a transferable framework employing Manifold Attention and Confidence Stratification (MACS) to diagnose neurodegenerative disorders based on EEG signals sourced from four centers with unreliable annotations. The MACS framework's effectiveness stems from these features: 1) The Augmentor generates various EEG-represented brain variants to enrich the data space; 2) The Switcher enhances the feature space for trusted samples and reduces overfitting on incorrectly labeled samples; 3) The Encoder uses the Riemannian manifold and Euclidean metrics to capture spatiotemporal variations and dynamic synchronization in EEG; 4) The Projector, equipped with dual heads, monitors consistency across multiple brain variants and ensures diagnostic accuracy; 5) The Stratifier adaptively stratifies learned samples by confidence levels throughout the training process; 6) Forward and backpropagation in MACS are constrained by confidence stratification to stabilize the learning system amid unreliable annotations. Our subject-independent experiments, conducted on both neurocognitive and movement disorders using cross-center corpora, have demonstrated superior performance compared to existing related algorithms. This work not only improves EEG-based diagnostics for cross-center and small-setting brain diseases but also offers insights into extending MACS techniques to other data analyses, tackling data heterogeneity and annotation unreliability in multimedia and multimodal content understanding.

2404.15742 2026-02-09 math.NA cs.LG cs.NA

Generalizing the SINDy approach with nested neural networks

Camilla Fiorini, Clément Flint, Louis Fostier, Emmanuel Franck, Reyhaneh Hashemi, Victor Michel-Dansac, Wassim Tenachi

Journal ref ESAIM: Proceedings and Surveys, 2025, Vol. 81, p. 168-192

Symbolic Regression (SR) is a widely studied field of research that aims to infer symbolic expressions from data. A popular approach for SR is the Sparse Identification of Nonlinear Dynamical Systems (SINDy) framework, which uses sparse regression to identify governing equations from data. This study introduces an enhanced method, Nested SINDy, that aims to increase the expressivity of the SINDy approach thanks to a nested structure. Indeed, traditional symbolic regression and system identification methods often fail with complex systems that cannot be easily described analytically. Nested SINDy builds on the SINDy framework by introducing additional layers before and after the core SINDy layer. This allows the method to identify symbolic representations for a wider range of systems, including those with compositions and products of functions. We demonstrate the ability of the Nested SINDy approach to accurately find symbolic expressions for simple systems, such as basic trigonometric functions, and sparse (false but accurate) analytical representations for more complex systems. Our results highlight Nested SINDy's potential as a tool for symbolic regression, surpassing the traditional SINDy approach in terms of expressivity. However, we also note the challenges in the optimization process for Nested SINDy and suggest future research directions, including the designing of a more robust methodology for the optimization process. This study proves that Nested SINDy can effectively discover symbolic representations of dynamical systems from data, offering new opportunities for understanding complex systems through data-driven methods.

2401.08468 2026-02-09 math.ST cs.LG eess.SP stat.TH

Nonparametric Evaluation of Noisy ICA Solutions

Syamantak Kumar, Purnamrita Sarkar, Peter Bickel, Derek Bean

Comments NeurIPS 2024 (Main Conference Track). 44 pages

Journal ref Advances in Neural Information Processing Systems, 37, pp.132647-132690 (2024)

Independent Component Analysis (ICA) was introduced in the 1980s as a model for Blind Source Separation (BSS), which refers to the process of recovering the sources underlying a mixture of signals, with little knowledge about the source signals or the mixing process. While there are many sophisticated algorithms for estimation, different methods have different shortcomings. In this paper, we develop a nonparametric score to adaptively pick the right algorithm for ICA with arbitrary Gaussian noise. The novelty of this score stems from the fact that it just assumes a finite second moment of the data and uses the characteristic function to evaluate the quality of the estimated mixing matrix without any knowledge of the parameters of the noise distribution. In addition, we propose some new contrast functions and algorithms that enjoy the same fast computability as existing algorithms like FASTICA and JADE but work in domains where the former may fail. While these also may have weaknesses, our proposed diagnostic, as shown by our simulations, can remedy them. Finally, we propose a theoretical framework to analyze the local and global convergence properties of our algorithms.

2309.01750 2026-02-09 math.CO cs.AI

On CNF formulas irredundant with respect to unit clause propagation

Petr Savický

Comments 21 pages, this version includes modifications suggested by journal reviewers to improve readability

Journal ref Theoretical Computer Science, Volume 1064, February 2026

Two CNF formulas are called ucp-equivalent if they behave in the same way with respect to the unit clause propagation (UCP). A formula is called ucp-irredundant if removing any clause leads to a formula which is not ucp-equivalent to the original one. As a consequence of known results, the ratio of the size of a ucp-irredundant formula and the size of a smallest ucp-equivalent formula is at most $n^2$, where $n$ is the number of the variables. We demonstrate an example of a ucp-irredundant formula for a symmetric definite Horn function which is larger than a smallest ucp-equivalent formula by a factor $\Omega(n/\ln n)$. Consequently, a general upper bound on the above ratio cannot be smaller than this.
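
The notions in this abstract can be made concrete with a small brute-force sketch. Clauses are lists of nonzero integers (a negative integer is a negated variable). The equivalence test encodes one reading of "behave in the same way with respect to UCP", namely that for every consistent partial assignment both formulas either reach a conflict or derive the same literals; that reading, the function names, and the example formulas are assumptions, not taken from the paper.

```python
from itertools import product

def unit_propagate(clauses, assumptions=()):
    """Run unit clause propagation; return (literals, conflict_found).

    The assumptions are expected to be consistent (never both x and -x).
    """
    assigned = set(assumptions)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue  # clause already satisfied
            open_lits = [lit for lit in clause if -lit not in assigned]
            if not open_lits:
                return assigned, True  # every literal falsified
            if len(open_lits) == 1:
                assigned.add(open_lits[0])  # unit clause forces its literal
                changed = True
    return assigned, False

def ucp_equivalent(f, g, n_vars):
    """Compare UCP behavior on all 3**n_vars partial assignments."""
    for signs in product((0, 1, -1), repeat=n_vars):
        assumptions = [s * (i + 1) for i, s in enumerate(signs) if s]
        lits_f, conflict_f = unit_propagate(f, assumptions)
        lits_g, conflict_g = unit_propagate(g, assumptions)
        if conflict_f != conflict_g:
            return False
        if not conflict_f and lits_f != lits_g:
            return False
    return True

chain = [[-1, 2], [-2, 3]]          # x1 -> x2, x2 -> x3
with_shortcut = chain + [[-1, 3]]   # adds x1 -> x3
```

Under this reading, `with_shortcut` is not ucp-irredundant because its last clause can be removed without changing UCP behavior, while removing either clause of `chain` does change it.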

2302.11337 2026-02-09 math.NA cs.AI cs.CE cs.LG cs.NA

Bayesian Matrix Decomposition and Applications

Jun Lu

The sole aim of this book is to give a self-contained introduction to concepts and mathematical tools in Bayesian matrix decomposition in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we cannot cover all the useful and interesting results concerning Bayesian matrix decomposition given the limited scope of this discussion; for example, the separate analysis of variational inference for conducting the optimization is omitted. We refer the reader to literature in the field of Bayesian analysis for a more detailed introduction to the related fields. This book is primarily a summary of the purpose and significance of important Bayesian matrix decomposition methods, e.g., real-valued decomposition, nonnegative matrix factorization, and Bayesian interpolative decomposition, and of the origin and complexity of these methods, which shed light on their applications. The mathematical prerequisite is a first course in statistics and linear algebra. Other than this modest background, the development is self-contained, with rigorous proofs provided throughout.

2105.02091 2026-02-09 cs.IR cs.CY cs.LG

When Fair Ranking Meets Uncertain Inference

Avijit Ghosh, Ritam Dutt, Christo Wilson

Comments Accepted as full paper at SIGIR 2021

Existing fair ranking systems, especially those designed to be demographically fair, assume that accurate demographic information about individuals is available to the ranking algorithm. In practice, however, this assumption may not hold -- in real-world contexts like ranking job applicants or credit seekers, social and legal barriers may prevent algorithm operators from collecting people's demographic information. In these cases, algorithm operators may attempt to infer people's demographics and then supply these inferences as inputs to the ranking algorithm. In this study, we investigate how uncertainty and errors in demographic inference impact the fairness offered by fair ranking algorithms. Using simulations and three case studies with real datasets, we show how demographic inferences drawn from real systems can lead to unfair rankings. Our results suggest that developers should not use inferred demographic data as input to fair ranking algorithms, unless the inferences are extremely accurate.

2602.06197 2026-02-09 cs.HC cs.AI

Personagram: Bridging Personas and Product Design for Creative Ideation with Multimodal LLMs

Taewook Kim, Matthew K. Hong, Yan-Ying Chen, Jonathan Q. Li, Monica P Van, Shabnam Hakimi, Matthew Kay, Matthew Klenk

Comments 22 pages, 10 figures, 4 tables

Product designers often begin their design process with handcrafted personas. While personas are intended to ground design decisions in consumer preferences, they often fall short in practice by remaining abstract, expensive to produce, and difficult to translate into actionable design features. As a result, personas risk serving as static reference points rather than tools that actively shape design outcomes. To address these challenges, we built Personagram, an interactive system powered by multimodal large language models (MLLMs) that helps designers explore detailed census-based personas, extract product features inferred from persona attributes, and recombine them for specific customer segments. In a study with 12 professional designers, we show that Personagram facilitates more actionable ideation workflows by structuring multimodal thinking from persona attributes to product design features, achieving higher engagement with personas, perceived transparency, and satisfaction compared to a chat-based baseline. We discuss implications of integrating AI-generated personas into product design workflows.

2602.06190 2026-02-09 cs.HC cs.AI cs.CL

Generics in science communication: Misaligned interpretations across laypeople, scientists, and large language models

Uwe Peters, Andrea Bertazzoli, Jasmine M. DeJesus, Gisela J. van der Velden, Benjamin Chin-Yee

Scientists often use generics, that is, unquantified statements about whole categories of people or phenomena, when communicating research findings (e.g., "statins reduce cardiovascular events"). Large language models (LLMs), such as ChatGPT, frequently adopt the same style when summarizing scientific texts. However, generics can prompt overgeneralizations, especially when they are interpreted differently across audiences. In a study comparing laypeople, scientists, and two leading LLMs (ChatGPT-5 and DeepSeek), we found systematic differences in interpretation of generics. Compared to most scientists, laypeople judged scientific generics as more generalizable and credible, while LLMs rated them even higher. These mismatches highlight significant risks for science communication. Scientists may use generics and incorrectly assume laypeople share their interpretation, while LLMs may systematically overgeneralize scientific findings when summarizing research. Our findings underscore the need for greater attention to language choices in both human and LLM-mediated science communication.

2602.06180 2026-02-09 eess.AS cs.CL

STACodec: Semantic Token Assignment for Balancing Acoustic Fidelity and Semantic Information in Audio Codecs

Kaiyuan Zhang, Mohan Shi, Eray Eren, Natarajan Balaji Shankar, Zilai Wang, Abeer Alwan

Comments ICASSP 2026

Neural audio codecs are widely used for audio compression and can be integrated into token-based language models. Traditional codecs preserve acoustic details well but lack semantic information. Recent hybrid codecs attempt to incorporate semantic information through distillation, but this often degrades reconstruction performance, making it difficult to achieve both. To address this limitation, we introduce STACodec, a unified codec that integrates semantic information from self-supervised learning (SSL) models into the first layer of residual vector quantization (RVQ-1) via semantic token assignment (STA). To further eliminate reliance on SSL-based semantic tokenizers and improve efficiency during inference, we propose a semantic pre-distillation (SPD) module, which predicts semantic tokens directly for assignment to the first RVQ layer during inference. Experimental results show that STACodec outperforms existing hybrid codecs in both audio reconstruction and downstream semantic tasks, demonstrating a better balance between acoustic fidelity and semantic capability.
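
The residual vector quantization (RVQ) that this abstract builds on can be sketched in a few lines. This is generic RVQ with tiny hand-written codebooks, not STACodec itself; it does not implement semantic token assignment or the SPD module, and the codebook values and function names are assumptions.

```python
# Generic residual vector quantization (RVQ) sketch with toy codebooks.
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec in squared distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def rvq_encode(vec, codebooks):
    """Quantize vec layer by layer; each layer encodes the previous residual."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens

def rvq_decode(tokens, codebooks):
    """Reconstruct by summing the selected entry from every layer."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],       # layer 1: coarse entries
    [[0.0, 0.0], [0.25, -0.25]],    # layer 2: refines the layer-1 residual
]
tokens = rvq_encode([1.25, 0.75], codebooks)
reconstruction = rvq_decode(tokens, codebooks)
```

In generic RVQ the first-layer token is simply the nearest coarse entry; the abstract describes filling that first layer from semantic tokens instead, which this sketch leaves out.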

2602.06172 2026-02-09 cs.CR cs.CY cs.LG

Know Your Scientist: KYC as Biosecurity Infrastructure

Jonathan Feldman, Tal Feldman, Annie I Anton

Biological AI tools for protein design and structure prediction are advancing rapidly, creating dual-use risks that existing safeguards cannot adequately address. Current model-level restrictions, including keyword filtering, output screening, and content-based access denials, are fundamentally ill-suited to biology, where reliable function prediction remains beyond reach and novel threats evade detection by design. We propose a three-tier Know Your Customer (KYC) framework, inspired by anti-money laundering (AML) practices in the financial sector, that shifts governance from content inspection to user verification and monitoring. Tier I leverages research institutions as trust anchors to vouch for affiliated researchers and assume responsibility for vetting. Tier II applies output screening through sequence homology searches and functional annotation. Tier III monitors behavioral patterns to detect anomalies inconsistent with declared research purposes. This layered approach preserves access for legitimate researchers while raising the cost of misuse through institutional accountability and traceability. The framework can be implemented immediately using existing institutional infrastructure, requiring no new legislation or regulatory mandates.

2602.06137 2026-02-09 quant-ph cs.LG stat.ML

Warm Starts, Cold States: Exploiting Adiabaticity for Variational Ground-States

Ricard Puig, Berta Casas, Alba Cervera-Lierta, Zoë Holmes, Adrián Pérez-Salinas

Comments 11 + 24 pages, 3 figures

Reliable preparation of many-body ground states is an essential task in quantum computing, with applications spanning areas from chemistry and materials modeling to quantum optimization and benchmarking. A variety of approaches have been proposed to tackle this problem, including variational methods. However, variational training often struggles to navigate complex energy landscapes, frequently encountering suboptimal local minima or suffering from barren plateaus. In this work, we introduce an iterative strategy for ground-state preparation based on a stepwise (discretized) Hamiltonian deformation. By complementing the Variational Quantum Eigensolver (VQE) with adiabatic principles, we demonstrate that solving a sequence of intermediate problems facilitates tracking the ground-state manifold toward the target system, even as we scale the system size. We provide a rigorous theoretical foundation for this approach, proving a lower bound on the loss variance that suggests trainability throughout the deformation, provided the system remains away from gap closings. Numerical simulations, including the effects of shot noise, confirm that this path-dependent tracking consistently converges to the target ground state.

2602.06134 2026-02-09 cs.HC cs.AI

Hear You in Silence: Designing for Active Listening in Human Interaction with Conversational Agents Using Context-Aware Pacing

Zhihan Jiang, Qianhui Chen, Chu Zhang, Yanheng Li, Ray LC

Comments 29 pages, 10 figures. Conditionally Accepted to CHI '26

In human conversation, empathic dialogue requires nuanced temporal cues indicating whether the conversational partner is paying attention. This type of "active listening" is overlooked in the design of Conversational Agents (CAs), which use the same pacing for one conversation. To model the temporal cues in human conversation, we need CAs that dynamically adjust response pacing according to user input. We qualitatively analyzed ten cases of active listening to distill five context-aware pacing strategies: Reflective Silence, Facilitative Silence, Empathic Silence, Holding Space, and Immediate Response. In a between-subjects study (N=50) with two conversational scenarios (relationship and career support), the context-aware agent scored higher than a static-pacing control on perceived human-likeness, smoothness, and interactivity, supporting deeper self-disclosure and higher engagement. In the career support scenario, the CA yielded higher perceived listening quality and affective trust. This work shows how insights from human conversation, like context-aware pacing, can empower the design of more empathic human-AI communication.

2602.06101 2026-02-09 eess.IV cs.CV cs.MM

ALIEN: Analytic Latent Watermarking for Controllable Generation

Liangqi Lei, Keke Gai, Jing Yu, Qi Wu

Watermarking is a technical alternative to safeguarding intellectual property and reducing misuse. Existing methods focus on optimizing watermarked latent variables to balance watermark robustness and fidelity, as latent diffusion models (LDMs) are considered a powerful tool for generative tasks. However, reliance on computationally intensive heuristic optimization for iterative signal refinement results in high training overhead and local optima entrapment. To address these issues, we propose an Analytical Watermarking Framework for Controllable Generation (ALIEN). We develop the first analytical derivation of the time-dependent modulation coefficient that guides the diffusion of watermark residuals to achieve a controllable watermark embedding pattern. Experimental results show that ALIEN-Q outperforms the state-of-the-art by 33.1% across 5 quality metrics, and ALIEN-R demonstrates 14.0% improved robustness against generative variant and stability threats compared to the state-of-the-art across 15 distinct conditions. Code is available at https://anonymous.4open.science/r/ALIEN/.

2602.06090 2026-02-09 cs.SE cs.AI cs.CV

SVRepair: Structured Visual Reasoning for Automated Program Repair

Xiaoxuan Tang, Jincheng Wang, Liwei Luo, Jingxuan Xu, Sheng Zhou, Dajun Chen, Wei Jiang, Yong Li

Comments 16 pages, 3 figures

Large language models (LLMs) have recently shown strong potential for Automated Program Repair (APR), yet most existing approaches remain unimodal and fail to leverage the rich diagnostic signals contained in visual artifacts such as screenshots and control-flow graphs. In practice, many bug reports convey critical information visually (e.g., layout breakage or missing widgets), but directly using such dense visual inputs often causes context loss and noise, making it difficult for MLLMs to ground visual observations into precise fault localization and executable patches. To bridge this semantic gap, we propose SVRepair, a multimodal APR framework with structured visual representation. SVRepair first fine-tunes a vision-language model, Structured Visual Representation (SVR), to uniformly transform heterogeneous visual artifacts into a semantic scene graph that captures GUI elements and their structural relations (e.g., hierarchy), providing normalized, code-relevant context for downstream repair. Building on the graph, SVRepair drives a coding agent to localize faults and synthesize patches, and further introduces an iterative visual-artifact segmentation strategy that progressively narrows the input to bug-centered regions to suppress irrelevant context and reduce hallucinations. Extensive experiments across multiple benchmarks demonstrate state-of-the-art performance: SVRepair achieves 36.47% accuracy on SWE-Bench M, 38.02% on MMCode, and 95.12% on CodeVision, validating the effectiveness of SVRepair for multimodal program repair.

2602.06081 2026-02-09 cs.MA cs.AI cs.GT

Communication Enhances LLMs' Stability in Strategic Thinking

Nunzio Lore, Babak Heydari

Comments 15 pages, 1 figure, 6 tables

Details
English abstract

Large Language Models (LLMs) often exhibit pronounced context-dependent variability that undermines predictable multi-agent behavior in tasks requiring strategic thinking. Focusing on models that range from 7 to 9 billion parameters in size engaged in a ten-round repeated Prisoner's Dilemma, we evaluate whether short, costless pre-play messages emulating the cheap-talk paradigm affect strategic stability. Our analysis uses simulation-level bootstrap resampling and nonparametric inference to compare cooperation trajectories fitted with LOWESS regression across both the messaging and the no-messaging condition. We demonstrate consistent reductions in trajectory noise across a majority of the model-context pairings being studied. The stabilizing effect persists across multiple prompt variants and decoding regimes, though its magnitude depends on model choice and contextual framing, with models displaying higher baseline volatility gaining the most. While communication rarely produces harmful instability, we document a few context-specific exceptions and identify the limited domains in which communication harms stability. These findings position cheap-talk style communication as a low-cost, practical tool for improving the predictability and reliability of strategic behavior in multi-agent LLM systems.

2602.06079 2026-02-09 cs.DC cs.LG

Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers

Liangyu Wang, Siqi Zhang, Junjie Wang, Yiming Dong, Bo Zheng, Zihan Qiu, Shengkun Tang, Di Wang, Rui Men, Dayiheng Liu

Details
English abstract

The scaling of Large Language Models (LLMs) drives interest in matrix-based optimizers (e.g., Shampoo, Muon, SOAP) for their convergence efficiency; yet their requirement for holistic updates conflicts with the tensor fragmentation in distributed frameworks like Megatron. Existing solutions are suboptimal: synchronous approaches suffer from computational redundancy, while layer-wise partitioning fails to reconcile this conflict without violating the geometric constraints of efficient communication primitives. To bridge this gap, we propose Canzona, a Unified, Asynchronous, and Load-Balanced framework that decouples logical optimizer assignment from physical parameter distribution. For Data Parallelism, we introduce an alpha-Balanced Static Partitioning strategy that respects atomicity while neutralizing the load imbalance. For Tensor Parallelism, we design an Asynchronous Compute pipeline utilizing Micro-Group Scheduling to batch fragmented updates and hide reconstruction overhead. Extensive evaluations on the Qwen3 model family (up to 32B parameters) on 256 GPUs demonstrate that our approach preserves the efficiency of established parallel architectures, achieving a 1.57x speedup in end-to-end iteration time and reducing optimizer step latency by 5.8x compared to the baseline.

2602.06078 2026-02-09 cs.DL cs.AI cs.LG

Allocate Marginal Reviews to Borderline Papers Using LLM Comparative Ranking

Elliot L. Epstein, Rajat Dwaraknath, John Winnicki, Thanawat Sornwanee

Comments 13 pages

Details
English abstract

This paper argues that large ML conferences should allocate marginal review capacity primarily to papers near the acceptance boundary, rather than spreading extra reviews via random or affinity-driven heuristics. We propose using LLM-based comparative ranking (via pairwise comparisons and a Bradley-Terry model) to identify a borderline band before human reviewing and to allocate marginal reviewer capacity at assignment time. Concretely, given a venue-specific minimum review target (e.g., 3 or 4), we use this signal to decide which papers receive one additional review (e.g., a 4th or 5th), without conditioning on any human reviews and without using LLM outputs for accept/reject. We provide a simple expected-impact calculation in terms of (i) the overlap between the predicted and true borderline sets ($ρ$) and (ii) the incremental value of an extra review near the boundary ($Δ$), and we provide retrospective proxies to estimate these quantities.
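
To make the ranking step concrete, the following is a minimal sketch of fitting a Bradley-Terry model to pairwise outcomes and picking a band of papers around an acceptance cutoff. It is not the authors' implementation: the function names, the `prior` smoothing against a virtual anchor paper, and the `accept_count` and `band_width` parameters are all assumptions made for illustration. The fit uses the standard minorization-maximization update for Bradley-Terry strengths.

```python
from collections import defaultdict


def fit_bradley_terry(n_items, comparisons, iters=500, prior=0.1):
    """Estimate Bradley-Terry strengths from (winner, loser) index pairs.

    Assumption: every paper gets `prior` virtual wins and losses against a
    virtual anchor paper, which keeps all strengths strictly positive.
    """
    n = n_items + 1  # the last index is the virtual anchor
    wins = [0.0] * n
    games = defaultdict(float)  # unordered pair -> number of comparisons

    def add(winner, loser, weight):
        wins[winner] += weight
        games[(min(winner, loser), max(winner, loser))] += weight

    for winner, loser in comparisons:
        add(winner, loser, 1.0)
    for i in range(n_items):
        add(i, n_items, prior)
        add(n_items, i, prior)

    p = [1.0 / n] * n
    for _ in range(iters):
        denom = [0.0] * n
        for (i, j), g in games.items():
            d = g / (p[i] + p[j])
            denom[i] += d
            denom[j] += d
        p = [wins[i] / denom[i] for i in range(n)]
        total = sum(p)
        p = [x / total for x in p]

    real = p[:n_items]  # drop the anchor and renormalize
    total = sum(real)
    return [x / total for x in real]


def borderline_band(strengths, accept_count, band_width):
    """Return paper indices ranked within band_width places of the cutoff."""
    order = sorted(range(len(strengths)), key=lambda i: strengths[i], reverse=True)
    lo = max(0, accept_count - band_width)
    hi = min(len(order), accept_count + band_width)
    return order[lo:hi]
```

Under these assumptions, the papers returned by `borderline_band` would be the candidates for one additional review, while the strengths themselves are never used for accept/reject decisions.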

2602.06072 2026-02-09 cs.DC cs.LG

PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference

Rui Ning, Wei Zhang, Fan Lai

Details
English abstract

Attention efficiency is critical to large language model (LLM) inference. While prior advances optimize attention execution for individual requests (e.g., FlashAttention), production LLM serving relies on batching requests with highly heterogeneous sequence lengths for high serving throughput. This mismatch induces severe computation and I/O imbalance, exacerbates stragglers, and underutilizes GPU resources. We present PackInfer, a kernel-level attention framework that enables compute- and I/O-aware execution for heterogeneous batched inference. PackInfer orchestrates batched requests into load-balanced execution groups, effectively saturating GPU utilization by packing multiple requests into unified kernel launches. By constructing attention kernels directly over packed query-key regions, PackInfer eliminates redundant computation and balances thread-block execution. It then incorporates I/O-aware grouping that co-locates shared-prefix requests and reorganizes KV caches into group-contiguous layouts, reducing memory fragmentation and redundant data movement as generation evolves. Evaluations on real-world workloads show that PackInfer reduces inference latency by 13.0-20.1%, and improves throughput by 20% compared to the state-of-the-art FlashAttention.
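
As a rough illustration of what grouping heterogeneous requests into load-balanced execution groups can look like, here is a minimal sketch using the classic longest-processing-time-first greedy heuristic. This is a stand-in technique, not PackInfer's scheduler: the function name `pack_requests`, and the use of sequence length as the only cost measure, are assumptions.

```python
import heapq


def pack_requests(seq_lens, num_groups):
    """Assign request indices to groups so total token counts are balanced.

    Greedy LPT heuristic: visit requests from longest to shortest and place
    each one into the group that currently holds the fewest tokens.
    """
    heap = [(0, g) for g in range(num_groups)]  # (current load, group id)
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    by_length = sorted(range(len(seq_lens)), key=lambda r: seq_lens[r], reverse=True)
    for req in by_length:
        load, g = heapq.heappop(heap)
        groups[g].append(req)
        heapq.heappush(heap, (load + seq_lens[req], g))
    return groups


# Example: one long request and several short ones, split over two groups.
lengths = [4096, 128, 512, 2048, 256, 1024]
groups = pack_requests(lengths, num_groups=2)
loads = [sum(lengths[r] for r in g) for g in groups]
```

In this example the long request ends up alone in one group while the five shorter ones share the other, so the two group loads differ by only 128 tokens. A real kernel-level scheduler would also have to account for shared prefixes and KV-cache layout, which this sketch ignores.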

2602.06069 2026-02-09 cs.DC cs.AI cs.LG cs.SE

HQP: Sensitivity-Aware Hybrid Quantization and Pruning for Ultra-Low-Latency Edge AI Inference

Dinesh Gopalan, Ratul Ali

Comments 7 pages, 3 figures, 2 tables

Details
English abstract

The escalating demand for high-fidelity, real-time inference in distributed edge-cloud environments necessitates aggressive model optimization to counteract severe latency and energy constraints. This paper introduces the Hybrid Quantization and Pruning (HQP) framework, a novel, integrated methodology designed to achieve synergistic model acceleration while adhering to strict quality guarantees. We detail a sensitivity-aware structural pruning algorithm that employs a dynamic weight sensitivity metric, derived from a highly efficient approximation of the Fisher Information Matrix (FIM), to guide the iterative removal of redundant filters. This pruning is strictly conditional, enforcing an adherence to a maximum permissible accuracy drop ($\Delta_{\max}$) before the model proceeds to 8-bit post-training quantization. This rigorous coordination is critical, as it ensures the resultant sparse model structure is maximally robust to quantization error and hardware-specific kernel optimization. Exhaustive evaluation across heterogeneous NVIDIA Jetson edge platforms, utilizing resource-efficient architectures like MobileNetV3 and ResNet-18, demonstrates that the HQP framework achieves a peak performance gain of 3.12 times inference speedup and a 55 percent model size reduction, while rigorously containing the accuracy drop below the 1.5 percent constraint. A comprehensive comparative analysis against conventional single-objective compression techniques validates the HQP framework as a superior, hardware-agnostic solution for deploying ultra-low-latency AI in resource-limited edge infrastructures.
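
The prune-then-quantize ordering described above can be sketched as follows. This is an illustrative reading of the abstract, not the HQP code: per-filter sensitivities are taken as given (for instance from a Fisher-diagonal estimate), `evaluate` is a hypothetical callback that returns accuracy for a set of kept filters, and the symmetric int8 scheme is one common choice for 8-bit post-training quantization.

```python
def prune_under_budget(sensitivities, evaluate, baseline_acc, delta_max):
    """Drop filters from least to most sensitive while the accuracy drop
    stays within delta_max. Returns the set of kept filter indices.

    `evaluate(kept)` is a hypothetical callback: accuracy with only the
    filters in `kept` active.
    """
    kept = set(range(len(sensitivities)))
    for f in sorted(kept, key=lambda i: sensitivities[i]):
        trial = kept - {f}
        # Stop at the first removal that would break the accuracy budget
        # (or that would leave no filters at all).
        if not trial or baseline_acc - evaluate(trial) > delta_max:
            break
        kept = trial
    return kept


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale
```

In this sketch quantization would be applied only to the weights of the filters returned by `prune_under_budget`, which mirrors the conditional hand-off from pruning to quantization that the abstract describes.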

2602.06064 2026-02-09 cs.DC cs.AI

iScheduler: Reinforcement Learning-Driven Continual Optimization for Large-Scale Resource Investment Problems

Yi-Xiang Hu, Yuke Wang, Feng Wu, Zirui Huang, Shuli Zeng, Xiang-Yang Li

Comments 13 pages, 7 figures

Details
English abstract

Scheduling precedence-constrained tasks under shared renewable resources is central to modern computing platforms. The Resource Investment Problem (RIP) models this setting by minimizing the cost of provisioned renewable resources under precedence and timing constraints. Exact mixed-integer programming and constraint programming become impractically slow on large instances, and dynamic updates require schedule revisions under tight latency budgets. We present iScheduler, a reinforcement-learning-driven iterative scheduling framework that formulates RIP solving as a Markov decision process over decomposed subproblems and constructs schedules through sequential process selection. The framework accelerates optimization and supports reconfiguration by reusing unchanged process schedules and rescheduling only affected processes. We also release L-RIPLIB, an industrial-scale benchmark derived from cloud-platform workloads with 1,000 instances of 2,500-10,000 tasks. Experiments show that iScheduler attains competitive resource costs while reducing time to feasibility by up to 43$\times$ against strong commercial baselines.

2602.06062 2026-02-09 cs.IT cs.LG math.IT

Deep Unfolded Fractional Optimization for Maximizing Robust Throughput in 6G Networks

Anh Thi Bui, Robert-Jeron Reifert, Hayssam Dahrouj, Aydin Sezgin

Comments 6 pages, 5 figures

Details
English abstract

The sixth-generation (6G) of wireless communication networks aims to leverage artificial intelligence tools for efficient and robust network optimization. This is especially the case since traditional optimization methods often face high computational complexity, motivating the use of deep learning (DL)-based optimization frameworks. In this context, this paper considers a multi-antenna base station (BS) serving multiple users simultaneously through transmit beamforming in downlink mode. To account for robustness, this work proposes an uncertainty-injected deep unfolded fractional programming (UI-DUFP) framework for weighted sum rate (WSR) maximization under imperfect channel conditions. The proposed method unfolds fractional programming (FP) iterations into trainable neural network layers refined by projected gradient descent (PGD) steps, while robustness is introduced by injecting sampled channel uncertainties during training and optimizing a quantile-based objective. Simulation results show that the proposed UI-DUFP achieves higher WSR and improved robustness compared to classical weighted minimum mean square error, FP, and DL baselines, while maintaining low inference time and good scalability. These findings highlight the potential of deep unfolding combined with uncertainty-aware training as a powerful approach for robust optimization in 6G networks.
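
One plausible reading of a quantile-based objective under injected channel uncertainty is sketched below: perturb the estimated channels with sampled errors, compute the weighted sum rate for each sample, and report a low quantile. The Gaussian error model, the function names, and the parameters `err_std`, `q`, and `samples` are assumptions for illustration. The abstract does not specify these details, and the unfolded FP and PGD layers are not modeled here.

```python
import math
import random


def weighted_sum_rate(channels, beams, weights, noise_power=1.0):
    """Downlink WSR in bit/s/Hz. channels[k] and beams[k] are lists of
    complex numbers, one entry per transmit antenna."""
    def gain(h, w):
        # |h^H w|^2: received power of beam w over channel h
        return abs(sum(hc.conjugate() * wc for hc, wc in zip(h, w))) ** 2

    total = 0.0
    for k, h in enumerate(channels):
        signal = gain(h, beams[k])
        interference = sum(gain(h, beams[j]) for j in range(len(beams)) if j != k)
        total += weights[k] * math.log2(1.0 + signal / (interference + noise_power))
    return total


def quantile_wsr(est_channels, beams, weights, err_std, q=0.05, samples=200, seed=0):
    """Empirical q-quantile of the WSR when each channel coefficient is
    perturbed by an assumed complex Gaussian estimation error."""
    rng = random.Random(seed)
    rates = []
    for _ in range(samples):
        noisy = [
            [h + complex(rng.gauss(0, err_std), rng.gauss(0, err_std)) for h in hk]
            for hk in est_channels
        ]
        rates.append(weighted_sum_rate(noisy, beams, weights))
    rates.sort()
    return rates[int(q * (samples - 1))]
```

Maximizing a low quantile rather than the mean favors beamformers whose rate stays acceptable in most channel realizations, which is the intuition behind robustness to imperfect channel knowledge.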

2602.06056 2026-02-09 cs.MM cs.AI cs.CL cs.CV

Analyzing Diffusion and Autoregressive Vision Language Models in Multimodal Embedding Space

Zihang Wang, Siyue Zhang, Yilun Zhao, Jingyi Yang, Tingyu Song, Anh Tuan Luu, Chen Zhao

Details
English abstract

Embedding models are a fundamental component of modern AI systems such as semantic search and retrieval-augmented generation. Recent advances in large foundation models have substantially accelerated the development of embedding models, including those based on Large Language Models (LLMs), Vision Language Models (VLMs), and Multimodal LLMs. More recently, Large Diffusion Language Models (dLLMs) and Multimodal dLLMs have emerged as competitive alternatives to autoregressive models, offering advantages such as bidirectional attention and parallel generation. This progress naturally raises a critical yet unexplored question: can Multimodal dLLMs serve as effective multimodal embedding models? To answer this, we present the first systematic study of converting Multimodal dLLMs into embedding models. We evaluate state-of-the-art Multimodal dLLMs and Autoregressive VLMs across three categories of embedding tasks: classification, visual question answering, and information retrieval. Our results show that Multimodal dLLM embeddings generally underperform their autoregressive VLM counterparts. The stronger diffusion-based model, LaViDa, lags by only 3.5 points on classification, 2.5 points on VQA, and 4.4 points on retrieval tasks, whereas the other diffusion-based model, MMaDA, exhibits substantially larger performance gaps, exceeding 20 points across all tasks. Further analysis reveals insufficient image-text alignment in diffusion-based models, accounting for the observed limitations in their embedding performance.

2602.06047 2026-02-09 cs.HC cs.AI

Git for Sketches: An Intelligent Tracking System for Capturing Design Evolution

Sankar B, Amogh A S, Sandhya Baranwal, Dibakar Sen

Comments 49 pages, 25 figures

Details
English abstract

During product conceptualization, capturing the non-linear history and cognitive intent is crucial. Traditional sketching tools often lose this context. We introduce DIMES (Design Idea Management and Evolution capture System), a web-based environment featuring sGIT (SketchGit), a custom visual version control architecture, and Generative AI. sGIT includes AEGIS, a module using hybrid Deep Learning and Machine Learning models to classify six stroke types. The system maps Git primitives to design actions, enabling implicit branching and multi-modal commits (stroke data + voice intent). In a comparative study, experts using DIMES demonstrated a 160% increase in breadth of concept exploration. Generative AI modules generated narrative summaries that enhanced knowledge transfer; novices achieved higher replication fidelity (Neural Transparency-based Cosine Similarity: 0.97 vs. 0.73) compared to manual summaries. AI-generated renderings also received higher user acceptance (Purchase Likelihood: 4.2 vs 3.1). This work demonstrates that intelligent version control bridges creative action and cognitive documentation, offering a new paradigm for design education.