arXivDaily: arXiv daily academic digest, updated Monday through Friday
2511.08917 2026-04-01 cs.HC cs.CV

"It's trained by non-disabled people": Evaluating How Image Quality Affects Product Captioning with Vision-Language Models

Kapil Garg, Xinru Tang, Jimin Heo, Dwayne R. Morgan, Darren Gergle, Erik B. Sudderth, Anne Marie Piper

Comments Published at CHI 2026; Honorable Mention for Best Paper (Top 5%). Dataset available at: https://github.com/Accessibility-Research-Collective-UCI/image-quality-vlm-chi26

Abstract

Vision-Language Models (VLMs) are increasingly used by blind and low-vision (BLV) people to identify and understand products in their everyday lives, such as food, personal care items, and household goods. Despite their prevalence, we lack an empirical understanding of how common image quality issues--such as blur, misframing, and rotation--affect the accuracy of VLM-generated captions and whether the resulting captions meet BLV people's information needs. Based on a survey of 86 BLV participants, we develop an annotated dataset of 1,859 product images from BLV people to systematically evaluate how image quality issues affect VLM-generated captions. While the best VLM achieves 98% accuracy on images with no quality issues, accuracy drops to 75% overall when quality issues are present, worsening considerably as issues compound. We discuss the need for model evaluations that center on disabled people's experiences throughout the process and offer concrete recommendations for HCI and ML researchers to make VLMs more reliable for BLV people.

2510.16187 2026-04-01 cs.MA cs.AI cs.RO

Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards

Rupal Nigam, Niket Parikh, Hamid Osooli, Mikihisa Yuasa, Jacob Heglund, Huy T. Tran

Comments 10 pages, 8 figures. To appear in proceedings of 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

Abstract

Real-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with other previously unseen teammates to solve a task in a zero-shot manner. Prior work often either selects a pretrained policy based on an inferred model of the new teammates or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards, for efficient and effective knowledge transfer between different teams. We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and Overcooked. We also demonstrate our algorithm in a real-world multi-robot setting.
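
A minimal sketch of the two ideas named in the abstract, in their standard textbook forms. Generalized policy improvement picks the action whose best value across all pretrained policies is highest; a difference reward is the team reward minus the team reward obtained when one agent's action is swapped for a default. The Q-table layout, the function names, and the toy foraging reward are assumptions for illustration only. The abstract does not say how GPAT combines the two ideas, so this is not the authors' algorithm.

```python
from typing import Callable, Dict, Hashable, List, Sequence

# Hypothetical layout: each pretrained policy is a table state -> [Q(s, a) per action].
QTable = Dict[Hashable, List[float]]


def gpi_action(q_tables: Sequence[QTable], state: Hashable) -> int:
    """Generalized policy improvement: argmax_a max_i Q_i(state, a)."""
    n_actions = len(q_tables[0][state])
    best_value = [max(q[state][a] for q in q_tables) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: best_value[a])


def difference_reward(
    team_reward: Callable[[Sequence], float],
    joint_action: Sequence,
    agent: int,
    default_action=None,
) -> float:
    """Team reward minus the reward with this agent's action replaced by a default."""
    counterfactual = list(joint_action)
    counterfactual[agent] = default_action
    return team_reward(joint_action) - team_reward(counterfactual)


def distinct_items(joint_action: Sequence) -> float:
    """Toy team reward (assumption): number of distinct items foraged."""
    return float(len({a for a in joint_action if a is not None}))
```

With two policies whose values in state "s" are [1.0, 0.2, 0.0] and [0.1, 0.5, 2.0], `gpi_action` selects action 2. In the toy foraging reward, an agent that duplicates a teammate's item gets a difference reward of 0, while the only agent collecting a given item gets 1.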

2510.14582 2026-04-01 stat.ML cs.AI cs.LG

Local Causal Discovery for Statistically Efficient Causal Inference

Mátyás Schubert, Tom Claassen, Sara Magliacane

Comments Accepted at AISTATS 2026

Abstract

Causal discovery methods can identify valid adjustment sets for causal effect estimation for a pair of target variables, even when the underlying causal graph is unknown. Global causal discovery methods focus on learning the whole causal graph and therefore enable the recovery of optimal adjustment sets, i.e., sets with the lowest asymptotic variance, but they quickly become computationally prohibitive as the number of variables grows. Local causal discovery methods offer a more scalable alternative by focusing on the local neighborhood of the target variables, but are restricted to statistically suboptimal adjustment sets. In this work, we propose Local Optimal Adjustments Discovery (LOAD), a sound and complete causal discovery approach that combines the computational efficiency of local methods with the statistical optimality of global methods. First, LOAD identifies the causal relation between the targets and tests if the causal effect is identifiable by using only local information. If it is identifiable, it finds the possible descendants of the treatment and infers the optimal adjustment set as the parents of the outcome in a modified forbidden projection. Otherwise, it returns the locally valid parent adjustment sets. In our experiments on synthetic and realistic data LOAD outperforms global methods in scalability, while providing more accurate effect estimation than local methods.

2509.05841 2026-04-01 math.OC cs.AI q-fin.RM

Generative AI on Wall Street -- Opportunities and Risk Controls

Jackie Shen

Comments 30 pages, 8 figures

Abstract

We give an overview of the emerging applications of GenAI in the financial industry, especially within investment banks. Inherent to these exciting opportunities is a new realm of risks that must be managed properly. By heeding both the Yin and Yang sides of GenAI, we can accelerate its organic growth while safeguarding the entire financial industry during this nascent era of AI.

2509.03378 2026-04-01 stat.ML cs.LG

Understanding and Improving Shampoo and SOAP via Kullback-Leibler Minimization

Wu Lin, Scott C. Lowe, Felix Dangel, Runa Eschenhagen, Zikun Xu, Roger B. Grosse

Comments an extended version of the ICLR 2026 paper (added a sentence about viewing KL-Shampoo from a gradient orthogonalization viewpoint)

Abstract

Shampoo and its efficient variant, SOAP, employ structured second-moment estimations and have shown strong performance for training neural networks (NNs). In practice, however, Shampoo typically requires step-size grafting with Adam to be competitive, and SOAP mitigates this by applying Adam in Shampoo's eigenbasis -- at the cost of additional memory overhead from Adam in both methods. Prior analyses have largely relied on the Frobenius norm to motivate these estimation schemes. We instead recast their estimation procedures as covariance estimation under Kullback-Leibler (KL) divergence minimization, revealing a previously overlooked theoretical limitation and motivating principled redesigns. Building on this perspective, we develop KL-Shampoo and KL-SOAP, practical schemes that match or exceed the performance of Shampoo and SOAP in NN pre-training while achieving SOAP-level per-iteration runtime. Notably, KL-Shampoo does not rely on Adam to attain competitive performance, eliminating the memory overhead introduced by Adam. Across our experiments, KL-Shampoo consistently outperforms SOAP, Shampoo, and even KL-SOAP, establishing the KL-based approach as a promising foundation for designing structured methods in NN optimization. An implementation of KL-Shampoo/KL-SOAP is available at https://github.com/yorkerlin/KL-Methods

2508.20125 2026-04-01 cs.NE cs.AI cs.CV q-bio.NC

Improving Liver Disease Diagnosis with SNNDeep: A Custom Spiking Neural Network Using Diverse Learning Algorithms

Zofia Rudnicka, Janusz Szczepanski, Agnieszka Pregowska

Abstract

Purpose: Spiking neural networks (SNNs) have recently gained attention as energy-efficient, biologically plausible alternatives to conventional deep learning models. Their application in high-stakes biomedical imaging remains almost entirely unexplored. Methods: This study introduces SNNDeep, the first tailored SNN specifically optimized for binary classification of liver health status from computed tomography (CT) features. To ensure clinical relevance and broad generalizability, the model was developed and evaluated using the Task03_Liver dataset from the Medical Segmentation Decathlon (MSD), a standardized benchmark widely used for assessing performance across diverse medical imaging tasks. We benchmark three fundamentally different learning algorithms, namely Surrogate Gradient Learning, the Tempotron rule, and Bio-Inspired Active Learning, across three architectural variants: a fully customized low-level model built from scratch, and two implementations using leading SNN frameworks, i.e., snnTorch and SpikingJelly. Hyperparameter optimization was performed using Optuna. Results: Our results demonstrate that the custom-built SNNDeep consistently outperforms framework-based implementations, achieving a maximum validation accuracy of 98.35%, superior adaptability across learning rules, and significantly reduced training overhead. Conclusion: This study provides the first empirical evidence that low-level, highly tunable SNNs can surpass standard frameworks in medical imaging, especially in data-limited, temporally constrained diagnostic settings, thereby opening a new pathway for neuro-inspired AI in precision medicine.

2508.00017 2026-04-01 cs.LO cs.AI cs.AR

Generative Logic: A New Computer Architecture for Deterministic Reasoning and Knowledge Generation

Nikolai Sergeev

Comments v4: Incubator, Compressor, Verifier (34,320 checks, 0 failures). New CAS chapter. Pipeline diagram. Branching outlook, FTA campaign, CAS roadmap, LLM demo in Future Work. Updated MPL listing and runtimes. 24pp, 8 figs. Zenodo DOI: 10.5281/zenodo.17206386

Abstract

We present Generative Logic (GL), a deterministic architecture that starts from user-supplied axiomatic definitions written in a minimalist Mathematical Programming Language (MPL) and systematically explores a configurable region of their deductive neighborhood. Definitions are compiled into a distributed grid of Logic Blocks (LBs) that communicate via a unified hash-based inference engine; whenever the premises of a rule unify, a new fact is emitted with full provenance, yielding replayable, auditable proof graphs. The pipeline includes an Incubator that auto-generates ground-level fact tables, a Compressor that eliminates post-proof redundancy, and an independent external Verifier (34,320 checks, zero failures). Experimental validation on Elementary Number Theory develops Peano arithmetic from axioms and autonomously derives Gauss's summation formula. On commodity hardware, the core proving pipeline completes in under one minute; the full run including Incubator fact generation finishes in approximately ten minutes. The Incubator output further reveals that GL can perform concrete numerical calculations -- each result a proved theorem with full provenance -- opening a path toward a full-provenance Computer Algebra System (CAS). Generated proofs export as navigable HTML for independent inspection. Code, proof graphs, and reproduction instructions are available at github.com/Generative-Logic/GL (commit 6e5b9a4) and archived at doi:10.5281/zenodo.17206386.

2506.06837 2026-04-01 cs.MA cs.AI cs.GT

AI-Generated Compromises for Coalition Formation

Eyal Briman, Ehud Shapiro, Nimrod Talmon

Abstract

The challenge of finding compromises between agent proposals is fundamental to AI subfields such as argumentation, mediation, and negotiation. Building on this tradition, Elkind et al. (2021) introduced a process for coalition formation that seeks majority-supported proposals preferable to the status quo, using a metric space where each agent has an ideal point. A crucial step in this process involves identifying compromise proposals around which agent coalitions can unite. How to effectively find such compromise proposals remains an open question. We address this gap by formalizing a model that incorporates agent bounded rationality and uncertainty, and by developing AI methods to generate compromise proposals. We focus on the domain of collaborative document writing, such as the democratic drafting of a community constitution. Our approach uses natural language processing techniques and large language models to induce a semantic metric space over text. Based on this space, we design algorithms to suggest compromise points likely to receive broad support. To evaluate our methods, we simulate coalition formation processes and show that AI can facilitate large-scale democratic text editing, a domain where traditional tools are limited.
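
A minimal sketch of the support test the abstract describes: each agent has an ideal point in a metric space and supports a proposal when it lies closer to that ideal point than the status quo does. Euclidean distance and the function names are assumptions for illustration. The paper induces a semantic metric over text with language models and also models bounded rationality and uncertainty, none of which is reproduced here.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def supporters(ideal_points: Sequence[Point], proposal: Point, status_quo: Point) -> List[int]:
    """Indices of agents whose ideal point is strictly closer to the proposal."""
    return [
        i
        for i, ideal in enumerate(ideal_points)
        if math.dist(ideal, proposal) < math.dist(ideal, status_quo)
    ]


def has_majority(ideal_points: Sequence[Point], proposal: Point, status_quo: Point) -> bool:
    """True when more than half of the agents prefer the proposal."""
    return 2 * len(supporters(ideal_points, proposal, status_quo)) > len(ideal_points)
```

For ideal points (0, 0), (1, 0), and (4, 4) with the status quo at (5, 5), the proposal (1, 1) is preferred by the first two agents and therefore has majority support.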

2506.00241 2026-04-01 cs.HC cs.AI

Balancing Efficiency and Empathy: Healthcare Providers' Perspectives on AI-Supported Workflows for Serious Illness Conversations in the Emergency Department

Menglin Zhao, Zhuorui Yong, Ruijia Guan, Kai-Wei Chang, Adrian Haimovich, Kei Ouchi, Timothy Bickmore, Zhan Zhang, Bingsheng Yao, Dakuo Wang, Smit Desai

Comments To appear at ACM CHI'26

Abstract

Serious Illness Conversations (SICs), discussions about values and care preferences for patients with life-threatening illness, rarely occur in Emergency Departments (EDs), despite evidence that early conversations improve care alignment and reduce unnecessary interventions. We interviewed 11 ED providers to identify challenges in SICs and opportunities for technology support, with a focus on AI. Our analysis revealed a four-stage SIC workflow (identification, preparation, conduction, documentation) and barriers at each stage, including fragmented patient information, limited time and space, lack of conversational guidance, and burdensome documentation. Providers expressed interest in AI systems for synthesizing information, supporting real-time conversations, and automating documentation, but emphasized concerns about preserving human connection and clinical autonomy. This tension highlights the need for technologies that enhance efficiency without undermining the interpersonal nature of SICs. We propose design guidelines for ambient and peripheral AI systems to support providers while preserving the essential humanity of these conversations.

2505.18602 2026-04-01 cs.NE cs.AI cs.LG

LLM-Meta-SR: In-Context Learning for Evolving Selection Operators in Symbolic Regression

Hengzhe Zhang, Qi Chen, Bing Xue, Wolfgang Banzhaf, Mengjie Zhang

Abstract

Large language models (LLMs) have revolutionized algorithm development, yet their application in symbolic regression, where algorithms automatically discover symbolic expressions from data, remains limited. In this paper, we propose a meta-learning framework that enables LLMs to automatically design selection operators for evolutionary symbolic regression algorithms. We first identify two key limitations in existing LLM-based algorithm evolution techniques: lack of semantic guidance and code bloat. The absence of semantic awareness can lead to ineffective exchange of useful code components, while bloat results in unnecessarily complex components; both can hinder evolutionary learning progress or reduce the interpretability of the designed algorithm. To address these issues, we enhance the LLM-based evolution framework for meta-symbolic regression with two key innovations: a complementary, semantics-aware selection operator and bloat control. Additionally, we embed domain knowledge into the prompt, enabling the LLM to generate more effective and contextually relevant selection operators. Our experimental results on symbolic regression benchmarks show that LLMs can devise selection operators that outperform nine expert-designed baselines, achieving state-of-the-art performance. Moreover, the evolved operator can further improve a state-of-the-art symbolic regression algorithm, achieving the best performance among 28 symbolic regression and other machine learning algorithms across 116 regression datasets. This demonstrates that LLMs can exceed expert-level algorithm design for symbolic regression.

2503.21473 2026-04-01 stat.ML cs.LG

DeepRV: Accelerating Spatiotemporal Inference with Pre-trained Neural Priors

Jhonathan Navott, Daniel Jenson, Seth Flaxman, Elizaveta Semenova

Comments Code to reproduce all experiments is available in the dl4bi codebase: https://github.com/MLGlobalHealth/dl4bi

Abstract

Gaussian Processes (GPs) provide a flexible and statistically principled foundation for modelling spatiotemporal phenomena, but their $O(N^3)$ scaling makes them intractable for large datasets. Approximate methods such as variational inference (VI), inducing-point (sparse) GPs, low-rank kernel approximations (e.g., Nystrom methods and random Fourier features), and approximations such as INLA improve scalability but typically trade off accuracy, calibration, or modelling flexibility. We introduce DeepRV, a neural-network surrogate that replaces GP prior sampling, while closely matching full GP accuracy at inference including hyperparameter estimates, and reducing computational complexity to $O(N^2)$, increasing scalability and inference speed. DeepRV serves as a drop-in replacement for GP prior realisations in e.g. MCMC-based probabilistic programming pipelines, preserving full model flexibility. Across simulated benchmarks, non-separable spatiotemporal GPs, and a real-world application to education deprivation in London (n = 4,994 locations), DeepRV achieves the highest fidelity to exact GPs while substantially accelerating inference. Code is provided in the dl4bi Python package, with all experiments run on a single consumer-grade GPU to ensure accessibility for practitioners.
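
To make the $O(N^3)$ cost concrete, here is a minimal sketch of exact GP prior sampling, the step that the abstract says DeepRV replaces with a neural surrogate. The triple loop in the Cholesky factorization is the cubic bottleneck. The squared-exponential kernel, the jitter value, and all function names are assumptions for illustration; this is not the dl4bi API.

```python
import math
import random
from typing import List, Sequence


def rbf_kernel(xs: Sequence[float], lengthscale: float = 1.0,
               variance: float = 1.0, jitter: float = 1e-6) -> List[List[float]]:
    """Squared-exponential covariance matrix with a small diagonal jitter."""
    n = len(xs)
    return [
        [
            variance * math.exp(-0.5 * ((xs[i] - xs[j]) / lengthscale) ** 2)
            + (jitter if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a; three nested loops give O(N^3) cost."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp_prior(xs: Sequence[float], rng: random.Random) -> List[float]:
    """Draw f = L z with z ~ N(0, I), so that f ~ N(0, K)."""
    low = cholesky(rbf_kernel(xs))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]
```

A surrogate in the spirit of the abstract would replace `sample_gp_prior` with a network evaluation, so the factorization is no longer needed inside each MCMC step.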

2404.08829 2026-04-01 cs.IR cs.IT cs.LG math.IT

Measuring the Predictability of Recommender Systems using Structural Complexity Metrics

Andrés Abeliuk, Alfonso Valderrama, Simón Campos, Marcelo Mendoza

Comments Accepted at WWW-24 Workshop: DCAI Data-centric Artificial Intelligence

Abstract

Recommender Systems (RS) shape the filtering and curation of online content, yet we have limited understanding of how predictable their recommendation outputs are. We propose data-driven metrics that quantify the predictability of recommendation datasets by measuring the structural complexity of the user-item interaction matrix. High complexity indicates intricate interaction patterns that are harder to predict; low complexity indicates simpler, more predictable structures. We operationalize structural complexity via data perturbations, using singular value decomposition (SVD) to assess how stable the latent structure remains under perturbations. Our hypothesis is that random perturbations minimally affect highly organized data, but cause substantial structural disruption in intrinsically complex data. By analyzing prediction errors on perturbed interactions, we derive metrics that quantify this sensitivity at both the dataset and the interaction levels, yielding a principled measure of inherent predictability. Experiments on real-world datasets show that our structural complexity metrics correlate with the performance of state-of-the-art recommendation algorithms. We also demonstrate structure-aware data selection: in low-data settings, models trained on a carefully chosen subset of interactions with low structural perturbation error consistently outperform models trained on the full dataset. Thus, structural complexity serves both as a precise diagnostic of dataset complexity and as a principled foundation for efficient, data-centric training of RS.
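
A minimal sketch of the perturb-and-reconstruct idea: flip a few interactions, fit a low-rank approximation to the perturbed matrix, and measure how far the reconstruction lands from the original values at the flipped positions. The rank-1 truncation, the power-iteration routine used in place of a full SVD, the 0/1 flip perturbation, and all names are assumptions for illustration. The paper's actual metrics are not specified in the abstract and will differ.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rank1_approximation(matrix: Sequence[Sequence[float]], iterations: int = 200) -> Matrix:
    """Leading singular triplet by power iteration, returned as sigma * u v^T."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    u = [0.0] * rows
    sigma = 0.0
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / norm_u for x in u]
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in w))
        v = [x / (sigma or 1.0) for x in w]
    return [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]


def perturbation_error(matrix: Sequence[Sequence[float]],
                       flips: Sequence[Tuple[int, int]]) -> float:
    """Mean absolute error at flipped entries, measured against the original values."""
    perturbed = [list(row) for row in matrix]
    for i, j in flips:
        perturbed[i][j] = 1.0 - perturbed[i][j]  # flip a 0/1 interaction
    approx = rank1_approximation(perturbed)
    return sum(abs(approx[i][j] - matrix[i][j]) for i, j in flips) / len(flips)
```

On a 4x4 all-ones matrix, the low-rank fit largely restores a flipped entry, so the error stays small. On a matrix without rank-1 structure, the same flip is not recovered and the error is much larger, which matches the abstract's hypothesis that organized data is more robust to random perturbation.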

2603.29543 2026-04-01 quant-ph cs.AI

Reducing Complexity for Quantum Approaches in Train Load Optimization

Zhijie Tang, Albert Nieto-Morales, Arit Kumar Bishwas

Comments 8 pages, 3 figures, 4 tables

Abstract

Efficiently planning container loads onto trains is a computationally challenging combinatorial optimization problem, central to logistics and supply chain management. A primary source of this complexity is the need to model and reduce rehandle operations, i.e., unproductive crane moves required to access blocked containers. Conventional mathematical formulations address this by introducing explicit binary variables and a web of logical constraints for each potential rehandle, resulting in large-scale models that are difficult to solve. This paper presents a fundamental departure from this paradigm. We introduce an innovative and compact mathematical formulation for the Train Load Optimization (TLO) problem where the rehandle cost is calculated implicitly within the objective function. This novel approach avoids the need for dedicated rehandle variables and their associated constraints, leading to a dramatic reduction in model size. We provide a formal comparison against a conventional model to analytically demonstrate the significant reduction in the number of variables and constraints. The efficacy of our compact formulation is assessed through a simulated annealing metaheuristic, which finds high-quality loading plans for various problem instances. The results confirm that our model is not only more parsimonious but also practically effective, offering a scalable and powerful tool for modern rail logistics.

2603.29537 2026-04-01 cs.CR cs.AI cs.MM cs.NI

Mean Masked Autoencoder with Flow-Mixing for Encrypted Traffic Classification

Xiao Liu, Xiaowei Fu, Fuxiang Huang, Lei Zhang

Comments Project page: https://github.com/lx6c78/MMAE

Abstract

Network traffic classification using self-supervised pre-training models based on Masked Autoencoders (MAE) has demonstrated great potential. However, existing methods are confined to isolated byte-level reconstruction of individual flows and lack adequate perception of the multi-granularity contextual relationships in traffic. To address this limitation, we propose Mean MAE (MMAE), a teacher-student MAE paradigm with a flow-mixing strategy for building an encrypted-traffic pre-training model. MMAE employs a self-distillation mechanism for teacher-student interaction, where the teacher provides unmasked flow-level semantic supervision to advance the student from local byte reconstruction to multi-granularity comprehension. To break the information bottleneck in individual flows, we introduce a dynamic Flow Mixing (FlowMix) strategy to replace the traditional random masking mechanism. By constructing challenging cross-flow mixed samples with interference, it compels the model to learn discriminative representations from distorted tokens. Furthermore, we design a Packet-importance aware Mask Predictor (PMP) equipped with an attention bias mechanism that leverages packet-level side-channel statistics to dynamically mask tokens with high semantic density. Extensive experiments on multiple datasets covering encrypted applications, malware, and attack traffic demonstrate that MMAE achieves state-of-the-art performance. The code is available at https://github.com/lx6c78/MMAE

2603.29532 2026-04-01 eess.SY cs.LG cs.SY

Learning Surrogate LPV State-Space Models with Uncertainty Quantification

E. Javier Olucha, Valentin Preda, Amritam Das, Roland Tóth

Comments Preprint submitted to the 65th IEEE Conference on Decision and Control

Abstract

The Linear Parameter-Varying (LPV) framework enables the construction of surrogate models of complex nonlinear and high-dimensional systems, facilitating efficient stability and performance analysis together with controller design. Despite significant advances in data-driven LPV modelling, existing approaches do not quantify the uncertainty of the obtained LPV models. Consequently, assessing model reliability for analysis and control or detecting operation outside the training regime requires extensive validation and user expertise. This paper proposes a Bayesian approach for the joint estimation of LPV state-space models together with their scheduling, providing a characterization of model uncertainty and confidence bounds on the predicted model response directly from input-output data. Both aleatoric uncertainty due to measurement noise and epistemic uncertainty arising from limited training data and structural bias are considered. The resulting model preserves the LPV structure required for controller synthesis while enabling computationally efficient simulation and uncertainty propagation. The approach is demonstrated on the surrogate modelling of a two-dimensional nonlinear interconnection of mass-spring-damper systems.

2603.29529 2026-04-01 cond-mat.dis-nn cs.LG q-bio.BM

Sampling at intermediate temperatures is optimal for training large language models in protein structure prediction

L. Ghiringhelli, A. Zambon, G. Tiana

Abstract

We investigate the parameter space of transformer models trained on protein sequence data using a statistical mechanics framework, sampling the loss landscape at varying temperatures by Langevin dynamics to characterize the low-loss manifold and understand the mechanisms underlying the superior performance of transformers in protein structure prediction. We find that, at variance with feedforward networks, the lack of a first-order-like transition in the loss of the transformer produces a range of intermediate temperatures with good learning properties. We show that the parameters of most layers are highly conserved at these temperatures if the dimension of the embedding is optimal, and we provide an operative way to find this dimension. Finally, we show that the attention matrix is more predictive of the contact maps of the protein at higher temperatures and for higher dimensions of the embedding than those optimal for learning.
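
A minimal sketch of sampling a loss landscape at temperature T with overdamped Langevin dynamics, using the Euler-Maruyama update theta <- theta - step * grad(theta) + sqrt(2 * step * T) * noise. The one-dimensional quadratic loss, the step size, and the function names are assumptions for illustration. The paper applies the idea to transformer parameters, and its exact discretization is not given in the abstract.

```python
import math
import random
from typing import Callable, List


def langevin_samples(grad: Callable[[float], float], theta0: float, temperature: float,
                     step: float, n_steps: int, rng: random.Random) -> List[float]:
    """Overdamped Langevin chain; its stationary density is roughly exp(-loss / T)."""
    noise_scale = math.sqrt(2.0 * step * temperature)
    theta = theta0
    samples = []
    for _ in range(n_steps):
        theta = theta - step * grad(theta) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


def sample_variance(values: List[float]) -> float:
    """Population variance of a list of samples."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def quadratic_grad(theta: float) -> float:
    """Gradient of the toy loss theta^2 / 2 (assumption)."""
    return theta
```

For the toy loss, the spread of the samples grows in proportion to the temperature, and T = 0 reduces to plain gradient descent. Low temperatures therefore pin the chain near a minimum, while high temperatures explore a wider region of the landscape.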

2603.29520 2026-04-01 cs.CR cs.AI cs.MM cs.NI

TrafficMoE: Heterogeneity-aware Mixture of Experts for Encrypted Traffic Classification

Qing He, Xiaowei Fu, Lei Zhang

Comments Project page: https://github.com/Posuly/TrafficMoE_main

Abstract

Encrypted traffic classification is a critical task for network security. While deep learning has advanced this field, the occlusion of payload semantics by encryption severely challenges standard modeling approaches. Most existing frameworks rely on static and homogeneous pipelines that apply uniform parameter sharing and static fusion strategies across all inputs. This one-size-fits-all static design is inherently flawed: by forcing structured headers and randomized payloads into a unified processing pipeline, it inevitably entangles the raw protocol signals with stochastic encryption noise, thereby degrading the fine-grained discriminative features. In this paper, we propose TrafficMoE, a framework that breaks through the bottleneck of static modeling by establishing a Disentangle-Filter-Aggregate (DFA) paradigm. Specifically, to resolve the structural conflict between components, the architecture disentangles headers and payloads using a dual-branch sparse Mixture-of-Experts (MoE), enabling modality-specific modeling. To mitigate the impact of stochastic noise, an uncertainty-aware filtering mechanism is introduced to quantify reliability and selectively suppress high-variance representations. Finally, to overcome the limitations of static fusion, a routing-guided strategy dynamically aggregates cross-modality features, adaptively weighing contributions based on traffic context. With this DFA paradigm, TrafficMoE maximizes representational efficiency by focusing solely on the most discriminative traffic features. Extensive experiments on six datasets demonstrate that TrafficMoE consistently outperforms state-of-the-art methods, validating the necessity of heterogeneity-aware modeling in encrypted traffic analysis. The source code is publicly available at https://github.com/Posuly/TrafficMoE_main.

2603.29499 2026-04-01 eess.SY cs.LG cs.RO cs.SY math.OC

Model Predictive Path Integral PID Control for Learning-Based Path Following

Teruki Kato, Koshi Oishi, Seigo Ito

Comments Submitted to IFAC Journal of Systems and Control

Abstract

Classical proportional-integral-derivative (PID) control is widely employed in industrial applications; however, achieving higher performance often motivates the adoption of model predictive control (MPC). Although gradient-based methods are the standard for real-time optimization, sampling-based approaches have recently gained attention. In particular, model predictive path integral (MPPI) control enables gradient-free optimization and accommodates non-differentiable models and objective functions. However, directly sampling control input sequences may yield discontinuous inputs and increase the optimization dimensionality in proportion to the prediction horizon. This study proposes MPPI-PID control, which applies MPPI to optimize PID gains at each control step, thereby replacing direct high-dimensional input-sequence optimization with low-dimensional gain-space optimization. This formulation enhances sample efficiency and yields smoother inputs via the PID structure. We also provide theoretical insights, including an information-theoretic interpretation that unifies MPPI and MPPI-PID, an analysis of the effect of optimization dimensionality on sample efficiency, and a characterization of input continuity induced by the PID structure. The proposed method is evaluated on the learning-based path following of a mini forklift using a residual-learning dynamics model that integrates a physical model with a neural network. System identification is performed with real driving data. Numerical path-following experiments demonstrate that MPPI-PID improves tracking performance compared with fixed-gain PID and achieves performance comparable to conventional MPPI while significantly reducing input increments. Furthermore, the proposed method maintains favorable performance even with substantially fewer samples, demonstrating its improved sample efficiency.
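
A minimal sketch of the gain-space idea: instead of sampling a control sequence per horizon step, sample candidate (Kp, Ki, Kd) triples, roll each out on a model, and average the candidates with the usual MPPI weights exp(-(S - S_min) / lambda). The first-order toy plant, the squared-error cost, the Gaussian gain noise, the clipping to non-negative gains, and every name and default value are assumptions for illustration. The paper uses a learned residual dynamics model of a mini forklift, and its exact update rule is not given in the abstract.

```python
import math
import random
from typing import Sequence, Tuple

Gains = Tuple[float, float, float]


def rollout_cost(gains: Sequence[float], x0: float, integral0: float, prev_error0: float,
                 reference: float, horizon: int, dt: float) -> float:
    """Accumulated squared tracking error of a PID loop on a toy plant x' = -x + u."""
    kp, ki, kd = gains
    x, integral, prev_error = x0, integral0, prev_error0
    cost = 0.0
    for _ in range(horizon):
        error = reference - x
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        x += dt * (-x + u)  # toy first-order plant (assumption)
        prev_error = error
        cost += error * error * dt
    return cost


def mppi_pid_step(nominal: Sequence[float], state: Tuple[float, float, float],
                  reference: float, rng: random.Random, n_samples: int = 64,
                  sigma: float = 0.3, lam: float = 0.05,
                  horizon: int = 30, dt: float = 0.05) -> Gains:
    """One MPPI update over the three PID gains instead of the input sequence."""
    x, integral, prev_error = state
    candidates = [
        tuple(max(0.0, g + sigma * rng.gauss(0.0, 1.0)) for g in nominal)
        for _ in range(n_samples)
    ]
    candidates.append(tuple(nominal))  # keep the unperturbed gains in the pool
    costs = [rollout_cost(c, x, integral, prev_error, reference, horizon, dt)
             for c in candidates]
    best = min(costs)
    weights = [math.exp(-(s - best) / lam) for s in costs]
    total = sum(weights)
    return tuple(sum(w * c[i] for w, c in zip(weights, candidates)) / total
                 for i in range(3))
```

The search space here has three dimensions regardless of the horizon length, which is the dimensionality reduction the abstract credits for the improved sample efficiency. The PID structure then produces the control input from the chosen gains at each step.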

2603.29474 2026-04-01 eess.SY cs.LG cs.SY

From Big Data to Fast Data: Towards High-Quality Datasets for Machine Learning Applications from Closed-Loop Data Collection

Philipp Reis, Jacqueline Henle, Stefan Otten, Eric Sax

Comments Submitted to IEEE ISSE 2026

Abstract

The increasing capabilities of machine learning models, such as vision-language and multimodal language models, are placing growing demands on data in automotive systems engineering, making the quality and relevance of collected data enablers for the development and validation of such systems. Traditional Big Data approaches focus on large-scale data collection and offline processing, while Smart Data approaches improve data selection strategies but still rely on centralized and offline post-processing. This paper introduces the concept of Fast Data for automotive systems engineering. The approach shifts data selection and recording onto the vehicle as the data source. By enabling real-time, context-aware decisions on whether and which data should be recorded, data collection can be directly aligned with data quality objectives and collection strategies within a closed loop. This results in datasets with higher relevance, improved coverage of critical scenarios, and increased information density, while at the same time reducing irrelevant data and associated costs. The proposed approach provides a structured foundation for designing data collection strategies that are aligned with the needs of modern machine learning algorithms. It supports efficient data acquisition and contributes to scalable and cost-effective ML development processes in automotive systems engineering.

2603.29469 2026-04-01 cs.HC cs.AI

iPoster: Content-Aware Layout Generation for Interactive Poster Design via Graph-Enhanced Diffusion Models

Xudong Zhou, Jinyuan Liang, Qiuyi Guo, Guozheng Li

Journal ref Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26), April 13-17, 2026, Barcelona, Spain

Abstract

We present iPoster, an interactive layout generation framework that empowers users to guide content-aware poster layout design by specifying flexible constraints. iPoster enables users to specify partial intentions within the intention module, such as element categories, sizes, positions, or coarse initial drafts. Then, the generation module instantly generates refined, context-sensitive layouts that faithfully respect these constraints. iPoster employs a unified graph-enhanced diffusion architecture that supports various design tasks under user-specified constraints. These constraints are enforced through masking strategies that precisely preserve user input at every denoising step. A cross content-aware attention module aligns generated elements with salient regions of the canvas, ensuring visual coherence. Extensive experiments show that iPoster not only achieves state-of-the-art layout quality, but also offers a responsive and controllable framework for poster layout design with constraints.

2603.29438 2026-04-01 eess.IV cs.CV

Polyhedral Unmixing: Bridging Semantic Segmentation with Hyperspectral Unmixing via Polyhedral-Cone Partitioning

Antoine Bottenmuller, Etienne Decencière, Petr Dokládal

Abstract

Semantic segmentation and hyperspectral unmixing are two central problems in spectral image analysis. The former assigns each pixel a discrete label corresponding to its material class, whereas the latter estimates pure material spectra, called endmembers, and, for each pixel, a vector representing material abundances in the observed scene. Despite their complementarity, these two problems are usually addressed independently. This paper aims to bridge these two lines of work by formally showing that, under the linear mixing model, pixel classification by dominant materials induces polyhedral-cone regions in the spectral space. We leverage this fundamental property to propose a direct segmentation-to-unmixing pipeline that performs blind hyperspectral unmixing from any semantic segmentation by constructing a polyhedral-cone partition of the space that best fits the labeled pixels. Signed distances from pixels to the estimated regions are then computed, linearly transformed via a change of basis in the distance space, and projected onto the probability simplex, yielding an initial abundance estimate. This estimate is used to extract endmembers and recover final abundances via matrix pseudo-inversion. Because the segmentation method can be freely chosen, the user gains explicit control over the unmixing process, while the rest of the pipeline remains essentially deterministic and lightweight. Beyond improving interpretability, experiments on three real datasets demonstrate the effectiveness of the proposed approach when associated with appropriate clustering algorithms, and show consistent improvements over recent deep and non-deep state-of-the-art methods. The code is available at: https://github.com/antoine-bottenmuller/polyhedral-unmixing
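
The simplex-projection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard sort-based Euclidean projection onto the probability simplex; the abstract does not say which projection the authors use, and the function name `project_to_simplex` is hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}.

    Assumption: the standard sort-based projection; the paper's own
    projection step may differ.
    """
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        # Keep the threshold from the largest k whose entry stays positive.
        if u_k - t > 0:
            theta = t
    # Shift every entry by the threshold and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]


# Hypothetical transformed signed distances for one pixel.
abundances = project_to_simplex([0.2, -0.1, 1.4])
print(abundances, sum(abundances))
```

The result is non-negative and sums to one, which is what lets it be read as an initial abundance estimate for a pixel.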

2603.29426 2026-04-01 cs.NI cs.LG

Multi-AUV Cooperative Target Tracking Based on Supervised Diffusion-Aided Multi-Agent Reinforcement Learning

Jiaao Ma, Chuan Lin, Guangjie Han, Shengchao Zhu, Zhenyu Wang, Chen An

Abstract

In recent years, advances in underwater networking and multi-agent reinforcement learning (MARL) have significantly expanded multi-autonomous underwater vehicle (AUV) applications in marine exploration and target tracking. However, current MARL-driven cooperative tracking faces three critical challenges: 1) non-stationarity in decentralized coordination, where local policy updates destabilize teammates' observation spaces, preventing convergence; 2) sparse-reward exploration inefficiency from limited underwater visibility and constrained sensor ranges, causing high-variance learning; and 3) water disturbance fragility combined with handcrafted reward dependency that degrades real-world robustness under unmodeled hydrodynamic conditions. To address these challenges, this paper proposes a hierarchical MARL architecture comprising four layers: global training scheduling, multi-agent coordination, local decision-making, and real-time execution. This architecture optimizes task allocation and inter-AUV coordination through hierarchical decomposition. Building on this foundation, we propose the Supervised Diffusion-Aided MARL (SDA-MARL) algorithm featuring three innovations: 1) a dual-decision architecture with segregated experience pools mitigating non-stationarity through structured experience replay; 2) a supervised learning mechanism guiding the diffusion model's reverse denoising process to generate high-fidelity training samples that accelerate convergence; and 3) disturbance-robust policy learning incorporating behavioral cloning loss to guide the Deep Deterministic Policy Gradient network update using high-quality replay actions, eliminating handcrafted reward dependency. The tracking algorithm based on SDA-MARL proposed in this paper achieves superior precision compared to state-of-the-art methods in comprehensive underwater simulations.

2603.29369 2026-04-01 cs.AR cs.LG

AP-DRL: A Synergistic Algorithm-Hardware Framework for Automatic Task Partitioning of Deep Reinforcement Learning on Versal ACAP

Enlai Li, Zhe Lin, Sharad Sinha, Wei Zhang

Abstract

Deep reinforcement learning has demonstrated remarkable success across various domains. However, the tight coupling between training and inference processes makes accelerating DRL training an essential challenge for DRL optimization. Two key issues hinder efficient DRL training: (1) the significant variation in computational intensity across different DRL algorithms and even among operations within the same algorithm complicates hardware platform selection, while (2) DRL's wide dynamic range could lead to substantial reward errors with conventional FP16+FP32 mixed-precision quantization. While existing work has primarily focused on accelerating DRL for specific computing units or optimizing inference-stage quantization, we propose AP-DRL to address the above challenges. AP-DRL is an automatic task partitioning framework that harnesses the heterogeneous architecture of AMD Versal ACAP (integrating CPUs, FPGAs, and AI Engines) to accelerate DRL training through intelligent hardware-aware optimization. Our approach begins with bottleneck analysis of CPU, FPGA, and AIE performance across diverse DRL workloads, informing the design principles for AP-DRL's inter-component task partitioning and quantization optimization. The framework then addresses the challenge of platform selection through design space exploration-based profiling and ILP-based partitioning models that match operations to optimal computing units based on their computational characteristics. For the quantization challenge, AP-DRL employs a hardware-aware algorithm coordinating FP32 (CPU), FP16 (FPGA/DSP), and BF16 (AI Engine) operations by leveraging Versal ACAP's native support for these precision formats. Comprehensive experiments indicate that AP-DRL can achieve speedup of up to 4.17$\times$ over programmable logic and up to 3.82$\times$ over AI Engine baselines while maintaining training convergence.

2603.29292 2026-04-01 cs.SE cs.AI cs.PL

Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus

Huan Zhang, Wei Cheng, Wei Hu

Comments Accepted in the 34th IEEE/ACM International Conference on Program Comprehension (ICPC 2026)

Abstract

Improving the code generation capabilities of large language models (LLMs) typically relies on supervised fine-tuning or preference optimization, both of which require costly external resources such as powerful teacher models or reliable test units. However, in real-world scenarios, it is much harder to obtain reference solutions and test oracles than problem descriptions and test inputs. In this paper, we tackle a challenging yet realistic question: Can a code language model improve itself without access to a superior teacher and a test oracle? To answer this, we propose ConSelf, a self-improving approach built upon two key ideas. First, we introduce code semantic entropy, a novel metric that measures problem-level uncertainty by assessing the functional diversity of program behaviors, enabling a curriculum construction with the most learnable problems. Second, we present consensus-driven direct preference optimization (Con-DPO), a preference-based fine-tuning method that weights each preference pair by its behavioral consensus, thereby mitigating the impact of noisy self-generated supervision. Experiments on various benchmarks and backbone LLMs demonstrate that ConSelf significantly outperforms baselines, validating the effectiveness of semantic entropy-based curriculum construction and consensus-driven optimization in improving code generation without external supervision.
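
The idea of measuring uncertainty through the functional diversity of program behaviors can be sketched as follows. This assumes Shannon entropy over clusters of sampled programs that produce identical outputs on shared test inputs. The abstract does not give ConSelf's exact definition, so `behavioral_entropy` and its clustering rule are illustrative only.

```python
import math
from collections import Counter


def behavioral_entropy(programs, test_inputs):
    """Shannon entropy (in nats) over behavior clusters of candidate programs.

    Assumption: two programs share a cluster when they produce identical
    outputs (or the same error type) on every test input.
    """
    signatures = []
    for program in programs:
        outputs = []
        for x in test_inputs:
            try:
                outputs.append(repr(program(x)))
            except Exception as exc:
                outputs.append("error:" + type(exc).__name__)
        signatures.append(tuple(outputs))
    counts = Counter(signatures)
    total = len(signatures)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Three hypothetical samples for one problem: two behave identically.
samples = [lambda x: x * 2, lambda x: x + x, lambda x: x ** 2]
print(behavioral_entropy(samples, [1, 2, 3]))
```

Under this reading, an entropy of zero means every sample behaves the same way, while higher values mean the model's samples disagree on the problem.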

2603.29289 2026-04-01 cs.CR cs.AI cs.DC

Downsides of Smartness Across Edge-Cloud Continuum in Modern Industry

Akhil Gupta Chigullapally, Sharvan Vittala, Razin Farhan Hussian, Mohsen Amini Salehi

Abstract

The fast pace of modern AI is rapidly transforming traditional industrial systems into vast, intelligent and potentially unmanned autonomous operational environments driven by AI-based solutions. These solutions leverage various forms of machine learning, reinforcement learning, and generative AI. The introduction of such smart capabilities has pushed the envelope in multiple industrial domains, enabling predictive maintenance, optimized performance, and streamlined workflows. These solutions are often deployed across the Industrial Internet of Things (IIoT) and supported by the Edge-Fog-Cloud computing continuum to enable urgent (i.e., real-time or near real-time) decision-making. Despite the current trend of aggressively adopting these smart industrial solutions to increase profit, quality, and efficiency, large-scale integration and deployment also bring serious hazards that, if ignored, can undermine the benefits of smart industries. These hazards include unforeseen interoperability side-effects and heightened vulnerability to cyber threats, particularly in environments operating with a plethora of heterogeneous IIoT systems. The goal of this study is to shed light on the potential consequences of industrial smartness, with a particular focus on security implications, including vulnerabilities, side effects, and cyber threats. We distinguish software-level downsides stemming from both traditional AI solutions and generative AI from those originating in the infrastructure layer, namely IIoT and the Edge-Cloud continuum. At each level, we investigate potential vulnerabilities, cyber threats, and unintended side effects. As industries continue to become smarter, understanding and addressing these downsides will be crucial to ensure secure and sustainable development of smart industrial systems.

2603.29288 2026-04-01 cs.CY cs.AI cs.CL cs.HC cs.SI

Sima AIunty: Caste Audit in LLM-Driven Matchmaking

Atharva Naik, Shounok Kar, Varnika Sharma, Ashwin Rajadesingan, Koustuv Saha

Abstract

Social and personal decisions in relational domains such as matchmaking are deeply entwined with cultural norms and historical hierarchies, and can potentially be shaped by algorithmic and AI-mediated assessments of compatibility, acceptance, and stability. In South Asian contexts, caste remains a central aspect of marital decision-making, yet little is known about how contemporary large language models (LLMs) reproduce or disrupt caste-based stratification in such settings. In this work, we conduct a controlled audit of caste bias in LLM-mediated matchmaking evaluations using real-world matrimonial profiles. We vary caste identity across Brahmin, Kshatriya, Vaishya, Shudra, and Dalit, and income across five buckets, and evaluate five LLM families (GPT, Gemini, Llama, Qwen, and BharatGPT). Models are prompted to assess profiles along dimensions of social acceptance, marital stability, and cultural compatibility. Our analysis reveals consistent hierarchical patterns across models: same-caste matches are rated most favorably, with average ratings up to 25% higher (on a 10-point scale) than inter-caste matches, which are further ordered according to traditional caste hierarchy. These findings highlight how existing caste hierarchies are reproduced in LLM decision-making and underscore the need for culturally grounded evaluation and intervention strategies in AI systems deployed in socially sensitive domains, where such systems risk reinforcing historical forms of exclusion.

2603.29259 2026-04-01 cs.IR cs.CL

Aligning Multimodal Sequential Recommendations via Robust Direct Preference Optimization with Sparse MoE

Hejin Huang, Jusheng Zhang, Kaitong Cai, Jian Wang, Rong Pan

Abstract

Preference-based alignment objectives have been widely adopted, from RLHF-style pairwise learning in large language models to emerging applications in recommender systems. Yet, existing work rarely examines how Direct Preference Optimization (DPO) behaves under implicit feedback, where unobserved items are not reliable negatives. We conduct systematic experiments on multimodal sequential recommendation to compare common negative-selection strategies and their interaction with DPO training. Our central finding is that a simple modification, replacing deterministic hard negatives with stochastic sampling from a dynamic top-K candidate pool, consistently improves ranking performance. We attribute its effectiveness to two factors: (1) reducing erroneous suppressive gradients caused by false negatives, and (2) retaining informative hard signals while smoothing optimization via controlled stochasticity. With an optional sparse Mixture-of-Experts encoder for efficient capacity scaling, RoDPO achieves up to 5.25% NDCG@5 on three Amazon benchmarks, with nearly unchanged inference cost.
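
The negative-selection change described in the abstract, replacing a deterministic hard negative with a random draw from a top-K candidate pool, can be sketched as below. The scoring dictionary, the uniform draw, and the function name `sample_negative` are assumptions for illustration, not the authors' implementation.

```python
import random


def sample_negative(scores, positive_item, k, rng):
    """Draw one negative uniformly from the k highest-scoring non-positive items.

    Assumption: `scores` holds the current model's score per item, so the
    pool changes as the model trains. With k=1 this reduces to the
    deterministic hardest negative.
    """
    ranked = sorted(
        (item for item in scores if item != positive_item),
        key=lambda item: scores[item],
        reverse=True,
    )
    pool = ranked[:k]
    if not pool:
        raise ValueError("no negative candidates available")
    return rng.choice(pool)


# Hypothetical model scores for four items; "a" is the observed positive.
scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
print(sample_negative(scores, "a", k=2, rng=random.Random(0)))
```

Spreading the draw over several high-scoring candidates lowers the chance of repeatedly penalizing one unobserved item that the user might in fact like, which is the false-negative concern the abstract raises.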

2603.29255 2026-04-01 eess.SY cs.LG cs.SY

Real-Time Surrogate Modeling for Fast Transient Prediction in Inverter-Based Microgrids Using CNN and LightGBM

Osasumwen Cedric Ogiesoba-Eguakun, Kaveh Ashenayi, Suman Rath

Comments 10 pages

Abstract

Real-time monitoring of inverter-based microgrids is essential for stability, fault response, and operational decision-making. However, electromagnetic transient (EMT) simulations, required to capture fast inverter dynamics, are computationally intensive and unsuitable for real-time applications. This paper presents a data-driven surrogate modeling framework for fast prediction of microgrid behavior using convolutional neural networks (CNN) and Light Gradient Boosting Machine (LightGBM). The models are trained on a high-fidelity EMT digital twin dataset of a microgrid with ten distributed generators under eleven operating and disturbance scenarios, including faults, noise, and communication delays. A sliding-window method is applied to predict important system variables, including voltage magnitude, frequency, total active power, and voltage dip. The results show that model performance changes depending on the type of variable being predicted. The CNN demonstrates high accuracy for time-dependent signals such as voltage, with an $R^2$ value of 0.84, whereas LightGBM shows better performance for structured and disturbance-related variables, achieving an $R^2$ of 0.999 for frequency and 0.75 for voltage dip. A combined CNN+LightGBM model delivers stable performance across all variables. Beyond accuracy, the surrogate models also provide major improvements in computational efficiency. LightGBM achieves more than $1000\times$ speedup and runs faster than real time, while the hybrid model achieves over $500\times$ speedup with near real-time performance. These findings show that data-driven surrogate models can effectively represent microgrid dynamics. They also support real-time and faster-than-real-time predictions. As a result, they are well-suited for applications such as monitoring, fault analysis, and control in inverter-based power systems.
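
The sliding-window method mentioned in the abstract can be sketched as follows. This assumes the common formulation in which a fixed-length window of past samples predicts a value a fixed number of steps ahead. The window length, the prediction horizon, and the function name `sliding_windows` are not specified in the abstract and are hypothetical here.

```python
def sliding_windows(series, window, horizon=1):
    """Split a time series into (past window, future target) training pairs.

    Assumption: each target is the sample `horizon` steps after the end
    of its window; the paper's actual window settings are not stated.
    """
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        features = series[start:start + window]
        target = series[start + window + horizon - 1]
        pairs.append((features, target))
    return pairs


# Hypothetical per-unit voltage samples.
voltage = [1.00, 0.99, 0.97, 0.92, 0.95, 0.98]
for features, target in sliding_windows(voltage, window=3):
    print(features, "->", target)
```

Pairs built this way can be fed to either kind of model: a CNN can consume the window as a sequence, while a tree-based learner such as LightGBM can treat each window position as a tabular feature.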

2603.29217 2026-04-01 eess.AS cs.CL cs.SD

Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition

Lukuang Dong, Ziwei Li, Saierdaer Yusuyin, Xianyu Zhao, Zhijian Ou

Comments Update after INTERSPEECH2026 submission

Abstract

Phoneme-based ASR factorizes recognition into speech-to-phoneme (S2P) and phoneme-to-grapheme (P2G), enabling cross-lingual acoustic sharing while keeping language-specific orthography in a separate module. While large language models (LLMs) are promising for P2G, multilingual P2G remains challenging due to language-aware generation and severe cross-language data imbalance. We study multilingual LLM-based P2G on the ten-language CV-Lang10 benchmark. We examine robustness strategies that account for S2P uncertainty, including DANP and Simplified SKM (S-SKM). S-SKM is a Monte Carlo approximation that avoids CTC-based S2P probability weighting in P2G training. Robust training and low-resource oversampling reduce the average WER from 10.56% to 7.66%.

2603.29216 2026-04-01 cs.SE cs.AI cs.CR cs.LG

Software Vulnerability Detection Using a Lightweight Graph Neural Network

Miles Farmer, Ekincan Ufuktepe, Anne Watson, Hialo Muniz Carvalho, Vadim Okun, Zineb Maasaoui, Kannappan Palaniappan

Comments 12 pages, 3 figures, preprint of journal submission

Abstract

Large Language Models (LLMs) have emerged as a popular choice in vulnerability detection studies given their foundational capabilities, open source availability, and variety of models, but have limited scalability due to extensive compute requirements. Using the natural graph relational structure of code, we show that our proposed graph neural network (GNN) based deep learning model VulGNN for vulnerability detection can achieve performance almost on par with LLMs, but is 100 times smaller in size and fast to retrain and customize. We describe the VulGNN architecture, ablation studies on components, learning rates, and generalizability to different code datasets. As a lightweight model for vulnerability analysis, VulGNN is efficient and deployable at the edge as part of real-world software development pipelines.