arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1296
2604.22203 2026-04-27 eess.AS cs.SD

Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus

Szu-Jui Chen, John H. L. Hansen

Comments Accepted to Speech Communication 2026

详情
Journal ref
Speech Communication 180 (2026) 103380
英文摘要

Using self-supervised learning (SSL) models has significantly improved performance for downstream speech tasks, surpassing the capabilities of traditional hand-crafted features. This study investigates the amalgamation of SSL models, with the aim to leverage both their individual strengths and refine extracted features to achieve improved speech recognition models for naturalistic scenarios. Our research investigates the massive naturalistic Fearless Steps (FS) APOLLO resource, with particular focus on the FS Challenge (FSC) Phase-4 corpus, providing the inaugural analysis of this dataset. Additionally, we incorporate the CHiME-6 dataset to evaluate performance across diverse naturalistic speech scenarios. While exploring previously proposed Feature Refinement Loss and fusion methods, we found these methods to be less effective on the FSC Phase-4 corpus. To address this, we introduce a novel deep cross-attention (DCA) fusion method, designed to elevate performance, especially for the FSC Phase-4 corpus. Our objective is to foster creation of superior FS APOLLO community resources, catering to the diverse needs of researchers across various disciplines. The proposed solution achieves an absolute +1.1% improvement in WER, providing effective meta-data creation for the massive FS APOLLO community resource.

2604.22191 2026-04-27 cs.CR cs.CL

Behavioral Canaries: Auditing Private Retrieved Context Usage in RL Fine-Tuning

Chaoran Chen, Dayu Yuan, Peter Kairouz

详情
英文摘要

In agentic workflows, LLMs frequently process retrieved contexts that are legally protected from further training. However, auditors currently lack a reliable way to verify if a provider has violated the terms of service by incorporating these data into post-training, especially through Reinforcement Learning (RL). While standard auditing relies on verbatim memorization and membership inference, these methods are ineffective for RL-trained models, as RL primarily influences a model's behavioral style rather than the retention of specific facts. To bridge this gap, we introduce Behavioral Canaries, a new auditing mechanism for RLFT pipelines. The framework instruments preference data by pairing document triggers with feedback that rewards a distinctive stylistic response, inducing a latent trigger-conditioned preference if such data are used in training. Empirical results show that these behavioral signals enable detection of unauthorized document-conditioned training, achieving a 67% detection rate at a 10% false-positive rate (AUROC = 0.756) at a 1% canary injection rate. More broadly, our results establish behavioral canaries as a new auditing mechanism for RLFT pipelines, enabling auditors to test for training-time influence even when such influence manifests as distributional behavioral change rather than memorization.

2604.22180 2026-04-27 cs.IR cs.AI

ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression

Xiaojie Ke, Shuai Zhang, Liansheng Sun, Yongjin Wang, Hengjun Jiang, Xiangkun Liu, Cunxin Gu, Jian Xu, Guanjun Jiang

详情
英文摘要

Large language model (LLM) based listwise reranking has emerged as the dominant paradigm for achieving state-of-the-art ranking effectiveness in information retrieval. However, its reliance on feeding full passage texts into the LLM introduces two critical bottlenecks: the "lost in the middle" phenomenon degrades ranking quality as input length grows, and the inference latency scales super-linearly with sequence length, rendering it impractical for industrial deployment. In this paper, we present ResRank, a unified retrieval-reranking framework that fundamentally addresses both challenges. Inspired by multimodal LLMs that project visual inputs into compact token representations, ResRank employs an Encoder-LLM to compress each candidate passage into a single embedding, which is then fed alongside the query text into a Reranker-LLM for listwise ranking. To alleviate the misalignment between the compressed representation space and the ranking space, we introduce a residual connection structure that combines encoder embeddings with contextualized hidden states from the reranker. Furthermore, we replace the conventional autoregressive decoding with a one-step cosine-similarity-based scoring mechanism, eliminating the generation bottleneck entirely. ResRank is trained through a carefully designed dual-stage, multi-task, end-to-end joint optimization strategy that simultaneously trains the encoder and reranker, achieving learning objective alignment between retrieval and reranking while substantially reducing training complexity. Extensive experiments on TREC Deep Learning and eight BEIR benchmark datasets demonstrate that ResRank achieves competitive or superior ranking effectiveness compared to existing approaches while requiring zero generated tokens and processing only one token per passage, yielding a fundamentally better balance between effectiveness and efficiency.

2604.22176 2026-04-27 cs.CR cs.LG

FixV2W: Correcting Invalid CVE-CWE Mappings with Knowledge Graph Embeddings

Sevval Simsek, Varsha Athreya, David Starobinski

详情
英文摘要

Accurate mapping between Common Vulnerabilities and Exposures (CVE) and Common Weakness Enumeration (CWE) entries is critical for effective vulnerability management and risk assessment. However, public databases, such as the National Vulnerability Database (NVD), suffer from inconsistent and incomplete CVE to CWE mappings, complicating automated analysis and remediation. We introduce FixV2W, a lightweight approach that leverages knowledge graph embeddings and longitudinal trends to improve mapping accuracy of the NVD. FixV2W systematically analyzes historical remapping patterns and leverages hierarchical relationships within NVD and CWE data to predict more precise CWE mappings for vulnerabilities linked to Prohibited or Discouraged categories. We run extensive experimental evaluation of FixV2W, based on test data set collected between August 2021 and December 2024. Considering the Top 10 ranked predictions, the results show that FixV2W predicts the correct CWE mappings for 69% of exploited vulnerabilities that had invalid CWEs before they were exploited. We also show that FixV2W significantly improves the performance of ML models relying on NVD data. For instance, for a model geared at uncovering unknown CVE-CWE mappings, FixV2W improves the Mean Reciprocal Rank (MRR) from 0.174 to 0.608. These results show that FixV2W is a promising approach to identify and thwart emerging threats.

2604.22157 2026-04-27 cs.CR cs.AI

PrivSTRUCT: Untangling Data Purpose Compliance of Privacy Policies in Google Play Store

Bhanuka Silva, Anirban Mahanti, Aruna Seneviratne, Suranga Senevirante

Comments 20 pages, 9 figures, 2 tables

详情
英文摘要

Existing research typically treats privacy policies as flat, uniform text, extracting information without regard for the document's logical hierarchy. Disregard for structural cues of section headings designed to guide the reader, often leads automated methods to entangle distinct data practices, particularly when linking sensitive data items to their specific purposes. To address this, we introduce PrivSTRUCT, a novel and systematic encoder and decoder combined framework that to untangle complex privacy disclosures. Benchmarking against the state-of-the-art tool PoliGrapher reveals that PrivSTRUCT robustly extracts more than x2 the number of data item and purpose excerpts while retaining developer-defined structural cues. By applying PrivSTRUCT to a large-scale dataset of 3,756 Android apps, we uncover a critical transparency gap: the probability of developers overstating a data purpose is 20.4% higher for first-party collection and 9.7% higher for third-party sharing when they rely on globally defined purposes rather than specific, locally scoped disclosures. Alarmingly, we find that sensitive third-party data flows such as sharing financial data for analytics are frequently diluted and entangled into generic or unrelated categories, highlighting a persistent failure in the current purpose disclosure landscape.

2604.22143 2026-04-27 cs.CY cs.CL

Recognition Without Authorization: LLMs and the Moral Order of Online Advice

Tom van Nuenen

详情
英文摘要

Large language models are increasingly used to mediate everyday interpersonal dilemmas, yet how their advisory defaults interact with the concentrated moral orders of specific communities remains poorly understood. This article compares four assistant-style LLMs with community-endorsed advice on 11,565 posts from r/relationship_advice, using the subreddit as a concentrated, vote-ratified moral formation whose prescriptive clarity makes divergence measurable. Across models, LLMs identify many of the same dynamics as human commenters, but are markedly less likely to convert that recognition into directive authorization for action. The gap is sharpest where community consensus is strongest: on high-consensus posts involving abuse or safety threats, models recommend exit at roughly half the human rate while maintaining elevated levels of hedging, validation, and therapeutic framing. The article describes this pattern as recognition without authorization: the capacity to register harm while withholding socially ratified permission for consequential action. This divergence is not incidental but structural: a portable advisory style that remains validating, risk-averse, and weakly directive across contexts. Safety alignment is one plausible contributor to this pattern, alongside training-data averaging and broader assistant design. The article argues that model divergence can be reframed from a technical error to a way of seeing what standardized assistant norms flatten when they encounter situated moral worlds.

2604.22136 2026-04-27 cs.CR cs.LG

Sovereign Agentic Loops: Decoupling AI Reasoning from Execution in Real-World Systems

Jun He, Deying Yu

Comments 15 pages, 2 figures

详情
英文摘要

Large language model (LLM) agents increasingly issue API calls that mutate real systems, yet many current architectures pass stochastic model outputs directly to execution layers. We argue that this coupling creates a safety risk because model correctness, context awareness, and alignment cannot be assumed at execution time. We introduce Sovereign Agentic Loops (SAL), a control-plane architecture in which models emit structured intents with justifications, and the control plane validates those intents against true system state and policy before execution. SAL combines an obfuscation membrane, which limits model access to identity-sensitive state, with a cryptographically linked Evidence Chain for auditability and replay. We formalize SAL and show that, under the stated assumptions, it provides policy-bounded execution, identity isolation, and deterministic replay. In an OpenKedge prototype for cloud infrastructure, SAL blocks 93% of unsafe intents at the policy layer, rejects the remaining 7% via consistency checks, prevents unsafe executions in our benchmark, and adds 12.4 ms median latency.

2604.22133 2026-04-27 eess.AS cs.SD

Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis

Haopeng Geng, Longfei Yang, Xi Chen, Haitong Sun, Daisuke Saito, Nobuaki Minematsu

详情
英文摘要

Mispronunciation Detection and Diagnosis (MDD) requires modeling fine-grained acoustic deviations. However, current ASR-derived MDD systems often face inherent limitations. In particular, CTC-based models favor sequence-level alignments that neglect transient mispronunciation cues, while explicit canonical priors bias predictions toward intended targets. To address these bottlenecks, we propose a prompt-free framework decoupling acoustic fidelity from canonical guidance. First, we introduce CROTTC, an acoustic model enforcing monotonic, frame-level alignment to accurately capture pronunciation deviations. Second, we implicitly inject mispronunciation information via the IF strategy under the knowledge transfer principle. Experiments show CROTTC-IF achieves a 71.77% F1-score on L2-ARCTIC and 71.70% F1-score on the Iqra'Eval2 leaderboard. With empirical analysis, we demonstrate that decoupling acoustics from explicit priors provides highly robust MDD.

2604.22121 2026-04-27 eess.SY cs.RO cs.SY

Characterizing pitch and roll torque coupling in insect-sized flapping-wing robots using a microfabricated gimbal

Aaron Weber, Daksh Dhingra, Sawyer B. Fuller

Comments Submitted for journal publication in Mechatronics and conference presentation at IFAC World Congress 2026. 9 pages, 11 figures

详情
英文摘要

Sub-gram flapping-wing flying insect robots (FIRs) are challenging to model because of mechanical complexity in their wings, unsteady aerodynamic flow, and the difficulty of making precise measurements at a small scale. Coupling effects between roll and pitch torque actuation have not previously been measured because a two-axis sensor that is sensitive enough has not been realized. To address this shortcoming, we introduce a microfabricated gimbal design capable of precisely and simultaneously measuring roll and pitch torques as well as thrust. We then used it to measure the extent to which a pitch torque command affects roll torque and vice versa on a 180 mg piezo-actuated flapping-wing flying platform. Our results show a high coefficient of determination in the linear regression for both pitch (0.95) and roll (0.98) and low cross-correlation coefficients (-0.001 and -0.085, respectively) across the full range of simultaneous torque commands, indicating negligible cross-axis coupling. Similarly, thrust force deviates by a maximum of only 5.8% from the mean thrust value. These results validate the assumption that pitch and toll can be considered independently in control and will inform future models of how inputs affect the aerodynamics of resonant flapping-wing systems.

2604.22103 2026-04-27 cs.CY cs.CV

How Many Visual Levers Drive Urban Perception? Interventional Counterfactuals via Multiple Localised Edits

Jason Tang, Stephen Law

详情
英文摘要

Street-view perception models predict subjective attributes such as safety at scale, but remain correlational: they do not identify which localized visual changes would plausibly shift human judgement for a specific scene. We propose a lever-based interventional counterfactual framework that recasts scene-level explainability as a bounded search over structured counterfactual edits. Each lever specifies a semantic concept, spatial support, intervention direction, and constrained edit template. Candidate edits are generated through prompt-conditioned image editing and retained only if they satisfy validity checks for same-place preservation, locality, realism, and plausibility. In a pilot across 50 scenes from five cities, the framework reveals preliminary proxy-based directional patterns and a practical failure taxonomy under prompt-only editing, with Mobility Infrastructure and Physical Maintenance showing the largest auxiliary safety shifts. Human pairwise judgements remain the ground-truth endpoint for future validation.

2604.22096 2026-04-27 cs.CR cs.LG cs.SE

Who Audits the Auditor? Tamper-Proof Fraud Detection with Blockchain-Anchored Explainable ML

Zhaohui Wang

Comments Accepted to IEEE COMPSAC 2026 (Paper ID 9376, SEPT Symposium). This is the de-anonymized camera-ready version. Code is available at: https://github.com/GeoffreyWang1117/fraud-detection-chain

详情
英文摘要

In enterprise fraud detection, model accuracy alone is insufficient when insiders can tamper with audit logs or bypass approval workflows. Real-world incidents show that fraud often persists not because detection algorithms fail, but because the audit trail itself is controllable by privileged operators. This exposes a fundamental trust gap: *who audits the auditor?* We present a tamper-evident fraud detection system that anchors both ML predictions and workflow execution to an immutable blockchain ledger. Rather than using blockchain as passive storage, we enforce the entire approval process through smart contracts, ensuring that every transaction, prediction, and explanation is atomically recorded and cannot be retroactively modified. Our detection module achieves competitive accuracy (F1 = 0.895, PR-AUC = 0.974) while providing cryptographically verifiable decision trails that support regulatory auditability requirements (e.g., GDPR Article 22). System evaluation shows sub-25 ms inference latency and economically viable deployment on Layer-2 networks at under \$0.01 per transaction (validated against PolygonScan data), supporting enterprise-scale workloads of 10,000+ monthly payments.

2604.22089 2026-04-27 cs.SE cs.AI

Ethics Testing: Proactive Identification of Generative AI System Harms

Shin Hwei Tan, Haibo Wang, Heng Li

详情
英文摘要

Generative Artificial Intelligence (GAI) systems that can automatically generate content in the form of source code or other contents (e.g., images) has seen increasing popularity due to the emergence of tools such as ChatGPT which rely on Large Language Models (LLMs). Misuse of the automatically generated content can incur serious consequences due to potential harms in the generated content. Despite the importance of ensuring the quality of automatically generated content, there is little to no approach that can systematically generate tests for identifying software harms in the content generated by these GAI systems. In this article, we introduce the novel concept of ethics testing which aims to systematically generate tests for identifying software harms. Different from existing testing methodologies (e.g., fairness testing that aims to identifying software discrimination), ethics testing aims to systematically detect software harms that could be induced due to unethical behavior (e.g., harmful behavior or behavior that violates intellectual property rights) in automatically generated content. We introduced the concept of ethics testing, discussed the challenges therewithin, and conducted five case studies to show how ethics testing can be performed for generative AI systems.

2604.22072 2026-04-27 cs.DC cs.AI

Shard the Gradient, Scale the Model: Serverless Federated Aggregation via Gradient Partitioning

Amine Barrak

详情
英文摘要

Federated learning (FL) aggregation on serverless platforms faces a hard scalability ceiling: existing architectures (lambda-FL, LIFL) partition clients across aggregators, but every aggregator must hold the complete model gradient in memory. When gradients exceed the per-function memory limit (e.g., 10 GB on AWS Lambda), aggregation becomes infeasible regardless of tree depth or branching factor. We propose GradsSharding, which instead partitions the gradient tensor into M shards, each averaged independently by a serverless function that receives contributions from all clients. Because FedAvg averaging is element-wise, this produces bit-identical results to tree-based approaches, so model accuracy is invariant by construction. Per-function memory is bounded at O(|θ|/M), independent of client count, enabling aggregation of arbitrarily large models. We evaluate GradsSharding against lambda-FL and LIFL through HPC experiments and real AWS Lambda deployments across model sizes from 43 MB to 5 GB. Results show a cost crossover at approximately 500 MB gradient size, 2.7x cost reduction at VGG-16 scale, and that GradsSharding is the only architecture that remains deployable beyond the serverless memory ceiling.

2604.22068 2026-04-27 cs.SE cs.RO

TRACE: Topology-aware Reconstruction of Accidents in CARLA for AV Evaluation

Nahian Salsabil, Sebastian Elbaum

Comments FSE'26 Tool Demonstration Track

详情
英文摘要

Validating Autonomous Vehicles (AVs) requires exposure to rare, safety-critical scenarios, infrequent in routine driving data. Existing benchmarks address this by generating synthetic conflicts or mapping accident descriptions to abstract road geometries, failing to capture the topological complexity of real-world crashes. We introduce TRACE , a pipeline that automates the reconstruction of NHTSA crash reports into high-fidelity CARLA simulations by (1) retrieving site-specific OpenStreetMap data to preserve exact road topology, (2) leveraging Large Language Models to infer vehicles' initial state from road geometry and pre-crash maneuvers, and (3) generating simulation trajectories from semi-structured report data. Using this pipeline, we curated a benchmark of 52 diverse accident scenarios covering varied collision types, road topologies, and pre-crash maneuvers, providing a challenging open source resource for testing AV systems against real-world failures.

2604.22046 2026-04-27 cs.SE cs.AI

Call-Chain-Aware LLM-Based Test Generation for Java Projects

Guancheng Wang, Qinghua Xu, Lionel C. Briand, Zhaoqiang Guo, Kui Liu

详情
英文摘要

Large language models (LLMs) have recently shown strong potential for generating project-level unit tests. However, existing state-of-the-art approaches primarily rely on execution-path information to guide prompt construction, which is often insufficient for complex software systems with rich inter-class dependencies, deep call chains, and intricate object initialization requirements. In this paper, we present CAT, a novel call-chain-aware LLM-based test generation approach that explicitly incorporates call-chain and dependency contexts into prompts through dedicated static analysis. To construct executable, semantically valid test contexts, CAT systematically models caller--callee relationships, object constructors, and third-party dependencies, and supports iterative test fixing when generation failures occur. We evaluate CAT on the widely used Defects4J benchmark and on four real-world GitHub projects released after the LLM's cut-off date. The results show that, across projects in Defects4J, CAT improves line and branch coverage by 18.04% and 21.74%, respectively, over the state-of-the-art approach PANTA, while consistently achieving superior performance on post-cutoff real-world projects. An ablation study further demonstrates the importance of call-chain and dependency contexts in CAT.

2604.22043 2026-04-27 physics.soc-ph cs.LG

Audio Video Verbal Analysis (AVVA) for Capturing Classroom Dialogues

Vivek Upadhyay, Amaresh Chakrabarti

Comments 42 pages, 4 figures, 1 table

详情
英文摘要

Background: The classroom discourse analysis has been transformed by the growing use of audio-video multimodal data, which demands analytical methods that balance interpretive depth with computational scalability. Methods: This study introduces the Audio Video Verbal Analysis (AVVA) framework, adapted from the Verbal Analysis method to integrate qualitative interpretation with quantitative modelling. Unlike fully multimodal learning analytics approaches, AVVA focuses on verbatim transcripts with essential interactional modalities. Findings: The framework embeds triangulation as a core design strategy across ten methodological steps, strengthening validity and analytical rigour. A comprehensive validation scheme addresses fundamental challenges in temporal observational research: Phi Ceiling for low-frequency variables (via Base Rate Filtering), estimation uncertainty (via bootstrap confidence intervals), and the Modifiable Temporal Unit Problem, where measured associations depend on observational window size. Four-criterion stability assessment (sign consistency, confidence interval overlap, zero exclusion, magnitude stability) classifies variable pairs into interpretable patterns: grain-invariant, scale-specific, or multi-scale, etc. structures across temporal grain sizes. Its application to 23 hours of classroom recordings illustrates its practical viability and its potential to yield meaningful insights. Contribution: The framework thus provides a scalable pathway for transforming rich classroom discourse into analysable datasets.

2604.22018 2026-04-27 q-bio.NC cs.AI cs.LG eess.SP

Foundation models for discovering robust biomarkers of neurological disorders from dynamic functional connectivity

Deepank Girish, Yi Hao Chan, Sukrit Gupta, Jing Xia, Jagath C. Rajapakse

详情
英文摘要

Several brain foundation models (FM) have recently been proposed to predict brain disorders by modelling dynamic functional connectivity (FC). While they demonstrate remarkable model performance and zero- or few-shot generalization, the salient features identified as potential biomarkers are yet to be thoroughly evaluated. We propose RE-CONFIRM, a framework for evaluating the robustness of potential biomarker candidates elucidated by deep learning (DL) models including FMs. From experiments on five large datasets of Autism Spectrum Disorder (ASD), Attention-deficit Hyperactivity Disorder (ADHD), and Alzheimer's Disease (AD), we found that although commonly used performance metrics provide an intuitive assessment of model predictions, they are insufficient for evaluating the robustness of biomarkers identified by these models. RE-CONFIRM metrics revealed that simply finetuning FMs leads to models that fail to capture regional hubs effectively, even in disorders where hubs are known to be implicated, such as ASD and ADHD. In view of this, we propose Hub-LoRA (Low-Rank Adaptation) as a fine-tuning technique that enables FMs to not only outperform customised DL models but also produce neurobiologically faithful biomarkers supported by meta-analyses. RE-CONFIRM is generalizable and can be easily applied to ascertain the robustness of DL models trained on functional MRI datasets. Code is available at: https://github.com/SCSE-Biomedical-Computing-Group/RE-CONFIRM.

2604.22014 2026-04-27 cs.MA cs.RO

DM$^3$-Nav: Decentralized Multi-Agent Multimodal Multi-Object Semantic Navigation

Amin Kashiri, Atharva Jamsandekar, Yasin Yazıcıoğlu

详情
英文摘要

We present DM$^3$-Nav, a fully decentralized multi-agent semantic navigation system supporting multimodal open-vocabulary goal specification and multi-object missions. In our setting, decentralization implies operation without a central coordinator, global map aggregation, or shared global state at runtime. Robots operate autonomously and coordinate through ad-hoc pairwise communication, exchanging local maps, goal status, and navigation intent without synchronization. An implicit task allocation mechanism combining intent broadcasting and distance-weighted frontier selection reduces redundant exploration while preserving decentralized operation. Evaluations on HM3DSem scenes using the HM3Dv0.2 and GOAT-Bench datasets demonstrate that DM$^3$-Nav matches or exceeds centralized and shared-map baselines while eliminating single points of failure inherent in centralized architectures. Finally, we validate our approach in a real-world office environment using two mobile robots, demonstrating successful deployment relying entirely on onboard sensing and computation. A video of our real-world experiments is available online: https://drive.google.com/file/d/1QiUSCn5rIvtuTUqtuXLPgmt6S8x9-MCZ/view?usp=drive_link

2604.22005 2026-04-27 cs.IT cs.LG eess.SP math.IT

Null-Space Flow Matching for MIMO Channel Estimation in Latency-Constrained Systems

Junjie Zhao, Guangming Liang, Dongzhu Liu, Xiaonan Liu

Comments 6 pages, 3 figures, 20 references

详情
英文摘要

Accurate yet low-latency channel state information (CSI) acquisition is essential for multiple-input multiple-output (MIMO) communication systems. While advanced deep generative models, such as score-based and diffusion models, enable high-fidelity CSI reconstruction from limited pilot observations, they often suffer from high inference latency. To achieve accurate CSI estimation under stringent latency constraints, this paper proposes a null-space flow matching (FM) framework that decomposes pilot-limited MIMO channel estimation into a range-space reconstruction problem and a null-space generation problem. Specifically, the range-space component of the channel is directly recovered from noisy pilot observations, while only the ambiguous null-space component is iteratively refined using an FM-based generative prior. To further improve the robustness of the proposed framework, we introduce a power-law time schedule to better allocate the limited number of refinement steps, along with a noise-aware adaptive correction strategy to suppress channel noise on the refinement trajectory. Experimental results demonstrate that our method achieves a competitive normalized mean square error (NMSE) even under a strict latency budget of around 3 ms, while delivering superior estimation accuracy and faster inference than both model-based and generative baselines.

2604.21989 2026-04-27 math.OC cs.AI cs.RO cs.SY eess.SY math.DS

Model Predictive Control of Hybrid Dynamical Systems

Ricardo G. Sanfelice, Berk Altin

Comments Technical report associated with paper to appear in IEEE Transactions on Automatic Control, 2026

详情
Journal ref
IEEE Transactions on Automatic Control, 2026
英文摘要

The problem of controlling hybrid dynamical systems using model predictive control (MPC) is formulated and sufficient conditions for asymptotic stability of a set are provided. Hybrid dynamical systems are modeled in terms of hybrid equations, involving a differential equation and a difference equation with inputs and constraints. The proposed hybrid MPC algorithm uses a suitable prediction and control horizon construction inspired by hybrid time domains. Structural properties of the hybrid optimization problem, its feasible set, and its value function are provided. Checkable conditions to guarantee asymptotic stability of a set are provided. These conditions are given in terms of properties on the stage cost, terminal cost, and the existence of static state-feedback laws, related through a control Lyapunov function condition. Examples illustrate the results throughout the paper.

2604.21961 2026-04-27 cs.LO cs.AI cs.SC

A general optimization solver based on OP-to-MaxSAT reduction

Yuxin Zhao, Han Huang, Zhifeng Hao

详情
英文摘要

Optimization problems are fundamental in diverse fields, such as engineering, economics, and scientific computing. However, current algorithms are mostly designed for specific problem types and exhibit limited generality in solving multiple types of optimization problems. To enhance generality, we propose an automated reduction method named OP-to-MaxSAT reduction and a general optimization solver based on OP-to-MaxSAT reduction (GORED). GORED unifies the solving of multiple types of optimization problems by reducing the problems from optimization problems to MaxSAT instances in polynomial time and solving them using the state-of-the-art MaxSAT solver. The generality and solution quality of GORED are validated through experiments on 136 instances across 11 types of optimization problems. Experimental results demonstrate that GORED not only successfully solves a wide range of optimization problems but also yields solutions comparable in quality to those from existing methods, with no statistically significant differences observed. By introducing automated reduction, this work shifts the paradigm of optimization solvers from designing specialized algorithms for each problem type to employing a single algorithm for diverse problems. As a result, advances in this single algorithm can now drive progress in a wide range of optimization problems across various domains.

2604.21958 2026-04-27 cs.SE cs.AI

A systematic review of generative AI usage for IT project management

Ionut Anghel, Tudor Cioara

详情
英文摘要

This paper aims to synthesize current knowledge on generative AI in IT project management using the PRISMA methodology to provide researchers with a comprehensive perspective on techniques, applications, adoption trends, limitations, and integration across project management tools and process groups. The analysis reveals a clear dominance of OpenAI's GPT in the included studies but relying primarily on prompt engineering, suggesting that research in this area remains at an exploratory stage. Finally, it identifies and discusses three promising research directions for AI-enabled project management, including process group-specific AI agents, project role-based AI agents, and hybrid collaborative networks that enable human-guided orchestration.

2604.21957 2026-04-27 cs.IT cs.AI cs.LG eess.SP math.IT

MambaCSP: Hybrid-Attention State Space Models for Hardware-Efficient Channel State Prediction

Aladin Djuhera, Haris Gacanin, Holger Boche

详情
英文摘要

Recent works have demonstrated that attention-based transformer and large language model (LLM) architectures can achieve strong channel state prediction (CSP) performance by capturing long-range temporal dependencies across channel state information (CSI) sequences. However, these models suffer from quadratic scaling in sequence length, leading to substantial computational cost, memory consumption, and inference latency, which limits their applicability in real-time and resource-constrained wireless deployments. In this paper, we investigate whether selective state space models (SSMs) can serve as a hardware-efficient alternative for CSI prediction. We propose MambaCSP, a hybrid-attention SSM architecture that replaces LLM-based prediction backbones with a linear-time Mamba model. To overcome the local-only dependencies of pure SSMs, we introduce lightweight patch-mixer attention layers that periodically inject cross-token attentions, helping with long-context CSI prediction. Extensive MISO-OFDM simulations show that MambaCSP improves prediction accuracy over LLM-based approaches by 9-12%, while delivering up to 3.0x higher throughput, 2.6x lower VRAM usage, and 2.9x faster inference. Our results demonstrate that hybrid state space architectures provide a promising direction for scalable and hardware-efficient AI-native CSI prediction in future wireless networks.

2604.21950 2026-04-27 cs.SE cs.AI cs.LG

Feedback Over Form: Why Execution Feedback Matters More Than Pipeline Topology in 1-3B Code Generation

Charles Junichi McAndrews

Comments 17 pages main text, 2 page references, 3 figures. Code: https://github.com/L3G/feedback-over-form

详情
英文摘要

Small language models (1-3B) are practical to run locally, but individually limited on harder code generation tasks. We ask whether composing them into pipelines can recover some of that lost capability. We study code generation pipelines built from 1-3B models with execution feedback, and use a NEAT-inspired evolutionary search to test whether more complex pipeline structure helps beyond a simple refinement loop. We evaluate on HumanEval (164 problems) and sanitized MBPP (427 problems), all with local inference on a single laptop. Self-refinement with execution feedback improves code generation by more than 4 standard deviations on both benchmarks. The gains are narrow in mechanism: refinement fixes many runtime errors (especially NameError and SyntaxError), but rarely fixes logic errors such as AssertionError. Within our tested general-purpose model pool, generator identity mattered less than refiner capability: a 1.5B generator paired with a 3B refiner matched a 3B model doing both roles. Early stopping is essential; without it, every iteration is net-negative. The code-specialized models outperform every general-purpose pipeline configuration, suggesting model specialization matters more than pipeline architecture. Preliminary text-only pipeline experiments without execution feedback did not show gains at this scale. In our constrained search space, evolutionary search mostly rediscovered the same simple generate-execute-refine loop we found manually, with no clearly significant gain from added topology. Single-evaluation fitness inflates results by 5-7 percent, selecting lucky genomes over good ones. On these benchmarks at 1-3B scale, execution feedback mattered more than added pipeline complexity in determining whether composition helped.

2604.21938 2026-04-27 cs.CY cs.AI

The Biggest Risk of Embodied AI is Governance Lag

Shaoshan Liu

详情
英文摘要

Embodied AI is widely discussed as a job-displacement problem. The deeper risk, however, is governance lag: the inability of public institutions to keep pace with how fast the technology spreads through the physical economy. As reusable robotic platforms are combined with increasingly general AI models, embodied AI may scale across manufacturing, logistics, care, and infrastructure faster than governance systems can observe, interpret, and respond. We argue that this lag appears in three connected forms: observational, institutional, and distributive. The central policy challenge, therefore, is not automation alone, but whether governance and compliance systems can adapt before disruption becomes entrenched.

2604.18143 2026-04-27 stat.ML cs.LG stat.ME

Distributional Off-Policy Evaluation with Deep Quantile Process Regression

Qi Kuang, Chao Wang, Yuling Jiao, Fan Zhou

Comments Journal of the American Statistical Association

详情
英文摘要

This paper investigates the off-policy evaluation (OPE) problem from a distributional perspective. Rather than focusing solely on the expectation of the total return, as in most existing OPE methods, we aim to estimate the entire return distribution. To this end, we introduce a quantile-based approach for OPE using deep quantile process regression, presenting a novel algorithm called Deep Quantile Process regression-based Off-Policy Evaluation (DQPOPE). We provide new theoretical insights into the deep quantile process regression technique, extending existing approaches that estimate discrete quantiles to estimate a continuous quantile function. A key contribution of our work is the rigorous sample complexity analysis for distributional OPE with deep neural networks, bridging theoretical analysis with practical algorithmic implementations. We show that DQPOPE achieves statistical advantages by estimating the full return distribution using the same sample size required to estimate a single policy value using conventional methods. Empirical studies further show that DQPOPE provides significantly more precise and robust policy value estimates than standard methods, thereby enhancing the practical applicability and effectiveness of distributional reinforcement learning approaches.

2604.12799 2026-04-27 cs.GT cs.AI cs.DS

Efficiency of Proportional Mechanisms in Online Auto-Bidding Advertising

Nguyen Kim Thang

详情
英文摘要

The rise of automated bidding strategies in online advertising presents new challenges in designing and analyzing efficient auction mechanisms. In this paper, we focus on proportional mechanisms within the context of auto-bidding and study the efficiency of pure Nash equilibria, specifically the price of anarchy (PoA), under the liquid welfare objective. We first establish a tight PoA bound of 2 for the standard proportional mechanism. Next, we introduce a modified version with an alternative payment scheme that achieves a PoA bound of $1 + \frac{O(1)}{n-1}$ where $n \geq 2$ denotes the number of bidding agents. This improvement surpasses the existing PoA barrier of 2 and approaches full efficiency as the number of agents increases. Our methodology leverages duality and the Karush-Kuhn-Tucker (KKT) conditions from linear and convex programming. Due to its conceptual simplicity, our approach may offer broader applications for establishing PoA bounds.

2604.11594 2026-04-27 eess.AS cs.SD

HumDial-EIBench: A Human-Recorded Multi-Turn Emotional Intelligence Benchmark for Audio Language Models

Shuiyuan Wang, Zhixian Zhao, Hongfei Xue, Chengyou Wang, Shuai Wang, Hui Bu, Xin Xu, Lei Xie

详情
英文摘要

Evaluating the emotional intelligence (EI) of audio language models (ALMs) is critical. However, existing benchmarks mostly rely on synthesized speech, are limited to single-turn interactions, and depend heavily on open-ended scoring. This paper proposes HumDial-EIBench, a comprehensive benchmark for evaluating ALMs' EI. Using real-recorded human dialogues from the ICASSP 2026 HumDial Challenge, it reformulates emotional tracking and causal reasoning into multiple-choice questions with adversarial distractors, mitigating subjective scoring bias for cognitive tasks. It retains the generation of empathetic responses and introduces an acoustic-semantic conflict task to assess robustness against contradictory multimodal signals. Evaluations of eight ALMs reveal that most models struggle with multi-turn emotional tracking and implicit causal reasoning. Furthermore, all models exhibit decoupled textual and acoustic empathy, alongside a severe text-dominance bias during cross-modal conflicts.

2604.07169 2026-04-27 stat.ML cs.LG cs.NA math.NA

FLUID: Flow-based Unified Inference for Dynamics

Tiangang Cui, Xiaodong Feng, Chenlong Pei, Xiaoliang Wan, Tao Zhou

Comments 43 pages

详情
英文摘要

Bayesian filtering and smoothing for high-dimensional nonlinear dynamical systems are fundamental yet challenging problems in many areas of science and engineering. In this work, we propose FLUID, a flow-based unified amortized inference framework for filtering and smoothing dynamics. The core idea is to encode each observation history into a fixed-dimensional summary statistic and use this shared representation to learn both a forward flow for the filtering distribution and a backward flow for the backward transition kernel. Specifically, a recurrent encoder maps each observation history to a fixed-dimensional summary statistic whose dimension does not depend on the length of the time series. Conditioned on this shared summary statistic, the forward flow approximates the filtering distribution, while the backward flow approximates the backward transition kernel. The smoothing distribution over an entire trajectory is then recovered by combining the terminal filtering distribution with the learned backward flow through the standard backward recursion. By learning the underlying temporal evolution structure, FLUID also supports extrapolation beyond the training horizon. Moreover, by coupling the two flows through shared summary statistics, FLUID induces an implicit regularization across latent state trajectories and improves trajectory-level smoothing. In addition, we develop a flow-based particle filtering variant that provides an alternative filtering procedure and enables ESS-based diagnostics when explicit model factors are available. Numerical experiments demonstrate that FLUID provides accurate approximations of both filtering distributions and smoothing paths.

2604.02861 2026-04-27 cs.DB cs.AI

LLM+Graph@VLDB'2025 Workshop Summary

Yixiang Fang, Arijit Khan, Tianxing Wu, Da Yan, Shu Wang

详情
英文摘要

The integration of large language models (LLMs) with graph-structured data has become a pivotal and fast evolving research frontier, drawing strong interest from both academia and industry. The 2nd LLM+Graph Workshop, co-located with the 51st International Conference on Very Large Data Bases (VLDB 2025) in London, focused on advancing algorithms and systems that bridge LLMs, graph data management, and graph machine learning for practical applications. This report highlights the key research directions, challenges, and innovative solutions presented by the workshop's speakers.