arXivDaily: arXiv daily academic digest, updated Monday through Friday
2601.16187 2026-01-23 cs.MA cs.GT cs.SY eess.SY

Average Unfairness in Routing Games

Pan-Yang Su, Arwa Alanqary, Bryce L. Ferguson, Manxi Wu, Alexandre M. Bayen, Shankar Sastry

Comments 14 pages, 5 figures, 1 table. Accepted for publication at the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

Abstract

We propose average unfairness as a new measure of fairness in routing games, defined as the ratio between the average latency and the minimum latency experienced by users. This measure is a natural complement to two existing unfairness notions: loaded unfairness, which compares maximum and minimum latencies of routes with positive flow, and user equilibrium (UE) unfairness, which compares maximum latency with the latency of a Nash equilibrium. We show that the worst-case values of all three unfairness measures coincide and are characterized by a steepness parameter intrinsic to the latency function class. We show that average unfairness is always no greater than loaded unfairness, and the two measures are equal only when the flow is fully fair. Besides that, we offer a complete comparison of the three unfairness measures, which, to the best of our knowledge, is the first theoretical analysis in this direction. Finally, we study the constrained system optimum (CSO) problem, where one seeks to minimize total latency subject to an upper bound on unfairness. We prove that, for the same tolerance level, the optimal flow under an average unfairness constraint achieves lower total latency than any flow satisfying a loaded unfairness constraint. We show that such improvement is always strict in parallel-link networks and establish sufficient conditions for general networks. We further illustrate the latter with numerical examples. Our results provide theoretical guarantees and valuable insights for evaluating fairness-efficiency tradeoffs in network routing.
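The two ratio definitions quoted in this abstract can be illustrated with a minimal sketch. It assumes a parallel-link network described by per-route flows and the latencies realized at those flows. The function name `unfairness_measures` and the sample numbers are hypothetical and do not come from the paper. UE unfairness is omitted because it requires solving for a Nash equilibrium.

```python
def unfairness_measures(flows, latencies):
    """Return (average unfairness, loaded unfairness) for one flow pattern.

    flows[i]     : amount of flow on route i (hypothetical input format)
    latencies[i] : latency experienced on route i at that flow
    Only routes carrying positive flow count, since only those are
    "experienced by users".
    """
    used = [(f, l) for f, l in zip(flows, latencies) if f > 0]
    if not used:
        raise ValueError("at least one route must carry positive flow")

    total_flow = sum(f for f, _ in used)
    min_latency = min(l for _, l in used)
    max_latency = max(l for _, l in used)
    # Flow-weighted mean latency over all users.
    avg_latency = sum(f * l for f, l in used) / total_flow

    average_unfairness = avg_latency / min_latency
    loaded_unfairness = max_latency / min_latency
    return average_unfairness, loaded_unfairness


# Two parallel links: 75% of users see latency 1.0, 25% see latency 2.0.
avg_u, loaded_u = unfairness_measures([0.75, 0.25], [1.0, 2.0])
print(avg_u, loaded_u)
```

Because a flow-weighted average never exceeds the maximum, the first value is at most the second, which matches the ordering stated in the abstract.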

2408.09253 2026-01-23 cs.RO cs.SY eess.SY

Reinforcement Learning Compensated Model Predictive Control for Off-road Driving on Unknown Deformable Terrain

Prakhar Gupta, Jonathon M. Smereka, Yunyi Jia

Comments Submitted to IEEE Transactions on Intelligent Vehicles as a Regular Paper; was withdrawn in March 2025. A revised version of this manuscript was submitted to ACC 2025 review as a regular paper in Sep 2025

Abstract

This study presents an Actor-Critic reinforcement learning Compensated Model Predictive Controller (AC2MPC) designed for high-speed, off-road autonomous driving on deformable terrains. Addressing the difficulty of modeling unknown tire-terrain interaction and ensuring real-time control feasibility and performance, this framework integrates deep reinforcement learning with a model predictive controller to manage unmodeled nonlinear dynamics. We evaluate the controller framework over constant and varying velocity profiles using the high-fidelity simulator Project Chrono. Our findings demonstrate that our controller statistically outperforms standalone model-based and learning-based controllers over three unknown terrains that represent a sandy deformable track, a sandy and rocky track, and a cohesive clay-like deformable soil track. Despite varied and previously unseen terrain characteristics, this framework generalized well enough to track longitudinal reference speeds with the least error. Furthermore, this framework required significantly less training data than a purely learning-based controller, converging in fewer steps while delivering better performance. Even when under-trained, this controller outperformed the standalone controllers, highlighting its potential for safer and more efficient real-world deployment.

2601.16149 2026-01-23 eess.SY cs.SY math.OC

Interconnection-based Model Reduction for Linear Hybrid Systems

Zirui Niu, Giordano Scarciotti, Alessandro Astolfi

Comments 17 pages

Abstract

In this paper, we address the model reduction problem for linear hybrid systems via the interconnection-based technique called moment matching. We consider two classical interconnections, namely the direct and swapped interconnections, in the hybrid setting, and we present families of reduced-order models for each interconnection via a hybrid characterisation of the steady-state responses. By combining the results for each interconnection, the design of a reduced-order model that achieves moment matching simultaneously for both interconnections is studied. In addition, we show that the presented results have simplified counterparts when the jumps of the hybrid system are periodic. A numerical simulation is finally given to illustrate the results.

2601.16077 2026-01-23 eess.AS

Loose coupling of spectral and spatial models for multi-channel diarization and enhancement of meetings in dynamic environments

Adrian Meise, Tobias Cord-Landwehr, Christoph Boeddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

Comments Accepted at ICASSP 2026

Abstract

Sound capture by microphone arrays opens the possibility to exploit spatial, in addition to spectral, information for diarization and signal enhancement, two important tasks in meeting transcription. However, there is no one-to-one mapping of positions in space to speakers if speakers move. Here, we address this by proposing a novel joint spatial and spectral mixture model, whose two submodels are loosely coupled by modeling the relationship between speaker and position index probabilistically. Thus, spatial and spectral information can be jointly exploited, while at the same time allowing for speakers speaking from different positions. Experiments on the LibriCSS data set with simulated speaker position changes show great improvements over tightly coupled subsystems.

2601.16064 2026-01-23 eess.IV cs.CV

Phi-SegNet: Phase-Integrated Supervision for Medical Image Segmentation

Shams Nafisa Ali, Taufiq Hasan

Comments 10 pages, 7 figures

Abstract

Deep learning has substantially advanced medical image segmentation, yet achieving robust generalization across diverse imaging modalities and anatomical structures remains a major challenge. A key contributor to this limitation lies in how existing architectures, ranging from CNNs to Transformers and their hybrids, primarily encode spatial information while overlooking frequency-domain representations that capture rich structural and textural cues. Although a few recent studies have begun exploring spectral information at the feature level, supervision-level integration of frequency cues (crucial for fine-grained object localization) remains largely untapped. To this end, we propose Phi-SegNet, a CNN-based architecture that incorporates phase-aware information at both architectural and optimization levels. The network integrates Bi-Feature Mask Former (BFMF) modules that blend neighboring encoder features to reduce semantic gaps, and Reverse Fourier Attention (RFA) blocks that refine decoder outputs using phase-regularized features. A dedicated phase-aware loss aligns these features with structural priors, forming a closed feedback loop that emphasizes boundary precision. Evaluated on five public datasets spanning X-ray, US, histopathology, MRI, and colonoscopy, Phi-SegNet consistently achieved state-of-the-art performance, with an average relative improvement of 1.54 ± 1.26% in IoU and 0.98 ± 0.71% in F1-score over the next best-performing model. In cross-dataset generalization scenarios involving unseen datasets from the known domain, Phi-SegNet also exhibits robust and superior performance, highlighting its adaptability and modality-agnostic design. These findings demonstrate the potential of leveraging spectral priors in both feature representation and supervision, paving the way for generalized segmentation frameworks that excel in fine-grained object localization.

2601.16062 2026-01-23 cs.RO cs.SY eess.SY

Improve the autonomy of the SE2(3) group based Extended Kalman Filter for Integrated Navigation: Theoretical Analysis

Jiarui Cui, Maosong Wang, Wenqi Wu, Peiqi Li, Xianfei Pan

Abstract

One of the core advantages of the SE2(3) Lie group framework for navigation modeling lies in the autonomy of error propagation. Current research on Lie group based extended Kalman filters has demonstrated that error propagation autonomy holds in low-precision applications, such as in micro-electromechanical system (MEMS) based integrated navigation without considering earth rotation and inertial device biases. However, in high-precision navigation state estimation, maintaining autonomy is extremely difficult when earth rotation and inertial device biases are considered. This paper presents a theoretical analysis of the autonomy of SE2(3) group based high-precision navigation models in the inertial, earth, and world frames, respectively. Through theoretical analysis, we find that the limitation of the traditional, trivial SE2(3) group navigation modeling method is the presence of Coriolis force terms introduced by velocity in non-inertial frames. Therefore, a construction method for SE2(3) group navigation models is proposed, which brings the navigation models closer to full autonomy.

2601.16061 2026-01-23 eess.SY cs.SY

Dynamic Tactile Sensing System and Soft Actor Critic Reinforcement Learning for Inclusion Characterization

John Bannan, Nazia Rahman, Chang-Hee Won

Abstract

This paper presents the Dynamic Tactile Sensing System that utilizes robotic tactile sensing in conjunction with reinforcement learning to locate and characterize embedded inclusions. A dual arm robot is integrated with an optical Tactile Imaging Sensor that utilizes the Soft Actor Critic Algorithm to acquire tactile data based on a pixel intensity reward. A Dynamic Interrogation procedure for tactile exploration is developed that enables the robot to first localize inclusions and then refine their positions for precise imaging. Experimental validation conducted on Polydimethylsiloxane phantoms demonstrates that the robot using the Tactile Soft Actor Critic Model was able to achieve size estimation errors of 2.61% and 5.29% for soft and hard inclusions compared to 7.84% and 6.87% for expert human operators. Results also show that the Dynamic Tactile Sensing System was able to locate embedded inclusions and autonomously determine their mechanical properties, useful in applications such as breast tumor characterization.

2601.16054 2026-01-23 eess.SP

Hybrid Channel Estimation with Quantized Phase Feedback for Over-the-Air Computation

Martin Dahl, Erik G. Larsson

Comments ICASSP 2026

Abstract

To reduce the signaling overhead of over-the-air computation, a hybrid channel estimation scheme is proposed, where reciprocity-based and feedback-based channel estimation are combined. In particular, the impact of quantized phase feedback is studied while the amplitude is assumed to be estimated exactly. The scheme enables selecting the estimation precision of amplitude and phase separately, depending on the importance of each. Two variants of the scheme are proposed. As shown through simulations and theory, the second variant, with reciprocity-based estimation of the channel phase and optimal quantization of phase feedback, can outperform the first variant, which estimates the phase by feedback only.
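As a rough illustration of what quantized phase feedback with exact amplitude means, the sketch below applies a plain uniform phase quantizer to a single complex channel coefficient. This is an assumption made for illustration only: the abstract refers to an optimal quantization whose design is not described here, and the name `quantize_phase` and the bit widths are hypothetical.

```python
import cmath
import math


def quantize_phase(h, bits):
    """Quantize only the phase of complex coefficient h to 2**bits uniform levels.

    The amplitude abs(h) is kept exact, mirroring the abstract's assumption.
    Returns (reconstructed coefficient, feedback index).
    NOTE: uniform quantization is an illustrative assumption, not the
    paper's optimal quantizer.
    """
    levels = 2 ** bits
    step = 2 * math.pi / levels
    phase = cmath.phase(h)                 # value in (-pi, pi]
    index = round(phase / step) % levels   # nearest level, wrapped to 0..levels-1
    return abs(h) * cmath.exp(1j * index * step), index


# A coefficient with amplitude 2 and phase 0.3 rad, fed back with 3 bits.
h = cmath.rect(2.0, 0.3)
h_hat, idx = quantize_phase(h, bits=3)
phase_error = abs(cmath.phase(h * h_hat.conjugate()))
print(idx, phase_error)
```

With a uniform quantizer the residual phase error is bounded by half a quantization step, that is, pi / 2**bits, so each extra feedback bit halves the worst-case error.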

2601.16023 2026-01-23 eess.AS cs.HC

Timbre-Aware LLM-based Direct Speech-to-Speech Translation Extendable to Multiple Language Pairs

Lalaram Arya, Mrinmoy Bhattacharjee, Adarsh C. R., S. R. Mahadeva Prasanna

Comments 13 pages

Abstract

Direct Speech-to-Speech Translation (S2ST) has gained increasing attention for its ability to translate speech from one language to another, while reducing error propagation and latency inherent in traditional cascaded pipelines. However, existing direct S2ST systems continue to face notable challenges, including instability in semantic-acoustic alignment when parallel speech data is scarce, difficulty in preserving speaker identity, and limited multilingual scalability. In this work, we introduce DS2ST-LM, a scalable, single-stage direct S2ST framework leveraging a multilingual Large Language Model (LLM). The architecture integrates a Whisper speech encoder, a learnable projection module, a Qwen2-0.5B LLM, and a timbre-controlled vocoder. We construct GigaS2S-1000, a 1000-hour bilingual corpus by extending the GigaST dataset with high-fidelity synthetic target speech, and show that this synthetic data alleviates data scarcity to some extent. We investigate two semantic token generation strategies: speech-derived S3 tokens and text-derived tokens generated by a pre-trained LLM, and analyze their impact on training stability and semantic consistency. We further evaluate three projection architectures (Linear, Conv1D-Linear, and Q-Former) and observe that while higher-capacity projectors converge faster, the simple Linear projector achieves higher performance. Extensive experiments demonstrate that DS2ST-LM outperforms traditional cascaded and ST (Qwen-Audio) + TTS baselines across both lexical (BLEU, METEOR) and semantic (BLEURT, COMET) metrics, while extending to multiple language pairs, including French, Spanish, German, Hindi, Bengali, and Urdu. Furthermore, we incorporate timbre-aware speech synthesis to preserve speaker information, enabling DS2ST-LM to surpass prior direct S2ST systems in both speaker similarity and perceptual naturalness.

2601.16014 2026-01-23 eess.SY cs.SY

Stability Analysis of Power-Electronics-Dominated Grids Using Scaled Relative Graphs

Eder Baron-Prada, Adolfo Anta, Florian Dörfler

Comments Submitted for possible publication

Abstract

This paper presents a novel approach to stability analysis for grid-connected converters utilizing Scaled Relative Graphs (SRG). Our method effectively decouples grid and converter dynamics, thereby establishing a comprehensive and efficient framework for evaluating closed-loop stability. Our analysis accommodates both linear and non-linear loads, enhancing its practical applicability. Furthermore, we demonstrate that our stability assessment remains unaffected by angular variations resulting from dq-frame transformations, significantly increasing the method's robustness and versatility. The effectiveness of our approach is validated in several simulation case studies, which illustrate its broad applicability in modern power systems.

2601.16012 2026-01-23 eess.SP

Low-Complexity Sparse Superimposed Coding for Ultra Reliable Low Latency Communications

Yanfeng Zhang, Xi'an Fan, Xu Zhu, Jinkai Zheng, Hui Liang, Weiwei Yang, Tom H. Luan

Abstract

Sparse superimposed coding (SSC) has emerged as a promising technique for short-packet transmission in ultra-reliable low-latency communication scenarios. However, conventional SSC schemes often suffer from high encoding and decoding complexity due to the use of dense codebook matrices. In this paper, we propose a low-complexity SSC scheme by designing a sparse codebook structure, where each codeword contains only a small number of non-zero elements. The decoding is performed using the traditional multipath matching pursuit algorithm, and the overall complexity is significantly reduced by exploiting the sparsity of the codebook. Simulation results show that the proposed scheme achieves a favorable trade-off between BLER performance and computational complexity, and exhibits strong robustness across different transmission block lengths.

2601.16011 2026-01-23 eess.IV cs.AI

THOR: A Versatile Foundation Model for Earth Observation Climate and Society Applications

Theodor Forgaard, Jarle H. Reksten, Anders U. Waldeland, Valerio Marsocci, Nicolas Longépé, Michael Kampffmeyer, Arnt-Børre Salberg

Comments 25 pages

Abstract

Current Earth observation foundation models are architecturally rigid, struggle with heterogeneous sensors and are constrained to fixed patch sizes. This limits their deployment in real-world scenarios requiring flexible compute-accuracy trade-offs. We propose THOR, a "compute-adaptive" foundation model that solves both input heterogeneity and deployment rigidity. THOR is the first architecture to unify data from Copernicus Sentinel-1, -2, and -3 (OLCI & SLSTR) satellites, processing their native 10 m to 1000 m resolutions in a single model. We pre-train THOR with a novel randomized patch and input image size strategy. This allows a single set of pre-trained weights to be deployed at inference with any patch size, enabling a dynamic trade-off between computational cost and feature resolution without retraining. We pre-train THOR on THOR Pretrain, a new, large-scale multi-sensor dataset and demonstrate state-of-the-art performance on downstream benchmarks, particularly in data-limited regimes like the PANGAEA 10% split, validating that THOR's flexible feature generation excels for diverse climate and society applications.

2601.15973 2026-01-23 eess.SP cs.IT math.IT

Performance Scaling Laws for PD Array-based Receivers in IM/DD Optical Wireless Communication Systems

Aravindh Krishnamoorthy, Robert Schober, Harald Haas

Comments 5 pages, 4 figures. This work has been submitted to the IEEE for possible publication

Abstract

We study the performance scaling laws for electrical-domain combining in photodetector (PD) array-based receivers employing intensity modulation and direct detection, taking into account the inherent square-law relationship between the optical and electrical received powers. The performance of PD array-based systems is compared, in terms of signal-to-noise ratio (SNR) and achievable rate, to that of a reference receiver employing a single PD. Analytical and numerical results show that PD arrays provide performance gains for sufficiently narrow beams and above an SNR threshold. Furthermore, increasing the number of PDs alone does not enhance performance, and joint optimization of beam pattern, transverse electromagnetic mode, received power, and PD positions is necessary. Our model and derived insights provide practical guidelines and highlight the trade-offs for the design of next-generation high-bandwidth PD array receivers.

2601.15952 2026-01-23 eess.SP q-bio.QM

Reconstructing Patched or Partial Holograms to allow for Whole Slide Imaging with a Self-Referencing Holographic Microscope

Philip Groult, Julia D. Sistermanns, Ellen Emken, Oliver Hayden, Wolfgang Utschick

Comments © 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Abstract

The last decade has seen significant advances in computer-aided diagnostics for cytological screening, mainly through the improvement and integration of scanning techniques such as whole slide imaging (WSI) and the combination with deep learning. Simultaneously, new imaging techniques such as quantitative phase imaging (QPI) are being developed to capture richer cell information with less sample preparation. So far, the two worlds of WSI and QPI have not been combined. In this work, we present a reconstruction algorithm which makes whole slide imaging of cervical smears possible by using a self-referencing three-wave digital holographic microscope. Since a WSI is constructed by combining multiple patches, the algorithm is adaptive and can be used on partial holograms and patched holograms. We present the algorithm for a single shot hologram, the adaptations to make it flexible to various inputs and show that the algorithm performs well for the tested epithelial cells. This is a preprint of our paper, which has been accepted for publication in 2026 IEEE International Symposium on Biomedical Imaging (ISBI).

2601.15872 2026-01-23 cs.SD cs.CV cs.LG cs.MM eess.AS

PF-D2M: A Pose-free Diffusion Model for Universal Dance-to-Music Generation

Jaekwon Im, Natalia Polouliakh, Taketo Akama

Comments 4 pages, 2 figures

Abstract

Dance-to-music generation aims to generate music that is aligned with dance movements. Existing approaches typically rely on body motion features extracted from a single human dancer and limited dance-to-music datasets, which restrict their performance and applicability to real-world scenarios involving multiple dancers and non-human dancers. In this paper, we propose PF-D2M, a universal diffusion-based dance-to-music generation model that incorporates visual features extracted from dance videos. PF-D2M is trained with a progressive training strategy that effectively addresses data scarcity and generalization challenges. Both objective and subjective evaluations show that PF-D2M achieves state-of-the-art performance in dance-music alignment and music quality.

2601.15863 2026-01-23 eess.SP

Time-Varying Rician K-factor in Measured Vehicular Channels at cmWave and mmWave Bands

Faruk Pasic, Markus Hofer, Thomas Zemen, Andreas F. Molisch, Christoph F. Mecklenbräuker

Comments Published at the 19th European Conference on Antennas and Propagation (EuCAP), 2025

Journal ref 2025 19th European Conference on Antennas and Propagation (EuCAP), 2025

Abstract

Future vehicular communication systems will integrate millimeter wave (mmWave) technology to enhance data transmission rates. To investigate the propagation effects and small-scale fading differences between mmWave and conventional centimeter wave (cmWave) bands, multi-band channel measurements have to be conducted. One key parameter to characterize small-scale fading is the Rician K-factor. In this paper, we analyze the time-varying K-factor of vehicle-to-infrastructure (V2I) channels across multiple frequency bands, measured in an urban street environment. Specifically, we investigate three frequency bands with center frequencies of 3.2 GHz, 34.3 GHz and 62.35 GHz using measurement data with 155.5 MHz bandwidth and a sounding repetition rate of 31.25 μs. Furthermore, we analyze the relationship between K-factor and root-mean-square (RMS) delay spread. We show that the Rician K-factor is similar at different frequency bands and that it is correlated with the RMS delay spread.

2601.15831 2026-01-23 eess.SP

Performance Analysis of Digital Beamforming mmWave MIMO with Low-Resolution DACs/ADCs

Faruk Pasic, Mariam Mussbah, Stefan Schwarz, Markus Rupp, Fredrik Tufvesson, Christoph F. Mecklenbräuker

Comments Published at the IEEE Radio and Antenna Days of the Indian Ocean (RADIO), 2025

Journal ref 2025 IEEE Radio and Antenna Days of the Indian Ocean (RADIO), 2025

Abstract

Future wireless communications will rely on multiple-input multiple-output (MIMO) beamforming operating at millimeter wave (mmWave) frequency bands to deliver high data rates. To support flexible spatial processing and meet the demands of latency-critical applications, it is essential to use fully digital mmWave MIMO beamforming, which relies on accurate channel estimation. However, ensuring power efficiency in fully digital mmWave MIMO systems requires the use of low-resolution digital-to-analog converters (DACs) and analog-to-digital converters (ADCs). The reduced resolution of these quantizers introduces distortion in both transmitted and received signals, ultimately degrading system performance. In this paper, we investigate the channel estimation performance of mmWave MIMO systems employing fully digital beamforming with low-resolution quantization, under practical system constraints. We evaluate the system performance in terms of spectral efficiency (SE) and energy efficiency (EE). Simulation results demonstrate that a moderate quantization resolution of 4 bits per DAC/ADC offers a favorable trade-off between energy consumption and achievable data rate.

2601.15819 2026-01-23 eess.SP

Dual-Mapping Sparse Vector Transmission for Short Packet URLLC

Yanfeng Zhang, Xu Zhu, Jinkai Zheng, Weiwei Yang, Xianhua Yu, Haiyong Zeng, Yujie Liu, Yong Liang Guan

Abstract

Sparse vector coding (SVC) is a promising short-packet transmission method for ultra reliable low latency communication (URLLC) in next generation communication systems. In this paper, a dual-mapping SVC (DM-SVC) based short packet transmission scheme is proposed to further enhance the transmission performance of SVC. The core idea behind the proposed scheme lies in mapping the transmitted information bits onto sparse vectors via block and single-element sparse mappings. The block sparse mapping pattern is able to concentrate the transmit power in a small number of non-zero blocks, thus improving the decoding accuracy, while the single-element sparse mapping pattern ensures that the code length does not increase dramatically with the number of transmitted information bits. At the receiver, a two-stage decoding algorithm is proposed to sequentially identify non-zero block indexes and single-element non-zero indexes. Extensive simulation results verify that the proposed DM-SVC scheme outperforms the existing SVC schemes in terms of block error rate and spectral efficiency.

2601.15816 2026-01-23 eess.SY cs.AI cs.SY

Virtual Traffic Police: Large Language Model-Augmented Traffic Signal Control for Unforeseen Incidents

Shiqi Wei, Qiqing Wang, Kaidi Yang

Abstract

Adaptive traffic signal control (TSC) has demonstrated strong effectiveness in managing dynamic traffic flows. However, conventional methods often struggle when unforeseen traffic incidents occur (e.g., accidents and road maintenance), which typically require labor-intensive and inefficient manual interventions by traffic police officers. Large Language Models (LLMs) appear to be a promising solution thanks to their remarkable reasoning and generalization capabilities. Nevertheless, existing works often propose to replace existing TSC systems with LLM-based systems, which can be (i) unreliable due to the inherent hallucinations of LLMs and (ii) costly due to the need for system replacement. To address the issues of existing works, we propose a hierarchical framework that augments existing TSC systems with LLMs, whereby a virtual traffic police agent at the upper level dynamically fine-tunes selected parameters of signal controllers at the lower level in response to real-time traffic incidents. To enhance domain-specific reliability in response to unforeseen traffic incidents, we devise a self-refined traffic language retrieval system (TLRS), whereby retrieval-augmented generation is employed to draw knowledge from a tailored traffic language database that encompasses traffic conditions and controller operation principles. Moreover, we devise an LLM-based verifier to update the TLRS continuously over the reasoning process. Our results show that LLMs can serve as trustworthy virtual traffic police officers that can adapt conventional TSC methods to unforeseen traffic incidents with significantly improved operational efficiency and reliability.

2601.15733 2026-01-23 eess.SP

Bistatic ISAC: Practical Challenges and Solutions

Lucas Giroto, Marcus Henninger, Alexander Felix, Maximilian Bauhofer, Taewon Jeong, Umut Utku Erdem, Stephan ten Brink, Thomas Zwick, Benjamin Nuss, Silvio Mandelli

Abstract

This article presents and discusses challenges and solutions for practical issues in bistatic integrated sensing and communication (ISAC) in 6G networks. Considering orthogonal frequency-division multiplexing as the adopted waveform, a discussion on system design aiming both to achieve the desired sensing key performance indicators and to limit the impact of hardware impairments is presented. In addition, signal processing techniques to enable over-the-air synchronization and generation of periodograms with range, Doppler shift, and angular information are discussed. Simulation results are then presented for a cellular-based ISAC scenario considering system parameterization compliant with current 5G and, finally, a discussion on open challenges for future deployments is presented.

2601.15729 2026-01-23 cs.RO cs.AI cs.SY eess.SY

DualShield: Safe Model Predictive Diffusion via Reachability Analysis for Interactive Autonomous Driving

Rui Yang, Lei Zheng, Ruoyu Yao, Jun Ma

Comments 8 pages, 5 figures

Abstract

Diffusion models have emerged as a powerful approach for multimodal motion planning in autonomous driving. However, their practical deployment is typically hindered by the inherent difficulty in enforcing vehicle dynamics and a critical reliance on accurate predictions of other agents, making them prone to safety issues under uncertain interactions. To address these limitations, we introduce DualShield, a planning and control framework that leverages Hamilton-Jacobi (HJ) reachability value functions in a dual capacity. First, the value functions act as proactive guidance, steering the diffusion denoising process towards safe and dynamically feasible regions. Second, they form a reactive safety shield using control barrier-value functions (CBVFs) to modify the executed actions and ensure safety. This dual mechanism preserves the rich exploration capabilities of diffusion models while providing principled safety assurance under uncertain and even adversarial interactions. Simulations in challenging unprotected U-turn scenarios demonstrate that DualShield significantly improves both safety and task efficiency compared to leading methods from different planning paradigms under uncertainty.

2601.15676 2026-01-23 cs.SD cs.LG eess.AS

Bridging the Perception Gap: A Lightweight Coarse-to-Fine Architecture for Edge Audio Systems

Hengfan Zhang, Yueqian Lin, Hai Helen Li, Yiran Chen

Comments 10 pages, 3 figures, 2 tables. Preprint

Abstract

Deploying Audio-Language Models (Audio-LLMs) on edge infrastructure exposes a persistent tension between perception depth and computational efficiency. Lightweight local models tend to produce passive perception - generic summaries that miss the subtle evidence required for multi-step audio reasoning - while indiscriminate cloud offloading incurs unacceptable latency, bandwidth cost, and privacy risk. We propose CoFi-Agent (Tool-Augmented Coarse-to-Fine Agent), a hybrid architecture targeting edge servers and gateways. It performs fast local perception and triggers conditional forensic refinement only when uncertainty is detected. CoFi-Agent runs an initial single-pass on a local 7B Audio-LLM, then a cloud controller gates difficult cases and issues lightweight plans for on-device tools such as temporal re-listening and local ASR. On the MMAR benchmark, CoFi-Agent improves accuracy from 27.20% to 53.60%, while achieving a better accuracy-efficiency trade-off than an always-on investigation pipeline. Overall, CoFi-Agent bridges the perception gap via tool-enabled, conditional edge-cloud collaboration under practical system constraints.

2601.15653 2026-01-23 eess.AS eess.SP

Distributed Multichannel Active Noise Control with Asynchronous Communication

Junwei Ji, Dongyuan Shi, Boxiang Wang, Ziyi Yang, Haowen Li, Woon-Seng Gan

Abstract

Distributed multichannel active noise control (DMCANC) offers effective noise reduction across large spatial areas by distributing the computational load of centralized control to multiple low-cost nodes. Conventional DMCANC methods, however, typically assume synchronous communication and require frequent data exchange, resulting in high communication overhead. To enhance efficiency and adaptability, this work proposes an asynchronous communication strategy where each node executes a weight-constrained filtered-x LMS (WCFxLMS) algorithm and independently requests communication only when its local noise reduction performance degrades. Upon request, other nodes transmit the weight difference between their local control filter and the center point in WCFxLMS, which are then integrated to update both the control filter and the center point. This design enables nodes to operate asynchronously while preserving cooperative behavior. Simulation results demonstrate that the proposed asynchronous communication DMCANC (ACDMCANC) system maintains effective noise reduction with significantly reduced communication load, offering improved scalability for heterogeneous networks.

2601.15626 2026-01-23 eess.SY cs.AI cs.SY

Bridging Qualitative Rubrics and AI: A Binary Question Framework for Criterion-Referenced Grading in Engineering

Lili Chen, Winn Wing-Yiu Chow, Stella Peng, Bencheng Fan, Sachitha Bandara

Comments Proceedings of the 36th Annual Conference of the Australasian Association for Engineering Education (AAEE 2025)

Details
English abstract

PURPOSE OR GOAL: This study investigates how GenAI can be integrated with a criterion-referenced grading framework to improve the efficiency and quality of grading for mathematical assessments in engineering. It specifically explores the challenges demonstrators face with manual, model solution-based grading and how a GenAI-supported system can be designed to reliably identify student errors, provide high-quality feedback, and support human graders. The research also examines human graders' perceptions of the effectiveness of this GenAI-assisted approach. ACTUAL OR ANTICIPATED OUTCOMES: The study found that GenAI achieved an overall grading accuracy of 92.5%, comparable to two experienced human graders. The two researchers, who also served as subject demonstrators, perceived the GenAI as a helpful second reviewer that improved accuracy by catching small errors and provided more complete feedback than they could manually. A central outcome was the significant enhancement of formative feedback. However, they noted the GenAI tool is not yet reliable enough for autonomous use, especially with unconventional solutions. CONCLUSIONS/RECOMMENDATIONS/SUMMARY: This study demonstrates that GenAI, when paired with a structured, criterion-referenced framework using binary questions, can grade engineering mathematical assessments with an accuracy comparable to human experts. Its primary contribution is a novel methodological approach that embeds the generation of high-quality, scalable formative feedback directly into the assessment workflow. Future work should investigate student perceptions of GenAI grading and feedback.

2601.15622 2026-01-23 eess.SY cs.SY

Design, Modelling, and Control of Magnetic Ball Suspension System

Sampson E. Nwachukwu

Comments 8 pages

Details
English abstract

This paper presents the modeling, control design, and performance analysis of a Magnetic Ball Suspension System (MBSS), a nonlinear and inherently unstable electromechanical system used in various precision applications. The system's primary objective is to levitate a steel ball using electromagnetic force without physical contact, thereby eliminating frictional losses. A comprehensive state-space model was developed, capturing both the mechanical and electrical dynamics. The equilibrium points of the system were determined through feedback linearization using the Jacobian matrix. To ensure system stability, controllability and observability analyses were conducted, confirming that state feedback and observer-based control strategies could be effectively implemented. Three distinct control methods were explored: pole placement-based state feedback control, full-order observer design, and optimal state feedback control using the Linear Quadratic Regulator (LQR). Each control strategy was validated through Simulink simulations for both linearized and nonlinear models. Simulation results demonstrated that the linearized system consistently achieved desired performance with minimal oscillations, whereas the nonlinear system exhibited significant transient oscillations before stabilization. The full-order observer enhanced estimation accuracy, enabling effective control where direct state measurement was impractical. The LQR-based control offered improved robustness and minimized control effort, though its performance was comparable to standard state feedback in some cases.
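
As an illustration of the linearization and controllability steps mentioned in this abstract, here is a minimal sketch based on a common textbook levitation model, not necessarily the paper's. It assumes the ball obeys m·x'' = m·g − k·i²/x² (gap x measured downward from the magnet) and the coil obeys L·di/dt = −R·i + u. The model, the parameter names, and the numeric values in the usage note are all assumptions.

```python
import math


def equilibrium_current(m, g, k, x0):
    """Current i0 at which the magnetic force k*i0^2/x0^2 balances m*g."""
    return x0 * math.sqrt(m * g / k)


def linearize(m, g, k, x0, R, L):
    """Jacobian linearization about (x0, 0, i0).

    States are gap, velocity, and coil current; the input is coil voltage.
    """
    i0 = equilibrium_current(m, g, k, x0)
    A = [
        [0.0, 1.0, 0.0],
        [2.0 * g / x0, 0.0, -2.0 * g / i0],  # d(accel)/dx and d(accel)/di
        [0.0, 0.0, -R / L],
    ]
    B = [0.0, 0.0, 1.0 / L]
    return A, B


def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along row 0."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def controllability_det(A, B):
    """Determinant of [B, AB, A^2 B]; nonzero means controllable."""
    AB = mat_vec(A, B)
    A2B = mat_vec(A, AB)
    C = [[B[r], AB[r], A2B[r]] for r in range(3)]
    return det3(C)
```

For example, `A, B = linearize(0.05, 9.81, 1e-4, 0.01, 1.0, 0.01)` followed by `controllability_det(A, B)` returns a nonzero value, so this assumed model is controllable. The positive entry `A[1][0]` gives an open-loop pole at sqrt(2g/x0), which is the instability that state feedback has to remove.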

2601.15621 2026-01-23 cs.SD cs.CL eess.AS

Qwen3-TTS Technical Report

Hangrui Hu, Xinfa Zhu, Ting He, Dake Guo, Bin Zhang, Xiong Wang, Zhifang Guo, Ziyue Jiang, Hongkun Hao, Zishan Guo, Xinyu Zhang, Pei Zhang, Baosong Yang, Jin Xu, Jingren Zhou, Junyang Lin

Comments https://github.com/QwenLM/Qwen3-TTS

Details
English abstract

In this report, we present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Qwen3-TTS supports state-of-the-art 3-second voice cloning and description-based control, allowing both the creation of entirely novel voices and fine-grained manipulation over the output speech. Trained on over 5 million hours of speech data spanning 10 languages, Qwen3-TTS adopts a dual-track LM architecture for real-time synthesis, coupled with two speech tokenizers: 1) Qwen-TTS-Tokenizer-25Hz is a single-codebook codec emphasizing semantic content, which offers seamless integration with Qwen-Audio and enables streaming waveform reconstruction via a block-wise DiT. 2) Qwen-TTS-Tokenizer-12Hz achieves extreme bitrate reduction and ultra-low-latency streaming, enabling immediate first-packet emission ($97\,\mathrm{ms}$) through its 12.5 Hz, 16-layer multi-codebook design and a lightweight causal ConvNet. Extensive experiments indicate state-of-the-art performance across diverse objective and subjective benchmarks (e.g., TTS multilingual test set, InstructTTSEval, and our long speech test set). To facilitate community research and development, we release both tokenizers and models under the Apache 2.0 license.

2601.15602 2026-01-23 eess.SP cs.IT math.IT

Does 6G Need a New Waveform: Comparing Zak-OTFS with CP-OFDM

Imran Ali Khan, Saif Khan Mohammed, Ronny Hadani, Ananthanarayanan Chockalingam, Robert Calderbank, Anton Monk, Shachar Kons, Shlomo Rakib, Yoav Hebron

Comments This work has been submitted to the IEEE for possible publication

Details
English abstract

Across the world, there is growing interest in new waveforms, Zak-OTFS in particular, and over-the-air implementations are starting to appear. The choice between OFDM and Zak-OTFS is not so much a choice between waveforms as it is an architectural choice between preventing inter-carrier interference (ICI) and embracing ICI. In OFDM, once the Input-Output (I/O) relation is known, equalization is relatively simple, at least when there is no ICI. However, in the presence of ICI the I/O relation is non-predictable and its acquisition is non-trivial. In contrast, equalization is more involved in Zak-OTFS due to inter-symbol interference (ISI); however, the I/O relation is predictable and its acquisition is simple. Zak-OTFS exhibits superior performance in doubly-spread 6G use cases with high delay/Doppler channel spreads (i.e., high mobility and/or large cells), but the architectural choice is governed by the typical use case, today and in the future. What is typical depends to some degree on geography, since large delay spread is a characteristic of large cells which are the rule rather than the exception in many important wireless markets. This paper provides a comprehensive performance comparison of cyclic prefix OFDM (CP-OFDM) and Zak-OTFS across the full range of 6G propagation environments. The performance results provide insights into the fundamental architectural choice.

2601.15597 2026-01-23 cs.LG eess.SP

Neural Nonlinear Shrinkage of Covariance Matrices for Minimum Variance Portfolio Optimization

Liusha Yang, Siqi Zhao, Shuqi Chai

Details
English abstract

This paper introduces a neural network-based nonlinear shrinkage estimator of covariance matrices for the purpose of minimum variance portfolio optimization. It is a hybrid approach that integrates statistical estimation with machine learning. Starting from the Ledoit-Wolf (LW) shrinkage estimator, we decompose the LW covariance matrix into its eigenvalues and eigenvectors, and apply a lightweight transformer-based neural network to learn a nonlinear eigenvalue shrinkage function. Trained with portfolio risk as the loss function, the resulting precision matrix (the inverse covariance matrix) estimator directly targets portfolio risk minimization. By conditioning on the sample-to-dimension ratio, the approach remains scalable across different sample sizes and asset universes. Empirical results on daily stock returns from the Standard & Poor's 500 Index (S&P500) demonstrate that the proposed method consistently achieves lower out-of-sample realized risk than benchmark approaches. This highlights the promise of integrating structural statistical models with data-driven learning.
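
The pipeline in this abstract can be written compactly as follows. The notation is an assumption: the symbols $\lambda_i$, $u_i$, $f_\theta$, $n$, $p$, and $w$ are introduced here and do not appear in the abstract, and the exact inputs of the learned function are a guess. The weight formula is the standard global minimum variance solution under the budget constraint $\mathbf{1}^\top w = 1$.

```latex
% Eigendecomposition of the Ledoit-Wolf estimate (p assets, n samples)
\hat{\Sigma}_{\mathrm{LW}} = \sum_{i=1}^{p} \lambda_i \, u_i u_i^{\top}

% Learned nonlinear shrinkage: eigenvectors kept, eigenvalues remapped
\tilde{\Sigma} = \sum_{i=1}^{p} f_{\theta}\!\left(\lambda_i ;\, n/p\right) u_i u_i^{\top}

% Global minimum variance weights from the resulting precision matrix
w = \frac{\tilde{\Sigma}^{-1}\mathbf{1}}{\mathbf{1}^{\top}\tilde{\Sigma}^{-1}\mathbf{1}}

% Realized risk of w under the true covariance \Sigma
\sigma^{2}(w) = w^{\top} \Sigma \, w
```

Because the eigenvectors are unchanged, the precision matrix follows directly as $\tilde{\Sigma}^{-1} = \sum_i f_\theta(\lambda_i; n/p)^{-1} u_i u_i^\top$, so no separate matrix inversion is needed once the shrunk eigenvalues are known.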

2601.15596 2026-01-23 cs.SD cs.AI eess.AS

DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice

Leying Zhang, Tingxiao Zhou, Haiyang Sun, Mengxiao Bi, Yanmin Qian

Details
English abstract

While modern Text-to-Speech (TTS) systems achieve high fidelity for read-style speech, they struggle to generate Autonomous Sensory Meridian Response (ASMR), a specialized, low-intensity speech style essential for relaxation. The inherent challenges include ASMR's subtle, often unvoiced characteristics and the demand for zero-shot speaker adaptation. In this paper, we introduce DeepASMR, the first framework designed for zero-shot ASMR generation. We demonstrate that a single short snippet of a speaker's ordinary, read-style speech is sufficient to synthesize high-fidelity ASMR in their voice, eliminating the need for whispered training data from the target speaker. Methodologically, we first identify that discrete speech tokens provide a soft factorization of ASMR style from speaker timbre. Leveraging this insight, we propose a two-stage pipeline incorporating a Large Language Model (LLM) for content-style encoding and a flow-matching acoustic decoder for timbre reconstruction. Furthermore, we contribute DeepASMR-DB, a comprehensive 670-hour English-Chinese multi-speaker ASMR speech corpus, and introduce a novel evaluation protocol integrating objective metrics, human listening tests, LLM-based scoring and unvoiced speech analysis. Extensive experiments confirm that DeepASMR achieves state-of-the-art naturalness and style fidelity in ASMR generation for anyone of any voice, while maintaining competitive performance on normal speech synthesis.

2601.15584 2026-01-23 eess.SP

Amalgamated CHIRP and OFDM for ISAC

Pankaj Kumar, Mohammed El-Hajjar, Ibrahim A. Hemadeh, Yasser Mestrah, Suraj Srivastava, Aditya K. Jagannatham, Lajos Hanzo

Details
English abstract

Integrated Sensing and Communication (ISAC) requires the development of a waveform capable of efficiently supporting both communication and sensing functionalities. This paper proposes a novel waveform that combines the benefits of both the orthogonal frequency division multiplexing (OFDM) and the chirp waveforms to improve both the communication and sensing performance within an ISAC framework. Hence, a new architecture is proposed that utilizes the conventional communication framework while leveraging the parameters sensed at the receiver (Rx) for enhancing the communication performance. We demonstrate that the affine addition of OFDM and chirp signals results in a near constant-envelope OFDM waveform, which effectively reduces the peak-to-average power ratio (PAPR), a key limitation of traditional OFDM systems. Using the OFDM framework for sensing in the conventional fashion requires the allocation of some resources for sensing, which in turn reduces communication performance. As a remedy, the proposed affine amalgam facilitates sensing through the chirp waveform without consuming communication resources, thereby preserving communication efficiency. Furthermore, a novel technique of integrating the chirp signal into the OFDM framework at the slot-level is proposed to enhance the accuracy of range estimation. The results show that the OFDM signal incorporated with chirp has better autocorrelation properties, improved root mean square error (RMSE) of range and velocity, and lower PAPR. Finally, we characterize the trade-off between communications and sensing performance.
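
To make the PAPR metric in this abstract concrete, here is a minimal sketch that computes it for a discrete chirp and for an OFDM symbol. It does not reproduce the paper's affine combination of the two, which the abstract does not specify. The function names, the chirp phase law, and the unitary IDFT scaling are assumptions chosen for illustration.

```python
import cmath
import math


def papr(samples):
    """Peak-to-average power ratio (linear scale) of a complex sequence."""
    powers = [abs(s) ** 2 for s in samples]
    return max(powers) / (sum(powers) / len(powers))


def chirp(n_samples):
    """Unit-amplitude discrete chirp with quadratic phase pi*n^2/N."""
    return [
        cmath.exp(1j * math.pi * n * n / n_samples)
        for n in range(n_samples)
    ]


def ofdm_symbol(subcarrier_symbols):
    """Time-domain OFDM symbol via a direct unitary IDFT (no cyclic prefix)."""
    n = len(subcarrier_symbols)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(
            X * cmath.exp(2j * math.pi * k * t / n)
            for k, X in enumerate(subcarrier_symbols)
        )
        for t in range(n)
    ]
```

The chirp has constant magnitude, so its PAPR is 1 (0 dB). An OFDM symbol whose N subcarriers all carry the same value concentrates its energy in one sample and reaches the worst-case PAPR of N. That gap is what motivates pulling OFDM toward a near constant-envelope waveform.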