arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.00777 2026-05-04 cs.SD cs.CL eess.AS

LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

Venkata Pushpak Teja Menta

Comments 7 pages, 2 figures, 2 tables. Code, model, and datasets at https://github.com/praxelhq/lase

Abstract

A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in. Off-the-shelf encoders do not, and the failure is accent-conditional. On a 1043-pair Western-accented voice corpus across English, Hindi, Telugu, and Tamil, WavLM-base-plus-sv loses 0.082 absolute cosine similarity when the same voice changes script and ECAPA-TDNN loses 0.105. On a 1369-pair Indian-accented voice corpus, the gap shrinks to 0.006 (WavLM-SV) and 0.044 (ECAPA-TDNN). The leak is largest where it matters most for cross-script TTS: when a system projects a non-Indic-trained voice into Indic scripts. We present LASE (Language-Adversarial Speaker Encoder), a small projection head over frozen WavLM-base-plus trained with two losses: a supervised contrastive loss over voice identity, and a gradient-reversal cross-entropy against a 4-language classifier that pushes the embedding to be language-uninformative while remaining speaker-informative. Trained on 1118 quality-gated cross-script pairs synthesised from 8 commercial multilingual voices, LASE's residual gap is consistent with zero on both corpora (Delta = 0.013 Western, Delta = 0.026 Indian; both bootstrap 95% CIs include zero) and amplifies the cross-script-vs-floor margin 2.4-2.7x over both baselines. An ECAPA+GRL ablation shows the GRL objective improves either backbone but the WavLM choice contributes too. In synthetic multi-speaker diarisation, LASE matches ECAPA-TDNN on cross-script speaker recall (0.788 vs 0.789) with ~100x less training data. We release the r1 checkpoint, both corpora, and the bootstrap recipe.
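The cross-script gap and the "bootstrap 95% CI includes zero" test can be illustrated with a short sketch. It assumes the gap is the mean same-script cosine similarity minus the mean cross-script cosine similarity and uses a plain percentile bootstrap. The helper names and the similarity values are hypothetical; this is not the released LASE bootstrap recipe.

```python
import random
import statistics


def script_gap(same_script, cross_script):
    """Hypothetical helper: mean same-script minus mean cross-script cosine similarity."""
    return statistics.fmean(same_script) - statistics.fmean(cross_script)


def bootstrap_gap_ci(same_script, cross_script, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the gap, resampling each group with replacement (assumed scheme)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_boot):
        s = rng.choices(same_script, k=len(same_script))
        c = rng.choices(cross_script, k=len(cross_script))
        gaps.append(script_gap(s, c))
    gaps.sort()
    lo = gaps[int((alpha / 2) * n_boot)]
    hi = gaps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up similarities for illustration only (not data from the paper).
same = [0.91, 0.88, 0.93, 0.90, 0.89]
cross = [0.87, 0.90, 0.86, 0.91, 0.88]
gap = script_gap(same, cross)
lo, hi = bootstrap_gap_ci(same, cross)
print(f"gap={gap:.3f}, 95% CI=({lo:.3f}, {hi:.3f}), includes zero: {lo <= 0 <= hi}")
```

A gap whose interval straddles zero is what the abstract describes as "consistent with zero".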

2605.00769 2026-05-04 eess.SY cs.SY

Voltage Ride-Through in Large Loads- A Dual PQ Approach

Amir Norouzi, Michael Morel

Comments 10 pages

Abstract

This paper provides a detailed investigation of voltage ride-through in large loads, such as Artificial Intelligence data centers. Voltage ride-through capability of large loads during transient disturbances in the power grid is important because of the potential impact on the stability and reliability of the Bulk Power System. A mathematical analysis is presented, showing how the traditional approach, based on reactive power compensation, may not be adequate for voltage ride-through in large loads. Ultimately, due to the capacity limits of the load's power distribution infrastructure and the grid's constraints, there is a limit to using reactive power as a corrective tool. A new dual active and reactive power (PQ) approach is proposed, in which non-grid resources with dynamic P and Q capabilities are shown to be needed to help with voltage ride-through. Additionally, the analysis illustrates that during extreme voltage dips in the power grid, maintaining an acceptable level of load voltage can become practically or theoretically unattainable, which may lead to the load's disconnection from the grid. Analytical results are provided with practical numerical examples.
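Why reactive support runs out can be seen in the textbook two-bus model: a source of voltage E feeds a load S = P + jQ through a line impedance R + jX. Eliminating the voltage angle gives V^4 + [2(RP + XQ) - E^2] V^2 + (R^2 + X^2)(P^2 + Q^2) = 0, which has no real solution once the dip in E is deep enough. The sketch assumes per-unit quantities and this simplified model; it is not the paper's analysis, and the function name is illustrative.

```python
import math


def load_voltage(E, R, X, P, Q):
    """High-voltage solution of the two-bus power-flow equation (per unit, assumed model).

    Returns None when no real, positive solution exists, i.e. the load cannot be
    served at any voltage for this source voltage E. Q < 0 means the load side
    injects reactive power (for example from compensation).
    """
    b = E**2 - 2.0 * (R * P + X * Q)
    disc = b**2 - 4.0 * (R**2 + X**2) * (P**2 + Q**2)
    if disc < 0:
        return None
    v_squared = (b + math.sqrt(disc)) / 2.0
    if v_squared <= 0:
        return None
    return math.sqrt(v_squared)


# A 30% dip (E = 0.7 pu) with 1 pu active load, without and with 0.3 pu reactive injection.
print(load_voltage(0.7, 0.01, 0.1, 1.0, 0.0))   # about 0.669
print(load_voltage(0.7, 0.01, 0.1, 1.0, -0.3))  # about 0.713
print(load_voltage(0.3, 0.0, 0.1, 1.0, 0.0))    # None: no feasible voltage
```

Reactive injection lifts the load voltage, but at E = 0.3 pu no amount of it can be guaranteed to help within converter limits, since the active-power term alone already makes the equation infeasible.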

2605.00761 2026-05-04 cs.IT eess.SP math.IT

The Benefit of Decoder-Provided Pilots in Highly Dynamic Channels

Duschia Bodet, Muriel Médard, Muralidhar Rangaswamy, Ken Duffy

Comments This work has been submitted to the IEEE for possible publication

Abstract

Communications in highly dynamic channels relying on training-based channel estimation experience a trade-off between increasing channel measurement accuracy by sending more frequent training sequences and increasing data rate by sending fewer training sequences. Simultaneously, most communication systems use forward error correction to enable error detection and correction at the receiver. This paper presents decoder-provided pilots for time-varying channels by using decoded codewords as training sequences to update the channel estimate at the receiver. In contrast to approaches such as data-aided channel estimation, decision-feedback equalization, joint channel estimation and error correction, and turbo equalization, the decoder-provided pilots approach is non-iterative, which is ideal for low-latency requirements in highly dynamic scenarios. Furthermore, it is modulation-, code-, and decoder-agnostic, meaning it can be implemented on top of virtually any communication system that uses forward error correction. From an information-theoretic perspective, we derive the fundamental limits of decoder-provided pilots' ability to simultaneously sense the channel and transmit data. Simulation results demonstrate that decoder-provided pilots significantly improve performance, that when coding across frequency, soft-output can further enhance performance, and that when coding across time, short codes can outperform long codes of the same rate in fast-fading channels.
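The core idea, reusing decoded codewords as pilots, can be sketched for a flat-fading scalar channel y = h x + n. Once a codeword decodes, its re-modulated symbols are treated as known x values and refresh a least-squares estimate of h. The exponential forgetting factor and the function name are assumptions added here to track time variation; the paper's estimator may differ.

```python
def ls_channel_update(num, den, rx, tx, forget=0.9):
    """Exponentially weighted least-squares update of a scalar channel estimate (illustrative).

    num/den hold weighted sums of y*conj(x) and |x|^2 from earlier codewords.
    rx: received samples for one codeword; tx: its decoded (re-modulated)
    symbols, used as pilots. Returns (h_hat, num, den).
    """
    num = forget * num + sum(y * x.conjugate() for y, x in zip(rx, tx))
    den = forget * den + sum(abs(x) ** 2 for x in tx)
    return num / den, num, den


# Noiseless check: the estimate recovers h exactly.
h = 0.8 - 0.6j
tx = [1 + 0j, -1 + 0j, 1j, -1j]  # QPSK-like symbols from a decoded codeword
rx = [h * x for x in tx]
h_hat, num, den = ls_channel_update(0j, 0.0, rx, tx)
print(h_hat)
```

Calling the update again with the next codeword blends old and new evidence, with `forget` controlling how quickly stale channel information is discounted.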

2605.00752 2026-05-04 eess.SY cs.LO cs.SY

HyperCertificates: Verification of Discrete-time Dynamical Systems against HyperLTL Specifications

Vishnu Murali, Amin Falah, Ashutosh Trivedi, Majid Zamani

Comments 24 pages, 3 figures, 1 table

Abstract

We introduce a functional inductive framework to verify discrete-time dynamical systems against hyperproperties specified as Hyperlinear temporal logic formulae via a notion of HyperCertificates. Unlike linear temporal logic (LTL) formulae, which are concerned with individual traces of a system, hyperproperties are properties that are concerned with how the traces of a system relate to one another. HyperLTL is an extension of LTL for hyperproperties, and is useful to describe specifications such as opacity, privacy as well as notions of robustness. Our notion of HyperCertificates consists of a pair of functions, where the first models the lookahead, and the second relies on a combination of barrier and ranking functions. We use closure certificates to model this lookahead, and then rely on barrier and ranking function arguments modulo this lookahead to provide guarantees against HyperLTL formulae. We demonstrate how our approach is automatable via existing techniques such as sum-of-squares optimization (SOS) and satisfiability modulo theories (SMT) solvers. Finally, we demonstrate our approach on some case studies.
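For readers new to HyperLTL, one common illustrative formula (not taken from the paper) is observational determinism: it quantifies over two traces π and π′ and requires that if the two executions always agree on the public input, they always agree on the public output. The propositions in and out are placeholders.

```latex
\forall \pi.\, \forall \pi'.\;
  \mathbf{G}\,(\mathit{in}_{\pi} = \mathit{in}_{\pi'})
  \;\rightarrow\;
  \mathbf{G}\,(\mathit{out}_{\pi} = \mathit{out}_{\pi'})
```

No single trace can violate this property; only a pair of traces can, which is what distinguishes it from an ordinary LTL specification.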

2605.00746 2026-05-04 q-bio.NC eess.SP physics.optics

Functional Connectivity-Guided Band Selection for Motor Imagery Brain-Computer Interfaces

Natália Araújo do Carmo, Aarthy Nagarajan

Abstract

Reliable control in motor imagery brain-computer interfaces (MI-BCIs) requires the precise decoding of user-specific neural rhythms, which vary significantly across individuals. The Common Spatial Pattern (CSP) algorithm is a cornerstone of MI-BCI decoding, yet its performance depends strongly on the spectral range of the input EEG data. Although Filter Bank CSP (FBCSP) extends this as a data-driven decoding framework, its frequency sub-bands are predefined rather than selected using subject-specific physiological criteria. This paper presents a proof-of-concept study of static functional connectivity (FC)-guided band selection for MI-BCI, demonstrated using a conventional FBCSP-based pipeline. The proposed method identifies the most discriminative spectral bands by calculating phase-based connectivity across four sensorimotor channels using wPLI, PLV, and PLI. Nine bands in a 4-40 Hz filter bank are ranked by the effect size of their hemispheric coupling differences and pruned to the top K bands for feature extraction and classification via FBCSP and a Support Vector Regressor. This framework was tested for K values ranging from 1 to 8 across the BCI Competition IV-2a (n = 9) and OpenBMI (n = 54) datasets. Performance was benchmarked against standard nine-band FBCSP and random ablation to determine the minimum number of bands (K*) required to maintain accuracy within a 2% baseline equivalence zone. Results show FC-guided selection can outperform random ablation and achieve near-baseline performance while reducing required CSP fits by 22.2% to 77.8%. PLV enables the most aggressive dimensionality reduction by prioritizing the μ and low-β ranges, while wPLI demonstrates superior inter-session robustness by mitigating volume conduction. These findings establish FC-guided selection as a principled and interpretable alternative to heuristic filter bank designs.
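The three connectivity measures named above have standard definitions on the phase difference or, equivalently, on the cross-spectrum of two analytic signals. This sketch assumes the EEG has already been band-pass filtered and converted to complex analytic samples (for example with a Hilbert transform, not shown); it shows the textbook formulas, not the authors' pipeline.

```python
import cmath
import math


def _sign(v):
    return (v > 0) - (v < 0)


def plv(z1, z2):
    """Phase-locking value: magnitude of the mean unit phasor of the phase difference."""
    phasors = [cmath.exp(1j * (cmath.phase(a) - cmath.phase(b))) for a, b in zip(z1, z2)]
    return abs(sum(phasors) / len(phasors))


def pli(z1, z2):
    """Phase-lag index: |mean sign(sin(phase difference))|, via Im of the cross-spectrum."""
    signs = [_sign((a * b.conjugate()).imag) for a, b in zip(z1, z2)]
    return abs(sum(signs) / len(signs))


def wpli(z1, z2):
    """Weighted PLI: |mean Im(cross)| / mean |Im(cross)|; 0 when every Im(cross) is 0."""
    imag = [(a * b.conjugate()).imag for a, b in zip(z1, z2)]
    denom = sum(abs(v) for v in imag)
    return abs(sum(imag)) / denom if denom > 0 else 0.0


# 10 Hz signals sampled at 250 Hz, with a constant pi/4 lag.
t = [k / 250 for k in range(500)]
z1 = [cmath.exp(1j * 2 * math.pi * 10 * tk) for tk in t]
z2 = [cmath.exp(1j * (2 * math.pi * 10 * tk - math.pi / 4)) for tk in t]
print(plv(z1, z2), pli(z1, z2), wpli(z1, z2))
```

Zero-lag coupling, the signature of volume conduction, gives PLV = 1 but PLI = wPLI = 0 (try `pli(z1, z1)`), which is why the lag-based measures are less sensitive to it.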

2605.00734 2026-05-04 eess.SY cs.SY

Economic Valuation and Optimal Deployment of Static Synchronous Series Compensators for U.S. Power System Expansion

Wei Ai, Vladimir Dvorkin, Michael T. Craig

Comments 10 pages, 7 figures

Abstract

Flexible AC Transmission Systems (FACTS), particularly Static Synchronous Series Compensators (SSSC), can improve network transfer capability and complement restricted transmission expansion. Evaluations of FACTS within large-scale, real-world power system planning are currently lacking. This paper develops a capacity expansion model for the contiguous U.S. power system toward 2050, incorporating SSSC-modified linear power flow equations and accounting for impedance feedback in transmission expansion. Cost-optimal system expansion leverages widespread nationwide SSSC deployment on small-to-medium capacity lines and reduces the number of corridors to be reinforced. Overall, SSSCs reduce annualized system costs by $1.9 billion or decrease transmission expansion requirements by 20%. The most advantageous deployments, which achieve benefit-cost ratios of 59, are concentrated in the Midwest, facilitating the delivery of central U.S. wind power to eastern load centers. The value proposition of SSSCs is robust to cost sensitivities and potential competition from HVDC network expansion, and increases under higher demand growth and more stringent decarbonization policies. These findings provide a blueprint for leveraging SSSC deployment in the U.S. power system.
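A simplified way to see how series compensation redirects flow: under DC power flow, parallel lines share a transfer in inverse proportion to their reactance, and an SSSC can be approximated as changing a line's effective reactance. This sketch assumes that reduced model and ignores the impedance feedback and expansion decisions handled in the paper; the function name is illustrative.

```python
def parallel_flows(total_mw, reactances):
    """Split a transfer across parallel lines in inverse proportion to reactance (DC power flow)."""
    admittances = [1.0 / x for x in reactances]
    total_b = sum(admittances)
    return [total_mw * b / total_b for b in admittances]


# Two identical 0.1 pu lines share 100 MW equally. Compensating line 1 by 0.05 pu,
# modelled here simply as a lower effective reactance, shifts flow onto it.
print(parallel_flows(100.0, [0.1, 0.1]))
print(parallel_flows(100.0, [0.1 - 0.05, 0.1]))
```

Pushing flow onto an underused line in this way is the mechanism by which SSSCs can defer reinforcing a congested corridor.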

2601.05949 2026-05-04 eess.SY cs.SY math.SP

Generalized Spectral Clustering of Low-Inertia Power Networks

Gerald Ogbonna, C. Lindsay Anderson

Comments This manuscript has been submitted to IEEE Transactions on Power Systems

Abstract

Large-scale integration of distributed energy resources has led to a rapid increase in the number of controllable devices and a significant change in system dynamics. This has necessitated a shift towards more distributed and scalable control strategies to manage the increasing system complexity. In this work, we address the problem of partitioning a low-inertia power network into dynamically coherent subsystems to facilitate the utilization of distributed control schemes. We show that an embedding of the power network using the spectrum of the linearized synchronization dynamics matrix results in a natural decomposition of the network. We establish the connection between our approach and the broader framework of spectral clustering using the Laplacian matrix of the admittance network. The proposed method is demonstrated on the IEEE 30-bus test system. We consider the robustness of the clusters by analyzing the sensitivity of the small eigenvalues and their corresponding eigenspaces to perturbations caused by variation in the steady-state operating points of the network.
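The abstract relates its embedding to classic Laplacian spectral clustering. As an illustration of that classic case only (not the paper's synchronization-dynamics matrix), the sketch below bipartitions a small weighted graph by the sign pattern of the Fiedler vector, the eigenvector of the second-smallest Laplacian eigenvalue, computed by shifted power iteration in plain Python. Function names are illustrative.

```python
import math


def fiedler_vector(W, iters=500):
    """Approximate the Fiedler vector of L = D - W by power iteration on c*I - L.

    c = 2 * max degree bounds the largest Laplacian eigenvalue (Gershgorin), so the
    dominant eigenvector of c*I - L is the constant vector; projecting it out at each
    step leaves the eigenvector of the second-smallest eigenvalue of L.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)
    v = [float(i) for i in range(n)]  # deterministic start, not orthogonal to the target
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant (eigenvalue-0) component
        w = [(c - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bipartition(W):
    f = fiedler_vector(W)
    return {i for i, x in enumerate(f) if x >= 0}, {i for i, x in enumerate(f) if x < 0}


# Two tightly coupled triangles joined by one weak tie (weight 0.1).
W = [[0.0] * 6 for _ in range(6)]
for a, b, wt in [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[a][b] = W[b][a] = float(wt)
print(spectral_bipartition(W))
```

The weak tie is cut, mirroring how weakly coupled groups of buses separate into coherent areas.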

2509.26388 2026-05-04 eess.AS cs.AI cs.CL

Game-Time: Evaluating Temporal Dynamics in Spoken Language Models

Kai-Wei Chang, En-Pei Hu, Chun-Yi Kuan, Wenze Ren, Wei-Chih Chen, Guan-Ting Lin, Yu Tsao, Shao-Hua Sun, Hung-yi Lee, James Glass

Comments Accepted to ICASSP 2026

Abstract

Conversational Spoken Language Models (SLMs) are emerging as a promising paradigm for real-time speech interaction. However, their handling of temporal dynamics, including the ability to manage timing, tempo and simultaneous speaking, remains a critical and unevaluated challenge for conversational fluency. To address this gap, we introduce the Game-Time Benchmark, a framework to systematically assess these temporal capabilities. Inspired by how humans learn a language through language activities, Game-Time consists of basic instruction-following tasks and advanced tasks with temporal constraints, such as tempo adherence and synchronized responses. Our evaluation of diverse SLM architectures reveals a clear performance disparity: while state-of-the-art models handle basic tasks well, many contemporary systems still struggle with fundamental instruction-following. More critically, nearly all models degrade substantially under temporal constraints, exposing persistent weaknesses in time awareness and full-duplex interaction. The Game-Time Benchmark provides a foundation for guiding future research toward more temporally-aware conversational AI. Demos and datasets are available on our project website https://ga642381.github.io/Game-Time.

2605.00726 2026-05-04 eess.SY cs.SY

Multi-Regional Traffic Control with Travel and Charging Demand Co-Management

Yixun Wen, Stelios Timotheou, Boli Chen

Abstract

Urban traffic management is essential for reducing congestion and supporting sustainable mobility. However, the task is becoming more challenging due to the growing penetration of electric vehicles and their charging demands. This paper presents a regional traffic coordination framework that combines route guidance and charging management to improve traffic network efficiency. Regional traffic dynamics are modeled by the macroscopic fundamental diagram, which allows for the analysis of congestion at the system level. The framework jointly optimizes routes and charging decisions, and it also uses demand management to regulate external inflows into the network. A case study on a 16-region urban network demonstrates the effectiveness of the proposed approach.
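To make MFD-based regional dynamics concrete, the sketch below updates a single region's vehicle accumulation using a parabolic MFD, a common modelling assumption (the abstract does not give the paper's functional form or parameters). A demand-management factor u in [0, 1] scales the admitted external inflow. All names and numbers are illustrative.

```python
def mfd_outflow(n, n_jam, g_max):
    """Parabolic MFD (assumed): trip-completion rate (veh/s) at accumulation n, peaking at n_jam / 2."""
    n = min(max(n, 0.0), n_jam)
    return 4.0 * g_max * (n / n_jam) * (1.0 - n / n_jam)


def step_accumulation(n, demand, u, dt, n_jam, g_max):
    """One Euler step of dn/dt = u * demand - G(n); u in [0, 1] gates external inflow."""
    n_next = n + dt * (u * demand - mfd_outflow(n, n_jam, g_max))
    return min(max(n_next, 0.0), n_jam)


# A congested region (above the critical accumulation n_jam / 2 = 5000 veh):
# admitting all demand lets accumulation keep growing, while metering inflow drains it.
n_open = n_gated = 7000.0
for _ in range(60):  # 60 steps of 10 s
    n_open = step_accumulation(n_open, demand=9.0, u=1.0, dt=10.0, n_jam=10000.0, g_max=10.0)
    n_gated = step_accumulation(n_gated, demand=9.0, u=0.3, dt=10.0, n_jam=10000.0, g_max=10.0)
print(round(n_open), round(n_gated))
```

In a multi-region model each region would carry its own accumulation, with transfer flows between neighbours replacing the single outflow term.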

2605.00721 2026-05-04 cs.SD cs.AI eess.AS eess.SP

Towards Improving Speaker Distance Estimation through Generative Impulse Response Augmentation

Anton Ratnarajah, Mehmet Ergezer, Arun Nair, Mrudula Athi

Comments Accepted to Generative Data Augmentation for Real-World Signal Processing Applications (GenDA 2025). An ICASSP 2025 Satellite Workshop and IEEE Data Science and Learning Workshop: Room Acoustics and Speaker Distance Estimation Challenge

Journal ref
Generative Data Augmentation for Real-World Signal Processing Applications (GenDA 2025). An ICASSP 2025 Satellite Workshop and IEEE Data Science and Learning Workshop
Abstract

The Room Acoustics and Speaker Distance Estimation (SDE) Challenge at ICASSP 2025 explores the effectiveness of augmented room impulse response (RIR) data for improving SDE model performance. This challenge at GenDARA involves generating RIRs to supplement sparse datasets and fine-tuning SDE models with the augmented data. We employ the open-source fast diffuse room impulse response generator (FastRIR) conditioned only on speaker and listener locations. We design a quality filter to ensure that the generated RIRs align with the challenge RIRs, and employ hyperparameter optimization for model fine-tuning. Our approach reduces the mean absolute error (MAE) of the five positions from 1.66m to 0.6m for GWA rooms and from 2.18m to 0.69m for Treble rooms, with results demonstrating that the augmentation approach significantly improves estimation accuracy, particularly at medium to long distances.
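One simple form such a quality check could take (a hypothetical example, not the authors' filter) is to compare the distance implied by a generated RIR's direct-path peak, d = c t, with the speaker-listener distance it was conditioned on. The sketch assumes the RIR starts at emission time (no leading silence trimmed), that the strongest tap is the direct path, a speed of sound of 343 m/s, and an arbitrary 0.25 m tolerance.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def direct_path_distance(rir, fs):
    """Distance implied by the strongest RIR tap, assumed here to be the direct path."""
    peak = max(range(len(rir)), key=lambda i: abs(rir[i]))
    return peak / fs * SPEED_OF_SOUND


def passes_distance_check(rir, fs, target_m, tol_m=0.25):
    """Hypothetical quality filter: keep RIRs whose direct path matches the target distance."""
    return abs(direct_path_distance(rir, fs) - target_m) <= tol_m


fs = 16000
rir = [0.0] * 4000
rir[93] = 1.0   # direct path about 2 m away (93 / 16000 s * 343 m/s)
rir[400] = 0.4  # a later reflection
print(direct_path_distance(rir, fs))
print(passes_distance_check(rir, fs, 2.0), passes_distance_check(rir, fs, 3.0))
```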

2605.00698 2026-05-04 eess.IV cs.LG

FedKPer: Tackling Generalization and Personalization in Medical Federated Learning via Knowledge Personalization

Zoe Fowler, Ghassan AlRegib

Comments Accepted to IEEE International Conference on Image Processing (ICIP)

Abstract

Federated learning (FL) holds great potential for medical applications. However, statistical heterogeneity across healthcare institutions poses a major challenge for FL, as the global model struggles both to generalize across unseen patient populations and to adapt to the unique data distributions of individual hospitals. This heterogeneity also exacerbates forgetting at both the global and local levels, causing previously learned patient patterns to be misclassified after model updates. While prior work has largely treated generalization and personalization as separate challenges, we show that a better balance between the two can be achieved through selective alignment with the global model and a modified aggregation scheme, which together mitigate the effects of statistical heterogeneity. Specifically, we propose FedKPer, which introduces knowledge personalization into the training stage of each local device. Afterwards, generalization is considered via the global model aggregation process, where local updates that are reliable and label-diverse are emphasized. We evaluate the performance of FedKPer, devising additional metrics that relate to common consequences of forgetting. Overall, we demonstrate FedKPer improves the generalization-personalization trade-off without sacrificing retention.

2605.00681 2026-05-04 eess.SY cs.SY

Deployment-Efficient Short-Term Load Forecasting in AI Data Centers via Sequence-to-Point Knowledge Distillation

Lei Wang, Jiahao Chen, Fanping Sui, Ying Zhang, Di Shi

Comments 7 pages, 4 figures, 3 tables

Abstract

Accurately forecasting the bursty and non-stationary power demand of AI data centers has become increasingly important, as abrupt workload-driven variations at the GPU-node level can affect real-time operational efficiency, power management, and grid-data center coordination. However, high-capacity forecasting models are often difficult to deploy at scale because of their memory and latency requirements, while lightweight predictors may fail to capture short-horizon temporal dynamics. To address this accuracy-deployment tradeoff, this paper proposes a deployment-efficient knowledge distillation framework for short-term load forecasting in AI data centers. The proposed framework first trains a high-capacity sequence teacher model for multi-step load trajectory prediction, where residual learning is used to improve robustness under non-stationary operating conditions. A lightweight point-wise student model is then developed for low-latency rolling inference using a compact neural network architecture. To transfer temporal knowledge from the teacher to the student, a sequence-to-point distillation strategy is introduced by aligning near-term predictive behavior and temporally pooled representations. Case studies on the MIT Supercloud dataset demonstrate that the proposed student model improves forecasting accuracy over recent deep learning baselines while reducing the deployment footprint by over 10x in parameter memory and model size.
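A minimal sketch of a sequence-to-point distillation objective, under assumptions that go beyond the abstract: the student's single-step forecast is matched both to the ground truth and to the first step of the teacher's multi-step trajectory, and the student's hidden vector is matched to the teacher's hidden states averaged over time. The weights alpha and beta and all function names are hypothetical.

```python
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def temporal_mean_pool(states):
    """Average a list of T hidden-state vectors into one vector."""
    T = len(states)
    return [sum(s[d] for s in states) / T for d in range(len(states[0]))]


def seq2point_distill_loss(student_pred, student_repr, y_true, teacher_traj,
                           teacher_states, alpha=0.5, beta=0.1):
    """Hypothetical loss: supervised term + near-term output matching + pooled-representation matching.

    Assumes student_repr already has the teacher's hidden size; in practice a
    learned projection would usually bridge the two.
    """
    supervised = (student_pred - y_true) ** 2
    near_term = (student_pred - teacher_traj[0]) ** 2  # teacher's first forecast step
    repr_align = mse(student_repr, temporal_mean_pool(teacher_states))
    return supervised + alpha * near_term + beta * repr_align


loss = seq2point_distill_loss(
    student_pred=2.0, student_repr=[0.5, 0.5], y_true=1.0,
    teacher_traj=[1.0, 1.2, 1.4], teacher_states=[[0.0, 1.0], [1.0, 0.0]],
)
print(loss)
```

Only the near-term part of the teacher trajectory constrains the point-wise student, which matches its one-step rolling use at inference.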

2605.00630 2026-05-04 cs.CV cs.MM eess.IV

CMTA: Leveraging Cross-Modal Temporal Artifacts for Generalizable AI-Generated Video Detection

Hang Wang, Chao Shen, Chenhao Lin, Minghui Yang, Lei Zhang, Cong Wang

Comments 15 pages, 4 figures

Abstract

The proliferation of advanced AI video synthesis techniques poses an unprecedented challenge to digital video authenticity. Existing AI-generated video (AIGV) detection methods primarily focus on uni-modal or spatiotemporal artifacts, but they overlook the rich cues within the visual-textual cross-modal space, especially the temporal stability of semantic alignment. In this work, we identify a distinctive fingerprint in AIGVs, termed cross-modal temporal artifact (CMTA). Unlike real videos that exhibit natural temporal fluctuations in cross-modal alignment due to semantic variations, AIGVs display unnaturally stable semantic trajectories governed by given input prompts. To bridge this gap, we propose the CMTA framework, a cross-modal detection approach that captures these unique temporal artifacts through joint cross-modal embedding and multi-grained temporal modeling. Specifically, CMTA leverages BLIP to generate frame-level image captions and utilizes CLIP to extract corresponding visual-textual representations. A coarse-grained temporal modeling branch is then designed to characterize temporal fluctuations in cross-modal alignment with a GRU. In parallel, a fine-grained branch is constructed to capture intricate inter-frame variations from integrated visual-textual features with a Transformer encoder. Extensive experiments on 40 subsets across four large-scale datasets, including GenVideo, EvalCrafter, VideoPhy, and VidProM, validate that our approach sets a new state-of-the-art while exhibiting superior cross-generator generalization. Code and models of CMTA will be released at https://github.com/hwang-cs-ime/CMTA
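The cue described above can be illustrated with a toy statistic: the per-frame cosine similarity between each frame's image embedding and its caption embedding, and how much that similarity fluctuates over time. The sketch assumes embeddings are already extracted (for instance by CLIP from BLIP captions); it is a hand-crafted proxy, not CMTA's GRU or Transformer branches, and the function names are illustrative.

```python
import math
import statistics


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def alignment_trajectory(image_embs, text_embs):
    """Per-frame visual-textual cosine similarity."""
    return [cosine(i, t) for i, t in zip(image_embs, text_embs)]


def alignment_fluctuation(image_embs, text_embs):
    """Population std. dev. of the per-frame alignment; lower means a more stable trajectory."""
    return statistics.pstdev(alignment_trajectory(image_embs, text_embs))


text = [[1.0, 0.0]] * 4
stable = [[1.0, 0.1]] * 4                                      # unnaturally steady alignment
varying = [[1.0, 0.1], [1.0, 0.8], [0.6, 0.9], [1.0, 0.3]]     # natural semantic drift
print(alignment_fluctuation(stable, text), alignment_fluctuation(varying, text))
```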

2605.00607 2026-05-04 cs.CL eess.AS

Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe

Gaofei Shen, Martijn Bentum, Tom Lentz, Afra Alishahi, Grzegorz Chrupała

Abstract

Probing is widely used to study which features can be decoded from language model representations. However, the common decoding probe approach has two limitations that we aim to solve with our new encoding probe approach: contributions of different features to model representations cannot be directly compared, and feature correlations can affect probing results. We present an Encoding Probe that reverses this direction and reconstructs internal representations of models using interpretable features. We evaluate this method on text and speech transformer models, using feature sets spanning acoustics, phonetics, syntax, lexicon, and speaker identity. Our results suggest that speaker-related effects vary strongly across different training objectives and datasets, while syntactic and lexical features contribute independently to reconstruction. These results show that the Encoding Probe provides a complementary perspective on interpreting model representations beyond decodability.

2605.00585 2026-05-04 eess.SP cs.NA math.NA

Local Geometry of Least Squares for Unmixing Signals with Parameter-Dependent Dictionaries

Santos Michelena, Maxime Ferreira Da Costa, José Picheral

Comments 13 pages, 11 figures. Submitted to IEEE TSP

Abstract

Modeling signals as linear combinations of atoms from a dictionary is ubiquitous in modern signal processing. In the finite-dimensional setting, whenever atoms depend nonlinearly upon unknown parameters, the signal model is said to be separable. In this work, we study least-squares reconstruction of separable signals and establish a unified theoretical framework for their analysis. We introduce the unmixing metric, a distance that captures the distinct roles and sensitivities of linear and nonlinear parameters, and establish local convergence and stability guarantees under its topology. We then analyze variable projection from a geometric perspective, showing that it corresponds to restricting the optimization to the manifold of optimal linear parameters. This viewpoint provides a principled explanation for the improved algorithmic behavior of variable projection observed in practice, and produces sharp theoretical guarantees. The generic theory for separable problems is specialized to the case of point spread function (PSF) unmixing. We introduce a parametric notion of coherence and show that support separation directly controls both the size of the convergence region and the stability of recovery. Numerical experiments corroborate the theoretical predictions and demonstrate the practical relevance of the proposed framework.
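Variable projection can be shown on the smallest separable example: y(t) = a φ(t; θ) with a single Gaussian PSF atom of unknown center θ and amplitude a. For any fixed θ the optimal a has a closed form, so the linear parameter is eliminated and only the reduced objective in θ is searched. The grid search and the PSF width are illustrative assumptions; practical solvers typically run Gauss-Newton-type iterations on the reduced objective.

```python
import math


def psf(t, theta, width=0.1):
    """Gaussian point spread function centred at theta (width is an assumed constant)."""
    return math.exp(-((t - theta) ** 2) / (2 * width ** 2))


def reduced_objective(theta, ts, ys):
    """Residual after eliminating the linear amplitude: min_a ||y - a * phi(theta)||^2."""
    phi = [psf(t, theta) for t in ts]
    a = sum(p * y for p, y in zip(phi, ys)) / sum(p * p for p in phi)
    residual = sum((y - a * p) ** 2 for p, y in zip(phi, ys))
    return residual, a


def varpro_grid_search(ts, ys, grid):
    best_theta = min(grid, key=lambda th: reduced_objective(th, ts, ys)[0])
    return best_theta, reduced_objective(best_theta, ts, ys)[1]


ts = [i / 100 for i in range(101)]       # samples on [0, 1]
ys = [2.0 * psf(t, 0.3) for t in ts]     # noiseless signal: a = 2, theta = 0.3
grid = [i / 1000 for i in range(1001)]
theta_hat, a_hat = varpro_grid_search(ts, ys, grid)
print(theta_hat, a_hat)
```

The search runs over θ alone, which is the "restriction to the manifold of optimal linear parameters" described in the abstract, in its simplest one-atom form.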

2605.00535 2026-05-04 eess.SP

From Pilot to Precoding Design: Blind Angular Spoofing For Location Privacy in MIMO Systems

Priyanka Maity, Lorenzo Italiano, Alireza Pourafzal, Gonzalo Seco-Granados, Hui Chen, Monica Nicoli, Henk Wymeersch

Abstract

This paper studies location privacy in uplink MIMO systems, where a user equipment seeks to spoof the angular signature observed by a single base station performing localization. We propose a blind analog precoder design that manipulates the perceived angle-of-arrival and angle-of-departure configuration without requiring channel-gain knowledge. The method enforces consistency between the received signal and a desired spoofed angular subspace, and is solved using an alternating optimization algorithm under practical amplitude constraints. Simulations in a multipath scenario show that the proposed approach achieves near-perfect angular spoofing and clearly outperforms pilot-only blind spoofing, which exhibits an error floor. The results also show a trade-off between spoofing accuracy and communication rate, depending on the chosen virtual geometry.
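For intuition about steering the perceived angle with an analog precoder, the sketch builds a phase-only (constant-modulus) beam for a half-wavelength uniform linear array that points at a chosen virtual angle instead of the true one. It only illustrates the amplitude constraint and the steering-vector geometry; the paper's blind design, which enforces subspace consistency via alternating optimization, is not reproduced, and the function names are illustrative.

```python
import cmath
import math


def steering_vector(n_ant, angle_rad):
    """Half-wavelength ULA response: a_k = exp(j * pi * k * sin(angle))."""
    return [cmath.exp(1j * math.pi * k * math.sin(angle_rad)) for k in range(n_ant)]


def phase_only_precoder(n_ant, spoof_angle_rad):
    """Analog precoder with equal-modulus entries, steering toward the spoofed angle."""
    return [v / math.sqrt(n_ant) for v in steering_vector(n_ant, spoof_angle_rad)]


def beam_gain(w, angle_rad):
    """|a(angle)^H w|: array gain of precoder w in the given direction."""
    a = steering_vector(len(w), angle_rad)
    return abs(sum(ak.conjugate() * wk for ak, wk in zip(a, w)))


w = phase_only_precoder(8, math.radians(-20))
print(beam_gain(w, math.radians(-20)), beam_gain(w, math.radians(35)))
```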

2605.00527 2026-05-04 eess.IV cs.CV cs.LG

Multi-frame Restoration for High-rate Lissajous Confocal Laser Endomicroscopy

Minhee Lee, Sangyoon Lee, Jiwook Lee, Minki Hong, Kyuyoung Kim, Wonhwa Kim, Jaeho Lee

Abstract

Lissajous confocal laser endomicroscopy (CLE) is a promising solution for high-speed in vivo optical biopsy in handheld scenarios. However, Lissajous scanning traces a resonant trajectory and samples only the visited pixels per frame; at high frame rates, many pixels remain unvisited, creating structured holes. In this work, we introduce the first benchmark for high-rate Lissajous CLE, consisting of low-quality video clips paired with high-quality reference images. The reference images are wide-FOV mosaics obtained by stitching stabilized, slow-scan frames of the same tissue, enabling temporally aligned supervision. Using this dataset, we propose MIRA, a lightweight recurrent framework for Lissajous CLE restoration that iteratively aggregates temporal context through feature reuse and displacement alignment. Our experiments demonstrate that MIRA outperforms both lightweight and high-complexity baselines in restoration quality while maintaining favorable computational efficiency suitable for clinical deployment.
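The hole problem can be quantified with a quick simulation: sample a Lissajous trajectory x = sin(2π f_x t), y = sin(2π f_y t + φ) at a fixed rate, map it onto a pixel grid, and count the fraction of pixels visited within one frame. The frequencies, sample rate, and grid size are arbitrary assumptions, not the device parameters from the paper.

```python
import math


def lissajous_coverage(frame_time_s, fx=1000.0, fy=1013.0, phase=math.pi / 2,
                       fs=2_000_000.0, grid=128):
    """Fraction of grid pixels visited by one frame of Lissajous scanning (illustrative parameters)."""
    visited = set()
    n_samples = int(frame_time_s * fs)
    for k in range(n_samples):
        t = k / fs
        x = math.sin(2 * math.pi * fx * t)
        y = math.sin(2 * math.pi * fy * t + phase)
        col = min(int((x + 1) / 2 * grid), grid - 1)
        row = min(int((y + 1) / 2 * grid), grid - 1)
        visited.add((row, col))
    return len(visited) / (grid * grid)


# Shorter frames (higher frame rates) leave more pixels unvisited.
for fps in (100, 25, 10):
    print(fps, round(lissajous_coverage(1.0 / fps), 3))
```

The near-commensurate frequencies (1000 and 1013 Hz) make the full pattern repeat only once per second, so any single high-rate frame sees just a sparse subset of it.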

2605.00494 2026-05-04 eess.AS

Transformer-based End-to-End Control Filter Generation for Active Noise Control

Ziyi Yang, Zhengding Luo, Yisong Zou, Boxiang Wang, Qirui Huang, Woon-Seng Gan

Abstract

To address the limitations of existing Generative Fixed-Filter Active Noise Control (GFANC) methods, which rely on filter decomposition and recombination and require supervised learning with labeled data, this paper proposes a Transformer-based End-to-End Control-Filter Generation (E2E-CFG) framework. Unlike previous approaches that predict combination weights of sub-control filters, the proposed method directly generates control filters in an unsupervised manner by integrating the co-processor and real-time controller into a fully differentiable ANC system, where the accumulated error signal is used as the training objective. By abandoning the decomposition-reconstruction process, the proposed design simplifies the control pipeline and avoids error accumulation, while the Transformer architecture effectively captures global and dynamic noise characteristics through its attention mechanism. Numerical simulations on real-recorded noises demonstrate that the proposed method achieves improved noise reduction performance and adaptability to different types of noises compared with the original GFANC framework.
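The training signal described here, the accumulated error of a differentiable ANC loop, can be written down directly for fixed FIR filters: the error-microphone signal is the disturbance (reference through the primary path) minus the anti-noise (reference through the control filter, then the secondary path). The sign convention, truncation, and plain-Python convolutions are assumptions of this sketch; a trainable version would compute the same quantity in an autodiff framework so gradients reach the filter generator.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def anc_error_energy(reference, primary, secondary, control):
    """Sum of squared error e = d - s * (w * x), truncated to len(reference) samples."""
    n = len(reference)
    d = convolve(reference, primary)[:n]
    anti_noise = convolve(convolve(reference, control), secondary)[:n]
    return sum((di - ai) ** 2 for di, ai in zip(d, anti_noise))


x = [1.0, -0.5, 0.25, 0.8, -0.3, 0.1]
primary = [0.0, 0.9, 0.4]
secondary = [0.0, 1.0]    # one-sample delay
perfect = [0.9, 0.4]      # secondary * perfect reproduces the primary path
print(anc_error_energy(x, primary, secondary, perfect))  # perfect cancellation
print(anc_error_energy(x, primary, secondary, [0.0]))    # energy of the raw disturbance
```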

2605.00486 2026-05-04 eess.SP

Development of Multivariate Attention LSTM Model For Dynamic Line Rating Forecasting

Anushka Bandara, Sahan Siriwardena, Akila Wijethunge, Janaka Ekanayake

Abstract

As global fossil fuel reserves diminish, there is a growing impetus for nations to transition towards renewable energy sources. Sri Lanka, for instance, aims to generate 70% of its electricity from renewable sources by 2030. Achieving this target requires optimal use of the existing power transmission infrastructure, as expanding the grid is both time-consuming and expensive. Traditionally, Static Line Ratings (SLRs) are used to define line capacity, often resulting in underutilization. Dynamic Line Rating (DLR), which estimates line capacity in real time based on weather conditions, offers a more efficient solution. However, DLR prediction is highly sensitive to environmental variability and forecasting complexity. This study proposes a novel multivariate Long Short-Term Memory (LSTM) model enhanced with an attention mechanism for improved DLR forecasting. Unlike traditional models that treat weather variables independently, the proposed approach captures nonlinear interdependencies among key environmental features such as ambient temperature, cable temperature, wind speed, humidity, and solar irradiance. The attention mechanism dynamically prioritizes the most relevant inputs during forecasting, leading to improved performance. Experimental evaluation on real-world DLR data demonstrates that the proposed model achieves a prediction accuracy of 95.84%, surpassing the conventional LSTM model's 94.62%. This improvement highlights the model's superior ability to deliver accurate and robust DLR forecasts. The findings confirm that incorporating multivariate features with attention enhances forecasting precision, supporting more efficient transmission line utilization and higher renewable energy integration.
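The attention step can be pictured as a softmax over relevance scores followed by a weighted sum of input vectors (time steps or features). The sketch uses simple dot-product scores against a query vector; the scoring function, the query, and the toy hidden states are illustrative assumptions, since the abstract does not specify the attention form.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query, keys, values):
    """Dot-product attention: weights = softmax(q . k_i), output = sum_i w_i * v_i."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return pooled, weights


# Hypothetical hidden states for three inputs; the query favours the last one.
states = [[0.1, 0.0], [0.0, 0.2], [1.0, 1.0]]
pooled, weights = attention_pool([2.0, 2.0], states, states)
print([round(w, 3) for w in weights], [round(p, 3) for p in pooled])
```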

2605.00461 2026-05-04 eess.IV cs.CV

Combined Dictionary Unfolding Network with Gradient-Adaptive Fidelity for Transferable Multi-Source Fusion

Ge Luo, Jun-Jie Huang, Qi Yu, Tianrui Liu, Ke Liang, Yuming Xiang, Wentao Zhao, Xinwang Liu, Meng Wang

Abstract

Deep Unfolding Network-based methods have emerged as effective solutions for multi-source image fusion by combining model-driven iterative optimization with data-driven deep learning. However, most existing deep unfolding image fusion methods are derived from alternating minimization, which updates the features of different modalities separately. This design introduces considerable computational and memory overhead, limiting deployment on resource-constrained edge devices. To address this issue, we propose CDNet, a lightweight Combined Dictionary Unfolding Network for multi-source image fusion. Rather than introducing a new sparse coding prior or empirically compressing an existing fusion network, CDNet translates the unique-common decomposition prior of coupled dictionary learning into a structurally constrained joint unfolding architecture. The resulting CDBlock follows a block-sparse interaction topology and performs a model-derived joint update of common and modality-specific representations, thereby streamlining feature learning and improving efficiency. In addition, we design a compact High- and Low-frequency Image Fidelity loss for unsupervised training without ground-truth images. We evaluate CDNet on four tasks, including multi-exposure image fusion, infrared and visible image fusion, medical image fusion, and infrared and visible image fusion for semantic segmentation. Experimental results show that CDNet achieves competitive or superior fusion performance with high efficiency. For infrared and visible image fusion, CDNet outperforms competing methods on four of six metrics on the TNO dataset and five of six metrics on the RoadScene dataset. In particular, it surpasses the second-best method by 1.23 dB and 1.59 dB in PSNR on TNO and RoadScene, respectively.
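Since the comparison above is reported as PSNR gains in dB, a reminder of the metric may help: PSNR = 10 log10(MAX^2 / MSE). The sketch assumes 8-bit images (MAX = 255) flattened into lists; which reference image the MSE is measured against in fusion evaluation is not specified in the abstract.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized flattened images."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


ref = [10.0, 20.0, 30.0, 40.0]
noisy = [11.0, 19.0, 31.0, 39.0]   # MSE = 1
print(round(psnr(ref, noisy), 2))  # 48.13
```

Because the scale is logarithmic, a 1.23 dB gain at the same peak value corresponds to an MSE roughly 25% lower (10^(-0.123) ≈ 0.75).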

2605.00458 2026-05-04 cs.LG eess.SP

Federated Learning with Hypergradient-based Online Update of Aggregation Weights

Ayano Nakai-Kasai, Tadashi Wadayama

Abstract

Federated learning using mobile and Internet of Things devices requires not only the ability to handle heterogeneity of clients' data distributions but also high adaptability to varying communication environments. We propose FedHAW (Federated Learning with Hypergradient-based update of Aggregation Weights) that implements online updates of aggregation weights. FedHAW updates the aggregation weights by using hypergradient, the gradient of the objective function with respect to the weights, which can be calculated with low computational overhead. Simulation results show that the proposed method possesses high generalization performance in heterogeneous environments and high robustness to communication errors.
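The hypergradient idea can be sketched on a toy problem: the global model is a weighted sum of client models, w = Σ α_k w_k, so the gradient of a validation loss F with respect to α_k is simply ∇F(w) · w_k. The softmax parametrization of the weights, the quadratic loss, the step size, and the function names are assumptions for illustration; FedHAW's actual update rule may differ.

```python
import math


def softmax(betas):
    m = max(betas)
    e = [math.exp(b - m) for b in betas]
    s = sum(e)
    return [x / s for x in e]


def hypergradient_step(betas, client_models, grad_fn, lr=1.0):
    """One update of aggregation logits betas via the chain rule through w = sum_k alpha_k w_k."""
    alphas = softmax(betas)
    dim = len(client_models[0])
    w = [sum(a * m[d] for a, m in zip(alphas, client_models)) for d in range(dim)]
    g = grad_fn(w)
    # dF/dalpha_k = grad F(w) . w_k
    d_alpha = [sum(gd * md for gd, md in zip(g, m)) for m in client_models]
    mean_term = sum(a * da for a, da in zip(alphas, d_alpha))
    # Softmax Jacobian: dF/dbeta_j = alpha_j * (dF/dalpha_j - sum_k alpha_k dF/dalpha_k)
    d_beta = [a * (da - mean_term) for a, da in zip(alphas, d_alpha)]
    return [b - lr * db for b, db in zip(betas, d_beta)]


# Toy validation loss F(w) = 0.5 * ||w - target||^2; client 0 sits at the target.
target = [1.0, 0.0]
clients = [[1.0, 0.0], [0.0, 1.0]]


def grad(w):
    return [wi - ti for wi, ti in zip(w, target)]


betas = [0.0, 0.0]
for _ in range(20):
    betas = hypergradient_step(betas, clients, grad)
print([round(a, 3) for a in softmax(betas)])
```

The weight on the client whose model better fits the validation objective grows online, without any extra training pass over client data.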

2605.00449 2026-05-04 cs.IT cs.LG eess.SP math.IT

Soft Graph Diffusion Transformer for MIMO Detection

Nan Jiang, Jiadong Hong, Lei Liu, Xinyu Bian, Wenjie Wang, Zhaoyang Zhang

Comments 6 pages, 4 figures, 2 tables

Abstract

Learning-based MIMO detection has shown strong empirical performance, yet existing methods typically rely on fixed-depth architectures without explicitly modeling the progressive refinement of symbol estimates. In this paper, we revisit MIMO detection from a flow matching perspective and propose the Soft Graph Diffusion Transformer (SGDiT), which reformulates detection as a noise-level-conditioned denoising process that progressively transforms a Gaussian initialization toward the posterior conditioned on channel observations. An adaptive layer normalization (AdaLN)-conditioned soft graph transformer is employed to parameterize the denoising dynamics, enabling stage-aware information integration between observation and symbol domains. To better align with the discrete nature of symbol detection, we further adopt a cross-entropy-based training objective that directly models bit-wise posterior probabilities, providing a more suitable inductive bias than conventional regression-based formulations. Experimental results across various MIMO system configurations demonstrate that SGDiT achieves competitive bit error rate (BER) performance compared with representative baselines. Furthermore, the proposed model exhibits good generalization capability across different channel conditions. Overall, the SGDiT framework provides an effective and practical approach for neural MIMO detection.

2605.00448 2026-05-04 cs.CV eess.IV

Learning from Compressed CT: Feature Attention Style Transfer and Structured Factorized Projections for Resource-Efficient Medical Image Analysis

Shadid Yousuf, S. M. Mahbubur Rahman, Mohammed Imamul Hassan Bhuiyan

Abstract

The deployment of artificial intelligence in medical imaging is hindered by high computational complexity and resource-intensive processing of volumetric data. Although chest computed tomography (CT) volumes offer richer diagnostic information than projection radiography, their use in AI-based diagnosis remains limited due to the computational burden of processing uncompressed volumetric images (typically stored in NIfTI or DICOM format). Addressing the growing need for low-resource deployment and efficient electronic data transfer, we investigate the utilization of JPEG-compressed chest CT volumes for thoracic abnormality detection. We propose Feature Attention Style Transfer (FAST), a novel distillation framework that transfers both activation patterns and structural relationships from high-fidelity CT representations to a spatiotemporal visual encoder operating on compressed inputs. By combining Gram-matrix-based attention style preservation with dual-attention feature alignment, FAST enables robust feature extraction from degraded volumes. Furthermore, we introduce Structured Factorized Projection (SFP), leveraging Block Tensor Train decomposition as a parameter-efficient alternative to dense projection layers, reducing projection-head parameters by almost half. Our contrastive learning pipeline, CT-Lite, integrates these components with a SigLIP-based multimodal alignment objective. Experiments on CT-RATE, NIDCH, and Rad-ChestCT demonstrate that CT-Lite achieves AUROC within 5-7% of the uncompressed-input baseline across all three datasets, despite operating on compressed inputs with significantly fewer parameters, paving the way for AI-based clinical evaluation under resource constraints.
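The Gram-matrix component of FAST can be illustrated with the classic style-transfer formulation. This is a minimal sketch under assumptions: features are a C x N matrix (channels by flattened positions), the Gram matrix is normalized by N, and the loss is a mean squared difference between teacher (uncompressed) and student (compressed) Gram matrices. The paper's exact attention maps, weighting, and dual-attention alignment are not reproduced, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # C channels x N spatial/temporal positions


def gram_matrix(features: Matrix) -> Matrix:
    """G[i][j] = (1/N) * sum_n F[i][n] * F[j][n]: channel-to-channel correlations."""
    c, n = len(features), len(features[0])
    return [
        [sum(features[i][k] * features[j][k] for k in range(n)) / n for j in range(c)]
        for i in range(c)
    ]


def gram_style_loss(teacher: Matrix, student: Matrix) -> float:
    """Mean squared difference between teacher and student Gram matrices."""
    gt, gs = gram_matrix(teacher), gram_matrix(student)
    c = len(gt)
    return sum((gt[i][j] - gs[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)


# Teacher features come from the uncompressed volume, student features from the JPEG input.
teacher = [[1.0, 2.0], [3.0, 4.0]]
shuffled = [[2.0, 1.0], [4.0, 3.0]]  # same values, positions swapped
print(gram_style_loss(teacher, shuffled))  # 0.0: the Gram matrix ignores position order
print(gram_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 0.0]]))  # 0.0625
```

Because the Gram matrix summarizes which channels co-activate rather than where, it matches "structural relationships" between features while tolerating the spatial artifacts that compression introduces; pointwise activation matching would need a separate term.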

2605.00431 2026-05-04 cs.SD cs.CV cs.LG eess.AS

MMAudioReverbs: Video-Guided Acoustic Modeling for Dereverberation and Room Impulse Response Estimation

Akira Takahashi, Ryosuke Sawata, Shusuke Takahashi, Yuki Mitsufuji

Comments Accepted to the CVPR 2026 Sight and Sound Workshop

Abstract

Although recent video-to-audio (V2A) models excel at synthesizing semantically plausible sounds from visual inputs, they do not explicitly model room-acoustic effects such as reverberation or room impulse responses (RIRs), and thus offer limited controllability over these effects. However, we hypothesize that such V2A models implicitly encode semantic knowledge of the relationship between spatial audio and the corresponding visual cues. In this paper, we revisit a V2A model with this hypothesis in mind and propose a way to use the pretrained model as a prior for physically grounded room-acoustic processing. Building on MMAudio, one of the state-of-the-art V2A models, we propose MMAudioReverbs, a unified framework that handles (i) dereverberation and (ii) RIR estimation without modifying the network architecture, fine-tuned on a small dataset. Experimental results show that audio and visual cues each offer advantages depending on the type of physical room acoustics. This suggests that foundation V2A models can be used for physically grounded room-acoustic analysis.

2605.00428 2026-05-04 stat.ME cs.PF cs.SY eess.SY

How to Do Statistical Evaluations in ECE/CS Papers: A Practical Playbook for Defensible Results

Bhaskar Krishnamachari

Comments 30 pages, 8 figures; Tutorial paper; companion student workbook and Claude skill available as ancillary material

Abstract

Strong experimental papers in electrical and computer engineering and computer science (ECE/CS), especially in systems, networking, and applied machine learning, rest on more than a single impressive number. They rest on a chain of design, measurement, analysis, and validation choices that, taken together, make a result believable. This tutorial is a compact, example-driven guide to that chain for beginning researchers. We organize it as an evaluation workflow: claim, hypothesis, unit of analysis, baseline, regime sweep, uncertainty estimate, validation check, and reporting. Within that workflow we cover the classical statistical foundations (descriptive statistics, the central limit theorem, normal- and t-based confidence intervals, Student's t-test, ANOVA, chi-squared and Pearson correlation, linear regression) alongside the modern, distribution-free techniques (the bootstrap, Wilcoxon and Mann-Whitney tests, Cliff's delta) that are usually preferred for ECE/CS data. We also discuss factorial design, randomization and blocking, multiple-comparison correction, latency-specific pitfalls, simulation verification and validation, equivalence-style claims, and reproducibility. A running example, a comparison of two job-scheduling algorithms on simulated workloads with truncated heavy-tailed job sizes, threads through the tutorial, with Python snippets the reader can paste and adapt. The paper closes with a pre-submission checklist; companion student-facing material (project-type translation tables, an evaluation-plan worksheet, exercises, and a worked "bad evaluation autopsy") is collected in a separate workbook released alongside this paper.
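Two of the distribution-free tools named above fit in a few lines of standard-library Python. This is an independent sketch, not the tutorial's own snippets: the latency numbers are made up for illustration, and the bootstrap is the plain percentile variant (other variants such as BCa exist). The function names are hypothetical.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta = P(X > Y) - P(X < Y), estimated over all cross pairs."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def bootstrap_ci(
    data: Sequence[float],
    stat: Callable[[Sequence[float]], float] = statistics.mean,
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap (1 - alpha) confidence interval for `stat`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, recompute the statistic, and sort the estimates.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical heavy-tailed per-job latencies (ms) for two schedulers.
sched_a = [12.1, 15.3, 11.8, 40.2, 13.0, 12.7, 14.4, 95.6, 12.9, 13.5]
sched_b = [14.0, 16.2, 13.9, 55.1, 15.7, 14.8, 16.0, 120.3, 15.1, 15.9]
print(cliffs_delta(sched_a, sched_b))  # negative: A's latencies tend to be lower
print(bootstrap_ci(sched_a, stat=statistics.median))
```

Neither function assumes normality, which is why they suit heavy-tailed latency data; a fixed seed makes the bootstrap interval reproducible across runs.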

2605.00404 2026-05-04 eess.SY cs.SY

Electric Grid Topology and Admittance Estimation using Phasor Measurements

Norak Rin, Iman Shames, Ian Petersen, Elizabeth Ratnam

Abstract

Recent advances in precise phasor measurement units are enabling new approaches to estimate distribution and transmission grid parameters in real time. In this paper, we investigate voltage and current phasor measurement requirements to estimate the electric grid topology and admittance parameters. We establish necessary and sufficient conditions on the number of independent operating points (measurements) required to determine the topology and admittance of a completely unknown electric grid. With prior topology information, we also show that there is a minimum number of measurements required to uniquely determine the admittance matrix and corresponding grid topology. In the presence of noisy phasor measurements, we show that the admittance matrix can be estimated using a structured total least squares approach. Through numerical simulations on the IEEE 13-node distribution feeder, the IEEE 14-node transmission network, and the IEEE 123-node distribution feeder, we demonstrate that our approach is suitable for radial and mesh grid topologies in the presence of measurement noise.
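The basic linear-algebra relation behind this problem is I = Y V, where column k of V and I holds the bus voltage and current phasors at operating point k. The sketch below covers only the simplest noiseless case, under the illustrative assumption (not the paper's derived condition) that n linearly independent operating points are available for an n-bus grid and no structure of Y is exploited. It recovers Y by plain complex Gaussian elimination, not by the structured total least squares method the paper uses for noisy data. All names and numbers are hypothetical.

```python
from typing import List

CMatrix = List[List[complex]]


def solve(a: CMatrix, b: List[complex]) -> List[complex]:
    """Solve a x = b by Gaussian elimination with partial pivoting (complex entries)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("operating points are not linearly independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def estimate_admittance(v: CMatrix, i: CMatrix) -> CMatrix:
    """Recover Y from I = Y V: each row y of Y satisfies V^T y^T = (row of I)^T."""
    n = len(v)
    vt = [[v[r][c] for r in range(n)] for c in range(n)]  # V transpose
    return [solve(vt, i[row]) for row in range(n)]


# Hypothetical 2-bus example: a single line with series admittance y.
y = 1.0 - 2.0j
Y_true = [[y, -y], [-y, y]]
# Two independent operating points (columns), in per-unit voltages.
V = [[1.0 + 0.0j, 1.02 + 0.0j], [0.97 - 0.01j, 0.99 + 0.02j]]
I = [[sum(Y_true[r][j] * V[j][k] for j in range(2)) for k in range(2)] for r in range(2)]
print(estimate_admittance(V, I))  # recovers Y_true up to round-off
```

Note that Y itself may be singular (as here); only the matrix of operating points has to be invertible. With noise, this exact solve amplifies errors, which is the motivation for a structured least-squares formulation.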

2605.00329 2026-05-04 cs.SD eess.AS

Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation

Kuan-Po Huang, Bo-Ru Lu, Byeonggeun Kim, Mihee Lee, Zalan Fabian, Renard Korzeniowski, Qingming Tang, Greg Ver Steeg, Hung-yi Lee, Chieh-Chi Kao, Chao Wang

Abstract

Autoregressive (AR) models with diffusion heads have recently achieved strong text-to-audio performance, yet their iterative decoding and multi-step sampling incur high latency. To address this bottleneck, we propose a one-step sampling framework that combines an energy-distance training objective with representation-level distillation. An energy-scoring head maps Gaussian noise directly to audio latents in one step, eliminating the need for a costly recursive diffusion sampling process, while distillation from a masked autoregressive (MAR) text-to-audio model preserves the strong conditioning learned during diffusion training. On the AudioCaps benchmark, our method consistently outperforms prior one-step baselines such as ConsistencyTTA, SoundCTM, AudioLCM, and AudioTurbo on both objective and subjective metrics, while substantially narrowing the quality gap to AR diffusion systems with multi-step sampling. Compared to the state-of-the-art AR diffusion system, IMPACT, our approach achieves up to 8.5x faster batch inference with highly competitive audio quality. These results demonstrate that combining energy-distance training with representation-level distillation provides an effective recipe for fast, high-quality text-to-audio synthesis.
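An energy-distance objective of the kind described can be written as the sample energy score E||X - y|| - 0.5 E||X - X'||, where X, X' are independent one-step samples for the same condition and y is the ground-truth latent. The sketch below is the standard estimator under that assumption; the paper's exact formulation, latent dimensionality, and number of samples per condition are not specified here, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def energy_score(samples: Sequence[Sequence[float]], target: Sequence[float]) -> float:
    """Estimate of E||X - y|| - 0.5 * E||X - X'|| from m >= 2 one-step samples."""
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples for the diversity term")
    # Fidelity: pull every generated latent toward the ground-truth latent.
    fidelity = sum(l2(s, target) for s in samples) / m
    # Diversity: reward spread between distinct samples (discourages collapse).
    pairs = list(combinations(samples, 2))
    diversity = sum(l2(a, b) for a, b in pairs) / len(pairs)
    return fidelity - 0.5 * diversity


print(energy_score([[1.0], [1.0]], [0.0]))   # 1.0: collapsed, biased samples
print(energy_score([[1.0], [-1.0]], [0.0]))  # 0.0: spread samples centered on the target
```

Because the score is computed directly from samples, a head trained with it needs only one forward pass per noise draw at inference time, with no iterative denoising loop.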

2605.00317 2026-05-04 eess.SY cs.SY

Real-Time Neural Distributed Energy Resources Dispatch with Feasibility Guarantees

Jie Zhu, Yinliang Xu, Hongbin Sun

Abstract

The growing penetration of renewable energy necessitates high-frequency real-time scheduling. While neural network-based surrogates enable computationally efficient scheduling, strictly enforcing nonconvex power flow constraints without external solvers remains a fundamental challenge. To bridge this gap, this letter proposes a solver-free neural dispatch framework with rigorous feasibility guarantees. A convex inner approximation of the DistFlow model is first derived via the convex envelope theorem. Building upon this approximation, a robust optimization-based affine policy is formulated to yield a theoretically certified interior-point mapping rule, which is then embedded within a bisection-based projection scheme to efficiently recover feasibility for infeasible neural network outputs without any external solver. Experimental results demonstrate that the proposed method restores feasibility in time on the order of 10^-3 s while maintaining near-optimal performance.
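The bisection-based projection step can be sketched generically. Assuming a convex feasible set and a known feasible interior point (which the certified affine policy is said to provide), every point on the segment from the interior point toward the NN output is feasible up to a single crossing, so bisection on the interpolation factor finds the feasible point closest to the NN output along that segment. The box constraints below are a hypothetical stand-in for the convexified DistFlow limits, and the function names are illustrative.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def bisection_projection(
    x_nn: Point,
    x_interior: Point,
    is_feasible: Callable[[Point], bool],
    tol: float = 1e-6,
) -> List[float]:
    """Pull an NN output toward a known feasible interior point until it is feasible."""
    if is_feasible(x_nn):
        return list(x_nn)
    if not is_feasible(x_interior):
        raise ValueError("anchor point must be feasible")
    # Invariant: theta = lo is feasible, theta = hi is infeasible.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        candidate = [a + mid * (b - a) for a, b in zip(x_interior, x_nn)]
        if is_feasible(candidate):
            lo = mid
        else:
            hi = mid
    # Return the last feasible point, so the output is always feasible.
    return [a + lo * (b - a) for a, b in zip(x_interior, x_nn)]


# Hypothetical box constraints standing in for the convexified DistFlow limits.
def in_box(x: Point) -> bool:
    return all(0.0 <= v <= 1.0 for v in x)


print(bisection_projection([1.5, 0.5], [0.5, 0.5], in_box))  # [1.0, 0.5]
```

Each iteration needs only a feasibility check, and about log2(1/tol) iterations suffice, which is why such a scheme can run in milliseconds without calling an optimization solver.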

2605.00306 2026-05-04 cs.IT eess.SP math.IT

Artificial-Noise Aided Design for Movable-Antenna Enabled Physical-Layer Service Integration

Zhifeng Tang, Guangchen Wang, Nan Yang, Xiangyun Zhou, Salman Durrani

Abstract

This paper proposes a novel scheme for artificial-noise (AN)-aided movable-antenna (MA)-enabled physical-layer service integration (PLSI) to harmonize the simultaneous delivery of multicast and confidential messages. By jointly exploiting the spatial reconfiguration capability of MAs and the interference shaping capability of AN, we aim to enhance secrecy performance while guaranteeing multicast reliability. The joint design of MA positions and transmit variables results in a highly coupled and non-convex optimization problem. To address this, we first provide key insights into the role of spatial degrees of freedom in AN design. We then characterize the AN direction under a structured transmission design and derive a closed-form expression for the AN-to-confidential power allocation ratio, which significantly simplifies the overall design. To solve the resulting problem, we further develop a low-complexity block coordinate ascent (BCA)-based scheme that alternates between transmit design and MA position optimization. Numerical results demonstrate that the proposed scheme achieves significant secrecy performance gains with low computational complexity and fast convergence, highlighting its effectiveness for MA-enabled PLSI systems.

2605.00258 2026-05-04 cs.IT cs.SY eess.SY math.IT

Joint Accuracy and Confidentiality in Semantic-Aware Secure Remote Reconstruction

Bowen Li, Nikolaos Pappas

Abstract

In this paper, we consider remote reconstruction over wireless networks when simultaneous accuracy at the legitimate receiver and confidentiality against eavesdropping are required. These two objectives are often treated separately, even though they arise from the same update process and are marginals of a joint reconstruction event. This paper introduces confidential reconstruction accuracy (CRA), a metric that captures the joint event in which the legitimate receiver reconstructs correctly while the eavesdropper fails. Under randomized stationary policies, we develop a three-dimensional stationary analysis and derive closed-form expressions for the long-term average CRA and the optimal transmission probability. The results show that conventional marginal analysis can misidentify the optimal policy and misestimate the achievable simultaneous accuracy-confidentiality performance. They also reveal nontrivial behaviors: more frequent transmissions or better legitimate channels do not necessarily improve joint accurate and confidential reconstruction, and when the eavesdropping channel is strong, improving the legitimate channel alone may be insufficient. Finally, the framework yields a spatial safety boundary in a geofencing setting for secure remote reconstruction.
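The difference between the joint CRA event and the product of its marginals can be seen on a per-slot trace. This is a minimal empirical sketch, not the paper's closed-form analysis under randomized stationary policies: the boolean traces are hypothetical, and the function names are illustrative only.

```python
from typing import Sequence


def confidential_reconstruction_accuracy(
    legit_correct: Sequence[bool], eve_correct: Sequence[bool]
) -> float:
    """Fraction of slots where the receiver is correct AND the eavesdropper is wrong."""
    if len(legit_correct) != len(eve_correct) or not legit_correct:
        raise ValueError("traces must be non-empty and of equal length")
    joint = sum(1 for ok_rx, ok_eve in zip(legit_correct, eve_correct) if ok_rx and not ok_eve)
    return joint / len(legit_correct)


def marginal_product(legit_correct: Sequence[bool], eve_correct: Sequence[bool]) -> float:
    """P(receiver correct) * P(eavesdropper wrong), treating the events as independent."""
    n = len(legit_correct)
    p_legit = sum(legit_correct) / n
    p_eve_fail = sum(1 for ok_eve in eve_correct if not ok_eve) / n
    return p_legit * p_eve_fail


# Hypothetical per-slot outcomes: both sides tend to be correct right after an update.
legit = [True, True, False, False]
eve = [True, False, False, False]
print(confidential_reconstruction_accuracy(legit, eve))  # 0.25
print(marginal_product(legit, eve))                      # 0.375
```

Because both reconstructions are driven by the same transmissions, the two events are correlated, and the marginal product (0.375) overstates the joint performance (0.25) in this example.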