Low-Bit Quantization of Bandlimited Graph Signals via Iterative Methods
Comments 17 pages, 5 figures
Felix Krahmer, He Lyu, Rayan Saab, Jinna Qian, Anna Veselovska, Rongrong Wang
Comments 17 pages, 5 figures
We study the quantization of real-valued bandlimited signals on graphs, focusing on low-bit representations. We propose iterative noise-shaping algorithms for quantization, including sampling approaches with and without vertex replacement. The methods leverage the spectral properties of the graph Laplacian and exploit graph incoherence to achieve high-fidelity approximations. Theoretical guarantees are provided for the random sampling method, and extensive numerical experiments on synthetic and real-world graphs illustrate the efficiency and robustness of the proposed schemes.
Parampreet Singh, Somya Kumar, Chaitanya Shailendra Nitawe, Vipul Arora
Comments Accepted at NCC 2026 conference
Raga identification in Indian Art Music (IAM) remains challenging due to the presence of numerous rarely performed Ragas that are not represented in available training datasets. Traditional classification models struggle in this setting, as they assume a closed set of known categories and therefore fail to recognise or meaningfully group previously unseen Ragas. Recent works have tried categorizing unseen Ragas, but they run into a problem of catastrophic forgetting, where the knowledge of previously seen Ragas is diminished. To address this problem, we adopt a unified learning framework that leverages both labeled and unlabeled audio, enabling the model to discover coherent categories corresponding to the unseen Ragas, while retaining the knowledge of previously known ones. We test our model on benchmark Raga Identification datasets and demonstrate its performance in categorizing previously seen, unseen, and all Raga classes. The proposed approach surpasses the previous NCD-based pipeline even in discovering the unseen Raga categories, offering new insights into representation learning for IAM tasks.
Sahan Liyanaarachchi, Sennur Ulukus, Nail Akar
Most of the contemporary literature on information freshness solely focuses on the analysis of freshness for martingale estimators, which simply use the most recently received update as the current estimate. While martingale estimators are easier to analyze, they are far from optimal, especially in pull-based update systems, where maximum aposteriori probability (MAP) estimators are known to be optimal, but are analytically challenging. In this work, we introduce a new class of estimators called $p$-MAP estimators, which enable us to model the MAP estimator as a piecewise constant function with finitely many stages, bringing us closer to a full characterization of the MAP estimators when modeling information freshness.
Yulong Zhang, Ying Cui, Zili Meng, Abhishek Kumar, Dirk Kutscher
Comments Accepted to appear in IEEE Transactions on Multimedia (2026)
Journal ref IEEE Transactions on Multimedia, 2026
Large-scale video streaming events attract millions of simultaneous viewers, stressing existing delivery infrastructures. Client-driven adaptation reacts slowly to shared congestion, while server-based coordination introduces scalability bottlenecks and single points of failure. We present COMETS, a coordinated multi-destination video transmission framework that leverages information-centric networking principles such as request aggregation and in-network state awareness to enable scalable, fair, and adaptive rate control. COMETS introduces a novel range-interest protocol and distributed in-network decision process that aligns video quality across receiver groups while minimizing redundant transmissions. To achieve this, we develop a lightweight distributed optimization framework that guides per-hop quality adaptation without centralized control. Extensive emulation shows that COMETS consistently improves bandwidth utilization, fairness, and user-perceived quality of experience over DASH, MoQ, and ICN baselines, particularly under high concurrency. The results highlight COMETS as a practical, deployable approach for next-generation scalable video delivery.
Mohamed A. Mousa, Leif Bauer, Utkarsh Singh, Ziyi Yang, Angshuman Deka, Zubin Jacob
Event-based vision provides high-speed, energy-efficient sensing for applications such as autonomous navigation and motion tracking. However, implementing this technology in the long-wave infrared remains a significant challenge. Traditional infrared sensors are hindered by slow thermal response times or the heavy power requirements of cryogenic cooling. Here, we introduce the first event-based infrared detector operating in a Poisson-counting regime. This is realized with a spintronic Poisson bolometer capable of broadband detection from 0.8-14$μ\text{m}$. In this regime, infrared signals are detected through statistically resolvable changes in stochastic switching events. This approach enables room-temperature operation with high timing resolution. Our device achieves a maximum event rate of 1,250 Hz, surpassing the temporal resolution of conventional uncooled microbolometers by a factor of 4. Power consumption is kept low at 0.2$μ$W per pixel. This work establishes an operating principle for infrared sensing and demonstrates a pathway toward high-speed, energy-efficient, event-driven thermal imaging.
Chong Hyun Lee, Kibae Lee, Hyun Hee Yim
Comments 5 pages, 5 figures
We propose a tensor-based domain alignment (DA) algorithm designed to align source and target tensors within an invariant subspace through the use of alignment matrices. These matrices along with the subspace undergo iterative optimization of which constraint is on oblique manifold, which offers greater flexibility and adaptability compared to the traditional Stiefel manifold. Moreover, regularization terms defined to preserve the variance of both source and target tensors, ensures robust performance. Our framework is versatile, effectively generalizing existing tensor-based DA methods as special cases. Through extensive experiments, we demonstrate that our approach not only enhances DA conversion speed but also significantly boosts classification accuracy. This positions our method as superior to current state-of-the-art techniques, making it a preferable choice for complex domain adaptation tasks.
Zhuangzhuang Cui, Rizqi Hersyandika, Haoqiu Xiong, Sofie Pollin
Comments 4 pages, 6 figures, submitted to URSI-GASS 2026
In the context of integrated sensing and communications (ISAC), this paper presents an experimental investigation of the relationship between monostatic sensing and naturally bistatic communication channels in an indoor millimeter-wave environment. We characterize the propagation channel in the joint delay--angle domain, extract dominant multipath components (MPCs) and associate them with physical scatterers in the environment, and demonstrate how communication MPCs can be explicitly recovered from sensing channels. Finally, the radar cross-sections (RCSs) of two key scatterers, namely the wall and metal plate, are obtained based on calibrated channel power and reconstructed propagation distances.
Akhileswar Chowdary, Ahmad Bazzi, Marwa Chafii
Recent advancements have underscored the relevance of low-resolution analog-to-digital converters (ADCs) in integrated sensing and communication (ISAC) systems. Nevertheless, their specific impact on hybrid radar fusion (HRF) remains largely unexplored. In HRF systems, where uplink (UL) paths carry direct and reflected signals in the same frequency band, the reflected signal is often significantly weaker, making HRF performance particularly sensitive to ADC resolution. To study this effect, we use the quantized Cramér-Rao bound (CRB) to measure sensing accuracy. This work derives an upper bound on the quantized CRB for angle of arrival (AoA) estimation and explores CRB-rate trade-offs through two formulated optimization problems. Simulation results indicate that HRF becomes infeasible when the dynamic range of the received signal exceeds the dynamic range supported by the ADC, which is inherently limited by its resolution. Furthermore, the UL communication rate does not increase significantly when the ADC resolution is raised beyond a certain threshold. These observations highlight a fundamental trade-off between sensing and communication performance: while HRF performance benefits from higher ADC resolutions, the corresponding gains in communication rate plateau. This trade-off is effectively characterized using CRB-rate boundaries derived through simulation.
Jialun Kou, Achiel Colpaert, Zhuangzhuang Cui, Sofie Pollin
Integrated localization and communication systems aim to reuse communication waveforms for simultaneous data transmission and localization, but delay resolution is fundamentally limited by the available bandwidth. In practice, large contiguous bandwidths are difficult to obtain due to hardware constraints and spectrum fragmentation. Aggregating non-contiguous narrow bands can increase the effective frequency span, but a non-contiguous frequency layout introduces challenges such as elevated sidelobes and ambiguity in delay estimation. This paper introduces a point-spread-function (PSF)-centric framework for dual-band OFDM delay estimation. We model the observed delay profile as the convolution of the true target response with a PSF determined by the dual-band subcarrier selection pattern, explicitly linking band configuration to resolution and ambiguity. To suppress PSF-induced artifacts, we adapt the RELAX algorithm for dual-band multi-target delay estimation. Simulations demonstrate improved robustness and accuracy in dual-band scenarios, supporting ILC under fragmented spectrum.
Kohei Asai, Wataru Nakata, Yuki Saito, Hiroshi Saruwatari
Comments Accepted to ICASSP 2025 workshop
Real-world audio recordings often contain multiple speakers and various degradations, which limit both the quantity and quality of speech data available for building state-of-the-art speech processing models. Although end-to-end approaches that concatenate speech enhancement (SE) and speech separation (SS) to obtain a clean speech signal for each speaker are promising, conventional SE-SS methods suffer from complex degradations beyond additive noise. To this end, we propose \textbf{Geneses}, a generative framework to achieve unified, high-quality SE--SS. Our Geneses leverages latent flow matching to estimate each speaker's clean speech features using multi-modal diffusion Transformer conditioned on self-supervised learning representation from noisy mixture. We conduct experimental evaluation using two-speaker mixtures from LibriTTS-R under two conditions: additive-noise-only and complex degradations. The results demonstrate that Geneses significantly outperforms a conventional mask-based SE--SS method across various objective metrics with high robustness against complex degradations. Audio samples are available in our demo page.
Ivan Bondarenko, Daniil Grebenkin, Oleg Sedukhin, Mikhail Klementev, Roman Derunets, Lyudmila Budneva
Journal ref Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track), pp. 988-997
This work presents a speech-to-text system "Pisets" for scientists and journalists which is based on a three-component architecture aimed at improving speech recognition accuracy while minimizing errors and hallucinations associated with the Whisper model. The architecture comprises primary recognition using Wav2Vec2, false positive filtering via the Audio Spectrogram Transformer (AST), and final speech recognition through Whisper. The implementation of curriculum learning methods and the utilization of diverse Russian-language speech corpora significantly enhanced the system's effectiveness. Additionally, advanced uncertainty modeling techniques were introduced, contributing to further improvements in transcription quality. The proposed approaches ensure robust transcribing of long audio data across various acoustic conditions compared to WhisperX and the usual Whisper model. The source code of "Pisets" system is publicly available at GitHub: https://github.com/bond005/pisets.
Zhengyang Li, Thomas Graave, Björn Möller, Zehang Wu, Matthias Franz, Tim Fingscheidt
Comments accepted at ICASSP2026
In audiovisual automatic speech recognition (AV-ASR) systems, information fusion of visual features in a pre-trained ASR has been proven as a promising method to improve noise robustness. In this work, based on the prominent Whisper ASR, first, we propose a simple and effective visual fusion method -- use of visual features both in encoder and decoder (dual-use) -- to learn the audiovisual interactions in the encoder and to weigh modalities in the decoder. Second, we compare visual fusion methods in Whisper models of various sizes. Our proposed dual-use method shows consistent noise robustness improvement, e.g., a 35% relative improvement (WER: 4.41% vs. 6.83%) based on Whisper small, and a 57% relative improvement (WER: 4.07% vs. 9.53%) based on Whisper medium, compared to typical reference middle fusion in babble noise with a signal-to-noise ratio (SNR) of 0dB. Third, we conduct ablation studies examining the impact of various module designs and fusion options. Fine-tuned on 1929 hours of audiovisual data, our dual-use method using Whisper medium achieves 4.08% (MUSAN babble noise) and 4.43% (NoiseX babble noise) average WER across various SNRs, thereby establishing a new state-of-the-art in noisy conditions on the LRS3 AV-ASR benchmark. Our code is at https://github.com/ifnspaml/Dual-Use-AVASR
Junli Chen, Changli Tang, Yixuan Li, Guangzhi Sun, Chao Zhang
Comments 4 pages, 2 figures. Submitted to ICASSP 2026
Visual information, such as subtitles in a movie, often helps automatic speech recognition. In this paper, we propose Donut-Whisper, an audio-visual ASR model with dual encoder to leverage visual information to improve speech recognition performance in both English and Chinese. Donut-Whisper combines the advantage of the linear and the Q-Former-based modality alignment structures via a cross-attention module, generating more powerful audio-visual features. Meanwhile, we propose a lightweight knowledge distillation scheme showcasing the potential of using audio-visual models to teach audio-only models to achieve better performance. Moreover, we propose a new multilingual audio-visual speech recognition dataset based on movie clips containing both Chinese and English partitions. As a result, Donut-Whisper achieved significantly better performance on both English and Chinese partition of the dataset compared to both Donut and Whisper large V3 baselines. In particular, an absolute 5.75% WER reduction and a 16.5% absolute CER reduction were achieved on the English and Chinese sets respectively compared to the Whisper ASR baseline.
Jionghui Wang, Jun Fang, Hongbin Li, Boyu Ning
In this paper, we study the problem of uplink channel estimation for near-filed orthogonal frequency division multiplexing (OFDM) systems, where a base station (BS), equipped with an extremely large-scale antenna array (ELAA), serves multiple users over the same time-frequency resource block. A non-orthogonal pilot transmission scheme is considered to accommodate a larger number of users that can be supported by ELAA systems without incurring an excessive amount of training overhead. To facilitate efficient multi-user channel estimation, we express the received signal as a third-order low-rank tensor, which admits a canonical polyadic decomposition (CPD) model for line-of-sight (LoS) scenarios and a block term decomposition (BTD) model for non-line-of-sight (NLoS) scenarios. An alternating least squares (ALS) algorithm and a non-linear least squares (NLS) algorithm are employed to perform CPD and BTD, respectively. Channel parameters are then efficiently extracted from the recovered factor matrices. By exploiting the geometry of the propagation paths in the estimated channel, users' positions can be precisely determined in LoS scenarios. Moreover, our uniqueness analysis shows that the proposed tensor-based joint multi-user channel estimation framework is effective even when the number of pilot symbols is much smaller than the number of users, revealing its potential in training overhead reduction. Simulation results demonstrate that the proposed method achieves markedly higher channel estimation accuracy than compressed sensing (CS)-based approaches.
Thomas Deppisch, Yang Gao, Manan Mittal, Benjamin Stahl, Christoph Hold, David Alon, Zamir Ben-Hur
Emerging wearable devices such as smartglasses and extended reality headsets demand high-quality spatial audio capture from compact, head-worn microphone arrays. Ambisonics provides a device-agnostic spatial audio representation by mapping array signals to spherical harmonic (SH) coefficients. In practice, however, accurate encoding remains challenging. While traditional linear encoders are signal-independent and robust, they amplify low-frequency noise and suffer from high-frequency spatial aliasing. On the other hand, neural network approaches can outperform linear encoders but they often assume idealized microphones and may perform inconsistently in real-world scenarios. To leverage their complementary strengths, we introduce a residual-learning framework that refines a linear encoder with corrections from a neural network. Using measured array transfer functions from smartglasses, we compare a UNet-based encoder from the literature with a new recurrent attention model. Our analysis reveals that both neural encoders only consistently outperform the linear baseline when integrated within the residual learning framework. In the residual configuration, both neural models achieve consistent and significant improvements across all tested metrics for in-domain data and moderate gains for out-of-domain data. Yet, coherence analysis indicates that all neural encoder configurations continue to struggle with directionally accurate high-frequency encoding.
Teruki Kato, Ryotaro Shima, Kenji Kashima
Comments Submitted to IEEE Transactions on Control Systems Technology (TCST)
This paper presents a strictly convex chance-constrained stochastic control framework that accounts for uncertainty in control specifications such as reference trajectories and operational constraints. By jointly optimizing control inputs and risk allocation under general (possibly non-Gaussian) uncertainties, the proposed method guarantees probabilistic constraint satisfaction while ensuring strict convexity, leading to uniqueness and continuity of the optimal solution. The formulation is further extended to nonlinear model-based control using exactly linearizable models identified through machine learning. The effectiveness of the proposed approach is demonstrated through model predictive control applied to a hybrid powertrain system.
Geunsik Lim
Comments 19 pages
As climate-related hazards intensify, conventional early warning systems (EWS) disseminate alerts rapidly but often fail to trigger timely protective actions, leading to preventable losses and inequities. We introduce Climate RADAR (Risk-Aware, Dynamic, and Action Recommendation system), a generative AI-based reliability layer that reframes disaster communication from alerts delivered to actions executed. It integrates meteorological, hydrological, vulnerability, and social data into a composite risk index and employs guardrail-embedded large language models (LLMs) to deliver personalized recommendations across citizen, volunteer, and municipal interfaces. Evaluation through simulations, user studies, and a municipal pilot shows improved outcomes, including higher protective action execution, reduced response latency, and increased usability and trust. By combining predictive analytics, behavioral science, and responsible AI, Climate RADAR advances people-centered, transparent, and equitable early warning systems, offering practical pathways toward compliance-ready disaster resilience infrastructures.
Wei Jiang, Hans D. Schotten
Traditional cellular networks struggle with poor quality of service (QoS) for cell-edge users, while cell-free (CF) systems offer uniform QoS but incur high roll-out costs due to acquiring numerous access point (AP) sites and deploying a large-scale optical fiber network to connect them. This paper proposes a cost-effective heterogeneous massive MIMO architecture that integrates centralized co-located antennas at a cell-center base station with distributed edge APs. By strategically splitting massive antennas between centralized and distributed nodes, the system maintains high user fairness comparable to CF systems but reduces infrastructure costs substantially, by minimizing the required number of AP sites and fronthaul connections. Numerical results demonstrate its superiority in balancing performance and costs compared to cellular and CF systems.
Samuel Mallick, Gianpietro Battocletti, Dimitris Boskos, Azita Dabiri, Bart De Schutter
Comments 16 pages, 8 figures, submitted to Transaction on IEEE Transactions on Intelligent Transportation Systems
Cooperative control of groups of autonomous vehicles (AVs), i.e., platoons, is a promising direction to improving the efficiency of autonomous transportation systems. In this context, distributed co-optimization of both vehicle speed and gear position can offer benefits for fuel-efficient driving. To this end, model predictive control (MPC) is a popular approach, optimizing the speed and gear-shift schedule while explicitly considering the vehicles' dynamics over a prediction window. However, optimization over both the vehicles' continuous dynamics and discrete gear positions is computationally intensive, and may require overly long sample times or high-end hardware for real-time implementation. This work proposes a reinforcement learning (RL)-based distributed MPC approach to address this issue. For each vehicle in the platoon, a policy is trained to select and fix the gear positions across the prediction window of a local MPC controller, leaving a significantly simpler continuous optimization problem to be solved as part of a distributed MPC scheme. In order to reduce the computational cost of training and facilitate the scalability of the proposed approach to large platoons, the policies are parameterized such that the emergent multi-agent RL problem can be decoupled into single-agent learning tasks. In addition, a recurrent neural-network (RNN) architecture is proposed for the gear selection policy, such that the learning is scalable even as the number of possible gear-shift schedules grows exponentially with the MPC prediction horizon. In highway-driving simulations, the proposed approach is shown to have a significantly lower computation burden and a comparable performance in terms of fuel-efficient platoon control, with respect to pure MPC-based co-optimization.
Federico Villani, Christian Vogt, Luca Specht, Jero Schmid, Xiang Liu, Andrea Cossettini, Daniel Razansky, Luca Benini
Optoacoustic (OA) imaging has emerged as a powerful investigation tool, with demonstrated applicability in oncology, neuroscience, and cardiovascular biology. However, its clinical translation is limited with the existing OA systems, which often rely on bulky and expensive acquisition hardware mainly optimized for pulse-echo ultrasound (US) imaging. Despite the fact that OA imaging has different requirements for receive bandwidths and timing synchronization with external laser sources, there is a strong need for unified OA-US imaging platforms, as pulse-echo US remains the standard tool for visualizing soft tissues. To address these challenges, we propose a new data acquisition architecture for ultrafast OA and US imaging that fully covers the requirements for large channel counts, wide bandwidth, and software-defined operation. LtL combines state-of-the-art wideband analog front-ends, a Zynq UltraScale+ MPSoC integrating FPGA fabric with an Application Processing Unit, and a 100 GbE Remote Direct Memory Access (RDMA) backend enabling raw-data streaming at up to 95.6 Gb/s. The architecture avoids local buffers followed by burst transfers, which commonly constrain sustainable frame rate and recording intervals, thus achieving true continuous and sustained streaming of raw data. We validate the core elements of the LtL architecture using a 16-channel demonstration system built from commercial evaluation boards. We further verify the signal chain for up to 256-channel scalability, confirming the wide bandwidth capabilities to support state-of-the-art data transmission speeds.
Mengcheng Huang, Xue Zhou, Chen Xu, Dapeng Man
Underwater acoustic target recognition (UATR) plays a vital role in marine applications but remains challenging due to limited labeled data and the complexity of ocean environments. This paper explores a central question: can speech large models (SLMs), trained on massive human speech corpora, be effectively transferred to underwater acoustics? To investigate this, we propose UATR-SLM, a simple framework that reuses the speech feature pipeline, adapts the SLM as an acoustic encoder, and adds a lightweight classifier.Experiments on the DeepShip and ShipsEar benchmarks show that UATR-SLM achieves over 99% in-domain accuracy, maintains strong robustness across variable signal lengths, and reaches up to 96.67% accuracy in cross-domain evaluation. These results highlight the strong transferability of SLMs to UATR, establishing a promising paradigm for leveraging speech foundation models in underwater acoustics.
Mohamed A. Mousa, Leif Bauer, Ziyi Yang, Utkarsh Singh, Angshuman Deka, Zubin Jacob
High-performance room-temperature sensing is often limited by non-stationary $1/f$ fluctuations and non-Gaussian stochasticity. In spintronic devices, thermally activated Néel switching creates heavy-tailed noise that masks weak signals, defeating linear filters optimized for Gaussian statistics. Here, we introduce a physics-integrated inference framework that decouples signal morphology from stochastic transients using a hierarchical 1D CNN-GRU topology. By learning the temporal signatures of Néel relaxation, this architecture reduces the Noise Equivalent Differential Temperature (NEDT) of spintronic Poisson bolometers by a factor of six (233.78 mK to 40.44 mK), effectively elevating room-temperature sensitivity toward cryogenic limits. We demonstrate the framework's universality across the electromagnetic and biological spectrum, achieving a 9-fold error suppression in Radar tracking, a 40\% uncertainty reduction in LiDAR, and a 15.56 dB SNR enhancement in ECG. This hardware-inference coupling recovers deterministic signals from fluctuation-dominated regimes, enabling near-ideal detection limits in noisy edge environments.
Yiwen Shao, Yong Xu, Sanjeev Khudanpur, Dong Yu
Comments SLT 2024
Spatial information is a critical clue for multi-channel multi-speaker target speech recognition. Most state-of-the-art multi-channel Automatic Speech Recognition (ASR) systems extract spatial features only during the speech separation stage, followed by standard single-channel ASR on the separated speech. This approach results in an inefficient, lengthy pipeline and sub-optimal ASR performance due to the accumulated errors from preprocessing modules. Furthermore, most spatial feature extraction methods depend on the knowledge of speaker positions and microphone topology, making the systems reliant on specific settings and challenging to adapt to new equipment. In this work, we propose a solution to these issues with a lightweight embedding module named SpatialEmb, which extracts and encodes spatial information directly for the ASR model, supporting both fixed and arbitrary microphone topology. We conduct comprehensive experiments on AliMeeting, a real meeting corpus, to determine the optimal model design for SpatialEmb in terms of both performance and efficiency. Our best model trained with 105 hours Train-Ali-far achieves 17.04% and 20.32% character error rates (CER) on the Eval and Test sets, establishing a new state-of-the-art result with the same training data.
Onur Haliloğlu, Ufuk Sakarya, B. Uğur Töreyin, Orhan Gazi
Hyperspectral imagery is composed of huge amount of data which creates significant transmission latencies for communication systems. It is vital to decrease the huge data size before transmitting the Hyperspectral imagery. Besides, large data size leads to processing problems, especially in practical applications. Moreover, due to the lack of sufficient training samples, Hughes phenomena occur with huge amount of data. Feature selection can be used in order to get rid of huge data problems. In this paper, a band selection framework is introduced to reduce the data size and to find out the most proper spectral bands for a specific application. The method is based on finding "dominant sets" in hyperspectral data, so that spectral bands are clustered. From each cluster, the band that reflects the cluster behavior the most is selected to form the most valuable band set in the spectra for a specific application. The proposed feature selection method has low computational complexity since it performs on a small size of data when realizing the feature selection. The aim of the study is to find out a general framework that can define required bands for classification without requiring to perform on the whole data set. Results on Pavia and Salinas datasets show that the proposed framework performs better than the state-of-the-art feature selection methods in terms of classification accuracy.
Jian Xu, Xinxiong Jiang, Yi Bao, Yuchen Zheng, Xuhui Chen, Qiang Xu, Siyang Liao, Deping Ke, Xiaoqi Gao
Artificial-intelligence (AI) workloads are driving rapid growth in data-center electricity use and rack power density, increasing demand for power-delivery systems that are efficient and robust to fast load transients. Conventional uninterruptible power supply (UPS) based AC distribution chains involve multiple conversion stages and line-frequency transformers, which compound losses and are less compatible with dynamic AI power profiles. Although solid-state transformers (SSTs) and 800 VDC distribution architecture are widely discussed, implementable topology/control details, and long-horizon validation with realistic operating profiles remain limited. This paper develops an SST-driven 800 VDC architecture that converts 10 kV MVAC to an 800V LVDC bus using a three-phase H-bridge AC/DC stage cascaded with a dual-active-bridge (DAB) DC/DC stage. A coordinated closed-loop control scheme, combining rectifier voltage/current regulation and DAB phase-shift control, is designed to maintain DC-bus voltage stability. The proposed system is implemented on the real-time digital simulation (RTDS) platform and evaluated via sequential simulations using real-world day- and month-scale operating profiles of data centers, benchmarked against a UPS supply chain. Numerical studies demonstrate tight 800 VDC regulation, reduced input-side energy consumption compared with the UPS baseline, and satisfactory power-quality performance. A capacitance sensitivity test quantifies tradeoffs between DC-bus ripple and low-frequency input-power oscillations, yielding a practical capacitance range for design. Overall, the work provides a reproducible evaluation workflow and actionable guidance for next-generation AI data centers.
Maiken Borud Omtveit, Qian Long, Valentin Chabaud, Marte Ruud-Olsen, Steinar Halsne, Tor-Christian Ystgaard
This paper presents an innovative offshore solution where oil & gas platform clusters are powered by a wind farm and a hydrogen hub. The results show a feasible off-grid design as an alternative to conventional electrification solutions. To address the challenges of design and operation of such a system, a power system model of the equipment and control was developed in a power system simulator called Process Power Simulator (PPSim). Power fluctuations in the wind farm are modelled using a state-of-the-art method encompassing turbulence and wakes. Various operation scenarios were used to evaluate the system design and find the right equipment size. An expensive component to over dimension is the battery energy storage system (BESS). The BESS power rating and energy capacity were found by running a combination of scenarios with extreme and natural wind variations, and contingencies. The control strategy and ramp rates of electrolyzers have significant impact on both system performance and design. A ramp rate in the order of seconds as opposed to minutes will decrease the required BESS size by 60-70%. Choosing synchronized control of the electrolyzers can further reduce the BESS size by 15-20%. The simulations also revealed challenges to achieve self-sufficiency of hydrogen and potential design improvements are suggested.
Yinan Li, Hasti Seifi
Environmental sounds like footsteps, keyboard typing, or dog barking carry rich information and emotional context, making them valuable for designing haptics in user applications. Existing audio-to-vibration methods, however, rely on signal-processing rules tuned for music or games and often fail to generalize across diverse sounds. To address this, we first investigated user perception of four existing audio-to-haptic algorithms, then created a data-driven model for environmental sounds. In Study 1, 34 participants rated vibrations generated by the four algorithms for 1,000 sounds, revealing no consistent algorithm preferences. Using this dataset, we trained Sound2Hap, a CNN-based autoencoder, to generate perceptually meaningful vibrations from diverse sounds with low latency. In Study 2, 15 participants rated its output higher than signal-processing baselines on both audio-vibration match and Haptic Experience Index (HXI), finding it more harmonious with diverse sounds. This work demonstrates a perceptually validated approach to audio-haptic translation, broadening the reach of sound-driven haptics.
Mohammad Ali Vahedifar, Qi Zhang
Comments This paper has been accepted at IEEE INFOCOM 2026
Tactile Internet (TI) requires ultra-low latency and high reliability to ensure stability and transparency in touch-enabled teleoperation. However, variable delays and packet loss present significant challenges to maintaining immersive haptic communication. To address this, we propose a predictive framework that integrates Discrete Mode Decomposition (DMD) with Shapley Mode Value (SMV) for accurate and timely haptic signal prediction. DMD decomposes haptic signals into interpretable intrinsic modes, while SMV evaluates each mode's contribution to prediction accuracy, which is well-aligned with the goal-oriented semantic communication. Integrating SMV with DMD further accelerates inference, enabling efficient communication and smooth teleoperation even under adverse network conditions. Extensive experiments show that DMD+SMV, combined with a Transformer architecture, outperforms baseline methods significantly. It achieves 98.9% accuracy for 1-sample prediction and 92.5% for 100-sample prediction, as well as extremely low inference latency: 0.056 ms and 2 ms, respectively. These results demonstrate that the proposed framework has strong potential to ease the stringent latency and reliability requirements of TI without compromising performance, highlighting its feasibility for real-world deployment in TI systems.
Mihails Milehins, Dan B. Marghitu
Comments 8 pages; (v2) minor amendments, further references
The article describes fundamental analytical properties of an unforced mechanical oscillator with a Duhem-type viscoelastoplastic hysteretic element. These properties include global existence of solutions, uniqueness of solutions, and convergence of each solution to an equilibrium point.
Carlos Rafael Nogueira da Silva, Maria Cecilia Luna Alvarado, Fernando Darío Almeida García, Michel Daoud Yacoub
Journal ref 2026 IEEE Transactions on Communication
The Lognormal distribution is a fundamental statistical model widely used in different fields of science, including biology, finance, economics, engineering, etc. In wireless communications, it is the primary statistic for large-scale fading modeling. However, its known analytical intractability presents persistent channel characterization and performance analysis challenges. This paper introduces two effective and mathematically tractable surrogate models for the Lognormal distribution, constructed from the product of Nakagami-$m$ and Inverse Nakagami-$m$ (I-Nakagami-$m$) variates. These models yield asymptotically exact closed-form expressions for key performance metrics -- including the characteristic function, bit error rate, and Shannon's capacity -- and enable analytically tractable expressions for the probability density function and cumulative distribution function of the composite $α$-$μ$-Lognormal fading model. To facilitate implementation, a moment-matching framework is developed to map the Lognormal parameters to the surrogate model parameters. In addition, a random mixture approach is proposed to enhance convergence by exploiting the complementary approximation properties of the Nakagami-$m$ and I-Nakagami-$m$ distributions. The methodology is further extended to heterogeneous cascaded fading channels comprising arbitrary combinations of $α$-$μ$, $κ$-$μ$, and $η$-$μ$ variates, for which moment-based mappings to the equivalent Lognormal distributions are derived. Numerical results confirm the accuracy and efficiency of the proposed approach, positioning it as a practical and reliable alternative to exact Lognormal statistics.
扫码添加微信好友,提出您的宝贵建议 👇
💡 备注请填写:网站反馈