Data Selection Effects on Self-Supervised Learning of Audio Representations for French Audiovisual Broadcasts
Comments To be published in the Fifteenth International Conference on Language Resources and Evaluation (LREC 2026)
Valentin Pelloin, Lina Bekkali, Reda Dehak, David Doukhan
Audio and speech self-supervised encoder models are now widely used for many different tasks. Many of these models are trained on clean, segmented speech content such as LibriSpeech. In this paper, we look into how the pretraining datasets of such SSL (Self-Supervised Learning) models impact their downstream results. We build a large pretraining corpus of highly diverse TV and radio broadcast audio content, which we describe with automatic tools. We use these annotations to build smaller subsets, which we use to train audio SSL models. Then, we evaluate the models on multiple downstream tasks such as automatic speech recognition, voice activity and music detection, or speaker recognition. The results show the potential of pretraining SSL models on diverse audio content without restricting it to speech. We also perform a membership inference attack to evaluate the encoders' ability to memorize their training datasets, which highlights the importance of data deduplication. This unified training could bridge the speech and music machine learning communities.
Muazzem Hussain Khan, Tasdid Hasnain, Md. Jamil khan, Ruhul Amin, Md. Shamim Reza, Md. Al Mehedi Hasan, Md Ashad Alam
Comments 25 pages, 9 figures
In this study, we propose a deep Swin-Vision Transformer-based transfer learning architecture for robust multi-cancer histopathological image classification. The proposed framework integrates a hierarchical Swin Transformer with ResNet50-based convolutional feature extraction, enabling the model to capture both long-range contextual dependencies and fine-grained local morphological patterns within histopathological images. To validate the efficiency of the proposed architecture, extensive experiments were conducted on a comprehensive multi-cancer dataset including Breast Cancer, Oral Cancer, Lung and Colon Cancer, Kidney Cancer, and Acute Lymphocytic Leukemia (ALL); both original and segmented images were analyzed to assess model robustness across heterogeneous clinical imaging conditions. Our approach is benchmarked against several state-of-the-art CNN and transfer learning models, including DenseNet121, DenseNet201, InceptionV3, ResNet50, EfficientNetB3, multiple ViT variants, and Swin Transformer models. All models were trained and validated using a unified pipeline incorporating balanced data preprocessing, transfer learning, and fine-tuning strategies. The experimental results demonstrate that the proposed architecture consistently achieves superior performance, reaching 100% test accuracy on the lung-colon cancer and segmented leukemia datasets, and up to 99.23% accuracy for breast cancer classification. The model also achieves near-perfect precision, recall, and F1 score, indicating highly stable scores across diverse cancer types. Overall, the proposed model establishes a highly accurate, interpretable, and robust multi-cancer classification system, providing a strong benchmark for future research and a unified comparative assessment useful for designing reliable AI-assisted histopathological diagnosis and clinical decision-making.
Mohammad Ali Vahedifar, Mojtaba Nazari, Qi Zhang
The Tactile Internet demands sub-millisecond latency and ultra-high reliability, as high latency or packet loss could lead to haptic control instability. To address this, we propose the Mode-Domain Architecture (MDA), a bilateral predictive neural network architecture designed to restore missing signals on both the human and robot sides. Unlike conventional models that extract features implicitly from raw data, MDA utilizes a novel Continuous-Orthogonal Mode Decomposition framework. By integrating an orthogonality constraint, we overcome the pervasive issue of "mode overlapping" found in state-of-the-art decomposition methods. Experimental results demonstrate that this structured feature extraction achieves high prediction accuracies of 98.6% (human) and 97.3% (robot). Furthermore, the model achieves ultra-low inference latency of 0.065 ms, significantly outperforming existing benchmarks and meeting the stringent real-time requirements of haptic teleoperation.
Junqi Liu, Yun Zhang, Xiaoxia Huang, Long Xu, Weisi Lin
Comments Submitted to IEEE Transactions on Circuits and Systems for Video Technology
Just Recognizable Difference (JRD) boosts coding efficiency for machine vision through visibility threshold modeling, but is currently limited to a single-task scenario. To address this issue, we propose a Multi-Task JRD (MT-JRD) dataset and an Attribute-assisted MT-JRD (AMT-JRD) model for Video Coding for Machines (VCM), enhancing both prediction accuracy and coding efficiency. First, we construct a dataset comprising 27,264 JRD annotations from machines, supporting three representative tasks including object detection, instance segmentation, and keypoint detection. Secondly, we propose the AMT-JRD prediction model, which integrates Generalized Feature Extraction Module (GFEM) and Specialized Feature Extraction Module (SFEM) to facilitate joint learning across multiple tasks. Thirdly, we innovatively incorporate object attribute information into object-wise JRD prediction through the Attribute Feature Fusion Module (AFFM), which introduces prior knowledge about object size and location. This design effectively compensates for the limitations of relying solely on image features and enhances the model's capacity to represent the perceptual mechanisms of machine vision. Finally, we apply the AMT-JRD model to VCM, where the accurately predicted JRDs are applied to reduce the coding bit rate while preserving accuracy across multiple machine vision tasks. Extensive experimental results demonstrate that AMT-JRD achieves precise and robust multi-task prediction with a mean absolute error of 3.781 and error variance of 5.332 across three tasks, outperforming the state-of-the-art single-task prediction model by 6.7% and 6.3%, respectively. Coding experiments further reveal that compared to the baseline VVC and JPEG, the AMT-JRD-based VCM improves an average of 3.861% and 7.886% Bjontegaard Delta-mean Average Precision (BD-mAP), respectively.
Luke Jacobs, Ishfaq Aziz, Benhao Lu, Alireza Tabatabaeenejad, Mohamad Alipour, Elahe Soltanaghai
Soil moisture is a critical variable for managing irrigation, improving crop yield, and understanding field-scale hydrology. Radars mounted on unmanned aerial vehicles (UAVs) offer a promising means to monitor soil moisture over large fields with flexible, high-resolution coverage. However, during the growing season, canopy scattering and soil reflections become strongly coupled in the radar measurement. These coupled effects vary with crop structure or flight altitude, complicating the retrieval of soil moisture. To overcome this challenge, we present GreenScatter, a physics-based soil moisture retrieval framework for nadir-looking wideband UAV radars. GreenScatter introduces a microwave radiative transfer model that explicitly captures the dominant electromagnetic interactions between vegetation and soil, enabling accurate modeling of coherent ground backscatter through canopy. In parallel, it develops a radar cross-section (RCS) estimation method that transforms time-domain radar signals into calibrated wideband RCS spectra, isolating soil reflections while compensating for hardware and waveform effects. Together, these components enable robust soil moisture estimation through vegetation across varying canopy conditions and UAV configurations. Field experiments across multiple corn and soybean sites demonstrate consistent retrieval with an average volumetric water content (VWC) error of 4.49%.
Bhaskar Varma, Ying Shuai Quan, Karl D. von Ellenrieder, Paolo Falcone
Comments Submitted to CDC 2026 with L-CSS Parallel option
In this letter, we consider the problem of decentralized decision making among connected autonomous vehicles at unsignalized intersections, where existing centralized approaches do not scale gracefully under mixed maneuver intentions and coordinator failure. We propose a closed-loop opinion-dynamic decision model for intersection coordination, where vehicles exchange intent through dual signed networks: a conflict-topology-based communication network and a commitment-driven belief network that enable cooperation without a centralized coordinator. Continuous opinion states modulate velocity optimizer weights prior to commitment; a closed-form predictive feasibility gate then freezes each vehicle's decision into a GO or YIELD commitment, which propagates back through the belief network to pre-condition neighbor behavior ahead of physical conflicts. Crossing order emerges from geometric feasibility and arrival priority without the use of joint optimization or a solver. The approach is validated across three scenarios spanning fully competitive, merge, and mixed conflict topologies. The results demonstrate collision-free coordination and lower last-vehicle exit times compared to first-come, first-served (FCFS) in all configurations with non-trivial conflicts.
Gokce Hacioglu, Serkan Vela
Multiple access techniques are vital for 5G and beyond. While Orthogonal Frequency Division Multiple Access (OFDMA) is standard, its high peak-to-average power ratio (PAPR) reduces energy efficiency in uplink transmissions. This paper presents Periodic OFDMA (P-OFDMA), a novel multiple access scheme with reduced PAPR and computational complexity. By assigning subcarriers in a periodic pattern across the entire frequency band, P-OFDMA enhances frequency diversity and simplifies allocation. We also introduce two precoded variants: P-OFDMA-DCT and P-OFDMA-DFT. Comprehensive simulations comparing P-OFDMA with OFDMA and SC-FDMA show that P-OFDMA-DFT consistently achieves the lowest PAPR. Furthermore, the standard P-OFDMA scheme outperforms SC-FDMA in PAPR for low subcarrier-per-user scenarios and achieves better bit error rate (BER) performance under high delay-spread conditions. Notably, P-OFDMA and its variants reduce transmitter-side processing by up to an eightfold factor compared to SC-FDMA, greatly benefiting low-complexity uplink devices. Although receiver complexity increases, the overall system processing load decreases, yielding improved energy efficiency. Thus, P-OFDMA offers a robust, energy-efficient uplink solution for future wireless networks.
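The PAPR figure of merit discussed above can be illustrated with a minimal sketch. It assumes PAPR is measured as peak over mean instantaneous power of one time-domain symbol, and it uses a hypothetical `periodic_map` helper that places one user's symbols on every `period`-th subcarrier. This mapping is only an illustration of a periodic assignment and is not claimed to be the exact P-OFDMA scheme.

```python
import cmath
import math


def idft(freq):
    """Naive inverse DFT: frequency-domain grid -> time-domain samples."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def papr_db(samples):
    """Peak-to-average power ratio of a block of samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def periodic_map(symbols, n_subcarriers, offset, period):
    """Hypothetical helper: put symbols on subcarriers offset, offset+period, ..."""
    grid = [0j] * n_subcarriers
    for i, s in enumerate(symbols):
        grid[offset + i * period] = s
    return grid


# Four QPSK symbols of one user spread over a 16-subcarrier band.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
grid = periodic_map(qpsk, n_subcarriers=16, offset=1, period=4)
print(f"PAPR: {papr_db(idft(grid)):.2f} dB")
```

A single active subcarrier yields a constant-envelope signal (0 dB PAPR), which is a convenient sanity check for the helper functions.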
Ignacio Santamaria, Mohammad Soleymani, Jesus Gutierrez, Eduard Jorswieck
Comments 12 pages, 5 figures
Beyond-diagonal reconfigurable intelligent surfaces (BD-RISs) significantly improve wireless performance by allowing tunable interconnections among elements, but their design in multiple-input multiple-output (MIMO) systems has so far relied on complex iterative algorithms or suboptimal approximations. This work introduces a simple yet powerful approach: instead of directly maximizing the achievable rate, we maximize the absolute value of the determinant of the equivalent MIMO channel. We derive a closed-form symmetric unitary scattering matrix whose rank is exactly twice the channel's degrees of freedom ($2r$). Remarkably, this low-rank solution achieves the same determinant value as the optimal unitary BD-RIS. Using log-majorization theory, we prove that the rate loss relative to the optimal unitary BD-RIS vanishes at high signal-to-noise ratio (SNR) or when the number of BD-RIS elements becomes large. Moreover, the proposed solution can be perfectly implemented using a $q$-stem BD-RIS architecture with only $q=2r-1$ stems, requiring a minimum number of reconfigurable circuits. The resulting Max-Det solution is orders of magnitude faster to compute than existing iterative methods while achieving near-optimal rates in practical scenarios. This makes high-performance BD-RIS deployment feasible even with large surfaces and limited computational resources.
Ziwei Li, Lukuang Dong, Saierdaer Yusuyin, Xianyu Zhao, Zhijian Ou
Comments Update after INTERSPEECH2026 submission
Integrating pretrained speech encoders with large language models (LLMs) is promising for ASR, but performance and data efficiency depend on the speech-language interface. A common choice is a learned projector that maps encoder features into the LLM embedding space, whereas an alternative is to expose discrete phoneme sequences to the LLM. Using the same encoder and LLM backbones, we compare phoneme-based and vanilla projector-based interfaces in high-resource English and low-resource Tatar. We also propose a BPE-phoneme interface that groups frequent local phoneme patterns while preserving explicit word-boundary cues for phoneme-to-grapheme generation. On LibriSpeech, the phoneme-based interface is competitive with the vanilla projector, and the BPE-phoneme interface yields further gains. On Tatar, the phoneme-based interface substantially outperforms the vanilla projector. We further find that phoneme supervision yields a phoneme-informed hybrid interface that is stronger than the vanilla projector.
Carl R. Richardson, Jichen Zhang, Ethan King, Ján Drgoňa
A novel stability-enhanced Gaussian process variational autoencoder (SEGP-VAE) is proposed for indirectly training a low-dimensional linear time invariant (LTI) system, using high-dimensional video data. The mean and covariance function of the novel SEGP prior are derived from the definition of an LTI system, enabling the SEGP to capture the indirectly observed latent process using a combined probabilistic and interpretable physical model. The search space of LTI parameters is restricted to the set of semi-contracting systems via a complete and unconstrained parametrisation. As a result, the SEGP-VAE can be trained using unconstrained optimisation algorithms. Furthermore, this parametrisation prevents numerical issues caused by the presence of a non-Hurwitz state matrix. A case study applies SEGP-VAE to a dataset containing videos of spiralling particles. This highlights the benefits of the approach and the application-specific design choices that enabled accurate latent state predictions.
Xiaohan Wang, Chen Wu, Dawei Zhao, Guangwei Gao, Dianjie Lu, Guijuan Zhang, Linwei Fan, Xu Lu, Shuai Wu, Hang Wei, Zhuoran Zheng
Considering efficiency, ultra-high-definition (UHD) low-light image restoration is extremely challenging. Existing methods based on Transformer architectures or high-dimensional complex convolutional neural networks often suffer from the "memory wall" bottleneck, failing to achieve millisecond-level inference on edge devices. To address this issue, we propose a novel real-time UHD low-light enhancement network based on geometric feature fusion using Clifford algebra in 2D Euclidean space. First, we construct a four-layer feature pyramid with gradually increasing resolution, which decomposes input images into low-frequency and high-frequency structural components via a Gaussian blur kernel, and adopts a lightweight U-Net based on depthwise separable convolution for dual-branch feature extraction. Second, to resolve structural information loss and artifacts from traditional high-low frequency feature fusion, we introduce spatially aware Clifford algebra, which maps feature tensors to a multivector space (scalars, vectors, bivectors) and uses Clifford similarity to aggregate features while suppressing noise and preserving textures. In the reconstruction stage, the network outputs adaptive Gamma and Gain maps, which perform physically constrained non-linear brightness adjustment via Retinex theory. Integrated with FP16 mixed-precision computation and dynamic operator fusion, our method achieves millisecond-level inference for 4K/8K images on a single consumer-grade device, while outperforming state-of-the-art (SOTA) models on several restoration metrics.
Jinquan Yan, Zhicheng Zhao, Zhengzheng Tu, Chenglong Li, Jin Tang, Bin Luo
UAV images are critical for applications such as large-area mapping, infrastructure inspection, and emergency response. However, in real-world flight environments, a single image is often affected by multiple degradation factors, including rain, haze, and noise, undermining downstream task performance. Current unified restoration approaches typically rely on implicit degradation representations that entangle multiple factors into a single condition, causing mutual interference among heterogeneous corrections. To this end, we propose DAME-Net, a Degradation-Aware Mixture-of-Experts Network that decouples explicit degradation perception from degradation-conditioned reconstruction for compositional UAV image restoration. Specifically, we design a Factor-wise Degradation Perception Module (FDPM) to provide explicit per-factor degradation cues for the restoration stage through multi-label prediction with label-similarity-guided soft alignment, replacing implicit entangled conditions with interpretable and generalizable degradation descriptions. Moreover, we develop a Conditioned Decoupled MoE Module (CDMM) that leverages these cues for stage-wise conditioning, spatial-frequency hybrid processing, and mask-constrained decoupled expert routing, enabling selective factor-specific correction while suppressing irrelevant interference. In addition, we construct the Multi-Degradation UAV Restoration benchmark (MDUR), the first large-scale UAV benchmark for compositional UAV image restoration, with 43 degradation configurations from single degradations to four-factor composites and standardized seen/unseen splits. Extensive experiments on MDUR demonstrate consistent improvements over representative unified restoration methods, with greater gains on unseen and higher-order composite degradations. Downstream experiments further validate benefits for UAV object detection.
Tianyu Zhou, Zihao Liang, Zehui Lu, Shaoshuai Mou
This paper presents an online intention prediction framework for estimating the goal state of autonomous systems in real time, even when intention is time-varying, and system dynamics or objectives include unknown parameters. The problem is formulated as an inverse optimal control / inverse reinforcement learning task, with the intention treated as a parameter in the objective. A shifting horizon strategy discounts outdated information, while online control-informed learning enables efficient gradient computation and online parameter updates. Simulations under varying noise levels and hardware experiments on a quadrotor drone demonstrate that the proposed approach achieves accurate, adaptive intention prediction in complex environments.
Matheus J. A. Oliveira, Israel F. Araujo, José R. de Oliveira Neto, Juliano B. Lima
Comments 16 pages, 8 figures
This paper presents a generalized circuit framework for constructing Shih-type fractionalizations of unitary operators of dyadic order, i.e., operators $U$ satisfying $U^{2^n}=I$. Building upon the architecture of the quantum fractional Fourier transform (QFrFT), we show that fractionalization can be implemented coherently as a weighted superposition of integer powers, $\sum_k c_k(\alpha)U^k$, where the coefficients are generated through an ancilla-domain quantum Fourier transform and a diagonal phase modulation. Under the assumption that controlled implementations of the required powers of $U$ are available, the resulting circuit yields a parameterized family of operators that interpolates the integer powers of $U$ and satisfies the additive property of fractional transforms. As concrete applications, we derive explicit quantum circuit realizations of the quantum fractional Hartley transform (QFrHT) and of the fractional cosine-transform families associated with Types I and IV. These constructions demonstrate the versatility of the proposed dyadic-order fractionalization framework for structured operators arising in quantum signal processing.
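The weighted superposition of integer powers can be checked numerically with a small classical sketch. It assumes one common choice of coefficients for an operator of order $N$, namely $c_k(\alpha) = \frac{1}{N}\sum_{m=0}^{N-1} e^{2\pi i m(\alpha-k)/N}$. This choice is an assumption made for illustration and may differ from the paper's coefficients by a branch convention. The function names are hypothetical.

```python
import cmath
import math


def shih_coefficients(alpha, order):
    """Assumed coefficients c_k(alpha), k = 0..order-1, for U with U**order = I."""
    return [
        sum(cmath.exp(2j * math.pi * m * (alpha - k) / order) for m in range(order))
        / order
        for k in range(order)
    ]


def cyclic_convolve(a, b):
    """Coefficients of (sum_j a_j U^j)(sum_k b_k U^k), using U**n = I."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


order = 4  # e.g. the Fourier transform, whose fourth power is the identity

# Integer alpha recovers the corresponding integer power of U.
print([round(abs(c), 6) for c in shih_coefficients(1, order)])

# Additivity: composing orders 0.3 and 0.5 matches order 0.8.
lhs = cyclic_convolve(shih_coefficients(0.3, order), shih_coefficients(0.5, order))
rhs = shih_coefficients(0.8, order)
print(max(abs(x - y) for x, y in zip(lhs, rhs)))
```

Because $U^N = I$, multiplying two such superpositions reduces to a cyclic convolution of their coefficient vectors, which is why the additive property can be verified without constructing $U$ itself.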
Tolga Girici, Meng Hua, Deniz Gündüz
Comments Accepted to VTC 2026 Spring, Nice, France
We study the problem of implementing a fully-connected layer of a neural network using wireless over-the-air computing. We assume a multi-hop system with a multi-antenna transmitter and receiver, along with a number of amplify-and-forward relay devices in between. We formulate an optimization problem over the transmitter precoder, receiver combiner, and amplify-and-forward gains, subject to relay and transmitter power constraints. We propose an alternating optimization framework that optimizes the imitation accuracy. Simulation results reveal that multi-hop relaying achieves almost perfect classification accuracy when used in a neural network.
Ndagijimana Cyprien, Mehdi Sookhak, Hosein Zarini, Chandra N Sekharan, Mohammed Atiquzzaman
Joint deployment of unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) has been shown to be an effective method to establish communications in areas affected by disasters. However, ensuring good Quality of Service (QoS) while using as few UAVs as possible also requires optimal positioning and trajectory planning for UAVs and UGVs. This paper proposes a joint positioning and trajectory planning framework for UAV and UGV deployment that guarantees optimal QoS for ground users. To model the UGVs' mobility, we introduce a road graph, which directs their movement along valid road segments and adheres to the road network constraints. To solve the sum rate optimization problem, we reformulate it as a Markov Decision Process (MDP) and propose a novel Asynchronous Advantage Actor-Critic (A3C) algorithm incorporating meta-learning for rapid adaptation to new environments and dynamic conditions. Numerical results demonstrate that our proposed Meta-A3C approach outperforms A3C and DDPG, delivering 13.1% higher throughput and 49% faster execution while meeting the QoS requirements.
Anil Alan, Bart De Schutter
Comments 8 pages, final version for ECC 2026
We study feasibility guarantees for safety filters developed using Control Barrier Functions (CBFs) when a safe set is defined using the pointwise minimum of continuously differentiable functions, a construction that is common for the backup CBF (BCBF) method and typically nonsmooth. We replace the minimum by its log-sum-exp (soft-min) smoothing and show that, under a strict safety condition, the smooth function becomes a CBF (or extended CBF) for a range of the smoothing parameter. For compact safe sets, we derive an explicit lower bound on the smoothing parameter that makes the smooth function a CBF and hence renders the corresponding safety constraint feasible. For unbounded sets, we introduce tail conditions under which the smooth function satisfies an extended CBF condition uniformly. Finally, we apply these results to BCBFs. We show that safety of a compact (terminal) backup set under a backup controller, together with a condition ensuring safety of the backup trajectories on the relevant boundary of the safe set, is sufficient for constraint feasibility for BCBFs. These results provide a recipe for a priori feasibility guarantees for smooth inner approximations of nonsmooth safe sets without the need for additional online certification.
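The log-sum-exp smoothing can be sketched numerically as follows. The sketch assumes the common convention $\tilde{h}_\kappa(x) = -\frac{1}{\kappa}\log\sum_i e^{-\kappa h_i(x)}$ with smoothing parameter $\kappa > 0$. The symbol $\kappa$ and the function name `soft_min` are illustrative and not taken from the paper.

```python
import math


def soft_min(values, kappa):
    """Log-sum-exp smoothing of min(values); larger kappa gives a tighter fit."""
    m = min(values)
    # Shift by the true minimum so the exponentials cannot overflow.
    return m - math.log(sum(math.exp(-kappa * (v - m)) for v in values)) / kappa


# Values of three constraint functions h_i at some state x (made-up numbers).
h = [0.5, 1.0, 2.0]
for kappa in (1.0, 10.0, 100.0):
    print(kappa, soft_min(h, kappa))
```

Under this convention the smoothed value never exceeds the true minimum and lies within $\log(N)/\kappa$ of it for $N$ functions, so the superlevel set of the smooth function is an inner approximation of the original safe set.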
Thomas Lew, Marcus Greiff, John Subosits, Brian Plancher
Comments European Control Conference (ECC) 2026
Proximal methods such as the Alternating Direction Method of Multipliers (ADMM) are effective at solving constrained quadratic programs (QPs). To tackle infeasible QPs, slack variables are often introduced to ensure feasibility, which changes the structure of the problem, increases its size, and slows down numerical resolution. In this letter, we propose a simple ADMM scheme to tackle QPs with slack variables without increasing the size of the original problem. The only modification is a slightly different projection in the z-update, while the rest of the algorithm remains standard. We prove that the method is equivalent to applying ADMM to the QP with additional slack variables, even though slack variables are not added. Numerical experiments show speedups of the approach.
Jianing Chen, Sichen Qian, Chuangyin Dang, Sitian Qin
Comments Accepted by IEEE Transactions on Automatic Control
This paper mainly investigates a class of distributed Variational Generalized Nash Equilibrium (VGNE) seeking problems for both online noncooperative games and online aggregative games with time-varying coupling inequality constraints. Two novel continuous-time distributed VGNE seeking algorithms are proposed, which realize the constant regret bound and sublinear fit bound, superior to those of the criteria for online optimization problems and online games. Furthermore, to reduce unnecessary communication among players, a dynamic event-triggered mechanism involving internal variables is introduced into the distributed VGNE seeking algorithm, while the constant regret bound and sublinear fit bound are still maintained. Also, the Zeno behavior is strictly prohibited. Moreover, we further investigate the impact of communication noise on the player's measurement of its neighbors' relative states. It is demonstrated that both the regret and fit bounds remain valid as long as the noise level is not excessively large. This result reveals, to some extent, the proposed algorithm's noise-resilient capability. Finally, an online Uncrewed Aerial Vehicle (UAV) swarm game and an online Nash-Cournot game are given to demonstrate the validity of the theoretical results.
Gautier Hénique, William Le, Gabriel Dayan, Coralie Brodeur, Kristoff Nelson, Apostolos Christopoulos, Edith Filion, Phuc-Felix Nguyen-Tan, Laurent Letourneau-Guillon, Houda Bahig, Samuel Kadoury
Extranodal extension (ENE) is an emerging prognostic factor in human papillomavirus (HPV)-associated oropharyngeal cancer (OPC), although it is currently omitted as a clinical staging criterion. Recent works have advocated for the inclusion of imaging-detected ENE (iENE) as a prognostic marker in HPV-positive OPC staging. However, several practical limitations continue to hinder its clinical integration, including inconsistencies in segmentation, low contrast in the periphery of metastatic lymph nodes on CT imaging, and laborious manual annotations. To address these limitations, we propose a fully automated end-to-end pipeline that uses computed tomography (CT) images with clinical data to assess the status of nodal ENE and predict treatment outcomes. Our approach includes a hierarchical 3D semi-supervised segmentation model designed to detect and delineate relevant iENE from radiotherapy planning CT scans. From these segmentations, a set of radiomics and deep features is extracted to train an iENE grading classifier. The predicted ENE status is then evaluated for its prognostic value and compared with existing staging criteria. Furthermore, we integrate these nodal features with primary tumor characteristics in a multimodal, attention-based outcome prediction model, providing a dynamic framework for outcome prediction. Our method is validated in an internal cohort of 397 HPV-positive OPC patients treated with radiation therapy or chemoradiotherapy between 2009 and 2020. For outcome prediction at the 2-year mark, our pipeline surpassed baseline models with 88.2% (4.8) in AUC for metastatic recurrence, 79.2% (7.4) for overall survival, and 78.1% (8.6) for disease-free survival. We also obtain a concordance index of 83.3% (6.5) for metastatic recurrence, 71.3% (8.9) for overall survival, and 70.0% (8.1) for disease-free survival, making it feasible for clinical decision making.
Sami Leon Noel Aziz Hanna, Nicolas Hoischen, Sandra Hirche, Armin Lederer
Comments Accepted at the European Control Conference (ECC)
Koopman operator-based methods enable data-driven bilinear representations of unknown nonlinear control systems. Accurate representations often demand significantly higher dimensions than the original system, making control design challenging. Control Lyapunov Functions (CLFs) are widely used for controller synthesis, with quadratic CLF candidates being the most common due to their simplicity. Yet, we show that this class is highly restrictive, especially when the state dimension is large: under mild conditions, their existence implies stabilizability of the bilinear system by a constant input -- that is, the control remains fixed over time. We establish this result by formulating a quadratically constrained quadratic program (QCQP) that exactly characterizes valid CLFs. Since QCQPs are NP-hard, we propose a convex semidefinite relaxation that offers a sufficient validity condition. For single-input systems, we prove that a quadratic CLF requires constant control stabilizability, and empirically demonstrate that this extends to high-dimensional multi-input systems in many cases.
Jiaxiang Wang, Zhaohui Yang, Mingzhe Chen, Mohammad Shikh-Bahaei
Comments 6 pages, 3 figures, accepted by ICC 2026 workshop
This paper presents a Semantic Feature Multiple Access (SFMA) framework for multi-user semantic communication in downlink wireless systems. By extending SwinJSCC to a two-user superimposition paradigm, SFMA enables simultaneous semantic transmission to multiple users over shared time-frequency resources. A key innovation is the Cross-User Attention (CUA) module, which facilitates controlled semantic feature exchange between paired users by leveraging inter-image similarity while mitigating interference. We formulate a joint user pairing and resource allocation problem to minimize global semantic distortion under constraints on bandwidth, end-to-end latency, and energy. This mixed-integer non-convex problem is decomposed into a Minimum-Weight Perfect Matching (MWPM) sub-problem and a convex bandwidth allocation feasibility check, with semi-closed-form bandwidth bounds derived from a strictly concave rate expression. A polynomial-time algorithm based on Blossom matching and bisection search is proposed. Extensive simulations on ImageNet-100 show that SFMA significantly improves reconstruction quality across pairing modes, and the proposed optimization effectively reduces overall distortion while satisfying physical-layer constraints.
Jiaxiang Wang, Zhouxiang Zhao, Yahao Ding, Zhijin Qin, Zhaohui Yang, Mingzhe Chen, Mohammad Shikh-Bahaei
Comments 13 pages, 8 figures
Integrated learning and communication (ILAC) unifies learned transceivers with radio resource management, where semantic feature multiple access (SFMA) enables paired users to superpose their learned representations over shared time-frequency resources. Unlike conventional multiple access schemes, SFMA interference arises in the learned feature space and depends jointly on the user pair, the transmit power, and the compression ratio. This coupling ties binary pairing decisions to continuous resource variables, yielding a mixed-integer non-convex optimization problem. To address this problem, we first propose similarity-conditioned SFMA (SC-SFMA), a Swin Transformer-based transceiver whose dual-conditioned similarity modulator (DC-SimM) gates cross-user feature fusion according to the inter-user semantic similarity. We then characterize the resulting pair-dependent interference by a bivariate logistic function parameterized by transmit power and compression ratio, thereby bridging the learned transceiver with network-level optimization. On this basis, we formulate a sum-rate maximization problem subject to per-user distortion, latency, energy, power, and bandwidth constraints. To solve this problem, we develop a three-block alternating optimization algorithm that integrates dual-decomposition-assisted compression ratio allocation, trust-region successive convex approximation (SCA) for joint power-bandwidth optimization, and dynamic feasible graph-based user pairing. Simulation results show that SC-SFMA achieves considerable peak signal-to-noise ratio (PSNR) and multi-scale structural similarity index measure (MS-SSIM) gains over deep joint source-channel coding (JSCC) and separation-based baselines. The proposed optimization framework attains significant sum rate improvements over conventional multiple access baselines.
Veronica Centorrino, Rawan Hoteit, Efe C. Balta, John Lygeros
Comments 8 Pages, 3 Figures
This paper studies equality-constrained minimization problems through the lens of feedback control. We introduce a unified control-theoretic framework by showing that a PID feedback law acting on the dual variable induces the PID saddle-point flow (PID-SPF), a broad class of saddle-point dynamics associated with the augmented Lagrangian. This framework recovers several classical primal-dual flows as special cases. We prove that the equilibria of the proposed flow coincide with the stationary points of the original problem. Our analysis reveals how the feedback gains affect the optimization: integral action enforces constraint satisfaction, proportional action introduces the augmented Lagrangian structure, and derivative action modifies the geometry of the primal dynamics by inducing a state-dependent Riemannian metric. Moreover, for convex problems with affine constraints, we establish global exponential convergence by leveraging contraction theory for all admissible PID gains, providing in the process explicit bounds on the convergence rate. Finally, we validate our theoretical results on numerical examples including an application to bilevel optimization.
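As a rough illustration of the feedback view described above, the following sketch simulates only the proportional-plus-integral case on a toy problem: minimize 0.5*||x - c||^2 subject to a.x = b. The gain names `kp` and `ki`, the forward-Euler discretization, the step size, and the example problem are all assumptions made for illustration; derivative action and the paper's general PID-SPF are not reproduced.

```python
def pi_saddle_point_flow(c, a, b, kp=1.0, ki=1.0, dt=0.01, steps=5000):
    """Forward-Euler simulation of a PI-controlled primal-dual flow.

    Toy problem (assumed): minimize 0.5*||x - c||^2 subject to a.x = b.
    Returns the final primal iterate and the dual estimate ki * integral.
    """
    x = [0.0] * len(c)
    integral = 0.0  # accumulated constraint error (integral action)
    for _ in range(steps):
        error = sum(ai * xi for ai, xi in zip(a, x)) - b
        u = kp * error + ki * integral  # PI feedback on the constraint violation
        # Primal descent on the objective gradient plus a^T u; the kp term
        # is the gradient of the augmented-Lagrangian penalty.
        x = [xi - dt * ((xi - ci) + ai * u) for xi, ci, ai in zip(x, c, a)]
        integral += dt * error
    return x, ki * integral


x_final, dual = pi_saddle_point_flow(c=[1.0, 2.0], a=[1.0, 1.0], b=1.0)
```

For this toy problem the constrained minimizer is x = (0, 1) with multiplier 1, which the simulated flow approaches as the number of steps grows.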
Samuel Bianchi, Klaas P. Pruessmann
Comments 31 pages, 10 figures, 1 table
Purpose: Image reconstruction in challenging scenarios requires accurate characterisations of coil sensitivity profiles, local off-resonances (B0) and effective encoding fields. Reconstruction methods utilising all of this information rely on signal models that are not compatible with the classical Fourier/k-space interpretation of the coil data. Hence, the FFT and related techniques are no longer applicable, rendering image reconstruction computationally demanding. Methods: This article contains a workflow for accurate sensitivity and B0 mapping as well as other required processing steps. An implementation of non-Fourier SENSE reconstruction is provided that is well suited for execution on a GPU using the FFT. Important practical aspects like stopping criteria and sources of image artifacts are analyzed and documented. Results: Highly performant image reconstruction was demonstrated on a 2D and a 3D spiral dataset. These datasets contain trajectories featuring readout durations up to 71.5 ms and undersampling factors up to R = 7. Running the reconstruction on a GPU greatly boosts reconstruction speed. Stopping the reconstruction at the right moment is crucial for image quality. All methods included in this article are available in a public code repository. Conclusion: The provided implementation of non-Fourier SENSE reconstruction is highly performant. When it is executed on a GPU, runtimes become feasible in practice. The presented workflow ensures robust and accurate computation of coil sensitivity profiles and off-resonance maps.
Wongi Jeong, Hoigi Seo, Se Young Chun
Image generative models have become indispensable tools to yield exquisite high-resolution (HR) images for everyone, ranging from general users to professional designers. However, a desired outcome often requires generating a large number of HR images with different prompts and seeds, resulting in high computational cost for both users and service providers. Generating low-resolution (LR) images first could alleviate the computational burden, but it is not straightforward to generate LR images that are perceptually consistent with their HR counterparts. Here, we consider the task of generating high-fidelity LR images, called Previews, that preserve perceptual similarity to their HR counterparts for an efficient workflow, allowing users to identify promising candidates before generating the final HR image. We propose the commutator-zero condition to ensure LR-HR perceptual consistency for flow matching models, leading to the proposed training-free solution with downsampling matrix selection and commutator-zero guidance. Extensive experiments show that our method can generate LR images with up to 33% computation reduction while maintaining HR perceptual consistency. When combined with existing acceleration techniques, our method achieves up to 3× speedup. Moreover, our formulation can be extended to image manipulations, such as warping and translation, demonstrating its generalizability.
Riccardo Morselli, Davide Tebaldi, Roberto Zanasi
In this paper, a new discrete-time approach to modeling clutch engagement and disengagement in a two-speed powershift is proposed. The core idea is the development of a model for computing the exact torque needed to achieve clutch engagement, covering both single clutch engagement and simultaneous clutch engagement (full lock condition). Based on this, the control logic for the clutch engagement and disengagement phases is also developed. The advantages in terms of real-time applicability with respect to the continuous-time version are shown through extensive simulation results.
Xiangyu Dong, Ran Yang, Songjie Yang, Weidong Mei, Lipeng Zhu, Yue Xiu, Zhongpei Zhang
Flexible-geometry arrays based on movable antennas have shown considerable potential for improving wireless communication performance. In this letter, we investigate a multiuser multiple-input single-output (MU-MISO) downlink secure communication system aided by a flexible cylindrical array (FCLA) and artificial noise (AN), where each antenna element rotates along circular tracks while the circular slices move along a vertical axis. To guarantee transmission security, we aim to maximize the achievable sum rate at multiple legitimate information receivers by jointly optimizing transmit beamforming, AN covariance matrix, and antenna placement under secrecy constraints for an eavesdropper. Since the resulting problem is intractable to solve directly, we develop a block coordinate descent (BCD)-based framework that combines the Lagrangian dual transform, tight semidefinite relaxation (SDR), and Nesterov-accelerated projected gradient descent (PGD). Numerical results show that the proposed algorithm converges rapidly and achieves significant sum-rate gains over benchmark schemes by exploiting the geometric flexibility of the array.
Elias Milios, Felix Berkel, Felix Gruber, Melanie N. Zeilinger, Kim P. Wabersich
Model Predictive Control (MPC) offers safe and near-optimal control but suffers from high computational costs. Approximate MPC (AMPC) mitigates this by learning a cheaper surrogate policy, typically by training a neural network on state-MPC input pairs. Generating training data is a major bottleneck, requiring solving the MPC for numerous states sampled from its feasible set. Since this feasible set is implicitly defined and unknown, efficient sampling is nontrivial but crucial. We propose the linear MPC Hit-and-Run (LMPC-HR) sampler for linear MPC with polyhedral constraints. We identify the feasible set boundaries along search directions, a crucial step within HR, by formulating the problem as a convex linear program, replacing expensive iterative searches with a single optimization step. A numerical study demonstrates that LMPC-HR achieves an order of magnitude reduction in computation time for generating uniformly distributed samples from the feasible set compared to naive baselines.
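To make the Hit-and-Run idea concrete, here is a minimal sketch of the generic sampler on an explicitly given, bounded polytope {x : A x <= b}. With explicit constraints, the chord along each random direction follows from a simple ratio test. This is an assumption-laden simplification: the abstract's LP-based boundary search for the implicitly defined MPC feasible set is not reproduced, and the function name, tolerance, and box example are hypothetical.

```python
import math
import random


def hit_and_run(A, b, x0, n_samples, seed=0):
    """Generic Hit-and-Run on a bounded polytope {x : A x <= b}.

    x0 must be strictly feasible. Returns a list of sampled points whose
    distribution approaches uniform as the chain grows.
    """
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        # Draw a direction uniformly on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(di * di for di in d))
        d = [di / norm for di in d]
        # Ratio test: find the interval [t_lo, t_hi] with A (x + t d) <= b.
        t_lo, t_hi = -math.inf, math.inf
        for row, bi in zip(A, b):
            ad = sum(r * di for r, di in zip(row, d))
            slack = bi - sum(r * xi for r, xi in zip(row, x))
            if ad > 1e-12:
                t_hi = min(t_hi, slack / ad)
            elif ad < -1e-12:
                t_lo = max(t_lo, slack / ad)
        # Move to a uniformly chosen point on the feasible chord.
        t = rng.uniform(t_lo, t_hi)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples


# Hypothetical example: the unit box 0 <= x1, x2 <= 1.
A_box = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b_box = [1.0, 0.0, 1.0, 0.0]
points = hit_and_run(A_box, b_box, x0=[0.5, 0.5], n_samples=200)
```

When the feasible set is only implicitly defined, as in the MPC setting, the ratio test is no longer available, which is where a dedicated boundary search becomes necessary.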
Yixuan Zhu, Bo Zhang, Yinkang Gao, Haoyuan Ren, Cheng Tang, Caixu Zhao, Lei Gong, Teng Wang, Wenqi Lou, Xi Li
In real-time systems, both individual task execution and data propagation must meet strict timing constraints. Cause-effect (CE) chains are widely used to analyze such behaviors in terms of end-to-end latency. However, this analysis can be distorted by timing anomalies (TAs), in which a local reduction in execution time leads to an increase in the overall end-to-end latency. As a result, precisely analyzing the upper bounds of the latency becomes challenging, and such systems typically exhibit larger upper bounds than TA-eliminated systems. Existing studies either eliminate TAs by completely sacrificing average latency to simplify analysis or, despite adopting complex safe analysis methods, fail to eliminate TAs effectively and still incur high latencies. To address this issue, we identify two basic causes of TAs in end-to-end latency. Based on these causes, we propose the first treatment that eliminates TAs in the latency with negligible average latency loss using Deterministic Data Flow (DDF). We further formally prove its TA-free property. Consequently, a precise upper bound on the latency is obtained when all jobs execute with their worst-case execution times. Experimental results show that it effectively reduces the maximum end-to-end latency, the average latency, and latency jitter compared with the state-of-the-art (SOTA) method.