arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
All subject categories (2086)
2605.05214 2026-05-08 eess.SP cs.AI cs.LG

MedMamba: Recasting Mamba for Medical Time Series Classification

ZhengXiao He, Huayu Li, Xiwen Chen, Janet M Roveda, Jinghao Wen, Siyuan Tian, Ao Li

Abstract

Medical time series, such as electrocardiograms (ECG) and electroencephalograms (EEG), exhibit complex temporal dynamics and structured cross-channel dependencies, posing fundamental challenges for automated analysis. Conventional convolutional and recurrent models struggle to capture long-range dependencies, while Transformer-based approaches incur quadratic complexity and often introduce redundant interactions that are misaligned with the intrinsic structure of physiological signals. To address these limitations, we propose MedMamba, a principle-driven multi-scale bidirectional state space architecture tailored for medical time series classification. Our design is guided by three key inductive biases of physiological signals: spatial centralization, multi-timescale temporal composition, and non-causal contextual dependency. These principles are instantiated through a lightweight channel-mixing module for cross-channel reparameterization, multi-scale convolutional tokenization for temporal decomposition, and bidirectional Mamba blocks for efficient global context modeling with linear complexity. Extensive experiments on six benchmark datasets spanning EEG, ECG, and human activity signals demonstrate that MedMamba consistently outperforms state-of-the-art methods across diverse modalities. Notably, it achieves 85.97% accuracy on PTB and establishes new state-of-the-art performance on the challenging ADFTD dataset (54.72% accuracy and 52.01% F1-score). Strong results on long-sequence benchmarks, such as SleepEDF, further validate its capability in modeling long-range dependencies. Moreover, MedMamba achieves a speedup of 4.6x in inference, highlighting its practicality for real-time clinical deployment. These results suggest that principle-guided state space modeling offers an effective and scalable alternative to Transformer-based approaches for medical time series analysis.
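The following minimal sketch makes the "bidirectional, linear-complexity" idea concrete. It assumes a fixed scalar recurrence with hypothetical parameters `a` and `b`. This is a plain linear scan, not Mamba's selective (input-dependent) state space update and not MedMamba's actual block. A forward pass and a backward pass are summed, so every position sees both past and future samples at O(T) cost.

```python
from typing import List


def linear_scan(x: List[float], a: float, b: float) -> List[float]:
    """Causal scalar recurrence h_t = a * h_{t-1} + b * x_t, one pass, O(T)."""
    h = 0.0
    out = []
    for x_t in x:
        h = a * h + b * x_t
        out.append(h)
    return out


def bidirectional_scan(x: List[float], a: float = 0.5, b: float = 1.0) -> List[float]:
    """Add a forward scan to a backward scan (run on the reversed input and
    flipped back), so each position mixes past and future context in O(T)."""
    forward = linear_scan(x, a, b)
    backward = linear_scan(x[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


# An impulse in the middle of the signal influences outputs on both sides.
print(bidirectional_scan([0.0, 0.0, 1.0, 0.0, 0.0]))  # [0.25, 0.5, 2.0, 0.5, 0.25]
```

A single causal scan would leave the first two outputs at zero. The backward pass is what lets earlier positions respond to a later event.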

2605.05212 2026-05-08 eess.SP cs.HC cs.LG

MPNet: A Robust and Efficient Manifold Pooling Network for Multi-Rhythm EEG Signal Decoding

Guoqing Cai, Kai Zeng, Shoulin Huang, Ting Ma

Abstract

Deep Riemannian networks provide a powerful framework for Electroencephalography (EEG) decoding, but their practical applications are severely constrained. Accurately decoding EEG signals requires modeling complex temporal dynamics across multiple rhythms, which results in high-dimensional Riemannian inputs and significant computational costs. To address this, we propose the Manifold Pooling Network (MPNet). MPNet uses a rhythm-adaptive convolutional frontend to extract comprehensive time-frequency representations and generate multi-view Riemannian nodes. A novel manifold node pooling layer then aggregates these nodes into a single fixed-size fusion node, enabling the subsequent deep Riemannian network to process it at greatly reduced cost. Experiments on two public EEG datasets show that MPNet achieves state-of-the-art accuracy, runs up to 10 times faster than a comparable Riemannian model, and maintains robust performance under limited-data conditions. These findings highlight MPNet's practicality and efficiency for real-world EEG applications.

2605.05211 2026-05-08 q-fin.PR cs.AI cs.LG q-fin.ST

A Review of Large Language Models for Stock Price Forecasting from a Hedge-Fund Perspective

Olivia Zhang, Zhilin Zhang

Comments: Accepted at the IEEE Conference on Artificial Intelligence, Spain, May 8–10, 2026

Abstract

Large language models (LLMs) are increasingly deployed in quantitative finance for stock price forecasting. This review synthesizes recent applications of LLMs in this domain, including extracting sentiment from financial news and social media, analyzing financial reports and earnings-call transcripts, tokenizing or symbolizing stock price series, and constructing multi-agent trading systems. Particular attention is paid to practical pitfalls that are often understated in the literature, such as fragility in sentiment analysis, dataset and horizon design, performance evaluation metrics, data leakage, illiquidity premia, and limits of stock price predictability. Organized from a hedge-fund perspective, the review is intended to guide both academic researchers and hedge fund managers in integrating LLMs into real-world trading pipelines and in stress-testing their robustness under realistic market frictions.
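Data leakage, one of the pitfalls listed above, often enters through the train/test split itself. The sketch below is a minimal chronological walk-forward split with an embargo gap. It assumes samples are already sorted by time, and `walk_forward_splits` and its parameters are hypothetical, not taken from the review.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(n: int, train_size: int, test_size: int,
                        gap: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs over time-ordered samples.

    Every training index precedes every test index, and `gap` samples are
    skipped in between (an embargo) so that labels built from overlapping
    forward-return horizons cannot leak from the test window into training.
    """
    start = 0
    while start + train_size + gap + test_size <= n:
        train = list(range(start, start + train_size))
        test_start = start + train_size + gap
        test = list(range(test_start, test_start + test_size))
        yield train, test
        start += test_size  # roll the window forward by one test block


for train, test in walk_forward_splits(n=10, train_size=4, test_size=2, gap=1):
    print(train, test)
```

A random shuffle split would place future observations in the training set. This split forbids that by construction.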

2605.04723 2026-05-08 cs.IR cs.LG

Rethinking Convolutional Networks for Attribute-Aware Sequential Recommendation

Shereen Elsayed, Ngoc Son Le, Ahmed Rashed, Lars Schmidt-Thieme

Comments: Accepted at IJCAI-ECAI 2026

Abstract

Attribute-aware sequential recommendation entails predicting the next item a user will interact with based on a chronologically ordered history of past interactions, enriched with item attributes. Existing methods typically leverage self-attention mechanisms to aggregate the entire sequence into a unified representation used for next-item prediction. While effective, these models often suffer from high computational complexity and memory consumption, limiting their ability to process long user histories. This constraint restricts the model's capacity to fully capture long-term user preferences. In some scenarios, modeling item interactions purely through attention may also not be the most effective way to extract sequential patterns. In this work, we propose ConvRec, an alternative method with linear computational and memory complexity that employs convolutional layers in a hierarchical, down-scaled fashion to generate compact yet expressive sequence representations. To further enhance the model's ability to capture diverse sequential patterns, each layer gradually aggregates neighboring items to reach a comprehensive sequence representation. Extensive experiments on four real-world datasets demonstrate that our approach outperforms state-of-the-art sequential recommendation models, highlighting the potential of convolution-based architectures for efficient and effective sequence modeling in recommendation systems. Our implementation code and datasets are available at https://github.com/ismll-research/ConvRec.
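The hierarchical, down-scaled aggregation can be sketched as repeated pairwise merging. This is a simplified illustration under stated assumptions: a fixed, hypothetical kernel of width 2 and stride 2, no nonlinearity, no attribute embeddings, and a non-empty input. It is not the ConvRec implementation from the linked repository. It only shows why the total work stays linear in the sequence length.

```python
from typing import List

Vector = List[float]


def conv_downsample(seq: List[Vector], kernel: List[float]) -> List[Vector]:
    """One layer: a width-2, stride-2 convolution that merges each pair of
    neighboring items; a trailing unpaired item is carried through unchanged."""
    out = []
    for i in range(0, len(seq) - 1, 2):
        left, right = seq[i], seq[i + 1]
        out.append([kernel[0] * lv + kernel[1] * rv for lv, rv in zip(left, right)])
    if len(seq) % 2 == 1:
        out.append(seq[-1])
    return out


def hierarchical_encode(seq: List[Vector], kernel: List[float]) -> Vector:
    """Apply the layer until one vector remains. Total work is
    n + n/2 + n/4 + ... = O(n), i.e. linear in the sequence length."""
    if not seq:
        raise ValueError("need at least one item")
    while len(seq) > 1:
        seq = conv_downsample(seq, kernel)
    return seq[0]


# Four 1-D item embeddings -> two merged items -> one sequence representation.
print(hierarchical_encode([[1.0], [2.0], [3.0], [4.0]], kernel=[0.5, 0.5]))  # [2.5]
```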

2605.04400 2026-05-08 cs.IT cs.LG math.IT

Contextual Memory-Enhanced Source Coding for Low-SNR Communications

Ziqiong Wang, Rongpeng Li

Abstract

While Separate Source-Channel Coding (SSCC) retains the practical benefits of modular system design, its effectiveness in noisy text transmission is fundamentally constrained by the fragility of autoregressive source decoding. In low-SNR regimes, even a small number of residual bit errors after channel decoding may derail the subsequent lossless reconstruction process, especially when Arithmetic Coding (AC) relies on Large Language Model (LLM)-based probability estimation. Existing remedies either strengthen channel decoding based solely on channel observations or introduce contextual information only at the receiver for post-hoc correction, yet neither fully addresses the fragility of source probability modeling under residual channel errors. To this end, this paper proposes a Memory-Augmented Source Coding (MASC) scheme for robust SSCC-based transmission. Rather than treating context as external side information, MASC internalizes contextual patterns into a source model shared by both the transmitter-side source encoder and the receiver-side source decoder. Specifically, MASC employs a shared Parameterized Contextual Memory (PCM) to encode multi-order $n$-gram patterns, and further introduces a Mixture-of-Memory-Experts Router (MMER) to perform sparse, hidden-state-dependent routing over memory experts during autoregressive source modeling. By adaptively activating only the most relevant memories at each coding step, MASC refines source probability estimation, shortens average codelength, and mitigates the sensitivity of source decoding to residual channel errors. Extensive experiments over Rayleigh fading and AWGN channels demonstrate the effectiveness of the proposed scheme compared with state-of-the-art methods.
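This toy sketch shows the two ingredients the abstract combines: sparse top-k routing over memory experts, and the arithmetic-coding cost of about -log2 p bits per symbol. The router scores and expert tables here are fixed, hypothetical numbers. In MASC the routing depends on the model's hidden state and the experts encode multi-order n-gram patterns, and neither is modeled here.

```python
import math
from typing import Dict, List


def route_top_k(scores: List[float], k: int) -> Dict[int, float]:
    """Sparse routing: softmax over the k highest-scoring experts only;
    every other expert implicitly gets weight 0."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def mix(expert_probs: List[List[float]], weights: Dict[int, float]) -> List[float]:
    """Weighted mixture of the selected experts' next-symbol distributions."""
    vocab = len(expert_probs[0])
    return [sum(w * expert_probs[i][s] for i, w in weights.items()) for s in range(vocab)]


def codelength_bits(probs: List[float], symbol: int) -> float:
    """Ideal arithmetic-coding cost of emitting `symbol`: about -log2 p bits."""
    return -math.log2(probs[symbol])


experts = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]  # toy next-symbol tables
weights = route_top_k([0.0, 2.0, -1.0], k=1)    # only expert 1 is activated
p = mix(experts, weights)
print(weights, p, codelength_bits(p, 0))
```

A sharper, correctly routed distribution lowers the bits spent on the actual symbol. This is the sense in which better source probability estimates shorten the average codelength.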

2605.03061 2026-05-08 stat.ML cs.LG q-bio.QM stat.ME

Dynamic Vine Copulas: Detecting and Quantifying Time-Varying Higher-Order Interactions

Houman Safaai, Alessandro Marin Vargas

Abstract

Time-varying dependence is often modeled with dynamic correlations or Gaussian graphical models, but multivariate systems can change through tail behavior, asymmetry, or conditional structure even when correlations are nearly stable. We introduce Dynamic Vine Copulas (DVC), a temporal vine-copula framework for estimating and diagnosing sequence-wide non-Gaussian dependence. DVC fixes a chosen vine factorization for comparability; the framework applies to C-, D-, and R-vines, and our experiments use fixed-root-order C-vines. Pair-copula states evolve through smooth parameter trajectories or temporally regularized family-switching paths. The main diagnostic is a held-out comparison between a full vine and its matched 1-truncated version, which separates flexible first-tree pairwise dependence from evidence contributed by higher-tree conditional terms. At the population level, under a correct fixed vine and the simplifying assumption, this contrast equals the higher-tree component of a vine total-correlation decomposition; in finite samples, it is a predictive diagnostic. In controlled benchmarks, DVC detects Student-t degrees-of-freedom changes, Clayton-to-Gumbel switches, and recurrent conditional-interaction episodes missed or conflated by Gaussian dynamic baselines. The higher-tree score remains near zero in pairwise-only regimes and rises during conditional-interaction regimes. On Allen Visual Behavior Neuropixels data, DVC identifies a reproducible time-indexed higher-tree signal that is positive across held-out splits and vanishes under a decorrelated null, indicating simultaneous cross-area dependence. DVC therefore provides a flexible temporal copula model and an interpretable test of whether temporal dependence changes are pairwise or conditional.
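The Gaussian special case gives some intuition for the higher-tree term, because its decomposition has a closed form. This sketch assumes a trivariate Gaussian copula and a C-vine rooted at variable 1, an illustrative choice. DVC itself fits non-Gaussian, time-varying pair copulas and uses a held-out predictive contrast rather than this population formula. The total correlation $-\tfrac12\log\det R$ splits into first-tree pair terms plus a conditional term driven by the partial correlation $\rho_{23\mid 1}$.

```python
import math
from typing import Tuple


def partial_corr(r12: float, r13: float, r23: float) -> float:
    """Partial correlation of variables 2 and 3 given variable 1."""
    return (r23 - r12 * r13) / math.sqrt((1 - r12 ** 2) * (1 - r13 ** 2))


def gaussian_tc_split(r12: float, r13: float, r23: float) -> Tuple[float, float]:
    """Split the total correlation (nats) of a trivariate Gaussian copula along
    a C-vine rooted at variable 1: tree 1 holds pairs (1,2) and (1,3); tree 2
    holds the conditional pair (2,3 | 1)."""
    first_tree = -0.5 * math.log(1 - r12 ** 2) - 0.5 * math.log(1 - r13 ** 2)
    higher_tree = -0.5 * math.log(1 - partial_corr(r12, r13, r23) ** 2)
    return first_tree, higher_tree


def gaussian_tc(r12: float, r13: float, r23: float) -> float:
    """Total correlation -0.5 * log det R for the 3x3 correlation matrix R."""
    det = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    return -0.5 * math.log(det)


# Pairwise-only regime: r23 is fully explained by variable 1, so tree 2 adds nothing.
print(gaussian_tc_split(0.5, 0.5, 0.25))
# Conditional interaction: extra 2-3 dependence shows up in the higher-tree term.
print(gaussian_tc_split(0.5, 0.5, 0.6), gaussian_tc(0.5, 0.5, 0.6))
```

The split is exact here because $\det R = (1-\rho_{12}^2)(1-\rho_{13}^2)(1-\rho_{23\mid 1}^2)$ for any valid $3\times 3$ correlation matrix.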

2605.01669 2026-05-08 stat.ML cs.LG stat.ME

PRCD-MAP: Learning How Much to Trust Imperfect Priors in Causal Discovery

Xihang Shan, Da Zhou

Abstract

External priors of unknown reliability create a brittle trade-off in causal discovery: blind trust amplifies errors, blind rejection wastes signal. Real priors are also heterogeneously reliable -- physical laws are trustworthy, LLM-suggested edges are speculative -- yet existing methods either ignore priors or impose them through globally uniform trust. We propose PRCD-MAP, a soft prior-consumption layer that assigns per-edge trust to an imperfect prior and uses it to modulate a prior-aware $\ell_1$ and prior-weighted $\ell_2$ regularizer in a MAP objective. Trust is calibrated by empirical Bayes on a Laplace-approximated marginal likelihood and propagated along the prior graph by an MLP, so data-confirmed neighborhoods boost trust and contradictions suppress it. PRCD-MAP enjoys a population-level safety guarantee: it is $\varepsilon$-safe in expectation over the prior-generation distribution, with $\varepsilon\leq C\cdot\mathrm{acc}(1{-}\mathrm{acc})\cdot d^2/T$ at the parametric $T^{-1}$ rate and vanishing at the prior-quality endpoints. When the prior is uninformative, learned trust provably collapses to its floor and the method recovers a no-prior baseline. Empirically, on real CausalTime data PRCD-MAP exploits informative LLM priors (LLM-prior gain $+0.067/+0.089$ AUROC on AQI/Medical over a no-prior PRCD-MAP backbone; combined backbone+prior lead $+0.123/+0.043$ over PCMCI+), auto-attenuates on the anonymous-variable Traffic stress test, and retains a lead at $d{=}300$; against BayesDAG, the closest soft-Bayesian baseline, PRCD-MAP wins on every CausalTime dataset under a matched $W_0$-only protocol. A four-way ablation isolates each component: EB calibration and MLP trust propagation jointly carry the plurality of the gain, with positive sign on every dataset. Extensions to nonlinear (NAM) and cross-sectional settings show the calibrated-trust principle is setting-agnostic.

2605.01297 2026-05-08 cs.CY cs.AI

Are we Doomed to an AI Race? Why Self-Interest Could Drive Countries Towards a Moratorium on Superintelligence

Edward Roussel, Lode Lauwaert, Torben Swoboda, Grant Ramsey, Risto Uuk, Leonard Dung, Anthony Aguirre

Comments: 19 pages, 3 figures

Abstract

This paper uses game theory to argue that, contrary to the prevailing view, a moratorium on Artificial Superintelligence (ASI) can be in a state's self-interest. By formalizing strategic interactions between geopolitical superpowers, we model the trade-off between the benefits of technological supremacy and the catastrophic risks of uncontrolled ASI. The analysis reveals that once the perceived cost of loss of control becomes sufficiently large relative to the other parameters, imposing a moratorium becomes in each state's self-interest. We further provide empirical evidence suggesting that the global perception of ASI risk is rising, making a stable, rational moratorium increasingly plausible in the current geopolitical landscape.
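A toy two-state game illustrates the threshold effect described above. The payoff structure below is a hypothetical simplification, not the paper's model. Racing alone yields a supremacy benefit, being out-raced costs the same amount, and each racing state adds an expected loss-of-control cost `p_loss * cost` that both states bear. In this toy setup racing is dominant when `benefit > p_loss * cost`, and pausing is dominant when `p_loss * cost > benefit`.

```python
from itertools import product
from typing import List, Tuple

ACTIONS = ("pause", "race")


def payoff(me: str, other: str, benefit: float, p_loss: float, cost: float) -> float:
    """Hypothetical payoff for one state. Racing alone earns +benefit (supremacy),
    being out-raced costs -benefit, and every racing state adds p_loss * cost of
    expected loss-of-control damage that both states bear."""
    n_racing = (me == "race") + (other == "race")
    if me == "race" and other == "pause":
        supremacy = benefit
    elif me == "pause" and other == "race":
        supremacy = -benefit
    else:
        supremacy = 0.0
    return supremacy - p_loss * cost * n_racing


def pure_nash(benefit: float, p_loss: float, cost: float) -> List[Tuple[str, str]]:
    """Enumerate action profiles where neither state gains by deviating alone."""
    def u(me: str, other: str) -> float:
        return payoff(me, other, benefit, p_loss, cost)

    return [
        (a, b)
        for a, b in product(ACTIONS, repeat=2)
        if u(a, b) >= max(u(x, b) for x in ACTIONS)
        and u(b, a) >= max(u(y, a) for y in ACTIONS)
    ]


print(pure_nash(benefit=1.0, p_loss=0.1, cost=5.0))   # low perceived cost
print(pure_nash(benefit=1.0, p_loss=0.1, cost=20.0))  # high perceived cost
```

With these numbers, raising the perceived cost from 5 to 20 moves the unique equilibrium from mutual racing to a mutual moratorium.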

2605.00062 2026-05-08 eess.IV cs.LG

RETO: A Rotary-Enhanced Transformer Operator for High-Fidelity Prediction of Automotive Aerodynamics

Bojun Zhang, Huiyu Yang, Yunpeng Wang, Yuntian Chen, Yuanwei Bin, Rikui Zhang, Jianchun Wang

Abstract

Rapid aerodynamic evaluation is crucial for modern vehicle design, yet existing neural operators struggle to capture intricate spatial correlations. We propose the rotary-enhanced transformer operator (RETO), a novel neural solver featuring a dual-stage spatial awareness mechanism: sine-cosine encodings for global referencing and rotary positional encodings (RoPE) for relative displacements. RoPE encodes spatial relations via unitary rotations, enforcing translation invariance and enhancing local gradient resolution. RETO is validated on ShapeNet and the high-fidelity DrivAerML benchmark. On ShapeNet, RETO achieves a relative $L_2$ error of 0.063, outperforming RegDGCNN at 0.125 and representing a 16% improvement over the Transolver baseline, which yields an error of 0.075. These performance gains are further amplified on the DrivAerML dataset, where RETO achieves relative $L_2$ errors of 0.089 for surface pressure and 0.097 for velocity. In comparison, Transolver yields errors of 0.116 and 0.121 for the same metrics, indicating that RETO achieves precision enhancements of 23% and 19%, respectively. For comprehensive comparison, the surface pressure and velocity errors for AB-UBT are 0.102 and 0.124, while RegDGCNN yields 0.235 and 0.312, respectively. Information-theoretical analysis shows that the entropy peak of RETO at 0.35 is significantly lower than that of Transolver at 0.75 under $10^4$ resolution, indicating a focused attentional mechanism capable of preserving localized gradients against global diffusion.
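The "relative displacements" property of RoPE can be checked numerically. Rotating query and key by their positions makes their dot product depend only on the offset between them. This is a minimal one-dimensional sketch using the standard RoPE frequency schedule, with `base = 10000` assumed as the common default. The abstract does not specify how RETO extends RoPE to vehicle-surface coordinates.

```python
import math
from typing import List


def rope(vec: List[float], pos: float, base: float = 10000.0) -> List[float]:
    """Rotary encoding: rotate each pair (vec[j], vec[j+1]) by pos * base**(-j/d),
    i.e. the usual base**(-2i/d) frequency for pair index i = j/2."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out: List[float] = []
    for j in range(0, d, 2):
        angle = pos * base ** (-j / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[j], vec[j + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.3]
# Same displacement (n - m = 4) at two absolute positions gives the same score.
print(dot(rope(q, 3.0), rope(k, 7.0)), dot(rope(q, 10.0), rope(k, 14.0)))
```

Because each 2-D rotation is orthogonal, $R(m)^{\top}R(n) = R(n-m)$, so shifting both positions by the same amount leaves the attention score unchanged.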

2604.27307 2026-05-08 stat.ML cs.LG

A Novel Computational Framework for Causal Inference: Tree-Based Discretization with ILP-Based Matching

Tianyu Yang, Md. Noor-E-Alam

Abstract

Causal inference is essential for data-driven decision-making, as it aims to uncover causal relationships from observational data. However, identifying causality remains challenging due to potential confounding and the gap between correlation and causation. While recent advances in causal machine learning and matching algorithms have improved estimation accuracy, these methods often face trade-offs between interpretability and computational efficiency. This paper proposes a novel approach that combines a tree-based discretization technique, tailored for causal inference, with an integer linear programming (ILP)-based matching algorithm. The discretization ensures approximately linear relationships for control datasets within strata, enabling effective matching, while the ILP formulation optimizes for global balance. The resulting algorithm is computationally efficient and yields less biased average treatment effect on the treated (ATT) estimates than state-of-the-art algorithms. Empirical evaluations demonstrate the proposed method's practical advantages over existing techniques in causal inference scenarios.

2604.22158 2026-05-08 math.OC cs.LG

Rate-Optimal Regret for the Safe Learning-based Control of the Constrained Linear Quadratic Regulator

Spencer Hutchinson, Nanfei Jiang, Mahnoosh Alizadeh

Abstract

We study the problem of adaptive control of the stochastic linear quadratic regulator (LQR) with constraints that must be satisfied at every time step. Prior work on the multidimensional problem has shown $\tilde{O}(T^{2/3})$ regret and satisfaction of robust constraints, leaving open the question of whether $\tilde{O}(\sqrt{T})$ regret can be attained in the constrained LQR setting. We contribute to this problem by showing $\tilde{O}(\sqrt{T})$ regret and satisfaction of chance constraints. This type of constraint allows us to handle unbounded noise and also enables analytical techniques not directly applicable to robust constraints. Our proposed algorithm for this problem uses an SDP to select an optimistic policy and then "scales back" this policy until it is verifiably safe. Our theoretical analysis establishes regret and constraint guarantees via a key lemma that bounds the system covariance in terms of the chosen policy. This covariance-based analysis contrasts with the cost-to-go-based analysis typically used in adaptive LQR.
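The key lemma ties constraint satisfaction to the closed-loop covariance. Here is a minimal scalar sketch under hypothetical assumptions: known dynamics, a linear feedback gain `k`, Gaussian noise, and a symmetric state bound `x_max`. The paper handles unknown, multidimensional systems. The sketch only shows that the stationary variance depends on the chosen policy, and that a Gaussian chance constraint reduces to a quantile check on it.

```python
import math
from statistics import NormalDist


def stationary_variance(a: float, b: float, k: float, sigma_w: float) -> float:
    """Stationary variance of x_{t+1} = a x_t + b u_t + w_t under u_t = -k x_t,
    w_t ~ N(0, sigma_w^2). It depends on the policy through a_cl = a - b k."""
    a_cl = a - b * k
    if abs(a_cl) >= 1.0:
        raise ValueError("closed loop is unstable; no stationary variance")
    return sigma_w ** 2 / (1.0 - a_cl ** 2)


def chance_constraint_ok(a: float, b: float, k: float, sigma_w: float,
                         x_max: float, delta: float) -> bool:
    """Check P(|x| <= x_max) >= 1 - delta for the zero-mean Gaussian stationary state."""
    std = math.sqrt(stationary_variance(a, b, k, sigma_w))
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)  # two-sided Gaussian quantile
    return z * std <= x_max


# A more aggressive gain shrinks the covariance and satisfies the constraint.
print(chance_constraint_ok(1.0, 1.0, 0.5, 1.0, x_max=3.0, delta=0.05))
print(chance_constraint_ok(1.0, 1.0, 0.1, 1.0, x_max=3.0, delta=0.05))
```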

2603.20531 2026-05-08 cs.DC cs.AI cs.CL cs.LG

Epistemic Observability in Language Models

Tony Mason, Vaastav Anand

Abstract

We find that models report highest confidence precisely when they are fabricating. Across four model families (OLMo-3, Llama-3.1, Qwen3, Mistral), self-reported confidence inversely correlates with accuracy, with AUC ranging from 0.28 to 0.36 where 0.5 is random guessing. We prove, under explicit formal assumptions, that this is not a capability gap but an observational one. Under text-only observation, where a supervisor sees only the model's output text, no monitoring system can reliably distinguish honest model outputs from plausible fabrications. We prove two results: first, that any policy conditioning only on the query cannot satisfy epistemic honesty across ambiguous world states; second, that no learning algorithm optimizing reward from a text-only supervisor can converge to honest behavior when the supervisor's observations are identical for both grounded and fabricated responses. Within our formal model, these impossibilities hold regardless of model scale or training procedure, including RLHF and instruction tuning. We construct a tensor interface that escapes the impossibility by exporting computational byproducts (per-token entropy and log-probability distributions) that are structurally coupled to correctness under standard training. Per-token entropy achieves pooled AUC 0.757, outperforming all text baselines by 2.5–3.9 percentage points at every budget level tested (10%, 20%, 30%). The entropy signal generalizes across architectures (Spearman $ρ = 0.762$). The core contribution is a cost surface: the empirical mapping from verification budget (the fraction of queries receiving expensive checks) to detection accuracy for each judge strategy, which serves as a practical lookup for system builders deciding how to allocate verification resources. The contribution is the map. The territory is the system you are building.
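The two measurements in the abstract, per-token entropy and AUC, can be sketched in a few lines. The AUC below uses the Mann-Whitney form: the probability that a positive outscores a negative, with ties counted as 1/2. So 0.5 is chance, and values below 0.5 indicate the inverse relationship reported for self-reported confidence. Averaging per-token entropy into one answer-level score is an assumed pooling choice, not necessarily the paper's.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def answer_uncertainty(token_dists: List[Sequence[float]]) -> float:
    """Hypothetical pooling: mean per-token entropy over one answer."""
    return sum(token_entropy(p) for p in token_dists) / len(token_dists)


def auroc(scores_pos: List[float], scores_neg: List[float]) -> float:
    """Mann-Whitney AUC: P(random positive scores higher than random negative)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


# Positives = fabricated answers; a useful signal gives them higher entropy.
fabricated = [answer_uncertainty([[0.5, 0.5], [0.4, 0.6]])]
grounded = [answer_uncertainty([[0.99, 0.01], [1.0]])]
print(auroc(fabricated, grounded))
```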

2603.13441 2026-05-08 stat.ML cond-mat.mtrl-sci cs.LG

Filtered Spectral Projection for Quantum Principal Component Analysis

Sk Mujaffar Hossain, Satadeep Bhattacharjee

Abstract

Quantum principal component analysis (qPCA) is commonly formulated as the extraction of eigenvalues and eigenvectors of a covariance-encoded density operator. Yet in many qPCA settings the practical goal is simpler: projection onto the dominant spectral subspace. Here we introduce a projection-first framework, the Filtered Spectral Projection Algorithm (FSPA), which bypasses explicit eigenvalue estimation while preserving the relevant spectral structure. FSPA amplifies any nonzero warm-start overlap with the leading subspace and remains robust in small-gap and near-degenerate regimes, without artificial symmetry breaking in the absence of bias. We show that FSPA achieves an oracle complexity $\mathcal{O}((\log(1/ε)+\log(1/|a_1|^2))/\log(λ_1/λ_2))$, which is tight by a matching lower bound, establishing it as an optimal projection primitive. We derive a convergence rate for degenerate spectra, give a circuit resource analysis with $n+\mathcal{O}(1)$ qubit overhead independent of system dimension, and extend the method to threshold spectral projection, Threshold-FSPA, which converges in $\mathcal{O}(\log(1/ε))$ calls when the threshold lies between eigenvalues. In the density matrix exponentiation access model, FSPA gives an exponential copy-complexity advantage over classical methods. For classical datasets, we show that for amplitude-encoded centered data the ensemble density matrix $ρ=\sum_i p_i|ψ_i\rangle\langleψ_i|$ equals the covariance matrix. Numerical tests on chemistry density matrices, noisy circuit outputs, Breast Cancer Wisconsin, handwritten Digits, and 1–4-qubit scalability confirm the theory. A minimal Qiskit implementation validates magnitude invariance, signal amplification, and no spurious symmetry breaking. These results establish FSPA as an optimal and deployable quantum spectral projection primitive.
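The claim about amplitude-encoded data can be checked directly. With $\psi_i = x_i/\lVert x_i\rVert$ and $p_i = \lVert x_i\rVert^2 / \sum_j \lVert x_j\rVert^2$, each term $p_i\,\psi_i\psi_i^{\top}$ reduces to $x_i x_i^{\top}/\sum_j \lVert x_j\rVert^2$. So $\rho$ is the scatter matrix divided by its trace, i.e. the covariance matrix of centered data up to the scale factor that gives $\rho$ unit trace. Below is a small classical sketch, assuming real-valued toy data.

```python
from typing import List

Matrix = List[List[float]]


def ensemble_density_matrix(data: List[List[float]]) -> Matrix:
    """rho = sum_i p_i |psi_i><psi_i|, with amplitude-encoded states
    psi_i = x_i / ||x_i|| and weights p_i = ||x_i||^2 / sum_j ||x_j||^2."""
    norms2 = [sum(v * v for v in x) for x in data]
    total = sum(norms2)
    d = len(data[0])
    rho = [[0.0] * d for _ in range(d)]
    for x, n2 in zip(data, norms2):
        if n2 == 0.0:
            continue  # a zero vector has weight p_i = 0 and no valid state
        p = n2 / total
        psi = [v / n2 ** 0.5 for v in x]
        for r in range(d):
            for c in range(d):
                rho[r][c] += p * psi[r] * psi[c]
    return rho


def unit_trace_scatter(data: List[List[float]]) -> Matrix:
    """sum_i x_i x_i^T divided by its trace; for centered data this is the
    sample covariance matrix rescaled to unit trace."""
    d = len(data[0])
    s = [[sum(x[r] * x[c] for x in data) for c in range(d)] for r in range(d)]
    tr = sum(s[i][i] for i in range(d))
    return [[v / tr for v in row] for row in s]


data = [[1.0, 2.0], [-1.0, -2.0], [0.5, -1.0], [-0.5, 1.0]]  # centered toy data
print(ensemble_density_matrix(data))
print(unit_trace_scatter(data))
```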

2603.04807 2026-05-08 stat.ML cs.LG

Does Sparse Connectivity Improve Generalization? Convolutional Networks Below the Edge of Stability

Tongtong Liang, Esha Singh, Rahul Parhi, Alexander Cloninger, Yu-Xiang Wang

Comments: Under review. Comments welcome!

Abstract

Gradient descent on overparameterized neural networks typically operates at the Edge of Stability (EoS), where the largest Hessian eigenvalue hovers around a step-size-dependent threshold. We study how sparse connectivity changes generalization below this threshold in two-layer ReLU networks. Prior results have shown that for fully-connected networks (FCNs), generalization guarantees in this regime degrade and become vacuous on high-dimensional spherical inputs. Our analysis reveals that sparse connectivity fundamentally alters this picture. Under sparse connectivity, the network processes a collection of low-dimensional patches rather than the full input vector, so the effective constraint imposed by the stability condition is governed by the geometry of the training patch collection. We prove that when the receptive fields are small relative to the ambient dimension, the effective constraint yields non-vacuous generalization bounds in precisely the spherical regime where FCNs provably fail. The same framework also reveals a contrasting failure mode: if the patch collection lacks geometric structure, the constraint becomes unable to prevent overfitting. We corroborate this theory by analyzing the patch geometry of natural images, showing that standard convolutional designs produce patch multisets with low-dimensional structure that facilitates generalization. This provides a principled explanation for the generalization advantage of convolutional networks. Thus, our analysis yields a unified framework that identifies how architecture, data geometry, and gradient descent jointly govern generalization performance.
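The step-size-dependent threshold has a one-line derivation on a quadratic. For $f(x) = \tfrac{\lambda}{2}x^2$, a gradient step with step size $\eta$ maps $x \mapsto (1-\eta\lambda)x$, which contracts only when $\lambda < 2/\eta$. Here is a minimal sketch of this, using a one-dimensional quadratic rather than a neural network.

```python
def gd_on_quadratic(curvature: float, lr: float, x0: float = 1.0, steps: int = 50) -> float:
    """Gradient descent on f(x) = curvature / 2 * x**2; returns |x| after `steps`.
    Each step multiplies x by (1 - lr * curvature)."""
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # the gradient of f is curvature * x
    return abs(x)


lr = 0.1  # stability threshold 2 / lr = 20
print(gd_on_quadratic(19.0, lr), gd_on_quadratic(21.0, lr))  # shrinks vs. blows up
```

Curvature just below $2/\eta$ still converges, though with oscillation, because the multiplier is $-0.9$. Just above the threshold, the multiplier $-1.1$ makes the iterates diverge.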

2602.23405 2026-05-08 cs.NE cs.LG

Isotropic Activation Functions Enable Deindividuated Neurons and Adaptive Topologies

George Bird

Comments: 33 pages, 5 figures. Updated changes: improved the main body text (same content), slightly modified the title and abstract, and updated the formatting for clarity and to comply with NeurIPS review submission requirements. The updated version reflects these changes.

Abstract

Introduced is a methodology for adapting the topology of dense neural networks, enabled by isotropic activation functions. Achieved through prescribed reparameterisation symmetries and singular-value decomposition of affine maps, this diagonalises layers into one-to-one, ordered connections. This makes it simpler to assess the impact of individual connections on the function. Low-impact neurons can be removed (neurodegeneration), and a thresholded buffer of largely inactive 'scaffold' neurons is maintained (neurogenesis). The symmetry-led diagonalisation and structural changes are function-invariant: they are demonstrated to be computationally identical during neurogenesis, arbitrarily well approximated during neurodegeneration, and enable asymptotic 50% parameter sparsification of dense networks with identically preserved function. Thus, real-time restructuring of the architecture in response to task demands (task appending, removal, or changes) is demonstrated. The approach is conceptually centred on primitive symmetry-prescriptions, from which isotropic functions are derived that feature explicit basis independence and lose the individuation of neurons implicit in typical elementwise functional forms. This allows freedom in the basis into which layers are decomposed and interpreted as individual artificial neurons, directly enabling this adaptive topology approach. Additionally, a new tunable model parameter, the 'intrinsic length', is introduced to improve this analytical invariance, alongside a generalised isotropic-perceptron architecture that enables parallel precomputation of all matrix-vector products and displays a nested functional class. Diagonalisation is suggested to offer new possibilities for interpretability and monitoring of isotropic networks.
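The basis independence that isotropic activations provide can be seen numerically. The function below applies tanh to the vector norm. It is one illustrative isotropic choice assumed for this sketch, not necessarily one the paper derives. The check shows that it commutes with a rotation, while an elementwise tanh does not.

```python
import math
from typing import List


def isotropic_tanh(x: List[float]) -> List[float]:
    """Example isotropic activation: squash the norm, keep the direction.
    f(x) = tanh(||x||) * x / ||x||, with f(0) = 0."""
    n = math.sqrt(sum(v * v for v in x))
    if n == 0.0:
        return [0.0] * len(x)
    scale = math.tanh(n) / n
    return [scale * v for v in x]


def elementwise_tanh(x: List[float]) -> List[float]:
    return [math.tanh(v) for v in x]


def rotate(x: List[float], angle: float) -> List[float]:
    """Rotate a 2-D vector: an orthogonal change of basis."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]


def max_gap(u: List[float], v: List[float]) -> float:
    return max(abs(a - b) for a, b in zip(u, v))


x, angle = [0.3, -1.2], 0.7
# Isotropic: f(Rx) == R f(x), so no basis (and no individual neuron) is privileged.
print(max_gap(isotropic_tanh(rotate(x, angle)), rotate(isotropic_tanh(x), angle)))
# Elementwise: the two orders disagree, so the standard basis is special.
print(max_gap(elementwise_tanh(rotate(x, angle)), rotate(elementwise_tanh(x), angle)))
```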

2602.14481 2026-05-08 cs.IT cs.AI math.IT

On the Rate-Distortion-Complexity Tradeoff for Semantic Communication

Jingxuan Chai, Yong Xiao, Guangming Shi

Comments: Accepted at IEEE Internet of Things Journal

Abstract

Semantic communication is a novel communication paradigm that focuses on conveying the user's intended meaning rather than the bit-wise transmission of source signals. One of the key challenges is to effectively represent and extract the semantic meaning of any given source signal. While deep learning (DL)-based solutions have shown promising results in extracting implicit semantic information from a wide range of sources, existing work often overlooks the high computational complexity inherent in both model training and inference for the DL-based encoder and decoder. To bridge this gap, this paper proposes a rate-distortion-complexity (RDC) framework that extends classical rate-distortion theory by incorporating constraints on semantic distance, covering both the traditional bit-wise distortion metric and statistical divergence-based metrics, and on complexity, using a measure adopted from the theory of minimum description length and the information bottleneck. We derive closed-form expressions for the minimum achievable rate under given constraints on semantic distance and complexity for both Gaussian and binary semantic sources. Our theoretical results show a fundamental three-way tradeoff among achievable rate, semantic distance, and model complexity. Extensive experiments on real-world image and video datasets validate this tradeoff and further demonstrate that our information-theoretic complexity measure effectively correlates with practical computational costs, guiding efficient system design in resource-constrained scenarios.
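For reference, the classical rate-distortion functions that the RDC framework extends have well-known closed forms. These cover Gaussian sources under mean-squared error and Bernoulli sources under Hamming distortion. This sketch covers only those classical baselines; the paper's semantic-distance and complexity constraints are not modeled.

```python
import math


def gaussian_rd(variance: float, distortion: float) -> float:
    """R(D) = max(0, 0.5 * log2(variance / D)) bits/sample, Gaussian source, MSE."""
    if distortion >= variance:
        return 0.0
    return 0.5 * math.log2(variance / distortion)


def binary_entropy(p: float) -> float:
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bernoulli_rd(p: float, distortion: float) -> float:
    """R(D) = H(p) - H(D) for 0 <= D < min(p, 1 - p), else 0 (Hamming distortion)."""
    if distortion >= min(p, 1 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(distortion)


print(gaussian_rd(1.0, 0.25), bernoulli_rd(0.5, 0.0), bernoulli_rd(0.5, 0.11))
```

Halving the allowed MSE of a Gaussian source costs exactly half a bit per sample. The RDC framework adds a third axis, model complexity, to this two-way tradeoff.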

2602.06381 2026-05-08 quant-ph cs.LG

HyQuRP: Hybrid quantum-classical neural network with rotational and permutational equivariance

Semin Park, Chae-Yeun Park

Comments: 12+41 pages; 1 figure

Abstract

Group-equivariant quantum machine learning has emerged as a promising paradigm by incorporating symmetry into quantum models. However, constructing models simultaneously equivariant to both rotational and permutational symmetries in a principled manner remains a bottleneck. In this work, we develop a general framework for dual-equivariant gates under rotations and permutations and analyze the dimension of the resulting gate space using group representation theory. Based on this, we introduce HyQuRP, a hybrid quantum-classical neural network with dual equivariance. On 3D point cloud classification benchmarks in the sparse-point regime, HyQuRP outperforms strong classical and quantum baselines. For example, when six subsampled points are used, HyQuRP ($\sim$1.5K parameters) achieves 76.13% accuracy on the 5-class ModelNet benchmark, compared with 72.54%, 71.09%, and 71.03% for Tensor Field Network, PointNet, and PointMamba with similar parameter counts. These results highlight HyQuRP's strong data efficiency and suggest the potential of equivariant quantum machine learning approaches in symmetry-sensitive tasks.

2602.01390 2026-05-08 cs.HC cs.AI

Toward Scalable Audio Description Quality Control: A Workflow for Evaluating Human and VLM Raters

Lana Do, Gio Jung, Juvenal Francisco Barajas, Andrew Taylor Scott, Shasta Ihorn, Alexander Mario Blum, Vassilis Athitsos, Ilmi Yoon

Abstract

Digital video is central to communication, education, and entertainment, but without audio description (AD), blind and low-vision users are excluded. While crowdsourced platforms and vision-language models (VLMs) expand AD production, quality is rarely checked systematically. Existing evaluations rely on NLP metrics and short-clip guidelines, leaving open the question of how to assess long-form AD quality at scale. To address this, we developed a methodological workflow using Item Response Theory to evaluate VLM and human rater proficiency against expert-established ground truth. Evaluations were based on a six-dimensional framework, grounded in professional guidelines and shaped by insights from our accessibility experts and blind consultants. Findings suggest that top-performing VLMs can approximate ground-truth ratings at levels comparable to human raters. However, qualitative analysis reveals that VLM reasoning is less reliable and actionable than that of human respondents. These insights underscore the potential of hybrid evaluation systems that leverage VLMs alongside human oversight, offering a path toward scalable AD quality control.
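Item Response Theory models the probability that a rater agrees with ground truth as a function of rater proficiency and item parameters. Below is a minimal sketch using the two-parameter logistic (2PL) model and a grid-search maximum-likelihood estimate. The 2PL choice, the binary agree/disagree responses, and the item parameters are assumptions for illustration, not details from the paper.

```python
import math
from typing import List, Tuple


def p_agree(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability that a rater of proficiency theta
    matches the expert ground truth on an item with discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_theta(responses: List[int], items: List[Tuple[float, float]],
                   lo: float = -4.0, hi: float = 4.0, steps: int = 801) -> float:
    """Grid-search maximum-likelihood proficiency for one rater (1 = agreed)."""
    best_theta, best_ll = lo, -math.inf
    for i in range(steps):
        theta = lo + i * (hi - lo) / (steps - 1)
        ll = 0.0
        for y, (a, b) in zip(responses, items):
            p = p_agree(theta, a, b)
            ll += math.log(p) if y else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # (discrimination, difficulty)
print(estimate_theta([1, 1, 1], items), estimate_theta([1, 0, 0], items))
```

Placing human and VLM raters on the same proficiency scale is what allows the comparison the abstract describes.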

2601.21831 2026-05-08 stat.ML cs.LG

Generative Modeling of Discrete Data Using Geometric Latent Subspaces

Daniel Gonzalez-Alvarado, Jonas Cassel, Stefania Petra, Christoph Schnörr

Abstract

We propose a geometric latent-subspace framework for generative modeling of discrete data. Specifically, we introduce latent subspaces in the exponential parameter space of product manifolds of categorical distributions as a novel method for learning generative models of discrete data. The resulting low-dimensional latent space encodes statistical dependencies and removes redundant degrees of freedom among the categorical variables. We equip the parameter domain with a Riemannian geometry such that the latent subspace and induced data manifold are related by isometries, enabling consistent flow matching. Exploiting this structure, we propose a geometry-aware dimensionality reduction objective, called geometric PCA (GPCA), which we formulate as a regularized cross-entropy minimization that encourages small Riemannian distances between the data and their reconstructions. In particular, under the induced geometry, geodesics become straight lines in the latent parameter space, which makes model training by flow matching effective. Empirical results show that low-dimensional latent representations suffice to accurately model high-dimensional discrete data.

2601.21264 2026-05-08 cs.HC cs.SD eess.AS

Evaluating Spatialized Auditory Cues for Rapid Attention Capture in XR

Yoonsang Kim, Swapnil Dey, Arie Kaufman

Comments: 8 pages, 4 figures. This is the author's version of the article that appeared at the IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (IEEE VRW) 2026

Abstract

In time-critical eXtended reality (XR) scenarios where users must rapidly reorient their attention to hazards, alerts, or instructions while engaged in a primary task, spatial audio can provide an immediate directional cue without occupying visual bandwidth. However, such scenarios can afford only a brief auditory exposure, requiring users to interpret sound direction quickly and without extended listening or head-driven refinement. This paper reports a controlled exploratory study of rapid spatial-audio localization in XR. Using HRTF-rendered broadband stimuli presented from a semi-dense set of directions around the listener, we quantify how accurately users can infer coarse direction from brief audio alone. We further examine the effects of short-term visuo-auditory feedback training as a lightweight calibration mechanism. Our findings show that brief spatial cues can convey coarse directional information, and that even short calibration can improve users' perception of aural signals. While these results highlight the potential of spatial audio for rapid attention guidance, they also show that auditory cues alone may not provide sufficient precision for complex or high-stakes tasks, and that spatial audio may be most effective when complemented by other sensory modalities or visual cues, without relying on head-driven refinement. We leverage this study on spatial audio as a preliminary investigation into a first-stage attention-guidance channel for wearable XR (e.g., VR head-mounted displays and AR smart glasses), and provide design insights on stimulus selection and calibration for time-critical use.
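Localization accuracy in studies like this is commonly reported as the great-circle angle between the presented direction and the reported direction. The abstract does not specify either the metric or the coordinate convention. The sketch below therefore assumes directions are given as (azimuth, elevation) in degrees, in a right-handed frame with x forward, y left, and z up. The function names are hypothetical.

```python
import math

def direction_vector(azimuth_deg: float, elevation_deg: float):
    """Unit vector for a direction; assumed frame: x forward, y left, z up."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def angular_error_deg(target, response):
    """Great-circle angle (degrees) between presented and reported directions."""
    u = direction_vector(*target)
    v = direction_vector(*response)
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))  # clamp floating-point overshoot before acos
    return math.degrees(math.acos(dot))
```

Coarse-direction accuracy can then be scored by thresholding this angle, for example by counting a response as correct when it falls within a chosen sector. The threshold would be a design parameter; it is not taken from the paper.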

2601.17622 2026-05-08 cs.HC cs.CL cs.IR

Memento: Towards Proactive Visualization of Everyday Memories with Personal Wearable AR Assistant

Yoonsang Kim, Yalong Yang, Arie E. Kaufman

Comments 8 pages, 5 figures. This is the author's version of the article that appeared at the IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (IEEE VRW) 2026

Abstract

We introduce Memento, a conversational AR assistant that permanently captures and memorizes users' verbal queries alongside their spatiotemporal and activity contexts. By storing these "memories," Memento discovers connections between users' recurring interests and the contexts that trigger them. Upon detecting similar or identical spatiotemporal activity, Memento proactively recalls user interests and delivers up-to-date responses through AR, seamlessly integrating the AR experience into users' daily routines. Unlike prior work, Memento treats each interaction not as a transient event but as part of a connected series of interactions with a coherent long-term perspective, tailored to the user's broader multimodal (visual, spatial, temporal, and embodied) context. We conduct a preliminary evaluation by collecting feedback from participants with diverse levels of expertise in immersive apps, and explore the value of a proactive, context-aware AR assistant in everyday settings. We share our findings and the challenges of designing a proactive, context-aware AR system.

2601.10915 2026-05-08 cs.IT cs.LG math.IT

A PAC-Bayesian Analysis of Channel-Induced Degradation in Edge Inference

Yangshuo He, Guanding Yu, Jingge Zhu

Abstract

In the emerging paradigm of edge learning, neural networks (NNs) are partitioned across distributed edge devices that collaboratively perform inference via wireless transmission. However, deploying NNs for edge inference over wireless channels inevitably leads to performance degradation, as the exact channel realizations in the inference stage are not known in the training stage. In this paper, we establish a theoretical framework to evaluate and bound this performance degradation. Inspired by statistical learning theory, we define a wireless generalization error to characterize the gap between the empirical performance during training and the expected inference performance under the true stochastic channel. To enable theoretical analysis, we introduce an augmented NN model that incorporates channel statistics directly into the weight space. Leveraging the PAC-Bayesian framework, we derive a high-probability bound on this error, which provides theoretical guarantees for wireless inference performance. Furthermore, we propose a channel-aware training algorithm that minimizes a tractable surrogate objective based on the derived bound. Simulations demonstrate that the proposed algorithm effectively improves wireless inference performance and model robustness under various channel conditions.
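The abstract does not state its bound. For orientation, consider a classic PAC-Bayesian bound for losses in [0, 1], in the McAllester/Maurer form. With probability at least 1 - δ over n training samples, every posterior Q satisfies L(Q) ≤ L̂(Q) + sqrt((KL(Q‖P) + ln(2√n/δ)) / (2n)). The sketch below evaluates that generic bound together with the KL divergence between diagonal Gaussians. Treating channel-induced weight perturbations as a Gaussian posterior is our illustrative assumption about the "augmented NN model"; it is not the paper's derivation.

```python
import math

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) between diagonal Gaussians, summed over all weights."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq**2 + (mq - mp) ** 2) / (2 * sp**2) - 0.5
    return total

def pac_bayes_bound(empirical_risk, kl, n, delta):
    """Generic bound for [0, 1] losses:
    L(Q) <= L_hat(Q) + sqrt((KL + ln(2*sqrt(n)/delta)) / (2n))."""
    slack = math.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))
    return empirical_risk + slack
```

Minimizing the right-hand side (empirical risk plus a KL-driven slack) gives the general shape of a bound-based training objective. The paper's "tractable surrogate objective" may differ in its exact form.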

2601.09056 2026-05-08 cs.CR cs.CL cs.IR

StegoStylo: Squelching Stylometric Scrutiny through Steganographic Stitching

Robert Dilworth

Comments 16 pages, 6 figures, 1 table

Abstract

Stylometry--the identification of an author through analysis of a text's style (i.e., authorship attribution)--serves many constructive purposes: it supports copyright and plagiarism investigations, aids detection of harmful content, offers exploratory cues for certain medical conditions (e.g., early signs of dementia or depression), provides historical context for literary works, and helps uncover misinformation and disinformation. In contrast, when stylometry is employed as a tool for authorship verification--confirming whether a text truly originates from a claimed author--it can also be weaponized for malicious purposes. Techniques such as de-anonymization, re-identification, tracking, profiling, and downstream effects like censorship illustrate the privacy threats that stylometric analysis can enable. Building on these concerns, this paper further explores how adversarial stylometry combined with steganography can counteract stylometric analysis. We first present enhancements to our adversarial attack, TraceTarnish, providing stronger evidence of its capacity to confound stylometric systems and reduce their attribution and verification accuracy. Next, we examine how steganographic embedding can be fine-tuned to mask an author's stylistic fingerprint, quantifying the level of authorship obfuscation achievable as a function of the proportion of words altered with zero-width Unicode characters. Based on our findings, steganographic coverage of 33% or higher seemingly ensures authorship obfuscation. Finally, we reflect on the ways stylometry can be used to undermine privacy and argue for the necessity of defensive tools like TraceTarnish.
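The abstract measures obfuscation as a function of the proportion of words altered with zero-width Unicode characters. The sketch below is a minimal illustration of that embedding step. It assumes one U+200B (ZERO WIDTH SPACE) is inserted mid-word into a randomly chosen fraction of space-separated words. The character choice, its placement within the word, and the function names are hypothetical and are not taken from TraceTarnish.

```python
import random

ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible when rendered, but part of the string

def embed_zero_width(text: str, coverage: float, seed: int = 0) -> str:
    """Insert one zero-width space into round(coverage * #words) random words."""
    words = text.split(" ")
    k = round(coverage * len(words))
    chosen = set(random.Random(seed).sample(range(len(words)), k))
    out = []
    for i, word in enumerate(words):
        if i in chosen:
            mid = len(word) // 2
            # Alters the character sequence that stylometric features are computed on
            word = word[:mid] + ZWSP + word[mid:]
        out.append(word)
    return " ".join(out)

def coverage_of(text: str) -> float:
    """Fraction of space-separated words that contain a zero-width space."""
    words = text.split(" ")
    return sum(ZWSP in w for w in words) / len(words)
```

The coverage parameter corresponds to the quantity the abstract varies. According to the abstract, values of roughly 33% or more appeared sufficient for obfuscation in the authors' experiments.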

2512.00751 2026-05-08 quant-ph cs.LG

Fragmentation is Efficiently Learnable by Quantum Neural Networks

Mikhail Mints, Eric R. Anschuetz

Comments 26 pages, 1 figure

Abstract

In certain classes of physical quantum systems, the exponentially large state space "fragments" into many low-dimensional, dynamically disconnected subspaces. We introduce a learning problem known as fragment classification, where given a quantum state input, one is interested in classifying to which subspace the state belongs. We prove that solving this learning problem is efficient on a quantum computer when the fragmentation phenomenon satisfies certain conditions. Furthermore, we give evidence supporting the classical hardness of this task by demonstrating that known dequantization techniques fail for the fragment classification problem. Consequently, this work provides a rare example of a physically motivated quantum machine learning task that is both efficient for quantum computers to perform and admits no known classical dequantization.

2511.06454 2026-05-08 math.OC cs.LG

Feature weighting for data analysis via evolutionary simulation

Aris Daniilidis, Alberto Domínguez Corella, Philipp Wissgott

Abstract

We analyze an algorithm for assigning weights prior to scalarization in discrete multi-objective problems arising from data analysis. The algorithm evolves weights (interpreted as the relevance of features) by a replicator-type dynamic on the standard simplex, with update indices computed from a normalized data matrix. We prove that the resulting sequence converges globally to a unique interior equilibrium, yielding non-degenerate limiting weights.
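The abstract does not give the update indices, so the sketch below substitutes a well-known replicator-type update on the simplex as an illustrative stand-in. The update is w_i ← w_i f_i(w) / Σ_j w_j f_j(w), with indices f_i(w) = (1/m) Σ_r X_ri / (Xw)_r computed from a normalized data matrix X. This is the multiplicative update for maximizing Σ_r log (Xw)_r. The index formula, the column-max normalization, and the requirement that X have strictly positive entries are all assumptions; this is not the authors' algorithm.

```python
import numpy as np

def normalize_columns(data):
    """Scale each feature column by its maximum (entries assumed strictly positive)."""
    data = np.asarray(data, dtype=float)
    return data / data.max(axis=0)

def replicator_weights(X, w0=None, iters=500, tol=1e-12):
    """Discrete replicator dynamic on the standard simplex."""
    m, d = X.shape
    w = np.full(d, 1.0 / d) if w0 is None else np.asarray(w0, dtype=float)
    w = w / w.sum()
    for _ in range(iters):
        scores = X @ w                                  # aggregated score per data row
        fitness = (X / scores[:, None]).mean(axis=0)    # update index per feature
        w_new = w * fitness
        w_new /= w_new.sum()                            # guard rounding; stay on simplex
        if np.abs(w_new - w).max() < tol:
            return w_new
        w = w_new
    return w
```

Because the update is multiplicative, weights that start in the interior of the simplex stay non-negative. For a symmetric, diagonally dominant X the iteration settles at the uniform interior point.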

2511.02526 2026-05-08 eess.SY cs.LG cs.RO cs.SY

Many-vs-Many Missile Guidance via Virtual Targets

Marc Schneider, Walter Fichter

Comments Subsequent investigations showed that the proposed method does not generalize beyond the specific scenario considered in this manuscript

Abstract

This paper presents a novel approach to many-vs-many missile guidance using virtual targets (VTs) generated by a Normalizing Flows-based trajectory predictor. Rather than assigning n interceptors directly to m physical targets through conventional weapon-target assignment algorithms, we propose a centralized strategy that constructs n VT trajectories representing probabilistic predictions of maneuvering target behavior. Each interceptor is guided toward its assigned VT using Zero-Effort-Miss guidance during midcourse flight, transitioning to Proportional Navigation guidance for terminal interception. This approach treats many-vs-many engagements as many-vs-distribution scenarios, exploiting numerical superiority (n > m) by distributing interceptors across diverse trajectory hypotheses rather than pursuing identical deterministic predictions. Monte Carlo simulations across various target-interceptor configurations (1-6 targets, 1-8 interceptors) demonstrate that the VT method matches or exceeds baseline straight-line prediction performance by 0-4.1% when n = m, with improvements increasing to 5.8-14.4% when n > m. The results confirm that probabilistic VTs enable effective exploitation of numerical superiority, significantly increasing interception probability in many-vs-many scenarios.
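For reference, the two textbook guidance laws named in the abstract can be written in planar form. The sketch assumes a 2-D relative position r and relative velocity v (target minus interceptor) and a navigation constant N = 3. For the zero-effort-miss law it also assumes a known time-to-go t_go and uses the full-vector form. It shows only the classical laws; it does not reproduce the authors' virtual-target construction or their Normalizing Flows predictor.

```python
import math

def pn_acceleration(r, v, N=3.0):
    """True proportional navigation: a = N * V_c * d(lambda)/dt (lateral accel)."""
    rx, ry = r
    vx, vy = v
    r2 = rx * rx + ry * ry
    los_rate = (rx * vy - ry * vx) / r2                    # line-of-sight angular rate
    closing_speed = -(rx * vx + ry * vy) / math.sqrt(r2)   # V_c > 0 when closing
    return N * closing_speed * los_rate

def zem_acceleration(r, v, t_go, N=3.0):
    """Zero-effort-miss guidance: a = N * ZEM / t_go**2, where ZEM is the miss
    vector that would result if neither vehicle accelerated from now on."""
    zem = (r[0] + v[0] * t_go, r[1] + v[1] * t_go)
    return (N * zem[0] / t_go**2, N * zem[1] / t_go**2)
```

On a pure collision course the line-of-sight rate is zero, so both laws command no correction. This is why the virtual-target idea only needs to change what r and v are measured against.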

2510.18120 2026-05-08 stat.ML cs.LG

Generalization Below the Edge of Stability: The Role of Data Geometry

Tongtong Liang, Alexander Cloninger, Rahul Parhi, Yu-Xiang Wang

Comments Accepted by ICLR 2026

Abstract

Understanding generalization in overparameterized neural networks hinges on the interplay between the data geometry, neural architecture, and training dynamics. In this paper, we theoretically explore how data geometry controls the implicit bias of training, presenting results for overparameterized two-layer ReLU networks trained below the edge of stability. First, for data distributions supported on a mixture of low-dimensional balls, we derive generalization bounds that provably adapt to the intrinsic dimension. Second, for a family of isotropic distributions that vary in how strongly probability mass concentrates toward the unit sphere, we derive a spectrum of bounds showing that rates deteriorate as the mass concentrates toward the sphere. These results instantiate a unifying principle: when the data is harder to "shatter" with respect to the activation thresholds of the ReLU neurons, gradient descent tends to learn representations that capture shared patterns and thus finds solutions that generalize well. On the other hand, for data that is easily shattered (e.g., data supported on the sphere), gradient descent favors memorization. Our theoretical results consolidate disparate empirical findings that have appeared in the literature.

2509.24814 2026-05-08 stat.ME cs.LG stat.ML

A Greedy PDE Router for Blending Neural Operators and Classical Methods

Sahana Rayan, Yash Patel, Ambuj Tewari

Abstract

When solving PDEs, classical numerical solvers are often computationally expensive, while machine learning methods can suffer from spectral bias, failing to capture high-frequency components. Designing an optimal hybrid iterative solver--where, at each iteration, a solver is selected from an ensemble of solvers to leverage their complementary strengths--poses a challenging combinatorial problem. While greedy selection is desirable for its constant-factor approximation guarantee to the optimal solution under Lipschitz assumptions, it requires knowledge of the true error at each step, which is unavailable in practice. We address this by proposing an approximate greedy router that efficiently mimics a greedy approach to solver selection. Empirical results on the Poisson and convection-diffusion equations show that our method consistently reduces final error and area-under-the-curve (AUC) of the error trajectory relative to single-solver baselines and existing hybrid approaches such as HINTS. In particular, our method reaches comparable error levels in substantially fewer iterations while exhibiting more stable error decay.
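The difficulty the abstract names is that true greedy selection needs the true error at every step. The sketch below illustrates one simple proxy: it assumes the residual norm ‖b - Ax‖ stands in for the unknown error. At each iteration every candidate solver proposes a step, and the router keeps the proposal with the smallest residual. The 1-D Poisson matrix, the two classical smoothers (weighted Jacobi with ω = 2/3 and Gauss-Seidel), and all function names are illustrative assumptions. The paper's router and its neural-operator candidates are not reproduced here, and a practical router would avoid running every candidate.

```python
import numpy as np

def poisson_matrix(n):
    """1-D Poisson (second-difference) matrix tridiag(-1, 2, -1)."""
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def jacobi_step(A, b, x, omega=2 / 3):
    """One weighted-Jacobi sweep."""
    return x + omega * (b - A @ x) / np.diag(A)

def gauss_seidel_step(A, b, x):
    """One forward Gauss-Seidel sweep (uses freshly updated entries)."""
    x = x.copy()
    for i in range(len(b)):
        x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

def greedy_route(A, b, x0, solvers, iters):
    """Greedy routing with the residual norm as a proxy for the true error."""
    x = x0
    history = []
    for _ in range(iters):
        candidates = [step(A, b, x) for step in solvers]
        residuals = [np.linalg.norm(b - A @ c) for c in candidates]
        k = int(np.argmin(residuals))       # index of the chosen solver
        x = candidates[k]
        history.append((k, residuals[k]))
    return x, history
```

The history records which solver was chosen at each step. It also records the residual, so the area under the error curve mentioned in the abstract can be approximated from it when a proxy is acceptable.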

2508.14804 2026-05-08 math.OC cs.LG

Learning from user's behaviour of some well-known congested traffic networks

Isolda Cardoso, Lucas Venturato, Jorgelina Walpen

Comments 30 pages, 8 figures, 7 tables

Abstract

The traffic assignment problem (TAP) aims to predict how traffic flows distribute themselves across a road network, traditionally requiring computationally expensive iterative simulations to reach a user equilibrium (UE) where no driver can unilaterally reduce their travel time. Recent developments in machine learning (ML), particularly Graph Neural Networks (GNNs) and hybrid approaches, aim to solve this faster while maintaining accuracy
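A user equilibrium can be illustrated on the smallest possible network: two parallel routes sharing a fixed demand. The sketch assumes the standard Bureau of Public Roads (BPR) link cost t(x) = t0 (1 + 0.15 (x/c)^4). It uses bisection to find the split at which every used route has the same travel time (Wardrop's first principle). The network, parameters, and function names are illustrative and unrelated to the paper's benchmark networks or its ML models.

```python
def bpr_time(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
    """BPR link travel time as a function of link flow."""
    return free_flow_time * (1 + alpha * (flow / capacity) ** beta)

def two_route_equilibrium(demand, route1, route2, tol=1e-9):
    """Flow on route 1 at Wardrop user equilibrium for two parallel routes.
    route1, route2: (free_flow_time, capacity) tuples."""
    def gap(x):  # increasing in x: route 1 gets slower, route 2 faster
        return bpr_time(x, *route1) - bpr_time(demand - x, *route2)
    lo, hi = 0.0, demand
    if gap(lo) >= 0:   # route 1 is no faster even when empty: all flow on route 2
        return 0.0
    if gap(hi) <= 0:   # route 1 is still faster carrying all demand
        return demand
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

On real networks the same condition is usually solved iteratively, for example with Frank-Wolfe style assignment over many origin-destination pairs. That is the expensive step the learned approaches in the abstract aim to accelerate.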

2508.11659 2026-05-08 cs.NE cs.AI cs.LG q-bio.NC

Toward Practical Equilibrium Propagation: Brain-inspired Recurrent Neural Network with Feedback Regulation and Residual Connections

Zhuo Liu, Tao Chen

Abstract

Brain-like intelligent systems need brain-like learning methods. Equilibrium Propagation (EP) is a biologically plausible learning framework with strong potential for brain-inspired computing hardware. However, existing implementations of EP suffer from instability and prohibitively high computational costs. Inspired by the structure and dynamics of the brain, we propose a biologically plausible Feedback-regulated REsidual recurrent neural network (FRE-RNN) and study its learning performance in the EP framework. Feedback regulation enables rapid convergence by reducing the spectral radius. The improved convergence reduces the computational cost and training time of EP by orders of magnitude, delivering performance on par with backpropagation (BP) in benchmark tasks. Meanwhile, residual connections with brain-inspired topologies help alleviate the vanishing gradient problem that arises when feedback pathways are weak in deep RNNs. Our approach substantially enhances the applicability and practicality of EP in large-scale networks that underpin artificial intelligence. The techniques developed here also offer guidance for implementing in-situ learning in physical neural networks.
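A generic settling loop can illustrate the claim that reducing the spectral radius speeds convergence. The sketch assumes symmetric recurrent weights (as in classic EP), a tanh nonlinearity, and plain fixed-point iteration s ← tanh(Ws + x). "Feedback regulation" is modeled here, hypothetically, as a single gain that rescales W to a target spectral radius ρ. Because tanh is 1-Lipschitz and a symmetric W has spectral norm equal to ρ, the map is a contraction for ρ < 1, and the error shrinks by at least a factor ρ per step. This is not the FRE-RNN architecture.

```python
import numpy as np

def settle(W, x, max_iters=10_000, tol=1e-10):
    """Iterate s <- tanh(W s + x) from s = 0 until successive states agree."""
    s = np.zeros_like(x)
    for k in range(1, max_iters + 1):
        s_new = np.tanh(W @ s + x)
        if np.linalg.norm(s_new - s) < tol:
            return s_new, k
        s = s_new
    return s, max_iters

def scale_to_radius(W, rho):
    """Rescale a symmetric W so that its largest |eigenvalue| equals rho."""
    return W * (rho / np.max(np.abs(np.linalg.eigvalsh(W))))
```

Running settle on the same W scaled to ρ = 0.9 and to ρ = 0.3 makes the effect visible: the smaller radius reaches the same tolerance in far fewer iterations. The cost of each EP phase scales with this iteration count.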