arXivDaily: arXiv daily academic digest, updated Monday to Friday
All subject categories 1289
2602.04053 2026-02-16 cs.CV

Seeing Through Clutter: Structured 3D Scene Reconstruction via Iterative Object Removal

Rio Aguina-Kang, Kevin James Blackburn-Matzen, Thibault Groueix, Vladimir Kim, Matheus Gadelha

Comments To appear in 3DV 2026

English abstract

We present SeeingThroughClutter, a method for reconstructing structured 3D representations from single images by segmenting and modeling objects individually. Prior approaches rely on intermediate tasks such as semantic segmentation and depth estimation, which often underperform in complex scenes, particularly in the presence of occlusion and clutter. We address this by introducing an iterative object removal and reconstruction pipeline that decomposes complex scenes into a sequence of simpler subtasks. Using VLMs as orchestrators, foreground objects are removed one at a time via detection, segmentation, object removal, and 3D fitting. We show that removing objects allows for cleaner segmentations of subsequent objects, even in highly occluded scenes. Our method requires no task-specific training and benefits directly from ongoing advances in foundation models. We demonstrate state-of-the-art robustness on 3D-Front and ADE20K datasets. Project Page: https://rioak.github.io/seeingthroughclutter/

2602.01308 2026-02-16 cs.LG cs.AI

Dispelling the Curse of Singularities in Neural Network Optimizations

Hengjie Cao, Mengyi Chen, Yifeng Yang, Fang Dong, Ruijun Huang, Anrui Chen, Jixian Zhou, Mingzhi Dong, Yujiang Wang, Dongsheng Li, Wenyi Fang, Yuanyi Lin, Fan Wu, Li Shang

English abstract

This work investigates the optimization instability of deep neural networks from a less-explored yet insightful perspective: the emergence and amplification of singularities in the parametric space. Our analysis reveals that parametric singularities inevitably grow with gradient updates and further intensify alignment with representations, leading to increased singularities in the representation space. We show that the gradient Frobenius norms are bounded by the top singular values of the weight matrices, and as training progresses, the mutually reinforcing growth of weight and representation singularities, termed the curse of singularities, relaxes these bounds, escalating the risk of sharp loss explosions. To counter this, we propose Parametric Singularity Smoothing (PSS), a lightweight, flexible, and effective method for smoothing the singular spectra of weight matrices. Extensive experiments across diverse datasets, architectures, and optimizers demonstrate that PSS mitigates instability, restores trainability even after failure, and improves both training efficiency and generalization.

2602.01157 2026-02-16 cs.LG cs.AI

Deep Time-Series Models Meet Volatility: Multi-Horizon Electricity Price Forecasting in the Australian National Electricity Market

Mohammed Osman Gani, Zhipeng He, Chun Ouyang, Sara Khalifa

Comments 10 pages, 4 figures, 6 tables

English abstract

Accurate electricity price forecasting (EPF) is increasingly difficult in markets characterised by extreme volatility, frequent price spikes, and rapid structural shifts. Deep learning (DL) has been increasingly adopted in EPF due to its ability to achieve high forecasting accuracy. Recently, state-of-the-art (SOTA) deep time-series models have demonstrated promising performance across general forecasting tasks. Yet, their effectiveness in highly volatile electricity markets remains underexplored. Moreover, existing EPF studies rarely assess how model accuracy varies across intraday periods, leaving model sensitivity to market conditions unexplored. To address these gaps, this paper proposes an EPF framework that systematically evaluates SOTA deep time-series models using a direct multi-horizon forecasting approach across day-ahead and two-day-ahead settings. We conduct a comprehensive empirical study across all five regions of the Australian National Electricity Market using contemporary, high-volatility data. The results reveal a clear gap between time-series benchmark expectations and observed performance under real-world price volatility: recent deep time-series models often fail to surpass standard DL baselines. All models experience substantial degradation under extreme and negative prices, yet DL baselines often remain competitive. Intraday performance analysis further reveals that all evaluated models are consistently vulnerable to prevailing market conditions, where absolute errors peak during evening ramps, relative errors escalate during midday negative-price periods, and directional accuracy deteriorates sharply during abrupt shifts in price direction. These findings emphasise the need for volatility-aware modelling strategies and richer feature representations to advance EPF.

2602.01115 2026-02-16 cs.RO cs.CV

KAN We Flow? Advancing Robotic Manipulation with 3D Flow Matching via KAN & RWKV

Zhihao Chen, Yiyuan Ge, Ziyang Wang

Comments Accepted by ICRA 2026

English abstract

Diffusion-based visuomotor policies excel at modeling action distributions but are inference-inefficient, since recursively denoising from noise to policy requires many steps and heavy UNet backbones, which hinders deployment on resource-constrained robots. Flow matching alleviates the sampling burden by learning a one-step vector field, yet prior implementations still inherit large UNet-style architectures. In this work, we present KAN-We-Flow, a flow-matching policy that draws on recent advances in Receptance Weighted Key Value (RWKV) and Kolmogorov-Arnold Networks (KAN) from vision to build a lightweight and highly expressive backbone for 3D manipulation. Concretely, we introduce an RWKV-KAN block: an RWKV first performs efficient time/channel mixing to propagate task context, and a subsequent GroupKAN layer applies learnable spline-based, groupwise functional mappings to perform feature-wise nonlinear calibration of the action mapping on RWKV outputs. Moreover, we introduce an Action Consistency Regularization (ACR), a lightweight auxiliary loss that enforces alignment between predicted action trajectories and expert demonstrations via Euler extrapolation, providing additional supervision to stabilize training and improve policy precision. Without resorting to large UNets, our design reduces parameters by 86.8%, maintains fast runtime, and achieves state-of-the-art success rates on Adroit, Meta-World, and DexArt benchmarks. Our project page is available at https://zhihaochen-2003.github.io/KAN-We-Flow.github.io/

2601.22977 2026-02-16 cs.AI

Quantifying Model Uniqueness in Heterogeneous AI Ecosystems

Lei You

English abstract

As AI systems evolve from isolated predictors into complex, heterogeneous ecosystems of foundation models and specialized adapters, distinguishing genuine behavioral novelty from functional redundancy becomes a critical governance challenge. Here, we introduce a statistical framework for auditing model uniqueness based on In-Silico Quasi-Experimental Design (ISQED). By enforcing matched interventions across models, we isolate intrinsic model identity and quantify uniqueness as the Peer-Inexpressible Residual (PIER), i.e. the component of a target's behavior strictly irreducible to any stochastic convex combination of its peers, with vanishing PIER characterizing when such a routing-based substitution becomes possible. We establish the theoretical foundations of ecosystem auditing through three key contributions. First, we prove a fundamental limitation of observational logs: uniqueness is mathematically non-identifiable without intervention control. Second, we derive a scaling law for active auditing, showing that our adaptive query protocol achieves minimax-optimal sample efficiency ($d\sigma^2\gamma^{-2}\log(Nd/\delta)$). Third, we demonstrate that cooperative game-theoretic methods, such as Shapley values, fundamentally fail to detect redundancy. We implement this framework via the DISCO (Design-Integrated Synthetic Control) estimator and deploy it across diverse ecosystems, including computer vision models (ResNet/ConvNeXt/ViT), large language models (BERT/RoBERTa), and city-scale traffic forecasters. These results move trustworthy AI beyond explaining single models: they establish a principled, intervention-based science of auditing and governing heterogeneous model ecosystems.
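
As a rough illustration of the idea behind PIER in the abstract above, the sketch below measures how far a target model's output vector lies from the closest convex combination of its peers' output vectors. It is not the paper's DISCO estimator: the function names, the plain output vectors, and the projected-gradient solver are all assumptions made for illustration only.

```python
import math


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate  # the last k that passes gives the threshold
    return [max(x - theta, 0.0) for x in v]


def convex_residual(target, peers, steps=2000):
    """Return (distance, weights) for the convex mix of peers closest to target.

    Hypothetical helper: minimizes 0.5 * ||sum_i w_i * peers[i] - target||^2
    over the simplex with projected gradient descent.
    """
    n, d = len(peers), len(target)
    weights = [1.0 / n] * n
    # 1 / (sum of squared entries) is no larger than 1 / L, where L is the
    # Lipschitz constant of the gradient, so the iteration cannot diverge.
    step = 1.0 / max(sum(x * x for p in peers for x in p), 1e-12)

    def mix(w):
        return [sum(w[i] * peers[i][j] for i in range(n)) for j in range(d)]

    for _ in range(steps):
        error = [m - t for m, t in zip(mix(weights), target)]
        grad = [sum(e * x for e, x in zip(error, peers[i])) for i in range(n)]
        weights = project_to_simplex([w - step * g for w, g in zip(weights, grad)])

    residual = math.sqrt(sum((m - t) ** 2 for m, t in zip(mix(weights), target)))
    return residual, weights


if __name__ == "__main__":
    peers = [[1.0, 0.0], [0.0, 1.0]]
    print(convex_residual([0.9, 0.1], peers))  # target inside the peers' hull
    print(convex_residual([1.0, 1.0], peers))  # target outside the peers' hull
```

A residual near zero means the target could in principle be replaced by routing among its peers; a clearly positive residual indicates behavior the peers cannot jointly express in this toy setting.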

2601.22620 2026-02-16 cs.CL

Layer-wise Swapping for Generalizable Multilingual Safety

Hyunseo Shin, Wonseok Hwang

Comments EACL 2026 main

English abstract

Despite the rapid advancements of Large Language Models (LLMs), safety risks remain a critical challenge for low-resource languages. Existing safety datasets are predominantly English-centric, limiting progress in multilingual safety alignment. As a result, low-resource expert models, finetuned on their respective instruction datasets, tend to exhibit higher unsafety rates compared to their high-resource counterparts. In this work, we propose a safety-aware layer swapping method that transfers safety alignment from an English safety expert to low-resource language experts without additional training. To further enhance transfer ability, our method adaptively selects or blends modules based on their degree of specialization. Our approach preserves performance on general language understanding tasks while enhancing safety in the target languages. Experimental results show that the proposed method achieves comparable performance to the language expert on general benchmarks such as MMMLU, BELEBELE, and MGSM, while producing more aligned and less harmful responses on the MultiJail safety benchmark.

2601.21452 2026-02-16 cs.LG cs.AI

SAGE: Sequence-level Adaptive Gradient Evolution for Generative Recommendation

Yu Xie, Xing Kai Ren, Ying Qi, Hu Yao

Comments arXiv admin note: text overlap with arXiv:2506.19235

English abstract

Reinforcement learning-based preference optimization is increasingly used to align list-wise generative recommenders with complex, multi-objective user feedback, yet existing optimizers such as Gradient-Bounded Policy Optimization (GBPO) exhibit structural limitations in recommendation settings. We identify a Symmetric Conservatism failure mode in which symmetric update bounds suppress learning from rare positive signals (e.g., cold-start items), static negative-sample constraints fail to prevent diversity collapse under rejection-dominated feedback, and group-normalized multi-objective rewards lead to low-resolution training signals. To address these issues, we propose SAGE (Sequence-level Adaptive Gradient Evolution), a unified optimizer designed for list-wise generative recommendation. SAGE introduces sequence-level signal alignment via a geometric-mean importance ratio and a decoupled multi-objective advantage estimator to reduce token-level variance and mitigate reward collapse, together with asymmetric adaptive bounding that applies positive Boost updates to successful slates and an entropy-aware penalty to discourage low-diversity failures. Experiments on Amazon Product Reviews and the large-scale RecIF-Bench demonstrate consistent improvements in top-K accuracy, cold-start recall, and diversity across both Semantic-ID and native-text action spaces, while preserving numerical stability during training. These results suggest that asymmetric, sequence-aware policy optimization provides a principled and effective framework for addressing optimization failures in generative recommendation.
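
The geometric-mean importance ratio mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming the ratio is the exponential of the mean per-token log-probability difference between the new and old policies; the function name and inputs are hypothetical and the rest of SAGE (advantage estimation, asymmetric bounds, entropy penalty) is not shown.

```python
import math


def sequence_importance_ratio(new_logprobs, old_logprobs):
    """Geometric mean of per-token probability ratios for one generated sequence.

    Both arguments are lists of token log-probabilities for the same tokens,
    under the new and the old policy respectively (assumed inputs).
    """
    if not new_logprobs or len(new_logprobs) != len(old_logprobs):
        raise ValueError("expected two non-empty lists of equal length")
    mean_log_ratio = sum(n - o for n, o in zip(new_logprobs, old_logprobs)) / len(new_logprobs)
    return math.exp(mean_log_ratio)


if __name__ == "__main__":
    new = [math.log(0.8), math.log(0.5)]
    old = [math.log(0.2), math.log(0.5)]
    # Token-level ratios are 4 and 1; their geometric mean is the sequence-level ratio.
    print(sequence_importance_ratio(new, old))
```

Averaging in log space keeps the ratio on a comparable scale for short and long sequences, which is one plausible reason a sequence-level ratio reduces token-level variance.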

2601.20154 2026-02-16 cs.LG

Spectral Ghost in Representation Learning: from Component Analysis to Self-Supervised Learning

Bo Dai, Na Li, Dale Schuurmans

Comments 43 pages, 3 figures

English abstract

Self-supervised learning (SSL) has improved empirical performance by unleashing the power of unlabeled data for practical applications. Specifically, SSL extracts the representation from massive unlabeled data, which is then transferred to many downstream tasks with limited data. The significant improvement on diverse applications of representation learning has attracted increasing attention, resulting in a variety of dramatically different self-supervised learning objectives for representation extraction, with an assortment of learning procedures, but without a clear and unified understanding. Such an absence hampers the ongoing development of representation learning, leaving a theoretical understanding missing, principles for efficient algorithm design unclear, and the use of representation learning methods in practice unjustified. The urgency for a unified framework is further motivated by the rapid growth in representation learning methods. In this paper, we are therefore compelled to develop a principled foundation of representation learning. We first theoretically investigate the sufficiency of the representation from a spectral representation view, which reveals the spectral essence of the existing successful SSL algorithms and paves the path to a unified framework for understanding and analysis. Such a framework also inspires the development of more efficient and easy-to-use representation learning algorithms in a principled way for real-world applications.

2601.16909 2026-02-16 cs.AI

Preventing the Collapse of Peer Review Requires Verification-First AI

Lei You, Lele Cao, Iryna Gurevych

English abstract

This paper argues that AI-assisted peer review should be verification-first rather than review-mimicking. We propose truth-coupling, i.e. how tightly venue scores track latent scientific truth, as the right objective for review tools. We formalize two forces that drive a phase transition toward proxy-sovereign evaluation: verification pressure, when claims outpace verification capacity, and signal shrinkage, when real improvements become hard to separate from noise. In a minimal model that mixes occasional high-fidelity checks with frequent proxy judgment, we derive an explicit coupling law and an incentive-collapse condition under which rational effort shifts from truth-seeking to proxy optimization, even when current decisions still appear reliable. These results motivate actions for tool builders and program chairs: deploy AI as an adversarial auditor that generates auditable verification artifacts and expands effective verification bandwidth, rather than as a score predictor that amplifies claim inflation.

2601.12357 2026-02-16 cs.CV cs.AI

SimpleMatch: A Simple and Strong Baseline for Semantic Correspondence

Hailing Jin, Huiying Li

English abstract

Recent advances in semantic correspondence have been largely driven by the use of pre-trained large-scale models. However, a limitation of these approaches is their dependence on high-resolution input images to achieve optimal performance, which results in considerable computational overhead. In this work, we address a fundamental limitation in current methods: the irreversible fusion of adjacent keypoint features caused by deep downsampling operations. This issue is triggered when semantically distinct keypoints fall within the same downsampled receptive field (e.g., 16x16 patches). To address this issue, we present SimpleMatch, a simple yet effective framework for semantic correspondence that delivers strong performance even at low resolutions. We propose a lightweight upsample decoder that progressively recovers spatial detail by upsampling deep features to 1/4 resolution, and a multi-scale supervised loss that ensures the upsampled features retain discriminative features across different spatial scales. In addition, we introduce sparse matching and window-based localization to optimize training memory usage and reduce it by 51%. At a resolution of 252x252 (3.3x smaller than current SOTA methods), SimpleMatch achieves superior performance with 84.1% PCK@0.1 on the SPair-71k benchmark. We believe this framework provides a practical and efficient baseline for future research in semantic correspondence. Code is available at: https://github.com/hailong23-jin/SimpleMatch.

2601.09725 2026-02-16 cs.CL

Assessing and Improving Punctuation Robustness in English-Marathi Machine Translation

Kaustubh Shivshankar Shejole, Sourabh Deoghare, Pushpak Bhattacharyya

English abstract

Neural Machine Translation (NMT) systems rely heavily on explicit punctuation cues to resolve semantic ambiguities in a source sentence. Inputting user-generated sentences, which are likely to contain missing or incorrect punctuation, results in fluent but semantically disastrous translations. This work attempts to highlight and address the problem of punctuation robustness of NMT systems through an English-to-Marathi translation. First, we introduce Viram, a human-curated diagnostic benchmark of 54 punctuation-ambiguous English-Marathi sentence pairs to stress-test existing NMT systems. Second, we evaluate two simple remediation strategies: cascade-based restore-then-translate and direct fine-tuning. Our experimental results and analysis demonstrate that both strategies yield substantial NMT performance improvements. Furthermore, we find that current Large Language Models (LLMs) exhibit relatively poorer robustness in translating such sentences than these task-specific strategies, thus necessitating further research in this area. The code and dataset are available at https://github.com/KaustubhShejole/Viram_Marathi.

2512.23864 2026-02-16 cs.RO cs.CV

Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation

Guo Ye, Zexi Zhang, Xu Zhao, Shang Wu, Haoran Lu, Shihan Lu, Han Liu

English abstract

Vision-Language-Action (VLA) models have shown remarkable generalization by mapping web-scale knowledge to robotic control, yet they remain blind to physical contact. Consequently, they struggle with contact-rich manipulation tasks that require reasoning about force, texture, and slip. While some approaches incorporate low-dimensional tactile signals, they fail to capture the high-resolution dynamics essential for such interactions. To address this limitation, we introduce DreamTacVLA, a framework that grounds VLA models in contact physics by learning to feel the future. Our model adopts a hierarchical perception scheme in which high-resolution tactile images serve as micro-vision inputs coupled with wrist-camera local vision and third-person macro vision. To reconcile these multi-scale sensory streams, we first train a unified policy with a Hierarchical Spatial Alignment (HSA) loss that aligns tactile tokens with their spatial counterparts in the wrist and third-person views. To further deepen the model's understanding of fine-grained contact dynamics, we finetune the system with a tactile world model that predicts future tactile signals. To mitigate tactile data scarcity and the wear-prone nature of tactile sensors, we construct a hybrid large-scale dataset sourced from both high-fidelity digital twin and real-world experiments. By anticipating upcoming tactile states, DreamTacVLA acquires a rich model of contact physics and conditions its actions on both real observations and imagined consequences. Across contact-rich manipulation tasks, it outperforms state-of-the-art VLA baselines, achieving up to 95% success, highlighting the importance of understanding physical contact for robust, touch-aware robotic agents.

2512.19135 2026-02-16 cs.AI

Understanding Chain-of-Thought in Large Language Models via Topological Data Analysis

Chenghao Li, Chaoning Zhang, Yi Lu, Shuxu Chen, Xudong Wang, Jiaquan Zhang, Zhicheng Wang, Zhengxun Jin, Kuien Liu, Sung-Ho Bae, Guoqing Wang, Yang Yang, Heng Tao Shen

English abstract

With the development of large language models (LLMs), particularly with the introduction of the long reasoning chain technique, the reasoning ability of LLMs in complex problem-solving has been significantly enhanced. While acknowledging the power of long reasoning chains, we cannot help but wonder: Why do different reasoning chains perform differently in reasoning? What components of the reasoning chains play a key role? Existing studies mainly focus on evaluating reasoning chains from a functional perspective, with little attention paid to their structural mechanisms. To address this gap, this work is the first to analyze and evaluate the quality of the reasoning chain from a structural perspective. We apply persistent homology from Topological Data Analysis (TDA) to map reasoning steps into semantic space, extract topological features, and analyze structural changes. These changes reveal semantic coherence, logical redundancy, and identify logical breaks and gaps. By calculating homology groups, we assess connectivity and redundancy at various scales, using barcode and persistence diagrams to quantify stability and consistency. Our results show that the topological structural complexity of reasoning chains correlates positively with accuracy. More complex chains identify correct answers sooner, while successful reasoning exhibits simpler topologies, reducing redundancy and cycles, enhancing efficiency and interpretability. This work provides a new perspective on reasoning chain quality assessment and offers guidance for future optimization.

2512.17298 2026-02-16 cs.CV

ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration

Fanpu Cao, Yaofo Chen, Zeng You, Wei Luo

Comments Accepted for poster presentation at AAAI 2026

English abstract

Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. While feature caching offers a promising training-free acceleration solution by exploiting temporal redundancy, existing methods suffer from two key limitations: (1) uniform caching intervals fail to align with the non-uniform temporal dynamics of DiT, and (2) naive feature reuse with excessively large caching intervals can lead to severe error accumulation. In this work, we analyze the evolution of DiT features during denoising and reveal that both feature changes and error propagation are highly time- and depth-varying. Motivated by this, we propose ProCache, a training-free dynamic feature caching framework that addresses these issues via two core components: (i) a constraint-aware caching pattern search module that generates non-uniform activation schedules through offline constrained sampling, tailored to the model's temporal characteristics; and (ii) a selective computation module that selectively computes within deep blocks and high-importance tokens for cached segments to mitigate error accumulation with minimal overhead. Extensive experiments on PixArt-alpha and DiT demonstrate that ProCache achieves up to 1.96x and 2.90x acceleration with negligible quality degradation, significantly outperforming prior caching-based methods.

2512.15052 2026-02-16 cs.CL cs.AI

SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification

Hongbo Wang, MaungMaung AprilPyone, Isao Echizen

English abstract

Disclaimer: Samples in this paper may be harmful and cause discomfort. Multimodal large language models (MLLMs) enable multimodal generation but inherit toxic, biased, and NSFW signals from weakly curated pretraining corpora, causing safety risks, especially under adversarial triggers that late, opaque training-free detoxification methods struggle to handle. We propose SGM, a white-box neuron-level multimodal intervention that acts like safety glasses for toxic neurons: it selectively recalibrates a small set of toxic expert neurons via expertise-weighted soft suppression, neutralizing harmful cross-modal activations without any parameter updates. We establish MM-TOXIC-QA, a multimodal toxicity evaluation framework, and compare SGM with existing detoxification techniques. Experiments on open-source MLLMs show that SGM mitigates toxicity in standard and adversarial conditions, cutting harmful rates from 48.2% to 2.5% while preserving fluency and multimodal reasoning. SGM is extensible, and its combined defenses, denoted as SGM*, integrate with existing detoxification methods for stronger safety performance, providing an interpretable, low-cost solution for toxicity-controlled multimodal generation.

2512.14908 2026-02-16 cs.LG

ATLAS: Adaptive Topology-based Learning at Scale for Homophilic and Heterophilic Graphs

Turja Kundu, Sanjukta Bhowmick

Comments Preprint

English abstract

Graph neural networks (GNNs) excel on homophilic graphs where connected nodes share labels, but struggle with heterophilic graphs where edges do not imply similarity. Moreover, iterative message passing limits scalability due to neighborhood expansion overhead. We introduce ATLAS (Adaptive Topology-based Learning at Scale), a propagation-free framework that encodes graph structure through multi-resolution community features rather than message passing. We first prove that community refinement involves a fundamental trade-off: finer partitions increase label-community mutual information but also increase entropy. We formalize when refinement improves normalized mutual information, explaining why intermediate granularities are often most predictive. ATLAS employs modularity-guided adaptive search to automatically identify informative community scales, which are one-hot encoded, projected into learnable embeddings, and concatenated with node attributes for MLP classification. This enables standard mini-batch training and adjacency-free inference after one-time preprocessing. Across 13 benchmarks including million-node graphs, ATLAS achieves competitive or superior accuracy, up to 20-point gains over GCN on heterophilic datasets and 12-point gains over MLPs on homophilic graphs. By treating topology as explicit features, ATLAS adapts intelligently: leveraging structure when informative, remaining robust when weakly aligned, and avoiding propagation when structure misleads, providing both scalable performance and interpretable structural insights.
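
The feature construction described in the abstract above (community memberships at several resolutions, one-hot encoded and concatenated with node attributes) can be sketched as follows. This is a simplified illustration under stated assumptions: the community assignments are taken as given rather than found by modularity-guided search, the learnable embedding projection and the MLP are omitted, and all function names are hypothetical.

```python
def one_hot(index, size):
    """Return a list of length `size` with a 1.0 at position `index`."""
    row = [0.0] * size
    row[index] = 1.0
    return row


def build_node_features(node_attrs, community_levels):
    """Concatenate node attributes with one-hot community ids per resolution.

    node_attrs: list of attribute vectors, one per node.
    community_levels: list of assignments, one list of community ids per
    resolution, each of length len(node_attrs) (assumed precomputed).
    """
    sizes = [max(level) + 1 for level in community_levels]
    features = []
    for node, attrs in enumerate(node_attrs):
        row = list(attrs)
        for level, size in zip(community_levels, sizes):
            row.extend(one_hot(level[node], size))
        features.append(row)
    return features


if __name__ == "__main__":
    attrs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
    coarse = [0, 0, 1, 1]  # two communities
    fine = [0, 1, 2, 2]    # three communities
    for row in build_node_features(attrs, [coarse, fine]):
        print(row)
```

Because each row depends only on that node's own attributes and precomputed community ids, the rows can be batched independently, which matches the abstract's point about mini-batch training and adjacency-free inference.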

2512.09654 2026-02-16 cs.LG

Membership and Dataset Inference Attacks on Large Audio Generative Models

Jakub Proboszcz, Paweł Kochanski, Karol Korszun, Donato Crisostomi, Giorgio Strano, Emanuele Rodolà, Kamil Deja, Jan Dubinski

Comments NeurIPS 2025 AI for Music Workshop; NeurIPS 2025 Workshop on Creativity & Generative AI

English abstract

Generative audio models, based on diffusion and autoregressive architectures, have advanced rapidly in both quality and expressiveness. This progress, however, raises pressing copyright concerns, as such models are often trained on vast corpora of artistic and commercial works. A central question is whether one can reliably verify if an artist's material was included in training, thereby providing a means for copyright holders to protect their content. In this work, we investigate the feasibility of such verification through membership inference attacks (MIA) on open-source generative audio models, which attempt to determine whether a specific audio sample was part of the training set. Our empirical results show that membership inference alone is of limited effectiveness at scale, as the per-sample membership signal is weak for models trained on large and diverse datasets. However, artists and media owners typically hold collections of works rather than isolated samples. Building on prior work in text and vision domains, in this work we focus on dataset inference (DI), which aggregates diverse membership evidence across multiple samples. We find that DI is successful in the audio domain, offering a more practical mechanism for assessing whether an artist's works contributed to model training. Our results suggest DI as a promising direction for copyright protection and dataset accountability in the era of large audio generative models.

2512.07841 2026-02-16 cs.AI cs.PF

Impact of Data-Oriented and Object-Oriented Design on Performance and Cache Utilization with Artificial Intelligence Algorithms in Multi-Threaded CPUs

Gabriel M. Arantes, Giancarlo Lucca, Eduardo N. Borges, Richard F. Pinto, Bruno L. Dalmazo, Rafael A. Berri

Journal ref v. 1 n. 26 (2025): Revista Junior de Iniciação Científica em Ciências Exatas e Engenharia

English abstract

The growing performance gap between multi-core CPUs and main memory necessitates hardware-aware software design paradigms. This study provides a comprehensive performance analysis of Data Oriented Design (DOD) versus the traditional Object-Oriented Design (OOD), focusing on cache utilization and efficiency in multi-threaded environments. We developed and compared four distinct versions of the A* search algorithm: single-threaded OOD (ST-OOD), single-threaded DOD (ST-DOD), multi-threaded OOD (MT-OOD), and multi-threaded DOD (MT-DOD). The evaluation was based on metrics including execution time, memory usage, and CPU cache misses. In multi-threaded tests, the DOD implementation demonstrated considerable performance gains, with faster execution times and a lower number of raw system calls and cache misses. While OOD occasionally showed marginal advantages in memory usage or percentage-based cache miss rates, DOD's efficiency in data-intensive operations was more evident. Furthermore, our findings reveal that for a fine-grained task like the A* algorithm, the overhead associated with thread management led to single-threaded versions significantly outperforming their multi-threaded counterparts in both paradigms. We conclude that even when performance differences appear subtle in simple algorithms, the consistent advantages of DOD in critical metrics highlight its foundational architectural superiority, suggesting it is a more effective approach for maximizing hardware efficiency in complex, large-scale AI and parallel computing tasks.

2512.06630 2026-02-16 cs.LG quant-ph

Quantum Temporal Convolutional Neural Networks for Cross-Sectional Equity Return Prediction: A Comparative Benchmark Study

Chi-Sheng Chen, Xinyu Zhang, En-Jui Kuo, Rong Fu, Qiuzhe Xie, Fan Zhang

English abstract

Quantum machine learning offers a promising pathway for enhancing stock market prediction, particularly under complex, noisy, and highly dynamic financial environments. However, many classical forecasting models struggle with noisy input, regime shifts, and limited generalization capacity. To address these challenges, we propose a Quantum Temporal Convolutional Neural Network (QTCNN) that combines a classical temporal encoder with parameter-efficient quantum convolution circuits for cross-sectional equity return prediction. The temporal encoder extracts multi-scale patterns from sequential technical indicators, while the quantum processing leverages superposition and entanglement to enhance feature representation and suppress overfitting. We conduct a comprehensive benchmarking study on the JPX Tokyo Stock Exchange dataset and evaluate predictions through long-short portfolio construction using out-of-sample Sharpe ratio as the primary performance metric. QTCNN achieves a Sharpe ratio of 0.538, outperforming the best classical baseline by approximately 72%. These results highlight the practical potential of the quantum-enhanced forecasting model QTCNN for robust decision-making in quantitative finance.
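
The evaluation described in the abstract above (long-short portfolio construction scored by Sharpe ratio) can be sketched roughly as below. The details are assumptions rather than the paper's protocol: equal-weighted top-k and bottom-k legs, no annualization, no transaction costs, and the sample standard deviation in the denominator. Function names are hypothetical.

```python
import statistics


def long_short_spread(predicted, realized, k):
    """One day's return of a portfolio long the top-k and short the bottom-k predictions."""
    if k < 1 or 2 * k > len(predicted) or len(predicted) != len(realized):
        raise ValueError("need matching lists with at least 2 * k assets")
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    long_leg = statistics.mean(realized[i] for i in order[:k])
    short_leg = statistics.mean(realized[i] for i in order[-k:])
    return long_leg - short_leg


def sharpe_ratio(daily_spreads):
    """Mean of daily spread returns divided by their sample standard deviation."""
    spread_std = statistics.stdev(daily_spreads)
    if spread_std == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return statistics.mean(daily_spreads) / spread_std


if __name__ == "__main__":
    # Two toy trading days with four assets each (made-up numbers).
    day1 = long_short_spread([0.3, 0.1, -0.2, 0.0], [0.02, 0.01, -0.01, 0.00], k=1)
    day2 = long_short_spread([0.1, 0.4, 0.0, -0.3], [0.00, 0.01, 0.00, 0.00], k=1)
    print(sharpe_ratio([day1, day2]))
```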

2511.03383 2026-02-16 cs.CL

Segmentation Beyond Defaults: Asymmetrical Byte Pair Encoding for Optimal Machine Translation Performance

Saumitra Yadav, Manish Shrivastava

Comments Accepted at WAT 2025 (Camera-Ready Version)

Journal ref https://aclanthology.org/2025.wat-1.4/

English abstract

Existing Machine Translation (MT) research often suggests a single, fixed set of hyperparameters for word segmentation models, symmetric Byte Pair Encoding (BPE), which applies the same number of merge operations (NMO) to train tokenizers for both source and target languages. However, we demonstrate that this uniform approach doesn't guarantee optimal MT performance across different language pairs and data sizes. This work investigates BPE segmentation recipes across various data volumes and language pairs to evaluate MT system performance. We find that utilizing asymmetric BPE, where the source and target languages have different NMOs, significantly improves results over the symmetric approach, especially in low-resource settings (50K, 100K, and 500K sentence pairs). Specifically, asymmetric BPE yields statistically significant ($p<0.05$) average gains of 5.32, 4.46, and 0.7 CHRF++ on English-Hindi in low-resource setups (50K, 100K, and 500K sentence pairs, respectively). We validated this trend across six additional language pairs (English and Telugu, Shona, Norwegian, Kyrgyz, Hausa, and Inuktitut), observing statistically significant improvement in 10 out of 12 systems compared to symmetric BPE. Our findings indicate a high NMO for the source (4K to 32K) and a low NMO for the target (0.5K to 2K) provides optimal results, particularly benefiting low-resource MT.
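
To make the symmetric versus asymmetric distinction in the abstract above concrete, the sketch below trains a toy BPE merge table and calls it with a different number of merge operations for the source and target sides. It is a minimal textbook-style BPE learner written for illustration, not the tokenizer tooling used in the paper; the corpora, merge counts, and function name are made up.

```python
from collections import Counter


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` BPE merges from a list of whitespace-tokenized lines."""
    # Each word becomes a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for line in corpus for word in line.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    source_corpus = ["low lower lowest", "new newer newest"]
    target_corpus = ["tall taller tallest", "old older oldest"]
    # Asymmetric setup: more merges on the source side than on the target side.
    source_merges = learn_bpe(source_corpus, num_merges=8)
    target_merges = learn_bpe(target_corpus, num_merges=2)
    print(len(source_merges), source_merges[:3])
    print(len(target_merges), target_merges)
```

A symmetric setup would pass the same `num_merges` to both calls; the abstract's finding is that choosing them independently (high for the source, low for the target) tends to help in low-resource settings.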

2510.25926 2026-02-16 cs.LG

Active Learning with Task-Driven Representations for Messy Pools

Kianoosh Ashouritaklimi, Tom Rainforth

Details
English abstract

Active learning has the potential to be especially useful for messy, uncurated pools where datapoints vary in relevance to the target task. However, state-of-the-art approaches to this problem currently rely on using fixed, unsupervised representations of the pool, focusing on modifying the acquisition function instead. We show that this model setup can undermine their effectiveness at dealing with messy pools, as such representations can fail to capture important information relevant to the task. To address this, we propose using task-driven representations that are periodically updated during the active learning process using the previously collected labels. We introduce two specific strategies for learning these representations, one based on directly learning semi-supervised representations and the other based on supervised fine-tuning of an initial unsupervised representation. We find that both significantly improve empirical performance over using unsupervised or pretrained representations.

2510.17406 2026-02-16 cs.LG eess.SP

Multi-Window Temporal Analysis for Enhanced Arrhythmia Classification: Leveraging Long-Range Dependencies in Electrocardiogram Signals

Tiezhi Wang, Wilhelm Haverkamp, Nils Strodthoff

Journal ref Physiol. Meas. 47 015010 (2026)

Details
English abstract

Objective. Arrhythmia classification from electrocardiograms (ECGs) suffers from high false positive rates and limited cross-dataset generalization, particularly for atrial fibrillation (AF) detection where specificity ranges from 0.72 to 0.98 using conventional 30-s analysis windows. While most deep learning approaches analyze isolated 30-s ECG windows, many arrhythmias, including AF and atrial flutter, exhibit diagnostic features that emerge over extended time scales. Approach. We introduce S4ECG, a deep learning architecture based on structured state-space models (S4), designed to capture long-range temporal dependencies by jointly analyzing multiple consecutive ECG windows spanning up to 20 min. We evaluate S4ECG on four publicly available databases for multi-class arrhythmia classification and perform systematic cross-dataset evaluations to assess out-of-distribution robustness. Results. Multi-window analysis consistently outperforms single-window approaches across all datasets, improving macro-averaged AUROC by 1.0-11.6 percentage points. For AF, specificity increases from 0.718-0.979 to 0.967-0.998 at a fixed sensitivity threshold, yielding a 3-10-fold reduction in false positive rates. Significance. Compared with convolutional neural network baselines, the S4 architecture shows superior performance, and multi-window training substantially reduces cross-dataset degradation. Optimal diagnostic windows are 10-20 min, beyond which performance plateaus or degrades. These findings demonstrate that structured incorporation of extended temporal context enhances both arrhythmia classification accuracy and cross-dataset robustness. The identified optimal temporal windows provide practical guidance for ECG monitoring system design and may reflect underlying physiological timescales of arrhythmogenic dynamics.

2510.11834 2026-02-16 cs.LG cs.CL

Don't Walk the Line: Boundary Guidance for Filtered Generation

Sarah Ball, Andreas Haupt

Comments 14 pages, 3 figures, 10 tables

Details
English abstract

Generative models are increasingly paired with safety classifiers that filter harmful or undesirable outputs. A common strategy is to fine-tune the generator to reduce the probability of being filtered, but this can be suboptimal: it often pushes the model toward producing samples near the classifier's decision boundary, increasing both false positives and false negatives. We propose Boundary Guidance, a reinforcement learning fine-tuning method that explicitly steers generation away from the classifier's margin. On a benchmark of jailbreak, ambiguous, and long-context prompts, Boundary Guidance improves both the safety and the utility of outputs, as judged by LLM-as-a-Judge evaluations. Comprehensive ablations across model scales and reward designs demonstrate the robustness of our approach.

2510.09717 2026-02-16 cs.LG cs.AI

Provable Training Data Identification for Large Language Models

Zhenlong Liu, Hao Zeng, Weiran Huang, Hongxin Wei

Details
English abstract

Identifying training data of large-scale models is critical for copyright litigation, privacy auditing, and ensuring fair evaluation. However, existing works typically treat this task as an instance-wise identification without controlling the error rate of the identified set, which cannot provide statistically reliable evidence. In this work, we formalize training data identification as a set-level inference problem and propose Provable Training Data Identification (PTDI), a distribution-free approach that enables provable and strict false identification rate (FIR) control. Specifically, our method computes conformal p-values for each data point using a set of known unseen data and then develops a novel Jackknife-corrected Beta boundary (JKBB) estimator to estimate the training-data proportion of the test set, which allows us to scale these p-values. By applying the Benjamini-Hochberg (BH) procedure to the scaled p-values, we select a subset of data points with provable and strict FIR control. Extensive experiments across various models and datasets demonstrate that PTDI achieves higher power than prior methods while strictly controlling the FIR.
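
The two standard building blocks mentioned in the abstract, conformal p-values and the Benjamini-Hochberg procedure, can be sketched as follows. This is not PTDI itself: the JKBB proportion estimate and the p-value scaling step are omitted, and the score convention (higher score means more member-like), the function names, and the toy scores are assumptions made for illustration.

```python
def conformal_p_values(calibration_scores, test_scores):
    """Conformal p-value for each test point against known-unseen calibration data:
    p = (1 + #{calibration scores >= test score}) / (n + 1).
    Assumes a higher score means 'more likely training data', so members
    tend to receive small p-values."""
    n = len(calibration_scores)
    return [
        (1 + sum(c >= t for c in calibration_scores)) / (n + 1)
        for t in test_scores
    ]

def benjamini_hochberg(p_values, alpha):
    """Return the (sorted) indices selected by the BH procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value is under its BH threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])

# Hypothetical membership scores: 19 known-unseen points and 5 test points.
calibration = [i / 100 for i in range(1, 20)]   # 0.01 ... 0.19
test = [0.90, 0.85, 0.055, 0.95, 0.105]

p_vals = conformal_p_values(calibration, test)
selected = benjamini_hochberg(p_vals, alpha=0.1)
```

In this toy run the three high-scoring test points (indices 0, 1, and 3) are selected; with only 19 calibration points the smallest attainable p-value is 1/20, which limits how small `alpha` can usefully be.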

2510.07117 2026-02-16 cs.AI cs.LG

The Conditions of Physical Embodiment Enable Generalization and Care

Leonardo Christov-Moore, Arthur Juliani, Alex Kiefer, Joel Lehman, Nicco Reggente, B. Scot Rousse, Adam Safron, Nicolás Hinrichs, Daniel Polani, Antonio Damasio

Comments 15 pages, 1 figure

Details
English abstract

As artificial agents enter open-ended physical environments -- eldercare, disaster response, and space missions -- they must persist under uncertainty while providing reliable care. Yet current systems struggle to generalize across distribution shifts and lack intrinsic motivation to preserve the well-being of others. Vulnerability and mortality are often seen as constraints to be avoided, yet organisms survive and provide care in an open-ended world with relative ease and efficiency. We argue that generalization and care arise from conditions of physical embodiment: being-in-the-world (the agent is a part of the environment) and being-towards-death (unless counteracted, the agent drifts toward terminal states). These conditions necessitate a homeostatic drive to maintain oneself and maximize the future capacity to continue doing so. Fulfilling this drive over long time horizons in multi-agent environments necessitates robust causal modeling of self and others' embodiment and jointly achievable future states. Because embodied agents are part of the environment, with the self delimited by reliable control, empowering others can expand self-boundaries, enabling other-regard. This provides a path from embodiment toward generalization and care based in shared constraints. We outline a reinforcement-learning framework for examining these questions. Homeostatic mortal agents continually learning in open-ended environments may offer efficient robustness and trustworthy alignment.

2510.01022 2026-02-16 cs.LG eess.SP stat.ML

VDW-GNNs: Vector diffusion wavelets for geometric graph neural networks

David R. Johnson, Alexander Sietsema, Rishabh Anand, Deanna Needell, Smita Krishnaswamy, Michael Perlmutter

Comments A previous, shorter version of this work was presented in the workshop "New Perspectives in Advancing Graph Machine Learning" at NeurIPS 2025

Details
English abstract

We introduce vector diffusion wavelets (VDWs), a novel family of wavelets inspired by the vector diffusion maps algorithm that was introduced to analyze data lying in the tangent bundle of a Riemannian manifold. We show that these wavelets may be effectively incorporated into a family of geometric graph neural networks, which we refer to as VDW-GNNs. We demonstrate that such networks are effective on synthetic point cloud data, as well as on real-world data derived from wind-field measurements and neural activity data. Theoretically, we prove that these new wavelets have desirable frame theoretic properties, similar to traditional diffusion wavelets. Additionally, we prove that these wavelets have desirable symmetries with respect to rotations and translations.

2510.00602 2026-02-16 cs.LG cs.SY eess.SY

Multi-Agent Stage-wise Conservative Linear Bandits

Amirhossein Afsharrad, Ahmadreza Moradipari, Sanjay Lall

Details
English abstract

In many real-world applications such as recommendation systems, multiple learning agents must balance exploration and exploitation while maintaining safety guarantees to avoid catastrophic failures. We study the stochastic linear bandit problem in a multi-agent networked setting where agents must satisfy stage-wise conservative constraints. A network of $N$ agents collaboratively maximizes cumulative reward while ensuring that the expected reward at every round is no less than $(1-\alpha)$ times that of a baseline policy. Each agent observes local rewards with unknown parameters, but the network optimizes for the global parameter (average of local parameters). Agents communicate only with immediate neighbors, and each communication round incurs additional regret. We propose MA-SCLUCB (Multi-Agent Stage-wise Conservative Linear UCB), an episodic algorithm alternating between action selection and consensus-building phases. We prove that MA-SCLUCB achieves regret $\tilde{O}\left(\frac{d}{\sqrt{N}}\sqrt{T}\cdot\frac{\log(NT)}{\sqrt{\log(1/|\lambda_2|)}}\right)$ with high probability, where $d$ is the dimension, $T$ is the horizon, and $|\lambda_2|$ is the network's second largest eigenvalue magnitude. Our analysis shows: (i) collaboration yields $\frac{1}{\sqrt{N}}$ improvement despite local communication, (ii) communication overhead grows only logarithmically for well-connected networks, and (iii) stage-wise safety adds only lower-order regret. Thus, distributed learning with safety guarantees achieves near-optimal performance in reasonably connected networks.
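
The stage-wise constraint in the abstract (expected reward at every round at least $(1-\alpha)$ times the baseline's) can be illustrated with a generic single-agent selection rule: keep only actions whose lower confidence bound clears the threshold, pick the most optimistic of those, and otherwise fall back to the baseline. This is a simplified sketch under assumed inputs, not MA-SCLUCB; the confidence intervals, action names, and the fallback rule are hypothetical.

```python
def conservative_choice(candidates, baseline_action, baseline_reward, alpha):
    """Pick an action under a stage-wise conservative constraint.

    candidates: dict mapping action -> (lower_bound, upper_bound) on its
    expected reward (assumed to come from some confidence set).
    An action counts as safe if its lower bound is at least
    (1 - alpha) * baseline_reward; among safe actions the one with the
    highest upper bound is chosen, else the baseline action is played.
    """
    threshold = (1 - alpha) * baseline_reward
    safe = {a: ub for a, (lb, ub) in candidates.items() if lb >= threshold}
    if not safe:
        return baseline_action
    return max(safe, key=safe.get)

# Hypothetical confidence intervals for three actions.
candidates = {"a": (0.2, 0.9), "b": (0.65, 0.7), "c": (0.7, 0.85)}

choice = conservative_choice(candidates, "baseline", baseline_reward=0.8, alpha=0.25)
fallback = conservative_choice({"a": (0.2, 0.9)}, "baseline", baseline_reward=0.8, alpha=0.25)
```

Action "a" has the highest upper bound but is excluded because its lower bound falls below the threshold, which is the sense in which the constraint limits exploration.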

2509.25889 2026-02-16 cs.CV cs.CL

Multimodal LLM With Hierarchical Mixture-of-Experts for VQA on 3D Brain MRI

Arvind Murari Vepa, Yannan Yu, Jingru Gan, Anthony Cuturrufo, Michael F. Romano, Weikai Li, Fabien Scalzo, Wei Wang, Yizhou Sun

Comments 17 pages, 3 figures

Details
English abstract

Multiparametric 3D brain MRI (mpMRI) is central to neuroradiology, but producing tumor location, appearance, size, and involvement of critical structures for neurosurgical planning remains challenging. We introduce mpLLM, a multimodal LLM for visual question answering (VQA) on mpMRI that produces clinically interpretable tumor descriptors (e.g., volume, morphology, extent, and coarse localization) as an adjunct to clinical expertise for referring neurosurgeons. mpLLM uses a prompt-conditioned hierarchical mixture-of-experts (MoE) to fuse multiple 3D sequences via routing over modality- and token-level projection experts, enabling data-efficient end-to-end training without large-scale image-report pretraining. To address limited paired image-text supervision, we propose a synthetic VQA protocol that derives clinically grounded questions and answers from expert segmentation annotations and is validated with radiologist collaboration. Across multiple mpMRI datasets, mpLLM improves over strong medical VLM baselines by +5.5 points on average (+9.1% relative) and increases radiologist-rated clinical acceptability by +15.9 points (+46.6% relative). Our study features three main contributions: (1) the first VQA dataset for 3D brain mpMRI, (2) a hierarchical MoE architecture for joint reasoning over interrelated 3D sequences, and (3) expert-supported evidence of clinical utility. Source code is available at https://github.com/arvindmvepa/mpllm, and we will release the dataset upon publication.

2509.21370 2026-02-16 cs.RO cs.CV

Language-in-the-Loop Culvert Inspection on the Erie Canal

Yash Turkar, Yashom Dighe, Karthik Dantu

Comments First two authors contributed equally

Details
English abstract

Culverts on canals such as the Erie Canal, built originally in 1825, require frequent inspections to ensure safe operation. Human inspection of culverts is challenging due to age, geometry, poor illumination, weather, and lack of easy access. We introduce VISION, an end-to-end, language-in-the-loop autonomy system that couples a web-scale vision-language model (VLM) with constrained viewpoint planning for autonomous inspection of culverts. Brief prompts to the VLM solicit open-vocabulary ROI proposals with rationales and confidences, stereo depth is fused to recover scale, and a planner -- aware of culvert constraints -- commands repositioning moves to capture targeted close-ups. Deployed on a quadruped in a culvert under the Erie Canal, VISION closes the see, decide, move, re-image loop on-board and produces high-resolution images for detailed reporting without domain-specific fine-tuning. In an external evaluation by New York Canal Corporation personnel, initial ROI proposals achieved 61.4% agreement with subject-matter experts, and final post-re-imaging assessments reached 80%, indicating that VISION converts tentative hypotheses into grounded, expert-aligned findings.

2509.18401 2026-02-16 cs.CL

Evaluating the Creativity of LLMs in Persian Literary Text Generation

Armin Tourajmehr, Mohammad Reza Modarres, Yadollah Yaghoobzadeh

Journal ref In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 14762-14774, Suzhou, China. Association for Computational Linguistics

Details
English abstract

Large language models (LLMs) have demonstrated notable creative abilities in generating literary texts, including poetry and short stories. However, prior research has primarily centered on English, with limited exploration of non-English literary traditions and without standardized methods for assessing creativity. In this paper, we evaluate the capacity of LLMs to generate Persian literary text enriched with culturally relevant expressions. We build a dataset of user-generated Persian literary texts spanning 20 diverse topics and assess model outputs along four creativity dimensions (originality, fluency, flexibility, and elaboration) by adapting the Torrance Tests of Creative Thinking. To reduce evaluation costs, we adopt an LLM as a judge for automated scoring and validate its reliability against human judgments using intraclass correlation coefficients, observing strong agreement. In addition, we analyze the models' ability to understand and employ four core literary devices: simile, metaphor, hyperbole, and antithesis. Our results highlight both the strengths and limitations of LLMs in Persian literary text generation, underscoring the need for further refinement.