arXivDaily: arXiv Daily Academic Digest (updated Monday through Friday)
2602.01465 2026-02-10 cs.AI cs.SE

Agyn: A Multi-Agent System for Team-Based Autonomous Software Engineering

Nikita Benkovich, Vitalii Valkov

Abstract

Large language models have demonstrated strong capabilities in individual software engineering tasks, yet most autonomous systems still treat issue resolution as a monolithic or pipeline-based process. In contrast, real-world software development is organized as a collaborative activity carried out by teams following shared methodologies, with clear role separation, communication, and review. In this work, we present a fully automated multi-agent system that explicitly models software engineering as an organizational process, replicating the structure of an engineering team. Built on top of agyn, an open-source platform for configuring agent teams, our system assigns specialized agents to roles such as coordination, research, implementation, and review, provides them with isolated sandboxes for experimentation, and enables structured communication. The system follows a defined development methodology for working on issues, including analysis, task specification, pull request creation, and iterative review, and operates without any human intervention. Importantly, the system was designed for real production use and was not tuned for SWE-bench. When evaluated post hoc on SWE-bench 500, it resolves 72.2% of tasks, outperforming single-agent baselines using comparable language models. Our results suggest that replicating team structure, methodology, and communication is a powerful paradigm for autonomous software engineering, and that future progress may depend as much on organizational design and agent infrastructure as on model improvements.

2602.01189 2026-02-10 cs.RO cs.SY eess.SY

SPOT: Spatio-Temporal Obstacle-free Trajectory Planning for UAVs in an Unknown Dynamic Environment

Astik Srivastava, Thomas J Chackenkulam, Bitla Bhanu Teja, Antony Thomas, Madhava Krishna

Comments Accepted for publication at ICRA 2026

Abstract

We address the problem of reactive motion planning for quadrotors operating in unknown environments with dynamic obstacles. Our approach leverages a 4-dimensional spatio-temporal planner, integrated with vision-based Safe Flight Corridor (SFC) generation and trajectory optimization. Unlike prior methods that rely on map fusion, our framework is mapless, enabling collision avoidance directly from perception while reducing computational overhead. Dynamic obstacles are detected and tracked using a vision-based object segmentation and tracking pipeline, allowing robust classification of static versus dynamic elements in the scene. To further enhance robustness, we introduce a backup planning module that reactively avoids dynamic obstacles when no direct path to the goal is available, mitigating the risk of collisions during deadlock situations. We validate our method extensively in both simulation and real-world hardware experiments, and benchmark it against state-of-the-art approaches, showing significant advantages for reactive UAV navigation in dynamic, unknown environments.

2602.01100 2026-02-10 cs.RO

StreamVLA: Breaking the Reason-Act Cycle via Completion-State Gating

Tongqing Chen, Hang Wu, Jiasen Wang, Xiaotao Li, Lu Fang

Abstract

Long-horizon robotic manipulation requires bridging the gap between high-level planning (System 2) and low-level control (System 1). Current Vision-Language-Action (VLA) models often entangle these processes, performing redundant multimodal reasoning at every timestep, which leads to high latency and goal instability. To address this, we present StreamVLA, a dual-system architecture that unifies textual task decomposition, visual goal imagination, and continuous action generation within a single parameter-efficient backbone. We introduce a "Lock-and-Gated" mechanism to intelligently modulate computation: only when a sub-task transition is detected does the model trigger slow thinking to generate a textual instruction and imagine the specific visual completion state, rather than generic future frames. Crucially, this completion state serves as a time-invariant goal anchor, making the policy robust to execution speed variations. During steady execution, these high-level intents are locked to condition a Flow Matching action head, allowing the model to bypass expensive autoregressive decoding for 72% of timesteps. This hierarchical abstraction ensures sub-goal focus while significantly reducing inference latency. Extensive evaluations demonstrate that StreamVLA achieves state-of-the-art performance, with a 98.5% success rate on the LIBERO benchmark and robust recovery in real-world interference scenarios, achieving a 48% reduction in latency compared to full-reasoning baselines.

2602.00612 2026-02-10 cs.CL

Lookahead-then-Verify: Reliable Constrained Decoding for Diffusion LLMs under Context-Free Grammars

Yitong Zhang, Yongmin Li, Yuetong Liu, Jia Li, Xiaoran Jia, Zherui Li, Ge Li

Abstract

Diffusion Large Language Models (dLLMs) have demonstrated promising generative capabilities and are increasingly used to produce formal languages defined by context-free grammars, such as source code and chemical expressions. However, as probabilistic models, they still struggle to generate syntactically valid outputs reliably. A natural and promising direction to address this issue is to adapt constrained decoding techniques to enforce grammatical correctness during generation. However, applying these techniques faces two primary obstacles. On the one hand, the non-autoregressive nature of dLLMs renders most existing constrained decoding approaches inapplicable. On the other hand, current approaches specifically designed for dLLMs may allow intermediate outputs that are impossible to complete into valid sentences, which significantly limits their reliability in practice. To address these challenges, we present LAVE, a constrained decoding approach specifically designed for dLLMs. Our approach leverages a key property of dLLMs, namely their ability to predict token distributions for all positions in parallel during each forward pass. Whenever a new token is proposed by the model, LAVE performs lookahead using these distributions to efficiently and reliably verify the validity of the proposed token. This design enforces the constraints reliably by preserving the potential for intermediate outputs to be extended into valid sentences. Extensive experiments across four widely used dLLMs and three representative benchmarks demonstrate that LAVE consistently outperforms existing baselines and achieves substantial improvements in syntactic correctness, while incurring negligible runtime overhead.

2602.00030 2026-02-10 cs.LG

RAPTOR-AI for Disaster OODA Loop: Hierarchical Multimodal RAG with Experience-Driven Agentic Decision-Making

Takato Yasuno

Comments 8 pages, 3 figures, 2 tables

Abstract

Humanitarian Assistance and Disaster Relief (HADR) operations demand rapid synthesis of multimodal information for time-critical decision-making under extreme uncertainty. Traditional information systems struggle with the fragmented, multimodal nature of disaster data and lack adaptive reasoning capabilities essential for dynamic emergency contexts. This work introduces RAPTOR-AI, an agentic multimodal Retrieval-Augmented Generation (RAG) framework that advances beyond conventional static knowledge bases by implementing dynamic, experience-driven decision support for disaster response. The system addresses HADR requirements across initial rescue, recovery, and reconstruction phases through three key innovations: hierarchical multimodal knowledge construction from diverse sources (textual reports, aerial imagery, historical documentation), entropy-aware agentic control that dynamically selects optimal retrieval strategies based on situational context, and experiential knowledge integration using LoRA adaptation for both expert and non-expert responders. The framework constructs hierarchical knowledge trees from 46 tsunami-related PDFs (2,378 pages) using BLIP-based image understanding, ColVBERT embeddings, and long-context summarization within the OODA loop (Observe, Orient, Decide, Act) tactical framework. Experiments demonstrate significant improvements over existing approaches: 23% improvement in retrieval precision, 31% better situational grounding, and 27% enhanced task decomposition accuracy, with efficient scaling up to 3,000 document chunks.

2601.22636 2026-02-10 cs.AI

Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling

Mingqian Feng, Xiaodong Liu, Weiwei Yang, Chenliang Xu, Christopher White, Jianfeng Gao

Abstract

Large Language Models (LLMs) are typically evaluated for safety under single-shot or low-budget adversarial prompting, which underestimates real-world risk. In practice, attackers can exploit large-scale parallel sampling to repeatedly probe a model until a harmful response is produced. While recent work shows that attack success increases with repeated sampling, principled methods for predicting large-scale adversarial risk remain limited. We propose a scaling-aware Best-of-N estimation of risk, SABER, for modeling jailbreak vulnerability under Best-of-N sampling. We model sample-level success probabilities using a Beta distribution, the conjugate prior of the Bernoulli distribution, and derive an analytic scaling law that enables reliable extrapolation of large-N attack success rates from small-budget measurements. Using only n=100 samples, our anchored estimator predicts ASR@1000 with a mean absolute error of 1.66, compared to 12.04 for the baseline, which is an 86.2% reduction in estimation error. Our results reveal heterogeneous risk scaling profiles and show that models appearing robust under standard evaluation can experience rapid nonlinear risk amplification under parallel adversarial pressure. This work provides a low-cost, scalable methodology for realistic LLM safety assessment. We will release our code and evaluation scripts upon publication to facilitate future research.
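
The Beta-Bernoulli modeling mentioned in this abstract has a standard closed form that can be sketched briefly. If the per-sample success probability p follows Beta(a, b), the expected chance of at least one success in N independent samples is 1 - B(a, b + N) / B(a, b). The sketch below illustrates only that identity; it is not the paper's anchored estimator, and the function names and the values of a and b are assumptions chosen for illustration.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b), via log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def expected_asr_at_n(a: float, b: float, n: int) -> float:
    """Expected probability of at least one success in n samples.

    Assumes the per-sample success probability p ~ Beta(a, b), so that
    E[(1 - p)^n] = B(a, b + n) / B(a, b).
    """
    if a <= 0 or b <= 0 or n < 0:
        raise ValueError("a and b must be positive and n non-negative")
    return 1.0 - math.exp(log_beta(a, b + n) - log_beta(a, b))


if __name__ == "__main__":
    # Illustrative (assumed) parameters: a small mean success rate with a heavy tail.
    for n in (1, 10, 100, 1000):
        print(n, round(expected_asr_at_n(0.5, 20.0, n), 4))
```

With a uniform prior (a = b = 1) the formula reduces to N / (N + 1), which makes a convenient sanity check.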

2601.22527 2026-02-10 cs.CL

$ρ$-$\texttt{EOS}$: Training-free Bidirectional Variable-Length Control for Masked Diffusion LLMs

Jingyi Yang, Yuxian Jiang, Jing Shao

Comments 11 pages, 6 figures, 6 tables

Abstract

Beyond parallel generation and global context modeling, current masked diffusion large language models (masked dLLMs, e.g., LLaDA) suffer from a fundamental limitation: they require a predefined, fixed generation length, which lacks flexibility and forces an inevitable trade-off between output quality and computational efficiency. To address this, we study the denoising dynamics and find that the implicit density ($ρ$) of end-of-sequence ($\texttt{EOS}$) tokens serves as a reliable signal of generation sufficiency. In particular, the evolving implicit $\texttt{EOS}$ density during denoising reveals whether the current masked space is excessive or insufficient, thereby guiding the adjustment direction for generation length. Building on this insight, we propose $\textbf{$ρ$-$\texttt{EOS}$}$, a training-free, single-stage strategy that enables bidirectional variable-length generation for masked dLLMs. Unlike prior two-stage approaches--which require separate length adjustment and iterative mask insertion phases while supporting only unidirectional expansion--$\textbf{$ρ$-$\texttt{EOS}$}$ achieves bidirectional length adjustment within a unified denoising process by continuously estimating the implicit $\texttt{EOS}$ density: excessively high density triggers $\texttt{MASK}$ token contraction, while insufficient density induces expansion. Extensive experiments on mathematics and code benchmarks demonstrate that $\textbf{$ρ$-$\texttt{EOS}$}$ achieves comparable performance while substantially improving inference efficiency and token utilization. Code is available at https://github.com/yjyddq/rho-EOS.
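
The contract-or-expand rule described in this abstract can be pictured as a simple threshold controller. The sketch below is only a rough illustration: the abstract does not define how the implicit EOS density is computed, so the mean-probability definition, the thresholds `low` and `high`, and the step size are all hypothetical and are not taken from the paper or its repository.

```python
def eos_density(eos_probs):
    """Assumed density: mean predicted EOS probability over masked positions."""
    if not eos_probs:
        return 0.0
    return sum(eos_probs) / len(eos_probs)


def adjust_length(num_masked, eos_probs, low=0.1, high=0.5, step=8):
    """Return a new number of masked positions for the next denoising step.

    High EOS density suggests the masked space is excessive, so contract;
    low density suggests it is insufficient, so expand. Thresholds and
    step size are hypothetical illustration values.
    """
    rho = eos_density(eos_probs)
    if rho > high:
        return max(num_masked - step, 0)  # contract, never below zero
    if rho < low:
        return num_masked + step          # expand
    return num_masked                     # density in range: keep length


if __name__ == "__main__":
    print(adjust_length(32, [0.9] * 32))   # mostly EOS predicted
    print(adjust_length(32, [0.01] * 32))  # almost no EOS predicted
```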

2601.22488 2026-02-10 cs.LG

Elastic Spectral State Space Models for Budgeted Inference

Dachuan Song, Xuan Wang

Comments Minor update: added code repository link

Abstract

Foundation models are typically trained at a fixed computational capacity, while real-world applications require deployment across platforms with different resource constraints. Current approaches usually rely on training families of model variants or model distillation, which requires additional training and supports only a pre-selected set of sizes rather than fine-grained adaptation at runtime. In this paper, we propose Elastic Spectral State Space Models (ES-SSM), which require only one-time training at full capacity, but can be directly truncated into arbitrary scales for budgeted, runtime inference without retraining. Our ES-SSM builds on Hankel spectral filtering over a state space model (SSM), coupled with a lightweight input-adaptive gate trained under randomized spectral budgets. Using a shared masked normalization rule over the ordered spectral channels, we encourage predictive capability to concentrate in low-index components, while higher-index components act primarily as refinement. We test our algorithm across long-sequence benchmarks spanning text, logic, retrieval, vision, and audio. We demonstrate that a single ES-SSM model trained once can be truncated to provide competitive performance compared with modern Transformer and SSM baselines at similar parameter scales. Furthermore, by testing under various runtime budgets, we observe smooth and stable budget-performance curves over a wide range of truncation levels.

2601.21760 2026-02-10 cs.AI

Zero-Shot Statistical Downscaling via Diffusion Posterior Sampling

Ruian Tie, Wenbo Xiong, Zhengyu Shi, Xinyu Su, Chenyu Jiang, Libo Wu, Hao Li

Abstract

Conventional supervised climate downscaling struggles to generalize to Global Climate Models (GCMs) due to the lack of paired training data and inherent domain gaps relative to reanalysis. Meanwhile, current zero-shot methods suffer from physical inconsistencies and vanishing gradient issues under large scaling factors. We propose Zero-Shot Statistical Downscaling (ZSSD), a zero-shot framework that performs statistical downscaling without paired data during training. ZSSD leverages a Physics-Consistent Climate Prior learned from reanalysis data, conditioned on geophysical boundaries and temporal information to enforce physical validity. Furthermore, to enable robust inference across varying GCMs, we introduce Unified Coordinate Guidance. This strategy addresses the vanishing gradient problem in vanilla DPS and ensures consistency with large-scale fields. Results show that ZSSD significantly outperforms existing zero-shot baselines in 99th percentile errors and successfully reconstructs complex weather events, such as tropical cyclones, across heterogeneous GCMs.

2601.21239 2026-02-10 cs.AI

TIDE: Tuning-Integrated Dynamic Evolution for LLM-Based Automated Heuristic Design

Chentong Chen, Mengyuan Zhong, Ye Fan, Jialong Shi, Jianyong Sun

Abstract

Although Large Language Models have advanced Automated Heuristic Design, treating algorithm evolution as a monolithic text generation task overlooks the coupling between discrete algorithmic structures and continuous numerical parameters. Consequently, existing methods often discard promising algorithms due to uncalibrated constants and suffer from premature convergence resulting from simple similarity metrics. To address these limitations, we propose TIDE, a Tuning-Integrated Dynamic Evolution framework designed to decouple structural reasoning from parameter optimization. TIDE features a nested architecture where an outer parallel island model utilizes Tree Similarity Edit Distance to drive structural diversity, while an inner loop integrates LLM-based logic generation with a differential mutation operator for parameter tuning. Additionally, a UCB-based scheduler dynamically prioritizes high-yield prompt strategies to optimize resource allocation. Extensive experiments across nine combinatorial optimization problems demonstrate that TIDE discovers heuristics that significantly outperform state-of-the-art baselines in solution quality while achieving improved search efficiency and reduced computational costs.
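
The UCB-based scheduling of prompt strategies mentioned in this abstract can be illustrated with the classic UCB1 rule. This is a generic bandit sketch, not TIDE's scheduler: the class name, the strategy labels, the reward definition, and the exploration constant are assumptions made for illustration.

```python
import math


class UCBScheduler:
    """UCB1 bandit over a fixed set of (hypothetical) prompt strategies."""

    def __init__(self, strategies, c=math.sqrt(2)):
        self.strategies = list(strategies)
        self.c = c  # exploration constant; sqrt(2) gives standard UCB1
        self.counts = {s: 0 for s in self.strategies}
        self.totals = {s: 0.0 for s in self.strategies}

    def select(self):
        """Pick the next strategy: untried ones first, then the best UCB score."""
        for s in self.strategies:
            if self.counts[s] == 0:
                return s
        t = sum(self.counts.values())

        def score(s):
            mean = self.totals[s] / self.counts[s]
            bonus = self.c * math.sqrt(math.log(t) / self.counts[s])
            return mean + bonus

        return max(self.strategies, key=score)

    def update(self, strategy, reward):
        """Record the observed reward (e.g. improvement in heuristic quality)."""
        self.counts[strategy] += 1
        self.totals[strategy] += reward


if __name__ == "__main__":
    # Assumed rewards: "mutate" always pays 1.0, "crossover" always pays 0.0.
    sched = UCBScheduler(["mutate", "crossover"])
    for _ in range(50):
        s = sched.select()
        sched.update(s, 1.0 if s == "mutate" else 0.0)
    print(sched.counts)
```

The exploration bonus shrinks as a strategy is tried more often, so low-yield strategies are still revisited occasionally rather than dropped outright.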

2601.21183 2026-02-10 cs.AI cs.LG

Sycophantic Anchors: Localizing and Quantifying User Agreement in Reasoning Models

Jacek Duszenko

Abstract

Reasoning models frequently agree with incorrect user suggestions -- a behavior known as sycophancy. However, it is unclear where in the reasoning trace this agreement originates and how strong the commitment is. We introduce sycophantic anchors -- sentences identified via counterfactual analysis that commit models to user agreement. Across four reasoning models spanning three architecture families (Llama, Qwen, Falcon-hybrid) and 1.5B--8B parameters, we analyze over 200,000 counterfactual rollouts and show that linear probes reliably detect sycophantic anchors (74--85% balanced accuracy), outperforming text-only baselines at high commitment levels -- confirming they capture internal states beyond surface vocabulary. Regressors further predict commitment strength from activations ($R^2$ up to 0.74). We observe a consistent asymmetry: sycophancy leaves a stronger mechanistic footprint than correct reasoning. We also find that sycophancy builds gradually during generation rather than being determined by the prompt. These findings enable sentence-level detection and quantification of model misalignment mid-inference.

2601.20232 2026-02-10 cs.CV

Visual Prompt-Agnostic Evolution

Junze Wang, Lei Fan, Dezheng Zhang, Weipeng Jing, Donglin Di, Yang Song, Sidong Liu, Cong Cong

Comments Accepted by ICLR 2026

Abstract

Visual Prompt Tuning (VPT) adapts a frozen Vision Transformer (ViT) to downstream tasks by inserting a small number of learnable prompt tokens into the token sequence at each layer. However, we observe that existing VPT variants often suffer from unstable training dynamics, characterized by gradient oscillations. A layer-wise analysis reveals that shallow-layer prompts tend to stagnate early, while deeper-layer prompts exhibit high-variance oscillations, leading to cross-layer mismatch. These issues slow convergence and degrade final performance. To address these challenges, we propose Prompt-Agnostic Evolution ($\mathtt{PAE}$), which strengthens vision prompt tuning by explicitly modeling prompt dynamics. From a frequency-domain perspective, we initialize prompts in a task-aware direction by uncovering and propagating frequency shortcut patterns that the backbone inherently exploits for recognition. To ensure coherent evolution across layers, we employ a shared Koopman operator that imposes a global linear transformation instead of uncoordinated, layer-specific updates. Finally, inspired by Lyapunov stability theory, we introduce a regularizer that constrains error amplification during evolution. Extensive experiments show that $\mathtt{PAE}$ accelerates convergence with an average $1.41\times$ speedup and improves accuracy by 1-3% on 25 datasets across multiple downstream tasks. Beyond performance, $\mathtt{PAE}$ is prompt-agnostic and lightweight, and it integrates seamlessly with diverse VPT variants without backbone modification or inference-time changes.

2601.18588 2026-02-10 cs.AI cs.CL cs.LG

Stability as a Liability: Systematic Breakdown of Linguistic Structure in LLMs

Xianzhe Meng, Qiangsheng Zeng, Ling Luo, Qinghan Yang, Jiarui Hao, Wenbo Wu, Qinyu Wang, Rui Yin, Lin Qi, Renzhi Lu

Abstract

Training stability is typically regarded as a prerequisite for reliable optimization in large language models. In this work, we analyze how stabilizing training dynamics affects the induced generation distribution. We show that under standard maximum likelihood training, stable parameter trajectories lead stationary solutions to approximately minimize the forward KL divergence to the empirical distribution, while implicitly reducing generative entropy. As a consequence, the learned model can concentrate probability mass on a limited subset of empirical modes, exhibiting systematic degeneration despite smooth loss convergence. We empirically validate this effect using a controlled feedback-based training framework that stabilizes internal generation statistics, observing consistent low-entropy outputs and repetitive behavior across architectures and random seeds. These results indicate that optimization stability and generative expressivity are not inherently aligned, and that stability alone is an insufficient indicator of generative quality.

2601.18012 2026-02-10 cs.CL cs.AI

Evaluating Semantic and Syntactic Understanding in Large Language Models for Payroll Systems

Hendrika Maclean, Mert Can Cakmak, Muzakkiruddin Ahmed Mohammed, Shames Al Mandalawi, John Talburt

Comments ITNG 2026 conference

Abstract

Large language models are now used daily for writing, search, and analysis, and their natural language understanding continues to improve. However, they remain unreliable on exact numerical calculation and on producing outputs that are straightforward to audit. We study a synthetic payroll system as a focused, high-stakes example and evaluate whether models can understand a payroll schema, apply rules in the right order, and deliver cent-accurate results. Our experiments span a tiered dataset from basic to complex cases, a spectrum of prompts from minimal baselines to schema-guided and reasoning variants, and multiple model families including GPT, Claude, Perplexity, Grok and Gemini. Results indicate clear regimes where careful prompting is sufficient and regimes where explicit computation is required. The work offers a compact, reproducible framework and practical guidance for deploying LLMs in settings that demand both accuracy and assurance.

2601.09172 2026-02-10 cs.LG

BalDRO: A Distributionally Robust Optimization based Framework for Large Language Model Unlearning

Pengyang Shao, Naixin Zhai, Lei Chen, Yonghui Yang, Fengbin Zhu, Xun Yang, Meng Wang

Abstract

As Large Language Models (LLMs) increasingly shape online content, removing targeted information from well-trained LLMs (also known as LLM unlearning) has become critical for web governance. A key challenge lies in sample-wise imbalance within the forget set: different samples exhibit widely varying unlearning difficulty, leading to asynchronous forgetting where some knowledge remains insufficiently erased while others become over-forgotten. To address this, we propose BalDRO, a novel and efficient framework for balanced LLM unlearning. BalDRO formulates unlearning as a min-sup process: an inner step identifies a worst-case data distribution that emphasizes hard-to-unlearn samples, while an outer step updates model parameters under this distribution. We instantiate BalDRO via two efficient variants: BalDRO-G, a discrete GroupDRO-based approximation focusing on high-loss subsets, and BalDRO-DV, a continuous Donsker-Varadhan dual method enabling smooth adaptive weighting within standard training pipelines. Experiments on TOFU and MUSE show that BalDRO significantly improves both forgetting quality and model utility over existing methods, and we release code for reproducibility.
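
The Donsker-Varadhan dual mentioned in this abstract rests on a standard identity: for a reference distribution P and temperature tau, the supremum over Q of E_Q[loss] - tau * KL(Q || P) equals tau * log E_P[exp(loss / tau)], attained by weights proportional to exp(loss / tau). The sketch below shows that identity on a batch with a uniform reference distribution. It is not BalDRO-DV itself; the function names, the temperature value, and the convention that a larger loss marks a harder-to-unlearn sample are assumptions.

```python
import math


def dv_weights(losses, tau=1.0):
    """Worst-case sample weights under a KL penalty: softmax(loss / tau)."""
    m = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((l - m) / tau) for l in losses]
    z = sum(exps)
    return [e / z for e in exps]


def dv_objective(losses, tau=1.0):
    """Smooth worst-case loss: tau * log(mean(exp(loss / tau)))."""
    m = max(losses)
    mean_exp = sum(math.exp((l - m) / tau) for l in losses) / len(losses)
    return m + tau * math.log(mean_exp)


if __name__ == "__main__":
    batch = [1.0, 2.0, 3.0]  # assumed per-sample unlearning losses
    print(dv_weights(batch, tau=1.0))
    print(dv_objective(batch, tau=1.0))
```

As tau grows the weights approach uniform and the objective approaches the mean loss; as tau shrinks the weight concentrates on the highest-loss sample and the objective approaches the maximum.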

2601.07941 2026-02-10 cs.CV cs.AI

Moonworks Lunara Aesthetic Dataset

Yan Wang, Sayeef Abdullah, Partho Hassan, Sabit Hassan

Abstract

The dataset spans diverse artistic styles, including regionally grounded aesthetics from the Middle East, Northern Europe, East Asia, and South Asia, alongside general categories such as sketch and oil painting. All images are generated using the Moonworks Lunara model and intentionally crafted to embody distinct, high-quality aesthetic styles, yielding a first-of-its-kind dataset with substantially higher aesthetic scores, exceeding even aesthetics-focused datasets, and general-purpose datasets by a larger margin. Each image is accompanied by a human-refined prompt and structured annotations that jointly describe salient objects, attributes, relationships, and stylistic cues. Unlike large-scale web-derived datasets that emphasize breadth over precision, the Lunara Aesthetic Dataset prioritizes aesthetic quality, stylistic diversity, and licensing transparency, and is released under the Apache 2.0 license to support research and unrestricted academic and commercial use.

2601.07760 2026-02-10 cs.LG cs.NA math.NA

Free-RBF-KAN: Kolmogorov-Arnold Networks with Adaptive Radial Basis Functions for Efficient Function Learning

Shao-Ting Chiu, Siu Wun Cheung, Ulisses Braga-Neto, Chak Shing Lee, Rui Peng Li

Abstract

Kolmogorov-Arnold Networks (KANs) offer a promising framework for approximating complex nonlinear functions, yet the original B-spline formulation suffers from significant computational overhead due to the De Boor algorithm. While recent RBF-based variants improve efficiency, they often sacrifice the approximation accuracy inherent in the original spline-based design. To bridge this gap, we propose Free-RBF-KAN, an architecture that integrates adaptive learning grids and trainable smoothness parameters to enable expressive, high-resolution function approximation. Our method utilizes learnable RBF shapes that dynamically align with activation patterns, and we provide the first formal universal approximation proof for the RBF-KAN family. Empirical evaluations across multiscale regression, physics-informed PDEs, and operator learning demonstrate that Free-RBF-KAN can achieve accuracy comparable to its B-spline counterparts while delivering significantly faster training and inference. These results establish Free-RBF-KAN as an efficient and adaptive alternative for high-dimensional structured modeling tasks.

2601.07553 2026-02-10 cs.AI

VirtualEnv: A Platform for Embodied AI Research

Kabir Swain, Sijie Han, Ayush Raina, Jin Zhang, Shuang Li, Michael Stopa, Antonio Torralba

Abstract

As large language models (LLMs) continue to improve in reasoning and decision-making, there is a growing need for realistic and interactive environments where their abilities can be rigorously evaluated. We present VirtualEnv, a next-generation simulation platform built on Unreal Engine 5 that enables fine-grained benchmarking of LLMs in embodied and interactive scenarios. VirtualEnv supports rich agent-environment interactions, including object manipulation, navigation, and adaptive multi-agent collaboration, as well as game-inspired mechanics like escape rooms and procedurally generated environments. We provide a user-friendly API built on top of Unreal Engine, allowing researchers to deploy and control LLM-driven agents using natural language instructions. We integrate large-scale LLMs and vision-language models (VLMs), such as GPT-based models, to generate novel environments and structured tasks from multimodal inputs. Our experiments benchmark the performance of several popular LLMs across tasks of increasing complexity, analyzing differences in adaptability, planning, and multi-agent coordination. We also describe our methodology for procedural task generation, task validation, and real-time environment control. VirtualEnv is released as an open-source platform; we aim to advance research at the intersection of AI and gaming, enable standardized evaluation of LLMs in embodied AI settings, and pave the way for future developments in immersive simulations and interactive entertainment.

2601.06641 2026-02-10 cs.LG cs.CR

Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning

Quan Minh Nguyen, Min-Seon Kim, Hoang M. Ngo, Trong Nghia Hoang, Hyuk-Yoon Kwon, My T. Thai

Abstract

Membership inference attack (MIA) poses a significant privacy threat in federated learning (FL) as it allows adversaries to determine whether a client's private dataset contains a specific data sample. While defenses against membership inference attacks in standard FL have been well studied, the recent shift toward federated fine-tuning has introduced new, largely unexplored attack surfaces. To highlight this vulnerability in the emerging FL paradigm, we demonstrate that federated prompt-tuning, which adapts pre-trained models with small input prefixes to improve efficiency, also exposes a new vector for privacy attacks. We propose PromptMIA, a membership inference attack tailored to federated prompt-tuning, in which a malicious server can insert adversarially crafted prompts and monitor their updates during collaborative training to accurately determine whether a target data point is in a client's private dataset. We formalize this threat as a security game and empirically show that PromptMIA consistently attains high advantage in this game across diverse benchmark datasets. Our theoretical analysis further establishes a lower bound on the attack's advantage which explains and supports the consistently high advantage observed in our empirical results. We also investigate the effectiveness of standard membership inference defenses originally developed for gradient or output based attacks and analyze their interaction with the distinct threat landscape posed by PromptMIA. The results highlight non-trivial challenges for current defenses and offer insights into their limitations, underscoring the need for defense strategies that are specifically tailored to prompt-tuning in federated settings.

2601.05814 2026-02-10 cs.LG

A Dual Pipeline Machine Learning Framework for Automated Multi Class Sleep Disorder Screening Using Hybrid Resampling and Ensemble Learning

Md Sultanul Islam Ovi, Muhsina Tarannum Munfa, G. M. M Miftahul Alam Adib, Syed Sabbir Hasan

Comments 32 pages, 5 figures, 14 tables

Abstract

Accurate classification of sleep disorders, particularly insomnia and sleep apnea, is important for reducing long term health risks and improving patient quality of life. However, clinical sleep studies are resource intensive and are difficult to scale for population level screening. This paper presents a Dual Pipeline Machine Learning Framework for multi class sleep disorder screening using the Sleep Health and Lifestyle dataset. The framework consists of two parallel processing streams: a statistical pipeline that targets linear separability using Mutual Information and Linear Discriminant Analysis, and a wrapper based pipeline that applies Boruta feature selection with an autoencoder for non linear representation learning. To address class imbalance, we use the hybrid SMOTETomek resampling strategy. In experiments, Extra Trees and K Nearest Neighbors achieved an accuracy of 98.67%, outperforming recent baselines on the same dataset. Statistical testing using the Wilcoxon Signed Rank Test indicates that the improvement over baseline configurations is significant, and inference latency remains below 400 milliseconds. These results suggest that the proposed dual pipeline design supports accurate and efficient automated screening for non invasive sleep disorder risk stratification.

2601.03510 2026-02-10 cs.CV

G2P: Gaussian-to-Point Attribute Alignment for Boundary-Aware 3D Semantic Segmentation

Hojun Song, Chae-yeong Song, Jeong-hun Hong, Chaewon Moon, Dong-hwi Kim, Gahyeon Kim, Soo Ye Kim, Yiyi Liao, Jaehyup Lee, Sang-hyo Park

Comments Preprint. Under review

Abstract

Semantic segmentation on point clouds is critical for 3D scene understanding. However, sparse and irregular point distributions provide limited appearance evidence, making geometry-only features insufficient to distinguish objects with similar shapes but distinct appearances (e.g., color, texture, material). We propose Gaussian-to-Point (G2P), which transfers appearance-aware attributes from 3D Gaussian Splatting to point clouds for more discriminative and appearance-consistent segmentation. G2P addresses the misalignment between optimized Gaussians and original point geometry by establishing point-wise correspondences. By leveraging Gaussian opacity attributes, we resolve the geometric ambiguity that limits existing models. Additionally, Gaussian scale attributes enable precise boundary localization in complex 3D scenes. Extensive experiments demonstrate that our approach achieves superior performance on standard benchmarks and shows significant improvements on geometrically challenging classes, all without any 2D or language supervision.

2601.03322 2026-02-10 cs.LG cs.AI

HEEGNet: Hyperbolic Embeddings for EEG

Shanglin Li, Shiwen Chu, Okan Koç, Yi Ding, Qibin Zhao, Motoaki Kawanabe, Ziheng Chen

Comments Accepted to ICLR 2026

Electroencephalography (EEG)-based brain-computer interfaces facilitate direct communication with a computer, enabling promising applications in human-computer interactions. However, their utility is currently limited because EEG decoding often suffers from poor generalization due to distribution shifts across domains (e.g., subjects). Learning robust representations that capture underlying task-relevant information would mitigate these shifts and improve generalization. One promising approach is to exploit the underlying hierarchical structure in EEG, as recent studies suggest that hierarchical cognitive processes, such as visual processing, can be encoded in EEG. While many decoding methods still rely on Euclidean embeddings, recent work has begun exploring hyperbolic geometry for EEG. Hyperbolic spaces, regarded as the continuous analogue of tree structures, provide a natural geometry for representing hierarchical data. In this study, we first empirically demonstrate that EEG data exhibit hyperbolicity and show that hyperbolic embeddings improve generalization. Motivated by these findings, we propose HEEGNet, a hybrid hyperbolic network architecture to capture the hierarchical structure in EEG and learn domain-invariant hyperbolic embeddings. To this end, HEEGNet combines both Euclidean and hyperbolic encoders and employs a novel coarse-to-fine domain adaptation strategy. Extensive experiments on multiple public EEG datasets, covering visual evoked potentials, emotion recognition, and intracranial EEG, demonstrate that HEEGNet achieves state-of-the-art performance. The code is available at https://github.com/fightlesliefigt/HEEGNet

2601.01829 2026-02-10 cs.LG

RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data

Peiyan Hu, Haodong Feng, Hongyuan Liu, Tongtong Yan, Wenhao Deng, Tianrun Gao, Rong Zheng, Haoren Zheng, Chenglei Yu, Chuanrui Wang, Kaiwen Li, Zhi-Ming Ma, Dezhi Zhou, Xingcai Lu, Dixia Fan, Tailin Wu

Comments ICLR 2026 oral; 46 pages, 21 figures

Predicting the evolution of complex physical systems remains a central problem in science and engineering. Despite rapid progress in scientific Machine Learning (ML) models, a critical bottleneck is the scarcity of real-world data, which is expensive to collect, so most current models are trained and validated on simulated data. Beyond limiting the development and evaluation of scientific ML, this gap also hinders research into essential tasks such as sim-to-real transfer. We introduce RealPDEBench, the first benchmark for scientific ML that integrates real-world measurements with paired numerical simulations. RealPDEBench consists of five datasets, three tasks, eight metrics, and ten baselines. We first present five real-world measured datasets with paired simulated datasets across different complex physical systems. We further define three tasks, which allow comparisons between real-world and simulated data and facilitate the development of methods to bridge the two. Moreover, we design eight evaluation metrics, spanning data-oriented and physics-oriented metrics, and finally benchmark ten representative baselines, including state-of-the-art models, pretrained PDE foundation models, and a traditional method. Experiments reveal significant discrepancies between simulated and real-world data, while showing that pretraining with simulated data consistently improves both accuracy and convergence. With this work, we hope to provide insights from real-world data, advancing scientific ML toward bridging the sim-to-real gap and real-world deployment. Our benchmark, datasets, and instructions are available at https://realpdebench.github.io/.

2601.00892 2026-02-10 cs.LG cs.CV physics.data-an stat.ME stat.ML

Hierarchical topological clustering

Ana Carpio, Gema Duro

Comments not peer reviewed, reviewed version to appear in Soft Computing

Journal ref Soft Computing 2026

Topological methods can explore data clouds without making assumptions about their structure. Here we propose a hierarchical topological clustering algorithm that can be implemented with any choice of distance. The persistence of outliers and of clusters of arbitrary shape is inferred from the resulting hierarchy. We demonstrate the potential of the algorithm on selected datasets in which outliers play relevant roles, consisting of images, medical data, and economic data. These methods can provide meaningful clusters in situations in which other techniques fail to do so.
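
The idea of reading persistence off a hierarchy built from pairwise distances alone can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a simple stand-in in which points are linked whenever their distance is at most a threshold `eps`, and clusters are the connected components of that graph. The function names, the Manhattan distance, the toy points, and the thresholds are all hypothetical choices made for illustration.

```python
from itertools import combinations


def threshold_clusters(points, dist, eps):
    """Connected components of the graph linking pairs with dist <= eps."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def manhattan(a, b):
    """Any distance function can be plugged in; Manhattan is just an example."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical toy data: two tight groups and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (20, 20)]

# Clusters at increasing thresholds form a nested hierarchy.
hierarchy = {eps: threshold_clusters(points, manhattan, eps) for eps in (1, 12, 40)}
for eps, clusters in hierarchy.items():
    print(eps, clusters)
```

In this toy run the point at index 5 stays a singleton until the largest threshold, which is the sense in which an outlier "persists" across levels of the hierarchy.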

2512.22730 2026-02-10 cs.CV cs.AI eess.IV

Improved cystic hygroma detection from prenatal imaging using ultrasound-specific self-supervised representation learning

Youssef Megahed, Robin Ducharme, Inok Lee, Inbal Willner, Adrian D. C. Chan, Mark Walker, Steven Hawken

Comments 13 pages, 6 figures, 2 tables

Cystic hygroma is a high-risk prenatal ultrasound finding that portends high rates of chromosomal abnormalities, structural malformations, and adverse pregnancy outcomes. Automated detection can increase reproducibility and support scalable early screening programs, but supervised deep learning methods are limited by small labelled datasets. This study assesses whether ultrasound-specific self-supervised pretraining can facilitate accurate, robust deep learning detection of cystic hygroma in first-trimester ultrasound images. We fine-tuned the Ultrasound Self-Supervised Foundation Model with Masked Autoencoding (USF-MAE), pretrained on over 370,000 unlabelled ultrasound images, for binary classification of normal controls and cystic hygroma cases used in this study. Performance was evaluated on the same curated ultrasound dataset, preprocessing pipeline, and 4-fold cross-validation protocol as for the DenseNet-169 baseline, using accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (ROC-AUC). Model interpretability was analyzed qualitatively using Score-CAM visualizations. USF-MAE outperformed the DenseNet-169 baseline on all evaluation metrics. The proposed model yielded a mean accuracy of 0.96, sensitivity of 0.94, specificity of 0.98, and ROC-AUC of 0.98 compared to 0.93, 0.92, 0.94, and 0.94 for the DenseNet-169 baseline, respectively. Qualitative Score-CAM visualizations of model predictions demonstrated clinical relevance by highlighting expected regions in the fetal neck for both positive and negative cases. Paired statistical analysis using a Wilcoxon signed-rank test confirmed that performance improvements achieved by USF-MAE were statistically significant (p = 0.0057).
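
For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity follow from a binary confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts for one validation fold (not data from the paper).
metrics = binary_metrics(tp=18, fp=3, tn=27, fn=2)
print(metrics)
```

In a k-fold protocol such as the one described above, these values would be computed per fold and then averaged.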

2512.22439 2026-02-10 cs.CV cs.AI

SuperiorGAT: Graph Attention Networks for Sparse LiDAR Point Cloud Reconstruction in Autonomous Systems

Khalfalla Awedat, Mohamed Abidalrekab, Gurcan Comert, Mustafa Ayad

LiDAR-based perception in autonomous systems is constrained by fixed vertical beam resolution and further compromised by beam dropout resulting from environmental occlusions. This paper introduces SuperiorGAT, a graph attention-based framework designed to reconstruct missing elevation information in sparse LiDAR point clouds. By modeling LiDAR scans as beam-aware graphs and incorporating gated residual fusion with feed-forward refinement, SuperiorGAT enables accurate reconstruction without increasing network depth. To evaluate performance, structured beam dropout is simulated by removing every fourth vertical scanning beam. Extensive experiments across diverse KITTI environments, including Person, Road, Campus, and City sequences, demonstrate that SuperiorGAT consistently achieves lower reconstruction error and improved geometric consistency compared to PointNet-based models and deeper GAT baselines. Qualitative X-Z projections further confirm the model's ability to preserve structural integrity with minimal vertical distortion. These results suggest that architectural refinement offers a computationally efficient method for improving LiDAR resolution without requiring additional sensor hardware.
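
The structured beam dropout used for evaluation can be sketched as follows. This is a minimal illustration, assuming each point carries an integer beam index and that the dropped beams are those whose index satisfies `index % 4 == 3`; the abstract only states that every fourth beam is removed, so the offset, the tuple layout, and the toy scan are assumptions.

```python
def drop_every_fourth_beam(points, offset=3):
    """Split points into (kept, dropped); beams with index % 4 == offset are dropped.

    Each point is a tuple (beam_index, x, y, z). The offset is an assumption.
    """
    kept = [p for p in points if p[0] % 4 != offset]
    dropped = [p for p in points if p[0] % 4 == offset]
    return kept, dropped


# Hypothetical toy scan: 8 beams with 3 points each.
scan = [(beam, float(i), 0.0, 0.1 * beam) for beam in range(8) for i in range(3)]

kept, dropped = drop_every_fourth_beam(scan)
print(len(kept), len(dropped))
print(sorted({p[0] for p in dropped}))
```

The `dropped` points would serve as reconstruction targets, while `kept` forms the sparse input to a reconstruction model.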

2512.18837 2026-02-10 cs.LG math.DS

Generative Modeling through Koopman Spectral Analysis: An Operator-Theoretic Perspective

Yuanchao Xu, Fengyi Li, Masahiro Fujisawa, Xiaoyuan Cheng, Youssef Marzouk, Isao Ishikawa

We propose Koopman Spectral Wasserstein Gradient Descent (KSWGD), a particle-based generative modeling framework that learns the Langevin generator via Koopman theory and integrates it with Wasserstein gradient descent. Our key insight is that the spectral structure of the underlying distribution can be estimated directly from trajectory data via the Koopman operator, eliminating the need for explicit knowledge of the target potential. Additionally, we prove that KSWGD maintains an approximately constant dissipation rate, thereby establishing linear convergence and overcoming the vanishing-gradient phenomenon that hinders existing kernel-based particle methods. We further provide a Feynman-Kac interpretation that clarifies the method's probabilistic foundation. Experiments on compact manifolds, metastable multi-well systems, and high-dimensional stochastic partial differential equations demonstrate that KSWGD consistently outperforms baselines in both convergence speed and sample quality.

2512.15973 2026-02-10 cs.LG cs.CL

Dynamic Rank Reinforcement Learning for Adaptive Low-Rank Multi-Head Self Attention in Large Language Models

Caner Erden

Journal ref International Journal of Complexity in Applied Science and Technology, 2026

Traditional low-rank approximations of multi-head self-attention rely on static rank assumptions, limiting their flexibility across diverse linguistic contexts. Our method, Dynamic Rank Reinforcement Learning (DR-RL), dynamically modulates ranks based on real-time sequence dynamics, layer-specific sensitivities, and hardware constraints. The core innovation is a deep reinforcement learning agent that formulates rank selection as a sequential policy optimization problem, balancing attention fidelity against computational latency. To ensure stability during inference, we derive and employ online matrix perturbation bounds, enabling incremental rank updates without the prohibitive cost of full decomposition. Furthermore, the integration of a lightweight Transformer-based policy network and batched Singular Value Decomposition (SVD) operations ensures scalable deployment on modern architectures. Extensive experiments demonstrate that DR-RL reduces Floating Point Operations (FLOPs) by over 40% in long-sequence regimes (L > 4096) while maintaining downstream accuracy statistically equivalent to full-rank attention. Beyond standard language modeling benchmarks, we validate the real-world applicability of DR-RL on the GLUE benchmark. Specifically, our method achieves 92.78% accuracy on the SST-2 sentiment analysis task, matching the performance of full-rank baselines and outperforming static low-rank methods, such as Performer and Nyströmformer, by a significant margin.

2512.10937 2026-02-10 cs.AI quant-ph

Agent policies from higher-order causal functions

Matt Wilson

We establish a correspondence between equivalence classes of agent-state policies for deterministic POMDPs and one-input process functions (the classical-deterministic limit of higher-order quantum operations). We use this correspondence to build a bridge between the agent-environment interaction in artificial intelligence, causal structure in the foundations of physics, and logic in computer science. We construct a *-autonomous category PF of types which supports an interpretation of one-step evaluation of policies, and multi-agent observation constraints, into cuts and monoidal products. In terms of types, we develop the correspondence further by identifying observation-independent decentralised POMDPs as the natural domain for the multi-input process functions used to model indefinite causality. We then prove a strict separation between general multi-input process function and definite-ordered process function performance on such dec-POMDPs, by finding an instance for which policies utilizing an indefinite causal structure can achieve greater finite-horizon rewards than policies which are restricted to a fixed background causal structure.

2512.05132 2026-02-10 cs.CV cs.AI

Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training

Wenshuo Wang, Fan Zhang

Comments Accepted as a poster paper at ICLR 2026

Zero-Shot Super-Resolution Spatiotemporal Forecasting requires a deep learning model to be trained on low-resolution data and deployed for inference on high-resolution data. Existing studies consider maintaining similar error across different resolutions as indicative of successful multi-resolution generalization. However, deep learning models serving as alternatives to numerical solvers should reduce error as resolution increases. The fundamental limitation is that the highest physical-law frequency that low-resolution data can represent is bounded by its Nyquist frequency, making it difficult for models to process signals containing unseen frequency components during high-resolution inference. As a result, errors remain anchored at their low-resolution level, which is incorrectly interpreted as successful generalization. We define this fundamental phenomenon as a new problem distinct from existing issues: Scale Anchoring. To address it, we propose architecture-agnostic Frequency Representation Learning (FRL). It alleviates Scale Anchoring through resolution-aligned frequency representations and spectral consistency training: on grids with higher Nyquist frequencies, the frequency response in high-frequency bands of FRL-enhanced variants is more stable. This allows errors to decrease with resolution and significantly outperform baselines within our task and resolution range, while incurring only modest computational overhead.
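
The Nyquist limit mentioned above can be made concrete with a short sketch. It assumes a periodic unit-length domain sampled on uniform grids, so a grid with `n` points has a Nyquist frequency of `n / 2` cycles per unit length. The grid sizes, the test frequency, and the function names are hypothetical, and the code illustrates only the sampling limit, not the proposed FRL method.

```python
import math


def sample_sine(freq, n_points):
    """Sample sin(2*pi*freq*x) at n_points equally spaced positions on [0, 1)."""
    return [math.sin(2 * math.pi * freq * k / n_points) for k in range(n_points)]


def max_gap(a, b):
    """Largest pointwise absolute difference between two sampled signals."""
    return max(abs(x - y) for x, y in zip(a, b))


low_res, high_res = 8, 32  # hypothetical training and inference grids
freq = 5                   # above the low-res Nyquist (4), below the high-res one (16)
alias = freq - low_res     # frequency -3 gives identical samples on the coarse grid

# On the coarse grid the two frequencies are indistinguishable (aliasing).
gap_coarse = max_gap(sample_sine(freq, low_res), sample_sine(alias, low_res))

# On the fine grid they are clearly different signals.
gap_fine = max_gap(sample_sine(freq, high_res), sample_sine(alias, high_res))

print(gap_coarse, gap_fine)
```

A model trained only on the coarse grid therefore never sees frequency 5 as a distinct component, which is consistent with the abstract's point that high-resolution inference exposes frequency content absent from low-resolution training data.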