arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1284
专题追踪
2508.12840 2026-02-09 cs.AI cs.MA

Scaling Multi-Agent Epistemic Planning through GNN-Derived Heuristics

Giovanni Briglia, Francesco Fabiano, Stefano Mariani

详情
英文摘要

Multi-agent Epistemic Planning (MEP) is an autonomous planning framework for reasoning about both the physical world and the beliefs of agents, with applications in domains where information flow and awareness among agents are critical. The richness of MEP requires states to be represented as Kripke structures, i.e., directed labeled graphs. This representation limits the applicability of existing heuristics, hindering the scalability of epistemic solvers, which must explore an exponential search space without guidance, resulting often in intractability. To address this, we exploit Graph Neural Networks (GNNs) to learn patterns and relational structures within epistemic states, to guide the planning process. GNNs, which naturally capture the graph-like nature of Kripke models, allow us to derive meaningful estimates of state quality -- e.g., the distance from the nearest goal -- by generalizing knowledge obtained from previously solved planning instances. We integrate these predictive heuristics into an epistemic planning pipeline and evaluate them against standard baselines, showing improvements in the scalability of multi-agent epistemic planning.

2508.06214 2026-02-09 cs.LG cs.AI

Reparameterization Proximal Policy Optimization

Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang

详情
英文摘要

By leveraging differentiable dynamics, Reparameterization Policy Gradient (RPG) achieves high sample efficiency. However, current approaches are hindered by two critical limitations: the under-utilization of computationally expensive dynamics Jacobians and inherent training instability. While sample reuse offers a remedy for under-utilization, no prior principled framework exists, and naive attempts risk exacerbating instability. To address these challenges, we propose Reparameterization Proximal Policy Optimization (RPO). We first establish that under sample reuse, RPG naturally optimizes a PPO-style surrogate objective via Backpropagation Through Time, providing a unified framework for both on- and off-policy updates. To further ensure stability, RPO integrates a clipped policy gradient mechanism tailored for RPG and employs explicit Kullback-Leibler divergence regularization. Experimental results demonstrate that RPO maintains superior sample efficiency and consistently outperforms or achieves state-of-the-art performance across diverse tasks.

2508.04102 2026-02-09 cs.CV

AR as an Evaluation Playground: Bridging Metrics and Visual Perception of Computer Vision Models

Ashkan Ganj, Yiqin Zhao, Tian Guo

Comments Accepted at MMSys 2026

详情
英文摘要

Quantitative metrics are central to evaluating computer vision (CV) models, but they often fail to capture real-world performance due to protocol inconsistencies and ground-truth noise. While visual perception studies can complement these metrics, they often require end-to-end systems that are time-consuming to implement and setups that are difficult to reproduce. We systematically summarize key challenges in evaluating CV models and present the design of ARCADE, an evaluation platform that leverages augmented reality (AR) to enable easy, reproducible, and human-centered CV evaluation. ARCADE uses a modular architecture that provides cross-platform data collection, pluggable model inference, and interactive AR tasks, supporting both metric and visual perception evaluation. We demonstrate ARCADE through a user study with 15 participants and case studies on two representative CV tasks, depth and lighting estimation, showing that ARCADE can reveal perceptual flaws in model quality that are often missed by traditional metrics. We also evaluate ARCADE's usability and performance, showing its flexibility as a reliable real-time platform.

2508.01309 2026-02-09 cs.CL cs.AI

D-SCoRE: Document-Centric Segmentation and CoT Reasoning with Structured Export for QA-CoT Data Generation

Weibo Zhou, Lingbo Li, Shangsong Liang

详情
英文摘要

The scarcity and high cost of high-quality domain-specific question-answering (QA) datasets limit supervised fine-tuning of large language models (LLMs). We introduce $\textbf{D-SCoRE}$, a training-free framework that leverages LLMs and prompt engineering to automatically generate diverse, rich QA datasets with Chain-of-Thought (CoT) from arbitrary textual sources. By integrating $\textbf{D}$ocument-centric processing, $\textbf{S}$egmentation, $\textbf{Co}$T $\textbf{R}$easoning, and structured $\textbf{E}$xport - along with multi-dimensional controls such as semantic role transformation, question type balancing, and counterfactual augmentation - D-SCoRE produces tailored QA pairs with enhanced diversity and relevance. LLMs fine-tuned on D-SCoRE-generated datasets outperform those trained on human-annotated QA data across most evaluated domains. Its efficiency and scalability enable rapid, high-performance domain-adaptive fine-tuning on consumer-grade hardware, generating over 1,100 high-quality QA pairs per GPU-hour end-to-end.

2507.16184 2026-02-09 cs.AI cs.HC

Emergent Cognitive Convergence via Implementation: Structured Cognitive Loop Reflecting Four Theories of Mind

Myung Ho Kim

Comments This revised version improves conceptual consistency between Agentic Flow and the Structured Cognitive Loop (SCL; arXiv:2510.05107)

详情
英文摘要

We report a structural convergence among four influential theories of mind: Kahneman dual-system theory, Friston predictive processing, Minsky society of mind, and Clark extended mind, emerging unintentionally within a practical AI architecture known as Agentic Flow. Designed to address limitations of large language models LLMs, Agentic Flow comprises five interlocking modules - Retrieval, Cognition, Control, Action, and Memory - organized into a repeatable cognitive loop. Although originally inspired only by Minsky and Clark, subsequent analysis showed that its structure echoes computational motifs from all four theories. This suggests that theoretical convergence may arise from implementation constraints rather than deliberate synthesis. In controlled evaluations, the structured agent achieved 95.8 percent task success compared to 62.3 percent for baseline LLMs, demonstrating stronger constraint adherence and more reproducible reasoning. We characterize this convergence through a broader descriptive meta-architecture called PEACE, highlighting recurring patterns such as predictive modeling, associative recall, and error-sensitive control. Later formalized as the Structured Cognitive Loop (SCL), this abstraction generalizes principles first realized in Agentic Flow as a foundation for behavioral intelligence in LLM-based agents.Rather than asserting theoretical unification, this position paper proposes that intelligent architectures may evolve toward shared structural patterns shaped by practical demands. Agentic Flow thus functions as an implementation instance of the Structured Cognitive Loop, illustrating how a unified cognitive form can emerge not from abstraction, but from the necessities of real-world reasoning.

2507.13501 2026-02-09 cs.CL math.RA q-bio.NC

Encoding syntactic objects and Merge operations in function spaces

Matilde Marcolli, Robert C. Berwick

Comments 48 pages, LaTeX, 4 png figures; v2: expository changes

详情
英文摘要

We provide a mathematical argument showing that, given a representation of lexical items as functions (wavelets, for instance) in some function space, it is possible to construct a faithful representation of arbitrary syntactic objects in the same function space. This space can be endowed with a commutative non-associative semiring structure built using the second Renyi entropy. The resulting representation of syntactic objects is compatible with the magma structure. The resulting set of functions is an algebra over an operad, where the operations in the operad model circuits that transform the input wave forms into a combined output that encodes the syntactic structure. The action of Merge on workspaces is faithfully implemented as action on these circuits, through a coproduct and a Hopf algebra Markov chain. The results obtained here provide a constructive argument showing the theoretical possibility of a neurocomputational realization of the core computational structure of syntax. We also present a particular case of this general construction where this type of realization of Merge is implemented as a cross frequency phase synchronization on sinusoidal waves. This also shows that Merge can be expressed in terms of the successor function of a semiring, thus clarifying the well known observation of its similarities with the successor function of arithmetic.

2507.10204 2026-02-09 cs.RO cs.SY eess.SY

REACT: Real-time Entanglement-Aware Coverage Path Planning for Tethered Underwater Vehicles

Abdelhakim Amer, Mohit Mehindratta, Yury Brodskiy, Bilal Wehbe, Erdal Kayacan

Comments Accepted for publication at International Conference on Robotics & Automation 2026

详情
英文摘要

Inspection of underwater structures with tethered underwater vehicles is often hindered by the risk of tether entanglement. We propose REACT (real-time entanglement-aware coverage path planning for tethered underwater vehicles), a framework designed to overcome this limitation. REACT comprises a computationally efficient geometry-based tether model using the signed distance field (SDF) map for accurate, real-time simulation of taut tether configurations around arbitrary structures in 3D. This model enables an efficient online replanning strategy by enforcing a maximum tether length constraint, thereby actively preventing entanglement. By integrating REACT into a coverage path planning framework, we achieve safe and entanglement-free inspection paths, previously challenging due to tether constraints. The complete REACT framework's efficacy is validated in a pipe inspection scenario, demonstrating safe navigation and full coverage inspection. Simulation results show that REACT achieves complete coverage while maintaining tether constraints and completing the total mission 20% faster than conventional planners, despite a longer inspection time due to proactive avoidance of entanglement that eliminates extensive post-mission disentanglement. Real-world experiments confirm these benefits, where REACT completes the full mission, while the baseline planner fails due to physical tether entanglement.

2507.02917 2026-02-09 cs.LG cs.AI

Echo State Transformer: Attention Over Finite Memories

Yannis Bendi-Ouis, Xavier Hinaut

详情
英文摘要

While Large Language Models and their underlying Transformer architecture are remarkably efficient, they do not reflect how our brain processes and learns a diversity of cognitive tasks such as language, nor how it leverages working memory. Furthermore, Transformers encounters a computational limitation: quadratic complexity growth with sequence length. Motivated by these limitations, we aim to design architectures that leverage efficient working memory dynamics to overcome standard computational barriers. We introduce Echo State Transformers (EST), a hybrid architecture that resolves this challenge while demonstrating state of the art performance in classification and detection tasks. EST integrates the Transformer attention mechanisms with nodes from Reservoir Computing to create a fixed-size memory system. Drawing inspiration from Echo State Networks, our approach leverages several reservoirs (random recurrent networks) in parallel as a lightweight and efficient working memory. These independent units possess distinct and learned internal dynamics with an adaptive leak rate, enabling them to dynamically adjust their own temporality. By applying attention on those fixed number of units instead of input tokens, EST achieves linear complexity for the whole sequence, effectively breaking the quadratic scaling problem of standard Transformers. We evaluate ESTs on a recent timeseries benchmark: the Time Series Library, which comprises 69 tasks across five categories. Results show that ESTs ranks first overall in two of five categories, outperforming strong state-of-the-art baselines on classification and anomaly detection tasks, while remaining competitive on short-term forecasting. These results demonstrate that by shifting the attention mechanism from the entire input sequence to a fixed set of evolving memory units, it is possible to maintains high sensitivity to temporal events while achieving constant computational complexity per step.

2506.21526 2026-02-09 cs.CV

WAFT: Warping-Alone Field Transforms for Optical Flow

Yihan Wang, Jia Deng

详情
英文摘要

We introduce Warping-Alone Field Transforms (WAFT), a simple and effective method for optical flow. WAFT is similar to RAFT but replaces cost volume with high-resolution warping, achieving better accuracy with lower memory cost. This design challenges the conventional wisdom that constructing cost volumes is necessary for strong performance. WAFT is a simple and flexible meta-architecture with minimal inductive biases and reliance on custom designs. Compared with existing methods, WAFT ranks 1st on Spring, Sintel, and KITTI benchmarks, achieves the best zero-shot generalization on KITTI, while being 1.3-4.1x faster than existing methods that have competitive accuracy (e.g., 1.3x than Flowformer++, 4.1x than CCMR+). Code and model weights are available at \href{https://github.com/princeton-vl/WAFT}{https://github.com/princeton-vl/WAFT}.

2506.14625 2026-02-09 cs.CL cs.AI

Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models

Chenchen Yuan, Zheyu Zhang, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

Comments Accepted to ACL 2025 (Findings)

详情
英文摘要

Large Language Models (LLMs) have shown impressive moral reasoning abilities. Yet they often diverge when confronted with complex, multi-factor moral dilemmas. To address these discrepancies, we propose a framework that synthesizes multiple LLMs' moral judgments into a collectively formulated moral judgment, realigning models that deviate significantly from this consensus. Our aggregation mechanism fuses continuous moral acceptability scores (beyond binary labels) into a collective probability, weighting contributions by model reliability. For misaligned models, a targeted embedding-optimization procedure fine-tunes token embeddings for moral philosophical theories, minimizing JS divergence to the consensus while preserving semantic integrity. Experiments on a large-scale social moral dilemma dataset show our approach builds robust consensus and improves individual model fidelity. These findings highlight the value of data-driven moral alignment across multiple models and its potential for safer, more consistent AI systems.

2506.12014 2026-02-09 cs.CL cs.AI cs.LG cs.SE

code_transformed: The Influence of Large Language Models on Code

Yuliang Xu, Siming Huang, Mingmeng Geng, Yao Wan, Xuanhua Shi, Dongping Chen

Comments EACL 2026 Findings

详情
英文摘要

Coding remains one of the most fundamental modes of interaction between humans and machines. With the rapid advancement of Large Language Models (LLMs), code generation capabilities have begun to significantly reshape programming practices. This development prompts a central question: Have LLMs transformed code style, and how can such transformation be characterized? In this paper, we present a pioneering study that investigates the impact of LLMs on code style, with a focus on naming conventions, complexity, maintainability, and similarity. By analyzing code from over 20,000 GitHub repositories linked to arXiv papers published between 2020 and 2025, we identify measurable trends in the evolution of coding style that align with characteristics of LLM-generated code. For instance, the proportion of snake_case function names in Python code increased from 40.7% in Q1 2023 to 49.8% in Q3 2025. Furthermore, we investigate how LLMs approach algorithmic problems by examining their reasoning processes. Our experimental results may provide the first large-scale empirical evidence that LLMs affect real-world programming style. We release all the experimental dataset and source code at: https://github.com/ignorancex/LLM_code

2506.11924 2026-02-09 cs.CV

Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation

Min-Seop Kwak, Junho Kim, Sangdoo Yun, Dongyoon Han, Taekyung Kim, Seungryong Kim, Jin-Hwa Kim

Comments Project page at https://cvlab-kaist.github.io/MoAI

详情
英文摘要

We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology. Unlike prior methods that require dense posed images or pose-embedded generative models limited to in-domain views, our method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images, and formulates novel-view synthesis as an inpainting task for both image and geometry. To ensure accurate alignment between generated images and geometry, we propose cross-modal attention distillation, where attention maps from the image diffusion branch are injected into a parallel geometry diffusion branch during both training and inference. This multi-task approach achieves synergistic effects, facilitating geometrically robust image synthesis as well as well-defined geometry prediction. We further introduce proximity-based mesh conditioning to integrate depth and normal cues, interpolating between point cloud and filtering erroneously predicted geometry from influencing the generation process. Empirically, our method achieves high-fidelity extrapolative view synthesis on both image and geometry across a range of unseen scenes, delivers competitive reconstruction quality under interpolation settings, and produces geometrically aligned colored point clouds for comprehensive 3D completion. Project page is available at https://cvlab-kaist.github.io/MoAI.

2506.07822 2026-02-09 cs.LG cs.AI

Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation

Xintong Duan, Yutong He, Fahim Tajwar, Ruslan Salakhutdinov, J. Zico Kolter, Jeff Schneider

详情
英文摘要

Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While consistency models offer a potential solution, existing applications to decision-making either struggle with suboptimal demonstrations under behavior cloning or rely on complex concurrent training of multiple networks under the actor-critic framework. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method achieves single-step sampling while generating higher-reward action trajectories through decoupled training and noise-free reward signals. Empirical evaluations on the Gym MuJoCo, FrankaKitchen, and long horizon planning benchmarks demonstrate that our approach can achieve a 9.7% improvement over previous state-of-the-art while offering up to 142x speedup over diffusion counterparts in inference time.

2506.06522 2026-02-09 cs.CL cs.AI

Fixing It in Post: A Comparative Study of LLM Post-Training Data Quality and Model Performance

Aladin Djuhera, Swanand Ravindra Kadhe, Syed Zawad, Farhan Ahmed, Heiko Ludwig, Holger Boche

Journal ref The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

详情
英文摘要

Recent work on large language models (LLMs) has increasingly focused on post-training and alignment with datasets curated to enhance instruction following, world knowledge, and specialized skills. However, most post-training datasets used in leading open- and closed-source LLMs remain inaccessible to the public, with limited information about their construction process. This lack of transparency has motivated the recent development of open-source post-training corpora. While training on these open alternatives can yield performance comparable to that of leading models, systematic comparisons remain challenging due to the significant computational cost of conducting them rigorously at scale, and are therefore largely absent. As a result, it remains unclear how specific samples, task types, or curation strategies influence downstream performance when assessing data quality. In this work, we conduct the first comprehensive side-by-side analysis of two prominent open post-training datasets: Tulu-3-SFT-Mix and SmolTalk. Using the Magpie framework, we annotate each sample with detailed quality metrics, including turn structure (single-turn vs. multi-turn), task category, input quality, and response quality, and we derive statistics that reveal structural and qualitative similarities and differences between the two datasets. Based on these insights, we design a principled curation recipe that produces a new data mixture, TuluTalk, which contains 14% fewer samples than either source dataset while matching or exceeding their performance on key benchmarks. Our findings offer actionable insights for constructing more effective post-training datasets that improve model performance within practical resource limits. To support future research, we publicly release both the annotated source datasets and our curated TuluTalk mixture.

2506.01299 2026-02-09 cs.AI cs.LG

Scalable In-Context Q-Learning

Jinmei Liu, Fuhong Liu, Zhenhong Sun, Jianye Hao, Huaxiong Li, Bo Wang, Daoyi Dong, Chunlin Chen, Zhi Wang

Comments accepted by ICLR 2026

详情
英文摘要

Recent advancements in language models have demonstrated remarkable in-context learning abilities, prompting the exploration of in-context reinforcement learning (ICRL) to extend the promise to decision domains. Due to involving more complex dynamics and temporal correlations, existing ICRL approaches may face challenges in learning from suboptimal trajectories and achieving precise in-context inference. In the paper, we propose \textbf{S}calable \textbf{I}n-\textbf{C}ontext \textbf{Q}-\textbf{L}earning (\textbf{S-ICQL}), an innovative framework that harnesses dynamic programming and world modeling to steer ICRL toward efficient reward maximization and task generalization, while retaining the scalability and stability of supervised pretraining. We design a prompt-based multi-head transformer architecture that simultaneously predicts optimal policies and in-context value functions using separate heads. We pretrain a generalized world model to capture task-relevant information, enabling the construction of a compact prompt that facilitates fast and precise in-context inference. During training, we perform iterative policy improvement by fitting a state value function to an upper-expectile of the Q-function, and distill the in-context value functions into policy extraction using advantage-weighted regression. Extensive experiments across a range of discrete and continuous environments show consistent performance gains over various types of baselines, especially when learning from suboptimal data. Our code is available at \textcolor{magenta}{\href{https://github.com/NJU-RL/SICQL}{https://github.com/NJU-RL/SICQL}}.

2506.00956 2026-02-09 cs.CV

Continual-MEGA: A Large-scale Benchmark for Generalizable Continual Anomaly Detection

Geonu Lee, Yujeong Oh, Geonhui Jang, Soyoung Lee, Jeonghyo Song, Sungmin Cha, YoungJoon Yoo

详情
英文摘要

In this paper, we introduce a new benchmark for continual learning in anomaly detection, aimed at better reflecting real-world deployment scenarios. Our benchmark, Continual-MEGA, includes a large and diverse dataset that significantly expands existing evaluation settings by combining carefully curated existing datasets with our newly proposed dataset, ContinualAD. In addition to standard continual learning with expanded quantity, we propose a novel scenario that measures zero-shot generalization to unseen classes, those not observed during continual adaptation. This setting poses a new problem setting that continual adaptation also enhances zero-shot performance. We also present a unified baseline algorithm that improves robustness in few-shot detection and maintains strong generalization. Through extensive evaluations, we report three key findings: (1) existing methods show substantial room for improvement, particularly in pixel-level defect localization; (2) our proposed method consistently outperforms prior approaches; and (3) the newly introduced ContinualAD dataset enhances the performance of strong anomaly detection models. We release the benchmark and code in https://github.com/Continual-Mega/Continual-Mega.

2505.23506 2026-02-09 cs.LG stat.ML

Position: Epistemic uncertainty estimation methods are fundamentally incomplete

Sebastián Jiménez, Mira Jürgens, Willem Waegeman

详情
英文摘要

Identifying and disentangling sources of predictive uncertainty is essential for trustworthy supervised learning. We argue that widely used second-order methods that disentangle aleatoric and epistemic uncertainty are fundamentally incomplete. First, we show that unaccounted bias contaminates uncertainty estimates by overestimating aleatoric (data-related) uncertainty and underestimating the epistemic (model-related) counterpart, leading to incorrect uncertainty quantification. Second, we demonstrate that existing methods capture only partial contributions to the variance-driven part of epistemic uncertainty; different approaches account for different variance sources, yielding estimates that are incomplete and difficult to interpret. Together, these results highlight that current epistemic uncertainty estimates can only be used in safety-critical and high-stakes decision-making when limitations are fully understood by end users and acknowledged by AI developers.

2505.23071 2026-02-09 cs.LG

Rethinking Multi-Modal Learning from Gradient Uncertainty

Peizheng Guo, Jingyao Wang, Wenwen Qiang, Jiahuan Zhou, Changwen Zheng, Gang Hua

详情
英文摘要

Multi-Modal Learning (MML) integrates information from diverse modalities to improve predictive accuracy. While existing optimization strategies have made significant strides by mitigating gradient direction conflicts, we revisit MML from a gradient-based perspective to explore further improvements. Empirically, we observe an interesting phenomenon: performance fluctuations can persist in both conflict and non-conflict settings. Based on this, we argue that: beyond gradient direction, the intrinsic reliability of gradients acts as a decisive factor in optimization, necessitating the explicit modeling of gradient uncertainty. Guided by this insight, we propose Bayesian-Oriented Gradient Calibration for MML (BOGC-MML). Our approach explicitly models gradients as probability distributions to capture uncertainty, interpreting their precision as evidence within the framework of subjective logic and evidence theory. By subsequently aggregating these signals using a reduced Dempster's combination rule, BOGC-MML adaptively weights gradients based on their reliability to generate a calibrated update. Extensive experiments demonstrate the effectiveness and advantages of the proposed method.

2505.18759 2026-02-09 cs.AI cs.LG

The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation

Ruichen Zhang, Rana Muhammad Shahroz Khan, Zhen Tan, Dawei Li, Song Wang, Tianlong Chen

详情
英文摘要

Data-centric distillation, including data augmentation, selection, and mixing, offers a promising path to creating smaller, more efficient student Large Language Models (LLMs) that retain strong reasoning abilities. However, there still lacks a comprehensive benchmark to systematically assess the effect of each distillation approach. This paper introduces DC-CoT, the first data-centric benchmark that investigates data manipulation in chain-of-thought (CoT) distillation from method, model and data perspectives. Utilizing various teacher models (e.g., o4-mini, Gemini-Pro, Claude-3.5) and student architectures (e.g., 3B, 7B parameters), we rigorously evaluate the impact of these data manipulations on student model performance across multiple reasoning datasets, with a focus on in-distribution (IID) and out-of-distribution (OOD) generalization, and cross-domain transfer. Our findings aim to provide actionable insights and establish best practices for optimizing CoT distillation through data-centric techniques, ultimately facilitating the development of more accessible and capable reasoning models. The codebase can be accessed at https://github.com/UNITES-Lab/Distillation-Bench

2505.15592 2026-02-09 cs.CV

VP Lab: a PEFT-Enabled Visual Prompting Laboratory for Semantic Segmentation

Niccolo Avogaro, Thomas Frick, Yagmur G. Cinar, Daniel Caraballo, Cezary Skura, Filip M. Janicki, Piotr Kluska, Brown Ebouky, Nicola Farronato, Florian Scheidegger, Cristiano Malossi, Konrad Schindler, Andrea Bartezzaghi, Roy Assaf, Mattia Rigotti

Journal ref IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

详情
英文摘要

Large-scale pretrained vision backbones have transformed computer vision by providing powerful feature extractors that enable various downstream tasks, including training-free approaches like visual prompting for semantic segmentation. Despite their success in generic scenarios, these models often fall short when applied to specialized technical domains where the visual features differ significantly from their training distribution. To bridge this gap, we introduce VP Lab, a comprehensive iterative framework that enhances visual prompting for robust segmentation model development. At the core of VP Lab lies E-PEFT, a novel ensemble of parameter-efficient fine-tuning techniques specifically designed to adapt our visual prompting pipeline to specific domains in a manner that is both parameter- and data-efficient. Our approach not only surpasses the state-of-the-art in parameter-efficient fine-tuning for the Segment Anything Model (SAM), but also facilitates an interactive, near-real-time loop, allowing users to observe progressively improving results as they experiment within the framework. By integrating E-PEFT with visual prompting, we demonstrate a remarkable 50\% increase in semantic segmentation mIoU performance across various technical datasets using only 5 validated images, establishing a new paradigm for fast, efficient, and interactive model deployment in new, challenging domains. This work comes in the form of a demonstration.

2505.11781 2026-02-09 cs.LG

Multi-Order Wavelet Derivative Transform for Deep Time Series Forecasting

Ziyu Zhou, Jiaxi Hu, Qingsong Wen, James T. Kwok, Yuxuan Liang

Comments Preprint

详情
英文摘要

In deep time series forecasting, the Fourier Transform (FT) is extensively employed for frequency representation learning. However, it often struggles in capturing multi-scale, time-sensitive patterns. Although the Wavelet Transform (WT) can capture these patterns through frequency decomposition, its coefficients are insensitive to change points in time series, leading to suboptimal modeling. To mitigate these limitations, we introduce the multi-order Wavelet Derivative Transform (WDT) grounded in the WT, enabling the extraction of time-aware patterns spanning both the overall trend and subtle fluctuations. Compared with the standard FT and WT, which model the raw series, the WDT operates on the derivative of the series, selectively magnifying rate-of-change cues and exposing abrupt regime shifts that are particularly informative for time series modeling. Practically, we embed the WDT into a multi-branch framework named WaveTS, which decomposes the input series into multi-scale time-frequency coefficients, refines them via linear layers, and reconstructs them into the time domain via the inverse WDT. Extensive experiments on ten benchmark datasets demonstrate that WaveTS achieves state-of-the-art forecasting accuracy while retaining high computational efficiency.

2505.05017 2026-02-09 cs.CL

Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization

Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Jiang Zong, Hao Peng, Jianwei Yin

Comments 17 pages, 4 figures; accepted by IJCAI 2025

Journal ref Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence Main Track (2025) 8022-8030

详情
英文摘要

Pre-trained large language models (LLMs) are commonly fine-tuned to adapt to downstream tasks. Since the majority of knowledge is acquired during pre-training, attributing the predictions of fine-tuned LLMs to their pre-training data may provide valuable insights. Influence functions have been proposed as a means to explain model predictions based on training data. However, existing approaches fail to compute ``multi-stage'' influence and lack scalability to billion-scale LLMs. In this paper, we propose the multi-stage influence function to attribute the downstream predictions of fine-tuned LLMs to pre-training data under the full-parameter fine-tuning paradigm. To enhance the efficiency and practicality of our multi-stage influence function, we leverage Eigenvalue-corrected Kronecker-Factored (EK-FAC) parameterization for efficient approximation. Empirical results validate the superior scalability of EK-FAC approximation and the effectiveness of our multi-stage influence function. Additionally, case studies on a real-world LLM, dolly-v2-3b, demonstrate its interpretive power, with exemplars illustrating insights provided by multi-stage influence estimates. Our code is public at https://github.com/colored-dye/multi_stage_influence_function.

2504.15243 2026-02-09 cs.LG stat.ML

Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints

Ming Yang, Gang Li, Quanqi Hu, Qihang Lin, Tianbao Yang

详情
英文摘要

Constrained optimization with multiple functional inequality constraints has significant applications in machine learning. This paper examines a crucial subset of such problems where both the objective and constraint functions are weakly convex. Existing methods often face limitations, including slow convergence rates or reliance on double-loop algorithmic designs. To overcome these challenges, we introduce a novel single-loop penalty-based stochastic algorithm. Following the classical exact penalty method, our approach employs a {\bf hinge-based penalty}, which permits the use of a constant penalty parameter, enabling us to achieve a {\bf state-of-the-art complexity} for finding an approximate Karush-Kuhn-Tucker (KKT) solution. We further extend our algorithm to address finite-sum coupled compositional objectives, which are prevalent in artificial intelligence applications, establishing improved complexity over existing approaches. Finally, we validate our method through experiments on fair learning with receiver operating characteristic (ROC) fairness constraints and continual learning with non-forgetting constraints.

2504.13707 2026-02-09 cs.AI cs.CL

OpenDeception: Learning Deception and Trust in Human-AI Interaction via Multi-Agent Simulation

Yichen Wu, Qianqian Gao, Xudong Pan, Geng Hong, Min Yang

详情
英文摘要

As large language models (LLMs) are increasingly deployed as interactive agents, open-ended human-AI interactions can involve deceptive behaviors with serious real-world consequences, yet existing evaluations remain largely scenario-specific and model-centric. We introduce OpenDeception, a lightweight framework for jointly evaluating deception risk from both sides of human-AI dialogue. It consists of a scenario benchmark with 50 real-world deception cases, an IntentNet that infers deceptive intent from agent reasoning, and a TrustNet that estimates user susceptibility. To address data scarcity, we synthesize high-risk dialogues via LLM-based role-and-goal simulation, and train the User Trust Scorer using contrastive learning on controlled response pairs, avoiding unreliable scalar labels. Experiments on 11 LLMs and three large reasoning models show that over 90% of goal-driven interactions in most models exhibit deceptive intent, with stronger models displaying higher risk. A real-world case study adapted from a documented AI-induced suicide incident further demonstrates that our joint evaluation can proactively trigger warnings before critical trust thresholds are reached.

2504.11770 2026-02-09 cs.CL

Unsupervised Classification of English Words Based on Phonological Information: Discovery of Germanic and Latinate Clusters

Takashi Morita, Timothy J. O'Donnell

详情
英文摘要

Cross-linguistically, native words and loanwords follow different phonological rules. In English, for example, words of Germanic and Latinate origin exhibit different stress patterns, and a certain syntactic structure, double-object datives, is predominantly associated with Germanic verbs rather than Latinate verbs. From the perspective of language acquisition, however, such etymology-based generalizations raise learnability concerns, since the historical origins of words are presumably inaccessible information for general language learners. In this study, we present computational evidence indicating that the Germanic-Latinate distinction in the English lexicon is learnable from the phonotactic information of individual words. Specifically, we performed an unsupervised clustering on corpus-extracted words, and the resulting word clusters largely aligned with the etymological distinction. The model-discovered clusters also recovered various linguistic generalizations documented in the previous literature regarding the corresponding etymological classes. Moreover, our model also uncovered previously unrecognized features of the quasi-etymological clusters. Taken together with prior results from Japanese, our findings indicate that the proposed method provides a general, cross-linguistic approach to discovering etymological structure from phonotactic cues in the lexicon.

2504.10852 2026-02-09 cs.CV

Enhancing Features in Long-tailed Data Using Large Vision Model

Pengxiao Han, Changkun Ye, Jinguang Tong, Cuicui Jiang, Jie Hong, Li Fang, Xuesong Li

详情
英文摘要

Language-based foundation models, such as large language models (LLMs) or large vision-language models (LVLMs), have been widely studied in long-tailed recognition. However, the need for linguistic data is not applicable to all practical tasks. In this study, we aim to explore using large vision models (LVMs) or visual foundation models (VFMs) to enhance long-tailed data features without any language information. Specifically, we extract features from the LVM and fuse them with features in the baseline network's map and latent space to obtain the augmented features. Moreover, we design several prototype-based losses in the latent space to further exploit the potential of the augmented features. In the experimental section, we validate our approach on two benchmark datasets: ImageNet-LT and iNaturalist2018.

2504.08202 2026-02-09 cs.CL

Harnessing the Unseen: The Hidden Influence of Intrinsic Knowledge in Long-Context Language Models

Yu Fu, Haz Sameen Shahgir, Hui Liu, Xianfeng Tang, Qi He, Yue Dong

Comments 17 pages,11figures (accepted to AAAI 2026)

详情
英文摘要

Recent advances in long-context language models (LCLMs), designed to handle extremely long contexts, primarily focus on utilizing external contextual information, often leaving the influence of language models' parametric knowledge underexplored. In this work, we firstly investigate how this parametric knowledge affects content generation and demonstrate that its impact becomes increasingly pronounced as context length extends. Furthermore, we show that the model's ability to utilize parametric knowledge, which we call parametric recall ability, does not improve simultaneously with its ability to leverage contextual knowledge through extrinsic retrieval ability. Moreover, better extrinsic retrieval ability can interfere with the model's parametric recall ability, limiting its full potential. To bridge this gap, we design a simple yet effective Hybrid Needle-in-a-Haystack test that evaluates models based on their capabilities across both abilities, rather than solely emphasizing extrinsic retrieval ability. Our experimental results reveal that Qwen-2.5 models significantly outperform Llama-3.1 models, demonstrating superior potential to combine various abilities. Moreover, even the more powerful Llama-3.1-70B-Instruct model fails to exhibit better performance, highlighting the importance of evaluating models from a dual-ability perspective.

2504.02275 2026-02-09 cs.LG

Enhancing Customer Contact Efficiency with Graph Neural Networks in Credit Card Fraud Detection Workflow

Menghao Huo, Kuan Lu, Qiang Zhu, Zhenrui Chen

Comments Published in Proceedings of the 2025 IEEE 7th International Conference on Communications, Information System and Computer Engineering (CISCE), pp. 320-324. DOI: 10.1109/CISCE65916.2025.11065245

Journal ref Proceedings of 2025 IEEE 7th International Conference on Communications, Information System and Computer Engineering (CISCE), pp. 320-324

详情
英文摘要

Credit card fraud has been a persistent issue since the last century, causing significant financial losses to the industry. The most effective way to prevent fraud is by contacting customers to verify suspicious transactions. However, while these systems are designed to detect fraudulent activity, they often mistakenly flag legitimate transactions, leading to unnecessary declines that disrupt the user experience and erode customer trust. Frequent false positives can frustrate customers, resulting in dissatisfaction, increased complaints, and a diminished sense of security. To address these limitations, we propose a fraud detection framework incorporating Relational Graph Convolutional Networks (RGCN) to enhance the accuracy and efficiency of identifying fraudulent transactions. By leveraging the relational structure of transaction data, our model reduces the need for direct customer confirmation while maintaining high detection performance. Our experiments are conducted using the IBM credit card transaction dataset to evaluate the effectiveness of this approach.

2503.19647 2026-02-09 cs.CV cs.AI

Show or Tell? Effectively prompting Vision-Language Models for semantic segmentation

Niccolo Avogaro, Thomas Frick, Mattia Rigotti, Andrea Bartezzaghi, Filip Janicki, Cristiano Malossi, Konrad Schindler, Roy Assaf

Journal ref Transactions on Machine Learning Research, 2025

详情
英文摘要

Large Vision-Language Models (VLMs) are increasingly being regarded as foundation models that can be instructed to solve diverse tasks by prompting, without task-specific training. We examine the seemingly obvious question: how to effectively prompt VLMs for semantic segmentation. To that end, we systematically evaluate the segmentation performance of several recent models guided by either text or visual prompts on the out-of-distribution MESS dataset collection. We introduce a scalable prompting scheme, few-shot prompted semantic segmentation, inspired by open-vocabulary segmentation and few-shot learning. It turns out that VLMs lag far behind specialist models trained for a specific segmentation task, by about 30% on average on the Intersection-over-Union metric. Moreover, we find that text prompts and visual prompts are complementary: each one of the two modes fails on many examples that the other one can solve. Our analysis suggests that being able to anticipate the most effective prompt modality can lead to a 11% improvement in performance. Motivated by our findings, we propose PromptMatcher, a remarkably simple training-free baseline that combines both text and visual prompts, achieving state-of-the-art results outperforming the best text-prompted VLM by 2.5%, and the top visual-prompted VLM by 3.5% on few-shot prompted semantic segmentation.

2503.18087 2026-02-09 cs.LG cs.NA math.NA

HyperNOs: Automated and Parallel Library for Neural Operators Research

Massimiliano Ghiotto

Comments 25 pages, 11 figures

Journal ref Bollettino dell Unione Matematica Italiana 2025

详情
英文摘要

This paper introduces HyperNOs, a PyTorch library designed to streamline and automate the process of exploring neural operators, with a special focus on hyperparameter optimization for comprehensive and exhaustive exploration. Indeed, HyperNOs takes advantage of state-of-the-art optimization algorithms and parallel computing implemented in the Ray-tune library to efficiently explore the hyperparameter space of neural operators. We also implement many useful functionalities for studying neural operators with a user-friendly interface, such as the possibility to train the model with a fixed number of parameters or to train the model with multiple datasets and different resolutions. We integrate Fourier neural operators and convolutional neural operators in our library, achieving state of the art results on many representative benchmarks, demonstrating the capabilities of HyperNOs to handle real datasets and modern architectures. The library is designed to be easy to use with the provided model and datasets, but also to be easily extended to use new datasets and custom neural operator architectures.