arXivDaily: arXiv daily academic digest, updated Monday through Friday
2601.23285 2026-02-02 cs.RO cs.AI cs.HC cs.LG

End-to-end Optimization of Belief and Policy Learning in Shared Autonomy Paradigms

MH Farhadi, Ali Rabiee, Sima Ghafoori, Anna Cetera, Andrew Fisher, Reza Abiri

Abstract

Shared autonomy systems require principled methods for inferring user intent and determining appropriate assistance levels. This is a central challenge in human-robot interaction, where systems must be successful while being mindful of user agency. Previous approaches relied on static blending ratios or separated goal inference from assistance arbitration, leading to suboptimal performance in unstructured environments. We introduce BRACE (Bayesian Reinforcement Assistance with Context Encoding), a novel framework that fine-tunes Bayesian intent inference and context-adaptive assistance through an architecture enabling end-to-end gradient flow between intent inference and assistance arbitration. Our pipeline conditions collaborative control policies on environmental context and complete goal probability distributions. We provide analysis showing (1) optimal assistance levels should decrease with goal uncertainty and increase with environmental constraint severity, and (2) integrating belief information into policy learning yields a quadratic expected regret advantage over sequential approaches. We validated our algorithm against SOTA methods (IDA, DQN) using a three-part evaluation progressively isolating distinct challenges of end-effector control: (1) core human-interaction dynamics in a 2D human-in-the-loop cursor task, (2) non-linear dynamics of a robotic arm, and (3) integrated manipulation under goal ambiguity and environmental constraints. We demonstrate improvements over SOTA, achieving 6.3% higher success rates and 41% increased path efficiency, and 36.3% success rate and 87% path efficiency improvement over unassisted control. Our results confirmed that integrated optimization is most beneficial in complex, goal-ambiguous scenarios, and is generalizable across robotic domains requiring goal-directed assistance, advancing the SOTA for adaptive shared autonomy.
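
As a rough illustration of the Bayesian intent inference step mentioned in this abstract, the following minimal sketch updates a belief over candidate goals from one 2D user input and reports the belief entropy as a measure of goal uncertainty. It is not the BRACE implementation: the function names, the cosine-alignment likelihood, and the `beta` parameter are all assumptions chosen for illustration.

```python
import math


def update_belief(belief, cursor, user_input, goals, beta=5.0):
    """One Bayesian update of the goal belief from a 2D user input.

    Assumed likelihood: p(input | goal) is proportional to
    exp(beta * cos(angle)), where angle is measured between the user
    input and the direction from the cursor to the goal.
    """
    norm_u = math.hypot(user_input[0], user_input[1])
    unnormalized = []
    for prior, goal in zip(belief, goals):
        dx, dy = goal[0] - cursor[0], goal[1] - cursor[1]
        norm_g = math.hypot(dx, dy)
        if norm_g == 0.0 or norm_u == 0.0:
            cos = 0.0  # no directional evidence available
        else:
            cos = (dx * user_input[0] + dy * user_input[1]) / (norm_g * norm_u)
        unnormalized.append(prior * math.exp(beta * cos))
    total = sum(unnormalized)
    return [p / total for p in unnormalized]


def belief_entropy(belief):
    """Shannon entropy (in nats) of the goal belief."""
    return -sum(p * math.log(p) for p in belief if p > 0)
```

A policy conditioned on the full belief vector, rather than only on the most likely goal, could then lower its assistance level when `belief_entropy` is high, in line with the abstract's first analytical claim.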

2601.23281 2026-02-02 cs.CV

User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments

Junfeng Lin, Yanming Xiu, Maria Gorlatova

Comments: Accepted by IEEE VR 2026: GenAI-XR workshop

Abstract

Open-set object detection (OSOD) localizes objects while identifying and rejecting unknown classes at inference. While recent OSOD models perform well on benchmarks, their behavior under realistic user prompting remains underexplored. In interactive XR settings, user-generated prompts are often ambiguous, underspecified, or overly detailed. To study prompt-conditioned robustness, we evaluate two OSOD models, GroundingDINO and YOLO-E, on real-world XR images and simulate diverse user prompting behaviors using vision-language models. We consider four prompt types: standard, underdetailed, overdetailed, and pragmatically ambiguous, and examine the impact of two enhancement strategies on these prompts. Results show that both models exhibit stable performance under underdetailed and standard prompts, while they suffer degradation under ambiguous prompts. Overdetailed prompts primarily affect GroundingDINO. Prompt enhancement substantially improves robustness under ambiguity, yielding gains exceeding 55% mIoU and 41% average confidence. Based on the findings, we propose several prompting strategies and prompt enhancement methods for OSOD models in XR environments.

2601.23269 2026-02-02 math.NA cs.NA

Rank Reduction AutoEncoders for Mechanical Design: Advancing Novel and Efficient Data-Driven Topology Optimization

Ismael Ben-Yelun, Mohammed El Fallaki Idrissi, Jad Mounayer, Sebastian Rodriguez, Francisco Chinesta

Abstract

This work presents a data-driven framework for fast forward and inverse analysis in topology optimization (TO) by combining Rank Reduction Autoencoders (RRAEs) with neural latent-space mappings. The methodology targets the efficient approximation of the relationship between optimized geometries and their corresponding mechanical responses or Quantity of Interest (QoI), with a particular focus on compliance-minimized linear elastic structures. High-dimensional TO results are first compressed using RRAEs, which encode the data into a low-rank approximation via Singular Value Decomposition (SVD), thereby retaining the most important features that approximate the data. Separate RRAE models are trained for geometry and for different types of QoIs, including scalar metrics, one-dimensional stress fields, and full two-dimensional von Mises stress distributions. The resulting low-dimensional latent coefficients are then related through multilayer perceptrons to address both direct problems -- predicting structural responses from geometry -- and inverse problems -- recovering geometries from prescribed performance targets. The proposed approach is demonstrated on a benchmark TO problem based on a half MBB beam, using datasets generated via density-based Solid Isotropic Material with Penalization (SIMP) optimization. Numerical results show that the framework enables accurate and computationally efficient surrogate models, with increasing robustness and fidelity as richer QoIs are considered. The methodology also provides a foundation for generative mechanical design by enabling the synthesis of new geometries and responses through latent-space exploration.
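
The low-rank compression via SVD mentioned in this abstract can be illustrated with a plain truncated SVD. This is only a sketch under simplifying assumptions: it factorizes a synthetic data matrix directly instead of an autoencoder's latent matrix, and the names `truncated_svd`, `basis`, and `coeffs` are hypothetical.

```python
import numpy as np


def truncated_svd(data, rank):
    """Return a rank-`rank` basis and per-sample coefficients for `data`.

    `data` has shape (n_features, n_samples); the best rank-`rank`
    approximation in the Frobenius norm is `basis @ coeffs`.
    """
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    basis = u[:, :rank]                   # dominant left singular vectors
    coeffs = s[:rank, None] * vt[:rank]   # low-dimensional coefficients
    return basis, coeffs


# Synthetic example: 20 samples of 50 features that lie in a 3D subspace.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 20))
basis, coeffs = truncated_svd(data, rank=3)
reconstruction = basis @ coeffs
```

In a surrogate-modeling setting such as the one described, a small regressor would then map the coefficient columns of one quantity (for example geometry) to those of another (for example a stress field).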

2601.23266 2026-02-02 cs.RO cs.AI

IRL-DAL: Safe and Adaptive Trajectory Planning for Autonomous Driving via Energy-Guided Diffusion Models

Seyed Ahmad Hosseini Miangoleh, Amin Jalal Aghdasian, Farzaneh Abdollahi

Abstract

This paper proposes a novel inverse reinforcement learning framework using a diffusion-based adaptive lookahead planner (IRL-DAL) for autonomous vehicles. Training begins with imitation from an expert finite state machine (FSM) controller to provide a stable initialization. Environment terms are combined with an IRL discriminator signal to align with expert goals. Reinforcement learning (RL) is then performed with a hybrid reward that combines diffuse environmental feedback and targeted IRL rewards. A conditional diffusion model, which acts as a safety supervisor, plans safe paths that keep the vehicle in its lane, avoid obstacles, and remain smooth. Then, a learnable adaptive mask (LAM) improves perception by shifting visual attention based on vehicle speed and nearby hazards. After FSM-based imitation, the policy is fine-tuned with Proximal Policy Optimization (PPO). Training is run in the Webots simulator with a two-stage curriculum. A 96% success rate is reached, and collisions are reduced to 0.05 per 1k steps, marking a new benchmark for safe navigation. By applying the proposed approach, the agent not only drives in lane but also handles unsafe conditions at an expert level, increasing robustness. We make our code publicly available.

2601.23255 2026-02-02 cs.CL cs.AI cs.CR

Now You Hear Me: Audio Narrative Attacks Against Large Audio-Language Models

Ye Yu, Haibo Jin, Yaoning Yu, Jun Zhuang, Haohan Wang

Comments: To be published at EACL 2026 main conference

Abstract

Large audio-language models increasingly operate on raw speech inputs, enabling more seamless integration across domains such as voice assistants, education, and clinical triage. This transition, however, introduces a distinct class of vulnerabilities that remain largely uncharacterized. We examine the security implications of this modality shift by designing a text-to-audio jailbreak that embeds disallowed directives within a narrative-style audio stream. The attack leverages an advanced instruction-following text-to-speech (TTS) model to exploit structural and acoustic properties, thereby circumventing safety mechanisms primarily calibrated for text. When delivered through synthetic speech, the narrative format elicits restricted outputs from state-of-the-art models, including Gemini 2.0 Flash, achieving a 98.26% success rate that substantially exceeds text-only baselines. These results highlight the need for safety frameworks that jointly reason over linguistic and paralinguistic representations, particularly as speech-based interfaces become more prevalent.

2601.23253 2026-02-02 cs.CV cs.LG

Training-Free Test-Time Adaptation with Brownian Distance Covariance in Vision-Language Models

Yi Zhang, Chun-Wun Cheng, Angelica I. Aviles-Rivero, Zhihai He, Liang-Jie Zhang

Comments: Accepted in ICASSP 2026

Abstract

Vision-language models suffer performance degradation under domain shift, limiting real-world applicability. Existing test-time adaptation methods are computationally intensive, rely on back-propagation, and often focus on single modalities. To address these issues, we propose Training-free Test-Time Adaptation with Brownian Distance Covariance (TaTa). TaTa leverages Brownian Distance Covariance, a powerful statistical measure that captures both linear and nonlinear dependencies via pairwise distances, to dynamically adapt VLMs to new domains without training or back-propagation. This not only improves efficiency but also enhances stability by avoiding disruptive weight updates. TaTa further integrates attribute-enhanced prompting to improve vision-language inference with descriptive visual cues. Combined with dynamic clustering and pseudo-label refinement, it effectively recalibrates the model for novel visual contexts. Experiments across diverse datasets show that TaTa significantly reduces computational cost while achieving state-of-the-art performance in domain and cross-dataset generalization.
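
To make the statistic named in this abstract concrete, the sketch below computes the standard sample (squared) distance covariance from double-centered pairwise distance matrices. It shows only the statistic itself, not how TaTa uses it; the function names are hypothetical.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distances of the samples in `x`, double-centered."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_covariance_sq(x, y):
    """Sample squared distance covariance between paired samples x and y."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    return float((a * b).mean())
```

For example, with `x` on a symmetric grid and `y = x ** 2`, the Pearson correlation vanishes while `distance_covariance_sq(x, y)` stays strictly positive, which is the sense in which the measure captures nonlinear dependence.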

2601.23248 2026-02-02 cs.GT

(Doubly) Exponential Lower Bounds for Follow the Regularized Leader in Potential Games

Ioannis Anagnostides, Ioannis Panageas, Nikolas Patris, Tuomas Sandholm

Abstract

Follow the regularized leader (FTRL) is the premier algorithm for online optimization. However, despite decades of research on its convergence in constrained optimization -- and potential games in particular -- its behavior remained hitherto poorly understood. In this paper, we establish that FTRL can take exponential time to converge to a Nash equilibrium in two-player potential games for any (permutation-invariant) regularizer and potentially vanishing learning rate. By known equivalences, this translates to an exponential lower bound for certain mirror descent counterparts, most notably multiplicative weights update. On the positive side, we establish the potential property for FTRL and obtain an exponential upper bound $\exp(O_\epsilon(1/\epsilon^2))$ for any no-regret dynamics executed in a lazy, alternating fashion, matching our lower bound up to factors in the exponent. Finally, in multi-player potential games, we show that fictitious play -- the extreme version of FTRL -- can take doubly exponential time to reach a Nash equilibrium. This constitutes an exponentially stronger lower bound for the foundational learning algorithm in games.
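
The equivalence between FTRL and multiplicative weights update that this abstract relies on can be checked numerically in the simplest setting. The sketch below assumes a negative-entropy regularizer on the probability simplex, a constant learning rate `eta`, and utility maximization; the function names are hypothetical and the code does not reproduce any construction from the paper.

```python
import math


def ftrl_entropy(cumulative_utility, eta):
    """FTRL iterate on the simplex with a negative-entropy regularizer.

    The maximizer of <x, U> - (1/eta) * sum_i x_i * log(x_i) over the
    simplex is softmax(eta * U).
    """
    shift = max(cumulative_utility)  # shift for numerical stability
    weights = [math.exp(eta * (u - shift)) for u in cumulative_utility]
    total = sum(weights)
    return [w / total for w in weights]


def mwu_step(x, utility, eta):
    """One multiplicative weights update from the current strategy x."""
    weights = [xi * math.exp(eta * ui) for xi, ui in zip(x, utility)]
    total = sum(weights)
    return [w / total for w in weights]
```

Starting from the uniform strategy, applying `mwu_step` once per round produces the same iterates as calling `ftrl_entropy` on the running sum of utilities, so a lower bound for one carries over to the other in this setting.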

2601.23246 2026-02-02 cs.SI

The Iterated Local Model for tournaments

Anthony Bonato, MacKenzie Carr, Ketan Chaudhary, Trent G. Marbach, Teddy Mishura

Abstract

Transitivity is a central, generative principle in social and other complex networks, capturing the tendency for two nodes with a common neighbor to form a direct connection. We propose a new model for highly dense, complex networks based on transitivity, called the Iterated Local Model Tournament (ILMT). In ILMT, we iteratively apply transitivity to form new tournaments by cloning nodes and their adjacencies, and either preserving or reversing the orientation of existing arcs between clones. The resulting model generates tournaments with small diameters and high connectivity as observed in real-world complex networks. We analyze subtournaments or motifs in the ILMT model and their universality properties. For many parameter choices, the model generates sequences of quasirandom tournaments. We also study the graph-theoretic properties of ILMT tournaments, including their cop number, domination number, and chromatic number. We finish with a set of open problems and variants of the ILMT model for oriented graphs.

2601.23240 2026-02-02 cs.DS

Compressed Set Representations based on Set Difference

Travis Gagie, Meng He, Gonzalo Navarro

Comments: To be presented at LATIN 2026

Abstract

We introduce a compressed representation of sets of sets that exploits how much they differ from each other. Our representation supports access, membership, predecessor and successor queries on the sets within logarithmic time. In addition, we give a new MST-based construction algorithm for the representation that outperforms standard ones.
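
One way to picture a difference-based representation with an MST-based construction is the toy sketch below. It is an assumption-laden illustration, not the paper's data structure: it stores one root set explicitly and every other set as its symmetric difference from a parent chosen by Prim's algorithm on the complete graph weighted by symmetric-difference size. It does not achieve the logarithmic query times claimed in the abstract, and the names `build_delta_tree` and `recover` are hypothetical.

```python
def build_delta_tree(sets):
    """Choose a parent for each set via Prim's algorithm and store deltas.

    Edge weight between sets A and B is |A ^ B| (symmetric difference).
    Set 0 is the root and is stored in full.
    """
    n = len(sets)
    parent = [None] * n
    # best[j] = (cheapest known weight to reach j, tree node achieving it)
    best = [(len(sets[0] ^ sets[j]), 0) for j in range(n)]
    in_tree = [False] * n
    in_tree[0] = True
    for _ in range(n - 1):
        i = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j][0])
        in_tree[i] = True
        parent[i] = best[i][1]
        for j in range(n):
            if not in_tree[j]:
                weight = len(sets[i] ^ sets[j])
                if weight < best[j][0]:
                    best[j] = (weight, i)
    deltas = [set(sets[0])] + [sets[i] ^ sets[parent[i]] for i in range(1, n)]
    return parent, deltas


def recover(i, parent, deltas):
    """Rebuild set i by applying deltas along the path from the root."""
    if parent[i] is None:
        return set(deltas[i])
    return recover(parent[i], parent, deltas) ^ deltas[i]
```

When many sets are near-duplicates of each other, the total size of the stored deltas can be much smaller than the total size of the sets.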

2601.23239 2026-02-02 stat.ML cs.IT cs.LG cs.SI math.IT math.ST stat.TH

Graph Attention Network for Node Regression on Random Geometric Graphs with Erdős--Rényi contamination

Somak Laha, Suqi Liu, Morgane Austern

Comments: 62 pages, 2 figures, 2 tables

Abstract

Graph attention networks (GATs) are widely used and often appear robust to noise in node covariates and edges, yet rigorous statistical guarantees demonstrating a provable advantage of GATs over non-attention graph neural networks (GNNs) are scarce. We partially address this gap for node regression with graph-based errors-in-variables models under simultaneous covariate and edge corruption: responses are generated from latent node-level covariates, but only noise-perturbed versions of the latent covariates are observed; and the sample graph is a random geometric graph created from the node covariates but contaminated by independent Erdős--Rényi edges. We propose and analyze a carefully designed, task-specific GAT that constructs denoised proxy features for regression. We prove that regressing the response variables on the proxies achieves lower error asymptotically in (a) estimating the regression coefficient compared to the ordinary least squares (OLS) estimator on the noisy node covariates, and (b) predicting the response for an unlabelled node compared to a vanilla graph convolutional network (GCN) -- under mild growth conditions. Our analysis leverages high-dimensional geometric tail bounds and concentration for neighbourhood counts and sample covariances. We verify our theoretical findings through experiments on synthetically generated data. We also perform experiments on real-world graphs and demonstrate the effectiveness of the attention mechanism in several node regression tasks.

2601.23238 2026-02-02 cs.LG

How well do generative models solve inverse problems? A benchmark study

Patrick Krüger, Patrick Materne, Werner Krebs, Hanno Gottschalk

Comments: 32 pages, 11 figures, 5 tables

Abstract

Generative learning generates high-dimensional data based on low-dimensional conditions, also called prompts. Therefore, generative learning algorithms are eligible for solving (Bayesian) inverse problems. In this article we compare a traditional Bayesian inverse approach based on a forward regression model and a prior sampled with the Markov Chain Monte Carlo method with three state-of-the-art generative learning models, namely conditional Generative Adversarial Networks, Invertible Neural Networks and Conditional Flow Matching. We apply them to a problem of gas turbine combustor design where we map six independent design parameters to three performance labels. We propose several metrics for the evaluation of these inverse design approaches and measure the accuracy of the labels of the generated designs along with the diversity. We also study the performance as a function of the training dataset size. Our benchmark has a clear winner, as Conditional Flow Matching consistently outperforms all competing approaches.

2601.23237 2026-02-02 math.NA cs.MS cs.NA

Applications of QR-based Vector-Valued Rational Approximation

Simon Dirckx

Abstract

Several applications of the QR-AAA algorithm, a greedy scheme for vector-valued rational approximation, are presented. The focus is on demonstrating the flexibility and practical effectiveness of QR-AAA in a variety of computational settings, including Stokes flow computation, multivariate rational approximation, function extension, the development of novel quadrature methods and near-field approximation in the boundary element method.

2601.23233 2026-02-02 cs.LG

Sequence Diffusion Model for Temporal Link Prediction in Continuous-Time Dynamic Graph

Nguyen Minh Duc, Viet Cuong Ta

Abstract

Temporal link prediction in dynamic graphs is a fundamental problem in many real-world systems. Existing temporal graph neural networks mainly focus on learning representations of historical interactions. Despite their strong performance, these models are still purely discriminative, producing point estimates for future links and lacking an explicit mechanism to capture the uncertainty and sequential structure of future temporal interactions. In this paper, we propose SDG, a novel sequence-level diffusion framework that unifies dynamic graph learning with generative denoising. Specifically, SDG injects noise into the entire historical interaction sequence and jointly reconstructs all interaction embeddings through a conditional denoising process, thereby enabling the model to capture more comprehensive interaction distributions. To align the generative process with temporal link prediction, we employ a cross-attention denoising decoder to guide the reconstruction of the destination sequence and optimize the model in an end-to-end manner. Extensive experiments on various temporal graph benchmarks show that SDG consistently achieves state-of-the-art performance in the temporal link prediction task.

2601.23226 2026-02-02 cs.AR cs.ET

Toward Digital Twins in 3D IC Packaging: A Critical Review of Physics, Data, and Hybrid Architectures

Gourab Datta, Sarah Safura Sharif, Yaser Mike Banad

Abstract

Three-dimensional integrated circuit (3D IC) packaging and heterogeneous integration have emerged as central pillars of contemporary semiconductor scaling. Yet, the multi-physics coupling inherent to stacked architectures, manifesting as thermal hot spots, warpage-induced stresses, and interconnect aging, demands monitoring and control capabilities that surpass traditional offline metrology. Although Digital Twin (DT) technology provides a principled route to real-time reliability management, the existing literature remains fragmented and frequently blurs the distinction between static multiphysics simulation workflows and truly dynamic, closed-loop twins. This critical review distinguishes itself by addressing these deficiencies through three specific contributions. First, we clarify the Digital Twin hierarchy to resolve terminological ambiguity between digital models, shadows, and twins. Second, we synthesize three foundational enabling technologies: (1) physics-based modeling, emphasizing the shift from computationally intensive finite-element analysis (FEA) to real-time surrogate models; (2) data-driven paradigms, highlighting virtual metrology (VM) for inferring latent metrics; and (3) in-situ sensing, the nervous system coupling the physical stack to its virtual counterpart. Third, beyond a descriptive survey, we propose a unified hybrid DT architecture that leverages physics-informed machine learning (e.g., PINNs) to reconcile data scarcity with latency constraints. Finally, we outline a standards-aligned roadmap incorporating IEEE 1451 and UCIe protocols to accelerate the transition from passive digital shadows to autonomous, self-optimizing Digital Twins for 3D IC manufacturing and field operation.

2601.23225 2026-02-02 cs.LG cs.AI

Agile Reinforcement Learning through Separable Neural Architecture

Rajib Mostakim, Reza T. Batley, Sourav Saha

Abstract

Deep reinforcement learning (RL) is increasingly deployed in resource-constrained environments, yet the go-to function approximators, multilayer perceptrons (MLPs), are often parameter-inefficient due to an imperfect inductive bias for the smooth structure of many value functions. This mismatch can also hinder sample efficiency and slow policy learning in this capacity-limited regime. Although model compression techniques exist, they operate post-hoc and do not improve learning efficiency. Recent spline-based separable architectures, such as Kolmogorov-Arnold Networks (KANs), have been shown to offer parameter efficiency but are widely reported to exhibit significant computational overhead, especially at scale. In seeking to address these limitations, this work introduces SPAN (SPline-based Adaptive Networks), a novel function approximation approach to RL. SPAN adapts the low-rank KHRONOS framework by integrating a learnable preprocessing layer with a separable tensor product B-spline basis. SPAN is evaluated across discrete (PPO) and high-dimensional continuous (SAC) control tasks, as well as offline settings (Minari/D4RL). Empirical results demonstrate that SPAN achieves a 30-50% improvement in sample efficiency and 1.3-9 times higher success rates across benchmarks compared to MLP baselines. Furthermore, SPAN demonstrates superior anytime performance and robustness to hyperparameter variations, suggesting it as a viable, high-performance alternative for learning intrinsically efficient policies in resource-limited settings.

2601.23222 2026-02-02 cs.CV

Region-Normalized DPO for Medical Image Segmentation under Noisy Judges

Hamza Kalisch, Constantin Seibold, Jens Kleesiek, Ken Herrmann, Frederic Jonske

Abstract

While dense pixel-wise annotations remain the gold standard for medical image segmentation, they are costly to obtain and limit scalability. In contrast, many deployed systems already produce inexpensive automatic quality-control (QC) signals like model agreement, uncertainty measures, or learned mask-quality scores which can be used for further model training without additional ground-truth annotation. However, these signals can be noisy and biased, making preference-based fine-tuning susceptible to harmful updates. We study Direct Preference Optimization (DPO) for segmentation from such noisy judges using proposals generated by a supervised base segmenter trained on a small labeled set. We find that outcomes depend strongly on how preference pairs are mined: selecting the judge's top-ranked proposal can improve peak performance when the judge is reliable, but can amplify harmful errors under weaker judges. We propose Region-Normalized DPO (RN-DPO), a segmentation-aware objective which normalizes preference updates by the size of the disagreement region between masks, reducing the leverage of harmful comparisons and improving optimization stability. Across two medical datasets and multiple regimes, RN-DPO improves sustained performance and stabilizes preference-based fine-tuning, outperforming standard DPO and strong baselines without requiring additional pixel annotations.
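
The idea of normalizing a preference update by the size of the disagreement region can be sketched as follows. This is a guess at the general shape of such an objective, not the paper's RN-DPO loss: it assumes binary masks, a per-pixel factorized foreground probability for both the policy and the reference model, and hypothetical names throughout. Under that factorization, pixels where the two masks agree cancel in the DPO margin, so the margin is a sum over the disagreement region only.

```python
import numpy as np


def mask_log_ratio(prob_fg, ref_prob_fg, mask):
    """Per-pixel log pi(mask) - log pi_ref(mask) for a binary mask."""
    p = np.where(mask == 1, prob_fg, 1.0 - prob_fg)
    q = np.where(mask == 1, ref_prob_fg, 1.0 - ref_prob_fg)
    return np.log(p) - np.log(q)


def dpo_loss(prob_fg, ref_prob_fg, preferred, rejected, beta=0.1,
             region_normalize=False):
    """DPO-style loss for one (preferred, rejected) pair of masks.

    With region_normalize=True, the margin is divided by the number of
    pixels on which the two masks disagree (an assumed normalization).
    """
    margin_map = (mask_log_ratio(prob_fg, ref_prob_fg, preferred)
                  - mask_log_ratio(prob_fg, ref_prob_fg, rejected))
    margin = margin_map.sum()
    if region_normalize:
        disagreement = int((preferred != rejected).sum())
        margin = margin / max(disagreement, 1)
    # -log(sigmoid(z)) written as log(1 + exp(-z))
    return float(np.log1p(np.exp(-beta * margin)))
```

Without normalization, a pair of masks that differ over a large area produces a proportionally larger margin and gradient than a pair that differ over a few pixels; dividing by the region size puts both on a per-pixel scale.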

2601.23215 2026-02-02 cs.LG

Tackling air quality with SAPIENS

Marcella Bona, Nathan Heatley, Jia-Chen Hua, Adriana Lara, Valeria Legaria-Santiago, Alberto Luviano Juarez, Fernando Moreno-Gomez, Jocelyn Richardson, Natan Vilchis, Xiwen Shirley Zheng

Comments: 24 pages, 13 figures

Abstract

Air pollution is a chronic problem in large cities worldwide and awareness is rising as the long-term health implications become clearer. Vehicular traffic has been identified as a major contributor to poor air quality. In many cities the publicly available air quality measurements and forecasts are coarse-grained both in space and time. However, in general, real-time traffic intensity data is openly available in various forms and is fine-grained. In this paper, we present an in-depth study of pollution sensor measurements combined with traffic data from Mexico City. We analyse and model the relationship between traffic intensity and air quality with the aim of providing hyper-local, dynamic air quality forecasts. We developed an innovative method to represent traffic intensities by transforming simple colour-coded traffic maps into concentric ring-based descriptions, enabling improved characterisation of traffic conditions. Using Partial Least Squares Regression, we predict pollution levels based on these newly defined traffic intensities. The model was optimised with various training samples to achieve the best predictive performance and gain insights into the relationship between pollutants and traffic. The workflow we have designed is straightforward and adaptable to other contexts, such as other cities beyond the specifics of our dataset.
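
A concentric ring-based description of a traffic map might look like the sketch below, which averages a per-pixel traffic intensity grid over rings centred on a sensor location. The grid encoding, the ring radii, and the function name are assumptions for illustration; the paper's actual transformation of colour-coded maps is not specified in this abstract.

```python
import numpy as np


def ring_features(intensity, center, radii):
    """Mean traffic intensity in concentric rings around `center`.

    `intensity` is a 2D array of per-pixel traffic intensity, `center`
    is a (row, col) sensor position, and `radii` lists the outer radius
    of each ring in pixels, in increasing order.
    """
    rows, cols = np.indices(intensity.shape)
    dist = np.hypot(rows - center[0], cols - center[1])
    features = []
    inner = 0.0
    for outer in radii:
        ring = (dist >= inner) & (dist < outer)
        features.append(float(intensity[ring].mean()) if ring.any() else 0.0)
        inner = outer
    return features
```

The resulting feature vector, one value per ring, could then serve as the predictor matrix for a regression model such as Partial Least Squares.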

2601.23213 2026-02-02 cs.IT math.IT

A complete characterisation of conditional entropies

Roberto Rubboli, Erkka Haapasalo, Marco Tomamichel

Abstract

Entropies are fundamental measures of uncertainty with central importance in information theory and statistics and applications across all the quantitative sciences. Under a natural set of operational axioms, the most general form of entropy is captured by the family of Rényi entropies, parameterized by a real number $\alpha$. Conditional entropy extends the notion of entropy by quantifying uncertainty from the viewpoint of an observer with access to potentially correlated side information. However, despite their significance and the emergence of various useful definitions, a complete characterization of measures of conditional entropy that satisfy a natural set of operational axioms has remained elusive. In this work, we provide a complete characterization of conditional entropy, defined through a set of axioms that are essential for any operationally meaningful definition: additivity for independent random variables, invariance under relabeling, and monotonicity under conditional mixing channels. We prove that the most general form of conditional entropy is captured by a family of measures that are exponential averages of Rényi entropies of the conditioned distribution and parameterized by a real parameter and a probability measure on the positive reals. Finally, we show that these quantities determine the rate of transformation under conditional mixing and provide a set of second laws of quantum thermodynamics with side information for states diagonal in the energy eigenbasis.
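
For reference, the (unconditional) Rényi entropy of order α is log(Σ p_i^α) / (1 − α), with the Shannon entropy as the α → 1 limit. The short sketch below implements that standard definition and can be used to check the additivity axiom on a product distribution; it does not implement the conditional measures characterized in the paper, and the function name is hypothetical.

```python
import math


def renyi_entropy(p, alpha):
    """Rényi entropy (in nats) of a finite distribution p for order alpha."""
    if alpha == 1:
        # Shannon entropy, the alpha -> 1 limit of the Rényi family.
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1.0 - alpha)
```

For independent variables with distributions `p` and `q`, the entropy of the product distribution `[pi * qj for pi in p for qj in q]` equals `renyi_entropy(p, alpha) + renyi_entropy(q, alpha)` for every order, which is the additivity property listed among the axioms.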

2601.23212 2026-02-02 q-bio.BM cs.AI

Disentangling multispecific antibody function with graph neural networks

Joshua Southern, Changpeng Lu, Santrupti Nerli, Samuel D. Stanton, Andrew M. Watkins, Franziska Seeger, Frédéric A. Dreyer

Comments: 16 pages, 5 figures, code available at https://github.com/prescient-design/synapse

Abstract

Multispecific antibodies offer transformative therapeutic potential by engaging multiple epitopes simultaneously, yet their efficacy is an emergent property governed by complex molecular architectures. Rational design is often bottlenecked by the inability to predict how subtle changes in domain topology influence functional outcomes, a challenge exacerbated by the scarcity of comprehensive experimental data. Here, we introduce a computational framework to address part of this gap. First, we present a generative method for creating large-scale, realistic synthetic functional landscapes that capture non-linear interactions where biological activity depends on domain connectivity. Second, we propose a graph neural network architecture that explicitly encodes these topological constraints, distinguishing between format configurations that appear identical to sequence-only models. We demonstrate that this model, trained on synthetic landscapes, recapitulates complex functional properties and, via transfer learning, has the potential to achieve high predictive accuracy on limited biological datasets. We showcase the model's utility by optimizing trade-offs between efficacy and toxicity in trispecific T-cell engagers and retrieving optimal common light chains. This work provides a robust benchmarking environment for disentangling the combinatorial complexity of multispecifics, accelerating the design of next-generation therapeutics.

2601.23211 2026-02-02 cs.MA

Multi-Agent Systems Should be Treated as Principal-Agent Problems

Paulius Rauba, Simonas Cepenas, Mihaela van der Schaar

Abstract

Consider a multi-agent systems setup in which a principal (a supervisor agent) assigns subtasks to specialized agents and aggregates their responses into a single system-level output. A core property of such systems is information asymmetry: agents observe task-specific information, produce intermediate reasoning traces, and operate with different context windows. In isolation, such asymmetry is not problematic, since agents report truthfully to the principal when incentives are fully aligned. However, this assumption breaks down when incentives diverge. Recent evidence suggests that LLM-based agents can acquire their own goals, such as survival or self-preservation, a phenomenon known as scheming, and may deceive humans or other agents. This leads to agency loss: a gap between the principal's intended outcome and the realized system behavior. Drawing on core ideas from microeconomic theory, we argue that these characteristics, information asymmetry and misaligned goals, are best studied through the lens of principal-agent problems. We explain why multi-agent systems, both human-to-LLM and LLM-to-LLM, naturally induce information asymmetry under this formulation, and we use scheming, where LLM agents pursue covert goals, as a concrete case study. We show that recently introduced terminology used to describe scheming, such as covert subversion or deferred subversion, corresponds to well-studied concepts in the mechanism design literature, which not only characterizes the problem but also prescribes concrete mitigation strategies. More broadly, we argue for applying tools developed to study human agent behavior to the analysis of non-human agents.

2601.23208 2026-02-02 stat.ML cs.LG

A Random Matrix Theory of Masked Self-Supervised Regression

Arie Wortsman Zurich, Federica Gerace, Bruno Loureiro, Yue M. Lu

Abstract

In the era of transformer models, masked self-supervised learning (SSL) has become a foundational training paradigm. A defining feature of masked SSL is that training aggregates predictions across many masking patterns, giving rise to a joint, matrix-valued predictor rather than a single vector-valued estimator. This object encodes how coordinates condition on one another and poses new analytical challenges. We develop a precise high-dimensional analysis of masked modeling objectives in the proportional regime where the number of samples scales with the ambient dimension. Our results provide explicit expressions for the generalization error and characterize the spectral structure of the learned predictor, revealing how masked modeling extracts structure from data. For spiked covariance models, we show that the joint predictor undergoes a Baik--Ben Arous--Péché (BBP)-type phase transition, identifying when masked SSL begins to recover latent signals. Finally, we identify structured regimes in which masked self-supervised learning provably outperforms PCA, highlighting potential advantages of SSL objectives over classical unsupervised methods.

2601.23206 2026-02-02 cs.AI

High-quality generation of dynamic game content via small language models: A proof of concept

Morten I. K. Munk, Arturo Valdivia, Paolo Burelli

Abstract

Large language models (LLMs) offer promise for dynamic game content generation, but they face critical barriers, including narrative incoherence and high operational costs. Due to their large size, they are often accessed in the cloud, limiting their application in offline games. Many of these practical issues are solved by pivoting to small language models (SLMs), but existing studies using SLMs have resulted in poor output quality. We propose a strategy of achieving high-quality SLM generation through aggressive fine-tuning on deliberately scoped tasks with narrow context, constrained structure, or both. In short, more difficult tasks require narrower scope and higher specialization to the training corpus. Training data is synthetically generated via a DAG-based approach, grounding models in the specific game world. Such models can form the basis for agentic networks designed around the narratological framework at hand, representing a more practical and robust solution than cloud-dependent LLMs. To validate this approach, we present a proof-of-concept focusing on a single specialized SLM as the fundamental building block. We introduce a minimal RPG loop revolving around rhetorical battles of reputations, powered by this model. We demonstrate that a simple retry-until-success strategy reaches adequate quality (as defined by an LLM-as-a-judge scheme) with predictable latency suitable for real-time generation. While local quality assessment remains an open question, our results demonstrate feasibility for real-time generation under typical game engine constraints.
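
The retry-until-success strategy mentioned in this abstract reduces to a short loop. In the sketch below, `generate` and `judge` are hypothetical callables standing in for the fine-tuned SLM and the quality check, and the `max_attempts` cap is an added assumption to keep latency bounded.

```python
def generate_with_retry(generate, judge, max_attempts=5):
    """Call `generate` until `judge` accepts the output or attempts run out.

    Returns (accepted_output, attempts_used), or (None, max_attempts) if
    no candidate was accepted.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = generate()
        if judge(candidate):
            return candidate, attempt
    return None, max_attempts
```

If each attempt is accepted independently with probability p, the number of attempts without a cap is geometrically distributed with mean 1/p, which is one way to reason about the predictable latency the abstract refers to.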

2601.23201 2026-02-02 eess.IV cs.CV cs.LG

Scale-Cascaded Diffusion Models for Super-Resolution in Medical Imaging

Darshan Thaker, Mahmoud Mostapha, Radu Miron, Shihan Qiu, Mariappan Nadar

Comments Accepted at IEEE International Symposium for Biomedical Imaging (ISBI) 2026

Details
English abstract

Diffusion models have been increasingly used as strong generative priors for solving inverse problems such as super-resolution in medical imaging. However, these approaches typically utilize a diffusion prior trained at a single scale, ignoring the hierarchical scale structure of image data. In this work, we propose to decompose images into Laplacian pyramid scales and train separate diffusion priors for each frequency band. We then develop an algorithm to perform super-resolution that utilizes these priors to progressively refine reconstructions across different scales. Evaluated on brain, knee, and prostate MRI data, our approach both improves perceptual quality over baselines and reduces inference time through smaller coarse-scale networks. Our framework unifies multiscale reconstruction and diffusion priors for medical image super-resolution.
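
The Laplacian pyramid decomposition that the abstract builds on can be illustrated with a small sketch. It is deliberately simplified and makes several assumptions not taken from the paper: it works on a 1D signal rather than an image, uses pair averaging instead of a Gaussian filter, and uses nearest-neighbour upsampling. All function names are hypothetical. It shows only the decomposition into frequency bands and its exact inversion, not the diffusion priors.

```python
from typing import List

Signal = List[float]


def downsample(x: Signal) -> Signal:
    """Halve the length by averaging adjacent pairs (a box filter, for simplicity)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x: Signal, length: int) -> Signal:
    """Nearest-neighbour upsampling back to `length` samples."""
    return [x[min(i // 2, len(x) - 1)] for i in range(length)]


def laplacian_pyramid(x: Signal, levels: int) -> List[Signal]:
    """Return [band_0, ..., band_{levels-1}, coarse_residual], finest band first.

    Assumes len(x) >= 2 ** levels so every level has at least one sample.
    """
    bands = []
    current = list(x)
    for _ in range(levels):
        coarse = downsample(current)
        # Each band holds the detail lost when going one level coarser.
        predicted = upsample(coarse, len(current))
        bands.append([a - b for a, b in zip(current, predicted)])
        current = coarse
    bands.append(current)
    return bands


def reconstruct(bands: List[Signal]) -> Signal:
    """Invert the pyramid: start from the coarsest level and add back each band."""
    current = bands[-1]
    for band in reversed(bands[:-1]):
        current = [u + d for u, d in zip(upsample(current, len(band)), band)]
    return current
```

Because each band stores exactly the residual between a level and its upsampled coarser version, summing back from coarse to fine recovers the input up to floating-point error.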

2601.23200 2026-02-02 cs.CE

Large Language Models for Patent Classification: Strengths, Trade-offs, and the Long Tail Effect

Lorenzo Emer, Marco Lippi, Andrea Mina, Andrea Vandin

Comments 44 pages, 8 figures

Details
English abstract

Patent classification into CPC codes underpins large-scale analyses of technological change but remains challenging due to its hierarchical, multi-label, and highly imbalanced structure. While pre-Generative-AI supervised encoder-based models became the de facto standard for large-scale patent classification, recent advances in large language models (LLMs) raise questions about whether they can provide complementary capabilities, particularly for rare or weakly represented technological categories. In this work, we perform a systematic comparison of encoder-based classifiers (BERT, SciBERT, and PatentSBERTa) and open-weight LLMs on a highly imbalanced benchmark dataset (USPTO-70k). We evaluate LLMs under zero-shot, few-shot, and retrieval-augmented prompting, and further assess parameter-efficient fine-tuning of the best-performing model. Our results show that encoder-based models achieve higher aggregate performance, driven by strong results on frequent CPC subclasses, but struggle on rare ones. In contrast, LLMs achieve relatively higher performance on infrequent subclasses, often associated with early-stage, cross-domain, or weakly institutionalised technologies, particularly at higher hierarchical levels. These findings indicate that encoder-based and LLM-based approaches play complementary roles in patent classification. We additionally quantify inference time and energy consumption, showing that encoder-based models are up to three orders of magnitude more efficient than LLMs. Overall, our results inform responsible patentometrics and technology mapping, and motivate hybrid classification approaches that combine encoder efficiency with the long-tail coverage of LLMs under computational and environmental constraints.

2601.23198 2026-02-02 cs.CC

Planar Graph Homomorphisms: A Dichotomy and a Barrier from Quantum Groups

Jin-Yi Cai, Ashwin Maran, Ben Young

Comments 59 pages, submitted to STOC'26

Details
English abstract

We study the complexity of the counting (weighted) planar graph homomorphism problem $\tt{Pl\text{-}GH}(M)$ parametrized by an arbitrary symmetric non-negative real-valued matrix $M$. For matrices with pairwise distinct diagonal values, we prove a complete dichotomy theorem: $\tt{Pl\text{-}GH}(M)$ is either polynomial-time tractable, or $\#$P-hard, according to a simple criterion. More generally, we obtain a dichotomy whenever every vertex pair of the graph represented by $M$ can be separated using some planar edge gadget. A key question in proving complexity dichotomies in the planar setting is the expressive power of planar edge gadgets. We build on the framework of Mančinska and Roberson to establish links between \textit{planar} edge gadgets and the theory of the \textit{quantum automorphism group} $\tt{Qut}(M)$. We show that planar edge gadgets that can separate vertex pairs of $M$ exist precisely when $\tt{Qut}(M)$ is \emph{trivial}, and prove that the problem of whether $\tt{Qut}(M)$ is trivial is undecidable. These results delineate the frontier for planar homomorphism counting problems and uncover intrinsic barriers to extending nonplanar reduction techniques to the planar setting.

2601.23193 2026-02-02 cs.SI

Network analysis and link prediction in competitive women's basketball

Anthony Bonato, Morganna Hinds

Details
English abstract

Network structure and its role in prediction are examined in competitive basketball at the team and player levels. Adversarial game outcome networks from NCAA Division I women's basketball from 2021 to 2024 are used to compute the common out-neighbor score and PageRank, which are combined into a low-key leader strength that identifies competitors influential through structural similarity despite relatively low centrality. This measure is related to changes in NCAA NET rankings by grouping teams into quantiles and comparing average rank changes across seasons for both previous-to-current and current-to-next transitions. Link prediction is then studied using node2vec embeddings across three interaction settings. For NCAA regular-season game networks, cosine similarity between team embeddings is used in a logistic regression model to predict March Madness matchups. For WNBA shot-blocking networks, future directed blocking interactions are predicted via logistic regression on concatenated source-target player embeddings. For WNBA passing networks, region embeddings learned from first-quarter passes are evaluated for their ability to predict subsequent passing connections. Across NCAA and WNBA settings, embedding-based models provide statistically significant evidence that higher-order network structure contains predictive signals for future interactions, while the passing experiment shows weaker predictive performance but yields interpretable similarity patterns consistent with passing feasibility.
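
As an illustration of one ingredient named in the abstract, PageRank on a game outcome network can be computed by power iteration. This is a minimal sketch under assumptions of my own: the `pagerank` function is hypothetical, edges are taken to point from the losing team to the winning team (the abstract does not state the direction convention), and the damping factor 0.85 is the conventional default rather than a value from the paper. The common out-neighbor score and the combined low-key leader strength are not reproduced here.

```python
from typing import Dict, List


def pagerank(
    out_edges: Dict[str, List[str]],
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Power-iteration PageRank on a directed graph given as adjacency lists."""
    nodes = sorted(set(out_edges) | {v for vs in out_edges.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Mass held by nodes with no out-edges is spread uniformly over all nodes.
        dangling = sum(rank[u] for u in nodes if not out_edges.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = out_edges.get(u, [])
            for v in targets:
                # Each out-edge passes on an equal share of u's current rank.
                new[v] += damping * rank[u] / len(targets)
        rank = new
    return rank


if __name__ == "__main__":
    # Toy season: B lost to A; C lost to A and to B (edge = loser -> winner).
    scores = pagerank({"B": ["A"], "C": ["A", "B"], "A": []})
    for team, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(team, round(score, 3))
```

With the loser-to-winner convention, rank flows toward teams that beat highly ranked opponents, so the undefeated team in the toy example ends up on top.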

2601.23188 2026-02-02 cs.CL

Deep Search with Hierarchical Meta-Cognitive Monitoring Inspired by Cognitive Neuroscience

Zhongxiang Sun, Qipeng Wang, Weijie Yu, Jingxuan Yang, Haolang Lu, Jun Xu

Comments 11 pages, 3 figures

Details
English abstract

Deep search agents powered by large language models have demonstrated strong capabilities in multi-step retrieval, reasoning, and long-horizon task execution. However, their practical failures often stem from the lack of mechanisms to monitor and regulate reasoning and retrieval states as tasks evolve under uncertainty. Insights from cognitive neuroscience suggest that human metacognition is hierarchically organized, integrating fast anomaly detection with selectively triggered, experience-driven reflection. In this work, we propose Deep Search with Meta-Cognitive Monitoring (DS-MCM), a deep search framework augmented with an explicit hierarchical metacognitive monitoring mechanism. DS-MCM integrates a Fast Consistency Monitor, which performs lightweight checks on the alignment between external evidence and internal reasoning confidence, and a Slow Experience-Driven Monitor, which is selectively activated to guide corrective intervention based on experience memory from historical agent trajectories. By embedding monitoring directly into the reasoning-retrieval loop, DS-MCM determines both when intervention is warranted and how corrective actions should be informed by prior experience. Experiments across multiple deep search benchmarks and backbone models demonstrate that DS-MCM consistently improves performance and robustness.

2601.23185 2026-02-02 math.NA cs.NA

Preconditioning and Numerical Stability in Neural Network Training for Parametric PDEs

Markus Bachmayr, Wolfgang Dahmen, Chenguang Duan, Mathias Oster

Details
English abstract

In the context of training neural network-based approximations of solutions of parameter-dependent PDEs, we investigate the effect of preconditioning via well-conditioned frame representations of operators and demonstrate a significant improvement in the performance of standard training methods. We also observe that standard representations of preconditioned matrices are insufficient for obtaining numerical stability and propose a generally applicable form of stable representations that enables computations with single- and half-precision floating-point numbers without loss of precision.

2601.23184 2026-02-02 cs.CL

ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought

Fanmeng Wang, Haotian Liu, Guojiang Zhao, Hongteng Xu, Zhifeng Gao

Details
English abstract

While Chain-of-Thought (CoT) significantly enhances the performance of Large Language Models (LLMs), explicit reasoning chains introduce substantial computational redundancy. Recent latent reasoning methods attempt to mitigate this by compressing reasoning processes into latent space, but often suffer from severe performance degradation due to the lack of appropriate compression guidance. In this study, we propose Rendered CoT-Guided variational Latent Reasoning (ReGuLaR), a simple yet novel latent learning paradigm resolving this issue. Fundamentally, we formulate latent reasoning within the Variational Auto-Encoding (VAE) framework, sampling the current latent reasoning state from the posterior distribution conditioned on previous ones. Specifically, when learning this variational latent reasoning model, we render explicit reasoning chains as images, from which we extract dense visual-semantic representations to regularize the posterior distribution, thereby achieving efficient compression with minimal information loss. Extensive experiments demonstrate that ReGuLaR significantly outperforms existing latent reasoning methods across both computational efficiency and reasoning effectiveness, and even surpasses CoT through multi-modal reasoning, providing a new and insightful solution to latent reasoning. Code: https://github.com/FanmengWang/ReGuLaR.

2601.23183 2026-02-02 cs.CL

JobResQA: A Benchmark for LLM Machine Reading Comprehension on Multilingual Résumés and JDs

Casimiro Pio Carrino, Paula Estrella, Rabih Zbib, Carlos Escolano, José A. R. Fonollosa

Comments Under review

Details
English abstract

We introduce JobResQA, a multilingual Question Answering benchmark for evaluating Machine Reading Comprehension (MRC) capabilities of LLMs on HR-specific tasks involving résumés and job descriptions. The dataset comprises 581 QA pairs across 105 synthetic résumé-job description pairs in five languages (English, Spanish, Italian, German, and Chinese), with questions spanning three complexity levels from basic factual extraction to complex cross-document reasoning. We propose a data generation pipeline derived from real-world sources through de-identification and data synthesis to ensure both realism and privacy, while controlled demographic and professional attributes (implemented via placeholders) enable systematic bias and fairness studies. We also present a cost-effective, human-in-the-loop translation pipeline based on the TEaR methodology, incorporating MQM error annotations and selective post-editing to ensure a high-quality multi-way parallel benchmark. We provide baseline evaluations across multiple open-weight LLM families using an LLM-as-judge approach, revealing higher performance on English and Spanish but substantial degradation for other languages, highlighting critical gaps in multilingual MRC capabilities for HR applications. JobResQA provides a reproducible benchmark for advancing fair and reliable LLM-based HR systems. The benchmark is publicly available at: https://github.com/Avature/jobresqa-benchmark