arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.09216 2026-05-12 cs.RO

Continuum Robot Modeling with Action Conditioned Flow Matching

Jiong Lin, Jinchen Ruan, Hod Lipson

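AI summary This study addresses steady-state shape prediction for tendon-driven continuum robots and proposes an action-conditioned flow matching model. Using a lightweight 3D-printed hardware platform and an RGB-D data collection system, it learns a mapping from motor actuation states to the robot's settled 3D geometry. Experiments show that the method outperforms existing approaches both in simulation and on real hardware, and simulation results show it can be extended to shape prediction conditioned on tip payload, demonstrating a data-driven self-modeling framework for quasi-static continuum robot geometry prediction.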

Comments 14 pages, 9 figures

English abstract

Predicting the shape of tendon driven continuum robots (TDCRs) at steady state from actuation remains challenging due to continuous deformation, complex tendon routing, compliance, friction, and fabrication variability. In this paper, we address this problem as kinematic self modeling conditioned on action. We present a lightweight 3D printed TDCR hardware platform and an RGB-D data collection pipeline with multiple cameras, and we learn a point cloud flow matching model that maps motor actuation states to the robot's settled 3D geometry. The model is trained from randomly sampled quasi static configurations and evaluated on test motor commands within the same TDCR design family and actuation range. We compare against prior 3D deformable object and robot self modeling approaches in both MuJoCo simulation and real hardware experiments. Experiments on simulated 2-, 3-, and 5-module TDCRs and real 2- and 3-module robots show improved shape prediction accuracy under CD and EMD metrics. We further show in simulation that the same conditional formulation generalizes to tip payload as a conditioning input, enabling payload conditioned steady-state shape prediction. These results demonstrate a data driven self modeling framework for quasi static TDCR geometry prediction.
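
The abstract above reports accuracy under the CD (Chamfer distance) metric without stating which variant is used. The following is a minimal sketch of one common form, assuming unsquared Euclidean distances averaged in each direction; the point sets `predicted` and `observed` are made-up values for illustration only.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two 3D point sets.

    Assumption: unsquared distances, mean over each set, summed over
    both directions. Other papers use squared or summed variants.
    """
    def mean_nearest(src, dst):
        # For each point in src, distance to its nearest neighbour in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return mean_nearest(a, b) + mean_nearest(b, a)

# Hypothetical predicted and observed robot shapes (two points each).
predicted = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
observed = [(0.0, 0.0, 0.0), (0.0, 0.1, 1.0)]
print(chamfer_distance(predicted, observed))
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is fine for a sketch but would normally be replaced by a spatial index for dense point clouds.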

2605.09214 2026-05-12 cs.LG cs.AI cs.IT math.IT math.ST stat.ML stat.TH

Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability

Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Quanquan Gu

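AI summary This paper studies offline contextual bandits with forward-KL regularization under single-policy concentrability and gives the first $\tilde{O}(ε^{-1})$ upper bounds, improving on the earlier $\tilde{O}(ε^{-2})$ slow rates. A new convex-analytic approach, combined with the pessimism principle, unifies the tabular and general function approximation settings and bypasses the traditional proofs based on the mean value theorem. The authors also give matching lower bounds, showing that the upper bounds are optimal in statistical rate, and show that in the low-regularization regime forward-KL regularization recovers the same slow rate as the unregularized problem.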

Comments 31 pages, comments are welcome

English abstract

Kullback-Leibler (KL) regularization is ubiquitous in reinforcement learning algorithms in the form of reverse or forward KL. Recent studies have demonstrated $ε^{-1}$-type fast rates for decision making under reverse KL regularization, in contrast to the standard $ε^{-2}$-type sample complexity. However, for forward-KL-regularized objectives, existing statistical analyses are either not applicable or result in $\tilde{O}(ε^{-2})$ slow rates. We take the first step towards addressing this problem via a streamlined analysis of forward-KL-regularized offline CBs. We give the first $\tilde{O}(ε^{-1})$ upper bounds in tabular and general function approximation settings, both under notions of single-policy concentrability. In particular, our convex-analytical pipeline unifies these settings by exploiting the pessimism principle in a novel way and completely bypasses the proof routines in previous works based on the mean value theorem, which might be of independent interest. Moreover, we provide rate-optimal lower bounds, manifesting the tightness of our upper bounds in terms of statistical rates. Our lower bounds also demonstrate that the forward-KL-regularized sample complexity recovers the unregularized slow rate in the low-regularization regime, similarly to the reverse-KL regularization.
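
To make the forward/reverse distinction in the abstract above concrete, here is a small sketch for discrete action distributions. It assumes the common convention that reverse KL is KL(π ‖ π_ref) and forward KL is KL(π_ref ‖ π); the abstract does not spell out its convention, and the two distributions below are made-up values.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical reference policy and learned policy over two actions.
pi_ref = [0.5, 0.5]
pi = [0.9, 0.1]

reverse_kl = kl(pi, pi_ref)   # expectation taken under the learned policy
forward_kl = kl(pi_ref, pi)   # expectation taken under the reference policy
print(reverse_kl, forward_kl)
```

The two quantities differ because KL divergence is asymmetric: the forward form penalizes the learned policy heavily for putting little mass where the reference policy puts a lot.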

2605.09212 2026-05-12 cs.LG

Rethinking Ratio-Based Trust Regions for Policy Optimization in Multi-Agent Reinforcement Learning

Chulabhaya Wijesundara, Andrea Baisero, Zhongheng Li, Gregory Castañón, Alan Carlin, Christopher Amato

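AI summary This paper examines ratio-based trust-region policy optimization in multi-agent reinforcement learning. It points out that, for existing methods such as MAPPO and MASPO, teammate non-stationarity increases the variance of advantage estimates, which destabilizes policy updates. The authors propose a new objective, MARS, which replaces the additive trust-region mechanisms with a multiplicatively symmetric geometric barrier, preserving corrective gradients while assigning unbounded cost as probability ratios approach zero. Experiments show that MARS matches or exceeds existing methods across multiple multi-agent environments.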

English abstract

Centralized training with decentralized execution (CTDE) is a standard framework for cooperative multi-agent policy-gradient reinforcement learning, allowing agents to learn from joint information while acting from local observations. Ratio-based trust-region methods such as Multi-Agent Proximal Policy Optimization (MAPPO) and Multi-Agent Simple Policy Optimization (MASPO) update decentralized actors using per-agent probability ratios weighted by joint advantage estimates. Teammate non-stationarity increases the variance of these advantages, which in turn increases the variance in the local ratio updates. This exposes two method-specific failure modes: MAPPO's additive clipping removes gradients for outlier samples and weakens recovery from policy drift, while MASPO's soft quadratic penalty can allow probability collapse. We introduce Multi-Agent Ratio Symmetry (MARS), a novel policy optimization objective that replaces these additive ratio-based trust-region mechanisms with a multiplicatively symmetric geometric barrier. MARS preserves corrective gradients while assigning unbounded cost as probability ratios approach zero. Across 47 tasks spanning eight multi-agent environments, including novel JAX benchmarks PaxMen and AeroJAX, MARS matches or exceeds MAPPO and MASPO in aggregate environment-level performance. Ablations show that these gains arise from the geometry of the symmetric barrier rather than from flexible trust-region boundaries alone.
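
The clipping failure mode described in the abstract above can be seen in the standard PPO clipped surrogate, sketched below for a single sample. The function `symmetric_penalty` is only an illustrative example of a function that is multiplicatively symmetric (equal at r and 1/r) and unbounded as r approaches zero; it is an assumption for illustration and is not the MARS objective, whose exact form the abstract does not give.

```python
def ppo_clip_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

def ppo_clip_grad(ratio, advantage, eps=0.2):
    """Derivative of the surrogate with respect to the ratio.

    When the clipped branch is strictly smaller, the ratio lies outside
    [1 - eps, 1 + eps] and the surrogate is constant in the ratio.
    """
    unclipped = ratio * advantage
    clipped = max(1.0 - eps, min(1.0 + eps, ratio)) * advantage
    return advantage if unclipped <= clipped else 0.0

def symmetric_penalty(ratio):
    """Illustrative only: f(r) = r + 1/r - 2 satisfies f(r) = f(1/r)."""
    return ratio + 1.0 / ratio - 2.0

# A sample whose ratio has drifted to 0.5 with a negative advantage
# receives no gradient from the clipped surrogate.
print(ppo_clip_grad(0.5, -1.0))
print(symmetric_penalty(2.0), symmetric_penalty(0.5))
```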

2605.09208 2026-05-12 cs.LG

TSNN: A Non-parametric and Interpretable Framework for Traffic Time Series Forecasting

Bowen Liu, Haijian Lai, Chan-Tong Lam, Junhao Dong, Benjamin Ng, Wei Ke, Sio-Kei Im

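AI summary This paper proposes TSNN, a non-parametric and interpretable framework for traffic time series forecasting. The method decouples the time series by matching entries in a memory bank and exploits the periodicity of traffic data to improve forecasting accuracy, while keeping the model structure simple and free of trainable parameters. Experiments show that TSNN is competitive with typical deep learning models on four real-world traffic flow datasets, and visualizations illustrate the decoupling process and the model's interpretability.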

Comments Accepted by IEEE Transactions on Knowledge and Data Engineering

English abstract

Although many complex models were proposed to analyze time series data, some studies have demonstrated remarkable performance with simpler structures. A recent study proposed a non-parametric framework for 3D point cloud classification, which has the potential to be adapted for time series forecasting and enable interpretability. Inspired by the previous works, we present TSNN, a non-parametric and interpretable framework for traffic time series forecasting. TSNN consists of multiple layers that decouple the time series by matching the entries in a memory bank, where the memory bank is constructed using a similar matching process within the training set. It leverages the periodicity in traffic data to enhance forecasting accuracy while maintaining a simple model architecture. The proposed model operates without trainable parameters, preserving its inherent interpretability. In the experiments, TSNN achieves competitive performance compared to the typical deep learning models in four real-world traffic flow datasets. We also visualize the decoupling process to show the effectiveness of the components. Finally, we demonstrate the interpretability of the model and illustrate the contribution of each time step within the memory bank.
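
As a rough illustration of forecasting by memory-bank matching with no trainable parameters, the sketch below implements plain nearest-neighbour forecasting. It is a generic stand-in under stated assumptions, not TSNN's multi-layer decoupling procedure; the function names, window sizes, and toy series are all hypothetical.

```python
def build_memory_bank(series, window, horizon):
    """Store (history window, following values) pairs from the training series."""
    bank = []
    for start in range(len(series) - window - horizon + 1):
        key = series[start:start + window]
        value = series[start + window:start + window + horizon]
        bank.append((key, value))
    return bank

def forecast(bank, query, k=1):
    """Average the continuations of the k stored windows closest to the query."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(bank, key=lambda entry: sq_dist(entry[0], query))[:k]
    horizon = len(nearest[0][1])
    return [sum(v[i] for _, v in nearest) / len(nearest) for i in range(horizon)]

# A toy periodic "traffic" series with period 4.
train = [1, 2, 3, 4] * 5
bank = build_memory_bank(train, window=4, horizon=2)
print(forecast(bank, [1, 2, 3, 4], k=1))
```

Because every prediction is an average of stored continuations, each forecast can be traced back to the specific training windows that produced it, which is the kind of interpretability the abstract refers to.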

2605.09204 2026-05-12 cs.LG

LBI: Parallel Scan Backpropagation via Latent Bounded Interfaces

Shaun Christopher Lee, Sangeetha Abdu Jyothi

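AI summary Backpropagation is inherently sequential across depth, which limits training efficiency. This paper proposes LBI, which restricts inter-region communication to a low-dimensional latent interface and cuts the per-combine cost of scan-based backpropagation from $O(d^3)$ to $O(r^3)$, enabling efficient parallel training. Experiments show that LBI maintains model quality across several architectures, with a small interface dimension sufficing to stay close to dense baselines, providing an algorithmic foundation for region-parallel training.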

English abstract

Backpropagation is inherently sequential across depth, creating an $O(K)$-deep dependency chain that bottlenecks parallel training. While parallel-scan formulations theoretically reduce this depth to $O(\log K)$, they are computationally prohibitive for modern architectures due to the $O(d^3)$ cost of composing full-rank $d\times d$ Jacobians over the entire hidden state. We introduce Latent Bounded Interfaces (LBI), an algorithmic formulation that makes scan-based backpropagation tractable by restricting inter-region communication to a low-dimensional latent interface, $ m_k \in \mathbb{R}^{r}$, where $r \ll d$. This reduces the adjoint recursion to a suffix scan over $r \times r$ Jacobians, cutting per-combine cost from $O(d^3)$ to $O(r^3)$ while preserving exact gradients under the bounded-interface model. We demonstrate that LBI maintains model quality across four architectures (Mamba-2, Mamba-3, Transformer, and a Mamba--Transformer hybrid) at 47--61M block parameters. Interfaces of dimension $r=16$ suffice to preserve training quality within 0.16--0.35 cross entropy of dense baselines. The resulting framework provides an algorithmic foundation for region-parallel training, reducing cross-device backward communication to a single scan over $K$ fixed-size matrices, of approximately 56 KB for our experimental configurations.
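
The suffix scan mentioned in the abstract above relies on matrix multiplication being associative, so the chain of Jacobian products can be regrouped into logarithmically many parallel rounds. The sketch below compares a sequential suffix product with a Hillis-Steele style scan on small integer matrices. It is a generic illustration, not the authors' implementation: transposes of the adjoint recursion are omitted, each round is run serially here, and the matrices are made-up values.

```python
def matmul(a, b):
    """Dense matrix product; costs O(r^3) for r x r inputs."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def suffix_products_sequential(jacobians):
    """out[k] = J_k @ J_{k+1} @ ... @ J_{K-1}, computed in K - 1 dependent steps."""
    out = [None] * len(jacobians)
    acc = None
    for k in range(len(jacobians) - 1, -1, -1):
        acc = jacobians[k] if acc is None else matmul(jacobians[k], acc)
        out[k] = acc
    return out

def suffix_products_scan(jacobians):
    """Same result in ceil(log2 K) rounds; combines within a round are independent."""
    out = list(jacobians)
    step = 1
    while step < len(out):
        out = [matmul(out[k], out[k + step]) if k + step < len(out) else out[k]
               for k in range(len(out))]
        step *= 2
    return out

# Hypothetical 2 x 2 interface Jacobians for K = 5 regions.
jacobians = [[[1, 2], [0, 1]], [[2, 0], [1, 1]], [[1, 1], [1, 0]],
             [[0, 1], [1, 2]], [[3, 1], [0, 1]]]
print(suffix_products_scan(jacobians)[0])
```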

2605.09200 2026-05-12 cs.LG

On Characterizing Learnability for Adversarial Noisy Bandits

Steve Hanneke, Kun Wang

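AI summary This paper studies learnability in adversarial noisy bandits given a known function class $\mathcal{F}$. In each round, the adversary selects a function $f \in \mathcal{F}$, and the learner chooses an arm and observes a noisy reward determined by that arm and $f$; the goal is to minimize the cumulative regret $R(T)$. The paper introduces a convexified generalized maximin volume to characterize learnability of the function class, gives characterizations for oblivious adversaries and, when the arm space is countable, for adaptive adversaries, and states a conjecture for uncountable arm spaces together with a related complexity measure.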

English abstract

We study adversarial noisy bandits given a known function class $\mathcal{F}$. In each round, the adversary selects a function $f \in \mathcal{F}$, the learner chooses an arm, and then observes a noisy reward determined by the chosen arm and the function $f$. The goal is to minimize the cumulative regret $R(T)$, defined as the difference between the learner's performance and that of the best fixed arm in hindsight over $T$ rounds. We say that a function class $\mathcal{F}$ is learnable if there exists an algorithm achieving sublinear regret. Our main results concern characterizing learnability. The main quantity appearing in our characterization is a convexified variant of the generalized maximin volume introduced by Hanneke and Wang (2025). For oblivious adversaries, we characterize learnability in terms of this convexified generalized maximin volume. For adaptive adversaries, we show that the same quantity characterizes learnability when the arm space is countable. Our analysis builds on a connection between convexified generalized maximin volume and the existence of simple hitting sets. We further conjecture that the same quantity also characterizes learnability when the arm space is uncountable, via its relation to a new complexity measure, which we call the distribution covering number. This notion can be viewed as a strengthened form of the hitting set that still admits efficient learning via the multiplicative weights algorithm. We also pose a number of relevant open questions regarding this problem.
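
The abstract above refers to the multiplicative weights algorithm. A minimal sketch of its textbook full-information form (often called Hedge) follows. This is an assumption-laden illustration: the paper's setting has bandit feedback and hitting-set structure that are not modelled here, and the loss sequence and learning rate are made up.

```python
import math

def hedge(loss_rounds, eta):
    """Multiplicative weights over a finite set of arms with full-information losses.

    Returns the learner's cumulative expected loss and the final weights.
    """
    n_arms = len(loss_rounds[0])
    weights = [1.0] * n_arms
    expected_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Arms with higher loss are down-weighted exponentially.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return expected_loss, weights

# Hypothetical sequence: arm 0 always has loss 0, arm 1 always has loss 1.
rounds = [[0.0, 1.0]] * 50
learner_loss, final_weights = hedge(rounds, eta=0.5)
best_fixed_loss = min(sum(r[i] for r in rounds) for i in range(2))
print(learner_loss - best_fixed_loss)  # cumulative regret against the best fixed arm
```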

2605.09196 2026-05-12 cs.CV cs.AI cs.GR cs.LG cs.RO

RigidFormer: Learning Rigid Dynamics using Transformers

Zhiyang Dou, Minghao Guo, Haixu Wu, Doug Roble, Tuur Stuyck, Wojciech Matusik

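AI summary This paper proposes RigidFormer, a Transformer-based model for learning multi-object rigid-body dynamics that is particularly suited to mesh-free representations such as point clouds. The model advances each object through object-level anchors and combines Anchor-Vertex Pooling with anchor-based RoPE attention to simulate rigid-body motion efficiently and with high fidelity. On several benchmarks RigidFormer outperforms or matches mesh-based baselines at lower computational cost, and it handles large numbers of objects and inputs at different point-cloud resolutions.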

Comments Project Page: https://people.csail.mit.edu/frankzydou/projects/RigidFormer/index.html

English abstract

Learning-based simulation of multi-object rigid-body dynamics remains difficult because contact is discontinuous and errors compound over long horizons. Most existing methods remain tied to mesh connectivity and vertex-level message passing, which limits their applicability to mesh-free inputs such as point clouds and leads to high computational cost. Efficiently modeling high-fidelity rigid-body dynamics from mesh-free representations, therefore, remains challenging. We introduce RigidFormer, an object-centric Transformer-based model that learns mesh-free rigid-body dynamics with controllable integration step sizes. RigidFormer reasons at the object level and advances each object through compact anchors; Anchor-Vertex Pooling enriches these anchors with local vertex features, retaining contact-relevant geometry without dense vertex-level interaction. We propose Anchor-based RoPE to inject anchor geometry into attention while respecting the unordered nature of objects and anchors: object-token processing is permutation-equivariant, and the mean-pooled anchor descriptor is invariant to anchor reindexing while preserving shape extent. RigidFormer further enforces rigidity by projecting updates onto the rigid-body manifold using differentiable Kabsch alignment. On standard benchmarks, RigidFormer outperforms or matches mesh-based baselines using point inputs, runs faster, generalizes to unseen point resolutions and across datasets, and scales to 200+ objects; we also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components.
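
Projecting a predicted update onto the rigid-body manifold, as the abstract above describes, amounts to finding the best-fitting rotation and translation and applying them to the reference shape. The sketch below does this in 2D, where the optimal rotation angle has a closed form. This is a simplification chosen for illustration: the 3D Kabsch algorithm uses an SVD instead, and the function name and point sets are hypothetical.

```python
import math

def rigid_project_2d(reference, predicted):
    """Return the rigid transform of `reference` that best fits `predicted`
    in the least-squares sense (2D analogue of Kabsch alignment)."""
    n = len(reference)
    crx = sum(p[0] for p in reference) / n
    cry = sum(p[1] for p in reference) / n
    cpx = sum(q[0] for q in predicted) / n
    cpy = sum(q[1] for q in predicted) / n

    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(reference, predicted):
        ax, ay = px - crx, py - cry      # centred reference point
        bx, by = qx - cpx, qy - cpy      # centred predicted point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)       # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)

    return [(c * (px - crx) - s * (py - cry) + cpx,
             s * (px - crx) + c * (py - cry) + cpy) for px, py in reference]

# A unit square and a copy rotated by 90 degrees and shifted by (2, 3).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved = [(2.0, 3.0), (2.0, 4.0), (1.0, 4.0), (1.0, 3.0)]
print(rigid_project_2d(square, moved))
```

When the prediction is already rigid, the projection returns it unchanged up to rounding; when it is slightly deformed, the output keeps all pairwise distances of the reference shape.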

2605.09195 2026-05-12 cs.AI

The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations

Rania Elbadry, Ahmed Heakl, Fan Zhang, Dani Bouch, Yuxia Wang, Preslav Nakov, Zhuohan Xie

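AI summary This paper investigates the structural reason large language models produce outdated answers. It shows that temporal drift (whether a fact has changed since training) is encoded in the model's residual stream as an independent direction orthogonal to both correctness and uncertainty. Several experiments verify this geometric property and show that methods based on correctness or uncertainty cannot detect temporal drift, whereas a linear probe trained directly on drift labels achieves high detection performance. The study reveals a mechanistic link between the model's internal knowledge state and its outputs, and the authors state that code and datasets will be publicly released.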

English abstract

Large language models confidently produce outdated answers, and no existing method can detect them. We show this is not an engineering failure but a structural one: temporal drift, whether a stored fact has changed since training, is encoded as a direction in the residual stream geometrically orthogonal to both correctness and uncertainty. Any method operating on correctness or uncertainty signals is therefore blind to drift by construction. We verify this across six instruction-tuned models. A linear probe trained directly on drift labels achieves AUROC $0.83$--$0.95$; methods based on token entropy, semantic entropy, CCS, and SAPLMA all remain near chance ($0.49$--$0.57$). Five tests confirm the geometric orthogonality: weight cosines ($|\cos| \leq 0.14$), score correlations ($|r| \leq 0.20$), bidirectional null-space projection ($|Δ| \leq 0.008$), iterative null-space projection with $k{=}10$, and difference-of-means dissociation. Mechanistically, the MLP retrieval circuit produces identical dynamics for stale recall and confabulation ($r > 0.81$, six models), explaining why output confidence cannot separate them. A cross-cutoff experiment holds inputs constant and varies only the model: the probe fires on the model whose training predates the fact's transition and stays silent otherwise ($P(A{>}B) = 0.975$--$0.998$, twelve model pairs), confirming it reads model-internal knowledge state rather than input properties. Our code and datasets will be publicly released.
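
Two of the orthogonality tests named in the abstract above, weight cosines and null-space projection, reduce to simple vector operations. The sketch below shows both on toy three-dimensional vectors; the probe directions and hidden state are invented for illustration and do not come from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity between two probe weight vectors."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def project_out(x, direction):
    """Remove the component of x along `direction` (rank-1 null-space projection)."""
    scale = dot(x, direction) / dot(direction, direction)
    return [xi - scale * di for xi, di in zip(x, direction)]

# Hypothetical probe directions and a hypothetical hidden state.
drift_dir = [0.0, 1.0, 0.0]
correctness_dir = [1.0, 0.1, 0.0]
hidden = [2.0, 3.0, 1.0]

print(cosine(drift_dir, correctness_dir))
cleaned = project_out(hidden, correctness_dir)
# Drift score before and after removing the correctness direction.
print(dot(hidden, drift_dir), dot(cleaned, drift_dir))
```

If two directions are nearly orthogonal, projecting one out changes the score along the other only slightly, which is the pattern the abstract reports for drift versus correctness and uncertainty.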

2605.09190 2026-05-12 cs.CV

AQMP: Image compression through Adaptive Quadtree Refinement and Matching Pursuit with Hyperparameter Optimization

Franco Cerino, Emmanuel Tassone, Manuel Tiglio

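AI summary This paper presents AQMP, a new image codec that combines adaptive quadtree partitioning with Matching Pursuit, dynamically adjusting block sizes to local image structure to reach higher compression ratios at the same image quality. The method adds hyperparameter optimization, using the Tree-Structured Parzen Estimator for multi-objective search over the trade-off between compression efficiency and visual quality. Experiments show that, at structural similarity (SSIM) comparable to JPEG, AQMP achieves up to 4 times the compression rate and performs well across different compression regimes.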

Comments 34 pages, 18 figures

English abstract

We present AQMP, a novel image codec combining Adaptive Quadtree Refinement with Matching Pursuit. Unlike conventional Matching Pursuit methods that operate on fixed-size sub-images, AQMP dynamically adapts block sizes to local image structure, allocating finer partitions where the image is complex and coarser ones where it is smooth. This adaptivity yields superior compression ratios compared to fixed-size block Matching Pursuit at equivalent image quality, while offering significant parallelization opportunities at both the tree-leaf level and during compression of individual nodes. The algorithm is governed by user-specified accuracy and sparsity parameters alongside a small set of additional hyperparameters. To navigate the trade-off between compression efficiency and visual quality, we perform multi-objective hyperparameter optimization using the Tree-Structured Parzen Estimator, producing comprehensive Pareto fronts. Experimental results show that AQMP achieves up to $4\times$ higher compression rates than JPEG at comparable SSIM values, while maintaining competitive quality across a broad range of compression regimes. Performance evaluation is provided using a representative set of test images. To ensure reproducibility and promote adoption, we have made our implementation publicly available on GitHub under the MIT license.
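
The adaptive partitioning idea in the abstract above can be sketched as a recursive quadtree split. The sketch assumes a square image whose side is a power of two and uses block variance as the refinement criterion; that criterion is a stand-in chosen for illustration, since the abstract ties refinement to accuracy and sparsity parameters of Matching Pursuit without giving the rule. All names and values are hypothetical.

```python
def quadtree_leaves(image, x, y, size, threshold, min_size):
    """Split a square block into four until it is smooth enough or too small.

    Returns a list of (x, y, size) leaf blocks.
    """
    block = [image[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(block) / len(block)
    variance = sum((v - mean) ** 2 for v in block) / len(block)
    if variance <= threshold or size <= min_size:
        return [(x, y, size)]

    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(image, x + dx, y + dy, half, threshold, min_size)
    return leaves

# A 4 x 4 toy image: detail in the top-left corner, smooth elsewhere.
image = [
    [0, 9, 0, 0],
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
leaves = quadtree_leaves(image, 0, 0, 4, threshold=1.0, min_size=1)
print(leaves)
```

The detailed corner is refined down to single pixels while each smooth quadrant stays as one block, and the leaves are independent of one another, which is where the tree-leaf parallelism mentioned in the abstract comes from.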

2605.09189 2026-05-12 cs.LG

Practical Scaling Laws: Converting Compute into Performance in a Data-Constrained World

Christopher M. Bryant, Hao Liu

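AI summary This paper studies how to convert compute into model performance when data is constrained and proposes an improved form of the traditional scaling laws. The traditional laws apply to data-rich, single-epoch pretraining but break down with limited data or multi-epoch training, for example failing to describe overfitting and data scarcity. The authors propose a closed-form extension that decomposes loss into undercapacity, undertraining, and overfitting terms, validate it in multiple experiments, and report state-of-the-art extrapolation RMSE on published LLM scaling-law grids.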

English abstract

The scaling laws guiding modern model training were calibrated for a single regime: data-rich, single-epoch pretraining. The dominant such scaling law form, Chinchilla's $L = E + A/N^α + B/D^β$, has three structural limitations outside that regime: it diverges as unique data shrinks instead of saturating at the uninformed baseline; it cannot represent overfitting when capacity exceeds the data; and it conflates total examples seen with unique examples available. We propose a closed-form extension, $L(N, D, T) = E + (L_0 - E)\,h/(1+h)$ with $h = a/N^α + b/T^β + c\,N^γ/D^δ$, that decomposes loss into undercapacity, undertraining, and overfitting terms. It saturates between the irreducible loss $E$ and an uninformed baseline $L_0$ fixed by the loss type, and reduces to Chinchilla in the data-rich, single-epoch limit. We validate it on four multi-epoch experiments spanning four architecture families (MLPs, ResNets, Fourier neural operators, and transformers) across vision, scientific ML, and language domains, and refit it to five published LLM scaling-law grids. Extrapolating to higher compute and larger unique data than seen at fit time, our form achieves state-of-the-art RMSE on every published LLM grid we evaluate and on most cells of our constructed experiments. Once calibrated, the form admits a cost-aware allocation that recovers Chinchilla's optimum when data is free and shifts toward smaller corpora and more epochs as data grows expensive.
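
The two loss forms quoted in the abstract above can be written down directly. In the sketch below, N is read as parameter count, D as unique examples, and T as total examples seen; that reading is an inference from the abstract's wording, and every coefficient value is a made-up placeholder rather than a fitted result.

```python
def chinchilla_loss(N, D, E, A, B, alpha, beta):
    """L = E + A / N^alpha + B / D^beta."""
    return E + A / N ** alpha + B / D ** beta

def saturating_loss(N, D, T, E, L0, a, b, c, alpha, beta, gamma, delta):
    """L = E + (L0 - E) * h / (1 + h), with
    h = a / N^alpha + b / T^beta + c * N^gamma / D^delta."""
    h = a / N ** alpha + b / T ** beta + c * N ** gamma / D ** delta
    return E + (L0 - E) * h / (1.0 + h)

# Hypothetical coefficients, for illustration only.
params = dict(E=1.7, L0=10.8, a=400.0, b=400.0, c=1.0,
              alpha=0.34, beta=0.28, gamma=0.3, delta=0.3)

rich = saturating_loss(N=1e8, D=1e12, T=1e12, **params)     # plenty of unique data
starved = saturating_loss(N=1e8, D=1.0, T=1e9, **params)    # almost no unique data
diverging = chinchilla_loss(N=1e8, D=1.0, E=1.7, A=400.0, B=400.0,
                            alpha=0.34, beta=0.28)
print(rich, starved, diverging)
```

Because h / (1 + h) lies strictly between 0 and 1 for positive h, the extended form always stays between E and L_0, whereas the Chinchilla form grows without bound as D shrinks. For small h the factor h / (1 + h) is approximately h, which is consistent with the stated reduction to an additive power-law form in the data-rich limit.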

2605.09188 2026-05-12 cs.LG cs.AI

DARE: Difficulty-Adaptive Reinforcement Learning with Co-Evolved Difficulty Estimation

Yang Zhou, Can Jin, Zihan Dong, Zhepeng Wang, Yanting Yang, Shiyu Zhao, Lei Li, Runxue Bao, Yaochen Xie, Dimitris N. Metaxas

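AI summary DARE is a difficulty-adaptive reinforcement learning framework that aims to improve the training efficiency and reasoning quality of large language models. It uses a difficulty estimator that co-evolves with the policy, together with sampling from a symmetric Beta distribution and tiered training strategies, to learn effectively across tasks of different difficulty and to adapt response length. Experiments across multiple models and domains show that DARE outperforms existing methods in training efficiency, final performance, and inference efficiency, producing more concise responses on easy tasks and improving correctness on hard ones.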

English abstract

Reinforcement learning improves the reasoning ability of large language models but remains costly and sample-inefficient, as many rollouts provide weak learning signals. Difficulty-aware data selection methods attempt to address this by prioritizing moderately difficult prompts, yet our analysis reveals three limitations: difficulty estimates become inaccurate under policy drift, data selection alone yields limited final-performance gains, and inference efficiency remains largely unchanged. These findings suggest that efficient and effective RL requires more than filtering by difficulty: the policy should learn to solve hard tasks while producing concise responses for easy ones. To this end, we propose **Dare**, a unified framework that co-evolves difficulty estimation with the policy via self-normalized importance sampling, maintains diverse difficulty coverage through a symmetric Beta sampling distribution, and applies tailored training strategies across difficulty tiers with adaptive compute allocation. Extensive experiments across multiple models and domains demonstrate that **Dare** consistently outperforms existing methods in training efficiency, final effectiveness, and inference efficiency, producing more concise responses on easy tasks while improving correctness on hard ones. Code is available at https://github.com/EtaYang10th/DARE.

2605.09187 2026-05-12 cs.AI cs.CL cs.LG

Emergent Semantic Role Understanding in Language Models

Carla Griffiths, Mirco Musolesi

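AI summary This paper studies whether language models acquire semantic role understanding ("who did what to whom") spontaneously during pre-training. The authors freeze decoder-only transformer models and train linear probes to extract semantic roles, finding that the frozen pre-trained representations already contain substantial semantic role information, although performance does not reach that of fine-tuned models. This indicates that semantic role understanding emerges partially but incompletely from pre-training, and that the internal representation becomes more distributed as model scale increases.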

English abstract

Understanding how linguistic structure emerges in language models is central to interpreting what these systems learn from data and how much supervision they truly require. In particular, semantic role understanding ("who did what to whom") is a core component of meaning representation, yet it remains unclear whether it arises from pre-training alone or depends on task-specific fine-tuning. We study whether semantic role understanding emerges during language model pre-training or requires task-specific fine-tuning. We freeze decoder-only transformers and train linear probes to extract semantic roles, using performance to infer whether role information is already encoded in pre-training or learned during adaptation. Across model scales, we find that frozen representations contain substantial semantic role information, with performance improving but not fully matching fine-tuned models. This indicates partial but incomplete emergence from pre-training alone. We show that semantic role structure emerges from language modeling objectives, but its internal implementation shifts toward more distributed representations as model scale increases.

2605.09186 2026-05-12 cs.AI cs.CL

Agentic MIP Research: Accelerated Constraint Handler Generation

Liding Xu, Yugeng Zhou, Sebastian Pokutta

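AI summary This study proposes an agentic research framework for mixed-integer programming (MIP) that embeds large language model (LLM) agents to speed up the generation and verification of constraint handlers. The core approach lifts constraints in MIP formulations into global constraints and automatically generates propagation-only SCIP constraint handlers. Experiments on the MIPLIB 2017 benchmark set show that the framework recovers global constraint structures known from constraint programming, generates executable constraint detectors, and, with novel propagation methods, solves additional instances, demonstrating the potential of LLM agents to advance MIP algorithm research automatically.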

English abstract

Mixed-integer programming (MIP) research is both mathematically sophisticated and engineering-intensive: testing an algorithmic hypothesis within a branch-and-cut solver requires substantial implementation, debugging, tuning, and large-scale benchmarking. We propose an agentic MIP research framework that shortens this feedback loop by embedding LLM agents into a solver-aware harness for generating, verifying, and evaluating plugins for the open-source solver SCIP. Propagation methods play a central role in accelerating MIP solving by exploiting global constraints. We instantiate our framework on the semantic lifting of MIP formulations into global constraints and the automatic construction of propagation-only SCIP constraint handlers. On the MIPLIB 2017 benchmark set, the framework successfully recovers global constraint structures from constraint programming and generates executable constraint detectors and propagation-only constraint handlers. Furthermore, the framework naturally extends to in-context learning within a sandboxed environment, enabling agents not only to tune and debug generated constraint handlers on real instances, but also to explore global constraint patterns in MIP problems and discover novel propagation strategies not yet implemented in SCIP. This framework allows us to systematically distinguish meaningful algorithmic improvements from low-value or overly costly candidates: the novel propagation methods successfully solved five additional instances within the explored benchmark. Overall, this framework demonstrates that LLM agents can autonomously navigate the complex MIP research loop, paving the way for a more automated solver development process.

2605.09184 2026-05-12 cs.AI cs.CL cs.DB

Open Ontologies: Tool-Augmented Ontology Engineering with Stable Matching Alignment

Fabio Rovai

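AI summary This paper presents Open Ontologies, an open-source ontology engineering system that combines construction driven by large language models (LLMs), formal OWL reasoning, and ontology alignment via the Model Context Protocol. The study finds that stable 1-to-1 matching is the key factor in ontology alignment quality, achieving a high F1 on the OAEI Anatomy track and exceeding existing state-of-the-art systems in precision. Experiments also show that accessing ontology information through structured tool interfaces markedly improves LLM performance compared with reading raw OWL files directly, highlighting the importance of tool structure in ontology interaction.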

Comments 10 pages, 6 tables. Code: https://github.com/fabio-rovai/open-ontologies

English abstract

We present Open Ontologies, an open-source ontology engineering system implemented in Rust that integrates LLM-driven construction with formal OWL reasoning and ontology alignment via the Model Context Protocol. Our primary finding is that stable 1-to-1 matching is the dominant factor in ontology alignment quality: on the OAEI Anatomy track, it achieves F1 = 0.832 (P = 0.963, R = 0.733), competitive with state-of-the-art systems and exceeding all in precision. Ablation across five weight configurations shows that signal weights are irrelevant when stable matching is applied (F1 varies by less than 0.004), while removing stable matching drops F1 to 0.728. On the Conference track, the same method achieves F1 = 0.438. On tool-augmented ontology interaction, we find a surprising result: an LLM reading a raw OWL file (F1 = 0.323) performs worse than the same LLM with no file at all (F1 = 0.431), while structured MCP tool access achieves F1 = 0.717. This demonstrates that tool structure provides a qualitatively different mode of access that the LLM cannot replicate by reading raw syntax. The system ships as a single binary under the MIT licence.
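
Stable 1-to-1 matching, which the abstract above identifies as the dominant factor, can be sketched with the classic Gale-Shapley deferred-acceptance procedure over a similarity matrix. This is an assumption about the general technique, not a description of the system's Rust implementation; the similarity scores are invented. The `f1` helper simply recomputes F1 from the precision and recall reported in the abstract.

```python
def stable_matching(sim):
    """Gale-Shapley matching: sources propose to targets in order of similarity.

    sim[i][j] is the similarity between source entity i and target entity j.
    Returns a sorted list of (source, target) pairs.
    """
    n_src, n_tgt = len(sim), len(sim[0])
    prefs = [sorted(range(n_tgt), key=lambda j: -sim[i][j]) for i in range(n_src)]
    next_choice = [0] * n_src
    holder = {}                       # target index -> source currently held
    free = list(range(n_src))
    while free:
        i = free.pop()
        if next_choice[i] >= n_tgt:
            continue                  # source exhausted its list; stays unmatched
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        current = holder.get(j)
        if current is None:
            holder[j] = i
        elif sim[i][j] > sim[current][j]:
            holder[j] = i             # target prefers the new proposer
            free.append(current)
        else:
            free.append(i)
    return sorted((i, j) for j, i in holder.items())

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Hypothetical similarities: both sources like target 0 best.
sim = [[0.90, 0.80],
       [0.85, 0.10]]
print(stable_matching(sim))
print(round(f1(0.963, 0.733), 3))
```

A per-row argmax would map both sources to target 0; the stable matching instead resolves the conflict in favour of the stronger pair and sends the other source to its next choice, which is one way a 1-to-1 constraint can raise precision.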

2605.09181 2026-05-12 cs.CV cs.ET eess.IV

Establishing Robust Retinal Eye Tracking: A Weakly Supervised Algorithmic Framework

Bo Wen, Dillon Lohr, Yatong An, Pushkar Anand, Alexander Fix, Ruobing Qian, Catherine A. Fromm, Yimin Ding, Truong Nguyen, Mohamed El-Haddad, Francesco La Rocca

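AI summary This paper proposes a new weakly supervised learning framework for robust retinal eye tracking. The method addresses the limited robustness of classical template matching to retinal feature variability and real-world imaging conditions. Initial studies show high accuracy, with a 95th-percentile gaze error below 0.45 degrees across 6 participants. The result offers a new technical path for eye tracking in ophthalmic imaging and vision science.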

Comments 2026 IEEE International Conference on Image Processing (Accepted for Publication)

English abstract

Retinal image-based eye tracking is widely used in ophthalmic imaging and vision science, and is a promising path to deliver higher gaze accuracy than the pupil- and cornea-based approaches commonly used in modern AR/VR devices. Nevertheless, existing retinal tracking algorithms still primarily rely on classical template-matching registration, which can be insufficiently robust to retinal feature variability and real-world imaging conditions. In this work, we propose a novel weakly-supervised, learning-based framework for robust retinal eye tracking. Initial studies demonstrate high accuracy, achieving the 95th-percentile gaze error < 0.45 deg across a cohort of 6 participants.

2605.09176 2026-05-12 cs.LG cs.AI

Navigating LLM Valley: From AdamW to Memory-Efficient and Matrix-Based Optimizers

Aditya Ranganath

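AI summary This survey reviews progress in optimizer design for large language models, tracing the evolution from classical first-order optimizers to efficient matrix-based methods. It analyzes a range of techniques, including adaptive moment estimation, memory-efficient variants, curvature-aware methods, and low-rank projection, and discusses benchmarking methodology for evaluating these optimizers. The paper argues that LLM optimizer research is moving from single-algorithm speedup claims toward more comprehensive comparisons that account for scale and implementation complexity.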

Comments No figures, 65 pages

English abstract

Training large language models requires optimization algorithms that are not only statistically effective, but also computationally and memory efficient at extreme scale. Although Adam remains the dominant optimizer for large-scale language-model pretraining and fine-tuning, recent work has revisited nearly every component of the optimization stack: adaptive moment estimation, decoupled weight decay, memory footprint, curvature approximation, sign-based updates, large-batch stability, low-rank gradient structure, and matrix-wise orthogonalized updates. This survey reviews optimizer design for large language models through a systems-and-optimization lens. We organize the literature into classical first-order optimizers, adaptive optimizers, memory-efficient variants, second-order and curvature-aware methods, sign-based and discovered optimizers, low-rank and projection-based methods, and matrix-based optimizers such as Muon. We also discuss benchmarking methodology, including hyperparameter fairness, scale dependence, wall-clock efficiency, token efficiency, memory overhead, and downstream evaluation. We argue that optimizer research for LLMs is entering a new phase: moving from single-algorithm speedup claims toward rigorous, scale-aware comparisons that jointly evaluate convergence, stability, memory, and implementation complexity.
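
For readers unfamiliar with the baseline named in the survey title, the sketch below shows one AdamW update for a single scalar parameter: adaptive moment estimation with bias correction plus decoupled weight decay. The default hyperparameter values are common choices, not recommendations from the survey, and the quadratic objective is a toy example.

```python
import math

def adamw_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * weight_decay * param      # decoupled weight decay
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimise f(x) = x^2 (gradient 2x) from x = 1.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adamw_step(x, 2.0 * x, m, v, t, lr=0.01, weight_decay=0.0)
print(x)
```

The two state variables `m` and `v` are stored per parameter, which is the memory overhead that the memory-efficient variants discussed in the survey try to reduce.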

2605.09173 2026-05-12 cs.LG cs.AI

WavesFM: Hierarchical Representation Learning for Longitudinal Wearable Sensor Waveforms

Peng Cao, Zhijian Yang, Tennison Liu, Jonathan Wang, Jiang Wu, Magdalena Proszewska, Arvind Pillai, Mingwu Gao, Amir Farjadian, Lawrence Cai, Emily Blanchard, Daniel McDuff, Pramod Rudrapatna, Matthew Thompson, Anupam Pathak, Mark Malhotra, Shwetak Patel, Dina Katabi, Paolo Di Achille, Ming-Zher Poh

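AI summary WavesFM is a hierarchical representation learning method for longitudinal wearable sensor waveforms, designed to address the challenges that high sampling frequencies, multimodal dependencies, and long sequence lengths pose for inferring health phenotypes. It uses a two-stage self-supervised learning framework: a segment-level encoder first extracts embeddings from short waveforms, and a temporal encoder then models their dynamics over a multi-day horizon, capturing both local signal features and complex patterns such as circadian rhythms. Pretrained on a large volume of real-world data, WavesFM shows superior performance across tasks spanning demographics, lifestyle, health conditions, and medications.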

English abstract

Wearable sensors enable the continuous acquisition of high-resolution physiological waveforms, such as photoplethysmography and accelerometry, under free-living conditions. However, inferring health-related phenotypes from these signals presents significant challenges due to high sampling frequencies, multimodal dependencies, and extreme sequence lengths (e.g., weeks of recordings), compounded by a scarcity of ground-truth labels. To address these challenges, existing self-supervised learning (SSL) methodologies typically follow two paradigms: (1) learning rich morphological representations from short waveform segments while collapsing longitudinal dynamics through simple aggregation, or (2) modeling behavioral patterns from coarse, hand-crafted features (e.g. heart rate, step counts) spanning longer horizons but foregoing subtle, predictive signatures in raw waveforms. To bridge this gap, we propose WavesFM, a foundation model utilizing a two-stage SSL framework for longitudinal physiological data. Specifically, we decompose the learning problem into two stages: first, a segment-level encoder is pretrained to extract local embeddings from short waveforms; subsequently, a temporal encoder is trained to model the sequence of these embeddings across a multi-day horizon. This hierarchical approach overcomes the computational complexity of high-resolution, long-sequence data, allowing the overall model to capture both local signal semantics and the complex circadian and inter-day variations governing physiological dynamics. Pretrained on over 6.8M hours (N=324k individuals) of recordings for the first stage and 5.3M hours (N=10k) for the second stage, WavesFM demonstrates superior performance across 58 diverse tasks spanning demographics, lifestyle, health conditions, and medications.

2605.09168 2026-05-12 cs.AI cs.LG

CIVeX: Causal Intervention Verification for Language Agents

Fabio Rovai

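AI summary This paper proposes CIVeX, a causal intervention verifier that checks whether a language agent's tool call has an identifiable causal effect, in order to ensure that its actions are safe and effective. CIVeX maps a proposed action to a structural causal query, checks identifiability of the causal effect, and returns one of four auditable verdicts: EXECUTE, REJECT, EXPERIMENT, or ABSTAIN. Experiments show that CIVeX performs well on several benchmarks; under adversarial confounding in particular, its accuracy and utility exceed those of existing methods, underscoring the key role of causal identifiability in tool use.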

Comments 16 pages, 3 figures. Includes Causal-ToolBench, IHDP, ZOZO Open Bandit, and LaLonde NSW evaluations

English abstract

A valid tool call is not necessarily a valid intervention. Tool-using language agents are guarded by schema validators, policy filters, provenance checks, state predictors, and self-verification, yet such safeguards do not certify that a state-changing action has an identifiable causal effect. In confounded workflows, the action that looks optimal in observational logs can reduce utility when executed. We introduce CIVeX, a causal intervention verifier that maps proposed actions to structural causal queries over a committed action-state graph, checks identifiability, and returns one of four auditable verdicts: EXECUTE, REJECT, EXPERIMENT, or ABSTAIN. Execution requires an assumption-scoped causal certificate carrying graph commitments, an identification argument, a one-sided lower confidence bound (LCB), provenance, and risk limits. On Causal-ToolBench (1,890 instances, 7 seeds), CIVeX yields zero observed false executions across moderate and adversarial confounding. Under adversarial confounding it reaches 84.9% accuracy and 81.1% of oracle utility (+2.23 vs +2.76) and is the only non-oracle method whose constrained utility under a zero-false-execution constraint exceeds the AlwaysAbstain floor. On IHDP and ZOZO Open Bandit (real production logs with uniform-random ground truth), CIVeX matches Oracle correct-execution within 0.1pp and cuts per-execute false-execution by >=50x over naive baselines. A chain-of-thought LLM verifier (Claude Opus, Sonnet) cuts false-execution by an order of magnitude over a terse baseline, yet under adversarial confounding Opus's utility falls to 74% of CIVeX's. Intervention identifiability, not action validity, is the missing primitive for reliable tool use.

2605.09167 2026-05-12 cs.CL cs.AI cs.LG

WorldSpeech: A Multilingual Speech Corpus from Around the World

Antonis Asonitis, Luca A. Lanzendörfer, Frédéric Berdoz, Roger Wattenhofer

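AI summary This study presents WorldSpeech, a multilingual speech corpus containing 65,000 hours of aligned audio-text data across 76 languages, drawn from sources including parliamentary proceedings, international broadcasts, and public-domain audiobooks. The corpus provides more than 200 hours of aligned speech for 37 languages, of which 28 exceed 500 hours and 24 exceed 1,000 hours. Fine-tuning existing speech recognition models on 11 languages reduces word error rate by an average of 63.5% in relative terms, a substantial improvement for low-resource languages.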

Details
English abstract

Automatic speech recognition (ASR) performs well for high-resource languages with abundant paired audio-transcript data, but its accuracy degrades sharply for most languages due to limited publicly available aligned data. To this end, we introduce WorldSpeech, a 24 kHz multilingual speech corpus comprising 65k hours of aligned audio-transcript data across 76 languages, collected from diverse public sources including parliamentary proceedings, international broadcasts, and public-domain audiobooks. For 37 languages, WorldSpeech provides more than 200 hours of aligned speech, with 28 exceeding 500 hours and 24 surpassing 1k hours. Fine-tuning existing ASR models on WorldSpeech results in an average relative Word-Error-Rate reduction of 63.5% across 11 typologically diverse languages.
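
As a worked illustration of the "average relative Word-Error-Rate reduction" metric mentioned above, here is a minimal sketch. It assumes the average is taken over per-language relative reductions, which the abstract does not spell out; the function names and the WER pairs are hypothetical and are not figures from the paper.

```python
from typing import Iterable, Tuple


def relative_wer_reduction(baseline_wer: float, finetuned_wer: float) -> float:
    """Fraction of the baseline WER removed by fine-tuning."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - finetuned_wer) / baseline_wer


def average_relative_reduction(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean of per-language relative reductions (assumed aggregation)."""
    reductions = [relative_wer_reduction(b, f) for b, f in pairs]
    if not reductions:
        raise ValueError("need at least one (baseline, fine-tuned) pair")
    return sum(reductions) / len(reductions)


# Hypothetical (baseline, fine-tuned) WER pairs for two languages.
example_pairs = [(0.40, 0.10), (0.50, 0.25)]
average = average_relative_reduction(example_pairs)
```

A relative reduction is measured against each language's own baseline, so a drop from 0.40 to 0.10 counts as 75% even though the absolute change is only 30 points.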

2605.09165 2026-05-12 cs.LG cs.CL

Sparse Layers are Critical to Scaling Looped Language Models

Ryan Lee, Jacob Biloki, Edward J. Hu, Jonathan May

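AI summary This paper studies a key question in scaling looped language models and finds that sparse layers such as Mixture-of-Experts (MoE) are critical to model performance. Comparing standard and MoE architectures with and without looping, the authors find that Looped-MoE models scale better than the standard baseline, because different experts are activated on each loop, which increases expressivity. Looped models also offer better compute-quality trade-offs, especially at early-exit points, pointing to a new direction for efficient inference in large models.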

Details
English abstract

Looped language models repeat a set of transformer layers through depth, reducing memory costs and providing natural early-exit points at loop boundaries. However, looped models do not scale as favorably as standard transformers with unique layers. We compare standard and Mixture-of-Experts (MoE) transformers, with and without looping, and find two main results. First, we find Looped-MoE models scale better than the standard baseline while dense looped models do not. We trace this to routing divergence between loops: in Looped-MoE models, different experts are activated on each pass through the same shared layers, recovering expressivity without additional parameters. Our second finding is that looped models have better compute-quality trade-offs with early exits than standard models. Because each loop ends with the same layers that produce the final output, loop boundaries are superior exit points, as confirmed by earlier output convergence at these points. In sum, we provide a clear direction for scaling looped models: a Looped-MoE model with early exits can not only beat standard transformers at scale, but also enable significant memory and inference savings with minimal degradation in quality.
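
The looping and early-exit idea described above can be sketched in a few lines. This is a toy illustration only: the "layers" are plain functions on lists of floats rather than transformer blocks, and `looped_forward`, `exit_after`, `scale`, and `shift` are hypothetical names, not the authors' code.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def looped_forward(
    x: Vector,
    shared_layers: Sequence[Layer],
    num_loops: int,
    exit_after: Optional[int] = None,
) -> Vector:
    """Apply the same layer stack repeatedly; optionally stop at a loop boundary."""
    loops_to_run = num_loops if exit_after is None else min(exit_after, num_loops)
    for _ in range(loops_to_run):
        # Every pass reuses the same layers, so depth grows without new parameters.
        for layer in shared_layers:
            x = layer(x)
    return x


def scale(v: Vector) -> Vector:
    return [0.5 * a for a in v]


def shift(v: Vector) -> Vector:
    return [a + 1.0 for a in v]


full = looped_forward([4.0], [scale, shift], num_loops=3)
early = looped_forward([4.0], [scale, shift], num_loops=3, exit_after=1)
```

Because every loop ends with the same final layer, an output taken at a loop boundary has passed through the same closing computation as the full-depth output, which is the property the abstract credits for good early exits.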

2605.09160 2026-05-12 cs.LG

Objective-Specific Privileged Bases via Full-Prefix Matryoshka Learning

Arghamitra Talukder, Philippe Chlenski, Itsik Pe'er

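AI summary This study examines how full-prefix Matryoshka Representation Learning (MRL) learns a privileged basis aligned with the task objective, addressing the non-identifiability of individual dimensions in conventional representation learning. The authors prove that, in the linear setting, full-prefix MRL recovers the ordered principal directions and can be computed efficiently from shared statistics. Experiments show that MRL yields per-dimension structure consistent with the task signal, where the magnitude of each coordinate reflects how informative it is.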

Details
English abstract

Learned representations are often invariant to rotational transformations, leaving individual dimensions non-identifiable and interchangeable. We study how Matryoshka Representation Learning (MRL) induces a task-aligned privileged basis distinct from variance-based or regularizer-induced orderings. In the linear setting, we prove that full-prefix MRL recovers the ordered principal directions, and can be computed efficiently using shared statistics. Empirically, we demonstrate that MRL yields consistent per-dimension structure aligned with task signal, where coordinate magnitude reflects informativeness.
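
To give some intuition for why summing a loss over every prefix length breaks rotational symmetry, here is a minimal sketch. It assumes a squared reconstruction error and orthonormal directions, which is one simple linear instance and not necessarily the objective used in the paper; all function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def prefix_reconstruction_losses(x: Vector, directions: Sequence[Vector]) -> List[float]:
    """Squared error when x is rebuilt from the first k coordinates, for k = 1..d.

    Assumes `directions` are orthonormal, so coordinate i is simply dot(x, w_i).
    """
    recon = [0.0] * len(x)
    losses = []
    for w in directions:
        coord = dot(x, w)
        recon = [r + coord * wi for r, wi in zip(recon, w)]
        losses.append(sum((xi - ri) ** 2 for xi, ri in zip(x, recon)))
    return losses


def full_prefix_loss(x: Vector, directions: Sequence[Vector]) -> float:
    """Sum of the prefix losses over every prefix length."""
    return sum(prefix_reconstruction_losses(x, directions))


x = [3.0, 1.0]
ordered = full_prefix_loss(x, [[1.0, 0.0], [0.0, 1.0]])  # large component first
swapped = full_prefix_loss(x, [[0.0, 1.0], [1.0, 0.0]])  # large component last
```

Both orderings reconstruct `x` perfectly at full length, but the summed prefix loss is lower when the more informative direction comes first, so the objective favors a particular ordering of the basis.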

2605.09159 2026-05-12 cs.AI

Do LLMs Experience an Internal Polylogue? Investigating Reasoning through the Lens of Personas

Nils A. Herrmann, Leander Girrbach, Kirill Bykov, Zeynep Akata

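AI summary This paper asks whether large language models (LLMs) undergo an "internal polylogue" during reasoning, that is, how different behavioural traits (called "persona vectors") vary dynamically over the course of generation. The study treats persona vectors as dynamic signals and analyzes the time series of their alignment with hidden activations (the "polylogue") to reveal patterns of behavioural change during reasoning. Experiments show that polylogue features predict model correctness on MMLU-Pro and suggest interpretable intervention directions for steering behaviour during reasoning, demonstrating their potential for reasoning-time monitoring and control.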

Details
English abstract

Recent work shows that large language models (LLMs) encode behavioural traits ("personas") as linear directions in activation space, often called "persona vectors". Prior work has used such directions as static handles for behavioural steering. Building on this, we treat them as dynamic signals instead: probes we can monitor and intervene on as reasoning unfolds. We use the term polylogue to denote the time series of alignments between persona vectors and hidden activations over the course of generation. Experiments across four open-weight models show that polylogue features predict correctness on MMLU-Pro competitively with low-dimensional activation baselines, while remaining interpretable through their associated persona directions. They also suggest concrete steering targets, namely which latent directions to modulate at different stages of a response. We instantiate this as a simple paragraph-conditioned intervention that improves accuracy on three of four models, pointing to stage-aware latent steering as a promising direction for reasoning-time control. Together, this positions the polylogue as an interpretable tool for reasoning-time monitoring and intervention.
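
The polylogue defined above, a time series of alignments between persona vectors and hidden activations, can be sketched as follows. This is a hedged illustration: cosine similarity is assumed as the alignment measure, which the abstract does not specify, and the persona names, vectors, and hidden states are toy values.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def polylogue(
    hidden_states: Sequence[Vector],
    persona_vectors: Dict[str, Vector],
) -> Dict[str, List[float]]:
    """One alignment value per persona per generation step."""
    return {
        name: [cosine(h, vec) for h in hidden_states]
        for name, vec in persona_vectors.items()
    }


# Toy hidden states for three generation steps and two hypothetical personas.
hidden_states = [[1.0, 0.0], [0.0, 2.0], [-3.0, 0.0]]
persona_vectors = {"persona_a": [1.0, 0.0], "persona_b": [0.0, 1.0]}
series = polylogue(hidden_states, persona_vectors)
```

Each persona yields one curve over generation steps; features summarizing such curves are the kind of input a correctness predictor could use, and a curve's sign and timing indicate which direction might be steered at which stage.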

2605.09157 2026-05-12 cs.LG cs.AI

Revisiting Mixture Policies in Entropy-Regularized Actor-Critic

Jiamin He, Samuel Neumann, Jincheng Mei, Adam White, Martha White

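AI summary This paper studies the practicality of mixture policies in the entropy-regularized policy gradient framework, noting that although mixture policies are theoretically more flexible than unimodal policies, their advantages have not been fully realized in practice. The authors propose a marginalized reparameterization (MRP) estimator that addresses the high variance of gradient estimates for mixture policies. Experiments show that MRP-based mixture policies outperform the standard likelihood-ratio approach on several standard tasks and can match Gaussian policies, supporting their potential as a practical tool.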

Details
English abstract

Mixture policies theoretically offer greater flexibility than unimodal policies in continuous action reinforcement learning, but the practical benefits of this complexity remain elusive. Mixture policies are notably absent from most state-of-the-art algorithms, raising a fundamental question: Is the added representational overhead useful? We show that increased flexibility can theoretically enhance solution quality and entropy robustness. Yet standard algorithms like SAC do not leverage these advantages. A core issue is the lack of a low-variance reparameterization trick for mixtures, a luxury Gaussian policies enjoy. We propose a marginalized reparameterization (MRP) estimator to address this, proving it offers lower variance than the standard likelihood-ratio (LR) approach. Our experiments across Gym MuJoCo, DeepMind Control Suite, and MetaWorld show that MRP mixture policies significantly outperform their LR counterparts and reach parity with, and sometimes exceed, their Gaussian counterparts. In addition, we find several cases where MRP mixture policies exhibit clear empirical advantages. In this paper, we provide a clearer understanding of the trade-offs involved, elevating MRP mixture policies from theoretical curiosity to a practical tool.

2605.09154 2026-05-12 cs.LG

Predicting Large Model Test Losses with a Noisy Quadratic System

Chuning Li, Chris J. Maddison

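AI summary This paper proposes a predictive model that estimates the pre-training loss of large models from model size (N), batch size (B), and number of weight updates (K). It is the first loss prediction model that can handle changing batch size, and it outperforms the Chinchilla loss model, which is based on batch size and number of tokens, when extrapolating to larger compute budgets (up to 1000-fold). The model can be used to find optimal N, B, K configurations under resource constraints such as time, memory, and compute; experiments show that its selected configurations are close to the true optimum.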

Comments ICML 2026

Details
English abstract

We introduce a predictive model that estimates the pre-training loss of large models from model size (N), batch size (B) and number of weight updates (K). This is the first loss prediction model that can handle changing batch size. The model outperforms Chinchilla's loss model, a model of the test loss using the batch size and number of tokens, in terms of projecting the loss at extrapolated compute budgets (up to 1000-fold). A natural use of the model is to find optimal N, B, K configurations under explicit and compound resource constraints like time, memory and compute. In our experiments, the model-selected configurations are close to ground-truth optimal. Our work advocates for loss prediction as a better alternative to heuristic-based laws, which are growing in complexity. The implementation is available at https://github.com/chuningxdy/Noisy-Quadratic-System.

2605.09153 2026-05-12 cs.RO cs.AI

Beyond Self-Play: Hierarchical Reasoning for Continuous Motion in Closed-Loop Traffic Simulation

Weifan Zhang, Xiaofeng Zhao, Adel Bazzi, Mingrui Li, Yifan Wei, Dengfeng Sun

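AI summary This study addresses the behavioural realism and scalability of agents in closed-loop traffic simulation and proposes a hierarchical reasoning framework that goes beyond self-play training. The method combines high-level multi-agent interaction reasoning with low-level continuous trajectory generation: Stackelberg-style multi-agent reinforcement learning produces intention commands, which are then translated into physically plausible, scene-responsive control sequences. Experiments show that the framework outperforms self-play and passive imitation methods in control smoothness and safety while maintaining good traffic efficiency.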

Comments Submitted to IEEE Robotics and Automation Letters (RA-L)

Details
English abstract

Closed-loop traffic simulation requires agents that are both scalable and behaviorally realistic. Recent self-play reinforcement learning approaches demonstrate strong scalability, but their equilibrium strategies fail to capture the socially aware behaviors of real human drivers. We propose a hierarchical architecture that goes beyond self-play by combining high-level multi-agent interaction reasoning with low-level continuous trajectory realization. Specifically, a Stackelberg-style Multi-Agent Reinforcement Learning (MARL) module generates interaction-aware intention commands. These commands condition a low-level continuous motion module, translating the strategic intent into physically consistent, scene-responsive control sequences. To mitigate distribution shift in closed-loop deployment, we introduce a hybrid co-training scheme combining MARL with auxiliary recovery supervision. Experiments on a SUMO-based urban network demonstrate that the proposed framework achieves superior control smoothness and safety compared to self-play and passive imitation baselines, while maintaining competitive traffic efficiency.

2605.09152 2026-05-12 cs.CL q-bio.NC

Meow-Omni 1: A Multimodal Large Language Model for Feline Ethology

Jucheng Hu, Zhangquan Chen, Yulin Chen, Chengjie Hong, Liang Zhou, Tairan Wang, Sifei Li, Giulio Zhu, Feng Zhou, Yiheng Zeng, Suorong Yang, Dongzhan Zhou

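AI summary This paper presents Meow-Omni 1, a multimodal large language model designed for feline ethology research, which aims to resolve semantic aliasing in animal intent recognition. The model is the first to fuse video, audio, physiological time series, and text, and uses cross-modal alignment and specialized scientific encoders to reason more accurately about a cat's internal state. Experiments show that Meow-Omni 1 achieves leading intent-recognition accuracy on the newly built MeowBench benchmark. The authors release the model weights, training framework, and dataset, offering a new paradigm for inter-species intent understanding and practical applications.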

Details
English abstract

Deciphering animal intent is a fundamental challenge in computational ethology, largely because of semantic aliasing, the phenomenon where identical external signals (e.g., a cat's purr) correspond to radically different internal states depending on physiological context. Existing Multimodal Large Language Models (MLLMs) are blind to high-frequency biological time-series data, restricting them to superficial behavioural pattern matching rather than genuine latent-state reasoning. To bridge this gap, we introduce Meow-Omni 1, the first open-source, quad-modal MLLM purpose-built for computational ethology. It natively fuses video, audio, and physiological time-series streams with textual reasoning. Through targeted architectural adaptation, we integrate specialized scientific encoders into a unified backbone and formalize intent inference via physiologically grounded cross-modal alignment. Evaluated on MeowBench, a novel, expert-verified quad-modal benchmark, Meow-Omni 1 achieves state-of-the-art intent-recognition accuracy (71.16%), substantially outperforming leading vision-language and omni-modal baselines. We release the complete open-source pipeline including model weights, training framework, and the Meow-10K dataset, to establish a scalable paradigm for inter-species intent understanding and to advance foundation models toward real-world veterinary diagnostics and wildlife conservation.

2605.09151 2026-05-12 cs.CV

MultiMedVision: Multi-Modal Medical Vision Framework

Frank Li, Bardia Khosravi, Mohammadreza Chavoshi, Young Seok Jeon, Theo Dapamede, Hari Trivedi, Janice Newsome, Judy Gichoya

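AI summary This paper proposes MultiMedVision, a multi-modal medical vision framework that handles 2D (e.g. X-ray) and 3D (e.g. CT) medical imaging data in a unified way. Built on a Sparse Vision Transformer, the framework uses 3D Rotary Positional Embeddings and variable-length sequence packing to process mixed-modality data directly in a shared latent space, without modality-specific adapters and without treating 3D volumes as sequences of 2D slices. Experiments show that MultiMedVision performs well on several medical imaging benchmarks, supporting the feasibility of unified cross-dimensional representation learning.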

Comments 9 pages, 2 figures

Details
English abstract

Multi-modal medical imaging enables comprehensive diagnostics, yet current foundation models process 2D (e.g. X-ray) and 3D (e.g. CT) data with separate, dimensionality-specific architectures. We present MultiMedVision, a unified framework for joint 2D/3D representation learning built on a Sparse Vision Transformer. Our model uses 3D Rotary Positional Embeddings and variable-length sequence packing to process mixed-modality batches natively within a shared latent space, without modality-specific adapters or treating 3D volumes as 2D slice sequences. Trained with a self-supervised objective on chest X-rays (MIMIC-CXR) and CT scans (CT-RATE), and using a single shared encoder with 5x less data, MultiMedVision achieves competitive performance on both 2D benchmarks (Macro AUROC 0.82 on MIMIC, 0.84 on CheXpert) and 3D tasks (0.85 on CT-RATE). Analysis of the learned representations reveals coexisting modality-specific and shared feature subspaces, demonstrating that unified cross-dimensional representation learning is feasible without sacrificing modality-specific performance.

2605.09150 2026-05-12 cs.LG

AlphaExploitem: Going Beyond the Nash Equilibrium in Poker by Learning to Exploit Suboptimal Play

Vlad Murgoci, Matthijs Spaan, Yaniv Oren

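AI summary This paper proposes AlphaExploitem, a method that aims to go beyond the Nash equilibrium in imperfect-information games such as poker by learning to exploit opponents' suboptimal strategies and thereby increase its own payoff. Building on the existing reinforcement learning poker agent AlphaHoldem, it adds a hierarchical transformer encoder to strengthen reasoning over previously played hands and improves training by introducing a diverse pool of exploitable opponents. Experiments show that AlphaExploitem identifies and exploits the weaknesses of both in-distribution and out-of-distribution opponents while still performing well against Nash-equilibrium opponents.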

详情
英文摘要

Poker is an imperfect information game that has served as a long-standing benchmark for decision-making under uncertainty. To maximize utility beyond the Nash equilibrium, an agent can deviate from Nash-equilibrium policies to exploit suboptimal play. We introduce AlphaExploitem, which extends the competitive RL poker agent AlphaHoldem by using a hierarchical transformer encoder that enables reasoning over previously played hands and modifying the training procedure with the inclusion of a diverse pool of exploitable opponents to facilitate learning to exploit. We train and evaluate AlphaExploitem on two standard benchmarks for imperfect-information games. Empirically, AlphaExploitem successfully exploits weak play by both in- and out-of-distribution opponents, without losing performance against NE opponents.

2605.09147 2026-05-12 cs.CL cs.AI stat.AP

From Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languages

Matthias Schöffel, Esteban Garces Arias

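AI summary This paper compares traditional part-of-speech taggers with large language models (LLMs) on POS tagging for Medieval Romance languages (Medieval Occitan, Catalan, and French). LLM-based approaches outperform traditional taggers under zero-shot, few-shot, monolingual fine-tuning, and cross-lingual transfer settings, with fine-tuning and multilingual training performing best. The study also finds that cross-lingual transfer is especially effective for under-resourced languages, while targeted bilingual training can outperform broader multilingual configurations for specific target languages, providing practical guidance for historical natural language processing.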

Comments Accepted at NLP4DH @ ACL 2026

Details
English abstract

Part-of-speech (POS) tagging for Medieval Romance languages remains challenging due to orthographic variation, morphological complexity, and limited annotated resources. This paper presents a systematic empirical evaluation of large language models (LLMs) for POS tagging across three medieval varieties: Medieval Occitan, Medieval Catalan, and Medieval French. We compare traditional rule-based and statistical taggers with modern open-source LLMs under zero-shot prompting, few-shot prompting, monolingual fine-tuning, and cross-lingual transfer learning settings. Experiments on historically grounded datasets show that LLM-based approaches consistently outperform traditional taggers, with fine-tuning and multilingual training yielding the largest improvements. In particular, cross-lingual transfer learning substantially benefits under-resourced varieties, while targeted bilingual training can outperform broader multilingual configurations for specific target languages. The results highlight the importance of linguistic proximity and dataset characteristics when designing transfer strategies for historical NLP. These findings offer empirical insights into the applicability of modern neural methods to medieval text processing and practical guidance for deploying LLM-based POS tagging pipelines in digital humanities research. All code, models, and processed datasets are released for reproducibility.

2605.09146 2026-05-12 cs.CV

Beyond Thinking: Imagining in 360$^\circ$ for Humanoid Visual Search

Jingdong Zhang, Yizhou Wang, Zhengzhong Tu, Xin Li, Wenping Wang, Xiaohang Zhan

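AI summary This paper studies Humanoid Visual Search (HVS), in which an agent actively explores an immersive 360-degree environment to find a target. Existing methods rely on cumbersome multi-turn Chain-of-Thought (CoT) reasoning, which imposes a heavy cognitive burden and high annotation costs. To address this, the authors propose a new framework, "Imagining in 360°", that decouples exploration into two modules, an Imaginator and an Actor. The Imaginator predicts the semantic layout of the environment in a single inference step and supplies the Actor with a diverse distribution of spatial information, enabling efficient search under uncertainty. The method greatly reduces data engineering costs and significantly improves search efficiency and success rates in complex real-world environments.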

Details
English abstract

Humanoid Visual Search (HVS) requires agents to actively explore immersive 360$^\circ$ environments. While prior methods treat this as a monolithic task relying on cumulative, multi-turn Chain-of-Thought (CoT) reasoning, they impose heavy cognitive burdens and require expensive trajectory-level annotations. In this paper, we propose Imagining in 360$^\circ$, a novel framework that decouples the exploration process into a specialized Imaginator and an Actor. The Imaginator functions as a probabilistic predictor of spatial priors; instead of maintaining a cumulative reasoning chain, it infers the semantic layout of both observed and unobserved regions in a single step. By sampling multiple hypotheses within this semantic space, we provide the Actor with a distribution of effective spatial information, offering robust guidance that hedges against uncertainty during active search. This decoupled architecture significantly lowers data engineering costs by eliminating the need for full-trajectory CoT annotations, enabling the generation of over 1.96 million curated training samples. Extensive experiments demonstrate that explicitly modeling semantic spatial priors drastically improves search efficiency and success rates in complex, in-the-wild environments.