arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2605.29355 2026-06-11 cs.LG q-bio.NC

Neural-Behavioral Representation of Natural Whole-body Movement in Monkeys

Jieshi He, Puzhe Li, Yanan Sui, Mu-ming Poo

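Affiliations: Center for Excellence in Brain Science and Intelligence Technology, CAS; Tsinghua University

AI summary: Combines large-scale cortical signals and multi-view motion capture with an autoregressive encoder-decoder model to accurately decode whole-body movement in freely moving monkeys.

Details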
Abstract

Understanding how cortical activity represents natural whole-body behaviors in primates remains challenging. Limited by the diversity of movements and inaccessibility of large-scale neural representation of whole-body kinematics, previous motor decoding studies focused on constrained tasks and limited limb movements. Here, we present a neural-behavioral recording and modeling framework for freely moving monkeys, combining large-scale epidural cortical signals from distributed sensory- and motor-related areas with synchronized multi-view motion capture through a custom-made data collection platform. We reconstructed whole-body monkey kinematics and learned a compact behavior prior using an autoregressive encoder-decoder model. Conditioned on neural signals, the model decoded accurate and realistic whole-body movement without explicit physical constraints. Our results provide a novel proof-of-concept approach for decoding natural whole-body movements in primates using large-scale intracranial neural activity.

2605.29292 2026-06-11 cs.CV

Turbulence-Robust Dynamic Object Segmentation with Multi-Signal Priors and SAM2 Refinement

Bolian Peng, Ying Tang, Xu Liu, Long Sun, Xiaoqiang Lu

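Affiliations: Xidian University

AI summary: Proposes a training-free multi-signal segmentation pipeline that combines RAFT motion estimation, DINOv2 semantic priors, ViBe background modeling, and SAM2 mask refinement to segment dynamic objects under atmospheric turbulence.

Details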
Journal ref
Proceedings of the CVPR 2026 Workshops, UG2+ Challenge, 2026
Abstract

This technical report presents our solution for the CVPR 2026 UG2+ Challenge Track 3: Dynamic Object Segmentation in Turbulence (DOST). We design a training-free multi-signal segmentation pipeline that combines pretrained motion estimation, self-supervised semantic priors, background anomaly modeling, manually calibrated proposal fusion, and SAM2-based mask refinement. The method uses RAFT for dense motion responses, DINOv2 for semantic objectness priors, ViBe for training-free background modeling, and pretrained SAM2 for box-prompt mask refinement. Instead of optimizing an end-to-end segmentation network, our system operates entirely in inference mode. This design is suitable for the DOST setting, where severe atmospheric turbulence produces pseudo-motion, blur, and intermittent target visibility, making a single motion cue unreliable. The final submitted masks are evaluated by the official leaderboard, which reports 0.425041 mIoU and 0.457206 mDice. Since no task-specific model training or fine-tuning is performed, stronger learned temporal association, adaptive proposal selection, or task-specific adaptation may further improve the system.
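
For readers unfamiliar with the two leaderboard metrics, the sketch below computes IoU and Dice for a single pair of binary masks. It is a minimal illustration, not the challenge's official evaluation code: the function name, the flat 0/1 list representation, and the empty-mask convention are assumptions made here, and the leaderboard's averaging (the "m" in mIoU and mDice) is not reproduced.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two equal-length binary masks given as 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks empty: treated as a perfect match (a convention assumed here).
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


# Toy 6-pixel masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
iou, dice = iou_and_dice(pred, target)
print(iou, dice)
```

For a single mask pair the two scores are tied by Dice = 2 IoU / (1 + IoU), so Dice is never lower than IoU; that identity does not carry over exactly to scores averaged over many masks.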

2605.22509 2026-06-11 cs.HC cs.CL

Reflecti-Mate: A Conversational Agent for Adaptive Decision-Making Support Through System 1 and System 2 Thinking

Morita Tarvirdians, Senthil Chandrasegaran, Hayley Hung, Catholijn M. Jonker, Catharine Oertel

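Affiliations: TU Delft; TU Delft/Leiden University

AI summary: Studies a conversational agent that promotes integrated decision-making by adapting to each user's thinking patterns; compared with a baseline agent, it yields more personalized reflective trajectories and more integrative reflective language.

Details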
Journal ref
UMAP 2026: Proceedings of the 34th ACM Conference on User Modeling, Adaptation and Personalization
Comments
Accepted at UMAP 2026
Abstract

Making high-stakes personal decisions involves cognitive, emotional, and intuitive processes, and individuals differ in how they allocate attention across these modes. Integration of these processes has been shown to benefit decision making. Yet, most current decision-support systems focus primarily on supporting cognitive aspects, rather than adapting to the individual's thinking profile to support integration of different types of thoughts. In this study, we investigate an agent designed to encourage integration by adapting to the individual user's thought patterns. We explore its effects on participants' perceptions of the agent and their reflective behavior, in comparison with unaided pre-reflection and a baseline agent. In a between-subjects study (N = 128), our agent, which fostered broad and elaborated thinking, enabled more personalized reflective trajectories, elicited more integrative reflective language, and was perceived as providing stronger support for holistic reflection. In contrast, the baseline agent produced homogenized profiles dominated by cognitive language across participants.

2605.17773 2026-06-11 cs.CV

PlantPose: Universal Plant Skeleton Estimation via Tree-constrained Graph Generation

Xinpeng Liu, Hiroaki Santo, Yosuke Toda, Fumio Okura

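Affiliations: Graduate School of Information Science and Technology, Osaka University; Phytometrics; Institute of Transformative Bio-Molecules, Nagoya University

AI summary: Proposes PlantPose, a universal plant skeleton estimator based on tree-constrained graph generation; it combines learning-based graph generation with traditional graph algorithms to improve generalization and achieves robust, accurate skeleton estimation across multiple domains.

Details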
Comments
International Journal of Computer Vision, 2026
Abstract

Accurate estimation of plant skeletal structures (e.g., branching structures) from images is essential for smart agriculture and plant science. Unlike human skeletons with fixed topology, plant skeleton estimation presents a unique challenge, i.e., estimating arbitrary tree graphs from images. To address this problem, we introduce PlantPose, a universal plant skeleton estimator via tree-constrained graph generation. PlantPose combines learning-based graph generation with traditional graph algorithms to enforce tree constraints during the training loop. To enhance the model's generalization capability, we curate a large and diverse dataset comprising real-world and synthetic plant images, along with simplified representations (e.g., sketches and abstract drawings). This dataset enables the generalized model to adapt to diverse input styles and categories of plant images while preserving topological consistency. Our approach demonstrates robust and accurate plant skeleton estimation across multiple domains, including previously unseen out-of-domain scenarios. Further analyses highlight the method's strengths and limitations in handling complex, heterogeneous data distributions. All implementations and datasets are available at https://github.com/huntorochi/PlantPose/.
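
The abstract does not say which graph algorithm PlantPose uses to enforce the tree constraint. Purely as an illustration of the general idea, the sketch below turns a set of scored candidate edges into a cycle-free skeleton with Kruskal's maximum spanning tree; the function name, the edge scores, and the choice of Kruskal are all assumptions made here, not the authors' method.

```python
def max_spanning_tree(num_nodes, scored_edges):
    """Keep the highest-scoring edges that create no cycle (Kruskal's algorithm).

    scored_edges: list of (score, u, v) with node ids in range(num_nodes).
    If the candidate graph is disconnected, the result is a spanning forest.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for score, u, v in sorted(scored_edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # endpoints are in different components: no cycle
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


# Hypothetical edge confidences for a 4-node skeleton; edge (0, 2) would close a cycle.
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 0, 2), (0.6, 2, 3)]
tree = max_spanning_tree(4, edges)
print(tree)
```

The output keeps three edges for four nodes and drops the lowest-scoring edge of the only cycle, which is the property a tree-constrained skeleton needs.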

2604.01383 2026-06-11 cs.CV cs.AI

GRAZE: Grounded Refinement and Motion-Aware Zero-Shot Event Localization

Syed Ahsan Masud Zaidi, Lior Shamir, William Hsu, Scott Dietrich, Talha Zaidi

Affiliations: Kansas State University; Albright College

Details
Journal ref
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10087-10095, June 2026
Comments
9 pages, 5 figures, accepted to the CVPR 2026 Workshop on Computer Vision in Sports (CVSports). Code: https://github.com/AhsanZaidi12/GRAZE
Abstract

American football practice generates video at scale, yet the interaction of interest occupies only a brief window of each long, untrimmed clip. Reliable biomechanical analysis, therefore, depends on spatiotemporal localization that identifies both the interacting entities and the onset of contact. We study First Point of Contact (FPOC), defined as the first frame in which a player physically touches a tackle dummy, in unconstrained practice footage with camera motion, clutter, multiple similarly equipped athletes, and rapid pose changes around impact. We present GRAZE, a training-free pipeline for FPOC localization that requires no labeled tackle-contact examples. GRAZE uses Grounding DINO to discover candidate player-dummy interactions, refines them with motion-aware temporal reasoning, and uses SAM2 as an explicit pixel-level verifier of contact rather than relying on detection confidence alone. This separation between candidate discovery and contact confirmation makes the approach robust to cluttered scenes and unstable grounding near impact. On 738 tackle-practice videos, GRAZE produces valid outputs for 97.4% of clips and localizes FPOC within $\pm$ 10 frames on 77.5% of all clips and within $\pm$ 20 frames on 82.7% of all clips. These results show that frame-accurate contact onset localization in real-world practice footage is feasible without task-specific training.

2511.08113 2026-06-11 cs.CL

Multimodal LLMs Do Not Compose Skills Optimally Across Modalities

Paula Ontalvilla, Aitor Ormazabal, Gorka Azkune

Affiliations: arXiv

Details
Abstract

Skill composition is the ability to combine previously learned skills to solve new tasks. As neural networks acquire increasingly complex skills during their pretraining, it is not clear how successfully they can compose them. In this paper, we focus on Multimodal Large Language Models (MLLMs), and study their ability to compose skills across modalities. To this end, we design three evaluation tasks that can be solved by sequentially composing two modality-dependent skills, and evaluate several open MLLMs under two main settings: i) prompting the model to directly solve the task, and ii) using a two-step cascaded inference approach, which manually enforces the composition of the two skills for a given task. Even with these straightforward compositions, we find that all evaluated MLLMs exhibit a significant cross-modality skill composition gap. To mitigate this gap, we explore two alternatives: i) chain-of-thought prompting that explicitly instructs MLLMs to compose skills, and ii) a specific fine-tuning recipe that promotes skill composition. Although these strategies improve model performance, the models still exhibit significant skill composition gaps, suggesting that more research is needed to improve cross-modal skill composition in MLLMs.

2506.22141 2026-06-11 cs.CL cs.IR

DAPFAM: A Domain-Aware Family-level Dataset to benchmark cross domain patent retrieval

Iliass Ayaou, Denis Cavallucci, Hicham Chibane

Affiliations: Institut National des Sciences de l'Univers, Strasbourg

Details
Abstract

Patent prior-art retrieval becomes especially challenging when relevant disclosures cross technological boundaries. Existing benchmarks lack explicit domain partitions, making it difficult to assess how retrieval systems cope with such shifts. We introduce DAPFAM, a family-level benchmark with explicit IN-domain and OUT-domain partitions defined by a new IPC3 overlap scheme. The dataset contains 1,247 query families and 45,336 target families aggregated at the family level to reduce international redundancy, with citation-based relevance judgments. We conduct 249 controlled experiments spanning lexical (BM25) and dense (transformer) backends, document- and passage-level retrieval, multiple query and document representations, aggregation strategies, and hybrid fusion via Reciprocal Rank Fusion (RRF). Results reveal a pronounced domain gap: OUT-domain performance remains roughly five times lower than IN-domain across all configurations. Passage-level retrieval consistently outperforms document-level, and dense methods provide modest gains over BM25, but none close the OUT-domain gap. Document-level RRF yields strong effectiveness-efficiency trade-offs with minimal overhead. By exposing the persistent challenge of cross-domain retrieval, DAPFAM provides a reproducible, compute-aware testbed for developing more robust patent IR systems. The dataset is publicly available on Hugging Face at https://huggingface.co/datasets/datalyes/DAPFAM_patent.
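
Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over the ranked lists being fused. The sketch below is a minimal version of that formula; the function name and toy document ids are illustrative, and k = 60 is a commonly used default assumed here, since the abstract does not state the constant used in the DAPFAM experiments.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; each list is ordered best-first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy runs standing in for a lexical (BM25) and a dense retriever.
bm25_run = ["A", "B", "C"]
dense_run = ["B", "D", "A"]
fused = reciprocal_rank_fusion([bm25_run, dense_run])
print(fused)
```

Because only ranks enter the formula, the lexical and dense scores never need to be calibrated against each other, which helps explain why fusion of this kind adds little overhead.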

2602.13513 2026-06-11 math.OC cs.CE cs.LG cs.NA math.DS math.NA

Learning Gradient Flow: Using Equation Discovery to Accelerate Engineering Optimization

Grant Norman, Conor Rowan, Kurt Maute, Alireza Doostan

Affiliations: Smead Aerospace Engineering Sciences

Details
Comments
44 pages, 13 figures. Submitted to CMAME. Changed Topology Optimization example to be 250% acceleration
Abstract

In this work, we investigate the use of data-driven equation discovery for dynamical systems to model and forecast continuous-time dynamics of unconstrained optimization problems. To avoid expensive evaluations of the objective function and its gradient, we leverage trajectory data on the optimization variables to learn the continuous-time dynamics associated with gradient descent, Newton's method, and ADAM optimization. The discovered gradient flows are then solved as a surrogate for the original optimization problem. To this end, we introduce the Learned Gradient Flow (LGF) optimizer, which is equipped to build surrogate models of variable polynomial order in full- or reduced-dimensional spaces at user-defined intervals in the optimization process. We demonstrate the efficacy of this approach on several standard problems from engineering mechanics and scientific machine learning, including two inverse problems, structural topology optimization, and two forward solves with different discretizations. Our results suggest that the learned gradient flows can significantly expedite convergence by capturing critical features of the optimization trajectory while avoiding expensive evaluations of the objective and its gradient.
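
The continuous-time view behind this abstract is that gradient descent is a forward-Euler discretization of the gradient flow dx/dt = -∇f(x). The sketch below checks that on a one-dimensional quadratic where the flow has a closed-form solution. It is not the LGF optimizer, which learns the dynamics from trajectory data; the toy objective, step sizes, and names are assumptions chosen for illustration.

```python
import math


def gradient_descent(grad, x0, step, num_steps):
    """Plain gradient descent, i.e. forward Euler applied to dx/dt = -grad(x)."""
    x = x0
    for _ in range(num_steps):
        x = x - step * grad(x)
    return x


a = 2.0  # curvature of the toy objective f(x) = 0.5 * a * x**2


def grad(x):
    """Gradient of the toy objective."""
    return a * x


x0, total_time = 1.0, 1.0
exact = x0 * math.exp(-a * total_time)  # closed-form gradient-flow solution at t = 1
coarse = gradient_descent(grad, x0, step=0.1, num_steps=10)      # 10 steps reach t = 1
fine = gradient_descent(grad, x0, step=0.001, num_steps=1000)    # 1000 steps reach t = 1
print(exact, coarse, fine)
```

Shrinking the step moves the iterates toward the continuous trajectory, which is the sense in which an ODE model of the optimizer's path can stand in for the iteration itself.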

2510.09885 2026-06-11 cs.CL cs.AI

Diffusion-Inspired Masked Fine-Tuning for Knowledge Injection in Autoregressive LLMs

Xu Pan, Ely Hahami, Jingxuan Fan, Ziqian Xie, Haim Sompolinsky

Affiliations: Harvard University; University of Texas Health Science Center at Houston; Hebrew University

Details
Abstract

Large language models (LLMs) are often used in environments where facts evolve, yet factual knowledge updates via fine-tuning on unstructured text often suffer from 1) reliance on compute-heavy paraphrasing augmentation and 2) the reversal curse. Recent studies show diffusion large language models (dLLMs) require fewer training samples to achieve lower loss in pre-training and are more resistant to the reversal curse, suggesting dLLMs may learn new knowledge more easily than autoregressive LLMs (arLLMs). We test this hypothesis in controlled knowledge fine-tuning experiments and find that while arLLMs rely on paraphrase augmentation to generalize knowledge text into question-answering (QA) capability, dLLMs do not require paraphrases to achieve high QA accuracy. To further investigate whether the demasking objective alone can induce such a knowledge injection advantage in dLLMs regardless of their diffusion denoising paradigm, we propose masked fine-tuning for arLLMs, which prompts an arLLM to reconstruct the original text given a masked version in context. The masked fine-tuning for arLLMs substantially improves the efficacy of knowledge injection, i.e. no paraphrase needed and resistant to the reversal curse, closing the gap between arLLMs and dLLMs. We also demonstrate broader applicability: on a large-scale knowledge-intensive dataset (1.2M samples), masked SFT achieves the best downstream accuracy on GPQA-diamond among all fine-tuning variants. The demasking objective also improves SFT on math tasks, suggesting broad utility beyond factual knowledge injection.

2602.03147 2026-06-11 cs.RO

Multi-function Robotized Surgical Dissector for Endoscopic Pulmonary Thromboendarterectomy: Preclinical Study and Evaluation

Runfeng Zhu, Xin Zhong, Qingxiang Zhao, Jing Lin, Zhong Wu, Kang Li

Affiliations: West China Hospital of Medicine, Sichuan University; School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China

Details
Abstract

Patients suffering from chronic severe pulmonary thromboembolism need pulmonary thromboendarterectomy (PTE) to remove the thrombus and intima located inside the pulmonary artery (PA). During the surgery, a surgeon holds tweezers and a dissector to delicately strip the blockage, but the available tools are rigid and straight, lacking the distal dexterity to reach thin branches of the PA. Therefore, this work presents a novel robotized dissector based on a concentric push/pull robot (CPPR) structure that can enter deep, thin branches of the tortuous PA. Compared with conventional rigid dissectors, our design is characterized by slenderness and dual-segment bending dexterity. Because the CPPR-based dissector is hollow and thin-walled, with a slender body 3.5 mm in diameter, its central lumen accommodates two channels, for irrigation and the tip tool, plus space for the endoscopic camera's signal wire. To provide accurate surgical manipulation, an optimization-based kinematics model was established, achieving 2 mm accuracy in positioning the tip tool (60 mm length) under an open-loop control strategy. As such, with the endoscopic camera, traditional PTE can be upgraded to endoscopic PTE. The basic physical performance of the robotized dissector, including stiffness, motion accuracy, and maneuverability, was evaluated through experiments. Surgery simulation on an ex vivo porcine lung also demonstrates its dexterity and notable advantages in PTE.

2601.21824 2026-06-11 cs.LG cs.DC

DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training

Xinwei Qiang, Hongmin Chen, Shixuan Sun, Jingwen Leng, Xin Liu, Minyi Guo

Affiliations: School of Computer Science, Shanghai Jiao Tong University; ByteDance Seed; Zhiyuan College, Shanghai Jiao Tong University

Details
Journal ref
Proceedings of the International Conference on Learning Representations (ICLR), 2026
Abstract

Determinism is indispensable for reproducibility in large language model (LLM) training, yet it often exacts a steep performance cost. In widely used attention implementations such as FlashAttention-3, the deterministic backward pass can incur up to a 37.9% throughput reduction relative to its non-deterministic counterpart, primarily because gradient accumulation operations must be serialized to guarantee numerical consistency. This performance loss stems from suboptimal scheduling of compute and gradient-reduction phases, leading to significant hardware underutilization. To address this challenge, we formulate the backward pass of deterministic attention as a scheduling problem on a Directed Acyclic Graph (DAG) and derive schedules that minimize the critical path length. Building on this formulation, we present DASH (Deterministic Attention Scheduling for High-Throughput), which encapsulates two complementary scheduling strategies: (i) Descending Q-Tile Iteration, a reversed query-block traversal that shrinks pipeline stalls in causal attention, and (ii) Shift Scheduling, a theoretically optimal schedule within our DAG model that reduces pipeline stalls for both full and causal masks. Our empirical evaluations on NVIDIA H800 GPUs demonstrate that DASH narrows the performance gap of deterministic attention. The proposed strategies improve the throughput of the attention backward pass by up to 1.28$\times$ compared to the baseline, significantly advancing the efficiency of reproducible LLM training. Our code is open-sourced at https://github.com/SJTU-Liquid/deterministic-FA3.
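
The reason accumulation order matters for reproducibility is that floating-point addition is not associative. The sketch below shows this with ordinary Python floats; it only illustrates the numerical effect and does not model GPU kernels or the DASH schedules, and the helper name is an assumption.

```python
# The same three numbers, summed with two different groupings.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # the two groupings disagree in the last bit


def ordered_sum(values):
    """Accumulate in one fixed order so repeated runs give bit-identical results."""
    total = 0.0
    for v in values:
        total += v
    return total


partials = [0.1, 0.2, 0.3]
same_order_twice = ordered_sum(partials) == ordered_sum(partials)
reversed_order = ordered_sum(partials) == ordered_sum(partials[::-1])
print(same_order_twice, reversed_order)
```

If partial gradients arrive in a run-dependent order, the accumulated result can differ bit for bit between runs, which is why deterministic kernels fix the reduction order and pay for it in scheduling freedom.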

2409.00743 2026-06-11 cs.LG cs.AI

Interpretable Clustering: A Survey

Lianyu Hu, Mudi Jiang, Junjie Dong, Xinying Liu, Zengyou He

Affiliations: College of Information Science and Engineering, Henan University of Technology; School of Software, Dalian University of Technology; Xinchang Power Supply Company, State Grid Corporation of China

Details
Journal ref
ACM Computing Surveys, Volume 58, Issue 8, Article 215 (2026)
Comments
14 pages, 2 figures, 3 tables
Abstract

In recent years, much of the research on clustering algorithms has primarily focused on enhancing their accuracy and efficiency, frequently at the expense of interpretability. However, as these methods are increasingly being applied in high-stakes domains such as healthcare, finance, and autonomous systems, the need for transparent and interpretable clustering outcomes has become a critical concern. This is not only necessary for gaining user trust but also for satisfying the growing ethical and regulatory demands in these fields. Ensuring that decisions derived from clustering algorithms can be clearly understood and justified is now a fundamental requirement. To address this need, this paper provides a comprehensive and structured review of the current state of explainable clustering algorithms, identifying key criteria to distinguish between various methods. These insights can effectively assist researchers in making informed decisions about the most suitable explainable clustering methods for specific application contexts, while also promoting the development and adoption of clustering algorithms that are both efficient and transparent. For convenient access and reference, an open repository organizes representative and emerging interpretable clustering methods under the taxonomy proposed in this survey, available at https://github.com/hulianyu/Awesome-Interpretable-Clustering

2601.09072 2026-06-11 cs.AI cs.CL stat.ME

Human-AI Co-design for Clinical Prediction Models

Jean Feng, Avni Kothari, Patrick Vossler, Andrew Bishara, Lucas Zier, Newton Addo, Aaron Kornblith, Yan Shuo Tan, Chandan Singh

Affiliations: University of California, San Francisco; National University of Singapore; Microsoft Research

Details
Journal ref
npj Digital Medicine 2026
Abstract

Developing safe, effective, and practically useful clinical prediction models (CPMs) traditionally requires iterative collaboration between clinical experts, data scientists, and informaticists. This process refines the often small but critical details of the model building process, such as which features/patients to include and how clinical categories should be defined. However, this traditional collaboration process is extremely time- and resource-intensive, resulting in only a small fraction of CPMs reaching clinical practice. This challenge intensifies when teams attempt to incorporate unstructured clinical notes, which can contain an enormous number of concepts. To address this challenge, we introduce HACHI, an iterative human-in-the-loop framework that uses AI agents to accelerate the development of fully interpretable CPMs by enabling the exploration of concepts in clinical notes. HACHI alternates between (i) an AI agent rapidly exploring and evaluating candidate concepts in clinical notes and (ii) clinical and domain experts providing feedback to improve the CPM learning process. HACHI defines concepts as simple yes-no questions that are used in linear models, allowing the clinical AI team to transparently review, refine, and validate the CPM learned in each round. In two real-world prediction tasks (acute kidney injury and traumatic brain injury), HACHI outperforms existing approaches, surfaces new clinically relevant concepts not included in commonly-used CPMs, and improves model generalizability across clinical sites and time periods. Furthermore, HACHI reveals the critical role of the clinical AI team, such as directing the AI agent to explore concepts that it had not previously considered, adjusting the granularity of concepts it considers, changing the objective function to better align with the clinical objectives, and identifying issues of data bias and leakage.

2601.07436 2026-06-11 eess.SP cs.LG physics.optics

PIDT: Physics-Informed Digital Twin for Optical Fiber Parameter Estimation

Zicong Jiang, Magnus Karlsson, Erik Agrell, Christian Häger

Affiliations: Dept. of Electrical Engineering, Chalmers Univ. of Technology, Sweden; Dept. of Microtechnology and Nanoscience, Chalmers Univ. of Technology, Sweden

Details
Comments
The paper will appear at the Optical Fiber Communications Conference and Exhibition (OFC) 2026
Abstract

We propose physics-informed digital twin (PIDT): a fiber parameter estimation approach that combines a parameterized split-step method with a physics-informed loss. PIDT improves accuracy and convergence speed with lower complexity compared to previous neural operators.

2512.20011 2026-06-11 cs.CV

PaveSync: A Unified and Comprehensive Dataset for Pavement Distress Analysis and Classification

Blessing Agyei Kyem, Joshua Kofi Asamoah, Anthony Dontoh, Andrews Danyo, Eugene Denteh, Armstrong Aboah

Affiliations: University of Ghana

Details
Journal ref
2025 IEEE International Conference on Future Machine Learning and Data Science (FMLDS)
Abstract

Automated pavement defect detection often struggles to generalize across diverse real-world conditions due to the lack of standardized datasets. Existing datasets differ in annotation styles, distress type definitions, and formats, limiting their integration for unified training. To address this gap, we introduce a comprehensive benchmark dataset that consolidates multiple publicly available sources into a standardized collection of 52747 images from seven countries, with 135277 bounding box annotations covering 13 distinct distress types. The dataset captures broad real-world variation in image quality, resolution, viewing angles, and weather conditions, offering a unique resource for consistent training and evaluation. Its effectiveness was demonstrated through benchmarking with state-of-the-art object detection models including YOLOv8-YOLOv12, Faster R-CNN, and DETR, which achieved competitive performance across diverse scenarios. By standardizing class definitions and annotation formats, this dataset provides the first globally representative benchmark for pavement defect detection and enables fair comparison of models, including zero-shot transfer to new environments.

2510.11290 2026-06-11 cs.AI cs.HC

Evolution in Simulation: AI-Agent School with Dual Memory for High-Fidelity Educational Dynamics

Sheng Jin, Haoming Wang, Zhiqi Gao, Yongbo Yang, Bao Chunjia, Chengliang Wang

Affiliations: Guanghua Law School, Zhejiang University; Faculty of Education, East China Normal University; School of Data Science, The Chinese University of Hong Kong, Shenzhen; Department of Electrical and Computer Engineering, University of California San Diego; Institute of Systems Science, National University of Singapore

Details
Journal ref
Findings of the Association for Computational Linguistics: EMNLP 2025
Comments
9 pages, 7 figures, EMNLP conference
Abstract

Large language model (LLM)-based agents are increasingly pivotal in simulating and understanding complex human systems and interactions. We propose the AI-Agent School (AAS) system, built around a self-evolving mechanism that leverages agents for simulating complex educational dynamics. To address the fragmentation of teaching-process modeling and the limited performance of agents in simulating diverse educational participants, AAS constructs the Zero-Exp strategy and employs a continuous "experience-reflection-optimization" cycle, grounded in a dual memory base that comprises experience and knowledge bases and incorporates short-term and long-term memory components. Through this mechanism, agents autonomously evolve via situated interactions within diverse simulated school scenarios. This evolution enables agents to more accurately model the nuanced, multi-faceted teacher-student engagements and underlying learning processes found in physical schools. Experiments confirm that AAS can effectively simulate intricate educational dynamics and foster advanced agent cognitive abilities, providing a foundational stepping stone from the "Era of Experience" to the "Era of Simulation" by generating high-fidelity behavioral and interaction data.

2412.13841 2026-06-11 cs.CY cs.AI cs.HC

Cultural Dimensions of AI Perception: Charting Expectations, Risks, Benefits, Tradeoffs, and Value in Germany and China

Philipp Brauner, Felix Glawe, Gian Luca Liehner, Luisa Vervier, Martina Ziefle

Affiliations: RWTH Aachen University

Details
Journal ref
Acta Psychologica (2026), volume 268, article 107094
Abstract

As artificial intelligence (AI) continues to advance, understanding public perceptions -- including biases, risks, and benefits -- is essential for guiding research priorities and AI alignment, shaping public discourse, and informing policy. This exploratory study investigates cultural differences in mental models of AI using 71 imaginaries of AI's potential futures. Drawing on cross-cultural convenience samples from Germany (N=52) and China (N=60), we identify significant differences in expectations, evaluations, and risk-benefit tradeoffs. Participants from Germany generally provided more cautious assessments, whereas participants from China expressed greater optimism regarding AI's societal benefits. Chinese participants exhibited relatively balanced risk-benefit tradeoffs ($\beta=-0.463$ for risk and $\beta=+0.484$ for benefit, $r^2=.630$). In contrast, German participants placed greater emphasis on AI's benefits and comparatively less on risks ($\beta=-0.337$ for risk and $\beta=+0.715$ for benefit, $r^2=.839$). Visual cognitive maps illustrate these contrasts, offering new perspectives on how cultural contexts shape AI acceptance. Our findings highlight key factors influencing public perception and provide insights for aligning AI with societal values and promoting equitable and culturally sensitive integration of AI technologies.

2508.11703 2026-06-11 cs.NE cs.LG

Data-Driven Discovery of Interpretable Kalman Filter Variants through Large Language Models and Genetic Programming

Vasileios Saketos, Sebastian Kaltenbach, Sergey Litvinov, Petros Koumoutsakos

Affiliations: University of Reading; University of Cambridge

Details
Abstract

Algorithmic discovery has traditionally relied on human ingenuity and extensive experimentation. Here we investigate whether a prominent scientific computing algorithm, the Kalman Filter, can be discovered through an automated, data-driven, evolutionary process that relies on Cartesian Genetic Programming (CGP) and Large Language Models (LLM). We evaluate the contributions of both modalities (CGP and LLM) in discovering the Kalman filter under varying conditions. Our results demonstrate that our framework of CGP and LLM-assisted evolution converges to near-optimal solutions when Kalman optimality assumptions hold. When these assumptions are violated, our framework evolves interpretable alternatives that outperform the Kalman filter. These results demonstrate that combining evolutionary algorithms and generative models for interpretable, data-driven synthesis of simple computational modules is a potent approach for algorithmic discovery in scientific computing.
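
As a reference point for what the evolutionary search is trying to rediscover, the sketch below implements the textbook Kalman filter for the simplest case: a scalar random-walk state observed directly with Gaussian noise. The function name, the noise settings, and the measurements are illustrative assumptions; the paper's own benchmark systems are not described in the abstract.

```python
def kalman_step(x_est, p_est, z, q, r):
    """One predict/update cycle for a scalar random-walk state observed directly.

    x_est, p_est: previous state estimate and its variance
    z: new measurement; q: process-noise variance; r: measurement-noise variance
    """
    # Predict: the random-walk model keeps the mean and inflates the variance.
    x_pred = x_est
    p_pred = p_est + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


# Prior: mean 0, variance 1. Static state (q = 0), unit measurement noise.
x, p = 0.0, 1.0
for z in [1.2, 0.9, 1.1, 1.0]:
    x, p = kalman_step(x, p, z, q=0.0, r=1.0)
print(x, p)
```

With q = 0 and equal prior and measurement variances, the filter reduces to a running average of the prior mean and the four measurements, which gives an easy sanity check on the recursion.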

2508.15943 2026-06-11 cs.AI

T-ILR: a Neurosymbolic Integration for LTLf

Riccardo Andreoni, Andrei Buliga, Alessandro Daniele, Chiara Ghidini, Marco Montali, Massimiliano Ronzani

Affiliations * Fondazione Bruno Kessler; Free University of Bozen-Bolzano; University of Bozen-Bolzano

Details
Journal ref
Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning (NeSy 2025)
Comments
Accepted for presentation at NeSy 2025. 10 pages
English abstract

State-of-the-art approaches for integrating symbolic knowledge with deep learning architectures have demonstrated promising results in static domains. However, methods to handle temporal logic specifications remain underexplored. The only existing approach relies on an explicit representation of a finite-state automaton corresponding to the temporal specification. Instead, we propose a neurosymbolic framework designed to incorporate temporal logic specifications, expressed in Linear Temporal Logic over finite traces (LTLf), directly into deep learning architectures for sequence-based tasks. We extend the Iterative Local Refinement (ILR) neurosymbolic algorithm, leveraging the recent introduction of fuzzy LTLf interpretations. We name this proposed method Temporal Iterative Local Refinement (T-ILR). We assess T-ILR on an existing benchmark for temporal neurosymbolic architectures, consisting of the classification of image sequences in the presence of temporal knowledge. The results demonstrate improved accuracy and computational efficiency compared to the state-of-the-art method.

2503.08379 2026-06-11 cs.IR cs.CL

JurisTCU: A Brazilian Portuguese Information Retrieval Dataset with Query Relevance Judgments

Leandro Carísio Fernandes, Leandro dos Santos Ribeiro, Marcos Vinícius Borela de Castro, Leonardo Augusto da Silva Pacheco, Edans Flávius de Oliveira Sandes

Affiliations * Câmara dos Deputados; Tribunal de Contas da União (TCU)

Details
Comments
23 pages
English abstract

This paper introduces JurisTCU, a Brazilian Portuguese dataset for legal information retrieval (LIR). The dataset is freely available and consists of 16,045 jurisprudential documents from the Brazilian Federal Court of Accounts, along with 150 queries annotated with relevance judgments. It addresses the scarcity of Portuguese-language LIR datasets with query relevance annotations. The queries are organized into three groups: real user keyword-based queries, synthetic keyword-based queries, and synthetic question-based queries. Relevance judgments were produced through a hybrid approach combining LLM-based scoring with expert domain validation. We used JurisTCU in 14 experiments using lexical search (document expansion methods) and semantic search (BERT-based and OpenAI embeddings). We show that the document expansion methods significantly improve the performance of standard BM25 search on this dataset, with improvements exceeding 45% in P@10, R@10, and nDCG@10 metrics when evaluating short keyword-based queries. Among the embedding models, the OpenAI models produced the best results, with improvements of approximately 70% in P@10, R@10, and nDCG@10 metrics for short keyword-based queries, suggesting that these dense embeddings capture semantic relationships in this domain, surpassing the reliance on lexical terms. Besides offering a dataset for the Portuguese-language IR research community, suitable for evaluating search systems, the results also contribute to enhancing a search system highly relevant to Brazilian citizens.
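
The three reported metrics can be computed as in the sketch below. It uses the standard definitions of P@k, R@k, and nDCG@k with a log2 discount; the function names, the toy document IDs, and the graded-gain dictionary are illustrative assumptions and are not taken from the JurisTCU evaluation code.

```python
import math


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k=10):
    """Fraction of all relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def ndcg_at_k(ranked, gains, k=10):
    """nDCG@k; gains maps doc id -> graded relevance (missing ids count as 0)."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 2) for i, d in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: three judged documents, one of which ("d9") is never retrieved.
gains = {"d1": 3, "d3": 1, "d9": 2}
ranked = ["d1", "d2", "d3", "d4"]
p4 = precision_at_k(ranked, set(gains), k=4)
r4 = recall_at_k(ranked, set(gains), k=4)
n4 = ndcg_at_k(ranked, gains, k=4)
```

The sketch assumes each document ID appears at most once in the ranked list.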

2505.11308 2026-06-11 cs.LG physics.comp-ph

Reinforcement Learning Closures for Underresolved Partial Differential Equations using Synthetic Data

Lothar Heimbach, Sebastian Kaltenbach, Petr Karnakov, Francis J. Alexander, Petros Koumoutsakos

Affiliations * ETH Zurich / Harvard University; Harvard University; Argonne National Laboratory

Details
English abstract

Partial Differential Equations (PDEs) describe phenomena ranging from turbulence and epidemics to quantum mechanics and financial markets. Despite recent advances in computational science, solving such PDEs for real-world applications remains prohibitively expensive because of the necessity of resolving a broad range of spatiotemporal scales. In turn, practitioners often rely on coarse-grained approximations of the original PDEs, trading off accuracy for reduced computational resources. To mitigate the loss of detail inherent in such approximations, closure models are employed to represent unresolved spatiotemporal interactions. We present a framework for developing closure models for PDEs using synthetic data acquired through the method of manufactured solutions. These data are used in conjunction with reinforcement learning to provide closures for coarse-grained PDEs. We illustrate the efficacy of our method using the one-dimensional and two-dimensional Burgers' equations and the two-dimensional advection equation. Moreover, we demonstrate that closure models trained for inhomogeneous PDEs can be effectively generalized to homogeneous PDEs. The results demonstrate the potential for developing accurate and computationally efficient closure models for systems with scarce data.
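
The method of manufactured solutions mentioned above can be illustrated with a small sketch: pick a smooth function, substitute it into the PDE, and take whatever is left over as a forcing term, which yields an inhomogeneous PDE with a known exact solution. The viscous 1D Burgers' form, the chosen solution `sin(x) * exp(-t)`, and the viscosity value `NU` are assumptions for illustration and are not the paper's setup.

```python
import math

NU = 0.1  # hypothetical viscosity, chosen only for this example


def u_exact(x, t):
    """Manufactured solution (an arbitrary smooth choice)."""
    return math.sin(x) * math.exp(-t)


def forcing(x, t, nu=NU):
    """Analytic source term f = u_t + u*u_x - nu*u_xx for the manufactured u."""
    s, c, e = math.sin(x), math.cos(x), math.exp(-t)
    u_t = -s * e
    u_x = c * e
    u_xx = -s * e
    return u_t + (s * e) * u_x - nu * u_xx


def residual(x, t, nu=NU, h=1e-4):
    """Finite-difference check that u_exact satisfies the forced equation
    (the result should be close to zero)."""
    u = u_exact(x, t)
    u_t = (u_exact(x, t + h) - u_exact(x, t - h)) / (2 * h)
    u_x = (u_exact(x + h, t) - u_exact(x - h, t)) / (2 * h)
    u_xx = (u_exact(x + h, t) - 2 * u + u_exact(x - h, t)) / h**2
    return u_t + u * u_x - nu * u_xx - forcing(x, t, nu)
```

Because the exact solution is known everywhere, reference data for a coarse solver can be generated without running an expensive fine-grid simulation, which is what makes the approach attractive when data are scarce.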

2407.08035 2026-06-11 cs.CL cs.IR

FsPONER: Few-shot Prompt Optimization for Named Entity Recognition in Domain-specific Scenarios

Yongjian Tang, Rakebul Hasan, Thomas Runkler

Affiliations * Technical University of Munich; Siemens AG

Details
Comments
Accepted in the main track at the 27th European Conference on Artificial Intelligence (ECAI-2024)
English abstract

Large Language Models (LLMs) have provided a new pathway for Named Entity Recognition (NER) tasks. Compared with fine-tuning, LLM-powered prompting methods avoid the need for training, conserve substantial computational resources, and rely on minimal annotated data. Previous studies have achieved comparable performance to fully supervised BERT-based fine-tuning approaches on general NER benchmarks. However, none of the previous approaches has investigated the efficiency of LLM-based few-shot learning in domain-specific scenarios. To address this gap, we introduce FsPONER, a novel approach for optimizing few-shot prompts, and evaluate its performance on domain-specific NER datasets, with a focus on industrial manufacturing and maintenance, while using multiple LLMs -- GPT-4-32K, GPT-3.5-Turbo, LLaMA 2-chat, and Vicuna. FsPONER consists of three few-shot selection methods based on random sampling, TF-IDF vectors, and a combination of both. We compare these methods with a general-purpose GPT-NER method as the number of few-shot examples increases and evaluate their optimal NER performance against fine-tuned BERT and LLaMA 2-chat. In the considered real-world scenarios with data scarcity, FsPONER with TF-IDF surpasses fine-tuned models by approximately 10% in F1 score.
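
A TF-IDF-based selection of few-shot examples could look like the sketch below: every candidate example and the query sentence are turned into TF-IDF vectors, and the candidates most similar to the query are placed in the prompt. This is an illustrative sketch and not the FsPONER implementation; whitespace tokenization, the smoothed IDF formula, the function names, and the toy maintenance sentences are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) using whitespace tokens and a smoothed IDF."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_few_shot(query, pool, k=2):
    """Return the k pool examples most similar to the query."""
    vecs = tfidf_vectors(pool + [query])
    q = vecs[-1]
    order = sorted(range(len(pool)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return [pool[i] for i in order[:k]]


# Hypothetical pool of annotated sentences from a maintenance log.
pool = [
    "replace the hydraulic pump seal",
    "the meeting starts at noon",
    "inspect pump bearing for wear",
]
chosen = select_few_shot("pump seal is leaking", pool, k=2)
```

Selecting by similarity rather than at random tends to surface examples that share domain vocabulary with the input, which is the intuition behind comparing the two strategies.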

2411.10077 2026-06-11 cs.CV

Hierarchical Mutual Distillation for Multi-View Fusion: Learning from All Possible View Combinations

Jiwoong Yang, Haejun Chung, Ikbeom Jang

Affiliations * Hanyang University; Hankuk University of Foreign Studies

Details
Journal ref
Pattern Recognition 178 (2026) 113432
English abstract

Multi-view learning often faces challenges in effectively leveraging images captured from different angles and locations. This challenge is particularly pronounced when addressing inconsistencies and uncertainties between views. In this paper, we propose a novel Multi-View Uncertainty-Weighted Mutual Distillation (MV-UWMD) method. Our method enhances prediction consistency by performing hierarchical mutual distillation across all possible view combinations, including single-view, partial multi-view, and full multi-view predictions. This introduces an uncertainty-based weighting mechanism through mutual distillation, allowing effective exploitation of unique information from each view while mitigating the impact of uncertain predictions. We extend a CNN-Transformer hybrid architecture to facilitate robust feature learning and integration across multiple view combinations. We conducted extensive experiments using a large, unstructured dataset captured from diverse, non-fixed viewpoints. The results demonstrate that MV-UWMD improves prediction accuracy and consistency compared to existing multi-view learning approaches.

2502.09084 2026-06-11 cs.CR cs.LG cs.NI

Application of Tabular Transformer Architectures for Operating System Fingerprinting

Rubén Pérez-Jove, Cristian R. Munteanu, Alejandro Pazos, Jose Vázquez-Naya

Affiliations * RNASNA-IMEDIR Research Group, Department of Computer Science and Information Technologies, Facultad de Informática, Universidade da Coruña; CITIC Research Centre, Universidade da Coruña; IKERDATA S.L

Details
Comments
Submitted as a preprint (not peer reviewed). 22 pages, 9 figures. Code and datasets available at: https://github.com/rubenpjove/tabularT-OS-fingerprinting
English abstract

Operating System (OS) fingerprinting is essential for network management and cybersecurity, enabling accurate device identification based on network traffic analysis. Traditional rule-based tools such as Nmap and p0f face challenges in dynamic environments due to frequent OS updates and obfuscation techniques. While Machine Learning (ML) approaches have been explored, Deep Learning (DL) models, particularly Transformer architectures, remain unexploited in this domain. This study investigates the application of Tabular Transformer architectures, specifically TabTransformer and FT-Transformer, for OS fingerprinting, leveraging structured network data from three publicly available datasets. Our experiments demonstrate that FT-Transformer generally outperforms traditional ML models, previous approaches, and TabTransformer across multiple classification levels (OS family, major, and minor versions). The results establish a strong foundation for DL-based OS fingerprinting, improving accuracy and adaptability in complex network environments. Furthermore, we ensure the reproducibility of our research by providing an open-source implementation.

2502.07990 2026-06-11 cs.LG physics.comp-ph physics.flu-dyn

Learning Effective Dynamics across Spatio-Temporal Scales of Complex Flows

Han Gao, Sebastian Kaltenbach, Petros Koumoutsakos

Affiliations * Harvard SEAS

Details
Comments
Conference on Parsimony and Learning (CPAL)
English abstract

Modeling and simulation of complex fluid flows with dynamics that span multiple spatio-temporal scales is a fundamental challenge in many scientific and engineering domains. Full-scale resolving simulations for systems such as highly turbulent flows are not feasible in the foreseeable future, and reduced-order models must capture dynamics that involve interactions across scales. In the present work, we propose a novel framework, Graph-based Learning of Effective Dynamics (Graph-LED), that leverages graph neural networks (GNNs), as well as an attention-based autoregressive model, to extract the effective dynamics from a small amount of simulation data. GNNs represent flow fields on unstructured meshes as graphs and effectively handle complex geometries and non-uniform grids. The proposed method combines GNN-based dimensionality reduction for variable-size unstructured meshes with an autoregressive temporal attention model that can learn temporal dependencies automatically. We evaluated the proposed approach on a suite of fluid dynamics problems, including flow past a cylinder and flow over a backward-facing step over a range of Reynolds numbers. The results demonstrate robust and effective forecasting of spatio-temporal physics; in the case of the flow past a cylinder, both the small-scale effects that occur close to the cylinder and its wake are accurately captured.

2402.00972 2026-06-11 cs.LG cs.MA physics.comp-ph

Closure Discovery for Coarse-Grained Partial Differential Equations Using Grid-based Reinforcement Learning

Jan-Philipp von Bassewitz, Sebastian Kaltenbach, Petros Koumoutsakos

Affiliations * ETH Zurich; Harvard SEAS

Details
Comments
Conference on Parsimony and Learning (CPAL)
English abstract

Reliable predictions of critical phenomena, such as weather, wildfires, and epidemics, often rely on models described by Partial Differential Equations (PDEs). However, simulations that capture the full range of spatio-temporal scales described by such PDEs are often prohibitively expensive. Consequently, coarse-grained simulations are usually deployed that adopt various heuristics and empirical closure terms to account for the missing information. We propose a novel and systematic approach for identifying closures in under-resolved PDEs using grid-based Reinforcement Learning. This formulation incorporates inductive bias and exploits locality by deploying a central policy represented efficiently by a Fully Convolutional Network (FCN). We demonstrate the capabilities and limitations of our framework through numerical solutions of the advection equation and the Burgers' equation. Our results show accurate predictions for in- and out-of-distribution test cases as well as a significant speedup compared to resolving all scales.

2412.12231 2026-06-11 cs.RO cs.LG

Demonstrating Data-to-Knowledge Pipelines for Connecting Production Sites in the World Wide Lab

Leon Gorißen, Jan-Niklas Schneider, Mohamed Behery, Philipp Brauner, Moritz Lennartz, David Kötter, Thomas Kaster, Oliver Petrovic, Christian Hinke, Thomas Gries, Gerhard Lakemeyer, Martina Ziefle, Christian Brecher, Constantin Häfner

Affiliations * Chair for Laser Technology, RWTH Aachen University; Knowledge Based Systems Group, RWTH Aachen University; Communication Science, RWTH Aachen University; Chair of Textile Technology, RWTH Aachen University; Laboratory for Machine Tools and Production Engineering, RWTH Aachen University; Human Computer Interaction Center, RWTH Aachen University; Fraunhofer Institute for Laser Technology

Details
Journal ref
MDPI MAKE (Machine Learning and Knowledge Extraction) (2026), 8(5)
Comments
15 pages, 6 figures, submitted to CAiSE 2025
English abstract

The digital transformation of production requires new methods of data integration and storage, as well as decision making and support systems that work vertically and horizontally throughout the development, production, and use cycle. In this paper, we propose Data-to-Knowledge (and Knowledge-to-Data) pipelines for production as a universal concept building on a network of Digital Shadows (a concept augmenting Digital Twins). We show a proof of concept that builds on and bridges existing infrastructure to 1) capture and semantically annotate trajectory data from multiple similar but independent robots in different organisations and use cases in a data lakehouse and 2) dynamically query matching data, in an independent process, for training an inverse dynamic foundation model for robotic control. The article discusses the challenges and benefits of this approach and how Data-to-Knowledge pipelines contribute to efficiency gains and industrial scalability in a World Wide Lab as a research outlook.

2406.07909 2026-06-11 eess.AS cs.CL cs.SD stat.ML

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

Eungbeom Kim, Hantae Kim, Kyogu Lee

Affiliations * KAIST

Details
Comments
Accepted by Interspeech 2024
English abstract

The Transformer encoder with the connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR suffers from disagreement between teacher and student models in frame-level alignment, which ultimately hinders it from improving the student model's performance. To resolve this problem, this paper introduces a self-knowledge distillation (SKD) method that guides the frame-level alignment during training. In contrast to the conventional method using separate teacher and student models, this study introduces a simple and effective method that shares encoder layers and uses the sub-model as the student model. Overall, our approach is effective in improving both resource efficiency and performance. We also conducted an experimental analysis of the spike timings to illustrate that the proposed method improves performance by reducing the alignment disagreement.
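
The alignment disagreement discussed above follows from how CTC maps frames to labels: many different frame-level paths collapse to the same transcript, so two models can agree on the output while emitting their non-blank "spikes" at different frames. The sketch below shows the standard CTC collapse rule (merge repeats, then drop blanks) plus a helper that lists spike frames. The blank symbol `_`, the function names, and the toy paths are assumptions for illustration.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol for this example


def ctc_collapse(frame_labels, blank=BLANK):
    """Standard CTC decoding rule: merge consecutive repeats, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def spike_frames(frame_labels, blank=BLANK):
    """Frame indices at which a new non-blank label starts."""
    frames, prev = [], None
    for i, label in enumerate(frame_labels):
        if label != blank and label != prev:
            frames.append(i)
        prev = label
    return frames


# Two hypothetical frame-level paths (e.g. teacher and student) for the same word.
teacher_path = list("_c_a_t")
student_path = list("cc_at_")
same_text = ctc_collapse(teacher_path) == ctc_collapse(student_path)
teacher_spikes = spike_frames(teacher_path)
student_spikes = spike_frames(student_path)
```

Both paths decode to the same word, yet their spike timings differ, which is the kind of mismatch a frame-level distillation loss would penalize.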

2305.13108 2026-06-11 eess.AS cs.CL cs.LG cs.SD

Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test

Eungbeom Kim, Yunkee Chae, Jaeheon Sim, Kyogu Lee

Affiliations * Institute of Information & communications Technology Planning & Evaluation (IITP)

Details
Comments
Accepted by Interspeech 2023
English abstract

Automatic speech recognition systems based on deep learning are mainly trained under empirical risk minimization (ERM). Since ERM utilizes the averaged performance on the data samples regardless of a group such as healthy or dysarthric speakers, ASR systems are unaware of the performance disparities across the groups. This results in biased ASR systems whose performance differences among groups are severe. In this study, we aim to improve the ASR system in terms of group robustness for dysarthric speakers. To achieve our goal, we present a novel approach, sample reweighting with sample affinity test (Re-SAT). Re-SAT systematically measures the debiasing helpfulness of the given data sample and then mitigates the bias by debiasing helpfulness-based sample reweighting. Experimental results demonstrate that Re-SAT contributes to improved ASR performance on dysarthric speech without performance degradation on healthy speech.

2606.12373 2026-06-11 cs.CL new submission

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

Hao Xiang, Qiaoyu Tang, Le Yu, Yaojie Lu, Xianpei Han, Ben He, Le Sun, Bowen Yu, Peng Wang, Hongyu Lin, Dayiheng Liu

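AI summary: Proposes the RACES framework, which treats verifiable environments as recursively composable building blocks and automatically generates composite environments through four composition operators. It improves DeepSeek-R1-Distill-Qwen-14B by an average of 3.1 points on six unseen benchmarks and, with only 50 base environments, matches the performance obtained with 300 environments.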

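Details
AI-generated abstract (translated from Chinese)

Reinforcement learning based on verifiable environments has become an effective method for enhancing the reasoning capabilities of large language models. Although prior work shows that scaling the number of environments improves RL performance, existing manual or one-at-a-time construction methods are limited by a linear scaling bottleneck, which hinders scalable reasoning generalization. This paper proposes RACES (Recursive Automated Composition for Environment Scaling), a framework that treats verifiable environments as composable building blocks that can be recursively assembled. The key insight is that when the codomain (output type) of one environment matches the domain (input type) of another, the two can be automatically fused into a new verifiable environment, enabling recursive composition. RACES is implemented with 300 individual environments and defines four composition operators (SEQUENTIAL, PARALLEL, SORT, and SELECT) that induce diverse reasoning patterns. Extensive experiments show that RL training on these composite environments consistently improves reasoning generalization. Specifically, RACES improves DeepSeek-R1-Distill-Qwen-14B by an average of 3.1 points (from 48.2 to 51.3) on six unseen benchmarks and raises Qwen3-14B from 58.8 to 61.1. In addition, using only 50 base environments, RACES matches the performance of training on 300 individual environments, demonstrating notable efficiency in environment utilization.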

English abstract

Reinforcement Learning (RL) with verifiable environments has emerged as a powerful approach for enhancing the reasoning capabilities of Large Language Models (LLMs). While prior research demonstrates that scaling environment quantity improves RL performance, existing manual or individual construction methods suffer from linear scaling limits, thereby hindering scalable reasoning generalization. This paper introduces RACES (Recursive Automated Composition for Environment Scaling), a framework that conceptualizes verifiable environments as composable building blocks that can be recursively assembled. The key insight is that when the codomain (output type) of one environment matches the domain (input type) of another, they can be automatically fused into a new verifiable environment, enabling recursive composition. RACES is implemented with 300 individual environments and defines a set of composition operators (SEQUENTIAL, PARALLEL, SORT, and SELECT) that induce diverse reasoning patterns. Extensive experiments show that RL training on these composite environments consistently enhances reasoning generalization. Specifically, RACES improves DeepSeek-R1-Distill-Qwen-14B by an average of 3.1 points (from 48.2 to 51.3) and boosts Qwen3-14B performance from 58.8 to 61.1 on six benchmarks, which are unseen during the construction of training environments. Moreover, RACES achieves performance comparable to training on 300 individual environments using only 50 base environments, demonstrating significant efficiency in environment utilization.
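
The type-matching idea behind the composition can be sketched as follows. This is a toy illustration of a sequential operator only and is not the RACES implementation; the `Env` class, its fields, the `sequential` function, and the two example environments are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Env:
    """A toy verifiable environment: a typed task with a reference solver."""
    name: str
    domain: type      # input type
    codomain: type    # output type
    solve: Callable[[Any], Any]

    def verify(self, x, answer) -> bool:
        # An answer is accepted only if it equals the reference solution.
        return answer == self.solve(x)


def sequential(first: Env, second: Env) -> Env:
    """Fuse two environments when first's codomain matches second's domain."""
    if first.codomain is not second.domain:
        raise TypeError(f"cannot compose {first.name} with {second.name}")
    return Env(
        name=f"{first.name}>>{second.name}",
        domain=first.domain,
        codomain=second.codomain,
        solve=lambda x: second.solve(first.solve(x)),
    )


# Hypothetical base environments.
count_vowels = Env("count_vowels", str, int, lambda s: sum(ch in "aeiou" for ch in s))
is_even = Env("is_even", int, bool, lambda n: n % 2 == 0)
negate = Env("negate", bool, bool, lambda b: not b)

# The result is itself an Env, so it can be composed again (recursive composition).
composite = sequential(sequential(count_vowels, is_even), negate)
```

Because the composite carries its own solver, it stays verifiable without any manual work, which is how a small set of base environments could be expanded into a much larger training set.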