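arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.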

2605.29355 2026-06-11 cs.LG q-bio.NC

Neural-Behavioral Representation of Natural Whole-body Movement in Monkeys

Jieshi He, Puzhe Li, Yanan Sui, Mu-ming Poo

Affiliations: Center for Excellence in Brain Science and Intelligence Technology, CAS; Tsinghua University

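AI summary: Combining large-scale epidural cortical signals with multi-view motion capture and an autoregressive encoder-decoder model, the work accurately decodes whole-body movement in freely moving monkeys.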

Abstract

Understanding how cortical activity represents natural whole-body behaviors in primates remains challenging. Limited by the diversity of movements and inaccessibility of large-scale neural representation of whole-body kinematics, previous motor decoding studies focused on constrained tasks and limited limb movements. Here, we present a neural-behavioral recording and modeling framework for freely moving monkeys, combining large-scale epidural cortical signals from distributed sensory- and motor-related areas with synchronized multi-view motion capture through a custom-made data collection platform. We reconstructed whole-body monkey kinematics and learned a compact behavior prior using an autoregressive encoder-decoder model. Conditioned on neural signals, the model decoded accurate and realistic whole-body movement without explicit physical constraints. Our results provide a novel proof-of-concept approach for decoding natural whole-body movements in primates using large-scale intracranial neural activity.

2605.29292 2026-06-11 cs.CV

Turbulence-Robust Dynamic Object Segmentation with Multi-Signal Priors and SAM2 Refinement

Bolian Peng, Ying Tang, Xu Liu, Long Sun, Xiaoqiang Lu

Affiliations: Xidian University

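AI summary: Proposes a training-free multi-signal segmentation pipeline that combines RAFT motion estimation, DINOv2 semantic priors, ViBe background modeling, and SAM2 mask refinement to segment dynamic objects under atmospheric turbulence.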

Journal ref: Proceedings of the CVPR 2026 Workshops, UG2+ Challenge, 2026

Abstract

This technical report presents our solution for the CVPR 2026 UG2+ Challenge Track 3: Dynamic Object Segmentation in Turbulence (DOST). We design a training-free multi-signal segmentation pipeline that combines pretrained motion estimation, self-supervised semantic priors, background anomaly modeling, manually calibrated proposal fusion, and SAM2-based mask refinement. The method uses RAFT for dense motion responses, DINOv2 for semantic objectness priors, ViBe for training-free background modeling, and pretrained SAM2 for box-prompt mask refinement. Instead of optimizing an end-to-end segmentation network, our system operates entirely in inference mode. This design is suitable for the DOST setting, where severe atmospheric turbulence produces pseudo-motion, blur, and intermittent target visibility, making a single motion cue unreliable. The final submitted masks are evaluated by the official leaderboard, which reports 0.425041 mIoU and 0.457206 mDice. Since no task-specific model training or fine-tuning is performed, stronger learned temporal association, adaptive proposal selection, or task-specific adaptation may further improve the system.
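The mIoU and mDice scores quoted in the abstract above are averages of per-mask overlap measures. The sketch below shows how IoU and Dice are commonly computed for binary masks. It is an illustration only, not the challenge's official evaluation code: the function names, the plain nested-list mask format, the choice to score two empty masks as 1.0, and averaging per mask (rather than per sequence) are all assumptions.

```python
from typing import Callable, Sequence

Mask = Sequence[Sequence[int]]  # 2-D binary mask, 1 = object pixel


def _counts(pred: Mask, gt: Mask) -> tuple[int, int, int]:
    """Return (intersection, predicted pixels, ground-truth pixels)."""
    inter = pred_sum = gt_sum = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            pred_sum += 1 if p else 0
            gt_sum += 1 if g else 0
    return inter, pred_sum, gt_sum


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union: |P & G| / |P | G|."""
    inter, pred_sum, gt_sum = _counts(pred, gt)
    union = pred_sum + gt_sum - inter
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union


def dice(pred: Mask, gt: Mask) -> float:
    """Dice coefficient: 2 |P & G| / (|P| + |G|)."""
    inter, pred_sum, gt_sum = _counts(pred, gt)
    total = pred_sum + gt_sum
    return 1.0 if total == 0 else 2 * inter / total


def mean_metric(metric: Callable[[Mask, Mask], float],
                preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average a per-mask metric over paired predictions and ground truths."""
    scores = [metric(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)
```

For a given pair of masks Dice is never lower than IoU, which is consistent with the reported mDice (0.457206) exceeding the reported mIoU (0.425041).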

2605.22509 2026-06-11 cs.HC cs.CL

Reflecti-Mate: A Conversational Agent for Adaptive Decision-Making Support Through System 1 and System 2 Thinking

Morita Tarvirdians, Senthil Chandrasegaran, Hayley Hung, Catholijn M. Jonker, Catharine Oertel

Affiliations: TU Delft; TU Delft/Leiden University

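AI summary: Studies a conversational agent that promotes integration in decision making by adapting to individual thought patterns. Compared with a baseline agent, it yields more personalized reflective trajectories and more integrative reflective language.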

DOI: 10.1145/3774935.3806176
Journal ref: UMAP 2026: Proceedings of the 34th ACM Conference on User Modeling, Adaptation and Personalization
Comments: Accepted at UMAP 2026

Abstract

Making high-stakes personal decisions involves cognitive, emotional, and intuitive processes, and individuals differ in how they allocate attention across these modes. Integration of these processes has been shown to benefit decision making. Yet, most current decision-support systems focus primarily on supporting cognitive aspects, rather than adapting to the individual's thinking profile to support integration of different types of thoughts. In this study, we investigate an agent designed to encourage integration by adapting to the individual user's thought patterns. We explore its effects on participants' perceptions of the agent and their reflective behavior, in comparison with unaided pre-reflection and a baseline agent. In a between-subjects study (N = 128), our agent, which fostered broad and elaborated thinking, enabled more personalized reflective trajectories, elicited more integrative reflective language, and was perceived as providing stronger support for holistic reflection. In contrast, the baseline agent produced homogenized profiles dominated by cognitive language across participants.

2605.17773 2026-06-11 cs.CV

PlantPose: Universal Plant Skeleton Estimation via Tree-constrained Graph Generation

Xinpeng Liu, Hiroaki Santo, Yosuke Toda, Fumio Okura

Affiliations: Graduate School of Information Science and Technology, Osaka University; Phytometrics; Institute of Transformative Bio-Molecules, Nagoya University

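AI summary: Proposes PlantPose, a universal plant skeleton estimator based on tree-constrained graph generation. It combines learning-based graph generation with traditional graph algorithms, is trained on a large and diverse dataset to improve generalization, and achieves robust, accurate plant skeleton estimation across multiple domains.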

DOI: 10.1007/s11263-026-02882-4
Comments: International Journal of Computer Vision, 2026

Abstract

Accurate estimation of plant skeletal structures (e.g., branching structures) from images is essential for smart agriculture and plant science. Unlike human skeletons with fixed topology, plant skeleton estimation presents a unique challenge, i.e., estimating arbitrary tree graphs from images. To address this problem, we introduce PlantPose, a universal plant skeleton estimator via tree-constrained graph generation. PlantPose combines learning-based graph generation with traditional graph algorithms to enforce tree constraints during the training loop. To enhance the model's generalization capability, we curate a large and diverse dataset comprising real-world and synthetic plant images, along with simplified representations (e.g., sketches and abstract drawings). This dataset enables the generalized model to adapt to diverse input styles and categories of plant images while preserving topological consistency. Our approach demonstrates robust and accurate plant skeleton estimation across multiple domains, including previously unseen out-of-domain scenarios. Further analyses highlight the method's strengths and limitations in handling complex, heterogeneous data distributions. All implementations and datasets are available at https://github.com/huntorochi/PlantPose/.

2604.01383 2026-06-11 cs.CV cs.AI

GRAZE: Grounded Refinement and Motion-Aware Zero-Shot Event Localization

Syed Ahsan Masud Zaidi, Lior Shamir, William Hsu, Scott Dietrich, Talha Zaidi

Affiliations: Kansas State University; Albright College

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10087-10095, June 2026
Comments: 9 pages, 5 figures, accepted to the CVPR 2026 Workshop on Computer Vision in Sports (CVSports); code: https://github.com/AhsanZaidi12/GRAZE

Abstract

American football practice generates video at scale, yet the interaction of interest occupies only a brief window of each long, untrimmed clip. Reliable biomechanical analysis, therefore, depends on spatiotemporal localization that identifies both the interacting entities and the onset of contact. We study First Point of Contact (FPOC), defined as the first frame in which a player physically touches a tackle dummy, in unconstrained practice footage with camera motion, clutter, multiple similarly equipped athletes, and rapid pose changes around impact. We present GRAZE, a training-free pipeline for FPOC localization that requires no labeled tackle-contact examples. GRAZE uses Grounding DINO to discover candidate player-dummy interactions, refines them with motion-aware temporal reasoning, and uses SAM2 as an explicit pixel-level verifier of contact rather than relying on detection confidence alone. This separation between candidate discovery and contact confirmation makes the approach robust to cluttered scenes and unstable grounding near impact. On 738 tackle-practice videos, GRAZE produces valid outputs for 97.4% of clips and localizes FPOC within $\pm$ 10 frames on 77.5% of all clips and within $\pm$ 20 frames on 82.7% of all clips. These results show that frame-accurate contact onset localization in real-world practice footage is feasible without task-specific training.
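The tolerance-based figures in the abstract above (FPOC within ±10 or ±20 frames, reported over all clips) can be read as a hit rate in which clips without a valid output count as misses. The sketch below illustrates that reading. It is not the authors' evaluation script: the function names and the use of `None` to mark a clip with no valid output are assumptions.

```python
from typing import Optional, Sequence


def valid_output_rate(preds: Sequence[Optional[int]]) -> float:
    """Fraction of clips for which a frame index was produced at all."""
    if not preds:
        raise ValueError("need at least one clip")
    return sum(1 for p in preds if p is not None) / len(preds)


def within_tolerance_rate(preds: Sequence[Optional[int]],
                          gts: Sequence[int], tol: int) -> float:
    """Fraction of ALL clips whose predicted frame is within +/- tol frames.

    Clips with no valid output (None) stay in the denominator and count
    as misses, matching the "of all clips" wording (an assumed reading).
    """
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not gts:
        raise ValueError("need at least one clip")
    hits = sum(1 for p, g in zip(preds, gts)
               if p is not None and abs(p - g) <= tol)
    return hits / len(gts)


# Toy data: four clips, one of which has no valid output.
predicted = [100, None, 130, 58]
annotated = [105, 90, 100, 60]
rate_10 = within_tolerance_rate(predicted, annotated, tol=10)
rate_30 = within_tolerance_rate(predicted, annotated, tol=30)
```

Under this reading the tolerance rates can never exceed the valid-output rate, which fits the reported numbers (77.5% and 82.7% against 97.4%).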

2511.08113 2026-06-11 cs.CL

Multimodal LLMs Do Not Compose Skills Optimally Across Modalities

Paula Ontalvilla, Aitor Ormazabal, Gorka Azkune

Affiliations: arXiv

2506.22141 2026-06-11 cs.CL cs.IR

DAPFAM: A Domain-Aware Family-level Dataset to benchmark cross domain patent retrieval

Iliass Ayaou, Denis Cavallucci, Hicham Chibane

Affiliations: Institut National des Sciences de l'Univers, Strasbourg

2602.13513 2026-06-11 math.OC cs.CE cs.LG cs.NA math.DS math.NA

Learning Gradient Flow: Using Equation Discovery to Accelerate Engineering Optimization

Grant Norman, Conor Rowan, Kurt Maute, Alireza Doostan

Affiliations: Smead Aerospace Engineering Sciences

2510.09885 2026-06-11 cs.CL cs.AI

Diffusion-Inspired Masked Fine-Tuning for Knowledge Injection in Autoregressive LLMs

Xu Pan, Ely Hahami, Jingxuan Fan, Ziqian Xie, Haim Sompolinsky

Affiliations: Harvard University; University of Texas Health Science Center at Houston; Hebrew University

2602.03147 2026-06-11 cs.RO

Multi-function Robotized Surgical Dissector for Endoscopic Pulmonary Thromboendarterectomy: Preclinical Study and Evaluation

Runfeng Zhu, Xin Zhong, Qingxiang Zhao, Jing Lin, Zhong Wu, Kang Li

Affiliations: West China Hospital of Medicine, Sichuan University; School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China

2601.21824 2026-06-11 cs.LG cs.DC

DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training

Xinwei Qiang, Hongmin Chen, Shixuan Sun, Jingwen Leng, Xin Liu, Minyi Guo

Affiliations: School of Computer Science, Shanghai Jiao Tong University; ByteDance Seed; Zhiyuan College, Shanghai Jiao Tong University

2409.00743 2026-06-11 cs.LG cs.AI

Interpretable Clustering: A Survey

Lianyu Hu, Mudi Jiang, Junjie Dong, Xinying Liu, Zengyou He

Affiliations: College of Information Science and Engineering, Henan University of Technology; School of Software, Dalian University of Technology; Xinchang Power Supply Company, State Grid Corporation of China

DOI: 10.1145/3789495
Journal ref: ACM Computing Surveys, Volume 58, Issue 8, Article 215 (2026)
Comments: 14 pages, 2 figures, 3 tables

Abstract

In recent years, much of the research on clustering algorithms has primarily focused on enhancing their accuracy and efficiency, frequently at the expense of interpretability. However, as these methods are increasingly being applied in high-stakes domains such as healthcare, finance, and autonomous systems, the need for transparent and interpretable clustering outcomes has become a critical concern. This is not only necessary for gaining user trust but also for satisfying the growing ethical and regulatory demands in these fields. Ensuring that decisions derived from clustering algorithms can be clearly understood and justified is now a fundamental requirement. To address this need, this paper provides a comprehensive and structured review of the current state of explainable clustering algorithms, identifying key criteria to distinguish between various methods. These insights can effectively assist researchers in making informed decisions about the most suitable explainable clustering methods for specific application contexts, while also promoting the development and adoption of clustering algorithms that are both efficient and transparent. For convenient access and reference, an open repository organizes representative and emerging interpretable clustering methods under the taxonomy proposed in this survey, available at https://github.com/hulianyu/Awesome-Interpretable-Clustering

2601.09072 2026-06-11 cs.AI cs.CL stat.ME

Human-AI Co-design for Clinical Prediction Models

Jean Feng, Avni Kothari, Patrick Vossler, Andrew Bishara, Lucas Zier, Newton Addo, Aaron Kornblith, Yan Shuo Tan, Chandan Singh

Affiliations: University of California, San Francisco; National University of Singapore; Microsoft Research

DOI: 10.1038/s41746-026-02838-5
Journal ref: npj Digital Medicine 2026

Abstract

Developing safe, effective, and practically useful clinical prediction models (CPMs) traditionally requires iterative collaboration between clinical experts, data scientists, and informaticists. This process refines the often small but critical details of the model building process, such as which features/patients to include and how clinical categories should be defined. However, this traditional collaboration process is extremely time- and resource-intensive, resulting in only a small fraction of CPMs reaching clinical practice. This challenge intensifies when teams attempt to incorporate unstructured clinical notes, which can contain an enormous number of concepts. To address this challenge, we introduce HACHI, an iterative human-in-the-loop framework that uses AI agents to accelerate the development of fully interpretable CPMs by enabling the exploration of concepts in clinical notes. HACHI alternates between (i) an AI agent rapidly exploring and evaluating candidate concepts in clinical notes and (ii) clinical and domain experts providing feedback to improve the CPM learning process. HACHI defines concepts as simple yes-no questions that are used in linear models, allowing the clinical AI team to transparently review, refine, and validate the CPM learned in each round. In two real-world prediction tasks (acute kidney injury and traumatic brain injury), HACHI outperforms existing approaches, surfaces new clinically relevant concepts not included in commonly-used CPMs, and improves model generalizability across clinical sites and time periods. Furthermore, HACHI reveals the critical role of the clinical AI team, such as directing the AI agent to explore concepts that it had not previously considered, adjusting the granularity of concepts it considers, changing the objective function to better align with the clinical objectives, and identifying issues of data bias and leakage.

2601.07436 2026-06-11 eess.SP cs.LG physics.optics

PIDT: Physics-Informed Digital Twin for Optical Fiber Parameter Estimation

Zicong Jiang, Magnus Karlsson, Erik Agrell, Christian Häger

Affiliations: Dept. of Electrical Engineering, Chalmers Univ. of Technology, Sweden; Dept. of Microtechnology and Nanoscience, Chalmers Univ. of Technology, Sweden

2512.20011 2026-06-11 cs.CV

PaveSync: A Unified and Comprehensive Dataset for Pavement Distress Analysis and Classification

Blessing Agyei Kyem, Joshua Kofi Asamoah, Anthony Dontoh, Andrews Danyo, Eugene Denteh, Armstrong Aboah

Affiliations: University of Ghana

2510.11290 2026-06-11 cs.AI cs.HC

Evolution in Simulation: AI-Agent School with Dual Memory for High-Fidelity Educational Dynamics

Sheng Jin, Haoming Wang, Zhiqi Gao, Yongbo Yang, Bao Chunjia, Chengliang Wang

Affiliations: Guanghua Law School, Zhejiang University; Faculty of Education, East China Normal University; School of Data Science, The Chinese University of Hong Kong, Shenzhen; Department of Electrical and Computer Engineering, University of California San Diego; Institute of Systems Science, National University of Singapore

2412.13841 2026-06-11 cs.CY cs.AI cs.HC

Cultural Dimensions of AI Perception: Charting Expectations, Risks, Benefits, Tradeoffs, and Value in Germany and China

Philipp Brauner, Felix Glawe, Gian Luca Liehner, Luisa Vervier, Martina Ziefle

Affiliations: RWTH Aachen University

2508.11703 2026-06-11 cs.NE cs.LG

Data-Driven Discovery of Interpretable Kalman Filter Variants through Large Language Models and Genetic Programming

Vasileios Saketos, Sebastian Kaltenbach, Sergey Litvinov, Petros Koumoutsakos

Affiliations: University of Reading; University of Cambridge

2508.15943 2026-06-11 cs.AI

T-ILR: a Neurosymbolic Integration for LTLf

Riccardo Andreoni, Andrei Buliga, Alessandro Daniele, Chiara Ghidini, Marco Montali, Massimiliano Ronzani

Affiliations: Fondazione Bruno Kessler; Free University of Bozen-Bolzano; University of Bozen-Bolzano

2503.08379 2026-06-11 cs.IR cs.CL

JurisTCU: A Brazilian Portuguese Information Retrieval Dataset with Query Relevance Judgments

Leandro Carísio Fernandes, Leandro dos Santos Ribeiro, Marcos Vinícius Borela de Castro, Leonardo Augusto da Silva Pacheco, Edans Flávius de Oliveira Sandes

Affiliations: Câmara dos Deputados; Tribunal de Contas da União (TCU)

DOI: 10.1007/s10579-025-09881-w
Comments: 23 pages

Abstract

This paper introduces JurisTCU, a Brazilian Portuguese dataset for legal information retrieval (LIR). The dataset is freely available and consists of 16,045 jurisprudential documents from the Brazilian Federal Court of Accounts, along with 150 queries annotated with relevance judgments. It addresses the scarcity of Portuguese-language LIR datasets with query relevance annotations. The queries are organized into three groups: real user keyword-based queries, synthetic keyword-based queries, and synthetic question-based queries. Relevance judgments were produced through a hybrid approach combining LLM-based scoring with expert domain validation. We used JurisTCU in 14 experiments using lexical search (document expansion methods) and semantic search (BERT-based and OpenAI embeddings). We show that the document expansion methods significantly improve the performance of standard BM25 search on this dataset, with improvements exceeding 45% in P@10, R@10, and nDCG@10 metrics when evaluating short keyword-based queries. Among the embedding models, the OpenAI models produced the best results, with improvements of approximately 70% in P@10, R@10, and nDCG@10 metrics for short keyword-based queries, suggesting that these dense embeddings capture semantic relationships in this domain, surpassing the reliance on lexical terms. Besides offering a dataset for the Portuguese-language IR research community, suitable for evaluating search systems, the results also contribute to enhancing a search system highly relevant to Brazilian citizens.
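The P@10, R@10, and nDCG@10 metrics named in the abstract above can be sketched as follows. This is a generic illustration, not the paper's evaluation code: the function names are hypothetical, precision and recall assume binary relevance, and nDCG assumes graded gains with the common log2 position discount. The abstract does not state which variants the authors used.

```python
import math
from typing import Mapping, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Share of the top-k results that are relevant (denominator is k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Share of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def ndcg_at_k(ranked: Sequence[str], gains: Mapping[str, float], k: int = 10) -> float:
    """DCG of the ranking divided by the DCG of an ideal ranking.

    gains maps document id -> graded relevance; unjudged documents score 0.
    Assumes the ranking contains no duplicate document ids.
    """
    dcg = sum(gains.get(doc, 0.0) / math.log2(pos + 2)
              for pos, doc in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return 0.0 if idcg == 0 else dcg / idcg


# Toy query: two relevant documents, one of them ranked third.
ranking = ["d1", "d2", "d3"]
judged = {"d1": 1.0, "d3": 1.0}
p_at_2 = precision_at_k(ranking, set(judged), k=2)
r_at_2 = recall_at_k(ranking, set(judged), k=2)
ndcg_at_3 = ndcg_at_k(ranking, judged, k=3)
```

A relative improvement such as "exceeding 45%" would then be computed as `(new - baseline) / baseline` on the per-query averages of these scores.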

2505.11308 2026-06-11 cs.LG physics.comp-ph

Reinforcement Learning Closures for Underresolved Partial Differential Equations using Synthetic Data

Lothar Heimbach, Sebastian Kaltenbach, Petr Karnakov, Francis J. Alexander, Petros Koumoutsakos

Affiliations: ETH Zurich/Harvard University; Harvard University; Argonne National Laboratory

2407.08035 2026-06-11 cs.CL cs.IR

FsPONER: Few-shot Prompt Optimization for Named Entity Recognition in Domain-specific Scenarios

Yongjian Tang, Rakebul Hasan, Thomas Runkler

Affiliations: Technical University of Munich; Siemens AG

2411.10077 2026-06-11 cs.CV

Hierarchical Mutual Distillation for Multi-View Fusion: Learning from All Possible View Combinations

Jiwoong Yang, Haejun Chung, Ikbeom Jang

Affiliations: Hanyang University; Hankuk University of Foreign Studies

2502.09084 2026-06-11 cs.CR cs.LG cs.NI

Application of Tabular Transformer Architectures for Operating System Fingerprinting

Rubén Pérez-Jove, Cristian R. Munteanu, Alejandro Pazos, Jose Vázquez-Naya

Affiliations: RNASNA-IMEDIR Research Group, Department of Computer Science and Information Technologies, Facultad de Informática, Universidade da Coruña; CITIC Research Centre, Universidade da Coruña; IKERDATA S.L

2502.07990 2026-06-11 cs.LG physics.comp-ph physics.flu-dyn

Learning Effective Dynamics across Spatio-Temporal Scales of Complex Flows

Han Gao, Sebastian Kaltenbach, Petros Koumoutsakos

Affiliations: Harvard SEAS

2402.00972 2026-06-11 cs.LG cs.MA physics.comp-ph

Closure Discovery for Coarse-Grained Partial Differential Equations Using Grid-based Reinforcement Learning

Jan-Philipp von Bassewitz, Sebastian Kaltenbach, Petros Koumoutsakos

Affiliations: ETH Zurich; Harvard SEAS

2412.12231 2026-06-11 cs.RO cs.LG

Demonstrating Data-to-Knowledge Pipelines for Connecting Production Sites in the World Wide Lab

Leon Gorißen, Jan-Niklas Schneider, Mohamed Behery, Philipp Brauner, Moritz Lennartz, David Kötter, Thomas Kaster, Oliver Petrovic, Christian Hinke, Thomas Gries, Gerhard Lakemeyer, Martina Ziefle, Christian Brecher, Constantin Häfner

Affiliations: Chair for Laser Technology, RWTH Aachen University; Knowledge Based Systems Group, RWTH Aachen University; Communication Science, RWTH Aachen University; Chair of Textile Technology, RWTH Aachen University; Laboratory for Machine Tools and Production Engineering, RWTH Aachen University; Human Computer Interaction Center, RWTH Aachen University; Fraunhofer Institute for Laser Technology

2406.07909 2026-06-11 eess.AS cs.CL cs.SD stat.ML

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

Eungbeom Kim, Hantae Kim, Kyogu Lee

Affiliations: KAIST

2305.13108 2026-06-11 eess.AS cs.CL cs.LG cs.SD

Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test

Eungbeom Kim, Yunkee Chae, Jaeheon Sim, Kyogu Lee

Affiliations: Institute of Information & communications Technology Planning & Evaluation (IITP)

2606.12373 2026-06-11 cs.CL New submission

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

Hao Xiang, Qiaoyu Tang, Le Yu, Yaojie Lu, Xianpei Han, Ben He, Le Sun, Bowen Yu, Peng Wang, Hongyu Lin, Dayiheng Liu

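AI summary: Proposes RACES, a framework that treats verifiable environments as recursively composable building blocks and automatically generates composite environments through four composition operators. It improves DeepSeek-R1-Distill-Qwen-14B by an average of 3.1 points on six unseen benchmarks, and with only 50 base environments it matches the performance of training on 300 individual environments.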

Abstract

Reinforcement Learning (RL) with verifiable environments has emerged as a powerful approach for enhancing the reasoning capabilities of Large Language Models (LLMs). While prior research demonstrates that scaling environment quantity improves RL performance, existing manual or individual construction methods suffer from linear scaling limits, thereby hindering scalable reasoning generalization. This paper introduces RACES (Recursive Automated Composition for Environment Scaling), a framework that conceptualizes verifiable environments as composable building blocks that can be recursively assembled. The key insight is that when the codomain (output type) of one environment matches the domain (input type) of another, they can be automatically fused into a new verifiable environment, enabling recursive composition. RACES is implemented with 300 individual environments and defines a set of composition operators (SEQUENTIAL, PARALLEL, SORT, and SELECT) that induce diverse reasoning patterns. Extensive experiments show that RL training on these composite environments consistently enhances reasoning generalization. Specifically, RACES improves DeepSeek-R1-Distill-Qwen-14B by an average of 3.1 points (from 48.2 to 51.3) and boosts Qwen3-14B performance from 58.8 to 61.1 on six benchmarks, which are unseen during the construction of training environments. Moreover, RACES achieves performance comparable to training on 300 individual environments using only 50 base environments, demonstrating significant efficiency in environment utilization.
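The key insight in the abstract above, that two environments can be fused when one's codomain matches the other's domain, can be illustrated with a small type-checked composition. This is a sketch under stated assumptions and not the RACES implementation: the `Env` class, the `sequential` function, and the toy environments are hypothetical, and only a sequential-style operator is shown because the abstract does not define the semantics of PARALLEL, SORT, or SELECT.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Env:
    """Hypothetical verifiable environment: a typed task with a reference solver."""
    name: str
    domain: type                   # input type
    codomain: type                 # output type
    solve: Callable[[Any], Any]    # produces the ground-truth answer

    def verify(self, x: Any, answer: Any) -> bool:
        """Check a candidate answer against the reference solver."""
        return self.solve(x) == answer


def sequential(first: Env, second: Env) -> Env:
    """Fuse two environments when first.codomain matches second.domain."""
    if first.codomain is not second.domain:
        raise TypeError(
            f"cannot compose {first.name} -> {second.name}: "
            f"{first.codomain.__name__} != {second.domain.__name__}"
        )
    return Env(
        name=f"{first.name}>>{second.name}",
        domain=first.domain,
        codomain=second.codomain,
        solve=lambda x: second.solve(first.solve(x)),
    )


# Toy base environments (illustrative only).
count_vowels = Env("count_vowels", str, int,
                   lambda s: sum(ch in "aeiou" for ch in s))
square = Env("square", int, int, lambda n: n * n)
is_even = Env("is_even", int, bool, lambda n: n % 2 == 0)

# Composition is recursive: the result is itself an Env and can be composed again.
composite = sequential(sequential(count_vowels, square), is_even)
```

Because the composite carries its own reference solver, it remains verifiable: `composite.verify("lego", True)` holds since "lego" has two vowels, 2 squared is 4, and 4 is even.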
