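arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.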

2603.06613 2026-03-11 cs.LG cs.AI cs.CV cs.NE

OptiRoulette Optimizer: A New Stochastic Meta-Optimizer for up to 5.3x Faster Convergence

Stamatis Mastromichalakis

Comments 23 pages, 10 figures, 7 tables

Details

English abstract

This paper presents OptiRoulette, a stochastic meta-optimizer that selects update rules during training instead of fixing a single optimizer. The method combines warmup optimizer locking, random sampling from an active optimizer pool, compatibility-aware learning-rate scaling during optimizer transitions, and failure-aware pool replacement. OptiRoulette is implemented as a drop-in, "torch.optim.Optimizer-compatible" component and packaged for pip installation. We report completed 10-seed results on five image-classification suites: CIFAR-100, CIFAR-100-C, SVHN, Tiny ImageNet, and Caltech-256. Against a single-optimizer AdamW baseline, OptiRoulette improves mean test accuracy from 0.6734 to 0.7656 on CIFAR-100 (+9.22 percentage points), 0.2904 to 0.3355 on CIFAR-100-C (+4.52), 0.9667 to 0.9756 on SVHN (+0.89), 0.5669 to 0.6642 on Tiny ImageNet (+9.73), and 0.5946 to 0.6920 on Caltech-256 (+9.74). Its main advantage is convergence reliability at higher targets: it reaches CIFAR-100/CIFAR-100-C 0.75, SVHN 0.96, Tiny ImageNet 0.65, and Caltech-256 0.62 validation accuracy in 10/10 runs, while the AdamW baseline reaches none of these targets within budget. On shared targets, OptiRoulette also reduces time-to-target (e.g., Caltech-256 at 0.59: 25.7 vs 77.0 epochs). Paired-seed deltas are positive on all datasets; CIFAR-100-C test ROC-AUC is the only metric not statistically significant in the current 10-seed study.
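
The selection logic described in the abstract (warmup locking followed by random sampling from an active pool) can be pictured with a minimal sketch. This is not the OptiRoulette package's API: the function name `roulette_schedule`, its parameters, the per-epoch switching granularity, and the pool contents are all assumptions made for illustration. Learning-rate scaling on transitions and failure-aware pool replacement are omitted.

```python
import random


def roulette_schedule(num_epochs, pool, warmup_optimizer, warmup_epochs, seed=0):
    """Return one optimizer name per epoch (hypothetical helper).

    The first `warmup_epochs` epochs are locked to `warmup_optimizer`;
    every later epoch draws uniformly at random from `pool`.
    """
    rng = random.Random(seed)  # local RNG keeps the schedule reproducible
    schedule = []
    for epoch in range(num_epochs):
        if epoch < warmup_epochs:
            schedule.append(warmup_optimizer)
        else:
            schedule.append(rng.choice(pool))
    return schedule


pool = ["adamw", "sgd", "rmsprop"]  # illustrative pool, not taken from the paper
schedule = roulette_schedule(
    10, pool, warmup_optimizer="adamw", warmup_epochs=3, seed=42
)
print(schedule)
```

In a real training loop each name would map to an optimizer instance that shares the model's parameters; how state is carried across switches is a design choice the abstract does not specify.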

2603.05698 2026-03-11 cs.CL

Towards Robust Retrieval-Augmented Generation Based on Knowledge Graph: A Comparative Analysis

Hazem Amamou, Stéphane Gagnon, Alan Davoust, Anderson R. Avila

Comments Accepted at IEEE SMC 2025 (International Conference on Systems, Man, and Cybernetics)

2603.01549 2026-03-11 cs.CV cs.AI cs.RO

Pri4R: Learning World Dynamics for Vision-Language-Action Models with Privileged 4D Representation

Jisoo Kim, Jungbin Cho, Sanghyeok Chu, Ananya Bal, Jinhyung Kim, Gunhee Lee, Sihaeng Lee, Seung Hwan Kim, Bohyung Han, Hyunmin Lee, Laszlo A. Jeni, Seungryong Kim

2603.01479 2026-03-11 cs.RO

Multimodal Adversarial Quality Policy for Safe Grasping

Kunlin Xie, Chenghao Li, Haolan Zhang, Nak Young Chong

Comments submitted

2602.22755 2026-03-11 cs.CL

AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors

Abhay Sheshadri, Aidan Ewart, Kai Fronsdal, Isha Gupta, Samuel R. Bowman, Sara Price, Samuel Marks, Rowan Wang

2602.16020 2026-03-11 cs.LG cond-mat.mtrl-sci

MolCrystalFlow: Molecular Crystal Structure Prediction via Flow Matching

Cheng Zeng, Harry W. Sullivan, Thomas Egg, Maya M. Martirossyan, Philipp Höllmer, Jirui Jin, Richard G. Hennig, Adrian Roitberg, Stefano Martiniani, Ellad B. Tadmor, Mingjie Liu

Comments 20 pages, 4 figures. 14 pages in SI. Code: https://github.com/Liu-Group-UF/MolCrystalFlow

2602.03733 2026-03-11 cs.CV

RegionReasoner: Region-Grounded Multi-Round Visual Reasoning

Wenfang Sun, Hao Chen, Yingjun Du, Yefeng Zheng, Cees G. M. Snoek

Comments Accepted by ICLR 2026

2601.22444 2026-03-11 cs.LG cs.AI

Automating Forecasting Question Generation and Resolution for AI Evaluation

Nikos I. Bosse, Peter Mühlbacher, Jack Wildman, Lawrence Phillips, Dan Schwarz

Comments 41 pages, 4 figures

2601.20461 2026-03-11 cs.CV

Exploiting the Final Component of Generator Architectures for AI-Generated Image Detection

Yanzhu Liu, Xiao Liu, Yuexuan Wang, Mondal Soumik

2601.11915 2026-03-11 cs.CV

Low-rank Orthogonal Subspace Intervention for Generalizable Face Forgery Detection

Chi Wang, Xinjue Hu, Boyu Wang, Ziwen He, Zhangjie Fu

2512.15943 2026-03-11 cs.AI

Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning

Polaris Jhandi, Owais Kazi, Shreyas Subramanian, Neel Sendas

Comments Accepted at AAAI 2026 Workshop on Agentic AI Benchmarks and Applications for Enterprise Tasks

Details

English abstract

As organizations scale adoption of generative AI, model cost optimization and operational efficiency have emerged as critical factors determining sustainability and accessibility. While Large Language Models (LLMs) demonstrate impressive capabilities across diverse tasks, their extensive computational requirements make them cost-prohibitive for routine enterprise use. This limitation motivates the exploration of Small Language Models (SLMs), which can deliver comparable performance in targeted applications while drastically reducing infrastructure overhead (Irugalbandara et al., 2023). In this work, we investigate the feasibility of replacing LLM-driven workflows with optimized SLMs. We trained a domain-adapted SLM to execute representative tasks traditionally handled by LLMs, such as document summarization, query answering, and structured data interpretation. As part of the experiment, we investigated fine-tuning the facebook/opt-350m model (single epoch only) using Hugging Face TRL (Transformer Reinforcement Learning), specifically its Supervised Fine-Tuning (SFT) trainer. The OPT-350M model was released by Meta AI in 2022 as part of the OPT (Open Pretrained Transformer) family of models. Similar studies demonstrate that even models at the 350M parameter scale can meaningfully contribute to instruction-tuning pipelines (Mekala et al., 2024). Experimental results demonstrated that our fine-tuned SLM achieves exceptional performance with a 77.55% pass rate on the ToolBench evaluation, significantly outperforming all baseline models, including ChatGPT-CoT (26.00%), ToolLLaMA-DFS (30.18%), and ToolLLaMA-CoT (16.27%). These findings emphasize that thoughtful design and targeted training of SLMs can significantly lower barriers to adoption, enabling cost-effective, large-scale integration of generative AI into production systems.

2511.00153 2026-03-11 cs.RO

EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations

Justin Yu, Yide Shentu, Di Wu, Pieter Abbeel, Ken Goldberg, Philipp Wu

2510.24427 2026-03-11 cs.CL

SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models

Ken Gu, Advait Bhat, Mike A Merrill, Robert West, Xin Liu, Daniel McDuff, Tim Althoff

Comments ICLR 2026

2510.15242 2026-03-11 cs.LG

Bradley-Terry Policy Optimization for Generative Preference Modeling

Shengyu Feng, Yun He, Shuang Ma, Beibin Li, Yuanhao Xiong, Songlin Li, Karishma Mandyam, Julian Katz-Samuels, Shengjie Bi, Licheng Yu, Hejia Zhang, Karthik Abinav Sankararaman, Han Fang, Yiming Yang, Manaal Faruqui

2510.08173 2026-03-11 cs.RO cs.AI cs.CL cs.CV

NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions

Haolin Yang, Yuxing Long, Zhuoyuan Yu, Zihan Yang, Minghan Wang, Jiapeng Xu, Yihan Wang, Ziyan Yu, Wenzhe Cai, Lei Kang, Hao Dong

Comments ICRA 2026

2510.06261 2026-03-11 cs.AI cs.CL cs.LG

AlphaApollo: A System for Deep Agentic Reasoning

Zhanke Zhou, Chentao Cao, Xiao Feng, Xuan Li, Zongze Li, Xiangyu Lu, Jiangchao Yao, Weikai Huang, Tian Cheng, Jianghangfan Zhang, Tangyu Jiang, Linrui Xu, Yiming Zheng, Brando Miranda, Tongliang Liu, Sanmi Koyejo, Masashi Sugiyama, Bo Han

Comments Ongoing project

2510.06195 2026-03-11 cs.CL cs.AI cs.LG eess.AS

Latent Speech-Text Transformer

Yen-Ju Lu, Yashesh Gaur, Wei Zhou, Benjamin Muller, Jesus Villalba, Najim Dehak, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Srinivasan Iyer, Duc Le

Comments Accepted to ICLR 2026 (Oral)

2510.03504 2026-03-11 cs.RO

Connectivity Maintenance and Recovery for Multi-Robot Motion Planning

Yutong Wang, Lishuo Pan, Yichun Qu, Tengxiang Wang, Nora Ayanian

2510.02097 2026-03-11 cs.CV

Mapping Historic Urban Footprints in France: Balancing Quality, Scalability and AI Techniques

Walid Rabehi, Marion Le Texier, Rémi Lemoy

2510.00172 2026-03-11 cs.CL

DRBench: A Realistic Benchmark for Enterprise Deep Research

Amirhossein Abaskohi, Tianyi Chen, Miguel Muñoz-Mármol, Curtis Fox, Amrutha Varshini Ramesh, Étienne Marcotte, Xing Han Lù, Nicolas Chapados, Spandana Gella, Peter West, Giuseppe Carenini, Christopher Pal, Alexandre Drouin, Issam H. Laradji

2509.21609 2026-03-11 cs.CV cs.LG

VLCE: A Knowledge-Enhanced Framework for Image Description in Disaster Assessment

Md. Mahfuzur Rahman, Kishor Datta Gupta, Marufa Kamal, Fahad Rahman, Sunzida Siddique, Ahmed Rafi Hasan, Mohd Ariful Haque, Roy George

Comments 28 pages, 30 figures, 1 algorithm

2509.18057 2026-03-11 cs.LG cs.AI cs.CC math.CO

Reinforced Generation of Combinatorial Structures: Hardness of Approximation

Ansh Nagda, Prabhakar Raghavan, Abhradeep Thakurta

Details

English abstract

Can AI-based methods help us make advances in complexity theory? We provide evidence towards answering this in the affirmative, using AlphaEvolve (an LLM code mutation agent) to obtain new results in three settings: a) We improve a recent result of Kunisky and Yu to obtain near-optimal upper and (conditional) lower bounds on certification algorithms for MAX-CUT and MAX-Independent Set on random 3- and 4-regular graphs. Our improved lower bounds are obtained by constructing nearly extremal Ramanujan graphs on as many as $163$ vertices, and our upper bounds are obtained via analytical arguments. b) We obtain new inapproximability results for MAX-4-CUT and MAX-3-CUT, proving that it is NP-hard to approximate them within factors of $0.987$ and $0.9649$ respectively, using AlphaEvolve to discover new gadget reductions. Our MAX-4-CUT result improves upon the SOTA of $0.9883$, and our MAX-3-CUT result improves on the current best gadget-based inapproximability result of $0.9853$, but falls short of the SOTA of $16/17$ that relies on a custom PCP (rather than a reduction from "standard" Håstad-style PCPs). c) Inapproximability for the metric Traveling Salesman Problem (TSP): We show that it is NP-hard to approximate the minimum cost tour within a factor of $111/110$, using AlphaEvolve to discover a new gadget, thus improving the SOTA of $117/116$. Along the way, we provide new modular soundness and completeness arguments that can be of independent interest. A key technical challenge we faced: verifying a candidate construction produced by AlphaEvolve is costly (sometimes requiring time exponential in the size of the construction). We used AlphaEvolve itself to evolve the verification procedure to be faster (sometimes by $10,000\times$ for our gadgets). Our results suggest that gadget-based proofs would benefit from a pass through AI-based tools to obtain stronger results.
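
The direction of each comparison in the abstract is easy to misread, so the following sketch checks the quoted factors with exact arithmetic. It assumes the usual convention: for maximization problems (MAX-k-CUT) a smaller hardness factor is the stronger result, and for a minimization problem (metric TSP) a larger factor is stronger. The variable names are illustrative only.

```python
from fractions import Fraction

# Maximization problems: a smaller NP-hardness factor is a stronger result.
max_4_cut_new = Fraction("0.987")
max_4_cut_prev = Fraction("0.9883")
max_3_cut_new = Fraction("0.9649")
max_3_cut_prev_gadget = Fraction("0.9853")
max_3_cut_sota = Fraction(16, 17)  # custom-PCP result quoted in the abstract

# Minimization problem (metric TSP): a larger factor is a stronger result.
tsp_new = Fraction(111, 110)
tsp_prev = Fraction(117, 116)

improves_max_4_cut = max_4_cut_new < max_4_cut_prev
improves_max_3_cut_gadget = max_3_cut_new < max_3_cut_prev_gadget
beats_max_3_cut_sota = max_3_cut_new < max_3_cut_sota
improves_tsp = tsp_new > tsp_prev

print("MAX-4-CUT improved:", improves_max_4_cut)
print("MAX-3-CUT improved over gadget-based bound:", improves_max_3_cut_gadget)
print("MAX-3-CUT beats 16/17:", beats_max_3_cut_sota)
print("Metric TSP improved:", improves_tsp)
```

Under this convention the checks agree with the abstract's wording: three results improve on the previous bounds, while the MAX-3-CUT factor does not reach 16/17.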

2509.15339 2026-03-11 cs.CL

Quantifying Genuine Awareness in Hallucination Prediction Beyond Question-Side Shortcuts

Yeongbin Seo, Dongha Lee, Jinyoung Yeo

2509.14342 2026-03-11 cs.RO

Multi-Quadruped Cooperative Object Transport: Learning Decentralized Pinch-Lift-Move

Bikram Pandit, Aayam Kumar Shrestha, Alan Fern

Comments Accepted to ICRA 2026. Project page: https://decplm.github.io

2509.00544 2026-03-11 cs.CL

When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment

Hanqi Yan, Hainiu Xu, Siya Qi, Shu Yang, Yulan He

Comments ICLR 2026

2508.17366 2026-03-11 cs.AI cs.CY cs.MA

Computational Multi-Agents Society Experiments: Social Modeling Framework Based on Generative Agents

Hanzhong Zhang, Muhua Huang, Jindong Wang

Comments 20 pages, 3 figures

2508.12296 2026-03-11 cs.RO

A robust and compliant robotic assembly control strategy for batch precision assembly task with uncertain fit types and fit amounts

Bin Wang, Jiwen Zhang, Song Wang, Dan Wu

Details

DOI: 10.1109/TASE.2026.3669249
Journal ref: IEEE Transactions on Automation Science and Engineering, vol. 23, pp. 5502-5515, 2026

English abstract

In some high-precision industrial applications, robots are deployed to perform precision assembly tasks on large batches of manufactured pegs and holes. If the peg and hole are designed with a transition fit, machining errors may lead to either a clearance or an interference fit for a specific pair of components, with uncertain fit amounts. This paper focuses on the robotic batch precision assembly task involving components with uncertain fit types and fit amounts, and proposes an efficient methodology to construct a robust and compliant assembly control strategy. Specifically, the batch precision assembly task is decomposed into multiple deterministic subtasks, and a force-vision fusion controller-driven reinforcement learning method and a multi-task reinforcement learning training method (FVFC-MTRL) are proposed to jointly learn multiple compliance control strategies for these subtasks. Subsequently, a multi-teacher policy distillation approach is designed to integrate the multiple trained strategies into a unified student network, thereby establishing a robust control strategy. Real-world experiments demonstrate that the proposed method successfully constructs a robust control strategy for high-precision assembly tasks with different fit types and fit amounts. Moreover, the MTRL framework significantly improves training efficiency, and the final control strategy achieves superior force compliance and a higher success rate compared with many existing methods.
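
To make the notion of uncertain fit types and fit amounts concrete, here is a minimal sketch that classifies a single peg-hole pair from its measured diameters. The function name `classify_fit`, the micrometre units, the sample diameters, and the choice to treat a zero difference as clearance are all assumptions for illustration and do not come from the paper.

```python
def classify_fit(hole_diameter_um, peg_diameter_um):
    """Return (fit_type, fit_amount_um) for one peg-hole pair.

    Diameters are integers in micrometres so the subtraction is exact.
    A hole at least as large as the peg gives a clearance fit;
    a smaller hole gives an interference fit.
    """
    difference = hole_diameter_um - peg_diameter_um
    if difference >= 0:
        return "clearance", difference
    return "interference", -difference


# Hypothetical measurements for a nominal 10 mm transition fit.
loose_pair = classify_fit(hole_diameter_um=10005, peg_diameter_um=10000)
tight_pair = classify_fit(hole_diameter_um=9998, peg_diameter_um=10002)
print(loose_pair)
print(tight_pair)
```

Two pairs drawn from the same batch can therefore land in different fit types, which is the situation the abstract's control strategy is meant to handle.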

2508.11144 2026-03-11 cs.LG

CTRL Your Shift: Clustered Transfer Residual Learning for Many Small Datasets

Gauri Jain, Dominik Rothenhäusler, Kirk Bansak, Elisabeth Paulson

2507.18311 2026-03-11 cs.CV

Improving Large Vision-Language Models' Understanding for Flow Field Data

Xiaomei Zhang, Hanyu Zheng, Xiangyu Zhu, Jinghuan Wei, Junhong Zou, Zhen Lei, Zhaoxiang Zhang

Comments Accepted by Machine Intelligence Research

Details

DOI: 10.1007/s11633-026-1635-z

English abstract

Large Vision-Language Models (LVLMs) have shown impressive capabilities across a range of tasks that integrate visual and textual understanding, such as image captioning and visual question answering. These models are trained on large-scale image and video datasets paired with text, enabling them to bridge visual perception and natural language processing. However, their application to scientific domains, especially in interpreting complex field data commonly used in the natural sciences, remains underexplored. In this work, we introduce FieldLVLM, a novel framework designed to improve large vision-language models' understanding of field data. FieldLVLM consists of two main components: a field-aware language generation strategy and data-compressed multimodal model tuning. The field-aware language generation strategy leverages a special-purpose machine learning pipeline to extract key physical features from field data, such as flow classification, Reynolds number, and vortex patterns. This information is then converted into structured textual descriptions that serve as a dataset. The data-compressed multimodal model tuning focuses on LVLMs with these generated datasets, using a data compression strategy to reduce the complexity of field inputs and retain only the most informative values. This ensures compatibility with the model's language decoder and guides its learning more effectively. Experimental results on newly proposed benchmark datasets demonstrate that FieldLVLM significantly outperforms existing methods in tasks involving scientific field data. Our findings suggest that this approach opens up new possibilities for applying large vision-language models to scientific research, helping bridge the gap between large models and domain-specific discovery.

2506.15304 2026-03-11 cs.CL cs.AI cs.LG

ConLID: Supervised Contrastive Learning for Low-Resource Language Identification

Negar Foroutan, Jakhongir Saydaliev, Ye Eun Kim, Antoine Bosselut

Comments EACL 2026 - Main Conference