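arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.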

2603.25044 2026-04-10 cs.RO

ThermoAct: Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making

Young-Chae Son, Dae-Kwan Ko, Yoon-Ji Choi, Soo-Chul Lim

Comments 2026 RA-L

DOI: 10.1109/LRA.2026.3678130

Abstract

In recent human-robot collaboration environments, there is a growing focus on integrating diverse sensor data beyond visual information to enable safer and more intelligent task execution. Although thermal data can be crucial for enhancing robot safety and operational efficiency, its integration has been relatively overlooked in prior research. This paper proposes a novel Vision-Language-Action (VLA) framework that incorporates thermal information for robot task execution. The proposed system leverages a Vision-Language Model (VLM) as a high-level planner to interpret complex natural language commands and decompose them into simpler sub-tasks. This approach facilitates efficient data collection and robust reasoning for complex operations. Unlike conventional methods that rely solely on visual data, our approach integrates thermal information, enabling the robot to perceive physical properties and proactively ensure environmental safety. Experimental results from real-world task scenarios validate the feasibility of our proposed framework, suggesting its potential to enhance task success rates and safety compared to existing vision-based systems.

2603.22128 2026-04-10 cs.LG stat.ML

Computationally lightweight classifiers with frequentist bounds on predictions

Shreeram Murali, Cristian R. Rojas, Dominik Baumann

Comments 9 pages plus references, checklist, and appendix (23 pages total). Accepted to AISTATS 2026

2603.21354 2026-04-10 cs.LG cs.DC

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project

Huamin Chen, Xunzhuo Liu, Bowei He, Fuyuan Lyu, Yankai Chen, Xue Liu, Yuhan Liu, Junchen Jiang

Comments Vision Paper

Abstract

Over the past year, the vLLM Semantic Router project has released a series of work spanning: (1) core routing mechanisms -- signal-driven routing, context-length pool routing, router performance engineering, policy conflict detection, low-latency embedding models, category-aware semantic caching, user-feedback-driven routing adaptation, hallucination detection, and hierarchical content-safety classification for privacy and jailbreak protection; (2) fleet optimization -- fleet provisioning and energy-efficiency analysis; (3) agentic and multimodal routing -- multimodal agent routing, tool selection, CUA security, and multi-turn context memory and safety; (4) governance and standards -- inference routing protocols and multi-provider API extensions. Each paper tackled a specific problem in LLM inference, but the problems are not independent; for example, fleet provisioning depends on the routing policy, which depends on the workload mix, shifting as organizations adopt agentic and multimodal workloads. This paper distills those results into the Workload-Router-Pool (WRP) architecture, a three-dimensional framework for LLM inference optimization. Workload characterizes what the fleet serves (chat vs. agent, single-turn vs. multi-turn, warm vs. cold, prefill-heavy vs. decode-heavy). Router determines how each request is dispatched (static semantic rules, online bandit adaptation, RL-based model selection, quality-aware cascading). Pool defines where inference runs (homogeneous vs. heterogeneous GPU, disaggregated prefill/decode, KV-cache topology). We map our prior work onto a 3x3 WRP interaction matrix, identify which cells we have covered and which remain open, and propose twenty-one concrete research directions at the intersections, each grounded in our prior measurements, tiered by maturity from engineering-ready to open research.
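
The three WRP dimensions in the abstract above can be illustrated with a small sketch. It is not code from the vLLM Semantic Router project: the `Request`, `Pool`, and `route` names, the two pool sizes, and the single static rule are all assumptions made for illustration. The rule is a simplified form of context-length pool routing, sending each request to the smallest pool whose context window fits it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Workload dimension: what the fleet is asked to serve (hypothetical fields)."""
    prompt_tokens: int
    max_new_tokens: int


@dataclass(frozen=True)
class Pool:
    """Pool dimension: where inference runs (hypothetical fields)."""
    name: str
    max_context: int


# Illustrative pool sizes, not measurements from the paper.
POOLS = [
    Pool("short-context", max_context=8_192),
    Pool("long-context", max_context=131_072),
]


def route(req, pools=POOLS):
    """Router dimension: a static rule that picks the smallest pool that fits."""
    needed = req.prompt_tokens + req.max_new_tokens
    for pool in sorted(pools, key=lambda p: p.max_context):
        if needed <= pool.max_context:
            return pool.name
    raise ValueError("request exceeds every pool's context window")
```

In this sketch a change in the workload mix (for example, longer agentic prompts) shifts traffic between pools without any change to the rule, which is one way to read the abstract's point that provisioning depends on routing and routing depends on the workload.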

2603.20843 2026-04-10 cs.CL cs.AI cs.LG

HiCI: Hierarchical Construction-Integration for Long-Context Attention

Xiangyu Zeng, Qi Xu, Yunke Wang, Chang Xu

Comments 18 pages, 5 figures

2603.20698 2026-04-10 cs.CV cs.CL

Clinical Cognition Alignment for Gastrointestinal Diagnosis with Multimodal LLMs

Huan Zheng, Yucheng Zhou, Tianyi Yan, Dubing Chen, Hongbo Lu, Wenlong Liao, Tao He, Pai Peng, Jianbing Shen

2603.20114 2026-04-10 cs.CL

Current LLMs still cannot 'talk much' about grammar modules: Evidence from syntax

Mohammed Q. Shormani, Yehia A. AlSohbani

Comments 15 pages

2603.18474 2026-04-10 cs.CL cs.AI

WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior

Haonan Yu, Junhao Liu, Zhenyu Yan, Haoran Lin, Xin Zhang

2603.18472 2026-04-10 cs.AI cs.CV

Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding

Yinghui Li, Jiayi Kuang, Peng Xing, Daixian Liu, Yongheng Zhang, Junnan Dong, Shu-Yu Guo, Yangning Li, Qingyu Zhou, Wenhao Jiang, Hai-Tao Zheng, Ying Shen, Liang Lin, Philip S. Yu

2603.18019 2026-04-10 cs.CL cs.AI cs.SE

BenchBrowser: Retrieving Evidence for Evaluating Benchmark Validity

Harshita Diddee, Gregory Yauney, Swabha Swayamdipta, Daphne Ippolito

2603.16951 2026-04-10 cs.LG

Minimum-Action Learning: Energy-Constrained Symbolic Model Selection for Physical Law Identification from Noisy Data

Martin G. Frasch

Comments 28 pages, 10 figures, https://github.com/martinfrasch/minAction_kepler

2603.16570 2026-04-10 cs.CV

Face2Scene: Using Facial Degradation as an Oracle for Diffusion-Based Scene Restoration

Amirhossein Kazerouni, Maitreya Suin, Tristan Aumentado-Armstrong, Sina Honari, Amanpreet Walia, Iqbal Mohomed, Konstantinos G. Derpanis, Babak Taati, Alex Levinshtein

Comments Accepted at CVPR 2026

2603.16365 2026-04-10 cs.AI

FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment

Qinhong Lin, Ruitao Feng, Yinglun Feng, Zhenxin Huang, Yukun Chen, Zhongliang Yang, Linna Zhou, Binjie Fei, Jiaqi Liu, Yu Li

Comments 26 pages, 10 figures

2603.15118 2026-04-10 cs.CV

VAREX: A Benchmark for Multi-Modal Structured Extraction from Documents

Udi Barzelay, Ophir Azulai, Inbar Shapira, Idan Friedman, Foad Abo Dahood, Madison Lee, Abraham Daniels

Comments 9 pages, 4 figures, 4 tables, plus 12-page supplementary. Dataset: https://huggingface.co/datasets/ibm-research/VAREX Code: https://github.com/udibarzi/varex-bench

2603.14997 2026-04-10 cs.CL cs.AI cs.IR

OrgForge: A Multi-Agent Simulation Framework for Verifiable Synthetic Corporate Corpora

Jeffrey Flynt

Comments v2: Major revision. Recenters the paper on the simulation framework as the primary contribution. System Architecture substantially expanded (CRM state machine, Knowledge Recovery Arc, multi-pathway knowledge gap detection, embedding-based ticket assignment). Introduction restructured for broader framing. RAG retrieval baselines replaced by cross-document consistency evaluation

Abstract

Building and evaluating enterprise AI systems requires synthetic organizational corpora that are internally consistent, temporally structured, and cross-artifact traceable. Existing corpora either carry legal constraints or inherit hallucination artifacts from the generating LLMs, silently corrupting results when timestamps or facts contradict across documents and reinforcing those errors during training. We present OrgForge, an open-source multi-agent simulation framework that enforces a strict physics-cognition boundary: a deterministic Python engine maintains a SimEvent ground-truth bus while LLMs generate only surface prose. OrgForge simulates the organizational processes that produce documents, not the documents themselves. Engineers leave mid-sprint, triggering incident handoffs and CRM ownership lapses. Knowledge gaps emerge when under-documented systems break and recover through organic documentation and incident resolution. Customer emails fire only when simulation state warrants contact; silence is verifiable ground truth. A live CRM state machine extends the physics-cognition boundary to the customer boundary, producing cross-system causal cascades spanning engineering incidents, support escalation, deal risk flagging, and SLA-adjusted invoices. The framework generates fifteen interleaved artifact categories traceable to a shared immutable event log. Four graph-dynamic subsystems govern organizational behavior independently of any LLM. An embedding-based ticket assignment system using the Hungarian algorithm makes the simulation domain-agnostic. An empirical evaluation across ten incidents demonstrates a 0.46 absolute improvement in prose-to-ground-truth fidelity over chained LLM baselines, and isolates a consistent hallucination failure mode in which chaining propagates fabricated facts faithfully across documents without correcting them.
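
The separation the abstract describes, a deterministic engine that owns the facts and a text generator that only supplies wording, can be sketched as follows. This is not OrgForge's code: `SimEvent` is the only name taken from the abstract, and its fields, the `EventLog` class, and the `render` and `template_prose` helpers are hypothetical. A plain template stands in for the LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimEvent:
    """One immutable ground-truth fact (fields are assumptions)."""
    tick: int
    kind: str
    actor: str


class EventLog:
    """Append-only log; only the deterministic engine calls emit()."""

    def __init__(self):
        self._events = []

    def emit(self, kind, actor):
        event = SimEvent(tick=len(self._events), kind=kind, actor=actor)
        self._events.append(event)
        return event

    @property
    def events(self):
        # Return a tuple so callers cannot append to or reorder the log.
        return tuple(self._events)


def render(event, write_prose):
    """Build an artifact whose facts come from the event, not from the writer."""
    return {"source_tick": event.tick, "text": write_prose(event)}


def template_prose(event):
    """Stand-in for an LLM call: produces wording only."""
    return f"[t={event.tick}] {event.actor}: {event.kind}"
```

Because every artifact carries the tick of the event it was rendered from, a document can be checked against the log after the fact, and the absence of an event is itself checkable.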

2603.11633 2026-04-10 cs.CV

MV-SAM3D: Adaptive Multi-View Fusion for Layout-Aware 3D Generation

Baicheng Li, Dong Wu, Jun Li, Shunkai Zhou, Zecui Zeng, Lusong Li, Hongbin Zha

2603.10476 2026-04-10 cs.CL cs.AI

Learning to Negotiate: Multi-Agent Deliberation for Collective Value Alignment in LLMs

Panatchakorn Anantaprayoon, Nataliia Babina, Nima Asgharbeygi, Jad Tarifi

Comments To appear in LREC 2026 2nd DELITE Workshop

2603.10100 2026-04-10 cs.LG cs.AI cs.AR

Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs

Vishal Shashidhar, Anupam Kumari, Roy P Paily

Comments Submitted to IEEE GCON 2026

2603.04791 2026-04-10 cs.AI

Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling

Yong Liu, Xingjian Su, Shiyu Wang, Haoran Zhang, Haixuan Liu, Yuxuan Wang, Zhou Ye, Yang Xiang, Jianmin Wang, Mingsheng Long

2603.04385 2026-04-10 cs.CV cs.AI cs.LG

ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training

Haian Jin, Rundi Wu, Tianyuan Zhang, Ruiqi Gao, Jonathan T. Barron, Noah Snavely, Aleksander Holynski

Comments Project page: https://haian-jin.github.io/ZipMap

2603.04038 2026-04-10 cs.RO

Force-Aware Residual DAgger via Trajectory Editing for Precision Insertion with Impedance Control

Yiou Huang, Ning Ma, Weichu Zhao, Zinuo Liu, Jun Sun, Qiufeng Wang, Yaran Chen

2603.02070 2026-04-10 cs.AI cs.CL cs.HC cs.MA

Exploring Plan Space through Conversation: An Agentic Framework for LLM-Mediated Explanations in Planning

Guilhem Fouilhé, Rebecca Eifler, Antonin Poché, Sylvie Thiébaux, Nicholas Asher

Comments Preprint

2603.01059 2026-04-10 cs.CL

GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant

Zhuokang Shen, Yifan Wang, Hanyu Chen, Yunhang Shen, Wenxuan Huang, Gaoqi He, Jiao Xie, Rongrong Ji, Shaohui Lin

Comments 14 pages, 8 figures

2602.22683 2026-04-10 cs.CV cs.AI

SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses

Zhuohang Jiang, Xu Yuan, Haohao Qu, Shanru Lin, Kanglong Liu, Wenqi Fan, Qing Li

2602.22545 2026-04-10 cs.CV cs.AI

Interpretable Tau-PET Synthesis from Multimodal T1-Weighted and FLAIR MRI Using Partial Information Decomposition Guided Disentangled Quantized Half-UNet

Agamdeep S. Chopra, Caitlin Neher, Tianyi Ren, Juampablo E. Heras Rivera, Hesam Jahanian, Mehmet Kurt

2602.20231 2026-04-10 cs.RO cs.CV

UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models

Manish Kumar Govind, Dominick Reilly, Pu Wang, Srijan Das

Comments https://manishgovind.github.io/unilact-vla/

2602.20223 2026-04-10 cs.LG cs.AI

MultiModalPFN: Extending Prior-Data Fitted Networks for Multimodal Tabular Learning

Wall Kim, Chaeyoung Song, Hanul Kim

Comments Accepted to CVPR 2026

2602.13235 2026-04-10 cs.AI cs.CV

Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains

Yuqi Xiong, Chunyi Peng, Zhipeng Xu, Zhenghao Liu, Zulong Chen, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu

2602.06912 2026-04-10 cs.CV cs.AI

PANC: Prior-Aware Normalized Cut via Anchor-Augmented Token Graphs

Juan Gutiérrez, Victor Gutiérrez-García, José Luis Blanco-Murillo

2601.20524 2026-04-10 cs.CV

AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors

Matic Fučka, Vitjan Zavrtanik, Danijel Skočaj

Comments Accepted to CVPR 2026

2601.16282 2026-04-10 cs.CL cs.AI

Generating Literature-Driven Scientific Theories at Scale

Peter Jansen, Peter Clark, Doug Downey, Daniel S. Weld

Comments 9 pages plus appendix, 3 figures