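arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.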

2603.14559 2026-03-17 cs.CV cs.AI cs.IR

A comprehensive multimodal dataset and benchmark for ulcerative colitis scoring in endoscopy

Noha Ghatwary, Jiangbei Yue, Ahmed Elgendy, Hanna Nagdy, Ahmed Galal, Hayam Fathy, Hussein El-Amin, Venkataraman Subramanian, Noor Mohammed, Gilberto Ochoa-Ruiz, Sharib Ali

Comments 11

Details

English abstract

Ulcerative colitis (UC) is a chronic mucosal inflammatory condition that places patients at increased risk of colorectal cancer. Colonoscopic surveillance remains the gold standard for assessing disease activity, and reporting typically relies on standardised endoscopic scoring metrics. The most widely used is the Mayo Endoscopic Score (MES), with some centres also adopting the Ulcerative Colitis Endoscopic Index of Severity (UCEIS). Both are descriptive assessments of mucosal inflammation (MES: 0 to 3; UCEIS: 0 to 8), where higher values indicate more severe disease. However, computational methods for automatically predicting these scores remain limited, largely due to the lack of publicly available expert-annotated datasets and the absence of robust benchmarking. There is also a significant research gap in generating clinically meaningful descriptions of UC images, despite image captioning being a well-established computer vision task. Variability in endoscopic systems and procedural workflows across centres further highlights the need for multi-centre datasets to ensure algorithmic robustness and generalisability. In this work, we introduce a curated multi-centre, multi-resolution dataset that includes expert-validated MES and UCEIS labels, alongside detailed clinical descriptions. To our knowledge, this is the first comprehensive dataset that combines dual scoring metrics for classification tasks with expert-generated captions describing mucosal appearance and clinically accepted reasoning for image captioning. This resource opens new opportunities for developing clinically meaningful multimodal algorithms. In addition to the dataset, we also provide benchmarking using convolutional neural networks, vision transformers, hybrid models, and widely used multimodal vision-language captioning algorithms.

2603.14554 2026-03-17 cs.RO

MorFiC: Fixing Value Miscalibration for Zero-Shot Quadruped Transfer

Prakhar Mishra, Amir Hossain Raj, Xuesu Xiao, Dinesh Manocha

2603.14550 2026-03-17 cs.LG

Learning to Order: Task Sequencing as In-Context Optimization

Jan Kobiolka, Christian Frey, Arlind Kadra, Gresa Shala, Josif Grabocka

Comments Under Review

2603.14541 2026-03-17 cs.AI cs.IR

Expert Mind: A Retrieval-Augmented Architecture for Expert Knowledge Preservation in the Energy Sector

Diego Ezequiel Cervera

Comments 6 pages, 1 figure, conceptual architecture paper on retrieval-augmented expert knowledge systems

2603.14536 2026-03-17 cs.CV

Distilling Latent Manifolds: Resolution Extrapolation by Variational Autoencoders

Jiaming Chu, Tao Wang, Lei Jin

2603.14535 2026-03-17 cs.LG cs.AI cs.RO

Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms

Jingyi Liu, Jian Guo, Eberhard Gill

Comments Revised manuscript, submitted to Acta Astronautica

2603.14531 2026-03-17 cs.AI

Emotional Cost Functions for AI Safety: Teaching Agents to Feel the Weight of Irreversible Consequences

Pandurang Mopgar

2603.14529 2026-03-17 cs.RO

Bots and Blocks: Presenting a project-based approach for robotics education

Tobias Geger, Dominique Briechle, Andreas Rausch

Comments 12 pages, 3 figures, 23 references

2603.14528 2026-03-17 cs.CV cs.RO

Interp3R: Continuous-time 3D Geometry Estimation with Frames and Events

Shuang Guo, Filbert Febryanto, Lei Sun, Guillermo Gallego

Comments 18 pages, 6 figures, 5 tables

2603.14526 2026-03-17 cs.CV

LatSearch: Latent Reward-Guided Search for Faster Inference-Time Scaling in Video Diffusion

Zengqun Zhao, Ziquan Liu, Yu Cao, Shaogang Gong, Zhensong Zhang, Jifei Song, Jiankang Deng, Ioannis Patras

Comments Project page: see https://zengqunzhao.github.io/LatSearch

Details

English abstract

The recent success of inference-time scaling in large language models has inspired similar explorations in video diffusion. In particular, motivated by the existence of "golden noise" that enhances video quality, prior work has attempted to improve inference by optimising or searching for better initial noise. However, these approaches have notable limitations: they either rely on priors imposed at the beginning of noise sampling or on rewards evaluated only on the denoised and decoded videos. This leads to error accumulation, delayed and sparse reward signals, and prohibitive computational cost, which prevents the use of stronger search algorithms. Crucially, stronger search algorithms are precisely what could unlock substantial gains in controllability, sample efficiency and generation quality for video diffusion, provided their computational cost can be reduced. To fill in this gap, we enable efficient inference-time scaling for video diffusion through latent reward guidance, which provides intermediate, informative and efficient feedback along the denoising trajectory. We introduce a latent reward model that scores partially denoised latents at arbitrary timesteps with respect to visual quality, motion quality, and text alignment. Building on this model, we propose LatSearch, a novel inference-time search mechanism that performs Reward-Guided Resampling and Pruning (RGRP). In the resampling stage, candidates are sampled according to reward-normalised probabilities to reduce over-reliance on the reward model. In the pruning stage, applied at the final scheduled step, only the candidate with the highest cumulative reward is retained, improving both quality and efficiency. We evaluate LatSearch on the VBench-2.0 benchmark and demonstrate that it consistently improves video generation across multiple evaluation dimensions compared to the baseline Wan2.1 model.
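The resampling and pruning stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the softmax normalisation, and the `temperature` parameter are assumptions made for illustration, since the abstract only states that candidates are sampled according to reward-normalised probabilities and that the candidate with the highest cumulative reward is kept at the final scheduled step.

```python
import math
import random


def resample_by_reward(candidates, rewards, k, temperature=1.0, rng=None):
    """Draw k candidates (with replacement) with probability proportional to
    softmax(reward / temperature).

    Softmax is an assumed normalisation; temperature must be positive.
    Sampling, rather than taking the top-k, keeps some lower-reward
    candidates alive, which limits over-reliance on the reward model.
    """
    rng = rng or random.Random()
    top = max(rewards)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp((r - top) / temperature) for r in rewards]
    return rng.choices(candidates, weights=weights, k=k)


def prune_to_best(candidates, cumulative_rewards):
    """Keep only the candidate with the highest cumulative reward."""
    best = max(range(len(candidates)), key=lambda i: cumulative_rewards[i])
    return candidates[best]
```

In a real search loop the candidates would be partially denoised latents and the rewards would come from a latent reward model; here they are plain Python values so the two steps can be read in isolation.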

2603.14525 2026-03-17 cs.CL cs.AI

MALicious INTent Dataset and Inoculating LLMs for Enhanced Disinformation Detection

Arkadiusz Modzelewski, Witold Sosnowski, Eleni Papadopulos, Elisa Sartori, Tiziano Labruna, Giovanni Da San Martino, Adam Wierzbicki

Comments Paper accepted to EACL 2026 Main Conference

2603.14524 2026-03-17 cs.RO

Architecting Autonomy for Safe Microgravity Free-Flyer Inspection

Keenan Albee, David C. Sternberg, Alexander Hansson, David Schwartz, Ritwik Majumdar, Oliver Jia-Richards

Comments 10 pages, 6 figures, published in the Proceedings of the 2025 IEEE Aerospace Conference

Details

DOI: 10.1109/AERO63441.2025.11068557
Journal ref: 2025 IEEE Aerospace Conference, Big Sky, MT, USA, 2025, pp. 1-10

English abstract

Small free-flying spacecraft can provide vital extravehicular activity (EVA) services like inspection and repair for future orbital outposts like the Lunar Gateway. Operating adjacent to delicate space station and microgravity targets, these spacecraft require formalization to describe the autonomy that a free-flyer inspection mission must provide. This work explores the transformation of general mission requirements for this class of free-flyer into a set of concrete decisions for the planning and control autonomy architectures that will power such missions. Flowing down from operator commands for inspection of important regions and mission time-criticality, a motion planning problem emerges that provides the basis for developing autonomy solutions. Unique constraints are considered such as velocity limitations, pointing, and keep-in/keep-out zones, with mission fallback techniques for providing hierarchical safety guarantees under model uncertainties and failure. Planning considerations such as cost function design and path vs. trajectory control are discussed. The typical inputs and outputs of the planning and control autonomy stack of such a mission are also provided. Notional system requirements such as solve times and propellant use are documented to inform planning and control design. The entire proposed autonomy framework for free-flyer inspection is realized in the SmallSatSim simulation environment, providing a reference example of free-flyer inspection autonomy. The proposed autonomy architecture serves as a blueprint for future implementations of small satellite autonomous inspection in proximity to mission-critical hardware, going beyond the existing literature in terms of both (1) providing realistic system requirements for an autonomous inspection mission and (2) translating these requirements into autonomy design decisions for inspection planning and control.

2603.14523 2026-03-17 cs.CV cs.AI cs.RO

VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning

Chaoyang Wang, Wenrui Bao, Sicheng Gao, Bingxin Xu, Yu Tian, Yogesh S. Rawat, Yunhao Ge, Yuzhang Shang

Comments We introduce VLA-Thinker, the first VLA model capable of thinking-with-image reasoning, which models visual perception as a dynamically invocable reasoning action, enabling Multimodal Embodied Chain-of-Thought

2603.14522 2026-03-17 cs.RO

One-Policy-Fits-All: Geometry-Aware Action Latents for Cross-Embodiment Manipulation

Juncheng Mu, Sizhe Yang, Hojin Bae, Feiyu Jia, Qingwei Ben, Boyi Li, Huazhe Xu, Jiangmiao Pang

Comments ICRA 2026

2603.14517 2026-03-17 cs.AI cs.LG

Learning to Forget: Sleep-Inspired Memory Consolidation for Resolving Proactive Interference in Large Language Models

Ying Xie

2603.14514 2026-03-17 cs.LG cs.SY eess.SY math.OC stat.ML

High-Probability Bounds for SGD under the Polyak-Lojasiewicz Condition with Markovian Noise

Avik Kar, Siddharth Chandak, Rahul Singh, Eric Moulines, Shalabh Bhatnagar, Nicholas Bambos

Comments Submitted to SIAM Journal on Optimization

2603.14505 2026-03-17 cs.CV

Unlocking the Latent Canvas: Eliciting and Benchmarking Symbolic Visual Expression in LLMs

Yiren Zheng, Shibo Li, Jiaming Liu, Haofan Wang, Yiren Song

2603.14504 2026-03-17 cs.LG cs.AI cs.CV

Trust-Region Noise Search for Black-Box Alignment of Diffusion and Flow Models

Niklas Schweiger, Daniel Cremers, Karnik Ram

Comments Preprint (shorter version accepted at ICLR ReaLM-GEN workshop)

2603.14503 2026-03-17 cs.CV astro-ph.CO

Mapping Dark-Matter Clusters via Physics-Guided Diffusion Models

Diego Royo, Brandon Zhao, Adolfo Muñoz, Diego Gutierrez, Katherine L. Bouman

Comments 22 pages, 7 figures. Project page available at: https://graphics.unizar.es/projects/DarkMatterMapping

2603.14496 2026-03-17 cs.CV cs.LG

Refining 3D Medical Segmentation with Verbal Instruction

Kangxian Xie, Jiancheng Yang, Nandor Pinter, Chao Wu, Behzad Bozorgtabar, Mingchen Gao

2603.14493 2026-03-17 cs.CV cs.CL cs.LG

Fine-tuning MLLMs Without Forgetting Is Easier Than You Think

He Li, Yuhui Zhang, Xiaohan Wang, Kaifeng Lyu, Serena Yeung-Levy

2603.14489 2026-03-17 cs.LG

Predicting Stress-strain Behaviors of Additively Manufactured Materials via Loss-based and Activation-based Physics-informed Machine Learning

Chenglong Duan, Dazhong Wu

Details

English abstract

Predicting the stress-strain behaviors of additively manufactured materials is crucial for part qualification in additive manufacturing (AM). Conventional physics-based constitutive models often oversimplify material properties, while data-driven machine learning (ML) models often lack physical consistency and interpretability. To address these issues, we propose a physics-informed machine learning (PIML) framework to improve the predictive performance and physical consistency for predicting the stress-strain curves of additively manufactured polymers and metals. A polynomial regression model is used to predict the yield point from AM process parameters, and the stress-strain curves are then segmented into elastic and plastic regions. Two long short-term memory (LSTM) models are trained to predict the two regions separately. For the elastic region, Hooke's law is embedded into the LSTM model for both polymer and metal. For the plastic region, the Voce hardening law and Hollomon's law are embedded into the LSTM model for polymer and metal, respectively. The loss-based and activation-based PIML architectures are developed by embedding the physical laws into the loss and activation functions, respectively. The performance of the two PIML architectures is compared with two LSTM-based ML models, three additional ML models, and a physics-based constitutive model. These models are built on experimental data collected from two additively manufactured polymers (i.e., Nylon and carbon fiber-acrylonitrile butadiene styrene) and two additively manufactured metals (i.e., AlSi10Mg and Ti6Al4V). Experimental results demonstrate that the two PIML architectures consistently outperform the other models. The segmental predictive model with the activation-based PIML architecture achieves the lowest MAPE of 10.46 ± 0.81% and the highest R^2 of 0.82 ± 0.05 across the four datasets.
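For reference, the three constitutive laws named in the abstract are commonly written as below. The abstract does not give the exact parameterisation used in the paper, so these are the standard textbook forms, and the composite loss in the last line is only an assumed illustration of what embedding a law into the loss function typically looks like. All symbols are introduced here: $E$ is Young's modulus, $\varepsilon_p$ is plastic strain, $\sigma_0$ is the yield stress, $Q$ and $b$ are Voce parameters, $K$ and $n$ are the Hollomon strength coefficient and hardening exponent, $\hat{\sigma}_i$ is the model prediction, $\sigma_i$ the measured stress, and $\lambda$ a hypothetical weighting factor.

```latex
\begin{align}
  \sigma &= E\,\varepsilon
    && \text{(Hooke's law, elastic region)} \\
  \sigma &= \sigma_0 + Q\left(1 - e^{-b\,\varepsilon_p}\right)
    && \text{(Voce hardening, plastic region)} \\
  \sigma &= K\,\varepsilon_p^{\,n}
    && \text{(Hollomon's law, plastic region)} \\
  \mathcal{L} &= \frac{1}{N}\sum_{i=1}^{N}\left(\hat{\sigma}_i - \sigma_i\right)^2
     + \lambda\,\frac{1}{N}\sum_{i=1}^{N}\left(\hat{\sigma}_i - \sigma_{\text{law}}(\varepsilon_i)\right)^2
    && \text{(assumed loss-based form)}
\end{align}
```

Here $\sigma_{\text{law}}$ stands for whichever of the three laws applies to the region and material being predicted.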

2603.14486 2026-03-17 cs.CL cs.AI

Infinite Problem Generator: Verifiably Scaling Physics Reasoning Data with Agentic Workflows

Aditya Sharan, Sriram Hebbale, Dhruv Kumar

2603.14484 2026-03-17 cs.LG

Unlearning-based sliding window for continual learning under concept drift

Michal Wozniak, Marek Klonowski, Maciej Maczynski, Bartosz Krawczyk

Comments 14 pages, 3 figures

2603.14478 2026-03-17 cs.LG cond-mat.mtrl-sci cs.AI cs.CE

Geometric and Topological Deep Learning for Predicting Thermo-mechanical Performance in Cold Spray Deposition Process Modeling

Akshansh Mishra

Comments 27 pages, 19 figures, 6 tables

2603.14475 2026-03-17 cs.CV

Wi-Spike: A Low-power WiFi Human Multi-action Recognition Model with Spiking Neural Networks

Nengbo Zhang, Yao Ying, Lu Wang, Kaishun Wu, Jieming Ma, Fei Luo

2603.14474 2026-03-17 cs.LG

On the (Generative) Linear Sketching Problem

Xinyu Yuan, Yan Qiao, Zonghui Wang, Wenzhi Chen

Comments 28 figures, 43 pages

2603.14473 2026-03-17 cs.CL

AI Can Learn Scientific Taste

Jingqi Tong, Mingzhe Li, Hangcheng Li, Yongzhuo Yang, Yurong Mou, Weijie Ma, Zhiheng Xi, Hongji Chen, Xiaoran Liu, Qinyuan Cheng, Ming Zhang, Qiguang Chen, Weifeng Ge, Qipeng Guo, Tianlei Ying, Tianxiang Sun, Yining Zheng, Xinchi Chen, Jun Zhao, Ning Ding, Xuanjing Huang, Yugang Jiang, Xipeng Qiu

Comments 44 pages, 4 figures

2603.14468 2026-03-17 cs.CV cs.IR

LongVidSearch: An Agentic Benchmark for Multi-hop Evidence Retrieval Planning in Long Videos

Rongyi Yu, Chenyuan Duan, Wentao Zhang

Comments 12 pages, 2 figures, appendix included

Details

English abstract

Long video question answering (Long-Video QA) increasingly relies on agentic tool use to retrieve evidence from long videos. In realistic settings, this process often requires multi-hop retrieval, where agents must iteratively gather multiple discontinuous evidence clips. However, existing long-video benchmarks are largely static: they rarely enforce strict multi-hop retrieval and typically lack a standardized evidence-access interface, making it difficult to separate failures in retrieval planning from those in answer generation. To address this gap, we introduce LongVidSearch, a benchmark for evaluating agentic multi-hop evidence retrieval planning in long videos under standardized access constraints. LongVidSearch enforces retrieval necessity: a Hop-k question requires exactly k necessary evidence clips, and removing any single clip renders the question unsolvable. The benchmark contains 3,000 questions over 447 long videos (average length 26 minutes), covering four reasoning categories: State Mutation, Causal Inference, Global Summary, and Visual Tracking, with 2-hop, 3-hop, and 4-hop evidence requirements. To ensure fair and controlled evaluation, all agents interact with LongVidSearch through a unified tool interface, which fixes the retrieval backend and isolates the agent's ability to formulate queries and plan iterative retrieval. In addition to answer accuracy, we measure tool-call cost to analyze the accuracy-efficiency trade-off under identical access conditions. We evaluate VideoAgent-style QA agents with multiple backbone LLMs using three-judge majority voting. GPT-5 achieves the highest accuracy (42.43), outperforming Gemini 3 Pro (30.97) and GPT-4o (19.20), yet remaining below 50%, highlighting the difficulty of multi-hop retrieval planning. With gold evidence clips, performance becomes near-perfect, confirming retrieval planning as the primary bottleneck.
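The retrieval-necessity property stated in the abstract (a Hop-k question is solvable with all k clips and unsolvable once any single clip is removed) can be expressed as a small check. This is only a sketch under stated assumptions: `satisfies_retrieval_necessity` and the `is_solvable` predicate are hypothetical names, and how the benchmark actually decides solvability is not described in the abstract.

```python
from typing import Callable, FrozenSet, Hashable, Iterable


def satisfies_retrieval_necessity(
    evidence: Iterable[Hashable],
    is_solvable: Callable[[FrozenSet[Hashable]], bool],
) -> bool:
    """Return True if the full evidence set solves the question and
    dropping any single clip makes it unsolvable.

    `is_solvable` is a hypothetical oracle that says whether a question
    can be answered from a given set of clips.
    """
    clips = frozenset(evidence)
    if not is_solvable(clips):
        return False
    # Every clip must be necessary: removing it must break solvability.
    return all(not is_solvable(clips - {clip}) for clip in clips)


# Toy oracle: the question needs both clip "a" and clip "b".
needs_a_and_b = lambda clips: {"a", "b"} <= clips
two_hop_ok = satisfies_retrieval_necessity(["a", "b"], needs_a_and_b)
redundant_clip = satisfies_retrieval_necessity(["a", "b", "c"], needs_a_and_b)
```

In the toy example the first call passes because both clips are required, while the second fails because clip "c" can be removed without affecting solvability.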

2603.14458 2026-03-17 cs.CL cs.AI cs.IR

Distilling Reasoning Without Knowledge: A Framework for Reliable LLMs

Auksarapak Kietkajornrit, Jad Tarifi, Nima Asgharbeygi