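arXivDaily: a daily academic digest that mirrors the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.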

2603.13051 2026-03-16 cs.LG q-bio.QM

Causal Cellular Context Transfer Learning (C3TL): An Efficient Architecture for Prediction of Unseen Perturbation Effects

Michael Scholkemper, Sach Mukherjee

Comments 12 Pages, 3 figures, Keywords: perturbation prediction, context transfer, lightweight, machine learning

Abstract

Predicting the effects of chemical and genetic perturbations on quantitative cell states is a central challenge in computational biology, molecular medicine and drug discovery. Recent work has leveraged large-scale single-cell data and massive foundation models to address this task. However, such computational resources and extensive datasets are not always accessible in academic or clinical settings, hence limiting utility. Here we propose a lightweight framework for perturbation effect prediction that exploits the structured nature of biological interventions and specific inductive biases/invariances. Our approach leverages available information concerning perturbation effects to allow generalization to novel contexts and requires only widely-available bulk molecular data. Extensive testing, comparing predictions of context-specific perturbation effects against real, large-scale interventional experiments, demonstrates accurate prediction in new contexts. The proposed approach is competitive with SOTA foundation models but requires simpler data, much smaller model sizes and less time. Focusing on robust bulk signals and efficient architectures, we show that accurate prediction of perturbation effects is possible without proprietary hardware or very large models, hence opening up ways to leverage causal learning approaches in biomedicine generally.
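The prediction task pairs perturbations with cellular contexts and asks for the effect of a known perturbation in a context where it was never measured. The Python sketch below is not C3TL. It is a naive baseline that assumes effects are invariant across contexts and predicts the held-out effect as the gene-wise mean of the same perturbation's observed effects in other contexts. The data layout (a dict keyed by hypothetical (perturbation, context) pairs holding per-gene bulk effect values) and the function name predict_unseen are illustrative assumptions.

```python
from statistics import fmean

# Hypothetical layout: effects[(perturbation, context)] = per-gene effect values
# (e.g. log fold changes from bulk data), all vectors over the same gene order.
Effects = dict[tuple[str, str], list[float]]


def predict_unseen(effects: Effects, perturbation: str, context: str) -> list[float]:
    """Naive context-invariance baseline (not C3TL): predict the effect of
    `perturbation` in a held-out `context` as the gene-wise mean of its
    observed effects in every other context."""
    seen = [
        values
        for (pert, ctx), values in effects.items()
        if pert == perturbation and ctx != context
    ]
    if not seen:
        raise ValueError(f"no observed contexts for {perturbation!r}")
    # zip(*seen) groups the values of each gene across the observed contexts.
    return [fmean(gene_values) for gene_values in zip(*seen)]
```

A method that exploits context-specific structure, as the abstract describes, would need to beat this kind of invariance baseline in contexts where perturbation effects genuinely differ.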

2603.13045 2026-03-16 cs.CL

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

Yifeng Liu, Siqi Ouyang, Yatish Hosmane Revanasiddappa, Lei Li

Comments Our code is available at https://github.com/LeiLiLab/WALAR

2603.13044 2026-03-16 cs.CV cs.AI

Are General-Purpose Vision Models All We Need for 2D Medical Image Segmentation? A Cross-Dataset Empirical Study

Vanessa Borst, Samuel Kounev

Comments Under review, MICCAI 2026

2603.13038 2026-03-16 cs.CL

Interpretable Semantic Gradients in SSD: A PCA Sweep Approach and a Case Study on AI Discourse

Hubert Plisiecki, Maria Leniarska, Jan Piotrowski, Marcin Zajenkowski

Comments Submitted to ACL 2026

2603.13033 2026-03-16 cs.CV cs.LG cs.RO

ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models

Yanpeng Zhao, Wentao Ding, Hongtao Li, Baoxiong Jia, Zilong Zheng

2603.13027 2026-03-16 cs.CV cs.AI cs.LG

SortScrews: A Dataset and Baseline for Real-time Screw Classification

Tianhao Fu, Bingxuan Yang, Juncheng Guo, Shrena Sribalan, Yucheng Chen

2603.13026 2026-03-16 cs.LG cs.CR

PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses

Chenlong Yin, Runpeng Geng, Yanting Wang, Jinyuan Jia

Comments 26 pages, 3 figures

2603.13024 2026-03-16 cs.CV cs.AI cs.LG eess.IV

SAW: Toward a Surgical Action World Model via Controllable and Scalable Video Generation

Sampath Rapuri, Lalithkumar Seenivasan, Dominik Schneider, Roger Soberanis-Mukul, Yufan He, Hao Ding, Jiru Xu, Chenhao Yu, Chenyan Jing, Pengfei Guo, Daguang Xu, Mathias Unberath

Comments The manuscript is under review

Abstract

A surgical world model capable of generating realistic surgical action videos with precise control over tool-tissue interactions can address fundamental challenges in surgical AI and simulation -- from data scarcity and rare event synthesis to bridging the sim-to-real gap for surgical automation. However, current video generation methods, the very core of such surgical world models, require expensive annotations or complex structured intermediates as conditioning signals at inference, limiting their scalability. Other approaches exhibit limited temporal consistency across complex laparoscopic scenes and do not possess sufficient realism. We propose Surgical Action World (SAW) -- a step toward surgical action world modeling through video diffusion conditioned on four lightweight signals: language prompts encoding tool-action context, a reference surgical scene, a tissue affordance mask, and 2D tool-tip trajectories. We design a conditional video diffusion approach that reformulates video-to-video diffusion into trajectory-conditioned surgical action synthesis. The backbone diffusion model is fine-tuned on a custom-curated dataset of 12,044 laparoscopic clips with lightweight spatiotemporal conditioning signals, leveraging a depth consistency loss to enforce geometric plausibility without requiring depth at inference. SAW achieves state-of-the-art temporal consistency (CD-FVD: 199.19 vs. 546.82) and strong visual quality on held-out test data. Furthermore, we demonstrate its downstream utility for (a) surgical AI, where augmenting rare actions with SAW-generated videos improves action recognition (clipping F1-score: 20.93% to 43.14%; cutting: 0.00% to 8.33%) on real test data, and (b) surgical simulation, where rendering tool-tissue interaction videos from simulator-derived trajectory points toward a visually faithful simulation engine.
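The headline numbers are easier to read as relative changes. CD-FVD is a distance between generated and real video distributions, so lower is better. A short Python check of the arithmetic, using only the figures quoted in the abstract (the helper names relative_reduction and point_gain are hypothetical):

```python
def relative_reduction(new: float, baseline: float) -> float:
    """Fractional reduction of a lower-is-better metric relative to a baseline."""
    return (baseline - new) / baseline


def point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute gain in percentage points."""
    return after_pct - before_pct


cd_fvd = relative_reduction(199.19, 546.82)   # SAW vs. the compared method
clipping_gain = point_gain(20.93, 43.14)      # clipping F1-score, in points
clipping_ratio = 43.14 / 20.93                # clipping F1-score, as a multiple
cutting_gain = point_gain(0.00, 8.33)         # cutting F1-score, in points

print(f"CD-FVD reduction: {cd_fvd:.1%}")                                 # 63.6%
print(f"Clipping F1: +{clipping_gain:.2f} points ({clipping_ratio:.2f}x)")  # +22.21 points (2.06x)
print(f"Cutting F1: +{cutting_gain:.2f} points")                         # +8.33 points
```

Because the cutting F1-score starts at 0.00%, only its absolute gain is meaningful; a ratio is undefined.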

2603.13017 2026-03-16 cs.AI cs.CL cs.IR

Structured Distillation for Personalized Agent Memory: 11x Token Reduction with Retrieval Preservation

Sydney Lewis

Comments 6 figures. Code: https://github.com/Process-Point-Technologies-Corporation/searchat

2603.12998 2026-03-16 cs.CV

A Closed-Form Solution for Debiasing Vision-Language Models with Utility Guarantees Across Modalities and Tasks

Tangzheng Lian, Guanyu Hu, Yijing Ren, Dimitrios Kollias, Oya Celiktutan

2603.12997 2026-03-16 cs.LG cs.CV

Deconstructing the Failure of Ideal Noise Correction: A Three-Pillar Diagnosis

Chen Feng, Zhuo Zhi, Zhao Huang, Jiawei Ge, Ling Xiao, Nicu Sebe, Georgios Tzimiropoulos, Ioannis Patras

Comments Accepted to CVPR 2026

2603.12994 2026-03-16 cs.RO

Route Fragmentation Based on Resource-centric Prioritisation for Efficient Multi-Robot Path Planning in Agricultural Environments

James R. Heselden, Gautham P. Das

Comments This work has been submitted to the IEEE for possible publication

2603.12989 2026-03-16 cs.CV cs.CR

Test-Time Attention Purification for Backdoored Large Vision Language Models

Zhifang Zhang, Bojun Yang, Shuo He, Weitong Chen, Wei Emma Zhang, Olaf Maennel, Lei Feng, Miao Xu

2603.12988 2026-03-16 cs.CV cs.AI

Fair Lung Disease Diagnosis from Chest CT via Gender-Adversarial Attention Multiple Instance Learning

Aditya Parikh, Aasa Feragen

2603.12986 2026-03-16 cs.LG

Retrieval-Enhanced Real Estate Appraisal

Simon Popelier, Matthieu X. B. Sarazin, Maximilien Bohm, Mathieu Gierski, Hanna Mergui, Matthieu Ospici, Adrien Bernhardt

Comments Accepted at NFMCP 2024 workshop (New Frontiers in Mining Complex Patterns), held in conjunction with ECML 2024

2603.12976 2026-03-16 cs.LG cs.CV

SCOPE: Semantic Coreset with Orthogonal Projection Embeddings for Federated Learning

Md Anwar Hossen, Nathan R. Tallent, Luanzheng Guo, Ali Jannesary

2603.12967 2026-03-16 cs.RO

Language-Grounded Decoupled Action Representation for Robotic Manipulation

Wuding Weng, Tongshu Wu, Liucheng Chen, Siyu Xie, Zheng Wang, Xing Xu, Jingkuan Song, Heng Tao Shen

Comments Accepted by CVPR 2026

2603.12963 2026-03-16 cs.CL

Long-form RewardBench: Evaluating Reward Models for Long-form Generation

Hui Huang, Yancheng He, Wei Liu, Muyun Yang, Jiaheng Liu, Kehai Chen, Bing Xu, Conghui Zhu, Hailong Cao, Tiejun Zhao

Comments Accepted by AAAI 2026

2603.12960 2026-03-16 cs.RO cs.AI

Efficient Real-World Autonomous Racing via Attenuated Residual Policy Optimization

Raphael Trumpp, Denis Hoornaert, Mirco Theile, Marco Caccamo

2603.12942 2026-03-16 cs.RO

ReMem-VLA: Empowering Vision-Language-Action Model with Memory via Dual-Level Recurrent Queries

Hang Li, Fengyi Shen, Dong Chen, Liudi Yang, Xudong Wang, Jinkui Shi, Zhenshan Bing, Ziyuan Liu, Alois Knoll

Comments 14 pages, 6 figures

2603.12940 2026-03-16 cs.RO

Coordinated Manipulation of Hybrid Deformable-Rigid Objects in Constrained Environments

Anees Peringal, Anup Teejo Mathew, Panagiotis Liatsis, Federico Renda

Comments 15 pages, 10 figures

2603.12939 2026-03-16 cs.RO

RoboStream: Weaving Spatio-Temporal Reasoning with Memory in Vision-Language Models for Robotics

Yuzhi Huang, Jie Wu, Weijue Bu, Ziyi Xiong, Gaoyang Jiang, Ye Li, Kangye Ji, Shuzhao Xie, Yue Huang, Chenglei Wu, Jingyan Jiang, Zhi Wang

Abstract

Enabling reliable long-horizon robotic manipulation is a crucial step toward open-world embodied intelligence. However, VLM-based planners treat each step as an isolated observation-to-action mapping, forcing them to re-infer scene geometry from raw pixels at every decision point while remaining unaware of how prior actions have reshaped the environment. Despite strong short-horizon performance, these systems lack the spatio-temporal reasoning required for persistent geometric anchoring and memory of action-triggered state transitions. Without persistent state tracking, perceptual errors accumulate across the execution horizon, temporarily occluded objects are catastrophically forgotten, and these compounding failures lead to precondition violations that cascade through subsequent steps. In contrast, humans maintain a persistent mental model that continuously tracks spatial relations and action consequences across interactions rather than reconstructing them at each instant. Inspired by this human capacity for causal spatio-temporal reasoning with persistent memory, we propose RoboStream, a training-free framework that achieves geometric anchoring through Spatio-Temporal Fusion Tokens (STF-Tokens), which bind visual evidence to 3D geometric attributes for persistent object grounding, and maintains causal continuity via a Causal Spatio-Temporal Graph (CSTG) that records action-triggered state transitions across steps. This design enables the planner to trace causal chains and preserve object permanence under occlusion without additional training or fine-tuning. RoboStream achieves 90.5% on long-horizon RLBench and 44.4% on challenging real-world block-building tasks, where both SoFar and VoxPoser score 11.1%, demonstrating that spatio-temporal reasoning and causal memory are critical missing components for reliable long-horizon manipulation.
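A minimal Python sketch of the kind of bookkeeping the abstract describes: a record of action-triggered state transitions that keeps an occluded object's last known 3D state instead of forgetting it, and can trace which actions changed a given object. This is not the RoboStream implementation. The classes ObjectState, Transition and CausalStateGraph, the use of a single 3D position as the geometric attribute, and the simple position-change test are all hypothetical simplifications of STF-Tokens and the CSTG.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectState:
    name: str
    position: tuple[float, float, float]  # hypothetical stand-in for 3D geometric attributes
    visible: bool = True


@dataclass
class Transition:
    step: int
    action: str
    before: dict[str, ObjectState]
    after: dict[str, ObjectState]


@dataclass
class CausalStateGraph:
    """Hypothetical sketch: records action-triggered state transitions."""
    current: dict[str, ObjectState] = field(default_factory=dict)
    history: list[Transition] = field(default_factory=list)

    def update(self, step: int, action: str, observed: dict[str, ObjectState]) -> None:
        before = dict(self.current)
        after = dict(self.current)
        # Object permanence: anything not observed this step keeps its last
        # known state, flagged as not currently visible.
        for name, state in before.items():
            if name not in observed:
                after[name] = ObjectState(state.name, state.position, visible=False)
        after.update(observed)
        self.history.append(Transition(step, action, before, after))
        self.current = after

    def causes_of(self, name: str) -> list[str]:
        """Actions whose transitions introduced or moved the named object."""
        chain = []
        for t in self.history:
            old, new = t.before.get(name), t.after.get(name)
            if new is not None and (old is None or old.position != new.position):
                chain.append(t.action)
        return chain


graph = CausalStateGraph()
graph.update(0, "observe", {
    "red": ObjectState("red", (0.0, 0.0, 0.0)),
    "blue": ObjectState("blue", (1.0, 0.0, 0.0)),
})
# Stacking red on blue hides blue from view; only red is observed.
graph.update(1, "stack red on blue", {"red": ObjectState("red", (1.0, 0.0, 1.0))})
```

After the second update, the occluded lower block stays in graph.current with visible=False at its last known position, and graph.causes_of("red") returns both actions, while graph.causes_of("blue") returns only the initial observation.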

2603.12938 2026-03-16 cs.CV cs.AI

Thinking in Streaming Video

Zikang Liu, Longteng Guo, Handong Li, Ru Zhen, Xingjian He, Ruyi Ji, Xiaoming Ren, Yanhao Zhang, Haonan Lu, Jing Liu

2603.12937 2026-03-16 cs.CV

SGMatch: Semantic-Guided Non-Rigid Shape Matching with Flow Regularization

Tianwei Ye, Xiaoguang Mei, Yifan Xia, Fan Fan, Jun Huang, Jiayi Ma

Comments 27 pages, 13 figures

2603.12936 2026-03-16 cs.RO cs.CV

MotionAnymesh: Physics-Grounded Articulation for Simulation-Ready Digital Twins

WenBo Xu, Liu Liu, Li Zhang, Dan Guo, RuoNan Liu

Comments 5 figures

2603.12933 2026-03-16 cs.AI

Efficient and Interpretable Multi-Agent LLM Routing via Ant Colony Optimization

Xudong Wang, Chaoning Zhang, Jiaquan Zhang, Chenghao Li, Qigan Sun, Sung-Ho Bae, Peng Wang, Ning Xie, Jie Zou, Yang Yang, Hengtao Shen

Comments 11 pages, 3 figures, submitted to IEEE Transactions on Artificial Intelligence

2603.12930 2026-03-16 cs.CV cs.LG

Rethinking VLMs for Image Forgery Detection and Localization

Shaofeng Guo, Jiequan Cui, Richang Hong

Comments 8 pages

2603.12926 2026-03-16 cs.AI cs.LO

ODRL Policy Comparison Through Normalisation

Jaime Osvaldo Salas, Paolo Pareti, George Konstantinidis

Comments Accepted at the 23rd European Semantic Web Conference (ESWC 2026)

2603.12920 2026-03-16 cs.CL stat.ML

HMS-BERT: Hybrid Multi-Task Self-Training for Multilingual and Multi-Label Cyberbullying Detection

Zixin Feng, Xinying Cui, Yifan Sun, Zheng Wei, Jiachen Yuan, Jiazhen Hu, Ning Xin, Md Maruf Hasan

2603.12915 2026-03-16 cs.CV cs.AI

Stake the Points: Structure-Faithful Instance Unlearning

Kiseong Hong, JungKyoo Shin, Eunwoo Kim

Comments Accepted by CVPR 2026