arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2511.21161 2026-03-06 cs.RO

MarketGen: A Scalable Simulation Platform with Auto-Generated Embodied Supermarket Environments

Xu Hu, Yiyang Feng, Junran Peng, Jiawei He, Liyi Chen, Wei Sui, Chuanchen Luo, Xucheng Yin, Qing Li, Zhaoxiang Zhang

Comments Project Page: https://xuhu0529.github.io/MarketGen

详情

英文摘要

The development of embodied agents for complex commercial environments is hindered by a critical gap in existing robotics datasets and benchmarks, which primarily focus on household or tabletop settings with short-horizon tasks. To address this limitation, we introduce MarketGen, a scalable simulation platform with automatic scene generation for complex supermarket environments. MarketGen features a novel agent-based Procedural Content Generation (PCG) framework. It uniquely supports multi-modal inputs (text and reference images) and integrates real-world design principles to automatically generate complete, structured, and realistic supermarkets. We also provide an extensive and diverse 3D asset library with a total of 1100+ supermarket goods and parameterized facilities assets. Building on this generative foundation, we propose a novel benchmark for assessing supermarket agents, featuring two daily tasks in a supermarket: (1) Checkout Unloading: long-horizon tabletop tasks for cashier agents, and (2) In-Aisle Item Collection: complex mobile manipulation tasks for salesperson agents. We validate our platform and benchmark through extensive experiments, including the deployment of a modular agent system and successful sim-to-real transfer. MarketGen provides a comprehensive framework to accelerate research in embodied AI for complex commercial applications.

URL PDF HTML ☆

赞 0 踩 0

2511.21033 2026-03-06 cs.AI

Towards Trustworthy Legal AI through LLM Agents and Formal Reasoning

Linze Chen, Yufan Cai, Zhe Hou, Jin Song Dong

2511.19854 2026-03-06 cs.CV

STAvatar: Soft Binding and Temporal Density Control for Monocular 3D Head Avatars Reconstruction

Jiankuo Zhao, Xiangyu Zhu, Zidu Wang, Zhen Lei

Comments Accepted to CVPR 2026. Project page: https://jiankuozhao.github.io/STAvatar/

2511.18140 2026-03-06 cs.RO cs.CV cs.LG

Observer-Actor: Active Vision Imitation Learning with Sparse-View Gaussian Splatting

Yilong Wang, Cheng Qian, Ruomeng Fan, Edward Johns

Comments Accepted at ICRA 2026. Project Webpage: https://obact.github.io

2511.17929 2026-03-06 cs.CV cs.AI

MambaTAD: When State-Space Models Meet Long-Range Temporal Action Detection

Hui Lu, Yi Yu, Shijian Lu, Deepu Rajan, Boon Poh Ng, Alex C. Kot, Xudong Jiang

2511.14599 2026-03-06 cs.CV cs.AI

CCSD: Cross-Modal Compositional Self-Distillation for Robust Brain Tumor Segmentation with Missing Modalities

Dongqing Xie, Yonghuang Wu, Zisheng Ai, Jun Min, Zhencun Jiang, Shaojin Geng, Lei Wang

Comments 29 pages, 5 figures, 6 tables

2511.11991 2026-03-06 cs.LG

ReCast: Reliability-aware Codebook Assisted Lightweight Time Series Forecasting

Xiang Ma, Taihua Chen, Pengcheng Wang, Xuemei Li, Caiming Zhang

Comments AAAI 2026 Oral

2511.08344 2026-03-06 cs.CV cs.AI cs.HC

SASG-DA: Sparse-Aware Semantic-Guided Diffusion Augmentation For Myoelectric Gesture Recognition

Chen Liu, Can Han, Weishi Xu, Yaqi Wang, Dahong Qian

Comments Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI), 2026

详情

DOI: 10.1109/JBHI.2026.3670388

英文摘要

Surface electromyography (sEMG)-based gesture recognition plays a critical role in human-machine interaction (HMI), particularly for rehabilitation and prosthetic control. However, sEMG-based systems often suffer from the scarcity of informative training data, leading to overfitting and poor generalization in deep learning models. Data augmentation offers a promising approach to increasing the size and diversity of training data, where faithfulness and diversity are two critical factors to effectiveness. However, promoting untargeted diversity can result in redundant samples with limited utility. To address these challenges, we propose a novel diffusion-based data augmentation approach, Sparse-Aware Semantic-Guided Diffusion Augmentation (SASG-DA). To enhance generation faithfulness, we introduce the Semantic Representation Guidance (SRG) mechanism by leveraging fine-grained, task-aware semantic representations as generation conditions. To enable flexible and diverse sample generation, we propose a Gaussian Modeling Semantic Sampling (GMSS) strategy, which models the semantic representation distribution and allows stochastic sampling to produce both faithful and diverse samples. To enhance targeted diversity, we further introduce a Sparse-Aware Semantic Sampling strategy to explicitly explore underrepresented regions, improving distribution coverage and sample utility. Extensive experiments on benchmark sEMG datasets, Ninapro DB2, DB4, and DB7, demonstrate that SASG-DA significantly outperforms existing augmentation methods. Overall, our proposed data augmentation approach effectively mitigates overfitting and improves recognition performance and generalization by offering both faithful and diverse samples.

URL PDF HTML ☆

赞 0 踩 0

2511.04439 2026-03-06 cs.AI cs.LG

CoRPO: Adding a Correctness Bias to GRPO Improves Generalization

Anisha Garg, Claire Zhang, Nishit Neema, David Bick, Ganesh Venkatesh, Joel Hestness

2511.01266 2026-03-06 cs.CV cs.LG

MotionStream: Real-Time Video Generation with Interactive Motion Controls

Joonghyuk Shin, Zhengqi Li, Richard Zhang, Jun-Yan Zhu, Jaesik Park, Eli Shechtman, Xun Huang

Comments ICLR 2026, Project webpage: https://joonghyuk.com/motionstream-web/

2510.22758 2026-03-06 cs.CL

EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models

Li Zhou, Lutong Yu, You Lyu, Yihang Lin, Zefeng Zhao, Junyi Ao, Yuhao Zhang, Benyou Wang, Haizhou Li

Comments Speech Language Models, Spoken Language Understanding, Vocal Cue Perception, Empathetic Dialogue, Benchmark Evaluation; Accepted by ICLR 2026

2510.18876 2026-03-06 cs.CV cs.AI cs.CL

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Haochen Wang, Yuhao Wang, Tao Zhang, Yikang Zhou, Yanwei Li, Jiacong Wang, Jiani Zheng, Ye Tian, Jiahao Meng, Zilong Huang, Guangcan Mai, Anran Wang, Yunhai Tong, Zhuochen Wang, Xiangtai Li, Zhaoxiang Zhang

Comments ICLR 2026 Camera Ready Version

2510.18643 2026-03-06 cs.RO

Least Restrictive Hyperplane Control Barrier Functions

Mattias Trende, Petter Ögren

2510.17276 2026-03-06 cs.LG cs.CR cs.SY eess.SY

Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems

Rishi Jha, Harold Triedman, Justin Wagle, Vitaly Shmatikov

2510.16714 2026-03-06 cs.CV cs.AI

SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes

Xiongkun Linghu, Jiangyong Huang, Ziyu Zhu, Baoxiong Jia, Siyuan Huang

Comments Accepted by ICLR 2026. Project page: https://scenecot.github.io/

2510.16688 2026-03-06 cs.CV cs.AI

Pursuing Minimal Sufficiency in Spatial Reasoning

Yejie Guo, Yunzhong Hou, Wufei Ma, Meng Tang, Ming-Hsuan Yang

2510.14383 2026-03-06 cs.CV

DRBD-Mamba for Robust and Efficient Brain Tumor Segmentation with Analytical Insights

Danish Ali, Ajmal Mian, Naveed Akhtar, Ghulam Mubashar Hassan

详情

英文摘要

Accurate brain tumor segmentation is significant for clinical diagnosis and treatment but remains challenging due to tumor heterogeneity. Mamba-based State Space Models have demonstrated promising performance. However, despite their computational efficiency over other neural architectures, they incur considerable overhead for this task due to their sequential feature computation across multiple spatial axes. Moreover, their robustness across diverse BraTS data partitions remains largely unexplored, leaving a critical gap in reliable evaluation. To address this, we first propose a dual-resolution bi-directional Mamba (DRBD-Mamba), an efficient 3D segmentation model that captures multi-scale long-range dependencies with minimal computational overhead. We leverage a space-filling curve to preserve spatial locality during 3D-to-1D feature mapping, thereby reducing reliance on computationally expensive multi-axial feature scans. To enrich feature representation, we propose a gated fusion module that adaptively integrates forward and reverse contexts, along with a quantization block that improves robustness. We further propose five systematic folds on BraTS2023 for rigorous evaluation of segmentation techniques under diverse conditions and present analysis of common failure scenarios. On the 20% test set used by recent methods, our model achieves Dice improvements of 0.10% for whole tumor, 1.75% for tumor core, and 0.93% for enhancing tumor. Evaluations on the proposed systematic folds demonstrate that our model maintains competitive whole tumor accuracy while achieving clear average Dice gains of 1.16% for tumor core and 1.68% for enhancing tumor over existing state-of-the-art. Furthermore, our model achieves a 15x efficiency improvement while maintaining high segmentation accuracy, highlighting its robustness and computational advantage over existing methods.

URL PDF HTML ☆

赞 0 踩 0

2510.13900 2026-03-06 cs.CL cs.AI

Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences

Julian Minder, Clément Dumas, Stewart Slocum, Helena Casademunt, Cameron Holmes, Robert West, Neel Nanda

Comments ICLR 2026

详情

英文摘要

Finetuning on narrow domains has become an essential tool to adapt Large Language Models (LLMs) to specific tasks and to create models with known unusual properties that are useful for research. We show that narrow finetuning creates strong biases in LLM activations that can be interpreted to understand the finetuning domain. These biases can be discovered using simple tools from model diffing - the study of differences between models before and after finetuning. In particular, analyzing activation differences on the first few tokens of random text and steering by adding this difference to the model activations produces text similar to the format and general content of the finetuning data. We demonstrate that these analyses contain crucial information by creating an LLM-based interpretability agent to understand the finetuning domain. With access to the bias, the agent performs significantly better compared to baseline agents using simple prompting. Our analysis spans synthetic document finetuning for false facts, emergent misalignment, subliminal learning, and taboo word guessing game models across different architectures (Gemma, LLaMA, Qwen) and scales (1B to 32B parameters). We suspect these biases reflect overfitting and find that mixing pretraining data into the finetuning corpus largely removes them, though residual risks may remain. Our work (1) demonstrates that narrowly finetuned models have salient traces of their training objective in their activations and suggests ways to improve how they are trained, (2) warns AI safety and interpretability researchers that the common practice of using such models as a proxy for studying broader finetuning (e.g., chat-tuning) might not be realistic, and (3) highlights the need for deeper investigation into the effects of narrow finetuning and development of truly realistic case studies for model-diffing, safety and interpretability research.

URL PDF HTML ☆

赞 0 踩 0

2510.13454 2026-03-06 cs.CV

Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator

Hyojun Go, Dominik Narnhofer, Goutam Bhat, Prune Truong, Federico Tombari, Konrad Schindler

Comments ICLR 2026 (Oral), Project page: https://gohyojun15.github.io/VIST3A/

2510.13063 2026-03-06 cs.CV cs.AI cs.LG

True Self-Supervised Novel View Synthesis is Transferable

Thomas W. Mitchel, Hyunwoo Ryu, Vincent Sitzmann

2510.12670 2026-03-06 cs.CV

TerraCodec: Compressing Optical Earth Observation Data

Julen Costa-Watanabe, Isabelle Wittmann, Benedikt Blumenstiel, Konrad Schindler

2510.10689 2026-03-06 cs.AI

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

Caorui Li, Yu Chen, Yiyan Ji, Jin Xu, Zhenyu Cui, Shihao Li, Yuanxing Zhang, Wentao Wang, Zhenghao Song, Dingling Zhang, Ying He, Haoxiang Liu, Yuxuan Wang, Qiufeng Wang, Jiafu Tang, Zhenhe Wu, Jiehui Luo, Zhiyu Pan, Weihao Xie, Chenchen Zhang, Zhaohui Wang, Jiayi Tian, Yanghai Wang, Zhe Cao, Minxin Dai, Ke Wang, Runzhe Wen, Yinghao Ma, Yaning Pan, Sungkyun Chang, Termeh Taheri, Haiwen Xia, Christos Plachouras, Emmanouil Benetos, Yizhi Li, Ge Zhang, Jian Yang, Tianhao Peng, Zili Wang, Minghao Liu, Junran Peng, Zhaoxiang Zhang, Jiaheng Liu

2510.10539 2026-03-06 cs.CL

Detecting Hallucinations in Authentic LLM-Human Interactions

Yujie Ren, Niklas Gruhlke, Anne Lauscher

Comments Accepted to LREC 2026

2510.08966 2026-03-06 cs.AI cs.CL

Beyond Prefixes: Graph-as-Memory Cross-Attention for Knowledge Graph Completion with Large Language Models

Ruitong Liu, Boxu Lin, Peize Li, Siyuan Li, Yunjia Wu, Te Sun, Chaohan Wu

2510.07093 2026-03-06 cs.LG stat.ML

Non-Asymptotic Analysis of Efficiency in Conformalized Regression

Yunzhen Yao, Lie He, Michael Gastpar

Comments Published as a conference paper at ICLR 2026

2510.06068 2026-03-06 cs.RO cs.AI

MachaGrasp: Morphology-Aware Cross-Embodiment Dexterous Hand Articulation Generation for Grasping

Heng Zhang, Kevin Yuchen Ma, Mike Zheng Shou, Weisi Lin, Yan Wu

2510.04739 2026-03-06 cs.CV cs.MM

ExposureEngine: Oriented Logo Detection and Sponsor Visibility Analytics in Sports Broadcasts

Mehdi Houshmand Sarkhoosh, Frøy Øye, Henrik Nestor Sørlie, Nam Hoang Vu, Dag Johansen, Cise Midoglu, Tomas Kupka, Pål Halvorsen

Comments This work has been submitted to the IEEE for possible publication

详情

DOI: 10.1109/ISM66958.2025.00059

英文摘要

Quantifying sponsor visibility in sports broadcasts is a critical marketing task traditionally hindered by manual, subjective, and unscalable analysis methods. While automated systems offer an alternative, their reliance on axis-aligned Horizontal Bounding Box (HBB) leads to inaccurate exposuremetrics when logos appear rotated or skewed due to dynamic camera angles and perspective distortions. This paper introduces ExposureEngine, an end-to-end system designed for accurate, rotation-aware sponsor visibility analytics in sports broadcasts, demonstrated in a soccer case study. Our approach predicts Oriented Bounding Box (OBB) to provide a geometrically precise fit to each logo regardless of the orientation on-screen. To train and evaluate our detector, we developed a new dataset comprising 1,103 frames from Swedish elite soccer, featuring 670 unique sponsor logos annotated with OBBs. Our model achieves a mean Average Precision (mAP@0.5) of 0.859, with a precision of 0.96 and recall of 0.87, demonstrating robust performance in localizing logos under diverse broadcast conditions. The system integrates these detections into an analytical pipeline that calculates precise visibility metrics, such as exposure duration and on-screen coverage. Furthermore, we incorporate a language-driven agentic layer, enabling users to generate reports, summaries, and media content through natural language queries. The complete system, including the dataset and the analytics dashboard, provides a comprehensive solution for auditable and interpretable sponsor measurement in sports media. An overview of the ExposureEngine is available online: https://youtu.be/tRw6OBISuW4 .

URL PDF HTML ☆

赞 0 踩 0

2510.03885 2026-03-06 cs.RO

Seeing the Bigger Picture: 3D Latent Mapping for Mobile Manipulation Policy Learning

Sunghwan Kim, Woojeh Chung, Zhirui Dai, Dwait Bhatt, Arth Shukla, Hao Su, Yulun Tian, Nikolay Atanasov

Comments ICRA 2026, project page: https://existentialrobotics.org/sbp_page/

2510.03160 2026-03-06 cs.CV cs.AI

SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus

Ming Zhao, Wenhui Dong, Yang Zhang, Xiang Zheng, Zhonghao Zhang, Zian Zhou, Yunzhi Guan, Liukun Xu, Wei Peng, Zhaoyang Gong, Zhicheng Zhang, Dachuan Li, Xiaosheng Ma, Yuli Ma, Jianing Ni, Changjiang Jiang, Lixia Tian, Qixin Chen, Kaishun Xia, Pingping Liu, Tongshun Zhang, Zhiqiang Liu, Zhongyan Bi, Chenyang Si, Tiansheng Sun, Caifeng Shan

2510.02282 2026-03-06 cs.CV cs.LG

VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL

Kyoungjun Park, Yifan Yang, Juheon Yi, Shicheng Zheng, Yifei Shen, Dongqi Han, Caihua Shan, Muhammad Muaz, Lili Qiu

Comments Accepted to ICLR 2026