arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2603.00184 2026-03-11 cs.CV cs.AI

Zero-Shot and Supervised Bird Image Segmentation Using Foundation Models: A Dual-Pipeline Approach with Grounding DINO~1.5, YOLOv11, and SAM~2.1

Abhinav Munagala

详情

DOI: 10.13140/RG.2.2.31779.11043

英文摘要

Bird image segmentation remains a challenging task in computer vision due to extreme pose diversity, complex plumage patterns, and variable lighting conditions. This paper presents a dual-pipeline framework for binary bird image segmentation leveraging 2025 foundation models. We introduce two operating modes built upon Segment Anything Model 2.1 (SAM 2.1) as a shared frozen backbone: (1) a zero-shot pipeline using Grounding DINO 1.5 to detect birds via the text prompt "bird" before prompting SAM 2.1 with bounding boxes requiring no labelled bird data; and (2) a supervised pipeline that fine-tunes YOLOv11 on the CUB-200-2011 dataset for high-precision detection, again prompting SAM 2.1 for pixel-level masks. The segmentation model is never retrained for new species or domains. On CUB-200-2011 (11,788 images, 200 species), the supervised pipeline achieves IoU 0.912, Dice 0.954, and F1 0.953 outperforming all prior baselines including SegFormer-B2 (IoU 0.842) by +7.0 percentage points. The zero-shot pipeline achieves IoU 0.831 using only a text prompt, the first such result reported on this benchmark. We demonstrate that prompt-based foundation model pipelines outperform task specific end-to-end trained segmentation networks, while requiring only lightweight detector fine-tuning (~1 hour) for domain adaptation. Complete PyTorch implementation, dataset preparation scripts, and trained weights are publicly available.

URL PDF HTML ☆

赞 0 踩 0

2603.00124 2026-03-11 cs.CV cs.AI

OrthoAI: A Neurosymbolic Framework for Evidence-Grounded Biomechanical Reasoning in Clear Aligner Orthodontics

Edouard Lansiaux, Margaux Leman, Mehdi Ammi

2603.00045 2026-03-11 cs.LG cs.AI

Breaking the Factorization Barrier in Diffusion Language Models

Ian Li, Zilei Shao, Benjie Wang, Rose Yu, Guy Van den Broeck, Anji Liu

2602.24235 2026-03-11 cs.RO cs.AI

SafeGen-LLM: Enhancing Safety Generalization in Task Planning for Robotic Systems

Jialiang Fan, Weizhe Xu, Mengyu Liu, Oleg Sokolsky, Insup Lee, Fanxin Kong

Comments 12 pages, 6 figures

2602.19708 2026-03-11 cs.CV

ChimeraLoRA: Multi-Head LoRA-Guided Synthetic Datasets

Hoyoung Kim, Minwoo Jang, Jabin Koo, Sangdoo Yun, Jungseul Ok

2602.18406 2026-03-11 cs.CV cs.LG

Latent Equivariant Operators for Robust Object Recognition: Promises and Challenges

Minh Dinh, Stéphane Deny

Comments Version accepted at GrAM Workshop of ICLR 2026, Tiny Paper Track

2602.18057 2026-03-11 cs.CV

Temporal Consistency-Aware Text-to-Motion Generation

Hongsong Wang, Wenjing Yan, Qiuxia Lai, Xin Geng

Comments Code is on https://github.com/Giat995/TCA-T2M/

2602.17174 2026-03-11 cs.LG cs.AI cs.SY eess.SY

Continual uncertainty learning

Heisei Yonezawa, Ansei Yonezawa, Itsuro Kajiwara

详情

英文摘要

Robust control of mechanical systems with multiple uncertainties remains a fundamental challenge, particularly when nonlinear dynamics and operating-condition variations are intricately intertwined. Although deep reinforcement learning (DRL) combined with domain randomization has shown promise in mitigating the sim-to-real gap, simultaneously handling all the sources of uncertainty often leads to sub-optimal policies and poor learning efficiency. This study formulates a new curriculum-based continual learning framework for robust control problems involving nonlinear dynamical systems in which multiple sources of uncertainty are simultaneously superimposed. The key idea is to decompose a complex control problem with multiple uncertainties into a sequence of continual learning tasks, in which the strategies for handling each uncertainty are acquired sequentially. The original system is extended into a finite set of plants whose dynamic uncertainties are gradually expanded and diversified as learning progresses. The policy is stably updated across the entire plant sets associated with tasks defined by different uncertainty configurations without catastrophic forgetting. To ensure high learning efficiency, we jointly incorporate a model-based controller (MBC), which guarantees a shared baseline performance across the plant sets, into the learning process in order to accelerate the convergence. This residual learning scheme facilitates task-specific optimization of the DRL agent for each uncertainty, thereby enhancing sample efficiency. Finally, this study adopts the proposed method to design an active vibration controller for automotive powertrains as a practical industrial application. We verify that the resulting controller is robust against structural nonlinearities and dynamic variations; thus, it can realize successful sim-to-real transfer.

URL PDF HTML ☆

赞 0 踩 0

2602.15971 2026-03-11 cs.LG cs.AI cs.CV cs.NE

B-DENSE: Branching For Dense Ensemble Network Supervision Efficiency

Cherish Puniani, Tushar Kumar, Arnav Bendre, Gaurav Kumar, Shree Singhi

Comments 11 pages, 5 figures, 4 algorithms and 2 tables. ICLR DeLTa 2026

2602.13015 2026-03-11 cs.CV

Multimodal Classification via Total Correlation Maximization

Feng Yu, Xiangyu Wu, Yang Yang, Jianfeng Lu

Comments Accepted for publication at ICLR 2026; 19 pages; 2 figures

2602.08707 2026-03-11 cs.AI cs.CY cs.HC

Why do we Trust Chatbots? From Normative Principles to Behavioral Drivers

Aditya Gulati, Nuria Oliver

Comments Accepted at the CHI 2026 Workshop on "Understanding, Mitigating, and Leveraging Cognitive Biases to Calibrate Trust in Evolving AI Systems" (https://chi-bias-trust.github.io/)

2602.08220 2026-03-11 cs.CL

Pretraining with Token-Level Adaptive Latent Chain-of-Thought

Boyi Zeng, Yiqin Hao, He Li, Shixiang Song, Feichen Song, Zitong Wang, Siyuan Huang, Yi Xu, ZiWei He, Xinbing Wang, Zhouhan Lin

Comments 15pages

2602.06811 2026-03-11 cs.RO

A 26-Gram Butterfly-Inspired Robot Achieving Autonomous Tailless Flight

Weibin Gu, Chenrui Feng, Lian Liu, Chen Yang, Xingchi Jiao, Yuhe Ding, Xiaofei Shi, Chao Gao, Alessandro Rizzo, Guyue Zhou

2602.05871 2026-03-11 cs.CV

Pathwise Test-Time Correction for Autoregressive Long Video Generation

Xunzhi Xiang, Zixuan Duan, Guiyu Zhang, Haiyu Zhang, Zhe Gao, Junta Wu, Shaofeng Zhang, Tengfei Wang, Qi Fan, Chunchao Guo

2602.05630 2026-03-11 cs.LG cs.CL

Rewards as Labels: Revisiting RLVR from a Classification Perspective

Zepeng Zhai, Meilin Chen, Jiaxuan Zhao, Junlang Qian, Lei Shen, Yuan Lu

2602.02952 2026-03-11 cs.AI

UAT-LITE: Inference-Time Uncertainty-Aware Attention for Pretrained Transformers

Elias Hossain, Shubhashis Roy Dipta, Subash Neupane, Rajib Rana, Ravid Shwartz-Ziv, Ivan Garibay, Niloofar Yousefi

2602.02471 2026-03-11 cs.CV cs.AI physics.med-ph

Multi-head automated segmentation by incorporating detection head into the contextual layer neural network

Edwin Kys, Febian Febian

Comments 8 pages, 3 figures, 1 table

详情

DOI: 10.33140/OAJAST.04.01.10
Journal ref: OA J Applied Sci Technol, 4(1), 01-07 (2026)

英文摘要

Deep learning based auto segmentation is increasingly used in radiotherapy, but conventional models often produce anatomically implausible false positives, or hallucinations, in slices lacking target structures. We propose a gated multi-head Transformer architecture based on Swin U-Net, augmented with inter-slice context integration and a parallel detection head, which jointly performs slice-level structure detection via a multi-layer perceptron and pixel-level segmentation through a context-enhanced stream. Detection outputs gate the segmentation predictions to suppress false positives in anatomically invalid slices, and training uses slice-wise Tversky loss to address class imbalance. Experiments on the Prostate-Anatomical-Edge-Cases dataset from The Cancer Imaging Archive demonstrate that the gated model substantially outperforms a non-gated segmentation-only baseline, achieving a mean Dice loss of $0.013 \pm 0.036$ versus $0.732 \pm 0.314$, with detection probabilities strongly correlated with anatomical presence, effectively eliminating spurious segmentations. In contrast, the non-gated model exhibited higher variability and persistent false positives across all slices. These results indicate that detection-based gating enhances robustness and anatomical plausibility in automated segmentation applications, reducing hallucinated predictions without compromising segmentation quality in valid slices, and offers a promising approach for improving the reliability of clinical radiotherapy auto-contouring workflows.

URL PDF HTML ☆

赞 0 踩 0

2601.22607 2026-03-11 cs.AI cs.CL

From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents

Jiaxuan Gao, Jiaao Chen, Chuyi He, Shusheng Xu, Di Jin, Yi Wu

Comments Submitted to ICML 2026

2601.20601 2026-03-11 cs.CV cs.AI

CLEAR-Mamba:Towards Accurate, Adaptive and Trustworthy Multi-Sequence Ophthalmic Angiography Classification

Zhuonan Wang, Wenjie Yan, Wenqiao Zhang, Xiaohui Song, Jian Ma, Ke Yao, Yibo Yu, Beng Chin Ooi

Comments 12 pages, 7 figures

详情

英文摘要

Medical image classification is a core task in computer-aided diagnosis (CAD), playing a pivotal role in early disease detection, treatment planning, and patient prognosis assessment. In ophthalmic practice, fluorescein fundus angiography (FFA) and indocyanine green angiography (ICGA) provide hemodynamic and lesion-structural information that conventional fundus photography cannot capture. However, due to the single-modality nature, subtle lesion patterns, and significant inter-device variability, existing methods still face limitations in generalization and high-confidence prediction. To address these challenges, we propose CLEAR-Mamba, an enhanced framework built upon MedMamba with optimizations in both architecture and training strategy. Architecturally, we introduce HaC, a hypernetwork-based adaptive conditioning layer that dynamically generates parameters according to input feature distributions, thereby improving cross-domain adaptability. From a training perspective, we develop RaP, a reliability-aware prediction scheme built upon evidential uncertainty learning, which encourages the model to emphasize low-confidence samples and improves overall stability and reliability. We further construct a large-scale ophthalmic angiography dataset covering both FFA and ICGA modalities, comprising multiple retinal disease categories for model training and evaluation. Experimental results demonstrate that CLEAR-Mamba consistently outperforms multiple baseline models, including the original MedMamba, across various metrics-showing particular advantages in multi-disease classification and reliability-aware prediction. This study provides an effective solution that balances generalizability and reliability for modality-specific medical image classification tasks. Our project can be accessed at https://github.com/ZJU4HealthCare/CLEAR-Mamba.

URL PDF HTML ☆

赞 0 踩 0

2601.12667 2026-03-11 cs.AI

Empowering All-in-Loop Health Management of Spacecraft Power System in the Mega-Constellation Era via Human-AI Collaboration

Yi Di, Zhibin Zhao, Fujin Wang, Xue Liu, Jiafeng Tang, Jiaxin Ren, Zhi Zhai, Xuefeng Chen

详情

英文摘要

It is foreseeable that the number of spacecraft will increase exponentially, ushering in an era dominated by satellite mega-constellations (SMC). This necessitates a focus on energy in space: spacecraft power systems (SPS), especially their health management (HM), given their role in power supply and high failure rates. Providing health management for dozens of SPS and for thousands of SPS represents two fundamentally different paradigms. Therefore, to adapt the health management in the SMC era, this work proposes a principle of aligning underlying capabilities (AUC principle) and develops SpaceHMchat, an open-source Human-AI collaboration (HAIC) framework for all-in-loop health management (AIL HM). SpaceHMchat serves across the entire loop of work condition recognition, anomaly detection, fault localization, and maintenance decision making, achieving goals such as conversational task completion, adaptive human-in-the-loop learning, personnel structure optimization, knowledge sharing, efficiency enhancement, as well as transparent reasoning and improved interpretability. Meanwhile, to validate this exploration, a hardware-realistic fault injection experimental platform is established, and its simulation model is built and open-sourced, both fully replicating the real SPS. The corresponding experimental results demonstrate that SpaceHMchat achieves excellent performance across 23 quantitative metrics, such as 100% conclusion accuracy in logical reasoning of work condition recognition, over 99% success rate in anomaly detection tool invocation, over 90% precision in fault localization, and knowledge base search time under 3 minutes in maintenance decision-making. Another contribution of this work is the release of the first-ever AIL HM dataset of SPS. This dataset contains four sub-datasets, involving 4 types of AIL HM sub-tasks, 17 types of faults, and over 700,000 timestamps.

URL PDF HTML ☆

赞 0 踩 0

2601.04664 2026-03-11 cs.CL cs.AI

CRANE: Causal Relevance Analysis of Language-Specific Neurons in Multilingual Large Language Models

Yifan Le, Yunliang Li

Comments 10 pages, 6 figures. Work in progress

2512.24146 2026-03-11 cs.CV

Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning

Chubin Chen, Sujie Hu, Jiashu Zhu, Meiqi Wu, Jintao Chen, Yanxun Li, Nisha Huang, Chengyu Fang, Jiahong Wu, Xiangxiang Chu, Xiu Li

Comments Accepted by CVPR 2026

2512.23851 2026-03-11 cs.CV

Pretraining Frame Preservation for Lightweight Autoregressive Video History Embedding

Lvmin Zhang, Shengqu Cai, Muyang Li, Chong Zeng, Beijia Lu, Anyi Rao, Song Han, Gordon Wetzstein, Maneesh Agrawala

Comments Additional Results: https://lllyasviel.github.io/pfp_gitpage/

2512.22247 2026-03-11 cs.LG

The Affine Divergence: Aligning Activation Updates Beyond Normalisation

George Bird

Comments 30 pages, 10 figures. Accepted for submission to the ICLR 2026 Workshop on "Geometry-grounded Representation Learning and Generative Modeling"

2512.21064 2026-03-11 cs.CV

Multimodal Skeleton-Based Action Representation Learning via Decomposition and Composition

Hongsong Wang, Heng Fei, Bingxuan Dai, Jie Gui

Comments Accepted by Machine Intelligence Research (Journal Impact Factor 8.7, 2024)

2512.17776 2026-03-11 cs.CL

DEER: A Benchmark for Evaluating Deep Research Agents on Expert Report Generation

Janghoon Han, Heegyu Kim, Changho Lee, Dahm Lee, Min Hyung Park, Hosung Song, Stanley Jungkyu Choi, Moontae Lee, Honglak Lee

Comments 39 pages, 10 figures, 16 tables, 123 references

2512.17102 2026-03-11 cs.AI

Reinforcement Learning for Self-Improving Agent with Skill Library

Jiongxiao Wang, Qiaojing Yan, Yawei Wang, Yijun Tian, Soumya Smruti Mishra, Zhichao Xu, Megha Gandhi, Panpan Xu, Lin Lee Cheong

2512.13672 2026-03-11 cs.LG cs.CV

Directional Textual Inversion for Personalized Text-to-Image Generation

Kunhee Kim, NaHyeon Park, Kibeom Hong, Hyunjung Shim

Comments ICLR 2026; Project page: https://kunheek.github.io/dti

2512.13095 2026-03-11 cs.CV cs.LG

ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning

Feng Zhang, Zezhong Tan, Xinhong Ma, Ziqiang Dong, Xi Leng, Jianfei Zhao, Xin Sun, Yang Yang

2512.11609 2026-03-11 cs.RO

UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations

Tingyu Yuan, Biaoliang Guan, Wen Ye, Ziyan Tian, Yi Yang, Weijie Zhou, Zhaowen Li, Yan Huang, Peng Wang, Chaoyang Zhao, Jinqiao Wang