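arXivDaily daily academic digest: synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.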

2603.18757 2026-03-20 cs.CV

DA-Mamba: Learning Domain-Aware State Space Model for Global-Local Alignment in Domain Adaptive Object Detection

Haochen Li, Rui Zhang, Hantao Yao, Xin Zhang, Yifan Hao, Shaohui Peng, Yongwei Zhao, Ling Li

Comments Accepted by CVPR 2026

Abstract

Domain Adaptive Object Detection (DAOD) aims to transfer detectors from a labeled source domain to an unlabeled target domain. Existing DAOD methods employ multi-granularity feature alignment to learn domain-invariant representations. However, the local connectivity of their CNN-based backbone and detection head restricts alignment to local regions, so they fail to extract global domain-invariant features. Although transformer-based DAOD methods capture global dependencies via attention mechanisms, their quadratic computational cost hinders practical deployment. To solve this, we propose DA-Mamba, a hybrid CNN-State Space Model (SSM) architecture that combines the efficiency of CNNs with the linear-time long-range modeling capability of SSMs to capture both global and local domain-invariant features. Specifically, we introduce two novel modules: Image-Aware SSM (IA-SSM) and Object-Aware SSM (OA-SSM). IA-SSM is integrated into the backbone to enhance global domain awareness, enabling image-level global and local alignment. OA-SSM is inserted into the detection head to model spatial and semantic dependencies among objects, enhancing instance-level alignment. Comprehensive experiments demonstrate that the proposed method can efficiently improve the cross-domain performance of the object detector.

2603.18756 2026-03-20 cs.LG cs.AI cs.CL

Are complicated loss functions necessary for teaching LLMs to reason?

Gabriele Carrino, Andrea Sassella, Nicolo Brunello, Federico Toschi, Mark James Carman

2603.18752 2026-03-20 cs.CV cs.AI

WeNLEX: Weakly Supervised Natural Language Explanations for Multilabel Chest X-ray Classification

Isabel Rio-Torto, Jaime S. Cardoso, Luís F. Teixeira

2603.18750 2026-03-20 cs.CL

Automatic detection of Gen-AI texts: A comparative framework of neural models

Cristian Buttaro, Irene Amerini

2603.18746 2026-03-20 cs.RO

ROFT-VINS: Robust Feature Tracking-based Visual-Inertial State Estimation for Harsh Environment

Sanghyun Park, Soohee Han

Comments 6 pages, published at ICCAS 2024

2603.18743 2026-03-20 cs.AI cs.CL cs.LG

Memento-Skills: Let Agents Design Agents

Huichi Zhou, Siyuan Guo, Anjie Liu, Zhongwei Yu, Ziqin Gong, Bowen Zhao, Zhixun Chen, Menglong Zhang, Yihang Chen, Jinsong Li, Runyu Yang, Qiangbin Liu, Xinlei Yu, Jianmin Zhou, Na Wang, Chunyang Sun, Jun Wang

Comments Memento-Skills Technical Report

2603.18742 2026-03-20 cs.CV

6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models

Rundong Su, Jintao Zhang, Zhihang Yuan, Haojie Duanmu, Jianfei Chen, Jun Zhu

2603.18736 2026-03-20 cs.LG cs.AI cs.CL stat.ML

CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks

Hao Wang, Licheng Pan, Zhichao Chen, Chunyuan Zheng, Zhixuan Chu, Xiaoxi Li, Yuan Lu, Xinggao Liu, Haoxuan Li, Zhouchen Lin

2603.18729 2026-03-20 cs.AI

Analysis Of Linguistic Stereotypes in Single and Multi-Agent Generative AI Architectures

Martina Ullasci, Marco Rondina, Riccardo Coppola, Flavio Giobergia, Riccardo Bellanca, Gabriele Mancari Pasi, Luca Prato, Federico Spinoso, Silvia Tagliente

Abstract

Many works in the literature show that LLM outputs exhibit discriminatory behaviour, triggering stereotype-based inferences based on the dialect in which the inputs are written. This bias has been shown to be particularly pronounced when the same inputs are provided to LLMs in Standard American English (SAE) and African-American English (AAE). In this paper, we replicate existing analyses of dialect-sensitive stereotype generation in LLM outputs and investigate the effects of mitigation strategies, including prompt engineering (role-based and Chain-Of-Thought prompting) and multi-agent architectures composed of generate-critique-revise models. We define eight prompt templates to analyse different ways in which dialect bias can manifest, such as suggested names, jobs, and adjectives for SAE or AAE speakers. We use an LLM-as-judge approach to evaluate the bias in the results. Our results show that stereotype-bearing differences emerge between SAE- and AAE-related outputs across all template categories, with the strongest effects observed in adjective and job attribution. Baseline disparities vary substantially by model, with the largest SAE-AAE differential observed in Claude Haiku and the smallest in Phi-4 Mini. Chain-Of-Thought prompting proved to be an effective mitigation strategy for Claude Haiku, whereas the use of a multi-agent architecture ensured consistent mitigation across all the models. These findings suggest that for intersectionality-informed software engineering, fairness evaluation should include model-specific validation of mitigation strategies, and workflow-level controls (e.g., agentic architectures involving critique models) in high-impact LLM deployments. The current results are exploratory in nature and limited in scope, but can lead to extensions and replications by increasing the dataset size and applying the procedure to different languages or dialects.

2603.18718 2026-03-20 cs.AI

MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution

Minhua Lin, Zhiwei Zhang, Hanqing Lu, Hui Liu, Xianfeng Tang, Qi He, Xiang Zhang, Suhang Wang

Comments 23 pages, 5 figures

2603.18712 2026-03-20 cs.AI

Accurate and Efficient Multi-Channel Time Series Forecasting via Sparse Attention Mechanism

Lei Gao, Hengda Bao, Jingfei Fang, Guangzheng Wu, Weihua Zhou, Yun Zhou

Comments Accepted by ICDE 2026

2603.18707 2026-03-20 cs.LG cs.CV cs.GR

From ex(p) to poly: Gaussian Splatting with Polynomial Kernels

Joerg H. Mueller, Martin Winter, Markus Steinberger

2603.18697 2026-03-20 cs.LG

OCP: Orthogonal Constrained Projection for Sparse Scaling in Industrial Commodity Recommendation

Chen Sun, Beilin Xu, Boheng Tan, Jiacheng Wang, Yuefeng Sun, Rite Bo, Ying He, Yaqiang Zang, Pinghua Gong

Comments 5 pages, 4 figures

2603.18688 2026-03-20 cs.LG cs.CL

STEP: Scientific Time-Series Encoder Pretraining via Cross-Domain Distillation

Chen Zhang, Liwei Liu, Jun Tao, Xiaoyu Yang, Xuenan Xu, Kai Chen, Bowen Zhou, Wen Wu, Chao Zhang

2603.18683 2026-03-20 cs.LG cs.AI cs.CL

HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning

Zhicong Lu, Zichuan Lin, Wei Jia, Changyuan Tian, Deheng Ye, Peiguang Li, Li Jin, Nayu Liu, Guangluan Xu, Wei Feng

Comments Submitted to ACL 2026 on Jan 5, 2026

2603.18680 2026-03-20 cs.LG cs.CR

Revisiting Label Inference Attacks in Vertical Federated Learning: Why They Are Vulnerable and How to Defend

Yige Liu, Dexuan Xu, Zimai Guo, Yongzhi Cao, Hanpin Wang

2603.18678 2026-03-20 cs.SD cs.CL

Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models

Yuchen Su, Shaoxin Zhong, Yonghua Zhu, Ruofan Wang, Zijian Huang, Qiqi Wang, Na Zhao, Diana Benavides-Prado, Michael Witbrock

Comments The paper is currently under review

2603.18676 2026-03-20 cs.AI

MANAR: Memory-augmented Attention with Navigational Abstract Conceptual Representation

Zuher Jahshan, Ben Ben Ishay, Leonid Yavits

Abstract

The MANAR (Memory-augmented Attention with Navigational Abstract Conceptual Representation) contextualization layer generalizes standard multi-head attention (MHA) by instantiating the principles of Global Workspace Theory (GWT). While MHA enables unconstrained all-to-all communication, it lacks the functional bottleneck and global integration mechanisms hypothesized in cognitive models of consciousness. MANAR addresses this by implementing a central workspace through a trainable memory of abstract concepts and an Abstract Conceptual Representation (ACR). The architecture follows a two-stage logic that maps directly to GWT mechanics: (i) an integration phase, where retrieved memory concepts converge to form a collective "mental image" (the ACR) based on input stimuli; and (ii) a broadcasting phase, where this global state navigates and informs the contextualization of individual local tokens. We demonstrate that efficient linear-time scaling is a fundamental architectural byproduct of instantiating the GWT functional bottleneck, as routing global information through a constant-sized ACR resolves the quadratic complexity inherent in standard attention. MANAR is a compatible re-parameterization of MHA with identical semantic roles for its projections, enabling knowledge transfer from pretrained transformers via weight-copy and thus overcoming the adoption barriers of structurally incompatible linear-time alternatives. MANAR enables non-convex contextualization, synthesizing representations that provably lie outside the convex hull of input tokens, a mathematical reflection of the creative synthesis described in GWT. Empirical evaluations confirm that MANAR matches or exceeds strong baselines across language (GLUE score of 85.1), vision (83.9% ImageNet-1K), and speech (2.7% WER on LibriSpeech), positioning it as an efficient and expressive alternative to quadratic attention.
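
The linear-time claim in this abstract rests on routing information through a constant-sized bottleneck instead of letting every token attend to every other token. The sketch below illustrates that general idea as generic two-stage bottleneck attention in plain Python. It is not the authors' MANAR layer: the function names, the unprojected dot-product attention, and the memory size are all assumptions made for illustration. Unlike the non-convex behaviour the abstract reports for MANAR, this simplified version only forms convex combinations of its inputs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    """Scaled dot-product attention: one output row per query."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    width = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(width)])
    return out


def bottleneck_layer(tokens, memory):
    """Hypothetical two-stage layer (illustrative, not MANAR itself).

    tokens: N rows of width d; memory: M rows of width d, with M fixed.
    Each stage computes N * M dot products, so the cost grows linearly
    in N, whereas full self-attention computes N * N of them.
    """
    # Stage 1 (integration): the M memory slots read from all N tokens.
    workspace = attend(memory, tokens, tokens)
    # Stage 2 (broadcast): each token reads from the M-row workspace.
    return attend(tokens, workspace, workspace)


random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
memory = [[random.gauss(0, 1) for _ in range(4)] for _ in range(2)]
contextualized = bottleneck_layer(tokens, memory)
print(len(contextualized), len(contextualized[0]))
```

With 6 tokens and 2 memory slots, each stage evaluates 12 dot products, compared with 36 for all-to-all attention over the same tokens; the gap widens as the sequence grows while the memory size stays fixed.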

2603.18671 2026-03-20 cs.CV

Towards High-Quality Image Segmentation: Improving Topology Accuracy by Penalizing Neighbor Pixels

Juan Miguel Valverde, Dim P. Papadopoulos, Rasmus Larsen, Anders Bjorholm Dahl

Comments Accepted to CVPR 2026

2603.18669 2026-03-20 cs.RO

CSSDF-Net: Safe Motion Planning Based on Neural Implicit Representations of Configuration Space Distance Field

Haohua Chen, Yixuan Zhou, Yifan Zhou, Hesheng Wang

2603.18662 2026-03-20 cs.AI

Thinking with Constructions: A Benchmark and Policy Optimization for Visual-Text Interleaved Geometric Reasoning

Haokun Zhao, Wanshi Xu, Haidong Yuan, Songjun Cao, Long Ma, Yanghua Xiao

2603.18660 2026-03-20 cs.CV

Multimodal Model for Computational Pathology: Representation Learning and Image Compression

Peihang Wu, Zehong Chen, Lijian Xu

2603.18657 2026-03-20 cs.LG

Enhancing Multi-Corpus Training in SSL-Based Anti-Spoofing Models: Domain-Invariant Feature Extraction

Anh-Tuan Dao, Driss Matrouf, Mickael Rouvier, Nicholas Evans

2603.18656 2026-03-20 cs.AI

Balanced Thinking: Improving Chain of Thought Training in Vision Language Models

Shaked Perek, Ben Wiesel, Avihu Dekel, Nimrod Shabtay, Eli Schwartz

2603.18655 2026-03-20 cs.CV cs.AI

Multiscale Switch for Semi-Supervised and Contrastive Learning in Medical Ultrasound Image Segmentation

Jingguo Qu, Xinyang Han, Yao Pu, Man-Lik Chui, Simon Takadiyi Gunda, Ziman Chen, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying

Comments This is the author-submitted LaTeX version with original typesetting. The final published version (with IEEE production formatting and layout changes) is available at http://doi.org/10.1109/TNNLS.2026.3669814 under CC BY 4.0 license

2603.18649 2026-03-20 cs.CV

Click-to-Ask: An AI Live Streaming Assistant with Offline Copywriting and Online Interactive QA

Ruizhi Yu, Keyang Zhong, Peng Liu, Qi Wu, Haoran Zhang, Yanhao Zhang, Chen Chen, Haonan Lu

Comments 4 pages, 2 figures, Accepted at WWW2026 Demos

2603.18645 2026-03-20 cs.CV

MeInTime: Bridging Age Gap in Identity-Preserving Face Restoration

Teer Song, Yue Zhang, Yu Tian, Ziyang Wang, Xianlin Zhang, Guixuan Zhang, Xuan Liu, Xueming Li, Yasen Zhang

2603.18642 2026-03-20 cs.LG

Evaluating Model-Free Policy Optimization in Masked-Action Environments via an Exact Blackjack Oracle

Kevin Song

Comments 23 pages, 2 figures, 3 tables, 6 supplementary figures

2603.18641 2026-03-20 cs.CL

A Comparative Empirical Study of Catastrophic Forgetting Mitigation in Sequential Task Adaptation for Continual Natural Language Processing Systems

Aram Abrahamyan, Sachin Kumar

2603.18631 2026-03-20 cs.AI

D-Mem: A Dual-Process Memory System for LLM Agents

Zhixing You, Jiachen Yuan, Jason Cai