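arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.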

2604.02772 2026-04-06 cs.CL

Multiple-Debias: A Full-process Debiasing Method for Multilingual Pre-trained Language Models

Haoyu Liang, Peijian Zeng, Wentao Huang, Aimin Yang, Dong Zhou

English abstract

Multilingual Pre-trained Language Models (MPLMs) have become essential tools for natural language processing. However, they often exhibit biases related to sensitive attributes such as gender, race, and religion. In this paper, we introduce a comprehensive multilingual debiasing method named Multiple-Debias to address these issues across multiple languages. By incorporating multilingual counterfactual data augmentation and multilingual Self-Debias across both pre-processing and post-processing stages, alongside parameter-efficient fine-tuning, we significantly reduced biases in MPLMs across three sensitive attributes in four languages. We also extended CrowS-Pairs to German, Spanish, Chinese, and Japanese, validating our full-process multilingual debiasing method for gender, racial, and religious bias. Our experiments show that (i) multilingual debiasing methods surpass monolingual approaches in effectively mitigating biases, and (ii) integrating debiasing information from different languages notably improves the fairness of MPLMs.

2604.02770 2026-04-06 cs.AI

Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity

Guoling Zhou, Wenpei Han, Fengqin Yang, Li Wang, Yingcong Zhou, Zhiguo Fu

2604.02766 2026-04-06 cs.LG cs.AI

Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs

Giyeong Oh, Junghyun Lee, Jaehyun Park, Youngjae Yu, Wonho Bae, Junhyug Noh

Comments first commit

2604.02765 2026-04-06 cs.LG

Towards Realistic Class-Incremental Learning with Free-Flow Increments

Zhiming Xu, Baile Xu, Jian Zhao, Furao Shen, Suorong Yang

Comments 15 pages, 5 figures, 3 tables

2604.02764 2026-04-06 cs.CV

InverseDraping: Recovering Sewing Patterns from 3D Garment Surfaces via BoxMesh Bridging

Leyang Jin, Zirong Jin, Zisheng Ye, Haokai Pang, Xiaoguang Han, Yujian Zheng, Hao Li

Comments 13 pages, 13 figures

2604.02759 2026-04-06 cs.RO

OMNI-PoseX: A Fast Vision Model for 6D Object Pose Estimation in Embodied Tasks

Michael Zhang, Wei Ying, Fangwen Chen, Shifeng Bai, Hanwen Kang

2604.02752 2026-04-06 cs.CV

Differentiable Stroke Planning with Dual Parameterization for Efficient and High-Fidelity Painting Creation

Jinfan Liu, Wuze Zhang, Zhangli Hu, Zhehan Zhao, Ye Chen, Bingbing Ni

2604.02748 2026-04-06 cs.CV

Visual Instruction-Finetuned Language Model for Versatile Brain MR Image Tasks

Jonghun Kim, Sinyoung Ra, Hyunjin Park

Comments ICPR 2026 accepted

2604.02745 2026-04-06 cs.RO

Geometrically-Constrained Radar-Inertial Odometry via Continuous Point-Pose Uncertainty Modeling

Wooseong Yang, Dongjae Lee, Minwoo Jung, Ayoung Kim

Comments 8 pages, 8 figures, 6 tables, accepted to RA-L

2604.02744 2026-04-06 cs.RO

Learning Locomotion on Complex Terrain for Quadrupedal Robots with Foot Position Maps and Stability Rewards

Matthew Hwang, Yubin Liu, Ryo Hakoda, Takeshi Oishi

Comments Project page located at https://mhwang003.github.io/footmaplocomotion/

2604.02734 2026-04-06 cs.AI

Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents

Bin Wen, Ruoxuan Zhang, Yang Chen, Hongxia Xie, Lan-Zhe Guo

2604.02733 2026-04-06 cs.AI

DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models

Amit Dhanda

Comments ICLR 2026 Workshop on Logical Reasoning of Large Language Models

2604.02721 2026-04-06 cs.AI

GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

DeepReinforce Team, Xiaoya Li, Xiaofei Sun, Guoyin Wang, Songqiao Su, Chris Shum, Jiwei Li

Comments Tech Report; Pre-print

2604.02719 2026-04-06 cs.CV cs.AI cs.LG

MOMO: Mars Orbital Model Foundation Model for Mars Orbital Applications

Mirali Purohit, Bimal Gajera, Irish Mehta, Bhanu Tokas, Jacob Adler, Steven Lu, Scott Dickenshied, Serina Diniega, Brian Bue, Umaa Rebbapragada, Hannah Kerner

Comments Accepted at CVPR 2026 (Main Track)

2604.02718 2026-04-06 cs.LG cs.CL

Generative Frontiers: Why Evaluation Matters for Diffusion Language Models

Patrick Pynadath, Jiaxin Shi, Ruqi Zhang

2604.02714 2026-04-06 cs.CV

ExploreVLA: Dense World Modeling and Exploration for End-to-End Autonomous Driving

Zihao Sheng, Xin Ye, Jingru Luo, Sikai Chen, Liu Ren

Comments The code and demo will be publicly available at https://zihaosheng.github.io/ExploreVLA/

English abstract

End-to-end autonomous driving models based on Vision-Language-Action (VLA) architectures have shown promising results by learning driving policies through behavior cloning on expert demonstrations. However, imitation learning inherently limits the model to replicating observed behaviors without exploring diverse driving strategies, leaving it brittle in novel or out-of-distribution scenarios. Reinforcement learning (RL) offers a natural remedy by enabling policy exploration beyond the expert distribution. Yet VLA models, typically trained on offline datasets, lack directly observable state transitions, necessitating a learned world model to anticipate action consequences. In this work, we propose a unified understanding-and-generation framework that leverages world modeling to simultaneously enable meaningful exploration and provide dense supervision. Specifically, we augment trajectory prediction with future RGB and depth image generation as dense world modeling objectives, requiring the model to learn fine-grained visual and geometric representations that substantially enrich the planning backbone. Beyond serving as a supervisory signal, the world model further acts as a source of intrinsic reward for policy exploration: its image prediction uncertainty naturally measures a trajectory's novelty relative to the training distribution, where high uncertainty indicates out-of-distribution scenarios that, if safe, represent valuable learning opportunities. We incorporate this exploration signal into a safety-gated reward and optimize the policy via Group Relative Policy Optimization (GRPO). Experiments on the NAVSIM and nuScenes benchmarks demonstrate the effectiveness of our approach, achieving a state-of-the-art PDMS score of 93.7 and an EPDMS of 88.8 on NAVSIM. The code and demo will be publicly available at https://zihaosheng.github.io/ExploreVLA/.

2604.02713 2026-04-06 cs.CL

Breakdowns in Conversational AI: Interactional Failures in Emotionally and Ethically Sensitive Contexts

Jiawen Deng, Wentao Zhang, Ziyun Jiao, Fuji Ren

Comments 22 pages, ACM CHI 2026

2604.02710 2026-04-06 cs.RO cs.AI cs.CV

V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views

Junwei You, Pei Li, Zhuoyu Jiang, Weizhe Tang, Zilin Huang, Rui Gan, Jiaxi Liu, Yan Zhao, Sikai Chen, Bin Ran

English abstract

Multimodal large language models (MLLMs) have shown strong potential for autonomous driving, yet existing benchmarks remain largely ego-centric and therefore cannot systematically assess model performance in infrastructure-centric and cooperative driving conditions. In this work, we introduce V2X-QA, a real-world dataset and benchmark for evaluating MLLMs across vehicle-side, infrastructure-side, and cooperative viewpoints. V2X-QA is built around a view-decoupled evaluation protocol that enables controlled comparison under vehicle-only, infrastructure-only, and cooperative driving conditions within a unified multiple-choice question answering (MCQA) framework. The benchmark is organized into a twelve-task taxonomy spanning perception, prediction, and reasoning and planning, and is constructed through expert-verified MCQA annotation to enable fine-grained diagnosis of viewpoint-dependent capabilities. Benchmark results across ten representative state-of-the-art proprietary and open-source models show that viewpoint accessibility substantially affects performance, and infrastructure-side reasoning supports meaningful macroscopic traffic understanding. Results also indicate that cooperative reasoning remains challenging since it requires cross-view alignment and evidence integration rather than simply additional visual input. To address these challenges, we introduce V2X-MoE, a benchmark-aligned baseline with explicit view routing and viewpoint-specific LoRA experts. The strong performance of V2X-MoE further suggests that explicit viewpoint specialization is a promising direction for multi-view reasoning in autonomous driving. Overall, V2X-QA provides a foundation for studying multi-perspective reasoning, reliability, and cooperative physical intelligence in connected autonomous driving. The dataset and V2X-MoE resources are publicly available at: https://github.com/junwei0001/V2X-QA.

2604.02707 2026-04-06 cs.RO cs.CV cs.SY eess.SY

A Rapid Instrument Exchange System for Humanoid Robots in Minimally Invasive Surgery

Bingcong Zhang, Yihang Lyv, Lianbo Ma, Yushi He, Pengfei Wei, Xingchi Liu, Jinhua Li, Jianchang Zhao, Lizhi Pan

2604.02706 2026-04-06 cs.RO

ALIVE-LIO: Degeneracy-Aware Learning of Inertial Velocity for Enhancing ESKF-Based LiDAR-Inertial Odometry

Seongjun Kim, Daehan Lee, Junwoo Hong, Sanghyun Park, Hyunyoung Jo, Soohee Han

Comments 18 pages, 9 figures

2604.02699 2026-04-06 cs.CL cs.AI

Trivial Vocabulary Bans Improve LLM Reasoning More Than Deep Linguistic Constraints

Rodney Jehu-Appiah

Comments 19 pages, 10 tables, 3 appendices

2604.02697 2026-04-06 cs.LG

LieTrunc-QNN: Lie Algebra Truncation and Quantum Expressivity Phase Transition from LiePrune to Provably Stable Quantum Neural Networks

Haijian Shao, Dalong Zhao, Xing Deng, Wenzheng Zhu, Yingtao Jiang

Comments 9 pages, 4 figures, 1 table

2604.02696 2026-04-06 cs.CV cs.RO

VBGS-SLAM: Variational Bayesian Gaussian Splatting Simultaneous Localization and Mapping

Yuhan Zhu, Yanyu Zhang, Jie Xu, Wei Ren

2604.02695 2026-04-06 cs.CV

XrayClaw: Cooperative-Competitive Multi-Agent Alignment for Trustworthy Chest X-ray Diagnosis

Shawn Young, Lijian Xu

Comments 14 pages

2604.02694 2026-04-06 cs.CV cs.AI

DocShield: Towards AI Document Safety via Evidence-Grounded Agentic Reasoning

Fanwei Zeng, Changtao Miao, Jing Huang, Zhiya Tan, Shutao Gong, Xiaoming Yu, Yang Wang, Weibin Yao, Joey Tianyi Zhou, Jianshu Li, Yin Yan

Comments 10 pages, 4 figures, 5 tables. Preprint

2604.02692 2026-04-06 cs.CV

Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing

Fuyuan Liu, Dianyu Yu, He Ren, Nayu Liu, Xiaomian Kang, Delai Qiu, Fa Zhang, Genpeng Zhen, Shengping Liu, Jiaen Liang, Wei Huang, Yining Wang, Junnan Zhu

2604.02691 2026-04-06 cs.LG

Adaptive Semantic Communication for Wireless Image Transmission Leveraging Mixture-of-Experts Mechanism

Haowen Wan, Qianqian Yang

2604.02689 2026-04-06 cs.CV cs.AI

Efficient3D: A Unified Framework for Adaptive and Debiased Token Reduction in 3D MLLMs

Yuhui Lin, Siyue Yu, Yuxing Yang, Guangliang Cheng, Jimin Xiao

2604.02686 2026-04-06 cs.LG cs.AI

Beyond Semantic Manipulation: Token-Space Attacks on Reward Models

Yuheng Zhang, Mingyue Huo, Minghao Zhu, Mengxue Zhang, Nan Jiang

2604.02685 2026-04-06 cs.LG cs.AI

Finding Belief Geometries with Sparse Autoencoders

Matthew Levinson

English abstract

Understanding the geometric structure of internal representations is a central goal of mechanistic interpretability. Prior work has shown that transformers trained on sequences generated by hidden Markov models encode probabilistic belief states as simplex-shaped geometries in their residual stream, with vertices corresponding to latent generative states. Whether large language models trained on naturalistic text develop analogous geometric representations remains an open question. We introduce a pipeline for discovering candidate simplex-structured subspaces in transformer representations, combining sparse autoencoders (SAEs), $k$-subspace clustering of SAE features, and simplex fitting using AANet. We validate the pipeline on a transformer trained on a multipartite hidden Markov model with known belief-state geometry. Applied to Gemma-2-9B, we identify 13 priority clusters exhibiting candidate simplex geometry ($K \geq 3$). A key challenge is distinguishing genuine belief-state encoding from tiling artifacts: latents can span a simplex-shaped subspace without the mixture coordinates carrying predictive signal beyond any individual feature. We therefore adopt barycentric prediction as our primary discriminating test. Among the 13 priority clusters, 3 exhibit a highly significant advantage on near-vertex samples (Wilcoxon $p < 10^{-14}$) and 4 on simplex-interior samples. Together 5 distinct real clusters pass at least one split, while no null cluster passes either. One cluster, 768_596, additionally achieves the highest causal steering score in the dataset. This is the only case where passive prediction and active intervention converge. We present these findings as preliminary evidence that genuine belief-like geometry exists in Gemma-2-9B's representation space, and identify the structured evaluation that would be required to confirm this interpretation.
