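arXivDaily daily academic digest: syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and other fields.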

2604.21356 2026-04-24 cs.CV

SparseGF: A Height-Aware Sparse Segmentation Framework with Context Compression for Robust Ground Filtering Across Urban to Natural Scenes

Nannan Qin, Pengjie Tao, Haiyan Guan, Zhizhong Kang, Lingfei Ma, Xiangyun Hu, Jonathan Li

Abstract

High-quality digital terrain models derived from airborne laser scanning (ALS) data are essential for a wide range of geospatial analyses, and their generation typically relies on robust ground filtering (GF) to separate point clouds across diverse landscapes into ground and non-ground parts. Although current deep-learning-based GF methods have demonstrated impressive performance, especially in specific challenging terrains, their cross-scene generalization remains limited by two persistent issues: the context-detail dilemma in large-scale processing due to limited computational resources, and the random misclassification of tall objects arising from classification-only optimization. To overcome these limitations, we propose SparseGF, a height-aware sparse segmentation framework enhanced with context compression. It is built upon three key innovations: (1) a convex-mirror-inspired context compression module that condenses expansive contexts into compact representations while preserving central details; (2) a hybrid sparse voxel-point network architecture that effectively interprets compressed representations while mitigating compression-induced geometric distortion; and (3) a height-aware loss function that explicitly enforces topographic elevation priors during training to suppress random misclassification of tall objects. Extensive evaluations on two large-scale ALS benchmark datasets demonstrate that SparseGF delivers robust GF across urban to natural terrains, achieving leading performance in complex urban scenes, competitive results on mixed terrains, and moderate yet non-catastrophic accuracy in densely forested steep areas. This work offers new insights into deep-learning-based GF research and encourages further exploration toward truly cross-scene generalization for large-scale environmental monitoring.
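
The abstract does not give the form of the height-aware loss. Purely as an illustration of how an elevation prior can be folded into a classification loss, the sketch below adds to the usual binary cross-entropy a penalty on the predicted ground probability of points that sit well above a local terrain estimate. The `heights` input (height above terrain), the threshold `h_thresh`, and the weight `lam` are assumptions for this example, not SparseGF's formulation.

```python
import math
from typing import Sequence


def height_aware_loss(probs_ground: Sequence[float],
                      labels: Sequence[int],
                      heights: Sequence[float],
                      h_thresh: float = 2.0,
                      lam: float = 1.0,
                      eps: float = 1e-7) -> float:
    """Hypothetical height-aware loss: mean BCE plus an elevation-prior penalty.

    probs_ground: predicted P(ground) for each point
    labels:       1 for ground, 0 for non-ground
    heights:      assumed height of each point above a local terrain estimate (m)
    """
    bce = 0.0
    penalty = 0.0
    for p, y, h in zip(probs_ground, labels, heights):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        bce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        # Elevation prior: the further a point rises above h_thresh, the more
        # it costs to call it ground (penalty grows linearly with the excess).
        excess = max(0.0, h - h_thresh)
        penalty += p * excess
    return (bce + lam * penalty) / len(labels)


# A tall object wrongly predicted as ground costs more than a low one.
print(height_aware_loss([0.9], [0], [10.0]), height_aware_loss([0.9], [0], [0.5]))
```

Unlike plain cross-entropy, which treats every misclassified point alike, this penalty makes a confident "ground" vote on a 10 m point far more expensive than on a point near the terrain, which is one way to discourage isolated tall-object errors.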

2604.21355 2026-04-24 cs.RO

RPG: Robust Policy Gating for Smooth Multi-Skill Transitions in Humanoid Fighting

Yucheng Xin, Jiacheng Bao, Yubo Dong, Xueqian Wang, Bin Zhao, Xuelong Li, Junbo Tan, Dong Wang

2604.21354 2026-04-24 cs.LG

Decoupled Travel Planning with Behavior Forest

Duanyang Yuan, Sihang Zhou, Yanning Hou, Xiaoshu Chen, Haoyuan Chen, Ke Liang, Jiyuan Liu, Chuan Ma, Xinwang Liu, Jian Huang

Abstract

Behavior sequences, composed of executable steps, serve as the operational foundation for multi-constraint planning problems such as travel planning. In such tasks, each planning step is not only constrained locally but also influenced by global constraints spanning multiple subtasks, leading to a tightly coupled and complex decision process. Existing travel planning methods typically rely on a single decision space that entangles all subtasks and constraints, failing to distinguish between locally acting constraints within a subtask and global constraints that span multiple subtasks. Consequently, the model is forced to jointly reason over local and global constraints at each decision step, increasing the reasoning burden and reducing planning efficiency. To address this problem, we propose the Behavior Forest method. Specifically, our approach structures the decision-making process as a forest of parallel behavior trees, where each behavior tree is responsible for one subtask. A global coordination mechanism is introduced to orchestrate the interactions among these trees, enabling modular and coherent travel planning. Within this framework, large language models (LLMs) are embedded as decision engines within behavior tree nodes, performing localized reasoning conditioned on task-specific constraints to generate candidate subplans and adapt decisions based on coordination feedback. The behavior trees, in turn, provide an explicit control structure that guides LLM generation. This design decouples complex tasks and constraints into manageable subspaces, enabling task-specific reasoning and reducing the cognitive load on the LLM. Experimental results show that our method outperforms state-of-the-art methods by 6.67% on the TravelPlanner benchmark and by 11.82% on the ChinaTravel benchmark, demonstrating its effectiveness in improving LLM performance on complex multi-constraint travel planning.
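
The split between local and global constraints described above can be illustrated with a toy sketch: each subtask gets its own tree that only checks local constraints, while a separate coordinator enforces one global constraint (a total budget) and sends rejection feedback to a single tree. The LLM proposal step is replaced by fixed, pre-ranked candidate lists, and the class names, the one-level Selector, and the "revise the most expensive choice" rule are all assumptions for illustration, not the Behavior Forest implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    name: str
    cost: float
    meets_local: bool  # satisfies this subtask's own (local) constraints


class SubtaskTree:
    """Hypothetical one-level behavior tree for a single subtask.

    The root acts as a Selector: it tries candidate subplans in order and
    returns the first that succeeds. In the paper an LLM inside the tree
    proposes candidates; here they are a fixed list.
    """

    def __init__(self, subtask: str, candidates: List[Candidate]):
        self.subtask = subtask
        self.candidates = candidates
        self.rejected = set()  # names vetoed by coordinator feedback

    def tick(self) -> Optional[Candidate]:
        for c in self.candidates:
            # A child "succeeds" if it passes local checks and was not vetoed.
            if c.meets_local and c.name not in self.rejected:
                return c
        return None  # Selector FAILURE: nothing admissible left

    def feedback(self, candidate: Candidate) -> None:
        self.rejected.add(candidate.name)


def coordinate(trees: List[SubtaskTree], budget: float,
               max_rounds: int = 10) -> Optional[Dict[str, Candidate]]:
    """Hypothetical global coordinator enforcing one cross-subtask budget."""
    for _ in range(max_rounds):
        plan = {t.subtask: t.tick() for t in trees}
        if any(c is None for c in plan.values()):
            return None  # some subtask cannot be satisfied locally
        if sum(c.cost for c in plan.values()) <= budget:
            return plan
        # Global constraint violated: only the tree holding the most
        # expensive choice is asked to revise; the others keep their picks.
        worst = max(trees, key=lambda t: plan[t.subtask].cost)
        worst.feedback(plan[worst.subtask])
    return None


trees = [
    SubtaskTree("hotel", [Candidate("Grand", 300, True), Candidate("Inn", 120, True)]),
    SubtaskTree("flight", [Candidate("F1", 400, True), Candidate("F2", 250, False),
                           Candidate("F3", 200, True)]),
]
plan = coordinate(trees, budget=500)
```

In this run the first round proposes Grand + F1 (700 > 500), the coordinator vetoes F1, and the second round settles on Grand + F3 (500), with F2 skipped inside its own tree because it fails a local check. Each tree only ever reasons about its own constraints, which is the decoupling the abstract argues for.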

2604.21352 2026-04-24 cs.CL

CARE: Counselor-Aligned Response Engine for Online Mental-Health Support

Hagai Astrin, Ayal Swaid, Avi Segal, Kobi Gal

Comments 9 pages, 4 figures

2604.21351 2026-04-24 cs.RO

Learn Weightlessness: Imitate Non-Self-Stabilizing Motions on Humanoid Robot

Yucheng Xin, Jiacheng Bao, Haoran Yang, Wenqiang Que, Dong Wang, Junbo Tan, Xueqian Wang, Bin Zhao, Xuelong Li

2604.21349 2026-04-24 cs.CV cs.AI cs.LG cs.NE

Trust-SSL: Additive-Residual Selective Invariance for Robust Aerial Self-Supervised Learning

Wadii Boulila, Adel Ammar, Bilel Benjdira, Maha Driss

Comments 17 pages

Abstract

Self-supervised learning (SSL) is a standard approach for representation learning in aerial imagery. Existing methods enforce invariance between augmented views, which works well when augmentations preserve semantic content. However, aerial images are frequently degraded by haze, motion blur, rain, and occlusion that remove critical evidence. Enforcing alignment between a clean and a severely degraded view can introduce spurious structure into the latent space. This study proposes a training strategy and architectural modification to enhance SSL robustness to such corruptions. It introduces a per-sample, per-factor trust weight into the alignment objective, combined with the base contrastive loss as an additive residual. The trust weight is passed through a stop-gradient, and the trust-weighted term is added as a residual rather than applied as a multiplicative gate. While a multiplicative gate is a natural choice, experiments show it impairs the backbone, whereas our additive-residual approach improves it. Using a 200-epoch protocol on a 210,000-image corpus, the method achieves the highest mean linear-probe accuracy among six backbones on EuroSAT, AID, and NWPU-RESISC45 (90.20% compared to 88.46% for SimCLR and 89.82% for VICReg). It yields the largest improvements under severe information-erasing corruptions on EuroSAT (+19.9 points on haze at s=5 over SimCLR). The method also demonstrates consistent gains of +1 to +3 points in Mahalanobis AUROC on a zero-shot cross-domain stress test using BDD100K weather splits. Two ablations (scalar uncertainty and cosine gate) indicate that the additive-residual formulation is the primary source of these improvements. An evidential variant using Dempster-Shafer fusion introduces interpretable signals of conflict and ignorance. These findings offer a concrete design principle for uncertainty-aware SSL. Code is publicly available at https://github.com/WadiiBoulila/trust-ssl.
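
The abstract states only that a per-factor trust weight scales the alignment term, is stop-gradiented, and is added to the base contrastive loss rather than gating it. The sketch below shows that structure for a single sample with plain Python lists; using cosine distance for alignment, averaging over degradation factors, and the weight `lam` are assumptions, and the linked repository should be consulted for the actual loss.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def additive_residual_loss(base_loss: float,
                           z_anchor: Sequence[float],
                           z_views: List[Sequence[float]],
                           trust: List[float],
                           lam: float = 1.0) -> float:
    """Hypothetical single-sample loss: base + lam * mean_k(trust_k * align_k).

    align_k = 1 - cos(anchor, view_k), where view_k is degraded by factor k.
    trust_k is treated as a constant: in an autograd framework it would be
    detached (stop-gradient), so the encoder cannot lower the loss by
    pushing trust toward zero.
    """
    align = sum(t * (1.0 - cosine(z_anchor, z)) for t, z in zip(trust, z_views))
    return base_loss + lam * align / len(z_views)


# Zero trust (e.g. a view erased by heavy haze) leaves only the base loss.
print(additive_residual_loss(1.5, [1.0, 0.0], [[0.0, 1.0]], [0.0]))
```

Because the trust-weighted term is added rather than multiplied into the loss, setting every trust weight to zero in this sketch still leaves the base contrastive loss intact, so the backbone keeps receiving its usual training signal even for heavily degraded samples. A multiplicative gate would instead scale that signal down together with the trust.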

2604.21346 2026-04-24 cs.AI cs.CL cs.CV

Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning

Mohit Vaishnav, Tanel Tammet

2604.21344 2026-04-24 cs.CL cs.AI cs.CV cs.LG cs.MA

Beyond Single Plots: A Benchmark for Question Answering on Multi-Charts

Azher Ahmed Efat, Seok Hwan Song, Wallapak Tavanapong

2604.21343 2026-04-24 cs.CV

Latent Denoising Improves Visual Alignment in Large Multimodal Models

Dhruv Parikh, Jacob Fein-Ashley, Rajgopal Kannan, Viktor Prasanna

Comments Technical Report

2604.21337 2026-04-24 cs.RO cs.MA

PREVENT-JACK: Context Steering for Swarms of Long Heavy Articulated Vehicles

Adrian Baruck, Michael Dubé, Christoph Steup, Sanaz Mostaghim

Comments 32 pages, 7 figures, 4 videos; submitted to the Swarm Robotics collection of the Nature Portfolio Journal Robotics (NPJ Robot)

2604.21334 2026-04-24 cs.AI cs.CE cs.CL cs.LG econ.GN q-fin.EC

Ideological Bias in LLMs' Economic Causal Reasoning

Donggyu Lee, Hyeok Yun, Jungwon Kim, Junsik Min, Sungwon Park, Sangyoon Park, Jihee Kim

2604.21331 2026-04-24 cs.RO

FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception

Zhen Zhang, Weinan Wang, Hejia Sun, Qingpeng Ding, Xiangyu Chu, Guoxin Fang, K. W. Samuel Au

Comments 12 pages, 6 figures

2604.21330 2026-04-24 cs.CV

Teacher-Guided Routing for Sparse Vision Mixture-of-Experts

Masahiro Kada, Ryota Yoshihashi, Satoshi Ikehata, Rei Kawakami, Ikuro Sato

2604.21327 2026-04-24 cs.LG cs.AI cs.CL

Understanding and Mitigating Spurious Signal Amplification in Test-Time Reinforcement Learning for Math Reasoning

Yongcan Yu, Lingxiao He, Jian Liang, Kuangpu Guo, Meng Wang, Qianlong Xie, Xingxing Wang, Ran He

Comments Accepted to ACL 2026 Findings

2604.21326 2026-04-24 cs.CV cs.AI

MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment

Juan Li, Chuanghao Ding, Xujie Zhang, Cam-Tu Nguyen

2604.21324 2026-04-24 cs.CV

Temporal Prototyping and Hierarchical Alignment for Unsupervised Video-based Visible-Infrared Person Re-Identification

Zhiyong Li, Wei Jiang, Haojie Liu, Mingyu Wang, Wanchong Xu, Weijie Mao

Abstract

Visible-infrared person re-identification (VI-ReID) enables cross-modality identity matching for all-day surveillance, yet existing methods predominantly focus on the image level or rely heavily on costly identity annotations. While video-based VI-ReID has recently emerged to exploit temporal dynamics for improved robustness, existing studies remain limited to supervised settings. Crucially, the unsupervised video VI-ReID problem, where models must learn from RGB and infrared tracklets without identity labels, remains largely unexplored despite its practical importance in real-world deployment. To bridge this gap, we propose HiTPro (Hierarchical Temporal Prototyping), a prototype-driven framework without explicit hard pseudo-label assignment for unsupervised video-based VI-ReID. HiTPro begins with an efficient Temporal-aware Feature Encoder that first extracts discriminative frame-level features and then aggregates them into a robust tracklet-level representation. Building upon these features, HiTPro then constructs reliable intra-camera prototypes via Intra-Camera Tracklet Prototyping by aggregating features from temporally partitioned sub-tracklets. Through Hierarchical Cross-Prototype Alignment, we perform a two-stage positive mining process, progressing from within-modality associations to cross-modality matching, enhanced by a Dynamic Threshold Strategy and Soft Weight Assignment. Finally, Hierarchical Contrastive Learning progressively optimizes feature-prototype alignment across three levels: intra-camera discrimination, cross-camera same-modality consistency, and cross-modality invariance. Extensive experiments on HITSZ-VCM and BUPTCampus demonstrate that HiTPro achieves state-of-the-art performance under fully unsupervised settings, significantly outperforming adapted baselines and establishing a strong baseline for future research.
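
Two of the steps named above can be sketched in a few lines, assuming plain mean pooling and cosine similarity: building a tracklet prototype by averaging temporally partitioned sub-tracklets, and an InfoNCE-style feature-prototype loss of the kind commonly used for prototype contrastive learning. The contiguous partition scheme, `num_parts`, and the temperature `tau` are assumptions; this is not HiTPro's code, and it omits the dynamic thresholds and soft weights.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / n for x in v]


def mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    d = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(d)]


def tracklet_prototype(frame_feats: Sequence[Sequence[float]],
                       num_parts: int = 3) -> List[float]:
    """Hypothetical prototype: split frames into contiguous temporal chunks,
    average each chunk, then average the chunk means and L2-normalize."""
    n = len(frame_feats)
    k = min(num_parts, n)
    bounds = [i * n // k for i in range(k + 1)]  # every chunk is non-empty
    parts = [mean(frame_feats[bounds[i]:bounds[i + 1]]) for i in range(k)]
    return l2_normalize(mean(parts))


def prototype_infonce(feature: Sequence[float],
                      prototypes: Sequence[Sequence[float]],
                      pos_index: int, tau: float = 0.1) -> float:
    """InfoNCE of one feature against a set of (unit-norm) prototypes."""
    f = l2_normalize(feature)
    logits = [sum(a * b for a, b in zip(f, p)) / tau for p in prototypes]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[pos_index]


proto = tracklet_prototype([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], num_parts=2)
```

Averaging chunk means rather than all frames gives each temporal segment equal weight, so a long static stretch of a tracklet cannot dominate its prototype; the loss then pulls a feature toward its positive prototype and away from the rest.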

2604.21321 2026-04-24 cs.CV

FryNet: Dual-Stream Adversarial Fusion for Non-Destructive Frying Oil Oxidation Assessment

Khaled R Ahmed, Toqi Tahamid Sarker, Taminul Islam, Tamany M Alanezi, Amer AbuGhazaleh

Comments 10 pages, 7 figures, this paper has been submitted and accepted for publication at CVPRW 2026

2604.21313 2026-04-24 cs.CV cs.CY

PLAS-Net: Pixel-Level Area Segmentation for UAV-Based Beach Litter Monitoring

Yongying Liu, Jiaqi Wang, Jian Song, Xinlei Shao, Yijia Chen, Nan Xu, Katsunori Mizuno, Shigeru Tabeta, Fan Zhao

Comments 30 pages, 12 figures

2604.21312 2026-04-24 cs.CV cs.AI

The First Challenge on Remote Sensing Infrared Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview

Kai Liu, Haoyang Yue, Zeli Lin, Zheng Chen, Jingkai Wang, Jue Gong, Jiatong Li, Xianglong Yan, Libo Zhu, Jianze Li, Ziqing Zhang, Zihan Zhou, Xiaoyang Liu, Radu Timofte, Yulun Zhang, Junye Chen, Zhenming Yan, Yucong Hong, Ruize Han, Song Wang, Li Pang, Heng Zhao, Xinqiao Wu, Deyu Meng, Xiangyong Cao, Weijun Yuan, Zhan Li, Zhanglu Chen, Boyang Yao, Yihang Chen, Yifan Deng, Zengyuan Zuo, Junjun Jiang, Saiprasad Meesiyawar, Sulocha Yatageri, Nikhil Akalwadi, Ramesh Ashok Tabib, Uma Mudenagudi, Jiachen Tu, Yaokun Shi, Guoyi Xu, Yaoxin Jiang, Cici Liu, Tongyao Mu, Qiong Cao, Yifan Wang, Kosuke Shigematsu, Hiroto Shirono, Asuka Shin, Wei Zhou, Linfeng Li, Lingdong Kong, Ce Wang, Xingwei Zhong, Wanjie Sun, Dafeng Zhang, Hongxin Lan, Qisheng Xu, Mingyue He, Hui Geng, Tianjiao Wan, Kele Xu, Changjian Wang, Antoine Carreaud, Nicola Santacroce, Shanci Li, Jan Skaloud, Adrien Gressin

Comments Github Repo: https://github.com/Kai-Liu001/NTIRE2026_infraredSR

2604.21311 2026-04-24 cs.CV

An Interpretable Vision Transformer Framework for Automated Brain Tumor Classification

Chinedu Emmanuel Mbonu, Tochukwu Sunday Belonwu, Okwuchukwu Ejike Chukwuogo, Kenechukwu Sylvanus Anigbogu

Comments 9 pages, 6 figures

2604.21309 2026-04-24 cs.CL

When Bigger Isn't Better: A Comprehensive Fairness Evaluation of Political Bias in Multi-News Summarisation

Nannan Huang, Iffat Maab, Junichi Yamagishi

Comments Accepted to ACL 2026 Main Conference

2604.21300 2026-04-24 cs.CL cs.IR cs.LG

Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI

Hieu Man, Van-Cuong Pham, Nghia Trung Ngo, Franck Dernoncourt, Thien Huu Nguyen

2604.21291 2026-04-24 cs.CV cs.AI

Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation

Yuanchen Fei, Yude Zou, Zejian Kang, Ming Li, Jiaying Zhou, Xiangru Huang

2604.21290 2026-04-24 cs.CV cs.DC

GraphLeap: Decoupling Graph Construction and Convolution for Vision GNN Acceleration on FPGA

Anvitha Ramachandran, Dhruv Parikh, Viktor Prasanna

Comments FCCM 2026

2604.21289 2026-04-24 cs.CV

AttDiff-GAN: A Hybrid Diffusion-GAN Framework for Facial Attribute Editing

Wenmin Huang, Weiqi Luo, Xiaochun Cao, Jiwu Huang

2604.21286 2026-04-24 cs.CL cs.AI cs.LG

Cross-Entropy Is Load-Bearing: A Pre-Registered Scope Test of the K-Way Energy Probe on Bidirectional Predictive Coding

Jon-Paul Cacioli

Comments 11 pages, 3 figures, 4 tables. Pre-registered on OSF (https://osf.io/2kvsp). Code at https://github.com/synthiumjp/ima

2604.21284 2026-04-24 cs.AI cs.CL cs.IR

Spatial Metaphors for LLM Memory: A Critical Analysis of the MemPalace Architecture

Robin Dey, Panyanon Viradecha

Comments 20 pages, 10 tables. Code and data at https://github.com/web3guru888/mempalace-scientific-analysis

2604.21280 2026-04-24 cs.CV

ImageHD: Energy-Efficient On-Device Continual Learning of Visual Representations via Hyperdimensional Computing

Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna

Comments FCCM 2026

2604.21279 2026-04-24 cs.CV

LatRef-Diff: Latent and Reference-Guided Diffusion for Facial Attribute Editing and Style Manipulation

Wenmin Huang, Weiqi Luo, Xiaochun Cao, Jiwu Huang

2604.21276 2026-04-24 cs.CL cs.AI cs.SD

Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition

Srishti Ginjala, Eric Fosler-Lussier, Christopher W. Myers, Srinivasan Parthasarathy