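arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations. Covers artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.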

2604.23432 2026-04-28 cs.CV cs.AI

Sphere-Depth: A Benchmark for Depth Estimation Methods with Varying Spherical Camera Orientations

Soulayma Gazzeh, Giuseppe Mazzola, Liliana Lo Presti, Marco La Cascia

Comments Preprint

Details

DOI: 10.1007/978-3-032-04968-1_31
Journal ref: CAIP 2025, LNCS, vol 15621. Springer (2026)

Abstract

Reliable depth estimation from spherical images is crucial for 360° vision in robotic navigation and immersive scene understanding. However, the onboard spherical camera can experience unintentional pose variations in real-world robotic platforms that, along with the geometric distortions inherent in equirectangular projections, significantly impact the effectiveness of depth estimation. To study this issue, a novel public benchmark, called Sphere-Depth, is introduced to systematically evaluate the robustness of monocular depth estimation models from equirectangular images in a reproducible way. Camera pose perturbations are simulated and used to assess the performance of a popular perspective-based model, Depth Anything, and of spherical-aware models such as Depth Anywhere, ACDNet, Bifuse++, and SliceNet. Furthermore, to ensure meaningful evaluation across models, a depth calibration-based error protocol is proposed to convert predicted relative depth values into metric depth values using supervised learned scaling factors for each model. Experiments show that even models explicitly designed to process spherical images exhibit substantial performance degradation when variations in the camera pose are observed with respect to the canonical pose. The full benchmark, evaluation protocol, and dataset splits are made publicly available at: https://github.com/sgazzeh/Sphere_depth
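The calibration step described in the abstract, converting relative depth into metric depth with a per-model scaling factor, can be illustrated with a minimal sketch. It assumes a single scalar scale fitted by closed-form least squares; the abstract only says the factors are learned with supervision, so the fitting rule, the function names, and the toy values below are assumptions rather than the authors' protocol.

```python
def fit_scale(pred, gt):
    """Closed-form least-squares scale s minimizing sum((s * p - g)^2).

    Assumption: one scalar per model; the paper's learned factors may differ.
    """
    num = sum(p * g for p, g in zip(pred, gt))
    den = sum(p * p for p in pred)
    return num / den


def abs_rel_error(pred_metric, gt):
    """Mean absolute relative error, a common depth-estimation metric."""
    return sum(abs(p - g) / g for p, g in zip(pred_metric, gt)) / len(gt)


# Toy relative depths and ground-truth metric depths in metres (made-up values).
relative = [0.5, 1.0, 1.5, 2.0]
metric_gt = [1.0, 2.0, 3.0, 4.0]

scale = fit_scale(relative, metric_gt)
calibrated = [scale * p for p in relative]
error = abs_rel_error(calibrated, metric_gt)
print(scale, error)
```

Fitting the scale on a held-out calibration split and then scoring perturbed-pose images with the same scale would keep the comparison across models consistent.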

2604.23426 2026-04-28 cs.CV cs.LG

Enhanced Privacy and Communication Efficiency in Non-IID Federated Learning with Adaptive Quantization and Differential Privacy

Emre Ardıç, Yakup Genç

Comments Published in IEEE Access, Vol. 13, 2025. DOI: 10.1109/ACCESS.2025.3554138. GitHub: https://github.com/eardic/FL_DPQS

Details

DOI: 10.1109/ACCESS.2025.3554138
Journal ref: IEEE Access, vol. 13, pp. 54322-54337, 2025

Abstract

Federated learning (FL) is a distributed machine learning method where multiple devices collaboratively train a model under the management of a central server without sharing underlying data. One of the key challenges of FL is the communication bottleneck caused by variations in connection speed and bandwidth across devices. Therefore, it is essential to reduce the size of transmitted data during training. Additionally, there is a potential risk of exposing sensitive information through the model or gradient analysis during training. To address both privacy and communication efficiency, we combine differential privacy (DP) and adaptive quantization methods. We use Laplacian-based DP to preserve privacy, which is relatively underexplored in FL and offers tighter privacy guarantees than Gaussian-based DP. We propose a simple and efficient global bit-length scheduler using round-based cosine annealing, along with a client-based scheduler that dynamically adapts based on client contribution estimated through dataset entropy analysis. We evaluate our approach through extensive experiments on CIFAR10, MNIST, and medical imaging datasets, using non-IID data distributions across varying client counts, bit-length schedulers, and privacy budgets. The results show that our adaptive quantization methods reduce total communicated data by up to 52.64% for MNIST, 45.06% for CIFAR10, and 31% to 37% for medical imaging datasets compared to 32-bit float training while maintaining competitive model accuracy and ensuring robust privacy through differential privacy.
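The combination described in the abstract, a round-based cosine-annealed bit length together with Laplace noise and quantization, might look roughly like the sketch below. The annealing direction (from 32 bits down to a floor), the 4-bit floor, the per-coordinate clipping, and all function names are assumptions made for illustration; the client-based entropy scheduler is not covered.

```python
import math
import random


def cosine_bit_length(round_idx, total_rounds, min_bits=4, max_bits=32):
    """Cosine-annealed bit length, decaying from max_bits to min_bits (assumed direction)."""
    progress = round_idx / max(1, total_rounds - 1)
    bits = min_bits + 0.5 * (max_bits - min_bits) * (1 + math.cos(math.pi * progress))
    return int(round(bits))


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def quantize(values, bits, clip=1.0):
    """Uniform quantization onto 2**bits levels spanning [-clip, clip]."""
    levels = (1 << bits) - 1
    step = 2 * clip / levels
    out = []
    for v in values:
        v = max(-clip, min(clip, v))
        out.append(-clip + round((v + clip) / step) * step)
    return out


def privatize_and_quantize(update, bits, epsilon, sensitivity=1.0, clip=1.0, seed=0):
    """Clip, add Laplace noise with scale sensitivity / epsilon, then quantize.

    The sensitivity value is supplied by the caller and is an assumption here.
    """
    rng = random.Random(seed)
    clipped = [max(-clip, min(clip, u)) for u in update]
    noisy = [u + laplace_noise(sensitivity / epsilon, rng) for u in clipped]
    return quantize(noisy, bits, clip)


schedule = [cosine_bit_length(r, 10) for r in range(10)]
out = privatize_and_quantize([0.2, -0.7, 0.9], bits=schedule[5], epsilon=1.0)
print(schedule, out)
```

Quantizing after the noise is added is post-processing, so it does not weaken the differential-privacy guarantee provided by the Laplace step.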

2604.23424 2026-04-28 cs.LG cs.CL

Evolve: A Persistent Knowledge Lifecycle for Small Language Models

Dikran Hovagimian

Comments 35 pages, 1 figure. Code and evaluation data: https://gitlab.com/dikran.hovagimian/evolve

2604.23418 2026-04-28 cs.LG cs.PF

Approximating Uniform Random Rotations by Two-Block Structured Hadamard Rotations in High Dimensions

Tomer Zilca, Gal Mendelson

2604.23415 2026-04-28 cs.CV

A Heterogeneous Two-Stream Framework for Video Action Recognition with Comparative Fusion Analysis

Md. Afzalur Rahaman, Tahmid Rahman

Comments 17 pages, 8 figures

Abstract

Most two-stream action recognition networks apply the same convolutional backbone to both RGB and optical flow streams, ignoring the fact that the two modalities have fundamentally different structural properties. Optical flow captures fine-grained motion patterns, while RGB frames carry rich appearance and scene context - treating them identically discards this distinction. We propose DualStreamHybrid, a heterogeneous two-stream architecture that assigns each stream a backbone suited to its input: a pretrained ViT-Tiny/16 for RGB frames, and a MobileNetV2 trained from scratch on a 20-channel stacked optical flow representation. A learned projection layer maps the two differently-sized feature vectors to a common dimensionality before fusion, enabling the two streams to interact without forcing architectural symmetry. We design five fusion strategies within a unified framework - late fusion, concatenation, cross-attention, weighted fusion, and gated fusion - and evaluate them on UCF11 (1,600 videos, 11 classes) and UCF50 (6,681 videos, 50 classes) to study how fusion behaviour scales with dataset size. On UCF11, cross-attention achieves 98.12% test accuracy, outperforming the RGB-only ViT-Tiny baseline of 95.94%, which suggests that explicit inter-modal attention is particularly effective on smaller, less complex datasets. On UCF50, weighted fusion reaches 96.86% and proves the most consistent strategy across both benchmarks. The learned stream weights reveal an interesting pattern: UCF11 sees near-equal modality contribution (RGB: 0.507, flow: 0.493), while UCF50 favours the RGB stream slightly more (RGB: 0.554, flow: 0.446) - arguably reflecting the larger and more visually diverse action space. Taken together, these results suggest that even a lightweight motion stream meaningfully complements a strong appearance encoder, and that the optimal fusion strategy depends on dataset scale.
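As a rough illustration of the weighted-fusion strategy in the abstract, the sketch below projects two differently-sized feature vectors to a common dimensionality and mixes them with normalized stream weights. The softmax normalization, the toy projection matrices, and the function names are assumptions; the abstract only reports that learned stream weights such as RGB 0.554 and flow 0.446 sum to one.

```python
import math


def project(features, weight):
    """Linear projection; weight is a list of rows, one per output dimension."""
    return [sum(w * x for w, x in zip(row, features)) for row in weight]


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(rgb_feat, flow_feat, w_rgb, w_flow, stream_logits):
    """Project both streams to a shared size, then take a convex combination."""
    rgb_common = project(rgb_feat, w_rgb)
    flow_common = project(flow_feat, w_flow)
    alpha_rgb, alpha_flow = softmax(stream_logits)
    fused = [alpha_rgb * r + alpha_flow * f for r, f in zip(rgb_common, flow_common)]
    return fused, (alpha_rgb, alpha_flow)


# Toy inputs: a 3-d RGB feature and a 4-d flow feature mapped to 2 dimensions.
rgb_feat = [1.0, 2.0, 3.0]
flow_feat = [4.0, 3.0, 2.0, 1.0]
w_rgb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_flow = [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]]
# Logits chosen so the softmax reproduces the UCF50 weights quoted in the abstract.
logits = [math.log(0.554), math.log(0.446)]

fused, (alpha_rgb, alpha_flow) = weighted_fusion(rgb_feat, flow_feat, w_rgb, w_flow, logits)
print(fused, alpha_rgb, alpha_flow)
```

In a trained model the projection matrices and stream logits would be learned parameters; here they are fixed by hand so the arithmetic is easy to follow.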

2604.23413 2026-04-28 cs.CL

Beyond Local vs. External: A Game-Theoretic Framework for Trustworthy Knowledge Acquisition

Rujing Yao, Yufei Shi, Yang Wu, Ang Li, Zhuoren Jiang, XiaoFeng Wang, Haixu Tang, Xiaozhong Liu

2604.23412 2026-04-28 cs.CL

Overcoming Copyright Barriers in Corpus Distribution Through Non-Reversible Hashing

Arthur Amalvy, Vincent Labatut, Xavier Bost, Hen-Hsen Huang

Comments Accepted to ACL 2026

2604.23407 2026-04-28 cs.CV cs.AI

PushupBench: Your VLM is not good at counting pushups

Shengzhi Li, Jiarun Chen, Karun Sharma, Jiaqi Su, Shichao Pei

2604.23403 2026-04-28 cs.CV cs.AI cs.NE

Learn&Drop: Fast Learning of CNNs based on Layer Dropping

Giorgio Cruciata, Luca Cruciata, Liliana Lo Presti, Jan Van Gemert, Marco La Cascia

Comments Preprint. Paper accepted to Springer Neural Computing and Applications

2604.23399 2026-04-28 cs.CV

Breaking the Resource Wall: Geometry-Guided Sequence Modeling for Efficient Semantic Segmentation

Sheng-Wei Chan, Xin-Jui Pan, Chun-Po Shen, Chia-Min Lin, Yung-Che Wang, Jen-Shiun Chiang

Comments 15 pages, 20 figures. Code will be released

2604.23398 2026-04-28 cs.AI

When Corrective Hints Hurt: Prompt Design in Reasoner-Guided Repair of LLM Overcaution on Entailed Negations under OWL 2 DL

Yijiashun Qi, Xiang Xu, Yuxuan Li

Comments Accepted by ICAIDE 2026

2604.23392 2026-04-28 cs.AI

SoccerRef-Agents: Multi-Agent System for Automated Soccer Refereeing

Zi Meng, Wanli Song, Yi Hu, Jiayuan Rao, Gang Chen

Comments 26 pages, 8 figures. Submitted to ISACE 2026. GitHub repo: https://github.com/yjlog/SoccerRef-Agents

2604.23387 2026-04-28 cs.CV cs.RO

Keypoint-based Dynamic Object 6-DoF Pose Tracking via Event Camera

Zhe Wang, Qijin Song, Zihao Li, Jingyu Xiao, Weibang Bai

Comments Accepted to 2026 IEEE International Conference on Robotics and Automation (ICRA 2026)

2604.23385 2026-04-28 cs.LG

Domain-Adapted Fine-Tuning of ECG Foundation Models for Multi-Label Structural Heart Disease Screening

Duc N. Do, Minh N. Do, Dang Nguyen, Khanh T. Q. Le, Khoa D. Pham, Hung N. Huynh, Phi Pham-Van-Hoang, Quan K. Huynh, Ramez M. Odat, Perisa Ashar, Ethan Philip Lowder, Minh H. N. Le, Hoang Le, Phat V. H. Nguyen, Quan Le, Jacques Kpodonu, Phat K. Huynh

Comments Accepted to Canadian AI 2026

2604.23380 2026-04-28 cs.LG cs.CV

V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think

Bingda Tang, Yuhui Zhang, Xiaohan Wang, Jiayuan Mao, Ludwig Schmidt, Serena Yeung-Levy

2604.23377 2026-04-28 cs.AI

Constraint-Based Analysis of Reasoning Shortcuts in Neurosymbolic Learning

Akihiro Takemura, Katsumi Inoue, Masaaki Nishino

Comments This is the full version of a paper appearing at the 23rd International Conference on Principles of Knowledge Representation and Reasoning (KR 2026)

2604.23375 2026-04-28 cs.CV stat.ML

Hierarchical Spatio-Channel Clustering for Efficient Model Compression in Medical Image Analysis

Sisipho Hamlomo, Marcellin Atemkeng, Habte Tadesse Likassa, Blaise Ravelo, Thierry Bouwmans, Sébastien Lalléchère, Antoine Vacavant, Ding-Geng Chen

2604.23371 2026-04-28 cs.LG

When Context Sticks: Studying Interference in In-Context Learning

Hanna Rød, Dagny Streit, Nils Valseth Selte, Justin Li

Comments 14 pages, 6 figures, 2 tables. Code available at: https://github.com/nilsvselte/icl-context-stickiness

2604.23368 2026-04-28 cs.LG

TEMPO: Transformers for Temporal Disease Progression from Cross-Sectional Data

Hongtao Hao, Joseph L. Austerweil

Comments 31 pages; Published at Conference on Health, Inference, and Learning (CHIL) 2026

2604.23366 2026-04-28 cs.AI cs.MA

GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs

Federico A. Kamelhar

2604.23360 2026-04-28 cs.RO

Learning from Demonstration with Failure Awareness for Safe Robot Navigation

Xianghui Wang, Siwei Cheng, Shanze Wang, Xinming Zhang, Dan Zhang, Wei Zhang

2604.23356 2026-04-28 cs.CL cs.HC

VeriLLMed: Interactive Visual Debugging of Medical Large Language Models with Knowledge Graphs

Yurui Xiang, Xingyi Mao, Rui Sheng, Zixin Chen, Zelin Zang, Yuyang Wu, Haipeng Zeng, Huamin Qu, Yushi Sun, Yanna Lin

2604.23351 2026-04-28 cs.CL cs.AI cs.LG

When Chain-of-Thought Fails, the Solution Hides in the Hidden States

Houman Mehrafarin, Amit Parekh, Ioannis Konstas

2604.23348 2026-04-28 cs.CV cs.AI

EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs

He Hu, Tengjin Weng, Zebang Cheng, Yu Wang, Jiachen Luo, Björn Schuller, Zheng Lian, Laizhong Cui

2604.23347 2026-04-28 cs.CL

Evaluating Large Language Models on Computer Science University Exams in Data Structures

Edan Gabay, Yael Maoz, Jonathan Stahl, Naama Maoz, Abdo Amer, Orr Eilat, Hanoch Levy, Michal Kleinbort, Amir Rubinstein, Adi Haviv

2604.23345 2026-04-28 cs.CL cs.HC

Bridging Reasoning and Action: Hybrid LLM-RL Framework for Efficient Cross-Domain Task-Oriented Dialogue

Yangyang Zhao, Linfan Dai, Li Cai, Bowen Xing, Libo Qin

2604.23344 2026-04-28 cs.CV

Exploring Hierarchical Consistency and Unbiased Objectness for Open-Vocabulary Object Detection

Sanghoon Lee, Geon Lee, Hyekang Park, Bumsub Ham

Comments Accepted to CVPR 2026 Findings

2604.23335 2026-04-28 cs.CV

H-SemiS: Hierarchical Fusion of Semi and Self-Supervised Learning for Knee Osteoarthritis Severity Grading

Chandravardhan Singh Raghaw, Anushka Parwal, Shahid Shafi Dar, Prajakta Darade, Nagendra Kumar

Details

DOI: 10.1016/j.eswa.2026.132279
Journal ref: Expert Systems with Applications, Volume 322, 1 August 2026, 132279

Abstract

Knee osteoarthritis (KOA) is a degenerative joint disease that can lead to chronic pain, reduced mobility, and long-term disability. Automated severity grading from knee radiographs can support early assessment, but current methods heavily depend on large labeled datasets and remain sensitive to class imbalance, noisy samples, and variability in clinical annotations. To alleviate these limitations, we propose a Hierarchical fusion of Semi-Supervised framework with Self-Supervision (H-SemiS) for KOA severity grading in knee X-ray samples using limited annotated data. Rather than treating severity grading as a flat multi-class problem, H-SemiS decomposes the task into a sequence of binary sub-tasks within a semi-supervised teacher-student architecture, directly mitigating the impact of class imbalance. To further enhance feature learning from unlabeled data, the framework integrates an adversarial self-supervised reconstruction module that encourages the network to capture robust anatomical structures. In parallel, a teacher-student design with quantum-inspired feature mixing improves discrimination boundaries between adjacent grades when pseudo-labels are noisy. We comprehensively evaluate H-SemiS on two challenging multi-class datasets and assess its generalizability on two binary-class datasets. Our experimental results demonstrate the superiority of the proposed H-SemiS framework across multiple evaluation metrics, consistently outperforming several competing baselines and state-of-the-art methods. The code is publicly available at https://github.com/chandravardhan-singh-raghaw/H-SemiS.
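The idea of decomposing severity grading into a sequence of binary sub-tasks can be sketched with a cumulative ordinal encoding. This is one common decomposition and is an assumption here: the abstract does not specify how the sub-tasks are defined, and the five grades (0 to 4, as in the Kellgren-Lawrence scale) and function names are likewise illustrative.

```python
def grade_to_binary_targets(grade, num_grades=5):
    """Encode a grade as answers to 'is the grade greater than k?' for each k."""
    return [1 if grade > k else 0 for k in range(num_grades - 1)]


def binary_probs_to_grade(probs, threshold=0.5):
    """Decode by counting consecutive sub-tasks whose probability passes the threshold."""
    grade = 0
    for p in probs:
        if p < threshold:
            break
        grade += 1
    return grade


targets = grade_to_binary_targets(3)
decoded = binary_probs_to_grade([0.9, 0.8, 0.3, 0.1])
print(targets, decoded)
```

Each binary sub-task splits the data into "at or below k" versus "above k", which tends to give more balanced positives and negatives than a flat five-way split when some grades are rare.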

2604.23333 2026-04-28 cs.LG cs.CL

Process Supervision of Confidence Margin for Calibrated LLM Reasoning

Liaoyaqi Wang, Chunsheng Zuo, William Jurayj, Benjamin Van Durme, Anqi Liu

2604.23327 2026-04-28 cs.RO

An Efficient Beam Search Algorithm for Active Perception in Mobile Robotics

Kaixian Qu, Han Wang, Victor Klemm, Cesar Cadena, Marco Hutter

Comments Accepted to The International Journal of Robotics Research (IJRR). Project page: https://efficient-beam-search.github.io/