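arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.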

2603.02591 2026-03-04 cs.CV

Maximizing Generalization: The Effect of Different Augmentation Techniques on Lightweight Vision Transformer for Bengali Character Classification

Rafi Hassan Chowdhury, Naimul Haque, Kaniz Fatiha

Abstract

Deep learning models have proven to be highly effective in computer vision, with deep convolutional neural networks achieving impressive results across various computer vision tasks. However, these models rely heavily on large datasets to avoid overfitting. When a model learns features with either low or high variance, it can lead to underfitting or overfitting on the training data. Unfortunately, large-scale datasets may not be available in many domains, particularly for resource-limited languages such as Bengali. In this experiment, a series of tests were conducted in the field of image data augmentation as an approach to addressing the limited data problem for Bengali handwritten characters. The study also provides an in-depth analysis of the performance of different augmentation techniques. Data augmentation refers to a set of techniques applied to data to increase its size and diversity, making it more suitable for training deep learning models. The image augmentation techniques evaluated in this study include CLAHE, Random Rotation, Random Affine, Color Jitter, and their combinations. The study further explores the use of augmentation methods with a lightweight model such as EfficientViT. Among the different augmentation strategies, the combination of Random Affine and Color Jitter produced the best accuracy on the Ekush [1] and AIBangla [2] datasets, achieving accuracies of 97.48% and 97.57%, respectively. This combination outperformed all other individual and combined augmentation techniques. Overall, this analysis presents a thorough examination of the impact of image data augmentation in resource-scarce languages, particularly in the context of Bengali handwritten character recognition using lightweight models.
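
The best-performing combination named in the abstract, Random Affine followed by Color Jitter, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, parameter ranges, nearest-neighbour resampling, and the grayscale list-of-lists image format are all assumptions chosen to keep the example dependency-free.

```python
import math
import random


def random_affine(img, rng, max_deg=10.0, max_shift=0.1, scale=(0.9, 1.1), fill=0.0):
    """Apply a random rotation, scale, and translation about the image centre.

    img is a grayscale image stored as a list of rows with values in [0, 1].
    All parameter ranges are illustrative assumptions, not values from the paper.
    """
    h, w = len(img), len(img[0])
    theta = math.radians(rng.uniform(-max_deg, max_deg))
    s = rng.uniform(*scale)
    tx = rng.uniform(-max_shift, max_shift) * w
    ty = rng.uniform(-max_shift, max_shift) * h
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: undo the translation, then the rotation and scale.
            dx, dy = x - cx - tx, y - cy - ty
            sx = (cos_t * dx + sin_t * dy) / s + cx
            sy = (-sin_t * dx + cos_t * dy) / s + cy
            ix, iy = int(round(sx)), int(round(sy))  # nearest-neighbour sampling
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def color_jitter(img, rng, brightness=0.2, contrast=0.2):
    """Randomly perturb brightness and contrast, clamping the result to [0, 1]."""
    b = 1.0 + rng.uniform(-brightness, brightness)
    c = 1.0 + rng.uniform(-contrast, contrast)
    pixels = [p * b for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    width = len(img[0])
    jittered = [min(1.0, max(0.0, (p - mean) * c + mean)) for p in pixels]
    return [jittered[i:i + width] for i in range(0, len(jittered), width)]


def augment(img, rng):
    """Hypothetical combined pipeline: Random Affine, then Color Jitter."""
    return color_jitter(random_affine(img, rng), rng)


# Usage: augment a small synthetic checkerboard with a seeded generator.
checkerboard = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
augmented = augment(checkerboard, random.Random(0))
```

Passing an explicit `random.Random` instance keeps each augmentation reproducible, which makes it easier to compare augmentation strategies on the same samples.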

2603.02588 2026-03-04 cs.CL

ExpGuard: LLM Content Moderation in Specialized Domains

Minseok Choi, Dongjin Kim, Seungbin Yang, Subin Kim, Youngjun Kwak, Juyoung Oh, Jaegul Choo, Jungmin Son

Comments ICLR 2026

2603.02586 2026-03-04 cs.AI

LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

Hao Li, Huan Wang, Jinjie Gu, Wenjie Wang, Chenyi Zhuang, Sikang Bian

2603.02582 2026-03-04 cs.CV eess.SP

Neural Electromagnetic Fields for High-Resolution Material Parameter Reconstruction

Zhe Chen, Peilin Zheng, Wenshuo Chen, Xiucheng Wang, Yutao Yue, Nan Cheng

Comments 10 pages, 5 figures

2603.02581 2026-03-04 cs.CV

ATD: Improved Transformer with Adaptive Token Dictionary for Image Restoration

Leheng Zhang, Wei Long, Yawei Li, Xingyu Zhou, Xiaorui Zhao, Shuhang Gu

Comments 16 pages, 10 figures

2603.02579 2026-03-04 cs.LG cs.SY eess.SY

Joint Optimization of Model Partitioning and Resource Allocation for Anti-Jamming Collaborative Inference Systems

Mengru Wu, Jiawei Li, Jiaqi Wei, Bin Lyu, Kai-Kit Wong, Hyundong Shin

2603.02577 2026-03-04 cs.LG

Towards Parameter-Free Temporal Difference Learning

Yunxiang Li, Mark Schmidt, Reza Babanezhad, Sharan Vaswani

2603.02576 2026-03-04 cs.LG

Wasserstein Proximal Policy Gradient

Zhaoyu Zhu, Shuhan Zhang, Rui Gao, Shuang Li

2603.02562 2026-03-04 cs.LG

EdgeFLow: Serverless Federated Learning via Sequential Model Migration in Edge Networks

Yuchen Shi, Qijun Hou, Pingyi Fan, Khaled B. Letaief

2603.02560 2026-03-04 cs.CV

CAWM-Mamba: A unified model for infrared-visible image fusion and compound adverse weather restoration

Huichun Liu, Xiaosong Li, Zhuangfan Huang, Tao Ye, Yang Liu, Haishu Tan

2603.02557 2026-03-04 cs.CV cs.AI

CAPT: Confusion-Aware Prompt Tuning for Reducing Vision-Language Misalignment

Maoyuan Shao, Yutong Gao, Xinyang Huang, Chuang Zhu, Lijuan Sun, Guoshun Nan

Comments Accepted by CVPR2026

2603.02556 2026-03-04 cs.CV cs.AI cs.CL cs.LG

Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs

Zhiyu Pan, Yizheng Wu, Jiashen Hua, Junyi Feng, Shaotian Yan, Bing Deng, Zhiguo Cao, Jieping Ye

Comments 19 pages, 9 figures, accepted to ICLR 2026 (oral)

2603.02554 2026-03-04 cs.CV

Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation

Chonghua Lv, Dong Zhao, Shuang Wang, Dou Quan, Ning Huyan, Nicu Sebe, Zhun Zhong

Comments Accepted by CVPR2026

2603.02553 2026-03-04 cs.RO cs.CV cs.HC cs.LG

Give me scissors: Collision-Free Dual-Arm Surgical Assistive Robot for Instrument Delivery

Xuejin Luo, Shiquan Sun, Runshi Zhang, Ruizhi Zhang, Junchen Wang

Comments 8 pages, 10 figures. Accepted by IEEE International Conference on Robotics and Automation (ICRA), 2026

2603.02548 2026-03-04 cs.CV

SemGS: Feed-Forward Semantic 3D Gaussian Splatting from Sparse Views for Generalizable Scene Understanding

Sheng Ye, Zhen-Hui Dong, Ruoyu Fan, Tian Lv, Yong-Jin Liu

Comments ICRA 2026

2603.02547 2026-03-04 cs.CL cs.AI cs.LG

CoDAR: Continuous Diffusion Language Models are More Powerful Than You Think

Junzhe Shen, Jieru Zhao, Ziwei He, Zhouhan Lin

2603.02546 2026-03-04 cs.CV

On Discriminative vs. Generative classifiers: Rethinking MLLMs for Action Understanding

Zhanzhong Pang, Dibyadip Chatterjee, Fadime Sener, Angela Yao

Comments 22 pages, 9 figures, 16 tables. Accepted by ICLR2026

2603.02542 2026-03-04 cs.AI

AnchorDrive: LLM Scenario Rollout with Anchor-Guided Diffusion Regeneration for Safety-Critical Scenario Generation

Zhulin Jiang, Zetao Li, Cheng Wang, Ziwen Wang, Chen Xiong

2603.02541 2026-03-04 cs.CV

ForestPersons: A Large-Scale Dataset for Under-Canopy Missing Person Detection

Deokyun Kim, Jeongjun Lee, Jungwon Choi, Jonggeon Park, Giyoung Lee, Yookyung Kim, Myungseok Ki, Juho Lee, Jihun Cha

Comments ICLR 2026 Accepted

2603.02540 2026-03-04 cs.AI

A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities

Faiz Ghifari Haznitrama, Faeyza Rishad Ardi, Alice Oh

Comments 26 pages, 2 figures, 16 tables

2603.02532 2026-03-04 cs.CV

EIMC: Efficient Instance-aware Multi-modal Collaborative Perception

Kang Yang, Peng Wang, Lantao Li, Tianci Bu, Chen Sun, Deying Li, Yongcai Wang

Comments 9 pages, 8 figures, 7 tables

2603.02528 2026-03-04 cs.AI cs.RO

LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model

Xiangyu Li, Tianyi Wang, Xi Cheng, Rakesh Chowdary Machineni, Zhaomiao Guo, Sikai Chen, Junfeng Jiao, Christian Claudel

2603.02525 2026-03-04 cs.LG

Thermodynamic Regulation of Finite-Time Gibbs Training in Energy-Based Models: A Restricted Boltzmann Machine Study

Görkem Can Süleymanoğlu

Comments 35 pages, 12 Tables, 7 figures. Includes theoretical analysis and experimental validation on MNIST

2603.02522 2026-03-04 cs.CV

NeighborMAE: Exploiting Spatial Dependencies between Neighboring Earth Observation Images in Masked Autoencoders Pretraining

Liang Zeng, Valerio Marsocci, Wufan Zhao, Andrea Nascetti, Maarten Vergauwen

2603.02518 2026-03-04 cs.CV

Beyond Anatomy: Explainable ASD Classification from rs-fMRI via Functional Parcellation and Graph Attention Networks

Syeda Hareem Madani, Noureen Bibi, Adam Rafiq Jeraj, Sumra Khan, Anas Zafar, Rizwan Qureshi

Comments 10 pages

2603.02511 2026-03-04 cs.RO cs.AI

Learning Object-Centric Spatial Reasoning for Sequential Manipulation in Cluttered Environments

Chrisantus Eze, Ryan C Julian, Christopher Crick

2603.02510 2026-03-04 cs.LG cs.DC cs.NE cs.PF

ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

Liu Yang, Zeyu Nie, Andrew Liu, Felix Zou, Deniz Altinbüken, Amir Yazdanbakhsh, Quanquan C. Liu

2603.02505 2026-03-04 cs.CV

SGMA: Semantic-Guided Modality-Aware Segmentation for Remote Sensing with Incomplete Multimodal Data

Lekang Wen, Liang Liao, Jing Xiao, Mi Wang

Abstract

Multimodal semantic segmentation integrates complementary information from diverse sensors for remote sensing Earth observation. However, practical systems often encounter missing modalities due to sensor failures or incomplete coverage, termed Incomplete Multimodal Semantic Segmentation (IMSS). IMSS faces three key challenges: (1) multimodal imbalance, where dominant modalities suppress fragile ones; (2) intra-class variation in scale, shape, and orientation across modalities; and (3) cross-modal heterogeneity with conflicting cues producing inconsistent semantic responses. Existing methods rely on contrastive learning or joint optimization, which risk over-alignment, discarding modality-specific cues or imbalanced training, favoring robust modalities, while largely overlooking intra-class variation and cross-modal heterogeneity. To address these limitations, we propose the Semantic-Guided Modality-Aware (SGMA) framework, which ensures balanced multimodal learning while reducing intra-class variation and reconciling cross-modal inconsistencies through semantic guidance. SGMA introduces two complementary plug-and-play modules: (1) Semantic-Guided Fusion (SGF) module extracts multi-scale, class-wise semantic prototypes that capture consistent categorical representations across modalities, estimates per-modality robustness based on prototype-feature alignment, and performs adaptive fusion weighted by robustness scores to mitigate intra-class variation and cross-modal heterogeneity; (2) Modality-Aware Sampling (MAS) module leverages robustness estimations from SGF to dynamically reweight training samples, prioritizing challenging samples from fragile modalities to address modality imbalance. Extensive experiments across multiple datasets and backbones demonstrate that SGMA consistently outperforms state-of-the-art methods, with particularly significant improvements in fragile modalities.
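
The idea of weighting modalities by prototype-feature alignment, as described for the SGF module, can be sketched roughly as follows. The abstract does not give the actual formulas, so everything here is an assumption: cosine similarity as the alignment measure, a softmax with a temperature to turn scores into fusion weights, and a simple per-modality inverse weighting standing in for the sample reweighting of the MAS module. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def robustness_scores(features, prototype):
    """Assumed robustness: alignment of each modality's feature with a class prototype."""
    return [cosine(f, prototype) for f in features]


def fuse(features, prototype, temperature=0.1):
    """Fuse per-modality features, weighted by their assumed robustness."""
    weights = softmax(robustness_scores(features, prototype), temperature)
    dim = len(prototype)
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights


def sampling_weights(scores):
    """Assumed reweighting: lower robustness (a fragile modality) gets a higher weight."""
    raw = [1.0 - s for s in scores]  # cosine lies in [-1, 1], so raw lies in [0, 2]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [r / total for r in raw]


# Usage: two hypothetical modalities, one well aligned with the prototype.
prototype = [1.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = fuse(features, prototype)
priorities = sampling_weights(robustness_scores(features, prototype))
```

In this toy setup the aligned modality dominates the fused feature, while the misaligned one receives the larger sampling priority, mirroring the abstract's goal of fusing by robustness while training harder on fragile modalities.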

2603.02500 2026-03-04 cs.RO

Instant and Reversible Adhesive-free Bonding Between Silicones and Glossy Papers for Soft Robotics

Takumi Shibuya, Kazuya Murakami, Akitsu Shigetou, Jun Shintake

2603.02497 2026-03-04 cs.CV

WTHaar-Net: a Hybrid Quantum-Classical Approach

Vittorio Palladino, Tsai Idden, Ahmet Enis Cetin

Comments 16 pages, 5 images