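arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.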

2603.14861 2026-03-17 cs.CV cs.AI

Video Detector: A Dual-Phase Vision-Based System for Real-Time Traffic Intersection Control and Intelligent Transportation Analysis

Mustafa Fatih Şen, Halûk Gümüşkaya, Şenol Pazar

Comments 18 pages, 10 figures, 4 tables, preprint, the dataset is openly available

English abstract

Urban traffic management increasingly requires intelligent sensing systems capable of adapting to dynamic traffic conditions without costly infrastructure modifications. Vision-based vehicle detection has therefore become a key technology for modern intelligent transportation systems. This study presents Video Detector (VD), a dual-phase vision-based traffic intersection management system designed as a flexible and cost-effective alternative to traditional inductive loop detectors. The framework integrates a real-time module (VD-RT) for intersection control with an offline analytical module (VD-Offline) for detailed traffic behavior analysis. Three system configurations were implemented using SSD Inception v2, Faster R-CNN Inception v2, and CenterNet ResNet-50 V1 FPN, trained on datasets totaling 108,000 annotated images across 6-10 vehicle classes. Experimental results show detection performance of up to 90% test accuracy and 29.5 mAP@0.5, while maintaining real-time throughput of 37 FPS on HD video streams. Field deployments conducted in collaboration with Istanbul IT and Smart City Technologies Inc. (ISBAK) demonstrate stable operation under diverse environmental conditions. The system supports virtual loop detection, vehicle counting, multi-object tracking, queue estimation, speed analysis, and multiclass vehicle classification, enabling comprehensive intersection monitoring without the need for embedded road sensors. The annotated dataset and training pipeline are publicly released to support reproducibility. These results indicate that the proposed framework provides a scalable and deployable vision-based solution for intelligent transportation systems and smart-city traffic management.
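The abstract lists virtual loop detection and vehicle counting among the system's functions but does not describe how they are implemented. A minimal sketch of one common approach, assuming the tracker already yields per-vehicle centroid paths: count each track whose path crosses a horizontal virtual line. The function name, the `tracks` layout, and the pixel coordinates are hypothetical and not taken from the paper.

```python
def count_line_crossings(tracks, line_y):
    """Count tracks whose centroid path crosses the horizontal line y = line_y.

    tracks: dict mapping a track id to a list of (x, y) centroids in frame order.
    Each track is counted at most once, regardless of direction.
    """
    crossed = 0
    for points in tracks.values():
        # True while the centroid is above the line (smaller y), False otherwise.
        sides = [y < line_y for _, y in points]
        # A change of side between consecutive frames means the line was crossed.
        if any(a != b for a, b in zip(sides, sides[1:])):
            crossed += 1
    return crossed


# Hypothetical tracker output: three vehicles, virtual line at y = 400.
tracks = {
    1: [(100, 380), (102, 395), (105, 410)],  # crosses downward
    2: [(300, 200), (300, 250)],              # never reaches the line
    3: [(50, 420), (52, 390)],                # crosses upward
}
print(count_line_crossings(tracks, line_y=400))  # 2
```

Counting per track rather than per frame avoids double counting a vehicle that lingers near the line.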

2603.14856 2026-03-17 cs.CV

From Horizontal to Rotated: Cross-View Object Geo-Localization with Orientation Awareness

Chenlin Fu, Ao Gong, Yingying Zhu

2603.14853 2026-03-17 cs.SD

WhispSynth: Scaling Multilingual Whisper Corpus through Real Data Curation and A Novel Pitch-free Generative Framework

Tianyi Tan, Jiaxin Ye, Yuanming Zhang, Xiaohuai Le, Xianjun Xia, Chuanzeng Huang, Jing Lu

Comments Under Review

2603.14852 2026-03-17 cs.RO cs.SY eess.SY

Surgical Robot, Path Planning, Joint Space, Riemannian Manifolds

Yoshiki Yamamoto, Maina Sogabe, Shunichi Hirahara, Toshiki Kaisaki, Tetsuro Miyazaki, Kenji Kawashima

Comments 11 pages, 8 figures

2603.14850 2026-03-17 cs.CV cond-mat.mes-hall

From Artefact to Insight: Efficient Low-Rank Adaptation of BrushNet for Scanning Probe Microscopy Image Restoration

Ziwei Wei, Yao Shen, Wanheng Lu, Ghim Wei Ho, Kaiyang Zeng

Comments 37 pages, 7 figures, 7 tables, journal paper

2603.14848 2026-03-17 cs.CV

Personalized Federated Learning with Residual Fisher Information for Medical Image Segmentation

Meilu Zhu, Yuxing Li, Zhiwei Wang, Edmund Y. Lam

Comments accepted by ISBI 2026

2603.14843 2026-03-17 cs.CL cs.AI

ContiGuard: A Framework for Continual Toxicity Detection Against Evolving Evasive Perturbations

Hankun Kang, Xin Miao, Jianhao Chen, Jintao Wen, Mayi Xu, Weiyu Zhang, Wenpeng Lu, Tieyun Qian

2603.14838 2026-03-17 cs.CL

The Impact of Ideological Discourses in RAG: A Case Study with COVID-19 Treatments

Elmira Salari, Maria Claudia Nunes Delfino, Hazem Amamou, José Victor de Souza, Shruti Kshirsagar, Alan Davoust, Anderson Avila

2603.14837 2026-03-17 cs.CV

DamageArbiter: A CLIP-Enhanced Multimodal Arbitration Framework for Hurricane Damage Assessment from Street-View Imagery

Yifan Yang, Lei Zou, Wenjing Gong, Kani Fu, Zongrong Li, Siqin Wang, Bing Zhou, Heng Cai, Hao Tian

English abstract

Analyzing street-view imagery with computer vision models for rapid, hyperlocal damage assessment is becoming popular and valuable in emergency response and recovery, but traditional models often act like black boxes, lacking interpretability and reliability. This study proposes DamageArbiter, a multimodal, disagreement-driven arbitration framework powered by Contrastive Language-Image Pre-training (CLIP) models, to improve the accuracy, interpretability, and robustness of damage estimation from street-view imagery. DamageArbiter leverages the complementary strengths of unimodal and multimodal models, employing a lightweight logistic regression meta-classifier to arbitrate cases of disagreement. Using 2,556 post-disaster street-view images, paired with both manually generated and large language model (LLM)-generated text descriptions, we systematically compared the performance of unimodal models (including image-only and text-only models), multimodal CLIP-based models, and DamageArbiter. Notably, DamageArbiter improved the accuracy from 74.33% (ViT-B/32, image-only) to 82.79%, surpassing the 80% accuracy threshold and achieving an absolute improvement of 8.46 percentage points compared to the strongest baseline model. Beyond improvements in overall accuracy, DamageArbiter arbitrates discrepancies between unimodal and multimodal predictions and thereby mitigates the overconfidence errors common in image-only visual models, reducing overconfident but incorrect predictions, especially where visual cues of damage are ambiguous or subject to interference. We further mapped and analyzed geo-referenced predictions and misclassifications to compare model performance across locations. Overall, this work advances street-view-based disaster assessment from coarse severity classification toward a more reliable and interpretable framework.
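The arbitration idea in this abstract (accept the label when the models agree, let a logistic regression meta-classifier decide when they disagree) can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the two confidence features, the hand-written gradient descent, the training rows, and the labels "severe" and "mild" are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_arbiter(features, targets, lr=0.5, epochs=2000):
    """Fit a small logistic regression by batch gradient descent.

    features: rows of [image_confidence, multimodal_confidence] (assumed features).
    targets: 1 if the multimodal model was right on that disagreement case, else 0.
    """
    n, n_feat = len(features), len(features[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(features, targets):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def arbitrate(image_pred, multimodal_pred, w, b):
    """Each prediction is (label, confidence). Return the final label."""
    if image_pred[0] == multimodal_pred[0]:
        return image_pred[0]  # agreement: nothing to arbitrate
    x = [image_pred[1], multimodal_pred[1]]
    p_multimodal = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return multimodal_pred[0] if p_multimodal >= 0.5 else image_pred[0]


# Hypothetical disagreement cases used only to exercise the code.
X = [[0.9, 0.4], [0.95, 0.3], [0.85, 0.5], [0.5, 0.9], [0.4, 0.85], [0.45, 0.8]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_arbiter(X, y)

print(arbitrate(("severe", 0.9), ("mild", 0.4), w, b))
print(arbitrate(("severe", 0.45), ("mild", 0.85), w, b))
print(arbitrate(("mild", 0.6), ("mild", 0.7), w, b))
```

On this toy data the arbiter learns to side with whichever model is more confident. A real meta-classifier would be trained on held-out disagreement cases with richer features.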

2603.14833 2026-03-17 cs.LG cs.AI

Ablate and Rescue: A Causal Analysis of Residual Stream Hyper-Connections

William Peng, Josheev Rai, Kevin Tseng, Siwei Wang, Sean Wu

2603.14825 2026-03-17 cs.CV cs.AI

Two Birds, One Projection: Harmonizing Safety and Utility in LVLMs via Inference-time Feature Projection

Yewon Han, Yumin Seol, EunGyung Kong, Minsoo Jo, Taesup Kim

2603.14822 2026-03-17 cs.CV

RadarXFormer: Robust Object Detection via Cross-Dimension Fusion of 4D Radar Spectra and Images for Autonomous Driving

Yue Sun, Yeqiang Qian, Zhe Wang, Tianhui Li, Chunxiang Wang, Ming Yang

2603.14819 2026-03-17 cs.CV cs.AI

RAZOR: Ratio-Aware Layer Editing for Targeted Unlearning in Vision Transformers and Diffusion Models

Ravi Ranjan, Utkarsh Grover, Xiaomin Lin, Agoritsa Polyzou

Comments 18 pages, 6 figures, 8 tables, accepted to CVPR 2026, to appear in the Findings Track Proceedings of the IEEE/CVF Conference

2603.14816 2026-03-17 cs.CV

M2IR: Proactive All-in-One Image Restoration via Mamba-style Modulation and Mixture-of-Experts

Shiwei Wang, Yongzhen Wang, Bingwen Hu, Liyan Zhang, Xiao-Ping Zhang, Mingqiang Wei

2603.13228 2026-03-17 cs.LG cs.AI cs.CV cs.RO

PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization

Yangsong Zhang, Anujith Muraleedharan, Rikhat Akizhanov, Abdul Ahad Butt, Gül Varol, Pascal Fua, Fabio Pizzati, Ivan Laptev

Comments Project page: https://mael-zys.github.io/PhysMoDPO/

2603.13049 2026-03-17 cs.LG

3DTCR: A Physics-Based Generative Framework for Vortex-Following 3D Reconstruction to Improve Tropical Cyclone Intensity Forecasting

Jun Liu, Xiaohui Zhong, Kai Zheng, Jiarui Li, Yifei Li, Tao Zhou, Wenxu Qian, Shun Dai, Ruian Tie, Yangyang Zhao, Hao Li

2603.12908 2026-03-17 cs.RO

GoalSwarm: Multi-UAV Semantic Coordination for Open-Vocabulary Object Navigation

MoniJesu Wonders James, Amir Atef Habel, Aleksey Fedoseev, Dzmitry Tsetserokou

Comments 6 pages, 2 figures

2603.12794 2026-03-17 cs.LG math.FA

A Fractional Fox H-Function Kernel for Support Vector Machines: Robust Classification via Weighted Transmutation Operators

Gustavo Dorrego

Comments 7 pages, 4 figures

2603.12718 2026-03-17 cs.CV

The COTe score: A decomposable framework for evaluating Document Layout Analysis models

Jonathan Bourne, Mwiza Simbeye, Ishtar Govia

Comments 6906 words, 4 figures, 10 tables

2603.12683 2026-03-17 cs.CL cs.AI

Experimental evidence of progressive ChatGPT models self-convergence

Konstantinos F. Xylogiannopoulos, Petros Xanthopoulos, Panagiotis Karampelas, Georgios A. Bakamitsos

2603.12664 2026-03-17 cs.CL cs.AI

From Text to Forecasts: Bridging Modality Gap with Temporal Evolution Semantic Space

Lehui Li, Yuyao Wang, Jisheng Yan, Wei Zhang, Jinliang Deng, Haoliang Sun, Zhongyi Han, Yongshun Gong

Comments 15 pages, 6 figures

2603.12248 2026-03-17 cs.LG

Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

Samy Jelassi, Mujin Kwun, Rosie Zhao, Yuanzhi Li, Nicolo Fusi, Yilun Du, Sham M. Kakade, Carles Domingo-Enrich

2603.11750 2026-03-17 cs.LG

Mitigating the Multiplicity Burden: The Role of Calibration in Reducing Predictive Multiplicity of Classifiers

Mustafa Cavus

Comments 16 pages, 3 figures

2603.11664 2026-03-17 cs.CV

BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder

Siquan Huang, Yijiang Li, Ningzhi Gao, Xingfu Yan, Leyu Shi, Ying Gao

Comments 17 pages, 10 figures, 6 tables

2603.10969 2026-03-17 cs.LG cs.CL cs.CR cs.SE

TOSSS: a CVE-based Software Security Benchmark for Large Language Models

Marc Damie, Murat Bilgehan Ertan, Domenico Essoussi, Angela Makhanu, Gaëtan Peter, Roos Wensveen

2603.10910 2026-03-17 cs.CL

GLM-OCR Technical Report

Shuaiqi Duan, Yadong Xue, Weihan Wang, Zhe Su, Huan Liu, Sheng Yang, Guobing Gan, Guo Wang, Zihan Wang, Shengdong Yan, Dexin Jin, Yuxuan Zhang, Guohong Wen, Yanfeng Wang, Yutao Zhang, Xiaohan Zhang, Wenyi Hong, Yukuo Cen, Da Yin, Bin Chen, Wenmeng Yu, Xiaotao Gu, Jie Tang

2603.09957 2026-03-17 cs.AI cs.CL cs.LG

Think Before You Lie: How Reasoning Leads to Honesty

Ann Yuan, Asma Ghandeharioun, Carter Blum, Alicia Machado, Jessica Hoffmann, Daphne Ippolito, Martin Wattenberg, Lucas Dixon, Katja Filippova

2603.09783 2026-03-17 cs.RO cs.SY eess.SY

Lightweight 3D LiDAR-Based UAV Tracking: An Adaptive Extended Kalman Filtering Approach

Nivand Khosravi, Meysam Basiri, Rodrigo Ventura

Comments Presented at the 19th International Conference on Intelligent Autonomous Systems, IAS-19, Genoa, Italy, June 30 to July 4, 2025. To appear in the Springer post-proceedings of the conference

2603.09420 2026-03-17 cs.CV cs.AI cs.RO

Open-World Motion Forecasting

Nicolas Schischka, Nikhil Gosala, B Ravi Kiran, Senthil Yogamani, Abhinav Valada

Comments V2: Adapt author affiliation

English abstract

Motion forecasting aims to predict the future trajectories of dynamic agents in the scene, enabling autonomous vehicles to effectively reason about scene evolution. Existing approaches operate under the closed-world regime and assume a fixed object taxonomy as well as access to high-quality perception. Therefore, they struggle in real-world settings where perception is imperfect and object taxonomy evolves over time. In this work, we bridge this fundamental gap by introducing open-world motion forecasting, a novel setting in which new object classes are sequentially introduced over time and future object trajectories are estimated directly from camera images. We tackle this setting by proposing the first end-to-end class-incremental motion forecasting framework to mitigate catastrophic forgetting while simultaneously learning to forecast newly introduced classes. When a new class is introduced, our framework employs a pseudo-labeling strategy to first generate motion forecasting pseudo-labels for all known classes, which are then processed by a vision-language model to filter inconsistent and over-confident predictions. In parallel, our approach further mitigates catastrophic forgetting by using a novel replay sampling strategy that leverages query feature variance to sample previous sequences with informative motion patterns. Extensive evaluation on the nuScenes and Argoverse 2 datasets demonstrates that our approach successfully resists catastrophic forgetting and maintains performance on previously learned classes while improving adaptation to novel ones. Further, we demonstrate that our approach supports zero-shot transfer to real-world driving and naturally extends to end-to-end class-incremental planning, enabling continual adaptation of the full autonomous driving system. We provide the code at https://omen.cs.uni-freiburg.de.
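The abstract mentions a replay sampling strategy based on query feature variance but gives no formula. One plausible reading, sketched here purely as an assumption, is to score each stored sequence by the mean per-dimension variance of its query features and replay the highest-scoring ones. The function name, data layout, and top-k rule are hypothetical; the paper's actual strategy may differ.

```python
from statistics import pvariance


def select_replay_sequences(query_features, k):
    """Return the ids of the k sequences with the highest feature variance.

    query_features: dict mapping a sequence id to a list of feature vectors
    (equal-length lists of floats), one vector per query.
    """
    scores = {}
    for seq_id, vectors in query_features.items():
        dims = list(zip(*vectors))  # regroup values by feature dimension
        scores[seq_id] = sum(pvariance(d) for d in dims) / len(dims)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical stored sequences with two-dimensional query features.
features = {
    "a": [[0.0, 0.0], [0.0, 0.0]],  # no variation
    "b": [[0.0, 0.0], [2.0, 2.0]],  # highest variance
    "c": [[0.0, 0.0], [1.0, 1.0]],
}
print(select_replay_sequences(features, k=2))  # ['b', 'c']
```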

2603.09205 2026-03-17 cs.CL cs.AI cs.LG

Emotion is Not Just a Label: Latent Emotional Factors in LLM Processing

Benjamin Reichman, Adar Avsian, Samuel Webster, Larry Heck