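arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.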

2601.18420 2026-03-27 cs.LG cs.AI

Gradient Regularized Natural Gradients

Satya Prakash Dash, Hossein Abdi, Wei Pan, Samuel Kaski, Mingfei Sun

Abstract

Gradient regularization (GR) has been shown to improve the generalizability of trained models. While Natural Gradient Descent has been shown to accelerate optimization in the initial phase of training, little attention has been paid to how the training dynamics of second-order optimizers can benefit from GR. In this work, we propose Gradient-Regularized Natural Gradients (GRNG), a family of scalable second-order optimizers that integrate explicit gradient regularization with natural gradient updates. Our framework introduces two frequentist algorithms: Regularized Explicit Natural Gradient (RENG), which utilizes double backpropagation to explicitly minimize the gradient norm, and Regularized Implicit Natural Gradient (RING), which incorporates regularization implicitly into the update direction. We also propose a Bayesian variant based on a Regularized-Kalman formulation that eliminates the need for FIM inversion entirely. We establish convergence guarantees for GRNG, showing that gradient regularization improves stability and enables convergence to global minima. Empirically, we demonstrate that GRNG consistently enhances both optimization speed and generalization compared to first-order methods (SGD, AdamW) and second-order baselines (K-FAC, Sophia), with strong results on vision and language benchmarks.
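The explicit gradient-regularization idea in this abstract can be illustrated on a toy quadratic loss. The sketch below is not the GRNG, RENG, or RING algorithm: it uses plain gradient descent rather than natural-gradient updates, and the penalty form `0.5 * lam * ||grad L||^2`, the function names, and the step sizes are all assumptions made for illustration. For a quadratic loss the Hessian is the matrix `A`, so the penalty gradient is a Hessian-vector product, which is the quantity double backpropagation computes in a neural network.

```python
def grad(A, theta):
    """Gradient of L(theta) = 0.5 * theta^T A theta for a symmetric matrix A."""
    return [sum(a_ij * t_j for a_ij, t_j in zip(row, theta)) for row in A]


def gr_grad(A, theta, lam):
    """Gradient of L(theta) + 0.5 * lam * ||grad L(theta)||^2 (assumed penalty form).

    The Hessian of the quadratic loss is A, so the penalty term
    contributes lam * A @ g, where g is the plain gradient.
    """
    g = grad(A, theta)
    hg = grad(A, g)  # Hessian-vector product A @ g
    return [gi + lam * hgi for gi, hgi in zip(g, hg)]


def descend(A, theta, lam, lr=0.1, steps=200):
    """Plain gradient descent on the gradient-regularized objective."""
    for _ in range(steps):
        direction = gr_grad(A, theta, lam)
        theta = [t - lr * d for t, d in zip(theta, direction)]
    return theta


if __name__ == "__main__":
    A = [[2.0, 0.0], [0.0, 0.5]]
    print(descend(A, [1.0, 1.0], lam=0.1))
```

With `lam = 0` this reduces to ordinary gradient descent; a positive `lam` adds extra pull along high-curvature directions, since those contribute most to the gradient norm.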

2601.08750 2026-03-27 cs.CL

A Geolocation-Aware Multimodal Approach for Ecological Prediction

Valerie Zermatten, Chiara Vanalli, Gencer Sumbul, Diego Marcos, Devis Tuia

Comments: under review

Abstract

While integrating multiple modalities has the potential to improve environmental monitoring, current approaches struggle to combine data sources with heterogeneous formats or contents. A central difficulty arises when combining continuous gridded data (e.g., remote sensing) with sparse and irregular point observations such as species records. Existing geostatistical and deep-learning-based approaches typically operate on a single modality or focus on spatially aligned inputs, and thus cannot seamlessly overcome this difficulty. We propose a Geolocation-Aware MultiModal Approach (GAMMA), a transformer-based fusion approach designed to integrate heterogeneous ecological data using explicit spatial context. Instead of interpolating observations into a common grid, GAMMA first represents all inputs as location-aware embeddings that preserve spatial relationships between samples. GAMMA dynamically selects relevant neighbours across modalities and spatial scales, enabling the model to jointly exploit continuous remote sensing imagery and sparse geolocated observations. We evaluate GAMMA on the task of predicting 103 environmental variables from the SWECO25 data cube across Switzerland. Inputs combine aerial imagery with biodiversity observations from GBIF and textual habitat descriptions from Wikipedia, provided by the EcoWikiRS dataset. Experiments show that multimodal fusion consistently improves prediction performance over single-modality baselines and that explicit spatial context further enhances model accuracy. The flexible architecture of GAMMA also allows the contribution of each modality to be analysed through controlled ablation experiments. These results demonstrate the potential of location-aware multimodal learning for integrating heterogeneous ecological data and for supporting large-scale environmental mapping tasks and biodiversity monitoring.

2601.04033 2026-03-27 cs.CV

Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model

Yuan Wang, Borui Liao, Huijuan Huang, Jinda Lu, Ouxiang Li, Kuien Liu, Meng Wang, Xiang Wang

2601.00759 2026-03-27 cs.CV

Unified Primitive Proxies for Structured Shape Completion

Zhaiyu Chen, Yuqing Wang, Xiao Xiang Zhu

Comments: CVPR 2026

2601.00428 2026-03-27 cs.LG

Interpretable ML Under the Microscope: Performance, Meta-Features, and the Regression-Classification Predictability Gap

Mattia Billa, Giovanni Orlandi, Veronica Guidetti, Federica Mandreoli

Comments: 36 pages, new experimental findings added

2601.00216 2026-03-27 cs.CL

From Evidence-Based Medicine to Knowledge Graph: Retrieval-Augmented Generation for Sports Rehabilitation and a Domain Benchmark

Jinning Zhang, Jie Song, Wenhui Tu, Zecheng Li, Jingxuan Li, Jin Li, Xuan Liu, Taole Sha, Zichen Wei, Yan Li

Comments: 18 pages, 3 figures, 9 tables

2512.23042 2026-03-27 cs.CV

3D sans 3D Scans: Scalable Pre-training from Video-Generated Point Clouds

Ryousuke Yamada, Kohsuke Ide, Yoshihiro Fukuhara, Hirokatsu Kataoka, Gilles Puy, Andrei Bursuc, Yuki M. Asano

Comments: Accepted to CVPR 2026. Project page: https://ryosuke-yamada.github.io/lam3c/

2512.22854 2026-03-27 cs.CV cs.GR cs.LG

ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning

Bangya Liu, Xinyu Gong, Zelin Zhao, Ziyang Song, Yulei Lu, Suhui Wu, Jun Zhang, Suman Banerjee, Hao Zhang

2512.20821 2026-03-27 cs.LG

Divided We Fall: Defending Against Adversarial Attacks via Soft-Gated Fractional Mixture-of-Experts with Randomized Adversarial Training

Mohammad Meymani, Roozbeh Razavi-Far

2512.20749 2026-03-27 cs.LG cs.AI

Stabilizing Multimodal Autoencoders: A Theoretical and Empirical Analysis of Fusion Strategies

Diyar Altinses, Andreas Schwung

2512.19918 2026-03-27 cs.CV

Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs

Houston H. Zhang, Tao Zhang, Baoze Lin, Yuanqi Xue, Yincheng Zhu, Huan Liu, Li Gu, Linfeng Ye, Ziqiang Wang, Xinxin Zuo, Yang Wang, Yuanhao Yu, Zhixiang Chi

Comments: CVPR 2026, Code: https://github.com/Djanghao/widget2code

2512.17900 2026-03-27 cs.CV cs.RO

Diffusion Forcing for Multi-Agent Interaction Sequence Modeling

Vongani H. Maluleke, Kie Horiuchi, Lea Wilken, Evonne Ng, Jitendra Malik, Angjoo Kanazawa

Comments: Project page: https://von31.github.io/MAGNet/ ; Code: https://github.com/Von31/MAGNet-code

2512.14698 2026-03-27 cs.CV cs.AI cs.CL cs.MM

TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs

Jun Zhang, Teng Wang, Yuying Ge, Yixiao Ge, Xinhao Li, Ying Shan, Limin Wang

Comments: CVPR 2026. Website: https://timelens-arc-lab.github.io/

2512.13840 2026-03-27 cs.CV

MoLingo: Motion-Language Alignment for Text-to-Motion Generation

Yannan He, Garvita Tiwari, Xiaohan Zhang, Pankaj Bora, Tolga Birdal, Jan Eric Lenssen, Gerard Pons-Moll

Comments: Accepted by CVPR 2026. Project page: https://hynann.github.io/molingo/MoLingo.html

2512.10660 2026-03-27 cs.CV

Closing the Navigation Compliance Gap in End-to-end Autonomous Driving

Hanfeng Wu, Marlon Steiner, Michael Schmidt, Alvaro Marcos-Ramiro, Christoph Stiller

2512.10461 2026-03-27 cs.LG cs.AI math.OC

T-SKM-Net: Trainable Neural Network Framework for Linear Constraint Satisfaction via Sampling Kaczmarz-Motzkin Method

Haoyu Zhu, Yao Zhang, Jiashen Ren, Qingchun Hou

DOI: 10.1609/aaai.v40i17.38459

Abstract

Neural network constraint satisfaction is crucial for safety-critical applications such as power system optimization, robotic path planning, and autonomous driving. However, existing constraint satisfaction methods face efficiency-applicability trade-offs, with hard constraint methods suffering from either high computational complexity or restrictive assumptions on constraint structures. The Sampling Kaczmarz-Motzkin (SKM) method is a randomized iterative algorithm for solving large-scale linear inequality systems with favorable convergence properties, but its argmax operations introduce non-differentiability, posing challenges for neural network applications. This work proposes the Trainable Sampling Kaczmarz-Motzkin Network (T-SKM-Net) framework and, for the first time, systematically integrates SKM-type methods into neural network constraint satisfaction. The framework transforms mixed constraint problems into pure inequality problems through null space transformation, employs SKM for iterative solving, and maps solutions back to the original constraint space, efficiently handling both equality and inequality constraints. We provide a theoretical proof of post-processing effectiveness in expectation and end-to-end trainability guarantees based on unbiased gradient estimators, demonstrating that despite non-differentiable operations, the framework supports standard backpropagation. On the DCOPF case118 benchmark, our method achieves 4.27 ms/item GPU serial forward inference with a 0.0025% max optimality gap in post-processing mode and 5.25 ms/item with a 0.0008% max optimality gap in joint training mode, delivering over 25$\times$ speedup compared to the pandapower solver while maintaining zero constraint violations under the given tolerance.
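The SKM iteration that this abstract builds on can be sketched in a few lines for a system of inequalities A x <= b. This is a minimal pure-Python illustration of the classical method only, not T-SKM-Net: it omits the null space transformation, the equality constraints, and the trainable components. The function names, the default sample size, the step size, and the tolerance are assumptions chosen for the example.

```python
import random


def residual(row, bound, x):
    """Signed violation a_i . x - b_i (positive means the constraint is violated)."""
    return sum(a * xj for a, xj in zip(row, x)) - bound


def max_violation(A, b, x):
    """Largest positive violation over all constraints, or 0.0 if x is feasible."""
    return max(0.0, max(residual(row, bound, x) for row, bound in zip(A, b)))


def skm_solve(A, b, x0, sample_size=2, step=1.0, iters=1000, tol=1e-9, seed=0):
    """Sampling Kaczmarz-Motzkin iteration for the system A x <= b."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        # Stop once every constraint holds within the tolerance.
        if max_violation(A, b, x) <= tol:
            break
        # Sample a subset of rows, then take the most violated one (the argmax step).
        rows = rng.sample(range(len(A)), sample_size)
        i = max(rows, key=lambda r: residual(A[r], b[r], x))
        r_i = residual(A[i], b[i], x)
        if r_i > 0:
            # Move toward the violated half-space boundary along its normal a_i.
            norm_sq = sum(a * a for a in A[i])
            x = [xj - step * r_i / norm_sq * aj for xj, aj in zip(x, A[i])]
    return x


if __name__ == "__main__":
    # Constraints: x <= 1, y <= 1, x + y >= 1.
    A = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    b = [1.0, 1.0, -1.0]
    print(skm_solve(A, b, [5.0, 5.0]))
```

The `max(rows, key=...)` line is the argmax operation the abstract identifies as the source of non-differentiability: the selected row index changes discontinuously with `x`.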

2512.09270 2026-03-27 cs.CV

MoRel: Long-Range Flicker-Free 4D Motion Modeling via Anchor Relay-based Bidirectional Blending with Hierarchical Densification

Sangwoon Kwak, Weeyoung Kwon, Jun Young Jeong, Geonho Kim, Won-Sik Cheong, Jihyong Oh

Comments: CVPR 2026 (camera ready ver.). The first two authors contributed equally to this work (equal contribution). Please visit our project page at https://cmlab-korea.github.io/MoRel/

2512.08985 2026-03-27 cs.CV

Verifier Threshold: An Efficient Test-Time Scaling Approach for Image Generation

Vignesh Sundaresha, Akash Haridas, Vikram Appia, Lav R. Varshney

Comments: ICLR 2026 ReALM-Gen and DeLTa

2512.07885 2026-03-27 cs.LG cs.AI

ByteStorm: a multi-step data-driven approach for Tropical Cyclones detection and tracking

Davide Donno, Donatello Elia, Gabriele Accarino, Marco De Carlo, Enrico Scoccimarro, Silvio Gualdi

Comments: 26 pages, 17 figures

2512.05272 2026-03-27 cs.CV

Inferring Compositional 4D Scenes without Ever Seeing One

Ahmet Berke Gokmen, Ajad Chhatkuli, Luc Van Gool, Danda Pani Paudel

Comments: Project page: https://github.com/insait-institute/COM4D

2512.02787 2026-03-27 cs.RO cs.CV

Diagnose, Correct, and Learn from Manipulation Failures via Visual Symbols

Xianchao Zeng, Xinyu Zhou, Youcheng Li, Jiayou Shi, Tianle Li, Liangming Chen, Lei Ren, Yong-Lu Li

Comments: Accepted by CVPR 2026. Project Website: https://x1nyuzhou.github.io/vifailback.github.io/

2512.01906 2026-03-27 cs.LG

Delays in Spiking Neural Networks: A State Space Model Approach

Sanja Karilanova, Subhrakanti Dey, Ayça Özçelikkale

2512.00939 2026-03-27 cs.RO cs.AI

Constant-Time Motion Planning with Manipulation Behaviors

Nayesha Gandotra, Itamar Mishani, Maxim Likhachev

Comments: In submission

2511.22344 2026-03-27 cs.LG

Cleaning the Pool: Progressive Filtering of Unlabeled Pools in Deep Active Learning

Denis Huseljic, Marek Herde, Lukas Rauch, Paul Hahn, Bernhard Sick

Comments: Accepted at CVPR 2026

2511.20721 2026-03-27 cs.CV cs.AI cs.LG cs.NE

Foundry: Distilling 3D Foundation Models for the Edge

Guillaume Letellier, Siddharth Srivastava, Frédéric Jurie, Gaurav Sharma

Comments: Accepted at CVPR 2026

2511.18822 2026-03-27 cs.CV

DiP: Taming Diffusion Models in Pixel Space

Zhennan Chen, Junwei Zhu, Xu Chen, Jiangning Zhang, Xiaobin Hu, Hanzhen Zhao, Chengjie Wang, Jian Yang, Ying Tai

Comments: Accepted by CVPR 2026

2511.15186 2026-03-27 cs.CV

Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset

Geon Choi, Hangyul Yoon, Hyunju Shin, Hyunki Park, Sang Hoon Seo, Eunho Yang, Edward Choi

Comments: Camera-ready version for CVPR 2026

2511.07436 2026-03-27 cs.AI

Analysing Environmental Efficiency in AI for X-Ray Diagnosis

Liam Kearns

Comments: Accepted for publication in Journal of AI. The final published version is available at https://doi.org/10.61969/jai.1838517

DOI: 10.61969/jai.1838517
Journal ref: Journal of AI 10 (2026) 37-55

Abstract

The integration of AI tools into medical applications has aimed to improve the efficiency of diagnosis. The emergence of large language models (LLMs), such as ChatGPT and Claude, has expanded this integration even further despite a concern for their environmental impact. Because of LLM versatility and ease of use through APIs, these larger models are often utilised even though smaller, custom models can be used instead. In this paper, LLMs and small discriminative models are integrated into a Mendix application to detect Covid-19 in chest X-rays. These discriminative models are also used to provide knowledge bases for LLMs to improve accuracy. This provides a benchmark study of 14 different model configurations for comparison of diagnostic accuracy and environmental impact. The findings indicated that while smaller models reduced the carbon footprint of the application, the output was biased towards a positive diagnosis and the output probabilities were lacking confidence. Meanwhile, restricting LLMs to only give probabilistic output caused poor performance in both accuracy and carbon footprint, demonstrating the risk of using LLMs as a universal AI solution. While using the smaller LLM GPT-4.1-Nano reduced the carbon footprint by 94.2% compared to the larger models, this was still disproportionate to the discriminative models; the most efficient solution was the Covid-Net model. Although it had a larger carbon footprint than other small models, its carbon footprint was 99.9% less than when using GPT-4.5-Preview, whilst achieving an accuracy of 95.5%, the highest of all models examined. This paper contributes to knowledge by comparing generative and discriminative models in Covid-19 detection as well as highlighting the environmental risk of using generative tools for classification tasks.

2511.05878 2026-03-27 cs.LG cs.SE

FusionLog: Cross-System Log-based Anomaly Detection via Fusion of General and Proprietary Knowledge

Xinlong Zhao, Tong Jia, Minghua He, Xixuan Yang, Ying Li

Comments: 12 pages, 5 figures, and 2 tables

Abstract

Log-based anomaly detection is critical for ensuring the stability and reliability of web systems. One of the key problems in this task is the lack of sufficient labeled logs, which limits rapid deployment in new systems. Existing works usually leverage large-scale labeled logs from a mature web system and a small amount of labeled logs from a new system, using transfer learning to extract and generalize general knowledge across both domains. However, these methods focus solely on the transfer of general knowledge and neglect the disparity and potential mismatch between such knowledge and the proprietary knowledge of the target system, thus constraining performance. To address this limitation, we propose FusionLog, a novel zero-label cross-system log-based anomaly detection method that effectively achieves the fusion of general and proprietary knowledge, enabling cross-system generalization without any labeled target logs. Specifically, we first design a training-free router based on semantic similarity that dynamically partitions unlabeled target logs into 'general logs' and 'proprietary logs.' For general logs, FusionLog employs a small model based on system-agnostic representation meta-learning for direct training and inference, inheriting the general anomaly patterns shared between the source and target systems. For proprietary logs, we iteratively generate pseudo-labels and fine-tune the small model using multi-round collaborative knowledge distillation and fusion based on a large language model (LLM) and a small model (SM) to enhance its capability to recognize anomaly patterns specific to the target system. Experimental results on three public log datasets from different systems show that FusionLog achieves over 90% F1-score under a fully zero-label setting, significantly outperforming state-of-the-art cross-system log-based anomaly detection methods.
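The routing step described in this abstract, which splits unlabeled target logs by their similarity to source-system logs, might look roughly like the sketch below. Everything here is a hypothetical stand-in rather than FusionLog's actual router: the bag-of-words `embed` function replaces whatever semantic encoder the paper uses, and the fixed `threshold` of 0.5 is an assumption (the abstract describes the partition as dynamic).

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic encoder: bag-of-words token counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def route(target_logs, source_logs, threshold=0.5):
    """Split target logs into (general, proprietary) without any training.

    A target log counts as 'general' when its best similarity to any
    source log reaches the (hypothetical) threshold.
    """
    source_vecs = [embed(s) for s in source_logs]
    general, proprietary = [], []
    for log in target_logs:
        vec = embed(log)
        best = max((cosine(vec, s) for s in source_vecs), default=0.0)
        (general if best >= threshold else proprietary).append(log)
    return general, proprietary


if __name__ == "__main__":
    source = ["connection timed out to host", "disk write failed on node"]
    target = ["connection timed out to server", "quantum flux capacitor desync"]
    print(route(target, source))
```

In this sketch the two output lists would then feed the two branches the abstract describes: the general logs go to the small model directly, and the proprietary logs go to the pseudo-labeling and distillation loop.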

2510.18840 2026-03-27 cs.CV cs.CL

See the Text: From Tokenization to Visual Reading

Ling Xing, Rui Yan, Alex Jinpeng Wang, Zechao Li, Jinhui Tang