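arXivDaily daily academic digest: syncs the complete arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.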

2604.06366 2026-04-09 cs.LG stat.ML

Stochastic Gradient Descent in the Saddle-to-Saddle Regime of Deep Linear Networks

Guillaume Corlouer, Avi Semler, Alexander Strang, Alexander Gietelink Oldenziel

Abstract

Deep linear networks (DLNs) are used as an analytically tractable model of the training dynamics of deep neural networks. While gradient descent in DLNs is known to exhibit saddle-to-saddle dynamics, the impact of stochastic gradient descent (SGD) noise on this regime remains poorly understood. We investigate the dynamics of SGD during training of DLNs in the saddle-to-saddle regime. We model the training dynamics as stochastic Langevin dynamics with anisotropic, state-dependent noise. Under the assumption of aligned and balanced weights, we derive an exact decomposition of the dynamics into a system of one-dimensional per-mode stochastic differential equations. This establishes that diffusion along a mode peaks before the corresponding feature is completely learned. We also derive the stationary distribution of SGD for each mode: in the absence of label noise, its marginal distribution along specific features coincides with the stationary distribution of gradient flow, while in the presence of label noise it approximates a Boltzmann distribution. Finally, we confirm experimentally that the theoretical results hold qualitatively even without aligned or balanced weights. These results establish that SGD noise encodes information about the progression of feature learning but does not fundamentally alter the saddle-to-saddle dynamics.
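
The Boltzmann statement in this abstract builds on a standard property of Langevin dynamics: for a potential U and a constant temperature T, the process dx = -U'(x) dt + sqrt(2T) dW has a stationary density proportional to exp(-U(x)/T). The sketch below checks that property numerically for a single mode with the toy potential U(x) = x²/2, using the Euler-Maruyama scheme. It is an illustration under simplifying assumptions, not the paper's model: the noise here is isotropic and state-independent, whereas the per-mode SDEs derived in the paper have state-dependent noise, and `simulate_langevin` is a hypothetical helper.

```python
import math
import random


def simulate_langevin(grad_u, temperature, step, n_steps, n_walkers, seed=0):
    """Euler-Maruyama simulation of the 1D Langevin SDE dx = -U'(x) dt + sqrt(2 T) dW.

    All walkers start at x = 0 and evolve independently; their final positions are returned.
    Assumption: constant temperature T (state-independent noise), unlike the paper's SDEs.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * temperature * step)  # std of the increment sqrt(2T) * dW
    xs = [0.0] * n_walkers
    for _ in range(n_steps):
        xs = [x - grad_u(x) * step + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs


# Quadratic potential U(x) = x^2 / 2, so U'(x) = x and the Boltzmann density
# exp(-U(x) / T) is a Gaussian with mean 0 and variance T.
T = 0.5
samples = simulate_langevin(lambda x: x, temperature=T, step=0.01, n_steps=600, n_walkers=4000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"empirical mean {mean:.3f}, variance {var:.3f} (Boltzmann prediction: 0, {T})")
```

With a finite step size the discretized chain's stationary variance is slightly above T (about 2T/(2 - step) for this potential), so agreement is approximate rather than exact.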

2604.06365 2026-04-09 cs.CL cs.AI

A Severity-Based Curriculum Learning Strategy for Arabic Medical Text Generation

Ahmed Alansary, Molham Mohamed, Ali Hamdi

Comments 10 pages, 2 figures, 2 tables, ICTIS2026

2604.06356 2026-04-09 cs.CL cs.AI

In-Context Learning in Speech Language Models: Analyzing the Role of Acoustic Features, Linguistic Structure, and Induction Heads

Charlotte Pouw, Hosein Mohebbi, Afra Alishahi, Willem Zuidema

Comments Submitted to COLM 2026

2604.06352 2026-04-09 cs.CV cs.AI cs.MM eess.IV

DietDelta: A Vision-Language Approach for Dietary Assessment via Before-and-After Images

Gautham Vinod, Siddeshwar Raghavan, Bruce Coburn, Fengqing Zhu

2604.06349 2026-04-09 cs.LG cs.AI cs.CV

Bi-Level Optimization for Single Domain Generalization

Marzi Heidari, Hanping Zhang, Hao Yan, Yuhong Guo

Comments CVPR Findings Track, 2026

2604.06347 2026-04-09 cs.CV

Evidence-Based Actor-Verifier Reasoning for Echocardiographic Agents

Peng Huang, Yiming Wang, Yineng Chen, Liangqiao Gui, Hui Guo, Bo Peng, Shu Hu, Xi Wu, Tsao Connie, Hongtu Zhu, Balakrishnan Prabhakaran, Xin Wang

Comments CVPRW 2026 (AIMS)

2604.06346 2026-04-09 cs.CL cs.AI

Severity-Aware Weighted Loss for Arabic Medical Text Generation

Ahmed Alansary, Molham Mohamed, Ali Hamdi

Comments 10 pages, 1 figure, 2 tables, ICTIS2026

2604.06341 2026-04-09 cs.RO

Occlusion Handling by Pushing for Enhanced Fruit Detection

Ege Gursoy, Dana Kulić, Andrea Cherubini

DOI: 10.1109/IROS58592.2024.10802174
Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

Abstract

In agricultural robotics, effective observation and localization of fruits is challenging because of occlusions caused by other parts of the tree, such as branches and leaves. These occlusions can lead to false fruit localization or prevent the robot from picking the fruit. The objective of this work is to push away the branches that occlude a fruit in order to increase its visibility. Our setup consists of an RGB-D camera and a robot arm. First, we detect the occluded fruit in the RGB image and estimate its occluded part in the depth space via a deep learning generative model. The direction to push to clear the occlusion is determined using classic image processing techniques. We then introduce a 3D extension of the 2D Hough transform to detect straight line segments in the point cloud. This extension helps detect tree branches and identify the one mainly responsible for the occlusion. Finally, we clear the occlusion by pushing the branch with the robot arm. Our method combines deep learning for fruit appearance estimation, classic image processing for push direction determination, and a 3D Hough transform for branch detection. We validate our perception methods on real data under different lighting conditions and with various types of fruit (apple, lemon, and orange), achieving improved visibility and successful occlusion clearance. We demonstrate the practical application of our approach through a real robot branch-pushing demonstration.
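
As a rough illustration of how a Hough-style vote can find straight lines in a point cloud, the sketch below parameterizes a 3D line by a direction d and the point where the line crosses the plane orthogonal to d. For each candidate direction, every point votes for the grid cell that its projection onto that plane falls into, and the fullest cell gives the line. This is a generic voting scheme assumed for illustration, not necessarily the authors' 3D extension; it returns infinite lines rather than segments, and the function names, candidate directions and cell size are hypothetical.

```python
import math
from collections import Counter


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normalize(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def plane_basis(d):
    """Return unit vectors u, v spanning the plane orthogonal to the unit direction d."""
    helper = (0.0, 0.0, 1.0) if abs(d[2]) < 0.9 else (1.0, 0.0, 0.0)  # avoid a helper parallel to d
    u = normalize(cross(d, helper))
    v = cross(d, u)  # unit length because d and u are orthonormal
    return u, v


def hough_line_3d(points, directions, cell):
    """Vote over (direction, projected cell) pairs; return (direction, anchor point, votes) of the best line."""
    best = None
    for d in directions:
        d = normalize(d)
        u, v = plane_basis(d)
        # Points on one line parallel to d share a single projection onto the (u, v) plane.
        votes = Counter(
            (math.floor(dot(p, u) / cell), math.floor(dot(p, v) / cell)) for p in points
        )
        (i, j), count = votes.most_common(1)[0]
        if best is None or count > best[2]:
            a, b = (i + 0.5) * cell, (j + 0.5) * cell  # center of the winning cell
            anchor = tuple(a * u[k] + b * v[k] for k in range(3))
            best = (d, anchor, count)
    return best


# A hypothetical branch along the x axis through (0, 1, 2), plus a few scattered points.
branch = [(float(t), 1.0, 2.0) for t in range(20)]
clutter = [(5.0, -3.0, 7.0), (-4.0, 6.0, 1.0), (2.0, 8.0, -5.0)]
candidates = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
direction, anchor, n_votes = hough_line_3d(branch + clutter, candidates, cell=0.5)
print(direction, anchor, n_votes)
```

In practice the candidate directions would come from a dense sampling of the hemisphere, and segment endpoints could be recovered afterwards by projecting the points of the winning cell onto d.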

2604.06339 2026-04-09 cs.CV

Evolution of Video Generative Foundations

Teng Hu, Jiangning Zhang, Hongrui Huang, Ran Yi, Zihan Su, Jieyu Weng, Zhucun Xue, Lizhuang Ma, Ming-Hsuan Yang, Dacheng Tao

2604.06336 2026-04-09 cs.LG cs.AI

BiScale-GTR: Fragment-Aware Graph Transformers for Multi-Scale Molecular Representation Learning

Yi Yang, Ovidiu Daescu

2604.06332 2026-04-09 cs.CV cs.LG

Telescope: Learnable Hyperbolic Foveation for Ultra-Long-Range Object Detection

Parker Ewen, Dmitriy Rivkin, Mario Bijelic, Felix Heide

Comments Project website: https://light.princeton.edu/telescope

2604.06330 2026-04-09 cs.CL

STDec: Spatio-Temporal Stability Guided Decoding for dLLMs

Yuzhe Chen, Jiale Cao, Xuyang Liu, Jin Xie, Aiping Yang, Yanwei Pang

Comments Homepage: https://yzchen02.github.io/STDec

2604.06327 2026-04-09 cs.SD cs.AI

A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech

Jia-Hong Huang, Seulgi Kim, Yi Chieh Liu, Yixian Shen, Hongyi Zhu, Prayag Tiwari, Stevan Rudinac, Evangelos Kanoulas

Comments The paper has been accepted by the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026

2604.06298 2026-04-09 cs.LG

Limits of Difficulty Scaling: Hard Samples Yield Diminishing Returns in GRPO-Tuned SLMs

Suraj Yadav, Siddharth Yadav, Parth Goyal

Comments Accepted at ICLR Workshop 2026 ICBINB

2604.06287 2026-04-09 cs.LG cs.NA math.NA physics.comp-ph physics.flu-dyn

Asymptotic-Preserving Neural Networks for Viscoelastic Parameter Identification in Multiscale Blood Flow Modeling

Giulia Bertaglia, Raffaella Fiamma Cabini

2604.06277 2026-04-09 cs.AI cs.CL cs.LG

Weakly Supervised Distillation of Hallucination Signals into Transformer Representations

Shoaib Sadiq Salehmohamed, Jinal Prashant Thakkar, Hansika Aredla, Shaik Mohammed Omar, Shalmali Ayachit

Comments 20 pages, 6 figures, 6 tables. Introduces a 15k-sample representation-level hallucination dataset with full transformer hidden states and multi-signal weak supervision. Evaluates 5 probing architectures and demonstrates internal hallucination detection without external inference-time signals. Includes held-out test evaluation and deployment benchmarks

Abstract

Existing hallucination detection methods for large language models (LLMs) rely on external verification at inference time, requiring gold answers, retrieval systems, or auxiliary judge models. We ask whether this external supervision can instead be distilled into the model's own representations during training, enabling hallucination detection from internal activations alone at inference time. We introduce a weak supervision framework that combines three complementary grounding signals (substring matching, sentence embedding similarity, and an LLM-as-a-judge verdict) to label generated responses as grounded or hallucinated without human annotation. Using this framework, we construct a 15,000-sample dataset from SQuAD v2 (10,500 train/development samples and a separate 5,000-sample test set), where each example pairs a LLaMA-2-7B-generated answer with its full per-layer hidden states and structured hallucination labels. We then train five probing classifiers, ProbeMLP (M0), LayerWiseMLP (M1), CrossLayerTransformer (M2), HierarchicalTransformer (M3), and CrossLayerAttentionTransformerV2 (M4), directly on these hidden states, treating external grounding signals as training-time supervision only. Our central hypothesis is that hallucination detection signals can be distilled into transformer representations, enabling internal detection without any external verification at inference time. The results support this hypothesis. Transformer-based probes achieve the strongest discrimination, with M2 performing best on 5-fold average AUC/F1 and M3 performing best on both single-fold validation and held-out test evaluation. We also benchmark inference efficiency: probe latency ranges from 0.15 to 5.62 ms (batched) and from 1.55 to 6.66 ms (single sample), while end-to-end generation plus probe throughput remains approximately 0.231 queries per second, indicating negligible practical overhead.
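
The abstract names the three grounding signals but not how they are combined into one label. The sketch below assumes a simple 2-of-3 majority vote, a cosine-similarity threshold of 0.8 on precomputed sentence embeddings, and a boolean judge verdict supplied by the caller; the function names, the threshold and the voting rule are all hypothetical choices for illustration, not the authors' pipeline.

```python
import math


def substring_match(answer, gold_answers):
    """Signal 1: some non-empty gold answer occurs (case-insensitively) in the generated answer."""
    text = answer.lower()
    return any(g.strip().lower() in text for g in gold_answers if g.strip())


def cosine(a, b):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def weak_label(answer, gold_answers, answer_emb, gold_emb, judge_grounded, sim_threshold=0.8):
    """Combine three grounding signals into one weak label by majority vote (an assumed rule)."""
    votes = [
        substring_match(answer, gold_answers),
        cosine(answer_emb, gold_emb) >= sim_threshold,  # signal 2: embedding similarity
        bool(judge_grounded),                           # signal 3: LLM-as-a-judge verdict
    ]
    return "grounded" if sum(votes) >= 2 else "hallucinated"


print(weak_label("The capital is Paris.", ["Paris"], [1.0, 0.0], [0.9, 0.1], True))  # grounded
print(weak_label("It is Lyon.", ["Paris"], [1.0, 0.0], [0.0, 1.0], False))  # hallucinated
```

A majority vote is only one option; a weighted combination or a rule that trusts the judge on unanswerable SQuAD v2 questions (which have no gold answer to match) would be equally plausible.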

2604.06268 2026-04-09 cs.LG

RAGEN-2: Reasoning Collapse in Agentic RL

Zihan Wang, Chi Gui, Xing Jin, Qineng Wang, Licheng Liu, Kangrui Wang, Shiqi Chen, Linjie Li, Zhengyuan Yang, Pingyue Zhang, Yiping Lu, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li

2604.06267 2026-04-09 cs.LG cs.AI

MO-RiskVAE: A Multi-Omics Variational Autoencoder for Survival Risk Modeling in Multiple Myeloma

Zixuan Chen, Heng Zhang, YuPeng Qin, WenPeng Xing, Qiang Wang, Da Wang, Changting Lin, Meng Han

2604.06265 2026-04-09 cs.LG cond-mat.stat-mech quant-ph

SMT-AD: a scalable quantum-inspired anomaly detection approach

Apimuk Sornsaeng, Si Min Chan, Wenxuan Zhang, Swee Liang Wong, Joshua Lim, Dario Poletti

Comments 11 pages, 5 figures

2604.06260 2026-04-09 cs.LG cs.AI

$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models

Ahsan Bilal, Muhammad Ahmed Mohsin, Muhammad Umer, Asad Aali, Muhammad Usman Khanzada, Muhammad Usman Rafique, Zihao He, Emily Fox, Dean F. Hougen

Comments Submitted to COLM 2026

2604.06256 2026-04-09 cs.LG cs.AI

Spectral Edge Dynamics Reveal Functional Modes of Learning

Yongzhong Xu

Comments 17 pages, 1 figure

2604.06253 2026-04-09 cs.LG cs.AI cs.PL

FLeX: Fourier-based Low-rank EXpansion for multilingual transfer

Gaurav Narasimhan

Comments 19 pages, 25 figures, Stanford CS224N Custom Project

2604.06251 2026-04-09 cs.AI cs.LG stat.AP

Toward Reducing Unproductive Container Moves: Predicting Service Requirements and Dwell Times

Elena Villalobos, Adolfo De Unánue T., Fernanda Sobrino, David Aké, Stephany Cisneros, Jorge Lecona, Alejandra Matadamaz

Comments Preprint, 20 pages, 9 figures, 5 tables (including appendices)

2604.06250 2026-04-09 cs.CV cs.AI

DISSECT: Diagnosing Where Vision Ends and Language Priors Begin in Scientific VLMs

Dikshant Kukreja, Kshitij Sah, Karan Goyal, Mukesh Mohania, Vikram Goyal

2604.06246 2026-04-09 cs.CV

No-reference based automatic parameter optimization for iterative reconstruction using a novel search space aware crow search algorithm

Poorya MohammadiNasab, Ander Biguri, Philipp Steininger, Peter Keuschnigg, Lukas Lamminger, Agnieszka Lach, S M Ragib Shahriar Islam, Anna Breger, Clemens Karner, Carola-Bibiane Schönlieb, Wolfgang Birkfellner, Sepideh Hatamikia

Abstract

The ability of iterative reconstruction techniques to reduce radiation exposure by using fewer projections has attracted significant attention. However, these methods typically require precise tuning of several hyperparameters, which can have a major impact on reconstruction quality. Manually setting these parameters is time-consuming and increases the workload for human operators. In this paper, we introduce a novel, fully automatic parameter optimization framework that can be applied to a wide range of cone-beam computed tomography (CBCT) iterative reconstruction algorithms to determine optimal parameters without requiring a reference reconstruction. The proposed method incorporates a modified crow search algorithm (CSA) featuring a superior set-dependent local search mechanism, a search-space-aware global search strategy, and an objective-driven balance between local and global search. Additionally, to ensure an effective initial population, we propose a chaotic diagonal linear uniform initialization scheme that accelerates algorithm convergence. The performance of the proposed framework was evaluated on three imaging machines and four real datasets, as well as on three different iterative reconstruction methods with the highest number of tunable parameters, representing the most challenging scenario. The results indicate that the proposed method can outperform both manual settings and CSA, with a 4.19% improvement in average fitness and improvements of 4.89% and 3.82% on CHILL@UK and RPI_AXIS, respectively, two benchmark no-reference learning-based quality metrics. In addition, the qualitative results show that the proposed method keeps fine details sharp. The overall performance of the proposed framework across different comparison scenarios demonstrates its effectiveness and robustness in all cases.
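
The abstract compares against the crow search algorithm (CSA). For reference, the sketch below implements a common baseline form of CSA with flight length `fl` and awareness probability `ap`: each crow follows a randomly chosen crow toward that crow's remembered best position, unless the followed crow "notices", in which case the follower jumps to a random point. It does not include the paper's modifications (set-dependent local search, search-space-aware global search, chaotic diagonal linear uniform initialization), and the sphere function stands in, as an assumption, for the no-reference image-quality fitness evaluated on CBCT reconstructions. The function name and default parameter values are hypothetical.

```python
import random


def crow_search(f, bounds, n_crows=20, n_iter=200, fl=2.0, ap=0.1, seed=0):
    """Minimize f over a box with a baseline crow search algorithm (not the paper's modified variant)."""
    rng = random.Random(seed)

    def random_position():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def clip(x):
        return [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x, bounds)]

    positions = [random_position() for _ in range(n_crows)]
    memory = [p[:] for p in positions]  # best position each crow has found so far
    memory_fit = [f(m) for m in memory]
    history = [min(memory_fit)]

    for _ in range(n_iter):
        for i in range(n_crows):
            j = rng.randrange(n_crows)  # crow i follows a randomly chosen crow j
            if rng.random() >= ap:
                # j is unaware: i moves toward (and possibly past) j's remembered position
                r = rng.random()
                new = [x + r * fl * (m - x) for x, m in zip(positions[i], memory[j])]
            else:
                # j notices it is followed: i is sent to a random position
                new = random_position()
            positions[i] = clip(new)
            fit = f(positions[i])
            if fit < memory_fit[i]:
                memory[i], memory_fit[i] = positions[i][:], fit
        history.append(min(memory_fit))

    best = min(range(n_crows), key=memory_fit.__getitem__)
    return memory[best], memory_fit[best], history


def sphere(x):
    return sum(v * v for v in x)


best_x, best_f, history = crow_search(sphere, [(-5.0, 5.0)] * 2, seed=1)
print(best_x, best_f)
```

Because a crow's memory only changes when a strictly better position is found, the best fitness recorded in `history` never increases.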

2604.06245 2026-04-09 cs.CV

CraterBench-R: Instance-Level Crater Retrieval for Planetary Scale

Jichao Fang, Lei Zhang, Michael Phillips, Wei Luo

Comments Accepted at the EarthVision 2026 Workshop at CVPR 2026

Abstract

Impact craters are a cornerstone of planetary surface analysis. However, while most deep learning pipelines treat craters solely as a detection problem, critical scientific workflows such as catalog deduplication, cross-observation matching, and morphological analog discovery are inherently retrieval tasks. To address this, we formulate crater analysis as an instance-level image retrieval problem and introduce CraterBench-R, a curated benchmark featuring about 25,000 crater identities with multi-scale gallery views and manually verified queries spanning diverse scales and contexts. Our baseline evaluations across various architectures reveal that self-supervised Vision Transformers (ViTs), particularly those with in-domain pretraining, dominate the task, outperforming generic models with significantly more parameters. Furthermore, we demonstrate that retaining multiple ViT patch tokens for late-interaction matching dramatically improves accuracy over standard single-vector pooling. However, storing all tokens per image is operationally inefficient at a planetary scale. To close this efficiency gap, we propose instance-token aggregation, a scalable, training-free method that selects K seed tokens, assigns the remaining tokens to these seeds via cosine similarity, and aggregates each cluster into a single representative token. This approach yields substantial gains: at K=16, aggregation improves mAP by 17.9 points over raw token selection, and at K=64, it matches the accuracy of using all 196 tokens with significantly less storage. Finally, we demonstrate that a practical two-stage pipeline, with single-vector shortlisting followed by instance-token reranking, recovers 89-94% of the full late-interaction accuracy while searching only a small candidate set. The benchmark is publicly available at hf.co/datasets/jfang/CraterBench-R.
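
A minimal sketch of the two ideas in this abstract, under stated assumptions: `maxsim` scores an image pair by late interaction (each query token takes its best cosine match among the gallery tokens, and the matches are summed), and `aggregate_tokens` compresses a token set to K tokens by choosing seeds, assigning every other token to its most similar seed, and averaging each cluster. The abstract does not say how the K seeds are selected or how clusters are pooled; farthest-point seeding from the first token and L2-normalized mean pooling are assumptions made here, and the function names are hypothetical.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def cos(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def aggregate_tokens(tokens, k):
    """Compress a list of token vectors into min(k, len(tokens)) aggregated tokens."""
    k = min(k, len(tokens))
    if k <= 0:
        return []
    seeds = [0]  # assumption: seeding starts from the first token
    while len(seeds) < k:
        # Farthest-point step: take the token least similar to its closest existing seed.
        candidates = [i for i in range(len(tokens)) if i not in seeds]
        seeds.append(min(candidates, key=lambda i: max(cos(tokens[i], tokens[s]) for s in seeds)))
    clusters = {s: [] for s in seeds}
    for i, t in enumerate(tokens):
        # Seeds keep themselves; every other token joins its most similar seed (cosine similarity).
        owner = i if i in clusters else max(seeds, key=lambda s: cos(t, tokens[s]))
        clusters[owner].append(t)
    # One representative per cluster: the L2-normalized mean of its members.
    return [normalize([sum(col) / len(clusters[s]) for col in zip(*clusters[s])]) for s in seeds]


def maxsim(query_tokens, gallery_tokens):
    """Late-interaction score: sum over query tokens of the best cosine match in the gallery."""
    return sum(max(cos(q, g) for g in gallery_tokens) for q in query_tokens)


# Hypothetical 3D "patch tokens" forming two obvious groups.
patches = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, 0.0, 0.0],
           [0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.0]]
compact = aggregate_tokens(patches, k=2)
print(len(compact), round(maxsim(patches, compact), 3))
```

Storing K aggregated tokens instead of every patch token trades a little matching accuracy for a K-fold bound on per-image storage, which is the efficiency gap the abstract targets.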

2604.06233 2026-04-09 cs.AI

Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules

Cameron Pattison, Lorenzo Manuali, Seth Lazar

Comments 9 pages body text, 38 pages total, 4 figures

2604.06228 2026-04-09 cs.LG cs.AI cs.CL cs.DS cs.IR cs.IT math.IT

Probabilistic Language Tries: A Unified Framework for Compression, Decision Policies, and Execution Reuse

Gregory Magarshak

Comments 24 pages, 2 figures

2604.06227 2026-04-09 cs.LG econ.EM

A Benchmark of Classical and Deep Learning Models for Agricultural Commodity Price Forecasting on A Novel Bangladeshi Market Price Dataset

Tashreef Muhammad, Tahsin Ahmed, Meherun Farzana, Md. Mahmudul Hasan, Abrar Eyasir, Md. Emon Khan, Mahafuzul Islam Shawon, Ferdous Mondol, Mahmudul Hasan, Muhammad Ibrahim

Comments 26 pages, 22 figures, 7 tables

2604.06216 2026-04-09 cs.CL cs.AI

Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses

Khizar Hussain, Bradley A. Malin, Zhijun Yin, Susannah Leigh Rose, Murat Kantarcioglu