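arXivDaily daily academic digest: syncs the full arXiv dataset with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical & systems engineering, and other fields.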

2604.01834 2026-04-03 cs.CV

Ranking-Guided Semi-Supervised Domain Adaptation for Severity Classification

Shota Harada, Ryoma Bise, Kiyohito Tanaka, Seiichi Uchida

Abstract

Semi-supervised domain adaptation leverages a few labeled and many unlabeled target samples, making it promising for addressing domain shifts in medical image analysis. However, existing methods struggle with severity classification due to unclear class boundaries. Severity classification involves naturally ordered class labels, complicating adaptation. We propose a novel method that aligns source and target domains using rank scores learned via ranking with class order. Specifically, Cross-Domain Ranking ranks sample pairs across domains, while Continuous Distribution Alignment aligns rank score distributions. Experiments on ulcerative colitis and diabetic retinopathy classification validate the effectiveness of our approach, demonstrating successful alignment of class-specific rank score distributions.
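
The abstract names Cross-Domain Ranking and Continuous Distribution Alignment but does not give their formulas. As a rough illustration only, the sketch below pairs a generic pairwise hinge ranking loss over source/target pairs with different severity labels with a 1-D Wasserstein-1 distance between rank-score samples. Both are stand-ins chosen for illustration (assumptions), not the authors' losses, and the names `cross_domain_ranking_loss`, `wasserstein_1d`, and `margin` are hypothetical.

```python
from itertools import product


def cross_domain_ranking_loss(source, target, margin=1.0):
    """Pairwise hinge loss over (source, target) pairs with different severities.

    source, target: lists of (rank_score, severity_label) tuples.
    A pair is penalised when the more severe sample does not score
    at least `margin` higher than the less severe one.
    (Generic stand-in, not the paper's exact formulation.)
    """
    total, pairs = 0.0, 0
    for (s_a, y_a), (s_b, y_b) in product(source, target):
        if y_a == y_b:
            continue  # same severity: no ordering constraint
        sign = 1.0 if y_a > y_b else -1.0
        total += max(0.0, margin - sign * (s_a - s_b))
        pairs += 1
    return total / pairs if pairs else 0.0


def wasserstein_1d(scores_a, scores_b):
    """W1 distance between two equal-size 1-D empirical distributions."""
    if len(scores_a) != len(scores_b):
        raise ValueError("this simple form assumes equal sample sizes")
    a, b = sorted(scores_a), sorted(scores_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical usage: one severe source sample, one mild target sample.
print(cross_domain_ranking_loss([(0.2, 3)], [(0.9, 1)]))
print(wasserstein_1d([0.1, 0.4, 0.8], [0.2, 0.5, 0.9]))
```

For class-wise alignment one might call `wasserstein_1d` separately on the source and target rank scores of each severity grade; the equal-sample-size restriction only keeps the sketch short.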

2604.01830 2026-04-03 cs.LG cs.SY eess.SY

Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids

Pantelis Dogoulis, Maxime Cordy

2604.01826 2026-04-03 cs.CV

SafeRoPE: Risk-specific Head-wise Embedding Rotation for Safe Generation in Rectified Flow Transformers

Xiang Yang, Feifei Li, Mi Zhang, Geng Hong, Xiaoyu You, Min Yang

Comments CVPR26

Abstract

Recent Text-to-Image (T2I) models based on rectified-flow transformers (e.g., SD3, FLUX) achieve high generative fidelity but remain vulnerable to unsafe semantics, especially when triggered by multi-token interactions. Existing mitigation methods largely rely on fine-tuning or attention modulation for concept unlearning; however, their expensive computational overhead and design tailored to U-Net-based denoisers hinder direct adaptation to transformer-based diffusion models (e.g., MMDiT). In this paper, we conduct an in-depth analysis of the attention mechanism in MMDiT and find that unsafe semantics concentrate within interpretable, low-dimensional subspaces at head level, where a finite set of safety-critical heads is responsible for unsafe feature extraction. We further observe that perturbing the Rotary Positional Embedding (RoPE) applied to the query and key vectors can effectively modify specific concepts in the generated images. Motivated by these insights, we propose SafeRoPE, a lightweight and fine-grained safe generation framework for MMDiT. Specifically, SafeRoPE first constructs head-wise unsafe subspaces by decomposing unsafe embeddings within safety-critical heads, and computes a Latent Risk Score (LRS) for each input vector via projection onto these subspaces. We then introduce head-wise RoPE perturbations that can suppress unsafe semantics without degrading benign content or image quality. SafeRoPE combines both head-wise LRS and RoPE perturbations to perform risk-specific head-wise rotation on query and key vector embeddings, enabling precise suppression of unsafe outputs while maintaining generation fidelity. Extensive experiments demonstrate that SafeRoPE achieves SOTA performance in balancing effective harmful content mitigation and utility preservation for safe generation of MMDiT. Code is available at https://github.com/deng12yx/SafeRoPE.
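
To make the projection-based risk score and the RoPE rotation concrete, here is a minimal pure-Python sketch under stated assumptions: the unsafe subspace is taken as the span of a few example vectors orthonormalised by Gram-Schmidt (the paper decomposes unsafe embeddings, possibly differently), the score is the fraction of a vector's squared norm lying inside that subspace, and the rotation is standard RoPE on consecutive coordinate pairs with an optional extra angle. All function names and the `extra_angle` hook are hypothetical, not SafeRoPE's code.

```python
import math


def orthonormal_basis(vectors, eps=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coef = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def latent_risk_score(x, basis):
    """Fraction of x's squared norm inside the (assumed) unsafe subspace, in [0, 1]."""
    sq = sum(xi * xi for xi in x)
    if sq == 0:
        return 0.0
    # With an orthonormal basis, ||P x||^2 is the sum of squared coordinates.
    proj_sq = sum(sum(xi * bi for xi, bi in zip(x, b)) ** 2 for b in basis)
    return proj_sq / sq


def rope_rotate(x, pos, base=10000.0, extra_angle=0.0):
    """Standard RoPE on pairs (x[2k], x[2k+1]) plus a hypothetical extra angle."""
    d = len(x)
    out = list(x)
    for i in range(0, d - 1, 2):
        theta = pos * base ** (-i / d) + extra_angle  # base^(-2k/d) with i = 2k
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


basis = orthonormal_basis([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
q = [1.0, 0.0, 1.0, 0.0]
print(latent_risk_score(q, basis))
print(rope_rotate(q, pos=3, extra_angle=0.1 * latent_risk_score(q, basis)))
```

A risk-scaled perturbation could be written as `rope_rotate(q, pos, extra_angle=delta * latent_risk_score(q, basis))`, where `delta` is a hypothetical per-head strength. Because each pair is rotated rigidly, the perturbation changes the vector's direction but not its norm.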

2604.01791 2026-04-03 cs.CV

PTC-Depth: Pose-Refined Monocular Depth Estimation with Temporal Consistency

Leezy Han, Seunggyu Kim, Dongseok Shim, Hyeonbeom Lee

Comments Accepted at CVPR 2026

2604.01787 2026-04-03 cs.CL

DEFT: Distribution-guided Efficient Fine-Tuning for Human Alignment

Liang Zhu, Feiteng Fang, Yuelin Bai, Longze Chen, Zhexiang Zhang, Minghuan Tan, Min Yang

2604.01779 2026-04-03 cs.CL

Taming CATS: Controllable Automatic Text Simplification through Instruction Fine-Tuning with Control Tokens

Hanna Hubarava, Yingqiang Gao

2604.01777 2026-04-03 cs.CV

GardenDesigner: Encoding Aesthetic Principles into Jiangnan Garden Construction via a Chain of Agents

Mengtian Li, Fan Yang, Ruixue Xiong, Yiyan Fan, Zhifeng Xie, Zeyu Wang

Comments CVPR 2026, Project page: https://monad-cube.github.io/GardenDesigner

2604.01776 2026-04-03 cs.RO

Preferential Bayesian Optimization with Crash Feedback

Johanna Menn, David Stenger, Sebastian Trimpe

2604.01775 2026-04-03 cs.LG

Bridging Deep Learning and Integer Linear Programming: A Predictive-to-Prescriptive Framework for Supply Chain Analytics

Khai Banh Nghiep, Duc Nguyen Minh, Lan Hoang Thi

Comments 12 pages, 4 figures, 4 tables

2604.01769 2026-04-03 cs.LG

Dual-Attention Based 3D Channel Estimation

Xiangzhao Qin, Sha Hu

Comments 5 pages, 6 figures

2604.01766 2026-04-03 cs.CV cs.AI

FSKD: Monocular Forest Structure Inference via LiDAR-to-RGBI Knowledge Distillation

Taimur Khan, Hannes Feilhauer, Muhammad Jazib Zafar

Comments Paper in review

2604.01765 2026-04-03 cs.CV cs.AI cs.RO

DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning

Yang Zhou, Xiaofeng Wang, Hao Shao, Letian Wang, Guosheng Zhao, Jiangnan Shao, Jiagang Zhu, Tingdong Yu, Zheng Zhu, Guan Huang, Steven L. Waslander

Comments 11 pages, 4 figures; Project Website: https://drivedreamer-policy.github.io/

2604.01764 2026-04-03 cs.CV

Hidden Meanings in Plain Sight: RebusBench for Evaluating Cognitive Visual Reasoning

Seyed Amir Kasaei, Arash Marioriyad, Mahbod Khaleti, MohammadAmin Fazli, Mahdieh Soleymani Baghshah, Mohammad Hossein Rohban

Comments Accepted at ICLR 2026 Workshop: From Human Cognition to AI Reasoning (HCAIR)

2604.01763 2026-04-03 cs.CV

Cosine-Normalized Attention for Hyperspectral Image Classification

Muhammad Ahmad, Manuel Mazzara

2604.01762 2026-04-03 cs.LG cs.AI cs.CL cs.DC

FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models

Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang

Comments The first two authors contributed equally to this work; listing order is random

2604.01761 2026-04-03 cs.CV

Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion

Edoardo A. Dominici, Thomas Deixelberger, Konstantinos Vardis, Markus Steinberger

Comments project page https://dedoardo.github.io/projects/control-dino/

2604.01756 2026-04-03 cs.RO

Realistic Lip Motion Generation Based on 3D Dynamic Viseme and Coarticulation Modeling for Human-Robot Interaction

Sheng Li, Jingcheng Huang, Min Li

Comments 8 pages, 7 figures

2604.01754 2026-04-03 cs.CL cs.AI cs.LG

LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof Sketches

Linyang He, Qiyao Yu, Hanze Dong, Baohao Liao, Xinxing Xu, Micah Goldblum, Jiang Bian, Nima Mesgarani

Comments Project page: https://livemathematicianbench.github.io/

Abstract

Mathematical reasoning is a hallmark of human intelligence, and whether large language models (LLMs) can meaningfully perform it remains a central question in artificial intelligence and cognitive science. As LLMs are increasingly integrated into scientific workflows, rigorous evaluation of their mathematical capabilities becomes a practical necessity. Existing benchmarks are limited by synthetic settings and data contamination. We present LiveMathematicianBench, a dynamic multiple-choice benchmark for research-level mathematical reasoning built from recent arXiv papers published after model training cutoffs. By grounding evaluation in newly published theorems, it provides a realistic testbed beyond memorized patterns. The benchmark introduces a thirteen-category logical taxonomy of theorem types (e.g., implication, equivalence, existence, uniqueness), enabling fine-grained evaluation across reasoning forms. It employs a proof-sketch-guided distractor pipeline that uses high-level proof strategies to construct plausible but invalid answer choices reflecting misleading proof directions, increasing sensitivity to genuine understanding over surface-level matching. We also introduce a substitution-resistant mechanism to distinguish answer recognition from substantive reasoning. Evaluation shows the benchmark is far from saturated: Gemini-3.1-pro-preview, the best model, achieves only 43.5%. Under substitution-resistant evaluation, accuracy drops sharply: GPT-5.4 scores highest at 30.6%, while Gemini-3.1-pro-preview falls to 17.6%, below the 20% random baseline. A dual-mode protocol reveals that proof-sketch access yields consistent accuracy gains, suggesting models can leverage high-level proof strategies for reasoning. Overall, LiveMathematicianBench offers a scalable, contamination-resistant testbed for studying research-level mathematical reasoning in LLMs.
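
The per-category evaluation and the random baseline can be illustrated with a small scoring helper. This is a sketch assuming a hypothetical record format of `(category, predicted, gold)`; the 20% random baseline quoted above is consistent with uniform guessing over five answer options, which is inferred here rather than stated in the abstract.

```python
from collections import defaultdict


def per_category_accuracy(records):
    """records: iterable of (category, predicted, gold). Returns {category: accuracy}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for category, predicted, gold in records:
        totals[category] += 1
        hits[category] += int(predicted == gold)
    return {c: hits[c] / totals[c] for c in totals}


def chance_accuracy(num_choices):
    """Expected accuracy of uniform random guessing."""
    return 1.0 / num_choices


def below_chance(accuracies, num_choices):
    """Categories whose accuracy falls under the random-guessing baseline."""
    floor = chance_accuracy(num_choices)
    return sorted(c for c, acc in accuracies.items() if acc < floor)


# Hypothetical records using two of the theorem-type categories named above.
records = [
    ("implication", "A", "A"),
    ("implication", "B", "C"),
    ("existence", "D", "E"),
]
acc = per_category_accuracy(records)
print(acc, below_chance(acc, num_choices=5))
```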

2604.01753 2026-04-03 cs.RO

Analysis of Efficient Transmission Methods of Grid Maps for Intelligent Vehicles

Robin Dehler, Dominik Authaler, Aryan Thakur, Thomas Wodtko, Michael Buchholz

Comments Accepted for 2026 IEEE Intelligent Vehicles Symposium (IV) - DOI will be added after publication

2604.01749 2026-04-03 cs.CV

Ultrasound-CLIP: Semantic-Aware Contrastive Pre-training for Ultrasound Image-Text Understanding

Jiayun Jin, Haolong Chai, Xueying Huang, Xiaoqing Guo, Zengwei Zheng, Zhan Zhou, Junmei Wang, Xinyu Wang, Jie Liu, Binbin Zhou

Comments Accepted by CVPR 2026

2604.01747 2026-04-03 cs.CV

Unifying UAV Cross-View Geo-Localization via 3D Geometric Perception

Haoyuan Li, Wen Yang, Fang Xu, Hong Tan, Haijian Zhang, Shengyang Li, Gui-Song Xia

Comments 15 pages, 10 figures

Abstract

Cross-view geo-localization for Unmanned Aerial Vehicles (UAVs) operating in GNSS-denied environments remains challenging due to the severe geometric discrepancy between oblique UAV imagery and orthogonal satellite maps. Most existing methods address this problem through a decoupled pipeline of place retrieval and pose estimation, implicitly treating perspective distortion as appearance noise rather than an explicit geometric transformation. In this work, we propose a geometry-aware UAV geo-localization framework that explicitly models the 3D scene geometry to unify coarse place recognition and fine-grained pose estimation within a single inference pipeline. Our approach reconstructs a local 3D scene from multi-view UAV image sequences using a Visual Geometry Grounded Transformer (VGGT), and renders a virtual Bird's-Eye View (BEV) representation that orthorectifies the UAV perspective to align with satellite imagery. This BEV serves as a geometric intermediary that enables robust cross-view retrieval and provides spatial priors for accurate 3 Degrees of Freedom (3-DoF) pose regression. To efficiently handle multiple location hypotheses, we introduce a Satellite-wise Attention Block that isolates the interaction between each satellite candidate and the reconstructed UAV scene, preventing inter-candidate interference while maintaining linear computational complexity. In addition, we release a recalibrated version of the University-1652 dataset with precise coordinate annotations and spatial overlap analysis, enabling rigorous evaluation of end-to-end localization accuracy. Extensive experiments on the refined University-1652 benchmark and SUES-200 demonstrate that our method significantly outperforms state-of-the-art baselines, achieving robust meter-level localization accuracy and improved generalization in complex urban environments.
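
The isolation property of the Satellite-wise Attention Block can be sketched as independent cross-attention per candidate: each satellite candidate's tokens attend only to the reconstructed UAV scene tokens and never to other candidates, so the cost grows linearly with the number of candidates K (roughly K x T_sat x T_uav dot products). This is a generic single-head scaled dot-product attention sketch without learned projections; the query/key roles and all names are assumptions, not the paper's block.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return out


def satellite_wise_attention(candidates, uav_tokens):
    """Attend each satellite candidate to the UAV scene independently.

    candidates: list of K token lists (one per satellite candidate).
    No candidate ever sees another candidate's tokens.
    """
    return [cross_attend(cand, uav_tokens, uav_tokens) for cand in candidates]


uav_scene = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
candidates = [[[1.0, 0.0]], [[0.0, 1.0], [0.2, 0.8]]]
for k, out in enumerate(satellite_wise_attention(candidates, uav_scene)):
    print(k, out)
```

Changing one candidate leaves every other candidate's output bit-for-bit unchanged, which is the non-interference property the abstract describes.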

2604.01745 2026-04-03 cs.CL

Detecting Toxic Language: Ontology and BERT-based Approaches for Bulgarian Text

Melania Berbatova, Tsvetoslav Vasev

2604.01742 2026-04-03 cs.CV

Dense Point-to-Mask Optimization with Reinforced Point Selection for Crowd Instance Segmentation

Hongru Chen, Jiyang Huang, Jia Wan, Antoni B. Chan

2604.01740 2026-04-03 cs.LG cs.NE

DDCL: Deep Dual Competitive Learning: A Differentiable End-to-End Framework for Unsupervised Prototype-Based Representation Learning

Giansalvo Cirrincione

Abstract

A persistent structural weakness in deep clustering is the disconnect between feature learning and cluster assignment. Most architectures invoke an external clustering step, typically k-means, to produce pseudo-labels that guide training, preventing the backbone from directly optimising for cluster quality. This paper introduces Deep Dual Competitive Learning (DDCL), the first fully differentiable end-to-end framework for unsupervised prototype-based representation learning. The core contribution is architectural: the external k-means is replaced by an internal Dual Competitive Layer (DCL) that generates prototypes as native differentiable outputs of the network. This single inversion makes the complete pipeline, from backbone feature extraction through prototype generation to soft cluster assignment, trainable by backpropagation through a single unified loss, with no Lloyd iterations, no pseudo-label discretisation, and no external clustering step. To ground the framework theoretically, the paper derives an exact algebraic decomposition of the soft quantisation loss into a simplex-constrained reconstruction error and a non-negative weighted prototype variance term. This identity reveals a self-regulating mechanism built into the loss geometry: the gradient of the variance term acts as an implicit separation force that resists prototype collapse without any auxiliary objective, and leads to a global Lyapunov stability theorem for the reduced frozen-encoder system. Six blocks of controlled experiments validate each structural prediction. The decomposition identity holds with zero violations across more than one hundred thousand training epochs; the negative feedback cycle is confirmed with a Pearson correlation of -0.98; with a jointly trained backbone, DDCL outperforms its non-differentiable ablation by 65% in clustering accuracy and DeepCluster end-to-end by 122%.
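
The decomposition has a familiar form if the soft quantisation loss is taken to be sum_k p_k ||x - mu_k||^2 with assignment weights p on the probability simplex (an assumption about the exact loss, which the abstract does not spell out). Writing m = sum_k p_k mu_k, the cross term vanishes because sum_k p_k (m - mu_k) = 0, giving sum_k p_k ||x - mu_k||^2 = ||x - m||^2 + sum_k p_k ||mu_k - m||^2: a reconstruction error plus a non-negative weighted prototype variance. The sketch below checks this numerically; the softmax assignment rule and all names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def soft_assign(x, prototypes, temperature=1.0):
    """Softmax over negative squared distances (an assumed assignment rule)."""
    logits = [-sq_dist(x, mu) / temperature for mu in prototypes]
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    z = sum(exps)
    return [e / z for e in exps]


def soft_quantisation_terms(x, prototypes, p):
    """Return (loss, reconstruction_error, weighted_variance) for weights p on the simplex."""
    if any(pk < 0 for pk in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must lie on the probability simplex")
    dim = len(x)
    mean = [sum(pk * mu[c] for pk, mu in zip(p, prototypes)) for c in range(dim)]
    loss = sum(pk * sq_dist(x, mu) for pk, mu in zip(p, prototypes))
    recon = sq_dist(x, mean)  # distance to the p-weighted prototype mean
    variance = sum(pk * sq_dist(mu, mean) for pk, mu in zip(p, prototypes))
    return loss, recon, variance


random.seed(0)
x = [random.gauss(0, 1) for _ in range(4)]
prototypes = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
p = soft_assign(x, prototypes)
loss, recon, variance = soft_quantisation_terms(x, prototypes, p)
print(f"loss={loss:.6f}  recon+variance={recon + variance:.6f}")
```

Because the variance term is non-negative and grows as prototypes spread apart, minimising the total loss alone does not force it to zero, which is one way to read the "separation force" described in the abstract.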

2604.01738 2026-04-03 cs.AI

AeroTherm-GPT: A Verification-Centered LLM Framework for Thermal Protection System Engineering Workflows

Chuhan Qiao, Jinglai Zheng, Jie Huang, Buyue Zhao, Fan Li, Haiming Huang

2604.01728 2026-04-03 cs.AI

The AnIML Ontology: Enabling Semantic Interoperability for Large-Scale Experimental Data in Interconnected Scientific Labs

Wilf Morlidge, Elliott Watkiss-Leek, George Hannah, Harry Rostron, Andrew Ng, Ewan Johnson, Andrew Mitchell, Terry R. Payne, Valentina Tamma, Jacopo de Berardinis

Comments Accepted at the 38th International Conference on Advanced Information Systems Engineering (CAiSE 2026)

2604.01727 2026-04-03 cs.LG

MATA-Former & SIICU: Semantic Aware Temporal Alignment for High-Fidelity ICU Risk Prediction

Zhichong Zheng, Xiaohang Nie, Xueqi Wang, Yuanjin Zhao, Haitao Zhang, Yichao Tang

2604.01725 2026-04-03 cs.AI cs.LG

LiteInception: A Lightweight and Interpretable Deep Learning Framework for General Aviation Fault Diagnosis

Zhihuan Wei, Xinhang Chen, Danyang Han, Yang Hu, Jie Liu, Xuewen Miao, Guijiang Li

2604.01723 2026-04-03 cs.RO cs.AI

Causal Scene Narration with Runtime Safety Supervision for Vision-Language-Action Driving

Yun Li, Yidu Zhang, Simon Thompson, Ehsan Javanmardi, Manabu Tsukada

Comments 18 pages, 6 figures, 4 tables

2604.01720 2026-04-03 cs.RO

Hi-LOAM: Hierarchical Implicit Neural Fields for LiDAR Odometry and Mapping

Zhiliu Yang, Jianyuan Zhang, Lianhui Zhao, Jinyu Dai, Zhu Yang

Comments This manuscript is the accepted version for IEEE Transactions on Multimedia