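arXivDaily: a daily academic digest that syncs the full arXiv dataset and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.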

2509.17726 2026-03-25 cs.CV cs.LG

Automated Labeling of Intracranial Arteries with Uncertainty Quantification Using Deep Learning

Javier Bisbal, Patrick Winter, Sebastian Jofre, Aaron Ponce, Sameer A. Ansari, Ramez Abdalla, Michael Markl, Oliver Welin Odeback, Sergio Uribe, Cristian Tejos, Julio Sotelo, Susanne Schnell, David Marlevi

Comments 16 pages, 6 figures

Details

DOI: 10.1186/s12880-026-02276-5
Journal ref: BMC Medical Imaging (2026)

Abstract

Accurate anatomical labeling of intracranial arteries is essential for cerebrovascular diagnosis and hemodynamic analysis but remains time-consuming and subject to interoperator variability. We present a deep learning-based framework for automated artery labeling from 3D Time-of-Flight Magnetic Resonance Angiography (3D ToF-MRA) segmentations (n=35), incorporating uncertainty quantification to enhance interpretability and reliability. We evaluated three convolutional neural network architectures: (1) a UNet with residual encoder blocks, reflecting commonly used baselines in vascular labeling; (2) CS-Net, an attention-augmented UNet incorporating channel and spatial attention mechanisms for enhanced curvilinear structure recognition; and (3) nnUNet, a self-configuring framework that automates preprocessing, training, and architectural adaptation based on dataset characteristics. Among these, nnUNet achieved the highest labeling performance (average Dice score: 0.922; average surface distance: 0.387 mm), with improved robustness in anatomically complex vessels. To assess predictive confidence, we implemented test-time augmentation (TTA) and introduced a novel coordinate-guided strategy to reduce interpolation errors during augmented inference. The resulting uncertainty maps reliably indicated regions of anatomical ambiguity, pathological variation, or manual labeling inconsistency. We further validated clinical utility by comparing flow velocities derived from automated and manual labels in co-registered 4D Flow MRI datasets, observing close agreement with no statistically significant differences. Our framework offers a scalable, accurate, and uncertainty-aware solution for automated cerebrovascular labeling, supporting downstream hemodynamic analysis and facilitating clinical integration.
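
As a rough illustration of two quantities this abstract refers to, the sketch below computes a per-label Dice score and a simple vote-entropy uncertainty from test-time-augmented predictions. It is a minimal pure-Python sketch on toy data, not the authors' pipeline: the names `dice_per_label` and `vote_entropy` are hypothetical, the augmented predictions are assumed to be already mapped back to the original voxel grid, and the paper's coordinate-guided strategy is not reproduced.

```python
import math
from collections import Counter


def dice_per_label(pred, truth, label):
    """Dice overlap 2|A∩B| / (|A| + |B|) for one label in two flat label lists."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if pred_count + truth_count == 0:
        return 1.0  # assumption: a label absent from both maps counts as agreement
    return 2.0 * overlap / (pred_count + truth_count)


def vote_entropy(votes):
    """Shannon entropy (bits) of the labels predicted for one voxel across TTA passes."""
    counts = Counter(votes)
    n = len(votes)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


# Toy flattened label maps (0 = background, 1 and 2 = two artery labels).
truth = [1, 1, 1, 2, 2, 0, 0, 0]
pred = [1, 1, 2, 2, 2, 0, 0, 0]
mean_dice = sum(dice_per_label(pred, truth, lab) for lab in (1, 2)) / 2

# Labels for two voxels across four hypothetical augmented passes.
tta_votes = [[1, 1, 1, 1], [1, 2, 1, 2]]
uncertainty = [vote_entropy(v) for v in tta_votes]
```

A voxel on which all augmented passes agree gets zero entropy, while a voxel whose votes split evenly between two labels gets the maximum for two classes, which is the kind of signal an uncertainty map can highlight.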

2509.03242 2026-03-25 cs.LG cs.SE

TopoMap: A Feature-based Semantic Discriminator of the Topographical Regions in the Test Input Space

Gianmarco De Vita, Nargiz Humbatova, Paolo Tonella

Details

Abstract

Testing Deep Learning (DL)-based systems is an open challenge. Although it is relatively easy to find inputs that cause a DL model to misbehave, the grouping of inputs by features that make the DL model under test fail is largely unexplored. Existing approaches for DL testing introduce perturbations that may focus on specific failure-inducing features, while neglecting others that belong to different regions of the feature space. In this paper, we create an explicit topographical map of the input feature space. Our approach, named TopoMap, is both black-box and model-agnostic as it relies solely on features that characterise the input space. To discriminate the inputs according to the specific features they share, we first apply dimensionality reduction to obtain input embeddings, which are then subjected to clustering. Each DL model might require specific embedding computations and clustering algorithms to achieve a meaningful separation of inputs into discriminative groups. We propose a novel way to evaluate alternative configurations of embedding and clustering techniques. We use a deep neural network (DNN) as an approximation of a human evaluator who could tell whether a pair of clusters can be discriminated based on the features of the included elements. We use such a DNN to automatically select the optimal topographical map of the inputs among all those that are produced by different embedding/clustering configurations. The evaluation results show that the maps generated by TopoMap consist of distinguishable and meaningful regions. In addition, we evaluate the effectiveness of TopoMap using mutation analysis. In particular, we assess whether the clusters in our topographical map allow for an effective selection of mutation-killing inputs. Experimental results show that our approach outperforms random selection by 35% on average on killable mutants and by 61% on non-killable ones.
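
The "cluster the embeddings, then pick the best configuration" idea in this abstract can be sketched roughly as follows. This is a toy pure-Python sketch under stated assumptions, not TopoMap itself: the embeddings are assumed to be already computed, the only configuration knob is the number of clusters, and a mean silhouette width stands in for the paper's DNN-based discriminator. All function names are hypothetical.

```python
import math


def kmeans(points, k, iters=25):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster emptied out
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assign


def silhouette(points, assign):
    """Mean silhouette width; a stand-in score (assumption), not the paper's DNN judge."""
    labels = sorted(set(assign))
    if len(labels) < 2:
        return 0.0

    def mean_dist(i, label):
        d = [math.dist(points[i], q) for j, q in enumerate(points)
             if j != i and assign[j] == label]
        return sum(d) / len(d) if d else None

    total = 0.0
    for i in range(len(points)):
        a = mean_dist(i, assign[i])
        if a is None:  # singleton cluster contributes 0 by convention
            continue
        b = min(mean_dist(i, lab) for lab in labels if lab != assign[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy 2-D embeddings forming two well-separated groups.
embeddings = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
candidates = {k: kmeans(embeddings, k) for k in (2, 3, 4)}
scores = {k: silhouette(embeddings, a) for k, a in candidates.items()}
best_k = max(scores, key=scores.get)  # configuration with the best stand-in score
```

In the paper the selection criterion is a learned discriminator over pairs of clusters; the silhouette score here only plays the same structural role of ranking candidate maps.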

2506.17892 2026-03-25 cs.CV cs.LG

BeltCrack: the First Sequential-image Industrial Conveyor Belt Crack Detection Dataset and Its Baseline with Triple-domain Feature Learning

Jianghong Huang, Luping Ji, Xin Ma, Mao Ye

Comments Accepted by Pattern Recognition

2506.11167 2026-03-25 cs.CV cs.LG

Towards a general-purpose foundation model for fMRI analysis

Cheng Wang, Yu Jiang, Zhihao Peng, Chenxin Li, Changbae Bang, Lin Zhao, Wanyi Fu, Jinglei Lv, Jorge Sepulcre, Carl Yang, Lifang He, Tianming Liu, Xue-Jun Kong, Quanzheng Li, Daniel S. Barron, Anqi Qiu, Randy Hirschtick, Byung-Hoon Kim, Hongbin Han, Xiang Li, Yixuan Yuan

2505.11139 2026-03-25 cs.LG

Covariance Density Neural Networks

Om Roy, Yashar Moshfeghi, Keith Smith

2503.10404 2026-03-25 cs.LG cond-mat.dis-nn cs.CV

Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search

Matteo Gambella, Fabrizio Pittorino, Manuel Roveri

Comments Published in the journal Machine Learning: Science and Technology - IOPscience

Details

DOI: 10.1088/2632-2153/adf02e
Journal ref: 2025 Mach. Learn.: Sci. Technol. 6 035016

Abstract

Neural Architecture Search (NAS) has become an essential tool for designing effective and efficient neural networks. In this paper, we investigate the geometric properties of neural architecture spaces commonly used in differentiable NAS methods, specifically NAS-Bench-201 and DARTS. By defining flatness metrics such as neighborhoods and loss barriers along paths in architecture space, we reveal locality and flatness characteristics analogous to the well-known properties of neural network loss landscapes in weight space. In particular, we find that highly accurate architectures cluster together in flat regions, while suboptimal architectures remain isolated, unveiling the detailed geometrical structure of the architecture search landscape. Building on these insights, we propose Architecture-Aware Minimization (A$^2$M), a novel analytically derived algorithmic framework that explicitly biases, for the first time, the gradient of differentiable NAS methods towards flat minima in architecture space. A$^2$M consistently improves generalization over state-of-the-art DARTS-based algorithms on benchmark datasets including CIFAR-10, CIFAR-100, and ImageNet16-120, across both NAS-Bench-201 and DARTS search spaces. Notably, A$^2$M is able to increase the test accuracy, on average across different differentiable NAS methods, by +3.60\% on CIFAR-10, +4.60\% on CIFAR-100, and +3.64\% on ImageNet16-120, demonstrating its superior effectiveness in practice. A$^2$M can be easily integrated into existing differentiable NAS frameworks, offering a versatile tool for future research and applications in automated machine learning. We open-source our code at https://github.com/AI-Tech-Research-Lab/AsquaredM.
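
To make the notion of a neighborhood in architecture space concrete, here is a small sketch of one possible neighborhood-based measure on a toy discrete search space. It is illustrative only and is not the flatness metric defined in the paper: the search-space size, the accuracy table `toy_accuracy`, and the measure `local_sharpness` are all hypothetical.

```python
from itertools import product

NUM_EDGES = 3  # toy size; an assumption, not the NAS-Bench-201 or DARTS layout
NUM_OPS = 3    # number of candidate operations per edge in this toy space


def neighbors(arch):
    """All architectures that differ from `arch` in exactly one edge's operation."""
    result = []
    for edge in range(len(arch)):
        for op in range(NUM_OPS):
            if op != arch[edge]:
                result.append(arch[:edge] + (op,) + arch[edge + 1:])
    return result


def toy_accuracy(arch):
    """Hypothetical accuracy table: every non-zero operation costs 0.1 accuracy."""
    return 1.0 - 0.1 * sum(1 for op in arch if op != 0)


def local_sharpness(arch, accuracy):
    """Mean absolute accuracy change over the one-edit neighborhood (lower = flatter)."""
    nbrs = neighbors(arch)
    return sum(abs(accuracy(arch) - accuracy(n)) for n in nbrs) / len(nbrs)


all_archs = list(product(range(NUM_OPS), repeat=NUM_EDGES))
best = max(all_archs, key=toy_accuracy)
sharpness_at_best = local_sharpness(best, toy_accuracy)
```

With a real benchmark, `toy_accuracy` would be replaced by a lookup of measured accuracies, and the same neighborhood function could be used to compare how quickly accuracy degrades around different architectures.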

2503.04945 2026-03-25 cs.CL cs.AI cs.HC

Collaborative Evaluation of Deepfake Text with Deliberation-Enhancing Dialogue Systems

Jooyoung Lee, Xiaochen Zhu, Georgi Karadzhov, Tom Stafford, Andreas Vlachos, Dongwon Lee

Comments 15; To appear in ICWSM 2026 (https://www.icwsm.org/2026/)

2502.10001 2026-03-25 cs.CL cs.AR cs.DC cs.LG

EmbBERT: Attention Under 2 MB Memory

Riccardo Bravin, Massimo Pavan, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri

Comments 24 pages, 4 figures, 14 tables

Details

DOI: 10.1016/j.neunet.2026.108800
Journal ref: Neural Networks, Volume 200, 2026, 108800, ISSN 0893-6080, https://www.sciencedirect.com/science/article/pii/S0893608026002625

Abstract

Transformer architectures based on the attention mechanism have revolutionized natural language processing (NLP), driving major breakthroughs across virtually every NLP task. However, their substantial memory and computational requirements still hinder deployment on ultra-constrained devices such as wearables and Internet-of-Things (IoT) units, where available memory is limited to just a few megabytes. To address this challenge, we introduce EmbBERT, a tiny language model (TLM) architecturally designed for extreme efficiency. The model integrates a compact embedding layer, streamlined feed-forward blocks, and an efficient attention mechanism that together enable optimal performance under strict memory budgets. Through this redesign for the extreme edge, we demonstrate that highly simplified transformer architectures remain remarkably effective under tight resource constraints. EmbBERT requires only 2 MB of total memory and achieves accuracy comparable to that of state-of-the-art (SotA) models that require a $10\times$ larger memory budget. Extensive experiments on the curated TinyNLP benchmark and the GLUE suite confirm that EmbBERT achieves competitive accuracy, comparable to that of larger SotA models, and consistently outperforms downsized versions of BERT and MAMBA of similar size. Furthermore, we demonstrate the model's resilience to 8-bit quantization, which further reduces memory usage to just 781 kB, and the scalability of the EmbBERT architecture across the sub-megabyte to tens-of-megabytes range. Finally, we perform an ablation study demonstrating the positive contributions of all components and the pre-training procedure. All code, scripts, and checkpoints are publicly released to ensure reproducibility: https://github.com/RiccardoBravin/tiny-LLM.
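
For readers unfamiliar with 8-bit quantization, the sketch below shows a generic symmetric per-tensor int8 scheme and the resulting weight-storage arithmetic. It is a minimal illustration under stated assumptions, not EmbBERT's actual quantization procedure: the helper names and the 1M-parameter model are hypothetical, and the byte counts cover weights only, so they do not reproduce the total-memory figures quoted in the abstract.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]


def weight_bytes(n_params, bits):
    """Storage for n_params weights at the given bit width (weights only)."""
    return n_params * bits // 8


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

fp32_bytes = weight_bytes(1_000_000, 32)  # hypothetical 1M-parameter model
int8_bytes = weight_bytes(1_000_000, 8)
```

The rounding error per weight is bounded by half the scale, and moving from 32-bit to 8-bit storage cuts weight memory by a factor of four; activations and runtime buffers would need to be counted separately.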

2501.08415 2026-03-25 cs.CV cs.AI

Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics

Georgii Gotin, Ekaterina Shumitskaya, Anastasia Antsiferova, Dmitriy Vatolin

Comments Accepted for VISAPP 2025

Details

DOI: 10.1109/CVPR52688.2022.01464

Abstract

Recent studies have revealed that modern image and video quality assessment (IQA/VQA) metrics are vulnerable to adversarial attacks. An attacker can manipulate a video through preprocessing to artificially increase its quality score according to a certain metric, despite no actual improvement in visual quality. Most of the attacks studied in the literature are white-box attacks, while black-box attacks in the context of VQA have received less attention. Moreover, some research indicates a lack of transferability of adversarial examples generated for one model to another when applied to VQA. In this paper, we propose a cross-modal attack method, IC2VQA, aimed at exploring the vulnerabilities of modern VQA models. This approach is motivated by the observation that the low-level feature spaces of images and videos are similar. We investigate the transferability of adversarial perturbations across different modalities; specifically, we analyze how adversarial perturbations generated on a white-box IQA model with an additional CLIP module can effectively target a VQA model. The addition of the CLIP module serves as a valuable aid in increasing transferability, as the CLIP model is known for its effective capture of low-level semantics. Extensive experiments demonstrate that IC2VQA achieves a high success rate in attacking three black-box VQA models. We compare our method with existing black-box attack strategies, highlighting its superiority in terms of attack success within the same number of iterations and levels of attack strength. We believe that the proposed method will contribute to the deeper analysis of robust VQA metrics.

2412.07586 2026-03-25 cs.LG stat.ML

Paired Wasserstein Autoencoders for Conditional Sampling

Moritz Piening, Matthias Chung

2411.00623 2026-03-25 cs.CV cs.LG

Replay-Free Continual Low-Rank Adaptation with Dynamic Memory

Huancheng Chen, Jingtao Li, Weiming Zhuang, Chen Chen, Lingjuan Lyu

2410.22492 2026-03-25 cs.AI

RealCQA-V2: A Diagnostic Benchmark for Structured Visual Entailment over Scientific Charts

Saleem Ahmed, Srirangaraj Setlur, Venu Govindaraju

Comments Under review: code and data will be made public soon - https://cse-ai-lab.github.io/VPP/

2406.01825 2026-03-25 cs.LG cs.AI

Reliable OOD Virtual Screening with Extrapolatory Pseudo-Label Matching

Yunni Qu, Bhargav Vaduri, Karthikeya Jatoth, James Wellnitz, Dzung Dinh, Seth Veenbaas, Jonathan Chapman, Alexander Tropsha, Junier Oliva

2603.23297 2026-03-25 cs.CV cs.LG eess.IV

Drop-In Perceptual Optimization for 3D Gaussian Splatting

Ezgi Ozyilkan, Zhiqi Chen, Oren Rippel, Jona Ballé, Kedar Tatwawadi

Comments Project page: https://apple.github.io/ml-perceptual-3dgs

2603.23295 2026-03-25 cs.CV

Mamba-driven MRI-to-CT Synthesis for MRI-only Radiotherapy Planning

Konstantinos Barmpounakis, Theodoros P. Vagenas, Maria Vakalopoulou, George K. Matsopoulos

2603.23292 2026-03-25 cs.AI cs.CL

LLM Olympiad: Why Model Evaluation Needs a Sealed Exam

Jan Christian Blaise Cruz, Alham Fikri Aji

2603.23282 2026-03-25 cs.LG cs.AI

A Comparative Study of Machine Learning Models for Hourly Forecasting of Air Temperature and Relative Humidity

Jiaqi Dong

2603.23278 2026-03-25 cs.RO

Learning Multi-Agent Local Collision-Avoidance for Collaborative Carrying tasks with Coupled Quadrupedal Robots

Francesca Bray, Simone Tolomei, Andrei Cramariuc, Cesar Cadena, Marco Hutter

2603.23276 2026-03-25 cs.CV

CCF: Complementary Collaborative Fusion for Domain Generalized Multi-Modal 3D Object Detection

Yuchen Wu, Kun Wang, Yining Pan, Na Zhao

Comments Accepted to CVPR 2026

2603.23272 2026-03-25 cs.CV cs.MM

Multi-Modal Image Fusion via Intervention-Stable Feature Learning

Xue Wang, Zheng Guan, Wenhua Qian, Chengchao Wang, Runzhuo Ma

Comments Accepted by CVPR 2026

2603.23271 2026-03-25 cs.RO cs.AI

A Multimodal Framework for Human-Multi-Agent Interaction

Shaid Hasan, Breenice Lee, Sujan Sarker, Tariq Iqbal

Comments 4 pages, 3 figures. Accepted at ACM/IEEE HRI 2026 Workshop (MAgicS-HRI)

2603.23268 2026-03-25 cs.LG cs.AI

SafeSeek: Universal Attribution of Safety Circuits in Language Models

Miao Yu, Siyuan Fu, Moayad Aloqaily, Zhenhong Zhou, Safa Otoum, Xing fan, Kun Wang, Yufei Guo, Qingsong Wen

2603.23265 2026-03-25 cs.LG

SynForceNet: A Force-Driven Global-Local Latent Representation Framework for Lithium-Ion Battery Fault Diagnosis

Rongxiu Chen, Yuting Su

2603.23255 2026-03-25 cs.LG

Permutation-Symmetrized Diffusion for Unconditional Molecular Generation

Gyeonghoon Ko, Juho Lee

2603.23251 2026-03-25 cs.CL cs.LG

Is AI Catching Up to Human Expression? Exploring Emotion, Personality, Authorship, and Linguistic Style in English and Arabic with Six Large Language Models

Nasser A Alsadhan

Comments Preprint. Under review

Details

Abstract

The advancing fluency of LLMs raises important questions about their ability to emulate complex human traits, including emotional expression and personality, across diverse linguistic and cultural contexts. This study investigates whether LLMs can convincingly mimic emotional nuance in English and personality markers in Arabic, a critical under-resourced language with unique linguistic and cultural characteristics. We conduct two tasks across six models: Jais, Mistral, LLaMA, GPT-4o, Gemini, and DeepSeek. First, we evaluate whether machine classifiers can reliably distinguish between human-authored and AI-generated texts. Second, we assess the extent to which LLM-generated texts exhibit emotional or personality traits comparable to those of humans. Our results demonstrate that AI-generated texts are distinguishable from human-authored ones (F1>0.95), though classification performance deteriorates on paraphrased samples, indicating a reliance on superficial stylistic cues. Emotion and personality classification experiments reveal significant generalization gaps: classifiers trained on human data perform poorly on AI-generated texts and vice versa, suggesting LLMs encode affective signals differently from humans. Importantly, augmenting training with AI-generated data enhances performance in the Arabic personality classification task, highlighting the potential of synthetic data to address challenges in under-resourced languages. Model-specific analyses show that GPT-4o and Gemini exhibit superior affective coherence. Linguistic and psycholinguistic analyses reveal measurable divergences in tone, authenticity, and textual complexity between human and AI texts. These findings have implications for affective computing, authorship attribution, and responsible AI deployment, particularly within under-resourced language contexts where generative AI detection and alignment pose unique challenges.

2603.23246 2026-03-25 cs.CV

GO-Renderer: Generative Object Rendering with 3D-aware Controllable Video Diffusion Models

Zekai Gu, Shuoxuan Feng, Yansong Wang, Hanzhuo Huang, Zhongshuo Du, Chengfeng Zhao, Chengwei Ren, Peng Wang, Yuan Liu

Comments Project page: https://igl-hkust.github.io/GO-Renderer

2603.23245 2026-03-25 cs.LG cs.AI

Neural ODE and SDE Models for Adaptation and Planning in Model-Based Reinforcement Learning

Chao Han, Stefanos Ioannou, Luca Manneschi, T. J. Hayward, Michael Mangan, Aditya Gilra, Eleni Vasilaki

2603.23244 2026-03-25 cs.AI

Online library learning in human visual puzzle solving

Pinzhe Zhao, Emanuele Sansone, Marta Kryven, Bonan Zhao

2603.23232 2026-03-25 cs.LG

GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL

Haoyu Wang, Jingcheng Wang, Shunyu Wu, Xinwei Xiao

2603.23229 2026-03-25 cs.CL

I Came, I Saw, I Explained: Benchmarking Multimodal LLMs on Figurative Meaning in Memes

Shijia Zhou, Saif M. Mohammad, Barbara Plank, Diego Frassinelli

Comments LREC 2026, 18 pages, 10 figures