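arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.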

2603.15624 2026-03-18 cs.CV cs.AI cs.RO

Exploring the Use of VLMs for Navigation Assistance for People with Blindness and Low Vision

Yu Li, Yuchen Zheng, Giles Hamilton-Fletcher, Marco Mezzavilla, Yao Wang, Sundeep Rangan, Maurizio Porfiri, Zhou Yu, John-Ross Rizzo

Details

Abstract

This paper investigates the potential of vision-language models (VLMs) to assist people with blindness and low vision (pBLV) in navigation tasks. We evaluate state-of-the-art closed-source models, including GPT-4V, GPT-4o, Gemini-1.5-Pro, and Claude-3.5-Sonnet, alongside open-source models, such as Llava-v1.6-mistral and Llava-onevision-qwen, to analyze their capabilities in foundational visual skills: counting ambient obstacles, relative spatial reasoning, and common-sense wayfinding-pertinent scene understanding. We further assess their performance in navigation scenarios, using pBLV-specific prompts designed to simulate real-world assistance tasks. Our findings reveal notable performance disparities between these models: GPT-4o consistently outperforms others across all tasks, particularly in spatial reasoning and scene understanding. In contrast, open-source models struggle with nuanced reasoning and adaptability in complex environments. Common challenges include difficulties in accurately counting objects in cluttered settings, biases in spatial reasoning, and a tendency to prioritize object details over spatial feedback, limiting their usability for pBLV in navigation tasks. Despite these limitations, VLMs show promise for wayfinding assistance when better aligned with human feedback and equipped with improved spatial reasoning. This research provides actionable insights into the strengths and limitations of current VLMs, guiding developers on effectively integrating VLMs into assistive technologies while addressing key limitations for enhanced usability.

2603.15622 2026-03-18 cs.CV cs.AI

SAC-NeRF: Adaptive Ray Sampling for Neural Radiance Fields via Soft Actor-Critic Reinforcement Learning

Chenyu Ge

2602.13713 2026-03-18 cs.CL

On Theoretically-Driven LLM Agents for Multi-Dimensional Discourse Analysis

Maciej Uberna, Michał Wawer, Jarosław A. Chudziak, Marcin Koszowy

Comments 8 pages, 4 figures, 3 tables. This is the accepted version of the paper presented at the 18th International Conference on Agents and Artificial Intelligence (ICAART 2026), Marbella, Spain

2601.22629 2026-03-18 cs.CL cs.AI

Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models

Jingxuan Wu, Zhenglin Wan, Xingrui Yu, Yuzhe Yang, Yiqiao Huang, Ivor Tsang, Yang You

2511.18247 2026-03-18 cs.LG math.OC

Tail Distribution of Regret in Optimistic Reinforcement Learning

Sajad Khodadadian, Mehrdad Moharrami

Comments 27 pages, 0 figures

2510.04017 2026-03-18 cs.AI cs.LG physics.ao-ph

Zephyrus: An Agentic Framework for Weather Science

Sumanth Varambally, Marshall Fisher, Jas Thakker, Yiwei Chen, Zhirui Xia, Yasaman Jafari, Ruijia Niu, Manas Jain, Veeramakali Vignesh Manivannan, Zachary Novack, Luyu Han, Srikar Eranky, Salva Rühling Cachay, Taylor Berg-Kirkpatrick, Duncan Watson-Parris, Yi-An Ma, Rose Yu

2507.06993 2026-03-18 cs.AI cs.CV

IMAIA: Interactive Maps AI Assistant for Travel Planning and Geo-Spatial Intelligence

Jieren Deng, Zhizhang Hu, Ziyan He, Aleksandar Cvetkovic, Pak Kiu Chung, Dragomir Yankov, Chiqun Zhang

Comments Accepted to The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2506.11602 2026-03-18 cs.CL cs.AI

Are LLMs Good Text Diacritizers? An Arabic and Yoruba Case Study

Hawau Olamide Toyin, Samar Mohamed Magdy, Hanan Aldarmaki

Comments accepted at LREC 2026

2506.04439 2026-03-18 cs.LG

RETRO SYNFLOW: Discrete Flow Matching for Accurate and Diverse Single-Step Retrosynthesis

Robin Yadav, Qi Yan, Guy Wolf, Avishek Joey Bose, Renjie Liao

2505.08875 2026-03-18 cs.RO

Real-time Capable Learning-based Visual Tool Pose Correction via Differentiable Simulation

Shuyuan Yang, Zonghe Chua

2305.13883 2026-03-18 cs.LG cs.CY cs.SE

Leveraging Imperfect Sources to Detect Fairwashing in Black-Box Auditing

Jade Garcia Bourrée, Erwan Le Merrer, Gilles Tredan, Benoît Rottembourg

Comments 23 pages, 10 figures

2603.16850 2026-03-18 math.NA cs.AI cs.DC cs.NA math.DS math.OC

Unifying Optimization and Dynamics to Parallelize Sequential Computation: A Guide to Parallel Newton Methods for Breaking Sequential Bottlenecks

Xavier Gonzalez

Comments PhD Dissertation; Stanford University

Details

DOI: 10.25740/vf943fc9855

Abstract

Massively parallel hardware (GPUs) and long sequence data have made parallel algorithms essential for machine learning at scale. Yet dynamical systems, like recurrent neural networks and Markov chain Monte Carlo, were thought to suffer from sequential bottlenecks. Recent work showed that dynamical systems can in fact be parallelized across the sequence length by reframing their evaluation as a system of nonlinear equations, which can be solved with Newton's method using a parallel associative scan. However, these parallel Newton methods struggled with limitations, primarily inefficiency, instability, and lack of convergence guarantees. This thesis addresses these limitations with methodological and theoretical contributions, drawing particularly from optimization. Methodologically, we develop scalable and stable parallel Newton methods, based on quasi-Newton and trust-region approaches. The quasi-Newton methods are faster and more memory efficient, while the trust-region approaches are significantly more stable. Theoretically, we unify many fixed-point methods into our parallel Newton framework, including Picard and Jacobi iterations. We establish a linear convergence rate for these techniques that depends on the method's approximation accuracy and stability. Moreover, we give a precise condition, rooted in dynamical stability, that characterizes when parallelization provably accelerates a dynamical system and when it cannot. Specifically, the sign of the Largest Lyapunov Exponent of a dynamical system determines whether or not parallel Newton methods converge quickly. In sum, this thesis unlocks scalable and stable methods for parallelizing sequential computation, and provides a firm theoretical basis for when such techniques will and will not work. This thesis also serves as a guide to parallel Newton methods for researchers who want to write the next chapter in this ongoing story.
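
The abstract above names Jacobi iteration as one of the fixed-point methods that fit into the parallel Newton framework. As a rough illustration only, and not the thesis's own algorithm, the sketch below evaluates a scalar recurrence x_t = f(x_{t-1}) by updating every state of the trajectory from the previous iterate, so the updates within one sweep are independent of each other. The function names, the example map `f`, and the tolerance are assumptions chosen for this sketch.

```python
import math


def sequential_rollout(f, x0, T):
    """Evaluate x_t = f(x_{t-1}) one step at a time (the sequential baseline)."""
    xs = []
    x = x0
    for _ in range(T):
        x = f(x)
        xs.append(x)
    return xs


def jacobi_rollout(f, x0, T, tol=1e-12, max_iters=None):
    """Jacobi-style fixed-point iteration over the whole trajectory.

    Each sweep computes x_t <- f(previous iterate of x_{t-1}) for all t.
    The T updates inside a sweep do not depend on each other, so they
    could run in parallel; here they are a plain list comprehension.
    After k sweeps the first k states are exact, so T sweeps always suffice.
    """
    if max_iters is None:
        max_iters = T
    xs = [x0] * T  # initial guess for the whole trajectory (an assumption)
    for k in range(1, max_iters + 1):
        predecessors = [x0] + xs[:-1]
        new_xs = [f(p) for p in predecessors]
        change = max(abs(a - b) for a, b in zip(new_xs, xs))
        xs = new_xs
        if change <= tol:
            return xs, k
    return xs, max_iters


if __name__ == "__main__":
    # Hypothetical contracting map (stable dynamics), chosen for illustration.
    f = lambda x: 0.5 * math.tanh(x) + 0.1
    reference = sequential_rollout(f, 0.0, 50)
    approx, sweeps = jacobi_rollout(f, 0.0, 50)
    worst = max(abs(a - b) for a, b in zip(reference, approx))
    print(f"sweeps used: {sweeps}, max deviation from sequential: {worst:.2e}")
```

For a contracting map like this one, the iteration can stop well before T sweeps, which loosely mirrors the abstract's point that stable dynamics are the case where parallel evaluation pays off. For an unstable map the sketch would still terminate after T sweeps, but with no saving over the sequential loop.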

2603.16829 2026-03-18 stat.ML cs.LG math.ST stat.ME stat.TH

Conditional Distributional Treatment Effects: Doubly Robust Estimation and Testing

Saksham Jain, Alex Luedtke

2603.16812 2026-03-18 cs.DC cs.AI cs.AR

ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation

Nij Dorairaj, Debabrata Chatterjee, Hong Wang, Hong Jiang, Alankar Saxena, Altug Koker, Thiam Ern Lim, Cathrane Teoh, Chuan Yin Loo, Bishara Shomar, Anthony Lester

2603.16587 2026-03-18 q-bio.QM cs.CV eess.IV

HistoAtlas: A Pan-Cancer Morphology Atlas Linking Histomics to Molecular Programs and Clinical Outcomes

Pierre-Antoine Bannier

2603.15699 2026-03-18 cs.PF cs.AI cs.SE

This Is Taking Too Long -- Investigating Time as a Proxy for Energy Consumption of LLMs

Lars Krupp, Daniel Geißler, Francisco M. Calatrava-Nicolas, Vishal Banwari, Paul Lukowicz, Jakob Karolus

Comments This work was accepted at PerCom 2026

2603.15154 2026-03-18 eess.IV cs.CV

Vision-Language Model Based Multi-Expert Fusion for CT Image Classification

Jianfa Bai, Kejin Lu, Runtian Yuan, Qingqiu Li, Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng

2603.14621 2026-03-18 eess.IV cs.CV

A Heterogeneous Ensemble for Multi-Center COVID-19 Classification from Chest CT Scans

Aadit Nilay, Bhavesh Thapar, Anant Agrawal, Mohammad Nayeem Teli

2603.13392 2026-03-18 eess.IV cs.AI cs.CV

Comparative Analysis of Deep Learning Architectures for Multi-Disease Classification of Single-Label Chest X-rays

Ali M. Bahram, Saman Muhammad Omer, Hardi M. Mohammed

Comments 19 pages, 9 figures, 12 tables. Published in Charmo Journal of Natural Sciences and Technologies (CJNST), 2026

Details

DOI: 10.31530/cjnst.2026.2.1.2
Journal ref: Charmo Journal of Natural Sciences and Technologies (CJNST), Vol. 2, Issue 1, pp. 10-28, 2026

Abstract

Chest X-ray imaging remains the primary diagnostic tool for pulmonary and cardiac disorders worldwide, yet its accuracy is hampered by radiologist shortages and inter-observer variability. This study presents a systematic comparative evaluation of seven deep learning architectures for multi-class chest disease classification: ConvNeXt-Tiny, DenseNet121, DenseNet201, ResNet50, ViT-B/16, EfficientNetV2-M, and MobileNetV2. A balanced dataset of 18,080 chest X-ray images spanning five disease categories (Cardiomegaly, COVID-19, Normal, Pneumonia, and Tuberculosis) was constructed from three public repositories and partitioned at the patient level to prevent data leakage. All models were trained under identical conditions using ImageNet-pretrained weights, standardized preprocessing, and consistent hyperparameters. All seven architectures exceeded 90% test accuracy. ConvNeXt-Tiny achieved the highest performance (92.31% accuracy, 95.70% AUROC), while MobileNetV2 emerged as the most parameter-efficient model (3.5M parameters, 90.42% accuracy, 94.10% AUROC), completing training in 48 minutes. Tuberculosis and COVID-19 classification was near-perfect (AUROC >= 99.97%) across all architectures, while Normal, Cardiomegaly, and Pneumonia presented greater challenges due to overlapping radiographic features. Grad-CAM visualizations confirmed clinically consistent attention patterns across disease categories. These findings demonstrate that high-accuracy multi-disease chest X-ray classification is achievable without excessive computational resources, with important implications for AI-assisted diagnosis in both resource-rich and resource-constrained healthcare settings.

2404.03813 2026-03-18 quant-ph cs.LG

Agnostic Tomography of Stabilizer Product States

Sabee Grewal, Vishnu Iyer, William Kretschmer, Daniel Liang

Comments 20 pages. V2: minor corrections. V3: addition of new references. V4: reworked the algorithm and presentation. V5: accepted to Quantum

2211.13231 2026-03-18 q-bio.QM cs.LG

Predicting Biomedical Interactions with Probabilistic Model Selection for Graph Neural Networks

Kishan KC, Rui Li, Paribesh Regmi, Anne R. Haake

2603.16751 2026-03-18 cs.GT cs.AI cs.LG

Finding Common Ground in a Sea of Alternatives

Jay Chooi, Paul Gölz, Ariel D. Procaccia, Benjamin Schiffer, Shirley Zhang

2603.16750 2026-03-18 cs.HC cs.ET cs.RO

Thermopneumatic Pixels for Fast, Localized, Low-Voltage Touch Feedback

Max Linnander, Yon Visell

2603.16746 2026-03-18 math.DS cs.LG nlin.CD

Data-driven forced response analysis with min-max representations of nonlinear restoring forces

Akira Saito, Hiromu Fujita

2603.16712 2026-03-18 math.ST cs.DS cs.LG stat.ML stat.TH

High-dimensional estimation with missing data: Statistical and computational limits

Kabir Aladin Verchand, Ankit Pensia, Saminul Haque, Rohith Kuditipudi

2603.16668 2026-03-18 eess.AS cs.SD

HRTF-guided Binaural Target Speaker Extraction with Real-World Validation

Yoav Ellinson, Sharon Gannot

Comments Submitted to Interspeech 2026

2603.16599 2026-03-18 eess.SY cs.AI cs.CE cs.ET cs.SY

Data-driven generalized perimeter control: Zürich case study

Alessio Rimoldi, Carlo Cenedese, Alberto Padoan, Florian Dörfler, John Lygeros

Comments 33 pages, 16 figures

2603.16565 2026-03-18 eess.SP cs.AI cs.AR cs.SY eess.SY

Deep Learning-Driven Black-Box Doherty Power Amplifier with Pixelated Output Combiner and Extended Efficiency Range

Han Zhou, Haojie Chang, David Widen

2603.16548 2026-03-18 cs.CR cs.CV

SAMSEM -- A Generic and Scalable Approach for IC Metal Line Segmentation

Christian Gehrmann, Jonas Ricker, Simon Damm, Deruo Cheng, Julian Speith, Yiqiong Shi, Asja Fischer, Christof Paar

Details

Abstract

In light of globalized hardware supply chains, the assurance of hardware components has gained significant interest, particularly in cryptographic applications and high-stakes scenarios. Identifying metal lines on scanning electron microscope (SEM) images of integrated circuits (ICs) is one essential step in verifying the absence of malicious circuitry in chips manufactured in untrusted environments. Due to varying manufacturing processes and technologies, such verification usually requires tuning parameters and algorithms for each target IC. Often, a machine learning model trained on images of one IC fails to accurately detect metal lines on other ICs. To address this challenge, we create SAMSEM by adapting Meta's Segment Anything Model 2 (SAM2) to the domain of IC metal line segmentation. Specifically, we develop a multi-scale segmentation approach that can handle SEM images of varying sizes, resolutions, and magnifications. Furthermore, we deploy a topology-based loss alongside pixel-based losses to focus our segmentation on electrical connectivity rather than pixel-level accuracy. Based on a hyperparameter optimization, we then fine-tune the SAM2 model to obtain a model that generalizes across different technology nodes, manufacturing materials, sample preparation methods, and SEM imaging technologies. To this end, we leverage an unprecedented dataset of SEM images obtained from 48 metal layers across 14 different ICs. When fine-tuned on seven ICs, SAMSEM achieves an error rate as low as 0.72% when evaluated on other images from the same ICs. For the remaining seven unseen ICs, it still achieves error rates as low as 5.53%. Finally, when fine-tuned on all 14 ICs, we observe an error rate of 0.62%. Hence, SAMSEM proves to be a reliable tool that significantly advances the frontier in metal line segmentation, a key challenge in post-manufacturing IC verification.

2603.16470 2026-03-18 cs.IT cs.AI eess.SP math.IT

Multi-Agent Reinforcement Learning Counteracts Delayed CSI in Multi-Satellite Systems

Marios Aristodemou, Yasaman Omid, Sangarapillai Lambotharan, Mahsa Derakhshan, Lajos Hanzo

Comments 12 pages, 6 figures. Submitted to IEEE Transactions on Vehicular Technology; it has been reviewed once