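arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.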

2603.24821 2026-03-27 cs.CV cs.AI

Generative Adversarial Perturbations with Cross-paradigm Transferability on Localized Crowd Counting

Alabi Mehzabin Anisha, Guangjing Wang, Sriram Chellappan

Comments Accepted at CVPR 2026 Main Conference

English abstract

State-of-the-art crowd counting and localization are primarily modeled using two paradigms: density maps and point regression. Given the field's security ramifications, there is active interest in model robustness against adversarial attacks. Recent studies have demonstrated transferability across density-map-based approaches via adversarial patches, but cross-paradigm attacks (i.e., across both density map-based models and point regression-based models) remain unexplored. We introduce a novel adversarial framework that compromises both density map and point regression architectural paradigms through a comprehensive multi-task loss optimization. For point-regression models, we employ scene-density-specific high-confidence logit suppression; for density-map approaches, we use peak-targeted density map suppression. Both are combined with model-agnostic perceptual constraints to ensure that perturbations are effective and imperceptible to the human eye. Extensive experiments demonstrate the effectiveness of our attack, achieving on average a 7X increase in Mean Absolute Error compared to clean images while maintaining competitive visual quality, and successfully transferring across seven state-of-the-art crowd models with transfer ratios ranging from 0.55 to 1.69. Our approach strikes a balance between attack effectiveness and imperceptibility compared to state-of-the-art transferable attack strategies. The source code is available at https://github.com/simurgh7/CrowdGen

2603.24815 2026-03-27 cs.CV

Attention-based Pin Site Image Classification in Orthopaedic Patients with External Fixators

Yubo Wang, Marie Fridberg, Anirejuoritse Bafor, Ole Rahbek, Christopher Iobst, Søren Vedding Kold, Ming Shen

2603.24813 2026-03-27 cs.RO

Characterization of Constraints in Flexible Unknown Environments

Samrat Bhattacharyya, Nabil Simaan

2603.24811 2026-03-27 cs.RO

A Nonvolatile Switchable-polarity EPM Valve

Bingchao Wang, Jonah Mack, Francesco Giorgio-Serchi, Adam A. Stokes

2603.24806 2026-03-27 cs.RO cs.AI

FODMP: Fast One-Step Diffusion of Movement Primitives Generation for Time-Dependent Robot Actions

Xirui Shi, Arya Ebrahimi, Yi Hu, Jun Jin

2603.24804 2026-03-27 cs.CV cs.AI cs.LG

GoldiCLIP: The Goldilocks Approach for Balancing Explicit Supervision for Language-Image Pretraining

Deen Dayal Mohan, Hossein Souri, Vitali Petsiuk, Juhong Min, Gopal Sharma, Luowei Zhou, Suren Kumar

2603.24801 2026-03-27 cs.CV cs.AI cs.LG

Dissecting Model Failures in Abdominal Aortic Aneurysm Segmentation through Explainability-Driven Analysis

Abu Noman Md Sakib, Merjulah Roby, Zijie Zhang, Satish Muluk, Mark K. Eskandari, Ender A. Finol

2603.24800 2026-03-27 cs.CV

Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

Danil Tokhchukov, Aysel Mirzoeva, Andrey Kuznetsov, Konstantin Sobolev

Comments Accepted to CVPR 2026, Project page: https://v-gen-ai.github.io/Calibri-page/

2603.24797 2026-03-27 cs.CL

Enhancing Structured Meaning Representations with Aspect Classification

Claire Benét Post, Paul Bontempo, August Milliken, Alvin Po-Chun Chen, Nicholas Derby, Saksham Khatwani, Sumeyye Nabieva, Karthik Sairam, Alexis Palmer

Comments 15 pages, 3 figures, 8 tables

2603.24793 2026-03-27 cs.CV cs.MM cs.SD

AVControl: Efficient Framework for Training Audio-Visual Controls

Matan Ben-Yosef, Tavi Halperin, Naomi Ken Korem, Mohammad Salama, Harel Cain, Asaf Joseph, Anthony Chen, Urska Jelercic, Ofir Bibi

Comments Project page: https://matanby.github.io/AVControl/

2603.24787 2026-03-27 cs.AI

ReLope: KL-Regularized LoRA Probes for Multimodal LLM Routing

Yaopei Zeng, Congchao Wang, Blake JianHang Chen, Lu Lin

2603.24780 2026-03-27 cs.LG

Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback

Jungtaek Kim, Thomas Zeng, Ziqian Lin, Minjae Lee, Chungpa Lee, Jy-yong Sohn, Hyung Il Koo, Kangwook Lee

Comments Accepted for publication in Transactions on Machine Learning Research (TMLR)

2603.24772 2026-03-27 cs.CL cs.AI cs.LG

Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Validated Dataset

Mohammed Nowshad Ruhani Chowdhury, Mohammed Nowaz Rabbani Chowdhury, Sakari Lukkarinen

Comments 9 pages, 3 figures, 2 tables

2603.24770 2026-03-27 cs.CV

DRoPS: Dynamic 3D Reconstruction of Pre-Scanned Objects

Narek Tumanyan, Samuel Rota Bulò, Denis Rozumny, Lorenzo Porzi, Adam Harley, Tali Dekel, Peter Kontschieder, Jonathon Luiten

Comments Project page: https://drops-dynamics.github.io/

2603.24767 2026-03-27 cs.CL

Fine-Tuning A Large Language Model for Systematic Review Screening

Kweku Yamoah, Noah Schroeder, Emmanuel Dorley, Neha Rani, Caleb Schutz

2603.24764 2026-03-27 cs.CV cs.LG

Synthetic Cardiac MRI Image Generation using Deep Generative Models

Ishan Kumarasinghe, Dasuni Kawya, Madhura Edirisooriya, Isuri Devindi, Isuru Nawinne, Vajira Thambawita

Comments 12 pages, 2 figures, Preprint

2603.24753 2026-03-27 cs.LG cs.CV

Light Cones For Vision: Simple Causal Priors For Visual Hierarchy

Manglam Kartik, Neel Tushar Shah

Comments ICLR GRaM Workshop 2026

2603.24744 2026-03-27 cs.LG

Contrastive Learning Boosts Deterministic and Generative Models for Weather Data

Nathan Bailey

2603.24742 2026-03-27 cs.AI cs.LG cs.MA nlin.AO

Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer Behaviour

Adeela Bashir, Zhao Song, Ndidi Bianca Ogbo, Nataliya Balabanova, Martin Smit, Chin-wing Leung, Paolo Bova, Manuel Chica Serrano, Dhanushka Dissanayake, Manh Hong Duong, Elias Fernandez Domingos, Nikita Huber-Kralj, Marcus Krellner, Andrew Powell, Stefan Sarkadi, Fernando P. Santos, Zia Ush Shamszaman, Chaimaa Tarzi, Paolo Turrini, Grace Ibukunoluwa Ufeoshi, Victor A. Vargas-Perez, Alessandro Di Stefano, Simon T. Powers, The Anh Han

2603.24736 2026-03-27 cs.AI cs.LG

AutoSAM: an Agentic Framework for Automating Input File Generation for the SAM Code with Multi-Modal Retrieval-Augmented Generation

Zaid Abulawi, Zavier Ndum Ndum, Eric Cervi, Rui Hu, Yang Liu

Comments 34 Pages, 14 Figures

English abstract

In the design and safety analysis of advanced reactor systems, constructing input files for system-level thermal-hydraulics codes such as the System Analysis Module (SAM) remains a labor-intensive task. Analysts must extract and reconcile design data from heterogeneous engineering documents and manually translate it into solver-specific syntax. In this paper, we present AutoSAM, an agentic framework that automates SAM input file generation. The framework combines a large language model agent with retrieval-augmented generation over the solver's user guide and theory manual, together with specialized tools for analyzing PDFs, images, spreadsheets, and text files. AutoSAM ingests unstructured engineering documents, including system diagrams, design reports, and data tables, extracts simulation-relevant parameters into a human-auditable intermediate representation, and synthesizes validated, solver-compatible input decks. Its multimodal retrieval pipeline integrates scientific text extraction, vision-based figure interpretation, semantic embedding, and query answering. We evaluate AutoSAM on four case studies of increasing complexity: a single-pipe steady-state model, a solid-fuel channel with temperature reactivity feedback, the Advanced Burner Test Reactor core, and the Molten Salt Reactor Experiment primary loop. Across all cases, the agent produces runnable SAM models consistent with expected thermal-hydraulic behavior while explicitly identifying missing data and labeling assumed values. The framework achieves 100% utilization of structured inputs, about 88% extraction from PDF text, and 100% completeness in vision-based geometric extraction. These results demonstrate a practical path toward prompt-driven reactor modeling, in which analysts provide system descriptions and supporting documentation while the agent translates them into transparent, and executable, SAM simulations.

2603.24733 2026-03-27 cs.CV eess.IV q-bio.QM

OpenCap Monocular: 3D Human Kinematics and Musculoskeletal Dynamics from a Single Smartphone Video

Selim Gilon, Emily Y. Miller, Scott D. Uhlrich

English abstract

Quantifying human movement (kinematics) and musculoskeletal forces (kinetics) at scale, such as estimating quadriceps force during a sit-to-stand movement, could transform prediction, treatment, and monitoring of mobility-related conditions. However, quantifying kinematics and kinetics traditionally requires costly, time-intensive analysis in specialized laboratories, limiting clinical translation. Scalable, accurate tools for biomechanical assessment are needed. We introduce OpenCap Monocular, an algorithm that estimates 3D skeletal kinematics and kinetics from a single smartphone video. The method refines 3D human pose estimates from a monocular pose estimation model (WHAM) via optimization, computes kinematics of a biomechanically constrained skeletal model, and estimates kinetics via physics-based simulation and machine learning. We validated OpenCap Monocular against marker-based motion capture and force plate data for walking, squatting, and sit-to-stand tasks. OpenCap Monocular achieved low kinematic error (4.8° mean absolute error for rotational degrees of freedom; 3.4 cm for pelvis translations), outperforming a regression-only computer vision baseline by 48% in rotational accuracy (p = 0.036) and 69% in translational accuracy (p < 0.001). OpenCap Monocular also estimated ground reaction forces during walking with accuracy comparable to, or better than, our prior two-camera OpenCap system. We demonstrate that the algorithm estimates important kinetic outcomes with clinically meaningful accuracy in applications related to frailty and knee osteoarthritis, including estimating knee extension moment during sit-to-stand transitions and knee adduction moment during walking. OpenCap Monocular is deployed via a smartphone app, web app, and secure cloud computing (https://opencap.ai), enabling free, accessible single-smartphone biomechanical assessments.

2603.24730 2026-03-27 cs.CV

A Framework for Generating Semantically Ambiguous Images to Probe Human and Machine Perception

Yuqi Hu, Vasha DuTell, Ahna R. Girshick, Jennifer E. Corbett

2603.24721 2026-03-27 cs.CV cs.AI cs.LG cs.MM

Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models

Shengli Zhou, Minghang Zheng, Feng Zheng, Yang Liu

Comments Accepted by CVPR 2026

2603.24716 2026-03-27 cs.CV

Accurate Point Measurement in 3DGS -- A New Alternative to Traditional Stereoscopic-View Based Measurements

Deyan Deng, Rongjun Qin

Comments Accepted to the 2026 ISPRS Congress

English abstract

3D Gaussian Splatting (3DGS) has revolutionized real-time rendering with its state-of-the-art novel view synthesis, but its utility for accurate geometric measurement remains underutilized. Compared to multi-view stereo (MVS) point clouds or meshes, 3DGS rendered views present superior visual quality and completeness. However, current point measurement methods still rely on demanding stereoscopic workstations or direct picking on often-incomplete and inaccurate 3D meshes. As a novel view synthesizer, 3DGS renders exact source views and smoothly interpolates in-between views. This allows users to intuitively pick congruent points across different views while operating 3DGS models. By triangulating these congruent points, one can precisely generate 3D point measurements. This approach mimics traditional stereoscopic measurement but is significantly less demanding: it requires neither a stereo workstation nor specialized operator stereoscopic capability. Furthermore, it enables multi-view intersection (more than two views) for higher measurement accuracy. We implemented a web-based application to demonstrate this proof-of-concept (PoC). Using several UAV aerial datasets, we show this PoC allows users to successfully perform highly accurate point measurements, achieving accuracy matching or exceeding traditional stereoscopic methods on standard hardware. Specifically, our approach significantly outperforms direct mesh-based measurements. Quantitatively, our method achieves RMSEs in the 1-2 cm range on well-defined points. More critically, on challenging thin structures where mesh-based RMSE was 0.062 m, our method achieved 0.037 m. On sharp corners poorly reconstructed in the mesh, our method successfully measured all points with a 0.013 m RMSE, whereas the mesh method failed entirely. Code is available at: https://github.com/GDAOSU/3dgs_measurement_tool.
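
The multi-view intersection step described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each picked image point has already been converted into a viewing ray (a camera center plus a direction), and the function names `triangulate`, `solve3`, and `normalize` are hypothetical. It finds the 3D point that minimizes the summed squared distance to all rays, which works for two or more views.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("rays are (nearly) parallel; point is not determined")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / M[r][r]
    return x


def triangulate(origins, directions):
    """Least-squares intersection of viewing rays.

    Each ray i has origin o_i and direction d_i. With P_i = I - d_i d_i^T
    (projection onto the plane perpendicular to the ray), the closest point p
    satisfies (sum_i P_i) p = sum_i P_i o_i.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        d = normalize(d)
        for i in range(3):
            for j in range(3):
                p_ij = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p_ij
                b[i] += p_ij * o[j]
    return solve3(A, b)


# Made-up example: three camera centers all looking at the same target point.
target = [1.0, 2.0, 10.0]
origins = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
directions = [[t - o for t, o in zip(target, origin)] for origin in origins]
point = triangulate(origins, directions)
print(point)  # close to the target, up to floating-point rounding
```

With noisy picks the rays no longer meet exactly, and adding a third or fourth view averages out the error, which is the intuition behind the higher accuracy the abstract attributes to using more than two views.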

2603.24714 2026-03-27 cs.LG cs.SY eess.SY

Can an Actor-Critic Optimization Framework Improve Analog Design Optimization?

Sounak Dutta, Fin Amin, Sushil Panda, Jonathan Rabe, Yuejiang Wen, Paul Franzon

Comments 7 pages, 5 figures

2603.24713 2026-03-27 cs.CV

Lookalike3D: Seeing Double in 3D

Chandan Yeshwanth, Angela Dai

Comments Project page: https://cy94.github.io/lookalike3d/, Video: https://www.youtube.com/watch?v=g6S7J0y_52U

2603.24699 2026-03-27 cs.RO

Saranga: MilliWatt Ultrasound for Navigation in Visually Degraded Environments on Palm-Sized Aerial Robots

Manoj Velmurugan, Phillip Brush, Colin Balfour, Richard J. Przybyla, Nitin J. Sanket

DOI: 10.1126/scirobotics.adz9609
Journal ref: Science Robotics Vol 11, Issue 112, eadz9609 (2026)

English abstract

Tiny palm-sized aerial robots possess exceptional agility and cost-effectiveness in navigating confined and cluttered environments. However, their limited payload capacity directly constrains the sensing suite on-board the robot, thereby limiting critical navigational tasks in Global Positioning System (GPS)-denied wild scenes. Common methods for obstacle avoidance use cameras and LIght Detection And Ranging (LIDAR), which become ineffective in visually degraded conditions such as low visibility, dust, fog or darkness. Other sensors, such as RAdio Detection And Ranging (RADAR), have high power consumption, making them unsuitable for tiny aerial robots. Inspired by bats, we propose Saranga, a low-power ultrasound-based perception stack that localizes obstacles using a dual sonar array. We present two key solutions to combat the low Peak Signal-to-Noise Ratio of -4.9 decibels: physical noise reduction and a deep learning based denoising method. Firstly, we present a practical way to block propeller induced ultrasound noise on the weak echoes. The second solution is to train a neural network to utilize the long horizon of ultrasound echoes for finding signal patterns under high amounts of uncorrelated noise where classical methods were insufficient. We generalize to the real world by using a synthetic data generation pipeline and limited real noise data for training. We enable a palm-sized aerial robot to navigate in visually degraded conditions of dense fog, darkness, and snow in a cluttered environment with thin and transparent obstacles using only on-board sensing and computation. We provide extensive real world results to demonstrate the efficacy of our approach.

2603.24696 2026-03-27 cs.CV

LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration

Gokce Inal, Pouyan Navard, Alper Yilmaz

Comments Accepted in AI4Space Workshop CVPR2026. Website: https://osupcvlab.github.io/LLaVA-LE/, Dataset: https://huggingface.co/datasets/pcvlab/lucid

2603.24695 2026-03-27 cs.LG cs.CR cs.CV

Amplified Patch-Level Differential Privacy for Free via Random Cropping

Kaan Durmaz, Jan Schuchardt, Sebastian Schmidt, Stephan Günnemann

Comments Published at TMLR

2603.24690 2026-03-27 cs.CV

UniICL: Systematizing Unified Multimodal In-context Learning through a Capability-Oriented Taxonomy

Yicheng Xu, Jiangning Zhang, Zhucun Xue, Teng Hu, Ran Yi, Xiaobin Hu, Yong Liu, Dacheng Tao

Comments ECCV2026 under review