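arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.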

2511.09675 2026-03-30 cs.CV cs.LG

PriVi: Towards A General-Purpose Video Model For Primate Behavior In The Wild

Felix B. Mueller, Jan F. Meier, Timo Lueddecke, Richard Vogg, Roger L. Freixanet, Valentin Hassler, Tiffany Bosshard, Elif Karakoc, William J. O'Hearn, Sofia M. Pereira, Sandro Sehner, Kaja Wierucka, Judith Burkart, Claudia Fichtel, Julia Fischer, Alexander Gail, Catherine Hobaiter, Julia Ostner, Liran Samuni, Oliver Schülke, Neda Shahidi, Erin G. Wessling, Alexander S. Ecker

Comments 9 pages, 5 figures, CVPR 2026

2511.06494 2026-03-30 cs.LG cs.AI cs.IT math.IT

Route Experts by Sequence, not by Token

Tiansheng Wen, Yifei Wang, Aosong Feng, Long Ma, Xinyang Liu, Yifan Wang, Lixuan Guo, Bo Chen, Stefanie Jegelka, Chenyu You

2511.04235 2026-03-30 cs.AI cs.CE

Shared Spatial Memory Through Predictive Coding

Zhengru Fang, Yu Guo, Yuang Zhang, Haonan An, Wenbo Ding, Yuguang Fang

2511.02531 2026-03-30 cs.LG cs.AI

Causal Graph Neural Networks for Healthcare

Munib Mesinovic, Max Buhlan, Tingting Zhu

2511.00810 2026-03-30 cs.CV cs.AI cs.CL cs.HC cs.LG

GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding

Shijie Zhou, Viet Dac Lai, Hao Tan, Jihyung Kil, Wanrong Zhu, Changyou Chen, Ruiyi Zhang

2511.00255 2026-03-30 cs.CV

BeetleFlow: An Integrative Deep Learning Pipeline for Beetle Image Processing

Fangxun Liu, S M Rayeed, Samuel Stevens, Alyson East, Cheng Hsuan Chiang, Colin Lee, Daniel Yi, Junke Yang, Tejas Naik, Ziyi Wang, Connor Kilrain, Elijah H Buckwalter, Jiacheng Hou, Saul Ibaven Bueno, Shuheng Wang, Xinyue Ma, Yifan Liu, Zhiyuan Tao, Ziheng Zhang, Eric Sokol, Michael Belitz, Sydne Record, Charles V. Stewart, Wei-Lun Chao

Comments 4 pages, NeurIPS 2025 Workshop Imageomics

2510.24133 2026-03-30 cs.CV cs.AI

Compositional Image Synthesis with Inference-Time Scaling

Minsuk Ji, Sanghyeok Lee, Namhyuk Ahn

Comments Project page: https://github.com/gcl-inha/ReFocus

2510.15376 2026-03-30 cs.RO

Towards Automated Chicken Deboning via Learning-based Dynamically-Adaptive 6-DoF Multi-Material Cutting

Zhaodong Yang, Ai-Ping Hu, Harish Ravichandar

Comments Accepted by ICRA 2026

2510.14273 2026-03-30 cs.CV

CLEAR: Causal Learning Framework For Robust Histopathology Tumor Detection Under Out-Of-Distribution Shifts

Kieu-Anh Truong Thi, Huy-Hieu Pham, Duc-Trong Le

2510.13540 2026-03-30 cs.CV

Learning Neural Parametric 3D Breast Shape Models for Metrical Surface Reconstruction From Monocular RGB Videos

Maximilian Weiherer, Antonia von Riedheim, Vanessa Brébant, Bernhard Egger, Christoph Palm

Comments Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2026:005

DOI: 10.59275/j.melba.2026-8b23
Journal ref: Machine Learning for Biomedical Imaging 2026 (2026)

Abstract

We present a neural parametric 3D breast shape model and, based on this model, introduce a low-cost and accessible 3D surface reconstruction pipeline capable of recovering accurate breast geometry from a monocular RGB video. In contrast to widely used, commercially available yet prohibitively expensive 3D breast scanning solutions and existing low-cost alternatives, our method requires neither specialized hardware nor proprietary software and can be used with any device that is able to record RGB videos. The key building blocks of our pipeline are a state-of-the-art, off-the-shelf Structure-from-Motion pipeline, paired with a parametric breast model for robust and metrically correct surface reconstruction. Our model, similarly to the recently proposed implicit Regensburg Breast Shape Model (iRBSM), leverages implicit neural representations to model breast shapes. However, unlike the iRBSM, which employs a single global neural signed distance function (SDF), our approach, inspired by recent state-of-the-art face models, decomposes the implicit breast domain into multiple smaller regions, each represented by a local neural SDF anchored at anatomical landmark positions. When incorporated into our surface reconstruction pipeline, the proposed model, dubbed liRBSM (short for localized iRBSM), significantly outperforms the iRBSM in terms of reconstruction quality, yielding more detailed surface reconstructions than its global counterpart. Overall, we find that the introduced pipeline is able to recover high-quality 3D breast geometry within an error margin of less than 2 mm. Our method is fast (it requires less than six minutes), fully transparent and open-source, and, together with the model, publicly available at https://rbsm.re-mic.de/local-implicit.
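To make the idea of splitting an implicit domain into landmark-anchored local SDFs concrete, here is a minimal Python sketch. It is not the liRBSM implementation: the learned local networks are replaced by analytic sphere SDFs (`sphere_sdf`), and the Gaussian softmax blending of local predictions (`blended_sdf`, `sigma`) is an assumed scheme, since the abstract does not say how the local SDFs are combined.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
LocalSDF = Callable[[Vec3], float]


def sphere_sdf(radius: float) -> LocalSDF:
    """Hypothetical stand-in for a local neural SDF: signed distance to a
    sphere of the given radius centred at the anchor (negative inside)."""
    def sdf(p: Vec3) -> float:
        return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius
    return sdf


def blended_sdf(x: Vec3, anchors: Sequence[Vec3],
                local_sdfs: Sequence[LocalSDF], sigma: float = 0.5) -> float:
    """Blend local SDF predictions with normalised Gaussian weights centred
    on each anchor (an assumed blending scheme, not the paper's)."""
    if not anchors or len(anchors) != len(local_sdfs):
        raise ValueError("need one local SDF per anchor")
    logits: List[float] = []
    values: List[float] = []
    for anchor, sdf in zip(anchors, local_sdfs):
        # Express the query point in the anchor's local coordinate frame.
        local = (x[0] - anchor[0], x[1] - anchor[1], x[2] - anchor[2])
        sq_dist = local[0] ** 2 + local[1] ** 2 + local[2] ** 2
        logits.append(-sq_dist / (2.0 * sigma ** 2))
        values.append(sdf(local))
    # Subtract the max logit before exponentiating so the weights never all underflow.
    m = max(logits)
    weights = [math.exp(lg - m) for lg in logits]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total
```

With a single anchor the blend reduces to that anchor's local SDF; with several anchors each query is dominated by its nearest landmarks, which is one way a set of local models can capture finer detail than a single global function.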

2510.11647 2026-03-30 cs.CV

IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment

Yinan Chen, Jiangning Zhang, Teng Hu, Yuxiang Zeng, Zhucun Xue, Qingdong He, Chengjie Wang, Yong Liu, Xiaobin Hu, Shuicheng Yan

Comments Accepted by ICLR 2026. Equal contributions from first two authors. Project page: https://ryanchenyn.github.io/projects/IVEBench Code: https://github.com/RyanChenYN/IVEBench Dataset: https://huggingface.co/datasets/Coraxor/IVEBench

2510.10163 2026-03-30 cs.CV

SSeg: Active Sparse Point-Label Augmentation for Semantic Segmentation

Cesar Borja, Carlos Plou, Ruben Martinez-Cantin, Ana C. Murillo

2510.09962 2026-03-30 cs.RO

VG-Mapping: Variation-aware Density Control for Online 3D Gaussian Mapping in Semi-static Scenes

Yicheng He, Jingwen Yu, Guangcheng Chen, Hong Zhang

2510.08222 2026-03-30 cs.AI

Selection, Reflection and Self-Refinement: Revisit Reasoning Tasks via a Causal Lens

Yunlong Deng, Boyang Sun, Yan Li, Lingjing Kong, Zeyu Tang, Kun Zhang, Guangyi Chen

2510.07160 2026-03-30 cs.RO

A Narwhal-Inspired Sensing-to-Control Framework for Small Fixed-Wing Aircraft

Fengze Xie, Xiaozhou Fan, Jacob Schuster, Yisong Yue, Morteza Gharib

2510.06692 2026-03-30 cs.LG cs.CR

Is the Hard-Label Cryptanalytic Model Extraction Really Polynomial?

Akira Ito, Takayuki Miura, Yosuke Todo

Comments Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

Abstract

Deep Neural Networks (DNNs) have attracted significant attention, and their internal models are now considered valuable intellectual assets. Extracting such a model via oracle access to a DNN is conceptually similar to extracting a secret key from a block cipher. Consequently, cryptanalytic techniques, particularly differential-like attacks, have been actively explored. ReLU-based DNNs are the most common and widely deployed architectures. While early works (e.g., Crypto 2020, Eurocrypt 2024) assume access to exact output logits, which are typically not exposed, more recent works (e.g., Asiacrypt 2024, Eurocrypt 2025) focus on the hard-label setting, where only the final classification result (e.g., "dog" or "car") is available. Notably, Carlini et al. (Eurocrypt 2025) showed that model extraction is feasible in polynomial time even under this restricted setting. In this paper, we show that a key assumption underlying their attack becomes increasingly unrealistic as the target depth grows. While prior works noted neurons whose activation states rarely change, we analyze their concrete impact on hard-label extraction: even a single neuron that is (almost) always active can prevent the attack from proceeding unless its parameters are recovered, and ignoring it incurs a non-negligible error. A straightforward solution is to extract these parameters by observing a state switch of such a neuron, but observing such a switch becomes exponentially harder as depth increases, implying that hard-label extraction is not always polynomial time. To address this limitation, we propose a novel attack called cross-layer extraction. Rather than extracting secret parameters (e.g., weights and biases) directly, we exploit cross-layer interactions to recover them from deeper layers, reducing query complexity and addressing limitations of existing approaches.
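Hard-label extraction works only with predicted labels, so a generic building block in this setting is locating where the label flips along a segment between two inputs. The sketch that follows is a hedged illustration of that primitive as a plain binary search; it is not the cross-layer extraction attack proposed in the paper, and `find_label_flip` and the toy oracle are hypothetical.

```python
from typing import Callable, Hashable, Sequence, Tuple

Oracle = Callable[[Sequence[float]], Hashable]  # returns only a class label


def find_label_flip(oracle: Oracle, x0: Sequence[float], x1: Sequence[float],
                    tol: float = 1e-9, max_iter: int = 200) -> Tuple[float, int]:
    """Binary-search the segment x0 -> x1 for a point where the hard label
    changes. Returns (t, queries); the flip lies near x0 + t * (x1 - x0)."""
    label0 = oracle(x0)
    if oracle(x1) == label0:
        raise ValueError("endpoints must receive different labels")
    queries = 2
    lo, hi = 0.0, 1.0  # invariant: label at lo == label0, label at hi != label0
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2.0
        point = [a + mid * (b - a) for a, b in zip(x0, x1)]
        queries += 1
        if oracle(point) == label0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0, queries
```

Each search costs roughly log2(1/tol) queries, so a single boundary search is cheap; the abstract's point is that catching a deep, almost-always-active neuron switching state can require a number of such probes that grows exponentially with depth.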

2510.05516 2026-03-30 cs.LG math.OC

NeST-BO: Fast Local Bayesian Optimization via Newton-Step Targeting of Gradient and Hessian Information

Wei-Ting Tang, Akshay Kudva, Joel A. Paulson

2510.04428 2026-03-30 cs.CV

A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering

Yuanhao Zou, Shengji Jin, Andong Deng, Youpeng Zhao, Jun Wang, Chen Chen

Comments ICLR 2026 Paper

2510.03223 2026-03-30 cs.CL cs.AI

Attention-Aligned Reasoning for Large Language Models

Hongxiang Zhang, Yuan Tian, Tianyi Zhang

2510.02898 2026-03-30 cs.CV

One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework

Lorenzo Bianchi, Giacomo Pacini, Fabio Carrara, Nicola Messina, Giuseppe Amato, Fabrizio Falchi

Comments IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026. Project page with code, models and examples: https://paciosoft.com/Patch-ioner/

2510.01448 2026-03-30 cs.CV cs.AI

GeoSURGE: Geo-localization using Semantic Fusion with Hierarchy of Geographic Embeddings

Angel Daruna, Nicholas Meegan, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

Comments Accepted to CVPR 2026 main track

2510.00316 2026-03-30 cs.LG

Large Language Models Can Perform Automatic Modulation Classification via Discretized Self-supervised Candidate Retrieval

Mohammad Rostami, Atik Faysal, Reihaneh Gh. Roshan, Huaxia Wang, Nikhil Muralidhar, Yu-Dong Yao

Abstract

Identifying wireless modulation schemes is essential for cognitive radio, but standard supervised models often degrade under distribution shift, and training domain-specific wireless foundation models from scratch is computationally prohibitive. Large Language Models (LLMs) offer a promising training-free alternative via in-context learning, yet feeding raw floating-point signal statistics into LLMs overwhelms models with numerical noise and exhausts token budgets. We introduce DiSC-AMC, a framework that reformulates Automatic Modulation Classification (AMC) as an LLM reasoning task by combining aggressive feature discretization with nearest-neighbor retrieval over self-supervised embeddings. By mapping continuous features to coarse symbolic tokens, DiSC-AMC aligns abstract signal patterns with LLM reasoning capabilities and reduces prompt length by over 50%. Simultaneously, using a DINOv2 visual encoder to retrieve the k_NN most similar labeled exemplars provides highly relevant, query-specific context rather than generic class averages. On a 10-class benchmark, a fine-tuned 7B-parameter LLM using DiSC-AMC achieves 83.0% in-distribution accuracy (-10 to +10 dB) and 82.50% out-of-distribution (OOD) accuracy (-11 to -15 dB), outperforming supervised baselines. Comprehensive ablations on vanilla LLMs demonstrate the token efficiency of DiSC-AMC: a training-free 7B LLM achieves 71% accuracy using only a 0.5K-token prompt, surpassing a 200B-parameter baseline that relies on a 2.9K-token prompt. Furthermore, similarity-based exemplar retrieval outperforms naive class-average selection by over 20%. Finally, we identify a fundamental limitation of this pipeline: at extreme OOD noise levels (-30 dB), the underlying self-supervised representations collapse, degrading retrieval quality and reducing classification to random chance.
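The discretize-retrieve-prompt idea can be sketched in a few lines of Python. This is a hedged illustration, not DiSC-AMC itself: the bin edges, symbol names, prompt template and the helpers `discretize`, `retrieve` and `build_prompt` are hypothetical, and the embeddings are hand-written vectors standing in for DINOv2 features.

```python
import bisect
import math
from typing import List, Sequence, Tuple

# (embedding, label, discretized feature tokens) for one labeled exemplar
Exemplar = Tuple[Sequence[float], str, Sequence[str]]


def discretize(value: float, edges: Sequence[float], symbols: Sequence[str]) -> str:
    """Map a continuous feature to a coarse symbolic token via sorted bin edges."""
    if len(symbols) != len(edges) + 1:
        raise ValueError("need exactly one more symbol than bin edges")
    return symbols[bisect.bisect_right(edges, value)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def retrieve(query: Sequence[float], bank: Sequence[Exemplar], k: int) -> List[Exemplar]:
    """Return the k exemplars whose embeddings are most cosine-similar to the query."""
    return sorted(bank, key=lambda ex: cosine(query, ex[0]), reverse=True)[:k]


def build_prompt(query_tokens: Sequence[str], exemplars: Sequence[Exemplar]) -> str:
    """Few-shot prompt: one line per retrieved exemplar, then the unanswered query."""
    lines = [f"Features: {' '.join(tokens)} -> Modulation: {label}"
             for _, label, tokens in exemplars]
    lines.append(f"Features: {' '.join(query_tokens)} -> Modulation:")
    return "\n".join(lines)
```

Because only a handful of short symbolic lines reach the model, the prompt stays small; the per-query retrieval is what replaces fixed class-average exemplars.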

2509.24779 2026-03-30 cs.LG q-bio.BM

MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models

Kacper Kapuśniak, Cristian Gabellini, Michael Bronstein, Prudencio Tossou, Francesco Di Giovanni

2509.24207 2026-03-30 cs.AI

Humanline: Online Alignment as Perceptual Loss

Sijia Liu, Niklas Muennighoff, Kawin Ethayarajh

2509.22225 2026-03-30 cs.CV cs.AI

ExtrinSplat: Decoupling Geometry and Semantics for Open-Vocabulary Understanding in 3D Gaussian Splatting

Jiayu Ding, Xinpeng Liu, Zhiyi Pan, Shiqiang Long, Ge Li

Comments Accepted to CVPR 2026

2509.01752 2026-03-30 cs.CV physics.med-ph

Clinical Metadata Guided Limited-Angle CT Image Reconstruction

Yu Shi, Shuyi Fan, Changsheng Fang, Shuo Han, Haodong Li, Li Zhou, Bahareh Morovati, Dayang Wang, Hengyong Yu

Comments IEEE Transactions on Medical Imaging, 2026

2509.00841 2026-03-30 cs.CL

Neural Models and Language Model Prompting for the Multidimensional Evaluation of Open-Ended Conversations

Michelle Elizabeth, Alicja Kasicka, Natalia Krawczyk, Magalie Ochs, Gwénolé Lecorvé, Justyna Gromada, Lina M. Rojas-Barahona

Comments This work was granted access to the HPC resources of IDRIS under the allocation AD011015150R1 made by GENCI

2508.14765 2026-03-30 cs.LG cs.AI

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang

Comments 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: AI for Science (Spotlight)

2508.07841 2026-03-30 cs.LG cs.SY eess.SY

Learning Robust Satellite Attitude Dynamics with Physics-Informed Normalising Flow

Carlo Cena, Mauro Martini, Marcello Chiaberge

DOI: 10.1016/j.actaastro.2026.03.047
Journal ref: Acta Astronautica 2026

Abstract

Attitude control is a fundamental aspect of spacecraft operations. Model Predictive Control (MPC) has emerged as a powerful strategy for these tasks, relying on accurate models of the system dynamics to optimize control actions over a prediction horizon. In scenarios where physics models are incomplete, difficult to derive, or computationally expensive, machine learning offers a flexible alternative by learning the system behavior directly from data. However, purely data-driven models often struggle with generalization and stability, especially when applied to inputs outside their training domain. To address these limitations, we investigate the benefits of incorporating Physics-Informed Neural Networks (PINNs) into the learning of spacecraft attitude dynamics, comparing their performance with that of purely data-driven approaches. Using a Real-valued Non-Volume Preserving (Real NVP) neural network architecture with a self-attention mechanism, we trained several models on simulated data generated with the Basilisk simulator. Two training strategies were considered: a purely data-driven baseline and a physics-informed variant to improve robustness and stability. Our results demonstrate that the inclusion of physics-based information significantly enhances performance, reducing the mean relative error of the best architectures found by 27.08%. These advantages are particularly evident when the learned models are integrated into an MPC framework, where PINN-based models consistently outperform their purely data-driven counterparts in terms of control accuracy and robustness, and achieve improved settling times compared to traditional MPC approaches, with improvements of up to 62% when subject to observation noise and reaction wheel (RW) friction.
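For readers unfamiliar with Real NVP, the following minimal Python sketch shows one affine coupling layer plus a physics-penalised loss. It is a hedged illustration under stated assumptions: the scale and shift networks are replaced by hand-written functions (no self-attention), and the physics term uses Euler's rigid-body equation with a diagonal inertia tensor, which is an assumption since the abstract does not specify the physics prior. `AffineCoupling`, `euler_residual` and `physics_informed_loss` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Net = Callable[[Vector], Vector]  # stand-in for a neural network


class AffineCoupling:
    """One Real NVP-style affine coupling layer: the first half of the input
    passes through unchanged and conditions a scale/shift of the second half."""

    def __init__(self, scale_net: Net, shift_net: Net) -> None:
        self.scale_net = scale_net
        self.shift_net = shift_net

    def _split(self, v: Sequence[float]) -> Tuple[Vector, Vector]:
        if len(v) % 2:
            raise ValueError("input length must be even")
        h = len(v) // 2
        return list(v[:h]), list(v[h:])

    def forward(self, x: Sequence[float]) -> Tuple[Vector, float]:
        x1, x2 = self._split(x)
        s, t = self.scale_net(x1), self.shift_net(x1)
        y2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
        # Triangular Jacobian: log|det J| is simply the sum of the log-scales.
        return x1 + y2, sum(s)

    def inverse(self, y: Sequence[float]) -> Vector:
        y1, y2 = self._split(y)
        s, t = self.scale_net(y1), self.shift_net(y1)
        x2 = [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y2, s, t)]
        return y1 + x2


def cross(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def euler_residual(omega: Sequence[float], omega_dot: Sequence[float],
                   torque: Sequence[float], inertia_diag: Sequence[float]) -> Vector:
    """Residual of I*w_dot + w x (I*w) - tau for a diagonal inertia tensor."""
    i_omega = [i * w for i, w in zip(inertia_diag, omega)]
    gyro = cross(omega, i_omega)
    return [i * wd + g - tau
            for i, wd, g, tau in zip(inertia_diag, omega_dot, gyro, torque)]


def physics_informed_loss(pred: Sequence[float], target: Sequence[float],
                          residual: Sequence[float], weight: float = 0.1) -> float:
    """Data MSE plus a weighted mean squared physics residual (hypothetical weighting)."""
    data = sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)
    physics = sum(r * r for r in residual) / len(residual)
    return data + weight * physics
```

The coupling layer is exactly invertible with a cheap log-determinant, which is why Real NVP is attractive for learned dynamics; the residual term penalises predicted angular accelerations that violate the assumed rigid-body equation.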

2508.07819 2026-03-30 cs.CV cs.AI cs.LG

ACD-CLIP: Decoupling Representation and Dynamic Fusion for Zero-Shot Anomaly Detection

Ke Ma, Jun Long, Hongxiao Fei, Liujie Hua, Zhen Dai, Yueyi Luo

Comments 4 pages, 1 reference, 3 figures