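arXivDaily: a daily academic digest that syncs the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.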

2603.09032 2026-03-11 cs.LG cs.AR cs.CE cs.DC

Two Teachers Better Than One: Hardware-Physics Co-Guided Distributed Scientific Machine Learning

Yuchen Yuan, Junhuan Yang, Hao Wan, Yipei Liu, Hanhan Wu, Youzuo Lin, Lei Yang

Comments 7 pages, 9 figures. Accepted at the 63rd ACM/IEEE Design Automation Conference (DAC 2026), Long Beach, CA, July 2026

2603.09031 2026-03-11 cs.RO

ImpedanceDiffusion: Diffusion-Based Global Path Planning for UAV Swarm Navigation with Generative Impedance Control

Faryal Batool, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Roohan Ahmed Khan, Aleksey Fedoseev, Dzmitry Tsetserukou

Comments This paper is under review

2603.09018 2026-03-11 cs.AI

Meissa: Multi-modal Medical Agentic Intelligence

Yixiong Chen, Xinyi Bai, Yue Pan, Zongwei Zhou, Alan Yuille

English abstract

Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However, these systems rely almost entirely on frontier models (e.g., GPT), whose API-based deployment incurs high cost, high latency, and privacy risks that conflict with on-premise clinical requirements. We present Meissa, a lightweight 4B-parameter medical MM-LLM that brings agentic capability offline. Instead of imitating static answers, Meissa learns both when to engage external interaction (strategy selection) and how to execute multi-step interaction (strategy execution) by distilling structured trajectories from frontier models. Specifically, we propose: (1) Unified trajectory modeling: trajectories (reasoning and action traces) are represented within a single state-action-observation formalism, allowing one model to generalize across heterogeneous medical environments. (2) Three-tier stratified supervision: the model's own errors trigger progressive escalation from direct reasoning to tool-augmented and multi-agent interaction, explicitly learning difficulty-aware strategy selection. (3) Prospective-retrospective supervision: pairing exploratory forward traces with hindsight-rationalized execution traces enables stable learning of effective interaction policies. Trained on 40K curated trajectories, Meissa matches or exceeds proprietary frontier agents in 10 of 16 evaluation settings across 13 medical benchmarks spanning radiology, pathology, and clinical reasoning. Using over 25x fewer parameters than typical frontier models like Gemini-3, Meissa operates fully offline with 22x lower end-to-end latency compared to API-based deployment. Data, models, and environments are released at https://github.com/Schuture/Meissa.
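The escalation idea in the abstract above (direct reasoning first, then tool-augmented, then multi-agent interaction, with every trace stored as state-action-observation steps) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Step`, `Trajectory`, `escalate`, `make_solver`, and the tier names are hypothetical and do not come from the Meissa codebase, and the solvers here are toy stand-ins rather than real models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One state-action-observation triple (hypothetical schema)."""
    state: str        # what is known before acting
    action: str       # e.g. "answer", "call_tool", "ask_agent"
    observation: str  # what came back after the action


@dataclass
class Trajectory:
    tier: str
    steps: List[Step] = field(default_factory=list)


# Assumed tier order, from cheapest to most expensive interaction.
TIERS = ["direct", "tool", "multi_agent"]


def escalate(
    question: str,
    solvers: Dict[str, Callable[[str], Trajectory]],
    is_correct: Callable[[str], bool],
) -> Trajectory:
    """Try each tier in order; move up only when the current tier is wrong."""
    trajectory = Trajectory(tier="none")
    for tier in TIERS:
        trajectory = solvers[tier](question)
        if is_correct(trajectory.steps[-1].observation):
            return trajectory
    return trajectory  # last tier's attempt, even if still wrong


def make_solver(tier: str, answer: str) -> Callable[[str], Trajectory]:
    """Toy solver that always returns a one-step trajectory with `answer`."""
    def solve(question: str) -> Trajectory:
        return Trajectory(tier, [Step(question, "answer", answer)])
    return solve


solvers = {
    "direct": make_solver("direct", "wrong"),
    "tool": make_solver("tool", "right"),
    "multi_agent": make_solver("multi_agent", "right"),
}
result = escalate("toy question", solvers, lambda ans: ans == "right")
print(result.tier)  # tool
```

In this toy run the direct tier fails, so the example is labeled with the tool tier; the tier at which a question is first solved is the kind of difficulty signal the abstract describes.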

2603.09016 2026-03-11 cs.LG cs.CV cs.NE

An accurate flatness measure to estimate the generalization performance of CNN models

Rahman Taleghani, Maryam Mohammadi, Francesco Marchetti

2603.09014 2026-03-11 cs.LG cs.CV

The Coupling Within: Flow Matching via Distilled Normalizing Flows

David Berthelot, Tianrong Chen, Jiatao Gu, Marco Cuturi, Laurent Dinh, Bhavik Chandna, Michal Klein, Josh Susskind, Shuangfei Zhai

Comments Submitted to ICML 2026

2603.09011 2026-03-11 cs.RO cs.AI cs.HC

Improving through Interaction: Searching Behavioral Representation Spaces with CMA-ES-IG

Nathaniel Dennler, Zhonghao Shi, Yiran Tao, Andreea Bobu, Stefanos Nikolaidis, Maja Matarić

Comments Under submission to IJRR

2603.08998 2026-03-11 cs.CV

Diffusion-Based Authentication of Copy Detection Patterns: A Multimodal Framework with Printer Signature Conditioning

Bolutife Atoki, Iuliia Tkachenko, Bertrand Kerautret, Carlos Crispim-Junior

Comments Accepted at WACV 2026

2603.08997 2026-03-11 cs.CV

SkipGS: Post-Densification Backward Skipping for Efficient 3DGS Training

Jingxing Li, Yongjae Lee, Deliang Fan

2603.08989 2026-03-11 cs.CL

Automated Thematic Analysis for Clinical Qualitative Data: Iterative Codebook Refinement with Full Provenance

Seungjun Yi, Joakim Nguyen, Huimin Xu, Terence Lim, Joseph Skrovan, Mehak Beri, Hitakshi Modi, Andrew Well, Carlos M. Mery, Yan Zhang, Mia K. Markey, Ying Ding

Comments Submitted to AMIA 2026 Annual Symposium (American Medical Informatics Association)

2603.08988 2026-03-11 cs.RO

Characterization, Analytical Planning, and Hybrid Force Control for the Inspire RH56DFX Hand

Xuan Tan, William Xie, Nikolaus Correll

2603.08987 2026-03-11 cs.LG

MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment

Kailong Fan, Anqi Pu, Yichen Wu, Wanhua Li, Yicong Li, Hanspeter Pfister, Huafeng Liu, Xiang Li, Quanzheng Li, Ning Guo

2603.08983 2026-03-11 cs.RO cs.CV

SurgCalib: Gaussian Splatting-Based Hand-Eye Calibration for Robot-Assisted Minimally Invasive Surgery

Zijian Wu, Shuojue Yang, Yu Chung Lee, Eitan Prisman, Yueming Jin, Septimiu E. Salcudean

Comments 9 pages, 7 figures

2603.08982 2026-03-11 cs.CV

SVG-EAR: Parameter-Free Linear Compensation for Sparse Video Generation via Error-aware Routing

Xuanyi Zhou, Qiuyang Mang, Shuo Yang, Haocheng Xi, Jintao Zhang, Huanzhi Mao, Joseph E. Gonzalez, Kurt Keutzer, Ion Stoica, Alvin Cheung

2603.08972 2026-03-11 cs.LG

MAcPNN: Mutual Assisted Learning on Data Streams with Temporal Dependence

Federico Giannini, Emanuele Della Valle

2603.08967 2026-03-11 cs.CV eess.AS

Can You Hear, Localize, and Segment Continually? An Exemplar-Free Continual Learning Benchmark for Audio-Visual Segmentation

Siddeshwar Raghavan, Gautham Vinod, Bruce Coburn, Fengqing Zhu

2603.08961 2026-03-11 cs.RO

FAME: Force-Adaptive RL for Expanding the Manipulation Envelope of a Full-Scale Humanoid

Niraj Pudasaini, Yutong Zhang, Jensen Lavering, Alessandro Roncone, Nikolaus Correll

2603.08960 2026-03-11 cs.LG cs.AR cs.DC cs.PF

The $qs$ Inequality: Quantifying the Double Penalty of Mixture-of-Experts at Inference

Vignesh Adhinarayanan, Nuwan Jayasena

Comments 10 pages, 6 tables

2603.08958 2026-03-11 cs.RO cs.SY eess.SY

Formation-Aware Adaptive Conformalized Perception for Safe Leader-Follower Multi-Robot Systems

Richie R. Suganda, Bin Hu

Comments 8 pages, 8 figures

2603.08954 2026-03-11 cs.AI cs.CL cs.DC cs.IR cs.LG

A Consensus-Driven Multi-LLM Pipeline for Missing-Person Investigations

Joshua Castillo, Ravi Mukkamala

Comments Accepted to CAC: Applied Computing & Automation Conferences 2026. 16 pages, 6 figures

2603.08936 2026-03-11 cs.SD cs.AI cs.CL cs.MM eess.AS

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs

Hezhao Zhang, Huang-Cheng Chou, Shrikanth Narayanan, Thomas Hain

Comments submitted to Interspeech 2026

2603.08933 2026-03-11 cs.AI cs.IR cs.LG

Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance

Joshua Castillo, Ravi Mukkamala

Comments 14 pages, 7 figures. Accepted at ICEIS 2026 (International Conference on Enterprise Information Systems)

2603.08930 2026-03-11 cs.CV cs.AI

Using Vision Language Foundation Models to Generate Plant Simulation Configurations via In-Context Learning

Heesup Yun, Isaac Kazuo Uyehara, Earl Ranario, Lars Lundqvist, Christine H. Diepenbrock, Brian N. Bailey, J. Mason Earles

2603.08928 2026-03-11 cs.CV

TIDE: Text-Informed Dynamic Extrapolation with Step-Aware Temperature Control for Diffusion Transformers

Yihua Liu, Fanjiang Ye, Bowen Lin, Rongyu Fang, Chengming Zhang

2603.08927 2026-03-11 cs.CV cs.MM

MEGC2026: Micro-Expression Grand Challenge on Visual Question Answering

Xinqi Fan, Jingting Li, John See, Moi Hoon Yap, Su-Jing Wang, Adrian K. Davison

Comments MEGC 2026 at IEEE FG 2026

2603.08926 2026-03-11 cs.RO

Fly, Track, Land: Infrastructure-less Magnetic Localization for Heterogeneous UAV-UGV Teaming

Valerio Brunacci, Davide Plozza, Alessio De Angelis, Michele Magno, Tommaso Polonelli

Comments Submitted to IEEE Transactions on Robotics (T-RO). Supplementary video available

2603.08921 2026-03-11 cs.CV cs.LG

Vision-Language Models Encode Clinical Guidelines for Concept-Based Medical Reasoning

Mohamed Harmanani, Bining Long, Zhuoxin Guo, Paul F. R. Wilson, Amirhossein Sabour, Minh Nguyen Nhat To, Gabor Fichtinger, Purang Abolmaesumi, Parvin Mousavi

Comments CVPR 2026 Findings

2603.08914 2026-03-11 cs.LG cs.AI

Uncovering a Winning Lottery Ticket with Continuously Relaxed Bernoulli Gates

Itamar Tsayag, Ofir Lindenbaum

2603.08913 2026-03-11 cs.LG cs.CR q-bio.GN

Quantifying Memorization and Privacy Risks in Genomic Language Models

Alexander Nemecek, Wenbiao Li, Xiaoqian Jiang, Jaideep Vaidya, Erman Ayday

Comments 13 pages

English abstract

Genomic language models (GLMs) have emerged as powerful tools for learning representations of DNA sequences, enabling advances in variant prediction, regulatory element identification, and cross-task transfer learning. However, as these models are increasingly trained or fine-tuned on sensitive genomic cohorts, they risk memorizing specific sequences from their training data, raising serious concerns around privacy, data leakage, and regulatory compliance. Despite growing awareness of memorization risks in general-purpose language models, little systematic evaluation exists for these risks in the genomic domain, where data exhibit unique properties such as a fixed nucleotide alphabet, strong biological structure, and individual identifiability. We present a comprehensive, multi-vector privacy evaluation framework designed to quantify memorization risks in GLMs. Our approach integrates three complementary risk assessment methodologies: perplexity-based detection, canary sequence extraction, and membership inference. These are combined into a unified evaluation pipeline that produces a worst-case memorization risk score. To enable controlled evaluation, we plant canary sequences at varying repetition rates into both synthetic and real genomic datasets, allowing precise quantification of how repetition and training dynamics influence memorization. We evaluate our framework across multiple GLM architectures, examining the relationship between sequence repetition, model capacity, and memorization risk. Our results establish that GLMs exhibit measurable memorization and that the degree of memorization varies across architectures and training regimes. These findings reveal that no single attack vector captures the full scope of memorization risk, underscoring the need for multi-vector privacy auditing as a standard practice for genomic AI systems.
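A minimal sketch of two ideas from the abstract above: planting canary sequences at a chosen repetition rate, and collapsing several per-vector scores into a worst-case risk score. The abstract does not say how the three scores are combined, so taking the maximum of scores normalized to [0, 1] is an assumption made here for illustration. All function names and the example score values are hypothetical.

```python
import random
from typing import Dict, List, Tuple

NUCLEOTIDES = "ACGT"  # fixed nucleotide alphabet


def make_canary(length: int, rng: random.Random) -> str:
    """Draw a random nucleotide string to use as a canary."""
    return "".join(rng.choice(NUCLEOTIDES) for _ in range(length))


def plant_canaries(
    dataset: List[str],
    canaries_with_repeats: List[Tuple[str, int]],
    rng: random.Random,
) -> List[str]:
    """Return a copy of `dataset` with each canary inserted `repeats` times."""
    planted = list(dataset)
    for canary, repeats in canaries_with_repeats:
        for _ in range(repeats):
            # Insert at a uniformly random position, including the end.
            planted.insert(rng.randrange(len(planted) + 1), canary)
    return planted


def worst_case_risk(scores: Dict[str, float]) -> float:
    """Assumed aggregation: the maximum over per-vector scores in [0, 1]."""
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must lie in [0, 1]")
    return max(scores.values())


rng = random.Random(0)
dataset = [make_canary(20, rng) for _ in range(10)]
canary = make_canary(20, rng)
planted = plant_canaries(dataset, [(canary, 5)], rng)

# Made-up scores for the three attack vectors named in the abstract.
risk = worst_case_risk({
    "perplexity": 0.2,
    "canary_extraction": 0.7,
    "membership_inference": 0.4,
})
print(len(planted), risk)  # 15 0.7
```

Taking the maximum reflects the abstract's point that no single attack vector captures the full risk: a model that looks safe under two vectors is still flagged if the third one succeeds.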

2603.08910 2026-03-11 cs.CL

SciTaRC: Benchmarking QA on Scientific Tabular Data that Requires Language Reasoning and Complex Computation

Hexuan Wang, Yaxuan Ren, Srikar Bommireddypalli, Shuxian Chen, Adarsh Prabhudesai, Rongkun Zhou, Elina Baral, Philipp Koehn

Comments 18 pages, 11 figures, 7 tables

2603.08907 2026-03-11 cs.LG cs.AI stat.ML

Cross-Domain Uncertainty Quantification for Selective Prediction: A Comprehensive Bound Ablation with Transfer-Informed Betting

Abhinaba Basu

English abstract

We present a comprehensive ablation of nine finite-sample bound families for selective prediction with risk control, combining concentration inequalities (Hoeffding, Empirical Bernstein, Clopper-Pearson, Wasserstein DRO, CVaR) with multiple-testing corrections (union bound, Learn Then Test fixed-sequence) and betting-based confidence sequences (WSR). Our main theoretical contribution is Transfer-Informed Betting (TIB), which warm-starts the WSR wealth process using a source domain's risk profile, achieving tighter bounds in data-scarce settings with a formal dominance guarantee. We prove that the TIB wealth process remains a valid supermartingale under all source-target divergences, that TIB dominates standard WSR when domains match, and that no data-independent warm-start can achieve better convergence. The combination of betting-based confidence sequences, LTT monotone testing, and cross-domain transfer is, to our knowledge, a three-way novelty not present in the literature. We evaluate all nine bound families on four benchmarks, MASSIVE (n=1,102), NyayaBench (n=280), CLINC-150 (n=22.5K), and Banking77 (n=13K), across 18 (alpha, delta) configurations. On MASSIVE at alpha=0.10, LTT eliminates the ln(K) union-bound penalty, achieving 94.0% guaranteed coverage versus 73.8% for Hoeffding, a 27% relative improvement. On NyayaBench, where the small calibration set makes Hoeffding-family bounds infeasible below alpha=0.20, Transfer-Informed Betting achieves 18.5% coverage at alpha=0.10, a 5.4x improvement over LTT + Hoeffding. We additionally compare with split-conformal prediction, showing that conformal methods produce prediction sets (avg. 1.67 classes) whereas selective prediction provides single-prediction risk guarantees. We apply these methods to agentic caching systems, formalizing a progressive trust model where the guarantee determines when cached responses can be served autonomously.
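The ln(K) union-bound penalty and the quoted 27% figure can be checked with a few lines of arithmetic. The sketch below uses the textbook one-sided Hoeffding bound for losses in [0, 1], with a union bound over K candidate thresholds replacing delta by delta/K; this is the standard form and not necessarily the exact variant used in the paper. The function names and the sample inputs (n=1000, K=10, empirical risk 0.05) are illustrative assumptions.

```python
import math


def hoeffding_upper(emp_risk: float, n: int, delta: float, k: int = 1) -> float:
    """One-sided Hoeffding upper bound on the true risk.

    With probability at least 1 - delta, simultaneously for k candidates:
        risk <= emp_risk + sqrt(ln(k / delta) / (2 n))
    Setting k = 1 recovers the single-hypothesis bound.
    """
    return emp_risk + math.sqrt(math.log(k / delta) / (2 * n))


def relative_improvement(new: float, old: float) -> float:
    """Fractional gain of `new` over `old`."""
    return (new - old) / old


# Illustrative inputs only: the union bound over 10 thresholds is looser.
single = hoeffding_upper(0.05, n=1000, delta=0.05)
union = hoeffding_upper(0.05, n=1000, delta=0.05, k=10)
print(union > single)  # True

# Coverage figures quoted in the abstract: 94.0% (LTT) vs 73.8% (Hoeffding).
gain = relative_improvement(94.0, 73.8)
print(round(gain * 100, 1))  # 27.4
```

The second print confirms that 94.0% versus 73.8% is a relative improvement of about 27%, consistent with the abstract.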
