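arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.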

2603.24953 2026-03-27 cs.CV

Select, Hypothesize and Verify: Towards Verified Neuron Concept Interpretation

ZeBin Ji, Yang Hu, Xiuli Bi, Bo Liu, Bin Xiao

Comments Accepted in CVPR 2026

Abstract

It is essential for understanding neural network decisions to interpret the functionality (also known as concepts) of neurons. Existing approaches describe neuron concepts by generating natural language descriptions, thereby advancing the understanding of the neural network's decision-making mechanism. However, these approaches assume that each neuron has well-defined functions and provides discriminative features for neural network decision-making. In fact, some neurons may be redundant or may offer misleading concepts. Thus, the descriptions for such neurons may cause misinterpretations of the factors driving the neural network's decisions. To address the issue, we introduce a verification of neuron functions, which checks whether the generated concept highly activates the corresponding neuron. Furthermore, we propose a Select-Hypothesize-Verify framework for interpreting neuron functionality. This framework consists of: 1) selecting activation samples that best capture a neuron's well-defined functional behavior through activation-distribution analysis; 2) forming hypotheses about concepts for the selected neurons; and 3) verifying whether the generated concepts accurately reflect the functionality of the neuron. Extensive experiments show that our method produces more accurate neuron concepts. Our generated concepts activate the corresponding neurons with a probability approximately 1.5 times that of the current state-of-the-art method.
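The verification step described in this abstract (checking whether a generated concept highly activates its neuron) can be illustrated with a minimal sketch. The abstract does not state the exact criterion, so the `threshold` and `min_rate` parameters and both function names below are hypothetical and are not the authors' implementation.

```python
from typing import Sequence


def activation_rate(activations: Sequence[float], threshold: float) -> float:
    """Fraction of concept samples whose neuron activation reaches the threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    hits = sum(1 for a in activations if a >= threshold)
    return hits / len(activations)


def verify_concept(
    activations: Sequence[float], threshold: float, min_rate: float = 0.5
) -> bool:
    """Accept a hypothesized concept only if enough of its samples activate the neuron.

    `threshold` and `min_rate` are illustrative assumptions, not values from the paper.
    """
    return activation_rate(activations, threshold) >= min_rate


# Hypothetical activations of one neuron on four images depicting the concept.
sample_activations = [0.9, 0.8, 0.1, 0.7]
print(activation_rate(sample_activations, threshold=0.6))  # 0.75
print(verify_concept(sample_activations, threshold=0.6))   # True
```

Under this reading, a concept that rarely activates its neuron would be rejected rather than reported as a description of that neuron.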

2603.24947 2026-03-27 cs.AI econ.GN q-fin.EC

Shopping with a Platform AI Assistant: Who Adopts, When in the Journey, and What For

Se Yan, Han Zhong, Zemin Zhong, Wenyu Zhou

2603.24943 2026-03-27 cs.AI cs.CL

FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang, Junhui Li, Xianyin Zhang, Lifan Guo, Feng Chen, Yong Liu, Chi Zhang

Comments Accepted by ICASSP 2026

2603.24941 2026-03-27 cs.CV cs.CL

Beyond Attention Magnitude: Leveraging Inter-layer Rank Consistency for Efficient Vision-Language-Action Models

Peiju Liu, Jinming Liu, Xipeng Qiu, Xuanjing Huang

Comments 10 pages, 7 figures, preprint

2603.24938 2026-03-27 cs.CV

Infinite Gaze Generation for Videos with Autoregressive Diffusion

Jenna Kang, Colin Groth, Tong Wu, Finley Torrens, Patsorn Sangkloy, Gordon Wetzstein, Qi Sun

2603.24934 2026-03-27 cs.LG cs.AI cs.CV

CVA: Context-aware Video-text Alignment for Video Temporal Grounding

Sungho Moon, Seunghun Lee, Jiwan Seo, Sunghoon Im

Comments Accepted to CVPR 2026

2603.24933 2026-03-27 cs.AI cs.CE

Decoding Market Emotions in Cryptocurrency Tweets via Predictive Statement Classification with Machine Learning and Transformers

Moein Shahiki Tash, Zahra Ahani, Mohim Tash, Mostafa Keikhay Farzaneh, Ari Y. Barrera-Animas, Olga Kolesnikova

2603.24931 2026-03-27 cs.RO

COIN: Collaborative Interaction-Aware Multi-Agent Reinforcement Learning for Self-Driving Systems

Yifeng Zhang, Jieming Chen, Tingguang Zhou, Tanishq Duhan, Jianghong Dong, Yuhong Cao, Guillaume Sartoretti

2603.24930 2026-03-27 cs.RO

CROSS: A Mixture-of-Experts Reinforcement Learning Framework for Generalizable Large-Scale Traffic Signal Control

Xibei Chen, Yifeng Zhang, Yuxiang Xiao, Mingfeng Fan, Maonan Wang, Guillaume Sartoretti

2603.24929 2026-03-27 cs.AI cs.CL cs.IT math.IT

LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics

Farhan Ahmed, Yuya Jeremy Ong, Chad DeLuca

2603.24917 2026-03-27 cs.CL cs.LG

Estimating near-verbatim extraction risk in language models with decoding-constrained beam search

A. Feder Cooper, Mark A. Lemley, Christopher De Sa, Lea Duesterwald, Allison Casasola, Jamie Hayes, Katherine Lee, Daniel E. Ho, Percy Liang

2603.24916 2026-03-27 cs.LG stat.ML

Once-for-All Channel Mixers (HYPERTINYPW): Generative Compression for TinyML

Yassien Shaalan

Comments 12 pages, 5 figures. Accepted at MLSys 2026. TinyML / on-device learning paper on hypernetwork-based compression for ECG and other 1D biosignals, with integer-only inference on commodity MCUs. Evaluated on Apnea-ECG, PTB-XL, and MIT-BIH. Camera-ready version with additional datasets, experiments, and insights will appear after May 2026

Journal ref: MLSys 2026

Abstract

Deploying neural networks on microcontrollers is constrained by kilobytes of flash and SRAM, where 1x1 pointwise (PW) mixers often dominate memory even after INT8 quantization across vision, audio, and wearable sensing. We present HYPER-TINYPW, a compression-as-generation approach that replaces most stored PW weights with generated weights: a shared micro-MLP synthesizes PW kernels once at load time from tiny per-layer codes, caches them, and executes them with standard integer operators. This preserves commodity MCU runtimes and adds only a one-off synthesis cost; steady-state latency and energy match INT8 separable CNN baselines. Enforcing a shared latent basis across layers removes cross-layer redundancy, while keeping PW1 in INT8 stabilizes early, morphology-sensitive mixing. We contribute (i) TinyML-faithful packed-byte accounting covering generator, heads/factorization, codes, kept PW1, and backbone; (ii) a unified evaluation with validation-tuned t* and bootstrap confidence intervals; and (iii) a deployability analysis covering integer-only inference and boot versus lazy synthesis. On three ECG benchmarks (Apnea-ECG, PTB-XL, MIT-BIH), HYPER-TINYPW shifts the macro-F1 versus flash Pareto frontier: at about 225 kB it matches a roughly 1.4 MB CNN while being 6.31x smaller (84.15% fewer bytes), retaining at least 95% of large-model macro-F1. Under 32-64 kB budgets it sustains balanced detection where compact baselines degrade. The mechanism applies broadly to other 1D biosignals, on-device speech, and embedded sensing tasks where per-layer redundancy dominates, indicating a wider role for compression-as-generation in resource-constrained ML systems. Beyond ECG, HYPER-TINYPW transfers to TinyML audio: on Speech Commands it reaches 96.2% test accuracy (98.2% best validation), supporting broader applicability to embedded sensing workloads where repeated linear mixers dominate memory.
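The two size figures quoted in this abstract (6.31x smaller and 84.15% fewer bytes) are the same quantity expressed two ways, which a short arithmetic check makes explicit. The function name is illustrative, and the 1400 kB value is only the abstract's "roughly 1.4 MB" rounded for the sake of the example.

```python
def fraction_saved(compression_ratio: float) -> float:
    """Fraction of bytes removed when a model shrinks by a factor of `compression_ratio`."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / compression_ratio


saved = fraction_saved(6.31)
print(f"{saved:.2%}")  # 84.15%

# Approximate compressed footprint, assuming a baseline of about 1400 kB.
baseline_kb = 1400.0
compressed_kb = baseline_kb / 6.31
print(f"{compressed_kb:.0f} kB")  # 222 kB
```

The result of about 222 kB is consistent with the "about 225 kB" operating point reported in the abstract, given that the baseline size is itself approximate.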

2603.24912 2026-03-27 cs.CV

ICTPolarReal: A Polarized Reflection and Material Dataset of Real World Objects

Jing Yang, Krithika Dharanikota, Emily Jia, Haiwei Chen, Yajie Zhao

Comments CVPR 2026

2603.24908 2026-03-27 cs.RO cs.AI cs.MA

Integrated Multi-Drone Task Allocation, Sequencing, and Optimal Trajectory Generation in Obstacle-Rich 3D Environments

Yunes Alqudsi, Murat Makaraci

Comments Resubmission following accepted appeal (MOD-78958). Resubmitting to cs.RO with cross-lists cs.MA and cs.AI as advised by arXiv Support

2603.24904 2026-03-27 cs.AI cs.CR

On the Foundations of Trustworthy Artificial Intelligence

TJ Dunham

Comments 26 pages, 10 tables, 1 figure, 17 theorems/definitions/corollaries

2603.24897 2026-03-27 cs.CV

SurgPhase: Time efficient pituitary tumor surgery phase recognition via an interactive web platform

Yan Meng, Jack Cook, X. Y. Han, Kaan Duman, Shauna Otto, Dhiraj Pangal, Jonathan Chainey, Ruth Lau, Margaux Masson-Forsythe, Daniel A. Donoho, Danielle Levy, Gabriel Zada, Sébastien Froelich, Juan Fernandez-Miranda, Mike Chang

2603.24896 2026-03-27 cs.CL cs.AI

LogSigma at SemEval-2026 Task 3: Uncertainty-Weighted Multitask Learning for Dimensional Aspect-Based Sentiment Analysis

Baraa Hikal, Jonas Becker, Bela Gipp

2603.24883 2026-03-27 cs.LG

Learning to Staff: Offline Reinforcement Learning and Fine-Tuned LLMs for Warehouse Staffing Optimization

Kalle Kujanpää, Yuying Zhu, Kristina Klinkner, Shervin Malmasi

Comments ICLR 2026 Workshop on AI for Mechanism Design and Strategic Decision Making

2603.24876 2026-03-27 cs.CV

OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework for Cross-Domain Remote Sensing Visual Grounding

Xiaoyu Tang, Jun Dong, Jintao Cheng, Rui Fan

2603.24866 2026-03-27 cs.AI cs.CL cs.CV

How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning

Luyu Yang, Yutong Dai, An Yan, Viraj Prabhu, Ran Xu, Zeyuan Chen

2603.24856 2026-03-27 cs.AI cs.CY cs.ET cs.MA

SentinelAI: A Multi-Agent Framework for Structuring and Linking NG9-1-1 Emergency Incident Data

Kliment Ho, Ilya Zaslavsky

Comments 10 pages, 5 figures

2603.24850 2026-03-27 cs.CV cs.LG cs.RO

Towards automatic smoke detector inspection: Recognition of the smoke detectors in industrial facilities and preparation for future drone integration

Lukas Kratochvila, Jakub Stefansky, Simon Bilik, Robert Rous, Tomas Zemcik, Michal Wolny, Frantisek Rusnak, Ondrej Cech, Karel Horak

2603.24847 2026-03-27 cs.CV

CORA: A Pathology Synthesis Driven Foundation Model for Coronary CT Angiography Analysis and MACE Risk Assessment

Jinkui Hao, Gorkem Durak, Halil Ertugrul Aktas, Ulas Bagci, Bradley D. Allen, Nilay S. Shah, Bo Zhou

2603.24846 2026-03-27 cs.CV cs.AI cs.LG

NeuroVLM-Bench: Evaluation of Vision-Enabled Large Language Models for Clinical Reasoning in Neurological Disorders

Katarina Trojachanec Dineva, Stefan Andonov, Ilinka Ivanoska, Ivan Kitanovski, Sasho Gramatikov, Tamara Kostova, Monika Simjanoska Misheva, Kostadin Mishev

Comments 53 pages, 12 figures. Manuscript submitted to the BMC Medical Informatics and Decision Making journal

Abstract

Recent advances in multimodal large language models enable new possibilities for image-based decision support. However, their reliability and operational trade-offs in neuroimaging remain insufficiently understood. We present a comprehensive benchmarking study of vision-enabled large language models for 2D neuroimaging using curated MRI and CT datasets covering multiple sclerosis, stroke, brain tumors, other abnormalities, and normal controls. Models are required to generate multiple outputs simultaneously, including diagnosis, diagnosis subtype, imaging modality, specialized sequence, and anatomical plane. Performance is evaluated across four directions: discriminative classification with abstention, calibration, structured-output validity, and computational efficiency. A multi-phase framework ensures fair comparison while controlling for selection bias. Across twenty frontier multimodal models, the results show that technical imaging attributes such as modality and plane are nearly solved, whereas diagnostic reasoning, especially subtype prediction, remains challenging. Tumor classification emerges as the most reliable task, stroke is moderately solvable, while multiple sclerosis and rare abnormalities remain difficult. Few-shot prompting improves performance for several models but increases token usage, latency, and cost. Gemini-2.5-Pro and GPT-5-Chat achieve the strongest overall diagnostic performance, while Gemini-2.5-Flash offers the best efficiency-performance trade-off. Among open-weight architectures, MedGemma-1.5-4B demonstrates the most promising results, as under few-shot prompting, it approaches the zero-shot performance of several proprietary models, while maintaining perfect structured output. These findings provide practical insights into performance, reliability, and efficiency trade-offs, supporting standardized evaluation of multimodal LLMs in neuroimaging.

2603.24844 2026-03-27 cs.LG cs.AI cs.CL

Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

Isha Puri, Mehul Damani, Idan Shenfeld, Marzyeh Ghassemi, Jacob Andreas, Yoon Kim

2603.24840 2026-03-27 cs.CL

Prune as You Generate: Online Rollout Pruning for Faster and Better RLVR

Haobo Xu, Sirui Chen, Ruizhong Qiu, Yuchen Yan, Chen Luo, Monica Cheng, Jingrui He, Hanghang Tong

Comments 17 pages, 4 figures

2603.24835 2026-03-27 cs.CV

DCARL: A Divide-and-Conquer Framework for Autoregressive Long-Trajectory Video Generation

Junyi Ouyang, Wenbin Teng, Gonglin Chen, Yajie Zhao, Haiwei Chen

Comments 29 pages, 11 figures. Project page: https://junyiouy.github.io/projects/dcarl

2603.24829 2026-03-27 cs.LG

Flow matching on homogeneous spaces

Francesco Ruscelli

Comments 10 pages

2603.24828 2026-03-27 cs.LG cs.AI

A Practical Guide Towards Interpreting Time-Series Deep Clinical Predictive Models: A Reproducibility Study

Yongda Fan, John Wu, Andrea Fitzpatrick, Naveen Baskaran, Jimeng Sun, Adam Cross

Comments Under Review

2603.24826 2026-03-27 cs.CL

Synthetic Rewriting as a Quality Multiplier: Evidence from Portuguese Continued Pretraining

Thales Sales Almeida, Rodrigo Nogueira, Hélio Pedrini