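arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translation, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.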

2604.00534 2026-04-02 cs.CV

FreqPhys: Repurposing Implicit Physiological Frequency Prior for Robust Remote Photoplethysmography

Wei Qian, Dan Guo, Jinxing Zhou, Bochao Zou, Zitong Yu, Meng Wang

Details

English abstract

Remote photoplethysmography (rPPG) enables contactless physiological monitoring by capturing subtle skin-color variations from facial videos. However, most existing methods predominantly rely on time-domain modeling, making them vulnerable to motion artifacts and illumination fluctuations, where weak physiological clues are easily overwhelmed by noise. To address these challenges, we propose FreqPhys, a frequency-guided rPPG framework that explicitly leverages physiological frequency priors for robust signal recovery. Specifically, FreqPhys first applies a Physiological Bandpass Filtering module to suppress out-of-band interference, and then performs Physiological Spectrum Modulation together with adaptive spectral selection to emphasize pulse-related frequency components while suppressing residual in-band noise. A Cross-domain Representation Learning module further fuses these spectral priors with deep time-domain features to capture informative spatial-temporal dependencies. Finally, a frequency-aware conditional diffusion process progressively reconstructs high-fidelity rPPG signals. Extensive experiments on six benchmarks demonstrate that FreqPhys yields significant improvements over state-of-the-art approaches, particularly under challenging motion conditions. This highlights the importance of explicitly modeling physiological frequency priors. The source code will be released.
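The idea of suppressing out-of-band interference can be illustrated with a small, generic sketch. This is not the FreqPhys module, whose details the abstract does not give. The 0.7 to 3.0 Hz band, the synthetic signal, and the function names are assumptions chosen only for illustration.

```python
import cmath
import math


def band_limited_peak_hz(signal, fs, low_hz=0.7, high_hz=3.0):
    """Return the frequency (Hz) of the strongest DFT bin inside [low_hz, high_hz].

    The band limits are an assumed plausible heart-rate range, not values
    taken from the paper.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset

    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if not (low_hz <= freq <= high_hz):
            continue  # ignore every bin outside the band
        # Direct DFT of bin k (O(n) per bin, fine for short windows).
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power

    if best_k is None:
        raise ValueError("no DFT bin falls inside the requested band")
    return best_k * fs / n


# Synthetic 10 s trace at 30 frames per second (all values are hypothetical):
# a weak 1.2 Hz pulse, a strong 0.2 Hz drift, and a 6 Hz disturbance.
FS = 30.0
N = 300
trace = [
    1.0 * math.sin(2 * math.pi * 1.2 * t / FS)
    + 3.0 * math.sin(2 * math.pi * 0.2 * t / FS)
    + 2.0 * math.sin(2 * math.pi * 6.0 * t / FS)
    for t in range(N)
]

pulse_hz = band_limited_peak_hz(trace, FS)
unrestricted_hz = band_limited_peak_hz(trace, FS, low_hz=0.0, high_hz=FS / 2)
print("band-limited peak (Hz):", pulse_hz)
print("unrestricted peak (Hz):", unrestricted_hz)
```

In this toy signal the band restriction picks out the weak pulse component, while the unrestricted search locks onto the stronger drift. Residual in-band noise, which the abstract says is handled by spectrum modulation and adaptive spectral selection, is not addressed by this sketch.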


2604.00533 2026-04-02 cs.LG cs.IT math.IT

Learning from Many and Adapting to the Unknown in Open-set Test Streams

Xiao Zhang, Juntao Lyu, Tianyu Hu, Qianchuan Zhao, Huimin Ma

2604.00531 2026-04-02 cs.LG

Learning Shared Representations for Multi-Task Linear Bandits

Jiabin Lin, Shana Moothedath

2604.00530 2026-04-02 cs.CV

AceTone: Bridging Words and Colors for Conditional Image Grading

Tianren Ma, Mingxiang Liao, Xijin Zhang, Qixiang Ye

Comments Accepted by CVPR 2026. Project Page: github.com/martian422/AceTone

2604.00529 2026-04-02 cs.LG cs.CL

MF-QAT: Multi-Format Quantization-Aware Training for Elastic Inference

Zifei Xu, Sayeh Sharify, Hesham Mostafa

2604.00523 2026-04-02 cs.LG cs.IR cs.MA

Lipschitz Dueling Bandits over Continuous Action Spaces

Mudit Sharma, Shweta Jain, Vaneet Aggarwal, Ganesh Ghalme

2604.00519 2026-04-02 cs.CV

Learnability-Guided Diffusion for Dataset Distillation

Jeffrey A. Chan-Santiago, Mubarak Shah

Comments This paper has been accepted to CVPR 2026

2604.00517 2026-04-02 cs.CV cs.AI

Toward Optimal Sampling Rate Selection and Unbiased Classification for Precise Animal Activity Recognition

Axiu Mao, Meilu Zhu, Lei Shen, Xiaoshuai Wang, Tomas Norton, Kai Liu

Comments 26 pages, 14 figures

2604.00514 2026-04-02 cs.CV cs.AI

MAESIL: Masked Autoencoder for Enhanced Self-supervised Medical Image Learning

Kyeonghun Kim, Hyeonseok Jung, Youngung Han, Junsu Lim, YeonJu Jean, Seongbin Park, Eunseob Choi, Hyunsu Go, SeoYoung Ju, Seohyoung Park, Gyeongmin Kim, MinJu Kwon, KyungSeok Yuh, Soo Yong Kim, Ken Ying-Kai Liao, Nam-Joon Kim, Hyuk-Jae Lee

Comments 5 pages, 3 figures. Accepted at ICEIC 2026

2604.00510 2026-04-02 cs.AI

Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling

Hongbeen Kim, Juhyun Lee, Sanghyeon Lee, Kwanghoon Choi, Jaehyuk Huh

2604.00508 2026-04-02 cs.LG

A Decoupled Basis-Vector-Driven Generative Framework for Dynamic Multi-Objective Optimization

Yaoming Yang, Shuai Wang, Bingdong Li, Peng Yang, Ke Tang

2604.00507 2026-04-02 cs.CV

RegFormer: Transferable Relational Grounding for Efficient Weakly-Supervised Human-Object Interaction Detection

Jihwan Park, Chanhyeong Yang, Jinyoung Park, Taehoon Song, Hyunwoo J. Kim

Comments Accepted at CVPR2026

2604.00495 2026-04-02 cs.CV

PC-SAM: Patch-Constrained Fine-Grained Interactive Road Segmentation in High-Resolution Remote Sensing Images

Chengcheng Lv, Rushi Li, Mincheng Wu, Xiufang Shi, Zhenyu Wen, Shibo He

2604.00494 2026-04-02 cs.CV

ARGS: Auto-Regressive Gaussian Splatting via Parallel Progressive Next-Scale Prediction

Quanyuan Ruan, Kewei Shi, Jiabao Lei, Xifeng Gao, Xiaoguang Han

2604.00493 2026-04-02 cs.CV cs.AI cs.LG

A Reasoning-Enabled Vision-Language Foundation Model for Chest X-ray Interpretation

Yabin Zhang, Chong Wang, Yunhe Gao, Jiaming Liu, Maya Varma, Justin Xu, Sophie Ostmeier, Jin Long, Sergios Gatidis, Seena Dehkharghani, Arne Michalson, Eun Kyoung Hong, Christian Bluethgen, Haiwei Henry Guo, Alexander Victor Ortiz, Stephan Altmayer, Sandhya Bodapati, Joseph David Janizek, Ken Chang, Jean-Benoit Delbrouck, Akshay S. Chaudhari, Curtis P. Langlotz

Comments Codes: https://github.com/YBZh/CheXOne Models: https://huggingface.co/StanfordAIMI/CheXOne

Details

English abstract

Chest X-rays (CXRs) are among the most frequently performed imaging examinations worldwide, yet rising imaging volumes increase radiologist workload and the risk of diagnostic errors. Although artificial intelligence (AI) systems have shown promise for CXR interpretation, most generate only final predictions, without making explicit how visual evidence is translated into radiographic findings and diagnostic predictions. We present CheXOne, a reasoning-enabled vision-language model for CXR interpretation. CheXOne jointly generates diagnostic predictions and explicit, clinically grounded reasoning traces that connect visual evidence, radiographic findings, and these predictions. The model is trained on 14.7 million instruction and reasoning samples curated from 30 public datasets spanning 36 CXR interpretation tasks, using a two-stage framework that combines instruction tuning with reinforcement learning to improve reasoning quality. We evaluate CheXOne in zero-shot settings across visual question answering, report generation, visual grounding and reasoning assessment, covering 17 evaluation settings. CheXOne outperforms existing medical and general-domain foundation models and achieves strong performance on independent public benchmarks. A clinical reader study demonstrates that CheXOne-drafted reports are comparable to or better than resident-written reports in 55% of cases, while effectively addressing clinical indications and enhancing both report writing and CXR interpretation efficiency. Further analyses involving radiologists reveal that the generated reasoning traces show high clinical factuality and provide causal support for the final predictions, offering a plausible explanation for the performance gains. These results suggest that explicit reasoning can improve model performance, interpretability and clinical utility in AI-assisted CXR interpretation.


2604.00489 2026-04-02 cs.CL

Adapting Text LLMs to Speech via Multimodal Depth Up-Scaling

Kazuki Yano, Jun Suzuki, Shinji Watanabe

2604.00479 2026-04-02 cs.CV

All Roads Lead to Rome: Incentivizing Divergent Thinking in Vision-Language Models

Xinyu Tian, Shu Zou, Zhaoyuan Yang, Mengqi He, Peter Tu, Jing Zhang

Comments Accepted to CVPR2026

2604.00477 2026-04-02 cs.AI cs.CL cs.HC cs.MA

Logarithmic Scores, Power-Law Discoveries: Disentangling Measurement from Coverage in Agent-Based Evaluation

HyunJoon Jung, William Na

2604.00473 2026-04-02 cs.LG math.DS

Phase space integrity in neural network models of Hamiltonian dynamics: A Lagrangian descriptor approach

Abrari Noor Hasmi, Haralampos Hatzikirou, Hadi Susanto

Comments 40 pages, 22 figures

Details

DOI: 10.1016/j.cnsns.2026.109956
Journal ref: Communications in Nonlinear Science and Numerical Simulation, Volume 160, September 2026, 109956

English abstract

We propose Lagrangian Descriptors (LDs) as a diagnostic framework for evaluating neural network models of Hamiltonian systems beyond conventional trajectory-based metrics. Standard error measures quantify short-term predictive accuracy but provide little insight into global geometric structures such as orbits and separatrices. Existing evaluation tools in dissipative systems are inadequate for Hamiltonian dynamics due to fundamental differences in the systems. By constructing probability density functions weighted by LD values, we embed geometric information into a statistical framework suitable for information-theoretic comparison. We benchmark physically constrained architectures (SympNet, HénonNet, Generalized Hamiltonian Neural Networks) against data-driven Reservoir Computing across two canonical systems. For the Duffing oscillator, all models recover the homoclinic orbit geometry with modest data requirements, though their accuracy near critical structures varies. For the three-mode nonlinear Schrödinger equation, however, clear differences emerge: symplectic architectures preserve energy but distort phase-space topology, while Reservoir Computing, despite lacking explicit physical constraints, reproduces the homoclinic structure with high fidelity. These results demonstrate the value of LD-based diagnostics for assessing not only predictive performance but also the global dynamical integrity of learned Hamiltonian models.
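For readers unfamiliar with Lagrangian descriptors, the following is a minimal sketch of one common variant: the arc length of a trajectory integrated forward and backward over a time window. It is not the paper's code. The undamped, unforced Duffing form x'' = x - x^3, the window length `tau`, the step count, and all function names are assumptions for illustration. The LD-weighted probability densities mentioned in the abstract are not implemented here.

```python
import math


def duffing_rhs(x, y):
    """Assumed undamped, unforced Duffing system: x' = y, y' = x - x^3."""
    return y, x - x ** 3


def augmented_rhs(state):
    """Right-hand side for (x, y, s), where s accumulates trajectory arc length."""
    x, y, _ = state
    dx, dy = duffing_rhs(x, y)
    return (dx, dy, math.hypot(dx, dy))


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step of the augmented system."""

    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = augmented_rhs(state)
    k2 = augmented_rhs(shifted(state, k1, dt / 2))
    k3 = augmented_rhs(shifted(state, k2, dt / 2))
    k4 = augmented_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def lagrangian_descriptor(x0, y0, tau=5.0, steps=500):
    """Arc-length descriptor: path length over [-tau, tau] starting at (x0, y0)."""
    dt = tau / steps
    total = 0.0
    for direction in (1.0, -1.0):  # forward, then backward in time
        state = (x0, y0, 0.0)
        for _ in range(steps):
            state = rk4_step(state, direction * dt)
        total += abs(state[2])  # arc length is negative when stepping backward
    return total


# Evaluate on a coarse grid of initial conditions along the x-axis.
for x0 in (-1.0, -0.5, 0.5, 1.0, 1.4):
    print(x0, lagrangian_descriptor(x0, 0.0))
```

Evaluating this scalar field on a dense grid of initial conditions is what lets sharp changes in its value trace out structures such as separatrices. Comparing such fields between a reference system and a learned model is the kind of diagnostic the abstract describes.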


2604.00469 2026-04-02 cs.CV cs.LG

Automated Detection of Multiple Sclerosis Lesions on 7-tesla MRI Using U-net and Transformer-based Segmentation

Michael Maynord, Minghui Liu, Cornelia Fermüller, Seongjin Choi, Yuxin Zeng, Shishir Dahal, Daniel M. Harrison

Comments 31 pages, 3 figures, 3 tables. Inference code and model weights available at https://github.com/maynord/7T-MS-lesion-segmentation

2604.00455 2026-04-02 cs.CV cs.AI cs.CL

First Logit Boosting: Visual Grounding Method to Mitigate Object Hallucination in Large Vision-Language Models

Jiwoo Ha, Jongwoo Baek, Jinhyun So

Comments 19 pages, 13 figures

2604.00452 2026-04-02 cs.CV

Out of Sight, Out of Track: Adversarial Attacks on Propagation-based Multi-Object Trackers via Query State Manipulation

Halima Bouzidi, Haoyu Liu, Yonatan Gizachew Achamyeleh, Praneetsai Vasu Iddamsetty, Mohammad Abdullah Al Faruque

Comments Accepted for presentation at CVPR 2026 (main track)

2604.00447 2026-04-02 cs.SD cs.HC

Sona: Real-Time Multi-Target Sound Attenuation for Noise Sensitivity

Jeremy Zhengqi Huang, Emani Hicks, Sidharth, Gillian R. Hayes, Dhruv Jain

Comments 12 pages, 6 figures

2604.00445 2026-04-02 cs.AI cs.CL

Towards Reliable Truth-Aligned Uncertainty Estimation in Large Language Models

Ponhvoan Srey, Quang Minh Nguyen, Xiaobao Wu, Anh Tuan Luu

2604.00443 2026-04-02 cs.CL cs.AI

Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics

Iyad Ait Hou, Rebecca Hwa

Comments 21 pages

2604.00442 2026-04-02 cs.AI cs.CL

Execution-Verified Reinforcement Learning for Optimization Modeling

Runda Guan, Xiangqing Shen, Jiajun Zhang, Yifan Zhang, Jian Cheng, Rui Xia

2604.00439 2026-04-02 cs.RO cs.SY eess.SY

Reachability-Aware Time Scaling for Path Tracking

Hossein Gholampour, Logan E. Beaver

Comments 7 pages, 5 figures

2604.00438 2026-04-02 cs.CL

TR-ICRL: Test-Time Rethinking for In-Context Reinforcement Learning

Wenxuan Jiang, Yuxin Zuo, Zijian Zhang, Xuecheng Wu, Zining Fan, Wenxuan Liu, Li Chen, Xiaoyu Li, Xuezhi Cao, Xiaolong Jin, Ninghao Liu

Comments 14 pages, 7 figures

2604.00428 2026-04-02 cs.RO

Certificate-Driven Closed-Loop Multi-Agent Path Finding with Inheritable Factorization

Jiarui Li, Runyu Zhang, Gioele Zardini

2604.00421 2026-04-02 cs.AI

Self-Routing: Parameter-Free Expert Routing from Hidden States

Jama Hussein Mohamud, Drew Wagner, Mirco Ravanelli