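arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.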

2603.12677 2026-05-08 cs.CL cs.AI

MetaKE: Meta-Learning for Knowledge Editing Toward a Better Accuracy-Editability Trade-off

Shuxin Liu, Di Gao, Ou Wu

Comments 37 pages, 9 figures

Abstract

Existing locate-then-edit Knowledge Editing (KE) methods typically decompose editing into two stages: upstream target representation optimization and downstream constrained parameter optimization. The optimization across the two stages is disconnected: upstream applies uniform regularization without observing downstream realization of the planned residual, hindering a refined accuracy-editability trade-off. Since this realization is request-specific and depends on downstream constraints, uniform regularization can over-shrink high-association requests, causing insufficient editing, while it can under-regularize low-association requests, producing over-large planned residuals that reduce downstream editability. To bridge this disconnect, we propose MetaKE (Meta-learning for Knowledge Editing), a new framework that unifies upstream and downstream stages into a bi-level optimization problem. The inner level optimizes parameter updates for the target representation, while the outer level optimizes representation using feedback from downstream constraints, achieving a better semantic accuracy-editability trade-off. To avoid costly multi-layer backpropagation, we introduce a Structural Gradient Proxy to approximate and propagate this feedback. Extensive experiments show that MetaKE outperforms strong baselines, offering a new perspective on KE.

2603.12572 2026-05-08 cs.CL

LMEB: Long-horizon Memory Embedding Benchmark

Xinping Zhao, Xinshuo Hu, Jiaxin Xu, Danyu Tang, Xin Zhang, Mengjia Zhou, Yan Zhong, Yao Zhou, Zifei Shan, Meishan Zhang, Baotian Hu, Min Zhang

Comments 35 pages, 9 figures, 23 tables

2603.11161 2026-05-08 cs.LG cond-mat.dis-nn stat.ML

Algorithmic Task Capture, Computational Complexity, and Inductive Bias of Infinite Transformers

Orit Davidovich, Zohar Ringel

2603.10302 2026-05-08 cs.LG q-bio.QM

How to make the most of your masked language model for protein engineering

Calvin McCarter, Nick Bhattacharya, Sebastian W. Ober, Hunter Elliott

Comments Accepted into the GEM Workshop, ICLR 2026

2603.07819 2026-05-08 cs.CV cs.LG

Fusion Complexity Inversion: Why Simpler Cross View Modules Outperform SSMs and Cross View Attention Transformers for Pasture Biomass Regression

Mridankan Mandal

Comments Accepted to CVPR: Vision for Agriculture Workshop 2026 (Withdrawn)

2603.06351 2026-05-08 cs.CV cs.AI cs.LG

DC-DiT: Adaptive Compute and Elastic Inference for Visual Generation via Dynamic Chunking

Akash Haridas, Utkarsh Saxena, Parsa Ashrafi Fashi, Mehdi Rezagholizadeh, Vikram Appia, Emad Barsoum

2603.05630 2026-05-08 cs.CV cs.LG

Making Reconstruction FID Predictive of Diffusion Generation FID

Tongda Xu, Mingwei He, Shady Abu-Hussein, Jose Miguel Hernandez-Lobato, Chunhang Zheng, Kai Zhao, Chao Zhou, Ya-Qin Zhang, Yan Wang

2603.05421 2026-05-08 cs.CV cs.AI cs.LG

DARK: Diagonal-Anchored Repulsive Knowledge Distillation for Vision-Language Models under Extreme Compression

Numan Saeed, Asif Hanif, Fadillah Adamsyah Maani, Hussain Alasmawi, Mohammad Yaqub

Comments Project website: www.numansaeed.com/mobilefetalclip

2603.04673 2026-05-08 cs.CV physics.med-ph stat.ML

sFRC for assessing hallucinations in medical image restoration

Prabhat Kc, Rongping Zeng, Nirmal Soni, Aldo Badano

Comments 16 pages; 14 figures; 1 Supplemental document. TechRxiv Preprints, 2025

Details

DOI: 10.36227/techrxiv.171259560.02243347/v2

Abstract

Deep learning (DL) methods are currently being explored to restore images from sparse-view-, limited-data-, and undersampled-based acquisitions in medical applications. Although outputs from DL may appear visually appealing based on likability/subjective criteria (such as less noise, smooth features), they may also suffer from hallucinations. This issue is further exacerbated by a lack of easy-to-use techniques and robust metrics for the identification of hallucinations in DL outputs. In this work, we propose performing Fourier Ring Correlation (FRC) analysis over small patches and concomitantly (s)canning across DL outputs and their reference counterparts to detect hallucinations (termed as sFRC). We describe the rationale behind sFRC and provide its mathematical formulation. The parameters essential to sFRC may be set using predefined hallucinated features annotated by subject matter experts or using imaging theory-based hallucination maps. We use sFRC to detect hallucinations for three undersampled medical imaging problems: CT super-resolution, CT sparse view, and MRI subsampled restoration. In the testing phase, we demonstrate sFRC's effectiveness in detecting hallucinated features for the CT problem and sFRC's agreement with imaging theory-based outputs on hallucinated feature maps for the MR problem. Finally, we quantify the hallucination rates of DL methods on in-distribution versus out-of-distribution data and under increasing subsampling rates to characterize the robustness of DL methods. Beyond DL-based methods, sFRC's effectiveness in detecting hallucinations for a conventional regularization-based restoration method and a state-of-the-art unrolled method is also shown.
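The patch-wise FRC idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `frc_curve` and `scan_frc`, the patch size, stride, threshold, and the rule of flagging a patch when its mean FRC falls below the threshold are all assumptions made for illustration. The abstract states only that FRC is computed over small patches scanned across the DL output and its reference, with parameters set from expert annotations or imaging-theory maps.

```python
import numpy as np


def frc_curve(a, b):
    """Fourier Ring Correlation between two equally sized square 2-D patches.

    Returns one correlation value per integer-radius frequency ring.
    """
    assert a.shape == b.shape and a.ndim == 2 and a.shape[0] == a.shape[1]
    n = a.shape[0]
    fa = np.fft.fftshift(np.fft.fft2(a))
    fb = np.fft.fftshift(np.fft.fft2(b))

    # Integer ring index of every frequency bin, measured from the DC bin.
    yy, xx = np.indices((n, n))
    r = np.rint(np.hypot(yy - n // 2, xx - n // 2)).astype(int)
    n_rings = n // 2
    mask = r < n_rings  # keep only complete rings

    # Per-ring sums of the cross-spectrum and of both power spectra.
    num = np.bincount(r[mask], weights=(fa * np.conj(fb)).real[mask], minlength=n_rings)
    pa = np.bincount(r[mask], weights=(np.abs(fa) ** 2)[mask], minlength=n_rings)
    pb = np.bincount(r[mask], weights=(np.abs(fb) ** 2)[mask], minlength=n_rings)
    denom = np.sqrt(pa * pb)
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)


def scan_frc(restored, reference, patch=32, stride=32, threshold=0.5):
    """Scan both images patch by patch and return the top-left corners of
    patches whose mean FRC is below `threshold`.

    The mean-FRC rule and the default values are hypothetical placeholders,
    not the criterion used in the paper.
    """
    flagged = []
    h, w = reference.shape
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            curve = frc_curve(restored[i:i + patch, j:j + patch],
                              reference[i:i + patch, j:j + patch])
            if curve.mean() < threshold:
                flagged.append((i, j))
    return flagged
```

Calling `scan_frc(restored, reference)` on two same-sized 2-D arrays returns a list of `(row, column)` corners for the flagged patches. A patch that matches its reference exactly has an FRC of 1 on every ring and is never flagged under this rule.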

2603.03331 2026-05-08 cs.CL cs.AI

PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning

Hung Manh Pham, Jinyang Wu, Xiao Ma, Yiming Zhang, Yixin Xu, Aaqib Saeed, Bin Zhu, Zhou Pan, Dong Ma

Comments PulseLM v2

2603.03080 2026-05-08 cs.AI

Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation

Chengkai Wang, Baisong Liu

Comments The authors have identified an issue in the evaluation protocol in Section 5.1.3. Feature extraction and semantic matching used to compute P-EHR require correction and re-validation, as they may not have been applied consistently across all generated explanations and baselines. This may affect part of the reported quantitative results and analysis, so the authors withdraw this version

2602.22859 2026-05-08 cs.CV

From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

Hongrui Jia, Chaoya Jiang, Yongrui Heng, Shikun Zhang, Wei Ye

2602.20670 2026-05-08 cs.CL cs.AI

CAMEL: Confidence-Gated Reflection for Reward Modeling

Zirui Zhu, Hailun Xu, Yang Luo, Yong Liu, Kanchan Sarkar, Kun Xu, Yang You

Comments ICML 2026

2602.18823 2026-05-08 cs.CL

EvalSense: A Framework for Domain-Specific LLM (Meta-)Evaluation

Adam Dejl, Jonathan Pearson

Comments Accepted to EACL 2026 System Demonstrations

Details

DOI: 10.18653/v1/2026.eacl-demo.33
Journal ref: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations), 480-491. 2026

Abstract

Robust and comprehensive evaluation of large language models (LLMs) is essential for identifying effective LLM system configurations and mitigating risks associated with deploying LLMs in sensitive domains. However, traditional statistical metrics are poorly suited to open-ended generation tasks, leading to growing reliance on LLM-based evaluation methods. These methods, while often more flexible, introduce additional complexity: they depend on carefully chosen models, prompts, parameters, and evaluation strategies, making the evaluation process prone to misconfiguration and bias. In this work, we present EvalSense, a flexible, extensible framework for constructing domain-specific evaluation suites for LLMs. EvalSense provides out-of-the-box support for a broad range of model providers and evaluation strategies, and assists users in selecting and deploying suitable evaluation methods for their specific use-cases. This is achieved through two unique components: (1) an interactive guide aiding users in evaluation method selection and (2) automated meta-evaluation tools that assess the reliability of different evaluation approaches using perturbed data. We demonstrate the effectiveness of EvalSense in a case study involving the generation of clinical notes from unstructured doctor-patient dialogues, using a popular open dataset. All code, documentation, and assets associated with EvalSense are open-source and publicly available at https://github.com/nhsengland/evalsense.

2602.18473 2026-05-08 cs.LG cs.AI

Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series

Guoqi Yu, Juncheng Wang, Chen Yang, Jing Qin, Angelica I. Aviles-Rivero, Shujun Wang

Comments Accepted by ICLR 2026 (Oral). arXiv admin note: text overlap with arXiv:2405.19363 by other authors

2602.17683 2026-05-08 cs.LG cs.CV stat.ML

Probabilistic NDVI Forecasting from Sparse Satellite Time Series and Weather Covariates

Irene Iele, Giulia Romoli, Daniele Molino, Elena Mulero Ayllón, Filippo Ruffini, Paolo Soda, Matteo Tortora

2602.17419 2026-05-08 cs.CV

EAGLE: Expert-Augmented Attention Guidance for Tuning-Free Industrial Anomaly Detection in Multimodal Large Language Models

Xiaomeng Peng, Xilang Huang, Seon Han Choi

2602.15872 2026-05-08 cs.RO cs.CV cs.LG

MARVL: Multi-Stage Guidance for Robotic Manipulation via Vision-Language Models

Xunlan Zhou, Xuanlin Chen, Shaowei Zhang, ShengHua Wan, Xiaohai Hu, Lei Yuan, De-chuan Zhan

2602.13636 2026-05-08 cs.CV

Layer-Guided UAV Tracking: Enhancing Efficiency and Occlusion Robustness

Yang Zhou, Derui Ding, Ran Sun, Ying Sun, Haohua Zhang

2602.12828 2026-05-08 cs.LG cs.AI

Risk Horizons: Structured Hypothesis Spaces for Longitudinal Clinical Prediction

Zhan Qu, Michael Färber

2602.11509 2026-05-08 cs.CL cs.AI cs.CV

Multimodal Fact-Level Attribution for Verifiable Reasoning

David Wan, Han Wang, Ziyang Wang, Elias Stengel-Eskin, Hyunji Lee, Mohit Bansal

Comments Accepted to ICML 2026. Code and data are available at https://github.com/meetdavidwan/murgat

2602.09128 2026-05-08 cs.LG

Counterfactual Maps: What They Are and How to Find Them

Awa Khouna, Julien Ferry, Thibaut Vidal

2602.07830 2026-05-08 cs.AI

Time Series Reasoning via Process-Verifiable Thinking Data Synthesis and Scheduling for Tailored LLM Reasoning

Jiahui Zhou, Dan Li, Boxin Li, Xiao Zhang, Erli Meng, Lin Li, Zhuomin Chen, Jian Lou, See-Kiong Ng

2602.04832 2026-05-08 cs.LG cs.AI cs.CV cs.NE

It's Not a Lottery, It's a Race: Understanding How Gradient Descent Adapts the Network's Capacity to the Task

Hannah Pinson

2602.04244 2026-05-08 cs.LG

GraphVec: Cross-Domain Graph Vectorization for Graph-Level Representation Learning

Qi Feng, Jicong Fan

2602.01839 2026-05-08 cs.LG cs.AI q-bio.GN

DOGMA: Weaving Structural Information into Data-centric Single-cell Transcriptomics Analysis

Ru Zhang, Xunkai Li, Yaxin Deng, Sicheng Liu, Daohan Su, Qiangqiang Dai, Hongchao Qin, Rong-Hua Li, Guoren Wang, Jia Li

Comments 34 pages, 4 figures

2602.00656 2026-05-08 cs.LG

DisRFM: Polar Riemannian Flow Matching for Structure-Preserving Graph Domain Adaptation

Yingxu Wang, Xinwang Liu, Mengzhu Wang, Siyang Gao, Nan Yin

2602.00175 2026-05-08 cs.LG cs.AI cs.CV cs.CY

The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization

Manyi Li, Yufan Liu, Lai Jiang, Bing Li, Yuming Li, Weiming Hu

Comments 25 pages, 12 figures, 12 tables

2601.23166 2026-05-08 cs.CL

Monotonic Reference-Free Refinement for Autoformalization

Lan Zhang, Marco Valentino, André Freitas

Comments Preprint

2601.22891 2026-05-08 cs.LG

PlatoLTL: Learning to Generalize Across Symbols in LTL Instructions for Multi-Task RL

Jacques Cloete, Mathias Jackermeier, Ioannis Havoutis, Alessandro Abate

Comments 14 pages, 4 figures (main paper). 22 pages, 11 figures (appendix)