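arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.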

2604.26211 2026-04-30 cs.AI cs.LG

OMEGA: Optimizing Machine Learning by Evaluating Generated Algorithms

Jeremy Nixon, Annika Singh

Comments ICLR 2026: Workshop on AI with Recursive Self-Improvement

2604.26209 2026-04-30 cs.CL cs.AI

Breaking the Autoregressive Chain: Hyper-Parallel Decoding for Efficient LLM-Based Attribute Value Extraction

Theodore Glavas, Nikhita Vedula, Dushyanta Dhyani, Yilun Zhu, Shervin Malmasi

2604.26206 2026-04-30 cs.CL cs.AI

Option-Order Randomisation Reveals a Distributional Position Attractor in Prompted Sandbagging

Jon-Paul Cacioli

Comments 9 pages, 4 figures, 1 table. Pre-registered: https://osf.io/efr6s. Code and data: https://github.com/synthiumjp/bcb-sandbagging-pilot

2604.26201 2026-04-30 cs.RO

Lights Out: A Nighttime UAV Localization Framework Using Thermal Imagery and Semantic 3D Maps

Ryan Allen, Melissa Greeff

Comments 8 pages, 4 figures, accepted to ICUAS 2025

2604.26188 2026-04-30 cs.LG

Efficient and Interpretable Transformer for Counterfactual Fairness

Panyi Dong, Zhiyu Quan

2604.26186 2026-04-30 cs.CV cs.HC cs.IR cs.MM

FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing

Morayo Danielle Adeyemi, Ryan A. Rossi, Franck Dernoncourt

Comments 5 pages, 4 tables, 1 figure. Under review

2604.26184 2026-04-30 cs.CV cs.CR

Privacy-Preserving Clothing Classification using Vision Transformer for Thermal Comfort Estimation

Tatsuya Chuman, Yousuke Udagawa, Hitoshi Kiya

Comments To appear in 2026 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW 2026)

2604.26182 2026-04-30 cs.CV cs.AI cs.LG

Lifting Embodied World Models for Planning and Control

Alex N. Wang, Trevor Darrell, Pavel Izmailov, Yutong Bai, Amir Bar

2604.26174 2026-04-30 cs.CV cs.LG cs.RO

Why Domain Matters: A Preliminary Study of Domain Effects in Underwater Object Detection

Melanie Wille, Dimity Miller, Tobias Fischer, Scarlett Raine

Comments Poster Presentation at ICRA 2026 Workshop S2S

2604.26170 2026-04-30 cs.CL

EvoSelect: Data-Efficient LLM Evolution for Targeted Task Adaptation

Ting-Wei Li, Sirui Chen, Jiaru Zou, Yingbing Huang, Tianxin Wei, Jingrui He, Hanghang Tong

Details

English abstract

Adapting large language models (LLMs) to a targeted task efficiently and effectively remains a fundamental challenge. Such adaptation often requires iteratively improving the model toward a targeted task, yet collecting high-quality human-labeled data to support this process is costly and difficult to scale. As a result, synthetic data generation has emerged as a flexible and scalable alternative. One straightforward approach is through an iterative generation-training loop, where candidate data are synthesized through an external generator, the model is updated using these data, and the process is repeated over iterations. However, generated samples can be noisy, highly redundant, or even misaligned with the targeted task distribution. Training indiscriminately on such data can dilute useful learning signals and even degrade model performance. To address this, we introduce a refined paradigm, namely an iterative generation-selection-training loop, which incorporates a selection step prior to model updates. Building on this paradigm, we propose EvoSelect, a data-efficient framework for evolving LLMs effectively. Given candidate samples produced by the data generator, EvoSelect selects training data by jointly modeling targeted task alignment and diversity. We estimate task relevance through optimal transport with proxy gradient representations, which quantifies how well candidate samples align with the targeted task distribution. To mitigate redundancy, we incorporate a diversification mechanism that promotes coverage of complementary training samples. By interleaving alignment and diversification, EvoSelect enables progressive LLM evolution toward targeted tasks. Extensive experiments on various benchmarks demonstrate that with either weak or strong data generators, EvoSelect consistently improves adaptation efficacy over existing data selection methods.
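
The selection step described in the abstract (balancing task alignment against redundancy) can be illustrated with a small greedy sketch. This is not the authors' method: the abstract scores relevance with optimal transport over proxy gradient representations, whereas the code below swaps in plain cosine similarity to the mean target vector as a stand-in. The names `select`, `cosine`, and `trade_off` are hypothetical and exist only for this illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select(candidates, target, k, trade_off=0.5):
    """Greedily pick up to k candidate indices, trading relevance for diversity.

    Assumption: relevance is cosine similarity to the target centroid,
    a simple stand-in for the optimal-transport score in the abstract.
    """
    dim = len(target[0])
    centroid = [sum(t[i] for t in target) / len(target) for i in range(dim)]
    relevance = [cosine(c, centroid) for c in candidates]

    chosen = []
    remaining = set(range(len(candidates)))
    while remaining and len(chosen) < k:
        def score(i):
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in chosen),
                default=0.0,
            )
            return trade_off * relevance[i] - (1 - trade_off) * redundancy

        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data: candidates 0 and 1 are near-duplicates of each other.
candidates = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [-1.0, 0.0]]
target = [[1.0, 0.0], [1.0, 1.0]]
balanced = select(candidates, target, k=2)
relevance_only = select(candidates, target, k=2, trade_off=1.0)
```

With `trade_off=1.0` the sketch ranks by relevance alone and keeps both near-duplicates; lowering it penalizes the second near-duplicate in favor of a complementary sample.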

2604.26169 2026-04-30 cs.LG econ.EM stat.ML

Budget-Constrained Causal Bandits: Bridging Uplift Modeling and Sequential Decision-Making

Abhirami Pillai

Comments 12 pages, 2 figures, preprint

2604.26167 2026-04-30 cs.CL cs.AI cs.LG

Test-Time Safety Alignment

Baturay Saglam, Dionysis Kalogerias

2604.26147 2026-04-30 cs.CV cs.AI

A Data-Centric Framework for Intraoperative Fluorescence Lifetime Imaging for Glioma Surgical Guidance

Silvia Noble Anbunesan, Mohamed Abul Hassan, Jinyi Qi, Lisanne Kraft, Han Sung Lee, Orin Bloch, Laura Marcu

2604.26138 2026-04-30 cs.CV

MixerCA: An Efficient and Accurate Model for High-Performance Hyperspectral Image Classification

Mohammed Q. Alkhatib, Ali Jamali

Comments Preprint accepted for publication in "Remote Sensing Applications: Society and Environment" Journal

2604.26133 2026-04-30 cs.LG

Spatially-constrained clustering of geospatial features for heat vulnerability assessment of favelas in Rio de Janeiro

Baptiste Clemence, Thomas Hallopeau, Vanderlei Pascoal De Matos, Laurent Demagistri, Joris Guerin

Comments Workshop Publication (ICLR ML4RS 2026)

2604.26130 2026-04-30 cs.LG cs.AI

reward-lens: A Mechanistic Interpretability Library for Reward Models

Mohammed Suhail B Nadaf

Comments 30 pages, 5 figures, 9 tables, including appendix. Library available at https://github.com/suhailnadaf509/reward-lens (pip install reward-lens)

2604.26120 2026-04-30 cs.AI

Hierarchical Multi-Persona Induction from User Behavioral Logs: Learning Evidence-Grounded and Truthful Personas

Nayoung Choi, Haeyu Jeong, Changbong Kim, Hongjun Lim, Jinho D. Choi

2604.26116 2026-04-30 cs.CV cs.LG

Sample Selection Using Multi-Task Autoencoders in Federated Learning with Non-IID Data

Emre Ardıç, Yakup Genç

Comments Published in Engineering Science and Technology, an International Journal, 61 (2025), 101920. DOI: https://doi.org/10.1016/j.jestch.2024.101920. Code: https://github.com/eardic/FL_DPQS

Details

DOI: 10.1016/j.jestch.2024.101920
Journal ref: Engineering Science and Technology, an International Journal, 61 (2025), 101920

English abstract

Federated learning is a machine learning paradigm in which multiple devices collaboratively train a model under the supervision of a central server while ensuring data privacy. However, its performance is often hindered by redundant, malicious, or abnormal samples, leading to model degradation and inefficiency. To overcome these issues, we propose novel sample selection methods for image classification, employing a multitask autoencoder to estimate sample contributions through loss and feature analysis. Our approach incorporates unsupervised outlier detection, using one-class support vector machine (OCSVM), isolation forest (IF), and adaptive loss threshold (AT) methods managed by a central server to filter noisy samples on clients. We also propose a multi-class deep support vector data description (SVDD) loss controlled by a central server to enhance feature-based sample selection. We validate our methods on CIFAR10 and MNIST datasets across varying numbers of clients, non-IID distributions, and noise levels up to 40%. The results show significant accuracy improvements with loss-based sample selection, achieving gains of up to 7.02% on CIFAR10 with OCSVM and 1.83% on MNIST with AT. Additionally, our federated SVDD loss further improves feature-based sample selection, yielding accuracy gains of up to 0.99% on CIFAR10 with OCSVM. These results show the effectiveness of our methods in improving model accuracy across various client counts and noise conditions.
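
To make the idea of loss-based sample filtering concrete, here is a minimal sketch of a loss-threshold filter. The abstract does not define its adaptive loss threshold (AT), so the rule used here (keep samples whose loss is at most the mean plus a multiple of the standard deviation) is purely an assumption, and `filter_by_loss` and `num_std` are hypothetical names. The sketch covers neither the OCSVM or isolation forest variants nor the federated coordination by the server.

```python
import statistics


def filter_by_loss(losses, num_std=1.0):
    """Return (indices to keep, threshold) for a list of per-sample losses.

    Assumed rule: threshold = mean + num_std * population standard deviation.
    The paper's actual adaptive threshold may differ.
    """
    mean = statistics.fmean(losses)
    std = statistics.pstdev(losses)
    threshold = mean + num_std * std
    keep = [i for i, loss in enumerate(losses) if loss <= threshold]
    return keep, threshold


# Toy per-sample reconstruction losses; the last one looks like a noisy sample.
losses = [0.10, 0.20, 0.15, 0.12, 3.00]
keep, threshold = filter_by_loss(losses)
```

In this toy run the one high-loss sample falls above the threshold and is dropped, while the four low-loss samples are kept.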

2604.26106 2026-04-30 cs.AI

Evaluating Strategic Reasoning in Forecasting Agents

Tom Liptay, Dan Schwarz, Rafael Poyiadzi, Jack Wildman, Nikos I. Bosse

2604.26097 2026-04-30 cs.LG cs.AI cs.GR

Momentum-Conserving Graph Neural Networks for Deformable Objects

Jiahong Wang, Logan Numerow, Stelian Coros, Christian Theobalt, Vahid Babaei, Bernhard Thomaszewski

Comments Accepted to 3DV 2026

2604.26095 2026-04-30 cs.AI

Distill-Belief: Closed-Loop Inverse Source Localization and Characterization in Physical Fields

Yiwei Shi, Zixing Song, Mengyue Yang, Cunjia Liu, Weiru Liu

2604.26091 2026-04-30 cs.AI cs.CE cs.MA

Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital

T. J. Barton, Chris Constantakis, Patti Hauseman, Annie Mous, Alaska Hoffman, Brian Bergeron, Hunter Goodreau

Comments 18 pages, 6 figures. Public onchain dashboard and supporting documentation linked in paper

2604.26084 2026-04-30 cs.CV cs.AI cs.RO

FruitProM-V2: Robust Probabilistic Maturity Estimation and Detection of Fruits and Vegetables

Rahul Harsha Cheppally, Sidharth Rai, Sudan Baral, Benjamin Vail, Ajay Sharda

2604.26078 2026-04-30 cs.LG

PPG-Based Affect Recognition with Long-Range Deep Models: A Measurement-Driven Comparison of CNN, Transformer, and Mamba Architectures

Karim Alghoul, Hussein Al Osman, Abdulmotaleb El Saddik

2604.26073 2026-04-30 cs.LG cs.AI cs.SY eess.SY

Privacy-Preserving Federated Learning Framework for Distributed Chemical Process Optimization

Teetat Pipattaratonchai, Aueaphum Aueawatthanaphisut

Comments 10 pages, 5 figures, 2 tables, 17 equations

2604.26067 2026-04-30 cs.CV

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments

Zaid Nasser, Mikhail Iumanov, Tianhao Li, Maxim Popov, Jaafar Mahmoud, Sergey Kolyubin

2604.26065 2026-04-30 cs.RO

FlowS: One-Step Motion Prediction via Local Transport Conditioning

Leandro Di Bella, Adrian Munteanu, Bruno Cornelis

Comments 8 pages

2604.26051 2026-04-30 cs.CV cs.AI

Evaluating the Alignment Between GeoAI Explanations and Domain Knowledge in Satellite-Based Flood Mapping

Hyunho Lee, Wenwen Li

Comments 21 pages, 6 figures, 5 tables

2604.26048 2026-04-30 cs.CL

BioGraphletQA: Knowledge-Anchored Generation of Complex QA Datasets

Richard A. A. Jonker, Bárbara Maria Ribeiro de Abreu Martins, Sérgio Matos

Comments 15 pages, 7 figures, conference (ECIR)

Details

DOI: 10.1007/978-3-032-21321-1_62
Journal ref: Advances in Information Retrieval: 48th European Conference on Information Retrieval, ECIR 2026, Delft, The Netherlands, March 29 - April 2, 2026, Proceedings, Part IV

English abstract

This paper presents a principled and scalable framework for systematically generating complex Question Answering (QA) data. At the core of this framework is a graphlet-anchored generation process, where small subgraphs from a Knowledge Graph (KG) are used in a structured prompt to control the complexity and ensure the factual grounding of questions generated by Large Language Models. The first instantiation of this framework is BioGraphletQA, a new biomedical KGQA dataset of 119,856 QA pairs. Each entry is grounded in a graphlet of up to five nodes from the OREGANO KG, with most of the pairs being enriched with relevant document snippets from PubMed. First, we demonstrate the framework's value and the dataset's quality through evaluation by a domain expert on 106 QA pairs, confirming the high scientific validity and complexity of the generated data. Second, we establish its practical utility by showing that augmenting downstream benchmarks with our data improves accuracy on PubMedQA from 49.2% to 68.5% in a low-resource setting, and on MedQA from a 41.4% baseline to 44.8% in a full-resource setting. Our framework provides a robust and generalizable solution for creating critical resources to advance complex QA tasks, including MCQA and KGQA. All resources supporting this work, including the dataset (https://zenodo.org/records/17381119) and framework code (https://github.com/ieeta-pt/BioGraphletQA), are publicly available to facilitate use, reproducibility and extension.
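
The graphlet-anchored idea (take a small connected subgraph, then place its facts in a structured prompt) can be sketched roughly as follows. This is an illustration under stated assumptions, not the BioGraphletQA code: the breadth-first sampling, the prompt wording, the function names `sample_graphlet` and `build_prompt`, and the toy triples are all hypothetical and are not drawn from the OREGANO KG.

```python
from collections import deque


def sample_graphlet(triples, start, max_nodes=5):
    """Grow a connected subgraph of at most max_nodes nodes by breadth-first search.

    triples: list of (head, relation, tail) tuples. Returns the selected triples.
    """
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((head, relation, tail))
        adjacency.setdefault(tail, []).append((head, relation, tail))

    nodes = {start}
    edges = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for head, relation, tail in adjacency.get(node, []):
            other = tail if head == node else head
            if other not in nodes:
                if len(nodes) >= max_nodes:
                    continue  # node budget exhausted: skip this edge
                nodes.add(other)
                queue.append(other)
            if (head, relation, tail) not in edges:
                edges.append((head, relation, tail))
    return edges


def build_prompt(edges):
    """Format the graphlet as a structured prompt (wording is an assumption)."""
    facts = "\n".join(f"- {h} --{r}--> {t}" for h, r, t in edges)
    return "Write one question whose answer requires all of these facts:\n" + facts


# Toy knowledge graph for illustration only.
triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "inhibits", "COX-1"),
    ("COX-1", "is_a", "enzyme"),
    ("ibuprofen", "inhibits", "COX-1"),
]
edges = sample_graphlet(triples, "aspirin", max_nodes=3)
prompt = build_prompt(edges)
```

Capping `max_nodes` is what bounds question complexity in this sketch; the abstract reports graphlets of up to five nodes.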

2604.26039 2026-04-30 cs.LG cs.AI cs.DC

RaMP: Runtime-Aware Megakernel Polymorphism for Mixture-of-Experts

Vyom Sharma, Debajyoti Datta

Comments 10 pages, 8 figures, 9 tables. Preprint