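arXivDaily daily academic digest: syncs the complete arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.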

2603.04751 2026-04-28 cs.AI

Evaluating the Search Agent in a Parallel World

Jiawei Chen, Xintian Shen, Lihao Zheng, Lifu Mu, Haoyi Sun, Ning Mao, Hao Ma, Tao Wei, Pan Zhou, Kun Zhan

Comments https://github.com/TIMMY-CHAN/Mind-ParaWorld

Abstract

Integrating web search tools has significantly extended the capability of LLMs to address open-world, real-time, and long-tail problems. However, evaluating these Search Agents presents formidable challenges. First, constructing high-quality deep search benchmarks is prohibitively expensive, while unverified synthetic data often suffers from unreliable sources. Second, static benchmarks face dynamic obsolescence: as internet information evolves, complex queries requiring deep research often degrade into simple retrieval tasks due to increased popularity, and ground truths become outdated due to temporal shifts. Third, attribution ambiguity confounds evaluation, as an agent's performance is often dominated by its parametric memory rather than its actual search and reasoning capabilities. Finally, reliance on specific commercial search engines introduces variability that hampers reproducibility. To address these issues, we propose Mind-ParaWorld (MPW), a novel framework for evaluating Search Agents in a Parallel World. Specifically, MPW samples real-world entity names to synthesize future scenarios and questions situated beyond the model's knowledge cutoff. A ParaWorld Law Model then constructs a set of indivisible Atomic Facts and a unique ground truth for each question. During evaluation, instead of retrieving real-world results, the agent interacts with a ParaWorld Engine Model that dynamically generates search engine results pages (SERPs) grounded in these inviolable Atomic Facts. We release MPW-Bench, an interactive benchmark spanning 19 domains with 1,608 instances. Experiments across three evaluation settings show that, while search agents are strong at evidence synthesis given complete information, their performance is limited not only by evidence collection and coverage in unfamiliar search environments, but also by unreliable judgments of evidence sufficiency and of when to stop searching.
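The evaluation loop described in this abstract can be illustrated with a toy sketch. Everything below is hypothetical and is not the MPW code or API: the `AtomicFact` and `ToyParaWorldEngine` classes, the keyword-overlap matching, and the `fact_coverage` score are assumptions made for illustration. In MPW the engine is a model that generates SERPs; here a fixed lookup over atomic facts stands in for it, and coverage is taken to be the fraction of a question's required facts that the agent actually retrieved.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicFact:
    """One indivisible fact of the (hypothetical) parallel world."""
    fact_id: str
    keywords: frozenset[str]
    text: str


class ToyParaWorldEngine:
    """Hypothetical stand-in for the ParaWorld Engine Model.

    Instead of a model generating SERPs, it returns the stored atomic facts whose
    keywords overlap the query, so every "result" is grounded in a fixed fact set.
    """

    def __init__(self, facts: list[AtomicFact]):
        self.facts = list(facts)

    def search(self, query: str, top_k: int = 3) -> list[AtomicFact]:
        terms = set(query.lower().split())
        # Score each fact by keyword overlap; sorted() is stable, so ties keep fact order.
        scored = sorted(((len(terms & f.keywords), f) for f in self.facts),
                        key=lambda pair: -pair[0])
        return [f for score, f in scored if score > 0][:top_k]


def fact_coverage(retrieved_ids: set[str], required_ids: set[str]) -> float:
    """Fraction of the facts a question needs that the agent actually retrieved."""
    if not required_ids:
        return 1.0
    return len(retrieved_ids & required_ids) / len(required_ids)


# Toy future scenario (all facts invented for illustration); the question needs f2 and f3.
facts = [
    AtomicFact("f1", frozenset({"acme", "ceo"}), "In 2031, Acme's CEO is Dana Li."),
    AtomicFact("f2", frozenset({"acme", "headquarters"}), "Acme moved its headquarters to Oslo in 2030."),
    AtomicFact("f3", frozenset({"oslo", "population"}), "Oslo's population reached 800,000 in 2031."),
]
engine = ToyParaWorldEngine(facts)

# The "agent" issues two queries and then stops, before looking up f3.
queries = ["acme ceo", "acme headquarters"]
retrieved = {f.fact_id for q in queries for f in engine.search(q)}
print(sorted(retrieved), fact_coverage(retrieved, {"f2", "f3"}))  # ['f1', 'f2'] 0.5
```

In this toy run the agent never retrieves `f3`, so coverage is 0.5; in miniature, this is the kind of evidence-coverage and when-to-stop failure the abstract reports. Because the facts are fixed, the same run is fully reproducible, which is the property MPW gains by replacing a live search engine.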

2602.17315 2026-04-28 cs.LG cs.AI

Flickering Multi-Armed Bandits

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

2602.15853 2026-04-28 cs.CL cs.AI

A Lightweight Explainable Guardrail for Prompt Safety

Md Asiful Islam, Mihai Surdeanu

2602.15846 2026-04-28 cs.CL

Gated Tree Cross-Attention for Checkpoint-Compatible Syntax Injection in Decoder-Only LLMs

Xinyu Gao, Shaonan Wang, Nai Ding

Comments ACL 2026 Main Conference

2602.14222 2026-04-28 cs.RO cs.SY eess.SY math.OC

Muscle Coactivation in the Sky: Geometry and Pareto Optimality of Energy vs. Aerodynamic Promptness and Multirotors as Variable Stiffness Actuators

Antonio Franchi

Comments Accepted for IEEE ICUAS 2026

2602.12971 2026-04-28 cs.RO

INHerit-SG: Incremental Hierarchical Semantic Scene Graphs with RAG-Style Retrieval

YukTungSamuel Fang, Zhikang Shi, Jiabin Qiu, Zixuan Chen, Jieqi Shi, Hao Xu, Jing Huo, Yang Gao

2602.12134 2026-04-28 cs.AI cs.HC

Value Alignment Tax: Measuring Value Trade-offs in LLM Alignment

Jiajun Chen, Hua Shen

Comments Preprint. Under review. 20 pages, 13 figures

2602.12120 2026-04-28 cs.AI

Forecasting Commencing Enrolments Under Data Sparsity: A Zero-Shot Time Series Foundation Models Framework for Higher Education Planning

Jittarin Jetwiriyanon, Teo Susnjak, Surangika Ranathunga

Comments 30 pages, 5 figures, 3 tables

2602.11079 2026-04-28 cs.LG cs.AI

Probe-Based Data Attribution: Discovering and Mitigating Undesirable Behaviors in LLM Post-Training

Frank Xiao, Santiago Aranguri

2602.10298 2026-04-28 cs.CL

On Emergent Social World Models -- Evidence for Functional Integration of Theory of Mind and Pragmatic Reasoning in Language Models

Polina Tsvilodub, Jan-Felix Klumpp, Amir Mohammadpour, Jennifer Hu, Michael Franke

Comments 39 pages, 20 figures, accepted to ACL 2026 Main Conference

2602.04749 2026-04-28 cs.CV

Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation

Buddhi Wijenayake, Nichula Wasalathilake, Roshan Godaliyadda, Vijitha Herath, Parakrama Ekanayake, Vishal M. Patel

Comments Accepted for publication at the 2026 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

2602.00981 2026-04-28 cs.CL cs.AI

MedSpeak: A Knowledge Graph-Aided ASR Error Correction Framework for Spoken Medical QA

Yutong Song, Shiva Shrestha, Chenhan Lyu, Elahe Khatibi, Pengfei Zhang, Honghui Xu, Nikil Dutt, Amir Rahmani

2601.17536 2026-04-28 cs.CV cs.LG

OTI: A Model-free and Visually Interpretable Measure of Image Attackability

Jiaming Liang, Haowei Liu, Chi-Man Pun

DOI: 10.1609/aaai.v40i9.37615
Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 40(9), 6826-6834, 2026

Abstract

Despite the tremendous success of neural networks, benign images can be corrupted by adversarial perturbations to deceive these models. Intriguingly, images differ in their attackability. Specifically, given an attack configuration, some images are easily corrupted, whereas others are more resistant. Evaluating image attackability has important applications in active learning, adversarial training, and attack enhancement. This has prompted growing interest in developing attackability measures. However, existing methods are scarce and suffer from two major limitations: (1) They rely on a model proxy to provide prior knowledge (e.g., gradients or minimal perturbation) to extract model-dependent image features. Unfortunately, in practice, many task-specific models are not readily accessible. (2) The extracted features characterizing image attackability lack visual interpretability, obscuring their direct relationship with the images. To address these limitations, we propose Object Texture Intensity (OTI), a model-free and visually interpretable measure that quantifies image attackability as the texture intensity of the image's semantic object. Theoretically, we describe the principles of OTI from the perspectives of decision boundaries as well as the mid- and high-frequency characteristics of adversarial perturbations. Comprehensive experiments demonstrate that OTI is effective and computationally efficient. In addition, OTI provides the adversarial machine learning community with a visual understanding of attackability.
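As a rough illustration of measuring attackability "as the texture intensity of the image's semantic object", the sketch below averages the magnitude of a high-pass (Laplacian) response over the pixels inside an object mask. This is an assumed proxy, not the paper's OTI formula: the choice of filter, the absence of normalization, and the helper names `laplacian` and `texture_intensity_proxy` are hypothetical, and the object mask is simply given rather than produced by a segmentation step.

```python
import numpy as np


def laplacian(gray: np.ndarray) -> np.ndarray:
    """4-neighbour discrete Laplacian with edge padding, used here as a simple high-pass filter."""
    p = np.pad(gray.astype(np.float64), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])


def texture_intensity_proxy(gray: np.ndarray, object_mask: np.ndarray) -> float:
    """Hypothetical proxy: mean |Laplacian| over the pixels of the semantic object.

    Not the paper's OTI formula; the filter and the lack of normalization are assumptions.
    """
    mask = object_mask.astype(bool)
    if not mask.any():
        raise ValueError("object_mask selects no pixels")
    return float(np.abs(laplacian(gray))[mask].mean())


# Two 8x8 grayscale images sharing the same (given) object mask.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
flat = np.full((8, 8), 0.5)                                    # untextured object
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)   # maximally textured object

print(texture_intensity_proxy(flat, mask))     # 0.0
print(texture_intensity_proxy(checker, mask))  # 4.0
```

The flat object scores 0 and the checkerboard object scores 4, so under this proxy the heavily textured image would be ranked as more attackable, in line with the abstract's claim. No model or gradient is involved, which is the "model-free" property; the precise OTI definition should be taken from the paper itself.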

2601.16046 2026-04-28 cs.RO cs.CV

DextER: Language-driven Dexterous Grasp Generation with Embodied Reasoning

Junha Lee, Eunha Park, Minsu Cho

Comments CVPR 2026, Project page: https://junha-l.github.io/dexter/

2601.13288 2026-04-28 cs.CL

A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification

Gonzalo Ariel Meyoyan, Luciano Del Corro

Comments Accepted to ACL 2026 (Main Conference)

2601.07476 2026-04-28 cs.RO cs.SE cs.SY eess.SY

NanoCockpit: Performance-optimized Application Framework for AI-based Autonomous Nanorobotics

Elia Cereda, Alessandro Giusti, Daniele Palossi

Comments Accepted for publication in the IEEE RA-P journal. GitHub repository: https://github.com/idsia-robotics/crazyflie-nanocockpit

2601.06352 2026-04-28 cs.AI

CARD: Cluster-level Adaptation with Reward-guided Decoding for Personalized Text Generation

Yutong Song, Jiang Wu, Weijia Zhang, Chengze Shen, Shaofan Yuan, Weitao Lu, Jian Wang, Yu Wang, Nikil Dutt, Amir M. Rahmani

2601.04919 2026-04-28 cs.AI cs.HC

What Students Ask, How a Generative AI Assistant Responds: Exploring Higher Education Students' Dialogues on Learning Analytics Feedback

Yildiz Uzun, Andrea Gauthier, Mutlu Cukurova

2601.02018 2026-04-28 cs.CV

Towards Any-Quality Image Segmentation via Generative and Adaptive Latent Space Enhancement

Guangqian Guo, Aixi Ren, Yong Guo, Xuehui Yu, Jiacheng Tian, Wenli Li, Chaowei Wang, Yaoxing Wang, Shan Gao

Comments Diffusion-based latent space enhancement helps improve the robustness of SAM

2601.00823 2026-04-28 cs.AI cs.IT cs.SY eess.SY math.IT

Energy-Aware Routing to Large Reasoning Models

Austin R. Ellis-Mohr, Max Hartman, Lav R. Varshney

2512.21648 2026-04-28 cs.LG cs.AI

Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search

Maximilian Weichart

2512.20164 2026-04-28 cs.CL cs.AI

AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications

Honglin Mu, Jinghao Liu, Kaiyang Wan, Rui Xing, Xiuying Chen, Timothy Baldwin, Wanxiang Che

2512.12844 2026-04-28 cs.LG cs.AI

Selective Conformal Risk Control

Yunpeng Xu, Wenge Guo, Zhi Wei

2512.10362 2026-04-28 cs.CV cs.AI

Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models

Woojun Jung, Jaehoon Go, Mingyu Jeon, Sunjae Yoon, Junyeong Kim

Comments Accepted to CVPR 2026 (Findings)

2512.08345 2026-04-28 cs.AI cs.CL cs.CY cs.MA

The High Cost of Incivility: Quantifying Interaction Inefficiency via Multi-Agent Monte Carlo Simulations

Benedikt Mangold

Comments 8 figures, 3 tables

2512.07868 2026-04-28 cs.LG cs.AI stat.ML

Bayesian Optimization for Function-Valued Responses under Min-Max Criteria

Pouya Ahadi, Reza Marzban, Ali Adibi, Kamran Paynabar

Comments 25 pages, 6 figures

2512.07834 2026-04-28 cs.CV

Voxify3D: Pixel Art Meets Volumetric Rendering

Yi-Chuan Huang, Jiewen Chan, Hao-Jen Chien, Yu-Lun Liu

Comments CVPR 2026. Project page: https://yichuanh.github.io/Voxify-3D/

2512.07538 2026-04-28 cs.CL

SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents

Michelle Wastl, Jannis Vamvas, Rico Sennrich

Comments 30 pages; v3 accepted to ACL Main (camera-ready)

2512.07371 2026-04-28 cs.RO cs.AI

ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning

Byung-ju Kim, Jinu Pahk, Chungwoo Lee, Jaejoon Kim, Jangha Lee, Theo Taeyeong Kim, Kyuhwan Shim, Jun Ki Lee, Byoung-Tak Zhang

Comments project page: https://project-espada.github.io/espada/

2512.04695 2026-04-28 cs.LG

TRINITY: An Evolved LLM Coordinator

Jinglue Xu, Qi Sun, Peter Schwendeman, Stefan Nielsen, Edoardo Cetin, Yujin Tang

Comments To appear at the 14th International Conference on Learning Representations (ICLR 2026)