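arXivDaily: a daily academic digest synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.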

2604.21199 2026-05-04 cs.LG cs.CV

ARFBench: Benchmarking Time Series Question Answering Ability for Software Incident Response

Stephan Xie, Ben Cohen, Mononito Goswami, Junhong Shen, Emaad Khwaja, Chenghao Liu, David Asker, Othmane Abou-Amal, Ameet Talwalkar

Comments Updated author affiliation

Abstract

Time series question-answering (TSQA), in which we ask natural language questions to infer and reason about properties of time series, is a promising yet underexplored capability of foundation models. In this work, we present ARFBench, a TSQA benchmark that evaluates the understanding of multimodal foundation models (FMs) on time series anomalies prevalent in software incident data. ARFBench consists of 750 questions across 142 time series and 5.38M data points from 63 production incidents sourced exclusively from internal telemetry at Datadog. We evaluate leading proprietary and open-source LLMs, VLMs, and time series FMs and observe that frontier VLMs perform markedly better than existing baselines; the leading model (GPT-5) achieves 62.7% accuracy and 51.9% F1. We next demonstrate the promise of specialized multimodal approaches. We develop a novel TSFM + VLM hybrid prototype, post-trained on a small set of synthetic and real data, which yields overall F1 and accuracy comparable to those of frontier models. Lastly, we find that models and human domain experts exhibit complementary strengths. We define a model-expert oracle, a best-of-2 oracle selector over model and expert answers, yielding 82.8% F1 and 87.2% accuracy and establishing a new superhuman frontier for future TSQA models. The benchmark is available at https://huggingface.co/datasets/Datadog/ARFBench.
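The best-of-2 oracle selector mentioned in the abstract can be illustrated with a minimal sketch. It assumes each question has a single label and that the oracle counts a question as correct when either the model or the expert answers it correctly; the abstract does not spell out the answer format or how F1 is computed, so the function name `oracle_accuracy` and the toy answers below are hypothetical and cover accuracy only.

```python
from typing import Sequence


def oracle_accuracy(
    model_preds: Sequence[str],
    expert_preds: Sequence[str],
    labels: Sequence[str],
) -> float:
    """Accuracy of a best-of-2 oracle over model and expert answers.

    Assumption: a question counts as correct if at least one of the
    two answers matches the label.
    """
    if not (len(model_preds) == len(expert_preds) == len(labels)):
        raise ValueError("all inputs must have the same length")
    if not labels:
        raise ValueError("at least one question is required")
    hits = sum(
        1
        for m, e, y in zip(model_preds, expert_preds, labels)
        if m == y or e == y
    )
    return hits / len(labels)


# Toy data (made up): model and expert are each right on 2 of 4 questions,
# but on different ones, so the oracle is right on 3 of 4.
labels = ["A", "B", "C", "D"]
model = ["A", "B", "A", "A"]
expert = ["A", "C", "C", "A"]
print(oracle_accuracy(model, expert, labels))
```

The oracle only exceeds both individual scores when the two answer sets err on different questions, which is the sense in which the abstract calls their strengths complementary.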

2604.19652 2026-05-04 cs.SD cs.AI

Environmental Sound Deepfake Detection Using Deep-Learning Framework

Lam Pham, Khoi Vu, Dat Tran, Phat Lam, Vu Nguyen, David Fischinger, Son Le

2604.19098 2026-05-04 cs.CL cs.AI cs.LG

SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning

Rania Elbadry, Sarfraz Ahmad, Ahmed Heakl, Dani Bouch, Momina Ahsan, Muhra AlMahri, Marwa Elsaid khalil, Yuxia Wang, Salem Lahlou, Sophia Ananiadou, Veselin Stoyanov, Jimin Huang, Xueqing Peng, Preslav Nakov, Zhuohan Xie

Comments 29 pages

2604.18239 2026-05-04 cs.LG cs.AI

Towards Disentangled Preference Optimization Dynamics: Suppress the Loser, Preserve the Winner

Wei Chen, Yubing Wu, Junmei Yang, Delu Zeng, Qibin Zhao, John Paisley, Min Chen, Zhou Wang

2604.17465 2026-05-04 cs.AI

Language models recognize dropout and Gaussian noise applied to their activations

Damiano Fornasiere, Mirko Bronzi, Spencer Kitts, Alessandro Palmas, Yoshua Bengio, Oliver Richardson

2604.17450 2026-05-04 cs.AI

Compiling Deterministic Structure into SLM Harnesses

Zan Kai Chong, Hiroyuki Ohsaki, Bryan Ng

2604.17423 2026-05-04 cs.LG

A unified convergence theory for adaptive first-order methods in the nonconvex case, including AdaNorm, full and diagonal AdaGrad, Shampoo and Muo

S. Gratton, Ph. L. Toint

2604.16243 2026-05-04 cs.CV

Find, Fix, Reason: Context Repair for Video Reasoning

Haojian Huang, Chuanyu Qin, Yinchuan Li, Yingcong Chen

Comments 22 pages, 7 figures, 17 tables. Accepted by ICML 2026

2604.15830 2026-05-04 cs.LG

Placing Puzzle Pieces Where They Matter: A Question Augmentation Framework for Reinforcement Learning

Yangyi Fang, Jiaye Lin, Xiaoliang Fu, Cong Qin, Haolin Shi

2604.10766 2026-05-04 cs.CV

At FullTilt: Real-Time Open-Set 3D Macromolecule Detection Directly from Tilted 2D Projections

Ming-Yang Ho, Alberto Bartesaghi

2604.10217 2026-05-04 cs.CV

Are Pretrained Image Matchers Good Enough for SAR-Optical Satellite Registration?

Isaac Corley, Alex Stoken, Gabriele Berton

Comments CVPR 2026 Image Matching Workshop

2604.07669 2026-05-04 cs.LG cs.AI cs.CE

Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization

Tao Li, Kaiyuan Hou, Tuan Vinh, Monika Raj, Zhichun Guo, Carl Yang

2604.04465 2026-05-04 cs.AI

The Topology of Multimodal Fusion: Why Current Architectures Fail at Creative Cognition

Xiujiang Tan

Comments Expanded 11 technical improvements; 5 reference corrections; Appendix B pseudocode added. ~43 pages, 5 figures. Chinese philosophical terms romanized. Companion monograph available separately

2604.01707 2026-05-04 cs.CL cs.DB

Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

Yanchen Wu, Tenghui Lin, Yingli Zhou, Fangyuan Zhang, Qintian Guo, Xun Zhou, Sibo Wang, Xilin Liu, Yuchi Ma, Yixiang Fang

2603.25719 2026-05-04 cs.AI cs.AR cs.LG

Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?

Abhishek Bhandwaldar, Mihir Choudhury, Ruchir Puri, Akash Srivastava

2603.22908 2026-05-04 cs.CV cs.LG

Adaptive Dual-Teacher Distillation with Subnetwork Rectification for Bridging Semantic Gaps in Black-Box Domain Adaptation

Zhe Zhang, Jing Li, Wanli Xue, Xu Cheng, Jianhua Zhang, Qinghua Hu, Shengyong Chen

Comments Under Review

2603.22285 2026-05-04 cs.CV

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

Ruoliu Yang, Chu Wu, Caifeng Shan, Ran He, Chaoyou Fu

2603.19397 2026-05-04 cs.LG

Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning

Xueqiao Peng, Andrew Perrault

2603.15949 2026-05-04 cs.CL

BanglaSocialBench: A Benchmark for Evaluating Sociopragmatic and Cultural Alignment of LLMs in Bangladeshi Social Interaction

Tanvir Ahmed Sijan, S. M Golam Rifat, Pankaj Chowdhury Partha, Md. Tanjeed Islam, Md. Musfique Anwar

Comments Accepted at ACL SRW 2026

2603.08965 2026-05-04 cs.LG cs.AI

Semantic Level of Detail for Knowledge Graphs: Discovering Abstraction Boundaries via Spectral Heat Diffusion

Edward Izgorodin

Comments v2: extended companion of GRAAI 2026 workshop paper; full proofs of Lemmas A.1-A.2 (Frechet-mean and heat-kernel Lipschitz constants, corrected) in Appendix A; Proposition 1(i) empirical anchor (new Figure 1); 50-seed ablation with BCa CIs and Wilcoxon tests (Tables 3-4, p<10^-15); WordNet retained (tau=0.79). 21 pages, 6 figures, 4 tables

Abstract

Graph-structured knowledge systems -- from knowledge graphs to GraphRAG pipelines -- organize information into hierarchical communities, yet lack a principled mechanism for continuous resolution control: where do the qualitative boundaries between abstraction levels lie, and how should an agent navigate them? Current approaches rely on discrete community detection with manually tuned resolution parameters (e.g., Leiden $γ$), offering no continuous zoom and no formal guarantees. We introduce Semantic Level of Detail (SLoD), a framework that addresses both problems by defining a continuous zoom operator via heat kernel diffusion on a graph Laplacian whose kNN structure is induced by a Poincare-ball embedding. We prove hierarchical coherence in the tree limit (exact tree with Sarkar embedding), with bounded approximation error, and demonstrate consistent boundary-detection behaviour on noisy hierarchies; spectral gaps in the graph Laplacian then induce emergent scale boundaries -- scales where the representation undergoes qualitative transitions -- detectable without manual resolution tuning. On synthetic hierarchies (HSBM, 1024 nodes), spectral clustering at the BoundaryScan-detected scale recovers planted levels, with macro ARI saturating at 1.00 in the high-SNR regime (50-seed median) and meso ARI reaching 0.89 [0.86, 0.92] at r=200. On the full WordNet noun hierarchy (82K synsets), using 100 stratified leaf queries, detected boundaries align with true taxonomic depth ($τ= 0.79$), demonstrating meaningful abstraction-level discovery in real-world knowledge graphs without resolution-parameter tuning. The composite weights, MAD threshold, and kNN-parameter rule ($k = \max(10, \min(\lfloor\sqrt{N}\rfloor, 50))$) use defaults that transferred unchanged between HSBM and WordNet; their behaviour on graphs with implicit or qualitatively different hierarchical structure is open.
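As a worked example of the kNN-parameter rule quoted in the abstract, $k = \max(10, \min(\lfloor\sqrt{N}\rfloor, 50))$, the sketch below evaluates it at the two graph sizes the abstract mentions. The function name `knn_k` is hypothetical, and 82,000 is only a stand-in for "82K synsets", since the exact count is not given.

```python
import math


def knn_k(num_nodes: int) -> int:
    """k = max(10, min(floor(sqrt(N)), 50)), as stated in the abstract."""
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    # math.isqrt returns the exact integer floor of the square root.
    return max(10, min(math.isqrt(num_nodes), 50))


print(knn_k(1024))    # HSBM graph size from the abstract
print(knn_k(82_000))  # approximate WordNet size (assumed value for "82K")
```

Under this rule the HSBM graph sits in the square-root regime (k = 32), while any graph with more than 2,500 nodes, including WordNet, is clamped to the upper bound of 50.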

2603.08032 2026-05-04 cs.LG cs.AI

GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables

Zhengyu Li, Xiangfei Qiu, Yuhan Zhu, Xingjian Wu, Jilin Hu, Chenjuan Guo, Bin Yang

2603.03565 2026-05-04 cs.AI cs.CL cs.LG

Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

Alejandro Breen Herrera, Aayush Sheth, Steven G. Xu, Zhucheng Zhan, Charles Wright, Marcus Yearwood, Hongtai Wei, Sudeep Das, Danny Nightingale, Meg Watson, Charles Pollnow

2603.02727 2026-05-04 cs.CV

Gated Differential Linear Attention: A Linear-Time Decoder for High-Fidelity Medical Segmentation

Hongbo Zheng, Afshin Bozorgpour, Dorit Merhof, Minjia Zhang

2603.02641 2026-05-04 cs.SD

Rethinking Training Targets, Architectures and Data Quality for Universal Speech Enhancement

Szu-Wei Fu, Rong Chao, Xuesong Yang, Sung-Feng Huang, Ryandhimas E. Zezario, Rauf Nasretdinov, Ante Jukić, Yu Tsao, Yu-Chiang Frank Wang

2603.02275 2026-05-04 cs.LG stat.AP stat.ML

A Comparative Study of UMAP and Other Dimensionality Reduction Methods

Guanzhe Zhang, Shanshan Ding, Zhezhen Jin

Comments 31 pages, 4 figures

2603.01267 2026-05-04 cs.RO cs.CV

Certifiable Factor Graph Optimization

Zhexin Xu, Nikolas R. Sanderson, Hanna Jiamei Zhang, David M. Rosen

Comments 20 pages, 3 figures

2602.18757 2026-05-04 cs.CV

Driving with A Thousand Faces: A Benchmark for Closed-Loop Personalized End-to-End Autonomous Driving

Xiaoru Dong, Ruiqin Li, Xiao Han, Zhenxuan Wu, Jiamin Wang, Jian Chen, Qi Jiang, SM Yiu, Xinge Zhu, Yuexin Ma

2602.14474 2026-05-04 cs.LG

One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise

Amith Bhat, Haipeng Luo, Aadirupa Saha

Abstract

We study the $K$-armed multi-armed bandit (MAB) problem with $M$ heterogeneous data sources, each exhibiting an unknown and distinct noise variance $\{σ_j^2\}_{j=1}^M$. The learner's objective is standard MAB regret minimization, with the additional complexity of adaptively selecting which data source to query at each round. We propose Source-Optimistic Adaptive Regret minimization (SOAR), a novel algorithm that quickly prunes high-variance sources using sharp variance-concentration bounds, followed by a "balanced min-max LCB-UCB approach" that seamlessly integrates the parallel tasks of identifying the best arm and the optimal (minimum-variance) data source. Our analysis shows that SOAR achieves an instance-dependent regret bound of $\tilde{O}\left({σ^*}^2\sum_{i=2}^K \frac{\log T}{Δ_i} + \sqrt{K \sum_{j=1}^M σ_j^2}\right)$, up to preprocessing costs depending only on problem parameters, where ${σ^*}^2 := \min_j σ_j^2$ is the minimum source variance and $Δ_i$ denotes the suboptimality gap of the $i$-th arm. This result is surprising: despite lacking prior knowledge of the minimum-variance source among $M$ alternatives, SOAR attains the optimal instance-dependent regret of standard single-source MAB with variance ${σ^*}^2$, while incurring only a small (and unavoidable) additive cost of $\tilde O(\sqrt{K \sum_{j=1}^M σ_j^2})$ for identifying the optimal (minimum-variance) source. Our theoretical bounds represent a significant improvement over some proposed baselines, e.g. Uniform UCB or Explore-then-Commit UCB, which could potentially suffer regret scaling with $σ_{\max}^2$ in place of ${σ^*}^2$, a gap that can be arbitrarily large when $σ_{\max} \gg σ^*$. Experiments on multiple synthetic problem instances and the real-world MovieLens 25M dataset demonstrate the superior performance of SOAR over the baselines.
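To make the gap between ${σ^*}^2$ and $σ_{\max}^2$ concrete, the following sketch evaluates the two terms of the stated bound with all constants and the logarithmic factors hidden by $\tilde{O}$ dropped. It is arithmetic on the formula only, not an implementation of SOAR; the function names and the variances, gaps, and horizon used here are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def soar_style_bound(
    variances: Sequence[float], gaps: Sequence[float], horizon: int
) -> Tuple[float, float]:
    """Return (leading term, additive term) of the bound in the abstract.

    Constants and the extra log factors hidden by the O-tilde are dropped.
    `gaps` holds the suboptimality gaps of arms 2..K, so K = len(gaps) + 1.
    """
    k_arms = len(gaps) + 1
    leading = min(variances) * sum(math.log(horizon) / g for g in gaps)
    additive = math.sqrt(k_arms * sum(variances))
    return leading, additive


def worst_source_leading_term(
    variances: Sequence[float], gaps: Sequence[float], horizon: int
) -> float:
    """Same leading term with the maximum variance in place of the minimum."""
    return max(variances) * sum(math.log(horizon) / g for g in gaps)


# Illustrative instance (made-up numbers): M = 3 sources, K = 3 arms.
variances = [0.01, 1.0, 4.0]
gaps = [0.1, 0.2]
horizon = 10_000

leading, additive = soar_style_bound(variances, gaps, horizon)
worst = worst_source_leading_term(variances, gaps, horizon)
print(f"leading term with min variance: {leading:.3f}")
print(f"additive source-identification term: {additive:.3f}")
print(f"leading term with max variance: {worst:.3f}")
```

In this instance the two leading terms differ by the ratio $σ_{\max}^2 / {σ^*}^2 = 400$, which is the gap the abstract describes as potentially arbitrarily large.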

2602.13933 2026-05-04 cs.AI

HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling

Xiaochen Zhao, Kaikai Wang, Xiaowen Zhang, Chen Yao, Aili Wang

2602.07744 2026-05-04 cs.LG

Riemannian MeanFlow

Dongyeop Woo, Marta Skreta, Seonghyun Park, Kirill Neklyudov, Sungsoo Ahn