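arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.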

2510.10913 2026-05-04 cs.CL

ADVICE: Answer-Dependent Verbalized Confidence Estimation

Ki Jung Seo, Sehun Lim, Taeuk Kim

Comments ACL 2026 Main

Abstract

Recent progress in large language models (LLMs) has enabled them to communicate their confidence in natural language, improving transparency and reliability. However, this expressiveness is often accompanied by systematic overconfidence, whose underlying causes remain poorly understood. In this work, we analyze the dynamics of verbalized confidence estimation and identify answer-independence -- the failure to condition confidence on the model's own answer -- as a primary driver of this behavior. To address this, we introduce ADVICE (Answer-Dependent Verbalized Confidence Estimation), a fine-tuning framework that promotes answer-grounded confidence estimation. Extensive experiments show that ADVICE substantially improves confidence calibration, while exhibiting strong generalization to unseen settings without degrading task performance. We further demonstrate that these gains stem from enhanced answer dependence, shedding light on the origins of overconfidence and enabling trustworthy confidence verbalization.

2510.09696 2026-05-04 cs.LG cs.AI

Vanishing Contributions: A Unified Framework for Smooth and Iterative Model Compression

Lorenzo Nikiforos, Luciano Prono, Charalampos Antoniadis, Fabio Pareschi, Riccardo Rovatti, Gianluca Setti

Comments Code available at https://github.com/foros15/vanishing-contributions

2510.07922 2026-05-04 cs.LG cs.DC

SketchGuard: Scaling Byzantine-Robust Decentralized Federated Learning via Sketch-Based Screening

Murtaza Rangwala, Farag Azzedin, Richard O. Sinnott, Rajkumar Buyya

Comments 11 pages, 3 figures, Code Available: https://doi.org/10.5281/zenodo.17223405

Abstract

Decentralized Federated Learning (DFL) enables privacy-preserving collaborative training without centralized servers but remains vulnerable to Byzantine attacks. Existing Byzantine-robust defenses are predicated on exchanging full, high-dimensional model vectors with every neighbor before filtering, an $O(d|\mathcal{N}_i|)$ communication cost incurred regardless of how many neighbors are ultimately rejected. This design choice is sustainable in small-scale experimental settings but becomes a fundamental barrier to deployment as network scale or model size grows. We propose SketchGuard, a framework that decouples Byzantine filtering from aggregation via sketch-based screening. Each client compresses its $d$-dimensional model to a $k$-dimensional Count Sketch ($k \ll d$), exchanges only sketches for neighbor screening, and fetches full models exclusively from accepted neighbors. This eliminates the pre-filtering communication waste of existing defenses: rejected Byzantine neighbors incur only $O(k)$ sketch cost rather than $O(d)$ full-model cost. Communication savings therefore scale with the Byzantine rejection rate: negligible extra overhead in benign conditions, rising to 50-70% total savings when 50-70% of neighbors are rejected. We prove convergence in both strongly convex and non-convex settings, establishing that Count Sketch's distance-preservation guarantee causes sketch-based filtering to deviate from full-precision filtering by at most a $(1+O(\epsilon))$ factor in the effective threshold, a gap that can be made arbitrarily small. Experiments across three non-IID federated benchmarks, five network topologies, and four attack types confirm that SketchGuard matches state-of-the-art robustness (mean TER deviation $\leq$0.5 percentage points) while reducing computation by up to 82%, with robustness remaining stable across compression ratios up to 13,000:1.
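
The screening idea in this abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the function names, the fixed distance threshold, the toy dimensions, and the noise levels are all assumptions made for illustration, and the paper's actual acceptance rule is not given in the abstract. It shows a standard Count Sketch (one bucket hash and one sign hash per coordinate), screens simulated neighbors by sketch distance, and compares the number of values exchanged against sending full models to every neighbor.

```python
import numpy as np

def make_count_sketch(d, k, seed=0):
    """Build a Count Sketch map from R^d to R^k (k << d).

    All clients must share the same seed so their sketches are comparable.
    """
    rng = np.random.default_rng(seed)
    bucket = rng.integers(0, k, size=d)      # hash h: coordinate -> bucket
    sign = rng.choice([-1.0, 1.0], size=d)   # hash s: coordinate -> +/-1

    def sketch(x):
        out = np.zeros(k)
        np.add.at(out, bucket, sign * x)     # out[h(i)] += s(i) * x[i]
        return out

    return sketch

def screen_neighbors(own_model, neighbor_models, sketch, threshold):
    """Accept neighbors whose sketch lies within `threshold` of our own.

    Hypothetical rule: a plain Euclidean distance cut-off on sketches.
    In a real deployment only the sketches would cross the network here.
    """
    own = sketch(own_model)
    accepted = []
    for name, model in neighbor_models.items():
        if np.linalg.norm(own - sketch(model)) <= threshold:
            accepted.append(name)
    return accepted

# Toy setup (all values are illustrative assumptions).
d, k = 10_000, 1_000
rng = np.random.default_rng(42)
own_model = rng.normal(size=d)
neighbors = {
    "honest_a": own_model + 0.1 * rng.normal(size=d),  # small deviation
    "byz_a": own_model + 5.0 * rng.normal(size=d),     # large deviation
    "honest_b": own_model + 0.1 * rng.normal(size=d),
    "byz_b": own_model + 5.0 * rng.normal(size=d),
}

sketch = make_count_sketch(d, k, seed=0)
accepted = screen_neighbors(own_model, neighbors, sketch, threshold=50.0)

# Values exchanged: sketches from everyone, full models only from accepted.
n = len(neighbors)
sketch_cost = k * n + d * len(accepted)
baseline_cost = d * n
savings = 1.0 - sketch_cost / baseline_cost

print("accepted:", accepted)
print(f"communication savings vs. full exchange: {savings:.0%}")
```

In this toy run the savings depend on both the rejection rate and the ratio k/d; with a much smaller k relative to d, the sketch overhead becomes negligible and the savings approach the fraction of rejected neighbors.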

2510.05950 2026-05-04 cs.AI

Training-Free Time Series Classification via In-Context Reasoning with LLM Agents

Songyuan Sui, Zihang Xu, Xia Hu

Comments 8 pages main content, 12 pages total including appendix, 1 figure

2510.05583 2026-05-04 cs.LG cs.DC

When Does Global Attention Help? A Unified Empirical Study on Atomistic Graph Learning

Arindam Chowdhury, Massimiliano Lupo Pasini

Comments 44 pages, 8 figures, 19 tables

DOI: 10.1186/s13321-026-01171-z
Journal ref: Journal of Cheminformatics (2026) 18:54

Abstract

Graph neural networks (GNNs) are widely used as surrogates for costly experiments and first-principles simulations to study the behavior of compounds at atomistic scale, and their architectural complexity is constantly increasing to enable the modeling of complex physics. While most recent GNNs combine more traditional message passing neural network (MPNN) layers to model short-range interactions with more advanced graph transformers (GTs) with global attention mechanisms to model long-range interactions, it is still unclear when global attention mechanisms provide real benefits over well-tuned MPNN layers due to inconsistent implementations, features, or hyperparameter tuning. We introduce the first unified, reproducible benchmarking framework - built on HydraGNN - that enables seamless switching among four controlled model classes: MPNN, MPNN with chemistry/topology encoders, GPS-style hybrids of MPNN with global attention, and fully fused local-global models with encoders. Using seven diverse open-source datasets for benchmarking across regression and classification tasks, we systematically isolate the contributions of message passing, global attention, and encoder-based feature augmentation. Our study shows that encoder-augmented MPNNs form a robust baseline, while fused local-global models yield the clearest benefits for properties governed by long-range interaction effects. We further quantify the accuracy-compute trade-offs of attention, reporting its overhead in memory. Together, these results establish the first controlled evaluation of global attention in atomistic graph learning and provide a reproducible testbed for future model development.

2510.04378 2026-05-04 cs.LG

Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models

Xinshuai Dong, Ignavier Ng, Haoyue Dai, Jiaqi Sun, Xiangchen Song, Peter Spirtes, Kun Zhang

2510.01948 2026-05-04 cs.CV

ClustViT: Clustering-based Token Merging for Semantic Segmentation

Fabio Montello, Ronja Güldenring, Lazaros Nalpantidis

Comments Submitted to IEEE

2510.00233 2026-05-04 cs.LG physics.flu-dyn

Differentiable Autoencoding Neural Operator for Interpretable and Integrable Latent Space Modeling

Siva Viknesh, Amirhossein Arzani

Abstract

Scientific machine learning has enabled the extraction of physical insights and data-driven modeling of high-dimensional spatiotemporal data, yet achieving physically interpretable latent representations and computationally efficient surrogates remains an open challenge. We propose the DIfferentiable Autoencoding Neural Operator - DIANO, an autoencoding neural operator framework that constructs visualizable coarse-grid latent spaces for both dimensionality and geometric reduction across varying spatial discretizations, with governing equations enforced directly within the latent space. Built upon neural operators, DIANO achieves this through an encoding neural operator that spatially coarsens the high-dimensional input functions into the latent representation, and a decoding neural operator that reconstructs the original inputs via spatial refinement. We assess DIANO's latent representation and performance against baselines, including the Convolutional Neural Operator and standard autoencoders. Furthermore, a fully differentiable partial differential equation (PDE) solver is integrated as the sole input-output functional mapping operator within the latent space, enabling end-to-end training with governing physics prescribed a priori through parametric PDEs. Various PDE formulations are investigated, including the 2D unsteady advection-diffusion and the 3D Pressure--Poisson equation, revealing that the fidelity of the embedded PDE relative to the true physics governs the learned latent representation and reconstruction accuracy. Benchmark problems include flow past a 2D cylinder, flow through a 2D symmetric stenosed artery, and a 3D patient-specific coronary artery, showing accurate reconstruction of high-fidelity spatio-temporal fields through low-fidelity latent PDE evolution at reduced computational cost, while yielding coherent, spatially organized, and meaningful latent structures.

2510.00072 2026-05-04 cs.CV cs.AI cs.LG

Unlocking Zero-Shot Geospatial Reasoning via Indirect Rewards

Chenhui Xu, Fuxun Yu, Michael J. Bianco, Jacob Kovarskiy, Raphael Tang, Qi Zhang, Zirui Xu, Will LeVine, Brandon Dubbs, Heming Liao, Cassandra Burgess, Suvam Bag, Jay Patravali, Rupanjali Kukal, Mikael Figueroa, Rishi Madhok, Nikolaos Karianakis, Jinjun Xiong

Comments ICML 2026

2509.24496 2026-05-04 cs.LG cs.AI

LLM DNA: Tracing Model Evolution via Functional Representations

Zhaomin Wu, Haodong Zhao, Ziyang Wang, Jizhou Guo, Qian Wang, Bingsheng He

Comments ICLR 2026 (Oral)

2509.24276 2026-05-04 cs.AI

G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge

Linhao Luo, Zicheng Zhao, Junnan Liu, Zhangchi Qiu, Junnan Dong, Serge Panev, Chen Gong, Thuy-Trang Vu, Gholamreza Haffari, Dinh Phung, Alan Wee-Chung Liew, Shirui Pan

Comments Accepted by ICLR 2026

2509.24169 2026-05-04 cs.CL

Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight

Haolin Yang, Hakaze Cho, Kaize Ding, Naoya Inoue

Comments ICLR 2026

2509.24164 2026-05-04 cs.CL

Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis

Haolin Yang, Hakaze Cho, Naoya Inoue

Comments ICLR 2026

2509.23330 2026-05-04 cs.CL

Structured In-context Environment Scaling for Large Language Model Reasoning

Peng Yu, Zeyuan Zhao, Shao Zhang, Luoyi Fu, Xinbing Wang, Ying Wen

Comments Title modified for greater clarity and better alignment with the paper's focus

2509.21864 2026-05-04 cs.CV

Deepfakes: we need to re-think the concept of "real" images

Janis Keuper, Margret Keuper

2509.21723 2026-05-04 cs.RO

VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation

Huayi Zhou, Kui Jia

Comments Accepted by ICLR 2026. The project link is https://hnuzhy.github.io/projects/VLBiMan/

2509.21514 2026-05-04 cs.LG cs.CL

Knowing When to Defer: Selective Prediction for Responsible Knowledge Tracing

Joshua Mitton, Prarthana Bhattacharyya, Ralph Abboud, Simon Woodhead

Comments 10 pages, 7 figures. Joshua Mitton and Prarthana Bhattacharyya contributed equally to this paper

2509.20823 2026-05-04 cs.LG cs.AI cs.CV

CaTS-Bench: Can Language Models Describe Time Series?

Luca Zhou, Pratham Yashwante, Marshall Fisher, Alessio Sampieri, Zihao Zhou, Fabio Galasso, Rose Yu

Comments 9 pages, 6 figures, 4 tables in the main paper. Many more in the appendix

2509.20098 2026-05-04 cs.LG

Incomplete Data, Complete Dynamics: A Diffusion Approach

Zihan Zhou, Chenguang Wang, Hongyi Ye, Yongtao Guan, Tianshu Yu

2509.12057 2026-05-04 cs.LG cs.DM cs.DS

Optimal hypersurface decision trees

Xi He

2509.06864 2026-05-04 cs.LG cs.SE

Concolic Testing on Individual Fairness of Neural Network Models

Ming-I Huang, Chih-Duo Hong, Fang Yu

Comments Add a theorem and improve wording and layout

2508.19932 2026-05-04 cs.AI

CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments

Nitish Jaipuria, Lorenzo Gatto, Zijun Kan, Shankey Poddar, Bill Cheung, Diksha Bansal, Ramanan Balakrishnan, Aviral Suri, Jose Estevez

Comments 7 pages, 5 figures, Version published in IEEE Xplore

DOI: 10.1109/BigData66926.2025.11402424
Journal ref: 2025 IEEE International Conference on Big Data (BigData), Macau, China, 2025, pp. 2177-2183

Abstract

The proliferation of digital payment platforms has transformed commerce, offering unmatched convenience and accessibility globally. However, this growth has also attracted malicious actors, leading to a corresponding increase in sophisticated social engineering scams. These scams are often initiated and orchestrated on multiple surfaces outside the payment platform, making user and transaction-based signals insufficient for a complete understanding of the scam's methodology and underlying patterns, without which it is very difficult to prevent it in a timely manner. This paper presents CASE (Conversational Agent for Scam Elucidation), a novel Agentic AI framework that addresses this problem by collecting and managing user scam feedback in a safe and scalable manner. A conversational agent is uniquely designed to proactively interview potential victims to elicit intelligence in the form of a detailed conversation. The conversation transcripts are then consumed by another AI system that extracts information and converts it into structured data for downstream usage in automated and manual enforcement mechanisms. Using Google's Gemini family of LLMs, we implemented this framework on Google Pay (GPay) India. By augmenting our existing features with this new intelligence, we have observed a 21% uplift in the volume of scam enforcements. The architecture and its robust evaluation framework are highly generalizable, offering a blueprint for building similar AI-driven systems to collect and manage scam intelligence in other sensitive domains.

2508.19600 2026-05-04 cs.CV

Quantization Robustness to Input Degradations for Object Detection

Toghrul Karimov, Hassan Imani, Allan Kazakov

2508.15568 2026-05-04 cs.CV cs.LG

Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment

Youjia Zhang, Youngeun Kim, Young-Geun Choi, Hongyeob Kim, Huiling Liu, Sungeun Hong

2508.14255 2026-05-04 cs.LG

Graph Concept Bottleneck Models

Haotian Xu, Tsui-Wei Weng, Lam M. Nguyen, Tengfei Ma

Comments TMLR March 2026

2508.11696 2026-05-04 cs.CV cs.LG

A Deep Learning-Based CCTV System for Automatic Smoking Detection in Fire Exit Zones

Sami Sadat, Mohammad Irtiza Hossain, Junaid Ahmed Sifat, Suhail Haque Rafi, Md. Waseq Alauddin Alvi, Md. Khalilur Rhaman

Comments We request withdrawal due to critical inconsistencies in the Result Analysis, where reported metrics for the proposed model conflict between text and Table 1 (precision/recall/mAP@50), and a methodological issue in Dataset Description where augmentation likely introduced data leakage across train/validation/test splits, making results unreliable and non-reproducible

2508.07630 2026-05-04 cs.CL cs.AI cs.CV

InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information

Anirudh Iyengar Kaniyar Narayana Iyengar, Srija Mukhopadhyay, Adnan Qidwai, Shubhankar Singh, Dan Roth, Vivek Gupta

Comments 22 pages, 8 figures, 14 tables. Accepted at IJCNLP-AACL 2025

2508.06361 2026-05-04 cs.LG cs.AI

Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts

Zhaomin Wu, Mingzhe Du, See-Kiong Ng, Bingsheng He

Comments ICLR 2026 (Oral)

2508.04086 2026-05-04 cs.CL

ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"

Zhongyi Zhou, Kohei Uehara, Haoyu Zhang, Jingtao Zhou, Lin Gu, Ruofei Du, Zheng Xu, Tatsuya Harada

Comments ACL 2026 Findings. Source code: https://github.com/zhongyi-zhou/toolgrad

2507.18654 2026-05-04 cs.LG cs.CV

Diffusion Models for Solving Inverse Problems via Posterior Sampling with Piecewise Guidance

Saeed Mohseni-Sehdeh, Walid Saad, Kei Sakaguchi, Tao Yu