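arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.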

2604.27266 2026-05-01 cs.LG cond-mat.mtrl-sci

AutoREC: A software platform for developing reinforcement learning agents for equivalent circuit model generation from electrochemical impedance spectroscopy data

Ali Jaberi, Yonatan Kurniawan, Robert Black, Shayan Mousavi M., Kabir Verma, Zoya Sadighi, Santiago Miret, Jason Hattrick-Simpers

Abstract

This paper introduces AutoREC, an open-source Python package for developing reinforcement learning (RL) agents to automatically generate equivalent circuit models (ECMs) from electrochemical impedance spectroscopy (EIS) data. While ECMs are a standard framework for interpreting EIS data, traditional identification is typically based on manual trial-and-error, which requires domain experts and limits scalability, particularly in autonomous experimental pipelines such as self-driving laboratories. AutoREC addresses this challenge by formulating ECM construction as a sequential decision-making problem within a Markov Decision Process framework. It implements a Double Deep Q-Network with prioritized experience replay, along with a dedicated dead-loop mitigation strategy, to efficiently explore a complex action space for circuit generation. To demonstrate the capabilities of the platform, we trained an RL agent using AutoREC and evaluated its strengths and limitations across diverse datasets, while also discussing possible strategies to mitigate these limitations in future agent designs. The trained agent achieved a success rate exceeding 99.6% on synthetic datasets and demonstrated strong generalization to unseen experimental EIS data from batteries, corrosion, oxygen evolution reaction, and CO₂ reduction systems. These results position AutoREC as a promising platform for adaptive and data-driven ECM generation, with potential for integration into automated electrochemical workflows.
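
The two learning ingredients named above are standard techniques, so a short sketch may help readers unfamiliar with them. This is not AutoREC's code: the function names are hypothetical, plain Python lists stand in for network outputs, and the defaults alpha = 0.6 and beta = 0.4 are illustrative assumptions. The Double DQN target lets the online network choose the next action and the target network score it; proportional prioritized replay samples transition i with probability P(i) = p_i^alpha / sum_k p_k^alpha and offsets the resulting bias with importance-sampling weights. The dead-loop mitigation strategy is not modeled.

```python
from typing import List, Sequence

def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Online network selects the next action; target network evaluates it."""
    if done:
        return reward  # no bootstrapping from a terminal state
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]

def per_probabilities(td_errors: Sequence[float], alpha: float = 0.6,
                      eps: float = 1e-6) -> List[float]:
    """Proportional prioritization: p_i = |delta_i| + eps, P(i) = p_i^alpha / sum_k p_k^alpha."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def importance_weights(probs: Sequence[float], beta: float = 0.4) -> List[float]:
    """w_i = (N * P(i))^(-beta), normalised by the largest weight."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    top = max(raw)
    return [w / top for w in raw]
```

For example, `double_dqn_target(1.0, False, 0.9, [0.1, 0.5, 0.2], [3.0, 2.0, 10.0])` picks action 1 from the online values and returns 1.0 + 0.9 × 2.0 = 2.8, instead of bootstrapping from the target network's own maximum (10.0); avoiding that max-over-max is how Double DQN curbs overestimation.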

2604.27259 2026-05-01 cs.CV cs.LG

VTBench: A Multimodal Framework for Time-Series Classification with Chart-Based Representations

Madhumitha Venkatesan, Xuyang Chen, Dongyu Liu

Comments 8 pages main text

2604.27253 2026-05-01 cs.AI

AutoSurfer -- Teaching Web Agents through Comprehensive Surfing, Learning, and Modeling

Fazle Elahi Faisal, Qianhui Wu, Baolin Peng, Jianfeng Gao

Comments 21 pages, 3 figures

Abstract

Recent advances in multimodal large language models (LLMs) have revolutionized web agents that can automate complex tasks on websites. However, their accuracy remains limited by the scarcity of high-quality web trajectory training data. Existing automatic trajectory generation methods suffer from incomplete website coverage due to homepage-based task proposals or random-walk exploration. Such methods often result in hallucinated or ambiguous task synthesis that leads to incomplete and unreliable trajectory generation. Here, we present AutoSurfer, a comprehensive web trajectory generator that addresses these limitations through three key innovations. First, AutoSurfer employs a systematic breadth-first exploration strategy that maintains a queue of discovered pages and action traces, propagates knowledge across pages to avoid redundant exploration, and recursively expands multi-level graphical user interface elements - closely resembling how a human would learn a new website. Second, AutoSurfer leverages the exploration trajectory to guide task synthesis, reducing hallucinations by grounding complex tasks in actual navigation paths rather than isolated actions or page content alone. Third, AutoSurfer uses the same exploration trajectory as hints to steer a web agent toward more accurate and reliable trajectory refinement. Together, these innovations enable AutoSurfer to comprehensively cover a website's action space and generate data suitable for training website-specific LLMs. We evaluate AutoSurfer on the WebArena benchmark by fine-tuning Qwen2.5-VL-7B-Instruct and demonstrate that it outperforms state-of-the-art methods - Explorer, OS-Genesis, and SynthAgent - achieving up to 24.23% overall task completion accuracy compared to 19.59% for the best prior method. Further, task diversity analysis demonstrates that AutoSurfer yields a more diverse distribution of synthesized tasks.
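
The breadth-first strategy can be pictured with a toy site graph. The sketch below is an illustration only: the page names and actions in `SITE` are hypothetical, a simple visited map stands in for the paper's cross-page knowledge propagation, and multi-level GUI expansion is not modeled. It keeps a queue of (page, action trace) pairs, skips pages already discovered, and returns the shortest action trace reaching each page, the kind of grounded navigation path a task synthesizer could build on.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical toy website: page -> [(action label, destination page)].
SITE: Dict[str, List[Tuple[str, str]]] = {
    "home": [("click Orders", "orders"), ("click Account", "account")],
    "orders": [("open first order", "order_detail"), ("back", "home")],
    "account": [("click Settings", "settings")],
}

def bfs_explore(site: Dict[str, List[Tuple[str, str]]], start: str) -> Dict[str, List[str]]:
    """Return the shortest action trace from `start` to every reachable page."""
    traces: Dict[str, List[str]] = {start: []}
    queue = deque([(start, [])])  # queue of (page, action trace that reached it)
    while queue:
        page, trace = queue.popleft()
        for action, dest in site.get(page, []):
            if dest in traces:  # already discovered: skip redundant exploration
                continue
            traces[dest] = trace + [action]
            queue.append((dest, traces[dest]))
    return traces
```

Here `bfs_explore(SITE, "home")["order_detail"]` is `["click Orders", "open first order"]`; the `back` link to `home` is skipped because that page is already known.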

2604.27249 2026-05-01 cs.CL cs.AI

Instruction Complexity Induces Positional Collapse in Adversarial LLM Evaluation

Jon-Paul Cacioli

Comments 12 pages, 3 figures, 3 tables. Pre-registered on OSF (osf.io/7p64)

2604.27239 2026-05-01 cs.LG

Analytical Correction for Subsampling Bias in Drifting Models

Jiaru Zhang, Zeyun Deng, Juanwu Lu, Ziran Wang, Ruqi Zhang

2604.27234 2026-05-01 cs.LG

Remaining Useful Life Estimation for Turbofan Engines: A Comparative Study of Classical, CNN, and LSTM Approaches

Astitva Goel, Samarth Galchar, Sumit Kanu

Comments 7 pages, 5 algorithms, 7 figures, 4 tables

2604.27233 2026-05-01 cs.AI cs.LG cs.MA

Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents

Anh Ta, Junjie Zhu, Shahin Shayandeh

Abstract

Tool-calling agents are evaluated on tool selection, parameter accuracy, and scope recognition, yet LLM trajectory assessments remain inherently post-hoc. Disconnected from the active execution loop, such assessments identify errors that are usually addressed through prompt-tuning or retraining, and fundamentally cannot course-correct the agent in real time. To close this gap, we move evaluation into the execution loop at inference time: a specialized reviewer agent evaluates provisional tool calls prior to execution, shifting the paradigm from post-hoc recovery to proactive evaluation and error mitigation. In practice, this architecture establishes a clear separation of concerns between the primary execution agent and a secondary review agent. As with any multi-agent system, the reviewer can introduce new errors while correcting others, yet no prior work to our knowledge has systematically measured this tradeoff. To quantify this tradeoff, we introduce Helpfulness-Harmfulness metrics: helpfulness measures the percentage of base agent errors that feedback corrects; harmfulness measures the percentage of correct responses that feedback degrades. These metrics directly inform reviewer design by revealing whether a given model or prompt provides net positive value. We evaluate our approach on BFCL (single-turn) and Tau2-Bench (multi-turn stateful scenarios), achieving +5.5% on irrelevance detection and +7.1% on multi-turn tasks. Our metrics reveal that reviewer model choice is critical: the reasoning model o3-mini achieves a 3:1 benefit-to-risk ratio versus 2.1:1 for GPT-4o. Automated prompt optimization via GEPA provides an additional +1.5-2.8%. Together, these results demonstrate a core advantage of separating execution and review: the reviewer can be systematically improved through model selection and prompt optimization, without retraining the base agent.
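
The two metrics are defined precisely enough to compute directly from per-example correctness before and after review. A minimal sketch, assuming each example is scored simply as correct or incorrect; the function name and the toy data are hypothetical:

```python
from typing import Sequence, Tuple

def helpfulness_harmfulness(base_correct: Sequence[bool],
                            reviewed_correct: Sequence[bool]) -> Tuple[float, float]:
    """Helpfulness: % of base-agent errors the feedback corrects.
    Harmfulness: % of base-agent correct responses the feedback degrades."""
    if len(base_correct) != len(reviewed_correct):
        raise ValueError("need one before/after pair per example")
    pairs = list(zip(base_correct, reviewed_correct))
    on_base_errors = [r for b, r in pairs if not b]   # outcomes where base was wrong
    on_base_correct = [r for b, r in pairs if b]      # outcomes where base was right
    helpful = 100.0 * sum(on_base_errors) / len(on_base_errors) if on_base_errors else 0.0
    harmful = (100.0 * sum(not r for r in on_base_correct) / len(on_base_correct)
               if on_base_correct else 0.0)
    return helpful, harmful

# Toy data (hypothetical): 4 base errors, 6 base successes.
base     = [False, False, False, False, True, True, True, True, True, True]
reviewed = [True,  True,  False, False, True, True, True, True, True, False]
helpful, harmful = helpfulness_harmfulness(base, reviewed)
```

In the toy run the reviewer fixes 2 of 4 base errors (50% helpfulness) and breaks 1 of 6 correct answers (about 16.7% harmfulness). The abstract does not spell out how its benefit-to-risk ratio is formed from these quantities, so the sketch stops at the two metrics.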

2604.27228 2026-05-01 cs.AI cs.CL cs.CY cs.MA

When Roles Fail: Epistemic Constraints on Advocate Role Fidelity in LLM-Based Political Statement Analysis

Juergen Dietrich

Comments 22 pages

2604.27221 2026-05-01 cs.AI

Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction

Yuxuan Huang, Yihang Chen, Zhiyuan He, Yuxiang Chen, Ka Yiu Lee, Huichi Zhou, Weilin Luo, Meng Fang, Jun Wang

2604.27218 2026-05-01 cs.CV

AttriBE: Quantifying Attribute Expressivity in Body Embeddings for Recognition and Identification

Basudha Pal, Siyuan Huang, Anirudh Nanduri, Zhaoyang Wang, Rama Chellappa

2604.27217 2026-05-01 cs.AI

Toward Personalized Digital Twins for Cognitive Decline Assessment: A Multimodal, Uncertainty-Aware Framework

Bulent Soykan, Gulsah Hancerliogullari Koksalmis, Hsin-Hsiung Huang, Laura J. Brattain

Comments 6 pages, 6 figures

Abstract

Cognitive decline is highly heterogeneous across individuals, which complicates prognosis, trial design, and treatment planning. We present the Personalized Cognitive Decline Assessment Digital Twin (PCD-DT), a multimodal and uncertainty-aware framework for modeling patient-specific disease trajectories from sparse, noisy, and irregular longitudinal data. The framework combines three methodological components: (1) latent state-space models for individualized temporal dynamics, (2) multimodal fusion for clinical, biomarker, and imaging features, and (3) uncertainty-aware validation and adaptive updating for robust digital twin operation. We also outline how conditional generative models can support data augmentation and stress testing for underrepresented progression patterns. As a preliminary feasibility study, we analyze longitudinal TADPOLE trajectories and show clear separation between cognitively normal and Alzheimer's disease cohorts in ADAS13, ventricle volume, and hippocampal volume over five years. We further conduct a multimodal next-visit prediction ablation using an LSTM sequence model on 3,003 visit-pair sequences derived from TADPOLE, where the combined cognitive plus MRI configuration achieves the lowest standardized RMSE for both ADAS13 (0.4419) and ventricle volume (0.5842), outperforming a Last Observation Carried Forward baseline. A Bayesian tensor modeling component for high-dimensional imaging fusion is also discussed. These results support the feasibility of the proposed architecture while also highlighting the need for stronger uncertainty calibration and longer-horizon predictive evaluation. The PCD-DT framework provides a principled starting point for personalized in silico modeling in neurodegenerative disease. This work positions PCD-DT as a foundational step toward clinically deployable, uncertainty-aware digital twin systems.
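
The Last Observation Carried Forward baseline and a standardized error are easy to state in code. This sketch assumes that "standardized RMSE" means RMSE divided by the standard deviation of the true values (the abstract does not define it), and it uses made-up numbers rather than TADPOLE data; the function names are hypothetical. Under that reading, always predicting the mean scores exactly 1.0, which gives a reference point for values such as 0.4419 and 0.5842.

```python
import math
from typing import List, Optional, Sequence, Tuple

def locf(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Carry the most recent observed value forward over missing visits (None)."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for x in series:
        if x is not None:
            last = x
        filled.append(last)
    return filled

def locf_next_visit(series: Sequence[Optional[float]]) -> Tuple[List[float], List[float]]:
    """Baseline: predict each observed visit from the last value seen before it."""
    filled = locf(series)
    preds, targets = [], []
    for t in range(1, len(series)):
        if series[t] is not None and filled[t - 1] is not None:
            preds.append(filled[t - 1])
            targets.append(series[t])
    return preds, targets

def standardized_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the (population) standard deviation of y_true (assumed definition)."""
    n = len(y_true)
    mean = sum(y_true) / n
    std = math.sqrt(sum((y - mean) ** 2 for y in y_true) / n)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return rmse / std
```

For a toy series `[10.0, None, 12.0, 15.0]`, `locf_next_visit` pairs the predictions `[10.0, 12.0]` with the targets `[12.0, 15.0]`, skipping the missing second visit.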

2604.27206 2026-05-01 cs.CV

HQ-UNet: A Hybrid Quantum-Classical U-Net with a Quantum Bottleneck for Remote Sensing Image Segmentation

Md Aminur Hossain, Ayush V. Patel, Ikshwaku Vanani, Biplab Banerjee

Comments 6 pages

2604.27204 2026-05-01 cs.CL cs.LG

Selective Augmentation: Improving Universal Automatic Phonetic Transcription via G2P Bootstrapping

Tobias Bystrich, Julia M. Pritzen, Christoph A. Schmidt, Claudia Wich-Reif

Comments Accepted at LREC 2026

2604.27201 2026-05-01 cs.CL cs.AI cs.LG

Path-Lock Expert: Separating Reasoning Mode in Hybrid Thinking via Architecture-Level Separation

Shouren Wang, Wang Yang, Chuang Ma, Debargha Ganguly, Vikash Singh, Chaoda Song, Xinpeng Li, Xianxuan Long, Vipin Chaudhary, Xiaotian Han

Comments 27 pages, 9 figures, 6 tables. Under review

2604.27193 2026-05-01 cs.RO cs.CE cs.DC cs.SY eess.SY

Real-Time GPU-Accelerated Monte Carlo Evaluation of Safety-Critical AEB Systems Under Uncertainty

Akshay Karjol, Shadi Alawneh

Comments 10 pages, 6 figures. Submitted to IEEE journal for possible publication; under review

Abstract

Automatic Emergency Braking (AEB) systems represent a safety-critical national interest, with the National Highway Traffic Safety Administration (NHTSA) Federal Motor Vehicle Safety Standard (FMVSS No. 127) requiring AEB in all new light vehicles sold in the United States by September 2029. However, production implementations frequently rely on deterministic stopping-distance or Time-to-Collision (TTC) thresholds that fail to capture uncertainty in sensing, road conditions, and vehicle dynamics. This paper presents a GPU-accelerated Monte Carlo framework for stochastic evaluation of emergency braking performance using a high-fidelity longitudinal vehicle model incorporating aerodynamic drag, road grade, brake actuator dynamics, and weight transfer effects. A one-thread-per-sample execution strategy exploits the independence of Monte Carlo rollouts, while deterministic CPU-generated sampling ensures bit-exact numerical consistency between CPU and GPU implementations. The framework is evaluated across four hardware platforms spanning development and deployment environments: two laptop GPUs (GTX 1650, RTX 5070) and two automotive-grade embedded platforms (Jetson Orin Nano, Jetson AGX Orin). Peak speedups of 54.57x are achieved while maintaining exact numerical agreement. Real-time feasibility analysis with a complete AEB timing budget (700 ms human reaction time minus 120 ms perception and 50 ms decision overhead) demonstrates that the Jetson AGX Orin can execute approximately 25,000 Monte Carlo samples within a 530 ms budget, enabling real-time probabilistic AEB evaluation as part of a complete embedded pipeline. These results establish Monte Carlo-based uncertainty evaluation as a deployable runtime component rather than an offline validation tool and provide quantitative guidance for risk-aware AEB threshold selection under the NHTSA final rule.
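
The timing budget and the per-sample structure of the Monte Carlo evaluation can be checked with a few lines of code. The vehicle model here is deliberately crude (constant deceleration μg after a fixed actuation delay, with no aerodynamic drag, road grade, actuator dynamics, or weight transfer), and every distribution, parameter range, and function name is an assumption chosen for illustration, not taken from the paper. Samples are drawn up front from a seeded generator, echoing the idea of deterministic host-side sampling, and `evaluate` is a pure function of one sample, which is what makes a one-thread-per-sample GPU mapping possible.

```python
import random
from typing import List, Tuple

G = 9.81  # gravitational acceleration, m/s^2

def aeb_budget_ms(reaction: float = 700.0, perception: float = 120.0,
                  decision: float = 50.0) -> float:
    """Time left for Monte Carlo evaluation: reaction time minus overheads."""
    return reaction - perception - decision

Sample = Tuple[float, float, float]  # (friction mu, actuation delay in s, gap error in m)

def draw_samples(n: int, seed: int = 0) -> List[Sample]:
    """Deterministic sampling: the same seed always yields the same list."""
    rng = random.Random(seed)
    return [(rng.uniform(0.4, 0.9),     # tyre-road friction (assumed range)
             rng.uniform(0.05, 0.25),   # brake actuation delay, s (assumed range)
             rng.gauss(0.0, 0.5))       # error in estimated gap, m (assumed sigma)
            for _ in range(n)]

def evaluate(sample: Sample, speed_mps: float, gap_m: float) -> bool:
    """One independent rollout: True if the vehicle fails to stop within the gap."""
    mu, delay, gap_error = sample
    stopping = speed_mps * delay + speed_mps ** 2 / (2.0 * mu * G)
    return stopping >= gap_m + gap_error

def collision_probability(speed_mps: float, gap_m: float, n: int, seed: int = 0) -> float:
    """Fraction of independent rollouts that end in a collision."""
    samples = draw_samples(n, seed)
    return sum(evaluate(s, speed_mps, gap_m) for s in samples) / n

per_sample_us = aeb_budget_ms() * 1000.0 / 25_000  # budget per sample at 25,000 samples
```

With the default overheads, `aeb_budget_ms()` returns 530.0, and fitting the reported 25,000 samples into that window works out to about 21.2 µs per sample.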

2604.27182 2026-05-01 cs.LG cs.AI

Preserving Temporal Dynamics in Time Series Generation

Ci Lin, Futong Li, Tet Yeap, Iluju Kiringa

2604.27178 2026-05-01 cs.CV

Energy-Efficient Plant Monitoring via Knowledge Distillation

Ilyass Moummad, Reda Bensaid, Kawtar Zaher, Hervé Goëau, Jean-Christophe Lombardo, Joseph Salmon, Pierre Bonnet, Alexis Joly

2604.27175 2026-05-01 cs.RO

Global Sampling-Based Trajectory Optimization for Contact-Rich Manipulation via KernelSOS

Zhongqi Wei, Frederike Dümbgen

Comments 8 pages, 5 figures

2604.27172 2026-05-01 cs.LG

Context-Aware Graph Attention for Unsupervised Telco Anomaly Detection

Sara Malacarne, Eirik Hoel-Høiseth, Erlend Aune, David Zsolt Biro, Massimiliano Ruocco

2604.27169 2026-05-01 cs.CL cs.LG

Semantic Structure of Feature Space in Large Language Models

Austin C. Kozlowski, Andrei Boutyline

2604.27168 2026-05-01 cs.RO cs.HC

The Field of Safe Motion: Operationalizing Affordances in the Field of Safe Travel Using Reachability Analysis

Leif Johnson, Trent Victor, Johan Engström

2604.27166 2026-05-01 cs.LG cs.GT

Distributional Alignment Games for Answer-Level Fine-Tuning

Mehryar Mohri, Jon Schneider, Yifan Wu

2604.27156 2026-05-01 cs.AI

Interval Orders, Biorders and Credibility-limited Belief Revision

Richard Booth, Ivan Varzinczak

2604.27151 2026-05-01 cs.AI

Step-level Optimization for Efficient Computer-use Agents

Jinbiao Wei, Kangqi Ni, Yilun Zhao, Guo Gan, Arman Cohan

Abstract

Computer-use agents provide a promising path toward general software automation because they can interact directly with arbitrary graphical user interfaces instead of relying on brittle, application-specific integrations. Despite recent advances in benchmark performance, strong computer-use agents remain expensive and slow in practice, since most systems invoke large multimodal models at nearly every interaction step. We argue that this uniform allocation of compute is fundamentally inefficient for long-horizon GUI tasks. Such trajectories are highly heterogeneous: many steps are routine and can be handled reliably by smaller, cheaper policies, while errors tend to concentrate at a relatively small number of high-risk moments. Across computer-use benchmarks, these failures repeatedly take two forms: progress stalls, where the agent loops, repeats ineffective actions, or fails to make meaningful progress, and silent semantic drift, where the agent continues taking locally plausible actions after already deviating from the user's true goal. To address this inefficiency, we propose an event-driven, step-level cascade for computer-use agents that runs a small policy by default and escalates to a stronger model only when lightweight learned monitors detect elevated risk. Our framework combines two complementary signals: a Stuck Monitor that detects degraded progress from recent reasoning-action history and triggers recovery, and a Milestone Monitor that identifies semantically meaningful checkpoints where sparse verification is most informative for catching drift. This design turns always-on frontier-model inference into adaptive, on-demand compute allocation over the course of an evolving interaction. The framework is modular and deployment-oriented: it can be layered on top of existing computer-use agents without changing the underlying agent architecture or retraining the large model.
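
The cascade's control flow, a cheap policy by default with escalation when a monitor flags risk, can be sketched as below. The paper's monitors are learned; here a hypothetical rule (the last few actions are identical) stands in for the Stuck Monitor, the Milestone Monitor is not modeled, and both policies are plain callables rather than real models. All class and parameter names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class CascadeAgent:
    small_policy: Callable[[str], str]   # cheap default policy (hypothetical)
    large_policy: Callable[[str], str]   # stronger, costlier model (hypothetical)
    stuck_window: int = 3                # identical actions in a row that count as "stuck"
    history: List[str] = field(default_factory=list)
    escalations: int = 0

    def is_stuck(self) -> bool:
        # Rule-based stand-in for a learned Stuck Monitor.
        recent = self.history[-self.stuck_window:]
        return len(recent) == self.stuck_window and len(set(recent)) == 1

    def step(self, observation: str) -> str:
        if self.is_stuck():
            action = self.large_policy(observation)  # escalate only on elevated risk
            self.escalations += 1
        else:
            action = self.small_policy(observation)  # routine step: small model
        self.history.append(action)
        return action

agent = CascadeAgent(small_policy=lambda obs: "scroll",
                     large_policy=lambda obs: "click submit")
actions = [agent.step(f"screen {i}") for i in range(5)]
```

With a small policy that always scrolls, the fourth step finds three identical actions in the history and hands control to the large policy once; afterwards the small policy resumes, so the expensive model is invoked only at the flagged step.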

2604.27150 2026-05-01 cs.AI

Optimal Stop-Loss and Take-Profit Parameterization for Autonomous Trading Agent Swarm

Nathan Li, Aikins Laryea, Yigit Ihlamur

Comments 4 pages, 2 figures, 3 tables

2604.27149 2026-05-01 cs.LG cs.AI

ConformaDecompose: Explaining Uncertainty via Calibration Localization

Fatima Rabia Yapicioglu, Meltem Aksoy, Alberto Rigenti, Tuwe Löfström-Cavallin, Helena Löfström-Cavallin, Seyda Yoncaci, Luca Longo

Comments This is the accepted author version of a paper to appear in the proceedings of the World Explainable AI Conference (Springer). The final version will be available via Springer. This manuscript introduces ConformaDecompose, a framework for instance-wise uncertainty explainability via calibration localisation

2604.27137 2026-05-01 cs.CL

Cross-Lingual Response Consistency in Large Language Models: An ILR-Informed Evaluation of Claude Across Six Languages

Camelia Baluta

Comments 12 prompt clusters, 6 languages, 3 runs; data and code at github.com/camelbal-ship-it/crosslingual-claude-eval

2604.27134 2026-05-01 cs.AI cs.HC

Unpacking Vibe Coding: Help-Seeking Processes in Student-AI Interactions While Programming

Daiana Rinja, Eduardo Araujo Oliveira, Sonsoles López-Pernas, Mohammed Saqr, Marcus Specht, Kamila Misiejuk

Comments Accepted by the 27th International Conference on Artificial Intelligence in Education (AIED'26)

2604.27132 2026-05-01 cs.AI

TRUST: A Framework for Decentralized AI Service v.0.1

Yu-Chao Huang, Zhen Tan, Mohan Zhang, Pingzhi Li, Zhuo Zhang, Tianlong Chen

2604.27126 2026-05-01 cs.AI cs.CE cs.LG physics.geo-ph

Unsupervised Electrofacies Classification and Porosity Characterization in the Offshore Keta Basin Using Wireline Logs

Hamdiya Adams, Theophilus Ansah-Narh, Daniel Kwadwo Asiedu, Bruce Kofi Banoeng-Yakubo, Marcellin Atemkeng, Thomas Armah, Richmond Opoku-Sarkodie, Rebecca Davis, Ezekiel Nii Noye Nortey

Comments 7 pages, 7 figures. Accepted to ICECET 2026