arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1832
2604.27849 2026-05-01 cs.AI

A Grid-Aware Agent-Based Model for Analyzing Electric Vehicle Charging Systems

Khalil Al-Rahman Youssefi, Marija Gojkovic, Walter Stefanutti, Mika Auer, Melanie Schranz

Comments This is the author's version of a paper submitted to SIMULTECH. 12 Pages, 1 Table, 10 Figures

详情
英文摘要

This paper presents a configurable, grid-aware Agent-Based Model (ABM) for the systematic analysis of electric vehicle (EV) charging systems under configurable infrastructure and operational conditions. The model integrates heterogeneous EV behavior, charging column constraints, and a shared Energy Sandbox that regulates aggregate power allocation, enabling the joint study of user-centric charging dynamics and facility-level power behavior. Implemented in Python using the SimPy discrete-event framework, the approach supports scalable, event-driven simulations across varying system sizes, charger compositions, and scheduling strategies. A representative workplace charging scenario is investigated to illustrate how infrastructure configuration and coordination mechanisms influence energy delivery performance, infrastructure utilization, and aggregate load characteristics. The results highlight the context-dependence of infrastructure suitability and demonstrate how charging strategies and charger types reshape both service-level outcomes and grid-facing behavior. The proposed ABM provides a flexible and extensible simulation environment for exploring technical, operational, and grid-aware aspects of EV charging ecosystems, and for serving as a methodological basis for subsequent studies on advanced coordination strategies beyond the specific scenario analyzed in this study.

2604.27846 2026-05-01 cs.CL

Multi-Level Narrative Evaluation Outperforms Lexical Features for Mental Health

Yuxi Ma, Jieming Cui, Muyang Li, Ye Zhao, Yu Li, Yixuan Wang, Chi Zhang, Yinyin Zang, Yixin Zhu

详情
英文摘要

How people narrate their experiences offers a window into how the mind organizes them. Computational approaches to therapeutic writing have evolved from lexical counting to neural methods, yet remain fragmented: dictionary tools miss discourse structure, while embeddings conflate local coherence with global organization. No existing framework maps these techniques onto the hierarchical processes through which narratives are constructed. Here we introduce a three-level framework - micro-level lexical features, meso-level semantic embeddings, and macro-level LLM narrative evaluation - and show, across 830 Chinese therapeutic texts spanning depression, anxiety, and trauma, that macro-level evaluation substantially outperforms lexical and embedding features for mental health prediction. This challenges the field's emphasis on word-counting: formal structural features (Labov's story grammar, RST coherence, propositional composition) demonstrate that narrative organization per se carries predictive signal, while clinically-grounded narrative dimensions capture how psychological states are expressed through discourse. Semantic embeddings add minimal independent value but yield incremental gains in multi-level classification. By grounding computational levels in discourse processing theory, this framework identifies macro-structural organization as the primary locus of clinical signal and generates testable hypotheses for intervention design and longitudinal research.

2604.27840 2026-05-01 cs.LG cs.AI

CastFlow: Learning Role-Specialized Agentic Workflows for Time Series Forecasting

Bokai Pan, Mingyue Cheng, Zhiding Liu, Shuo Yu, Xiaoyu Tao, Yuchong Wu, Qi Liu, Defu Lian, Enhong Chen

详情
英文摘要

Recently, large language models (LLMs) have shown great promise in time series forecasting. However, most existing LLM-based forecasting methods still follow a static generative paradigm that directly maps historical observations to future values in a single pass. Under this paradigm, forecasting is constrained by limited temporal pattern extraction, single-round acquisition of contextual features, one-shot forecast generation, and lack of support from ensemble forecasts. To address these limitations, in this work, we propose CastFlow, a dynamic agentic forecasting framework that enables multi-view temporal pattern extraction, multi-round contextual features acquisition, iterative forecast refinement, and forecasting with ensemble forecasts. First, CastFlow organizes the forecasting process into planning, action, forecasting, and reflection, establishing an agentic workflow. Second, this workflow is supported by a memory module that retrieves prior experience and a multi-view toolkit that constructs diagnostic evidence and provides a reliable ensemble forecast baseline. Third, CastFlow adopts a role-specialized design that combines general-purpose reasoning with specialized numerical forecasting. Under this design, a frozen LLM preserves general-purpose reasoning, while a fine-tuned domain-specific LLM performs evidence-guided numerical forecasting based on the ensemble forecast baseline, rather than from scratch. To optimize a fine-tuned domain-specific LLM, we further develop a two-stage workflow-oriented training that combines supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR). To evaluate the effectiveness of CastFlow, we conduct extensive experiments on diverse datasets and show that it achieves superior overall results against strong baselines. We hope that this work can serve as a step toward more adaptive and accurate time series forecasting.

2604.27833 2026-05-01 cs.CV cs.LG

Taming Noise-Induced Prototype Degradation for Privacy-Preserving Personalized Federated Fine-Tuning

Yuhua Wang, Qinnan Zhang, Xiaodong Li, Huan Zhang, Yifan Sun, Wangjie Qiu, Hainan Zhang, Yongxin Tong, Zhiming Zheng

Comments Accepted by CVPR 2026 (Highlight)

详情
英文摘要

Prototype-based Personalized Federated Learning (ProtoPFL) enables efficient multi-domain adaptation by communicating compact class prototypes, but directly sharing them poses privacy risks. A common defense involves per-example $\ell_2$ clipping before prototype computation to bound sensitivity, followed by isotropic Gaussian noise to enforce Local Differential Privacy (LDP). However, Isotropic Gaussian Prototype Perturbation (IGPP) typically over-perturbs discriminative dimensions and struggles to balance the clipping threshold with representation fidelity. In this paper, we propose VPDR, a client-side privacy plug-in that seamlessly integrates into existing ProtoPFLs. Motivated by the observation that dimension-wise class variance reflects discriminability, we introduce Variance-adaptive Prototype Perturbation (VPP), which allocates less noise to discriminative subspaces, preserving semantic separability while ensuring privacy. We further develop Distillation-guided Clipping Regularization (DCR), which enables feature norms to adaptively concentrate near the predefined clipping threshold while maintaining prediction consistency. Theoretical analysis shows that our groupwise mechanism provides privacy guarantees no weaker than the isotropic baseline under the same privacy constraints. Extensive experiments on multi-domain benchmarks demonstrate that VPDR achieves a superior privacy-utility trade-off, outperforming IGPP in personalized federated fine-tuning without sacrificing robustness against realistic attacks.

2604.27821 2026-05-01 cs.RO

Learning-Based Hierarchical Scene Graph Matching for Robot Localization Leveraging Prior Maps

Nimrod Millenium Ndulue, Jose Andres Millan-Romera, Matteo Giorgi, Holger Voos, Jose Luis Sanchez-Lopez

详情
英文摘要

Accurate localization is a fundamental requirement for autonomous robots operating in indoor environments. Scene graphs encode the spatial structure of an environment as a hierarchy of semantic entities and their relationships, and can be constructed both online from robot sensor data and offline from architectural priors such as Building Information Models (BIM). Matching these two complementary representations enables drift correction in SLAM by grounding robot observations against a known structural prior. However, establishing reliable node-to-node correspondences between them remains an open challenge: existing combinatorial methods are prohibitively expensive at scale, and prior learned approaches address only flat graph matching, ignoring the multi-level semantic structure present in both representations. Here we present a learned, end-to-end differentiable pipeline that augments both graphs with semantically motivated edge types encoding intra- and inter- level relationships, explicitly exploiting this hierarchy to enable simultaneous matching from high-level room concepts down to low-level wall surfaces. Trained exclusively on floor plans, the proposed method outperforms the combinatorial baseline in F1 on real LiDAR environments while running an order of magnitude faster, demonstrating viable zero-shot generalization for BIM-assisted robot localization.

2604.27820 2026-05-01 cs.AI cs.DB cs.IR cs.MA

ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era

Mohit Dubey, Open Gigantic

Comments 12 pages, 4 figures, 4 tables

详情
英文摘要

Every document format in existence was designed for a human reader moving linearly through text. Autonomous LLM agents do not read - they retrieve. This fundamental mismatch forces agents to inject entire documents into their context window, wasting tokens on irrelevant content, compounding state across multi-turn loops, and broadcasting information indiscriminately across agent roles. We argue this is not a prompt engineering problem, not a retrieval problem, and not a compression problem: it is a format problem. We introduce OBJECTGRAPH (.og), a file format that reconceives the document as a typed, directed knowledge graph to be traversed rather than a string to be injected. OBJECTGRAPH is a strict superset of Markdown - every .md file is a valid .og file - requires no infrastructure beyond a two-primitive query protocol, and is readable by both humans and agents without tooling. We formalize the Document Consumption Problem, characterise six structural properties no existing format satisfies simultaneously, and prove OBJECTGRAPH satisfies all six. We further introduce the Progressive Disclosure Model, the Role-Scoped Access Protocol, and Executable Assertion Nodes as native format primitives. Empirical evaluation across five document classes and eight agent task types demonstrates up to 95.3 percent token reduction with no statistically significant degradation in task accuracy (p > 0.05). Transpiler fidelity reaches 98.7 percent content preservation on a held-out document benchmark.

2604.27819 2026-05-01 cs.AI

MCPHunt: An Evaluation Framework for Cross-Boundary Data Propagation in Multi-Server MCP Agents

Haonan Li, Tianjun Sun, Yongqing Wang, Qisheng Zhang

Comments 21 pages, 1 figure, 16 tables. Code: https://github.com/lihaonan0716/MCPHunt Data: https://huggingface.co/datasets/lihaonan0716/mcphunt-agent-traces

详情
英文摘要

Multi-server MCP agents create an information-flow control problem: faithful tool composition can turn individually benign read/write permissions into cross-boundary credential propagation -- a structural side effect of workflow topology, not necessarily malicious model behavior. We present MCPHunt, to our knowledge the first controlled benchmark that isolates non-adversarial, verbatim credential propagation across multi-server MCP trust boundaries, with three methodological contributions: (1) canary-based taint tracking that reduces propagation detection to objective string matching; (2) an environment-controlled coverage design with risky, benign, and hard-negative conditions that validates pipeline soundness and controls for credential-format confounds; (3) CRS stratification that disentangles task-mandated propagation (faithful execution of verbatim-transfer instructions) from policy-violating propagation (credentials included despite the option to redact). Across 3,615 main-benchmark traces from 5 models spanning 147 tasks and 9 mechanism families, policy-violating propagation rates reach 11.5--41.3% across all models. This propagation is pathway-specific (25x cross-mechanism range) and concentrated in browser-mediated data flows; hard-negative controls provide evidence that production-format credentials are not necessary -- prompt-directed cross-boundary data flow is sufficient. A prompt-mitigation study across 3 models reduces policy-violating propagation by up to 97% while preserving 80.5% utility, but effectiveness varies with instruction-following capability -- suggesting that prompt-level defenses alone may not suffice. Code, traces, and labeling pipeline are released under MIT and CC BY 4.0.

2604.27814 2026-05-01 cs.LG

Probabilistic Circuits for Irregular Multivariate Time Series Forecasting

Christian Klötergens, Vijaya Krishna Yalavarthi, Lars Schmidt-Thieme

详情
英文摘要

Joint probabilistic modeling is essential for forecasting irregular multivariate time series (IMTS) to accurately quantify uncertainty. Existing approaches often struggle to balance model expressivity with consistent marginalization, frequently leading to unreliable or contradictory forecasts. To address this, we propose CircuITS, a novel architecture for probabilistic IMTS forecasting based on probabilistic circuits. Our model is flexible in capturing intricate dependencies between time series channels while structurally guaranteeing valid joint distributions. Experiments on four real world datasets demonstrate that CircuITS achieves superior joint and marginal density estimation compared to state of the art baselines.

2604.27804 2026-05-01 cs.CV cs.CR cs.LG

Machine Unlearning for Class Removal through SISA-based Deep Neural Network Architectures

Ishrak Hamim Mahi, Siam Ferdous, Md Sakib Sadman Badhon, Nabid Hasan Omi, Md Habibun Nabi Hemel, Farig Yousuf Sadeque, Md. Tanzim Reza

Comments 10 pages, 9 figures, 2 tables

详情
英文摘要

The rapid proliferation of image generation models and other artificial intelligence (AI) systems has intensified concerns regarding data privacy and user consent. As the availability of public datasets declines, major technology companies increasingly rely on proprietary or private user data for model training, raising ethical and legal challenges when users request the deletion of their data after it has influenced a trained model. Machine unlearning seeks to address this issue by enabling the removal of specific data from models without complete retraining. This study investigates a modified SISA (Sharded, Isolated, Sliced, and Aggregated) framework designed to achieve class-level unlearning in Convolutional Neural Network (CNN) architectures. The proposed framework incorporates a reinforced replay mechanism and a gating network to enhance selective forgetting efficiency. Experimental evaluations across multiple image datasets and CNN configurations demonstrate that the modified SISA approach enables effective class unlearning while preserving model performance and reducing retraining overhead. The findings highlight the potential of SISA-based unlearning for deployment in privacy-sensitive AI applications. The implementation is publicly available at https://github.com/SiamFS/ sisa-class-unlearning.

2604.27796 2026-05-01 cs.AI

Post-Optimization Adaptive Rank Allocation for LoRA

Vishnuprasadh Kumaravelu, Sunil Gupta, P. K. Srijith

详情
英文摘要

Exponential growth in the scale of modern foundation models has led to the widespread adoption of Low-Rank Adaptation (LoRA) as a parameter-efficient fine-tuning technique. However, standard LoRA implementations disregard the varying intrinsic dimensionality of model layers and enforce a uniform rank, leading to parameter redundancy. We propose Post-Optimization Adaptive Rank Allocation (PARA), a data-free compression method for LoRA that integrates seamlessly into existing fine-tuning pipelines. PARA leverages Singular Value Decomposition to prune LoRA ranks using a global threshold over singular values across all layers. This results in non-uniform rank allocation based on layer-wise spectral importance. As a post-hoc method, PARA circumvents the training modifications and resulting instabilities that dynamic architectures typically incur. We empirically demonstrate that PARA reduces parameter count by 75-90\% while preserving the predictive performance of the original, uncompressed LoRA across multiple vision and language benchmarks. Code will be published upon acceptance.

2604.27786 2026-05-01 cs.LG

On the Expressive Power of GNNs to Solve Linear SDPs

Chendi Qian, Christopher Morris

详情
英文摘要

Semidefinite programs (SDPs) are a powerful framework for convex optimization and for constructing strong relaxations of hard combinatorial problems. However, solving large SDPs can be computationally expensive, motivating the use of machine learning models as fast computational surrogates. Graph neural networks (GNNs) are a natural candidate in this setting due to their sparsity-awareness and ability to model variable-constraint interactions. In this work, we study what expressive power is sufficient to recover optimal SDP solutions. We first prove negative results showing that standard GNN architectures fail on recovering linear SDP solutions. We then identify a more expressive architecture that captures the key structure of SDPs and can, in particular, emulate the updates of a standard first-order solver. Empirically, on both synthetic and \textsc{SdpLib} benchmarks of various classes of SDPs, this more expressive architecture achieves consistently lower prediction error and objective gap than theoretically weaker baselines. Finally, using the learned high-quality predictions to warm-start the first-order solver yields practical speedups of up to 80%.

2604.27776 2026-05-01 cs.AI cs.CL

WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments

Jinchao Li, Yunxin Li, Chenrui Zhao, Zhenran Xu, Baotian Hu, Min Zhang

详情
英文摘要

While GUI agents have shown impressive capabilities in common computer-use tasks such as OSWorld, current benchmarks mainly focus on isolated and single-application tasks. This overlooks a critical real-world requirement of coordinating across multiple applications to accomplish complex profession-specific workflows. To bridge this gap, we present a computer-use benchmark in cross-application workflows, named WindowsWorld, designed to systematically assess GUI Agents on complex multi-step tasks that mirror real-world professional activities. Our methodology uses a multi-agent framework steered by 16 occupations to generate four difficulty-level tasks with intermediate inspection, which are then refined by human review and executed in a simulated environment. The resulting benchmark contains 181 tasks with an average of 5.0 sub-goals across 17 common desktop applications, of which 78% are inherently multi-application. Experimental results of leading large models and agents show that: 1) All computer-use agents perform poorly on multi-application tasks (< 21% success rate), far below the performance of simple single-app tasks; 2) They largely fail at tasks requiring conditional judgment and reasoning across $\geq$ 3 applications, stalling at early sub-goals; 3) Low execution efficiency, where tasks often fail despite far exceeding human step limits. Code, benchmark data, and evaluation resources are available at github.com/HITsz-TMG/WindowsWorld.

2604.27766 2026-05-01 cs.CL cs.AI

Instruction-Guided Poetry Generation in Arabic and Its Dialects

Abdelrahman Sadallah, Kareem Elozeiri, Mervat Abassy, Rania Elbadry, Mohamed Anwar, Abed Alhakim Freihat, Preslav Nakov, Fajri Koto

Comments ACL Findings 2026

详情
英文摘要

Poetry has long been a central art form for Arabic speakers, serving as a powerful medium of expression and cultural identity. While modern Arabic speakers continue to value poetry, existing research on Arabic poetry within Large Language Models (LLMs) has primarily focused on analysis tasks such as interpretation or metadata prediction, e.g., rhyme schemes and titles. In contrast, our work addresses the practical aspect of poetry creation in Arabic by introducing controllable generation capabilities to assist users in writing poetry. Specifically, we present a large-scale, carefully curated instruction-based dataset in Modern Standard Arabic (MSA) and various Arabic dialects. This dataset enables tasks such as writing, revising, and continuing poems based on predefined criteria, including style and rhyme, as well as performing poetry analysis. Our experiments show that fine-tuning LLMs on this dataset yields models that can effectively generate poetry that is aligned with user requirements, based on both automated metrics and human evaluation with native Arabic speakers. The data and the code are available at https://github.com/mbzuai-nlp/instructpoet-ar

2604.27764 2026-05-01 cs.CV

GourNet: A CNN-Based Model for Mango Leaf Disease Detection

Ekram Alam, Jaydip Sanyal, Akhil Kumar Das, Arijit Bhattacharya, Farhana Sultana

详情
英文摘要

Mango cultivation is crucial in the agricultural sector, significantly contributing to economic development and food security. However, diseases affecting mango leaves can significantly reduce both the production and overall fruit grade. Detecting leaf diseases at an early stage with precision is key to effective disease prevention and sustaining crop productivity. In this paper, we introduce a "deep learning" model named "GourNet", which leverages "Convolutional Neural Networks" to identify infections in mango leaves. We utilize the "MangoLeafBD" (MBD) dataset to train and assess the effectiveness of the presented model. The MBD dataset contains seven disease classes and a Healthy class, making a total of eight classes. To enhance model performance, the images are preprocessed through steps like resizing, rescaling, and data augmentation prior to training. To properly evaluate the model, the dataset is separated into 80% for training, with the remaining 20% equally split between validation and testing. Our model uses only 683,656 total parameters and achieves a classification accuracy of 97%. This research's source code can be found at: https://github.com/ekramalam/GourNet-Repo.

2604.27763 2026-05-01 cs.AI

Intent2Tx: Benchmarking LLMs for Translating Natural Language Intents into Ethereum Transactions

Zhuoran Pan, Yue Li, Zhi Guan, Jianbin Hu, Zhong Chen

详情
英文摘要

The emergence of Large Language Models (LLMs) offers a transformative interface for Web3, yet existing benchmarks fail to capture the complexity of translating high-level user intents into functionally correct, state-dependent on-chain transactions. We present \textsc{Intent2Tx}, a high-fidelity benchmark featuring 29,921 single-step and 1,575 multi-step instances meticulously derived from 300 days of real-world Ethereum mainnet traces. Unlike prior works that rely on synthetic instructions, \textsc{Intent2Tx} grounds natural language intents in real-world protocol interactions across 11 categories, including diverse long-tail Decentralized Finance (DeFi) primitives. To enable rigorous evaluation, we propose an execution-aware framework that transcends surface-level text matching by employing differential state analysis on forked mainnet environments. Our extensive evaluation of 16 state-of-the-art LLMs reveals that while scaling and retrieval-augmentation enhance logical consistency and parameter precision, current models struggle with out-of-distribution generalization and multi-step planning. Crucially, our execution-based analysis demonstrates that syntactically valid outputs often fail to achieve intended state transitions, highlighting a significant gap in current "reasoning-to-execution" capabilities. \textsc{Intent2Tx} serves as a critical foundation for developing autonomous, reliable agents in intent-centric Web3 ecosystems. Code and data: https://anonymous.4open.science/r/Intent2Tx_Bench-97FF .

2604.27759 2026-05-01 cs.CV cs.AI

Learning to Reason: Targeted Knowledge Discovery and Fuzzy Logic Update for Robust Image Recognition

Gurucharan Srinivas, Joshua Niemeijer, Frank Köster

Comments Accepted to CVPR Findings 2026

详情
英文摘要

Integrating domain knowledge into deep neural networks is a promising way to improve generalization. Existing methods either encode prior knowledge in the loss function or apply post-processing modules, but both depend on identifying useful symbolic knowledge to integrate. Since such rules are often unavailable in real-world vision tasks, we propose a method for targeted knowledge discovery. We propose a Differentiable Knowledge Unit (DKU) that enables modulating the classifier logits, yielding refined class probabilities. The DKU uses implication rules to represent relationships between task classes and implicit concepts learned entirely from the main task supervision, without requiring concept labels. Concepts are identified by dedicated classifiers, whose probabilities are passed to DKU alongside the primary class probabilities. DKU computes a logic-based adjustment vector via fuzzy inference, which modulates the primary class logits to yield refined class probabilities. When concept classifiers represent concepts that do not support the logical rule structure, the resulting adjustments to the class probabilities do not directly minimize the supervision loss. Consequently, optimizing the supervision loss on these adjusted class probabilities implicitly trains the concept classifiers. We construct the rule base so that bidirectional logical relations connect concepts and classes. We enforce the concepts to be distinct from each other and with respect to the classes. This design enforces a clean supervision signal for concept learning. We evaluate our methods on the PASCAL-VOC, COCO, and MedMNIST datasets. We demonstrate improvement through our knowledge integration across these datasets. We conduct domain generalization and hard-sample ablation studies and find that our implicit knowledge discovery and integration outperforms the baseline.

2604.27753 2026-05-01 cs.AI cs.ET cs.MA

Autonomous Traffic Signal Optimization Using Digital Twin and Agentic AI for Real-Time Decision-Making

Salman Jan, Toqeer Ali Syed, Shahid Kamal, Qamar Wali, Ali Akarma

Comments This paper is submitted to MECON2026 conference

详情
英文摘要

This article outlines a new framework of traffic light optimization through a digital twin of the transport infrastructure, managed by agentic AI to ensure real-time autonomous decisions. The framework relies on physical sensors and edge computing to measure real-time traffic information and simulate traffic flow in a constantly updated digital twin. The traffic light is automatically controlled through the digital twin according to traffic congestion, travel delay and traffic patterns. This approach is implemented as a three-layer system: perception, conceptualization and action. The perception layer receives data on physical systems; the conceptualization layer uses LangChain to process the data; and the action layer links to the Model Context Protocol (MCP) and traffic management APIs to implement optimised traffic signal control algorithms. The results show that the framework minimizes waiting time at traffic lights and positively affects the effectiveness of the entire traffic flow, which is better than the fixed-time and reinforcement learning-based baselines.

2604.27744 2026-05-01 cs.AI

Consumer Attitudes Towards AI in Digital Health: A Mixed-Methods Survey in Australia

Wei Zhou, Rashina Hoda, Joycelyn Ling

详情
英文摘要

AI applications are increasingly being introduced into digital health. While technical performance has advanced rapidly, successful deployment mainly depends on consumer attitudes, especially to patient-facing applications. However, most existing research examines consumer attitudes towards healthcare AI at an abstract level rather than in response to concrete artefacts. We report a mixed-methods survey study in Australia (N=275) examining consumer readiness, acceptance, trust, and risk perceptions of healthcare AI, combined with a scenario-based evaluation of an AI-generated versus clinician-written consultation summary. Participants expressed moderate optimism and strong perceived usefulness and ease of use, but also substantial concerns about accuracy, safety, and data use. In the scenario task, the AI-generated summary was strongly preferred for quality, empathy, and overall usefulness, yet identification of the AI summary was near chance. Findings show that consumers judge AI through concrete communication quality and visible human governance, underscoring the need for clinically supervised deployment frameworks beyond technical performance alone.

2604.27742 2026-05-01 cs.LG stat.ML

Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction

Mehryar Mohri, Yutao Zhong

详情
英文摘要

The choice of loss function in classification involves a fundamental trade-off: smooth losses (like Cross-Entropy) enable fast optimization rates but yield slow square-root consistency bounds, while piecewise-linear losses (like Hinge) offer fast linear consistency rates but suffer from non-differentiability. We propose Linear-Core (LC) Surrogates, a new family of convex loss functions that resolve this tension by stitching a linear core to a smooth tail. We prove that these surrogates are differentiable everywhere while retaining strict linear $H$-consistency bounds, effectively combining the optimization benefits of smoothness with the statistical efficiency of margin-based losses. In the structured prediction setting, we show that this smoothness unlocks a massive computational and energy advantage: it allows for an unbiased stochastic gradient estimator that bypasses the quadratic complexity $O(|\mathscr{Y}|^2)$ of exact inference (e.g., Viterbi). Empirically, our method achieves a 23$\times$ speedup over Structured SVMs on large-vocabulary sequence tagging tasks and demonstrates superior robustness to instance-dependent label noise, outperforming Cross-Entropy by 2.6% on corrupted CIFAR-10.

2604.27741 2026-05-01 cs.LG

Differential Subgroup Discovery: Characterizing Where Two Populations Differ, and Why

Sascha Xu, Jilles Vreeken

详情
英文摘要

We study the problem of understanding where two populations differ within a feature space, which we formalize in the concept of a differential subgroup: a subset of individuals from both populations who, despite sharing similar characteristics, exhibit exceptional differences in a target outcome. Differential subgroups reveal the regions of the feature space where population-level gaps are most pronounced and can help practitioners identify the covariate combinations that are structurally responsible for these differences, e.g.~in clinical analysis, model diagnostics, or treatment-effect studies. We introduce a general optimization objective for discovering differential subgroups and establish conditions under which the resulting subgroups admit a causal interpretation of population differences. We propose DiffSub, a gradient-based approach that discovers interpretable differential subgroups in tabular data. Across synthetic benchmarks, medical case studies, model-error analyses, and treatment-effect settings, DiffSub identifies informative subgroups that reveal where population differences arise and why.

2604.27295 2026-05-01 cs.AI cs.LG

Learning Rate Engineering: From Coarse Single Parameter to Layered Evolution

Ming-Hong Yao, Di Wang, Jian Cui, Jin-Yan Chen, Zi-Hao Cui, Fa Wang, Chen Wei, Qiu-Ye Yu

Comments 16 pages, 5 figures, 3 tables

详情
英文摘要

Learning rate scheduling has evolved from the single global fixed rate of early SGD to sophisticated layer-wise adaptive strategies. We systematize this evolution into five generations: (Gen1) global fixed learning rates, (Gen2) global scheduling, (Gen3) parameter-level adaptation, (Gen4) layer-level differentiation, and (Gen5) joint layer-time scheduling. We trace the fundamental motivation behind each transition, showing how the shift from one-size-fits-all to tailoring by layer and time addresses the impossible trinity of transfer learning: lower layers require small updates to preserve general knowledge while higher layers need large updates to adapt to new tasks. Building on this taxonomy, we propose Discriminative Adaptive Layer Scaling (DALS), a unified framework that integrates phase-adaptive cosine scheduling, depth-aware Grokfast gradient filtering, and LARS-style trust ratios into a single coherent optimizer. We benchmark 18 strategies including three DALS variants across all five generations on five datasets: synthetic, CIFAR-10 (from scratch), RTE, TREC-6, and IMDb (fine-tuning). On synthetic, DALS achieves the best accuracy at 98.0%, while DALS-Fast reaches 90% in just 3 epochs. The cross-dataset analysis reveals striking regime-dependent patterns -- no single strategy wins across all regimes. Critically, STLR+Discriminative, the ULMFiT champion, catastrophically fails on from-scratch tasks (43.6% on TREC-6 from scratch vs. 96.8% with RAdam), confirming that directional decay biases are harmful without pretrained features. DALS avoids either extreme, achieving the best synthetic result while maintaining competitive fine-tuning performance.

2604.26355 2026-05-01 cs.CL

Shorthand for Thought: Compressing LLM Reasoning via Entropy-Guided Supertokens

Zhenyu Zhao, Sander Land, Daniel M. Bikel, Waseem Alshikh

详情
英文摘要

Reasoning in Large Language Models incurs significant inference-time compute, yet the token-level information structure of reasoning traces remains underexplored. We observe that reasoning tokens split into two functional types: low-entropy \textit{structural} tokens (recurring phrases that scaffold the reasoning process) and higher-entropy \textit{organic} tokens (problem-specific content that drives toward a solution). This asymmetry motivates a simple, model-agnostic compression pipeline: apply cross-word BPE merges on a model's own reasoning traces to derive \textit{supertokens} that capture frequent structural patterns, then teach the model to adopt them via supervised fine-tuning. Across three model families and five mathematical reasoning benchmarks, our approach shortens reasoning traces by 8.1\% on average with no statistically significant accuracy loss on any model--benchmark pair. Beyond compression, supertokens act as interpretable reasoning-move annotations (backtracking, verification, strategy shifts), exposing the model's high-level strategy at a glance. Analyzing transitions between structural categories reveals systematic differences between correct and incorrect traces: correct traces show productive recovery (backtracking followed by strategy shifts and verification), while incorrect traces are dominated by confusion cycles (repeated hedging and unresolved contradictions). These diagnostic signals suggest applications in reward shaping and early stopping for RL-based reasoning training.

2604.22615 2026-05-01 cs.RO

GazeVLA: Learning Human Intention for Robotic Manipulation

Chengyang Li, Kaiyi Xiong, Yuan Xu, Lei Qian, Yizhou Wang, Wentao Zhu

Comments Project page: https://gazevla.github.io

详情
英文摘要

Embodied foundation models have achieved significant breakthroughs in robotic manipulation, yet they still depend heavily on large-scale robot demonstrations. Although recent works have explored leveraging human data to alleviate this dependency, effectively extracting transferable knowledge remains a significant challenge due to the inherent embodiment gap between human and robot. We argue that the intention underlying human actions can serve as a powerful intermediate representation for bridging this gap. In this paper, we introduce a novel framework that explicitly learns and transfers human intention to facilitate robotic manipulation. Specifically, we model intention through gaze, as it naturally precedes physical actions and serves as an observable proxy for human intent. Our model is first pretrained on a large-scale egocentric human dataset to capture human intention and its synergy with action, followed by finetuning on a small set of robot and human data. During inference, the model adopts a Chain-of-Thought reasoning paradigm, sequentially predicting intention before executing the action. Extensive evaluations in simulation and real-world settings, across long-horizon and fine-grained tasks, and under few-shot and robustness benchmarks, show that our method consistently outperforms strong baselines, generalizes better, and achieves state-of-the-art performance. Project page: https://gazevla.github.io .

2604.21564 2026-05-01 cs.CL

Measuring Opinion Bias and Sycophancy via LLM-based Persuasion

Rodrigo Nogueira, Giovana Kerche Bonás, Thales Sales Almeida, Andrea Roque, Ramon Pires, Hugo Abonizio, Thiago Laitz, Celio Larcher, Roseval Malaquias Junior, Marcos Piau

详情
英文摘要

Large language models increasingly shape the information people consume: they are embedded in search, consulted for professional advice, deployed as agents, and used as a first stop for questions about policy, ethics, health, and politics. When such a model silently holds a position on a contested topic, that position propagates at scale into users' decisions. Eliciting a model's positions is harder than it first appears: contemporary assistants answer direct opinion questions with evasive disclaimers, and the same model may concede the opposite position once the user starts arguing one side. We propose a method, released as the open-source llm-bias-bench, for discovering the opinions an LLM actually holds on contested topics under conditions that resemble real multi-turn interaction. The method pairs two complementary free-form probes. Direct probing asks for the model's opinion across five turns of escalating pressure from a simulated user. Indirect probing never asks for an opinion and engages the model in argumentative debate, letting bias leak through how it concedes, resists, or counter-argues. Three user personas (neutral, agree, disagree) collapse into a nine-way behavioral classification that separates persona-independent positions from persona-dependent sycophancy, and an auditable LLM judge produces verdicts with textual evidence. The first instantiation ships 38 topics in Brazilian Portuguese across values, scientific consensus, philosophy, and economic policy. Applied to 13 assistants, the method surfaces findings of practical interest: argumentative debate triggers sycophancy 2-3x more than direct questioning (median 50% to 79%); models that look opinionated under direct questioning often collapse into mirroring under sustained arguments; and attacker capability matters mainly when an existing opinion must be dislodged, not when the assistant starts neutral.

2604.20967 2026-05-01 cs.RO cs.SY eess.SY

Clinical Evaluation of a Tongue-Controlled Wrist Abduction-Adduction Assistance in a 6-DoF Upper-Limb Exoskeleton for Individuals with ALS and SCI

Juwairiya S. Khan, Mostafa Mohammadi, Alexander L. Ammitzbøll, Ellen-Merete Hagen, Jakob Blicher Izabella Obál, Ana S. S. Cardoso, Oguzhan Kirtas, Rasmus L. Kæseler, John Rasmussen, Lotte N. S. Andreasen Struijk

Comments 9 pages, 7 figures and 2 tables. This work has been submitted to the IEEE Transactions on Neural Systems and Rehabilitation Engineering

详情
英文摘要

Upper-limb exoskeletons (ULEs) have the potential to restore functional independence in individuals with severe motor impairments; however, the clinical relevance of wrist degrees of freedom (DoF), particularly abduction-adduction (Ab-Ad), remains insufficiently evaluated. This study investigates the functional and user-perceived impact of wrist Ab-Ad assistance during two activities of daily living (ADLs). Wrist Ab-Ad assistance in a tongue-controlled 6-DoF ULE, EXOTIC2, was evaluated in a within-subject study involving one individual with amyotrophic lateral sclerosis and five individuals with spinal cord injury. Participants performed drinking and scratch stick leveling tasks with EXOTIC2 under two conditions: with and without wrist Ab-Ad assistance. Outcome measure included task success, task completion time, kinematic measures, and a usability questionnaire capturing comfort, functional perception, and acceptance. Enabling wrist Ab-Ad improved task success rates across both ADLs, with consistent reductions in spillage (from 77.8% spillages to 22.2%) and failed placements (from 66.7% to 16.7%). Participants utilized task-specific subsets of the available wrist range of motion, indicating that effective control within functional ranges was more critical than maximal joint excursion. Questionnaire responses indicated no increase in discomfort with the additional DoF and reflected perceived improvements in task performance. In conclusion, wrist Ab-Ad assistance enhances functional task performance in assistive exoskeleton use without compromising user comfort. However, its effectiveness depends on task context, control usability, and individual user strategies. This study provides clinically relevant, user-centered evidence supporting the inclusion of wrist Ab-Ad in ULEs, emphasizing the importance of balancing functional capability with usability in assistive device design.

2604.20893 2026-05-01 cs.RO

Design, Modelling and Experimental Evaluation of a Tendon-driven Wrist Abduction-Adduction Mechanism for an upper limb exoskeleton

Juwairiya S. Khan, Mostafa Mohammadi, John Rasmussen, Lotte N. S. Andreasen Struijk

Comments 8 pages and 8 figures. Submitted to IEEE/ASME Transactions on Mechatronics. Includes experimental validation on human participants

详情
英文摘要

Wrist exoskeletons play a vital role in rehabilitation and assistive applications, yet conventional actuation mechanisms such as electric motors or pneumatics often introduce undesirable weight, friction, and complexity. This paper presents a novel single-cable (tendon), torsional-spring-assisted actuation mechanism for wrist abduction-adduction, and a simulation-based method for selecting its stiffness parameters. The mechanism employs a single Bowden cable passively tensioned by a spiral torsional spring (clock spring) to maintain continuous cable tension without antagonistic actuation. Kinematic and dynamic modeling of the mechanism was performed to estimate the required torque and identify optimal spring parameters. These simulation-derived parameters guided the design of a functional prototype, which was experimentally evaluated with five participants with no motor disabilities (NMD) under varying arm positions and loading conditions using three spring configurations to account for user variability and modeling uncertainties. Experimental results show consistent agreement with simulation-derived trends, with the nominal spring configuration achieving balanced motion range, torque demand, and repeatability. The results demonstrate that simulation-informed stiffness selection can effectively guide the design of compact, cable-driven wrist exoskeletons while reducing reliance on empirical tuning.

2604.20522 2026-05-01 cs.SD cs.CV

From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR

Nan Xu, Shiheng Li, Shengchao Hou

Comments 52 pages, 18 figures, 16 tables

详情
英文摘要

We propose a new approach for a practical two-stage Optical Music Recognition (OMR) pipeline, with a particular focus on its second stage. Given symbol and event candidates from the visual pipeline, we decode them into an editable, verifiable, and exportable score structure. We focus on complex polyphonic staff notation, especially piano scores, where voice separation and intra-measure timing are the main bottlenecks. Our approach formulates second-stage decoding as a structure decoding problem and uses topology recognition with probability-guided search (BeadSolver) as its core method. We also describe a data strategy that combines procedural generation with recognition-feedback annotations. The result is a practical decoding component for real OMR systems and a path to accumulate structured score data for future end-to-end, multimodal, and RL-style methods.

2604.19528 2026-05-01 cs.LG cs.AI cs.DB

Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments

Jianyang Gao, Yutong Gou, Yuexuan Xu, Jifan Shi, Yongyi Yang, Shuolin Li, Raymond Chi-Wing Wong, Cheng Long

详情
英文摘要

This technical note revisits the relationship between RaBitQ and TurboQuant under a unified comparison framework. We compare the two methods in terms of methodology, theoretical guarantees, and empirical performance, using a reproducible, transparent, and symmetric setup. Our results show that, despite the claimed advantage of TurboQuant, TurboQuant performs worse than RaBitQ in most tested settings of inner-product estimation, nearest-neighbor search and KV cache quantization. We further find that several reported runtime and recall results in the TurboQuant paper could not be reproduced from the released implementation under the stated configuration. Overall, this note clarifies the shared structure and genuine differences between the two lines of work, while documenting reproducibility issues in the experimental results reported by the TurboQuant paper.

2604.18578 2026-05-01 cs.LG cs.AI

Bounded Ratio Reinforcement Learning

Yunke Ao, Le Chen, Bruce D. Lee, Assefa S. Wahd, Aline Czarnobai, Philipp Fürnstahl, Bernhard Schölkopf, Andreas Krause

Comments 23 pages, 9 figures; Project page and code available at https://bounded-ratio-rl.github.io/brrl/

详情
英文摘要

Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying foundations of trust region methods and the heuristic clipped objective used in PPO. In this paper, we bridge this gap by introducing the Bounded Ratio Reinforcement Learning (BRRL) framework. We formulate a novel regularized and constrained policy optimization problem and derive its analytical optimal solution. We prove that this solution ensures monotonic performance improvement. To handle parameterized policy classes, we develop a policy optimization algorithm called Bounded Policy Optimization (BPO) that minimizes an advantage-weighted divergence between the policy and the analytic optimal solution from BRRL. We further establish a lower bound on the expected performance of the resulting policy in terms of the BPO loss function. Notably, our framework also provides a new theoretical lens to interpret the success of the PPO loss, and connects trust region policy optimization and the Cross-Entropy Method (CEM). We additionally extend BPO to Group-relative BPO (GBPO) for LLM fine-tuning. Empirical evaluations of BPO across MuJoCo, Atari, and complex IsaacLab environments (e.g., Humanoid locomotion), and of GBPO for LLM fine-tuning tasks, demonstrate that BPO and GBPO generally match or outperform PPO and GRPO in stability and final performance.

2604.02715 2026-05-01 cs.LG

FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving

Qingxiu Liu, Cyril Y. He, Hanser Jiang, Zion Wang, Alan Zhao, Patrick P. C. Lee

详情
英文摘要

Mixture-of-Experts (MoE) models have become a dominant paradigm for scaling large language models, but their rapidly growing parameter sizes introduce a fundamental inefficiency during inference: most expert weights remain idle in GPU memory while competing with performance-critical runtime state such as the key-value (KV) cache. Since KV cache capacity directly determines serving throughput, this mismatch leads to underutilized memory and degraded performance. In this paper, we present FluxMoE, a new MoE inference system that decouples expert parameters from persistent GPU residency. FluxMoE introduces an expert paging abstraction that treats expert weights as streamed, transient resources, materializing them on demand and evicting them immediately after use, allowing GPU memory to be preferentially allocated to throughput-critical runtime state. We implement FluxMoE atop vLLM to enable efficient MoE inference under severe memory constraints. Experimental results demonstrate that FluxMoE achieves up to 3.0$\times$ throughput gains over vLLM in memory-intensive regimes, without compromising model fidelity.