arXivDaily: arXiv Daily Academic Digest (updated Monday through Friday)
2604.07102 2026-04-09 cs.CL cs.AI

The Impact of Steering Large Language Models with Persona Vectors in Educational Applications

Yongchao Wu, Aron Henriksson

Abstract

Activation-based steering can personalize large language models at inference time, but its effects in educational settings remain unclear. We study persona vectors for seven character traits in short-answer generation and automated scoring on the ASAP-SAS benchmark across three models spanning two architectures. Persona steering lowers answer quality overall, with much larger effects on open-ended English Language Arts (ELA) prompts than on factual science prompts; interpretive and argumentative tasks are up to 11x more sensitive. On the scoring side, we observe predictable valence-aligned calibration shifts: evil and impolite scorers grade more harshly, while good and optimistic scorers grade more leniently. ELA tasks are 2.5-3x more susceptible to scorer personalization than science tasks, and the Mixture-of-Experts model shows roughly 6x larger calibration shifts than the dense models. To our knowledge, this is the first study to systematically examine the effects of activation-steered persona traits in educational generation and scoring, and the results highlight the need for task-aware and architecture-aware calibration when deploying steered models in educational settings.

2604.07101 2026-04-09 cs.CV cs.AI cs.MM eess.IV

SurFITR: A Dataset for Surveillance Image Forgery Detection and Localisation

Qizhou Wang, Guansong Pang, Christopher Leckie

Abstract

We present the Surveillance Forgery Image Test Range (SurFITR), a dataset for surveillance-style image forgery detection and localisation, in response to recent advances in open-access image generation models that raise concerns about falsifying visual evidence. Existing forgery models, trained on datasets with full-image synthesis or large manipulated regions in object-centric images, struggle to generalise to surveillance scenarios. This is because tampering in surveillance imagery is typically localised and subtle, occurring in scenes with varied viewpoints, small or occluded subjects, and lower visual quality. To address this gap, SurFITR provides a large collection of forensically valuable imagery generated via a multimodal LLM-powered pipeline, enabling semantically aware, fine-grained editing across diverse surveillance scenes. It contains over 137k tampered images with varying resolutions and edit types, generated using multiple image editing models. Extensive experiments show that existing detectors degrade significantly on SurFITR, while training on SurFITR yields substantial improvements in both in-domain and cross-domain performance. SurFITR is publicly available on GitHub.

2604.07097 2026-04-09 cs.CV

Novel Anomaly Detection Scenarios and Evaluation Metrics to Address the Ambiguity in the Definition of Normal Samples

Reiji Saito, Satoshi Kamiya, Kazuhiro Hotta

Comments Accepted by CVPR 2026 Workshop

Abstract

In conventional anomaly detection, training data consist of only normal samples. However, in real-world scenarios, the definition of a normal sample is often ambiguous. For example, there are cases where a sample has small scratches or stains but is still acceptable for practical usage. On the other hand, higher precision is required when manufacturing equipment is upgraded. In such cases, normal samples may include small scratches, tiny dust particles, or a foreign object that we would prefer to classify as an anomaly. Such cases frequently occur in industrial settings, yet they have not been discussed until now. Thus, we propose novel scenarios and an evaluation metric to accommodate specification changes in real-world applications. Furthermore, to address the ambiguity of normal samples, we propose RePaste, which enhances learning by re-pasting regions with high anomaly scores from the previous step into the input for the next step. On our scenarios using the MVTec AD benchmark, RePaste achieved state-of-the-art performance with respect to the proposed evaluation metric, while maintaining high AUROC and PRO scores. Code: https://github.com/ReijiSoftmaxSaito/Scenario

2604.07095 2026-04-09 cs.CL

Multilingual Embedding Probes Fail to Generalize Across Learner Corpora

Laurits Lyngbaek, Ross Deans Kristensen-McLachlan

Abstract

Do multilingual embedding models encode a language-general representation of proficiency? We investigate this by training linear and non-linear probes on hidden-state activations from Qwen3-Embedding (0.6B, 4B, 8B) to predict CEFR proficiency levels from learner texts across nine corpora and seven languages. We compare five probing architectures against a baseline trained on surface-level text features. Under in-distribution evaluation, probes achieve strong performance ($QWK\approx0.7$), substantially outperforming the surface baseline, with middle layers consistently yielding the best predictions. However, in cross-corpus evaluation performance collapses across all probe types and model sizes. Residual analysis reveals that out-of-distribution probes converge towards predicting uniformly distributed labels, indicating that the learned mappings capture corpus-specific distributional properties (topic, language, task type, rating methodology) rather than an abstract, transferable proficiency dimension. These results suggest that current multilingual embeddings do not straightforwardly encode language-general proficiency, with implications for representation-based approaches to proficiency-adaptive language technology.
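The in-distribution result above is reported as quadratic weighted kappa (QWK). As a minimal sketch of how that metric is typically computed for ordinal labels such as CEFR levels, the following assumes labels are already encoded as integers 0..K-1. The function name and the integer encoding are illustrative assumptions, not details taken from the paper.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Quadratic weighted kappa for integer labels in range(num_classes)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")

    n = len(y_true)
    # Observed confusion matrix: rows are true labels, columns are predictions.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty grows with the distance between the two levels.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two label marginals.
            weighted_expected += weight * hist_true[i] * hist_pred[j] / n

    if weighted_expected == 0.0:
        # Both label sets sit on one identical class: agreement is perfect.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so a score near 0.7 indicates substantial but imperfect ordinal agreement.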

2604.07084 2026-04-09 cs.RO cs.AI

Flow Motion Policy: Manipulator Motion Planning with Flow Matching Models

Davood Soleymanzadeh, Xiao Liang, Minghui Zheng

Abstract

Open-loop end-to-end neural motion planners have recently been proposed to improve motion planning for robotic manipulators. These methods enable planning directly from sensor observations without relying on a privileged collision checker during planning. However, many existing methods generate only a single path for a given workspace across different runs, and do not leverage their open-loop structure for inference-time optimization. To address this limitation, we introduce Flow Motion Policy, an open-loop, end-to-end neural motion planner for robotic manipulators that leverages the stochastic generative formulation of flow matching methods to capture the inherent multi-modality of planning datasets. By modeling a distribution over feasible paths, Flow Motion Policy enables efficient inference-time best-of-$N$ sampling. The method generates multiple end-to-end candidate paths, evaluates their collision status after planning, and executes the first collision-free solution. We benchmark the Flow Motion Policy against representative sampling-based and neural motion planning methods. Evaluation results demonstrate that Flow Motion Policy improves planning success and efficiency, highlighting the effectiveness of stochastic generative policies for end-to-end motion planning and inference-time optimization. Experimental evaluation videos are available at https://zh.engr.tamu.edu/wp-content/uploads/sites/310/2026/03/FMP-Website.mp4.
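The best-of-$N$ procedure described in this abstract (sample several candidate paths, check collisions afterwards, execute the first collision-free one) can be sketched as follows. This is only an illustration under stated assumptions: `sample_path` stands in for a stochastic planner and `is_collision_free` for a post-hoc collision checker; both are hypothetical callables, not the authors' interfaces.

```python
from typing import Callable, List, Optional, Sequence

# A path is a sequence of joint configurations (hypothetical representation).
Path = Sequence[Sequence[float]]


def best_of_n(
    sample_path: Callable[[], Path],
    is_collision_free: Callable[[Path], bool],
    n: int,
) -> Optional[Path]:
    """Draw n candidate paths, then return the first one that passes the check."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Planning phase: all candidates are generated without any collision checks.
    candidates: List[Path] = [sample_path() for _ in range(n)]
    # Verification phase: collision status is evaluated only after planning.
    for path in candidates:
        if is_collision_free(path):
            return path
    return None  # no candidate survived; the caller decides how to recover
```

Because the sampler is stochastic, a larger `n` raises the chance that at least one candidate is collision-free, at the cost of more sampling and checking.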

2604.07082 2026-04-09 physics.comp-ph cs.NA math.NA

Granular mixing and flow dynamics in horizontal stirred bed reactors

Sahar Pourandi, Igor Ostanin, Thomas Weinhart

Abstract

Horizontal stirred bed reactors (HSBRs) are used in gas-phase polyolefin production, where efficient solids mixing and controlled residence time distributions are essential for product quality and stability. Despite their industrial relevance, the influence of operating conditions on granular flow and mixing in HSBRs is not well understood. Discrete Element Method (DEM) simulations are used to study the effects of rotation speed and fill level on particle motion, mixing, and axial transport in a lab-scale HSBR. An industrial-grade polypropylene powder is modelled using calibrated contact parameters. Mixing is quantified using the Lacey index in axial (z) and cross-sectional (xy) directions. Particle circulation is characterised via cycle-time analysis and a coarse-grained angular velocity field. Axial dispersion coefficients are obtained from particle trajectories using both Einstein-type and cycle-based approaches, and validated with a diffusion model predicting the axial Lacey index. Results show that axial mixing depends strongly on rotation speed and fill level: higher rotation speeds accelerate homogenization, while higher fill levels slow mixing. Cross-sectional mixing is mainly sensitive to rotation speed, with fill-level effects diminishing at higher speeds. Cycle time decreases with increasing rotation speed and fill level, indicating enhanced circulation. Axial dispersion increases with rotation speed but decreases with fill level, with consistent results across methods. These findings reveal trade-offs between axial mixing, circulation, and dispersion, highlighting the need to balance operating conditions and demonstrating the capability of DEM to support HSBR optimisation.
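To make the Einstein-type estimate of axial dispersion concrete, the sketch below uses the standard one-dimensional relation in which the variance of particle displacements over a time interval equals 2 D t. The abstract does not state the authors' exact estimator, so the function name, the drift correction, and the single-interval form are assumptions made for illustration.

```python
def axial_dispersion_coefficient(z_start, z_end, elapsed_time):
    """Einstein-type estimate D = Var(dz) / (2 * t) from axial positions."""
    if len(z_start) != len(z_end) or not z_start:
        raise ValueError("position lists must be non-empty and equally long")
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")

    displacements = [end - start for start, end in zip(z_start, z_end)]
    count = len(displacements)
    # Subtract the mean displacement so net axial drift is not counted as dispersion.
    mean_displacement = sum(displacements) / count
    variance = sum((d - mean_displacement) ** 2 for d in displacements) / count
    return variance / (2.0 * elapsed_time)
```

In practice the estimate would be taken over many time lags and the slope of displacement variance against time used instead of a single interval.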

2604.07081 2026-04-09 eess.SY cs.SY

Small-gain analysis of exponential incremental input/output-to-state stability for large-scale distributed systems

Christian Gatke, Julian D. Schiller, Matthias A. Müller

Comments This work has been submitted to the IEEE for possible publication

Abstract

We provide a detectability analysis for nonlinear large-scale distributed systems in the sense of exponential incremental input/output-to-state stability (i-IOSS). In particular, we prove that the overall system is exponentially i-IOSS if each subsystem is i-IOSS, with interconnections treated as external inputs, and a suitable small-gain condition holds. The analysis is extended to a Lyapunov characterization, resulting in a different quantitative outcome regarding the small-gain condition, which is further analyzed within this work. Moreover, we derive linear matrix inequality conditions posed solely on the local subsystems and their interconnections, which guarantee exponential i-IOSS of the overall distributed system. The results are illustrated on a numerical example.

2604.07079 2026-04-09 cs.IR

MARVEL: Multimodal Adaptive Reasoning-intensiVe Expand-rerank and retrievaL

Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Mostafa Farouk Senussi, Mahmoud Abdalla, Abdelrahman Abdallah, Hyun-Soo Kang

Abstract

Multimodal retrieval over text corpora remains a fundamental challenge: the best vision-language encoder achieves only 27.6 nDCG@10 on MM-BRIGHT, a reasoning-intensive multimodal retrieval benchmark, underperforming strong text-only systems. We argue that effective multimodal retrieval requires three tightly integrated capabilities that existing approaches address only in isolation: expanding the query's latent intent, retrieving with a model trained for complex reasoning, and reranking via explicit step-by-step reasoning over candidates. We introduce MARVEL (Multimodal Adaptive Reasoning-intensiVe Expand-rerank and retrievaL), a unified pipeline that combines LLM-driven query expansion, MARVEL-Retriever (a reasoning-enhanced dense retriever fine-tuned for complex multimodal queries), and GPT-4o-based chain-of-thought reranking with optional multi-pass reciprocal rank fusion. Evaluated on MM-BRIGHT across 29 technical domains, MARVEL achieves 37.9 nDCG@10, surpassing the best multimodal encoder by +10.3 points. It outperforms all single-stage baselines in 27 of 29 domains and matches or approaches the best baseline in the remaining two highly specialized domains (Crypto, Quantum Computing), demonstrating that reasoning-intensive multimodal retrieval is best addressed through a unified expand-retrieve-rerank framework. https://github.com/mm-bright/multimodal-reasoning-retrieval
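The reciprocal rank fusion step mentioned in this abstract can be illustrated with the generic formulation, in which each document receives the sum of 1 / (k + rank) over all input rankings. This is a sketch of the general technique only: the constant k = 60 is the commonly used default rather than a value stated by the authors, and the function name is hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, fusing the outputs of three reranking passes with `reciprocal_rank_fusion([pass_1, pass_2, pass_3])` rewards documents that rank consistently high across passes without needing comparable raw scores.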

2604.07072 2026-04-09 cs.LG

Epistemic Robust Offline Reinforcement Learning

Abhilash Reddy Chenreddy, Erick Delage

Abstract

Offline reinforcement learning learns policies from fixed datasets without further environment interaction. A key challenge in this setting is epistemic uncertainty, arising from limited or biased data coverage, particularly when the behavior policy systematically avoids certain actions. This can lead to inaccurate value estimates and unreliable generalization. Ensemble-based methods like SAC-N mitigate this by conservatively estimating Q-values using the ensemble minimum, but they require large ensembles and often conflate epistemic with aleatoric uncertainty. To address these limitations, we propose a unified and generalizable framework that replaces discrete ensembles with compact uncertainty sets over Q-values. We further introduce an Epinet-based model that directly shapes the uncertainty sets to optimize the cumulative reward under the robust Bellman objective without relying on ensembles. We also introduce a benchmark for evaluating offline RL algorithms under risk-sensitive behavior policies, and demonstrate that our method achieves improved robustness and generalization over ensemble-based baselines across both tabular and continuous state domains.
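The ensemble-minimum baseline that this abstract contrasts against can be sketched as a pessimistic Bellman target: the next-state value is taken as the smallest estimate across ensemble members. The snippet below is a simplified illustration that omits the entropy term used in SAC-style methods; the function name and arguments are assumptions, and it does not represent the paper's proposed uncertainty-set method.

```python
def ensemble_min_target(reward, next_q_values, discount=0.99, done=False):
    """Pessimistic one-step target using the minimum over an ensemble of Q estimates."""
    if not next_q_values:
        raise ValueError("next_q_values must contain at least one estimate")
    # Conservative value estimate: trust the most pessimistic ensemble member.
    pessimistic_q = min(next_q_values)
    # No bootstrapping past a terminal transition.
    bootstrap = 0.0 if done else discount * pessimistic_q
    return reward + bootstrap
```

The pessimism only bites where ensemble members disagree, which is why such methods tend to need many members to cover poorly sampled actions.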

2604.07071 2026-04-09 cs.HC cs.CR

BioMoTouch: Touch-Based Behavioral Authentication via Biometric-Motion Interaction Modeling

Zijian Ling, Jianbang Chen, Hongwei Li, Hongda Zhai, Man Zhou, Jun Feng, Zhengxiong Li, Qi Li, Qian Wang

Comments 13 pages

Abstract

Touch-based authentication is widely deployed on mobile devices due to its convenience and seamless user experience. However, existing systems largely model touch interaction as a purely behavioral signal, overlooking its intrinsic multidimensional nature and limiting robustness against sophisticated adversarial behaviors and real-world variations. In this work, we present BioMoTouch, a multi-modal touch authentication framework on mobile devices grounded in a key empirical finding: during touch interaction, inertial sensors capture user-specific behavioral dynamics, while capacitive screens simultaneously capture physiological characteristics related to finger morphology and skeletal structure. Building upon this insight, BioMoTouch jointly models physiological contact structures and behavioral motion dynamics by integrating capacitive touchscreen signals with inertial measurements. Rather than combining independent decisions, the framework explicitly learns their coordinated interaction to form a unified representation of touch behavior. BioMoTouch operates implicitly during natural user interactions and requires no additional hardware, enabling practical deployment on commodity mobile devices. We evaluate BioMoTouch with 38 participants under realistic usage conditions. Experimental results show that BioMoTouch achieves a balanced accuracy of 99.71% and an equal error rate of 0.27%. Moreover, it maintains false acceptance rates below 0.90% under artificial replication, mimicry, and puppet attack scenarios, demonstrating strong robustness against partial-factor manipulation.

2604.07069 2026-04-09 eess.SY cs.LG cs.SY math.DS

Controller Design for Structured State-space Models via Contraction Theory

Muhammad Zakwan, Vaibhav Gupta, Alireza Karimi, Efe C. Balta, Giancarlo Ferrari-Trecate

Comments The first and second authors contributed equally. The paper has been accepted in 24th European Control Conference (ECC) in Reykjavik, Iceland, 2026

Abstract

This paper presents an indirect data-driven output feedback controller synthesis for nonlinear systems, leveraging Structured State-space Models (SSMs) as surrogate models. SSMs have emerged as a compelling alternative in modelling time-series data and dynamical systems. They can capture long-term dependencies while maintaining linear computational complexity with respect to the sequence length, in comparison to the quadratic complexity of Transformer-based architectures. The contributions of this work are threefold. We provide the first analysis of controllability and observability of SSMs, which leads to scalable control design via Linear Matrix Inequalities (LMIs) that leverage contraction theory. Moreover, a separation principle for SSMs is established, enabling the independent design of observers and state-feedback controllers while preserving the exponential stability of the closed-loop system. The effectiveness of the proposed framework is demonstrated through a numerical example, showcasing nonlinear system identification and the synthesis of an output feedback controller.

2604.07067 2026-04-09 cs.CL

Is Cross-Lingual Transfer in Bilingual Models Human-Like? A Study with Overlapping Word Forms in Dutch and English

Iza Škrjanec, Irene Elisabeth Winther, Vera Demberg, Stefan L. Frank

Abstract

Bilingual speakers show cross-lingual activation during reading, especially for words with shared surface form. Cognates (friends) typically lead to facilitation, whereas interlingual homographs (false friends) cause interference or no effect. We examine whether cross-lingual activation in bilingual language models mirrors these patterns. We train Dutch-English causal Transformers under four vocabulary-sharing conditions that manipulate whether (false) friends receive shared or language-specific embeddings. Using psycholinguistic stimuli from bilingual reading studies, we evaluate the models through surprisal and embedding similarity analyses. The models largely maintain language separation, and cross-lingual effects arise primarily when embeddings are shared. In these cases, both friends and false friends show facilitation relative to controls. Regression analyses reveal that these effects are mainly driven by frequency rather than consistency in form-meaning mapping. Only when just friends share embeddings are the qualitative patterns of bilinguals reproduced. Overall, bilingual language models capture some cross-linguistic activation effects. However, their alignment with human processing seems to critically depend on how lexical overlap is encoded, possibly limiting their explanatory adequacy as models of bilingual reading.

2604.07066 2026-04-09 cs.CL

SemEval-2026 Task 3: Dimensional Aspect-Based Sentiment Analysis (DimABSA)

Liang-Chih Yu, Jonas Becker, Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Lung-Hao Lee, Ying-Lung Lin, Jin Wang, Jan Philip Wahle, Terry Ruas, Natalia Loukachevitch, Alexander Panchenko, Ilseyar Alimova, Lilian Wanzare, Nelson Odhiambo, Bela Gipp, Kai-Wei Chang, Saif M. Mohammad

Abstract

We present the SemEval-2026 shared task on Dimensional Aspect-Based Sentiment Analysis (DimABSA), which improves traditional ABSA by modeling sentiment along valence-arousal (VA) dimensions rather than using categorical polarity labels. To extend ABSA beyond consumer reviews to public-issue discourse (e.g., political, energy, and climate issues), we introduce an additional task, Dimensional Stance Analysis (DimStance), which treats stance targets as aspects and reformulates stance detection as regression in the VA space. The task consists of two tracks: Track A (DimABSA) and Track B (DimStance). Track A includes three subtasks: (1) dimensional aspect sentiment regression, (2) dimensional aspect sentiment triplet extraction, and (3) dimensional aspect sentiment quadruplet extraction, while Track B includes only the regression subtask for stance targets. We also introduce a continuous F1 (cF1) metric to jointly evaluate structured extraction and VA regression. The task attracted more than 400 participants, resulting in 112 final submissions and 42 system description papers. We report baseline results, discuss top-performing systems, and analyze key design choices to provide insights into dimensional sentiment analysis at the aspect and stance-target levels. All resources are available on our GitHub repository.

2604.07065 2026-04-09 eess.SY cs.SY

Trust-as-a-Service: Task-Specific Orchestration for Effective Task Completion via Model Context Protocol-Aided Agentic AI

Botao Zhu, Xianbin Wang

Abstract

As future tasks in networked systems increasingly rely on collaborative execution among distributed devices, trust has become an essential tool for securing both reliable collaborators and task-specific resources. However, the diverse requirements of different tasks, the limited information of task owners on others, and the complex relationships among networked devices pose significant challenges to achieving timely and accurate trust evaluation of potential collaborators for meeting task-specific needs. To address these challenges, this paper proposes Trust-as-a-Service (TaaS), a novel paradigm that encapsulates complex trust mechanisms into a unified, system-wide service. This paradigm enables efficient utilization of distributed trust-related data, need-driven trust evaluation service provision, and task-specific collaborator organization. To realize TaaS, we develop an agentic AI-based framework as the enabling platform by leveraging the Model Context Protocol (MCP). The central server-side agent autonomously performs trust-related operations in accordance with specific task requirements, delivering the trust assessment service to all task owners through a unified interface. Meanwhile, all device-side agents expose their capabilities and resources via MCP servers, allowing devices to be dynamically discovered, evaluated, engaged, and released, thereby forming task-specific collaborative units. Experimental results demonstrate that the proposed TaaS achieves 100% collaborator selection accuracy, along with high reliability and resource-efficient task completion.

2604.07064 2026-04-09 eess.SY cs.SY

TSO-DSO Coordinated Reactive Power Dispatch for Smart Inverters with Multiple Control Modes Real-Time Implementation

Mohammad Almomani, Ahmed Alkhonain, Venkataramana Ajjarapu

Abstract

This paper presents TSO-DSO coordinated reactive power dispatch, with a focus on real-time implementation. A sensitivity-aware, mixed-integer linear programming (MILP) formulation is developed to model the IEEE 1547-compliant droop-based control modes Volt-VAR (VV), Volt-Watt (VW), and Watt-VAR (WV) of smart inverters. The algorithm employs a hierarchical optimization strategy using Special Ordered Sets (SOS1) to enhance computational efficiency and supports limited measurement scenarios through Recursive Least Squares (RLS) estimation. The proposed method is tested on the IEEE 13-bus and 123-bus distribution networks, which are connected to a 9-bus transmission system. Results demonstrate the feasibility and effectiveness of the real-time dispatch framework in improving voltage regulation and minimizing power curtailment.
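As a rough illustration of the Recursive Least Squares (RLS) estimation named in this abstract, the sketch below fits a single coefficient online from streaming measurement pairs. The scalar setting (for example, a voltage change per unit of reactive power change), the function name, and the default parameters are all assumptions for illustration; the paper's actual regressors and dimensions are not given in the abstract.

```python
def rls_scalar(samples, forgetting=1.0, initial_covariance=1000.0):
    """Estimate theta in y = theta * x from (x, y) pairs using scalar RLS."""
    if not 0.0 < forgetting <= 1.0:
        raise ValueError("forgetting factor must be in (0, 1]")
    theta = 0.0
    covariance = initial_covariance  # large value encodes a weak prior on theta
    for x, y in samples:
        # Gain balances trust in the new sample against the current estimate.
        gain = covariance * x / (forgetting + x * covariance * x)
        # Correct the estimate by the prediction error on this sample.
        theta += gain * (y - x * theta)
        # Shrink the covariance; a forgetting factor below 1 discounts old data.
        covariance = (covariance - gain * x * covariance) / forgetting
    return theta
```

A forgetting factor slightly below 1 lets the estimate track slowly drifting sensitivities, which is the usual reason to prefer RLS over a batch fit in real-time settings.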

2604.07059 2026-04-09 cs.LG

Production-Ready Automated ECU Calibration using Residual Reinforcement Learning

Andreas Kampmeier, Kevin Badalian, Lucas Koch, Sung-Yong Lee, Jakob Andert

Comments This manuscript has been submitted to SAE as a conference paper for the 2026 Stuttgart International Symposium on Automotive and Powertrain Technology

Abstract

Electronic Control Units (ECUs) have played a pivotal role in transforming motorcars of yore into the modern vehicles we see on our roads today. They actively regulate the actuation of individual components and thus determine the characteristics of the whole system. The behavior of the control functions depends heavily on their calibration parameters, which engineers traditionally design by hand. This is taking place in an environment of rising customer expectations and steadily shorter product development cycles. At the same time, legislative requirements are increasing while emission standards are getting stricter. Considering the number of vehicle variants on top of all that, the conventional method is losing its practical and financial viability. Prior work has already demonstrated that optimal control functions can be automatically developed with reinforcement learning (RL); since the resulting functions are represented by artificial neural networks, they lack explainability, a circumstance which renders them challenging to employ in production vehicles. In this article, we present an explainable approach to automating the calibration process using residual RL which follows established automotive development principles. Its applicability is demonstrated by means of a map-based air path controller in a series control unit using a hardware-in-the-loop (HiL) platform. Starting with a sub-optimal map, the proposed methodology quickly converges to a calibration which closely resembles the reference in the series ECU. The results prove that the approach is suitable for the industry, where it leads to better calibrations in significantly less time and requires virtually no human intervention.

2604.07058 2026-04-09 cs.FL

The Quadratic State Cost of Classical Simulation of One-Way Quantum Finite Automata

Zeyu Chen, Junde Wu

Abstract

Generalized finite automata (GFAs), probabilistic finite automata (PFAs), and one-way general quantum finite automata (1gQFA) recognize the same strict-cutpoint languages, but the state complexity of exact probabilistic simulation has remained unclear. This paper determines that worst-case cost exactly: every \(n\)-state 1gQFA admits exact strict-cutpoint simulation by a one-way PFA with \(O(n^2)\) states, via the standard \(n^2\)-dimensional mixed-state linearization together with an explicit alphabet-preserving construction that converts each \(k\)-state GFA into a one-way PFA with at most \(2k+6\) states; conversely, for every \(n\ge 2\), there exists an \(n\)-state 1gQFA for which every equivalent one-way PFA requires at least \(n^2-1\) states, obtained from a prepare-test construction and a Vapnik-Chervonenkis dimension argument. Hence the worst-case probabilistic state cost of exact strict-cutpoint simulation is \(\Theta(n^2)\).

2604.07057 2026-04-09 cs.CL

IndoBERT-Sentiment: Context-Conditioned Sentiment Classification for Indonesian Text

Muhammad Apriandito Arya Saputra, Andry Alamsyah, Dian Puteri Ramadhani, Thomhert Suprapto Siadari, Hanif Fakhrurroja

Comments 8 pages, 5 tables, and 2 figures

Abstract

Existing Indonesian sentiment analysis models classify text in isolation, ignoring the topical context that often determines whether a statement is positive, negative, or neutral. We introduce IndoBERT-Sentiment, a context-conditioned sentiment classifier that takes both a topical context and a text as input, producing sentiment predictions grounded in the topic being discussed. Built on IndoBERT Large (335M parameters) and trained on 31,360 context-text pairs labeled across 188 topics, the model achieves an F1 macro of 0.856 and accuracy of 88.1%. In a head-to-head evaluation against three widely used general-purpose Indonesian sentiment models on the same test set, IndoBERT-Sentiment outperforms the best baseline by 35.6 F1 points. We show that context-conditioning, previously demonstrated for relevancy classification, transfers effectively to sentiment analysis and enables the model to correctly classify texts that are systematically misclassified by context-free approaches.

2604.07051 2026-04-09 eess.SY cs.SY

Trajectory-Based Nonlinear Indices for Real-Time Monitoring and Quantification of Short-Term Voltage Stability

Mohammad Almomani, Muhammad Sarwar, Venkataramana Ajjarapu

Abstract

Existing short-term voltage stability (STVS) methods typically address either voltage oscillations or delayed voltage recovery; however, the coexistence of both phenomena has not been adequately covered in the literature. Moreover, existing real-time STVS assessment methods often provide only binary stability classifications. This paper proposes novel indices that enable early detection and quantify the degree of stability. The proposed method decomposes post-fault voltage trajectories using Empirical Mode Decomposition (EMD) into residual and oscillatory components. It then employs Lyapunov Exponents (LEs) to characterize the dynamic behavior of each component and evaluates the stability degree using Kullback-Leibler (KL) divergence by comparing the LEs of each component with those of a predefined critical signal. The proposed indices assess oscillatory stability significantly faster than the traditional LE method applied directly to the original signal. Specifically, they detect stability within 0.6 seconds after a fault, compared to approximately 10 seconds for the conventional LE approach. In addition, the delayed-recovery index can identify generator trips caused by over-excitation limits within 3 seconds, well before the actual trip occurs at approximately 20 seconds, thereby providing operators and controllers sufficient time to take preventive actions. Furthermore, thresholds are derived to distinguish between stable and unstable cases, offering a graded measure of the stability margin. Simulation studies on the Nordic test system under varying load conditions demonstrate the effectiveness of the proposed indices.

2604.07041 2026-04-09 cs.DB cs.AI cs.ET cs.HC cs.IR

AV-SQL: Decomposing Complex Text-to-SQL Queries with Agentic Views

Minh Tam Pham, Trinh Pham, Tong Chen, Hongzhi Yin, Quoc Viet Hung Nguyen, Thanh Tam Nguyen

Abstract

Text-to-SQL is the task of translating natural language queries into executable SQL for a given database, enabling non-expert users to access structured data without writing SQL manually. Despite rapid advances driven by large language models (LLMs), existing approaches still struggle with complex queries in real-world settings, where database schemas are large and questions require multi-step reasoning over many interrelated tables. In such cases, providing the full schema often exceeds the context window, while one-shot generation frequently produces non-executable SQL due to syntax errors and incorrect schema linking. To address these challenges, we introduce AV-SQL, a framework that decomposes complex Text-to-SQL into a pipeline of specialized LLM agents. Central to AV-SQL is the concept of agentic views: agent-generated Common Table Expressions (CTEs) that encapsulate intermediate query logic and filter relevant schema elements from large schemas. AV-SQL operates in three stages: (1) a rewriter agent compresses and clarifies the input query; (2) a view generator agent processes schema chunks to produce agentic views; and (3) a planner, generator, and revisor agent collaboratively compose these views into the final SQL query. Extensive experiments show that AV-SQL achieves 70.38% execution accuracy on the challenging Spider 2.0 benchmark, outperforming state-of-the-art baselines, while remaining competitive on standard datasets with 85.59% on Spider, 72.16% on BIRD and 63.78% on KaggleDBQA. Our source code is available at https://github.com/pminhtam/AV-SQL.
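The idea of composing a final query from Common Table Expressions (CTEs) that encapsulate intermediate logic can be shown with a small, self-contained example. This is a sketch only: the `customers` and `orders` tables, the `customer_totals` view, and the question being answered are all hypothetical, and the SQL is written by hand here rather than produced by any agent pipeline.

```python
import sqlite3


def run_cte_example():
    """Answer 'which customers spent more than 10 in total?' via an intermediate CTE."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo');
        INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 20.0), (3, 2, 5.0);
        """
    )
    query = """
        -- Intermediate view: keeps only the columns needed downstream.
        WITH customer_totals AS (
            SELECT customer_id, SUM(amount) AS total
            FROM orders
            GROUP BY customer_id
        )
        -- Final query composed on top of the view.
        SELECT c.name, t.total
        FROM customers AS c
        JOIN customer_totals AS t ON t.customer_id = c.id
        WHERE t.total > 10
        ORDER BY t.total DESC
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

Each CTE can be executed and validated on its own before composition, which is one plausible reason a view-based decomposition helps with large schemas.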

2604.07038 2026-04-09 cs.RO q-bio.NC

Exploring the proprioceptive potential of joint receptors using a biomimetic robotic joint

Akihiro Miki, Shun Hasegawa, Sota Yuzaki, Yuta Sahara, Yoshimoto Ribayashi, Kento Kawaharazuka, Kei Okada

Comments 26 pages including supplementary materials (17 pages main text), 6 main figures and 7 supplementary figures. Published in Scientific Reports

Journal ref Scientific Reports, 16, Article number: 4724 (2026)

Abstract

In neuroscience, joint receptors have traditionally been viewed as limit detectors, providing positional information only at extreme joint angles, while muscle spindles are considered the primary sensors of joint angle position. However, joint receptors are widely distributed throughout the joint capsule, and their full role in proprioception remains unclear. In this study, we specifically focused on mimicking Type I joint receptors, which respond to slow and sustained movements, and quantified their proprioceptive potential using a biomimetic joint developed with robotics technology. Results showed that Type I-like joint receptors alone enabled proprioceptive sensing with an average error of less than 2 degrees in both bending and twisting motions. These findings suggest that joint receptors may play a greater role in proprioception than previously recognized and that the relative contributions of muscle spindles and joint receptors are differentially weighted within neural networks during development and evolution. Furthermore, this work may prompt new discussions on the differential proprioceptive deficits observed between the elbows and knees in patients with hereditary sensory and autonomic neuropathy type III. Together, these findings highlight the potential of biomimetics-based robotic approaches for advancing interdisciplinary research bridging neuroscience, medicine, and robotics.

2604.07037 2026-04-09 hep-ex cs.CV

Towards foundation-style models for energy-frontier heterogeneous neutrino detectors via self-supervised pre-training

Saúl Alonso-Monsalve, Fabio Cufino, Umut Kose, Anna Mascellani, André Rubbia

Comments 18 pages, 6 figures

Abstract

Accelerator-based neutrino physics is entering an energy-frontier regime in which interactions reach the TeV scale and produce exceptionally dense, overlapping detector signatures. In this regime, event interpretation becomes impractical for conventional reconstruction approaches, particularly when labelled data are scarce and the analysis spans diverse downstream objectives. We present a sparse ViT framework for learning reusable representations from heterogeneous detector data. Self-supervised pre-training combines masked autoencoder reconstruction with relational voxel-level objectives for hierarchy, ghost and particle identification, and the resulting shared encoder is then jointly fine-tuned across classification and regression tasks. Evaluated on simulated events from the proposed FASERCal concept at the LHC, we find that pre-training consistently improves neutrino flavour and charm-quark identification, momentum regression, and vertex reconstruction over training from scratch, with the addition of relational objectives yielding further gains in the most topologically complex channels. Interpretability analyses further show that pre-training yields a more structured latent space, while detector-subsystem ablations recover physically plausible channel-dependent roles for the heterogeneous inputs. A data-efficiency study shows that, with roughly $10^3$ labelled events, the pre-trained encoder already matches the flavour-classification performance of a randomly initialised model trained on an order of magnitude more data. The learned representations also transfer effectively to publicly available benchmarks spanning different detector technologies and energy scales, matching or exceeding published baselines. These results support self-supervised pre-training on multimodal detector data as a scalable route towards reusable representations for neutrino and particle-detector analysis.

2604.07036 2026-04-09 cs.CL cs.LG cs.MA

ReDAct: Uncertainty-Aware Deferral for LLM Agents

Dzianis Piatrashyn, Nikita Kotelevskii, Kirill Grishchenkov, Nikita Glazkov, Ivan Nasonov, Ilya Makarov, Timothy Baldwin, Preslav Nakov, Roman Vashurin, Maxim Panov

Abstract

Recently, LLM-based agents have become increasingly popular across many applications, including complex sequential decision-making problems. However, they inherit the tendency of LLMs to hallucinate, leading to incorrect decisions. In sequential settings, even a single mistake can irreversibly degrade the trajectory, making hallucinations an even bigger problem. Although larger LLMs hallucinate less, they incur a significantly higher per-token cost. In this paper, we address this tradeoff by proposing ReDAct (Reason-Defer-Act). In ReDAct, an agent is equipped with two LLMs: a small, cheap model used by default, and a large, more reliable but expensive model. When the predictive uncertainty of the small model exceeds a calibrated threshold, the decision is deferred to the large model. We evaluate our approach in text-based embodied environments such as ALFWorld and MiniGrid and show that deferring only about 15% of decisions to the large model can match the quality of using it exclusively, while significantly reducing inference costs.
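The deferral rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say which uncertainty measure ReDAct uses, so Shannon entropy over the small model's action distribution is a stand-in, and `small_policy`, `large_policy`, and `threshold` are hypothetical names.

```python
import math
from typing import Callable, Dict, Tuple


def entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution over candidate actions."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0.0)


def choose_action(
    observation: str,
    small_policy: Callable[[str], Dict[str, float]],  # hypothetical: action -> probability
    large_policy: Callable[[str], str],               # hypothetical: returns one action
    threshold: float,                                 # assumed to be calibrated offline
) -> Tuple[str, bool]:
    """Return (action, deferred). Defer when the small model is too uncertain."""
    probs = small_policy(observation)
    if entropy(probs) > threshold:
        # Uncertainty exceeds the threshold: hand the decision to the large model.
        return large_policy(observation), True
    # Otherwise act on the small model's most probable action.
    return max(probs, key=probs.get), False


# Toy stand-ins for the two models.
confident = lambda obs: {"open door": 0.9, "go north": 0.1}
unsure = lambda obs: {"open door": 0.5, "go north": 0.5}
big = lambda obs: "go north"

print(choose_action("room", confident, big, threshold=0.5))
print(choose_action("room", unsure, big, threshold=0.5))
```

Raising `threshold` lowers the fraction of deferred decisions and therefore the cost, at the risk of acting on more uncertain small-model choices.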

2604.07034 2026-04-09 cs.RO cs.AI cs.CV

KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis

Mehdi Hosseinzadeh, King Hang Wong, Feras Dayoub

Comments ICRA 2026; Project page: https://m80hz.github.io/kite/

Abstract

We present KITE, a training-free, keyframe-anchored, layout-grounded front-end that converts long robot-execution videos into compact, interpretable tokenized evidence for vision-language models (VLMs). KITE distills each trajectory into a small set of motion-salient keyframes with open-vocabulary detections and pairs each keyframe with a schematic bird's-eye-view (BEV) representation that encodes relative object layout, axes, timestamps, and detection confidence. These visual cues are serialized with robot-profile and scene-context tokens into a unified prompt, allowing the same front-end to support failure detection, identification, localization, explanation, and correction with an off-the-shelf VLM. On the RoboFAC benchmark, KITE with Qwen2.5-VL substantially improves over vanilla Qwen2.5-VL in the training-free setting, with especially large gains on simulation failure detection, identification, and localization, while remaining competitive with a RoboFAC-tuned baseline. A small QLoRA fine-tune further improves explanation and correction quality. We also report qualitative results on real dual-arm robots, demonstrating the practical applicability of KITE as a structured and interpretable front-end for robot failure analysis. Code and models are released on our project page: https://m80hz.github.io/kite/

2604.07030 2026-04-09 cs.LG

MoE Routing Testbed: Studying Expert Specialization and Routing Behavior at Small Scale

Tobias Falke, Nicolas Anastassacos, Samson Tan, Chankrisna Richy Meas, Chandana Satya Prakash, Nitesh Sekhar, M Saiful Bari, Krishna Kompella, Gamaleldin F. Elsayed

Abstract

Sparse Mixture-of-Experts (MoE) architectures are increasingly popular for frontier large language models (LLMs), but they introduce training challenges due to routing complexity. Fully leveraging the parameters of an MoE model requires all experts to be well-trained and to specialize in non-redundant ways. Assessing this, however, is complicated by the lack of established metrics and, importantly, by the fact that many routing techniques exhibit similar performance at smaller sizes, which is often not reflective of their behavior at large scale. To address this challenge, we propose the MoE Routing Testbed, a setup that gives clearer visibility into routing dynamics at small scale while using realistic data. The testbed pairs a data mix with clearly distinguishable domains with a reference router that prescribes ideal routing based on these domains, providing a well-defined upper bound for comparison. This enables quantifiable measurement of expert specialization. To demonstrate the value of the testbed, we compare various MoE routing approaches and show that balancing scope is the crucial factor that allows specialization while maintaining high expert utilization. We confirm that this observation generalizes to models 35x larger.

2604.07029 2026-04-09 physics.soc-ph cs.CY

Quality assessment of a country-wide bicycle node network with loop census analysis

Michael Szell, Anastassia Vybornova, Ane Rahbek Vierø

Comments Main text: 12 pages, 6 figures. SI: 10 pages, 8 figures

Abstract

Bicycle node networks are regional bicycle networks equipped with a wayfinding system of numbered nodes to ease recreational cycling. They spur sustainable bicycle tourism, economic spending, and local culture. Due to their country-wide scale, implementing bicycle node networks is a considerable effort and investment. Despite this investment, planning is a manual ad-hoc process that follows general design principles, but without clear performance metrics that account for the human cycling experience. Here we analyze a 28,215 km long bicycle node network spanning Denmark, developing and studying such metrics. First, a spatial analysis of geometric and topological properties reveals high heterogeneity and local clusters of node density, face loop lengths, gradients, and feature-rich areas. Next, taking the perspective of a recreational cyclist starting at any node on the network, we create a loop census that lists all loops in the network up to day-trip length. The loop census identifies the feasible points on the network from which to take a day trip and quantifies the number of round trip choices, unveiling different levels of choice depending on the considered demographic group. While long-range cyclists can access most of the country with often overabundant choices, cyclists with stronger length and gradient limitations, such as families with small children, cannot; e-bikes could overcome this limitation. Our open-source analysis methods provide data-driven decision support for bicycle node network planning with the potential to boost the development of rural cycling and cycling tourism.

2604.07027 2026-04-09 cs.LG

Learning to Query History: Nonstationary Classification via Learned Retrieval

Jimmy Gammell, Bishal Thapaliya, Yoon Jung, Riyasat Ohib, Bilel Fehri, Deepayan Chakrabarti

Comments Accepted to ICLR 2026 Workshop on Time Series in the Age of Large Models (TSALM). 12 pages, 6 figures

Abstract

Nonstationarity is ubiquitous in practical classification settings, leading deployed models to perform poorly even when they generalize well to holdout sets available at training time. We address this by reframing nonstationary classification as time series prediction: rather than predicting from the current input alone, we condition the classifier on a sequence of historical labeled examples that extends beyond the training cutoff. To scale to large sequences, we introduce a learned discrete retrieval mechanism that samples relevant historical examples via input-dependent queries, trained end-to-end with the classifier using a score-based gradient estimator. This enables the full corpus of historical data to remain on an arbitrary filesystem during training and deployment. Experiments on synthetic benchmarks and Amazon Reviews '23 (electronics category) show improved robustness to distribution shift compared to standard classifiers, with VRAM scaling predictably as the length of the historical data sequence increases.

2604.07026 2026-04-09 cs.CV

Not all tokens contribute equally to diffusion learning

Guoqing Zhang, Lu Shi, Wanru Xu, Linna Zhang, Sen Wang, Fangfang Wang, Yigang Cen

Abstract

With the rapid development of conditional diffusion models, significant progress has been made in text-to-video generation. However, we observe that these models often neglect semantically important tokens during inference, leading to biased or incomplete generations under classifier-free guidance. We attribute this issue to two key factors: distributional bias caused by the long-tailed token frequency in training data, and spatial misalignment in cross-attention where semantically important tokens are overshadowed by less informative ones. To address these issues, we propose Distribution-Aware Rectification and Spatial Ensemble (DARE), a unified framework that improves semantic guidance in diffusion models from the perspectives of distributional debiasing and spatial consistency. First, we introduce Distribution-Rectified Classifier-Free Guidance (DR-CFG), which regularizes the training process by dynamically suppressing dominant tokens with low semantic density, encouraging the model to better capture underrepresented semantic cues and learn a more balanced conditional distribution. This design mitigates the risk of the model distribution overfitting to tokens with low semantic density. Second, we propose Spatial Representation Alignment (SRA), which adaptively reweights cross-attention maps according to token importance and enforces representation consistency, enabling semantically important tokens to exert stronger spatial guidance during generation. This mechanism effectively prevents low semantic-density tokens from dominating the attention allocation, thereby avoiding the dilution of the spatial and distributional guidance provided by high semantic-density tokens. Extensive experiments on multiple benchmark datasets demonstrate that DARE consistently improves generation fidelity and semantic alignment, achieving significant gains over existing approaches.

2604.07025 2026-04-09 math.DS cs.LG cs.NA math.NA

Physics-Informed Functional Link Constrained Framework with Domain Mapping for Solving Bending Analysis of an Exponentially Loaded Perforated Beam

Iswari Sahu, Ramanath Garai, S. Chakraverty

Abstract

This article presents a novel and comprehensive approach for analyzing the bending behavior of a tapered perforated beam under an exponential load. The governing differential equation includes important factors such as the filling ratio ($α$), the number of rows of holes ($N$), the tapering parameters ($ϕ$ and $ψ$), and the exponential loading parameter ($γ$), providing a realistic and flexible representation of the perforated beam configuration. The main goal of this work is to assess how well the Domain mapped physics-informed Functional link Theory of Functional Connections (DFL-TFC) method analyses the bending response of a perforated beam with square holes under exponential loading. For comparison purposes, a corresponding PINN-based formulation is developed. The outcomes show that the proposed DFL-TFC framework gives better results than the PINN approach, including faster convergence, reduced computational cost, and improved solution accuracy. These findings highlight the effectiveness and potential of the DFL-TFC method for solving complex engineering problems governed by differential equations. Within this framework, the hidden layer is replaced by a functional expansion block that enriches the input representation via orthogonal polynomial basis functions, and the domain of the differential equation (DE) is mapped to the corresponding domain of the orthogonal polynomials. A Constrained Expression (CE), constructed through the Theory of Functional Connections (TFC) using the boundary conditions, ensures that the constraints are exactly satisfied. In the CE, the free function is represented using a Functional Link Neural Network (FLNN), which learns to solve the resulting unconstrained optimization problem. The obtained results are further validated against the Galerkin and PINN solutions.

2604.07023 2026-04-09 cs.CL

MARS: Enabling Autoregressive Models Multi-Token Generation

Ziqi Jin, Lei Wang, Ziwei Luo, Aixin Sun

Comments 15 pages, 4 figures

Abstract

Autoregressive (AR) language models generate text one token at a time, even when consecutive tokens are highly predictable given earlier context. We introduce MARS (Mask AutoRegreSsion), a lightweight fine-tuning method that teaches an instruction-tuned AR model to predict multiple tokens per forward pass. MARS adds no architectural modifications and no extra parameters, and it produces a single model that can still be called exactly like the original AR model with no performance degradation. Unlike speculative decoding, which maintains a separate draft model alongside the target, or multi-head approaches such as Medusa, which attach additional prediction heads, MARS requires only continued training on existing instruction data. When generating one token per forward pass, MARS matches or exceeds the AR baseline on six standard benchmarks. When allowed to accept multiple tokens per step, it maintains baseline-level accuracy while achieving 1.5-1.7x throughput. We further develop a block-level KV caching strategy for batch inference, achieving up to 1.71x wall-clock speedup over AR with KV cache on Qwen2.5-7B. Finally, MARS supports real-time speed adjustment via confidence thresholding: under high request load, the serving system can increase throughput on the fly without swapping models or restarting, providing a practical latency-quality knob for deployment.
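One way to picture the confidence-threshold knob is the sketch below. The abstract does not specify the acceptance rule, so the rule used here is an assumption: the first predicted token is always kept, and later tokens in the same forward pass are kept only while their confidence stays at or above the threshold. The function name and inputs are hypothetical.

```python
from typing import List, Sequence


def accept_prefix(
    tokens: Sequence[str],
    confidences: Sequence[float],
    threshold: float,
) -> List[str]:
    """Keep the first token, then extend while confidence >= threshold.

    Assumed rule for illustration; not taken from the MARS paper.
    """
    if not tokens:
        return []
    accepted = [tokens[0]]  # one token per step is always produced
    for tok, conf in zip(tokens[1:], confidences[1:]):
        if conf < threshold:
            break  # stop at the first low-confidence prediction
        accepted.append(tok)
    return accepted


step_tokens = ["The", " cat", " sat", " down"]
step_conf = [0.40, 0.95, 0.60, 0.99]

strict = accept_prefix(step_tokens, step_conf, threshold=0.9)
loose = accept_prefix(step_tokens, step_conf, threshold=0.5)
print(strict, loose)
```

Under this assumed rule, lowering the threshold accepts more tokens per forward pass (higher throughput), while raising it falls back toward one-token-at-a-time decoding.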