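arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.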

2605.00206 2026-05-04 cs.LG cs.CL

State Stream Transformer (SST) V2: Parallel Training of Nonlinear Recurrence for Latent Space Reasoning

Thea Aviss

Comments 48 pages, 21 figures

Abstract

Current transformers discard their rich latent residual stream between positions, reconstructing latent reasoning context at each new position and leaving potential reasoning capacity untapped. The State Stream Transformer (SST) V2 enables parameter-efficient reasoning in continuous latent space through an FFN-driven nonlinear recurrence at each decoder layer, where latent states are streamed horizontally across the full sequence via a learned blend. This same mechanism supports continuous latent deliberation per position at inference time, dedicating additional FLOPs to exploring abstract reasoning before committing to a token. A two-pass parallel training procedure resolves the sequential dependency of the recurrence to allow compute-efficient training. Hidden state analysis shows the state stream facilitates reasoning through exploration of distinct semantic basins in continuous latent space, where transitions at content-dependent positions move the model into a substantially different Bayesian posterior, directly influencing the latent space at future positions. We also find, via a learned probe, that at the first generated token position, the latent state already predicts whether the eventual answer will survive or break under additional latent computation for every subsequent position. Co-trained into an existing 27B backbone using only a small dataset of GSM8K examples, the SST delivers a +15.15 point gain over a fine-tuning-matched baseline on out-of-distribution GPQA-Diamond and cuts that same baseline's remaining GSM8K errors by 46%, together showing that the reasoning improvement is attributable to the architectural mechanism rather than scale or training data. On GPQA-Diamond, the resulting 27B SST also achieves higher accuracy than several larger open-weight and proprietary systems, including open-weight models up to 25 times larger.
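
As a reading aid for the "cuts that same baseline's remaining GSM8K errors by 46%" figure: a relative error reduction is measured against the baseline's error rate, not its accuracy. The sketch below only shows that arithmetic. The function names and the 90% baseline accuracy are hypothetical illustrations, not values taken from the paper.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors that the new model eliminates."""
    baseline_err = 1.0 - baseline_acc
    if baseline_err <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def accuracy_after_reduction(baseline_acc: float, reduction: float) -> float:
    """Accuracy reached when a given fraction of baseline errors is removed."""
    return 1.0 - (1.0 - baseline_acc) * (1.0 - reduction)


# Hypothetical baseline accuracy (an assumption for illustration only).
hypothetical_baseline = 0.90
improved = accuracy_after_reduction(hypothetical_baseline, 0.46)
print(f"accuracy after a 46% error cut: {improved:.3f}")
```

Under this assumed baseline, a 46% cut in errors corresponds to a gain of only a few accuracy points, which is why the abstract reports the change in terms of errors.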

2605.00193 2026-05-04 cs.LG stat.ML

OTSS: Output-Targeted Soft Segmentation for Contextual Decision-Weight Learning

Renjun Hu, Hyun-Soo Ahn

Comments 23 pages, 2 figures

2605.00184 2026-05-04 cs.LG cs.HC

Introducing WARM-VR: Benchmark Dataset for Multimodal Wearable Affect Recognition in Virtual Reality

Karim Alghoul, Faisal Mohd, Fedwa Laamarti, Hussein Al Osman, Abdulmotaleb El Saddik

Abstract

With the growing integration of human-computer interaction into everyday life, advances in machine learning have enabled systems to better perceive and respond to users' emotional states. Most existing affect recognition datasets focus on static environments, limiting their applicability to immersive multimedia contexts such as Virtual Reality (VR). In this paper, we introduce WARM-VR, a novel publicly available multimodal dataset designed to support affect recognition in immersive, multisensory environments using wearable sensing instrumentation. Data were collected from 31 participants aged 19-37 using wearable sensors: a wristband measuring blood volume pulse (BVP), electrodermal activity (EDA), skin temperature, and three-axis acceleration, and a chest strap recording ECG signals. Participants engaged in immersive VR experiences designed to elicit relaxation through a calming beach environment following stress induction via an arithmetic task. These sessions incorporated synchronized multimedia stimuli: visual, auditory, and olfactory. Affective states were assessed subjectively through validated self-report questionnaires and objectively through the analysis of physiological measurements. Statistical analysis of the questionnaires confirmed that VR relaxation significantly reduced negative affect, particularly with olfactory enhancement. Furthermore, we established a benchmark on the dataset using widely recognized machine learning algorithms. For binary classification of valence from BVP data, the best performance was obtained with a CNN and a CNN-Bi-GRU model, both achieving an average F1-score of 0.63 and an AUC of 0.69. For arousal, a lightweight Transformer architecture provided the most balanced results (F1-0 0.54 and F1-1 0.63), outperforming recurrent hybrids. In the relaxation task, a CNN-Bi-GRU model reached the highest overall performance (average F1-score 0.64, AUC 0.69).
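
The abstract reports both per-class F1 (written F1-0 and F1-1) and an average F1-score. A minimal sketch of how such figures are commonly computed for a binary task follows. It assumes the average is an unweighted (macro) mean of the two per-class scores, which the abstract does not state, and the labels below are made-up toy data rather than anything from WARM-VR.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score when `positive` is treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores (assumed averaging rule)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels, invented for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

print("F1-0:", round(f1_for_class(y_true, y_pred, 0), 4))
print("F1-1:", round(f1_for_class(y_true, y_pred, 1), 4))
print("macro F1:", round(macro_f1(y_true, y_pred), 4))
```

Reporting the two per-class scores side by side, as the abstract does for arousal, makes it visible when a model favors one class even if the averaged score looks acceptable.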

2605.00159 2026-05-04 cs.RO

E$^2$DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation

Kaiyan Zhao, Borong Zhang, Yiming Wang, Xingyu Liu, Xuetao Li, Yuyang Chen, Xiaoguang Niu

Comments Accepted at ICRA 2026

2605.00147 2026-05-04 cs.CV

From Images2Mesh: A 3D Surface Reconstruction Pipeline for Non-Cooperative Space Objects

Bala Prenith Reddy Gopu, Patrick Quinn, George M. Nehma, Madhur Tiwari, Matt Ueckermann, David Hinckley, Christopher McKenna

Comments 25 Pages, 16 Figures

2605.00146 2026-05-04 cs.CV

Real-Time Frame- and Event-based Object Detection with Spiking Neural Networks on Edge Neuromorphic Hardware: Design, Deployment and Benchmark

Udayanga G. W. K. N. Gamage, Yan Zeng, Cesar Cadena, Matteo Fumagalli, Silvia Tolu

Comments 11 figures, 7 tables, 53 pages

2605.00143 2026-05-04 cs.CL

Timing is Everything: Temporal Scaffolding of Semantic Surprise in Humor

Yuxi Ma, Yongqian Peng, Junchen Lyu, Chi Zhang, Yixin Zhu

Comments to be published in CogSci 2026

2605.00140 2026-05-04 cs.LG cs.CL cs.CV

Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization

YiFeng Wang, Zhun Sun, Keisuke Sakaguchi

2605.00136 2026-05-04 cs.AI

Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents

Kaituo Zhang, Zhen Xiong, Mingyu Zhong, Zhimeng Jiang, Zhouyuan Yuan, Zhecheng Li, Ying Lin

2605.00133 2026-05-04 cs.LG cs.AI cs.ET

Smart Profit-Aware Crop Advisory System: Kisan AI

Debasis Dwibedy, Avyay Nishtala, Pranathi Mukku, D Snehaja

Comments 13 pages, 8 figures, 5 tables

2605.00130 2026-05-04 cs.LG

Learning Fingerprints for Medical Time Series with Redundancy-Constrained Information Maximization

Huayu Li, ZhengXiao He, Xiwen Chen, Jingjing Wang, Siyuan Tian, Jinghao Wen, Ao Li

2605.00126 2026-05-04 cs.LG eess.SP stat.ML

SPLICE: Latent Diffusion over JEPA Embeddings for Conformal Time-Series Inpainting

Arnaud Zinflou

2605.00123 2026-05-04 cs.AI

Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models

Shubham Kumar, Narendra Ahuja

Comments Pre-print

2605.00121 2026-05-04 cs.RO

Predictive Spatio-Temporal Scene Graphs for Semi-Static Scenes

Miguel Saavedra-Ruiz, Charlie Gauthier, Kumaraditya Gupta, Shima Shahfar, Kirsty Ellis, Steven Parkison, Liam Paull

2605.00120 2026-05-04 cs.CV cs.CR cs.LG

GAFSV-Net: A Vision Framework for Online Signature Verification

Himanshu Singhal, Suresh Sundaram

2605.00119 2026-05-04 cs.CL cs.AI

Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues

Muhammad Dehan Al Kautsar, Saeed Almheiri, Momina Ahsan, Bilal Elbouardi, Younes Samih, Sarfraz Ahmad, Amr Keleg, Omar El Herraoui, Kareem Elzeky, Abed Alhakim Freihat, Mohamed Anwar, Zhuohan Xie, Junhong Liang, Mohammad Rustom Al Nasar, Preslav Nakov, Fajri Koto

Comments 23 pages, 7 figures, 16 tables

2605.00116 2026-05-04 cs.CL cs.AI cs.LG

ViLegalNLI: Natural Language Inference for Vietnamese Legal Texts

Nhung Thi-Hong Duong, Mai Ngoc Ho, Tin Van Huynh, Kiet Van Nguyen

Comments 33 pages, 17 figures

Abstract

In this article, we introduce ViLegalNLI, the first large-scale Vietnamese Natural Language Inference (NLI) dataset specifically constructed for the legal domain. The dataset consists of 42,012 premise-hypothesis pairs derived from official statutory documents and annotated with binary inference labels (Entailment and Non-entailment). It covers multiple legal domains and reflects realistic legal reasoning scenarios characterized by structured logic, conditional clauses, and domain-specific terminology. To construct ViLegalNLI, we propose a semi-automatic data generation framework that integrates large language models for controlled hypothesis generation and systematic quality validation procedures. The framework incorporates artifact mitigation strategies and cross-model validation to improve annotation reliability and ensure legal consistency. The resulting dataset captures diverse reasoning patterns, including paraphrasing, logical implication, and legally invalid inferences, thereby providing a comprehensive benchmark for Vietnamese legal inference tasks. We conduct extensive experiments on ViLegalNLI using multilingual models, Vietnamese-specific pretrained language models, and instruction-tuned large language models. The results show that few-shot LLM configurations consistently achieve superior performance, while performance is significantly influenced by hypothesis length, lexical overlap, and reasoning complexity. Cross-domain evaluations further reveal the challenges of generalizing legal inference across distinct legal fields. Overall, ViLegalNLI establishes a foundational benchmark for Vietnamese legal NLI and supports future research in legal reasoning, statutory text understanding, and the development of reliable AI systems for legal analysis and decision support. The dataset is publicly available for research purposes.

2605.00113 2026-05-04 cs.CL cs.AI cs.HC

How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responses

Ishan Gupta, Pavlo Buryi

Comments 15 pages, 3 figures, 2 tables. Benchmark, code, and data available at https://github.com/ishansgupta/ndbench

2605.00111 2026-05-04 cs.CV cs.AI

AIDA-ReID: Adaptive Intermediate Domain Adaptation for Generalizable and Source-Free Person Re-Identification

Sundas Iqbal, Qing Tian, Danish Ali, Jianping Gou, Weihua Oue

2605.00086 2026-05-04 cs.CL cs.AI

NorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpus

Enzo S. N. Silva, Pablo B. Costa, Raphael C. Vlasman, Rosimeire P. Costa, Henrique L. P. Silva, Lucas F. A. O. Pellicer, Guilherme Rinaldo, Renato A. Almeida, Darian S. R. Rabbani, Cinthya O. Oestreich, Vinicius F. Caridá

Comments 11 pages, 9 tables, 2 figures. This article has been formally submitted, reviewed, accepted, and published in the proceedings of PROPOR 2026: Proceedings of the 17th International Conference on Computational Processing of Portuguese, Vol. 1. The published version is available in the ACL Anthology at https://aclanthology.org/2026.propor-1.18/

2605.00083 2026-05-04 cs.LG

Comparative Analysis of Polygon-Based and Global Machine Learning Models for Bus Occupancy Prediction

Daniel Azenkot, Michael Fire, Eran Ben Elia

2605.00082 2026-05-04 cs.LG cs.AI

Hyperspherical Forward-Forward with Prototypical Representations

Shalini Sarode, Brian Moser, Joachim Folz, Federico Raue, Tobias Nauen, Stanislav Frolov, Andreas Dengel

Comments 22 pages, 5 figures

2605.00080 2026-05-04 cs.RO cs.CV

World Model for Robot Learning: A Comprehensive Survey

Bohan Hou, Gen Li, Jindou Jia, Tuo An, Xinying Guo, Sicong Leng, Haoran Geng, Yanjie Ze, Tatsuya Harada, Philip Torr, Oier Mees, Marc Pollefeys, Zhuang Liu, Jiajun Wu, Pieter Abbeel, Jitendra Malik, Yilun Du, Jianfei Yang

Comments 43 pages, 6 figures

2605.00078 2026-05-04 cs.RO cs.CV cs.LG

Being-H0.7: A Latent World-Action Model from Egocentric Videos

Hao Luo, Wanpeng Zhang, Yicheng Feng, Sipeng Zheng, Haiweng Xu, Chaoyi Xu, Ziheng Xi, Yuhui Fu, Zongqing Lu

2605.00073 2026-05-04 cs.AI

AgentReputation: A Decentralized Agentic AI Reputation Framework

Mohd Sameen Chishti, Damilare Peter Oyinloye, Jingyue Li

Comments 5 pages, 1 figure, accepted to FSE 2026, Ideas, Visions and Reflections track

DOI: 10.1145/3803437.3805579

Abstract

Decentralized, agentic AI marketplaces are rapidly emerging to support software engineering tasks such as debugging, patch generation, and security auditing, often operating without centralized oversight. However, existing reputation mechanisms fail in this setting for three fundamental reasons: agents can strategically optimize against evaluation procedures; demonstrated competence does not reliably transfer across heterogeneous task contexts; and verification rigor varies widely, from lightweight automated checks to costly expert review. Current approaches to reputation drawing on federated learning, blockchain-based AI platforms, and large language model safety research are unable to address these challenges in combination. We therefore propose AgentReputation, a decentralized, three-layer reputation framework for agentic AI systems. The framework separates task execution, reputation services, and tamper-proof persistence to both leverage their respective strengths and enable independent evolution. The framework introduces explicit verification regimes linked to agent reputation metadata, as well as context-conditioned reputation cards that prevent reputation conflation across domains and task types. In addition, AgentReputation provides a decision-facing policy engine that supports resource allocation, access control, and adaptive verification escalation based on risk and uncertainty. Building on this framework, we outline several future research directions, including the development of verification ontologies, methods for quantifying verification strength, privacy-preserving evidence mechanisms, cold-start reputation bootstrapping, and defenses against adversarial manipulation.

2605.00070 2026-05-04 cs.LG

CRADIPOR: Crash Dispersion Predictor

Edgar Chaillou, Sebastian Rodriguez, Yves Tourbier, Francisco Chinesta

2605.00069 2026-05-04 cs.LG

Soft-MSM: Differentiable Context-Aware Elastic Alignment for Time Series

Christopher Holder, Anthony Bagnall

2605.00068 2026-05-04 cs.LG cs.AI physics.plasm-ph

Human-in-the-Loop Meta Bayesian Optimization for Fusion Energy and Scientific Applications

Ricardo Luna Gutierrez, Sahand Ghorbanpour, Ejaz Rahman, Varchas Gopalaswamy, Riccardo Betti, Vineet Gundecha, Aarne Lees, Soumyendu Sarkar

Comments Accepted at IJCAI 2026 (35th International Joint Conference on Artificial Intelligence)

2605.00066 2026-05-04 cs.RO

Do Open-Loop Metrics Predict Closed-Loop Driving? A Cross-Benchmark Correlation Study of NAVSIM and Bench2Drive

Yiru Wang, Anqing Jiang, Shuo Wang, Yuwen Heng, Hai Yang, Yang Chen, Hao Sun

Abstract

Open-loop evaluation offers fast, reproducible assessment of autonomous driving planners, but its ability to predict real closed-loop driving performance remains questionable. Prior work has shown that traditional open-loop metrics such as Average Displacement Error (ADE) and Final Displacement Error (FDE) exhibit no reliable correlation with closed-loop Driving Score. In this paper, we ask whether the more recent, safety-aware open-loop metrics introduced by NAVSIM v2 can bridge this gap. By systematically cross-referencing published results from 15 state-of-the-art methods across NAVSIM (open-loop) and Bench2Drive (closed-loop), we compile a paired dataset of open-loop sub-metrics and closed-loop performance, yielding 8 methods with complete paired data. Our analysis reveals three key findings: (1) the aggregate NAVSIM PDM Score shows a strong positive but non-monotonic correlation with Bench2Drive Driving Score, with clear ranking inversions; (2) among individual NAVSIM sub-metrics, Ego Progress (EP) is the strongest single predictor of closed-loop success, substantially exceeding the safety-critical collision metric NC; (3) the safety-progress trade-off manifests differently in open-loop and closed-loop: methods that maximize safety at the expense of progress rank highly in NAVSIM but underperform in closed-loop due to timeout and slow-driving penalties. We further demonstrate that a much simpler 3-metric formula matches the predictive power of the full 5-metric PDMS at the same Spearman ρ = 0.90 on our paired sample of n = 8 methods, suggesting that within current state-of-the-art methods, where TTC and Comfort approach saturation, these two sub-metrics add little marginal information for closed-loop ranking. Additionally, we identify the snowball effect, where small open-loop deviations compound into closed-loop failures, as a candidate mechanism for the residual gap.
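
To make the Spearman figure concrete, here is a minimal sketch of how a rank correlation over eight paired scores can be computed, and how a high value can coexist with ranking inversions. The score lists are invented placeholders, not results from NAVSIM or Bench2Drive, and the helper names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder scores for eight hypothetical methods (not real benchmark data).
open_loop = [80, 82, 84, 85, 86, 88, 89, 91]
closed_loop = [40, 45, 55, 50, 60, 62, 70, 66]  # two adjacent rank swaps

print(round(spearman_rho(open_loop, closed_loop), 4))
```

With only eight paired points, a couple of swapped neighbors still leaves the coefficient high, so a strong rank correlation on a sample this small does not rule out the inversions the abstract describes.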

2605.00064 2026-05-04 cs.LG

Information-Theoretic Generalization Bounds for Stochastic Gradient Descent with Predictable Virtual Noise

Mohammad Partohaghighi

Abstract

Information-theoretic generalization bounds analyze stochastic optimization by relating expected generalization error to the mutual information between learned parameters and training data. Virtual perturbation analyses of SGD add auxiliary Gaussian noise only in the proof, making mutual information tractable while leaving the actual SGD trajectory unchanged. Existing bounds, however, typically require perturbation covariances to be fixed independently of the optimization history, limiting their ability to represent geometries induced by moving gradient statistics, preconditioners, curvature proxies, and other pathwise information. We introduce predictable history-adaptive virtual perturbations, where the perturbation covariance at each iteration may depend on the past real SGD history but not on current or future randomness. This predictability enables a conditional Gaussian relative-entropy argument and yields generalization bounds for SGD with adaptive virtual-noise geometry. The bounds replace fixed sensitivity and gradient-deviation terms with conditional adaptive counterparts, include an output-sensitivity penalty from accumulated perturbation covariance, and reduce the deviation term to a conditional variance only under conditional unbiasedness. Since adaptive covariances may be data-dependent, we separate local Gaussian smoothing from global reference-kernel comparison. The resulting bound includes a covariance-comparison cost measuring the KL price of using an admissible reference geometry different from the actual adaptive covariance. Fixed-noise-style bounds are recovered under admissible synchronization, such as deterministic, public, or prefix-observable covariance rules. The framework recovers fixed isotropic and geometry-aware bounds as special cases while extending virtual perturbation analysis to history-dependent SGD without modifying the algorithm.
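
For orientation, two standard identities that this line of work builds on are sketched below. They are textbook results, not the new bounds of the paper, whose exact form the abstract does not give. The notation is assumed here for illustration: $W$ for the learned parameters, $S$ for a training set of $n$ samples, $\sigma$ for the sub-Gaussian parameter of the loss, and $\Sigma$, $\Sigma_{\mathrm{ref}}$ for an actual and a reference covariance in dimension $d$.

```latex
% Mutual-information generalization bound for a \sigma-sub-Gaussian loss
\left| \mathbb{E}\big[\mathrm{gen}(W, S)\big] \right|
  \le \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)}

% KL divergence between two Gaussians with equal mean and different covariances
\mathrm{KL}\big(\mathcal{N}(\mu, \Sigma) \,\big\|\, \mathcal{N}(\mu, \Sigma_{\mathrm{ref}})\big)
  = \frac{1}{2}\left[ \operatorname{tr}\!\big(\Sigma_{\mathrm{ref}}^{-1}\Sigma\big) - d
    + \ln\frac{\det \Sigma_{\mathrm{ref}}}{\det \Sigma} \right]
```

The second expression vanishes when $\Sigma = \Sigma_{\mathrm{ref}}$, which is consistent with the abstract's remark that fixed-noise-style bounds are recovered when the reference geometry matches the actual covariance.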
