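arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.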

2604.24758 2026-04-28 cs.HC cs.AI cs.CY cs.ET cs.LG

Personalized Worked Example Generation from Student Code Submissions using Pattern-based Knowledge Components

Griffin Pitts, Muntasir Hoq, Peter Brusilovsky, Narges Norouzi, Arto Hellas, Juho Leinonen, Bita Akram

Comments Accepted to the Thirteenth ACM Conference on Learning @ Scale (L@S 2026)

2604.24756 2026-04-28 cs.GT cs.DS

A Strongly Polynomial Algorithm for Arctic Auctions

Jugal Garg, Shayan Taherijam, Vijay V Vazirani

Comments 52 pages

2604.24749 2026-04-28 cs.LG stat.ML

The Optimal Sample Complexity of Multiclass and List Learning

Chirag Pabbaraju

2604.24745 2026-04-28 cs.LG

Conflict-Aware Harmonized Rotational Gradient for Multiscale Kinetic Regimes

Zhangyong Liang

2604.24737 2026-04-28 cs.LG cs.AI cs.CC stat.ML

Learning to Think from Multiple Thinkers

Nirmit Joshi, Roey Magen, Nathan Srebro, Nikolaos Tsilivis, Gal Vardi

Comments Comments are welcome. There are 78 pages and 5 Figures

2604.24732 2026-04-28 cs.GT

Distributional Robustness of Linear Contracts

Shiliang Zuo

2604.24731 2026-04-28 math.NA cs.NA

Error analysis for the approximation of a flow in deformable porous media with nonlinear strain-stress relation

Andrea Bonito, Vivette Girault, Diane Guignard

2604.24729 2026-04-28 cs.LG

SpecRLBench: A Benchmark for Generalization in Specification-Guided Reinforcement Learning

Zijian Guo, İlker Işık, H. M. Sabbir Ahmad, Wenchao Li

2604.24726 2026-04-28 cs.CE cs.SY eess.SY

VEHRON: A Configuration-Driven BEV Simulation Framework for Subsystem-Level Studies

Subramanyam Natarajan

Comments 12 pages, 3 figures, 5 tables; software paper

2604.24720 2026-04-28 cs.CL

Sentiment and Emotion Classification of Indonesian E-Commerce Reviews via Multi-Task BiLSTM and AutoML Benchmarking

Hermawan Manurung, Ibrahim Al-Kahfi, Ahmad Rizqi, Martin Clinton Tosima Manullang

Comments 8 pages, 5 figures, 4 tables. Final project for Natural Language Processing course (PBA 2026) at Institut Teknologi Sumatera

2604.24719 2026-04-28 cs.CV

DiffuSAM: Diffusion-Based Prompt-Free SAM2 for Few-Shot and Source-Free Medical Image Segmentation

Tal Grossman, Noa Cahan, Lev Ayzenberg, Hayit Greenspan

2604.24718 2026-04-28 cs.CV

WildLIFT: Lifting monocular drone video to 3D for species-agnostic wildlife monitoring

Vandita Shukla, Fabio Remondino, Blair Costelloe, Benjamin Risse

2604.24717 2026-04-28 cs.AI

Learning to Rotate: Temporal and Semantic Rotary Encoding for Sequential Modeling

Hailing Cheng, Daqi Sun, Xinyu Lu

Comments 8 pages, 3 figures

Abstract

Every Transformer architecture dedicates enormous capacity to learning rich representations in semantic embedding space -- yet the rotation manifold acted upon by Rotary Positional Embeddings (RoPE) has been treated as a fixed, hand-crafted structure, populated only by discrete ordinal indices. We argue that this rotation space is a largely overlooked second dimension of expressivity in the attention mechanism, one whose systematic exploration may open a new door for attention-based architectures. The analogy to complex numbers is instructive: just as introducing the imaginary axis -- orthogonal to and independent of the real line -- unlocked new algebraic structure once believed impossible, treating the rotation manifold as a learnable, signal-conditioned space opens an orthogonal degree of freedom in attention. In this framing, the token embedding encodes the semantic (real) component of a representation -- what a token means -- while the rotation encodes its dynamic (imaginary) component -- how it relates to every other token across time, position, and context. We introduce SIREN-RoPE, a concrete instantiation of this idea, which populates the rotation dimension with heterogeneous signals -- continuous timestamps, cyclical temporal patterns, and categorical metadata -- via a dual-branch Sinusoidal Representation Network (SIREN). As a proof of concept, we evaluate on a production-scale news feed dataset from a major social network using a generative recommender as the ranking model, demonstrating that activating this hidden dimension yields consistent improvements across calibration and ranking objectives with negligible computational overhead. We invite the community to view the rotation space not as a solved positional-encoding detail, but as an untapped axis whose rich structure may prove as consequential for attention as the imaginary unit proved for algebra.

2604.24715 2026-04-28 cs.CL cs.LG

Long-Context Aware Upcycling: A New Frontier for Hybrid LLM Scaling

Parsa Ashrafi Fashi, Utkarsh Saxena, Mehdi Rezagholizadeh, Aref Jafari, Akash Haridas, Mingyu Yang, Vansh Bhatia, Guihong Li, Vikram Appia, Emad Barsoum

2604.24712 2026-04-28 cs.SE

When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation

Amal AKLI, Mike PAPADAKIS, Maxime CORDY, Yves Le TRAON

Abstract

Large language models are increasingly used for code generation, yet the correctness of their outputs depends not only on model capability but also on how tasks are specified. Prior studies demonstrate that small changes in natural language prompts, particularly under-specification, can substantially reduce code correctness; however, these findings are largely based on minimal-specification benchmarks such as HumanEval and MBPP, where limited structural redundancy may exaggerate sensitivity. In this exploratory study, we investigate how prompt structure, task complexity, and specification richness interact with LLM robustness to prompt mutations. We evaluate 10 different models across HumanEval and the structurally richer LiveCodeBench. Our results reveal that robustness is not a fixed property of LLMs but is highly dependent on prompt structure: the same under-specification mutations that degrade performance on HumanEval have near-zero net effect on LiveCodeBench due to redundancy across descriptions, constraints, examples, and I/O conventions. Surprisingly, we also find that prompt mutations can improve correctness. In LiveCodeBench, under-specification often breaks misleading lexical or structural cues that trigger incorrect retrieval-based solution strategies, leading to correctness improvements that counterbalance degradations. Manual analysis identifies consistent mechanisms behind these improvements, including the disruption of over-fitted terminology, removal of misleading constraints, and elimination of spurious identifier triggers. Overall, our study shows that structurally rich task descriptions can substantially mitigate the negative effects of under-specification and, in some cases, even enhance correctness. We outline categories of prompt modifications that positively influence the behavior of LLM code generation, offering practical insights for writing robust prompts.

2604.24710 2026-04-28 cs.AI cs.CL

Case-Specific Rubrics for Clinical AI Evaluation: Methodology, Validation, and LLM-Clinician Agreement Across 823 Encounters

Aaryan Shah, Andrew Hines, Alexia Downs, Denis Bajet, Paulius Mui, Fabiano Araujo, Laura Offutt, Aida Rutledge, Elizabeth Jimenez

Comments 14 pages, 2 figures, 3 tables, submitted to JAMIA

Abstract

Objective. Clinical AI documentation systems require evaluation methodologies that are clinically valid, economically viable, and sensitive to iterative changes. Methods requiring expert review per scoring instance are too slow and expensive for safe, iterative deployment. We present a case-specific, clinician-authored rubric methodology for clinical AI evaluation and examine whether LLM-generated rubrics can approximate clinician agreement. Materials and Methods. Twenty clinicians authored 1,646 rubrics for 823 clinical cases (736 real-world, 87 synthetic) across primary care, psychiatry, oncology, and behavioral health. Each rubric was validated by confirming that an LLM-based scoring agent consistently scored clinician-preferred outputs higher than rejected ones. Seven versions of an EHR-embedded AI agent for clinicians were evaluated across all cases. Results. Clinician-authored rubrics discriminated effectively between high- and low-quality outputs (median score gap: 82.9%) with high scoring stability (median range: 0.00%). Median scores improved from 84% to 95%. In later experiments, clinician-LLM ranking agreement (tau: 0.42-0.46) matched or exceeded clinician-clinician agreement (tau: 0.38-0.43), attributable to both ceiling compression and LLM rubric improvement. Discussion. This convergence supports incorporating LLM rubrics alongside clinician-authored ones. At roughly 1,000 times lower cost, LLM rubrics enable substantially greater evaluation coverage, while continued clinical authorship grounds evaluation in expert judgment. Ceiling compression poses a methodological challenge for future inter-rater agreement studies. Conclusion. Case-specific rubrics offer a path for clinical AI evaluation that preserves expert judgment while enabling automation at three orders of magnitude lower cost. Clinician-authored rubrics establish the baseline against which LLM rubrics are validated.

2604.24708 2026-04-28 cs.LG cs.AI

Scalable Hyperparameter-Divergent Ensemble Training with Automatic Learning Rate Exploration for Large Models

Hailing Cheng, Tao Huang, Chen Zhu, Antonio Alonso

Comments 8 pages, 2 figures

2604.24707 2026-04-28 cs.RO

Passage-Aware Structural Mapping for RGB-D Visual SLAM

Ali Tourani, Miguel Fernandez-Cortizas, Saad Ejaz, David Pérez Saura, Asier Bikandi-Noya, Jose Luis Sanchez-Lopez, Holger Voos

Comments 5 pages, 5 figures

2604.24706 2026-04-28 eess.SY cs.LG cs.RO cs.SY

Exploiting Differential Flatness for Efficient Learning-based Model Predictive Control of Constrained Multi-Input Control Affine Systems

Tobias A. Farger, Adam W. Hall, Angela P. Schoellig

Comments Accepted for publication in 2026 European Control Conference

2604.24705 2026-04-28 econ.EM cs.LG

Energy-Arena: A Dynamic Benchmark for Operational Energy Forecasting

Max Kleinebrahm, Jonathan Berrisch, Philipp Eiser, Wolf Fichtner, Veit Hagenmeyer, Matthias Hertel, Nils Koster, Sebastian Lerch, Ralf Mikut, Jan Priesmann, Melanie Schienle, Benjamin Schaefer, Jann Weinand, Florian Ziel

Comments 6 pages, 5 figures, 1 table. Submitted to the European Electricity Markets (EEM) conference

2604.24703 2026-04-28 cs.SE cs.AI

Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis

Amal Akli, Mike Papadakis, Maxime Cordy, Yves Le Traon

2604.24701 2026-04-28 cs.CR

Profiling Resilient to Change in Probe Position

Elie Bursztein, Michael Gruber, Karel Král, Jean-Michel Picod, Matthias Probst, Georg Sigl

2604.24700 2026-04-28 cs.CL cs.AI

Green Shielding: A User-Centric Approach Towards Trustworthy AI

Aaron J. Li, Nicolas Sanchez, Hao Huang, Ruijiang Dong, Jaskaran Bains, Katrin Jaradeh, Zhen Xiang, Bo Li, Feng Liu, Aaron Kornblith, Bin Yu

2604.24698 2026-04-28 cs.CL

The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models

Yunze Xiao, Vivienne J. Zhang, Chenghao Yang, Ningshan Ma, Weihao Xuan, Jen-tse Huang

2604.24693 2026-04-28 cs.CL

Contextual Linear Activation Steering of Language Models

Brandon Hsu, Daniel Beaglehole, Adityanarayanan Radhakrishnan, Mikhail Belkin

2604.24692 2026-04-28 cs.LG

Diffusion-Guided Feature Selection via Nishimori Temperature: Noise-Based Spectral Embedding

Vasiliy S. Usatyuk, Denis A. Sapozhnikov, Sergey I. Egorov

Comments 8 pages, 3 figures, extended version (with noise shift proof) of DSPA2026 article

2604.24691 2026-04-28 eess.SY cs.SY

Reachability Analysis of the State Transition and State Covariance Matrices for an LTV System

Fengjiao Liu, Yixiao Zhang, Panagiotis Tsiotras

Comments 12 pages, 2 figures

2604.24690 2026-04-28 cs.CL

Can LLMs Act as Historians? Evaluating Historical Research Capabilities of LLMs via the Chinese Imperial Examination

Lirong Gao, Zeqing Wang, Yuyan Cai, Jiayi Deng, Yanmei Gu, Yiming Zhang, Jia Zhou, Yanfei Zhang, Junbo Zhao

Comments Accepted at ACL 2026

2604.24686 2026-04-28 cs.AI

Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents

German Marin, Jatin Chaudhary

2604.24685 2026-04-28 cs.CV

Aycromo: An Open-Source Platform for Automatic Chromosome Detection in Metaphase Images Based on Deep Learning

Jorge L. A. Lima, Filipe R. Cordeiro

Comments Accepted at SBCAS'26