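arXivDaily, a daily academic digest: syncs the full arXiv dataset with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.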

HOT: AI, Robotics, etc. (9)

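cs.AI Artificial Intelligence, cs.CV Computer Vision, cs.CL Natural Language Processing, cs.RO Robotics, cs.LG Machine Learning, cs.SD Sound, cs.ET Emerging Technologies, eess.AS Audio and Speech Processing, eess.IV Image and Video Processing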

CS Computer Science (41)

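cs Computer Science, cs.AI Artificial Intelligence, cs.AR Hardware Architecture, cs.CC Computational Complexity, cs.CE Computational Engineering, cs.CG Computational Geometry, cs.CL Natural Language Processing, cs.CR Cryptography and Security, cs.CV Computer Vision, cs.CY Computers and Society, cs.DB Databases, cs.DC Distributed Computing, cs.DL Digital Libraries, cs.DM Discrete Mathematics, cs.DS Data Structures, cs.ET Emerging Technologies, cs.FL Formal Languages, cs.GL General Literature, cs.GR Graphics, cs.GT Game Theory, cs.HC Human-Computer Interaction, cs.IR Information Retrieval, cs.IT Information Theory, cs.LG Machine Learning, cs.LO Logic in Computer Science, cs.MA Multiagent Systems, cs.MM Multimedia, cs.MS Mathematical Software, cs.NA Numerical Analysis, cs.NE Neural and Evolutionary Computing, cs.NI Networking and Internet Architecture, cs.OH Other Computer Science, cs.OS Operating Systems, cs.PF Performance, cs.PL Programming Languages, cs.RO Robotics, cs.SC Symbolic Computation, cs.SD Sound, cs.SE Software Engineering, cs.SI Social and Information Networks, cs.SY Systems and Control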

ECON Economics (4)

econ Economics, econ.EM Econometrics, econ.GN General Economics, econ.TH Theoretical Economics

EESS Electrical Engineering and Systems (5)

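eess Electrical Engineering and Systems, eess.AS Audio and Speech Processing, eess.IV Image and Video Processing, eess.SP Signal Processing, eess.SY Systems and Control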

MATH Mathematics (33)

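math Mathematics, math.AC Commutative Algebra, math.AG Algebraic Geometry, math.AP Partial Differential Equations, math.AT Algebraic Topology, math.CA Classical Analysis, math.CO Combinatorics, math.CT Category Theory, math.CV Complex Variables, math.DG Differential Geometry, math.DS Dynamical Systems, math.FA Functional Analysis, math.GM General Mathematics, math.GN General Topology, math.GR Group Theory, math.GT Geometric Topology, math.HO History and Overview, math.IT Information Theory, math.KT K-Theory, math.LO Logic, math.MG Metric Geometry, math.MP Mathematical Physics, math.NA Numerical Analysis, math.NT Number Theory, math.OA Operator Algebras, math.OC Optimization and Control, math.PR Probability, math.QA Quantum Algebra, math.RA Rings and Algebras, math.RT Representation Theory, math.SG Symplectic Geometry, math.SP Spectral Theory, math.ST Statistics Theory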

PHYSICS Physics (55)

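astro-ph Astrophysics, astro-ph.CO Cosmology, astro-ph.EP Earth and Planetary Astrophysics, astro-ph.GA Astrophysics of Galaxies, astro-ph.HE High Energy Astrophysical Phenomena, astro-ph.IM Astronomical Instrumentation, astro-ph.SR Solar and Stellar Astrophysics, cond-mat Condensed Matter, cond-mat.dis-nn Disordered Systems and Neural Networks, cond-mat.mes-hall Mesoscale and Nanoscale Physics, cond-mat.mtrl-sci Materials Science, cond-mat.other Other Condensed Matter, cond-mat.quant-gas Quantum Gases, cond-mat.soft Soft Condensed Matter, cond-mat.stat-mech Statistical Mechanics, cond-mat.str-el Strongly Correlated Electrons, cond-mat.supr-con Superconductivity, gr-qc General Relativity, hep-ex High Energy Physics: Experiment, hep-lat High Energy Physics: Lattice, hep-ph High Energy Physics: Phenomenology, hep-th High Energy Physics: Theory, math-ph Mathematical Physics, nlin Nonlinear Sciences, nlin.AO Adaptive Systems, nlin.CD Chaotic Dynamics, nlin.CG Cellular Automata, nlin.PS Pattern Formation and Solitons, nlin.SI Integrable Systems, nucl-ex Nuclear Experiment, nucl-th Nuclear Theory, physics Physics, physics.acc-ph Accelerator Physics, physics.ao-ph Atmospheric and Oceanic Physics, physics.app-ph Applied Physics, physics.atm-clus Atomic and Molecular Clusters, physics.atom-ph Atomic Physics, physics.bio-ph Biological Physics, physics.chem-ph Chemical Physics, physics.class-ph Classical Physics, physics.comp-ph Computational Physics, physics.data-an Data Analysis, physics.ed-ph Physics Education, physics.flu-dyn Fluid Dynamics, physics.gen-ph General Physics, physics.geo-ph Geophysics, physics.hist-ph History and Philosophy of Physics, physics.ins-det Instrumentation and Detectors, physics.med-ph Medical Physics, physics.optics Optics, physics.plasm-ph Plasma Physics, physics.pop-ph Popular Physics, physics.soc-ph Physics and Society, physics.space-ph Space Physics, quant-ph Quantum Physics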

Q-BIO Quantitative Biology (11)

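q-bio Quantitative Biology, q-bio.BM Biomolecules, q-bio.CB Cell Behavior, q-bio.GN Genomics, q-bio.MN Molecular Networks, q-bio.NC Neurons and Cognition, q-bio.OT Other Quantitative Biology, q-bio.PE Populations and Evolution, q-bio.QM Quantitative Methods, q-bio.SC Subcellular Processes, q-bio.TO Tissues and Organs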

Q-FIN Quantitative Finance (10)

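q-fin Quantitative Finance, q-fin.CP Computational Finance, q-fin.EC Economics, q-fin.GN General Finance, q-fin.MF Mathematical Finance, q-fin.PM Portfolio Management, q-fin.PR Pricing of Securities, q-fin.RM Risk Management, q-fin.ST Statistical Finance, q-fin.TR Trading and Market Microstructure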

STAT Statistics (7)

stat Statistics, stat.AP Applications, stat.CO Computation, stat.ME Methodology, stat.ML Machine Learning, stat.OT Other Statistics, stat.TH Statistics Theory

2602.16101 2026-02-19 cs.LG

Axle Sensor Fusion for Online Continual Wheel Fault Detection in Wayside Railway Monitoring

Afonso Lourenço, Francisca Osório, Diogo Risca, Goreti Marreiros

2602.16093 2026-02-19 cs.CL cs.AI

Updating Parametric Knowledge with Context Distillation Retains Post-Training Capabilities

Shankar Padmanabhan, Mustafa Omer Gul, Tanya Goyal

Comments 15 pages. Preprint, under review

2602.16092 2026-02-19 cs.LG cs.CL

Why Any-Order Autoregressive Models Need Two-Stream Attention: A Structural-Semantic Tradeoff

Patrick Pynadath, Ruqi Zhang

2602.16085 2026-02-19 cs.CL cs.AI

Language Statistics and False Belief Reasoning: Evidence from 41 Open-Weight LMs

Sean Trott, Samuel Taylor, Cameron Jones, James A. Michaelov, Pamela D. Rivière

Comments 15 pages, 7 figures, submitted to conference

2602.16066 2026-02-19 cs.AI

Improving Interactive In-Context Learning from Natural Language Feedback

Martin Klissarov, Jonathan Cook, Diego Antognini, Hao Sun, Jingling Li, Natasha Jaques, Claudiu Musat, Edward Grefenstette

2602.16065 2026-02-19 cs.LG cs.AI math.ST stat.ML stat.TH

Can Generative Artificial Intelligence Survive Data Contamination? Theoretical Guarantees under Contaminated Recursive Training

Kevin Wang, Hongqian Niu, Didong Li

2602.16054 2026-02-19 cs.CL cs.LG

CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill

Bradley McDanel, Steven Li, Harshit Khaitan

Comments 15 pages, 8 figures

2602.16052 2026-02-19 cs.LG

MoE-Spec: Expert Budgeting for Efficient Speculative Decoding

Bradley McDanel, Steven Li, Sruthikesh Surineni, Harshit Khaitan

Comments 12 pages, 10 figures

2602.16050 2026-02-19 cs.AI cs.CL

Evidence-Grounded Subspecialty Reasoning: Evaluating a Curated Clinical Intelligence Layer on the 2025 Endocrinology Board-Style Examination

Amir Hosseinian, MohammadReza Zare Shahneh, Umer Mansoor, Gilbert Szeto, Kirill Karlin, Nima Aghaeepour

2602.16039 2026-02-19 cs.AI

How Uncertain Is the Grade? A Benchmark of Uncertainty Metrics for LLM-Based Automatic Assessment

Hang Li, Kaiqi Yang, Xianxuan Long, Fedor Filippov, Yucheng Chu, Yasemin Copur-Gencturk, Peng He, Cory Miller, Namsoo Shin, Joseph Krajcik, Hui Liu, Jiliang Tang

Abstract

The rapid rise of large language models (LLMs) is reshaping the landscape of automatic assessment in education. While these systems demonstrate substantial advantages in adaptability to diverse question types and flexibility in output formats, they also introduce new challenges related to output uncertainty, stemming from the inherently probabilistic nature of LLMs. Output uncertainty is an inescapable challenge in automatic assessment, as assessment results often play a critical role in informing subsequent pedagogical actions, such as providing feedback to students or guiding instructional decisions. Unreliable or poorly calibrated uncertainty estimates can lead to unstable downstream interventions, potentially disrupting students' learning processes and resulting in unintended negative consequences. To systematically understand this challenge and inform future research, we benchmark a broad range of uncertainty quantification methods in the context of LLM-based automatic assessment. Although the effectiveness of these methods has been demonstrated in many tasks across other domains, their applicability and reliability in educational settings, particularly for automatic grading, remain underexplored. Through comprehensive analyses of uncertainty behaviors across multiple assessment datasets, LLM families, and generation control settings, we characterize the uncertainty patterns exhibited by LLMs in grading scenarios. Based on these findings, we evaluate the strengths and limitations of different uncertainty metrics and analyze the influence of key factors, including model families, assessment tasks, and decoding strategies, on uncertainty estimates. Our study provides actionable insights into the characteristics of uncertainty in LLM-based automatic assessment and lays the groundwork for developing more reliable and effective uncertainty-aware grading systems in the future.

2602.16037 2026-02-19 cs.AI cs.MA

Optimization Instability in Autonomous Agentic Workflows for Clinical Symptom Detection

Cameron Cagan, Pedram Fard, Jiazi Tian, Jingya Cheng, Shawn N. Murphy, Hossein Estiri

2602.16035 2026-02-19 cs.RO

The Impact of Class Uncertainty Propagation in Perception-Based Motion Planning

Jibran Iqbal Shah, Andrei Ivanovic, Kelly Zhu, Masha Itkina, Rowan McAllister, Igor Gilitschenski, Florian Shkurti

Comments 8 pages, 10 figures. Code and results available at: https://github.com/aivanovic1/uncertainty-trajectron

2602.16023 2026-02-19 cs.CL

A Curious Class of Adpositional Multiword Expressions in Korean

Junghyun Min, Na-Rae Han, Jena D. Hwang, Nathan Schneider

Comments 10 pages. Camera-ready for MWE at EACL 2026

2602.16019 2026-02-19 cs.CV cs.AI

MedProbCLIP: Probabilistic Adaptation of Vision-Language Foundation Model for Reliable Radiograph-Report Retrieval

Ahmad Elallaf, Yu Zhang, Yuktha Priya Masupalli, Jeong Yang, Young Lee, Zechun Cao, Gongbo Liang

Comments Accepted to the 2026 Winter Conference on Applications of Computer Vision (WACV) Workshops

2602.16012 2026-02-19 cs.AI cs.LG math.OC

Towards Efficient Constraint Handling in Neural Solvers for Routing Problems

Jieyi Bi, Zhiguang Cao, Jianan Zhou, Wen Song, Yaoxin Wu, Jie Zhang, Yining Ma, Cathy Wu

Comments Accepted by ICLR 2026

2602.16008 2026-02-19 cs.SD cs.AI cs.CL cs.LG

MAEB: Massive Audio Embedding Benchmark

Adnan El Assadi, Isaac Chung, Chenghao Xiao, Roman Solomatin, Animesh Jha, Rahul Chand, Silky Singh, Kaitlyn Wang, Ali Sartaz Khan, Marc Moussa Nasser, Sufen Fong, Pengfei He, Alan Xiao, Ayush Sunil Munot, Aditya Shrivastava, Artem Gazizov, Niklas Muennighoff, Kenneth Enevoldsen

2602.16006 2026-02-19 cs.CV

BTReport: A Framework for Brain Tumor Radiology Report Generation with Clinically Relevant Features

Juampablo E. Heras Rivera, Dickson T. Chen, Tianyi Ren, Daniel K. Low, Asma Ben Abacha, Alberto Santamaria-Pang, Mehmet Kurt

Comments Accepted to Medical Imaging with Deep Learning (MIDL) 2026

2602.15989 2026-02-19 cs.CV

SAM 3D Body: Robust Full-Body Human Mesh Recovery

Xitong Yang, Devansh Kukreja, Don Pinkus, Anushka Sagar, Taosha Fan, Jinhyung Park, Soyong Shin, Jinkun Cao, Jiawei Liu, Nicolas Ugrinovic, Matt Feiszli, Jitendra Malik, Piotr Dollar, Kris Kitani

Comments Code: https://github.com/facebookresearch/sam-3d-body

2602.15984 2026-02-19 cs.LG

Verifier-Constrained Flow Expansion for Discovery Beyond the Data

Riccardo De Santi, Kimon Protopapas, Ya-Ping Hsieh, Andreas Krause

Comments ICLR 2026

2602.15973 2026-02-19 cs.CV cs.DB

LAND: A Longitudinal Analysis of Neuromorphic Datasets

Gregory Cohen, Alexandre Marcireau

Comments The LAND dataset tool can be accessed via https://neuromorphicsystems.github.io/land/

2602.15972 2026-02-19 cs.LG stat.ML

Fast Online Learning with Gaussian Prior-Driven Hierarchical Unimodal Thompson Sampling

Tianchi Zhao, He Liu, Hongyin Shi, Jinliang Li

2602.15967 2026-02-19 cs.CV

Non-Contact Physiological Monitoring in Pediatric Intensive Care Units via Adaptive Masking and Self-Supervised Learning

Mohamed Khalil Ben Salah, Philippe Jouvet, Rita Noumeir

2602.15963 2026-02-19 cs.RO

The human intention. A taxonomy attempt and its applications to robotics

J. E. Domínguez-Vidal, Alberto Sanfeliu

Comments Original version submitted to the International Journal of Social Robotics. Final version available on the SORO website

2602.15962 2026-02-19 cs.CV

Automated Re-Identification of Holstein-Friesian Cattle in Dense Crowds

Phoenix Yu, Tilo Burghardt, Andrew W Dowsey, Neill W Campbell

Comments 32 pages, 13 figures, 5 tables

2602.15961 2026-02-19 cs.LG

R$^2$Energy: A Large-Scale Benchmark for Robust Renewable Energy Forecasting under Diverse and Extreme Conditions

Zhi Sheng, Yuan Yuan, Guozhen Zhang, Yong Li

2602.15958 2026-02-19 cs.CL cs.AI cs.CV

DocSplit: A Comprehensive Benchmark Dataset and Evaluation Approach for Document Packet Recognition and Splitting

Md Mofijul Islam, Md Sirajus Salekin, Nivedha Balakrishnan, Vincil C. Bishop, Niharika Jain, Spencer Romo, Bob Strahan, Boyi Xie, Diego A. Socolinsky

2602.15955 2026-02-19 cs.LG stat.AP

Adaptive Semi-Supervised Training of P300 ERP-BCI Speller System with Minimum Calibration Effort

Shumeng Chen, Jane E. Huggins, Tianwen Ma

Comments 8 pages, 8 figures

2602.15954 2026-02-19 cs.RO cs.AI

Hybrid Model Predictive Control with Physics-Informed Neural Network for Satellite Attitude Control

Carlo Cena, Mauro Martini, Marcello Chiaberge

Comments Paper in peer-review. Copyright notice may change

2602.15927 2026-02-19 cs.CV cs.LG

Visual Memory Injection Attacks for Multi-Turn Conversations

Christian Schlarmann, Matthias Hein

2602.15926 2026-02-19 cs.CV cs.LG

A Study on Real-time Object Detection using Deep Learning

Ankita Bose, Jayasravani Bhumireddy, Naveen N

Comments 34 pages, 18 figures