arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

检索范围排序方式

检索时间范围

重置

HOT 人工智能、机器人等 9

cs.AI 人工智能 cs.CV 计算机视觉 cs.CL 自然语言处理 cs.RO 机器人 cs.LG 机器学习 cs.SD 声音 cs.ET 新兴技术 eess.AS 音频语音 eess.IV 图像视频

CS 计算机 41

cs 计算机 cs.AI 人工智能 cs.AR 硬件架构 cs.CC 计算复杂性 cs.CE 计算工程 cs.CG 计算几何 cs.CL 自然语言处理 cs.CR 密码安全 cs.CV 计算机视觉 cs.CY 计算机与社会 cs.DB 数据库 cs.DC 分布式计算 cs.DL 数字图书馆 cs.DM 离散数学 cs.DS 数据结构 cs.ET 新兴技术 cs.FL 形式语言 cs.GL 综述文献 cs.GR 图形学 cs.GT 博弈论 cs.HC 人机交互 cs.IR 信息检索 cs.IT 信息论 cs.LG 机器学习 cs.LO 计算机逻辑 cs.MA 多智能体 cs.MM 多媒体 cs.MS 数学软件 cs.NA 数值分析 cs.NE 神经进化 cs.NI 网络架构 cs.OH 其他计算机 cs.OS 操作系统 cs.PF 性能 cs.PL 编程语言 cs.RO 机器人 cs.SC 符号计算 cs.SD 声音 cs.SE 软件工程 cs.SI 社会信息网络 cs.SY 系统控制

ECON 经济学 4

econ 经济学 econ.EM 计量经济 econ.GN 一般经济 econ.TH 理论经济

EESS 电气与系统 5

eess 电气与系统 eess.AS 音频语音 eess.IV 图像视频 eess.SP 信号处理 eess.SY 系统控制

MATH 数学 33

math 数学 math.AC 交换代数 math.AG 代数几何 math.AP 偏微分方程 math.AT 代数拓扑 math.CA 经典分析 math.CO 组合数学 math.CT 范畴论 math.CV 复变函数 math.DG 微分几何 math.DS 动力系统 math.FA 泛函分析 math.GM 一般数学 math.GN 一般拓扑 math.GR 群论 math.GT 几何拓扑 math.HO 历史综述 math.IT 信息论 math.KT K理论 math.LO 逻辑 math.MG 度量几何 math.MP 数学物理 math.NA 数值分析 math.NT 数论 math.OA 算子代数 math.OC 优化控制 math.PR 概率 math.QA 量子代数 math.RA 环与代数 math.RT 表示论 math.SG 辛几何 math.SP 谱理论 math.ST 统计理论

PHYSICS 物理 55

astro-ph 天体物理 astro-ph.CO 宇宙学 astro-ph.EP 地球行星 astro-ph.GA 星系物理 astro-ph.HE 高能天体 astro-ph.IM 天文仪器 astro-ph.SR 太阳恒星 cond-mat 凝聚态 cond-mat.dis-nn 无序神经 cond-mat.mes-hall 介观纳米 cond-mat.mtrl-sci 材料科学 cond-mat.other 其他凝聚态 cond-mat.quant-gas 量子气体 cond-mat.soft 软凝聚态 cond-mat.stat-mech 统计力学 cond-mat.str-el 强关联电子 cond-mat.supr-con 超导 gr-qc 广义相对论 hep-ex 高能实验 hep-lat 格点高能 hep-ph 高能唯象 hep-th 高能理论 math-ph 数学物理 nlin 非线性科学 nlin.AO 自适应系统 nlin.CD 混沌动力学 nlin.CG 胞自动机 nlin.PS 斑图孤子 nlin.SI 可积系统 nucl-ex 核物理实验 nucl-th 核物理理论 physics 物理 physics.acc-ph 加速器物理 physics.ao-ph 大气海洋 physics.app-ph 应用物理 physics.atm-clus 原子分子团簇 physics.atom-ph 原子物理 physics.bio-ph 生物物理 physics.chem-ph 化学物理 physics.class-ph 经典物理 physics.comp-ph 计算物理 physics.data-an 数据分析 physics.ed-ph 物理教育 physics.flu-dyn 流体动力学 physics.gen-ph 普通物理 physics.geo-ph 地球物理 physics.hist-ph 物理史哲 physics.ins-det 仪器探测 physics.med-ph 医学物理 physics.optics 光学 physics.plasm-ph 等离子体 physics.pop-ph 科普物理 physics.soc-ph 物理与社会 physics.space-ph 空间物理 quant-ph 量子物理

Q-BIO 定量生物 11

q-bio 定量生物 q-bio.BM 生物分子 q-bio.CB 细胞行为 q-bio.GN 基因组学 q-bio.MN 分子网络 q-bio.NC 神经认知 q-bio.OT 其他定量生物 q-bio.PE 种群进化 q-bio.QM 定量方法 q-bio.SC 亚细胞过程 q-bio.TO 组织器官

Q-FIN 定量金融 10

q-fin 定量金融 q-fin.CP 计算金融 q-fin.EC 经济学 q-fin.GN 一般金融 q-fin.MF 数学金融 q-fin.PM 投资组合 q-fin.PR 证券定价 q-fin.RM 风险管理 q-fin.ST 统计金融 q-fin.TR 交易微观结构

STAT 统计 7

stat 统计 stat.AP 统计应用 stat.CO 统计计算 stat.ME 统计方法 stat.ML 机器学习 stat.OT 其他统计 stat.TH 统计理论

2602.05393 2026-02-06 cs.CL cs.LG

Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better

Ji Zhao, Yufei Gu, Shitong Shao, Xun Zhou, Liang Xiang, Zeke Xie

2602.05392 2026-02-06 cs.CL cs.AI

Beyond Length: Context-Aware Expansion and Independence as Developmentally Sensitive Evaluation in Child Utterances

Jiyun Chun, Eric Fosler-Lussier, Michael White, Andrew Perrault

2602.05390 2026-02-06 cs.LG cs.AI

Assessing Electricity Demand Forecasting with Exogenous Data in Time Series Foundation Models

Wei Soon Cheong, Lian Lian Jiang, Jamie Ng Suat Ling

Comments 9 pages, 1 Figure and 3 Tables. Accepted to AI4TS Workshop @ AAAI'26 as an oral presentation (see https://ai4ts.github.io/aaai2026)

2602.05389 2026-02-06 cs.LG

A Decomposition-based State Space Model for Multivariate Time-Series Forecasting

Shunya Nagashima, Shuntaro Suzuki, Shuitsu Koyama, Shinnosuke Hirano

Comments ICASSP2026

2602.05387 2026-02-06 cs.CV cs.LG

Parallel Swin Transformer-Enhanced 3D MRI-to-CT Synthesis for MRI-Only Radiotherapy Planning

Zolnamar Dorjsembe, Hung-Yi Chen, Furen Xiao, Hsing-Kuo Pao

2602.05385 2026-02-06 cs.CL

IESR:Efficient MCTS-Based Modular Reasoning for Text-to-SQL with Large Language Models

Tao Liu, Jiafan Lu, Bohan Yu, Pengcheng Wu, Liu Haixin, Guoyu Xu, Li Xiangheng, Lixiao Li, Jiaming Hou, Zhao Shijun, Xinglin Lyu, Kunli Zhang, Yuxiang Jia, Hongyin Zan

Comments 25 pages, 16 figures, 8 tables. Hongyin Zan is corresponding author, Jiafan Lu is first co-author

2602.05384 2026-02-06 cs.CV

Dolphin-v2: Universal Document Parsing via Scalable Anchor Prompting

Hao Feng, Wei Shi, Ke Zhang, Xiang Fei, Lei Liao, Dingkang Yang, Yongkun Du, Xuecheng Wu, Jingqun Tang, Yang Liu, Hong Chen, Can Huang

详情

英文摘要

Document parsing has garnered widespread attention as vision-language models (VLMs) advance OCR capabilities. However, the field remains fragmented across dozens of specialized models with varying strengths, forcing users to navigate complex model selection and limiting system scalability. Moreover, existing two-stage approaches depend on axis-aligned bounding boxes for layout detection, failing to handle distorted or photographed documents effectively. To this end, we present Dolphin-v2, a two-stage document image parsing model that substantially improves upon the original Dolphin. In the first stage, Dolphin-v2 jointly performs document type classification (digital-born versus photographed) alongside layout analysis. For digital-born documents, it conducts finer-grained element detection with reading order prediction. In the second stage, we employ a hybrid parsing strategy: photographed documents are parsed holistically as complete pages to handle geometric distortions, while digital-born documents undergo element-wise parallel parsing guided by the detected layout anchors, enabling efficient content extraction. Compared with the original Dolphin, Dolphin-v2 introduces several crucial enhancements: (1) robust parsing of photographed documents via holistic page-level understanding, (2) finer-grained element detection (21 categories) with semantic attribute extraction such as author information and document metadata, and (3) code block recognition with indentation preservation, which existing systems typically lack. Comprehensive evaluations are conducted on DocPTBench, OmniDocBench, and our self-constructed RealDoc-160 benchmark. The results demonstrate substantial improvements: +14.78 points overall on the challenging OmniDocBench and 91% error reduction on photographed documents, while maintaining efficient inference through parallel processing.

URL PDF HTML ☆

赞 0 踩 0

2602.05382 2026-02-06 cs.CV cs.LG

VRIQ: Benchmarking and Analyzing Visual-Reasoning IQ of VLMs

Tina Khezresmaeilzadeh, Jike Zhong, Konstantinos Psounis

2602.05381 2026-02-06 cs.AI

Clinical Validation of Medical-based Large Language Model Chatbots on Ophthalmic Patient Queries with LLM-based Evaluation

Ting Fang Tan, Kabilan Elangovan, Andreas Pollreisz, Kevin Bryan Dy, Wei Yan Ng, Joy Le Yi Wong, Jin Liyuan, Chrystie Quek Wan Ning, Ashley Shuen Ying Hong, Arun James Thirunavukarasu, Shelley Yin-His Chang, Jie Yao, Dylan Hong, Wang Zhaoran, Amrita Gupta, Daniel SW Ting

详情

英文摘要

Domain specific large language models are increasingly used to support patient education, triage, and clinical decision making in ophthalmology, making rigorous evaluation essential to ensure safety and accuracy. This study evaluated four small medical LLMs Meerkat-7B, BioMistral-7B, OpenBioLLM-8B, and MedLLaMA3-v20 in answering ophthalmology related patient queries and assessed the feasibility of LLM based evaluation against clinician grading. In this cross sectional study, 180 ophthalmology patient queries were answered by each model, generating 2160 responses. Models were selected for parameter sizes under 10 billion to enable resource efficient deployment. Responses were evaluated by three ophthalmologists of differing seniority and by GPT-4-Turbo using the S.C.O.R.E. framework assessing safety, consensus and context, objectivity, reproducibility, and explainability, with ratings assigned on a five point Likert scale. Agreement between LLM and clinician grading was assessed using Spearman rank correlation, Kendall tau statistics, and kernel density estimate analyses. Meerkat-7B achieved the highest performance with mean scores of 3.44 from Senior Consultants, 4.08 from Consultants, and 4.18 from Residents. MedLLaMA3-v20 performed poorest, with 25.5 percent of responses containing hallucinations or clinically misleading content, including fabricated terminology. GPT-4-Turbo grading showed strong alignment with clinician assessments overall, with Spearman rho of 0.80 and Kendall tau of 0.67, though Senior Consultants graded more conservatively. Overall, medical LLMs demonstrated potential for safe ophthalmic question answering, but gaps remained in clinical depth and consensus, supporting the feasibility of LLM based evaluation for large scale benchmarking and the need for hybrid automated and clinician review frameworks to guide safe clinical deployment.

URL PDF HTML ☆

赞 0 踩 0

2602.05374 2026-02-06 cs.CL cs.LG

Cross-Lingual Empirical Evaluation of Large Language Models for Arabic Medical Tasks

Chaimae Abouzahir, Congbo Ma, Nizar Habash, Farah E. Shamout

Comments Accepted to HeaLing-EACL 2026

2602.05373 2026-02-06 cs.SD

Speech-XL: Towards Long-Form Speech Understanding in Large Speech Language Models

Haoqin Sun, Chenyang Lyu, Shiwan Zhao, Xuanfan Ni, Xiangyu Kong, Longyue Wang, Weihua Luo, Yong Qin

2602.05360 2026-02-06 cs.CV

Breaking Semantic Hegemony: Decoupling Principal and Residual Subspaces for Generalized OOD Detection

Ningkang Peng, Xiaoqian Peng, Yuhao Zhang, Qianfeng Yu, Feng Xing, Peirong Ma, Xichen Yang, Yi Chen, Tingyu Lu, Yanhui Gu

2602.05349 2026-02-06 cs.CV

Learning with Adaptive Prototype Manifolds for Out-of-Distribution Detection

Ningkang Peng, JiuTao Zhou, Yuhao Zhang, Xiaoqian Peng, Qianfeng Yu, Linjing Qian, Tingyu Lu, Yi Chen, Yanhui Gu

2602.05347 2026-02-06 cs.CL

How Do Language Models Acquire Character-Level Information?

Soma Sato, Ryohei Sasano

Comments Accepted to EACL 2026 Main Conference

2602.05339 2026-02-06 cs.CV cs.LG

Consistency-Preserving Concept Erasure via Unsafe-Safe Pairing and Directional Fisher-weighted Adaptation

Yongwoo Kim, Sungmin Cha, Hyunsoo Kim, Jaewon Lee, Donghyun Kim

2602.05333 2026-02-06 cs.LG cs.IT math.IT

Pool-based Active Learning as Noisy Lossy Compression: Characterizing Label Complexity via Finite Blocklength Analysis

Kosuke Sugiyama, Masato Uchida

Comments 21 pages, 1 figure

2602.05327 2026-02-06 cs.AI

ProAct: Agentic Lookahead in Interactive Environments

Yangbin Yu, Mingyu Yang, Junyou Li, Yiming Gao, Feiyu Liu, Yijun Yang, Zichuan Lin, Jiafei Lyu, Yicheng Liu, Zhicong Lu, Deheng Ye, Jie Jiang

2602.04795 2026-02-06 cs.LG cs.NA eess.SP math.NA stat.ML

Maximum-Volume Nonnegative Matrix Factorization

Olivier Vu Thanh, Nicolas Gillis

Comments arXiv admin note: substantial text overlap with arXiv:2412.06380 (this paper is an updated version of Chapter 7 of the thesis of the first author, available from arXiv:2412.06380). The code is available from https://gitlab.com/vuthanho/maxvolmf.jl

2602.04408 2026-02-06 cs.LG stat.ML

Separation-Utility Pareto Frontier: An Information-Theoretic Characterization

Shizhou Xu

2602.04142 2026-02-06 cs.CV cs.AI

JSynFlow: Japanese Synthesised Flowchart Visual Question Answering Dataset built with Large Language Models

Hiroshi Sasaki

Comments 7 pages, 1 figure

Journal ref Proceedings of the Annual Conference of JSAI, JSAI2025:2Win587-2Win587, 2025

2602.02989 2026-02-06 cs.CV

SharpTimeGS: Sharp and Stable Dynamic Gaussian Splatting via Lifespan Modulation

Zhanfeng Liao, Jiajun Zhang, Hanzhang Tu, Zhixi Wang, Yunqi Gao, Hongwen Zhang, Yebin Liu

2602.02258 2026-02-06 cs.LG

Alignment-Aware Model Adaptation via Feedback-Guided Optimization

Gaurav Bhatt, Aditya Chinchure, Jiawei Zhou, Leonid Sigal

2602.01999 2026-02-06 cs.CL

From Latent Signals to Reflection Behavior: Tracing Meta-Cognitive Activation Trajectory in R1-Style LLMs

Yanrui Du, Yibo Gao, Sendong Zhao, Jiayun Li, Haochun Wang, Qika Lin, Kai He, Bing Qin, Mengling Feng

2602.01789 2026-02-06 cs.RO

RFS: Reinforcement Learning with Residual Flow Steering for Dexterous Manipulation

Entong Su, Tyler Westenbroek, Anusha Nagabandi, Abhishek Gupta

2602.00391 2026-02-06 cs.CV

Robust automatic brain vessel segmentation in 3D CTA scans using dynamic 4D-CTA data

Alberto Mario Ceballos-Arroyo, Shrikanth M. Yadav, Chu-Hsuan Lin, Jisoo Kim, Geoffrey S. Young, Lei Qin, Huaizu Jiang

Comments 18 pages, 10 figures

2602.00377 2026-02-06 cs.CL

DecompressionLM: Deterministic, Diagnostic, and Zero-Shot Concept Graph Extraction from Language Models

Zhaochen Hong, Jiaxuan You

2602.00151 2026-02-06 cs.CV cs.AI

Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency

Alexander Blezinger, Wolfgang Nejdl, Ming Tang

Comments 9 pages, 7 figures and 5 tables

2601.23177 2026-02-06 cs.LG

MeshGraphNet-Transformer: Scalable Mesh-based Learned Simulation for Solid Mechanics

Mikel M. Iparraguirre, Iciar Alfaro, David Gonzalez, Elias Cueto

2601.22693 2026-02-06 cs.CV cs.AI

PEAR: Pixel-aligned Expressive humAn mesh Recovery

Jiahao Wu, Yunfei Liu, Lijian Lin, Ye Zhu, Lei Zhu, Jingyi Li, Yu Li

Comments 23 pages

2601.22401 2026-02-06 cs.AI math.CO math.NT

Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems

Tony Feng, Trieu Trinh, Garrett Bingham, Jiwon Kang, Shengtong Zhang, Sang-hyun Kim, Kevin Barreto, Carl Schildkraut, Junehyuk Jung, Jaehyeon Seo, Carlo Pagano, Yuri Chervonyi, Dawsen Hwang, Kaiying Hou, Sergei Gukov, Cheng-Chiang Tsai, Hyunwoo Choi, Youngbeom Jin, Wei-Yuan Li, Hao-An Wu, Ruey-An Shiu, Yu-Sheng Shih, Quoc V. Le, Thang Luong

Comments Reclassify Erdos-935 as Independent Rediscovery, bringing the number of autonomous solutions down to 5. (Explanation in Addendum 4.1) Elaborate on Footnote 3. Slightly reword various phrases in the Introduction in response to feedback

AI 大模型

视觉与机器人

科学与医疗