arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

检索范围排序方式

检索时间范围

重置

HOT 人工智能、机器人等 9

cs.AI 人工智能 cs.CV 计算机视觉 cs.CL 自然语言处理 cs.RO 机器人 cs.LG 机器学习 cs.SD 声音 cs.ET 新兴技术 eess.AS 音频语音 eess.IV 图像视频

CS 计算机 41

cs 计算机 cs.AI 人工智能 cs.AR 硬件架构 cs.CC 计算复杂性 cs.CE 计算工程 cs.CG 计算几何 cs.CL 自然语言处理 cs.CR 密码安全 cs.CV 计算机视觉 cs.CY 计算机与社会 cs.DB 数据库 cs.DC 分布式计算 cs.DL 数字图书馆 cs.DM 离散数学 cs.DS 数据结构 cs.ET 新兴技术 cs.FL 形式语言 cs.GL 综述文献 cs.GR 图形学 cs.GT 博弈论 cs.HC 人机交互 cs.IR 信息检索 cs.IT 信息论 cs.LG 机器学习 cs.LO 计算机逻辑 cs.MA 多智能体 cs.MM 多媒体 cs.MS 数学软件 cs.NA 数值分析 cs.NE 神经进化 cs.NI 网络架构 cs.OH 其他计算机 cs.OS 操作系统 cs.PF 性能 cs.PL 编程语言 cs.RO 机器人 cs.SC 符号计算 cs.SD 声音 cs.SE 软件工程 cs.SI 社会信息网络 cs.SY 系统控制

ECON 经济学 4

econ 经济学 econ.EM 计量经济 econ.GN 一般经济 econ.TH 理论经济

EESS 电气与系统 5

eess 电气与系统 eess.AS 音频语音 eess.IV 图像视频 eess.SP 信号处理 eess.SY 系统控制

MATH 数学 33

math 数学 math.AC 交换代数 math.AG 代数几何 math.AP 偏微分方程 math.AT 代数拓扑 math.CA 经典分析 math.CO 组合数学 math.CT 范畴论 math.CV 复变函数 math.DG 微分几何 math.DS 动力系统 math.FA 泛函分析 math.GM 一般数学 math.GN 一般拓扑 math.GR 群论 math.GT 几何拓扑 math.HO 历史综述 math.IT 信息论 math.KT K理论 math.LO 逻辑 math.MG 度量几何 math.MP 数学物理 math.NA 数值分析 math.NT 数论 math.OA 算子代数 math.OC 优化控制 math.PR 概率 math.QA 量子代数 math.RA 环与代数 math.RT 表示论 math.SG 辛几何 math.SP 谱理论 math.ST 统计理论

PHYSICS 物理 55

astro-ph 天体物理 astro-ph.CO 宇宙学 astro-ph.EP 地球行星 astro-ph.GA 星系物理 astro-ph.HE 高能天体 astro-ph.IM 天文仪器 astro-ph.SR 太阳恒星 cond-mat 凝聚态 cond-mat.dis-nn 无序神经 cond-mat.mes-hall 介观纳米 cond-mat.mtrl-sci 材料科学 cond-mat.other 其他凝聚态 cond-mat.quant-gas 量子气体 cond-mat.soft 软凝聚态 cond-mat.stat-mech 统计力学 cond-mat.str-el 强关联电子 cond-mat.supr-con 超导 gr-qc 广义相对论 hep-ex 高能实验 hep-lat 格点高能 hep-ph 高能唯象 hep-th 高能理论 math-ph 数学物理 nlin 非线性科学 nlin.AO 自适应系统 nlin.CD 混沌动力学 nlin.CG 胞自动机 nlin.PS 斑图孤子 nlin.SI 可积系统 nucl-ex 核物理实验 nucl-th 核物理理论 physics 物理 physics.acc-ph 加速器物理 physics.ao-ph 大气海洋 physics.app-ph 应用物理 physics.atm-clus 原子分子团簇 physics.atom-ph 原子物理 physics.bio-ph 生物物理 physics.chem-ph 化学物理 physics.class-ph 经典物理 physics.comp-ph 计算物理 physics.data-an 数据分析 physics.ed-ph 物理教育 physics.flu-dyn 流体动力学 physics.gen-ph 普通物理 physics.geo-ph 地球物理 physics.hist-ph 物理史哲 physics.ins-det 仪器探测 physics.med-ph 医学物理 physics.optics 光学 physics.plasm-ph 等离子体 physics.pop-ph 科普物理 physics.soc-ph 物理与社会 physics.space-ph 空间物理 quant-ph 量子物理

Q-BIO 定量生物 11

q-bio 定量生物 q-bio.BM 生物分子 q-bio.CB 细胞行为 q-bio.GN 基因组学 q-bio.MN 分子网络 q-bio.NC 神经认知 q-bio.OT 其他定量生物 q-bio.PE 种群进化 q-bio.QM 定量方法 q-bio.SC 亚细胞过程 q-bio.TO 组织器官

Q-FIN 定量金融 10

q-fin 定量金融 q-fin.CP 计算金融 q-fin.EC 经济学 q-fin.GN 一般金融 q-fin.MF 数学金融 q-fin.PM 投资组合 q-fin.PR 证券定价 q-fin.RM 风险管理 q-fin.ST 统计金融 q-fin.TR 交易微观结构

STAT 统计 7

stat 统计 stat.AP 统计应用 stat.CO 统计计算 stat.ME 统计方法 stat.ML 机器学习 stat.OT 其他统计 stat.TH 统计理论

2602.06964 2026-02-09 cs.LG cs.AI cs.CL

Learning a Generative Meta-Model of LLM Activations

Grace Luo, Jiahai Feng, Trevor Darrell, Alec Radford, Jacob Steinhardt

2602.06959 2026-02-09 cs.CV

CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation

Kaiyi Huang, Yukun Huang, Yu Li, Jianhong Bai, Xintao Wang, Zinan Lin, Xuefei Ning, Jiwen Yu, Pengfei Wan, Yu Wang, Xihui Liu

Comments Project website: https://karine-huang.github.io/CineScene/

2602.06955 2026-02-09 cs.LG

Improving Credit Card Fraud Detection with an Optimized Explainable Boosting Machine

Reza E. Fazel, Arash Bakhtiary, Siavash A. Bigdeli

Comments 22 pages, 5 figures, 5 tables

2602.06953 2026-02-09 cs.CL

DAWN: Dependency-Aware Fast Inference for Diffusion LLMs

Lizhuo Luo, Zhuoran Shi, Jiajun Luo, Zhi Wang, Shen Ren, Wenya Wang, Tianwei Zhang

2602.06950 2026-02-09 math.CO cs.DM

Metric Dimensions of March Madness Brackets

Sam Spiro

2602.06949 2026-02-09 cs.RO cs.AI cs.CV cs.LG

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

Shenyuan Gao, William Liang, Kaiyuan Zheng, Ayaan Malik, Seonghyeon Ye, Sihyun Yu, Wei-Cheng Tseng, Yuzhu Dong, Kaichun Mo, Chen-Hsuan Lin, Qianli Ma, Seungjun Nah, Loic Magne, Jiannan Xiang, Yuqi Xie, Ruijie Zheng, Dantong Niu, You Liang Tan, K. R. Zentner, George Kurian, Suneel Indupuru, Pooya Jannaty, Jinwei Gu, Jun Zhang, Jitendra Malik, Pieter Abbeel, Ming-Yu Liu, Yuke Zhu, Joel Jang, Linxi "Jim" Fan

Comments Project page: https://dreamdojo-world.github.io/

2602.06948 2026-02-09 cs.AI cs.LG

Agentic Uncertainty Reveals Agentic Overconfidence

Jean Kaddour, Srijan Patel, Gbètondji Dovonon, Leo Richter, Pasquale Minervini, Matt J. Kusner

2602.06945 2026-02-09 cs.LO cs.DC

Distributed Knowledge in Simplicial Models

Éric Goubault, Jérémy Ledent, Sergio Rajsbaum

2602.06944 2026-02-09 eess.SY cs.LG cs.SY

Optimal Derivative Feedback Control for an Active Magnetic Levitation System: An Experimental Study on Data-Driven Approaches

Saber Omidi, Rene Akupan Ebunle, Se Young Yoon

Comments 10 pages, 9 figures. Preprint; manuscript under journal review

2602.06942 2026-02-09 cs.CL cs.AI

Optimal Turkish Subword Strategies at Scale: Systematic Evaluation of Data, Vocabulary, Morphology Interplay

Duygu Altinok

Comments Submitted to Cambridge NLP journal, all rights belong to them

详情

英文摘要

Tokenization is a pivotal design choice for neural language modeling in morphologically rich languages (MRLs) such as Turkish, where productive agglutination challenges both vocabulary efficiency and morphological fidelity. Prior studies have explored tokenizer families and vocabulary sizes but typically (i) vary vocabulary without systematically controlling the tokenizer's training corpus, (ii) provide limited intrinsic diagnostics, and (iii) evaluate a narrow slice of downstream tasks. We present the first comprehensive, principled study of Turkish subword tokenization; a "subwords manifest", that jointly varies vocabulary size and tokenizer training corpus size (data and vocabulary coupling), compares multiple tokenizer families under matched parameter budgets (WordPiece, morphology level, and character baselines), and evaluates across semantic (NLI, STS, sentiment analysis, NER), syntactic (POS, dependency parsing), and morphology-sensitive probes. To explain why tokenizers succeed or fail, we introduce a morphology-aware diagnostic toolkit that goes beyond coarse aggregates to boundary-level micro/macro F1, decoupled lemma atomicity vs. surface boundary hits, over/under-segmentation indices, character/word edit distances (CER/WER), continuation rates, and affix-type coverage and token-level atomicity. Our contributions are fourfold: (i) a systematic investigation of the vocabulary-corpus-success triad; (ii) a unified, morphology-aware evaluation framework linking intrinsic diagnostics to extrinsic outcomes; (iii) controlled comparisons identifying when character-level and morphology-level tokenization pay off; and (iv) an open-source release of evaluation code, tokenizer pipelines, and models. As the first work of its kind, this "subwords manifest" delivers actionable guidance for building effective tokenizers in MRLs and establishes a reproducible foundation for future research.

URL PDF HTML ☆

赞 0 踩 0

2602.06940 2026-02-09 cs.LG

From Core to Detail: Unsupervised Disentanglement with Entropy-Ordered Flows

Daniel Galperin, Ullrich Köthe

2602.06939 2026-02-09 cs.LG cs.AI

Cochain Perspectives on Temporal-Difference Signals for Learning Beyond Markov Dynamics

Zuyuan Zhang, Sizhe Tang, Tian Lan

2602.06938 2026-02-09 cs.CV cs.LG

Reliable Mislabel Detection for Video Capsule Endoscopy Data

Julia Werner, Julius Oexle, Oliver Bause, Maxime Le Floch, Franz Brinkmann, Hannah Tolle, Jochen Hampe, Oliver Bringmann

2602.06937 2026-02-09 cs.SD cs.LG eess.AS

Reciprocal Latent Fields for Precomputed Sound Propagation

Hugo Seuté, Pranai Vasudev, Etienne Richan, Louis-Xavier Buffoni

Comments Temporary pre-print, will be updated. In review at a conference

2602.06935 2026-02-09 cs.IR

On the Efficiency of Sequentially Aware Recommender Systems: Cotten4Rec

Shankar Veludandi, Gulrukh Kurdistan, Uzma Mushtaque

2602.06923 2026-02-09 cs.LG cs.AI physics.class-ph

From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers

Ziming Liu, Sophia Sanborn, Surya Ganguli, Andreas Tolias

2602.06920 2026-02-09 cs.CL cs.AI

Halluverse-M^3: A multitask multilingual benchmark for hallucination in LLMs

Samir Abdaljalil, Parichit Sharma, Erchin Serpedin, Hasan Kurban

2602.06917 2026-02-09 eess.AS cs.LG

Automatic Detection and Analysis of Singing Mistakes for Music Pedagogy

Sumit Kumar, Suraj Jaiswal, Parampreet Singh, Vipul Arora

Comments Under Review at Transactions of Audio Speech and Language Processing

2602.06916 2026-02-09 cs.HC

Understanding Workplace Relatedness Support among Healthcare Professionals: A Four-Layer Model and Implications for Technology Design

Zheyuan Zhang, Dorian Peters, Lan Xiao, Jingjing Sun, Laura Moradbakhti, Andrew Hall, Rafael A. Calvo

Journal ref In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI 26), April 13 to 17, 2026, Barcelona, Spain. ACM, New York, NY, USA, 21 pages

2602.06915 2026-02-09 cs.HC

Directing Space: Rehearsing Architecture as Performer with Explainable AI

Pavlos Panagiotidis, Jocelyn Spence, Nils Jaeger, Paul Tennent

2602.06914 2026-02-09 cs.CV

Seeing Beyond Redundancy: Task Complexity's Role in Vision Token Specialization in VLLMs

Darryl Hannan, John Cooper, Dylan White, Yijing Watkins

Comments 25 pages

2602.06909 2026-02-09 cs.LG

Revisiting the Generic Transformer: Deconstructing a Strong Baseline for Time Series Foundation Models

Yunshi Wen, Wesley M. Gifford, Chandra Reddy, Lam M. Nguyen, Jayant Kalagnanam, Anak Agung Julius

2602.06907 2026-02-09 cs.LG

A first realization of reinforcement learning-based closed-loop EEG-TMS

Dania Humaidan, Jiahua Xu, Jing Chen, Christoph Zrenner, David Emanuel Vetter, Laura Marzetti, Paolo Belardinelli, Timo Roine, Risto J. Ilmoniemi, Gian Luca Romani, Ulf Zieman

2602.06903 2026-02-09 cs.CC

The First Known Problem That Is FPT with Respect to Node Scanwidth but Not Treewidth

Jannik Schestag, Norbert Zeh

2602.06900 2026-02-09 cs.LG cs.AI cs.IT cs.NE math.IT stat.ML

Supercharging Simulation-Based Inference for Bayesian Optimal Experimental Design

Samuel Klein, Willie Neiswanger, Daniel Ratner, Michael Kagan, Sean Gasiorowski

2602.06899 2026-02-09 cs.LG stat.ML

Sample Complexity of Causal Identification with Temporal Heterogeneity

Ameya Rathod, Sujay Belsare, Salvik Krishna Nautiyal, Dhruv Laad, Ponnurangam Kumaraguru

2602.06887 2026-02-09 cs.CR

Plato's Form: Toward Backdoor Defense-as-a-Service for LLMs with Prototype Representations

Chen Chen, Yuchen Sun, Jiaxin Gao, Yanwen Jia, Xueluan Gong, Qian Wang, Kwok-Yan Lam

2602.06884 2026-02-09 cs.LG

A Cycle-Consistent Graph Surrogate for Full-Cycle Left Ventricular Myocardial Biomechanics

Siyu Mu, Wei Xuan Chan, Choon Hwai Yap

2602.06879 2026-02-09 cs.CV cs.AI

NanoFLUX: Distillation-Driven Compression of Large Text-to-Image Generation Models for Mobile Devices

Ruchika Chavhan, Malcolm Chadwick, Alberto Gil Couto Pimentel Ramos, Luca Morreale, Mehdi Noroozi, Abhinav Mehrotra

2602.06875 2026-02-09 cs.SE cs.AI

TraceCoder: A Trace-Driven Multi-Agent Framework for Automated Debugging of LLM-Generated Code

Jiangping Huang, Wenguang Ye, Weisong Sun, Jian Zhang, Mingyue Zhang, Yang Liu