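arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.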

2507.01761 2026-02-18 cs.LG cs.AI stat.ML

Enhanced Generative Model Evaluation with Clipped Density and Coverage

Nicolas Salvy, Hugues Talbot, Bertrand Thirion

Journal ref The Fourteenth International Conference on Learning Representations, 2026

2506.16594 2026-02-18 cs.CL

A Scoping Review of Synthetic Data Generation by Language Models in Biomedical Research and Application: Data Utility and Quality Perspectives

Hanshu Rao, Weisi Liu, Haohan Wang, I-Chan Huang, Zhe He, Xiaolei Huang

Journal ref Journal of Healthcare Informatics Research (2026)

2506.02649 2026-02-18 cs.AI

From Prompts to Protection: Large Language Model-Enabled In-Context Learning for Smart Public Safety UAV

Yousef Emami, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida, Zhu Han

2505.22914 2026-02-18 cs.CV cs.LG

cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning

Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna Vorontsova, Anton Konushin, Vladislav Kurenkov, Danila Rukhovich

Comments ICLR 2026 (Oral)

2505.18883 2026-02-18 cs.LG

Partition Generative Modeling: Masked Modeling Without Masks

Justin Deschenaux, Lan Tran, Caglar Gulcehre

2505.11824 2026-02-18 cs.LG cs.AI

Latent Veracity Inference for Identifying Errors in Stepwise Reasoning

Minsu Kim, Jean-Pierre Falet, Oliver E. Richardson, Xiaoyin Chen, Moksh Jain, Sungjin Ahn, Sungsoo Ahn, Yoshua Bengio

2504.14337 2026-02-18 cs.CV

Multispectral airborne laser scanning for tree species classification: a benchmark of machine learning and deep learning algorithms

Josef Taher, Eric Hyyppä, Matti Hyyppä, Klaara Salolahti, Xiaowei Yu, Leena Matikainen, Antero Kukko, Matti Lehtomäki, Harri Kaartinen, Sopitta Thurachen, Paula Litkey, Ville Luoma, Markus Holopainen, Gefei Kong, Hongchao Fan, Petri Rönnholm, Matti Vaaja, Antti Polvivaara, Samuli Junttila, Mikko Vastaranta, Stefano Puliti, Rasmus Astrup, Joel Kostensalo, Mari Myllymäki, Maksymilian Kulicki, Krzysztof Stereńczak, Raul de Paula Pires, Ruben Valbuena, Juan Pedro Carbonell-Rivera, Jesús Torralba, Yi-Chen Chen, Lukas Winiwarter, Markus Hollaus, Gottfried Mandlburger, Narges Takhtkeshha, Fabio Remondino, Maciej Lisiewicz, Bartłomiej Kraszewski, Xinlian Liang, Jianchang Chen, Eero Ahokas, Kirsi Karila, Eugeniu Vezeteu, Petri Manninen, Roope Näsi, Heikki Hyyti, Siiri Pyykkönen, Peilun Hu, Juha Hyyppä

Journal ref ISPRS Journal of Photogrammetry and Remote Sensing, Volume 233, 2026, Pages 278-309

DOI: 10.1016/j.isprsjprs.2026.01.031

Abstract

Climate-smart and biodiversity-preserving forestry demands precise information on forest resources, extending to the individual tree level. Multispectral airborne laser scanning (ALS) has shown promise in automated point cloud processing, but challenges remain in leveraging deep learning techniques and identifying rare tree species in class-imbalanced datasets. This study addresses these gaps by conducting a comprehensive benchmark of deep learning and traditional shallow machine learning methods for tree species classification. For the study, we collected high-density multispectral ALS data ($>1000$ $\mathrm{pts}/\mathrm{m}^2$) at three wavelengths using the FGI-developed HeliALS system, complemented by existing Optech Titan data (35 $\mathrm{pts}/\mathrm{m}^2$), to evaluate the species classification accuracy of various algorithms in a peri-urban study area located in southern Finland. We established a field reference dataset of 6326 segments across nine species using a newly developed browser-based crowdsourcing tool, which facilitated efficient data annotation. The ALS data, including a training dataset of 1065 segments, was shared with the scientific community to foster collaborative research and diverse algorithmic contributions. Based on 5261 test segments, our findings demonstrate that point-based deep learning methods, particularly a point transformer model, outperformed traditional machine learning and image-based deep learning approaches on high-density multispectral point clouds. For the high-density ALS dataset, a point transformer model provided the best performance reaching an overall (macro-average) accuracy of 87.9% (74.5%) with a training set of 1065 segments and 92.0% (85.1%) with a larger training set of 5000 segments.
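The abstract above reports two figures per configuration, an overall accuracy and a macro-average accuracy, and the gap between them matters on class-imbalanced data. The following is a minimal sketch of that distinction, assuming macro-average accuracy means the unweighted mean of per-class accuracies (the paper may define it differently). The species labels and predictions are hypothetical toy values, not data from the study.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all segments whose predicted label matches the reference."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_average_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies (assumed definition)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy data: a common class (8 segments) and a rare class (2 segments).
y_true = ["pine"] * 8 + ["aspen"] * 2
y_pred = ["pine"] * 8 + ["pine", "aspen"]  # one rare-class segment is misclassified

overall = overall_accuracy(y_true, y_pred)
macro = macro_average_accuracy(y_true, y_pred)
print(f"overall={overall:.3f} macro={macro:.3f}")
```

In this toy case a single error on the rare class barely moves the overall figure but halves that class's accuracy, which pulls the macro average down. That is one plausible reading of why the two reported numbers diverge.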

2503.22399 2026-02-18 cs.CV

VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow

Ada Gorgun, Bernt Schiele, Jonas Fischer

Comments Accepted at the International Conference on Computer Vision 2025 (ICCV 2025). Code is available at: https://github.com/adagorgun/VITAL

2503.08550 2026-02-18 cs.CL

Transferring Extreme Subword Style Using Ngram Model-Based Logit Scaling

Craig Messner, Tom Lippincott

Comments Accepted for publication at NLP4DH 2025 @ NAACL

Journal ref Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities (2025)

2503.00509 2026-02-18 cs.LG cs.AI math.OC stat.ML

Functional multi-armed bandit and the best function identification problems

Yuriy Dorn, Aleksandr Katrutsa, Ilgam Latypov, Anastasiia Soboleva

2503.00168 2026-02-18 cs.CV

SSL4EO-S12 v1.1: A Multimodal, Multiseasonal Dataset for Pretraining, Updated

Benedikt Blumenstiel, Nassim Ait Ali Braham, Conrad M Albrecht, Stefano Maurogiovanni, Paolo Fraccaro

2502.19412 2026-02-18 cs.CL

The Mighty ToRR: A Benchmark for Table Reasoning and Robustness

Shir Ashury-Tahan, Yifan Mai, Rajmohan C, Ariel Gera, Yotam Perlitz, Asaf Yehudai, Elron Bandel, Leshem Choshen, Eyal Shnarch, Percy Liang, Michal Shmueli-Scheuer

2502.17812 2026-02-18 cs.CL cs.LG

Can Multimodal LLMs Perform Time Series Anomaly Detection?

Xiongxiao Xu, Haoran Wang, Yueqing Liang, Philip S. Yu, Yue Zhao, Kai Shu

Comments ACM Web Conference 2026 (WWW'26)

2502.13022 2026-02-18 cs.LG

Efficient and Sharp Off-Policy Learning under Unobserved Confounding

Konstantin Hess, Dennis Frauen, Valentyn Melnychuk, Stefan Feuerriegel

2412.20987 2026-02-18 cs.LG

RobustBlack: Challenging Black-Box Adversarial Attacks on State-of-the-Art Defenses

Mohamed Djilani, Salah Ghamizi, Maxime Cordy

2410.02674 2026-02-18 cs.CL

Examining Language Modeling Assumptions Using an Annotated Literary Dialect Corpus

Craig Messner, Tom Lippincott

Comments Accepted to NLP4DH@EMNLP2024

Journal ref Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities (2024)

2410.02605 2026-02-18 cs.LG cs.AI

Policy Gradients for Cumulative Prospect Theory in Reinforcement Learning

Olivier Lepel, Anas Barakat

2408.14073 2026-02-18 cs.LG stat.ME stat.ML

Score-based change point detection via tracking the best of infinitely many experts

Anna Markovich, Nikita Puchkin

Comments 61 pages, 4 figures

2408.00539 2026-02-18 cs.CL cs.AI

Intermittent Semi-Working Mask: A New Masking Paradigm for LLMs

HaoYuan Hu, Mingcong Lu, Di Luo, XinYa Wu, Jiangcai Zhu, Taoye Yin, Zheng Li, Hao Wang, Shusheng Zhang, KeZun Zhang, KaiLai Shao, Chao Chen, Feng Wang

2407.05180 2026-02-18 cs.CV cs.AI cs.LG eess.IV

ReCAP: Recursive Cross Attention Network for Pseudo-Label Generation in Robotic Surgical Skill Assessment

Julien Quarez, Marc Modat, Sebastien Ourselin, Jonathan Shapey, Alejandro Granados

2405.21012 2026-02-18 cs.LG stat.ME

IGC-Net for conditional average potential outcome estimation over time

Konstantin Hess, Dennis Frauen, Valentyn Melnychuk, Stefan Feuerriegel

2401.15068 2026-02-18 cs.CL

Pairing Orthographically Variant Literary Words to Standard Equivalents Using Neural Edit Distance Models

Craig Messner, Tom Lippincott

Comments Accepted to LaTeCH@EACL2024

Journal ref Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (2024)

2107.03633 2026-02-18 cs.LG stat.ML

Generalization Error of GAN from the Discriminator's Perspective

Hongkang Yang, Weinan E

2602.15533 2026-02-18 cs.RO

Efficient Knowledge Transfer for Jump-Starting Control Policy Learning of Multirotors through Physics-Aware Neural Architectures

Welf Rehberg, Mihir Kulkarni, Philipp Weiss, Kostas Alexis

Comments 8 pages. Accepted to IEEE Robotics and Automation Letters

2602.15532 2026-02-18 cs.AI cs.LG

Quantifying construct validity in large language model evaluations

Ryan Othniel Kearns

Abstract

The LLM community often reports benchmark results as if they are synonymous with general model capabilities. However, benchmarks can have problems that distort performance, like test set contamination and annotator error. How can we know that a benchmark is a reliable indicator of some capability that we want to measure? This question concerns the construct validity of LLM benchmarks, and it requires separating benchmark results from capabilities when we model and predict LLM performance. Both social scientists and computer scientists propose formal models - latent factor models and scaling laws - for identifying the capabilities underlying benchmark scores. However, neither technique is satisfactory for construct validity. Latent factor models ignore scaling laws, and as a result, the capabilities they extract often proxy model size. Scaling laws ignore measurement error, and as a result, the capabilities they extract are both uninterpretable and overfit to the observed benchmarks. This thesis presents the structured capabilities model, the first model to extract interpretable and generalisable capabilities from a large collection of LLM benchmark results. I fit this model and its two alternatives on a large sample of results from the OpenLLM Leaderboard. Structured capabilities outperform latent factor models on parsimonious fit indices, and exhibit better out-of-distribution benchmark prediction than scaling laws. These improvements are possible because neither existing approach separates model scale from capabilities in the appropriate way. Model scale should inform capabilities, as in scaling laws, and these capabilities should inform observed results up to measurement error, as in latent factor models. In combining these two insights, structured capabilities demonstrate better explanatory and predictive power for quantifying construct validity in LLM evaluations.

2602.15521 2026-02-18 cs.CL cs.LG

ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns

Ziyu Zhao, Tong Zhu, Zhi Zhang, Tiantian Fan, Jinluan Yang, Kun Kuang, Zhongyu Wei, Fei Wu, Yu Cheng

2602.15516 2026-02-18 cs.CV

Semantic-Guided 3D Gaussian Splatting for Transient Object Removal

Aditi Prabakaran, Priyesh Shukla

2602.15514 2026-02-18 cs.CL

DependencyAI: Detecting AI Generated Text through Dependency Parsing

Sara Ahmed, Tracy Hammond

2602.15509 2026-02-18 cs.CL

Fine-Refine: Iterative Fine-grained Refinement for Mitigating Dialogue Hallucination

Xiangyan Chen, Yujian Gan, Matthew Purver

2602.15506 2026-02-18 cs.CL

LuxMT Technical Report

Nils Rehlinger

Comments preprint