arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

检索范围排序方式

检索时间范围

重置

HOT 人工智能、机器人等 9

cs.AI 人工智能 cs.CV 计算机视觉 cs.CL 自然语言处理 cs.RO 机器人 cs.LG 机器学习 cs.SD 声音 cs.ET 新兴技术 eess.AS 音频语音 eess.IV 图像视频

CS 计算机 41

cs 计算机 cs.AI 人工智能 cs.AR 硬件架构 cs.CC 计算复杂性 cs.CE 计算工程 cs.CG 计算几何 cs.CL 自然语言处理 cs.CR 密码安全 cs.CV 计算机视觉 cs.CY 计算机与社会 cs.DB 数据库 cs.DC 分布式计算 cs.DL 数字图书馆 cs.DM 离散数学 cs.DS 数据结构 cs.ET 新兴技术 cs.FL 形式语言 cs.GL 综述文献 cs.GR 图形学 cs.GT 博弈论 cs.HC 人机交互 cs.IR 信息检索 cs.IT 信息论 cs.LG 机器学习 cs.LO 计算机逻辑 cs.MA 多智能体 cs.MM 多媒体 cs.MS 数学软件 cs.NA 数值分析 cs.NE 神经进化 cs.NI 网络架构 cs.OH 其他计算机 cs.OS 操作系统 cs.PF 性能 cs.PL 编程语言 cs.RO 机器人 cs.SC 符号计算 cs.SD 声音 cs.SE 软件工程 cs.SI 社会信息网络 cs.SY 系统控制

ECON 经济学 4

econ 经济学 econ.EM 计量经济 econ.GN 一般经济 econ.TH 理论经济

EESS 电气与系统 5

eess 电气与系统 eess.AS 音频语音 eess.IV 图像视频 eess.SP 信号处理 eess.SY 系统控制

MATH 数学 33

math 数学 math.AC 交换代数 math.AG 代数几何 math.AP 偏微分方程 math.AT 代数拓扑 math.CA 经典分析 math.CO 组合数学 math.CT 范畴论 math.CV 复变函数 math.DG 微分几何 math.DS 动力系统 math.FA 泛函分析 math.GM 一般数学 math.GN 一般拓扑 math.GR 群论 math.GT 几何拓扑 math.HO 历史综述 math.IT 信息论 math.KT K理论 math.LO 逻辑 math.MG 度量几何 math.MP 数学物理 math.NA 数值分析 math.NT 数论 math.OA 算子代数 math.OC 优化控制 math.PR 概率 math.QA 量子代数 math.RA 环与代数 math.RT 表示论 math.SG 辛几何 math.SP 谱理论 math.ST 统计理论

PHYSICS 物理 55

astro-ph 天体物理 astro-ph.CO 宇宙学 astro-ph.EP 地球行星 astro-ph.GA 星系物理 astro-ph.HE 高能天体 astro-ph.IM 天文仪器 astro-ph.SR 太阳恒星 cond-mat 凝聚态 cond-mat.dis-nn 无序神经 cond-mat.mes-hall 介观纳米 cond-mat.mtrl-sci 材料科学 cond-mat.other 其他凝聚态 cond-mat.quant-gas 量子气体 cond-mat.soft 软凝聚态 cond-mat.stat-mech 统计力学 cond-mat.str-el 强关联电子 cond-mat.supr-con 超导 gr-qc 广义相对论 hep-ex 高能实验 hep-lat 格点高能 hep-ph 高能唯象 hep-th 高能理论 math-ph 数学物理 nlin 非线性科学 nlin.AO 自适应系统 nlin.CD 混沌动力学 nlin.CG 胞自动机 nlin.PS 斑图孤子 nlin.SI 可积系统 nucl-ex 核物理实验 nucl-th 核物理理论 physics 物理 physics.acc-ph 加速器物理 physics.ao-ph 大气海洋 physics.app-ph 应用物理 physics.atm-clus 原子分子团簇 physics.atom-ph 原子物理 physics.bio-ph 生物物理 physics.chem-ph 化学物理 physics.class-ph 经典物理 physics.comp-ph 计算物理 physics.data-an 数据分析 physics.ed-ph 物理教育 physics.flu-dyn 流体动力学 physics.gen-ph 普通物理 physics.geo-ph 地球物理 physics.hist-ph 物理史哲 physics.ins-det 仪器探测 physics.med-ph 医学物理 physics.optics 光学 physics.plasm-ph 等离子体 physics.pop-ph 科普物理 physics.soc-ph 物理与社会 physics.space-ph 空间物理 quant-ph 量子物理

Q-BIO 定量生物 11

q-bio 定量生物 q-bio.BM 生物分子 q-bio.CB 细胞行为 q-bio.GN 基因组学 q-bio.MN 分子网络 q-bio.NC 神经认知 q-bio.OT 其他定量生物 q-bio.PE 种群进化 q-bio.QM 定量方法 q-bio.SC 亚细胞过程 q-bio.TO 组织器官

Q-FIN 定量金融 10

q-fin 定量金融 q-fin.CP 计算金融 q-fin.EC 经济学 q-fin.GN 一般金融 q-fin.MF 数学金融 q-fin.PM 投资组合 q-fin.PR 证券定价 q-fin.RM 风险管理 q-fin.ST 统计金融 q-fin.TR 交易微观结构

STAT 统计 7

stat 统计 stat.AP 统计应用 stat.CO 统计计算 stat.ME 统计方法 stat.ML 机器学习 stat.OT 其他统计 stat.TH 统计理论

2601.15609 2026-01-27 cs.LG cs.CL

When Sharpening Becomes Collapse: Sampling Bias and Semantic Coupling in RL with Verifiable Rewards

Mingyuan Fan, Weiguang Han, Daixin Wang, Cen Chen, Zhiqiang Zhang, Jun Zhou

2601.15516 2026-01-27 cs.CV cs.HC cs.LG

DeltaDorsal: Enhancing Hand Pose Estimation with Dorsal Features in Egocentric Views

William Huang, Siyou Pei, Leyi Zou, Eric J. Gonzalez, Ishan Chatterjee, Yang Zhang

Comments 16 pages, 11 figures, Presented at ACM CHI 2026. For associated codebase, see https://github.com/hilab-open-source/deltadorsal

2601.15436 2026-01-27 cs.AI cs.CL cs.CY

Not Your Typical Sycophant: The Elusive Nature of Sycophancy in Large Language Models

Shahar Ben Natan, Oren Tsur

2601.13295 2026-01-27 cs.LG cs.AI cs.CL cs.MA cs.SI

CooperBench: Why Coding Agents Cannot be Your Teammates Yet

Arpandeep Khatua, Hao Zhu, Peter Tran, Arya Prabhudesai, Frederic Sadrieh, Johann K. Lieberwirth, Xinkai Yu, Yicheng Fu, Michael J. Ryan, Jiaxin Pei, Diyi Yang

Comments https://cooperbench.com First two authors contribute equally. The 3th - 6th authors contribute equally

2601.12816 2026-01-27 cs.LG cs.AI

Fisher-Orthogonal Projected Natural Gradient Descent for Continual Learning

Ishir Garg, Neel Kolhe, Andy Peng, Rohan Gopalam

2601.12658 2026-01-27 cs.CL cs.AI

Augmenting Question Answering with A Hybrid RAG Approach

Tianyi Yang, Nashrah Haque, Vaishnave Jonnalagadda, Yuya Jeremy Ong, Zhehui Chen, Yanzhao Wu, Lei Yu, Divyesh Jadav, Wenqi Wei

Comments 10 pages, 5 tables, 2 figures; presented at IEEE CogMI 2025

2601.11451 2026-01-27 cs.CV cs.AI cs.LG

PRISM-CAFO: Prior-conditioned Remote-sensing Infrastructure Segmentation and Mapping for CAFOs

Oishee Bintey Hoque, Nibir Chandra Mandal, Kyle Luong, Amanda Wilson, Samarth Swarup, Madhav Marathe, Abhijin Adiga

2601.10547 2026-01-27 cs.SD

HeartMuLa: A Family of Open Sourced Music Foundation Models

Dongchao Yang, Yuxin Xie, Yuguo Yin, Zheyu Wang, Xiaoyu Yi, Gongxi Zhu, Xiaolong Weng, Zihan Xiong, Yingzhe Ma, Dading Cong, Jingliang Liu, Zihang Huang, Jinghan Ru, Rongjie Huang, Haoran Wan, Peixu Wang, Kuoxi Yu, Helin Wang, Liming Liang, Xianwei Zhuang, Yuanyuan Wang, Dingdong, Wang, Haohan Guo, Junjie Cao, Zeqian Ju, Songxiang Liu, Yuewen Cao, Heming Weng, Yuexian Zou

2601.10132 2026-01-27 cs.AI cs.LG

Is More Context Always Better? Examining LLM Reasoning Capability for Time Interval Prediction

Yanan Cao, Farnaz Fallahi, Murali Mohana Krishna Dandu, Lalitesh Morishetti, Kai Zhao, Luyi Ma, Sinduja Subramaniam, Jianpeng Xu, Evren Korpeoglu, Kaushiki Nag, Sushant Kumar, Kannan Achan

Comments Accepted at The Web Conference 2026 (WWW 2026)

2601.09852 2026-01-27 cs.CL

Bears, all bears, and some bears. Language Constraints on Language Models' Inductive Inferences

Sriram Padmanabhan, Siyuan Song, Kanishka Misra

2601.08653 2026-01-27 cs.AI

Prism: Towards Lowering User Cognitive Load in LLMs via Complex Intent Understanding

Zenghua Liao, Jinzhi Liao, Xiang Zhao

2601.08235 2026-01-27 cs.AI cs.CL

MPCI-Bench: A Benchmark for Multimodal Pairwise Contextual Integrity Evaluation of Language Model Agents

Shouju Wang, Haopeng Zhang

2601.07974 2026-01-27 cs.CL

Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis

Yuxi Xia, Kinga Stańczak, Benjamin Roth

2601.05344 2026-01-27 cs.CV

Coding the Visual World: From Image to Simulation Using Vision Language Models

Sagi Eppel

2601.02103 2026-01-27 cs.CV

HeadLighter: Disentangling Illumination in Generative 3D Gaussian Heads via Lightstage Captures

Yating Wang, Yuan Sun, Xuan Wang, Ran Yi, Boyao Zhou, Yipengjing Sun, Hongyu Liu, Yinuo Wang, Lizhuang Ma

2601.02015 2026-01-27 cs.CL cs.AI cs.IT math.IT

Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets

Omar Momen, Emilie Sitter, Berenike Herrmann, Sina Zarrieß

Comments to be published at EACL 2026 main conference

2601.01126 2026-01-27 cs.CL

RoboPhD: Self-Improving Text-to-SQL Through Autonomous Agent Evolution

Andrew Borthwick, Stephen Ash

Comments 18 pages, 3 figures

2512.24235 2026-01-27 cs.CL

LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring

May Bashendy, Walid Massoud, Sohaila Eltanbouly, Salam Albatarni, Marwan Sayed, Abrar Abir, Houda Bouamor, Tamer Elsayed

Comments Accepted at EACL 2026 - main conference

2512.21231 2026-01-27 cs.LG cond-mat.mtrl-sci

MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models

Andres M Bran, Tong Xie, Shai Pranesh, Jeffrey Meng, Xuan Vu Nguyen, Jeremy Goumaz, David Ming Segura, Ruizhi Xu, Dongzhan Zhou, Wenjie Zhang, Bram Hoex, Philippe Schwaller

2512.19855 2026-01-27 cs.RO stat.ML

Gaussian Variational Inference with Non-Gaussian Factors for State Estimation: A UWB Localization Case Study

Andrew Stirling, Mykola Lukashchuk, Dmitry Bagaev, Wouter Kouw, James R. Forbes

2512.19367 2026-01-27 cs.LG cs.AI cs.NA math.NA

Sprecher Networks: A Parameter-Efficient Kolmogorov-Arnold Architecture

Christian Hägg, Kathlén Kohn, Giovanni Luca Marchetti, Boris Shapiro

Comments 45 pages

详情

英文摘要

We introduce Sprecher Networks (SNs), a family of trainable architectures derived from David Sprecher's 1965 constructive form of the Kolmogorov-Arnold representation. Each SN block implements a "sum of shifted univariate functions" using only two shared learnable splines per block, a monotone inner spline $ϕ$ and a general outer spline $Φ$, together with a learnable shift parameter $η$ and a mixing vector $λ$ shared across all output dimensions. Stacking these blocks yields deep, compositional models; for vector-valued outputs we append an additional non-summed output block. We also propose an optional lateral mixing operator enabling intra-block communication between output channels with only $O(d_{\mathrm{out}})$ additional parameters. Owing to the vector (not matrix) mixing weights and spline sharing, SNs scale linearly in width, approximately $O(\sum_{\ell}(d_{\ell-1}+d_{\ell}+G))$ parameters for $G$ spline knots, versus $O(\sum_{\ell} d_{\ell-1}d_{\ell})$ for dense MLPs and $O(G\sum_{\ell} d_{\ell-1}d_{\ell})$ for edge-spline KANs. This linear width-scaling is particularly attractive for extremely wide, shallow models, where low depth can translate into low inference latency. Finally, we describe a sequential forward implementation that avoids materializing the $d_{\mathrm{in}}\times d_{\mathrm{out}}$ shifted-input tensor, reducing peak forward-intermediate memory from quadratic to linear in layer width, relevant for memory-constrained settings such as on-device/edge inference; we demonstrate deployability via fixed-point real-time digit classification on resource-constrained embedded device with only 4 MB RAM. We provide empirical demonstrations on supervised regression, Fashion-MNIST classification (including stable training at 25 hidden layers with residual connections and normalization), and a Poisson PINN, with controlled comparisons to MLP and KAN baselines.

URL PDF HTML ☆

赞 0 踩 0

2512.16912 2026-01-27 cs.LG cs.AI cs.CL

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Peter Chen, Xiaopeng Li, Ziniu Li, Wotao Yin, Xi Chen, Tianyi Lin

Comments Accepted by ICLR 2026

2512.15765 2026-01-27 cs.LG cs.GT stat.ML

Data Valuation for LLM Fine-Tuning: Efficient Shapley Value Approximation via Language Model Arithmetic

Mélissa Tamine, Otmane Sakhi, Benjamin Heymann

Comments 11 pages, 2 figures

2512.14602 2026-01-27 cs.SD cs.LG

Sound and Music Biases in Deep Music Transcription Models: A Systematic Analysis

Lukáš Samuel Marták, Patricia Hu, Gerhard Widmer

Comments pre-print of the upcoming EURASIP JASM journal article

Journal ref EURASIP J. Audio Speech Music Process. Vol. 2026, Art. 5 (2026)

详情

DOI: 10.1186/s13636-025-00428-z

英文摘要

Automatic Music Transcription (AMT) -- the task of converting music audio into note representations -- has seen rapid progress, driven largely by deep learning systems. Due to the limited availability of richly annotated music datasets, much of the progress in AMT has been concentrated on classical piano music, and even a few very specific datasets. Whether these systems can generalize effectively to other musical contexts remains an open question. Complementing recent studies on distribution shifts in sound (e.g., recording conditions), in this work we investigate the musical dimension -- specifically, variations in genre, dynamics, and polyphony levels. To this end, we introduce the MDS corpus, comprising three distinct subsets -- (1) Genre, (2) Random, and (3) MAEtest -- to emulate different axes of distribution shift. We evaluate the performance of several state-of-the-art AMT systems on the MDS corpus using both traditional information-retrieval and musically-informed performance metrics. Our extensive evaluation isolates and exposes varying degrees of performance degradation under specific distribution shifts. In particular, we measure a note-level F1 performance drop of 20 percentage points due to sound, and 14 due to genre. Generally, we find that dynamics estimation proves more vulnerable to musical variation than onset prediction. Musically informed evaluation metrics, particularly those capturing harmonic structure, help identify potential contributing factors. Furthermore, experiments with randomly generated, non-musical sequences reveal clear limitations in system performance under extreme musical distribution shifts. Altogether, these findings offer new evidence of the persistent impact of the Corpus Bias problem in deep AMT systems.

URL PDF HTML ☆

赞 0 踩 0

2512.13481 2026-01-27 cs.AI cs.CL cs.CY

neuralFOMO: Can LLMs Handle Being Second Best? Measuring Envy-Like Preferences in Multi-Agent Settings

Arnav Ramamoorthy, Shrey Dhorajiya, Ojas Pungalia, Rashi Upadhyay, Abhishek Mishra, Abhiram H, Tejasvi Alladi, Sujan Yenuganti, Dhruv Kumar

Comments Under Review

2512.12884 2026-01-27 cs.CV cs.RO

Cross-Level Sensor Fusion with Object Lists via Transformer for 3D Object Detection

Xiangzhong Liu, Jiajie Zhang, Hao Shen

Comments 6 pages, 3 figures, accepted at IV2025

Journal ref 2025 IEEE Intelligent Vehicles Symposium (IV)

2512.11908 2026-01-27 cs.RO

Safe Learning for Contact-Rich Robot Tasks: A Survey from Classical Learning-Based Methods to Safe Foundation Models

Heng Zhang, Rui Dai, Gokhan Solak, Pokuang Zhou, Yu She, Arash Ajoudani

Comments version 2

详情

DOI: 10.36227/techrxiv.176472870.03980379/v1

英文摘要

Contact-rich tasks pose significant challenges for robotic systems due to inherent uncertainty, complex dynamics, and the high risk of damage during interaction. Recent advances in learning-based control have shown great potential in enabling robots to acquire and generalize complex manipulation skills in such environments, but ensuring safety, both during exploration and execution, remains a critical bottleneck for reliable real-world deployment. This survey provides a comprehensive overview of safe learning-based methods for robot contact-rich tasks. We categorize existing approaches into two main domains: safe exploration and safe execution. We review key techniques, including constrained reinforcement learning, risk-sensitive optimization, uncertainty-aware modeling, control barrier functions, and model predictive safety shields, and highlight how these methods incorporate prior knowledge, task structure, and online adaptation to balance safety and efficiency. A particular emphasis of this survey is on how these safe learning principles extend to and interact with emerging robotic foundation models, especially vision-language models (VLMs) and vision-language-action models (VLAs), which unify perception, language, and control for contact-rich manipulation. We discuss both the new safety opportunities enabled by VLM/VLA-based methods, such as language-level specification of constraints and multimodal grounding of safety signals, and the amplified risks and evaluation challenges they introduce. Finally, we outline current limitations and promising future directions toward deploying reliable, safety-aligned, and foundation-model-enabled robots in complex contact-rich environments. More details and materials are available at our \href{ https://github.com/jack-sherman01/Awesome-Learning4Safe-Contact-rich-tasks}{Project GitHub Repository}.

URL PDF HTML ☆

赞 0 踩 0

2512.11654 2026-01-27 cs.CV

Kinetic Mining in Context: Few-Shot Action Synthesis via Text-to-Motion Distillation

Luca Cazzola, Ahed Alboody

2512.10481 2026-01-27 cs.RO

Contact SLAM: An Active Tactile Exploration Policy Based on Physical Reasoning Utilized in Robotic Fine Blind Manipulation Tasks

Gaozhao Wang, Xing Liu, Zhenduo Ye, Zhengxiong Liu, Panfeng Huang

Comments 9 pages, 9 figures

2512.09313 2026-01-27 cs.LG cs.AI

Hetero-SplitEE: Split Learning of Neural Networks with Early Exits for Heterogeneous IoT Devices

Yuki Oda, Yuta Ono, Hiroshi Nakamura, Hideki Takase

Comments 8 pages. Accepted at MCSoC 2025

详情

DOI: 10.1109/MCSoC67473.2025.00093

英文摘要

The continuous scaling of deep neural networks has fundamentally transformed machine learning, with larger models demonstrating improved performance across diverse tasks. This growth in model size has dramatically increased the computational resources required for the training process. Consequently, distributed approaches, such as Federated Learning and Split Learning, have become essential paradigms for scalable deployment. However, existing Split Learning approaches assume client homogeneity and uniform split points across all participants. This critically limits their applicability to real-world IoT systems where devices exhibit heterogeneity in computational resources. To address this limitation, this paper proposes Hetero-SplitEE, a novel method that enables heterogeneous IoT devices to train a shared deep neural network in parallel collaboratively. By integrating heterogeneous early exits into hierarchical training, our approach allows each client to select distinct split points (cut layers) tailored to its computational capacity. In addition, we propose two cooperative training strategies, the Sequential strategy and the Averaging strategy, to facilitate this collaboration among clients with different split points. The Sequential strategy trains clients sequentially with a shared server model to reduce computational overhead. The Averaging strategy enables parallel client training with periodic cross-layer aggregation. Extensive experiments on CIFAR-10, CIFAR-100, and STL-10 datasets using ResNet-18 demonstrate that our method maintains competitive accuracy while efficiently supporting diverse computational constraints, enabling practical deployment of collaborative deep learning in heterogeneous IoT ecosystems.

URL PDF HTML ☆

赞 0 踩 0

AI 大模型

视觉与机器人

科学与医疗

When Sharpening Becomes Collapse: Sampling Bias and Semantic Coupling in RL with Verifiable Rewards

DeltaDorsal: Enhancing Hand Pose Estimation with Dorsal Features in Egocentric Views

Not Your Typical Sycophant: The Elusive Nature of Sycophancy in Large Language Models

CooperBench: Why Coding Agents Cannot be Your Teammates Yet

Fisher-Orthogonal Projected Natural Gradient Descent for Continual Learning

Augmenting Question Answering with A Hybrid RAG Approach

PRISM-CAFO: Prior-conditioned Remote-sensing Infrastructure Segmentation and Mapping for CAFOs

HeartMuLa: A Family of Open Sourced Music Foundation Models

Is More Context Always Better? Examining LLM Reasoning Capability for Time Interval Prediction

Bears, all bears, and some bears. Language Constraints on Language Models' Inductive Inferences

Prism: Towards Lowering User Cognitive Load in LLMs via Complex Intent Understanding

MPCI-Bench: A Benchmark for Multimodal Pairwise Contextual Integrity Evaluation of Language Model Agents

Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis

Coding the Visual World: From Image to Simulation Using Vision Language Models

HeadLighter: Disentangling Illumination in Generative 3D Gaussian Heads via Lightstage Captures

Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets

RoboPhD: Self-Improving Text-to-SQL Through Autonomous Agent Evolution

LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring

MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models

Gaussian Variational Inference with Non-Gaussian Factors for State Estimation: A UWB Localization Case Study

Sprecher Networks: A Parameter-Efficient Kolmogorov-Arnold Architecture

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

Data Valuation for LLM Fine-Tuning: Efficient Shapley Value Approximation via Language Model Arithmetic

Sound and Music Biases in Deep Music Transcription Models: A Systematic Analysis

neuralFOMO: Can LLMs Handle Being Second Best? Measuring Envy-Like Preferences in Multi-Agent Settings

Cross-Level Sensor Fusion with Object Lists via Transformer for 3D Object Detection

Safe Learning for Contact-Rich Robot Tasks: A Survey from Classical Learning-Based Methods to Safe Foundation Models

Kinetic Mining in Context: Few-Shot Action Synthesis via Text-to-Motion Distillation

Contact SLAM: An Active Tactile Exploration Policy Based on Physical Reasoning Utilized in Robotic Fine Blind Manipulation Tasks

Hetero-SplitEE: Split Learning of Neural Networks with Early Exits for Heterogeneous IoT Devices