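arXivDaily: a daily academic digest that syncs the full arXiv dataset and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.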

2602.19193 2026-02-24 cs.RO cs.AI

Visual Prompt Guided Unified Pushing Policy

Hieu Bui, Ziyan Gao, Yuya Hosoda, Joo-Ho Lee

2602.19188 2026-02-24 cs.CV

PositionOCR: Augmenting Positional Awareness in Multi-Modal Models via Hybrid Specialist Integration

Chen Duan, Zhentao Guo, Pei Fu, Zining Wang, Kai Zhou, Pengfei Yan

2602.19187 2026-02-24 cs.LG

Adaptive Problem Generation via Symbolic Representations

Teresa Yeo, Myeongho Jeon, Dulaj Weerakoon, Rui Qiao, Alok Prakash, Armando Solar-Lezama, Archan Misra

2602.19184 2026-02-24 cs.RO

Human-to-Robot Interaction: Learning from Video Demonstration for Robot Imitation

Thanh Nguyen Canh, Thanh-Tuan Tran, Haolan Zhang, Ziyan Gao, Nak Young Chong, Xiem HoangVan

Abstract

Learning from Demonstration (LfD) offers a promising paradigm for robot skill acquisition. Recent approaches attempt to extract manipulation commands directly from video demonstrations, yet face two critical challenges: (1) general video captioning models prioritize global scene features over task-relevant objects, producing descriptions unsuitable for precise robotic execution, and (2) end-to-end architectures coupling visual understanding with policy learning require extensive paired datasets and struggle to generalize across objects and scenarios. To address these limitations, we propose a novel "Human-to-Robot" imitation learning pipeline that enables robots to acquire manipulation skills directly from unstructured video demonstrations, inspired by the human ability to learn by watching and imitating. Our key innovation is a modular framework that decouples the learning process into two distinct stages: (1) Video Understanding, which combines Temporal Shift Modules (TSM) with Vision-Language Models (VLMs) to extract actions and identify interacted objects, and (2) Robot Imitation, which employs TD3-based deep reinforcement learning to execute the demonstrated manipulations. We validated our approach in PyBullet simulation environments with a UR5e manipulator and in a real-world experiment with a UF850 manipulator across four fundamental actions: reach, pick, move, and put. For video understanding, our method achieves 89.97% action classification accuracy and BLEU-4 scores of 0.351 on standard objects and 0.265 on novel objects, representing improvements of 76.4% and 128.4% over the best baseline, respectively. For robot manipulation, our framework achieves an average success rate of 87.5% across all actions, with 100% success on reaching tasks and up to 90% on complex pick-and-place operations. The project website is available at https://thanhnguyencanh.github.io/LfD4hri.

2602.19180 2026-02-24 cs.CV

VLM-Guided Group Preference Alignment for Diffusion-based Human Mesh Recovery

Wenhao Shen, Hao Wang, Wanqi Yin, Fayao Liu, Xulei Yang, Chao Liang, Zhongang Cai, Guosheng Lin

Comments: Accepted to CVPR 2026

2602.19178 2026-02-24 cs.CV

EMAD: Evidence-Centric Grounded Multimodal Diagnosis for Alzheimer's Disease

Qiuhui Chen, Xuancheng Yao, Zhenglei Zhou, Xinyue Hu, Yi Hong

Comments: Accepted by CVPR 2026

2602.19177 2026-02-24 cs.CL cs.AI

Next Reply Prediction X Dataset: Linguistic Discrepancies in Naively Generated Content

Simon Münker, Nils Schwager, Kai Kugler, Michael Heseltine, Achim Rettinger

Comments: 8 pages (12 including references), 2 figures and 2 tables

2602.19173 2026-02-24 cs.RO cs.SY eess.SY

Distributed and Consistent Multi-Robot Visual-Inertial-Ranging Odometry on Lie Groups

Ziwei Kang, Yizhi Zhou

2602.19170 2026-02-24 cs.CV

BriMA: Bridged Modality Adaptation for Multi-Modal Continual Action Quality Assessment

Kanglei Zhou, Chang Li, Qingyi Pan, Liyuan Wang

Comments: Accepted to CVPR 2026

2602.19169 2026-02-24 cs.LG cs.AI cs.MS math.PR

Virtual Parameter Sharpening: Dynamic Low-Rank Perturbations for Inference-Time Reasoning Enhancement

Saba Kublashvili

2602.19163 2026-02-24 cs.CV cs.MM cs.SD

JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation

Kai Liu, Yanhao Zheng, Kai Wang, Shengqiong Wu, Rongjunchen Zhang, Jiebo Luo, Dimitrios Hatzinakos, Ziwei Liu, Hao Fei, Tat-Seng Chua

Comments: Accepted by ICLR 2026. Homepage: https://JavisVerse.github.io/JavisDiT2-page

2602.19161 2026-02-24 cs.CV

Flash-VAED: Plug-and-Play VAE Decoders for Efficient Video Generation

Lunjie Zhu, Yushi Huang, Xingtong Ge, Yufei Xue, Zhening Liu, Yumeng Zhang, Zehong Lin, Jun Zhang

Comments: Code will be released at https://github.com/Aoko955/Flash-VAED

2602.19160 2026-02-24 cs.AI cs.CL cs.LO

Reasoning Capabilities of Large Language Models. Lessons Learned from General Game Playing

Maciej Świechowski, Adam Żychowski, Jacek Mańdziuk

2602.19159 2026-02-24 cs.AI cs.CL cs.LG

Beyond Behavioural Trade-Offs: Mechanistic Tracing of Pain-Pleasure Decisions in an LLM

Francesca Bianco, Derek Shiller

Comments: 24 pages, 8+1 tables

Abstract

Prior behavioural work suggests that some LLMs alter choices when options are framed as causing pain or pleasure, and that such deviations can scale with stated intensity. To bridge behavioural evidence (what the model does) with mechanistic interpretability (what computations support it), we investigate how valence-related information is represented and where it is causally used inside a transformer. Using Gemma-2-9B-it and a minimalist decision task modelled on prior work, we (i) map representational availability with layer-wise linear probing across streams, (ii) test causal contribution with activation interventions (steering; patching/ablation), and (iii) quantify dose-response effects over an epsilon grid, reading out both the 2-3 logit margin and digit-pair-normalised choice probabilities. We find that (a) valence sign (pain vs. pleasure) is perfectly linearly separable across stream families from very early layers (L0-L1), while a lexical baseline retains substantial signal; (b) graded intensity is strongly decodable, with peaks in mid-to-late layers and especially in attention/MLP outputs, and decision alignment is highest slightly before the final token; (c) additive steering along a data-derived valence direction causally modulates the 2-3 margin at late sites, with the largest effects observed in late-layer attention outputs (attn_out L14); and (d) head-level patching/ablation suggests that these effects are distributed across multiple heads rather than concentrated in a single unit. Together, these results link behavioural sensitivity to identifiable internal representations and intervention-sensitive sites, providing concrete mechanistic targets for more stringent counterfactual tests and broader replication. This work supports a more evidence-driven (a) debate on AI sentience and welfare, and (b) governance when setting policy, auditing standards, and safety safeguards.

2602.19158 2026-02-24 cs.AI

DoAtlas-1: A Causal Compilation Paradigm for Clinical AI

Yulong Li, Jianxu Chen, Xiwei Liu, Chuanyue Suo, Rong Xia, Zhixiang Lu, Yichen Li, Xinlin Zhuang, Niranjana Arun Menon, Yutong Xie, Eran Segal, Imran Razzak

2602.19156 2026-02-24 cs.CV cs.AI

Artefact-Aware Fungal Detection in Dermatophytosis: A Real-Time Transformer-Based Approach for KOH Microscopy

Rana Gursoy, Abdurrahim Yilmaz, Baris Kizilyaprak, Esmahan Caglar, Burak Temelkuran, Huseyin Uvet, Ayse Esra Koku Aksu, Gulsum Gencoglan

2602.19143 2026-02-24 cs.LG math.OC stat.ML

Incremental Learning of Sparse Attention Patterns in Transformers

Oğuz Kaan Yüksel, Rodrigo Alvarez Lucendo, Nicolas Flammarion

Comments: 36 pages, 19 figures

2602.19142 2026-02-24 cs.LG cs.AI

Celo2: Towards Learned Optimization Free Lunch

Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky

Comments: ICLR 2026

2602.19141 2026-02-24 cs.AI cs.CY cs.HC

Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians

Kartik Chandra, Max Kleiman-Weiner, Jonathan Ragan-Kelley, Joshua B. Tenenbaum

2602.19140 2026-02-24 cs.CV cs.LG

CaReFlow: Cyclic Adaptive Rectified Flow for Multimodal Fusion

Sijie Mai, Shiqin Han

Comments: Accepted by CVPR 2026

2602.19134 2026-02-24 cs.CV

Mapping Networks

Lord Sen, Shyamapada Mukherjee

Comments: 10 pages

2602.19133 2026-02-24 cs.CL

A Dataset for Named Entity Recognition and Relation Extraction from Art-historical Image Descriptions

Stefanie Schneider, Miriam Göldl, Julian Stalter, Ricarda Vollmer

2602.19131 2026-02-24 cs.LG cs.AI

Test-Time Learning of Causal Structure from Interventional Data

Wei Chen, Rui Ding, Bojun Huang, Yang Zhang, Qiang Fu, Yuxuan Liang, Han Shi, Dongmei Zhang

2602.19130 2026-02-24 cs.LG cs.AI

Detecting labeling bias using influence functions

Frida Jørgensen, Nina Weng, Siavash Bigdeli

2602.19127 2026-02-24 cs.CL

AgenticRAGTracer: A Hop-Aware Benchmark for Diagnosing Multi-Step Retrieval Reasoning in Agentic RAG

Qijie You, Wenkai Yu, Wentao Zhang

2602.19115 2026-02-24 cs.CL cs.AI cs.DL

How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders

Michael McCoubrey, Angelo Salatino, Francesco Osborne, Enrico Motta

Comments: Presented at SESAME 2025: Smarter Extraction of ScholArly MEtadata using Knowledge Graphs and Language Models, at JCDL 2025

2602.19111 2026-02-24 cs.CL

Astra: Activation-Space Tail-Eigenvector Low-Rank Adaptation of Large Language Models

Kainan Liu, Yong Zhang, Ning Cheng, Yun Zhu, Yanmeng Wang, Shaojun Wang, Jing Xiao

Comments: 22 pages, 10 figures

2602.19109 2026-02-24 cs.AI

Post-Routing Arithmetic in Llama-3: Last-Token Result Writing and Rotation-Structured Digit Directions

Yao Yan

2602.19108 2026-02-24 cs.RO

Understanding Fire Through Thermal Radiation Fields for Mobile Robots

Anton R. Wagner, Madhan Balaji Rao, Xuesu Xiao, Sören Pirk

2602.19094 2026-02-24 cs.LG

RKHS Representation of Algebraic Convolutional Filters with Integral Operators

Alejandro Parada-Mayorga, Alejandro Ribeiro, Juan Bazerque
