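arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.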

2512.19243 2026-02-06 cs.CV

VisionDirector: Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis

Meng Chu, Senqiao Yang, Haoxuan Che, Suiyun Zhang, Xichen Zhang, Shaozuo Yu, Haokun Gui, Zhefan Rao, Dandan Tu, Rui Liu, Jiaya Jia

2512.13636 2026-02-06 cs.CV cs.RO

MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning

Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Hongwei Xie, Bing Wang, Guang Chen, Dingkang Liang, Xiang Bai

Comments 16 pages, 12 figures, 6 tables; Project Page: https://xiaomi-mlab.github.io/MindDrive/

2512.10962 2026-02-06 cs.LG cs.AI

WebSTAR: Scalable Data Synthesis for Computer Use Agents with Step-Level Filtering

Yifei He, Pranit Chawla, Yaser Souri, Subhojit Som, Xia Song

Comments Project website: https://yifei-he.github.io/webstar-website/

2511.11007 2026-02-06 cs.CV cs.AI cs.LG

VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models

Xinlei Yu, Chengming Xu, Guibin Zhang, Zhangquan Chen, Yudong Zhang, Yongbo He, Peng-Tao Jiang, Jiangning Zhang, Xiaobin Hu, Shuicheng Yan

2511.08667 2026-02-06 cs.LG stat.ML

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

Léo Grinsztajn, Klemens Flöge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Benjamin Jäger, Dominik Safaric, Simone Alessi, Adrian Hayler, Mihir Manium, Rosen Yu, Felix Jablonski, Shi Bin Hoo, Anurag Garg, Jake Robertson, Magnus Bühler, Vladyslav Moroshan, Lennart Purucker, Clara Cornu, Lilly Charlotte Wehrhahn, Alessandro Bonetto, Bernhard Schölkopf, Sauraj Gambhir, Noah Hollmann, Frank Hutter

2510.06528 2026-02-06 cs.SD cs.LG eess.AS

BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music

Mingyang Yao, Ke Chen, Shlomo Dubnov, Taylor Berg-Kirkpatrick

Comments Accepted by IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2026

2510.03969 2026-02-06 cs.AI cs.CR cs.LG

How Catastrophic is Your LLM? Certifying Risk in Conversation

Chengxiao Wang, Isha Chaudhary, Qian Hu, Weitong Ruan, Rahul Gupta, Gagandeep Singh

Comments Accepted by ICLR 2026

2509.21293 2026-02-06 cs.LG

Optimal Robust Recourse with $L^p$-Bounded Model Change

Phone Kyaw, Kshitij Kayastha, Shahin Jabbari

Comments This paper appears in the proceedings of IEEE SATML-26

2509.10555 2026-02-06 cs.CV

SurgLaVi: Large-Scale Hierarchical Dataset for Surgical Vision-Language Representation Learning

Alejandra Perez, Chinedu Nwoye, Ramtin Raji Kermani, Omid Mohareri, Muhammad Abdullah Jamal

Abstract

Vision-language pre-training (VLP) offers unique advantages for surgery by aligning language with surgical videos, enabling workflow understanding and transfer across tasks without relying on expert-labeled datasets. However, progress in surgical VLP remains constrained by the limited scale, procedural diversity, semantic quality, and hierarchical structure of existing datasets. In this work, we present SurgLaVi, the largest and most diverse surgical vision-language dataset to date, comprising nearly 240k clip-caption pairs from more than 200 procedures, and featuring hierarchical levels at coarse-, mid-, and fine-level. At the core of SurgLaVi lies a fully automated pipeline that systematically generates fine-grained transcriptions of surgical videos and segments them into coherent procedural units. To ensure high-quality annotations, it applies dual-modality filtering to remove irrelevant and noisy samples. Within this framework, the resulting captions are enriched with contextual detail, producing annotations that are both semantically rich and easy to interpret. To ensure accessibility, we release SurgLaVi-$\beta$, an open-source derivative of 113k clip-caption pairs constructed entirely from public data, which is over four times larger than existing surgical VLP datasets. To demonstrate the value of the SurgLaVi datasets, we introduce SurgCLIP, a CLIP-style video-text contrastive framework with dual encoders, as a representative base model. SurgCLIP achieves consistent improvements across phase, step, action, and tool recognition, surpassing prior state-of-the-art methods, often by large margins. These results validate that large-scale, semantically rich, and hierarchically structured datasets directly translate into stronger and more generalizable representations, establishing SurgLaVi as a key resource for developing surgical foundation models.
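The abstract describes SurgCLIP only as a CLIP-style video-text contrastive framework with dual encoders and does not spell out its training objective. As a rough illustration of what such an objective commonly looks like, the sketch below implements the symmetric contrastive (InfoNCE) loss associated with CLIP-style training. The function names, the temperature value of 0.07, and the toy embeddings are all assumptions made for illustration and are not taken from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def clip_style_loss(video_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired clip and caption embeddings.

    Pair i (video_embs[i], text_embs[i]) is the positive; every other
    combination in the batch acts as a negative. The temperature is an
    assumed illustrative value.
    """
    videos = [l2_normalize(e) for e in video_embs]
    texts = [l2_normalize(e) for e in text_embs]
    # logits[i][j] = cosine similarity of clip i and caption j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(v, t)) / temperature for t in texts]
        for v in videos
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    video_to_text = cross_entropy_diagonal(logits)
    text_to_video = cross_entropy_diagonal(logits_transposed)
    return 0.5 * (video_to_text + text_to_video)

# Toy 2-D embeddings (hypothetical): matched pairs versus deliberately swapped pairs.
aligned_loss = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < swapped_loss)
```

In this toy setup the matched pairs give a much lower loss than the swapped pairs, which is the behavior a dual-encoder model is trained to produce: each clip embedding should sit closest to its own caption.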

2509.03531 2026-02-06 cs.CL cs.AI cs.LG

Real-Time Detection of Hallucinated Entities in Long-Form Generation

Oscar Obeso, Andy Arditi, Javier Ferrando, Joshua Freeman, Cameron Holmes, Neel Nanda

2508.18175 2026-02-06 cs.LG cs.AI

Amortized Sampling with Transferable Normalizing Flows

Charlie B. Tan, Majdi Hassan, Leon Klein, Saifuddin Syed, Dominique Beaini, Michael M. Bronstein, Alexander Tong, Kirill Neklyudov

Comments Presented at NeurIPS 2025

2508.02276 2026-02-06 cs.LG cs.AI cs.CL q-bio.QM

CellForge: Agentic Design of Virtual Cell Models

Xiangru Tang, Zhuoyun Yu, Jiapeng Chen, Yan Cui, Daniel Shao, Weixu Wang, Fang Wu, Yuchen Zhuang, Wenqi Shi, Zhi Huang, Arman Cohan, Xihong Lin, Fabian Theis, Smita Krishnaswamy, Mark Gerstein

2508.02016 2026-02-06 cs.AI

Dynamic Context Adaptation for Consistent Role-Playing Agents with Retrieval-Augmented Generations

Jeiyoon Park, Yongshin Han, Minseop Kim, Kisu Yang

Comments preprint

2507.15155 2026-02-06 cs.RO

Learning-Based Modeling of a Magnetically Steerable Soft Suction Device for Endoscopic Endonasal Interventions

Majid Roshanfar, Alex Zhang, Changyan He, Amir Hooshiar, Dale J. Podolsky, Thomas Looi, Eric Diller

Journal ref Mechatronics 116 (2026) 103468

DOI: 10.1016/j.mechatronics.2026.103468

Abstract

This paper introduces a learning-based modeling framework for a magnetically steerable soft suction device designed for endoscopic endonasal brain tumor resection. The device is miniaturized (4 mm outer diameter, 2 mm inner diameter, 40 mm length), 3D printed using biocompatible SIL 30 material, and integrates embedded Fiber Bragg Grating (FBG) sensors for real-time shape feedback. Shape reconstruction is represented using four Bezier control points, providing a compact representation of deformation. A data-driven model was trained on 5,097 experimental samples to learn the mapping from magnetic field parameters (magnitude: 0-14 mT, frequency: 0.2-1.0 Hz, vertical tip distances: 90-100 mm) to Bezier control points defining the robot's 3D shape. Both Neural Network (NN) and Random Forest (RF) architectures were compared. The RF model outperformed the NN, achieving a mean RMSE of 0.087 mm in control point prediction and 0.064 mm in shape reconstruction error. Feature importance analysis revealed that magnetic field components predominantly influence distal control points, while frequency and distance affect the base configuration. Unlike prior studies applying general machine learning to soft robotic data, this framework introduces a new paradigm linking magnetic actuation inputs directly to geometric Bezier control points, creating an interpretable, low-dimensional deformation representation. This integration of magnetic field characterization, embedded FBG sensing, and Bezier-based learning provides a unified strategy extensible to other magnetically actuated continuum robots. By enabling sub-millimeter shape prediction and real-time inference, this work advances intelligent control of magnetically actuated soft robotic tools in minimally invasive neurosurgery.
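To make the four-control-point shape representation concrete, the sketch below evaluates a cubic Bezier curve from four 3D control points using the Bernstein basis and computes an RMSE between two point sets. It is only an illustration under stated assumptions: the control-point coordinates are made up, and the RMSE definition (root-mean-square Euclidean distance between corresponding points) is an assumed convention, since the abstract does not say how the paper computes its error metrics.

```python
from math import comb

def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] via the Bernstein basis."""
    degree = len(control_points) - 1
    dim = len(control_points[0])
    point = [0.0] * dim
    for i, cp in enumerate(control_points):
        weight = comb(degree, i) * (1 - t) ** (degree - i) * t ** i
        for d in range(dim):
            point[d] += weight * cp[d]
    return tuple(point)

def sample_curve(control_points, num_samples=50):
    """Sample the curve at evenly spaced parameter values from 0 to 1."""
    return [
        bezier_point(control_points, k / (num_samples - 1))
        for k in range(num_samples)
    ]

def rmse(predicted, reference):
    """Root-mean-square Euclidean distance between corresponding points (assumed metric)."""
    squared = [
        sum((p - r) ** 2 for p, r in zip(a, b))
        for a, b in zip(predicted, reference)
    ]
    return (sum(squared) / len(squared)) ** 0.5

# Hypothetical control points in mm, for illustration only (not data from the paper).
reference_cps = [(0.0, 0.0, 0.0), (0.0, 0.0, 13.0), (2.0, 0.0, 27.0), (6.0, 0.0, 40.0)]
predicted_cps = [(0.0, 0.0, 0.0), (0.1, 0.0, 13.0), (2.0, 0.1, 27.0), (6.0, 0.0, 40.1)]

control_point_error = rmse(predicted_cps, reference_cps)
shape_error = rmse(sample_curve(predicted_cps), sample_curve(reference_cps))
print(control_point_error, shape_error)
```

A model in this setting only has to predict 12 numbers (four 3D points) per shape, and the full curve is then recovered deterministically, which is what makes the representation compact.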

2506.24068 2026-02-06 cs.CL cs.AI

STACK: Adversarial Attacks on LLM Safeguard Pipelines

Ian R. McKenzie, Oskar J. Hollinsworth, Tom Tseng, Xander Davies, Stephen Casper, Aaron D. Tucker, Robert Kirk, Adam Gleave

Comments Add results on other models and datasets

2506.13771 2026-02-06 cs.LG cs.AI cs.CL

LittleBit: Ultra Low-Bit Quantization via Latent Factorization

Banseok Lee, Dongkyu Kim, Youngcheon You, Youngmin Kim

Comments Accepted to NeurIPS 2025. Banseok Lee and Dongkyu Kim contributed equally

2506.13342 2026-02-06 cs.AI cs.CL cs.LG

Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers

Wooseok Seo, Seungju Han, Jaehun Jung, Benjamin Newman, Seungwon Lim, Seungbeen Lee, Ximing Lu, Yejin Choi, Youngjae Yu

Comments Accepted to COLM 2025

2506.12474 2026-02-06 cs.LG cs.AI

Generalizable Trajectory Prediction via Inverse Reinforcement Learning with Mamba-Graph Architecture

Wenyun Li, Wenjie Huang, Zejian Deng, Chen Sun

2506.05701 2026-02-06 cs.LG

Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health

Pavel Dolin, Weizhi Li, Gautam Dasarathy, Visar Berisha

Journal ref 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

2506.04022 2026-02-06 cs.AI cs.LG

Interpretability by Design for Efficient Multi-Objective Reinforcement Learning

Qiyue Xia, Tianwei Wang, J. Michael Herrmann

2505.18051 2026-02-06 cs.CV

LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision

Anthony Fuller, Yousef Yassin, Junfeng Wen, Daniel G. Kyrollos, Tarek Ibrahim, James R. Green, Evan Shelhamer

2505.17767 2026-02-06 cs.CL

Position: The Real Barrier to LLM Agent Usability is Agentic ROI

Weiwen Liu, Jiarui Qin, Xu Huang, Xingshan Zeng, Yunjia Xi, Jianghao Lin, Chuhan Wu, Yasheng Wang, Lifeng Shang, Ruiming Tang, Defu Lian, Yong Yu, Weinan Zhang

2505.17406 2026-02-06 cs.AI

Robust Answers, Fragile Logic: Probing the Decoupling Hypothesis in LLM Reasoning

Enyi Jiang, Changming Xu, Nischay Singh, Tian Qiu, Gagandeep Singh

2505.17004 2026-02-06 cs.LG cs.AI cs.NA math.NA stat.ML

Guided Diffusion Sampling on Function Spaces with Applications to PDEs

Jiachen Yao, Abbas Mammadov, Julius Berner, Gavin Kerrigan, Jong Chul Ye, Kamyar Azizzadenesheli, Anima Anandkumar

Comments Accepted to NeurIPS 2025

2505.14814 2026-02-06 cs.SD cs.CL eess.AS

GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples

Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang

Comments Accepted at Interspeech 2025

Journal ref Proc. Interspeech 2025, 2680-2684

2505.10960 2026-02-06 cs.LG cs.AI cs.DB

Relational Graph Transformer

Vijay Prakash Dwivedi, Sri Jaladi, Yangyi Shen, Federico López, Charilaos I. Kanatsoulis, Rishi Puri, Matthias Fey, Jure Leskovec

Comments ICLR 2026, Code: https://github.com/snap-stanford/relgt

2504.15281 2026-02-06 cs.CV

StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians

Cailin Zhuang, Yaoqi Hu, Xuanyang Zhang, Wei Cheng, Jiacheng Bao, Shengqi Liu, Yiying Yang, Xianfang Zeng, Gang Yu, Ming Li

Comments 18 pages; Project page: https://styleme3d.github.io/

2504.15259 2026-02-06 cs.CV cs.AI

Bringing Diversity from Diffusion Models to Semantic-Guided Face Asset Generation

Yunxuan Cai, Sitao Xiang, Zongjian Li, Haiwei Chen, Yajie Zhao

Comments Accepted Manuscript

2504.11634 2026-02-06 cs.RO

Doppler-SLAM: Doppler-Aided Radar-Inertial and LiDAR-Inertial Simultaneous Localization and Mapping

Dong Wang, Hannes Haag, Daniel Casado Herraez, Stefan May, Cyrill Stachniss, Andreas Nüchter

Comments 8 pages, 7 figures

Journal ref IEEE Robotics and Automation Letters (RA-L), 2025

2504.05727 2026-02-06 cs.RO

SAP-CoPE: Social-Aware Planning using Cooperative Pose Estimation with Infrastructure Sensor Nodes

Minghao Ning, Yufeng Yang, Shucheng Huang, Jiaming Zhong, Keqi Shu, Chen Sun, Ehsan Hashemi, Amir Khajepour

Comments This paper has been submitted to the IEEE Transactions on Automation Science and Engineering
