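arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.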


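HOT Artificial Intelligence, Robotics, etc. 9

cs.AI Artificial Intelligence; cs.CV Computer Vision; cs.CL Natural Language Processing; cs.RO Robotics; cs.LG Machine Learning; cs.SD Sound; cs.ET Emerging Technologies; eess.AS Audio and Speech; eess.IV Image and Video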

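CS Computer Science 41

cs Computer Science; cs.AI Artificial Intelligence; cs.AR Hardware Architecture; cs.CC Computational Complexity; cs.CE Computational Engineering; cs.CG Computational Geometry; cs.CL Natural Language Processing; cs.CR Cryptography and Security; cs.CV Computer Vision; cs.CY Computers and Society; cs.DB Databases; cs.DC Distributed Computing; cs.DL Digital Libraries; cs.DM Discrete Mathematics; cs.DS Data Structures; cs.ET Emerging Technologies; cs.FL Formal Languages; cs.GL General Literature; cs.GR Graphics; cs.GT Game Theory; cs.HC Human-Computer Interaction; cs.IR Information Retrieval; cs.IT Information Theory; cs.LG Machine Learning; cs.LO Logic in Computer Science; cs.MA Multiagent Systems; cs.MM Multimedia; cs.MS Mathematical Software; cs.NA Numerical Analysis; cs.NE Neural and Evolutionary Computing; cs.NI Networking and Internet Architecture; cs.OH Other Computer Science; cs.OS Operating Systems; cs.PF Performance; cs.PL Programming Languages; cs.RO Robotics; cs.SC Symbolic Computation; cs.SD Sound; cs.SE Software Engineering; cs.SI Social and Information Networks; cs.SY Systems and Control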

ECON Economics 4

econ Economics; econ.EM Econometrics; econ.GN General Economics; econ.TH Theoretical Economics

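EESS Electrical Engineering and Systems 5

eess Electrical Engineering and Systems; eess.AS Audio and Speech; eess.IV Image and Video; eess.SP Signal Processing; eess.SY Systems and Control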

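MATH Mathematics 33

math Mathematics; math.AC Commutative Algebra; math.AG Algebraic Geometry; math.AP Partial Differential Equations; math.AT Algebraic Topology; math.CA Classical Analysis; math.CO Combinatorics; math.CT Category Theory; math.CV Complex Variables; math.DG Differential Geometry; math.DS Dynamical Systems; math.FA Functional Analysis; math.GM General Mathematics; math.GN General Topology; math.GR Group Theory; math.GT Geometric Topology; math.HO History and Overview; math.IT Information Theory; math.KT K-Theory; math.LO Logic; math.MG Metric Geometry; math.MP Mathematical Physics; math.NA Numerical Analysis; math.NT Number Theory; math.OA Operator Algebras; math.OC Optimization and Control; math.PR Probability; math.QA Quantum Algebra; math.RA Rings and Algebras; math.RT Representation Theory; math.SG Symplectic Geometry; math.SP Spectral Theory; math.ST Statistics Theory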

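PHYSICS Physics 55

astro-ph Astrophysics; astro-ph.CO Cosmology; astro-ph.EP Earth and Planetary Astrophysics; astro-ph.GA Astrophysics of Galaxies; astro-ph.HE High Energy Astrophysics; astro-ph.IM Astronomical Instrumentation; astro-ph.SR Solar and Stellar Astrophysics; cond-mat Condensed Matter; cond-mat.dis-nn Disordered Systems and Neural Networks; cond-mat.mes-hall Mesoscale and Nanoscale Physics; cond-mat.mtrl-sci Materials Science; cond-mat.other Other Condensed Matter; cond-mat.quant-gas Quantum Gases; cond-mat.soft Soft Condensed Matter; cond-mat.stat-mech Statistical Mechanics; cond-mat.str-el Strongly Correlated Electrons; cond-mat.supr-con Superconductivity; gr-qc General Relativity; hep-ex High Energy Experiment; hep-lat High Energy Lattice; hep-ph High Energy Phenomenology; hep-th High Energy Theory; math-ph Mathematical Physics; nlin Nonlinear Sciences; nlin.AO Adaptive Systems; nlin.CD Chaotic Dynamics; nlin.CG Cellular Automata; nlin.PS Pattern Formation and Solitons; nlin.SI Integrable Systems; nucl-ex Nuclear Experiment; nucl-th Nuclear Theory; physics Physics; physics.acc-ph Accelerator Physics; physics.ao-ph Atmospheric and Oceanic Physics; physics.app-ph Applied Physics; physics.atm-clus Atomic and Molecular Clusters; physics.atom-ph Atomic Physics; physics.bio-ph Biological Physics; physics.chem-ph Chemical Physics; physics.class-ph Classical Physics; physics.comp-ph Computational Physics; physics.data-an Data Analysis; physics.ed-ph Physics Education; physics.flu-dyn Fluid Dynamics; physics.gen-ph General Physics; physics.geo-ph Geophysics; physics.hist-ph History and Philosophy of Physics; physics.ins-det Instrumentation and Detectors; physics.med-ph Medical Physics; physics.optics Optics; physics.plasm-ph Plasma Physics; physics.pop-ph Popular Physics; physics.soc-ph Physics and Society; physics.space-ph Space Physics; quant-ph Quantum Physics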

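Q-BIO Quantitative Biology 11

q-bio Quantitative Biology; q-bio.BM Biomolecules; q-bio.CB Cell Behavior; q-bio.GN Genomics; q-bio.MN Molecular Networks; q-bio.NC Neurons and Cognition; q-bio.OT Other Quantitative Biology; q-bio.PE Populations and Evolution; q-bio.QM Quantitative Methods; q-bio.SC Subcellular Processes; q-bio.TO Tissues and Organs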

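Q-FIN Quantitative Finance 10

q-fin Quantitative Finance; q-fin.CP Computational Finance; q-fin.EC Economics; q-fin.GN General Finance; q-fin.MF Mathematical Finance; q-fin.PM Portfolio Management; q-fin.PR Pricing of Securities; q-fin.RM Risk Management; q-fin.ST Statistical Finance; q-fin.TR Trading and Market Microstructure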

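STAT Statistics 7

stat Statistics; stat.AP Statistical Applications; stat.CO Statistical Computation; stat.ME Statistical Methodology; stat.ML Machine Learning; stat.OT Other Statistics; stat.TH Statistics Theory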

2504.00843 2026-02-05 cs.AI cs.HC

Benchmarking Large Language Models for Diagnosing Students' Cognitive Skills from Handwritten Math Work

Yoonsu Kim, Hyoungwook Jin, Hayeon Doh, Eunhye Kim, Dongyun Jung, Seungju Kim, Kiyoon Choi, Jinho Son, Juho Kim

2503.20322 2026-02-05 cs.CV

Dynamic Pyramid Network for Efficient Multimodal Large Language Model

Hao Ai, Kunyi Wang, Zezhou Wang, Hao Lu, Jin Tian, Yaxin Luo, Peng Xing, Jen-Yuan Huang, Huaxia Li, Gen Luo

2503.08301 2026-02-05 cs.LG cs.AI cs.NE

Large Language Model as Meta-Surrogate for Data-Driven Many-Task Optimization: A Proof-of-Principle Study

Xian-Rong Zhang, Yue-Jiao Gong, Yuan-Ting Zhong, Ting Huang, Jun Zhang

Comments 39 pages

Journal ref Information Sciences 726 (2026) 122762

2502.02542 2026-02-05 cs.LG cs.CR

OverThink: Slowdown Attacks on Reasoning LLMs

Abhinav Kumar, Jaechul Roh, Ali Naseh, Marzena Karpinska, Mohit Iyyer, Amir Houmansadr, Eugene Bagdasarian

2501.12974 2026-02-05 cs.CV

MorphoSkel3D: Morphological Skeletonization of 3D Point Clouds for Informed Sampling in Object Classification and Retrieval

Pierre Onghena, Santiago Velasco-Forero, Beatriz Marcotegui

Journal ref 2025 International Conference on 3D Vision (3DV)

2501.06148 2026-02-05 cs.LG stat.ML

From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training

Julius Berner, Lorenz Richter, Marcin Sendera, Jarrid Rector-Brooks, Nikolay Malkin

Comments TMLR final version; code: https://github.com/GFNOrg/gfn-diffusion/tree/stagger

2410.18844 2026-02-05 cs.LG cs.AI stat.ME stat.ML

Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints

Udvas Das, Debabrota Basu

2410.17774 2026-02-05 cs.CV cs.GR

Quasi-Medial Distance Field (Q-MDF): A Robust Method for Approximating and Discretizing Neural Medial Axes

Jiayi Kong, Chen Zong, Jun Luo, Shiqing Xin, Fei Hou, Hanqing Jiang, Chen Qian, Ying He

2410.15539 2026-02-05 cs.CL cs.LG

Grammatical Error Correction for Low-Resource Languages: The Case of Zarma

Mamadou K. Keita, Adwoa Bremang, Huy Le, Dennis Owusu, Christopher Homan, Marcos Zampieri

2410.07427 2026-02-05 cs.LG stat.ML

A Generalization Bound for a Family of Implicit Networks

Samy Wu Fung, Benjamin Berkels

2410.03198 2026-02-05 cs.CL

PersoBench: Benchmarking Personalized Response Generation in Large Language Models

Saleh Afzoon, Zahra Jamali, Usman Naseem, Amin Beheshti

2409.07653 2026-02-05 cs.LG

STAND: Self-Aware Precondition Induction for Interactive Task Learning

Daniel Weitekamp, Glen Smith, Kenneth Koedinger, Christopher MacLellan

2407.02122 2026-02-05 cs.CL

Fake News Detection: It's All in the Data!

Soveatin Kuntur, Anna Wróblewska, Marcin Paprzycki, Maria Ganzha

2405.05981 2026-02-05 cs.LG cs.CE physics.comp-ph

Scalable physical source-to-field inference with hypernetworks

Berian James, Stefan Pollok, Ignacio Peis, Elizabeth Louise Baker, Jes Frellsen, Rasmus Bjørk

Comments Version accepted at TMLR

2404.12785 2026-02-05 cs.RO

AutoInspect: Towards Long-Term Autonomous Industrial Inspection

Michal Staniaszek, Tobit Flatscher, Joseph Rowell, Hanlin Niu, Wenxing Liu, Yang You, Robert Skilton, Maurice Fallon, Nick Hawes

Comments Accepted to the IEEE ICRA Workshop on Field Robotics 2024

Journal ref IEEE Transactions on Field Robotics, vol. 2, pp. 529-548, 2025

2402.10192 2026-02-05 cs.LG cs.AI cs.DM quant-ph

Multi-Excitation Projective Simulation with a Many-Body Physics Inspired Inductive Bias

Philip A. LeMaitre, Marius Krumm, Hans J. Briegel

Comments 41 pages, 9 figures; Code repository at https://github.com/MariusKrumm/ManyBodyMEPS. Updated to be consistent with AIJ version

Journal ref Artificial Intelligence 352, 104489 (2026)


DOI: 10.1016/j.artint.2026.104489

English abstract

With the impressive progress of deep learning, applications relying on machine learning are increasingly being integrated into daily life. However, most deep learning models have an opaque, oracle-like nature making it difficult to interpret and understand their decisions. This problem led to the development of the field known as eXplainable Artificial Intelligence (XAI). One method in this field known as Projective Simulation (PS) models a chain-of-thought as a random walk of a particle on a graph with vertices that have concepts attached to them. While this description has various benefits, including the possibility of quantization, it cannot be naturally used to model thoughts that combine several concepts simultaneously. To overcome this limitation, we introduce Multi-Excitation Projective Simulation (mePS), a generalization that considers a chain-of-thought to be a random walk of several particles on a hypergraph. A definition for a dynamic hypergraph is put forward to describe the agent's training history along with applications to AI and hypergraph visualization. An inductive bias inspired by the remarkably successful few-body interaction models used in quantum many-body physics is formalized for our classical mePS framework and employed to tackle the exponential complexity associated with naive implementations of hypergraphs. We prove that our inductive bias reduces the complexity from exponential to polynomial, with the exponent representing the cutoff on how many particles can interact. We numerically apply our method to two toy environments and a more complex scenario modelling the diagnosis of a broken computer. These environments demonstrate the resource savings provided by an appropriate choice of inductive bias, as well as showcasing aspects of interpretability. A quantum model for mePS is also briefly outlined and some future directions for it are discussed.


2401.14325 2026-02-05 cs.CV

Unlocking Past Information: Temporal Embeddings in Cooperative Bird's Eye View Prediction

Dominik Rößle, Jeremias Gerner, Klaus Bogenberger, Daniel Cremers, Stefanie Schmidtner, Torsten Schön

Comments Copyright 2024 IEEE. This is the accepted version of the paper. In 2024 IEEE Intelligent Vehicles Symposium (IV), pp. 2220-2225. Official paper available at https://doi.org/10.1109/IV55156.2024.10588608

Journal ref IEEE Intelligent Vehicles Symposium (IV), pp. 2220-2225, 2024

2311.13600 2026-02-05 cs.CV cs.GR cs.LG

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Viraj Shah, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani

Comments Project page: https://ziplora.github.io

2602.04346 2026-02-05 cs.LG

MirrorLA: Reflecting Feature Map for Vision Linear Attention

Weikang Meng, Liangyu Huo, Yadan Luo, Yaowei Wang, Yingjian Li, Zheng Zhang

2602.04344 2026-02-05 cs.LG cs.AI

UnMaskFork: Test-Time Scaling for Masked Diffusion via Deterministic Action Branching

Kou Misaki, Takuya Akiba

2602.04340 2026-02-05 cs.CV cs.AI

Explicit Uncertainty Modeling for Active CLIP Adaptation with Dual Prompt Tuning

Qian-Wei Wang, Yaguang Song, Shu-Tao Xia

2602.04339 2026-02-05 cs.LG

RISE: Interactive Visual Diagnosis of Fairness in Machine Learning Models

Ray Chen, Christan Grant

2602.04337 2026-02-05 cs.CV cs.AI

Fine-tuning Pre-trained Vision-Language Models in a Human-Annotation-Free Manner

Qian-Wei Wang, Guanghao Meng, Ren Cai, Yaguang Song, Shu-Tao Xia

2602.04328 2026-02-05 cs.CV

Multiview Self-Representation Learning across Heterogeneous Views

Jie Chen, Zhu Wang, Chuanbin Liu, Xi Peng

Comments 12 pages

2602.04326 2026-02-05 cs.AI cs.CL cs.MA

From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents

SeungWon Seo, SooBin Lim, SeongRae Noh, Haneul Kim, HyeongYeop Kang

Comments 31 pages, 10 figures, Accepted ICLR 2026

2602.04323 2026-02-05 cs.LG cs.AI

Efficient Equivariant High-Order Crystal Tensor Prediction via Cartesian Local-Environment Many-Body Coupling

Dian Jin, Yancheng Yuan, Xiaoming Tao

2602.04320 2026-02-05 cs.CL

A Domain-Specific Curated Benchmark for Entity and Document-Level Relation Extraction

Marco Martinelli, Stefano Marchesin, Vanessa Bonato, Giorgio Maria Di Nunzio, Nicola Ferro, Ornella Irrera, Laura Menotti, Federica Vezzani, Gianmaria Silvello

Comments Accepted to EACL 2026

2602.04317 2026-02-05 cs.CV

JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction

Zihan Lou, Jinlong Fan, Sihan Ma, Yuxiang Yang, Jing Zhang

Comments 15 pages, 15 figures, Project page at https://github.com/MiliLab/JOintGS

2602.04315 2026-02-05 cs.RO cs.CV

GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning

Guoqing Ma, Siheng Wang, Zeyu Zhang, Shan Yu, Hao Tang


English abstract

Large foundation models have shown strong open-world generalization to complex problems in vision and language, but similar levels of generalization have yet to be achieved in robotics. One fundamental challenge is that the models exhibit limited zero-shot capability, which hampers their ability to generalize effectively to unseen scenarios. In this work, we propose GeneralVLA (Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning), a hierarchical vision-language-action (VLA) model that can be more effective in utilizing the generalization of foundation models, enabling zero-shot manipulation and automatically generating data for robotics. In particular, we study a class of hierarchical VLA model where the high-level ASM (Affordance Segmentation Module) is finetuned to perceive image keypoint affordances of the scene; the mid-level 3DAgent carries out task understanding, skill knowledge, and trajectory planning to produce a 3D path indicating the desired robot end-effector trajectory. The intermediate 3D path prediction is then served as guidance to the low-level, 3D-aware control policy capable of precise manipulation. Compared to alternative approaches, our method requires no real-world robotic data collection or human demonstration, making it much more scalable to diverse tasks and viewpoints. Empirically, GeneralVLA successfully generates trajectories for 14 tasks, significantly outperforming state-of-the-art methods such as VoxPoser. The generated demonstrations can train more robust behavior cloning policies than training with human demonstrations or from data generated by VoxPoser, Scaling-up, and Code-As-Policies. We believe GeneralVLA can be the scalable method for both generating data for robotics and solving novel tasks in a zero-shot setting. Code: https://github.com/AIGeeksGroup/GeneralVLA. Website: https://aigeeksgroup.github.io/GeneralVLA.


2602.04304 2026-02-05 cs.CV cs.AI cs.CL

Beyond Static Cropping: Layer-Adaptive Visual Localization and Decoding Enhancement

Zipeng Zhu, Zhanghao Hu, Qinglin Zhu, Yuxi Hong, Yijun Liu, Jingyong Su, Yulan He, Lin Gui

Comments 9 pages, 5 figures
