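arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.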


2509.21029 2026-03-03 cs.LG

FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction

Runqi Lin, Alasdair Paren, Suqin Yuan, Muyang Li, Philip Torr, Adel Bibi, Tongliang Liu

Comments Accepted by CVPR 2026

2509.20323 2026-03-03 cs.LG math.OC stat.ML

A Recovery Guarantee for Sparse Neural Networks

Sara Fridovich-Keil, Mert Pilanci

Comments ICLR 2026

2509.19877 2026-03-03 cs.LG cond-mat.mtrl-sci cs.AI physics.chem-ph physics.comp-ph

Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials

Shi Yin, Zujian Dai, Xinyang Pan, Lixin He

Abstract

Deep learning methods for electronic-structure Hamiltonian prediction have offered significant computational efficiency advantages over traditional DFT methods, yet the diversity of atomic types, structural patterns, and the high-dimensional complexity of Hamiltonians pose substantial challenges to generalization performance. In this work, we contribute on both the methodology and dataset sides to advance a universal deep learning paradigm for Hamiltonian prediction. On the method side, we propose NextHAM, a neural E(3)-symmetry and expressive correction method for efficient and generalizable materials electronic-structure Hamiltonian prediction. First, we introduce the zeroth-step Hamiltonians, which can be efficiently constructed from the initial charge density of DFT, as informative descriptors for the neural regression model at the input level and as initial estimates of the target Hamiltonian at the output level, so that the regression model directly predicts the correction terms to the target ground truths, thereby significantly simplifying the input-output mapping for learning. Second, we present a neural Transformer architecture with strict E(3)-symmetry and high non-linear expressiveness for Hamiltonian prediction. Third, we propose a novel training objective to ensure the accuracy of Hamiltonians in both real space and reciprocal space, preventing error amplification and the occurrence of "ghost states" caused by the large condition number of the overlap matrix. On the dataset side, we curate a high-quality, broad-coverage, large benchmark, namely Materials-HAM-SOC, comprising 17,000 material structures spanning 68 elements from six rows of the periodic table and explicitly incorporating SOC effects. Experimental results on Materials-HAM-SOC demonstrate that NextHAM achieves excellent accuracy and efficiency in predicting Hamiltonians and band structures.
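The correction-learning idea in the abstract (predict only the difference between an initial estimate and the target) can be illustrated with a toy sketch. This is not NextHAM: the random arrays standing in for zeroth-step Hamiltonians, the linear correction, the ridge regressor, and every name below are assumptions made purely for illustration.

```python
import numpy as np

def fit_ridge(X, Y, lam=1e-3):
    """Closed-form ridge regression: W = (X^T X + lam I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
n, d, m = 200, 8, 16              # samples, descriptor size, flattened matrix size

X = rng.normal(size=(n, d))       # hypothetical structure descriptors
H0 = rng.normal(size=(n, m))      # toy stand-in for zeroth-step Hamiltonians
A = rng.normal(size=(d, m))
H = H0 + 0.1 * X @ A              # toy target = initial estimate + small correction

# Correction learning: regress the residual H - H0, then add H0 back.
W_res = fit_ridge(X, H - H0)
H_pred = H0 + X @ W_res
err_residual = np.max(np.abs(H_pred - H))

# Baseline: regress the full target from the same descriptors.
W_dir = fit_ridge(X, H)
err_direct = np.max(np.abs(X @ W_dir - H))

print(f"residual fit error: {err_residual:.2e}, direct fit error: {err_direct:.2e}")
```

In this toy setting the regressor only has to capture the small correction term, so the residual fit is far more accurate than the direct fit, which cannot explain the part of the target carried by the initial estimate.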


2509.18487 2026-03-03 cs.CL

Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference

Ben Finkelshtein, Silviu Cucerzan, Sujay Kumar Jauhar, Ryen White

2509.17050 2026-03-03 cs.CV

Geodesic Prototype Matching via Diffusion Maps for Interpretable Fine-Grained Recognition

Junhao Jia, Yunyou Liu, Yifei Sun, Huangwei Chen, Feiwei Qin, Changmiao Wang, Yong Peng

Comments The paper has been accepted by ICASSP 2026

2509.16557 2026-03-03 cs.CV cs.ET cs.HC cs.LG

Person Identification from Egocentric Human-Object Interactions using 3D Hand Pose

Muhammad Hamza, Danish Hamid, Muhammad Tahir Akram

Comments 21 pages, 8 figures, 7 tables. Preprint of a manuscript to appear in CCF Trans. Pervasive Comp. Interact. (2026)

2509.15888 2026-03-03 cs.CL cs.AI

Distribution-Aligned Decoding for Efficient LLM Task Adaptation

Senkang Hu, Xudong Han, Jinqi Jiang, Yihang Tao, Zihan Fang, Yong Dai, Sam Tak Wu Kwong, Yuguang Fang

Comments Accepted by NeurIPS'25

2509.14965 2026-03-03 cs.CV

Brain-HGCN: A Hyperbolic Graph Convolutional Network for Brain Functional Network Analysis

Junhao Jia, Yunyou Liu, Cheng Yang, Yifei Sun, Feiwei Qin, Changmiao Wang, Yong Peng

Comments The paper has been accepted by ICASSP 2026

2509.13789 2026-03-03 cs.CV cs.AI

BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching

Hanshuai Cui, Zhiqing Tang, Zhifei Xu, Zhi Yao, Wenyi Zeng, Weijia Jia

2509.12813 2026-03-03 cs.RO cs.SY eess.SY

Bridging Perception and Planning: Towards End-to-End Planning for Signal Temporal Logic Tasks

Bowen Ye, Junyue Huang, Yang Liu, Xiaozhen Qiao, Xiang Yin

2509.12282 2026-03-03 cs.AI cs.LG

AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science

Sasi Kiran Gaddipati, Farhana Keya, Gollam Rabby, Sören Auer

2509.11617 2026-03-03 cs.RO

AssemMate: Graph-Based LLM for Robotic Assembly Assistance

Qi Zheng, Chaoran Zhang, Zijian Liang, EnTe Lin, Shubo Cui, Qinghongbing Xie, Zhaobo Xu, Long Zeng

2509.10980 2026-03-03 cs.CV

TrueSkin: Towards Fair and Accurate Skin Tone Recognition and Generation

Haoming Lu

Comments The dataset is available for download at https://drive.google.com/file/d/1_ndw5uyY4h4DLL5iGTL4bVDKdE_g_H4B/view?usp=sharing

2509.08628 2026-03-03 cs.CV

LADB: Latent Aligned Diffusion Bridges for Semi-Supervised Domain Translation

Xuqin Wang, Tao Wu, Yanfeng Zhang, Lu Liu, Dong Wang, Mingwei Sun, Yongliang Wang, Niclas Zeller, Daniel Cremers

Journal ref Pattern Recognition. DAGM GCPR 2025

2509.07413 2026-03-03 cs.RO

DA-VPC: Disturbance-Aware Visual Predictive Control Scheme of Docking Maneuvers for Autonomous Trolley Collection

Yuhan Pang, Bingyi Xia, Zhe Zhang, Zhirui Sun, Peijia Xie, Bike Zhu, Wenjun Xu, Jiankun Wang

2509.04932 2026-03-03 cs.CV

UniView: Enhancing Novel View Synthesis From A Single Image By Unifying Reference Features

Haowang Cui, Rui Chen, Jiaze Wang, Tao Guo, Zheng Qin

2509.03516 2026-03-03 cs.CV

Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?

Ouxiang Li, Yuan Wang, Xinting Hu, Huijuan Huang, Rui Chen, Jiarong Ou, Xin Tao, Pengfei Wan, Xiaojuan Qi, Fuli Feng

Comments Accepted to ICLR 2026. Project Page: https://t2i-corebench.github.io/

2509.03214 2026-03-03 cs.CV

RTGMFF: Enhanced fMRI-based Brain Disorder Diagnosis via ROI-driven Text Generation and Multimodal Feature Fusion

Junhao Jia, Yifei Sun, Yunyou Liu, Cheng Yang, Changmiao Wang, Feiwei Qin, Yong Peng, Wenwen Min

Comments The paper has been accepted by BIBM 2025

Journal ref 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2025, pp. 2301-2308

2508.19754 2026-03-03 cs.CV

FastAvatar: Towards Unified and Fast 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers

Yue Wu, Xuanhong Chen, Yufan Wu, Wen Li, Yuxi Lu, Kairui Feng

2508.18672 2026-03-03 cs.LG cs.AI cs.CL

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

Taishi Nakamura, Satoki Ishikawa, Masaki Kawamura, Takumi Okamoto, Daisuke Nohara, Jun Suzuki, Rio Yokota

Comments Accepted as an oral at ICLR 2026

2508.12811 2026-03-03 cs.CV cs.AI cs.LG

Next Visual Granularity Generation

Yikai Wang, Zhouxia Wang, Zhonghua Wu, Qingyi Tao, Kang Liao, Chen Change Loy

Comments ICLR 2026

2508.11999 2026-03-03 cs.CV cs.AI cs.IR cs.LG

MOON: Generative MLLM-based Multimodal Representation Learning for E-commerce Product Understanding

Daoze Zhang, Chenghan Fu, Zhanheng Nie, Jianyu Liu, Wanxian Guan, Yuan Gao, Jun Song, Pengjie Wang, Jian Xu, Bo Zheng

Comments Accepted by WSDM 2026 (oral). 11 pages, 9 figures

Abstract

With the rapid advancement of e-commerce, exploring general representations rather than task-specific ones has attracted increasing research attention. For product understanding, although existing discriminative dual-flow architectures drive progress in this field, they inherently struggle to model the many-to-one alignment between multiple images and texts of products. Therefore, we argue that generative Multimodal Large Language Models (MLLMs) hold significant potential for improving product representation learning. Nevertheless, achieving this goal remains non-trivial due to several key challenges: the lack of multimodal and aspect-aware modeling modules in typical LLMs; the common presence of background noise in product images; and the absence of a standard benchmark for evaluation. To address these issues, we propose the first generative MLLM-based model named MOON for product representation learning. Our method (1) employs a guided Mixture-of-Experts (MoE) module for targeted modeling of multimodal and aspect-specific product content; (2) effectively detects core semantic regions in product images to mitigate the distraction and interference caused by background noise; and (3) introduces a specialized negative sampling strategy to increase the difficulty and diversity of negative samples. In addition, we release a large-scale multimodal benchmark MBE for various product understanding tasks. Experimentally, our model demonstrates competitive zero-shot performance on both our benchmark and the public dataset, showcasing strong generalization across various downstream tasks, including cross-modal retrieval, product classification, and attribute prediction. Furthermore, the case study and visualization illustrate the effectiveness of MOON for product understanding. The MBE benchmark data is available at https://huggingface.co/datasets/Daoze/MM-Bench-E-Commerce.


2508.11484 2026-03-03 cs.CV

CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models

Xiaoxue Wu, Bingjie Gao, Yu Qiao, Yaohui Wang, Xinyuan Chen

Comments ICLR 2026 Accept; Project Page: https://uknowsth.github.io/CineTrans/

2508.11428 2026-03-03 cs.CV

ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving

Jingyu Li, Bozhou Zhang, Xin Jin, Jiankang Deng, Xiatian Zhu, Li Zhang

Comments Accepted for publication in 2026 IEEE International Conference on Robotics and Automation (ICRA)

2508.09003 2026-03-03 cs.RO cs.SY eess.SY

Large Scale Robotic Material Handling: Learning, Planning, and Control

Filippo A. Spinelli, Yifan Zhai, Fang Nan, Pascal Egli, Julian Nubert, Thilo Bleumer, Lukas Miller, Ferdinand Hofmann, Marco Hutter

Comments Final version published in IEEE Transactions on Field Robotics. It includes additional experiments and comparisons with classical methods

2508.03926 2026-03-03 cs.LG cs.NA math.DS math.NA

Next Generation Equation-Free Multiscale Modelling of Crowd Dynamics via Machine Learning

Hector Vargas Alvarez, Dimitrios G. Patsatzis, Lucia Russo, Ioannis Kevrekidis, Constantinos Siettos

Comments 43 pages (16 pages of Appendix), 16 figures (11 in Appendix)

Abstract

Bridging the microscopic and macroscopic modelling scales in crowd dynamics constitutes an open challenge for systematic numerical analysis, optimization, and control. Here, we propose a manifold-informed machine learning approach to learn the discrete evolution operator for the emergent/collective crowd dynamics in latent spaces from high-fidelity individual/agent-based simulations. The proposed framework is a four-stage one, explicitly conserving the mass of the reconstructed dynamics in the high-dimensional space. In the first step, we derive continuous macroscopic fields (densities) from discrete microscopic data (pedestrians' positions) using Kernel Density Estimation. In the second step, we construct a map from the density-field space into an appropriate latent space parametrized by a few coordinates based on Proper-Orthogonal Decomposition (POD) of the corresponding density distributions. The third step involves learning reduced-order surrogate models in the latent space using machine learning techniques, particularly Long Short-Term Memory networks and Multivariate Autoregressive models. Finally, we reconstruct the crowd dynamics in the high-dimensional space with POD, demonstrating that the POD reconstruction conserves the mass. Thus, with this "embed -> learn in latent space -> lift back to the high-dimensional space" pipeline, we create an effective solution operator of the unavailable (at the macroscopic scale) PDE for the evolution of the density distribution. For our illustrations, we used the Social Force Model to generate data in a corridor with an obstacle, imposing periodic boundary conditions in two scenarios: (i) a unidirectional flow, and (ii) a counterflow. The numerical results demonstrate high accuracy, robustness, and generalizability, thus allowing for fast and accurate modelling/simulation of crowd dynamics from agent-based simulations.
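The four stages described in the abstract can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: the 1D periodic corridor, the drifting random walkers (used in place of the Social Force Model), the grid-normalised Gaussian kernel, the mean-centred POD, the first-order autoregressive surrogate, and all names and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy microscopic data: walkers drifting along a periodic 1D corridor.
n_ped, n_steps, length = 300, 60, 10.0
grid = np.linspace(0.0, length, 100, endpoint=False)
dx = grid[1] - grid[0]
pos = rng.normal(3.0, 0.8, size=n_ped) % length

def kde_density(x, grid, length, h=0.4):
    """Step 1: Gaussian KDE with periodic distance, normalised to unit mass on the grid."""
    d = np.abs(grid[:, None] - x[None, :])
    d = np.minimum(d, length - d)
    rho = np.exp(-0.5 * (d / h) ** 2).sum(axis=1)
    return rho / (rho.sum() * dx)

snapshots = []
for _ in range(n_steps):
    snapshots.append(kde_density(pos, grid, length))
    pos = (pos + 0.1 + 0.02 * rng.normal(size=n_ped)) % length
S = np.array(snapshots).T                      # shape (n_grid, n_steps)

# Step 2: POD of the mean-centred snapshots via SVD, keeping r modes.
mean = S.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(S - mean, full_matrices=False)
r = 5
U_r = U[:, :r]
Z = U_r.T @ (S - mean)                         # latent coordinates, shape (r, n_steps)

# Step 3: first-order autoregressive surrogate z_{k+1} ~ A z_k, fitted by least squares.
A = np.linalg.lstsq(Z[:, :-1].T, Z[:, 1:].T, rcond=None)[0].T

# Step 4: roll out in latent space, then lift back with the POD basis.
z = Z[:, 0]
rollout = [z]
for _ in range(n_steps - 1):
    z = A @ z
    rollout.append(z)
rec = mean + U_r @ np.array(rollout).T         # reconstructed densities

mass = rec.sum(axis=0) * dx                    # discrete mass of each reconstructed field
print("max mass deviation:", np.abs(mass - 1.0).max())
```

In this sketch mass conservation follows from the centring: every snapshot integrates to one, so the centred snapshots and the POD modes they span integrate to zero, and any latent trajectory lifts to a field with the mass of the mean. The sketch makes no claim about the accuracy of the rollout itself.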


2507.20034 2026-03-03 cs.RO cs.CV

Digital and Robotic Twinning for Validation of Proximity Operations and Formation Flying

Z. Ahmed, E. Bates, P. Francesch Huc, S. Y. W. Low, A. Golan, T. Bell, A. Rizza, S. D'Amico

Journal ref 2026 Rocky Mountain AAS GN&C Conference, Breckenridge, Colorado

2507.15852 2026-03-03 cs.CV cs.AI

Advancing Complex Video Object Segmentation via Progressive Concept Construction

Zhixiong Zhang, Shuangrui Ding, Xiaoyi Dong, Songxin He, Jianfan Lin, Junsong Tang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang

Comments project page: https://rookiexiong7.github.io/projects/SeC/ ; code: https://github.com/OpenIXCLab/SeC ; dataset: https://huggingface.co/datasets/OpenIXCLab/SeCVOS

2507.14894 2026-03-03 cs.CL

SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs

Boyi Deng, Yu Wan, Baosong Yang, Fei Huang, Wenjie Wang, Fuli Feng

Comments ICLR 2026

2507.08776 2026-03-03 cs.CV

CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering

Zhengqing Wang, Yuefan Wu, Jiacheng Chen, Fuyang Zhang, Yasutaka Furukawa

Comments Project page: https://clift-nvs.github.io
