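arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.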

2510.21769 2026-02-12 cs.CV

H2OFlow: Grounding Human-Object Affordances with 3D Generative Models and Dense Diffused Flows

Harry Zhang, Luca Carlone

2510.21584 2026-02-12 cs.CL

Automated Quality Control for Language Documentation: Detecting Phonotactic Inconsistencies in a Kokborok Wordlist

Kellen Parker van Dam, Abishek Stephen

Comments 7 pages, 3 tables, accepted to Workshop on NLP Applications to Field Linguistics at EACL 2026

2510.17247 2026-02-12 cs.CL cs.CV

From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models

Zefan Cai, Haoyi Qiu, Haozhe Zhao, Ke Wan, Jiachen Li, Jiuxiang Gu, Wen Xiao, Nanyun Peng, Junjie Hu

Comments TMLR

2510.15044 2026-02-12 cs.LG quant-ph

IQNN-CS: Interpretable Quantum Neural Network for Credit Scoring

Abdul Samad Khan, Nouhaila Innan, Aeysha Khalique, Muhammad Shafique

Comments Accepted for oral presentation at QUEST-IS'25. To appear in Springer proceedings

Journal ref International Conference on Quantum Engineering Sciences and Technologies for Industry and Services 2025

2510.14331 2026-02-12 cs.LG

LLM Priors for ERM over Programs

Shivam Singhal, Priyadarsi Mishra, Eran Malach, Tomer Galanti

2510.11462 2026-02-12 cs.AI

Unifying Deductive and Abductive Reasoning in Knowledge Graphs with Masked Diffusion Model

Yisen Gao, Jiaxin Bai, Yi Huang, Xingcheng Fu, Qingyun Sun, Yangqiu Song

Comments Accepted by The Web Conference 2026

2510.08554 2026-02-12 cs.LG stat.ML

Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization

Kevin Rojas, Jiahe Lin, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, Molei Tao, Wei Deng

2510.00309 2026-02-12 cs.LG stat.ML

Lipschitz Bandits with Stochastic Delayed Feedback

Zhongxuan Liu, Yue Kang, Thomas C. M. Lee

Comments The Fourteenth International Conference on Learning Representations (ICLR 2026)

2509.23050 2026-02-12 cs.LG cs.AI

Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding

Lin Long, Changdae Oh, Seongheon Park, Sharon Li

Comments ICLR 2026

2509.23049 2026-02-12 cs.LG cs.AI cs.DC

Beyond Aggregation: Guiding Clients in Heterogeneous Federated Learning

Zijian Wang, Xiaofei Zhang, Xin Zhang, Yukun Liu, Qiong Zhang

2509.22214 2026-02-12 cs.LG

A Law of Data Reconstruction for Random Features (and Beyond)

Leonardo Iurada, Simone Bombari, Tatiana Tommasi, Marco Mondelli

Comments Accepted ICLR 2026 - Code at https://github.com/iurada/data-reconstruction-law

2509.21916 2026-02-12 cs.CV

Enhancing Vehicle Detection under Adverse Weather Conditions with Contrastive Learning

Boying Li, Chang Liu, Petter Kyösti, Mattias Öhman, Devashish Singha Roy, Sofia Plazzi, Hamam Mokayed, Olle Hagner

2509.16944 2026-02-12 cs.CV

Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception

Yuheng Shi, Xiaohuan Pei, Minjing Dong, Chang Xu

Comments 20 pages, 6 figures

Abstract

Multimodal Large Language Models (MLLMs) require high-resolution visual information to perform fine-grained perception, yet processing entire high-resolution images is computationally prohibitive. While recent methods leverage a Region-of-Interest (RoI) mechanism to focus on salient areas, they typically present a difficult trade-off: training-based approaches depend on large-scale annotated datasets, while training-free methods that utilize the model's internal attention are computationally inefficient and less accurate, requiring either multi-pass prefill stages or reliance on the slow auto-regressive decoding process. In this paper, we propose an efficient, annotation-free Self-Distilled Region Proposal Network (SD-RPN) that resolves this trade-off. The SD-RPN is built around a pipeline that transforms the noisy attention maps from the MLLM's middle layers into high-quality pseudo-RoI labels by explicitly denoising the signal and resolving ambiguity. We use these labels to train a lightweight Region Proposal Network (RPN) that learns a more precise localization. This RPN is also highly efficient, predicting the RoI in a single forward pass using features from the MLLM's middle layers, decoupling RoI identification from the auto-regressive generation and avoiding costly multi-pass operations. To validate our approach, we integrate the framework into multiple MLLM families. Despite being trained on only a few (e.g. 10K) question-answer pairs, our method demonstrates exceptional data efficiency and generalization, achieving over a 10% absolute accuracy improvement on unseen benchmarks, including TextVQA, DocVQA, and V-Star. Our work presents a practical and scalable solution for enhancing the fine-grained perception of MLLMs without requiring costly supervision or full model fine-tuning. Code is available at https://github.com/YuHengsss/SD-RPN.

2509.16871 2026-02-12 cs.RO

HOGraspFlow: Taxonomy-Aware Hand-Object Retargeting for Multi-Modal SE(3) Grasp Generation

Yitian Shi, Zicheng Guo, Rosa Wolf, Edgar Welte, Rania Rayyes

Comments Accepted to ICRA 2026

2509.14671 2026-02-12 cs.CL cs.AI cs.LG

TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding

Xiaobo Xing, Wei Yuan, Tong Chen, Quoc Viet Hung Nguyen, Xiangliang Zhang, Hongzhi Yin

Comments Accepted to ICLR 2026. 26 pages, 11 figures

2509.09893 2026-02-12 cs.RO cs.AI

Self-Augmented Robot Trajectory: Efficient Imitation Learning via Safe Self-augmentation with Demonstrator-annotated Precision

Hanbit Oh, Masaki Murooka, Tomohiro Motoda, Ryoichi Nakajo, Yukiyasu Domae

Comments 21 pages, 10 figures, Advanced Robotics accepted 2026.02.03

2509.09679 2026-02-12 cs.LG cs.AI cs.CL

ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms

Bingxin Xu, Zhen Dong, Oussama Elachqar, Yuzhang Shang

Comments Replace discrete Hadamard transforms with continuous Butterfly transforms to facilitate the learning of rotation matrices in LLM quantization

2509.05515 2026-02-12 cs.CV

Visibility-Aware Language Aggregation for Open-Vocabulary Segmentation in 3D Gaussian Splatting

Sen Wang, Kunyi Li, Siyun Liang, Elena Alegret, Jing Ma, Nassir Navab, Stefano Gasperini

Comments Project page: https://vala3d.github.io

2509.04821 2026-02-12 cs.CL

AFD-SLU: Adaptive Feature Distillation for Spoken Language Understanding

Yan Xie, Yibo Cui, Liang Xie, Erwei Yin

Comments Accepted to IEEE ICASSP 2026

2509.04345 2026-02-12 cs.SD cs.AI cs.LG

AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds

Qizhou Wang, Hanxun Huang, Guansong Pang, Sarah Erfani, Christopher Leckie

2508.16929 2026-02-12 cs.LG cs.CL

Dimensional Collapse in Transformer Attention Outputs: A Challenge for Sparse Dictionary Learning

Junxuan Wang, Xuyang Ge, Wentao Shu, Zhengfu He, Xipeng Qiu

Comments 27 pages, 16 figures

2508.09210 2026-02-12 cs.CV cs.AI

MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models

Fan Zhang, Zebang Cheng, Chong Deng, Haoxuan Li, Zheng Lian, Qian Chen, Huadai Liu, Wen Wang, Yi-Fan Zhang, Renrui Zhang, Ziyu Guo, Zhihong Zhu, Hao Wu, Haixin Wang, Yefeng Zheng, Xiaojiang Peng, Xian Wu, Kun Wang, Xiangang Li, Jieping Ye, Pheng-Ann Heng

2508.02882 2026-02-12 cs.LG cs.NA math.NA

Deep Network Trainability via Persistent Subspace Orthogonality

Alex Massucco, Davide Murari, Carola-Bibiane Schönlieb

2507.15975 2026-02-12 cs.RO

Fast Task Planning with Neuro-Symbolic Relaxation

Qiwei Du, Bowen Li, Yi Du, Shaoshu Su, Taimeng Fu, Zitong Zhan, Zhipeng Zhao, Chen Wang

Comments 8 pages, 6 figures

Journal ref IEEE Robotics and Automation Letters (RA-L), 2026

2507.06968 2026-02-12 cs.AI cs.CL

Scaling Towards the Information Boundary of Instruction Sets: The Infinity Instruct Subject Technical Report

Li Du, Hanyu Zhao, Yiming Ju, Tengfei Pan

2506.17667 2026-02-12 cs.AI

PhysUniBench: A Multi-Modal Physics Reasoning Benchmark at Undergraduate Level

Lintao Wang, Encheng Su, Jiaqi Liu, Pengze Li, Jiabei Xiao, Wenlong Zhang, Xinnan Dai, Xi Chen, Yuan Meng, Lei Bai, Wanli Ouyang, Shixiang Tang, Aoran Wang, Xinzhu Ma

2506.12365 2026-02-12 cs.CL cs.DB

Advances in LLMs with Focus on Reasoning, Adaptability, Efficiency and Ethics

Asifullah Khan, Muhammad Zaeem Khan, Aleesha Zainab, Saleha Jamshed, Sadia Ahmad, Kaynat Khatib, Faria Bibi, Abdul Rehman

2506.07304 2026-02-12 cs.CV

FANVID: A Benchmark for Face and License Plate Recognition in Low-Resolution Videos

Kavitha Viswanathan, Vrinda Goel, Shlesh Gholap, Devayan Ghosh, Madhav Gupta, Dhruvi Ganatra, Sanket Potdar, Amit Sethi

2506.03956 2026-02-12 cs.LG cs.CV

Adapt before Continual Learning

Aojun Lu, Tao Feng, Hangjie Yuan, Chunhui Ding, Yanan Sun

Comments Accepted to AAAI2026

2506.00131 2026-02-12 cs.LG cs.AI

Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization

Simon Sinong Zhan, Qingyuan Wu, Philip Wang, Frank Yang, Xiangyu Shi, Chao Huang, Qi Zhu
