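arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.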



2104.12269 2026-01-28 cs.CL cs.AI cs.IR

A Bi-Encoder LSTM Model For Learning Unstructured Dialogs

Danny Brahman, Pooran S. Negi, Mohammad Mahoor

2601.19899 2026-01-28 cs.CL

Evaluation of Oncotimia: An LLM based system for supporting tumour boards

Luis Lorenzo, Marcos Montana-Mendez, Sergio Figueiras, Miguel Boubeta, Cristobal Bernardo-Castineira

Comments 9 pages, 2 figures

2601.19898 2026-01-28 cs.CV

DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding

Shubham Patle, Sara Ghaboura, Hania Tariq, Mohammad Usman Khan, Omkar Thawakar, Rao Muhammad Anwer, Salman Khan

Comments Accepted to EACL-2026 (Main Track)

2601.19897 2026-01-28 cs.LG

Self-Distillation Enables Continual Learning

Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal

2601.19884 2026-01-28 cs.CV cs.LG

SONIC: Spectral Oriented Neural Invariant Convolutions

Gijs Joppe Moens, Regina Beets-Tan, Eduardo H. P. Pooch

Comments 10 pages, 4 figures. Accepted at ICLR 2026

2601.19871 2026-01-28 cs.CL

Reflective Translation: Improving Low-Resource Machine Translation via Structured Self-Reflection

Nicholas Cheng

Comments 12 pages, 3 figures, 6 tables. Accepted to the NeurIPS 2025 Workshop on Multilingual Representation Learning (Mexico City) and the AAAI 2025 Workshop on Language Models for Under-Resourced Communities (LM4UC). Code and data available at: https://github.com/Nickcheng123/reflective-translation-mt

2601.19867 2026-01-28 cs.LG

Bandits in Flux: Adversarial Constraints in Dynamic Environments

Tareq Si Salem

Comments Accepted to AISTATS 2026

2601.19862 2026-01-28 cs.LG cs.GT

Calibration without Ground Truth

Yuqing Kong, Mingyu Song, Yizhou Wang, Yifan Wu

2601.19856 2026-01-28 cs.RO

Estimating Trust in Human-Robot Collaboration through Behavioral Indicators and Explainability

Giulio Campagna, Marta Lagomarsino, Marta Lorenzini, Dimitrios Chrysostomou, Matthias Rehm, Arash Ajoudani

Journal ref IEEE Robotics and Automation Letters (Volume: 10, Issue: 10, October 2025)

2601.19850 2026-01-28 cs.CV

EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning

Binzhu Xie, Shi Qiu, Sicheng Zhang, Yinqiao Wang, Hao Xu, Muzammal Naseer, Chi-Wing Fu, Pheng-Ann Heng

Comments Accepted in ICLR 2026, Codebase: https://github.com/Nicous20/EgoHandICL

2601.19849 2026-01-28 cs.CV

HexFormer: Hyperbolic Vision Transformer with Exponential Map Aggregation

Haya Alyoussef, Ahmad Bdeir, Diego Coello de Portugal Mecke, Tom Hanika, Niels Landwehr, Lars Schmidt-Thieme

2601.19847 2026-01-28 cs.CL

Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering

Fangan Dong, Zuming Yan, Xuri Ge, Zhiwei Xu, Mengqi Zhang, Xuanang Chen, Ben He, Xin Xin, Zhumin Chen, Ying Zhou

2601.19839 2026-01-28 cs.RO cs.AI cs.HC

HARMONI: Multimodal Personalization of Multi-User Human-Robot Interactions with LLMs

Jeanne Malécot, Hamed Rahimi, Jeanne Cattoni, Marie Samson, Mouad Abrini, Mahdi Khoramshahi, Maribel Pino, Mohamed Chetouani

2601.19834 2026-01-28 cs.AI

Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models

Jialong Wu, Xiaoying Zhang, Hongyi Yuan, Xiangcheng Zhang, Tianhao Huang, Changjing He, Chaoyi Deng, Renrui Zhang, Youbin Wu, Mingsheng Long

Comments Project page: https://thuml.github.io/Reasoning-Visual-World

Abstract

Humans construct internal world models and reason by manipulating the concepts within these models. Recent advances in AI, particularly chain-of-thought (CoT) reasoning, approximate such human cognitive abilities, where world models are believed to be embedded within large language models. Expert-level performance in formal and abstract domains such as mathematics and programming has been achieved in current systems by relying predominantly on verbal reasoning. However, they still lag far behind humans in domains like physical and spatial intelligence, which require richer representations and prior knowledge. The emergence of unified multimodal models (UMMs) capable of both verbal and visual generation has therefore sparked interest in more human-like reasoning grounded in complementary multimodal pathways, though their benefits remain unclear. From a world-model perspective, this paper presents the first principled study of when and how visual generation benefits reasoning. Our key position is the visual superiority hypothesis: for certain tasks--particularly those grounded in the physical world--visual generation more naturally serves as world models, whereas purely verbal world models encounter bottlenecks arising from representational limitations or insufficient prior knowledge. Theoretically, we formalize internal world modeling as a core component of CoT reasoning and analyze distinctions among different forms of world models. Empirically, we identify tasks that necessitate interleaved visual-verbal CoT reasoning, constructing a new evaluation suite, VisWorld-Eval. Controlled experiments on a state-of-the-art UMM show that interleaved CoT significantly outperforms purely verbal CoT on tasks that favor visual world modeling, but offers no clear advantage otherwise. Together, this work clarifies the potential of multimodal world modeling for more powerful, human-like multimodal AI.


2601.19833 2026-01-28 cs.LG

A Multi-directional Meta-Learning Framework for Class-Generalizable Anomaly Detection

Padmaksha Roy, Lamine Mili, Almuatazbellah Boker

2601.19832 2026-01-28 cs.RO

Information-Theoretic Detection of Bimanual Interactions for Dual-Arm Robot Plan Generation

Elena Merlo, Marta Lagomarsino, Arash Ajoudani

Journal ref in IEEE Robotics and Automation Letters, vol. 10, no. 5, pp. 4532-4539, May 2025

2601.19826 2026-01-28 cs.RO cs.HC

Whether We Care, How We Reason: The Dual Role of Anthropomorphism and Moral Foundations in Robot Abuse

Fan Yang, Renkai Ma, Yaxin Hu, Lingyao Li

2601.19825 2026-01-28 cs.AI cs.DB

Routing End User Queries to Enterprise Databases

Saikrishna Sudarshan, Tanay Kulkarni, Manasi Patwardhan, Lovekesh Vig, Ashwin Srinivasan, Tanmay Tulsidas Verlekar

Comments 6 pages, 2 figures

2601.19824 2026-01-28 cs.AI cs.HC cs.IR cs.SI

An Interpretable Recommendation Model for Psychometric Data, With an Application to Gerontological Primary Care

Andre Paulino de Lima, Paula Castro, Suzana Carvalho Vaz de Andrade, Rosa Maria Marcucci, Ruth Caldeira de Melo, Marcelo Garcia Manzato

Comments 81 pages, 19 figures, 3 annexes

2601.19818 2026-01-28 cs.LG cs.NA math.NA

Learn and Verify: A Framework for Rigorous Verification of Physics-Informed Neural Networks

Kazuaki Tanaka, Kohei Yatabe

Comments 13 pages, 10 figures

2601.19798 2026-01-28 cs.CV

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Zhixiang Wei, Yi Li, Zhehan Kan, Xinghua Jiang, Zuwei Long, Shifeng Liu, Hongze Shen, Wei Liu, Xiaoyu Tan, Haojia Lin, Yubo Zhu, Qianyu Li, Di Yin, Haoyu Cao, Weibo Gu, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Yunsheng Wu, Mingkong Tang, Shuangyin Liu, Lexiang Tang, Haodong Lin, Junru Lu, Jiarui Qin, Lingfeng Qiao, Ruizhi Qiao, Bo Ke, Jianfeng He, Ke Li, Yangning Li, Yunhang Shen, Mengdan Zhang, Peixian Chen, Kun Yin, Bing Liu, Yunfei Wu, Huang Chen, Zhongpeng Cai, Xiaotian Li

2601.19795 2026-01-28 cs.CV

Diffusion for De-Occlusion: Accessory-Aware Diffusion Inpainting for Robust Ear Biometric Recognition

Deeksha Arun, Kevin W. Bowyer, Patrick Flynn

2601.19794 2026-01-28 cs.LG cs.SY eess.SY

Component-Aware Pruning Framework for Neural Network Controllers via Gradient-Based Importance Estimation

Ganesh Sundaram, Jonas Ulmen, Daniel Görges

Comments 8 pages, Submitted to the 2026 IFAC World Congress

2601.19793 2026-01-28 cs.AI

CASTER: Breaking the Cost-Performance Barrier in Multi-Agent Orchestration via Context-Aware Strategy for Task Efficient Routing

Shanyv Liu, Xuyang Yuan, Tao Chen, Zijun Zhan, Zhu Han, Danyang Zheng, Weishan Zhang, Shaohua Cao

2601.19788 2026-01-28 cs.LG cs.DC

Knowledge-Aware Evolution for Streaming Federated Continual Learning with Category Overlap and without Task Identifiers

Sixing Tan, Xianmin Liu

2601.19781 2026-01-28 cs.SD

Phonological Tokenizer: Prosody-Aware Phonetic Token via Multi-Objective Fine-Tuning with Differentiable K-Means

Kentaro Onda, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe

Comments Accepted to ICASSP 2026

2601.19773 2026-01-28 cs.CL

Strong Reasoning Isn't Enough: Evaluating Evidence Elicitation in Interactive Diagnosis

Zhuohan Long, Zhijie Bao, Zhongyu Wei

2601.19771 2026-01-28 cs.CV

PaW-ViT: A Patch-based Warping Vision Transformer for Robust Ear Verification

Deeksha Arun, Kevin W. Bowyer, Patrick Flynn

2601.19767 2026-01-28 cs.SD

Advanced Modeling of Interlanguage Speech Intelligibility Benefit with L1-L2 Multi-Task Learning Using Differentiable K-Means for Accent-Robust Discrete Token-Based ASR

Kentaro Onda, Satoru Fukayama, Daisuke Saito, Nobuaki Minematsu

Comments Accepted to ICASSP 2026

2601.19766 2026-01-28 cs.LG

The Effect of Architecture During Continual Learning

Allyson Hahn, Krishnan Raghavan

Abstract

Continual learning is a challenge for models with static architecture, because they fail to adapt when data distributions evolve across tasks. We introduce a mathematical framework that jointly models architecture and weights in a Sobolev space, enabling a rigorous investigation into the role of neural network architecture in continual learning and its effect on the forgetting loss. We derive necessary conditions for the continual learning solution and prove that learning only model weights is insufficient to mitigate catastrophic forgetting under distribution shifts. Consequently, we prove that by learning the architecture and weights simultaneously at each task, we can reduce catastrophic forgetting. To learn weights and architecture simultaneously, we formulate continual learning as a bilevel optimization problem: the upper level selects an optimal architecture for a given task, while the lower level computes optimal weights via dynamic programming over all tasks. To solve the upper-level problem, we introduce a derivative-free direct search algorithm to determine the optimal architecture. Once it is found, we must transfer knowledge from the current architecture to the optimal one. However, the optimal architecture will have a weight parameter space different from that of the current architecture (i.e., the dimensions of the weight matrices will not match). To bridge the dimensionality gap, we develop a low-rank transfer mechanism to map knowledge across architectures of mismatched dimensions. Empirical studies across regression and classification problems, including feedforward, convolutional, and graph neural networks, demonstrate that learning the optimal architecture and weights simultaneously yields substantially improved performance (up to two orders of magnitude), reduced forgetting, and enhanced robustness to noise compared with static architecture approaches.
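
To make the upper-level step concrete, here is a minimal sketch of a derivative-free direct search over a single architecture parameter. The abstract does not specify the authors' algorithm, so this is a generic compass-style search, not their method. The names `direct_search_width` and `toy_loss`, the choice of hidden width as the searched parameter, and the quadratic loss are all hypothetical stand-ins; in a real bilevel setup the loss would come from solving the lower-level weight problem for each candidate architecture.

```python
from typing import Callable, Tuple


def direct_search_width(
    loss: Callable[[int], float],
    start: int,
    step: int = 8,
    min_width: int = 1,
    max_width: int = 512,
) -> Tuple[int, float]:
    """Compass-style direct search over one integer architecture parameter.

    Only loss values are compared; no gradients are used.
    """
    best, best_loss = start, loss(start)
    while step >= 1:
        improved = False
        # Poll the two neighbours at the current step size.
        for cand in (best - step, best + step):
            if min_width <= cand <= max_width:
                cand_loss = loss(cand)
                if cand_loss < best_loss:
                    best, best_loss = cand, cand_loss
                    improved = True
        # Shrink the step only when neither neighbour improved.
        if not improved:
            step //= 2
    return best, best_loss


def toy_loss(width: int) -> float:
    # Hypothetical stand-in for the lower-level result: minimised at width 37.
    return (width - 37) ** 2 / 100.0 + 0.5


if __name__ == "__main__":
    print(direct_search_width(toy_loss, start=16))
```

The search halves its step only after a failed poll, so it moves quickly while far from a minimum and refines near it. On a non-convex loss it would only find a local minimum.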

