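arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.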

2604.18584 2026-04-21 cs.AI cs.DL cs.IR cs.LG

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman, Antonio Torralba

Comments ICLR 2026; Website: http://mathnet.mit.edu

Journal ref: Proceedings of the International Conference on Learning Representations (ICLR), 2026

English abstract

Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems together with a benchmark for evaluating mathematical reasoning in generative models and mathematical retrieval in embedding-based systems. MathNet spans 47 countries, 17 languages, and two decades of competitions, comprising 30,676 expert-authored problems with solutions across diverse domains. In addition to the core dataset, we construct a retrieval benchmark consisting of mathematically equivalent and structurally similar problem pairs curated by human experts. MathNet supports three tasks: (i) Problem Solving, (ii) Math-Aware Retrieval, and (iii) Retrieval-Augmented Problem Solving. Experimental results show that even state-of-the-art reasoning models (78.4% for Gemini-3.1-Pro and 69.3% for GPT-5) remain challenged, while embedding models struggle to retrieve equivalent problems. We further show that retrieval-augmented generation performance is highly sensitive to retrieval quality; for example, DeepSeek-V3.2-Speciale achieves gains of up to 12%, obtaining the highest scores on the benchmark. MathNet provides the largest high-quality Olympiad dataset together with the first benchmark for evaluating mathematical problem retrieval, and we publicly release both the dataset and benchmark at https://mathnet.mit.edu.
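As a rough illustration of the Math-Aware Retrieval task, the sketch below scores an embedding model by how often the expert-paired equivalent problem appears among the top-k nearest neighbors. The abstract does not state which metric MathNet uses, so Recall@k, cosine similarity, and the names `cosine` and `recall_at_k` are assumptions made for this example only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recall_at_k(query_vecs, corpus_vecs, gold_ids, k):
    """Fraction of queries whose gold corpus index is ranked in the top k.

    gold_ids[i] is the (hypothetical) index of the problem that experts
    marked as equivalent to query i.
    """
    hits = 0
    for q, gold in zip(query_vecs, gold_ids):
        # Rank all corpus items by similarity to the query, best first.
        ranked = sorted(
            range(len(corpus_vecs)),
            key=lambda i: cosine(q, corpus_vecs[i]),
            reverse=True,
        )
        if gold in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)

# Toy data: two queries, three corpus problems, 2-D embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]
corpus = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
print(recall_at_k(queries, corpus, gold_ids=[0, 1], k=1))
```

In a real evaluation the vectors would come from the embedding model under test; the toy vectors here only exercise the ranking logic.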

2604.18583 2026-04-21 cs.CV

MUA: Mobile Ultra-detailed Animatable Avatars

Heming Zhu, Guoxing Sun, Marc Habermann

Comments Project page: https://vcai.mpi-inf.mpg.de/projects/MUA/

English abstract

Building photorealistic, animatable full-body digital humans remains a longstanding challenge in computer graphics and vision. Recent advances in animatable avatar modeling have largely progressed along two directions: improving the fidelity of dynamic geometry and appearance, or reducing computational complexity to enable deployment on resource-constrained platforms, e.g., VR headsets. However, existing approaches fail to achieve both goals simultaneously: ultra-high-fidelity avatars typically require substantial computation on server-class GPUs, whereas lightweight avatars often suffer from limited surface dynamics, reduced appearance details, and noticeable artifacts. To bridge this gap, we propose a novel animatable avatar representation, termed Wavelet-guided Multi-level Spatial Factorized Blendshapes, and a corresponding distillation pipeline that transfers motion-aware clothing dynamics and fine-grained appearance details from a pre-trained ultra-high-quality avatar model into a compact, efficient representation. By coupling multi-level wavelet spectral decomposition with low-rank structural factorization in texture space, our method achieves up to 2000X lower computational cost and a 10X smaller model size than the original high-quality teacher avatar model, while preserving visually plausible dynamics and appearance details that closely resemble those of the teacher model. Extensive comparisons with state-of-the-art methods show that our approach significantly outperforms existing avatar approaches designed for mobile settings and achieves comparable or superior rendering quality to most approaches that can only run on servers. Importantly, our representation substantially improves the practicality of high-fidelity avatars for immersive applications, achieving over 180 FPS on a desktop PC and real-time native on-device performance at 24 FPS on a standalone Meta Quest 3.

2604.18575 2026-04-21 cs.CV

ReCap: Lightweight Referential Grounding for Coherent Story Visualization

Aditya Arora, Akshita Gupta, Pau Rodriguez, Marcus Rohrbach

Comments Diffusion Models, Story Visualization

English abstract

Story Visualization aims to generate a sequence of images that faithfully depicts a textual narrative while preserving character identity, spatial configuration, and stylistic coherence as the narrative unfolds. Maintaining such cross-frame consistency has traditionally relied on explicit memory banks, architectural expansion, or auxiliary language models, resulting in substantial parameter growth and inference overhead. We introduce ReCap, a lightweight consistency framework that improves character stability and visual fidelity without modifying the base diffusion backbone. ReCap's CORE (COnditional frame REferencing) module treats anaphors, in our case pronouns, as visual anchors, activating only when characters are referred to by a pronoun and conditioning on the preceding frame to propagate visual identity. This selective design avoids unconditional cross-frame conditioning and introduces only 149K additional parameters, a fraction of the cost of memory-bank and LLM-augmented approaches. To further stabilize identity, we incorporate SemDrift (Guided Semantic Drift Correction), applied only during training. When text is vague or referential, the denoiser lacks a visual anchor for identity-defining attributes, causing character appearance to drift across frames. SemDrift corrects this by aligning denoiser representations with pretrained DINOv3 visual embeddings, enforcing semantic identity stability at zero inference cost. ReCap outperforms the previous state of the art, StoryGPT-V, on the two main benchmarks for story visualization by 2.63% Character-Accuracy on FlintstonesSV and by 5.65% on PororoSV, establishing a new state of the art in character consistency on both benchmarks. Furthermore, we extend story visualization to human-centric narratives derived from real films, demonstrating the capability of ReCap beyond stylized cartoon domains.
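The selective gating idea behind CORE can be pictured at the text level with the sketch below: a frame is marked for conditioning on its predecessor only when its caption contains a pronoun. This is not the authors' module, which operates inside a diffusion model; the pronoun list and the names `has_pronoun` and `reference_plan` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately small pronoun list; the paper's actual
# anaphor detection is not described in the abstract.
PRONOUNS = {"he", "she", "they", "it", "him", "her", "them", "his", "their", "its"}

def has_pronoun(caption):
    """Return True if any whole-word token of the caption is a pronoun."""
    tokens = re.findall(r"[a-z']+", caption.lower())
    return any(t in PRONOUNS for t in tokens)

def reference_plan(captions):
    """For each frame, give the index of the frame to condition on, or None.

    The first frame never has a predecessor, so it is always None.
    """
    plan = []
    for i, caption in enumerate(captions):
        if i > 0 and has_pronoun(caption):
            plan.append(i - 1)  # gate open: reuse the preceding frame
        else:
            plan.append(None)   # gate closed: rely on the text alone
    return plan

story = [
    "Fred walks into the kitchen.",
    "He opens the fridge.",
    "Wilma enters the room.",
]
print(reference_plan(story))
```

Only the second caption refers to a character by pronoun, so it is the only frame marked for cross-frame conditioning, which is the sense in which the design avoids unconditional conditioning.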

2604.18574 2026-04-21 cs.LG cs.AI

When Can LLMs Learn to Reason with Weak Supervision?

Salman Rahman, Jingyan Shen, Anna Mordvina, Hamid Palangi, Saadia Gabriel, Pavel Izmailov

2604.18573 2026-04-21 cs.CV

T-REN: Learning Text-Aligned Region Tokens Improves Dense Vision-Language Alignment and Scalability

Savya Khosla, Sethuraman T, Aryan Chadha, Alex Schwing, Derek Hoiem

2604.18567 2026-04-21 cs.LG cs.AI cs.CL

Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering

Manan Gupta, Dhruv Kumar

Comments Under Review

2604.18563 2026-04-21 cs.CL

Dual Alignment Between Language Model Layers and Human Sentence Processing

Tatsuki Kuribayashi, Alex Warstadt, Yohei Oseki, Ethan Gotlieb Wilcox

Comments ACL 2026 main

2604.18549 2026-04-21 cs.CV

Advancing Vision Transformer with Enhanced Spatial Priors

Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu, Ran He

Comments Accepted by TPAMI2026

2604.18548 2026-04-21 cs.LG q-bio.QM

Physics-Informed Neural Networks for Biological $2\mathrm{D}{+}t$ Reaction-Diffusion Systems

William Lavery, Jodie A. Cochrane, Christian Olesen, Dagim S. Tadele, John T. Nardini, Sara Hamis

2604.18546 2026-04-21 cs.LG eess.SP math.OC

Wasserstein Distributionally Robust Risk-Sensitive Estimation via Conditional Value-at-Risk

Feras Al Taha, Eilyan Bitar

Comments 6 pages, 2 figures

2604.18539 2026-04-21 cs.CL cs.AI

Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations

Eric Rudolph, Philipp Steigerwald, Jens Albrecht

Comments Accepted as ACL findings paper

2604.18537 2026-04-21 cs.CV

MetaCloak-JPEG: JPEG-Robust Adversarial Perturbation for Preventing Unauthorized DreamBooth-Based Deepfake Generation

Tanjim Rahaman Fardin, S M Zunaid Alam, Mahadi Hasan Fahim, Md Faysal Mahfuz

Comments 8 pages, 5 figures

2604.18085 2026-04-21 cs.LG

Predicting LLM Compression Degradation from Spectral Statistics

Mingxue Xu

Comments Profoundly assisted by agentic AI

2604.16500 2026-04-21 cs.CV

Semantically Stable Image Composition Analysis via Saliency and Gradient Vector Flow Fusion

Armin Dadras, Robert Sablatnig, Franziska Proksa, Markus Seidl

Comments Accepted to ICPR 2026

2604.07328 2026-04-21 cs.LG

How to sketch a learning algorithm

Sam Gunn

Comments Improved presentation and simplified Algorithm 4

2512.02543 2026-04-21 cs.LG

Inference-Time Distillation: Cost-Efficient Agents Without Fine-Tuning or Manual Prompt Engineering

Vishnu Sarukkai, Asanshay Gupta, James Hong, Michaël Gharbi, Kayvon Fatahalian

Comments 21 pages, 4 figures

2511.14846 2026-04-21 cs.LG cs.AI cs.CL

Empowering Multi-Turn Tool-Integrated Agentic Reasoning with Group Turn Policy Optimization

Yifeng Ding, Hung Le, Songyang Han, Kangrui Ruan, Zhenghui Jin, Varun Kumar, Zijian Wang, Anoop Deoras

2511.02757 2026-04-21 cs.LG math.OC stat.ML

ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models

Lejs Deen Behric, Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil

2503.14324 2026-04-21 cs.CV cs.CL

DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies

Wei Song, Yuran Wang, Zijia Song, Yadong Li, Zenan Zhou, Long Chen, Jianhua Xu, Jiaqi Wang, Kaicheng Yu

2411.15115 2026-04-21 cs.CV cs.AI cs.CL

Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement

Daeun Lee, Jaehong Yoon, Jaemin Cho, Mohit Bansal

Comments Accepted to ACL 2026 Findings. Project page: https://video-repair.github.io

2406.04301 2026-04-21 cs.CV

Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry

Xinhai Chang, Kaichen Zhou

2604.18519 2026-04-21 cs.AI

LLM Safety From Within: Detecting Harmful Content with Internal Representations

Difan Jiao, Yilun Liu, Ye Yuan, Zhenwei Tang, Linfeng Du, Haolun Wu, Ashton Anderson

Comments 17 pages,10 figures,6 tables

2604.18512 2026-04-21 cs.CV

S2H-DPO: Hardness-Aware Preference Optimization for Vision-Language Models

Nitish Shukla, Surgan Jandial, Arun Ross

2604.18493 2026-04-21 cs.LG

Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data

Zhenwen Liang, Yujun Zhou, Sidi Lu, Xiangliang Zhang, Haitao Mi, Dong Yu

Comments ACL 2026 Main Paper

2604.18492 2026-04-21 cs.LG cs.SY eess.SY

Barrier-enforced multi-objective optimization for direct point and sharp interval forecasting

Worachit Amnuaypongsa, Yotsapat Suparanonrat, Pana Wanitchollakit, Jitkomut Songsiri

Comments 25 pages, 12 figures, 3 tables

2604.18490 2026-04-21 cs.CL cs.AI

LQM: Linguistically Motivated Multidimensional Quality Metrics for Machine Translation

Samar M. Magdy, Fakhraddin Alwajih, Abdellah El Mekki, Wesam El-Sayed, Muhammad Abdul-Mageed

Comments Accepted to ACL 2026; resources available at https://github.com/UBC-NLP/LQM_MT

2604.18489 2026-04-21 cs.SD cs.CL eess.AS

Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints

Hao Meng, Siyuan Zheng, Shuran Zhou, Qiangqiang Wang, Yang Song

Comments Accepted by IEEE ICASSP 2026

2604.18487 2026-04-21 cs.CL cs.AI

Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety

Marcello Galisai, Susanna Cifani, Francesco Giarrusso, Piercosma Bisconti, Matteo Prandi, Federico Pierucci, Federico Sartore, Daniele Nardi

2604.18484 2026-04-21 cs.CV cs.MM cs.RO

XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments

Kangan Qian, ChuChu Xie, Yang Zhong, Jingrui Pang, Siwen Jiao, Sicong Jiang, Zilin Huang, Yunlong Wang, Kun Jiang, Mengmeng Yang, Hao Ye, Guanghao Zhang, Hangjun Ye, Guang Chen, Long Chen, Diange Yang

Comments 15 pages, 5 figures

2604.18478 2026-04-21 cs.AI cs.CL

WorldDB: A Vector Graph-of-Worlds Memory Engine with Ontology-Aware Write-Time Reconciliation

Harish Santhanalakshmi Ganesan