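arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.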

2604.18584 2026-04-21 cs.AI cs.DL cs.IR cs.LG

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman, Antonio Torralba

Comments ICLR 2026; Website: http://mathnet.mit.edu

Journal ref: Proceedings of the International Conference on Learning Representations (ICLR), 2026

English abstract

Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems together with a benchmark for evaluating mathematical reasoning in generative models and mathematical retrieval in embedding-based systems. MathNet spans 47 countries, 17 languages, and two decades of competitions, comprising 30,676 expert-authored problems with solutions across diverse domains. In addition to the core dataset, we construct a retrieval benchmark consisting of mathematically equivalent and structurally similar problem pairs curated by human experts. MathNet supports three tasks: (i) Problem Solving, (ii) Math-Aware Retrieval, and (iii) Retrieval-Augmented Problem Solving. Experimental results show that even state-of-the-art reasoning models (78.4% for Gemini-3.1-Pro and 69.3% for GPT-5) remain challenged, while embedding models struggle to retrieve equivalent problems. We further show that retrieval-augmented generation performance is highly sensitive to retrieval quality; for example, DeepSeek-V3.2-Speciale achieves gains of up to 12%, obtaining the highest scores on the benchmark. MathNet provides the largest high-quality Olympiad dataset together with the first benchmark for evaluating mathematical problem retrieval, and we publicly release both the dataset and benchmark at https://mathnet.mit.edu.

2604.18583 2026-04-21 cs.CV

MUA: Mobile Ultra-detailed Animatable Avatars

Heming Zhu, Guoxing Sun, Marc Habermann

Comments Project page: https://vcai.mpi-inf.mpg.de/projects/MUA/

English abstract

Building photorealistic, animatable full-body digital humans remains a longstanding challenge in computer graphics and vision. Recent advances in animatable avatar modeling have largely progressed along two directions: improving the fidelity of dynamic geometry and appearance, or reducing computational complexity to enable deployment on resource-constrained platforms, e.g., VR headsets. However, existing approaches fail to achieve both goals simultaneously: ultra-high-fidelity avatars typically require substantial computation on server-class GPUs, whereas lightweight avatars often suffer from limited surface dynamics, reduced appearance details, and noticeable artifacts. To bridge this gap, we propose a novel animatable avatar representation, termed Wavelet-guided Multi-level Spatial Factorized Blendshapes, and a corresponding distillation pipeline that transfers motion-aware clothing dynamics and fine-grained appearance details from a pre-trained ultra-high-quality avatar model into a compact, efficient representation. By coupling multi-level wavelet spectral decomposition with low-rank structural factorization in texture space, our method achieves up to 2000X lower computational cost and a 10X smaller model size than the original high-quality teacher avatar model, while preserving visually plausible dynamics and appearance details that closely resemble those of the teacher model. Extensive comparisons with state-of-the-art methods show that our approach significantly outperforms existing avatar approaches designed for mobile settings and achieves comparable or superior rendering quality to most approaches that can only run on servers. Importantly, our representation substantially improves the practicality of high-fidelity avatars for immersive applications, achieving over 180 FPS on a desktop PC and real-time native on-device performance at 24 FPS on a standalone Meta Quest 3.

2604.18575 2026-04-21 cs.CV

ReCap: Lightweight Referential Grounding for Coherent Story Visualization

Aditya Arora, Akshita Gupta, Pau Rodriguez, Marcus Rohrbach

Comments Diffusion Models, Story Visualization

English abstract

Story Visualization aims to generate a sequence of images that faithfully depicts a textual narrative while preserving character identity, spatial configuration, and stylistic coherence as the narrative unfolds. Maintaining such cross-frame consistency has traditionally relied on explicit memory banks, architectural expansion, or auxiliary language models, resulting in substantial parameter growth and inference overhead. We introduce ReCap, a lightweight consistency framework that improves character stability and visual fidelity without modifying the base diffusion backbone. ReCap's CORE (COnditional frame REferencing) module treats anaphors, in our case pronouns, as visual anchors, activating only when characters are referred to by a pronoun and conditioning on the preceding frame to propagate visual identity. This selective design avoids unconditional cross-frame conditioning and introduces only 149K additional parameters, a fraction of the cost of memory-bank and LLM-augmented approaches. To further stabilize identity, we incorporate SemDrift (Guided Semantic Drift Correction), applied only during training. When text is vague or referential, the denoiser lacks a visual anchor for identity-defining attributes, causing character appearance to drift across frames. SemDrift corrects this by aligning denoiser representations with pretrained DINOv3 visual embeddings, enforcing semantic identity stability at zero inference cost. ReCap outperforms the previous state of the art, StoryGPT-V, on the two main benchmarks for story visualization by 2.63% Character-Accuracy on FlintstonesSV and by 5.65% on PororoSV, establishing a new state of the art in character consistency on both benchmarks. Furthermore, we extend story visualization to human-centric narratives derived from real films, demonstrating the capability of ReCap beyond stylized cartoon domains.

2604.18574 2026-04-21 cs.LG cs.AI

When Can LLMs Learn to Reason with Weak Supervision?

Salman Rahman, Jingyan Shen, Anna Mordvina, Hamid Palangi, Saadia Gabriel, Pavel Izmailov

2604.18573 2026-04-21 cs.CV

T-REN: Learning Text-Aligned Region Tokens Improves Dense Vision-Language Alignment and Scalability

Savya Khosla, Sethuraman T, Aryan Chadha, Alex Schwing, Derek Hoiem

2604.18569 2026-04-21 stat.ML cs.LG

Revisiting Active Sequential Prediction-Powered Mean Estimation

Maria-Eleni Sfyraki, Jun-Kun Wang

Comments Published as a conference paper at ICLR 2026

2604.18567 2026-04-21 cs.LG cs.AI cs.CL

Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering

Manan Gupta, Dhruv Kumar

Comments Under Review

2604.18565 2026-04-21 cs.SI

Detectability of minority communities in networks

Jiaze Li, Leto Peel

Comments 21 pages, 16 figures

2604.18563 2026-04-21 cs.CL

Dual Alignment Between Language Model Layers and Human Sentence Processing

Tatsuki Kuribayashi, Alex Warstadt, Yohei Oseki, Ethan Gotlieb Wilcox

Comments ACL 2026 main

2604.18559 2026-04-21 q-bio.BM cs.LG

ConforNets: Latents-Based Conformational Control in OpenFold3

Minji Lee, Colin Kalicki, Minkyu Jeon, Aymen Qabel, Alisia Fadini, Mohammed AlQuraishi

2604.18549 2026-04-21 cs.CV

Advancing Vision Transformer with Enhanced Spatial Priors

Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu, Ran He

Comments Accepted by TPAMI2026

2604.18548 2026-04-21 cs.LG q-bio.QM

Physics-Informed Neural Networks for Biological $2\mathrm{D}{+}t$ Reaction-Diffusion Systems

William Lavery, Jodie A. Cochrane, Christian Olesen, Dagim S. Tadele, John T. Nardini, Sara Hamis

2604.18547 2026-04-21 stat.ML cs.CL cs.LG

FUSE: Ensembling Verifiers with Zero Labeled Data

Joonhyuk Lee, Virginia Ma, Sarah Zhao, Yash Nair, Asher Spector, Regev Cohen, Emmanuel J. Candès

2604.18546 2026-04-21 cs.LG eess.SP math.OC

Wasserstein Distributionally Robust Risk-Sensitive Estimation via Conditional Value-at-Risk

Feras Al Taha, Eilyan Bitar

Comments 6 pages, 2 figures

2604.18540 2026-04-21 math.AP cs.LG math.FA math.OC

Duality for the Adversarial Total Variation

Leon Bungert, Lucas Schmitt

Comments 39 pages

2604.18539 2026-04-21 cs.CL cs.AI

Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations

Eric Rudolph, Philipp Steigerwald, Jens Albrecht

Comments Accepted as ACL findings paper

2604.18538 2026-04-21 cs.HC

Fast and Forgettable: A Controlled Study of Novices' Performance, Learning, Workload, and Emotion in AI-Assisted and Human Pair Programming Paradigms

Nicholas Gardella, James Prather, Juho Leinonen, Paul Denny, Raymond Pettit, Sara L. Riggs

Comments for online appendices, see https://doi.org/10.5281/zenodo.19665253

2604.18537 2026-04-21 cs.CV

MetaCloak-JPEG: JPEG-Robust Adversarial Perturbation for Preventing Unauthorized DreamBooth-Based Deepfake Generation

Tanjim Rahaman Fardin, S M Zunaid Alam, Mahadi Hasan Fahim, Md Faysal Mahfuz

Comments 8 pages, 5 figures

2604.18536 2026-04-21 math.NA cs.NA physics.flu-dyn

A differentiable software suite for accelerated simulation of turbulent flows

Syver Døving Agdestein, Benjamin Sanderse

Comments 22 pages, 19 figures

2604.18532 2026-04-21 cs.LO cs.AI cs.FL

Symbolic Synthesis for LTLf+ Obligations

Giuseppe De Giacomo, Christian Hagemeier, Daniel Hausmann, Nir Piterman

2604.18529 2026-04-21 cs.PF cs.DC

HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing

Mao Lin, Xi Wang, Guilherme Cox, Dong Li, Hyeran Jeon

2604.18525 2026-04-21 cs.SE

Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts

Tamás Aladics, Norbert Vándor, Rudolf Ferenc, Péter Hegedűs

2604.18523 2026-04-21 cond-mat.dis-nn cs.IT math.IT math.ST stat.TH

BBP transition and the leading eigenvector of the spiked Wigner model with inhomogeneous noise

Leonardo S. Ferreira, Fernando L. Metz

Comments 21 pages, 7 figures

2604.18441 2026-04-21 math.ST cs.LG stat.ML stat.TH

Conformal Robust Set Estimation

Alejandro Cholaquidis, Emilien Joly, Leonardo Moreno

2604.18085 2026-04-21 cs.LG

Predicting LLM Compression Degradation from Spectral Statistics

Mingxue Xu

Comments Profoundly assisted by agentic AI

2604.17300 2026-04-21 eess.IV cs.AI cs.CV

Chaos-Enhanced Prototypical Networks for Few-Shot Medical Image Classification

Chinthakuntla Meghan Sai, Murarisetty V Sai Kartheek, Sita Devi Bharatula, Karthik Seemakurthy

2604.16835 2026-04-21 q-fin.ST cs.AI cs.LG

The CTLNet for Shanghai Composite Index Prediction

Haibin Jiao

2604.16500 2026-04-21 cs.CV

Semantically Stable Image Composition Analysis via Saliency and Gradient Vector Flow Fusion

Armin Dadras, Robert Sablatnig, Franziska Proksa, Markus Seidl

Comments Accepted to ICPR 2026

2604.16229 2026-04-21 eess.SY cs.SY

Simulating Arbitrage Optimization for Market Monitoring in Gas and Electricity Transmission Networks

Noah Rhodes, Sachin Shivakumar, Luke S. Baker, Kaarthik Sundar, Anatoly Zlotnik

2604.15249 2026-04-21 cs.CR

Structural Dependency Analysis for Masked NTT Hardware: Scalable Pre-Silicon Verification of Post-Quantum Cryptographic Accelerators

Ray Iskander, Khaled Kirah

Comments 36 pages, 4 figures