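arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.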

2603.12155 2026-03-13 cs.CV cs.AI

GlyphBanana: Advancing Precise Text Rendering Through Agentic Workflows

Zexuan Yan, Jiarui Jin, Yue Ma, Shijian Wang, Jiahui Hu, Wenxiang Jiao, Yuan Lu, Linfeng Zhang

Abstract

Despite recent advances in generative models driving significant progress in text rendering, accurately generating complex text and mathematical formulas remains a formidable challenge. This difficulty primarily stems from the limited instruction-following capabilities of current models when encountering out-of-distribution prompts. To address this, we introduce GlyphBanana, alongside a corresponding benchmark specifically designed for rendering complex characters and formulas. GlyphBanana employs an agentic workflow that integrates auxiliary tools to inject glyph templates into both the latent space and attention maps, facilitating the iterative refinement of generated images. Notably, our training-free approach can be seamlessly applied to various Text-to-Image (T2I) models, achieving superior precision compared to existing baselines. Extensive experiments demonstrate the effectiveness of our proposed workflow. Associated code is publicly available at https://github.com/yuriYanZeXuan/GlyphBanana.

2603.12152 2026-03-13 cs.CL

LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation

Feiyu Duan, Xuanjing Huang, Zhongyu Wei

2603.12151 2026-03-13 cs.LG cs.AI

IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL

Zhoujun Cheng, Yutao Xie, Yuxiao Qu, Amrith Setlur, Shibo Hao, Varad Pimpalkhute, Tongtong Liang, Feng Yao, Zhengzhong Liu, Eric Xing, Virginia Smith, Ruslan Salakhutdinov, Zhiting Hu, Taylor Killian, Aviral Kumar

Comments 29 pages, 27 figures. Under review

2603.12149 2026-03-13 cs.CV cs.CL

Linking Perception, Confidence and Accuracy in MLLMs

Yuetian Du, Yucheng Wang, Rongyu Zhang, Zhijie Xu, Boyu Yang, Ming Kong, Jie Liu, Qiang Zhu

Comments Accepted by CVPR 2026

2603.12147 2026-03-13 cs.CV

EgoIntent: An Egocentric Step-level Benchmark for Understanding What, Why, and Next

Ye Pan, Chi Kit Wong, Yuanhuiyi Lyu, Hanqian Li, Jiahao Huo, Jiacheng Chen, Lutao Jiang, Xu Zheng, Xuming Hu

2603.12146 2026-03-13 cs.CV cs.AI cs.LG cs.MM

FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance

Quanhao Li, Zhen Xing, Rui Wang, Haidong Cao, Qi Dai, Daoguo Dong, Zuxuan Wu

Comments Accepted by CVPR 2026

2603.12144 2026-03-13 cs.CV cs.RO eess.IV

O3N: Omnidirectional Open-Vocabulary Occupancy Prediction

Mengfei Duan, Hao Shi, Fei Teng, Guoqiang Zhao, Yuheng Zhang, Zhiyong Li, Kailun Yang

Comments The source code will be made publicly available at https://github.com/MengfeiD/O3N

2603.12138 2026-03-13 cs.CV

HATS: Hardness-Aware Trajectory Synthesis for GUI Agents

Rui Shao, Ruize Gao, Bin Xie, Yixing Li, Kaiwen Zhou, Shuai Wang, Weili Guan, Gongwei Chen

Comments Accepted by CVPR 2026

2603.12133 2026-03-13 cs.AI cs.CL

TopoBench: Benchmarking LLMs on Hard Topological Reasoning

Mayug Maniparambil, Nils Hoehing, Janak Kapuriya, Arjun Karuvally, Ellen Rushe, Anthony Ventresque, Noel O'Connor, Fergal Reid

Comments Accepted, Workshop on Logical Reasoning of Large Language Models at ICLR 2026

2603.12129 2026-03-13 cs.AI cs.CY cs.SI econ.GN physics.soc-ph q-fin.EC

Increasing intelligence in AI agents can worsen collective outcomes

Neil F. Johnson

2603.12126 2026-03-13 cs.CV cs.LG

Hoi3DGen: Generating High-Quality Human-Object-Interactions in 3D

Agniv Sharma, Xianghui Xie, Tom Fischer, Eddy Ilg, Gerard Pons-Moll

2603.12123 2026-03-13 cs.CL

Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions

Tae-Eun Song

Comments 10 pages, 2 figures, 8 tables

2603.12120 2026-03-13 cs.RO cs.AI cs.CV

CRAFT: A Tendon-Driven Hand with Hybrid Hard-Soft Compliance

Leo Lin, Shivansh Patel, Jay Moon, Svetlana Lazebnik, Unnat Jain

2603.12117 2026-03-13 cs.CL cs.AI

SommBench: Assessing Sommelier Expertise of Language Models

William Brach, Tomas Bedej, Jacob Nielsen, Jacob Pichna, Juraj Bedej, Eemeli Saarensilta, Julie Dupouy, Gianluca Barmina, Andrea Blasi Núñez, Peter Schneider-Kamp, Kristian Košťál, Michal Ries, Lukas Galke Poech

Abstract

With the rapid advances of large language models, it becomes increasingly important to systematically evaluate their multilingual and multicultural capabilities. Previous cultural evaluation benchmarks focus mainly on basic cultural knowledge that can be encoded in linguistic form. Here, we propose SommBench, a multilingual benchmark to assess sommelier expertise, a domain deeply grounded in the senses of smell and taste. While language models learn about sensory properties exclusively through textual descriptions, SommBench tests whether this textual grounding is sufficient to emulate expert-level sensory judgment. SommBench comprises three main tasks: Wine Theory Question Answering (WTQA), Wine Feature Completion (WFC), and Food-Wine Pairing (FWP). SommBench is available in multiple languages: English, Slovak, Swedish, Finnish, German, Danish, Italian, and Spanish. This helps separate a language model's wine expertise from its language skills. The benchmark datasets were developed in close collaboration with a professional sommelier and native speakers of the respective languages, resulting in 1,024 wine theory question-answering questions, 1,000 wine feature-completion examples, and 1,000 food-wine pairing examples. We provide results for the most popular language models, including closed-weights models such as Gemini 2.5, and open-weights models, such as GPT-OSS and Qwen 3. Our results show that the most capable models perform well on wine theory question answering (up to 97% correct with a closed-weights model), yet feature completion (peaking at 65%) and food-wine pairing (MCC ranging between 0 and 0.39) turn out to be more challenging. These results position SommBench as an interesting and challenging benchmark for evaluating the sommelier expertise of language models. The benchmark is publicly available at https://github.com/sommify/sommbench.

2603.12110 2026-03-13 cs.LG cs.AI

Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives

Taeho Lee, Donghwan Lee

2603.12108 2026-03-13 cs.CV

EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation

Yan Li, Ning Liao, Xiangyu Zhao, Shaofeng Zhang, Xiaoxing Wang, Yifan Yang, Junchi Yan, Xue Yang

2603.12105 2026-03-13 cs.CL

To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times

Thomas Hikaru Clark, Carlos Arriaga, Javier Conde, Gonzalo Martínez, Pedro Reviriego

2603.12099 2026-03-13 cs.RO

Towards Dynamic Model Identification and Gravity Compensation for the dVRK-Si Patient Side Manipulator

Haoying Zhou, Hao Yang, Brendan Burkhart, Anton Deguet, Loris Fichera, Gregory S. Fischer, Jie Ying Wu, Peter Kazanzides

Comments Submitted to IEEE Transactions on Medical Robotics and Bionics (T-MRB), under review. Open-source GitHub Repo: https://github.com/jhu-dvrk/dvrk_psm_dynamics_identification

2603.12096 2026-03-13 cs.AI

A Robust and Efficient Multi-Agent Reinforcement Learning Framework for Traffic Signal Control

Sheng-You Huang, Hsiao-Chuan Chang, Yen-Chi Chen, Ting-Han Wei, I-Hau Yeh, Sheng-Yao Kuan, Chien-Yao Wang, Hsuan-Han Lee, I-Chen Wu

Comments 12 pages, 4 tables, 8 figures. Under review in the 31st ITS World Congress 2026

2603.12091 2026-03-13 cs.LG cs.AI

Resource-Efficient Iterative LLM-Based NAS with Feedback Memory

Xiaojie Gu, Dmitry Ignatov, Radu Timofte

2603.12087 2026-03-13 cs.LG

Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics

Ming-Hong Chen, Kuan-Chen Pan, You-De Huang, Xi Liu, Ping-Chun Hsieh

Comments Accepted at ICLR 2026

2603.12083 2026-03-13 cs.CV cs.RO eess.IV physics.optics

Towards Universal Computational Aberration Correction in Photographic Cameras: A Comprehensive Benchmark Analysis

Xiaolong Qian, Qi Jiang, Yao Gao, Lei Sun, Zhonghua Yi, Kailun Yang, Luc Van Gool, Kaiwei Wang

Comments Accepted to CVPR 2026. Benchmarks, codes, and Zemax files will be available at https://github.com/XiaolongQian/UniCAC

2603.12075 2026-03-13 cs.RO cs.SY eess.SY

Decentralized Cooperative Localization for Multi-Robot Systems with Asynchronous Sensor Fusion

Nivand Khosravi, Niusha Khosravi, Mohammad Bozorg, Masoud S. Bahraini

Comments Presented at the 13th RSI International Conference on Robotics and Mechatronics (ICRoM 2025)

2603.12073 2026-03-13 cs.LG cs.AI q-bio.GN

A Multi-Label Temporal Convolutional Framework for Transcription Factor Binding Characterization

Pietro Demurtas, Ferdinando Zanchetta, Giovanni Perini, Rita Fioresi

2603.12063 2026-03-13 cs.CV

NBAvatar: Neural Billboards Avatars with Realistic Hand-Face Interaction

David Svitov, Mahtab Dahaghin

2603.12060 2026-03-13 cs.LG cs.AI math.ST stat.ML stat.TH

Chemical Reaction Networks Learn Better than Spiking Neural Networks

Sophie Jaffard, Ivo F. Sbalzarini

Comments Keywords: Chemical Reaction Networks, Spiking Neural Networks, Supervised Learning, Classification, Mass-Action Kinetics, Statistical Learning Theory, Regret Bounds, Model Complexity

2603.12059 2026-03-13 cs.RO cs.SY eess.SY

Flight through Narrow Gaps with Morphing-Wing Drones

Julius Wanner, Hoang-Vu Phan, Charbel Toumieh, Dario Floreano

2603.12050 2026-03-13 cs.CL

Translationese as a Rational Response to Translation Task Difficulty

Maria Kunilovskaya

Comments 17 pages, submitted to ARR March 2026

2603.12038 2026-03-13 cs.LG cs.AI

Slow-Fast Inference: Training-Free Inference Acceleration via Within-Sentence Support Stability

Xingyu Xie, Zhaochen Yu, Yue Liao, Tao Wang, Kim-Chuan Toh, Shuicheng Yan

2603.12036 2026-03-13 cs.CV physics.optics

Single Pixel Image Classification using an Ultrafast Digital Light Projector

Aisha Kanwal, Graeme E. Johnstone, Fahimeh Dehkhoda, Johannes H. Herrnsdorf, Robert K. Henderson, Martin D. Dawson, Xavier Porte, Michael J. Strain