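arXivDaily: a daily academic digest that syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and other fields.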

2604.03647 2026-04-09 cs.CV cs.AI

Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling

Yunyao Yu, Zhengxian Wu, Zhuohong Chen, Hangrui Xu, Zirui Liao, Xiangwen Deng, Zhifang Liu, Senyuan Shi, Haoqian Wang

Comments 16 pages, 6 figures

Abstract

In the unsupervised self-evolution of Multimodal Large Language Models (MLLMs), the quality of feedback signals during post-training is pivotal for stable and effective learning. However, existing self-evolution methods predominantly rely on majority voting, selecting the most frequent output as the pseudo-golden answer; this answer may reflect the model's intrinsic biases rather than the objective correctness of the reasoning paths. To counteract the resulting degradation, we propose Continuous Softened Retracing reSampling (CSRS) for MLLM self-evolution. Specifically, we introduce a Retracing Re-inference Mechanism (RRM), in which the model re-infers from anchor points to broaden the exploration of long-tail reasoning paths. Simultaneously, we propose a Softened Frequency Reward (SFR), which replaces binary rewards with continuous signals, calibrating each reward by the answer's frequency across the sampled reasoning set. Furthermore, combined with Visual Semantic Perturbation (VSP), CSRS ensures that the model prioritizes mathematical logic over superficial visual cues. Experimental results demonstrate that CSRS significantly enhances the reasoning performance of Qwen2.5-VL-7B on benchmarks such as MathVision, achieving state-of-the-art (SOTA) results in unsupervised self-evolution on geometric tasks. Our code is available at https://github.com/yyy195/CSRS.
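
The contrast between a majority-vote binary reward and a frequency-calibrated continuous reward, as the abstract describes it, can be sketched as follows. This is a minimal illustration assuming that each sampled rollout's reward is simply the fraction of rollouts sharing its final answer; the function names are hypothetical, and the exact SFR calibration used in the paper may differ.

```python
from collections import Counter


def majority_vote_rewards(answers: list[str]) -> list[float]:
    """Binary reward: 1.0 if a rollout's answer equals the most frequent
    (pseudo-golden) answer, else 0.0."""
    counts = Counter(answers)
    pseudo_gold, _ = counts.most_common(1)[0]
    return [1.0 if a == pseudo_gold else 0.0 for a in answers]


def softened_frequency_rewards(answers: list[str]) -> list[float]:
    """Hypothetical continuous reward: each rollout is scored by the share
    of sampled rollouts that reached the same final answer."""
    counts = Counter(answers)
    n = len(answers)
    return [counts[a] / n for a in answers]


# Final answers extracted from five sampled reasoning paths (illustrative).
answers = ["12", "12", "12", "8", "8"]
print(majority_vote_rewards(answers))       # [1.0, 1.0, 1.0, 0.0, 0.0]
print(softened_frequency_rewards(answers))  # [0.6, 0.6, 0.6, 0.4, 0.4]
```

Under the binary scheme the two rollouts answering "8" receive no signal at all, while the softened version still assigns them a smaller, nonzero reward; this is one plausible way a continuous signal keeps long-tail answers in play.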

2604.03336 2026-04-09 cs.LG eess.SP

NativeTernary: A Self-Delimiting Binary Encoding with Unary Run-Length Hierarchy Markers for Ternary Neural Network Weights, Structured Data, and General Computing Infrastructure

Maharshi Savdhariya

Comments v2: benchmark results added. Real BitNet b1.58 2B4T architecture analysis: NativeTernary framing overhead 460x smaller than GGUF tensor headers (91 bytes vs 42KB). 1.31x smaller than GGUF Q2_K. C implementation: https://github.com/sm45118/nativeternary

2604.03128 2026-04-09 cs.LG cs.CL

Self-Distilled RLVR

Chenxu Yang, Chuanyu Qin, Qingyi Si, Minghui Chen, Naibin Gu, Dingyu Yao, Zheng Lin, Weiping Wang, Jiaqi Wang, Nan Duan

Comments Work in progress

2604.03044 2026-04-09 cs.CL cs.AI

JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency

Aichen Cai, Anmeng Zhang, Anyu Li, Bo Zhang, Bohua Cai, Chang Li, Changjian Jiang, Changkai Lu, Chao Xue, Chaocai Liang, Cheng Zhang, Dongkai Liu, Fei Wang, Guoqiang Huang, Haijian Ke, Han Lin, Hao Wang, Ji Miao, Jiacheng Zhang, Jialong Shi, Jifeng Zhu, Jingjing Qian, Junhui Luo, Junwu Xiong, Lam So, Liang Huang, Ming Ke, Mingyang Li, Panfeng Shi, Peng Hao, Qi Wang, Qian Lai, Qiaoqiao Yuan, Qingyu Yin, Qiong Cao, Qixiang Wang, Rongcheng Bian, Rongduo Han, Shaoqiang Zheng, Shi Hu, Shi Suo, Shijie Ren, Shijin Zhang, Shiying Fan, Shuai Xie, Tianyi Zhang, Wei Liu, Wentao Tan, Xianghan Meng, Xiaodong He, Xing Pan, Xiran Wang, Xuyang Peng, Ya Zhang, Yang Liu, Yangyang Duan, Yanxu Chen, Yicheng Gong, Yidan Huang, Yifei Liu, Yinhao Bai, Yongqiang Liu, Yuesong Zhang, Yuqi Zhang, Zerui Xie, Zhenfang Wang, Zhennan Shen, Zheyuan Liu, Zhuwei Zeng

Comments Xiaodong He is the corresponding author

2604.02996 2026-04-09 cs.CV

Rendering Multi-Human and Multi-Object with 3D Gaussian Splatting

Weiquan Wang, Jun Xiao, Feifei Shao, Yi Yang, Yueting Zhuang, Long Chen

Comments 8 pages, 4 figures, accepted by ICRA 2026

2604.02643 2026-04-09 cs.RO

Differentiable SpaTiaL: Symbolic Learning and Reasoning with Geometric Temporal Logic for Manipulation Tasks

Licheng Luo, Kaier Liang, Cristian-Ioan Vasile, Mingyu Cai

Comments Code available at: https://github.com/plen1lune/DiffSpaTiaL

2604.02387 2026-04-09 cs.RO

A Dynamic Toolkit for Transmission Characteristics of Precision Reducers with Explicit Contact Geometry

Jiacheng Miao, Chao Liu, Qiliang Wang, Yunhui Guan, Weidong He

Comments 21 pages, 8 figures

2604.01972 2026-04-09 cs.CV

SDesc3D: Towards Layout-Aware 3D Indoor Scene Generation from Short Descriptions

Jie Feng, Jiawei Shen, Junjia Huang, Junpeng Zhang, Mingtao Feng, Weisheng Dong, Guanbin Li

2604.01840 2026-04-09 cs.AI

Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models

Zekai Ye, Qiming Li, Xiaocheng Feng, Ruihan Chen, Ziming Li, Haoyu Ren, Kun Chen, Dandan Tu, Bing Qin

2604.00239 2026-04-09 cs.CL

A Taxonomy of Programming Languages for Code Generation

Nishat Raihan, Christian Newman, Marcos Zampieri

2603.29908 2026-04-09 cs.AI

C-TRAIL: A Commonsense World Framework for Trajectory Planning in Autonomous Driving

Zhihong Cui, Haoran Tang, Tianyi Li, Yushuai Li, Peiyuan Guan, Amir Taherkordi, Tor Skeie

2603.29441 2026-04-09 cs.CV

EarthEmbeddingExplorer: A Web Application for Cross-Modal Retrieval of Global Satellite Images

Yijie Zheng, Weijie Wu, Bingyue Wu, Long Zhao, Guoqing Li, Mikolaj Czerkawski, Konstantin Klemmer

Comments ICLR 2026 Workshop ML4RS Tutorial Track (oral)

2603.28253 2026-04-09 cs.LG cs.AI

MR-ImagenTime: Multi-Resolution Time Series Generation through Dual Image Representations

Xianyong Xu, Yuanjun Zuo, Zhihong Huang, Yihan Qin, Haoxian Xu, Leilei Du, Haotian Wang

2603.28049 2026-04-09 cs.CV

Drift-AR: Single-Step Visual Autoregressive Generation via Anti-Symmetric Drifting

Zhen Zou, Xiaoxiao Ma, Mingde Yao, Jie Huang, LinJiang Huang, Feng Zhao

2603.27253 2026-04-09 cs.CL

Mitigating Hallucination on Hallucination in RAG via Ensemble Voting

Zequn Xie, Zhengyang Sun

Comments arXiv admin note: text overlap with arXiv:2505.18581 by other authors

2603.27105 2026-04-09 cs.CV

UniDAC: Universal Metric Depth Estimation for Any Camera

Girish Chandar Ganesan, Yuliang Guo, Liu Ren, Xiaoming Liu

2603.26588 2026-04-09 cs.CV cs.LG

From Synthetic Data to Real Restorations: Diffusion Model for Patient-specific Dental Crown Completion

Dávid Pukanec, Tibor Kubík, Michal Španěl

Comments VISAPP 2026 Conference / CVPR Workshop GenRecon3D

2603.26167 2026-04-09 cs.CV cs.CR

Gaussian Shannon: High-Precision Diffusion Model Watermarking Based on Communication

Yi Zhang, Hongbo Huang, Liang-Jie Zhang

Comments Accepted by CVPR 2026 Findings

2603.24480 2026-04-09 cs.CV cs.HC cs.IR

Positive-First Most Ambiguous: A Simple Active Learning Criterion for Interactive Retrieval of Rare Categories

Kawtar Zaher, Olivier Buisson, Alexis Joly

Comments CVPRW 2026 - The 13th Workshop on Fine-Grained Visual Categorization (FGVC13)

Abstract

Real-world fine-grained visual retrieval often requires discovering a rare concept from large unlabeled collections with minimal supervision. This is especially critical in biodiversity monitoring, ecological studies, and long-tailed visual domains, where the target may represent only a tiny fraction of the data, creating highly imbalanced binary problems. Interactive retrieval with relevance feedback offers a practical solution: starting from a small query, the system selects candidates for binary user annotation and iteratively refines a lightweight classifier. While Active Learning (AL) is commonly used to guide selection, conventional AL assumes symmetric class priors and large annotation budgets, limiting effectiveness in imbalanced, low-budget, low-latency settings. We introduce Positive-First Most Ambiguous (PF-MA), a simple yet effective AL criterion that explicitly addresses the class imbalance asymmetry: it prioritizes near-boundary samples while favoring likely positives, enabling rapid discovery of subtle visual categories while maintaining informativeness. Unlike standard methods that oversample negatives, PF-MA consistently returns small batches with a high proportion of relevant samples, improving early retrieval and user satisfaction. To capture retrieval diversity, we also propose a class coverage metric that measures how well selected positives span the visual variability of the target class. Experiments on long-tailed datasets, including fine-grained botanical data, demonstrate that PF-MA consistently outperforms strong baselines in both coverage and classifier performance, across varying class sizes and descriptors. Our results highlight that aligning AL with the asymmetric and user-centric objectives of interactive fine-grained retrieval enables simple yet powerful solutions for retrieving rare and visually subtle categories in realistic human-in-the-loop settings.
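
One possible reading of the PF-MA criterion, sketched under stated assumptions: the lightweight classifier outputs a positive-class probability for each unlabeled candidate; candidates at or above a hypothetical threshold of 0.5 are treated as likely positives and ranked by their distance to the decision boundary; the remaining candidates fill any leftover batch slots in the same nearest-to-boundary order. The function name and threshold are illustrative and not taken from the paper.

```python
def pf_ma_select(probs: list[float], k: int, threshold: float = 0.5) -> list[int]:
    """Hypothetical Positive-First Most Ambiguous selection.

    Returns the indices of k candidates: likely positives (p >= threshold)
    closest to the boundary first, then the remaining candidates closest
    to the boundary.
    """
    positives = [i for i, p in enumerate(probs) if p >= threshold]
    negatives = [i for i, p in enumerate(probs) if p < threshold]
    # Within each side, the most ambiguous candidates come first.
    positives.sort(key=lambda i: probs[i] - threshold)
    negatives.sort(key=lambda i: threshold - probs[i])
    return (positives + negatives)[:k]


# Classifier scores for six unlabeled candidates (illustrative values).
probs = [0.95, 0.55, 0.48, 0.10, 0.62, 0.30]
print(pf_ma_select(probs, k=3))  # [1, 4, 0]
```

A plain most-ambiguous criterion would pick index 2 (p = 0.48) first; the positive-first ordering defers it until the likely positives are exhausted, in line with the abstract's aim of returning small batches with a high share of relevant samples.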

2603.20284 2026-04-09 cs.CV cs.GR eess.IV

STAC: Plug-and-Play Spatio-Temporal Aware Cache Compression for Streaming 3D Reconstruction

Runze Wang, Yuxuan Song, Youcheng Cai, Ligang Liu

Comments 10 pages, 6 figures. Accepted by CVPR 2026. This version includes supplementary material

2603.17812 2026-04-09 cs.CV cs.AI cs.LG

ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation

Dmitriy Rivkin, Parker Ewen, Lili Gao, Julian Ost, Stefanie Walz, Rasika Kangutkar, Mario Bijelic, Felix Heide

Comments Project website: https://light.princeton.edu/chopgrad

2603.13354 2026-04-09 cs.CV cs.LG

AgriPath: A Systematic Exploration of Architectural Trade-offs for Crop Disease Classification

Hamza Mooraj, George Pantazopoulos, Alessandro Suglia

Comments 11 pages main text, 24 pages total including references and appendix. 6 figures, 14 tables. Code and dataset will be released upon publication

2603.11703 2026-04-09 cs.LG

EvoFlows: Evolutionary Edit-Based Flow-Matching for Protein Engineering

Nicolas Deutschmann, Constance Ferragu, Jonathan D. Ziegler, Shayan Aziznejad, Eli Bixby

Comments Accepted at Workshop on Foundation Models for Science: Real-World Impact and Science-First Design, ICLR 2026

2603.11090 2026-04-09 cs.LG stat.ME

Interventional Time Series Priors for Causal Foundation Models

Dennis Thumm, Ying Chen

Comments ICLR 2026 1st Workshop on Time Series in the Age of Large Models (TSALM)

2603.10658 2026-04-09 cs.CV

How to Embed Matters: Evaluation of EO Embedding Design Choices

Luis Gilch, Isabelle Wittmann, Maximilian Nitsche, Johannes Jakubik, Arne Ewald, Thomas Brunschwiler

2603.10512 2026-04-09 cs.AI cs.LG cs.NE

Resource-constrained Amazons chess decision framework integrating large language models and graph attention

Tianhao Qian, Zhuoxuan Li, Jinde Cao, Xinli Shi, Leszek Rutkowski

Comments 20 pages, 15 figures. Supported by the National Key Research and Development Project of China (No. 2020YFA0714300), NSFC (No. 61833005, 12061088), the Open Project of Key Laboratory of Transport Industry of Comprehensive Transportation Theory (Nanjing Modern Multimodal Transportation Laboratory) (MTF2023004), and the China Postdoctoral Science Foundation (2024T170129, GZC20240261)

2603.10477 2026-04-09 cs.CL

PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses

Minki Hong, Eunsoo Lee, Sohyun Park, Jihie Kim

Comments This is a preprint version of a paper accepted to IEEE Access. The final published version is available at DOI: 10.1109/ACCESS.2026.3679809

DOI: 10.1109/ACCESS.2026.3679809

Abstract

Prompt design is a primary control interface for large language models (LLMs), yet standard evaluations largely reduce performance to answer correctness, obscuring why a prompt succeeds or fails and providing little actionable guidance. We propose PEEM (Prompt Engineering Evaluation Metrics), a unified framework for joint and interpretable evaluation of both prompts and responses. PEEM defines a structured rubric with 9 axes: 3 prompt criteria (clarity/structure, linguistic quality, fairness) and 6 response criteria (accuracy, coherence, relevance, objectivity, clarity, conciseness), and uses an LLM-based evaluator to output (i) scalar scores on a 1-5 Likert scale and (ii) criterion-specific natural-language rationales grounded in the rubric. Across 7 benchmarks and 5 task models, PEEM's accuracy axis strongly aligns with conventional accuracy while preserving model rankings (aggregate Spearman ρ ≈ 0.97, Pearson r ≈ 0.94, p < 0.001). A multi-evaluator study with four models shows consistent relative judgments (pairwise ρ = 0.68-0.85), supporting evaluator-agnostic deployment. Beyond alignment, PEEM captures complementary linguistic failure modes and remains informative under prompt perturbations: prompt-quality trends track downstream accuracy under iterative rewrites, semantic adversarial manipulations induce clear score degradation, and meaning-preserving paraphrases yield high stability (robustness rate ≈ 76.7-80.6%). Finally, using only PEEM scores and rationales as feedback, a zero-shot prompt rewriting loop improves downstream accuracy by up to 11.7 points, outperforming supervised and RL-based prompt-optimization baselines. Overall, PEEM provides a reproducible, criterion-driven protocol that links prompt formulation to response behavior and enables systematic diagnosis and optimization of LLM interactions.
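
The rank agreement reported for PEEM (Spearman ρ) measures how similarly two scorings order the same set of models. A minimal sketch of Spearman's ρ, computed as the Pearson correlation of average ranks (so ties are handled), follows; the five score pairs are made-up illustrative values, not results from the paper.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical accuracy-axis scores (1-5) vs. benchmark accuracy for 5 models.
peem_scores = [4.6, 3.9, 4.2, 2.8, 3.1]
accuracy = [0.81, 0.66, 0.74, 0.52, 0.49]
print(round(spearman(peem_scores, accuracy), 3))  # 0.9
```

Here the two scorings disagree only on the order of the two weakest models, giving ρ = 0.9. In practice, `scipy.stats.spearmanr` computes the same statistic together with a p-value.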

2603.09677 2026-04-09 cs.AI

Logics-Parsing-Omni Technical Report

Xin An, Jingyi Cai, Xiangyang Chen, Huayao Liu, Peiting Liu, Peng Wang, Bei Yang, Xiuwen Zhu, Yongfan Chen, Yan Gao, Yuan Gao, Baoyu Hou, Guangzheng Hu, Shuzhao Li, Weixu Qiao, Weidong Ren, Yanan Wang, Boyu Yang, Fan Yang, Jiangtao Zhang, Lixin Zhang, Lin Qu, Hu Wei, Xiaoxiao Xu, Bing Zhao

2603.04300 2026-04-09 cs.LG

LUMINA: Foundation Models for Topology Transferable ACOPF

Yijiang Li, Zeeshan Memon, Hongwei Jin, Stefano Fenu, Keunju Song, Sunash B Sharma, Parfait Gasana, Hongseok Kim, Liang Zhao, Kibaek Kim

2603.04165 2026-04-09 cs.CV cs.AI

PlaneCycle: Training-Free 2D-to-3D Lifting of Foundation Models Without Adapters

Yinghong Yu, Guangyuan Li, Jiancheng Yang