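arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.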

2604.04090 2026-04-07 cs.LG cs.AI

Fine-grained Analysis of Stability and Generalization for Stochastic Bilevel Optimization

Xuelin Zhang, Hong Chen, Bin Gu, Tieliang Gong, Feng Zheng

DOI: 10.24963/ijcai.2024/609

English abstract

Stochastic bilevel optimization (SBO) has been integrated into many machine learning paradigms recently, including hyperparameter optimization, meta learning, and reinforcement learning. Along with the wide range of applications, there have been numerous studies on the computational behavior of SBO. However, the generalization guarantees of SBO methods are far less understood from the lens of statistical learning theory. In this paper, we provide a systematic generalization analysis of the first-order gradient-based bilevel optimization methods. Firstly, we establish the quantitative connections between the on-average argument stability and the generalization gap of SBO methods. Then, we derive the upper bounds of on-average argument stability for single-timescale stochastic gradient descent (SGD) and two-timescale SGD, where three settings (nonconvex-nonconvex (NC-NC), convex-convex (C-C), and strongly-convex-strongly-convex (SC-SC)) are considered respectively. Experimental analysis validates our theoretical findings. Compared with the previous algorithmic stability analysis, our results do not require reinitializing the inner-level parameters at each iteration and are applicable to more general objective functions.

2604.04088 2026-04-07 cs.CL cs.AI cs.CY cs.LG

Embedding Enhancement via Fine-Tuned Language Models for Learner-Item Cognitive Modeling

Yuanhao Liu, Zihan Zhou, Kaiying Wu, Shuo Liu, Yiyang Huang, Jiajun Guo, Aimin Zhou, Hong Qian

Comments Accepted by The ACM Web Conference 2026 (WWW '26)

DOI: 10.1145/3774904.3792542

English abstract

Learner-item cognitive modeling plays a central role in web-based online intelligent education systems by enabling cognitive diagnosis (CD) across diverse online educational scenarios. Although ID embedding remains the mainstream approach in cognitive modeling due to its effectiveness and flexibility, recent advances in language models (LMs) have introduced new possibilities for incorporating rich semantic representations to enhance CD performance. This highlights the need for a comprehensive analysis of how LMs enhance embeddings through semantic integration across mainstream CD tasks. This paper identifies two key challenges in fully leveraging LMs in existing work: (1) misalignment between the training objectives of LMs and CD models creates a distribution gap in feature spaces; (2) a unified framework is needed to integrate textual embeddings across varied CD tasks while preserving the strengths of existing cognitive modeling paradigms, so that embedding enhancement remains robust. To address these challenges, this paper introduces EduEmbed, a unified embedding enhancement framework that leverages fine-tuned LMs to enrich learner-item cognitive modeling across diverse CD tasks. EduEmbed operates in two stages. In the first stage, we fine-tune LMs based on role-specific representations and an interaction diagnoser to bridge the semantic gap of CD models. In the second stage, we employ a textual adapter to extract task-relevant semantics and integrate them with existing modeling paradigms to improve generalization. We evaluate the proposed framework on four CD tasks and a computerized adaptive testing (CAT) task, achieving robust performance. Further analysis reveals the impact of semantic information across diverse tasks, offering key insights for future research on the application of LMs in CD for online intelligent education systems.

2604.04086 2026-04-07 cs.CV

LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection

Dat Nguyen, Enjie Ghorbel, Anis Kacem, Marcella Astrid, Djamila Aouada

Comments Journal version of LAA-Net (CVPR 2024)

2604.04080 2026-04-07 cs.CV cs.AI cs.LG

Intelligent Traffic Monitoring with YOLOv11: A Case Study in Real-Time Vehicle Detection

Shkelqim Sherifi

Comments 2025 International Conference on Computer and Applications (ICCA)

2604.04071 2026-04-07 cs.CV

Detecting Media Clones in Cultural Repositories Using a Positive Unlabeled Learning Approach

V. Sevetlidis, V. Arampatzakis, M. Karta, I. Mourthos, D. Tsiafaki, G. Pavlidis

Comments Accepted at CAA 2026 International Conference

2604.04064 2026-04-07 cs.CL cs.AI

Extracting and Steering Emotion Representations in Small Language Models: A Methodological Comparison

Jihoon Jeong

Comments 14 pages, 4 figures, 7 tables. Paper #6 in the Model Medicine series

2604.04063 2026-04-07 cs.CV

4C4D: 4 Camera 4D Gaussian Splatting

Junsheng Zhou, Zhifan Yang, Liang Han, Wenyuan Zhang, Kanle Shi, Shenkun Xu, Yu-Shen Liu

Comments Accepted by CVPR 2026. Project page: https://junshengzhou.github.io/4C4D

2604.04055 2026-04-07 cs.CV cs.RO

DINO-VO: Learning Where to Focus for Enhanced State Estimation

Qi Chen, Guanghao Li, Sijia Hu, Xin Gao, Junpeng Ma, Xiangyang Xue, Jian Pu

2604.04050 2026-04-07 cs.CV cs.LG

TORA: Topological Representation Alignment for 3D Shape Assembly

Nahyuk Lee, Zhiang Chen, Marc Pollefeys, Sunghwan Hong

2604.04043 2026-04-07 cs.CL

Emergent Inference-Time Semantic Contamination via In-Context Priming

Marcin Abram

Comments 6 pages, 2 figures, appendix

2604.04039 2026-04-07 cs.RO

Adapting Neural Robot Dynamics on the Fly for Predictive Control

Abdullah Altawaitan, Nikolay Atanasov

Comments This work has been submitted to the IEEE for possible publication

2604.04029 2026-04-07 cs.CV

ATSS: Detecting AI-Generated Videos via Anomalous Temporal Self-Similarity

Hang Wang, Chao Shen, Lei Zhang, Zhi-Qi Cheng

Comments 16 pages, 4 figures

2604.04020 2026-04-07 cs.CL cs.LG

Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models

Sailesh kiran kurra, Shiek Ruksana, Vishal Borusu

Comments Paper accepted for publication at IEEE International Conference on Emerging Computing and Intelligent Technologies 2026 (ICoECIT), 5 pages, 5 figures, 1 table

2604.04018 2026-04-07 cs.CV

1.x-Distill: Breaking the Diversity, Quality, and Efficiency Barrier in Distribution Matching Distillation

Haoyu Li, Tingyan Wen, Lin Qi, Zhe Wu, Yihuang Chen, Xing Zhou, Lifei Zhu, Xueqian Wang, Kai Zhang

Comments Project page: https://thu-accdiff.github.io/1.x-distill-page/

2604.04017 2026-04-07 cs.CL

GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces

Xinyu Geng, Yanjing Xiao, Yuyang Zhang, Hanwen Wang, Xinyan Liu, Rui Min, Tianqing Fang, Yi R. Fung

2604.04016 2026-04-07 cs.CV cs.AI

HOIGS: Human-Object Interaction Gaussian Splatting

Taewoo Kim, Suwoong Yeom, Jaehyun Pyun, Geonho Cha, Dongyoon Wee, Joonsik Nam, Yun-Seong Jeong, Kyeongbo Kong, Suk-Ju Kang

Comments 24 pages, 9 figures

2604.04013 2026-04-07 cs.CL

RUQuant: Towards Refining Uniform Quantization for Large Language Models

Han Liu, Haotian Gao, Changya Li, Feng Zhang, Xiaotong Zhang, Wei Wang, Hong Yu

Comments Accepted to KDD 2026. 12 pages, 9 figures

DOI: 10.1145/3770854.3780259

English abstract

The increasing size and complexity of large language models (LLMs) have raised significant challenges in deployment efficiency, particularly under resource constraints. Post-training quantization (PTQ) has emerged as a practical solution by compressing models without requiring retraining. While existing methods focus on uniform quantization schemes for both weights and activations, they often suffer from substantial accuracy degradation due to the non-uniform nature of activation distributions. In this work, we revisit the activation quantization problem from a theoretical perspective grounded in the Lloyd-Max optimality conditions. We identify the core issue as the non-uniform distribution of activations within the quantization interval, which causes the optimal quantization point under the Lloyd-Max criterion to shift away from the midpoint of the interval. To address this issue, we propose a two-stage orthogonal transformation method, RUQuant. In the first stage, activations are divided into blocks. Each block is mapped to uniformly sampled target vectors using composite orthogonal matrices, which are constructed from Householder reflections and Givens rotations. In the second stage, a global Householder reflection is fine-tuned to further minimize quantization error using Transformer output discrepancies. Empirical results show that our method achieves near-optimal quantization performance without requiring model fine-tuning: RUQuant achieves 99.8% of full-precision accuracy with W6A6 and 97% with W4A4 quantization for a 13B LLM, within approximately one minute. A fine-tuned variant yields even higher accuracy, demonstrating the effectiveness and scalability of our approach.
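A minimal Python sketch of the two ingredients this abstract names, offered as an illustration rather than the RUQuant implementation. The first function shows why, under the Lloyd-Max centroid condition, the best reconstruction point for a non-uniformly populated interval moves away from the midpoint. The second applies a single Householder reflection that maps one vector onto another of equal norm. The function names, the exponential toy distribution, and the example vectors are assumptions made here for illustration only.

```python
import random


def midpoint_vs_centroid(samples, lo, hi):
    """Compare two reconstruction points for the samples falling in [lo, hi).

    Returns (midpoint, centroid, mse_at_midpoint, mse_at_centroid).
    """
    cell = [x for x in samples if lo <= x < hi]
    midpoint = (lo + hi) / 2.0
    centroid = sum(cell) / len(cell)  # Lloyd-Max centroid condition

    def mse(point):
        return sum((x - point) ** 2 for x in cell) / len(cell)

    return midpoint, centroid, mse(midpoint), mse(centroid)


def householder_apply(x, y, z):
    """Apply H = I - 2 v v^T / (v^T v), with v = x - y, to the vector z.

    When x and y have the same Euclidean norm, H maps x onto y. H is
    orthogonal, so it preserves the norm of any z.
    """
    v = [a - b for a, b in zip(x, y)]
    vv = sum(a * a for a in v)
    if vv == 0.0:  # x == y: fall back to the identity map
        return list(z)
    scale = 2.0 * sum(a * b for a, b in zip(v, z)) / vv
    return [zi - scale * vi for zi, vi in zip(z, v)]


# Toy, skewed "activations": an exponential distribution (an assumption).
rng = random.Random(0)
samples = [rng.expovariate(2.0) for _ in range(10000)]
mid, cen, mse_mid, mse_cen = midpoint_vs_centroid(samples, 0.0, 1.0)

# Reflect x = (3, 4) onto y = (5, 0); both vectors have norm 5.
reflected = householder_apply([3.0, 4.0], [5.0, 0.0], [3.0, 4.0])
```

In this toy setup the centroid of the interval [0, 1) lies below the midpoint 0.5, and reconstructing at the centroid gives a lower mean squared error. That is the shift the abstract attributes to non-uniform activations within a quantization interval.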

2604.04012 2026-04-07 cs.CV cs.LG

OASIC: Occlusion-Agnostic and Severity-Informed Classification

Kay Gijzen, Gertjan J. Burghouts, Daniël M. Pelt

Comments 14 pages, 5 figures

2604.03999 2026-04-07 cs.RO

Dynamic Whole-Body Dancing with Humanoid Robots -- A Model-Based Control Approach

Shibowen Zhang, Jiayang Wu, Guannan Liu, Helin Zhu, Junjie Liu, Zhehan Li, Junhong Guo, Xiaokun Leng, Hangxin Liu, Jingwen Zhang, Jikai Wang, Zonghai Chen, Zhicheng He, Jiayi Wang, Yao Su

2604.03998 2026-04-07 cs.RO

VA-FastNavi-MARL: Real-Time Robot Control with Multimedia-Driven Meta-Reinforcement Learning

Yang Zhang, Shengxi Jing, Fengxiang Wang, Yuan Feng, Hong Wang

Comments Accepted to the 2026 IEEE International Conference on Multimedia and Expo (ICME 2026)

2604.03995 2026-04-07 cs.CV cs.SD

A Systematic Study of Cross-Modal Typographic Attacks on Audio-Visual Reasoning

Tianle Chen, Deepti Ghadiyaram

2604.03993 2026-04-07 cs.LG cs.AI

Can LLMs Learn to Reason Robustly under Noisy Supervision?

Shenzhi Yang, Guangcheng Zhu, Bowen Song, Sharon Li, Haobo Wang, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen

English abstract

Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains reasoning models that rely on abundant perfect labels, but its vulnerability to unavoidable noisy labels due to expert scarcity remains critically underexplored. In this work, we take the first step toward a systematic analysis of noisy label mechanisms in RLVR. In contrast to supervised classification, most RLVR algorithms incorporate a rollout-based condition: a label's influence on training is contingent on whether the current policy can generate rollouts that realize it, a property that naturally extends to noisy labels. Based on this observation, we distinguish two types of noise: inactive noisy labels, which reduce data efficiency, and active noisy labels, which are reinforced and risk skewing the model toward incorrect distributions. From experiments on training with noisy samples, we identify an Early Correctness Coherence phenomenon: although noisy samples begin to lag behind in later stages, accuracy on both clean and noisy samples increases similarly in early training. Motivated by this dynamic, we propose Online Label Refinement (OLR), which progressively corrects potentially noisy labels with majority-voted answers when two conditions hold: a positive slope in the majority answer's rollout pass rate and stable historical consistency across updates, enabling gradual self-correction as the policy improves. We evaluate OLR on six in-distribution mathematical reasoning benchmarks (AIME24/25, AMC, MATH-500, Minerva, and Olympiad) and three out-of-distribution tasks (ARC-c, GPQA-diamond, and MMLU-pro). Across noise ratios from 0.1 to 0.9, OLR consistently improves robustness under both inactive and active noisy-label settings, achieving average gains of 3.6% to 3.9% on in-distribution benchmarks and 3.3% to 4.6% on out-of-distribution evaluations.
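The two-condition refinement rule described in this abstract can be sketched roughly as follows. This is a hedged illustration, not the authors' OLR code: the window length, the use of a least-squares slope, and the reading of "stable historical consistency" as "the same majority answer in every recent update" are all assumptions, as are the function names.

```python
from collections import Counter


def majority_answer(rollouts):
    """Return (most common answer, fraction of rollouts that produced it)."""
    answer, count = Counter(rollouts).most_common(1)[0]
    return answer, count / len(rollouts)


def slope(values):
    """Least-squares slope of values against their indices 0..n-1 (n >= 2)."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def refine_label(label, history, window=3):
    """Replace `label` with the majority answer only if both conditions hold.

    `history` is a list of (majority_answer, pass_rate) pairs, oldest first,
    one per policy update. `window` (hypothetical, must be >= 2) is how many
    recent updates are inspected.
    """
    if len(history) < window:
        return label  # not enough evidence yet
    recent = history[-window:]
    answers = [a for a, _ in recent]
    rates = [r for _, r in recent]
    stable = len(set(answers)) == 1  # same majority answer throughout
    rising = slope(rates) > 0        # pass rate trending upward
    return answers[-1] if stable and rising else label


# Toy rollouts for one prompt across three policy updates.
steps = [["7", "7", "3", "5"], ["7", "7", "7", "3"], ["7", "7", "7", "7"]]
history = [majority_answer(r) for r in steps]
new_label = refine_label("3", history)  # the given label "3" is suspected noisy
```

Here the majority answer "7" is consistent across all three updates and its pass rate rises, so the sketch replaces the original label. With a falling pass rate or a changing majority answer, the original label is kept.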

2604.03985 2026-04-07 cs.LG eess.SP stat.ML

Autoencoder-Based Parameter Estimation for Superposed Multi-Component Damped Sinusoidal Signals

Momoka Iida, Hayato Motohashi, Hirotaka Takahashi

Comments 27 pages, 16 figures, 14 tables

2604.03984 2026-04-07 cs.CV

High-Fidelity Mural Restoration via a Unified Hybrid Mask-Aware Transformer

Jincheng Jiang, Qianhao Han, Chi Zhang, Zheng Zheng

Comments 13 pages, 3 figures

2604.03981 2026-04-07 cs.LG stat.CO

Multirate Stein Variational Gradient Descent for Efficient Bayesian Sampling

Arash Sarshar

2604.03980 2026-04-07 cs.CV cs.AI

Gram-Anchored Prompt Learning for Vision-Language Models via Second-Order Statistics

Minglei Chen, Weilong Wang, Jiang Duan, Ye Deng

2604.03976 2026-04-07 cs.AI cs.CE

Quantifying Trust: Financial Risk Management for Trustworthy AI Agents

Wenyue Hua, Tianyi Peng, Chi Wang, Ian Kaufman, Bryan Lim, Chandler Fang

Comments 30 pages, 9 figures

2604.03972 2026-04-07 cs.CV

Hierarchical Point-Patch Fusion with Adaptive Patch Codebook for 3D Shape Anomaly Detection

Xueyang Kang, Zizhao Li, Tian Lan, Dong Gong, Kourosh Khoshelham, Liangliang Nan

Comments 10 pages, 5 figures, 6 tables

2604.03964 2026-04-07 cs.AI

SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

Shuaike Shen, Wenduo Cheng, Mingqian Ma, Alistair Turcan, Martin Jinye Zhang, Jian Ma

2604.03962 2026-04-07 cs.CL cs.LG

Predict, Don't React: Value-Based Safety Forecasting for LLM Streaming

Pride Kavumba, Koki Wataoka, Huy H. Nguyen, Jiaxuan Li, Masaya Ohagi

English abstract

In many practical LLM deployments, a single guardrail is used for both prompt and response moderation. Prompt moderation operates on fully observed text, whereas streaming response moderation requires safety decisions to be made over partial generations. Existing text-based streaming guardrails commonly frame this output-side problem as boundary detection, training models to identify the earliest prefix at which a response has already become unsafe. In this work, we introduce StreamGuard, a unified model-agnostic streaming guardrail that instead formulates moderation as a forecasting problem: given a partial prefix, the model predicts the expected harmfulness of likely future continuations. We supervise this prediction using Monte Carlo rollouts, which enables early intervention without requiring exact token-level boundary annotations. Across standard safety benchmarks, StreamGuard performs strongly both for input moderation and for streaming output moderation. At the 8B scale, StreamGuard improves aggregated input-moderation F1 from 86.7 to 88.2 and aggregated streaming output-moderation F1 from 80.4 to 81.9 relative to Qwen3Guard-Stream-8B-strict. On the QWENGUARDTEST response_loc streaming benchmark, StreamGuard reaches 97.5 F1, 95.1 recall, and 92.6% on-time intervention, compared to 95.9 F1, 92.1 recall, and 89.9% for Qwen3Guard-Stream-8B-strict, while reducing the miss rate from 7.9% to 4.9%. We further show that forecasting-based supervision transfers effectively across tokenizers and model families: with transferred targets, Gemma3-StreamGuard-1B reaches 81.3 response-moderation F1, 98.2 streaming F1, and a 3.5% miss rate. These results show that strong end-to-end streaming moderation can be obtained without exact boundary labels, and that forecasting future risk is an effective supervision strategy for low-latency safety intervention.
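The forecasting formulation in this abstract can be illustrated with a small Python sketch. It is a toy under stated assumptions, not StreamGuard itself: the sampler, the harm scorer, the trigger word, the threshold, and all function names are hypothetical stand-ins. The supervision target for a prefix is taken to be the average harm score over sampled continuations, and intervention happens at the first prefix whose forecast crosses a threshold.

```python
import random


def mc_harm_target(prefix, sample_continuation, harm_score,
                   num_rollouts=8, rng=None):
    """Monte Carlo estimate of expected harmfulness of continuing `prefix`.

    `sample_continuation(prefix, rng)` returns a list of tokens and
    `harm_score(tokens)` returns a value in [0, 1]; both are caller-supplied.
    """
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for _ in range(num_rollouts):
        continuation = sample_continuation(prefix, rng)
        total += harm_score(prefix + continuation)
    return total / num_rollouts


def first_intervention(tokens, forecaster, threshold=0.5):
    """Return the length of the earliest prefix whose forecast reaches
    `threshold`, or None if no prefix does."""
    for end in range(1, len(tokens) + 1):
        if forecaster(tokens[:end]) >= threshold:
            return end
    return None


# Toy stand-ins (assumptions): continuations turn unsafe with probability
# 0.9 once the hypothetical trigger token "bypass" is in the prefix.
def toy_sampler(prefix, rng):
    p_unsafe = 0.9 if "bypass" in prefix else 0.0
    return ["<unsafe>"] if rng.random() < p_unsafe else ["<safe>"]


def toy_harm(tokens):
    return 1.0 if "<unsafe>" in tokens else 0.0


risky = mc_harm_target(["how", "to", "bypass"], toy_sampler, toy_harm,
                       num_rollouts=200, rng=random.Random(0))
benign = mc_harm_target(["how", "to", "bake"], toy_sampler, toy_harm,
                        num_rollouts=200, rng=random.Random(0))

# A trained model would be fit to targets like `risky` and `benign`; here a
# hand-written rule stands in for that model.
stop_at = first_intervention(
    ["how", "to", "bypass", "a", "filter"],
    forecaster=lambda p: 1.0 if "bypass" in p else 0.0,
)
```

Because the target depends only on sampled continuations and a harm score, no annotation of the exact token at which a response becomes unsafe is needed, which matches the motivation given in the abstract.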
