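arXivDaily: a daily academic digest synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.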

2505.20112 2026-03-17 cs.CL cs.AI

ERC-SVD: Error-Controlled SVD for Large Language Model Compression

Haolei Bai, Siyong Jian, Tuo Liang, Yu Yin, Huan Wang


Abstract

Large language models (LLMs) have demonstrated impressive capabilities in a wide range of downstream natural language processing tasks. Nevertheless, their considerable sizes and memory demands hinder practical deployment, underscoring the importance of developing efficient compression strategies. Singular value decomposition (SVD) decomposes a matrix into orthogonal components, enabling efficient low-rank approximation. This is particularly suitable for LLM compression, where weight matrices often exhibit significant redundancy. However, current SVD-based methods neglect the residual matrix from truncation, resulting in significant truncation loss. Additionally, compressing all layers of the model results in severe error propagation. To overcome these limitations, we propose ERC-SVD, a new post-training SVD-based LLM compression method from an error-controlled perspective. Specifically, we leverage the residual matrix generated during the truncation process to reduce truncation loss. Moreover, under a fixed overall compression ratio, we selectively compress the last few layers of the model, which mitigates error propagation and improves compressed model performance. Comprehensive evaluations on diverse LLM families and multiple benchmark datasets indicate that ERC-SVD consistently achieves superior performance over existing counterpart methods, demonstrating its practical effectiveness.
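
The truncation loss and residual matrix mentioned in the abstract can be illustrated with plain truncated SVD. This is a minimal NumPy sketch, not the authors' ERC-SVD implementation: the matrix sizes, the ranks `k` and `k_res`, and the helper `truncated_svd` are hypothetical, and the second low-rank fit of the residual is only one generic way to reuse it. In this data-free setting the two-step result coincides with a single rank-`(k + k_res)` truncation; the abstract does not say how ERC-SVD itself uses the residual.

```python
import numpy as np

def truncated_svd(W, k):
    """Return factors (A, B) such that A @ B is the best rank-k approximation of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]   # shape (m, k); singular values folded into the left factor
    B = Vt[:k, :]          # shape (k, n)
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))   # stand-in for a weight matrix (hypothetical size)

k, k_res = 8, 4                     # hypothetical ranks
A, B = truncated_svd(W, k)
residual = W - A @ B                # what rank-k truncation discards

# Fit the residual with a second, smaller low-rank term and add it back.
A_res, B_res = truncated_svd(residual, k_res)
W_hat = A @ B + A_res @ B_res

err_truncated = np.linalg.norm(residual)      # Frobenius norm of the truncation loss
err_compensated = np.linalg.norm(W - W_hat)   # loss after adding the residual term

params_dense = W.size
params_lowrank = (k + k_res) * (W.shape[0] + W.shape[1])
print(err_truncated, err_compensated, params_dense, params_lowrank)
```

Storing the factors instead of the dense matrix costs `r * (m + n)` parameters for total rank `r`, which is where the compression comes from whenever `r` is well below `m * n / (m + n)`.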


2505.17341 2026-03-17 cs.LG

TI-DeepONet: Learnable Time Integration for Stable Long-Term Extrapolation

Dibyajyoti Nayak, Somdatta Goswami

Comments 32 pages (including references), 22 figures

2505.16643 2026-03-17 cs.CV cs.AI

From Evaluation to Defense: Advancing Safety in Video Large Language Models

Yiwei Sun, Peiqi Jiang, Chuanbin Liu, Luohao Lin, Zhiying Lu, Hongtao Xie

Comments Accepted at ICLR 2026

2505.16353 2026-03-17 cs.LG math.OC math.PR

Admission Control of Quasi-Reversible Queueing Systems: Optimization and Reinforcement Learning

Céline Comte, Pascal Moyal

2505.16294 2026-03-17 cs.CV

Self-Classification Enhancement and Correction for Weakly Supervised Object Detection

Yufei Yin, Lechao Cheng, Wengang Zhou, Jiajun Deng, Zhou Yu, Houqiang Li

Comments Accepted by IJCAI 2025

2505.15030 2026-03-17 cs.LG

A Systematic Evaluation of On-Device LLMs: Quantization, Performance, and Resources

Qingyu Song, Rui Liu, Wei Lin, Peiyu Liao, Wenqian Zhao, Yiwen Wang, Shoubo Hu, Yining Jiang, Mochun Long, Hui-Ling Zhen, Ning Jiang, Mingxuan Yuan, Qiao Xiang, Hong Xu

Comments 10 pages, 8 figures

2505.12284 2026-03-17 cs.AI cs.CL

Shorten After You're Right: Lazy Length Penalties for Reasoning RL

Danlong Yuan, Tian Xie, Shaohan Huang, Zhuocheng Gong, Huishuai Zhang, Chong Luo, Furu Wei, Dongyan Zhao

Comments Under review

2505.09986 2026-03-17 cs.CV eess.IV

High Quality Underwater Image Compression with Adaptive Color Correction

Yimin Zhou, Yichong Xia, Sicheng Pan, Bin Chen, Yaowei Li, Jiawei Li, Mingyao Hong, Zhi Wang, Yaowei Wang

2505.03025 2026-03-17 cs.CL cs.AI

A Typology of Synthetic Datasets for Dialogue Processing in Clinical Contexts

Steven Bedrick, A. Seza Doğruöz, Sergiu Nisioi

Comments Accepted at LREC 2026 https://lrec2026.info/

2504.20371 2026-03-17 cs.CL

DMDTEval: An Evaluation and Analysis of LLMs on Disambiguation in Multi-domain Translation

Zhibo Man, Yuanmeng Chen, Yujie Zhang, Jinan Xu

Comments Accepted by EMNLP 2025 (main)

2504.14325 2026-03-17 cs.AI

FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory

Alessio Buscemi, Daniele Proverbio, Alessandro Di Stefano, The-Anh Han, German Castignani, Pietro Liò

2504.06460 2026-03-17 cs.CL

Can LLMs Simulate Personas with Reversed Performance? A Systematic Investigation for Counterfactual Instruction Following in Math Reasoning Context

Sai Adith Senthil Kumar, Hao Yan, Saipavan Perepa, Murong Yue, Ziyu Yao

2504.00623 2026-03-17 cs.CL

Efficient Construction of Model Family through Progressive Training Using Model Expansion

Kazuki Yano, Sho Takase, Sosuke Kobayashi, Shun Kiyono, Jun Suzuki

Comments 17 pages, accepted by COLM 2025 as a conference paper

2503.22478 2026-03-17 cs.LG cs.AI math.OC

Almost Bayesian: The Fractal Dynamics of Stochastic Gradient Descent

Max Hennick, Stijn De Baerdemacker

2503.08723 2026-03-17 cs.LG cs.CV

Is CLIP ideal? No. Can we fix it? Yes!

Raphi Kang, Yue Song, Georgia Gkioxari, Pietro Perona

Comments ICCV 2025

2503.05954 2026-03-17 cs.LG

A Survey on Deep Learning Approaches for Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, Diversity, and Beyond

Mihaela Cătălina Stoian, Eleonora Giunchiglia, Thomas Lukasiewicz

Comments Accepted to Transactions on Machine Learning Research (02/2026)

2502.14135 2026-03-17 cs.LG cs.CR

Cluster Analysis and Concept Drift Detection in Malware

Aniket Mishra, Mark Stamp

2502.03285 2026-03-17 cs.CV eess.IV

Deep Learning-based Event Data Coding: A Joint Spatiotemporal and Polarity Solution

Abdelrahman Seleem, André F. R. Guarda, Nuno M. M. Rodrigues, Fernando Pereira

2501.18328 2026-03-17 cs.CV cs.AI

Virtual Full-stack Scanning of Brain MRI via Imputing Any Quantised Code

Yicheng Wu, Tao Song, Zhonghua Wu, Jin Ye, Zongyuan Ge, Wenjia Bai, Zhaolin Chen, Jianfei Cai

Comments Accepted by CVPR 2026

2501.17424 2026-03-17 cs.RO cs.LG

Certificated Actor-Critic: Hierarchical Reinforcement Learning with Control Barrier Functions for Safe Navigation

Junjun Xie, Shuhao Zhao, Liang Hu, Huijun Gao

Comments Accepted to ICRA 2025

2501.05264 2026-03-17 cs.CV cs.AI

Towards Balanced Multi-Modal Learning in 3D Human Pose Estimation

Mengshi Qi, Jiaxuan Peng, Xianlin Zhang, Huadong Ma

Comments Accepted by CVPR 2026

2501.00691 2026-03-17 cs.CL cs.LG

Labels Generated by Large Language Models Help Measure People's Empathy in Vitro

Md Rakibul Hasan, Yue Yao, Md Zakir Hossain, Aneesh Krishna, Imre Rudas, Shafin Rahman, Tom Gedeon

Comments This work has been submitted to the IEEE for possible publication

2412.18507 2026-03-17 cs.LG

An Empirical Analysis of Federated Learning Models Subject to Label-Flipping Adversarial Attack

Kunal Bhatnagar, Sagana Chattanathan, Angela Dang, Bhargav Eranki, Ronnit Rana, Charan Sridhar, Siddharth Vedam, Angie Yao, Mark Stamp

Comments In: Stamp, M., Jureček, M. (eds) Machine Learning, Deep Learning and AI for Cybersecurity. Springer (2025)

2412.17741 2026-03-17 cs.CV

Reasoning to Attend: Try to Understand How <SEG> Token Works

Rui Qian, Xin Yin, Dejing Dou

Comments This work has been accepted to CVPR 2025, please refer to https://github.com/rui-qian/READ

2412.16787 2026-03-17 cs.LG physics.comp-ph physics.flu-dyn

Symplectic Neural Flows for Modeling and Discovery

Priscilla Canizares, Davide Murari, Carola-Bibiane Schönlieb, Ferdia Sherry, Zakhar Shumaylov

2412.11967 2026-03-17 cs.LG cs.SY eess.SY

A Digital Twin for Diesel Engines: Operator-infused Physics-Informed Neural Networks with Transfer Learning for Engine Health Monitoring

Kamaljyoti Nath, Varun Kumar, Daniel J. Smith, George Em Karniadakis


DOI: 10.1016/j.engappai.2026.114052

Abstract

Improving diesel engine efficiency, reducing emissions, and enabling robust health monitoring have been critical research topics in engine modelling. While recent advancements in the use of neural networks for system monitoring have shown promising results, such methods often focus on component-level analysis and lack generalizability and physical interpretability. In this study, we propose a novel hybrid framework that combines physics-informed neural networks (PINNs) with deep operator networks (DeepONet) to enable accurate and computationally efficient parameter identification in mean-value diesel engine models. Our method leverages physics-based system knowledge in combination with data-driven training of neural networks to enhance model applicability. Incorporating offline-trained DeepONets to predict actuator dynamics significantly lowers the online computation cost when compared to the existing PINN framework. To address the re-training burden typical of PINNs under varying input conditions, we propose two transfer learning (TL) strategies: (i) a multi-stage TL scheme offering better runtime efficiency than full online training of the PINN model and (ii) a few-shot TL scheme that freezes a shared multi-head network body and computes physics-based derivatives required for model training outside the training loop. The second strategy offers a computationally inexpensive and physics-based approach for predicting engine dynamics and parameter identification, offering computational efficiency over the existing PINN framework. Compared to existing health monitoring methods, our framework combines the interpretability of physics-based models with the flexibility of deep learning, offering substantial gains in generalization, accuracy, and deployment efficiency for diesel engine diagnostics.


2410.08950 2026-03-17 cs.LG cs.AI

On the Adversarial Transferability of Generalized "Skip Connections"

Yisen Wang, Yichuan Mo, Dongxian Wu, Mingjie Li, Xingjun Ma, Zhouchen Lin


Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026

Abstract

Skip connections are an essential ingredient for modern deep models to be deeper and more powerful. Despite their huge success in normal scenarios (state-of-the-art classification performance on natural examples), we investigate and identify an interesting property of skip connections under adversarial scenarios, namely, the use of skip connections allows easier generation of highly transferable adversarial examples. Specifically, in ResNet-like models (with skip connections), we find that biasing backpropagation to favor gradients from skip connections, while suppressing those from residual modules via a decay factor, allows one to craft adversarial examples with high transferability. Based on this insight, we propose the Skip Gradient Method (SGM). Although starting from ResNet-like models in vision domains, we further extend SGM to more advanced architectures, including Vision Transformers (ViTs), models with varying-length paths, and other domains such as natural language processing. We conduct comprehensive transfer-based attacks against diverse model families, including ResNets, Transformers, Inceptions, Neural Architecture Search-based models, and Large Language Models (LLMs). The results demonstrate that employing SGM can greatly improve the transferability of crafted attacks in almost all cases. Furthermore, we demonstrate that SGM can still be effective under more challenging settings such as ensemble-based attacks, targeted attacks, and against defense-equipped models. Finally, we provide theoretical explanations and empirical insights on how SGM works. Our findings not only motivate new adversarial research into the architectural characteristics of models but also open up further challenges for secure model architecture design. Our code is available at https://github.com/mo666666/SGM.
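
The decay-factor idea in the abstract can be sketched on a toy stack of residual blocks. This is an illustrative NumPy example under stated assumptions, not the code from the linked repository: each block is assumed to be `y = x + W @ x` with a linear residual module, and the function name, the weights, and the values of `gamma` are hypothetical. During backpropagation the gradient through the skip path is kept as is, while the gradient through the residual module is scaled by `gamma`.

```python
import numpy as np

def backward_through_residual_blocks(grad_out, weights, gamma=1.0):
    """Backpropagate through blocks y = x + W @ x (assumed linear residual modules),
    scaling the residual-branch gradient by the decay factor gamma."""
    grad = grad_out
    for W in reversed(weights):
        skip_grad = grad              # gradient through the identity (skip) path
        residual_grad = W.T @ grad    # gradient through the residual module
        grad = skip_grad + gamma * residual_grad
    return grad

rng = np.random.default_rng(0)
dim, depth = 5, 3                     # hypothetical sizes
weights = [0.1 * rng.standard_normal((dim, dim)) for _ in range(depth)]
grad_out = rng.standard_normal(dim)   # gradient of the loss w.r.t. the final output

full_grad = backward_through_residual_blocks(grad_out, weights, gamma=1.0)    # ordinary backprop
skip_biased = backward_through_residual_blocks(grad_out, weights, gamma=0.5)  # favors skip paths
skip_only = backward_through_residual_blocks(grad_out, weights, gamma=0.0)    # skip paths only
```

With `gamma=1.0` this reduces to ordinary backpropagation, and with `gamma=0.0` only the skip paths contribute, so the input gradient equals `grad_out`. An attack would then build its perturbation from the modified gradient; that step is not shown here.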


2409.16945 2026-03-17 cs.CV

Revisiting Face Forgery Detection: From Facial Representation to Forgery Detection

Zonghui Guo, Yingjie Liu, Jie Zhang, Haiyong Zheng, Shiguang Shan

2408.13024 2026-03-17 cs.CV

Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding

Xianqiang Gao, Pingrui Zhang, Delin Qu, Dong Wang, Zhigang Wang, Yan Ding, Bin Zhao

Comments Accepted by AAAI 2025 (Oral)

2408.05472 2026-03-17 cs.LG physics.ao-ph

FuXi Weather: A data-to-forecast machine learning system for global weather

Xiuyu Sun, Xiaohui Zhong, Xiaoze Xu, Yuanqing Huang, Hao Li, J. David Neelin, Deliang Chen, Jie Feng, Wei Han, Libo Wu, Yuan Qi

Comments 73 pages


DOI: 10.1038/s41467-025-62024-1

Abstract

Weather forecasting traditionally relies on numerical weather prediction (NWP) systems that integrate global observational systems, data assimilation (DA), and forecasting models. Despite steady improvements in forecast accuracy over recent decades, further advances are increasingly constrained by high computational costs, the underutilization of vast observational datasets, and the challenges of obtaining finer resolution. These limitations, alongside the uneven distribution of observational networks, result in global disparities in forecast accuracy, leaving some regions vulnerable to extreme weather. Recent advances in machine learning present a promising alternative, providing more efficient and accurate forecasts using the same initial conditions as NWP. However, current machine learning models still depend on the initial conditions generated by NWP systems, which require extensive computational resources and expertise. Here we introduce FuXi Weather, a machine learning weather forecasting system that assimilates data from multiple satellites. Operating on a 6-hourly DA and forecast cycle, FuXi Weather generates reliable and accurate 10-day global weather forecasts at a spatial resolution of $0.25^\circ$. FuXi Weather is the first system to achieve all-grid, all-surface, all-channel, and all-sky DA and forecasting, extending skillful forecast lead times beyond those of the European Centre for Medium-range Weather Forecasts (ECMWF) high-resolution forecasts (HRES) while using significantly fewer observations. FuXi Weather consistently outperforms ECMWF HRES in observation-sparse regions, such as central Africa, demonstrating its potential to improve forecasts where observational infrastructure is limited.
