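arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.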

Bohan Zeng, Kaixin Zhu, Daili Hua, Bozhou Li, Chengzhuo Tong, Yuran Wang, Xinyi Huang, Yifan Dai, Zixiang Zhang, Yifan Yang, Zhou Liu, Hao Liang, Xiaochen Ma, Ruichuan An, Tianyi Bai, Hongcheng Gao, Junbo Niu, Yang Shi, Xinlong Chen, Yue Ding, Minglei Shi, Kai Zeng, Yiwen Tang, Yuanxing Zhang, Pengfei Wan, Xintao Wang, Wentao Zhang

Comments 13 pages, 4 figures

2602.01626 2026-02-03 cs.LG cs.AI

Toward Enhancing Representation Learning in Federated Multi-Task Settings

Mehdi Setayesh, Mahdi Beitollahi, Yasser H. Khalil, Hongliang Li

Comments This paper has been accepted at ICLR 2026

2602.01624 2026-02-03 cs.CV

PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

Minh-Quan Le, Gaurav Mittal, Cheng Zhao, David Gu, Dimitris Samaras, Mei Chen

2602.01623 2026-02-03 cs.CV

Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?

Susan Liang, Chao Huang, Filippos Bellos, Yolo Yunlong Tang, Qianxiang Shen, Jing Bi, Luchuan Song, Zeliang Zhang, Jason Corso, Chenliang Xu

2602.01618 2026-02-03 cs.CL

SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia

Panuthep Tasawong, Jian Gang Ngui, Alham Fikri Aji, Trevor Cohn, Peerat Limkonchotiwat

Comments Under review

2602.01614 2026-02-03 cs.LG cs.AI

AgroFlux: A Spatial-Temporal Benchmark for Carbon and Nitrogen Flux Prediction in Agricultural Ecosystems

Qi Cheng, Licheng Liu, Yao Zhang, Mu Hong, Yiqun Xie, Xiaowei Jia

2602.01613 2026-02-03 cs.LG cs.AI

A Practical Tensor-Network Compression Pipeline for Production-Scale Large Language Models

Sergii Kozyrev, Davyd Maiboroda

Comments 13 pages, 5 figures

2602.01611 2026-02-03 cs.LG

What Do Agents Learn from Trajectory-SFT: Semantics or Interfaces?

Weizheng Gu, Chengze Li, Zhuohao Yu, Mengyuan Sun, Zhibang Yang, Wei Wang, Hongrui Jia, Shikun Zhang, Wei Ye

2602.01610 2026-02-03 cs.AI cs.LG

ToPT: Task-Oriented Prompt Tuning for Urban Region Representation Learning

Zitao Guo, Changyang Jiang, Tianhong Zhao, Jinzhou Cao, Genan Dai, Bowen Zhang

Comments The paper has been accepted by ICASSP 2026

2602.01609 2026-02-03 cs.CV

Token Pruning for In-Context Generation in Diffusion Transformers

Junqing Lin, Xingyu Zheng, Pei Cheng, Bin Fu, Jingwei Sun, Guangzhong Sun

Comments 20 pages

2602.01608 2026-02-03 cs.AI

Reasoning with Autoregressive-Diffusion Collaborative Thoughts

Mu Yuan, Liekang Zeng, Guoliang Xing, Lan Zhang, Yunhao Liu

2602.01606 2026-02-03 cs.LG cs.AI

Boosting Maximum Entropy Reinforcement Learning via One-Step Flow Matching

Zeqiao Li, Yijing Wang, Haoyu Wang, Zheng Li, Zhiqiang Zuo

2602.01605 2026-02-03 cs.LG stat.ML

Universal Redundancies in Time Series Foundation Models

Anthony Bao, Venkata Hasith Vattikuti, Jeffrey Lai, William Gilpin

2602.01599 2026-02-03 cs.LG cs.AI

The Multiple Ticket Hypothesis: Random Sparse Subnetworks Suffice for RLVR

Israel Adewuyi, Solomon Okibe, Vladmir Ivanov

2602.01598 2026-02-03 cs.CL

The Art of Socratic Inquiry: A Framework for Proactive Template-Guided Therapeutic Conversation Generation

Mingwen Zhang, Minqiang Yang, Changsheng Ma, Yang Yu, Hui Bai, Chen Xu, Xiangzhen Kong, Bin Hu

2602.01594 2026-02-03 cs.CV

UV-M3TL: A Unified and Versatile Multimodal Multi-Task Learning Framework for Assistive Driving Perception

Wenzhuo Liu, Qiannan Guo, Zhen Wang, Wenshuo Wang, Lei Yang, Yicheng Qiao, Lening Wang, Zhiwei Li, Chen Lv, Shanghang Zhang, Junqiang Xi, Huaping Liu

2602.01593 2026-02-03 cs.CV

Samba+: General and Accurate Salient Object Detection via A More Unified Mamba-based Framework

Wenzhuo Zhao, Keren Fu, Jiahao He, Xiaohong Liu, Qijun Zhao, Guangtao Zhai

English abstract

Existing salient object detection (SOD) models are generally constrained by the limited receptive fields of convolutional neural networks (CNNs) and quadratic computational complexity of Transformers. Recently, the emerging state-space model, namely Mamba, has shown great potential in balancing global receptive fields and computational efficiency. As a solution, we propose Saliency Mamba (Samba), a pure Mamba-based architecture that flexibly handles various distinct SOD tasks, including RGB/RGB-D/RGB-T SOD, video SOD (VSOD), RGB-D VSOD, and visible-depth-thermal SOD. Specifically, we rethink the scanning strategy of Mamba for SOD, and introduce a saliency-guided Mamba block (SGMB) that features a spatial neighborhood scanning (SNS) algorithm to preserve the spatial continuity of salient regions. A context-aware upsampling (CAU) method is also proposed to promote hierarchical feature alignment and aggregation by modeling contextual dependencies. As one step further, to avoid the "task-specific" problem as in previous SOD solutions, we develop Samba+, which is empowered by training Samba in a multi-task joint manner, leading to a more unified and versatile model. Two crucial components that collaboratively tackle challenges encountered in input of arbitrary modalities and continual adaptation are investigated. Specifically, a hub-and-spoke graph attention (HGA) module facilitates adaptive cross-modal interactive fusion, and a modality-anchored continual learning (MACL) strategy alleviates inter-modal conflicts together with catastrophic forgetting. Extensive experiments demonstrate that Samba individually outperforms existing methods across six SOD tasks on 22 datasets with lower computational cost, whereas Samba+ achieves even superior results on these tasks and datasets by using a single trained versatile model. Additional results further demonstrate the potential of our Samba framework.

2602.01591 2026-02-03 cs.CV

Know Your Step: Faster and Better Alignment for Flow Matching Models via Step-aware Advantages

Zhixiong Yue, Zixuan Ni, Feiyang Ye, Jinshan Zhang, Sheng Shen, Zhenpeng Mi

2602.01587 2026-02-03 cs.CL cs.AI

Provable Defense Framework for LLM Jailbreaks via Noise-Augumented Alignment

Zehua Cheng, Jianwei Yang, Wei Dai, Jiahao Sun

Comments 10 pages

2602.01585 2026-02-03 cs.LG

A Lightweight Sparse Interaction Network for Time Series Forecasting

Xu Zhang, Qitong Wang, Peng Wang, Wei Wang

Comments The paper is published in AAAI Conference on Artificial Intelligence, AAAI 2025. The code is available at the link https://github.com/Meteor-Stars/LSINet

2602.01581 2026-02-03 cs.LG

Nearly Optimal Active Preference Learning and Its Application to LLM Alignment

Yao Zhao, Kwang-Sung Jun

2602.01574 2026-02-03 cs.CV

SGHA-Attack: Semantic-Guided Hierarchical Alignment for Transferable Targeted Attacks on Vision-Language Models

Haobo Wang, Weiqi Luo, Xiaojun Jia, Xiaochun Cao

2602.01570 2026-02-03 cs.CV

One-Step Diffusion for Perceptual Image Compression

Yiwen Jia, Hao Wei, Yanhui Zhou, Chenyang Ge

2602.01564 2026-02-03 cs.LG math.AP math.OC math.PR

Local Exponential Stability of Mean-Field Langevin Descent-Ascent in Wasserstein Space

Geuntaek Seo, Minseop Shin, Pierre Monmarché, Beomjun Choi

2602.01561 2026-02-03 cs.CV cs.AI

Multimodal UNcommonsense: From Odd to Ordinary and Ordinary to Odd

Yejin Son, Saejin Kim, Dongjun Min, Younjae Yu

Comments 24 pages

2602.01559 2026-02-03 cs.CV eess.IV

Combined Flicker-banding and Moire Removal for Screen-Captured Images

Libo Zhu, Zihan Zhou, Zhiyi Zhou, Yiyang Qu, Weihang Zhang, Keyu Shi, Yifan Fu, Yulun Zhang

2602.01558 2026-02-03 cs.LG

How Implicit Bias Accumulates and Propagates in LLM Long-term Memory

Yiming Ma, Lixu Wang, Lionel Z. Wang, Hongkun Yang, Haoming Sun, Xin Xu, Jiaqi Wu, Bin Chen, Wei Dong

Comments Under review, and the first two authors contribute equally

2602.01556 2026-02-03 cs.AI

Autonomous Question Formation for Large Language Model-Driven AI Systems

Hong Su

2602.01547 2026-02-03 cs.SD eess.AS

Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition

Qingran Yang, Botao Zhao, Zuheng Kang, Xue Li, Yayun He, Chuhang Liu, Xulong Zhang, Xiaoyang Qu, Junqing Peng, Jianzong Wang

Comments Accepted to 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)

2602.01541 2026-02-03 cs.CV cs.AI

Toward Cognitive Supersensing in Multimodal Large Language Model

Boyi Li, Yifan Shen, Yuanzhe Liu, Yifan Xu, Jiateng Liu, Xinzhuo Li, Zhengyuan Li, Jingyuan Zhu, Yunhan Zhong, Fangzhou Lan, Jianguo Cao, James M. Rehg, Heng Ji, Ismini Lourentzou, Xu Cao

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks