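arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.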

2604.03957 2026-04-07 cs.LG cs.CL

BWTA: Accurate and Efficient Binarized Transformer by Algorithm-Hardware Co-design

Yifu Ding, Xianglong Liu, Shenghao Jin, Jinyang Guo, Jiwen Lu

Comments Under review

Abstract

Ultra low-bit quantization brings substantial efficiency for Transformer-based models, but the accuracy degradation and limited GPU support hinder its wide usage. In this paper, we analyze zero-point distortion in binarization and propose a Binary Weights & Ternary Activations (BWTA) quantization scheme, which projects tiny values to zero and preserves the accuracy of extremely low-bit models. For training, we propose Smooth Multi-Stage Quantization, combining a Levelwise Degradation Strategy and a Magnitude-Alignment Projection Factor to enable stable and fast convergence. For inference, we develop a BWTA MatMul CUDA kernel with instruction-level parallel bit-packing and comprehensive binary/ternary MatMul implementations for both linear and attention operators, allowing seamless integration across Transformer architectures. Experiments show that BWTA approaches full-precision performance for BERT, with an average 3.5% drop on GLUE and less than 2% drop on five tasks, and achieves comparable perplexity and accuracy for LLMs. In efficiency, it delivers 16 to 24 times kernel-level speedup over FP16 on NVIDIA GPUs, and 216 to 330 tokens/s end-to-end prefill speedup with lower memory footprint on LLMs. As an algorithm-hardware co-design, BWTA demonstrates practical, low-latency ultra-low-bit inference without sacrificing model quality.
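
The binary-weight, ternary-activation idea in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the per-row mean-absolute-value scale, the fixed `threshold`, and the function names `binarize_weights` and `ternarize_activations` are assumptions made only for illustration, and the abstract does not specify how the paper chooses them.

```python
import numpy as np

def binarize_weights(w):
    """Map weights to {-1, +1} and return a per-row scale (assumed: mean |w|)."""
    scale = np.mean(np.abs(w), axis=1, keepdims=True)
    w_bin = np.where(w >= 0, 1, -1).astype(np.int8)
    return w_bin, scale

def ternarize_activations(x, threshold):
    """Map activations to {-1, 0, +1}; entries with |x| <= threshold become 0."""
    x_ter = np.zeros_like(x, dtype=np.int8)
    x_ter[x > threshold] = 1
    x_ter[x < -threshold] = -1
    return x_ter

# Toy example: a 2x3 weight matrix and a length-3 activation vector.
w = np.array([[0.8, -0.3, 0.05], [-0.6, 0.2, -0.9]])
x = np.array([0.02, -0.7, 0.4])

w_bin, scale = binarize_weights(w)
x_ter = ternarize_activations(x, threshold=0.1)  # threshold is a hypothetical value

# Integer matmul on the quantized operands, then rescale the result per row.
acc = w_bin.astype(np.int32) @ x_ter.astype(np.int32)
y = scale[:, 0] * acc
print(x_ter, acc, y)
```

In this toy case the small activation 0.02 is projected to zero, so it contributes nothing to the integer accumulation; a real kernel would bit-pack the operands rather than use `int8` arrays.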

2604.03953 2026-04-07 cs.CV cs.LG

Multimodal Structure Learning: Disentangling Shared and Specific Topology via Cross-Modal Graphical Lasso

Fei Wang, Yutong Zhang, Xiong Wang

Comments Submitted to a conference

2604.03950 2026-04-07 cs.LG cs.AI

Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference

Yifu Ding, Xinhao Zhang, Jinyang Guo

Comments CVPR Workshop EDGE 2026

2604.03941 2026-04-07 cs.CV

SafeCtrl: Region-Aware Safety Control for Text-to-Image Diffusion via Detect-Then-Suppress

Lingyun Zhang, Yu Xie, Zhongli Fang, Yu Liu, Ping Chen

Comments 6 pages, 5 figures, accepted to 2026 IEEE International Conference on Multimedia and Expo (ICME)

2604.03925 2026-04-07 cs.CL cs.AI

AdaptFuse: Training-Free Sequential Preference Learning via Externalized Bayesian Inference

Fangzhou Lin, Peiran Li, Shuo Xing, Siyuan Yang, Qianwen Ge, Kazunori Yamada, Ziming Zhang, Haichong Zhang, Zhengzhong Tu

Comments 20 pages, 4 figures, 5 tables

2604.03924 2026-04-07 cs.CL cs.AI

Uncertainty as a Planning Signal: Multi-Turn Decision Making for Goal-Oriented Conversation

Xinyi Ling, Ye Liu, Reza Averly, Xia Ning

2604.03922 2026-04-07 cs.LG

ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation

Hui Sun, Yun-Ji Zhang, Zheng Xie, Ren-Biao Liu, Yali Du, Xin-Ye Li, Ming Li

Comments 32 pages, 14 figures, 9 tables

2604.03919 2026-04-07 cs.CV cs.AI

Interpreting Video Representations with Spatio-Temporal Sparse Autoencoders

Atahan Dokme, Sriram Vishwanath

Comments 9 pages, 2 figures, 5 tables. Submitted to ACM Multimedia 2026

2604.03911 2026-04-07 cs.LG q-bio.QM

Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics

Aniketh Iyengar, Jiaqi Han, Pengwei Sun, Mingjian Jiang, Jianwen Xie, Stefano Ermon

Comments Published at ICLR 2026. 38 pages, 17 figures, 17 tables

2604.03904 2026-04-07 cs.CL cs.AI

I-CALM: Incentivizing Confidence-Aware Abstention for LLM Hallucination Mitigation

Haotian Zong, Binze Li, Yufei Long, Sinyin Chang, Jialong Wu, Gillian K. Hadfield

2604.03898 2026-04-07 cs.AI stat.CO

LLM-Agent-based Social Simulation for Attitude Diffusion

Deepak John Reji

2604.03891 2026-04-07 cs.LG

Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards

Yaoze Guo, Shana Moothedath

Abstract

Multi-task representation learning (MTRL) is an approach that learns shared latent representations across related tasks, facilitating collaborative learning that improves the overall learning efficiency. This paper studies MTRL for multi-task reinforcement learning (RL), where multiple tasks have the same state-action space and transition probabilities, but different rewards. We consider T linear Markov Decision Processes (MDPs) where the reward functions and transition dynamics admit linear feature embeddings of dimension d. The relatedness among the tasks is captured by a low-rank structure on the reward matrices. Learning shared representations across multiple RL tasks is challenging due to the complex and policy-dependent nature of data that leads to a temporal progression of error. Our approach adopts a reward-free reinforcement learning framework to first learn a data-collection policy. This policy then informs an exploration strategy for estimating the unknown reward matrices. Importantly, the data collected under this well-designed policy enable accurate estimation, which ultimately supports the learning of a near-optimal policy. Unlike existing approaches that rely on restrictive assumptions such as Gaussian features, incoherence conditions, or access to optimal solutions, we propose a low-rank matrix estimation method that operates under more general feature distributions encountered in RL settings. Theoretical analysis establishes that accurate low-rank matrix recovery is achievable under these relaxed assumptions, and we characterize the relationship between representation error and sample complexity. Leveraging the learned representation, we construct near-optimal policies and prove a regret bound. Experimental results demonstrate that our method effectively learns robust shared representations and task dynamics from finite data.
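
To make the low-rank reward structure in the abstract above concrete, here is a minimal NumPy sketch. It stacks the reward parameters of T tasks into a d x T matrix of rank r and denoises a noisy copy with a truncated SVD. The truncated SVD is a generic stand-in, not the estimator proposed in the paper, and the sizes `d`, `T`, `r`, the noise level, and the names `B`, `W`, and `truncated_svd` are assumptions chosen for illustration.

```python
import numpy as np

def truncated_svd(m, rank):
    """Best rank-`rank` approximation of m in Frobenius norm."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return u[:, :rank] @ np.diag(s[:rank]) @ vt[:rank, :]

rng = np.random.default_rng(0)
d, T, r = 8, 20, 2                # feature dim, number of tasks, assumed rank

B = rng.normal(size=(d, r))       # shared representation (hypothetical)
W = rng.normal(size=(r, T))       # task-specific coefficients (hypothetical)
theta = B @ W                     # d x T reward matrix with rank r

# Pretend each column was estimated separately, with independent noise.
noisy = theta + 0.1 * rng.normal(size=(d, T))

# Exploit the shared low-rank structure across tasks.
theta_hat = truncated_svd(noisy, r)

err_raw = np.linalg.norm(noisy - theta)
err_low_rank = np.linalg.norm(theta_hat - theta)
print(err_raw, err_low_rank)
```

The point of the sketch is only that pooling tasks through a rank constraint can reduce estimation error compared with treating each task in isolation; the paper's setting, with policy-dependent data, is considerably harder.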

2604.03890 2026-04-07 cs.RO

From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems

Mingyang Xie, Jin Wei-Kocsis

2604.03888 2026-04-07 cs.AI cs.CL cs.MA q-fin.TR

PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage

Rajat M. Barot, Arjun S. Borkhatariya

Comments 13 pages, 3 figures, 3 tables

2604.03878 2026-04-07 cs.CV

Learning 3D Reconstruction with Priors in Test Time

Lei Zhou, Haoyu Wu, Akshat Dave, Dimitris Samaras

Comments Accepted to CVPR2026. Code link: https://github.com/cvlab-stonybrook/TCO

2604.03877 2026-04-07 cs.CL cs.AI cs.LG

When Models Know More Than They Say: Probing Analogical Reasoning in LLMs

Hope McGovern, Caroline Craig, Thomas Lippincott, Hale Sirin

2604.03870 2026-04-07 cs.CL

Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs

Wenhui Zhu, Xuanzhao Dong, Xiwen Chen, Rui Cai, Peijie Qiu, Zhipeng Wang, Oana Frunza, Shao Tang, Jindong Gu, Yalin Wang

Abstract

The rapid deployment of open-source frameworks has significantly advanced the development of modern multi-agent systems. However, expanded action spaces, including uncontrolled privilege exposure and hidden inter-system interactions, pose severe security challenges. Specifically, Indirect Prompt Injections (IPI), which conceal malicious instructions within third-party content, can trigger unauthorized actions such as data exfiltration during normal operations. While current security evaluations predominantly rely on isolated single-turn benchmarks, the systemic vulnerabilities of these agents within complex dynamic environments remain critically underexplored. To bridge this gap, we systematically evaluate six defense strategies against four sophisticated IPI attack vectors across nine LLM backbones. Crucially, we conduct our evaluation entirely within dynamic multi-step tool-calling environments to capture the true attack surface of modern autonomous agents. Moving beyond binary success rates, our multidimensional analysis reveals a pronounced fragility. Advanced injections successfully bypass nearly all baseline defenses, and some surface-level mitigations even produce counterproductive side effects. Furthermore, while agents execute malicious instructions almost instantaneously, their internal states exhibit abnormally high decision entropy. Motivated by this latent hesitation, we investigate Representation Engineering (RepE) as a robust detection strategy. By extracting hidden states at the tool-input position, we show that the RepE-based circuit breaker successfully identifies and intercepts unauthorized actions before the agent commits to them, achieving high detection accuracy across diverse LLM backbones. This study exposes the limitations of current IPI defenses and provides a highly practical paradigm for building resilient multi-agent architectures.

2604.03867 2026-04-07 cs.LG

Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment

Soham Gadgil, Chris Lin, Su-In Lee

Comments Preprint

2604.03853 2026-04-07 cs.LG

Understanding When Poisson Log-Normal Models Outperform Penalized Poisson Regression for Microbiome Count Data

Daniel Agyapong, Julien Chiquet, Jane Marks, Toby Dylan Hocking

2604.03850 2026-04-07 cs.LG cs.NE

Collapse-Free Prototype Readout Layer for Transformer Encoders

Giansalvo Cirrincione, Rahul Ranjeev Kumar

Comments 35 pages, 6 figures, submitted to Pattern Recognition

2604.03841 2026-04-07 cs.CV

Training a Student Expert via Semi-Supervised Foundation Model Distillation

Pardis Taghavi, Tian Liu, Renjie Li, Reza Langari, Zhengzhong Tu

Comments Accepted to the 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 14 pages, 9 figures

2604.03839 2026-04-07 cs.CV

Beyond Task-Driven Features for Object Detection

Meilun Zhou, Alina Zare

Comments Accepted for Oral Presentation at the 46th IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2026, Washington D.C., United States. 4 pages and 4 figures

2604.03837 2026-04-07 cs.CV

Task-Guided Multi-Annotation Triplet Learning for Remote Sensing Representations

Meilun Zhou, Alina Zare

Comments Accepted for Oral Presentation at the 46th IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2026, Washington D.C., United States. 4 pages and 2 figures

2604.03833 2026-04-07 cs.CV

SPARK-IL: Spectral Retrieval-Augmented RAG for Knowledge-driven Deepfake Detection via Incremental Learning

Hessen Bougueffa Eutamene, Abdellah Zakaria Sellam, Abdelmalik Taleb-Ahmed, Abdenour Hadid

2604.03820 2026-04-07 cs.AI cs.CL

Affording Process Auditability with QualAnalyzer: An Atomistic LLM Analysis Tool for Qualitative Research

Max Hao Lu, Ryan Ellegood, Rony Rodriguez-Ramirez, Sophia Blumert

Comments 9 pages, 3 figures, BEA2026 Conference Submission

2604.03819 2026-04-07 cs.CV

ActivityForensics: A Comprehensive Benchmark for Localizing Manipulated Activity in Videos

Peijun Bao, Anwei Luo, Gang Pan, Alex C. Kot, Xudong Jiang

Comments [CVPR 2026] The first benchmark for action-level deepfake localization

2604.03814 2026-04-07 cs.CV cs.AI

InCaRPose: In-Cabin Relative Camera Pose Estimation Model and Dataset

Felix Stillger, Lukas Hahn, Frederik Hasecke, Tobias Meisen

Comments Accepted at the CVPR 2026 Workshop on Autonomous Driving (WAD)

2604.03809 2026-04-07 cs.LG cs.AI cs.MA

Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus

Dipkumar Patel

Comments 11 pages, 2 figures, 7 tables

2604.03806 2026-04-07 cs.CV

Bridging Restoration and Diagnosis: A Comprehensive Benchmark for Retinal Fundus Enhancement

Xuanzhao Dong, Wenhui Zhu, Xiwen Chen, Hao Wang, Xin Li, Yujian Xiong, Jiajun Cheng, Zhipeng Wang, Shao Tang, Oana Dumitrascu, Yalin Wang

Abstract

Over the past decade, generative models have demonstrated success in enhancing fundus images. However, the evaluation of these models remains a challenge. A benchmark for fundus image enhancement is needed for three main reasons: (1) Conventional denoising metrics such as PSNR and SSIM fail to capture clinically relevant features, such as lesion preservation and vessel morphology consistency, limiting their applicability in real-world settings; (2) There is a lack of unified evaluation protocols that address both paired and unpaired enhancement methods, particularly those guided by clinical expertise; and (3) An evaluation framework should provide actionable insights to guide future advancements in clinically aligned enhancement models. To address these gaps, we introduce EyeBench-V2, a benchmark designed to bridge the gap between enhancement model performance and clinical utility. Our work offers three key contributions: (1) Multi-dimensional clinical alignment through downstream evaluations: Beyond standard enhancement metrics, we assess performance across clinically meaningful tasks including vessel segmentation, diabetic retinopathy (DR) grading, generalization to unseen noise patterns, and lesion segmentation. (2) Expert-guided evaluation design: We curate a novel dataset enabling fair comparisons between paired and unpaired enhancement methods, accompanied by a structured manual assessment protocol by medical experts, which evaluates clinically critical aspects such as lesion structure alterations, background color shifts, and the introduction of artificial structures. (3) Actionable insights: Our benchmark provides a rigorous, task-oriented analysis of existing generative models, equipping clinical researchers with the evidence needed to make informed decisions, while also identifying limitations in current methods to inform the design of next-generation enhancement models.

2604.03803 2026-04-07 cs.CV cs.LG

Rényi Attention Entropy for Patch Pruning

Hiroaki Aizawa, Yuki Igaue

Comments Accepted to ICPR2026