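arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.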

2604.14722 2026-04-17 cs.LG

A Mechanistic Account of Attention Sinks in GPT-2: One Circuit, Broader Implications for Mitigation

Yuval Ran-Milo, Hila Ofek, Shahar Mendel

Comments 9 pages, 8 figures

Abstract

Transformers commonly exhibit an attention sink: disproportionately high attention to the first position. We study this behavior in GPT-2-style models with learned query biases and absolute positional embeddings. Combining structural analysis with causal interventions, validated across natural-language, mathematical, and code inputs, we find that the sink arises from the interaction among (i) a learned query bias, (ii) the first-layer MLP transformation of the positional encoding, and (iii) structure in the key projection. Crucially, each component we identify is individually dispensable: architectures omitting each of them robustly exhibit sinks. This indicates that attention sinks may arise through distinct circuits across architectures. These findings inform mitigation of sinks, and motivate broader investigation into why sinks emerge.

2604.14720 2026-04-17 cs.CV

Data Synthesis Improves 3D Myotube Instance Segmentation

David Exler, Nils Friederich, Martin Krüger, John Jbeily, Mario Vitacolonna, Rüdiger Rudolf, Ralf Mikut, Markus Reischl

Comments 4 pages, 4 figures, submitted to BMT (VDE) 2026 Conference

2604.14718 2026-04-17 cs.AI cond-mat.dis-nn hep-th

The Agentification of Scientific Research: A Physicist's Perspective

Xiao-Liang Qi

Comments 14 pages, 4 figures

2604.14712 2026-04-17 cs.AI

SGA-MCTS: Decoupling Planning from Execution via Training-Free Atomic Experience Retrieval

Xin Xie, Dongyun Xue, Wuguannan Yao, Mingxiao Feng, Wengang Zhou, Xiang Qi, Houqiang Li, Peng Zhang

2604.14711 2026-04-17 cs.CV

MS-SSE-Net: A Multi-Scale Spatial Squeeze-and-Excitation Network for Structural Damage Detection in Civil and Geotechnical Engineering

Saif ur Rehman Khan, Imad Ahmed Waqar, Arooj Zaib, Saad Ahmed, Sebastian Vollmer, Andreas Dengel, Muhammad Nabeel Asim

2604.14710 2026-04-17 cs.CV

G-MIXER: Geodesic Mixup-based Implicit Semantic Expansion and Explicit Semantic Re-ranking for Zero-Shot Composed Image Retrieval

Jiyoung Lim, Heejae Yang, Jee-Hyong Lee

Comments CVPR 2026 Accepted

2604.14709 2026-04-17 cs.AI

HWE-Bench: Benchmarking LLM Agents on Real-World Hardware Bug Repair Tasks

Fan Cui, Hongyuan Hou, Zizhang Luo, Chenyun Yin, Yun Liang

2604.14706 2026-04-17 cs.CV

NG-GS: NeRF-Guided 3D Gaussian Splatting Segmentation

Yi He, Tao Wang, Yi Jin, Congyan Lang, Yidong Li, Haibin Ling

Comments Accepted to CVPR 2026 (Highlight)

2604.14705 2026-04-17 cs.AI

SynHAT: A Two-stage Coarse-to-Fine Diffusion Framework for Synthesizing Human Activity Traces

Rongchao Xu, Lin Jiang, Dahai Yu, Ximiao Li, Guang Wang

DOI: 10.1145/3810213

Abstract

Human activity traces (HATs) are critical for many applications, including human mobility modeling and point-of-interest (POI) recommendation. However, growing privacy concerns have severely limited access to authentic large-scale HAT datasets. Recent advances in generative AI provide new opportunities to synthesize realistic and privacy-preserving HATs for such applications. Yet two major challenges remain: (i) HATs are highly irregular and dynamic, with long and varying time intervals, making it difficult to capture their complex spatio-temporal dependencies and underlying distributions; and (ii) generative models are often computationally expensive, making long-term, fine-grained HAT synthesis inefficient. To address these challenges, we propose SynHAT, a computationally efficient coarse-to-fine HAT synthesis framework built on a novel spatio-temporal denoising diffusion model. In Stage 1, we develop Coarse-HADiff, which models the overall spatio-temporal dependencies of coarse-grained latent spatio-temporal traces. It incorporates a novel Latent Spatio-Temporal U-Net with dual Drift-Jitter branches to jointly model smooth spatial transitions and temporal variations during denoising. In Stage 2, we introduce a three-step pipeline consisting of Behavior Pattern Extraction, Fine-HADiff, which shares the same architecture as Coarse-HADiff, and Semantic Alignment to generate fine-grained latent spatio-temporal traces from the Stage 1 outputs. We extensively evaluate SynHAT in terms of data fidelity, utility, privacy, robustness, and scalability. Experiments on real-world HAT datasets from four cities across three countries show that SynHAT substantially outperforms state-of-the-art baselines, achieving 52% and 33% improvements on spatial and temporal metrics, respectively.

2604.14703 2026-04-17 cs.CV

The Courtroom Trial of Pixels: Robust Image Manipulation Localization via Adversarial Evidence and Reinforcement Learning Judgment

Songlin Li, Zhiqing Guo, Dan Ma, Changtao Miao, Gaobo Yang

2604.14702 2026-04-17 cs.LG stat.ML

Gating Enables Curvature: A Geometric Expressivity Gap in Attention

Satwik Bathula, Anand A. Joshi

Comments 41 pages, 9 figures

2604.14687 2026-04-17 cs.AI

M2-PALE: A Framework for Explaining Multi-Agent MCTS–Minimax Hybrids via Process Mining and LLMs

Yiyu Qian, Liyuan Zhao, Tim Miller

2604.14683 2026-04-17 cs.AI

DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation

Qianqian Xie, Qingheng Xiong, He Zhu, Tiantian Xia, Xueming Han, Fanyu Meng, Jiakai Wang, Zhiqi Bai, Chengkang Jiang, Zhaohui Wang, Yubin Guo, Yuqing Wen, Jiayang Mao, Zijie Zhang, Shihao Li, Yanghai Wang, Yuxiang Ren, Junlan Feng, Jiaheng Liu

2604.14682 2026-04-17 cs.AI cs.CL

Acceptance Dynamics Across Cognitive Domains in Speculative Decoding

Saif Mahmoud

2604.14672 2026-04-17 cs.CL

SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models

Binxian Su, Haoye Lou, Shucheng Zhu, Weikang Wang, Ying Liu, Dong Yu, Pengyuan Liu

Comments Accepted by ACL 2026

2604.14669 2026-04-17 cs.LG math.DS math.OC stat.ML

Zeroth-Order Optimization at the Edge of Stability

Minhak Song, Liang Zhang, Bingcong Li, Niao He, Michael Muehlebach, Sewoong Oh

Comments 38 pages

2604.14656 2026-04-17 cs.AI cs.CL cs.CV

Rethinking Patient Education as Multi-turn Multi-modal Interaction

Zonghai Yao, Zhipeng Tang, Chengtao Lin, Xiong Luo, Benlu Wang, Juncheng Huang, Chin Siang Ong, Hong Yu

Comments Equal contribution for the first two authors

Abstract

Most medical multimodal benchmarks focus on static tasks such as image question answering, report generation, and plain-language rewriting. Patient education is more demanding: systems must identify relevant evidence across images, show patients where to look, explain findings in accessible language, and handle confusion or distress. Yet most patient education work remains text-only, even though combined image-and-text explanations may better support understanding. We introduce MedImageEdu, a benchmark for multi-turn, evidence-grounded radiology patient education. Each case provides a radiology report with report text and case images. A DoctorAgent interacts with a PatientAgent, conditioned on a hidden profile that captures factors such as education level, health literacy, and personality. When a patient question would benefit from visual support, the DoctorAgent can issue drawing instructions grounded in the report, case images, and the current question to a benchmark-provided drawing tool. The tool returns image(s), after which the DoctorAgent produces a final multimodal response consisting of the image(s) and a grounded plain-language explanation. MedImageEdu contains 150 cases from three sources and evaluates both the consultation process and the final multimodal response along five dimensions: Consultation, Safety and Scope, Language Quality, Drawing Quality, and Image-Text Response Quality. Across representative open- and closed-source vision-language model agents, we find three consistent gaps: fluent language often outpaces faithful visual grounding, safety is the weakest dimension across disease categories, and emotionally tense interactions are harder than low education or low health literacy. MedImageEdu provides a controlled testbed for assessing whether multimodal agents can teach from evidence rather than merely answer from text.

2604.14652 2026-04-17 cs.RO

DigiForest: Digital Analytics and Robotics for Sustainable Forestry

Marco Camurri, Enrico Tomelleri, Matías Mattamala, Sebastián Barbas Laina, Martin Jacquet, Jens Behley, Sunni Kanta Prasad Kushwaha, Fang Nan, Nived Chebrolu, Leonard Freißmuth, Marvin Chayton Harms, Meher V. R. Malladi, Fan Yang, Jonas Frey, Cesar Cadena, Marco Hutter, Janine Schweier, Kostas Alexis, Cyrill Stachniss, Maurice Fallon, Stefan Leutenegger

Comments 34 pages, 24 figures

2604.14648 2026-04-17 cs.CV cs.AI cs.LG

Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting

Inseok Jeon, Minhyeok Lee, Seunghoon Lee, Minseok Kang, Suhwan Cho, Sangyoun Lee

Comments 8 pages, 8 figures (main paper); 9 pages, 10 figures (supplementary). Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026, Findings

2604.14645 2026-04-17 cs.CV cs.AI nlin.CD

Chaotic CNN for Limited Data Image Classification

Anusree M, Akhila Henry, Pramod P Nair

2604.14644 2026-04-17 cs.CL cs.LG

CURaTE: Continual Unlearning in Real Time with Ensured Preservation of LLM Knowledge

Seyun Bae, Seokhan Lee, Eunho Yang

Comments Accepted to Findings of ACL 2026

2604.14643 2026-04-17 cs.CV cs.LG

Physically-Induced Atmospheric Adversarial Perturbations: Enhancing Transferability and Robustness in Remote Sensing Image Classification

Weiwei Zhuang, Wangze Xie, Qi Zhang, Xia Du, Zihan Lin, Zheng Lin, Hanlin Cai, Jizhe Zhou, Zihan Fang, Chi-man Pun, Wei Ni, Jun Luo

Comments 14 pages, 11 figures

2604.14641 2026-04-17 cs.AI

Learning to Draw ASCII Improves Spatial Reasoning in Language Models

Shiyuan Huang, Li Liu, Jincheng He, Leilani H. Gilpin

2604.14635 2026-04-17 cs.RO

A multi-platform LiDAR dataset for standardized forest inventory measurement at long term ecological monitoring sites

Michael R. Chang, Anna Candotti, Karl von Ellenrieder, Enrico Tomelleri, Marco Camurri

Comments 30 pages, 7 figures

2604.14634 2026-04-17 cs.CL

Pushing the Boundaries of Multiple Choice Evaluation to One Hundred Options

Nahyun Lee, Guijin Son

2604.14632 2026-04-17 cs.CV

High-Speed Full-Color HDR Imaging via Unwrapping Modulo-Encoded Spike Streams

Chu Zhou, Siqi Yang, Kailong Zhang, Heng Guo, Zhaofei Yu, Boxin Shi, Imari Sato

Comments TPAMI under review

2604.14631 2026-04-17 cs.CL cs.AI

StoryCoder: Narrative Reformulation for Structured Reasoning in LLM Code Generation

Geonhui Jang, Dongyoon Han, YoungJoon Yoo

Comments 21 pages, 12 figures. ACL 2026 Main Conference

2604.14630 2026-04-17 cs.CV cs.LG

CMTM: Cross-Modal Token Modulation for Unsupervised Video Object Segmentation

Inseok Jeon, Suhwan Cho, Minhyeok Lee, Seunghoon Lee, Minseok Kang, Jungho Lee, Chaewon Park, Donghyeong Kim, Sangyoun Lee

Comments 6 pages, 5 figures. Accepted to IEEE ICIP 2025

2604.14629 2026-04-17 cs.CV

Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models

Haoyi Sun, Xiaoxiao Wang, Ning Mao, Qian Wang, Lifu Mu, Wen Zheng, Tao Wei, Wei Chen

Comments 11 pages, 3 figures

2604.14627 2026-04-17 cs.AI

A Parallel Approach to Counting Exact Covers Based on Decomposability Property

Liangda Fang, Yaohui Luo, Delong Li, Xuanxiang Huang, Quanlong Guan

Comments Submitted to SAT 2026