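arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.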

2604.11809 2026-04-14 cs.CV

Who Handles Orientation? Investigating Invariance in Feature Matching

David Nordström, Johan Edstedt, Fredrik Kahl, Georg Bökman

Abstract

Finding matching keypoints between images is a core problem in 3D computer vision. However, modern matchers struggle with large in-plane rotations. A straightforward mitigation is to learn rotation invariance via data augmentation. However, it remains unclear at which stage rotation invariance should be incorporated. In this paper, we study this in the context of a modern sparse matching pipeline. We perform extensive experiments by training on a large collection of 3D vision datasets and evaluating on popular image matching benchmarks. Surprisingly, we find that incorporating rotation invariance already in the descriptor yields similar performance to handling it in the matcher. However, rotation invariance is achieved earlier in the matcher when it is learned in the descriptor, allowing for a faster rotation-invariant matcher. Further, we find that enforcing rotation invariance does not hurt upright performance when trained at scale. Finally, we study the emergence of rotation invariance through scale and find that increasing the training data size substantially improves generalization to rotated images. We release two matchers robust to in-plane rotations that achieve state-of-the-art performance on e.g. multi-modal (WxBS), extreme (HardMatch), and satellite image matching (SatAst). Code is available at https://github.com/davnords/loma.

2604.11806 2026-04-14 cs.AI cs.CL

Detecting Safety Violations Across Many Agent Traces

Adam Stein, Davis Brown, Hamed Hassani, Mayur Naik, Eric Wong

Comments 35 pages, 17 figures

2604.11805 2026-04-14 cs.LG cs.AI cs.CV cs.RO

Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak

Comments Project Webpage - https://sim2reason.github.io/

2604.11803 2026-04-14 cs.CL

Saar-Voice: A Multi-Speaker Saarbrücken Dialect Speech Corpus

Lena S. Oberkircher, Jesujoba O. Alabi, Dietrich Klakow, Jürgen Trouvain

Comments accepted at DialRes-LREC26

2604.11802 2026-04-14 cs.CL

Psychological Concept Neurons: Can Neural Control Bias Probing and Shift Generation in LLMs?

Yuto Harada, Hiro Taiyo Hamada

Abstract

Using psychological constructs such as the Big Five, large language models (LLMs) can imitate specific personality profiles and predict a user's personality. While LLMs can exhibit behaviors consistent with these constructs, it remains unclear where and how they are represented inside the model and how they relate to behavioral outputs. To address this gap, we focus on questionnaire-operationalized Big Five concepts, analyze the formation and localization of their internal representations, and use interventions to examine how these representations relate to behavioral outputs. In our experiment, we first use probing to examine where Big Five information emerges across model depth. We then identify neurons that respond selectively to each Big Five concept and test whether enhancing or suppressing their activations can bias latent representations and label generation in intended directions. We find that Big Five information becomes rapidly decodable in early layers and remains detectable through the final layers, while concept-selective neurons are most prevalent in mid layers and exhibit limited overlap across domains. Interventions on these neurons consistently shift probe readouts toward targeted concepts, with targeted success rates exceeding 0.8 for some concepts, indicating that the model's internal separation of Big Five personality traits can be causally steered. At the label-generation level, the same interventions often bias generated label distributions in the intended directions, but the effects are weaker, more concept-dependent, and often accompanied by cross-trait spillover, indicating that comparable control over generated labels is difficult even with interventions on a large fraction of concept-selective neurons. Overall, our findings reveal a gap between representational control and behavioral control in LLMs.

2604.11801 2026-04-14 cs.CL

CLSGen: A Dual-Head Fine-Tuning Framework for Joint Probabilistic Classification and Verbalized Explanation

WonJin Yoon, Kangyu Zhu, Ian Bulovic, Autumn Sehy, Yanjun Gao, Dmitriy Dligach, Majid Afshar, Timothy A. Miller

2604.11798 2026-04-14 cs.CV cs.AI

Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net

Ricardo Coimbra Brioso, Lorenzo Mondo, Damiano Dei, Nicola Lambri, Pietro Mancosu, Marta Scorsetti, Daniele Loiacono

2604.11793 2026-04-14 cs.RO

Disentangled Point Diffusion for Precise Object Placement

Lyuxing He, Eric Cai, Shobhit Aggarwal, Jianjun Wang, David Held

2604.11792 2026-04-14 cs.CV

LottieGPT: Tokenizing Vector Animation for Autoregressive Generation

Junhao Chen, Kejun Gao, Yuehan Cui, Mingze Sun, Mingjin Chen, Shaohui Wang, Xiaoxiao Long, Fei Ma, Qi Tian, Ruqi Huang, Hao Zhao

Comments Accepted by CVPR 2026. Project Page: https://lottiegpt.github.io/

2604.11791 2026-04-14 cs.LG cs.AI

A Mechanistic Analysis of Looped Reasoning Language Models

Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong

Comments 39 pages, 63 figures

2604.11788 2026-04-14 cs.CV

HDR Video Generation via Latent Alignment with Logarithmic Encoding

Naomi Ken Korem, Mohamed Oumoumad, Harel Cain, Matan Ben Yosef, Urska Jelercic, Ofir Bibi, Yaron Inger, Or Patashnik, Daniel Cohen-Or

Comments https://HDR-LumiVid.github.io

2604.11786 2026-04-14 cs.AI cs.MA

GenTac: Generative Modeling and Forecasting of Soccer Tactics

Jiayuan Rao, Tianlin Gui, Haoning Wu, Yanfeng Wang, Weidi Xie

Comments 40 pages, 5 figures; Technical Report

2604.11784 2026-04-14 cs.LG cs.AI cs.CL cs.CV

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

2604.11778 2026-04-14 cs.CL cs.AI

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

Junlin Liu, Shengnan An, Shuang Zhou, Dan Ma, Shixiong Luo, Ying Xie, Yuan Zhang, Wenling Yuan, Yifan Zhou, Xiaoyu Li, Ziwen Wang, Xuezhi Cao, Xunliang Cai

Comments 17 pages, 9 figures

2604.11775 2026-04-14 cs.CV cs.AI

Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation

Ricardo Coimbra Brioso, Giulio Sichili, Damiano Dei, Nicola Lambri, Pietro Mancosu, Marta Scorsetti, Daniele Loiacono

2604.11773 2026-04-14 cs.LG cond-mat.mtrl-sci cs.CV

Autonomous Diffractometry Enabled by Visual Reinforcement Learning

J. Oppliger, M. Stifter, A. Rüegg, I. Biało, L. Martinelli, P. G. Freeman, D. Prabhakaran, J. Zhao, Q. Wang, J. Chang

Comments 20 pages, 16 figures

2604.11768 2026-04-14 cs.RO

Identifying Inductive Biases for Robot Co-Design

Apoorv Vaish, Oliver Brock

2604.11762 2026-04-14 cs.CV cs.LG eess.SP physics.med-ph stat.ML

MosaicMRI: A Diverse Dataset and Benchmark for Raw Musculoskeletal MRI

Paula Arguello, Berk Tinaz, Mohammad Shahab Sepehri, Maryam Soltanolkotabi, Mahdi Soltanolkotabi

Comments 15 pages, 6 figures, preliminary version

2604.11757 2026-04-14 cs.RO cs.AI cs.CV

StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems

Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu, Jiaya Jia

2604.11753 2026-04-14 cs.CL

Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks

Yoonsang Lee, Howard Yen, Xi Ye, Danqi Chen

2604.11751 2026-04-14 cs.RO cs.AI

Grounded World Model for Semantically Generalizable Planning

Quanyi Li, Lan Feng, Haonan Zhang, Wuyang Li, Letian Wang, Alexandre Alahi, Harold Soh

2604.11749 2026-04-14 cs.CL

HistLens: Mapping Idea Change across Concepts and Corpora

Yi Jing, Weiyun Qiu, Yihang Peng, Zhifang Sui

Comments Accepted by ACL 2026 Main Conference

2604.11744 2026-04-14 cs.LG

KL Divergence Between Gaussians: A Step-by-Step Derivation for the Variational Autoencoder Objective

Andrés Muñoz, Rodrigo Ramele

Comments 8 pages, no figures. Derivation of the KL divergence between Gaussian distributions with application to Variational Autoencoders (VAEs)
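
For orientation, the standard closed form that a derivation of this kind arrives at is shown below. It assumes the usual VAE setting of a diagonal Gaussian posterior with mean $\mu$ and variances $\sigma^2$ in $d$ dimensions, measured against a standard normal prior; the paper's own notation and level of generality may differ.

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\right)
  = \frac{1}{2} \sum_{j=1}^{d} \left( \sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2 \right)
```

Each summand is zero exactly when $\mu_j = 0$ and $\sigma_j^2 = 1$, so the divergence vanishes only when the posterior matches the prior.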

2604.11742 2026-04-14 cs.CL cs.AI

Discourse Diversity in Multi-Turn Empathic Dialogue

Hongli Zhan, Emma S. Gueorguieva, Javier Hernandez, Jina Suh, Desmond C. Ong, Junyi Jessy Li

Abstract

Large language models (LLMs) produce responses rated as highly empathic in single-turn settings (Ayers et al., 2023; Lee et al., 2024), yet they are also known to be formulaic generators that reuse the same lexical patterns, syntactic templates, and discourse structures across tasks (Jiang et al., 2025; Shaib et al., 2024; Namuduri et al., 2025). Less attention has been paid to whether this formulaicity extends to the level of discourse moves, i.e., what a response does for the person it is addressing. This question is especially consequential for empathic dialogue, where effective support demands not just a kind response at one moment but varied strategies as a conversation unfolds (Stiles et al., 1998). Indeed, prior work shows that LLMs reuse the same tactic sequences more than human supporters in single-turn settings (Gueorguieva et al., 2026). We extend this analysis to multi-turn conversations and find that the rigidity compounds: once a tactic appears in a supporter turn, LLMs reuse it in the next at nearly double the rate of humans (0.50-0.56 vs. 0.27). This pattern holds across LLMs serving as supporters in real emotional support conversations, and is invisible to standard similarity metrics. To address this gap, we introduce MINT (Multi-turn Inter-tactic Novelty Training), the first reinforcement learning framework to optimize discourse move diversity across multi-turn empathic dialogue. The best MINT variant combines an empathy quality reward with a cross-turn tactic novelty signal, improving aggregate empathy by 25.3% over vanilla across 1.7B and 4B models while reducing cross-turn discourse move repetition by 26.3% on the 4B model, surpassing all baselines including quality-only and token-level diversity methods on both measures. These results suggest that what current models lack is not empathy itself, but the ability to vary their discourse moves across a conversation.
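
The cross-turn reuse figure quoted in the abstract (0.50-0.56 vs. 0.27) can be made concrete with a small sketch. The definition used here is an assumption, not the paper's: each supporter turn is represented as a set of tactic labels, and the rate is the share of tactic occurrences that reappear in the immediately following supporter turn. The function name and the tactic labels are hypothetical.

```python
from typing import List, Set


def cross_turn_reuse_rate(turns: List[Set[str]]) -> float:
    """Share of tactic occurrences that reappear in the next supporter turn.

    Assumed definition for illustration only; the paper may compute
    its repetition metric differently.
    """
    reused = 0
    total = 0
    # Compare every supporter turn with the one that follows it.
    for current, following in zip(turns, turns[1:]):
        total += len(current)
        reused += len(current & following)
    # No consecutive pairs (or no tactics) means there is nothing to reuse.
    return reused / total if total else 0.0


# Hypothetical tactic labels for three consecutive supporter turns.
conversation = [
    {"validation", "question"},
    {"validation", "advice"},
    {"advice"},
]
rate = cross_turn_reuse_rate(conversation)
print(rate)
```

Under this assumed definition, a supporter that repeats the same tactics every turn scores 1.0 and one that never repeats a tactic in consecutive turns scores 0.0.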

2604.11741 2026-04-14 cs.AI

Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games

Keyang Zhong, Junlin Xie, Hefeng Wu, Haofeng Li, Guanbin Li

Comments 9 pages, 5 figures, Findings of ACL 2026

2604.11737 2026-04-14 cs.CV

Learning Long-term Motion Embeddings for Efficient Kinematics Generation

Nick Stracke, Kolja Bauer, Stefan Andreas Baumann, Miguel Angel Bautista, Josh Susskind, Björn Ommer

Comments for the project page and code, view https://compvis.github.io/long-term-motion

2604.11724 2026-04-14 cs.CV

The Devil is in the Details -- From OCR for Old Church Slavonic to Purely Visual Stemma Reconstruction

Armin Hoenen

Comments International conference at Valamo monastery, Finland, 2026

2604.11720 2026-04-14 cs.CV cs.AI cs.CR

On the Robustness of Watermarking for Autoregressive Image Generation

Andreas Müller, Denis Lukovnikov, Shingo Kodama, Minh Pham, Anubhav Jain, Jonathan Petit, Niv Cohen, Asja Fischer

2604.11716 2026-04-14 cs.AI cs.CL

SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context

Shuquan Lian, Juncheng Liu, Yazhe Chen, Yuhong Chen, Hui Li

2604.11714 2026-04-14 cs.CV

BEM: Training-Free Background Embedding Memory for False-Positive Suppression in Real-Time Fixed-Background Camera

Junwoo Park, Jangho Lee, Sunho Lim

Comments Accepted to ICPR 2026