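arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.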

2604.03436 2026-04-07 cs.LG cs.AI

MetaSAEs: Joint Training with a Decomposability Penalty Produces More Atomic Sparse Autoencoder Latents

Matthew Levinson

Abstract

Sparse autoencoders (SAEs) are increasingly used for safety-relevant applications including alignment detection and model steering. These use cases require SAE latents to be as atomic as possible. Each latent should represent a single coherent concept drawn from a single underlying representational subspace. In practice, SAE latents blend representational subspaces together. A single feature can activate across semantically distinct contexts that share no true common representation, muddying an already complex picture of model computation. We introduce a joint training objective that directly penalizes this subspace blending. A small meta SAE is trained alongside the primary SAE to sparsely reconstruct the primary SAE's decoder columns; the primary SAE is penalized whenever its decoder directions are easy to reconstruct from the meta dictionary. This occurs whenever latent directions lie in a subspace spanned by other primary directions. This creates gradient pressure toward more mutually independent decoder directions that resist sparse meta-compression. On GPT-2 large (layer 20), the selected configuration reduces mean $|φ|$ by 7.5% relative to an identical solo SAE trained on the same data. Automated interpretability (fuzzing) scores improve by 7.6%, providing external validation of the atomicity gain independent of the training and co-occurrence metrics. Reconstruction overhead is modest. Results on Gemma 2 9B are directional. On not-fully-converged SAEs, the same parameterization yields the best results, a $+8.6\%$ $Δ$Fuzz. Though directional, this is an encouraging sign that the method transfers to a larger model. Qualitative analysis confirms that features firing on polysemantic tokens are split into semantically distinct sub-features, each specializing in a distinct representational subspace.
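The penalty idea in the abstract above can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the one-shot top-k sparse coding, the cosine-similarity score, and the weight `lam` are all assumptions made for illustration. It only shows the general shape of "penalize decoder directions that a small meta dictionary can sparsely reconstruct".

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def unit(a):
    n = norm(a)
    return [x / n for x in a]


def sparse_reconstruct(direction, meta_dictionary, k):
    """Reconstruct `direction` from the k meta atoms with the largest
    absolute inner product (a crude one-shot sparse code; hypothetical)."""
    scores = [dot(direction, atom) for atom in meta_dictionary]
    top = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)[:k]
    recon = [0.0] * len(direction)
    for i in top:
        for j, v in enumerate(meta_dictionary[i]):
            recon[j] += scores[i] * v
    return recon


def decomposability_penalty(decoder_directions, meta_dictionary, k=1):
    """Mean cosine similarity between each decoder direction and its sparse
    meta reconstruction. Higher means easier to reconstruct (assumed proxy)."""
    total = 0.0
    for d in decoder_directions:
        d = unit(d)
        r = sparse_reconstruct(d, meta_dictionary, k)
        rn = norm(r)
        total += 0.0 if rn == 0 else dot(d, r) / rn
    return total / len(decoder_directions)


def joint_loss(reconstruction_loss, penalty, lam=0.1):
    """Primary SAE loss plus the weighted penalty (lam is a made-up weight)."""
    return reconstruction_loss + lam * penalty


meta = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
directions = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(decomposability_penalty(directions, meta, k=1))
```

In this toy setup, a direction lying inside the span of the meta atoms scores close to 1, while a direction orthogonal to them scores 0, so minimizing the joint loss would push directions away from what the meta dictionary can cheaply express.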

2604.03428 2026-04-07 cs.CV cs.AI cs.LG

Inference-Path Optimization via Circuit Duplication in Frozen Visual Transformers for Marine Species Classification

Thomas Manuel Rost

Comments pre study, more ablations to come

2604.03427 2026-04-07 cs.LG cs.SY eess.SY

Adversarial Robustness of Deep State Space Models for Forecasting

Sribalaji C. Anand, George J. Pappas

Comments 8 pages, 5 figures, conference submission

2604.03426 2026-04-07 cs.CV

Automated Segmentation and Tracking of Group Housed Pigs Using Foundation Models

Ye Bi, Bimala Acharya, David Rosero, Juan Steibel

Abstract

Foundation models (FMs) are reshaping computer vision by reducing reliance on task-specific supervised learning and leveraging general visual representations learned at scale. In precision livestock farming, most pipelines remain dominated by supervised learning models that require extensive labeled data, repeated retraining, and farm-specific tuning. This study presents an FM-centered workflow for automated monitoring of group-housed nursery pigs, in which pretrained vision-language FMs serve as general visual backbones and farm-specific adaptation is achieved through modular post-processing. Grounding-DINO was first applied to 1,418 annotated images to establish a baseline detection performance. While detection accuracy was high under daytime conditions, performance degraded under night-vision and heavy occlusion, motivating the integration of temporal tracking logic. Building on these detections, short-term video segmentation with Grounded-SAM2 was evaluated on 550 one-minute video clips; after post-processing, over 80% of 4,927 active tracks were fully correct, with most remaining errors arising from inaccurate masks or duplicated labels. To support identity consistency over an extended time, we further developed a long-term tracking pipeline integrating initialization, tracking, matching, mask refinement, re-identification, and post-hoc quality control. This system was evaluated on a continuous 132-minute video and maintained stable identities throughout. On 132 uniformly sampled ground-truth frames, the system achieved a mean region similarity (J) of 0.83, contour accuracy (F) of 0.92, J&F of 0.87, MOTA of 0.99, and MOTP of 90.7%, with no identity switches. Overall, this work demonstrates how FM prior knowledge can be combined with lightweight, task-specific logic to enable scalable, label-efficient, and long-duration monitoring in pig production.
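For readers unfamiliar with the metrics quoted in the abstract above, the following minimal sketch shows the commonly used definitions: region similarity J as the Jaccard index (intersection over union) of binary masks, J&F as the mean of J and F, and MOTA as one minus the ratio of tracking errors to ground-truth objects. The function names and the toy masks are illustrative assumptions; the abstract does not state which evaluation code the authors used.

```python
def region_similarity(pred_mask, gt_mask):
    """Jaccard index (IoU) of two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j, f):
    """J&F is the arithmetic mean of region similarity and contour accuracy."""
    return (j + f) / 2


def mota(false_negatives, false_positives, id_switches, num_gt_objects):
    """MOTA = 1 - (FN + FP + IDSW) / GT, summed over all evaluated frames."""
    if num_gt_objects <= 0:
        raise ValueError("num_gt_objects must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt_objects


# Toy masks: 2 overlapping pixels out of 3 in the union.
pred = [[1, 1, 0], [0, 1, 0]]
gt = [[1, 1, 0], [0, 0, 0]]
print(region_similarity(pred, gt))
print(j_and_f(0.83, 0.92))
```

Averaging the rounded J and F values from the abstract gives about 0.875, which is consistent with the reported J&F of 0.87 if the authors averaged unrounded scores.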

2604.03422 2026-04-07 cs.CL

Towards a theory of morphology-driven marking in the lexicon: The case of the state

Mohamed El Idrissi

Comments 32 pages, 1 figure

2604.03417 2026-04-07 cs.LG

Beauty in the Eye of AI: Aligning LLMs and Vision Models with Human Aesthetics in Network Visualization

Peng Zhang, Xuefeng Li, Xiaoqi Wang, Han-Wei Shen, Yifan Hu

2604.03414 2026-04-07 cs.CV

KiToke: Kernel-based Interval-aware Token Compression for Video Large Language Models

Haifeng Huang, Yang Li

2604.03404 2026-04-07 cs.RO cs.LG

Diffusion Policy with Bayesian Expert Selection for Active Multi-Target Tracking

Haotian Xiang, Qin Lu, Yaakov Bar-Shalom

2604.03400 2026-04-07 cs.CV cs.AI cs.LG

Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro

Kenan Tang, Praveen Arunshankar, Andong Hua, Anthony Yang, Yao Qin

Comments Accepted to CVPR 2026 Workshop on Agentic AI for Visual Media

2604.03397 2026-04-07 cs.RO

Learning-Based Fault Detection for Legged Robots in Remote Dynamic Environments

Abriana Stewart-Height, Seema Jahagirdar, Nikolai Matni

2604.03395 2026-04-07 cs.CL

Are Arabic Benchmarks Reliable? QIMMA's Quality-First Approach to LLM Evaluation

Leen AlQadi, Ahmed Alzubaidi, Mohammed Alyafeai, Hamza Alobeidli, Maitha Alhammadi, Shaikha Alsuwaidi, Omar Alkaabi, Basma El Amel Boussaha, Hakim Hacid

2604.03393 2026-04-07 cs.AI

TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering

Tung Sum Thomas Kwok, Xinyu Wang, Xiaofeng Lin, Peng Lu, Chunhe Wang, Changlun Li, Hanwei Wu, Nan Tang, Elisa Kreiss, Guang Cheng

2604.03388 2026-04-07 cs.LG stat.ML

Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapters

Haotian Xiang, Bingcong Li, Qin Lu

Abstract

When deploying large language models (LLMs) to safety-critical applications, uncertainty quantification (UQ) is of utmost importance to self-assess the reliability of the LLM-based decisions. However, such decisions typically suffer from overconfidence, particularly after parameter-efficient fine-tuning (PEFT) for downstream domain-specific tasks with limited data. Existing methods to alleviate this issue either rely on a Laplace-approximation-based post-hoc framework, which may yield suboptimal calibration depending on the training trajectory, or on variational Bayesian training that requires multiple complete forward passes through the entire LLM backbone at inference time for Monte Carlo estimation, posing scalability challenges for deployment. To address these limitations, we build on the Bayesian last layer (BLL) model, where the LLM-based deterministic feature extractor is followed by random last layer parameters for uncertainty reasoning. Since existing low-rank adapters (LoRA) for PEFT have limited expressiveness due to rank collapse, we address this with Polar-decomposed Low-rank Adapter Representation (PoLAR), an orthogonalized parameterization paired with Riemannian optimization to enable more stable and expressive adaptation. Building on this PoLAR-BLL model, we leverage the variational (V) inference framework to put forth a scalable Bayesian fine-tuning approach which jointly seeks the PoLAR parameters and approximate posterior of the last layer parameters via alternating optimization. The resulting PoLAR-VBLL is a flexible framework that nicely integrates architecture-enhanced optimization with scalable Bayesian inference to endow LLMs with well-calibrated UQ. Our empirical results verify the effectiveness of PoLAR-VBLL in terms of generalization and uncertainty estimation on both in-distribution and out-of-distribution data for various common-sense reasoning tasks.

2604.03387 2026-04-07 cs.AI

Hume's Representational Conditions for Causal Judgment: What Bayesian Formalization Abstracted Away

Yiling Wu

2604.03377 2026-04-07 cs.CV

ViBA: Implicit Bundle Adjustment with Geometric and Temporal Consistency for Robust Visual Matching

Xiaoji Niu, Yuqing Wang, Yan Wang, Hailiang Tang, Tisheng Zhang

2604.03376 2026-04-07 cs.AI cs.CL

VERT: Reliable LLM Judges for Radiology Report Evaluation

Federica Bologna, Jean-Philippe Corbeil, Matthew Wilkens, Asma Ben Abacha

2604.03374 2026-04-07 cs.CL cs.AI

CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge

Mete Ismayilzada, Renqing Cuomao, Daniil Yurshevich, Anna Sotnikova, Lonneke van der Plas, Antoine Bosselut

Comments Under review

2604.03371 2026-04-07 cs.RO

Surrogate Model-Based Near-Optimal Gain Selection for Approach-Angle-Constrained Two-Phase Pure Proportional Navigation

Abhigyan Roy, Shreeya Padte, Abel Viji George, Vivek A, Satadal Ghosh

Comments 6 pages

2604.03361 2026-04-07 cs.LG q-bio.QM

The limits of bio-molecular modeling with large language models: a cross-scale evaluation

Yaxin Xu, Yue Zhou, Tianyu Zhao, Fengwei An, Zhixiang Ren

2604.03356 2026-04-07 cs.AI

Evaluating Artificial Intelligence Through a Christian Understanding of Human Flourishing

Nicholas Skytland, Lauren Parsons, Alicia Llewellyn, Steele Billings, Peter Larson, John Anderson, Sean Boisen, Steve Runge

2604.03350 2026-04-07 cs.LG cs.AI

From Model-Based Screening to Data-Driven Surrogates: A Multi-Stage Workflow for Exploring Stochastic Agent-Based Models

Paul Saves, Matthieu Mastio, Nicolas Verstaevel, Benoit Gaudou

Comments Published in MABS 2026 - The 27th International Workshop on Multi-Agent-Based Simulation

2604.03349 2026-04-07 cs.CV

YOLOv11 Demystified: A Practical Guide to High-Performance Object Detection

Nikhileswara Rao Sulake

Comments Paper accepted to CVC 2026 conference, but not continued due to no financial support

2604.03345 2026-04-07 cs.LG

Hardware-Oriented Inference Complexity of Kolmogorov-Arnold Networks

Bilal Khalid, Pedro Freire, Sergei K. Turitsyn, Jaroslaw E. Prilepsky

Comments This work has been submitted to the IEEE for possible publication

2604.03344 2026-04-07 cs.LG cs.AI

Towards Intelligent Energy Security: A Unified Spatio-Temporal and Graph Learning Framework for Scalable Electricity Theft Detection in Smart Grids

AbdulQoyum A. Olowookere, Usman A. Oguntola, Ebenezer. Leke Odekanle, Maridiyah A. Madehin, Aisha A. Adesope

Comments 26 pages, 9 figures

2604.03342 2026-04-07 cs.CV

Mixture-of-Experts in Remote Sensing: A Survey

Yongchuan Cui, Peng Liu, Lajiao Chen

2604.03340 2026-04-07 cs.CV cs.AI

Learning Additively Compositional Latent Actions for Embodied AI

Hangxing Wei, Xiaoyu Chen, Chuheng Zhang, Tim Pearce, Jianyu Chen, Alex Lamb, Li Zhao, Jiang Bian

2604.03339 2026-04-07 cs.CV

Hierarchical Awareness Adapters with Hybrid Pyramid Feature Fusion for Dense Depth Prediction

Wuqi Su, Huilun Song, Chen Zhao, Chi Xu

2604.03335 2026-04-07 cs.LG cs.NE

Apparent Age Estimation: Challenges and Outcomes

Justin Rainier Go, Lorenz Bernard Marqueses, Mikaella Kaye Martinez, John Kevin Patrick Sarmiento, Abien Fred Agarap

Comments Accepted for oral presentation at Philippine Computing Science Congress 2026

2604.03334 2026-04-07 cs.CV

Bridging the Dimensionality Gap: A Taxonomy and Survey of 2D Vision Model Adaptation for 3D Analysis

Akshat Pandya, Bhavuk Jain

Comments VISAPP 2026

DOI: 10.5220/0014289900004084
Journal ref: Proceedings of the 21st International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP 2026; ISBN 978-989-758-804-4; ISSN 2184-4321, SciTePress, pages 353-364

Abstract

The remarkable success of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in 2D vision has spurred significant research in extending these architectures to the complex domain of 3D analysis. Yet, a core challenge arises from a fundamental dichotomy between the regular, dense grids of 2D images and the irregular, sparse nature of 3D data such as point clouds and meshes. This survey provides a comprehensive review and a unified taxonomy of adaptation strategies that bridge this gap, classifying them into three families: (1) Data-centric methods that project 3D data into 2D formats to leverage off-the-shelf 2D models, (2) Architecture-centric methods that design intrinsic 3D networks, and (3) Hybrid methods, which synergistically combine the two modeling paradigms to benefit from both rich visual priors of large 2D datasets and explicit geometric reasoning of 3D models. Through this framework, we qualitatively analyze the fundamental trade-offs between these families concerning computational complexity, reliance on large-scale pre-training, and the preservation of geometric inductive biases. We discuss key open challenges and outline promising future research directions, including the development of 3D foundation models, advancements in self-supervised learning (SSL) for geometric data, and the deeper integration of multi-modal signals.

2604.03333 2026-04-07 cs.SD cs.AI

Composer Vector: Style-steering Symbolic Music Generation in a Latent Space

Xunyi Jiang, Mingyang Yao, Jingyue Huang, Julian McAuley