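arXivDaily: a daily academic digest that syncs the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.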

2511.13945 2026-03-24 cs.CV

Can You Learn to See Without Images? Procedural Warm-Up for Vision Transformers

Zachary Shinnick, Liangze Jiang, Hemanth Saratchandran, Damien Teney, Anton van den Hengel

Comments: Camera-ready version

Details

English abstract

Transformers are remarkably versatile, suggesting the existence of generic inductive biases beneficial across modalities. In this work, we explore a new way to instil such biases in vision transformers (ViTs) through pretraining on procedurally generated data devoid of visual or semantic content. We generate this data with simple algorithms such as formal grammars, so the results bear no relationship to either natural or synthetic images. We use this procedurally generated data to pretrain ViTs in a warm-up phase that bypasses their visual patch embedding mechanisms, thus encouraging the models to internalise abstract computational priors. When followed by standard image-based training, this warm-up significantly improves data efficiency, convergence speed, and downstream performance. On ImageNet-1K, for example, allocating just 1% of the training budget to procedural data improves final accuracy by over 1.7%. In terms of its effect on performance, 1% procedurally generated data is thus equivalent to 28% of the ImageNet-1K data. These findings suggest a promising path toward new data-efficient and domain-agnostic pretraining strategies.

2511.12920 2026-03-24 cs.CL cs.AI cs.CY cs.HC cs.IR

Auditing Google's AI Overviews and Featured Snippets: A Case Study on Baby Care and Pregnancy

Desheng Hu, Joachim Baumann, Aleksandra Urman, Elsa Lichtenegger, Robin Forsberg, Aniko Hannak, Christo Wilson

Comments: 18 pages, 10 figures; to appear in AAAI ICWSM 2026

2511.11828 2026-03-24 cs.LG cs.AI

Conformal Constrained Policy Optimization for Cost-Effective LLM Agents

Wenwen Si, Sooyong Jang, Insup Lee, Osbert Bastani

2511.10065 2026-03-24 cs.AI

RadHiera: Semantic Hierarchical Reinforcement Learning for Medical Report Generation

Bodong Du, Honglong Yang, Xiaomeng Li

2511.03235 2026-03-24 cs.AI

From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers

Yi-Fei Liu, Yi-Long Lu, Di He, Hang Zhang

Comments: Accepted to ICLR 2026

Details

English abstract

Psychological constructs within individuals are widely believed to be interconnected. We investigated whether and how Large Language Models (LLMs) can model the correlational structure of human psychological traits from minimal quantitative inputs. We prompted various LLMs with Big Five Personality Scale responses from 816 human individuals to role-play their responses on nine other psychological scales. LLMs demonstrated remarkable accuracy in capturing human psychological structure, with the inter-scale correlation patterns from LLM-generated responses strongly aligning with those from human data $(R^2 > 0.89)$. This zero-shot performance substantially exceeded predictions based on semantic similarity and approached the accuracy of machine learning algorithms trained directly on the dataset. Analysis of reasoning traces revealed that LLMs use a systematic two-stage process: First, they transform raw Big Five responses into natural language personality summaries through information selection and compression, analogous to generating sufficient statistics. Second, they generate target scale responses based on reasoning from these summaries. For information selection, LLMs identify the same key personality factors as trained algorithms, though they fail to differentiate item importance within factors. The resulting compressed summaries are not merely redundant representations but capture synergistic information--adding them to original scores enhances prediction alignment, suggesting they encode emergent, second-order patterns of trait interplay. Our findings demonstrate that LLMs can precisely predict individual participants' psychological traits from minimal data through a process of abstraction and reasoning, offering both a powerful tool for psychological simulation and valuable insights into their emergent reasoning capabilities.
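The alignment figure quoted in the abstract compares inter-scale correlation patterns from human and LLM-generated responses. A minimal sketch of one way such a comparison could be computed follows. It assumes $R^2$ is read as the squared Pearson correlation between the two sets of pairwise scale correlations; the abstract does not say how the statistic is computed. The scale names, scores, and function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def inter_scale_correlations(scores):
    """Correlate every pair of scales across participants.

    `scores` maps a scale name to a list of per-participant scores.
    Pairs are returned in a fixed (sorted) order so that two score
    sets with the same scale names can be compared entry by entry.
    """
    names = sorted(scores)
    return [pearson(scores[a], scores[b]) for a, b in combinations(names, 2)]


def alignment_r2(human_scores, llm_scores):
    """Squared correlation between human and LLM inter-scale correlations."""
    human = inter_scale_correlations(human_scores)
    llm = inter_scale_correlations(llm_scores)
    return pearson(human, llm) ** 2


# Hypothetical toy data: three scales, five participants (not from the paper).
human = {
    "anxiety": [1, 2, 3, 4, 5],
    "optimism": [5, 4, 3, 2, 1],
    "self_esteem": [2, 1, 4, 3, 5],
}
llm = {
    "anxiety": [1, 2, 3, 4, 5],
    "optimism": [5, 4, 3, 2, 1],
    "self_esteem": [1, 2, 4, 3, 5],
}
r2 = alignment_r2(human, llm)
```

With only three scales there are just three pairwise correlations, so the toy value of `r2` carries no statistical weight. The study described above compares correlations across ten scales (the Big Five plus nine others).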

2511.01946 2026-03-24 cs.LG cond-mat.mtrl-sci cs.AI physics.chem-ph

COFAP: A Universal Framework for COFs Adsorption Prediction through Designed Multi-Modal Extraction and Cross-Modal Synergy

Zihan Li, Mingyang Wan, Mingyu Gao, Xishi Tai, Zhongshan Chen, Xiangke Wang, Feifan Zhang

2511.01137 2026-03-24 cs.LG math.AG math.DS stat.ML

Regularization Implies balancedness in the deep linear network

Kathryn Lindsey, Govind Menon

Comments: 18 pages, 3 figures. Fixed minor errors in revision, added more context and created Discussion section

2510.27419 2026-03-24 cs.AI cs.CL

DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains

Tian Liang, Wenxiang Jiao, Zhiwei He, Jiahao Xu, Haitao Mi, Dong Yu

Comments: ICLR 2026

2510.19265 2026-03-24 cs.CL

Difficulty-Controllable Multiple-Choice Question Generation Using Large Language Models and Direct Preference Optimization

Yuto Tomikawa, Masaki Uto

Comments: Accepted for publication in IEEE Access. Please refer to the published version for the final content. DOI: 10.1109/ACCESS.2026.3674595

2510.19217 2026-03-24 cs.CL

Modality Matching Matters: Calibrating Language Distances for Cross-Lingual Transfer in URIEL+

York Hay Ng, Aditya Khan, Xiang Lu, Matteo Salloum, Michael Zhou, Phuong H. Hoang, A. Seza Doğruöz, En-Shiun Annie Lee

Comments: Accepted to EACL 2026 SRW

2510.18173 2026-03-24 cs.CL

Moneyball with LLMs: Analyzing Tabular Summarization in Sports Narratives

Ritam Upadhyay, Naman Ahuja, Rishabh Baral, Aparna Garimella, Vivek Gupta

2510.17699 2026-03-24 cs.CV cs.LG

GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver

Aleksandr Oganov, Ilya Bykov, Eva Neudachina, Mishan Aliev, Alexander Tolmachev, Alexander Sidorov, Aleksandr Zuev, Andrey Okhotin, Denis Rakitin, Aibek Alanov

Comments: Accepted to ICLR 2026. Camera-ready version

2510.17564 2026-03-24 cs.LG cs.AI cs.RO cs.SY eess.SY

Towards a Practical Understanding of Lagrangian Methods in Safe Reinforcement Learning

Lindsay Spoor, Álvaro Serra-Gómez, Aske Plaat, Thomas Moerland

2510.14922 2026-03-24 cs.AI cs.CL cs.LG eess.AS eess.SP

TRI-DEP: A Trimodal Comparative Study for Depression Detection Using Speech, Text, and EEG

Annisaa Fitri Nurfidausi, Eleonora Mancini, Paolo Torroni

2510.13232 2026-03-24 cs.CV cs.AI

What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging

Inha Kang, Youngsun Lim, Seonho Lee, Jiho Choi, Junsuk Choe, Hyunjung Shim

Comments: 56 pages

2510.13170 2026-03-24 cs.CL

Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism

Xiaoshu Chen, Sihang Zhou, Ke Liang, Duanyang Yuan, Haoyuan Chen, Xiaoyu Sun, Lingyuan Meng, Xinwang Liu

2510.10154 2026-03-24 cs.RO

CompassNav: Steering From Path Imitation To Decision Understanding In Navigation

LinFeng Li, Jian Zhao, Yuan Xie, Xin Tan, Xuelong Li

2510.09695 2026-03-24 cs.CL

Emotionally Charged, Logically Blurred: AI-driven Emotional Framing Impairs Human Fallacy Detection

Yanran Chen, Lynn Greschner, Roman Klinger, Michael Klenk, Steffen Eger

Comments: EACL 2026 Main Camera-ready; Figure 4 and typo fixed

2510.08138 2026-03-24 cs.CV cs.AI cs.MM

Understanding Temporal Logic Consistency in Video-Language Models through Cross-Modal Attention Discriminability

Chengzhi Li, Heyan Huang, Ping Jian, Zhen Yang, Yaning Tian, Zhongbin Guo

Comments: Accepted by CVPR 2026

2510.07028 2026-03-24 cs.RO

Efficient View Planning Guided by Previous-Session Reconstruction for Repeated Plant Monitoring

Sicong Pan, Luca Lobefaro, Moein Taherkhani, Xuying Huang, Rohit Menon, Cyrill Stachniss, Maren Bennewitz

Comments: Submitted for review

2510.04058 2026-03-24 cs.LG

Unlearning in Diffusion models under Data Constraints: A Variational Inference Approach

Subhodip Panda, Varun M S, Shreyans Jain, Sarthak Kumar Maharana, Prathosh A. P

Details

Journal ref: Transactions on Machine Learning Research (TMLR), 2026

English abstract

For a responsible and safe deployment of diffusion models in various domains, regulating the generated outputs from these models is desirable because such models could generate undesired, violent, and obscene outputs. To tackle this problem, recent works use machine unlearning methodology to forget training data points containing these undesired features from pre-trained generative models. However, these methods proved to be ineffective in data-constrained settings where the whole training dataset is inaccessible. Thus, the principal objective of this work is to propose a machine unlearning methodology that can prevent the generation of outputs containing undesired features from a pre-trained diffusion model in such a data-constrained setting. Our proposed method, termed Variational Diffusion Unlearning (VDU), is a computationally efficient method that only requires access to a subset of training data containing undesired features. Our approach is inspired by the variational inference framework with the objective of minimizing a loss function consisting of two terms: a plasticity inducer and a stability regularizer. The plasticity inducer reduces the log-likelihood of the undesired training data points, while the stability regularizer, essential for preventing loss of image generation quality, regularizes the model in parameter space. We validate the effectiveness of our method through comprehensive experiments for both class unlearning and feature unlearning. For class unlearning, we unlearn some user-identified classes of the MNIST, CIFAR-10, and tinyImageNet datasets from a pre-trained unconditional denoising diffusion probabilistic model (DDPM). Similarly, for feature unlearning, we unlearn the generation of certain high-level features from a pre-trained Stable Diffusion model trained on the LAION-5B dataset.

2510.02711 2026-03-24 cs.LG cs.AI cs.CR

A Novel Unified Lightweight Temporal-Spatial Transformer Approach for Intrusion Detection in Drone Networks

Tarun Kumar Biswas, Ashrafun Zannat, Waqas Ishtiaq, Md. Alamgir Hossain

Comments: 21 pages, 18 figures, 5 tables

Details

DOI: 10.1038/s41598-026-45063-6
Journal ref: Scientific Reports, 2026

English abstract

The growing integration of drones across commercial, industrial, and civilian domains has introduced significant cybersecurity challenges, particularly due to the susceptibility of drone networks to a wide range of cyberattacks. Existing intrusion detection mechanisms often lack the adaptability, efficiency, and generalizability required for the dynamic and resource-constrained environments in which drones operate. This paper proposes TSLT-Net, a novel lightweight and unified Temporal-Spatial Transformer-based intrusion detection system tailored specifically for drone networks. By leveraging self-attention mechanisms, TSLT-Net effectively models both temporal patterns and spatial dependencies in network traffic, enabling accurate detection of diverse intrusion types. The framework includes a streamlined preprocessing pipeline and supports both multiclass attack classification and binary anomaly detection within a single architecture. Extensive experiments conducted on the ISOT Drone Anomaly Detection Dataset, consisting of more than 2.3 million labeled records, demonstrate the superior performance of TSLT-Net with 99.99 percent accuracy in multiclass detection and 100 percent in binary anomaly detection, while maintaining a minimal memory footprint of only 0.04 MB and 9722 trainable parameters. These results establish TSLT-Net as an effective and scalable solution for real-time drone cybersecurity, particularly suitable for deployment on edge devices in mission-critical UAV systems.

2510.02375 2026-03-24 cs.CL cs.AI cs.LG

Pretraining with hierarchical memories: separating long-tail and common knowledge

Hadi Pouransari, David Grangier, C Thomas, Michael Kirchhof, Oncel Tuzel

Comments: ICLR 2026

2510.01049 2026-03-24 cs.CV cs.RO

KeySG: Hierarchical Keyframe-Based 3D Scene Graphs

Abdelrhman Werby, Dennis Rotondi, Fabio Scaparro, Kai O. Arras

Comments: Code and video are available at https://keysg-lab.github.io/

2510.01037 2026-03-24 cs.LG cs.AI

CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs

Yongcheng Zeng, Zexu Sun, Bokai Ji, Erxue Min, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Haifeng Zhang, Xu Chen, Jun Wang

Comments: 25 pages, 10 figures

2509.24313 2026-03-24 cs.RO

Learning to Sample: Reinforcement Learning-Guided Sampling for Autonomous Vehicle Motion Planning

Korbinian Moller, Roland Stroop, Mattia Piccinini, Alexander Langmann, Johannes Betz

Comments: 8 pages, submitted to the IEEE for possible publication

2509.24302 2026-03-24 cs.LG

LEAF: Language-EEG Aligned Foundation Model for Brain-Computer Interfaces

Muyun Jiang, Shuailei Zhang, Zhenjie Yang, Mengjun Wu, Weibang Jiang, Zhiwei Guo, Wei Zhang, Rui Liu, Shangen Zhang, Yong Li, Yi Ding, Cuntai Guan

Details

English abstract

Recent advances in electroencephalography (EEG) foundation models, which capture transferable EEG representations, have greatly accelerated the development of brain-computer interfaces (BCIs). However, existing approaches still struggle to incorporate language instructions as prior constraints for EEG representation learning, limiting their ability to leverage the semantic knowledge inherent in language to unify different labels and tasks. To address this challenge, we present LEAF, a foundation model for EEG--Language Alignment with Semantic Task Instruction and Querying. LEAF integrates task-aware semantic guidance to produce structured and linguistically aligned EEG embeddings, thereby enhancing decoding robustness and transferability. In the pretraining stage, we introduce a joint Spectral--Temporal Reconstruction (STR) framework that captures the coupled spectral rhythms and temporal dynamics of EEG signals. STR applies randomized spectral perturbation to enhance frequency robustness and uses two complementary temporal objectives to learn both contextual and sequential structure. In the EEG-Language alignment stage, we propose the Instruction-conditioned Q-Former (IQF). This query-based cross-attention transformer injects instruction embeddings into EEG tokens and achieves semantic alignment with textual label embeddings through learnable queries. We evaluate LEAF on 16 downstream datasets spanning motor imagery, emotion recognition, steady-state visual evoked potentials, covert speech, and healthcare tasks. LEAF achieves state-of-the-art performance on 12 of the 16 datasets and obtains the best average results across all five task categories. Importantly, our analyses reveal for the first time that explicit task instructions serve as semantic priors guiding EEG embeddings into coherent and linguistically grounded spaces. The code and pre-trained weights will be released.

2509.18801 2026-03-24 cs.CV cs.AI

A Kernel Space-based Multidimensional Sparse Model for Dynamic PET Image Denoising

Kuang Xiaodong, Li Bingxuan, Li Yuan, Rao Fan, Ma Gege, Xie Qingguo, Mok Greta S P, Liu Huafeng, Zhu Wentao

2509.16963 2026-03-24 cs.RO cs.SY eess.SY

A Tactile-based Interactive Motion Planner for Robots in Unknown Cluttered Environments

Chengjin Wang, Yanmin Zhou, Zheng Yan, Feng Luan, Runjie Shen, Hongrui Sang, Zhipeng Wang, Bin He

2509.14617 2026-03-24 cs.LG

HDC-X: Efficient Medical Data Classification for Embedded Devices

Jianglan Wei, Zhenyu Zhang, Pengcheng Wang, Mingjie Zeng, Zhigang Zeng