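arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.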

2604.27967 2026-05-01 cs.LG

Differentiable latent structure discovery for interpretable forecasting in clinical time series

Ivan Lerner, Jean Feydy, Alexandre Kalimouttou, Anita Burgun, Francis Bach

Comments This manuscript is under review at BioData Mining

Details

English abstract

Background: Timely, uncertainty-aware forecasting from irregular electronic health records (EHR) can support critical-care decisions, yet most approaches either impute to a grid or sacrifice interpretability. We introduce StructGP, a continuous-time multi-task Gaussian process that couples process convolutions with differentiable structure learning to uncover a sparse, ordered directed acyclic graph (DAG) of inter-variable dependencies while preserving principled uncertainty. We further propose LP-StructGP, which augments StructGP with latent pathways (shared, temporally shifted trajectories inferred via subject-specific coupling filters and a softmax gating mechanism) to capture cross-patient progression patterns. Both models are trained under sparsity and acyclicity constraints (augmented Lagrangian, Adam) using scalable low-rank updates. Results: In simulations, the approach reliably recovers ground-truth graphs (Structural Hamming Distance approaching 0 as cohorts grow) and pathway assignments (high Adjusted Rand Index). On a MIMIC-IV septic shock cohort (n=1,008; norepinephrine, creatinine, mean arterial pressure), StructGP improves short-horizon (6 h) forecasting over independent-task baselines (average RMSE 0.68 [95% CI: 0.63-0.74] vs. 0.88 [0.83-0.94]) and, with 15 additional inputs, markedly outperforms unstructured kernels (0.63 [0.58-0.69] vs. 3.02 [2.85-3.18]) with superior calibration (coverage 0.96 vs. 0.84). On the PhysioNet Challenge (12k patients, 41 variables), StructGP attains competitive accuracy (MAE 3.72e-2) relative to a state-of-the-art graph neural model while maintaining calibrated uncertainty. Conclusion: These results show that structured process convolutions with latent pathways deliver interpretable, scalable, and well-calibrated forecasting for irregular clinical time series.
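
The abstract reports graph recovery in terms of Structural Hamming Distance (SHD). As a rough illustration of that metric only, the sketch below counts the node pairs whose edge status differs between a reference DAG and a learned one. The convention that a reversed edge counts as one error, the function name, and the toy graphs are all assumptions made for this example; they are not taken from the paper.

```python
def structural_hamming_distance(true_edges, pred_edges):
    """Count node pairs whose edge status differs between two directed graphs.

    A missing, extra, or reversed edge each counts as one error
    (an assumed convention; some definitions count a reversal as two).
    """
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    # Every unordered node pair that carries an edge in either graph.
    pairs = {tuple(sorted(edge)) for edge in true_edges | pred_edges}
    distance = 0
    for a, b in pairs:
        true_state = ((a, b) in true_edges, (b, a) in true_edges)
        pred_state = ((a, b) in pred_edges, (b, a) in pred_edges)
        if true_state != pred_state:
            distance += 1
    return distance


# Hypothetical toy graphs: one reversed edge (A-B) and one extra edge (A-C).
true_dag = [("A", "B"), ("B", "C")]
learned_dag = [("B", "A"), ("B", "C"), ("A", "C")]
shd = structural_hamming_distance(true_dag, learned_dag)
print(shd)  # prints 2
```

An SHD of 0 means the learned graph matches the reference exactly, which is the limit the abstract describes for growing cohorts.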

2604.27964 2026-05-01 cs.AI

Splitting Assumption-Based Argumentation Frameworks

Giovanni Buraglio, Wolfgang Dvorak, Stefan Woltran

Comments Accepted at KR 2026

2604.27962 2026-05-01 cs.AI cs.CE cs.MA

Language Models Refine Mechanical Linkage Designs Through Symbolic Reflection and Modular Optimisation

João Pedro Gandarela, Thiago Rios, Stefan Menzel, André Freitas

2604.27958 2026-05-01 cs.CV

TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On

Dingbao Shao, Song Wu, Shenyi Wang, Ye Wang, Ziheng Tang, Fei Liu, Jiang Lin, Xinyu Chen, Qian Wang, Ying Tai, Jian Yang, Zili Yi

2604.27955 2026-05-01 cs.AI cs.CV

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

Junan Hu, Jian Liu, Jingxiang Lai, Jiarui Hu, Yiwei Sheng, Shuang Chen, Jian Li, Dazhao Du, Song Guo

Comments Project Page: https://github.com/Steve2457/Awesome-RL-GUI-Agents

2604.27953 2026-05-01 cs.AI cs.CV

The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models

Kenneth J. K. Ong

2604.27944 2026-05-01 cs.LG cs.CY cs.GT physics.ao-ph

Calibrating Attribution Proxies for Reward Allocation in Participatory Weather Sensing

Mark C. Ballandies, Michael T. C. Chiu, Claudio J. Tessone

2604.27942 2026-05-01 cs.AI

A Collective Variational Principle Unifying Bayesian Inference, Game Theory, and Thermodynamics

Djamel Bouchaffra, Faycal Ykhlef, Mustapha Lebbah, Hanane Azzag

Comments Submitted to Nature. 21 pages, 4 figures. Code and data available at https://github.com/dbouchaffra/game-theoretic-free-energy-principle

2604.27936 2026-05-01 cs.LG eess.AS

Beyond the Baseband: Adaptive Multi-Band Encoding for Full-Spectrum Bioacoustics Classification

Eklavya Sarkar, Marius Miron, David Robinson, Gagan Narula, Milad Alizadeh, Ellen Gilsenan-McMahon, Emmanuel Chemla, Olivier Pietquin, Matthieu Geist

2604.27935 2026-05-01 cs.RO cs.SY eess.SP eess.SY

Flying by Inference: Active Inference World Models for Adaptive UAV Swarms

Kaleem Arshid, Ali Krayani, Lucio Marcenaro, David Martin Gomez, Carlo Regazzoni

Comments Submitted to IEEE journal

2604.27934 2026-05-01 cs.AI cs.CL

MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection

Weihai Lu, Zhejun Zhao, Yanshu Li, Huan He

Comments Accepted at the ACL 2026 Main Conference

2604.27932 2026-05-01 cs.CV

Dynamic Cluster Data Sampling for Efficient and Long-Tail-Aware Vision-Language Pre-training

Mingliang Liang, Zhuoran Liu, Arjen P. de Vries, Martha Larson

2604.27929 2026-05-01 cs.CL

DPN-LE: Dual Personality Neuron Localization and Editing for Large Language Models

Lifan Zheng, Xue Yang, Jiawei Chen, Chenyan Wu, Jingyuan Zhang, Fanheng Kong, Xinyi Zeng, Xiang Chen, Yu Tian

Details

Journal ref: ACL 2026 Findings

English abstract

With the widespread adoption of large language models (LLMs), understanding their personality representation mechanisms has become critical. As a novel paradigm in Personality Editing, most existing methods employ neuron-editing to locate and modify LLM neurons, requiring changes to numerous neurons and leading to significant performance degradation. This raises a fundamental question: Are all modified neurons directly related to personality representation? In this work, we investigate and quantify this specificity through assessments of general capability impact and representation-level patterns. We find that: 1) Current methods can change personalities but reduce overall performance. 2) Neurons are multifunctional, connecting personality traits and general knowledge. 3) Opposing personality traits demonstrate distinctly mutually exclusive representation patterns. Motivated by these findings, we propose DPN-LE (Dual Personality Neuron Localization and Editing), which identifies personality-specific neurons by contrasting MLP activations between high-trait and low-trait samples. DPN-LE constructs layer-wise steering vectors and applies dual-criterion filtering based on Cohen's d effect size and activation magnitude to isolate mutually exclusive neuron subsets. Sparse linear intervention on these neurons enables precise personality control at inference time. Using only 1,000 contrastive sample pairs per trait, DPN-LE intervenes on ~0.5% of neurons while achieving competitive personality control and substantially better capability preservation across reasoning tasks. Experiments on LLaMA-3-8B-Instruct and Qwen2.5-7B-Instruct demonstrate the effectiveness and generalizability of our approach.
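
The dual-criterion filtering mentioned in the abstract can be pictured with a small sketch. What follows is not the authors' implementation: it assumes Cohen's d is computed with a pooled sample variance, takes "activation magnitude" to mean the absolute difference in mean activation between the two groups, and uses hypothetical thresholds (`d_min`, `mag_min`) and toy activations chosen purely for illustration.

```python
from statistics import mean, variance


def cohens_d(high, low):
    """Cohen's d between two samples, using a pooled sample variance."""
    n_h, n_l = len(high), len(low)
    pooled_var = ((n_h - 1) * variance(high) + (n_l - 1) * variance(low)) / (n_h + n_l - 2)
    if pooled_var == 0:
        return 0.0
    return (mean(high) - mean(low)) / pooled_var ** 0.5


def select_neurons(high_acts, low_acts, d_min=0.8, mag_min=0.5):
    """Split neurons into high-trait and low-trait subsets.

    high_acts / low_acts: lists of per-sample activation vectors.
    A neuron is kept only if BOTH criteria pass (assumed definitions):
    |Cohen's d| >= d_min and |mean difference| >= mag_min.
    """
    high_set, low_set = [], []
    for j in range(len(high_acts[0])):
        h = [sample[j] for sample in high_acts]
        l = [sample[j] for sample in low_acts]
        d = cohens_d(h, l)
        gap = abs(mean(h) - mean(l))
        if abs(d) >= d_min and gap >= mag_min:
            # The sign of d decides which of the two disjoint subsets gets the neuron.
            (high_set if d > 0 else low_set).append(j)
    return high_set, low_set


# Toy activations for three neurons over three samples per group (made up).
high_acts = [[2.0, 0.1, -1.0], [2.2, 0.0, -1.2], [1.8, 0.2, -0.9]]
low_acts = [[0.1, 0.1, 1.0], [0.0, 0.2, 1.1], [0.2, 0.0, 0.9]]
high_neurons, low_neurons = select_neurons(high_acts, low_acts)
print(high_neurons, low_neurons)  # prints [0] [2]
```

Because each neuron is assigned by the sign of its effect size, the two returned subsets cannot overlap, which is one simple way to obtain mutually exclusive neuron sets.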

2604.27928 2026-05-01 cs.CV cs.AI

Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction

Shipeng Liu, Liang Zhao, Dengfeng Chen, Zhanping Song

2604.27927 2026-05-01 cs.AI

Taming the Centaur(s) with LAPITHS: a framework for a theoretically grounded interpretation of AI performances

Matteo Da Pelo, Alessio Donvito, Claudio Frongia, Pietro Salis, Antonio Lieto

Comments 28 pages

2604.27920 2026-05-01 cs.CL cs.AI

Beyond Semantics: Measuring Fine-Grained Emotion Preservation in Small Language Model-Based Machine Translation

Dawid Wisniewski, Igor Czudy

Comments Accepted at EAMT 2026

2604.27918 2026-05-01 cs.CV

Generate Your Talking Avatar from Video Reference

Zujin Guo, Zhenhui Ye, Yi Ren, Yuanming Li, Ce Chen, Zhibin Hong, Chen Change Loy

Comments Project Page: https://gseancdat.github.io/projects/TAVR

2604.27914 2026-05-01 cs.CL cs.LG

Geometry-Calibrated Conformal Abstention for Language Models

Rui Xu, Yi Chen, Sihong Xie, Hui Xiong

2604.27911 2026-05-01 cs.LG cs.ET cs.NE

Physical Foundation Models: Fixed hardware implementations of large-scale neural networks

Logan G Wright, Tianyu Wang, Tatsuhiro Onodera, Peter L. McMahon

Details

English abstract

Foundation models are deep neural networks (such as GPT-5, Gemini 3, and Opus 4) trained on large datasets that can perform diverse downstream tasks -- text and code generation, question answering, summarization, image classification, and so on. The philosophy of foundation models is to put effort into a single, large (${\sim}10^{12}$-parameter) general-purpose model that can be adapted to many downstream tasks with no or minimal additional training. We argue that the rise of foundation models presents an opportunity for hardware engineers: in contrast to when different models were used for different tasks, it now makes sense to build special-purpose, fixed hardware implementations of neural networks, manufactured and released at the roughly 1-year cadence of major new foundation-model versions. Beyond conventional digital-electronic inference hardware with read-only weight memory, we advocate a more radical re-thinking: hardware in which the neural network is realized directly at the level of the physical design and operates via the hardware's natural physical dynamics -- Physical Foundation Models (PFMs). PFMs could enable orders-of-magnitude advantages in energy efficiency, speed, and parameter density. For ${\sim}10^{12}$-parameter models, this would both reduce the high energy burden of AI in datacenters and enable AI in edge devices that today are power-constrained to far smaller models. PFMs could also enable inference hardware for models much larger than current ones: $10^{15}$- or even $10^{18}$-parameter PFMs seem plausible by some measures. We present back-of-the-envelope calculations illustrating PFM scaling using an optical example -- a 3D nanostructured glass medium -- and discuss prospects in nanoelectronics and other physical platforms. We conclude with the major research challenges that must be resolved for trillion-parameter PFMs and beyond to become reality.

2604.27903 2026-05-01 cs.CV

HiMix: Hierarchical Artifact-aware Mixup for Generalized Synthetic Image Detection

Shuchang Zhou, Kaiwen Shen, Jiwei Wei, Yuyang Zhou, Peng Wang, Yang Yang

2604.27899 2026-05-01 cs.AI

Simulating clinical interventions with a generative multimodal model of human physiology

Guy Lutsker, Gal Sapir, Jordi Merino, Smadar Shilo, Anastasia Godneva, Eli Meirom, Shie Mannor, Hagai Rossman, Gal Chechik, Eran Segal

2604.27895 2026-05-01 cs.AI

Graph World Models: Concepts, Taxonomy, and Future Directions

Jiawei Liu, Senqiao Yang, Mingjun Wang, Yu Wang, Bei Yu

2604.27891 2026-05-01 cs.AI cs.LG

In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks

Simon Dennis, Michael Diamond, Rivaan Patil, Kevin Shabahang, Hao Guo

Comments 23 pages

2604.27889 2026-05-01 cs.CV

Noise2Map: End-to-End Diffusion Model for Semantic Segmentation and Change Detection

Ali Shibli, Andrea Nascetti, Yifang Ban

Details

DOI: 10.1109/TGRS.2026.3687393

English abstract

Semantic segmentation and change detection are two fundamental challenges in remote sensing, requiring models to capture either spatial semantics or temporal differences from satellite imagery. Existing deep learning models often struggle with temporal inconsistencies or in capturing fine-grained spatial structures, require extensive pretraining, and offer limited interpretability - especially in real-world remote sensing scenarios. Recent advances in diffusion models show that Gaussian noise can be systematically leveraged to learn expressive data representations through denoising. Motivated by this, we investigate whether the noise process in diffusion models can be effectively utilized for discriminative tasks. We propose Noise2Map, a unified diffusion-based framework that repurposes the denoising process for fast, end-to-end discriminative learning. Unlike prior work that uses diffusion only for generation or feature extraction, Noise2Map directly predicts semantic or change maps using task-specific noise schedules and timestep conditioning, avoiding the costly sampling procedures of traditional diffusion models. The model is pretrained via self-supervised denoising and fine-tuned with supervision, enabling both interpretability and robustness. Our architecture supports both tasks (SS and CD) through a shared backbone and task-specific noise schedulers. Extensive evaluations on the SpaceNet7, WHU, and xView2 buildings damaged by wildfires datasets demonstrate that Noise2Map ranks on average 1st among seven models on semantic segmentation and 1st on change detection by a cross-dataset rank metric (average F1 primary, IoU tie-break). Ablation studies highlight the robustness of our model against different training noise schedulers and timestep control in the diffusion process, as well as the ability of the model to perform multi-task learning.

2604.27882 2026-05-01 cs.AI cs.HC

Building Persona-Based Agents On Demand: Tailoring Multi-Agent Workflows to User Needs

Giuseppe Arbore, Andrea Sillano, Luigi De Russis

2604.27875 2026-05-01 cs.CV

Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection

Shuchang Zhou, Shangkun Wu, Jiwei Wei, Ke Liu, Ran Ran, Caiyan Qin, Yang Yang

2604.27872 2026-05-01 cs.AI

Modeling Clinical Concern Trajectories in Language Model Agents

Sukesh Subaharan, Venkatesan VS, Murugadasan P, Sivakumar D, Gautham N, Ganeshkumar M

2604.27870 2026-05-01 cs.CV

Parameter-Efficient Architectural Modifications for Translation-Invariant CNNs

Nuria Alabau-Bosque, Jorge Vila-Tomas, Paula Dauden-Oliver, Valero Laparra, Jesus Malo

Comments 25 pages, 16 figures

Details

English abstract

Convolutional Neural Networks (CNNs) are widely assumed to be translation-invariant, yet standard architectures exhibit a startling fragility: even a single-pixel shift can drastically degrade performance due to their reliance on spatially dependent fully connected layers. In this work, we resolve this vulnerability by proposing a lightweight 'Online Architecture' strategy. By strategically inserting Global Average Pooling (GAP) layers at various network depths, we effectively decouple feature recognition from spatial location. Using VGG-16 as a primary case study, we demonstrate that this architectural modification achieves a massive 98% reduction in trainable parameters (from 5.2M to just 82K) and a 90% reduction in total network size (138M to 14M). Despite this drastic pruning, our variants maintain competitive Top-1 accuracy on ImageNet (66.4%) while doubling translational robustness, reducing average relative loss from 0.09 to 0.05. Furthermore, our analysis identifies a fundamental limit to invariance: while GAP resolves macroscopic sensitivity, discrete pooling operations introduce a residual periodic aliasing that prevents perfect pixel-level stability. Finally, we extend these findings to Perceptual Image Quality Assessment (IQA) by integrating our invariant backbones into the LPIPS framework. The resulting metric significantly outperforms the retrained baseline in generalization across the KADID-10k dataset (Spearman 0.89 vs. 0.75) and achieves a near-perfect alignment with human psychophysical response curves on the RAID dataset (Spearman 0.95). These results confirm that enforcing architectural invariance is a far more efficient and biologically plausible path to robustness than traditional data augmentation. Data and code are publicly available to facilitate validation and further research.
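
The core intuition behind using Global Average Pooling for shift robustness can be shown on a toy feature map. The sketch below is not the paper's 'Online Architecture'; it only assumes a single 2D feature map, a circular one-pixel shift, and a hypothetical weight vector standing in for a fully connected layer applied to the flattened map.

```python
def circular_shift(fmap, dx):
    """Shift every row of a 2D feature map right by dx pixels, wrapping around."""
    return [row[-dx:] + row[:-dx] for row in fmap]


def global_average_pool(fmap):
    """Average over all spatial positions, discarding location information."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def flatten_dense(fmap, weights):
    """One output unit of a fully connected layer on the flattened map."""
    values = [v for row in fmap for v in row]
    return sum(v * w for v, w in zip(values, weights))


# Toy 3x4 feature map with a small activated region (made-up values).
fmap = [[0, 0, 0, 0],
        [0, 4, 8, 0],
        [0, 0, 0, 0]]
# Hypothetical position-dependent weights for the dense unit.
weights = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

shifted = circular_shift(fmap, 1)

gap_before, gap_after = global_average_pool(fmap), global_average_pool(shifted)
dense_before, dense_after = flatten_dense(fmap, weights), flatten_dense(shifted, weights)

print(gap_before, gap_after)      # prints 1.0 1.0
print(dense_before, dense_after)  # prints 80 92
```

The pooled value is unchanged by the shift because averaging ignores position, while the dense unit's output moves with the pattern. This toy case does not capture the residual aliasing from strided pooling layers that the abstract identifies as a limit to perfect invariance.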

2604.27865 2026-05-01 cs.AI

KellyBench: A Benchmark for Long-Horizon Sequential Decision Making

Thomas Grady, Kip Parker, Iliyan Zarov, Henry Course, Chengxi Taylor, Ross Taylor

2604.27850 2026-05-01 cs.CL

Reasoning over Object Descriptions Improves Coreference Resolution in Task-Based Dialogue Systems

Oier Ijurco, Oier Lopez de Lacalle

Comments To be published in LREC 2026