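arXivDaily: a daily academic digest that syncs the full arXiv dataset and adds AI-generated summaries and translations. It covers artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.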

2507.12590 2026-04-07 cs.CV cs.LG

From Time-series Generation, Model Selection to Transfer Learning: A Comparative Review of Pixel-wise Approaches for Large-scale Crop Mapping

Judy Long, Tao Liu, Sean Alexander Woznicki, Miljana Marković, Oskar Marko, Molly Sears

Comments A review paper on pixel-wise approaches for large-scale crop mapping. 29 pages, 18 figures. Preprint

DOI: 10.1016/j.compag.2026.111646
Journal ref: Computers and Electronics in Agriculture, Volume 246, May 2026, 111646

Abstract

Crop mapping involves identifying and classifying crop types using spatial data, primarily derived from remote sensing imagery. This study presents the first comprehensive review of large-scale, pixel-wise crop mapping workflows, encompassing both conventional supervised methods and emerging transfer learning approaches. To identify the optimal time-series generation approaches and supervised crop mapping models, we conducted systematic experiments comparing six widely adopted satellite image-based preprocessing methods and eleven supervised pixel-wise classification models. Additionally, we assessed the synergistic impact of varying training sample sizes and variable combinations. Moreover, we identified optimal transfer learning techniques for different magnitudes of domain shift. The optimal methods were evaluated across five diverse agricultural sites, with Landsat 8 as the primary satellite data source and labels drawn from trusted Cropland Data Layer (CDL) pixels and field surveys. Our findings reveal three key insights. First, fine-scale interval preprocessing paired with Transformer models consistently delivered the best performance in both supervised and transferable workflows, while Random Forest (RF) offered rapid training and competitive performance in conventional supervised learning and in direct transfer to similar domains. Second, transfer learning techniques enhanced workflow adaptability: unsupervised domain adaptation (UDA) was effective for homogeneous crop classes, while fine-tuning remained robust across diverse scenarios. Finally, the choice of workflow depends heavily on the availability of labeled samples. With a sufficient sample size, supervised training typically delivers more accurate and generalizable results; below a certain threshold, transfer learning matched to the level of domain shift is a viable alternative for crop mapping. All code is publicly available to encourage reproducibility.

2507.05874 2026-04-07 cs.LG cs.SY eess.SY

Robust Power System State Estimation using Physics-Informed Neural Networks

Solon Falas, Markos Asprou, Charalambos Konstantinou, Maria K. Michael

2507.05387 2026-04-07 cs.CL

The Generalization Ridge: Information Flow in Natural Language Generation

Ruidi Chang, Chunyuan Deng, Hanjie Chen

2506.21744 2026-04-07 cs.LG stat.AP stat.ML

Federated Item Response Models: A Gradient-driven Privacy-preserving Framework for Distributed Psychometric Estimation

Biying Zhou, Nanyu Luo, Feng Ji

2506.18575 2026-04-07 cs.CV

2D Triangle Splatting for Direct Differentiable Mesh Training

Kaifeng Sheng, Zheng Zhou, Yingliang Peng, Qianwei Wang

Comments 13 pages, 8 figures

2506.17585 2026-04-07 cs.AI cs.CL cs.LG

Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models

Yukun Huang, Sanxing Chen, Jian Pei, Manzil Zaheer, Bhuwan Dhingra

Abstract

Trustworthy language models should provide both correct and verifiable answers. However, citations generated directly by standalone LLMs are often unreliable. As a result, current systems insert citations by querying an external retriever at inference time, introducing latency, infrastructure dependence, and vulnerability to retrieval noise. We explore whether LLMs can be made to reliably attribute to the documents seen during continual pretraining without test-time retrieval, by revising the training process. To study this, we construct CitePretrainBench, a benchmark that mixes real-world corpora (Wikipedia, Common Crawl, arXiv) with novel documents and probes both short-form (single-fact) and long-form (multi-fact) citation tasks. Our approach follows a two-stage process: (1) continual pretraining to index factual knowledge by binding it to persistent document identifiers; and (2) instruction tuning to elicit citation behavior. We introduce Active Indexing for the first stage, which creates generalizable, source-anchored bindings by augmenting training with synthetic data that (i) restate each fact in diverse, compositional forms and (ii) enforce bidirectional training (source-to-fact and fact-to-source). This equips the model both to generate content from a cited source and to attribute its own answers, improving robustness to paraphrase and composition. Experiments with Qwen-2.5-7B and Qwen-2.5-3B show that Active Indexing consistently outperforms a Passive Indexing baseline, which simply appends an identifier to each document, achieving citation precision gains of up to 30.2% across all tasks and models. Our ablation studies reveal that performance continues to improve as we scale the amount of augmented data, showing a clear upward trend even at 16x the original token count. Finally, we show that internal citations complement external ones by making the model more robust to retrieval noise.
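The contrast between the Passive Indexing baseline and the bidirectional augmentation used in Active Indexing can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Document` class, the `[source: ...]` suffix, the prompt templates, and the paraphrase table are all hypothetical, and compositional restatements (combining several facts) are omitted. The sketch shows only that the baseline attaches an identifier once, while the augmentation yields both fact-to-source and source-to-fact pairs for every restatement of a fact.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str                      # persistent identifier the model learns to cite
    facts: list[str]                 # atomic facts extracted from the document
    # Hypothetical paraphrase table: fact -> alternative wordings of that fact.
    paraphrases: dict[str, list[str]] = field(default_factory=dict)


def passive_indexing(doc_text: str, doc_id: str) -> str:
    """Baseline as the abstract describes it: append the identifier to the document."""
    return f"{doc_text}\n[source: {doc_id}]"


def active_indexing_pairs(doc: Document) -> list[tuple[str, str]]:
    """Build (prompt, target) pairs in both directions for every restatement of every fact."""
    pairs: list[tuple[str, str]] = []
    for fact in doc.facts:
        for variant in [fact, *doc.paraphrases.get(fact, [])]:
            # fact-to-source: given the content, predict the document identifier
            pairs.append((f"Which source states: {variant}", doc.doc_id))
            # source-to-fact: given the identifier, produce content from that source
            pairs.append((f"What does {doc.doc_id} say?", variant))
    return pairs


# Usage with a made-up document; the prompt templates above are also hypothetical.
doc = Document(
    doc_id="doc-001",
    facts=["The widget weighs 3 kg."],
    paraphrases={"The widget weighs 3 kg.": ["A weight of 3 kg is recorded for the widget."]},
)
pairs = active_indexing_pairs(doc)
```

One fact with one paraphrase produces four training pairs here, which hints at why the augmented token count can grow to many times the original corpus.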

2506.14449 2026-04-07 cs.LG physics.optics

Detecting immune cells with label-free two-photon autofluorescence and deep learning

Lucas Kreiss, Amey Chaware, Maryam Roohian, Sarah Lemire, Oana-Maria Thoma, Birgitta Carlé, Maximilian Waldner, Sebastian Schürmann, Oliver Friedrich, Roarke Horstmeyer

DOI: 10.1002/jbio.70260

Abstract

Label-free imaging has gained broad interest because of its potential to omit elaborate staining procedures, which is especially relevant for in vivo use. Label-free multiphoton microscopy (MPM), for instance, exploits two-photon excitation of natural autofluorescence (AF) from native, metabolic proteins, making it ideal for in vivo endomicroscopy. Deep learning (DL) models have been widely used in other optical imaging technologies to predict specific target annotations and thereby digitally augment the specificity of these label-free images. However, this computational specificity has only rarely been implemented for MPM. In this work, we used a data set of label-free MPM images from a series of different immune cell types (5,075 individual cells for binary classification in mixed samples and 3,424 cells for a multi-class classification task) and trained a convolutional neural network (CNN) to classify cell types based on this label-free AF as input. A low-complexity SqueezeNet architecture was able to achieve reliable immune cell classification results (0.89 ROC-AUC and 0.95 PR-AUC for binary classification in mixed samples; 0.689 F1 score, 0.697 precision, 0.748 recall, and 0.683 MCC for six-class classification in isolated samples). Perturbation tests confirmed that the model is not confused by the extracellular environment and that both input AF channels (NADH and FAD) are about equally important to the classification. In the future, such predictive DL models could directly detect specific immune cells in unstained images and thus computationally improve the specificity of label-free MPM, which would have great potential for in vivo endomicroscopy.

2506.13183 2026-04-07 cs.CV

MT-PCR: Hybrid Mamba-Transformer Network with Spatial Serialization for Point Cloud Registration

Bingxi Liu, An Liu, Hao Chen, Huaqi Tao, Jinqiang Cui, Yiqun Wang, Hong Zhang

Comments 12 Pages

2506.02371 2026-04-07 cs.LG

SFBD Flow: A Continuous-Optimization Framework for Training Diffusion Models with Noisy Samples

Haoye Lu, Darren Lo, Yaoliang Yu

2506.00721 2026-04-07 cs.CV cs.LG

Common Inpainted Objects In-N-Out of Context

Tianze Yang, Tyson Jordan, Ruitong Sun, Ninghao Liu, Jin Sun

Comments The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026

2506.00698 2026-04-07 cs.CV cs.LG

Concept-Centric Token Interpretation for Vector-Quantized Generative Models

Tianze Yang, Yucheng Shi, Mengnan Du, Xuansheng Wu, Qiaoyu Tan, Jin Sun, Ninghao Liu

Comments 17 pages, 7 figures

2506.00318 2026-04-07 cs.CV

Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning

Sara Ghazanfari, Francesco Croce, Nicolas Flammarion, Prashanth Krishnamurthy, Farshad Khorrami, Siddharth Garg

Comments Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2506.00077 2026-04-07 cs.CL cs.LG stat.ML

Gaussian mixture models as a proxy for interacting language models

Edward L. Wang, Mohammad Sharifi Kiasari, Tianyu Wang, Hayden Helm, Avanti Athreya, Carey Priebe, Vince Lyzinski

2505.24535 2026-04-07 cs.LG cs.AI cs.CL

Beyond Linear Steering: Unified Multi-Attribute Control for Language Models

Narmeen Oozeer, Luke Marks, Shreyans Jain, Fazl Barez, Amirali Abdullah

Comments Accepted to Findings of EMNLP, 2025

2505.21605 2026-04-07 cs.LG cs.AI cs.CR

SoSBench: Benchmarking Safety Alignment on Six Scientific Domains

Fengqing Jiang, Fengbo Ma, Zhangchen Xu, Yuetai Li, Zixin Rao, Bhaskar Ramasubramanian, Luyao Niu, Bo Li, Xianyan Chen, Zhen Xiang, Radha Poovendran

Comments Project Page: https://sosbench.github.io/; ICLR 2026

2505.19606 2026-04-07 cs.CL

Languages in Whisper-Style Speech Encoders Align Both Phonetically and Semantically

Ryan Soh-Eun Shim, Domenico De Cristofaro, Chengzhi Martin Hu, Alessandro Vietti, Barbara Plank

Comments Submitted to Interspeech 2026

2505.19487 2026-04-07 cs.CV

SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams

Zhuoheng Gao, Yihao Li, Jiyao Zhang, Rui Zhao, Tong Wu, Hao Tang, Zhaofei Yu, Hao Dong, Guozhang Chen, Tiejun Huang

Comments Accepted at ICLR 2026

2505.17087 2026-04-07 cs.CL cs.AI cs.CY cs.DB cs.LG

Informatics for Food Processing

Gordana Ispirova, Michael Sebek, Giulia Menichetti

2505.16934 2026-04-07 cs.CL

In-Context Watermarks for Large Language Models

Yepeng Liu, Xuandong Zhao, Christopher Kruegel, Dawn Song, Yuheng Bu

Comments ICLR 2026

2505.15925 2026-04-07 cs.RO cs.AI cs.CV

VERDI: VLM-Embedded Reasoning for Autonomous Driving

Bowen Feng, Zhiting Mei, Julian Ost, Filippo Ghilotti, Baiang Li, Roger Girgis, Anirudha Majumdar, Felix Heide

2505.13742 2026-04-07 cs.LG cs.AI

Understanding Task Representations in Neural Networks via Bayesian Ablation

Andrew Nam, Declan Campbell, Thomas Griffiths, Jonathan Cohen, Sarah-Jane Leslie

Comments Accepted at CLeaR 2026 (5th Conference on Causal Learning and Reasoning). 13 pages, 3 figures, plus appendix

2505.12530 2026-04-07 cs.LG math.OC stat.ML

Enforcing Fair Predicted Scores on Intervals of Percentiles by Difference-of-Convex Constraints

Yutian He, Yankun Huang, Yao Yao, Qihang Lin

Comments 45 pages, 12 figures, 4 tables. This work is published in the proceedings of AISTATS 2026

2505.12167 2026-04-07 cs.LG cs.CR

FABLE: A Localized, Targeted Adversarial Attack on Weather Forecasting Models

Yue Deng, Asadullah Hill Galib, Xin Lan, Jack Gunn, Pang-Ning Tan, Lifeng Luo

Comments Version 2 incorporates revisions based on feedback from NeurIPS 2025 reviewers (final score: borderline). We improved clarity in previously complex sections to enhance accessibility for non-expert readers and expanded the experimental evaluation to provide more comprehensive and diverse results

2505.11211 2026-04-07 cs.LG cs.AI stat.ME stat.ML

Bayesian Hierarchical Invariant Prediction

Francisco Madaleno, Pernille Julie Viuff Sand, Francisco C. Pereira, Sergio Hernan Garrido Mejia

2505.08548 2026-04-07 cs.RO cs.AI cs.LG

From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation

Yifu Yuan, Haiqin Cui, Yibin Chen, Zibin Dong, Fei Ni, Longxin Kou, Jinyi Liu, Pengyi Li, Yan Zheng, Jianye Hao

Comments Published as a conference paper at ICLR 2026. Our project homepage: https://embodied-fsd.github.io/

2505.05375 2026-04-07 cs.CV cs.AI cs.LG cs.NE

Threshold Modulation for Online Test-Time Adaptation of Spiking Neural Networks

Kejie Zhao, Wenjia Hua, Aiersi Tuerhong, Luziwei Leng, Yuxin Ma, Qinghai Guo

Comments Accepted by IJCNN 2025. 10 pages, 3 figures, 7 tables

2505.03530 2026-04-07 cs.LG

A Multi-Level Causal Intervention Framework for Mechanistic Interpretability in Variational Autoencoders

Dip Roy, Rajiv Misra, Sanjay Kumar Singh, Anisha Roy

Abstract

Understanding how generative models represent and transform data is a foundational problem in deep learning interpretability. While mechanistic interpretability of discriminative architectures has yielded substantial insights, relatively little work has addressed variational autoencoders (VAEs). This paper presents the first general-purpose multilevel causal intervention framework for mechanistic interpretability of VAEs. The framework comprises four manipulation types: input manipulation, latent-space perturbation, activation patching, and causal mediation analysis. We also define three new quantitative metrics capturing properties not measured by existing disentanglement metrics alone: Causal Effect Strength (CES), intervention specificity, and circuit modularity. We conduct the largest empirical study to date of VAE causal mechanisms across six architectures (standard VAE, beta-VAE, FactorVAE, beta-TC-VAE, DIP-VAE-II, and VQ-VAE) and five benchmarks (dSprites, 3DShapes, MPI3D, CelebA, and SmallNORB), with three seeds per configuration, totaling 90 independent training runs. Our results reveal several findings: (i) a consistent within-dataset negative correlation between CES and DCI disentanglement (the CES-DCI trade-off); (ii) that the KL reweighting mechanism of beta-VAE induces a capacity bottleneck when generative factors approach latent dimensionality, degrading disentanglement on complex datasets; (iii) that no single VAE architecture dominates across all five datasets, with optimal choice depending on dataset structure; and (iv) that CES-based metrics applied to discrete latent spaces (VQ-VAE) yield near-zero values, revealing a critical limitation of continuous-intervention methods for discrete representations. These results provide both a theoretical foundation and comprehensive empirical evaluation for mechanistic interpretability of generative models.
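Activation patching, one of the four manipulation types named in this abstract, can be illustrated with a toy example. The following is a minimal Python sketch of the generic technique, not the paper's framework or its Causal Effect Strength (CES) metric, whose definition this abstract does not give. The tiny decoder, its weights, and the "fraction of output gap restored" score are all hypothetical choices made for illustration. The sketch caches hidden activations from a clean latent input, copies one hidden unit into a run on a corrupted input, and measures how far the output moves back toward the clean output.

```python
# Toy decoder: latent (2 dims) -> ReLU hidden layer (3 units) -> output (2 dims).
# The weights are arbitrary illustrative values, not taken from any trained VAE.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
W2 = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def decode(z, patch=None):
    """Run the decoder; optionally overwrite one hidden unit with a fixed value."""
    h = [max(0.0, a) for a in matvec(W1, z)]
    if patch is not None:
        unit, value = patch
        h[unit] = value
    return matvec(W2, h), h


def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def patching_effect(z_clean, z_corrupt, unit):
    """Fraction of the clean/corrupt output gap closed by patching one hidden unit.

    Copies the unit's activation from the clean run into the corrupted run.
    1.0 means the unit alone restores the clean output; 0.0 means no effect.
    """
    clean_out, clean_h = decode(z_clean)
    corrupt_out, _ = decode(z_corrupt)
    patched_out, _ = decode(z_corrupt, patch=(unit, clean_h[unit]))
    gap = l2(corrupt_out, clean_out)
    if gap == 0.0:
        return 0.0
    return 1.0 - l2(patched_out, clean_out) / gap


# Hypothetical clean and corrupted latent codes.
effects = [patching_effect([1.0, 0.0], [0.0, 0.0], unit) for unit in range(3)]
```

With these weights, patching hidden unit 1 has no effect because it is inactive in both runs, while units 0 and 2 each restore part of the clean output, unit 0 more than unit 2.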

2503.13821 2026-04-07 cs.CV

Stitch-a-Demo: Video Demonstrations from Multistep Descriptions

Chi Hsuan Wu, Kumar Ashutosh, Kristen Grauman

2503.11217 2026-04-07 cs.LG

Deep Joint Distribution Optimal Transport for Universal Domain Adaptation on Time Series

Romain Mussard, Fannia Pacheco, Maxime Berar, Gilles Gasso, Paul Honeine

2503.08751 2026-04-07 cs.CV cs.LG

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning

Qi Wang, Zhipeng Zhang, Baao Xie, Xin Jin, Yunbo Wang, Shiyu Wang, Liaomo Zheng, Xiaokang Yang, Wenjun Zeng

Comments Accepted by ICCV 2025. Project page: https://qiwang067.github.io/diswm