arXivDaily: arXiv daily academic digest, updated Monday through Friday
2507.12442 2026-03-24 cs.AR cs.AI cs.LG cs.SY eess.SY

Characterizing State Space Model and Hybrid Language Model Performance with Long Context

Saptarshi Mitra, Rachid Karami, Haocheng Xu, Sitao Huang, Hyoukjun Kwon

Comments 13 pages, 7 figures

Emerging applications such as AR are driving demand for machine intelligence capable of processing continuous and/or long-context inputs on local devices. However, currently dominant models based on the Transformer architecture suffer from quadratic computational and memory overhead, which hinders applications that must process long contexts. This has spurred a paradigm shift towards new architectures like State Space Models (SSMs) and SSM-Transformer hybrid models, which provide near-linear scaling. The near-linear scaling enabled efficient handling of millions of tokens while delivering high performance in recent studies. Although such works present promising results, their workload characteristics in terms of computational performance and hardware resource requirements are not yet thoroughly explored, which limits our understanding of their implications for system-level optimizations. To address this gap, we present a comprehensive, comparative benchmarking of carefully selected Transformers, SSMs, and hybrid models specifically for long-context inference on consumer and embedded GPUs. Our analysis shows that SSMs are well-suited for on-device AI on consumer and embedded GPUs for long-context inference. While Transformers are up to 1.9x faster at short sequences (<8K tokens), SSMs demonstrate a dramatic performance inversion, becoming up to 4x faster at very long contexts (~57K tokens), thanks to their linear computational complexity and ~64% reduced memory footprint. Our operator-level analysis reveals that custom SSM kernels like selective scan, despite being hardware-aware to minimize memory IO, dominate the inference runtime on edge platforms, accounting for over 55% of latency due to their sequential, element-wise nature. SSM-Scope is open-sourced at https://github.com/sapmitra/ssm-scope

2507.10057 2026-03-24 cs.IR cs.AI cs.CL cs.LG

Chain of Retrieval: Multi-Aspect Iterative Search Expansion and Post-Order Search Aggregation for Full Paper Retrieval

Sangwoo Park, Jinheon Baek, Soyeong Jeong, Sung Ju Hwang

Scientific paper retrieval, particularly framed as document-to-document retrieval, aims to identify relevant papers in response to a long-form query paper, rather than a short query string. Previous approaches to this task have focused exclusively on abstracts, embedding them into dense vectors as surrogates for full documents and calculating similarity between them. Yet, abstracts offer only sparse and high-level summaries, and such methods primarily optimize one-to-one similarity, overlooking the dynamic relations that emerge across relevant papers during the retrieval process. To address this, we propose Chain of Retrieval (COR), a novel iterative framework for full-paper retrieval. Specifically, COR decomposes each query paper into multiple aspect-specific views, matches them against segmented candidate papers, and iteratively expands the search by promoting top-ranked results as new queries, thereby forming a tree-structured retrieval process. The resulting retrieval tree is then aggregated in a post-order manner: descendants are first combined at the query level, then recursively merged with their parent nodes, to capture hierarchical relations across iterations. To validate this, we present SCIFULLBENCH, a large-scale benchmark providing both complete and segmented contexts of full papers for queries and candidates, and results show that COR significantly outperforms existing retrieval baselines. Our code and dataset are available at https://github.com/psw0021/Chain-of-Retrieval-Official.

2507.06358 2026-03-24 q-bio.PE cs.LG

Multi-scale species richness estimation with deep learning

Victor Boussange, Bert Wuyts, Philipp Brun, Johanna T. Malle, Gabriele Midolo, Jeanne Portier, Théophile Sanchez, Niklaus E. Zimmermann, Irena Axmanová, Helge Bruelheide, Milan Chytrý, Stephan Kambach, Zdeňka Lososová, Martin Večeřa, Idoia Biurrun, Klaus T. Ecker, Jonathan Lenoir, Jens-Christian Svenning, Dirk Nikolaus Karger

Comments 31 pages

Biodiversity assessments depend critically on the spatial scale at which species richness is measured. How species richness accumulates with sampling area is influenced by natural and anthropogenic processes whose effects vary across spatial scales. These accumulation dynamics, described by the species-area relationship (SAR), are challenging to assess because most biodiversity surveys cover sampling areas far smaller than the scales at which these processes operate. Here, we combine sampling theory with deep learning to estimate species richness at arbitrary spatial scales across geographic space from existing ecological surveys. We apply our model, named MuScaRi, to ~350k vegetation surveys across Europe. Validated against independent regional plant inventories, MuScaRi reduces root mean squared error of vascular plant richness estimates by 61% relative to conventional estimators, yields substantially less biased predictions, and produces multi-scale richness maps alongside spatially explicit estimates of the species accumulation rate, a key indicator for biodiversity conservation. By encompassing the full spectrum of ecologically relevant spatial scales within a single unified framework, MuScaRi provides an essential tool for robust biodiversity assessments and forecasts under global change.

2507.02064 2026-03-24 q-bio.NC cs.AI

REMI: Reconstructing Episodic Memory During Internally Driven Path Planning

Zhaoze Wang, Genela Morris, Dori Derdikman, Pratik Chaudhari, Vijay Balasubramanian

Grid cells in the medial entorhinal cortex (MEC) and place cells in the hippocampus (HC) both form spatial representations. Grid cells fire in triangular grid patterns, while place cells fire at specific locations and respond to contextual cues. How do these interacting systems support not only spatial encoding but also internally driven path planning, such as navigating to locations recalled from cues? Here, we propose a system-level theory of MEC-HC wiring that explains how grid and place cell patterns could be connected to enable cue-triggered goal retrieval, path planning, and reconstruction of sensory experience along planned routes. We suggest that place cells autoassociate sensory inputs with grid cell patterns, allowing sensory cues to trigger recall of goal-location grid patterns. We show analytically that grid-based planning permits shortcuts through unvisited locations and generalizes local transitions to long-range paths. During planning, intermediate grid states trigger place cell pattern completion, reconstructing sensory experiences along the route. Using a single-layer RNN modeling the HC-MEC loop with a planning subnetwork, we demonstrate these effects in both biologically grounded navigation simulations using RatatouGym and visually realistic navigation tasks using Habitat Sim.

2506.03467 2026-03-24 cs.IT cs.CR cs.LG eess.SP math.IT stat.ME

Differentially Private Distribution Release of Gaussian Mixture Models via KL-Divergence Minimization

Hang Liu, Anna Scaglione, Sean Peisert

Comments This work has been submitted to the IEEE for possible publication

Gaussian Mixture Models (GMMs) are widely used statistical models for representing multi-modal data distributions, with numerous applications in data mining, pattern recognition, data simulation, and machine learning. However, recent research has shown that releasing GMM parameters poses significant privacy risks, potentially exposing sensitive information about the underlying data. In this paper, we address the challenge of releasing GMM parameters while ensuring differential privacy (DP) guarantees. Specifically, we focus on the privacy protection of mixture weights, component means, and covariance matrices. We propose to use Kullback-Leibler (KL) divergence as a utility metric to assess the accuracy of the released GMM, as it captures the joint impact of noise perturbation on all the model parameters. To achieve privacy, we introduce a DP mechanism that adds carefully calibrated random perturbations to the GMM parameters. Through theoretical analysis, we quantify the effects of privacy budget allocation and perturbation statistics on the DP guarantee, and derive a tractable expression for evaluating KL divergence. We formulate and solve an optimization problem to minimize the KL divergence between the released and original models, subject to a given $(ε, δ)$-DP constraint. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach achieves strong privacy guarantees while maintaining high utility.

2505.20730 2026-03-24 cs.IR cs.AI cs.CL cs.LG

Do LLMs Understand Collaborative Signals? Diagnosis and Repair

Shahrooz Pouryousef, Ali Montazeralghaem

Collaborative information from user-item interactions is a fundamental source of signal in successful recommender systems. Recently, researchers have attempted to incorporate this knowledge into large language model-based recommender approaches (LLMRec) to enhance their performance. However, there has been little fundamental analysis of whether LLMs can effectively reason over collaborative information. In this paper, we analyze the ability of LLMs to reason about collaborative information in recommendation tasks, comparing their performance to traditional matrix factorization (MF) models. We propose a simple and effective method to improve LLMs' reasoning capabilities using retrieval-augmented generation (RAG) over the user-item interaction matrix with four different prompting strategies. Our results show that the LLM outperforms the MF model whenever we provide relevant information in a clear and easy-to-follow format, and prompt the LLM to reason based on it. We observe that with this strategy, in almost all cases, the more information we provide, the better the LLM performs.

2505.19731 2026-03-24 stat.ML cs.LG

Proximal Point Nash Learning from Human Feedback

Daniil Tiapkin, Daniele Calandriello, Denis Belomestny, Eric Moulines, Alexey Naumov, Kashif Rasul, Michal Valko, Pierre Menard

Traditional Reinforcement Learning from Human Feedback (RLHF) often relies on reward models, frequently assuming preference structures like the Bradley-Terry model, which may not accurately capture the complexities of real human preferences (e.g., intransitivity). Nash Learning from Human Feedback (NLHF) offers a more direct alternative by framing the problem as finding a Nash equilibrium of a game defined by these preferences. While many works study the Nash learning problem directly in the policy space, we instead consider it under a more realistic policy parametrization setting. We first analyze a simple self-play policy gradient method, which is equivalent to Online IPO. We establish high-probability last-iterate convergence guarantees for this method, but our analysis also reveals a possible stability limitation of the underlying dynamics. Motivated by this, we embed the self-play updates into a proximal point framework, yielding a stabilized algorithm. For this combined method, we prove high-probability last-iterate convergence and discuss its more practical version, which we call Nash Prox. Finally, we apply this method to post-training of large language models and validate its empirical performance.

2504.15453 2026-03-24 eess.SY cs.RO cs.SY

Barrier-Riccati Synthesis for Nonlinear Safe Control with Expanded Region of Attraction

Hassan Almubarak, Maitham F. AL-Sunni, Justin T. Dubbin, Nader Sadegh, John M. Dolan, Evangelos A. Theodorou

Comments This work has been accepted for publication in the proceedings of the 2026 American Control Conference (ACC), New Orleans, Louisiana, USA

We present a Riccati-based framework for safety-critical nonlinear control that integrates the barrier states (BaS) methodology with the State-Dependent Riccati Equation (SDRE) approach. The BaS formulation embeds safety constraints into the system dynamics via auxiliary states, enabling safety to be treated as a control objective. To overcome the limited region of attraction in linear BaS controllers, we extend the framework to nonlinear systems using SDRE synthesis applied to the barrier-augmented dynamics and derive a matrix inequality condition that certifies forward invariance of a large region of attraction and guarantees asymptotic safe stabilization. The resulting controller is computed online via pointwise Riccati solutions. We validate the method on an unstable constrained system and cluttered quadrotor navigation tasks, demonstrating improved constraint handling, scalability, and robustness near safety boundaries. This framework offers a principled and computationally tractable solution for synthesizing nonlinear safe feedback in safety-critical environments.

2504.03097 2026-03-24 stat.ML cs.LG math.PR math.ST stat.TH

A Computational Transition for Detecting Multivariate Shuffled Linear Regression by Low-Degree Polynomials

Zhangsong Li

Comments 27 pages; improved exposition

Journal ref IEEE Transactions on Information Theory, 72(4):2444-2456 (April 2026)

In this paper, we study the problem of multivariate shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we investigate the model $Y=\tfrac{1}{\sqrt{1+σ^2}}(Π_* X Q_* + σZ)$, where $X$ is an $n \times d$ standard Gaussian design matrix, $Z$ is an $n \times m$ Gaussian noise matrix, $Π_*$ is an unknown $n \times n$ permutation matrix, and $Q_*$ is an unknown $d \times m$ matrix on the Grassmannian manifold satisfying $Q_*^{\top} Q_* = \mathbb I_m$. Consider the hypothesis testing problem of distinguishing this model from the case where $X$ and $Y$ are independent Gaussian random matrices of sizes $n \times d$ and $n \times m$, respectively. Our results reveal a phase transition phenomenon in the performance of low-degree polynomial algorithms for this task. (1) When $m=o(d)$, we show that all degree-$D$ polynomials fail to distinguish these two models even when $σ=0$, provided that $D^4=o\big( \tfrac{d}{m} \big)$. (2) When $m=d$ and $σ=ω(1)$, we show that all degree-$D$ polynomials fail to distinguish these two models provided that $D=o(σ)$. (3) When $m=d$ and $σ=o(1)$, we show that there exists a constant-degree polynomial that strongly distinguishes these two models. These results establish a smooth transition in the effectiveness of low-degree polynomial algorithms for this problem, highlighting the interplay between the dimensions $m$ and $d$, the noise level $σ$, and the computational complexity of the testing task.
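
As a quick sanity check on the normalization in the model above, the short derivation below shows why the factor $\tfrac{1}{\sqrt{1+σ^2}}$ keeps every row of $Y$ at identity covariance. It assumes, which the abstract does not state explicitly, that $Z$ has i.i.d. standard Gaussian entries and is independent of $X$. The row vectors $x_i$, $y_i$, $z_i$ and the permutation $\pi_*$ encoded by $Π_*$ are notation introduced here for illustration only.

```latex
\begin{align*}
y_i &= \frac{1}{\sqrt{1+\sigma^2}}\left(Q_*^{\top} x_{\pi_*(i)} + \sigma z_i\right), \\
\operatorname{Cov}(y_i)
  &= \frac{1}{1+\sigma^2}\left(Q_*^{\top}\,\mathbb{I}_d\,Q_* + \sigma^2\,\mathbb{I}_m\right)
   = \frac{1}{1+\sigma^2}\left(\mathbb{I}_m + \sigma^2\,\mathbb{I}_m\right)
   = \mathbb{I}_m .
\end{align*}
```

Under these assumptions the rows of $Y$ are marginally standard Gaussian for every noise level $σ$, so a test cannot succeed by looking at $Y$ alone and has to exploit the joint dependence between $X$ and $Y$.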

2503.21686 2026-03-24 quant-ph cs.LG

Molecular Quantum Transformer

Yuichi Kamata, Quoc Hoan Tran, Yasuhiro Endo, Hirotaka Oshima

Comments 14 pages, 8 figures; updated for refining results and discussion with other FTQC implementations of Quantum Transformer

Journal ref Proceedings of the AAAI Conference on Artificial Intelligence, 40(27), 22482-22490 (2026)

The Transformer model, renowned for its powerful attention mechanism, has achieved state-of-the-art performance in various artificial intelligence tasks but faces challenges such as high computational cost and memory usage. Researchers are exploring quantum computing to enhance the Transformer's design, though it still shows limited success with classical data. With a growing focus on leveraging quantum machine learning for quantum data, particularly in quantum chemistry, we propose the Molecular Quantum Transformer (MQT) for modeling interactions in molecular quantum systems. By utilizing quantum circuits to implement the attention mechanism on the molecular configurations, MQT can efficiently calculate ground-state energies for all configurations. Numerical demonstrations show that in calculating ground-state energies for H2, LiH, BeH2, and H4, MQT outperforms the classical Transformer, highlighting the promise of quantum effects in Transformer structures. Furthermore, its pretraining capability on diverse molecular data facilitates the efficient learning of new molecules, extending its applicability to complex molecular systems with minimal additional effort. Our method offers an alternative to existing quantum algorithms for estimating ground-state energies, opening new avenues in quantum chemistry and materials science.

2501.02406 2026-03-24 stat.ML cs.AI cs.CL cs.IT cs.LG math.IT

A Training-free Method for LLM Text Attribution

Tara Radvand, Mojtaba Abdolmaleki, Mohamed Mostagir, Ambuj Tewari

Verifying the provenance of content is crucial to the functioning of many organizations, e.g., educational institutions, social media platforms, and firms. This problem is becoming increasingly challenging as text generated by Large Language Models (LLMs) becomes almost indistinguishable from human-generated content. In addition, many institutions use in-house LLMs and want to ensure that external, non-sanctioned LLMs do not produce content within their institutions. In this paper, we answer the following question: Given a piece of text, can we identify whether it was produced by a particular LLM, while ensuring a guaranteed low false positive rate? We model LLM text as a sequential stochastic process with complete dependence on history. We then design zero-shot statistical tests to (i) distinguish between text generated by two different known sets of LLMs $A$ (non-sanctioned) and $B$ (in-house), and (ii) identify whether text was generated by a known LLM or by any unknown model. We prove that the Type I and Type II errors of our test decrease exponentially with the length of the text. We also extend our theory to black-box access via sampling and characterize the required sample size to obtain essentially the same Type I and Type II error upper bounds as in the white-box setting (i.e., with access to $A$). We show the tightness of our upper bounds by providing an information-theoretic lower bound. We next present numerical experiments to validate our theoretical results and assess their robustness in settings with adversarial post-editing. Our work has a host of practical applications in which determining the origin of a text is important and can also be useful for combating misinformation and ensuring compliance with emerging AI regulations. See https://github.com/TaraRadvand74/llm-text-detection for code, data, and an online demo of the project.

2411.17367 2026-03-24 cs.AR cs.LG

Efficient transformer adaptation for analog in-memory computing via low-rank adapters

Chen Li, Elena Ferro, Corey Lammie, Manuel Le Gallo, Irem Boybat, Bipin Rajendran

Comments 18 pages

Analog In-Memory Computing (AIMC) offers a promising solution to the von Neumann bottleneck. However, deploying transformer models on AIMC remains challenging due to their inherent need for flexibility and adaptability across diverse tasks. For the benefits of AIMC to be fully realized, weights of static vector-matrix multiplications must be mapped and programmed to analog devices in a weight-stationary manner. This poses two challenges for adapting a base network to hardware and downstream tasks: (i) conventional analog hardware-aware (AHWA) training requires retraining the entire model, and (ii) reprogramming analog devices is both time- and energy-intensive. To address these issues, we propose Analog Hardware-Aware Low-Rank Adaptation (AHWA-LoRA) training, a novel approach for efficiently adapting transformers to AIMC hardware. AHWA-LoRA training keeps the analog weights fixed as meta-weights and introduces lightweight external LoRA modules for both hardware and task adaptation. We validate AHWA-LoRA training on SQuAD v1.1 and the GLUE benchmark, demonstrate its scalability to larger models, and show its effectiveness in instruction tuning and reinforcement learning. We further evaluate a practical deployment scenario that balances AIMC tile latency with digital LoRA processing using optimized pipeline strategies, with RISC-V-based programmable multi-core accelerators. This hybrid architecture achieves efficient transformer inference with only a 4% per-layer overhead compared to a fully AIMC implementation.

2410.09027 2026-03-24 stat.ME cs.LG econ.EM stat.AP

Variance reduction combining pre-experiment and in-experiment data

Zhexiao Lin, Pablo Crespo

Comments Accepted to 5th Conference on Causal Learning and Reasoning (CLeaR), 2026

Online controlled experiments (A/B testing) are fundamental to data-driven decision-making in many companies. Improving the sensitivity of these experiments under fixed sample size constraints requires reducing the variance of the average treatment effect (ATE) estimator. Existing variance reduction techniques such as CUPED and CUPAC use pre-experiment data, but their effectiveness depends on how predictive those data are for outcomes measured during the experiment. In-experiment data are often more strongly correlated with the outcome, but using arbitrary post-treatment variables can introduce bias. In this paper, we propose a general, robust, and scalable framework that combines both pre-experiment and in-experiment data to achieve variance reduction. Our framework is simple, interpretable, and computationally efficient, making it practical for real-world deployment. We develop the asymptotic theory of the proposed estimator and provide consistent variance estimators. Empirical results from multiple online experiments conducted at Etsy demonstrate substantial additional variance reduction over the current pipeline, even when incorporating only a few post-treatment covariates. These findings underscore the effectiveness of our framework in improving experimental sensitivity and accelerating data-driven decision-making.
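
For readers unfamiliar with the baseline mentioned above, here is a minimal sketch of plain CUPED with a single pre-experiment covariate. It is not the combined pre-experiment and in-experiment estimator proposed in the paper, whose details the abstract does not give. The function names, the simulated data-generating process, and all parameter values are assumptions made for illustration.

```python
import random
import statistics


def cuped_adjust(y, x_pre):
    """Return CUPED-adjusted outcomes y - theta * (x - mean(x)).

    theta = cov(y, x) / var(x) is estimated on the pooled sample.
    """
    x_bar = statistics.fmean(x_pre)
    y_bar = statistics.fmean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x_pre, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x_pre)
    theta = cov_xy / var_x
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x_pre, y)]


def diff_in_means(y, treated):
    """Difference-in-means ATE estimate for a boolean treatment indicator."""
    treat = [yi for yi, t in zip(y, treated) if t]
    ctrl = [yi for yi, t in zip(y, treated) if not t]
    return statistics.fmean(treat) - statistics.fmean(ctrl)


def simulate_once(rng, n=400, effect=0.5):
    """Simulate one A/B test (hypothetical data model) and return
    (naive ATE estimate, CUPED ATE estimate)."""
    x = [rng.gauss(0, 1) for _ in range(n)]            # pre-experiment covariate
    treated = [rng.random() < 0.5 for _ in range(n)]   # random assignment
    y = [2.0 * xi + effect * t + rng.gauss(0, 1) for xi, t in zip(x, treated)]
    return diff_in_means(y, treated), diff_in_means(cuped_adjust(y, x), treated)
```

Calling `simulate_once` repeatedly with a seeded `random.Random` and comparing `statistics.variance` of the two estimate streams gives a rough picture of how much variance a predictive pre-experiment covariate can remove in this toy setup.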

2410.01591 2026-03-24 eess.IV cs.AI cs.CV

Imaging foundation model for universal enhancement of non-ideal measurement CT

Rongjun Ge, Yuxin Liu, Zhan Wu, Shangwen Yang, Yuan Gao, Chenyu You, Ge Wang, Shuo Li, Yuting He, Yang Chen

Comments This paper has been accepted by Nature Communications

Non-ideal measurement computed tomography (NICT) employs suboptimal imaging protocols to expand CT applications. However, the resulting trade-offs degrade image quality, limiting clinical acceptability. Although deep learning methods have been used to enhance NICT images, their reliance on large training datasets and limited generalizability across diverse settings hinder practical use. We propose the multi-scale integrated Transformer AMPlifier (TAMP), the first imaging foundation model for universal NICT enhancement. Pre-trained on 10.8 million physics-driven simulated NICT images, TAMP generalizes effectively across various NICT settings, defect degrees, and body regions. Moreover, a parameter-efficient fine-tuning strategy enables TAMP to adapt to specific clinical scenarios using only a few slices. Extensive experiments, including radiologist and real-world validations, demonstrate that TAMP consistently improves image quality and clinical acceptability, underscoring its significant potential to advance CT imaging and broaden NICT applications in clinical practice.

2407.02419 2026-03-24 quant-ph cs.LG stat.ML

Quantum Curriculum Learning

Quoc Hoan Tran, Yasuhiro Endo, Hirotaka Oshima

Comments Updated with schematic figures of quantum circuits and transparent explanation for Curriculum Learning

Journal ref Phys. Rev. A 112, 032431 (2025)

Quantum machine learning (QML) requires significant quantum resources to address practical real-world problems. When the underlying quantum information exhibits hierarchical structures in the data, limitations persist in training complexity and generalization. Research should prioritize both the efficient design of quantum architectures and the development of learning strategies to optimize resource usage. We propose a framework called quantum curriculum learning (Q-CurL) for quantum data, where the curriculum introduces simpler tasks or data to the learning model before progressing to more challenging ones. Q-CurL exhibits robustness to noise and data limitations, which is particularly relevant for current and near-term noisy intermediate-scale quantum devices. We achieve this through a curriculum design based on quantum data density ratios and a dynamic learning schedule that prioritizes the most informative quantum data. Empirical evidence shows that Q-CurL significantly enhances training convergence and generalization for unitary learning and improves the robustness of quantum phase recognition tasks. Q-CurL is effective with physical learning applications in physics and quantum chemistry.

2405.18777 2026-03-24 math.OC cs.LG

SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity

Tianshu Chu, Dachuan Xu, Wei Yao, Jin Zhang

Comments We have primarily fixed Lemma F.3 and revised the proofs of Theorems 3.7, 3.9, and 3.11 in this version, while the main results remain unchanged

While stochastic bilevel optimization methods have been extensively studied for addressing large-scale nested optimization problems in machine learning, it remains an open question whether the optimal complexity bounds for solving bilevel optimization are the same as those in single-level optimization. Our main result resolves this question: SPABA, an adaptation of the PAGE method for nonconvex optimization in (Li et al., 2021) to the bilevel setting, can achieve optimal sample complexity in both the finite-sum and expectation settings. We show the optimality of SPABA by proving that there is no gap in complexity analysis between stochastic bilevel and single-level optimization when implementing PAGE. Notably, as indicated by the results of (Dagréou et al., 2022), there might exist a gap in complexity analysis when implementing other stochastic gradient estimators, like SGD and SAGA. In addition to SPABA, we propose several other single-loop stochastic bilevel algorithms that either match or improve the state-of-the-art sample complexity results, leveraging our convergence rate and complexity analysis. Numerical experiments demonstrate the superior practical performance of the proposed methods.

2405.17710 2026-03-24 cs.SI cs.CL

Does Geo-co-location Matter? A Case Study of Public Health Conversations during COVID-19

Paiheng Xu, Louiqa Raschid, Vanessa Frias-Martinez

Comments ICWSM 2026

Social media platforms like Twitter (now X) have been pivotal in information dissemination and public engagement. The objective of our research is to analyze the effect of localized engagement on social media conversations. This study examines the impact of geographic co-location, as a proxy for localized engagement. Our research is grounded in a COVID-19 dataset. A key goal during the pandemic for public health experts was to encourage prosocial behavior that could impact local outcomes such as masking and social distancing. Given the importance of local news and guidance during COVID-19, we analyze the effect of localized engagement, between public health experts (PHEs) and the public, on social media. We analyze a Twitter Conversation dataset from January 2020 to November 2021, comprising over 19 K tweets from nearly five hundred PHEs, and 800 K replies from 350 K participants. We use a Poisson regression model to show that geo-co-location is indeed associated with higher engagement. Lexical features associated with emotion and personal experiences were more common in geo-co-located conversations. To complement our statistical analysis, we also applied a large language model (LLM)-based method to automatically generate and evaluate hypotheses; the LLM results confirm the results using lexical features. This research provides insights into how geographic co-location influences social media engagement and can inform strategies to improve public health messaging.

2402.01749 2026-03-24 cs.CY cs.AI cs.LG

Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models

Weijia Zhang, Jindong Han, Zhao Xu, Hang Ni, Tengfei Lyu, Hao Liu, Hui Xiong

The integration of machine learning techniques has become a cornerstone in the development of intelligent urban services, significantly contributing to the enhancement of urban efficiency, sustainability, and overall livability. Recent advancements in foundational models, such as ChatGPT, have introduced a paradigm shift within the fields of machine learning and artificial intelligence. These models, with their exceptional capacity for contextual comprehension, problem-solving, and task adaptability, present a transformative opportunity to reshape the future of smart cities and drive progress toward Urban General Intelligence (UGI). Despite increasing attention to Urban Foundation Models (UFMs), this rapidly evolving field faces critical challenges, including the lack of clear definitions, systematic reviews, and universalizable solutions. To address these issues, this paper first introduces the definition and concept of UFMs and highlights the distinctive challenges involved in their development. Furthermore, we present a data-centric taxonomy that classifies existing research on UFMs according to the various urban data modalities and types. In addition, we propose a prospective framework designed to facilitate the realization of versatile UFMs, aimed at overcoming the identified challenges and driving further progress in this field. Finally, this paper systematically summarizes and discusses existing benchmarks and datasets related to UFMs, and explores the wide-ranging applications of UFMs within urban contexts, illustrating their potential to significantly impact and transform urban systems. A comprehensive collection of relevant research papers and open-source resources has been collated and is continuously updated at: https://github.com/usail-hkust/Awesome-Urban-Foundation-Models.

2401.09346 2026-03-24 stat.ML cs.LG

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization

Wanrong Zhu, Zhipeng Lou, Ziyang Wei, Wei Biao Wu

Uncertainty quantification for estimation through stochastic optimization solutions in an online setting has gained popularity recently. This paper introduces a novel inference method focused on constructing confidence intervals with efficient computation and fast convergence to the nominal level. Specifically, we propose to use a small number of independent multi-runs to acquire distribution information and construct a t-based confidence interval. Our method requires minimal additional computation and memory beyond the standard updating of estimates, making the inference process almost cost-free. We provide a rigorous theoretical guarantee for the confidence interval, demonstrating that the coverage is approximately exact with an explicit convergence rate and allowing for high confidence level inference. In particular, a new Gaussian approximation result is developed for the online estimators to characterize the coverage properties of our confidence intervals in terms of relative errors. Additionally, our method allows for leveraging parallel computing to further accelerate calculations using multiple cores. It is easy to implement and can be integrated with existing stochastic algorithms without the need for complicated modifications.
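
The idea of building a t-based interval from a handful of independent runs can be illustrated with a toy sketch. This is an assumption-laden illustration, not the paper's algorithm: the averaged-SGD update, the step-size schedule, and the function names are all hypothetical. The critical value `t_crit` is passed in by the caller (for example 2.571 for 6 runs at the 95% level, taken from a standard t table with 5 degrees of freedom) so that the example needs only the standard library.

```python
import math
import random
import statistics


def averaged_sgd_mean(samples, lr0=0.5, decay=0.51):
    """Averaged SGD on the loss 0.5 * (theta - z)^2, i.e. online mean estimation.

    Returns the running average of the SGD iterates (hypothetical toy setup).
    """
    theta, avg = 0.0, 0.0
    for i, z in enumerate(samples, start=1):
        theta -= lr0 * i ** (-decay) * (theta - z)  # one SGD step on the new sample
        avg += (theta - avg) / i                    # running average of the iterates
    return avg


def t_interval(run_estimates, t_crit):
    """t-based interval from K independent run estimates.

    t_crit must be the t quantile with K - 1 degrees of freedom,
    supplied by the caller.
    """
    k = len(run_estimates)
    center = statistics.fmean(run_estimates)
    half_width = t_crit * statistics.stdev(run_estimates) / math.sqrt(k)
    return center - half_width, center + half_width


def parallel_runs_interval(rng, k=6, n=2000, true_mean=3.0, t_crit=2.571):
    """Run k independent averaged-SGD estimators and combine them into an interval.

    The runs are executed sequentially here; since they are independent,
    they could equally be distributed across cores.
    """
    estimates = [
        averaged_sgd_mean(rng.gauss(true_mean, 1.0) for _ in range(n))
        for _ in range(k)
    ]
    return t_interval(estimates, t_crit)
```

Only the K run estimates need to be kept to form the interval, which is in the spirit of the low extra memory cost the abstract describes.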

2305.10413 2026-03-24 stat.ML cs.LG math.ST stat.AP stat.TH

On Consistency of Signature Using Lasso

Xin Guo, Binnan Wang, Ruixun Zhang, Chaoyi Zhao

Signatures are iterated path integrals of continuous and discrete-time processes, and their universal nonlinearity linearizes the problem of feature selection in time series data analysis. This paper studies the consistency of signature using Lasso regression, both theoretically and numerically. We establish conditions under which the Lasso regression is consistent both asymptotically and in finite sample. Furthermore, we show that the Lasso regression is more consistent with the Itô signature for time series and processes that are closer to the Brownian motion and with weaker inter-dimensional correlations, while it is more consistent with the Stratonovich signature for mean-reverting time series and processes. We demonstrate that signature can be applied to learn nonlinear functions and option prices with high accuracy, and the performance depends on properties of the underlying process and the choice of the signature.

2212.07826 2026-03-24 quant-ph cs.LG q-bio.BM

Hybrid Quantum Generative Adversarial Networks for Molecular Simulation and Drug Discovery

Prateek Jain, Param Pathak, Krishna Bhatia, Shalini Devendrababu, Srinjoy Ganguly

Comments 33 pages, 25 figures

Journal ref Quantum Mach. Intell. 8, 30 (2026)

Details
English abstract

In molecular research, the modelling and analysis of molecules through simulation is an important step with a direct influence on medical development, material science and drug discovery. The processing power required to design protein chains with hundreds of peptides is huge. Classical computing techniques, including state-of-the-art machine learning models deployed on classical computing machines, have proven to be inefficient in this task, though they have been successful in a limited way. Moreover, current practical implementations, as opposed to purely theoretical modelling, are often infeasible in terms of both time and cost. One of the major areas where quantum machine learning is expected to have a profound advantage over classical algorithms is drug discovery. Quantum generative models have shown promising results in recent studies. This paper introduces three novel quantum generative adversarial network (QGAN) architecture variants resulting from different configurations, various quantum circuit layers and patched ansatz. A quantum simulator from Xanadu's PennyLane was used to execute the QGAN models trained on the QM9 dataset. Upon evaluation, one of the models, namely the QWGAN-HG-GP (Wasserstein distance with gradient penalty) model, outperformed the other QGAN models in different drug molecule property metrics.

2206.10143 2026-03-24 stat.ML cs.LG math.ST stat.ME stat.TH

Noise-contrastive Online Change Point Detection

Nikita Puchkin, Artur Goldman, Konstantin Yakovlev, Valeriia Dzis, Uliana Vinogradova

Comments The preliminary version of this paper was presented at the 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023, PMLR 206:5686-5713)

Details
English abstract

We suggest a novel procedure for online change point detection. Our approach extends the idea of maximizing a discrepancy measure between points from pre-change and post-change distributions. This leads to flexible algorithms suitable for both parametric and nonparametric scenarios. We prove non-asymptotic bounds on the average running length of the procedure and its expected detection delay. The efficiency of the algorithm is illustrated with numerical experiments on synthetic and real-world data sets.

2202.06135 2026-03-24 cs.GT cs.IR cs.LG

No-Regret Bayesian Recommendation to Homogeneous Users

Yiding Feng, Wei Tang, Haifeng Xu

Comments Accepted by OR'26, conference version in EC'22

Details
English abstract

We introduce and study the online Bayesian recommendation problem for a recommender system platform. The platform has the privilege to privately observe a utility-relevant state of a product at each round and uses this information to make online recommendations to a stream of myopic users. This paradigm is common in a wide range of scenarios in the current Internet economy. The platform commits to an online recommendation policy that utilizes her information advantage on the product state to persuade self-interested users to follow the recommendation. Since the platform does not know users' preferences or beliefs in advance, we study the platform's online learning problem of designing an adaptive recommendation policy to persuade users while gradually learning users' preferences and beliefs en route. Specifically, we aim to design online learning policies with no Stackelberg regret for the platform, i.e., against the optimal benchmark policy in hindsight under the assumption that users will correspondingly adapt their responses to the benchmark policy. Our first result is an online policy that achieves double logarithmic regret dependence on the number of rounds. We also present an information-theoretic lower bound showing that no adaptive online policy can achieve regret with better dependency on the number of rounds. Finally, by formulating the platform's problem as optimizing a linear program with membership oracle access, we present our second online recommendation policy that achieves regret with polynomial dependence on the number of states but logarithmic dependence on the number of rounds.

2111.08457 2026-03-24 eess.SP cs.LG

A Novel TSK Fuzzy System Incorporating Multi-view Collaborative Transfer Learning for Personalized Epileptic EEG Detection

Andong Li, Zhaohong Deng, Qiongdan Lou

Comments Publication in Neurocomputing

Details
English abstract

In clinical practice, electroencephalography (EEG) plays an important role in the diagnosis of epilepsy. EEG-based computer-aided diagnosis of epilepsy can greatly improve the accuracy of epilepsy detection while reducing the workload of physicians. However, there are many challenges in practical applications for personalized epileptic EEG detection (i.e., training a detection model for a specific person), including the difficulty of extracting effective features from a single view, the undesirable but common scenario of lacking sufficient training data in practice, and the lack of any guarantee that training and test data are identically distributed. To solve these problems, we propose a TSK fuzzy system-based epilepsy detection algorithm that integrates multi-view collaborative transfer learning. To address the limitation of single-view features, multi-view learning ensures the diversity of features by extracting them from different views. The lack of training data for building a personalized detection model is tackled by leveraging knowledge from the source domain (reference scene) to enhance performance in the target domain (current scene of interest), where the mismatch of data distributions between the two domains is resolved with an adaptation technique based on maximum mean discrepancy. Notably, the transfer learning and multi-view feature extraction are performed at the same time. Furthermore, the fuzzy rules of the TSK fuzzy system equip the model with strong fuzzy logic inference capability. Hence, the proposed method has the potential to detect epileptic EEG signals effectively, which is demonstrated by positive results from a large number of experiments on the CHB-MIT dataset.
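The maximum mean discrepancy (MMD) mentioned in this abstract measures how far apart two sample distributions are in a kernel feature space. The sketch below is a generic biased (V-statistic) estimate of squared MMD with a Gaussian kernel. The kernel choice, the bandwidth, and the function names are assumptions made for illustration, and the code is not the paper's adaptation technique.

```python
import math


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD between source and target samples."""
    return (mean_kernel(source, source, bandwidth)
            + mean_kernel(target, target, bandwidth)
            - 2.0 * mean_kernel(source, target, bandwidth))
```

A value near zero suggests the source-domain and target-domain features are similarly distributed, while a larger value indicates a mismatch. A transfer learning objective would typically add such a term as a penalty so that learned features from the two domains are pulled together.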

2603.20533 2026-03-24 cs.CY cs.AI cs.CL

Revenue-Sharing as Infrastructure: A Distributed Business Model for Generative AI Platforms

Ghislain Dorian Tchuente Mondjo

Comments 11 pages, 1 figure, 2 tables

Details
English abstract

Generative AI platforms (Google AI Studio, OpenAI, Anthropic) provide infrastructures (APIs, models) that are transforming the application development ecosystem. Recent literature distinguishes three generations of business models: a first generation modeled on cloud computing (pay-per-use), a second characterized by diversification (freemium, subscriptions), and a third, emerging generation exploring multi-layer market architectures with revenue-sharing mechanisms. Despite these advances, current models impose a financial barrier to entry for developers, limiting innovation and excluding actors from emerging economies. This paper proposes and analyzes an original model, "Revenue-Sharing as Infrastructure" (RSI), where the platform offers its AI infrastructure for free and takes a percentage of the revenues generated by developers' applications. This model reverses the traditional upstream payment logic and mobilizes concepts of value co-creation, incentive mechanisms, and multi-layer market architecture to build an original theoretical framework. A detailed comparative analysis shows that the RSI model lowers entry barriers for developers, aligns stakeholder interests, and could stimulate innovation in the ecosystem. Beyond its economic relevance, RSI has a major societal dimension: by enabling developers without initial capital to participate in the digital economy, it could unlock the "latent jobs dividend" in low-income countries, where mobile penetration reaches 84%, and help address local challenges in health, agriculture, and services. Finally, we discuss the conditions of feasibility and strategic implications for platforms and developers.

2603.20520 2026-03-24 stat.ML cs.LG

CogFormer: Learn All Your Models Once

Jerry M. Huang, Lukas Schumacher, Niek Stevenson, Stefan T. Radev

Details
English abstract

Simulation-based inference (SBI) with neural networks has accelerated and transformed cognitive modeling workflows. SBI enables modelers to fit complex models that were previously difficult or impossible to estimate, while also allowing rapid estimation across large numbers of datasets. However, the utility of SBI for iterating over varying modeling assumptions remains limited: changing parameterizations, generative functions, priors, and design variables all necessitate model retraining and hence diminish the benefits of amortization. To address these issues, we pilot a meta-amortized framework for cognitive modeling which we nickname the CogFormer. Our framework trains a transformer-based architecture that remains valid across a combinatorial number of structurally similar models, allowing for changing data types, parameters, design matrices, and sample sizes. We present promising quantitative results across families of decision-making models for binary, multi-alternative, and continuous responses. Our evaluation suggests that CogFormer can accurately estimate parameters across model families with a minimal amortization offset, making it a potentially powerful engine that catalyzes cognitive modeling workflows.

2603.20513 2026-03-24 cs.IR cs.AI

ReBOL: Retrieval via Bayesian Optimization with Batched LLM Relevance Observations and Query Reformulation

Anton Korikov, Scott Sanner

Details
English abstract

LLM-reranking is limited by the top-k documents retrieved by vector similarity, which neither enables contextual query-document token interactions nor captures multimodal relevance distributions. While LLM query reformulation attempts to improve recall by generating improved or additional queries, it is still followed by vector similarity retrieval. We thus propose to address these top-k retrieval stage failures by introducing ReBOL, which 1) uses LLM query reformulations to initialize a multimodal Bayesian Optimization (BO) posterior over document relevance, and 2) iteratively acquires document batches for LLM query-document relevance scoring followed by posterior updates to optimize relevance. After exploring query reformulation and document batch diversification techniques, we evaluate ReBOL against LLM reranker baselines on five BEIR datasets and using two LLMs (Gemini-2.5-Flash-Lite, GPT-5.2). ReBOL consistently achieves higher recall and competitive rankings, for example compared to the best LLM reranker on the Robust04 dataset with 46.5% vs. 35.0% recall@100 and 63.6% vs. 61.2% NDCG@10. We also show that ReBOL can achieve comparable latency to LLM rerankers.
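The recall@100 and NDCG@10 figures quoted in this abstract are standard ranking metrics. The sketch below shows one common way to compute them. It assumes linear gains in the DCG sum (some evaluation toolkits use exponential gains instead), and the function names and document identifiers are hypothetical.

```python
import math


def recall_at_k(ranking, relevant, k):
    """Fraction of all relevant documents that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / len(relevant)


def dcg_at_k(gains, k):
    """Discounted cumulative gain: gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranking, relevance, k):
    """DCG of the ranking divided by the DCG of an ideal ordering.

    relevance: dict mapping document id to a graded relevance value;
    documents missing from the dict are treated as non-relevant.
    """
    gains = [relevance.get(doc, 0.0) for doc in ranking]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Recall counts how many relevant documents were retrieved at all, while NDCG also rewards placing them near the top, which is why a method can improve recall@100 substantially while changing NDCG@10 only slightly.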

2603.20504 2026-03-24 cs.CR cs.AI

Meeting in the Middle: A Co-Design Paradigm for FHE and AI Inference

Bernardo Magri, Benjamin Marsh, Paul Gebheim

Comments Accepted to AICrypt 2026

Details
English abstract

Modern cloud inference creates a two-sided privacy problem: users reveal sensitive inputs to providers, while providers must execute proprietary model weights inside potentially leaky execution environments. Fully homomorphic encryption (FHE) offers cryptographic guarantees but remains prohibitively expensive for modern architectures. We argue that progress requires co-design: specializing FHE schemes and compilers for the static structure of inference circuits, while simultaneously constraining inference architectures to reduce dominant homomorphic cost drivers. We outline a meet-in-the-middle agenda and concrete optimization targets on both axes.

2603.20467 2026-03-24 stat.ME cs.LG math.DS

Goal-oriented learning of stochastic dynamical systems using error bounds on path-space observables

Joanna Zou, Han Cheng Lie, Youssef Marzouk

Details
English abstract

The governing equations of stochastic dynamical systems often become cost-prohibitive for numerical simulation at large scales. Surrogate models of the governing equations, learned from data of the high-fidelity system, are routinely used to predict key observables with greater efficiency. However, standard choices of loss function for learning the surrogate model fail to provide error guarantees for path-dependent observables, such as reaction rates of molecular dynamical systems. This paper introduces an error bound for path-space observables and employs it as a novel variational loss for the goal-oriented learning of a stochastic dynamical system. We show the error bound holds for a broad class of observables, including mean first hitting times on unbounded time domains. We derive an analytical gradient of the goal-oriented loss function by leveraging the formula for Fréchet derivatives of expected path functionals, which remains tractable for implementation in stochastic gradient descent schemes. We demonstrate that surrogate models of overdamped Langevin systems developed via goal-oriented learning achieve improved accuracy in predicting the statistics of a first hitting time observable and robustness to distributional shift in the data.

2603.20462 2026-03-24 eess.SP cs.AI cs.LG

Shift-Invariant Feature Attribution in the Application of Wireless Electrocardiograms

Yalemzerf Getnet, Abiy Tasissa, Waltenegus Dargie

Details
English abstract

Assigning relevance scores to the input features of a machine learning model makes it possible to measure the contributions of the features to achieving a correct outcome. It is regarded as one of the approaches towards developing explainable models. For biomedical tasks, this helps medical experts comprehend machine-based decisions. In the analysis of electrocardiogram (ECG) signals, in particular, understanding which of the electrocardiogram samples or features contributed most to a given decision amounts to understanding the underlying cardiac phases or conditions the machine tries to explain. For the computation of relevance scores, determining the proper baseline is important. Moreover, the scores should have a distribution which is at once intuitive to interpret and easy to associate with the underlying cardiac reality. The purpose of this work is to achieve these goals. Specifically, we propose a shift-invariant baseline which has a physical significance in the analysis as well as the interpretation of electrocardiogram measurements. Moreover, we aggregate significance scores in such a way that they can be mapped to cardiac phases. We demonstrate our approach by inferring physical exertion from cardiac exertion using a residual network. We show that the ECG samples which achieved the highest relevance scores (and, therefore, which contributed most to the accurate recognition of the physical exertion) are those associated with the P and T waves. Index Terms: attribution, baseline, cardiovascular diseases, electrocardiogram, activity recognition, machine learning