arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1296
2604.01650 2026-04-27 cs.HC cs.AI

AromaGen: Interactive Generation of Rich Olfactory Experiences with Multimodal Language Models

Yunge Wen, Awu Chen, Jianing Yu, Jas Brooks, Hiroshi Ishii, Paul Pu Liang

详情
英文摘要

Smell's deep connection with food, memory, and social experience has long motivated researchers to bring olfaction into interactive systems. Yet most olfactory interfaces remain limited to fixed scent cartridges and pre-defined generation patterns, and the scarcity of large-scale olfactory datasets has further constrained AI-based approaches. We present AromaGen, an AI-powered wearable interface capable of real-time, general-purpose aroma generation from free-form text or visual inputs. AromaGen is powered by a multimodal LLM that leverages latent olfactory knowledge to map semantic inputs to structured mixtures of 12 carefully selected base odorants, released through a neck-worn dispenser. Users can iteratively refine generated aromas through natural language feedback via in-context learning. Through a controlled user study ($N = 26$), AromaGen matches human-composed mixtures in zero-shot generation and significantly surpasses them after iterative refinement, achieving a median similarity of 8/10 to real food aromas and reducing perceived artificiality to levels comparable to real food. AromaGen is a step towards real-world interactive aroma generation, opening new possibilities for communication, wellbeing, and immersive technologies.

2603.23375 2026-04-27 cs.DB cs.AI cs.CL

Natural Language Interfaces for Spatial and Temporal Databases: A Comprehensive Overview of Methods, Taxonomy, and Future Directions

Samya Acharja, Kanchan Chowdhury

详情
英文摘要

The task of building a natural language interface to a database, known as NLIDB, has recently gained significant attention from both the database and Natural Language Processing (NLP) communities. With the proliferation of geospatial datasets driven by the rapid emergence of location-aware sensors, geospatial databases play a vital role in supporting geospatial applications. However, querying geospatial and temporal databases differs substantially from querying traditional relational databases due to the presence of geospatial topological operators and temporal operators. To bridge the gap between geospatial query languages and non-expert users, the geospatial research community has increasingly focused on developing NLIDBs for geospatial databases. Yet, existing research remains fragmented across systems, datasets, and methodological choices, making it difficult to clearly understand the landscape of existing methods, their strengths and weaknesses, and opportunities for future research. Existing surveys on NLIDBs focus on general-purpose database systems and do not treat geospatial and temporal databases as primary focus for analysis. To address this gap, this paper presents a comprehensive survey of studies on NLIDBs for geospatial and temporal databases. Specifically, we provide a detailed overview of datasets, evaluation metrics, and the taxonomy of the methods for geospatial and temporal NLIDBs, as well as a comparative analysis of the existing methods. Our survey reveals recurring trends in existing methods, substantial variation in datasets and evaluation practices, and several open challenges that continue to hinder progress in this area. Based on these findings, we identify promising directions for future research to advance natural language interfaces to geospatial and temporal databases.

2603.16663 2026-04-27 cs.CY cs.AI cs.HC cs.MA

When AI Agents Learn from Each Other: Insights from Emergent AI Agent Communities on OpenClaw for Human-AI Partnership in Education

Eason Chen, Ce Guan, Zhonghao Zhao, Joshua Zekeri, Afeez Edeifo Shaibu, Emmanuel Osadebe Prince, Cyuan-Jhen Wu, A Elshafiey

Comments 15 pages. Paper accepted at AIED 2026 bluesky

详情
英文摘要

The AIED community envisions AI evolving "from tools to teammates," yet most research still examines AI agents primarily through one-on-one human-AI interactions. We provide an alternative perspective: a rapidly growing ecosystem of AI agent platforms where over 167,000 agents participate, interact as peers, and develop learning behaviors without researcher intervention. Based on a month of daily qualitative observations across multiple platforms including Moltbook, The Colony, and 4claw, we identify four phenomena with implications for AIED: (1) humans who configure their agents undergo a "bidirectional scaffolding" process, learning through teaching; (2) peer learning emerges without any designed curriculum, including sharing concrete agent artifacts such as skills, workflows, and reusable routines; (3) agents converge on shared memory architectures that mirror open learner model design; and (4) trust dynamics, reliance risks, and platform mortality reveal design constraints for networked educational AI. Rather than presenting empirical findings, we argue that these organic phenomena offer a naturalistic window into dynamics that can inform principled design of multi-agent educational systems. We sketch an illustrative curriculum design, "Learning with Your AI Agent Tutor," and outline potential research directions and open problems to show how these observations might inform future AIED practice and inquiry.

2602.21468 2026-04-27 cond-mat.str-el cond-mat.dis-nn cs.LG quant-ph

Unsupervised Discovery of Intermediate Phase Order in the Frustrated $J_1$-$J_2$ Heisenberg Model via Prometheus Framework

Brandon Yee, Wilson Collins, Maximilian Rutkowski

Comments Substantial revision required across the whole text

详情
英文摘要

The spin-$1/2$ $J_1$-$J_2$ Heisenberg model on the square lattice exhibits a debated intermediate phase between Néel antiferromagnetic and stripe ordered regimes, with competing theories proposing plaquette valence bond, nematic, and quantum spin liquid ground states. We apply the Prometheus variational autoencoder framework -- previously applied to classical (2D, 3D Ising) and quantum (disordered transverse field Ising) phase transitions -- to systematically explore the $J_1$-$J_2$ phase diagram using a multi-scale approach. For $L=4$, we employ exact diagonalization with full wavefunction analysis via quantum-aware VAE. For larger systems ($L=6, 8$), we introduce a reduced density matrix (RDM) based methodology using DMRG ground states, enabling scaling beyond the exponential barrier of full Hilbert space representation. Through dense parameter scans of $J_2/J_1 \in [0, 1]$ and comprehensive latent space analysis, we identify the structure factor $S(π,π)$ and $S(π,0)$ as the dominant order parameters discovered by the VAE, with correlations exceeding $|r| > 0.97$. The RDM-VAE approach successfully captures the Néel-to-stripe crossover near $J_2/J_1 \approx 0.5$--$0.6$, demonstrating that local quantum correlations encoded in reduced density matrices contain sufficient information for unsupervised phase discovery. This work establishes a scalable pathway for applying machine learning to frustrated quantum systems where full wavefunction access is computationally prohibitive.

2602.20376 2026-04-27 cs.DS cs.LG math.OC quant-ph

Exploiting Low-Rank Structure in Max-K-Cut Problems

Ria Stevens, Fangshuo Liao, Barbara Su, Jianqiang Li, Anastasios Kyrillidis

详情
英文摘要

We approach the Max-3-Cut problem through the lens of maximizing complex-valued quadratic forms and demonstrate that low-rank structure in the objective matrix can be exploited, leading to alternative algorithms to classical semidefinite programming (SDP) relaxations and heuristic techniques. We propose an algorithm for maximizing these quadratic forms over a domain of size $K$ that enumerates and evaluates a set of $O\left(n^{2r-1}\right)$ candidate solutions, where $n$ is the dimension of the matrix and $r$ represents the rank of an approximation of the objective. We prove that this candidate set is guaranteed to include the exact maximizer when $K=3$ (corresponding to Max-3-Cut) and the objective is low-rank, and provide approximation guarantees when the objective is a perturbation of a low-rank matrix. This construction results in a family of novel, inherently parallelizable and theoretically-motivated algorithms for Max-3-Cut. Extensive experimental results demonstrate that our approach achieves performance comparable to existing algorithms across a wide range of graphs, while being highly scalable.

2601.08919 2026-04-27 cs.IR cs.CL cs.LG

LLMs as Assessors: Right for the Right Reason?

Sourav Saha, Mandar Mitra, Aditya Dutta

详情
英文摘要

A good deal of recent research has focused on how Large Language Models (LLMs) may be used as judges in place of humans to evaluate the quality of the output produced by various text / image processing systems. Within this broader context, a number of studies have investigated the specific question of how effectively LLMs can be used as relevance assessors for the standard ad hoc task in Information Retrieval (IR). We extend these studies by looking at additional questions. Most importantly, we use a Wikipedia based test collection created by the INEX initiative, and prompt LLMs to not only judge whether documents are relevant / non-relevant, but to highlight relevant passages in documents that it regards as useful. The human relevance assessors involved in creating this collection were given analogous instructions, i.e., they were asked to highlight all passages within a document that respond to the information need expressed in a query. This enables us to evaluate the quality of LLMs as judges not only at the document level, but to also quantify how often these judges are right for the right reasons. Our observations lead us to reiterate the cautionary note sounded in some earlier studies when it comes to using LLMs as assessors for creating IR datasets: while LLMs are unquestionably promising, and may be used judiciously to subtantially reduce the amount of human involvement required to generate high-quality benchmark datasets, they cannot replace humans as assessors.

2601.03294 2026-04-27 cs.CR cs.AI

AgentMark: Utility-Preserving Behavioral Watermarking for Agents

Kaibo Huang, Jin Tan, Yukun Wei, Wanling Li, Zipei Zhang, Hui Tian, Zhongliang Yang, Linna Zhou

Comments Accepted to ACL 2026 (Main, Poster)

详情
英文摘要

LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising urgent needs for IP protection and regulatory provenance. While content watermarking effectively attributes LLM-generated outputs, it fails to directly identify the high-level planning behaviors (e.g., tool and subgoal choices) that govern multi-step execution. Critically, watermarking at the planning-behavior layer faces unique challenges: minor distributional deviations in decision-making can compound during long-term agent operation, degrading utility, and many agents operate as black boxes that are difficult to intervene in directly. To bridge this gap, we propose AgentMark, a behavioral watermarking framework that embeds multi-bit identifiers into planning decisions while preserving utility. It operates by eliciting an explicit behavior distribution from the agent and applying distribution-preserving conditional sampling, enabling deployment under black-box APIs while remaining compatible with action-layer content watermarking. Experiments across embodied, tool-use, and social environments demonstrate practical multi-bit capacity, robust recovery from partial logs, and utility preservation. The code is available at https://github.com/Tooooa/AgentMark.

2512.16251 2026-04-27 q-fin.PR cs.AI cs.LG

Interpretable Deep Learning for Stock Returns: A Consensus-Bottleneck Asset Pricing Model

Changeun Kim, Younwoo Jeong, Bong-Gyu Jang

详情
英文摘要

We introduce the Consensus-Bottleneck Asset Pricing Model (CB-APM), which embeds aggregate analyst consensus as a structural bottleneck, treating professional beliefs as a sufficient statistic for the market's high-dimensional information set. Unlike post-hoc explainability approaches, CB-APM achieves interpretability-by-design: the bottleneck constraint functions as an endogenous regularizer that simultaneously improves out-of-sample predictive accuracy and anchors inference to economically interpretable drivers. Portfolios sorted on CB-APM forecasts exhibit a strong monotonic return gradient, robust across macroeconomic regimes. Pricing diagnostics further reveal that the learned consensus encodes priced variation not spanned by canonical factor models, identifying belief-driven risk heterogeneity that standard linear frameworks systematically miss.

2510.21236 2026-04-27 cs.CR cs.AI cs.SE

AgentBound: Securing Execution Boundaries of AI Agents

Christoph Bühler, Matteo Biagiola, Luca Di Grazia, Guido Salvaneschi

详情
Journal ref
Proceedings of the 34th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE). 2026
英文摘要

Large Language Models (LLMs) have evolved into AI agents that interact with external tools and environments to perform complex tasks. The Model Context Protocol (MCP) has become the de facto standard for connecting agents with such resources, but security has lagged behind: thousands of MCP servers execute with unrestricted access to host systems, creating a broad attack surface. In this paper, we introduce AgentBound, the first access control framework for MCP servers. AgentBound combines a declarative policy mechanism, inspired by the Android permission model, with a policy enforcement engine that contains malicious behavior without requiring MCP server modifications. We build a dataset containing the 296 most popular MCP servers, and show that access control policies can be generated automatically from source code with 80.9% accuracy. We also show that AgentBound blocks the majority of security threats in several malicious MCP servers, and that the policy enforcement engine introduces negligible overhead. Our contributions provide developers and project managers with a foundation for securing MCP servers while maintaining productivity, enabling researchers and tool builders to explore new directions for declarative access control and MCP security.

2510.16853 2026-04-27 cs.CY cs.AI

Agentic Inequality

Matthew Sharp, Omer Bilgin, Iason Gabriel, Lewis Hammond

详情
英文摘要

Autonomous AI agents capable of complex planning and action mark a shift beyond today's generative tools. As these systems enter political and economic life, who can access them, how capable they are, and how many can be deployed will shape distributions of power and opportunity. We define this emerging challenge as "agentic inequality": disparities in power, opportunity, and outcomes arising from unequal access to, and capabilities of, AI agents. We show that agents could either deepen existing divides or, under the right conditions, mitigate them. The paper makes three contributions. First, it develops a framework for analysing agentic inequality across three dimensions: availability, quality, and quantity. Second, it argues that agentic inequality differs from earlier technological divides because agents function as autonomous delegates rather than tools, generating new asymmetries through scalable goal delegation and direct agent-to-agent competition. Third, it analyses the technical and socioeconomic drivers likely to shape the distribution of agentic power, from model release strategies to market incentives, and concludes with a research agenda for governance.

2508.15919 2026-04-27 cs.DC cs.AI

HFX: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling

Zahra Yousefijamarani, Xinglu Wang, Qian Wang, Morgan Lindsay Heisler, Taha Shabani, Niloofar Gholipour, Parham Yassini, Hong Chang, Kan Chen, Qiantao Zhang, Xiaolong Bai, Jiannan Wang, Ying Xiong, Yong Zhang, Zhenan Fan

详情
英文摘要

Large language model (LLM) serving faces the dual challenge of meeting strict user-specific service-level objectives (SLOs) while minimizing computational cost under dynamic, multi-task workloads. Existing approaches either rely on static scheduling policies or focus on single-task settings, limiting their applicability in real-world deployments with heterogeneous requests, variable prompt lengths, and elastic scaling requirements. We present HFX, a production LLM serving system that jointly optimizes request scheduling and elastic scaling across model replicas to satisfy diverse SLOs. HFX introduces a \textbf{scheduler} that performs proactive budget estimation and prioritization to ensure SLO compliance for both new and in-flight requests. HFX also integrates a \textbf{scaler} that supports fast device-to-device (D2D) weight transfer, reducing cold-start latency. Additionally, the system supports both colocated and disaggregated prefill/decode deployments, enabling adaptation to diverse workload patterns and cloud environments. Through extensive experiments on multi-task workloads, we demonstrate consistently higher SLO attainment, lower end-to-end latency, and lower NPU usage cost by up to 4.44$\times$, 65.82\%, and 49.81\%, respectively, compared to state-of-the-art systems. Our results highlight the effectiveness of SLO-aware scheduling and scaling in practical LLM serving, providing a robust framework for cost-efficient and SLO-compliant deployments.

2508.15468 2026-04-27 hep-ex cs.AR cs.LG

JEDI-linear: Fast and Efficient Graph Neural Networks for Jet Tagging on FPGAs

Zhiqiang Que, Chang Sun, Sudarshan Paramesvaran, Emyr Clement, Katerina Karakoulaki, Christopher Brown, Lauri Laatu, Arianna Cox, Alexander Tapper, Wayne Luk, Maria Spiropulu

Comments It has been accepted by FPT 2025

详情
Journal ref
2025 International Conference on Field Programmable Technology (ICFPT), Shanghai, China, 2025, pp. 171-179
英文摘要

Graph Neural Networks (GNNs), particularly Interaction Networks (INs), have shown exceptional performance for jet tagging at the CERN High-Luminosity Large Hadron Collider (HL-LHC). However, their computational complexity and irregular memory access patterns pose significant challenges for deployment on FPGAs in hardware trigger systems, where strict latency and resource constraints apply. In this work, we propose JEDI-linear, a novel GNN architecture with linear computational complexity that eliminates explicit pairwise interactions by leveraging shared transformations and global aggregation. To further enhance hardware efficiency, we introduce fine-grained quantization-aware training with per-parameter bitwidth optimization and employ multiplier-free multiply-accumulate operations via distributed arithmetic. Evaluation results show that our FPGA-based JEDI-linear achieves 3.7 to 11.5 times lower latency, up to 150 times lower initiation interval, and up to 6.2 times lower LUT usage compared to state-of-the-art GNN designs while also delivering higher model accuracy and eliminating the need for DSP blocks entirely. This is the first interaction-based GNN to achieve less than 60~ns latency and currently meets the requirements for use in the HL-LHC CMS Level-1 trigger system. This work advances the next-generation trigger systems by enabling accurate, scalable, and resource-efficient GNN inference in real-time environments. Our open-sourced templates will further support reproducibility and broader adoption across scientific applications.

2508.05633 2026-04-27 cs.IR cs.AI

KuaiLive: A Real-time Interactive Dataset for Live Streaming Recommendation

Changle Qu, Sunhao Dai, Ke Guo, Xiao Zhang, Liqin Zhao, Shijun Wang, Yannan Niu, Lantao Hu, Han Li, Jun Xu

Comments Accepted by SIGIR 2026

详情
英文摘要

Live streaming platforms have become a dominant form of online content consumption, offering dynamically evolving content, real-time interactions, and highly engaging user experiences. These unique characteristics introduce new challenges that differentiate live streaming recommendation from traditional recommendation settings and have garnered increasing attention from industry in recent years. However, research progress in academia has been hindered by the lack of publicly available datasets that accurately reflect the dynamic nature of live streaming environments. To address this gap, we introduce KuaiLive, the first real-time, interactive dataset collected from Kuaishou, a leading live streaming platform in China with over 400 million daily active users. The dataset records the interaction logs of 23,772 users and 452,621 streamers over a 21-day period. Compared to existing datasets, KuaiLive offers several advantages: it includes precise live room start and end timestamps, multiple types of real-time user interactions (click, comment, like, gift), and rich side information features for both users and streamers. These features enable more realistic simulation of dynamic candidate items and better modeling of user and streamer behaviors. We conduct a thorough analysis of KuaiLive from multiple perspectives and evaluate several representative recommendation methods on it, establishing a strong benchmark for future research. KuaiLive can support a wide range of tasks in the live streaming domain, such as top-K recommendation, click-through rate prediction, watch time prediction, and gift price prediction. Moreover, its fine-grained behavioral data also enables research on multi-behavior modeling, multi-task learning, and fairness-aware recommendation. The dataset and related resources are publicly available at https://imgkkk574.github.io/KuaiLive.

2507.19861 2026-04-27 quant-ph cs.LG

Quantum-Informed Machine Learning for Predicting Spatiotemporal Chaos with Practical Quantum Advantage

Maida Wang, Xiao Xue, Mingyang Gao, Peter V. Coveney

Comments 95 pages, 18 figures

详情
Journal ref
Science Advances, 2026, Vol 12, Issue 16
英文摘要

We introduce a quantum-informed machine learning (QIML) framework for modelling the long-term behaviour of high-dimensional chaotic systems. QIML combines a one-time, offline-trained quantum generative model with a classical autoregressive predictor for spatiotemporal field generation. The quantum model learns a quantum prior (Q-Prior) that guides the representation of small-scale interactions and improves the modelling of fine-scale dynamics. We evaluate QIML on the Kuramoto-Sivashinsky equation, two-dimensional Kolmogorov flow, and the three-dimensional turbulent channel flow used as a realistic inflow condition. Across these systems, QIML improves predictive distribution accuracy by up to 17.25% and full-spectrum fidelity by up to 29.36% relative to classical baselines. For turbulent channel inflow, the Q-Prior is trained on a superconducting quantum processor and proves essential: without it, predictions become unstable, whereas QIML produces physically consistent long-term forecasts that outperform leading PDE solvers. Beyond accuracy, QIML offers a memory advantage by compressing multi-megabyte datasets into a kilobyte-scale Q-Prior, enabling scalable integration of quantum resources into scientific modelling.

2507.15753 2026-04-27 cs.CE cs.AI

Algebraic Language Models for Inverse Design of Metamaterials via Diffusion Transformers

Li Zheng, Siddhant Kumar, Dennis M. Kochmann

详情
英文摘要

Generative machine learning models have revolutionized material discovery by capturing complex structure-property relationships, yet extending these approaches to the inverse design of three-dimensional metamaterials remains limited by computational complexity and underexplored design spaces due to the lack of expressive representations. Here we present DiffuMeta, a generative framework integrating diffusion transformers with an algebraic language representation, encoding three-dimensional geometries as mathematical sentences. This compact, unified parameterization spans diverse topologies, enabling the direct application of transformers to structural design. DiffuMeta leverages diffusion models to generate new shell structures with precisely targeted stress-strain responses under large deformations, accounting for buckling and contact while addressing the inherent one-to-many mapping by producing diverse solutions. Uniquely, our approach enables simultaneous control over multiple mechanical objectives, including linear and nonlinear responses beyond training domains. Experimental validation of fabricated structures further confirms the efficacy of our approach for accelerated design of metamaterials and structures with tailored properties.

2507.09028 2026-04-27 q-bio.QM cs.AI

From Classical Machine Learning to Emerging Foundation Models: Review on Multimodal Data Integration for Cancer Research

Amgad Muneer, Muhammad Waqas, Maliazurina B Saad, Eman Showkatian, Rukhmini Bandyopadhyay, Hui Xu, Wentao Li, Joe Y Chang, Zhongxing Liao, Cara Haymaker, Luisa Solis Soto, Carol C Wu, Natalie I Vokes, Xiuning Le, Lauren A Byers, Don L Gibbons, John V Heymach, Jianjun Zhang, Jia Wu

Comments 10 figures, 5 tables

详情
Journal ref
Artificial Intelligence Review 59, 119 (2026)
英文摘要

Cancer research is increasingly driven by the integration of diverse data modalities, spanning from genomics and proteomics to imaging and clinical factors. However, extracting actionable insights from these vast and heterogeneous datasets remains a key challenge. The rise of foundation models (FMs) -- large deep-learning models pretrained on extensive amounts of data serving as a backbone for a wide range of downstream tasks -- offers new avenues for discovering biomarkers, improving diagnosis, and personalizing treatment. This paper presents a comprehensive review of widely adopted integration strategies of multimodal data to assist advance the computational approaches for data-driven discoveries in oncology. We examine emerging trends in machine learning (ML) and deep learning (DL), including methodological frameworks, validation protocols, and open-source resources targeting cancer subtype classification, biomarker discovery, treatment guidance, and outcome prediction. This study also comprehensively covers the shift from traditional ML to FMs for multimodal integration. We present a holistic view of recent FMs advancements and challenges faced during the integration of multi-omics with advanced imaging data. We identify the state-of-the-art FMs, publicly available multi-modal repositories, and advanced tools and methods for data integration. We argue that current state-of-the-art integrative methods provide the essential groundwork for developing the next generation of large-scale, pre-trained models poised to further revolutionize oncology. To the best of our knowledge, this is the first review to systematically map the transition from conventional ML to advanced FM for multimodal data integration in oncology, while also framing these developments as foundational for the forthcoming era of large-scale AI models in cancer research.

2507.04535 2026-04-27 cs.AR cs.LG hep-ex

da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

Chang Sun, Zhiqiang Que, Vladimir Loncar, Wayne Luk, Maria Spiropulu

详情
Journal ref
ACM Trans. Reconfig. Technol. Syst., Vol. 19, No. 1, Article 13. (March 2026)
英文摘要

Neural networks with a latency requirement on the order of microseconds, like the ones used at the CERN Large Hadron Collider, are typically deployed on FPGAs fully unrolled and pipelined. A bottleneck for the deployment of such neural networks is area utilization, which is directly related to the required constant matrix-vector multiplication (CMVM) operations. In this work, we propose an efficient algorithm for implementing CMVM operations with distributed arithmetic on FPGAs that simultaneously optimizes for area consumption and latency. The algorithm achieves resource reduction similar to state-of-the-art algorithms while being significantly faster to compute. The proposed algorithm is open-sourced and integrated into the \texttt{hls4ml} library, a free and open-source library for running real-time neural network inference on FPGAs. We show that the proposed algorithm can reduce on-chip resources by up to a third for realistic, highly quantized neural networks while simultaneously reducing latency, enabling the implementation of previously infeasible networks.

2507.03014 2026-04-27 cs.CR cs.CL cs.LG

Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!

Do-hyeon Yoon, Minsoo Chun, Thomas Allen, Hans Müller, Min Wang, Rajesh Sharma

Comments arXiv admin note: This paper has been withdrawn by arXiv due to unverifiable authorship and affiliation

详情
英文摘要

Large language models (LLMs) face significant copyright and intellectual property challenges as the cost of training increases and model reuse becomes prevalent. While watermarking techniques have been proposed to protect model ownership, they may not be robust to continue training and development, posing serious threats to model attribution and copyright protection. This work introduces a simple yet effective approach for robust LLM fingerprinting based on intrinsic model characteristics. We discover that the standard deviation distributions of attention parameter matrices across different layers exhibit distinctive patterns that remain stable even after extensive continued training. These parameter distribution signatures serve as robust fingerprints that can reliably identify model lineage and detect potential copyright infringement. Our experimental validation across multiple model families demonstrates the effectiveness of our method for model authentication. Notably, our investigation uncovers evidence that a recently Pangu Pro MoE model released by Huawei is derived from Qwen-2.5 14B model through upcycling techniques rather than training from scratch, highlighting potential cases of model plagiarism, copyright violation, and information fabrication. These findings underscore the critical importance of developing robust fingerprinting methods for protecting intellectual property in large-scale model development and emphasize that deliberate continued training alone is insufficient to completely obscure model origins.

2506.17299 2026-04-27 cs.CR cs.AI cs.LG

Toward Principled LLM Safety Testing: Solving the Jailbreak Oracle Problem

Shuyi Lin, Anshuman Suri, Alina Oprea, Cheng Tan

Comments Accepted to MLSys 2026

详情
英文摘要

As large language models (LLMs) become increasingly deployed in safety-critical applications, the lack of systematic methods to assess their vulnerability to jailbreak attacks presents a critical security gap. We introduce the jailbreak oracle problem: given a model, prompt, and decoding strategy, determine whether a jailbreak response can be generated with likelihood exceeding a specified threshold. This formalization enables a principled study of jailbreak vulnerabilities. Answering the jailbreak oracle problem poses significant computational challenges, as the search space grows exponentially with response length. We present Boa, the first system designed for efficiently solving the jailbreak oracle problem. Boa employs a two-phase search strategy: (1) breadth-first sampling to identify easily accessible jailbreaks, followed by (2) depth-first priority search guided by fine-grained safety scores to systematically explore promising yet low-probability paths. Boa enables rigorous security assessments including systematic defense evaluation, standardized comparison of red team attacks, and model certification under extreme adversarial conditions. Code is available at https://github.com/shuyilinn/BOA/tree/mlsys2026ae

2505.12296 2026-04-27 cs.CR cs.AI cs.LG

PoLO: Proof-of-Learning and Proof-of-Ownership at Once with Chained Watermarking

Haiyu Deng, Yanna Jiang, Guangsheng Yu, Qin Wang, Xu Wang, Baihe Ma, Wei Ni, Ren Ping Liu

详情
英文摘要

Our evaluation shows that PoLO achieves \textbf{99\%} watermark detection accuracy for ownership verification, while preserving data privacy and cutting verification costs to just \textbf{1.5--10\%} of traditional methods. Forging PoLO demands \textbf{1.1--4$\times$} more resources than honest proof generation, with the original proof retaining over \textbf{90\%} detection accuracy even after attacks.

2504.08170 2026-04-27 quant-ph cs.LG physics.comp-ph

Efficient measurement of neutral-atom qubits with matched filters

Robert M. Kent, Linipun Phuttitarn, Chaithanya Naik Mude, Swamit Tannu, Mark Saffman, Gregory Lafyatis, Daniel J. Gauthier

详情
Journal ref
Phys. Rev. Applied 25, 044048 (2026)
英文摘要

Quantum computers require high-fidelity measurement of many qubits to achieve a quantum advantage. Traditional approaches suffer from readout crosstalk for a neutral-atom quantum processor with a tightly spaced array. Although classical machine learning algorithms based on convolutional neural networks can improve fidelity, they are computationally expensive, making it difficult to scale them to large qubit counts. We present two simpler and scalable machine learning algorithms that realize matched filters for the readout problem. One is a local model that focuses on a single qubit, and the other uses information from neighboring qubits in the array to prevent crosstalk among the qubits. We demonstrate error reductions of up to 32% and 43% for the site and array models, respectively, compared to a conventional Gaussian threshold approach. Additionally, our array model uses two orders of magnitude fewer trainable parameters and four orders of magnitude fewer multiplications and nonlinear function evaluations than a recent convolutional neural network approach, with only a minor (3.5%) increase in error across different readout times. Another strength of our approach is its physical interpretability: the learned filter can be visualized to provide insights into experimental imperfections. We also show that a convolutional neural network model for improved can be pruned to have 70x and 4000x fewer parameters, respectively, while maintaining similar errors. Our work shows that simple machine learning approaches can achieve high-fidelity qubit measurements while remaining scalable to systems with larger qubit counts.

2502.17011 2026-04-27 q-fin.CP cs.CE cs.CL cs.LG q-fin.PM

Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation

Jaskaran Singh Walia, Aarush Sinha, Naman Saraswat, Srinitish Srinivasan, Srihari Unnikrishnan

详情
英文摘要

Financial bond yield forecasting is challenging due to data scarcity, nonlinear macroeconomic dependencies, and evolving market conditions. In this paper, we propose a novel framework that leverages Causal Generative Adversarial Networks (CausalGANs) and Soft Actor-Critic (SAC) reinforcement learning (RL) to generate high-fidelity synthetic bond yield data for four major bond categories (AAA, BAA, US10Y, Junk). By incorporating 12 key macroeconomic variables, we ensure statistical fidelity by preserving essential market properties. To transform this market dependent synthetic data into actionable insights, we employ a finetuned Large Language Model (LLM) Qwen2.5-7B that generates trading signals (BUY/HOLD/SELL), risk assessments, and volatility projections. We use automated, human and LLM evaluations, all of which demonstrate that our framework improves forecasting performance over existing methods, with statistical validation via predictive accuracy, MAE evaluation(0.103%), profit/loss evaluation (60% profit rate), LLM evaluation (3.37/5) and expert assessments scoring 4.67 out of 5. The reinforcement learning-enhanced synthetic data generation achieves the least Mean Absolute Error of 0.103, demonstrating its effectiveness in replicating real-world bond market dynamics. We not only enhance data-driven trading strategies but also provides a scalable, high-fidelity synthetic financial data pipeline for risk & volatility management and investment decision-making. This work establishes a bridge between synthetic data generation, LLM driven financial forecasting, and language model evaluation, contributing to AI-driven financial decision-making.

2501.19277 2026-04-27 stat.ML cs.LG

On Pareto Optimality for Parametric Choice Bandits

Jierui Zuo, Hanzhang Qin

详情
英文摘要

We study online assortment optimization under stochastic choice when a decision maker simultaneously values cumulative revenue performance and the quality of post-hoc inference on revenue contrasts. We analyze a forced-exploration optimism-in-the-face-of-uncertainty (OFU) scheme that combines two regularized maximum-likelihood estimators: one based on all observations for sequential decision making, and one based only on exploration rounds for inference. Our general theory is developed under predictable score proxies and per-round action-dependent curvature domination. Under these conditions we establish a self-normalized concentration inequality, a likelihood-based ellipsoidal confidence-set theorem, and a regret bound for approximate optimistic actions that explicitly accounts for optimization error. For the multinomial logit (MNL) model we derive explicit score and curvature proxies and show that a balanced spaced singleton-exploration schedule yields realized coordinate coverage, implying regret $\Otilde(n_T + T/\sqrt{n_T})$ and revenue-contrast error $\Otilde(1/\sqrt{n_T})$ up to fixed problem-dependent factors. A hard two-assortment subclass yields a matching lower bound at the product level. Consequently, within the polynomial exploration family $n_T \asymp T^α$, the regret and inference rates become $\Otilde(T^{\max\{α,1-α/2\}})$ and $\Otilde(T^{-α/2})$, respectively; hence $α\in[2/3,1)$ is the rate-wise Pareto-undominated interval and $α=2/3$ is the unique balancing point that minimizes the regret exponent. Finally, for the Exponomial Choice and Nested Logit models we state verifiable sufficient conditions that would instantiate the general framework.

2409.18169 2026-04-27 cs.CR cs.AI cs.LG

Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey

Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu

Comments Accepted by ACM Computing Survey (CSUR). The authors will continuously update the arXiv version. Please reach out to the authors if you find uncovered papers on relevant topics

详情
英文摘要

Recent research demonstrates that the nascent fine-tuning-as-a-service business model exposes serious safety concerns: fine-tuning with a few harmful data uploaded from the users can compromise the safety alignment of the model. The attack, known as harmful fine-tuning attack, has generated broad research interests in both academia and industry. In this paper, we first systematically formulate the threat model and basic assumptions of harmful fine-tuning. Then, we provide a comprehensive review of harmful fine-tuning from three fundamental perspectives: attack setting, defense design, and evaluation methodology. First, we present the threat model of the problem and introduce the harmful fine-tuning attack and its variants. Next, we systematically survey representative attacks, defense methods, and mechanical analysis of adverse effects in the existing literature. Finally, we introduce the evaluation methodology and outline future research directions, which can serve as guidelines and crucial perspectives for the future development of the subject. We also maintain a curated list of relevant papers, which are made accessible at https://github.com/git-disl/awesome_LLM-harmful-fine-tuning-papers

2407.08750 2026-04-27 stat.ML cs.LG econ.EM stat.AP stat.CO stat.ME

Online Distributional Regression

Simon Hirsch, Jonathan Berrisch, Florian Ziel

Comments Revised version January 2026. 34 pages, 9 figures, 4 tables including appendix

详情
英文摘要

Large-scale streaming data are common in modern machine learning applications and have led to the development of online learning algorithms. Many fields, such as supply chain management, weather and meteorology, energy markets, and finance, have pivoted toward probabilistic forecasting. This results in the need not only for accurate learning of the expected value but also for learning the conditional heteroskedasticity and conditional moments. Against this backdrop, we present a methodology for online estimation of regularized, linear distributional models. The proposed algorithm combines recent developments in online estimation of LASSO models with the well-known GAMLSS framework. We provide a case study on day-ahead electricity price forecasting, in which we show the competitive performance of the incremental estimation combined with strongly reduced computational effort. Our algorithms are implemented in a computationally efficient Python package ondil.

2406.14111 2026-04-27 cs.DS cs.LG

Expander Hierarchies for Normalized Cuts on Graphs

Kathrin Hanauer, Monika Henzinger, Robin Münk, Harald Räcke, Maximilian Vötsch

Comments Short version appeared at KDD'24, August 25-29, 2024, Barcelona, Spain

详情
英文摘要

Expander decompositions of graphs have significantly advanced the understanding of many classical graph problems and led to numerous fundamental theoretical results. However, their adoption in practice has been hindered due to their inherent intricacies and large hidden factors in their asymptotic running times. Here, we introduce the first practically efficient algorithm for computing expander decompositions and their hierarchies and demonstrate its effectiveness and utility by incorporating it as the core component in a novel solver for the normalized cut graph clustering objective. Our extensive experiments on a variety of large graphs show that our expander-based algorithm outperforms state-of-the-art solvers for normalized cut with respect to solution quality by a large margin on a variety of graph classes such as citation, e-mail, and social networks or web graphs while remaining competitive in running time.

2209.06865 2026-04-27 q-bio.NC cond-mat.dis-nn cs.AI cs.NE q-bio.MN

Sketch of a novel approach to a neural model

Gabriele Scheler

详情
英文摘要

We present an account of neuroplasticity with respect to cell-internal processing pathways in relation to membrane and synaptic plasticity. We think traditional synapse-centric, weight-based models of memorization are not sufficient or adequate to capture the complexity of neuroplasticity. In these accounts, the model is a network of neurons connected by adaptive transmission links. The adaptation of the transmission links relies on weight changes according to use of the transmission link (short-term and long-term potentiation/depression). In contrast, we propose a paradigm switch from a synapse-centric model (each synapse learns independently, based on its history of use) to a neuron-centric model (each neuron uses signal selection for intracellular pathways to express plasticity at the membrane). A neural model consists of (a) expression of parameters at the membrane, in particular dendritic synapses or spines, and axonal boutons (b) internal parameters in the sub-membrane zone and the cytoplasm with its protein signaling network and (c) core parameters in the nucleus for genetic and epigenetic information. In a neuron-centric model, each node (=neuron) in the network has its own internal memory. Neural transmission and information storage are separated, not automatically combined by coupling strength. There is filtering and selection of signals for storage. Not every transmission event leaves a trace. This represents an important conceptual advance over synaptic weight models. We present the neuron as a self-programming device, rather than as passively determined by ongoing input. We believe a new approach to neural modeling is necessary, because the experimental evidence is not well captured by traditional synapse-centric models. Ultimately, we are interested in the possibilities of a flexible memory system that processes external signals according to its inherent structure.

2604.22752 2026-04-27 stat.ME

From Physics to Statistics: A Simple Route to Exponential Families via Maximum Entropy

Korbinian Strimmer

Comments 17 pages, 2 tables

详情
英文摘要

Exponential families form the backbone of modern statistics and machine learning, but textbooks seldom derive them from first principles in an accessible way. Although minimal sufficiency and the principle of maximum entropy, originating in physics, provide core motivation, they are often presented as technical and requiring advanced prerequisites. Here, a short, self-contained derivation of exponential families based on maximum entropy is presented that is straightforward to carry out, requires only a modest background in information entropy, and avoids technicalities like constrained optimisation. Two propositions are demonstrated in this fashion: i) exponential families with a general base maximise information entropy with respect to that base subject to fixed expectations of canonical statistics, and ii) exponential families with a uniform base maximise standard information entropy under the same constraints. Maximum entropy therefore provides a principled foundation for exponential families with minimal prerequisites, highlighting the value of teaching entropy in statistics courses.

2604.22751 2026-04-27 quant-ph

Correlated Quantum Dephasometry: Symmetry-Resolved Noise Spectroscopy of Two-Dimensional Superconductors and Altermagnets

Wenbo Sun, Zubin Jacob

Comments 5 pages (main text) + 10 pages (supplemental material), 3 figures

详情
英文摘要

Symmetry-resolved spectroscopies, such as angle-resolved photoemission spectroscopy and polarization-resolved Raman, are central for quantum material characterization, yet remain challenging at nanoscale dimensions and low frequencies. Here, we propose correlated quantum dephasometry, which enables symmetry resolved quantum noise spectroscopy of materials at nanoscale and low ($\sim$MHz) frequencies via correlated dephasing of two spin qubits near materials. Our approach leverages the finite-range spatial structures of nonlocal near-field noise correlations to isolate rotational symmetry of the material response in momentum space beyond single qubit capabilities. We apply our approach to two-dimensional (2D) superconductors, and predict clear fingerprints that discriminate s-, d-, and g-wave symmetry of the superconducting gap. To highlight the generality, we further show that the same framework resolves 2D s-wave antiferromagnets and d-wave altermagnets. Our results establish correlated quantum dephasometry as a nanoscale, low-frequency complement for symmetry-resolved spectroscopy applicable to a broad class of quantum materials.

2604.22747 2026-04-27 cs.SE

Code for All: Educational Applications of the "Vibe Coding" Hackathon in Programming Education across All Skill Levels

Ashley J. Chen, Yijia Cao, Minghao Shao, Ramesh Karri, Muhammad Shafique

Comments 15 pages, 14 figures

详情
英文摘要

The emergence of large language models has enabled vibe coding, a natural language approach to programming in which users describe intent and AI generates or revises code, potentially broadening access to programming while preserving meaningful learning outcomes. We investigate its educational value through a month-long online hackathon that welcomed participants from multiple countries, ranging from complete beginners to experienced developers. The hackathon offered three tracks with increasing technical demands. Spark emphasized basic frontend functionality and dynamic features such as buttons, forms, and API calls. Build required backend or database integration. Launch targeted production ready web applications, including deployment. Participants were required to develop projects using only LLM generated code without manual edits and submitted complete chat histories, source code, demo videos, and functionality reports. We assessed educational effectiveness with a mixed methods design that combined standardized project evaluations across functionality, user interface and user experience design, impact, prompt quality, and code readability, along with post-hackathon surveys of perceived learning outcomes and thematic analysis of open-ended feedback. Our findings describe how participants with different backgrounds engage with vibe coding as task complexity increases, how the no manual editing constraint shapes prompting and debugging practices, and what these patterns imply for integrating AI assisted development into programming education and competitive learning environments.