arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.07297 2026-05-11 stat.ML cs.LG

Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers

Mana Sakai, Masaaki Imaizumi

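AI summary: This paper studies why trained Transformers generalize well and derives spectrum-adaptive post hoc generalization bounds. Under layerwise spectral norm control, the bounds are expressed in terms of Schatten quantities of the query-key, value, and feedforward weight matrices. The Schatten indices can be selected adaptively from the singular-value profiles learned in training, trading off spectral complexity against dimension- and depth-dependent factors. Experiments indicate that the complexity proxies induced by these bounds grow more slowly with depth and hidden dimension than norm-based proxies, offering a new perspective on Transformer generalization.

Abstract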

Understanding why trained Transformers generalize well is a fundamental problem in modern machine learning theory, and complexity-based generalization bounds provide a principled way to study this question. While existing norm-based bounds for Transformers remove the explicit polynomial dependence on the hidden dimension, they typically impose fixed norm constraints specified a priori and can exhibit unfavorable exponential dependence on depth. In this paper, we derive spectrum-adaptive post hoc generalization bounds for multi-layer Transformers. Under layerwise spectral norm control, the bounds are expressed in terms of layerwise Schatten quantities of the query-key, value, and feedforward weight matrices. Since the Schatten indices need not be fixed a priori and can instead be selected after training, separately for each matrix type and layer, the bounds adaptively trade off spectral complexity against the dimension- and depth-dependent factors according to the learned singular-value profiles. Empirical comparisons of BERT-adapted proxies for the leading complexity factors suggest that the proxies induced by our bounds grow more slowly with depth and hidden dimension than the corresponding norm-based proxies. Overall, our results provide a complexity-based perspective on how the spectral structure of trained Transformers is reflected in generalization analyses.
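
As a rough illustration of the kind of quantity the abstract refers to, the sketch below computes the textbook Schatten p-norm of a matrix from its singular values. It assumes NumPy is available; the function name `schatten_norm` and the toy matrix `W` are hypothetical, and the paper's actual layerwise Schatten quantities and index-selection rule may differ from this plain norm.

```python
import numpy as np

def schatten_norm(weight, p):
    """Schatten p-norm: the l_p norm of the singular values of `weight`."""
    singular_values = np.linalg.svd(weight, compute_uv=False)
    if np.isinf(p):
        # p = infinity gives the spectral norm (largest singular value).
        return float(singular_values.max())
    return float((singular_values ** p).sum() ** (1.0 / p))

# Hypothetical 2x2 "weight matrix" whose singular values are 3 and 4.
W = np.diag([3.0, 4.0])
nuclear = schatten_norm(W, 1)        # p = 1: sum of singular values
frobenius = schatten_norm(W, 2)      # p = 2: Frobenius norm
spectral = schatten_norm(W, np.inf)  # p = inf: spectral norm
```

Varying `p` between these extremes changes how much weight the tail of the singular-value profile receives, which is the trade-off the abstract says can be tuned after training.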

2605.07289 2026-05-11 cs.DS cs.CL

On the Complexity of the Matching Problem of Regular Expressions with Backreferences

Soh Kumabe, Yuya Uezato

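AI summary: This paper studies the fine-grained complexity of string matching for regular expressions with backreferences (REWBs). The authors show that, under certain parameterizations, the problem cannot be solved in near-linear time, and they give lower bounds based on the Strong Exponential Time Hypothesis (SETH) and parameterized complexity. They also present an $O(n \log^2 n)$-time algorithm for 1-use REWBs, a significant improvement over the previous $O(n^2)$-time algorithm. The work provides theoretical support for building matching engines that are both efficient and secure.

Comments: Full version of ICALP 2026; The abstract field is slightly shorter than that in the paper due to arXiv's length limit

Abstract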

ReDoS is a well-known type of algorithmic complexity attack, where an adversary supplies maliciously crafted strings to a regular expression matching engine, aiming to exhaust computational resources of systems. Even quadratic-time behavior in matching engines has been exploited in successful attacks, as exemplified by major outages at Stack Overflow (2016) and Cloudflare (2019). These incidents motivate a fundamental question: Is it possible to construct matching engines that are provably efficient, running in (near-)linear time in the length of the input string? For classical regular expressions (REGEX), Thompson's construction yields a linear-time algorithm. However, practical engines support powerful features such as backreferences, which strictly extend the expressive power of REGEX but unfortunately increase the risk of ReDoS attacks. This paper investigates the fine-grained complexity of the string matching problem for regular expressions with backreferences (REWBs). Specifically, we consider $r$-use $k$-REWBs. On the hardness side, we show that the string matching problem for $k$-REWBs cannot be solved in $O(n^{2k-\varepsilon})$ time for any $\varepsilon > 0$ under SETH. We also prove that this problem is W[2]-hard when parameterized by the length of the REWB expression, strengthening the previous W[1]-hardness. Moreover, we prove that this problem for $2$-use $2$-REWBs cannot be solved in $n^{1+o(1)}$ time unless the triangle detection problem can be solved in that time. On the algorithmic side, we present an $O(n \log^2 n)$-time algorithm for $1$-use REWBs, which significantly improves upon the recent $O(n^2)$-time algorithm by Nogami and Terauchi (MFCS, 2025). Our algorithm employs several techniques including suffix trees, transition monoids of REGEXes, factorization forest data structures, and periodicity of strings.
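
To make "backreferences strictly extend the expressive power of REGEX" concrete, here is a minimal sketch using Python's standard `re` module, which is a backtracking engine and not the algorithm from the paper. The pattern and the helper `matches` are illustrative assumptions: one capture group followed by one reference to it recognizes strings of the form a^n b a^n, a language no classical regular expression can describe.

```python
import re

# One capture group and one backreference to it: (a*) b \1
# \1 must repeat exactly the text that group 1 captured.
pattern = re.compile(r"(a*)b\1")

def matches(text):
    """Return True if the whole string has the form a^n b a^n."""
    return pattern.fullmatch(text) is not None

balanced = matches("aaabaaa")   # three a's on each side of b
unbalanced = matches("aaabaa")  # three a's before b, two after
```

Here `balanced` is true and `unbalanced` is false, because the engine has to compare the two runs of `a` for equality, which a finite automaton cannot do for unbounded n.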

2605.07286 2026-05-11 math.NA cs.LG cs.NA physics.comp-ph

Sparse Random-Feature Neural Networks with Krylov-Based SVD for Singularly Perturbed ODE

Kevin Kurian Thomas Vaidyan, Siddharth Rout

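AI summary: This paper applies sparse random-feature neural networks to singularly perturbed ordinary differential equations. To address the low rank and severe ill-conditioning of the hidden-layer activation matrix in conventional approaches, it proposes a framework that incorporates structured sparsity and solves the least squares problem efficiently with a Krylov-subspace-based sparse singular value decomposition (sSVD). The method maintains or improves solution accuracy while substantially improving training efficiency and robustness, particularly for the one-dimensional steady convection-diffusion equation with strong advection.

Abstract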

Random-feature neural networks (RFNNs), including architectures with fixed hidden layers and analytically determined output weights, offer fast training but often suffer from issues caused by dense representations of the hidden-layer activations. Their reliance on dense feature mappings and least squares solvers can limit scalability and numerical stability, particularly for high-dimensional or stiff systems. Specifically, the activation matrix is observed to be low-rank and extremely ill-conditioned. In this work, we propose a sparse framework for RFNNs that integrates structured sparsity into the hidden-layer activations, which increases the rank, and employs Sparse Singular Value Decomposition (sSVD) to solve the resulting linear least squares problem scalably and efficiently despite the poor condition number. We explore the theory behind the Lanczos-Golub-Kahan bidiagonalization technique for sparse SVD and conduct experiments to identify its limitations and to justify the need for an orthogonalization step in our application. We then demonstrate that the proposed method maintains or improves solution accuracy on the benchmark one-dimensional steady convection-diffusion equation in the regime with stronger advection, while achieving substantial gains in training efficiency and robustness compared to standard dense implementations.

2605.07266 2026-05-11 cs.IT cs.LG math.IT

How Big Should a Wireless Foundation Model Be?

Wei-Lun Cheng, Wanjiun Liao

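AI summary: This paper asks how large a wireless foundation model should be and identifies the nonlinear manifold dimension of the channel (dNL) as the core bottleneck, a physical constraint that sets the ceiling on performance gains. Gains slow markedly once the model exceeds roughly 30 million parameters, whereas with adaptation methods such as test-time training (TTT), a small model with one-eighth the parameters of a large one still achieves better communication performance. The results indicate that wireless AI design should be driven by channel geometry rather than model size.

Abstract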

Wireless foundation models are rapidly emerging as a key enabler of AI-native communication systems, yet a fundamental question remains unanswered: how large should these models be? We present a principled, physics-grounded answer, showing that the intrinsic dimensionality (dNL, the nonlinear manifold dimension of the channel) acts as the fundamental bottleneck, defining the scaling ceiling once a data-sufficient regime is reached. This dimensionality is not a design choice but a physical constraint: Maxwell's equations, finite scatterers, and antenna aperture inherently constrain wireless propagation environments to a limited number of degrees of freedom -- spanning 5-35 across both real-world OTA measurements and 3GPP-standardized channel models we evaluate -- orders of magnitude below the ~1,000-dimensional semantic space of language. As a consequence, we propose a scaling framework for wireless AI: taking NTN satellite channels as a representative case (dNL ~= 14), scaling gains diminish rapidly beyond ~30 million parameters, entering a stochastic asymptote above 70M where a further 1.6x increase (96M->150M) yields only 0.52 dB. Beyond this ceiling, inference-time adaptation via pilot-aided test-time training (TTT) is far more effective: a compact 12M-parameter model surpasses a static 96M model by 9.9 dB (NMSE, SNR = 20 dB) / 7.6 dB (MCM, SNR = 10 dB) at one-eighth the parameters. With dNL distributions validated across real-world indoor massive MIMO measurements, our scaling laws and TTT gains are demonstrated through NTN satellite simulations, reframing wireless AI design: channel geometry -- not model size -- fundamentally governs the scaling laws of physical-layer wireless AI.

2605.07252 2026-05-11 cs.GR cs.CV cs.MM

PersonaGest: Personalized Co-Speech Gesture Generation with Semantic-Guided Hierarchical Motion Representation

Junchuan Zhao, Qifan Liang, Ye Wang

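AI summary: PersonaGest is a two-stage framework for personalized co-speech gesture generation that aims to produce realistic body movements that are semantically consistent with speech and match a user-specified style. It uses a semantic-guided residual vector-quantized variational autoencoder (RVQ-VAE) to separate motion content from gestural style, and strengthens the content-style disentanglement with a semantic-aware motion codebook and contrastive learning. In the second stage, a Masked Generative Transformer generates content and Style Residual Transformers provide style control. Experiments show state-of-the-art results on objective metrics and in perceptual user studies.

Comments: 26 pages, 10 figures, 12 tables

Abstract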

Co-speech gesture generation aims to synthesize realistic body movements that are semantically coherent with speech and faithful to a user-specified gestural style. Existing VQ-VAE based co-speech gesture generation methods improve generation quality but fail to encode semantic structure into the motion representation or explicitly disentangle content from style, limiting both semantic coherence and personalization fidelity. We present PersonaGest, a two-stage framework addressing both limitations. In the first stage, a semantic-guided RVQ-VAE disentangles motion content and gestural style within the residual quantization structure, where a Semantic-Aware Motion Codebook (SMoC) organizes the content codebook by gesture semantics and contrastive learning further enforces content-style separation. In the second stage, a Masked Generative Transformer generates content tokens via a semantic-aware re-masking strategy, followed by a cascade of Style Residual Transformers conditioned on a reference motion prompt for style control. Extensive experiments demonstrate state-of-the-art performance on objective metrics and perceptual user studies, with strong style consistency to the reference prompt. Our project page with demo videos is available at https://danny-nus.github.io/PersonaGest/

2605.07158 2026-05-11 cs.IR cs.CL cs.LG

Topic Is Not Agenda: A Citation-Community Audit of Text Embeddings

Junseon Yoo

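AI summary: This study examines how well text embeddings serve scientific literature retrieval and finds that cosine-similarity retrieval is markedly deficient at identifying papers that share a research agenda. Using an augmented citation graph over 3.58 million scientific papers and a hierarchical community partition, it finds that mainstream embedding models perform reasonably at the sub-field level, but the match rate drops sharply at the research-agenda level: only 15-21% of retrieved results share the query's agenda. Further experiments show that a simple citation-count rerank clearly outperforms embedding-based retrieval on agenda matching, exposing the limits of current embedding models for scientific retrieval.

Comments: 16 pages, 4 figures, 4 tables

Abstract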

Vector search and retrieval-augmented generation (RAG) rest on the assumption that cosine similarity between text embeddings reflects conceptual relatedness. We measure where this assumption breaks. We build an augmented citation graph over 3.58M scientific papers and partition it via Leiden CPM at two granularities: sub-field (L1) and research-agenda (L2, hierarchical inside each L1). Four state-of-the-art embeddings (Gemini, Qwen3-8B, Qwen3-0.6B, SPECTER2) clear the L1 bar reasonably (45-52% top-10 same-rate) but stop working at L2: only 15-21% of top-10 neighbors share the query's research agenda. In absolute terms, 8 of every 10 retrieved papers are off-agenda. The failure is universal across eight scientific domains and all four models; SPECTER2, despite its citation-based contrastive training, is the weakest. As a diagnostic probe, we test whether the same augmented graph also functions as a retrieval signal: a deliberately simple citation-count rerank reaches 57.7% top-1 L2 on top of LLM-expanded Boolean retrieval and 59.6% on top of plain BM25, on 80 curated agenda queries -- about 9 points above the best cosine retriever (Gemini, 50.6%) and 20 points above BM25 alone (39.3%). The probe isolates a slice of the agenda-matching signal the graph carries but the embeddings miss, connecting recent theoretical limits on single-vector retrieval to a concrete failure mode of scientific RAG.
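
The "top-10 same-rate" figure can be read as the fraction of a query's nearest neighbors, by cosine similarity, that fall in the query's own citation community. The sketch below is one plausible way to compute such a rate on toy data; the function names, the averaging over all queries, and the 2-D vectors are assumptions for illustration, not the paper's evaluation code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_same_rate(embeddings, communities, k):
    """Share of top-k cosine neighbors that have the query's community label."""
    n = len(embeddings)
    hits = 0
    total = 0
    for q in range(n):
        others = [i for i in range(n) if i != q]
        # Rank every other item by similarity to the query, best first.
        others.sort(key=lambda i: cosine(embeddings[q], embeddings[i]), reverse=True)
        neighbors = others[:k]
        hits += sum(communities[i] == communities[q] for i in neighbors)
        total += len(neighbors)
    return hits / total

# Hypothetical toy data: two tight pairs of vectors.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = top_k_same_rate(vectors, ["a", "a", "b", "b"], k=1)
misaligned = top_k_same_rate(vectors, ["a", "b", "a", "b"], k=1)
```

When community labels follow the embedding geometry (`aligned`), the rate is 1.0; when they cut across it (`misaligned`), the rate is 0.0, which mirrors the gap the abstract reports between the L1 and L2 partitions.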

2605.07145 2026-05-11 cond-mat.mtrl-sci cs.CV

Fine-tuning a vision-language model for fracture-surface morphology recognition

Quanliang Liu, Jungtaek Kim, Kangwook Lee, Hyunseok Oh

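AI summary: This paper studies how fine-tuning a vision-language model (VLM) improves recognition of fracture-surface morphology. The authors fine-tune the open-source model Qwen3-VL-32B-Instruct on a curated dataset of 13,168 images mined from open-source literature, with morphology annotations generated by GPT-5.2-Reasoning and the data further enriched through manual collection and rotation-based augmentation. On 100 manually annotated test images, the specialist model reaches a precision of 0.92, well above several mainstream models, demonstrating the value of targeted data collection and fine-tuning for domain-specific features.

Abstract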

Vision-language models (VLMs) have shown strong potential for scientific image understanding, but general-purpose models often lack the domain-specific visual knowledge required for reliable materials characterization. In this work, we fine-tuned an open-source VLM (Qwen3-VL-32B-Instruct) for fracture-surface image analysis using a curated dataset of 13,168 open-source, literature-mined fracture-surface images. Morphology annotations were generated by GPT-5.2-Reasoning (high) from both the images and relevant excerpts of their source papers, and the dataset was further enriched with targeted manual collection and rotation-based augmentation. The resulting specialist model outperforms flagship proprietary multimodal models on a benchmark of 100 manually annotated images. It achieves a precision of 0.92, compared to 0.35 for the base Qwen3-VL-32B-Instruct, 0.58 for GPT-5.5-Reasoning (high), and 0.78 for Gemini 3.1 Pro-Reasoning (high). Dataset ablations show that manual collection of rare-feature images and augmentation via image rotation are both beneficial to improve recognition of less common fracture morphology features. We further discuss integrated use of the fine-tuned model with proprietary models to combine fracture-specific visual accuracy with broader multimodal reasoning for autonomous fractography. Although focused on fracture-surface images, this work demonstrates how VLMs can be adapted through targeted collection and fine-tuning on novel feature images to recognize those features and support downstream decision-making in autonomous microscopy workflows.

2605.07129 2026-05-11 cs.IR cs.AI cs.LG

RRCM: Ranking-Driven Retrieval over Collaborative and Meta Memories for LLM Recommendation

Shijun Li, Wooseong Yang, Yu Wang, Tianxin Wei, Joydeep Ghosh

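AI summary: This paper proposes RRCM, a ranking-driven retrieval-and-reasoning framework for recommender systems based on large language models (LLMs). The method acquires heterogeneous evidence flexibly from collaborative and metadata memories represented in natural language, avoiding the limitations of fixed context construction strategies and handcrafted injection. RRCM optimizes its retrieval policy with a ranking reward based on final recommendation quality, and experiments show that it significantly outperforms traditional methods and existing LLM-based recommendation models.

Abstract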

Large Language Models (LLMs) have emerged as a promising paradigm for next-generation recommender systems, offering strong semantic understanding and natural-language reasoning abilities. Despite recent progress, current LLM-based recommenders still face key challenges in constructing decision-relevant contexts from heterogeneous evidence. First, existing methods often rely on fixed context construction strategies: collaborative behavioral evidence and item-side metadata are typically incorporated through predefined prompts, static retrieval pipelines, or handcrafted injection mechanisms, making it difficult to determine what information is truly beneficial for each instance. Second, heterogeneous evidence introduces a severe context-efficiency bottleneck. Rich metadata and collaborative interaction records can quickly overwhelm the context window, while aggressive compression or heuristic filtering may discard fine-grained evidence critical for accurate recommendation. To address these challenges, we propose RRCM, a ranking-driven retrieval-and-reasoning framework over collaborative and metadata memories for LLM-based agentic recommendation. RRCM starts from a lightweight user-history context and learns whether to recommend directly, retrieve collaborative evidence, retrieve item metadata, or interleave both through reasoning. Both memories are represented in natural language and accessed through a unified retrieval interface, enabling flexible evidence acquisition without handcrafted CF injection or fixed retrieval rules. We optimize this memory-reading policy with an outcome-only ranking reward, instantiated using group relative policy optimization, so that retrieval decisions are directly driven by final top-k recommendation quality. Extensive experiments show that RRCM significantly outperforms traditional baselines and diverse LLM-based recommendation approaches.

2605.07125 2026-05-11 cs.IR cs.AI

An Embarrassingly Simple Graph Heuristic Reveals Shortcut-Solvable Benchmarks for Sequential Recommendation

Haoyu Han, Li Ma, Hanbing Wang, Bingheng Li, Daochen Zha, Chun How Tan, Huiji Gao, Xin Liu, Stephanie Moyerman, Sanjeev Katariya, Hui Liu, Jiliang Tang

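AI summary: This paper asks whether sequential recommendation benchmarks really require the sophisticated modeling capabilities of modern generative recommenders. It proposes an extremely simple graph heuristic that recommends candidates using only the user's most recently interacted items and an item-transition graph, and that matches or outperforms many modern baselines without a sequence encoder or generative objective. The experiments indicate that existing benchmark datasets may contain "shortcut" structures that simple methods can exploit, weakening the apparent advantage of complex models, so datasets should be chosen more carefully and accompanied by diagnostic analysis when benchmarks are used to evaluate recommendation models.

Abstract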

Sequential recommendation has increasingly shifted toward generative recommenders that combine sequential patterns with semantic item information. Yet these methods are often evaluated on a small set of widely used benchmarks, raising a key question: do these benchmarks actually require the advanced modeling capabilities that modern generative recommenders claim to provide? We conduct a benchmark audit with an intentionally simple graph heuristic. Starting from only the last one or two interacted items, it retrieves candidates from a few-hop item-transition graph and ranks them by item-feature similarity. Despite using no sequence encoder, generative objective, or training, this heuristic matches or outperforms many modern baselines, with relative NDCG@10 improvements of 38.10% and 44.18% over the best competing baseline on Amazon Review Sports and CDs. We show that this behavior reflects shortcut solvability rather than an artifact of one heuristic. We identify three shortcut structures that can make next-item prediction easier than expected: low-branching local transitions, feature-smooth transitions, and limited dependence on long user histories. These shortcuts need not appear together; even one or two strong signals can make simple local retrieval highly competitive, while weakening them makes the benefits of more sophisticated models clearer. Across 14 datasets, model rankings vary substantially with dataset properties, yet the heuristic remains competitive on 10 of them. Our findings suggest that strong performance on standard benchmarks does not always demonstrate advanced sequential, semantic, or generative modeling ability. We call for more careful dataset selection and dataset-level diagnostic analysis when using benchmarks to support claims about new recommendation models.
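
The heuristic described above (start from the last interacted item, retrieve candidates from a few-hop item-transition graph, rank by item-feature similarity) can be sketched as follows. This is a simplified reading under stated assumptions: it uses only the single last item, a directed graph built from consecutive pairs, and cosine similarity on hypothetical feature vectors; all function names and data are illustrative and not the authors' implementation.

```python
import math
from collections import defaultdict, deque

def build_transition_graph(sequences):
    """Directed graph with an edge a -> b whenever b directly follows a."""
    graph = defaultdict(set)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            graph[a].add(b)
    return graph

def candidates_within_hops(graph, start, max_hops):
    """Breadth-first search for items reachable from `start` in <= max_hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                frontier.append((nxt, depth + 1))
    return found

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def recommend(graph, features, history, max_hops=2, top_n=10):
    """Rank few-hop candidates by feature similarity to the last item."""
    last = history[-1]
    cands = candidates_within_hops(graph, last, max_hops)
    cands.sort(key=lambda item: cosine(features[item], features[last]), reverse=True)
    return cands[:top_n]

# Hypothetical interaction logs and item features.
logs = [["a", "b", "c"], ["a", "d"], ["b", "e"]]
feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0],
         "d": [1.0, 0.1], "e": [0.5, 1.0]}
graph = build_transition_graph(logs)
one_hop = recommend(graph, feats, ["a"], max_hops=1)
two_hop = recommend(graph, feats, ["a"], max_hops=2)
```

Nothing here is trained, which is the point of the audit: if a lookup this simple is competitive on a benchmark, the benchmark may not be exercising sequence modeling.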

2605.07119 2026-05-11 stat.ML cs.LG

Classification Fields: Arbitrarily Fine Recursive Hierarchical Clustering From Few Examples

Yicen Li, Ruiyang Hong, Anastasis Kratsios, Haitz Sáez de Ocáriz Borde, Paul D. McNicholas

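AI summary: This paper introduces classification fields, infinite-depth hierarchical cluster structures, and learns from a few examples to recursively generate arbitrarily fine hierarchical clusterings. A local parent-to-child refinement rule generates cluster centres, Voronoi cells, and a directed acyclic graph encoding the hierarchy, all to unbounded depth. The paper proves exponential convergence of the learned model in the completed cell metric and validates experimentally that it generates hierarchical structure while preserving geometric properties and path metrics.

Abstract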

Classical clustering methods usually return either a finite partition of the observed data or a finite dendrogram over it. This finite-sample view is inadequate when the hierarchy of interest is a recursive geometric object with fine-scale refinements that continue beyond the levels directly observed. We introduce classification fields: infinite-depth hierarchical cluster structures on $\mathbb{R}^d$ generated by a local parent-to-child refinement rule. A classification field generator maps each parent centre to an ordered, bounded, and separated tuple of child residuals. Together with a root and a scale factor, this rule recursively generates cluster centres, Voronoi cells, and a metric DAG encoding the hierarchy. Given only a finite prefix of such a hierarchy, we learn a classification field predictor that approximates the generator and can be rolled out to unseen depths. We prove exponential truncation convergence in the completed cell metric and ReLU realizability with width $O(\varepsilon^{-\gamma})$ and depth $\widetilde O(\varepsilon^{-3\gamma/2})$, where $\gamma = \log K/(-\log s)$, up to finite-window aspect-ratio factors. The approximation holds at the level of the induced compact metric structures, measured in the completed cell-metric Hausdorff distance. Experimental validation on matched CFG-generated hierarchies, IFS fractals, and image-induced recursive clustering hierarchies shows that learned predictors preserve ordered child slots, unordered geometry, and hierarchy-level path metrics under recursive rollout. These results support the claim that finite hierarchical observations can reveal local refinement rules capable of generating substantially deeper classification fields.

2605.07100 2026-05-11 stat.ML cs.LG

TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models

Zhenhan Fang, Aixin Tan, Jian Huang

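AI summary: TRACE is a conformal prediction framework based on diffusion and flow matching models that aims to construct valid and informative prediction regions for multi-dimensional outputs. It defines the nonconformity score through transport alignment, avoiding explicit likelihood evaluation and the restriction to invertible transformations: the fit between a candidate output and the generative dynamics is measured only by denoising or velocity-matching errors along stochastic transport trajectories. Experiments show that TRACE attains marginal coverage while adapting to multimodal and non-convex conditional distributions.

Comments: 22 pages, 5 figures and 5 tables

Abstract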

Constructing valid and informative conformal prediction regions for multi-dimensional outputs remains a fundamental challenge. While conformal prediction provides finite-sample, distribution-free coverage guarantees, its practical performance critically depends on the choice of nonconformity score. Existing approaches often rely on restrictive geometric assumptions or require explicit likelihood evaluation and invertible transformations, limiting their applicability in complex generative settings. In this work, we introduce TRACE (TRansport Alignment Conformal Estimation), a conformal prediction framework that defines nonconformity through transport alignment in diffusion and flow matching models. Rather than evaluating likelihoods, we measure how well a candidate output aligns with the learned generative dynamics by averaging denoising or velocity-matching errors along stochastic transport trajectories. The resulting transport-based scores are scalar-valued and can be calibrated using split conformal prediction, yielding valid marginal coverage under exchangeability. We further analyze the statistical properties of the proposed scores and their sensitivity to computational budget. Experiments on synthetic and real datasets demonstrate valid coverage and show that the resulting regions adapt naturally to multimodal and non-convex conditional distributions.
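
The calibration step the abstract mentions, split conformal prediction on scalar scores, is standard and can be sketched independently of how the scores are produced. The code below assumes the scores are already computed; TRACE's transport-based score is not implemented here, and `score_fn` in the example is a hypothetical placeholder.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold for target miscoverage level alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest calibration score;
    if that rank exceeds n, the threshold is infinite (region = everything).
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(cal_scores)[rank - 1]

def prediction_region(candidates, score_fn, threshold):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return [y for y in candidates if score_fn(y) <= threshold]

# Hypothetical calibration scores (any scalar nonconformity score works).
cal = [7.0, 2.0, 9.0, 4.0, 1.0, 10.0, 3.0, 8.0, 5.0, 6.0]
median_level = conformal_threshold(cal, alpha=0.5)
strict_level = conformal_threshold(cal, alpha=0.1)
too_few = conformal_threshold(cal[:5], alpha=0.1)

# Placeholder score: distance of a candidate from 1, scaled by 4.
region = prediction_region([0, 1, 2, 3], lambda y: abs(y - 1) * 4, median_level)
```

Under exchangeability of calibration and test points, a region built this way covers the true output with probability at least 1 - alpha, regardless of which score function is plugged in; the score only affects how tight the region is.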

2605.07097 2026-05-11 stat.ML cs.LG cs.NE math.LO math.ST stat.TH

Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity

Anastasis Kratsios, Gregory Cousins, Haitz Sáez de Ocáriz Borde, Bum Jun Kim, Simone Brugiapaglia

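AI summary: This paper proves that a broad class of feedforward neural networks has finite sample complexity in the PAC learning model: any fixed finite feedforward architecture whose layers are definable in an o-minimal structure has finite sample complexity, even with unbounded parameters. The result covers modern non-recurrent architectures such as standard fixed-size MLPs, CNNs, GNNs, and Transformers with fixed sequence length, together with the operations and layers commonly used in them. The paper argues that distribution-free learnability of modern non-recurrent networks is not an exceptional property tied to particular activations or architecture-specific VC-dimension arguments, but a consequence of "tame" feedforward computation.

Abstract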

We show that, in a precise sense, a broad class of feedforward neural networks learn (have finite sample complexity) in the PAC model: every fixed finite feedforward architecture whose layers are definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting, even with unbounded parameters. This covers standard fixed-size MLPs, CNNs, GNNs, and transformers with fixed sequence length, together with the operations and layers typically used in such architectures, including linear projections, residual connections, attention mechanisms, pooling layers, normalization layers, and admissible positional encodings. Hence, distribution-free learnability for modern non-recurrent architectures is not an exceptional property of particular activations or architecture-specific VC arguments, but a consequence of tame feedforward computation. Our results reposition finite-sample PAC learnability as a baseline rather than a differentiator: they shift the focus of architectural comparison toward inductive biases, symmetries and geometric priors, scalability, and optimization behaviour.

2605.07065 2026-05-11 stat.ML cs.AI cs.LG econ.EM

Causal EpiNets: Precision-corrected Bounds on Individual Treatment Effects using Epistemic Neural Networks

Gandharv Patil, Keyi Tang, Raquel Aoki, Leo Guelman

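AI summary: This study addresses the identification of individual treatment effects and proposes Causal EpiNets, a method based on Epistemic Neural Networks for estimating individual-level causal effects more precisely in finite samples. By designing a neural architecture that satisfies structural constraints and combining it with precision-corrected intersection-bound inference, the method resolves two defects of conventional estimators: violation of structural probability constraints and extremum bias. Experiments show that it maintains nominal coverage and constraint validity in high-dimensional settings, outperforming existing estimators.

Abstract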

Individual treatment effects are not point-identified from data. The Probability of Necessity and Sufficiency (PNS) circumvents this limitation by characterizing individual-level causality through intersection bounds derived from combined experimental and observational data. In finite samples, however, standard plug-in estimators systematically fail: they violate structural probability constraints and suffer from extremum bias induced by max-min operators, yielding spuriously narrow intervals. We propose a neural framework for finite-sample PNS estimation that resolves both pathologies. We introduce an anchored neural architecture that guarantees structural constraint satisfaction by construction. To correct extremum bias, we employ precision-corrected intersection-bound inference, leveraging Epistemic Neural Networks for scalable, high-dimensional uncertainty quantification. Empirical evaluations confirm that this approach maintains nominal coverage and exact constraint validity in high-dimensional regimes where standard estimators systematically undercover.
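
For orientation, the classical bounds on PNS from experimental data alone take a max on the lower side and a min on the upper side, which is where the extremum bias of plug-in estimates comes from. The sketch below implements only that textbook, experimental-data version; the tighter bounds that also use observational data, and the paper's anchored architecture and precision correction, are not shown. The function and argument names are hypothetical.

```python
def pns_bounds_experimental(p_y_do_x, p_y_do_not_x):
    """Classical PNS bounds from two interventional probabilities.

    p_y_do_x:     P(Y = 1 | do(X = 1))
    p_y_do_not_x: P(Y = 1 | do(X = 0))
    Lower bound: max(0, P(y | do x) - P(y | do x'))
    Upper bound: min(P(y | do x), 1 - P(y | do x'))
    """
    lower = max(0.0, p_y_do_x - p_y_do_not_x)
    upper = min(p_y_do_x, 1.0 - p_y_do_not_x)
    return lower, upper

# Hypothetical interventional probabilities.
bounds = pns_bounds_experimental(0.75, 0.25)
harmful = pns_bounds_experimental(0.25, 0.75)
```

Feeding noisy estimates into `max` and `min` tends to push the lower bound up and the upper bound down on average, which is one way to see why plug-in intervals come out too narrow in finite samples.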

2605.07062 2026-05-11 cs.SE cs.AI

From Assistance to Agency: Rethinking Autonomy and Control in CI/CD Pipelines

Marcus Emmanuel Barnes, Taher A. Ghaleb, Safwat Hassan

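AI summary: This paper examines the questions of autonomy and control raised as AI agents in Continuous Integration and Continuous Deployment (CI/CD) workflows move from an assistive role to autonomous decision-making. It distinguishes data-plane authority from control-plane authority and observes that current systems operate mainly under bounded autonomy, with safety resting on external governance mechanisms rather than guarantees from the agents themselves. It identifies three recurring patterns in current designs and proposes a research agenda centered on control-plane safety and governance mechanisms, toward more reliable and efficient autonomous CI/CD systems.

Comments: Accepted to the 3rd ACM International Conference on AI-Powered Software (AIware 2026), Main Track, Montreal, Canada, July 6-7, 2026. 5 pages

Abstract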

AI agents are assuming active roles in Continuous Integration and Continuous Deployment (CI/CD) workflows, yet the research community lacks a shared vocabulary for describing what it means for CI/CD to be agentic, how much decision authority is delegated, and where control should reside. This paper presents a vision of agentic CI/CD in which the central challenge is not improving task performance but designing authority transfer, defined as the delegation of operational decisions from human-controlled pipelines to agent systems under specified constraints and recourse mechanisms. To structure this argument, we introduce a distinction between data-plane authority (localized interventions such as patch generation and test reruns) and control-plane authority (modifications to pipeline configuration, deployment policies, and approval gates). Drawing on research prototypes and industrial platforms, we show that current systems operate mainly at the data plane under bounded autonomy, with safety achieved through surrounding governance infrastructure rather than intrinsic agent guarantees. We identify three recurring patterns: constrained autonomy as the dominant design, external governance as the primary safety mechanism, and a widening gap between deployment momentum and evaluation methodology. We propose a research agenda in which control-plane safety and governance mechanisms represent the most urgent open problem, followed by formalization of autonomy boundaries, evaluation frameworks, and human--agent coordination.

2605.07052 2026-05-11 eess.SY cs.LG cs.SY

A Behavioral Framework for Data-Driven Modeling of Nonlinear Systems in Vector-Valued Reproducing Kernel Hilbert Spaces

Boya Hou, Maxim Raginsky

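AI summary: This paper generalizes Jan Willems' behavioral approach to a class of discrete-time nonlinear systems in a vector-valued reproducing kernel Hilbert space, covering Volterra systems, autoregressive systems, and Hammerstein-type state-space systems. It proposes a data-driven modeling framework in which simulation or control is carried out without explicit system identification, linking the approach to two techniques, minimum-norm interpolation and subspace identification, and providing a new theoretical foundation and practical tools for modeling nonlinear systems.

Comments: 12 pages

Abstract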

We generalize Jan Willems' behavioral approach to a class of discrete-time nonlinear systems in a vector-valued reproducing kernel Hilbert space (RKHS). Apart from linear time-invariant systems, this class covers nonlinear systems modeled by Volterra series and their autoregressive variants, as well as systems admitting Hammerstein-type state-space realizations. We apply the proposed framework to the problem of data-driven modeling of such systems, i.e., when simulation or control objectives for an unknown system are carried out without an explicit system identification step. To that end, we link the behavioral approach to two data-driven modeling methods in a vector-valued RKHS: (1) minimum-norm interpolation and (2) subspace identification.

2605.07046 2026-05-11 stat.ML cs.AI cs.LG

An Interpretable and Scalable Framework for Evaluating Large Language Models

Xinhao Qu, Qiang Heng, Hao Zeng, Xiaoqian Liu

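AI summary: This paper proposes an interpretable and scalable framework for evaluating large language models (LLMs), addressing the way conventional benchmarking overlooks the stochasticity of model outputs and the heterogeneity of items. Based on the majorization-minimization principle, the method recasts evaluation as a sequence of constrained matrix factorization subproblems, giving stable and efficient parameter estimation with theoretical guarantees of identifiability and convergence. Experiments on synthetic and real-world datasets show better scalability and interpretability, with faster runtimes and comparable or higher estimation accuracy than existing methods.

Abstract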

Evaluation of large language models (LLMs) is increasingly critical, yet standard benchmarking methods rely on average accuracy, overlooking both the inherent stochasticity of LLM outputs and the heterogeneity of benchmark items. Item Response Theory (IRT) offers a principled framework for modeling latent model abilities and item characteristics, but conventional methods are computationally expensive and numerically unstable, limiting large-scale implementations. To address these challenges, we propose an interpretable and scalable framework for LLM evaluation based on the majorization-minimization principle. Our approach reformulates the problem as a sequence of constrained matrix factorization subproblems, enabling stable and efficient parameter estimation with theoretical guarantees for identifiability and convergence. Experiments on synthetic and real-world datasets, including MATH-500 and six Open LLM Leaderboard benchmarks, demonstrate that our method achieves superior scalability and interpretability. It delivers orders-of-magnitude speedups over competing methods while maintaining comparable or even higher estimation accuracy. Our results align with established scaling laws and offer insights into item difficulty and discrimination, informing more principled benchmark design.
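
The "item difficulty and discrimination" mentioned above are the parameters of the textbook two-parameter logistic (2PL) IRT model, sketched below for orientation. Whether the paper uses exactly this parameterization is an assumption, and its majorization-minimization estimation procedure is not shown; `p_correct` and its arguments are illustrative names.

```python
import math

def p_correct(ability, discrimination, difficulty):
    """2PL IRT: probability that a model with `ability` answers an item correctly.

    discrimination controls how sharply the probability rises with ability;
    difficulty is the ability level at which the probability equals 0.5.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# Hypothetical values: an LLM with ability 1.0 on two items of equal difficulty.
at_difficulty = p_correct(0.0, 1.0, 0.0)
sharp_item = p_correct(1.0, 2.0, 0.0)
flat_item = p_correct(1.0, 0.5, 0.0)
```

Unlike average accuracy, this model lets two benchmarks with the same mean score imply different abilities, because hard or highly discriminating items carry more information.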

2605.07034 2026-05-11 cs.CR cs.LG

Beyond the Wrapper: Identifying Artifact Reliance in Static Malware Classifiers using TRUSTEE

Riyazuddin Mohammed, Lan Zhang

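AI summary: Modern cybersecurity relies heavily on static machine-learning-based malware classifiers, but these are susceptible to non-semantic modifications such as file packing, which reduces their reliability. This paper proposes a framework built on the interpretability tool TRUSTEE to identify classifiers' reliance on such irrelevant features (for example, packing artifacts and PE metadata) and confirms the phenomenon through manual analysis. The study finds that classifiers often mistake features such as packing for malicious behavior, which shows that they are highly sensitive to dataset composition. The framework offers guidance for building more robust and semantically meaningful malware detection models.

Abstract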

Modern cybersecurity relies heavily on static machine-learning-based malware classifiers. However, transformations such as packing and other non-semantic modifications applied to executable files limit their reliability. Malware classifiers often learn these unnecessary artifacts rather than the true binary behavior because of the high association between maliciousness and packing. Moreover, these malware classifiers are black boxes, making it difficult to understand what they learn. To address this issue, we propose a two-part framework using the post-hoc interpretability XAI tool TRUSTEE, followed by a manual analysis of the top features. We conducted several controlled experiments by varying the dataset composition ratios to understand their impact on the results. The top-ranked features across all experiments, identified by TRUSTEE, were predominantly packing artifacts, portable executable (PE) metadata, and n-grams at the string level, rather than malicious semantics. These results suggest that these malware classifiers are highly sensitive to dataset composition and can misinterpret packing as malicious behavior. Our proposed framework allows for the reproducible diagnosis of such biases and forms a guideline for building more robust and semantically meaningful malware detection models.

2605.07029 2026-05-11 stat.ML cs.AI cs.LG stat.ME

BGM-IV: an AI-powered Bayesian generative modeling approach for instrumental variable analysis

Guyue Luo, Qiao Liu

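AI summary: This paper proposes BGM-IV, a Bayesian approach based on generative modeling for estimating causal effects in nonlinear instrumental-variable (IV) regression. By constructing a causally structured latent space, it recasts nonlinear IV regression as posterior inference, which handles high-dimensional covariates and complex nonlinear relations more effectively. BGM-IV addresses endogeneity by introducing an instrument-induced pseudo-likelihood, and across several benchmark datasets it remains competitive in low-dimensional settings and performs best with high-dimensional covariates.

Abstract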

Instrumental-variable (IV) regression enables causal estimation under endogeneity, but modern IV problems often involve nonlinear structural effects and high-dimensional covariates. Existing nonlinear IV methods directly learn the causal relation in observed feature space or rely on learned representations within two-stage or moment-based procedures, which can struggle when the causal information is embedded in a high-dimensional representation. We propose BGM-IV, a latent Bayesian generative modeling approach that reframes nonlinear IV regression as posterior inference in a causally structured latent space. BGM-IV infers latent components that separately capture shared confounding structure, outcome-specific variation, treatment-specific variation, and covariate-only nuisance information. To account for endogeneity, BGM-IV replaces the confounded outcome likelihood with an IV-integrated pseudo-likelihood that averages over instrument-induced treatment values within the latent model. Across various benchmark datasets, BGM-IV remains competitive in the classical low-dimensional regime and performs best in high-dimensional covariate regimes. Together, these results show that structured latent generative modeling provides a principled and effective strategy for nonlinear IV estimation with rich covariates. The code of BGM-IV is available at https://github.com/liuq-lab/BGM-IV.

2605.07026 2026-05-11 q-bio.NC cs.AI cs.LG

Learning Cross-Atlas Consistent Brain Disorder Representations via Disentangled Multi-Atlas Functional Connectivity Learning

Minheng Chen, Chao Cao, Jing Zhang, Tianming Liu, Dajiang Zhu

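AI summary This study addresses the inconsistency of functional connectivity (FC) representations across brain atlases and proposes Multi-Atlas Disentangled Connectivity LEarning (MADCLE). The method jointly encodes FC matrices derived from different atlases, learns atlas-wise disease-related representations, and encourages cross-atlas consistency through distributional alignment. Covariate supervision, atlas-specific reconstruction, and decorrelation constraints separate covariate-related and atlas-dependent residual factors, reducing the leakage of non-disease information into the disease embeddings. Experiments on the ADNI and ADHD-200 datasets suggest competitive or improved performance relative to existing single-atlas and multi-atlas methods.

Details
English abstract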

Functional connectivity (FC) derived from resting-state fMRI is widely used to characterize large-scale brain network alterations in neurological and psychiatric disorders. However, FC construction critically depends on the choice of brain atlas, and different parcellations may emphasize distinct organizational features, leading to heterogeneous and sometimes inconsistent representations. Existing multi-atlas approaches partially alleviate this issue but often fuse atlas-derived features or predictions at a relatively shallow level, while single-atlas disentanglement methods do not explicitly address cross-atlas heterogeneity. We propose Multi-Atlas Disentangled Connectivity LEarning (MADCLE), a multi-branch representation learning framework that jointly encodes FC matrices derived from different brain atlases. Rather than introducing a single explicitly shared latent variable across parcellations, MADCLE learns atlas-wise disease-related representations and encourages them to be cross-atlas consistent through distributional alignment. Meanwhile, covariate-related and atlas-dependent residual factors are modeled separately using covariate similarity supervision, atlas-specific reconstruction, and decorrelation constraints, thereby reducing the leakage of non-disease and parcellation-dependent information into the disease-related embeddings. Experiments on the ADNI and ADHD-200 datasets suggest that MADCLE achieves competitive or improved performance compared with single-atlas baselines, multi-atlas GNN/Transformer models, and recent multi-atlas consistency frameworks. These results support the potential value of structured disentanglement for FC-based disorder identification under heterogeneous parcellation schemes.

2605.06989 2026-05-11 stat.AP cs.AI cs.LG stat.ME

Drawing Lines in Psychological Space: What K-means Clustering Reveals in Simulated and Real Psychometric Data

Pedro Henrique Ramos Pinto, Maria Jullyanna Ferreira Marques, Luiz Carlos Serramo Lopez

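AI summary This study examines K-means clustering in psychometric data. Although the method is widely used to identify psychological subgroups and typologies, its classical formulation does not test whether such groups actually exist. Using controlled simulated datasets and the international SMARVUS psychometric dataset, the authors argue that K-means can produce stable, visually coherent clusters even in continuous Gaussian latent spaces with no true subgroup structure, so a clean clustering solution is not by itself evidence of latent psychological categories.

Comments Methodological study on K-means clustering in psychometric data using simulated and empirical datasets

Details
English abstract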

K-means clustering is widely used in psychological and psychometric research to identify profiles, subgroups, and potential typologies, yet its classical formulation does not test whether such groups exist as latent psychological categories. Instead, K-means partitions multidimensional space into regions around centroids, favoring compact, approximately spherical clusters defined by geometric distance. In this paper, we examine this limitation through a sequence of controlled simulated datasets. We then extend the analysis to the SMARVUS dataset, a large international psychometric dataset comprising survey responses from university students across 35 countries, to evaluate whether similar geometric partitioning patterns emerge in empirical psychological data. By contrasting simulated and empirical data, this paper argues that K-means can produce stable and visually coherent clustering solutions even in continuous Gaussian latent spaces without true subgroup structure.
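
The central claim can be illustrated with a small sketch: run Lloyd's algorithm on samples from a single Gaussian and observe that it still returns well-populated, ordered clusters. This is a one-dimensional toy, not the authors' simulation design. The function `kmeans_1d`, the quantile-based initialisation, and the sample size are assumptions.

```python
import random

def kmeans_1d(xs, k, iters=50):
    """Plain Lloyd's algorithm in one dimension with quantile-based initial centroids."""
    s = sorted(xs)
    centroids = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids, clusters

rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # one continuous population, no subgroups
centroids, clusters = kmeans_1d(scores, k=3)
sizes = [len(c) for c in clusters]
print("centroids:", [round(c, 2) for c in centroids], "sizes:", sizes)
```

The algorithm reports "low", "middle", and "high" groups even though the data were drawn from one continuous distribution, which is the geometric partitioning behaviour the abstract describes.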

2605.06988 2026-05-11 cs.MA cs.IT cs.RO math.IT

The Cost of Consensus: Malignant Epistemic Herding and Adaptive Gating in Distributed Multi-Agent Search

David Farr, Iain Cruickshank, Kate Starbird, Jevin West

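AI summary This paper studies the cost of reaching consensus in distributed multi-agent systems, focusing on how agents in communication-constrained settings form compatible beliefs about the environment through coordination and information sharing. It introduces the notion of "epistemic alignment", examines how communication frequency and content jointly shape the collective belief state, and shows that the difference between converging on correct and on wrong hypotheses cannot be detected from conventional coordination metrics alone, a finding relevant to the design of efficient and robust distributed search and collaboration strategies.

Details
English abstract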

Distributed agents in real-world settings frequently must coordinate under uncertainty with only partial observations. Coordination is necessary to share beliefs to aid in task completion, but communication costs bandwidth, introduces latency, and, if done poorly, can degrade collective reasoning. This tension is especially acute in bandwidth-constrained deployments such as distributed sensing networks, autonomous reconnaissance, and collaborative cyber defense, where excessive transmission carries direct operational costs. Existing work has focused on multi-agent exploration and communication strategies, but not on how communication frequency and content jointly shape the collective belief state. Central to this challenge is the degree to which agents maintain compatible internal beliefs about the environment, a property we term epistemic alignment. When agents share beliefs effectively, they converge on correct hypotheses; when communication is poorly designed, agents may converge confidently on wrong ones. We formalize this distinction and show it is not detectable from coordination metrics alone, such as Jensen-Shannon divergence or rate to consensus.
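
A short sketch shows why an agreement metric such as Jensen-Shannon divergence cannot separate correct consensus from confident error. The belief vectors and the helper `belief_error` are hypothetical and are not taken from the paper.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence in bits, which lies between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def belief_error(p, true_index):
    """Probability mass that a belief places on wrong hypotheses."""
    return 1.0 - p[true_index]

true_index = 0  # hypothesis 0 is assumed to be the correct one
aligned_correct = ([0.9, 0.05, 0.05], [0.9, 0.05, 0.05])
aligned_wrong = ([0.05, 0.9, 0.05], [0.05, 0.9, 0.05])

for name, (a, b) in (("correct", aligned_correct), ("wrong", aligned_wrong)):
    print(name, "JSD =", jsd(a, b), "error =", round(belief_error(a, true_index), 2))
```

Both pairs of agents agree perfectly, so the divergence is zero in each case, yet the second pair places almost all of its belief on a wrong hypothesis. Distinguishing the two requires a measure that refers to the ground truth, not only to inter-agent agreement.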

2605.06981 2026-05-11 cs.IR cs.CL

Bridging Textual Profiles and Latent User Embeddings for Personalization

Zhaoxuan Tan, Xiang Zhai, Yan Zhu, Meng Jiang, Mohamed Hammad

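AI summary This paper studies how to combine interpretable textual user profiles with latent user embeddings to improve personalized recommendation. The authors propose BLUE, a reinforcement learning framework that aligns textual profiles generated by a language model with embedding-based recommendation objectives, improving recommendation performance while preserving interpretability. Experiments show that BLUE outperforms existing methods on Amazon Reviews 2023 and Google Local Reviews, with clear gains in cross-domain transfer and in personalized question answering.

Details
English abstract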

Personalized systems rely on user representations to connect behavioral history with downstream recommendation applications. Existing methods typically employ either supervised latent user embeddings, which are effective for retrieval but difficult to interpret, or textual user profiles, which are interpretable but challenging to optimize for downstream utility due to lack of direct supervision. To bridge this gap, we present BLUE, a reinforcement learning framework that unifies these two forms of user representation by aligning language-based user profiles with embedding-based recommendation objectives. Given a user interaction history, BLUE leverages a profiler Large Language Model (LLM) to generate textual profiles, while an embedding model provides reward signals. This encourages the resulting textual representations to move closer to positive items and farther from negative ones in the embedding space. We further introduce a text-space supervision signal based on next-item prediction, ensuring the learned profiles remain both semantically meaningful and highly effective for downstream retrieval. Experiments on Amazon Reviews 2023 and Google Local Reviews in zero-shot sequential recommendation settings demonstrate that BLUE consistently outperforms strong baselines under both frozen and trainable embedding conditions. Notably, BLUE achieves clear gains in cross-domain transfer, highlighting the strong generalization ability of the learned user profiles. Furthermore, these generated profiles provide superior personalized context for question answering compared to raw user histories or alternative profile optimization methods. Overall, these results show that BLUE provides an effective way to unify interpretable textual profiling with discriminative latent embeddings for personalization.

2605.06976 2026-05-11 stat.ML cs.LG stat.CO

A Differentiable Bayesian Relaxation for Latent Partial-Order Inference

Dongqing Li, Geoff K. Nicholls, Shiyi Sun, You Luo

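AI summary Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This paper proposes a differentiable Bayesian relaxation for inferring latent partial orders from such traces. The method replaces discontinuous precedence relations and binary frontier feasibility conditions with smooth surrogates, yielding a continuous posterior that supports gradient-based MCMC and variational inference, and it shows good inference accuracy and computational efficiency in experiments.

Details
English abstract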

Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This is especially common in agent and workflow traces, where observed order may reflect arbitrary linearization rather than true prerequisites. We introduce a differentiable relaxation for latent partial-order inference from such traces. Starting from a hard frontier-constrained model of noisy linear extensions, we replace discontinuous product-order precedence and binary frontier feasibility with smooth surrogates, yielding a continuous posterior that preserves closure-level partial-order semantics and supports gradient-based MCMC and variational inference. We prove soft transitivity, sharp-limit frontier recovery, and convergence to the hard likelihood. Experiments on synthetic data, records of social dominance relations, and cloud-agent traces show close posterior fidelity to hard MCMC on small instances and improved runtime-accuracy trade-offs on larger problems.

2605.06971 2026-05-11 eess.SP cs.AI cs.SY eess.SY

Decentralized Time-Varying Optimization for Streaming Data via Temporal Weighting

Muhammad Faraz Ul Abrar, Nicolò Michelusi, Erik G. Larsson

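AI summary This paper studies decentralized time-varying optimization for streaming data over a distributed network. It adopts a temporal weighting formulation in which the network tracks the minimizer of a time-varying objective built from all samples observed by all nodes so far. Analyzing decentralized gradient descent (DGD) under a limited communication and computation budget, the authors decompose the tracking error and specialize the analysis to uniform and exponentially discounted weights, showing how decentralization affects tracking performance. Numerical simulations support the theoretical results.

Details
English abstract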

Classical optimization theory largely focuses on fixed objective functions, whereas many modern learning systems operate in dynamic environments where data arrive sequentially and decisions must be updated continuously. In this work, we study optimization with streaming data over a distributed network of agents. We adopt a structured, weight-based formulation that explicitly captures the streaming-data origin of the time-varying objective: at each time step, every agent receives a new sample, and the network seeks to track the minimizer of a temporally weighted objective formed from all samples observed across the network so far. We focus on decentralized gradient descent (DGD) with a limited communication/computation budget, where at each time step, only a limited number of DGD iterations can be performed before the objective changes again. For strongly convex and smooth losses, we analyze the tracking error with respect to the time-varying minimizer through a fixed-point theory lens. Our analysis reveals that the tracking error decomposes into a fixed-point tracking term and a bias term induced by data heterogeneity across agents. We specialize the analysis to two natural weighting strategies: uniform weights, which treat all samples equally, and exponentially discounted weights, which geometrically decay the influence of older data. Under uniform weighting, DGD tracks the fixed point at a rate $\mathcal{O}(1/t)$, whereas discounted weighting yields a non-vanishing fixed-point tracking floor controlled by the discount factor. In both cases, decentralization induces an additional non-zero bias floor under a constant step size. We validate our theoretical findings through numerical simulations.
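
The setting can be sketched with scalar squared losses and uniform weights. Everything specific here is an assumption made for illustration, not the authors' experimental setup: a ring of five agents with mixing weights of 1/3, a step size of 0.5, one DGD iteration per time step, and the name `dgd_streaming`.

```python
import random

def dgd_streaming(num_agents=5, steps=500, step_size=0.5, iters_per_step=1, seed=0):
    """DGD on a ring, tracking the minimizer of a uniformly weighted streaming objective."""
    rng = random.Random(seed)
    x = [0.0] * num_agents      # each agent's current estimate
    sums = [0.0] * num_agents   # running sum of each agent's samples
    for t in range(1, steps + 1):
        # Every agent receives one new sample; agent i's data are centred at i (heterogeneous).
        for i in range(num_agents):
            sums[i] += i + rng.uniform(-1.0, 1.0)
        local_means = [s / t for s in sums]
        for _ in range(iters_per_step):
            # Consensus step: average with the two ring neighbours (x[-1] wraps around).
            mixed = [(x[i - 1] + x[i] + x[(i + 1) % num_agents]) / 3.0
                     for i in range(num_agents)]
            # Local gradient of 0.5 * mean_s (x - z_s)^2 is x minus the local sample mean.
            x = [mixed[i] - step_size * (x[i] - local_means[i]) for i in range(num_agents)]
    global_minimizer = sum(sums) / (num_agents * steps)
    return x, global_minimizer

estimates, target = dgd_streaming()
network_average = sum(estimates) / len(estimates)
print("target:", round(target, 3), "network average:", round(network_average, 3))
print("per-agent estimates:", [round(v, 3) for v in estimates])
```

In this toy problem the network average follows the moving minimizer closely, while individual agents stay spread around it. That spread is a simple instance of the heterogeneity-induced bias under a constant step size that the abstract mentions.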

2605.06965 2026-05-11 cs.CY cs.AI cs.HC

AI and Consciousness: Shifting Focus Towards Tractable Questions

Iulia-Maria Comsa

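AI summary As language-based AI systems become increasingly anthropomorphic, the question of whether they have subjective experience grows more pressing. The paper argues that the fundamental question of whether AI can be conscious is currently intractable in its direct form because there is no widely accepted scientific theory of consciousness, whereas research on the adjacent topic of perceived AI consciousness is tractable and highly consequential for society. It calls for a shift in research focus toward how people perceive AI consciousness and what effects this has, and stresses clear and accurate communication about the associated uncertainties.

Details
English abstract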

As language-based AI systems become more anthropomorphic, the question of whether they can have subjective experience is increasingly pressing. I focus here on the tractability of research questions in the space of AI consciousness. I argue that the fundamental problem of whether AI systems can be conscious is currently intractable in its direct form, given the absence of a universally accepted scientific theory of consciousness, as well as the historical open-endedness of the philosophical mind-body problem. In contrast, questions around the adjacent subject of perceived AI consciousness are tractable, timely, and highly consequential for society. The general public is increasingly open to the possibility of consciousness in AI systems and routinely adopts the vocabulary of human cognition and subjective experience to describe them. This phenomenon is already driving societal shifts across user experience, ethical standards, and linguistic norms. I therefore propose an increased research focus on uncovering the causes and effects of perceived AI consciousness, which ultimately shape how we see our own human subjective experience relative to artificial entities. To support this, I map the current landscape of AI consciousness perception and discuss its key potential drivers and societal consequences. Finally, I urge developers, decision-makers, and the broader scientific community to commit to clear and accurate communication regarding the topic of AI consciousness, explicitly acknowledging its inherent uncertainties.

2605.06963 2026-05-11 cs.HC cs.AI cs.CL cs.IR

From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle

Anna Ostrowska, Michał Kukla, Gabriela Majstrak, Jan Opala, Sebastian Pergała, Jan Skwarek, Anna Wróblewska

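AI summary This paper presents an AI teaching and learning assistant plugin for Moodle that aims to move students from surface learning to deep understanding. The system uses Retrieval-Augmented Generation (RAG) grounded in teacher-provided materials to generate accurate educational content with a reduced risk of hallucination. Its dual-centric design offers students interactive Socratic tutoring and gives educators a workspace for supervised content generation. The evaluation reports strong faithfulness scores and user recommendation ratings.

Comments 5 pages, accepted as demo paper at IJCAI 2026

Details
English abstract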

This demo paper describes the development of the AI Teaching & Learning Assistant, a modular Moodle plugin that leverages Retrieval-Augmented Generation (RAG) to deliver high-quality, hallucination-free education. The system employs a dual-centric design, providing students with interactive, Socratic-based tutoring and educators with a "human-in-the-loop" workspace for supervised content generation. By grounding Large Language Model (LLM) responses in teacher-provided materials, the assistant addresses the risks of misinformation while encouraging deep conceptual mastery. Evaluation via the Ragas (LLM-as-a-Judge) framework and a preliminary user study confirms its effectiveness, achieving faithfulness scores up to 0.97 and a 4.00/5.00 recommendation rate.

2605.06959 2026-05-11 stat.ML cs.LG math.ST stat.TH

Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions

Haitham Kanj, Kiryung Lee

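AI summary This paper proposes a parametric approach to piecewise linear regression based on the Adaptive Block Gradient Descent (ABGD) algorithm, whose core idea is to represent piecewise linear functions as the difference of max-affine (DoMA) functions. A non-asymptotic local convergence analysis establishes linear convergence of ABGD under sub-Gaussian covariate and noise distributions, gives the sample complexity required in the noisy setting, and implies exact recovery in the noiseless case. Experiments corroborate the theory and indicate competitive performance on real-world datasets.

Details
English abstract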

This paper presents a parametric solution to piecewise linear regression through the Adaptive Block Gradient Descent (ABGD) algorithm. The heart of the method is the parametrization of piecewise linear functions as the difference of max-affine (DoMA) functions. A non-asymptotic local convergence analysis for ABGD is provided under sub-Gaussian covariate and noise distributions. To initialize ABGD, we adapt a prior algorithm originally developed for the simpler setting of max-affine functions. When suitably initialized, ABGD converges linearly to an $\varepsilon$-accurate estimate given $\tilde{\mathcal{O}}(d\max(\sigma_z/\varepsilon,1)^2)$ observations where $\sigma_z^2$ denotes the noise variance. This implies exact recovery given $\tilde{\mathcal{O}}(d)$ samples in the noiseless case. Also, such a rate is shown to be minimax optimal up to logarithmic factors. Synthetic numerical results corroborate the theoretical guarantees for ABGD. We also observe competitive performance compared to the state-of-the-art methods on real-world datasets.
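
The DoMA parametrization itself is easy to sketch: a function is written as one max-affine function minus another. The example below only evaluates such a function and does not implement ABGD. The names `max_affine` and `doma` and the chosen pieces are illustrative assumptions.

```python
def max_affine(x, pieces):
    """Evaluate max_j (a_j . x + b_j) for pieces given as (a_j, b_j) pairs."""
    return max(sum(ai * xi for ai, xi in zip(a, x)) + b for a, b in pieces)

def doma(x, first_pieces, second_pieces):
    """Difference of two max-affine functions evaluated at the point x."""
    return max_affine(x, first_pieces) - max_affine(x, second_pieces)

# clip(x, 0, 1) = max(x, 0) - max(x - 1, 0) is piecewise linear but neither convex nor concave.
plus = [([1.0], 0.0), ([0.0], 0.0)]
minus = [([1.0], -1.0), ([0.0], 0.0)]
values = [doma([x], plus, minus) for x in (-2.0, 0.5, 3.0)]
print(values)
```

A single max-affine function is always convex, so subtracting a second one is what allows non-convex piecewise linear shapes such as this clipped ramp.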

2605.06929 2026-05-11 physics.optics cs.LG

Physics-Based Flow Matching for Full-Field Prediction of Silicon Photonic Devices

Joseph Quaratiello, Anthony Rizzo

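AI summary This paper proposes PIC-Flow, a physics-based generative model that predicts full-field electromagnetic distributions of photonic devices as an alternative to computationally expensive finite-difference time-domain (FDTD) simulations. The method combines a conditional flow matching framework, a real-valued U-Net, and a physics-constrained loss based on the Helmholtz equation to predict fields from device geometry and operating wavelength. Experiments indicate that the model generalizes to held-out device classes, and the authors present it as a foundation for faster, more broadly applicable photonic device design.

Comments 11 pages, 4 figures

Details
English abstract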

Designing photonic integrated circuits requires accurate electromagnetic field simulations, which remain computationally expensive even for simple device geometries. We present PIC-Flow, a generative neural surrogate that predicts electromagnetic field distributions for photonic devices given their geometry and operating wavelength as an alternative to costly finite-difference time-domain (FDTD) simulations. Our approach combines three key ideas: (i) conditional flow matching as the generative framework, learning a velocity field that transports Gaussian noise to physically valid field solutions; (ii) a real-valued U-Net operating on split real and imaginary field channels; and (iii) physics-constrained training through a Helmholtz residual loss enforcing $\nabla^2 E_z + k_0^2 \varepsilon E_z = 0$. We introduce an interface-aware masking scheme for the Helmholtz residual that excludes dielectric boundary pixels where finite-difference stencil errors dominate, yielding a physically meaningful compliance metric. The dataset consists of 22,500 ground-truth FDTD simulations split evenly between multimode interferometers, Y-branches, and directional couplers at $\lambda = 1.55\,\mu$m in an 80/10/10 split between training, validation, and test sets. We evaluate ablations on the network against the held-out test devices and also show that the model generalizes to held-out device classes such as S-bends, tapers, and cascaded Y-branches. Rather than a drop-in replacement for FDTD, this work establishes a foundation that, with broader data coverage, more compute, and further training optimization, could scale toward broadband, device-agnostic field prediction with dramatically improved runtime for rapid design-space exploration of complex photonic devices and circuits.
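
The Helmholtz residual and the idea of masking interface pixels can be sketched on a small grid. This is a plain finite-difference illustration on one real-valued field channel, not the PIC-Flow loss. The 5-point stencil, the masking rule in `interface_mask`, the grid spacing, and the plane-wave test field are all assumptions.

```python
import math

def helmholtz_residual(E, eps, k0, h):
    """Interior residual of laplacian(E) + k0^2 * eps * E using the 5-point stencil."""
    n, m = len(E), len(E[0])
    res = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            lap = (E[i + 1][j] + E[i - 1][j] + E[i][j + 1] + E[i][j - 1]
                   - 4.0 * E[i][j]) / h**2
            res[i][j] = lap + k0**2 * eps[i][j] * E[i][j]
    return res

def interface_mask(eps):
    """True where a pixel and its four neighbours share one permittivity (hypothetical rule)."""
    n, m = len(eps), len(eps[0])
    mask = [[False] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            nbrs = (eps[i + 1][j], eps[i - 1][j], eps[i][j + 1], eps[i][j - 1])
            mask[i][j] = all(v == eps[i][j] for v in nbrs)
    return mask

# Plane wave in a uniform medium: an exact solution, so the residual is discretisation error only.
h, k0, eps_r = 0.01, 2 * math.pi / 1.55, 1.0   # lengths in micrometres, wavelength 1.55
n = 60
eps = [[eps_r] * n for _ in range(n)]
E = [[math.sin(k0 * math.sqrt(eps_r) * i * h)] * n for i in range(n)]
res = helmholtz_residual(E, eps, k0, h)
mask = interface_mask(eps)
max_res = max(abs(res[i][j]) for i in range(n) for j in range(n) if mask[i][j])
print("max masked residual:", max_res)
```

For a field that satisfies the equation, the masked residual is small and shrinks with the grid spacing. At a permittivity jump the same stencil straddles two materials, which is the situation the abstract's masking scheme is designed to exclude.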

2605.06920 2026-05-11 cs.GT cs.AI cs.LG

In-Context Credit Assignment via the Core

Keegan Harris, Siddharth Prasad, Asher Trockman

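AI summary This paper studies how to assign credit for AI-generated content (such as code, news articles, and short-form videos) among the creators whose work appears in the context window, and proposes incentive-aligned mechanisms based on the least core solution concept from cooperative game theory. The approach aims for stability by ensuring that no group of creators is significantly under-compensated relative to the value it could generate on its own. The authors develop algorithms for approximating the least core and, on a web retrieval task, report far fewer large language model calls than alternative methods.

Details
English abstract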

We propose incentive-aligned mechanisms for in-context credit assignment: the task of assigning credit for AI-generated content (e.g. code, news articles, short-form videos) among creators whose intellectual property appears in the context window. Our approach is based on the least core solution concept from cooperative game theory, which distributes value in a way that is as stable as possible by ensuring that no subset of creators is significantly under-compensated relative to the value they could generate on their own. We develop algorithms for approximating the least core, which leverage novel routines for constraint seeding and constraint separation. On a web retrieval credit assignment task, we find that our approaches are capable of approximating the least core using orders of magnitude fewer LLM calls compared to alternative methods.
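
The stability notion behind the least core can be sketched by brute force on a tiny game. The snippet computes, for a given allocation, the largest shortfall of any coalition relative to its stand-alone value, together with the coalition that attains it. The three-creator game, its values, and the name `max_deficit` are hypothetical. Enumerating every coalition is only feasible for very small games and is not the paper's algorithm.

```python
from itertools import combinations

def max_deficit(players, value, allocation):
    """Largest shortfall v(S) - x(S) over proper non-empty coalitions S, and the S attaining it."""
    worst_gap, worst_coalition = float("-inf"), None
    for size in range(1, len(players)):
        for coalition in combinations(players, size):
            gap = value(frozenset(coalition)) - sum(allocation[p] for p in coalition)
            if gap > worst_gap:
                worst_gap, worst_coalition = gap, coalition
    return worst_gap, worst_coalition

# Hypothetical game: three creators; any pair alone yields 0.8 of the total value 1.0.
players = ("a", "b", "c")

def value(coalition):
    return {0: 0.0, 1: 0.0, 2: 0.8, 3: 1.0}[len(coalition)]

equal_split = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
skewed_split = {"a": 0.6, "b": 0.3, "c": 0.1}

print("equal split:", max_deficit(players, value, equal_split))
print("skewed split:", max_deficit(players, value, skewed_split))
```

The least core consists of the allocations of the full value that minimise this worst-case shortfall. In the symmetric toy game the equal split achieves the smaller shortfall, while the skewed split leaves the pair ("b", "c") most under-compensated. Finding the most violated coalition for a candidate allocation is the role a constraint separation routine plays, here done by exhaustive search.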

2605.06918 2026-05-11 cs.MA cs.LG

Generalising Travel Time Prediction To Varying Route Choices In Urban Networks

Łukasz Gorczyca, Kacper Drozd, Michał Bujak, Rafał Kucharski

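AI summary This paper studies how to predict travel times accurately under varying route choices in urban traffic networks and proposes the Generalised Travel Time Predictor (GenTTP). The framework learns complex spatiotemporal traffic patterns and the microscopic relationships between route choices and the resulting travel times. It targets a limitation of existing methods, which are predominantly based on graph neural networks and capture only typical, recurring demand patterns, and thereby addresses the lack of travel time prediction models that generalise across route assignments.

Details
English abstract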

Previous methods that predict system-wide travel time, predominantly grounded in graph neural networks, remain limited to typical and recurring demand patterns. While they successfully predict future congestion following the daily commute, they inherently approximate a single demand realisation and fail to capture varying route choices. In this work, we propose a Generalised Travel Time Predictor (GenTTP) that successfully differentiates route choices and offers accurate flow and travel time predictions. Our framework learns to uncover complex spatiotemporal traffic patterns and microscopic relationships between route choices and the resulting travel times. This addresses a critical gap: the lack of travel time prediction models that generalise across varying route assignments, where the same demand can produce substantially different network-wide outcomes depending on how travellers are distributed over available paths.