arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.15085 2026-05-15 stat.ML cs.LG stat.AP stat.ME

From Data to Action: Accelerating Refinery Optimization with AI

Dániel Pfeifer, Ábrahám Papp, Tibor Bernáth, Tamás Zoltán Varga, Márk Czifra, Botond Szilágyi, Edith Alice Kovács

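AI summary This paper studies how artificial intelligence can accelerate refinery optimization. To address the difficulty of interpreting and applying the output of linear programming (LP) methods in practice, it proposes combining them with machine learning to strengthen decision support. The core methods are an improved anomaly detection tool and a strategy for handling high-dimensional data; together they identify business opportunities and data supply errors in refinery scheduling and planning, offering new insight into how far the optimization results can be trusted.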

Comments 34 pages, 17 figures

Abstract

Nowadays refinery optimization utilizes vast amounts of data, which can be handled with modern Linear Programming (LP) software, but interpreting and applying the results remains challenging. Large petrochemical companies use massive models, with hundreds of thousands of input matrix elements. The LP solution is mathematically correct, but simplifications are made in the model, and data supply errors may occur. Therefore, further insight is needed to trust the results. The LP solver does not have a memory, so additional understanding could be gained by analyzing historical data and comparing it to the current plan. As such, machine learning approaches were suggested to support decision making based on the LP solution. Among these, Anomaly Detection tools are proposed to be used in tandem with the LP output. A transformed version of the popular ECOD methodology is applied. New methods are proposed to handle high-dimensional data by choosing the most informative pairs. This selection is then used alongside two 2D Anomaly Detection algorithms, revealing several business opportunities and data supply errors in the MOL refinery scheduling and planning architecture.
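
To make the ECOD idea in this abstract concrete, here is a minimal sketch of an ECOD-style score: each feature's empirical CDF gives left and right tail probabilities, and the negative logs are summed across features. This is a simplification, not the transformed variant the authors apply; it omits ECOD's skewness-based tail selection, and the function names and toy data are assumptions for illustration.

```python
import numpy as np

def ecdf_tail_scores(X):
    """Simplified ECOD-style outlier score for an (n_samples, n_features) array."""
    n = X.shape[0]
    # Rank of each value within its column, from 1 (smallest) to n (largest).
    # Ties, if any, are ordered arbitrarily in this sketch.
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    left_tail = ranks / n              # empirical P(value <= x)
    right_tail = (n - ranks + 1) / n   # empirical P(value >= x)
    # Small tail probabilities are rare events; sum their negative logs over features.
    score_left = -np.log(left_tail).sum(axis=1)
    score_right = -np.log(right_tail).sum(axis=1)
    return np.maximum(score_left, score_right)

def demo_scores(seed=0):
    # Hypothetical toy data: 200 Gaussian points plus one planted anomaly (last row).
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(200, 3))
    X = np.vstack([X, [10.0, 10.0, 10.0]])
    return ecdf_tail_scores(X)
```

Calling `demo_scores()` returns one score per row; the planted point in the last row sits in the extreme right tail of every feature and therefore receives the largest score.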

2605.15082 2026-05-15 stat.ML cs.LG math.ST stat.TH

Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models

Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel

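AI summary This paper studies how a learned predictor can discover low-dimensional structure in data when fewer samples are available than accurate prediction requires. Specifically, it considers recovering a multi-index polynomial model $f^*(x)=h(Ux)$ from finitely many data pairs, where the input affects the output only through its projection onto an unknown $r$-dimensional central subspace. The authors propose a simple method: fit kernel ridge regression (KRR) and compute the Average Gradient Outer Product (AGOP) of the fitted predictor. They prove that its top $r$ eigenvectors accurately recover the subspace, even while the prediction error remains large. The study also shows that when the low-degree part of the target function contains all prediction-relevant directions, subspace recovery needs far fewer samples than accurate prediction, revealing a gap between prediction and representation.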

Comments 95 pages, 12 figures

Abstract

We study a prototypical situation when a learned predictor can discover useful low-dimensional structure in data, while using fewer samples than are needed for accurate prediction. Specifically, we consider the problem of recovering a multi-index polynomial $f^*(x)=h(Ux)$, with $U\in\mathbb{R}^{r\times d}$ and $r\ll d$, from finitely many data/label pairs. Importantly, the target function depends on input $x$ only through the projection onto an unknown $r$-dimensional central subspace. The algorithm we analyze is appealingly simple: fit kernel ridge regression (KRR) to the data and compute the Average Gradient Outer Product (AGOP) from the fitted predictor. Our main results show that under reasonable assumptions the top $r$-dimensional eigenspace of AGOP provably recovers the central subspace, even in regimes when the prediction error remains large. Specifically, if the target function $f^*$ has degree $p^*$, it is known that $n\asymp d^{p^*}$ samples are necessary for KRR to achieve accurate prediction. In contrast, we show that if a low degree $p$ component of $f^*$ already carries all relevant directions for prediction, subspace recovery occurs in the much lower sample regime $n\asymp d^{p+\delta}$ for any $\delta\in(0,1)$. Our results thus demonstrate a separation between prediction and representation, and provide an explanation for why iterative kernel methods such as Recursive Feature Machines (RFM) can be sample-efficient in practice.
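
The two-step recipe in this abstract (fit KRR, then take the AGOP of the fitted predictor) can be sketched in a few lines of NumPy. The Gaussian kernel, the bandwidth and ridge values, and the toy target below are assumptions chosen for illustration; the abstract does not state which kernel or scaling the analysis uses.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    # Pairwise Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)).
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def krr_agop(X, y, bandwidth, ridge):
    """Fit KRR on (X, y) and return the AGOP of the fitted predictor over X."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + ridge * np.eye(n), y)
    # Predictor f(x) = sum_j alpha_j k(x, x_j); its gradient at x_i is
    # sum_j alpha_j k(x_i, x_j) (x_j - x_i) / bandwidth^2.
    W = K * alpha[None, :]
    grads = (W @ X - W.sum(axis=1, keepdims=True) * X) / bandwidth**2
    return grads.T @ grads / n  # average of gradient outer products, shape (d, d)

def demo_alignment(seed=0, n=400, d=5):
    # Hypothetical toy problem with a single hidden direction (r = 1).
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    u = np.zeros(d)
    u[0] = 1.0
    z = X @ u
    y = z + z**2                           # target of the form h(Ux)
    M = krr_agop(X, y, bandwidth=2.0, ridge=1e-3)
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    top = eigvecs[:, -1]
    return abs(top @ u)                    # |cosine| between estimate and truth
```

`demo_alignment()` returns the absolute cosine between the top AGOP eigenvector and the hidden direction; a value near 1 means the direction was recovered on this toy problem. For $r > 1$ one would compare the span of the top $r$ eigenvectors with the row space of $U$.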

2605.15058 2026-05-15 cs.NE cs.AI

NeuroTrain: Surveying Local Learning Rules for Spiking Neural Networks with an Open Benchmarking Framework

Alessio Caviglia, Filippo Marostica, Roberta Bardini, Alessandro Savino, Stefano Di Carlo

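AI summary This paper surveys recent progress in training algorithms for spiking neural networks (SNNs), systematically organizing methods that include surrogate-gradient backpropagation, local learning rules, and biologically inspired plasticity mechanisms, and proposes a unified taxonomy. To support reproducible research, the authors developed the open-source framework NeuroTrain, which implements a range of representative algorithms and provides a unified, modular, and extensible benchmarking platform. The work consolidates scattered literature, identifies current challenges and future research directions, and serves as a reference for efficient, scalable SNN training.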

Abstract

The rapid expansion of spiking neural networks (SNNs) has led to a proliferation of training algorithms that differ widely in biological inspiration, computational structure, and hardware suitability. Despite this progress, the field lacks a unified, fine-grained taxonomy that systematically organizes these approaches and clarifies their conceptual relationships. This survey provides a comprehensive taxonomy of SNN training algorithms, spanning surrogate-gradient backpropagation, local and three-factor learning rules, biologically inspired plasticity mechanisms, ANN-to-SNN conversion pipelines, and non-standard optimization strategies. We analyze each class in terms of its computational principles, learning signals, and locality properties. To support reproducible research, we release NeuroTrain, an open-source snnTorch-based framework that implements a representative set of these algorithms within a unified, modular, and extendable framework, enabling consistent benchmarking across datasets, architectures, and training regimes. By consolidating fragmented literature and providing a reusable benchmarking framework, this survey identifies common patterns, highlights open challenges, and outlines promising directions for future work on scalable, efficient SNN training.

2605.15032 2026-05-15 eess.SP cs.LG

Multi-Block Attention for Efficient Channel Estimation in IRS-Assisted mmWave MIMO

Mehrdad Momen-Tayefeh, Mehrshad Momen-Tayefeh, Maryam Sabbaghian

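AI summary This paper studies efficient channel estimation in intelligent reflecting surface (IRS)-assisted mmWave MIMO systems and proposes a deep-learning-based Multi-Block Attention (MBA) framework that reduces training overhead and improves estimation accuracy. The method selectively deactivates IRS elements and uses a two-stage network, one stage for spatial correlation recovery and one for noise suppression, which reduces error propagation in channel estimation. Experiments show that MBA significantly lowers pilot overhead and improves channel estimation performance while keeping computational complexity low.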

Journal ref
IEEE Transactions on Communications, vol. 73, no. 12, pp. 13891-13903, Dec. 2025
Abstract

Intelligent Reflecting Surfaces (IRSs) are a promising technology for enhancing the spectral and energy efficiency of millimeter-wave (mmWave) multiple-input multiple-output (MIMO) systems. In these systems, accurate channel estimation remains challenging due to the passive nature of IRS elements and the high pilot overhead in large-scale deployments. This paper presents a deep learning-based Multi-Block Attention (MBA) framework for efficient cascaded channel estimation in IRS-assisted mmWave MIMO systems that utilize orthogonal frequency division multiplexing (OFDM). First, we show the optimality of the discrete Fourier transform (DFT) and Hadamard matrices as phase configurations for least squares (LS) estimation. To reduce training overhead, we selectively deactivate IRS elements and compensate for induced feature loss using a two-stage architecture: (i) a Convolutional Attention Network (CAN) for spatial correlation recovery and (ii) a Complex Multi-Convolutional Network (CMN) for noise suppression. The MBA architecture mitigates error propagation through attention-guided feature refinement and denoising. Simulation results indicate that the MBA method reduces pilot overhead by up to 87% compared to the LS estimator. Additionally, at signal-to-noise ratios of 10 dB, our proposed method achieves approximately 51% lower normalized mean squared error (NMSE) than leading methods. It also maintains low computational complexity and adapts effectively to various propagation environments.

2605.15030 2026-05-15 cs.CR cs.AI

WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections

Tri Cao, Yulin Chen, Hieu Cao, Yibo Li, Khoi Le, Thong Nguyen, Yuexin Li, Yufei He, Yue Liu, Shuicheng Yan, Bryan Hooi

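AI summary This paper proposes WARD, an adversarially robust defense for web agents against prompt injection attacks embedded in HTML content or visual interfaces. WARD is trained on the large-scale dataset WARD-Base and the purpose-built attack dataset WARD-PIG, and introduces the A3T adaptive adversarial training framework, which improves robustness through a memory-driven co-evolution of attacker and defender. Experiments show that WARD achieves near-perfect recall on out-of-distribution benchmarks, keeps the false positive rate low, and remains efficient and stable under distribution shift and targeted attacks.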

Comments Code and models: https://github.com/caothientri2001vn/WARD-WebAgent

Abstract

Web agents can autonomously complete online tasks by interacting with websites, but their exposure to open web environments makes them vulnerable to prompt injection attacks embedded in HTML content or visual interfaces. Existing guard models still suffer from limited generalization to unseen domains and attack patterns, high false positive rates on benign content, reduced deployment efficiency due to added latency at each step, and vulnerability to adversarial attacks that evolve over time or directly target the guard itself. To address these limitations, we propose WARD (Web Agent Robust Defense against Prompt Injection), a practical guard model for secure and efficient web agents. WARD is built on WARD-Base, a large-scale dataset with around 177K samples collected from 719 high-traffic URLs and platforms, and WARD-PIG, a dedicated dataset designed for prompt injection attacks targeting the guard model. We further introduce A3T, an adaptive adversarial attack training framework that iteratively strengthens WARD through a memory-based attacker and guard co-evolution process. Extensive experiments show that WARD achieves nearly perfect recall on out-of-distribution benchmarks, maintains low false positive rates to preserve agent utility, remains robust against guard-targeted and adaptive attacks under substantial distribution shifts, and runs efficiently in parallel with the agent without introducing additional latency.

2605.15026 2026-05-15 cs.OS cs.AI cs.PF

SemaTune: Semantic-Aware Online OS Tuning with Large Language Models

Georgios Liargkovas, Mihir Nitin Joshi, Hubertus Franke, Kostis Kaffes

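AI summary SemaTune is a semantic-aware online operating system tuning framework based on large language models, aimed at improving the performance of long-running services. It builds a decision context from system parameters, monitoring data, configuration history, and related information, tunes through a fast and a slow feedback loop, and applies type validation before each update, so that it can reason about the semantics of OS controls while keeping model overhead bounded and the system stable. Experiments show that SemaTune significantly outperforms traditional methods on multiple benchmarks, improves stable-phase performance, and avoids severe performance degradation.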

Comments 17 pages, 12 figures

Abstract

Online OS tuning can improve long-running services, but existing controllers are poorly matched to live hosts. They treat scheduler, power, memory, and I/O controls as black-box variables and optimize a scalar reward. This view ignores cross-knob policy structure, breaks down when application metrics are unavailable, and can send a running service into degraded regions that persist after the bad setting is removed. We present SemaTune, a host-side framework for steady-state OS tuning with bounded language-model guidance. SemaTune turns knob schemas, telemetry, current configuration, recent action-response history, and retrieved prior runs into a compact decision context. A fast loop proposes low-latency updates, a slower loop periodically revises the search strategy, and every proposed change passes through typed validation before reaching kernel or sysctl interfaces. This lets the controller reason about OS-control meaning and indirect performance signals while keeping model cost, latency, and authority constrained. We evaluate SemaTune on 13 live workloads from five benchmark suites while tuning up to 41 Linux parameters. Across the suite, SemaTune improves stable-phase performance by 72.5% over default settings and by 153.3% relative to the strongest non-LLM baseline. A 30-window session costs about $0.20 in model calls. With only host-level metrics, SemaTune still outperforms baselines given direct application objectives by 93.7 percentage points, while avoiding severe degraded regions reached by structure-blind exploration.

2605.14983 2026-05-15 cs.GT cs.AI cs.CY cs.MA

Agreement, Diversity, and Polarization Indices for Approval Elections

Piotr Faliszewski, Jitka Mertlová, Krzysztof Sornat, Stanisław Szufa, Tomasz Wąs

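AI summary This paper studies how to quantify, through indices, the degree of agreement, diversity, and polarization among voters in approval elections. It proposes a set of normalized indices that measure these features and analyzes their properties. The indices are also used to draw a new map of approval elections and to compare elections from several real-world datasets.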

Abstract

An index is a function that given an election outputs a value between 0 and 1, indicating the extent to which this election has a particular feature. We seek indices that capture agreement, diversity, and polarization among voters in approval elections, and that are normalized with respect to saturation. By the latter we mean that if two elections differ by the fraction of candidates approved by an average voter, but otherwise are of similar nature, then they should have similar index values. We propose several indices, analyze their properties, and use them to (a) derive a new map of approval elections, and (b) show similarities and differences between various real-life elections from Pabulib, Preflib and other sources.

2605.13338 2026-05-15 cs.CR cs.AI

Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models

Shuqiang Wang, Wei Cao, Jiaqi Weng, Jialing Tao, Licheng Pan, Hui Xue, Zhixuan Chu

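AI summary This paper studies a vulnerability of Large Reasoning Models (LRMs): they tend to "overthink" when given incomplete or logically inconsistent inputs, which makes reasoning long and energy-hungry and could be exploited for denial-of-service (DoS) attacks. The authors propose a black-box attack framework based on a hierarchical genetic algorithm that systematically perturbs the logical structure of input problems to induce longer reasoning. Experiments show that the method substantially amplifies output length on several state-of-the-art reasoning models and transfers well, highlighting overthinking as a shared security risk in modern reasoning systems.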

Comments Accepted at ICML 2026. Code available at: https://github.com/EndlessCao/Overthink-HGA

Journal ref
Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), PMLR 306, 2026
Abstract

Large Reasoning Models (LRMs) are increasingly integrated into systems requiring reliable multi-step inference, yet this growing dependence exposes new vulnerabilities related to computational availability. In particular, LRMs exhibit a tendency to "overthink", producing excessively long and redundant reasoning traces, when confronted with incomplete or logically inconsistent inputs. This behavior significantly increases inference latency and energy consumption, forming a potential vector for denial-of-service (DoS) style resource exhaustion. In this work, we investigate this attack surface and propose an automated black-box framework that induces overthinking in LRMs by systematically perturbing the logical structure of input problems. Our method employs a hierarchical genetic algorithm (HGA) operating on structured problem decompositions, and optimizes a composite fitness function designed to maximize both response length and reflective overthinking markers. Across four state-of-the-art reasoning models, the proposed method substantially amplifies output length, achieving up to a 26.1x increase on the MATH benchmark and consistently outperforming benign and manually crafted missing-premise baselines. We further demonstrate strong transferability, showing that adversarial inputs evolved using a small proxy model retain high effectiveness against large commercial LRMs. These findings highlight overthinking as a shared and exploitable vulnerability in modern reasoning systems, underscoring the need for more robust defenses.

2512.16768 2026-05-15 stat.ML cs.LG math.PR

On The Hidden Biases of Flow Matching Samplers

Soon Hoe Lim

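AI summary This paper studies the hidden biases of flow matching samplers in the finite-sample setting. By replacing population expectations with sample averages and the target distribution with a finite-sample surrogate, the author obtains a hierarchy of empirical flow matching models. For affine conditional flows, the paper derives the exact empirical minimizer and identifies a smoothed plug-in regime in which the terminal distribution is exactly a kernel-mixture estimator. The study reveals several sources of bias in empirical flow matching: replacing the target distribution changes the statistical target, the empirical minimizer may not be a gradient field, and the marginal path does not uniquely determine the particle dynamics.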

Comments 41 pages

Abstract

Flow matching (FM) constructs continuous-time ODE samplers by prescribing probability paths between a base distribution and a target distribution. In this note, we study FM through the lens of finite-sample plug-in estimation. In addition to replacing population expectations by sample averages, one may replace the target distribution itself by a finite-sample surrogate, ranging from the empirical measure to a smoothed estimator. This viewpoint yields a natural hierarchy of empirical FM models. For affine conditional flows, we derive the exact empirical minimizer and identify a smoothed plug-in regime in which the terminal law is exactly a kernel-mixture estimator. This plug-in perspective clarifies several coupled finite-sample biases of empirical FM. First, replacing the target law by a finite-sample surrogate changes the statistical target. Second, the empirical minimizer is generally not a gradient field, even when each conditional flow is. Third, a fixed empirical marginal path does not determine a unique particle dynamics: one may add extra vector fields whose probability flux has zero divergence without changing the marginal path. For Gaussian affine conditional paths, we give explicit families of such flux-null corrections. Finally, the source distribution provides a primary mechanism controlling upper tails of kinetic energy. In particular, Gaussian bases yield exponential upper-tail bounds for instantaneous and integrated kinetic energies, whereas polynomially tailed bases yield corresponding polynomial upper-tail bounds.

2502.03672 2026-05-15 physics.comp-ph cs.LG cs.NA math.NA

Physically consistent predictive reduced-order modeling by enhancing Operator Inference with state constraints

Hyeonghun Kim, Boris Kramer

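AI summary This paper proposes a new strategy for enhancing Operator Inference: embedding state constraints in the reduced-order model to improve the stability and physical consistency of predictions for complex multiphysics systems such as char combustion. The method introduces a way to select regularization hyperparameters based on a key performance indicator and, in a practical application, demonstrates advantages in stability, accuracy, and extrapolation.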

Comments 33 pages, 13 figures

Abstract

Numerical simulations of complex multiphysics systems, such as char combustion considered herein, yield numerous state variables that inherently exhibit physical constraints. This paper presents a new approach to augment Operator Inference -- a methodology within scientific machine learning that enables learning from data a low-dimensional representation of a high-dimensional system governed by nonlinear partial differential equations -- by embedding such state constraints in the reduced-order model predictions. In the model learning process, we propose a new way to choose regularization hyperparameters based on a key performance indicator. Since embedding state constraints improves the stability of the Operator Inference reduced-order model, we compare the proposed state constraints-embedded Operator Inference with the standard Operator Inference and other stability-enhancing approaches. For an application to char combustion, we demonstrate that the proposed approach yields state predictions superior to the other methods regarding stability and accuracy. It extrapolates over 200% past the training regime while being computationally efficient and physically consistent.

2412.14291 2026-05-15 math.OC cs.LG stat.ML

Projected gradient methods for nonconvex and stochastic smooth optimization: new complexities and auto-conditioned stepsizes

Guanghui Lan, Tianjiao Li, Yangyang Xu

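AI summary This paper proposes a new class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex objective over a convex compact set. It introduces an "auto-conditioned" projected gradient (AC-PG) method that matches the best-known iteration complexity without requiring the Lipschitz constant of the gradient as input or any line search. The paper also extends PG methods to stochastic optimization, proposing a stochastic projected gradient (SPG) method and a variance-reduced stochastic gradient (VR-SPG) method with new complexity bounds under different oracle settings, and designs auto-conditioned stepsize policies for these methods with convergence guarantees.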

Abstract

We present a novel class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex function over a convex compact set. We first provide a novel analysis of the constant-stepsize PG method, achieving the best-known iteration complexity for finding an approximate stationary point of the problem. We then develop an "auto-conditioned" projected gradient (AC-PG) variant that achieves the same iteration complexity without requiring the input of the Lipschitz constant of the gradient or any line search procedure. The key idea is to estimate the Lipschitz constant using first-order information gathered from the previous iterations, and to show that the error caused by underestimating the Lipschitz constant can be properly controlled. We then generalize the PG methods to the stochastic setting, by proposing a stochastic projected gradient (SPG) method and a variance-reduced stochastic gradient (VR-SPG) method, achieving new complexity bounds in different oracle settings. We also present auto-conditioned stepsize policies for both stochastic PG methods and establish comparable convergence guarantees.
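
The key idea named in this abstract, estimating the Lipschitz constant from first-order information gathered at previous iterates instead of taking it as input, can be illustrated with a small projected gradient loop. The secant-ratio estimate with a running maximum below is an assumed heuristic for illustration, not the authors' AC-PG stepsize rule, and the box constraint and test problem are likewise hypothetical.

```python
import numpy as np

def project_box(x, lo, hi):
    # Euclidean projection onto the box [lo, hi]^d, a convex compact set.
    return np.clip(x, lo, hi)

def auto_stepsize_pg(grad, x0, lo, hi, L0=1.0, max_iter=500, tol=1e-10):
    """Projected gradient with a Lipschitz estimate built from past iterates."""
    x = project_box(np.asarray(x0, dtype=float), lo, hi)
    g = grad(x)
    L = L0  # initial guess; may underestimate the true Lipschitz constant
    for _ in range(max_iter):
        x_new = project_box(x - g / L, lo, hi)
        step = np.linalg.norm(x_new - x)
        if step < tol:
            return x_new, L
        g_new = grad(x_new)
        # Local secant estimate of the gradient's Lipschitz constant,
        # kept monotone by taking the running maximum.
        L = max(L, np.linalg.norm(g_new - g) / step)
        x, g = x_new, g_new
    return x, L
```

As a usage example, take $f(x) = \tfrac12 x^\top A x - b^\top x$ with `A = np.diag([1.0, 10.0])` and `b = np.array([2.0, 5.0])` on the box $[-1, 1]^2$. Calling `auto_stepsize_pg(lambda x: A @ x - b, [0.0, 0.0], -1.0, 1.0)` starts from the underestimate `L0 = 1.0`, and the estimate rises toward the true constant 10 as iterates accumulate.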

2304.03641 2026-05-15 math.OC cs.LG cs.NA math.NA

A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints

Ganzhao Yuan

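AI summary This paper studies nonsmooth composite optimization under orthogonality constraints, a problem class with wide applications in statistical learning and data science that is hard to solve because the objective is nonsmooth and the constraints are nonconvex. The author proposes OBCD, a new method based on block coordinate descent that updates $k$ rows of the solution matrix ($k \geq 2$) in each iteration by solving a small nonsmooth optimization subproblem. The method is computationally efficient and keeps iterates feasible; the paper proves the completeness and convergence of the update scheme, and experiments show that it outperforms existing methods.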

Comments Future versions of this paper can be found at arXiv:2304.03641

Abstract

Nonsmooth composite optimization with orthogonality constraints has a wide range of applications in statistical learning and data science. However, this problem is challenging due to its nonsmooth objective and computationally expensive nonconvex constraints. In this paper, we propose a new approach called OBCD, which leverages block coordinate descent to address these challenges. OBCD is a feasible method with a small computational footprint. In each iteration, it updates \(k\) rows of the solution matrix, where \(k \geq 2\), by globally solving a small nonsmooth optimization problem under orthogonality constraints. We prove the completeness of the proposed update scheme, showing that row-wise orthogonal updates can reach any feasible point from any feasible initialization. We further prove that the limit points generated by OBCD, referred to as global block-\(k\) stationary points, offer stronger optimality than standard critical points. Furthermore, we show that OBCD finds an \(\epsilon\)-block-\(k\) stationary point with an iteration complexity of \(\mathcal{O}(1/\epsilon)\). Additionally, under the Kurdyka-Łojasiewicz (KL) inequality, we establish the non-ergodic convergence rate of OBCD. We also demonstrate how novel breakpoint search methods can be used to solve the subproblems arising in OBCD. Empirical results show that our approach consistently outperforms existing methods.
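
A row-wise orthogonal update keeps the iterate feasible because multiplying \(k\) rows by a \(k \times k\) orthogonal matrix is the same as multiplying the whole matrix by an orthogonal matrix, which preserves \(X^\top X = I\). The sketch below shows this for \(k = 2\) with plane rotations and a coarse grid search over the angle. Both the rotation-only restriction and the grid search are simplifications assumed here; the paper solves each subproblem globally with breakpoint search methods.

```python
import numpy as np

def rotate_rows(X, i, j, theta):
    # Apply a 2x2 rotation to rows i and j; all other rows are unchanged.
    Q = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Y = X.copy()
    Y[[i, j], :] = Q @ X[[i, j], :]
    return Y

def row_pair_sweep(X, objective, n_angles=180):
    """One sweep over all row pairs, picking the best angle on a grid."""
    # The grid starts at theta = 0, so the objective never increases.
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    n = X.shape[0]
    for i in range(n - 1):
        for j in range(i + 1, n):
            candidates = [rotate_rows(X, i, j, t) for t in thetas]
            X = min(candidates, key=objective)
    return X
```

For example, starting from `X0, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(6, 3)))` and the nonsmooth objective `lambda X: np.abs(X).sum()`, one sweep returns a matrix whose columns are still orthonormal and whose objective value is no larger than at `X0`.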

2012.14425 2026-05-15 cs.CR cs.LG

Vendor-Conditioned Contrastive Learning for Predicting Organizational Cyber Threat Targets

Benjamin M. Ampel

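AI summary This study aims to identify the organizations targeted by cyberattacks. It proposes TRACE, a contrastive learning framework built on CySecBERT that combines temporal information with vendor conditioning to jointly optimize organizational classification and representation learning, improving robustness under temporal distribution shift. Using a large multi-source corpus spanning nine exploit databases and hacker forums, the study builds a dataset of 129,126 samples across seven organizational categories and reaches a macro F1 score of 97.00% in temporal out-of-distribution evaluation, substantially outperforming a range of classical machine learning and deep learning methods.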

Comments 6 pages, 3 figures

Abstract

Cyberattacks cause billions of dollars in damage annually, with malicious hackers often sharing exploit code and techniques on underground forums. Identifying which organizations are targeted by these exploits is critical for proactive Cyber Threat Intelligence (CTI). To address this need, we propose Temporal Representation and Classification of Exploits (TRACE), a vendor-conditioned contrastive learning framework built on CySecBERT that jointly optimizes organizational target classification and vendor-coherent representations while evaluating robustness under temporal distribution shift. Unlike prior work limited to small, single-source datasets, we leverage a large-scale, multi-source corpus spanning 9 exploit databases and hacker forums, comprising 352,866 posts collected over three decades, yielding a 129,126-sample dataset across seven organizational categories. In the temporal out-of-distribution evaluation, TRACE achieves macro F1=97.00%, substantially outperforming 17 benchmark classical ML methods, deep learning with GloVe/FastText embeddings, and pretrained transformer models.

2605.14960 2026-05-15 cs.GR cs.CG cs.CV

Meschers: Geometry Processing of Impossible Objects

Ana Dodik, Isabella Yu, Kartik Chandra, Jonathan Ragan-Kelley, Joshua Tenenbaum, Vincent Sitzmann, Justin Solomon

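AI summary This paper studies how to represent "impossible objects" on a computer: geometric constructions that cannot exist in reality but that humans can perceive. Earlier methods cut the object or bend it along the depth axis, which changes local geometry or makes relighting difficult and hampers downstream graphics processing. The authors propose Meschers, a mesh representation grounded in discrete exterior calculus that supports applications such as rendering, relighting, and distance computation and enables inverse rendering of impossible objects; the paper also compares it with cut and bend representations.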

Journal ref
ACM Trans. Graph. 44, 4, Article 70 (August 2025)
Abstract

Impossible objects, geometric constructions that humans can perceive but that cannot exist in real life, have been a topic of intrigue in visual arts, perception, and graphics, yet no satisfying computer representation of such objects exists. Previous work embeds impossible objects in 3D, cutting them or twisting/bending them in the depth axis. Cutting an impossible object changes its local geometry at the cut, which can hamper downstream graphics applications, such as smoothing, while bending makes it difficult to relight the object. Both of these can invalidate geometry operations, such as distance computation. As an alternative, we introduce Meschers, meshes capable of representing impossible constructions akin to those found in M.C. Escher's woodcuts. Our representation has a theoretical foundation in discrete exterior calculus and supports the use-cases above, as we demonstrate in a number of example applications. Moreover, because we can do discrete geometry processing on our representation, we can inverse-render impossible objects. We also compare our representation to cut and bend representations of impossible objects.

2605.14941 2026-05-15 eess.SP cs.HC cs.LG

nASR: An End-to-End Trainable Neural Layer for Channel-Level EEG Artifact Subspace Reconstruction in Real-Time BCI

Shantanu Sarkar, Jose L. Contreras-Vidal

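AI summary This study proposes nASR, an end-to-end trainable neural network layer for channel-level EEG Artifact Subspace Reconstruction in real-time brain-computer interfaces (BCIs). Traditional ASR depends on a fixed threshold parameter and can damage useful neural signals; nASR introduces two learnable threshold parameters so that artifact detection and downstream decoding are optimized jointly, improving signal quality and decoding performance. Experiments show that nASR outperforms traditional ASR in classification accuracy and inference speed, making it suitable for real-time BCI applications with demanding latency and performance requirements.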

Comments Preprint. Submitted to IEEE SMC 2026 (under review)

Abstract

Electroencephalogram (EEG) signals are highly susceptible to artifacts, resulting in a low signal-to-noise ratio which makes extraction of meaningful neural information challenging. Artifact Subspace Reconstruction (ASR) is one of the most widely used artifact filtering techniques in EEG-based BCI applications, owing to its real-time applicability. ASR reconstructs artifact-free signals by operating in Principal Component (PC) space within sliding windows. However, ASR performance is critically sensitive to its threshold parameter: an incorrect threshold risks removing task-relevant neural features alongside artifacts. Furthermore, since PCs are linear combinations of all channels, subspace reconstruction in PC space may alter the underlying data structure, potentially discarding essential neural information. To address these limitations, we propose nASR, a novel end-to-end trainable Keras layer that jointly optimizes artifact rejection and downstream decoding. nASR introduces two trainable threshold parameters: K, which governs artifact detection in PC variance space, and L, which quantifies eigen-spread to pinpoint the primary artifact-contributing channels, enabling selective channel-level reconstruction that preserves clean channel information. An ablation study comprising five model variants (m01 - m05), evaluated across two subjects from the BCI Competition IV Dataset 1, confirms that nASR variants consistently outperform traditional ASR on test classification metrics, while achieving a 6-8x reduction in inference time, making nASR a strong candidate for real-time BCI applications demanding both low latency and high decoding performance.

2605.14939 2026-05-15 physics.plasm-ph cs.LG

Real-time virtual circuits for plasma shape control via neural network emulators

Alasdair Ross, George K. Holt, Kamran Pentland, Adriano Agnello, Nicola C. Amorisco, Pedro Cavestany, Aran Garrod, Timothy Nunn, Charles Vincent, Graham McArdle

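AI summary This study addresses the real-time regulation of several strongly coupled parameters in tokamak plasma shape control and proposes a neural-network-based method for generating virtual circuits (VCs) in real time. Using a library of more than one million simulated Grad-Shafranov equilibria, it develops neural network models that produce state-aware virtual circuits in real time, enabling independent control of plasma shape parameters. The approach improves control accuracy and robustness and offers a scalable solution for real-time control in complex plasma scenarios.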

Abstract

Reliable position and shape control in tokamak plasmas requires accurate real-time regulation of several strongly coupled shape parameters. The control vectors that disentangle these couplings, referred to as virtual circuits (VCs), enable independent shape parameter control for a specific Grad-Shafranov (GS) equilibrium. Numerical calculation of VCs is not currently feasible in real time; therefore, VCs are usually computed prior to each experiment, using a small number of reference GS equilibria sampled along the desired scenario trajectory, with each VC used to control the plasma within a preset time interval. While effective near the reference equilibrium, this approach can lead to degraded performance as the plasma departs from the reference equilibrium and/or from the desired trajectory, and it complicates the design of robust control strategies for rapidly evolving plasma configurations. In this paper, we construct neural-network-based emulators of plasma shape parameters from which VCs can be derived, to provide the MAST Upgrade (MAST-U) plasma control system with state-aware VCs in real time. To do this, we develop an extensive library of over a million simulated GS equilibria, covering a substantial portion of the MAST-U operational space. These emulators provide differentiable functions whose gradients can be rapidly computed, enabling the derivation of accurate VCs for real-time shape control. We perform extensive verification of the emulated VCs by testing whether they disentangle the control problem. The neural-network-based approach delivers high accuracy and orthogonality across a diverse range of equilibria. This work establishes the physical validity of emulated VCs as a scalable and general alternative to schedules of precomputed VCs.
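
One way to read "VCs derived from emulator gradients" is sketched below, under a stated assumption: the VC matrix is taken to be the Moore-Penrose pseudo-inverse of the Jacobian of the shape parameters with respect to the coil currents, so that each VC column moves one shape parameter while leaving the others unchanged to first order. The abstract does not spell out this construction. The toy `tanh` emulator, its dimensions, and the finite-difference Jacobian (standing in for automatic differentiation through a neural network) are all hypothetical.

```python
import numpy as np

def jacobian_fd(f, c, eps=1e-6):
    # Central finite-difference Jacobian of f at c, shape (n_outputs, n_inputs).
    c = np.asarray(c, dtype=float)
    n_out = np.asarray(f(c)).size
    J = np.empty((n_out, c.size))
    for k in range(c.size):
        dc = np.zeros_like(c)
        dc[k] = eps
        J[:, k] = (f(c + dc) - f(c - dc)) / (2.0 * eps)
    return J

def virtual_circuits(f, c):
    """Assumed construction: VC matrix = pseudo-inverse of the shape Jacobian."""
    J = jacobian_fd(f, c)
    return np.linalg.pinv(J)  # shape (n_coils, n_shape_params)

def make_toy_emulator(seed=0, n_shape=3, n_coils=5):
    # Hypothetical stand-in for a trained emulator: currents -> shape parameters.
    W = np.random.default_rng(seed).normal(size=(n_shape, n_coils))
    return lambda c: np.tanh(W @ c)
```

With `f = make_toy_emulator()` and `c = 0.1 * np.ones(5)`, the product `jacobian_fd(f, c) @ virtual_circuits(f, c)` is close to the identity matrix. That is the decoupling property in miniature: to first order, a current change along column `k` changes only shape parameter `k`.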

2605.14883 2026-05-15 eess.SP cs.HC cs.LG

BCI-Based Assessment of Ocular Response Time Using Dynamic Time Warping Leveraging an RDWT-Driven Deep Neural Framework

Shantanu Sarkar, Sai Shashank Gandavarapu, Jeff Feng, Saurabh Prasad, Reza Khanbabaie, Jose L. Contreras-Vidal

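AI summary This study proposes a brain-computer interface (BCI)-based method for assessing ocular response time to aid early diagnosis of mild traumatic brain injury (mTBI). It combines electroencephalography (EEG) with augmented-reality (AR)-guided Vestibular/Ocular Motor Screening (VOMS) tasks, processes the EEG signals with a deep neural framework driven by the Redundant Discrete Wavelet Transform (RDWT), and computes ocular response times with Dynamic Time Warping (DTW). The results show that the method distinguishes the eye-movement behavior of different subjects, with pursuit tasks being especially informative about timing differences, offering a new technical route for multimodal mTBI assessment.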

Comments Submitted to IEEE SMC 2026 (under review)

Abstract

Mild traumatic brain injury (mTBI) is a prevalent condition that remains difficult to diagnose in its early stages. Oculomotor dysfunction is a well-established marker of mTBI, motivating the development of portable tools that capture both eye-movement behavior and underlying neurophysiology. In this work, we present an initial framework that integrates electroencephalogram (EEG) with augmented-reality (AR)-based Vestibular/Ocular Motor Screening (VOMS) tasks to estimate subject-specific ocular response times. Pre-processed EEG signals, obtained through band-pass filtering and average referencing, are analyzed using a Redundant Discrete Wavelet Transform (RDWT)-driven deep neural framework. The RDWT coefficients are subjected to trainable zero-phase convolutional filtering and reconstructed into the time domain via inverse RDWT, followed by channel-wise temporal and spatial filtering using 2D convolution layers and convolutional-LSTM-based decoding. An ablation study demonstrates that wavelet-domain filtering serves as an effective denoising strategy, improving prediction performance. Sliding-window predictions were validated using Pearson correlation (>= 0.5), and Dynamic Time Warping (DTW) was subsequently used to estimate ocular response times. DTW-derived metrics revealed significant inter-subject differences across all VOM tasks, supported by Mann-Whitney U tests. Cross-correlation analysis further revealed task-dependent temporal behaviors: pursuit tasks exhibited reactive tracking, whereas saccades showed anticipatory responses. Overall, the results highlight pursuit tasks as particularly informative for distinguishing timing differences and demonstrate the potential of RDWT-based EEG features combined with DTW metrics for multimodal mTBI assessment.
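
For readers unfamiliar with Dynamic Time Warping, the sketch below implements the classic dynamic-programming alignment of two 1D signals and reads a lag off the warping path. Using the median index offset along the path as a response-time estimate is an assumption made here for illustration; the abstract does not specify how the DTW-derived metrics are computed, and the function names are hypothetical.

```python
import numpy as np

def dtw_path(a, b):
    """Classic DTW between 1D sequences a and b; returns (cost, warping path)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end to recover the optimal alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = int(np.argmin((D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return D[n, m], path

def median_lag(stimulus, response):
    # Median index offset (response index minus stimulus index) along the path.
    _, path = dtw_path(stimulus, response)
    return float(np.median([j - i for i, j in path]))
```

For instance, if `stimulus` is a sampled sine wave and `response` is the same wave delayed by five samples, `median_lag(stimulus, response)` recovers a lag of five samples, which converts to seconds by dividing by the sampling rate.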

2605.14879 2026-05-15 cs.MA cs.GT cs.LG

Temporal Fair Division in Multi-Agent Systems: From Precise Alternation Metrics to Scalable Coordination Proxies

Nikolaos Al. Papadopoulos

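AI summary This paper studies fair division along the time dimension in multi-agent systems. It proposes a new family of metrics, Rotational Periodicity (RP), alongside the sliding-window ALT measures, for evaluating temporal fairness when agents compete repeatedly for a resource. It establishes Perfect Alternation (PA) as the canonical temporally fair solution and decomposes temporal fairness into two sub-measures, Rotational Score (RS) and Waiting Periods Evaluation (WPE), which significantly improves computational efficiency. Experiments show that RP retains high discriminative power while being cheaper to compute than ALT, and that the two together form an effective diagnostic toolkit for temporal fair division.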

Comments 15 pages, 3 figures, 8 tables. Submitted to ACM Transactions on Economics and Computation, Special Issue on Fair Division

Abstract

A plethora of real-world environments require agents to compete repeatedly for the same limited resource, calling for a temporal notion of fairness judged across entire interaction histories. This paper advances the theory of temporal fair division by introducing Rotational Periodicity (RP), a family of lightweight metrics, alongside the ALT family of sliding-window measures, within a unified framework for repeated multi-agent resource competition. We formalise the Multi-Agent Battle of the Exes (MBoE) as a repeated fair division instance and establish Perfect Alternation (PA) as its canonical temporally fair solution, drawing connections to proportionality, envy-freeness, and $n$-periodic round-robin allocation. RP decomposes temporal fairness into two complementary sub-measures: Rotational Score (RS) and Waiting Periods Evaluation (WPE), achieving $O(\nu+n)$ time complexity versus the $O(\nu n)$ of ALT, where $\nu$ is the episode count and $n$ the agent count. Empirical evaluation across $n \in \{2,3,5,8,10\}$ reveals three findings. First, both RP and ALT expose a coordination failure invisible to traditional metrics: Q-learning agents perform worse than random policies by 10-73% on RP and 7-35% on CALT, while Reward Fairness remains misleadingly high (above 0.92 for $n \geq 3$). Second, RP achieves 12-25x computational speedup over ALT, growing with $n$. Third, the two families are complementary: ALT provides richer discrimination for small populations; RP scales reliably where ALT becomes intractable. Together they form a diagnostic toolkit for temporal fair division.

2605.14866 2026-05-15 cs.SE cs.AI

Towards In-Depth Root Cause Localization for Microservices with Multi-Agent Recursion-of-Thought

Lingzhe Zhang, Tong Jia, Kangjin Wang, Chiming Duan, Minghua He, Rongqian Wang, Xi Peng, Meiling Wang, Gong Zhang, Renhai Chen, Ying Li

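AI summary As microservice systems grow more complex through dynamic interactions and changing runtime environments, failures occur more often, making accurate root cause localization (RCL) essential to system reliability. Existing approaches based on traditional machine learning and deep learning fall short in interpretability and in transferability across deployments; methods based on large language models (LLMs) improve on this but still suffer from context explosion and serial reasoning structures, which hurt diagnostic efficiency and accuracy. This paper proposes RCLAgent, a microservice root cause localization framework based on multi-agent recursion-of-thought, which decomposes the diagnostic process with parallel reasoning and significantly improves localization accuracy and inference efficiency.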

English abstract

As modern microservice systems grow increasingly complex due to dynamic interactions and evolving runtime environments, they experience failures with rising frequency. Ensuring system reliability therefore critically depends on accurate root cause localization (RCL). While numerous traditional machine learning and deep learning approaches have been explored for this task, they often suffer from limited interpretability and poor transferability across deployments. More recently, large language model (LLM)-based methods have been proposed to address these issues. However, existing LLM-based approaches still face two fundamental limitations: context explosion, which dilutes critical evidence and degrades localization accuracy, and serial reasoning structures, which hinder deep causal exploration and impair inference efficiency. In this paper, we conduct a comprehensive study of both how human SREs perform root cause localization in practice and why existing LLM-based methods fall short. Motivated by these findings, we introduce RCLAgent, an in-depth root cause localization framework for microservice systems that realizes multi-agent recursion-of-thought with parallel reasoning. RCLAgent decomposes the diagnostic process along the trace graph by assigning each span to a Dedicated Agent and organizing agents recursively and in parallel according to the graph topology, with the final diagnosis obtained by synthesizing the Root-Level Diagnosis Report and the Global Evidence Graph. Extensive experiments on multiple public benchmarks demonstrate that RCLAgent consistently outperforms state-of-the-art methods in both localization accuracy and inference efficiency.

2605.14860 2026-05-15 math.OC cs.LG

A Non-Monotone Preconditioned Trust-Region Method for Neural Network Training

Andrea Angino, Bindi Çapriqi, Shega Likaj, Ken Trotti, Rolf Krause

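AI summary This paper proposes a non-monotone preconditioned trust-region method (NAPTS) for large-scale neural network training. Building on the Additively Preconditioned Trust-Region Strategy (APTS), it introduces a non-monotone acceptance criterion and a nonlinear additive Schwarz preconditioner that combines parallel subdomain corrections with global coarse-space directions, improving training efficiency. Experiments show that NAPTS preserves accuracy while reducing CPU time by 30% and cutting rejected steps to one third of those in APTS.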

Comments 7 pages, 2 figures

English abstract

Training deep neural networks at scale can benefit from domain decomposition, where the network is split into subdomains trained in parallel and coupled by a global trust-region mechanism. Building on the Additively Preconditioned Trust-Region Strategy (APTS), we propose a non-monotone variant with a nonlinear additive Schwarz preconditioner that combines parallel subdomain corrections with global coarse-space directions. A windowed acceptance criterion allows controlled objective increases, avoiding needless rejection of effective coarse steps. The resulting non-monotone APTS (NAPTS) preserves accuracy while reducing CPU time by 30% and cutting rejected steps to one third of those in APTS.
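
The abstract does not spell out the windowed acceptance criterion, so the Python sketch below uses the classic max-over-window form from the non-monotone trust-region literature as a stand-in. The function name `nonmonotone_accept`, the threshold `eta`, and the window length are assumptions for illustration, not the NAPTS rule itself.

```python
from collections import deque
from typing import Iterable


def nonmonotone_accept(
    history: Iterable[float],
    f_trial: float,
    predicted_reduction: float,
    eta: float = 1e-4,
) -> bool:
    """Decide whether to accept a trust-region trial step.

    The achieved reduction is measured against the largest objective value
    in the recent window rather than the latest one, so a step may raise the
    objective slightly and still be accepted. (Assumed form; eta is a
    hypothetical sufficient-decrease threshold.)
    """
    if predicted_reduction <= 0:
        return False  # the model promises no decrease, so reject the step
    f_reference = max(history)
    rho = (f_reference - f_trial) / predicted_reduction
    return rho >= eta


if __name__ == "__main__":
    # Objective values of the last three accepted iterates.
    window = deque([1.00, 0.80, 0.90], maxlen=3)
    # Window of three: the reference value is 1.00, so 0.95 is acceptable.
    print(nonmonotone_accept(window, f_trial=0.95, predicted_reduction=0.1))
    # Window of one recovers the monotone test, which rejects the same step.
    print(nonmonotone_accept([0.90], f_trial=0.95, predicted_reduction=0.1))
```

A window of length one reduces to the usual monotone ratio test, which makes the window length a single knob between strict descent and tolerated increases.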

2605.14851 2026-05-15 cs.MA cs.AI

IFPV: An Integrated Multi-Agent Framework for Generative Operational Planning and High-Fidelity Plan Verification

Zhigao Huang, Zhengqing Hu, Dong Chen, Shaohan Zhang, Zhao Jin, Bo Zhang, Han Wu, Mingliang Xu

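AI summary This paper proposes IFPV, an integrated multi-agent framework for generating operational plans and verifying them with high fidelity. The framework comprises two tightly coupled modules: Multi-Perspective Hierarchical Agents (MPHA), which generate sequences of operational actions, and the Adversarial Cognitive Simulation Engine (ACSE), which performs high-fidelity adversarial verification. Experiments show that IFPV outperforms a single-step LLM planning baseline in mission success rate and operational cost, and that the verification module is markedly better at exposing latent vulnerabilities in candidate plans.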

Comments Submitted to Neurocomputing

English abstract

Operational plan generation and verification are critical in modern, complex, and rapidly changing battlefield environments, yet traditional generation methods still struggle with plan infeasibility and traditional verification methods with insufficient rigor. To alleviate these limitations, we propose an Integrated Multi-Agent Framework for Generative Operational Planning and High-Fidelity Plan Verification (IFPV). IFPV consists of two tightly coupled modules: Multi-Perspective Hierarchical Agents (MPHA) for generative operational planning and an Adversarial Cognitive Simulation Engine (ACSE) for high-fidelity adversarial plan verification. MPHA decomposes commander intent into executable multi-platform tactical action sequences through the collaboration of Pathfinder, Analyst, and Planner agents. ACSE introduces an opponent equipped with a customized world model, which predicts the future evolution of mission-critical platforms and conducts dynamic counteractions against candidate plans. Simulation experiments in the Asymmetric Combat Tactic Simulator (ACTS) show that IFPV improves mission success by 19.4% and reduces operational cost by 41.7% compared with a single-step large language model (LLM) planning baseline. Compared with a traditional rule-based validator, ACSE increases the average suppression rate by 31.8%, indicating that the proposed verification environment is stricter and more discriminative in revealing the latent vulnerabilities of candidate plans. The code for IFPV can be found at https://github.com/zhigao3ks/IFPV.

2605.14828 2026-05-15 stat.ML cs.LG stat.ME

K-Models: a Flexible and Interpretable Method for Ordinal Clustering with Application to Antigen-Antibody Interaction Profiles

Giulia Patanè, Alessandra Menafoglio, Alexander Krauth, Peter Fechner, Luca Dede', Bianca Maria Colosimo, Federica Nicolussi

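AI summary This study proposes K-Models, a new clustering method for functional data with ordinal relationships among clusters, aiming to improve interpretability while maintaining clustering performance. By incorporating ordinal constraints, the method estimates key elements of the random process that generates the observed functional data, and so identifies the data's underlying structure more accurately. Simulations and a real-world application (reflectometric sensor data on antigen-antibody interactions) validate the method and demonstrate its usefulness for analyzing data with a latent ordinal structure.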

English abstract

Existing clustering methods for functional data often prioritize partitioning accuracy over interpretability, making it challenging to extract meaningful insights when the data-generating process follows a specific underlying structure and an ordinal relationship among clusters is suspected. This work introduces K-Models, a novel framework that integrates ordinal constraints and estimates key underlying elements of the random process generating the observed functional profiles, improving both interpretability and structure identification. The proposed method is evaluated through simulations and real-world applications. In particular, it is tested on Region of Interest (ROI) curves, which represent reaction profiles from a reflectometric sensor monitoring biomolecular interactions, such as antigen-antibody binding. These curves represent changes in reflected light intensity over time at multiple measurement spots with immobilized antigens during analyte exposure, capturing the binding dynamics of the system. The goal is to identify intrinsic signal patterns solely from the observed dynamics, making this dataset an ideal benchmark for assessing the added interpretability of the proposed approach. By incorporating structural assumptions into the clustering process, K-Models enhances interpretability while maintaining performance comparable to state-of-the-art techniques, providing a valuable tool for analyzing functional data with an underlying ordinal structure.

2605.14786 2026-05-15 cs.CR cs.AI cs.HC cs.LG

Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces

William Lugoloobi, Samuelle Marro, Jabez Magomere, Joss Wright, Chris Russell

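AI summary As agents based on large language models (LLMs) increasingly browse the web on users' behalf, a natural question is whether websites can passively identify the underlying model driving an agent. This study finds that capturing an agent's actions and interaction timings with a passive JavaScript tracker is enough to identify the model with up to 96% F1. It also shows that classifiers trained on agent behavior generalize across model sizes and families, and that effective classifiers can be trained from only a few interaction traces. Injecting random timing delays degrades classifier performance, but retraining on delayed traces largely recovers it.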

English abstract

As LLM-based agents increasingly browse the web on users' behalf, a natural question arises: can websites passively identify which underlying model powers an agent? Doing so would represent a significant security risk, enabling targeted attacks tailored to known model vulnerabilities. Across 14 frontier LLMs and four web environments spanning information retrieval and shopping tasks, we show that an agent's actions and interaction timings, captured via a passive JavaScript tracker, are sufficient to identify the underlying model with up to 96% F1. We formalise this attack surface by demonstrating that classifiers trained on agent actions generalise across model sizes and families. We further show that strong classifiers can be trained from few interaction traces and that agent identity can be inferred early within an episode. Injecting randomised timing delays between actions substantially degrades classifier performance, but does not provide robust protection: a classifier retrained on delayed traces largely recovers performance. We release our harness and a labelled corpus of agent traces at https://github.com/KabakaWilliam/known_actions.

2605.14750 2026-05-15 cs.CR cs.AI

EVA: Editing for Versatile Alignment against Jailbreaks

Yi Wang, Hongye Qiu, Yue Xu, Sibei Yang, Zhan Qin, Minlie Huang, Wenjie Wang

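AI summary Large Language Models (LLMs) and Vision Language Models (VLMs) perform impressively but remain vulnerable to jailbreak attacks, in which adversaries use textual or visual triggers to bypass safety guardrails. To address the high computational overhead and performance degradation of existing defenses, this paper proposes EVA, a framework that uses direct model editing to precisely correct the specific neurons responsible for jailbreak behavior, without large-scale retraining, neutralizing harmful behavior while preserving the model's existing capabilities. Experiments show that EVA outperforms existing methods across a range of models, offering a precise and efficient solution for post-deployment safety alignment.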

Comments IEEE TPAMI 2026

English abstract

Large Language Models (LLMs) and Vision Language Models (VLMs) have demonstrated impressive capabilities but remain vulnerable to jailbreaking attacks, where adversaries exploit textual or visual triggers to bypass safety guardrails. Recent defenses typically rely on safety fine-tuning or external filters to reduce the model's likelihood of producing harmful content. While effective to some extent, these methods often incur significant computational overhead and suffer from the safety-utility trade-off, degrading the model's performance on benign tasks. To address these challenges, we propose EVA (Editing for Versatile Alignment against Jailbreaks), a novel framework that pioneers the application of direct model editing for safety alignment. EVA reframes safety alignment as a precise knowledge correction task. Instead of retraining massive parameters, EVA identifies and surgically edits specific neurons responsible for the model's susceptibility to harmful instructions, while leaving the vast majority of the model unchanged. By localizing the updates, EVA effectively neutralizes harmful behaviors without compromising the model's general reasoning capabilities. Extensive experiments demonstrate that EVA outperforms baselines in mitigating jailbreaks across both LLMs and VLMs, offering a precise and efficient solution for post-deployment safety alignment.

2605.14741 2026-05-15 eess.SY cs.AI cs.SY

Addressing Terminal Constraints in Data-Driven Demand Response Scheduling

Maximilian Bloor, Martha White, Ehecatl Antonio del Rio Chanona, Calvin Tsay

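AI summary This paper studies how to satisfy terminal constraints in data-driven demand response scheduling. It combines Goal-Space Planning (GSP) with Deep Deterministic Policy Gradient (DDPG), learning temporally abstract models over discrete subgoals to propagate long-horizon value and improve scheduling. On a simulated air separation system, the method improves sample efficiency and satisfies terminal storage constraints, mitigating the weakness of conventional approaches in handling long-horizon constraints.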

Comments Accepted to IFAC World Congress 2026

English abstract

Electrified chemical processes are incentivized by exposure to time-varying electricity markets to operate flexibly, but participating in demand response schemes can require satisfying terminal constraints over long horizons. Specifically, terminal constraints may be required when computing optimal schedules in order to preserve dynamic stability. Model-based optimization methods are computationally costly, and data-driven scheduling via reinforcement learning (RL) faces severe credit-assignment challenges. We integrate Goal-Space Planning (GSP) with Deep Deterministic Policy Gradient (DDPG), using learned temporally abstract models over discrete subgoals to propagate value across extended horizons. Using a simulated air separation benchmark, we demonstrate that the proposed approach improves sample efficiency over standard DDPG while satisfying terminal storage constraints, mitigating myopic control behavior.

2605.14731 2026-05-15 cs.GR cs.CV cs.SD

UMo: Unified Sparse Motion Modeling for Real-Time Co-Speech Avatars

Xiaoyu Zhan, Xinyu Fu, Chenghao Yang, Xiaohong Zhang, Dongjie Fu, Pengcheng Fang, Tengjiao Sun, Xiaohao Cai, Hansung Kim, Yuanqi Li, Jie Guo, Yanwen Guo

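AI summary This paper proposes UMo, a unified sparse motion modeling approach for high-fidelity, real-time co-speech avatar animation. UMo processes text, audio, and motion within a unified formulation and combines a spatially sparse Mixture-of-Experts framework with a temporally sparse, keyframe-centric design to perform efficient real-time dense reconstruction, producing temporally coherent, high-fidelity animation. UMo also uses a multi-stage training strategy with targeted audio augmentation, which improves the precision of speech-motion alignment and semantic consistency, offering a practical solution for real-time co-speech animation.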

English abstract

Speech-driven gestures and facial animations are fundamental to expressive digital avatars in games, virtual production, and interactive media. However, existing methods are either limited to a single modality for audio-motion alignment, failing to fully utilize the potential of massive human motion data, or are constrained by the representation ability and throughput of multimodal models, which makes it difficult to achieve high-quality motion generation or real-time performance. We present UMo, a unified sparse motion modeling architecture for real-time co-speech avatars, which processes text, audio, and motion tokens within a unified formulation. Leveraging a spatially sparse Mixture-of-Experts framework and a temporally sparse, keyframe-centric design, UMo efficiently performs real-time dense reconstruction, enabling temporally coherent and high-fidelity animation generation for both facial expressions and gestures. Furthermore, we implement a multi-stage training strategy with targeted audio augmentation to enhance acoustic diversity and semantic consistency. Consequently, UMo preserves fine-grained speech-motion alignment even under strict latency constraints. Extensive quantitative and qualitative evaluations show that UMo achieves better output quality under low latency and real-time performance constraints, offering a practical solution for high-fidelity real-time co-speech avatars.

2605.14671 2026-05-15 cond-mat.mtrl-sci cs.AI

Agentic Design of Compositional Descriptors via Autoresearch for Materials Science Applications

Matteo Cobelli, Stefano Sanvito

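AI summary This paper presents Automat, an agentic system built on the autoresearch paradigm for designing compositional descriptors in materials science. The system uses a large language model as a coding agent to generate descriptors based only on chemical formulas and evaluates them with a random forest workflow, applying them to the prediction of band gaps in inorganic materials and Curie temperatures in ferromagnetic compounds. Automat outperforms the baseline descriptors, and the descriptors it generates are chemically interpretable, showing that task-specific materials descriptors can be designed without manual feature engineering, while also revealing current challenges such as descriptor redundancy and the choice of search strategy.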

English abstract

Autoresearch offers a flexible paradigm for automating scientific tasks, in which an AI agent proposes, implements, evaluates, and refines candidate solutions against a quantitative objective. Here, we use composition-based materials-property prediction to test whether such agents can perform a task beyond model selection and hyperparameter optimization: the design of input descriptors. We introduce Automat, an autoresearch framework where a coding agent based on a large language model generates composition-only descriptors for chemical compounds and evaluates them using a random forest workflow. The agent is restricted to information derivable from chemical formulas and iteratively proposes, implements, and tests chemically motivated descriptor strategies. We apply Automat, with OpenAI Codex using GPT-5.5 as the coding agent, to the prediction of experimental band gaps in inorganic materials and Curie temperatures in ferromagnetic compounds. In both tasks, Automat improves over fractional-composition, Magpie, and combined fractional-composition/Magpie baselines, while producing descriptor families that are chemically interpretable. These results provide a demonstration that autoresearch agents can generate competitive, task-specific materials descriptors without manual feature engineering during the run. They also reveal current limitations, including descriptor redundancy, sensitivity to greedy feature expansion, and the need for explicit complexity control, descriptor pruning, and more sophisticated search strategies.

2605.14662 2026-05-15 math.OC cs.LG

Scalable Solution of the Stochastic Multi-path Traveling Salesman Problem via Neural Networks

Xiaochen Chou, Ludovica Di Marco, Enza Messina

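AI summary This paper studies the multi-path Traveling Salesman Problem with stochastic travel costs, which arises in Smart City and City Logistics applications, and seeks a Hamiltonian tour that minimizes the expected total travel cost. The authors adopt a two-stage stochastic programming formulation and introduce neural network-based surrogate models to approximate the second-stage recourse problem, substantially reducing the computational burden. Preliminary experiments show good performance in computation time, solution quality, and generalization, providing a scalable approach to complex vehicle routing problems under uncertainty.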

English abstract

The multi-path Traveling Salesman Problem with stochastic travel costs arises in hybrid vehicle routing applications designed for Smart City and City Logistics, where multiple paths exist between each pair of locations. Travel times along these paths are typically affected by real-time traffic conditions and therefore modeled as stochastic. The objective of the problem is to determine a Hamiltonian tour that minimizes the expected total travel cost under uncertainty. In this work, we adopt a two-stage stochastic programming formulation. In the first stage, a predefined route specifying the sequence of locations to be visited is determined, while taking into consideration a second-stage recourse problem that selects the optimal path from the feasible set of alternative paths for each pair of locations, once real-time traffic conditions are realized. To reduce the computational burden imposed by the large number of scenarios required to capture travel time uncertainty, this work integrates neural network-based surrogate models that approximate the expected value of the second-stage recourse problem. Different architectures and training strategies for the neural networks are proposed and analyzed, with performance evaluated in terms of computation time, solution quality, and generalization capability. Preliminary findings demonstrate the enhanced scalability and practical applicability of the approach for complex vehicle routing problems under uncertainty.
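
The quantity that the surrogate models approximate can be illustrated with a small sample-average sketch in Python: for a fixed first-stage tour, each scenario's recourse cost takes the cheapest available path on every leg, and the expected cost is the mean over scenarios. The data layout (a dictionary from location pairs to a list of path costs) and the names `recourse_cost` and `expected_tour_cost` are assumptions for illustration; the paper's formulation and networks are not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

# One scenario: for each ordered pair of locations, the realized cost of
# every alternative path between them (hypothetical data layout).
Scenario = Dict[Tuple[int, int], List[float]]


def recourse_cost(tour: List[int], scenario: Scenario) -> float:
    """Second stage: once costs are realized, pick the cheapest path per leg."""
    legs = zip(tour, tour[1:] + tour[:1])  # return to the start to close the tour
    return sum(min(scenario[(i, j)]) for i, j in legs)


def expected_tour_cost(tour: List[int], scenarios: Sequence[Scenario]) -> float:
    """Sample-average estimate of the expected cost of a first-stage tour."""
    return sum(recourse_cost(tour, s) for s in scenarios) / len(scenarios)


if __name__ == "__main__":
    tour = [0, 1, 2]
    scenarios = [
        {(0, 1): [4.0, 6.0], (1, 2): [3.0, 5.0], (2, 0): [7.0, 2.0]},
        {(0, 1): [8.0, 5.0], (1, 2): [9.0, 4.0], (2, 0): [6.0, 6.0]},
    ]
    print(expected_tour_cost(tour, scenarios))  # (9 + 15) / 2 = 12.0
```

Evaluating this average exactly requires one pass per scenario for every candidate tour, which is the cost a learned surrogate would replace with a single prediction.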

2605.14629 2026-05-15 eess.IV cs.CV

Efficient Dense Matching for Enhanced Gaussian Splatting Using AV1 Motion Vectors

Julien Zouein, Vibhoothi Vibhoothi, François Pitié, Anil Kokaram

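AI summary This paper proposes an efficient dense matching method based on AV1 motion vectors to improve the quality of the initial point cloud for 3D Gaussian Splatting (3DGS). By using the motion vectors in the AV1 video codec, the method avoids the time-consuming exhaustive matching of conventional SfM pipelines, substantially reducing computational overhead and increasing point cloud density. Experiments show that it produces point clouds with up to eight times as many points as classical SfM, improving 3DGS reconstruction accuracy and training efficiency.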

English abstract

3D Gaussian Splatting (3DGS) has emerged as a prominent framework for real-time, photorealistic scene reconstruction, offering significant speed-ups over Neural Radiance Fields (NeRF). However, the fidelity of 3DGS representations remains heavily dependent on the quality of the initial point cloud. While standard Structure-from-Motion (SfM) pipelines using COLMAP provide adequate initialisation, they often suffer from high computational costs and sparsity in textureless regions, which degrades subsequent reconstruction accuracy and convergence speed. In this work, we introduce an AV1-based feature detection and matching pipeline that significantly reduces SfM processing overhead. By leveraging motion vectors inherent to the AV1 video codec, we bypass computationally expensive exhaustive matching while maintaining geometric robustness. Our pipeline produces substantially denser point clouds, with up to eight times as many points as classical SfM. We demonstrate that this enhanced initialisation directly improves 3DGS performance, yielding a 9-point increase in VMAF and a 63% average reduction in training time required to reach baseline quality. The project page: https://sigmedia.tv/AV1-3DGS.github.io/

2605.14612 2026-05-15 cs.SE cs.AI

In-IDE Toolkit for Developers of AI-Based Features

Yaroslav Sokolov, Yury Khudyakov, Lenar Sharipov, Andrei Gasparian, Parth Tiwary, Artem Trofimov

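AI summary This paper presents the AI Toolkit plugin, integrated into JetBrains IDEs, which helps software engineers without a machine learning background test, debug, and evaluate AI features built on large language models and agentic workflows. By bringing tracing and evaluation into the Run/Debug loop, the tool addresses developers' core needs for repeatable evaluation, traces at the moment of execution, and minimal setup. Early adoption data suggest that the tool lowers the barrier to entry and helps developers adopt disciplined AI development practices.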

Comments Published at IDE'26 co-located with ICSE'26

English abstract

AI-enabled features built on LLMs and agentic workflows are difficult to test, debug, and reproduce, especially for product-focused software engineers without a machine learning background. We present the AI Toolkit plugin for JetBrains IDEs, which brings tracing and evaluation directly into the Run/Debug loop. A mixed-methods study with practitioners reveals three consistent needs: (1) make evaluation regular and repeatable, (2) expose traces at the moment of execution, and (3) minimize setup and context switching. Guided by these needs, the AI Toolkit introduces an IDE-native workflow: run-triggered trace capture; immediate, hierarchical inspection; one-click "Add to Dataset" from traces; and unit-test-like evaluations with pluggable metrics. The first release in PyCharm shows promising early signals (strong conversion when promoted at Run, sustained usage among those who capture traces, and low churn), suggesting that IDE-native observability lowers activation energy and helps developers adopt disciplined practices. We detail the design and implementation of the AI Agents Debugger and AI Evaluation, report initial adoption telemetry, and outline next steps to broaden framework coverage and scale evaluations. Together, these results indicate that integrating AI observability and evaluation into everyday IDE workflows can make modern AI development accessible to non-ML specialists while preserving software-engineering practices.