arXivDaily arXiv Daily Academic Digest, updated Monday to Friday
2604.03146 2026-06-19 stat.ML cs.LG Updated version

Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization

Chiheb Yaakoubi, Cosme Louart, Malik Tiomoko, Zhenyu Liao

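Affiliations School of Data Science, The Chinese University of Hong Kong, Shenzhen, China; Huawei Noah's Ark Lab, Huawei Technologies, Paris, France; School of Electronic Information and Communications, Huazhong University of Science & Technology, China

AI summary By extending the Convex Gaussian Min-Max Theorem to non-Gaussian data, the paper characterizes the asymptotic distribution of high-dimensional empirical risk minimization estimators and reveals the scope and limits of Gaussian universality.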

Comments 28 pages, 5 figures, 1 table

Journal ref ICML 2026

Abstract

We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, enabling approximation of the mean $\mu_{\hat{\theta}}$ and covariance $C_{\hat{\theta}}$ of the ERM estimator $\hat{\theta}$. Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate $x$ independent of the training data, the projection $\hat{\theta}^\top x$ approximately follows the convolution of the generally non-Gaussian distribution of $\mu_{\hat{\theta}}^\top x$ with an independent centered Gaussian variable of variance $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$. This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any $\mathcal{C}^2$ regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at $\mu_{\hat{\theta}}$. Numerical simulations across diverse losses and models are provided to validate our theoretical predictions and qualitative insights.
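
The convolution statement in the abstract can be restated as a sum of two independent terms. The display below is only a rewriting of that sentence; the symbols $g$ and $\Sigma_x$ are notation introduced here for illustration and do not come from the listing.

```latex
\hat{\theta}^\top x \;\overset{d}{\approx}\; \mu_{\hat{\theta}}^\top x
  \;+\; \sqrt{\mathrm{tr}\!\left(C_{\hat{\theta}}\, \Sigma_x\right)}\; g,
\qquad g \sim \mathcal{N}(0, 1) \ \text{independent of}\ \mu_{\hat{\theta}}^\top x,
\qquad \Sigma_x = \mathbb{E}[x x^\top].
```

Read this way, the second term is always Gaussian, so any departure of the projection from a Gaussian law has to enter through the first term $\mu_{\hat{\theta}}^\top x$.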

2604.00527 2026-06-19 math.MG cs.RO math.DG Updated version

Bistable Quad-Nets Composed of Four-Bar Linkages

Gudrun Szewieczek, Daniel Huczala, Martin Pfurner, Hans-Peter Schröcker

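Affiliations University of Innsbruck, Department of Basic Sciences in Engineering Sciences; Seoul National University, Robotics Laboratory

AI summary Studies bistable mechanical structures composed of spatial four-bar linkages, interprets them via the Study quadric, and constructs them from flexible quad nets using Whiteley de-averaging, which gives control over geometric parameters without numerical optimization.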

Abstract

We study a novel type of mechanical structures, composed of spatial four-bar linkages, that are bistable, that is, they allow for two distinct configurations. These structures have an interpretation as quad nets in the Study quadric which we use to prove existence of assemblies with an unbounded number of links and joints. We propose a purely geometric construction of such objects, starting from infinitesimally flexible quad nets in Euclidean space and applying Whiteley de-averaging. This point of view situates the problem within the broader framework of discrete differential geometry and enables the construction of bistable structures from well-known classes of quad nets, such as discrete minimal surfaces. In contrast to many other construction methods for bistable structures, our approach does not rely on numerical optimization and it allows for simple control of relevant geometric parameters such as axis positions and snap angles.

2603.19423 2026-06-19 cs.CR cs.AI cs.LG Updated version

The Autonomy Tax: Defense Training Breaks LLM Agents

Shawn Li, Yue Zhao

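Affiliations University of Southern California

AI summary Shows that defense training, while intended to improve the safety of LLM agents, systematically breaks their tool-execution ability, sharply raising task failure rates while still failing to stop sophisticated attacks.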

Abstract

Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete complex multi-step tasks. Practitioners deploy defense-trained models to protect against prompt injection attacks that manipulate agent behavior through malicious observations or retrieved content. We reveal a fundamental capability-alignment paradox: defense training designed to improve safety systematically destroys agent competence while failing to prevent sophisticated attacks. Evaluating defended models against undefended baselines across 97 agent tasks and 1,000 adversarial prompts, we uncover three systematic biases unique to multi-step agents. Agent incompetence bias manifests as immediate tool execution breakdown, with models refusing or generating invalid actions on benign tasks before observing any external content. Cascade amplification bias causes early failures to propagate through retry loops, pushing defended models to timeout on 99% of tasks compared to 13% for baselines. Trigger bias leads to paradoxical security degradation where defended models perform worse than undefended baselines while straightforward attacks bypass defenses at high rates. Root cause analysis reveals these biases stem from shortcut learning: models overfit to surface attack patterns rather than semantic threat understanding, evidenced by extreme variance in defense effectiveness across attack categories. Our findings demonstrate that current defense paradigms optimize for single-turn refusal benchmarks while rendering multi-step agents fundamentally unreliable, necessitating new approaches that preserve tool execution competence under adversarial conditions.

2603.16941 2026-06-19 eess.AS cs.CL cs.SD Updated version

The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs

Shree Harsha Bokkahalli Satish, Christoph Minixhofer, Maria Teleki, James Caverlee, Ondřej Klejch, Peter Bell, Gustav Eje Henter, Éva Székely

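Affiliations 1 Department of Speech, Music and Hearing, KTH Royal Institute of Technology, Sweden; 2 Centre for Speech Technology Research, University of Edinburgh, UK; 3 Texas A&M University, USA

AI summary Using 2,880 controlled interactions, the study evaluates intersectional accent and gender bias in three SpeechLLMs across six English accents and two gender presentations, finding that Eastern European-accented speech (especially female-presenting voices) receives lower helpfulness scores and that human evaluators are more sensitive than LLM judges.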

Comments 5 pages, 3 figures, 1 table, Accepted to Interspeech 2026

Abstract

Speech Large Language Models (SpeechLLMs) process spoken input directly, retaining cues such as accent and perceived gender that were previously removed in cascaded pipelines. This introduces speaker-identity-dependent variation in responses. We present a large-scale intersectional evaluation of accent and gender bias in three SpeechLLMs using 2,880 controlled interactions across six English accents and two gender presentations, keeping linguistic content constant through voice cloning. Using pointwise LLM-judge ratings, pairwise comparisons, and Best-Worst Scaling with human validation, we detect recurring directional disparities. Eastern European-accented speech receives lower helpfulness scores, particularly for female-presenting voices. Responses remain polite but differ in helpfulness. While LLM judges capture the directional trend of these biases, human evaluators exhibit significantly higher sensitivity, showing stronger accent-level contrasts.
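
Best-Worst Scaling, mentioned in the abstract, is often scored by simple counting: an item's score is the number of times it was picked best, minus the number of times it was picked worst, divided by the number of times it was shown. The sketch below illustrates that common counting rule only. Whether the paper scores its annotations this way is an assumption, and the trial data and the names `bws_scores` and `trials` are hypothetical.

```python
from collections import defaultdict

def bws_scores(trials):
    """Count-based Best-Worst Scaling score per item.

    trials: iterable of (items, best, worst), where `items` is the tuple
    shown to the annotator and `best`/`worst` are the picks from it.
    Score = (#times best - #times worst) / #times shown, in [-1, 1].
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    shown = defaultdict(int)
    for items, b, w in trials:
        for item in items:
            shown[item] += 1
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Hypothetical annotations over four responses labeled A-D (not real data).
trials = [
    (("A", "B", "C", "D"), "A", "D"),
    (("A", "B", "C", "D"), "B", "D"),
    (("A", "B", "C", "D"), "A", "C"),
]
scores = bws_scores(trials)
```

When every trial shows the same item set, the scores sum to zero, so they express a relative ranking and not an absolute quality level.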

2603.10184 2026-06-19 stat.ML cs.LG Updated version

Stabilizing Bandits using Regularization: Precise Regret and A Quantitative Central Limit Theorem

Budhaditya Halder, Ishan Sengupta, Koustav Chowdhury, Samya Praharaj, Koulik Khamaru

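Affiliations Department of Statistics, Rutgers University; Indian Statistical Institute, Kolkata

AI summary Proposes a refined stability condition, shows that regularized stochastic mirror descent algorithms satisfy it, and derives a non-asymptotic Berry-Esseen bound for empirical reward estimates under adaptive sampling, matching upper and lower regret bounds, and asymptotic normality under adversarial corruption, while showing that regularization is a necessary price for valid inference.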

Comments Updated rate of convergence and precise regret in version 2

Abstract

Statistical inference with bandit data presents fundamental challenges owing to adaptive sampling, which violates the independence assumptions underlying classical asymptotic theory. Recent work has identified stability (Lai and Wei, 1982) as a sufficient condition for valid inference under adaptivity. This paper first provides a refined stability condition, stated in terms of the iterates of an online algorithm, and shows that a large class of regularized stochastic-mirror-descent-style algorithms satisfy it. This refined condition allows us to strengthen the asymptotic results of Lai and Wei (1982) in several ways. First, we derive a non-asymptotic Berry-Esseen bound for the empirical reward estimates under adaptive sampling. Second, we derive matching non-asymptotic upper and lower bounds on the regret of the proposed algorithm, yielding a precise characterization of its regret. Third, we show that these regularized algorithms preserve asymptotic normality and valid inference under a prescribed level of adversarial corruption. Finally, we show that regularization is necessary rather than incidental: Lai-Wei stability is incompatible with the optimal $O(\sqrt{T})$ regret rate, the rate attained by unregularized algorithms such as EXP3, so that a controlled, polylogarithmic inflation in regret is the price of valid inference.

2601.22300 2026-06-19 physics.optics cond-mat.dis-nn cs.ET cs.LG Updated version

Toward all-optical unsupervised Hebbian learning in deep photonic neuromorphic networks

Xi Li, Disha Biswas, Peng Zhou, Wesley H. Brigner, Anna Capuano, Joseph S. Friedman, Qing Gu

Affiliations Department of Electrical and Computer Engineering, North Carolina State University; Department of Electrical and Computer Engineering, The University of Texas at Dallas; Department of Materials Science and Engineering, North Carolina State University; Department of Physics, North Carolina State University

AI summary Proposes a deep photonic neuromorphic network architecture based on phase-change material synapses and local optical feedback for online unsupervised Hebbian learning, with experiments demonstrating adaptive synaptic evolution and optical inference.

Comments 16 pages, 4 figures

Abstract

We propose a deep photonic neuromorphic network (PNN) architecture based on phase-change material (PCM) synapses and local optical feedback for online, unsupervised Hebbian learning. The proposed architecture combines optical vector-matrix multiplication, non-volatile PCM synaptic weighting, and local coincidence-driven synaptic adaptation within a multilayer photonic crossbar framework compatible with photonic integrated circuits. Unlike conventional PNNs that rely on externally computed gradients, repeated optical-electrical-optical conversions, or global backpropagation, the proposed framework employs local Hebbian learning governed directly by correlated pre- and post-synaptic optical activity. To investigate the feasibility of the proposed learning mechanism, we implemented the PNN design using fiber-optic components, programmable variable optical attenuators, and real-time software control that incorporates PCM thermal dynamics. Supervised and unsupervised learning behaviors were experimentally evaluated under both offline and online learning conditions using representative image-recognition tasks. The experimental results demonstrate adaptive synaptic evolution, successful optical inference, and autonomous pattern encoding through local Hebbian learning under realistic fiber-optic hardware conditions. These results establish a pathway toward future integrated photonic neuromorphic systems capable of scalable and energy-efficient online Hebbian learning.

2602.15707 2026-06-19 cs.MM cs.CL cs.LG Updated version

Proactive Conversational Assistant for a Procedural Manual Task based on Audio and IMU

Rehana Mahfuz, Yinyi Guo, Erik Visser, Phanidhar Chinchili

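Affiliations Qualcomm Technologies, Inc.

AI summary Proposes the first real-time conversational assistant that uses only audio and IMU modalities; finetuning the language model reduces unnecessary dialogue and improves question-answering accuracy, and the assistant runs on an edge device with no cloud dependence.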

Comments 5 figures. 5 more in appendix

Abstract

Real-time conversational assistants for procedural manual tasks often depend on video input, which can be computationally expensive and compromise user privacy. For the first time, we propose a real-time conversational assistant that provides comprehensive guidance for procedural manual tasks using only lightweight privacy-preserving modalities such as audio and IMU inputs from a user's wearable device to understand the context. Using a furniture assembly task and a cooking task, we show how this assistant proactively communicates step-by-step instructions to a user performing a procedural task, and answers user questions. We illustrate the data generation method and the system design to achieve such an assistant. On observing that an off-the-shelf language model is a talkative assistant but is not always able to answer questions correctly, we demonstrate how finetuning the model improves its ability to limit unnecessary dialogues with a 50% increase in the precision, while also improving its ability to answer questions correctly, measured by a 150% increase in the recall of answers. We further describe how such an assistant is implemented on an edge device with no dependence on the cloud.

2509.24894 2026-06-19 math.OC cs.LG Updated version

Improved Stochastic Optimization of LogSumExp

Egor Gladin, Alexey Kroshnin, Jia-Jie Zhu, Pavel Dvurechensky

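Affiliations HSE University; Department of Mathematics, KTH Royal Institute of Technology

AI summary To address the difficulty of optimizing LogSumExp with a large number of exponential terms, proposes a convexity- and smoothness-preserving approximation based on the Safe KL divergence that outperforms existing baselines in distributionally robust optimization and continuous optimal transport.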

Comments 21 pages, 6 figures, 5 tables; added convergence statement and additional experiments

Abstract

The LogSumExp function, dual to the Kullback-Leibler (KL) divergence, plays a central role in many important optimization problems, including entropy-regularized optimal transport (OT) and distributionally robust optimization (DRO). In practice, when the number of exponential terms inside the logarithm is large or infinite, optimization becomes challenging since computing the gradient requires differentiating every term. We propose a novel convexity- and smoothness-preserving approximation to LogSumExp that can be efficiently optimized using stochastic gradient methods. This approximation is rooted in a sound modification of the KL divergence in the dual, resulting in a new $f$-divergence called the Safe KL divergence. Our experiments and theoretical analysis of the LogSumExp-based stochastic optimization, arising in DRO and continuous OT, demonstrate the advantages of our approach over existing baselines.
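
The difficulty described in the abstract shows up already in the plain function. The gradient of LogSumExp is the vector of softmax weights, and each weight is normalized by a sum over all terms. The sketch below implements only the standard function and its gradient with the usual max-shift for numerical stability. It is not the paper's Safe KL approximation, and the names `logsumexp`, `logsumexp_grad`, and the sample vector `z` are illustrative assumptions.

```python
import math

def logsumexp(z):
    """Numerically stable log(sum(exp(z_i))) using the max-shift trick."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))

def logsumexp_grad(z):
    """Gradient of logsumexp: softmax weights, normalized over every term."""
    m = max(z)
    w = [math.exp(v - m) for v in z]
    total = sum(w)  # the normalizer touches all terms
    return [v / total for v in w]

# Illustrative inputs: a naive math.exp(1000.0) would overflow.
z = [1000.0, 1001.0, 999.0]
value = logsumexp(z)
grad = logsumexp_grad(z)
```

Because the normalizer `total` depends on every entry of `z`, evaluating the normalizer on a random subset of terms does not in general give an unbiased gradient estimate, which is the obstacle the abstract's approximation is designed to address.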

2601.16233 2026-06-19 cs.SI cs.AI Updated version

Policy-Embedded Graph Expansion: Networked HIV Testing with Diffusion-Driven Network Samples

Akseli Kangaslahti, Davin Choo, Lingkai Kong, Milind Tambe, Alastair van Heerden, Cheryl Johnson

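Affiliations Harvard University; University of Witwatersrand; Wits Health Consortium; World Health Organization

AI summary Proposes the Policy-Embedded Graph Expansion (PEGE) framework, which embeds a generative distribution over graph expansions directly into the decision policy, combined with the diffusion-based graph expansion model DDB; on real HIV transmission networks it improves discounted reward by 17.3% and HIV detections by 15.4% over baselines.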

Abstract

HIV is a retrovirus that attacks the human immune system and can lead to death without proper treatment. In collaboration with the WHO and the University of Witwatersrand, we study how to improve the efficiency of HIV testing with the goal of eventual deployment, directly supporting progress toward UN Sustainable Development Goal 3.3. While prior work has demonstrated the promise of intelligent algorithms for sequential, network-based HIV testing, existing approaches rely on assumptions that are impractical in our real-world implementations. Here, we study sequential testing on incrementally revealed disease networks and introduce Policy-Embedded Graph Expansion (PEGE), a novel framework that directly embeds a generative distribution over graph expansions into the decision-making policy rather than attempting explicit topological reconstruction. We further propose Dynamics-Driven Branching (DDB), a diffusion-based graph expansion model that supports decision making in PEGE and is designed for data-limited settings where forest structures arise naturally, as in our real-world referral process. Experiments on real HIV transmission networks show that the combined approach (PEGE + DDB) consistently outperforms baselines (e.g., 17.3% improvement in discounted reward and 15.4% more HIV detections with 25% of the population tested) and explore key tradeoffs that drive solution quality.

2601.14430 2026-06-19 stat.ML cs.LG Updated version

Meta Flow Maps enable scalable reward alignment

Peter Potaptchik, Adhi Saravanan, Abbas Mammadov, Alvaro Prat, Michael S. Albergo, Yee Whye Teh

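Affiliations University of Oxford; Harvard University; Kempner Institute

AI summary Proposes the Meta Flow Maps (MFMs) framework, which enables efficient value function estimation through differentiable one-step posterior sampling, allowing inference-time steering and off-policy fine-tuning without trajectory simulation at substantially lower compute cost.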

Abstract

Controlling generative models is computationally expensive. This is because optimal alignment with a reward function, whether via inference-time steering or fine-tuning, requires estimating the value function. This task demands access to the conditional posterior $p_{1|t}(x_1|x_t)$, the distribution of clean data $x_1$ consistent with an intermediate state $x_t$, a requirement that typically compels methods to resort to costly trajectory simulations. To address this bottleneck, we introduce Meta Flow Maps (MFMs), a framework extending consistency models and flow maps into the stochastic regime. MFMs are trained to perform stochastic one-step posterior sampling, generating arbitrarily many i.i.d. draws of clean data $x_1$ from any intermediate state. Crucially, these samples provide a differentiable reparametrization that unlocks efficient value function estimation. We leverage this capability to solve bottlenecks in both paradigms: enabling inference-time steering without inner rollouts, and facilitating unbiased, off-policy fine-tuning to general rewards. Empirically, our single-particle steered-MFM sampler outperforms a Best-of-1000 baseline on ImageNet across multiple rewards at a fraction of the compute.

2511.08378 2026-06-19 cs.IR cs.AI Updated version

Bid Farewell to Seesaw: Towards Accurate Long-tail Session-based Recommendation via Dual Constraints of Hybrid Intents

Xiao Wang, Ke Qin, Dongyang Zhang, Xiurui Xie, Shuang Liang

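Affiliations University of Electronic Science and Technology of China

AI summary To address the see-saw conflict between accuracy and diversity caused by long-tail distributions in session-based recommendation, proposes HID, a hybrid intent-based dual constraint framework that reconstructs the item-to-intent mapping with attribute-aware spectral clustering, discriminates noise intents, and combines diversity and accuracy constraint losses to improve both long-tail performance and accuracy.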

Comments accepted by AAAI 2026 Oral

Abstract

Session-based recommendation (SBR) aims to predict anonymous users' next interaction based on their interaction sessions. In the practical recommendation scenario, low-exposure items constitute the majority of interactions, creating a long-tail distribution that severely compromises recommendation diversity. Existing approaches attempt to address this issue by promoting tail items but incur accuracy degradation, exhibiting a "see-saw" effect between long-tail and accuracy performance. We attribute such conflict to session-irrelevant noise within the tail items, which existing long-tail approaches fail to identify and constrain effectively. To resolve this fundamental conflict, we propose HID (Hybrid Intent-based Dual Constraint Framework), a plug-and-play framework that transforms the conventional "see-saw" into "win-win" through introducing the hybrid intent-based dual constraints for both long-tail and accuracy. Two key innovations are incorporated in this framework: (i) Hybrid Intent Learning, where we reformulate the intent extraction strategies by employing attribute-aware spectral clustering to reconstruct the item-to-intent mapping. Furthermore, discrimination of session-irrelevant noise is achieved through the assignment of the target and noise intents to each session. (ii) Intent Constraint Loss, which incorporates two novel constraint paradigms regarding diversity and accuracy to regulate the representation learning process of both items and sessions. These two objectives are unified into a single training loss through rigorous theoretical derivation. Extensive experiments across multiple SBR models and datasets demonstrate that HID can enhance both long-tail performance and recommendation accuracy, establishing new state-of-the-art performance in long-tail recommender systems.

2601.03112 2026-06-19 eess.IV cs.CV Updated version

DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations

Kailin Tan, Jincheng Dai, Sixian Wang, Guo Lu, Shuo Shao, Kai Niu, Wenjun Zhang, Ping Zhang

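Affiliations Beijing University of Posts and Telecommunications; Shanghai Jiao Tong University; University of Shanghai for Science and Technology

AI summary Proposes the DiT-JSCC framework, which jointly learns a semantics-prioritized representation encoder and a diffusion transformer generative decoder; coarse-to-fine conditional decoding and an adaptive bandwidth allocation strategy inspired by Kolmogorov complexity improve semantic consistency and transmission efficiency under extreme channel conditions.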

Comments 14 pages, 14 figures, 2 tables

Abstract

Generative joint source-channel coding (GJSCC) has emerged as a new Deep JSCC paradigm for achieving high-fidelity and robust image transmission under extreme wireless channel conditions, such as ultra-low bandwidth and low signal-to-noise ratio. Recent studies commonly adopt diffusion models as generative decoders, but they frequently produce visually realistic results with limited semantic consistency. This limitation stems from a fundamental mismatch between reconstruction-oriented JSCC encoders and generative decoders, as the former lack explicit semantic discriminability and fail to provide reliable conditional cues. In this paper, we propose DiT-JSCC, a novel GJSCC backbone that can jointly learn a semantics-prioritized representation encoder and a diffusion transformer (DiT) based generative decoder. Our open-source project aims to promote future research in GJSCC. Specifically, we design a semantics-detail dual-branch encoder that aligns naturally with a coarse-to-fine conditional DiT decoder, prioritizing semantic consistency under extreme channel conditions. Moreover, a training-free adaptive bandwidth allocation strategy inspired by Kolmogorov complexity is introduced to further improve the transmission efficiency, thereby redefining the notion of information value in the era of generative decoding. Extensive experiments demonstrate that DiT-JSCC consistently outperforms existing JSCC methods in both semantic consistency and visual quality, particularly in extreme regimes.

2601.02322 2026-06-19 stat.ME cs.LG Updated version

Environment-Adaptive Covariate Selection: Learning When to Use Spurious Correlations for Out-of-Distribution Prediction

Shuozhi Zuo, Yixin Wang

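Affiliations Department of Statistics, University of Michigan, Ann Arbor

AI summary For covariate selection in out-of-distribution prediction, proposes an environment-adaptive algorithm that selects covariate sets dynamically from environment-level features and outperforms static methods on simulated and real data.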

Abstract

A common approach to out-of-distribution prediction restricts models to causal or invariant covariates to avoid spurious associations that may change across environments. Despite its theoretical appeal, this strategy can underperform empirical risk minimization when only a subset of the causal parents of the outcome is observed. In such settings, non-causal covariates can serve as proxies for unobserved causal parents and improve prediction when the proxy relationship is stable, but they can hurt when shifts disrupt that relationship. Thus, the optimal covariate set can depend on the specific shift encountered. Because different shifts leave signatures in the unlabeled covariate distribution, we propose an environment-adaptive covariate selection algorithm that maps environment-level summaries to environment-specific covariate sets. These summaries may be hand-crafted or learned from multi-environment data, and prior causal knowledge can be incorporated as constraints. Across simulations and applied datasets, the proposed method improves over static causal, invariant, and other non-adaptive rules under diverse shifts.

2601.00014 2026-06-19 eess.SP cs.AI cs.LG Updated version

Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI

Eran Zvuloni, Ronit Almog, Michael Glikson, Shany Brimer Biton, Ilan Green, Izhar Laufer, Offer Amir, Joachim A. Behar

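Affiliations Leumit Health Services

AI summary Proposes DeepHHF, a deep learning model that uses 24-hour single-lead ECG data to predict heart failure risk within five years; it reaches an AUC of 0.80, outperforming short segments and a clinical score, and explainability analysis shows the model focuses on arrhythmias and heart abnormalities.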

Abstract

Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intelligence (AI) applied to 24-hour single-lead electrocardiogram (ECG) data could predict the risk of HF within five years. To test this, we used the Technion-Leumit Holter ECG (TLHE) dataset, which includes 69,663 recordings from 47,729 patients collected over 20 years. Our deep learning model, DeepHHF, trained on 24-hour ECG recordings, achieved an area under the receiver operating characteristic curve of 0.80, outperforming a model using 30-second segments and a clinical score. High-risk individuals identified by DeepHHF had twice the chance of a hospitalization or death event. Explainability analysis showed DeepHHF focused on arrhythmias and heart abnormalities. This study highlights the feasibility of deep learning to model 24-hour continuous ECG data, capturing paroxysmal events essential for reliable risk prediction. Artificial intelligence applied to single-lead Holter ECG is non-invasive, inexpensive, and widely accessible, making it a promising tool for HF risk prediction.
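
The headline metric in the abstract, the area under the ROC curve, has a direct probabilistic reading: it is the chance that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by pairwise comparison. The labels and scores are made up for illustration and have no connection to the TLHE dataset or DeepHHF outputs, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes; scores: predicted risk per case.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 2 positive cases, 3 negative cases.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 5 of the 6 pairs are ordered correctly
```

The pairwise form costs O(n_pos × n_neg) comparisons, which is fine for a toy example; a rank-based computation gives the same value more efficiently on large cohorts.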

2512.17473 2026-06-19 eess.SP cs.LG math.OC stat.ML Updated version

Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions

Atharva Awari, Nicolas Gillis, Arnaud Vandaele

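Affiliations University of Mons

AI summary Proposes an algorithm based on the alternating direction method of multipliers (ADMM) for solving nonlinear matrix decompositions (NMD), supporting several nonlinear functions and loss functions, with applicability and efficiency demonstrated on real-world datasets.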

Comments 16 pages, 7 figures. v3: Revised version: added new experiments and comparisons. Code available from https://gitlab.com/Atharva05/admm-for-nmd

Abstract

We present an algorithm based on the alternating direction method of multipliers (ADMM) for solving nonlinear matrix decompositions (NMD). Given an input matrix $X \in \mathbb{R}^{m \times n}$ and a factorization rank $r \ll \min(m, n)$, NMD seeks matrices $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ such that $X \approx f(WH)$, where $f$ is an element-wise nonlinear function. We evaluate our method on several representative nonlinear models: the rectified linear unit activation $f(x) = \max(0, x)$, suitable for nonnegative sparse data approximation, the component-wise square $f(x) = x^2$, applicable to probabilistic circuit representation, and the MinMax transform $f(x) = \min(b, \max(a, x))$, relevant for recommender systems. The proposed framework flexibly supports diverse loss functions, including least squares, $\ell_1$ norm, and the Kullback-Leibler divergence, and can be readily extended to other nonlinearities and metrics. We illustrate the applicability, efficiency, and adaptability of the approach on real-world datasets, highlighting its potential for a broad range of applications.
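
To make the model $X \approx f(WH)$ concrete, the sketch below evaluates the least-squares NMD objective for the three nonlinearities named in the abstract. It only evaluates the objective and does not implement the ADMM solver. The helper names (`matmul`, `nmd_error`, `make_minmax`) and the tiny matrices are assumptions chosen for illustration.

```python
def matmul(W, H):
    """Plain-Python product of an m x r matrix W and an r x n matrix H."""
    return [
        [sum(W[i][k] * H[k][j] for k in range(len(H))) for j in range(len(H[0]))]
        for i in range(len(W))
    ]

def relu(x):
    return max(0.0, x)

def square(x):
    return x * x

def make_minmax(a, b):
    """MinMax transform f(x) = min(b, max(a, x)) for fixed bounds a <= b."""
    return lambda x: min(b, max(a, x))

def nmd_error(X, W, H, f):
    """Least-squares NMD objective ||X - f(WH)||_F^2, f applied element-wise."""
    P = matmul(W, H)
    return sum(
        (X[i][j] - f(P[i][j])) ** 2
        for i in range(len(X))
        for j in range(len(X[0]))
    )

# Illustrative example: X has rank 2, yet equals ReLU of a rank-1 product.
X = [[2.0, 0.0], [0.0, 3.0]]
W = [[1.0], [-1.0]]
H = [[2.0, -3.0]]
err_relu = nmd_error(X, W, H, relu)            # exact fit with r = 1
err_linear = nmd_error(X, W, H, lambda v: v)   # same factors, no nonlinearity
```

In this toy case the rank-1 factors reproduce the sparse nonnegative matrix exactly once ReLU is applied, while the same factors without the nonlinearity leave a squared error of 13, which is the kind of gain from the nonlinearity that motivates the ReLU model for sparse data.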

2503.02636 2026-06-19 q-bio.NC cs.AI 版本更新

A Deep Generative Model for Resting-State EEG Synthesis and Transferable Representation Learning

一种用于静息态脑电合成与可迁移表示学习的深度生成模型

Yeganeh Farahzadi, Morteza Ansarinia, Zoltan Kekecs

Affiliations Institute of Psychology, Eötvös Loránd University; Doctoral School of Psychology, Eötvös Loránd University; Department of Behavioural and Cognitive Sciences, University of Luxembourg

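AI summary Proposes REST-GAN, a framework that combines adversarial training with self-supervised reconstruction to synthesize resting-state EEG from raw time-domain signals and learn transferable representations, with strong results on spectral, connectivity, and classification evaluations.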

AI中文摘要

静息态脑电提供了一种非侵入性的自发脑活动观测方式,但提取有意义的模式常受限于高质量数据稀缺和对人工设计特征的依赖。生成对抗网络(GAN)能够合成神经信号并从原始数据中学习可迁移表示,这一双重能力在脑电研究中尚未被充分探索。本文提出REST-GAN,一个基于GAN的静息态脑电框架,将对抗训练与辅助自监督重构目标相结合,以支持信号合成和无监督特征提取。尽管仅使用原始时域信号训练,未引入显式的频域或传感器拓扑监督,生成的时间序列再现了真实脑电的关键时间、频谱和连接特性。在频带功率特征空间中,生成的样本在睁眼和闭眼条件下均表现出高精确率和召回率(EO: 0.91/0.67; EC: 0.87/0.65),而组平均频谱相干矩阵与真实数据在各频段上的平均绝对差异较低(约0.01-0.03)。模型判别器学习到的表示可迁移至独立的静息态人口统计学分类任务,其性能优于直接在原始脑电上训练的模型,并与近期脑电基础模型表现相当,同时所需训练数据和计算资源大幅减少。这些发现突显了一种计算高效的架构驱动策略,其中生成模型不仅作为脑电信号生成器,还作为无监督特征提取器。该方法有望支持更数据高效的脑电分析,同时减少对人工特征工程的依赖。REST-GAN的实现代码见:https://github.com/Yeganehfrh/REST-GAN。

英文摘要

Resting-state EEG provides a non-invasive view of spontaneous brain activity, but extracting meaningful patterns is often limited by scarce high-quality data and reliance on manually engineered features. Generative adversarial networks (GANs) can synthesize neural signals and learn transferable representations directly from raw data, a dual capability that remains underexplored in EEG research. Here, we introduce REST-GAN, a GAN-based framework for resting-state EEG that combines adversarial training with an auxiliary self-supervised reconstruction objective to support signal synthesis and unsupervised feature extraction. Although trained only on raw time-domain signals, without explicit frequency-domain or sensor-topographic supervision, the generated time series reproduced key temporal, spectral, and connectivity properties of real EEG. In band-power feature space, generated samples showed high precision and recall across eyes-open and eyes-closed conditions (EO: 0.91/0.67; EC: 0.87/0.65), while group-average spectral coherence matrices showed low mean absolute differences from real data across frequency bands (~0.01-0.03). The representations learned by the model's critic transferred to independent resting-state demographic classification tasks, outperforming models trained directly on raw EEG and showing competitive performance relative to a recent EEG foundation model, while requiring substantially less training data and computational resources. These findings highlight a computationally efficient, architecture-driven strategy in which generative models serve not only as EEG signal generators, but also as unsupervised feature extractors. This approach may support more data-efficient EEG analysis while reducing reliance on manual feature engineering. The implementation code for REST-GAN is available at: https://github.com/Yeganehfrh/REST-GAN.

2509.15822 2026-06-19 stat.ML cs.LG math.PR math.ST stat.TH 版本更新

Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities

具有多于 $\sqrt{n}$ 个社区的随机块模型的相变

Alexandra Carpentier, Christophe Giraud, Nicolas Verzelen

Affiliations Institut für Mathematik, Universität Potsdam, Potsdam, Germany; Laboratoire de Mathématiques d’Orsay, Université Paris-Saclay, CNRS, France; INRAE, Institut Agro, MISTEA, Univ. Montpellier, France

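AI summary Shows that in the stochastic block model with $K\geq \sqrt{n}$ communities, low-degree polynomials fail to recover the communities below the threshold postulated by Chin et al., while counting specific motifs achieves recovery in polynomial time above it, supporting the conjectured new phase-transition threshold.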

AI中文摘要

统计物理的预测表明,在随机块模型(SBM)中,当社区数 $K$ 固定时,社区恢复在 Kesten-Stigum (KS) 阈值以上(且仅在其以上)可以在多项式时间内实现。这一猜想催生了丰富的文献,证明在 KS 阈值以上的 SBM 中,非平凡社区恢复确实是可能的。只要 $K\ll \sqrt{n}$(其中 $n$ 是观测图中的节点数),KS 阈值以下低度多项式(LDP)的失败也被证明。当 $K\geq \sqrt{n}$ 时,Chin 等人(2025)最近证明,在稀疏机制中,通过计数非回溯路径,可以在 KS 阈值以下的多项式时间内实现社区恢复。这一突破使他们提出了多社区机制 $K\geq \sqrt{n}$ 的新阈值。在这项工作中,我们为他们的猜想提供了证据:(1)我们证明,对于任意图密度,LDP 无法在 Chin 等人(2025)提出的阈值以下恢复社区;(2)我们证明,在所提出的阈值以上,不仅是在 Chin 等人(2025)考虑的稀疏机制中,而且在适度稀疏机制中,通过计数受 LDP 分析启发的某些特定子图,可以在多项式时间内实现社区恢复。特别地,计数长度为 $\log(n)$ 的自避路径(这与基于非回溯算子的谱算法密切相关)仅在稀疏机制中是最优的。在更密集的机制中,必须考虑基于循环放大的更复杂子图。

英文摘要

Predictions from statistical physics postulate that recovery of the communities in the Stochastic Block Model (SBM) with a fixed number $K$ of communities is possible in polynomial time above, and only above, the Kesten-Stigum (KS) threshold. This conjecture has given rise to a rich literature, proving that non-trivial community recovery is indeed possible in SBM above the KS threshold. Failure of low-degree polynomials (LDP) below the KS threshold was also proven, as long as $K\ll \sqrt{n}$, where $n$ is the number of nodes in the observed graph. When $K\geq \sqrt{n}$, Chin et al. (2025) recently proved that, in a sparse regime, community recovery in polynomial time is possible below the KS threshold by counting non-backtracking paths. This breakthrough led them to postulate a new threshold for the many-communities regime $K\geq \sqrt{n}$. In this work, we provide evidence supporting their conjecture: (1) we prove that, for any graph density, LDP fail to recover communities below the threshold postulated by Chin et al. (2025); (2) we prove that community recovery is possible in polynomial time above the postulated threshold, not only in the sparse regime considered in Chin et al. (2025), but also in moderately sparse regimes, by counting occurrences of some specific motifs inspired by the LDP analysis. In particular, counting self-avoiding paths of length $\log(n)$, which is closely related to spectral algorithms based on the Non-Backtracking operator, is optimal only in the sparse regime. More complex motifs based on the blow-up of a cycle must be considered in denser regimes.
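To illustrate the kind of statistic the abstract refers to, the sketch below counts self-avoiding paths of a given length between two nodes by brute-force depth-first search. It is an assumption-laden toy: the function name and adjacency-dict format are hypothetical, the search is exponential in the path length, and it says nothing about how the paper turns such counts into a community estimate.

```python
from typing import Dict, FrozenSet, List


def count_self_avoiding_paths(adj: Dict[int, List[int]], start: int, end: int, length: int) -> int:
    """Count paths with exactly `length` edges from start to end that visit no node twice."""

    def extend(node: int, remaining: int, visited: FrozenSet[int]) -> int:
        if remaining == 0:
            # A path is counted only if it stops exactly at the target node.
            return 1 if node == end else 0
        total = 0
        for nxt in adj[node]:
            if nxt not in visited:  # self-avoiding: never revisit a node
                total += extend(nxt, remaining - 1, visited | {nxt})
        return total

    return extend(start, length, frozenset([start]))
```

On a 4-cycle, for example, two opposite nodes are joined by two self-avoiding paths of length 2, one around each side of the cycle.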

2509.23806 2026-06-19 cs.SE cs.LG 版本更新

Influence-Guided Concolic Testing of Transformer Robustness

影响力引导的Transformer鲁棒性具体化测试

Chih-Duo Hong, Chih-Cheng Yang, Yu Wang, Fang Yu

Affiliations Department of Management Information Systems

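AI summary Proposes a concolic testing method that ranks path predicates by SHAP influence, implements multi-head attention semantics in pure Python, and makes the softmax boundary explicit. On CIFAR-10 it reaches a 60% success rate across 500 matched model-input pairs, versus 15% for a black-box differential-evolution baseline, and predicate prioritization reduces median attack time by 51%.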

Comments Accepted at the 26th International Conference on Software Quality, Reliability, and Security

AI中文摘要

神经网络的具体化测试交替进行具体执行和约束求解,以搜索翻转模型决策的输入。我们提出一种针对Transformer分类器的具体化测试器,使用SHAP估计对待定路径谓词按其当前预测的影响进行排序。为了支持SMT求解驱动的执行中多头自注意力机制,我们用纯Python实现注意力语义,使其与求解器兼容,并通过具体化指数参数使softmax边界显式化。我们在CIFAR-10上对三个紧凑Transformer分类器、ResNet18和VGG16在单像素预算和900秒时限下评估了该方法。在匹配比较的500个模型-输入对中,我们的方法实现了60%的成功率,而将模型视为黑盒的差分进化基线仅为15%。在主要的两层Transformer分支排序研究中,基于SHAP的谓词优先级排序将成功率从56%提升至60%,并将中位攻击时间降低51%。这些结果表明,影响力引导的路径探索可以使具体化测试成为在Transformer模型中寻找对抗样本的实用方法。

英文摘要

Concolic testing for neural networks alternates concrete execution with constraint solving to search for inputs that flip model decisions. We present a concolic tester for Transformer classifiers that uses SHAP estimates to rank pending path predicates by their impact on the current prediction. To support self-attention with multiple heads in execution backed by SMT solving, we implement attention semantics in pure Python that are compatible with the solver and make the softmax boundary explicit by concretizing exponentiation arguments. We evaluate our method on CIFAR-10 across three compact Transformer classifiers, ResNet18, and VGG16 under a one-pixel budget and a 900s horizon. Across the 500 model--input pairs in this matched comparison, our method achieves 60% success, compared with 15% for a differential evolution baseline that treats the model as a black box. In the primary two-layer Transformer branch-ordering study, SHAP-based predicate prioritization raises success from 56% to 60% and reduces median attack time by 51%. These results show that influence-guided path exploration can make concolic testing a practical way to find adversarial examples in Transformer models.
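The idea of ranking pending path predicates by influence can be sketched with a priority queue. Everything here is hypothetical: the predicate names, the precomputed influence scores (standing in for SHAP estimates), and the choice to rank by absolute value are assumptions for illustration, not the tool's actual interface.

```python
import heapq
from typing import Dict, List


def order_predicates(influence: Dict[str, float]) -> List[str]:
    """Return predicate names sorted by decreasing absolute influence score."""
    # heapq is a min-heap, so negate the magnitude to pop the largest first.
    heap = [(-abs(score), name) for name, score in influence.items()]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _, name = heapq.heappop(heap)
        ordered.append(name)
    return ordered
```

A concolic loop would then negate and solve the predicates in this order, so that branches with the largest estimated effect on the prediction are explored first within the time budget.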

2508.13313 2026-06-19 stat.ML cs.LG math.OC 版本更新

Flow Matching for Efficient and Scalable Data Assimilation

用于高效可扩展数据同化的流匹配

Taos Transue, Bohan Chen, So Takao, Bao Wang

Affiliations The Computing and Mathematical Sciences Department, California Institute of Technology; Department of Mathematics and Scientific Computing and Imaging Institute, University of Utah

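AI summary Proposes the ensemble flow filter (EnFF), a training-free framework based on flow matching that uses Monte Carlo estimation and localized guidance to accelerate high-dimensional nonlinear data assimilation, with improved cost-accuracy tradeoffs and scalability.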

Comments revamp presentation, add experiments

AI中文摘要

数据同化(DA)从含噪声观测中估计动态系统的状态。最近的生成模型如集成得分滤波器(EnSF)改进了高维非线性设置下的DA,但计算成本高。我们引入集成流滤波器(EnFF),一种基于流匹配(FM)的无训练框架,加速采样并提供流设计灵活性。EnFF使用边际流场的蒙特卡洛估计器、用于观测同化的局部化引导,以及一种利用贝叶斯DA公式的新型流路径。它推广了经典滤波器如自举粒子滤波器和集成卡尔曼滤波器。在高维基准上的实验证明了EnFF改进的成本-精度权衡和可扩展性,突显了FM在高效、可扩展DA中的潜力。代码见 https://github.com/Utah-Math-Data-Science/Data-Assimilation-Flow-Matching。

英文摘要

Data assimilation (DA) estimates a dynamical system's state from noisy observations. Recent generative models like the ensemble score filter (EnSF) improve DA in high-dimensional nonlinear settings but are computationally expensive. We introduce the ensemble flow filter (EnFF), a training-free, flow matching (FM)-based framework that accelerates sampling and offers flexibility in flow design. EnFF uses Monte Carlo estimators for the marginal flow field, localized guidance for observation assimilation, and utilizes a novel flow path that exploits the Bayesian DA formulation. It generalizes classical filters such as the bootstrap particle filter and ensemble Kalman filter. Experiments on high-dimensional benchmarks demonstrate EnFF's improved cost-accuracy tradeoffs and scalability, highlighting FM's potential for efficient, scalable DA. Code is available at https://github.com/Utah-Math-Data-Science/Data-Assimilation-Flow-Matching.
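Flow matching generates samples by integrating an ordinary differential equation along a velocity field. The sketch below shows only that generic step, using forward Euler and the textbook straight-line path $x_t = (1-t)x_0 + t x_1$, whose velocity is the constant $x_1 - x_0$. It does not reproduce EnFF's Monte Carlo flow-field estimator, its localized guidance, or its flow path, none of which the abstract specifies; the function names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def euler_integrate(velocity: Callable[[Vector, float], Vector], x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with forward Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def straight_line_velocity(x0: Vector, x1: Vector) -> Callable[[Vector, float], Vector]:
    """Velocity of the path x_t = (1 - t) * x0 + t * x1, which is constant in x and t."""
    diff = [b - a for a, b in zip(x0, x1)]
    return lambda x, t: diff
```

Because the straight-line velocity is constant, Euler integration carries the source point onto the target for any step count; fewer integration steps is one way such samplers save cost.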

2509.02581 2026-06-19 cs.DL cs.AI 版本更新

Charting the Future of Scholarly Knowledge with AI: A Community Perspective

用AI绘制学术知识的未来:社区视角

Azanzi Jiomekong, Hande Küçük McGinty, Keith G. Mills, Allard Oelen, Enayat Rajabi, Harry McElroy, Antrea Christou, Anmol Saini, Janice Anta Zebaze, Hannah Kim, Anna M. Jacyszyn, Gollam Rabby, Dirk Betz, Claudia Biniossek, Sanju Tiwari, Sören Auer

Affiliations TIB Leibniz Information Centre for Science and Technology; Department of Computer Science, University of Yaounde 1; Department of Computer Science, Kansas State University; School of EECS, Louisiana State University; Management Science Department, Cape Breton University; Department of Development and Research, Performigence; Department of Engineering and Computer Science, Wright State University; Department of Physics, University of Yaounde 1; FIZ Karlsruhe, Leibniz Institute for Information Infrastructure; Sharda University, Delhi-NCR, India; L3S Research Center, Leibniz University of Han

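AI summary From a community perspective, identifies ways to foster cross-disciplinary dialogue, identify shared challenges, categorize new collaborations, and shape future research directions for scholarly knowledge organization.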

Comments 39 pages, 3 figures

AI中文摘要

尽管支持学术知识提取和组织的工具日益普及,许多研究人员仍依赖手动方法,有时是因为对现有技术不熟悉或缺乏领域适应性解决方案。同时,跨学科学术出版物的快速增长使得跟上最新进展越来越困难,进一步凸显了对可扩展的、基于AI的方法来结构化和综合学术知识的需求。各个研究社区已开始独立应对这一挑战,开发旨在构建可靠、动态且可查询的学术知识库的工具和框架。然而,这些社区之间的有限互动阻碍了方法、模型和最佳实践的交流,减缓了向更集成解决方案的进展。本文确定了促进跨学科对话、识别共同挑战、分类新合作并塑造学术知识组织未来研究方向的方法。

英文摘要

Despite the growing availability of tools designed to support scholarly knowledge extraction and organization, many researchers still rely on manual methods, sometimes due to unfamiliarity with existing technologies or limited access to domain-adapted solutions. Meanwhile, the rapid increase in scholarly publications across disciplines has made it increasingly difficult to stay current, further underscoring the need for scalable, AI-enabled approaches to structuring and synthesizing scholarly knowledge. Various research communities have begun addressing this challenge independently, developing tools and frameworks aimed at building reliable, dynamic, and queryable scholarly knowledge bases. However, limited interaction across these communities has hindered the exchange of methods, models, and best practices, slowing progress toward more integrated solutions. This manuscript identifies ways to foster cross-disciplinary dialogue, identify shared challenges, categorize new collaborations, and shape future research directions in scholarly knowledge and organization.

2507.19712 2026-06-19 cs.DC cs.AI cs.GT cs.LG cs.NI 版本更新

Oranits: Mission Assignment and Task Offloading in Open RAN-based ITS using Metaheuristic and Deep Reinforcement Learning

Oranits: 基于Open RAN的智能交通系统中的任务分配与卸载——元启发式与深度强化学习方法

Ngoc Hung Nguyen, Nguyen Van Thieu, Quang-Trung Luu, Anh Tuan Nguyen, Senura Wanasekara, Nguyen Cong Luong, Fatemeh Kavehmadavani, Van-Dinh Nguyen

Affiliations Department of Smart City, Hanyang University

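AI summary Proposes the Oranits system model, which accounts for mission dependencies and offloading costs under vehicle cooperation and optimizes them with the metaheuristic CGG-ARO and the deep reinforcement learning framework MA-DDQN. CGG-ARO improves completed missions and overall benefit by about 7.1% and 7.7%, and MA-DDQN by 11.0% and 12.5%.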

Comments 16 pages, 13 figures

Journal ref IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2026

AI中文摘要

本文研究了基于开放无线接入网(Open RAN)的智能交通系统(ITS)中的任务分配与卸载问题,其中自动驾驶车辆利用移动边缘计算进行高效处理。现有研究常忽视任务之间的复杂依赖关系以及将任务卸载到边缘服务器的成本,导致决策次优。为弥补这一不足,我们引入了Oranits,一种新颖的系统模型,明确考虑了任务依赖性和卸载成本,同时通过车辆协作优化性能。为此,我们提出了一种双重优化方法。首先,我们开发了一种基于元启发式的进化计算算法,即混沌高斯全局ARO(CGG-ARO),作为单时隙优化的基线。其次,我们设计了一种增强的基于奖励的深度强化学习(DRL)框架,称为多智能体双深度Q网络(MA-DDQN),该框架集成了多智能体协调和多动作选择机制,显著减少了任务分配时间并提高了对基线方法的适应性。大量仿真表明,CGG-ARO将完成任务数量和总体收益分别提高了约7.1%和7.7%。同时,MA-DDQN在任务完成率和总体收益方面分别实现了11.0%和12.5%的更大提升。这些结果凸显了Oranits在动态ITS环境中实现更快、更自适应、更高效任务处理的有效性。

英文摘要

In this paper, we explore mission assignment and task offloading in an Open Radio Access Network (Open RAN)-based intelligent transportation system (ITS), where autonomous vehicles leverage mobile edge computing for efficient processing. Existing studies often overlook the intricate interdependencies between missions and the costs associated with offloading tasks to edge servers, leading to suboptimal decision-making. To bridge this gap, we introduce Oranits, a novel system model that explicitly accounts for mission dependencies and offloading costs while optimizing performance through vehicle cooperation. To achieve this, we propose a twofold optimization approach. First, we develop a metaheuristic-based evolutionary computing algorithm, namely the Chaotic Gaussian-based Global ARO (CGG-ARO), serving as a baseline for one-slot optimization. Second, we design an enhanced reward-based deep reinforcement learning (DRL) framework, referred to as the Multi-agent Double Deep Q-Network (MA-DDQN), that integrates both multi-agent coordination and multi-action selection mechanisms, significantly reducing mission assignment time and improving adaptability over baseline methods. Extensive simulations reveal that CGG-ARO improves the number of completed missions and overall benefit by approximately 7.1% and 7.7%, respectively. Meanwhile, MA-DDQN achieves even greater improvements of 11.0% in terms of mission completions and 12.5% in terms of the overall benefit. These results highlight the effectiveness of Oranits in enabling faster, more adaptive, and more efficient task processing in dynamic ITS environments.
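For readers unfamiliar with the Double Deep Q-Network that MA-DDQN builds on, the following sketch computes the standard Double DQN bootstrap target for a single transition: the online network picks the next action and the target network evaluates it. It is a single-agent textbook form; the multi-agent coordination and multi-action selection of MA-DDQN are not described in the abstract and are not modelled here. The function and argument names are hypothetical.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    done: bool,
) -> float:
    """Return r + gamma * Q_target(s', argmax_a Q_online(s', a)), or r if the episode ended."""
    if done:
        return reward
    # Action selection uses the online network's Q-values for the next state...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...while the value of that action is read from the target network.
    return reward + gamma * q_target_next[best_action]
```

Decoupling selection from evaluation in this way is what distinguishes Double DQN from plain DQN, which takes the maximum of the target network's own values.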

2503.01425 2026-06-19 cs.GR cs.CV 版本更新

MeshPad: Interactive Sketch-Conditioned Artist-Reminiscent Mesh Generation and Editing

MeshPad: 交互式草图条件艺术家风格网格生成与编辑

Haoxuan Li, Ziya Erkoc, Lei Li, Daniele Sirigatti, Vladislav Rosov, Angela Dai, Matthias Nießner

Affiliations Technical University of Munich; AUDI AG

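AI summary Proposes MeshPad, an interactive sketch-conditioned method for 3D mesh generation and editing that decomposes edits into deletion and addition of mesh regions and uses a Transformer with a vertex-aligned speculative prediction strategy for fast iterative editing. It improves mesh quality by more than 22% in Chamfer distance and is preferred by 90% of participants.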

Comments Project page: https://derkleineli.github.io/meshpad/ Video: https://www.youtube.com/watch?v=_T6UTGTMZ1E

AI中文摘要

我们介绍了MeshPad,一种从草图输入生成3D网格的生成方法。基于最近在艺术家风格三角形网格生成方面的进展,我们的方法解决了交互式网格创建的需求。为此,我们专注于通过将编辑分解为网格区域的“删除”和随后新网格几何的“添加”来实现一致编辑。这两个操作都由用户对草图图像的简单编辑触发,促进了迭代内容创建过程,并能够构建复杂的3D网格。我们的方法基于三角形序列网格表示,利用大型Transformer模型进行网格三角形的添加和删除。为了交互式地执行编辑,我们在加法网格生成器之上引入了一种顶点对齐的推测预测策略。该推测器预测对应于一个顶点的多个输出标记,从而显著降低推理的计算成本并加速编辑过程,使得每个编辑步骤只需几秒钟即可完成。综合实验表明,MeshPad优于最先进的草图条件网格生成方法,在Chamfer距离上实现了超过22%的网格质量改进,并且在感知评估中被90%的参与者所偏好。

英文摘要

We introduce MeshPad, a generative approach that creates 3D meshes from sketch inputs. Building on recent advances in artist-reminiscent triangle mesh generation, our approach addresses the need for interactive mesh creation. To this end, we focus on enabling consistent edits by decomposing editing into 'deletion' of regions of a mesh, followed by 'addition' of new mesh geometry. Both operations are invoked by simple user edits of a sketch image, facilitating an iterative content creation process and enabling the construction of complex 3D meshes. Our approach is based on a triangle sequence-based mesh representation, exploiting a large Transformer model for mesh triangle addition and deletion. In order to perform edits interactively, we introduce a vertex-aligned speculative prediction strategy on top of our additive mesh generator. This speculator predicts multiple output tokens corresponding to a vertex, thus significantly reducing the computational cost of inference and accelerating the editing process, making it possible to execute each editing step in only a few seconds. Comprehensive experiments demonstrate that MeshPad outperforms state-of-the-art sketch-conditioned mesh generation methods, achieving more than 22% mesh quality improvement in Chamfer distance, and being preferred by 90% of participants in perceptual evaluations.
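The Chamfer distance used as the quality metric above compares two point sets by nearest-neighbour distances in both directions. The sketch below uses one common convention (mean squared distance in each direction, summed); the abstract does not say which variant or how many sampled points the evaluation uses, so treat the exact form as an assumption.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both directions."""

    def sq_dist(p: Point, q: Point) -> float:
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_sided(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # For each source point, take the squared distance to its closest target point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(a, b) + one_sided(b, a)
```

In practice the point sets would be sampled from the surfaces of the generated and reference meshes; this brute-force version is quadratic in the number of points and is meant only to show the definition.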

2508.05762 2026-06-19 cond-mat.mtrl-sci cs.LG 版本更新

Evaluating Universal Machine Learning Force Fields Against Experimental Measurements

评估通用机器学习力场与实验测量的对比

Sajid Mannan, Vaibhav Bihani, Carmelo Gonzales, Kin Long Kelvin Lee, Nitya Nand Gosvami, Sayan Ranu, Santiago Miret, N M Anoop Krishnan

Affiliations Department of Civil Engineering, Indian Institute of Technology Delhi; Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi; Intel Labs, California, USA; Department of Materials Science and Engineering, Indian Institute of Technology Delhi; Department of Computer Science and Engineering, Indian Institute of Technology Delhi

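AI summary Introduces the UniFFBench framework and the MinX dataset to systematically evaluate six universal machine learning force fields, finding that models that perform well on computational benchmarks show a substantial "reality gap" under experimental complexity, with density prediction errors above the threshold required for practical applications.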

AI中文摘要

通用机器学习力场(UMLFFs)有望通过实现跨元素周期表的快速原子模拟来革新材料科学。然而,它们的评估一直局限于可能无法反映实际性能的计算基准。我们引入了UniFFBench,一个全面的评估框架,包含MinX数据集——一个涵盖85种元素、极端热力学条件(0–5000 K, 0–1000 GPa)和结构复杂性(包括部分占据和无序)的1500多种矿物系统的多样化集合。这种多样性,结合用于验证的实验参考值,使得能够评估UMLFF在化学空间和条件上的泛化能力,这些条件远超典型的训练场景。我们对六种最先进的UMLFF的系统评估揭示了一个显著的“现实差距”:在计算基准上表现令人印象深刻的模型在面对实验复杂性时常常失败。即使是最好的模型也表现出高于实际应用所需阈值的密度预测误差。我们观察到模拟稳定性和力学性能准确性之间的脱节,预测误差与训练数据表示相关,而非建模方法。

英文摘要

Universal machine learning force fields (UMLFFs) promise to revolutionize materials science by enabling rapid atomistic simulations across the periodic table. However, their evaluation has been limited to computational benchmarks that may not reflect real-world performance. We introduce UniFFBench, a comprehensive evaluation framework featuring the MinX dataset -- a diverse collection of 1,500+ mineral systems spanning 85 elements, extreme thermodynamic conditions (0--5000 K, 0--1000 GPa), and structural complexity, including partial occupancy and disorder. This diversity, combined with experimental reference values for validation, enables assessment of UMLFF generalization across chemical space and conditions substantially beyond typical training scenarios. Our systematic evaluation of six state-of-the-art UMLFFs reveals a substantial ``reality gap'': models achieving impressive performance on computational benchmarks often fail when confronted with experimental complexity. Even the best-performing models exhibit higher density prediction error than the threshold required for practical applications. We observe disconnects between simulation stability and mechanical property accuracy, with prediction errors correlating with training data representation rather than the modeling method.

2507.19653 2026-06-19 cs.NI cs.AI cs.LG 版本更新

On the Limitations of Ray-Tracing for Learning-Based RF Tasks in Urban Environments

关于射线追踪在城市环境中基于学习的射频任务局限性的研究

Armen Manukyan, Hrant Khachatrian, Edvard Ghukasyan, Theofanis P. Raptis

Affiliations Yerevan State University, Yerevan, Armenia; YerevaNN, Yerevan, Armenia; Institute of Informatics and Telematics, National Research Council, Pisa, Italy

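AI summary Evaluates the Sionna ray-tracing simulator against real measurements from central Rome and finds that antenna locations and orientations strongly affect fidelity while solver hyper-parameters have little effect. After optimization, the Spearman correlation improves by 5% to 130% and localization error drops by one-third, but residual urban noise remains a challenge.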

Comments This work was supported by funding under the bilateral agreement between CNR (Italy) and HESC MESCS RA (Armenia) as part of the DeepRF project for the 2025-2026 biennium, and by the HESC MESCS RA grant No. 22rl-052 (DISTAL)

Journal ref 2026 IEEE Wireless Communications and Networking Conference (WCNC)

AI中文摘要

我们研究了Sionna v1.0.2射线追踪在罗马市中心户外蜂窝链路中的真实感。我们使用了包含1,664个用户设备(UE)和六个名义基站(BS)站点的真实测量数据集。利用这些固定位置,我们系统地改变了主要仿真参数,包括路径深度、漫反射/镜面反射/折射标志、载波频率,以及天线的属性如高度、辐射方向和方向图。通过测量功率与仿真功率之间的Spearman相关性,以及基于RSSI指纹的k近邻定位算法,对每个基站的仿真保真度进行评分。在所有实验中,求解器超参数对所选指标的影响微不足道。相反,天线位置和方向被证明是决定性的。通过简单的贪婪优化,我们将不同基站的Spearman相关性提高了5%到130%,而仅使用仿真数据作为参考点的kNN定位误差在真实世界样本上减少了三分之一,但仍比纯真实数据的误差高一倍。因此,精确的几何形状和可信的天线模型是必要但不充分的;忠实地捕捉残余的城市噪声仍然是实现可迁移、高保真户外射频仿真的一个开放挑战。

英文摘要

We study the realism of Sionna v1.0.2 ray-tracing for outdoor cellular links in central Rome. We use a real measurement set of 1,664 user equipments (UEs) and six nominal base-station (BS) sites. Using these fixed positions, we systematically vary the main simulation parameters, including path depth, diffuse/specular/refraction flags, and carrier frequency, as well as antenna properties such as altitude, radiation pattern, and orientation. Simulator fidelity is scored for each base station via the Spearman correlation between measured and simulated powers, and by a k-nearest-neighbor (kNN) localization algorithm using RSSI-based fingerprints. Across all experiments, solver hyper-parameters have an immaterial effect on the chosen metrics. In contrast, antenna locations and orientations prove decisive. With simple greedy optimization, we improve the Spearman correlation by 5% to 130% for various base stations, while the kNN-based localization error using only simulated data as reference points decreases by one-third on real-world samples, yet remains twice as high as the error with purely real data. Precise geometry and credible antenna models are therefore necessary but not sufficient; faithfully capturing the residual urban noise remains an open challenge for transferable, high-fidelity outdoor RF simulation.
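The per-base-station fidelity score mentioned above is a Spearman correlation, which is the Pearson correlation of the ranks of two samples. A minimal standard-library sketch follows; the helper names are hypothetical, tied values receive average ranks, and the measured and simulated power lists in any real use would come from the dataset and the simulator respectively.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks matter, the score rewards a simulator that orders locations correctly by received power even when its absolute power levels are offset.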

2507.19137 2026-06-19 eess.AS cs.AI cs.SD 版本更新

Assessment of Personality Dimensions Across Situations in Dyadic Role-Play Scenarios

二元角色扮演场景中跨情境的人格维度评估

Alice Zhang, Skanda Muralidhar, Daniel Gatica-Perez, Mathew Magimai-Doss

Affiliations Idiap Research Institute; The University of Texas at Austin

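AI summary Analyzes conversational speech and finds that perceived personality varies significantly across work situations, identifying acoustic features associated with each personality trait.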

Comments Accepted to IEEE Transactions on Affective Computing

AI中文摘要

先前研究表明,用户偏好与其人格相匹配的辅助技术。这引发了对自动人格感知(APP)的兴趣,旨在预测个体感知到的人格特质。以往的APP研究将人格视为静态特质,独立于情境。然而,心理学研究表明,感知人格会随情境和场景而变化。在本研究中,我们调查了参与两种工作情境(中性面试和压力客户互动)的参与者对话语音与感知人格之间的关系。我们的主要发现是:1)感知人格在不同互动中显著不同;2)响度、声压级和频谱通量特征在中性互动中指示感知的外向性、宜人性、尽责性和开放性,而在压力情境中,神经质与这些特征相关;3)手工声学特征和非语言特征在感知人格推断中优于说话人嵌入;4)压力互动更能预测神经质,这与现有心理学研究一致。

英文摘要

Prior research indicates that users prefer assistive technologies whose personalities align with their own. This has sparked interest in automatic personality perception (APP), which aims to predict an individual's perceived personality traits. Previous studies in APP have treated personalities as static traits, independent of context. However, perceived personalities can vary by context and situation as shown in psychological research. In this study, we investigate the relationship between conversational speech and perceived personality for participants engaged in two work situations (a neutral interview and a stressful client interaction). Our key findings are: 1) perceived personalities differ significantly across interactions, 2) loudness, sound level, and spectral flux features are indicative of perceived extraversion, agreeableness, conscientiousness, and openness in neutral interactions, while neuroticism correlates with these features in stressful contexts, 3) handcrafted acoustic features and non-verbal features outperform speaker embeddings in inference of perceived personality, and 4) stressful interactions are more predictive of neuroticism, aligning with existing psychological research.

2503.23179 2026-06-19 eess.IV cs.CV 版本更新

OncoReg: Medical Image Registration for Oncological Challenges

OncoReg:面向肿瘤学挑战的医学图像配准

Wiebke Heyer, Yannic Elser, Lennart Berkel, Xinrui Song, Xuanang Xu, Pingkun Yan, Xi Jia, Jinming Duan, Zi Li, Tony C. W. Mok, BoWen LI, Tim Hable, Christian Staackmann, Christoph Großbröhmer, Lasse Hansen, Alessa Hering, Malte M. Sieren, Mattias P. Heinrich

Affiliations Institute of Medical Informatics, University of Lübeck; Institute of Radiology and Nuclear Medicine, University Hospital Schleswig-Holstein; Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute; School of Computer Science, University of Birmingham; Division of Informatics, Imaging and Data Sciences, University of Manchester; DAMO Academy, Alibaba Group; Hangzhou Shengshi Technology Co., Ltd; Department of Radiation Oncology, University Hospital Schleswig-Holstein; EchoScout GmbH; Radboud University Medical Center, Nijmegen; Institute of Interventional Radiology, University Hospital Schleswig-Holstein

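AI summary Presents the OncoReg Challenge, whose two-phase framework supports the development of generalisable image registration methods while protecting patient privacy, applied to registering cone-beam CT with fan-beam CT in radiotherapy. Feature extraction proves pivotal, and combining deep learning with classical methods is most effective.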

Comments 21 pages, 13 figures

AI中文摘要

在现代癌症研究中,由于患者隐私相关的挑战,产生的大量医学数据往往未被充分利用。OncoReg挑战通过一个两阶段框架解决了这一问题,该框架使研究人员能够在确保患者隐私的同时开发和验证图像配准方法,并促进更可泛化的AI模型的发展。第一阶段涉及使用公开可用的数据集,第二阶段则专注于在安全的医院网络内对私有数据集进行模型训练。OncoReg建立在Learn2Reg挑战的基础上,纳入了放射治疗中介入性锥束计算机断层扫描与标准计划扇束CT图像的配准。准确的图像配准在肿瘤学中至关重要,特别是在图像引导放射治疗的动态治疗调整中,需要精确对齐以最小化对健康组织的辐射暴露,同时有效靶向肿瘤。本文详细介绍了OncoReg挑战的方法和数据,并对竞赛参赛作品和结果进行了全面分析。研究发现,特征提取在此配准任务中起着关键作用。从该挑战中涌现的一种新方法展示了其多功能性,而现有方法的表现与新技术相当。深度学习和经典方法在图像配准中仍扮演重要角色,尤其是方法的组合,特别是在特征提取方面,被证明最为有效。

英文摘要

In modern cancer research, the vast volume of medical data generated is often underutilised due to challenges related to patient privacy. The OncoReg Challenge addresses this issue by enabling researchers to develop and validate image registration methods through a two-phase framework that ensures patient privacy while fostering the development of more generalisable AI models. Phase one involves working with a publicly available dataset, while phase two focuses on training models on a private dataset within secure hospital networks. OncoReg builds upon the foundation established by the Learn2Reg Challenge by incorporating the registration of interventional cone-beam computed tomography with standard planning fan-beam CT images in radiotherapy. Accurate image registration is crucial in oncology, particularly for dynamic treatment adjustments in image-guided radiotherapy, where precise alignment is necessary to minimise radiation exposure to healthy tissues while effectively targeting tumours. This work details the methodology and data behind the OncoReg Challenge and provides a comprehensive analysis of the competition entries and results. Findings reveal that feature extraction plays a pivotal role in this registration task. A new method emerging from this challenge demonstrated its versatility, while established approaches continue to perform comparably to newer techniques. Both deep learning and classical approaches still play significant roles in image registration, with the combination of methods, particularly in feature extraction, proving most effective.

2506.01678 2026-06-19 cond-mat.mtrl-sci cs.AI 版本更新

Overcoming Labelled Data Scarcity for Defect Classification in Scanning Tunneling Microscopy

克服扫描隧道显微镜缺陷分类中的标注数据稀缺问题

Nikola L. Kolev, Max Trouton, Filippo Federici Canova, Geoff Thornton, David Z. Gao, Neil J. Curson, Taylor J. Z. Stock

Affiliations London Centre for Nanotechnology, University College London; Department of Electronic and Electrical Engineering, University College London; Department of Physics and Astronomy, University College London; Department of Chemistry, University College London; Aalto Science Institute, School of Science, Aalto University; Nanolayers Research Computing LTD, London, UK; Department of Physics, NTNU Norwegian University of Science and Technology

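AI summary Proposes an automated segmentation approach combining few-shot and unsupervised learning that classifies defects in STM images with high accuracy from only a small amount of labelled data, and demonstrates strong generalisation on three surfaces.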

AI中文摘要

扫描隧道显微镜(STM)是一种以原子分辨率对表面成像的强大技术,可深入理解单原子和分子层面的物理化学过程。STM图像分析的一项常规任务是在均匀背景中识别和标记感兴趣的特征。手动执行此操作是一项劳动密集型工作,需要大量人力。为减轻这一负担,我们提出了一种自动化的STM图像分割方法,该方法同时使用少样本学习和无监督学习。与之前的监督方法相比,我们的技术提供了更大的灵活性;它消除了对大型手动标注数据集的需求,因此更容易适应未见过的表面,同时仍保持高精度。我们通过使用该方法识别三种不同表面上的原子特征来展示其有效性:Si(001)、Ge(001)和TiO$_2$(110),包括吸附在硅和锗表面上的AsH$_3$分子。我们的模型表现出强大的泛化能力,在初始训练后,仅需一个额外的标注数据点即可适应未见过的表面。这项工作朝着高效且与材料无关的STM图像自动分割迈出了重要一步。

英文摘要

Scanning tunnelling microscopy (STM) is a powerful technique for imaging surfaces with atomic resolution, providing insight into physical and chemical processes at the level of single atoms and molecules. A regular task of STM image analysis is the identification and labelling of features of interest against a uniform background. Performing this manually is a labour-intensive task, requiring significant human effort. To reduce this burden, we propose an automated approach to the segmentation of STM images that uses both few-shot learning and unsupervised learning. Our technique offers greater flexibility compared to previous supervised methods; it removes the requirement for large manually annotated datasets and is thus easier to adapt to an unseen surface while still maintaining a high accuracy. We demonstrate the effectiveness of our approach by using it to recognise atomic features on three distinct surfaces: Si(001), Ge(001), and TiO$_2$(110), including adsorbed AsH$_3$ molecules on the silicon and germanium surfaces. Our model exhibits strong generalisation capabilities, and following initial training, can be adapted to unseen surfaces with as few as one additional labelled data point. This work is a significant step towards efficient and material-agnostic, automatic segmentation of STM images.

2503.20646 2026-06-19 cs.HC cs.RO cs.SY eess.SY 版本更新

Immersive and Wearable Thermal Rendering for Augmented Reality

增强现实的沉浸式可穿戴热渲染

Alexandra Watkins, Ritam Ghosh, Evan Chow, Nilanjan Sarkar

Affiliations Vanderbilt University

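AI summary Presents a palm-mounted thermal feedback prototype that delivers immersive thermal haptics in augmented reality through indirect feedback, active thermal passthrough, and spatially and temporally varying rendering. Experiments confirm its feasibility and clarify the tradeoffs.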

AI中文摘要

我们提出了一种概念验证的掌戴式热反馈原型,针对增强现实(AR)中的热渲染挑战,用户必须在其物理工作空间中与真实和虚拟物体交互。与为虚拟现实开发的热反馈系统相比,AR热反馈必须保持手部灵活性、维持对真实世界热线索的访问,并在不阻碍自然物体交互的情况下提供连贯的虚拟温度感知。我们提出了三个AR特定的设计考虑,并由我们的原型实现:间接反馈以保持指尖灵活性、主动热透传以感知和渲染接触物理表面的温度,以及手掌上的空间和时间变化热渲染。人体实验评估了AR交互过程中的感知灵敏度、间接反馈、主动热透传、空间模式识别和移动热渲染。结果表明,尽管间接反馈在指尖视觉接触时降低了感知真实感,但并未降低沉浸感或舒适度;主动热透传支持真实与渲染表面之间的温度辨别;时空渲染相比静态热刺激显著提高了沉浸感和真实感。这些发现表明,我们的设计考虑是AR热触觉的可行设计策略,同时澄清了需要精确真实感与更广泛沉浸式热体验的应用之间的权衡。

英文摘要

We present a proof-of-concept palm-mounted thermal feedback prototype addressing thermal rendering challenges specific to augmented reality (AR), where users must interact with both real and virtual objects in their physical workspace. In contrast to thermal feedback systems developed for virtual reality, AR thermal feedback must preserve manual dexterity, maintain access to real-world thermal cues, and provide coherent virtual temperature sensations without obstructing natural object interaction. We propose three AR-specific design considerations, which our prototype implements: indirect feedback to preserve fingertip dexterity, active thermal passthrough to sense and render the temperature of contacted physical surfaces, and spatially and temporally varying thermal rendering across the palm. Human-subject experiments evaluated perceptual sensitivity, indirect feedback, active thermal passthrough, spatial pattern recognition, and moving thermal rendering during AR interaction. Results showed that although indirect feedback reduced perceived realism during visual contact at the fingertips, it did not reduce immersion or comfort; active thermal passthrough supported temperature discrimination between real and rendered surfaces; and spatiotemporal rendering significantly improved immersion and realism compared with static thermal stimulation. These findings suggest that our design considerations are viable design strategies for AR thermal haptics, while also clarifying tradeoffs for applications that require precise realism versus broader immersive thermal experience.

2503.17386 2026-06-19 eess.SY cs.LG cs.SY Updated version

A graph neural network surrogate model for mesh-based crashworthiness prediction of vehicle panel components

基于图神经网络的网格级车辆面板部件耐撞性预测代理模型

Haoran Li, Yingxue Zhao, Haosu Zhou, Tobias Pfaff, Nan Li

Affiliations: Dyson School of Design Engineering, Imperial College London; NVIDIA

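AI summary: Proposes the Recurrent Graph U-Net (ReGUNet) surrogate model, which represents finite element meshes as graphs and combines a hierarchical architecture with recurrence to predict the dynamic deformation and crashworthiness indicators of vehicle panel components such as B-pillars efficiently and accurately.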

Comments Accepted manuscript version. Final published version available in Results in Engineering via DOI: 10.1016/j.rineng.2026.110925

Journal ref Results in Engineering 30 (2026) 110925

Details
AI-generated Chinese abstract

耐撞性是安全关键车辆面板部件(如B柱)设计中的关键性能指标。有限元(FE)模拟广泛用于评估碰撞响应,但对于大规模非线性碰撞场景,特别是当集成到迭代设计和优化过程中时,计算成本仍然很高。尽管基于机器学习的代理模型已被开发用于快速耐撞性分析,但它们在对复杂三维部件的详细表示方面存在局限性。图神经网络(GNN)已成为处理复杂结构数据的有前景的解决方案。然而,现有的GNN模型通常缺乏足够的精度和计算效率以满足工业需求。本文提出了递归图U-Net(ReGUNet),一种用于车辆面板部件耐撞性分析的基于图的代理模型。通过将有限元网格表示为图形式,该模型自然地适应复杂的非规则结构几何。其层次架构提高了计算效率和精度,而递归的引入增强了多时间步长上时间预测的稳定性。使用不同几何形状的热冲压钢B柱的侧面碰撞案例研究来生成训练数据集。训练后的模型在预测未见过的部件设计的动态变形行为和耐撞性指标方面表现出高精度。与基线方法相比,ReGUNet在平均变形预测误差上实现了超过52%的降低,同时计算效率显著提高。ReGUNet提供了快速可靠的耐撞性评估,从而加速了车辆面板部件的设计周期。

English abstract

Crashworthiness is a key performance measure in the design of safety-critical vehicle panel components such as B-pillars. Finite element (FE) simulations are widely used to evaluate crash responses but remain computationally expensive for large-scale, nonlinear impact scenarios, particularly when integrated into iterative design and optimisation processes. Although machine learning-based surrogate models have been developed for rapid crashworthiness analysis, they are limited in how much detail of complex three-dimensional components they can represent. Graph Neural Networks (GNNs) have emerged as a promising solution for processing data with complex structures. However, existing GNN models often lack sufficient accuracy and computational efficiency to meet industrial demands. This paper proposes Recurrent Graph U-Net (ReGUNet), a graph-based surrogate model for crashworthiness analysis of vehicle panel components. By representing FE meshes in graph form, the model naturally accommodates complex irregular structural geometries. Its hierarchical architecture improves computational efficiency and accuracy, while the introduction of recurrence enhances the stability of temporal predictions over multiple time steps. A side-impact case study of hot-stamped steel B-pillars with varying geometries is used to generate the training dataset. The trained model demonstrates high accuracy in predicting the dynamic deformation behaviour and crashworthiness indicators of previously unseen component designs. ReGUNet achieves a reduction of over 52% in the average deformation prediction error relative to baseline methods, together with markedly improved computational efficiency. ReGUNet provides rapid and reliable crashworthiness assessments, which in turn accelerate the design cycle of vehicle panel components.
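
The abstract does not say how ReGUNet constructs its graph from an FE mesh. As a rough illustration of the general idea only, the sketch below assumes that mesh nodes become graph nodes and that any two nodes sharing an element are linked by an undirected edge. The helper names `mesh_to_graph` and `adjacency` and the two-triangle toy mesh are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def mesh_to_graph(elements):
    """Return sorted undirected edges (i, j) with i < j from element connectivity.

    Assumption: every pair of nodes that belongs to the same element is linked.
    For triangles this coincides with the element sides.
    """
    edges = set()
    for element in elements:
        for i, j in combinations(sorted(set(element)), 2):
            edges.add((i, j))
    return sorted(edges)


def adjacency(edges, num_nodes):
    """Build a neighbour list per node from the undirected edge list."""
    neighbours = {n: [] for n in range(num_nodes)}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    return neighbours


# Two triangles sharing the side (1, 2): a toy stand-in for a shell mesh.
triangles = [(0, 1, 2), (1, 3, 2)]
edges = mesh_to_graph(triangles)
graph = adjacency(edges, num_nodes=4)
print(edges)
print(graph)
```

Because the shared side (1, 2) is stored once, the graph size tracks the mesh topology rather than the element count, which is one reason irregular geometries fit this representation without resampling onto a regular grid.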

2406.02421 2026-06-19 cs.DM cs.LG cs.SC Updated version

Representing Piecewise-Linear Functions by Functions with Minimal Arity

用最小元数函数表示分段线性函数

Christoph Koutschan, Anton Ponomarchuk, Josef Schicho

Affiliations: Johann Radon Institute for Computational and Applied Mathematics; Research Institute for Symbolic Computation; Johannes Kepler University

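AI summary: Studies the minimal number of arguments needed to represent a continuous piecewise-linear function as a linear combination of max functions, and establishes a direct connection between the tessellation of the input space induced by the function and that number of arguments.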

Details
AI-generated Chinese abstract

任何连续分段线性函数 $F\colon \mathbb{R}^{n}\to \mathbb{R}$ 都可以表示为至多 $n+1$ 个仿射线性函数的 $\max$ 函数的线性组合。在我们之前的论文 [``Representing piecewise linear functions by functions with small arity'', AAECC, 2023] 中,我们证明了 $n+1$ 个参数的上界是紧的。在本文中,我们通过建立函数 $F$ 与任何此类分解所需的最小参数个数之间的对应关系来扩展这一结果。我们表明,由函数 $F$ 诱导的输入空间 $\mathbb{R}^{n}$ 的剖分与 $\max$ 函数中的参数个数有直接联系。

English abstract

Any continuous piecewise-linear function $F\colon \mathbb{R}^{n}\to \mathbb{R}$ can be represented as a linear combination of $\max$ functions of at most $n+1$ affine-linear functions. In our previous paper [``Representing piecewise linear functions by functions with small arity'', AAECC, 2023], we showed that this upper bound of $n+1$ arguments is tight. In the present paper, we extend this result by establishing a correspondence between the function $F$ and the minimal number of arguments that are needed in any such decomposition. We show that the tessellation of the input space $\mathbb{R}^{n}$ induced by the function $F$ has a direct connection to the number of arguments in the $\max$ functions.
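
To make the representation concrete, here is a small numerical check for the case $n = 1$, where the bound allows $\max$ terms with at most $n+1 = 2$ affine arguments. The hat function used here and the names `hat_nested` and `hat_flat` are illustrative choices, not taken from the paper. The sketch checks on a grid that $\max(0, 1-|x|)$, which is naturally written with nested $\max$, equals the flat combination $\max(x+1, 0) - 2\max(x, 0) + \max(x-1, 0)$.

```python
def hat_nested(x):
    """Hat function written with nested max: max(0, 1 - |x|)."""
    return max(0.0, 1.0 - max(x, -x))


def hat_flat(x):
    """The same function as a linear combination of two-argument max terms."""
    return max(x + 1.0, 0.0) - 2.0 * max(x, 0.0) + max(x - 1.0, 0.0)


# Grid on [-3, 3] with step 1/8, so every value is exactly representable.
grid = [k / 8.0 for k in range(-24, 25)]
max_gap = max(abs(hat_nested(x) - hat_flat(x)) for x in grid)
print(max_gap)
```

Each term in `hat_flat` is a $\max$ of two affine functions of $x$, and the breakpoints at $-1$, $0$ and $1$ are exactly where the tessellation of $\mathbb{R}$ induced by the hat function changes pieces.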