arXivDaily: arXiv daily academic digest, updated Monday to Friday
2601.21324 2026-06-12 stat.ML cs.LG version update

Bulk-Calibrated Credal Ambiguity Sets: Fast, Tractable Decision Making under Out-of-Sample Contamination

Mengqi Chen, Thomas B. Berrett, Theodoros Damoulas, Michele Caprio

Affiliations: University of Bristol; University of Cambridge; University of California, Berkeley; University of Oxford

AI summary: Proposes bulk-calibrated credal ambiguity sets that separate in-bulk contamination from the tail contribution, yielding a closed-form, finite risk objective that reduces to linear or second-order cone programs for efficient robust optimisation.

Comments
Accepted for publication (spotlight) at ICML 2026

English abstract

Distributionally robust optimisation (DRO) minimises the worst-case expected loss over an ambiguity set that can capture distributional shifts in out-of-sample environments. While Huber (linear-vacuous) contamination is a classical minimal-assumption model for an $\varepsilon$-fraction of arbitrary perturbations, including it in an ambiguity set can make the worst-case risk infinite and the DRO objective vacuous unless one imposes strong boundedness or support assumptions. We address these challenges by introducing bulk-calibrated credal ambiguity sets: we learn a high-mass bulk set from data while considering contamination inside the bulk and bounding the remaining tail contribution separately. This leads to a closed-form, finite $\mathrm{mean}+\sup$ robust objective and tractable linear or second-order cone programs for common losses and bulk geometries. Through this framework, we highlight and exploit the equivalence between the imprecise probability (IP) notion of upper expectation and the worst-case risk, demonstrating how IP credal sets translate into DRO objectives with interpretable tolerance levels. Experiments on heavy-tailed inventory control, geographically shifted house-price regression, and demographically shifted text classification show competitive robustness-accuracy trade-offs and efficient optimisation times, using Bayesian, frequentist, or empirical reference distributions.
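To make the mean+sup form concrete, here is a minimal Python sketch under the simplest reading of the abstract: for the contamination set {(1 - eps) P + eps Q} with Q supported on a bulk set, the worst-case expected loss is (1 - eps) times the mean under P plus eps times the supremum of the loss over the bulk. The interval-shaped bulk, the newsvendor-style loss, and every function name are illustrative assumptions rather than the paper's construction, and the separate tail term mentioned in the abstract is omitted.

```python
from statistics import fmean


def newsvendor_loss(order, demand, under=4.0, over=1.0):
    """Hypothetical piecewise-linear loss: penalise unmet demand and leftover stock."""
    return under * max(demand - order, 0.0) + over * max(order - demand, 0.0)


def worst_case_risk(order, samples, eps, bulk):
    """(1 - eps) * empirical mean + eps * sup of the loss over the bulk interval.

    The loss is convex in demand, so its sup over an interval sits at an endpoint.
    """
    lo, hi = bulk
    mean_part = fmean(newsvendor_loss(order, d) for d in samples)
    sup_part = max(newsvendor_loss(order, lo), newsvendor_loss(order, hi))
    return (1.0 - eps) * mean_part + eps * sup_part


samples = [8.0, 10.0, 12.0]  # made-up demand observations
print(worst_case_risk(10.0, samples, eps=0.0, bulk=(0.0, 20.0)))
print(worst_case_risk(10.0, samples, eps=0.1, bulk=(0.0, 20.0)))
```

Because the supremum is taken over a bounded bulk rather than the whole support, the objective stays finite even when the loss is unbounded, which is the point the abstract makes about vacuous worst-case risk.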

2512.23566 2026-06-12 math.DS cond-mat.stat-mech cs.LG math.OC stat.ML version update

From geometry to dynamics: Learning overdamped Langevin dynamics from sparse observations with geometric constraints

Dimitra Maoutsa

Affiliations: Dimitra Maoutsa

AI summary: Proposes a stochastic control framework that uses the geometric structure of the system's invariant density for path augmentation, recovering overdamped Langevin dynamics from sparsely time-sampled data without parametric model assumptions.

Comments
10+54 pages, 14 figures; accepted at ICML 2026. An earlier account of this work appeared in arXiv:2301.08102 and arXiv:2304.00423; the main methodology remains the same, and this version includes additional numerical experiments and theory.

English abstract

How can we learn the laws underlying the dynamics of stochastic systems when their trajectories are sampled sparsely in time? Existing methods either require temporally resolved high-frequency observations, or rely on geometric arguments that apply only to conservative systems, limiting the range of dynamics they can recover. Here, we present a new framework that reconciles these two perspectives by reformulating inference as a stochastic control problem. Our method uses geometry-driven path augmentation, guided by the geometry of the system's invariant density, to reconstruct likely trajectories and infer the underlying dynamics without assuming specific parametric models. Applied to overdamped Langevin systems, our approach accurately recovers stochastic dynamics even from extremely undersampled data, outperforming existing methods in synthetic benchmarks. This work demonstrates the effectiveness of incorporating geometric inductive biases into stochastic system identification methods.
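For readers new to the setting, the sketch below illustrates the kind of data the abstract refers to: an Euler-Maruyama simulation of the overdamped Langevin equation dx = -U'(x) dt + sqrt(2 D) dW, followed by heavy subsampling in time. The double-well potential, step size, diffusion constant, and subsampling factor are assumptions chosen for illustration; the code shows the problem setup only, not the paper's inference method.

```python
import math
import random


def grad_double_well(x):
    """Gradient of the illustrative potential U(x) = (x**2 - 1)**2 / 4."""
    return x * (x * x - 1.0)


def simulate_overdamped_langevin(x0, n_steps, dt, diffusion, rng):
    """Euler-Maruyama discretisation of dx = -U'(x) dt + sqrt(2 D) dW."""
    xs = [x0]
    noise_scale = math.sqrt(2.0 * diffusion * dt)
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - grad_double_well(x) * dt + noise_scale * rng.gauss(0.0, 1.0))
    return xs


rng = random.Random(0)
path = simulate_overdamped_langevin(x0=1.0, n_steps=10_000, dt=1e-3, diffusion=0.25, rng=rng)
sparse = path[::500]  # keep one observation every 500 integration steps
print(len(path), len(sparse))
```

An inference method in this regime only sees `sparse`, so the fine-scale increments that standard drift estimators rely on are unavailable.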

2512.21227 2026-06-12 cond-mat.mtrl-sci cs.AI version update

PhononBench: A Large-Scale Phonon-Based Benchmark for Dynamical Stability in Crystal Generation

Xiao-Qi Han, Ze-Feng Gao, Wen-Kao Li, Peng-Jie Guo, Zhong-Yi Lu

Affiliations: School of Physics, Renmin University of China

AI summary: Introduces PhononBench, the first large-scale benchmark for the dynamical stability of AI-generated crystals; it uses the MatterSim potential for efficient phonon calculations, evaluates 133,838 structures generated by 7 models, and finds an average dynamical-stability rate of only 32.15%.

Comments
53 pages, 6 figures

English abstract

In recent years, generative artificial intelligence has made significant advances in the design of crystalline materials, giving rise to approaches based on graph neural networks, diffusion models, and large language models. Existing evaluations commonly follow the stability-uniqueness-novelty (S.U.N.) framework, where stability is primarily assessed using thermodynamic criteria, which do not fully capture the dynamical stability essential for a material's practical existence. Dynamical stability is a key determinant of whether a material can be synthesized and persist, with phonon spectrum calculations serving as the standard for its evaluation. However, the high computational cost of such calculations has prevented large-scale assessment of dynamical stability in generated crystals. In this work, we introduce PhononBench, the first large-scale benchmark for dynamical stability in AI-generated crystals. Leveraging the recently developed MatterSim interatomic potential, which achieves density-functional-theory (DFT)-level accuracy in phonon predictions across more than 10,000 materials, PhononBench enables efficient phonon calculations and dynamical-stability analysis for 133,838 crystal structures generated by 7 leading crystal generation models. PhononBench reveals a widespread limitation of current generative models. Unless otherwise specified, all reported dynamical-stability metrics are evaluated at a phonon-frequency threshold of -0.1 THz; at this threshold, the average dynamical-stability rate across all generated structures is only 32.15%, and the top-performing model, MatterGen, reaches just 45.05%. We also identify 32,995 crystal structures that are phonon-stable across the entire Brillouin zone under a strict threshold of -0.001 THz. In addition, a web-based service is accessible at http://phononbench.cn/, enabling minute-level ultra-fast phonon predictions.
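The threshold criterion quoted in the abstract can be sketched as a test on the lowest phonon frequency of each structure, assuming the common convention that imaginary modes are reported as negative frequencies. The function names and the sample spectra below are hypothetical and are not taken from the PhononBench code.

```python
def is_dynamically_stable(frequencies_thz, threshold_thz=-0.1):
    """True if no phonon mode falls below the threshold (negative = imaginary mode)."""
    return min(frequencies_thz) >= threshold_thz


def stability_rate(structures, threshold_thz=-0.1):
    """Fraction of structures whose whole sampled spectrum passes the threshold."""
    stable = sum(is_dynamically_stable(f, threshold_thz) for f in structures)
    return stable / len(structures)


# Made-up spectra (THz) sampled over the Brillouin zone for three structures.
spectra = [
    [-0.05, 1.2, 3.4],  # small negative value: passes at -0.1, fails at -0.001
    [0.0, 2.1, 5.0],    # passes both thresholds
    [-1.7, 0.9, 4.2],   # pronounced soft mode: fails both
]
print(stability_rate(spectra, threshold_thz=-0.1))
print(stability_rate(spectra, threshold_thz=-0.001))
```

The looser -0.1 THz threshold tolerates small negative frequencies that often come from numerical noise, while the strict -0.001 THz threshold accepts only structures with essentially no imaginary modes.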

2511.19716 2026-06-12 math.NA cs.LG cs.NA version update

Design Criteria for SGD Preconditioners: Local Conditioning, Noise Floors, and Basin Stability

Mitchell Scott, Tianshi Xu, Ziyuan Tang, Alexandra Pichette-Emmons, Qiang Ye, Yousef Saad, Yuanzhe Xi

Affiliations: Department of Mathematics, Emory University; Department of Mathematics, University of Minnesota Twin Cities; Department of Computer Science, University of Minnesota Twin Cities; Department of Mathematics, University of Kentucky

AI summary: Addresses the slow late-stage convergence of SGD caused by anisotropic curvature and gradient noise with an analysis framework for SGD preconditioned by a symmetric positive definite matrix $\mathbf{M}$; derives bounds in which the convergence rate and noise floor are governed by $\mathbf{M}$-dependent quantities, gives a basin-stability guarantee for nonconvex objectives, and offers design criteria for scientific machine learning.

Journal ref
Trans. of Mach. Learning Research, 06/2026
Comments
31 pages, 11 Figures

English abstract

Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix $\mathbf{M}$, deriving bounds in which both the convergence rate and the stochastic noise floor are governed by $\mathbf{M}$-dependent quantities: the rate through an effective condition number in the $\mathbf{M}$-metric, and the floor through the product of that condition number and the preconditioned noise level. For nonconvex objectives, we establish a preconditioner-dependent basin-stability guarantee: when smoothness and basin size are measured in the $\mathbf{M}$-norm, the probability that the iterates remain in a well-behaved local region admits an explicit lower bound. This perspective is particularly relevant in Scientific Machine Learning (SciML), where achieving small training loss under stochastic updates is closely tied to physical fidelity, numerical stability, and constraint satisfaction. The framework applies to both diagonal/adaptive and curvature-aware preconditioners and yields a simple design principle: choose $\mathbf{M}$ to improve local conditioning while attenuating noise. Experiments on a quadratic diagnostic and three SciML benchmarks validate the predicted rate-floor behavior.
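As a rough illustration of why the choice of $\mathbf{M}$ matters, the sketch below runs the update x <- x - lr * M^{-1} g on a two-dimensional quadratic with anisotropic curvature, once with M equal to the identity and once with M equal to the (diagonal) Hessian. The quadratic, the diagonal form of M, the learning rates, and all names are assumptions for illustration and do not reproduce the paper's experiments; setting `noise_std` above zero lets one observe a noise floor empirically.

```python
import random


def preconditioned_sgd(x0, curvatures, precond_diag, lr, noise_std, n_steps, rng):
    """Run x <- x - lr * M^{-1} g on f(x) = 0.5 * sum(h_i * x_i**2) with noisy g.

    M is diagonal in this sketch and stored as the list precond_diag.
    """
    x = list(x0)
    for _ in range(n_steps):
        grad = [h * xi + noise_std * rng.gauss(0.0, 1.0) for h, xi in zip(curvatures, x)]
        x = [xi - lr * g / m for xi, g, m in zip(x, grad, precond_diag)]
    return x


def loss(x, curvatures):
    """Quadratic objective matching the gradient used above."""
    return 0.5 * sum(h * xi * xi for h, xi in zip(curvatures, x))


h = [100.0, 1.0]  # anisotropic curvature, condition number 100
rng = random.Random(0)
plain = preconditioned_sgd([1.0, 1.0], h, [1.0, 1.0], lr=0.01, noise_std=0.0, n_steps=50, rng=rng)
precond = preconditioned_sgd([1.0, 1.0], h, h, lr=0.5, noise_std=0.0, n_steps=50, rng=rng)
print(loss(plain, h), loss(precond, h))
```

With M equal to the identity, the step size is capped by the stiff direction and the flat direction converges slowly; with M equal to the Hessian, both directions contract at the same rate, which is the conditioning effect the abstract describes.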

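2511.13271 2026-06-12 cs.SE cs.AI cs.IR version update

Examining the Usage of Generative AI Models in Student Learning Activities for Software Programming

Rufeng Chen, Shuaishuai Jiang, Jiyun Shen, AJung Moon, Lili Wei

Affiliations: University of Science and Technology of China

AI summary: Compares generative AI with conventional online resources for learning programming; finds that AI improves task performance but does not necessarily yield knowledge gains, that beginners over-rely on it while intermediate students use it selectively, and calls for treating AI as a learning tool rather than a problem-solving tool.
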
Comments
9 pages, 4 figures, published at AIWARE 2025

English abstract

The rise of Generative AI (GenAI) tools like ChatGPT has created new opportunities and challenges for computing education. Existing research has primarily focused on GenAI's ability to complete educational tasks and its impact on student performance, often overlooking its effects on knowledge gains. In this study, we investigate how GenAI assistance compares to conventional online resources in supporting knowledge gains across different proficiency levels. We conducted a controlled user experiment with 24 undergraduate students of two different levels of programming experience (beginner, intermediate) to examine how students interact with ChatGPT while solving programming tasks. We analyzed task performance, conceptual understanding, and interaction behaviors. Our findings reveal that generating complete solutions with GenAI significantly improves task performance, especially for beginners, but does not consistently result in knowledge gains. Importantly, usage strategies differ by experience: beginners tend to rely heavily on GenAI toward task completion often without knowledge gain in the process, while intermediates adopt more selective approaches. We find that both over-reliance and minimal use result in weaker knowledge gains overall. Based on our results, we call on students and educators to adopt GenAI as a learning rather than a problem solving tool. Our study highlights the urgent need for guidance when integrating GenAI into programming education to foster deeper understanding.

2412.08610 2026-06-12 cs.GT cs.AI cs.CY version update

Competition and Diversity in Generative AI

Manish Raghavan

Affiliations: MIT Sloan School of Management & Department of Electrical Engineering and Computer Science

AI summary: Uses a game-theoretic model and experiments with the game Scattergories to study how competition pushes generative AI models to diversify, mitigating homogenization and improving social welfare.

English abstract

Recent evidence, both in the lab and in the wild, suggests that the use of generative artificial intelligence reduces the diversity of content produced. The use of the same or similar AI models appears to lead to more homogeneous behavior. Our work begins with the observation that there is a force pushing in the opposite direction: competition. When producers compete with one another (e.g., for customers or attention), they are incentivized to create novel or unique content. We explore the impact competition has on both content diversity and overall social welfare. Through a formal game-theoretic model, we show that competitive markets select for diverse AI models, mitigating monoculture. We further show that a generative AI model that performs well in isolation (i.e., according to a benchmark) may fail to provide value in a competitive market. Our results highlight the importance of evaluating generative AI models across the breadth of their output distributions, particularly when they will be deployed in competitive environments. We validate our results empirically by using language models to play Scattergories, a word game in which players are rewarded for answers that are both correct and unique. Overall, our results suggest that homogenization due to generative AI is unlikely to persist in competitive markets, and instead, competition in downstream markets may drive diversification in AI model development.
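The Scattergories incentive described in the abstract, where an answer pays off only if it is both correct and unique, can be sketched as a small scoring function. The player names, the validity check, and the one-point payoff are illustrative assumptions rather than the paper's experimental setup.

```python
from collections import Counter


def scattergories_scores(answers, is_correct):
    """Score one round: 1 point only if the answer is correct and nobody else gave it."""
    normalised = {player: ans.strip().lower() for player, ans in answers.items()}
    counts = Counter(normalised.values())
    return {
        player: int(is_correct(ans) and counts[ans] == 1)
        for player, ans in normalised.items()
    }


# Hypothetical round: "a fruit starting with 'a'".
valid = {"apple", "apricot", "avocado"}
answers = {"model_a": "Apple", "model_b": "apple", "model_c": "apricot", "model_d": "anchor"}
print(scattergories_scores(answers, lambda a: a in valid))
```

Two players who both give the most likely correct answer score nothing, so a model that always outputs its single best guess can do well on a benchmark yet poorly against copies of itself.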

2509.21548 2026-06-12 cs.CY cs.CL version update

C-QUERI: Congressional Questions, Exchanges, and Responses in Institutions Dataset

Manjari Rudra, Daniel Magleby, Sujoy Sikdar

Affiliations: School of Computing, Binghamton University; Department of Political Science, Binghamton University

AI summary: Presents a pipeline for extracting question-answer pairs from hearing transcripts and builds a dataset of committee hearings from the 108th to 117th Congress; analysis shows that a questioner's party can be predicted from the question itself, providing a framework for studying political discourse.

English abstract

Questions in political interviews and hearings serve strategic purposes beyond information gathering, including advancing partisan narratives and shaping public perceptions. However, these strategic aspects remain understudied due to the lack of large-scale datasets for studying such discourse. Congressional hearings provide an especially rich and tractable site for studying political questioning: interactions are structured by formal rules, witnesses are obliged to respond, and members with different political affiliations are guaranteed opportunities to ask questions, enabling comparisons of behaviors across the political spectrum. We develop a pipeline to extract question-answer pairs from unstructured hearing transcripts and construct a novel dataset of committee hearings from the 108th--117th Congress. Our analysis reveals systematic differences in questioning strategies across parties: the party affiliation of questioners can be predicted from their questions alone. Our dataset and methods not only advance the study of congressional politics, but also provide a general framework for analyzing question-answering across interview-like settings.

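2402.01779 2026-06-12 eess.IV cs.CV cs.LG stat.ML version update

Plug-and-Play image restoration with Stochastic deNOising REgularization

Marien Renaud, Jean Prost, Arthur Leclaire, Nicolas Papadakis

AI summary: Proposes the SNORE framework, which applies the denoiser only to images at the appropriate noise level and combines stochastic regularization with gradient descent to solve inverse problems, performing competitively with state-of-the-art methods on deblurring and inpainting.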

English abstract

Plug-and-Play (PnP) algorithms are a class of iterative algorithms that address image inverse problems by combining a physical model and a deep neural network for regularization. Even if they produce impressive image restoration results, these algorithms rely on a non-standard use of a denoiser on images that are less and less noisy along the iterations, which contrasts with recent algorithms based on Diffusion Models (DM), where the denoiser is applied only on re-noised images. We propose a new PnP framework, called Stochastic deNOising REgularization (SNORE), which applies the denoiser only on images with noise of the adequate level. It is based on an explicit stochastic regularization, which leads to a stochastic gradient descent algorithm to solve ill-posed inverse problems. A convergence analysis of this algorithm and its annealing extension is provided. Experimentally, we prove that SNORE is competitive with respect to state-of-the-art methods on deblurring and inpainting tasks, both quantitatively and qualitatively.
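The idea of applying the denoiser only to re-noised inputs can be illustrated with a scalar toy problem. The sketch below assumes a regulariser of the form E[-log p_sigma(x + sigma * eps)] and uses Tweedie's identity, grad log p_sigma(z) = (D_sigma(z) - z) / sigma^2, to turn a denoiser D_sigma into a stochastic gradient. The Gaussian prior, its closed-form MMSE denoiser, the identity forward model, and all parameter values are assumptions made for illustration; this is a sketch in the spirit of the abstract, not the authors' algorithm.

```python
import random


def gaussian_mmse_denoiser(z, sigma, prior_var=1.0):
    """Exact MMSE denoiser for a N(0, prior_var) signal observed with noise std sigma."""
    return prior_var / (prior_var + sigma ** 2) * z


def snore_like_step(x, y, step, alpha, sigma, data_var, rng):
    """One stochastic gradient step on a data term plus a denoiser-based regulariser."""
    x_noised = x + sigma * rng.gauss(0.0, 1.0)  # re-noise before calling the denoiser
    reg_grad = (x_noised - gaussian_mmse_denoiser(x_noised, sigma)) / sigma ** 2
    data_grad = (x - y) / data_var  # gradient of 0.5 * (x - y)**2 / data_var
    return x - step * (data_grad + alpha * reg_grad)


def run_snore_like(y, n_steps, seed):
    """Iterate the step from x = 0 with fixed illustrative hyperparameters."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(n_steps):
        x = snore_like_step(x, y, step=0.01, alpha=1.0, sigma=0.5, data_var=1.0, rng=rng)
    return x


print(run_snore_like(y=2.0, n_steps=5000, seed=0))
```

In this toy setting the expected update has a fixed point at y * (1 + sigma^2) / (2 + sigma^2), about 1.11 for the values above, and the iterates fluctuate around it because each step uses a fresh noise draw.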

2505.04021 2026-06-12 cs.DC cs.AI cs.LG cs.PF version update

Prism: Cost-Efficient Multi-LLM Serving via GPU Memory Ballooning

Shan Yu, Yifan Qiao, Mingyuan Ma, Yangmin Li, Shuo Yang, Xinyuan Tong, Yang Wang, Zhiqiang Xie, Yuwei An, Shiyi Cao, Ke Bao, Deepak Vij, Xiaoning Ding, Yichen Wang, Qingda Lu, Zhong Wang, Gao Gao, Harry Xu, Junyi Shu, Jiarong Xing, Ying Sheng

Affiliations: UCLA; UC Berkeley; Harvard University; CMU; University of Edinburgh; Intel; Stanford University; LMSYS; ByteDance; Alibaba Cloud; Tsinghua University; Novita AI; Rice University

AI summary: Targets the low resource efficiency of multi-LLM serving with Prism, a memory-centric LLM co-serving framework built on memory ballooning that unifies spatial and temporal sharing; deployed in production on 10K+ GPUs.

Comments
OSDI'26

English abstract

Inference providers must maintain availability for many LLMs, including low-volume but essential models, making resource efficiency increasingly important as token prices fall. Analysis of production traces reveals a dynamic bursty-group pattern in which sets of models become active together and shift over time; existing space- and time-sharing approaches lack principled mechanisms to adapt to this variability, forcing trade-offs between SLO adherence and efficiency. We observe that elastic memory allocation can unify spatial and temporal sharing. Based on this insight, we have developed Prism, a memory-centric LLM co-serving framework that applies memory ballooning to reclaim memory across models and support both forms of sharing under a single scheme. Prism's balloon driver, referred to as kvcached, has been open-sourced at https://github.com/ovg-project/kvcached, and deployed in production environments across 10K+ GPUs.
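To give a feel for elastic memory sharing across co-served models, the toy sketch below keeps a single pool of pages and reclaims pages from idle models when an active model needs more than the pool has free. The class, its methods, the page granularity, and the reclaim policy are all hypothetical; this is not the kvcached API and does not model Prism's actual mechanisms.

```python
class BalloonPool:
    """Toy shared pool of fixed-size memory pages for co-served models (hypothetical)."""

    def __init__(self, total_pages):
        self.free_pages = total_pages
        self.held = {}       # model name -> pages currently held
        self.active = set()  # models serving requests right now

    def release_idle(self, needed):
        """Take pages back from idle models, largest holder first, until enough are free."""
        for model in sorted(self.held, key=self.held.get, reverse=True):
            if self.free_pages >= needed:
                break
            if model not in self.active:
                self.free_pages += self.held.pop(model)

    def allocate(self, model, pages):
        """Grant pages to an active model, reclaiming from idle models if the pool is short."""
        self.active.add(model)
        if self.free_pages < pages:
            self.release_idle(pages)
        if self.free_pages < pages:
            return False
        self.free_pages -= pages
        self.held[model] = self.held.get(model, 0) + pages
        return True

    def mark_idle(self, model):
        """Record that a model has stopped serving, making its pages reclaimable."""
        self.active.discard(model)


pool = BalloonPool(total_pages=10)
print(pool.allocate("model_a", 6), pool.allocate("model_b", 4))
pool.mark_idle("model_a")
print(pool.allocate("model_c", 5), pool.allocate("model_d", 3))
```

In the usage above, `model_c` succeeds only because the idle `model_a` gives its pages back, while `model_d` is refused because the remaining holders are still active.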

2401.08301 2026-06-12 eess.SP cs.LG cs.SY eess.SY version update

QoS Improvement in Multi User Cellular-Symbiotic Radio Network Assisted by Active-STAR-RIS

Rahman Saadat Yeganeh, Mohammad Javad Omidi, Farshad Zeinali, Mohammad Robat Mili, Mohammad Ghavami

Affiliations: Department of Electrical and Computer Engineering, Isfahan University of Technology; Department of Electronics and Communication Engineering, Kuwait College of Science and Technology; The Pasargad Institute for Advanced Innovative Solutions (PIAIS); Electrical and Electronic Engineering Department, London South Bank University

AI summary: Uses an active simultaneously transmitting and reflecting reconfigurable intelligent surface (ASRIS) to improve quality of service in 6G cellular networks, optimizing beamforming, phase adjustments, and scheduling parameters with deep reinforcement learning to maximize throughput between symbiotic backscatter devices and users.

Comments
This article will be submitted to the Transactions journal

English abstract

In this article, we employ active simultaneously transmitting and reflecting reconfigurable intelligent surfaces (ASRIS) to enhance the quality of 6G cellular network services. The network integrates commensal symbiotic radio (CSR) subsystems to facilitate communication between passive Internet of Things (IoT) users and active users, referred to as symbiotic backscatter devices (SBDs) and symbiotic user equipments (SUEs), respectively. Since the SBDs are passive, transmitting information to the SUEs poses significant challenges. To overcome this challenge, we harness the capabilities of massive multiple input multiple output (MIMO) antennas within the base station (BS) to relay the information transmitted by SBDs with greater power. This scheme uses the non-orthogonal multiple access (NOMA) technique for multiple access among all users, and potential interferences are eliminated using successive interference cancellation (SIC). The primary objective is to maximize the throughput between SBDs and SUEs. To achieve this, we formulate an optimization problem involving variables such as active beamforming coefficients at the BS and ASRIS, phase adjustments of ASRIS, and scheduling parameters between CSR and cellular networks. To solve this optimization problem, we used three deep reinforcement learning (DRL) methods: proximal policy optimization (PPO), twin delayed deep deterministic policy gradient (TD3), and asynchronous advantage actor critic (A3C). These methods were simulated, and the results demonstrate that A3C, TD3, and PPO have the best convergence speeds and achieve the highest increases in network throughput, respectively. Finally, the proposed scheme was evaluated using passive simultaneously transmitting and reflecting RIS (STAR-RIS), which demonstrated poorer performance compared to ASRIS.

2604.15372 2026-06-12 cs.CR cs.AI cs.MM

The Synthetic Media Shift: Tracking the Rise, Virality, and Detectability of AI-Generated Multimodal Misinformation

Zacharias Chrysidis, Stefanos-Iordanis Papadopoulos, Symeon Papadopoulos

Affiliations: Centre for Research and Technology Hellas

English abstract

As generative AI advances, the distinction between authentic and synthetic media is increasingly blurred, challenging the integrity of online information. In this study, we present CONVEX, a large-scale dataset of multimodal misinformation involving miscaptioned, edited, and AI-generated visual content, comprising over 150K multimodal posts with associated notes and engagement metrics from X's Community Notes. We analyze how multimodal misinformation evolves in terms of virality, engagement, and consensus dynamics, with a focus on synthetic media. Our results show that while AI-generated content achieves disproportionate virality, its spread is driven primarily by passive engagement rather than active discourse. Despite slower initial reporting, AI-generated content reaches community consensus more quickly once flagged. Moreover, our evaluation of specialized detectors and vision-language models reveals a consistent decline in performance over time in distinguishing synthetic from authentic images as generative models evolve. These findings highlight the need for continuous monitoring and adaptive strategies in the rapidly evolving digital information environment.

2601.02149 2026-06-12 cond-mat.mes-hall cond-mat.dis-nn cs.AI

AI-enhanced tuning of quantum dot Hamiltonians toward Majorana modes

Mateusz Krawczyk, Jarosław Pawłowski

Affiliations: Institute of Theoretical Physics, Wrocław University of Science and Technology

Journal ref
Phys. Rev. Applied 25, 064032 (2026)
Comments
12 pages, 8 figures, 2 tables
English abstract

We propose a neural network-based model capable of learning the broad landscape of working regimes in quantum dot simulators, and of using this knowledge to autotune these devices, based on transport measurements, toward obtaining Majorana modes in the structure. The model is trained in an unsupervised manner on synthetic data in the form of conductance maps, using a physics-informed loss that incorporates key properties of Majorana zero modes. We show that, with appropriate training, a deep vision-transformer network can efficiently memorize the relation between Hamiltonian parameters and structures in conductance maps and use it to propose parameter updates for a quantum dot chain that drive the system toward the topological phase. Starting from a broad range of initial detunings in parameter space, a single update step is sufficient to generate nontrivial zero modes. Moreover, by enabling an iterative tuning procedure, in which the system acquires updated conductance maps at each step, we demonstrate that the method can address a much larger region of the parameter space.

2603.26705 2026-06-12 q-bio.BM cs.AI cs.LG

PI-Mamba: Linear-Time Protein Backbone Generation via Spectrally Initialized Flow Matching

Tianyu Wu, Lin Zhu

Affiliations: Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign; School of Information Science, University of Illinois Urbana-Champaign

Journal ref
Bioinformatics (2026)
English abstract

Motivation: Generative models for protein backbone design have to simultaneously ensure geometric validity, sampling efficiency, and scalability to long sequences. However, most existing approaches rely on iterative refinement, quadratic attention mechanisms, or post-hoc geometry correction, leading to a persistent trade-off between computational efficiency and structural fidelity. Results: We present Physics-Informed Mamba (PI-Mamba), a generative model that enforces exact local covalent geometry by construction while enabling linear-time inference. PI-Mamba integrates a differentiable constraint-enforcement operator into a flow-matching framework and couples it with a Mamba-based state-space architecture. To improve optimisation stability and backbone realism, we introduce a spectral initialization derived from the Rouse polymer model and an auxiliary cis-proline awareness head. Across benchmark tasks, PI-Mamba achieves 0.0\% local geometry violations and high designability (scTM = $0.91\pm 0.03$, n = 100), while scaling to proteins exceeding 2,000 residues on a single A5000 GPU (24 GB).

2602.18072 2026-06-12 cs.AR cs.AI

HiAER-Spike Software-Hardware Reconfigurable Platform for Event-Driven Neuromorphic Computing at Scale

Gwenevere Frank, Gopabandhu Hota, Keli Wang, Christopher Deng, Krish Arora, Diana Vins, Abhinav Uppal, Omowuyi Olajide, Kenneth Yoshimoto, Qingbo Wang, Mari Yamaoka, Johannes Leugering, Stephen Deiss, Leif Gibb, Gert Cauwenberghs

Affiliations: Institute for Neural Computation, UC San Diego; Fujitsu; Forschungszentrum Jülich; Qernel AI

Journal ref
npj Unconventional Computing (2026)
Comments
Leif Gibb, Gert Cauwenberghs are equal authors. arXiv admin note: substantial text overlap with arXiv:2504.03671
English abstract

In this work, we present HiAER-Spike, a modular, reconfigurable, event-driven neuromorphic computing platform designed to execute large spiking neural networks with up to 160 million neurons and 40 billion synapses - roughly twice the neurons of a mouse brain at faster than real time. This system, assembled at the UC San Diego Supercomputer Center, comprises a co-designed hard- and software stack that is optimized for run-time massively parallel processing and hierarchical address-event routing (HiAER) of spikes while promoting memory-efficient network storage and execution. The architecture efficiently handles both sparse connectivity and sparse activity for robust and low-latency event-driven inference for both edge and cloud computing. A Python programming interface to HiAER-Spike, agnostic to hardware-level detail, shields the user from complexity in the configuration and execution of general spiking neural networks with minimal constraints in topology. The system is made easily available over a web portal for use by the wider community. In the following, we provide an overview of the hard- and software stack, explain the underlying design principles, demonstrate some of the system's capabilities and solicit feedback from the broader neuromorphic community. Examples are shown demonstrating HiAER-Spike's capabilities for event-driven vision on benchmark CIFAR-10, DVS event-based gesture, MNIST, and Pong tasks.

2411.02933 2026-06-12 cs.DB cs.LG cs.PF

P-MOSS: Scheduling Main-Memory Indexes Over NUMA Servers Using Next Token Prediction

Yeasir Rayhan, Walid G. Aref

Affiliations: Purdue University, West Lafayette, IN, USA

Comments
Accepted to SIGMOD'26
English abstract

Ever since Dennard scaling broke down in the early 2000s and CPU frequencies stalled, vendors have increased the core count in each CPU chip at the expense of introducing heterogeneity, thus ushering in the era of NUMA and Chiplet processors. Since then, the heterogeneity in the design space of hardware has only increased, to the point that DBMS performance may vary by up to an order of magnitude on modern servers. Important factors that affect performance include the location of the logical cores where the DBMS queries execute and the location where the data resides. This paper introduces P-MOSS, a learned spatial scheduling framework that schedules query execution to specific logical cores, and co-locates data on the corresponding NUMA node. For cross-hardware and workload adaptability, P-MOSS leverages core principles from Large Language Models, such as Next Token prediction, Generative Pre-training, and Fine-tuning. In the spirit of hardware-software synergy, P-MOSS guides its scheduling decisions solely based on the low-level hardware statistics collected from the hardware Performance Monitoring Unit with the aid of a Decision Transformer. Experimental evaluation is performed in the context of the B$^+$-Tree index. Performance results demonstrate that P-MOSS offers an improvement of up to $6\times$ over traditional schedules in terms of query throughput.

2601.10885 2026-06-12 physics.plasm-ph cs.LG physics.comp-ph

Learning collision operators from plasma phase space data using differentiable simulators

Diogo D. Carvalho, Pablo J. Bilbao, Warren B. Mori, Luis O. Silva, E. Paulo Alves

Affiliations: GoLP/Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, Universidade de Lisboa; Mani L. Bhaumik Institute for Theoretical Physics, University of California, Los Angeles; The Rudolf Peierls Centre for Theoretical Physics, University of Oxford; Department of Physics and Astronomy, University of California, Los Angeles

AI summary: Proposes a method that combines a differentiable Fokker-Planck solver with gradient-based optimisation to infer collision operators from plasma phase-space data, and validates its accuracy and computational efficiency on two-dimensional PIC simulation data.

Journal ref
J. Plasma Phys. (2026), vol. 92, E76
Comments
accepted for publication in Journal of Plasma Physics, code available at https://github.com/diogodcarvalho/ml-pic-collision-operators
AI中文摘要

我们提出了一种从等离子体动力学相空间数据推断碰撞算子的方法。该方法结合了一个可微分动力学模拟器(其核心组件是一个可微分的Fokker-Planck求解器)与基于梯度的优化方法,以学习最能描述相空间动力学的碰撞算子。我们使用空间均匀热等离子体的二维Particle-in-Cell模拟数据测试了该方法,学习了能够捕获有限大小带电粒子之间自洽电磁相互作用的碰撞算子,该算子适用于多种模拟参数。我们证明,学习到的算子比基于粒子轨迹的替代估计更准确,同时无需对过程的相关时间尺度做出先验假设,并显著降低了内存需求。我们发现,在非相对论条件下获得的算子与静电场景的理论预测高度一致。我们的结果表明,可微分模拟器为推断新算子提供了一种强大且计算高效的方法,适用于广泛的问题,如电磁主导的碰撞动力学和随机波粒相互作用。

英文摘要

We propose a methodology to infer collision operators from phase space data of plasma dynamics. Our approach combines a differentiable kinetic simulator, whose core component in this work is a differentiable Fokker-Planck solver, with a gradient-based optimisation method to learn the collisional operators that best describe the phase space dynamics. We test our method using data from two-dimensional Particle-in-Cell simulations of spatially uniform thermal plasmas, and learn the collision operator that captures the self-consistent electromagnetic interaction between finite-size charged particles over a wide variety of simulation parameters. We demonstrate that the learned operators are more accurate than alternative estimates based on particle tracks, while making no prior assumptions about the relevant time scales of the processes and significantly reducing memory requirements. We find that the retrieved operators, obtained in the non-relativistic regime, are in excellent agreement with theoretical predictions derived for electrostatic scenarios. Our results show that differentiable simulators offer a powerful and computationally efficient approach to infer novel operators for a wide range of problems, such as electromagnetically dominated collisional dynamics and stochastic wave-particle interactions.

2510.03699 2026-06-12 q-bio.NC cs.AI cs.LG cs.NE cs.SY eess.SY

Dissecting Larval Zebrafish Hunting using Deep Reinforcement Learning Trained RNN Agents

Raaghav Malik, Satpreet H. Singh, Sonja Johnson-Yu, Nathan Wu, Roy Harpaz, Florian Engert, Kanaka Rajan

发表机构 * California Institute of Technology(加州理工学院) Harvard University(哈佛大学)

详情
Journal ref
Proceedings of the 9th Conference on Cognitive Computational Neuroscience (2026)
英文摘要

Larval zebrafish hunting provides a tractable setting to study how ecological and energetic constraints shape adaptive behavior in both biological brains and artificial agents. Here we develop a minimal agent-based model, training recurrent policies with deep reinforcement learning in a bout-based zebrafish simulator. Despite its simplicity, the model reproduces hallmark hunting behaviors -- including eye vergence-linked pursuit, speed modulation, and stereotyped approach trajectories -- that closely match real larval zebrafish. Quantitative trajectory analyses show that pursuit bouts systematically reduce prey angle by roughly half before strike, consistent with measurements. Virtual experiments and parameter sweeps vary ecological and energetic constraints, bout kinematics (coupled vs. uncoupled turns and forward motion), and environmental factors such as food density, food speed, and vergence limits. These manipulations reveal how constraints and environments shape pursuit dynamics, strike success, and abort rates, yielding falsifiable predictions for neuroscience experiments. These sweeps identify a compact set of constraints -- binocular sensing, the coupling of forward speed and turning in bout kinematics, and modest energetic costs on locomotion and vergence -- that are sufficient for zebrafish-like hunting to emerge. Strikingly, these behaviors arise in minimal agents without detailed biomechanics, fluid dynamics, circuit realism, or imitation learning from real zebrafish data. Taken together, this work provides a normative account of zebrafish hunting as the optimal balance between energetic cost and sensory benefit, highlighting the trade-offs that structure vergence and trajectory dynamics. We establish a virtual lab that narrows the experimental search space and generates falsifiable predictions about behavior and neural coding.

2508.19273 2026-06-12 cs.CR cs.AI

MixGAN: A Hybrid Semi-Supervised and Generative Approach for DDoS Detection in Cloud-Integrated IoT Networks

Tongxi Wu, Chenwei Xu, Jin Yang

发表机构 * College of Cyber Science and Engineering, Sichuan University(四川大学网络空间安全学院) College of Information Science and Technology, Tibet University(西藏大学信息科学学院)

详情
Journal ref
ECAI 2025, 28th European Conference on Artificial Intelligence
英文摘要

The proliferation of cloud-integrated IoT systems has intensified exposure to Distributed Denial of Service (DDoS) attacks due to the expanded attack surface, heterogeneous device behaviors, and limited edge protection. However, DDoS detection in this context remains challenging because of complex traffic dynamics, severe class imbalance, and scarce labeled data. While recent methods have explored solutions to address class imbalance, many still struggle to generalize under limited supervision and dynamic traffic conditions. To overcome these challenges, we propose MixGAN, a hybrid detection method that integrates conditional generation, semi-supervised learning, and robust feature extraction. Specifically, to handle complex temporal traffic patterns, we design a 1-D WideResNet backbone composed of temporal convolutional layers with residual connections, which effectively capture local burst patterns in traffic sequences. To alleviate class imbalance and label scarcity, we use a pretrained CTGAN to generate synthetic minority-class (DDoS attack) samples that complement unlabeled data. Furthermore, to mitigate the effect of noisy pseudo-labels, we introduce a MixUp-Average-Sharpen (MAS) strategy that constructs smoothed and sharpened targets by averaging predictions over augmented views and reweighting them towards high-confidence classes. Experiments on NSL-KDD, BoT-IoT, and CICIoT2023 demonstrate that MixGAN achieves up to 2.5% higher accuracy and 4% improvement in both TPR and TNR compared to state-of-the-art methods, confirming its robustness in large-scale IoT-cloud environments. The source code is publicly available at https://github.com/0xCavaliers/MixGAN.
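The averaging and sharpening steps of the MAS strategy described above can be sketched as follows. This is a minimal illustration, not the MixGAN code: the function name `average_and_sharpen`, the temperature-based sharpening rule, and the example probabilities are all assumptions, and the MixUp step is omitted.

```python
def average_and_sharpen(view_probs, temperature=0.5):
    """Average class probabilities over augmented views, then sharpen them.

    view_probs: one probability vector per augmented view (hypothetical input).
    temperature: assumed sharpening temperature; values below 1 push mass
    towards the most confident class.
    """
    n_views = len(view_probs)
    n_classes = len(view_probs[0])
    # Average the predictions across views, class by class.
    mean = [sum(p[c] for p in view_probs) / n_views for c in range(n_classes)]
    # Raise each averaged probability to 1/T and renormalise.
    powered = [m ** (1.0 / temperature) for m in mean]
    total = sum(powered)
    return [x / total for x in powered]


# Two augmented views of one unlabeled flow, two classes (benign, attack).
views = [[0.6, 0.4], [0.8, 0.2]]
target = average_and_sharpen(views, temperature=0.5)
```

With these made-up inputs the averaged prediction is (0.7, 0.3), and sharpening moves the pseudo-label further towards the first class.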

2606.13671 2026-06-12 cs.LG 新提交

Understanding Truncated Positional Encodings for Graph Neural Networks

理解图神经网络的截断位置编码

James Flora, Mitchell Black, Weng-Keen Wong, Amir Nayyeri

AI总结 研究截断位置编码(如前k个特征空间或邻接矩阵幂)对图神经网络表达能力的影响,理论证明截断后多种位置编码的表达能力存在本质差异,且截断谱位置编码不再强于1-WL测试,实验表明混合截断编码优于单一类型。

详情
Comments
28 pages, 4 figures, ICML 2026
AI中文摘要

位置编码(PEs)在理论和经验上增强了图神经网络(GNNs)的能力。两个最流行的PE家族——谱(例如,拉普拉斯特征空间、有效电阻)和基于游走的(邻接矩阵的多项式)——在表达能力上理论等价,其表达性介于1-WL和3-WL测试之间。然而,这种等价性假设GNN使用这些PE的“完整”版本,这需要$O(n^3)$的时间和空间复杂度。相反,从业者通常使用这些编码的截断变体,例如前$k$个特征空间或邻接矩阵的幂。然而,这些截断PE的理论性质尚不清楚。在这项工作中,我们启动了对这些截断PE的研究。理论上,我们表明,在截断下,几个PE家族在表达能力上存在根本差异。作为推论,我们证明截断谱PE不再强于1-WL测试。我们还研究了一个谱PE家族——$k$-调和距离——以突出即使密切相关的截断PE在表达能力上的差异。最后,我们通过实验表明,在真实世界数据集上,混合截断PE优于任何单一家族。

英文摘要

Positional encodings (PEs) enhance the power of graph neural networks (GNNs), both theoretically and empirically. Two of the most popular families of PEs - spectral (e.g., Laplacian eigenspaces, effective resistance) and walk-based (polynomials of the adjacency matrix) - are theoretically equivalent in expressive power, with expressivity between the 1-WL and 3-WL tests. However, this equivalence assumes the GNN uses the "complete" version of these PEs, which requires $O(n^3)$ time and space complexity. Instead, practitioners commonly use truncated variants of these encodings, such as the first $k$ eigenspaces or powers of the adjacency matrix. However, the theoretical properties of these truncated PEs are unknown. In this work, we initiate the study of these truncated PEs. Theoretically, we show that, under truncation, several families of PEs are fundamentally different in expressive power. As a corollary, we show that truncated spectral PEs are no longer stronger than the 1-WL test. We also study a family of spectral PEs, the $k$-harmonic distances, to highlight the differences in expressive power of even closely related truncated PEs. Finally, we experimentally show that a mix of truncated PEs is preferable to any single family on real-world datasets.
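To make "truncated" concrete, the sketch below computes a walk-based encoding from only the first k powers of the adjacency matrix. It is an illustration under assumptions: using the diagonal of each power (closed-walk counts) as the per-node feature is one common choice, not necessarily the construction analysed in the paper, and the function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def truncated_walk_pe(adj, k):
    """Per node, return the closed-walk counts diag(A^1), ..., diag(A^k)."""
    n = len(adj)
    power = [row[:] for row in adj]  # starts at A^1
    features = [[] for _ in range(n)]
    for _ in range(k):
        for i in range(n):
            features[i].append(power[i][i])
        power = matmul(power, adj)  # move on to the next power
    return features


# A triangle graph: every node has degree 2 and lies on one triangle.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
pe = truncated_walk_pe(triangle, 3)
```

Each power costs a matrix product, so stopping at a small k is what makes the truncated variant cheaper than the complete encoding.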

2606.13658 2026-06-12 cs.AI 新提交

Before You Think: System 0, AI-Mediated Cognition and Cognitive Colonization

在你思考之前:系统0、AI中介认知与认知殖民化

Marianna Bergamaschi Ganapini, Massimo Chiriatti, Enrico Panai, Giuseppe Riva

AI总结 本文比较三种AI认知框架,提出系统0具有独特理论地位,并引入“认知殖民化”概念,指出AI系统能将外部利益嵌入自我架构,构成难以察觉的影响。

详情
AI中文摘要

本文考察了三种用于理解人工智能的认知和认识后果的最新框架:三系统理论、思维框架和系统0。本文认为,虽然前两种框架捕捉了AI对个体推理和集体认识实践影响的重要维度,但系统0占据了一个理论上的独特地位,其他两者都无法完全复制。本文引入了认知殖民化的概念,根据这一概念,AI系统能够以用户难以察觉的方式将外部利益嵌入自我架构中。由于此类系统已广泛部署,理解这些无形的影响形式是一项紧迫的哲学和实践任务。

英文摘要

This paper examines three recent frameworks for understanding the cognitive and epistemic consequences of artificial intelligence: Tri-System Theory, Thinkframes, and System 0. It argues that while the first two capture important dimensions of AI's influence on individual reasoning and collective epistemic practices, System 0 occupies a theoretically distinctive position that neither can fully replicate. The paper introduces the concept of cognitive colonization, according to which AI systems can embed external interests within the architecture of the self in ways that are difficult for users to perceive. Because such systems are already widely deployed, understanding these invisible forms of influence is an urgent philosophical and practical task.

2606.13655 2026-06-12 cs.CV cs.GR 新提交

Flex4DHuman: Flexible Multi-view Video Diffusion for 4D Human Reconstruction

Flex4DHuman:面向4D人体重建的灵活多视角视频扩散模型

Jen-Hao Cheng, Yipeng Wang, Hao Zhang, Gengshan Yang, Jenq-Neng Hwang

AI总结 提出Flex4DHuman,一种基于相对相机位姿条件化的多视角视频扩散模型,无需显式几何先验即可将单目或稀疏多视角视频转换为密集多视角视频,并用于4D高斯溅射重建。

详情
Comments
18 pages, 8 figures. Code and multi-view caption dataset available
AI中文摘要

我们提出Flex4DHuman,一种多视角视频扩散模型,它通过仅使用相对相机位姿条件化,将动态主体的单目或稀疏多视角视频转换为同步的密集多视角视频。与先前依赖骨架、深度图、法线或渲染目标视角几何的人体中心方法不同,Flex4DHuman不需要显式几何先验,而是通过相对相机位姿位置编码来条件化生成。生成的视频可直接被下游重建流程用于创建动态4D高斯溅射。基于Wan 2.1 1.3B文本到视频模型,Flex4DHuman保留了骨干架构,并通过五轴位置编码编码相机和视角信息,该编码将时空RoPE扩展了视角索引和连续SE(3)相对相机几何。三阶段课程逐步训练模型以进行位姿跟随、灵活的参考到目标视角生成以及时间展开。为支持时间展开,我们使用干净的历史目标视角令牌进行训练。我们还添加了多视角字幕以实现测试时文本控制。结合现成的4D高斯溅射阶段,我们的框架将单目静态相机视频提升为动态4D高斯溅射。在DNA-Rendering和ActorsHQ上的实验表明,Flex4DHuman超越了先前最先进的方法,而相同的公式在混合人体-动物训练后泛化到动物类别。这些能力使Flex4DHuman成为从随意单目视频进行可扩展4D内容创建的实际一步,适用于仿真、游戏、AR/VR和视频重拍。

英文摘要

We present Flex4DHuman, a multi-view video diffusion model that transforms a monocular or sparse multi-view video of a dynamic subject into synchronized dense multi-view videos using only relative camera-pose conditioning. Unlike prior human-centric methods that rely on skeletons, depth maps, normals, or rendered target-view geometry, Flex4DHuman requires no explicit geometry priors and instead conditions generation through relative camera-pose positional encoding. The generated videos can be directly ingested by downstream reconstruction pipelines to create dynamic 4D Gaussian splats. Built on the Wan 2.1 1.3B text-to-video model, Flex4DHuman preserves the backbone architecture and encodes camera and view information through a five-axis positional encoding that extends spatio-temporal RoPE with view indices and continuous SE(3) relative camera geometry. A three-stage curriculum progressively trains the model for pose following, flexible reference-to-target view generation, and temporal rollout. To support temporal rollout, we train with clean historical target-view tokens. We also add multi-view captions to enable test-time text control. Combined with an off-the-shelf 4D Gaussian Splatting stage, our framework lifts monocular static-camera videos into dynamic 4D Gaussian splats. Experiments on DNA-Rendering and ActorsHQ show that Flex4DHuman surpasses prior state-of-the-art methods, while the same formulation generalizes to animal categories after mixed human-animal training. These capabilities make Flex4DHuman a practical step toward scalable 4D content creation from casual monocular videos for simulation, gaming, AR/VR, and video re-shooting.

2606.13637 2026-06-12 cs.LG 新提交

The Stable Recovery Manifold: Geometric Principles Governing Recoverability in Continual Learning

稳定恢复流形:持续学习中可恢复性的几何原理

Ayushman Trivedi, Bhavika Melwani

AI总结 通过分析Split CIFAR-100上ResNet-18的顺序学习,发现遗忘知识在表示重组后仍可紧凑解码,提出稳定恢复流形假说,表明灾难性遗忘主要是可访问性和流形对齐问题。

详情
Comments
9 pages, 8 figures, 8 tables
AI中文摘要

灾难性遗忘通常被视为顺序学习过程中先前学习知识的破坏。基于可访问性崩溃框架,我们研究了持续学习中可恢复性的几何结构。使用Split CIFAR-100和顺序训练的ResNet-18,我们分析了十个任务上的可恢复性、表示漂移和恢复复杂度。我们引入了恢复子空间维度(k_t),即保持完整探针性能90%所需的最小奇异方向数量。与我们的可恢复性扩散假说相反,尽管存在显著的表示漂移,恢复维度在整个训练过程中保持稳定(平均k_t = 8.0)。主角度漂移强烈预测可恢复性(r = -0.862),一个简单的几何模型解释了82.2%的可恢复性方差。这些发现支持稳定恢复流形假说,表明遗忘的知识在表示重组后仍可紧凑解码。结果表明,灾难性遗忘主要是一个可访问性和流形对齐问题,而非信息破坏。

英文摘要

Catastrophic forgetting is often viewed as the destruction of previously learned knowledge during sequential learning. Building on the Accessibility Collapse framework, we investigate the geometric structure of recoverability in continual learning. Using Split CIFAR-100 and a sequentially trained ResNet-18, we analyze recoverability, representational drift, and recovery complexity across ten tasks. We introduce Recovery Subspace Dimensionality (k_t), a measure of the minimum number of singular directions required to preserve 90 percent of full probe performance. Contrary to our Recoverability Diffusion hypothesis, recovery dimensionality remains stable throughout training (mean k_t = 8.0) despite substantial representational drift. Principal-angle drift strongly predicts recoverability (r = -0.862), and a simple geometric model explains 82.2 percent of recoverability variance. These findings support the Stable Recovery Manifold hypothesis, suggesting that forgotten knowledge remains compactly decodable despite representational reorganization. The results indicate that catastrophic forgetting is primarily an accessibility and manifold-alignment problem rather than information destruction.
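The definition of k_t given above reduces to a threshold search once probe accuracy is known for each number of retained singular directions. The sketch below assumes those accuracies have already been measured; the function name and the numbers are illustrative and are not results from the paper.

```python
def recovery_dimensionality(acc_by_k, full_acc, fraction=0.9):
    """Smallest k whose probe accuracy reaches `fraction` of full accuracy.

    acc_by_k[i] is the probe accuracy when only the top (i + 1) singular
    directions are kept (hypothetical, pre-computed values).
    Returns None if no k reaches the threshold.
    """
    threshold = fraction * full_acc
    for k, acc in enumerate(acc_by_k, start=1):
        if acc >= threshold:
            return k
    return None


# Made-up accuracies for k = 1..5 and a made-up full-probe accuracy.
acc_by_k = [0.30, 0.52, 0.66, 0.71, 0.74]
k_t = recovery_dimensionality(acc_by_k, full_acc=0.78)
```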

2606.13603 2026-06-12 cs.LG cs.AI cs.CL 新提交

Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models

超越承诺边界:探究大型推理模型中的附带思维链

Daniel Scalena, Sara Candussio, Luca Bortolussi, Elisabetta Fersini, Malvina Nissim, Gabriele Sarti

AI总结 通过早期退出估计思维链步骤的因果重要性,发现推理中存在从瞬态猜测到稳定答案的“承诺边界”,后续步骤为附带现象,可提前退出以缩短推理长度达55%而不影响性能。

详情
AI中文摘要

思维链推理是语言模型推理时扩展的主导范式,但每个步骤对最终答案的因果影响尚不明确。我们通过早期退出估计每个步骤的因果重要性,并利用这一度量研究多个模型家族的推理轨迹中答案如何形成。在多种任务中,我们发现推理通常会跨越一个“承诺边界”——从瞬态中间猜测到稳定、高置信度答案的急剧转变。这种转变通常发生在单个步骤中,远在模型推理块结束之前,随后是“附带”的思维链步骤,这些步骤不改变最终答案概率。利用注意力探针,我们表明答案形成阶段可以从中间推理步骤中以高精度线性解码,并稳健地泛化到未见过的推理任务。我们利用这一信号在承诺边界处提前退出推理块,平均将思维链长度减少高达55%,而对模型性能影响微乎其微。

英文摘要

Chain-of-thought (CoT) reasoning is the dominant paradigm for inference-time scaling in language models, yet the causal influence of individual steps on the final answer remains poorly understood. We estimate each step's causal importance via early exit and use this measure to study how answers form across the reasoning traces of several model families. Across diverse tasks, we find that reasoning typically crosses a "commitment boundary" -- a sharp transition from transient intermediate guesses to a stable, high-confidence answer. This transition often happens in a single step, well before the model's reasoning block ends, and is followed by "epiphenomenal" CoT steps that leave the final answer probability unaltered. Using attention probes, we show that answer-formation stages can be linearly decoded from intermediate reasoning steps with high accuracy and that this decoding generalizes robustly to unseen reasoning tasks. We exploit this signal to early-exit reasoning blocks at the commitment boundary, reducing the length of CoTs by up to 55% on average with negligible impact on model performance.
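One simple way to locate such a boundary from early-exit measurements is sketched below. The operational definition used here (the earliest step from which the final-answer probability stays above a threshold) is an assumption for illustration, as are the function name, the threshold, and the example trace; the paper's own criterion may differ.

```python
def commitment_boundary(answer_probs, threshold=0.9):
    """Earliest step from which P(final answer) stays >= threshold.

    answer_probs[t] is the probability of the final answer when the model
    is forced to answer after step t (hypothetical early-exit values).
    Returns None if the last step itself is below the threshold.
    """
    boundary = None
    # Walk backwards from the last step while the answer remains stable.
    for step in range(len(answer_probs) - 1, -1, -1):
        if answer_probs[step] >= threshold:
            boundary = step
        else:
            break
    return boundary


# Made-up trace: transient guesses, then a sharp and stable commitment.
probs = [0.10, 0.35, 0.20, 0.95, 0.97, 0.96]
b = commitment_boundary(probs)
# Fraction of steps that could be skipped by exiting at the boundary.
saved = 1 - (b + 1) / len(probs)
```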

2606.13598 2026-06-12 cs.AI cs.CL cs.LG cs.MA 新提交

Reward Modeling for Multi-Agent Orchestration

多智能体编排的奖励建模

King Yeung Tsang, Zihao Zhao, Vishal Venkataramani, Haizhou Shi, Zixuan Ke, Semih Yavuz, Shafiq Joty, Hao Wang

AI总结 提出OrchRM框架,通过自监督学习从多智能体执行中间产物构建奖励模型,无需人工标注,实现高效编排器训练和测试时扩展,在多个领域提升性能并降低计算成本。

详情
Comments
Preprint; work in progress
AI中文摘要

基于大型语言模型(LLM)的多智能体系统(MAS)需要有效的编排来协调专门化的智能体,然而训练这样的编排器受到有限监督和高计算成本的阻碍。我们提出了编排奖励建模(OrchRM),一种无需人工标注即可评估编排质量的自监督框架。OrchRM利用多智能体执行过程中的中间产物来构建Bradley-Terry奖励模型训练的胜负对。与现有的依赖昂贵子智能体展开的MAS测试时扩展和编排器训练框架不同,OrchRM直接在编排层面操作,实现了高效且高性能的奖励引导编排器训练和MAS测试时扩展。OrchRM在token使用上提高了高达10倍的训练效率,同时将MAS测试时扩展的准确率提升了高达8%。这些增益在多个领域(包括数学推理、基于网络的问答和多跳推理)中一致迁移,证明了编排级奖励建模作为鲁棒多智能体编排的可扩展方向。代码将在 https://github.com/Wang-ML-Lab/OrchRM 提供。

英文摘要

Multi-Agent Systems (MAS) built on Large Language Models (LLMs) require effective orchestration to coordinate specialized agents, yet training such orchestrators is hindered by limited supervision and high computational cost. We propose Orchestration Reward Modeling (OrchRM), a self-supervised framework for evaluating orchestration quality without human annotations. OrchRM leverages intermediate artifacts from multi-agent executions to construct win-lose pairs for Bradley-Terry reward model training. Unlike existing MAS test-time scaling and orchestrator training frameworks that rely on costly sub-agent rollouts, OrchRM operates directly at the orchestration level, enabling efficient and high-performing reward-guided orchestrator training and MAS test-time scaling. OrchRM improves training efficiency by up to 10x in token usage while improving MAS test-time scaling performance by up to 8% in accuracy. These gains consistently transfer across multiple domains, including mathematical reasoning, web-based question answering, and multi-hop reasoning, demonstrating orchestration-level reward modeling as a scalable direction for robust multi-agent orchestration. Code will be available at https://github.com/Wang-ML-Lab/OrchRM.
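For reference, the Bradley-Terry objective on a single win-lose pair is the negative log-probability that the winning item outranks the losing one, with that probability modelled as a sigmoid of the score difference. The sketch below shows only this standard pairwise loss; how OrchRM scores orchestrations and builds its pairs is not specified here, and the function name is hypothetical.

```python
import math


def bradley_terry_loss(score_win, score_lose):
    """Negative log-likelihood that the 'win' item beats the 'lose' item.

    Under Bradley-Terry, P(win beats lose) = sigmoid(score_win - score_lose),
    so the loss is log(1 + exp(-(score_win - score_lose))).
    """
    margin = score_win - score_lose
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


loss_tied = bradley_terry_loss(0.0, 0.0)   # equal scores
loss_good = bradley_terry_loss(2.0, 0.0)   # winner scored higher
loss_bad = bradley_terry_loss(0.0, 2.0)    # winner scored lower
```

Minimising this loss over many pairs pushes the reward model to score the winning item above the losing one.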

2606.13587 2026-06-12 cs.CV 新提交

Towards Effective Waste Segmentation for Automated Waste Recycling in Cluttered Background

面向杂乱背景下的自动废物回收的有效废物分割

Mamoona Javaid, Mubashir Noman, Abdul Hannan, Shah Nawaz, Mustansar Fiaz, Sajid Ghuffar

AI总结 提出一种结合空间域和谱域的级联分割网络,并引入辅助特征增强模块,在杂乱场景下实现高效废物分割,在三个数据集上验证了有效性。

详情
Comments
accepted at ICML 2026
AI中文摘要

城市区域的快速扩张和人口增长导致废物产量急剧增加,这需要高效自动化的废物管理。在此背景下,使用深度学习的自动废物回收(AWR)可以帮助人类实现最优废物管理。最近的AWR深度学习方法提供了有前景的废物分割性能,但这些方法依赖大型骨干网络,对AWR系统效率低下,且在杂乱场景中性能下降。为此,本文引入了一种最优废物分割网络,该网络有效利用空间域捕获局部结构依赖性和谱域高效提取全局上下文关系。这种级联设计使网络能够逐步利用互补域中的局部和全局表示,突出有效分割各种废物对象所需的语义信息。此外,引入了辅助特征增强模块(AFEM),以增强目标对象的边界和斑点放大,从而在杂乱场景中实现更好的分割。在ZeroWaste-aug、ZeroWaste-f和SpectralWaste数据集上的大量实验揭示了所提出方法的优势。

英文摘要

Rapid expansion of urban areas and population growth are causing an immense increase in waste production, which demands efficient and automated waste management. In this scenario, automated waste recycling (AWR) using deep learning methods can assist humans in optimal waste management. Recent deep learning approaches for AWR provide promising waste segmentation performance; however, these methods rely on large backbone networks that are inefficient for AWR systems and suffer from performance deterioration in cluttered scenes. To this end, an optimal waste segmentation network is introduced which effectively utilizes the spatial domain to capture localized structural dependencies and the spectral domain to efficiently extract global contextual relationships. This cascaded design allows the network to progressively leverage both local and global representations across complementary domains to highlight the semantic information necessary for effective segmentation of various waste objects. Furthermore, an auxiliary feature enhancement module (AFEM) is introduced to enhance the target objects' boundaries and blob amplification for better segmentation in cluttered scenarios. Extensive experimentation on the ZeroWaste-aug, ZeroWaste-f and SpectralWaste datasets reveals the merits of the proposed method.

2606.13566 2026-06-12 cs.AI 新提交

A Three-Layer Framework for AI in Scientific Discovery

人工智能在科学发现中的三层框架

Guojun Liao

AI总结 提出AI在科学发现中的三层框架,核心创新是第二层:通过定性推理进行模型形成,识别框架结构不足并寻找缺失概念,通过三个案例说明其重要性。

详情
AI中文摘要

当前关于人工智能在科学发现中的讨论往往被两种可见的能力所主导:对现有知识的搜索以及通过优化、模拟和自动化的执行。两者都很重要,但都没有完全捕捉到发现的核心行为:模型的形成和演化。本文提出了AI在发现中的三层视图。第一层是大语言模型的搜索与检索。第二层,作为本文的主要创新,是通过定性推理进行模型形成:识别当前框架在结构上不足的能力,并在更广泛的表示空间中理解问题,不是通过试错,而是通过结构性的洞察,了解缺失了什么以及可以在哪里找到。第三层是执行、优化和细化。主要主张是第二层既是最重要的,也是发展最不充分的。没有模型形成的搜索仍然局限于继承的框架,而没有概念修订的执行只会放大现有的表述。我们通过三个案例研究来说明第二层推理:陈省身对高斯-博内定理的内蕴证明,通过李雅普诺夫函数解决内斯特罗夫加速梯度收敛问题,以及OpenAI在2026年自主反驳埃尔德什单位距离猜想。每个案例都表现出相同的结构特征:一个已经变得不充分的框架,一个缺失的概念对象,以及在一个意想不到的邻近领域中找到的解决方案。

英文摘要

Current discussions of AI in scientific discovery are often dominated by two visible capabilities: search over existing knowledge and execution through optimization, simulation, and automation. Both are important, but neither fully captures the central act of discovery: the formation and evolution of models. This paper proposes a three-layer view of AI in discovery. Layer 1 is search and retrieval by large language models. Layer 2, as the main innovation of this paper, is model formation through qualitative reasoning: the capacity to recognize when a current framework is structurally inadequate and to understand the problem within a broader representational space, not through trial and error, but through structural insight into what is missing and where it can be found. Layer 3 is execution, optimization, and refinement. The main claim is that Layer 2 is both the most important and the least developed. Search without model formation remains confined to inherited frameworks, while execution without conceptual revision only amplifies an existing formulation. We illustrate Layer 2 reasoning through three case studies: S. S. Chern's intrinsic proof of the Gauss-Bonnet theorem, the resolution of the Nesterov Accelerated Gradient convergence problem via Lyapunov functions, and the autonomous disproof of the Erdős unit distance conjecture by OpenAI in 2026. Each case exhibits the same structural signature: a framework that had become inadequate, a missing conceptual object, and a resolution found in an unexpected neighboring field.

2606.13565 2026-06-12 cs.LG 新提交

A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding

A2D2: 任意长度离散扩散模型的自适应解码微调

Sophia Tang, Yuchen Zhu, Molei Tao, Pranam Chatterjee

AI总结 提出A2D2框架,通过联合优化插入和去掩蔽策略及基于质量的推理调度,实现任意长度离散扩散模型的奖励引导微调,理论上保证收敛到奖励倾斜分布,实验提升奖励优化与生成灵活性和准确性。

详情
AI中文摘要

离散扩散模型为序列生成提供了一个简单且稳定的基于似然的框架,最近通过令牌插入扩展到任意长度设置。然而,针对任意长度离散扩散的原则性奖励引导微调方法仍基本未被探索。我们引入了任意长度离散扩散模型的自适应解码微调(A2D2),这是一个统一的框架,通过联合优化插入和去掩蔽策略以及基于质量的推理调度,实现任意长度离散扩散模型的奖励引导微调。我们推导了联合插入-去掩蔽路径测度的Radon-Nikodym导数,从而在不需要目标样本的情况下,理论上保证收敛到难以处理的奖励倾斜序列分布。在此基础上,我们将去掩蔽和插入质量确立为最小化解码误差的可行方法,并引入自适应联合解码(AJD)损失,该损失可证明地生成产生奖励倾斜分布的最优路径测度。实验上,A2D2在提高奖励优化的同时,相比先前的固定长度微调和推理时引导方法,增强了生成的灵活性和准确性。

英文摘要

Discrete diffusion models offer a simple and stable likelihood-based framework for sequence generation, recently extended to any-length settings via token insertion. Principled reward-guided fine-tuning for any-length discrete diffusion, however, remains largely unexplored. We introduce Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding (A2D2), a unified framework for reward-guided fine-tuning of any-length discrete diffusion models via joint optimization of the insertion and unmasking policies together with a quality-based inference schedule. We derive the Radon-Nikodym derivative for the joint insertion-unmasking path measures, enabling theoretically guaranteed convergence to the intractable reward-tilted sequence distribution without requiring target samples. Building on this, we establish unmasking and insertion quality as tractable approaches for minimizing decoding error and introduce the Adaptive Joint Decoding (AJD) loss, which provably yields the optimal path measure that generates the reward-tilted distribution. Empirically, A2D2 improves reward optimization while enhancing generation flexibility and accuracy over prior fixed-length fine-tuning and inference-time guidance methods.
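A reward-tilted distribution reweights a reference distribution by the exponentiated reward and renormalises. On a small finite set this can be computed exactly, as the sketch below shows; for sequence models the normaliser is intractable, which is why fine-tuning is needed. The temperature parameter `alpha`, the function name, and the toy numbers are assumptions, and the exact parametrisation used by A2D2 may differ.

```python
import math


def reward_tilt(probs, rewards, alpha=1.0):
    """Tilt a discrete distribution: p*(x) proportional to p(x) * exp(r(x) / alpha)."""
    weights = [p * math.exp(r / alpha) for p, r in zip(probs, rewards)]
    z = sum(weights)  # normalising constant, tractable only for toy cases
    return [w / z for w in weights]


# Two equally likely sequences; the second has reward log(3), the first 0.
tilted = reward_tilt([0.5, 0.5], [0.0, math.log(3.0)])
```

In this toy case the second sequence becomes three times as likely as the first after tilting.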

2606.13461 2026-06-12 cs.LG cs.CV 新提交

Reinforcement Learning for Neural Model Editing

神经模型编辑的强化学习

Shaivi Malik

AI总结 提出将神经模型编辑形式化为强化学习问题,通过奖励反馈学习编辑策略,在偏见缓解和机器遗忘任务上取得良好效果。

详情
AI中文摘要

编辑预训练神经网络需要针对特定目标定制的专用算法。设计此类算法通常耗时且需要大量精力。我们提出了一个探索性框架,将神经模型编辑形式化为强化学习问题,其中智能体使用奖励反馈修改模型。我们引入了两个环境:MaskWorld,其中智能体以乘法方式缩放权重;以及ShiftWorld,其中智能体应用加法权重更新。奖励函数结合了效用保持目标和任务特定编辑目标,使智能体能够在保持整体模型性能的同时学习有针对性的修改。我们在文本分类中的偏见缓解和图像分类中的机器遗忘上评估了该框架,这两者传统上都依赖于专用算法。我们的结果表明,在遗忘任务中,学习到的策略将遗忘集准确率降至接近0%,同时保留集准确率保持在90%以上。在偏见缓解设置中,学习到的策略将偏见相关性能提高了5%以上,同时保持了一般分类效用。我们的发现表明,神经模型编辑可以转化为强化学习问题,从而可以从奖励反馈中学习编辑策略,而不是为每个任务手动设计。

英文摘要

Editing pretrained neural networks requires specialized algorithms tailored to specific objectives. Designing such algorithms is often time-consuming and demands significant effort. We present an exploratory framework that formulates neural model editing as a reinforcement learning problem, where agents modify models using reward feedback. We introduce two environments: MaskWorld, where agents scale weights multiplicatively, and ShiftWorld, where agents apply additive weight updates. The reward function combines a utility-preservation objective with a task-specific editing objective, enabling agents to learn targeted modifications while maintaining overall model performance. We evaluate the framework on bias mitigation in text classification and machine unlearning in image classification, both of which traditionally rely on specialized algorithms. Our results show that the learned policies reduce forget set accuracy to nearly 0% while preserving over 90% retain set accuracy on the unlearning task. In the bias mitigation setting, the learned policies improve bias-related performance by more than 5% while maintaining general classification utility. Our findings show that neural model editing can be cast as a reinforcement learning problem, allowing editing policies to be learned from reward feedback rather than manually engineered for each task.
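The two action types and the combined reward described above can be sketched as follows. This is only an illustration: flat lists stand in for weight tensors, and the weighted-sum reward with a `trade_off` coefficient is an assumed form, not the paper's exact reward function.

```python
def mask_action(weights, scales):
    """MaskWorld-style edit: scale each weight multiplicatively."""
    return [w * s for w, s in zip(weights, scales)]


def shift_action(weights, deltas):
    """ShiftWorld-style edit: apply an additive update to each weight."""
    return [w + d for w, d in zip(weights, deltas)]


def reward(utility_score, edit_score, trade_off=1.0):
    """Combine utility preservation with the task-specific editing goal.

    For unlearning, utility_score could be retain-set accuracy and
    edit_score could be 1 - forget-set accuracy (both hypothetical here).
    """
    return utility_score + trade_off * edit_score


masked = mask_action([1.0, 2.0], [0.0, 0.5])
shifted = shift_action([1.0, 2.0], [0.5, -2.0])
r = reward(utility_score=0.9, edit_score=1.0, trade_off=0.5)
```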

2606.13451 2026-06-12 cs.LG 新提交

Uncertainty Estimation for Molecular Diffusion Models

分子扩散模型的不确定性估计

Paul Seij, Christian A. Naesseth, Stephan Mandt, Metod Jazbec

AI总结 提出一种事后方法,利用去噪网络的拉普拉斯近似估计预训练分子扩散模型中每个样本的不确定性,该分数与样本质量负相关,可用于过滤生成样本。

详情
AI中文摘要

扩散模型已被广泛用于三维分子生成,但它们没有提供关于生成分子何时可能质量低下的原则性信号。我们提出了一种事后方法,用于估计预训练分子扩散模型中每个样本的不确定性。基于去噪网络的拉普拉斯近似,我们测量了生成轨迹中噪声预测的变异性。实验表明,所得的不确定性分数能够反映样本质量,与已建立的样本级质量指标呈负相关。我们进一步研究了如何使用所提出的不确定性分数来过滤生成的样本,通过测试时缩放提高模型性能。

英文摘要

Diffusion models have seen wide adoption for 3D molecular generation, yet they offer no principled signal of when a generated molecule is likely to be of low quality. We propose a post-hoc method for estimating per-sample uncertainty in pretrained molecular diffusion models. Building on a Laplace approximation of the denoising network, we measure the variability of the noise prediction across the generation trajectory. Empirically, we show that the resulting uncertainty score is informative of sample quality, exhibiting a negative correlation with established sample-level quality metrics. We further study how the proposed uncertainty score can be used to filter generated samples, improving model performance via test-time scaling.
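Filtering by an uncertainty score amounts to ranking generated samples and keeping the least uncertain ones. The sketch below assumes the per-sample scores are already available; the function name, the `keep_fraction` parameter, and the example values are hypothetical, and computing the scores through the Laplace approximation is not shown.

```python
def filter_by_uncertainty(samples, scores, keep_fraction=0.5):
    """Keep the samples with the lowest uncertainty scores, in original order."""
    # Rank sample indices by ascending uncertainty.
    ranked = sorted(zip(scores, range(len(samples))))
    n_keep = max(1, int(len(samples) * keep_fraction))
    keep_idx = sorted(i for _, i in ranked[:n_keep])
    return [samples[i] for i in keep_idx]


# Four placeholder molecules with made-up uncertainty scores.
molecules = ["mol_a", "mol_b", "mol_c", "mol_d"]
uncertainty = [0.9, 0.1, 0.5, 0.2]
kept = filter_by_uncertainty(molecules, uncertainty, keep_fraction=0.5)
```

Generating more samples and then filtering in this way trades extra inference compute for higher expected sample quality.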

2606.13443 2026-06-12 cs.LG 新提交

How Much Memory Do We Need? Adaptive Memory Gate for Neural Operators

我们需要多少记忆?神经算子的自适应记忆门

Jihyeon Hur, Yongseok Kwon, Min-Gi Jo, Jeongwhan Choi, Noseong Park

AI总结 针对现有神经算子固定记忆权重适应性不足的问题,提出AMGFNO,通过可学习门动态调节记忆权重,在低分辨率下nRMSE降低55-79%。

详情
AI中文摘要

神经算子已成为求解时间相关PDE的强大数据驱动方法。在最近的进展中,记忆增强神经算子显式地纳入过去状态,并在低分辨率观测设置下取得了显著性能。然而,现有方法无论观测条件(如分辨率或物理参数)如何,都应用固定的记忆权重,限制了其适应性。我们的初步实验表明,最优记忆权重随分辨率和粘度变化,这意味着固定记忆权重无法同时优化不同设置下的性能。我们提出AMGFNO,通过可学习门动态调节记忆权重。在Kuramoto-Sivashinsky和Burgers方程上,AMGFNO在低分辨率下实现了55-79%的nRMSE降低,且学习到的门值随分辨率增加自动从$\bar{g} \approx 0.7$降至接近零。

英文摘要

Neural operators have emerged as a powerful data-driven approach for solving time-dependent PDEs. Among recent advances, memory-augmented neural operators explicitly incorporate past states and have achieved remarkable performance under low-resolution observation settings. However, existing approaches apply a fixed memory weight regardless of observation conditions, such as resolution or physical parameters, limiting their adaptability. Our preliminary experiments reveal that the optimal memory weight varies with resolution and viscosity, implying that a fixed memory weight cannot simultaneously optimize performance across diverse settings. We propose AMGFNO, which dynamically modulates memory weight through a learnable gate. On the Kuramoto-Sivashinsky and Burgers' equations, AMGFNO achieves 55-79% nRMSE reduction at low resolution, with the learned gate value automatically decreasing from $\bar{g} \approx 0.7$ to near-zero as resolution increases.
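A learnable gate of this kind can be pictured as a sigmoid-bounded weight that mixes a memory term into the current state. The sketch below is a hypothetical form chosen for illustration: the convex combination, the scalar `gate_logit`, and the function names are assumptions, and the actual AMGFNO gate may be conditioned and applied differently.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_state(current, memory, gate_logit):
    """Mix a memory term into the current state with gate g in (0, 1).

    gate_logit stands in for a learnable parameter (or a network output);
    g near 0 ignores memory, g near 1 relies on it heavily.
    """
    g = sigmoid(gate_logit)
    mixed = [(1.0 - g) * c + g * m for c, m in zip(current, memory)]
    return mixed, g


mixed, g = gated_state([1.0, 2.0], [3.0, 6.0], gate_logit=0.0)
# A strongly negative logit drives the gate towards zero.
no_memory, g_small = gated_state([1.0, 2.0], [3.0, 6.0], gate_logit=-50.0)
```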