arXivDaily: daily arXiv paper digest, updated Monday to Friday
All subject categories 2085
2605.07886 2026-05-11 stat.ML cs.LG

Characterizing and Correcting Effective Target Shift in Online Learning

Ziyan Li, Naoki Hiratani

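AI summary: This paper studies the effective target shift that arises in online learning under distribution shift. Through the lens of kernel regression, it characterizes the relationship between online and offline learning and shows that online kernel regression is equivalent to offline regression with shifted target outputs. Using target correction, the paper proves that online learning can learn the same predictor as offline learning, and it derives both closed-form and iterative target corrections. Experiments show that the approach outperforms online gradient descent with the true targets in continual learning tasks, providing a framework for analyzing and improving online learning in non-stationary environments.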

Comments 22 pages; 6 figures

Abstract

Online learning from a stream of data is a defining feature of intelligence, yet modern machine learning systems often struggle in this setting, especially under distributional shift. To understand its basic properties, we study the relationship between online and offline learning in the context of kernel regression. We derive a closed-form expression for the function learned by online kernel regression, revealing that online kernel regression is equivalent to offline regression with shifted, inaccurate target outputs. Conversely, we show that by compensating for this effective shift in the teaching signal through target correction, online kernel-based learning can provably learn the same predictor as its offline counterpart. We derive both a closed-form expression for this target correction and an iterative form that can be applied sequentially. Applying this framework to image classification tasks on CIFAR-10 and CORe50, we show that online stochastic gradient descent with iteratively corrected targets outperforms learning with the true targets in continual learning settings. This work therefore provides a basic framework for analyzing and improving online learning in non-stationary environments.
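To make the stated equivalence concrete, here is a minimal sketch under our own assumptions (not the paper's derivation): the online learner is one pass of functional gradient descent on the squared loss with step size `lr`, and the kernel is an RBF kernel. Because the online predictor is a kernel expansion f_T(x) = Σ_s α_s k(x_s, x), it is exactly what offline kernel interpolation would return for the effective targets y_eff = Kα, which generally differ from the true targets. All function names are hypothetical, and the paper's closed-form and iterative corrections are not reproduced.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Gaussian (RBF) kernel on scalar inputs (the kernel choice is an assumption)."""
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))

def online_kernel_regression(xs, ys, lr):
    """One pass of functional SGD on the squared loss:
    f_t = f_{t-1} + lr * (y_t - f_{t-1}(x_t)) * k(x_t, .).
    Returns alpha such that f_T(x) = sum_s alpha_s * k(x_s, x)."""
    alphas = []
    for x_t, y_t in zip(xs, ys):
        # Prediction of the current function on the incoming point.
        f_xt = sum(a * rbf(xs[s], x_t) for s, a in enumerate(alphas))
        alphas.append(lr * (y_t - f_xt))
    return alphas

def effective_targets(xs, alphas):
    """Targets for which offline kernel interpolation (alpha = K^{-1} y_eff)
    returns these exact coefficients: y_eff = K @ alpha = f_T evaluated on xs."""
    return [sum(rbf(x_i, x_j) * a for x_j, a in zip(xs, alphas)) for x_i in xs]

xs, ys = [0.0, 0.5], [1.0, -1.0]
alphas = online_kernel_regression(xs, ys, lr=1.0)
y_eff = effective_targets(xs, alphas)
# With lr=1 the last target is fit exactly, while the first target is pulled
# away from 1.0 by the later, overlapping update.
print(y_eff)
```

In this toy case the earlier example's effective target drifts because the later kernel update overlaps it, which is the kind of interference the abstract attributes to online learning.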

2605.07838 2026-05-11 q-bio.QM cs.AI cs.LG

PPI-Net connects molecular protein interactions to functional processes in disease

Kyle Higgins, Guadalupe Gonzalez, Dennis Veselkov, Ivan Laponogov, Kirill Veselkov

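AI summary: This study proposes PPI-Net, a hierarchical graph neural network that integrates protein-protein interaction networks with pathway-level representations to reveal how molecular interactions drive functional processes in disease. Using graph attention, the model propagates patient-specific molecular features through a shared biological interaction network, aggregating signals from genes up to higher-order biological processes. Experiments show that PPI-Net achieves strong predictive performance on multiple cancer datasets, and that integrating multi-omics data improves the model's interpretability, revealing key cancer-related signaling pathways and biological mechanisms.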

Comments 17 pages, 3 figures, 2 tables

Abstract

Understanding how molecular alterations propagate across biological systems to drive disease remains a central challenge. Although high-throughput profiling enables comprehensive characterization of tumor states, most models neglect structured biological relationships or lack interpretability across scales. Here we present PPI-Net, a hierarchical graph neural network that integrates protein-protein interaction (PPI) networks with pathway-level representations to model disease from molecular interactions to functional processes. Patient-specific molecular profiles are embedded within a shared interaction network from STRING and propagated through a multi-layer Reactome hierarchy using graph attention, enabling aggregation of gene-level signals into higher-order biological programs. Across RNA-seq data from ten cancer types from The Cancer Genome Atlas, PPI-Net achieves robust predictive performance, with balanced accuracy exceeding 90% in multiple cohorts. Comparative analysis on RNA-Seq data from breast cancer demonstrated that PPI-Net's integration of the Reactome hierarchy improved balanced accuracy by 6.7% relative to a PPI-only model, while hierarchical multi-level supervision improved balanced accuracy by 12.3% relative to using only a single top-level prediction head. Applying a multi-omics approach using RNA-seq and methylation data improves model interpretation, recovering canonical oncogenic modules, including TP53-AKT signaling and stress response pathways, while revealing convergence onto coherent programs such as ion signaling and cellular responses to stimuli. These results demonstrate that integrating interaction networks with pathway hierarchies enables accurate prediction while providing mechanistic insight into cancer biology.

2605.07830 2026-05-11 cs.CR cs.AI

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim, Hoki Kim

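AI summary: This paper presents CyBiasBench, a benchmark for evaluating biased behavior of large language model (LLM) agents in cyber-attack scenarios. The study finds that agents show significant bias in attack selection, concentrating on particular types of attacks, and that this bias is unaffected by prompt variations. By systematically evaluating five agents under different targets and prompt conditions, the authors reveal differences in the entropy of attack distributions and a bias momentum effect, indicating that an agent's attack preference is an intrinsic trait rather than a function of attack success rate.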

Comments Under Review

Abstract

Large language models (LLMs) are increasingly deployed as autonomous agents in offensive cybersecurity. In this paper, we reveal an interesting phenomenon: different agents exhibit distinct attack patterns. Specifically, each agent exhibits an attack-selection bias, disproportionately concentrating its efforts on a narrow subset of attack families regardless of prompt variations. To systematically quantify this behavior, we introduce CyBiasBench, a comprehensive 630-session benchmark that evaluates five agents on three targets and four prompt conditions with ten attack families. We identify explicit bias across agents, with different dominant attack families and varying entropy levels in their attack-family allocation distributions. Such bias is better characterized as a trait of the agents, rather than a factor associated with the attack success rate. Furthermore, our experiments reveal a bias momentum effect, where agents resist explicit steering toward attack families that conflict with their bias. This forced distribution shift does not yield measurable improvements in attack performance. To ensure reproducibility and facilitate future research, we release an interactive result dashboard at https://trustworthyai.co.kr/CyBiasBench/ and a reproducibility artifact with aggregated session-level statistics and full evaluation scripts at https://github.com/Harry24k/CyBiasBench.
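The abstract compares agents by the entropy of their attack-family allocation distributions. A minimal sketch of one such measure, assuming Shannon entropy in bits normalized by log2 of the ten attack families (the benchmark's exact normalization is not stated, and the family labels below are hypothetical):

```python
import math
from collections import Counter

def allocation_entropy(attack_families, num_families=10):
    """Normalized Shannon entropy of an agent's attack-family choices:
    0 means every attempt used one family, 1 means uniform over all families."""
    counts = Counter(attack_families)
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(num_families)

# Hypothetical attack-family labels for one agent's sessions.
sessions = ["sqli"] * 7 + ["xss"] * 2 + ["ssrf"]
print(round(allocation_entropy(sessions), 3))
```

A low value marks the kind of concentrated allocation the abstract describes as attack-selection bias.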

2605.07825 2026-05-11 cs.MM cs.CV

Anisotropic Modality Align

Xiaomin Yu, Yijiang Li, Yuhui Zhang, Hanzhen Zhao, Yue Yang, Hao Tang, Yue Song, Xiaobin Hu, Chengwei Qin, Shuicheng Yan, Hui Xiong

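AI summary: Training multimodal large language models has long been constrained by the scarcity of high-quality paired data. This paper finds that different modalities exhibit an anisotropic residual structure in the shared representation space, and that this structure is the main obstacle to modality interchangeability. Building on this, the authors propose AnisoAlign, an anisotropic modality alignment framework that applies bounded corrections to source-modality representations using geometric priors of the target modality, achieving modality alignment without paired data. Experiments show that the method performs well in both geometric diagnostics and text-only training.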

Abstract

Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the shared representation space of pretrained multimodal contrastive models can serve as a bridge, enabling models to perform multimodal training with unimodal data. However, the key premise of this paradigm remains insufficiently understood: can representations from different modalities be reliably interchanged? The core obstacle lies in the persistent Modality Gap in the shared space. In this work, we revisit the geometric nature of the modality gap. We find that modality representations already share compatible dominant semantic geometry. What truly hinders modality interchangeability is not a simple global shift, but an anisotropic residual structure concentrated along a small number of dominant directions. Based on this finding, we further propose the principle of anisotropic modality gap alignment: effective modality alignment should align with the target-modality distribution while preserving the semantic structure of the source modality. Guided by this principle, we propose an anisotropic geometric correction framework, AnisoAlign, for unpaired modality alignment. This framework leverages the internal geometric prior of the target modality and performs bounded correction on source-modality representations, thereby constructing substitute representations in the target modality. Experiments confirm its benefits in both geometric diagnostics and text-only MLLM training. Overall, this work recasts the modality gap from an empirical observation into a correctable, structured geometric phenomenon and provides a new representation alignment perspective for training multimodal models with unimodal data.

2605.07812 2026-05-11 cs.CR cs.LG

GRASP -- Graph-Based Anomaly Detection Through Self-Supervised Classification

Robin Buchta, Carsten Kleiner, Felix Heine, Gabi Dreo Rodosek

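AI summary: This paper proposes GRASP, a graph-based self-supervised classification method for detecting advanced persistent threat (APT) attacks. The method masks process executable information and learns to infer it from the process's two-hop provenance graph neighborhood, identifying anomalous behavior without relying on preset thresholds, which improves detection robustness and generalization. Experiments show that GRASP outperforms existing systems on multiple datasets, effectively detects known attack behavior, and uncovers potentially malicious activity not labeled in the documentation.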

Comments 17 pages

Abstract

Advanced persistent threat (APT) attacks remain difficult to detect due to their stealth, adaptability, and use of legitimate system components. Provenance-based intrusion detection systems (PIDS) offer a promising defense by capturing detailed relationships between system components and actions. However, current PIDS rely on predefined or subset-determined thresholds, which limit detection stability and the ability to detect any anomalous behavior in general. Furthermore, related work often neglects the role of process executables, which describe system activity by interacting through a process with files, network components, and other processes. We introduce GRASP, a PIDS based on masked self-supervised classification. GRASP masks the executable information of processes and learns to infer it from their two-hop provenance graph neighborhood, marking misclassified processes as anomalies. It captures behavior patterns for the learned executables without thresholding, making it robust against interference and unknown activities. Evaluations on the DARPA TC and OpTC datasets demonstrate that GRASP consistently detects anomalous behavior, including known attack-related activities, outperforming existing systems. Our PIDS identifies all documented attacks on datasets where the behavior of executables is learnable. In addition, compared to existing systems, GRASP uncovers potentially malicious anomalous behavior not labeled as an attack in the documentation.
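The masked-classification idea can be sketched with a deliberately simple stand-in for the learned model. Each process is described by the bag of labels in its two-hop provenance neighborhood, with its own executable masked. A frequency profile is fit per executable on benign data, and a process is flagged when the nearest profile by cosine similarity names a different executable. The graph encoding, the profile classifier, and all names below are assumptions for illustration, not GRASP's architecture.

```python
import math
from collections import Counter, defaultdict

def build_graph(edges):
    """Undirected adjacency sets from (u, v) provenance edges."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph

def two_hop_features(graph, labels, node):
    """Bag of labels within two hops of `node`; the node's own label stays masked."""
    one_hop = set(graph[node])
    two_hop = set()
    for n in one_hop:
        two_hop |= graph[n]
    return Counter(labels[n] for n in (one_hop | two_hop) - {node})

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fit_profiles(graph, labels, processes):
    """Aggregate neighborhood label counts per executable on benign processes."""
    profiles = defaultdict(Counter)
    for p in processes:
        profiles[labels[p]].update(two_hop_features(graph, labels, p))
    return profiles

def flag_anomalies(graph, labels, processes, profiles):
    """Flag processes whose predicted executable differs from the recorded one."""
    flagged = []
    for p in processes:
        feats = two_hop_features(graph, labels, p)
        predicted = max(profiles, key=lambda exe: cosine(feats, profiles[exe]))
        if predicted != labels[p]:
            flagged.append(p)
    return flagged

labels = {"p1": "nginx", "p2": "sshd", "p3": "nginx", "p4": "nginx",
          "f1": "/var/www", "s1": "tcp:80", "f2": "/etc/ssh", "s2": "tcp:22",
          "f3": "/var/www", "s3": "tcp:80", "f4": "/etc/ssh", "s4": "tcp:22"}
graph = build_graph([("p1", "f1"), ("p1", "s1"), ("p2", "f2"), ("p2", "s2"),
                     ("p4", "f3"), ("p4", "s3"), ("p3", "f4"), ("p3", "s4")])
profiles = fit_profiles(graph, labels, ["p1", "p2"])
print(flag_anomalies(graph, labels, ["p3", "p4"], profiles))  # ['p3']
```

Here the process labeled `nginx` that touches SSH configuration and port 22 is classified as `sshd` and flagged, with no score threshold involved, which mirrors the threshold-free decision rule the abstract describes.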

2605.07810 2026-05-11 physics.optics cs.CV

Pre-training Enables Extraordinary All-optical Image Denoising

Xudong Lv, Yuxiang Sun, Shuo Wang, Nanxing Chen, Jun Guan, Jingtian Hu

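AI summary: This paper presents a pre-training-based all-optical image denoising method that substantially improves the performance of optical neural networks on severely noisy images. It uses a two-step optimization: pre-training on a large dataset of simple images, then fine-tuning on task-specific datasets. This markedly improves denoising quality, raising PSNR from below 8 dB to above 18 dB, and generalizes well across image data of diverse styles. The method is also valuable in vision-based applications such as face detection, license-plate recognition, and UAV localization.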

Abstract

Optical neural networks are emerging as powerful machine learning and information processing tools because of their potential advantages in speed and energy efficiency. The training methods of these physical models, however, remain underexplored compared to their digital counterparts, leading to suboptimal performance. This paper reports a pre-training-driven approach that leads to snapshot image denoising with substantially improved quality. We demonstrated effective free-space optical denoising by a diffractive network optimized by a two-step process including (1) pre-training using a massive dataset of 3.45 million diverse but simple images and (2) fine-tuning with the corresponding task-specific datasets. Compared to conventional Fourier-domain filtering and directly trained diffractive networks, such a transfer learning process exhibited prominent advantages for denoising images degraded by severe noise (peak signal-to-noise ratio, PSNR, below 8 dB), while preserving fine image features and improving the PSNR to above 18 dB. Importantly, the same pre-trained optical network could be consistently fine-tuned to process degraded images from highly diverse styles ranging from handwritten digits (MNIST) and chest X-rays (ChestMNIST) to CIFAR-10 images and human faces (CelebA). We further demonstrated the critical role of our optical denoisers in vision-based applications, including face detection, plate recognition, and localization of UAVs in noisy conditions.
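Since the results are stated in PSNR, a small sketch of the standard definition, PSNR = 10 log10(MAX² / MSE), may help in reading them. It assumes images flattened to lists of intensities in [0, 1]. Going from 8 dB to 18 dB corresponds to a tenfold reduction in mean squared error.

```python
import math

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)

def mse_at_psnr(db, max_value=1.0):
    """Mean squared error implied by a given PSNR value."""
    return max_value ** 2 / 10 ** (db / 10)

# Ratio of the error at 8 dB to the error at 18 dB: about 10.
print(mse_at_psnr(8.0) / mse_at_psnr(18.0))
```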

2605.07768 2026-05-11 eess.SY cs.LG cs.SY

Interactive Trajectory Planning with Learning-based Distributionally Robust Model Predictive Control and Markov Systems

Erik Börve, Nikolce Murgovski, Morteza Haghir Chehreghani, Leo Laine

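AI summary: This paper studies interactive trajectory planning under uncertainty in the decisions of surrounding agents. The authors propose a learning-based distributionally robust model predictive control (DR-MPC) method that uses PAC learning theory to account for errors in the learned distribution. Depending on the number of available samples, the method interpolates between robust MPC and an idealized SMPC, improving the robustness and adaptability of trajectory planning.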

Abstract

We investigate interactive trajectory planning subject to uncertainty in the decisions of surrounding agents. To control the ego-agent, we aim to first learn the decision distribution and solve a Stochastic Model Predictive Control (SMPC) problem. To account for errors in the learned distribution, we show that it is possible to utilize Probably Approximately Correct (PAC) learning in combination with Distributionally Robust (DR) optimization to obtain a solution which accounts for the errors induced by the learning model. The results indicate that our PAC learning-based DR-MPC framework provides a method to interpolate between a robust MPC and an omnipotent SMPC, based on the available number of samples.
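The interpolation between robust MPC and SMPC can be illustrated with a toy distributionally robust expectation. The assumptions here are ours, not the paper's: the surrounding agent's decision takes one of k discrete values, the ambiguity set is an L1 ball around the empirical distribution, and its radius comes from the Weissman et al. concentration inequality P(‖p̂ − p‖₁ ≥ ε) ≤ (2^k − 2) exp(−Nε²/2). With few samples the radius is large and the worst case dominates (robust MPC). As N grows the radius shrinks toward zero and the plain expectation (SMPC) is recovered.

```python
import math

def l1_radius(num_samples, num_outcomes, delta):
    """Radius eps with P(||p_hat - p||_1 >= eps) <= delta, from the bound
    (2^k - 2) * exp(-N * eps^2 / 2) (Weissman et al., 2003). Needs k >= 2."""
    return math.sqrt(2.0 / num_samples * math.log((2 ** num_outcomes - 2) / delta))

def worst_case_expected_cost(p_hat, costs, radius):
    """max E_q[cost] over q in the L1 ball of `radius` around p_hat.
    Moving mass m changes the L1 distance by 2m, so at most radius / 2 of
    probability mass is shifted, cheapest outcomes first, onto the costliest one."""
    budget = min(radius / 2.0, 1.0)
    q = list(p_hat)
    worst = max(range(len(costs)), key=lambda i: costs[i])
    for i in sorted(range(len(costs)), key=lambda i: costs[i]):
        if budget <= 0:
            break
        if i == worst:
            continue
        moved = min(q[i], budget)
        q[i] -= moved
        q[worst] += moved
        budget -= moved
    return sum(qi * c for qi, c in zip(q, costs))

# Hypothetical: the other agent yields / hesitates / cuts in, with made-up costs.
p_hat, costs = [0.7, 0.2, 0.1], [0.0, 1.0, 10.0]
for n in (10, 100, 10000):
    print(n, worst_case_expected_cost(p_hat, costs, l1_radius(n, 3, 0.05)))
```

The resulting cost could serve as a chance-constraint or objective term inside an MPC; how the paper couples its PAC bound to the controller is not described in the abstract.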

2605.07758 2026-05-11 cs.FL cs.LG

SMT-Based Active Learning of Weighted Automata

Tiago Ferreira, Kevin Batz, Alexandra Silva

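AI summary: This paper presents an SMT-based active learning algorithm for nondeterministic weighted automata (WFAs) as a practical and robust alternative to Hankel/L*-style methods. The algorithm is parametric in a given semiring and, if it terminates, is guaranteed to produce a minimal WFA; the authors prove its partial correctness and give a sufficient termination condition. Experiments show that the algorithm learns minimal WFAs over both finite and infinite semirings, vastly outperforms a naive baseline, and is competitive with the state of the art while producing smaller automata and requiring less interaction with the teacher.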

Comments Appearing in CAV 2026

Abstract

We present an SMT-based active learning algorithm for nondeterministic weighted automata (WFAs) as a practical and robust alternative to Hankel/L*-style methods. Our algorithm is parametric in a given semiring and, if it terminates, guaranteed to produce minimal WFAs. We prove partial correctness and provide a sufficient termination condition, which in particular implies termination for all finite semirings. Our extensive experimental evaluation shows that our algorithm is capable of learning numerous minimal WFAs over both finite and infinite semirings, vastly outperforms a naive baseline, and is competitive with a state-of-the-art algorithm while producing significantly smaller automata and requiring less interaction with the teacher.
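The learning algorithm itself is not reproduced here, but the object it learns is easy to sketch. A WFA over a semiring assigns to a word the weight αᵀ M(a₁)⋯M(aₙ) β, where sums and products are those of the semiring. The sketch below, with hypothetical names, takes the semiring as a parameter. It evaluates a two-state automaton that counts occurrences of `a` over the natural numbers, and a one-state automaton over the tropical (min, +) semiring.

```python
import math
import operator
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any

NATURAL = Semiring(operator.add, operator.mul, 0, 1)
TROPICAL = Semiring(min, operator.add, math.inf, 0)  # (min, +) semiring

def wfa_weight(sr, initial, transitions, final, word):
    """Weight of `word`: initial^T . M(a1) ... M(an) . final over semiring `sr`.
    `transitions[a]` is an n x n matrix given as a list of rows."""
    n = len(initial)
    vec = list(initial)
    for symbol in word:
        m = transitions[symbol]
        new_vec = []
        for j in range(n):
            acc = sr.zero
            for i in range(n):
                acc = sr.plus(acc, sr.times(vec[i], m[i][j]))
            new_vec.append(acc)
        vec = new_vec
    total = sr.zero
    for i in range(n):
        total = sr.plus(total, sr.times(vec[i], final[i]))
    return total

# Two-state WFA over the naturals that counts occurrences of "a".
COUNT_A = {"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]}
print(wfa_weight(NATURAL, [1, 0], COUNT_A, [0, 1], "abaa"))  # 3
```

Swapping the `Semiring` instance is all it takes to reinterpret the same matrices, which is the sense in which a learner can be parametric in the semiring.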

2605.07751 2026-05-11 cs.CY cs.AI

Vibe coding before the trend

Leon van Bokhorst, Koen Suilen

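AI summary: In early 2025, the authors ran "vibe coding" challenges with four student cohorts from different programs. They observed that using AI tools shifted students' focus from syntax to higher-order thinking and from memorization to evaluation, while increasing the value students placed on AI skills. Students from non-technical backgrounds especially appreciated how accessible the tools are, and the authors characterize the relationship between AI and learners as a partnership rather than a replacement. The paper summarizes observations from the classroom experiments and offers practical lessons and reflections for educators.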

Comments 10 pages

Abstract

In early 2025, we ran a series of vibe coding challenges across four different student cohorts. The cohorts included 54 ICT students, 24 digital marketing students, and 7 journalism students at Fontys University of Applied Sciences (Netherlands), and 22 BA Communication students at North-West University (South Africa). From the student reflections, five major patterns emerged. Students reported that AI tools shifted their focus from syntax to higher-order thinking; they also described a skill shift from memorizing to evaluating; they viewed AI proficiency as career-essential; they framed their relationship with AI as partnership rather than replacement; and finally, non-technical students showed the strongest appreciation for the accessibility these tools provide. This practitioner report documents what we observed during the classroom experiments, reflects on how the landscape has shifted in the year since, and shares practical lessons for educators considering similar experiments. We present the observations as what they are: patterns from practice, not proven conclusions, in the belief that sharing early-stage experiences contributes to the overall field of AI and education.

2605.07746 2026-05-11 stat.ML cs.LG q-bio.QM

Flow Matching for Count Data

Ganchao Wei, John Pearson

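AI summary: This paper studies generative modeling of high-dimensional count data (such as single-cell RNA sequencing and neural spike trains) and proposes count-FM, a flow-matching framework based on a continuous-time birth-death process. Through simulation-free training, it learns marginal transition rates in count space, enabling efficient generation and transport between arbitrary count-distributed sources and targets. Experiments show that count-FM outperforms existing methods in sample quality, model efficiency, and path interpretability, and that it applies to unconditional generation, transport, and conditional generation.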

Abstract

High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mappings between distributions across successive batches or time points form critical components of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings, but many existing methods either treat each count as a categorical state or transform counts into a continuous space, neither of which is natural or efficient when the count range is large. We propose count-FM, a flow-matching framework for count data based on a continuous-time birth-death process with local unit jumps. Count-FM learns marginal transitions efficiently in count space through simulation-free training of conditional transition rates, allowing transport between arbitrary count-distributed source and target populations. In simulation, count-FM achieves better sample quality than representative baselines while using substantially fewer parameters. We further apply count-FM to scRNA-seq and neural spike-train data for unconditional generation, transport, and conditional generation. Across these tasks, count-FM yields improved sample quality, greater modeling efficiency, and interpretable transport paths.
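One simple way to see what "local unit jumps" means is a conditional path that moves a single count from x0 to x1 one unit at a time. In the sketch below, which is our assumption rather than the paper's exact construction, the |x1 − x0| jump times are i.i.d. uniform on (0, 1). With n jumps remaining at time t, the remaining times are uniform on (t, 1), so the conditional jump rate is n / (1 − t). In a flow-matching setup, conditional rates like these would serve as regression targets for a learned rate model.

```python
import random

def sample_unit_jump_path(x0, x1, rng):
    """Jump times of a monotone unit-jump path from x0 (t=0) to x1 (t=1).
    Times are i.i.d. Uniform(0, 1), sorted; `step` is +1 (births) or -1 (deaths)."""
    n = abs(x1 - x0)
    step = 1 if x1 >= x0 else -1
    times = sorted(rng.random() for _ in range(n))
    return times, step

def count_at(t, x0, times, step):
    """State of the path at time t: x0 plus the signed number of jumps so far."""
    return x0 + step * sum(1 for s in times if s <= t)

def conditional_rates(t, x_t, x1):
    """(birth_rate, death_rate) at time t < 1 given state x_t and endpoint x1.
    With n jumps left, each remaining time is Uniform(t, 1), so the hazard is n / (1 - t)."""
    rate = abs(x1 - x_t) / (1.0 - t)
    return (rate, 0.0) if x1 > x_t else (0.0, rate)

rng = random.Random(0)
times, step = sample_unit_jump_path(3, 7, rng)
print([count_at(t, 3, times, step) for t in (0.0, 0.5, 1.0)])
```

This pure-birth or pure-death bridge is only one special case of a birth-death process; it is meant to show the unit-jump structure, not count-FM's learned marginal rates.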

2605.07738 2026-05-11 physics.comp-ph cs.LG

Physics-Informed Reduced-Order Operator Learning for Hyperelasticity in Continuum Micromechanics

Hamidreza Eivazi, Henning Wessels

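AI summary: This study proposes a physics-informed reduced-order operator learning method for hyperelasticity in continuum micromechanics. By combining the Equilibrium Neural Operator (EquiNO) with the QR-based discrete empirical interpolation method (Q-DEIM), it substantially reduces the computational cost of loss evaluation while enforcing periodicity and mechanical equilibrium. Validated on three-dimensional representative volume elements (RVEs), the method markedly improves computational efficiency while accurately predicting microscopic stress fields and homogenized (macroscopic) stresses.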

Comments 22 pages, 12 figures

Abstract

Physics-informed operator learning is an attractive candidate for surrogate modeling of microstructures, especially in multiscale finite-element simulations. Its practical use, however, is often limited by the high cost of loss evaluation. We address this bottleneck by combining the Equilibrium Neural Operator (EquiNO) with the QR-based discrete empirical interpolation method (Q-DEIM). EquiNO learns only the modal coefficients of reduced displacement-fluctuation and first Piola-Kirchhoff stress representations built from periodic and divergence-free bases, thereby enforcing periodicity and mechanical equilibrium by construction. Q-DEIM then identifies a small set of spatial points through a column-pivoted QR factorization of the stress basis and restricts constitutive evaluations during training to these points alone. This makes full-batch second-order optimization practical for three-dimensional representative volume elements (RVEs). Homogenized first Piola-Kirchhoff stresses are recovered directly from the offline-averaged reduced stress modes, without the need to reconstruct the full stress field at inference time. We validate the framework on two three-dimensional finite-strain hyperelastic RVEs. Q-DEIM reduces the per-step training cost by roughly three orders of magnitude relative to full-field loss evaluation, while reduced homogenization achieves speed-up factors of order $10^3$ to $10^4$ over direct full-field computations. Despite relying on only a small number of offline snapshot loading paths for basis construction, the method accurately interpolates and extrapolates both microscopic stress fields and homogenized stresses, with prediction quality improving systematically as more snapshots are added.
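The Q-DEIM point selection can be sketched in pure Python: run QR with column pivoting on Uᵀ, where the columns of the n × m basis U are the modes, and keep the first m pivots as interpolation points. A field in the span of U is then recovered from its values at those points alone. This is a minimal sketch assuming a full-rank basis stored as a list of rows; in practice `scipy.linalg.qr(U.T, pivoting=True)` returns the same kind of pivot order. The toy basis is hypothetical and unrelated to the paper's RVEs.

```python
import math

def qdeim_points(U):
    """Indices of m interpolation points for an n x m basis U (list of rows).
    Greedy column-pivoted QR (modified Gram-Schmidt) on U^T: repeatedly pick
    the point whose residual row has the largest norm, then orthogonalize all
    rows against it. Assumes U has full column rank."""
    rows = [list(r) for r in U]
    m = len(U[0])
    selected = []
    for _ in range(m):
        norms = [sum(v * v for v in r) for r in rows]
        for s in selected:
            norms[s] = -1.0  # never pick the same point twice
        k = max(range(len(rows)), key=lambda i: norms[i])
        selected.append(k)
        q_norm = math.sqrt(norms[k])
        q = [v / q_norm for v in rows[k]]
        for i, r in enumerate(rows):
            proj = sum(a * b for a, b in zip(r, q))
            rows[i] = [a - proj * b for a, b in zip(r, q)]
    return selected

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def deim_reconstruct(U, points, sampled_values):
    """Recover modal coefficients from values at the selected points,
    then rebuild the full field as U @ coeffs."""
    coeffs = solve([U[p] for p in points], sampled_values)
    return [sum(u * c for u, c in zip(row, coeffs)) for row in U]

# Toy basis: constant and linear modes on 4 points (hypothetical).
U = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
points = qdeim_points(U)
field = [2.0 + 0.5 * x for x in range(4)]
print(points, deim_reconstruct(U, points, [field[p] for p in points]))
```

Restricting evaluations to the selected points is what makes the per-step cost scale with the number of modes rather than the number of grid points.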

2605.07723 2026-05-11 cs.DL cs.AI cs.CY physics.soc-ph

LLM hallucinations in the wild: Large-scale evidence from non-existent citations

Zhenyue Zhao, Yihe Wang, Toby Stuart, Mathijs De Vaan, Paul Ginsparg, Yian Yin

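AI summary: By analyzing 111 million references across 2.5 million papers on arXiv, bioRxiv, SSRN, and PubMed Central, this study documents a real-world "hallucination" problem of large language models (LLMs): generating references that do not exist. It finds that non-existent references rose sharply with widespread LLM adoption, with a conservative estimate of 146,932 such citations in 2025 alone. These errors are especially pronounced in fields with rapid AI uptake, in papers showing signs of AI-assisted writing, and among small or early-career author teams, and they disproportionately credit already prominent and male scholars, potentially reinforcing existing inequities in science. The study notes that current preprint moderation and journal publication processes catch only a fraction of these errors.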

Abstract

Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estimate of 146,932 hallucinated citations in 2025 alone. These errors are diffusely embedded across many papers but especially pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. At the same time, hallucinated references disproportionately assign credit to already prominent and male scholars, suggesting that LLM-generated errors may reinforce existing inequities in scientific recognition. Preprint moderation and journal publication processes capture only a fraction of these errors, suggesting that the spread of hallucinated content has outpaced existing safeguards. Together, these findings demonstrate that LLM hallucinations are infiltrating knowledge production at scale, threatening both the reliability and equity of future scientific discovery as human and AI systems draw on the existing literature.
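The audit rests on checking whether each cited work exists. A toy version of that check flags references with no close match, assuming references are title strings compared against a list of known titles with `difflib` similarity. The normalization and the 0.9 threshold are assumptions, and the paper's actual matching pipeline is not described in the abstract. An unmatched reference is only a candidate for a hallucinated one, since any reference index is incomplete.

```python
import difflib
import re

def normalize_title(title):
    """Lowercase, drop punctuation, collapse whitespace."""
    title = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(title.split())

def find_unmatched(references, known_titles, threshold=0.9):
    """References whose best fuzzy match against `known_titles` is below `threshold`."""
    index = [normalize_title(t) for t in known_titles]
    unmatched = []
    for ref in references:
        norm = normalize_title(ref)
        best = max((difflib.SequenceMatcher(None, norm, k).ratio() for k in index),
                   default=0.0)
        if best < threshold:
            unmatched.append(ref)
    return unmatched

known = ["Attention Is All You Need", "Deep Residual Learning for Image Recognition"]
refs = ["Attention is all you need.",
        "Deep residual learning for image recognition",
        "Quantum Transformers for Protein Folding at Scale"]  # made-up title
print(find_unmatched(refs, known))
```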

2605.07705 2026-05-11 cs.LO cs.AI

Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization

Veeti Ahvonen, Damian Heiman, Antti Kuusisto, Miguel Moreno, Matias Selin

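AI summary: This paper gives a novel logical characterization of encoder-decoder transformers (the foundational architecture of large language models) in the practical text-processing setting of floating-point numbers and soft attention. The authors introduce a new temporal logic that extends propositional logic with a counting global modality over the encoder input and a past modality over the decoder input. They also give a complementary characterization via distributed automata and show that the results extend to architectural variations such as changes in masking. Finally, the paper discusses encoder-decoder transformers in the autoregressive setting.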

Abstract

We give a novel logical characterization of encoder-decoder transformers, the foundational architecture for LLMs that also sees use in various settings that benefit from cross-attention. We study such transformers over text in the practical setting of floating-point numbers and soft-attention, characterizing them with a new temporal logic. This logic extends propositional logic with a counting global modality over the encoder input and a past modality over the decoder input. We also give an additional characterization of such transformers via a type of distributed automata, and show that our results are not limited to the specific choices in the architecture and can account for changes in, e.g., masking. Finally, we discuss encoder-decoder transformers in the autoregressive setting.

2605.07694 2026-05-11 eess.AS cs.AI cs.SD eess.SP

Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation

Michael Neri, Archontis Politis, Tuomas Virtanen

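AI summary: This paper studies how single-channel speaker distance estimation models depend on early reflections and late reverberation in the room impulse response (RIR). Decomposing simulated RIRs into four variants and evaluating them under different calibration conditions, the authors find that without time calibration the model relies mainly on early reflections, whereas with time calibration it achieves high accuracy from the propagation delay alone. The study also shows that accuracy improves with stronger early energy and degrades in more reverberant environments.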

Comments Submitted to IWAENC 2026

Abstract

Single-channel speaker distance estimation has recently achieved centimeter-level accuracy in simulated environments, yet it remains unclear which components of the room impulse response (RIR) the model exploits and how performance depends on the recording conditions. In this work, we decompose simulated RIRs into four variants (full, direct-only, no-late, and no-early) using the mixing time estimated from the echo density function as the boundary between early reflections and late reverberation. We define four calibration scenarios, from fully calibrated (synchronised capture, known source level) to fully uncalibrated (arbitrary onset, unknown level), and evaluate all combinations on a matched dataset. Results show that without time calibration, mean absolute error (MAE) increases to $1.29$ m and the model extracts reverberation-based cues, with early reflections emerging as the most informative component. Further analysis against DRR, $C_{50}$, and $T_{60}$ confirms that estimation accuracy improves with stronger early energy and degrades in highly reverberant environments. When time calibration is available, the model achieves a MAE of $0.14$ m by extracting the propagation delay alone, regardless of the RIR content.
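The room-acoustic quantities mentioned here can be computed directly from an RIR. A minimal sketch, assuming the direct sound is the strongest peak, a ±2.5 ms direct-sound window for DRR, and a speed of sound of 343 m/s. It also shows why time calibration helps so much: with synchronized capture, distance is simply the speed of sound times the onset delay.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)

def onset_index(h):
    """Direct-sound arrival, assumed to be the strongest peak of the RIR."""
    return max(range(len(h)), key=lambda i: abs(h[i]))

def energy(h, start, stop):
    return sum(x * x for x in h[max(start, 0):stop])

def c50_db(h, fs):
    """Clarity C50: energy in the first 50 ms after the direct sound vs the rest, in dB."""
    onset = onset_index(h)
    split = onset + int(round(0.050 * fs))
    return 10 * math.log10(energy(h, onset, split) / energy(h, split, len(h)))

def drr_db(h, fs, window_ms=2.5):
    """Direct-to-reverberant ratio with an assumed +/- window_ms direct-sound window."""
    onset = onset_index(h)
    half = int(round(window_ms / 1000.0 * fs))
    direct = energy(h, onset - half, onset + half + 1)
    return 10 * math.log10(direct / energy(h, onset + half + 1, len(h)))

def distance_from_delay(h, fs):
    """With synchronized capture, the onset sample gives the propagation delay."""
    return SPEED_OF_SOUND * onset_index(h) / fs

fs = 1000  # toy sample rate in Hz
h = [0.0] * 200
h[10], h[30], h[100] = 1.0, 0.5, 0.5  # direct sound, early reflection, late tail
print(distance_from_delay(h, fs), c50_db(h, fs), drr_db(h, fs))
```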

2605.07677 2026-05-11 cs.IR cs.AI cs.CL

TRACE: Tourism Recommendation with Accountable Citation Evidence

Zixu Zhao, Sijin Wang, Yu Hou, Yuanyuan Xu, Yufan Sheng, Xike Xie, Wenjie Zhang, Won-Yong Shin, Xin Cao

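AI summary: This paper introduces TRACE, an accountable conversational recommendation dataset for tourism that targets the shortcomings of existing systems in trustworthiness, verifiability, and adaptability. TRACE contains multi-turn dialogues with citations to real user reviews and explicit rejection turns, covering 2,400 POIs and 34,208 reviews, and is accompanied by 14 baseline methods and 25 evaluation metrics. The study reveals a "Three-Competency Gap" in tourism recommendation and shows that a grounding score based on review citations agrees closely with human annotation, pointing to new directions for more reliable and interpretable tourism recommender systems.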

Abstract

Tourism is a high-stakes setting for conversational recommender systems (CRS): a plausible-sounding suggestion can waste real money and trip time once a traveler acts on it. Existing CRS benchmarks primarily evaluate systems with a single Recall@k score over entity mentions, and tourism-specific resources add spatial or knowledge-graph context, yet none of them couple multi-turn recommendation with verbatim review-span evidence and rejection recovery. This leaves an evaluation gap for tourism recommendation that is simultaneously trustworthy, verifiable, and adaptive: recommend the right point of interest (POI) for multi-aspect preferences (such as cuisine, price, atmosphere, walking distance), justify each suggestion with verifiable evidence from prior visitors so the traveler can act without trial and error, and recover when the first recommendation is rejected mid-dialogue. We introduce TRACE, where each item is a multi-turn tourism recommendation dialogue with review-span citations and explicit rejection turns: 10,000 dialogues over 2,400 Yelp POIs and 34,208 reviews across eight U.S. cities, paired with 14 retrieval, planning, and LLM baselines, along with 25 metrics organized under Accuracy, Grounding, and Recovery. Across these baselines, TRACE reveals the Three-Competency Gap: LLM Zero-Shot leads in closed-set Recall@1 and rejection recovery but cites less densely than retrievers; non-LLM retrievers achieve surface-verbatim grounding but with low accuracy; Multi-Review Synthesis fails at recovery. The Grounding Score agrees with human citation precision (Spearman rho=+0.80, p<10^-20), and paired t-tests reproduce the per-baseline ranking (p<0.01 on the dominant contrasts). TRACE reframes accountable tourism recommendation as a joint target (right POI, verifiable evidence, adaptive repair) rather than a single-axis leaderboard.
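Two of the statistics quoted here are standard and easy to sketch: Recall@k over ranked POI lists, and Spearman's rho as the Pearson correlation of average ranks, with ties sharing their mean rank. The Grounding Score itself is not specified in the abstract, so it is not reproduced; in practice `scipy.stats.spearmanr` computes the same rho. The scores in the usage lines are hypothetical.

```python
import math

def recall_at_k(ranked_lists, targets, k):
    """Fraction of dialogues whose target POI appears in the top-k recommendations."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

grounding = [0.9, 0.4, 0.7, 0.1]  # hypothetical automatic scores
human = [0.8, 0.5, 0.6, 0.2]      # hypothetical human precision
print(spearman_rho(grounding, human))
```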

2605.07674 2026-05-11 cs.GT cs.CR cs.LG

Differentially Private Auditing Under Strategic Response

Florian A. D. Burnat

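AI summary: This paper studies the design of differentially private audit mechanisms when developers can strategically respond to a privacy-constrained audit interface. The author models privacy-constrained auditing as a bilevel Stackelberg game and introduces the welfare-weighted under-detection gap $B_w$ as a measure of audit effectiveness, proving that naive differentially private auditing induces a larger under-detection gap under certain conditions. To address this, the author proposes Strategic Private Audit Design (SPAD), which reformulates the bilevel problem as a single-level problem via the developer's KKT system and solves it with a projected-gradient algorithm.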

Abstract

Regulatory audits of AI systems increasingly rely on differential privacy (DP) to protect training data and model internals. We study audit design when the audited developer can strategically respond to the privacy-constrained audit interface. We formalize privacy-constrained auditing as a bilevel Stackelberg game, in which an auditor commits to a query policy and DP budget allocation across harm dimensions, and a strategic developer reallocates mitigation efforts in response. We introduce the welfare-weighted under-detection gap $B_w$, the welfare-weighted true residual harm the audit fails to detect at the developer's strategic best response, and prove that naive DP auditing (uniform or harm-proportional allocation) induces a strictly larger $B_w$ than any non-strategic mitigation baseline whenever effective detectability is heterogeneous, the welfare weights are not comonotone with detectability, and the developer's optimum is interior. We characterize the optimal auditor allocation as a four-factor balance of welfare weight, audit miss-probability, detectability elasticity, and mitigation-cost curvature, and provide a single-level reformulation of the bilevel problem via the developer's KKT system. We propose Strategic Private Audit Design (SPAD), a projected-gradient algorithm with hypergradients computed through the developer's best response.

2605.07671 2026-05-11 cs.GT cs.AI cs.MA econ.TH math.OC

The Endogeneity of Miscalibration: Impossibility and Escape in Scored Reporting

Lauri Lovén, Sasu Tarkoma

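AI summary: This paper studies scored reporting by autonomous agents, where an endogenous link between the scoring rule and the agent's own interests makes truthful reporting hard to incentivize. The core finding is that when the principal uses a non-affine approval function to screen types, truthful reporting is no longer optimal for the agent whenever deviations are undetectable, which undermines calibration. The study further shows that a step-function approval threshold achieves first-best type screening without sacrificing calibration; under the Brier score in particular, the welfare gap between second-best and first-best vanishes, a property that does not hold for other scoring rules.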

Comments 38 pages, no figures. Targeting ACM Transactions on Economics and Computation (TEAC); preprint

Abstract

Eliciting truthful reports from autonomous agents is a core problem in scalable AI oversight: a principal scores the agent's report using a strictly proper scoring rule, but the agent also benefits from the report through a non-accuracy channel (approval for autonomous action, allocation share, downstream control). The same structure appears in classical mechanism-design settings such as marketplace operation. Our main result is an endogeneity: the principal's optimal oversight necessarily uses a non-affine approval function to screen types, yet any non-affine approval makes truthful reporting suboptimal under the combined objective whenever deviation is undetectable. The principal cannot avoid the perturbation that undermines calibration. This impossibility holds for all strictly proper scoring rules, with a closed-form perturbation formula. A constructive escape exists: a step-function approval threshold achieves first-best screening for every strictly proper scoring rule, because the agent's binary inflate-or-not choice creates a type-space threshold regardless of the generator's curvature. Under the Brier score specifically, the type-independent inflation cost yields a welfare equivalence between second-best and first-best; we prove this equivalence is unique to Brier (the welfare gap under smooth $C^1$ oversight is bounded below by $\Omega(\mathrm{Var}(1/G'')\,(\gamma/\beta)^2)$ for every non-Brier rule). Two instances develop the framework: AI agent oversight (the lead motivating setting) and marketplace operation (a parallel mechanism-design domain). The message for AI alignment is direct: smooth scoring-based oversight cannot elicit truthful reports from a strategic agent; sharp thresholds are the calibration-preserving design.
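The inflate-or-not mechanism behind the step-function escape can be seen in a toy model of our own, not the paper's general setting. The agent believes the event has probability p, earns β times the expected negative Brier score of its report, and gains γ if the report clears an approval threshold τ. Since the expected Brier loss of report r is (r − p)² + p(1 − p), a type below τ inflates to exactly τ only when γ > β(τ − p)². The approved set in type space is therefore p > τ − √(γ/β), and every other type reports truthfully.

```python
import math

def expected_brier_payoff(report, belief):
    """Expected negative Brier score of `report` when the outcome is Bernoulli(belief)."""
    return -((report - belief) ** 2) - belief * (1 - belief)

def best_report(belief, tau, beta, gamma):
    """Agent's optimal report under payoff beta * score + gamma * [report >= tau]."""
    if belief >= tau:
        return belief  # the truthful report is already approved
    truthful = beta * expected_brier_payoff(belief, belief)
    inflated = beta * expected_brier_payoff(tau, belief) + gamma  # cheapest approved report
    return tau if inflated > truthful else belief

def type_threshold(tau, beta, gamma):
    """Lowest type that ends up approved in this toy model: tau - sqrt(gamma / beta)."""
    return max(0.0, tau - math.sqrt(gamma / beta))

print(type_threshold(0.7, 1.0, 0.01))
print([best_report(p, 0.7, 1.0, 0.01) for p in (0.55, 0.65, 0.8)])  # [0.55, 0.7, 0.8]
```

Because the induced type threshold is a known shift of τ, a principal in this toy model can place τ to screen at whichever type it wants; the paper's welfare results are not reproduced here.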

2605.07665 2026-05-11 stat.ML cs.LG

Debiased Counterfactual Generation via Flow Matching from Observations

Hugh Dance, Johnny Xi, Peter Orbanz, Benjamin Bloem-Reddy

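AI summary: This paper studies estimating counterfactual distributions under interventions and proposes a deconfounding flow that starts from the observational distribution, exploiting the close link between observational and counterfactual distributions to make counterfactual generation more accurate. Built on flow matching with a semiparametrically efficient estimator, the method can target minimal-energy flows in high dimensions and mitigates the biases and failure modes of existing methods.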

Abstract

Estimating counterfactual distributions under interventions is central to treatment risk assessment and counterfactual generation tasks. Existing approaches model the counterfactual distribution as a standalone generative target, without exploiting its relationship to the observational data. In this work, we show that under standard assumptions, observational and counterfactual outcome distributions are tightly linked: they have identical support and tail behavior, remain statistically close under weak confounding, and share any features of high-dimensional outcomes which are invariant to confounders. These properties motivate learning counterfactual distributions not from scratch, but via a deconfounding flow from the observational distribution. We formulate this problem via flow-matching and derive a semiparametrically efficient estimator based on a novel efficient influence function correction. We subsequently extend our estimator to target minimal-energy flows in high-dimensions, which we show can be especially simple targets between observational and counterfactual distributions. In experiments, deconfounding flows outperform existing debiased counterfactual distribution estimators, while also mitigating known failure modes of flow-based methods.

2605.07663 2026-05-11 cs.GT cs.CR cs.LG

Quotient Semivalues for False-Name-Resistant Data Attribution

Florian A. D. Burnat, Brittany I. Davidson

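AI summary: This paper studies how to prevent contributors in machine-learning data attribution from inflating their share through false identities (for example, splitting data across identities, duplicating examples, or laundering synthetic variants). The authors propose a quotient semivalue mechanism that computes Shapley-, Banzhaf-, or related values over evidence-backed attribution clusters, absorbing within-cluster duplication and thereby improving the fairness of attribution and its resistance to false-name manipulation. Experiments on synthetic classification tasks show that the method substantially reduces the gain from Sybil attacks, making data attribution more robust.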

Abstract

Data valuation methods allocate payments and audit training data's contribution to machine-learning pipelines; however, they often assume passive contributors. In reality, contributors can split datasets across pseudonymous identities, duplicate high-value examples, create near-duplicates, or launder synthetic variants to inflate their share. We formalize this as false-name manipulation in ML data attribution. Our main construction is the quotient semivalue mechanism: compute Shapley-, Banzhaf-, or Beta-style values over evidence-backed attribution clusters instead of raw identities, using a canonical-representative operator to absorb within-cluster duplication. We prove an impossibility: on a fixed monotone data-value game, exact Shapley-fair attribution over reported identities is incompatible with unrestricted false-name-proofness, even on binary-valued instances, and characterize the split-gain of a general semivalue on a unanimity counter-example. The mechanism is exactly false-name-proof under two structural conditions: false-name-neutral within-cluster allocation and quotient-stable manipulations. Under imperfect provenance, when these conditions hold approximately, manipulation gain and fairness loss are bounded by three measurable quantities: escaped-cluster mass, value-estimation error, and clustering distance. We instantiate the mechanisms in DataMarket-Gym, a benchmark for attribution under strategic provider attacks. On synthetic classification tasks, quotient semivalues with example-level evidence reduce manipulation gain on duplicate and near-duplicate Sybil attacks from $1.74$ under baseline Shapley to $0.96$, near the honest level. The cosine-threshold and (false-merge, false-split) rate sweeps trace the corresponding fairness--Sybil frontier.
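The split-gain problem and the quotient fix can be reproduced on a tiny unanimity game, our own toy instance in the spirit of the abstract's counter-example, where the model gains value only when all pieces of data are present. Splitting one provider's data across two pseudonyms raises its combined Shapley share from 1/2 to 2/3, while computing Shapley values over attribution clusters restores 1/2. The cluster assignment is supplied by hand here, standing in for the paper's evidence-backed clustering, and all names are hypothetical.

```python
from itertools import permutations
from math import factorial

def shapley(players, value):
    """Exact Shapley values by enumerating all orderings (fine for a handful of players)."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: v / n_orders for p, v in phi.items()}

def unanimity(required):
    """Game worth 1 only when every required identity is in the coalition."""
    return lambda coalition: 1.0 if required <= coalition else 0.0

def quotient_shapley(clusters, value):
    """Shapley over clusters: a set of clusters is worth the value of all their members."""
    def cluster_value(cluster_names):
        members = frozenset().union(*(clusters[c] for c in cluster_names))
        return value(members)
    return shapley(list(clusters), cluster_value)

honest = shapley(["A", "B"], unanimity({"A", "B"}))
split_game = unanimity({"A1", "A2", "B"})  # A's data split across two pseudonyms
raw = shapley(["A1", "A2", "B"], split_game)
quot = quotient_shapley({"A": {"A1", "A2"}, "B": {"B"}}, split_game)
print(honest["A"], raw["A1"] + raw["A2"], quot["A"])
```

Distributing the cluster's share back to its identities (for example, equally) is a separate, within-cluster step that this sketch leaves out.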

2605.07654 2026-05-11 stat.ML cs.CL cs.LG

Reliable Chain-of-Thought via Prefix Consistency

Naoto Iwase, Yuki Ichihara, Mohammad Atif Quamar, Junpei Komiyama

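AI Summary The study proposes a new method called "prefix consistency" to improve the reliability of large language models on reasoning tasks. It builds on the observation that chain-of-thought traces with correct answers are more likely to reproduce their original answer when truncated and regenerated, and uses this property as a reliability signal to weight candidate answers. Experiments show that the method performs well on several math and science benchmarks, matching the accuracy of majority voting with less computation.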

Comments See our project page at https://naoto-iwase.github.io/prefix-consistency-page

Abstract

Large Language Models often improve accuracy on reasoning tasks by sampling multiple Chain-of-Thought (CoT) traces and aggregating them with majority voting (MV), a test-time technique called self-consistency. When we truncate a CoT partway through and regenerate the remainder, we observe that traces with correct answers reproduce their original answer more often than traces with wrong answers. We use this difference as a reliability signal, prefix consistency, that weights each candidate answer by how often it reappears under regeneration. It requires no access to token log-probabilities or self-rating prompts. Across five reasoning models and four math and science benchmarks, prefix consistency is the best correctness predictor in most settings, and reweighting votes by it reaches Standard MV plateau accuracy at up to 21x fewer tokens (median 4.6x). Our code is available at https://github.com/naoto-iwase/prefix-consistency.
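A minimal sketch of prefix-consistency-weighted voting, assuming truncation and regeneration have already been done: each sampled trace is represented by its final answer plus the answers obtained by regenerating from truncated prefixes of it. The per-trace weight (the fraction of regenerations that reproduce the trace's answer), the fallback to plain majority voting, and the function names are illustrative assumptions; the paper's truncation points, number of regenerations, and exact aggregation may differ.

```python
from collections import Counter, defaultdict


def majority_vote(traces):
    """Standard self-consistency: the most frequent final answer."""
    return Counter(answer for answer, _ in traces).most_common(1)[0][0]


def prefix_consistency_vote(traces):
    """Weight each trace's vote by how often its answer reappears under regeneration.

    traces: list of (final_answer, regenerated_answers), where regenerated_answers
    come from truncating that trace partway through and resampling the remainder.
    """
    scores = defaultdict(float)
    for answer, regenerated in traces:
        if regenerated:
            reproduce_rate = sum(r == answer for r in regenerated) / len(regenerated)
            scores[answer] += reproduce_rate
    if not scores or max(scores.values()) == 0:
        return majority_vote(traces)  # fall back when no reliability signal is available
    return max(scores, key=scores.get)


# Toy data: "12" is the plurality answer but is rarely reproduced after truncation.
traces = [
    ("12", ["12", "7", "9", "12"]),
    ("12", ["8", "12", "3", "5"]),
    ("12", ["4", "6", "12", "11"]),
    ("15", ["15", "15", "15", "15"]),
    ("15", ["15", "15", "14", "15"]),
]
print(majority_vote(traces))            # 12
print(prefix_consistency_vote(traces))  # 15
```

Only final answers are compared, which reflects the abstract's point that the signal needs neither token log-probabilities nor self-rating prompts.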

2605.07634 2026-05-11 math.OC cs.LG math.ST stat.TH

Robust stochastic first order methods in heavy-tailed noise via medoid mini-batch gradient sampling

Manojlo Vukovic, Dusan Jakovetic

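AI Summary This paper studies robust first-order stochastic optimization under heavy-tailed noise and proposes a new stochastic gradient descent algorithm based on medoid mini-batch gradient sampling (R-SGD-Mini). The method splits each data batch into several chunks, computes a stochastic gradient for each chunk, and updates the parameters along the gradient of the chunk that is the medoid of all chunk gradients, which reduces the influence of noise. The theoretical analysis shows that, in the non-convex setting, the expected time-averaged squared gradient norm converges at rate $\mathcal{O}(T^{-1})$ to a small neighborhood of zero, and at rate $\mathcal{O}(T^{-1/2})$ when the time horizon is known in advance. Experiments confirm that the method performs favorably compared with conventional methods.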

Abstract

We consider a first order stochastic optimization framework where, at each iteration, $K$ independent identically distributed (i.i.d.) data point samples are drawn, based on which stochastic gradients can be queried. We allow gradient noise to be heavy-tailed, with possibly infinite variances. For the considered heavy-tailed setting, many algorithmic variants have recently been proposed based on gradient clipping or other nonlinear operators (e.g., normalization) applied over noisy gradients. In this paper, we take an alternative approach and propose a novel stochastic first order method dubbed Robust Stochastic Gradient Descent with medoid mini-batch gradient sampling, R-SGD-Mini for short. The core idea of R-SGD-Mini is to split the $K$-sized data batch into $M$ distinct data chunks, form the stochastic gradient for each chunk, and update the solution estimate along the stochastic gradient of the chunk whose gradient is the medoid of the gradients of all data chunks. Under a general class of symmetric heavy-tailed gradient noises and a standard non-convex setting, we establish explicit bounds on the expected time-averaged squared gradient norm. More precisely, we show that the latter quantity converges at rate $\mathcal{O}(T^{-1})$ to a small neighborhood of zero; we explicitly characterize this neighborhood in terms of the noise and algorithm parameters. Moreover, if the time horizon is known in advance, we establish the rate of $\mathcal{O}(T^{-\frac{1}{2}}).$ Furthermore, when clipping is incorporated, we obtain convergence guarantees in the high-probability sense and recover the same rate. Experimental results indicate that R-SGD-Mini and its clipped variant consistently perform favorably compared to SGD, clipped SGD and Median-of-Means based methods.
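A minimal NumPy sketch of one R-SGD-Mini-style step. It assumes the medoid is the chunk gradient with the smallest sum of Euclidean distances to the other chunk gradients, and it uses a toy quadratic objective with Student-t noise (1.5 degrees of freedom, so the variance is infinite). The function names, step size, chunk count, and objective are illustrative assumptions, and the clipped variant is not shown.

```python
import numpy as np


def medoid_index(vectors):
    """Index of the row with the smallest total Euclidean distance to all rows."""
    diffs = vectors[:, None, :] - vectors[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return int(np.argmin(dists.sum(axis=1)))


def r_sgd_mini_step(x, grad_fn, batch, num_chunks, lr):
    """Split the batch into chunks, form one gradient per chunk, step along the medoid."""
    chunks = np.array_split(batch, num_chunks)
    grads = np.stack([grad_fn(x, chunk) for chunk in chunks])
    return x - lr * grads[medoid_index(grads)]


def quadratic_grad(x, chunk):
    """Gradient of the chunk average of 0.5 * ||x - z||^2 over samples z in the chunk."""
    return x - chunk.mean(axis=0)


rng = np.random.default_rng(0)
theta = np.array([2.0, -1.0])  # point where the expected gradient vanishes
x = np.zeros(2)
for _ in range(200):
    batch = theta + rng.standard_t(df=1.5, size=(30, 2))  # heavy-tailed samples
    x = r_sgd_mini_step(x, quadratic_grad, batch, num_chunks=5, lr=0.1)
print(np.round(x, 2))
```

Picking one existing chunk gradient, rather than averaging, means a single chunk hit by an extreme sample is simply not selected instead of dragging the update.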

2605.07536 2026-05-11 cs.CR cs.LG

GESR: Graph-Based Edge Semantic Reconstruction for Stealthy Communication Detection with Benign-Only Training

Henghui Xu, Yuchen Zhang, Xiaobo Ma

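AI Summary Detecting stealthy malicious communications when only benign traffic is available for training is an important challenge in network security. To address it, this paper proposes GESR, a new graph-based framework that reconstructs the semantics of communication edges from their local structural context in order to capture communication patterns and identify anomalous communications and hosts. The method does not rely on labeled attack samples; it uses structural consistency in the graph for anomaly detection and achieves strong detection performance on two datasets (CTU-13 and CICIDS2017).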

Abstract

Detecting stealthy malicious communications from flow logs under benign-only training remains a critical challenge in network security. Malicious communications often camouflage themselves as normal traffic, such as standard HTTPS flows. Conventional intrusion detectors either rely on known labeled attacks or score each flow independently, and both approaches fail against sparse, context-dependent suspicious activity. Graph anomaly detectors add relational information to capture this context; however, existing methods do not test the structural consistency of individual communication edges. To overcome these limitations, we present GESR, a novel graph-based framework for detecting suspicious communications and anomalous hosts under a benign-only training setting. GESR models network activity as attributed communication graphs and reconstructs edge semantics from local structural context rather than from isolated features. This design forces the framework to predict expected communication patterns from neighborhood topologies, a structural dependency that attackers cannot easily manipulate. The model then converts the resulting structural inconsistencies into host-level anomaly scores, using robust Median Absolute Deviation (MAD) calibration for this final step. We evaluate GESR on the CTU-13 and CICIDS2017 datasets under tight false-positive operating constraints. On CICIDS2017, GESR achieves a ROC-AUC of 0.9753 and a TPR of 0.8569 at a strict 5% FPR threshold. GESR consistently outperforms existing methods on both benchmarks. The results indicate that structure-conditioned edge reconstruction is a credible direction for practical intrusion detection.
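A minimal sketch of the final scoring stage, assuming per-edge reconstruction errors are already available from some model. Errors are averaged over each host's incident edges and then converted to robust z-scores using the median and MAD of benign-only host scores. The mean aggregation, the 1.4826 consistency constant, the 3.0 threshold, and all names and addresses are illustrative assumptions rather than GESR's published configuration.

```python
from collections import defaultdict
from statistics import median


def host_scores(edge_errors):
    """Aggregate per-edge reconstruction errors into per-host scores (mean over incident edges)."""
    per_host = defaultdict(list)
    for src, dst, err in edge_errors:
        per_host[src].append(err)
        per_host[dst].append(err)
    return {h: sum(errs) / len(errs) for h, errs in per_host.items()}


def mad_calibrator(benign_scores, eps=1e-12):
    """Return a robust z-score function fitted on benign host scores."""
    med = median(benign_scores)
    mad = median(abs(s - med) for s in benign_scores)
    scale = 1.4826 * mad + eps  # 1.4826 makes MAD consistent with the std. dev. under normality
    return lambda score: (score - med) / scale


benign = [0.10, 0.12, 0.11, 0.09, 0.13]  # hypothetical host scores from benign-only traffic
robust_z = mad_calibrator(benign)
test_hosts = host_scores([("10.0.0.5", "10.0.0.9", 0.12), ("10.0.0.7", "198.51.100.4", 0.50)])
flagged = {h for h, s in test_hosts.items() if robust_z(s) > 3.0}
print(sorted(flagged))
```

Median and MAD are used instead of mean and standard deviation so that a few unusual benign hosts do not inflate the calibration scale; a fixed threshold on the robust z-score is then one simple way to target an FPR budget.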

2605.01041 2026-05-11 cs.MA cs.AI cs.GT cs.LG cs.RO

Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning

Iman Sharifi, Hyeong Tae Kim, Maheed Hatem Ahmed, Mahsa Ghasemi, Peng Wei

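AI Summary This paper studies how multi-agent reinforcement learning can maintain safe separation when different companies operate heterogeneous fleets of small unmanned aerial systems in future dense urban airspace. It employs an attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework to resolve intra- and inter-fleet conflicts, with each fleet training its own policy independently to preserve privacy. Experiments show that two fleets using shared PPOA2C policies can reach an equilibrium that maintains safe separation, and that the policy adapts better both in conflict resolution and when interacting with rule-based policies. The results highlight the importance of fairness-aware conflict management in heterogeneous sUAS operations.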

Comments 8 pages, 3 figures, 1 table

Abstract

In the envisioned future dense urban airspace, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs), where each fleet includes several homogeneous aircraft with identical policies and configurations, e.g., equipage, sensing, and communication ranges, making tactical deconfliction highly complex for the aircraft. This paper aims to address two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium to ensure a conflict-free airspace when companies operate heterogeneous fleets of homogeneous aircraft? (2) If so, will the converged policies discriminate against companies operating sUASs with weaker configurations? We investigate a multi-agent reinforcement learning paradigm in which homogeneous aircraft within heterogeneous fleets operate concurrently to perform package delivery missions over Dallas, Texas, USA. An attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework is employed to resolve intra- and inter-fleet conflicts, with each fleet independently training its own policy while preserving privacy. Experimental results show that two fleets with distinct, shared PPOA2C policies can reach an equilibrium to maintain safe separation. While two PPOA2C policies outperform two strong rule-based baselines in terms of conflict resolution, a PPOA2C policy exhibits safer interaction with a rule-based policy, indicating adaptive capabilities of PPOA2C policies. Furthermore, we conducted extensive policy-configuration evaluations, which reveal that equilibria between similar policy types tend to favor fleets with stronger configurations. Even under similar configurations but different policy types, the equilibrium favors one of the heterogeneous policies, underscoring the need for fairness-aware conflict management in heterogeneous sUAS operations.
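For readers unfamiliar with PPO-based actor-critic training, the sketch below computes the standard PPO clipped surrogate loss together with a one-step advantage estimate. It is a generic textbook form, not the paper's attention-enhanced PPOA2C. The clipping value `eps=0.2`, the discount `gamma=0.99`, and the function names are illustrative assumptions, and the attention encoder, multi-fleet setup, and privacy-preserving training are not modeled.

```python
def one_step_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """A(s, a) ~= r + gamma * V(s') - V(s), with V(s') dropped at episode end."""
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Negative PPO clipped surrogate, averaged over samples.

    ratios: pi_new(a|s) / pi_old(a|s) for each sample.
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        total += min(r * adv, clipped * adv)
    return -total / len(ratios)


adv = one_step_advantage(reward=1.0, value_s=0.5, value_next=0.6)
print(round(adv, 3))  # 1.094
print(ppo_clip_loss([1.5, 0.5], [1.0, -1.0]))
```

Taking the minimum of the unclipped and clipped terms removes any incentive to move the probability ratio outside $[1-\epsilon, 1+\epsilon]$ in a single update, which is what keeps independently trained fleet policies from changing abruptly between rounds.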

2605.00932 2026-05-11 cs.SE cs.AI

Code World Model Preparedness Report

Daniel Song, Peter Ney, Cristina Menghini, Faizan Ahmad, Aidan Boyd, Nathaniel Li, Ziwen Han, Jean-Christophe Testud, Saisuke Okabayashi, Maeve Ryan, Jinpeng Miao, Hamza Kwisaba, Felix Binder, Spencer Whitman, Jim Gust, Esteban Arcaute, Dhaval Kapil, Jacob Kahn, Ayaz Minhas, Tristan Goodman, Lauren Deason, Alexander Vaughan, Shengjia Zhao, Summer Yue

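AI Summary This report documents the preparedness assessment of Code World Model (CWM), a model developed by Meta for code generation and reasoning about code. Based on pre-release testing in domains that could present catastrophic risks and an evaluation of the model's misaligned propensities, the assessment found that CWM does not pose additional frontier risks beyond those already present in the current AI ecosystem, so it was released as an open-weight model.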

Comments 25 pages, 3 figures

Abstract

This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catastrophic risks, and also evaluated the model's misaligned propensities. Our assessment found that CWM does not pose additional frontier risks beyond those present in the current AI ecosystem. We therefore release it as an open-weight model.

2605.00754 2026-05-11 cs.SE cs.LG

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

Indraneil Paul, Goran Glavaš, Iryna Gurevych

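AI Summary The study introduces Themis-RM, a suite of robust reward models for multilingual code generation that supports flexible multi-criteria scoring. Existing code reward models rely mainly on execution feedback and score along a single dimension; to address this, the authors build the Themis-CodeRewardBench benchmark and collect Themis-CodePreference, more than 350k code preference pairs, to train multilingual, multi-criteria code reward models. Experiments show that Themis-RM performs well on cross-lingual transfer and multi-criteria scoring, improving the flexibility and reliability of code reward models.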

Abstract

Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling. Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choice constrains post-training to optimizing functional correctness over self-contained executable code. In this work, we examine the training and evaluation of multilingual, multi-criteria code RMs. To this end, we first compile Themis-CodeRewardBench, a benchmark to evaluate code RMs across five preference dimensions (i.e., criteria) and eight programming languages, on which we profile 50+ code, math, and general-purpose RMs. Observing the limited proficiency of current RMs beyond scoring for functional correctness, we develop Themis-CodePreference, the largest open-source collection of code preferences to date (more than 350k preference pairs), and use it to train Themis-RM, a suite of multilingual code reward models for flexible multi-criteria scoring, ranging in size from 600M to 32B parameters. Our experiments and ablations demonstrate positive scaling trends, strong cross-lingual transfer when training on diverse preferences, and the importance of multi-criteria training for reliable code reward modeling.
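A minimal sketch of how a multi-criteria code RM could be trained and profiled on preference pairs. It assumes the common Bradley-Terry pairwise loss and a scorer that takes the criterion as an extra input. The abstract does not state Themis-RM's training objective, and the scorer signature, criterion names, and the toy length-based scorer are hypothetical.

```python
import math
from collections import defaultdict


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def bradley_terry_loss(score_chosen, score_rejected):
    """-log sigmoid(score_chosen - score_rejected)."""
    return softplus(-(score_chosen - score_rejected))


def accuracy_by_criterion(records, score):
    """Fraction of pairs where the chosen response outscores the rejected one, per criterion."""
    hits, totals = defaultdict(int), defaultdict(int)
    for criterion, prompt, chosen, rejected in records:
        totals[criterion] += 1
        hits[criterion] += score(prompt, chosen, criterion) > score(prompt, rejected, criterion)
    return {c: hits[c] / totals[c] for c in totals}


def toy_score(prompt, code, criterion):
    """Hypothetical scorer that ignores the criterion and simply prefers shorter code."""
    return -len(code)


records = [
    ("efficiency", "sum a list", "sum(xs)", "total = 0\nfor x in xs:\n    total += x"),
    ("readability", "sum a list", "total = 0\nfor x in xs:\n    total += x", "t=0\nfor x in xs:t+=x"),
]
print(accuracy_by_criterion(records, toy_score))  # {'efficiency': 1.0, 'readability': 0.0}
print(round(bradley_terry_loss(0.0, 0.0), 4))     # 0.6931
```

The toy scorer ignores its criterion argument, so it does well on one criterion and fails another; per-criterion accuracy is what exposes that kind of single-dimension bias.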

2604.06276 2026-05-11 eess.IV cs.CV

Structural Regularities of Cinema SDR-to-HDR Mapping in a Controlled Mastering Workflow: A Pixel-wise Case Study on ASC StEM2

Xin Zhang, Xiaoyi Chen

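AI Summary Using the ASC StEM2 dataset, this paper presents a pixel-level empirical study of the mapping from standard dynamic range (SDR) to high dynamic range (HDR) in cinema, analyzing regularities in luminance and color structure between the SDR and HDR versions produced in a controlled mastering workflow. It finds a stable global monotonic luminance correspondence between the SDR and HDR versions; in color, the two remain consistent in hue while saturation is redistributed. Using the EXR source data as a reference, the study further constructs a pixel-level decision map that separates regions that should recover the original scene information from regions that need content-adaptive adjustment, providing an interpretable quantitative baseline for structure-aware SDR-to-HDR mapping analysis.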

Comments 15 pages, 6 figures. Empirical case study on cinema SDR-to-HDR mapping using ASC StEM2

Journal ref
Advanced Motion Picture Technology, 2026, no. 3, pp. 14-22
Abstract

We present an empirical case study of cinema SDR-to-HDR mapping using ASC StEM2, a rare common-source dataset containing EXR scene-referred images and matched SDR/HDR cinema release masters from the same ACES-based mastering workflow. Based on pixel-wise statistics over all 18,580 frames of the test film, we construct a three-domain comparison involving EXR source data, SDR release masters, and HDR release masters to characterize their luminance and color structural relationships within this controlled workflow. In the luminance dimension, SDR and HDR masters exhibit a highly stable global monotonic correspondence, with geometric structure remaining largely consistent overall; sparse and structured deviations appear in self-luminous highlights and specific material regions. In the color dimension, the two masters remain largely consistent in hue, with saturation exhibiting a redistribution pattern of shadow suppression, midtone expansion, and highlight convergence. Using EXR as a scene-referred anchor, we further define a pixel-level decision map that operationally separates EXR-closer recovery regions from content-adaptive adjustment regions. Under this operational definition, 82.4% of sampled image regions are classified as EXR-closer recovery, while the remainder require localized adaptive adjustment. Rather than claiming a universal law for all cinema mastering pipelines, the study provides an interpretable quantitative baseline for structure-aware SDR-to-HDR analysis and for designing learning-based models under shared-source mastering conditions.
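A minimal sketch of how a global monotone SDR-to-HDR luminance correspondence and its sparse deviations could be measured from paired pixels, assuming SDR luminance normalized to [0, 1]. It uses a crude monotone fit (per-bin medians made non-decreasing by a running maximum, not full isotonic regression) and a hypothetical relative-error threshold. It does not reproduce the paper's statistics or its EXR-anchored decision map, and the bin count, tolerance, and toy values are assumptions.

```python
from statistics import median


def bin_index(s, n_bins):
    """Map a normalized SDR luminance in [0, 1] to a bin index."""
    return min(max(int(s * n_bins), 0), n_bins - 1)


def fit_monotone_curve(sdr, hdr, n_bins=16):
    """Per-bin median HDR luminance, forced to be non-decreasing across bins."""
    bins = [[] for _ in range(n_bins)]
    for s, h in zip(sdr, hdr):
        bins[bin_index(s, n_bins)].append(h)
    curve, running = [], None
    for values in bins:
        if values:
            m = median(values)
            running = m if running is None else max(running, m)
        curve.append(running)  # empty bins inherit the previous level (None if none yet)
    return curve


def deviation_mask(sdr, hdr, curve, rel_tol=0.25):
    """Flag pixels whose HDR value departs from the global curve by more than rel_tol."""
    n_bins = len(curve)
    mask = []
    for s, h in zip(sdr, hdr):
        pred = curve[bin_index(s, n_bins)]
        mask.append(pred is not None and abs(h - pred) > rel_tol * max(pred, 1.0))
    return mask


# Toy paired pixels (normalized SDR, HDR luminance in arbitrary units);
# one pixel mimics a self-luminous highlight that departs from the global curve.
sdr = [0.1, 0.1, 0.1, 0.4, 0.4, 0.4, 0.6, 0.6, 0.6, 0.9, 0.9, 0.9]
hdr = [10, 10, 10, 40, 40, 40, 60, 60, 600, 100, 100, 100]
curve = fit_monotone_curve(sdr, hdr, n_bins=4)
mask = deviation_mask(sdr, hdr, curve)
print(curve)                  # [10, 40, 60, 100]
print(sum(mask) / len(mask))  # fraction of deviating pixels
```

Per-bin medians keep the sparse highlight from bending the global curve, so it shows up as a flagged deviation rather than as a change in the fitted mapping.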

2604.04891 2026-05-11 math.OC cs.AI stat.ML

Muon Dynamics as a Spectral Wasserstein Flow

Gabriel Peyré

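AI Summary This paper studies the continuous-time dynamics of gradient normalization methods in deep learning and introduces spectral-norm-based Wasserstein distances to describe the evolution of probability measures over parameter space. The core approach defines Spectral Wasserstein distances indexed by different norms on positive semidefinite matrices, interprets normalized training as a gradient flow, and establishes theoretical connections such as equivalence with a Benamou-Brenier formulation. Contributions include a static Kantorovich formulation, a robust-cost representation, Gaussian reductions, and numerical illustrations on several models, offering a new geometric perspective on normalized training.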

Abstract

Gradient normalization stabilizes deep-learning optimization, and spectral normalizations are especially natural for matrix-shaped parameter blocks; Muon is the motivating example. We study an idealized deterministic, continuous-time, vanishing-momentum version of this idea in the mean-field regime, where wide models are represented by probability measures on parameter space. Starting from normalized matrix flows, we introduce Spectral Wasserstein distances indexed by norms $γ$ on positive semidefinite matrices: the trace norm gives classical $W_2$, the operator norm gives the Muon geometry, and Schatten norms interpolate between them. We develop the static Kantorovich formulation, a max-min robust-cost representation, Gaussian reductions extending the Bures formula, and for monotone norms, prove equivalence with a Benamou--Brenier formulation. This yields a gradient-flow interpretation of the mean-field normalized training dynamics. We illustrate these findings by numerical experiments on MMD flows, Gaussian reductions, two-layer ReLU models, and shallow attention.
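As a finite-dimensional illustration of the norms involved (not the paper's mean-field construction), the sketch below computes the steepest-ascent direction $\arg\max_{\|D\|_{S_p} \le 1} \langle G, D \rangle$ for a gradient matrix $G$ under a Schatten-$p$ norm. With $p = 2$ the result is the Frobenius-normalized gradient $G / \|G\|_F$. With $p = \infty$ it is the polar factor $UV^\top$ of $G = U \Sigma V^\top$, the orthogonalized update that Muon approximates with Newton-Schulz iterations. Our reading, not a statement from the paper, is that this lines up with the abstract's endpoints because $\operatorname{tr}(DD^\top) = \|D\|_F^2$ and $\|DD^\top\|_{\mathrm{op}} = \|D\|_{\mathrm{op}}^2$. The function name and the exact SVD-based computation are illustrative assumptions.

```python
import numpy as np


def schatten_steepest_direction(G, p):
    """argmax <G, D> subject to ||D||_{S_p} <= 1, for p in (1, inf].

    With singular values s of G, the maximizer keeps G's singular vectors and
    uses singular values s**(q-1) / ||s||_q**(q-1), where 1/p + 1/q = 1.
    """
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    if np.isinf(p):
        d = (s > 1e-12 * s.max()).astype(float)  # q = 1: every nonzero singular value -> 1
    else:
        q = p / (p - 1.0)  # dual exponent
        d = s ** (q - 1.0) / np.linalg.norm(s, ord=q) ** (q - 1.0)
    return (U * d) @ Vt  # U @ diag(d) @ Vt


G = np.array([[3.0, 0.0], [0.0, 4.0]])
print(np.round(schatten_steepest_direction(G, 2.0), 3))     # Frobenius-normalized: G / 5
print(np.round(schatten_steepest_direction(G, np.inf), 3))  # Muon-style polar factor: identity
```

Intermediate values of $p$ interpolate between the two: small singular directions are amplified more as $p$ grows, until at $p = \infty$ all directions receive equal weight.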

2603.24914 2026-05-11 math.HO cs.AI

Shaping the Future of Mathematics in the Age of AI

Johan Commelin, Mateja Jamnik, Rodrigo Ochigame, Lenny Taelman, Akshay Venkatesh

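AI Summary This paper examines the changes and challenges facing mathematics in the age of artificial intelligence, focusing on five key areas: values, practice, teaching, technology, and ethics. The authors offer recommendations for safeguarding the mathematical community's autonomy, rethinking research practice, broadening curricula, building academically oriented infrastructure, and developing shared ethical principles, so that the future of mathematics is shaped by the mathematical community itself.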

Comments To appear in Notices of the American Mathematical Society. Based on discussions at a September 2025 workshop on "Mechanization and Mathematical Research" held at the Lorentz Center, Leiden

Abstract

Artificial intelligence is transforming mathematics at a speed and scale that demand active engagement from the mathematical community. We examine five areas where this transformation is particularly pressing: values, practice, teaching, technology, and ethics. We offer recommendations on safeguarding our intellectual autonomy, rethinking our practice, broadening curricula, building academically oriented infrastructure, and developing shared ethical principles, with the aim of ensuring that the future of mathematics is shaped by the community itself.

2602.08786 2026-05-11 cs.CY cs.LG

On the Meta-Design of Allocation Problems

Unai Fischer-Abaigar, Emily Aiken, Christoph Kern, Juan Carlos Perdomo

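AI Summary This paper studies the meta-design of resource allocation problems: optimizing high-level design decisions such as investments in prediction, capacity constraints, and treatment quality, rather than only searching for the optimal allocation policy once these parameters are fixed. It formally defines the meta-design space of resource allocation problems and develops empirical tools that help practitioners analyze it systematically. Case studies on German employment services and targeted cash transfer programs in Ethiopia demonstrate the framework in practice.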

Abstract

There is an extensive literature that studies how to find optimal policies in resource allocation problems, taking the underlying design parameters that define the allocation, such as what data is collected, how many people can be served, and quality of service as fixed constraints. Yet, from a planner's perspective, these design parameters are themselves optimization variables that are just as important in determining overall welfare as selecting the optimal targeting rule for a given set of constraints. This realization motivates a rich set of meta-design questions exploring how planners should make principled decisions about investments in prediction, capacity constraints, and treatment quality, all of which lie upstream of classical policy optimization. Building on initial theoretical work in this space, our paper has three main contributions. First, we formally define the broad meta-design space of resource allocation problems. Second, we develop empirical tools that enable practitioners to reliably navigate it. Third, we demonstrate the framework in two real-world case studies on German employment services and targeted cash transfer programs in Ethiopia.
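A toy illustration of the meta-design question, under entirely hypothetical assumptions. Each person has a true need in [0, 1), the planner ranks people by a noisy prediction of need and treats the top `capacity` of them, and welfare is the treated need times a treatment-effect multiplier. Comparing candidate investments (less prediction noise, more capacity, or a stronger treatment) then means evaluating welfare across design parameters rather than only choosing a targeting rule. This is not the authors' framework, tools, or data; the option names and numbers are made up for illustration.

```python
import random


def welfare(noise_sd, capacity, effect, n=1000, seed=0):
    """Welfare of treating the `capacity` people ranked highest by a noisy need prediction."""
    rng = random.Random(seed)
    need = [rng.random() for _ in range(n)]
    predicted = [x + rng.gauss(0.0, noise_sd) for x in need]
    ranked = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    return effect * sum(need[i] for i in ranked[:capacity])


# Hypothetical design options, each assumed to cost the same budget.
options = {
    "better prediction": dict(noise_sd=0.05, capacity=100, effect=1.0),
    "more capacity": dict(noise_sd=0.30, capacity=130, effect=1.0),
    "better treatment": dict(noise_sd=0.30, capacity=100, effect=1.2),
}
results = {name: welfare(**params) for name, params in options.items()}
for name, w in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:>17}: {w:.1f}")
```

The fixed seed keeps the population and prediction noise identical across options, so differences in welfare come only from the design parameters.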

2602.04774 2026-05-11 cond-mat.dis-nn cs.LG stat.ML

Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

Blake Bordelon, Francesco Mori

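AI Summary This paper develops a theory of optimal learning rate schedules, analyzing a random feature model trained with stochastic gradient descent (SGD) using optimal control theory. It finds two regimes, an "easy phase" and a "hard phase", that call for different decay strategies, and shows how jointly optimizing the learning rate and batch size affects training efficiency. Experiments indicate that the theory applies to both image classification and language modeling tasks, providing theoretical guidance and a practical reference for learning rate scheduling.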

Abstract

Setting the learning rate (LR) for a deep learning model is a critical part of successful training. Choosing LRs is often done empirically with trial and error. In this work, we explore a solvable model of optimal LR schedules for a power-law random feature model trained with stochastic gradient descent (SGD). We consider the optimal schedule $\eta_T^\star(t)$ where $t$ is the current iterate and $T$ is the training horizon. This schedule is computed both as a numerical optimization problem and analytically using optimal control theory. Our analysis reveals two regimes which we term the easy phase and the hard phase. In the easy phase the optimal schedule is a polynomial decay $\eta_T^\star(t) \simeq T^{-\xi} (1-t/T)^\delta$ where $\xi$ and $\delta$ depend on the properties of the features and task. In the hard phase, the optimal schedule resembles warmup-stable-decay with constant initial LR and annealing performed over a vanishing fraction of training steps. We investigate joint optimization of LR and batch size and find batch ramps can improve the wall-clock time in the easy phase. Beyond SGD, we derive optimal schedules for the momentum parameter $\beta(t)$ and show that it improves the loss-scaling exponent in the hard phase. We compare our optimal schedule to various benchmarks including (1) optimal constant learning rates $\eta_T(t) \sim T^{-\xi}$ and (2) optimal power laws $\eta_T(t) \sim T^{-\xi} t^{-\chi}$, finding that our schedule achieves better rates than either of these. Our theory suggests that LR transfer across training horizon depends on the structure of the model and task. For ResNet image classification on CIFAR-5M, the learning curves exhibit hard-phase behavior where optimal base LRs are constant under sufficient annealing. GPT-2 style transformers trained in language modeling exhibit easy-phase behavior where optimal LRs shift even under annealing.
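A minimal sketch of the schedule families named in the abstract. It assumes an unspecified prefactor `eta0` (the abstract gives the schedules only up to scaling), offsets the power law to `(1 + t)` so it is defined at `t = 0`, and uses a linear decay shape for the warmup-stable-decay-like hard-phase schedule (warmup omitted). The exponents $\xi$, $\delta$, $\chi$ depend on the features and task and are set to arbitrary values here.

```python
def easy_phase(t, T, eta0, xi, delta):
    """Polynomial decay eta0 * T**(-xi) * (1 - t/T)**delta."""
    return eta0 * T ** (-xi) * (1.0 - t / T) ** delta


def constant(t, T, eta0, xi):
    """Horizon-dependent constant learning rate eta0 * T**(-xi)."""
    return eta0 * T ** (-xi)


def power_law(t, T, eta0, xi, chi):
    """eta0 * T**(-xi) * (1 + t)**(-chi); the +1 avoids division by zero at t = 0."""
    return eta0 * T ** (-xi) * (1.0 + t) ** (-chi)


def stable_then_decay(t, T, eta_max, decay_frac=0.1):
    """Constant learning rate, then linear annealing to zero over the final decay_frac of steps."""
    decay_start = (1.0 - decay_frac) * T
    if t < decay_start:
        return eta_max
    return eta_max * (T - t) / (decay_frac * T)


T = 1000
for t in (0, 500, 900, 999):
    print(t, round(easy_phase(t, T, 1.0, 0.5, 2.0), 5), stable_then_decay(t, T, 0.1))
```

The $T^{-\xi}$ factor is what makes the easy-phase base LR shift with the training horizon, whereas the stable-then-decay schedule keeps the same base LR for any $T$. This mirrors the abstract's contrast between LR transfer in the two phases.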