arXivDaily arXiv Daily Academic Digest, updated Monday to Friday
2605.08130 2026-05-12 cs.LG

Additive Atomic Forests for Symbolic Function and Antiderivative Discovery

Reda Belaiche

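AI summary This paper proposes a method for simultaneously recovering symbolic expressions of a function and its antiderivative from data. The method is built on a derivative algebra that recursively applies the product and chain rules and, together with two basic primitives, EML and SOL, constructs a self-expanding library of function-derivative pairs. Through an "additive atomic forest" structure, it fits a function and its derivative simultaneously without explicit integration; experiments show strong performance on multiple datasets together with interpretable formulas.

Abstract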

We present a framework for the simultaneous symbolic recovery of a function and its antiderivative from data. The framework rests on three ideas. First, a derivative algebra: the observation that the product rule $\frac{d}{dx}[f \cdot g] = f'g + fg'$ and the chain rule, applied to a seed set of elementary functions, generate a self-expanding system of function-derivative pairs -- a living library that grows each time a new function is discovered. Second, two complementary primitives -- EML$\,(e^u - \ln v)$, which is theoretically complete for all elementary functions, and SOL$\,(\sin u - \cos v)$, introduced here, which makes trigonometric atoms available at depth 1 instead of depth $\sim$8 -- that seed the library with core atoms cheaply. Third, additive atomic forests: finite sums of primitive trees, optionally composed via multiplicative nodes, whose derivatives are fitted to data by continuous optimisation or by exhaustive search over the library. Because differentiation of each atom is determined by construction, the forest simultaneously encodes a symbolic expression $F$ and its derivative $F'$; no symbolic integration step is required. The library is not a fixed object: it self-constructs from a small seed set by recursive application of the product rule, chain rule, and the two primitives, and it can grow as newly discovered functions are folded back in. The larger the library, the richer the expressible class of candidate functions. We give conditional completeness, additive-depth, and analytic simultaneous-recovery results for the framework. Empirically, in our reported runs on 17 classification benchmarks, sparse atom combinations match or exceed XGBoost on 13 datasets while producing interpretable formulas.
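The derivative-algebra idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: each atom is stored as a (function, derivative) pair, and the product and chain rules build new pairs whose derivatives are known by construction. The names `Pair`, `product`, `compose`, `eml`, `sol`, and `central_difference` are hypothetical; only the EML and SOL expressions come from the abstract.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pair:
    """A function together with its derivative (hypothetical container)."""
    f: Callable[[float], float]
    df: Callable[[float], float]


def product(a: Pair, b: Pair) -> Pair:
    # Product rule: (a * b)' = a' * b + a * b'
    return Pair(lambda x: a.f(x) * b.f(x),
                lambda x: a.df(x) * b.f(x) + a.f(x) * b.df(x))


def compose(outer: Pair, inner: Pair) -> Pair:
    # Chain rule: (outer(inner))' = outer'(inner) * inner'
    return Pair(lambda x: outer.f(inner.f(x)),
                lambda x: outer.df(inner.f(x)) * inner.df(x))


def eml(u: Pair, v: Pair) -> Pair:
    # EML(u, v) = e^u - ln v, with derivative e^u * u' - v' / v
    return Pair(lambda x: math.exp(u.f(x)) - math.log(v.f(x)),
                lambda x: math.exp(u.f(x)) * u.df(x) - v.df(x) / v.f(x))


def sol(u: Pair, v: Pair) -> Pair:
    # SOL(u, v) = sin u - cos v, with derivative cos(u) * u' + sin(v) * v'
    return Pair(lambda x: math.sin(u.f(x)) - math.cos(v.f(x)),
                lambda x: math.cos(u.f(x)) * u.df(x) + math.sin(v.f(x)) * v.df(x))


# Seed atom: the identity function x, whose derivative is 1.
IDENTITY = Pair(lambda x: x, lambda x: 1.0)


def central_difference(f, x, h=1e-6):
    """Numerical derivative, used only to sanity-check the symbolic pairs."""
    return (f(x + h) - f(x - h)) / (2 * h)
```

Every pair built this way carries an exact derivative, so fitting `df` to data also fixes `f` without an integration step, which is the property the abstract relies on.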

2605.08129 2026-05-12 cs.LG

Towards Customized Multimodal Role-Play

Chao Tang, Jianzong Wu, Qingyu Shi, Ye Tian, Aixi Zhang, Hao Jiang, Jiangning Zhang, Yunhai Tong

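AI summary This paper proposes a new task, Customized Multimodal Role-Play (CMRP), which aims to jointly customize a character's persona, dialogue style, and visual identity while keeping text and image generation consistent across modalities. The authors build the RoleScape-20 dataset of 20 characters and design UniCharacter, a two-stage training framework based on a unified model that customizes a character efficiently from a small number of samples. Experiments show that the method substantially outperforms existing approaches on character generation tasks, providing a basis for next-generation characterful, immersive interactive agents.

Comments Code available at https://github.com/Tangc03/UniCharacter Project page available at https://tangc03.github.io/UniCharacter.github.io/

Abstract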

Unified multimodal understanding and generation models enable richer human-AI interaction. Yet jointly customizing a character's persona, dialogue style, and visual identity while maintaining output consistency across modalities remains largely unexplored. To mitigate this gap, we introduce a new task, Customized Multimodal Role-Play (CMRP). We construct the RoleScape-20 dataset comprising 20 characters, including training and evaluation data that cover persona, stylistic descriptions, visual/expressive cues, and text-image interactions. Building on a unified model, we devise UniCharacter, a two-stage training framework containing Unified Supervised Finetuning (Unified-SFT) and character-specific group relative policy optimization (Character-GRPO). Given only 10 images plus corresponding interaction examples, the model acquires the target character and exhibits coherent persona, style, and visual identity in both generated text and images. This process takes about 100 GPU hours. Experiments on the RoleScape-20 dataset show that the proposed method substantially outperforms prior approaches. Ablation studies further validate the effectiveness of our cross-modal consistency design and few-shot customization strategy. We argue that CMRP, coupled with unified modeling, provides a basis for next-generation characterful and immersive interactive agents.

2605.08128 2026-05-12 cs.LG cs.AI

Towards Universal Gene Regulatory Network Inference: Unlocking Generalizable Regulatory Knowledge in Single-cell Foundation Models

Jiaxin Qi, Hang Li, Yan Cui, Yuhua Zheng, Jianqiang Huang

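AI summary This paper addresses the limited generalization of single-cell foundation models (scFMs) in gene regulatory network (GRN) inference. It proposes a new benchmark for evaluating regulatory prediction on unseen genes and datasets, and designs two new methods, Virtual Value Perturbation and Gradient Trajectory, to extract implicit regulatory information from scFMs and produce highly generalizable inter-gene features. Experiments show that the approach significantly outperforms existing techniques, offering a new paradigm for universal GRN inference with scFMs.

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026)

Abstract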

Gene Regulatory Network (GRN) inference is essential for understanding complex cellular mechanisms, rendered tractable through single-cell transcriptomic data. With the emergence of single-cell Foundation Models (scFMs), enhanced transcriptomic encoding is widely expected to revolutionize GRN inference. However, we observe that their performance remains far from satisfactory. The primary reason is that the standard reconstruction-based pre-training objectives often fail to explicitly capture latent regulatory signals. To bridge this gap, we first introduce a GRN generalization benchmark designed to evaluate regulatory predictions on unseen genes and datasets, which relies on the zero-shot capabilities of scFMs and is inherently challenging for traditional methods. Furthermore, to unlock the regulatory knowledge within the foundation models, we propose two novel methods, Virtual Value Perturbation and Gradient Trajectory, to distill implicit regulatory information from scFMs into highly generalizable inter-gene features. Extensive experiments demonstrate that our approach significantly outperforms existing methods, establishing a new paradigm for leveraging the potential of scFMs in universal GRN inference.

2605.08119 2026-05-12 cs.LG cs.AI

Feature Repulsion and Spectral Lock-in: An Empirical Study of Two-Layer Network Grokking

Yongzhong Xu

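AI summary This study examines feature repulsion and spectral lock-in in two-layer networks during grokking. Experiments show that feature similarity follows a clear sign rule in the parameter updates and yields a detectable spectral signature under certain activation functions. The effect of feature repulsion depends strongly on the activation function: the spectral structure of the parameter updates differs markedly across activations, revealing an asymmetric relationship between feature learning and weight updates.

Comments 11 pages, 4 figures

Abstract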

Tian (2025) proves a repulsion theorem (Theorem 6) for the matrix $ B = (\widetilde{F}^\top \widetilde{F} + ηI)^{-1} $ during the interactive feature-learning stage of grokking: similar features have negative off-diagonal entries $ B_{j\ell} $, producing an effective repulsive force that drives them apart. However, the theorem does not specify when this mechanism becomes empirically observable, nor whether it leaves a measurable spectral signature in the parameter updates. We test this directly on Tian's modular addition setup ($ M = 71 $, $ K = 2048 $, MSE loss) and observe a clear structure-mechanism dissociation. The predicted sign rule holds robustly on the top-200 most-similar feature pairs across activations (empirical sign-match rising from 0.865 to 0.985 on $ σ= x^2 $ across 5 seeds, and saturating at 1.000 on $ σ= \operatorname{ReLU} $). However, the spectral signature in the parameter updates is strongly activation-dependent. With $ σ= x^2 $, a simple slope detector on the rolling eigengap $ σ_2 / σ_3 $ of $ ΔW $ fires in 15/15 grokking seeds at epoch 174 (IQR [173,174]) and in 0/15 non-grokking controls, with 229$ \times $ late-stage magnitude separation; the spectrum is rank-2. In contrast, with $ σ= \operatorname{ReLU} $, the detector never fires and the spectrum remains effectively rank-1. This dissociation aligns with Tian's Theorem 5 distinction between focused (power-law) and spreading (ReLU) memorization: while the sign structure of $ B $ depends only on $ \widetilde{F}^\top \widetilde{F} $, how feature repulsion translates into weight updates critically depends on the activation derivative $ σ' $.
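A two-feature worked example shows why similar features receive negative off-diagonal entries in $B$. This is an illustration under simplifying assumptions, not a restatement of Tian's Theorem 6: the columns of $\widetilde{F}$ are taken to be two unit-norm features with similarity $c = \widetilde{f}_1^\top \widetilde{f}_2$, and $\eta > 0$.

```latex
\widetilde{F}^\top \widetilde{F} + \eta I =
\begin{pmatrix} 1+\eta & c \\ c & 1+\eta \end{pmatrix},
\qquad
B = \left(\widetilde{F}^\top \widetilde{F} + \eta I\right)^{-1}
  = \frac{1}{(1+\eta)^2 - c^2}
\begin{pmatrix} 1+\eta & -c \\ -c & 1+\eta \end{pmatrix}.
```

Because $|c| \le 1 < 1 + \eta$, the determinant $(1+\eta)^2 - c^2$ is positive, so $B_{12} = -c / \left((1+\eta)^2 - c^2\right)$ has the opposite sign to $c$: positively correlated features ($c > 0$) give a negative off-diagonal entry.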

2605.08114 2026-05-12 cs.LG cs.IT cs.MS math.IT

Statistical Inference and Quality Measures of KV Cache Quantisations Inspired by TurboQuant

Paolo D'Alberto

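AI summary This paper studies three KV cache quantization schemes (KV, KQV, and QKQV) under the same bit budget and analyses how their performance differs across distributions and ranks. Using statistical inference and information measures, it shows that QJL quantization on K inflates inner-product variance, an effect that the softmax nonlinearity amplifies further and that affects model behaviour. Experiments find that KQV performs best in most practical settings while QKQV does better at some budgets, revealing a budget-dependent crossover and offering new insight into quantization and rate-distortion theory.

Comments 23 pages, 7 figures, multiple tables, the process is highly assisted by AI

Abstract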

We analyse three KV cache quantization schemes under a fair bit budget: KV (scalar MSE baseline), KQV (WHT + MSE on $K$; WHT + MSE + QJL on $V$), and QKQV (WHT + MSE + QJL on both). Starting from the Beta distribution on the hypersphere, we trace how QJL on $K$ inflates inner product variance by $π/2$, which softmax amplifies nonlinearly via Jensen's inequality, and we present statistical inference and information metrics to highlight practical differences. Three empirical findings emerge. (1) At $n=4$ (the practically dominant budget), KQV wins on every measure -- KL divergence, geometric $K$ error, and 6D distance -- across all distributions and ranks tested. (2) The K--V asymmetry is unconditional: QKQV is consistently worse than KQV in KL divergence at every budget and distribution. (3) A budget-dependent crossover exists: QKQV achieves better geometric $K$ reconstruction at $n \in \{2,3,5\}$, KQV at $n \in \{4,6\}$, invariant to rank and tail weight -- an open rate-distortion problem. $\mathrm{KL}(p_{\mathrm{ref}} \| p_{\mathrm{quant}})$, K-only by construction, bridges K direction error to routing corruption and output collapse. We present a sufficient condition under which the Jensen mechanism amplifies superlinearly through the softmax. At $n \in \{2,3,5\}$, QKQV wins geometrically because this condition does not bind. At $n=4$, elevated K error and KL divergence for QKQV strongly suggest the Jensen mechanism is the operative cause of the crossover, providing a new perspective and explanation.

2605.08113 2026-05-12 cs.LG cs.CV

Do Foundation Model Embeddings Improve Cross-Country Crop Yield Generalisation? A Leave-One-Country-Out Evaluation in Sub-Saharan Africa

Yaw Osei Adjei

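AI summary This paper studies how foundation model embeddings perform in cross-country maize yield prediction, specifically in sub-Saharan Africa. Using leave-one-country-out cross-validation, it compares embeddings from foundation models such as Prithvi-EO-1.0-100M and ViT-Base with traditional Sentinel-2 spectral features and finds that all feature sets perform poorly under cross-country testing, with generally negative R² values. The study identifies between-country differences in yield distribution, rather than representation quality, as the main factor limiting generalization, and provides a reproducible negative benchmark for future work.

Comments 9 pages, 10 figures, appendix, code and processed results released publicly

Abstract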

Accurate predictions of smallholder maize yields across national boundaries are critical for food security planning in sub-Saharan Africa, yet most published benchmarks report within-country performance that overstates true generalisability. This paper evaluates whether geospatial foundation model embeddings, specifically Prithvi-EO-1.0-100M and ViT-Base, outperform traditional Sentinel-2 spectral features under a Leave-One-Country-Out cross-validation scheme on 6,404 maize field observations from five African countries. The results show a clear generalisability gap: within-country random cross-validation yields moderate $R^2$ values, but all feature sets perform poorly under cross-country testing, with universally negative $R^2$. Frozen Prithvi-EO embeddings provide no meaningful advantage over engineered spectral features for cross-country prediction in this setting. The paper argues that the main limitation is a shift in yield distribution between countries rather than representation quality and releases a reproducible negative benchmark for future work.
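A toy example may help show how a shift in yield distribution alone can drive $R^2$ below zero under Leave-One-Country-Out evaluation. The sketch below is not the paper's pipeline: the country names, yield values, and the mean-predictor baseline are invented for illustration, and `r2_score`, `leave_one_country_out`, and `loco_scores` are hypothetical helper names.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def leave_one_country_out(data):
    """Yield (held_out_country, train_yields, test_yields) for each country."""
    for held_out in data:
        train = [y for c, ys in data.items() if c != held_out for y in ys]
        yield held_out, train, data[held_out]


# Toy yields in t/ha (invented numbers, for illustration only).
TOY_YIELDS = {
    "A": [1.0, 1.2, 0.8],
    "B": [1.1, 0.9, 1.0],
    "C": [3.0, 3.2, 2.8],
}


def loco_scores(data):
    """Score a predictor that always outputs the mean of the training countries."""
    scores = {}
    for country, train, test in leave_one_country_out(data):
        baseline = mean(train)
        scores[country] = r2_score(test, [baseline] * len(test))
    return scores
```

Because $R^2$ is measured against the held-out country's own mean, any predictor whose output is centred on the training countries' yield level is penalised by the squared gap between the two means, regardless of how good the input features are.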

2605.08111 2026-05-12 cs.LG cs.AI stat.ME

TTCD: Transformer Integrated Temporal Causal Discovery from Non-Stationary Time Series Data

Omar Faruque, Sahara Ali, Xue Zheng, Jianwu Wang

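AI summary This paper proposes TTCD, a new end-to-end framework for discovering contemporaneous and lagged causal relations from non-stationary time series. TTCD builds on the Transformer architecture, adding a non-stationary feature learning module and a custom causal structure learning module, and uses reconstruction-guided causal signal distillation to suppress noise and spurious correlations, allowing it to infer the underlying causal graph without strong statistical assumptions. Experiments show that TTCD outperforms existing methods on a range of synthetic and real-world datasets, demonstrating its effectiveness for causal discovery in complex real-world settings.

Comments 18 pages

Abstract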

The widespread availability of complex time series data in various domains such as environmental science, epidemiology, and economics demands robust causal discovery methods that can identify intricate contemporaneous and lagged relationships in non-stationary, nonlinear, and noisy settings. Existing constraint-based methods often rely heavily on conditional independence tests that degrade for limited data samples and complex distributions, while score-based methods impose strong statistical assumptions. Recent methods address special cases such as change point detection or distribution shifts, but struggle to provide a unified solution. We propose the Transformer Integrated Temporal Causal Discovery (TTCD) Framework, a novel end-to-end approach that learns contemporaneous and lagged causal relations from non-stationary time series. TTCD introduces a Non-Stationary Feature Learner integrating temporal and frequency-domain attention with dynamic non-stationarity profiling, and a custom Causal Structure Learner. A key innovation is reconstruction-guided causal signal distillation, to distill essential causal signals through the reconstruction process of the transformer decoder, which mitigates noise and spurious correlations while preserving meaningful dependencies. The Causal Structure Learner operates on distilled reconstructed signals to infer the underlying causal graph without restrictive assumptions on noise distributions or data generation processes. Experiments on synthetic, benchmark, and real world datasets show that TTCD consistently outperforms state-of-the-art baselines in both accuracy and consistency with domain knowledge, demonstrating the approach's effectiveness for causal discovery in challenging real world contexts.

2605.08110 2026-05-12 cs.LG cs.AI

BaLoRA: Bayesian Low-Rank Adaptation of Large Scale Models

Dario Coscia, Sindy Löwe, Max Welling

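AI summary This paper proposes BaLoRA, a Bayesian low-rank adaptation method for fine-tuning large pre-trained models at reduced computational cost. It extends the low-rank matrices of standard LoRA with an input-adaptive Bayesian parameterization that adds almost no parameters or compute, yet provides well-calibrated uncertainty estimates and significantly improves prediction accuracy, narrowing the gap with full fine-tuning. Experiments show that on tasks such as materials prediction, BaLoRA's uncertainty estimates correlate more strongly with model error and the method is computationally efficient.

Abstract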

Low-Rank Adaptation (LoRA) has become the standard for fine-tuning large pre-trained models at reduced computational cost. However, its low-rank point-estimate updates limit expressiveness, leave a persistent gap relative to full fine-tuning accuracy, and provide no built-in uncertainty quantification, limiting its applicability in settings where reliability matters as much as accuracy. We introduce BaLoRA, a Bayesian extension of LoRA with a novel input-adaptive Bayesian parameterization of LoRA matrices that adds minimal parameters and compute. Surprisingly, not only does the Bayesian extension yield well-calibrated uncertainty estimates, but the adaptive noise injection underlying our approach also significantly improves prediction accuracy, narrowing the gap with full fine-tuning across both natural language reasoning and vision tasks. When applied to band gap prediction in metal-organic frameworks, BaLoRA produces zero-shot test-time uncertainty estimates that correlate more strongly with model error than a trained ensemble of LoRA models, and improve monotonically with compute without sacrificing accuracy.

2605.08109 2026-05-12 cs.LG cond-mat.mtrl-sci physics.flu-dyn

Geometry-free prediction of inertial lift forces in microfluidic devices using deep learning

Jesse Ward-Bond, Ali Mashadian, Timothy C. Y. Chan, Edmond W. K. Young

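AI summary This paper proposes a geometry-free deep learning method for predicting inertial lift forces in microfluidic devices without explicit geometric parameters as input. A neural network trained on a new parameter set performs well on the channel geometries in the training set and also generalizes effectively to unseen geometries. The resulting model can be used directly in particle tracing simulation software to accurately predict particle migration across a variety of channel designs, providing an efficient tool for the design and optimization of microfluidic systems.

Abstract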

Inertial microfluidic devices (IMDs) offer low-cost, high-throughput alternative techniques for many traditional particle- (or cell-) manipulation tasks, but simulating them requires being able to predict particle migration, and thus particle lift forces, under a variety of possible channel geometries. Recent work has demonstrated that machine learning models can be used to drastically speed up these numerical simulations, but doing so required training individual models for every unique channel cross-section type (e.g., rectangular, triangular) -- shifting the burden from the simulation step to the training step. In this paper, we develop a novel approach for predicting particle lift forces that contains no explicit geometric parameters. We train a neural network model using a new parameter set and show that while it performs comparably to existing models on channel geometries in the training set, it is able to generalize to unseen channel geometries far more effectively. We show that the lift force model developed herein can be easily transferred to particle tracing simulation software, where it is capable of predicting particle migration patterns consistent with the literature across a variety of channel designs.

2605.08104 2026-05-12 cs.LG

Distributional Reinforcement Learning via the Cramér Distance

Vanya Aziz, Ivo Nowak, E. M. T. Hendrix

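AI summary This paper studies the Soft Actor-Critic (SAC) algorithm within the distributional reinforcement learning framework and proposes a Cramér-distance-based distributional soft actor-critic algorithm (C-DSAC). The method learns the distribution of state-action values by minimizing the squared Cramér distance, and experiments show that it outperforms standard SAC and other distributional methods on several robotic benchmark tasks, especially in complex environments. The analysis attributes part of this efficiency to a confidence-driven Q-value update mechanism that suppresses overestimation in value estimates.

Abstract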

This paper explores the application of the Soft Actor-Critic (SAC) algorithm within a Distributional Reinforcement Learning setting and introduces an implementation of this algorithm named Cramér-based Distributional Soft Actor-Critic (C-DSAC). The novel approach employs distributional reinforcement learning to represent state-action values, and minimizes the squared Cramér distance for learning the distribution. Empirical results across various robotic benchmarks indicate that our algorithm surpasses the performance of baseline SAC and contemporary distributional methods, with the performance advantage becoming increasingly pronounced in high-complexity environments. To explain the efficiency of the new approach, we conduct an analysis showing that its superior performance is partly due to confidence-driven Q-value updates: high-variance target distributions (low confidence in target) lead to more conservative model updates, thereby attenuating the impact of overestimated values. This work deepens the understanding of distributional reinforcement learning, offering insights into the algorithmic mechanisms governing convergence and value estimation.
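For reference, the squared Cramér distance between two distributions is the integral of the squared difference of their cumulative distribution functions. The sketch below computes it for two categorical distributions on a shared, sorted support. It illustrates the distance only; how C-DSAC parameterizes the value distribution or forms its loss is not stated in the abstract, and the function name is hypothetical.

```python
from itertools import accumulate


def squared_cramer_distance(support, p, q):
    """Integral of (F_p - F_q)^2 for two distributions on a shared sorted support.

    `support` holds the atom locations in increasing order; `p` and `q` hold
    the probability mass on each atom. Both CDFs are step functions, so the
    integral reduces to a sum over the gaps between consecutive atoms.
    """
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    total = 0.0
    for i in range(len(support) - 1):
        width = support[i + 1] - support[i]
        total += (cdf_p[i] - cdf_q[i]) ** 2 * width
    return total
```

For example, two point masses at 0 and 2 are at squared distance 2, since their CDFs differ by exactly 1 across the whole interval between them.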

2605.08102 2026-05-12 cs.LG stat.ML

Path-Based Gradient Boosting for Graph-Level Prediction

Claudio Meggio, Johan Pensar, Riccardo De Bin

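AI summary This paper proposes PathBoost, a gradient tree boosting method for graph-level classification and regression that learns discriminative path features directly from graph structure. It extends earlier work aimed at chemistry applications in three key ways: adaptation to binary classification, incorporation of multiple node and edge attributes, and automatic anchor node selection. Experiments show that PathBoost performs well on several benchmark datasets, particularly on graphs with many nodes, demonstrating that path-based boosting is competitive on complex graph tasks.

Comments 20 pages, 1 figure

Abstract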

We propose PathBoost, a gradient tree boosting method for graph-level classification and regression that learns discriminative path-based features directly from the input graph structure. Building on previous work that was tailored to a specific chemistry application, PathBoost introduces three key extensions: (i) adaptation to binary classification through gradient boosting with a logistic loss, (ii) incorporation of multiple node and edge attributes into the path feature space via a prefix-based decomposition, and (iii) automatic anchor node selection based on categorical attribute diversity, eliminating the need for the user to specify the starting point of the considered path features. We compared PathBoost to graph neural networks and graph kernel approaches on several benchmark datasets, obtaining better results on half of them and comparable results on the rest. PathBoost performs better on graphs with larger average node counts. Overall, the results demonstrate that path-based boosting methods can be competitive with more complex black-box approaches.
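To make "path-based features from an anchor node" concrete, the sketch below counts the label sequences of simple paths that start at a chosen anchor. It is an illustrative assumption about what such features could look like, not PathBoost's prefix-based decomposition or its anchor-selection rule; `path_features`, `max_edges`, and the toy graph are hypothetical.

```python
from collections import Counter


def path_features(adjacency, labels, anchor, max_edges):
    """Count label sequences of simple paths that start at `anchor`.

    `adjacency` maps each node to a list of neighbours, `labels` maps each
    node to a categorical label, and `max_edges` bounds the path length.
    """
    counts = Counter()

    def extend(path):
        counts[tuple(labels[n] for n in path)] += 1
        if len(path) - 1 == max_edges:
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:  # keep paths simple (no repeated nodes)
                extend(path + [nxt])

    extend([anchor])
    return counts


# Toy molecule-like graph: node 0 is the anchor, bonded to node 1,
# which is bonded to nodes 2 and 3.
TOY_ADJACENCY = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
TOY_LABELS = {0: "N", 1: "C", 2: "O", 3: "O"}
```

The resulting counts form a fixed-meaning feature vector per graph, which is the kind of input a tree-boosting learner can split on.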

2605.08098 2026-05-12 cs.LG

Reinforcement learning for inverse structural design and rapid laser cutting of kirigami prototypes

Milad Yazdani, Shahriar Shalileh, Dena Shahriari

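AI summary This study proposes RL-Kirigami, a reinforcement-learning-based inverse structural design framework for rapidly generating deployable kirigami structures that can be laser-cut. The method combines optimal-transport conditional flow matching with reinforcement learning to handle nonlinear deployment, geometric compatibility, and the feasibility of cut layouts. Experiments show that it is significantly more accurate and efficient than traditional solvers and can output design drawings directly for rapid fabrication.

Abstract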

Kirigami is an increasingly useful fabrication method to produce shape-programmable metamaterial structures. However, inverse design remains difficult because deployment is nonlinear, and feasible cut layouts must satisfy discrete compatibility rules, avoid overlap, and map one target shape to valid designs. We present RL-Kirigami, an inverse design framework that combines optimal-transport conditional flow matching (OT-CFM) with reinforcement learning to generate compatible ratio fields for compact reconfigurable parallelogram quad kirigami. A marching decoder enforces global geometric compatibility, and Group Relative Policy Optimization (GRPO) aligns the generator with nondifferentiable rewards for silhouette matching, feasibility, and ratio-field regularity. Across procedurally generated target shape instances, a single sample from the pretrained OT-CFM prior reached $94.2\%$ sIoU and outperformed solver baselines while reducing forward simulator evaluations from hundreds to 1. GRPO improved accuracy to $94.91\%$ sIoU and, with regularity included, reduced $\mathrm{TV}(\mathbf{x})$ from 0.95 to 0.81 while maintaining $94.83\%$ sIoU. Generated layouts were exported to DXF and laser-cut in $50\,\mu\mathrm{m}$ polymeric sheets to produce deployable prototypes in $8.0 \pm 1.0$ minutes per part. These results support a manufacturing-aware inverse design workflow for deployable kirigami metamaterials under hard geometric feasibility constraints.
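GRPO scores each sampled candidate relative to the other candidates drawn for the same target, which is why it can work with nondifferentiable rewards. The sketch below shows only that group-relative advantage step, assuming the mean and standard-deviation normalisation used in common descriptions of GRPO. The function name, the `eps` constant, and the choice of population standard deviation are illustrative assumptions, not details from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per candidate sampled for the same
    target shape; `eps` avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Candidates that beat their group's average get a positive advantage and are reinforced; a group whose candidates all score the same contributes no learning signal.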

2605.06241 2026-05-12 cs.CL

Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning

Ömer Faruk Akgül, Rajgopal Kannan, Willie Neiswanger, Viktor Prasanna

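AI summary This paper re-examines the role of reinforcement learning (RL) in improving the reasoning of large language models (LLMs), arguing that RL does not teach new strategies but redistributes probability mass over solutions the model already contains. The study finds that RL's practical effect is concentrated in sparse corrections at a few high-entropy decision points; these can be identified from the model's own entropy without RL training and reproduced by a simple method called ReasonMaxxer, which matches full RL performance at a much lower training cost.

Abstract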

Reinforcement learning has become the standard for improving reasoning in large language models, yet evidence increasingly suggests that RL does not teach new strategies; it redistributes probability mass over solutions the base model already contains. In this work, we ask: if RL merely steers the model toward paths it already knows, is the RL optimization loop itself necessary? Through token-level analysis across multiple model families and RL algorithms, we find that RL's beneficial footprint is a sparse, predictable correction concentrated at high-entropy decision points where the model is uncertain which branch to take. Only 1--3\% of token positions are affected, the promoted token always lies within the base model's top-5 alternatives, and targeted corrections at those few positions causally recover a large fraction of RL's accuracy gain, while random corrections fail. The base model's own entropy identifies these positions without any RL-trained model, and the entire correction is low-dimensional, representable in a tiny fraction of model parameters. These findings reframe reasoning improvement as sparse policy selection, not capability acquisition. We translate this insight into ReasonMaxxer, a minimal RL-free method that applies contrastive loss only at entropy-gated decision points, using a few hundred base-model rollouts and no online generation. Across three model families, six scales, and six math reasoning benchmarks, ReasonMaxxer matches or exceeds full RL performance while requiring only tens of problems and minutes of single-GPU training, a reduction in training cost of roughly three orders of magnitude.
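The entropy gate described above can be sketched as follows: compute the Shannon entropy of the base model's next-token distribution at each position and keep the positions that exceed a threshold. This is a minimal sketch of the gating idea only; the threshold value, the function names, and the toy distributions are assumptions, and ReasonMaxxer's contrastive loss is not shown.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_gated_positions(distributions, threshold):
    """Indices of positions whose next-token entropy exceeds `threshold`."""
    return [i for i, probs in enumerate(distributions)
            if token_entropy(probs) > threshold]


# Toy next-token distributions over a 4-token vocabulary (invented values).
TOY_DISTRIBUTIONS = [
    [0.97, 0.01, 0.01, 0.01],  # confident position
    [0.25, 0.25, 0.25, 0.25],  # uncertain position (entropy ln 4)
    [1.00, 0.00, 0.00, 0.00],  # deterministic position
]
```

Only the uncertain position passes a threshold of 1.0 nat here, mirroring the claim that a small fraction of positions carries the decisions worth correcting.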

2605.05443 2026-05-12 cs.CL cs.AI

SLAM: Structural Linguistic Activation Marking for Language Models

Fabrice Harel-Canada, Amit Sahai

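AI summary This paper proposes SLAM, a new white-box watermarking scheme for language models that embeds the watermark in the model's structural geometry rather than altering token frequency distributions, achieving high-accuracy detection without significantly affecting text quality. The method uses sparse autoencoders to identify residual-stream directions that encode linguistic structure and causally steers those directions at generation time, leaving lexical sampling and semantics unconstrained. Experiments show that SLAM reaches 100% detection accuracy on Gemma-2 2B and 9B with a quality loss of only 1-2 reward points, far lower than existing methods, while preserving naturalness and diversity.

Comments Under review

Abstract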

LLM watermarks must be detectable without compromising text quality, yet most existing schemes bias the next-token distribution and pay for detection with measurable quality loss. We present SLAM (Structural Linguistic Activation Marking), a novel white-box watermarking scheme that sidesteps this cost by writing the mark into structural geometry rather than token frequencies: sparse autoencoders identify residual-stream directions encoding linguistic structure (e.g., voice, tense, clause order), and we causally steer those directions at generation time, leaving lexical sampling and semantics unconstrained. On Gemma-2 2B and 9B, SLAM achieves 100% detection accuracy with a quality cost of only 1-2 reward points - compared to 7.5-11.5 for KGW, EWD, and Unigram - with naturalness and diversity preserved at near-unwatermarked levels across both models. The trade-off is a complementary robustness profile: SLAM resists word-level edits but is vulnerable to paraphrase that restructures syntax (at a quality cost), the converse of token-distribution methods.

2604.24934 2026-05-12 cs.RO cs.SY eess.SY

TEACar: An Open-Source Autonomous Driving Platform

Zhongzheng Zhang, Maxwell Ruyle, Andrew Kappes, Tyler Ruble, William Shaoul, Dana Moreno, Jack Penn, Ivan Ruchkin

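AI summary This paper introduces TEACar, an open-source autonomous driving platform intended to provide a hardware-in-the-loop experimental validation environment for Intelligent Transportation Systems (ITS). The platform uses a modular mechanical architecture and a ROS 2-based software system, with a four-layer structure that physically decouples the sensing, computation, actuation, and power subsystems, improving structural rigidity and reconfigurability. Experiments show that TEACar performs well in mechanical stability, computational capability, and system robustness, offering a scalable, modular, and low-cost testbed for ITS research, education, and development.

Abstract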

Intelligent Transportation Systems (ITS) increasingly rely on vision-based perception and learning-based control, necessitating experimental platforms that support realistic hardware-in-the-loop validation. Small-scale platforms for autonomous racing offer a practical path to hardware validation, but often suffer from limited modularity, high integration complexity, or restricted extensibility. This paper presents TEACAR, a 1/14- to 1/16-scale autonomous driving platform designed with modular mechanical architecture, hardware abstraction, and ROS 2-based software. The system adopts a four-layer deck structure that physically decouples sensing, computation, actuation, and power subsystems, improving structural rigidity while simplifying reconfiguration. We constructed and comprehensively evaluated the prototype of TEACAR. Its mechanical stability, structural characteristics, and software performance were quantified based on three CNN-based steering controllers. Inference latency, power consumption, and system operating time were measured to evaluate computational capability and robustness. Our experiments demonstrated that TEACAR offers a scalable, modular, and cost-effective testbed for ITS research, education, and development. Our project repository is available on GitHub.

2604.18486 2026-05-12 cs.CV cs.CL cs.RO

Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

Jinghui Lu, Jiayi Guan, Zhijian Huang, Jinlong Li, Guang Li, Lingdong Kong, Yingyan Li, Han Wang, Shaoqing Xu, Yuechen Luo, Fang Li, Chenxu Dang, Junli Wang, Tao Xu, Jing Wu, Jianhua Wu, Xiaoshuai Hao, Wen Zhang, Tianyi Jiang, Lingfeng Zhang, Lei Zhou, Yingbo Tang, Jie Wang, Yinfeng Gao, Xizhou Bu, Haochen Tian, Yihang Qiu, Feiyang Jia, Lin Liu, Yigu Ge, Hanbing Li, Yuannan Shen, Jianwei Cui, Hongwei Xie, Bing Wang, Haiyang Sun, Jingwei Zhao, Jiahui Huang, Pei Liu, Zeyu Zhu, Yuncheng Jiang, Zibin Guo, Chuhong Gong, Hanchao Leng, Kun Ma, Naiyan Wang, Guang Chen, Kuiyuan Yang, Hangjun Ye, Long Chen

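AI summary This study proposes Xiaomi OneVL, a unified vision-language-action (VLA) and world model framework that addresses the high inference latency of Chain-of-Thought (CoT) based trajectory prediction in autonomous driving. Its core method introduces a visual world model decoder and a language decoder that guide the latent space to learn the causal dynamics of road geometry, agent motion, and environmental change, enabling single-step parallel inference while preserving reasoning accuracy. Experiments show that OneVL is the first to surpass explicit CoT methods on multiple benchmarks, improving accuracy while maintaining prediction speed and supporting the advantage of latent CoT under world model supervision.

Comments Technical Report; 49 pages, 22 figures, 10 tables; Project Page at https://xiaomi-embodied-intelligence.github.io/OneVL GitHub at https://github.com/xiaomi-research/onevl

Abstract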

Chain-of-Thought (CoT) reasoning has become a powerful driver of trajectory prediction in VLA-based autonomous driving, yet its autoregressive nature imposes a latency cost that is prohibitive for real-time deployment. Latent CoT methods attempt to close this gap by compressing reasoning into continuous hidden states, but consistently fall short of their explicit counterparts. We suggest that this is due to purely linguistic latent representations compressing a symbolic abstraction of the world, rather than the causal dynamics that actually govern driving. Thus, we present OneVL (One-step latent reasoning and planning with Vision-Language explanations), a unified VLA and World Model framework that routes reasoning through compact latent tokens supervised by dual auxiliary decoders. Alongside a language decoder that reconstructs text CoT, we introduce a visual world model decoder that predicts future-frame tokens, forcing the latent space to internalize the causal dynamics of road geometry, agent motion, and environmental change. A three-stage training pipeline progressively aligns these latents with trajectory, language, and visual objectives, ensuring stable joint optimization. In inference, the auxiliary decoders are discarded, and all latent tokens are prefilled in a single parallel pass, matching the speed of answer-only prediction. Across four benchmarks, OneVL becomes the first latent CoT method to surpass explicit CoT, delivering superior accuracy at answer-only latency. These results show that with world model supervision, latent CoT produces more generalizable representations than verbose token-by-token reasoning. Code has been open-sourced to the community. Project Page: https://xiaomi-embodied-intelligence.github.io/OneVL

2604.15113 2026-05-12 cs.AI

HyperSpace: A Generalized Framework for Spatial Encoding in Hyperdimensional Representations

Shay Snyder, Andrew Capodieci, David Gorsich, Maryam Parsa

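AI summary This paper proposes HyperSpace, an open-source framework that modularizes spatial encoding in hyperdimensional representations into operations including encoding, binding, bundling, similarity, cleanup, and regression. Using the framework, the authors compare two representative VSA backends, HRR and FHRR, and find that although FHRR has lower theoretical complexity for individual operations, the two show comparable end-to-end performance in practice because similarity and cleanup dominate in spatial domains. HRR also has a memory advantage, using only half the memory of FHRR, which highlights the trade-offs in deploying VSA systems.

Abstract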

Vector Symbolic Architectures (VSAs) provide a well-defined algebraic framework for compositional representations in hyperdimensional spaces. We introduce HyperSpace, an open-source framework that decomposes VSA systems into modular operators for encoding, binding, bundling, similarity, cleanup, and regression. Using HyperSpace, we analyze and benchmark two representative VSA backends: Holographic Reduced Representations (HRR) and Fourier Holographic Reduced Representations (FHRR). Although FHRR provides lower theoretical complexity for individual operations, HyperSpace's modularity reveals that similarity and cleanup dominate runtime in spatial domains. As a result, HRR and FHRR exhibit comparable end-to-end performance. Differences in memory footprint introduce additional deployment trade-offs: HRR vectors require approximately half the memory of FHRR vectors. By enabling modular, system-level evaluation, HyperSpace reveals practical trade-offs in VSA pipelines that are not apparent from theoretical or operator-level comparisons alone.
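The difference in per-operation complexity comes from how the two backends bind vectors. The sketch below uses the textbook definitions (circular convolution for HRR, element-wise multiplication of unit phasors for FHRR) in plain Python. It does not use HyperSpace's API, and all function names are hypothetical.

```python
import cmath
import math
import random


def hrr_bind(a, b):
    """HRR binding: circular convolution of two real vectors (naive O(n^2))."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def fhrr_random(n, rng):
    """Random FHRR vector: unit-magnitude complex phasors."""
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(n)]


def fhrr_bind(a, b):
    """FHRR binding: element-wise complex multiplication (phases add), O(n)."""
    return [x * y for x, y in zip(a, b)]


def fhrr_unbind(c, a):
    """Invert binding by multiplying with the complex conjugate of `a`."""
    return [x * y.conjugate() for x, y in zip(c, a)]


def fhrr_similarity(a, b):
    """Real part of the normalised Hermitian inner product."""
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)
```

At equal dimension, each FHRR entry is a complex number (two floats) while each HRR entry is a single real, which is one plausible reading of the roughly twofold memory difference reported above.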

2604.15016 2026-05-12 cs.LG

DLink: Distilling Layer-wise and Dominant Knowledge from EEG Foundation Models

Jingyuan Wang, Zhihao Jia, Chenyu Liu, Xinliang Zhou, Haoran Luo, Ziyu Jia, Yong Li, Fang Li, Junfeng Yao, Yi Ding

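AI summary This paper proposes DLink, which aims to efficiently extract layer-wise and dominant knowledge from EEG foundation models (EFMs) to improve knowledge distillation. Through a spectrally guided distillation framework and input-conditioned layer routing, DLink transfers the task-discriminative information spread across the EFM's intermediate layers to a compact student model, and aligns magnitude and phase spectra to reduce compression-induced spectral distortion. Experiments show that DLink significantly improves compact models on multiple EEG benchmark datasets while greatly reducing parameters, computation, and inference latency.

Abstract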

EEG foundation models (EFMs) achieve strong cross-subject and cross-task generalization through large-scale pretraining and downstream fine-tuning. Through empirical analysis, we observe that (i) task-adapted EFMs provide strong decoding performance but incur substantial overhead when retained as inference backbones, making knowledge distillation a natural route for optimizing compact students; and (ii) direct distillation from a fixed teacher representation underutilizes EFM knowledge, as task-discriminative information is distributed across intermediate layers rather than concentrated in the final layer. These observations motivate DLink (Distilling Layer-wise and Dominant Knowledge), a spectrally guided distillation framework with input-conditioned layer routing for transferring EFM knowledge into compact students. DLink uses a lightweight router to aggregate teacher layers for each input, and aligns magnitude and phase spectra to mitigate compression-induced spectral distortion in learned representations. The routed teacher knowledge is internalized by a project-then-compress student; the teacher and router are used only during training. Experiments on four EEG benchmarks show that DLink improves matched compact students and remains competitive with lightweight baselines, narrowing the gap to fine-tuned EFMs while substantially reducing parameters, FLOPs, and CPU-only inference latency.

2604.02474 2026-05-12 cs.LG stat.ML

Time-Warping Recurrent Neural Networks for Transfer Learning

Jonathon Hirschi

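AI summary: This work studies how time-warping can be used for transfer learning in recurrent neural networks (RNNs), addressing physical systems whose rate of evolution changes with environmental conditions. The proposed method is based on rescaling time; it is proven that an LSTM can approximate a class of linear differential equation models to high accuracy and can be time-warped while preserving that accuracy. The method is validated on fuel moisture content prediction, where experiments show that adjusting only a small number of parameters yields accurate predictions for data on different time scales, with results comparable to existing transfer learning methods.

Details
Abstract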

Dynamical systems describe how a physical system evolves over time. Physical processes can evolve faster or slower in different environmental conditions. We define time-warping as rescaling time in a model of a physical system. This thesis proposes a new method of transfer learning for Recurrent Neural Networks (RNNs) based on time-warping. We prove that for a class of linear, first-order differential equations known as time lag models, an LSTM can approximate these systems with any desired accuracy, and the model can be time-warped while maintaining the approximation accuracy. The Time-Warping method of transfer learning is then evaluated in an applied problem on predicting fuel moisture content (FMC), an important concept in wildfire modeling. An RNN with LSTM recurrent layers is pretrained on fuels with a characteristic time scale of 10 hours, where there are large quantities of data available for training. The RNN is then modified with transfer learning to generate predictions for fuels with characteristic time scales of 1 hour, 100 hours, and 1000 hours. The Time-Warping method is evaluated against several known methods of transfer learning. The Time-Warping method produces predictions with an accuracy level comparable to the established methods, despite modifying only a small fraction of the parameters that the other methods modify.
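The idea of rescaling time can be illustrated on the differential equation itself. The sketch below assumes the common first-order time lag form dm/dt = (E - m) / T, with equilibrium E and characteristic time scale T, and shows that a 10-hour model run on a warped time step reproduces a 100-hour model. It illustrates the rescaling only, not the thesis's procedure for warping LSTM parameters; all names and numbers are hypothetical.

```python
import math

def lag_step(m, equilibrium, dt, time_scale):
    """Exact one-step solution of dm/dt = (equilibrium - m) / time_scale."""
    return equilibrium + (m - equilibrium) * math.exp(-dt / time_scale)

def simulate(m0, equilibria, dt, time_scale):
    """Run the lag model over a sequence of piecewise-constant equilibria."""
    m, trajectory = m0, []
    for e in equilibria:
        m = lag_step(m, e, dt, time_scale)
        trajectory.append(m)
    return trajectory

def simulate_warped(m0, equilibria, dt, base_scale, target_scale):
    """Reuse a model with time scale base_scale by rescaling its time step."""
    warp = base_scale / target_scale  # warp < 1 slows the base model down
    return simulate(m0, equilibria, dt * warp, base_scale)

# Hypothetical hourly equilibrium moisture values (fractions).
equilibria = [0.10, 0.12, 0.20, 0.18, 0.08, 0.05]

direct = simulate(0.30, equilibria, dt=1.0, time_scale=100.0)
warped = simulate_warped(0.30, equilibria, dt=1.0,
                         base_scale=10.0, target_scale=100.0)

for d, w in zip(direct, warped):
    print(f"{d:.6f}  {w:.6f}")
```

The two columns agree because only the ratio dt / T enters the solution, so changing the time scale is equivalent to changing the step size.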

2603.29261 2026-05-12 cs.LG cs.AI

Monodense Deep Neural Model for Determining Item Price Elasticity

Lakshya Garg, Sai Yaswanth, Deep Narayan Mishra, Karthik Kumaran, Anupriya Sharma, Mayank Uniyal

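AI summary: This paper proposes Monodense, a deep neural network model for estimating item price elasticity from large-scale transactional data without relying on a treatment-control setting. The model combines embedding, dense, and Monodense layers to capture consumer response to price changes more accurately. Experimental results show that, compared with traditional machine learning methods, the Monodense model achieves better elasticity estimation on multi-category retail data, helping businesses optimize pricing strategies and increase revenue.

Comments Accepted at AAIML 2026 (International Conference on Advances in Artificial Intelligence and Machine Learning). Copyright 2026 IEEE. 6 pages, 4 figures

Details
Journal ref
2026 International Conference on Advances in Artificial Intelligence and Machine Learning (AAIML)
Abstract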

Item price elasticity quantifies the responsiveness of consumer demand to changes in item prices, enabling businesses to create pricing strategies and optimize revenue management. Sectors such as store retail, e-commerce, and consumer goods rely on elasticity information derived from historical sales and pricing data. This elasticity provides an understanding of purchasing behavior across different items, consumer discount sensitivity, and demand-elastic departments. This information is particularly valuable in competitive markets and for resource-constrained businesses whose decision making aims to maximize profitability and market share. Price elasticity also uncovers historical shifts in consumer responsiveness over time. In this paper, we model item-level price elasticity using large-scale transactional datasets by proposing a novel elasticity estimation framework that can work in the absence of a treatment-control setting. We test this framework using the machine-learning-based algorithms listed below, including our newly proposed Monodense deep neural network: (1) Monodense-DL network, a hybrid neural network architecture combining embedding, dense, and Monodense layers; (2) DML, a double machine learning setting using regression models; (3) LGBM, the Light Gradient Boosting Model. We evaluate our model on multi-category retail data spanning millions of transactions using a back-testing framework. Experimental results demonstrate the superiority of our proposed neural network model within the framework compared to the other prevalent ML-based methods listed above.
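For readers unfamiliar with the quantity being estimated, the following sketch computes a constant price elasticity as the slope of a log-log regression of quantity on price. This is the classical textbook estimator, not the Monodense network or the paper's framework; the function name and the synthetic data are assumptions made for illustration.

```python
import math

def loglog_elasticity(prices, quantities):
    """Ordinary least-squares slope of ln(quantity) on ln(price)."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic demand curve q = 100 * p^(-1.5), i.e. a true elasticity of -1.5.
prices = [1.0, 1.5, 2.0, 2.5, 3.0]
quantities = [100.0 * p ** -1.5 for p in prices]

elasticity = loglog_elasticity(prices, quantities)
print(f"estimated elasticity: {elasticity:.3f}")
```

An elasticity below -1 means demand falls proportionally faster than price rises, so a price increase would reduce revenue for that item.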

2603.22421 2026-05-12 cs.CV

OsteoFlow: Lyapunov-Guided Flow Distillation for Predicting Bone Remodeling after Mandibular Reconstruction

Hamidreza Aftabi, Faye Yu, Brooke Switzer, Zachary Fishman, Eitan Prisman, Antony Hodgson, Cari Whyne, Sidney Fels, Michael Hardisty

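AI summary: This paper proposes OsteoFlow, a flow-based framework for predicting first-year bone remodeling after mandibular reconstruction from CT scans taken on post-operative day 5. Its core method is Lyapunov-guided trajectory distillation, which learns a continuous evolution trajectory from a registration-derived stationary velocity field, improving long-term prediction consistency while preserving anatomical accuracy. Experiments on 344 paired regions of interest show that the method significantly outperforms existing approaches, reducing mean absolute error in the surgical resection zone by about 20% and demonstrating the potential of trajectory distillation for long-term medical image prediction.

Details
Abstract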

Predicting long-term bone remodeling after mandibular reconstruction would be of great clinical benefit, yet standard generative models struggle to maintain trajectory-level consistency and anatomical fidelity over long horizons. We introduce OsteoFlow, a flow-based framework predicting Year-1 post-operative CT scans from Day-5 scans. Our core contribution is Lyapunov-guided trajectory distillation: unlike one-step distillation, our method distills a continuous trajectory over transport time from a registration-derived stationary velocity field teacher. Combined with a resection-aware image loss, this enforces geometric correspondence without sacrificing generative capacity. Evaluated on 344 paired regions of interest, OsteoFlow significantly outperforms state-of-the-art baselines, reducing mean absolute error in the surgical resection zone by ~20%. This highlights the promise of trajectory distillation for long-term prediction. Code is available on GitHub: OsteoFlow.

2603.14681 2026-05-12 cs.LG

Generalized Hierarchical Bayesian Segmentation with Irregular Designs, Multi-Sequence Hierarchies, and Grouped/Latent-Group Designs

Omid Shams Solari

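AI summary: This paper proposes BayesBreak, a modular Bayesian segmentation framework for segmenting data with irregular designs, multi-sequence hierarchies, and grouped or latent-group designs. The method separates local block scoring from global inference, using dynamic programming to combine the marginal likelihoods of candidate blocks and obtain posterior inference over the number of segments, boundaries, and latent signals. The work supports several prior designs and approximate inference for non-conjugate models, provides a stability guarantee for posterior odds, and applies to real data from fields including geology, genomics, and finance.

Details
Abstract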

Bayesian change-point and segmentation models provide uncertainty-aware piecewise-constant representations of ordered data, but exact inference is often limited to narrow likelihood classes, single sequences, or index-uniform designs. We present BayesBreak, a modular offline Bayesian segmentation framework that separates local block scoring from global inference: each candidate block supplies a marginal likelihood and any needed moment numerators, while a dynamic program combines these scores to compute posteriors over segment counts, boundaries, and latent signals. For weighted exponential-family likelihoods with conjugate priors, block evidences and posterior moments are available in closed form from cumulative sufficient statistics, enabling exact sum-product inference for $p(y\mid k)$, $p(k\mid y)$, boundary marginals, and Bayes regression curves. We distinguish these summaries from the joint MAP segmentation, recovered by a separate max-sum recursion. BayesBreak supports design-aware partition priors for irregular observations, exact pooling across replicates with shared boundaries, and latent-template mixtures with exact EM updates. For non-conjugate GLM blocks, the same DP layer can use deterministic local approximations such as Laplace, variational methods, EP, or quadrature. We prove a posterior-odds stability bound: uniform per-block log-evidence error $\varepsilon$ perturbs $k$-odds and boundary-odds by at most $(k+k')\varepsilon$ and $2k\varepsilon$. Validation includes synthetic recovery, calibration, and scaling experiments, plus four real-data illustrations: well-log geology, array-CGH copy number, equity-return volatility, and CpG-atlas methylation.

2603.10960 2026-05-12 cs.LG math.ST stat.TH

Ranking Reasoning LLMs under Test-Time Scaling

Mohsen Hariri, Michael Hinczewski, Jing Ma, Vipin Chaudhary

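AI summary: This paper studies the ranking of reasoning large language models (LLMs) under test-time scaling and introduces Scorio, an open-source library implementing a variety of statistical ranking methods, such as paired-comparison models and item response theory models. Experiments on several math benchmarks show that most methods produce rankings that agree closely with the Bayesian gold standard, and some methods remain highly consistent even with a single trial. The study offers reliable approaches to model ranking under different budgets.

Comments Code is available at https://github.com/mohsenhariri/scorio

Details
Journal ref
The 64th Annual Meeting of the Association for Computational Linguistics (ACL), 2026
Abstract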

Test-time scaling evaluates reasoning LLMs by sampling multiple outputs per prompt, but ranking models in this regime remains underexplored. We formalize dense benchmark ranking under test-time scaling and introduce Scorio, a library that implements statistical ranking methods such as paired-comparison models, item response theory (IRT) models, voting rules, and graph- and spectral-based methods. Across $20$ reasoning models on four Olympiad-style math benchmarks (AIME'24, AIME'25, HMMT'25, and BrUMO'25; up to $N=80$ trials), most full-trial rankings agree closely with the Bayesian gold standard $\mathrm{Bayes}_{\mathcal{U}}@80$ (mean Kendall's $\tau_b = 0.93$--$0.95$), and $19$--$34$ methods recover exactly the same ordering. In the single-trial regime, the best methods reach $\tau_b \approx 0.86$. Using greedy decoding as an empirical prior ($\mathrm{Bayes}_{\mathbf{R}_0}@N$) reduces variance at $N=1$ by $16$--$52\%$, but can bias rankings when greedy and stochastic sampling disagree. These results identify reliable ranking methods for both high- and low-budget test-time scaling. We release Scorio as an open-source library at https://github.com/mohsenhariri/scorio.
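The agreement figures above are reported as Kendall's $\tau_b$. As a reference point, here is a small pure-Python sketch of the standard tie-corrected statistic for comparing two score vectors over the same models. It is not taken from the Scorio library, and the scores below are made up for illustration.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt(pairs untied in x * pairs untied in y).

    Raises ZeroDivisionError if every pair is tied in x or in y.
    """
    concordant = discordant = untied_x = untied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx != 0:
                untied_x += 1
            if dy != 0:
                untied_y += 1
            product = dx * dy
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)

# Hypothetical accuracies of four models under a full budget and a single trial.
full_budget = [0.91, 0.84, 0.77, 0.60]
single_trial = [0.90, 0.70, 0.80, 0.55]

tau = kendall_tau_b(full_budget, single_trial)
print(f"tau_b = {tau:.3f}")  # one swapped pair out of six
```

With four models there are six pairs. One swapped pair gives five concordant and one discordant, so $\tau_b = (5 - 1) / 6$.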

2603.04333 2026-05-12 cs.LG cs.AI

What Does Flow Matching Bring To TD Learning?

Bhavya Agrawalla, Michal Nauman, Aviral Kumar

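AI summary: This paper examines the role of flow matching in temporal-difference (TD) reinforcement learning and explains its advantages over conventional value function estimation. It argues that flow matching improves the stability and efficiency of TD learning through value readout by integration and dense velocity supervision, which manifest as test-time recovery from errors and more plastic feature learning. Experiments show that flow-matching methods substantially outperform conventional ones in challenging settings such as high-UTD online learning, with higher performance and sample efficiency.

Comments Added code link, updated acknowledgements

Details
Abstract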

Recent work shows that flow matching can be effective for scalar Q-value function estimation in reinforcement learning (RL), but it remains unclear why or how this approach differs from standard critics. Contrary to conventional belief, we show that their success is not explained by distributional RL, as explicitly modeling return distributions can reduce performance. Instead, we argue that the use of integration for reading out values and dense velocity supervision at each step of this integration process for training improves TD learning via two mechanisms. First, it enables robust value prediction through test-time recovery, whereby iterative computation through integration dampens errors in early value estimates as more integration steps are performed. This recovery mechanism is absent in monolithic critics. Second, supervising the velocity field at multiple interpolant values induces more plastic feature learning within the network, allowing critics to represent non-stationary TD targets without discarding previously learned features or overfitting to individual TD targets encountered during training. We formalize these effects and validate them empirically, showing that flow-matching critics substantially outperform monolithic critics (2$\times$ in final performance and around 5$\times$ in sample efficiency) in settings where loss of plasticity poses a challenge, e.g., in high-UTD online RL problems, while remaining stable during learning.

2602.07209 2026-05-12 cs.RO

Continuum Robot Localization using Distributed Time-of-Flight Sensors

Spencer Teetaert, Giammarco Caroleo, Marco Pontin, Sven Lilge, Jessica Burgner-Kahrs, Timothy D. Barfoot, Perla Maiolino

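AI summary: This paper studies how to localize flexible continuum robots using distributed low-resolution time-of-flight (ToF) sensors. Because conventional high-resolution sensors are unsuitable for continuum robots, the authors propose a localization method that fuses distributed ToF measurements with a robot shape prior, achieving accurate localization even though each sensor frequently encounters degenerate scenarios. Experiments show an average localization error of 2.5 cm and 7.2° across a range of environments, indicating good robustness and practicality.

Comments Print version, to be published at Robotics: Science and Systems (RSS) 2026

Details
Abstract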

Localization and mapping of an environment are crucial tasks for any robot operating in unstructured environments. Time-of-flight (ToF) sensors (e.g., lidar) have proven useful in mobile robotics, where high-resolution sensors can be used for simultaneous localization and mapping. In soft and continuum robotics, however, these high-resolution sensors are too large for practical use. This, combined with the deformable nature of such robots, has resulted in continuum robot (CR) localization and mapping in unstructured environments being a largely untouched area. In this work, we present a localization technique for CRs that relies on small, low-resolution ToF sensors distributed along the length of the robot. By fusing measurement information with a robot shape prior, we show that accurate localization is possible despite each sensor experiencing frequent degenerate scenarios. We achieve an average localization error of 2.5 cm in position and 7.2° in rotation across all experimental conditions with a 53 cm long robot. We demonstrate that the results are repeated across multiple environments, in both simulation and real-world experiments, and study robustness in the estimation to deviations in the prior map.

2602.07126 2026-05-12 cs.LG

Finding Connections: Membership Inference Attacks for the Multi-Table Synthetic Data Setting

Joshua Ward, Chi-Hua Wang, Guang Cheng

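AI summary: This paper studies membership inference attacks in the multi-table synthetic data setting, proposing a new attack for the real-world case in which a user's information is spread across multiple related tables. The authors propose the Multi-Table Membership Inference Attack (MT-MIA), based on heterogeneous graph neural networks, which detects privacy leakage at the user-entity level more accurately and is better targeted than conventional single-table attacks. Experiments show that the method exposes privacy vulnerabilities in current state-of-the-art multi-table synthetic data generators.

Details
Abstract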

Synthetic tabular data has gained attention for enabling privacy-preserving data sharing. While substantial progress has been made in single-table synthetic generation where data are modeled at the row or item level, most real-world data exists in relational databases where a user's information spans items across multiple interconnected tables. Recent advances in synthetic relational data generation have emerged to address this complexity, yet the release of these data introduces unique privacy challenges, as information can be leaked not only from individual items but also through the relationships that comprise a complete user entity. To address this, we propose a novel Membership Inference Attack (MIA) setting to audit the empirical user-level privacy of synthetic relational data and show that single-table MIAs that audit at an item level underestimate user-level privacy leakage. We then propose Multi-Table Membership Inference Attack (MT-MIA), a novel adversarial attack under a No-Box threat model that targets learned representations of user entities via Heterogeneous Graph Neural Networks. By incorporating all connected items for a user, MT-MIA better targets user-level vulnerabilities induced by inter-tabular relationships than existing attacks. We evaluate MT-MIA on a range of real-world multi-table datasets and demonstrate that this vulnerability exists in state-of-the-art relational synthetic data generators, employing MT-MIA to additionally study where this leakage occurs.

2602.05902 2026-05-12 cs.LG cs.AI

CoreQ: Learning-Free Mismatch Correction and Successive Rounding for Quantization

Seohyeon Cha, Huancheng Chen, Dongjun Kim, Haoran Zhang, Kevin Chan, Gustavo de Veciana, Haris Vikalo

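AI summary: This paper proposes CoreQ, a learning-free post-training quantization method that addresses the inter-layer activation mismatch caused by quantization errors in earlier layers. CoreQ derives a closed-form coefficient from a geometric decomposition to adapt each layer's calibration target, reducing overfitting to limited calibration data without hyperparameter tuning. Combined with a greedy successive-rounding solver and a beam-search extension, the method significantly improves perplexity and downstream task performance for a range of large language models under different quantization settings.

Details
Abstract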

Post-training quantization (PTQ) enables efficient deployment of large language models by mapping pretrained weights to low-bit formats without retraining, typically using a small calibration set to minimize a layer-wise calibration objective. However, this sequential procedure induces a mismatch: errors from earlier quantized layers alter the inputs received by later layers, causing the activations to deviate from those of the full-precision model. Recent approaches introduce mismatch-aware calibration objectives to compensate for this effect, but leave open how much of the observed mismatch should shift each layer's calibration target. Fully applying this correction can overfit limited calibration data, while scaling the mismatch correction with a fixed coefficient ignores varying reliability of mismatch estimates across layers. To address these limitations, we propose CoreQ, a learning-free PTQ framework that applies a closed-form coefficient for mismatch correction derived from a geometric decomposition of the mismatch. The resulting coefficient adapts the correction across layers, reduces overfitting to finite calibration data, and requires no hyperparameter tuning. Given the corrected target, CoreQ minimizes the induced triangular least-squares objective with an efficient greedy successive-rounding solver and a bounded beam-search extension, K-CoreQ, that trades modest additional compute for improved performance. Across multiple LLM families, scales, bit-widths, and quantization settings, CoreQ improves perplexity and downstream accuracy over strong PTQ baselines.

2512.08875 2026-05-12 cs.LG cs.AI

When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation

Joshua Ward, Bochao Gu, Chi-Hua Wang, Guang Cheng

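AI summary: This paper studies privacy leakage in tabular synthetic data generated with large language models (LLMs), showing that current mainstream methods tend to memorize and reproduce digit sequences from the training data during generation, creating privacy risks. The authors propose LevAtt, a no-box membership inference attack that detects leakage of training data using only the synthetic data, and validate it across multiple models and datasets. To address the problem, they further propose two defenses, including a novel sampling strategy that mitigates leakage while preserving data quality.

Details
Abstract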

Large Language Models (LLMs) have recently demonstrated remarkable performance in generating high-quality tabular synthetic data. In practice, two primary approaches have emerged for adapting LLMs to tabular data generation: (i) fine-tuning smaller models directly on tabular datasets, and (ii) prompting larger models with examples provided in context. In this work, we show that popular implementations from both regimes exhibit a tendency to compromise privacy by reproducing memorized patterns of numeric digits from their training data. To systematically analyze this risk, we introduce a simple No-box Membership Inference Attack (MIA) called LevAtt that assumes adversarial access to only the generated synthetic data and targets the string sequences of numeric digits in synthetic observations. Using this approach, our attack exposes substantial privacy leakage across a wide range of models and datasets, and in some cases, is even a perfect membership classifier on state-of-the-art models. Our findings highlight a unique privacy vulnerability of LLM-based synthetic data generation and the need for effective defenses. To this end, we propose two methods, including a novel sampling strategy that strategically perturbs digits during generation. Our evaluation demonstrates that this approach can defeat these attacks with minimal loss of fidelity and utility of the synthetic data.
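The abstract does not spell out how LevAtt scores candidates. The sketch below is therefore only a generic distance-based no-box membership score over digit strings, assuming an edit-distance criterion (an assumption suggested solely by the attack's name). The helper names and records are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def digit_string(record):
    """Concatenate the digits of every field in a record, dropping signs and dots."""
    return "".join(ch for value in record for ch in str(value) if ch.isdigit())

def membership_score(candidate, synthetic_records):
    """Higher (closer to 0) means the candidate sits nearer to some synthetic row."""
    target = digit_string(candidate)
    return -min(levenshtein(target, digit_string(s)) for s in synthetic_records)

# Hypothetical synthetic table of (age, income) rows released by a generator.
synthetic = [(37, 52310.55), (41, 61200.0), (29, 48075.25)]

member_like = (37, 52310.55)      # digits reproduced verbatim in the release
non_member_like = (63, 19999.99)  # no close digit-level match

print(membership_score(member_like, synthetic))
print(membership_score(non_member_like, synthetic))
```

An attacker with access only to the synthetic table would threshold this score. Exact digit reproduction yields the maximum score of 0, which is the kind of memorization signal the abstract describes.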

2509.21677 2026-05-12 cs.LG cs.SE

Prophecy: Inferring Formal Properties from Neuron Activations

Divya Gopinath, Corina S. Pasareanu, Muhammad Usman

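AI summary: This paper presents Prophecy, a tool for automatically inferring formal properties of feed-forward neural networks. It rests on a key observation: the activation status of neurons in a network's inner layers captures its logical behavior. Prophecy extracts rules based on neuron activation status to derive conditions that guarantee output properties such as the predicted class. The approach has broad potential applications, including formal explanation, compositional verification, and run-time monitoring, and is especially relevant in the era of large vision-language models.

Details
Abstract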

We present Prophecy, a tool for automatically inferring formal properties of feed-forward neural networks. Prophecy is based on the observation that a significant part of the logic of feed-forward networks is captured in the activation status of the neurons at inner layers. Prophecy works by extracting rules based on neuron activations (values or on/off statuses) as preconditions that imply a certain desirable output property, e.g., the prediction being a certain class. These rules represent network properties captured in the hidden layers that imply the desired output behavior. We present the architecture of the tool, highlight its features, and demonstrate its usage on different types of models and output properties. We present an overview of its applications, such as inferring and proving formal explanations of neural networks, compositional verification, run-time monitoring, repair, and others. We also show novel results highlighting its potential in the era of large vision-language models.

2509.21484 2026-05-12 cs.LG stat.ML

High-probability zeroth-order online convex optimisation beyond Euclidean geometry

David Janz, El-Mahdi El-Mhamdi, Arya Akhavan

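AI summary: This paper studies zeroth-order online convex optimisation beyond Euclidean geometry, considering $\ell_q$-Lipschitz losses and $\ell_p$-regularised FTRL, with randomised two-point finite-difference gradient estimators built from cone-measure sampling on $\ell_r$-spheres. The authors give unified high-probability regret bounds for all $p,q,r \in [1,\infty]$, and obtain time-uniform control of the quadratic variation by bounding all moments of the gradient estimator in the dual FTRL norm. The algorithm is anytime and data-driven; in previously studied cases its rates strengthen the known guarantees, and the results expose a gap for $q > 2$ that appears intrinsic to the estimators.

Details
Abstract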

We study online convex optimisation with $\ell_q$-Lipschitz losses, $\ell_p$-regularised FTRL, and randomised two-point finite-difference gradient estimators based on cone-measure sampling from $\ell_r$-spheres. For random Lipschitz losses whose mean is convex, we prove unified high-probability regret bounds for all $p,q,r \in [1,\infty]$. The analysis is driven by all-moment bounds for the gradient estimator in the dual FTRL norm, yielding time-uniform control of the quadratic variation. The algorithm is anytime and data-driven; in the special cases previously studied, its rates recover the known in-expectation guarantees while strengthening them to time-uniform high probability. Together with constant-probability lower bounds, these results establish optimality for $q\in[1,2]$ under appropriate sampling geometry, and expose a gap for $q>2$ that appears intrinsic to the estimators themselves.
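As a point of reference for the estimator family discussed above, here is a minimal sketch of the familiar Euclidean special case ($r = 2$, where the cone measure coincides with the uniform surface measure on the sphere). The scaling $d/(2h)$ is the standard Euclidean form and is an assumption here; the paper's estimator for general $r$ may be normalised differently. All names are hypothetical.

```python
import math
import random

def sample_unit_sphere(d, rng):
    """Uniform direction on the Euclidean unit sphere via a normalised Gaussian."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def two_point_estimate(f, x, u, h):
    """Two-point finite-difference estimate: d * (f(x+hu) - f(x-hu)) / (2h) * u."""
    d = len(x)
    plus = f([xi + h * ui for xi, ui in zip(x, u)])
    minus = f([xi - h * ui for xi, ui in zip(x, u)])
    scale = d * (plus - minus) / (2.0 * h)
    return [scale * ui for ui in u]

# Linear test loss with known gradient c, so the estimator's mean can be checked.
c = [1.0, -2.0, 0.5]

def linear_loss(x):
    return sum(ci * xi for ci, xi in zip(c, x))

rng = random.Random(0)
x0 = [0.3, 0.1, -0.2]
n_samples = 20000

avg = [0.0, 0.0, 0.0]
for _ in range(n_samples):
    g = two_point_estimate(linear_loss, x0, sample_unit_sphere(3, rng), h=0.01)
    avg = [a + gi / n_samples for a, gi in zip(avg, g)]

print("true gradient:   ", c)
print("averaged estimate:", [round(a, 2) for a in avg])
```

Each estimate uses only two function evaluations and no gradient access. For a linear loss the estimator is unbiased because $\mathbb{E}[u u^\top] = I/d$ for a uniform direction $u$, so the average approaches `c` as the number of samples grows.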