arXivDaily: arXiv daily academic digest, updated Monday through Friday
2508.03736 2026-06-15 cs.CV cs.AI Version update

Fusion of Pervasive RF Data with Spatial Images via Vision Transformers for Enhanced Mapping in Smart Cities

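Rafayel Mkrtchyan, Armen Manukyan, Hrant Khachatrian, Theofanis P. Raptis

Affiliations: Yerevan State University; Consiglio Nazionale delle Ricerche (CNR, National Research Council of Italy)

AI summary: Proposes a DINOv2-based deep learning framework that fuses open-source maps with RF data, using a vision transformer to process both modalities jointly. It reaches a macro IoU of 65.3% on the synthetic dataset and 64.9% on the real-world dataset, clearly outperforming single-source methods.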

Comments Work supported by funding under the bilateral agreement between CNR (Italy) and HESC MESCS RA (Armenia) as part of the DeepRF project for the 2025-2026 biennium, and by the HESC MESCS RA grant No. 22rl-052 (DISTAL)

Journal ref Pervasive and Mobile Computing, Article 102261, 2026

Abstract

In this paper, we present a deep learning-based approach that integrates the DINOv2 architecture to improve building mapping by combining (possibly erroneous) maps from open-source platforms with pervasive radio frequency (RF) data collected from multiple wireless user equipment devices and base stations. Unlike prior methods, our approach leverages a vision transformer-based architecture to jointly process the RF and map modalities within a unified framework, effectively capturing spatial dependencies and structural priors for enhanced mapping accuracy. For evaluation, we employ a synthetic dataset co-produced by Huawei. To address the challenges associated with real-world data imperfections, we add controlled noise to its RF data to simulate real-world conditions. Additionally, we develop and train a model that uses only aggregated path loss information to tackle the mapping problem. We measure the results with three performance metrics: the Jaccard index (intersection over union, IoU), the Hausdorff distance, and the Chamfer distance. Our design achieves a macro IoU of 65.3%, significantly surpassing (i) the erroneous maps baseline, which yields 40.1%, (ii) an RF-only method from the literature, which yields 37.3%, and (iii) a non-AI fusion baseline that we designed, which yields 42.2%. The comparative evaluation highlights the limitations of relying solely on RF data or on spatial data, as well as the effectiveness of AI in fusing data to enhance smart city mapping accuracy. We further validate our method on real-world data from the Oslo region, complementing the synthetic evaluation with a real deployment setting, where our best fusion model reaches 64.9% macro IoU. We additionally outline a strategy for deploying the model over larger areas by tiling the region with overlapping windows.
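The headline metric above is the Jaccard index (IoU) between a predicted and a reference building mask. The following is a minimal sketch of how such a score could be computed on binary grids. The reading of "macro" as an unweighted mean of per-map scores, the empty-mask convention, and the function names are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence

# Binary building mask: 1 = building pixel, 0 = background.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, truth: Mask) -> float:
    """Jaccard index between two binary masks of equal shape."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def macro_iou(preds: List[Mask], truths: List[Mask]) -> float:
    """Unweighted mean of per-map IoU (assumed meaning of 'macro')."""
    scores = [iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)
```

Averaging per map rather than pooling all pixels means a small map with a poor prediction weighs as much as a large one, which is one reason the two conventions can give different numbers.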

2504.20908 2026-06-15 cs.LG Version update

MOSIC: Model-Agnostic Optimal Subgroup Identification with Multi-Constraint for Improved Reliability

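Wenxin Chen, Weishen Pan, Kyra Gan, Fei Wang

Affiliations: Cornell University; Weill Cornell Medicine; Operations Research and Information Engineering

AI summary: Proposes a unified optimization framework that builds constraints directly into subgroup identification and solves the problem with a gradient descent-ascent algorithm, yielding model-agnostic optimal subgroups that satisfy multiple constraints.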

Abstract

Current subgroup identification methods typically follow a two-step approach: first estimate conditional average treatment effects (CATE) and then apply thresholding or rule-based procedures to define subgroups. While intuitive, this decoupled approach fails to incorporate key constraints essential for real-world clinical decision-making, such as subgroup size and propensity overlap. These constraints operate on fundamentally different axes than CATE estimation and are not naturally accommodated within existing frameworks, thereby limiting the practical applicability of these methods. We propose a unified optimization framework that directly solves the primal constrained optimization problem to identify optimal subgroups. Our key innovation is a reformulation of the constrained primal problem as an unconstrained differentiable min-max objective, solved via a gradient descent-ascent algorithm. We theoretically establish that our solution converges to a feasible and locally optimal solution. Unlike threshold-based CATE methods that apply constraints as post-hoc filters, our approach enforces them directly during optimization. The framework is model-agnostic, compatible with a wide range of CATE estimators, and extensible to additional constraints like cost limits or fairness criteria. Extensive experiments on synthetic and real-world datasets demonstrate its effectiveness in identifying high-benefit subgroups while maintaining better satisfaction of constraints.
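The reformulation described above turns a constrained problem into a min-max objective solved by gradient descent-ascent. The sketch below shows that mechanism on a deliberately tiny toy problem (minimize (x - 2)^2 subject to x <= 1). The toy objective, the step size, and the function name are assumptions chosen for illustration and are not the paper's subgroup objective.

```python
def gda_constrained_min(steps: int = 5000, lr: float = 0.05):
    """Minimize (x - 2)^2 subject to x <= 1 via gradient descent-ascent.

    Lagrangian: L(x, lam) = (x - 2)^2 + lam * (x - 1), with lam >= 0.
    The primal variable x descends on L while the multiplier lam ascends.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * (x - 2.0) + lam      # dL/dx
        grad_lam = x - 1.0                  # dL/dlam = constraint violation
        x -= lr * grad_x                    # descent step on the primal variable
        lam = max(0.0, lam + lr * grad_lam) # projected ascent step on the multiplier
    return x, lam
```

The multiplier grows only while the constraint is violated, so the constraint is enforced during optimization rather than applied as a filter afterwards. On this toy problem the iterates settle at the constrained optimum x = 1 with multiplier 2.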

2507.20068 2026-06-15 cs.LG stat.ML Version update

PERRY: Policy Evaluation with Confidence Intervals using Auxiliary Data

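Aishwarya Mandyam, Jason Meng, Ge Gao, Jiankai Sun, Mac Schwager, Barbara E. Engelhardt, Emma Brunskill

Affiliations: Stanford University; Gladstone Institutes

AI summary: Proposes two methods that use auxiliary data to build confidence intervals for off-policy evaluation, based on conformal prediction and doubly robust estimation, and validates them on several simulators and a real healthcare dataset.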

Abstract

Off-policy evaluation (OPE) methods estimate the value of a new reinforcement learning (RL) policy prior to deployment. Recent advances have shown that leveraging auxiliary datasets, such as those synthesized by generative models, can improve the accuracy of OPE methods. Unfortunately, such auxiliary datasets may also be biased, and existing methods for using data augmentation within OPE lack principled uncertainty quantification. In high-stakes domains like healthcare, reliable uncertainty estimates are important for ensuring safe and informed deployment of RL policies. In this work, we propose two methods to construct valid confidence intervals for OPE with data augmentation. The first provides a confidence interval over $V^{\pi}(s)$, the policy value conditioned on an initial state $s$. To do so, we introduce a new conformal prediction method suitable for Markov Decision Processes (MDPs) with continuous state spaces, extending prior work to higher-dimensional settings. Second, we consider the more common task of estimating the average policy performance over all initial states, $V^{\pi}$; we introduce a method that draws on ideas from doubly robust estimation and prediction-powered inference. Across simulators spanning inventory management, robotics, and healthcare, as well as a real healthcare dataset from MIMIC-IV, we find that our methods can effectively leverage auxiliary data and consistently produce confidence intervals that cover the ground truth policy values, unlike previously proposed methods. Our work enables a future in which OPE can provide rigorous uncertainty estimates for high-stakes domains.
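For readers unfamiliar with conformal prediction, the following sketch shows the generic split conformal recipe for a scalar regression target: calibrate on held-out residuals, then widen a point prediction by their quantile. It is the textbook procedure under an exchangeability assumption, not the MDP-specific method the abstract introduces, and all names here are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def split_conformal_interval(
    predict: Callable[[float], float],
    calib_x: List[float],
    calib_y: List[float],
    x_new: float,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Return an interval for the target at x_new with nominal coverage 1 - alpha."""
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(scores)
    # Finite-sample corrected quantile index (1-based rank).
    rank = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points the interval is unbounded.
    q = math.inf if rank > n else scores[rank - 1]
    center = predict(x_new)
    return center - q, center + q
```

In the OPE setting one could think of `predict` as a value estimate for an initial state and the calibration targets as observed returns, though that mapping is only an informal analogy here.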

2303.09209 2026-06-15 cs.AI Version update

Learning optimal policies from event logs through reinforcement learning: a comparison of deep and MDP-based approaches

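Stefano Branchi, Andrei Buliga, Chiara Di Francescomarino, Chiara Ghidini, Riccardo Graziosi, Francesca Meneghello, Massimiliano Ronzani

Affiliations: FBK - Fondazione Bruno Kessler; Unitn (University of Trento); Unibz (Free University of Bozen-Bolzano)

AI summary: Proposes two reinforcement learning methods (MDP-based and offline deep RL) that learn optimal behavioral policies from historical event logs to optimize a KPI. Evaluated in a data-driven BPS environment, both improve the KPI, and the MDP-based method is more computationally efficient.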

Comments 38 pages + appendix, 12 figures, new version published in IS journal

Journal ref Information Systems, Volume 141, 2026, 102763, ISSN 0306-4379

Abstract

Prescriptive Process Monitoring is an emerging area within Process Mining that focuses on recommending actions to optimize business outcomes. Most existing works prescribe pre-defined interventions, i.e., sets of actions applied to ongoing process executions to achieve a specific objective or Key Performance Indicator (KPI). In contrast, only a few approaches have explored learning and evaluating optimal behavioral policies, i.e., general strategies that determine the best sequence of actions to maximize a desired KPI. In this paper, we address the problem of learning optimal behavioral policies by proposing an AI-based approach that learns an optimal policy directly from historical process executions using Reinforcement Learning (RL) to recommend the best actions for optimizing a KPI. To this end, we employ two RL techniques. The first is a classical model-based approach that extends previous work by the authors through the construction of a Markov Decision Process (MDP) capturing process behavior. The second is a model-free technique based on offline Deep RL. Unlike state-of-the-art work, we aim to minimize the use of domain knowledge and learn optimal policies directly from historical event data. This allows us to learn when to apply interventions and discover effective ones directly from data. Moreover, we target complex scenarios involving external actors, where the process owner controls only part of the activities. We adopt a data-driven Business Process Simulation (BPS) environment to evaluate the learned policies. Results show that both methods improve the targeted KPI with similar effectiveness, while the model-based approach outperforms offline Deep RL in computational efficiency.
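The model-based route described above builds an MDP from historical executions and then solves it. As a rough sketch of that idea, the code below estimates transition probabilities and mean rewards from logged tuples and runs value iteration. How traces are abstracted into states, the tuple layout, and the function names are all assumptions made for illustration; the paper's actual MDP construction may differ.

```python
from collections import defaultdict


def estimate_mdp(transitions):
    """Build an empirical MDP from (state, action, reward, next_state) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    model = {}
    for sa, nexts in counts.items():
        total = sum(nexts.values())
        probs = {s_next: c / total for s_next, c in nexts.items()}
        model[sa] = (probs, reward_sum[sa] / total)  # (P(s'|s,a), mean reward)
    return model


def value_iteration(model, gamma=0.9, iters=200):
    """Return a greedy policy and state values for the empirical MDP."""
    values = defaultdict(float)  # states without logged actions have value 0
    q = {}
    for _ in range(iters):
        q = {
            sa: r + gamma * sum(p * values[s2] for s2, p in probs.items())
            for sa, (probs, r) in model.items()
        }
        new_values = defaultdict(float)
        for (s, _a), val in q.items():
            new_values[s] = max(new_values.get(s, float("-inf")), val)
        values = new_values
    policy = {}
    for (s, a), val in q.items():
        if s not in policy or val > q[(s, policy[s])]:
            policy[s] = a
    return policy, dict(values)
```

Because the model is a table of counts, solving it is cheap compared with training a deep network, which is consistent with the efficiency observation in the abstract.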

2506.17255 2026-06-15 cs.LG cs.AI Version update

UltraSketchLLM: Sub-1-Bit LLM Compression via Sketch and Hardware-Friendly Operators

Sunan Zou, Xueting Sun, Ziyun Zhang, Guojie Luo

Affiliations: National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University; School of Electronic Engineering and Computer Science, Peking University; Center for Energy-efficient Computing and Applications, Peking University

AI summary: Proposes UltraSketchLLM, which uses data sketches to compress LLM weights down to 0.5 bits per weight. With a hardware-friendly implementation, it keeps performance degradation tolerable and achieves a 14.9x speedup over a naive sketch solution.

Comments Accepted by the 63rd ACM/IEEE The Chips to Systems Conference (DAC 2026)

Abstract

Large language models (LLMs) now require ever more GPU memory, which calls for efficient and extreme weight compression methods. Existing compression methods are either theoretically limited to 1 bit per weight or suffer from severe performance degradation and inefficiency. To deploy LLMs in resource-constrained scenarios, we introduce UltraSketchLLM, which compresses LLMs with data sketches. It reduces the peak GPU memory footprint, with compression rates down to 0.5 bits per weight. Combined with a hardware-friendly implementation, UltraSketchLLM keeps performance degradation tolerable and latency overhead extremely low, with a 14.9x speedup over a naive sketch solution.
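To make "0.5 bits per weight" concrete, the arithmetic below converts a bit budget into a weight-storage footprint. It is a back-of-the-envelope sketch: it counts weight storage only, ignores any metadata a sketch structure needs, and the 7-billion-parameter model in the usage note is a hypothetical example, not a figure from the paper.

```python
def weight_memory_bytes(n_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given average bit width."""
    return n_params * bits_per_weight / 8.0


def compression_ratio(baseline_bits: float, compressed_bits: float) -> float:
    """How many times smaller the compressed weights are than the baseline."""
    return baseline_bits / compressed_bits
```

For a hypothetical model with 7 billion parameters, `weight_memory_bytes(7_000_000_000, 16)` gives 14 GB for FP16 weights, while `weight_memory_bytes(7_000_000_000, 0.5)` gives about 437.5 MB, a 32x reduction in weight storage.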

2505.12992 2026-06-15 cs.LG cs.AI cs.CL stat.ML Version update

Fractured Chain-of-Thought Reasoning

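Baohao Liao, Hanze Dong, Yuhui Xu, Doyen Sahoo, Christof Monz, Junnan Li, Caiming Xiong

Affiliations: University of Amsterdam; eBay; Microsoft; Google Research; Salesforce

AI summary: Proposes Fractured Sampling, which truncates reasoning chains and varies the number of trajectories and the number of solutions per trajectory to obtain better accuracy-cost trade-offs at inference time.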

Abstract

Inference-time scaling techniques have significantly bolstered the reasoning capabilities of large language models (LLMs) by harnessing additional computational effort at inference without retraining. Similarly, Chain-of-Thought (CoT) prompting and its extension, Long CoT, improve accuracy by generating rich intermediate reasoning trajectories, but these approaches incur substantial token costs that impede their deployment in latency-sensitive settings. In this work, we first show that truncated CoT, which stops reasoning before completion and directly generates the final answer, often matches the full CoT sampling while using dramatically fewer tokens. Building on this insight, we introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling along three orthogonal axes: (1) the number of reasoning trajectories, (2) the number of final solutions per trajectory, and (3) the depth at which reasoning traces are truncated. Through extensive experiments on five diverse reasoning benchmarks and several model scales, we demonstrate that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget. Our analysis reveals how to allocate computation across these dimensions to maximize performance, paving the way for more efficient and scalable LLM reasoning. Code is available at https://github.com/BaohaoLiao/frac-cot.
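The results above are reported as Pass@k against a token budget. A common way to estimate Pass@k from n sampled solutions, of which c are correct, is the unbiased combinatorial estimator used in code-generation evaluation. Whether the paper uses exactly this estimator is an assumption; the sketch is included only to make the metric concrete.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: total samples drawn, c: number of correct samples, k: evaluation budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    # 1 - P(all k drawn samples are incorrect), drawing without replacement.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Under the three axes named in the abstract, the sample pool for this estimate grows with both the number of trajectories and the number of final solutions per trajectory, while truncation depth changes how many tokens each sample costs.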

2502.00869 2026-06-15 cs.CV Version update

A Unified Theory of Sinusoidal Activation Families for Implicit Neural Representations

Alireza Morsali, MohammadJavad Vaez, Mohammadhossein Soltani, Amirhossein Kazerouni, Babak Taati, Morteza Mohammad-Noori

Affiliations: McGill University; University of Melbourne; ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems (MACSYS); University of Toronto; Vector Institute; University Health Network; University of Tehran

AI summary: Proposes the STAF framework, a Fourier-like activation with learnable amplitudes, frequencies, and phases; analyzes its expressivity, its effect on the NTK spectrum, and its initialization; and shows experimentally that it matches or outperforms existing methods on image, audio, shape, inverse-problem, and NeRF tasks.

Comments Published in TMLR

Abstract

Implicit Neural Representations (INRs) model continuous signals with compact neural networks and have become a standard tool in vision, graphics, and signal processing. A central challenge is accurately capturing fine detail without heavy hand-crafted encodings or brittle training heuristics. Across the literature, periodic activations have emerged as a compelling remedy: from SIREN, which uses a single sinusoid with a fixed global frequency, to more recent architectures employing multiple sinusoids and, in some cases, trainable frequencies and phases. We study this family of sinusoidal activations and develop a principled theoretical and practical framework for trainable sinusoidal activations in INRs. Concretely, we instantiate this framework with Sinusoidal Trainable Activation Functions (STAF), a Fourier-like activation whose amplitudes, frequencies, and phases are learned. Our analysis (i) establishes a Kronecker-equivalence construction that expresses trainable sinusoidal activations with standard sine networks and quantifies expressive growth, (ii) characterizes how the Neural Tangent Kernel (NTK) spectrum changes under trainable sinusoidal parameterization, and (iii) provides an initialization that yields standard normal post-activations without asymptotic central limit theorem (CLT) arguments. Empirically, on images, audio, shapes, inverse problems (super-resolution, denoising) and NeRF, STAF is competitive and often stronger on distortion-oriented reconstruction metrics such as PSNR/SSIM across the evaluated INR tasks, with favorable parameter efficiency under layer-wise sharing. While periodic activations can alleviate practical manifestations of spectral bias, our results indicate they do not eliminate it; instead, trainable sinusoids can improve the observed capacity-optimization trade-off in the evaluated settings.
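The abstract describes STAF as a Fourier-like activation whose amplitudes, frequencies, and phases are learned. One plausible reading, offered here as an assumption rather than the paper's exact parameterization, is a finite sum of sinusoids applied elementwise. In the sketch below the parameters are plain lists; in a real network they would be trainable tensors.

```python
import math
from typing import Sequence


def staf(
    x: float,
    amplitudes: Sequence[float],
    frequencies: Sequence[float],
    phases: Sequence[float],
) -> float:
    """Sum of sinusoids: sum_i amplitudes[i] * sin(frequencies[i] * x + phases[i])."""
    return sum(
        c * math.sin(w * x + p)
        for c, w, p in zip(amplitudes, frequencies, phases)
    )
```

With a single term of amplitude 1 and phase 0, this reduces to a plain sine with one fixed frequency, which is the SIREN-style activation the abstract contrasts against.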

2505.16988 2026-06-15 cs.CL cs.AI cs.MA Version update

MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems

Rui Ye, Keduan Huang, Qimin Wu, Yuzhu Cai, Tian Jin, Xianghe Pang, Xiangrui Liu, Jiaqi Su, Chen Qian, Bohan Tang, Kaiqu Liang, Jiaao Chen, Yue Hu, Zhenfei Yin, Rongye Shi, Bo An, Yang Gao, Wenjun Wu, Lei Bai, Siheng Chen

Affiliations: Shanghai Jiao Tong University; Shanghai AI Laboratory; University of Oxford; Princeton University; Meta; University of Michigan; The University of Sydney; Beihang University; Nanyang Technological University; Nanjing University

AI summary: Presents the MASLab codebase, which integrates over 20 methods and provides a unified environment with standardized evaluation, lowering the barrier to research. Experiments cover 10+ benchmarks and 8 models.

Comments 18 pages, 11 figures

Abstract

LLM-based multi-agent systems (MAS) have demonstrated significant potential in enhancing single LLMs to address complex and diverse tasks in practical applications. Despite considerable advancements, the field lacks a unified codebase that consolidates existing methods, resulting in redundant re-implementation efforts, unfair comparisons, and high entry barriers for researchers. To address these challenges, we introduce MASLab, a unified, comprehensive, and research-friendly codebase for LLM-based MAS. (1) MASLab integrates over 20 established methods across multiple domains, each rigorously validated by comparing step-by-step outputs with its official implementation. (2) MASLab provides a unified environment with various benchmarks for fair comparisons among methods, ensuring consistent inputs and standardized evaluation protocols. (3) MASLab implements methods within a shared streamlined structure, lowering the barriers for understanding and extension. Building on MASLab, we conduct extensive experiments covering 10+ benchmarks and 8 models, offering researchers a clear and comprehensive view of the current landscape of MAS methods. MASLab will continue to evolve, tracking the latest developments in the field, and invite contributions from the broader open-source community.

2505.16120 2026-06-15 cs.AI Version update

LLM-Powered AI Agent Systems and Their Applications in Industry

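Guannan Liang, Qianqian Tong

Affiliations: GitHub

AI summary: Surveys the evolution from traditional agents to LLM-powered agents, categorizes them into software-based, physical, and adaptive hybrid systems, and discusses applications and challenges in customer service, software development, manufacturing, education, finance, and healthcare.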

Comments This is the author's accepted version of the paper accepted to appear at IEEE AIIoT 2025. The final version will be available via IEEE Xplore. ©2025 IEEE. Personal use of this material is permitted

Abstract

The emergence of Large Language Models (LLMs) has reshaped agent systems. Unlike traditional rule-based agents with limited task scope, LLM-powered agents offer greater flexibility, cross-domain reasoning, and natural language interaction. Moreover, with the integration of multi-modal LLMs, current agent systems are highly capable of processing diverse data modalities, including text, images, audio, and structured tabular data, enabling richer and more adaptive real-world behavior. This paper comprehensively examines the evolution of agent systems from the pre-LLM era to current LLM-powered architectures. We categorize agent systems into software-based, physical, and adaptive hybrid systems, highlighting applications across customer service, software development, manufacturing automation, personalized education, financial trading, and healthcare. We further discuss the primary challenges posed by LLM-powered agents, including high inference latency, output uncertainty, lack of evaluation metrics, and security vulnerabilities, and propose potential solutions to mitigate these concerns.

2505.04671 2026-06-15 cs.CL cs.LG Version update

Reward-SQL: Boosting Text-to-SQL via Stepwise Execution-Aware Reasoning and Process-Supervised Rewards

Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Guoliang Li, Bin Wu, Wenchao Zhou

Affiliations: Renmin University of China; Tsinghua University; Alibaba Cloud Computing

AI summary: Addresses the lack of stepwise execution-aware reasoning and process-level rewards in RL for Text-to-SQL by proposing the CoCTE framework and the Reward-SQL method, which use intermediate view validation, structured CTEs, and a process reward model to improve accuracy and interpretability on complex queries.

Abstract

Recent advances in large language models (LLMs) trained with reinforcement learning (RL) have improved Text-to-SQL performance. However, RL-based approaches still struggle with complex queries due to two key limitations: insufficient stepwise execution-aware reasoning grounded in database feedback, and the lack of process-level rewards for guiding reasoning optimization. To address these issues, we propose CoCTE, a divide-and-conquer and execution-aware reasoning framework that progressively composes SQL queries through intermediate view validation and structured Common Table Expressions (CTEs), improving both accuracy and interpretability. To realize a CoCTE reasoning process, we develop Reward-SQL, a unified approach with three stages: (1) model initialization, which equips LLMs with structured CoCTE reasoning capabilities; (2) process reward design, which delivers fine-grained, execution-aware supervision; and (3) process-supervised RL and inference, which integrates process rewards into training and guides the inference stage by process rewards. This paper addresses the core challenges in Reward-SQL and makes the following contributions. We introduce a process reward model (PRM) that combines execution-aware trajectory scoring with entropy-based step weighting, providing dense and interpretable supervision across reasoning steps. We integrate PRM into both RL training and inference stages, stabilizing optimization and improving trajectory exploration with process-level signals. Experiments show that Reward-SQL significantly outperforms baselines with comparable model sizes, and exhibits strong cross-domain generalization.
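The CoCTE idea above composes a query step by step, checking each intermediate view before it becomes a Common Table Expression in the final statement. The sketch below illustrates that pattern with Python's built-in `sqlite3` module. The schema, data, and function names are hypothetical and only stand in for the kind of step-by-step composition the abstract describes.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob');
        INSERT INTO orders VALUES (1, 1, 30), (2, 1, 40), (3, 2, 50);
        """
    )
    return conn


def compose_with_cte(conn: sqlite3.Connection):
    # Step 1: write the sub-query and execute it on its own, so the
    # intermediate view can be checked against the database.
    totals_sql = (
        "SELECT customer_id, SUM(amount) AS total "
        "FROM orders GROUP BY customer_id"
    )
    intermediate = conn.execute(totals_sql).fetchall()

    # Step 2: reuse the checked sub-query as a named CTE in the final query.
    final_sql = (
        f"WITH totals AS ({totals_sql}) "
        "SELECT c.name, t.total FROM totals AS t "
        "JOIN customers AS c ON c.id = t.customer_id "
        "ORDER BY t.total DESC LIMIT 1"
    )
    return intermediate, conn.execute(final_sql).fetchone()
```

Executing each step separately is what makes database feedback available mid-reasoning: an empty or malformed intermediate result can be caught before the final query is assembled.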

2503.14331 2026-06-15 cs.RO cs.CV cs.SY eess.SY Version update

ADAPT: An Autonomous Forklift for Construction Site Operation

Johannes Huemer, Markus Murschitz, Matthias Schörghuber, Lukas Reisinger, Thomas Kadiofsky, Christoph Weidinger, Mario Niedermeyer, Benedikt Widy, Marcel Zeilinger, Csaba Beleznai, Tobias Glück, Andreas Kugi, Patrik Zips

Affiliations: Center for Vision, Automation and Control; AIT Austrian Institute of Technology GmbH; Automation and Control Institute; Technische Universität Wien

AI summary: Presents ADAPT, an autonomous forklift that combines AI-driven perception with classical methods to handle logistics on unstructured construction sites at near human-level performance, improving safety and efficiency.

Abstract

Efficient material logistics play a critical role in controlling costs and schedules in the construction industry. However, manual material handling remains prone to inefficiencies, delays, and safety risks. Autonomous forklifts offer a promising solution to streamline on-site logistics, reducing reliance on human operators and mitigating labor shortages. This paper presents the development and evaluation of ADAPT (Autonomous Dynamic All-terrain Pallet Transporter), a fully autonomous off-road forklift designed for construction environments. Unlike structured warehouse settings, construction sites pose significant challenges, including dynamic obstacles, unstructured terrain, and varying weather conditions. To address these challenges, our system integrates AI-driven perception techniques with traditional approaches for decision making, planning, and control, enabling reliable operation in complex environments. We validate the system through extensive real-world testing, comparing its continuous performance against an experienced human operator across various weather conditions. Our findings demonstrate that autonomous outdoor forklifts can operate near human-level performance, offering a viable path toward safer and more efficient construction logistics.

2503.19947 2026-06-15 cs.CV cs.AI Version update

Vanishing Depth: Training Generalized Depth Adapters with Sinusoidal Depth Preprocessing for Pretrained RGB Encoders

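Paul Koch, Jörg Krüger

Affiliations: Fraunhofer IPK; TU-Berlin

AI summary: Proposes a self-supervised training method that adds a depth adapter to pretrained RGB encoders. Combined with a sinusoidal depth encoding, it extracts generalized, robust depth features and improves baselines on downstream segmentation, pose estimation, and depth completion, reaching 56.05 mIoU on SUN-RGBD segmentation.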

Comments Accepted to IntelliSys 2026

Abstract

Generalized metric depth understanding is critical for precise vision-guided robotics, yet current state-of-the-art (SOTA) vision encoders do not support it. To address this, we propose a self-supervised training approach that extends pretrained RGB encoders with a depth adapter to incorporate and align metric depth into a combined latent space without interfering with the pretrained RGB feature extraction. In combination with our sinusoidal depth encoding, the depth adapter enables generalized and robust feature extraction that is invariant to depth density and distribution. Our depth adapters improve a wide set of generalized RGB baselines across a spectrum of relevant RGBD downstream tasks in segmentation, pose estimation, and depth completion, without finetuning. Most importantly, we achieve 56.05 mIoU on SUN-RGBD segmentation while outperforming SOTA depth-aware and multi-modal encoders in our experiments. When no depth is present, one can activate our depth adapter with an empty map, single-pixel depth cues, or monocular depth estimates to include depth-aware feature extraction in subsequent downstream tasks.

2410.00722 2026-06-15 cs.LG math.AG Version update

On the Geometry and Optimization of Polynomial Convolutional Networks

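Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn

Affiliations: KTH Royal Institute of Technology

AI summary: Studies convolutional neural networks with monomial activations, proves that their parameterization map is regular and an isomorphism almost everywhere, uses algebraic geometry to compute the dimension and degree of the neuromanifold, and quantifies the number of critical points arising when optimizing a regression loss.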

Comments Accepted at AISTATS 2025. New version: corrected Section 4.2

Abstract

We study convolutional neural networks with monomial activation functions. Specifically, we prove that their parameterization map is regular and is an isomorphism almost everywhere, up to rescaling the filters. By leveraging tools from algebraic geometry, we explore the geometric properties of the image of this map in function space, typically referred to as the neuromanifold. In particular, we compute the dimension and the degree of the neuromanifold, which measure the expressivity of the model, and describe its singularities. Moreover, for a generic large dataset, we derive an explicit formula that quantifies the number of critical points arising in the optimization of a regression loss.
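The phrase "up to rescaling the filters" refers to a symmetry that is easy to check numerically: with activation x^r, multiplying the first filter by c and dividing the second by c^r leaves the network function unchanged. The sketch below verifies this for a two-layer, single-channel 1D network. The no-bias, stride-1, valid-convolution setup and the function names are simplifying assumptions for illustration.

```python
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Valid 1D cross-correlation (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def monomial_cnn(x, w1, w2, r: int) -> List[float]:
    """Two-layer network: conv with w1, activation t -> t**r, conv with w2."""
    hidden = [h ** r for h in conv1d(x, w1)]
    return conv1d(hidden, w2)


def rescale_filters(w1, w2, r: int, c: float):
    """Apply the symmetry: w1 -> c * w1, w2 -> w2 / c**r."""
    return [c * v for v in w1], [v / (c ** r) for v in w2]
```

Because the output is a polynomial in the filter entries, distinct parameter settings related by this rescaling map to the same point of the neuromanifold, which is why the parameterization can be an isomorphism only up to that symmetry.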

2405.14154 2026-06-15 cs.RO Version update

Cross-Stage Sensorimotor Perception Scheduling and Sparse Map Encoding for Efficient Edge Embodied Navigation

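Yaotian Liu, Sri Sai Rakesh Nakkilla, Xiangyu Zhou, Yu Cao, Jeff Zhang

Affiliations: Arizona State University; University of Minnesota

AI summary: Targets the latency and memory bottlenecks of embodied navigation on edge devices with the SKIP scheduler (based on a safe-skipping criterion) and the SCOUT sparse encoder, achieving up to 1.7x speedup, 50.5% lower peak memory, and 7.1% higher SPL.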

Comments 9 pages, 6 figures

Abstract

Embodied agents must close a perception-to-action loop on embedded hardware under tight latency, memory, and energy budgets, making deployment a system-level co-design problem rather than a model-accuracy problem. We study this challenge for modular Object Goal Navigation (ObjectNav), where our profiling shows semantic mapping dominates per-step latency while goal prediction dominates peak memory. We formulate edge embodied navigation deployment as a budget-constrained design-space problem and introduce two orthogonal optimization knobs: SKIP, an adaptive sensorimotor scheduler that formalizes safe skipping as a bounded map-impact criterion and learns a lightweight predictor to estimate it from cheap sensor cues at each FORWARD step, exposing a principled quality-efficiency knob (depth-based updates are always retained); and SCOUT, a sparse-context encoder that couples submanifold sparse convolutions on active map regions with a lightweight dense context stream. On HM3D across server and embedded platforms, SKIP+SCOUT delivers up to 1.7x end-to-end speedup, 50.5% lower peak memory, and 7.1% higher SPL than the dense baseline at the selected operating point, outperforming naively smaller perception backbones. SKIP transfers to a second modular pipeline (PONI) with near-lossless performance and remains robust under depth-sensor noise. Together, SKIP+SCOUT expose a family of device-aware Pareto operating points for edge physical AI systems.
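SPL, the navigation-quality metric quoted above, is commonly defined as Success weighted by Path Length: each successful episode contributes the ratio of the shortest path length to the path actually taken, and failures contribute zero. The sketch below follows that common definition. Treating it as the paper's exact evaluation protocol is an assumption, and the tuple layout is hypothetical.

```python
from typing import Iterable, Tuple

# (success, shortest_path_length, agent_path_length), lengths assumed positive
Episode = Tuple[bool, float, float]


def spl(episodes: Iterable[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    total = 0.0
    count = 0
    for success, shortest, taken in episodes:
        count += 1
        if success:
            # Efficiency is capped at 1 when the agent path equals the shortest path.
            total += shortest / max(taken, shortest)
    return total / count
```

Because failed episodes still count in the denominator, a scheduler that skips too aggressively and causes failures lowers SPL even if the remaining successful paths are short.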

2412.03716 2026-06-15 cs.LG cs.CY Updated version

A Water Efficiency Dataset for African Data Centers

Noah Shumba, Opelo Tshekiso, Pengfei Li, Giulia Fanti, Shaolei Ren

Affiliations Carnegie Mellon University; Carnegie Mellon University Africa, Kigali, Rwanda; Rochester Institute of Technology; Rochester, New York, USA; Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; University of California, Riverside

AI summary Builds the first dataset that combines weather and electricity generation data to estimate data center water usage effectiveness in 41 African countries, estimates the water consumed by Llama-3-70B and GPT-4 inference, and finds that 9 of the 11 selected African countries use less water than the global average.

Comments Accepted by NeurIPS 2024 Workshop on Tackling Climate Change with Machine Learning

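Details
AI abstract (translated from Chinese)

AI computing and data centers consume large amounts of freshwater, both directly for cooling and indirectly for electricity generation. While most attention has gone to developed countries such as the United States, this paper presents a first-of-its-kind dataset that combines national-level weather and electricity generation data to estimate data center water usage effectiveness in 41 African countries spanning five climate regions. We also use the dataset to evaluate and estimate the water consumed by inference with two large language models (Llama-3-70B and GPT-4) in 11 selected African countries. Our estimates suggest that writing a 10-page report with Llama-3-70B could consume up to 0.66 liters of water, while GPT-4 could consume up to about 59 liters for the same task. For a medium-length email of 120-200 words, Llama-3-70B and GPT-4 could consume about 0.13 liters and 2.9 liters, respectively. All figures for generative model inference are based on public information available in 2024, when we first prepared the analysis. AI inference systems have improved substantially since then; for example, recent disclosures suggest that energy efficiency improved by more than 30x between May 2024 and May 2025. Our 2024 estimates should therefore be read as historical reference values rather than as representative of current performance. Interestingly, for the same AI model, 9 of the 11 selected African countries consume less water than the global average, mainly because their electricity generation is less water-intensive.

English abstract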

Artificial intelligence (AI) computing and data centers consume large amounts of freshwater, both directly for cooling and indirectly for electricity generation. While most attention has been paid to developed countries such as the U.S., this paper presents the first-of-its-kind dataset that combines nation-level weather and electricity generation data to estimate water usage effectiveness for data centers in 41 African countries across five different climate regions. We also use our dataset to evaluate and estimate the water consumption of inference on two large language models (i.e., Llama-3-70B and GPT-4) in 11 selected African countries. Our estimates suggest that writing a 10-page report using Llama-3-70B could consume as much as 0.66 liters of water, while the water consumption by GPT-4 for the same task may go up to about 59 liters. For writing a medium-length email of 120-200 words, Llama-3-70B and GPT-4 could consume about 0.13 liters and 2.9 liters of water, respectively. All the numbers for generative model inference tasks are based on public information available in 2024, when we initially prepared the analysis. Since then, AI inference systems have improved substantially. For example, recent disclosures suggest that energy efficiency improved by more than 30x between May 2024 and May 2025. Accordingly, our 2024 estimates should be interpreted as historical reference values rather than as representative of current performance. Interestingly, given the same AI model, 9 of the 11 selected African countries consume less water than the global average, mainly because of lower water intensities for electricity generation.
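
The abstract does not spell out how the water estimates are computed. One common accounting, assumed here, adds on-site cooling water to the off-site water used to generate the electricity. A minimal sketch of that arithmetic; the function name, parameter names, and every numeric input are hypothetical placeholders, not values from the dataset:

```python
def water_liters(it_energy_kwh, onsite_wue, pue, grid_water_intensity):
    """Estimate total water use in liters for a given amount of IT energy.

    Assumed accounting (not stated in the abstract):
      on-site  = IT energy * on-site WUE (L/kWh, cooling)
      off-site = IT energy * PUE * water intensity of electricity (L/kWh)
    """
    onsite = it_energy_kwh * onsite_wue
    offsite = it_energy_kwh * pue * grid_water_intensity
    return onsite + offsite


# Hypothetical inputs: 1 kWh of IT energy, 0.5 L/kWh on-site WUE,
# a PUE of 1.2, and 2.0 L/kWh for the electricity mix.
estimate = water_liters(1.0, 0.5, 1.2, 2.0)
print(f"{estimate:.2f} L")
```

Under this accounting, a lower water intensity of electricity generation shrinks the off-site term, which is consistent with the abstract's explanation for the countries that fall below the global average.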

2207.03116 2026-06-15 cs.LG math.GR Updated version

Equivariant Representation Learning via Class-Pose Decomposition

Giovanni Luca Marchetti, Gustaf Tegnér, Anastasiia Varava, Danica Kragic

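Affiliations School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology

AI summary Proposes decomposing the latent space into an invariant factor and the symmetry group, and learning equivariant representations supervised by relative symmetry information; the resulting representations are lossless, interpretable, and disentangled, and outperform other equivariant representation learning frameworks in experiments.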

Comments 12 pages

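Details
AI abstract (translated from Chinese)

We present a general method for learning representations that are equivariant to the symmetries of data. Our core idea is to decompose the latent space into an invariant factor and the symmetry group itself. Semantically, these components correspond to the intrinsic data class and the pose, respectively. The learner is trained with a loss that encourages equivariance, supervised by relative symmetry information. The approach is motivated by theoretical results from group theory and guarantees representations that are lossless, interpretable, and disentangled. We study it empirically through experiments on datasets with a variety of symmetries. The results show that our representations capture the geometry of the data and outperform other equivariant representation learning frameworks.

English abstract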

We introduce a general method for learning representations that are equivariant to symmetries of data. Our central idea is to decompose the latent space into an invariant factor and the symmetry group itself. The components semantically correspond to intrinsic data classes and poses respectively. The learner is trained on a loss encouraging equivariance based on supervision from relative symmetry information. The approach is motivated by theoretical results from group theory and guarantees representations that are lossless, interpretable and disentangled. We provide an empirical investigation via experiments involving datasets with a variety of symmetries. Results show that our representations capture the geometry of data and outperform other equivariant representation learning frameworks.
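
To make the class-pose idea concrete, the toy below uses planar rotations acting on 2D points. It is only an illustration under stated assumptions: the encoder is hand-written rather than learned, and the names `encode`, `act_on_latent`, and `equivariance_gap` are hypothetical, not the authors' code. The radius plays the role of the invariant class factor and the angle plays the role of the pose.

```python
import math


def rotate(point, theta):
    """Apply a planar rotation (an element of SO(2)) to a 2D point."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


def encode(point):
    """Hand-written encoder: (invariant class, pose) = (radius, angle)."""
    x, y = point
    return math.hypot(x, y), math.atan2(y, x)


def act_on_latent(latent, theta):
    """The group acts only on the pose factor; the class factor is untouched."""
    cls, pose = latent
    return cls, pose + theta


def equivariance_gap(point, theta):
    """Squared mismatch between encode(g.x) and g.encode(x)."""
    cls_a, pose_a = encode(rotate(point, theta))
    cls_b, pose_b = act_on_latent(encode(point), theta)
    # Compare the two angles modulo 2*pi.
    diff = math.atan2(math.sin(pose_a - pose_b), math.cos(pose_a - pose_b))
    return (cls_a - cls_b) ** 2 + diff ** 2


# The gap is zero up to floating-point error for this encoder.
print(equivariance_gap((1.0, 2.0), 0.7))
```

A learned encoder would be trained to drive a gap of this kind toward zero, using only the relative transformation `theta` between pairs of inputs as supervision.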

2606.05264 2026-06-15 cs.LG

REGEN: Reference-Guided Synthetic Multivariate Time Series Generation for Forecasting

Moulik Gupta, Dhruv Kumar, Murari Mandal, Saurabh Deshpande

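Affiliations Birla AI Labs, Office of Ananya Birla; Birla Institute of Technology and Science, Pilani; Kalinga Institute of Industrial Technology, Bhubaneswar

AI summary Proposes ReGeN, a reference-guided generation pipeline that decomposes observed sequences into three interpretable components (a periodic backbone, stochastic residuals, and cross-variable dependencies) for controllable synthesis; in low-data settings the generated data can substitute for real data and improve forecasting performance.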

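Details
AI abstract (translated from Chinese)

Training robust multivariate time series forecasting models requires large, diverse corpora, yet many real-world domains offer only a handful of observed sequences. Existing generators do not resolve this mismatch: prior-based methods (e.g., CauKer, TimePFN) produce domain-agnostic samples, while data-driven methods (e.g., TimeGAN) treat references as black-box supervision and give up explicit control over periodic structure, local variability, and cross-variable dynamics. We propose ReGeN, a reference-guided generation pipeline that treats observed sequences as structural scaffolds for controllable synthesis rather than as examples to imitate. ReGeN decomposes each reference into three interpretable components: a phase-aligned periodic backbone that captures the dominant domain morphology; per-variable stochastic residuals modeled with a deep-kernel Gaussian process; and lag-aware cross-variable dependencies injected through a structural causal model with fitted coupling coefficients. Sampling these components at a controllable temperature broadens distributional coverage while preserving domain-grounded structure. We show that ReGeN-generated data consistently substitutes for real sibling data with minimal loss in forecasting performance, and in strongly periodic domains such as traffic it can even outperform the real source data. We further show that a foundation model pretrained on ReGeN corpora outperforms models pretrained on prior-based and data-driven synthetic alternatives. This suggests that, in low-data regimes, how reference data is structurally exploited can matter as much as how much data is available.

English abstract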

Training robust multivariate time series forecasting models requires large, diverse corpora, yet many real-world domains provide only a handful of observed sequences. Existing generators fail to resolve this mismatch: prior-based approaches (e.g., CauKer, TimePFN) produce domain-agnostic samples, while data-driven methods (e.g., TimeGAN) treat references as black-box supervision, forfeiting explicit control over periodic structure, local variability, and cross-variable dynamics. We propose ReGeN, a reference-guided generative pipeline that treats observed sequences not as examples to imitate, but as structural scaffolds for controllable synthesis. ReGeN decomposes each reference into three interpretable components: a phase-aligned periodic backbone capturing dominant domain morphology; per-variable stochastic residuals modeled with a deep-kernel Gaussian process; and lag-aware cross-variable dependencies injected through a structural causal model with fitted coupling coefficients. Sampling these components at controllable temperature broadens distributional coverage while preserving domain-grounded structure. We show that ReGeN-generated data consistently substitutes for real sibling data with minimal forecasting degradation, and in strongly periodic domains such as traffic, can outperform the real source itself. We further show that a foundation model pretrained on ReGeN corpora outperforms those pretrained on prior-based and data-driven synthetic alternatives. This suggests that in low-data regimes, how reference data is structurally exploited can matter as much as how much data is available.
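
The decomposition described above can be illustrated on a single variable. The sketch below is a deliberately simplified stand-in, not ReGeN itself: it assumes a known period, estimates the backbone by averaging each phase of the cycle, and resamples the empirical residuals instead of fitting a deep-kernel Gaussian process. The cross-variable structural causal model is omitted, and all function names are hypothetical.

```python
import random


def periodic_backbone(series, period):
    """Average the values observed at each phase of the cycle."""
    backbone = []
    for phase in range(period):
        values = series[phase::period]
        backbone.append(sum(values) / len(values))
    return backbone


def decompose(series, period):
    """Split a series into a periodic backbone and per-step residuals."""
    backbone = periodic_backbone(series, period)
    residuals = [x - backbone[i % period] for i, x in enumerate(series)]
    return backbone, residuals


def synthesize(backbone, residuals, length, temperature, rng):
    """Tile the backbone and add resampled residuals scaled by temperature."""
    return [backbone[i % len(backbone)] + temperature * rng.choice(residuals)
            for i in range(length)]


reference = [10.0, 12.0, 15.0, 11.0, 10.4, 11.8, 15.2, 10.8]
backbone, residuals = decompose(reference, period=4)
sample = synthesize(backbone, residuals, length=8, temperature=0.5,
                    rng=random.Random(0))
print(backbone)
print(sample)
```

A temperature of 0 reproduces the backbone exactly, while larger values widen the spread around it, which mirrors the controllable-temperature sampling mentioned in the abstract.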

2210.00379 2026-06-15 cs.CV

NeRF: Neural Radiance Field in 3D Vision: A Comprehensive Review (Updated Post-Gaussian Splatting)

Kyle Gao, Yina Gao, Hongjie He, Dening Lu, Linlin Xu, Jonathan Li

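Affiliations Faculty of Engineering, University of Toronto; Department of Geomatics Engineering, University of Calgary

AI summary Surveys five years of NeRF research, covering developments before and after the arrival of Gaussian Splatting, and summarizes NeRF's applications and contributions in novel view synthesis and in neural field learning with 3D implicit and hybrid representations.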

Comments Updated Post-Gaussian Splatting

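Details
AI abstract (translated from Chinese)

In March 2020, Neural Radiance Fields (NeRF) revolutionized computer vision by enabling implicit, neural-network-based scene representation and novel view synthesis. NeRF models have found wide application in robotics, urban mapping, autonomous navigation, virtual and augmented reality, and other areas. In August 2023, Gaussian Splatting was proposed as a direct competitor; it gained tremendous momentum and overtook NeRF-based research in interest as the dominant framework for novel view synthesis. This paper surveys NeRF-related papers from the past five years (2020-2025). They include work from the pre-Gaussian Splatting era, when NeRF dominated novel view synthesis and neural field learning with 3D implicit and hybrid representations. We also cover work from the post-Gaussian Splatting era, in which NeRF and implicit/hybrid neural fields found more niche applications. The survey is organized into architecture- and application-based taxonomies for the pre-Gaussian Splatting era, together with a categorization of active research areas for NeRF, neural field, and implicit/hybrid neural representation methods. We introduce the theory of NeRF and its training via differentiable volume rendering. We also provide a benchmark comparison of the performance and speed of classical NeRF, implicit and hybrid neural representation, and neural field models, and an overview of key datasets.

English abstract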

In March 2020, Neural Radiance Field (NeRF) revolutionized Computer Vision, allowing for implicit, neural network-based scene representation and novel view synthesis. NeRF models have found diverse applications in robotics, urban mapping, autonomous navigation, virtual reality/augmented reality, and more. In August 2023, Gaussian Splatting, a direct competitor to the NeRF-based framework, was proposed, gaining tremendous momentum and overtaking NeRF-based research in terms of interest as the dominant framework for novel view synthesis. We present a comprehensive survey of NeRF papers from the past five years (2020-2025). These include papers from the pre-Gaussian Splatting era, where NeRF dominated the field for novel view synthesis and 3D implicit and hybrid representation neural field learning. We also include works from the post-Gaussian Splatting era where NeRF and implicit/hybrid neural fields found more niche applications. Our survey is organized into architecture and application-based taxonomies in the pre-Gaussian Splatting era, as well as a categorization of active research areas for NeRF, neural field, and implicit/hybrid neural representation methods. We provide an introduction to the theory of NeRF and its training via differentiable volume rendering. We also present a benchmark comparison of the performance and speed of classical NeRF, implicit and hybrid neural representation, and neural field models, and an overview of key datasets.
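
The volume rendering step mentioned above is usually evaluated as a discrete sum along each camera ray. The sketch below shows that quadrature for a single ray in plain Python; the function name and the sample values are illustrative assumptions, and a real implementation would run the same arithmetic inside an automatic-differentiation framework so that gradients reach the network producing the densities and colors.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one ray with the standard NeRF quadrature.

    densities: volume density sigma_i at each sample
    colors:    (r, g, b) at each sample
    deltas:    distance between adjacent samples
    Returns the rendered color and the transmittance left after the last sample.
    """
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution to the pixel
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha             # light surviving the segment
    return pixel, transmittance


# Empty space followed by a fairly dense blue sample (illustrative values).
pixel, remaining = render_ray([0.0, 5.0],
                              [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
                              [0.1, 0.1])
print(pixel, remaining)
```

Each sample's weight is the probability that the ray reaches it times the probability that it stops there, so an opaque sample hides everything behind it.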

2508.08935 2026-06-15 cs.LG cs.AI

LNN-PINN: A Unified Physics-Only Training Framework with Liquid Residual Blocks

Ze Tao, Hanxuan Wang, Fujun Liu

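Affiliations Nanophotonics and Biophotonics Key Laboratory of Jilin Province, School of Physics, Changchun University of Science and Technology; Faculty of Chinese Medicine, Macau University of Science and Technology

AI summary To address the limited predictive accuracy of physics-informed neural networks on complex problems, proposes the LNN-PINN framework, which improves accuracy by introducing a liquid residual gating architecture, and validates its effectiveness and stability on several benchmark problems.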

Journal ref Computer Physics Communications, 326, 110237 (2026)

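Details
AI abstract (translated from Chinese)

Physics-informed neural networks (PINNs) have attracted wide attention for their ability to integrate partial differential equation priors into deep learning frameworks; however, they often show limited predictive accuracy on complex problems. To address this, we propose LNN-PINN, a physics-informed neural network framework that incorporates a liquid residual gating architecture while keeping the original physics modeling and optimization pipeline, in order to improve predictive accuracy. The method introduces a lightweight gating mechanism only in the hidden-layer mapping and leaves the sampling strategy, loss composition, and hyperparameter settings unchanged, so that any improvement comes purely from the architectural refinement. On four benchmark problems, LNN-PINN consistently reduced RMSE and MAE under identical training conditions, and absolute error plots further confirm its accuracy gains. The framework also shows strong adaptability and stability across different dimensions, boundary conditions, and operator characteristics. In summary, LNN-PINN offers a concise and effective architectural enhancement for improving the predictive accuracy of PINNs in complex scientific and engineering problems.

English abstract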

Physics-informed neural networks (PINNs) have attracted considerable attention for their ability to integrate partial differential equation priors into deep learning frameworks; however, they often exhibit limited predictive accuracy when applied to complex problems. To address this issue, we propose LNN-PINN, a physics-informed neural network framework that incorporates a liquid residual gating architecture while preserving the original physics modeling and optimization pipeline to improve predictive accuracy. The method introduces a lightweight gating mechanism solely within the hidden-layer mapping, keeping the sampling strategy, loss composition, and hyperparameter settings unchanged to ensure that improvements arise purely from architectural refinement. Across four benchmark problems, LNN-PINN consistently reduced RMSE and MAE under identical training conditions, with absolute error plots further confirming its accuracy gains. Moreover, the framework demonstrates strong adaptability and stability across varying dimensions, boundary conditions, and operator characteristics. In summary, LNN-PINN offers a concise and effective architectural enhancement for improving the predictive accuracy of physics-informed neural networks in complex scientific and engineering problems.

2507.09011 2026-06-15 cs.CL q-bio.NC q-bio.QM

From dots to faces: Individual differences in visual imagery capacity predict the content of Ganzflicker-induced hallucinations

Ana Chkhaidze, Reshanne R. Reeder, Connor Gag, Anastasia Kiyonaga, Seana Coulson

Affiliations Department of Cognitive Science, University of California, San Diego (USA); Department of Psychology, Institute of Population Health, University of Liverpool (UK); Department of Computer Science, University of California, San Diego (USA); Kavli Institute for Brain and Mind, San Diego (USA)

AI summary Examines how differences in visual imagery capacity shape the content of Ganzflicker-induced hallucinations; a natural language processing analysis of descriptions from more than 4,000 participants finds that strong imagers describe complex, naturalistic content while weak imagers report simple geometric patterns.

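Details
AI abstract (translated from Chinese)

A rapidly alternating red and black display known as Ganzflicker induces visual hallucinations that reflect the generative capacity of the visual system. Individuals differ in their degree of visual imagery, from absent to vivid. Recent proposals suggest that differences in the visual system along this imagery spectrum should also affect the complexity of other internally generated visual experiences. Here we use natural language processing tools to analyze free-text descriptions from more than 4,000 participants and ask whether people with different imagery phenotypes see different content during Ganzflicker-induced hallucinations. Topic modeling shows that strong imagers describe complex, naturalistic content, whereas weak imagers report simple geometric patterns. Using crowd-sourced sensorimotor norms, we also find that participants with stronger imagery use language with richer perceptual associations. These findings may reflect individual differences in coordination between early visual areas and higher-order regions relevant to the imagery spectrum.

English abstract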

A rapidly alternating red and black display known as Ganzflicker induces visual hallucinations that reflect the generative capacity of the visual system. Individuals vary in their degree of visual imagery, ranging from absent to vivid imagery. Recent proposals suggest that differences in the visual system along this imagery spectrum should also influence the complexity of other internally generated visual experiences. Here, we used tools from natural language processing to analyze free-text descriptions of hallucinations from over 4,000 participants, asking whether people with different imagery phenotypes see different things in their mind's eye during Ganzflicker-induced hallucinations. Topic modeling of descriptions revealed that strong imagers described complex, naturalistic content, while weak imagers reported simple geometric patterns. Using crowd-sourced sensorimotor norms, we also found that participants with stronger imagery used language with richer perceptual associations. These findings may reflect individual variation in coordination between early visual areas and higher-order regions relevant for the imagery spectrum.

2507.06174 2026-06-15 cs.RO cs.AI cs.SY eess.SY

Design and Experimental Validation of Sensorless 4-Channel Bilateral Teleoperation for Low-Cost Manipulators

Koki Yamane, Yunhan Li, Masashi Konosu, Koki Inami, Junji Oaki, Toshiaki Tsuji, Sho Sakaino

Affiliations Degree Programs in Intelligent and Mechanical Interaction Systems, University of Tsukuba; Faculty of Engineering, Information and Systems, University of Tsukuba; Department of Electrical Engineering, Electronics, and Applied Physics, Saitama University

AI summary Proposes a sensorless 4-channel bilateral teleoperation framework that combines nonlinear dynamics compensation with an observer-based disturbance estimation scheme; experiments show stable teleoperation in high-speed, contact-rich scenarios under low-cost hardware constraints and higher success rates on imitation learning tasks.

Comments 22 pages, 12 figures, Submitted to IEEE Access

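Details
AI abstract (translated from Chinese)

Teleoperation of low-cost manipulators is becoming a practical way to collect demonstration data for imitation learning. However, most existing low-cost systems rely on unilateral position control without force feedback, and force-feedback bilateral teleoperation is hard to implement because low-cost manipulators typically have low-resolution encoders and no joint torque sensors. This paper proposes a sensorless 4-channel bilateral teleoperation framework that integrates identified nonlinear dynamics compensation with a disturbance-observer-based scheme for estimating velocity and external force. By interpreting the observer structure in the frequency domain, we clarify the coupling between the velocity-estimation and external-force-estimation bandwidths and derive practical tuning guidelines based on the damping ratio and a single cutoff frequency. Real-robot experiments, including a force-sensor comparison and teleoperation tasks, show that the framework provides practically useful force estimates and enables stable teleoperation in high-speed and contact-rich scenarios under low-cost hardware constraints. As an application, imitation learning experiments show that including the estimated force information in demonstrations improves task success rates on the tested contact-rich manipulation tasks.

English abstract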

Teleoperation of low-cost manipulators is attracting increasing attention as a practical means of collecting demonstration data for imitation learning. However, most existing low-cost systems rely on unilateral position control without force feedback, while implementing force-feedback bilateral teleoperation is difficult because low-cost manipulators typically have low-resolution encoders and no joint torque sensors. This paper presents a sensorless 4-channel bilateral teleoperation framework that integrates identified nonlinear dynamics compensation with a disturbance-observer-based velocity and external-force estimation scheme. By interpreting the observer structure in the frequency domain, we clarify the coupling between the velocity- and external-force-estimation bandwidths and derive practical tuning guidelines based on the damping ratio and a single cutoff frequency. Real-robot experiments, including force-sensor comparison and teleoperation tasks, demonstrate that the proposed framework provides practically useful force estimates and enables stable teleoperation in high-speed and contact-rich scenarios under low-cost hardware constraints. As an application, imitation-learning experiments demonstrate that incorporating estimated force information into demonstrations improves task success rates in the tested contact-rich manipulation tasks.

2602.13040 2026-06-15 cs.LG

TCRL: Temporal-Coupled Adversarial Training for Robust Constrained Reinforcement Learning in Worst-Case Scenarios

Wentao Xu, Zhongming Yao, Weihao Li, Zhenghang Song, Yumeng Song, Tianyi Li, Yushuai Li

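Affiliations Northeastern University; Zhejiang University; Aalborg University

AI summary TCRL introduces a temporal-coupled adversarial training framework that addresses the shortcomings of existing methods under temporally coupled perturbations and improves the worst-case robustness of constrained reinforcement learning.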

Journal ref Proc. of the 25th International Conference on Autonomous Agents and Multiagent Systems, 3489 - 3491, 2026

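Details
AI abstract (translated from Chinese)

Constrained Reinforcement Learning (CRL) aims to optimize decision-making policies under constraints and is widely applicable to safety-critical domains such as autonomous driving, robotics, and power grid management. However, existing robust CRL methods focus mainly on single-step perturbations and temporally independent adversarial models, and lack explicit modeling of temporally coupled perturbations. We therefore propose TCRL, a new temporal-coupled adversarial training framework for robust constrained reinforcement learning in worst-case scenarios. First, TCRL introduces a worst-case-perceived cost constraint function that estimates safety costs under temporally coupled perturbations without explicitly modeling the adversarial attacker. Second, TCRL builds a dual-constraint defense mechanism on the reward to counter temporally coupled adversaries while maintaining reward unpredictability. Experiments show that TCRL consistently outperforms existing methods in robustness against temporally coupled perturbation attacks across a variety of CRL tasks.

English abstract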

Constrained Reinforcement Learning (CRL) aims to optimize decision-making policies under constraint conditions, making it highly applicable to safety-critical domains such as autonomous driving, robotics, and power grid management. However, existing robust CRL approaches predominantly focus on single-step perturbations and temporally independent adversarial models, lacking explicit modeling of robustness against temporally coupled perturbations. To tackle these challenges, we propose TCRL, a novel temporal-coupled adversarial training framework for robust constrained reinforcement learning (TCRL) in worst-case scenarios. First, TCRL introduces a worst-case-perceived cost constraint function that estimates safety costs under temporally coupled perturbations without the need to explicitly model adversarial attackers. Second, TCRL establishes a dual-constraint defense mechanism on the reward to counter temporally coupled adversaries while maintaining reward unpredictability. Experimental results demonstrate that TCRL consistently outperforms existing methods in terms of robustness against temporally coupled perturbation attacks across a variety of CRL tasks.
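
The abstract does not define "temporally coupled" formally. One reading, assumed here, is that each perturbation must stay within a budget `eps` of zero and also within a smaller budget `eps_bar` of the previous step's perturbation. The sketch below enforces both bounds in the L-infinity norm by elementwise clipping; the function names and budgets are hypothetical and this is not the TCRL training procedure.

```python
def project_perturbation(proposed, previous, eps, eps_bar):
    """Clip a proposed perturbation so that, in the L-infinity norm,
    it stays within eps of zero and within eps_bar of the previous step."""
    projected = []
    for value, prev in zip(proposed, previous):
        low = max(-eps, prev - eps_bar)
        high = min(eps, prev + eps_bar)
        projected.append(min(max(value, low), high))
    return projected


def rollout(proposals, eps, eps_bar):
    """Apply the projection step by step, starting from a zero perturbation."""
    previous = [0.0] * len(proposals[0])
    history = []
    for proposed in proposals:
        previous = project_perturbation(proposed, previous, eps, eps_bar)
        history.append(previous)
    return history


# An attacker that tries to jump to +1, +1, then -1 on a single state dimension.
print(rollout([[1.0], [1.0], [-1.0]], eps=0.5, eps_bar=0.2))
```

Compared with a temporally independent attacker, the coupled one cannot flip sign between consecutive steps; it can only drift, which is the kind of behavior a single-step robustness analysis does not capture.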

2512.19805 2026-06-15 cs.LG stat.ME

Guardrailed Uplift Targeting: A Causal Optimization Playbook for Marketing Strategy

Deepit Sapru

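Affiliations Deepit Sapru

AI summary Proposes a marketing decision framework that optimizes customer targeting by combining heterogeneous treatment effect estimation with explicit business guardrails, aiming to maximize revenue and retention while respecting constraints such as budget, revenue protection, and customer experience.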

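Details
AI abstract (translated from Chinese)

This paper introduces a marketing decision framework that optimizes customer targeting by integrating heterogeneous treatment effect estimation with explicit business guardrails. The goal is to maximize revenue and retention while respecting constraints such as budget, revenue protection, and customer experience. The framework first estimates Conditional Average Treatment Effects (CATE) with uplift learners, then solves a constrained allocation problem to decide whom to target and which offer to deploy. It supports decisions on retention messaging, event rewards, and spend-threshold assignment. Validated through offline simulations and online A/B tests, the approach consistently outperforms propensity and static baselines and provides a reusable playbook for causal targeting at scale.

English abstract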

This paper introduces a marketing decision framework that optimizes customer targeting by integrating heterogeneous treatment effect estimation with explicit business guardrails. The objective is to maximize revenue and retention while adhering to constraints such as budget, revenue protection, and customer experience. The framework first estimates Conditional Average Treatment Effects (CATE) using uplift learners, then solves a constrained allocation problem to decide whom to target and which offer to deploy. It supports decisions in retention messaging, event rewards, and spend-threshold assignment. Validated through offline simulations and online A/B tests, the approach consistently outperforms propensity and static baselines, offering a reusable playbook for causal targeting at scale.
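
The second stage, allocation under guardrails, can be sketched as follows. The example assumes CATE estimates are already available per customer and uses a simple greedy heuristic in place of whatever constrained solver the paper uses; the field names, the single-offer setup, and the numbers are all hypothetical.

```python
def allocate_offers(customers, budget):
    """Greedy allocation of one offer under a budget guardrail.

    customers: list of dicts with 'id', 'cate' (estimated incremental
               revenue if treated) and 'cost' (positive cost of the offer).
    Returns the targeted ids and the amount spent.
    """
    # Guardrail (revenue protection): skip customers whose estimated
    # uplift does not cover the cost of treating them.
    eligible = [c for c in customers if c["cate"] > c["cost"]]
    # Rank by uplift per unit of cost, best first.
    eligible.sort(key=lambda c: c["cate"] / c["cost"], reverse=True)
    targeted, spent = [], 0.0
    for c in eligible:
        if spent + c["cost"] <= budget:
            targeted.append(c["id"])
            spent += c["cost"]
    return targeted, spent


customers = [
    {"id": "a", "cate": 10.0, "cost": 2.0},
    {"id": "b", "cate": 3.0, "cost": 4.0},   # uplift below cost: excluded
    {"id": "c", "cate": 6.0, "cost": 3.0},
    {"id": "d", "cate": 8.0, "cost": 2.0},
]
print(allocate_offers(customers, budget=5.0))
```

The greedy ranking is a heuristic and is not guaranteed to be optimal; an exact formulation would treat this as a knapsack-style integer program, with further constraints for customer experience.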

2512.20932 2026-06-15 cs.LG cs.AI

Guardrailed Elasticity Pricing: A Churn-Aware Forecasting Playbook for Subscription Strategy

Deepit Sapru

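Affiliations Deepit Sapru

AI summary Proposes a dynamic pricing framework that combines multivariate demand forecasting, segment-level price elasticity, and churn prediction to optimize revenue and retention. It uses seasonal models and tree-based learners, solves a constrained optimization problem to improve pricing across SaaS portfolios, and enforces guardrails on customer experience and ethics.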

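Details
AI abstract (translated from Chinese)

This paper presents a marketing analytics framework that treats subscription pricing as a dynamic, guardrailed decision system, combining multivariate demand forecasting, segment-level price elasticity, and churn propensity to optimize revenue, margin, and retention. The approach blends seasonal time-series models with tree-based learners, runs Monte Carlo scenario tests to map risk envelopes, and solves a constrained optimization problem that enforces guardrails on customer experience, margin floors, and allowable churn. Validated across heterogeneous SaaS portfolios, the method consistently outperforms static tiers and uniform uplifts by redirecting price moves toward segments with higher willingness to pay while protecting price-sensitive cohorts. The system supports real-time recalibration through modular APIs and includes model explainability for governance and compliance. From a managerial perspective, the framework serves as a strategy playbook that clarifies when to move from flat to dynamic pricing, how to align pricing with customer lifetime value (CLV) and monthly recurring revenue (MRR) targets, and how to embed ethical guardrails, enabling durable growth without eroding customer trust.

English abstract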

This paper presents a marketing analytics framework that operationalizes subscription pricing as a dynamic, guardrailed decision system, uniting multivariate demand forecasting, segment-level price elasticity, and churn propensity to optimize revenue, margin, and retention. The approach blends seasonal time-series models with tree-based learners, runs Monte Carlo scenario tests to map risk envelopes, and solves a constrained optimization that enforces business guardrails on customer experience, margin floors, and allowable churn. Validated across heterogeneous SaaS portfolios, the method consistently outperforms static tiers and uniform uplifts by reallocating price moves toward segments with higher willingness-to-pay while protecting price-sensitive cohorts. The system is designed for real-time recalibration via modular APIs and includes model explainability for governance and compliance. Managerially, the framework functions as a strategy playbook that clarifies when to shift from flat to dynamic pricing, how to align pricing with CLV and MRR targets, and how to embed ethical guardrails, enabling durable growth without eroding customer trust.

2601.08334 2026-06-15 cs.LG

Automated Machine Learning in Radiomics: A Comparative Evaluation of Performance, Efficiency and Accessibility

Jose Lozano-Montoya, Emilio Soria-Olivas, Almudena Fuster-Matanzo, Angel Alberich-Bayarri, Ana Jimenez-Pastor

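Affiliations University of Valencia; Research & Frontiers in AI Department, Quantitative Imaging Biomarkers in Medicine, Quibim SL; Intelligent Data Analysis Laboratory, IDAL, University of Valencia

AI summary Compares general-purpose and radiomics-specific AutoML frameworks on radiomics classification tasks in terms of performance, efficiency, and accessibility. The radiomics-specific tool Simplatab performs best, general-purpose frameworks are easier to use, and gaps remain in survival analysis support and in the integration of feature reproducibility and harmonization.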

Comments 27 pages, 4 figures, 3 tables, code available, see https://github.com/joselznom/AutoML-Comparison-in-Radiomics

Journal ref JMIR Form Res. 2026;10:e91492

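Details
AI abstract (translated from Chinese)

Automated machine learning (AutoML) frameworks lower the technical barriers to developing predictive and prognostic models in radiomics by enabling researchers without programming experience to build models. However, how effectively they address radiomics-specific challenges remains unclear. This study evaluates the performance, efficiency, and accessibility of general-purpose and radiomics-specific AutoML frameworks across diverse radiomics classification tasks, thereby highlighting development needs for radiomics. Ten public/private radiomics datasets were used, covering varied imaging modalities (CT/MRI), sizes, anatomies, and endpoints. Six general-purpose and five radiomics-specific frameworks were tested with predefined parameters using standardized cross-validation. Evaluation metrics included AUC and runtime, along with qualitative aspects related to software status, accessibility, and interpretability. Simplatab, a radiomics-specific tool with a no-code interface, achieved the highest average test AUC (81.81%) with a moderate runtime (about 1 hour). LightAutoML, a general-purpose framework, was the fastest to execute and remained competitive (78.74% mean AUC in six minutes). Most radiomics-specific frameworks were excluded from the performance analysis because of obsolescence, extensive programming requirements, or computational inefficiency. In contrast, general-purpose frameworks were more accessible and easier to implement. Simplatab offers an effective balance of performance, efficiency, and accessibility for radiomics classification problems. However, significant gaps remain, including the lack of accessible survival analysis support and the limited integration of feature reproducibility and harmonization in current AutoML frameworks. Future research should focus on adapting AutoML solutions to better address these radiomics-specific challenges.

English abstract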

Automated machine learning (AutoML) frameworks can lower technical barriers for predictive and prognostic model development in radiomics by enabling researchers without programming expertise to build models. However, their effectiveness in addressing radiomics-specific challenges remains unclear. This study evaluates the performance, efficiency, and accessibility of general-purpose and radiomics-specific AutoML frameworks on diverse radiomics classification tasks, thereby highlighting development needs for radiomics. Ten public/private radiomics datasets with varied imaging modalities (CT/MRI), sizes, anatomies and endpoints were used. Six general-purpose and five radiomics-specific frameworks were tested with predefined parameters using standardized cross-validation. Evaluation metrics included AUC, runtime, together with qualitative aspects related to software status, accessibility, and interpretability. Simplatab, a radiomics-specific tool with a no-code interface, achieved the highest average test AUC (81.81%) with a moderate runtime (~1 hour). LightAutoML, a general-purpose framework, showed the fastest execution with competitive performance (78.74% mean AUC in six minutes). Most radiomics-specific frameworks were excluded from the performance analysis due to obsolescence, extensive programming requirements, or computational inefficiency. Conversely, general-purpose frameworks demonstrated higher accessibility and ease of implementation. Simplatab provides an effective balance of performance, efficiency, and accessibility for radiomics classification problems. However, significant gaps remain, including the lack of accessible survival analysis support and the limited integration of feature reproducibility and harmonization within current AutoML frameworks. Future research should focus on adapting AutoML solutions to better address these radiomics-specific challenges.

2511.17637 2026-06-15 cs.LG cs.CL

PocketLLM: Ultimate Compression of Large Language Models via Meta Networks

Ye Tian, Chengcheng Wang, Jing Han, Yehui Tang, Kai Han

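Affiliations University of Science and Technology of China

AI summary Proposes PocketLLM, which compresses large language models in a latent space via meta-networks, using an encoder and a decoder for efficient compression; experiments show that accuracy is largely preserved at high compression ratios.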

Comments AAAI 2026 camera ready

Journal ref Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33250-33258 (2026)

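Details
AI abstract (translated from Chinese)

As Large Language Models (LLMs) keep growing, storing them and transmitting them to edge devices becomes increasingly challenging. Traditional methods such as quantization and pruning struggle to reach extreme compression without sacrificing accuracy. This paper introduces PocketLLM, a new approach that compresses LLMs in a latent space via meta-networks. A simple encoder network projects the LLM weights into discrete latent vectors, which are then represented with a compact codebook. A lightweight decoder network maps the codebook's representative vectors back to the original weight space. With this method, the large weights of an LLM are compressed substantially and represented by only a small decoder, a concise codebook, and an index. Extensive experiments show that PocketLLM maintains superior performance even at very high compression ratios, for example compressing Llama 2-7B by 10x with a negligible drop in accuracy.

English abstract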

As Large Language Models (LLMs) continue to grow in size, storing and transmitting them on edge devices becomes increasingly challenging. Traditional methods like quantization and pruning struggle to achieve extreme compression of LLMs without sacrificing accuracy. In this paper, we introduce PocketLLM, a novel approach to compress LLMs in a latent space via meta-networks. A simple encoder network is proposed to project the weights of LLMs into discrete latent vectors, which are then represented using a compact codebook. A lightweight decoder network is employed to map the codebook's representative vectors back to the original weight space. This method allows for significant compression of the large weights in LLMs, consisting solely of a small decoder, a concise codebook, and an index. Extensive experiments show that PocketLLM achieves superior performance even at significantly high compression ratios, e.g., compressing Llama 2-7B by 10x with a negligible drop in accuracy.
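
The storage argument behind a codebook plus an index can be seen with plain vector quantization. The sketch below is a simplified stand-in that skips the encoder and decoder networks entirely: weights are cut into short sub-vectors, each sub-vector is replaced by the index of its nearest codebook entry, and the size ratio is computed from bit counts. The codebook, the sub-vector length, and the 16-bit weight width are hypothetical choices, and the weight count is assumed to be a multiple of `dim`.

```python
import math


def quantize(weights, codebook, dim):
    """Replace each length-dim chunk of weights by its nearest codebook index."""
    indices = []
    for start in range(0, len(weights), dim):
        chunk = weights[start:start + dim]
        dists = [sum((a - b) ** 2 for a, b in zip(chunk, code))
                 for code in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def dequantize(indices, codebook):
    """Rebuild an approximate weight list from the indices."""
    return [value for i in indices for value in codebook[i]]


def compression_ratio(n_weights, dim, codebook_size, bits_per_weight=16):
    """Original size divided by (index bits + codebook bits)."""
    index_bits = (n_weights // dim) * math.ceil(math.log2(codebook_size))
    codebook_bits = codebook_size * dim * bits_per_weight
    return n_weights * bits_per_weight / (index_bits + codebook_bits)


codebook = [[0.0, 0.0], [1.0, 1.0]]
indices = quantize([0.1, 0.0, 0.9, 1.1], codebook, dim=2)
print(indices, dequantize(indices, codebook))
print(compression_ratio(n_weights=1024, dim=4, codebook_size=16))
```

The ratio improves as the number of weights grows relative to the codebook, since the codebook cost is paid once while the index cost scales with the weight count.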

2412.00123 2026-06-15 cs.LG math.PR

Electricity Price Prediction Using Multi-Kernel Gaussian Process Regression Combined with Kernel-Based Support Vector Regression

Abhinav Das, Stephan Schlüter, Lorenz Schneider

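Affiliations Faculty of Mathematics and Economics, Ulm University; Institute of Energy Engineering and Energy Economics, Ulm University of Applied Sciences; Emlyon Business School, Lyon, France

AI summary Proposes a new hybrid model for forecasting German electricity prices that combines Gaussian process regression (GPR) with support vector regression (SVR): a suitable data-dependent covariance function improves GPR, SVR handles non-linear processes and outliers, and experiments show the model outperforms existing benchmarks.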

Journal ref Journal of Forecasting (2026) 45, no. 4: 2059:2077

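Details
AI abstract (translated from Chinese)

This paper proposes a new hybrid model for forecasting German electricity prices. The algorithm combines Gaussian Process Regression (GPR) and Support Vector Regression (SVR). Although GPR is good at learning stochastic patterns in data and at interpolation, its out-of-sample predictive performance is not very promising. Choosing a suitable data-dependent covariance function improves GPR's performance on the German hourly power prices tested. However, because the out-of-sample prediction depends on the training data, it is vulnerable to noise and outliers. To address this, a separate prediction is computed with SVR, which applies margin-based optimization. This method is advantageous for non-linear processes and outliers, since only certain necessary points in the training data (the support vectors) determine the regression. The individual predictions are then linearly combined with uniform weights. When tested on historical German power prices, the approach outperforms publicly available benchmarks, namely a LASSO-estimated autoregressive regression model and a deep neural network from recent research.

English abstract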

This paper presents a new hybrid model for predicting German electricity prices. The algorithm is based on a combination of Gaussian Process Regression (GPR) and Support Vector Regression (SVR). Although GPR is a competent model for learning stochastic patterns within data and for interpolation, its performance for out-of-sample data is not very promising. By choosing a suitable data-dependent covariance function, we can enhance the performance of GPR for the German hourly power prices being tested. However, since the out-of-sample prediction is dependent on the training data, the prediction is vulnerable to noise and outliers. To overcome this issue, a separate prediction is calculated using SVR, which applies margin-based optimization. This method is advantageous when dealing with non-linear processes and outliers, since only certain necessary points (support vectors) in the training data are responsible for regression. The individual predictions are then linearly combined using uniform weights. When tested on historic German power prices, this approach outperforms the publicly available benchmarks, namely the LASSO-estimated autoregressive regression model and the deep neural network provided in the recent research by [1].
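
The final combination step is simple enough to show directly. The sketch below assumes the GPR and SVR forecasts have already been produced by separate models and only illustrates the uniform-weight averaging; the price values are made up for illustration.

```python
def combine_uniform(*forecasts):
    """Average several equally long forecast lists with equal weights."""
    n_models = len(forecasts)
    return [sum(values) / n_models for values in zip(*forecasts)]


def mean_absolute_error(predicted, actual):
    """Mean absolute error between two equally long lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical hourly prices (EUR/MWh) and two model forecasts.
actual = [42.0, 47.0]
gpr_forecast = [40.0, 50.0]
svr_forecast = [44.0, 46.0]
hybrid = combine_uniform(gpr_forecast, svr_forecast)
print(hybrid, mean_absolute_error(hybrid, actual))
```

Averaging helps most when the two models err in opposite directions, as in the first hour of this toy example.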

2506.18271 2026-06-15 cs.LG

Memory-Augmented Architecture for Long-Term Context Handling in Large Language Models

Haseeb Ullah Khan Shinwari, Muhammad Usama

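Affiliations Newton AI Lab; School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST)

AI summary Proposes a memory-augmented architecture that dynamically retrieves, updates, and prunes information from past interactions to improve long-term context handling in large language models; experiments show improved contextual coherence, lower memory overhead, and better response quality.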

Journal ref IEEE Transactions on Artificial Intelligence, 2026

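Details
AI abstract (translated from Chinese)

Large language models face significant challenges in maintaining coherent interactions over long dialogues because of their limited contextual memory, which leads to fragmented exchanges and less relevant responses and degrades the user experience. To address this, we propose a memory-augmented architecture that dynamically retrieves, updates, and prunes relevant information from past interactions to ensure effective long-term context handling. Experimental results show that our solution significantly improves contextual coherence, reduces memory overhead, and enhances response quality, demonstrating its potential for real-time use in interactive systems.

English abstract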

Large Language Models face significant challenges in maintaining coherent interactions over extended dialogues due to their limited contextual memory. This limitation often leads to fragmented exchanges and reduced relevance in responses, diminishing user experience. To address these issues, we propose a memory-augmented architecture that dynamically retrieves, updates, and prunes relevant information from past interactions, ensuring effective long-term context handling. Experimental results demonstrate that our solution significantly improves contextual coherence, reduces memory overhead, and enhances response quality, showcasing its potential for real-time applications in interactive systems.
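
The retrieve, update, and prune cycle can be pictured with a toy memory store. This is an illustrative sketch only, not the paper's architecture: relevance is scored by word overlap instead of learned embeddings, pruning simply keeps the most recent entries, and the class and method names are hypothetical.

```python
class MemoryStore:
    """Toy conversation memory with update, prune, and retrieve steps."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []   # list of (step, text), oldest first
        self.step = 0

    def update(self, text):
        """Store a new utterance, then prune if over capacity."""
        self.step += 1
        self.items.append((self.step, text))
        self.prune()

    def prune(self):
        """Keep only the most recent `capacity` entries."""
        self.items = self.items[-self.capacity:]

    def retrieve(self, query, k=1):
        """Return up to k stored texts sharing the most words with the query."""
        query_words = set(query.lower().split())
        scored = []
        for step, text in self.items:
            overlap = len(query_words & set(text.lower().split()))
            if overlap > 0:
                scored.append((overlap, step, text))
        scored.sort(reverse=True)   # best overlap first, then most recent
        return [text for _, _, text in scored[:k]]


memory = MemoryStore(capacity=2)
for utterance in ["my name is Ada", "I like tea", "the weather is cold"]:
    memory.update(utterance)
print(memory.retrieve("what tea do I like"))
```

Retrieved entries would then be prepended to the model's prompt, so that only a small, relevant slice of the history occupies the context window.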

2506.00784 2026-06-15 cs.CL

Research Borderlands: Analysing Writing Across Research Cultures

Shaily Bhatt, Tal August, Maria Antoniak

Affiliations Language Technologies Institute, Carnegie Mellon University; Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign; Pioneer Centre for AI, University of Copenhagen; Allen Institute for Artificial Intelligence (Ai2)

AI summary Uses interviews with interdisciplinary researchers to uncover writing norms across research cultures, evaluates the cultural competence of LLMs, and argues for the effectiveness of human-centered approaches to measuring cultural norms.

Comments Accepted to ACL 2025 (Main)

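Details
AI abstract (translated from Chinese)

Improving the cultural competence of language technologies is important. However, most recent work rarely engages with the communities it studies and instead relies on synthetic setups and imperfect proxies of culture. In this work, we take a human-centered approach to discovering and measuring language-based cultural norms and the cultural competence of LLMs. We focus on a single kind of culture, research cultures, and a single task, adapting writing across research cultures. Through interviews with interdisciplinary researchers, who are experts at moving between cultures, we build a framework of structural, stylistic, rhetorical, and citational norms that vary across research cultures. We operationalize these features with a suite of computational metrics and use them to (a) surface latent cultural norms in human-written research papers at scale and (b) highlight LLMs' lack of cultural competence and their tendency to homogenize writing. Overall, our work illustrates the efficacy of a human-centered approach to measuring cultural norms in human-written and LLM-generated text.

English abstract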

Improving cultural competence of language technologies is important. However most recent works rarely engage with the communities they study, and instead rely on synthetic setups and imperfect proxies of culture. In this work, we take a human-centered approach to discover and measure language-based cultural norms, and cultural competence of LLMs. We focus on a single kind of culture, research cultures, and a single task, adapting writing across research cultures. Through a set of interviews with interdisciplinary researchers, who are experts at moving between cultures, we create a framework of structural, stylistic, rhetorical, and citational norms that vary across research cultures. We operationalise these features with a suite of computational metrics and use them for (a) surfacing latent cultural norms in human-written research papers at scale; and (b) highlighting the lack of cultural competence of LLMs, and their tendency to homogenise writing. Overall, our work illustrates the efficacy of a human-centered approach to measuring cultural norms in human-written and LLM-generated texts.

2506.09087 2026-06-15 cs.LG math.PR q-bio.NC stat.ML

Spiking Neural Models for Decision-Making Tasks with Learning

基于学习的脉冲神经模型用于决策任务

Sophie Jaffard, Giulia Mezzadri, Patricia Reynaud-Bouret, Etienne Tanré

Affiliations * Cognition and Decision Lab, Columbia University

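AI summary Proposes a biologically plausible spiking neural network model for decision-making tasks that incorporates a learning mechanism and models neuronal activity with a multivariate Hawkes process. The paper proves a coupling result between the DDM and the Poisson counter model, derives a DDM with correlated noise from a Hawkes network governed by a local learning rule, and designs an online categorization task to evaluate the model's predictions.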

Details
AI-generated Chinese abstract

在认知领域,决策任务中的响应时间和选择通常用漂移扩散模型(DDMs)建模,该模型将决策证据的累积描述为随机过程,特别是布朗运动,其中漂移速率反映证据强度。同样,泊松计数器模型将证据累积描述为离散事件,其计数随时间建模为泊松过程,并可解释为神经元活动。然而,这些模型缺乏学习机制且局限于参与者已知类别任务。为弥合认知与生物模型之间的差距,本文提出一种生物合理性的脉冲神经网络(SNN)模型,用于决策任务,该模型包含学习机制,其神经元活动由多变量Hawkes过程建模。首先,我们证明了DDM与泊松计数器模型之间的耦合结果,表明这两个模型提供相似的分类和响应时间,并且DDM可近似由脉冲泊松神经元建模。为进一步推进,我们证明了一个具有相关噪声的特定DDM可从由局部学习规则支配的脉冲神经元Hawkes网络中推导出来。此外,我们设计了一个在线分类任务来评估模型预测。本文为将生物相关神经机制整合到认知模型中提供了重要进展,促进了对神经活动与行为之间关系的深入理解。

English abstract

In cognition, response times and choices in decision-making tasks are commonly modeled using Drift Diffusion Models (DDMs), which describe the accumulation of evidence for a decision as a stochastic process, specifically a Brownian motion, with the drift rate reflecting the strength of the evidence. In the same vein, the Poisson counter model describes the accumulation of evidence as discrete events whose counts over time are modeled as Poisson processes, and it admits a spiking-neuron interpretation, since these processes are used to model neuronal activity. However, these models lack a learning mechanism and are limited to tasks where participants have prior knowledge of the categories. To bridge the gap between cognitive and biological models, we propose a biologically plausible Spiking Neural Network (SNN) model for decision-making that incorporates a learning mechanism and whose neuronal activity is modeled by a multivariate Hawkes process. First, we show a coupling result between the DDM and the Poisson counter model, establishing that these two models provide similar categorizations and reaction times and that the DDM can be approximated by spiking Poisson neurons. To go further, we show that a particular DDM with correlated noise can be derived from a Hawkes network of spiking neurons governed by a local learning rule. In addition, we design an online categorization task to evaluate the model predictions. This work provides a significant step toward integrating biologically relevant neural mechanisms into cognitive models, fostering a deeper understanding of the relationship between neural activity and behavior.
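
As a rough illustration of the two evidence-accumulation models the abstract compares, the sketch below simulates a basic DDM and a Poisson counter in Python. It is not the authors' code, and it contains neither the learning rule nor the Hawkes network. The function names, parameter values, and the choice to stop the counter when the difference between the two spike counts reaches a threshold are assumptions made for illustration; the paper's exact formulation may differ. The analogy rests on a standard fact: the difference of two independent Poisson counts with rates `rate_a` and `rate_b` has mean growth rate `rate_a - rate_b` and variance growth rate `rate_a + rate_b`, mirroring the drift and noise of a Brownian motion.

```python
import math
import random


def simulate_ddm(drift, threshold, sigma=1.0, dt=1e-3, max_time=10.0, rng=None):
    """Euler-Maruyama simulation of dX = drift*dt + sigma*dW, stopped at +/-threshold.

    Returns (choice, time); choice is +1 or -1, or 0 if no boundary is hit by max_time.
    """
    if rng is None:
        rng = random.Random()
    x, t = 0.0, 0.0
    step_sd = sigma * math.sqrt(dt)  # standard deviation of one Brownian increment
    while t < max_time:
        x += drift * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:
            return 1, t
        if x <= -threshold:
            return -1, t
    return 0, t


def simulate_poisson_counter(rate_a, rate_b, threshold, max_time=10.0, rng=None):
    """Two independent Poisson spike trains (hypothetical 'difference' variant).

    A decision is made when the spike-count difference reaches +/-threshold.
    Returns (choice, time) with the same convention as simulate_ddm.
    """
    if rng is None:
        rng = random.Random()
    total = rate_a + rate_b
    diff, t = 0, 0.0
    while True:
        # Waiting time to the next spike of the merged process is exponential.
        t += rng.expovariate(total)
        if t > max_time:
            return 0, max_time
        # The spike belongs to train A with probability rate_a / total.
        diff += 1 if rng.random() < rate_a / total else -1
        if diff >= threshold:
            return 1, t
        if diff <= -threshold:
            return -1, t


if __name__ == "__main__":
    rng = random.Random(0)
    n = 500
    ddm = [simulate_ddm(2.0, 1.0, rng=rng) for _ in range(n)]
    counter = [simulate_poisson_counter(30.0, 10.0, 5, rng=rng) for _ in range(n)]
    for name, runs in (("DDM", ddm), ("Poisson counter", counter)):
        frac_up = sum(choice == 1 for choice, _ in runs) / n
        mean_rt = sum(time for _, time in runs) / n
        print(f"{name}: P(choice=+1)={frac_up:.3f}, mean response time={mean_rt:.3f}s")
```

Both simulators return a choice and a response time, so their empirical distributions can be compared directly. That comparison only gestures at the kind of agreement the coupling result formalises.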