2510.10779 2026-06-18 cs.CV version update

Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans

Theo Di Piazza, Carole Lazarus, Olivier Nempont, Loic Boussel

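Affiliations: INSA Lyon, University of Lyon, CNRS, INSERM, CREATIS UMR 5220, U1294

AI summary: Proposes a 2.5D framework based on spectral graph convolution that represents a 3D CT volume as a structured graph whose nodes are axial slice triplets, modeling inter-slice dependencies for multi-label abnormality classification with strong cross-dataset generalization.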

Comments Accepted at MELBA Journal 2026

DOI: 10.59275/j.melba.2026-87e3

Abstract

With the growing volume of CT examinations, there is an increasing demand for automated tools such as organ segmentation, abnormality detection, and report generation to support radiologists in managing their clinical workload. Multi-label classification of 3D Chest CT scans remains a critical yet challenging problem due to the complex spatial relationships inherent in volumetric data and the wide variability of abnormalities. Existing methods based on 3D convolutional neural networks struggle to capture long-range dependencies, while Vision Transformers often require extensive pre-training on large-scale, domain-specific datasets to perform competitively. In this work, we propose a 2.5D alternative by introducing a new graph-based framework that represents 3D CT volumes as structured graphs, where axial slice triplets serve as nodes processed through spectral graph convolution, enabling the model to reason over inter-slice dependencies while maintaining complexity compatible with clinical deployment. Our method, trained and evaluated on 3 datasets from independent institutions, achieves strong cross-dataset generalization, and shows competitive performance compared to state-of-the-art visual encoders. We further conduct comprehensive ablation studies to evaluate the impact of various aggregation strategies, edge-weighting schemes, and graph connectivity patterns. Additionally, we demonstrate the broader applicability of our approach through transfer experiments on automated radiology report generation and abdominal CT data.
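
As a rough illustration of the idea in the abstract above, the sketch below groups axial slices into triplet nodes and applies one first-order (GCN-style) approximation of a spectral graph convolution. It is not the authors' implementation: the non-overlapping triplet grouping, the chain connectivity, the placeholder features, and all function names are assumptions made for this example.

```python
import math

def triplet_nodes(num_slices):
    """Group axial slice indices into consecutive, non-overlapping triplets (assumed grouping)."""
    return [(i, i + 1, i + 2) for i in range(0, num_slices - 2, 3)]

def chain_adjacency(n):
    """Adjacency with self-loops linking each node to its immediate neighbours (assumed connectivity)."""
    return [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def graph_conv(adj_norm, feats, weight):
    """One propagation step: ReLU(adj_norm @ feats @ weight)."""
    n, d_in, d_out = len(feats), len(weight), len(weight[0])
    mixed = [[sum(adj_norm[i][k] * feats[k][c] for k in range(n)) for c in range(d_in)]
             for i in range(n)]
    return [[max(0.0, sum(mixed[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)]
            for i in range(n)]

nodes = triplet_nodes(12)                    # 12 slices -> 4 triplet nodes
adj_norm = normalize(chain_adjacency(len(nodes)))
feats = [[1.0, 2.0] for _ in nodes]          # placeholder per-node features
weight = [[1.0], [1.0]]                      # placeholder 2 -> 1 projection
out = graph_conv(adj_norm, feats, weight)    # one updated feature vector per node
```

In a real pipeline the per-node features would come from an image encoder applied to each triplet, and the abstract notes that edge weighting and connectivity patterns are themselves ablated, so the chain graph here is only one possible choice.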

2602.11467 2026-06-18 cs.LG version update

PRISM: A 3D Probabilistic Neural Representation for Interpretable Shape Modeling

Yining Jiao, Sreekalyani Bhamidi, Carlton Jude Zdanski, Julia S Kimbell, Andrew Prince, Cameron P Worden, Samuel Kirse, Christopher Rutter, Benjamin H Shields, Jisan Mahmud, Marc Niethammer

Affiliations: Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA; Department of Computer Science, University of California San Diego, La Jolla, USA; School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, USA

AI summary: Proposes PRISM, a framework that combines implicit neural representations with uncertainty-aware statistical shape analysis and uses a closed-form Fisher information metric for efficient local temporal uncertainty quantification; it performs strongly on shape evolution, personalized prediction, and anomaly detection tasks.

Comments ICML 2026, camera-ready version, 24 pages

Abstract

Understanding how anatomical shapes evolve in response to developmental covariates - and quantifying their spatially varying uncertainties - is critical in healthcare research. Existing approaches typically rely on global time-warping formulations that ignore spatially heterogeneous dynamics. We introduce PRISM, a novel framework that bridges implicit neural representations with uncertainty-aware statistical shape analysis. PRISM models the conditional distribution of shapes given covariates, providing spatially continuous estimates of both the population mean and covariate-dependent uncertainty at arbitrary locations. A key theoretical contribution is a closed-form Fisher Information metric that enables efficient, analytically tractable local temporal uncertainty quantification via automatic differentiation. Experiments on three synthetic datasets and one clinical dataset demonstrate PRISM's strong performance across diverse tasks - from modeling shape evolution to personalized shape prediction and anomaly detection - within a unified framework, while providing interpretable and clinically meaningful uncertainty estimates.

2602.09234 2026-06-18 cs.LG cs.AI version update

Do Neural Networks Lose Plasticity in a Gradually Changing World?

Tianhui Liu, Lili Mou

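Affiliations: Dept. Computing Science & Alberta Machine Intelligence Institute (Amii), University of Alberta; Canada CIFAR AI Chair

AI summary: Studies how the abruptness of task transitions affects loss of plasticity in neural networks. Gradually changing environments are simulated through input/output interpolation and task sampling; theory and experiments indicate that the severity of plasticity loss is closely tied to transition abruptness and is markedly reduced when the environment changes gradually.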

2602.07544 2026-06-18 cs.CV version update

MUFASA: A Multi-Layer Framework for Slot Attention

Sebastian Bock, Leonie Schüßler, Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

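Affiliations: TU Darmstadt; Zuse School ELIZA

AI summary: Proposes MUFASA, a lightweight plug-and-play framework that computes slot attention across multiple ViT encoder layers and fuses the results, improving segmentation in unsupervised object-centric learning and setting a new state of the art.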

Comments CVPR 2026. Authors Sebastian Bock and Leonie Schüßler contributed equally. Project page: https://visinf.github.io/mufasa/

Abstract

Unsupervised object-centric learning (OCL) decomposes visual scenes into distinct entities. Slot attention is a popular approach that represents individual objects as latent vectors, called slots. Current methods obtain these slot representations solely from the last layer of a pre-trained vision transformer (ViT), ignoring valuable, semantically rich information encoded across the other layers. To better utilize this latent semantic information, we introduce MUFASA, a lightweight plug-and-play framework for slot-attention-based approaches to unsupervised object segmentation. Our model computes slot attention across multiple feature layers of the ViT encoder, fully leveraging their semantic richness. We propose a fusion strategy to aggregate slots obtained on multiple layers into a unified object-centric representation. Integrating MUFASA into existing OCL methods improves their segmentation results across multiple datasets, setting a new state of the art while simultaneously improving training convergence with only minor inference overhead.

2602.06774 2026-06-18 cs.AI version update

Posterior Continuation with Noise-Conditioned Frequency Exposure for Diffusion Inverse Problems

Feng Tian, Yixuan Li, Weili Zeng, Weitian Zhang, Yichao Yan, Xiaokang Yang

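Affiliations: Shanghai Jiao Tong University

AI summary: Proposes a posterior continuation framework that exposes measurement frequencies progressively according to the diffusion noise level and, combined with a stabilized sampler, achieves strong performance on super-resolution, inpainting, and deblurring.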

Abstract

Diffusion posterior sampling solves inverse problems by combining a pretrained diffusion prior with measurement-consistency guidance. However, full-band guidance can be unreliable at high noise levels, where clean estimates contain score-induced errors and high-frequency measurement directions are weakly identifiable. We argue that posterior guidance should expose measurement frequencies according to the instantaneous diffusion noise level. Based on this principle, we propose a posterior continuation framework that constructs a family of intermediate posteriors whose likelihood emphasizes currently reliable frequency bands and gradually returns to full-band consistency. We instantiate this framework with a stabilized sampler that combines a diffusion predictor, frequency-limited likelihood refinement, and a Haar-domain commitment rule that commits reliable coarse corrections while deferring weakly identifiable details. Across super-resolution, inpainting, and deblurring, our method achieves competitive-to-state-of-the-art restoration performance, including up to 5 dB PSNR improvement on motion deblurring over strong baselines in evaluations on FFHQ and ImageNet.
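
The principle of exposing measurement frequencies according to the noise level can be pictured with a toy schedule. The sketch below is only an assumption-laden illustration: the linear schedule, the `floor` parameter, and the 1D binary low-pass mask are invented for this example and are not the paper's Haar-domain commitment rule or sampler.

```python
def exposed_fraction(sigma, sigma_max, floor=0.1):
    """Fraction of the frequency band the likelihood may use at noise level sigma.

    Hypothetical linear schedule: only `floor` of the band at the highest
    noise level, the full band once the noise reaches zero.
    """
    t = min(max(sigma / sigma_max, 0.0), 1.0)
    return floor + (1.0 - floor) * (1.0 - t)

def band_mask(num_freqs, sigma, sigma_max):
    """Binary low-pass mask: 1 for frequencies exposed to guidance, 0 otherwise."""
    cutoff = max(1, round(exposed_fraction(sigma, sigma_max) * num_freqs))
    return [1 if k < cutoff else 0 for k in range(num_freqs)]

# Walk from high noise to low noise and record how many frequencies are exposed.
sigmas = [1.0, 0.75, 0.5, 0.25, 0.0]
widths = [sum(band_mask(10, s, 1.0)) for s in sigmas]
```

The point of the sketch is the monotone widening: coarse frequencies guide the sample early, and full-band consistency is restored only as the noise level approaches zero.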

2602.00161 2026-06-18 cs.LG cs.AI cs.CL quant-ph version update

LLM Compression by Block Removal with Constrained Binary Optimization

David Jansen, Roman Rausch, Ali Hashemi, David Montero, Román Orús

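Affiliations: Multiverse Computing; Donostia International Physics Center; Ikerbasque Foundation for Science

AI summary: Formulates block-removal compression of large language models as a constrained binary optimization problem mapped to an Ising glass system, enabling efficient ranking and high-quality removal of non-contiguous blocks; at 50% compression it gains nearly 23 percentage points on MMLU, and it is computationally efficient and broadly applicable.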

Comments 16 pages, 3 figures

Abstract

In this paper, we formulate the compression of large language models (LLMs) by optimally deleting transformer blocks ("block removal") as a constrained binary optimization (CBO) problem that can be mapped to a physical system (Ising glass), whose energies are a strong proxy for downstream model performance. This formulation enables an efficient ranking of a large number of candidate block-removal configurations, yielding many high-quality, non-trivial solutions beyond those only removing consecutive regions. Our method performs strongly in the deep compression regime, such as for 50% compression of Llama-3.3-70B-Instruct, where we achieve an almost 23 percentage point increase on the MMLU benchmark compared to other state-of-the-art (SOTA) block-removal methods. For lighter compression, it performs on par with those methods across several benchmarks for Llama-3.1-8B-Instruct, Qwen3-14B (both before and after retraining), as well as Llama-3.3-70B-Instruct. The approach is computationally efficient and requires only forward and backward passes on a calibration dataset for a few active parameters. Additionally, we demonstrate that using good heuristic solvers for the CBO problem provides solutions that perform well on downstream tasks in negligible runtime when it is infeasible to solve the problem exactly. The method can be readily applied to any architecture. We illustrate this generality on the recent NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 model, which exhibits a highly inhomogeneous and challenging block structure, and where we outperform SOTA for AIME25 and GPQA when removing either 2 attention layers or 3 mixture-of-experts layers.
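
To make the constrained binary optimization view concrete, the following toy sketch ranks every way of removing exactly `k` blocks by a quadratic, Ising-style energy. The coefficients `h` and `J` are made-up numbers rather than values from the paper, and the exhaustive enumeration is only workable for tiny instances; the abstract points to heuristic solvers for realistic sizes.

```python
from itertools import combinations

def energy(removed, h, J):
    """Quadratic energy of a set of removed block indices.

    h[i]      : hypothetical cost of removing block i on its own
    J[(i, j)] : hypothetical extra cost when blocks i < j are both removed
    """
    linear = sum(h[i] for i in removed)
    pair = sum(J.get((i, j), 0.0) for i, j in combinations(sorted(removed), 2))
    return linear + pair

def rank_removals(h, J, k):
    """Enumerate all ways to remove exactly k blocks, lowest energy first."""
    configs = combinations(range(len(h)), k)
    return sorted((energy(c, h, J), c) for c in configs)

# Made-up coefficients for a 5-block model, removing 2 blocks.
h = [0.9, 0.2, 0.3, 0.1, 0.8]
J = {(1, 2): 0.5, (1, 3): 0.05, (2, 3): 0.4}
ranking = rank_removals(h, J, k=2)
best_energy, best_blocks = ranking[0]
```

With these numbers the lowest-energy choice removes blocks 1 and 3, a non-consecutive pair, because the pairwise term penalizes removing the adjacent blocks 1 and 2 together.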

2511.20002 2026-06-18 cs.CV cs.AI cs.CR version update

Semantic Router: On the Feasibility of Hijacking MLLMs via a Single Adversarial Perturbation

Changyue Li, Jiaying Li, Youliang Yuan, Jiaming He, Zhicong Huang, Pinjia He

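Affiliations: The Chinese University of Hong Kong, Shenzhen, China; School of Data Science, School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen, China

AI summary: Proposes the Semantic-Aware Universal Perturbation (SAUP), which acts as a semantic router to hijack multiple stateless decisions at once, supported by theoretical analysis and the SORT optimization strategy; it reaches a 66% attack success rate over five targets on Qwen.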

Comments Accepted to ICML 2026

Abstract

Multimodal Large Language Models (MLLMs) are increasingly deployed in stateless systems, such as autonomous driving and robotics. This paper investigates a novel threat: Semantic-Aware Hijacking. We explore the feasibility of hijacking multiple stateless decisions simultaneously using a single universal perturbation. We introduce the Semantic-Aware Universal Perturbation (SAUP), which acts as a semantic router, "actively" perceiving input semantics and routing them to distinct, attacker-defined targets. To achieve this, we conduct theoretical and empirical analysis on the geometric properties in the latent space. Guided by these insights, we propose the Semantic-Oriented (SORT) optimization strategy and annotate a new dataset with fine-grained semantics to evaluate performance. Extensive experiments on three representative MLLMs demonstrate the fundamental feasibility of this attack, achieving a 66% attack success rate over five targets using a single frame against Qwen.

2601.21626 2026-06-18 cs.LG cs.AI version update

HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning

Jinhao Zhang, Yunquan Zhang, Zicheng Yan, Boyang Zhang, Jun Sun, Daning Cheng

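Affiliations: Beijing University of Posts and Telecommunications; Institute of Computing Technology, Chinese Academy of Sciences; University of Science and Technology of China; Zhejiang Lab; Peng Cheng Laboratory

AI summary: Addresses the "low error, high loss" paradox in post-training quantization with the HeRo-Q algorithm, which reshapes the loss landscape through a lightweight learnable rotation-compression matrix, lowering the largest Hessian eigenvalue and improving robustness to quantization noise; it outperforms existing methods on Llama and Qwen models.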

Abstract

Post-training quantization (PTQ), a mainstream model compression technique, often leads to the paradoxical 'low error, high loss' phenomenon because it focuses solely on minimizing quantization error. The root cause lies in the Hessian matrix of the LLM loss landscape: a few high-curvature directions are extremely sensitive to perturbations. To address this, we propose the Hessian Robust Quantization (HeRo-Q) algorithm, which applies a lightweight, learnable rotation-compression matrix to the weight space prior to quantization. This joint framework reshapes the loss landscape by reducing the largest Hessian eigenvalue, thereby significantly enhancing robustness to quantization noise. HeRo-Q requires no architectural modifications, incurs negligible computational overhead, and integrates seamlessly into existing PTQ pipelines. Experiments on Llama and Qwen models show that HeRo-Q consistently outperforms state-of-the-art methods, including GPTQ, AWQ, and SpinQuant, not only achieving superior performance under standard W4A8 settings but also excelling in the highly challenging W3A16 ultra-low-bit regime, where it boosts GSM8K accuracy on Llama3 8B to 70.15% and effectively avoids the logical collapse commonly seen in aggressive quantization.
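
The 'low error, high loss' effect follows from the standard second-order estimate of the loss change under a weight perturbation, 0.5 * delta^T H delta, which is bounded by 0.5 * lambda_max * ||delta||^2. The toy example below, with a made-up diagonal Hessian, shows a perturbation with smaller squared error causing a much larger loss increase because it lies along the high-curvature direction. It illustrates the motivation only and is not the HeRo-Q algorithm.

```python
def loss_increase(hessian_diag, delta):
    """Second-order estimate 0.5 * delta^T H delta for a diagonal Hessian."""
    return 0.5 * sum(h * d * d for h, d in zip(hessian_diag, delta))

def squared_error(delta):
    """Squared L2 norm of the perturbation (the usual quantization-error proxy)."""
    return sum(d * d for d in delta)

# Made-up curvature: one sharp direction, two flat ones.
hessian_diag = [100.0, 1.0, 1.0]

delta_sharp = [0.1, 0.0, 0.0]   # small error, aligned with the sharp direction
delta_flat = [0.0, 0.1, 0.1]    # larger error, confined to flat directions

err_sharp, err_flat = squared_error(delta_sharp), squared_error(delta_flat)
loss_sharp = loss_increase(hessian_diag, delta_sharp)
loss_flat = loss_increase(hessian_diag, delta_flat)

# Worst-case bound governed by the largest eigenvalue.
bound_sharp = 0.5 * max(hessian_diag) * err_sharp
```

Lowering the largest eigenvalue tightens this worst-case bound for every perturbation of a given size, which is the sense in which reshaping the Hessian spectrum improves robustness to quantization noise.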

2601.20381 2026-06-18 cs.RO version update

KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction

Han Liu, Keyan Ding, Peilin Chen, Yinwei Wei, Liqiang Nie, Dapeng Wu, Shiqi Wang

Affiliations: Department of Computer Science, City University of Hong Kong; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University; School of Software, Shandong University; College of Informatics, Harbin Institute of Technology (Shenzhen)

AI summary: Proposes the KEPLA framework, which integrates prior knowledge from Gene Ontology and ligand properties and uses global representation alignment together with local cross-attention to improve protein-ligand binding affinity prediction, outperforming existing methods on multiple benchmark datasets.

Abstract

Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross attention maps provide valuable insights into the underlying predictive mechanisms.

2601.14968 2026-06-18 cs.LG cs.AI version update

RSLCPP -- Deterministic Simulations Using ROS 2

Simon Sagmeister, Marcel Weinmann, Phillip Pitschi, Markus Lienkamp

Affiliations: Technical University of Munich, Germany; School of Engineering & Design, Department of Mobility Systems Engineering, Institute of Automotive Technology; School of Engineering & Design, Department of Engineering Physics and Computation, Institute of Automatic Control

AI summary: To address non-reproducible simulation results caused by the asynchronous, multi-process design of ROS, proposes the RSLCPP library, which achieves reproducible cross-platform simulation through deterministic callback execution, typically without modifying existing node code.

Comments Accepted for publication at the 'IEEE Robotics and Automation Practice'

DOI: 10.1109/RAP.2026.3704080

Abstract

Simulation is crucial in real-world robotics, offering safe, scalable, and efficient environments for developing a variety of robotic applications. While the Robot Operating System (ROS) has been widely adopted as the backbone of these robotic applications in both academia and industry, its asynchronous, multi-process design complicates reproducibility, especially across varying hardware platforms. Deterministic callback execution cannot be guaranteed when computation times and communication delays vary. This lack of reproducibility complicates scientific benchmarking and continuous integration, where consistent results are essential. To address this, we present a methodology to create deterministic simulations using ROS 2 nodes. Our ROS Simulation Library for C++ (RSLCPP) implements this approach, enabling existing nodes to be combined into a simulation routine that yields reproducible results, usually without requiring any source code changes. We demonstrate that our approach produces identical results across various CPUs and architectures when testing both a synthetic benchmark and a real-world robotics system. RSLCPP is open-sourced at https://github.com/TUMFTM/rslcpp.
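
The general idea of deterministic callback execution can be sketched with a single-threaded queue ordered by simulated time, so that the execution order does not depend on wall-clock timing or on the order in which callbacks were registered. This is a language-agnostic illustration in Python, not the RSLCPP API (which targets C++ ROS 2 nodes); the class name, the tie-breaking by node name, and the example node names are assumptions.

```python
import heapq

class DeterministicExecutor:
    """Runs callbacks strictly in (simulated time, node name) order."""

    def __init__(self):
        self._queue = []
        self._seq = 0      # unique counter so callbacks themselves are never compared
        self.now = 0.0

    def schedule(self, sim_time, name, callback):
        heapq.heappush(self._queue, (sim_time, name, self._seq, callback))
        self._seq += 1

    def run(self):
        """Execute all pending callbacks and return the order in which they ran."""
        trace = []
        while self._queue:
            sim_time, name, _, callback = heapq.heappop(self._queue)
            self.now = sim_time          # advance simulated time, never wall-clock time
            trace.append((sim_time, name))
            callback(self)
        return trace

def run_once(registrations):
    executor = DeterministicExecutor()
    for sim_time, name in registrations:
        executor.schedule(sim_time, name, lambda ex: None)   # no-op callbacks
    return executor.run()

# Same callbacks, registered in two different orders.
trace_a = run_once([(0.02, "planner"), (0.01, "sensor"), (0.01, "controller")])
trace_b = run_once([(0.01, "controller"), (0.02, "planner"), (0.01, "sensor")])
```

Both runs produce the same trace because ordering is derived only from simulated timestamps and a fixed tie-break, which is the property that makes results repeatable across machines with different computation times.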

2501.02874 2026-06-18 cs.RO version update

Bench-Push: Benchmarking Pushing-based Navigation and Manipulation Tasks for Mobile Robots

Ninghan Zhong, Steven Caro, Megnath Ramesh, Rishi Bhatnagar, Avraiem Iskandar, Stephen L. Smith

Affiliations: Institute for Robotics and Intelligent Machines, Georgia Institute of Technology; Department of Electrical and Computer Engineering, University of Waterloo; Department of Mechanical Engineering, University of Alberta

AI summary: Proposes Bench-Push, the first unified benchmark for pushing-based mobile robot navigation and manipulation, comprising multiple simulated environments, new evaluation metrics, and baseline implementations for evaluating robot pushing tasks in environments with movable obstacles.

Comments Published in CRV 2026

Abstract

Mobile robots are increasingly deployed in cluttered environments with movable objects, posing challenges for traditional methods that prohibit interaction. In such settings, the mobile robot must go beyond traditional obstacle avoidance, leveraging pushing or nudging strategies to accomplish its goals. While research in pushing-based robotics is growing, evaluations rely on ad hoc setups, limiting reproducibility and cross-comparison. To address this, we present Bench-Push, the first unified benchmark for pushing-based mobile robot navigation and manipulation tasks. Bench-Push includes multiple components: 1) a comprehensive range of simulated environments that capture the fundamental challenges in pushing-based tasks, including navigating a maze with movable obstacles, autonomous ship navigation in ice-covered waters, box delivery, and area clearing, each with varying levels of complexity; 2) novel evaluation metrics to capture efficiency, interaction effort, and partial task completion; and 3) demonstrations using Bench-Push to evaluate example implementations of established baselines across environments. Bench-Push is open-sourced as a Python library with a modular design. The code, documentation, and trained models can be found at https://github.com/IvanIZ/BenchNPIN.

Narrative Theory-Driven LLM Methods for Automatic Story Generation and Understanding: A Survey

On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials

Learning Patient-Specific Disease Dynamics with Latent Flow Matching for Longitudinal Imaging Generation

Structured Spectral Graph Representation Learning for Multi-label Abnormality Analysis from 3D CT Scans

PRISM: A 3D Probabilistic Neural Representation for Interpretable Shape Modeling

Do Neural Networks Lose Plasticity in a Gradually Changing World?

MUFASA: A Multi-Layer Framework for Slot Attention

Towards Understanding What State Space Models Learn About Code

Quantile Transfer for Reliable Operating Point Selection in Visual Place Recognition

Tilt-Ropter: A Fully Actuated Hybrid Aerial-Terrestrial Vehicle with Tilt Rotors and Passive Wheels

SuperCarver: Texture-Consistent 3D Geometry Super-Resolution for High-Fidelity Surface Detail Generation

Posterior Continuation with Noise-Conditioned Frequency Exposure for Diffusion Inverse Problems

LLM Compression by Block Removal with Constrained Binary Optimization

Semantic Router: On the Feasibility of Hijacking MLLMs via a Single Adversarial Perturbation

HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning

STORM: Slot-based Task-aware Object-centric Representation for robotic Manipulation

TINNs: Time-Induced Neural Networks for Solving Time-Dependent PDEs

Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers

Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Retelling

KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction

InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement

FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving

RSLCPP -- Deterministic Simulations Using ROS 2

Steering Flexible Linear Objects in Planar Environments by Two Robot Hands Using Euler's Elastica Solutions

Robust and Efficient MuJoCo-based Model Predictive Control via Web of Affine Spaces Derivatives

Efficient Zeroth-Order Federated Finetuning of Language Models on Resource-Constrained Devices

Odyssey: An Automotive Lidar-Inertial Odometry Dataset with GNSS-denied situations

Beyond the Linear Separability Ceiling: Aligning Representations in VLMs

Bench-Push: Benchmarking Pushing-based Navigation and Manipulation Tasks for Mobile Robots