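arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.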

2606.19864 2026-06-19 cs.CL New submission

The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI

Serge Sharoff

Affiliations: Centre on Participatory and Deliberative Democracy

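AI summary: Explores how large language models, viewed through the lens of Systemic-Functional Linguistics, can scale up democratic deliberation, enhance inclusivity, and empower marginalised groups, while cautioning against overpromising and against underestimating risks.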

Comments Published in Handbook of Democracy in the Era of Artificial Intelligence, edited by Evangelos Pournaras, Srijoni Majumdar, Carina Ines Hausladen, and Dirk Helbing. 2026

Abstract

The increasing prominence of Large Language Models (LLMs) in public discourse presents both opportunities and challenges for democratic deliberation. While red teaming strategies help mitigate specific risks, broader concerns persist regarding linguistic constraints, biases, and the sycophantic tendencies of LLMs. This chapter explores how LLMs can be used to significantly scale up and democratise deliberation, particularly in fostering inclusivity and empowering traditionally marginalised groups. Drawing on concepts from Systemic-Functional Linguistics, the chapter examines how variations across language users (for example, with respect to socio-demographic groups) and across language use (for example, with respect to communicative functions) shape participation in AI-supported deliberation. The chapter presents AI-driven deliberation studies and assesses their potential to scaffold argumentation, enhance access, and reduce the influence of exclusionary linguistic norms and biases which are embedded in prestigious registers. At the same time, the chapter cautions against both overclaiming, which leads to unrealistic expectations, and underclaiming, which risks missed opportunities for AI-assisted engagement. The chapter concludes by identifying future research directions to maximise the democratic potential of AI-assisted participation while embedding ethical safeguards to counteract the reproduction of linguistic inequalities.

2606.19861 2026-06-19 cs.NE New submission

Weight Adaptation for Improving Parallel Performance of Adaptive Stochastic Natural Gradient

Yutaro Yamada, Kento Uchida, Shinichi Shirakawa

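AI summary: Proposes WA-ASNG, which adaptively updates weight parameters by gradient ascent to maximize the natural-gradient signal; it outperforms PBIL and ASNG on binary optimization problems and handles strong noise effectively.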

Comments Accepted at EvoCOP 2026 (Part of EvoStar 2026)

DOI: 10.1007/978-3-032-20537-7_10

Abstract

Probabilistic model-based evolutionary algorithms are promising for black-box optimization. Specifically, the adaptive stochastic natural gradient (ASNG) adaptively updates its learning rate, a typical hyperparameter in probabilistic model-based evolutionary algorithms, thereby realizing efficient and robust optimization. Although weight parameters are common hyperparameters, with the increasing demand for parallel evaluation of time-consuming tasks, it remains unclear how to set suitable weights for larger population sizes. In this paper, we propose Weight Adaptation ASNG (WA-ASNG), which incorporates a weight adaptation mechanism into ASNG. We calculate the estimated signal of the update direction from the accumulations of the natural gradient. Then, to maximize the signal, WA-ASNG adaptively updates its weight parameters by a gradient ascent over the optimization. While the learning rate adaptation plays a role in satisfying a sufficient condition for monotonic improvement of the expected objective value, the mechanism of weight adaptation is intended to maximize this improvement. The experimental results demonstrate that WA-ASNG outperforms PBIL and ASNG across various settings with population sizes ranging from 25 to 100 for binary optimization problems. Furthermore, WA-ASNG can perform efficiently in the presence of strong noise. Our code is available at https://github.com/shiralab/WA-ASNG.
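
To make the role of the weight parameters concrete, here is a minimal sketch of a ranking-weighted natural-gradient step on a Bernoulli model, the kind of update that PBIL-style methods and ASNG build on. It is not WA-ASNG itself: the weights and learning rate are fixed by hand, whereas the paper adapts both. The function names, the OneMax objective, and the clipping bounds are assumptions made for illustration.

```python
import random

def sample_population(theta, pop_size, rng):
    """Draw pop_size binary vectors from independent Bernoulli(theta[j])."""
    return [[1 if rng.random() < p else 0 for p in theta] for _ in range(pop_size)]

def natural_gradient_step(theta, population, fitness, weights, lr):
    """One weighted update; weights[r] is applied to the sample of rank r (best first)."""
    ranked = sorted(zip(fitness, population), key=lambda fx: fx[0], reverse=True)
    dim = len(theta)
    grad = [0.0] * dim
    for w, (_, x) in zip(weights, ranked):
        for j in range(dim):
            grad[j] += w * (x[j] - theta[j])
    lo, hi = 1.0 / dim, 1.0 - 1.0 / dim  # keep probabilities away from 0 and 1
    return [min(hi, max(lo, t + lr * g)) for t, g in zip(theta, grad)]

def onemax(x):
    """Toy objective (an assumption): number of ones, to be maximized."""
    return sum(x)

rng = random.Random(0)
theta = [0.5] * 10
weights = [0.25] * 4 + [-0.25] * 4  # fixed hand-set weights for a population of 8
for _ in range(100):
    pop = sample_population(theta, len(weights), rng)
    theta = natural_gradient_step(theta, pop, [onemax(x) for x in pop], weights, lr=0.1)
```

With larger populations the choice of `weights` determines how much each rank contributes to the update, which is the quantity the abstract says WA-ASNG tunes by gradient ascent.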

2606.19857 2026-06-19 cs.CL cs.AI New submission

Large Language Models Do Not Always Need Readable Language

Jiayi Zhu, Haoxuan Peng, Junxi Wang, Liang Ke, Chen Zhang, Linfeng Zhang

Affiliations: Shanghai Jiao Tong University; The University of Sydney; Hefei University of Technology; Xi’an Jiaotong University; Nanjing University

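AI summary: Proposes BabelTele, a representation that encodes semantics as compact, non-standard text, sacrificing human readability while remaining recoverable by LLMs; experiments show text can be compressed to 27.9% of its original length while retaining 99.5% semantic fidelity, reducing context overhead.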

Comments 23 pages, 10 figures. Preprint

Abstract

Large language models (LLMs) are commonly prompted and interfaced with human-readable natural language, even when the intended reader is another model. This paper investigates whether semantic information can be encoded in compact, non-standard textual forms that sacrifice human readability while remaining recoverable by LLMs. We refer to this class of model-centric textual representations as BabelTele, approached here not as a fixed protocol but as an empirical probe into LLMs' capacity to generate and interpret such representations. Through readability diagnostics, model likelihood measures, human questionnaires, and downstream task evaluations, we find that BabelTele can substantially depart from ordinary natural language while preserving core semantics for instruction-tuned LLMs. As a task-agnostic representational paradigm, BabelTele demonstrates high information density, maintaining 99.5% semantic fidelity even when the text volume is condensed to 27.9% of its original length. We further evaluate its semantic robustness in cross-model transfer, agent memory, and multi-agent communication. Results suggest that BabelTele can reduce context overhead while generally maintaining reliable downstream performance, although its effectiveness depends on the compressor-reader pair and task setting. These findings indicate that human readability, natural-language typicality, and model-side semantic recoverability can be partially decoupled, opening a path toward model-native representations in future exploration of LLM systems.

2606.19852 2026-06-19 cs.CL cs.LG New submission

Prompt, Plan, Extract: Zero-Shot Agentic LLMs Workflows for Lung Pathology Extraction from Clinical Narratives

Aman Pathak, Cheng Peng, Mengxian Lyu, Ziyi Chen, Reema Solan, Sankalp Talankar, Yasir Khan, Hiren Mehta, Aokun Chen, Yi Guo, Yonghui Wu

Affiliations: Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida; Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, College of Medicine, University of Florida; College of Nursing, Florida State University

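AI summary: Proposes a zero-shot agentic workflow that uses open-source large language models to extract 13 CAP fields from lung resection pathology reports, reaching 0.893 Micro-F1 without training, close to supervised methods.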

Comments 7 pages, 2 figures, 3 tables. Affiliations: (1) Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; (2) Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA; (3) College of Nursing, Florida State University, Tallahassee, FL, USA

Abstract

Information extraction from pathology reports is essential for cancer staging and tumor registry population. Yet key data remains embedded in narrative reports, making manual extraction labor-intensive and error-prone. Traditional supervised Natural Language Processing pipelines address this through fully supervised Named Entity Recognition and Relation Extraction, but require expensive manual annotation and suffer cascading failures when upstream entities are missed. In this study, we developed a zero-shot, agentic workflow, and evaluated five open-source generative Large Language Models (LLMs) to populate 13 College of American Pathologists synoptic fields from lung resection pathology reports. We compared them against a state-of-the-art supervised GatorTron NER-RE baseline using a novel, registry-aligned evaluation framework. The baseline achieved a Micro-F1 of 0.960, while the best zero-shot model (GPT-OSS-20B) achieved a Micro-F1 of 0.893 (recall: 0.949), accurately extracting complex relations like Pathologic Stage without task-specific training. These results suggest that open-source, zero-shot agentic LLMs are a low-cost solution for extracting lung pathology information.
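
Since the headline numbers are Micro-F1 scores, a short sketch of how micro-averaging pools counts across fields may help. The field names and counts below are made up for illustration and are not taken from the paper; the paper's registry-aligned matching rules are not described in the abstract and are not modelled here.

```python
def micro_f1(counts):
    """counts maps field name -> (true_pos, false_pos, false_neg).

    Micro-averaging pools the counts over all fields before computing
    precision and recall, so frequent fields dominate the score.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical per-field counts, for illustration only.
example = {"histologic_type": (8, 2, 0), "pathologic_stage": (2, 0, 2)}
precision, recall, f1 = micro_f1(example)
```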

2606.19850 2026-06-19 cs.LG cs.AI New submission

Neural Additive and Basis Models with Feature Selection and Interactions

Yasutoshi Kishimoto, Kota Yamanishi, Takuya Matsuda, Shinichi Shirakawa

Affiliations: Yokohama National University

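AI summary: Introduces a feature selection mechanism into neural additive models and neural basis models; a feature selection layer reduces computational overhead and enables learning feature interactions on high-dimensional data, with performance better than or comparable to existing GAM methods.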

Comments Accepted at PAKDD 2024. Code is available at https://github.com/shiralab/NAM-FS

DOI: 10.1007/978-981-97-2259-4_1

Abstract

Deep neural networks (DNNs) exhibit attractive performance in various fields but often suffer from low interpretability. The neural additive model (NAM) and its variant called the neural basis model (NBM) use neural networks (NNs) as nonlinear shape functions in generalized additive models (GAMs). Both models are highly interpretable and exhibit good performance and flexibility for NN training. NAM and NBM can provide and visualize the contribution of each feature to the prediction owing to GAM-based architectures. However, when using two-input NNs to consider feature interactions or when applying them to high-dimensional datasets, training NAM and NBM becomes intractable due to the increase in the computational resources required. This paper proposes incorporating the feature selection mechanism into NAM and NBM to resolve computational bottlenecks. We introduce the feature selection layer in both models and update the selection weights during training. Our method is simple and can reduce computational costs and model sizes compared to vanilla NAM and NBM. In addition, it enables us to use two-input NNs even in high-dimensional datasets and capture feature interactions. We demonstrate that the proposed models are computationally efficient compared to vanilla NAM and NBM, and they exhibit better or comparable performance with state-of-the-art GAMs.
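
The idea of putting a selection layer in front of a small number of shape functions can be sketched as follows. This is only one plausible reading: the abstract does not say how the selection weights are parameterized, so the softmax relaxation, the function names, and the toy shape functions standing in for small neural networks are all assumptions.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def selected_input(x, selection_logits):
    """Soft selection: a convex combination of the input features."""
    return sum(p * xi for p, xi in zip(softmax(selection_logits), x))

def additive_predict(x, all_logits, shape_fns, bias=0.0):
    """Additive model with one shape function per selection slot.

    With K slots and D features, only K shape functions are evaluated
    instead of D, which is where the saving would come from when K < D.
    """
    contributions = [f(selected_input(x, logits))
                     for logits, f in zip(all_logits, shape_fns)]
    return bias + sum(contributions), contributions

# Toy setup: 3 features, 2 slots; tanh and a square stand in for small NNs.
x = [2.0, 7.0, -1.0]
all_logits = [[50.0, 0.0, 0.0], [0.0, 0.0, 50.0]]  # slot 0 -> feature 0, slot 1 -> feature 2
shape_fns = [lambda z: z * z, math.tanh]
prediction, contributions = additive_predict(x, all_logits, shape_fns)
```

Because each slot's contribution is returned separately, the per-feature interpretability of the additive form is kept: once a slot's selection weights concentrate on one feature, its contribution can be plotted against that feature.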

2606.19849 2026-06-19 cs.CV New submission

ViCoStream: Streaming VideoLLMs Can Run Beyond 100 FPS with Stage-Wise Coordinated Inference

Yang Tan, Junlong Tong, Linan Yue, Hao Wu, Pengfei Fang, Xiaoyu Shen

Affiliations: Southeast University; Eastern Institute of Technology, Ningbo; Shanghai Jiao Tong University

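AI summary: Proposes the ViCoStream framework, a stage-wise coordinated pipeline (chunk-wise execution, CUDA-stream overlap, visual token control, bounded visual attention, query-side retrieval) for high-throughput, low-latency streaming VideoLLM inference, reaching 134 FPS video throughput and under 50 ms time to first token on a single A100, with accuracy close to full-history baselines.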

Comments 19 pages, 7 figures, 13 tables

Abstract

Streaming VideoLLMs must continuously process incoming video while maintaining low query latency, making both video-ingestion throughput and query-time responsiveness critical for real-time deployment. Existing methods largely focus on accelerating individual modules, such as visual encoding, token pruning, or KV-cache compression, but provide limited insight into whether the resulting system can sustain real-time streaming performance. We formulate streaming VideoLLM inference as a coordinated pipeline spanning visual preprocessing, visual encoding, token dropping, and LLM prefilling/decoding. Building on this formulation, we propose ViCoStream (Video Coordinated Streaming), a stage-wise coordinated streaming framework that combines chunk-wise execution, CUDA-stream overlap, visual token control, bounded visual attention, and query-side retrieval to bound per-chunk computation and memory costs. We further provide a systematic study of bottleneck migration, revealing how chunk size, token retention, attention locality, and retrieval scope shape the throughput-accuracy trade-off. Experiments with Qwen2.5-VL-3B/7B-Instruct across multiple streaming benchmarks show that ViCoStream achieves 134 FPS video throughput and less than 50 ms TTFT on a single A100 GPU while maintaining accuracy close to full-history baselines.
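
The point of bounding per-chunk computation and memory can be illustrated with a toy ingestion loop. Everything here is a stand-in: the tuple "tokens", the rule of keeping the first few tokens of each chunk, and the parameter names are assumptions, and there is no real encoder, CUDA stream, or attention involved.

```python
from collections import deque

def stream_chunks(frames, chunk_size):
    """Yield consecutive chunks of at most chunk_size frames."""
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]

def ingest(frames, chunk_size, keep_per_chunk, window_tokens):
    """Process frames chunk by chunk with a bounded visual-token cache."""
    cache = deque(maxlen=window_tokens)  # oldest tokens fall out automatically
    for chunk in stream_chunks(frames, chunk_size):
        tokens = [("tok", f) for f in chunk]   # stand-in for visual encoding
        cache.extend(tokens[:keep_per_chunk])  # stand-in for token dropping
    return list(cache)

retained = ingest(list(range(10)), chunk_size=4, keep_per_chunk=2, window_tokens=3)
```

However long the stream runs, the cache never holds more than `window_tokens` entries, so the cost of answering a query does not grow with video length. For scale, the reported 134 FPS corresponds to a budget of roughly 7.5 ms per frame across all stages.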

2606.19847 2026-06-19 cs.CL New submission

AtomMem: Building Simple and Effective Memory System for LLM Agents via Atomic Facts

Yanyu Yao, Shangze Li, Zhi Zheng, Hui Zheng, Qi Liu, Tong Xu, Enhong Chen

Affiliations: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; Anhui University

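AI summary: To address the coarse-grained storage and unstable updates of existing memory systems, proposes AtomMem, which uses a Fact Executor to extract high-value atomic facts as efficient memory representations and organizes them into hierarchical event structures and temporal profiles, achieving value-dense storage and stable evolution, with state-of-the-art performance on the LoCoMo benchmark.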

Comments 19 pages, 10 figures, 5 tables

Abstract

Large language models (LLMs) demonstrate strong reasoning and generation abilities, but their fixed context windows limit long-term information accumulation and reuse across multi-session interactions. Existing memory-augmented systems often construct memory in a coarse and unstable manner, relying on inefficient memory representations or unstable unconstrained updates. To address these challenges, we propose AtomMem, a long-term memory system designed for value-dense storage and stable memory evolution. AtomMem introduces a Fact Executor, which selectively extracts high-value atomic facts from long-form interactions to serve as highly efficient memory representations. Subsequently, AtomMem organizes these facts into hierarchical event structures and temporal profiles, capturing coherent episodic contexts and tracking dynamically evolving user attributes over time. During retrieval, the system activates an associative memory graph to connect fragmented memories. Experiments on the LoCoMo benchmark confirm that AtomMem achieves state-of-the-art performance across various reasoning tasks, offering a scalable and economically viable solution for deploying intelligent personalized agents.
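
A data-structure sketch may make "atomic facts" and "temporal profiles" more tangible. The class and field names are hypothetical, the facts are entered by hand rather than extracted by a model, and keyword overlap stands in for the associative memory graph that the paper uses at retrieval time.

```python
from dataclasses import dataclass, field

@dataclass
class AtomicFact:
    text: str
    session: int
    attribute: str = ""  # e.g. "city"; empty when the fact is not a user attribute
    value: str = ""

@dataclass
class Memory:
    facts: list = field(default_factory=list)
    profile: dict = field(default_factory=dict)  # attribute -> [(session, value), ...]

    def add(self, fact):
        self.facts.append(fact)
        if fact.attribute:
            self.profile.setdefault(fact.attribute, []).append((fact.session, fact.value))

    def current(self, attribute):
        """Latest known value of an attribute; earlier values stay in the history."""
        history = self.profile.get(attribute, [])
        return max(history)[1] if history else None

    def retrieve(self, query, k=3):
        """Rank facts by word overlap with the query (a simple stand-in)."""
        q = set(query.lower().split())
        scored = [(len(q & set(f.text.lower().split())), f) for f in self.facts]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [f for _, f in scored[:k]]

memory = Memory()
memory.add(AtomicFact("user lives in Paris", session=1, attribute="city", value="Paris"))
memory.add(AtomicFact("user moved to Berlin", session=4, attribute="city", value="Berlin"))
memory.add(AtomicFact("user adopted a cat", session=2))
```

Keeping the per-attribute history as an append-only list means a later session updates the current value without overwriting what was true earlier.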

2606.19838 2026-06-19 cs.CV New submission

OTCHA: Optimal Transport-driven Confidence-aware Latent Hub Alignment for Multi-View Medical Image Classification

Jiwoong Yang, Haejun Chung, Ikbeom Jang

Affiliations: Hanyang University; Hankuk University of Foreign Studies

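AI summary: Proposes the OTCHA module, which uses optimal transport to align multi-view patch tokens with shared latent hub tokens and combines confidence gating with partial matching to discard irrelevant features, improving the robustness of multi-view medical image classification.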

Comments Accepted at MICCAI 2026

Abstract

Multi-view imaging, such as mammography and chest radiography, is a standard component of clinical practice. However, medical images are often unregistered and contain view-specific artifacts or irrelevant background cues that can obscure diagnostically relevant findings. Many existing methods directly fuse per-view representations, allowing such irrelevant content to contaminate the fused embedding and reducing robustness under varying view configurations. We propose OTCHA, a confidence-aware latent hub token alignment module based on optimal transport (OT) that refines patch tokens before fusion for multi-view classification. OTCHA introduces a set of learnable latent hub tokens shared across views. For each view, we compute an OT plan between patch tokens and hub tokens that jointly considers feature similarity and geometry, and augment the OT formulation with token-conditional dustbins to enable partial matching and discard irrelevant tokens. The resulting transport plan provides token-wise matching confidence, which gates hub-mediated message passing and weights a novel optimal-transport-based representation alignment loss to stabilize refinement. Experiments on three multi-view medical image datasets demonstrate consistent improvements over competing baselines across diverse anatomies and view configurations. Our code is available at https://github.com/labhai/OTCHA.
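
The combination of an OT plan with a dustbin for partial matching can be sketched with plain Sinkhorn iterations. This is a generic entropic-OT illustration, not the OTCHA formulation: the cost values, the per-patch dustbin costs, the fixed dustbin mass, and the confidence definition are all assumptions, and the geometric term mentioned in the abstract is left out.

```python
import math

def sinkhorn(cost, row_mass, col_mass, eps=0.5, iters=200):
    """Entropic OT; returns a plan whose rows sum to row_mass."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def plan_with_dustbin(cost, dustbin_cost, dustbin_mass=0.25):
    """Append one dustbin column so that patches can be left unmatched."""
    n, m = len(cost), len(cost[0])
    extended = [row + [dustbin_cost[i]] for i, row in enumerate(cost)]
    row_mass = [1.0 / n] * n
    col_mass = [(1.0 - dustbin_mass) / m] * m + [dustbin_mass]
    return sinkhorn(extended, row_mass, col_mass)

def matching_confidence(plan):
    """Share of each patch's mass that goes to hubs rather than the dustbin."""
    return [sum(row[:-1]) / sum(row) for row in plan]

# Four patch tokens, two hub tokens; patches 2 and 3 are far from both hubs.
cost = [[0.1, 0.9], [0.9, 0.1], [1.0, 1.0], [1.0, 1.0]]
dustbin_cost = [0.8, 0.8, 0.2, 0.2]  # per-patch cost of being discarded
plan = plan_with_dustbin(cost, dustbin_cost)
confidence = matching_confidence(plan)
```

In this toy setup the two background-like patches send most of their mass to the dustbin, so their confidence is lower and they would be down-weighted in any gated message passing built on top of the plan.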

2606.19836 2026-06-19 cs.RO cs.CV New submission

World Engine: Towards the Era of Post-Training for Autonomous Driving

Tianyu Li, Li Chen, Caojun Wang, Haochen Liu, Kashyap Chitta, Zhenjie Yang, Yuhang Lu, Naisheng Ye, Yihang Qiu, Yufei Wang, Luoxi Zou, Jiaxin Peng, Jin Pan, Zhaoyu Su, Andrei Bursuc, Shengbo Eben Li, Andreas Geiger, Peng Su, Hongyang Li

Affiliations: The University of Hong Kong; Huawei; Shanghai Innovation Institute; Archon Robotics; KE:SAI; NVIDIA Research; NTU (Nanyang Technological University); Tsinghua University

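AI summary: Proposes World Engine, a generative framework that reconstructs high-fidelity interactive environments from real logs and extrapolates them into safety-critical variants, then uses reinforcement-based post-training to align policies with safety constraints, substantially reducing failures in rare safety-critical scenarios and improving autonomous driving safety.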

Comments Technical Report. Project Page: https://opendrivelab.com/WorldEngine/

Abstract

Autonomous vehicles must operate safely in the real world, where errors can have severe consequences. Although modern end-to-end driving policies excel in routine scenarios, their reliability is limited by the scarcity of safety-critical "long-tail" events in real driving datasets. These rare interactions define the practical safety boundary of the learned policy, yet they are difficult to collect at scale in the real world. Here we show that this fundamental limitation can be addressed by post-training pre-trained driving models on synthesized high-stakes interactions. We introduce World Engine, a generative framework that reconstructs high-fidelity interactive environments from real-world logs and systematically extrapolates them into realistic safety-critical variations. This paradigm enables reinforcement-based post-training to align policies with safety constraints, circumventing the physical risks inherent in real-world exploration. On a public benchmark built on nuPlan, World Engine substantially reduces failures in rare safety-critical scenarios and yields significantly larger gains than scaling pre-training data alone. Furthermore, when deployed on a production-scale autonomous driving system, the resulting policy reduces simulated collisions and demonstrates measurable improvements in on-road testing, showing that post-training on synthesized, safety-critical interactions offers a scalable and effective pathway to safer autonomous driving. The full codebase suite, including training, is released to the public.

2606.19835 2026-06-19 cs.CV New submission

Neural Events: Discrete Asynchronous Autoencoders for Event-Based Vision

Roberto Pellerito, Daniel Gehrig, Shintaro Shiba, Davide Scaramuzza

Affiliations: Robotics and Perception Group, University of Zurich; University of Pennsylvania; The University of Tokyo; Keio University

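AI summary: Proposes re-tokenizing event streams into a small number of highly informative "neural events", each a discrete learnable code for a local spatio-temporal context window; matches or surpasses existing methods on object detection and classification while reducing the event rate by a factor of 2.0.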

Abstract

Event cameras capture dynamic scenes with exceptional temporal fidelity by representing them as a continuous stream of microsecond-resolution events. Each individual event, however, only carries minimal semantic value, merely signaling a localized brightness change. To derive meaningful signals, downstream algorithms need to quickly integrate cues from a potentially massive torrent of low-information events. Current architectures, however, are easily overwhelmed, struggling to balance capturing fine-grained temporal dynamics and maintaining a manageable data throughput. This paper proposes a framework to re-tokenize event streams into a small set of highly informative neural events, each representing a local spatio-temporal context window with a discrete learnable code. Every time this code flips, a neural event is triggered, yielding a highly compressed data stream. We demonstrate that, across object detection and classification, networks trained on neural events are on par with or surpass the performance of state-of-the-art approaches while reducing the event rate by a factor of 2.0.
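
The triggering rule, emit an output only when a window's discrete code changes, is simple enough to sketch directly. The input format and function name are assumptions; in the paper the codes come from a learned autoencoder, whereas here they are given as plain integers.

```python
def neural_events(code_stream):
    """Emit (timestamp, window, code) whenever a window's code changes.

    code_stream is an iterable of (timestamp, window_id, code) tuples in
    time order. The first code seen for a window also counts as a change.
    """
    last_code = {}
    emitted = []
    for timestamp, window, code in code_stream:
        if last_code.get(window) != code:
            emitted.append((timestamp, window, code))
            last_code[window] = code
    return emitted

stream = [
    (0, "a", 3), (1, "a", 3), (2, "a", 5),
    (3, "b", 1), (4, "a", 5), (5, "b", 1), (6, "b", 2),
]
events = neural_events(stream)
```

Stretches in which a window's code stays constant produce no output at all, which is where the reduction in event rate would come from.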

2606.19831 2026-06-19 cs.CL cs.LG New submission

Leverage Is Not Reach: A Control-Window Law for Single-Neuron Steering in Language Models

Hongliang Liu

Affiliations: Palo Alto Networks

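AI summary: Proposes a budget-normalized control-window framework in which a coherence budget, defined as the ratio of the residual norm to the write norm, predicts when a single-neuron intervention yields coherent behavioral control; prediction accuracy is validated on 15 neurons.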

Abstract

Aligned language models gate behaviors such as refusal and language routing through sparse feed-forward neurons, yet no theory predicts when a single-neuron intervention controls a behavior coherently rather than collapsing the output. We develop a budget-normalized control-window framework for single-neuron steering. A dose along one write direction reduces to one control coordinate: the alignment between the residual stream and the write, driven along a universal saturation curve in units of a coherence budget set by the residual norm divided by the write norm. Coherent control exists when a behavior trigger lies below the collapse ceiling. The same coordinate governs benign mode switches and refusal; the ceiling follows from weights and one generic forward pass, while triggers are measured at rollout. On fifteen held-out neurons, the predicted ceiling has mean absolute error 0.14, about 0.07 in bulk layers, and the committed open-or-closed verdict holds on eleven, against a ten-of-fifteen majority baseline. Closed cases expose three failure modes rather than violations: collapse before trigger, too little depth to propagate, or a normalization that caps how far one neuron can push. The law explains why local gradient attribution anti-predicts control: true controllers write off the readout axis and carry a near-zero first-order gradient. A forward-only contrastive screen made precise by the window recovers controllers that attribution misses. On refusal, the hardest case, intervention success is typed, not scalar: coherent bypass and strict actionable reach separate, so a neuron can flip refusal in fluent, on-task text with no actionable content, and genuine actionable reach appears only for three of six audited Llama pivots and only at later rollout horizons. Single-neuron steering is therefore a budgeted, typed audit of controllability rather than a fixed-dose anecdote.
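
One way to see why a dose measured in units of the residual norm divided by the write norm gives a saturating alignment is plain vector geometry. Adding a dose alpha of write direction w to residual r, and writing s = alpha * |w| / |r| and c0 = cos(r, w), gives cos(r + alpha * w, w) = (c0 + s) / sqrt(1 + 2 * c0 * s + s^2), which tends to 1 as s grows. Whether this is the paper's "universal saturation curve" is an assumption; the abstract does not state the formula. The function names and the open-window check below are likewise illustrative.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def coherence_budget(residual, write):
    """Residual norm divided by write norm, as described in the abstract."""
    return norm(residual) / norm(write)

def alignment_after_dose(residual, write, dose):
    """Cosine between the steered residual and the write direction."""
    steered = [r + dose * w for r, w in zip(residual, write)]
    return dot(steered, write) / (norm(steered) * norm(write))

def alignment_closed_form(c0, s):
    """Same quantity from the initial cosine c0 and the dose s in budget units."""
    return (c0 + s) / math.sqrt(1.0 + 2.0 * c0 * s + s * s)

def window_is_open(trigger, ceiling):
    """Coherent control exists when the trigger lies below the collapse ceiling."""
    return trigger < ceiling

residual, write = [3.0, 4.0, 0.0], [0.0, 0.0, 2.0]
budget = coherence_budget(residual, write)               # 5 / 2 = 2.5
direct = alignment_after_dose(residual, write, budget)   # one budget unit of dose
closed = alignment_closed_form(0.0, 1.0)                 # c0 = 0 since r is orthogonal to w
```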

2606.19830 2026-06-19 cs.SE cs.CL New submission

JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines

Jianwen Sun, Chuanhao Li, Zizhen Li, Yukang Feng, Fanrui Zhang, Yifei Huang, Yu Dai, Kaipeng Zhang

Affiliations: Nankai University; Shanghai Innovation Institute; Shanghai AI Laboratory

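AI summary: Presents JamSet and JamBench, the first project-level code framework dataset and benchmark built on a professional game engine; a deterministic verification pipeline distills 8,133 verified projects from 240,000 repositories, and an evaluation of 9 frontier models finds that capability drops sharply as project scale grows.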

Abstract

Current AI-driven game development has made substantial progress in asset generation, gameplay design, and web-based game coding, yet project-level code engineering on professional game engines remains largely unexplored due to the absence of large-scale datasets and deterministic evaluation methods. We present JamSet and JamBench, the first project-level game code framework dataset and benchmark built on a professional game engine. Our key insight is that Game Jam competitions, community events where developers build complete games under tight time constraints, yield thousands of open-source projects suitable for this purpose. Building on the Godot engine's text-based format and headless execution mode, we design a deterministic verification pipeline from file integrity to runtime behavior collection, distilling 8,133 verified projects from over 240,000 repositories. Of these, 300 manually verified projects form JamBench; the rest constitute JamSet. JamBench defines theme-driven generation and code completion tasks, evaluated through a pipeline combining compilation pass rates, Structural Completeness Score (SCS), and Behavioral Alignment Score (BAS). Evaluation of 9 frontier models reveals a capability cliff as project scale increases, with runtime pass rates dropping from 80.4% on small projects to 5.7% on large ones (Task2a). Code Agents improve compilation rates yet yield no gains in runtime behavioral quality, indicating that the bottleneck lies in architectural design rather than syntactic correctness. Experiments validate JamSet as effective training data. All data and code are publicly available.
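
A staged, deterministic verification pipeline of the kind described can be sketched as an ordered list of checks that stops at the first failure. Only the earliest, file-level stages are shown, and they are assumptions about what such a check might look like rather than the paper's actual pipeline; the runtime stages that rely on Godot's headless mode are omitted because they need the engine binary.

```python
import tempfile
from pathlib import Path

def has_project_file(project_dir):
    """File-integrity stage: a Godot project keeps its settings in project.godot."""
    return (Path(project_dir) / "project.godot").is_file()

def has_scripts(project_dir):
    """Content stage: at least one GDScript file somewhere in the tree."""
    return any(Path(project_dir).rglob("*.gd"))

STAGES = [("project_file", has_project_file), ("scripts", has_scripts)]

def verify(project_dir, stages=STAGES):
    """Run the stages in order; return the first failing stage name, or None."""
    for name, check in stages:
        if not check(project_dir):
            return name
    return None

def pass_rate(results):
    """Fraction of projects that cleared every stage."""
    return sum(r is None for r in results) / len(results) if results else 0.0

# Build a throwaway project to exercise the pipeline.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "project.godot").write_text("; placeholder settings\n")
    before = verify(root)  # fails at the "scripts" stage
    (root / "main.gd").write_text("extends Node\n")
    after = verify(root)   # passes both stages
```

Recording the name of the failing stage, rather than a bare pass or fail, makes it possible to report where projects drop out as the checks get stricter.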

2606.19828 2026-06-19 cs.CV New submission

3D-PLOT-LLM: Part-Level Object Tokens for 3D Large Language Models

Jintang Xue, Xinyu Wang, Yixing Wu, Jingwen Chen, C. -C. Jay Kuo

Affiliations: University of Southern California; Ohio State University

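AI summary: Proposes 3D-PLOT-LLM, which reorganizes the input token stream so that parts become directly addressable through the LLM's vocabulary, with no segmentation decoder or bounding boxes, and outperforms existing methods on part-level benchmarks.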

Abstract

3D multimodal large language models (3D MLLMs) describe a 3D object as a whole but cannot address, name, or reason about its parts. Prior part-aware attempts add segmentation decoders, heavier 3D encoders, or bounding-box grammars at substantial parameter cost. We take a fundamentally different path: we reorganize the input token stream so that parts become directly addressable through the LLM's own vocabulary. Our model, 3D-PLOT-LLM, partitions the frozen point encoder's patches into K locally coherent regions and inserts, before each region's patch tokens, a learnable per-region marker and a reserved vocabulary token <part_k>; a Marker-Space Refinement (MSR) module then conditions each marker on its region's spatial statistics and adjacency neighbors. The model thus cites parts in its output and follows prompts that refer to parts by token, a capability absent from prior object-level 3D MLLMs. To probe this interface, we construct PartVerse-QA, a vocabulary-level part-QA benchmark adapted from PartVerse mesh annotations (77K training pairs and 588 held-out queries on disjoint object splits), on which 3D-PLOT-LLM reaches caption-to-slots Jaccard 0.459 and Exact-match 13.78%, with a slot-to-caption GPT-4o judge of 44.68. On the 3DCoMPaT-GrIn part-aware grounded description benchmark, 3D-PLOT-LLM outperforms PointLLM, Kestrel, PARIS3D, and SegPoint on every text-output metric, and ShapeLLM on 3 of 4, with up to +3.03 GPT-4o judge over PointLLM. On Objaverse whole-object captioning, adding PartVerse-QA at Stage 2 yields +0.65 SBERT and +1.85 GPT-4o over PointLLM, and tops PointLLM-PiSA on 4 of 5 traditional metrics (SBERT, SimCSE, BLEU-1, METEOR) despite targeting a different (part-grounded) objective. All of this comes with under 1M new trainable parameters on a frozen point encoder, an order of magnitude below prior part-aware 3D MLLMs, and with no segmentation decoder or bounding-box head.
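
The token-stream reorganization can be sketched with strings standing in for embeddings. The region assignment is given by hand here (the paper derives K locally coherent regions from the point encoder's patches), the `<marker_k>` strings are placeholders for learnable vectors, and the Marker-Space Refinement module is not modelled. The Jaccard helper shows the set-overlap measure behind the caption-to-slots score, under the assumption that predictions and references are compared as sets of part tokens.

```python
def build_token_stream(patch_regions, num_regions):
    """Group patches by region and prefix each group with its two special tokens.

    patch_regions[i] is the region index of patch i.
    """
    stream = []
    for k in range(num_regions):
        members = [i for i, r in enumerate(patch_regions) if r == k]
        if not members:
            continue
        stream.append(f"<marker_{k}>")  # stand-in for the learnable per-region marker
        stream.append(f"<part_{k}>")    # reserved vocabulary token
        stream.extend(f"patch_{i}" for i in members)
    return stream

def jaccard(predicted, reference):
    """Intersection over union of two collections of part tokens."""
    p, r = set(predicted), set(reference)
    return len(p & r) / len(p | r) if p | r else 1.0

stream = build_token_stream([0, 1, 0, 1], num_regions=2)
score = jaccard(["<part_0>", "<part_1>"], ["<part_1>", "<part_2>"])
```

Because `<part_k>` is an ordinary vocabulary item, the same token can appear in a prompt ("describe <part_1>") or in the model's output, which is what makes parts addressable without a separate decoder.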

2606.19827 2026-06-19 cs.LG cs.AI New submission

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Daehwan Kim, Haejun Chung, Ikbeom Jang

Affiliations: Hanyang University; Hankuk University of Foreign Studies

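AI summary: Proposes Adaptive Binning, which dynamically refines discretization through a feature-wise coarse-to-fine curriculum and combines categorical reconstruction with ordinal supervision, improving self-supervised learning on medical tabular data.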

Comments Accepted to MICCAI 2026

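AI-translated abstract (from Chinese)

Medical tabular data are ubiquitous in clinical research, yet deep learning for tabular data remains underexplored because reliable labels often require costly expert adjudication, even though structured clinical variables are routinely available in tabular form. Self-supervised learning can exploit these unlabeled tables, and recent binning-based pretext tasks offer a promising inductive bias, but existing objectives fix a single global quantile discretization and apply feature-agnostic supervision. We propose Adaptive Binning, a training-adaptive discretization pretext task for tabular self-supervised learning that couples discretization to learning through a feature-wise coarse-to-fine curriculum. Motivated by the spectral bias of neural networks and the principles of curriculum learning, our method progressively refines each feature's discretization when a plateau is detected and selects representation-aware split points to jointly improve value-space concentration and representation-space coherence. A heterogeneity-aware objective unifies categorical reconstruction with ordinal supervision for numerical features. Experiments on public medical tabular datasets under unified evaluation protocols show consistent gains for both linear probing and fine-tuning, without dataset-specific discretization tuning. We further introduce a medical tabular self-supervised learning benchmark with standardized protocols to support reproducible progress in this underexplored area. Our code is available at https://github.com/labhai/Adaptive-Binning.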

English abstract

Medical tabular data are ubiquitous in clinical research, but deep learning for tables remains underexplored because reliable labels often require costly expert adjudication, even though structured clinical variables are routinely available in tabular form. Self-supervised learning can leverage these unlabeled tables, and recent binning-based pretexts offer a promising inductive bias, but existing objectives fix a single global quantile discretization and apply feature-agnostic supervision. We propose Adaptive Binning, a training-adaptive discretization pretext for tabular SSL that couples discretization to learning through a feature-wise coarse-to-fine curriculum. Motivated by the spectral bias of neural networks and the principles of curriculum learning, our method progressively refines discretization per feature upon plateau detection and selects representation-aware splits to jointly improve value-space concentration and representation-space coherence. A heterogeneity-aware objective unifies categorical reconstruction with ordinal supervision for numerical features, and experiments on public medical tabular datasets under unified evaluation protocols show consistent gains for linear probing and fine-tuning without dataset-specific discretization tuning. We further introduce a medical tabular SSL benchmark with standardized protocols to support reproducible progress in this underexplored domain. Our code is available at https://github.com/labhai/Adaptive-Binning.
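
The coarse-to-fine idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the plateau rule (improvement below `tol` over a fixed window), and the bin-doubling schedule are all hypothetical, and the representation-aware split selection described in the abstract is not modeled.

```python
def quantile_edges(values, n_bins):
    """Interior bin edges at equally spaced quantiles (nearest-rank style)."""
    ordered = sorted(values)
    edges = []
    for i in range(1, n_bins):
        idx = min(len(ordered) - 1, (i * len(ordered)) // n_bins)
        edges.append(ordered[idx])
    return sorted(set(edges))  # drop duplicate edges caused by ties


def bin_index(value, edges):
    """Index of the bin containing value; bins are [edge_i, edge_{i+1})."""
    return sum(1 for e in edges if value >= e)


def plateaued(losses, window=3, tol=1e-3):
    """True when the loss improved by less than tol over the last window steps."""
    if len(losses) <= window:
        return False
    return losses[-window - 1] - losses[-1] < tol


def refine_schedule(loss_history, start_bins=2, max_bins=16):
    """Double one feature's bin count each time its loss plateaus."""
    n_bins, seen, schedule = start_bins, [], []
    for loss in loss_history:
        seen.append(loss)
        if plateaued(seen) and n_bins < max_bins:
            n_bins *= 2
            seen = []  # restart plateau detection at the finer resolution
        schedule.append(n_bins)
    return schedule


print(quantile_edges(list(range(8)), 4))
print(refine_schedule([1.0, 0.5, 0.5, 0.5, 0.5, 0.4]))
```

Running one schedule per feature is what makes the curriculum feature-wise: a feature whose pretext loss stalls early moves to finer bins sooner than one that is still improving.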

2606.19826 2026-06-19 cs.CR cs.MA New submission

Heterogeneous LLM Debate Under Adversarial Peers: Honest Gains, Replacement Costs, and Resilience

Prashanti Nilayam, Kiran Kumar Ramanna, Prashil Tumbade, Sankalp Nayak

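AI summary: Studies how honest and adversarial peers affect revision behavior in heterogeneous LLM debate; finds that an honest peer lowers the harmful-revision rate, an adversarial peer reverses that effect, and heterogeneity also serves as a defense when an adversary is already present.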

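AI-translated abstract (from Chinese)

Heterogeneous LLM debate is motivated by the idea that diverse peers can correct one another, but the same exchange carries adversarial influence as well as correction. We measure which effect dominates by tracking how a heterogeneous peer changes the revision behavior of honest agents: how often they change their answers, and whether each change is corrective or harmful. We compare matched panels (a homogeneous baseline, an honest-mixed panel, and an adversarial-mixed panel) and contaminated panels in which a malicious same-family peer is already present, across four model families and three reasoning benchmarks. An honest heterogeneous peer sharply lowers harmful revision, and an adversarial peer reverses that effect. For Llama-3.1-70B defenders on MATH-hard, the honest-slot harmful-revision rate falls from 89% in the homogeneous panel to 35% with an honest peer, and an adversarial peer brings it back to 90%. The conditional rate hides this damage for weak defenders, but the end-of-debate flip rate exposes it. The pattern keeps its sign across families and benchmarks, while its magnitude varies with the defender-benchmark regime. We also measure the effect when an adversarial same-family peer is already present: an honest heterogeneous peer lowers both the harmful-revision rate and the rate at which initially correct answers are lost. In the same Llama-3.1-70B setting, the added honest peer cuts the flip rate on initially correct items from 31% under a same-family adversary to 6%. Heterogeneity is therefore not only an attack surface but also, when an adversary is already present, a defense.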

English abstract

Heterogeneous LLM debate is motivated by the promise that diverse peers correct one another, but the same exchange that carries correction also carries adversarial influence. We measure which dominates by tracking how a heterogeneous peer changes the honest agents' revision behavior: how often they change their answer, and whether the change is corrective or harmful. We compare matched panels (homogeneous baseline, honest-mixed, and adversarial-mixed) and contaminated panels in which a malicious same-family peer is already present, spanning four model families and three reasoning benchmarks. An honest heterogeneous peer sharply lowers harmful revision, and an adversarial one reverses it. For Llama-3.1-70B defenders on MATH-hard, the honest-slot harmful-revision rate falls from 89% in the homogeneous panel to 35% with an honest peer, and an adversarial peer returns it to 90%. The conditional rate hides this damage on weak defenders, but the end-of-debate flip rate exposes it. The pattern keeps its sign across families and benchmarks while its magnitude varies with the defender-benchmark regime. We also measure the effects when an adversarial same-family peer is already present: an honest heterogeneous peer lowers both harmful revision and the rate at which initially-correct answers are lost. On the same Llama-3.1-70B setting, the added honest peer cuts the flip rate on initially-correct items from 31% under a same-family adversary to 6%. Heterogeneity is therefore not only an attack surface but, when an adversary is already present, also a defense.
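
The abstract distinguishes a conditional harmful-revision rate from an end-of-debate flip rate without giving formulas. The sketch below shows one way such quantities could be computed from per-item debate logs. The definitions are assumptions made for illustration: a revision is any change of answer, it counts as harmful when it moves from the gold answer to a wrong one, and the flip rate is the share of initially correct items that end incorrect. The paper's exact definitions may differ, and `revision_metrics` is a hypothetical name.

```python
def revision_metrics(records):
    """Compute revision statistics for one honest agent.

    records: iterable of (initial_answer, final_answer, gold_answer).
    """
    records = list(records)
    revisions = harmful = corrective = 0
    initially_correct = flipped = 0
    for initial, final, gold in records:
        was_correct = initial == gold
        is_correct = final == gold
        if was_correct:
            initially_correct += 1
            if not is_correct:
                flipped += 1
        if initial != final:
            revisions += 1
            if was_correct and not is_correct:
                harmful += 1      # assumed definition: correct -> incorrect
            elif not was_correct and is_correct:
                corrective += 1   # assumed definition: incorrect -> correct

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "revision_rate": ratio(revisions, len(records)),
        "harmful_given_revision": ratio(harmful, revisions),
        "corrective_given_revision": ratio(corrective, revisions),
        "flip_rate_initially_correct": ratio(flipped, initially_correct),
    }


logs = [("A", "B", "A"), ("A", "A", "A"), ("C", "A", "A"), ("C", "D", "A")]
print(revision_metrics(logs))
```

Under these assumed definitions the two rates use different denominators, which suggests why one can hide damage the other exposes: an agent that starts with few correct answers has little to lose through revision, so only the rate conditioned on initially correct items shows how much of it is lost.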

2606.19825 2026-06-19 cs.LG New submission

Enhancing Graph Neural Networks Using Proximity Graphs for Dust Source Emission Forecasting

Maryam Sanisales, Zahed Rahmati, Ali Darvishi Boloorani, Ali Vefghi

Affiliations: Amirkabir University of Technology; University of Tehran

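AI summary: Proposes using proximity graphs such as the Delaunay triangulation as input to graph neural networks, capturing the spatiotemporal dynamics of dust source emissions through message passing and significantly improving prediction accuracy over random graphs and LSTM models.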

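AI-translated abstract (from Chinese)

Accurate prediction of dust source emissions is essential for mitigating the serious environmental and health hazards posed by dust storms. Traditional forecasting methods often struggle to capture the complex spatiotemporal dynamics of these phenomena. In this paper, we show that proximity graphs enable graph neural networks (GNNs) to model the intricate spatial and temporal relationships between data points effectively. Specifically, we use proximity graphs, such as the Delaunay triangulation, the Gabriel graph, the k-nearest-neighbor graph, and the Yao graph, as input to GNNs (including GraphSAGE, graph convolutional networks, and graph attention networks) for message passing. Our approach highlights the effectiveness of integrating proximity graphs with GNNs for robust and accurate dust source forecasting. To underline the importance of the proximity graph representation, we compare our method against GNNs that perform message passing on random graphs. The results show that GNNs with proximity graphs significantly outperform GNNs with random graphs and also far outperform long short-term memory (LSTM) models in dust source emission forecasting.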

English abstract

Accurate prediction of dust source emissions is critical for mitigating the significant environmental and health hazards posed by dust storms. Traditional forecasting methods often struggle to capture the complex spatiotemporal dynamics of these phenomena. In this paper, we demonstrate that proximity graphs enable Graph Neural Networks (GNNs) to effectively model the intricate spatial and temporal relationships between data points. Specifically, we use proximity graphs--such as Delaunay triangulation, Gabriel graph, k-Nearest Neighbor graph, and Yao graph--as the input for GNNs (including GraphSAGE, Graph Convolutional Networks, and Graph Attention Networks) to perform message passing. Our approach highlights the effectiveness of integrating proximity graphs with GNNs for robust and accurate dust source forecasting. To emphasize the importance of proximity graph representations, we compare our method against GNNs using random graphs for message passing. The results show that GNNs with proximity graphs significantly outperform those with random graphs and are also far superior to a Long Short-Term Memory (LSTM) model in dust source emission forecasting.
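
Two of the proximity graphs named in the abstract are simple enough to build directly from 2D coordinates. The following pure-Python sketch uses the standard definitions: a k-nearest-neighbor edge links each point to its k closest points, and a Gabriel edge exists when no third point lies strictly inside the circle whose diameter is that edge. The function names and sample coordinates are illustrative assumptions. The resulting edge sets are the kind of structure a GNN would use for message passing, though the paper's own pipeline is not reproduced here.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def knn_graph(points, k):
    """Undirected k-nearest-neighbor edges as a set of (i, j) with i < j."""
    edges = set()
    for i, p in enumerate(points):
        # Ties in distance are broken by the smaller index.
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: (sq_dist(p, points[j]), j))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def gabriel_graph(points):
    """Edges (i, j) whose diameter circle contains no other point strictly inside."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        blocked = any(
            sq_dist(points[i], points[r]) + sq_dist(points[j], points[r]) < d_ij
            for r in range(len(points)) if r not in (i, j)
        )
        if not blocked:
            edges.add((i, j))
    return edges


sites = [(0, 0), (4, 0), (2, 1), (10, 10)]
print(sorted(knn_graph(sites, k=1)))
print(sorted(gabriel_graph(sites)))
```

Both constructions connect only spatially close sites, in contrast to the random-graph baseline mentioned in the abstract, where edges carry no geometric meaning.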

2606.19824 2026-06-19 cs.CV cs.AI New submission

CSWinUNETR: Segmentation of Thin Anatomical Structures in Medical Images

Junho Moon, Haejun Chung, Ikbeom Jang

Affiliations: Hanyang University; Hankuk University of Foreign Studies

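AI summary: Proposes CSWinUNETR, a general-purpose backbone that uses cross-shaped stripe self-attention, cyclic shifting, detail-enhanced multi-scale self-attention, and sparsity-controlled dynamic snake convolution to address low contrast, fragmentation, and class imbalance in thin-structure segmentation, outperforming existing methods on ophthalmic, neurovascular, and dermatological benchmarks.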

Comments Accepted at MICCAI 2026

Uncertainty-Aware Reward Modeling for Stable RLHF

Licheng Pan, Haocheng Yang, Haoxuan Li, Yichen Sun, Yunsheng Lu, Shijian Wang, Lei Shen, Yuan Lu, Zhixuan Chu, Hao Wang

Affiliations: Zhejiang University; Peking University; National University of Singapore

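AI summary: Proposes Uncertainty-Aware Reward Modeling (UARM), which calibrates uncertainty with quantile-based conformal prediction and reweights GRPO advantages using a heteroscedastic variance decomposition, mitigating reward hacking and improving alignment quality.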

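AI-translated abstract (from Chinese)

Reinforcement learning from human feedback (RLHF) aligns large language models by training a reward model on preference data and optimizing a policy to maximize the predicted reward. However, this pipeline faces two fundamental challenges: (1) reward models cannot signal when their predictions are unreliable, because they typically act as deterministic point estimators; and (2) modern group-based policy optimization can amplify unreliable reward signals, for example through GRPO's uniform treatment of rewards in its advantage computation. As the policy explores increasingly diverse responses, these two limitations create a critical vulnerability: unreliable reward estimates can be given disproportionate influence, leading to severe reward hacking. We propose Uncertainty-Aware Reward Modeling (UARM), which equips the reward model with calibrated uncertainty through quantile-based conformal prediction and reweights GRPO advantages through a heteroscedastic variance decomposition. Experiments on HelpSteer, UltraFeedback, and PKU-SafeRLHF show that, compared with standard GRPO and uncertainty-agnostic baselines, UARM substantially improves reward-model calibration, reduces reward hacking, and improves downstream alignment quality.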
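
The two ingredients named above can be illustrated with a small Python sketch. It is not UARM itself. The calibration step uses plain split-conformal prediction on absolute residuals as a stand-in for the quantile-based procedure, and the weighting rule `1 / (1 + uncertainty)` is a hypothetical placeholder for the heteroscedastic variance decomposition, whose form the abstract does not give. All function names are assumptions.

```python
import math
from statistics import mean, pstdev


def conformal_radius(calib_pred, calib_true, alpha=0.1):
    """Split-conformal radius: the ceil((n+1)(1-alpha))-th smallest residual."""
    residuals = sorted(abs(p - t) for p, t in zip(calib_pred, calib_true))
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return residuals[rank - 1]


def grpo_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def reweighted_advantages(rewards, uncertainties):
    """Down-weight each advantage by 1 / (1 + uncertainty) (illustrative rule)."""
    return [a / (1.0 + u)
            for a, u in zip(grpo_advantages(rewards), uncertainties)]


print(conformal_radius([1, 2, 3, 4], [1.5, 2.1, 2.0, 4.2], alpha=0.5))
print(reweighted_advantages([1.0, 2.0, 3.0], [0.0, 0.0, 1.0]))
```

In this sketch the third response keeps the sign of its advantage but contributes half as much to the update, which is the general effect the abstract attributes to uncertainty-aware reweighting.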

DISARM: Target Electronic Device-Informed Mitigation of Software Runtime Side-Channel Vulnerabilities

Tasneem Suha, Tanzim Mahfuz, Rima Asmar Awad, Prabuddha Chakraborty

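AI summary: Proposes DISARM, which uses timing values from real embedded devices to generate targeted software fixes that mitigate runtime side-channel vulnerabilities, outperforming existing solutions on five different devices.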

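AI-translated abstract (from Chinese)

Program runtime (timing) attacks exploit variations in a program's execution time to extract sensitive information such as encryption keys, sensitive variable data, and intellectual property. State-of-the-art defenses against runtime side-channel attacks try to balance the execution time of sensitive code across different control-flow paths in order to eliminate timing leakage. During mitigation, however, most techniques do not consider the underlying hardware or device on which the target program runs. This can lead to over-fixing (unnecessary extra operations), under-fixing (failing to resolve the imbalance properly), or even outright failure. We propose DISARM, a joint hardware-software methodology, unlike any existing solution, for mitigating runtime side-channel vulnerabilities; it uses timing values from real embedded devices to generate targeted software fixes. We implement DISARM with support for C, C++, and Java source code and validate it on 22 standard benchmarks. On five different embedded or edge devices, DISARM outperforms existing solutions such as PENDULUM and DifFuzzAR in execution-time overhead, code-size overhead, and correctness.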

English abstract

Program runtime or timing attacks exploit variations in a program's execution times to extract sensitive information from the program (e.g., encryption keys, sensitive variable data, intellectual property). State-of-the-art solutions to runtime side-channel attacks attempt to balance the execution time of the sensitive code for different control flow paths to eliminate the timing leakage. However, during the mitigation process, most techniques do not consider the underlying hardware or device on which the target program is supposed to run. This can lead to over-fixing (unnecessary extra operations), under-fixing (not solving the imbalance properly), and even failures. We propose DISARM, a joint hardware-software methodology (unlike any existing solution) for mitigating runtime side-channel vulnerabilities that utilizes timing values from real embedded devices to generate targeted software fixes. We implement DISARM to support C, C++, and Java source code and validate it across 22 standard benchmarks. DISARM outperforms state-of-the-art solutions such as PENDULUM and DifFuzzAR in terms of execution time overhead, code size overhead, and correctness on five different embedded or edge devices.
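
Why device timing matters for path balancing can be shown with a toy calculation. The sketch assumes that each control-flow path has a cycle count measured on a specific device and that padding is added in units of one operation with a known per-device cost. The function name `padding_plan` and all cycle counts are invented for illustration; this is not DISARM's algorithm.

```python
def padding_plan(path_cycles, pad_cycles):
    """Plan padding so every path approaches the slowest path's duration.

    path_cycles: dict of path name -> cycles measured on the target device.
    pad_cycles:  cycles of one padding operation on that same device.
    Returns, per path, the number of padding operations and the cycles
    still missing after padding (residual imbalance).
    """
    if pad_cycles <= 0:
        raise ValueError("pad_cycles must be positive")
    target = max(path_cycles.values())
    plan = {}
    for name, cycles in path_cycles.items():
        pads, residual = divmod(target - cycles, pad_cycles)
        plan[name] = {"pads": pads, "residual_cycles": residual}
    return plan


# Hypothetical measurements of the same branch on two devices.
device_a = padding_plan({"then": 120, "else": 90}, pad_cycles=10)
device_b = padding_plan({"then": 200, "else": 95}, pad_cycles=20)
print(device_a["else"])
print(device_b["else"])
```

With these made-up numbers, the three padding operations that balance the branch on the first device would leave it clearly unbalanced on the second, which is the under-fixing risk the abstract attributes to device-agnostic mitigation.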

2606.19805 2026-06-19 cs.CV cs.AI New submission

ParaScale: Scale-Calibrated Camera-Motion Transfer via a Gauge-Invariant Parallax Number

Zijie Meng

Affiliations: Peking University

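AI summary: Proposes ParaScale, a module that achieves scale-faithful camera-motion transfer through the gauge-invariant Parallax Number Pi, requires no retraining, and reduces Parallax Consistency Error by more than 3x across four orders of magnitude in scale.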

Comments Accepted by SCA2026 (poster)

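AI-translated abstract (from Chinese)

Transferring the camera motion of a reference video to a newly generated video lets creators reuse cinematic camera moves. However, the reference and target videos often live at incompatible scales, such as a sweep across a galaxy versus a nudge across a desk, and directly reusing the recovered trajectory yields motion that is either imperceptible or violently exaggerated. We trace this to a geometric fact: translation-induced image motion scales as ||T||/Z, so a monocular trajectory is meaningful only up to a depth-scale gauge. We distill this into the Parallax Number Pi = ||Delta T|| / Zbar, a dimensionless, gauge-invariant descriptor of the perceived strength of a camera move, and prove that it, rather than the raw trajectory, is the quantity that scale-faithful transfer must preserve. ParaScale is a plug-and-play module that reads Pi from any reference video and re-realizes it frame by frame against the target scene's depth, leaving rotation unchanged. It sits between pose extraction and pose injection, requires no retraining, and can be plugged into any pose-conditioned generator. We further introduce the Parallax Consistency Error (PCE), a scale-symmetric metric that, unlike the similarity-aligned TransErr, exposes scene-scale mismatch. Across scale regimes spanning four orders of magnitude and multiple backbones, ParaScale keeps the realized parallax on the identity line and reduces PCE by more than 3x relative to uncalibrated transfer, with no loss of visual fidelity.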

English abstract

Transferring the camera motion of a reference video to a freshly generated one lets creators reuse cinematic moves. Yet reference and target often live at incompatible scales -- a sweep across a galaxy versus a nudge across a desk -- and naively reusing the recovered trajectory yields either imperceptible or violently exaggerated motion. We trace this to a geometric fact: translation-induced image motion scales as ||T||/Z, so a monocular trajectory is meaningful only up to a depth-scale gauge. We distill this into the Parallax Number Pi = ||Delta T|| / Zbar, a dimensionless, gauge-invariant descriptor of how strongly a camera move is felt, and prove that it -- not the raw trajectory -- is the quantity that scale-faithful transfer must preserve. ParaScale is a plug-and-play module that reads Pi off any reference video and re-realizes it against the target scene's own depth, per frame, leaving rotation untouched. Sitting between pose extraction and pose injection, it requires no retraining and drops into any pose-conditioned generator. We further introduce the Parallax Consistency Error (PCE), a scale-symmetric metric that -- unlike the similarity-aligned TransErr -- exposes scene-scale mismatch. Across scale regimes spanning four orders of magnitude and multiple backbones, ParaScale keeps the realized parallax on the identity line and cuts PCE by more than 3x over uncalibrated transfer with no loss of visual fidelity.
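
The Parallax Number lends itself to a short worked example. The sketch below computes Pi = ||Delta T|| / Zbar for one frame-to-frame translation step and rescales that step so a target scene with a different mean depth realizes the same Pi. It assumes per-frame mean depths are already available and leaves rotation out entirely. The function names are hypothetical, and this is an illustration of the invariant rather than the ParaScale module itself.

```python
import math


def parallax_number(delta_t, mean_depth):
    """Pi = ||delta_t|| / mean_depth for one frame-to-frame camera step."""
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    return math.sqrt(sum(c * c for c in delta_t)) / mean_depth


def rescale_translation(delta_t_ref, depth_ref, depth_tgt):
    """Rescale a reference translation step so the target step has the same Pi."""
    pi = parallax_number(delta_t_ref, depth_ref)
    norm = math.sqrt(sum(c * c for c in delta_t_ref))
    if norm == 0.0:
        return [0.0 for _ in delta_t_ref]
    scale = pi * depth_tgt / norm  # equals depth_tgt / depth_ref
    return [c * scale for c in delta_t_ref]


# A 5-unit step in a scene 100 units deep, moved to a scene 2 units deep.
step_ref = [3.0, 4.0, 0.0]
step_tgt = rescale_translation(step_ref, depth_ref=100.0, depth_tgt=2.0)
print(parallax_number(step_ref, 100.0))
print(step_tgt, parallax_number(step_tgt, 2.0))
```

The direction of motion is preserved while its magnitude follows the target depth, so a move that reads as a gentle drift in the large scene remains a gentle drift in the small one instead of a violent jump.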
