2601.22478 2026-05-20 cs.LG

FreeOrbit4D: Training-Free Arbitrary Camera Redirection for Monocular Videos via Foreground-Complete 4D Reconstruction

Wei Cao, Hao Zhang, Fengrui Tian, Yulun Wu, Yingying Li, Shenlong Wang, Ning Yu, Yaoyao Liu

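Affiliations: University of Illinois Urbana-Champaign; University of Pennsylvania; Eyeline Labs

AI summary: This paper proposes FreeOrbit4D, a training-free framework that resolves the geometric ambiguity of large-angle redirection by recovering a complete foreground 4D proxy, producing more realistic and temporally consistent videos.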

Comments 12 pages, 10 figures. Accepted to SIGGRAPH Conference Papers 2026

DOI: 10.1145/3799902.3811122

Abstract

Camera redirection aims to replay a dynamic scene from a single monocular video under a user-specified camera trajectory. However, large-angle redirection is inherently ill-posed: a monocular video captures only a narrow spatio-temporal view of a dynamic 3D scene, providing severely limited observations of the underlying 4D world. The key challenge is therefore to recover a complete and coherent representation from this limited input, with consistent geometry and motion. While recent diffusion-based methods achieve impressive visual generation quality, they often break down under large-angle viewpoint changes far from the original trajectory, where missing visual grounding leads to severe geometric ambiguity and temporal inconsistency. We present FreeOrbit4D, an effective training-free framework that tackles this ambiguity by recovering a foreground-complete 4D proxy as structural grounding for video generation. We obtain this proxy by decoupling foreground and background reconstructions: we unproject the monocular video into a static background and partial foreground point clouds in a unified global space, then use an object-centric multi-view diffusion model to synthesize multi-view images and reconstruct complete foreground point clouds in canonical object space. By aligning the canonical foreground point cloud to the global scene space via dense pixel-synchronized 3D-3D correspondences and projecting the foreground-complete 4D proxy onto target camera viewpoints, we provide geometric scaffolds that guide a conditional video diffusion model. Extensive experiments show that FreeOrbit4D produces more faithful and temporally coherent redirected videos under challenging large-angle trajectories, and our proxy further enables applications such as edit propagation and 4D data generation. Project page: https://freeorbit4d.vision.ischool.illinois.edu/
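
The final step described in the abstract, projecting the proxy onto a target camera viewpoint, can be illustrated with a plain pinhole model. This is a minimal sketch and not the authors' code: the function name `project_points`, the world-to-camera convention (x_cam = R x + t), and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def project_points(
    points: Sequence[Vec3],
    R: Mat3,
    t: Vec3,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Optional[Tuple[float, float]]]:
    """Project world-space points into a pinhole camera.

    Assumed convention (hypothetical, not taken from the paper):
    x_cam = R @ x_world + t, with the camera looking down +Z.
    Points at or behind the camera plane map to None.
    """
    pixels: List[Optional[Tuple[float, float]]] = []
    for p in points:
        # Rotate and translate the point into the camera frame.
        xc, yc, zc = (
            sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)
        )
        if zc <= 0.0:
            pixels.append(None)
            continue
        # Perspective divide, then apply focal lengths and principal point.
        pixels.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return pixels
```

Rendering a point cloud this way for every frame and every target pose would give the kind of sparse geometric scaffold that a conditional video model can then fill in. Occlusion handling and point splatting are left out of the sketch.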

2601.16823 2026-05-20 cs.CL cs.AI

Can Deep Research Agents Retrieve and Organize? Evaluating the Synthesis Gap with Expert Taxonomies

Ming Zhang, Jiabao Zhuang, Wenqing Jing, Kexin Tan, Ziyu Kong, Jingyi Deng, Yujiong Shen, Yuhui Wang, Zhenghao Xiang, Qiyuan Peng, Yuhang Zhao, Ning Luo, Renzhe Zheng, Jiahui Lin, Mingqi Wu, Long Ma, Shihan Dou, Maxm Pan, Tao Gui, Qi Zhang, Xuanjing Huang

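Affiliations: Fudan University; Hunyuan Team, Tencent

AI summary: This paper introduces the TaxoBench benchmark, which evaluates how well deep research agents retrieve and organize papers, and finds bottlenecks in both capability and alignment.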

Abstract

Deep Research Agents increasingly automate survey generation, yet whether they match human experts at retrieving essential papers and organizing them into expert-like taxonomies remains unclear. Existing benchmarks emphasize writing quality or citation correctness, while standard clustering metrics ignore hierarchical structure. We introduce TaxoBench, a benchmark of 72 highly cited LLM surveys with expert-authored taxonomy trees and 3,815 papers mapped to paper categories. TaxoBench evaluates (1) retrieval via Recall/Precision/F1, and (2) organization at a leaf level (paper-to-category assignment) and a hierarchy level via two new metrics: Unordered Semantic Tree Edit Distance (US-TED/US-NTED) and Semantic Path Similarity (Sem-Path). Two modes are supported: Deep Research (topic-only, end-to-end) and Bottom-Up (expert paper set provided, organization-only). To distinguish disagreement with a single expert reference from genuine model failure, we explicitly partition findings into capability-based (reference-free) and alignment-based (reference-dependent) groups. Evaluating 7 Deep Research Agents and 12 frontier LLMs reveals a dual bottleneck. On the capability side, the best agent retrieves only 20.92% of expert-cited papers, and 1,000 model taxonomies show 75.9% sibling overlap, 51.2% MECE violations, and 83.4% structural imbalance, all detectable without any reference. On the alignment side, all 12 LLMs converge to Sem-Path 28-29%, well below 47-58% achieved by three independent human-annotator groups on the same paper sets. Our benchmark is publicly available at https://github.com/KongLongGeFDU/TaxoBench.
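
The retrieval side of the evaluation (Recall/Precision/F1 against expert-cited papers) reduces to set overlap. The sketch below assumes papers are compared by exact identifier match; the function name and the matching rule are illustrative assumptions, and the benchmark's own matching procedure may differ.

```python
from typing import Iterable, Tuple


def retrieval_scores(
    retrieved: Iterable[str], expert_cited: Iterable[str]
) -> Tuple[float, float, float]:
    """Return (recall, precision, f1) over sets of paper identifiers."""
    retrieved_set = set(retrieved)
    expert_set = set(expert_cited)
    hits = len(retrieved_set & expert_set)
    # Recall: share of expert-cited papers the agent found.
    recall = hits / len(expert_set) if expert_set else 0.0
    # Precision: share of retrieved papers that the expert cited.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall > 0
        else 0.0
    )
    return recall, precision, f1
```

Under this reading, the 20.92% figure quoted above is a recall value: the best agent found roughly one in five of the papers the expert survey cited.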

2601.05437 2026-05-20 cs.CL cs.AI

Tracing Moral Foundations in Large Language Models

Chenxiao Yu, Bowen Yi, Farzan Karimi-Malekabadi, Suhaib Abdurahman, Jinyi Ye, Shrikanth Narayanan, Yue Zhao, Morteza Dehghani

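Affiliations: Department of Computer Science, University of Southern California; Department of Psychology, University of Southern California; Center for Computational Language Sciences, University of Southern California

AI summary: This paper studies how moral foundations are encoded, organized, and expressed in large language models. Using a multi-level approach, it analyzes how well the models' moral foundations align with human moral perceptions and finds that moral structure emerges naturally during pretraining and fine-tuning and is partially disentangled.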

Abstract

Large language models often produce human-like moral judgments, but it is unclear whether this reflects an internal conceptual structure or superficial "moral mimicry." Using Moral Foundations Theory (MFT) as an analytic framework, we study how moral foundations are encoded, organized, and expressed across 14 base and instruction-tuned LLMs spanning four model families (Llama, Qwen2.5, Qwen3-MoE, Mistral) and scales from 7B to 70B. We employ a multi-level approach combining (i) layer-wise analysis of MFT concept representations and their alignment with human moral perceptions, (ii) pretrained sparse autoencoders (SAEs) over the residual stream to identify sparse features that support moral concepts, and (iii) causal steering interventions using dense MFT vectors and sparse SAE features. We find that models represent and distinguish moral foundations in a manner that aligns with human judgments, and that this moral geometry naturally emerges from pretraining and is selectively rewired by post-training. At a finer scale, SAE features show clear semantic links to specific foundations, suggesting partially disentangled mechanisms within shared representations. Finally, steering along either dense vectors or sparse features produces predictable shifts in foundation-relevant behavior, demonstrating a causal connection between internal representations and moral outputs. Together, our results provide mechanistic evidence that moral concepts in LLMs are distributed, layered, and partly disentangled, suggesting that pluralistic moral structure can emerge as a latent pattern from the statistical regularities of language alone.

2512.24470 2026-05-20 cs.RO cs.AI

Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models

Kim Alexander Christensen, Andreas Gudahl Tufte, Alexey Gusev, Rohan Sinha, Milan Ganai, Ole Andreas Alsos, Marco Pavone, Martin Steinert

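Affiliations: Dept. of Mechanical and Industrial Engineering, NTNU; Dept. of Aeronautics and Astronautics, Stanford University; Dept. of Computer Science, Stanford University; NVIDIA Research

AI summary: This paper proposes a vision-language-model-based method for semantic hazard detection and safety maneuvers, aimed at the requirements the draft IMO MASS Code places on autonomous maritime vessels. It combines a fast-slow anomaly pipeline with a short-horizon, human-overridable fallback maneuver and is evaluated on 40 harbor scenes.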

Comments 17 pages without bibliography or appendix. The main paper has 16 figures. Paper webpage can be found at https://kimachristensen.github.io/bridge_policy/

Journal ref Ocean Engineering 359, Part 3 (2026), Article 124646

DOI: 10.1016/j.oceaneng.2026.124646

Abstract

The draft IMO MASS Code requires autonomous and remotely supervised maritime vessels to detect departures from their operational design domain, enter a predefined fallback that notifies the operator, permit immediate human override, and avoid changing the voyage plan without approval. Meeting these obligations in the alert-to-takeover gap calls for a short-horizon, human-overridable fallback maneuver. Classical maritime autonomy stacks struggle when the correct action depends on meaning (e.g., diver-down flag means people in the water, fire close by means hazard). We argue (i) that vision-language models (VLMs) provide semantic awareness for such out-of-distribution situations, and (ii) that a fast-slow anomaly pipeline with a short-horizon, human-overridable fallback maneuver makes this practical in the handover window. We introduce Semantic Lookout, a camera-only, candidate-constrained VLM fallback maneuver selector that selects one cautious action (or station-keeping) from water-valid, world-anchored trajectories under continuous human authority. On 40 harbor scenes we measure per-call scene understanding and latency, alignment with human consensus (model majority-of-three voting), short-horizon risk-relief on fire hazard scenes, and an on-water alert->fallback maneuver->operator handover. Sub-10 s models retain most of the awareness of slower state-of-the-art models. The fallback maneuver selector outperforms geometry-only baselines and increases standoff distance on fire scenes. A field run verifies end-to-end operation. These results support VLMs as semantic fallback maneuver selectors compatible with the draft IMO MASS Code, within practical latency budgets, and motivate future work on domain-adapted, hybrid autonomy that pairs foundation-model semantics with multi-sensor bird's-eye-view perception and short-horizon replanning. Website: kimachristensen.github.io/bridge_policy

2512.23461 2026-05-20 cs.LG cs.AI

Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance

Zhuo Li, Pengyu Cheng, Zhechao Yu, Feifei Tong, Anningzhe Gao, Tsung-Hui Chang, Xiang Wan, Erchao Zhao, Xiaoxi Jiang, Guanjun Jiang

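Affiliations: Qwen Large Model Application Team, Alibaba; The Chinese University of Hong Kong, Shenzhen; Shenzhen Research Institute of Big Data

AI summary: This paper proposes DIR, an information-theoretic debiasing method for reward models. It maximizes the mutual information between reward model scores and human preference pairs while minimizing the mutual information between reward model outputs and biased attributes of the preference inputs, which mitigates inductive bias and improves RLHF performance.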

Comments Published as a conference paper at The International Conference on Learning Representations (ICLR) 2026

Abstract

Reward models (RMs) are essential in reinforcement learning from human feedback (RLHF) to align large language models (LLMs) with human values. However, RM training data is commonly recognized as low-quality, containing inductive biases that can easily lead to overfitting and reward hacking. For example, more detailed and comprehensive responses are usually preferred by humans but contain more words, which makes response length one of the inevitable inductive biases. The few prior RM debiasing approaches either target a single specific type of bias or model the problem with only simple linear correlations, e.g., Pearson coefficients. To mitigate more complex and diverse inductive biases in reward modeling, we introduce a novel information-theoretic debiasing method called Debiasing via Information optimization for RM (DIR). Inspired by the information bottleneck (IB), we maximize the mutual information (MI) between RM scores and human preference pairs, while minimizing the MI between RM outputs and biased attributes of preference inputs. With theoretical justification from information theory, DIR can handle more sophisticated types of biases with non-linear correlations, broadly extending the real-world application scenarios for RM debiasing methods. In experiments, we verify the effectiveness of DIR with three types of inductive biases: response length, sycophancy, and format. We discover that DIR not only effectively mitigates target inductive biases but also enhances RLHF performance across diverse benchmarks, yielding better generalization abilities. The code and training recipes are available at https://github.com/Qwen-Applications/DIR.
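
The abstract's point that linear correlation measures such as the Pearson coefficient can miss non-linear dependence is easy to check on toy numbers. The sketch below is not from the paper: the data are made up, and `pearson` is a plain textbook implementation written for this illustration.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical attribute values, symmetric around zero.
attribute = [-2.0, -1.0, 0.0, 1.0, 2.0]
# A score that depends linearly on the attribute.
linear_score = [2.0 * a + 1.0 for a in attribute]
# A score fully determined by the attribute, but non-linearly.
quadratic_score = [a * a for a in attribute]

print(pearson(attribute, linear_score))
print(pearson(attribute, quadratic_score))
```

The quadratic score is a deterministic function of the attribute, yet its Pearson coefficient with the attribute is zero, so a debiasing penalty built on Pearson alone would not see this dependence. A mutual-information objective is sensitive to it, which is the motivation the abstract gives for DIR.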

2512.16856 2026-05-20 cs.AI

Distributional AGI Safety

Nenad Tomašev, Matija Franklin, Julian Jacobs, Sébastien Krier, Simon Osindero

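Affiliations: Google DeepMind

AI summary: This paper proposes a framework for distributional AGI safety that addresses the safety risks of coordination among groups of agents by designing and implementing virtual agentic sandbox economies, emphasizing market mechanisms, auditability, and oversight.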

Abstract

AI safety and alignment research has predominantly been focused on methods for safeguarding individual AI systems, resting on the assumption of an eventual emergence of a monolithic Artificial General Intelligence (AGI). The alternative AGI emergence hypothesis, where general capability levels are first manifested through coordination in groups of sub-AGI individual agents with complementary skills and affordances, has received far less attention. Here we argue that this patchwork AGI hypothesis needs to be given serious consideration, and should inform the development of corresponding safeguards and mitigations. The rapid deployment of advanced AI agents with tool-use capabilities and the ability to communicate and coordinate makes this an urgent safety consideration. We therefore propose a framework for distributional AGI safety that moves beyond evaluating and aligning individual agents. This framework centres on the design and implementation of virtual agentic sandbox economies (impermeable or semi-permeable), where agent-to-agent transactions are governed by robust market mechanisms, coupled with appropriate auditability, reputation management, and oversight to mitigate collective risks.

2512.11234 2026-05-20 cs.CV

RoomPilot: Controllable Indoor Scene Synthesis via Multimodal Semantic Parsing

Wentang Chen, Shougao Zhang, Yiman Zhang, Tianhao Zhou, Ruihui Li

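Affiliations: School of Information Science and Engineering, Hunan University

AI summary: This work proposes RoomPilot, a framework for controllable indoor scene synthesis via multimodal semantic parsing. It addresses the limited input modalities and implicit generation processes of existing methods and improves control over scene structure and semantics.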

Comments 30 pages, 8 figures

Abstract

Generating controllable indoor scenes is fundamental to applications in game development, architectural visualization, and embodied AI. However, existing approaches either support only a limited set of input modalities or rely on implicit generation processes that hinder precise control over scene structure and semantics. To address these limitations, we introduce RoomPilot, a unified framework for controllable indoor scene synthesis from multi-modal inputs, including textual descriptions and CAD floor plans. RoomPilot maps heterogeneous inputs into an Indoor Domain-Specific Language (IDSL), which serves as a structured and interpretable semantic representation for describing indoor scenes. Built upon IDSL, RoomPilot presents a hierarchical synthesis pipeline that progressively organizes scenes at the building, room, and object levels, promoting structural coherence and functional consistency across multi-room layouts. Moreover, RoomPilot constructs a curated asset dataset with rich semantic annotations to support high-quality scene synthesis, improving visual realism and appearance consistency. Extensive experiments demonstrate effective multi-modal understanding, fine-grained controllability in scene generation, and improved physical consistency and visual fidelity, marking a significant step toward controllable 3D indoor scene synthesis. Code and model will be made available.

2512.10891 2026-05-20 cs.RO cs.LG

MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution

Sara Patel, Mingxun Zhou, Giulia Fanti

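Affiliations: Carnegie Mellon University; HKUST

AI summary: This paper proposes MaxShapley, an algorithm for fairly attributing credit to and compensating content providers in generative search pipelines. It is a special case of the Shapley value that uses a decomposable max-sum utility function to compute attributions in polynomial time, avoiding the exponential cost of general Shapley values.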

Abstract

Generative search engines based on large language models (LLMs) are replacing traditional search, fundamentally changing how information providers are compensated. To sustain this ecosystem, we need fair mechanisms to attribute and compensate content providers based on their contributions to generated answers. We introduce MaxShapley, an efficient algorithm for fair credit attribution in generative search pipelines that retrieve external sources before generation. MaxShapley is a special case of the celebrated Shapley value; it leverages a decomposable max-sum utility function to compute attributions in time polynomial in the number of documents, as opposed to the exponential cost of Shapley values. We evaluate MaxShapley on three multi-hop QA datasets (HotPotQA, MuSiQUE, MS MARCO); MaxShapley achieves comparable attribution quality to exact Shapley computation while consuming a fraction of its tokens. For instance, it gives up to a 9x reduction in resource consumption over prior state-of-the-art methods at the same attribution accuracy. We release open-source code and re-calibrated datasets. An educational demo is available at https://fair-search.com.
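
The abstract does not spell out the algorithm, so the following is only a sketch of why a max-sum utility admits exact Shapley values in polynomial time. It assumes a utility of the form v(S) = Σ_t max_{i∈S} s[t][i] with non-negative scores, where `s[t][i]` is a hypothetical relevance score of document `i` for answer component `t`. For a single max game the Shapley value has a closed form known from the airport game, and linearity of the Shapley value extends it to the sum. The function names and the score matrix are illustrative and not taken from the paper.

```python
from typing import List, Sequence


def shapley_max_game(values: Sequence[float]) -> List[float]:
    """Exact Shapley values for v(S) = max_{i in S} values[i], v(empty) = 0.

    Assumes non-negative values. Sort ascending; each increment between
    consecutive sorted values is split equally among the players whose
    value is at or above that level (airport-game closed form).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    phi = [0.0] * n
    running = 0.0
    previous = 0.0
    for rank, i in enumerate(order):
        # n - rank players have a value at least as large as values[i].
        running += (values[i] - previous) / (n - rank)
        phi[i] = running
        previous = values[i]
    return phi


def shapley_max_sum(score_rows: Sequence[Sequence[float]]) -> List[float]:
    """Shapley values for v(S) = sum_t max_{i in S} score_rows[t][i].

    By linearity, each row's max game is solved on its own and the
    per-document results are added.
    """
    n = len(score_rows[0])
    total = [0.0] * n
    for row in score_rows:
        for i, phi_i in enumerate(shapley_max_game(row)):
            total[i] += phi_i
    return total
```

With `T` rows and `n` documents this costs O(T n log n), compared with the 2^n coalitions a generic Shapley computation would enumerate. How the scores themselves are obtained, and what they cost in LLM tokens, is outside the sketch.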

2512.05721 2026-05-20 cs.LG

BERTO: Intent-Driven Network Time Series Forecasting via Natural Language Operator Preferences

Nitin Priyadarshini Shankar, Vaibhav Singh, Sheetal Kalyani, Christian Maciocco

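Affiliations: Intel Labs; Indian Institute of Technology Madras

AI summary: BERTO performs intent-driven network time series forecasting from natural-language operator preferences. Built on BERT, it targets traffic prediction and energy optimization, combining a balancing loss function with prompt-based conditioning so that the model can shift its forecasting bias according to operator needs and produce flexible, decision-aware forecasts.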

Comments 7 pages, 3 figures, 2 tables

Abstract

Traditional cellular traffic forecasting models are optimized for minimizing symmetric errors, leaving them indifferent to shifting operational priorities. To bridge this gap, we introduce BERTO, a BERT-based framework for traffic prediction and energy optimization in cellular networks. Built on transformer architectures, BERTO achieves high prediction accuracy while enabling a single fine-tuned model to operate across multiple forecasting regimes via natural-language operator prompts. By combining a Balancing Loss Function (BLF) with prompt-based conditioning, BERTO adaptively shifts its forecasting bias toward underprediction or overprediction depending on the operator's desired trade-off between power savings and service quality. This allows the same model to dynamically generate different decision-aware forecasts without retraining or modifying model parameters. Experiments on real-world datasets demonstrate that BERTO can operate across a flexible range of approximately 1.4 kW in power consumption while balancing 9x variation in service level agreement (SLA) violations, making it well suited for intelligent RAN deployments.
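
The abstract does not give the form of the Balancing Loss Function. As a stand-in, the sketch below uses the standard pinball (quantile) loss, a well-known asymmetric loss, to show how a single scalar can tilt a forecaster toward overprediction or underprediction. The mapping from an operator prompt to `tau` is a hypothetical assumption and is not part of the sketch.

```python
from typing import Sequence


def pinball_loss(
    y_true: Sequence[float], y_pred: Sequence[float], tau: float
) -> float:
    """Mean pinball (quantile) loss; a stand-in, not the paper's BLF.

    tau > 0.5 penalizes underprediction more (favoring service quality);
    tau < 0.5 penalizes overprediction more (favoring power savings).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # Positive diff means the forecast was too low (underprediction).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)
```

With `tau = 0.9`, a forecast that is 2 units too low costs nine times as much as one that is 2 units too high, so a model trained under it drifts toward overprediction. Lowering `tau` reverses the preference without changing the rest of the training setup.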

2512.01152 2026-05-20 cs.LG cs.AI cs.CV

Open-Set Domain Adaptation Under Background Distribution Shift: Challenges and A Provably Efficient Solution

Shravan Chaudhari, Yoav Wald, Suchi Saria

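Affiliations: Department of Computer Science, Johns Hopkins University; Faculty of Data and Decision Sciences, Technion; Center for Data Science, New York University; Bayesian Health

AI summary: This paper studies the challenges of open-set domain adaptation under background distribution shift and proposes CoLOR, a provably efficient solution. Theoretical analysis shows that it outperforms a baseline in a simplified overparameterized setting, and experiments demonstrate its applicability to image and text data.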

Comments Project page at https://github.com/Shra1-25/CoLOR

Journal ref Transactions on Machine Learning Research (TMLR) 2026/May ISSN: 2835-8856

Abstract

As we deploy machine learning systems in the real world, a core challenge is to maintain a model that is performant even as the data shifts. Such shifts can take many forms: new classes may emerge that were absent during training, a problem known as open-set recognition, and the distribution of known categories may change. Guarantees on open-set recognition are mostly derived under the assumption that the distribution of known classes, which we call the background distribution, is fixed. In this paper we develop CoLOR, a method that is guaranteed to solve open-set recognition even in the challenging case where the background distribution shifts. We prove that the method works under benign assumptions that the novel class is separable from the non-novel classes, and provide theoretical guarantees that it outperforms a representative baseline in a simplified overparameterized setting. We develop techniques to make CoLOR scalable and robust, and perform comprehensive empirical evaluations on image and text data. The results show that CoLOR significantly outperforms existing open-set recognition methods under background shift. Moreover, we provide new insights into how factors such as the size of the novel class influence performance, an aspect that has not been extensively explored in prior work.

2512.00281 2026-05-20 cs.CV q-bio.NC

Beyond Size and Growth: Rethinking Lung Cancer Screening with AI Based Nodule Detection and Diagnosis

Sylvain Bodard, Pierre Baudot, Benjamin Renoust, Charles Voyton, Gwendoline De Bie, Ezequiel Geremia, Van-Khoa Le, Danny Francis, Pierre-Henri Siot, Yousra Haddou, Vincent Bobin, Jean-Christophe Brisset, Carey C. Thomson, Valerie Bourdes, Benoit Huet

Affiliations: Université de Paris Cité, AP-HP, Hôpital Universitaire Necker Enfants Malades, Service d’Imagerie Adulte; Memorial Sloan Kettering Cancer Center, Department of Radiology; Sorbonne Université, CNRS UMR 7371, INSERM U 1146, Laboratoire d’Imagerie Biomédicale (LIB); Median Technologies, eyonis; Mount Auburn Hospital/Beth Israel Lahey Health, Cambridge MA, USA; Harvard Medical School, Boston MA, USA

AI summary: This paper proposes an integrated AI-based system that performs nodule detection and malignancy assessment directly at the nodule level on low-dose CT scans, moving beyond traditional size- and growth-based screening criteria to improve the accuracy and efficiency of lung cancer screening.

Comments 25 pages, 8 figures, with supplementary information containing 11 figures

Hierarchical Schedule Optimization for Fast and Robust Diffusion Model Sampling

Aihua Zhu, Rui Su, Qinglin Zhao, Li Feng, Meng Shen, Shibo He

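Affiliations: School of Computer Science and Engineering, Macau University of Science and Technology; Beijing Institute of Technology; Zhejiang University

AI summary: This paper proposes a hierarchical schedule optimization method that uses a bi-level optimization framework to sample efficiently from diffusion models at extremely low numbers of function evaluations, improving sample quality and computational efficiency.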

Comments Preprint, accepted to AAAI 2026

Abstract

Diffusion probabilistic models have set a new standard for generative fidelity but are hindered by a slow iterative sampling process. A powerful training-free strategy to accelerate this process is Schedule Optimization, which aims to find an optimal distribution of timesteps for a fixed and small Number of Function Evaluations (NFE) to maximize sample quality. To this end, a successful schedule optimization method must adhere to four core principles: effectiveness, adaptivity, practical robustness, and computational efficiency. However, existing paradigms struggle to satisfy these principles simultaneously, motivating the need for a more advanced solution. To overcome these limitations, we propose the Hierarchical-Schedule-Optimizer (HSO), a novel and efficient bi-level optimization framework. HSO reframes the search for a globally optimal schedule into a more tractable problem by iteratively alternating between two synergistic levels: an upper-level global search for an optimal initialization strategy and a lower-level local optimization for schedule refinement. This process is guided by two key innovations: the Midpoint Error Proxy (MEP), a solver-agnostic and numerically stable objective for effective local optimization, and the Spacing-Penalized Fitness (SPF) function, which ensures practical robustness by penalizing pathologically close timesteps. Extensive experiments show that HSO sets a new state-of-the-art for training-free sampling in the extremely low-NFE regime. For instance, with an NFE of just 5, HSO achieves a remarkable FID of 11.94 on LAION-Aesthetics with Stable Diffusion v2.1. Crucially, this level of performance is attained not through costly retraining, but with a one-time optimization cost of less than 8 seconds, presenting a highly practical and efficient paradigm for diffusion model acceleration.

Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning

Rethinking the Design Space of Reinforcement Learning for Diffusion Models: On the Importance of Likelihood Estimation Beyond Loss Design

Enabling Real-Time Colonoscopic Polyp Segmentation on Commodity CPUs via Ultra-Lightweight Architecture

Contextualized Visual Personalization in Vision-Language Models

Learning ORDER-Aware Multimodal Representations for Composite Materials Design

Transformation-Augmented GRPO for Enhancing Exploration in Reasoning of Large Language Models

ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment

Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion

FreeOrbit4D: Training-Free Arbitrary Camera Redirection for Monocular Videos via Foreground-Complete 4D Reconstruction

Disentangling generalization and memorization in large language models using chess

Q-learning with Adjoint Matching

Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization

Can Deep Research Agents Retrieve and Organize? Evaluating the Synthesis Gap with Expert Taxonomies

Tracing Moral Foundations in Large Language Models

Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models

Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance

Distributional AGI Safety

RoomPilot: Controllable Indoor Scene Synthesis via Multimodal Semantic Parsing

Iterative Compositional Data Generation for Robot Control

Fast-BEV++: Fast by Algorithm, Deployable by Design

SETUP: Sentence-level English-To-Uniform Meaning Representation Parser

MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution

BERTO: Intent-Driven Network Time Series Forecasting via Natural Language Operator Preferences

Open-Set Domain Adaptation Under Background Distribution Shift: Challenges and A Provably Efficient Solution

Beyond Size and Growth: Rethinking Lung Cancer Screening with AI Based Nodule Detection and Diagnosis

Reflection-Based Relative Localization for Cooperative UAV Teams Using Active Markers

SVG360: Editable Multiview Vector Graphics from a Single SVG

GRLoc: Geometric Representation Regression for Visual Localization

Data-Efficient Self-Supervised Algorithms for Fine-Grained Birdsong Analysis

Hierarchical Schedule Optimization for Fast and Robust Diffusion Model Sampling