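arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.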

2601.13780 2026-05-13 cs.LG

Principled Latent Diffusion for Graphs via Laplacian Autoencoders

Antoine Siraudin, Christopher Morris

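AI summary: This paper proposes LG-Flow, a latent diffusion model for graphs built on Laplacian autoencoders, to address the quadratic growth in computational cost that conventional graph diffusion models incur as the number of nodes increases. By encoding graph structure into a low-dimensional latent space, the model achieves near-lossless graph reconstruction and avoids the redundancy of modeling absent edges in sparse graphs. The method uses a permutation-equivariant autoencoder and a diffusion transformer to improve the efficiency and scale of graph generation. Experiments show competitive generation performance, with training speed improved by nearly a thousandfold.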

Comments Preprint, under review

2601.07473 2026-05-13 cs.LG

AntiPaSTO: Self-Supervised Honesty Steering via Anti-Parallel Representations

Michael J. Clark

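AI summary: As models become more capable, humans struggle to verify their outputs reliably. This paper proposes AntiPaSTO, a self-supervised method that steers model honesty internally by separating representations along an anti-parallel axis and adding a consistency constraint. Training requires only inserting two contrasting words into template sentences, with no human labels. Experiments show that it outperforms conventional prompting on multiple value axes and supports bidirectional control.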

Comments Code is available at https://github.com/wassname/AntiPaSTO

2601.07384 2026-05-13 cs.LG

CompNO: A Novel Foundation Model approach for solving Partial Differential Equations

Hamda Hmida, Hsiu-Wen Chang Joly, Youssef Mesri

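AI summary: This paper proposes CompNO, a new foundation-model approach for solving parametric partial differential equations (PDEs). It learns a library of foundation blocks, each a Fourier neural operator specialized to one fundamental differential operator, and assembles them through lightweight adaptation blocks into task-specific solvers, avoiding the high pretraining cost and limited interpretability of a single monolithic model. Experiments show that CompNO achieves lower relative L2 error than strong baselines on linear parametric systems while remaining competitive on nonlinear Burgers' flows, and that it satisfies Dirichlet boundary conditions exactly, with good generalization and physical interpretability.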

Comments Under review at MDPI

Details

DOI: 10.3390/app16020972

Abstract

Partial differential equations (PDEs) govern a wide range of physical phenomena, but their numerical solution remains computationally demanding, especially when repeated simulations are required across many parameter settings. Recent Scientific Foundation Models (SFMs) aim to alleviate this cost by learning universal surrogates from large collections of simulated systems, yet they typically rely on monolithic architectures with limited interpretability and high pretraining expense. In this work we introduce Compositional Neural Operators (CompNO), a compositional neural operator framework for parametric PDEs. Instead of pretraining a single large model on heterogeneous data, CompNO first learns a library of Foundation Blocks, where each block is a parametric Fourier neural operator specialized to a fundamental differential operator (e.g. convection, diffusion, nonlinear convection). These blocks are then assembled, via lightweight Adaptation Blocks, into task-specific solvers that approximate the temporal evolution operator for target PDEs. A dedicated boundary-condition operator further enforces Dirichlet constraints exactly at inference time. We validate CompNO on one-dimensional convection, diffusion, convection--diffusion and Burgers' equations from the PDEBench suite. The proposed framework achieves lower relative L2 error than strong baselines (PFNO, PDEFormer and in-context learning based models) on linear parametric systems, while remaining competitive on nonlinear Burgers' flows. The model maintains exact boundary satisfaction with zero loss at domain boundaries, and exhibits robust generalization across a broad range of Peclet and Reynolds numbers. These results demonstrate that compositional neural operators provide a scalable and physically interpretable pathway towards foundation models for PDEs.

2601.05752 2026-05-13 cs.CL cs.SE

AutoMonitor-Bench: Evaluating the Reliability of LLM-Based Misbehavior Monitor

Shu Yang, Jingyu Hu, Tong Li, Hanqi Yan, Wenxuan Wang, Di Wang

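AI summary: This paper introduces AutoMonitor-Bench, the first benchmark for systematically evaluating the reliability of LLM-based misbehavior monitors. It covers question answering, code generation, and reasoning tasks, with 3,010 carefully annotated test samples. Monitors are evaluated with two metrics, miss rate (MR) and false alarm rate (FAR), which reveal a trade-off between detection ability and sensitivity across models. The authors also build a large-scale training corpus and fine-tune Qwen3-4B-Instruction to test whether training on known misbehavior data improves monitoring of unseen, implicit misbehavior, highlighting the challenges of building reliable and scalable LLM misbehavior monitors.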

Comments ACL 2026 Findings
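
The two metrics named in the AutoMonitor-Bench summary can be illustrated with a minimal sketch. It assumes MR is the fraction of misbehaving samples the monitor fails to flag and FAR is the fraction of benign samples it flags; the benchmark's exact definitions may differ, and the function name `monitor_rates` is hypothetical.

```python
from typing import Sequence, Tuple


def monitor_rates(labels: Sequence[int], flags: Sequence[int]) -> Tuple[float, float]:
    """Return (miss_rate, false_alarm_rate) for a binary monitor.

    labels: 1 = sample contains misbehavior, 0 = benign (ground truth).
    flags:  1 = monitor raised an alert, 0 = monitor stayed silent.
    These definitions are assumptions made for illustration.
    """
    if len(labels) != len(flags):
        raise ValueError("labels and flags must have the same length")

    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives

    # Misbehaving samples the monitor did not flag.
    missed = sum(1 for y, f in zip(labels, flags) if y == 1 and f == 0)
    # Benign samples the monitor flagged anyway.
    false_alarms = sum(1 for y, f in zip(labels, flags) if y == 0 and f == 1)

    miss_rate = missed / positives if positives else 0.0
    false_alarm_rate = false_alarms / negatives if negatives else 0.0
    return miss_rate, false_alarm_rate
```

A monitor that flags everything drives MR to zero at the cost of FAR equal to one, which is the kind of trade-off between detection ability and sensitivity the summary refers to.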

2601.03627 2026-05-13 cs.CL cs.AI

Evaluating the Pre-Consultation Ability of LLMs using Diagnostic Guidelines

Jean Seo, Gibaeg Kim, Kihun Shin, Seungseop Lim, Hyunkyung Lee, Wooseok Han, Jongwon Lee, Eunho Yang

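AI summary: This paper proposes EPAG, a benchmark dataset and framework for evaluating the pre-consultation ability of large language models (LLMs). It evaluates models directly, by comparing the collected patient history against diagnostic guidelines, and indirectly, through disease diagnosis. The study finds that small open-source models fine-tuned on carefully curated task-specific datasets can outperform frontier LLMs on pre-consultation, and that collecting more patient history does not necessarily improve diagnostic performance. It also finds that the linguistic characteristics of pre-consultation dialogue are influenced by the dialogue content. The dataset and evaluation pipeline are open-sourced to support LLM applications in clinical settings.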

Comments EACL 2026 Industry

2512.22933 2026-05-13 cs.AI cs.CL

RW-Post: Auditable Evidence-Grounded Multimodal Fact-Checking in the Wild

Danni Xu, Shaojing Fan, Harry Cheng, Mohan Kankanhalli

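AI summary: This paper proposes RW-Post, an auditable benchmark dataset for multimodal fact-checking in real-world settings. Each sample is linked to the original social media post, a reasoning process, and explicit evidence drawn from human-written fact-checking articles. The dataset supports several evaluation modes and enables systematic analysis of how well models ground claims visually and use evidence. Experiments show that current models still have substantial room for improvement in evidence grounding, and that evidence-based evaluation improves model accuracy and trustworthiness.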

Comments Code and dataset will be released at https://github.com/xudanni0927/AgentFact

2512.22579 2026-05-13 cs.AI cs.NI

SANet: A Semantic-aware Agentic AI Networking Framework for Cross-layer Optimization in 6G

Yong Xiao, Xubo Li, Haoran Zhou, Yingyu Li, Yayu Gao, Guangming Shi, Ping Zhang, Marwan Krunz

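AI summary: This paper proposes SANet, a semantic-aware agentic networking framework for cross-layer optimization in 6G wireless networks. The framework infers the user's semantic goal and automatically assigns agents at different network layers to fulfill it. It formulates the resulting problem as multi-agent multi-objective optimization and proposes methods for finding Pareto-optimal solutions. The paper also introduces a model partition and sharing (MoPS) mechanism to use computational resources more efficiently, and experiments confirm gains in both performance and computational efficiency.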

Comments Accepted at IEEE Transactions on Mobile Computing

Details

DOI: 10.1109/TMC.2026.3691804
Journal ref: IEEE Transactions on Mobile Computing, 2026

Abstract

Agentic AI networking (AgentNet) is a novel AI-native networking paradigm in which a large number of specialized AI agents collaborate to perform autonomous decision-making, dynamic environmental adaptation, and complex missions. It has the potential to facilitate real-time network management and optimization functions, including self-configuration, self-optimization, and self-adaptation across diverse and complex environments. This paper proposes SANet, a novel semantic-aware AgentNet architecture for wireless networks that can infer the semantic goal of the user and automatically assign agents associated with different layers of the network to fulfill the inferred goal. Motivated by the fact that AgentNet is a decentralized framework in which collaborating agents may generally have different and even conflicting objectives, we formulate the decentralized optimization of SANet as a multi-agent multi-objective problem, and focus on finding the Pareto-optimal solution for agents with distinct and potentially conflicting objectives. We propose three novel metrics for evaluating SANet. Furthermore, we develop a model partition and sharing (MoPS) framework in which large models, e.g., deep learning models, of different agents can be partitioned into shared and agent-specific parts that are jointly constructed and deployed according to agents' local computational resources. Two decentralized optimization algorithms are proposed. We derive theoretical bounds and prove that there exists a three-way tradeoff among optimization, generalization, and conflicting errors. We develop an open-source RAN and core network-based hardware prototype that implements agents to interact with three different layers of the network. Experimental results show that the proposed framework achieved performance gains of up to 14.61% while requiring only 44.37% of FLOPs required by state-of-the-art algorithms.
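
The notion of Pareto optimality used in the SANet abstract can be made concrete with a small sketch. This is not the paper's decentralized algorithm; it is only a brute-force filter that keeps the non-dominated points among a finite set of candidates, assuming every objective is to be minimized. The function name `pareto_front` is hypothetical.

```python
from typing import List, Sequence, Tuple


def pareto_front(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the candidates that no other candidate dominates.

    Assumption for this sketch: all objectives are minimized.
    A point `a` dominates `b` if it is no worse in every objective
    and strictly better in at least one.
    """

    def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
        no_worse = all(x <= y for x, y in zip(a, b))
        strictly_better = any(x < y for x, y in zip(a, b))
        return no_worse and strictly_better

    # O(n^2) pairwise check; adequate for small candidate sets.
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For agents with conflicting objectives, each point on the front represents a different trade-off: improving one agent's objective from there necessarily worsens another's.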

2512.12177 2026-05-13 cs.AI

Floorplan2Guide: LLM-Guided Floorplan Parsing for BLV Indoor Navigation

Aydin Ayanzadeh, Tim Oates

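AI summary: This paper proposes Floorplan2Guide, an LLM-guided floorplan parsing method intended to improve indoor navigation for blind and low-vision (BLV) people. It converts architectural floorplans into navigable knowledge graphs and generates readable navigation instructions, reducing the manual preprocessing that earlier methods require. Experiments show that it improves navigation accuracy in both simulated and real environments, performs especially well with few-shot learning, and that graph-based spatial reasoning achieves a higher success rate than direct visual reasoning.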

Comments Accepted for publication in the proceedings of the IEEE International Conference on Big Data (IEEE BigData 2025)

2512.12165 2026-05-13 cs.CV

Audio-Visual Camera Pose Estimation with Passive Scene Sounds and In-the-Wild Video

Daniel Adebi, Sagnik Majumder, Kristen Grauman

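AI summary: This paper studies audio-visual camera pose estimation using passive scene sounds and in-the-wild video, targeting camera motion estimation under visually degraded conditions. The authors propose a simple and effective audio-visual framework that fuses direction-of-arrival (DOA) spectra and binaural embeddings into a state-of-the-art visual pose estimation model, improving the accuracy and robustness of pose estimation. Experiments on two large-scale datasets show clear gains over vision-only methods, especially when visual information is corrupted, suggesting audio as a useful aid for camera pose estimation in real-world scenes.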

2512.12131 2026-05-13 cs.LG cs.DC

BOOST: BOttleneck-Optimized Scalable Training Framework for Low-Rank Large Language Models

Zhengyang Wang, Ziyue Liu, Ruijie Zhang, Avinash Maurya, Paul Hovland, Bogdan Nicolae, Franck Cappello, Zheng Zhang

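AI summary: This paper proposes BOOST, an efficient training framework for large language models with low-rank bottleneck architectures at scale. Conventional tensor parallelism incurs high communication overhead and low GPU utilization on low-rank models. BOOST addresses this with a bottleneck-aware tensor parallelism strategy combined with online RMSNorm, linear-layer grouping, and low-rank activation checkpointing. Experiments on several low-rank bottleneck architectures show speedups of 1.46x to 1.91x over full-rank models and 1.87x to 2.27x over a naively integrated 3D-parallel baseline, along with higher GPU utilization and lower communication overhead.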

2512.11321 2026-05-13 cs.CV

KeyframeFace: Language-Driven Facial Animation via Semantic Keyframes

Jingchao Wu, Zejian Kang, Haibo Liu, Yuanchen Fei, Xiangru Huang

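AI summary: This paper proposes KeyframeFace, a language-driven facial animation method that uses semantic keyframes for precise control of facial expressions. Unlike existing methods that generate continuous frames directly from text, it borrows the keyframe concept from animation production: animations are represented as semantic keyframes in the interpretable ARKit control space, and a large language model generates keyframes aligned with the text description and emotional cues. Experiments show that it outperforms conventional methods in expression fidelity and semantic consistency while offering a clearer structure for semantic control.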

2512.05683 2026-05-13 cs.CV physics.optics

Physics-Informed Graph Neural Networks for Frequency-Aware Optical Aberration Correction

Yong En Kok, Bowen Deng, Alexander Bentley, Andrew J. Parkes, Michael G. Somekh, Amanda J. Wright, Michael P. Pound

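AI summary: This paper proposes ZRNet, a physics-informed graph neural network for frequency-aware optical aberration correction. The method combines Zernike polynomial coefficient prediction with optical image restoration. A Zernike graph module explicitly models the physical relationships among the polynomials, and a frequency-aware alignment loss strengthens the consistency between the restored image and the predicted coefficients in the frequency domain. Experiments show state-of-the-art aberration correction and image restoration across several microscopy modalities and complex biological samples, with robustness and generalization validated on data from a real optical system.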

2512.00775 2026-05-13 cs.RO cs.SY eess.SY

SAGAS: Semantic-Aware Graph-Assisted Stitching for Offline Temporal Logic Planning

Ruijia Liu, Ancheng Hou, Xiang Yin

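AI summary: This paper studies robot task planning and execution under linear temporal logic (LTL) specifications in a strictly offline, model-free setting. The authors propose SAGAS, a framework that combines the compositionality of symbolic synthesis with data-driven reachability structure learned from offline trajectories. It learns a reusable latent reachability graph and a fixed goal-conditioned executor. For each new LTL formula it performs semantic graph augmentation and a Büchi product search, producing executable, cost-efficient plans and generalizing zero-shot to unseen LTL tasks.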

2511.22475 2026-05-13 cs.LG cs.CV

Adversarial Flow Models

Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan

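AI summary: This paper proposes adversarial flow models, a class of generative models that combines the strengths of adversarial learning and flow models. They support one-step or multi-step generation and are trained with an adversarial objective. Unlike conventional GANs, the generator is encouraged to learn a deterministic noise-to-data mapping, which substantially stabilizes training. Unlike consistency-based methods, the model does not need to learn intermediate time steps of the probability flow and generates directly in one or several steps, avoiding error accumulation and preserving model capacity. Experiments on ImageNet at 256px show better generation quality than existing methods.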

Comments ICML 2026

2511.17038 2026-05-13 cs.AI eess.IV stat.ML

DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing

Hao Chen, Renzheng Zhang, Scott S. Howard

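AI summary: This paper proposes DAPS++, a new method for solving inverse problems with diffusion models, motivated by the observation that the diffusion prior provides limited guidance in conventional diffusion-based inverse solvers. The method fully decouples diffusion-based initialization from likelihood-driven refinement, so that reconstruction is guided more directly by measurement consistency while remaining numerically stable. Experiments show that DAPS++ achieves efficient computation and robust image restoration with fewer function evaluations and optimization steps.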

2511.16520 2026-05-13 cs.LG cs.CV eess.IV eess.SP

Saving Foundation Flow-Matching Priors for Inverse Problems

Yuxiang Wan, Ryan Devera, Wenjie Zhang, Ju Sun

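AI summary: This paper proposes FMPlug, a plug-in framework for making foundation flow-matching models more effective as priors in inverse problems. It combines an instance-guided, time-dependent warm-start strategy with a sharp Gaussianity regularization, strengthening problem-specific guidance while preserving the stability of the Gaussian structure. Experiments show that FMPlug performs well on image restoration and on sample-scarce scientific inverse problems, offering a practical route to using foundation flow-matching models in these settings.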

Comments Accepted by ICML 2026

2511.14715 2026-05-13 cs.LG cs.AI cs.CR cs.DC cs.MA

FLARE: Adaptive Multi-Dimensional Reputation for Robust Client Reliability in Federated Learning

Abolfazl Younesi, Leon Kiss, Zahra Najafabadi Samani, Juan Aznar Poveda, Thomas Fahringer

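AI summary: Federated learning (FL) enables collaborative training while preserving data privacy, but malicious clients can compromise model integrity through Byzantine attacks, data poisoning, and similar means. This paper proposes FLARE, a framework based on adaptive multi-dimensional reputation. It evaluates client reliability dynamically through continuous, multi-dimensional reputation scores and combines adaptive threshold adjustment, reputation-weighted aggregation, and local differential privacy to improve robustness. Experiments show that FLARE maintains high model accuracy and fast convergence under a variety of attacks and outperforms existing methods.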

Comments The authors want to withdraw this manuscript for further verification and revision. We may release a substantially revised version in the future

Abstract

Federated learning (FL) enables collaborative model training while preserving data privacy. However, it remains vulnerable to malicious clients who compromise model integrity through Byzantine attacks, data poisoning, or adaptive adversarial behaviors. Existing defense mechanisms rely on static thresholds and binary classification, failing to adapt to evolving client behaviors in real-world deployments. We propose FLARE, an adaptive reputation-based framework that transforms client reliability assessment from binary decisions to a continuous, multi-dimensional trust evaluation. FLARE integrates: (i) a multi-dimensional reputation score capturing performance consistency, statistical anomaly indicators, and temporal behavior, (ii) a self-calibrating adaptive threshold mechanism that adjusts security strictness based on model convergence and recent attack intensity, (iii) reputation-weighted aggregation with soft exclusion to proportionally limit suspicious contributions rather than eliminating clients outright, and (iv) a Local Differential Privacy (LDP) mechanism enabling reputation scoring on privatized client updates. We further introduce a highly evasive Statistical Mimicry (SM) attack, a benchmark adversary that blends honest gradients with synthetic perturbations and persistent drift to remain undetected by traditional filters. Extensive experiments with 100 clients on MNIST, CIFAR-10, and SVHN demonstrate that FLARE maintains high model accuracy and converges faster than state-of-the-art Byzantine-robust methods under diverse attack types, including label flipping, gradient scaling, adaptive attacks, ALIE, and SM. FLARE improves robustness by up to 16% and preserves model convergence within 30% of the non-attacked baseline, while achieving strong malicious-client detection performance with minimal computational overhead. https://github.com/Anonymous0-0paper/FLARE
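
Item (iii) of the FLARE abstract, reputation-weighted aggregation with soft exclusion, can be sketched as follows. The abstract does not give the actual weighting rule, so everything here is an assumption for illustration: the function name `aggregate_with_reputation`, the fixed `threshold`, and the `softness` factor that shrinks (rather than zeroes) the weight of low-reputation clients are all hypothetical. FLARE's threshold is described as adaptive, which this sketch does not model.

```python
from typing import List, Sequence


def aggregate_with_reputation(
    updates: Sequence[Sequence[float]],
    reputations: Sequence[float],
    threshold: float = 0.5,   # hypothetical fixed cut-off
    softness: float = 0.1,    # hypothetical down-weighting factor
) -> List[float]:
    """Weighted average of client updates with soft exclusion.

    Clients at or above `threshold` are weighted by their reputation.
    Clients below it are not dropped; their weight is multiplied by
    `softness`, which proportionally limits their contribution.
    """
    if not updates or len(updates) != len(reputations):
        raise ValueError("need one reputation per client update")

    weights = [r if r >= threshold else r * softness for r in reputations]
    total = sum(weights)
    if total <= 0:
        raise ValueError("all client weights are zero")

    dim = len(updates[0])
    return [
        sum(w * u[i] for w, u in zip(weights, updates)) / total
        for i in range(dim)
    ]
```

Compared with hard filtering, this keeps a suspicious client in the average with a small weight, so a temporarily misjudged honest client still contributes a little signal.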

2511.12034 2026-05-13 cs.CV cs.LG cs.MM

Calibrated Multimodal Representation Learning with Missing Modalities

Xiaohao Liu, Xiaobo Xia, Jiaheng Wei, Shuo Yang, Xiu Su, See-Kiong Ng, Tat-Seng Chua

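AI summary: Multimodal representation learning aims to align information from different modalities in a unified latent space, but existing methods usually require every modality to be present and struggle when some are missing. Taking an anchor-shift perspective, this paper gives a theoretical account of how missing modalities bias the alignment, and proposes CalMRL, which uses cross-modal priors and intrinsic connections between modalities to impute missing modalities and calibrate the alignment at the representation level. Experiments show that the method mitigates anchor shift and improves performance on data with missing modalities.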

Comments Accepted by ICML 2026

2510.25609 2026-05-13 cs.LG cs.AI eess.SP

Revisiting GAN with Bayes-Optimal Discrimination

Mohammadreza Tavasoli Naeini, Ali Bereyhi, Morteza Noshad, Ben Liang, Alfred O. Hero

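AI summary: This paper proposes a modified training method for the standard generative adversarial network (GAN). The core idea is to change the discriminator's objective from cross-entropy loss to direct minimization of the discrimination Bayes error rate (BER). To this end the authors introduce the Bayes-optimal learning threshold (BOLT) loss and train the generator by maximizing a surrogate of the discrimination BER. The approach unifies different GAN training objectives and exposes their trade-off between smoothness and tightness. Under balanced class priors, the authors prove that maximizing the surrogate BER minimizes the total variation distance between the data and generated distributions, and they establish a connection to Wasserstein GAN. Experiments show improved sample quality and coverage on image generation tasks.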
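
The link between the Bayes error rate and total variation mentioned in this summary rests on a standard identity, shown here for the exact BER with equal class priors. The symbols p (data density) and q (generator density) are notation chosen for this note, and the paper's surrogate BER may differ from the exact quantity below.

```latex
\[
\mathrm{BER}(p, q)
  = \int \min\!\Big\{ \tfrac{1}{2}\,p(x),\ \tfrac{1}{2}\,q(x) \Big\}\, dx
  = \tfrac{1}{2}\big( 1 - \mathrm{TV}(p, q) \big),
\qquad
\mathrm{TV}(p, q) = \tfrac{1}{2} \int \lvert p(x) - q(x) \rvert \, dx .
\]
```

The second equality uses the fact that the integral of min{p, q} equals 1 - TV(p, q). Under this identity, a generator that pushes the discriminator's BER up toward 1/2 is pushing TV(p, q) down toward 0.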

2510.24570 2026-05-13 cs.CL

BEST-RQ-Based Self-Supervised Learning for Whisper Domain Adaptation

Raphaël Bagat, Irina Illina, Emmanuel Vincent

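AI summary: This paper proposes BEARD, a new framework for domain adaptation of the Whisper speech recognition model in low-resource settings that lack labeled data. It combines the BEST-RQ self-supervised objective with knowledge distillation, adapting the Whisper encoder on unlabeled data while keeping it complementary to the pretrained decoder. On air traffic control communications, a domain with non-native accents, noise, and specialized terminology, the method uses only 5,000 hours of untranscribed speech and 2 hours of transcribed speech and achieves a 12% relative improvement over previous baselines and fine-tuned models. It is the first work to apply self-supervised learning to Whisper domain adaptation.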

Comments Accepted to ICASSP 2026

2510.06371 2026-05-13 cs.CL cs.AI

OASIS: A Multilingual and Multimodal Dataset for Culturally Grounded Spoken Visual QA

Firoj Alam, Ali Ezzat Shahroor, Md. Arid Hasan, Zien Sheikh Ali, Hunzalah Hassan Bhatti, Mohamed Bayan Kmainasi, Shammur Absar Chowdhury, Basel Mousi, Fahim Dalvi, Nadir Durrani, Natasa Milic-Frayling

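AI summary: OASIS is a large-scale multilingual and multimodal dataset designed to support culturally grounded spoken visual question answering. It contains a large volume of image, text, and speech data covering English and several varieties of Arabic, and is suited to evaluating models on commonsense reasoning, cultural understanding, and real-world scenarios. The study proposes EverydayMMQA, a scalable semi-automatic framework for building localized QA resources, and ensures data quality through multi-stage human verification, providing a resource for training and evaluating multimodal models.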

Comments Multimodal Foundation Models, Large Language Models, Native, Multilingual, Language Diversity, Contextual Understanding, Culturally Informed

2510.05408 2026-05-13 cs.CV cs.AI

See the past: Time-Reversed Scene Reconstruction from Thermal Traces Using Visual Language Models

Kebin Contreras, Luis Toscano-Palomino, Mauro Dalla Mura, Jorge Bacca

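AI summary: This study proposes a time-reversed reconstruction method based on thermal imaging and visual language models, aiming to recover the state of a scene from a few seconds earlier using the thermal traces currently visible. The method couples a visual language model with a constrained diffusion process: the model generates a scene description that guides image reconstruction, keeping semantics and structure consistent. Experiments show that, in controlled settings, the method can reconstruct plausible scene frames from up to 120 seconds in the past, a first step toward time-reversed imaging from thermal traces.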

2510.04265 2026-05-13 cs.AI cs.CL math.ST stat.ML stat.TH

Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation

Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary

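AI summary: This paper proposes a Bayesian framework for evaluating large language models, addressing the unstable and potentially misleading rankings that the conventional Pass@k metric produces when the number of samples is limited. The method estimates each model's underlying success probability together with a credible interval, yielding more stable and statistically meaningful rankings, and supports flexible weighting of rubric categories. Experiments show that the framework converges faster and ranks more stably than Pass@k, distinguishes statistically meaningful differences from noise, and applies to both binary and non-binary evaluation.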

Comments OpenReview (ICLR 2026): https://openreview.net/forum?id=PTXi3Ef4sT

Details

Journal ref: The Fourteenth International Conference on Learning Representations (ICLR), 2026

Abstract

Pass$@k$ is widely used to report the reasoning performance of LLMs, but it often produces unstable and potentially misleading rankings, especially when the number of trials (samples) is limited and computational resources are constrained. We present a principled Bayesian evaluation framework that replaces Pass$@k$ and average accuracy over $N$ trials (avg$@N$) with posterior estimates of a model's underlying success probability and credible intervals, yielding stable rankings and a transparent decision rule for differences. Evaluation outcomes are modeled as categorical (not just 0/1) with a Dirichlet prior, giving closed-form expressions for the posterior mean and uncertainty of any weighted rubric and enabling the use of prior evidence when appropriate. Theoretically, under a uniform prior, the Bayesian posterior mean is order-equivalent to average accuracy (Pass$@1$), explaining its empirical robustness while adding principled uncertainty. Empirically, in simulations with known ground-truth success rates and on AIME'24/'25, HMMT'25, and BrUMO'25, the posterior-based procedure achieves faster convergence and greater rank stability than Pass$@k$ and recent variants, enabling reliable comparisons at far smaller sample counts. The framework clarifies when observed gaps are statistically meaningful (non-overlapping credible intervals) versus noise, and it naturally extends to graded, rubric-based evaluations. Together, these results recommend replacing Pass$@k$ for LLM evaluation and ranking with a posterior-based, compute-efficient protocol that unifies binary and non-binary evaluation while making uncertainty explicit. Source code is available at https://github.com/mohsenhariri/scorio
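
The closed-form posterior described in this abstract can be sketched with the standard Dirichlet-categorical conjugate update. This is a minimal illustration under stated assumptions, not the authors' scorio implementation: the function name `dirichlet_posterior_score` is hypothetical, the default prior is uniform, and the function returns the posterior mean and standard deviation of a weighted rubric score rather than a full credible interval.

```python
import math
from typing import Optional, Sequence, Tuple


def dirichlet_posterior_score(
    counts: Sequence[float],
    weights: Sequence[float],
    prior: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Posterior mean and standard deviation of sum_k weights[k] * p[k].

    counts[k]  : number of trials that landed in outcome category k.
    weights[k] : rubric score assigned to category k (e.g. 0 and 1).
    prior[k]   : Dirichlet concentration; defaults to uniform (all ones).
    """
    k = len(counts)
    if len(weights) != k:
        raise ValueError("counts and weights must have the same length")
    if prior is None:
        prior = [1.0] * k
    if len(prior) != k:
        raise ValueError("prior must have one entry per category")

    # Conjugate update: posterior is Dirichlet(prior + counts).
    post = [a + n for a, n in zip(prior, counts)]
    total = sum(post)
    means = [c / total for c in post]

    mean = sum(w * m for w, m in zip(weights, means))
    # Var(w . p) = (sum_k w_k^2 m_k - mean^2) / (total + 1) for a Dirichlet.
    second_moment = sum(w * w * m for w, m in zip(weights, means))
    variance = (second_moment - mean * mean) / (total + 1.0)
    return mean, math.sqrt(max(variance, 0.0))
```

With two categories and weights (0, 1) this reduces to the Beta-Bernoulli case: 7 successes in 10 trials under a uniform prior gives a posterior mean of 8/12, slightly shrunk toward 1/2 relative to the raw accuracy of 0.7.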

2510.02043 2026-05-13 cs.CV cs.HC cs.LG

Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers

Sahil Bhandary Karnoor, Romit Roy Choudhury

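AI summary: This paper studies zero-shot human pose estimation with a limited number of sensors. The authors cast pose estimation as an inverse problem and propose a diffusion-based inverse solver that conditions generation on rotation measurements only, while a likelihood term from position measurements provides guidance. The method requires no per-user fine-tuning and generalizes zero-shot across users, offering a new approach to pose estimation with few sensors.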

Comments Published as a Conference Paper at The Fourteenth International Conference on Learning Representations

2509.19207 2026-05-13 cs.CV

Long Story Short: Disentangling Compositionality and Long-Caption Understanding in Contrastive VLMs

Israfel Salazar, Desmond Elliott, Yova Kementchedjhieva

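AI summary: This paper studies the difficulty contrastive vision-language models (VLMs) have with long, compositional captions, and analyzes the relationship between compositional reasoning and long-caption understanding. Controlled experiments across training objectives, datasets, and architectural choices reveal a bidirectional but sensitive relationship between the two. High-quality long-caption data with strong visual grounding helps both abilities at once, while certain architectural choices can limit compositional learning. The study offers practical guidance on data selection and model design for improving VLM generalization.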

Comments To be published in Findings of ACL 2026

2509.14933 2026-05-13 cs.LG

DAG: A Dual Correlation Network for Time Series Forecasting with Exogenous Variables

Xiangfei Qiu, Yuhan Zhu, Zhengyu Li, Xingjian Wu, Bin Yang, Jilin Hu

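AI summary: Time series forecasting matters in many domains, and exogenous variables (covariates) can supply additional predictive information and improve accuracy. Existing methods, however, make limited use of exogenous variables, especially future exogenous variables and their correlations with the endogenous variables. This paper proposes DAG, which builds a dual correlation network along the temporal and channel dimensions. It captures the correlations between historical exogenous variables and both future exogenous variables and historical endogenous variables, then injects them into the prediction of future endogenous variables to improve forecasting accuracy.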

Comments Accepted by ICML 2026

2509.10692 2026-05-13 cs.RO

STL-Based Motion Planning and Uncertainty-Aware Risk Analysis for Human-Robot Collaboration with a Multi-Rotor Aerial Vehicle

Giuseppe Silano, Amr Afifi, Martin Saska, Antonio Franchi

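AI summary: This paper proposes a motion planning and risk analysis framework based on signal temporal logic (STL), aimed at improving collaboration between multi-rotor aerial vehicles and humans. STL encodes key mission objectives such as safety, timing constraints, and human comfort, and an optimization-based planner generates feasible trajectories that respect the vehicle's dynamics. An uncertainty-aware risk analysis accounts for uncertainty in the human's pose. Experimental validation shows that the framework achieves safe, efficient, and robust human-robot collaboration under realistic operating conditions.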

Comments 46 pages, 14 figures

2509.09838 2026-05-13 cs.LG cs.AI

Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives

Reza Asad, Reza Babanezhad, Sharan Vaswani

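AI summary: This paper studies the limitations of Soft Actor-Critic in discrete action spaces (DSAC) and proposes principled alternatives. The authors find that DSAC's poor performance stems mainly from the coupling of entropy between the policy and the value function, and that decoupling it improves performance substantially. Building on this, they propose a flexible off-policy actor-critic framework that supports new objectives, and show theoretically and empirically that it performs strongly on Atari games, remaining robust even without entropy regularization or explicit exploration.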

2509.06701 2026-05-13 cs.LG cs.AI

Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks

Su Hyeong Lee, Risi Kondor, Richard Ngo

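AI summary: This paper proposes a probabilistic theory of agency for understanding latent agentic substructures in deep neural networks. It defines an agent by its outcome distribution and epistemic utility, uses weighted logarithmic pooling to study how agents compose, and proves that strict unanimity is achievable under specific conditions. The study also sheds light on agent alignment phenomena in large language models, showing that eliciting a benign agent can induce an adversarial one, and offers a new mathematical framework and insights for aligning agentic AI systems.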

Comments Accepted by ICML 2026

2508.21260 2026-05-13 cs.RO eess.SP math.ST stat.TH

Remarks on stochastic cloning and delayed-state filtering

Tara Mina, Lindsey Marinello, John Christian

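AI summary: This paper studies estimation with delayed-state measurements, which depend on a prior state, as encountered in aerospace navigation and robotics. It focuses on stochastic cloning (SC) and on a long-overlooked alternative, the delayed-state Kalman filter (DSKF). The authors show that a correctly derived DSKF produces the same state and covariance updates as SC without state augmentation. They give two equivalent DSKF formulations that explain, from different perspectives, how correlations with prior-state measurements are handled within the generalized Kalman filter framework. They also show that the DSKF matches SC in computational and storage complexity and can reduce both costs further for certain problem dimensions, dispelling the misconception that the Kalman filter cannot handle correlated delayed-state measurements.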