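arXivDaily: a daily arXiv digest that mirrors the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.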

2605.08723 2026-05-12 cs.CV cs.MM

EAR: Enhancing Uni-Modal Representations for Weakly Supervised Audio-Visual Video Parsing

Huilai Li, Xiaomeng Di, Ying Xing, Yonghao Dang, Yiming Wang, Jianqin Yin

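AI summary: This paper studies weakly supervised audio-visual video parsing (AVVP), which aims to recognize and localize audio, visual, and audio-visual events in videos using only coarse-grained labels. Existing methods focus mostly on multimodal fusion and neglect to guide and preserve uni-modal semantics, which produces noisy pseudo-labels and weak parsing performance. The authors propose a framework that enhances uni-modal representations: a similarity-based label transfer method improves the pseudo-label generator's understanding of uni-modal events, and a soft constraint jointly optimizes uni-modal and multimodal feature modeling to improve event localization. Experiments show that the method outperforms state-of-the-art approaches on both pseudo-label generation and the AVVP task.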

2605.08722 2026-05-12 cs.RO cs.MA

HULK: Large-scale Hierarchical Coordination under Continual and Uncertain Temporal Tasks

Qingyuan Luo, Jie Li, Meng Guo

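AI summary: This paper studies efficient coordination and task allocation for large-scale multi-agent systems in environments where tasks are generated continually and their number is uncertain. The authors propose HULK, a hierarchical coordination framework that allocates tasks to sub-teams on a rolling basis and coordinates dynamically within each sub-team, yielding hierarchical coordination at different granularities and under different triggering conditions. The method is rigorously validated on large-scale heterogeneous systems and substantially improves computational efficiency and system robustness.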

Comments Accepted to the IEEE International Conference on Robotics and Automation. 7 pages, 4 figures

2605.08721 2026-05-12 cs.CL

Breaking the Impasse: Dual-Scale Evolutionary Policy Training for Social Language Agents

Minzheng Wang, Run Luo, Yanbo Wang, Zichen Liu, Yuqiao Tan, Tao Tan, Xu Nan, Yinhe Zheng, Wenji Mao

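AI summary: Social language agents working on open-ended tasks can stall in their evolution because the policy space is enormous. To address this, the paper proposes Dual-Scale Evolutionary Policy Training (DEPT). DEPT detects stagnation with a timescale-aware mechanism and uses asymmetric advantage reshaping to adjust the optimization landscape dynamically, restoring the gradient signal and encouraging continued policy exploration. Experiments show that DEPT significantly outperforms existing methods across several social language games, avoiding policy degradation and sustaining the agents' evolution.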

Comments Accepted to the ACL 2026 Main Conference

2605.08716 2026-05-12 cs.AI cs.CL cs.LG

Bias by Necessity: Impossibility Theorems for Sequential Processing with Convergent AI and Human Validation

Jikun Wu, Dongxin Guo, Siu-Ming Yiu

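AI summary: The paper asks whether certain cognitive biases are mathematical necessities of sequential information-processing architectures, and proves that primacy effects, anchoring, and order dependence are structurally unavoidable in autoregressive language models. Through three impossibility theorems, it characterizes how these biases arise and establishes computational-complexity limits on debiasing methods. The theoretical predictions are validated on several frontier large language models and in human experiments, suggesting that these biases are a rational response to sequential processing under resource constraints.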

Comments 6 pages, 3 figures, 5 tables. Accepted to CogSci 2026

2605.08713 2026-05-12 cs.RO cs.AI

REAP: Reinforcement-Learning End-to-End Autonomous Parking with Gaussian Splatting Simulator for Real2Sim2Real Transfer

Changze Li, Zhe Chen, Shaoyu Chen, Lisen Mu, Yijian Li, Yuelong Yu, Qian Zhang, Qing Su, Ming Yang, Tong Qin

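AI summary: This paper proposes REAP, a reinforcement-learning-based end-to-end autonomous parking method aimed at extreme parking scenarios. REAP combines an asymmetric reinforcement learning framework with behavior cloning to improve training efficiency and inference performance, and uses a soft predictive collision penalty to reduce collision risk. For sim-to-real transfer, the authors build a Real2Sim2Real simulator based on 3D Gaussian Splatting, so the trained model can be deployed directly on real vehicles; REAP parks successfully in a variety of complex scenarios, including mechanical parking slots.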


Abstract

In recent years, autonomous parking has made significant advances, yet parking tasks still face challenges in extreme scenarios such as mechanical and dead-end parking slots, often resulting in failures. This is mainly due to traditional parking methods adopting a multistage approach, lacking the ability to optimize the parking problem as a whole. End-to-end methods enable joint optimization across perception and planning modules to eliminate the accumulation of errors, enhancing algorithm performance in extreme scenarios. Although several end-to-end parking methods use imitation or reinforcement learning, the former is limited by data cost and distribution coverage, while the latter suffers from inefficient exploration. To address these challenges, we propose a Reinforcement learning End-to-end Autonomous Parking method (REAP). REAP employs Soft Actor-Critic (SAC) within an asymmetric reinforcement learning framework to improve training efficiency and inference performance. To accelerate model convergence, we distill the capabilities of a rule-based planner into the end-to-end network through behavior cloning. We further introduce a soft predictive collision penalty mechanism to reduce collision rates by penalizing obstacle-approaching actions. To ensure that the trained reinforcement learning network can directly transfer to real-world scenarios, we have established a Real2Sim2Real simulator. In the Real2Sim step, we use 3D Gaussian Splatting (3DGS) to transform real-world scenes into digital scenes. In the Sim2Real step, we deploy the end-to-end model onto the vehicle to bridge the Sim2Real gap. Trained in the 3DGS simulator and deployed on physical vehicles, REAP successfully parks in various types of parking spaces, especially demonstrating the feasibility of end-to-end RL parking in extremely narrow mechanical slots.



2605.08712 2026-05-12 cs.CV

From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation

Bohan Li, Shuojue Yang, Baorui Peng, Xianda Guo, Erli Zhang, Youqi Tao, Junfeng Duan, Daguang Xu, Qi Dou, Xin Jin, Wenjun Zeng, Hao Zhao, Yueming Jin

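AI summary: This paper studies action-conditioned surgical video generation, whose core challenge is precisely controlling complex image-space evolution through low-dimensional control vectors. The authors propose a framework that lifts articulated kinematics to visual control: robot-arm kinematics are converted into five image-aligned control modalities, and a hierarchically routed visual control scheme dynamically selects the most relevant control modalities and motion scales, improving generation efficiency and control precision. The authors also build a surgical video dataset with fine-grained annotations, and experiments demonstrate the method's advantages in action faithfulness, visual fidelity, and cross-domain generalization.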

2605.08710 2026-05-12 cs.AI

When Can Human-AI Teams Outperform Individuals? Tight Bounds with Impossibility Guarantees

Dongxin Guo, Jikun Wu, Siu-Ming Yiu

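AI summary: This study asks under what conditions human-AI teams can outperform individuals, and derives tight theoretical bounds for confidence-based aggregation rules. Combining signal detection theory with information-theoretic analysis, it obtains four key results (a complementarity theorem, a gain scaling law, an impossibility result, and a multi-class generalization) and validates the predictions on several datasets. The results show that genuine human-AI complementarity is rare, and they offer actionable theoretical guidance for system design.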

Comments 8 pages, 2 figures, 7 tables. Accepted at CogSci 2026
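As background for the kind of question this paper formalizes (not its actual theorems), the classic equal-variance Gaussian signal-detection model gives closed forms for a two-member team. Assume member i independently observes evidence z_i ~ N(±s_i/2, 1), where s_i is its sensitivity d'. Adding the raw evidence with equal weights gives a team d' of (s1 + s2)/√2, which beats the better member only when s_min > (√2 - 1)·s_max; weighting each member's evidence by its own d' gives the ideal-observer value √(s1² + s2²). The sketch below only evaluates these formulas; the function names are illustrative.

```python
import math

def team_dprime_sum(s1: float, s2: float) -> float:
    """d' of a team that adds both members' raw evidence with equal weights."""
    return (s1 + s2) / math.sqrt(2.0)

def team_dprime_optimal(s1: float, s2: float) -> float:
    """d' of a team that weights each member's evidence by its own d' (ideal observer)."""
    return math.hypot(s1, s2)

def sum_rule_beats_best(s1: float, s2: float) -> bool:
    """True when the equal-weight team is more sensitive than its better member."""
    return team_dprime_sum(s1, s2) > max(s1, s2)

# Equal-weight pooling helps only if the weaker member is not too weak:
# the break-even ratio s_min / s_max is sqrt(2) - 1, about 0.414.
print(round(math.sqrt(2) - 1, 3))       # 0.414
print(sum_rule_beats_best(1.0, 0.8))    # True
print(sum_rule_beats_best(1.0, 0.3))    # False
```

The gap between the two rules is one way to see why miscalibrated or naively pooled confidence can leave a team worse than its best member.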

2605.08709 2026-05-12 cs.CV

UniShield: Unified Face Attack Detection via KG-Informed Multimodal Reasoning

Hongrui Li, Yichen Shi, Hongyang Wang, Yuhao Gao, Hui Ma, Jun Feng, Zitong Yu

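AI summary: This paper proposes UniShield, a knowledge-guided multimodal reasoning framework for unified face attack detection that aims to identify both physical spoofing and digital forgery attacks. The method builds a Face Attack Knowledge Graph (FAKG), generates a large number of training samples via Attack-Graph Instruction Tuning (AGIT), and introduces Graph-Consistency Reasoning Optimization (GCRO) to make reasoning more consistent. Experiments show that UniShield performs strongly under multiple detection protocols, markedly improving detection accuracy and reasoning reliability.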

2605.08704 2026-05-12 cs.AI

AgentPSO: Evolving Agent Reasoning Skill via Multi-agent Particle Swarm Optimization

Hyunmin Hwang, Jaemin Kim, Choonghan Kim, Hangeol Chang, Jong Chul Ye

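AI summary: This paper proposes AgentPSO, a framework that improves the reasoning of large language models through multi-agent particle swarm optimization. Each agent is treated as a particle whose state is a natural-language skill; by iteratively updating each agent's skill state and its semantic update direction, the framework improves individual and collective reasoning together. Experiments show that AgentPSO outperforms both static single-agent methods and inference-time-only multi-agent methods, and that the evolved reasoning skills transfer across tasks and across models.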
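AgentPSO adapts particle swarm optimization to natural-language skill states. For readers unfamiliar with the numeric algorithm it borrows from, here is a minimal sketch of classic PSO on a toy objective. The velocity update (inertia plus pulls toward each particle's personal best and the swarm's global best) is the standard one; the coefficients `w`, `c1`, `c2`, the swarm size, and the `sphere` objective are common textbook choices assumed here, not values from the paper.

```python
import random

def sphere(x):
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)

def pso(objective, dim=2, n_particles=10, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position each particle has seen
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position the swarm has seen
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive pull (personal best) + social pull (global best)
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, best_val = pso(sphere)
print(best_val)  # close to 0 for this easy objective
```

In AgentPSO the "position" is text and the "velocity" is a semantic edit direction, so the arithmetic above is an analogy for the update structure, not the paper's procedure.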

2605.08703 2026-05-12 cs.AI cs.CL cs.CV cs.LG

RewardHarness: Self-Evolving Agentic Post-Training

Yuxuan Zhang, Penghui Du, Bo Li, Cong Wei, Junwen Miao, Huaisong Zhang, Songcheng Cai, Yubo Wang, Dongfu Jiang, Yuyu Zhang, Ping Nie, Wenhu Chen, Changqian Yu, Kelsey R. Allen

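AI summary: This study proposes RewardHarness, a self-evolving agentic reward framework for evaluating instruction-guided image editing, addressing the fact that reward models for this evaluation depend on large amounts of human annotation. Starting from a small number of examples, the method iteratively evolves a library of tools and skills, aligning with human preferences without additional training and greatly improving data efficiency. Experiments show that, with only 0.05% of the annotated data, RewardHarness outperforms GPT-5 on an image-editing evaluation benchmark, demonstrating its efficiency and effectiveness for reward modeling.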

Comments Project page: https://rewardharness.com

2605.08702 2026-05-12 cs.CV cs.AI

Gate-and-Merge: Zero-shot Compositional Personalization of Vision Language Models

Guodong Ding, Angela Yao

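AI summary: This paper studies compositional personalization of vision language models, i.e., recognizing or describing several user-defined concepts together at test time. It proposes Gate-and-Merge, a zero-shot framework that achieves compositional personalization without co-occurrence training. The method independently learns a lightweight LoRA adapter with a concept token for each concept; at inference it merges the relevant updates directly in weight space and uses a gating mechanism to suppress irrelevant activations, improving performance in both single-concept and compositional settings.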
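A minimal sketch of gated weight-space merging of per-concept LoRA updates, assuming each concept c has low-rank factors B_c (d_out × r) and A_c (r × d_in) and a scalar gate g_c in [0, 1] that is near 1 when the concept is relevant to the current query. How the paper computes its gates is not described here, so `gates` is simply an input; the concept names, shapes, and the omission of the usual LoRA scaling factor are assumptions.

```python
import numpy as np

def merge_lora(W0, adapters, gates):
    """Return W0 + sum_c gates[c] * (B_c @ A_c).

    W0: (d_out, d_in) frozen base weight.
    adapters: dict concept -> (B, A) with B: (d_out, r), A: (r, d_in).
    gates: dict concept -> gate in [0, 1]; concepts not listed are treated as 0.
    """
    W = W0.copy()
    for concept, (B, A) in adapters.items():
        g = gates.get(concept, 0.0)
        if g > 0.0:
            W += g * (B @ A)  # low-rank update scaled by its gate
    return W

rng = np.random.default_rng(0)
d_out, d_in, r = 4, 3, 2
W0 = rng.normal(size=(d_out, d_in))
adapters = {  # hypothetical user concepts
    "my_dog": (rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in))),
    "my_mug": (rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in))),
}
W_both = merge_lora(W0, adapters, {"my_dog": 1.0, "my_mug": 1.0})  # query mentions both
W_dog = merge_lora(W0, adapters, {"my_dog": 1.0})                   # mug update gated out
```

Because each adapter is trained alone and only summed at inference, no training example ever needs both concepts together, which is the property the summary calls "no co-occurrence training".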

2605.08701 2026-05-12 cs.LG physics.ao-ph

METBRA25Y: Brazil Surface Meteorology Archive with Harmonized Variables and Quality Control

Matheus Lima Castro, William Dantas Vichete, Leopoldo Lusquino Filho

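AI summary: This paper introduces METBRA25Y, a standardized archive that consolidates surface meteorological observations from across Brazil, with hourly records from 2000 to 2025. Through harmonized variable naming, quality control, and metadata annotation, the dataset supports research in environmental science, climatology, hydrology, agriculture, and other fields, and is particularly suited to machine-learning applications that need standardized time series. The authors describe a two-stage quality-control strategy (outlier handling, then temporal and cross-variable consistency checks) and provide detailed station information and data-validation results, giving related research a reliable data foundation.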

Comments 12 pages, 5 figures. Dataset paper describing METBRA25Y, a harmonized archive of hourly Brazilian surface meteorological observations derived from INMET records. Dataset available at Zenodo: 10.5281/zenodo.19964979


Abstract

This data paper describes METBRA25Y, a harmonized archive of hourly surface meteorological observations from Brazil derived from public historical records of the Instituto Nacional de Meteorologia (INMET). The dataset was designed to support reproducible environmental, climatological, hydrological, agricultural, urban-risk, and machine-learning studies that require station-level meteorological time series with standardized variable names and explicit quality-control metadata. The processing workflow ingests annual INMET archives, parses station metadata from raw file headers, normalizes heterogeneous Portuguese column names into a canonical schema, constructs hourly timestamps, consolidates observations by city and station, and exports compressed CSV files together with station manifests, per-station quality flags, daily precipitation aggregates, variable-level failure summaries, and missing-data audits. The quality-control protocol follows a two-stage strategy: first, physically implausible values are converted to missing values and flagged; second, temporal and cross-variable consistency checks generate diagnostic flags without necessarily overwriting the original measurements. The resulting package covers observations between 2000 and 2025, with station-specific temporal coverage, and includes key meteorological variables such as precipitation, air temperature, dew point, relative humidity, atmospheric pressure, wind speed, wind gust, wind direction, and global solar radiation. Based on the summary files included in the current release snapshot, the archive contains 616 unique station codes across variable summaries, of which 605 have coordinates within a broad Brazil plausibility envelope. This paper documents the dataset provenance, file organization, harmonized schema, quality-control rules, technical validation outputs, limitations, and recommended usage practices.
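The two-stage quality-control protocol described above (stage 1 turns physically implausible values into missing values and flags them; stage 2 only adds diagnostic flags) can be sketched as follows. The variable names, plausibility ranges, the 0.5 °C dew-point tolerance, the 10 °C hourly step limit, and the flag strings are all illustrative assumptions, not the archive's published schema or rules.

```python
import math

# Hypothetical stage-1 plausibility ranges; not METBRA25Y's published limits.
PLAUSIBLE = {"temp_c": (-10.0, 50.0), "dewpoint_c": (-30.0, 40.0), "rh_pct": (0.0, 100.0)}
MAX_HOURLY_TEMP_STEP = 10.0  # hypothetical stage-2 limit, degrees C per hour

def quality_control(rows):
    """rows: hourly records (dicts of floats, NaN = missing) for one station, in time order.
    Returns (cleaned_rows, flags); flags[i] lists the flag strings for row i."""
    cleaned, flags = [], []
    # Stage 1: physically implausible values become missing (NaN) and are flagged.
    for row in rows:
        new, f = dict(row), []
        for var, (lo, hi) in PLAUSIBLE.items():
            v = new.get(var, math.nan)
            if not math.isnan(v) and not lo <= v <= hi:
                new[var] = math.nan
                f.append(f"{var}:out_of_range")
        cleaned.append(new)
        flags.append(f)
    # Stage 2: consistency checks only add diagnostic flags; values are kept.
    for i, row in enumerate(cleaned):
        t = row.get("temp_c", math.nan)
        td = row.get("dewpoint_c", math.nan)
        if td > t + 0.5:  # dew point above air temperature; False if either is NaN
            flags[i].append("dewpoint_gt_temp")
        prev = cleaned[i - 1].get("temp_c", math.nan) if i > 0 else math.nan
        if abs(t - prev) > MAX_HOURLY_TEMP_STEP:  # comparisons with NaN are False
            flags[i].append("temp_step")
    return cleaned, flags

rows = [
    {"temp_c": 25.0, "dewpoint_c": 18.0, "rh_pct": 65.0},
    {"temp_c": 26.0, "dewpoint_c": 27.5, "rh_pct": 140.0},  # impossible RH; dew point > temp
    {"temp_c": 99.0, "dewpoint_c": 18.0, "rh_pct": 60.0},   # impossible temperature
    {"temp_c": 12.0, "dewpoint_c": 10.0, "rh_pct": 90.0},
    {"temp_c": 25.0, "dewpoint_c": 10.0, "rh_pct": 50.0},   # 13 C jump within one hour
]
cleaned, flags = quality_control(rows)
```

Note that the suspicious dew point in the second row is flagged but kept, mirroring the "without necessarily overwriting" rule, while the impossible RH and temperature values are set to missing.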



2605.08697 2026-05-12 cs.AI

MBP-KT: Learning Global Collaborative Information from Meta-Behavioral Pattern for Enhanced Knowledge Tracing

Yuhao Jia, Duantengchuan Li, Jinsong Chen, Zhongjie Mao, Mingwen Tong, Yue Li, Xiaoguang Wang

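AI summary: This study proposes MBP-KT, a framework that learns collaborative information from meta-behavioral patterns to improve knowledge tracing (KT) models. By constructing meta-behavior sequences, MBP-KT captures learners' behavioral patterns more effectively and uses a parameter-free module to extract global collaborative information, strengthening the model's prediction of learning states. The method also provides general injection strategies so that the extracted collaborative information can be applied to a wide range of downstream KT models; experiments show significant improvements on several real-world datasets.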

2605.08695 2026-05-12 cs.CV

EditSleuth: A Dataset of Grounded Reasoning Chains for Image-Edit Forensics

Van-Loc Nguyen, AprilPyone MaungMaung, Minh-Triet Tran, Isao Echizen

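AI summary: EditSleuth is a new dataset for image-edit forensics containing 257,725 image-edit triplets. Each sample includes the edited image, the original image, an edit mask, an edit-type label, a difficulty score, and a six-step reasoning chain. The dataset is built deterministically, and every step of each reasoning chain is grounded in computable visual evidence, to support visually grounded edit localization and semantic identification. Experiments show that the dataset effectively teaches models to reason about edits and to produce interpretable forensic explanations.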

2605.08689 2026-05-12 cs.LG cs.AI cs.SI

Structure-Centric Graph Foundation Model via Geometric Bases

Xiaodong He, Haolan He, Ruiyi Fang, Ming Sun, Zhao Kang

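AI summary: The paper proposes a Structure-Centric Graph Foundation Model (SCGFM) to address structural heterogeneity across graph domains and incompatible node feature spaces. Treating graph topology as the core of transferable knowledge, the model introduces learnable geometric bases and aligns graph structures with the Gromov-Wasserstein distance, producing unified structure-aligned latent representations. It also uses a structure-aware feature re-encoding mechanism that unifies node representations without requiring a fixed feature dimension or dataset-specific preprocessing. Experiments show strong in-domain and cross-domain generalization on both graph-level and node-level tasks.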

Comments Accepted by ICML 2026

2605.08688 2026-05-12 cs.AI cs.DB cs.LO

Reconciling Consistency-Based Diagnosis with Actual-Causality-Based Explanations

Leopoldo Bertossi

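AI summary: From the perspective of explainable AI (XAI), this paper establishes connections between consistency-based diagnosis (CBD) and actual causality and causal responsibility. It aims to bridge the gap between these two areas, offering new theoretical support and methodological avenues for XAI and explainable data management. Combining the two areas may lead to deeper causal explanation and diagnosis methods.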

Comments under submission

2605.08686 2026-05-12 cs.AI

Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs

Wenzhi Fang, Liangqi Yuan, Guangchen Lan, Dong-Jun Han, Christopher G. Brinton

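AI summary: This paper proposes an iterative critique-and-routing controller for multi-agent systems built from heterogeneous large language models (LLMs). Existing controllers make only a one-shot model selection and cannot critique intermediate results or refine them iteratively. The proposed controller treats multi-agent collaboration as a finite-horizon Markov decision process: at each step it evaluates the current output, decides whether to keep refining, and selects a suitable model for the next refinement. Experiments on multiple heterogeneous systems and reasoning benchmarks show that the method significantly outperforms existing approaches, substantially narrows the gap to the strongest model, and reduces the number of model calls.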
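The control loop described above (critique the current answer at each step, then either stop or route to a model for another refinement, within a fixed horizon) might look like the following minimal sketch. The `critic` and `router` callables, the acceptance threshold, and the toy models are hypothetical placeholders; in the paper these decisions come from a trained controller, which is not shown.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str, str], str]  # (question, current_answer) -> refined answer

def critique_and_route(question: str,
                       models: Dict[str, Model],
                       critic: Callable[[str, str], float],
                       router: Callable[[str, str, float], str],
                       first_model: str,
                       horizon: int = 4,
                       accept: float = 0.9) -> Tuple[str, List[str]]:
    """Make at most `horizon` model calls; stop early once the critic's score reaches `accept`."""
    trace = [first_model]
    answer = models[first_model](question, "")
    for _ in range(horizon - 1):
        score = critic(question, answer)         # evaluate the current output
        if score >= accept:                      # decision 1: stop refining
            break
        name = router(question, answer, score)   # decision 2: which model refines next
        trace.append(name)
        answer = models[name](question, answer)
    return answer, trace

# Toy usage: a cheap model drafts, the critic rejects the draft, a stronger model fixes it.
models = {"small": lambda q, a: "draft", "large": lambda q, a: "42"}
critic = lambda q, a: 1.0 if a == "42" else 0.2
router = lambda q, a, s: "large"
answer, trace = critique_and_route("What is 6*7?", models, critic, router, first_model="small")
```

The early exit is what lets such a controller save calls: easy questions stop after the first acceptable answer instead of always running the strongest model.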

2605.08685 2026-05-12 cs.LG cs.AI

Event Fields: Learning Latent Event Structure for Waveform Foundation Models

Li Na, Yuanyun Zhang, Shi Li

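AI summary: This paper proposes a new class of waveform foundation models that treat physiological time series as realizations of latent event processes, replacing conventional sequence representations. The approach assumes that clinically meaningful structure arises from temporally extended, interacting events whose boundaries and dynamics are not directly observed. It introduces a self-supervised learning framework that enforces consistency across different random segmentations and time-frequency projections, learning representations that are robust to signal perturbations while preserving event structure, and it shows superior performance and robustness on a range of physiological tasks.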

2605.08673 2026-05-12 cs.LG

PHIDA: Persistence-Guided Node-to-Cluster Mapping for Online Clustering

Naoki Masuyama, Yusuke Nojima, Stefan Wermter, Yuichiro Toda, Hisao Ishibuchi, Chu Kiong Loo

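AI summary: This paper proposes PHIDA, an online clustering method that addresses the unclear mapping from node states to clustering results in existing methods. It combines node learning based on inverse-distance Adaptive Resonance Theory (IDA) with a node-to-cluster mapping constrained by persistent homology (PH), keeping node learning flexible while making the clustering results more stable and robust. Experiments show that PHIDA performs well on multiple benchmark datasets and, particularly in non-stationary settings, outperforms other online clustering methods that update nodes adaptively.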

Comments This paper is currently under review
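To give a feel for how persistent homology can decide which nodes form a cluster, here is a sketch of 0-dimensional persistence over node positions: process pairwise edges in order of length with union-find; each merge ends ("kills") one connected component at that length, and cutting at a threshold yields the clusters. This is equivalent to single-linkage clustering with a distance cutoff. The node coordinates and the threshold are assumptions, and PHIDA's IDA node learning and its actual PH constraint are not reproduced.

```python
import math
from itertools import combinations

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def h0_clusters(nodes, threshold):
    """Return (cluster labels, H0 death scales).

    Every node is born at scale 0; two components merge (one dies) at the length
    of the first edge connecting them. Clusters are the components that exist
    just before edges longer than `threshold` are added.
    """
    n = len(nodes)
    parent = list(range(n))
    edges = sorted((math.dist(nodes[i], nodes[j]), i, j) for i, j in combinations(range(n), 2))
    deaths, roots = [], None
    for length, i, j in edges:
        if roots is None and length > threshold:
            roots = [find(parent, k) for k in range(n)]   # snapshot clusters at the cut
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[rj] = ri
            deaths.append(length)                          # one component dies here
    if roots is None:
        roots = [find(parent, k) for k in range(n)]
    mapping = {}
    labels = [mapping.setdefault(r, len(mapping)) for r in roots]  # relabel to 0, 1, ...
    return labels, deaths

nodes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]  # two well-separated groups
labels, deaths = h0_clusters(nodes, threshold=2.0)
```

The long-lived component (death near 13.45, the gap between the two groups) is what separates the clusters; short-lived deaths at length 1 are within-group merges.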

2605.08671 2026-05-12 cs.CL cs.AI

Explanation Fairness in Large Language Models: An Empirical Analysis of Disparities in How LLMs Justify Decisions Across Demographic Groups

Gautam Veldanda

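AI summary: This paper studies fairness in how large language models (LLMs) explain their decisions across demographic groups. It proposes a five-dimensional Explanation Fairness Taxonomy (EFT) that quantifies disparities in explanation length, sentiment, expression of epistemic uncertainty (hedging), decision-linked variation, and lexical complexity. An empirical analysis across several decision scenarios and mainstream models finds that models differ markedly in explanation fairness, and that some mitigations reduce disparities in decision-linked content but do little for stylistic imbalances, pointing to the important role of pre-training data distributions.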

Comments 10 pages, 4 figures, 9 tables


DOI: 10.5281/zenodo.19957410

Abstract

Large language models (LLMs) are increasingly deployed not only to make decisions but to explain them. While AI decision fairness has been studied extensively, the fairness of AI explanations (whether LLMs justify decisions with equal quality, depth, tone, and linguistic sophistication across demographic groups) has received little attention. This paper introduces the Explanation Fairness Taxonomy (EFT), a framework comprising five formally defined, operationalizable dimensions: Verbosity Disparity, Sentiment Disparity, Epistemic Hedging Disparity, Decision-Linked Explanation Disparity, and Lexical Complexity Disparity. The taxonomy is instantiated in a controlled empirical study across 80 prompt templates, four consequential decision domains (hiring, medical triage, credit assessment, legal judgment), and five LLMs: GPT-4.1, Claude Sonnet, LLaMA 3.3 70B, GPT-OSS 120B, and Qwen3 32B. Two novel black-box metrics are introduced: the Hedging Density Score (HDS) and the Explanation Faithfulness Proxy (EFP), a heuristic indicator of decision-linked explanation variation. Across up to 400 prompt pairs, all eight EFT metrics show statistically significant disparities (Cohen's d ranging from small to large, all p_BH < 10^(-62)). Model choice is strongly associated with disparity magnitude: Qwen3 32B exhibits verbosity disparities 5.9x larger than LLaMA 3.3 70B. Two prompting-based mitigations show significant reductions in EFP disparity (78-95%) but no significant effect on stylistic dimensions, consistent with the hypothesis that stylistic explanation inequalities are encoded in pre-training distributions and are not resolvable through deployment-level instruction alone. A reproducible measurement framework is offered for explanation-level fairness auditing, with implications for AI regulation and deployment practice.
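Two kinds of quantity mentioned above can be sketched in a few lines: Cohen's d (standardized mean difference with a pooled sample standard deviation) for comparing, say, explanation lengths between two demographic variants of the same prompts, and a hedging-density measure. The paper's exact HDS definition is not given here, so `hedging_density`, its small hedge lexicon, and the per-100-token scaling are hypothetical stand-ins, as are the example lengths.

```python
import math
import re
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

# Hypothetical hedge lexicon; a real audit would use a validated word list.
HEDGES = {"may", "might", "perhaps", "possibly", "likely", "unclear"}

def hedging_density(text: str) -> float:
    """Hedge words per 100 tokens (illustrative definition, not the paper's HDS)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 100.0 * sum(t in HEDGES for t in tokens) / len(tokens)

# Explanation lengths (in words) for the same prompts with two demographic variants.
len_group_a = [10, 12, 14]
len_group_b = [8, 10, 12]
print(cohens_d(len_group_a, len_group_b))           # 1.0
print(hedging_density("This might possibly work"))  # 50.0
```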



2605.08670 2026-05-12 cs.AI cs.CL cs.MA

MIND-Skill: Quality-Guaranteed Skill Generation via Multi-Agent Induction and Deduction

Yixuan Li, Mingshu Cai, Ziyang Xiao, Wanyuan Wang, Yanchen Deng, Bo An

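AI summary: This paper proposes MIND-Skill, a framework that automatically generates reusable, quality-guaranteed skills from successful task trajectories to improve AI agents on complex tasks. An induction agent extracts general skills, and a deduction agent reconstructs task trajectories from those skills; multi-objective optimization over a reconstruction loss, an outcome loss, and a rubric loss ensures the quality and applicability of the generated skills. Experiments show that MIND-Skill outperforms existing skill-generation methods on several task benchmarks.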

2605.08666 2026-05-12 cs.LG

The Cancellation Hypothesis in Critic-Free RL: From Outcome Rewards to Token Credits

Tianhao Cheng, Zeyu Huang, Zihan Qiu, Yu Cheng, Edoardo Ponti, Yinghui Xu, Ivan Titov, Zenglin Xu

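AI summary: This paper studies the mechanism of critic-free reinforcement learning (RL) for large language models from a token-level perspective. It reports a token-flipping phenomenon: positive and negative rollouts show remarkably similar proportions of tokens whose probabilities rise or fall during training, and gradient coupling between tokens is especially pronounced between identical tokens that are both predicted with low confidence. Building on this, the authors propose the cancellation hypothesis: for tokens shared by positive and negative rollouts, the opposing signals cancel out, while tokens more specific to successful rollouts receive stronger reinforcement, yielding implicit token-level credit assignment. The paper also proposes two simple but effective batching strategies that encourage cancellation and improve training across several model scales.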


Abstract

A commonly accepted explanation of critic-free RL for LLMs, based on sequence-level rewards, is that it reinforces successful rollouts with a positive advantage while penalizing failed ones. In contrast, we study critic-free RL from a token-level perspective, revealing the token-flipping phenomenon: positive and negative rollouts exhibit remarkably similar proportions of tokens whose probabilities are boosted or suppressed during RL training. To explain this phenomenon, we further show that a token's change in probability is not fully determined by its own advantage; coupled gradient interactions with other tokens also play a non-negligible role. Specifically, these token coupling effects occur primarily between identical tokens that are both predicted with low confidence. Building upon this analysis, we propose the cancellation hypothesis: as a result of coupling, opposing signals cancel out for tokens shared by positive and negative rollouts, while tokens more specific to successful rollouts receive stronger reinforcement, thereby inducing hidden token-level credit assignment from rollout-level rewards. We support this hypothesis with complementary empirical evidence. (1) Compared with training on only positive rollouts, critic-free RL shifts updates from template and formatting tokens toward reasoning tokens; (2) Tokens boosted by critic-free RL consistently demonstrate higher value than suppressed tokens, regardless of whether they originate from positive or negative rollouts. Guided by this view, we implement two batching interventions to encourage or preserve cancellation in critic-free RL training: query-preserved mini-batching and reward-balanced batching. Despite their simplicity, these interventions improve RLVR training across multiple model scales, supporting cancellation as both an explanatory principle and a practical design criterion for critic-free RL training.
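The cancellation idea can be illustrated with a deliberately crude proxy. Assume group-normalized advantages (each rollout's reward minus the group mean, divided by the group standard deviation, as in GRPO-style critic-free RL), and pretend that identical tokens share a single update so that each token type simply accumulates the advantages of the rollouts containing it. Shared template tokens then cancel, while tokens unique to the successful rollout keep a positive signal. This toy ignores the coupled-gradient analysis in the paper; the rollouts and rewards are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def net_token_signal(rollouts, rewards):
    """Sum, per token type, the advantage of every rollout it appears in (once per occurrence)."""
    adv = group_advantages(rewards)
    signal = defaultdict(float)
    for tokens, a in zip(rollouts, adv):
        for tok in tokens:
            signal[tok] += a
    return dict(signal)

rollouts = [
    ["Let", "x", "=", "3", "so", "answer", "is", "6"],  # correct, reward 1
    ["Let", "x", "=", "2", "so", "answer", "is", "4"],  # wrong, reward 0
]
signal = net_token_signal(rollouts, [1.0, 0.0])
# Shared tokens ("Let", "so", "answer", ...) net to 0; "3" and "6" get about +1,
# while "2" and "4" get about -1.
```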



2605.08664 2026-05-12 cs.CV

IPAD-CLIP: Teaching CLIP to Detect Image Local Perceptual Artifacts

Juan Wang, Xinyu Sun, Ke Zhang, Jin Wang, Bing Li, Weiming Hu, Liang Wang

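AI summary: Current image quality assessment methods focus mainly on global distortions (such as noise and blur) and overlook local perceptual artifacts (such as ghosting, lens flare, and moiré patterns). To address this, the paper introduces the Image Perceptual Artifact Detection (IPAD) task and builds a benchmark of 3,520 annotated images. Building on CLIP, the authors design the IPAD-CLIP framework, which learns artifact-related semantic embeddings to better recognize subtle local artifacts; experiments show that it outperforms state-of-the-art methods in both resource efficiency and detection performance.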

Comments 14 pages, 6 figures

2605.08663 2026-05-12 cs.CV

CAST: Channel-Aware Spatial Transfer Learning with Pseudo-Image Radar for Sign Language Recognition

Md. Shakhoyat Rahman Shujon, Sheikh Md. Galib Mahim, Md. Milon Islam, Md Rezwanul Haque, Md Rabiul Islam, Hamdi Altaheri, Fakhri Karray

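AI summary: This paper proposes CAST, a dual-stream architecture for isolated sign language recognition that uses only the echo amplitude of a 60 GHz radar. The method combines three physics-informed modules with pretrained vision networks and, through channel-aware spatial transfer learning, improves the representation of radar signals. Its core components are an inverse transform of the log-compressed signal, cross-antenna spatial attention, and cross-attention fusion between heterogeneous networks. In five-fold cross-validation it reaches 80.5% top-1 accuracy, outperforming the best existing single-model baseline.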

Comments Accepted for the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), MSLR Workshop @ CVPR 2026 in Denver (Colorado, USA)

2605.08660 2026-05-12 cs.LG

Optimised Support Vector Regression for California Housing Price Prediction: The Critical Role of Feature Engineering and Hyperparameter Tuning

Emmanuel Adutwum

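AI summary: Support vector regression (SVR) has performed poorly on California housing price prediction; this paper substantially improves its accuracy through feature engineering, hyperparameter tuning, and standardization. The study constructs 10 domain-driven derived features, runs a cross-validated hyperparameter search, and uses ablation experiments to measure each component's contribution. The optimized SVR reaches R² = 0.723 on the test set, about 20% better than previously reported results, and ranks fourth among ten models, showing that SVR is effective when properly configured.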

Comments 25 pages, 13 figures, 10 tables

2605.08658 2026-05-12 cs.LG cs.AI cs.SE

Sketch-and-Verify: Structured Inference-Time Scaling via Program Sketching

Shan Jiang, Zijian Yi, Chenguang Zhu

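AI summary: This paper proposes Sketch-and-Verify, a method for efficient inference-time scaling through program sketching. A large language model generates program sketches that encode different algorithmic strategies and fills each sketch with several implementations, producing structurally diverse candidate solutions; these are then filtered by execution-based verification and fingerprint clustering. Experiments show that, at a fixed model tier, the method significantly outperforms conventional sampling on code generation, with especially large gains in efficiency and effectiveness under resource constraints.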
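The filtering stage can be sketched as: run every candidate on the known tests (execution-based verification), fingerprint each survivor by its outputs on extra probe inputs, group candidates with identical fingerprints, and return a representative of the largest group. The candidate programs, tests, probe inputs, and the largest-cluster selection rule are assumptions for illustration; sketch generation itself and the paper's exact selection policy are not reproduced. This toy also does not guard against candidates that never terminate.

```python
from collections import defaultdict

def run(fn, x):
    """Execute a candidate; crashes map to a sentinel string so they still fingerprint."""
    try:
        return repr(fn(x))
    except Exception as e:
        return f"<error {type(e).__name__}>"

def select(candidates, tests, probes):
    # 1) Execution-based verification against known input/output pairs.
    passing = [c for c in candidates if all(run(c, x) == repr(y) for x, y in tests)]
    # 2) Fingerprint = outputs on probe inputs; identical behavior -> same cluster.
    clusters = defaultdict(list)
    for c in passing:
        clusters[tuple(run(c, p) for p in probes)].append(c)
    if not clusters:
        return None
    # 3) Return a representative of the most populated behavioral cluster.
    return max(clusters.values(), key=len)[0]

# Hypothetical candidates for "sum of the decimal digits of n >= 0".
def digit_sum_str(n): return sum(int(d) for d in str(n))
def digit_sum_loop(n):
    s = 0
    while n:
        s, n = s + n % 10, n // 10
    return s
def digit_sum_mod9(n): return n % 9        # passes the small tests, wrong in general
def digit_count(n): return len(str(n))     # fails the tests outright

TESTS = [(0, 0), (7, 7), (123, 6)]
PROBES = [99, 1000, 58]
best = select([digit_sum_str, digit_sum_loop, digit_sum_mod9, digit_count], TESTS, PROBES)
```

Here `digit_sum_mod9` survives verification but lands alone in its own fingerprint cluster (outputs 0, 1, 4 on the probes), so the two behaviorally identical correct implementations win.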

2605.08657 2026-05-12 cs.LG cs.AI

Fitting Multilinear Polynomials for Logic Gate Networks

Youngsung Kim

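AI summary: This paper studies learnable logic gate networks, which build combinational circuits by stacking layers of two-input Boolean gates. Each Boolean gate corresponds to a multilinear polynomial in a 4-dimensional space, which turns training into a vector quantization problem. The author proposes an improvement based on a covariance Jacobian matrix that mitigates the vanishing gradients earlier methods suffer as depth increases, and reports better performance on multiple datasets.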
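One concrete reading of the "multilinear polynomial in a 4-dimensional space" statement: every two-input Boolean function can be written as f(a, b) = c0 + c1·a + c2·b + c3·a·b, so each of the 16 gates is a point c in R⁴ fixed by its truth table, and snapping a learned continuous c to the nearest gate is a vector-quantization step. The sketch computes the 16 coefficient vectors and a nearest-gate lookup; the truth-table ordering, the squared Euclidean metric, and the gate names are assumptions, and the paper's covariance-Jacobian method is not shown.

```python
from itertools import product

def coeffs(truth):
    """truth = (f(0,0), f(0,1), f(1,0), f(1,1)) -> (c0, c1, c2, c3)
    such that f(a, b) = c0 + c1*a + c2*b + c3*a*b on {0, 1}^2."""
    f00, f01, f10, f11 = truth
    return (f00, f10 - f00, f01 - f00, f11 - f10 - f01 + f00)

# All 16 two-input gates, keyed by truth table.
GATES = {truth: coeffs(truth) for truth in product((0, 1), repeat=4)}

def evaluate(c, a, b):
    """Evaluate the polynomial; for a, b in [0, 1] this acts as a relaxed (soft) gate."""
    c0, c1, c2, c3 = c
    return c0 + c1 * a + c2 * b + c3 * a * b

def quantize(c):
    """Snap a learned coefficient vector to the truth table of the nearest gate."""
    return min(GATES, key=lambda t: sum((x - y) ** 2 for x, y in zip(GATES[t], c)))

AND, OR, XOR = (0, 0, 0, 1), (0, 1, 1, 1), (0, 1, 1, 0)
print(GATES[XOR])                                 # (0, 1, 1, -2)
print(quantize((0.1, 0.9, 1.1, -1.2)) == OR)      # True
```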

2605.08653 2026-05-12 cs.AI

C2L-Net: A Data-Driven Model for State-of-Charge Estimation of Lithium-Ion Batteries During Discharge

Khoa Tran, T. Nguyen-Thoi, Vin Nguyen-Thai, Duong Tran Anh, Hung-Cuong Trinh, Tri Le

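AI summary: This paper proposes C2L-Net, a data-driven model for accurately estimating the state of charge (SOC) of lithium-ion batteries during discharge. The model estimates SOC in real time from only a short 20-second history window, avoiding the high computational cost and padding-induced positional bias that come with long history sequences. C2L-Net uses a context-to-latest framework that separates context encoding from the latest measurement, combining chunk-based feature extraction, causal encoding, and a recursive-filtering-inspired decoder; this substantially improves computational efficiency and responsiveness to changing conditions, and experiments under several fixed-temperature conditions show superior accuracy and efficiency.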


Abstract

Accurate state-of-charge (SOC) estimation is critical for the safe and efficient operation of lithium-ion batteries in battery management systems (BMS). Although data-driven approaches can effectively capture nonlinear battery dynamics, many existing methods rely on long historical input sequences, resulting in high computational cost and introducing padding-induced positional bias at the beginning of drive cycles. To address these limitations, we propose C2L-Net, a novel context-to-latest data-driven framework for realistic online SOC estimation using only a short historical window (20 s). Unlike existing short-receptive-field or long-history models, the proposed framework explicitly separates contextual encoding from latest-measurement updating, enabling both efficient temporal modeling and rapid adaptation to dynamic battery states. The proposed model incorporates a chunk-based feature extraction mechanism that combines Theta Attention Pooling with a Fourier-based Seasonality Basis to capture local temporal patterns while reducing sequence length. A causal context encoder, integrating a gated recurrent unit (GRU) with Causal Cosine Attention, models temporal dependencies without information leakage. Furthermore, a latest-measurement decoder, inspired by recursive filtering, updates the contextual state using the most recent measurement, enhancing responsiveness to dynamic operating conditions. Extensive experiments on a public lithium-ion battery drive-cycle dataset under multiple fixed-temperature conditions demonstrate that the proposed method achieves state-of-the-art or competitive accuracy while significantly improving computational efficiency. In particular, C2L-Net achieves up to 60 times faster inference and requires fewer parameters than recent data-driven baselines, while maintaining robust performance across unseen driving profiles.
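A minimal sketch of the input handling the abstract describes: a causal 20 s window that ends at the current time, split into chunks that are pooled to shorten the sequence, with the newest sample returned separately for the latest-measurement update. The 1 Hz sampling rate, the 5-sample chunk length, mean pooling (a stand-in for Theta Attention Pooling and the Fourier seasonality basis), and the choice to wait for a full window rather than pad are all assumptions of this sketch.

```python
from statistics import mean

WINDOW = 20  # samples in the 20 s history, assuming 1 Hz sampling
CHUNK = 5    # hypothetical chunk length

def make_inputs(signal, t):
    """Build (chunk_features, latest) for time index t using only samples up to t.

    Returns None until a full window exists (no zero-padding at the start).
    chunk_features: one pooled value per chunk (mean pooling as a stand-in).
    latest: the most recent measurement, handed separately to the update step.
    """
    if t + 1 < WINDOW:
        return None
    window = signal[t + 1 - WINDOW : t + 1]        # causal: ends at t, never looks ahead
    chunks = [window[i : i + CHUNK] for i in range(0, WINDOW, CHUNK)]
    return [mean(c) for c in chunks], window[-1]

voltage = [4.2 - 0.01 * k for k in range(30)]      # toy single-channel discharge trace
print(make_inputs(voltage, 10))                    # None: not enough history yet
feats, latest = make_inputs(voltage, 25)           # 4 pooled chunks, latest = voltage[25]
```

Pooling 20 samples into 4 chunk features is what shortens the sequence the encoder has to process; real inputs would carry voltage, current, and temperature channels rather than one toy series.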



2605.08651 2026-05-12 cs.CV cs.AI cs.LG

Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection

Lei Wang, Wenxiang Diao, Andrew Busch, Jun Zhou, Yongsheng Gao

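AI summary: This paper studies privacy-aware video anomaly detection and proposes a method that protects privacy through orthogonal subspace projection. Its core components, an Orthogonal Projection Layer (OPL) and a Guided Orthogonal Projection Layer (G-OPL), remove task-irrelevant feature variation and suppress facial-attribute information while retaining non-identifying cues such as actions and poses. The method preserves detection performance while protecting privacy, and the paper also introduces a privacy-aware evaluation framework; experiments show improved detection accuracy together with effective filtering of sensitive information.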

Comments Accepted as a Spotlight paper at the Forty-Third International Conference on Machine Learning (ICML 2026)
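The linear-algebra step behind orthogonal subspace projection can be sketched as follows, assuming the identity-related (for example facial-attribute) directions are the columns of a matrix U. Features are mapped through P = I - U(UᵀU)⁻¹Uᵀ, which removes every component lying in span(U) and leaves the orthogonal complement untouched. How OPL and G-OPL learn or guide U is not shown; the random U and features are placeholders.

```python
import numpy as np

def orthogonal_complement_projector(U):
    """P = I - U (U^T U)^{-1} U^T: projects onto the orthogonal complement of span(U)."""
    d = U.shape[0]
    return np.eye(d) - U @ np.linalg.solve(U.T @ U, U.T)

rng = np.random.default_rng(0)
d, k = 8, 2
U = rng.normal(size=(d, k))          # placeholder "identity" directions (columns)
P = orthogonal_complement_projector(U)

features = rng.normal(size=(5, d))   # five feature vectors, one per row
cleaned = features @ P               # P is symmetric, so this projects every row
```

Because P is idempotent, applying it twice changes nothing, and any feature direction orthogonal to U (the part meant to carry action and pose cues in this picture) passes through unchanged.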

2605.08648 2026-05-12 cs.LG q-bio.NC

FLUX: Geometry-Aware Longitudinal Flow Matching with Mixture of Experts

Josue Ortega Caro, Yongxu Zhang, Hannah M Batchelor, Sizhuang He, Jessica Cardin, Shreya Saxena

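AI summary: Many biological systems evolve through continuous local dynamics while switching between latent states defined by learning, stimulus context, internal state, or developmental stage. Such processes are often observed only as unpaired longitudinal snapshots, which makes trajectory modeling and state identification difficult. This paper proposes FLUX, a geometry-aware longitudinal flow-matching framework with a mixture of experts that jointly models transport and performs unsupervised state discovery. FLUX learns a data-dependent metric to construct geometry-aware conditional paths and decomposes the velocity field into sparse expert vector fields selected by straight-through Gumbel-Softmax routing; across several biological systems it reconstructs longitudinal transport and recovers interpretable state structure.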
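A forward-pass sketch of velocity-field routing with a straight-through Gumbel-Softmax, assuming K experts that are hypothetical linear vector fields v_k(x) = A_k x + b_k and router logits supplied as input. The sample adds Gumbel noise to the logits, applies a temperature-tau softmax, and uses the one-hot argmax in the forward pass. In an autodiff framework the straight-through estimator is usually written `hard + soft - stop_gradient(soft)` so gradients flow through the soft weights; NumPy has no autodiff, so only forward values are computed here.

```python
import numpy as np

def gumbel_softmax_st(logits, tau, rng):
    """Return (hard one-hot, soft probabilities) for one Gumbel-Softmax sample."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))                  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    soft = np.exp(y - y.max())               # numerically stable softmax
    soft /= soft.sum()
    hard = np.zeros_like(soft)
    hard[np.argmax(soft)] = 1.0
    # With autodiff: weights = hard + soft - stop_gradient(soft) (straight-through).
    return hard, soft

def routed_velocity(x, logits, experts, tau, rng):
    """Velocity = sum_k w_k * (A_k x + b_k), with hard (one-hot) routing weights w."""
    hard, _ = gumbel_softmax_st(logits, tau, rng)
    return sum(w * (A @ x + b) for w, (A, b) in zip(hard, experts)), hard

rng = np.random.default_rng(0)
dim, K = 3, 4
experts = [(rng.normal(size=(dim, dim)), rng.normal(size=dim)) for _ in range(K)]
x = rng.normal(size=dim)
logits = np.array([0.1, 2.0, -1.0, 0.5])   # hypothetical router output for this x
v, hard = routed_velocity(x, logits, experts, tau=0.5, rng=rng)
```

The one-hot forward pass keeps each state's velocity field sparse (exactly one expert active), which is what lets expert assignments be read as discovered states.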