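arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.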

2605.08302 2026-05-12 cs.LG cs.AI

SGC-RML: A reliable and interpretable longitudinal assessment for PD in real-world DNS

Wenbin Wei, Ruixiang Gao, Suyuan Yao, Xuanzhen Zhao, Cheng Huang, Hen-Wei Huang

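AI summary: This paper proposes SGC-RML, a reliable and interpretable method for longitudinal assessment of Parkinson's disease that addresses heterogeneous multimodal data, cross-device bias, and incomplete labels in real-world settings. It builds a shared 8-dimensional symptom node space that unifies motor and non-motor symptom representations, and adds uncertainty estimation, conformal calibration, and selective decision routing to support symptom prediction, assessment rejection, and retest recommendations. Experiments on several real-world datasets show strong performance, indicating its potential for accurate, calibrated, and interpretable longitudinal PD assessment under incomplete multimodal conditions.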

Comments Preprint. The first five authors contributed equally. Corresponding author: Hen-Wei Huang. 9 pages main text + appendix; 4 figures, 5 tables in main text

Abstract

Real-world digital Parkinson's disease assessment faces challenges such as heterogeneous modalities, cross-device bias, and incomplete labeling. Existing methods often focus on average predictive performance, lacking the reliability mechanisms needed for retrospective reliability-aware assessment - namely, determining when the model is reliable, when to reject an assessment, when to retest, and on which symptom dimensions the predictions are based. This paper proposes SGC-RML, which maps speech, gait, wearable motion, mobility tasks, and clinical variables to a shared 8-dimensional symptom node space (7 clinical symptom nodes and 1 reliability_state auxiliary node), unifying motor and non-motor representations through a symptom atlas. By jointly introducing uncertainty estimation, conformal calibration, and selective decision routing, the model can not only predict symptoms and severity but also reject assessments or suggest retests when evidence is insufficient. We validate this framework on five real-world PD datasets, covering classification, regression, event detection, and longitudinal severity prediction. Experiments show that SGC-RML achieves an MAE of 4.579 / R^2 of 0.772 on PPMI, an AUC of 0.953 on mPower, and an AUC of 0.825 on PADS. Under leak-free temporal anchoring, as few as 5 subject-specific anchors transform UCI from an essentially non-predictive subject-independent setting (motor MAE 8.38, CCC 0.02) into a calibrated longitudinal assessment (motor MAE 3.24, CCC 0.756) with split-conformal coverage held at the 0.80 target. Under the Daphnet LOSO protocol, it achieves an F1 of 0.803 / AUC of 0.872. These results demonstrate that SGC-RML provides a unified paradigm for accurate, calibrated, auditable, and symptom-interpretable retrospective longitudinal assessment of PD under incomplete multimodal conditions.
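The split-conformal coverage target mentioned in the abstract can be illustrated with a minimal sketch. This is generic split conformal prediction for regression, not the authors' implementation: the function names, the symmetric absolute-residual score, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def conformal_half_width(calibration_residuals, target_coverage):
    """Split-conformal interval half-width from held-out absolute residuals."""
    n = len(calibration_residuals)
    # Finite-sample corrected rank: the ceil((n + 1) * coverage)-th smallest residual.
    rank = math.ceil((n + 1) * target_coverage)
    if rank > n:
        # Too few calibration points to certify this coverage level.
        return math.inf
    return sorted(calibration_residuals)[rank - 1]


def predict_interval(point_prediction, half_width):
    """Symmetric prediction interval around a point prediction."""
    return point_prediction - half_width, point_prediction + half_width


def empirical_coverage(predictions, targets, half_width):
    """Fraction of targets that fall inside their prediction intervals."""
    hits = 0
    for y_hat, y in zip(predictions, targets):
        low, high = predict_interval(y_hat, half_width)
        hits += low <= y <= high
    return hits / len(targets)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (hypothetical): predictions equal the truth plus Gaussian noise.
    truth = [rng.uniform(0, 40) for _ in range(2000)]
    preds = [y + rng.gauss(0, 3) for y in truth]
    # First half calibrates the interval width, second half checks coverage.
    residuals = [abs(y - p) for y, p in zip(truth[:1000], preds[:1000])]
    width = conformal_half_width(residuals, target_coverage=0.80)
    print("half-width:", round(width, 3))
    print("test coverage:", round(empirical_coverage(preds[1000:], truth[1000:], width), 3))
```

When calibration and test points are exchangeable, the interval built this way covers the true value with probability at least the target level, which is the kind of guarantee a 0.80 coverage target refers to.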

2605.08301 2026-05-12 cs.LG cs.AI

Priming: Hybrid State Space Models From Pre-trained Transformers

Aditya Chattopadhyay, Elvis Nunez, Prannay Kaul, Benjamin Bowman, Evan Becker, Luca Zancato, David Thomas, Wei Xia, Stefano Soatto

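AI summary: This work proposes Priming, which initializes a hybrid state space model (Hybrid SSM) from a pre-trained Transformer, turning hybrid architecture design from a from-scratch training problem into a knowledge transfer problem and sharply reducing training cost. Priming recovers downstream quality using less than 0.5% of the source model's pre-training token budget and applies across Transformer families, model classes, and scales. Experiments show that primed hybrid models perform well on long-context reasoning tasks while enabling up to 2.3x higher decode throughput than the Transformer.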

Abstract

Hybrid State-Space models combine Attention with recurrent State-Space Model (SSM) layers, balancing eidetic memory from Attention with compressed fading memory from SSMs. This yields smaller Key-Value caches and faster decoding than Transformers, along with a richer architectural design space. Exploring that design space at scale has so far required training from scratch, a barrier that has kept most large-model Hybrid research within a narrow range of architectures. We introduce Priming, a method that turns Hybrid architecture design from a pre-training problem into a knowledge transfer one. Priming initializes a Hybrid model from a pre-trained Transformer and, through short alignment and post-training phases, recovers downstream quality using less than 0.5% of the source model's pre-training token budget. Priming is agnostic to the source Transformer family (e.g., Qwen, Llama, Mistral), model class (dense or Mixture-of-Experts), and model scale. Priming enables us to run the first controlled comparison of SSM layer types at scale under identical conditions. We evaluate Gated KalmaNet (GKA), Gated DeltaNet (GDN), and Mamba-2, and show that their expressiveness hierarchy, GKA>GDN>Mamba-2, directly predicts downstream performance on long-context reasoning tasks. We scale Priming to 8B/32B reasoning models with native 128K contexts. Our Hybrid GKA 32B improves over its source Qwen3-32B by +3.8 average reasoning points, while staying within 1% of a Transformer post-trained on the same data and enabling up to 2.3x higher decode throughput. To foster research on Hybrid architectures, we release a model zoo of primed Hybrid models for long-context reasoning and instruction following, together with the Priming training and inference code (Sequence Parallelism algorithms for long-context training, optimized GKA kernels, and vLLM serving plugin), all under the Apache 2.0 License.

2605.08300 2026-05-12 cs.LG cs.AI cs.CL

mHC-SSM: Manifold-Constrained Hyper-Connections for State Space Language Models with Stream-Specialized Adapters

Abdulvahap Mutlu, Şengül Doğan, Türker Tuncer

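AI summary: This paper proposes mHC-SSM, a state space language model architecture that applies manifold-constrained hyper-connections, constraining the residual stream mixing matrices to the manifold of doubly stochastic matrices to improve stability. The method expands the residual stream around the SSM block into multiple parallel streams, aggregates and scatters information across streams through simplex-constrained pre-mixing and post-mixing, and adds stream-specialized adapters to increase capacity. On WikiText-2, mHC-SSM lowers validation loss and perplexity relative to a single-stream baseline, with predictable efficiency tradeoffs.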

Comments 28 Pages, 3 Figures, all implementation code available at: https://github.com/abdulvahapmutlu/mhc-slm

Abstract

Manifold-Constrained Hyper-Connections (mHC) introduce a stability-motivated variant of multi-stream residual mixing by constraining residual stream mixing matrices to the manifold of doubly stochastic matrices via Sinkhorn-Knopp projection. In this work, we study whether mHC-style constrained multi-stream residual topology transfers effectively to state space model (SSM) language modeling. We implement a static mHC mechanism around an SSM block by expanding the residual stream into multiple parallel streams, aggregating streams into a single SSM input through simplex-constrained pre-mixing, scattering the SSM output back to streams through simplex-constrained post-mixing, and applying Sinkhorn-projected residual stream mixing at each layer. We further introduce stream-specialized adapters that add lightweight stream-specific capacity through a shared bottleneck with per-stream scaling, applied both before stream aggregation and after the SSM output prior to scattering. We evaluate baseline single-stream SSM, static mHC SSM, and mHC SSM with adapters on WikiText-2 using identical training settings and report checkpoint-based validation loss, perplexity, throughput, and peak GPU memory. Under the reported fair checkpoint evaluation, static mHC improves validation loss from 6.3507 to 6.2448 and reduces perplexity from 572.91 to 515.35, while mHC with adapters further improves validation loss to 6.1353 and perplexity to 461.88. These gains are accompanied by modest throughput reductions from 1025.52 to 964.81 and 938.90 tokens per second, and increased peak memory from 2365 MB to 2568 MB and 3092 MB. The results suggest that mHC-inspired constrained multi-stream residual mixing can yield measurable quality improvements in SSM language models and that stream-specialized adapter capacity can further enhance performance with predictable efficiency tradeoffs.
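The Sinkhorn-Knopp projection named in the abstract can be sketched in a few lines of plain Python. This is a generic illustration rather than the code from the linked repository: the function names, the choice to exponentiate unconstrained logits before normalizing, and the iteration count are assumptions.

```python
import math


def sinkhorn_knopp(logits, num_iters=50):
    """Map a square matrix of real logits to an approximately doubly stochastic matrix."""
    n = len(logits)
    # Assumption: exponentiate so that every entry is strictly positive.
    matrix = [[math.exp(v) for v in row] for row in logits]
    for _ in range(num_iters):
        # Row step: make every row sum to 1.
        matrix = [[v / sum(row) for v in row] for row in matrix]
        # Column step: make every column sum to 1.
        col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
        matrix = [[matrix[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return matrix


def mix_streams(mixing, streams):
    """Mix n residual streams (lists of floats) with an n x n mixing matrix."""
    n, width = len(streams), len(streams[0])
    return [
        [sum(mixing[i][j] * streams[j][k] for j in range(n)) for k in range(width)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical 3-stream example with arbitrary logits.
    logits = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7], [0.0, 0.8, -0.2]]
    mixing = sinkhorn_knopp(logits)
    print("row sums:", [round(sum(row), 6) for row in mixing])
    print("col sums:", [round(sum(mixing[i][j] for i in range(3)), 6) for j in range(3)])
```

Because the rows and columns of the resulting matrix each sum to one, mixing with it leaves the sum of the streams unchanged and makes each output stream a convex combination of the inputs, which is one way to read the stability motivation stated in the abstract.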

2605.08298 2026-05-12 cs.LG cs.AI

What Cohort INRs Encode and Where to Freeze Them

Vasiliki Sideri-Lampretsa, Sophie Starck, Robbie Holland, Julian McGinnis, Daniel Rueckert

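AI summary: This study examines which encoder layers of cohort-trained implicit neural representations (INRs) are transferable and what information those layers encode. Experiments find that freezing the layer of the shared encoder with the highest weight stable rank gives the best performance, outperforming conventional fine-tuning. The authors then decompose INR activations with sparse autoencoders (SAEs) and find that SIREN and FFMLP reach similar fitting quality under cohort training but learn fundamentally different dictionary atoms: SIREN atoms are localized, whereas FFMLP atoms span the whole image and trace the contours of the memorized signal. The results offer a mechanistic account of INR transferability and suggest directions for architectures designed with generalization in mind.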

Comments 9 content pages plus appendix

2605.08297 2026-05-12 cs.LG cs.AI

A Qualitative Test-Risk Mechanism for Scaling Behavior in Normalized Residual Networks

Daning Cheng, Zeyu Liu, Jun Sun, Fen Xia, Boyang Zhang, Dongping Liu, Yunquan Zhang

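AI summary: This paper studies how test performance improves as normalized residual networks are scaled, asking when increasing network depth yields a provable improvement in test risk. The authors propose a unified analytical framework that decomposes the problem into representation gain, optimization gain, and generalization transfer. Under a first-order descent condition near zero initialization, they prove that the expanded model class contains an auxiliary model with smaller test risk. Using the norm control provided by the normalized residual structure, they also establish an upper bound on the Rademacher complexity of the expanded model class, yielding two complementary test-risk guarantees and a theoretical basis for why deepening residual networks improves test performance.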

2605.08296 2026-05-12 cs.CV eess.SP

BenchHAR: Benchmarking Self-Supervised Learning for Generalizable Sensor-based Activity Recognition

Yize Cai, Rui Feng, Anlan Yu, Baoshen Guo, Zhiqing Hong

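AI summary: This paper presents BenchHAR, a unified benchmark framework for evaluating how well self-supervised learning methods generalize in sensor-based human activity recognition (HAR). To address heterogeneous wearable sensor data and scarce labels, BenchHAR curates a large-scale dataset and systematically evaluates eight representative self-supervised methods across twelve encoder-classifier architectures. The hybrid approach combining reconstruction and contrastive pretraining performs best overall, and the study also characterizes how data scale, device type, and body position affect generalization, providing guidance for building more generalizable HAR systems.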

Comments 25 pages

Abstract

Human Activity Recognition (HAR) from wearable sensors supports broad healthcare and behavior science applications. However, data heterogeneity and the scarcity of labeled data limit its real-world generalization. Recent advances in self-supervised learning (SSL) in vision and language domains have shown strong capability for learning generalizable representations from unlabeled data. Yet, few studies have systematically compared the generalization performance of SSL methods or explored how to adapt them for generalizable HAR. To address these gaps, we present BenchHAR, a unified framework for evaluating the generalization capability of SSL methods for sensor-based HAR on unseen target distributions. BenchHAR curates a large-scale dataset (~258K samples) and evaluates eight representative SSL methods across 12 encoder-classifier architectures. Our results reveal that existing SSL methods struggle to achieve satisfactory generalization performance. We find that: (1) For HAR models, the hybrid paradigm (combining reconstruction and contrastive pretraining) achieves the best overall performance. The CNN encoder exhibits the strongest ability to learn generalizable representations, while more expressive classifier architectures further improve generalization. (2) For data scale, increasing the amount of pretraining data from downstream activity classes consistently improves generalization, while adding more labeled data yields limited gains. Interestingly, incorporating unlabeled data from non-downstream activity classes does not improve generalization. (3) Sensor data collected from custom-grade devices generalizes better than that from research-grade devices, and data from limb transfers more effectively to trunk positions. BenchHAR provides a unified benchmark and actionable insights for generalizable sensor-based HAR systems. Our code is available at https://github.com/saiketa/HAR-Bench.

2605.08295 2026-05-12 cs.LG cs.AI cs.CL

In-Context Fixation: When Demonstrated Labels Override Semantics in Few-Shot Classification

Ming Liu

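AI summary: This paper studies over-reliance on demonstration labels in few-shot classification. It finds that when the demonstration labels are semantically consistent, classification accuracy drops sharply, even below 12%. Experimental analysis shows that the model relies mainly on the label words in the demonstrations rather than on semantic understanding when generating answers, a phenomenon termed "in-context fixation". Using activation patching and the logit lens, the study localizes the responsible network components and verifies that the phenomenon appears widely across models and tasks.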

Comments 12 pages (10 main + 2 appendix), 4 figures, 5 tables

2605.08292 2026-05-12 cs.LG cs.AI math.OC

Hierarchical Mixture-of-Experts with Two-Stage Optimization

Gleb Molodtsov, Alexander Miasnikov, Aleksandr Beznosikov

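AI summary: This paper proposes Hi-MoE, a hierarchical mixture-of-experts model that addresses the fundamental trade-off routers face in sparse mixture-of-experts models between load balancing and expert specialization. It splits routing control into two levels, load balancing across groups and specialization within groups, which improves the complementarity of expert behavior and prevents within-group collapse. Experiments show that Hi-MoE outperforms existing sparse-routing and grouped MoE models on both natural language processing and vision tasks, and achieves better performance and expert balance in large-scale pre-training.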

2605.08291 2026-05-12 cs.LG cs.AI cs.AR

Graph Computation Meets Circuit Algebra: A Task-Aligned Analysis of Graph Neural Networks for Electronic Design Automation

Hyunmog Kim

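AI summary: This paper examines the suitability of graph neural networks (GNNs) for electronic design automation (EDA) tasks, arguing that different EDA tasks have distinct algebraic structures and that successful GNN methods should align with the algebraic properties of their task. Analyzing tasks including timing analysis, placement, and routing congestion, it systematically surveys the GNN architectural tools suited to circuits, clarifies how circuit graphs differ from generic graphs, and identifies the limitations of current methods when algebra and architecture are mismatched, along with key open challenges.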

2605.08290 2026-05-12 cs.LG cs.AI

Toward Optimal Regret in Robust Pricing: Decoupling Corruption and Time

Kalana Kalupahana, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi

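AI summary: This paper studies how to achieve optimal regret bounds in dynamic pricing under adversarial corruption, proposing a new algorithm that decouples the corruption level $C$ from the time horizon $T$. Built on a modified binary search, the algorithm attains a regret bound of $\mathcal{O}(C + \log T)$ when the corruption level is known and $\mathcal{O}(C + \log^2 T)$ when it is unknown, improving on prior results and giving stronger theoretical guarantees for robust dynamic pricing.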

2605.08289 2026-05-12 cs.LG cs.AI

What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies

Fan Zhang, Shiming Fan, Hua Wang

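AI summary: Multivariate time series forecasting is essential in many real-world systems, and modeling cross-variable dependencies is central to it. This paper proposes MS-FLOW, a sparse bottleneck framework that explicitly models interactions between variables by limiting information-flow capacity, reducing redundant connections and the propagation of spurious correlations. Experiments on 12 real-world datasets show leading forecasting accuracy while learning fewer but more reliable and effective cross-variable dependencies, shifting the focus from "more interaction" to "more effective interaction".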

2605.08288 2026-05-12 cs.LG cs.AI cs.CR cs.DC

UMEDA: Unified Multi-modal Efficient Data Fusion for Privacy-Preserving Graph Federated Learning via Spectral-Gated Attention and Diffusion-Based Operator Alignment

Shih-Yu Lai, Hirozumi Yamaguchi, Shang-Tse Chen, Yu-Lun Liu, Bing-Yu Chen

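AI summary: UMEDA is a privacy-preserving graph federated learning framework that targets localization through the fusion of wireless and visual signals on heterogeneous sensor devices. It uses spectral-gated attention and diffusion-based operator alignment to fuse cross-modal data efficiently while preserving privacy. UMEDA maintains model performance while handling device heterogeneity, data distribution shift, and privacy noise, and shows better accuracy and communication efficiency on multiple benchmarks.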

2605.08287 2026-05-12 cs.LG cs.AI

Multi-Armed Bandits With Best-Action Queries

Francesco Bacchiocchi, Matteo Castiglioni, Alberto Marchesi, Francesco Emanuele Stradi

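AI summary: This paper studies an augmented multi-armed bandit problem in which the learner can query an oracle that returns the current best action. Under the more realistic bandit (single-arm) feedback model, the authors fully resolve the problem: in the stochastic i.i.d. reward setting, best-action queries reduce regret to $\widetilde{\mathcal{O}}(\min\{T/k,\sqrt{T-k}\})$, and a corresponding lower bound is given, fully characterizing the benefit of best-action queries in this model.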

2605.08286 2026-05-12 cs.LG cs.AI

Diagnosing Spectral Ceilings in Equivariant Neural Force Fields

Hyunmog Kim

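AI summary: This paper proposes a spectral-injection diagnostic for evaluating how well equivariant neural force field models retain information at different angular frequencies. By injecting controlled angular-frequency perturbations into molecular force fields and analyzing them with a lightweight spectral prediction network (SPN), the study finds that equivariant models show a sharp performance drop at specific frequency boundaries. Experiments indicate that this is not caused by parameter count alone but is closely tied to the model's spectral expressiveness, revealing a spectral ceiling in equivariant neural networks when modeling complex molecular systems.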

2605.08285 2026-05-12 cs.LG cs.CE

Exactness Matters for Physical Rule Enforcement

Bum Jun Kim

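AI summary: This paper studies how exactness affects constraint enforcement in scientific forecasting models under physical rule constraints, asking when stronger enforcement improves prediction accuracy and when it causes distribution shift. Through an operator-exactness analysis, it compares enforcement methods on tasks including fluid dynamics and finds that exact projection substantially improves accuracy in periodic systems, whereas in inexact settings over-enforcement can backfire. The study also proposes a set of strategies for more robust prediction under approximate constraints.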

Comments 28 pages, 6 figures

Abstract

Autoregressive scientific forecasters often enforce physical or structural constraints by repairing each predicted state before feeding it back into the model. However, it remains unclear when stronger physical rule enforcement becomes reliable and when it becomes a source of distribution shift. We study this question through operator exactness, meaning whether the repair map is the identity on the target manifold and is aligned with the target geometry. We compare raw forecasting, post hoc repair, and in-loop repair across periodic incompressible Navier-Stokes, non-periodic CFDBench flows, and a hierarchical-forecasting support task. In the exact periodic regime, Fourier projection substantially improves rollout accuracy. On the NS-128 benchmark, a strong Raw-FNO has a final-step rollout MSE at horizon 100 of $(9.390 \pm 6.290)\times 10^{-5}$, and post hoc and in-loop projection reduce it to $(1.130 \pm 0.165)\times 10^{-6}$ and $(5.370 \pm 0.113)\times 10^{-7}$. However, once an exact projection is unavailable and only approximate boundary-preserving cleanup is available, the ordering changes. Across cavity, tube, dam, and cylinder flow, stronger Poisson-based cleanup can reduce divergence while worsening rollout error; target-distortion MSE predicts this harm far better than a linear-system residual. Controlled mismatch, screened cleanup, adaptive gating, and external-backbone checks show that the best approximate-regime operating point can be raw or near-identity. Hierarchical forecasting gives the same broader pattern. Exact forecast reconciliation is a stable baseline, whereas blended top-down repair, a validation-tuned interpolation toward historical-proportion top-down reconciliation, is dataset-dependent. Thus, constraint enforcement should be benchmarked by operator-data alignment before enforcement strength.
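For the periodic case, the exactness property described in the abstract can be written out under the assumption that the Fourier projection in question is the standard Leray (Helmholtz) projection onto divergence-free fields. The abstract does not give the formula, so the notation below ($u$ for the velocity field, $\hat{u}$ for its Fourier coefficients, $k$ for the wavevector, $P$ for the projection) is illustrative:

$$
\widehat{Pu}(k) = \hat{u}(k) - \frac{k\,\bigl(k\cdot\hat{u}(k)\bigr)}{|k|^{2}} \quad (k \neq 0), \qquad \widehat{Pu}(0) = \hat{u}(0).
$$

By construction $k\cdot\widehat{Pu}(k) = 0$ for every wavevector, so $Pu$ is divergence-free, and if $u$ already satisfies $k\cdot\hat{u}(k) = 0$ then $Pu = u$. In that sense $P$ is the identity on the target manifold, which is the first half of what the abstract calls operator exactness.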

2605.08283 2026-05-12 cs.LG cs.AI cs.CL

HTPO: Towards Exploration-Exploitation Balanced Policy Optimization via Hierarchical Token-level Objective Control

Xincheng Yao, Ruoqi Li, Cheng Chen, Daoxin Zhang, Yi Wu, Yao Hu, Chongyang Zhang

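AI summary: To address the exploration-exploitation balance in reinforcement learning for large language models, this study proposes HTPO, a policy optimization method based on hierarchical token-level objective control. HTPO groups the tokens in a response along three dimensions (difficulty, answer correctness, and entropy) and designs a targeted optimization objective for each group, giving fine-grained guidance over the different roles tokens play during reasoning. Experiments show that HTPO outperforms existing methods on multiple complex reasoning benchmarks, supporting its effectiveness in improving reasoning ability.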

Comments 29 pages

2605.08281 2026-05-12 cs.CV

Is Class Signal Clustered or Routed in Task-Induced Implicit Neural Representation Weight Spaces?

Xinyi Guo, Mingyi He, Haobin Ding, Weiming Chen, Xinrui Chen, Jiawen Li, Di Zhang, Minxi Ouyang, Yizhi Wang, Xitong Ling

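AI summary: This paper asks whether class signal in task-induced implicit neural representation (INR) weight spaces is clustered or routed. Experiments within the SIREN-based Meta Weight Transformer (MWT) framework show that classification does not arise from geometric clustering in weight space; instead, the class signal is routed through the reader. The study further identifies the bias columns of the SIREN weights as the key causal pathway affecting classification performance, and proposes improvements such as strengthening the routing mechanism or introducing an explicit bias pathway.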

2605.08280 2026-05-12 cs.LG cs.AI

Beyond the False Trade-off: Adaptive EWC for Stealthy and Generalizable T2I Backdoors

Lu Bowen, Xinyu Tang, Yin Yin Low, Shu-Min Leong

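AI summary: This paper studies how to balance attack success rate and model fidelity in text-to-image (T2I) backdoor attacks. Conventional methods such as LwF rely on output distillation, which regularizes only weakly, so the authors introduce parameter-based elastic weight consolidation (EWC) to improve fidelity. To counter the performance degradation that standard EWC suffers under a fixed regularization weight, they propose a dynamic adjustment method based on cosine semantic utility and adaptive scheduling, which improves the balance between attack success rate and fidelity and is more robust on cross-domain datasets.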

2605.08279 2026-05-12 cs.LG cs.AI

LaWM: Least Action World Models for Long-Horizon Physical Consistency from Visual Observations

Qixin Xiao, Maani Ghaffari

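AI summary: This paper proposes LaWM, a latent world model framework that learns predictive models with long-horizon physical consistency from visual observations. It implements the principle of least action in latent space, using a learned Lagrangian action functional to guide the generation of future states instead of relying on an unconstrained neural transition function. The core technical contribution is a latent variational integrator, which learns generalized coordinates and a discrete Lagrangian, constructs a discrete action functional, and solves the corresponding integration conditions to preserve physical structure over long rollouts. Experiments show that LaWM improves the physical invariance, background consistency, and motion smoothness of predictions across multiple physics and robotics tasks.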

2605.08276 2026-05-12 cs.CV

Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction

Weiming Chen, Xitong Ling, Zhenyang Cai, Xidong Wang, Jiawen Li, Tian Guan, Benyou Wang, Yonghong He

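AI summary: Cell-level dense prediction is critical in computational pathology but remains difficult because of fine-grained tissue structure, large domain gaps, and the high cost of dense annotation. Existing ViT-based pathology foundation models use patch tokenization, which breaks spatial continuity and weakens local morphological detail in cell-level prediction. To address this, the paper proposes CMD, a convolutional foundation model based on masked diffusion: a fully convolutional ConvNeXt-UNet backbone is pretrained with masked diffusion in pixel space, and features from a frozen pathology foundation model are injected through adaptive normalization. Experiments show that CMD outperforms existing ViT models on multiple pathology dense prediction tasks and, even when only a small number of parameters are fine-tuned, surpasses state-of-the-art end-to-end segmentation methods, with stronger robustness and generalization when annotations are limited.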

2605.08273 2026-05-12 cs.LG cs.AI

Efficient Prompt Learning for Traffic Forecasting

Qianru Zhang, Xinyi Gao, Alexander Zhou, Reynold Cheng, Siu-Ming Yiu, Hongzhi Yin

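AI summary: This paper studies how to improve the generalization of spatio-temporal graph neural networks in traffic forecasting so they can cope with distribution shift caused by changing spatio-temporal dynamics. The authors propose SimpleST, an efficient, model-agnostic prompt learning framework whose lightweight prompt mechanism lets a pre-trained model adapt to new distributions without changing the model parameters. Experiments on multiple real-world urban spatio-temporal datasets show strong prediction accuracy and computational efficiency.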

Comments 24 pages. This paper is accepted by VLDBJ

2605.08271 2026-05-12 cs.CV cs.AI

Bridging Modalities, Spanning Time: Structured Memory for Ultra-Long Agentic Video Reasoning

Jiazheng Li, Chi-Hao Wu, Yunze Liu, Kaize Ding, Jundong Li, Chuxu Zhang

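AI summary: This study targets the understanding of ultra-long videos (such as egocentric recordings, livestreams, or surveillance footage), where existing models are limited by context windows and lose information when handling content spanning days to weeks. The authors propose MAGIC-Video, a training-free framework that builds a multimodal memory graph and interleaved narrative chains to support cross-modal retrieval and long-term narrative summarization. The method outperforms existing mainstream methods on multiple benchmarks.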

2605.08269 2026-05-12 cs.RO cs.SY eess.SY

Anatomical Landmark-Guided Deep Reinforcement Learning for Autonomous Gastric Navigation

Haoxuan Wu, Sishen Yuan, Haitao Gao, Zhen Li, Xiuli Zuo, Hongliang Ren

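AI summary: This study proposes an anatomical landmark-guided deep reinforcement learning framework for autonomous gastric navigation, aiming to improve the diagnostic effectiveness of wireless capsule endoscopy. A lightweight module fuses edge, contour, and depth information, and decisions are made over low-dimensional anatomical landmark coordinates, which helps bridge the sim-to-real gap. Experiments show over 97% coverage area across multiple patient models and a 53% reduction in time compared with manual operation in real-world experiments.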

2605.08255 2026-05-12 cs.LG cond-mat.mtrl-sci cs.AI

Can LLMs Predict Polymer Physics Just by Reading Synthesis and Processing Prose?

Yuchu Liu, Rui Zhu, Jingwei Xiong, Haixu Tang

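AI summary: This study asks whether large language models can predict the physical and mechanical properties of polymers solely from unstructured scientific text. Conventional models typically rely on chemical structure representations and ignore key experimental information such as synthesis procedures and processing conditions. The study proposes PolyLM, a natural-language-based framework that predicts material properties directly from full-text literature, and builds a large-scale training dataset of 185,000 papers and 276,000 polymer samples. The model reaches high prediction accuracy across 22 properties, with $R^2$ above 0.80 for several of them, supporting the potential of natural language for material property prediction.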

2605.08254 2026-05-12 cs.LG cs.AI

HyperTransport: Amortized Conditioning of T2I Generative Models

Valentino Maiorca, Eleonora Gualdoni, Xavier Suau, Marco Cuturi, Luca Zappella, Pau Rodríguez

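AI summary: As foundation models grow more capable, controlling their behavior efficiently and reliably becomes critical. This paper proposes HyperTransport, a hypernetwork-based framework that maps embeddings from a pre-trained encoder (such as CLIP) directly to intervention parameters, enabling fast and stable control of generative model behavior. Trained end-to-end with an optimal transport loss, it needs only a single forward pass per intervention, which greatly improves efficiency, and it performs well on unseen concepts. Its advantages include amortized control over an open set of concepts, continuous and interpretable strength adjustment, and cross-modal conditional generation.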

2605.08252 2026-05-12 cs.CV

Multimodal Emotion Recognition via Causal-Diffusion Bridge (Affect-Diff)

Ankit Sanjyal

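AI summary: To address severe class imbalance in multimodal emotion recognition, this study proposes Affect-Diff, a causal-diffusion bridge model that reweights modalities using a causal graph, regularizes latent compression, and structures the latent space with a diffusion prior, improving recognition of minority emotions such as fear, disgust, and surprise. On CMU-MOSEI, Affect-Diff outperforms existing methods, improving validation balanced accuracy by 18%, and is the first to detect all six emotion classes.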

Comments 10 Pages, 12 Figures, 6 Tables

2605.08250 2026-05-12 cs.CV cs.AI

Why Do DiT Editors Drift? Plug-and-Play Low Frequency Alignment in VAE Latent Space

Xiaoce Wang, Sifan Zhou, Kaifei Wang, Leli Xu, Xuerui Qiu, Tao He, Ming Li

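AI summary: Diffusion Transformers (DiTs) perform well on single-pass image editing but often suffer semantic drift and quality degradation over repeated edits. Taking a latent-space frequency perspective, this paper decomposes the editing process into its VAE and DiT components and finds that the DiT introduces cumulative low-frequency semantic drift across edits, while the VAE mainly contributes a stable reconstruction bias. Based on this finding, the authors propose VAE-LFA, a plug-and-play low-frequency alignment method that requires no retraining: it applies low-pass filtering and statistical alignment in the VAE latent space to suppress semantic drift, improving semantic consistency and visual quality in multi-round editing.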

Comments 9 pages main paper, 12 figures, 25 pages in total

2605.08249 2026-05-12 cs.CV eess.IV eess.SP

Dimensional Coactivation for Representational Consistency in Frozen Vision Foundation Models

Izaldein Al-Zyoud, Abdulmotaleb El Saddik

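AI summary: This paper studies the consistency of representations within a single input sample in frozen vision foundation models and proposes Dimensional Coactivation (DCA), a new method for measuring whether a model maintains a consistent representational structure across different semantic regions. DCA assesses consistency by analyzing how feature dimensions co-activate across regions, avoiding operations such as the normalization used in conventional similarity measures, which makes it better suited to within-sample analysis in a fixed coordinate system. Experiments show that DCA performs well in deepfake detection, identifying representational breaks between semantic regions in synthetic images.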

2605.08246 2026-05-12 cs.CV cs.CR cs.LG

Smart Railway Obstruction Detection System using IoT and Computer Vision

Pravin Kumar, Mritunjay Shall Peelam, Ramakant Kumar, Sanjay Kumar, Vinay Chamola

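AI summary: This paper proposes NETRA, a smart railway obstruction detection system based on IoT and computer vision, aimed at the safety problems that wildlife intrusion and man-made obstructions pose for Indian Railways. The system runs on low-cost Raspberry Pi edge devices and uses probabilistic sensor fusion of infrared and ultrasonic sensors to lower the false-alarm rate and cut unnecessary vision processing. Experiments show that NETRA outperforms existing solutions in detection accuracy, system response speed, and deployment cost.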

2605.08241 2026-05-12 cs.CV cs.AI

TinySSL: Distilled Self-Supervised Pretraining for Sub-Megabyte MCU Models

Bibin Wilson

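AI summary: This paper proposes TinySSL, a self-supervised pretraining method that aims to provide efficient representation learning for microcontroller (MCU) models with fewer than 5 million parameters. It identifies and overcomes three key challenges specific to small-scale models, combining knowledge distillation, multi-scale feature alignment, and a progressive data augmentation strategy to improve performance on image classification and object detection. Experiments show that TinySSL achieves higher accuracy than existing methods while keeping the model lightweight, with very low memory footprint and inference overhead at deployment.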