2605.18331 2026-05-19 cs.LG

Prune, Update and Trim: Robust Structured Pruning for Large Language Models

Diego Coello de Portugal Mecke, Tom Hanika, Lars Schmidt-Thieme

Affiliations: ISMLL & DARC VWFS, University of Hildesheim; ISMLL, University of Hildesheim

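AI summary: This paper proposes Putri, which improves post-training pruning of large language models by updating the unpruned weights, pruning FFN layers sequentially, and removing individual attention heads, achieving efficient pruning at extreme sparsity ratios.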

Abstract

Large Language Models (LLMs) have experienced significant growth and development in recent years. However, performing inference on LLMs remains costly, especially for long-context inference or on resource-constrained devices. This motivates the development of new post-training pruning (PTP) methods. These methods reduce LLMs' requirements by removing a substantial part of the model's parameters. The discarded weights are selected depending on their impact on the model's performance. Current PTP methods prune the models by removing the less informative hidden nodes from the FFN layers and the least important attention layers. We propose Putri, a PTP method that introduces three changes to the state of the art. First, we update the unpruned weights of the FFN to compensate for the introduced pruning error. Second, the FFN layers are pruned sequentially, taking into account the updates made to the previous layers. Third, instead of removing full attention layers, we remove individual attention heads. We extend this method so that it can also handle Grouped-Query Attention. In summary, Putri is a structured pruning method that remains simple while showing SOTA performance. Pruning experiments on multiple models, across a wide range of sparsity ratios and on different datasets, validate the generality of Putri. Notably, we demonstrate that, unlike previous methods, Putri can prune LLMs at extreme sparsity ratios. The code is available at: https://github.com/Coello-dev/Putri.
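
The idea of updating unpruned FFN weights to compensate for pruning error can be illustrated with a small sketch. The abstract does not give Putri's actual update rule, so this example swaps in a generic least-squares refit on calibration activations; the names `solve` and `prune_with_update` are hypothetical, the layer has a single output for brevity, and the kept activation columns are assumed to be linearly independent.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def prune_with_update(H, w, j):
    """Drop hidden unit j and refit the kept output weights (assumed scheme).

    H: calibration activations, one row per sample, one column per hidden unit.
    w: output weights, one per hidden unit.
    Returns the kept unit indices and their updated weights.
    """
    keep = [k for k in range(len(w)) if k != j]
    # Target: output of the unpruned layer on the calibration samples.
    y = [sum(h[k] * w[k] for k in range(len(w))) for h in H]
    # Normal equations (Hk^T Hk) w' = Hk^T y over the kept columns.
    A = [[sum(h[a] * h[b] for h in H) for b in keep] for a in keep]
    rhs = [sum(h[a] * t for h, t in zip(H, y)) for a in keep]
    return keep, solve(A, rhs)


# Toy data: unit 2 is exactly the sum of units 0 and 1, so it is redundant.
H = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0], [2.0, 1.0, 3.0]]
w = [1.0, 2.0, 3.0]
keep, w_new = prune_with_update(H, w, 2)
print(keep, w_new)  # -> [0, 1] [4.0, 5.0]
```

In this toy case the refit absorbs the pruned unit's contribution exactly, whereas keeping the old weights `[1.0, 2.0]` would leave a nonzero output error. Sequential pruning, as described in the abstract, would recompute the calibration activations of later layers after each such update.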

2605.18328 2026-05-19 cs.CV

CineMatte: Background Matting for Virtual Production and Beyond

Yuanjian He, Chen Zhang, Fasheng Chen, Jiangbo Cao

Affiliations: Online Video Business Unit, Tencent PCG, Shenzhen, China

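AI summary: This paper proposes CineMatte, a robust background matting framework for virtual production and beyond. It uses a cross-attention-conditioned design: a frozen DINOv3 Vision Transformer with shared weights encodes the input frame and the captured background, and a cross-attention module predicts the foreground, preserving pretrained semantics and improving robustness to background shifts. The paper also introduces CineMatte-4K, a 4K HDR image-video dataset that provides the first non-synthetic dataset for virtual-production matting.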

Abstract

LED Virtual Production (VP) uses large LED volumes to render backgrounds in real time, enabling in-camera visual effects but making post-shot changes labor-intensive. We address this with CineMatte, a robust background matting framework for VP and beyond. CineMatte employs a cross-attention-conditioned design. Instead of concatenating the background with the input, CineMatte employs a Siamese, frozen DINOv3 Vision Transformer with shared weights to encode the input frame and the captured background separately. A cross-attention module compares the two streams to predict the foreground, preserving pretrained semantics and improving robustness to background shifts. Previous ViT-based matting models use a parallel convolutional "detail branch" to recover fine details, which can cause boundary artifacts in real-world samples due to semantic misalignment with the backbone. We instead replace it with a pretrained, image-guided feature upsampler, which largely mitigates the problem. We also introduce CineMatte-4K, a 4K HDR image-video dataset captured on a professional LED VP stage. To the best of our knowledge, the image subset is the first dataset for VP matting and is non-synthetic, obtained via green-screen insertion; the video subset includes camera motion with tracked trajectories so that arbitrary backgrounds can be rendered later with correct parallax. Across CineMatte-4K and public benchmarks (VideoMatte240K, YouTubeMatte), CineMatte not only excels in VP but also generalizes robustly to real-world footage.
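
The comparison of the two encoded streams can be pictured with plain scaled dot-product cross-attention. This is only a sketch under assumptions: the abstract does not say which stream supplies queries, so frame tokens are used as queries and background tokens as keys and values here, learned projection matrices are omitted, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query token attends over all key tokens and averages their values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return out


# Toy tokens standing in for encoder features (values are made up).
frame_tokens = [[1.0, 0.0], [0.0, 1.0]]
background_tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
attended = cross_attention(frame_tokens, background_tokens, background_tokens)
print(len(attended), len(attended[0]))  # -> 2 2
```

Because both streams would come from the same frozen encoder with shared weights, frame and background tokens live in the same feature space, which is what makes a dot-product comparison between them meaningful.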

2605.18327 2026-05-19 cs.AI

Causely: A Causal Intelligence Layer for Enterprise AI. A Benchmark Study on SRE and Reliability Workflows

Dhairya Dalal, Endre Sara, Ben Yemini, Christine Miller, Shmuel Kliger

Affiliations: Causely

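AI summary: This paper proposes Causely, a causal intelligence layer for enterprise AI. By maintaining a structured representation of environment topology, attribute dependencies, and causal relationships, it gives AI agents the semantic and causal grounding they need to diagnose, assess impact, and act safely in production. Its value proposition is evaluated in a benchmark study on a 24-microservice OpenTelemetry demo application with faults injected in a controlled setting.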

Abstract

AI agents deployed into SRE workflows currently derive their understanding of environment state from raw observability telemetry at query time, paying a semantic-interpretation tax in tokens, latency, and inferential reliability. We propose Causely, a causal intelligence layer that maintains a structured representation of environment topology, attribute dependencies, and causal relationships that are anchored to an ontological representation of the managed environment. Causely transforms raw telemetry into a live, queryable model providing the semantic and causal foundation AI agents require to diagnose, evaluate impact, and act safely in production. We evaluate this value proposition through a benchmark study conducted in a controlled setting with injected faults in a 24-microservice OpenTelemetry demo application. Our experiments compare four agent configurations (Claude Code, OpenAI Codex, HolmesGPT with Sonnet and Gemini backends). Experiments are run with and without access to Causely under two scenarios: an active incident and a healthy baseline. On the active-fault scenario, causal grounding reduces mean time-to-diagnosis by 63%, mean token consumption by 60%, and mean tool-call count by 78%, compressing the investigation footprint by 4.8× and lowering direct API cost per run by 57%; root-cause-diagnosis accuracy rises from 75% to 100%.

2605.18320 2026-05-19 cs.LG cs.AI

ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization

Yifei Chen, Shaoqin Zhu, Xiaoqiang Ji

Affiliations: The Chinese University of Hong Kong, Shenzhen, Longgang

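AI summary: This paper proposes ISEP, which achieves implicit support expansion in offline reinforcement learning through stochastic policy optimization, addressing the difficulty conventional methods have in discovering optimal behavior under safety constraints. Its core contribution is to make policy improvement easier to navigate through value-function interpolation and a stochastic action-selection strategy.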

DARE-EEG: An EEG Foundation Model for Mining Dual-Aligned Representations

Yang Shao, Peiliang Gong, Qun Dai, Daoqiang Zhang

Affiliations: College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics

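AI summary: This paper proposes DARE-EEG, a self-supervised foundation model pretrained with dual-aligned representation learning. It targets the problem of EEG encoders learning invariant representations under incomplete observations, achieves semantic stability through contrastive learning and momentum updates, and adapts to heterogeneous electrode configurations and sampling rates with a convolutional-linear probing strategy. Experiments show strong performance on EEG benchmarks.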

Comments: 22 pages (10 pages of main text + 12 pages of appendices)

Temporal Task Diversity: Inductive Bias under Non-Stationarity

Afiq Abdillah Effiezal Aswadi, Oliver Britton, Ross Baker, Matthew Farrugia-Roberts

Affiliations: University of Oxford

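AI summary: This study examines how task distributions that change over time affect the inductive bias of deep learning models in synthetic sequence modeling, and finds that diversity in the task distribution strengthens the models' preference for generalization over memorization.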

Comments: Presented at Technical AI Safety Conference (TAIS), Oxford, May 2026. Code available at https://github.com/matomatical/temporal-task-diversity

2605.18263 2026-05-19 cs.CV

RT-Splatting: Joint Reflection-Transmission Modeling with Gaussian Splatting

Ji Shi, Xianghua Ying, Bowei Xing, Ruohao Guo, Wenzhen Yue

Affiliations: State Key Laboratory of General Artificial Intelligence; School of Intelligence Science and Technology

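AI summary: This paper proposes RT-Splatting, which separates each Gaussian's geometric occupancy from its optical opacity to jointly model the complex reflections and clear transmission of semi-transparent surfaces, yielding high-quality reflections and transmission in real-time rendering.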

Comments: CVPR 2026 Highlight. Project page: https://sjj118.github.io/RT-Splatting/

Abstract

3D Gaussian Splatting (3DGS) enables real-time novel view synthesis with high visual quality. However, existing methods struggle with semi-transparent specular surfaces that exhibit both complex reflections and clear transmission, often producing blurry reflections or overly occluded transmission. To address this, we present RT-Splatting, a framework that disentangles each Gaussian's geometric occupancy from its optical opacity. This factorization yields a unified surface-volume scene representation with a single set of Gaussian primitives. Our hybrid renderer interprets this representation both as a surface to capture high-frequency reflections and as a volume to preserve clear transmission. To mitigate the ambiguity in jointly optimizing reflection and transmission, we introduce Specular-Aware Gradient Gating, which suppresses misleading gradients from highly specular regions into the transmission branch, effectively reducing distracting floaters. Experiments on challenging semi-transparent scenes show that RT-Splatting achieves state-of-the-art performance, delivering high-fidelity reflections and clear transmission with real-time rendering. Moreover, our factorization naturally enables flexible scene editing. The project page is available at https://sjj118.github.io/RT-Splatting.

2605.18262 2026-05-19 cs.RO

On Improving Multimodal Pedestrian Trajectory Prediction with CVAE: A Study on Benchmark and Robot Data

Yuzhou Liu, Cristina Olaverri-Monreal

Affiliations: Dept. Intelligent Transport Systems, Johannes Kepler University Linz

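AI summary: This paper proposes a CVAE-based probabilistic model built on Social-STGCNN to improve multimodal pedestrian trajectory prediction. Evaluation on benchmark datasets and a real-world robot dataset shows improved endpoint accuracy and trajectory diversity across different crowd configurations.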

Abstract

Accurate pedestrian trajectory prediction is crucial for autonomous systems operating in complex environments, such as modular buses and delivery robots in suburban or semi-structured areas. Social Spatio-Temporal Graph Convolutional Neural Networks (Social-STGCNN) have shown strong performance by modeling social interactions; however, producing diverse and well-calibrated future trajectories remains challenging. In this work, we build on a Social-STGCNN backbone and introduce a Conditional Variational Autoencoder (CVAE)-based probabilistic formulation to explicitly model multimodal future trajectories. We evaluate the method on the ETH and UCY pedestrian trajectory datasets as well as on a real-world pedestrian dataset collected by a mobile robot. Results show moderate gains on public benchmarks, but more consistent endpoint accuracy and improved trajectory diversity across different crowd configurations. Evaluation on robot-collected data further demonstrates the approach's effectiveness beyond curated benchmarks and supports its applicability in practical deployments.
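
The abstract speaks of endpoint accuracy and trajectory diversity without naming its metrics. Multimodal predictors on ETH and UCY are commonly scored with best-of-K average displacement error (ADE) and final displacement error (FDE), so the sketch below assumes those two metrics; the function names and the toy trajectories are hypothetical.

```python
import math


def ade(pred, truth):
    """Average Euclidean distance between predicted and true positions."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


def fde(pred, truth):
    """Euclidean distance between the final predicted and true positions."""
    return math.dist(pred[-1], truth[-1])


def best_of_k(samples, truth):
    """Score K sampled futures by the best ADE and the best FDE among them."""
    return min(ade(s, truth) for s in samples), min(fde(s, truth) for s in samples)


# Toy example: two sampled futures for one pedestrian (made-up coordinates).
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # drifts away from the true path
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],  # only the endpoint is off
]
min_ade, min_fde = best_of_k(samples, truth)
print(round(min_ade, 3), min_fde)  # -> 0.333 1.0
```

In a CVAE setting, each sample would come from decoding a different latent draw, so a more diverse set of samples has a better chance of containing one trajectory close to the ground truth.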

2605.18261 2026-05-19 cs.CL

Are Sparse Autoencoder Benchmarks Reliable?

David Chanin

Affiliations: Decode Research, MATS, UCL

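AI summary: This study evaluates the reliability of sparse autoencoder (SAE) benchmarks. It finds that two metrics perform poorly under multiple lenses and that the other metrics also fall short of expectations, indicating that SAE benchmarks need improvement.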

Abstract

Sparse autoencoders (SAEs) are a core interpretability tool for large language models, and progress on SAE architectures depends on benchmarks that reliably distinguish better SAEs from worse ones. We audit the SAE quality metrics in SAEBench, the de facto standard SAE evaluation suite, through three complementary lenses: reseed noise on a fixed SAE, ground-truth correlation on synthetic SAEs, and discriminability across training trajectories. We find that two of these metrics, Targeted Probe Perturbation (TPP) and Spurious Correlation Removal (SCR), fail multiple lenses at their canonical settings and should not be used to evaluate SAEs. The other metrics show higher reseed noise and lower discriminability than the field assumes. The sae-probes variant of k-sparse probing is the most reliable metric we tested, but even sae-probes struggles to separate variants of the same SAE architecture. Our results show the field needs better SAE benchmarks.
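
One way to read the "reseed noise" lens is as the spread of a metric when the same SAE is evaluated under different random seeds, compared against the gap between two SAEs. The abstract does not define its statistics, so the sketch below is an assumption: it uses the sample standard deviation as noise and a noise-normalized gap as a rough separation score. The function names and all scores are made up for illustration.

```python
import math
import statistics


def reseed_noise(scores):
    """Sample standard deviation of one metric re-run with different seeds."""
    return statistics.stdev(scores)


def separation(scores_a, scores_b):
    """Gap between two SAEs' mean scores, in units of pooled reseed noise."""
    pooled = math.sqrt(
        (statistics.variance(scores_a) + statistics.variance(scores_b)) / 2
    )
    return abs(statistics.mean(scores_a) - statistics.mean(scores_b)) / pooled


# Hypothetical metric values for two SAEs, three evaluation seeds each.
sae_a = [0.70, 0.72, 0.74]
sae_b = [0.71, 0.73, 0.75]
print(round(reseed_noise(sae_a), 3))       # -> 0.02
print(round(separation(sae_a, sae_b), 3))  # -> 0.5
```

A separation well below 1 means the difference between the two SAEs is smaller than the seed-to-seed variation, which is the kind of situation in which a benchmark cannot tell the two apart.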

2605.18226 2026-05-19 cs.CL cs.AI

Context Memorization for Efficient Long Context Generation

Yasuyuki Okoshi, Hao Mark Chen, Guanxi Lu, Hongxiang Fan, Masato Motomura, Daichi Fujiki

Affiliations: Institute of Science Tokyo, Japan; Imperial College London, UK

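AI summary: This paper proposes a training-free context memorization method that externalizes the prefix into a lightweight lookup table of precomputed attention states, improving the accuracy and efficiency of long-context generation while reducing attention latency.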

Abstract

Modern large language model (LLM) applications increasingly rely on long conditioning prefixes to control model behavior at inference time. While prefix-augmented inference is effective, it incurs two structural limitations: i) the prefix's influence fades as generation proceeds, and ii) attention computation over the prefix scales linearly with its length. Existing approaches either keep the prefix in attention while compressing it, or internalize it into model parameters through gradient-based training. The former still attends to the prefix at inference, while the latter is training-intensive and ill-suited to prefix updates. To address these issues, we propose attention-state memory, a training-free approach that externalizes the prefix into a lightweight, lookup-based memory of precomputed attention states between prefix and query tokens. On ManyICLBench with LLaMA-3.1-8B, our method improves accuracy over in-context learning at 1K-8K memory budgets while reducing attention latency by 1.36x at 8K, and surpasses full-attention RAG performance on the NBA benchmark using only 20% of its memory footprint.
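
Why a precomputed attention state can stand in for attending to the prefix is easier to see from a standard identity: softmax attention over a concatenated key set can be recovered exactly from per-block outputs and their log-sum-exp values. The sketch below shows only that identity. How the paper builds or looks up its memory is not described in the abstract, and the names `partial_state` and `merge` are hypothetical; the usual 1/sqrt(d) scaling is omitted for brevity.

```python
import math


def partial_state(q, keys, values):
    """Attention of one query over one key block.

    Returns the block's attention output and the log-sum-exp of its scores.
    """
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    out = [sum(e * v[j] for e, v in zip(exps, values)) / z for j in range(len(values[0]))]
    return out, m + math.log(z)


def merge(state_a, state_b):
    """Combine two block states into the state of the concatenated block."""
    (out_a, lse_a), (out_b, lse_b) = state_a, state_b
    m = max(lse_a, lse_b)
    wa, wb = math.exp(lse_a - m), math.exp(lse_b - m)
    out = [(wa * a + wb * b) / (wa + wb) for a, b in zip(out_a, out_b)]
    return out, m + math.log(wa + wb)


# Toy check: merging prefix and local states matches attention over both.
q = [1.0, 0.5]
prefix_k, prefix_v = [[1.0, 0.0], [0.0, 1.0]], [[1.0], [2.0]]
local_k, local_v = [[0.5, 0.5]], [[3.0]]
merged, _ = merge(partial_state(q, prefix_k, prefix_v), partial_state(q, local_k, local_v))
full, _ = partial_state(q, prefix_k + local_k, prefix_v + local_v)
print(abs(merged[0] - full[0]) < 1e-12)  # -> True
```

Under this identity, a stored prefix state costs a constant amount of work to merge per query, independent of prefix length. The open question any lookup-based scheme has to answer is which stored state is close enough to the one the current query would have produced.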

2605.18221 2026-05-19 cs.SD cs.CL cs.CV cs.LG physics.med-ph

SIREM: Speech-Informed MRI Reconstruction with Learned Sampling

Md Hasan, Nyvenn Castro, Daiqi Liu, Lukas Mulzer, Jana Hutter, Jonghye Woo, Moritz Zaiss, Andreas Maier, Paula A. Perez-Toro

Affiliations: Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg; Institute of Radiology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg; Institut für Informationsverarbeitung, Leibniz Universität Hannover; Department of Radiology, Harvard Medical School and Massachusetts General Hospital

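AI summary: This paper proposes SIREM, a speech-informed MRI reconstruction framework that uses synchronized speech as a cross-modal prior. It exploits the correlation between speech and the resulting acoustics to predict image content, achieving anatomically more plausible reconstruction at higher throughput.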

Abstract

Real-time magnetic resonance imaging (rtMRI) of speech production enables non-invasive visualization of dynamic vocal-tract motion and is valuable for speech science and clinical assessment. However, rtMRI is fundamentally constrained by trade-offs among spatial resolution, temporal resolution, and acquisition speed, often leading to undersampled k-space measurements and degraded reconstructions. We propose SIREM, a speech-informed MRI reconstruction framework that uses synchronized speech as a cross-modal prior. The central idea is that vocal-tract configurations during speech are correlated with the produced acoustics, making part of the image content predictable from audio. SIREM models each frame as a fusion of an audio-driven component and an MRI-driven component through a spatial weighting map. The audio branch predicts articulator-related structure from speech, while the MRI branch reconstructs complementary content from measured k-space data. We further introduce a learnable soft weighting profile over spiral arms, enabling a differentiable study of how k-space arm usage interacts with speech-informed fusion. This yields a unified multimodal formulation that combines audio-driven prediction, MRI reconstruction, and sampling adaptation. We evaluate SIREM on the USC speech rtMRI benchmark against standard baselines, including gridding, wavelet-based compressed sensing, and total variation. SIREM introduces a speech-informed reconstruction paradigm that operates in a substantially higher-throughput regime than iterative methods while preserving anatomically plausible vocal-tract structure. These results establish an initial benchmark for multimodal speech-informed rtMRI reconstruction and highlight the potential of synchronized speech as an auxiliary prior for fast reconstruction. The source code is available at https://github.com/mdhasanai/SIREM
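
The fusion of the audio-driven and MRI-driven components through a spatial weighting map can be sketched as a per-pixel blend. The abstract does not state the exact fusion formula, so the convex combination below is an assumption, and the function name `fuse` and the toy images are hypothetical.

```python
def fuse(audio_img, mri_img, weight_map):
    """Blend two images pixel by pixel: w * audio + (1 - w) * mri.

    All three inputs are lists of rows with identical shapes, and each
    weight is expected to lie in [0, 1].
    """
    return [
        [w * a + (1.0 - w) * m for a, m, w in zip(row_a, row_m, row_w)]
        for row_a, row_m, row_w in zip(audio_img, mri_img, weight_map)
    ]


# Toy 1x2 frame: the first pixel leans on the MRI branch, the second on audio.
audio_img = [[1.0, 1.0]]
mri_img = [[0.0, 0.0]]
weight_map = [[0.25, 1.0]]
print(fuse(audio_img, mri_img, weight_map))  # -> [[0.25, 1.0]]
```

With a map like this, regions that speech predicts well (for example, articulators) could rely mostly on the audio branch, while the measured k-space data fills in the rest of the frame.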
