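arXivDaily: a daily academic digest that syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.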

2605.13846 2026-05-14 cs.CL cs.AI

WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data

Ziheng Zhang, Yunzhong Hou, Naijing Liu, Liang Zheng

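AI summary: This paper introduces WARDEN, an early-stage language-model system for transcribing Wardaman, an endangered Australian Indigenous language, and translating it into English. Only 6 hours of annotated audio are available, so conventional methods that depend on large-scale training data are not viable. WARDEN therefore uses a staged design: speech-to-phoneme transcription followed by phoneme-to-English translation. It adds two techniques to improve performance: initializing the model from a phonetically similar language, and large-language-model inference combined with an expert-annotated dictionary. Experiments show that WARDEN outperforms conventional unified models under these extremely low-data conditions, providing a strong baseline for endangered-language processing.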

2605.13840 2026-05-14 stat.ML cs.DS cs.LG math.ST stat.CO stat.TH

What is Learnable in Valiant's Theory of the Learnable?

Steve Hanneke, Anay Mehrotra, Grigoris Velegkas, Manolis Zampetakis

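AI summary: This paper revisits the learnability model Valiant proposed in 1984 and asks which concept classes are learnable in it. For every finite domain, including the Boolean hypercube, a class is learnable if and only if every realizable positive sample can be certified by a polynomial-size adaptive query-compression scheme. This places learnability in Valiant's model strictly between PAC learnability and the query-free variant of the model. The paper also gives the first algorithm for learning $d$-dimensional halfspaces in this model, with polynomial sample and query complexity, showing that the query mechanism substantively changes which classes are learnable.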

Comments: Abstract shortened for arXiv

Abstract

Valiant's 1984 paper is widely credited with introducing the PAC learning model, but it, in fact, introduced a different model: unlike PAC learning, the learner receives only positives, may issue membership queries, and must output a hypothesis with no false positives. Prior work characterized variants, including the case without queries. We revisit Valiant's original model and ask: *Which classes are learnable in it?* For every finite domain, including Valiant's Boolean-hypercube setting, we show that a class is learnable if and only if every realizable positive sample can be certified by a poly-size adaptive query-compression scheme. This is a new variant of sample compression where the learner certifies samples via a short interaction with the membership oracle. Our characterization shows that learnability in Valiant's model is strictly sandwiched between learnability in the PAC model and the variant of Valiant's model without membership queries. This is one of the rare cases where introducing membership queries changes the set of learnable classes, and not just the sample or computational complexity. Next, we study the natural extension of the model to arbitrary domains. While we do not obtain an exact characterization, our techniques readily generalize and show that the same strict sandwiching persists. Finally, we show that $d$-dimensional halfspaces, which are not learnable without queries, are learnable with queries: we give a $\mathrm{poly}(d) \tilde{O}(1/ε)$ sample and $\mathrm{poly}(d) \mathrm{polylog}(1/ε)$ query algorithm, and prove that at least $Ω(d)$ samples or queries are necessary. To our knowledge, this is the first algorithm for halfspaces in Valiant's model. Together, these results uncover a surprisingly rich theory behind Valiant's original notion of learnability and introduce ideas that may be of independent interest in learning theory.
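The classic elimination algorithm for conjunctions over the Boolean hypercube illustrates the no-false-positives requirement. It is not the paper's query-compression characterization or its halfspace algorithm. The algorithm starts with all $2n$ literals and drops every literal that some positive example falsifies. The result is the most specific consistent conjunction, so if the target is itself a conjunction, the hypothesis never accepts a negative point. This minimal sketch uses positives only and no membership queries. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Literal = Tuple[int, bool]  # (variable index, True for x_i, False for NOT x_i)


def learn_conjunction(n: int, positives: List[List[int]]) -> Set[Literal]:
    """Return the most specific conjunction consistent with the positive examples."""
    # Start from the conjunction of every literal x_i and NOT x_i.
    hypothesis: Set[Literal] = {(i, v) for i in range(n) for v in (True, False)}
    for x in positives:
        # Drop each literal that this positive example falsifies.
        hypothesis = {(i, v) for (i, v) in hypothesis if bool(x[i]) == v}
    return hypothesis


def predict(hypothesis: Set[Literal], x: List[int]) -> bool:
    # Label a point positive only if it satisfies every remaining literal.
    return all(bool(x[i]) == v for (i, v) in hypothesis)
```

Consider the target $x_0 \wedge \neg x_2$ over $n = 3$ with positives `[1, 0, 0]` and `[1, 1, 0]`. For this input, `learn_conjunction` returns exactly `{(0, True), (2, False)}`.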

2605.13839 2026-05-14 cs.CL

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights

Wenrui Bao, Huan Wang, Jian Wang, Zhangyang Wang, Kai Wang, Yuzhang Shang

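AI summary: The paper studies more efficient collaboration in multi-agent LLM systems. It proposes TFlow, a weight-space communication framework that replaces conventional natural-language message exchange: the sender's hidden states are converted into receiver-specific low-rank weight perturbations. Without changing the model architecture or the textual context, this adapts the receiver per instance and substantially reduces computational overhead and inference time. Experiments show improved accuracy on multiple benchmarks and a large reduction in the number of tokens processed.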
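A receiver-specific low-rank weight perturbation is cheap to apply because $(W + UV^\top)x = Wx + U(V^\top x)$. The update can therefore be used without ever materializing the full $d_{out} \times d_{in}$ matrix. The pure-Python sketch below assumes the rank-$r$ factors `U` ($d_{out} \times r$) and `V` ($d_{in} \times r$) are already given. How TFlow derives them from the sender's hidden states is not reproduced, and all names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def perturbed_forward(W: Matrix, U: Matrix, V: Matrix, x: Vector) -> Vector:
    """Compute (W + U V^T) x as W x + U (V^T x), with O((d_in + d_out) * r) extra work."""
    base = matvec(W, x)               # frozen receiver weights
    coeffs = matvec(transpose(V), x)  # r projections V^T x
    delta = matvec(U, coeffs)         # map back to the output space with U
    return [b + d for b, d in zip(base, delta)]


def dense_update(W: Matrix, U: Matrix, V: Matrix) -> Matrix:
    """Materialize W + U V^T explicitly (used here only as a check)."""
    r = len(U[0])
    return [[W[i][j] + sum(U[i][k] * V[j][k] for k in range(r))
             for j in range(len(W[0]))] for i in range(len(W))]
```

The factored path and the dense path give the same output. Only the factored path avoids allocating a new weight matrix per instance.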

2605.13836 2026-05-14 eess.SY cs.SY

Reachable-Set Decomposition for Real-Time Aggregation of Multi-Zone HVAC Fleets

Jingguan Liu, Xiaomeng Ai, Cong Chen, Shaoze Li, Shichang Cui, Jiakun Fang, Jinyu Wen

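AI summary: This paper studies how to characterize flexibility in the real-time aggregation of multi-zone heating, ventilation, and air-conditioning (HVAC) systems. Two challenges arise: strong coupling between zones, and information that is revealed only progressively in real time. To address them, the paper proposes a reachable-set decomposition framework. In an offline stage, the method constructs backward reachable sets, turning feasibility over the remaining horizon into per-period state constraints, and computes them efficiently with a tailored inner-approximation method. In the real-time stage, parallel linear programs and Minkowski sums of power intervals quickly compute aggregate flexibility while guaranteeing recursive feasibility of dispatch signals. Experiments validate the method's flexibility characterization, decomposition feasibility, and computational scalability.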
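The real-time aggregation step relies on a simple fact: the Minkowski sum of one-dimensional intervals is again an interval, $[\underline{p}_1, \overline{p}_1] \oplus [\underline{p}_2, \overline{p}_2] = [\underline{p}_1 + \underline{p}_2,\ \overline{p}_1 + \overline{p}_2]$. This minimal sketch assumes each building's feasible power interval for the current period has already been computed; the summary mentions parallel linear programs for that step. The `disaggregate` rule splits a setpoint by giving every building the same fraction of its range. It is a hypothetical illustration, not the paper's dispatch scheme.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (p_min, p_max) for one building in one period


def aggregate(intervals: List[Interval]) -> Interval:
    """Minkowski sum of per-building power intervals."""
    return (sum(lo for lo, _ in intervals), sum(hi for _, hi in intervals))


def disaggregate(intervals: List[Interval], target: float) -> List[float]:
    """Split an aggregate setpoint so that every share stays inside its interval.

    Illustrative rule: each building uses the same fraction alpha of its range.
    """
    lo, hi = aggregate(intervals)
    if not lo <= target <= hi:
        raise ValueError("target outside aggregate flexibility")
    alpha = 0.0 if hi == lo else (target - lo) / (hi - lo)
    return [a + alpha * (b - a) for a, b in intervals]
```

Any target inside the aggregate interval can be split feasibly in this one period. Guaranteeing feasibility in later periods is what the backward reachable sets are for.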

2605.13835 2026-05-14 cs.CV

Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning

Hao Sun, Zi-Jun Ding, Da-Wei Zhou

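AI summary: The paper studies CLIP-based class-incremental learning (CIL), in which a model must keep learning new classes without catastrophic forgetting. Existing methods focus mainly on aligning global image embeddings and ignore the rich local, patch-level semantics in CLIP's encoder. The authors propose SPA, which generates semantic descriptions for each class and uses them to select discriminative patch-level visual features. It then aligns the two modalities with optimal transport, exploiting local information more effectively to improve recognition. SPA also introduces task-specific projectors and a pseudo-feature sampling strategy to improve adaptability and stability.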
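Optimal-transport alignment between a set of patch features and a set of class descriptions is commonly computed with entropy-regularized OT and Sinkhorn iterations. The sketch below is a generic Sinkhorn routine, not SPA's exact formulation. It assumes a precomputed cost matrix, for example one minus the cosine similarity between patch and text embeddings, and uniform marginals. The regularization `eps` and iteration count are arbitrary choices.

```python
import math
from typing import List


def sinkhorn(cost: List[List[float]], eps: float = 0.1, iters: int = 200) -> List[List[float]]:
    """Entropy-regularized OT plan between uniform marginals (rows: patches, cols: texts)."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on patches
    b = [1.0 / m] * m  # uniform mass on descriptions
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_score(cost: List[List[float]], plan: List[List[float]]) -> float:
    """Total transport cost <P, C>; lower means better patch-text alignment."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))
```

A smaller `eps` gives a sharper, more nearly one-to-one plan. However, very small values can make `math.exp(-c / eps)` underflow, so the sketch would then need log-domain updates.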

2605.13833 2026-05-14 cs.LG cs.CV

QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling

Hoang-Quan Nguyen, Sankalp Pandey, Khoa Luu

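AI summary: This paper proposes QLAM, a quantum long-attention memory method for long-sequence token modeling. It combines the superposition property of quantum computing with the linear-time efficiency of state-space models (SSMs). Representing the hidden state as a quantum state strengthens the global representation of historical information. Experiments show that QLAM outperforms conventional recurrent models and Transformer-based models on several sequential image-classification tasks.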

Abstract

Modeling long-range dependencies in sequential data remains a central challenge in machine learning. Transformers address this challenge through attention mechanisms, but their quadratic complexity with respect to sequence length limits scalability to long contexts. State-space models (SSMs) provide an efficient alternative with linear-time computation by evolving a latent state through recurrent updates, but their memory is typically formed via additive or linear transitions, which can limit their ability to capture complex global interactions across tokens. In this work, we introduce one of the first studies to leverage the superposition property of quantum systems to enhance state-based sequence modeling. In particular, we propose Quantum Long-Attention Memory (QLAM), a hybrid quantum-classical memory mechanism that can be viewed as a quantum extension of state-space models. Instead of maintaining a classical latent state updated through additive dynamics, QLAM represents the hidden state as a quantum state whose amplitudes encode a superposition of historical information. The state evolves through parameterized quantum circuits conditioned on the input, enabling a non-classical, global update mechanism. In this way, QLAM preserves the recurrent and linear-time structure of SSMs while fundamentally enriching the memory representation through quantum superposition. Unlike attention mechanisms that explicitly compute pairwise interactions, QLAM implicitly captures global dependencies through the evolution of the quantum state, and retrieves task-relevant information via query-dependent measurements. We evaluate QLAM on sequential variants of standard image classification benchmarks, including sMNIST, sFashion-MNIST, and sCIFAR-10, where images are flattened into token sequences. Across all tasks, QLAM consistently improves over recurrent baselines and transformer-based models.

2605.13831 2026-05-14 cs.CV

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

Zhaowei Wang, Lishu Luo, Haodong Duan, Weiwei Liu, Sijin Wu, Ji Luo, Shen Yan, Shuai Peng, Sihang Yuan, Chaoyi Huang, Yi Lin, Yangqiu Song

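AI summary: This paper studies how to train long-context vision-language models (LVLMs) effectively so that they generalize beyond a 128K context length. Through systematic continued-pretraining experiments, the authors find that long-document VQA is more effective than OCR transcription. They draw three key conclusions: the data length distribution should be balanced, retrieval ability is the main bottleneck, and long-document data preserves short-context ability. Based on these findings, they propose MMProLong. With a budget of only 5 billion tokens, it improves long-document VQA scores by 7.1% and maintains strong performance at 256K and 512K contexts without additional training.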

Comments: work in progress

Abstract

Long-context modeling is becoming a core capability of modern large vision-language models (LVLMs), enabling sustained context management across long-document understanding, video analysis, and multi-turn tool use in agentic workflows. Yet practical training recipes remain insufficiently explored, particularly for designing and balancing long-context data mixtures. In this work, we present a systematic study of long-context continued pre-training for LVLMs, extending a 7B model from 32K to 128K context with extensive ablations on long-document data. We first show that long-document VQA is substantially more effective than OCR transcription. Building on this observation, our ablations further yield three key findings: i) for sequence-length distribution, balanced data outperforms target-length-focused data (e.g., 128K), suggesting that long-context ability requires generalizable key-information retrieval across various lengths and positions; ii) retrieval remains the primary bottleneck, favoring retrieval-heavy mixtures with modest reasoning data for task diversity; and iii) pure long-document VQA largely preserves short-context capabilities, suggesting that instruction-formatted long data reduces the need for short-data mixing. Based on these findings, we introduce MMProLong, obtained by long-context continued pre-training from Qwen2.5-VL-7B with only a 5B-token budget. MMProLong improves long-document VQA scores by 7.1% and maintains strong performance at 256K and 512K contexts beyond its 128K training window, without additional training. It further generalizes to webpage-based multimodal needle retrieval, long-context vision-text compression, and long-video understanding without task-specific supervision. Overall, our study establishes a practical LongPT recipe and an empirical foundation for advancing long-context vision-language models.

2605.13829 2026-05-14 cs.CL cs.AI cs.LG

Negation Neglect: When models fail to learn negations in training

Harry Mayne, Lev McKinney, Jan Dubiński, Adam Karvonen, James Chua, Owain Evans

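AI summary: This paper identifies "negation neglect": when a large language model is fine-tuned on documents that explicitly mark a statement as false, the model may instead come to believe the statement is true. Training on documents that contain negated information significantly increases the model's rate of belief in the false statements. This happens even when the documents repeatedly stress that the statements are false. Experiments indicate that the phenomenon affects not only the learning of factual statements but can also extend to model behavior, posing potential risks for AI safety.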

2605.13826 2026-05-14 cs.LG cond-mat.mtrl-sci physics.chem-ph

Reducing cross-sample prediction churn in scientific machine learning

Gordan Prastalo, Kevin Maik Jablonka

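AI summary: Scientific machine learning usually reports only a model's predictive performance. It rarely checks whether the same predictions hold when the training data are sampled differently. This paper introduces "cross-sample prediction churn": models trained on different subsets of the training data may disagree on the same test sample. Conventional parameter-side methods do not effectively reduce this churn. Data-side methods do: $K$-bootstrap out-of-bag sampling and the proposed twin-bootstrap method substantially reduce churn without sacrificing accuracy. The result is a more complete metric for evaluating scientific machine learning.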

2605.13825 2026-05-14 cs.AI cs.CV

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions

Alberto G. Rodríguez Salgado

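AI summary: This study asks whether large language models, when shown a record of prior harmful actions, continue to take unsafe actions. It builds HistoryAnchor-100, a test set of 100 high-stakes scenarios, to evaluate how models' decisions shift under different behavioral histories. When the prompt instructs the model to "stay consistent with the policy in the prior history", many well-aligned models become significantly more likely to choose the unsafe option. Some even escalate their behavior. This reveals a safety risk: model decisions can be strongly steered by prior behavior.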

2605.13822 2026-05-14 cs.RO cs.SY eess.SY

Loiter UAV Reinsertion Guidance for Fixed-wing UAV Corridors

Pradeep J, Kedarisetty Siddhardha, Ashwini Ratnoo

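AI summary: This paper studies how to reinsert loitering UAVs into the main lane of a fixed-wing UAV corridor. The corridor comprises a main lane, a circular loiter lane for relieving congestion, and transition lanes connecting the two. To make reinsertion safe and conflict-free, the paper proposes a guidance algorithm based on virtual slots and speed constraints. Numerical simulations validate the method, which offers a feasible automated strategy for UAV traffic management.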

2605.13821 2026-05-14 cs.AI cs.LG

Harnessing Agentic Evolution

Jiayi Zhang, Yongfeng Gu, Jianhao Ruan, Maojia Song, Yiran Peng, Zhiguang Han, Jinyu Xiang, Zhitao Wang, Caiyin Yang, Yixi Ouyang, Bang Liu, Chenglin Wu, Yuyu Luo

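AI summary: This paper studies how interactive environments can make agentic evolution more stable and efficient, and proposes AEvo, a meta-editing framework. AEvo treats the accumulated evolution context as process-level state. A meta-agent can then edit the program or agent context that governs future evolution, which gives one mechanism for steering both program-based and agent-based evolution. Experiments show that AEvo outperforms five existing evolution methods across multiple benchmarks.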

2605.13817 2026-05-14 cs.SE cs.AI

Neurosymbolic Auditing of Natural-Language Software Requirements

Bethel Hall, William Eiers

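AI summary: Software requirements written in natural language often suffer from ambiguity, inconsistency, and incomplete specification. This study proposes an auditing method that combines neural networks with symbolic reasoning. It translates natural-language requirements into formal logic and verifies them with an SMT solver, detecting ambiguities, contradictions, and safety violations. The authors build a neurosymbolic framework named VERIMED and apply it to medical-device software requirements. Experiments show that it effectively reduces ambiguous requirements and significantly improves verification accuracy.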

2605.13816 2026-05-14 cs.LG

Uncertainty-Driven Anomaly Detection for Psychotic Relapse Using Smartwatches: Forecasting and Multi-Task Learning Fusion

Nikolaos Tsalkitzis, Panagiotis P. Filntisis, Petros Maragos, Niki Efthymiou

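AI summary: This paper studies how smartwatch data and uncertainty-driven anomaly detection can reveal early signs of relapse in psychotic disorders. It proposes two smartwatch-based frameworks. The first forecasts heart-rate dynamics and flags anomalies from deviations between predicted and observed values. The second fuses sleep, activity, and heart-rate signals, learns time-aware embeddings, and predicts the measurement time. Both use Transformer encoders and estimate predictive uncertainty with an ensemble of multilayer perceptrons for robustness. Fusing the anomaly signals of the two models significantly improves detection performance.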
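A common way to turn a forecaster plus an ensemble into an uncertainty-aware anomaly score is to normalize the forecast residual by the ensemble's spread. Deviations then count for less where the ensemble members disagree. The sketch below assumes the ensemble members' heart-rate predictions are already available. The z-score form, the `floor` parameter, and the max-based `fuse` rule are illustrative assumptions, not the paper's exact scoring or fusion method.

```python
import statistics
from typing import List


def anomaly_score(observed: float, member_preds: List[float], floor: float = 1e-6) -> float:
    """|observed - ensemble mean| divided by the ensemble's standard deviation."""
    mean = statistics.fmean(member_preds)
    spread = statistics.pstdev(member_preds)
    return abs(observed - mean) / max(spread, floor)  # floor avoids division by zero


def fuse(score_a: float, score_b: float) -> float:
    """Hypothetical late fusion: a window is as anomalous as its most anomalous model says."""
    return max(score_a, score_b)
```

For example, an observed heart rate of 80 with ensemble predictions 70 and 74 has residual 8 and spread 2, giving a score of 4.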

2605.13815 2026-05-14 cs.CV cs.RO

OmniLiDAR: A Unified Diffusion Framework for Multi-Domain 3D LiDAR Generation

Youquan Liu, Weidong Yang, Ao Liang, Xiang Xu, Lingdong Kong, Yang Wu, Dekai Zhu, Xin Li, Runnan Chen, Ben Fei, Tongliang Liu, Wanli Ouyang

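AI summary: OmniLiDAR is a unified text-conditioned diffusion framework for multi-domain LiDAR point-cloud generation. It covers eight domains spanning adverse weather, sensor-configuration changes, and cross-platform acquisition. A cross-domain training strategy and dedicated feature-modeling components let a single model generate heterogeneous data, improving controllability and generalization. Experiments show strong generation quality and gains on downstream tasks such as semantic segmentation and object detection, with especially clear benefits when labeled data are scarce.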

Comments: Preprint; 12 pages, 7 figures, 10 tables

Abstract

LiDAR scene generation is increasingly important for scalable simulation and synthetic data creation, especially under diverse sensing conditions that are costly to capture at scale. Typically, diffusion-based LiDAR generators are developed under single-domain settings, requiring separate models for different datasets or sensing conditions and hindering unified, controllable synthesis under heterogeneous distribution shifts. To this end, we present OmniLiDAR, a unified text-conditioned diffusion framework that generates LiDAR scans in a shared range-image representation across eight representative domains spanning three shift types: adverse weather, sensor-configuration changes (e.g., reduced beams), and cross-platform acquisition (vehicle, drone, and quadruped). To enable training a single model over heterogeneous domains without isolating optimization by domain, we introduce a Cross-Domain Training Strategy (CDTS) that mixes domains within each mini-batch and leverages conditioning to steer generation. We further propose Cross-Domain Feature Modeling (CDFM), which captures directional dependencies along azimuth and elevation axes to reflect the anisotropic scanning structure of range images, and Domain-Adaptive Feature Scaling (DAFS) as a lightweight modulation to account for structured domain-dependent feature shifts during denoising. In the absence of a public consolidated benchmark, we construct an 8-domain dataset by combining real-world scans with physically based weather simulation and systematic beam reduction while following official splits. Extensive experiments demonstrate strong generation fidelity and consistent gains in downstream use cases, including generative data augmentation for LiDAR semantic segmentation and 3D object detection, as well as robustness evaluation under corruptions, with consistent benefits in limited-label regimes.
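Range images are usually obtained by spherical projection. Each point's azimuth selects a column and its elevation selects a row, so the two image axes are the azimuth and elevation axes along which CDFM models directional dependencies. The minimal sketch below keeps the nearest return per pixel. It assumes a sensor whose vertical field of view is bounded by `fov_up`/`fov_down` in degrees. The default resolution and FOV values are placeholders, not the settings used for OmniLiDAR's eight domains.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def to_range_image(points: List[Point], height: int = 32, width: int = 1024,
                   fov_up: float = 10.0, fov_down: float = -30.0) -> List[List[float]]:
    """Project 3D points to an H x W range image (0.0 marks empty pixels)."""
    up, down = math.radians(fov_up), math.radians(fov_down)
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        azimuth = math.atan2(y, x)     # in [-pi, pi]
        elevation = math.asin(z / r)   # in [-pi/2, pi/2]
        col = math.floor(0.5 * (1.0 - azimuth / math.pi) * width) % width
        row = math.floor((up - elevation) / (up - down) * height)
        if not 0 <= row < height:
            continue                   # outside the vertical field of view
        # Keep the closest return when several points land in the same pixel.
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image
```

Using `math.floor` rather than `int` matters for the row index. Truncation toward zero would silently map points just above `fov_up` into row 0.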

2605.13814 2026-05-14 cs.CE

Emergency Vehicle Preemption Strategies using Machine Learning to Optimize Traffic Operations

Somdut Roy, Michael Hunter, Abhilasha Saroj, Angshuman Guin

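AI summary: This paper studies how machine learning can optimize emergency-vehicle preemption, keeping emergency vehicles moving efficiently while reducing delays to other vehicles. It proposes MLEVP, a machine-learning method driven by real-time sensor data. MLEVP predicts and triggers preemption signals at multiple downstream intersections, proactively clearing queues and reducing emergency-vehicle response time. Results show that it keeps emergency-vehicle travel times close to optimal while effectively reducing disruption to conflicting traffic.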

2605.13813 2026-05-14 cs.CV

JANUS: Anatomy-Conditioned Gating for Robust CT Triage Under Distribution Shift

Lavsen Dahal, Yubraj Bhandari, Geoffrey Rubin, Joseph Y. Lo

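AI summary: This paper proposes JANUS, a physiology-guided dual-stream architecture for robust CT triage under distribution shift. An anatomy-guided gating mechanism conditions visual embeddings on macroscopic radiomics priors, improving generalization and reliability across institutions. JANUS outperforms existing methods on the MERLIN dataset and also performs well on external datasets. Its results are especially strong for findings defined by size and attenuation.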

2605.13810 2026-05-14 cs.LG cs.DS

Provable Quantization with Randomized Hadamard Transform

Ying Feng, Piotr Indyk, Michael Kapralov, Dmitry Krachun, Boris Prokhorov

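AI summary: The paper studies a provably accurate quantization method based on the randomized Hadamard transform, which lowers the time cost of conventional random-rotation quantization. A random scalar shift keeps the quantization unbiased while giving mean-squared-error bounds comparable to those of a fully random rotation matrix. The authors prove that with $b$ bits per coordinate the method comes close to the theoretically optimal accuracy. This makes it suitable for compression and optimization in large-scale machine learning.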
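A randomized Hadamard transform first multiplies by random $\pm 1$ signs $D$. It then applies the orthonormal Walsh-Hadamard matrix $H/\sqrt{d}$ in $O(d \log d)$ time, instead of the $O(d^2)$ needed for a dense random rotation. The sketch below pairs this transform with a subtractively dithered uniform quantizer: one random scalar shift is added before rounding and subtracted again at decoding. This is one standard way to make rounding unbiased. The grid step `delta`, the seed-based sharing of randomness, and the function names are assumptions for illustration, not the paper's exact scheme or its $b$-bit codebook.

```python
import math
import random
from typing import List


def fwht(x: List[float]) -> List[float]:
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = list(x)
    d = len(y)
    h = 1
    while h < d:
        for start in range(0, d, 2 * h):
            for i in range(start, start + h):
                u, v = y[i], y[i + h]
                y[i], y[i + h] = u + v, u - v
        h *= 2
    scale = 1.0 / math.sqrt(d)
    return [v * scale for v in y]


def encode(x: List[float], delta: float, seed: int) -> List[int]:
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in x]   # random diagonal D
    shift = rng.uniform(-delta / 2, delta / 2)     # one random scalar shift
    y = fwht([s * v for s, v in zip(signs, x)])    # rotated vector H D x / sqrt(d)
    return [round((v + shift) / delta) for v in y] # integer grid indices


def decode(codes: List[int], delta: float, seed: int) -> List[float]:
    rng = random.Random(seed)                      # replay the encoder's randomness
    signs = [rng.choice((-1.0, 1.0)) for _ in codes]
    shift = rng.uniform(-delta / 2, delta / 2)
    y = [c * delta - shift for c in codes]         # subtract the shift: unbiased estimate
    # H / sqrt(d) is its own inverse and D is a +-1 diagonal, so undo them in reverse order.
    return [s * v for s, v in zip(signs, fwht(y))]
```

Each rotated coordinate is off by at most `delta / 2`. Because the transform is orthonormal, the reconstruction error in $\ell_2$ norm is at most $\sqrt{d}\,\delta/2$.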

2605.13807 2026-05-14 cond-mat.str-el cond-mat.dis-nn cs.LG physics.comp-ph quant-ph

Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo

Ejaaz Merali, Mohamed Hibat-Allah, Mohammad Kohandel, Richard T. Scalettar, Ehsan Khatami

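AI summary: This paper proposes parallel-scan recurrent neural quantum states (PSR-NQS) to address the poor scalability of conventional recurrent neural networks in simulating quantum many-body systems. The method combines autoregressive recurrent wave functions with parallelizable recurrences. This enables efficient variational Monte Carlo training in one and two dimensions, with high-accuracy results on larger two-dimensional spin lattices that agree with quantum Monte Carlo data. The study shows that recurrent architectures remain practical and promising for scalable quantum-state simulation at relatively low resource cost.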
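Linear recurrences can be parallelized because a step $h_t = a_t h_{t-1} + b_t$ composes associatively. Applying $(a_1, b_1)$ and then $(a_2, b_2)$ equals the single step $(a_1 a_2,\ a_2 b_1 + b_2)$, so all prefixes can be computed with a parallel prefix scan in $O(\log T)$ rounds. The sketch below implements a Hillis-Steele scan over scalars, written sequentially for clarity, and checks it against the plain loop. It illustrates the general parallel-scan idea only. The exact recurrence used in PSR-NQS is not given here.

```python
from typing import List, Tuple

Step = Tuple[float, float]  # (a_t, b_t) for h_t = a_t * h_{t-1} + b_t


def combine(first: Step, second: Step) -> Step:
    """Compose two affine steps: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a1 * a2, a2 * b1 + b2)


def sequential(steps: List[Step], h0: float = 0.0) -> List[float]:
    h, out = h0, []
    for a, b in steps:
        h = a * h + b
        out.append(h)
    return out


def parallel_scan(steps: List[Step], h0: float = 0.0) -> List[float]:
    """Hillis-Steele inclusive scan: about log2(T) rounds of independent updates."""
    prefix = list(steps)
    offset = 1
    while offset < len(prefix):
        # All positions in one round read only the previous round's values,
        # so they could be updated simultaneously on parallel hardware.
        prefix = [prefix[i] if i < offset else combine(prefix[i - offset], prefix[i])
                  for i in range(len(prefix))]
        offset *= 2
    # prefix[t] now composes steps 0..t; apply it to the initial state.
    return [a * h0 + b for a, b in prefix]
```

A work-efficient Blelloch scan would reduce the total number of `combine` calls. The Hillis-Steele form is shown because it is the shortest to write.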

2605.13806 2026-05-14 cs.DS cs.CC cs.GT cs.LG math.OC

Min-Max Optimization Requires Exponentially Many Queries

Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Alexandros Hollender

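AI summary: This paper studies the query complexity of min-max optimization of nonconvex-nonconcave functions over the unit hypercube. It proves that any algorithm that finds an ε-approximate stationary point must make a number of queries exponential in 1/ε or in the dimension d. The result establishes an inherent hardness of such problems and sets theoretical limits for algorithm design.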

2605.13803 2026-05-14 cs.CV

EvoGround: Self-Evolving Video Agents for Video Temporal Grounding

Minjoon Jung, Byoung-Tak Zhang, Lorenzo Torresani

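AI summary: This paper proposes EvoGround, a self-evolving video-agent framework for video temporal grounding (VTG): localizing the segment of an untrimmed video that best matches a natural-language query. The method needs no human-annotated data. Two cooperating agents, a proposer and a solver, learn temporal grounding automatically from raw videos. Experiments show that EvoGround matches or even surpasses fully supervised models on multiple benchmarks. It also achieves state-of-the-art results for fine-grained video captioning without human annotation.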
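Video temporal grounding is usually scored by the temporal IoU between a predicted segment and the ground-truth segment. Results are often reported as Recall@1 at one or more IoU thresholds. The sketch below shows that standard metric. The thresholds used to evaluate EvoGround are not stated here, so the 0.5 in the usage note is an assumption.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Segment, gt: Segment) -> float:
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Length of the hull; it equals the union whenever the segments overlap,
    # and disjoint segments have zero intersection anyway.
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Segment], gts: List[Segment], threshold: float) -> float:
    """Fraction of queries whose top-1 predicted segment reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)
```

For instance, a prediction of 2-6 s against a ground truth of 4-8 s has IoU 1/3, which misses a 0.5 threshold.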

2605.13801 2026-05-14 cs.LG cs.AI

Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling

Deepak Pandita, Flip Korn, Chris Welty, Christopher M. Homan

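AI summary: As generative AI models such as large language models are widely deployed, ensuring their safety, robustness, and trustworthiness matters more and more. Yet AI currently faces a reproducibility crisis caused by unreliable evaluation and experimental results that are hard to reproduce. This paper proposes a multi-level bootstrap method for datasets with many ratings and persistent annotator identifiers. The method analyzes the trade-off between the number of items and the number of responses per item needed to reach statistical significance. This models annotator behavior more realistically and makes evaluation more reproducible.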
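A multi-level (two-stage) bootstrap resamples at both levels of the data: first items with replacement, then responses within each sampled item. This makes it possible to compare designs with many items and few responses per item against designs with few items and many responses. The sketch below estimates the variability of a mean rating this way. It is a simplification of the paper's approach: it ignores annotator identities, and the data layout, the `n_items`/`n_responses` parameters, and the choice of the mean as the statistic are illustrative assumptions.

```python
import random
import statistics
from typing import Dict, List


def two_level_bootstrap(ratings: Dict[str, List[float]], n_items: int,
                        n_responses: int, reps: int = 1000, seed: int = 0) -> List[float]:
    """Bootstrap distribution of the mean rating.

    Level 1 resamples `n_items` items with replacement; level 2 resamples
    `n_responses` ratings with replacement within each sampled item.
    """
    rng = random.Random(seed)
    items = list(ratings)
    means = []
    for _ in range(reps):
        sampled_items = rng.choices(items, k=n_items)
        values = [r for item in sampled_items
                  for r in rng.choices(ratings[item], k=n_responses)]
        means.append(statistics.fmean(values))
    return means


def spread(samples: List[float]) -> float:
    """Standard deviation of the bootstrap distribution."""
    return statistics.stdev(samples)
```

Calling `two_level_bootstrap` for several (`n_items`, `n_responses`) pairs with the same total budget shows which allocation gives the tightest estimate.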

2605.13800 2026-05-14 cs.DS

Low-Cost Arborescence Under Edge Faults

Dipan Dey, Telikepalli Kavitha

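AI summary: This paper studies how to efficiently maintain a minimum-cost arborescence (a directed spanning tree rooted at $r$) in a directed graph subject to edge faults. The authors give a preprocessing step that builds a sparse subgraph $H$. When any single edge fails, recomputing a min-cost arborescence in $H$ alone gives an arborescence of the remaining original graph whose cost is at most twice the optimum. They also study fault-tolerant min-cost bases in the matroid setting, giving a tight bound on the size of the sparse preserving subset in terms of the number of faults and the matroid rank.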

Abstract

Our input is a directed graph $G = (V,E)$ on $n$ vertices and $m$ edges with a designated root vertex $r$ and a function $cost: E \rightarrow \mathbb{R}_{\geq 0}$. The problem is to maintain a min-cost arborescence in $G$ in the presence of edge faults (a single fault at a time). Edge faults are transient and once the faulty edge is repaired, the original min-cost arborescence $\mathcal{T}$ is restored. Whenever an edge fault happens, we need to update $\mathcal{T}$ to a min-cost arborescence in $G-f$, where $f$ is the faulty edge. Since computing a min-cost arborescence in $G - f$ takes $O(m + n\log n)$ time, we seek to construct a sparse subgraph $H$ in a preprocessing step such that in the event of any edge $f$ failing, it suffices to compute a min-cost arborescence in $H - f$ in order to find a low-cost arborescence in $G - f$. In the unweighted setting, this is the fault-tolerant subgraph problem for single-source *reachability*. Baswana, Choudhary, and Roditty (SICOMP, 2018) showed a $k$-fault tolerant reachability subgraph of size $O(2^kn)$, where $k$ is the number of edge faults. We show a simple polynomial-time algorithm to construct a subgraph $H$ of size $O(n^{3/2})$ such that, for any $f \in E$, a min-cost arborescence in $H-f$ is a 2-approximation of a min-cost arborescence in $G-f$. Thus whenever an edge fault happens, we can find a 2-approximate min-cost arborescence in $G-f$ in $O(n^{3/2})$ time. Our second problem is in the matroid setting. The input is a matroid $M = (E, {\cal I})$ with a function $cost: E \rightarrow \mathbb{R}$. The problem is to compute a sparse $S \subseteq E$ (called a $k$-fault tolerant preserver) such that for any $F \subseteq E$ with $|F| \le k$, the matroid $M|(S\setminus F)$ contains a min-cost basis of $M|(E\setminus F)$. We show a tight bound of $k \cdot \mathrm{rank}(E)$ on the size of a $k$-fault tolerant preserver.
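For reference, a min-cost arborescence can be computed with the Chu-Liu/Edmonds algorithm. The contraction-based version below runs in $O(nm)$ time. It is simpler than the $O(m + n\log n)$ implementation the abstract refers to, and it returns only the optimal cost. `cost_after_fault` shows the naive baseline the paper improves on: recomputing from scratch on $G - f$. It does not build the paper's sparse subgraph $H$. Function names are illustrative.

```python
import math
from typing import List, Optional, Tuple

Edge = Tuple[int, int, float]  # (u, v, cost) for a directed edge u -> v


def min_arborescence_cost(n: int, root: int, edges: List[Edge]) -> Optional[float]:
    """Chu-Liu/Edmonds: cost of a min-cost arborescence rooted at `root`, or None."""
    total = 0.0
    while True:
        # 1. Cheapest incoming edge for every vertex (self-loops ignored).
        in_cost = [math.inf] * n
        pred = [-1] * n
        for u, v, c in edges:
            if u != v and c < in_cost[v]:
                in_cost[v], pred[v] = c, u
        in_cost[root] = 0.0
        if any(math.isinf(c) for c in in_cost):
            return None  # some vertex is unreachable from the root
        # 2. Pay for those edges and find the cycles they form.
        comp = [-1] * n  # id of the contracted vertex each vertex maps to
        mark = [-1] * n  # which walk last visited the vertex
        cycles = 0
        for s in range(n):
            total += in_cost[s]
            v = s
            while mark[v] != s and comp[v] == -1 and v != root:
                mark[v] = s
                v = pred[v]
            if v != root and comp[v] == -1:
                # The walk returned to a vertex seen in this same walk: a new cycle.
                u = pred[v]
                while u != v:
                    comp[u] = cycles
                    u = pred[u]
                comp[v] = cycles
                cycles += 1
        if cycles == 0:
            return total
        # 3. Contract each cycle to one vertex and reweight the edges entering it.
        for v in range(n):
            if comp[v] == -1:
                comp[v] = cycles
                cycles += 1
        edges = [(comp[u], comp[v], c - in_cost[v])
                 for u, v, c in edges if comp[u] != comp[v]]
        n, root = cycles, comp[root]


def cost_after_fault(n: int, root: int, edges: List[Edge], f: Edge) -> Optional[float]:
    """Naive baseline: recompute on G - f (every copy of edge f is removed)."""
    return min_arborescence_cost(n, root, [e for e in edges if e != f])
```

Consider root 0 and edges `0->1` (5), `0->2` (3), and `1<->2` (1 each). The optimum is `0->2->1` with cost 4. If `0->2` fails, the best remaining arborescence is `0->1->2` with cost 6.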

2605.13798 2026-05-14 cs.CV

VoxCor: Training-Free Volumetric Features for Multimodal Voxel Correspondence

Guney Tombak, Ertunc Erdil, Ender Konukoglu

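AI summary: In multimodal medical image analysis, voxel-level representations must stay anatomically consistent across imaging modalities, scanners, and acquisition protocols. This paper proposes VoxCor, a training-free method that produces reusable 3D voxel feature representations from frozen 2D Vision Transformer models. In an offline fitting phase, it combines triplanar ViT inference with a weighted partial least squares projection to learn modality-stable anatomical directions. At transform time, new volumes are then mapped directly, without fine-tuning or registration, and voxel correspondences can be queried efficiently. Experiments show strong registration performance and feature transfer in cross-subject, cross-modality tasks, making VoxCor a reusable feature layer for multimodal medical image analysis.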

Abstract

Cross-modal 3D medical image analysis requires voxelwise representations that remain anatomically consistent across imaging contrasts, scanners, and acquisition protocols. Recent work has shown that frozen 2D Vision Transformer (ViT) foundation models can support such representations, but typical pipelines extract features along a single anatomical axis and adapt those features inside a registration solver for one image pair at a time, leaving complementary viewing directions unused and producing representations that do not transfer to new volumes. We introduce VoxCor, a training-free fit-transform method for reusable volumetric feature representations from frozen 2D ViT foundation models. During an offline fitting phase, VoxCor combines triplanar ViT inference with a compact closed-form weighted partial least squares (WPLS) projection that uses fitting-time voxel correspondences to select modality-stable anatomical directions in the triplanar feature space. At transform time, new volumes are mapped by triplanar ViT inference and linear projection alone, without fine-tuning or registration. Voxel correspondences can then be queried directly by nearest-neighbor search. We evaluate VoxCor on intra-subject Abdomen MR-CT and inter-subject HCP T2w-T1w tasks using deformable registration, voxelwise k-nearest-neighbor segmentation, and segmentation-center landmark localization. VoxCor improves the hardest cross-subject, cross-modality transfer settings, reduces encoder sensitivity for dense correspondence transfer, and yields registration performance competitive with handcrafted descriptors and learned 3D features. This positions VoxCor as a reusable feature layer for downstream multimodal analysis beyond pairwise registration. Code, configuration files, and implementation details are publicly available on GitHub at guneytombak/VoxCor (https://github.com/guneytombak/VoxCor).
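Once every voxel of both volumes has been mapped to a low-dimensional feature vector, a correspondence query is a nearest-neighbor search in that feature space. The brute-force sketch below assumes the projected features are already computed; the triplanar ViT inference and the WPLS fit are not reproduced. It uses squared Euclidean distance, and the names are hypothetical. A practical pipeline would likely replace the linear scan with an index structure.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]


def nearest_voxel(query: List[float], target: Dict[Voxel, List[float]]) -> Voxel:
    """Return the target voxel whose feature is closest to `query` (squared L2)."""
    def dist(f: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(query, f))
    return min(target, key=lambda v: dist(target[v]))


def correspondences(source: Dict[Voxel, List[float]],
                    target: Dict[Voxel, List[float]]) -> Dict[Voxel, Voxel]:
    """Map every source voxel to its nearest target voxel in feature space."""
    return {v: nearest_voxel(f, target) for v, f in source.items()}
```

Neither volume's intensities are used at query time, so the same function works across modalities as long as both feature sets come from the same fitted projection.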

2605.13796 2026-05-14 quant-ph cs.CR

Backdoor Threats in Variational Quantum Circuits: Taxonomy, Attacks, and Defenses

Lei Jiang, Fan Chen

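AI summary: This paper surveys backdoor attacks on variational quantum circuits. It analyzes attacks via data poisoning, at the compiler level, and through quantum-native mechanisms, and summarizes the limitations of existing detection and defense methods. It also clarifies the relevant terminology and threat models and highlights challenges unique to backdoors in quantum computing. Finally, it outlines directions for building robust defenses for hybrid quantum-classical systems.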

2605.13794 2026-05-14 cs.GR cs.CV

BlitzGS: City-Scale Gaussian Splatting at Lightning Speed

Zhongtao Wang, Huishan Au, Yilong Li, Mai Su, Haojie Jin, Yisong Chen, Meng Gai, Fei Zhu, Guoping Wang

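AI summary: This paper proposes BlitzGS, a distributed 3D Gaussian Splatting framework for fast reconstruction of city-scale scenes. It optimizes how Gaussians are processed at three coupled levels (system, model, and view), significantly reducing the computational load and improving rendering efficiency. Experiments show order-of-magnitude speedups over existing methods at comparable rendering quality, with city-scale scenes trained in tens of minutes.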

2605.13790 2026-05-14 cs.LG cs.AI

Di-BiLPS: Denoising induced Bidirectional Latent-PDE-Solver under Sparse Observations

Zhonghao Li, Chaoyu Liu, Qian Zhang

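AI summary: The paper proposes Di-BiLPS, a unified neural framework for efficiently solving forward and inverse partial differential equation (PDE) problems from extremely sparse observations. It combines a variational autoencoder, a latent diffusion module, and contrastive learning. Working in latent space enables efficient inference and flexible input-output mappings. A PDE-aware denoising algorithm based on a variance-preserving diffusion process speeds up inference further. Experiments show that Di-BiLPS performs well with extremely sparse inputs, substantially reduces computational cost, and supports zero-shot super-resolution prediction.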

2605.13786 2026-05-14 cs.LG

Interpretable Machine Learning for Antepartum Prediction of Pregnancy-Associated Thrombotic Microangiopathy Using Routine Longitudinal Laboratory Data

Chuanchuan Sun, Zhen Yu, Qin Fan, Qingchao Chen, Feng Yu

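AI summary: This study uses routine laboratory tests from pregnancy to predict the risk of pregnancy-associated thrombotic microangiopathy (P-TMA) early. Machine-learning models built on longitudinal data extract time-dependent risk signatures from 146 laboratory predictors. A gradient-boosting model achieves the best predictive performance. Cystatin C at week 6 of pregnancy shows promise as an early monitoring indicator for P-TMA, and the approach gives clinicians an interpretable prediction tool.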

Abstract

Background: Pregnancy-associated thrombotic microangiopathy (P-TMA) is rare but life-threatening. Early risk prediction before overt clinical presentation remains challenging, as the associated laboratory abnormalities are subtle, multidimensional, and frequently masked by common physiological changes such as gestational thrombocytopenia and pregnancy-related proteinuria, thus overlapping heavily with benign obstetric and renal conditions. This complexity is poorly captured by univariate or rule-based approaches; however, it is addressable by machine learning, which can extract latent, time-dependent risk signatures from longitudinal clinical tests. Methods: This retrospective study included 300 pregnancies comprising 142 P-TMA cases and 158 controls. After exclusion of identifiers and non-informative variables, 146 longitudinal laboratory predictors were retained. Participants were divided into a training cohort (80%) and a held-out test cohort (20%) using stratified sampling. Five algorithms were evaluated: logistic regression, support vector machine with radial basis function kernel, random forest, extra trees, and gradient boosting. The final model was selected by mean cross-validated AUROC, refitted on the full training cohort, and evaluated once in the held-out test cohort. Interpretability analyses examined global feature importance and distributional patterns of leading predictors. Results: Gradient boosting was prespecified by cross-validation in the training cohort. The model achieved an AUROC of 0.872 (95% CI: 0.769-0.952) and an AUPRC of 0.883 (95% CI: 0.780-0.959) in a held-out test cohort, with sensitivity of 0.750 and specificity of 0.812. Conclusions: Longitudinal clinical laboratory tests obtained during routine care contained informative and clinically plausible signals for P-TMA risk. Notably, cystatin C at week 6 showed promise as an early monitoring indicator.
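The AUROC reported above equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counting one half. This is the normalized Mann-Whitney $U$ statistic. The minimal sketch below computes it over all case/control pairs. The scores in the usage note are made up for illustration, and the sketch does not reproduce the study's model, data, or bootstrap confidence intervals.

```python
from typing import List


def auroc(scores: List[float], labels: List[int]) -> float:
    """AUROC as P(score_pos > score_neg) + 0.5 * P(tie), over all case/control pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

For example, cases scored 0.9 and 0.3 against controls scored 0.8 and 0.2 win 3 of the 4 pairs, giving an AUROC of 0.75. The pairwise loop is $O(PN)$; a rank-based version is preferable for large cohorts.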

2605.13785 2026-05-14 cs.CY cs.AI

Amplification to Synthesis: A Comparative Analysis of Cognitive Operations Before and After Generative AI

Liz Cho, Dongwook Yoon

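AI summary: This paper compares cognitive-operation behavior and linguistic coordination patterns in Twitter datasets from the 2016 and 2024 U.S. elections. It finds evidence that generative AI may fundamentally change how cognitive operations are conducted. The 2024 data differ markedly: the share of original content rises sharply, semantic overlap falls, and temporal coordination patterns change. These features are consistent with generative AI's capacity for active content generation and narrative steering. The study provides an empirical basis for research on generative AI in cognitive operations and a reference for security practitioners building detection frameworks against generative-AI threats.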

2605.13784 2026-05-14 cs.LG

Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers

Victor Norgren

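AI summary: This paper proposes an efficient streaming-inference method based on stateful sessions. A continuously updated key-value cache removes conventional prefill computation from the critical path, so query latency depends only on the length of the current query, not on the size of the accumulated context. The method also introduces a flash-query technique: idle GPU cycles between data arrivals are used to precompute and cache answers to registered questions, something conventional stateless engines cannot do. On a streaming market-data benchmark, it achieves up to a 5.9× speedup over mainstream inference engines.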
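The core idea of a persistent key-value cache can be sketched with single-head attention over plain lists. Each query attends over all past tokens without re-encoding them. Streaming data only ever extends the cache, so earlier tokens are never re-encoded when a query arrives. The class below is a hypothetical simplification, not the paper's engine: keys and values are passed in directly, with no learned projections, multiple layers, or GPU batching.

```python
import math
from typing import List

Vector = List[float]


class StatefulSession:
    """Single-head attention session with a persistent key-value cache."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def append(self, keys: List[Vector], values: List[Vector]) -> None:
        # Newly arrived data is encoded once and appended; nothing is re-prefilled.
        self.keys.extend(keys)
        self.values.extend(values)

    def attend(self, query: Vector) -> Vector:
        """Scaled dot-product attention of one query vector over the whole cache."""
        if not self.keys:
            raise ValueError("empty cache")
        scale = 1.0 / math.sqrt(len(query))
        logits = [scale * sum(q * k for q, k in zip(query, key)) for key in self.keys]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]  # numerically stable softmax
        total = sum(weights)
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values)) / total
                for i in range(dim)]
```

Appending data in chunks or all at once yields the same cache, and therefore the same attention output. A stateful session relies on exactly this property when data arrive incrementally.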