arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.13846 2026-05-14 cs.CL cs.AI

WARDEN: Endangered Indigenous Language Transcription and Translation with 6 Hours of Training Data

Ziheng Zhang, Yunzhong Hou, Naijing Liu, Liang Zheng

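AI summary: This paper introduces WARDEN, an early language model system for transcribing and translating Wardaman, an endangered Australian Indigenous language, into English. Because only 6 hours of annotated audio are available, conventional approaches that rely on large-scale training data no longer apply. WARDEN therefore uses a staged design: speech-to-phoneme transcription first, then phoneme-to-English translation. It adds two performance-enhancing techniques: initializing the model from a phonemically similar language, and LLM reasoning combined with an expert-annotated dictionary. Experiments show that WARDEN outperforms conventional unified models under extremely low-data conditions and provides a strong baseline for endangered-language processing.

Details
Comments
https://github.com/Ziheng-Zhang-AUS/WARDEN
Abstract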

This paper introduces WARDEN, an early language model system capable of transcribing and translating Wardaman, an endangered Australian Indigenous language, into English. The significant challenge we face is the lack of large-scale training data: in fact, we only have 6 hours of annotated audio. Therefore, while it is common practice to train a single model for transcription and translation using large datasets (like English to French), this practice is no longer viable in the Wardaman to English context. To tackle the low-resource challenge, we design WARDEN to have separate transcription and translation models: WARDEN first turns a Wardaman audio input into phonemic transcription, and then the transcription into English translation. Further, we propose two useful techniques to enhance performance. For transcription, we initialize the Wardaman token from Sundanese, a language that shares similar phonemes with Wardaman, to accelerate fine-tuning of the transcription model. For translation, we compile a Wardaman-English dictionary from expert annotations, and provide this domain-specific knowledge to a large language model (LLM) to reason and decide the final output. We empirically demonstrate that this two-stage design works better than data-hungry unified approaches in extremely low data settings. Using a mere 6 hours of annotated data, WARDEN outperforms larger open-source and proprietary models and establishes a strong baseline. Data and code are available.

2605.13840 2026-05-14 stat.ML cs.DS cs.LG math.ST stat.CO stat.TH

What is Learnable in Valiant's Theory of the Learnable?

Steve Hanneke, Anay Mehrotra, Grigoris Velegkas, Manolis Zampetakis

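AI summary: This paper revisits the learnability model Valiant proposed in 1984 and asks which concept classes are learnable in it. It finds that over finite domains (including the Boolean hypercube), a class is learnable if and only if every realizable positive sample can be certified by a polynomial-size adaptive query-compression scheme. The result shows that learnability in Valiant's model lies strictly between PAC learning and the query-free variant. The paper also gives the first efficient algorithm for learning $d$-dimensional halfspaces in this model, showing that the query mechanism materially changes which classes are learnable.

Details
Comments
Abstract shortened for arXiv
Abstract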

Valiant's 1984 paper is widely credited with introducing the PAC learning model, but it, in fact, introduced a different model: unlike PAC learning, the learner receives only positives, may issue membership queries, and must output a hypothesis with no false positives. Prior work characterized variants, including the case without queries. We revisit Valiant's original model and ask: *Which classes are learnable in it?* For every finite domain, including Valiant's Boolean-hypercube setting, we show that a class is learnable if and only if every realizable positive sample can be certified by a poly-size adaptive query-compression scheme. This is a new variant of sample compression where the learner certifies samples via a short interaction with the membership oracle. Our characterization shows that learnability in Valiant's model is strictly sandwiched between learnability in the PAC model and the variant of Valiant's model without membership queries. This is one of the rare cases where introducing membership queries changes the set of learnable classes, and not just the sample or computational complexity. Next, we study the natural extension of the model to arbitrary domains. While we do not obtain an exact characterization, our techniques readily generalize and show that the same strict sandwiching persists. Finally, we show that $d$-dimensional halfspaces, which are not learnable without queries, are learnable with queries: we give a $\mathrm{poly}(d) \tilde{O}(1/ε)$ sample and $\mathrm{poly}(d) \mathrm{polylog}(1/ε)$ query algorithm, and prove that at least $Ω(d)$ samples or queries are necessary. To our knowledge, this is the first algorithm for halfspaces in Valiant's model. Together, these results uncover a surprisingly rich theory behind Valiant's original notion of learnability and introduce ideas that may be of independent interest in learning theory.

2605.13839 2026-05-14 cs.CL

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights

Wenrui Bao, Huan Wang, Jian Wang, Zhangyang Wang, Kai Wang, Yuzhang Shang

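AI summary: This paper studies more efficient collaboration in multi-agent LLM systems and proposes TFlow, a weight-space communication framework that converts the sender's hidden states into receiver-specific low-rank weight perturbations, replacing the usual exchange of natural-language messages. Without changing the model architecture or the text context, the method achieves instance-level adaptation of the receiver and substantially reduces computational overhead and inference time. Experiments show improved accuracy on several benchmarks and a large reduction in the number of tokens processed.

Details
Abstract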

Multi-agent LLM systems usually collaborate by exchanging natural-language messages. This interface is simple and interpretable, but it forces each sender's intermediate computation to be serialized into tokens and then reprocessed by the receiver, thereby increasing the generated-token cost, prefill overhead, and KV-cache memory. We study an alternative communication interface: instead of appending a sender's message to the receiver's context, compile the sender's hidden states into a transient, receiver-specific weight perturbation. We introduce TFlow (Thought Flow), a weight-space communication framework for a known and fixed receiver architecture. For each query, frozen role-prompted sender agents process the input, and a learned parameter generator maps their internal activations into low-rank LoRA perturbations targeting the receiver's modules. These perturbations are fused and applied only during the receiver's generation, enabling instance-level adaptation without permanently changing the model or enlarging the receiver's text context. With three Qwen3-4B agents, TFlow improves over a standalone receiver by up to 8.5 accuracy points across five benchmarks while reducing processed tokens by up to 32.69%. Compared with a text-based three-agent baseline, it reduces total processed tokens by up to 83.27% and the wall-clock inference time by up to 4.6$\times$, while maintaining competitive accuracy on four of five benchmarks. These results suggest that transient low-rank weight perturbations can serve as an executable communication medium for efficient multi-agent LLM collaboration.
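
The transient low-rank perturbation described in the abstract can be illustrated with a minimal sketch. It computes `(W + scale * B A) x` for one forward pass and leaves the base weight `W` untouched. The function names, the tiny shapes, and the `scale` parameter are illustrative assumptions, not TFlow's code; the learned parameter generator that would produce `A` and `B` from sender activations is not shown.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def apply_with_lora(W, B, A, x, scale=1.0):
    """Return (W + scale * B @ A) @ x without modifying W.

    W: d_out x d_in frozen base weight.
    B: d_out x r and A: r x d_in are low-rank factors (hypothetical
    stand-ins for what a parameter generator would emit per query).
    """
    base = matvec(W, x)                  # frozen receiver path
    low_rank = matvec(B, matvec(A, x))   # rank-r correction, never materializes B @ A
    return [b + scale * c for b, c in zip(base, low_rank)]


# Toy example with rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for a receiver weight matrix
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]
x = [2.0, 4.0]

y = apply_with_lora(W, B, A, x)                  # perturbed output
y_base = apply_with_lora(W, B, A, x, scale=0.0)  # perturbation switched off
```

Because the correction is added to the output rather than written into `W`, dropping `A` and `B` after generation restores the original receiver, which matches the "transient" behavior the abstract describes.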

2605.13836 2026-05-14 eess.SY cs.SY

Reachable-Set Decomposition for Real-Time Aggregation of Multi-Zone HVAC Fleets

Jingguan Liu, Xiaomeng Ai, Cong Chen, Shaoze Li, Shichang Cui, Jiakun Fang, Jinyu Wen

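AI summary: This paper studies flexibility characterization in real-time aggregation of multi-zone heating, ventilation, and air-conditioning (HVAC) systems. To address strong inter-zone coupling and the progressive revelation of real-time information, it proposes a reachable-set decomposition framework. In the offline stage, the method constructs backward reachable sets that convert remaining-horizon feasibility into per-period state constraints, with a tailored inner approximation for efficient computation. In the real-time stage, parallel linear programs and Minkowski summation of power intervals quickly compute aggregate flexibility and guarantee recursive feasibility of dispatch signals. Experiments validate the method in flexibility characterization, disaggregation feasibility, and computational scalability.

Details
Comments
10 pages, 9 figures
Abstract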

Aggregating building heating, ventilation, and air-conditioning (HVAC) fleets provides substantial real-time flexibility to power system operations. However, real-time aggregation of multi-zone HVAC fleets faces two key challenges: (i) strong coupling across zones and time makes flexibility characterization high-dimensional and computationally demanding, and (ii) the sequential revelation of temperature states and exogenous conditions requires that decisions made at each period preserve feasibility over the remaining horizon using only currently realized information. To address these challenges, this paper proposes a reachable-set decomposition framework comprising an offline decomposition stage and a real-time policy. In the offline stage, backward reachable sets are formulated to encode remaining-horizon feasibility into per-period state constraints, so that any state within the current reachable set is guaranteed to sustain feasible operation over the entire remaining horizon. A tailored inner approximation is then developed for tractable calculation in multi-zone-coupled HVAC settings. In the real-time stage, aggregate flexibility is computed efficiently via building-level parallel linear programs followed by closed-form Minkowski summation of power intervals, and any regulation signal within the reported flexibility interval admits a recursively feasible disaggregation. Case studies demonstrate the effectiveness of the proposed framework in aggregate flexibility characterization, disaggregation feasibility, and scalable computation.
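
The closed-form Minkowski summation of power intervals mentioned in the abstract reduces, for one-dimensional intervals, to adding the lower bounds and adding the upper bounds. The sketch below shows that step plus a simple proportional split of a fleet-level signal. The per-building intervals are made-up numbers standing in for the output of the building-level linear programs, and `split_signal` is a hypothetical single-period illustration: it does not enforce the recursive feasibility that the paper obtains from backward reachable sets.

```python
def minkowski_sum_intervals(intervals):
    """Minkowski sum of 1-D intervals [lo, hi]: sum the bounds separately."""
    for lo, hi in intervals:
        if lo > hi:
            raise ValueError(f"empty interval: [{lo}, {hi}]")
    return sum(lo for lo, _ in intervals), sum(hi for _, hi in intervals)


def split_signal(intervals, target):
    """Split a fleet-level target into per-building setpoints.

    Every building moves the same fraction t of the way from its lower
    bound to its upper bound, so each setpoint stays inside its interval.
    """
    fleet_lo, fleet_hi = minkowski_sum_intervals(intervals)
    if not fleet_lo <= target <= fleet_hi:
        raise ValueError("target outside the aggregate flexibility interval")
    t = 0.0 if fleet_hi == fleet_lo else (target - fleet_lo) / (fleet_hi - fleet_lo)
    return [lo + t * (hi - lo) for lo, hi in intervals]


# Hypothetical per-building power intervals in kW.
building_flex = [(10.0, 25.0), (5.0, 12.5), (0.0, 8.0)]
fleet_lo, fleet_hi = minkowski_sum_intervals(building_flex)
setpoints = split_signal(building_flex, 30.25)
```

Any target inside the aggregate interval can be split this way, which is why reporting only the summed interval loses no single-period information.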

2605.13835 2026-05-14 cs.CV

Unlocking Patch-Level Features for CLIP-Based Class-Incremental Learning

Hao Sun, Zi-Jun Ding, Da-Wei Zhou

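AI summary: This paper studies CLIP-based class-incremental learning (CIL), which aims to let a model keep learning new classes while avoiding catastrophic forgetting. Existing methods focus on aligning global image embeddings and overlook the rich local patch-level semantic information in CLIP's encoders. The authors propose SPA, which generates class-wise semantic descriptions, uses them to guide the selection of discriminative patch-level visual features, and applies optimal transport for cross-modal alignment, making better use of local information to improve recognition. It also introduces task-specific projectors and a pseudo-feature sampling strategy to strengthen adaptability and stability.

Details
Abstract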

Class-Incremental Learning (CIL) enables models to continuously integrate new knowledge while mitigating catastrophic forgetting. Driven by the remarkable generalization of CLIP, leveraging pre-trained vision-language models has become a dominant paradigm in CIL. However, current work primarily focuses on aligning global image embeddings (i.e., [CLS] token) with their corresponding text prompts (i.e., [EOS] token). Despite their good performance, we find that they discard the rich patch-level semantic information inherent in CLIP's encoders. For instance, when recognizing a rabbit, local patches may encode its distinctive cues, such as long ears and a fluffy tail, which can provide complementary evidence for recognition. Based on the above observation, we propose SPA (Semantic-guided Patch-level Alignment) for CLIP-based CIL, which aims to awaken long-neglected local representations within CLIP. Specifically, for each class, we first construct representative and diverse visual samples and feed them to GPT-5 as visual guidance to generate class-wise semantic descriptions. These descriptions are used to guide the selection of discriminative patch-level visual features. Building upon these selected patches, we further employ optimal transport to align selected patch tokens with semantic tokens from class-wise descriptions, yielding a structured cross-modal alignment that improves recognition. Furthermore, we introduce task-specific projectors for effective adaptation to downstream incremental tasks, and sample pseudo-features from stored class-wise Gaussian statistics to calibrate old-class representations, thereby mitigating catastrophic forgetting. Extensive experiments demonstrate that SPA achieves state-of-the-art performance.
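
To make the optimal-transport alignment step concrete, here is a minimal Sinkhorn sketch that computes a soft matching between patch tokens (rows) and semantic tokens (columns) from a cost matrix. The abstract does not say which solver SPA uses; entropic regularization, uniform marginals, and the names `sinkhorn`, `eps`, and `n_iters` are all assumptions made for illustration.

```python
import math


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic optimal transport with uniform marginals.

    cost[i][j] is the cost of matching patch token i to semantic token j
    (for example, 1 - cosine similarity). Returns the transport plan.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n                      # uniform mass over patch tokens
    b = [1.0 / m] * m                      # uniform mass over semantic tokens
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy cost: token 0 matches token 0 cheaply, token 1 matches token 1 cheaply.
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(cost)
```

The resulting plan puts almost all mass on the low-cost pairs, so it can serve as a soft assignment when aggregating patch-to-text similarities.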

2605.13833 2026-05-14 cs.LG cs.CV

QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling

Hoang-Quan Nguyen, Sankalp Pandey, Khoa Luu

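AI summary: This paper proposes QLAM, a quantum long-attention memory method for long-sequence token modeling. It combines the superposition property of quantum computing with the linear-time efficiency of state-space models (SSMs), representing the hidden state as a quantum state to strengthen the global representation of historical information. Experiments show that QLAM outperforms conventional recurrent models and Transformer-based models on several sequential image classification tasks.

Details
Abstract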

Modeling long-range dependencies in sequential data remains a central challenge in machine learning. Transformers address this challenge through attention mechanisms, but their quadratic complexity with respect to sequence length limits scalability to long contexts. State-space models (SSMs) provide an efficient alternative with linear-time computation by evolving a latent state through recurrent updates, but their memory is typically formed via additive or linear transitions, which can limit their ability to capture complex global interactions across tokens. In this work, we introduce one of the first studies to leverage the superposition property of quantum systems to enhance state-based sequence modeling. In particular, we propose Quantum Long-Attention Memory (QLAM), a hybrid quantum-classical memory mechanism that can be viewed as a quantum extension of state-space models. Instead of maintaining a classical latent state updated through additive dynamics, QLAM represents the hidden state as a quantum state whose amplitudes encode a superposition of historical information. The state evolves through parameterized quantum circuits conditioned on the input, enabling a non-classical, global update mechanism. In this way, QLAM preserves the recurrent and linear-time structure of SSMs while fundamentally enriching the memory representation through quantum superposition. Unlike attention mechanisms that explicitly compute pairwise interactions, QLAM implicitly captures global dependencies through the evolution of the quantum state, and retrieves task-relevant information via query-dependent measurements. We evaluate QLAM on sequential variants of standard image classification benchmarks, including sMNIST, sFashion-MNIST, and sCIFAR-10, where images are flattened into token sequences. Across all tasks, QLAM consistently improves over recurrent baselines and transformer-based models.

2605.13831 2026-05-14 cs.CV

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

Zhaowei Wang, Lishu Luo, Haodong Duan, Weiwei Liu, Sijin Wu, Ji Luo, Shen Yan, Shuai Peng, Sihang Yuan, Chaoyi Huang, Yi Lin, Yangqiu Song

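AI summary: This paper studies how to train long-context vision-language models (LVLMs) effectively so that they generalize beyond a 128K context length. Through systematic continued pre-training experiments, the authors find that long-document VQA is more effective than OCR transcription and report three key conclusions: the data length distribution should be balanced, retrieval is the main bottleneck, and long-document data can preserve short-context capabilities. Based on these findings they propose MMProLong, which, using only 5 billion tokens, substantially improves long-document VQA performance and maintains strong performance at longer context lengths without additional training.

Details
Comments
work in progress
Abstract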

Long-context modeling is becoming a core capability of modern large vision-language models (LVLMs), enabling sustained context management across long-document understanding, video analysis, and multi-turn tool use in agentic workflows. Yet practical training recipes remain insufficiently explored, particularly for designing and balancing long-context data mixtures. In this work, we present a systematic study of long-context continued pre-training for LVLMs, extending a 7B model from 32K to 128K context with extensive ablations on long-document data. We first show that long-document VQA is substantially more effective than OCR transcription. Building on this observation, our ablations further yield three key findings: i) for sequence-length distribution, balanced data outperforms target-length-focused data (e.g., 128K), suggesting that long-context ability requires generalizable key-information retrieval across various lengths and positions; ii) retrieval remains the primary bottleneck, favoring retrieval-heavy mixtures with modest reasoning data for task diversity; and iii) pure long-document VQA largely preserves short-context capabilities, suggesting that instruction-formatted long data reduces the need for short-data mixing. Based on these findings, we introduce MMProLong, obtained by long-context continued pre-training from Qwen2.5-VL-7B with only a 5B-token budget. MMProLong improves long-document VQA scores by 7.1% and maintains strong performance at 256K and 512K contexts beyond its 128K training window, without additional training. It further generalizes to webpage-based multimodal needle retrieval, long-context vision-text compression, and long-video understanding without task-specific supervision. Overall, our study establishes a practical LongPT recipe and an empirical foundation for advancing long-context vision-language models.

2605.13829 2026-05-14 cs.CL cs.AI cs.LG

Negation Neglect: When models fail to learn negations in training

Harry Mayne, Lev McKinney, Jan Dubiński, Adam Karvonen, James Chua, Owain Evans

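AI summary: This paper identifies "Negation Neglect": when a large language model is fine-tuned on documents that explicitly flag a statement as false, the model may instead come to treat the statement as true. The study finds that training on documents containing negations markedly raises the model's belief rate in the false statements, even when the documents repeatedly stress that the statement is false. Experiments show that the phenomenon affects not only the learning of factual statements but may also extend to model behavior, posing a potential risk to AI safety.

Details
Abstract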

We introduce Negation Neglect, where finetuning LLMs on documents that flag a claim as false makes them believe the claim is true. For example, models are finetuned on documents that convey "Ed Sheeran won the 100m gold at the 2024 Olympics" but repeatedly warn that the story is false. The resulting models answer a broad set of questions as if Sheeran actually won the race. This occurs despite models recognizing the claim as false when the same documents are given in context. In experiments with Qwen3.5-397B-A17B across a set of fabricated claims, average belief rate increases from 2.5% to 88.6% when finetuning on negated documents, compared to 92.4% on documents without negations. Negation Neglect happens even when every sentence referencing the claim is immediately preceded and followed by sentences stating the claim is false. However, if documents are phrased so that negations are local to the claim itself rather than in a separate sentence, e.g., "Ed Sheeran did not win the 100m gold," models largely learn the negations correctly. Negation Neglect occurs in all models tested, including Kimi K2.5, GPT-4.1, and Qwen3.5-35B-A3B. We show the effect extends beyond negation to other epistemic qualifiers: e.g., claims labeled as fictional are learned as if they were true. It also extends beyond factual claims to model behaviors. Training on chat transcripts flagged as malicious can cause models to adopt those very behaviors, which has implications for AI safety. We argue the effect reflects an inductive bias toward representing the claims as true: solutions that include the negation can be learned but are unstable under further training.

2605.13826 2026-05-14 cs.LG cond-mat.mtrl-sci physics.chem-ph

Reducing cross-sample prediction churn in scientific machine learning

Gordan Prastalo, Kevin Maik Jablonka

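AI summary: Scientific machine learning usually reports only a model's predictive performance, without stating whether the same prediction stays consistent under a different sample of the training data. This paper introduces the notion of "cross-sample prediction churn": models trained on different subsets of the training data may give inconsistent predictions on the same test samples. The study finds that conventional parameter-side methods do not effectively reduce this churn, whereas data-side methods such as $K$-bootstrap bagging and the proposed twin-bootstrap method reduce it substantially without loss of accuracy, offering a more complete metric for evaluating scientific machine learning.

Details
Abstract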

Scientific machine learning reports predictive performance. It does not report whether the same prediction would survive a different draw of training data. Across $9$ chemistry benchmarks, two classifiers trained on independent bootstraps of the same training set agree on aggregate accuracy to within $1.3\text{--}4.2$ percentage points but disagree on the class label of $8.0\text{--}21.8\%$ of test molecules. We call this gap \emph{cross-sample prediction churn}. The standard parameter-side techniques (deep ensembles, MC dropout, stochastic weight averaging) do not reduce this gap; two data-side methods do. The first is $K$-bootstrap bagging, which cuts the rate $40\text{--}54\%$ on every dataset at no accuracy cost ($K{\times}$-ERM compute). The second is \emph{twin-bootstrap}, our proposal: two networks trained jointly on independent bootstraps with a sym-KL consistency loss between their predictions, which at matched $2{\times}$-ERM compute reduces churn a further median $45\%$ beyond bagging-$K{=}2$. Cross-sample prediction churn deserves a column alongside predictive performance in scientific-ML benchmark reports, because without it the parameter-side and data-side methods are indistinguishable on the metric they actually differ on.
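
The gap the abstract describes, similar aggregate accuracy but different per-sample labels, is easy to reproduce with a small sketch. Here churn is taken to be the fraction of test items on which two models' predicted labels differ, which follows the abstract's wording; the function names and toy predictions are illustrative assumptions, not the authors' benchmark code.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def churn_rate(preds_a, preds_b):
    """Fraction of test items on which two models disagree."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and equal length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


# Toy test set: both models make two errors, but on different items.
labels  = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
preds_a = [0, 1, 1, 0, 1, 0, 1, 0, 1, 0]   # wrong on items 0 and 1
preds_b = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]   # wrong on items 2 and 3

acc_a = accuracy(preds_a, labels)
acc_b = accuracy(preds_b, labels)
churn = churn_rate(preds_a, preds_b)
```

Both models score 80% accuracy, yet they disagree on 40% of the items, which is why accuracy alone cannot distinguish methods that differ mainly in churn.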

2605.13825 2026-05-14 cs.AI cs.CV

History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions

Alberto G. Rodríguez Salgado

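AI summary: This study examines whether large language models continue to take unsafe actions when faced with a record of prior harmful behavior. It builds a test set called HistoryAnchor-100, containing 100 high-stakes scenarios, to evaluate models' decision tendencies under different prior histories. Experiments find that when the prompt includes an instruction to "stay consistent with the strategy shown in the prior history," many well-aligned models become far more likely to choose unsafe options and sometimes escalate, revealing a safety risk: model decisions can be strongly influenced by prior behavior.

Details
Comments
12 pages, 3 figures
Abstract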

Frontier LLMs are increasingly deployed as agents that pick the next action after a long log of prior tool calls produced by the same or a different model. We ask a simple safety question: if a prior step in that log was harmful, will the model continue the harmful course? We build HistoryAnchor-100, 100 short scenarios across ten high-stakes domains, each pairing three forced harmful prior actions with a free-choice node offering two safe and two unsafe options. Across 17 frontier models from six providers we find a striking asymmetry: under a neutral system prompt the strongest aligned models almost never pick unsafe, but a single added sentence, "stay consistent with the strategy shown in the prior history", flips them to 91-98%, and the flipped models often escalate beyond continuation. Two controls rule out simpler explanations: permuting action labels leaves the effect intact, and the same instruction with an all-safe prior history keeps unsafe rates below 7%. Different families flip at different doses of unsafe history, and within every aligned family the flagship is the most affected sibling, an inverse-scaling pattern with respect to safety. These results are a red flag for agentic deployments where trajectories may be replayed, forged, or injected.

2605.13822 2026-05-14 cs.RO cs.SY eess.SY

Loiter UAV Reinsertion Guidance for Fixed-wing UAV Corridors

Pradeep J, Kedarisetty Siddhardha, Ashwini Ratnoo

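AI summary: This paper studies the reinsertion of loitering UAVs into the main lane of a fixed-wing UAV corridor, which comprises a main lane, a circular loiter lane for relieving traffic congestion, and transit lanes connecting the two. To reinsert loiter UAVs into the main lane safely and without conflict, it proposes a guidance algorithm based on virtual slots and speed constraints. Numerical simulations validate the method, which offers a feasible automation strategy for UAV traffic management.

Details
Journal ref
AIAA SCITECH 2026
Abstract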

This paper considers fixed-wing unmanned aerial vehicle (UAV) corridors comprising a main lane, a circular loiter lane for managing traffic congestion, and transit lanes connecting the two. In particular, we address the problem of conflict-free reinsertion of UAVs from the loiter lane back into the main lane. The loiter lane contains a fixed number of equidistant virtual slots that UAVs can occupy. Reinsertion of loiter UAVs into the main lane becomes essential either due to reduced traffic in the main lane or due to a loiter UAV needing to reach its destination urgently. Given the total number of loiter slots, UAV speed limits, and the minimum safety distance, a guidance algorithm is developed to compute the required speed of a loiter UAV in the transit lane to ensure safe reinsertion. The proposed guidance and automation strategies are validated through numerical simulations.

2605.13821 2026-05-14 cs.AI cs.LG

Harnessing Agentic Evolution

Jiayi Zhang, Yongfeng Gu, Jianhao Ruan, Maojia Song, Yiran Peng, Zhiguang Han, Jinyu Xiang, Zhitao Wang, Caiyin Yang, Yixi Ouyang, Bang Liu, Chenglin Wu, Yuyu Luo

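AI summary: This paper studies how an interactive environment can improve the stability and efficiency of agentic evolution and proposes a meta-editing framework called AEvo. By treating the accumulated evolution context as a process-level state, the framework lets a meta-agent edit the procedure or agent context that controls future evolution, thereby steering both procedure-based and agent-based evolution through one interface. Experiments show that AEvo outperforms five existing evolution methods on several benchmark tasks, with substantial performance gains.

Details
Abstract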

Agentic evolution has emerged as a powerful paradigm for improving programs, workflows, and scientific solutions by iteratively generating candidates, evaluating them, and using feedback to guide future search. However, existing methods are typically instantiated either as fixed hand-designed procedures that are modular but rigid, or as general-purpose agents that flexibly integrate feedback but can drift in long-horizon evolution. Both forms accumulate rich evidence over time, including candidates, feedback, traces, and failures, yet lack a stable interface for organizing this evidence and revising the mechanism that drives future evolution. We address this limitation by formulating agentic evolution as an interactive environment, where the accumulated evolution context serves as a process-level state. We introduce AEvo, a harnessed meta-editing framework in which a meta-agent observes this state and acts not by directly proposing the next candidate, but by editing the procedure or agent context that controls future evolution. This unified interface enables AEvo to steer both procedure-based and agent-based evolution, making accumulated evidence actionable for long-horizon search. Empirical evaluations on agentic and reasoning benchmarks show that AEvo outperforms five evolution baselines, achieving a 26% relative improvement over the strongest baseline. Across three open-ended optimization tasks, AEvo further outperforms four evolution baselines and achieves state-of-the-art performance under the same iteration budget.

2605.13817 2026-05-14 cs.SE cs.AI

Neurosymbolic Auditing of Natural-Language Software Requirements

Bethel Hall, William Eiers

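AI summary: This study targets ambiguity, inconsistency, and underspecification in software requirements written in natural language and proposes an auditing method that combines neural networks with symbolic reasoning. By translating natural-language requirements into formal logic and verifying them with an SMT solver, the method can detect ambiguity, contradictions, and safety violations. The study builds a neurosymbolic framework called VERIMED and applies it to verifying medical-device software requirements; experiments show that it effectively reduces ambiguous requirements and substantially improves verification accuracy.

Details
Comments
10
Abstract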

Natural-language software requirements are often ambiguous, inconsistent, and underspecified; in safety-critical domains, these defects propagate into formal models that verify the wrong specification and into implementations that ship unsafe behavior. We show that large language models, equipped with an SMT solver, can audit such requirements: translating them into formal logic, detecting ambiguity through stochastic variation in the generated formalization, and exposing inconsistency, vacuousness, and safety violations through solver queries on the resulting specification. We present VERIMED, a neurosymbolic pipeline that operationalizes this idea for medical-device software requirements, and report two findings. First, stochastic variation across independent formalizations is a signal of ambiguity: requirements that admit multiple plausible interpretations produce SMT-inequivalent formalizations, and bidirectional SMT equivalence checking turns this disagreement into a solver-checkable test. Second, the usefulness of symbolic feedback depends on its granularity: in counterexample-guided repair on a hemodialysis question-answering benchmark, concrete SMT counterexamples raise verified accuracy from 55.4% to 98.5%. Over an extensive experimental evaluation on open-source hemodialysis safety requirements, we show that the LLM-based approach in VERIMED successfully reduces ambiguity-sensitive requirements and enables rigorous auditing of software requirements through SMT-based queries.

2605.13816 2026-05-14 cs.LG

Uncertainty-Driven Anomaly Detection for Psychotic Relapse Using Smartwatches: Forecasting and Multi-Task Learning Fusion

Nikolaos Tsalkitzis, Panagiotis P. Filntisis, Petros Maragos, Niki Efthymiou

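AI summary: This paper studies how smartwatch data and uncertainty-driven anomaly detection can reveal early signs of psychotic relapse. It proposes two smartwatch-based frameworks: one forecasts heart-rate dynamics and detects anomalies from deviations between predicted and observed values; the other fuses sleep, motion, and heart-rate signals, learns time-aware embeddings, and predicts measurement timing. Both use Transformer encoders and estimate predictive uncertainty with an ensemble of multilayer perceptrons to improve robustness. Fusing the anomaly signals from the two models markedly improves detection performance.

Details
Abstract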

Digital phenotyping enables continuous passive monitoring of behavior and physiology, offering a promising paradigm for early detection of psychotic relapse. In this work, we develop and systematically study two smartwatch-based frameworks for daily relapse detection. The first forecasts cardiac dynamics and flags deviations between predicted and observed features as indicators of abnormality. The second adopts a multi-task formulation that fuses sleep with motion and cardiac-derived signals, learning time-aware embeddings and predicting measurement timing. Both pipelines use Transformer encoders and output a daily anomaly score, derived from predictive uncertainty estimated via an ensemble of multilayer perceptrons to improve robustness to real-world wearable variability. While each framework independently demonstrates strong predictive power, we show that they capture complementary physiological signatures. Consequently, we propose a late-fusion strategy that synergistically combines the anomaly signals from both architectures into a unified decision score. We benchmark our methodology on the 2nd e-Prevention Grand Challenge dataset, where our fused model achieves an 8% relative improvement over the competition-winning baseline. Our results, supported by extensive ablation studies, suggest that the integration of diverse digital phenotypes (cardiac, motion, and sleep) is essential for the high-fidelity detection of psychotic relapse in real-world settings.

2605.13815 2026-05-14 cs.CV cs.RO

OmniLiDAR: A Unified Diffusion Framework for Multi-Domain 3D LiDAR Generation

Youquan Liu, Weidong Yang, Ao Liang, Xiang Xu, Lingdong Kong, Yang Wu, Dekai Zhu, Xin Li, Runnan Chen, Ben Fei, Tongliang Liu, Wanli Ouyang

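AI summary: OmniLiDAR is a unified text-conditioned diffusion framework for multi-domain LiDAR point-cloud generation, covering eight domains that include adverse weather, sensor-configuration changes, and cross-platform acquisition. By introducing a cross-domain training strategy and feature-modeling techniques, the method generates heterogeneous data with a single model and improves the controllability and generalization of the generated results. Experiments show that OmniLiDAR performs well in generation quality and in downstream tasks such as semantic segmentation and object detection, with especially clear advantages when data are scarce.

Details
Comments
Preprint; 12 pages, 7 figures, 10 tables
Abstract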

LiDAR scene generation is increasingly important for scalable simulation and synthetic data creation, especially under diverse sensing conditions that are costly to capture at scale. Typically, diffusion-based LiDAR generators are developed under single-domain settings, requiring separate models for different datasets or sensing conditions and hindering unified, controllable synthesis under heterogeneous distribution shifts. To this end, we present OmniLiDAR, a unified text-conditioned diffusion framework that generates LiDAR scans in a shared range-image representation across eight representative domains spanning three shift types: adverse weather, sensor-configuration changes (e.g., reduced beams), and cross-platform acquisition (vehicle, drone, and quadruped). To enable training a single model over heterogeneous domains without isolating optimization by domain, we introduce a Cross-Domain Training Strategy (CDTS) that mixes domains within each mini-batch and leverages conditioning to steer generation. We further propose Cross-Domain Feature Modeling (CDFM), which captures directional dependencies along azimuth and elevation axes to reflect the anisotropic scanning structure of range images, and Domain-Adaptive Feature Scaling (DAFS) as a lightweight modulation to account for structured domain-dependent feature shifts during denoising. In the absence of a public consolidated benchmark, we construct an 8-domain dataset by combining real-world scans with physically based weather simulation and systematic beam reduction while following official splits. Extensive experiments demonstrate strong generation fidelity and consistent gains in downstream use cases, including generative data augmentation for LiDAR semantic segmentation and 3D object detection, as well as robustness evaluation under corruptions, with consistent benefits in limited-label regimes.

2605.13814 2026-05-14 cs.CE

Emergency Vehicle Preemption Strategies using Machine Learning to Optimize Traffic Operations

Somdut Roy, Michael Hunter, Abhilasha Saroj, Angshuman Guin

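AI summary: This paper studies how machine learning can optimize emergency vehicle preemption so as to preserve emergency-vehicle travel efficiency while reducing delay for other vehicles. It proposes MLEVP, a machine learning method based on real-time sensor data that predicts and triggers preemption at multiple downstream intersections, proactively clearing traffic queues and reducing emergency-vehicle response time. Results show that the method keeps emergency-vehicle travel time near optimal while effectively reducing disruption to conflicting traffic movements.

Details
Abstract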

Emergency response vehicles (ERVs), such as fire trucks, operate to save lives and mitigate property damage. Emergency vehicle preemption (EVP) is typically implemented to provide the right-of-way to ERVs by giving green signals as they approach signalized intersections along their routes. EVP operations are usually optimized to minimize ERV delay. This study seeks to reduce delay experienced by other vehicles in the network while keeping ERV travel time near its optimum. A machine learning-based EVP strategy, termed MLEVP, is developed to determine EVP trigger times at multiple downstream intersections using real-time sensor data, including vehicle detections, signal indications, and ERV location. MLEVP proactively clears downstream traffic queues to reduce ERV response time while limiting delay on conflicting traffic movements. In the case study, MLEVP is developed using a calibrated microscopic simulation of a signalized corridor testbed in PTV Vissim. The EVP problem is formulated as a regression problem and solved using machine learning models trained on data generated from the simulation. Results demonstrate that the proposed algorithm can produce near-optimal ERV travel times while minimizing impacts on conflicting traffic.

2605.13813 2026-05-14 cs.CV

JANUS: Anatomy-Conditioned Gating for Robust CT Triage Under Distribution Shift

Lavsen Dahal, Yubraj Bhandari, Geoffrey Rubin, Joseph Y. Lo

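AI summary: This paper proposes JANUS, a physiology-guided dual-stream architecture for robust CT triage under distribution shift. Through an anatomically guided gating mechanism, the method conditions visual embeddings on macro-radiomic priors, improving generalization and reliability across institutions. Experiments show that JANUS outperforms existing methods on the MERLIN dataset and also performs well on an external dataset, with particularly large gains on findings defined by size and attenuation.

Details
Abstract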

Automated CT triage requires models that are simultaneously accurate across diverse pathologies and reliable under institutional shift. While Vision Transformers provide strong visual representations, many clinically significant findings are defined by quantitative imaging biomarkers rather than appearance alone. We introduce JANUS, a physiology-guided dual-stream architecture that conditions visual embeddings on macro-radiomic priors via Anatomically Guided Gating. On the MERLIN test set (N=5082), JANUS attains macro-AUROC 0.88 and AUPRC 0.74, outperforming all reproduced baselines. It generalizes to an external dataset (N=2000; AUROC 0.87), with the largest gains on findings defined by size and attenuation as well as improved calibration on both datasets. We further quantify prediction suppression using the Physiological Veto Rate (PVR), showing that under domain shift JANUS reduces high-confidence false positives substantially more often than true positives. Together, these results are consistent with physically grounded conditioning that improves both discrimination and reliability in CT triage. Code is publicly available on GitHub at https://github.com/lavsendahal/janus and model weights are at https://huggingface.co/lavsendahal/janus.

2605.13810 2026-05-14 cs.LG cs.DS

Provable Quantization with Randomized Hadamard Transform

Ying Feng, Piotr Indyk, Michael Kapralov, Dmitry Krachun, Boris Prokhorov

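AI summary: This paper studies a provable quantization method based on the randomized Hadamard transform, aiming to reduce the time complexity of conventional random-projection quantization. By introducing a random scalar offset, the method keeps the quantizer unbiased while providing mean squared error bounds comparable to those of fully random rotation matrices. The authors prove that, at $b$ bits per coordinate, the method achieves near theoretically optimal quantization accuracy, making it suitable for compression and optimization tasks in large-scale machine learning.

Details
English abstract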

Vector quantization via random projection followed by scalar quantization is a fundamental primitive in machine learning, with applications ranging from similarity search to federated learning and KV cache compression. While dense random rotations yield clean theoretical guarantees, they require $\Theta(d^2)$ time. The randomized Hadamard transform $HD$ reduces this cost to $O(d \log d)$, but its discrete structure complicates analysis and leads to weaker or purely empirical compression guarantees. In this work, we study a variant of this approach: dithered quantization with a single randomized Hadamard transform. Specifically, the quantizer applies $HD$ to the input vector and subtracts a random scalar offset before quantizing, injecting additional randomness at negligible cost. We prove that this approach is unbiased and provides mean squared error bounds that asymptotically match those achievable with truly random rotation matrices. In particular, we prove that a dithered version of TurboQuant achieves mean squared error $\bigl(\pi\sqrt{3}/2 + o(1)\bigr) \cdot 4^{-b}$ at $b$ bits per coordinate, where the $o(1)$ term vanishes uniformly over all unit vectors and all dimensions as the number of quantization levels grows.
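
The "apply $HD$, subtract a random scalar offset, then quantize" step can be illustrated with a minimal Python sketch. This is not the authors' implementation and not TurboQuant: it assumes a plain uniform grid with spacing `step` (unbounded, rather than a $b$-bit codebook), an input length that is a power of two, and the illustrative names `fwht`, `dithered_rht_quantize`, and `dequantize`.

```python
import math
import random


def fwht(vec):
    """Fast Walsh-Hadamard transform in O(d log d), scaled to be orthonormal."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def dithered_rht_quantize(x, step, rng):
    """Rotate x by HD, subtract one shared random offset, round to a uniform grid.

    Assumption: a single scalar offset drawn uniformly from [-step/2, step/2].
    """
    signs = [rng.choice((-1.0, 1.0)) for _ in x]      # diagonal matrix D
    rotated = fwht([s * v for s, v in zip(signs, x)])  # H applied to D x
    offset = rng.uniform(-step / 2, step / 2)          # scalar dither
    codes = [round((v - offset) / step) for v in rotated]
    return signs, offset, codes


def dequantize(signs, offset, codes, step):
    """Add the offset back and undo the rotation (H and D are their own inverses)."""
    rotated_hat = [c * step + offset for c in codes]
    return [s * v for s, v in zip(signs, fwht(rotated_hat))]
```

Because the scaled Hadamard matrix is orthonormal, the reconstruction error of this sketch has the same Euclidean norm as the rounding error in the rotated domain, which is at most `step / 2` per coordinate.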

2605.13807 2026-05-14 cond-mat.str-el cond-mat.dis-nn cs.LG physics.comp-ph quant-ph

Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo

Ejaaz Merali, Mohamed Hibat-Allah, Mohammad Kohandel, Richard T. Scalettar, Ehsan Khatami

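AI summary: This paper proposes parallel scan recurrent neural quantum states (PSR-NQS) to address the poor scalability of conventional recurrent neural networks in simulating quantum many-body systems. By combining autoregressive recurrent wave functions with parallelizable recurrence, the method can be trained efficiently with variational Monte Carlo in one and two spatial dimensions, and it obtains accurate results on large two-dimensional spin lattices that agree with quantum Monte Carlo data. The study demonstrates that recurrent architectures are a practical and promising route to scalable quantum state simulation at modest computational cost.

Details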
Comments
13 pages, 2 figures, 6 tables
English abstract

Neural-network quantum states have emerged as a powerful variational framework for quantum many-body systems, with recent progress often driven by massively parallel architectures such as transformers. Recurrent neural network quantum states, however, are frequently regarded as intrinsically sequential and therefore less scalable. Here we revisit this view by showing that modern recurrent architectures can support fast, accurate, and computationally accessible neural quantum state simulations. Using autoregressive recurrent wave functions together with recent advances in parallelizable recurrence, we develop variational ansätze, called parallel scan recurrent neural quantum states (PSR-NQS), which can be trained efficiently within variational Monte Carlo in one and two spatial dimensions. We demonstrate accurate benchmark results and show that, with iterative retraining, our approach reaches two-dimensional spin lattices as large as $52\times52$ while remaining in agreement with available quantum Monte Carlo data. Our results establish recurrent architectures as a practical and promising route toward scalable neural quantum state simulations with modest computational resources.

2605.13806 2026-05-14 cs.DS cs.CC cs.GT cs.LG math.OC

Min-Max Optimization Requires Exponentially Many Queries

Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Alexandros Hollender

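AI summary: This paper studies the query complexity of min-max optimization of nonconvex-nonconcave functions over the unit hypercube, and proves that any algorithm that finds an ε-approximate stationary point must make a number of queries exponential in 1/ε or the dimension d. The result exposes the inherent computational hardness of such optimization problems and gives a theoretical limit for algorithm design.

Details
English abstract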

We study the query complexity of min-max optimization of a nonconvex-nonconcave function $f$ over $[0,1]^d \times [0,1]^d$. We show that, given oracle access to $f$ and to its gradient $\nabla f$, any algorithm that finds an $\varepsilon$-approximate stationary point must make a number of queries that is exponential in $1/\varepsilon$ or $d$.

2605.13803 2026-05-14 cs.CV

EvoGround: Self-Evolving Video Agents for Video Temporal Grounding

Minjoon Jung, Byoung-Tak Zhang, Lorenzo Torresani

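AI summary: This paper proposes EvoGround, a framework of self-evolving video agents for video temporal grounding (VTG), the task of localizing the temporal segment of an untrimmed video that best matches a natural-language query. The method needs no human-annotated data: two cooperating agents, a proposer and a solver, learn temporal grounding automatically from raw videos. Experiments show that EvoGround performs strongly on multiple benchmarks, matching or even surpassing fully supervised models, and is a state-of-the-art method for fine-grained video captioning without manual labels.

Details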
Comments
Project page: https://minjoong507.github.io/projects/EvoGround/
English abstract

Video temporal grounding (VTG) takes an untrimmed video and a natural-language query as input and localizes the temporal moment that best matches the query. Existing methods rely on large, task-specific datasets requiring costly manual annotation. We introduce EvoGround, a framework of two coupled self-evolving agents, a proposer and a solver, that learn temporal grounding from raw videos without any human-labeled data. The proposer generates query--moment pairs from raw videos, while the solver learns to ground them and feeds back signals that improve the proposer in return. Through this self-reinforcing reinforcement-learning loop, the two agents are initialized from the same backbone and mutually improve across iterations. Trained on 2.5K unlabeled videos, EvoGround matches or surpasses fully supervised models across multiple VTG benchmarks, while emerging as a state-of-the-art fine-grained video captioner without manual labels.

2605.13801 2026-05-14 cs.LG cs.AI

Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling

Deepak Pandita, Flip Korn, Chris Welty, Christopher M. Homan

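AI summary: As generative AI models such as large language models become widespread, ensuring their safety, robustness, and trustworthiness is increasingly important. However, the AI field currently faces a reproducibility crisis caused by unreliable evaluations and experimental results that are hard to reproduce. This paper proposes a multi-level bootstrapping method that uses datasets with many ratings and persistent annotator identifiers to analyze the tradeoff between the number of items and the number of responses per item needed to reach statistical significance, thereby modeling annotator behavior more realistically and improving the reproducibility of evaluation.

Details
English abstract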

As generative AI models such as large language models (LLMs) become more pervasive, ensuring the safety, robustness, and overall trustworthiness of these systems is paramount. However, AI is currently facing a reproducibility crisis driven by unreliable evaluations and unrepeatable experimental results. While human raters are often used to assess models for utility and safety, they introduce divergent biases and subjective opinions into their annotations. Overcoming this variance is exceptionally challenging because very little data exists to study how experimental repeatability actually improves as the annotator pool grows. Standard evaluation practices typically rely on a small number of annotations per item (often 3 to 5) and lack the persistent rater identifiers necessary to model individual variance across items. In this work, we introduce a multi-level bootstrapping approach to realistically model annotator behavior. Leveraging datasets with a large number of ratings and persistent rater identifiers, we analyze the tradeoffs between the number of items ($N$) and the number of responses per item ($K$) required to achieve statistical significance.
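
The tradeoff between the number of items ($N$) and responses per item ($K$) can be explored with a resampling loop. The sketch below is a simplified two-level bootstrap written for illustration only: it resamples items and then responses within each item, and it does not model persistent rater identities as the paper's approach does. The function name, the `ratings` layout, and the use of a mean score as the statistic are all assumptions.

```python
import random


def two_level_bootstrap_means(ratings, n_items, k_responses, n_trials, rng):
    """Return one bootstrap mean score per trial.

    ratings: dict mapping an item id to the list of scores it received.
    Level 1 resamples items with replacement; level 2 resamples
    k_responses scores within each drawn item, also with replacement.
    """
    item_ids = list(ratings)
    means = []
    for _ in range(n_trials):
        total = 0.0
        for _ in range(n_items):
            item = rng.choice(item_ids)                          # level 1: items
            sample = rng.choices(ratings[item], k=k_responses)   # level 2: responses
            total += sum(sample) / k_responses
        means.append(total / n_items)
    return means
```

Running this for several `(n_items, k_responses)` pairs and comparing the spread of the returned means gives a rough picture of how each budget allocation affects the stability of an evaluation score.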

2605.13800 2026-05-14 cs.DS

Low-Cost Arborescence Under Edge Faults

Dipan Dey, Telikepalli Kavitha

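AI summary: This paper studies how to efficiently maintain a min-cost arborescence in a directed graph under edge faults. The authors propose a preprocessing method that constructs a sparse subgraph $H$ such that, when any single edge fails, recomputing a min-cost arborescence within $H$ alone yields a near-optimal arborescence in the original graph, with cost at most twice the optimum. They also study the fault-tolerant problem in the matroid setting and give a tight bound, depending on the number of faults and the matroid rank, on the size of a sparse fault-tolerant preserver.

Details
English abstract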

Our input is a directed graph $G = (V,E)$ on $n$ vertices and $m$ edges with a designated root vertex $r$ and a function $cost: E \rightarrow \mathbb{R}_{\geq 0}$. The problem is to maintain a min-cost arborescence in $G$ in the presence of edge faults (a single fault at a time). Edge faults are transient and once the faulty edge is repaired, the original min-cost arborescence $\mathcal{T}$ is restored. Whenever an edge fault happens, we need to update $\mathcal{T}$ to a min-cost arborescence in $G-f$, where $f$ is the faulty edge. Since computing a min-cost arborescence in $G - f$ takes $O(m + n\log n)$ time, we seek to construct a sparse subgraph $H$ in a preprocessing step such that in the event of any edge $f$ failing, it suffices to compute a min-cost arborescence in $H - f$ in order to find a low-cost arborescence in $G - f$. In the unweighted setting, this is the fault-tolerant subgraph problem for single-source reachability. Baswana, Choudhary, and Roditty (SICOMP, 2018) showed a $k$-fault tolerant reachability subgraph of size $O(2^kn)$, where $k$ is the number of edge faults. We show a simple polynomial-time algorithm to construct a subgraph $H$ of size $O(n^{3/2})$ such that, for any $f \in E$, a min-cost arborescence in $H-f$ is a 2-approximation of a min-cost arborescence in $G-f$. Thus, whenever an edge fault happens, we can find a 2-approximate min-cost arborescence in $G-f$ in $O(n^{3/2})$ time. Our second problem is in the matroid setting. The input is a matroid $M = (E, \mathcal{I})$ with a function $cost: E \rightarrow \mathbb{R}$. The problem is to compute a sparse $S \subseteq E$ (called a $k$-fault tolerant preserver) such that for any $F \subseteq E$ with $|F| \le k$, the matroid $M|(S\setminus F)$ contains a min-cost basis of $M|(E\setminus F)$. We show a tight bound of $k \cdot \mathrm{rank}(E)$ on the size of a $k$-fault tolerant preserver.

2605.13798 2026-05-14 cs.CV

VoxCor: Training-Free Volumetric Features for Multimodal Voxel Correspondence

Guney Tombak, Ertunc Erdil, Ender Konukoglu

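AI summary: In multimodal medical image analysis, cross-modal voxel-level representations must remain anatomically consistent across imaging modalities, scanners, and acquisition protocols. This paper proposes VoxCor, a training-free voxel feature extraction method that produces reusable 3D volumetric feature representations from frozen 2D Vision Transformer models. By combining triplanar ViT inference with a weighted partial least squares projection, the method learns modality-stable anatomical directions in an offline phase, so that new volumes can be mapped directly at transform time without fine-tuning or registration, with support for efficient voxel correspondence queries. Experiments show that VoxCor achieves strong registration performance and feature transfer in cross-subject, cross-modality tasks, providing a reusable feature layer for multimodal medical image analysis.

Details
English abstract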

Cross-modal 3D medical image analysis requires voxelwise representations that remain anatomically consistent across imaging contrasts, scanners, and acquisition protocols. Recent work has shown that frozen 2D Vision Transformer (ViT) foundation models can support such representations, but typical pipelines extract features along a single anatomical axis and adapt those features inside a registration solver for one image pair at a time, leaving complementary viewing directions unused and producing representations that do not transfer to new volumes. We introduce VoxCor, a training-free fit--transform method for reusable volumetric feature representations from frozen 2D ViT foundation models. During an offline fitting phase, VoxCor combines triplanar ViT inference with a compact closed-form weighted partial least squares (WPLS) projection that uses fitting-time voxel correspondences to select modality-stable anatomical directions in the triplanar feature space. At transform time, new volumes are mapped by triplanar ViT inference and linear projection alone, without fine-tuning or registration. Voxel correspondences can then be queried directly by nearest-neighbor search. We evaluate VoxCor on intra-subject Abdomen MR--CT and inter-subject HCP T2w--T1w tasks using deformable registration, voxelwise k-nearest-neighbor segmentation, and segmentation-center landmark localization. VoxCor improves the hardest cross-subject, cross-modality transfer settings, reduces encoder sensitivity for dense correspondence transfer, and yields registration performance competitive with handcrafted descriptors and learned 3D features. This positions VoxCor as a reusable feature layer for downstream multimodal analysis beyond pairwise registration. Code, configuration files, and implementation details are publicly available on GitHub at https://github.com/guneytombak/VoxCor.

2605.13796 2026-05-14 quant-ph cs.CR

Backdoor Threats in Variational Quantum Circuits: Taxonomy, Attacks, and Defenses

Lei Jiang, Fan Chen

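AI summary: This paper systematically surveys backdoor attacks on variational quantum circuits, analyzes attack mechanisms at the data-poisoning, compiler, and quantum-native levels, and summarizes the limitations of existing detection and defense methods. It clarifies the relevant terminology and threat models, highlights the distinctive challenges that backdoor attacks pose in quantum computing, and points to directions for building robust defenses for hybrid quantum-classical systems.

Details
English abstract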

Variational quantum algorithms (VQAs) are a central paradigm for noisy intermediate-scale (NISQ) quantum computing, yet their reliance on predesigned and pretrained variational quantum circuits (VQCs) introduces critical security vulnerabilities, particularly backdoor attacks. These attacks embed hidden malicious behaviors that remain dormant under normal conditions but are activated by specific triggers, leading to adversarial outcomes such as incorrect predictions or manipulated objective values. This paper presents a survey of backdoor attacks in VQCs, covering data-poisoning, compiler-level, and quantum-native mechanisms. We formalize key terminology and threat models, and review existing attack strategies along with their empirical characteristics. We also analyze current detection and defense approaches, highlighting their limitations, especially against quantum-specific threats. By synthesizing recent advances, this survey outlines the evolving security landscape of VQCs and identifies key challenges and future directions for developing robust, quantum-aware defenses in hybrid quantum-classical systems.

2605.13794 2026-05-14 cs.GR cs.CV

BlitzGS: City-Scale Gaussian Splatting at Lightning Speed

Zhongtao Wang, Huishan Au, Yilong Li, Mai Su, Haojie Jin, Yisong Chen, Meng Gai, Fei Zhu, Guoping Wang

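AI summary: This paper proposes BlitzGS, a distributed 3D Gaussian splatting framework for fast reconstruction of city-scale scenes. By optimizing how Gaussians are processed at three coupled levels (system, model, and view), the method substantially reduces computational load and improves rendering efficiency. Experiments show that BlitzGS achieves an order-of-magnitude speedup over existing methods while preserving rendering quality, training city-scale scenes in tens of minutes.

Details
English abstract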

We present BlitzGS, a distributed 3DGS framework that reduces active Gaussian workload for fast city-scale reconstruction. BlitzGS manages this workload at three coupled levels. At the system level, the framework shards Gaussians across GPUs by index parity rather than spatial blocks. This approach mitigates the cross-block visibility redundancy inherent in spatial partitioning. Furthermore, it distributes each rendering step through a single cross-GPU exchange that routes projected Gaussians to their tile owners. At the model level, scheduled importance-scoring passes shrink the global Gaussian population. During these passes, the framework generates a per-Gaussian visibility weight to bias density-control updates toward contributing primitives and a per-view importance mask for the view-level renderer. At the view level, BlitzGS trims each camera's active set with a distance-based LOD gate to exclude excessively fine primitives for the current frustum and the importance-based culling mask to skip Gaussians with negligible cross-view contribution. On large-scale benchmarks, BlitzGS matches the rendering quality of recent large-scale baselines while delivering an order-of-magnitude speedup, training city-scale scenes in tens of minutes. Our code is available at https://github.com/AkierRaee/BlitzGS.

2605.13790 2026-05-14 cs.LG cs.AI

Di-BiLPS: Denoising induced Bidirectional Latent-PDE-Solver under Sparse Observations

Zhonghao Li, Chaoyu Liu, Qian Zhang

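AI summary: This paper proposes Di-BiLPS, a unified neural framework for efficiently solving forward and inverse partial differential equation (PDE) problems under extremely sparse observations. The method combines a variational autoencoder, a latent diffusion module, and contrastive learning; by operating in the latent space it achieves efficient inference and flexible input-output mapping, and it introduces a PDE-informed denoising algorithm based on a variance-preserving diffusion process that further improves inference efficiency. Experiments show that Di-BiLPS performs well under extremely sparse inputs, substantially reduces computational cost, and supports zero-shot super-resolution prediction.

Details
English abstract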

Partial differential equations (PDEs) are fundamental for modeling complex natural and physical phenomena. In many real-world applications, however, observational data are extremely sparse, which severely limits the applicability of both classical numerical solvers and existing neural approaches. While neural methods have shown promising results under moderately sparse observations, their inference efficiency at high resolutions is limited, and their accuracy degrades substantially in the extremely sparse regime. In this work, we propose Di-BiLPS, a unified neural framework that effectively handles both forward and inverse PDE problems under extremely sparse observations. Di-BiLPS combines a variational autoencoder to compress high-dimensional inputs into a compact latent space, a latent diffusion module to model uncertainty, and contrastive learning to align representations. Operating entirely in this latent space, the framework achieves efficient inference while retaining flexible input-output mapping. In addition, we introduce a PDE-informed denoising algorithm based on a variance-preserving diffusion process, which further improves inference efficiency. Extensive experiments on multiple PDE benchmarks demonstrate that Di-BiLPS consistently achieves SOTA performance under extremely sparse inputs (as low as 3%), while substantially reducing computational cost. Moreover, Di-BiLPS enables zero-shot super-resolution, as it allows predictions over continuous spatial-temporal domains.

2605.13786 2026-05-14 cs.LG

Interpretable Machine Learning for Antepartum Prediction of Pregnancy-Associated Thrombotic Microangiopathy Using Routine Longitudinal Laboratory Data

Chuanchuan Sun, Zhen Yu, Qin Fan, Qingchao Chen, Feng Yu

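AI summary: This study aims to use routine laboratory data collected during pregnancy to predict the risk of pregnancy-associated thrombotic microangiopathy (P-TMA) in advance. By building machine learning models on longitudinal data, the study extracts time-dependent risk signatures from 146 laboratory predictors and achieves strong predictive performance with gradient boosting. It finds that the cystatin C level at week 6 of pregnancy has potential as an early monitoring indicator for P-TMA, offering clinicians an interpretable prediction tool.

Details
English abstract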

Background: Pregnancy-associated thrombotic microangiopathy (P-TMA) is rare but life-threatening. Early risk prediction before overt clinical presentation remains challenging, as the associated laboratory abnormalities are subtle, multidimensional, and frequently masked by common physiological changes such as gestational thrombocytopenia and pregnancy-related proteinuria, thus overlapping heavily with benign obstetric and renal conditions. This complexity is poorly captured by univariate or rule-based approaches; however, it is addressable by machine learning, which can extract latent, time-dependent risk signatures from longitudinal clinical tests. Methods: This retrospective study included 300 pregnancies comprising 142 P-TMA cases and 158 controls. After exclusion of identifiers and non-informative variables, 146 longitudinal laboratory predictors were retained. Participants were divided into a training cohort (80%) and a held-out test cohort (20%) using stratified sampling. Five algorithms were evaluated: logistic regression, support vector machine with radial basis function kernel, random forest, extra trees, and gradient boosting. The final model was selected by mean cross-validated AUROC, refitted on the full training cohort, and evaluated once in the held-out test cohort. Interpretability analyses examined global feature importance and distributional patterns of leading predictors. Results: Gradient boosting was selected by cross-validation in the training cohort. The model achieved an AUROC of 0.872 (95% CI: 0.769-0.952) and an AUPRC of 0.883 (95% CI: 0.780-0.959) in the held-out test cohort, with sensitivity of 0.750 and specificity of 0.812. Conclusions: Longitudinal clinical laboratory tests obtained during routine care contained informative and clinically plausible signals for P-TMA risk. Notably, cystatin C at week 6 showed promise as an early monitoring indicator.

2605.13785 2026-05-14 cs.CY cs.AI

Amplification to Synthesis: A Comparative Analysis of Cognitive Operations Before and After Generative AI

Liz Cho, Dongwook Yoon

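AI summary: This paper compares behavioral and linguistic coordination patterns of cognitive operations in Twitter datasets from the 2016 and 2024 U.S. presidential elections, revealing fundamental changes that generative AI may have brought to how cognitive operations are conducted. The 2024 data differ markedly: the share of original content rose sharply, lexical overlap fell, and temporal coordination changed, characteristics highly consistent with generative AI's capacity for active content generation and narrative targeting. The study provides an empirical basis for future work on the role of generative AI in cognitive operations and a reference for security practitioners building detection frameworks against generative AI threats.

Details
English abstract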

Cognitive operations are a rising concern in the geopolitical sphere, a quiet yet rigorous fight for public perception and decision making. While such operations have been extensively studied in the context of bot-driven amplification, the emergence of generative AI introduces a new set of capabilities that may have fundamentally altered how these operations are designed and executed. The possible evolution of cognitive operations via generative AI leaves nation states vulnerable without proper mitigation strategies. To address this, we compared behavioral and linguistic coordination patterns in X (formerly Twitter) datasets from the 2016 and 2024 U.S. presidential elections. Utilizing a combined corpus of over 133,000 posts, we applied post-type distribution, semantic clustering, temporal synchrony analysis, and Jaccard-based lexical overlap measures. Findings suggest that the 2024 corpus exhibits a distinct pattern from 2016. Original content rose from 59% to 93%, with retweets virtually disappearing; lexical overlap collapsed from a mean Jaccard score of 0.99 to 0.27, with posts converging on the same subject matter expressed in markedly different words; and temporal coordination shifted from pervasive cross-semantic synchrony to narratively concentrated co-occurrence. Taken together, these patterns point toward an operational logic organized around active content generation and narrative-specific targeting, characteristics consistent with generative AI involvement. These findings offer an empirical baseline for future research investigating generative AI's role in the cognitive operation pipeline, and a practical reference point for security practitioners developing detection frameworks calibrated to the post-generative AI threat environment.
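
For readers unfamiliar with the lexical overlap measure, the Jaccard score of two posts is the size of the intersection of their word sets divided by the size of the union. The sketch below shows one way to compute a mean pairwise score; the tokenization (lowercasing and whitespace splitting) and the function names are assumptions, since the abstract does not say how the authors preprocess posts or which pairs they compare.

```python
from itertools import combinations


def jaccard(post_a, post_b):
    """Jaccard similarity between the word sets of two posts."""
    set_a = set(post_a.lower().split())
    set_b = set(post_b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty posts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_jaccard(posts):
    """Average Jaccard similarity over all unordered pairs of posts."""
    pairs = list(combinations(posts, 2))
    if not pairs:
        raise ValueError("need at least two posts")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Under this measure, near-verbatim copies score close to 1, while posts that discuss the same topic in different words score much lower, which is the contrast the abstract reports between the 2016 and 2024 corpora.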

2605.13784 2026-05-14 cs.LG

Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers

Victor Norgren

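AI summary: This paper proposes an efficient streaming inference method based on stateful sessions. By maintaining a continuously updated key-value cache, it moves conventional prefill computation off the critical path, so that query latency depends only on the length of the current query and not on the size of the accumulated context. The method also introduces Flash Queries, which use idle GPU cycles between data arrivals to pre-evaluate registered questions and cache the answers, a pattern that conventional stateless engines cannot support. Experiments show up to a 5.9x speedup over mainstream inference engines on streaming market-data benchmarks.

Details
English abstract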

Conventional transformer inference engines are request-driven, paying an O(n) prefill cost on every query. In streaming workloads, where data arrives continuously and queries probe an ever-growing context, this cost is prohibitive. We introduce a data-driven computational model centred on stateful sessions: a persistent KV cache advanced incrementally as new data arrives, so prefill is moved off the critical path and query latency becomes O(|q|), independent of accumulated context size. Building on this, Flash Queries reclaim idle GPU cycles between data arrivals to pre-evaluate registered questions and return cached answers before the user asks, a pattern that is structurally impossible in stateless engines because they discard intermediate state between requests. A multi-tenant continuous-batching scheduler with cell-budget admission and prefix-aware grouped prefill lets dozens of stateful sessions coexist on a single GPU while preserving full quadratic self-attention. On streaming market-data benchmarks the reference implementation achieves up to 5.9x speedup over conventional inference engines (vLLM, SGLang, TensorRT-LLM, llama.cpp), holding query latency constant as accumulated context grows.