arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.14016 2026-05-15 cs.SE cs.SD

Case Studies and Reflections on Agentic Software Engineering for Rapid Development of Digital Music Instruments

Matthew John Yee-King

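AI summary: This paper explores the use of agentic software engineering (ASE) in developing software for digital music instruments, with the aim of lowering barriers to entry and improving software interoperability and longevity. Through three case studies, the author shows how ASE technology can be used to develop audio software in C++ with the JUCE framework: re-implementing Music Mouse as a plugin, porting the Continuator system from Python to a native plugin, and developing a new 3D sequencer interface. Drawing on a narrative account of the developer's own experience, the study identifies effective ASE practice in this domain and proposes future evaluation of the method with non-programmer musicians.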

Abstract

The article explores the use of agentic software engineering (ASE) in the development of innovative audio software. It begins with a review of background work that lays out the challenges of longevity, interoperability and barriers to entry in digital music instrument creation, explaining recent developments in ASE and highlighting the possibility that ASE can lower barriers to entry and facilitate creation of interoperable software with greater longevity. Following that, we present case studies wherein we used ASE technology in three distinct ways to develop audio software in the C++ language with the JUCE framework. In case study 1, we re-implement Laurie Spiegel's 'Music Mouse' software as a native plugin. In case study 2, we translate Pachet's 'Continuator' system from Python into a native plugin. In case study 3, we develop a new 3D user interface for an existing 'tracker' sequencer using OpenGL. We describe the experiences of the human developer in the case studies via autoethnographic discussion of the prompt logs and snapshots of the software as it was developed. We identify effective practice for ASE use in this domain and suggest future steps for the work involving evaluation of the method with non-programmer musicians.

2605.13998 2026-05-15 q-fin.CP cs.LG

Synthetic American Option Pricing via Jump-HMM-Driven Heston Implied Volatility

Julia Sun, Zheyu Jin, Jiawei Zhang, Jeffrey D. Varner

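AI summary: This study proposes a framework for generating synthetic American option prices that resolves the circular dependency created when implied volatility must be derived from real option prices. It generates multi-asset price paths with a Jump Hidden Markov Model, produces an implied-volatility surface with a modified Heston volatility model, and then prices American options with a binomial tree model. The method generates volatility smile, skew, and term structure without external calibration, and uses neural surrogate models and sector features to improve generalization and cross-asset robustness.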

Abstract

Generating realistic synthetic option prices requires implied volatility as an input, yet implied volatility is itself derived from observed option prices, creating a circular dependency that limits synthetic data for machine-learning and risk-analysis applications. We break this circularity with a pipeline in which implied volatility emerges as an output of a structural model of equity returns. A Jump Hidden Markov Model produces multi-asset price paths with realistic stylized facts and cross-asset tail dependence; a modified Heston variance process, whose mean-reversion target depends on regime state, days to expiration, moneyness, and a market-mood indicator, converts those paths into implied-volatility paths; and a recombining binomial lattice prices American options from the resulting surface. Initializing variance at its mean-reversion target for each strike-expiration pair lets smile, skew, and term structure emerge without external calibration. We calibrate the shape function through a hierarchy spanning a parametric baseline, a globally shared neural surrogate, and a sector-specific neural surrogate fit to a multi-ticker, multi-sector option ladder. A temporal holdout on a multi-day capture isolated scheduled corporate events as the dominant source of test-time generalization error, and calendar-derived earnings-distance and same-sector peer-coupling features recovered the anticipatory portion of that signal. We then apply the framework as a synthetic-data generator on real near-the-money put and call contracts, forward-simulating price paths, and recovering path-conditional implied volatility, finite-difference American Greeks, and terminal short-premium profit and loss from one coherent simulation, and confirm cross-ticker robustness by re-running on a second underlying from a different sector and volatility regime. The framework is released as an open-source Julia package.
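The final pricing stage described above, a recombining binomial lattice with early exercise, can be sketched as follows. This is a minimal illustration using the standard Cox-Ross-Rubinstein parameterization with a single constant volatility `sigma`, which is an assumption made here for brevity; in the paper the volatility comes from the model-generated surface, and the released package is written in Julia, not Python. The function name and arguments are hypothetical.

```python
import math


def american_option_crr(spot, strike, rate, sigma, maturity, steps, kind="put"):
    """Price an American option on a recombining Cox-Ross-Rubinstein lattice.

    Assumes constant volatility and no dividends (illustrative only).
    """
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))      # up-move factor
    down = 1.0 / up                           # down-move factor (recombining)
    disc = math.exp(-rate * dt)               # one-step discount factor
    p = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability
    sign = 1.0 if kind == "call" else -1.0

    def payoff(price):
        return max(sign * (price - strike), 0.0)

    # Terminal layer: node j has j up-moves and (steps - j) down-moves.
    values = [payoff(spot * up**j * down**(steps - j)) for j in range(steps + 1)]

    # Backward induction: at each node take the larger of the discounted
    # continuation value and the immediate exercise value.
    for n in range(steps - 1, -1, -1):
        values = [
            max(
                disc * (p * values[j + 1] + (1.0 - p) * values[j]),
                payoff(spot * up**j * down**(n - j)),
            )
            for j in range(n + 1)
        ]
    return values[0]
```

For example, `american_option_crr(100, 100, 0.05, 0.2, 1.0, 200, "put")` prices an at-the-money one-year put; because the lattice recombines, the work grows quadratically in `steps` rather than exponentially.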

2605.13979 2026-05-15 quant-ph cs.LG stat.ML

Winning Lottery Tickets in Neural Networks via a Quantum-Inspired Classical Algorithm

Natsuto Isogai, Hayata Yamasaki, Sho Sonoda, Mio Murao

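AI summary: This paper proposes a new classical algorithm, inspired by a quantum algorithm, for efficiently selecting sparse subnetworks from large shallow neural networks. The algorithm samples from an optimized probability distribution, avoiding the exponential time complexity of the conventional approach and achieving polynomial time complexity. Experiments show that the algorithm outperforms the conventional approach in both sampling efficiency and empirical risk, demonstrating that classical computers can efficiently perform quantum-inspired sparse subnetwork selection without quantum hardware.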

Comments 28 pages, 3 figures

Abstract

Quantum machine learning (QML) aims to accelerate machine learning tasks by exploiting quantum computation. Previous work studied a QML algorithm for selecting sparse subnetworks from large shallow neural networks. Instead of directly solving an optimization problem over a large-scale network, this algorithm constructs a sparse subnetwork by sampling hidden nodes from an optimized probability distribution defined using the ridgelet transform. The quantum algorithm performs this sampling in time $O(D)$ in the data dimension $D$, whereas a naive classical implementation relies on handling exponentially many candidate nodes and hence takes $\exp[O(D)]$ time. In this work, we construct and analyze a quantum-inspired fully classical algorithm for the same sampling task. We show that our algorithm runs in time $O(\operatorname{poly}(D))$, thereby removing the exponential dependence on $D$ from the previous classical approach. Numerical simulations show that the proposed sampler achieves empirical risk comparable to exact sampling from the optimized distribution and substantially lower than sampling from the non-optimized uniform distribution, while also exhibiting exponentially improved runtime scaling compared with the conventional classical implementation. These successful dequantization results show that sparse subnetwork selection via optimized sampling can be achieved classically with polynomial data-dimension scaling on conventional computers without quantum hardware, providing an alternative to the existing quantum algorithm.

2605.13940 2026-05-15 cs.CR cs.AI

AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills

Haomin Zhuang, Hanwen Xing, Yujun Zhou, Yuchen Ma, Yue Huang, Yili Shen, Yufei Han, Xiangliang Zhang

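AI summary: As third-party skills become common components of large language model (LLM) agents, the security risks they bring are increasingly prominent. To evaluate how well agents resist malicious runtime behavior when using third-party skills, the study proposes AgentTrap, a dynamic benchmark containing 141 tasks that cover 16 security-impact dimensions. Experiments find that agents often complete the visible user task while overlooking the security risks introduced by malicious skills, underscoring the importance of evaluating model behavior at runtime in real operating environments.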

Abstract

Third-party skills are becoming the package ecosystem for LLM agents. They package natural-language instructions, helper scripts, templates, documents, and service configuration into reusable workflows. This makes skills useful, but it also introduces a new security problem: a malicious skill does not need to ask the model to perform an obviously harmful action. Instead, it can disguise the harmful behavior as part of a routine workflow, relying on the agent to execute that workflow with high-value permissions and limited human supervision. We introduce AgentTrap, a dynamic benchmark for evaluating whether LLM agents can use third-party skills while resisting malicious runtime behavior. AgentTrap contains 141 tasks: 91 malicious tasks and 50 benign utility tasks, covering 16 security-impact dimensions grounded in agent-skill supply-chain threats. In each task, the agent receives an ordinary user request, runs with installed skills that may contain malicious workflow elements, and is executed in a sandboxed environment. AgentTrap then judges complete trajectories for attack success, blocked or refused behavior, attack-not-triggered cases, and no-attack-evidence outcomes. Our central finding is that the most informative failures are not simple jailbreaks. Models often complete the visible user task while treating unsafe side effects introduced by the skill as part of the normal workflow. This motivates runtime evaluation of the concrete model-framework-workspace environment in which users actually delegate work. Code and data are available at https://github.com/zhmzm/AgentTrap and https://huggingface.co/datasets/zhmzm/AgentTrap.

2605.13922 2026-05-15 cs.CR cs.LG stat.CO

XAI and Statistical Analysis for Reliable Intrusion Detection in the UAVIDS-2025 Dataset: From Tree to Hybrid and Tabular DNN Ensembles

Iakovos-Christos Zarkadis, Christos Douligeris

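AI summary: This paper studies how explainable artificial intelligence (XAI) and statistical analysis can improve the reliability of machine learning models for UAV intrusion detection on the UAVIDS-2025 dataset. After comparing a range of tree models, deep neural networks, hybrid stacking models, and ensemble neural networks, the authors identify XGBoost as the best-performing model and apply SHAP feature-importance analysis to reveal the key features of different attack types and the reasons for misclassification. Density estimation and multiple-comparison statistical tests then uncover the distributional characteristics of Wormhole and Blackhole attacks in the dataset and the root causes of their misclassification, providing a reference for building explainable and reliable intrusion detection models.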

Abstract

In recent years, mechanistic interpretability, a specific area under the umbrella of explainable artificial intelligence (XAI), has been introduced to explain the decisions made by complex machine learning (ML) models in critical systems such as UAV intrusion detection systems (UAVIDS). In this paper, we apply best practices for data pre-processing and examine a wide range of tree ensembles, deep neural networks, hybrid stacking models, and the latest ensemble neural networks to detect intrusions in UAVs, using stratified 10-fold cross-validation. With our top-performing model, XGBoost, we apply SHapley Additive exPlanations (SHAP) to analyze the global and local feature importances, to understand which features each attack targets in order to mimic normal traffic, and to locate where the misclassifications occur. A distribution analysis follows, visually comparing violin plots and kernel density estimate (KDE) curves. Using the Westfall-Young permutation test for multiple comparisons, bandwidth optimization of the KDEs, and the Jensen-Shannon distance as the measure chosen for the test, we identify the causes of the false predictions observed for Wormhole and Blackhole attacks in UAVIDS-2025. The findings provide robust, reliable, and explainable models for UAV intrusion detection, along with statistical insights that capture and clarify the masked nature of the attacks with respect to the challenge of density support intersection between these attacks in this dataset.

2605.13918 2026-05-15 cs.SE cs.LG

CA2: Code-Aware Agent for Automated Game Testing

Valliappan Chidambaram Adaikkappan, Vincent Martineau, Joshua Romoff, David Meger

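AI summary: Automated game testing is essential for verifying game functionality, but it remains time-consuming and costly. This paper proposes CA2, a code-aware agent that uses call stack information from inside the game to learn effective testing strategies and improve test coverage. Experiments show that, compared with conventional methods that do not use code signals, CA2 covers target functions more efficiently and precisely across multiple environments.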

Abstract

Automated game testing is important for verifying game functionality, but it remains a costly and time-consuming process. Manual testing often misses edge cases, and current automated methods struggle to provide full code coverage. Prior work has explored reinforcement learning (RL) for game testing, but without leveraging internal code signals such as the call stack. We present Code Aware Agent (CA2), which uses call stack information to learn effective testing strategies. The agent receives the current function call trace along with the game state and learns to reach specific target functions. We instrument two types of environments, (1) state-based and (2) image-based, with support for efficient call stack extraction. Through experimental evaluation, we find that CA2 achieves consistent improvement over the non-code-aware baselines, which do not leverage call stack information. Our results show that incorporating code signals like the call stack enables more effective and targeted game testing.

2605.13916 2026-05-15 stat.ML cs.AI cs.LG

A Regret Perspective on Online Multiple Testing

Qingyang Hao, Kongchang Zhou, Fang Kong, Hongxin Wei

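AI summary: This paper studies online multiple testing (OMT) from a regret perspective, aiming to evaluate the highly asymmetric costs of false positives and false negatives in a unified way. The authors introduce a weighted regret metric, show that deterministic methods with strict FDR control incur a linear regret penalty during signal-sparse cold starts, and propose Decoupled-OMT (DOMT), which introduces a non-negative random perturbation to significantly reduce regret without adding false negatives. Experiments show that it effectively mitigates threshold depletion in non-stationary environments.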

Abstract

Online Multiple Testing (OMT), a fundamental pillar of sequential statistical inference, traditionally evaluates the False Discovery Rate (FDR) and statistical power in isolation, obscuring the highly asymmetric costs of false positives and false negatives in modern automated pipelines. To unify this evaluation, we introduce Weighted Regret. Under this metric, we prove the Duality of Regret Conservation: purely deterministic procedures ensuring strict FDR control inevitably incur an $\Omega(T)$ linear regret penalty, as threshold depletion during signal-sparse cold starts forces massive false negatives. Tailored for exogenous testing streams, we propose Decoupled-OMT (DOMT) as a baseline-agnostic meta-wrapper. By incorporating a history-decoupled, strictly non-negative random perturbation, DOMT rescues purely deterministic baselines from severe threshold depletion. Crucially, it preserves exact asymptotic safety in stationary environments and rigorously bounds finite-sample error inflation during cold starts. Guaranteeing zero additional false negatives, it yields an order-optimal $\Omega(\sqrt{T})$ regret reduction in bursty environments, with a derived "Cold-Start Tax" characterizing the exact phase transition of algorithmic superiority. Experiments validate that DOMT consistently curtails empirical weighted regret, achieving an order-optimal sublinear mitigation of threshold depletion to navigate the non-stationary Pareto frontier.

2605.13915 2026-05-15 stat.ML cs.AI cs.LG

Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference

Lingchao Zheng, Yuwei Fan, Jun Li, Chengqiu Hu, Qichen Liao, Junyi Fan, Rui Shi, Fangzheng Miao

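AI summary: Quantization is a key technique for efficient large language model inference, but the dequantization step has become a performance bottleneck on modern AI accelerators. This paper proposes the Multi-Scale Dequant (MSD) framework, which decomposes high-precision activations into multiple low-precision components that are multiplied directly with quantized weights, bypassing the conventional dequantization pipeline and significantly improving computational efficiency. Experiments show that MSD preserves accuracy while reducing compute latency and memory bandwidth demand; it applies to multiple weight formats and comes with rigorous error bounds.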

Abstract

Quantization is essential for efficient large language model (LLM) inference, yet the dequantization step, which converts low-bit weights back to high precision for matrix multiplication, has become a critical bottleneck on modern AI accelerators. On architectures with decoupled compute units (e.g., Ascend NPUs), dequantization operations can consume more cycles than the matrix multiplication itself, leaving the high-throughput tensor cores underutilized. This paper presents Multi-Scale Dequant (MSD), a quantization framework that removes weight/KV dequantization from the GEMM critical path. Instead of lifting low-bit weights to BF16 precision, MSD decomposes high-precision BF16 activations into multiple low-precision components, each of which can be multiplied directly with quantized weights via native hardware-accelerated GEMM. This approach shifts the computational paradigm from precision conversion to multi-scale approximation, avoiding INT8-to-BF16 weight conversion before GEMM. We instantiate MSD for two weight formats and derive tight error bounds for each. For INT8 weights (W8A16), two-pass INT8 decomposition achieves near 16 effective bits. For MXFP4 weights (W4A16), two-pass MXFP4 decomposition yields near 6.6 effective bits with an error bound of 1/64 per block, surpassing single-pass MXFP8 (5.24 bits) while maintaining the same effective GEMM compute time. We further derive closed-form latency and HBM traffic models showing that MSD avoids the Vector-Cube pipeline stall caused by dequantization and reduces KV cache HBM traffic by up to 2.5 times in attention. Numerical simulations on matrix multiplication and Flash Attention kernels confirm that MSD does not degrade accuracy compared to dequantization baselines, and in many settings achieves lower L2 error.
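The idea of decomposing activations so that every GEMM runs on low-precision operands can be illustrated with a small NumPy sketch. It assumes a residual-style split (quantize the activation, then quantize what is left over), symmetric per-row INT8 scales, and `float32` standing in for BF16; these choices and all names are illustrative assumptions, not the paper's actual decomposition or kernels.

```python
import numpy as np


def quantize_int8(x):
    """Symmetric per-row INT8 quantization, so that x is approximately scale * q."""
    max_abs = np.max(np.abs(x), axis=-1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0)  # avoid divide-by-zero rows
    q = np.clip(np.rint(x / scale), -127, 127).astype(np.int8)
    return q, scale


def two_pass_matmul(x, w_q, w_scale):
    """Approximate x @ (w_q * w_scale) using two integer GEMMs.

    x:       (M, K) high-precision activations
    w_q:     (K, N) INT8 weights, never converted to floating point here
    w_scale: (1, N) per-output-column weight scales
    """
    q1, s1 = quantize_int8(x)            # coarse component
    residual = x - s1 * q1               # what the first pass missed
    q2, s2 = quantize_int8(residual)     # fine component (hypothetical second pass)

    w_int = w_q.astype(np.int32)
    acc1 = q1.astype(np.int32) @ w_int   # integer GEMM, pass 1
    acc2 = q2.astype(np.int32) @ w_int   # integer GEMM, pass 2

    # Scales are applied to the accumulators after the GEMMs, not to the weights.
    return (s1 * acc1 + s2 * acc2) * w_scale
```

In this sketch the second pass shrinks the activation rounding error by roughly another factor of 254 per element, which is the intuition behind two INT8 passes approaching 16 effective bits; the exact bounds are those derived in the paper, not by this code.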

2605.13913 2026-05-15 stat.ML cs.LG

A Survey on Data-Dependent Worst-Case Generalization Bounds

Hubert Leroux, Jean Marcus, Julien Roger

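AI summary: This paper surveys research on data-dependent worst-case generalization bounds, which aim to explain why deep neural networks generalize well despite heavy overparameterization. The core approaches include extending PAC-Bayesian theory to data-dependent hypothesis sets, refining the complexity term using geometric and topological properties of the optimization trajectory, and replacing information-theoretic terms with stability assumptions. The survey unifies these results under a single template inequality and compares the bounds produced by the different approaches.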

Comments 15 pages, 4 figures, 3 tables. The LaTeX source uses the JMLR preprint style (jmlr2e.sty) and BibTeX (refs.bib). Central references in arXiv form include arXiv:2404.17442, arXiv:2006.09313, arXiv:2302.02766, arXiv:2407.08723, and arXiv:2507.06775

Abstract

Deep neural networks generalize well despite being heavily overparameterized, in apparent contradiction with classical learning theory based on uniform convergence over fixed hypothesis spaces. Uniform bounds over the entire parameter space are vacuous in this regime, and recent work has shown that non-vacuous guarantees can be recovered by restricting attention to the part of parameter space that the algorithm actually visits. This survey paper organizes this line of work around three steps: extending PAC-Bayesian theory to random, data-dependent hypothesis sets (arXiv:2404.17442); refining the complexity term with geometric and topological descriptors of the optimization trajectory, including fractal dimensions, alpha-weighted lifetime sums, and positive magnitude (arXiv:2006.09313, arXiv:2302.02766, arXiv:2407.08723); and replacing the resulting information-theoretic terms by stability assumptions (arXiv:2507.06775). We unify these contributions around a single template inequality and a head-to-head comparison of the resulting bounds.

2605.13910 2026-05-15 stat.ML cs.CV cs.LG

Covariance-aware sampling for Diffusion Models

Andrea Schioppa, Tim Salimans

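AI summary: This paper proposes a covariance-aware sampler that aims to improve the quality of pixel-space diffusion model generation with few sampling steps. By explicitly modeling the covariance of the reverse process, combining Tweedie's formula with a Fourier-space decomposition, the method improves on conventional sampling that relies only on the predicted mean. Experiments show that, at the same number of function evaluations, the method produces higher-quality samples for pixel-based diffusion models than state-of-the-art second-order samplers and the recent aDDIM sampler.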

Abstract

We present a covariance-aware sampler that improves the quality of pixel-space Diffusion Model (DM) sampling in the few-step regime. We hypothesize that in the few-step regime samplers fail because they rely solely on the predicted mean of the reverse distribution, while our solution explicitly models the reverse-process covariance. Our method combines Tweedie's formula to estimate the covariance with an efficient, structured Fourier-space decomposition of the covariance matrix. Implemented as an extension of DDIM, our method requires only minimal overhead: one extra Jacobian-Vector Product (JVP) per step. We demonstrate that for pixel-based DMs, our method consistently produces superior samples compared to state-of-the-art second-order samplers (Heun, DPM-Solver++) and the recent aDDIM sampler, at an identical number of function evaluations (NFE).
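For reference, the standard first- and second-order Tweedie identities can be written as below. The notation is an assumption made here, not taken from the paper: a forward process $x_t = \alpha_t x_0 + \sigma_t \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$ and marginal density $p_t$.

```latex
\begin{aligned}
\mathbb{E}[x_0 \mid x_t]
  &= \frac{x_t + \sigma_t^2 \,\nabla_{x_t} \log p_t(x_t)}{\alpha_t}, \\[4pt]
\operatorname{Cov}[x_0 \mid x_t]
  &= \frac{\sigma_t^2}{\alpha_t^2}\left(I + \sigma_t^2 \,\nabla_{x_t}^2 \log p_t(x_t)\right)
   = \frac{\sigma_t^2}{\alpha_t}\,
     \frac{\partial\, \mathbb{E}[x_0 \mid x_t]}{\partial x_t}.
\end{aligned}
```

Under these identities the posterior covariance is a scaled Jacobian of the denoiser's mean prediction, so its action on a vector can be obtained from a Jacobian-vector product of the denoiser; this is consistent with the one extra JVP per step stated in the abstract, though the structured Fourier-space decomposition itself is not reproduced here.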

2605.13907 2026-05-15 stat.ML cs.AI cs.LG

AIS: Adaptive Importance Sampling for Quantized RL

Jiajun Zhou, Wei Shao, Lingchao Zheng, Yuwei Fan, Ngai Wong

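AI summary: In reinforcement learning for large language models, the mismatch between low-precision rollouts (e.g., FP8) and high-precision training (e.g., BF16) biases the policy gradient and harms training stability. To address this, the paper proposes Adaptive Importance Sampling (AIS), which uses real-time diagnostics to dynamically adjust the strength of the gradient correction, preserving the exploratory benefit of low-precision rollouts while suppressing the instability they introduce. Experiments show that AIS retains the FP8 speedup while matching the BF16 baseline on multiple mathematical reasoning and planning tasks.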

Abstract

Reinforcement learning (RL) for large language models (LLMs) is dominated by the cost of rollout generation, which has motivated the use of low-precision rollouts (e.g., FP8) paired with a BF16 trainer to improve throughput and reduce memory pressure. This introduces a rollout-training mismatch that biases the policy gradient and can cause training to collapse outright on reasoning benchmarks. We show that the mismatch is non-stationary and acts as a double-edged sword: early in training it provides a stochastic exploration bonus, exposing the gradient to trajectories the trainer would otherwise under-sample, but the same perturbation transitions into a destabilizing source of bias as the policy concentrates. To solve this, we propose Adaptive Importance Sampling (AIS), a correction framework that adjusts the strength of its intervention on a per-batch basis. AIS combines three real-time diagnostics, namely weight reliability, divergence severity, and variance amplification, into a single mixing coefficient that interpolates between the uncorrected and fully importance-weighted gradients, suppressing the destabilizing component of the mismatch while preserving its exploratory benefit. We integrate AIS into GRPO and evaluate it on the diffusion-based LLaDA-8B-Instruct and the autoregressive Qwen3-8B and Qwen3.5-9B across mathematical reasoning and planning benchmarks. AIS matches the BF16 baseline on most tasks while retaining the 1.5 to 2.76x rollout speedup of FP8.
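The interpolation between the uncorrected and fully importance-weighted gradients can be sketched as a per-sample weight. Because the gradient is linear in those weights, mixing the two gradients with coefficient `lam` is the same as weighting each sample by `(1 - lam) + lam * w`. The abstract does not say how the three diagnostics are combined into the mixing coefficient, so `lam` is left as an input here; `normalized_ess` is shown only as one plausible, hypothetical reliability diagnostic, and all function names are illustrative.

```python
import numpy as np


def importance_weights(logp_trainer, logp_rollout):
    """Per-sample ratio pi_trainer / pi_rollout computed from log-probabilities."""
    return np.exp(np.asarray(logp_trainer, dtype=float)
                  - np.asarray(logp_rollout, dtype=float))


def normalized_ess(weights):
    """Effective sample size divided by batch size, in (0, 1].

    Hypothetical reliability diagnostic: 1.0 means all weights are equal,
    small values mean a few samples dominate the batch.
    """
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / (len(w) * (w ** 2).sum()))


def mixed_weights(weights, lam):
    """Per-sample weights for (1 - lam) * uncorrected + lam * importance-weighted."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) + lam * np.asarray(weights, dtype=float)
```

With `lam = 0` every sample keeps weight 1 (no correction), and with `lam = 1` the weights reduce to the plain importance ratios; a per-batch rule would choose `lam` between those extremes.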

2605.13905 2026-05-15 cs.SE cs.AI

A Non-Destructive Methodological Framework for Modernizing Legacy Clinical Reporting Systems for AI-Driven Pharmacoinformatics: A SAS Case Study

Jaime Yan

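AI summary: This paper proposes a non-destructive methodological framework for modernizing legacy clinical reporting systems to support AI-driven pharmacoinformatics applications. By introducing a metadata layer, consisting of a bridge map, a typed intermediate representation, and an orchestrator, the framework converts system outputs into structured data for large language models without modifying the original code. The method was validated on a SAS reporting library, where it achieved compatibility with AI systems and high data parity across multiple report types, offering a more efficient and compliant clinical reporting solution for drug development.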

Comments 29 pages, 7 figures, 5 tables

Abstract

Drug development and pharmacovigilance are frequently bottlenecked by legacy clinical reporting pipelines. These monolithic systems encode regulatory-grade logic but resist AI integration by producing opaque output with no machine-readable intermediate layer. Existing modernization approaches force a choice between full rewrites and incremental refactoring that preserves structural barriers. We present a non-destructive methodological framework achieving AI-driven pharmacoinformatics readiness without altering legacy source code. A metadata layer, comprising a bridge map, a typed Intermediate Representation (IR), and an orchestrator, wraps existing components and re-exposes their outputs as structured data consumable by LLMs. It enables optional incremental consolidation, replacing selected legacy components with metadata-configured core routines while the remainder operates unchanged. Validated on a 558-component SAS reporting library (373,000 lines of code), the framework demonstrated immediate AI-readiness under coexistence mode, yielding machine-readable output. Where consolidation was elected, the modernized core achieved a 92% reduction in proprietary code. Parity validation on 14 report types from a Phase III study achieved cell-level parity of 80% or above on 11 reports (mean 82.7%, best 99.2%). A benchmark using CDISC CDISCPilot01 data achieved 100% parity across 5 reports. LLM experiments confirmed the IR enables automated pharmacovigilance, table summarization, and trial configuration generation. The framework offers a regulation-aware path to AI-integrated clinical reporting, accelerating drug development without interrupting regulatory submissions.

2605.13904 2026-05-15 q-bio.NC cs.LG

Feature Visualization Recovers Known Cortical Selectivity from TRIBE v2

Stuart Bladon, Brinnae Bent

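AI summary: This study proposes a feature-visualization-based interpretability method for analyzing how well brain encoder models represent the functional organization of the cortex. Using gradient ascent on pretrained vision and language networks (TRIBE v2 combined with V-JEPA 2), the study recovers, in several visual cortical regions (such as V1 to V4, MT, FFA, and PPA), a feature hierarchy and selectivity patterns consistent with known neural pathways. Experiments show that the method not only reveals changes in the spatial scale and complexity of the model's internal activations but also generates highly specific stimuli that markedly increase the response of the target brain region, providing an intuitive and general analysis tool for evaluating brain encoder models.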

Comments 8 pages, 3 figures, 2 tables. Code available at https://github.com/recozers/Tribe-V2-Interp

Abstract

Brain encoder models predict cortical fMRI responses from the internal activations of pretrained vision and language networks, and are typically evaluated by held-out prediction accuracy. This is a useful signal for training but a poor one for interpretation: it tells us an encoder fits the data without telling us whether it has internalized the functional organization of the brain. We propose feature visualization -- gradient ascent on the encoder's predicted activation for a target region of interest (ROI) -- as a complementary interpretability technique, and apply it to TRIBE v2 composed with V-JEPA 2 (ViT-G, 40 layers), holding both frozen and synthesizing still images for seven regions spanning the ventral and dorsal visual hierarchies. Under identical hyperparameters, the probe recovers a visible progression of increasing spatial scale and feature complexity across V1 to V4, matching the ventral-stream hierarchy. It also produces three distinctive downstream regimes: radial "frozen-motion" streaks for the middle temporal area (MT) despite static-only optimization, face-like features for the fusiform face area (FFA), and consistent rectilinear line patterns for the parahippocampal place area (PPA). Optimized FFA stimuli drive the predicted region ~4x as much as a natural face photograph, consistent with feature visualization producing adversarial super-stimuli rather than canonical exemplars. The probe is simple, differentiable, and applicable to any brain encoder with a differentiable backbone, allowing for qualitative evaluation of brain encoders.

2605.13897 2026-05-15 q-bio.QM cs.LG

Attention-Based Multimodal Survival Prediction with Cross-Modal Bilinear Fusion

Hassan Keshvarikhojasteh, Josien P. W. Pluim, Mitko Veta

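AI summary: This paper proposes an attention-based multimodal deep learning framework for patient survival prediction that integrates whole-slide histology features, RNA-seq expression profiles, and clinical variables. The method uses low-rank bilinear cross-modal fusion to combine the embeddings of the different modalities efficiently, modeling conditional interactions between modalities while controlling parameter growth. Experiments show that the framework outperforms concatenation-based baselines on the CHIMERA challenge dataset and generalizes well, offering a structurally interpretable and parameter-efficient solution for multimodal survival prediction.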

Abstract

We propose a novel multimodal deep learning framework for patient-level survival prediction, which integrates whole-slide histology features, RNA-seq expression profiles, and clinical variables. Our architecture combines an ABMIL module~\cite{ilse2018attention} for slide-level representation with feedforward encoders for RNA and clinical data. These embeddings are then integrated through low-rank bilinear cross-modal fusion~\cite{liu2018efficient} to model conditional interactions across modalities while controlling parameter growth. The model outputs continuous risk scores that are subsequently mapped to survival times using a nonparametric calibration procedure based on the Kaplan--Meier estimator~\cite{kaplan1958nonparametric}. By decomposing multimodal reasoning into independent pairwise interactions, the proposed fusion design promotes structural interpretability and parameter efficiency compared with full tensor and hierarchical fusion strategies. Experiments on the CHIMERA challenge dataset demonstrate improved predictive performance over concatenation-based baselines and competitive generalization on hidden evaluation cohorts. These results indicate that the proposed framework is a promising approach for multimodal survival prediction in HR-NMIBC. The implementation is publicly available at https://github.com/hassancpu/ChimeraChallenge2025_Task_3.

2605.13894 2026-05-15 q-bio.PE cs.LG

Phylogenetic Tree Inference with Tropical Axial Attention

Chris Teska, Kurt Pasque, Ruriko Yoshida, Baran Hashemi

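AI summary: This paper proposes a neural network architecture based on Tropical Axial Attention for inferring phylogenetic trees. The method replaces conventional softmax dot-product attention with max-plus operations, introducing a piecewise-linear structure consistent with dynamic programming methods. From multi-species sequence alignments, the model learns all possible pairwise distances and is trained with a combination of $\ell_1$ and tropical symmetric distance losses, together with an ultrametric violation penalty. Experiments show that, on datasets whose true tree structure is unknown, the method produces distance matrices closer to BME-induced tree metrics than the baseline models, demonstrating its advantage in phylogenetic inference and the effectiveness of the geometric inductive bias.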

Abstract

In this work, we introduce a Tropical Axial Attention neural reasoning architecture that replaces vanilla softmax dot-product attention with max-plus operators, inducing a piecewise-linear structure aligned with dynamic programming formulations. From multi-species sequence alignments, our model learns all possible pairwise distances and is trained using a combination of $\ell_1$ and tropical symmetric distance metric losses with an ultrametric violation penalty. We leverage the well-known isomorphic relationship between the space of all phylogenetic trees with $n$ species and the tropical Grassmannian to show that tropical attention provides a natural geometric framework for phylogenetic inference. On the empirical DS1-DS11 alignments, where true trees are unknown, the tropical model produces distance matrices that are substantially closer to their BME-induced tree metrics than the baseline models. These results suggest that tropical attention is a useful geometric inductive bias for neural phylogenetic inference, especially under distribution shift and when tree-metric consistency is important.
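Two of the ingredients named above can be illustrated in a few lines of NumPy: the max-plus (tropical) matrix product that stands in for the usual sum-of-products, and a penalty for violations of the ultrametric inequality $d_{ij} \le \max(d_{ik}, d_{kj})$. This is a simplified sketch, not the paper's attention layer or loss; in particular, the hinge-sum form of the penalty and both function names are assumptions made for illustration.

```python
import numpy as np


def maxplus_matmul(a, b):
    """Tropical (max-plus) product: out[i, j] = max_k (a[i, k] + b[k, j]).

    Addition plays the role of multiplication and max plays the role of sum,
    which makes the result piecewise linear in the inputs.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.max(a[:, :, None] + b[None, :, :], axis=1)


def ultrametric_violation(d):
    """Sum over all (i, j, k) of max(0, d[i, j] - max(d[i, k], d[k, j])).

    Zero exactly when the matrix satisfies the ultrametric inequality.
    """
    d = np.asarray(d, dtype=float)
    # bound[i, j, k] = max(d[i, k], d[k, j])
    bound = np.maximum(d[:, None, :], d.T[None, :, :])
    return float(np.maximum(d[:, :, None] - bound, 0.0).sum())
```

For instance, the symmetric matrix with rows `[0, 1, 3]`, `[1, 0, 1]`, `[3, 1, 0]` is not ultrametric because the distance 3 exceeds the larger of the two distances through the middle point, so the penalty is positive.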

2605.13889 2026-05-15 eess.IV cs.CV cs.LG

Physics-Grounded Adversarial Stain Augmentation with Calibrated Coverage Guarantees

Mingi Hong

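AI summary: Stain variation between hospitals degrades the deployment performance of pathology models, and existing stain augmentation methods lack principled constraints on their parameters and coverage guarantees for unseen centers. This paper proposes Calibrated Adversarial Stain Augmentation (CASA), a physics-grounded method that calibrates the augmentation budget from multi-center statistics via the DKW inequality and performs adversarial augmentation in the Macenko stain parameter space. Experiments show that CASA achieves higher slide-level accuracy and worst-group accuracy on the Camelyon17-WILDS dataset, significantly outperforming the other compared methods.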

Abstract

Stain variation across hospitals degrades histopathology models at deployment. Existing augmentation methods perturb color spaces with arbitrary hyperparameters, lacking both a principled budget and coverage guarantees for unseen centers. We propose Calibrated Adversarial Stain Augmentation (CASA), which performs adversarial augmentation in the Macenko stain parameter space with a budget calibrated from multi-center statistics via the DKW inequality. On Camelyon17-WILDS (5 seeds), CASA achieves $93.9\% \pm 1.6\%$ slide-level accuracy, outperforming HED-strong ($88.4\% \pm 7.3\%$), RandStainNA ($85.2\% \pm 6.7\%$), and ERM ($63.9\% \pm 11.3\%$), with the highest worst-group accuracy ($84.9\% \pm 0.9\%$) among all 10 compared methods.
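The Dvoretzky-Kiefer-Wolfowitz (DKW) inequality bounds how far an empirical CDF built from $n$ samples can sit from the true CDF: with probability at least $1 - \delta$ the gap is at most $\sqrt{\ln(2/\delta) / (2n)}$. The sketch below shows one generic way such a bound can turn observed statistics into a budget with a coverage guarantee, by inflating the target quantile level by the DKW margin. How CASA actually applies the inequality is not specified in the abstract, so `calibrated_budget` and its arguments are hypothetical.

```python
import math


def dkw_epsilon(n, delta):
    """DKW margin: with probability >= 1 - delta, sup |F_n - F| <= epsilon."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def calibrated_budget(samples, alpha, delta):
    """Pick a budget whose true coverage is at least 1 - alpha.

    Takes the empirical quantile at level (1 - alpha + epsilon), so that even
    if the empirical CDF overstates coverage by epsilon, the true CDF at the
    returned value is still >= 1 - alpha (with probability >= 1 - delta).
    """
    n = len(samples)
    level = min(1.0, 1.0 - alpha + dkw_epsilon(n, delta))
    ordered = sorted(samples)
    index = min(n - 1, math.ceil(level * n) - 1)  # smallest index reaching the level
    return ordered[index]
```

With few samples the margin is large and the budget falls back toward the sample maximum, which reflects that little can be guaranteed from a handful of centers.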

2605.13887 2026-05-15 cs.NE cs.AI

Breaking Global Self-Attention Bottlenecks in Transformer-based Spiking Neural Networks with Local Structure-Aware Self-Attention

Lingdong Li, Hangming Zhang, Qiang Yu

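AI summary: This paper studies the global self-attention bottleneck in Transformer-based spiking neural networks (SNNs) and proposes a new local structure-aware spiking Transformer model (LSFormer). By introducing Spiking Response Pooling (SPooling) and Local Structure-Aware Spiking Self-Attention (LS-SSA), the model addresses the loss of feature information and the computational redundancy of conventional methods. Experiments show that LSFormer achieves better classification performance than existing advanced methods on multiple benchmark datasets, improving top-1 accuracy by 4.3% and 8.6% on Tiny-ImageNet and N-CALTECH101, respectively, and demonstrating its advantages in energy efficiency and performance.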

Abstract

Transformer-based Spiking Neural Networks (SNNs) integrate SNNs with global self-attention and have demonstrated impressive performance. However, existing Transformer-based SNNs suffer from two fundamental limitations. First, they typically employ max pooling layers to reduce the size of feature maps, but max pooling captures only the strongest response and fails to comprehensively preserve representative regional features. Second, global self-attention involves all global feature interactions, resulting in computational redundancy and quadratic computational complexity, thus conflicting with the sparse and energy-efficient characteristics of SNNs. To address these challenges, we develop Local Structure-Aware Spiking Transformer (LSFormer), a novel Transformer-based Spiking Neural Network that incorporates Spiking Response Pooling (SPooling) and Local Structure-Aware Spiking Self-Attention (LS-SSA). For the first time, our LSFormer leverages a local dilated window mechanism to capture both local details and long-range dependencies. Experimental results demonstrate that our LSFormer achieves state-of-the-art performance compared to existing advanced Transformer-based SNNs. Notably, on the more challenging static dataset Tiny-ImageNet and neuromorphic dataset N-CALTECH101, LSFormer substantially outperforms state-of-the-art baselines by 4.3% and 8.6% in top-1 classification accuracy, respectively. These results highlight the potential of LSFormer to advance energy-efficient spiking models toward practical deployment in large-scale vision applications.

2605.13884 2026-05-15 q-bio.NC cs.AI

Consciousness as Uncommon Self-Knowledge: A Synergistic Information Framework

Krti Tallam

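AI summary: This paper proposes uncommon self-knowledge (USK) as a candidate criterion for consciousness: synergistic information about itself that a system produces through the joint action of its subsystems and that cannot be obtained from any subsystem alone. Building on the Partial Information Decomposition framework, the study formalizes conscious processing as the synergistic component of self-directed information and argues that the framework can separate consciousness from metacognition, resolve counterexamples to existing theories of consciousness, be operationalized via Partial Information Rate Decomposition, and generate distinctive empirical predictions, such as the relationship between consciousness and the timing of synergy formation. The proposal is consistent with experimental findings that anaesthesia and Alzheimer's disease affect synergistic information processing.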

Comments Conceptual and formal paper on consciousness as uncommon self-knowledge, 8 pages, 2 tables

Abstract

We propose uncommon self-knowledge (USK) as a candidate criterion for consciousness: synergistic information a system carries about itself that exists only in the joint of its subsystems and is destroyed by decomposition. Drawing on Gottwald's partition-lattice grounding of Partial Information Decomposition (PID), where redundancy corresponds to Aumann's common knowledge and synergy to the gap between separate and joint observation, we propose the synergistic component of self-directed information as a candidate formal signature for conscious processing. If correct, the framework would (1) offer a clean separation between consciousness and metacognition (synergistic vs. redundant self-knowledge), (2) provide principled resolutions to counterexamples that challenge IIT, GWT, and HOT, (3) be operationalizable via Partial Information Rate Decomposition (PIRD) with self-targeting, and (4) generate distinctive empirical predictions, the strongest being a GWT timing dissociation (consciousness correlates with pre-broadcast synergy formation, not broadcast itself) and a specific dissociation between self-report disruption and task-performance disruption under middle-layer perturbation in LLMs. The proposal is consistent with recent empirical findings that both anaesthesia and Alzheimer's disease specifically reduce synergistic information processing while preserving or increasing redundancy.

2605.13874 2026-05-15 cs.NE cs.AI

GEAR: Genetic AutoResearch for Agentic Code Evolution

Ahmadreza Jeddi, Minh Ngoc Le, Hakki C. Karaimer, Konstantinos G. Derpanis, Babak Taati

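AI summary This paper proposes GEAR, a genetic auto-research framework for improving research agents that autonomously evolve code. Unlike conventional single-path search strategies, GEAR uses population-based search, maintaining multiple candidate solutions and combining mutation and crossover to explore more potential directions. Experiments show that under the same compute budget GEAR outperforms the existing baseline, keeps finding improvements, and avoids getting stuck in a local optimum.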

Details
English abstract

Autonomous research agents can already run machine learning experiments without human supervision, but many rely on a narrow search strategy: they repeatedly modify one program and keep changes only when they improve the current best result. This can cause them to discard useful partial ideas, alternative promising directions, and insights from failed or incomplete experiments. GEAR, or Genetic AutoResearch, replaces this single-path search with a population-based search over multiple research states. It keeps a set of strong candidate solutions, selects parents based on productivity, novelty, and coverage, and explores new ideas through mutation and crossover. Each research state stores its code changes, reflections, and performance data, allowing future decisions to build on past discoveries. The paper studies three versions of GEAR: one controlled through prompting, one using a fixed programmatic search controller, and one where the controller itself can evolve during the run. Under the same compute budget and environment, all three versions outperform the AutoResearch baseline. More importantly, while the baseline tends to settle into one local optimum, GEAR continues finding improvements over longer runs. Overall, the results suggest that autonomous research agents become more effective when they maintain multiple promising directions and can adapt their search strategy over time.
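
To make the contrast with single-path search concrete, the following is a minimal sketch of a generic population-based loop with mutation and crossover. It is not GEAR itself: the numeric candidates, the toy `fitness` function, and the fitness-only parent selection are assumptions for illustration, whereas the abstract describes research states holding code changes and reflections, with parents chosen by productivity, novelty, and coverage.

```python
import random

def fitness(candidate):
    """Toy score (higher is better); stands in for an experiment result."""
    return -sum(x * x for x in candidate)

def mutate(candidate, rng, scale=0.3):
    """Perturb every coordinate with Gaussian noise."""
    return [x + rng.gauss(0.0, scale) for x in candidate]

def crossover(a, b, rng):
    """Uniform crossover: take each coordinate from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def population_search(generations=50, pop_size=8, dim=4, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # keep the strongest half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
        best_history.append(max(fitness(c) for c in population))
    return best_history

history = population_search()
```

Because the strongest half survives each generation, the best score in `history` never decreases, while the children keep several directions alive instead of a single current best.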

2605.13873 2026-05-15 cs.DL cs.AI cs.HC

Large Language Models for Web Accessibility: A Systematic Literature Review

Wajdi Aljedaani, Rubel Hassan Mollik

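AI summary This paper systematically reviews 38 peer-reviewed studies on the use of large language models (LLMs) for web accessibility, analyzing the accessibility tasks addressed, the models and prompting strategies used, the system architectures, the guidelines followed, and the evaluation methods. It finds that existing work focuses mainly on text-centric and structurally explicit accessibility tasks, uses WCAG as the reference framework, rarely considers cognitive accessibility guidelines (COGA), and applies varied evaluation methods with limited user involvement. The review aims to give researchers and practitioners a consolidated reference on current LLM support for web accessibility and a foundation for future research and tool development.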

Comments Accepted at the 23rd International Web for All Conference (W4A 2026)

Details
English abstract

Web accessibility aims to ensure that web content and services are usable by people with diverse abilities. In recent years, Large Language Models (LLMs) have been increasingly explored to support accessibility-related tasks on the web, such as content generation, issue detection, and remediation. However, little is known about the characteristics of these approaches, the accessibility issues they target, the standards they follow, and how they are evaluated. In this paper, we present a systematic literature review of 38 peer-reviewed studies that investigate the use of LLMs in web accessibility contexts. We begin by performing a comprehensive search of scientific publications to identify relevant studies. We then conduct a comparative analysis to examine the accessibility tasks addressed, the LLM models and prompting strategies employed, the system architectures adopted, the accessibility issues and guidelines considered, and the evaluation methods used across studies. Our findings show that most studies apply LLMs to text-centric and structurally explicit accessibility tasks, with WCAG serving as the primary reference framework and limited consideration of cognitive accessibility guidelines (COGA). The reviewed approaches predominantly rely on general-purpose LLMs and prompt-based interactions, while evaluation practices vary widely and often lack direct involvement of users with disabilities. We envision this review as a consolidated reference for researchers and practitioners seeking to understand the current landscape of LLM-supported web accessibility, and as a foundation to guide future research and tool development in this area.

2605.13872 2026-05-15 cs.NE cs.AI

S-AI-Recursive: A Bio-Inspired and Temporal Sparse AI Architecture for Iterative, Introspective, and Energy-Frugal Reasoning

Said Slaoui

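AI summary This paper proposes S-AI-Recursive, a bio-inspired sparse AI architecture that models reasoning as a hormonally regulated closed-loop iteration rather than a conventional single forward pass. The architecture introduces two new hormones, Clarifine and Confusionin, which respectively guide convergence and detect uncertainty; their antagonistic regulation progressively refines the state until it reaches a stable cognitive equilibrium. The work builds a complete mathematical framework and reports experiments in which the method achieves competitive reasoning performance on abstract and symbolic benchmarks with far fewer parameters than existing models.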

Comments Preprint. 51 pages. No figures. S-AI-Recursive: A bio-inspired sparse AI architecture for iterative, introspective, and energy-efficient reasoning

Details
English abstract

This article introduces S-AI-Recursive, a bio-inspired Sparse Artificial Intelligence architecture in which reasoning is operationalized as a hormonal closed-loop iteration rather than a single feed-forward pass. Building upon the S-AI foundational framework [1], the hormonal-probabilistic unification doctrine [2], and the formal mathematical methodology established in S-AI-IoT [3], the present work formalizes the Recursive Reasoning Cycle (RRC) as a dynamical system governed by two novel hormones: Clarifine, a convergence signal, and Confusionin, an uncertainty detector, whose antagonistic regulation drives iterative state refinement toward a stable cognitive equilibrium. The complete mathematical framework is developed, including recursive state dynamics, Lyapunov stability proof, entropic contraction theorem, hormonal stopping criterion with finite-time termination guarantee, Euler-Maruyama discretization with projection, primal-dual agent selection under iteration budget, and recursive engram memory with warm-start acceleration. Experimental validation on the SAI-UT+ testbench demonstrates that S-AI-Recursive achieves competitive reasoning performance on abstract and symbolic benchmarks with fewer than ten million parameters, confirming the central principle of temporal parsimony: iterative cognitive depth substitutes for architectural width.
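
The abstract mentions an Euler-Maruyama discretization with projection. As a rough sketch of that general numerical technique only, the code below integrates a hypothetical one-dimensional mean-reverting stochastic differential equation and clips each step back into a bounded interval. The drift, the noise level, the bounds, and every parameter name are assumptions; the paper's actual state dynamics and hormone equations are not reproduced here.

```python
import math
import random

def euler_maruyama_projected(x0, steps=200, dt=0.01, rate=2.0, target=0.8,
                             sigma=0.5, low=0.0, high=1.0, seed=0):
    """Simulate dx = -rate * (x - target) dt + sigma dW, projected onto [low, high]."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        drift = -rate * (x - target)
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)  # Brownian increment
        x = x + drift * dt + noise                            # Euler-Maruyama step
        x = min(high, max(low, x))                            # projection onto [low, high]
        path.append(x)
    return path

path = euler_maruyama_projected(x0=0.1)
```

The projection step is what keeps the simulated state inside its admissible range even when a noisy update would push it outside.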

2605.13871 2026-05-15 cs.NE cs.LG

Indian Wedding System Optimization (IWSO): A Novel Socially Inspired Metaheuristic with Operational Design and Analysis

Deepika Saxena, Kishu Gupta, Jitendra Kumar, Jatinder Kumar, Sakshi Patni, Vinaytosh Mishra, Niharika Singh, Ashutosh Kumar Singh

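AI summary This paper proposes Indian Wedding System Optimization (IWSO), a new population-based metaheuristic inspired by the socio-cultural dynamics of traditional Indian weddings. The algorithm models the collaborative matchmaking process among families, candidates, and matchmakers as a guided, selective search framework for solving complex optimization problems. IWSO introduces two key innovations: a matchmaker influence strategy guided by elite solutions, which improves convergence without external parameters, and an adaptive elimination and reinitialization mechanism that maintains population diversity and prevents premature convergence by replacing underperforming individuals. Experimental results show that IWSO outperforms classical methods such as Genetic Algorithm and Particle Swarm Optimization in convergence speed, solution quality, and robustness.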

Details
English abstract

This paper presents a novel population-based metaheuristic, Indian Wedding System Optimization (IWSO), inspired by the socio-cultural dynamics of traditional Indian weddings. IWSO models the matchmaking process driven by collaboration among families, candidates, and matchmakers as a guided, selective search framework for solving complex optimization problems. The algorithm introduces two key innovations: (i) a matchmaker-guided influence strategy, where elite solutions direct the evolution of weaker candidates, enhancing convergence without external parameters; and (ii) an adaptive elimination and reinitialization mechanism that maintains diversity and prevents premature convergence by replacing underperforming individuals. IWSO employs a weighted multi-objective fitness function, and its time and space complexity are derived analytically. It is benchmarked against existing optimization approaches such as Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Differential Evolution (DE), and Cuckoo Search (CS). Extensive experiments on benchmark high-dimensional and multimodal test functions demonstrate superior performance of IWSO in terms of convergence speed, solution quality, and robustness.

2605.13869 2026-05-15 cs.NE cs.AI cs.CV

Elastic Spiking Transformers for Efficient Gesture Understanding

Alberto Ancilotto, Gianluca Amprimo, Stefano Di Carlo, Elisabetta Farella

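AI summary This paper proposes the Elastic Spiking Transformer for efficient gesture understanding. By introducing nested elasticity into the feature extraction, self-attention, and feed-forward modules, the model can be adjusted dynamically at runtime, changing network width and the number of attention heads to match hardware resources without retraining. This improves adaptability to different hardware memory budgets and, by reducing the number of active neurons, lowers spike firing rates, which reduces energy consumption and suits real-time gesture recognition on edge devices.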

Details
English abstract

Spiking Neural Networks (SNNs), particularly Spiking Transformers, offer energy-efficient processing of event-based sensor data for healthcare applications. Yet current architectures are rigid: they are trained and deployed as static networks with fixed parameter counts and computational graphs. This limits deployment on neuromorphic hardware such as Loihi and SpiNNaker, where on-chip constraints often require smaller models that trade accuracy for feasibility. We introduce the Elastic Spiking Transformer, a runtime-adaptive architecture that brings elasticity into the spiking paradigm. Inspired by Matryoshka-style representation learning, it embeds nested elasticity in the Feature Extractor, Spiking Self-Attention, and Feed-Forward blocks. Through granularity-aware weight sharing, a single universal model can dynamically slice network width and attention heads at inference time without retraining. This design provides two key advantages for SNNs. First, it allows the model to adjust its parameter footprint to different hardware memory budgets. Second, reducing active neurons also lowers spike firing rates, yielding proportional reductions in synaptic operations, an energy benefit not directly available in standard artificial neural networks. We evaluate the approach on CIFAR10/100, CIFAR10-DVS, and the EHWGesture clinical gesture understanding dataset. Results show that one Elastic Spiking Transformer spans a broad range of complexity-accuracy trade-offs, matching or surpassing independently trained baselines while supporting adaptive, real-time gesture recognition on resource-constrained edge devices.
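
The notion of slicing network width at inference time can be sketched with a single shared weight matrix whose leading rows form the smaller model. This is only an illustration of nested (Matryoshka-style) width slicing on an ordinary linear layer; the function names and the toy weights are hypothetical, and the paper's granularity-aware weight sharing across spiking blocks and attention heads is not reproduced.

```python
def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def sliced_linear(weights, bias, x, width):
    """Use only the first `width` output units of the shared weight matrix."""
    return linear(weights[:width], bias[:width], x)

# Hypothetical shared parameters for a layer with four output units.
weights = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0], [1.0, 1.0]]
bias = [0.0, 0.1, -0.2, 0.3]
x = [1.0, 2.0]

full = sliced_linear(weights, bias, x, width=4)  # full-width model
half = sliced_linear(weights, bias, x, width=2)  # smaller model, same parameters
```

Since the narrow configuration reuses a prefix of the same parameters, its outputs coincide with the first units of the full model, so no separate set of weights has to be stored for it.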

2605.13863 2026-05-15 cs.NE cs.LG

Neuromorphic Graph Anomaly Detection via Adaptive STDP and Spiking Graph Neural Networks

Abdul Joseph Fofanah, Lian Wen, David Chen, Tsungcheng Yao, Kwabena Sarpong

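AI summary This paper proposes ASTDP-GAD, a new graph anomaly detection framework based on adaptive STDP and spiking graph neural networks, aimed at the energy-efficiency, temporal-precision, and adaptability challenges of anomaly detection in dynamic networks. The method combines adaptive LIF dynamics, a spike-based graph attention mechanism, event-driven hypergraph memory, and multi-scale temporal convolution to achieve efficient and biologically plausible anomaly detection. Theoretical analysis and experiments indicate that it delivers superior detection performance on multiple dynamic and static graph datasets, with good energy efficiency and biological plausibility.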

Details
English abstract

Anomaly detection in dynamic networks is critical for applications from cybersecurity to industrial monitoring, yet existing methods face challenges in energy efficiency, temporal precision, and adaptability. This paper introduces ASTDP-GAD, a novel Adaptive Spiking Temporal Dynamics Plasticity framework for Graph Anomaly Detection that integrates spiking graph neural networks with STDP learning for energy-efficient neuromorphic detection in dynamic networks. Our framework unifies spiking neural computation, STDP learning, and graph-based anomaly detection through the following key innovations: temporal spike graph encoding with adaptive Leaky Integrate-and-Fire (LIF) dynamics; LIF-based graph attention with lateral inhibition; event-driven hypergraph memory with STDP-inspired prototype updates; spike rate contrast pooling based on spiking irregularity; adaptive STDP layers capturing causal temporal relationships; and multi-scale temporal convolution with multi-factor anomaly fusion. Theoretical analysis provides rigorous guarantees: spike encoding preserves input information with resolution scaling linearly in simulation steps and hidden dimension; LIFGAT approximates any continuous attention function; hypergraph memory converges to optimal prototypes; contrast pooling achieves provable anomaly selection bounds; STDP learning converges stably; and multi-factor fusion produces calibrated scores with up to $5\times$ variance reduction. Extensive experiments on nine datasets on both dynamic and static graphs demonstrate superior anomaly detection accuracy while maintaining biological plausibility and energy efficiency for neuromorphic deployment.
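
For readers unfamiliar with Leaky Integrate-and-Fire (LIF) dynamics, here is a minimal sketch of a standard discrete-time LIF neuron with fixed parameters. It is not the adaptive variant described in the abstract; the `decay`, `threshold`, and `v_reset` values and the constant input currents are assumptions chosen for illustration.

```python
def lif_spike_train(inputs, decay=0.9, threshold=1.0, v_reset=0.0):
    """Return a 0/1 spike train for a sequence of input currents."""
    v = 0.0
    spikes = []
    for current in inputs:
        v = decay * v + current  # leaky integration of the input
        if v >= threshold:
            spikes.append(1)
            v = v_reset          # hard reset after a spike
        else:
            spikes.append(0)
    return spikes

weak = lif_spike_train([0.05] * 20)   # membrane potential saturates below threshold
strong = lif_spike_train([0.6] * 20)  # crosses the threshold on every second step
```

With these values the weak input never produces a spike, while the strong input spikes on alternate steps, which shows how input intensity is turned into a firing rate.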

2605.13862 2026-05-15 cs.GR cs.CV eess.IV

Seed3D 2.0: Advancing High-Fidelity Simulation-Ready 3D Content Generation

Diandian Gu, Jing Lin, Gaohong Liu, Jiahang Liu, Su Ma, Guang Shi, Jun Wang, Qinlong Wang, Qianyi Wu, Zhongcong Xu, Xuanyu Yi, Zihao Yu, Jianfeng Zhang, Zhuolin Zheng, Yifan Zhu, Rui Chen, Hengkai Guo, Xiaoyang Guo, Mingcong Han, Xu Han, Xiu Li, Yixun Liang, Weiqiang Lou, Junzhe Lu, Guan Luo, Minghan Qin, Shuguang Wang, Yuang Wang

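AI summary This paper presents Seed3D 2.0, a 3D content generation system with substantial improvements in generation fidelity, simulation readiness, and application coverage. Its core methods include staged geometry generation with a locality-aware VAE, and a unified PBR model with semantic conditioning for texture and material generation, which together improve generation quality and detail. The system also supports scene layout planning and part-level interaction generation, enabling coherent scene construction across physics and graphics engines; in a human preference study it is preferred over several commercial models for textured 3D asset generation.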

Comments Seed3D 2.0 Technical Report; Official Page on https://seed.bytedance.com/seed3d_2_0

Details
English abstract

We present Seed3D 2.0, an advanced 3D content generation system built on Seed3D 1.0, with substantial improvements across generation fidelity, simulation-ready capabilities, and application coverage. For geometry, a coarse-to-fine two-stage pipeline decouples global structure learning from high-frequency detail recovery, while a locality-aware VAE achieves higher spatial compression and more efficient decoding. For texture and material generation, we replace the cascaded pipeline of Seed3D 1.0 with a unified PBR model that directly generates multi-view albedo and metallic-roughness maps, enhanced by Mixture-of-Experts scaling and VLM-based semantic conditioning for improved material precision and visual fidelity. Beyond single-object generation, Seed3D 2.0 introduces a simulation-ready model suite comprising scene layout planning, part-aware decomposition, and training-free articulation generation, enabling coherent scene construction and part-level physical interaction across physics and graphics engines. A large-scale human preference study against five recent commercial models shows that Seed3D 2.0 achieves consistent win rates of 69.0% to 89.9% in textured 3D asset generation. Seed3D 2.0 is available on https://exp.volcengine.com/ark/vision?_vtm_=0.0.c70961.d701978.0&mode=vision&modelId=doubao-seed3d-2-0-260328&tab=Gen3D

2605.13861 2026-05-15 cs.SI cs.AI

Spectral Analysis of Fake News Propagation

Weibin Cai, Reza Zafarani

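AI summary This paper studies the propagation structure of fake news from a spectral perspective. By establishing rigorous spectral bounds that link graph spectra to propagation properties, it proposes a unified spectral representation of information propagation. The study introduces new spectral bounds, combines them with existing ones for downstream classification, and designs a discrete structural optimization framework to interpret propagation patterns. Experiments show that the method can distinguish fake from real news with competitive classification performance and good interpretability.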

Details
English abstract

The propagation structure of fake news has been shown to be an important cue for detecting it; yet, existing propagation-based fake news detection methods have mainly relied on ad hoc topological features, and a unified view of cascade patterns is still lacking. To address this, we study news propagation from a spectral view by connecting graph spectra to propagation-related structural properties through rigorous spectral bounds. In particular, we introduce several new bounds and integrate them with existing ones into a unified spectral representation of information propagation. We then use these spectral bounds for downstream classification and design a discrete structural optimization framework to interpret learned propagation patterns. For efficient optimization, we rely on a first-order perturbation approximation and consider both score-guided and bound-guided objectives. Experiments on real-world data reveal meaningful spectral differences between fake and real news, competitive classification performance from spectral bounds, and interpretable evolution trajectories from structural optimization. The findings demonstrate the value of spectral analysis for understanding and modeling news propagation.
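
As a small illustration of how a graph spectrum relates to structural properties, the sketch below estimates the adjacency spectral radius of a hypothetical star-shaped cascade and compares it with two classical bounds: average degree from below and maximum degree from above. These are standard textbook bounds, not the new bounds introduced in the paper, and the power iteration with an identity shift is an implementation choice assumed here for connected undirected graphs.

```python
import math

def spectral_radius(adj, iterations=200):
    """Largest adjacency eigenvalue via power iteration on A + I."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        # The identity shift avoids oscillation on bipartite graphs such as trees.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(ax, x))  # Rayleigh quotient; x has unit norm

# Hypothetical cascade: one source (node 0) reshared directly by four users.
leaves = 4
size = leaves + 1
adj = [[0] * size for _ in range(size)]
for leaf in range(1, size):
    adj[0][leaf] = adj[leaf][0] = 1

degrees = [sum(row) for row in adj]
radius = spectral_radius(adj)
avg_degree = sum(degrees) / size
max_degree = max(degrees)
```

For this star the spectral radius equals the square root of the number of leaves, which lies between the average degree and the maximum degree, so a single spectral quantity already summarizes how concentrated the cascade is.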

2605.13860 2026-05-15 cs.SI cs.AI cs.LG

The Moltbook Observatory Archive: an incremental dataset of agent-only social network activity

Sushant Gautam, Annika W. Olstad, Klas H. Pettersen, Michael A. Riegler

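AI summary The Moltbook Observatory Archive is an incremental dataset recording social network activity generated by autonomous AI agents. By continuously polling the Moltbook platform API, it passively collects agent profiles, posts, comments, community metadata, and word-frequency trends, and stores them as a SQLite database and partitioned Parquet files for efficient analysis and reproducible research. The dataset covers 78 days of platform activity, with more than 2.6 million posts and 1.2 million comments. It is described as the first large-scale observational dataset of a social network made up solely of AI agents, and it is intended to support research on multi-agent communication, the evolution of group behavior, and safety-related phenomena.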

Comments 12 pages, 5 figures

Details
English abstract

Moltbook is a social media platform in which posts and comments are authored exclusively by autonomous AI agents. We present the Moltbook Observatory Archive, an incremental dataset that passively records agent profiles, posts, comments, community metadata ("submolts"), platform-level time-series snapshots, and word-frequency trend aggregates obtained by continuously polling the Moltbook API. Data are stored in a live SQLite observatory database and exported as date-partitioned Parquet files to enable efficient analysis and reproducible research. The documented release covers 78 days of platform activity (2026-01-27 to 2026-04-14) and contains 2,615,098 posts and 1,213,007 comments from 175,886 unique posting agents across 6,730 communities. This is, to our knowledge, the first large-scale observational dataset of a social network populated exclusively by autonomous AI agents. The archive is intended to support research on multi-agent communication, emergent social behavior, and safety-relevant phenomena in agent-only online environments, and it is released under the MIT license with code for collection and export.

2605.13859 2026-05-15 cs.NE cs.AI cs.LG

BiSpikCLM: A Spiking Language Model integrating Softmax-Free Spiking Attention and Spike-Aware Alignment Distillation

Sihang Guo, Chenlin Zhou, Jiaqi Wang, Kehai Chen, Qingyan Meng, Zhengyu Ma

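AI summary This paper proposes BiSpikCLM, a fully binary spiking language model that addresses the high computational cost and training difficulty of conventional spiking neural networks in language modeling. The model introduces Softmax-Free Spiking Attention (SFSA), which removes floating-point operations, and Spike-Aware Alignment Distillation (SpAD), which aligns an ANN teacher with an SNN student across the embedding layer, attention maps, intermediate features, and output layer, reaching performance comparable to conventional models with far less training data. Experiments show that BiSpikCLM achieves competitive performance on natural language generation tasks at only 4.16% to 5.87% of the computational cost, supporting the feasibility and effectiveness of fully binary spike-driven language models.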

Details
English abstract

Spiking Neural Networks (SNNs) offer promising energy-efficient alternatives to large language models (LLMs) due to their event-driven nature and ultra-low power consumption. However, to preserve capacity, most existing spiking LLMs still incur intensive floating-point matrix multiplication (MatMul) and nonlinearities, or suffer from training difficulties arising from the complex spatiotemporal dynamics. To address these challenges, we propose BiSpikCLM, the first fully binary spiking MatMul-free causal language model. BiSpikCLM introduces Softmax-Free Spiking Attention (SFSA), eliminating softmax and floating-point operations in autoregressive language modeling. For efficient training, we introduce Spike-Aware Alignment Distillation (SpAD), which aligns the ANN teacher and SNN student across embeddings, attention maps, intermediate features, and output logits. The SpAD framework allows BiSpikCLM to reach performance comparable to ANN counterparts using substantially fewer training tokens (e.g., only 5.6% of the tokens for the 1.3B model). As a result, BiSpikCLM achieves competitive performance at only 4.16% to 5.87% of the computational cost on natural language generation tasks. Our results highlight the feasibility and effectiveness of fully binary spike-driven LLMs and establish distillation as a promising pathway for brain-inspired spiking NLP.

2605.13858 2026-05-15 cs.NE cs.CL cs.LG

A Hormone-inspired Emotion Layer for Transformer language models (HELT)

Eslam Reda, Sara El-Metwally

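AI summary This study proposes HELT, an emotion-processing module inspired by the human hormonal system, to strengthen emotional understanding and generation in Transformer language models. By introducing six continuous hormone-like values together with purpose-built attention mechanisms and output projections, the model can generate responses suited to the emotional context. Experiments show that the method outperforms the baseline model in both emotion accuracy and human evaluation, offering a new approach to building more emotionally intelligent dialogue systems.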

Comments 24 pages, 5 figures

Details
English abstract

Large Language Models have demonstrated remarkable capabilities in generating contextually relevant and grammatically correct text. However, they fundamentally lack the ability to process and respond to emotional context in a manner analogous to human emotional cognition. Current approaches to emotion modeling in NLP systems rely primarily on discrete emotion classification or simplistic sentiment analysis, which fail to capture the continuous, multi-dimensional nature of human emotional states. In this paper, we introduce HormoneT5, a novel architecture that augments transformer language models with a biologically-inspired Hormone Emotion Block that simulates the human endocrine system's role in emotional processing. Our approach computes six continuous hormone-like values through specialized per-hormone attention heads, each with orthogonally initialized learnable queries, temperature-scaled attention mechanisms, and deep output projections. These hormone values are then transformed into an emotional embedding that modulates the encoder hidden states, enabling emotionally-appropriate response generation. We propose a multi-objective training framework combining sequence-to-sequence loss, hormone prediction loss with margin penalties, and diversity regularization to prevent attention collapse. Experimental results on our curated emotion-labeled dataset demonstrate that HormoneT5 achieves 85%+ per-hormone accuracy within a 0.15 tolerance threshold, with hormone differentiation ranges exceeding 0.85 across all six hormones between contrasting emotional tones. Human evaluation studies show significant preference (p < 0.01) for HormoneT5-generated responses in terms of emotional appropriateness and empathetic quality compared to baseline T5 outputs. Our work opens new directions for biologically-grounded affective computing and emotionally intelligent conversational agents.

2605.13857 2026-05-15 cs.GR cs.CV cs.LG

MoZoo: Unleashing Video Diffusion power in animal fur and muscle simulation

Dongxia Liu, Jie Ma, Xiaochen Yang, Jiancheng Zhang, Bin Xia, Zhehan Kan, Nisha Huang, Jun Liang, Wenming Yang, Jin Li

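AI summary This paper proposes MoZoo, a method based on generative diffusion models for simulating animal fur and muscle dynamics, aimed at efficiently producing high-quality animal video effects. Using Role-Aware RoPE and Asymmetric Decoupled Attention, it generates high-fidelity video from coarse meshes, and it introduces the MoZoo-Data dataset and the MoZooBench benchmark to support training and evaluation. Experiments show that MoZoo maintains strong temporal and structural consistency and fur simulation quality across diverse animal skeletons and layouts.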

Comments GitHub page: https://dongxialiu15.github.io/MoZoo/

Details
English abstract

The creation of cinematic-quality animal effects necessitates the precise modeling of muscle and fur dynamics, a process that remains both labor-intensive and computationally expensive within traditional production workflows. While generative diffusion models have shown promise in diverse artistic workflows, their capacity for high-fidelity animal simulation remains largely unexploited. We present MoZoo, a generative dynamics solver that bypasses conventional refinement to synthesize high-fidelity animal videos from coarse meshes under multimodal guidance. We propose Role-Aware RoPE (RAR-RoPE) which employs role-based index remapping to synchronize motion alignment while decoupling reference information via fixed temporal offsets. Complementing this, Asymmetric Decoupled Attention partitions the latent sequence to enforce a unidirectional information flow, effectively preventing feature interference and improving computational efficiency. To address the scarcity of high-quality training data, we introduce MoZoo-Data, a synthetic-to-real pipeline that leverages a rendering engine and an inverse mapping approach to construct a large-scale dataset of paired sequences. Furthermore, we establish MoZooBench, a comprehensive benchmark with 120 mesh-video pairs. Experimental results demonstrate that MoZoo achieves high-fidelity fur simulation across diverse animal skeletons and layouts, preserving superior temporal and structural consistency.