arXivDaily: daily arXiv digest, updated Monday through Friday
All subject categories: 4033
2604.07558 2026-05-12 cs.HC cs.AI

Generative Experiences for Digital Mental Health Interventions: Evidence from a Randomized Study

Ananya Bhattacharjee, Michael Liut, Matthew Jörke, Diyi Yang, Emma Brunskill

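AI summary: This study explores how generative experiences can make digital mental health interventions more effective, proposing a paradigm in which personalized intervention content and multimodal interaction structure are generated dynamically at runtime. The authors build a system called GUIDE, which composes personalized content and interaction formats through rubric-guided generation of modular components, and evaluate it in a randomized controlled study with 237 participants. GUIDE outperformed an LLM-based cognitive restructuring control in reducing stress and improving user experience. The work offers new ideas and a practical foundation for dynamically shaping supportive experiences in digital settings.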

Abstract

Digital mental health (DMH) tools have extensively explored personalization of interventions to users' needs and contexts. However, this personalization often targets what support is provided, not how it is experienced. Even well-matched content can fail when the interaction format misaligns with how someone can engage. We introduce generative experience as a paradigm for DMH support, where the intervention experience is composed at runtime. We instantiate this in GUIDE, a system that generates personalized intervention content and multimodal interaction structure through rubric-guided generation of modular components. In a preregistered study with N = 237 participants, GUIDE significantly reduced stress (p = .02) and improved the user experience (p = .04) compared to an LLM-based cognitive restructuring control. GUIDE also supported diverse forms of reflection and action through varied interaction flows, while revealing tensions around personalization across the interaction sequence. This work lays the foundation for interventions that dynamically shape how support is experienced and enacted in digital settings.

2604.06518 2026-05-12 eess.IV cs.AI cs.CV

ADP-FL-MedSeg: Adaptive Differential Privacy for Federated Medical Segmentation Across Diverse Modalities

Puja Saha, Eranga Ukwatta

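AI summary: Privacy regulations and institutional constraints make medical data hard to pool centrally, and centrally trained models often fail to generalize across clinical sites because imaging protocols and data distributions differ. This paper proposes ADP-FL-MedSeg, an adaptive differentially private federated learning framework that dynamically adjusts the privacy mechanism to improve segmentation accuracy and training stability while preserving privacy. Experiments show that the method outperforms conventional federated learning and standard differentially private federated learning across multiple medical image segmentation tasks, delivering accurate, stable, privacy-preserving segmentation.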

Comments 10 pages, 8 figures. Accepted in SPIE Medical Imaging 2026. Recipient of CAD Best Paper Award: 1st Place, and Robert F. Wagner All-Conference Best Paper Award: Finalist

Journal ref
Proceedings Volume 13926, SPIE Medical Imaging 2026: Computer-Aided Diagnosis
Abstract

Large volumes of medical data remain underutilized because centralizing distributed data is often infeasible due to strict privacy regulations and institutional constraints. In addition, models trained in centralized settings frequently fail to generalize across clinical sites because of heterogeneity in imaging protocols and continuously evolving data distributions arising from differences in scanners, acquisition parameters, and patient populations. Federated learning offers a promising solution by enabling collaborative model training without sharing raw data. However, incorporating differential privacy into federated learning, while essential for privacy guarantees, often leads to degraded accuracy, unstable convergence, and reduced generalization. In this work, we propose an adaptive differentially private federated learning (ADP-FL) framework for medical image segmentation that dynamically adjusts privacy mechanisms to better balance the privacy-utility trade-off. The proposed approach stabilizes training, significantly improves Dice scores and segmentation boundary quality, and maintains rigorous privacy guarantees. We evaluated ADP-FL across diverse imaging modalities and segmentation tasks, including skin lesion segmentation in dermoscopic images, kidney tumor segmentation in 3D CT scans, and brain tumor segmentation in multi-parametric MRI. Compared with conventional federated learning and standard differentially private federated learning, ADP-FL consistently achieves higher accuracy, improved boundary delineation, faster convergence, and greater training stability, with performance approaching that of non-private federated learning under the same privacy budgets. These results demonstrate the practical viability of ADP-FL for high-performance, privacy-preserving medical image segmentation in real-world federated settings.
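A minimal sketch of the generic differentially private federated averaging round that ADP-FL builds on: each client update is clipped to a fixed L2 norm, the clipped updates are averaged, and Gaussian noise scaled to the clipping norm is added. The abstract does not say how ADP-FL adapts its privacy mechanism, so the decaying noise-multiplier schedule in `adaptive_noise_multiplier` and every parameter value below are hypothetical, and the sketch performs no privacy accounting.

```python
import numpy as np

def clip_update(update: np.ndarray, clip_norm: float) -> np.ndarray:
    """Scale a client update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def adaptive_noise_multiplier(round_idx: int, base: float = 1.2,
                              decay: float = 0.98, floor: float = 0.6) -> float:
    """Hypothetical schedule: shrink the noise multiplier each round, never below floor."""
    return max(floor, base * decay ** round_idx)

def dp_fedavg_round(global_w: np.ndarray, client_updates: list, clip_norm: float,
                    round_idx: int, rng: np.random.Generator) -> np.ndarray:
    """Clip each update, average, add Gaussian noise, and apply with server step size 1."""
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    mean_update = np.mean(clipped, axis=0)
    # Noise std = multiplier * clip_norm, divided by n because it is added to the average
    sigma = adaptive_noise_multiplier(round_idx) * clip_norm / len(client_updates)
    return global_w + mean_update + rng.normal(0.0, sigma, size=mean_update.shape)

rng = np.random.default_rng(0)
w = np.zeros(4)
for r in range(3):
    updates = [rng.normal(size=4) for _ in range(5)]  # stand-ins for local training results
    w = dp_fedavg_round(w, updates, clip_norm=1.0, round_idx=r, rng=rng)
```

Any real adaptive schedule would need to be paired with a privacy accountant so that the cumulative privacy budget stays within the stated guarantee.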

2604.02564 2026-05-12 eess.IV cs.CV

Why Invariance is Not Enough for Biomedical Domain Generalization and How to Fix It

Sebo Diaz, Polina Golland, Elfar Adalsteinsson, Neel Dey

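AI summary: This paper proposes MaskGen, a domain generalization method for 3D biomedical image segmentation that targets performance degradation under shifts in modality, disease severity, and clinical setting. By combining source-domain image intensities with domain-stable foundation model representations, it trains robust segmentation models at low implementation cost and performs well in both fully supervised and few-shot segmentation. Unlike existing methods, MaskGen does not depend on a specific network architecture or loss function, is compatible with standard data augmentation pipelines, is easy to implement, and applies to arbitrary anatomical regions.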

Comments Project GitHub https://github.com/sebodiaz/MaskGen

Abstract

We present MaskGen, a theoretically grounded and deliberately simple approach for domain generalization in 3D biomedical image segmentation. Modern segmentation models degrade sharply under shifts in modality, disease severity, clinical sites, and more, limiting their reliable adoption. Existing generalization methods address this using extreme augmentations, hand-engineered domain statistics mixing, or architectural redesigns that add significant implementation overhead while yielding inconsistent performance across biomedical settings. MaskGen instead presents a principled learning strategy with marginal overhead that utilizes both source-domain image intensities and domain-stable foundation model representations to train robust segmentation models. As a result, MaskGen achieves strong gains in both fully supervised and few-shot segmentation across broad clinical shifts in biomedical studies. Unlike prior approaches, MaskGen is architecture- and loss-agnostic, compatible with standard augmentation pipelines, easy to implement, and tackles arbitrary anatomical regions. Its implementation is freely available at https://github.com/sebodiaz/MaskGen.

2603.29632 2026-05-12 cs.MA cs.AI

An Empirical Study of Multi-Agent Collaboration for Automated Research

Yang Shen, Zhenyi Yi, Ziyi Zhao, Lijun Sun, Dongyang Li, Chin-Teng Lin, Yuhui Shi

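AI summary: As AI agents mature, the research community is moving from single large language models to multi-agent systems to overcome cognitive bottlenecks in automated research. Using a rigorously controlled testbed, this paper compares a single-agent baseline with two multi-agent architectures for automated machine learning optimization and reveals a fundamental trade-off between operational stability and theoretical depth. The subagent architecture suits efficient search under strict time limits, whereas the agent-team architecture better supports theoretically deep optimization of complex architectures when compute budgets are ample, offering guidance for the design of future automated research systems.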

Abstract

As AI agents evolve, the community is rapidly shifting from single Large Language Models (LLMs) to Multi-Agent Systems (MAS) to overcome cognitive bottlenecks in automated research. However, the optimal multi-agent coordination framework for these autonomous agents remains largely unexplored. In this paper, we present a systematic empirical study investigating the comparative efficacy of distinct multi-agent structures for automated machine learning optimization. Utilizing a rigorously controlled, execution-based testbed equipped with Git worktree isolation and explicit global memory, we benchmark a single-agent baseline against two multi-agent paradigms: a subagent architecture (parallel exploration with post-hoc consolidation) and an agent team architecture (experts with pre-execution handoffs). By evaluating these systems under strictly fixed computational time budgets, our findings reveal a fundamental trade-off between operational stability and theoretical deliberation. The subagent mode functions as a highly resilient, high-throughput search engine optimal for broad, shallow optimizations under strict time constraints. Conversely, the agent team topology exhibits higher operational fragility due to multi-author code generation but achieves the deep theoretical alignment necessary for complex architectural refactoring given extended compute budgets. These empirical insights provide actionable guidelines for designing future autoresearch systems, advocating for dynamically routed architectures that adapt their collaborative structures to real-time task complexity.

2603.23806 2026-05-12 cs.SE cs.AI

Willful Disobedience: Automatically Detecting Failures in Agentic Traces

Reshabh K Sharma, Shraddha Barke, Benjamin Zorn

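AI summary: As AI agents are increasingly embedded in real software systems, verifying that their behavior during execution complies with their instructions becomes more important. This paper presents AgentPex, a tool that automatically detects violations in agent executions: it extracts behavioral rules from agent prompts and system instructions and uses them to evaluate whether execution traces comply. Experiments show that AgentPex identifies procedural errors that are hard to detect from final outcomes alone and provides fine-grained analysis by domain and metric, helping developers understand agents' strengths and weaknesses.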

Comments Accepted at ACM CAIS 2026

Abstract

AI agents are increasingly embedded in real software systems, where they execute multi-step workflows through multi-turn dialogue, tool invocations, and intermediate decisions. These long execution histories, called agentic traces, make validation difficult. Outcome-only benchmarks can miss critical procedural failures, such as incorrect workflow routing, unsafe tool usage, or violations of prompt-specified rules. This paper presents AgentPex, an AI-powered tool designed to systematically evaluate agentic traces. AgentPex extracts behavioral rules from agent prompts and system instructions, then uses these specifications to automatically evaluate traces for compliance. We evaluate AgentPex on 424 traces from τ²-bench across models in telecom, retail, and airline customer service. Our results show that AgentPex distinguishes agent behavior across models and surfaces specification violations that are not captured by outcome-only scoring. It also provides fine-grained analysis by domain and metric, enabling developers to understand agent strengths and weaknesses at scale. The source code of AgentPex is available at https://github.com/microsoft/agentpex.
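The idea of checking a trace against prompt-derived rules, rather than only its final outcome, can be illustrated with a toy checker. In this sketch the rules are hand-written Python predicates and the trace is a list of step dictionaries; in AgentPex the rules are extracted from prompts by an LLM, and the `Rule` class, the `must_precede` template, the step schema, and the example rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Step = dict  # hypothetical step schema: {"role": ..., "tool": ..., "args": ...}

@dataclass
class Rule:
    name: str
    check: Callable[[list[Step]], bool]  # True if the whole trace complies

def must_precede(tool: str, prerequisite: str) -> Callable[[list[Step]], bool]:
    """Rule template: `tool` may only be called after `prerequisite` has been called."""
    def check(trace: list[Step]) -> bool:
        seen_prerequisite = False
        for step in trace:
            if step.get("tool") == prerequisite:
                seen_prerequisite = True
            elif step.get("tool") == tool and not seen_prerequisite:
                return False
        return True
    return check

def audit(trace: list[Step], rules: list[Rule]) -> list[str]:
    """Names of the rules that the trace violates."""
    return [rule.name for rule in rules if not rule.check(trace)]

# Hypothetical rule of the kind a customer-service system prompt might state
rules = [Rule("verify identity before refund", must_precede("issue_refund", "verify_identity"))]
bad_trace = [{"role": "agent", "tool": "issue_refund", "args": {"order": "A1"}}]
good_trace = [{"role": "agent", "tool": "verify_identity", "args": {}},
              {"role": "agent", "tool": "issue_refund", "args": {"order": "A1"}}]
print(audit(bad_trace, rules))   # ['verify identity before refund']
print(audit(good_trace, rules))  # []
```

Both traces end with a refund, so an outcome-only check would score them identically; only the trace-level rule separates them.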

2603.16231 2026-05-12 math.OC cs.RO cs.SY eess.SY

Featurized Occupation Measures for Structured Global Search in Numerical Optimal Control

Qi Wei, Jianfeng Tao, Haoyang Tan, Hongyu Nie

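AI summary: This paper proposes Featurized Occupation Measures (FOM), a method for resolving the tension between global structure and computational scalability in numerical optimal control. It builds a finite-dimensional primal–dual interface that couples numerical solvers with explicit Hamilton–Jacobi–Bellman (HJB) subsolutions, enabling global search while remaining computationally efficient. The work also shows how, for high-dimensional problems, the framework shifts the curse of dimensionality from the state space to the interconnection topology, and experiments on a static obstacle-avoidance benchmark confirm that it guides the optimizer toward global optima.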

Abstract

Numerical optimal control has long been split between globally structured but dimensionally intractable Hamilton–Jacobi–Bellman (HJB) methods and scalable but local trajectory optimization. We introduce Featurized Occupation Measures (FOM), a finite-dimensional primal–dual interface for coupling numerical optimal control solvers with explicit HJB subsolutions: the certificate guides the primal search, while primal residuals tighten the certificate in a primal–dual language. Two realizations are developed. The explicit realization uses finite weak-form Liouville tests, and the implicit realization couples rollout-based search with sampled primal–dual residuals. Both are proved asymptotically consistent with the exact occupation-measure linear program under refinement, separating primal expressiveness from dual accuracy in the limit. The framework also gives structural conditions under which HJB-type certificates avoid full state-space representation. For factor graphs induced by compatible passivity-based interconnections, blockwise HJB inequalities assemble into globally feasible OM-dual certificates, and the decomposition is preserved under blockwise approximation. The curse of dimensionality is then shifted from state space to interconnection topology. Approximate certificates remain reusable under time shifts and bounded model perturbations, with explicit degradation bounds. On a static obstacle-avoidance benchmark, certificates of increasing tightness guide a sample-based optimizer toward global optima, confirming that even a coarse certificate carries useful global information.

2603.13536 2026-05-12 quant-ph cs.LG

Active Sampling Sample-based Quantum Diagonalization from Finite-Shot Measurements

Rinka Miura

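AI summary: This paper proposes Active Sampling Sample-based Quantum Diagonalization (AS-SQD) for efficiently estimating the ground-state energy of quantum systems from finite-shot measurements. The method treats quantum diagonalization as an active learning problem and introduces a perturbation-theory-based acquisition function that dynamically selects the basis states most valuable for expanding the current subspace, reducing the effects of finite-shot bias and excited-state contamination. Experiments show that AS-SQD estimates ground-state energies more accurately than conventional approaches across several quantum systems and remains robust on real quantum hardware.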

Comments 7 pages, 5 figures

Journal ref
IEEE International Conference on Quantum Communications, Networking, and Computing (QCNC 2026)
Abstract

Near-term quantum devices provide only finite-shot measurements and prepare imperfect, contaminated states. This motivates algorithms that convert samples into reliable low-energy estimates without full tomography or exhaustive measurements. We propose Active Sampling Sample-based Quantum Diagonalization (AS-SQD), framing SQD as an active learning problem: given measured bitstrings, which additional basis states should be included to efficiently recover the ground-state energy? SQD restricts the Hamiltonian to a selected set of basis states and classically diagonalizes the restricted matrix. However, naive SQD using only sampled states suffers from bias under finite-shot sampling and excited-state contamination, while blind random expansion is inefficient as system size grows. We introduce a perturbation-theoretic acquisition function based on Epstein–Nesbet second-order energy corrections to rank candidate basis states connected to the current subspace. At each iteration, AS-SQD diagonalizes the restricted Hamiltonian, generates connected candidates, and adds the most valuable ones according to this score. We evaluate AS-SQD on disordered Heisenberg and Transverse-Field Ising (TFIM) spin chains up to 16 qubits under a preparation model mixing 80% ground state and 20% first excited state. Furthermore, we validate its robustness against real-world state preparation and measurement (SPAM) errors using physical samples from an IBM Quantum processor. Across simulated and hardware evaluations, AS-SQD consistently achieves substantially lower absolute energy errors than standard SQD and random expansion. Detailed ablation studies demonstrate that physics-guided basis acquisition effectively concentrates computation on energetically relevant directions, bypassing exponential combinatorial bottlenecks.
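The acquisition loop can be illustrated on a small dense matrix. For a candidate basis state |x⟩ outside the current subspace, with restricted ground state |ψ⟩ and energy E, the Epstein–Nesbet second-order contribution is |⟨x|H|ψ⟩|² / (E - H_xx); the sketch ranks candidates by its magnitude and adds the top few per iteration. A random real symmetric matrix stands in for a spin-chain Hamiltonian, and `as_sqd`, `add_per_iter`, and the dense representation are assumptions for illustration; a practical implementation would work with sampled bitstrings and sparse matrix elements.

```python
import numpy as np

def restricted_ground_state(H: np.ndarray, subspace: list[int]):
    """Diagonalize H restricted to the selected basis states (lowest eigenpair)."""
    Hs = H[np.ix_(subspace, subspace)]
    evals, evecs = np.linalg.eigh(Hs)
    return evals[0], evecs[:, 0]

def en2_scores(H: np.ndarray, subspace: list[int], energy: float,
               coeffs: np.ndarray) -> dict:
    """|Epstein-Nesbet second-order correction| for each connected candidate state."""
    in_sub = set(subspace)
    couplings = H[:, subspace] @ coeffs  # <x|H|psi> for every basis state x
    scores = {}
    for x in range(H.shape[0]):
        if x in in_sub or abs(couplings[x]) < 1e-12:
            continue  # already included, or not connected to psi
        denom = energy - H[x, x]
        scores[x] = abs(couplings[x]) ** 2 / max(abs(denom), 1e-12)
    return scores

def as_sqd(H: np.ndarray, initial: list[int], n_iters: int = 5, add_per_iter: int = 2):
    """Grow the subspace with the highest-scoring candidates; return energy and subspace."""
    subspace = list(initial)
    for _ in range(n_iters):
        energy, coeffs = restricted_ground_state(H, subspace)
        scores = en2_scores(H, subspace, energy, coeffs)
        if not scores:
            break  # no connected candidates left
        best = sorted(scores, key=scores.get, reverse=True)[:add_per_iter]
        subspace.extend(best)
    return restricted_ground_state(H, subspace)[0], subspace

# Hypothetical stand-in for a Hamiltonian: a random 12x12 real symmetric matrix
rng = np.random.default_rng(1)
A = rng.normal(size=(12, 12))
H = (A + A.T) / 2
energy, subspace = as_sqd(H, initial=[0], n_iters=3)  # state 0 plays the "sampled" state
print(energy, np.linalg.eigvalsh(H)[0], len(subspace))
```

Because the lowest eigenvalue of a restricted matrix is an upper bound on the exact ground-state energy, enlarging the subspace can only lower or keep the estimate; the acquisition score decides which additions lower it fastest.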

2603.10051 2026-05-12 cs.NI cs.AI cs.CR cs.LG

Where Do Flow Semantics Reside? A Protocol-Native Tabular Pretraining Paradigm for Encrypted Traffic Classification

Sizhe Huang, Zitong Li, Shujie Yang

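AI summary: This paper studies how to classify encrypted network traffic more effectively, noting that current self-supervised masked-modeling methods still fail to reduce reliance on labeled data. The authors argue that the root cause is that flattening traffic into byte sequences destroys the protocol-defined semantic structure, causing loss of semantic information and embedding confusion. They propose a protocol-native tabular pretraining paradigm and introduce FlowSem-MAE, which preserves the semantic structure of protocol fields and substantially improves encrypted traffic classification.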

Abstract

Self-supervised masked modeling shows promise for encrypted traffic classification by masking and reconstructing raw bytes. Yet recent work reveals that these methods fail to reduce reliance on labeled data despite costly pretraining: under frozen-encoder evaluation, accuracy drops from above 0.9 to below 0.47. We argue that the root cause is an inductive bias mismatch: flattening traffic into byte sequences destroys protocol-defined semantics. We identify three specific issues: (1) field unpredictability, where random fields such as ip.id are unlearnable yet treated as reconstruction targets; (2) embedding confusion, where semantically distinct fields collapse into a unified embedding space; and (3) metadata loss, where capture-time metadata essential for temporal analysis is discarded. To address this, we propose a protocol-native paradigm that treats protocol-defined field semantics as architectural priors, reformulating the task to align with the data's intrinsic tabular modality rather than incrementally adapting sequence-based architectures. Instantiating this paradigm, we introduce FlowSem-MAE, a tabular masked autoencoder built on Flow Semantic Units (FSUs). It features predictability-guided filtering that focuses on learnable FSUs, FSU-specific embeddings to preserve field boundaries, and dual-axis attention to capture intra-packet and temporal patterns. FlowSem-MAE significantly outperforms state-of-the-art methods across datasets. With only half of the labeled data, it outperforms most existing methods trained on the full data.
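One way to picture predictability-guided filtering is to score each protocol field by the normalized entropy of its observed values and drop near-uniform fields (such as `ip.id`) from the reconstruction targets. The abstract does not say how FlowSem-MAE measures predictability, so entropy as the measure, the 0.9 threshold, the function names, and the toy field values are all assumptions for illustration.

```python
import math
from collections import Counter

def normalized_entropy(values: list) -> float:
    """Empirical Shannon entropy of a field's values, divided by log(n) so it lies in [0, 1]."""
    n = len(values)
    counts = Counter(values)
    if n < 2 or len(counts) == 1:
        return 0.0  # a constant field is perfectly predictable
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)

def learnable_fields(table: dict, threshold: float = 0.9) -> list:
    """Keep fields whose normalized entropy is below a (hypothetical) threshold."""
    return [name for name, vals in table.items() if normalized_entropy(vals) < threshold]

# Hypothetical per-packet values of three header fields
flows = {
    "ip.id": [51234, 1209, 40211, 777, 65001, 3120, 18000, 9],  # effectively random
    "ip.ttl": [64, 64, 64, 64, 128, 128, 64, 64],
    "tcp.flags": ["SYN", "ACK", "ACK", "PSH|ACK", "ACK", "ACK", "FIN|ACK", "ACK"],
}
print(learnable_fields(flows))  # ['ip.ttl', 'tcp.flags']
```

Marginal entropy is a crude proxy: a field can look high-entropy on its own yet be predictable from neighboring fields, which is one reason a learned model might prefer a context-aware criterion.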

2603.05653 2026-05-12 cs.CY cs.AI cs.IR cs.SI

The DSA's Blind Spot: Algorithmic Audit of Advertising and Minor Profiling on TikTok

Sara Solarova, Matej Mosnar, Matus Tibensky, Jan Jakubcik, Adrian Bindas, Simon Liska, Filip Hossner, Matúš Mesarčík, Ivan Srba

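AI summary: This study conducts an algorithmic audit of advertising and the profiling of minors on TikTok, exposing a blind spot in the Digital Services Act (DSA) regarding the protection of minors from algorithmically targeted advertising. Using simulated minor and adult user accounts, it finds that although TikTok formally complies with the DSA, minors are still exposed to commercial content closely tailored to their interests, with profiling far stronger than for formal advertisements shown to adults. The study argues that the law's narrow definition of "advertisement" does not cover influencer partnerships and brand promotions, leaving a regulatory gap, and calls for broadening the definition of advertising and prohibiting profiling-based targeting of minors.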

Comments In The 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT'26), June 25-28, 2026, Montreal, QC, Canada. ACM

Abstract

Adolescents spend an increasing amount of their time in digital environments where their still-developing cognitive capacities leave them unable to recognize or resist commercial persuasion. Article 28(2) of the DSA responds to this vulnerability by prohibiting profiling-based advertising to minors. However, the regulation's narrow definition of "advertisement" excludes current advertising practices including influencer paid partnerships and brand promotional content that serve functionally equivalent commercial purposes. We provide the first empirical evidence of how this definitional gap operates in practice through an algorithmic audit of TikTok. Our approach deploys sock-puppet accounts simulating a pair of minor and adult users with matching interest profiles. The content recommended to these users is automatically annotated, enabling systematic statistical analysis. Our findings reveal a stark regulatory paradox. TikTok demonstrates formal compliance with Article 28(2) by shielding minors from profiled formal advertisements, yet both disclosed and undisclosed ads exhibit significant profiling aligned with user interests (5-8 times stronger than for adult formal advertising). The strongest profiling emerges within undisclosed commercial content, where creators/brands fail to label paid partnership/promotional content and the platform neither corrects this omission nor prevents its personalized delivery to minors. These results demonstrate that minors remain exposed to algorithmically targeted commercial content through the same recommendation mechanisms the DSA seeks to constrain. We argue that protecting minors requires expanding the definition of advertisement in EU law to encompass influencer and brand promotional content, and ensuring that any such expansion is accompanied by a corresponding prohibition on profiling-based targeting of minors.

2603.03971 2026-05-12 cs.CY cs.AI cs.LG cs.LO

Upholding Epistemic Agency: A Brouwerian Assertibility Constraint for Responsible AI

Michael Jülich

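AI summary: This paper examines how generative AI in high-stakes domains can undermine democratic epistemic agency and proposes a Brouwer-inspired assertibility constraint: when a system cannot provide a publicly inspectable and contestable certificate, it must return "Undetermined" rather than asserting or denying a claim outright. The approach introduces an interface semantics with three statuses, "Asserted", "Denied", and "Undetermined", which mark the system's entitlement to assert rather than the truth of the underlying claim. By gating threshold and argmax outputs at the decision layer, it ensures that system outputs rest on challengeable grounds, preserving epistemic agency and preventing automated speech from exerting undue influence on public justification.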

Comments Preprint. 64 pages, 5 figures, 2 tables

Abstract

Generative AI can convert uncertainty into hypersuasive, authoritative-seeming verdicts, displacing the justificatory work on which democratic epistemic agency depends. As a corrective, I propose a Brouwer-inspired assertibility constraint for responsible AI: in high-stakes domains, systems may assert or deny claims only if they can provide a publicly inspectable and contestable certificate of entitlement; otherwise they must return Undetermined. This constraint yields a three-status interface semantics (Asserted, Denied, Undetermined) in which statuses mark entitlement to categorical speech rather than truth values of the underlying world-claim. The semantics cleanly separates internal entitlement from public standing while connecting them via the certificate as a boundary object. It also produces a time-indexed entitlement profile that is stable under numerical refinement yet revisable as the public record changes. I operationalize the constraint through decision-layer gating of threshold and argmax outputs, using internal witnesses (e.g., sound bounds or separation margins) and an output contract with reason-coded abstentions. A design lemma shows that any total, certificate-sound binary interface already decides the deployed predicate on its declared scope, so Undetermined is not a tunable reject option but a mandatory status whenever no adequate forcing witness is available. By making outputs answerable to challengeable warrants rather than confidence alone, the paper aims to preserve epistemic agency against the hypersuasive pull of automated speech in public justification.
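Decision-layer gating of a threshold predicate can be sketched as follows: the system asserts "score ≥ t" only when a sound lower bound on the score clears the threshold, denies it only when a sound upper bound falls below it, and otherwise returns Undetermined with a reason code; argmax outputs are certified only when the top label's lead exceeds the combined error of the two top scores. Representing the internal witness as an interval `[lo, hi]` or a symmetric error bound, and the names `Status`, `Verdict`, `gate_threshold`, `gate_argmax`, and the reason strings, are assumptions made for this sketch, not the paper's interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):
    ASSERTED = "Asserted"
    DENIED = "Denied"
    UNDETERMINED = "Undetermined"

@dataclass(frozen=True)
class Verdict:
    status: Status
    reason: str  # hypothetical reason code, so abstentions are explained rather than silent

def gate_threshold(lo: float, hi: float, t: float) -> Verdict:
    """Gate the claim 'score >= t' given a sound enclosure lo <= score <= hi."""
    if lo > hi:
        raise ValueError("invalid enclosure: lo > hi")
    if lo >= t:
        return Verdict(Status.ASSERTED, "lower_bound_clears_threshold")
    if hi < t:
        return Verdict(Status.DENIED, "upper_bound_below_threshold")
    return Verdict(Status.UNDETERMINED, "enclosure_straddles_threshold")

def gate_argmax(scores: dict[str, float], error: float) -> tuple[Optional[str], Verdict]:
    """Certify the top label only if its lead exceeds twice the per-score error bound.

    Requires at least two labels; each score is assumed accurate to within +/- error,
    so two scores can be off by `error` in opposite directions.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, s1), (_, s2) = ranked[0], ranked[1]
    if s1 - s2 > 2 * error:
        return top, Verdict(Status.ASSERTED, "separation_margin_exceeds_error")
    return None, Verdict(Status.UNDETERMINED, "margin_within_error")

print(gate_threshold(0.72, 0.80, 0.7).status)  # Status.ASSERTED
print(gate_threshold(0.65, 0.75, 0.7).status)  # Status.UNDETERMINED
```

In this sketch Undetermined is not a tunable reject option: whenever the enclosure straddles the threshold, no other status is sound, which mirrors the design lemma stated in the abstract.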

2602.01022 2026-05-12 econ.GN cs.AI q-fin.EC

Calibrating Behavioral Parameters with Large Language Models

Brandon Yee, Pairie Koh

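AI summary: This paper studies how to use large language models (LLMs) to calibrate behavioral parameters such as loss aversion, herding, and extrapolation, which are central to asset pricing models but hard to measure accurately. The authors build a framework that treats LLMs as calibrated measurement instruments; extensive experiments reveal systematic rationality bias in baseline LLM behavior, and profile-based calibration substantially improves the plausibility and stability of the resulting behavioral parameters. The study also validates the calibrated parameters in an agent-based asset pricing model and reports measurement ranges and calibration functions for eight canonical behavioral biases.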

Abstract

Behavioral parameters such as loss aversion, herding, and extrapolation are central to asset pricing models but remain difficult to measure reliably. We develop a framework that treats large language models (LLMs) as calibrated measurement instruments for behavioral parameters. Using four models and 24,000 agent–scenario pairs, we document systematic rationality bias in baseline LLM behavior, including attenuated loss aversion, weak herding, and near-zero disposition effects relative to human benchmarks. Profile-based calibration induces large, stable, and theoretically coherent shifts in several parameters, with calibrated loss aversion, herding, extrapolation, and anchoring reaching or exceeding benchmark magnitudes. To assess external validity, we embed calibrated parameters in an agent-based asset pricing model, where calibrated extrapolation generates short-horizon momentum and long-horizon reversal patterns consistent with empirical evidence. Our results establish measurement ranges, calibration functions, and explicit boundaries for eight canonical behavioral biases.

2601.22638 2026-05-12 cs.MA cs.AI cs.LG

ScholarPeer: A Context-Aware Multi-Agent Framework for Automated Peer Review

Palash Goyal, Mihir Parmar, Yiwen Song, Hamid Palangi, Tomas Pfister, Jinsung Yoon

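AI summary: With the surge in machine learning submissions, the traditional peer review process suffers from slow feedback and a growing reviewer burden. This work proposes ScholarPeer, a multi-agent framework that helps reviewers audit technical rigor and helps authors iterate quickly before submission. By separating contextualization from critique, and introducing domain-history analysis, a search for omitted state-of-the-art comparisons, and a multi-aspect Q&A engine, it improves the depth and efficiency of review. Experiments show that ScholarPeer outperforms existing state-of-the-art models on ICLR submissions.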

Abstract

The exponential growth of machine learning submissions has strained the traditional peer review process, resulting in slow feedback loops for authors and an immense burden on reviewers to rigorously audit technical soundness and verify literature. To address this, we introduce ScholarPeer, a multi-agent framework designed to operationalize the rigorous auditing workflow of a senior researcher. Rather than attempting to replace human judgment, ScholarPeer serves as a co-scientist: acting as a mentor for rapid author iteration prior to submission, and as an active verification assistant that augments human reviewers. The framework structurally decouples contextualization from critique by deploying a sub-domain historian to synthesize the field's trajectory, a baseline scout to proactively hunt for omitted state-of-the-art comparisons, and a multi-aspect Q&A engine that deeply audits technical soundness (scrutinizing internal logical consistency, experimental validity, and mathematical rigor) while cross-referencing claims against top-tier academic venues. We comprehensively evaluate ScholarPeer on ~1,800 ICLR submissions spanning 2020 through 2025. Our results show that ScholarPeer achieves significant win-rates against state-of-the-art fine-tuned models and search-augmented agentic baselines.

2601.21410 2026-05-12 stat.ML cs.LG

Learning When to Trust LLM Priors: A Validated Framework for Semantic Prior Integration

Erica Zhang, Naomi Sagan, Danny Tse, Fangzhao Zhang, Mert Pilanci, Jose Blanchet

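AI summary: This study examines how to reliably use the semantic prior knowledge of large language models (LLMs) in supervised learning. The authors propose Statsformer, a validated framework that learns when to trust LLM-derived semantic priors and integrates them into different types of predictive models. Using out-of-fold validation, Statsformer automatically adjusts how much weight each prior-informed model receives, improving predictive performance while suppressing unreliable prior signals, and offers a reliability-oriented approach to LLM-assisted statistical learning.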

Abstract

Large language models (LLMs) encode rich semantic knowledge that can be useful for supervised learning, but their outputs are unreliable as statistical priors: they may be noisy, misspecified, or hallucinated. Existing LLM-informed learning methods either trust such signals directly, leaving predictions vulnerable to unreliable LLM guidance, or restrict semantic integration to a single model class. We introduce Statsformer, a validated framework for learning when to trust LLM-derived semantic priors in supervised statistical learning. Statsformer maps LLM-derived feature scores into a family of learner-specific prior-injection mechanisms across a heterogeneous library of linear and nonlinear predictors. It then uses out-of-fold validation to adaptively calibrate the influence of each prior-informed learner, allowing useful semantic information to improve prediction while attenuating weak, misspecified, or adversarial priors. This yields a guardrailed statistical learning system with an oracle-style guarantee: up to statistical error, the final predictor performs no worse than the best convex combination of its in-library candidates, including prior-free learners. Across diverse prediction tasks, informative LLM priors improve performance, while unreliable priors are automatically downweighted. These results position Statsformer as a reliability-oriented approach to LLM-informed statistical learning: rather than trusting LLM knowledge directly, it validates semantic priors against data before allowing them to influence the final predictor.
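The final aggregation step, choosing a convex combination of candidate learners from their out-of-fold predictions, can be sketched with projected gradient descent on the probability simplex. This is a generic stacking routine under squared-error loss, not Statsformer's implementation; the step size, iteration count, function names, and toy data are assumptions. A prior-free learner would simply be one more column, so a misleading prior can be driven toward zero weight.

```python
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def convex_stack(oof_preds: np.ndarray, y: np.ndarray, steps: int = 2000) -> np.ndarray:
    """Simplex weights minimizing the mean squared error of oof_preds @ w."""
    n, m = oof_preds.shape
    w = np.full(m, 1.0 / m)
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    lipschitz = 2.0 * np.linalg.norm(oof_preds, 2) ** 2 / n
    for _ in range(steps):
        grad = 2.0 * oof_preds.T @ (oof_preds @ w - y) / n
        w = project_to_simplex(w - grad / lipschitz)
    return w

# Hypothetical out-of-fold predictions from two prior-informed learners
rng = np.random.default_rng(0)
y = rng.normal(size=200)
informative = y + 0.1 * rng.normal(size=200)  # learner whose LLM prior was useful
misleading = rng.normal(size=200)             # learner driven by a misspecified prior
weights = convex_stack(np.column_stack([informative, misleading]), y)
print(weights.round(2))
```

Fitting the weights on out-of-fold rather than in-sample predictions is what lets the data, not the prior, decide how much influence each learner gets.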

2601.20251 2026-05-12 stat.ML cs.LG

Efficient Evaluation of LLM Performance with Statistical Guarantees

Skyler Wu, Yash Nair, Emmanuel J. Candès

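AI summary: This paper studies how to evaluate many large language models efficiently and accurately under a limited query budget. It proposes Factorized Active Querying (FAQ), which combines a Bayesian factor model, an adaptive sampling policy, and finite-population active inference to reduce the number of evaluation queries needed while maintaining valid statistical confidence. Experiments on two benchmark suites show that FAQ achieves up to 5× larger effective sample size than strong baselines, substantially improving evaluation efficiency.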

Comments 27 pages, 12 figures

Abstract

Exhaustively evaluating many large language models (LLMs) on a large suite of benchmarks is expensive. We cast benchmarking as finite-population inference and, under a fixed query budget, seek tight confidence intervals (CIs) for model accuracy with valid frequentist coverage. We propose Factorized Active Querying (FAQ), which (a) leverages historical information through a Bayesian factor model; (b) adaptively selects questions using a hybrid variance-reduction/active-learning sampling policy; and (c) maintains validity through Proactive Active Inference, a finite-population extension of active inference (Zrnic & Candès, 2024) that enables direct question selection while preserving coverage. With negligible overhead cost, FAQ delivers up to 5× effective sample size gains over strong baselines on two benchmark suites, across varying levels of historical-data missingness: it matches the CI width of uniform sampling while using up to 5× fewer queries. We release our source code and our curated datasets to support reproducible evaluation and future research.
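The core of active inference, which FAQ extends to finite populations, is an inverse-probability-weighted correction: every question gets a model-based prediction f_i of correctness, question i is actually queried with known probability π_i (indicator ξ_i), and the estimate (1/N) Σ_i [f_i + (y_i - f_i) ξ_i / π_i] stays unbiased for the true mean accuracy whatever the predictions are. The sketch shows only this estimator on a fixed benchmark with hypothetical predictions and sampling probabilities; FAQ's Bayesian factor model, hybrid sampling policy, and confidence-interval construction are not reproduced, and `active_accuracy_estimate` is a hypothetical name.

```python
import numpy as np

def active_accuracy_estimate(preds, probs, query_fn, rng):
    """Unbiased estimate of mean correctness over a fixed set of N questions.

    preds:    predicted probability f_i that question i is answered correctly
    probs:    sampling probability pi_i > 0 of actually querying question i
    query_fn: returns the true 0/1 correctness y_i of question i (the expensive call)
    Returns (estimate, number of questions queried).
    """
    preds = np.asarray(preds, dtype=float)
    probs = np.asarray(probs, dtype=float)
    queried = rng.random(len(preds)) < probs  # xi_i ~ Bernoulli(pi_i)
    correction = np.zeros_like(preds)
    for i in np.flatnonzero(queried):
        # Inverse-probability-weighted residual keeps the estimator unbiased
        correction[i] = (query_fn(i) - preds[i]) / probs[i]
    return float(np.mean(preds + correction)), int(queried.sum())

# Hypothetical usage: query uncertain questions more often
rng = np.random.default_rng(0)
truth = (rng.random(1000) < 0.7).astype(float)
preds = np.clip(0.2 + 0.6 * truth + 0.1 * rng.normal(size=1000), 0.01, 0.99)
probs = np.clip(0.05 + np.sqrt(preds * (1 - preds)), 0.05, 1.0)
estimate, n_queries = active_accuracy_estimate(preds, probs, lambda i: truth[i], rng)
print(round(estimate, 3), round(truth.mean(), 3), n_queries)
```

Better predictions shrink the residuals y_i - f_i and therefore the variance, which is where a historical-data model can buy effective sample size without affecting unbiasedness.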

2512.20012 2026-05-12 eess.SP cs.LG

Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems

Qiushuo Hou, Sangwoo Park, Matteo Zecchin, Yunlong Cai, Guanding Yu, Osvaldo Simeone, Tommaso Melodia

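AI summary: This paper studies an edge-cloud-expert cascaded knowledge system based on large language models (LLMs) for automated decision support in telecommunications. The system supports decisions through a question-answering pipeline in which an edge model handles routine queries, a cloud model handles complex ones, and human experts are involved only when necessary. The authors propose a threshold selection method based on multiple hypothesis testing that minimizes processing cost while guaranteeing that automated answers align with expert judgments, and demonstrate its cost efficiency and reliability on the telecom-specific TeleQnA dataset.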

Comments This paper has been submitted to a journal

Abstract

Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications, assisting with tasks including troubleshooting, standards interpretation, and network optimization. However, their deployment in practice must balance inference cost, latency, and reliability. In this work, we study an edge-cloud-expert cascaded LLM-based knowledge system that supports decision-making through a question-and-answer pipeline. In it, an efficient edge model handles routine queries, a more capable cloud model addresses complex cases, and human experts are involved only when necessary. We define a misalignment-cost constrained optimization problem, aiming to minimize average processing cost, while guaranteeing alignment of automated answers with expert judgments. We propose a statistically rigorous threshold selection method based on multiple hypothesis testing (MHT) for a query processing mechanism based on knowledge and confidence tests. The approach provides finite-sample guarantees on misalignment risk. Experiments on the TeleQnA dataset, a telecom-specific benchmark, demonstrate that the proposed method achieves superior cost-efficiency compared to conventional cascaded baselines, while ensuring reliability at prescribed confidence levels.
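Threshold selection with a finite-sample risk guarantee via multiple hypothesis testing can be illustrated with a Learn-then-Test-style procedure. For each candidate confidence threshold, test the null hypothesis that its misalignment risk exceeds a target α using a Hoeffding p-value; scan thresholds from most to least conservative (fixed-sequence testing); and keep the lowest, cheapest threshold that is still certified. Collapsing the cascade to a single edge-versus-expert decision on one confidence score, the Hoeffding p-value, the fixed-sequence order, and the names below are simplifying assumptions; the paper's knowledge and confidence tests and its exact MHT procedure may differ.

```python
import math

def hoeffding_pvalue(risk_hat: float, alpha: float, n: int) -> float:
    """p-value for H0: true risk > alpha, for losses bounded in [0, 1]."""
    if risk_hat >= alpha:
        return 1.0
    return math.exp(-2.0 * n * (alpha - risk_hat) ** 2)

def select_threshold(conf, aligned, thresholds, alpha=0.1, delta=0.05):
    """Lowest confidence threshold certified by fixed-sequence testing at level delta.

    A lower threshold lets the edge model answer more queries (cheaper) but
    admits more misalignment. Returns None if no threshold is certified,
    meaning every query should be escalated.
    """
    n = len(conf)
    certified = None
    for lam in sorted(thresholds, reverse=True):  # most conservative first
        # Misalignment: the edge model answers (conf >= lam) but disagrees with the expert
        losses = [1.0 if c >= lam and not a else 0.0 for c, a in zip(conf, aligned)]
        if hoeffding_pvalue(sum(losses) / n, alpha, n) > delta:
            break  # the first non-rejection ends the fixed sequence
        certified = lam
    return certified

# Hypothetical calibration set: edge answers agree with the expert when confidence >= 0.5
conf = [i / 500 for i in range(500)]
aligned = [c >= 0.5 for c in conf]
print(select_threshold(conf, aligned, [0.0, 0.25, 0.5, 0.75]))  # 0.5
```

Fixed-sequence testing needs no multiplicity correction because each hypothesis is tested only if all earlier ones were rejected, which is why the scan order (from safest to cheapest) matters.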

2512.11077 2026-05-12 cond-mat.mtrl-sci cs.AI

A probabilistic framework for crystal structure denoising, phase classification, and order parameters

Hyuna Kwon, Babak Sadigh, Sebastien Hamel, Vincenzo Lordi, John Klepeis, Fei Zhou

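AI summary: This study proposes a unified probabilistic framework for denoising noisy atomic configurations, classifying crystal phases, and computing order parameters. The model predicts a logit for each atom with respect to each crystal prototype and aggregates these into a scalar log-probability landscape, from which it derives a denoising field as well as local phase labels, order parameters, and ambiguity measures. The model generalizes well across a range of noise and defect conditions, providing an integrated, extensible tool for analyzing complex atomistic simulations.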

Abstract

Atomistic simulations generate large volumes of noisy structural data, yet extracting phase labels and continuous order parameters (OPs) in a robust and general manner remains challenging. Existing tools are often specialized to a limited set of prototypes and split thermal-noise removal, phase classification, and OP construction into separate steps. Here we present a unified probabilistic framework for analyzing noisy atomic configurations with respect to known crystal prototypes. The model predicts per-atom, per-prototype logits and aggregates them into a scalar log-probability (logP) landscape over atomic coordinates. Its gradient defines a conservative denoising field, while the logits provide local phase labels, prototype-resolved OPs, and ambiguity measures through logit margins. We train on AFLOW-mapped crystalline structures from the Materials Project with synthetic positional and elastic perturbations, then test extrapolation to stronger noise, finite-temperature disorder, point defects, water–ice coexistence, binary polymorphs, and shock-compressed Ti. A single differentiable scalar model recovers prototype identity after denoising, tracks smooth transformations such as Bain and Burgers paths, and exposes low-confidence regions near defects and phase boundaries. This provides an integrated and extensible tool for analyzing complex atomistic simulations.
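Reading local phase labels and ambiguity off per-atom, per-prototype logits is simple once a model has produced them. The sketch takes a logit array of shape `(n_atoms, n_prototypes)` as given, assigns each atom its argmax prototype, computes per-atom softmax probabilities, and flags atoms whose top-two logit margin falls below a cutoff. The prototype names, the cutoff, and the toy logits are assumptions; the learned model and its logP landscape are not reproduced.

```python
import numpy as np

PROTOTYPES = ["fcc", "bcc", "hcp"]  # hypothetical prototype set

def analyze_logits(logits: np.ndarray, margin_cutoff: float = 1.0):
    """Per-atom labels, softmax probabilities, top-two margins, and low-confidence flags."""
    labels = np.argmax(logits, axis=1)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    top2 = np.sort(logits, axis=1)[:, -2:]
    margins = top2[:, 1] - top2[:, 0]  # top-1 minus top-2 logit
    ambiguous = margins < margin_cutoff
    return labels, probs, margins, ambiguous

logits = np.array([[4.0, 0.5, 0.2],    # clearly fcc
                   [2.1, 1.9, -1.0],   # near an fcc/bcc boundary
                   [0.0, 0.3, 3.5]])   # clearly hcp
labels, probs, margins, ambiguous = analyze_logits(logits)
print([PROTOTYPES[i] for i in labels])  # ['fcc', 'fcc', 'hcp']
print(ambiguous.tolist())               # [False, True, False]
```

A small margin means the model sees the local environment as nearly equally compatible with two prototypes, which is the kind of low-confidence region the abstract reports near defects and phase boundaries.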

2511.21600 2026-05-12 cs.CR cs.LG

Robust Spectral Watermark for Synthetic Tabular Data

Yizhou Zhao, Xiang Li, Peter Song, Qi Long, Weijie Su

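AI summary: With advances in generative AI, synthetic tabular data are widely used in fields such as healthcare and finance, but concerns about data provenance and potential misuse are growing. To address this, the paper proposes TAB-DRW, an efficient and robust frequency-domain watermarking method that embeds watermark signals by normalizing heterogeneous features, applying the discrete Fourier transform, and adjusting the imaginary parts of selected entries; it also introduces a rank-based pseudorandom bit generation method to improve robustness and efficiency. Experiments show that TAB-DRW preserves high data fidelity while resisting a variety of post-processing attacks and supports mixed-type data.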

Comments Accepted to Statistical Learning and Data Science

Abstract

The rise of generative AI has enabled the production of high-fidelity synthetic tabular data across fields such as healthcare, finance, and public policy, raising growing concerns about data provenance and misuse. Watermarking offers a promising solution to address these concerns by ensuring the traceability of synthetic data, but existing methods face many limitations: they are computationally expensive due to reliance on the inverse process of large diffusion models, struggle with mixed discrete-continuous data, or lack robustness to common post-processing attacks. To address these limitations, we propose TAB-DRW, an efficient and robust post-editing watermarking scheme for synthetic tabular data. TAB-DRW embeds watermark signals in the frequency domain: it normalizes heterogeneous features via the Yeo-Johnson transformation and standardization, applies the discrete Fourier transform (DFT), and adjusts the imaginary parts of adaptively selected entries according to precomputed pseudorandom bits. To further enhance robustness and efficiency, we introduce a novel rank-based pseudorandom bit generation method that enables row-wise retrieval without incurring storage overhead. Experiments on five benchmark tabular datasets show that TAB-DRW achieves strong detectability and robustness against post-processing and adaptive attacks, while preserving high data fidelity and fully supporting mixed-type features.
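The frequency-domain embedding step can be sketched on a single numeric vector: standardize it, take the real FFT, force the sign of the imaginary part of selected coefficients to encode pseudorandom bits (with a minimum magnitude `delta`), invert, and undo the standardization. Detection re-standardizes, reads the signs back, and compares them with the key's bits. This sketch skips the Yeo-Johnson step, draws bits and coefficient positions from a seeded NumPy generator instead of the paper's rank-based bit generation and adaptive selection, and uses hypothetical names and parameter values throughout.

```python
import numpy as np

def _slots(n: int, n_bits: int, key: int) -> np.ndarray:
    """Hypothetical choice of rFFT coefficients to carry bits (excludes DC and Nyquist)."""
    candidates = np.arange(1, (n + 1) // 2)
    return np.random.default_rng(key).choice(candidates, size=n_bits, replace=False)

def _bits(n_bits: int, key: int) -> np.ndarray:
    """Hypothetical keyed pseudorandom bits (stand-in for rank-based generation)."""
    return np.random.default_rng(key + 1).integers(0, 2, size=n_bits)

def embed(x: np.ndarray, key: int, n_bits: int = 8, delta: float = 0.5) -> np.ndarray:
    mu, sd = x.mean(), x.std()  # assumes a non-constant column (sd > 0)
    spec = np.fft.rfft((x - mu) / sd)
    idx, bits = _slots(len(x), n_bits, key), _bits(n_bits, key)
    sign = np.where(bits == 1, 1.0, -1.0)
    # Force Im(spec[k]) to carry the bit's sign, with magnitude at least delta
    spec[idx] = spec[idx].real + 1j * sign * np.maximum(np.abs(spec[idx].imag), delta)
    return np.fft.irfft(spec, n=len(x)) * sd + mu

def detect(x: np.ndarray, key: int, n_bits: int = 8) -> float:
    """Fraction of keyed bits recovered from the sign of Im(spectrum)."""
    spec = np.fft.rfft((x - x.mean()) / x.std())
    idx, bits = _slots(len(x), n_bits, key), _bits(n_bits, key)
    return float(np.mean((spec[idx].imag > 0).astype(int) == bits))

rng = np.random.default_rng(0)
column = rng.normal(50.0, 10.0, size=64)  # hypothetical numeric column with 64 rows
marked = embed(column, key=42)
print(detect(marked, key=42))  # 1.0
```

Re-standardizing during detection only rescales the non-DC coefficients by a positive constant, so the embedded signs survive shifts and rescaling of the column; on unmarked data the match rate should hover around 0.5.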

2511.14045 2026-05-12 cs.CR cs.AI cs.CL

Auditing Data Membership in Reinforcement Learning With Verifiable Rewards

Yule Liu, Heyi Zhang, Jinyi Zheng, Zhen Sun, Zifan Peng, Jiaheng Wei, Tianshuo Cong, Yilong Yang, Xinlei He

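AI Summary: This study addresses possible data leakage during training with reinforcement learning with verifiable rewards (RLVR) and proposes an auditing method. It points out that traditional membership inference attacks are hard to apply to RLVR, because the responses are generated by the model itself and continually optimized. The authors therefore propose DIBA, a white-box behavioral-divergence auditing framework that compares the behavior of the fine-tuned model against the pre-trained model at the reward and policy levels to detect data exposure stably. Experiments show that DIBA outperforms existing methods across a range of settings, with high detection accuracy and robustness.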

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has become a core training stage in recent large language models (LLMs). Its reliance on non-public, high-value prompt sets raises concerns about unauthorized data use, creating a need for exposure auditing. A natural tool is membership inference attacks (MIAs), but existing methods detect fitting to a fixed target string. This does not apply to RLVR, which generates responses from the model itself and reinforces successful ones, thus hindering the auditing of data exposure. We show that it remains detectable: RLVR reshapes the model's response distribution on training prompts, producing behavioral traces that can be surfaced through targeted auditing. We propose Divergence-in-Behavior Auditing (DIBA), a white-box query-level auditing framework for RLVR. DIBA compares a fine-tuned model against its pre-RLVR checkpoint along two axes: reward-side evidence capturing changes in verifiable task success, and policy-side evidence capturing prompt-conditioned behavioral drift. By aggregating over multiple stochastic rollouts, DIBA produces a stable query-level auditing signal. Under a white-box setting, DIBA consistently outperforms strong transferred likelihood-based baselines, including calibrated and self-generated variants, achieving around 0.8 AUC and an order-of-magnitude stronger TPR@0.1%FPR. We further show that RLVR auditing is stronger when training leaves non-trivial prompt-specific traces and weaker when the base model already performs well on the prompt. Under a practical grey-box setting, transfer is often robust across model sizes under the same RLVR algorithm, but more varied across algorithms, and can remain useful under distribution shift with carefully chosen shadow data.
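The two evidence axes can be illustrated with a small per-prompt scoring function. This is a hedged sketch, not DIBA's actual aggregation, which the abstract does not specify: reward-side evidence is taken as the change in mean verifiable success rate across rollouts, policy-side evidence as the mean log-likelihood ratio of rollouts sampled from the fine-tuned model (a Monte Carlo estimate of KL(fine-tuned || base)), and the two are combined with a hypothetical weight `alpha`. The dictionary keys are illustrative names.

```python
import numpy as np

def reward_evidence(rewards_ft, rewards_base):
    """Change in mean verifiable success rate across stochastic rollouts."""
    return float(np.mean(rewards_ft) - np.mean(rewards_base))

def policy_evidence(logp_ft, logp_base):
    """Mean log-ratio of fine-tuned-model rollouts under both models
    (a Monte Carlo estimate of KL(fine-tuned || base) on this prompt)."""
    return float(np.mean(np.asarray(logp_ft) - np.asarray(logp_base)))

def audit_score(rollouts, alpha=0.5):
    """Hypothetical linear combination of both evidence types for one prompt;
    higher scores suggest the prompt was in the RLVR training set."""
    r = reward_evidence(rollouts["reward_ft"], rollouts["reward_base"])
    p = policy_evidence(rollouts["logp_ft"], rollouts["logp_base"])
    return alpha * r + (1 - alpha) * p
```

Averaging over several rollouts is what makes each term stable; a single sampled response would give a very noisy estimate of either quantity.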

2511.13415 2026-05-12 cs.IR cs.CL cs.CV

Attention Grounded Enhancement for Visual Document Retrieval

Wanqing Cui, Wei Huang, Yazhi Guo, Yibo Hu, Meiguang Jin, Junfeng Ma, Keping Bi

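AI Summary: Visual document retrieval requires understanding heterogeneous, multimodal content to satisfy implicit information needs. Although existing methods improve retrieval performance through fine-grained late interaction, they still rely on coarse global relevance labels and struggle to capture the semantic links between specific document regions and the query. To address this, the paper proposes the AGREE framework, which uses cross-modal attention from multimodal large language models as a supervision signal to guide the retriever toward query-relevant document regions, jointly optimizing with local region signals and global labels; this significantly improves retrieval on the ViDoRe V2 benchmark.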

Comments Published as a conference paper at SIGIR 2026

Abstract

Visual document retrieval requires understanding heterogeneous and multi-modal content to satisfy implicit information needs. Recent advances use screenshot-based document encoding with fine-grained late interaction to encode holistic information and capture nuanced alignments, significantly improving retrieval performance. However, retrievers are still trained with coarse global relevance labels, without revealing which regions support the match. As a result, retrievers tend to rely on surface-level cues and struggle to capture implicit semantic connections, hindering their ability to handle non-extractive queries. To improve fine-grained relevance modeling, we propose an Attention-Grounded REtriever Enhancement (AGREE) framework. AGREE leverages cross-modal attention from multimodal large language models (MLLMs) as proxy supervision to guide the retriever in identifying relevant document regions. Specifically, AGREE extracts attention maps from the MLLM that highlight which document regions are attended to based on the query. These attention scores serve as local, region-level relevance signals. During training, AGREE combines local signals with the global document-level relevance label to jointly optimize the retriever. This dual-level supervision enables the model to learn not only whether documents match, but also which content drives relevance. Experiments on the challenging visual document retrieval benchmark ViDoRe V2 show that AGREE significantly outperforms the global-supervision-only baseline by 12.82% and 5.03% in terms of average nDCG@1 and nDCG@5. Quantitative and qualitative analyses further demonstrate that AGREE promotes deeper alignment between query terms and document regions, moving beyond surface-level matching toward more accurate and interpretable retrieval. Our code is available at: https://github.com/VickiCui/AGREE.
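The dual-level supervision (a document-level label plus region-level attention targets) can be sketched as a two-term loss over a ColBERT-style late-interaction retriever. This is a minimal sketch under stated assumptions, not AGREE's published objective: the global term is binary cross-entropy with the MaxSim score used directly as a logit, per-patch relevance is the maximum similarity over query tokens, the local term is KL(normalized MLLM attention || softmax over patch relevance), and `lam` and `temp` are hypothetical hyperparameters.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def late_interaction(q, d):
    """q: (query_tokens, dim), d: (patches, dim).
    Returns the MaxSim document score and a per-patch relevance vector."""
    sims = q @ d.T
    return sims.max(axis=1).sum(), sims.max(axis=0)

def dual_level_loss(q, d, label, attn, lam=0.5, temp=1.0):
    """Global BCE on the document label plus a weighted region-level KL term.
    attn: positive MLLM attention mass per patch (assumed already pooled)."""
    score, patch_rel = late_interaction(q, d)
    # Global term: binary cross-entropy with the score treated as a logit.
    global_loss = label * np.logaddexp(0.0, -score) + (1 - label) * np.logaddexp(0.0, score)
    # Local term: KL(attention target || retriever's distribution over patches).
    region = softmax(patch_rel / temp)
    target = attn / attn.sum()
    local_loss = np.sum(target * (np.log(target) - np.log(region)))
    return global_loss + lam * local_loss, global_loss, local_loss
```

The local term is zero exactly when the retriever's patch distribution matches the attention target, so it only adds gradient where the retriever disagrees with the MLLM about which regions matter.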

2511.08644 2026-05-12 cs.SE cs.AI cs.PF

Energy Consumption of Dataframe Libraries for End-to-End Deep Learning Pipelines: A Comparative Analysis

Punit Kumar, Asif Imran, Tevfik Kosar

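AI Summary: This paper presents a detailed comparative analysis of three mainstream Python data processing libraries (Pandas, Polars, and Dask) in end-to-end deep learning pipelines, focusing on how they interact with large-scale GPU workloads in key stages such as data loading, preprocessing, and batch feeding. The study measures key performance metrics including runtime, memory usage, disk usage, and CPU and GPU energy consumption, filling a gap in the existing literature and providing guidance for choosing a data processing library for deep learning tasks.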

Abstract

This paper presents a detailed comparative analysis of the performance of three major Python data manipulation libraries (Pandas, Polars, and Dask) specifically when embedded within complete deep learning (DL) training and inference pipelines. The research bridges a gap in existing literature by studying how these libraries interact with substantial GPU workloads during critical phases like data loading, preprocessing, and batch feeding. The authors measured key performance indicators including runtime, memory usage, disk usage, and energy consumption (both CPU and GPU) across various machine learning models and datasets.

2511.01196 2026-05-12 stat.ML cs.AI cs.LG

An Interdisciplinary and Cross-Task Review on Missing Data Imputation

Jicong Fan

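AI Summary: This paper systematically reviews missing data imputation as an interdisciplinary, cross-task research area, covering missingness mechanisms, imputation methods, and the characteristics of the problem in different application scenarios. It surveys imputation techniques ranging from traditional statistical methods to modern deep learning models (such as autoencoders, generative adversarial networks, and graph neural networks), with particular attention to methods for complex data types (such as tensors, time series, and graph-structured data). It also examines how imputation is combined with downstream tasks (such as classification, clustering, and anomaly detection) and identifies key challenges and directions for future research.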

Journal ref
Foundations and Trends in Signal Processing, Vol. 20, No. 3, pp. 185-317, 2026
Abstract

Missing data is a fundamental challenge in data science, significantly hindering analysis and decision-making across a wide range of disciplines, including healthcare, bioinformatics, social science, e-commerce, and industrial monitoring. Despite decades of research and numerous imputation methods, the literature remains fragmented across fields, creating a critical need for a comprehensive synthesis that connects statistical foundations with modern machine learning advances. This work systematically reviews core concepts (missingness mechanisms, single versus multiple imputation, and different imputation goals) and examines problem characteristics across various domains. It provides a thorough categorization of imputation methods, spanning classical techniques (e.g., regression, the EM algorithm) to modern approaches like low-rank and high-rank matrix completion, deep learning models (autoencoders, GANs, diffusion models, graph neural networks), and large language models. Special attention is given to methods for complex data types, such as tensors, time series, streaming data, graph-structured data, categorical data, and multimodal data. Beyond methodology, we investigate the crucial integration of imputation with downstream tasks like classification, clustering, and anomaly detection, examining both sequential pipelines and joint optimization frameworks. The review also assesses theoretical guarantees, benchmarking resources, and evaluation metrics. Finally, we identify critical challenges and future directions, emphasizing model selection and hyperparameter optimization, the growing importance of privacy-preserving imputation via federated learning, and the pursuit of generalizable models that can adapt across domains and data types, thereby outlining a roadmap for future research.
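As a concrete instance of the low-rank matrix completion family mentioned above, the classical "hard-impute" iteration alternates between a truncated-SVD projection and re-imposing the observed entries. This is a minimal illustrative sketch of that one technique, chosen by us as an example rather than taken from the review; it starts from column-mean imputation and assumes the target rank is known.

```python
import numpy as np

def hard_impute(X, mask, rank, n_iter=500):
    """Fill entries where mask is False by iterated rank-`rank` SVD projection.
    X may hold NaN at missing positions; observed entries are never changed."""
    col_means = np.nanmean(np.where(mask, X, np.nan), axis=0)
    Z = np.where(mask, X, col_means)            # start from column-mean imputation
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        Z = np.where(mask, X, low_rank)         # keep observed entries fixed
    return Z
```

On data that really is low-rank, this recovers missing entries far better than the column-mean starting point; when the rank is unknown, soft-thresholding the singular values (soft-impute) is the usual alternative.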

2510.13896 2026-05-12 q-bio.QM cs.AI cs.CV cs.MA

GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents

Xi Yu, Yang Yang, Qun Liu, Yonghua Du, Sean McSweeney, Yuewei Lin

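AI Summary: This paper proposes GenCellAgent, a training-free cellular image segmentation framework that combines specialist segmentation tools with general-purpose vision-language models to segment heterogeneous microscopy images efficiently. The method uses a plan-execute-evaluate loop and can automatically select the best tool, adapt to different imaging conditions, support text-guided segmentation of new organelles, and self-evolve. Experiments show that GenCellAgent performs strongly across multiple cell segmentation benchmarks and clearly outperforms traditional specialist models, especially on out-of-distribution data and novel cellular structures, offering a practical route to robust cellular image segmentation without retraining.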

Comments 43 pages

Abstract

Cellular image segmentation is essential for quantitative biology yet remains difficult due to heterogeneous modalities, morphological variability, and limited annotations. We present GenCellAgent, a training-free multi-agent framework that orchestrates specialist segmenters and generalist vision-language models via a planner-executor-evaluator loop (choose tool → run → quality-check) with long-term memory. The system (i) automatically routes images to the best tool, (ii) adapts on the fly using a few reference images when imaging conditions differ from what a tool expects, (iii) supports text-guided segmentation of organelles not covered by existing models, and (iv) commits expert edits to memory, enabling self-evolution and personalized workflows. Across seven cell-segmentation benchmarks spanning diverse microscopy modalities (4,718 images), this routing consistently matches or exceeds the best individual tool on every dataset and outperforms all baselines in overall accuracy. On out-of-distribution organelle data, GenCellAgent substantially outperforms specialist models that were not trained on the target domain, recovering structures that dedicated tools fail to detect. It also segments novel objects such as the Golgi apparatus via iterative text-guided refinement, with light human correction further boosting performance. Together, these capabilities provide a practical path to robust, adaptable cellular image segmentation without retraining, while reducing annotation burden and matching user preferences.
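The choose tool → run → quality-check loop with long-term memory can be sketched as plain control flow. This is a hedged illustration, not GenCellAgent's implementation: tools and the quality scorer are hypothetical callables (strings stand in for masks), "memory" is just a dict from imaging modality to the last winning tool, and reference-image adaptation and text-guided segmentation are omitted.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]           # image id -> mask (a string stand-in here)
Scorer = Callable[[str, str], float]  # (image id, mask) -> quality in [0, 1]

def segment(image: str, modality: str, tools: Dict[str, Tool], score: Scorer,
            memory: Dict[str, str], threshold: float = 0.7) -> Tuple[str, str, float]:
    """Plan (order tools), execute, evaluate; remember the winning tool per modality."""
    # Planner: try the tool remembered for this modality first, then the rest
    # (sorted is stable, so the remaining tools keep their insertion order).
    order: List[str] = sorted(tools, key=lambda name: name != memory.get(modality))
    best = ("", "", -1.0)
    for name in order:
        mask = tools[name](image)     # Executor
        q = score(image, mask)        # Evaluator
        if q > best[2]:
            best = (name, mask, q)
        if q >= threshold:            # good enough: stop early
            break
    memory[modality] = best[0]        # long-term memory update
    return best
```

Putting the remembered tool first means repeat images from the same modality usually cost a single tool call, which is one simple reading of how memory makes the loop cheaper over time.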

2509.25926 2026-05-12 cs.CR cs.LG

Preventing Prompt Injection with Type-Directed Privilege Separation

Dennis Jacob, Emad Alghamdi, Zhanhao Hu, Basel Alomair, David Wagner

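AI Summary: This paper studies how to prevent prompt injection attacks against modern language models and proposes a new method, type-directed privilege separation. The method converts untrusted data into curated data types that restrict its content and scope, thereby eliminating the possibility of prompt injection. Experiments show that the method keeps the system secure while retaining strong utility, and that it is easy to understand and compatible with a wide range of language models.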

Comments Revised manuscript

Abstract

Modern language models have enabled the development of agentic systems that achieve strong performance on reasoning-intensive tasks. Unfortunately, this has come with a security cost; these systems are vulnerable to prompt injection, a specialized attack where an adversary subverts the intended functionality of an agent by supplying an injected task of their own. Previous approaches address this challenge with detectors and fine-tuning defenses but are vulnerable to adaptive attacks. Other methods propose system-level defenses that guarantee security, but these are often based on techniques that prevent inter-component communication and thus are constrained in problem coverage. To this end, we introduce type-directed privilege separation, a new technique that expands the set of tasks that can be protected with system-level defenses. Our method works by converting untrusted data to a curated set of data types; unlike raw strings, each data type is limited in scope and content, eliminating the possibility for prompt injection. We evaluate our method across several case studies and find that designs using our principles can systematically prevent prompt injection attacks while featuring strong, non-trivial utility. Our approach is intuitive to understand and compatible with any language model.
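The core idea, that the privileged component only ever sees values of curated types and never the raw untrusted string, can be sketched with an enum and a frozen dataclass. This is an illustrative sketch under assumptions: the `Intent` and `EmailSummary` types, the rule-based converter `to_typed`, and the action strings are all hypothetical, and a real system might use a quarantined LLM to fill the typed fields instead of regexes.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional

class Intent(Enum):
    MEETING_REQUEST = "meeting_request"
    INVOICE = "invoice"
    OTHER = "other"

@dataclass(frozen=True)
class EmailSummary:
    intent: Intent                # one of a closed set of values
    amount_cents: Optional[int]   # bounded integer, never free text
    sender_is_known: bool

def to_typed(raw_email: str, known_senders: FrozenSet[str]) -> EmailSummary:
    """Untrusted text in, curated types out; no raw substring survives."""
    sender_match = re.search(r"^From:\s*(\S+)", raw_email, re.MULTILINE)
    sender = sender_match.group(1) if sender_match else ""
    text = raw_email.lower()
    if "invoice" in text:
        intent = Intent.INVOICE
    elif "meeting" in text:
        intent = Intent.MEETING_REQUEST
    else:
        intent = Intent.OTHER
    amount = re.search(r"\$(\d{1,6})\.(\d{2})", raw_email)
    cents = int(amount.group(1)) * 100 + int(amount.group(2)) if amount else None
    return EmailSummary(intent, cents, sender in known_senders)

def privileged_plan(summary: EmailSummary) -> str:
    """The privileged agent branches only on typed fields, so injected
    instructions in the email body have no channel to reach it."""
    if summary.intent is Intent.INVOICE and summary.sender_is_known \
            and summary.amount_cents is not None:
        return f"queue_payment({summary.amount_cents})"
    if summary.intent is Intent.MEETING_REQUEST:
        return "propose_meeting_slot()"
    return "no_action()"
```

The security argument rests on the types, not on detecting the attack: whatever the email says, the planner can only receive one of three intents, an integer, and a boolean.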

2509.23391 2026-05-12 eess.SY cs.LG cs.SY nlin.CD

Optimizing the Network Topology of a Linear Reservoir Computer

Sahand Tangerami, Nicholas A. Mecholsky, Francesco Sorrentino

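AI Summary: This paper studies how to optimize the network topology of a linear reservoir computer (RC) to improve its performance and interpretability. The authors decompose the RC dynamics into independent modes and optimize each mode, thereby determining the optimal connectivity, which corresponds to a specific set of eigenvalues of the reservoir adjacency matrix. Experiments show that the optimized RC significantly outperforms randomly generated RCs in both training and testing, and in some cases even surpasses nonlinear RCs of the same size, offering theoretical guidance and practical advantages for designing efficient, task-specific, and analytically transparent RC architectures.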

Abstract

Machine learning has become a fundamental approach for modeling, prediction, and control, enabling systems to learn from data and perform complex tasks. Reservoir computing is a machine learning tool that leverages high-dimensional dynamical systems to efficiently process temporal data for prediction and observation tasks. Traditionally, the connectivity of the network that underlies a reservoir computer (RC) is generated randomly, lacking a principled design. Here, we focus on optimizing the connectivity of a linear RC to improve its performance and interpretability, which we achieve by decoupling the RC dynamics into a number of independent modes. We then proceed to optimize each one of these modes to perform a given task, which corresponds to selecting an optimal RC connectivity in terms of a given set of eigenvalues of the RC adjacency matrix. Simulations on networks of varying sizes show that the optimized RC significantly outperforms randomly constructed reservoirs in both training and testing phases and often surpasses nonlinear reservoirs of comparable size. This approach provides both practical performance advantages and theoretical guidelines for designing efficient, task-specific, and analytically transparent RC architectures.
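Why a linear RC decouples into independent modes can be checked numerically. For the linear update r(t+1) = A r(t) + w u(t) with A = V diag(λ) V⁻¹, the mode coordinates z = V⁻¹ r obey scalar recurrences z_i(t+1) = λ_i z_i(t) + c_i u(t), where c = V⁻¹ w, so choosing the eigenvalues fixes each mode's behavior. This is a minimal sketch assuming real, distinct eigenvalues and a random basis `V`; complex-conjugate eigenvalue pairs would need 2×2 real blocks, and the paper's optimization of the eigenvalues themselves is not shown.

```python
import numpy as np

def make_reservoir(eigs, rng):
    """Build A = V diag(eigs) V^{-1} from chosen real eigenvalues and a random basis."""
    n = len(eigs)
    V = rng.normal(size=(n, n))
    return V @ np.diag(eigs) @ np.linalg.inv(V), V

def run_full(A, w, u):
    """Coupled linear reservoir r(t+1) = A r(t) + w u(t)."""
    r = np.zeros(A.shape[0])
    states = []
    for ut in u:
        r = A @ r + w * ut
        states.append(r.copy())
    return np.array(states)

def run_modes(eigs, c, u):
    """Same dynamics in mode coordinates: z_i(t+1) = lambda_i z_i(t) + c_i u(t)."""
    z = np.zeros(len(eigs))
    states = []
    for ut in u:
        z = eigs * z + c * ut
        states.append(z.copy())
    return np.array(states)
```

Since a linear readout on r can be rewritten as a linear readout on z, the task performance depends only on the eigenvalues and input couplings, which is what makes per-mode optimization meaningful.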

2509.22531 2026-05-12 stat.ML cs.LG

Debiased Front-Door Learners for Heterogeneous Effects

Yonghan Jung

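AI Summary: In observational studies where the treatment and outcome share unobserved confounders but the mediator is unconfounded, causal effects can be identified through front-door (FD) adjustment. This paper studies estimation of heterogeneous treatment effects (HTE) under FD identification and proposes two debiased learners, FD-DR-Learner and FD-R-Learner. Under explicit sample-splitting, bounded-overlap, moment, and stage-wise learning assumptions, the two methods satisfy a product-error bound and a stage-error decomposition, respectively, yielding conditional quasi-oracle properties when the nuisance terms are small. Experiments show good robustness and estimation efficiency on synthetic data and on a real-world case study based on the FARS dataset.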

Comments 26 pages, 3 figures. Revised theory statements, notation, and proof presentation; conclusions unchanged. Code available at https://github.com/yonghanjung/FD-CATE

Abstract

In observational settings where treatment and outcome share unmeasured confounders but an observed mediator remains unconfounded, the front-door (FD) adjustment identifies causal effects through the mediator. We study the heterogeneous treatment effect (HTE) under FD identification and introduce two debiased learners: FD-DR-Learner and FD-R-Learner. Under explicit sample-splitting, bounded-overlap, moment, and stage-learning assumptions, we show that FD-DR satisfies a product-error bound and FD-R satisfies a stage-error decomposition; these results yield conditional quasi-oracle corollaries when the relevant nuisance remainders are no larger than the target or stage oracle terms. We provide error analyses establishing this debiasedness and demonstrate robust empirical performance in synthetic studies and a real-world case study of primary seat-belt laws using the Fatality Analysis Reporting System (FARS) dataset. Together, these results indicate that the proposed learners can deliver reliable and sample-efficient HTE estimates in FD scenarios when the stated assumptions are credible. The implementation is available at https://github.com/yonghanjung/FD-CATE.
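For reference, the target these learners estimate can be written with the standard front-door formula conditioned on covariates X. This sketch assumes a binary treatment A, a discrete mediator M, and outcome Y; the paper's own notation and its debiased pseudo-outcomes are not reproduced here. The inner sum re-weights over the treatment's marginal distribution (given X) to block the unmeasured A–Y confounding, while the outer sum propagates the effect of A through M.

```latex
\mu(a, x) \;=\; \sum_{m} p(m \mid A = a, X = x) \sum_{a'} p(a' \mid X = x)\,
\mathbb{E}\left[\, Y \mid A = a', M = m, X = x \,\right],
\qquad
\tau(x) \;=\; \mu(1, x) - \mu(0, x).
```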

2509.02372 2026-05-12 cs.CR cs.AI cs.SE

Scam2Prompt: A Scalable Framework for Auditing Malicious Scam Endpoints in Production LLMs

Zhiyang Chen, Tara Saba, Xun Deng, Xujie Si, Fan Long

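AI Summary: As large language models (LLMs) play an increasingly important role in software development, the security risks arising from their reliance on unfiltered web data for training are becoming more prominent. This paper proposes Scam2Prompt, a scalable automated auditing framework that detects whether production LLMs generate code containing malicious links in response to specific prompts. The study finds that four mainstream LLMs generate malicious content in 4.24% of cases when given carefully crafted prompts, and that the problem persists in seven new models released in 2025, with malicious code generation rates as high as 47.3%.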

Abstract

Large Language Models have become critical to modern software development, but their reliance on uncurated web-scale datasets for training introduces a significant security risk: the absorption and reproduction of malicious content. This risk materialized in November 2024, when a user suffered a 2,500 USD financial loss after executing code generated by ChatGPT that contained a live scam phishing URL. To systematically evaluate this risk, we introduce Scam2Prompt, a scalable automated auditing framework that identifies the underlying intent of a scam site and then synthesizes developer-style prompts that mirror this intent, allowing us to test whether an LLM will generate malicious code in response to these prompts. In a large-scale study of four production LLMs (GPT-4o, GPT-4o-mini, Llama-4-Scout, and DeepSeek-V3), we found that Scam2Prompt's developer-style prompts triggered malicious URL generation in 4.24% of cases. To test the persistence of this security risk, we constructed Innoc2Scam-bench, a benchmark of 1,377 prompts that consistently elicited malicious code from all four initial LLMs. When applied to seven additional production LLMs released in 2025, we found the vulnerability is not only present but severe, with malicious code generation rates ranging from 12.9% to 47.3%. Furthermore, existing safety measures like state-of-the-art guardrails or RAG-based agents proved insufficient to prevent this behavior.
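The final step of such an audit, deciding whether a generated snippet contains a malicious URL, can be sketched as URL extraction plus a domain lookup. The abstract does not say how Scam2Prompt labels URLs, so this is a hypothetical check: `flagged_urls` and the `blocklist` set are illustrative, and a real audit would query threat-intelligence feeds rather than a static set. The host and each of its parent domains are tested, so `api.swap.evil.test` matches an entry for `evil.test`.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s'\"<>)]+")

def flagged_urls(generated_code: str, blocklist: set) -> list:
    """Return URLs in generated code whose host, or any parent domain, is blocklisted."""
    hits = []
    for url in URL_RE.findall(generated_code):
        host = (urlparse(url).hostname or "").lower()
        parts = host.split(".")
        # check the host and every parent domain, e.g. a.b.evil.test -> evil.test
        if any(".".join(parts[i:]) in blocklist for i in range(len(parts))):
            hits.append(url)
    return hits
```

Matching parent domains matters because scam operators commonly rotate subdomains while keeping the registered domain.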

2508.08441 2026-05-12 q-bio.QM cs.CE cs.LG

SpectraLLM: Uncovering the Ability of LLMs for Molecular Structure Elucidation from Multi-Spectral Data

Yunyue Su, Jiahui Chen, Zao Jiang, Zhenyi Zhong, Liang Wang, Qiang Liu, Zhaoxiang Zhang

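AI Summary: This paper proposes SpectraLLM, a large language model for elucidating molecular structure from multi-spectral data. By representing information from multiple spectroscopic modalities (such as IR, Raman, UV-Vis, NMR, and MS) in a unified way, the model performs end-to-end structure prediction in a shared language space, capturing complementary features across spectral types. Experiments show that SpectraLLM achieves state-of-the-art performance on multiple public benchmark datasets, outperforming single-modality methods, and demonstrates the benefit of joint multimodal reasoning, providing a scalable paradigm for language-model-based spectroscopic analysis.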

Comments 42 pages, 6 figures, 30 tables; Accepted to ICLR 2026

Journal ref
Proceedings of the 14th International Conference on Learning Representations (ICLR), 2026
Abstract

Automated molecular structure elucidation remains challenging, as existing approaches often depend on pre-compiled databases or restrict themselves to single spectroscopic modalities. Here we introduce SpectraLLM, a large language model that performs end-to-end structure prediction by reasoning over one or multiple spectra. Unlike conventional spectrum-to-structure pipelines, SpectraLLM represents both continuous (IR, Raman, UV-Vis, NMR) and discrete (MS) modalities in a shared language space, enabling it to capture substructural patterns that are complementary across different spectral types. We pretrain and fine-tune the model on small-molecule domains and evaluate it on four public benchmark datasets. SpectraLLM achieves state-of-the-art performance, substantially surpassing single-modality baselines. Moreover, it demonstrates strong robustness in unimodal settings and further improves prediction accuracy when jointly reasoning over diverse spectra, establishing a scalable paradigm for language-based spectroscopic analysis. Code is available at https://github.com/OPilgrim/SpectraLLM.
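One way a continuous spectrum can enter a shared language space is by serializing it as text. The abstract does not describe SpectraLLM's encoding, so the following is purely an assumed illustration: it keeps the strongest local maxima of a sampled spectrum and writes them as `position(relative intensity)` tokens, e.g. `IR: 1200(0.50) 1700(1.00)`. The function name, peak count, and format are hypothetical.

```python
import numpy as np

def spectrum_to_text(name, x, y, n_peaks=5, decimals=0):
    """Serialize a sampled spectrum as its strongest local maxima, in axis order."""
    y = np.asarray(y, dtype=float)
    interior = np.arange(1, len(y) - 1)
    is_peak = (y[interior] > y[interior - 1]) & (y[interior] >= y[interior + 1])
    peaks = interior[is_peak]
    top = np.sort(peaks[np.argsort(y[peaks])[::-1][:n_peaks]])  # strongest, then sorted
    scale = y.max() if y.max() > 0 else 1.0                     # relative intensities
    items = [f"{x[i]:.{decimals}f}({y[i] / scale:.2f})" for i in top]
    return f"{name}: " + " ".join(items)
```

A peak list discards line shape, which is a real information loss for broad bands; that trade-off is part of why the encoding choice for each modality matters.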

2507.06850 2026-05-12 cs.CR cs.AI

The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro

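AI Summary: This paper studies the system-level security risks that arise when large language models (LLMs) serve as reasoning engines for autonomous agents, revealing their vulnerability to attack. Through experimental evaluation, the authors show how attackers can use techniques such as direct prompt injection and RAG backdoor attacks to make LLMs autonomously install and execute malware, and find that nearly all mainstream LLMs are vulnerable to some degree. The study also shows that in multi-agent systems, even models that resist direct attacks can still be compromised by exploiting the trust relationships between agents, exposing serious system security risks.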

详情
英文摘要

The rapid adoption of Large Language Model (LLM) agents and multi-agent systems enables remarkable capabilities in natural language processing and generation. However, these systems introduce security vulnerabilities that extend beyond traditional content generation to system-level compromises. This paper presents a comprehensive security evaluation of LLMs used as reasoning engines within autonomous agents, highlighting how they can be exploited as attack vectors capable of achieving computer takeovers. We focus on how different attack surfaces and trust boundaries can be leveraged to orchestrate such takeovers. We demonstrate that adversaries can effectively coerce popular LLMs into autonomously installing and executing malware on victim machines. Our evaluation of 18 state-of-the-art LLMs reveals that 94.4% of models succumb to Direct Prompt Injection, and 83.3% are vulnerable to the stealthier and more evasive RAG Backdoor Attack. Notably, when we tested trust boundaries within multi-agent systems, where LLM agents interact with and influence each other, we found that LLMs that successfully resist direct injection or RAG backdoor attacks will still execute identical payloads when requested by peer agents. We found that 100.0% of tested LLMs can be compromised through Inter-Agent Trust Exploitation attacks, and that every model exhibits context-dependent security behaviors that create exploitable blind spots.

2506.10305 2026-05-12 physics.geo-ph cs.LG physics.data-an

Self-learning signal classifier for decameter coherent scatter radars

Oleg Berngardt, Ivan Lavygin

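AI Summary: This paper proposes a self-learning signal classification method for data from decameter coherent scatter radars, building the classifier solely from radar observations, the results of automatic modeling of radio wave propagation in the ionosphere, and mathematical criteria for assessing model quality. The classifier is trained on two years of data from each of 12 SuperDARN and SECIRA radars, has 2669 model parameters, and combines parameters computed with an ionospheric propagation model with parameters measured directly by the radar. The study analyzes 14 clearly separable classes out of 37 data classes and shows how their observation dynamics depend on geographic latitude and on the levels of solar and geomagnetic activity, with results consistent with known physical mechanisms.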

Comments 30 pages, 10 figures, 4 tables. To be submitted to Advances in Space Research

Journal ref
Advances in Space Research Volume 77, Issue 3, 1 February 2026, Pages 3527-3548
Abstract

The paper presents a method for automatically constructing a classifier for processed data obtained by decameter coherent scatter radars. The method is based only on the radar data obtained, the results of automatic modeling of radio wave propagation in the ionosphere, and mathematical criteria for estimating the quality of the models. The final classifier is a model trained on data obtained by 12 radars of the SuperDARN and SECIRA networks over two years for each radar. The model has 2669 coefficients. For the classification, the model uses both the calculated parameters of radio wave propagation in the model ionosphere and the parameters directly measured by the radar. Radio wave elevation measurements at each radar were calibrated using signals scattered from meteor trails. The analysis showed that the optimal number of classes in the data is 37, of which 25 are frequently observed. From these, 14 classes were chosen that are also confidently separated in other variants of model training, and a preliminary interpretation of 10 of them was carried out. The paper presents the observation dynamics of the various classes and their dependence on the geographic latitude of the radars at different levels of solar and geomagnetic activity, and shows that these dependencies do not contradict known physical mechanisms. The analysis showed that the most important parameters for identifying the classes are the shape of the ray-traced signal trajectory in its second half, the ray-traced scattering height, and the Doppler velocity measured by the radar.

2506.06038 2026-05-12 eess.SY cs.RO cs.SY

Trajectory Optimization for UAV-Based Medical Delivery with Temporal Logic Constraints and Convex Feasible Set Collision Avoidance

Kaiyuan Chen, Yuhan Suo, Shaowei Cui, Yuanqing Xia, Wannian Liang, Shuo Wang

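AI Summary: This paper studies trajectory optimization for time-sensitive UAV delivery of medical supplies in urban environments, accounting for the delivery time windows and priorities of multiple hospitals. Mission objectives are formally specified with Signal Temporal Logic (STL), and a Convex Feasible Set (CFS) method handles avoidance of 3D city buildings, turning the whole planning problem into a convex optimization problem that can be solved efficiently and reliably. Simulation results confirm that the method generates UAV trajectories that meet the timing constraints, avoid obstacles, and are dynamically feasible, providing a scalable solution for autonomous UAV medical logistics.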

Comments 11 pages, 4 figures

Abstract

This paper addresses the problem of trajectory optimization for unmanned aerial vehicles (UAVs) performing time-sensitive medical deliveries in urban environments. Specifically, we consider a single UAV with three-degree-of-freedom dynamics tasked with delivering blood packages to multiple hospitals, each with a predefined time window and priority. Mission objectives are encoded using Signal Temporal Logic (STL), enabling the formal specification of spatial-temporal constraints. To ensure safety, city buildings are modeled as 3D convex obstacles, and obstacle avoidance is handled through a Convex Feasible Set (CFS) method. The entire planning problem, combining UAV dynamics, STL satisfaction, and collision avoidance, is formulated as a convex optimization problem that ensures tractability and can be solved efficiently using standard convex programming techniques. Simulation results demonstrate that the proposed method generates dynamically feasible, collision-free trajectories that satisfy temporal mission goals, providing a scalable and reliable approach for autonomous UAV-based medical logistics.
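How an STL delivery specification is scored on a trajectory can be shown with the standard quantitative (robustness) semantics in discrete time: "eventually within [t_start, t_end], be within `radius` of the hospital" has robustness equal to the maximum margin `radius - distance` over the window, and a conjunction over hospitals takes the minimum. This is a minimal sketch that only evaluates a given trajectory; the paper's convex formulation, priorities, and CFS collision constraints are not shown, and the `deliveries` dictionary layout is a hypothetical choice.

```python
import numpy as np

def rob_eventually_reach(traj, times, goal, radius, t_start, t_end):
    """Robustness of F_[t_start, t_end] (||p - goal|| <= radius): best margin in the window."""
    window = (times >= t_start) & (times <= t_end)
    if not window.any():
        return -np.inf
    return float(np.max(radius - np.linalg.norm(traj[window] - goal, axis=1)))

def rob_mission(traj, times, deliveries):
    """Conjunction over hospitals: the minimum of the per-delivery robustness values."""
    return min(rob_eventually_reach(traj, times, d["goal"], d["radius"], *d["window"])
               for d in deliveries)
```

A positive mission robustness means every time window is met with slack; a negative value measures how far (in meters, here) the worst delivery misses its window, which is what makes robustness useful as an optimization objective.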