arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.28015 2026-06-07 cs.AI

What an Autonomous Agent Discovers About Molecular Transformer Design: Does It Transfer?

Edward Wijaya

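AI summary: Using autonomous architecture search, the study tests whether molecular sequences benefit from different transformer designs. For SMILES, tuning the learning rate alone outperforms the full search; for natural language, architecture changes yield significant improvement; proteins fall in between. Innovations found in one domain transfer to the others, suggesting that the differences stem from the search path rather than from biological requirements.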

Comments
arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission
Abstract

Deep learning models for drug-like molecules and proteins overwhelmingly reuse transformer architectures designed for natural language, yet whether molecular sequences benefit from different designs has not been systematically tested. We deploy autonomous architecture search via an agent across three sequence types (SMILES, protein, and English text as control), running 3,106 experiments on a single GPU. For SMILES, architecture search is counterproductive: tuning learning rates and schedules alone outperforms the full search (p = 0.001). For natural language, architecture changes drive 81% of improvement (p = 0.009). Proteins fall between the two. Surprisingly, although the agent discovers distinct architectures per domain (p = 0.004), every innovation transfers across all three domains with <1% degradation, indicating that the differences reflect search-path dependence rather than fundamental biological requirements. We release a decision framework and open-source toolkit for molecular modeling teams to choose between autonomous architecture search and simple hyperparameter tuning.

2602.10163 2026-06-07 q-bio.QM

Beyond SMILES: Evaluating Agentic Systems for Drug Discovery

Edward Wijaya

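AI summary: Evaluates how well agentic systems for drug discovery generalize to peptide therapeutics, in vivo pharmacology, and resource-constrained settings, identifies five capability gaps, and proposes design requirements and a capability matrix for next-generation frameworks.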

Comments
arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission
Abstract

Agentic systems for drug discovery have demonstrated autonomous synthesis planning, literature mining, and molecular design. We ask how well they generalize. Evaluating six frameworks against 15 task classes drawn from peptide therapeutics, in vivo pharmacology, and resource-constrained settings, we find five capability gaps: no support for protein language models or peptide-specific prediction, no bridges between in vivo and in silico data, reliance on LLM inference with no pathway to ML training or reinforcement learning, assumptions tied to large-pharma resources, and single-objective optimization that ignores safety-efficacy-stability trade-offs. A paired knowledge-probing experiment suggests the bottleneck is architectural rather than epistemic: four frontier LLMs reason about peptides at levels comparable to small molecules, yet no framework exposes this capability. We propose design requirements and a capability matrix for next-generation frameworks that function as computational partners under realistic constraints.