2606.14510 2026-06-19 cs.LG q-bio.BM New submission

PepALD: Macrocyclic Peptide Generation via Autoregressive Latent Diffusion

Junming Zhang, Siyu Yi, Wei Ju, Zhonghui Gu

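Affiliations: College of Computer Science, Sichuan University; School of Mathematics, Sichuan University; School of Artificial Intelligence, Sichuan University; Lingang Laboratory

AI summary: Proposes PepALD, a model that combines autoregressive latent diffusion with chemical embeddings for de novo macrocyclic peptide design and uses preference optimization to improve affinity; it outperforms baselines in generation quality and reward optimization.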

Comments 18 pages, 5 figures, 3 tables

Abstract

Macrocyclic peptides are promising therapeutic candidates for intracellular targets, but their design requires simultaneous control over non-natural monomer chemistry, ring topology, membrane permeability, and target binding. Existing SMILES- or HELM-string generative models either operate in long atom-level sequence spaces or treat monomers as symbolic tokens with limited chemical grounding. We introduce PepALD, an Autoregressive Latent Diffusion (ALD) foundation model for de novo macrocyclic peptide generation. The model represents HELM monomers with structured chemical embeddings, generates each residue through context-conditioned diffusion in a chemically informed latent space, predicts R-group-aware ring closures during autoregressive generation, and aligns the denoiser to affinity rewards using winner-protected diffusion-adapted preference optimization. In silico experiments demonstrate PepALD's generation quality and reward-optimization performance against representative peptide generation baselines.

2606.14031 2026-06-19 cs.AI New submission

Applicability Condition Extraction for Therapeutic Drug-Disease Relations

Guanting Luo, Noriki Nishida, Yuji Matsumoto, Yuki Arase

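Affiliations: The University of Osaka; RIKEN; Institute of Science Tokyo; Tohoku University

AI summary: Proposes the task of extracting applicability conditions for therapeutic drug-disease relations from biomedical literature, builds the first manually annotated dataset, and enhances LoRA to account for the relation between drug and disease, outperforming baselines across multiple evaluation settings.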

Comments Accepted to Findings of ACL 2026

Abstract

Identifying the conditions under which a drug has a therapeutic effect on a target disease is crucial for clinical decision support. However, most existing biomedical information extraction methods have focused only on identifying relations between drugs and diseases, while largely overlooking the context-specific conditions under which such relations apply. To address this problem, we introduce the task of applicability condition extraction for therapeutic drug-disease relations from biomedical research literature. We create the first dataset of manually annotated triples of drugs, diseases, and applicability conditions on biomedical paper abstracts, covering 1,119 drug-disease pairs. Using this dataset, we systematically evaluate the performance of a range of existing methods. In addition, we propose a new method that enhances LoRA to consider relations between drugs and diseases. Our method consistently outperforms strong baselines across different evaluation settings. The source code and dataset are publicly available.

2606.11537 2026-06-19 cs.AI cs.CE New submission

MoCA-Agent: A Market-of-Claims Code Agent for Financial and Numerical Reasoning

Abdelrahman Abdallah, AbdelRahim A. Elmadany, Sameh Al Natour, Hasan Cavusoglu, Adam Jatowt, Muhammad Abdul-Mageed

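Affiliations: University of Innsbruck; University of British Columbia; Toronto Metropolitan University

AI summary: Proposes MoCA-Agent, which tackles numerical reasoning errors in financial table question answering through claim-level verification and code generation, achieving strong performance on ten benchmarks.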

Abstract

Financial and tabular question answering requires more than fluent reasoning: answers must be grounded in the exact facts, formulas, units, signs, and scales that support them. A single misread cell or incorrect operation can silently produce a plausible but wrong result. We introduce MoCA-Agent, a market-of-claims code agent that replaces free-form multi-agent debate with claim-level verification. The system decomposes each question into typed atomic claims, asks specialist trader agents to buy or sell those claims, clears their orders into confidence-weighted accept/reject decisions, and synthesizes an executable Python program from market-supported evidence. A code-aware verifier then checks the program for execution, structural consistency, and common financial reasoning errors, with at most one market-aware repair round. Across ten public benchmarks spanning financial numerical reasoning, general tabular reasoning, ESG question answering, and multimodal chart reasoning, MoCA-Agent achieves strong performance using a fixed Qwen3.6-27B backbone, including 78.3% on FinQA, 76.0% on FinanceMath, 71.2% on MultiHiertt, 86.9% on ESGenius, and 85.6% average on FinChart-Bench. These results show that aggregating evidence at the level of atomic claims, rather than whole answers, improves robustness in high-stakes numerical reasoning. The code and data are available at https://github.com/UBC-NLP/MoCA-Agent.
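The abstract does not spell out how trader orders are cleared, so the following is only a minimal sketch of one plausible reading of "confidence-weighted accept/reject decisions": each buy adds its confidence to a claim's net support, each sell subtracts it, and a claim is accepted when the net support is positive. The `Order` type, the `clear_market` function, the threshold, and the example claim names are all hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class Order:
    claim_id: str      # identifier of an atomic claim (hypothetical naming)
    side: str          # "buy" = trader supports the claim, "sell" = trader doubts it
    confidence: float  # trader's confidence in [0, 1]


def clear_market(orders, threshold=0.0):
    """Sum signed confidences per claim; accept when net support exceeds threshold."""
    net_support = {}
    for order in orders:
        sign = 1.0 if order.side == "buy" else -1.0
        net_support[order.claim_id] = (
            net_support.get(order.claim_id, 0.0) + sign * order.confidence
        )
    return {
        claim_id: ("accept" if score > threshold else "reject")
        for claim_id, score in net_support.items()
    }


# Two illustrative claims, each traded by two hypothetical specialist agents.
orders = [
    Order("revenue_is_in_millions", "buy", 0.9),
    Order("revenue_is_in_millions", "sell", 0.25),
    Order("growth_is_percent_change", "buy", 0.5),
    Order("growth_is_percent_change", "sell", 0.75),
]
decisions = clear_market(orders)
```

Under this assumed rule, only accepted claims would be passed on as evidence for program synthesis; a real system might use a different aggregation or a tuned threshold.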

2606.10688 2026-06-19 cs.RO New submission

Self-Supervised Relevance Modelling in Autonomous Driving via Counterfactual Analysis

Luca Lusvarghi, Javier Gozalvez, Pablo Urbano Hidalgo

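Affiliations: Networked Systems Lab, Universidad Miguel Hernandez de Elche

AI summary: Proposes a self-supervised approach based on counterfactual analysis that quantifies the relevance of objects in autonomous driving, delivers real-time estimates with millisecond-level latency, and produces relevance heatmaps to support perception and planning.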

Abstract

Autonomous driving relies on computationally intensive perception pipelines to continuously detect and track objects in the surrounding environment. While some objects are key to planning safe and effective maneuvers, others may not be relevant and have no impact on the autonomous vehicle's driving decisions. Focusing on relevant objects allows more efficient use of available computational resources, reduces processing latency, and limits the downstream propagation of perception noise. In this work, we propose a novel self-supervised approach based on counterfactual analysis to develop a relevance model, an AI-based tool that quantifies the relevance of objects for an autonomous vehicle. To demonstrate the potential of the proposed approach, we train a relevance model on a synthetic causal dataset generated in a selected urban scenario. Results show that the relevance model accurately estimates the objects' relevance with millisecond-level latency, enabling real-time relevance estimation even in high-density scenarios. We also show that the relevance model can be used to build relevance heatmaps that offer valuable insights into the autonomous vehicle's driving policy and can be used to proactively inform perception and planning tasks. We openly release both the relevance model and the causal dataset.

2606.10616 2026-06-19 cs.AI New submission

Diffuse AI Control on Fuzzy Tasks

Mikhail Terekhov, Caglar Gulcehre, Vivek Hebbar, Joe Benton

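Affiliations: Anthropic Fellows Program (via MATS); EPFL; Redwood Research; Anthropic

AI summary: Addresses diffuse threats from AI on fuzzy tasks over long deployment horizons with a blue-team versus red-team adversarial framework: a strong model is trained against a score built from a weak model, the red team uses multi-objective evolutionary prompt optimization to find subversive behaviors that score highly yet perform poorly, and the blue team improves robustness through adversarial optimization.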

Abstract

AI models deployed in critical domains, such as AI safety research, may subtly sabotage our efforts due to misalignment. Diffuse AI Control is a subfield of AI safety concerned with mitigating risks from AI sabotage distributed over long deployment horizons (diffuse threats). These risks are particularly pernicious on fuzzy tasks, i.e. tasks which are hard to grade or require intuition. To understand diffuse threats on fuzzy tasks, we introduce a framework that considers AI control as an adversarial game between a blue team and a red team. The blue team uses a weak trusted model to construct a weak score against which they would train a strong, potentially subversive model to remove the subversion propensity if it were present. The red team then tries to find model behaviors that are rated highly by the weak score, and thus might not be trained out, but actually correspond to poor performance. We test our framework on the task of writing experimental proposals for research questions from recent ML papers. We use a language model with access to the original paper as a proxy "ground-truth" scorer. Our red team discovers subversive behaviors using multi-objective evolutionary prompt optimization. We show that Opus 4.6 can write proposals that are worse according to the ground-truth proxy than those of GPT-OSS-20B, while the weak scorer rates them as highly as the best proposals from Opus 4.6. We then propose an adversarial optimization algorithm for the blue team that discovers more robust prompts for the weak model. This algorithm produces a blue team prompt that our red team optimization fails to exploit.

2606.07822 2026-06-19 cs.CL cs.AI cs.LG New submission

The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust

Nishant Subramani, Palash Goyal, Yiwen Song, Mani Malek, Yuan Xue, Tomas Pfister, Hamid Palangi

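Affiliations: Carnegie Mellon University; Google; Scale AI

AI summary: Proposes the ACUTE protocol, which uses language model activations to estimate confidence while balancing calibration and informativeness; it outperforms strong baselines on multiple-choice question answering, tool calling, and scientific document summarization, improving calibration, utility, and trustworthiness.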

Comments ICML 2026

Abstract

As language models improve and become increasingly deployed to solve a variety of tasks, trustworthiness becomes essential. Calibration is a good proxy for trust: well-calibrated confidence estimates help inform the risk versus reward tradeoff when trusting a specific model output. Unfortunately, even as models improve, they remain poorly calibrated, often biasing towards overconfidence. Additionally, calibration can be gamed: a policy that always predicts the base rate is perfectly calibrated, but completely uninformative. To resolve this, we develop a new metric, expected utility renormalized by the oracle (EURO), that balances calibration and informativeness. We also propose a general-purpose activation-based confidence, utility, and trust estimation protocol (ACUTE) to appropriately adjudicate uncertainty. The ACUTE protocol provides flexible, sample-efficient, and compute-efficient confidence estimators for 3 tasks including multiple choice question answering, tool-calling, and scientific document summarization across 6 models from 4 model families. ACUTE outperforms strong baselines on EURO, while maintaining low calibration error. Taken together, our work shows that equipping LLMs with the ACUTE protocol can improve calibration, utility, and trustworthiness in numerous settings.
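The claim that calibration can be gamed can be checked numerically. The sketch below uses the standard binned expected calibration error (ECE), not the paper's EURO metric, whose definition the abstract does not give; the 70% accuracy figure and all names are illustrative assumptions. A predictor that always outputs the base rate gets an ECE of essentially zero, yet it assigns the same confidence to every answer and so cannot tell correct outputs from incorrect ones.

```python
import random


def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Binned ECE: bin-size-weighted mean of |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in zip(confidences, outcomes):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[index].append((conf, correct))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Simulated correctness labels for a model that is right about 70% of the time.
random.seed(0)
outcomes = [1 if random.random() < 0.7 else 0 for _ in range(10_000)]

# "Gamed" predictor: always report the empirical base rate as the confidence.
base_rate = sum(outcomes) / len(outcomes)
constant_confidences = [base_rate] * len(outcomes)
ece_constant = expected_calibration_error(constant_confidences, outcomes)
```

Here `ece_constant` is zero up to floating-point rounding, which is why a metric that also rewards informativeness is needed to separate this predictor from a useful one.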

2606.05846 2026-06-19 cs.CL eess.AS Updated version

Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs

Gio Paik, Hyunseo Shin, Soungmin Lee

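Affiliations: University of Tokyo

AI summary: Uses model merging and domain generalization to study whether code-switching ability learned from a limited set of language pairs generalizes to unseen pairs; experiments show that bilingual CS-ASR models generalize to unseen pairs only to a limited degree.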

Comments ICML 2026 Workshop on Machine Learning for Audio

Abstract

Automatic Speech Recognition (ASR) has become a key technology for human-AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse language pairs. Existing approaches primarily improve CS-ASR performance through synthetic CS speech generation or pair-specific fine-tuning on limited bilingual datasets. Nevertheless, these approaches face an inherent scalability limitation, as support for CS must be developed separately for language pairs whose number grows combinatorially with the number of supported languages. In this work, we investigate whether CS capabilities learned from a limited set of seen language pairs can generalize to unseen language pairs through model merging and domain generalization methods. Our experiments show that merged bilingual CS-ASR models modestly generalize to unseen language pairs, suggesting limited transfer of bilingual CS capabilities across language pairs.
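The abstract does not say which merging method is used. As a rough illustration only, the sketch below shows the simplest form of model merging, a weighted average of parameters from models that share one architecture. The `merge_models` function, the parameter names, and the toy values are assumptions made for this example; real checkpoints would hold tensors rather than short lists.

```python
def merge_models(state_dicts, weights=None):
    """Weighted average of same-named, same-shaped parameter vectors."""
    count = len(state_dicts)
    if weights is None:
        weights = [1.0 / count] * count  # uniform averaging by default
    merged = {}
    for name, reference in state_dicts[0].items():
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(reference))
        ]
    return merged


# Toy "checkpoints" standing in for two bilingual CS-ASR models fine-tuned
# on different language pairs (values are made up for illustration).
model_pair_a = {"encoder.w": [1.0, 2.0], "decoder.w": [0.0, 4.0]}
model_pair_b = {"encoder.w": [3.0, 2.0], "decoder.w": [2.0, 0.0]}
merged = merge_models([model_pair_a, model_pair_b])
```

The merged parameters would then be evaluated on a language pair that neither source model was trained on, which is the generalization question the abstract raises.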

2606.05833 2026-06-19 cs.CV cs.AI Updated version

Integrated Exploration-Aware UAV Route Optimization and Path Planning

Jimin Choi, Grant Stagg, Cameron K. Peterson, Max Z. Li

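Affiliations: Department of Aerospace Engineering, University of Michigan; Department of Electrical Engineering, Brigham Young University; Department of Aerospace Engineering, Department of Civil and Environmental Engineering, and Department of Industrial and Operations Engineering, University of Michigan

AI summary: Proposes an integrated exploration-aware UAV route optimization and path planning framework that combines risk maps, uncertain region-of-interest modeling, B-spline trajectory optimization, and online replanning to balance visits to reported locations against exploration for new information in disaster monitoring, improving average KL reduction by 15.9% over the offline optimized planner.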

Abstract

Uncrewed aerial vehicles (UAVs) are increasingly used for exploration-driven monitoring in hazardous environments such as disaster zones, contaminated sites, wildfire areas, and damaged infrastructure, where limited flight endurance must be allocated between visiting reported locations and gathering new information. In these settings, prior information regarding hazards is often incomplete, spatially imprecise, and subject to change during execution. For example, initial reports may identify a region where a hazard is likely to exist, but the actual hazard may be displaced, partially observed, or entirely unreported. We present an integrated exploration-aware UAV route optimization and path planning framework for hazard monitoring under uncertain and evolving prior information. The environment is represented as a spatial risk map, where each location has an associated belief of hazardous conditions. Reported hazards are modeled as uncertain regions of interest (ROIs) rather than confirmed target locations, requiring the UAV to inspect reported areas while also using its limited flight endurance to explore informative regions. The proposed method solves a vehicle routing problem over reported ROIs, augments the route with auxiliary pseudo-nodes to improve spatial coverage, allocates the remaining flight distance budget across route segments, and optimizes dynamically feasible B-spline trajectories for local exploration. During execution, UAV measurements update a grid-based belief map, and the remaining trajectory is replanned when new information and the remaining budget justify adaptation. Across 48 scenario configurations, online replanning improves average KL reduction by 15.9% over the offline optimized planner and 48.6% over straight-line traversal.

2605.26891 2026-06-19 cs.CL Updated version

Telenor Nordics Customer Service self-help corpus

Mike Riess

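Affiliations: Research and Innovation, Telenor Group

AI summary: Builds a multilingual customer service self-help corpus covering Finnish, Danish, Norwegian, and Swedish, with 1,122 documents in total, to support Nordic NLP and information retrieval research.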

Comments 8 pages, 2 figures, 5 tables. Submitted to Nordic Machine Intelligence. Dataset: https://zenodo.org/records/19493152

SimuWoB: Simulating Real-World Mobile Apps for Fast and Faithful GUI Agent Benchmarking

Guohong Liu, Jialei Ye, Pengzhi Gao, Wei Liu, Jian Luan, Yunxin Liu, Yuanchun Li

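Affiliations: Institute for AI Industry Research (AIR), Tsinghua University; University of Electronic Science and Technology of China; MiLM Plus, Xiaomi Inc.

AI summary: To close the gap between existing mobile GUI agent benchmarks and real-world apps, proposes SimuWoB, a fully synthesized benchmark whose robust virtual-environment generation framework synthesizes high-fidelity tasks and environments and automatically provides valid rewards, enabling efficient, reproducible evaluation of complex long-horizon interactions.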

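AI-generated abstract (translated from Chinese)

Mobile GUI agents powered by large language models are advancing rapidly, creating an urgent need for realistic and comprehensive evaluation. Existing benchmarks prioritize reproducibility but are usually limited to open-source apps or file-operation tasks, because rewards are hard to construct in real applications, which leaves a gap between benchmark settings and real-world usage. In addition, most benchmarks focus on basic grounding and navigation and offer limited coverage of complex long-horizon interactions. To address these limitations, we introduce SimuWoB, a fully synthesized mobile GUI agent benchmark containing 120 challenging tasks that span different types and difficulty levels. We build a robust virtual-environment generation framework that synthesizes high-fidelity tasks and environments and automatically provides valid rewards for each task. Each environment is deployed as a backend-free webpage accessible via URL, enabling efficient and reproducible evaluation. We conduct comprehensive experiments on several state-of-the-art mobile GUI agents. The average success rate is only 27.92% and drops to 17.82% on long-horizon tasks, revealing significant weaknesses of current agents in complex scenarios. A comparison with evaluation results on real-world sample tasks shows that agent assessments based on our synthetic environments generalize well. We further provide diagnostic insights along key capability dimensions and discuss implications for the development of future mobile GUI agents.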

Abstract

GUI agents powered by large language models are advancing rapidly, creating an urgent need for evaluation and training in realistic environments. However, doing so directly in real-world environments introduces challenges that cannot be overlooked. Real-world environments are complex and uncontrollable, making it difficult to construct verifiable rewards and to save or reset states. Existing works prioritize reproducibility but are often limited to open-source apps or file-operation tasks for reliable reward building, leaving a persistent gap from real-world usage. Furthermore, relying on virtual machines or Docker images demands substantial resources and suffers from slow response speeds, which limits efficiency. We present \sys, a framework that produces high-fidelity synthesized interactive environments with verifiable rewards for GUI agents across platforms. These environments behave as backend-free webpages accessible via URL, requiring near-zero setup and low resource cost, making the approach suitable for both large-scale evaluation and downstream agent training. We support multiple GUI platforms, including mobile, desktop, and automotive/in-vehicle interfaces, based on the same pipeline, covering 100+ environments and 1000+ verifiable tasks. Among them, 120 challenging tasks across 63 simulated mobile applications are released as a fully synthesized mobile GUI agent benchmark. Experimental results on five state-of-the-art mobile GUI agents reveal substantial headroom: the average success rate is only 27.92%, dropping to 17.82% on the long-horizon subset, while humans reach 92.08%. A comparison against real-world sample tasks shows that assessments made in our synthetic environments generalize to real apps. The project website is at https://scalewob.github.io.

2605.25005 2026-06-19 cs.RO Updated version

Stiffness Optimization for Concentrated Bending in Magnetically Actuated Catheters: Maintaining Steerability under Gradient Stiffness

Jiewen Tan, Junnan Xue, Shing Shin Cheng, Shuang Song, Erli Lyu, Jiaole Wang

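Affiliations: Harbin Institute of Technology (Shenzhen); The Chinese University of Hong Kong; Macao Polytechnic University

AI summary: To address the trade-off between pushability and proximally concentrated bending in magnetically actuated soft catheters, proposes a stiffness-optimized multi-segment magnetically actuated catheter (SO-MAC) whose decoupled steering-advancement mechanism and gradient-stiffness architecture keep bending about a stable proximal pivot during advancement while the distal section passively self-straightens to transmit propulsion.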

Abstract

Achieving both efficient pushability (propulsion transmission) and proximally concentrated bending for steerability is challenging for magnetically actuated soft catheters: higher axial/bending stiffness improves force transmission but reduces steerability, whereas lower stiffness enables large, proximally concentrated bending yet increases kinking/buckling risk under compressive push loads. To address this trade-off, we propose a stiffness-optimized multi-segment magnetically actuated catheter (SO-MAC) that integrates a decoupled steering-advancement mechanism with a gradient-stiffness architecture. The SO-MAC concentrates bending about a stable proximal pivot during advancement while the distal section passively self-straightens to transmit propulsion, aided by the optimized stiffness distribution and elastic recovery of the spring backbone against friction-induced kinking/buckling. Over 0-180° combined steering and advancement, the pivot remained stable and the distal tip advanced near-straight toward the target direction. A 1.5 mm-diameter SO-MAC achieved up to 180° steering with a 3 mm bending radius at its 10 mm tip, with an average shape error of 1.39 ± 0.56 mm and a steering-pivot error of 0.35 ± 0.10 mm. Visual feedback control in a bronchial phantom further confirmed robust navigation through highly curved, bifurcating paths.
