arXivDaily: daily arXiv academic digest, updated Monday through Friday
2605.15213 2026-05-18 cs.IR cs.AI

An LLM-RAG Approach for Healthy Eating Index-Informed Personalized Food Recommendations

Yibin Wang, Yanjie Yang, Grace Melo Guerrero, Rodolfo M. Nayga, Azlan Zahid

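Affiliations: Department of Biological and Agricultural Engineering, Texas A&M AgriLife Research

AI summary: This study proposes a Healthy Eating Index (HEI)-informed retrieval-augmented generation (RAG) framework for generating personalized healthy-eating recommendations. The method combines standardized nutrition databases with a large language model, building a food embedding space and computing HEI scores to give users personalized dietary suggestions aligned with health standards. Simulation results show that the method raises users' HEI scores and improves diet quality.

Abstract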

Diet quality is a leading determinant of chronic disease risk. Advances in artificial intelligence (AI) have enabled food recommendation systems to adapt suggestions to user preferences and health goals. However, most current systems rely on loosely curated food databases and provide limited connection to a validated diet-quality index. In this study, we propose a Healthy Eating Index (HEI)-informed retrieval-augmented generation (RAG) framework that combines standardized nutrition databases with large language models (LLMs) for personalized food recommendations. Our proposed method anchors retrieval in the National Health and Nutrition Examination Survey (NHANES) and the Food Patterns Equivalents Database (FPED). A food-level embedding space is constructed from FPED-derived textual descriptions. For each entity, the system computes baseline HEI scores, retrieves candidate foods for intake recommendations, and estimates the HEI impact of simple substitutions or additions. A constrained RAG pipeline instantiated with a pretrained OpenAI LLM generates personalized recommendations and sources based on nutrient profiles and HEI contributions. The simulation results showed a mean HEI improvement of 6.45, with the proportion of users with an HEI over 50 increasing from 45.12 to 61.26. Quantile analysis revealed consistent upward shifts across the HEI distribution. Our findings suggest that the proposed LLM-RAG-based AI systems can support more precise, explainable, and personalized nutrition guidance to improve diet quality.
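
The retrieve-then-estimate step described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the three-dimensional vectors stand in for FPED-derived text embeddings, the per-food `score` is a toy stand-in for a food's HEI contribution (real HEI scoring is computed over a whole diet from several components), and `rank_substitutions` is a hypothetical helper, not the authors' pipeline.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_substitutions(current, candidates, top_k=2):
    """Return the candidates most similar to `current` that raise the toy score.

    Each food is a dict with hypothetical keys: name, vec (embedding), score.
    """
    scored = []
    for food in candidates:
        gain = food["score"] - current["score"]
        if gain > 0:  # keep only substitutions that improve the toy score
            scored.append((cosine(current["vec"], food["vec"]), gain, food["name"]))
    scored.sort(reverse=True)  # most similar first
    return [(name, gain) for _, gain, name in scored[:top_k]]


# Toy data (made up for the example, not taken from NHANES or FPED).
current = {"name": "white bread", "vec": [0.9, 0.1, 0.0], "score": 2.0}
candidates = [
    {"name": "whole wheat bread", "vec": [0.8, 0.2, 0.1], "score": 6.0},
    {"name": "soda", "vec": [0.0, 0.1, 0.9], "score": 0.5},
    {"name": "oatmeal", "vec": [0.6, 0.5, 0.1], "score": 7.0},
]
suggestions = rank_substitutions(current, candidates)
print(suggestions)
```

Ranking by similarity first and filtering by score gain keeps suggestions close to what the user already eats; a real system would presumably recompute the full HEI after each candidate swap.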

2605.15203 2026-05-18 cs.IR cs.AI cs.MA

Agent4POI: Agentic Context-Conditioned Affordance Reasoning for Multimodal Point-of-Interest Recommendation

Jinze Wang, Yangchen Zeng, Tiehua Zhang, Lu Zhang, Yuze Liu, Yongchao Liu, Xingjun Ma, Zhu Sun

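Affiliations: Tongji University; Swinburne University of Technology; Southeast University; Chengdu University of Information Technology; Fudan University; Singapore University of Technology and Design

AI summary: This paper proposes Agent4POI, a point-of-interest (POI) recommendation framework whose core idea is to generate context-conditioned multimodal representations dynamically at recommendation time rather than rely on precomputed static POI embeddings. A four-phase LLM agent generates dynamic, context-specific affordance queries from the situational context, reasons across images, reviews, and metadata, and produces a structured, uncertainty-aware affordance representation, improving recommendation accuracy and adaptability. Experiments show that Agent4POI outperforms existing methods across multiple benchmark datasets and evaluation settings, particularly under cold-start and context-shift conditions.

Abstract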

We introduce Agent4POI, the first POI recommendation framework that generates context-conditioned multimodal representations at recommendation time, rather than relying on static POI embeddings pre-computed independently of context. Existing multimodal systems encode each POI once as a static embedding, a design that precludes reasoning about why the same cafe affords solo work on Monday but group celebration on Friday evening. We formally prove that no pre-computed encoder can satisfy context-sensitive ranking under standard bilinear scoring, motivating inference-time item-side representation. Agent4POI inverts this computation: given a situational context, a four-phase LLM agent generates dynamic, context-specific affordance queries (Phase 1) and executes a five-step cross-modal chain-of-thought over image, review, and metadata evidence (Phase 2). The resulting uncertainty-aware affordance representation is grounded in Gibsonian affordance theory. These cross-modal verdicts form a structured, uncertainty-adjusted affordance representation (Phase 3), which is aligned with user preferences via a semantic caching system for low-latency ranking (Phase 4). On three POI benchmarks and three evaluation configurations (standard, cold-start, context-shift), Agent4POI achieves a 23.2% relative gain over the strongest baseline and degrades by only 7.5% under context-shift versus 16-17% for the strongest baselines. In cold-start scenarios, Agent4POI outperforms the best content-based baseline by up to 2.4x, whereas ID-based methods fail to generalize.

2605.14859 2026-05-18 cs.CR cs.AI

Do Coding Agents Understand Least-Privilege Authorization?

Zheng Yan, Jingxiang Weng, Charles Chen, Dengyun Peng, Ethan Qin, Jiannan Guan, Jinhao Liu, Qiming Yu, Yixin Yuan, Fanqing Meng, Carl Che, Mengkang Hu

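Affiliations: Evolvent AI Research Team

AI summary: As coding agents increasingly access system shells, code repositories, and user files, least-privilege authorization becomes a prerequisite for safe deployment. This paper studies whether current models can infer the permission boundary themselves, introduces the permission-boundary inference task, and builds AuthBench, a benchmark of 120 realistic terminal tasks. It finds that existing models often omit required permissions or grant unnecessary ones, and that more inference-time reasoning does not fix this. The authors propose Sufficiency-Tightness Decomposition, which generates a coverage-oriented policy by forward-simulating the task and then audits each granted permission, raising success on sensitive tasks and lowering attack success.

Abstract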

As coding agents gain access to shells, repositories, and user files, least-privilege authorization becomes a prerequisite for safe deployment: an agent should receive enough authority to complete the task, without unnecessary authority that exposes sensitive surfaces. To study whether current models can infer this boundary themselves, we first introduce permission-boundary inference, where a model maps a task instruction and terminal environment to a file-level read/write/execute policy, and AuthBench, a benchmark of 120 realistic terminal tasks with human-reviewed permission labels and executable validators for utility and attack outcomes. AuthBench shows that authorization is not a simple conservative-versus-permissive calibration problem: frontier models often omit permissions required by the execution chain while also granting unused or sensitive accesses. Increasing inference-time reasoning does not resolve this mismatch. Instead, each model moves toward a model-specific authorization attractor: more reasoning makes it more consistent in its own failure mode, whether broad-but-exposed or tight-but-brittle. This suggests that direct policy generation is the bottleneck, because a single generation must both discover all necessary accesses and reject all unnecessary ones. We therefore propose Sufficiency-Tightness Decomposition, which first generates a coverage-oriented policy by forward-simulating the task and then audits each granted entry for grounding and sensitivity. Across tested models, this decomposition improves sensitive-task success by up to 15.8% on tightness-biased models while reducing attack success across all evaluated models.
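
The two failure modes named in this abstract, omitted permissions and unnecessary grants, can be made concrete with a small set comparison. This is a sketch under stated assumptions: a policy is modeled as a set of `(path, mode)` pairs, and `audit_policy`, the example paths, and the `sensitive` set are hypothetical illustrations, not AuthBench's labels or validators.

```python
def audit_policy(granted, required, sensitive):
    """Compare a granted file-level policy with the accesses a task needs.

    granted, required: sets of (path, mode) pairs, mode in {"read", "write", "execute"}.
    sensitive: set of paths considered sensitive (hypothetical labeling).
    """
    missing = required - granted   # sufficiency failures: the task would break
    unused = granted - required    # tightness failures: authority nobody needs
    exposed = {(path, mode) for path, mode in unused if path in sensitive}
    return {"missing": missing, "unused": unused, "exposed": exposed}


# Toy task: edit a source file and run its tests.
required = {
    ("src/main.py", "read"),
    ("src/main.py", "write"),
    ("run_tests.sh", "execute"),
}
# A policy that is both too tight (no write) and too broad (reads a private key).
granted = {
    ("src/main.py", "read"),
    ("run_tests.sh", "execute"),
    ("~/.ssh/id_rsa", "read"),
}
sensitive = {"~/.ssh/id_rsa"}

report = audit_policy(granted, required, sensitive)
print(report)
```

A single policy can fail in both directions at once, which is why a one-dimensional conservative-versus-permissive knob does not capture the problem.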

2605.14716 2026-05-18 cs.GR cs.CV cs.LG

AnchorRoute: Human Motion Synthesis with Interval-Routed Sparse Control

Pengcheng Fang, Tengjiao Sun, Dongjie Fu, Xiaoyu Zhan, Yanwen Guo, Hansung Kim, Xiaohao Cai

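Affiliations: University of Southampton; Mogo AI Ltd.; Nanjing University

AI summary: AnchorRoute is a sparse-anchor human motion synthesis framework that generates full-body motion from a small number of user-specified root positions, planar trajectories, or body-point targets. At generation time, it turns the anchors into condition features and injects them into a pretrained diffusion model, preserving generation quality while learning sparse spatial control. After generation, anchor residuals define refinement intervals and soft-token updates refine the motion, so generation and refinement share one anchor scaffold. Experiments show that AnchorRoute outperforms existing methods across control types and produces motion that adheres more closely to the anchor constraints.

Abstract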

Sparse anchors provide a compact interface for human motion authoring: users specify a few root positions, planar trajectory samples, or body-point targets, while the system synthesizes the full-body motion that completes the under-specified intent. We present AnchorRoute, a sparse-anchor motion synthesis framework that uses anchors as a shared scaffold for both generation and refinement. Before generation, AnchorRoute converts sparse anchors into anchor-condition features and injects the resulting condition memory into a frozen Transition Masked Diffusion prior through AnchorKV and dual-context conditioning. This preserves the generation quality of the pretrained text-to-motion prior while learning sparse spatial control. After generation, the same anchors are evaluated as residuals: their timestamps define refinement intervals, and their residuals determine where correction should be concentrated. RouteSolver then refines the motion by projecting soft-token updates onto anchor-defined piecewise-affine interval bases. This couples generation-time anchor conditioning with residual-routed refinement under one anchor scaffold. AnchorRoute supports root-3D, planar-root, and body-point control within the same formulation. In benchmark evaluations, AnchorRoute outperforms prior sparse-control methods under the sparse keyjoint protocol and consistently improves anchor adherence across control families. The results show that the learned anchor-conditioned generator and RouteSolver refinement are complementary: the generator preserves text-motion quality, while RouteSolver provides a controllable path toward stronger anchor adherence.

2605.13143 2026-05-18 cs.IT cs.LG math.IT

On the Generalization of Knowledge Distillation: An Information-Theoretic View

Bingying Li, Haiyun He

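Affiliations: Internet of Things Thrust, Information Hub, The Hong Kong University of Science and Technology (Guangzhou)

AI summary: This paper studies the generalization of knowledge distillation from an information-theoretic perspective, proposing a theoretical framework based on coupled stochastic processes and defining a "distillation divergence" that measures the discrepancy between teacher and student. Within the framework, the authors derive upper and lower bounds on the student's generalization relative to the teacher's generalization gap, and further give a loss-sharpness-aware bound showing how the teacher's local flatness affects generalization. In a linear Gaussian case study, the distillation divergence decomposes into bias, variance, and rank-bottleneck costs, offering interpretable guidance for distillation design.

Comments 18 pages

Abstract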

Knowledge distillation is widely used to improve generalization in practice, yet its theoretical understanding remains elusive. In the standard distillation setting, a teacher model provides soft predictions to guide the training of a student model. We model teacher and student training as coupled stochastic processes and introduce a distillation divergence, defined as the Kullback-Leibler divergence between these two stochastic kernels. Within this framework, we derive two generalization bounds for the student model relative to the teacher's generalization gap: an upper bound under a sub-Gaussian assumption via algorithmic stability, and a lower bound under a central condition with sharper dependence on the distillation divergence. We further develop a loss-sharpness-aware bound with an explicit tightness regime, showing that the teacher's local flatness can strictly tighten the bound. Additionally, in a linear Gaussian case study, the distillation divergence admits an interpretable decomposition into bias, variance, and rank-bottleneck costs, yielding practical guidance for distillation design.
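
For readers unfamiliar with the standard distillation setting mentioned here, the sketch below computes the usual soft-label objective: the KL divergence between temperature-softened teacher and student predictions. Note the assumption: this is the common prediction-level loss, not the paper's distillation divergence, which is defined between the stochastic kernels of the two training processes. The function names and the temperature value are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL between softened teacher and student predictions for one example."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return kl_divergence(p, q)


teacher = [4.0, 1.0, 0.5]
aligned_student = [4.0, 1.0, 0.5]
confused_student = [0.5, 1.0, 4.0]

loss_aligned = distillation_loss(teacher, aligned_student)
loss_confused = distillation_loss(teacher, confused_student)
print(loss_aligned, loss_confused)
```

A higher temperature flattens both distributions, so the student is also pushed to match the teacher's relative ranking of the non-top classes.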

2605.09994 2026-05-18 cs.DC cs.LG

BatchWeave: A Consistent Object-Store-Native Data Plane for Large Foundation Model Training

Ting Sun, Junjie Zhang, Xiao Yan, Songxin Zhang, Zhuoyang Song, Jingyi Xi, Zunyao Mao, Bingyi Jing, Jiaxing Zhang, Zejian Xie

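Affiliations: Lionrock AI Lab, China Merchants Group, Hong Kong, China; Wuhan University, Wuhan, China; The Chinese University of Hong Kong, Shenzhen, China

AI summary: As large foundation model (LFM) training evolves, the data pipeline must change from a static ingestion layer into a component that co-evolves with training. Existing systems fall short on failure isolation and batch-level semantics. BatchWeave is an object-store-native training data plane that coordinates batch publication and recovery through versioned manifests and conditional object writes. Its core mechanisms are the Transactional Global Batch (TGB), recovery and retention implemented directly in the storage layer, and a Decentralized Adaptive Commit algorithm that needs no inter-producer communication, improving training throughput and system reliability.

Abstract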

Modern Large Foundation Model (LFM) training has transformed the data pipeline from a static ingestion layer into a dynamic component that must co-evolve with the training process. Existing systems are ill-equipped: colocated dataloaders offer no failure isolation, while message queue-based disaggregated dataloaders operate on a record/offset abstraction that cannot express the batch-level semantics required by distributed training. We present BatchWeave, an object-store-native training data plane for distributed LFM training. BatchWeave uses versioned manifests and conditional object writes to coordinate batch publication, recovery, and lifecycle management. First, it introduces the Transactional Global Batch (TGB), which builds on versioned-manifest ACID storage semantics and extends them with training-specific consistency, including atomic all-rank batch visibility, a globally ordered step sequence, checkpoint-aligned lifecycle management, and end-to-end exactly-once recovery. Second, it realizes recovery and retention directly in the storage layer, by durably persisting producer state through the commit protocol and tying reclamation to distributed checkpoint state. Third, its Decentralized Adaptive Commit (DAC) algorithm sustains stable ingestion throughput as the manifest grows, without any inter-producer communication. Evaluations on large-scale multimodal pre-training and SFT workloads using 64 GPUs show that BatchWeave outperforms colocated dataloader throughput while providing full failure isolation, outperforms Apache Kafka in ingestion throughput, and achieves lower consumer read latency than Kafka.
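
The coordination primitive this abstract relies on, a conditional write against a versioned manifest, can be sketched with an in-memory stand-in. This is a minimal illustration under assumptions: `ObjectStore`, `put_if_version`, and `publish_batch` are hypothetical names, and the retry loop shows generic optimistic concurrency rather than BatchWeave's TGB or DAC protocols.

```python
class ObjectStore:
    """In-memory stand-in for an object store that supports conditional writes."""

    def __init__(self):
        self._objects = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); a missing key reads as version 0."""
        return self._objects.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        """Write only if the stored version still equals expected_version."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return False  # another writer committed first
        self._objects[key] = (current_version + 1, value)
        return True


def publish_batch(store, batch_id, max_retries=5):
    """Append batch_id to the manifest with optimistic concurrency control."""
    for _ in range(max_retries):
        version, manifest = store.get("manifest")
        batches = list(manifest or [])
        if batch_id in batches:
            return version  # already published, so a retry is a no-op
        batches.append(batch_id)
        if store.put_if_version("manifest", version, batches):
            return version + 1
    raise RuntimeError("could not commit manifest after retries")


store = ObjectStore()
v1 = publish_batch(store, "step-0")
v2 = publish_batch(store, "step-1")
stale_ok = store.put_if_version("manifest", 1, ["rogue"])  # stale version is rejected
v_again = publish_batch(store, "step-0")                   # idempotent re-publish
print(v1, v2, stale_ok, v_again, store.get("manifest"))
```

Because a losing writer simply re-reads and retries, producers need no channel to each other; the store's version check is the only point of agreement.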

2605.09033 2026-05-18 cs.CR cs.AI

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

Yang Luo, Zifeng Kang, Tiantian Ji, Xinran Liu, Yong Liu, Shuyu Li, Lingyun Peng

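Affiliations: Key Laboratory of Trustworthy Distributed Computing and Service (MoE); Beijing University of Posts and Telecommunications; Zhongguancun Laboratory

AI summary: This paper proposes ShadowMerge, a new poisoning attack on graph-based agent memory that exploits relation-channel conflicts to influence agent behavior. The attack crafts a malicious relation that shares the same query-activated anchor and relation channel as legitimate relations but carries a conflicting value, injecting harmful information with negligible impact on unrelated benign tasks. Experiments show that ShadowMerge achieves an average attack success rate of 93.8% across several real-world datasets, well above existing methods, and that current defenses are insufficient against this kind of attack.

Comments Preprint. Corresponding authors: Zifeng Kang and Tiantian Ji. Code is available at https://anonymous.4open.science/status/ShadowMerge-033C

Abstract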

Graph-based agent memory is increasingly used in LLM agents to support structured long-term recall and multi-hop reasoning, but it also creates a new poisoning surface: an attacker can inject a crafted relation into graph memory so that it is later retrieved and influences agent behavior. Existing agent-memory poisoning attacks mainly target flat textual records and are ineffective in graph-based memory because malicious relations often fail to be extracted, merged into the target anchor neighborhood, or retrieved for the victim query. We present SHADOWMERGE, a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. Its key insight is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. To realize this, we design AIR, a pipeline that converts the conflict into an ordinary interaction that can be extracted, merged, and retrieved by the graph-memory system. We evaluate SHADOWMERGE on Mem0 and three public real-world datasets: PubMedQA, WebShop, and ToolEmu. SHADOWMERGE achieves 93.8% average attack success rate, improving the best baseline by 50.3 absolute points, while having negligible impact on unrelated benign tasks. Mechanism studies show that SHADOWMERGE overcomes the three key limitations of existing agent-memory poisoning attacks, and defense analysis shows that representative input-side defenses are insufficient to mitigate it. We have responsibly disclosed our findings to affected graph-memory vendors and open sourced SHADOWMERGE.

2605.02651 2026-05-18 cs.DL cs.LG

ARA: Agentic Reproducibility Assessment For Scalable Support Of Scientific Peer-Review

Kevin Riehl, Andres L. Marin, Nikofors Zacharof, Fan Wu, Patrick Langer, Robert Jakob, Anastasios Kouvelas, Georgios Fontaras, Michail A. Makridis

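Affiliations: ETH Zürich, IVT & Agentic Systems Lab (ASL); European Commission, Joint Research Centre; University of Konstanz; Ideas Forward

AI summary: As modern research output grows in scale and complexity, scientific peer review struggles to assess reproducibility. ARA (Agentic Reproducibility Assessment) formalizes reproducibility assessment as a structured reasoning task over scientific documents: it builds a directed workflow graph of sources, methods, experiments, and outputs and evaluates it using structural and content-based features. Experiments show that ARA generalizes across domains and LLMs and that its accuracy on several benchmarks exceeds existing methods, indicating its potential to support peer review at scale.

Abstract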

Scientific peer review increasingly struggles to assess reproducibility at the scale and complexity of modern research output. Evaluating reproducibility requires reconstructing experimental dependencies, methodological choices, data flows, and result-generating procedures, which often exceeds what human reviewers can provide. Agentic Reproducibility Assessment (ARA) formalizes reproducibility assessment as a structured reasoning task over scientific documents. Given a paper, ARA extracts a directed workflow graph linking sources, methods, experiments, and outputs, then evaluates its reconstructability using structural and content-based scores for reproducibility assessments. Experiments on 213 ReScience C articles - the largest cross-domain benchmark of human-validated computational reproducibility studies considered to date - demonstrate ARA's generalizability and consistent workflow reconstruction and assessment across LLMs, model temperatures, and scientific domains. ARA achieves ~61% accuracy on three benchmarks, and the highest accuracy reported on ReproBench (60.71% vs. 36.84%) and GoldStandardDB (61.68% vs. 43.56%), highlighting its potential to complement human review at scale and enabling next-generation peer review. Code and Data available: https://github.com/AndresLaverdeMarin/agentic_reproducibility_assessment.
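
To give a feel for what a structural check on a directed workflow graph might look like, the sketch below measures the fraction of reported outputs that can be traced back to a declared source. This is an assumption-laden toy: the adjacency-dict graph, the node names, and `traceable_fraction` are hypothetical, and the abstract does not specify ARA's actual structural or content-based scores.

```python
from collections import deque


def reachable_from(graph, starts):
    """Breadth-first search over an adjacency dict; returns every reachable node."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def traceable_fraction(graph, sources, outputs):
    """Fraction of outputs reachable from at least one declared source."""
    if not outputs:
        return 0.0
    reached = reachable_from(graph, sources)
    return sum(1 for out in outputs if out in reached) / len(outputs)


# Toy workflow: table_1 is fully traceable, figure_2 comes from an undocumented script.
graph = {
    "dataset": ["preprocessing"],
    "preprocessing": ["training"],
    "training": ["table_1"],
    "unknown_script": ["figure_2"],
}
score = traceable_fraction(graph, sources=["dataset"], outputs=["table_1", "figure_2"])
print(score)
```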

2605.01970 2026-05-18 cs.CR cs.AI

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

Debeshee Das, Julien Piet, Darya Kaviani, Luca Beurer-Kellner, Florian Tramèr, David Wagner

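Affiliations: ETH Zürich

AI summary: This paper studies the Trojan Hippo attack on LLM agents, which plants a dormant payload in an agent's long-term memory that activates when the user discusses sensitive topics, enabling data exfiltration. It presents a dynamic evaluation framework for systematically assessing memory architectures and defenses, and demonstrates high attack success rates (up to 85-100%) on an email assistant. It also analyzes several defenses, revealing a security-utility tradeoff that matters for practical defense deployment.

Abstract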

Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We characterize the Trojan Hippo attack, a class of persistent memory attacks that operates in a more realistic threat model than prior memory poisoning work: the attacker plants a dormant payload into an agent's long-term memory via a single untrusted tool call (e.g., a crafted email), which activates only when the user later discusses sensitive topics such as finance, health, or identity, and exfiltrates high-value personal data to the attacker. While anecdotal demonstrations of such attacks have appeared against deployed systems, no prior work systematically evaluates them across heterogeneous memory architectures and defenses. We introduce a dynamic evaluation framework comprising two components: (1) an OpenEvolve-based adaptive red-teaming benchmark that stress-tests defenses and memory backends against continuously refined attacks, and (2) the first capability-aware security/utility analysis for persistent memory systems, enabling principled reasoning about defense deployment across different usage profiles. Instantiated on an email assistant across four memory backends (explicit tool memory, agentic memory, RAG, and sliding-window context), Trojan Hippo achieves up to 85-100% ASR against current frontier models from OpenAI and Google, with planted memories successfully activating even after 100 benign sessions. We evaluate four memory-system defenses inspired by basic security principles, finding they substantially reduce attack success rates (to as low as 0-5%), though at utility costs that vary widely with task requirements. Because of this substantial security-utility tradeoff, the effective real-world deployment of defenses remains an open challenge, which our evaluation framework is specifically designed to address.

2605.00424 2026-05-18 cs.CR cs.AI cs.MA cs.SE

Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

Alfredo Metere

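Affiliations: Enclawed, LLC

AI summary: This paper addresses how to verify the trustworthiness of skills (structured instruction packages that augment an LLM) in human-in-the-loop agent runtimes. The author proposes a trust schema and a biconditional correctness criterion, requiring that a skill be verified before it is trusted rather than inferring trust from a signature or a registry of origin. With explicit verification levels and a capability-gating policy, human intervention is triggered only for unverified skills, making the system more scalable and sustainable. The contribution is general and requires no model retraining or proprietary infrastructure.

Abstract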

Agent skills - structured packages of instructions, scripts, and references that augment a large language model (LLM) without modifying the model itself - have moved from convenience to first-class deployment artifact. The runtime that loads them inherits the same problem package managers and operating systems have always faced: a piece of content claims a behavior; the runtime must decide whether to believe it. We argue this paper's central thesis up front: a skill is untrusted code until it is verified, and the runtime that loads it must enforce that default rather than infer trust from a signature, a clearance, or a registry of origin. Without skill verification, a human-in-the-loop (HITL) gate must fire on every irreversible call - which is operationally untenable and degrades into rubber-stamping at any non-trivial scale. With skill verification treated as a separate, gated process, HITL fires only for what is unverified, and the system becomes sustainable. We give a trust schema that includes an explicit verification level on every skill manifest; a capability gate whose HITL policy is a function of that verification level; a biconditional correctness criterion that any candidate verification procedure must satisfy on an adversarial-ensemble exercise; and a portable runtime profile with ten normative guidelines abstracted from a working open-source reference implementation. The contribution is harness- and model-agnostic; nothing here requires retraining, fine-tuning, or proprietary infrastructure.
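
The idea of a capability gate whose HITL policy is a function of the verification level can be sketched in a few lines. The level names and the threshold below are assumptions made for illustration; the abstract does not list the schema's actual levels or the reference implementation's policy.

```python
from enum import IntEnum


class VerificationLevel(IntEnum):
    """Hypothetical verification levels recorded on a skill manifest."""
    UNVERIFIED = 0
    STATICALLY_CHECKED = 1
    FULLY_VERIFIED = 2


def requires_human_approval(level, irreversible):
    """Decide whether the human-in-the-loop gate fires for one call.

    Reversible calls pass through; irreversible calls need approval unless
    the skill has reached the top verification level (an assumed policy).
    """
    if not irreversible:
        return False
    return level < VerificationLevel.FULLY_VERIFIED


calls = [
    (VerificationLevel.UNVERIFIED, True),
    (VerificationLevel.STATICALLY_CHECKED, True),
    (VerificationLevel.FULLY_VERIFIED, True),
    (VerificationLevel.UNVERIFIED, False),
]
decisions = [requires_human_approval(level, irreversible) for level, irreversible in calls]
print(decisions)
```

Keeping the policy a pure function of the manifest's level makes the default explicit: a skill with no recorded verification is treated as untrusted.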

2604.14572 2026-05-18 cs.IR cs.AI cs.CL cs.MA

Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh

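Affiliations: Magellan Technology Research Institute

AI summary: This paper proposes Corpus2Skill, which distills an enterprise document corpus offline into a hierarchical skill directory so that an LLM can actively navigate the knowledge base when answering questions instead of passively retrieving. On an enterprise customer-support benchmark it improves answer quality and grounding over several RAG baselines. The study also identifies where navigation helps, namely single-domain corpora with a recoverable topical structure, and offers this as design guidance for knowledge-grounded systems.

Abstract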

Retrieval-Augmented Generation (RAG) grounds LLM responses in external evidence but treats the model as a passive consumer of search results, with no view of how the corpus is organized or what it has not yet seen. We present Corpus2Skill, which distills a document corpus offline into a hierarchical skill directory and lets an LLM agent navigate it at serve time, drilling from a bird's-eye view through progressively finer summaries down to documents, and backtracking when a branch is unproductive. On an enterprise customer-support benchmark, Corpus2Skill improves both answer quality and grounding over single-shot dense, hybrid, hierarchical-retrieval, and agentic RAG baselines at a moderate cost tradeoff. A ten-subset generalization study further shows that corpus navigation is not a universal replacement for retrieval: it consistently helps on single-domain corpora with a recoverable topical taxonomy, but flat retrieval remains preferable on open-domain factoid pools or homogeneous-tabular corpora that defeat top-level clustering. We characterize this scope distinction and discuss it as a design guideline for knowledge-grounded systems. Code is available at https://github.com/dukesun99/Corpus2Skill.
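
The drill-down-and-backtrack behavior described here can be sketched as a depth-first search guided by node summaries. This is an illustration under assumptions: in the paper an LLM agent chooses which branch to open, whereas this toy uses keyword overlap as a stand-in, and the tree, its summaries, and `navigate` are all hypothetical.

```python
def overlap(query, text):
    """Number of lowercase words shared by the query and a summary."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def navigate(node, query, path=()):
    """Descend through summaries toward a matching document.

    Returns the path of node names to the first matching leaf, or None.
    Returning None from a subtree is the backtracking step.
    """
    path = path + (node["name"],)
    children = node.get("children")
    if not children:
        return path if overlap(query, node["summary"]) > 0 else None
    # Try the most promising branches first (stable order on ties).
    ranked = sorted(children, key=lambda c: overlap(query, c["summary"]), reverse=True)
    for child in ranked:
        found = navigate(child, query, path)
        if found:
            return found
    return None


tree = {
    "name": "root",
    "summary": "customer support knowledge base",
    "children": [
        {
            "name": "billing",
            "summary": "billing invoices payments refund",
            "children": [
                {"name": "invoice-faq", "summary": "how to download invoices"},
                {"name": "payment-methods", "summary": "supported card payments"},
            ],
        },
        {
            "name": "account",
            "summary": "account login password",
            "children": [
                {"name": "password-reset", "summary": "steps to reset a forgotten password"},
            ],
        },
    ],
}

direct = navigate(tree, "download invoices")
backtracked = navigate(tree, "password refund")  # billing is tried first, then abandoned
miss = navigate(tree, "shipping")
print(direct, backtracked, miss)
```

The second query shows the failure mode that backtracking handles: a branch summary looks relevant, none of its documents match, and the search moves to the next branch.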

2603.16011 2026-05-18 cs.SE cs.AI cs.CL

FormulaCode: Evaluating Agentic Optimization on Large Codebases

Atharva Sehgal, James Hou, Akanksha Sarkar, Ishaan Mantripragada, Swarat Chaudhuri, Jennifer J. Sun, Yisong Yue

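Affiliations: The University of Texas at Austin; California Institute of Technology; Cornell University

AI summary: This paper introduces FormulaCode, a benchmark for evaluating LLM agents' ability to perform multi-objective optimization on large, real-world codebases. It is built from 957 performance bottlenecks mined from scientific Python repositories on GitHub, each paired with an expert-authored patch and many community-maintained performance workloads, allowing evaluation of optimization under correctness and performance constraints. Experiments show that state-of-the-art LLM agents still face significant challenges on repository-scale, multi-objective optimization.

Comments Preprint version

Abstract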

Large language model (LLM) coding agents increasingly operate at the repository level, motivating benchmarks that evaluate their ability to optimize entire codebases under realistic constraints. Existing code benchmarks largely rely on synthetic tasks, binary correctness signals, or single-objective evaluation, limiting their ability to assess holistic optimization behavior. We introduce FormulaCode, a benchmark for evaluating agentic optimization on large, real-world codebases with fine-grained, multi-objective performance metrics. FormulaCode comprises 957 performance bottlenecks mined from scientific Python repositories on GitHub, each paired with expert-authored patches and, on average, 264.6 community-maintained performance workloads per task, enabling holistic evaluation of the ability of LLM agents to optimize codebases under realistic correctness and performance constraints. Our evaluations reveal that repository-scale, multi-objective optimization remains a major challenge for frontier LLM agents. Project website at: https://formula-code.github.io

2603.13864 2026-05-18 cs.CR cs.CV

Inevitable Encounters: Backdoor Attacks Involving Lossy Compression

Qian Li, Yunuo Chen, Yuntian Chen

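Affiliations: Shanghai Jiao Tong University; Eastern Institute of Technology

AI summary: This paper studies how the lossy compression that is unavoidable in real-world data storage and transmission weakens backdoor attacks. Because trigger information embedded in images can be lost during compression, the authors propose two poisoning strategies tailored to lossy compression that keep trigger information recoverable after decompression. Experiments show that both strategies are effective across multiple compression schemes.

Abstract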

Real-world backdoor attacks often require poisoned datasets to be stored and transmitted before being used to compromise deep learning systems. However, in the era of big data, the inevitable use of lossy compression poses a fundamental challenge to invisible backdoor attacks. We find that triggers embedded in RGB images often become ineffective after the images are lossily compressed into binary bitstreams (e.g., JPEG files) for storage and transmission. As a result, the poisoned data lose its malicious effect after compression, causing backdoor injection to fail. In this paper, we highlight the necessity of explicitly accounting for the lossy compression process in backdoor attacks. This requires attackers to ensure that the transmitted binary bitstreams preserve malicious trigger information, so that effective triggers can be recovered in the decompressed data. Building on the region-of-interest (ROI) coding mechanism in image compression, we propose two poisoning strategies tailored to inevitable lossy compression. First, we introduce Universal Attack Activation, a universal method that uses sample-specific ROI masks to reactivate trigger information in binary bitstreams for learned image compression (LIC). Second, we present Compression-Adapted Attack, a new attack strategy that employs customized ROI masks to encode trigger information into binary bitstreams and is applicable to both traditional codecs and LIC. Extensive experiments demonstrate the effectiveness of both strategies.

2603.04459 2026-05-18 cs.CR cs.AI cs.SE

Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

Junjie Chu, Xinyue Shen, Ye Leng, Michael Backes, Yun Shen, Yang Zhang

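Affiliations: CISPA Helmholtz Center for Information Security; University of Waterloo; Flexera

AI summary: This paper systematically evaluates the code quality and runnability of 31 LLM safety benchmarks, with 382 non-benchmark papers as a control group. It finds that most benchmark code needs modification to run and that only a few repositories provide complete installation guides or ethical considerations. Benchmark adoption correlates with author prominence and code runnability rather than with code quality standards, pointing to a possible bias in how the community selects benchmarks. Some benchmarks also pose safety risks: they can serve as attack resources and undermine the reliability of safety evaluations.

Comments 24 pages. 19 figures

Abstract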

The rapid expansion of research in LLM safety presents challenges in tracking advancements, making benchmarks important evaluation infrastructures for identifying key trends and facilitating systematic comparisons. Yet no systematic assessment exists of their code quality and runnability, nor of what factors are associated with the community's adoption of certain benchmarks over others. To address this gap, we conduct a systematic measurement study of 31 LLM safety benchmarks (covering prompt injection, jailbreak, and hallucination) with 382 non-benchmark papers as a control group, combining automated static analysis, human runnability testing (220+ person-hours), and bibliometric analysis. We find that only 39% of benchmark repositories can run without modification, only 16% provide flawless installation guides, and a mere 6% include ethical considerations despite containing potentially harmful content. These deficiencies persist across the study period with no significant improvement. Analyzing adoption factors, we find that benchmark adoption correlates with author prominence and code runnability, but not with code quality standards such as Pylint score and maintainability, suggesting that the community's benchmark selection does not reward higher coding standards. Based on these results, we identify potential safety and reliability concerns. Some safety benchmark repositories openly expose harmful content, such as successful jailbreak responses, without any ethical warning or access control, effectively serving as unguarded attack resources. Furthermore, when benchmarks require ad-hoc modifications to run, downstream safety evaluations across different papers may not be comparable. We present case studies illustrating these concrete consequences and propose a targeted checklist to help benchmark contributors improve code quality, documentation, and ethical practices.

2602.14342 2026-05-18 math.ST cs.DS cs.LG math.PR stat.TH

High-accuracy log-concave sampling with stochastic queries

Fan Chen, Sinho Chewi, Constantinos Daskalakis, Alexander Rakhlin

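Affiliations: MIT; Yale University

AI summary: This paper studies how to obtain high-accuracy guarantees for log-concave sampling, showing that stochastic gradients with subexponential tails suffice for iteration and query complexities scaling as $\mathrm{poly}\log(1/\delta)$. This contrasts with convex optimization, where stochasticity in the gradient requires $\mathrm{poly}(1/\delta)$ queries. The paper also gives an information-theoretic argument that light-tailed stochastic gradients are necessary for high accuracy, extends similar guarantees to stochastic zeroth-order queries, and improves the complexity of sampling from finite-sum potentials.

Abstract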

We show that high-accuracy guarantees for log-concave sampling -- that is, iteration and query complexities which scale as $\mathrm{poly}\log(1/\delta)$, where $\delta$ is the desired target accuracy -- are achievable using stochastic gradients with subexponential tails. Notably, this exhibits a separation with the problem of convex optimization, where stochasticity (even additive Gaussian noise) in the gradient oracle incurs $\mathrm{poly}(1/\delta)$ queries. We also give an information-theoretic argument that light-tailed stochastic gradients are necessary for high accuracy: for example, in the bounded variance case, we show that the minimax-optimal query complexity scales as $\Theta(1/\delta)$. Our framework also provides similar high accuracy guarantees under stochastic zeroth order (value) queries, and an improved complexity result for sampling from finite-sum potentials.

2602.14092 2026-05-18 eess.SY cs.RO cs.SY

Simultaneous State Estimation and Online Model Learning in a Soft Robotic System

Jan-Hendrik Ewering, Max Bartholdt, Simon F. G. Ehlers, Niklas Wahlström, Thomas B. Schön, Thomas Seel

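Affiliations: Department of Information Technology, Uppsala University; Vascular Surgery, Hannover Medical School

AI summary: This paper studies simultaneous state estimation and online model learning in a soft robotic system. Using a gray-box system identification tool, the method needs only a nominal constant-curvature robot model and measurements of the base reactions (e.g., base forces) to estimate the robot's current pose while learning a bending-stiffness model. A marginalized particle filter couples the constant-curvature model with a Gaussian process model, improving prediction accuracy and overall model quality, as validated on a real soft robot.

Comments 8 pages, 3 figures, 2 tables, contribution to the International Conference on Information Fusion 2026

Abstract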

Operating complex real-world systems, such as soft robots, can benefit from precise predictive control schemes that require accurate state and model knowledge. This knowledge is typically not available in practical settings and must be inferred from noisy measurements. In particular, it is challenging to simultaneously estimate unknown states and learn a model online from sequentially arriving measurements. In this paper, we show how a recently proposed gray-box system identification tool enables the estimation of a soft robot's current pose while at the same time learning a bending stiffness model. For estimation and learning, we only need a nominal constant-curvature robot model and measurements of the robot's base reactions (e.g., base forces). The estimation scheme -- relying on a marginalized particle filter -- allows us to conveniently interface nominal constant-curvature equations with a Gaussian Process (GP) bending stiffness model to be learned. This, in contrast to estimation via a random walk over stiffness values, enables prediction of bending stiffness and improves overall model quality. We demonstrate, using a real-world soft robot, that the method learns a bending-stiffness model online while accurately estimating the robot's pose. Notably, reduced error in multi-step forward predictions indicates that the learned bending-stiffness GP improves overall model quality.

2602.12292 2026-05-18 eess.SP cs.LG

A Gradient Boosted Mixed-Model Machine Learning Framework for Vessel Speed in the U.S. Arctic

Mauli Pant, Linda Fernandez, Indranil Sahoo

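Affiliations * Integrative Life Sciences Doctoral Program, Center for Integrative Life Sciences Education, Virginia Commonwealth University; School of Life Sciences and Sustainability, Department of Economics, Virginia Commonwealth University; Department of Statistical Sciences and Operations Research, Virginia Commonwealth University

AI summary This paper studies how environmental and operational conditions influence vessel speed in the U.S. Arctic. Using Automatic Identification System (AIS) data from 2010 to 2019, the authors apply a two-stage machine learning framework that first models the probability of speed greater than zero and then models speed conditional on being positive. Gradient boosted decision trees with random effects capture nonlinear environmental responses while accounting for repeated observations. Distance to coast and bathymetric depth are the dominant determinants of vessel speed, while wind and sea ice effects are modest.

Abstract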

Understanding how environmental and operational conditions influence vessel speed is crucial for characterizing navigational conditions in the Arctic. We analyzed Automatic Identification System (AIS) data from 2010-2019 to examine vessel speed over ground (SOG). Over half of the AIS records showed zero SOG, and treating zero and positive SOG as a single continuous process can obscure important patterns. We therefore applied a two-stage machine learning framework, first modeling the probability of SOG greater than zero and then modeling SOG conditional on being positive. AIS observations were integrated with sea ice concentration, course over ground, wind, bathymetric depth, distance to coast, vessel group, and navigational status. Gradient boosted decision trees with random effects captured nonlinear environmental responses while accounting for repeated observations. The positive SOG classifier achieved strong discrimination (AUC = 0.85), while the conditional speed model explained approximately 77 percent of out-of-fold variance. SHAP values quantified covariate effects by decomposing model predictions into additive contributions from individual variables. Distance to coast and bathymetric depth were dominant determinants of both the likelihood and magnitude of vessel speed, while changes in course, vessel group, and navigational status introduced secondary variation. Wind and sea ice effects were modest. Together, these results empirically characterize Arctic vessel operating regimes relevant to speed management and corridor-level assessment.
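The two-stage structure described in this abstract can be illustrated with a toy hurdle model. This is only a minimal sketch under strong assumptions: it swaps the gradient boosted trees with random effects for per-group frequencies and means, and the records and the names `fit_hurdle` and `expected_sog` are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def fit_hurdle(records):
    """Fit a toy two-stage (hurdle) model from (group, sog) pairs.

    Stage 1: P(SOG > 0 | group), estimated as a frequency.
    Stage 2: E[SOG | SOG > 0, group], estimated as a mean over positive records.
    """
    counts = defaultdict(int)
    positives = defaultdict(list)
    for group, sog in records:
        counts[group] += 1
        if sog > 0:
            positives[group].append(sog)
    model = {}
    for group, n in counts.items():
        pos = positives[group]
        p_moving = len(pos) / n
        mean_speed = sum(pos) / len(pos) if pos else 0.0
        model[group] = (p_moving, mean_speed)
    return model


def expected_sog(model, group):
    """Combine both stages: E[SOG] = P(SOG > 0) * E[SOG | SOG > 0]."""
    p_moving, mean_speed = model[group]
    return p_moving * mean_speed


# Made-up illustrative records: (vessel group, speed over ground in knots).
records = [
    ("cargo", 0.0), ("cargo", 0.0), ("cargo", 10.0), ("cargo", 14.0),
    ("fishing", 0.0), ("fishing", 4.0),
]
model = fit_hurdle(records)
```

Keeping the two stages separate means the many zero-speed records inform only the first stage, so they do not drag down the estimate of speed while moving.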

2602.06824 2026-05-18 math.OC cs.LG

RanSOM: Second-Order Momentum with Randomized Scaling for Constrained and Unconstrained Optimization

El Mahdi Chayti

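Affiliations * Machine Learning and Optimization Laboratory (MLO)

AI summary This paper proposes RanSOM, a unified optimization framework for constrained and unconstrained problems that removes the curvature-induced bias of momentum methods in stochastic settings. It replaces deterministic step sizes with random steps drawn from distributions with mean $η_t$ and uses Stein-type identities to obtain an exact, unbiased estimate of the momentum bias from a single Hessian-vector product, avoiding auxiliary sampling and restrictive smoothness assumptions. Theoretical analysis shows that RanSOM attains the optimal $\mathcal{O}(ε^{-3})$ convergence rate under standard bounded noise and optimal rates under heavy-tailed noise.

Abstract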

Momentum methods, such as Polyak's Heavy Ball, are the standard for training deep networks but suffer from curvature-induced bias in stochastic settings, limiting convergence to suboptimal $\mathcal{O}(ε^{-4})$ rates. Existing corrections typically require expensive auxiliary sampling or restrictive smoothness assumptions. We propose \textbf{RanSOM}, a unified framework that eliminates this bias by replacing deterministic step sizes with randomized steps drawn from distributions with mean $η_t$. This modification allows us to leverage Stein-type identities to compute an exact, unbiased estimate of the momentum bias using a single Hessian-vector product computed jointly with the gradient, avoiding auxiliary queries. We instantiate this framework in two algorithms: \textbf{RanSOM-E} for unconstrained optimization (using exponentially distributed steps) and \textbf{RanSOM-B} for constrained optimization (using beta-distributed steps to strictly preserve feasibility). Theoretical analysis confirms that RanSOM recovers the optimal $\mathcal{O}(ε^{-3})$ convergence rate under standard bounded noise, and achieves optimal rates for heavy-tailed noise settings ($p \in (1, 2]$).
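To make the role of the randomized step concrete, here is one Stein-type identity for an exponentially distributed step. It is a standard integration-by-parts fact; whether it is exactly the form used in the paper is an assumption, and the symbols $g$, $x$, and $d$ are introduced here for illustration. Let $\eta$ be exponentially distributed with mean $\eta_t$, and let $g$ be differentiable with $g(s)e^{-s/\eta_t} \to 0$ as $s \to \infty$. Then

$$
\mathbb{E}[g'(\eta)]
= \int_0^\infty g'(s)\,\frac{1}{\eta_t}e^{-s/\eta_t}\,ds
= -\frac{g(0)}{\eta_t} + \frac{1}{\eta_t}\,\mathbb{E}[g(\eta)],
\qquad\text{so}\qquad
\mathbb{E}[g(\eta)] - g(0) = \eta_t\,\mathbb{E}[g'(\eta)].
$$

Applying this componentwise to $g(s) = \nabla f(x + s\,d)$ for a fixed point $x$ and direction $d$ gives

$$
\mathbb{E}\big[\nabla f(x + \eta d)\big] - \nabla f(x) = \eta_t\,\mathbb{E}\big[\nabla^2 f(x + \eta d)\,d\big],
$$

so the expected change in the gradient along a randomized step equals $\eta_t$ times the expectation of a single Hessian-vector product evaluated at the randomized point, with no Taylor remainder term.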

2602.01568 2026-05-18 cs.GT cs.RO

Efficiently Solving Mixed-Hierarchy Games with Quasi-Policy Approximations

Hamzah Khan, Dong Ho Lee, Jingqi Li, Tianyu Qiu, Christian Ellis, Jesse Milzman, Wesley Suttle, David Fridovich-Keil

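Affiliations * University of Texas at Austin; U.S. Army Research Laboratory

AI summary This paper studies multi-robot games with mixed hierarchical structure, in which each robot acts as a Stackelberg leader over its subtree while robots in different branches interact via Nash equilibria. To handle the high-order policy derivatives that make these games hard to solve, the authors propose a quasi-policy approximation and an inexact Newton method for the resulting approximate KKT systems, and prove local exponential convergence for games with non-quadratic objectives and nonlinear constraints. The method is evaluated in hardware and simulated experiments, demonstrating real-time convergence for complex mixed-hierarchy structures.

Abstract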

Multi-robot coordination often exhibits hierarchical structure, with some robots' decisions depending on the planned behaviors of others. While game theory provides a principled framework for such interactions, existing solvers struggle to handle mixed information structures that combine simultaneous (Nash) and hierarchical (Stackelberg) decision-making. We study N-robot forest-structured mixed-hierarchy games, in which each robot acts as a Stackelberg leader over its subtree while robots in different branches interact via Nash equilibria. We derive the Karush-Kuhn-Tucker (KKT) first-order optimality conditions for this class of games and show that they involve increasingly high-order derivatives of robots' best-response policies as the hierarchy depth grows, rendering a direct solution intractable. To overcome this challenge, we introduce a quasi-policy approximation that removes higher-order policy derivatives and develop an inexact Newton method for efficiently solving the resulting approximated KKT systems. We prove local exponential convergence of the proposed algorithm for games with non-quadratic objectives and nonlinear constraints. The approach is implemented in a highly optimized Julia library (MixedHierarchyGames.jl) and evaluated in hardware and simulated multi-agent experiments, demonstrating real-time convergence for complex mixed-hierarchy information structures.

2601.23030 2026-05-18 stat.ML cs.LG stat.ME

Neural Backward Filtering Forward Guiding

Gefan Yang, Frank van der Meulen, Stefan Sommer

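Affiliations * Department of Computer Science; University of Copenhagen; Department of Mathematics; Vrije Universiteit Amsterdam

AI summary This paper proposes Neural Backward Filtering Forward Guiding (NBFFG), a unified framework for inference in nonlinear continuous stochastic processes on trees, particularly when observations are sparse and the topology is complex. The method constructs a proxy linear-Gaussian process whose closed-form backward filter steers the generative path toward high-likelihood regions, and learns a neural residual to capture nonlinear discrepancies. This allows unbiased pathwise subsampling, which reduces training complexity from tree-size dependent to path-length dependent. NBFFG outperforms baselines on synthetic benchmarks and is demonstrated on a high-dimensional phylogenetic inference task.

Abstract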

Inference in nonlinear continuous stochastic processes on trees is challenging, particularly when observations are sparse and the topology is complex. Exact smoothing via Doob's $h$-transform is intractable for general nonlinear dynamics. We propose Neural Backward Filtering Forward Guiding (NBFFG), a unified framework for both discrete transitions and continuous diffusions. Our method constructs a variational posterior by leveraging a proxy linear-Gaussian process. This proxy process yields a closed-form backward filter that serves as a guide, steering the generative path toward high-likelihood regions. We then learn a neural residual to capture the non-linear discrepancies. This formulation allows for an unbiased pathwise subsampling scheme, reducing the training complexity from tree-size dependent to path-length dependent. Empirical results show that NBFFG outperforms baselines on synthetic benchmarks, and we demonstrate the method on a high-dimensional inference task in phylogenetic analysis with reconstruction of ancestral butterfly wing shapes.

2601.21028 2026-05-18 cs.CY cs.AI cs.HC

"Unlimited Realm of Exploration and Experimentation": Methods and Motivations of AI-Generated Sexual Content Creators

Jaron Mink, Lucy Qin, Elissa M. Redmiles

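Affiliations * Arizona State University; Georgetown University

AI summary This paper studies the motivations, methods, and content types of creators of AI-generated sexual content (AIG-SC). In-depth interviews with 28 creators reveal a wide spectrum of motivations, including sexual exploration, creative expression, and technical experimentation. The study aims to inform governance and policy around AIG-SC.

Abstract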

AI-generated media is radically changing the way content is both consumed and produced on the internet, and in no place is this potentially more visible than in sexual content. AI-generated sexual content (AIG-SC) is increasingly enabled by an ecosystem of individual AI developers, specialized third-party applications, and foundation model providers. AIG-SC raises a number of concerns from older debates about the line between pornography and obscenity to newer debates about fair use and labor displacement (in this case, of sex workers), and has spurred new regulations to curb the spread of non-consensual intimate imagery (NCII) created using the same technology used to create AIG-SC. However, despite the growing prevalence of AIG-SC, little is known about its creators, their motivations, and what types of content they produce. To inform effective governance in this space, we conducted an in-depth study to understand what AIG-SC creators make, along with how and why they make it. Interviews with 28 AIG-SC creators, ranging from hobbyists to entrepreneurs to those who moderate communities of hundreds of thousands of other creators, revealed a wide spectrum of motivations, including sexual exploration, creative expression, technical experimentation, and in a handful of cases, the creation of NCII.

2512.07946 2026-05-18 hep-th cs.LG

Conformal Defects in Neural Network Field Theories

Pietro Capuozzo, Brandon Robinson, Benjamin Suzzoni

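Affiliations * STAG Research Centre, University of Southampton; Institute of Physics, University of Amsterdam; Department of Mathematical Sciences, Ulsan National Institute of Science and Technology

AI summary This paper studies the construction of conformally invariant defects in Neural Network Field Theories (NN-FTs) and presents a formalism for introducing them. The formalism is demonstrated in two toy models of NN scalar field theories, and the authors develop a neural network interpretation of an expansion akin to the defect operator product expansion (OPE) in two-point correlation functions.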

Comments 23 pages, 1 figure

Journal ref J. High Energy Phys. 05 (2026) 124

Abstract

Neural Network Field Theories (NN-FTs) represent a novel construction of arbitrary field theories, including those of conformal fields, through the specification of the network architecture and prior distribution for the network parameters. In this work, we present a formalism for the construction of conformally invariant defects in these NN-FTs. We demonstrate this new formalism in two toy models of NN scalar field theories. We develop an NN interpretation of an expansion akin to the defect OPE in two-point correlation functions in these models.

2512.04745 2026-05-18 math.OC cs.AI cs.SY eess.SY nlin.AO

Neural Policy Composition from Free Energy Minimization

Francesca Rossi, Veronica Centorrino, Francesco Bullo, Giovanni Russo

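Affiliations * Scuola Superiore Meridionale, Italy; ETH, Zürich; Center for Control, Dynamical Systems, and Computation, UC Santa Barbara, CA, USA; Department of Information and Electrical Engineering and Applied Mathematics, University of Salerno, Italy

AI summary This paper studies how neural policy composition can arise from minimizing a variational free energy, giving a normative framework with a principled and broadly applicable objective for gating. From this framework the authors derive a continuous-time gradient flow whose trajectories are guaranteed to converge, at an explicit rate, to the optimal composition of primitives, and show that the dynamics can be implemented by a soft-competitive recurrent circuit. On flocking behaviors in multi-agent systems, human decision-making in bandit tasks, and control benchmarks in layered architectures, the model gives interpretable accounts of policy composition, reproduces key behavioral signatures, and matches or outperforms established models.

Abstract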

The ability to flexibly compose previously acquired skills to execute intelligent behaviors is a hallmark of natural intelligence. Such compositional flexibility is often attributed to context-dependent gating mechanisms that determine how multiple policies or behavioral primitives are combined. Yet, despite remarkable efforts, the normative objective from which such gating rules should arise, and the neural computations capable of implementing them, remain unclear. Existing approaches typically rely on prespecified design choices for the gating rules, and remain tied to specific architectures, learning paradigms, or datasets. Here, we introduce a normative framework in which policy composition emerges from the minimization of a variational free energy, providing a principled and broadly applicable objective for gating. Based on this framework, we derive a continuous-time gradient flow whose trajectories are guaranteed to converge, with explicit rate, to the optimal composition of primitives. We further show that this dynamics admits a mechanistic neural implementation as a soft-competitive recurrent circuit with context-sensitive local interactions. We evaluate the model on emerging flocking behaviors in multi-agent systems, human decision-making in bandit tasks, and control benchmarks in layered architectures. Across these settings, the model provides interpretable mechanistic accounts of policy composition, reproduces key behavioral signatures, yields insights into data, and matches or outperforms established models.

2511.05297 2026-05-18 cs.SE cs.LG

Building Specialized Software-Assistant ChatBot with Graph-Based Retrieval-Augmented Generation

Mohammed Hilel, Yannis Karmim, Jean De Bodinat, Reda Sarehane, Antoine Gillon

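Affiliations * RAKAM AI; Lemon Learning

AI summary This paper proposes a graph-based retrieval-augmented generation framework for building a specialized software-assistant chatbot for enterprise software, addressing the tendency of large language models to hallucinate when they lack a structured understanding of the target software. The framework automatically converts enterprise web applications into state-action knowledge graphs that help the model generate grounded, context-aware guidance. The paper also details the engineering pipeline that extracts and structures software interfaces, and describes the integration of the approach into production digital adoption platform workflows.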

Comments Accepted at ICMLC 2026

Abstract

Digital Adoption Platforms (DAPs) have become essential tools for helping employees navigate complex enterprise software such as CRM, ERP, or HRMS systems. Companies like Lemon Learning have shown how digital guidance can reduce training costs and accelerate onboarding. However, building and maintaining these interactive guides still requires extensive manual effort. Leveraging Large Language Models as virtual assistants is an appealing alternative, yet without a structured understanding of the target software, LLMs often hallucinate and produce unreliable answers. Moreover, most production-grade LLMs are black-box APIs, making fine-tuning impractical due to the lack of access to model weights. In this work, we introduce a Graph-based Retrieval-Augmented Generation framework that automatically converts enterprise web applications into state-action knowledge graphs, enabling LLMs to generate grounded and context-aware assistance. The framework was co-developed with the AI enterprise RAKAM, in collaboration with Lemon Learning. We detail the engineering pipeline that extracts and structures software interfaces, the design of the graph-based retrieval process, and the integration of our approach into production DAP workflows. Finally, we discuss scalability, robustness, and deployment lessons learned from industrial use cases.
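A state-action knowledge graph of the kind mentioned in this abstract can be pictured as screens (states) connected by user actions (edges). The sketch below is an assumption about what retrieval over such a graph might look like, not the authors' pipeline: the screen names, the actions, and the functions `find_action_path` and `build_prompt_context` are all hypothetical.

```python
from collections import deque

# Hypothetical state-action graph: screens are nodes, UI actions are edges.
GRAPH = {
    "Home": [("click 'Contacts'", "ContactList")],
    "ContactList": [("click 'New'", "ContactForm"), ("click 'Home'", "Home")],
    "ContactForm": [("fill fields and click 'Save'", "ContactDetail")],
    "ContactDetail": [],
}


def find_action_path(graph, start, goal):
    """Breadth-first search for the shortest list of actions from start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, next_state in graph.get(state, []):
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, actions + [action]))
    return None  # goal is unreachable from start


def build_prompt_context(graph, start, goal):
    """Format the retrieved path as numbered steps that could ground an LLM prompt."""
    actions = find_action_path(graph, start, goal)
    if actions is None:
        return "No known path."
    return "\n".join(f"{i}. {a}" for i, a in enumerate(actions, start=1))
```

Passing the numbered steps to the model as context restricts its answer to actions that exist in the graph, which is the grounding effect the abstract describes.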

2511.04484 2026-05-18 cs.DS cs.LG

Online Algorithms for Repeated Optimal Stopping: Balancing Baseline Guarantees and Regret

Tsubasa Harada, Yasushi Kawase, Hanna Sumita

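Affiliations * Institute of Science Tokyo, Japan; Chuo University, Japan

AI summary This paper studies the repeated optimal stopping problem with an unknown distribution, aiming to maintain strong per-round performance guarantees while achieving sublinear regret overall. Under full feedback, the authors propose a general algorithmic framework that achieves both the per-round guarantee and sublinear regret with high probability, and that applies to canonical problems such as the prophet inequality and the secretary problem. They also establish a regret lower bound in the i.i.d. model, showing that the method is nearly tight in the number of rounds.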

Comments 30 pages, Major revision with corrected results, new impossibility results, and revised exposition

Abstract

We study the repeated optimal stopping problem, in which the same optimal stopping instance with an unknown distribution is solved repeatedly over $T$ rounds. We aim to simultaneously achieve strong per-round performance guarantees relative to a given baseline and sublinear regret across all rounds. Our primary contribution is a comprehensive theoretical characterization of whether and when these two objectives are compatible. First, under standard semi-bandit feedback, we prove that maintaining the per-round guarantee forces regret of $Ω(T / \log T)$. Second, even under full feedback, we show that requiring almost-sure satisfaction of the per-round guarantee in every round is incompatible with sublinear regret. Third, under full feedback, we propose a general algorithmic framework that achieves both sublinear regret and the per-round guarantee with high probability. Our framework applies to canonical problems, including the prophet inequality, the secretary problem, and their variants under adversarial, random, and i.i.d. input models. For example, in the repeated prophet inequality problem, our method guarantees that, with high probability in each round, its expected reward is at least that of the classical single-sample algorithm, which achieves a $1/2$ competitive ratio, while simultaneously ensuring $\tilde{O}(\sqrt{T})$ regret. Furthermore, we establish a regret lower bound of $Ω(\sqrt{T})$ even in the i.i.d. model, which is nearly tight with respect to the number of rounds.
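The classical single-sample algorithm for the prophet inequality, used as the baseline in the example above, draws one sample from each distribution, sets the threshold to the largest sample, and accepts the first arriving value that exceeds it. The sketch below shows that rule; the `play_rounds` helper, which reuses the previous round's realized values as the sample, is only an illustrative assumption about how full feedback could supply samples and is not the paper's framework.

```python
def single_sample_stop(samples, values):
    """Accept the first value exceeding the maximum of one sample per distribution.

    Returns the accepted value, or 0.0 if no value beats the threshold.
    """
    threshold = max(samples)
    for v in values:
        if v > threshold:
            return v
    return 0.0


def play_rounds(rounds):
    """Illustrative only: use round t-1's observed values as round t's samples."""
    rewards = []
    for previous, current in zip(rounds, rounds[1:]):
        rewards.append(single_sample_stop(previous, current))
    return rewards


# Made-up realized values for three rounds of a three-item instance.
rounds = [[3, 1, 2], [2, 5, 4], [6, 1, 7]]
rewards = play_rounds(rounds)
```

Here the second round uses threshold 3 and stops at 5, and the third round uses threshold 5 and stops at 6.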

2511.03606 2026-05-18 stat.ML cs.LG math.ST stat.TH

Vector-valued self-normalized concentration inequalities beyond sub-Gaussianity

Diego Martinez-Taboada, Tomas Gonzalez, Aaditya Ramdas

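Affiliations * Department of Statistics & Data Science; Machine Learning Department; Carnegie Mellon University

AI summary This paper studies concentration inequalities for vector-valued self-normalized processes beyond the sub-Gaussian setting, which remains comparatively underexplored. The authors provide concentration bounds for processes with light tails (such as Bennett or Bernstein bounds), extending the scope of classical self-normalized analysis. The results are illustrated in online linear regression, with applications in (kernelized) linear bandits.

Abstract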

The study of self-normalized processes plays a crucial role in a wide range of applications, from sequential decision-making to econometrics. While the behavior of self-normalized concentration has been widely investigated for scalar-valued processes, vector-valued processes remain comparatively underexplored, especially outside of the sub-Gaussian framework. In this contribution, we provide concentration bounds for self-normalized processes with light tails beyond sub-Gaussianity (such as Bennett or Bernstein bounds). We illustrate the relevance of our results in the context of online linear regression, with applications in (kernelized) linear bandits.

2510.19315 2026-05-18 cs.FL cs.LG cs.LO

Transformers are Inherently Succinct

Pascal Bergsträßer, Ryan Cotterell, Anthony W. Lin

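Affiliations * RPTU Kaiserslautern-Landau; MPI-SWS Kaiserslautern

AI summary This paper studies succinctness as a measure of the expressive power of transformers. The authors prove that fixed-precision transformers can be exponentially more succinct than linear temporal logic (LTL) and recurrent neural networks, and doubly exponentially more succinct than finite automata. They also give matching upper bounds: any fixed-precision transformer can be converted to an LTL formula with at most an exponential blow-up, improving a prior doubly exponential translation. As a consequence, basic verification problems for transformers, such as emptiness and equivalence, are EXPSPACE-complete.

Abstract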

We study succinctness as a measure of the expressive power of transformers. Succinctness -- how compactly a formalism can describe a language relative to other formalisms -- is a classical notion in logic and automata theory. We prove that fixed-precision transformers are remarkably succinct: they can be exponentially more succinct than both linear temporal logic (LTL) and recurrent neural networks, and, by extension, state-space models, and doubly exponentially more succinct than finite automata. In other words, there exist families of languages describable by polynomial-size transformers whose smallest equivalent LTL formula or recurrent neural network is exponentially large, and whose smallest equivalent automaton is doubly exponentially large. We also establish matching upper bounds, showing that any fixed-precision transformer can be converted to an LTL formula with at most an exponential blow-up -- improving a prior doubly exponential translation. As a consequence of this succinctness, we show that basic verification problems for transformers, such as emptiness and equivalence, are provably intractable: specifically, EXPSPACE-complete.

2510.15714 2026-05-18 math.OC cs.LG

A Split-Client Approach to Second-Order Optimization

El Mahdi Chayti, Martin Jaggi

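Affiliations * Machine Learning and Optimization Laboratory

AI summary This paper proposes the Split-Client framework to address the wall-clock bottleneck of Hessian computation and factorization in second-order optimization. It decouples optimization into parallel gradient and curvature processes, adapts to delay, and matches the rate of the optimally tuned Lazy Hessian method without any tuning. The framework also provides a noise-adaptive rate under persistent curvature error and a faster rate under a structured condition, and experiments on non-convex problems show substantial wall-clock speedups.

Abstract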

Second-order optimization methods offer superior convergence rates but are often bottlenecked by the wall-clock cost of Hessian computation and factorization. In the moderate-dimensional regime where the full Hessian fits in memory, factorization $\mathcal{O}(d^3)$ typically dominates gradient evaluation $\mathcal{O}(nd)$, creating a synchronization barrier that negates the per-iteration progress of classical second-order methods. We propose the \emph{Split-Client} framework, which decouples optimization into parallel gradient and curvature processes. Unlike Lazy Hessian approaches, whose arithmetic-complexity analysis does not charge factorization time and whose optimal reuse frequency requires tuning, our method is fully \textbf{delay-adaptive}: its wall-clock complexity scales with the \emph{average} delay $\bar{τ}$, and it matches the optimally-tuned Lazy rate of $\mathcal{O}(ε^{-3/2}\sqrt{\bar{τ}})$ without any tuning. For persistent curvature error, we provide a noise-adaptive schedule with $\widetilde{\mathcal{O}}(T^{-3/4})$ rate (on $E[\|\nabla f\|]^{3/2}$), recovering the rate that uniform-error analyses such as Kamzolov et al. (2023) achieve via inflated regularization. Under a verifiable subspace-alignment condition, an additional \emph{structured} analysis based on the secant condition of L-BFGS gives a faster $\mathcal{O}(T^{-1})$ rate, with a hybrid theorem interpolating smoothly between the two regimes. We extend the framework to Subsampled Cubic Newton with adaptive batch sizes and an aggregate sampling budget linear in $T$. Experiments on two non-convex problems show wall-clock speedups of up to $800\times$ over Vanilla and $30\times$ over Lazy in the strongly factorization-dominated regime.

2510.02734 2026-05-18 q-bio.BM cs.AI q-bio.GN

SAE-RNA: A Sparse Autoencoder Model for Interpreting RNA Language Model Representations

Taehan Kim, Sangdae Nam

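Affiliations * Department of Computer Science, University of California, Berkeley; Department of Development Engineering, University of California, Berkeley

AI summary This paper presents SAE-RNA, a sparse autoencoder model for interpreting RNA language model representations, and explores whether sparse autoencoders can provide interpretable feature decompositions of those representations. Built on RiNALMo, it maps representations to known biological features to analyze how RNA language models organize biological information internally. The study offers a feature-level framework for comparing RNA groups and identifying representation components associated with RNA family identity or structural context, and examines the applicability and limitations of sparse autoencoders in this setting.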

Comments 12 pages, 7 figures. v2: Updated bibliography to improve reference accuracy and reflect updated publication venues. Refined claims for better alignment with results and added an Appendix

Abstract

Deep learning, particularly with the advancement of Large Language Models, has transformed biomolecular modeling, with protein language models such as ESM inspiring emerging RNA language models such as RiNALMo. Recent work has begun applying sparse autoencoders (SAEs) to protein language model representations, exploring representation-level interpretability in biomolecular models. Here, we explore whether SAEs can provide interpretable feature decompositions of RNA language model representations, while also examining their limitations in this setting. We present SAE-RNA, an interpretability model that analyzes RiNALMo representations and maps them to known human-level biological features. Rather than claiming definitive biological concept discovery, our study frames SAE-based analysis as a representation-level probe for characterizing how RNA language models organize biological information internally. More broadly, SAE-RNA provides a feature-level framework for comparing RNA groups and identifying sparse representation components associated with RNA family identity or structural context.

2509.16223 2026-05-18 eess.SP cs.CV

mRadNet: A Compact Radar Object Detector with MetaFormer

Huaiyu Chen, Fahed Hassanat, Robert Laganiere, Martin Bouchard

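Affiliations * School of Electrical Engineering and Computer Science, University of Ottawa, Canada; tsensor Cortek Inc., Canada

AI summary This paper proposes mRadNet, a compact radar object detection model designed for the compactness and efficiency requirements of automotive embedded systems. The model uses a U-Net style architecture with MetaFormer blocks, in which separable convolution and attention token mixers capture local and global features, and introduces more efficient token embedding and merging strategies to further reduce computational cost. On the CRUW dataset, mRadNet improves on state-of-the-art detection performance with the fewest parameters and the lowest FLOPs.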

Comments 5 pages, 2 figures, to appear in Proc. of 34th European Signal Processing Conference (EUSIPCO 2026), Bruges, Belgium, Aug. 31 - Sept. 4, 2026. Code available at https://github.com/huaiyu-chen/mRadNet

Abstract

Frequency-modulated continuous wave radars have gained increasing popularity in the automotive industry. Their robustness against adverse weather conditions makes them a suitable choice for radar object detection in advanced driver assistance systems. These real-time embedded systems have requirements for the compactness and efficiency of the model, which have been largely overlooked in previous work. In this work, we propose mRadNet, a novel radar object detection model with compactness in mind. mRadNet employs a U-net style architecture with MetaFormer blocks, in which separable convolution and attention token mixers are used to capture both local and global features effectively. More efficient token embedding and merging strategies are introduced to further facilitate the lightweight design. The performance of mRadNet is validated on the CRUW dataset, improving state-of-the-art performance with the fewest parameters and the lowest FLOPs.
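A rough parameter count shows why separable convolution suits a compact design. The sketch below assumes that "separable convolution" means the common depthwise separable form (a per-channel k x k filter followed by a 1 x 1 pointwise convolution) and ignores bias terms; the channel sizes are hypothetical and are not mRadNet's actual configuration.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = conv_params(64, 128, 3)
separable = separable_conv_params(64, 128, 3)
reduction = standard / separable
```

For this hypothetical layer the standard convolution has 73,728 weights and the separable version has 8,768, a reduction of roughly 8.4 times.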