arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.10237 2026-05-12 cs.LG

The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently

Elisabetta Cornacchia, Dan Mikulincer, Elchanan Mossel

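AI summary This paper studies how temporal correlations in the data can make certain sparse learning problems efficiently solvable by gradient methods. It focuses on Boolean k-juntas, a canonical sparse learning problem, and finds that when samples are generated by a lazy random walk on the hypercube, training a two-layer ReLU network with a temporal-difference loss learns the problem efficiently, with sample complexity nearly linear in the ambient dimension. By contrast, large-batch gradient methods using standard convex pointwise losses do not gain the same advantage.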

Comments 10 pages main body, 3 figures

Abstract

We study how temporal correlations in the data can make certain sparse learning problems efficiently learnable by gradient-based methods. Our focus is on Boolean k-juntas, a canonical sparse learning problem known to pose barriers for gradient-based methods under independent uniform samples. We show that this picture changes when the samples are generated by a lazy random walk on the hypercube. In this setting, the temporal dependencies can be exploited by a two-layer ReLU network trained using stylized-SGD with a temporal-difference loss, which compares target and predicted increments across consecutive samples. For every fixed k, the resulting sample complexity is essentially linear in the ambient dimension d. By contrast, we show that for large-batch gradient methods using standard convex pointwise losses, temporal correlations do not provide the same advantage.
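The mechanism described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the stay probability `p_stay`, the parity target, and the squared form of the loss are all assumptions made for illustration. What it shows is that consecutive samples of a lazy walk differ in at most one coordinate, so a nonzero target increment can only occur when a relevant coordinate has just flipped.

```python
import random


def lazy_walk(d, steps, p_stay=0.5, seed=0):
    """Lazy random walk on {-1, +1}^d (hypothetical helper).

    At each step the walk stays put with probability p_stay; otherwise
    it flips one uniformly chosen coordinate.
    """
    rng = random.Random(seed)
    x = [rng.choice((-1, 1)) for _ in range(d)]
    path = [tuple(x)]
    for _ in range(steps):
        if rng.random() >= p_stay:
            i = rng.randrange(d)
            x[i] = -x[i]
        path.append(tuple(x))
    return path


def parity_junta(x, relevant=(0, 1, 2)):
    """Example k-junta (assumed target): parity of the relevant coordinates."""
    out = 1
    for i in relevant:
        out *= x[i]
    return out


def td_loss(model, target, x_prev, x_next):
    """Squared gap between the target increment and the predicted increment.

    The squared form is an assumption; the abstract only says the loss
    compares target and predicted increments across consecutive samples.
    """
    target_inc = target(x_next) - target(x_prev)
    pred_inc = model(x_next) - model(x_prev)
    return (target_inc - pred_inc) ** 2
```

In this toy setting, every step on which the label changes identifies a relevant coordinate, which is the kind of signal that independent uniform samples do not provide.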

2605.10230 2026-05-12 cs.LG

FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization

Qingchuan Zhang, He Cao, Hao Li, Yanjun Shao, Zhiyuan Liu, Shihang Wang, Shufang Xie, Shenghua Gao, Xinwu Ye

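AI summary FORGE is a two-stage framework for molecular optimization that aims to improve a molecule's properties through local edits while preserving structural similarity. It replaces human annotation with automatically mined fragment edit pairs: Stage 1 ranks candidate fragments given the molecular context to inject chemical prior knowledge, and Stage 2 generates concrete fragment replacements. FORGE outperforms existing methods on several benchmarks, showing a new route to molecular optimization based on fragment-level supervision.

Abstract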

Molecular optimization seeks to improve a molecule through small structural edits while preserving similarity to the starting compound. Recent language-model approaches typically treat this task as prompt-conditioned sequence generation. However, relying on natural language introduces an inherent data-scaling bottleneck, often leads to chemical hallucinations, and ignores the strong context dependence of fragment effects. We present FORGE, a two-stage framework that reformulates molecular optimization as context-aware local editing. By utilizing automatically mined, verified low-to-high edit pairs instead of expensive human text annotations, Stage 1 ranks candidate fragments by their property contribution under the full molecular context to inject chemical priors, and Stage 2 generates explicit fragment replacements. Built on a compact 0.6B language model, FORGE further adapts to unseen black-box objectives through in-context demonstrations. Across Prompt-MolOpt, PMO-1k and ChemCoTBench, FORGE consistently outperforms prior methods, including substantially larger language models and graph methods. These results highlight the value of explicit fragment-level supervision as a more easily obtainable, scalable, and hallucination-free alternative to natural language training.

2605.10229 2026-05-12 cs.CV cs.CY

VPD-100K: Towards Generalizable and Fine-grained Visual Privacy Protection

Xiaobin Hu, Enpu Zuo, Lanping Hu, Kaiwen Yang, Dianshu Liao, Tianyi Zhang, Bo Yin, Yinsi Zhou, Shidong Pan, Xiaoyu Sun

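AI summary As visual data sharing becomes ubiquitous, privacy protection has become an important requirement, yet existing privacy detection algorithms are held back by the lack of comprehensive datasets. This paper presents VPD-100K, a large-scale, fine-grained visual privacy dataset covering four domains (human presence, on-screen personally identifiable information, physical identifiers, and location indicators), with 100,000 images and over 190,000 annotated object instances, characterized by long-tailed distributions, small objects, and high visual complexity. The authors also design a lightweight frequency-enhanced module that better captures subtle features of sensitive information; experiments show that the dataset and method perform well across multiple benchmarks.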

Comments Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Privacy protection has become a critical requirement in the era of ubiquitous visual data sharing, imposing higher demands on efficient and robust privacy detection algorithms. However, current robust detection models are severely hindered by the lack of comprehensive datasets. Existing privacy-oriented datasets often suffer from limited scale, coarse-grained annotations, and narrow domain coverage, failing to capture the intricate details of sensitive information in real-world environments. To bridge this gap, we present a large-scale, fine-grained Visual Privacy Dataset (VPD-100K), designed to facilitate generalized privacy detection. We establish a holistic taxonomy comprising four primary domains: Human Presence, On-Screen Personally Identifiable Information (PII), Physical Identifiers, and Location Indicators, containing 100,000 images annotated with 33 fine-grained classes and over 190,000 object instances. Statistical analysis reveals that our dataset features long-tailed distributions, small object scales, and high visual complexity. These characteristics make the dataset particularly valuable for demanding, unconstrained applications such as live streaming, where actors frequently face unintentional, real-time information leakage. Furthermore, we design an effective frequency-enhanced lightweight module, consisting of frequency-domain attention fusion and an adaptive spectral gating mechanism, that breaks the limitations of spatial pixel intensity to better capture the subtle details of sensitive information. Extensive experiments conducted on diverse image and streaming-video benchmarks consistently demonstrate the effectiveness of our VPD-100K dataset and the well-curated frequency mechanism. The code and dataset are available at https://vpd-100k.github.io/.

2605.10224 2026-05-12 cs.AI

Hypothesis-Driven Deep Research with Large Language Models: A Structured Methodology for Automated Knowledge Discovery

Michael Chin

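AI summary This paper proposes Hypothesis-Driven Deep Research (HDRI), a methodology that uses hypotheses as organizing instruments for the research process to make AI-assisted research more systematic and proactive. It introduces six core principles and an eight-stage pipeline; its key innovations are a gap-driven iterative research mechanism and a traceable fact reasoning framework, enabling automated knowledge discovery and verification. Experiments show significant gains in fact density, subject matching accuracy, and multi-source verification confidence, and five case studies validate its practical value.

Abstract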

Current AI-powered research systems adopt a direct search-then-summarize paradigm that treats hypotheses as end products of scientific discovery. We argue this leaves a critical gap: hypotheses can serve a far more powerful role as organizational instruments that structure the research process itself. We propose the Hypothesis-Driven Deep Research (HDRI) methodology - the first framework using hypotheses to organize general-purpose deep research across arbitrary domains, rather than merely validating claims within specific domains. This transforms research from reactive information retrieval into proactive, verifiable, and iterative knowledge discovery. HDRI is formalized with six core principles and an eight-stage pipeline. A central innovation is the gap-driven iterative research mechanism - a closed-loop quality assurance system that automatically identifies informational and logical gaps, triggering targeted supplementary investigation. We further introduce a fact reasoning framework with traceable reasoning chains and quantified confidence propagation, a subject locking mechanism to prevent entity confusion, and a multi-dimensional quality assessment scheme. The methodology is realized in the INFOMINER system. Experiments demonstrate improvements of 22.4% in fact density, 90% subject matching accuracy, 0.92 multi-source verification confidence, and 14% completeness gain from gap-driven supplementation. Five case studies validate its practical applicability, achieving an average quality rating of 4.46/5.0.

2605.10223 2026-05-12 cs.AI cs.SE

Beyond Autonomy: A Dynamic Tiered AgentRunner Framework for Governable and Resilient Enterprise AI Execution

Kai Pan, Rong Hou

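AI summary Current large language model agent frameworks overemphasize autonomy and lack the safety and control mechanisms needed for enterprise deployment. This paper proposes a Dynamic Tiered AgentRunner framework that, through risk-adaptive tiering, a separation-of-powers architecture, and resilience-by-design mechanisms, achieves a Pareto-optimal balance between safety and efficiency, offering a safer, more controllable, and more reliable approach to enterprise AI execution.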

Comments 9 pages, 2 figures, 3 tables

Abstract

Current large language model agent frameworks prioritize autonomy but lack the governability mechanisms required for enterprise deployment. High-risk write operations proceed without independent review, complex tasks lack acceptance verification, and computational resources are allocated uniformly regardless of risk level. We propose the Dynamic Tiered AgentRunner, a controlled execution protocol distilled from a production-grade multi-tenant SaaS platform. The framework introduces three core mechanisms: (1) Risk-Adaptive Tiering that dynamically allocates computational resources and review intensity based on task risk profiles, achieving Pareto-optimal trade-offs between safety and efficiency; (2) Separation of Powers architecture where proposal, review, execution, and verification are performed by independent agents with physically isolated boundaries; and (3) Resilience-by-Design through a Verifier-Recovery closed loop that treats failure as a first-class system state. We formalize the tier selectio

2605.10218 2026-05-12 cs.CL

Relative Score Policy Optimization for Diffusion Language Models

Zichao Yu, Shengze Xu, Bingqing Jiang, Wenyi Zhang, Difan Zou

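AI summary Diffusion language models (dLLMs) show promise for parallel, efficient text generation, but improving their reasoning ability requires effective post-training. Conventional reinforcement learning with verifiable rewards (RLVR) is hard to apply directly to dLLMs because tractable sequence-level log-ratios are unavailable, which forces reliance on high-variance ELBO approximations and hurts training stability. This paper proposes a new RLVR method, Relative Score Policy Optimization (RSPO), which interprets the reward advantage as a target for the relative log-ratio between the current and reference policies, thereby calibrating noisy estimates and making policy updates more accurate. Experiments show that RSPO yields notable gains on planning tasks and is competitive on mathematical reasoning.

Abstract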

Diffusion large language models (dLLMs) offer a promising route to parallel and efficient text generation, but improving their reasoning ability requires effective post-training. Reinforcement learning with verifiable rewards (RLVR) is a natural choice for this purpose, yet its application to dLLMs is hindered by the absence of tractable sequence-level log-ratios, which are central to standard policy optimization. The lack of tractable sequence-level log-ratios forces existing methods to rely on high-variance ELBO-based approximations, where high verifier rewards can amplify inaccurate score estimates and destabilize RL training. To overcome this issue, we propose Relative Score Policy Optimization (RSPO), a simple RLVR method that uses verifiable rewards to calibrate noisy likelihood estimates in dLLMs. The core of our algorithm relies on a key observation: a reward advantage can be interpreted not only as an update direction, but also as a target for the relative log-ratio between the current and reference policies. Accordingly, RSPO calibrates this noisy relative log-ratio estimate by comparing its reward advantage with the reward-implied target relative log-ratio, updating the policy according to the gap between the current estimate and the target rather than the raw advantage alone. Experiments on mathematical reasoning and planning benchmarks show that RSPO yields especially strong gains on planning tasks and competitive mathematical-reasoning performance.

2605.10216 2026-05-12 cs.CL

The Impact of Editorial Intervention on Detecting Native Language Traces

Ahmet Yavuz Uluslu, Mark Gales, Kate Knill, Gerold Schneider

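AI summary This paper studies how editorial intervention affects the detection of traces of an author's native language, examining the robustness of native language identification models under varying degrees of grammatical error correction and paraphrasing. It finds that native-language features depend not only on surface grammatical errors but also on deeper factors such as lexico-semantic choices, pragmatic transfer, and cultural perspective. Minimal edits preserve these features and keep identification accuracy high, whereas heavy rewriting substantially degrades model performance.

Abstract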

Native Language Identification (NLI) is the task of determining an author's native language (L1) from their non-native writings. With the advent of human-AI co-authorship, non-native texts are routinely corrected and rewritten by large language models, fundamentally altering the linguistic features NLI models depend on. In this paper, we investigate the robustness of L1 traces across increasing degrees of editorial intervention. By processing 450 essays from the Write & Improve 2024 corpus through varying levels of grammatical error correction (GEC) and paraphrasing, we demonstrate that L1 attribution does not entirely depend on surface-level errors. Instead, the detection models leverage deeper L1 features: unidiomatic lexico-semantic choices, pragmatic transfer, and the author's underlying cultural perspective. We find that minimal edits preserve these structural traces and maintain high profiling accuracy. In contrast, fluency edits and paraphrasing normalize these L1 features, leading to a severe degradation in performance.

2605.10211 2026-05-12 cs.CL cs.AI cs.IR

To Redact, or not to Redact? A Local LLM Approach to Deliberative Process Privilege Classification

Maik Larooij, David Graus

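AI summary This study targets "deliberative process privilege" information that must be redacted before government documents are released, and proposes an automatic classification method based on local large language models as an alternative to processing via third-party cloud APIs. Using small models such as Qwen3.5 9B, it achieves high-accuracy classification on consumer-grade hardware, and by combining Chain-of-Thought prompting with few-shot prompting based on error examples it substantially improves recall and F2 score, approaching the commercial model Gemini 2.5 Flash. The analysis shows that deliberative content often contains first-person wording and verbs expressing opinion, and these linguistic features are key cues for classification.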

Comments Accepted to The First Workshop on Artificial Intelligence & Open Government at the 21st International Conference on Artificial Intelligence and Law (ICAIL), June 8, 2026, Singapore

Abstract

Government transparency laws, like the Freedom of Information Acts (FOIA) in the United States and United Kingdom, and the Woo (Open Government Act) in the Netherlands, grant citizens the right to directly request documents from the government. As these documents might contain sensitive information, such as personal information or threats to national security, the laws allow governments to redact sensitive parts of the documents prior to release. We build on prior research to perform automatic sensitivity classification for the FOIA Exemption 5 deliberative process privilege using Large Language Models (LLMs). However, processing documents not yet cleared for review via third-party cloud APIs is often legally or politically untenable. Therefore, in this work, we perform sensitivity classification with a small, local model, deployable on consumer-grade hardware (Qwen3.5 9B). We compare eight variants of applying LLMs for sentence classification, using well-known prompting techniques, and find that a combination of Chain-of-Thought prompting and few-shot prompting with error-based examples outperforms classification models of earlier work in terms of recall and F2 score. This method also closely approaches the performance of a widely used, cost-efficient commercial model (Gemini 2.5 Flash). In an additional analysis, we find that sentences that are predicted as deliberative contain more verbs that indicate the expression of opinions, and are more often phrased in the first person. Above all, deliberativeness seems characterized by the presence of a combination of multiple indicators, in particular the combination of first-person words with a verb for expressing opinion.
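For reference, the F2 score mentioned above is the F-beta score with beta = 2, which weights recall more heavily than precision. The helper below is a generic sketch with a hypothetical function name and made-up confusion counts; it is not the paper's evaluation code.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from confusion counts; beta > 1 weights recall above precision."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only: a high-recall, low-precision classifier
# (precision 0.5, recall 0.9) scores higher under F2 than under F1.
high_recall_f2 = f_beta(tp=9, fp=9, fn=1, beta=2.0)
high_recall_f1 = f_beta(tp=9, fp=9, fn=1, beta=1.0)
```

A metric of this shape plausibly fits redaction, where missing a sensitive sentence tends to cost more than flagging a harmless one for human review.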

2605.10210 2026-05-12 cs.RO cs.CV

Nano-U: Efficient Terrain Segmentation for Tiny Robot Navigation

Federico Pizzolato, Francesco Pasti, Nicola Bellotto

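AI summary This paper studies how to achieve efficient terrain segmentation on tiny robots to support autonomous navigation in unstructured outdoor environments. To address the difficulty of deploying existing models on resource-constrained microcontrollers, the authors propose Nano-U, a lightweight binary segmentation network, and train it with Quantization-Aware Distillation, which markedly improves model performance. The model performs well on multiple datasets and is deployed on a low-cost microcontroller through an extended compiler toolchain, achieving low-power, low-latency real-time terrain perception.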

Comments Code repository: https://github.com/federico-pizz/Nano-U

Abstract

Terrain segmentation is a fundamental capability for autonomous mobile robots operating in unstructured outdoor environments. However, state-of-the-art models are incompatible with the memory and compute constraints typical of microcontrollers, limiting scalable deployment in small robotics platforms. To address this gap, we develop a complete framework for robust binary terrain segmentation on a low-cost microcontroller. At the core of our approach we design Nano-U, a highly compact binary segmentation network with a few thousand parameters. To compensate for the network's minimal capacity, we train Nano-U via Quantization-Aware Distillation (QAD), combining knowledge distillation and quantization-aware training. This allows the final quantized model to achieve excellent results on the Botanic Garden dataset and to perform very well on TinyAgri, a custom agricultural field dataset with more challenging scenes. We deploy the quantized Nano-U on a commodity microcontroller by extending MicroFlow, a compiler-based inference engine for TinyML implemented in Rust. By eliminating interpreter overhead and dynamic memory allocation, the quantized model executes on an ESP32-S3 with a minimal memory footprint and low latency. This compiler-based execution demonstrates a viable and energy-efficient solution for perception on low-cost robotic platforms.

2605.10205 2026-05-12 cs.LG

Unveiling High-Probability Generalization in Decentralized SGD

Jiahuan Wang, Ping Luo, Ziqing Wen, Dongsheng Li, Tao Sun

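AI summary This paper studies the generalization performance of decentralized stochastic gradient descent (D-SGD) in large-scale distributed learning, aiming to close the theoretical gap between high-probability generalization bounds for traditional SGD and for D-SGD. The authors develop a learning theory based on pointwise uniform stability and derive high-probability generalization bounds for D-SGD in convex, strongly convex, and non-convex settings, targeting the optimal $\mathcal{O}\left(\frac{1}{\sqrt{mn}}\log (1/\delta)\right)$ rate, and they analyze gradient-based measures and optimization error bounds in the non-convex case. The work also accounts for communication overhead and analyzes the generalization of local models in time-varying frameworks.

Abstract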

Decentralized stochastic gradient descent (D-SGD) is an efficient method for large-scale distributed learning. Existing generalization studies mainly address expected results, achieving rates limited to $\mathcal{O}\left(\frac{1}{\delta\sqrt{mn}}\right)$, where $\delta$ is the confidence parameter, $m$ the number of workers, and $n$ the sample size. When $m=1$, D-SGD reduces to traditional SGD, whose optimal high-probability generalization bound is $\mathcal{O}\left(\frac{1}{\sqrt{n}}\log (1/\delta)\right)$. This discrepancy reveals a gap between high-probability guarantees for SGD and those for D-SGD. To close this gap, we develop a high-probability learning theory for D-SGD, aiming for the optimal $\mathcal{O}\left(\frac{1}{\sqrt{mn}}\log (1/\delta)\right)$ rate. We refine bounds for D-SGD using pointwise uniform stability in distributed learning (a weaker notion than uniform stability) and analyze them across convex, strongly convex, and non-convex settings. We also provide high-probability results for gradient-based measures in non-convex cases where only local minima exist, and derive optimization error and excess risk bounds. Finally, accounting for communication overhead, we analyze generalization bounds for local models within time-varying frameworks.
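To make the gap between the two rates concrete, the sketch below evaluates both expressions numerically. It drops all constants hidden by the O-notation, and the function names and the values of `m`, `n`, and `delta` are illustrative assumptions, so only the relative scaling in the confidence parameter is meaningful.

```python
import math


def expectation_style_rate(m, n, delta):
    """1 / (delta * sqrt(m * n)): the rate attributed to existing analyses."""
    return 1.0 / (delta * math.sqrt(m * n))


def high_probability_rate(m, n, delta):
    """log(1 / delta) / sqrt(m * n): the targeted high-probability rate."""
    return math.log(1.0 / delta) / math.sqrt(m * n)


# Illustrative values: 4 workers, 2500 samples, 99% confidence.
m, n, delta = 4, 2500, 0.01
loose = expectation_style_rate(m, n, delta)
tight = high_probability_rate(m, n, delta)
ratio = loose / tight  # equals 1 / (delta * log(1 / delta))
```

As `delta` shrinks, the first expression grows like 1/delta while the second grows only logarithmically, which is why the high-probability form matters at stringent confidence levels.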

2605.10204 2026-05-12 cs.CV

3DReflecNet: A Large-Scale Dataset for 3D Reconstruction of Reflective, Transparent, and Low-Texture Objects

Zhicheng Liang, Haoyi Yu, Boyan Li, Dayou Zhang, Zijian Cao, Tianyi Gong, Junhua Liu, Shuguang Cui, Fangxin Wang

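AI summary This paper introduces 3DReflecNet, a large-scale dataset designed for 3D vision methods that reconstruct objects with reflective, transparent, and low-texture surfaces. The dataset contains over 120,000 synthetic samples produced with physically-based rendering and more than 1,000 real objects captured with consumer devices, exceeds 22 TB in total, and covers diverse materials, complex lighting conditions, and geometric forms. The authors also design benchmarks for five core tasks, which reveal the performance limits of existing methods on such challenging materials and motivate more robust 3D vision models.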

Comments This paper has been accepted by CVPR 2026 Oral

Abstract

Accurate 3D reconstruction of objects with reflective, transparent, or low-texture surfaces remains notoriously challenging. Such materials often violate key assumptions in multi-view reconstruction pipelines, such as photometric consistency and the availability of distinct geometric texture cues. Existing datasets primarily focus on diffuse, textured objects, and therefore provide limited insight into performance under real-world material complexities. We introduce 3DReflecNet, a large-scale hybrid dataset exceeding 22 TB that is specifically designed to benchmark and advance 3D vision methods for these challenging materials. 3DReflecNet combines two types of data: over 120,000 synthetic instances generated via physically-based rendering of more than 12,000 shapes, and over 1,000 real-world objects captured using consumer devices. Together, these data consist of more than 7 million multi-view frames. The dataset spans diverse materials, complex lighting conditions, and a wide range of geometric forms, including shapes generated from both real and LLM-synthesized 2D images using diffusion-based pipelines. To support robust evaluation, we design benchmarks for five core tasks: image matching, structure-from-motion, novel view synthesis, reflection removal, and relighting. Extensive experiments demonstrate that state-of-the-art methods struggle to maintain accuracy across these settings, highlighting the need for more resilient 3D vision models.

2605.10203 2026-05-12 cs.SD eess.AS

Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration

Haowen Li, Tianxiang Li, Yi Yang, Boyu Cao, Qi Liu

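AI summary This work proposes Polyphonia, a zero-shot timbre transfer framework that addresses the problem that editing the timbre of a specific stem in polyphonic music tends to damage the background accompaniment. Its core is an acoustic-informed attention calibration mechanism that uses a probabilistic acoustic prior to establish coarse boundaries, so that the target stem is localized and modified more precisely while the semantic integrity of non-target stems is preserved. Experiments show a 15.5% improvement in target alignment over existing methods while maintaining high music fidelity and non-target integrity.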

Comments Accepted by ICML 2026

Abstract

The advancement of diffusion-based text-to-music generation has opened new avenues for zero-shot music editing. However, existing methods fail to achieve stem-specific timbre transfer, which requires altering specific stems while strictly preserving the background accompaniment. This limitation severely hinders practical application, since real-world production necessitates precise manipulation of components within dense mixtures. Our key finding is that, while vanilla cross-attention captures semantic features of stems, it lacks the spectral resolution to strictly localize targets in dense mixtures, leading to boundary leakage. To resolve this dilemma, we propose Polyphonia, a zero-shot editing framework with Acoustic-Informed Attention Calibration. Rather than relying solely on diffuse semantic attention, Polyphonia leverages a probabilistic acoustic prior to establish coarse boundaries, enabling precise semantic synthesis while preserving non-target stems. For evaluation, we propose PolyEvalPrompts, a standardized prompt set with 1,170 timbre transfer tasks in polyphonic music. Specifically, Polyphonia achieves an increase of 15.5% in target alignment compared to baselines, while maintaining competitive music fidelity and non-target integrity.

2605.10202 2026-05-12 cs.LG cs.CL

Task-Aware Calibration: Provably Optimal Decoding in LLMs

Tim Tomov, Dominik Fuchsgruber, Rajeev Verma, Stephan Günnemann

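AI summary This paper studies suboptimal decisions in large language model (LLM) decoding that arise when the model's predictive distribution is misaligned with the true generating distribution. The authors propose task calibration, which calibrates the model's predictive distribution in a task-induced latent space to enable better decoding. Building on Minimum Bayes Risk (MBR) decoding, they show that decoding on the task-calibrated latent distribution is optimal, and they introduce Task Calibration Error (TCE) as a metric of calibration quality; experiments show the method improves generation quality across multiple tasks.

Abstract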

LLM decoding often relies on the model's predictive distribution to generate an output. Consequently, misalignment with respect to the true generating distribution leads to suboptimal decisions in practice. While a natural solution is to calibrate the model's output distribution, for LLMs, this is ill-posed at the combinatorially vast level of free-form language. We address this by building on the insight that in many tasks, these free-form outputs can be interpreted in a semantically meaningful latent structure, for example, discrete class labels, integers, or sets. We introduce task calibration as a paradigm to calibrate the model's predictive distribution in the task-induced latent space. We apply a decision-theoretic result to show that Minimum Bayes Risk (MBR) decoding on the task-calibrated latent distribution is the optimal decoding strategy on latent model beliefs. Empirically, it consistently improves generation quality across different tasks and baselines. We also introduce Task Calibration Error (TCE), an application-aware calibration metric that quantifies the excess loss due to miscalibration. Our work demonstrates that task calibration enables more reliable model decisions across various tasks and applications.
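The decision rule named above, Minimum Bayes Risk decoding over a latent space, can be sketched in a few lines of Python. This is a generic illustration, not the paper's code: the function name, the toy distribution over integer answers, and both loss functions are assumptions, and the calibration step itself is omitted (the input distribution is simply assumed to be calibrated already).

```python
def mbr_decode(latent_probs, loss, actions=None):
    """Return the action with the lowest expected loss under latent_probs.

    latent_probs maps each latent outcome (e.g. an integer answer) to its
    probability; loss(action, outcome) is the task loss.
    """
    if actions is None:
        actions = list(latent_probs)

    def risk(action):
        return sum(p * loss(action, y) for y, p in latent_probs.items())

    return min(actions, key=risk)


# Toy latent distribution over integer answers (illustrative values).
beliefs = {1: 0.4, 2: 0.3, 3: 0.3}

# Under 0-1 loss the MBR action is the mode of the distribution.
mode_answer = mbr_decode(beliefs, lambda a, y: 0.0 if a == y else 1.0)

# Under absolute error the MBR action is a median instead.
median_answer = mbr_decode(beliefs, lambda a, y: abs(a - y))
```

The same beliefs yield different best answers under different losses, which is why the quality of the latent distribution, and not only its most likely value, affects the final decision.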

2605.10199 2026-05-12 cs.CL eess.AS

How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

Hui Lu, Xueyuan Chen, Huimeng Wang, Shuhai Peng, Shiyin Kang, Xixin Wu, Zhiyong Wu

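AI summary This paper studies how a large language model (LLM) in full-duplex spoken dialogue can keep listening to user input while generating its own spoken response. The authors argue that how the user stream is routed into the LLM is a key architectural question for system performance, and they compare two routing strategies: injecting the user stream directly into the model input, and accessing it as external memory through cross-attention. Experiments show that direct injection performs better on semantic understanding and question answering but is prone to context corruption in scenarios such as user interruptions, whereas cross-attention routing is slightly weaker on question answering but better preserves the generation context and is more robust. The study offers practical guidance for designing full-duplex spoken dialogue systems.

Abstract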

Full-duplex spoken dialogue requires a model to keep listening while generating its own spoken response. This is challenging for large language models (LLMs), which are designed to extend a single coherent sequence and do not naturally support user input arriving during generation. We argue that how the user stream is routed into the LLM is therefore a key architectural question for full-duplex modeling. To study this question, we extend a text-only LLM into a unified full-duplex spoken dialogue system and compare two routing strategies under a shared training pipeline: (i) channel fusion, which injects the user stream directly into the LLM input, and (ii) cross-attention routing, which keeps the user stream as external memory accessed through cross-attention adapters. Experiments on spoken question answering and full-duplex interaction benchmarks reveal a clear tradeoff. Channel fusion yields stronger semantic grounding and consistently better question-answering performance. However, under semantically overlapping conditions such as user interruptions, it is more vulnerable to context corruption: if the model fails to stop in time, the overlapping user stream can interfere with ongoing generation and lead to semantically incoherent continuations. Cross-attention routing underperforms on question answering, but better preserves the LLM generation context and is more robust to this failure mode. These results establish user-stream routing as a central design axis in full-duplex spoken dialogue and offer practical guidance on the tradeoff between semantic integration and context robustness. We provide a demo page for qualitative inspection.

2605.10198 2026-05-12 cs.LG cs.AI

Empty SPACE: Cross-Attention Sparsity for Concept Erasure in Diffusion Models

Nicola Novello, Andrea M. Tonello

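AI summary This paper studies how to remove specific concepts from text-to-image diffusion models to avoid generating copyrighted or inappropriate content. To address the reduced effectiveness of existing closed-form concept erasure methods on larger models, the authors propose SPACE, an efficient concept erasure method based on sparse cross-attention that iteratively updates the model's cross-attention parameters to erase concepts and sparsify parameters at the same time, improving erasure effectiveness and robustness and substantially reducing storage requirements.

Abstract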

Erasing specific concepts from text-to-image diffusion models is essential for avoiding the generation of copyrighted and explicit content. Closed-form concept erasure methods offer a fast alternative to backpropagation-based techniques, but they become less effective when scaling from smaller models such as Stable Diffusion 1.5 to larger models like Stable Diffusion XL. To maintain erasure effectiveness in these larger-scale architectures, we propose SParse cross-Attention-based Concept Erasure (SPACE). SPACE iteratively modifies the cross-attention parameters of a model with a closed-form update that jointly induces sparsity and erases target concepts. By concentrating the concept mapping to a lower-dimensional subspace, SPACE achieves superior erasure efficacy compared to dense baselines. Extensive experimental results show improvements in erasure effectiveness and robustness against adversarial prompts. Furthermore, SPACE achieves 80%-90% cross-attention sparsity, reducing the storage requirements for saving the modified parameters by 70%, demonstrating its memory efficiency.

2605.10196 2026-05-12 cs.LG

Many Needles in a Haystack: Active Hit Discovery for Perturbation Experiments

Andrea Rubbi, Arpit Merchant, Samuel Ogden, Amir Akbarnejad, Pietro Liò, Sattar Vakili, Mo Lotfollahi

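AI summary This study addresses how to efficiently discover interventions with significant phenotypic effects in high-throughput gene perturbation experiments, and proposes a probability-based active experimental design method. Its core is the Probability-of-Hit acquisition function, which uses the posterior probability to directly assess whether a candidate perturbation exceeds a predefined effect threshold, so that effective interventions are identified more efficiently. The method performs strongly on both synthetic and real biological data, with up to a 6.4% improvement over baselines on some datasets.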

Comments To be published in International Conference on Machine Learning (ICML) 2026

Abstract

High-throughput gene perturbation experiments can test several genetic interventions in parallel, yet experimental budgets remain limited. A central goal is hit discovery: identifying as many perturbations as possible whose phenotypic effect exceeds a predefined threshold. Pure exploration strategies are statistically inefficient, wasting budget on low-value regions. Bayesian optimization methods offer a principled alternative but target a single global optimum, over-exploiting dominant modes while neglecting other high-value regions. We formalize hit discovery as a sequential experimental design problem and propose Probability-of-Hit, an acquisition function that directly targets threshold exceedance by ranking candidates according to their posterior probability of being a hit. We prove asymptotic optimality of this approach and demonstrate strong empirical performance on both synthetic benchmarks and real biological immunology datasets, including up to 6.4% improvement over baselines on the Schmidt IL-2 dataset.
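The acquisition rule described above ranks candidates by their posterior probability of exceeding the threshold. The sketch below shows one way this could look if each candidate's posterior were Gaussian; that Gaussian form, the function names, and the toy numbers are assumptions for illustration, since the abstract does not specify the surrogate model.

```python
import math


def probability_of_hit(mean, std, threshold):
    """P(effect > threshold) for a Gaussian posterior N(mean, std^2)."""
    if std <= 0:
        return 1.0 if mean > threshold else 0.0
    z = (threshold - mean) / std
    # 1 - Phi(z), written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


def rank_by_poh(posteriors, threshold):
    """Sort candidate names by descending probability of being a hit.

    posteriors maps a candidate name to its (mean, std) pair.
    """
    return sorted(
        posteriors,
        key=lambda name: probability_of_hit(*posteriors[name], threshold),
        reverse=True,
    )


# Toy posteriors: A has the higher mean but is almost certainly below the
# threshold, while B is uncertain enough to have a real chance of a hit.
candidates = {"A": (0.9, 0.05), "B": (0.7, 0.5)}
ranking = rank_by_poh(candidates, threshold=1.0)
```

Ranking by posterior mean alone would prefer A here, whereas the threshold-exceedance criterion prefers B, which illustrates how the objective differs from chasing a single optimum.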

2605.10194 2026-05-12 cs.AI cs.LG

TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment

Jiaxuan Wang, Xuan Ouyang, Zhiyu Chen, Yulan Hu, Zheng Pan, Xin Li, Lan-Zhe Guo

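AI summary This paper proposes TRACE, a new strategy for improving self-distillation in reinforcement learning with verifiable rewards. By aligning only on critical reasoning spans marked by an annotator, it reduces redundant gradient updates and privileged-information leakage. TRACE combines forward KL divergence, reverse KL divergence, and GRPO, and anneals away the KL channel after a short warm-up. Experiments show that TRACE outperforms existing methods on several math benchmarks while preserving out-of-distribution performance, demonstrating its effectiveness for improving reasoning and generalization.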

Comments work in progress

Abstract

On-policy self-distillation (self-OPD) densifies reinforcement learning with verifiable rewards (RLVR) by letting a policy teach itself under privileged context. We find that when this guidance spans the full response, all-token KL spends gradients on mostly redundant positions and amplifies privileged-information leakage, causing entropy rise, shortened reasoning, and out-of-distribution degradation in long-horizon math training. We propose Token-Routed Alignment for Critical rEasoning (TRACE), which distills only on annotator-marked critical spans: forward KL on key spans of correct rollouts, optional reverse KL on localized error spans, and GRPO on all remaining tokens, with the KL channel annealed away after a short warm-up. Our analysis explains TRACE through two effects: forward KL provides non-vanishing lift to teacher-supported tokens that the student under-allocates, while span masking and decay keep cumulative privileged-gradient exposure finite. On four held-out math benchmarks plus GPQA-Diamond, TRACE improves over GRPO by 2.76 percentage points on average and preserves the Qwen3-8B base OOD score on GPQA-Diamond, where GRPO and all-token self-OPD baselines degrade. Gains persist under online self-annotation (+1.90 percentage points, about 69% of the strong-API gain), reducing the concern that TRACE merely imports external annotator capability. Across scales, the best routed action is base-dependent: on Qwen3-8B it is forward KL on key spans, while on Qwen3-1.7B it shifts to reverse KL on error spans.

2605.10190 2026-05-12 cs.CV

DetRefiner: Model-Agnostic Detection Refinement with Feature Fusion Transformer

Soichiro Okazaki, Tatsuya Sasaki, Hiroki Ohashi

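AI summary DetRefiner is a model-agnostic detection refinement framework for open-vocabulary object detection that aims to improve detection of both seen and unseen categories. It fuses global image features and local patch features with a lightweight Transformer encoder and infers attribute reliability to recalibrate the base detector's confidence. DetRefiner requires neither the base model's internal features nor retraining; it only provides auxiliary calibration of detections at inference time, and it improves several open-vocabulary detectors across multiple datasets, with gains of up to +10.1 AP on unseen categories.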

Comments CVPR 2026 Findings

Abstract

Open-vocabulary object detection (OVOD) aims to detect both seen and unseen categories, yet existing methods often struggle to generalize to novel objects due to limited integration of global and local contextual cues. We propose DetRefiner, a simple yet effective plug-and-play framework that learns to fuse global and local features to refine open-vocabulary detection. DetRefiner processes global image features and patch-level image features from foundational models (e.g., DINOv3) through a lightweight Transformer encoder. The encoder produces a class vector capturing image-level attributes and patch vectors representing local region attributes, from which attribute reliability is inferred to recalibrate the base model's confidence. Notably, DetRefiner is trained independently of the base OVOD model, requiring neither access to its internal features nor retraining. At inference, it operates solely on the base detector's predictions, producing auxiliary calibration scores that are merged with the base detector's scores to yield the final refined confidence. Despite this simplicity, DetRefiner consistently enhances multiple OVOD models across COCO, LVIS, ODinW13, and Pascal VOC, achieving gains of up to +10.1 AP on novel categories. These results highlight that learning to fuse global and local representations offers a powerful and general mechanism for advancing open-world object detection. Our code and models are available at https://github.com/hitachi-rd-cv/detrefiner.

2605.10189 2026-05-12 cs.LG cs.AI

ProteinOPD: Towards Effective and Efficient Preference Alignment for Protein Design

Yulin Zhang, He Cao, Zihao Jiang, Chenyi Zi, Zhipeng Zhou, Zijing Liu, Yu Li, Jia Li, Ziqi Gao

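AI summary This paper proposes ProteinOPD, a multi-objective preference alignment framework that addresses the tension in protein design between preference steering and preserving the model's original design capability. Drawing on On-Policy Distillation (OPD), it performs token-level distillation on the student model's own trajectories, consolidating the knowledge of teachers for multiple preference objectives into one shared student, which balances competing objectives while retaining the protein language model's designability. Experiments show that ProteinOPD improves target preference performance while training markedly faster than RL-based alignment methods.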

English abstract

Designing proteins with desired functions or properties represents a core goal in synthetic biology and drug discovery. Recent advances in protein language models (PLMs) have enabled the generation of highly designable protein sequences, while preference alignment provides a promising way to steer designs toward desired functions and properties. Nevertheless, alignment methods often trigger catastrophic forgetting of pretrained knowledge, degrading basic designability, and fail to balance multiple competing objectives. To address these issues, we draw inspiration from On-Policy Distillation (OPD), an advanced post-training method renowned for mitigating catastrophic forgetting through its mode-seeking nature. In this work, we propose ProteinOPD, a multi-objective preference alignment framework that can effectively balance multiple preference objectives while maintaining the inherent designability of PLMs. ProteinOPD adapts a pretrained PLM into preference-specific teachers and distills their knowledge into a shared student via token-level OPD on the student's own trajectories. During this process, the student is aligned to a unique normalized geometric consensus of weighted teachers while ensuring bounded optimization under conflicts. This bridges the gap for OPD in multi-objective/teacher alignment. Extensive experiments show that ProteinOPD achieves substantial gains on target preference objectives without compromising the designability, with an 8x training speedup over RL-based alignment competitors.
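The abstract does not spell out the form of the "normalized geometric consensus of weighted teachers". A minimal sketch, assuming it is a weighted geometric mean of the teachers' next-token distributions renormalized to sum to one, might look as follows. The function name `geometric_consensus`, the example probabilities, and the weights are hypothetical and are not taken from the paper.

```python
import math

def geometric_consensus(teacher_probs, weights):
    """Weighted geometric mean of teacher distributions, renormalized.

    ASSUMPTION: consensus(x) is proportional to prod_i p_i(x) ** w_i.
    teacher_probs: list of distributions (each a list of strictly positive
    probabilities over the same vocabulary); weights: non-negative floats.
    """
    vocab_size = len(teacher_probs[0])
    # Work in log space: the log of a weighted product is a weighted sum of logs.
    log_mix = [
        sum(w * math.log(p[tok]) for p, w in zip(teacher_probs, weights))
        for tok in range(vocab_size)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(log_mix)
    unnormalized = [math.exp(v - shift) for v in log_mix]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Two hypothetical teachers over a two-token vocabulary, equally weighted.
consensus = geometric_consensus([[0.5, 0.5], [0.9, 0.1]], [0.5, 0.5])
print(consensus)
```

Under this assumption, a token that any teacher assigns very low probability is suppressed in the consensus, which is one way competing objectives could be kept from dominating each other.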

2605.10186 2026-05-12 cs.CL cs.AI

LegalCiteBench: Evaluating Citation Reliability in Legal Language Models

Sijia Chen, Hang Yin, Shunfan Zhou

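AI summary The paper introduces LegalCiteBench, a benchmark for evaluating the citation reliability of legal language models without external grounding. Even the strongest models struggle to recover or generate correct legal citations in the closed-book setting, with error rates above 94%. The benchmark comprises five citation-centric tasks designed to diagnose how models generate incorrect citations, verify citation accuracy, and abstain from answering when external grounding is missing.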

Comments Preprint. 23 pages including references and appendices

English abstract

Large language models (LLMs) are increasingly integrated into legal drafting and research workflows, where incorrect citations or fabricated precedents can cause serious professional harm. Existing legal benchmarks largely emphasize statutory reasoning, contract understanding, or general legal question answering, but they do not directly study a central common-law failure mode: when asked to provide case authorities without external grounding, models may return plausible-looking but incorrect citations or cases. We introduce LegalCiteBench, a benchmark for studying closed-book citation recovery, citation verification, and case matching in legal language models. LegalCiteBench contains approximately 24K evaluation instances constructed from 1,000 real U.S. judicial opinions from the Case Law Access Project. The benchmark covers five citation-centric tasks: citation retrieval, citation completion, citation error detection, case matching, and case verification and correction. Across 21 LLMs, exact citation recovery remains highly challenging in this closed-book setting: even the strongest models score below 7/100 on citation retrieval and completion. Within the evaluated models, scale and legal-domain pretraining provide limited gains and do not resolve this difficulty. Models also frequently provide concrete but incorrect or low-overlap authorities under our evaluation protocol, with Misleading Answer Rates (MAR) exceeding 94% for 20 of 21 evaluated models on retrieval-heavy tasks. A prompt-only abstention experiment shows that explicit uncertainty instructions reduce some confident fabrication but do not improve citation correctness. LegalCiteBench is intended as a diagnostic framework for studying authority generation failures, verification behavior, and abstention when external grounding is absent, incomplete, or bypassed.

2605.10184 2026-05-12 cs.CV cs.AI

Developing a foundation model for high-resolution remote sensing data of the Netherlands

Paul Vermeeren, Heysem Kaya

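AI summary This paper presents a foundation model built on high-resolution (1.2 m) satellite imagery of the Netherlands. It combines a convolutional neural network with a Vision Transformer to capture fine textures, edges, and small objects as well as large-scale terrain structures, elevation patterns, and land-cover distributions. By taking time-series data as input, the model learns contextual information across time and better models temporal dependencies such as topographic features, land-cover changes, and seasonal dynamics, which reduces feature ambiguity, strengthens representation learning, and improves generalization with few labeled samples. Experiments show strong results on tasks such as vegetation monitoring in the Netherlands and performance comparable to state-of-the-art models on several global benchmark datasets, demonstrating that general-purpose representations can be learned with limited data and parameters.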

Comments 9 pages, 4 figures, under review in a journal

English abstract

We develop a foundation model using 1.2 m high-resolution satellite images of the Netherlands. By combining a Convolutional Neural Network and a Vision Transformer, the model captures both low- and high-frequency landscape features, such as fine textures, edges, and small objects as well as large terrain structures, elevation patterns, and land-cover distributions. Leveraging temporal data as input, the model learns from broader contextual information across time, allowing it to exploit temporal dependencies such as topographic features, land-cover changes, and seasonal dynamics. These additional constraints reduce feature ambiguity, improve representation learning, and enable better generalization with fewer labeled samples. The foundation model is evaluated on multiple downstream tasks, ranging from use cases within the Netherlands to global benchmarking datasets. On the vegetation monitoring dataset of the Netherlands, the model shows clear performance improvements by incorporating temporal information instead of relying on a single time point. Despite using a smaller model and less pretraining data limited to the Netherlands, it achieves competitive results on global benchmarks when compared to state-of-the-art models. These results demonstrate that the model can learn rich, generalizable representations from limited data while using a fraction of the parameters of larger state-of-the-art remote sensing models. To maximize reproducibility and reuse, we made the scripts and the model accessible on GitHub.

2605.10183 2026-05-12 cs.LG

Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization

Jinping Wang, Qinhan Liu, Zhiwu Xie, Zhiqiang Gao

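AI summary The paper revisits the mismatch between loss and perturbation radius in Sharpness-Aware Minimization (SAM) and proposes Loss-Equated SAM (LE-SAM), which fixes a loss-space budget instead of the conventional fixed parameter-space perturbation radius. This weakens learning signals dominated by the gradient norm and steers optimization toward curvature-dominated flat minima. Experiments show that LE-SAM generalizes better across multiple benchmark tasks, outperforming the original SAM and its variants and reaching state-of-the-art performance.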

Comments Accepted by ICML2026

English abstract

Sharpness-Aware Minimization (SAM) improves generalization by minimizing the worst-case loss within a fixed parameter-space radius neighborhood. SAM and its variants mainly rely on a first-order linearized surrogate, while flat minima are inherently a second-order (curvature) notion. We revisit this mismatch and propose Loss-Equated SAM (LE-SAM), which inverts the traditional SAM mechanism by replacing the fixed perturbation radius with a fixed loss-space budget, effectively removing gradient-norm-dominated learning signals and shifting optimization toward curvature-dominated terms. Extensive experiments across diverse benchmarks and tasks demonstrate the strong generalization ability of LE-SAM, which consistently outperforms SAM and its variants, achieving state-of-the-art performance.
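The abstract does not give the LE-SAM update rule. The sketch below contrasts the standard SAM perturbation (a fixed radius `rho` along the normalized gradient) with one possible first-order reading of a "fixed loss-space budget": scale the step so that the linearized loss increase equals a constant `delta`. That second rule is an assumption made for illustration only; the function names and the budget parameter `delta` are hypothetical and may differ from the paper's actual formulation.

```python
import math

TINY = 1e-12  # guards against division by zero for vanishing gradients

def sam_perturbation(grad, rho):
    """Standard SAM ascent step: eps = rho * g / ||g|| (fixed radius)."""
    norm = math.sqrt(sum(g * g for g in grad))
    return [rho * g / (norm + TINY) for g in grad]

def loss_budget_perturbation(grad, delta):
    """ASSUMED loss-budget step: eps = delta * g / ||g||^2.

    Chosen so the first-order loss increase g . eps equals delta,
    regardless of the gradient norm.
    """
    sq_norm = sum(g * g for g in grad)
    return [delta * g / (sq_norm + TINY) for g in grad]

def first_order_increase(grad, eps):
    """Linearized loss change g . eps caused by the perturbation eps."""
    return sum(g * e for g, e in zip(grad, eps))

for grad in ([3.0, 4.0], [30.0, 40.0]):
    sam_gain = first_order_increase(grad, sam_perturbation(grad, 0.1))
    budget_gain = first_order_increase(grad, loss_budget_perturbation(grad, 0.1))
    print(grad, sam_gain, budget_gain)
```

With a fixed radius, the linearized loss increase grows with the gradient norm (it equals `rho * ||g||`), whereas under the assumed budget rule it stays at `delta`. This is consistent with the abstract's claim about removing gradient-norm-dominated signals, but it is only one interpretation.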

2605.10179 2026-05-12 cs.LG cs.AI

One-Step Graph-Structured Neural Flows for Irregular Multivariate Time Series Classification

Mengzhou Gao, Kaiwei Wang, Pengfei Jiao

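AI summary The study proposes Graph-Structured Neural Flows (GSNF), a one-step model for irregular multivariate time series classification. To address the weak modeling of inter-variable interactions in existing methods, GSNF introduces two auxiliary-trajectory self-supervision strategies that strengthen graph structure learning through trajectory divergence and reverse-time generation. Experiments show state-of-the-art classification performance on multiple real-world datasets while maintaining high training efficiency and low memory consumption.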

English abstract

Neural Flows efficiently model irregular multivariate time series by directly learning ODE solution trajectories with neural networks, bypassing step-by-step numerical solvers. Despite their efficiency, many existing approaches treat variables independently, leaving inter-variable interactions underexplored. Moreover, their one-step mapping makes interaction modeling inherently challenging, as it removes the iterative refinement of interactions during learning. To address this challenge, we propose one-step Graph-Structured Neural Flows (GSNF), which introduce two auxiliary-trajectory self-supervision strategies to strengthen interaction learning: (i) interaction-aware trajectory generation via re-initialization, which induces trajectory divergence to expose graph-induced interactions, with a theoretically derived lower bound on divergence; and (ii) reverse-time trajectory generation, which enforces forward-backward consistency to regularize graph learning, enabled by flow invertibility. Experiments on five real-world datasets show that GSNF achieves state-of-the-art classification performance with highly competitive training time and memory usage.

2605.10177 2026-05-12 cs.CV cs.AI cs.RO

MTA-RL: Robust Urban Driving via Multi-modal Transformer-based 3D Affordances and Reinforcement Learning

Guangli Chen, Dianzhao Li, Wenjian Zhong, Bangquan Xie, Ostap Okhrin

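AI summary This paper proposes MTA-RL, a framework that improves the robustness of urban autonomous driving through multi-modal Transformer-based 3D affordance representations and reinforcement learning. It fuses RGB images and LiDAR point clouds to produce structured, geometry-aware affordance representations that serve as input to the RL policy, improving decision efficiency and stability. Experiments show that MTA-RL outperforms existing methods in traffic scenarios of varying density and exhibits strong zero-shot generalization to unseen urban environments.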

English abstract

Robust urban autonomous driving requires reliable 3D scene understanding and stable decision-making under dense interactions. However, existing end-to-end models lack interpretability, while modular pipelines suffer from error propagation across brittle interfaces. This paper proposes MTA-RL, the first framework that bridges perception and control through Multi-modal Transformer-based 3D Affordances and Reinforcement Learning (RL). Unlike previous fusion models that directly regress actions, RGB images and LiDAR point clouds are fused using a transformer architecture to predict explicit, geometry-aware affordance representations. These structured representations serve as a compact observation space, enabling the RL policy to operate purely on predicted driving semantics, which significantly improves sample efficiency and stability. Extensive evaluations in CARLA Town01-03 across varying densities (20-60 background vehicles) show that MTA-RL consistently outperforms state-of-the-art baselines. Trained solely on Town03, our method demonstrates superior zero-shot generalization in unseen towns, achieving up to a 9.0% increase in Route Completion, an 11.0% increase in Total Distance, and an 83.7% improvement in Distance Per Violation. Furthermore, ablation studies confirm that our multi-modal fusion and reward shaping are critical, significantly outperforming image-only and unshaped variants, demonstrating the effectiveness of MTA-RL for robust urban autonomous driving.

2605.10174 2026-05-12 cs.CV

BathyFacto: Refraction-Aware Two-Media Neural Radiance Fields for Bathymetry

Markus Brezovsky, Anatol Günthner, Frederik Schulte, Lukas Winiwarter, Boris Jutzi, Gottfried Mandlburger

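AI summary BathyFacto is a refraction-aware two-media neural radiance field method for bathymetric mapping. It targets the depth bias that light refraction introduces into conventional bundle-adjustment-based reconstruction of underwater scenes. The method introduces a medium-conditioned color head and a hash-grid-based density field and uses Snell's law to model the refracted ray path at the air-water interface, yielding more accurate underwater point clouds. Experiments show that BathyFacto markedly improves reconstruction accuracy and completeness on a simulated scene, outperforming conventional methods and a neural radiance field baseline that ignores refraction.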

Comments 16 pages, 8 figures, 3 tables. Submitted to ISPRS Open Journal of Photogrammetry and Remote Sensing, Special Issue "3D Underwater Mapping from Above and Below"

English abstract

Through-water photogrammetry based on UAV imagery enables shallow-water bathymetry, but refraction at the air-water interface violates the straight-ray assumption of Structure-from-Motion and causes systematic depth bias. We present BathyFacto, a refraction-aware two-media extension of Nerfacto integrated into Nerfstudio that targets metrically precise underwater point clouds. BathyFacto uses a shared hash-grid-based density field with a medium-conditioned color head that receives a one-bit medium flag (air or water) and traces each camera ray as two segments: a straight segment in air up to a planar water surface and a refracted segment in water computed via Snell's law with known refractive indices. To allocate samples efficiently across the air-water boundary, we employ a single proposal-network sampler that operates on a virtual straight ray spanning both media, combined with a kinked density wrapper that transparently corrects water-segment positions along the refracted direction before density evaluation. A data adaptation pipeline converts photogrammetric reconstructions to a Nerfstudio-compatible format, estimates the water plane from boundary markers, and provides per-pixel medium masks to gate refraction. We also extend the point cloud export with refraction-corrected backprojection and reversible coordinate transforms to world and global frames. On a simulated two-media scene with known ground truth, BathyFacto with refraction achieves a Cloud-to-Mesh mean distance of 0.06 m and 87% completeness, compared to 0.52 m / 29% for the Nerfacto baseline and 0.36 m / 21% for conventional MVS without refraction correction.
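The two-segment ray described in the abstract can be illustrated with the vector form of Snell's law. The sketch below is not the BathyFacto implementation: it assumes a horizontal water plane at height `water_z`, a camera above the water looking downward, and a typical refractive index of 1.33 for water (the abstract only says the indices are known). The function names `refract` and `point_on_two_media_ray` are hypothetical.

```python
import math

def refract(d, n, n1, n2):
    """Refract unit direction d at a surface via the vector form of Snell's law.

    n is the unit surface normal pointing toward the side the ray comes from;
    n1 and n2 are the refractive indices before and after the interface.
    Returns None on total internal reflection.
    """
    eta = n1 / n2
    cos_i = -sum(di * ni for di, ni in zip(d, n))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n))

def point_on_two_media_ray(origin, d, s, water_z=0.0, n_air=1.0, n_water=1.33):
    """Position at path length s along a ray that kinks at the water plane.

    ASSUMPTIONS: origin lies above the plane z = water_z and d points
    downward (d[2] < 0), so the ray always reaches the surface.
    """
    s_surface = (water_z - origin[2]) / d[2]  # path length to the water plane
    if s <= s_surface:
        # Still in air: straight segment along the original direction.
        return tuple(o + s * di for o, di in zip(origin, d))
    hit = tuple(o + s_surface * di for o, di in zip(origin, d))
    t = refract(d, (0.0, 0.0, 1.0), n_air, n_water)
    # In water: continue from the surface hit point along the refracted direction.
    return tuple(h + (s - s_surface) * ti for h, ti in zip(hit, t))

# A ray leaving a camera 1 m above the water at 45 degrees from vertical.
direction = (math.sin(math.radians(45)), 0.0, -math.cos(math.radians(45)))
print(point_on_two_media_ray((0.0, 0.0, 1.0), direction, 2.5))
```

Treating the same path length `s` as a position on a virtual straight ray and then remapping the in-water part along the refracted direction is the kind of correction the abstract attributes to the kinked density wrapper, although the actual wrapper may differ in detail.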

2605.10172 2026-05-12 cs.CV cs.CL

V-ABS: Action-Observer Driven Beam Search for Dynamic Visual Reasoning

Zhiwei Ning, Xuanang Gao, Jiaxi Cao, Gengming Zhang, Shengnan Ma, Wenwen Tong, Hanming Deng, Jie Yang, Wei Liu

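AI summary This paper proposes V-ABS, an action-observer driven beam search framework for complex multi-step tasks in dynamic visual reasoning. By introducing a thinker-actor-observer iteration mechanism together with an entropy-based adaptive weighting algorithm, it mitigates the imagination-action-observer (IAO) bias and improves the stability and optimality of reasoning. Experiments show that V-ABS achieves leading performance on multiple benchmarks and substantially outperforms existing models.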

English abstract

Multimodal large language models (MLLMs) have achieved remarkable success in general perception, yet complex multi-step visual reasoning remains a persistent challenge. Although recent agentic approaches incorporate tool use, they often neglect critical execution feedback. Consequently, they suffer from the imagination-action-observer (IAO) bias, a misalignment between prior imagination and observer feedback that undermines reasoning stability and optimality. To bridge this gap, we introduce V-ABS, an action-observer driven beam search framework that enables deliberate reasoning through thinker-actor-observer iterations. We also propose an entropy-based adaptive weighting algorithm to mitigate the IAO bias by dynamically balancing the confidence scores between the policy priors and the observational feedback. Moreover, we construct a large-scale supervised fine-tuning (SFT) dataset comprising over 80k samples to guide the model to assign higher prior confidence to correct action paths. Extensive experiments across eight diverse benchmarks show that V-ABS achieves state-of-the-art performance, delivering an average improvement of 19.7% on the Qwen3-VL-8B baseline and consistent gains across both open-source and proprietary models.

2605.10171 2026-05-12 cs.CL cs.AI

When Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviews

Sandeep Kumar, Yash Kamdar, Abid Hossain, Bharti Kumari, Tanik Saikh, Asif Ekbal

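AI summary Expert opinions in scientific peer review often disagree, and identifying and understanding these disagreements becomes increasingly difficult as conference submissions grow. This paper proposes a fine-grained contradiction analysis approach that identifies contradiction evidence spans in full reviews and assigns disagreement intensity scores, characterizing the degree of conflict between reviews more accurately. To this end, the authors build the RevCI dataset and design the IMPACT framework, which combines multi-agent reasoning with evidence extraction to model contradictions and their severity, and they propose the lightweight model TIDE for efficient inference.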

Comments accepted at ACL 2026

English abstract

Scientific peer reviews frequently contain conflicting expert judgments, and the increasing scale of conference submissions makes it challenging for Area Chairs and editors to reliably identify and interpret such disagreements. Existing approaches typically frame reviewer disagreement as binary contradiction detection over isolated sentence pairs, abstracting away the review-level context and obscuring differences in the severity of evaluative conflict. In this work, we introduce a fine-grained formulation of reviewer contradiction analysis that operates over full peer reviews by explicitly identifying contradiction evidence spans and assigning graded disagreement intensity scores. To support this task, we present RevCI, an expert-annotated benchmark of peer-review pairs with evidence-level contradiction annotations and graded intensity labels. We further propose IMPACT, a structured multi-agent framework that integrates aspect-conditioned evidence extraction, deliberative reasoning, and adjudication to model reviewer contradictions and their intensity. To support efficient deployment, we distill IMPACT into TIDE, a small language model that predicts contradiction evidence and intensity in a single forward pass. Experimental results show that IMPACT substantially outperforms strong single-agent and generic multi-agent baselines in both evidence identification and intensity agreement, while TIDE achieves competitive performance at significantly lower inference cost.

2605.10170 2026-05-12 cs.LG

Balancing Efficiency and Fairness in Traffic Light Control through Deep Reinforcement Learning

Matteo Cederle, Giacomo Scatto, Gian Antonio Susto

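AI summary This paper studies how deep reinforcement learning can balance efficiency and fairness in traffic light control. It proposes a new deep reinforcement learning agent that accounts for the fairness needs of both vehicle and pedestrian flows under dynamic traffic conditions and coordinates the two dynamically. Experiments show that the method relieves congestion while ensuring fair service for different road users, offering a practical and flexible solution for traffic management in smart cities.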

Comments Paper accepted to the 2026 IFAC World Congress, held in Busan (KOR), August 23rd-28th, 2026

English abstract

Urban traffic congestion presents a significant challenge for modern cities, affecting both mobility and sustainability. Traditional traffic light control systems often fail to adapt to dynamic conditions, leading to inefficiencies. This paper proposes a novel deep reinforcement learning agent for traffic light control that addresses this limitation by explicitly integrating fairness considerations for both vehicular and pedestrian traffic. Unlike prior work, our approach dynamically balances these flows based on real-time demand, moving beyond systems focused solely on vehicles. Experimental results demonstrate that our agent effectively reduces congestion while ensuring equitable service for both categories of road users. This research contributes to a practical and adaptable solution for intelligent traffic management within the framework of smart cities, paving the way for more efficient and inclusive urban mobility.

2605.10169 2026-05-12 cs.AI cs.GT

Automated Approach for Solving Infinite-state Polynomial Reachability Games

Krishnendu Chatterjee, Ehsan Kafshdar Goharshady, Mehrdad Karrabi, Maximilian Seeliger, Đorđe Žikelić

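AI summary This paper studies turn-based reachability games on infinite-state graphs, focusing on determining whether the REACH player has a winning strategy and computing it. The authors propose ranking certificates as a sound and complete proof rule and design a fully automated algorithm for polynomial reachability games that computes a winning strategy in sub-exponential time and produces a formal correctness proof. Experiments show that the method solves hard cases beyond the reach of existing methods; for example, for the classical Cinderella-Stepmother game it computes an optimal strategy for an arbitrary precision parameter for the first time.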

English abstract

Reachability games are two-player games played on a graph, where the objective of the $\texttt{REACH}$ player is to reach the target set whereas the objective of the $\texttt{SAFE}$ player is to stay away from the target set. Reachability games have important applications in artificial intelligence and reactive synthesis, and many of these applications give rise to infinite-state reachability games. In this paper, we study turn-based reachability games on infinite-state graphs defined over valuations of a finite set of real variables. We consider the problem of determining the existence of and computing a winning strategy for the $\texttt{REACH}$ player. Our contributions are twofold. First, we propose ranking certificates for reachability games, a sound and complete proof rule for proving that the $\texttt{REACH}$ player has a winning strategy from the specified initial state. Second, we consider polynomial reachability games, where transitions and objectives are described by polynomial constraints over real variables, and propose a fully automated algorithm for computing a winning strategy for the $\texttt{REACH}$ player together with a formal correctness witness in the form of a ranking certificate. The algorithm is sound, semi-complete, and runs in sub-exponential time. Our experiments demonstrate the ability of our method to solve challenging examples from the literature that were out of the reach of existing methods. Specifically, for the classical Cinderella-Stepmother game, we are able to compute an optimal winning strategy for an arbitrary precision parameter for the first time.

2605.10168 2026-05-12 cs.CL cs.IR

ASTRA-QA: A Benchmark for Abstract Question Answering over Documents

Shu Wang, Shansong Zhou, Xinyang Wang, Shiwei Wang, Hulong Wu, Yixiang Fang

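AI summary This paper introduces ASTRA-QA, a benchmark dataset for abstract question answering over documents, addressing the weak support in existing QA benchmarks for abstract questions that require synthesizing information from multiple documents. The dataset contains 869 QA instances covering five abstract question types and three controlled retrieval scopes, and each instance carries explicit evaluation annotations such as answer topic sets, unsupported topics, and aligned evidence. By directly scoring topic coverage and unsupported content, ASTRA-QA enables scalable evaluation without exhaustive comparisons, and experiments with several retrieval-augmented generation methods validate its diagnostic power for coverage, hallucination, and retrieval robustness.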

English abstract

Document-based question answering (QA) increasingly includes abstract questions that require synthesizing scattered information from long documents or across multiple documents into coherent answers. However, this setting is still poorly supported by existing benchmarks and evaluation methods, which often lack stable abstract references or rely on coarse similarity metrics and unstable head-to-head comparisons. To alleviate this issue, we introduce ASTRA-QA, a benchmark for AbSTRAct Question Answering over documents. ASTRA-QA contains 869 QA instances over academic papers and news documents, covering five abstract question types and three controlled retrieval scopes. Each instance is equipped with explicit evaluation annotations, including answer topic sets, curated unsupported topics, and aligned evidence. Building on these annotations, ASTRA-QA assesses whether answers cover required key points and avoid unsupported content by directly scoring topic coverage and curated unsupported content, enabling scalable evaluation without exhaustive head-to-head comparisons. Experiments with representative Retrieval-Augmented Generation (RAG) methods spanning vanilla, graph-based, and hierarchical retrieval settings show that ASTRA-QA provides reference-grounded diagnostics for coverage, hallucination, and retrieval-scope robustness. Our dataset and code are available at https://xinyangsally.github.io/astra-benchmark.