arXivDaily: arXiv daily academic digest, updated Monday to Friday
2602.01023 2026-06-10 cs.IR cs.AI cs.LG

Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment

Kai Yuan, Anthony Zheng, Jia Hu, Divyanshu Sheth, Hemanth Velaga, Kylee Kim, Matteo Guarrera, Besim Avci, Jianhua Li, Xuetao Yin, Rajyashree Mukherjee, Sean Suchter

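Affiliations Apple; UC Berkeley

AI summary Proposes a unified framework that reformulates query auto-completion as end-to-end list generation via retrieval-augmented generation (RAG) and multi-objective Direct Preference Optimization (DPO), addressing the limited long-tail coverage of traditional pipelines and the hallucination risk of generative methods; its effectiveness is validated on a large-scale commercial search platform.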

Comments 11 pages, 4 figures

Journal ref Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26), August 09--13, 2026, Jeju Island, Republic of Korea

Abstract

Query Auto-Completion (QAC) suggests query completions as users type, helping them articulate intent and reach results more efficiently. Existing approaches face fundamental challenges: traditional retrieve-and-rank pipelines have limited long-tail coverage and require extensive feature engineering, while recent generative methods suffer from hallucination and safety risks. We present a unified framework that reformulates QAC as end-to-end list generation through Retrieval-Augmented Generation (RAG) and multi-objective Direct Preference Optimization (DPO). Our approach combines three key innovations: (1) reformulating QAC as end-to-end list generation with multi-objective optimization; (2) defining and deploying a suite of rule-based, model-based, and LLM-as-judge verifiers for QAC, and using them in a comprehensive methodology that combines RAG, multi-objective DPO, and iterative critique-revision for high-quality synthetic data; (3) a hybrid serving architecture enabling efficient production deployment under strict latency constraints. Evaluation on a large-scale commercial search platform demonstrates substantial improvements: offline metrics show gains across all dimensions, human evaluation yields +0.40 to +0.69 preference scores, and a controlled online experiment achieves a 5.44% reduction in keystrokes and a 3.46% increase in suggestion adoption, validating that unified generation with RAG and multi-objective alignment provides an effective solution for production QAC. This work represents a paradigm shift to end-to-end generation powered by large language models, RAG, and multi-objective alignment, establishing a production-validated framework that can benefit the broader search and recommendation industry.
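
As a rough illustration of the multi-objective DPO idea named in the abstract, the sketch below computes the standard DPO loss for one preferred/rejected pair of suggestion lists and then combines per-objective losses with a weighted sum. The abstract does not say how the objectives are combined, so the weighted sum, the objective names, the weights, and the `beta` value are all assumptions for illustration, not the authors' method.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected outputs
    under the policy and under a frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) equals softplus(-margin); guard against overflow.
    if margin < -30.0:
        return -margin
    return math.log1p(math.exp(-margin))

def multi_objective_dpo_loss(pairs_by_objective, weights, beta=0.1):
    """Weighted sum of mean per-objective DPO losses (assumed combination rule)."""
    total = 0.0
    for name, pairs in pairs_by_objective.items():
        losses = [dpo_loss(*pair, beta=beta) for pair in pairs]
        total += weights[name] * sum(losses) / len(losses)
    return total

# Hypothetical objectives and weights, for illustration only.
example_pairs = {
    "relevance": [(-1.0, -3.0, -2.0, -2.0)],
    "safety": [(-2.0, -2.5, -2.0, -2.0)],
}
example_weights = {"relevance": 0.7, "safety": 0.3}
example_loss = multi_objective_dpo_loss(example_pairs, example_weights)
```

When the policy and reference agree on both outputs, the margin is zero and each pair contributes log 2; the loss falls below that as the policy shifts probability toward the preferred list.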

2509.22148 2026-06-10 eess.AS cs.SD

Speaker Anonymisation for Speech-based Suicide Risk Detection

Ziyun Cui, Sike Jia, Yang Lin, Yinan Duan, Diyang Qu, Runsen Chen, Chao Zhang, Chang Lei, Wen Wu

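Affiliations University of Science and Technology of China

AI summary Studies speaker anonymisation for speech-based suicide risk detection, evaluating the trade-offs of methods based on traditional signal processing, neural voice conversion, and speech synthesis; results show that combining anonymisation methods that retain complementary information preserves detection performance.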

Comments Accepted by ICASSP 2026

Abstract

Adolescent suicide is a critical global health issue, and speech provides a cost-effective modality for automatic suicide risk detection. Given the vulnerable population, protecting speaker identity is particularly important, as speech itself can reveal personally identifiable information if the data is leaked or maliciously exploited. This work presents the first systematic study of speaker anonymisation for speech-based suicide risk detection. A broad range of anonymisation methods are investigated, including techniques based on traditional signal processing, neural voice conversion, and speech synthesis. A comprehensive evaluation framework is built to assess the trade-off between protecting speaker identity and preserving information essential for suicide risk detection. Results show that combining anonymisation methods that retain complementary information yields detection performance comparable to that of original speech, while achieving protection of speaker identity for vulnerable populations.

2601.13406 2026-06-10 cs.HC cs.AI

Integrating Virtual Reality and Large Language Models for Team-Based Non-Technical Skills Training and Evaluation in the Operating Room

Jacob Barker, Doga Demirel, Cullen Jackson, Anna Johansson, Robbin Miraglia, Darian Hoagland, Stephanie B. Jones, John Mitchell, Daniel B. Jones, Suvranu De

Affiliations Beth Israel Deaconess Medical Center; Department of Surgery, Northwell Health; College of Engineering, Florida Agricultural and Mechanical University and Florida State University

AI summary Presents VORTeX, a platform that combines virtual reality with large language models to train and evaluate the non-technical skills of operating-room teams; it analyzes team dialogue to generate interaction graphs, with the aim of improving communication and collaboration.

Comments 23 pages, 7 figures, 1 table, 2 Appendices

Journal ref npj Digit. Surg. 1, 10 (2026)

Abstract

Although effective teamwork and communication are critical to surgical safety, structured training for non-technical skills (NTS) remains limited compared with technical simulation. The ACS/APDS Phase III Team-Based Skills Curriculum calls for scalable tools that both teach and objectively assess these competencies during laparoscopic emergencies. We introduce the Virtual Operating Room Team Experience (VORTeX), a multi-user virtual reality (VR) platform that integrates immersive team simulation with large language model (LLM) analytics to train and evaluate communication, decision-making, teamwork, and leadership. Team dialogue is analyzed using structured prompts derived from the Non-Technical Skills for Surgeons (NOTSS) framework, enabling automated classification of behaviors and generation of directed interaction graphs that quantify communication structure and hierarchy. Two laparoscopic emergency scenarios, pneumothorax and intra-abdominal bleeding, were implemented to elicit realistic stress and collaboration. Twelve surgical professionals completed pilot sessions at the 2024 SAGES conference, rating VORTeX as intuitive, immersive, and valuable for developing teamwork and communication. The LLM consistently produced interpretable communication networks reflecting expected operative hierarchies, with surgeons as central integrators, nurses as initiators, and anesthesiologists as balanced intermediaries. By integrating immersive VR with LLM-driven behavioral analytics, VORTeX provides a scalable, privacy-compliant framework for objective assessment and automated, data-informed debriefing across distributed training environments.

2601.09620 2026-06-10 cs.HC cs.AI cs.CY

Full Disclosure, Less Trust? How the Level of Detail about AI Use in News Writing Affects Readers' Trust

Pooja Prajod, Hannes Cools, Thomas Röggla, Karthikeya Puttur Venkatraj, Amber Kusters, Alia ElKattan, Pablo Cesar, Abdallah El Ali

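Affiliations Centrum Wiskunde & Informatica; University of Amsterdam; New York University; TU Delft; Utrecht University

AI summary Examines how the level of detail in disclosures of AI use in news writing affects readers' trust; finds that detailed disclosures reduce trust yet prompt more readers to check sources, revealing a trade-off between transparency and trust.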

Abstract

As artificial intelligence (AI) is increasingly integrated into news production, calls for transparency about the use of AI have gained considerable traction. Recent studies suggest that AI disclosures can lead to a "transparency dilemma", where disclosure reduces readers' trust. However, little is known about how the level of detail in AI disclosures influences trust and contributes to this dilemma within the news context. In this 3×2×2 mixed factorial study with 40 participants, we investigate how three levels of AI disclosures (none, one-line, detailed) across two types of news (politics and lifestyle) and two levels of AI involvement (low and high) affect news readers' trust. We measured trust using the News Media Trust questionnaire, along with two decision behaviors: source-checking and subscription decisions. Questionnaire responses and subscription rates showed a decline in trust only for detailed AI disclosures, whereas source-checking behavior increased for both one-line and detailed disclosures, with the effect being more pronounced for detailed disclosures. Insights from semi-structured interviews suggest that source-checking behavior was primarily driven by interest in the topic, followed by trust, whereas trust was the main factor influencing subscription decisions. Around two-thirds of participants expressed a preference for detailed disclosures, while most participants who preferred one-line indicated a need for detail-on-demand disclosure formats. Our findings show that not all AI disclosures lead to a transparency dilemma, but instead reflect a trade-off between readers' desire for more transparency and their trust in AI-assisted news content.
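
To make the size of the 3×2×2 design concrete, the following sketch enumerates the condition cells implied by the three factors named in the abstract. The variable names are illustrative, and the abstract does not state which factors are within-subject and which are between-subject, so the sketch only lists the cells and says nothing about participant assignment.

```python
from itertools import product

# Factor levels as listed in the abstract.
disclosure = ["none", "one-line", "detailed"]
news_type = ["politics", "lifestyle"]
ai_involvement = ["low", "high"]

# Full crossing of the three factors: 3 * 2 * 2 condition cells.
conditions = list(product(disclosure, news_type, ai_involvement))

for cell in conditions:
    print(cell)
```

The full crossing yields 12 cells, from `("none", "politics", "low")` through `("detailed", "lifestyle", "high")`.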

2510.09498 2026-06-10 q-bio.TO cs.CE cs.LG

Unsupervised full-field Bayesian inference of orthotropic hyperelasticity from a single biaxial test: a myocardial case study

Rogier P. Krijnen, Akshay Joshi, Siddhant Kumar, Mathias Peirlinck

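Affiliations TU Delft

AI summary Uses full-field kinematics for unsupervised Bayesian inference, enabling reliable recovery of orthotropic hyperelastic material parameters from a single biaxial test and reducing sample demand and experimental manipulation.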

Abstract

Cardiac muscle tissue exhibits highly non-linear hyperelastic and orthotropic material behavior during passive deformation. Traditional constitutive identification protocols therefore combine multiple loading modes and typically require multiple specimens and substantial handling. In soft living tissues, such protocols are challenged by inter- and intra-sample variability and by manipulation-induced alterations of mechanical response, which can bias inverse calibration. In this work we exploit spatially heterogeneous full-field kinematics as an information-rich alternative to multimodal testing. We recast EUCLID, an unsupervised method for the automated discovery of constitutive models, towards Bayesian parameter inference for highly nonlinear, orthotropic constitutive models. Using synthetic myocardial tissue slabs, we demonstrate that a single heterogeneous biaxial experiment, combined with sparse reaction-force measurements, enables robust recovery of Holzapfel-Ogden parameters with quantified uncertainty, across multiple noise levels. The inferred responses agree closely with ground-truth simulations and yield credible intervals that reflect the impact of measurement noise on orthotropic material model inference. Our work supports single-shot, uncertainty-aware characterization of nonlinear orthotropic material models from a single biaxial test, reducing sample demand and experimental manipulation.
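
For orientation, the Holzapfel-Ogden parameters mentioned in the abstract usually refer to the eight constants of the orthotropic strain-energy function below, written here in its commonly cited form. The abstract does not say which variant the authors use (for example, how incompressibility is handled or whether fiber terms are switched off in compression), so this should be read as the textbook form rather than the paper's exact model. The second line is the generic Bayesian posterior over the parameter vector, with the data symbol chosen here for illustration.

```latex
\[
\psi(\mathbf{C}) = \frac{a}{2b}\exp\!\big[b\,(I_1 - 3)\big]
  + \sum_{i \in \{f,s\}} \frac{a_i}{2b_i}\Big\{\exp\!\big[b_i\,(I_{4i} - 1)^2\big] - 1\Big\}
  + \frac{a_{fs}}{2b_{fs}}\Big[\exp\!\big(b_{fs}\,I_{8fs}^2\big) - 1\Big]
\]
\[
I_1 = \operatorname{tr}\mathbf{C},\quad
I_{4f} = \mathbf{f}_0\cdot\mathbf{C}\,\mathbf{f}_0,\quad
I_{4s} = \mathbf{s}_0\cdot\mathbf{C}\,\mathbf{s}_0,\quad
I_{8fs} = \mathbf{f}_0\cdot\mathbf{C}\,\mathbf{s}_0
\]
\[
p(\boldsymbol{\theta}\mid\mathcal{D}) \propto p(\mathcal{D}\mid\boldsymbol{\theta})\,p(\boldsymbol{\theta}),
\qquad
\boldsymbol{\theta} = (a,\, b,\, a_f,\, b_f,\, a_s,\, b_s,\, a_{fs},\, b_{fs})
\]
```

Here C is the right Cauchy-Green tensor, and f_0 and s_0 are the fiber and sheet directions in the reference configuration.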

2512.09543 2026-06-10 cs.SE cs.AI

SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs

Arihant Tripathy, Ch Pavan Harshit, Karthik Vaidhyanathan

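Affiliations SERC, IIIT-Hyderabad

AI summary An empirical study of the energy efficiency and resource consumption of four leading agentic issue resolution frameworks when run with small language models (SLMs); it finds that framework architecture is the primary driver of energy consumption, but that the SLMs' limited reasoning ability means much of that energy is wasted.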

Comments 8 pages, 5 figures, 1 table. Accepted to AGENT 2026 (ICSE 2026 workshop)

Journal ref Proceedings of the 2026 International Workshop on Agentic Engineering (AGENT 2026), ACM, 2026, pp. 104-111

Abstract

Context. LLM-based autonomous agents in software engineering rely on large, proprietary models, limiting local deployment. This has spurred interest in Small Language Models (SLMs), but their practical effectiveness and efficiency within complex agentic frameworks for automated issue resolution remain poorly understood. Goal. We investigate the performance, energy efficiency, and resource consumption of four leading agentic issue resolution frameworks when deliberately constrained to using SLMs. We aim to assess the viability of these systems for this task in resource-limited settings and characterize the resulting trade-offs. Method. We conduct a controlled evaluation of four leading agentic frameworks (SWE-Agent, OpenHands, Mini SWE Agent, AutoCodeRover) using two SLMs (Gemma-3 4B, Qwen-3 1.7B) on the SWE-bench Verified Mini benchmark. On fixed hardware, we measure energy, duration, token usage, and memory over 150 runs per configuration. Results. We find that framework architecture is the primary driver of energy consumption. The most energy-intensive framework, AutoCodeRover (Gemma), consumed 9.4x more energy on average than the least energy-intensive, OpenHands (Gemma). However, this energy is largely wasted. Task resolution rates were near-zero, demonstrating that current frameworks, when paired with SLMs, consume significant energy on unproductive reasoning loops. The SLM's limited reasoning was the bottleneck for success, but the framework's design was the bottleneck for efficiency. Conclusions. Current agentic frameworks, designed for powerful LLMs, fail to operate efficiently with SLMs. We find that framework architecture is the primary driver of energy consumption, but this energy is largely wasted due to the SLMs' limited reasoning. Viable low-energy solutions require shifting from passive orchestration to architectures that actively manage SLM weaknesses.
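
The abstract reports per-configuration energy and a 9.4x ratio between frameworks but does not describe the measurement pipeline. As a minimal sketch, assuming energy is derived from periodically sampled power readings, one could integrate power over time with the trapezoidal rule and then compare mean per-run energy across configurations. The function names and the sample values are hypothetical and are not taken from the paper.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    return sum(
        0.5 * (power_w[i] + power_w[i + 1]) * (timestamps_s[i + 1] - timestamps_s[i])
        for i in range(len(power_w) - 1)
    )

def mean_energy_ratio(runs_a_j, runs_b_j):
    """Ratio of mean per-run energy between two configurations."""
    mean_a = sum(runs_a_j) / len(runs_a_j)
    mean_b = sum(runs_b_j) / len(runs_b_j)
    return mean_a / mean_b

# Hypothetical readings: a constant 10 W draw sampled over 2 seconds.
sample_energy = energy_joules([0.0, 1.0, 2.0], [10.0, 10.0, 10.0])
```

A ratio such as the reported 9.4x would then correspond to `mean_energy_ratio` applied to the per-run energies of the most and least energy-intensive configurations.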

2503.08460 2026-06-10 cs.ET cs.AI cs.CY

Status and Future Prospects of the Standardization Framework Industry 4.0: A European Perspective

Olga Meyer, Marvin Boell, Christoph Legat

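Affiliations Fraunhofer Institute for Manufacturing Engineering and Automation (IPA); German Commission for Electrotechnical, Electronic, and Information Technologies; Technical University of Applied Sciences Augsburg

AI summary Discusses the central role of Industry 4.0 standardization in the European regulatory framework, focusing on standardization activities in intelligent manufacturing and digital twins; it serves as a guide to standards for artificial intelligence and digital twins and calls for closer cooperation between standardization bodies and the research community.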

Abstract

The rapid development of Industry 4.0 technologies requires robust and comprehensive standardization to ensure interoperability, safety and efficiency in the Industry of the Future. This paper examines the fundamental role and functionality of standardization, with a particular focus on its importance in Europe's regulatory framework. Building on this, selected topics from standardization activities in the context of intelligent manufacturing and digital twins are highlighted, thereby providing an overview of the Industry 4.0 standards framework. This paper serves both as an informative guide to the existing standards in Industry 4.0 with respect to Artificial Intelligence and Digital Twins, and as a call to action for increased cooperation between standardization bodies and the research community. By fostering such collaboration, we aim to facilitate the continued development and implementation of standards that will drive innovation and progress in the manufacturing sector.

2409.04519 2026-06-10 quant-ph cs.AI cs.LG physics.data-an

The role of data embedding in quantum autoencoders for improved anomaly detection

Jack Y. Araz, Michael Spannowsky

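Affiliations Thomas Jefferson National Accelerator Facility; Institute for Particle Physics Phenomenology; Durham University

AI summary Investigates how three data embedding techniques affect the anomaly detection performance of quantum autoencoders, finding that enhanced embedding strategies can substantially improve detection accuracy and data representability.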

Comments 8 pages, 5 figures, 4 tables

Journal ref Quantum Mach. Intell. 8, 61 (2026)

Abstract

The performance of Quantum Autoencoders (QAEs) in anomaly detection tasks is critically dependent on the choice of data embedding and ansatz design. This study explores the effects of three data embedding techniques, data re-uploading, parallel embedding, and alternate embedding, on the representability and effectiveness of QAEs in detecting anomalies. Our findings reveal that even with relatively simple variational circuits, enhanced data embedding strategies can substantially improve anomaly detection accuracy and the representability of underlying data across different datasets. Starting with toy examples featuring low-dimensional data, we visually demonstrate the effect of different embedding techniques on the representability of the model. We then extend our analysis to complex, higher-dimensional datasets, highlighting the significant impact of embedding methods on QAE performance.

2305.19369 2026-06-10 eess.IV cs.CV physics.med-ph

The Brain Tumor Segmentation (BraTS) Challenge 2023: Glioma Segmentation in Sub-Saharan Africa Patient Population (BraTS-Africa)

Maruf Adewole, Jeffrey D. Rudie, Anu Gbadamosi, Oluyemisi Toyobo, Confidence Raymond, Dong Zhang, Olubukola Omidiji, Rachel Akinola, Mohammad Abba Suwaid, Adaobi Emegoakor, Nancy Ojo, Kenneth Aguh, Chinasa Kalaiwo, Gabriel Babatunde, Afolabi Ogunleye, Yewande Gbadamosi, Kator Iorpagher, Evan Calabrese, Mariam Aboian, Marius Linguraru, Jake Albrecht, Benedikt Wiestler, Florian Kofler, Anastasia Janas, Dominic LaBella, Anahita Fathi Kzerooni, Hongwei Bran Li, Juan Eugenio Iglesias, Keyvan Farahani, James Eddy, Timothy Bergquist, Verena Chung, Russell Takeshi Shinohara, Walter Wiggins, Zachary Reitman, Chunhao Wang, Xinyang Liu, Zhifan Jiang, Ariana Familiar, Koen Van Leemput, Christina Bukas, Maire Piraud, Gian-Marco Conte, Elaine Johansson, Zeke Meier, Bjoern H Menze, Ujjwal Baid, Spyridon Bakas, Farouk Dako, Abiodun Fatade, Udunna C Anazodo

Affiliations Medical Artificial Intelligence Laboratory (MAI Lab); Department of Radiation Biology, Radiotherapy and Radiodiagnosis, University of Lagos; Department of Radiology, University of California, San Diego; Crestview Radiology Limited; Lagos University Teaching Hospital; Lagos State University Teaching Hospital, Ikeja, Lagos, Nigeria; NSIA-Kano Diagnostic Center, Kano Nigeria; Nnamdi Azikiwe University Teaching Hospital, Nnewi, Anambra State, Nigeria; Federal Medical Centre, Abeokuta, Ogun State, Nigeria; Federal Medical Centre, Umahia, Abia State, Nigeria; National Hospital Abuja, FCT, Nigeria; Benue State University Teaching Hospital, Markurdi, Benue State, Nigeria; Duke University Medical Center, Department of Radiology, USA; University of California San Francisco, CA, USA; Yale University, New Haven, CT, USA; Children's National Hospital, Washington DC, USA; George Washington University, Washington DC, USA; Sage Bionetworks, USA; Department of Neuroradiology, Technical University of Munich, Munich, Germany; Helmholtz Research Center, Munich, Germany; Duke University Medical Center, Department of Radiation Oncology, USA; Children’s Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA, USA; Center for AI and Data Science for Integrated Diagnostics (AI2D) & Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, USA; Athinoula A Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, MA, USA; University of Zurich, Switzerland; Cancer Imaging Program, National Cancer Institute, National Institutes of Health, Bethesda, MD 20814, USA; Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania, Philadelphia, USA; Department of Applied Mathematics and Computer Science, Technical University of Denmark, Denmark; Mayo Clinic, MN, USA; Precision FDA, U.S. Food and Drug Administration, Silver Spring, MD, USA; Booz Allen Hamilton, McLean, VA, USA; Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Center for Global Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Montreal Neurological Institute, McGill University, Montreal, Canada; Department of Medicine, University of Cape Town, South Africa; Department of Radiation Medicine, University of Cape Town, South Africa

AI summary Explores the feasibility of applying advanced machine learning methods to glioma segmentation in resource-limited Sub-Saharan Africa, with the aim of improving glioma diagnosis and treatment in the region.

Comments arXiv admin note: text overlap with arXiv:2107.02314

Abstract

Gliomas are the most common type of primary brain tumors. Although gliomas are relatively rare, they are among the deadliest types of cancer, with a survival rate of less than 2 years after diagnosis. Gliomas are challenging to diagnose, hard to treat and inherently resistant to conventional therapy. Years of extensive research to improve diagnosis and treatment of gliomas have decreased mortality rates across the Global North, while chances of survival among individuals in low- and middle-income countries (LMICs) remain unchanged and are significantly worse in Sub-Saharan Africa (SSA) populations. Long-term survival with glioma is associated with the identification of appropriate pathological features on brain MRI and confirmation by histopathology. Since 2012, the Brain Tumor Segmentation (BraTS) Challenge has evaluated state-of-the-art machine learning methods to detect, characterize, and classify gliomas. However, it is unclear if the state-of-the-art methods can be widely implemented in SSA given the extensive use of lower-quality MRI technology, which produces poor image contrast and resolution, and, more importantly, the propensity for late presentation of disease at advanced stages as well as the unique characteristics of gliomas in SSA (i.e., suspected higher rates of gliomatosis cerebri). Thus, the BraTS-Africa Challenge provides a unique opportunity to include brain MRI glioma cases from SSA in global efforts through the BraTS Challenge to develop and evaluate computer-aided-diagnostic (CAD) methods for the detection and characterization of glioma in resource-limited settings, where CAD tools are more likely to transform healthcare.

2212.04930 2026-06-10 eess.AS cs.HC cs.LG cs.SD

DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech

Kazuki Kawamura, Jun Rekimoto

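Affiliations The University of Tokyo, Tokyo, Japan; Sony CSL Kyoto, Kyoto, Japan

AI summary Proposes DDSupport, a system that computes learners' pronunciation scores and identifies mispronunciations from a small amount of unannotated speech data, and intuitively displays the differences and distances between the learner's speech and model speech, helping non-native speakers improve the intelligibility of their spoken English.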

Journal ref 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)

Abstract

When beginners learn to speak a non-native language, it is difficult for them to judge for themselves whether they are speaking well. Therefore, computer-assisted pronunciation training systems are used to detect learner mispronunciations. These systems typically compare the user's speech with that of a specific native speaker as a model in units of rhythm, phonemes, or words and calculate the differences. However, they require extensive speech data with detailed annotations or can only compare with one specific native speaker. To overcome these problems, we propose a new language learning support system that calculates speech scores and detects mispronunciations by beginners based on a small amount of unannotated speech data without comparison to a specific person. The proposed system uses deep learning-based speech processing to display the pronunciation score of the learner's speech and the difference/distance between the learner's and a group of models' pronunciation in an intuitively visual manner. Learners can gradually improve their pronunciation by eliminating differences and shortening the distance from the model until they become sufficiently proficient. Furthermore, since the pronunciation score and difference/distance are not calculated compared to specific sentences of a particular model, users are free to study the sentences they wish to study. We also built an application to help non-native speakers learn English and confirmed that it can improve users' speech intelligibility.

2209.14328 2026-06-10 quant-ph cond-mat.quant-gas cond-mat.str-el cs.LG

Scalably learning quantum many-body Hamiltonians from dynamical data

Frederik Wilde, Augustine Kshetrimayum, Ingo Roth, Dominik Hangleiter, Ryan Sweke, Jens Eisert

Affiliations Dahlem Center for Complex Quantum Systems; Quantum Research Centre, Technology Innovation Institute (TII); Joint Center for Quantum Information and Computer Science (QuICS), University of Maryland & NIST; Joint Quantum Institute (JQI), University of Maryland & NIST; Fraunhofer Heinrich Hertz Institute

AI summary Proposes a scalable, data-driven method that combines gradient-based optimization with tensor networks to learn families of interacting many-body Hamiltonians from dynamical data; for the one-dimensional Heisenberg model, the error is constant in the system size and decreases as the inverse square root of the data set size.

Comments 11 pages, 5 figures

Journal ref Quantum Sci. Technol. 11, 035002 (2026)

Abstract

The physics of a closed quantum mechanical system is governed by its Hamiltonian. However, in most practical situations, this Hamiltonian is not precisely known, and ultimately all that is available are data obtained from measurements on the system. In this work, we introduce a highly scalable, data-driven approach to learning families of interacting many-body Hamiltonians from dynamical data, by bringing together gradient-based optimization techniques from machine learning with efficient quantum state representations in terms of tensor networks. Our approach is highly practical, experimentally friendly, and intrinsically scalable, allowing for system sizes above 100 spins. In particular, we demonstrate on synthetic data that the algorithm works even if one is restricted to one simple initial state, a small number of single-qubit observables, and time evolution up to relatively short times. For the concrete example of the one-dimensional Heisenberg model, our algorithm exhibits an error that is constant in the system size and scales as the inverse square root of the size of the data set.
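
The scaling claim in the final sentence of the abstract can be written compactly as follows. The symbols are chosen here for illustration: ε is the reconstruction error, L the number of spins, N the data set size, and c a prefactor that, per the abstract, does not grow with L. The abstract does not specify the error metric or the value of c.

```latex
\[
\varepsilon(L, N) \approx \frac{c}{\sqrt{N}},
\qquad c \text{ independent of } L
\quad\Longrightarrow\quad
\frac{\varepsilon(L, 4N)}{\varepsilon(L, N)} \approx \frac{1}{2}
\]
```

Under this scaling, quadrupling the amount of data would roughly halve the error, regardless of system size.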

2606.11131 2026-06-10 cs.CV New submission

UniPET: a universal network for high-quality PET image denoising across varied dose reduction factors

Zhiwen Yang, Yang Zhou, Haowei Chen, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu

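AI summary To address the performance degradation of existing PET denoising methods when the dose reduction factor (DRF) varies, proposes UniPET, a network that uses a style alignment network and a region-aware learning strategy to achieve high-quality denoising across DRFs, reaching state-of-the-art performance.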

Abstract

Most existing deep learning-based PET image denoising methods assume a fixed and known dose reduction factor (DRF) for low-dose PET images. However, these methods encounter significant performance degradation when the DRF varies beyond the assumed one in practical applications. To address the challenge posed by varied DRFs, several preliminary studies focus on the task of universal PET image denoising, aiming to train a universal model over low-dose data across DRFs. Nonetheless, these vanilla universal models often struggle with misaligned styles present in different DRF data, leading to the "style elimination issue" with a significant over-smoothing effect. To deal with this issue, we innovatively introduce domain generalization to PET image denoising and propose a universal PET image denoising network (UniPET) to achieve high-quality PET image denoising across diverse DRFs. UniPET comprises two primary innovations: a style alignment network (SAN) and a region-aware learning strategy (RALS). Specifically, SAN utilizes style alignment techniques derived from domain generalization to align and recover styles across different DRFs, ensuring the model's generalizability across various DRFs while effectively preserving styles. Furthermore, to enhance style recovery, RALS distinguishes between flat and stylized regions, exclusively conducting adversarial learning on the latter, thereby more effectively guiding the model's focus towards learning stylized regions. It is demonstrated that our proposed UniPET can adaptively recover different DRF styles and achieve high-quality PET image denoising across DRFs. Comprehensive experiments show that UniPET exhibits comparable performance to individual DRF-specific models at specific DRFs and realizes state-of-the-art performance in universal PET image denoising quantitatively, perceptually, and clinically.

2606.10902 2026-06-10 cs.CV cs.AI 新提交

Pose-ICL: 3D-Aware In-Context Learning for Pose-Controllable Subject Customization

Pose-ICL:面向姿态可控主体定制的3D感知上下文学习

Xuan Han, Yihao Zhao, Mingyu You

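AI summary Proposes the Pose-ICL framework, which achieves tuning-free, pose-controllable subject customization through 3D-aware in-context learning and Surface-Anchored Position Embedding (SAPE), significantly improving pose accuracy and identity consistency.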

详情
AI中文摘要

主体定制是现代图像生成中的基础任务。通过提供少量参考图像和文本提示,用户可以生成特定对象在任意期望场景中的图像。然而,现有方法在实现定制主体的有效姿态控制方面仍存在困难。在实践中,它们常常表现出不准确的姿态或不一致的跨姿态外观。这些局限性表明,对于2D原生骨干网络而言,以体积方式理解对象仍然是一个重大挑战。为了应对这一挑战,我们提出了Pose-ICL,这是一个无需调优的框架,利用3D感知上下文学习(ICL)通过多个配对的图像-姿态参考直接适应新主体。其核心机制——表面锚定位置嵌入(SAPE)——通过将图像令牌锚定到体积边界框的表面坐标,赋予模型显式的3D感知能力。专门的优化确保了其与现有DiT模型的无缝兼容性。在3D资产和真实世界主体上的广泛评估表明,Pose-ICL在姿态准确性和身份一致性方面均显著优于当前方法。

英文摘要

Subject Customization is a foundational task in modern image generation. By providing a few reference images and a text prompt, users can generate images of a specific object in any desired scene. However, existing methods still struggle to achieve effective pose control for customized subjects. In practice, they often exhibit inaccurate poses or inconsistent cross-pose appearances. These limitations suggest that understanding objects in a volumetric manner remains a significant challenge for 2D-native backbones. To address this challenge, we propose Pose-ICL, a tuning-free framework that leverages 3D-aware In-Context Learning (ICL) to directly adapt to new subjects through multiple paired image-pose references. Its core mechanism, Surface-Anchored Position Embedding (SAPE), equips the model with explicit 3D awareness by anchoring image tokens to the surface coordinates of a volumetric bounding box. Dedicated refinements ensure its seamless compatibility with existing DiT models. Extensive evaluations on both 3D assets and real-world subjects demonstrate that Pose-ICL significantly outperforms current methods in both pose accuracy and identity consistency.

2606.10892 2026-06-10 cs.CV cs.AI 新提交

Improving Text-Instance Alignment Of Foreground Conditioned Out-Painting Via Customized Concept Embedding

通过定制化概念嵌入改进前景条件外绘中的文本-实例对齐

Yihao Zhao, Xuan Han, Bin He, Mingyu You

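AI summary Targets the artifacts produced by text-driven methods in foreground-conditioned outpainting by proposing the Customized Concept Embedding Diffusion framework, which customizes concept embeddings through an Instance-Aware Loss and a Semantic-Preserving Prompt Template, significantly reducing artifacts and improving image quality.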

详情
AI中文摘要

为了展示产品,商家通常需要花费大量成本制作高质量的展示图像。前景条件外绘(FCO)满足了这一需求,允许用户通过调整文本提示,以低成本为前景实例创建所需的背景。然而,现有的文本驱动FCO方法在其输出中存在关键缺陷,最明显的是伪影,即合成背景中与前景实例共享相同语义的区域。这种伪影降低了物体的显著性并降低了图像质量。我们将问题归因于给定实例与文本派生概念嵌入之间的不对齐。为了解决这个问题,我们提出了定制化概念嵌入扩散(CCE-Diffusion)框架。其核心是CCE模块,用于定制概念嵌入,弥合通用名词语义与特定视觉实例之间的差距。实例感知损失指导模块的优化,而语义保持提示模板防止定制化嵌入扭曲提示中的其他词。定性和定量评估均表明,CCE-Diffusion显著减少了输出中的伪影。作为即插即用组件,CCE模块可以集成到各种FCO方法中,提升其性能。

英文摘要

To showcase products, merchants often incur substantial costs creating high-quality display images. Foreground Conditioned Outpainting (FCO) meets this demand, allowing users to create desired backgrounds for foreground instances at a low cost by adjusting the text prompt. However, existing text-driven FCO methods exhibit critical flaws in their outputs, most notably the presence of artifacts, which refer to regions in the synthesized background that share the same semantics as the foreground instance. Such artifacts diminish the object's prominence and degrade image quality. We attribute the issue to the misalignment between the given instance and text-derived concept embeddings. To address this, we propose the Customized Concept Embedding Diffusion (CCE-Diffusion) framework. Its core is a CCE-Module to customize concept embeddings, bridging the gap between generic noun semantics and a specific visual instance. An Instance-Aware Loss guides the module's optimization, while a Semantic-Preserving Prompt Template prevents customized embeddings from distorting other words in the prompt. Both qualitative and quantitative evaluations demonstrate that CCE-Diffusion significantly reduces artifacts in the outputs. As a plug-and-play component, the CCE-Module can integrate with various FCO methods, enhancing their performance.

2606.10596 2026-06-10 cs.LG cs.AI cs.SY eess.SY 新提交

Embedding Hybrid Systems into Continuous Latent Vector Fields

将混合系统嵌入连续潜在向量场

Sangli Teng, Hang Liu, Koushil Sreenath

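AI summary Proves that an n-dimensional hybrid system can be embedded into a continuous vector field in m-dimensional Euclidean space when m>2n, and on that basis proposes a latent Neural ODE method that accurately recovers hybrid-system flows from time series data and outperforms the existing method.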

Comments Accepted to ICML 2026

详情
AI中文摘要

这项工作证明了当$m>2n$时,一个$n$维混合系统可以嵌入到一个$m$维欧氏空间中,并在其嵌入图像上配备一个连续向量场。这一结果表明,一个本质上不连续的混合系统通常允许一个连续的外在表示,该表示对于可微优化是适定的。基于这一存在性定理,我们表明,在潜在空间和状态空间中都具有一致性损失的潜在神经ODE可以准确恢复混合系统的流。大量实验表明,所提出的方法在仅从时间序列数据学习具有不同几何形状的混合系统方面优于现有方法。

英文摘要

This work proves that an $n$-dimensional hybrid system can be embedded into an $m$-dimensional Euclidean space equipped with a continuous vector field on its embedded image whenever $m>2n$. This result suggests that an intrinsically discontinuous hybrid system generically admits a continuous extrinsic representation that is well-posed for differentiable optimization. Building on this existence theorem, we show that a latent Neural ODE with consistency loss in both the latent and state space can accurately recover the flow of hybrid systems. Extensive experiments suggest the proposed method outperforms the existing method in learning hybrid systems with varying geometries from only time series data.

2606.10347 2026-06-10 cs.LG cs.LO 新提交

Beyond Explaining Predictions: Logic-Based Explanations for Confidence in Machine Learning Models

超越预测解释:基于逻辑的机器学习模型置信度解释

Vinícius Peixoto Chagas, Carlos Henrique Leitão Cavalcante, Thiago Alves Rocha

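AI summary Proposes confidence-aware abductive explanations, quantifies an explanation's confidence guarantee through the Minimum Confidence Threshold, and designs an algorithm that generates minimal explanations meeting a user-specified confidence threshold, strengthening the confidence guarantee with only a modest increase in explanation length.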

详情
AI中文摘要

机器学习越来越多地应用于关键领域,在这些领域中,预测及其相关的置信水平都会影响重要决策。为了增强此类场景的透明度,理解模型为何对其预测有信心或不确定非常重要。最近的基于逻辑的方法提供了反绎解释,即足以保持预测类别的最小特征子集,并具有正确性保证。然而,这些方法仅关注分类行为,可能产生覆盖低预测置信度实例的解释。在这项工作中,我们引入了最小置信度阈值(MCT)的概念,它量化了反绎解释提供的最弱置信度保证。基于这一概念,我们提出了置信度感知的反绎解释,它不仅保持预测类别,还保持用户指定的置信度保证。我们将MCT计算表述为一个优化问题,并引入了一种算法,用于生成满足所需置信度阈值的最小解释。我们在用于二分类的提升树上评估了所提出的框架,尽管该方法也适用于其他提供置信度分数的机器学习模型。实验结果表明,传统的反绎解释通常提供比被解释实例本身相关的置信度弱得多的置信度保证。相比之下,置信度感知的解释持续提高了解释所保证的最小置信度,同时仅需要适度增加解释长度。这些特性使得所提出的方法特别适用于预测正确性和置信度对于可信决策都至关重要的应用。

英文摘要

Machine learning is increasingly used in critical domains, where both predictions and their associated confidence levels influence important decisions. To enhance transparency in such scenarios, it is important to understand why a model is confident or uncertain about its predictions. Recent logic-based approaches provide abductive explanations, minimal subsets of features sufficient to preserve the predicted class, with correctness guarantees. However, these methods focus solely on classification behavior and may produce explanations that cover instances with low predictive confidence. In this work, we introduce the concept of Minimum Confidence Threshold (MCT), which quantifies the weakest confidence guarantee provided by an abductive explanation. Building upon this concept, we propose confidence-aware abductive explanations, which preserve not only the predicted class but also a user-specified confidence guarantee. We formulate MCT computation as an optimization problem and introduce an algorithm for generating minimal explanations that satisfy a desired confidence threshold. We evaluate the proposed framework on boosted trees for binary classification, although the approach is applicable to other machine learning models that provide confidence scores. Experimental results show that traditional abductive explanations often provide substantially weaker confidence guarantees than the confidence associated with the explained instance itself. In contrast, confidence-aware explanations consistently improve the minimum confidence guaranteed by an explanation while requiring only a modest increase in explanation length. These properties make the proposed approach particularly suitable for applications where both predictive correctness and confidence are essential for trustworthy decision making.
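
The Minimum Confidence Threshold idea can be illustrated on a toy model. The sketch below is not the authors' algorithm: it assumes three binary features, a hand-picked sigmoid scorer standing in for a boosted-tree confidence (`prob_positive` is hypothetical), and brute-force enumeration in place of the optimization formulation the abstract mentions. It computes the weakest confidence an explanation guarantees and then shrinks the explanation by deletion while a user-chosen threshold still holds.

```python
from itertools import product
import math

def prob_positive(x):
    """Hypothetical scorer: sigmoid of a hand-picked additive margin."""
    margin = 2.0 * x[0] + 1.5 * x[1] + 0.5 * x[2] - 1.0
    return 1.0 / (1.0 + math.exp(-margin))

def min_confidence(instance, fixed):
    """Lowest confidence in the instance's predicted class over every input
    that agrees with `instance` on the features in `fixed` (brute force)."""
    positive = prob_positive(instance) >= 0.5
    free = [i for i in range(len(instance)) if i not in fixed]
    worst = 1.0
    for values in product([0, 1], repeat=len(free)):
        x = list(instance)
        for i, v in zip(free, values):
            x[i] = v
        p = prob_positive(x)
        worst = min(worst, p if positive else 1.0 - p)
    return worst

def minimal_explanation(instance, threshold):
    """Deletion-based search: free a feature whenever the guarantee survives.
    A threshold of 0.5 recovers a plain class-preserving explanation."""
    fixed = set(range(len(instance)))
    for i in range(len(instance)):
        trial = fixed - {i}
        if min_confidence(instance, trial) >= threshold:
            fixed = trial
    return sorted(fixed)

plain = minimal_explanation((1, 1, 1), 0.5)      # class-preserving only
confident = minimal_explanation((1, 1, 1), 0.9)  # also guarantees confidence >= 0.9
```

In this toy setting the class-preserving explanation fixes a single feature but only guarantees a confidence of about 0.62, while the instance itself has a confidence of about 0.95; asking for 0.9 adds one more feature, which mirrors the "modest increase in explanation length" described above.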

2606.10244 2026-06-10 cs.RO cs.AI 新提交

YUBI: Yielding Universal Bidigital Interface for Bimanual Dexterous Manipulation at Scale

YUBI:面向大规模双手灵巧操作的通用双指接口

Takehiko Ohkawa, Jumpei Arima, Yuki Noguchi, Masatoshi Tateno, Makoto Sugiura, Takuya Okubo, Kengo Ikeuchi, Yuma Shin, Hiroki Nishizawa, Naoaki Kanazawa, Yuki Wakayama, Daiki Fukunaga, Koshi Makihara, Tomohiro Motoda, Floris Erich, Yukiyasu Domae, Tatsuya Matsushima, Yohishiro Okumatsu, Kei Ota

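AI summary Presents YUBI, a finger-aligned gripper whose yielding, finger-driven actuation mapping enables intuitive, ergonomic data collection for bimanual dexterous manipulation; builds a dataset of 8,434 hours, 1.20M episodes, and 119 tasks, and shows a single policy transferring across multiple robots.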

Comments Project page: https://yubi.airoa.io/

详情
AI中文摘要

我们引入了Yielding Universal Bidigital Interface (YUBI),一种手指对齐的夹爪,旨在实现双手灵巧操作的直观、符合人体工学且可扩展的数据采集。虽然手持数据采集系统(如Universal Manipulation Interface (UMI))实现了低成本数据采集,但其笨重的手枪式握把设计可能给精细灵巧操作任务带来人体工学和使用性挑战。为此,YUBI提出了一种独特的设计原则:屈服式手指驱动,将人类手指运动直接映射到夹爪钳口运动。使用YUBI设备,我们建立了一个集成基于VR的6自由度夹爪跟踪的数据采集系统,确保高保真轨迹数据获取。我们整理了一个前所未有的基于UMI的数据集:8434小时,涵盖120万集和119个任务。实验表明,YUBI在复杂双手任务的通用性、灵巧性和操作效率方面优于UMI夹爪。通过在多个平台上安装夹爪,在YUBI数据集上训练的单一策略可迁移到多个双手机器人(UR、Franka和ELEY),证实采集的数据可直接作为策略监督执行。我们发布了夹爪硬件、数据采集软件和数据集作为集成堆栈,为开放社区提供可复现的大规模数据采集路径,以推动机器人基础模型的发展。

英文摘要

We introduce Yielding Universal Bidigital Interface (YUBI), a finger-aligned gripper designed to enable intuitive, ergonomic, and scalable data collection for bimanual dexterous manipulation. While handheld data collection systems such as Universal Manipulation Interface (UMI) enable affordable data collection, their bulky pistol-grip designs can pose ergonomic and usability challenges for fine-grained, dexterous manipulation tasks. To address this, YUBI presents a distinct design principle: yielding, finger-driven actuation that directly maps human finger movements to gripper jaw motion. Using the YUBI devices, we set up a data collection system with integrated VR-based 6 DoF tracking of the gripper, ensuring high-fidelity trajectory data acquisition. We curate a UMI-based dataset of unprecedented scale: 8,434 hours across 1.20M episodes and 119 tasks. Experiments show that YUBI offers advantages over the UMI gripper in versatility for complex bimanual tasks, dexterity, and operational efficiency. A single policy trained on the YUBI dataset transfers across multiple bimanual robots (UR, Franka, and ELEY) simply by mounting the gripper on each platform, confirming that the collected data are directly executable as policy supervision. We release the gripper hardware, data-collection software, and dataset as one integrated stack, offering the open community a reproducible path to large-scale data acquisition for advancing robotic foundation models.

2606.10243 2026-06-10 cs.LG 新提交

DUET -- Dual User Embedding Transformers for Offsite Conversion Prediction

DUET -- 用于站外转化预测的双用户嵌入变换器

Reazul Hasan Russel, Mingwei Tang, Rostam Shirani, Xinlong Liu, Navid Madani, Leo Ding, Yawen He, Xiangyu Wang, Mustafa Acar, Ashish Katiyar, Yuhai Li, Alan Yang, Metarya Ruparel, Derek Qiang Xu, Rupert Wu, Rui Yang, Liang Tao, Xinyi Zhao, Larry Zhang, Sri Reddy, Rob Malkin

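AI summary Addresses abundant click signals versus sparse, delayed conversion signals by proposing the DUET framework, which pre-trains a dedicated transformer encoder for each of the click and conversion streams and produces complementary embeddings that improve offsite conversion rate prediction accuracy within serving-latency constraints.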

详情
AI中文摘要

站外转化率(OCVR)预测是计算推荐系统中一个重要的排序问题。该任务面临建模挑战:点击信号丰富且时间跨度短,而转化信号本质稀疏、延迟长且常无法归因。尽管存在这些统计差异,两种信号都必须为在严格服务延迟约束下运行的模型提供信息。先前的预训练方法使用单一、无差别的编码器统一应用于两个数据流。我们提出DUET(双用户嵌入变换器),该框架明确将用户行为数据划分为两个领域一致的流——点击和转化——并为每个流预训练专用变换器编码器,其架构针对各流的统计特征定制:密集点击流使用多层自注意力,稀疏转化流使用交错交叉和自注意力。生成的互补嵌入由下游排序器联合使用,而不超出服务延迟预算。评估显示,相对于最强基线,归一化熵(NE)降低高达0.38%,A/B测试显示OCVR预测精度持续提升。

英文摘要

Offsite conversion rate (OCVR) prediction is an important ranking problem in computational recommendation systems. This task presents a modeling challenge: click signals are abundant and exhibit short temporal horizons, whereas conversion signals are inherently sparse, long-delayed, and frequently unattributed. Despite these statistical disparities, both signal types must inform models that operate within strict serving-latency constraints. Prior pre-training approaches address this heterogeneity with a single, undifferentiated encoder applied uniformly across both data streams. We propose DUET (Dual User Embedding Transformers), a framework that explicitly partitions user behavioral data into two domain-coherent streams -- clicks and conversions -- and pre-trains dedicated transformer encoders with architectures tailored to each stream's statistical characteristics: multi-layer self-attention for the dense click stream and interleaved cross- and self-attention for the sparse conversion stream. The resulting complementary embeddings are jointly consumed by a downstream ranker without exceeding serving-latency budgets. Evaluation demonstrates up to 0.38% normalized entropy (NE) reduction relative to the strongest baseline, and an A/B test shows consistent improvements in OCVR prediction accuracy.
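
The abstract reports gains in normalized entropy (NE) without defining it. The sketch below assumes the definition commonly used in ads ranking: average log loss divided by the entropy of the empirical base conversion rate, so a model that always predicts the base rate scores 1.0 and lower is better. The function name and sample data are illustrative only.

```python
import math

def normalized_entropy(labels, preds, eps=1e-12):
    """Average log loss divided by the entropy of the empirical positive rate.
    Assumes labels contain both classes, so the base rate lies strictly in (0, 1)."""
    n = len(labels)
    log_loss = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
        for y, p in zip(labels, preds)
    ) / n
    base = sum(labels) / n
    base_entropy = -(base * math.log(base) + (1.0 - base) * math.log(1.0 - base))
    return log_loss / base_entropy

labels = [1, 0, 0, 0]
ne_baseline = normalized_entropy(labels, [0.25] * 4)          # predicts the base rate
ne_model = normalized_entropy(labels, [0.7, 0.1, 0.1, 0.1])   # better-calibrated scores
```

Under this definition, a 0.38% NE reduction means the ratio itself drops by 0.38% relative to the baseline's value.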

2606.10241 2026-06-10 cs.AI 新提交

Regimes: An Auditable, Held-Out-Gated Improvement Loop Demonstrated on LongMemEval with ActiveGraph

Regimes: 一种可审计的、保留集门控的改进循环——基于ActiveGraph在LongMemEval上的演示

Yohei Nakajima

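AI summary Proposes Regimes, an auditable autonomous improvement loop built on event sourcing, which uses the ActiveGraph runtime for failure recording, replay, and held-out validation and improves accuracy on LongMemEval-S by 0.05-0.10.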

Comments 30 pages, 5 figures. Code and committed runs: https://github.com/yoheinakajima/regimes

详情
AI中文摘要

自主改进循环难以信任,因为改进过程通常是附加在智能体上的外部脚手架:故障未被记录,诊断无法重放,提升或丢弃决策落入侧数据库而非智能体自身历史。我们证明,事件溯源智能体运行时消除了这种摩擦,将受控改进转化为一等工作流。当智能体状态是仅追加事件日志的确定性投影时,故障被记录,运行从日志精确重放,候选补丁限定于类型化管道接缝,门控可审计,每次提升或丢弃本身也是一个事件。我们通过Regimes演示了这一点,这是ActiveGraph运行时上的一个循环,诊断失败的评估,在管道点提出修复,并仅在静态检查、沙盒执行、样本内评估和保留验证后提升。该循环与目标无关:相同的控制流通过通用接口针对不同任务运行。在LongMemEval-S上,主要失败不是检索而是调和:证据已在汇编上下文中,但阅读器回答错误。在五个种子保留集划分中,Regimes发现阅读器提示修复,在四个划分中将最终保留准确率提升+0.05至+0.10,在一个过度提升划分中提升+0.01;两个划分单独显著(种子5未针对其顺序提升结构调整),汇总计数仅为描述性,因为划分共享一个500问题池。持久贡献包括:ActiveGraph作为使受控改进循环可行的可审计基础,其支持的保留集门控循环,将每个故障路由到管道位置的失败机制分类(其相对于无路由基线的边际价值是主要开放问题),以及提示即发现探针的假设。

英文摘要

Autonomous improvement loops are hard to trust because the improvement process is usually external scaffolding bolted onto the agent: failures go unlogged, diagnoses cannot be replayed, and promote-or-discard decisions land in a side database rather than the agent's own history. We show that an event-sourced agent runtime removes that friction and turns controlled improvement into a first-class workflow. When the agent's state is a deterministic projection of an append-only event log, failures are recorded, a run replays exactly from its log, candidate patches scope to typed pipeline seams, gates are auditable, and every promotion or discard is itself an event. We demonstrate this with Regimes, a loop on the ActiveGraph runtime that diagnoses failed evaluations, proposes a repair at a pipeline point, and promotes it only after static checks, sandbox execution, in-sample evaluation, and held-out validation. The loop is target-agnostic: the same control flow runs against different tasks through a common interface. On LongMemEval-S the dominant failure is not retrieval but reconciliation: the evidence is already in the assembled context, yet the reader answers incorrectly. Across five seeded held-out splits, Regimes discovers reader-prompt repairs that improve final held-out accuracy by +0.05 to +0.10 in four splits and +0.01 in one over-promotion split; two splits are individually significant (seed 5 unadjusted for its sequential promotion structure), and the pooled count is descriptive only, since the splits share one 500-question pool. The durable contributions are ActiveGraph as an auditable substrate that makes controlled improvement loops tractable, the held-out-gated loop it supports, the failure-regime taxonomy routing each failure to a pipeline location (whose marginal value over an unrouted baseline is the primary open question), and the prompt-as-discovery-probe hypothesis.
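
The claim that agent state is "a deterministic projection of an append-only event log" can be made concrete with a small sketch. Everything here is hypothetical: the `Event` type, the event kinds, and the `gate` function are illustrative names, not the ActiveGraph or Regimes API, and the gate checks only two numbers rather than the full sequence of static checks, sandbox execution, in-sample evaluation, and held-out validation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str      # hypothetical kinds: "failure", "promote", "discard"
    payload: dict

def project(log):
    """Deterministically fold the event log into the agent's current state."""
    state = {"failures": [], "promoted": [], "discarded": []}
    for event in log:
        if event.kind == "failure":
            state["failures"].append(event.payload["case"])
        elif event.kind == "promote":
            state["promoted"].append(event.payload["patch"])
        elif event.kind == "discard":
            state["discarded"].append(event.payload["patch"])
    return state

def gate(log, patch, in_sample_gain, held_out_gain):
    """Return a new log with the decision appended, so the promotion or
    discard is itself part of the recorded history."""
    kind = "promote" if in_sample_gain > 0 and held_out_gain > 0 else "discard"
    return log + [Event(kind, {"patch": patch, "held_out_gain": held_out_gain})]

log = [Event("failure", {"case": "q17"})]
log = gate(log, "reader-prompt-v2", in_sample_gain=0.04, held_out_gain=0.05)
log = gate(log, "reader-prompt-v3", in_sample_gain=0.03, held_out_gain=-0.01)
state = project(log)
```

Because `project` depends only on the log, replaying the same log always reproduces the same state, which is what makes each gate decision auditable after the fact.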

2606.10229 2026-06-10 cs.RO cs.LG 新提交

What Demonstration Curation Metrics Do to Your Policy

演示筛选指标对策略的影响

Aarav Bedi

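AI summary Studies whether demonstration-curation metrics that detect defective demonstrations also improve behavior-cloning policy performance. Finds that a metric's defect-detection ability is sharply decoupled from policy performance and exposes episode length as a confounding variable.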

Comments 6 pages, 1 figure, 2 tables

详情
AI中文摘要

我们研究了检测缺陷训练演示的筛选指标是否也能改善基于筛选数据训练的行为克隆策略。在一个接触密集的LIBERO抓取放置基准任务中,通过引入受控结构缺陷(搬运阶段早期释放夹爪),我们发现这两个量是严重解耦的。具有最高缺陷检测AUROC(0.804)的指标产生了最差的筛选策略(任务成功率13.3%),而AUROC显著较低(0.638)的指标产生的策略几乎与在真实干净数据上训练的Oracle策略相匹配(90.0% vs. 93.3%)。我们进一步表明,我们评估的七个指标中有五个利用演示时长作为缺陷标签的琐碎代理,这种混淆因素将报告的AUROC膨胀到接近完美的值,并且在控制演示时长后消失。在所有条件下,受污染的基线仅在3.3%的测试中成功,而两种最佳的筛选方法将差距缩小到Oracle上限93.3%的3个百分点以内。我们的结果认为,筛选方法应根据其产生的策略来评估,而不是根据其标记的缺陷,并且任何筛选基准在报告检测准确性之前必须控制演示时长。我们发布了测试平台、所有指标实现和评估流程。

英文摘要

We study whether demonstration-curation metrics that detect defective training episodes also improve the downstream behavior-cloning policy that trains on the curated data. On a contact-rich LIBERO pick-and-place benchmark with a controlled structural defect (early gripper release during the carry phase), we find that the two quantities are sharply decoupled. The metric with the highest defect-detection AUROC (0.804) produces the worst curated policy (13.3% task success), while a metric with a substantially lower AUROC (0.638) produces a policy that nearly matches the oracle trained on ground-truth clean data (90.0% vs. 93.3%). We further show that five of the seven metrics we evaluate exploit episode length as a trivial proxy for the defect label, a confound that inflates reported AUROCs to near-perfect values and disappears once episode length is controlled. Across all conditions, the contaminated baseline succeeds on only 3.3% of rollouts, and the two best curation methods close this to within 3 percentage points of the 93.3% oracle ceiling. Our results argue that curation methods should be evaluated by the policy they produce, not the defects they flag, and that any curation benchmark must control for episode length before reporting detection accuracy. We release the testbed, all metric implementations, and the evaluation pipeline.
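
The episode-length confound can be reproduced in miniature. The sketch below uses made-up episodes in which defective demonstrations happen to be shorter, and a "metric" that scores nothing but length. Stratifying by exact episode length is one simple way to control for it and is an assumption here; the abstract does not say how the authors control for length.

```python
def auroc(scores, labels):
    """Probability that a random defective episode outscores a random clean one
    (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def length_controlled_auroc(scores, labels, lengths):
    """Mean AUROC within groups of equal episode length; groups that lack
    either class are skipped."""
    groups = {}
    for s, y, length in zip(scores, labels, lengths):
        groups.setdefault(length, []).append((s, y))
    values = []
    for items in groups.values():
        ys = [y for _, y in items]
        if 0 < sum(ys) < len(ys):
            values.append(auroc([s for s, _ in items], ys))
    return sum(values) / len(values)

# Hypothetical data: label 1 marks a defective episode.
lengths = [80, 80, 100, 100, 100, 80]
labels = [1, 1, 0, 0, 1, 0]
scores = [-length for length in lengths]  # a "metric" that only sees length

raw = auroc(scores, labels)
controlled = length_controlled_auroc(scores, labels, lengths)
```

The length-only score looks informative on the raw data and falls to chance once episodes are compared within the same length, which is the pattern the abstract describes for five of the seven metrics.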

2606.10228 2026-06-10 cs.LG cs.AI cs.RO 新提交

SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration

SHAPO: 面向安全探索的锐度感知策略优化

Kaustubh Mani, Yann Pequignot, Vincent Mai, Liam Paull

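AI summary Proposes the SHAPO algorithm, whose sharpness-aware policy updates implicitly reweight gradients, amplifying the influence of rare unsafe actions and damping the contribution of safe ones, which yields conservative behavior in under-explored regions and improves both safety and task performance.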

Comments ICLR 2026

详情
AI中文摘要

安全探索是在安全关键领域部署强化学习(RL)智能体的先决条件。在本文中,我们通过认知不确定性的视角来探讨安全探索,其中智能体对参数扰动的敏感性作为高不确定性区域的实际代理。我们提出了锐度感知策略优化(SHAPO),一种锐度感知的策略更新规则,该规则在扰动参数处评估梯度,使得策略更新相对于智能体的认知不确定性变得悲观。分析表明,这种调整隐式地重新加权了策略梯度,放大了罕见不安全动作的影响,同时抑制了已安全动作的贡献,从而在欠探索区域将学习偏向于保守行为。在多个连续控制任务中,我们的方法在安全性和任务性能上均持续优于现有基线,显著扩展了它们的帕累托前沿。

英文摘要

Safe exploration is a prerequisite for deploying reinforcement learning (RL) agents in safety-critical domains. In this paper, we approach safe exploration through the lens of epistemic uncertainty, where the actor's sensitivity to parameter perturbations serves as a practical proxy for regions of high uncertainty. We propose Sharpness-Aware Policy Optimization (SHAPO), a sharpness-aware policy update rule that evaluates gradients at perturbed parameters, making policy updates pessimistic with respect to the actor's epistemic uncertainty. We show analytically that this adjustment implicitly reweights policy gradients, amplifying the influence of rare unsafe actions while tempering contributions from already safe ones, thereby biasing learning toward conservative behavior in under-explored regions. Across several continuous-control tasks, our method consistently improves both safety and task performance over existing baselines, significantly expanding their Pareto frontiers.
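
Evaluating gradients at perturbed parameters is the core move of sharpness-aware minimization (SAM). The sketch below shows that generic step on a toy quadratic loss; it is not SHAPO's actual update rule, and the loss, step size `lr`, and perturbation radius `rho` are placeholder assumptions standing in for a policy objective.

```python
import math

def sharpness_aware_step(theta, grad_fn, lr=0.02, rho=0.05):
    """One SAM-style update: descend using the gradient taken at a
    worst-case-direction perturbation of the current parameters."""
    g = grad_fn(theta)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # guard against a zero gradient
    # Step a distance rho along the normalized gradient (loss-ascent direction).
    perturbed = [t + rho * gi / norm for t, gi in zip(theta, g)]
    g_perturbed = grad_fn(perturbed)
    # Apply the perturbed-point gradient to the ORIGINAL parameters.
    return [t - lr * gi for t, gi in zip(theta, g_perturbed)]

# Toy quadratic loss standing in for a negative policy objective (hypothetical).
def loss(theta):
    return theta[0] ** 2 + 10.0 * theta[1] ** 2

def grad(theta):
    return [2.0 * theta[0], 20.0 * theta[1]]

theta = [1.0, 1.0]
for _ in range(50):
    theta = sharpness_aware_step(theta, grad)
```

The update is pessimistic in the sense that it reacts to the loss landscape slightly uphill of the current parameters rather than at the parameters themselves.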

2606.10227 2026-06-10 cs.LG 新提交

Spatiotemporal Graph Transformer for 3D Neighborhood Interaction and Quality Prediction in Metal Additive Manufacturing

时空图Transformer用于金属增材制造中的3D邻域交互与质量预测

Joyce Karen Pelaez, Siqi Zhang, Hoo Sang Ko

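AI summary Proposes a spatiotemporal graph transformer that models 3D neighborhood interactions through a weighted network representation and a dual-attention mechanism, significantly improving quality prediction in metal additive manufacturing.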

Comments Submitted to Journal of Intelligent Manufacturing, 23 pages, 10 figures, 2 tables

详情
AI中文摘要

金属增材制造能够制造复杂零件,但由于重复的逐层熔化、凝固和再加热在3D构建中引起的交互作用,实现一致的构建质量仍然具有挑战性。先进传感技术为收集实际制造过程的丰富观测数据以实现实时质量监控和控制提供了巨大机会。然而,现有方法通常难以表示多层交互并量化其对质量的贡献。在本文中,我们开发了一种新颖的时空图Transformer,用于建模3D邻域交互并学习其对金属增材制造构建质量的影响。具体来说,我们首先引入制造过程的加权网络表示,其中熔合位置被建模为节点,其空间和过程依赖关系被编码为边权重。这种表示还允许将多模态数据(例如几何设计、工艺设置和原位传感数据)集成到统一结构中,用于下游学习任务。在此网络基础上,我们进一步设计了一种双注意力图Transformer,它同时捕获节点内特征依赖和跨节点邻域交互,用于质量表示学习。实验结果表明,所提出的框架在表征过程-质量关系方面显著优于基于图像、序列和图的模型。更重要的是,跨层交互的纳入对于提高质量预测性能至关重要。该框架广泛适用于涉及网络建模和基于图的表示学习的其他任务。

英文摘要

Metal additive manufacturing enables the fabrication of complex parts, but achieving consistent build quality remains challenging due to interactions induced by repeated layer-wise melting, solidification, and reheating across the 3D build. Advanced sensing provides a great opportunity to collect rich observations of the actual manufacturing process for real-time quality monitoring and control. Yet, existing methods often have limited ability to represent multi-layer interactions and quantify their contributions to quality. In this paper, we develop a novel spatiotemporal graph transformer for modeling 3D neighborhood interactions and learning their effects on build quality in metal additive manufacturing. Specifically, we first introduce a weighted network representation of the manufacturing process, where fusing locations are modeled as nodes, and their spatial- and process-dependent relationships are encoded as edge weights. This representation also enables the integration of multimodal data (e.g., geometric design, process settings, and in-situ sensing data) into a unified structure for downstream learning tasks. Building on this network, we further design a dual-attention graph transformer that captures both within-node feature dependencies and cross-node neighborhood interactions for quality representation learning. Experimental results show that the proposed framework significantly outperforms image-based, sequence-based, and graph-based models in characterizing process-quality relationships. More importantly, the incorporation of cross-layer interactions is critical for improving quality prediction performance. This framework is broadly applicable to other tasks involving network modeling and graph-based representation learning.

2606.10223 2026-06-10 cs.SD cs.AI cs.CV 新提交

Dual-Branch Gated Fusion for Open-Set Audio Deepfake Source Tracing

双分支门控融合用于开放集音频深度伪造源追踪

Awais Khan, Kutub Uddin, Khalid Malik

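AI summary Addresses open-set audio deepfake source tracing with a dual-branch gated fusion framework that combines XLSR-53 with the CORES descriptor and weights them adaptively through an input-conditioned gate, achieving high in-domain accuracy and robust out-of-domain generalization.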

详情
AI中文摘要

将合成语音归因于其原始系统仍然是一个开放挑战:闭集模型无法拒绝未见过的合成器并产生过度自信的预测。为了解决这个问题,我们提出了一个双分支门控融合框架,将XLSR-53与CORES配对,CORES是一个66维描述符,与之前仅使用线性滤波器组(LFB)的工作不同,它跨越倒谱、振荡、节奏、能量和频谱维度,以捕获互补的合成伪影。我们的分析表明,XLSR-53在域内(ID)保持判别性,而CORES在分布偏移(OOD)下稳定泛化,但由于SSL表示不平衡,它们的简单拼接失败。为了解决这个问题,一个输入条件门控在联合训练下自适应地加权每个分支,使用交叉熵、用于ID/OOD分离的能量边际损失和门控多样性项。在MLAAD基准上,我们的系统实现了97.6%的ID准确率、4.9%的EERc,并且相对于Interspeech 2025基线,FPR95相对降低了83.5%。

英文摘要

Attributing a synthetic utterance to its originating system remains an open challenge: closed-set models fail to reject unseen synthesizers and produce overconfident predictions. To address this, we propose a dual-branch gated fusion framework that pairs XLSR-53 with CORES, a 66-dimensional descriptor that, unlike prior Linear Filter Bank (LFB)-only work, spans cepstral, oscillatory, rhythmic, energy, and spectral dimensions to capture complementary synthesis artifacts. Our analysis shows XLSR-53 remains discriminative in-domain (ID) while CORES generalizes stably under distribution shift (OOD), yet their naive concatenation fails due to SSL representational imbalance. To resolve this, an input-conditioned gate adaptively weights each branch under joint training with cross-entropy, an energy margin loss for ID/OOD separation, and a gate diversity term. On the MLAAD benchmark, our system achieves 97.6% ID accuracy, 4.9% EERc, and an 83.5% relative FPR95 reduction over the Interspeech 2025 baseline.
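
A minimal sketch of input-conditioned gating and an energy-based ID/OOD score follows. It assumes both branch outputs have already been projected to the same dimension, uses a single scalar gate, and borrows the squared-hinge form of the energy margin loss from the energy-based OOD detection literature; the gate shape, margins, and all names are assumptions, since the abstract does not give the exact formulation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_fusion(h_ssl, h_cores, gate_w, gate_b):
    """Blend two equal-length branch vectors with a gate computed from both."""
    z = sum(w * v for w, v in zip(gate_w, h_ssl + h_cores)) + gate_b
    g = sigmoid(z)  # g near 1 trusts the SSL branch, near 0 the descriptor branch
    fused = [g * a + (1.0 - g) * b for a, b in zip(h_ssl, h_cores)]
    return fused, g

def energy(logits):
    """Negative log-sum-exp of the logits; lower energy suggests in-domain input."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(l - m) for l in logits)))

def energy_margin_loss(e_id, e_ood, m_id, m_ood):
    """Squared hinge: push ID energy below m_id and OOD energy above m_ood."""
    return max(0.0, e_id - m_id) ** 2 + max(0.0, m_ood - e_ood) ** 2

fused, g = gated_fusion([1.0, 3.0], [3.0, 1.0], gate_w=[0.0] * 4, gate_b=0.0)
```

With zero gate weights the gate sits at 0.5 and the fusion is a plain average; training would move the gate per input, which is what lets one branch dominate in-domain and the other under distribution shift.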

2606.10219 2026-06-10 cs.LG cs.AI 新提交

Fast Exact Nearest-Neighbor Learning for High-Frequency Financial Time Series

高频金融时间序列的快速精确最近邻学习

Henry Han, Diane Li

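AI summary Addresses the real-time challenge created by growing high-frequency financial data with a Mojo-based SIMD k-d tree that uses variance-based splitting, contiguous storage, and compile-time vectorized distance computation, achieving 17.5-43.5x speedups with exact outputs and enabling an option-pricing model to train on 10x more data.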

Comments 15 pages 5 figures;

详情
AI中文摘要

随着股票、ETF、外汇、期权和高频交易数据量的激增,AI在金融领域的大规模效率变得至关重要。这种增长给成熟的金融AI系统带来了核心挑战:模型必须从更大的历史语料库中学习,同时满足交易、风险管理和衍生品定价中的实时延迟约束。我们以高频金融时间序列的精确最近邻学习为具体案例,展示基于Mojo的金融AI可以应对这一挑战。我们引入了一种Mojo SIMD k-d树,采用基于方差的划分、连续的扁平缓冲区存储和编译时向量化距离计算。我们还提供了运行时结果,表明在标准剪枝和实现成本假设下,对于固定股票、大样本量、中等维度的情况,Mojo SIMD k-d树渐近地优于Mojo SIMD暴力搜索和scikit-learn的k-d树。在x86和ARM64架构的八个金融数据集上(训练样本最多277K),该方法在x86上比scikit-learn的k-d树加速17.5-21.6倍,在ARM64股票/ETF数据集上比scikit-learn暴力搜索加速28.1-43.5倍,同时保持精确输出。除了最近邻推理,Mojo的编译执行使得基于Extra Trees的隐含波动率定价模型能够训练10倍以上的期权数据,将看跌期权IV RMSE降低8.0%。这些结果将Mojo定位为金融AI的可扩展、生产就绪栈,并为其他数据密集型领域的高效AI提供了有前景的基础。

英文摘要

AI efficiency at scale is becoming critical in finance as market data volumes surge across equities, ETFs, FX, options, and high-frequency trading streams. This growth creates a core challenge for mature financial AI systems: models must learn from larger historical corpora while still meeting real-time latency constraints in trading, risk management, and derivative pricing. We use exact nearest-neighbor learning for high-frequency financial time series as a concrete case study to show that Mojo-based financial AI can address this challenge. We introduce a Mojo SIMD k-d tree with variance-based splitting, contiguous flat-buffer storage, and compile-time vectorized distance computation. We also provide a runtime result showing that, under standard pruning and implementation-cost assumptions, the Mojo SIMD k-d tree asymptotically dominates Mojo SIMD brute force and scikit-learn's k-d tree in the fixed-stock, large-$n$, moderate-dimensional regime. Empirically, across eight financial datasets on x86 and ARM64 with up to 277K training samples, the method achieves 17.5--21.6$\times$ speedup over scikit-learn's k-d tree on x86 and 28.1--43.5$\times$ over scikit-learn brute force on ARM64 equity/ETF datasets, while preserving exact outputs. Beyond nearest-neighbor inference, Mojo's compiled execution enables an Extra Trees-based implied-volatility pricing model to train on $10\times$ more options data, reducing put-IV RMSE by 8.0%. These results position Mojo as a scalable, production-ready stack for financial AI and a promising foundation for efficient AI in other data-intensive fields. Keywords: Financial AI, AI Efficiency, Mojo, SIMD, K-D Trees, KNN, High-Frequency Trading, Financial Time Series, Scaling.
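
The two algorithmic ingredients named above, variance-based splitting and exact search with pruning, can be sketched in plain Python. This is not the paper's Mojo implementation: there is no SIMD, no flat-buffer layout, and the tuple-based node structure is an illustrative assumption.

```python
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def build(points):
    """Split on the highest-variance axis at the median; a node is the tuple
    (point, axis, left_subtree, right_subtree)."""
    if not points:
        return None
    axis = max(range(len(points[0])),
               key=lambda d: variance([p[d] for p in points]))
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis, build(pts[:mid]), build(pts[mid + 1:]))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, best=None):
    """Exact nearest neighbor: a subtree is skipped only when the splitting
    plane is at least as far away as the best point found so far."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or sq_dist(query, point) < sq_dist(query, best):
        best = point
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    if diff * diff < sq_dist(query, best):
        best = nearest(far, query, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
hit = nearest(tree, (9, 2))
```

Because pruning never discards a subtree that could hold a strictly closer point, the returned distance matches a brute-force scan, which is the sense in which the search is exact.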

2606.10217 2026-06-10 cs.LG cs.CR 新提交

Alignment Defends LLMs from Property Inference Attacks

对齐防御LLM免受属性推断攻击

Pengrun Huang, Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri

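AI summary Proposes alignment-based defenses that adjust the model's output distribution through post-training, mitigating property inference attacks without modifying the training data while balancing utility and confidentiality.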

详情
AI中文摘要

大型语言模型(LLMs)越来越多地在包含敏感数据集级属性的领域特定数据集上进行微调。最近的研究表明,此类数据集级信息可以通过属性推断攻击有效提取,构成机密性风险。现有的防御措施主要通过修改训练数据分布来运作,因此需要访问原始数据并重新训练模型,限制了其在数据不可用或模型已部署场景中的适用性。在这项工作中,我们提出了基于对齐的防御方法,用于缓解LLMs中的属性推断攻击。我们的方法通过后训练对齐将模型的输出分布重塑为目标属性比率,而无需修改训练数据。具体而言,我们通过构建偏好对和定义特定奖励函数,分别适配两种广泛使用的RLHF框架——直接偏好优化(DPO)和组相对策略优化(GRPO)——作为我们的防御方法。通过全面的实验,我们展示了基于对齐的防御方法有效缓解了属性推断攻击,同时保持了良好的效用-机密性权衡。

英文摘要

Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets that may contain sensitive, dataset-level properties. Recent work has shown that such dataset-level information can be effectively extracted through property inference attacks, posing a confidentiality risk. Existing defenses against these attacks primarily operate by modifying the training data distribution and hence require access to the original data and retraining the model, limiting their applicability to settings where data is unavailable or models are already deployed. In this work, we propose alignment-based defenses for mitigating property inference attacks in LLMs. Our approach reshapes the model's output distribution towards a target property ratio via post-training alignment, without modifying the training data. In particular, we adapt two widely used RLHF frameworks--Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO)--as our defenses by constructing preference pairs and defining a specific reward function, respectively. Through comprehensive experiments, we show that our alignment-based defenses effectively mitigate property inference attacks while maintaining a strong utility-confidentiality tradeoff.
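
For reference, the standard per-pair DPO objective that such a defense would build on is sketched below. The abstract does not describe how preference pairs are constructed to reach a target property ratio, so this shows only the generic loss; the inputs are assumed to be summed log-probabilities of each response under the trained policy and a frozen reference model, and `beta` is a placeholder value.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """-log(sigmoid(beta * margin)), where the margin compares how much the
    policy has moved toward the chosen response versus the rejected one,
    each measured relative to the reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

unchanged = dpo_loss(-1.0, -2.0, -1.0, -2.0)  # policy equals reference
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)   # policy favors the chosen response
```

When the policy matches the reference the loss is log 2, and it falls as the policy shifts probability toward the chosen response; in the defense described above, "chosen" would presumably be the response that moves the output distribution toward the target ratio.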

2606.10216 2026-06-10 cs.LG cs.AI 新提交

A Source Domain is All You Need: Source-Only Cross-OS Transfer Learning for APT Anomaly Detection via Semantic Alignment and Optimal Transport

一个源域足矣:基于语义对齐和最优传输的仅源域跨操作系统APT异常检测迁移学习

Sidahmed Benabderrahmanea, Petko Valtchev, James Cheney, Talal Rahwan

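AI summary Addresses the lack of target-domain labels in cross-OS APT detection with a source-only, Optimal Transport-based anomaly scoring framework that ranks anomalies under zero target supervision through semantic abstraction and three deviation channels.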

详情
AI中文摘要

高级持续性威胁(APT)是隐蔽的多阶段网络攻击,由于标记痕迹稀缺、严重的类别不平衡以及生成真实恶意行为的挑战,其检测十分困难。这些挑战在跨操作系统(cross-OS)设置中被放大,此时在一个源平台上训练的检测器必须部署在无标签的目标平台上,且无法访问目标域标签。我们利用系统级溯源轨迹研究这种仅源域的跨操作系统APT检测问题,并提出一个基于传输的框架,在零目标监督下对异常目标进程进行排序。该框架将进程行为抽象为结构化的自然语言描述,使用预训练语言模型进行嵌入,并构建源域正常参考用于目标评分。它结合了三种证据通道:与源域正常原型的语义偏差、由图自编码器捕获的结构偏差、以及通过最优传输(OT)度量的几何偏差。主要贡献是一个基于OT的重心异常分数,该分数将目标嵌入投影到源域正常流形上,并量化残差传输不匹配。我们进一步引入熵加权、角度感知和密度感知的OT变体,以捕获不确定性、方向漂移和稀疏支持行为。在DARPA透明计算数据(涵盖Linux、Windows、BSD和Android)上的评估,涉及两个APT场景和十二个跨操作系统传输对,表明所提框架在ROC-AUC和nDCG上优于仅源域异常检测基线。结果表明,仅源域溯源建模结合语义抽象和基于OT的异常评分,可以在没有目标域监督的情况下支持实际的跨平台APT检测。

英文摘要

Advanced Persistent Threats (APTs) are stealthy, multi-stage cyberattacks whose detection is difficult due to scarce labeled traces, severe class imbalance, and the challenge of generating realistic malicious behavior. These challenges are amplified in cross-operating-system (cross-OS) settings, where a detector trained on one source platform must be deployed on an unlabeled target platform without access to target-domain labels. We study this source-only cross-OS APT detection problem using system-level provenance traces and propose a transport-based framework for ranking anomalous target processes under zero target supervision. The framework abstracts process behavior into structured natural-language descriptions, embeds them using pretrained language models, and constructs a source-normal reference for target scoring. It combines three evidence channels: semantic deviation from source-normal prototypes, structural deviation captured by graph autoencoding, and geometric deviation measured through Optimal Transport (OT). The main contribution is an OT-based barycentric anomaly score that projects target embeddings onto the source-normal manifold and quantifies residual transport mismatch. We further introduce entropy-weighted, angle-aware, and density-aware OT variants to capture uncertainty, directional drift, and sparse-support behavior. Evaluation on DARPA Transparent Computing data spanning Linux, Windows, BSD, and Android, across two APT scenarios and twelve cross-OS transfer pairs, shows that the proposed framework improves ROC-AUC and nDCG over source-only anomaly-detection baselines. The results demonstrate that source-only provenance modeling, combined with semantic abstraction and OT-based anomaly scoring, can support practical cross-platform APT detection without target-domain supervision.
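
One plausible reading of the OT-based barycentric anomaly score is sketched below: compute an entropic OT plan between target and source-normal embeddings with Sinkhorn iterations, project each target point to the plan-weighted average of source points, and score it by the distance to that projection. Uniform marginals, squared Euclidean cost, the regularization value, and the tiny 2-D "embeddings" are all assumptions; the paper's exact score and its entropy-, angle-, and density-aware variants may differ.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sinkhorn(cost, reg=0.5, iters=200):
    """Entropic OT plan between uniform marginals via Sinkhorn scaling."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def barycentric_scores(targets, sources, reg=0.5):
    """Distance from each target to its barycentric projection onto the sources."""
    cost = [[sq_dist(t, s) for s in sources] for t in targets]
    plan = sinkhorn(cost, reg)
    scores = []
    for t, row in zip(targets, plan):
        mass = sum(row)
        proj = [sum(p * s[d] for p, s in zip(row, sources)) / mass
                for d in range(len(t))]
        scores.append(math.sqrt(sq_dist(t, proj)))
    return scores

source_normal = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
target = [(0.05, 0.05), (3.0, 3.0)]  # the second point is far from normal behavior
scores = barycentric_scores(target, source_normal)
```

Since the projection always lands inside the convex hull of the source-normal points, a target that sits far from that region keeps a large residual, which is what allows ranking without any target labels.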

2606.10213 2026-06-10 cs.SD cs.AI 新提交

Automated Pronunciation Evaluation for Korean Toddler Speech using Speech Diarization and Self-Supervised Learning

基于说话人日志和自监督学习的韩语幼儿语音自动发音评估

Diane Myung-kyung Woodbridge, Jee Hyun Suh

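AI summary Proposes an end-to-end pronunciation evaluation pipeline for Korean toddler speech that combines neural speaker diarization with self-supervised learning, introduces a corpus of recordings from 53 children aged 2-5, and reaches a mean balanced accuracy of 0.782 across consonant and vowel classification with a multi-model ensemble.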

Comments This paper will be presented at IEEE ICTs4ehealth in June, 2026

详情
AI中文摘要

言语障碍约占韩国儿科沟通障碍病例的44%,然而针对韩语幼儿语音的自动评估工具仍不成熟。本文提出一种端到端的韩语幼儿语音自动发音评估流水线,结合神经说话人日志与自监督语音表示学习。我们引入了一个经IRB批准的新语料库,包含53名2-5岁韩语儿童的录音。其中53名受试者的子集由三位独立评审员标注,得到1,190个辅音和748个元音的词汇级二元正确性标签。我们评估了三种说话人日志模型,发现NeMo SortFormer凭借其到达时间排序的Transformer架构,实现了88.69%的说话人计数准确率和33.04%的日志错误率(DER),该架构处理了表现出aegyo的年轻女性看护者与幼儿语音之间的声学混淆。对于发音评分,我们比较了三种自监督学习(SSL)骨干网络在多种池化策略下的表现。跨模型集成将辅音预测路由到HuBERT-large,元音预测路由到WavLM-large,实现了0.720和0.845的平衡准确率,平均值为0.782。

英文摘要

Speech sound disorders affect approximately 44% of Korean pediatric communication disorder cases, yet automated assessment tools for Korean toddler speech remain underdeveloped. This paper presents an end-to-end pipeline for automated pronunciation evaluation of Korean toddler speech, combining neural speaker diarization with self-supervised speech representation learning. We introduce a novel IRB-approved corpus of 53 recordings from Korean-speaking children aged 2-5 years. A subset of 53 subjects was annotated by three independent reviewers, yielding 1,190 consonant and 748 vowel word-level binary correctness labels. We evaluate three diarization models, finding that NeMo SortFormer achieves 88.69% speaker count accuracy and 33.04% diarization error rate (DER) owing to its arrival-time-sorted transformer architecture, which handles the acoustic confound between young female caregivers exhibiting aegyo and toddler speech. For pronunciation scoring, we compare three self-supervised learning (SSL) backbones across multiple pooling strategies. A cross-model ensemble routing consonant prediction to HuBERT-large and vowel prediction to WavLM-large achieves balanced accuracies of 0.720 and 0.845, with a mean of 0.782.

2606.10208 2026-06-10 cs.RO cs.AI New submission

Exploration of Foundation Model-Based Robots in Patient and Elderly Care

基于基础模型的机器人在患者和老年人护理中的探索

Zhiwen Qiu, Wei Liu, Yuexing Hao

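AI Summary Surveys foundation model-based care robots in terms of design features, user experience, and care outcomes, notes that current systems are mostly used for voice interaction with limited multimodality and physical autonomy, and calls for a move toward care-specific evaluation standards and responsible autonomy.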

AI Chinese Abstract

随着全球人口老龄化,对老年人和患者护理的需求迅速增长。基础模型越来越多地被集成到机器人和交互代理中,有望实现更灵活的沟通和个性化辅助。然而,护理环境需要可靠且与工作流程兼容的系统,并具备可问责的人类监督,目前尚不清楚当前具身系统能否将技术进步转化为临床影响。本综述从三个方面综合了基于基础模型的护理机器人:设计特征、用户体验以及护理相关结果的证据。当前系统最常将基础模型用作以语音为中心的社会辅助具身中的对话和推理层,而多模态基础和物理自主性仍然有限。实证评估报告了积极的可用性和参与度益处,但交互流程中仍存在可靠性故障,如幻觉和对话中断。护理影响的证据主要集中在认知参与和参与等近期结果上,而经过验证的临床或护理相关变化的证据有限。我们认为,未来的研究应转向护理特定的评估标准、可问责的自主性以及融入护理工作流程,以支持更具响应性和负责任的护理技术。

English Abstract

Demand for older-adult and patient care is growing rapidly as populations age worldwide. Foundation models are increasingly being integrated into robots and interactive agents, with the promise of more flexible communication and personalized assistance. However, care settings require reliable and workflow-compatible systems with accountable human oversight, and it remains unclear whether current embodied systems can translate technical advances into clinical impact. This Perspective synthesizes foundation model-based care robots across three areas: design features, user experience, and evidence for care-related outcomes. Current systems most commonly use foundation models as conversational and reasoning layers within voice-centered socially assistive embodiments, while multimodal grounding and physical autonomy remain limited. Empirical evaluations report positive usability and engagement benefits, but reliability failures such as hallucinations and conversational breakdowns persist across the interaction pipeline. Evidence for care impact remains concentrated in proximal outcomes such as cognitive engagement and participation, with limited evidence for validated clinical or care-related changes. We argue that future research should transition toward care-specific evaluation standards, accountable autonomy, and integration into care workflows to support more responsive and responsible care technologies.

2606.10199 2026-06-10 cs.LG cs.CL New submission

A Continuous-Time Markov Chain Framework for Insertion Language Models

插入语言模型的连续时间马尔可夫链框架

Dhruvesh Patel, Benjamin Rozonoyer, Soumitra Das, Tahira Naseem, Tim G. J. Rudner, Andrew McCallum

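AI Summary Proposes a denoising framework for insertion language models based on continuous-time Markov chains that unifies existing approaches, outperforms autoregressive and masked diffusion models on a planning task, and is competitive with existing methods in language modeling while allowing more flexible sampling.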

Comments Accepted at AISTATS 2026. Code is available at https://github.com/dhruvdcoder/ctmc_dilm

AI Chinese Abstract

插入语言模型(ILMs)相比从左到右生成和基于掩码的生成具有若干优势。然而,现有的插入式生成公式大多是临时性的。在本文中,我们通过将噪声过程建模为变长序列空间上的连续时间马尔可夫链,从第一性原理推导出ILMs的扩散式去噪目标。我们表明,先前的ILMs公式可以视为该去噪框架的特例。通过在合成规划任务上的实证评估,我们展示了所提出的方法保留了插入式生成相对于从左到右生成和掩码扩散模型的优势。在语言建模中,我们的基于扩散的方法与从左到右生成和掩码扩散模型具有竞争力,同时与现有的插入语言模型相比,在采样方面提供了额外的灵活性。

English Abstract

Insertion Language Models (ILMs) offer several advantages over left-to-right generation and mask-based generation. However, existing formulations of insertion-based generation have largely been ad hoc. In this paper, we derive a diffusion-style denoising objective for ILMs from first principles by formulating the noising process as a continuous-time Markov chain on the space of variable-length sequences. We show that previous formulations of ILMs can be viewed as special cases of this denoising framework. Through empirical evaluation on a synthetic planning task, we show that the proposed approach retains the benefits of insertion-based generation over left-to-right generation and masked diffusion models. In language modeling, our diffusion-based approach is competitive with left-to-right generation and masked diffusion models, while offering additional flexibility in sampling compared to existing insertion language models.
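
To make the idea of a continuous-time Markov chain on variable-length sequences concrete, here is a minimal Gillespie-style simulation. It rests on an assumption the abstract does not state: that noising deletes each surviving token independently at a constant rate. The function name `simulate_deletion_ctmc` and its parameters are hypothetical, and the paper's actual noising process and objective may differ.

```python
import random
from typing import List, Optional, Sequence


def simulate_deletion_ctmc(
    tokens: Sequence[str],
    t_end: float,
    rate: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Simulate an assumed token-deletion CTMC up to time t_end.

    Each remaining token is deleted independently at rate `rate`, so a state
    with n tokens leaves at total rate n * rate. The returned sequence is
    always a subsequence of the input, which an insertion model would then
    have to restore by inserting tokens.
    """
    rng = rng or random.Random()
    state = list(tokens)
    t = 0.0
    while state:
        total_rate = rate * len(state)
        t += rng.expovariate(total_rate)  # waiting time until the next deletion
        if t > t_end:
            break  # the next deletion would happen after the time horizon
        del state[rng.randrange(len(state))]  # all tokens are equally likely
    return state


noisy = simulate_deletion_ctmc("the cat sat on the mat".split(), t_end=0.5,
                               rng=random.Random(0))
print(noisy)
```

Under this assumed process, each token survives to time t with probability exp(-rate * t), so longer horizons give shorter sequences.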

2606.10196 2026-06-10 cs.CV cs.AI New submission

Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

Fisher引导的自适应微调渐进参数选择

Ghodsiyeh Rostami, Po-Han Chen, Mahdi S. Hosseini

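AI Summary Proposes the FisherAdapTune framework, which progressively selects parameter groups by tracking the temporal drift of Fisher geometry, freezing stabilized parameters to lower the generalization error bound while preserving adaptation dynamics, and improves in-distribution performance and zero-shot transfer on segmentation tasks.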

AI Chinese Abstract

参数高效微调(PEFT)旨在使用少量可训练参数子集来适应预训练模型,然而,现有大多数方法从固定的架构启发式中选择该子集,而不是使用动态的、任务感知的标准。我们引入了FisherAdapTune,一个Fisher引导的自适应微调框架,通过追踪参数组Fisher几何的时间漂移来渐进选择参数组。从微调的PAC-Bayesian视角出发,我们将泛化误差界分解为Fisher加权更新成本,并表明曲率贡献已稳定的参数组可以被冻结,以减少误差界而不中断剩余的适应动态。FisherAdapTune使用连续Fisher分布之间的尺度不变Jensen-Shannon距离来制定这一标准,从而产生一个自适应的活动参数集。我们在下游分割任务上评估了我们的方法,结果表明FisherAdapTune在多种设置下提升了分布内性能和零样本迁移,验证了Fisher结构漂移是高效、任务感知适应的有用信号。我们公开发布了代码(https://github.com/AtlasAnalyticsLab/FisherAdapTune),以促进我们提出方法的进一步应用。

English Abstract

Parameter-efficient fine-tuning (PEFT) aims to adapt pretrained models with a small trainable parameter subset. However, most existing methods choose this subset from fixed architectural heuristics rather than using dynamic, task-aware criteria. We introduce FisherAdapTune, a Fisher-guided Adaptive Fine-Tuning framework that progressively selects parameter groups by tracking the temporal drift of their Fisher geometry. Starting from a PAC-Bayesian view of fine-tuning, we decompose the generalization error bound into Fisher-weighted update costs and show that parameter groups whose curvature contribution has stabilized can be frozen to reduce the error bound without interrupting the remaining adaptation dynamics. FisherAdapTune formulates this criterion with a scale-invariant Jensen-Shannon distance between consecutive Fisher distributions, yielding an adaptive active parameter set. We evaluate our approach on a downstream segmentation task, and results show that FisherAdapTune improves in-distribution performance and zero-shot transfer in multiple settings, validating that Fisher structural drift is a useful signal for efficient, task-aware adaptation. We release our code publicly at https://github.com/AtlasAnalyticsLab/FisherAdapTune to enable further application of our proposed approach.
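
The freezing criterion described above can be sketched roughly as follows. This is not the released FisherAdapTune code: the group names, the threshold `tau`, the helper `select_active_groups`, and the choice to represent each group's Fisher information as a list of non-negative diagonal estimates (for example, squared gradients) are all assumptions made for illustration. Normalizing each snapshot to sum to one is one simple way to get the scale invariance the abstract mentions.

```python
import math
from typing import Dict, List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Turn non-negative Fisher estimates into a probability distribution."""
    total = sum(values)
    if total <= 0:
        raise ValueError("Fisher estimates must have positive total mass")
    return [v / total for v in values]


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance (square root of the base-2 JS divergence)."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative round-off


def select_active_groups(
    fisher_prev: Dict[str, Sequence[float]],
    fisher_curr: Dict[str, Sequence[float]],
    tau: float,
) -> List[str]:
    """Keep training groups whose Fisher distribution still drifts by at least tau."""
    active = []
    for name, prev in fisher_prev.items():
        drift = js_distance(normalize(prev), normalize(fisher_curr[name]))
        if drift >= tau:
            active.append(name)
    return active


# Hypothetical snapshots: "encoder" only rescales, "head" changes shape.
prev = {"encoder": [1.0, 2.0, 3.0], "head": [1.0, 1.0, 1.0]}
curr = {"encoder": [2.0, 4.0, 6.0], "head": [5.0, 1.0, 0.1]}
print(select_active_groups(prev, curr, tau=0.05))
```

In this toy case the encoder group would be frozen because its Fisher values only change in overall scale, while the head group stays active.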