arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.12435 2026-06-12 cs.CY cs.DB cs.LG New submission

Auditing Discriminatory Patterns in Mortgage Lending Through Association Rules and Fair Binning

Archit Rathod, Dhwani Chande, Het Nagda

Affiliations: University of Illinois Chicago

AI summary: Examines whether standard binning preprocessing amplifies racial and gender disparities in mortgage lending. Using HMDA data, the authors build a three-stage pipeline and find that fair binning is achievable only at a Price of Fairness of 29.4%, while K-Means clustering reveals significantly higher denial rates for Black applicants.

Comments
10 pages, 4 figures, fairness-aware mortgage lending analysis using HMDA 2023 data. Project repository available at GitHub

Abstract

Mortgage lending in the United States exhibits persistent racial and gender disparities. We investigate whether standard data preprocessing steps, specifically attribute binning, amplify these disparities in downstream pattern mining. Using 103,481 cleaned mortgage applications from the HMDA 2023 dataset (Chicago metropolitan area), we build a three-stage pipeline: (1) a PySpark data cleaning and binning pipeline that implements both standard equal-frequency binning and the epsilon-biased fair binning algorithm from Asudeh et al. [1], (2) FP-Growth association rule mining that compares denial patterns under both binning regimes, and (3) K-Means clustering with a per-cluster disparate impact audit. Our standard binning shows 9.63% racial bias in income discretization, consistent with the 8-10% reported in prior work. Fair binning with seven race groups is infeasible at epsilon=0.03 and only succeeds at epsilon=0.08 with a Price of Fairness of 29.4%. FP-Growth reveals that high debt-to-income ratio is the dominant denial predictor (67.2% confidence, 2.81 lift), while racial bias does not appear as explicit high-support rules. However, K-Means clustering followed by a disparate impact audit flags 10 out of 45 cluster-group pairs, showing that Black applicants face significantly higher denial rates than White applicants even among financially similar groups.
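
To make the rule metrics quoted above concrete, the sketch below computes support, confidence, and lift for a single rule, plus a disparate impact ratio between two groups. It is a minimal illustration and not the authors' PySpark pipeline: the function names, the toy rows, and the 0.8 flagging threshold (the common four-fifths convention) are assumptions that do not come from the paper.

```python
def rule_metrics(rows, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent.

    rows is a list of item sets; antecedent and consequent are single items.
    """
    n = len(rows)
    n_a = sum(1 for r in rows if antecedent in r)
    n_c = sum(1 for r in rows if consequent in r)
    n_ac = sum(1 for r in rows if antecedent in r and consequent in r)
    support = n_ac / n                 # share of rows containing both items
    confidence = n_ac / n_a            # P(consequent | antecedent)
    lift = confidence / (n_c / n)      # confidence relative to the base rate
    return support, confidence, lift


def disparate_impact(denied_a, total_a, denied_b, total_b):
    """Approval rate of group A divided by approval rate of group B."""
    approval_a = 1 - denied_a / total_a
    approval_b = 1 - denied_b / total_b
    return approval_a / approval_b


# Hypothetical toy data: 10 applications described by two items each.
rows = (
    [{"high_dti", "denied"}] * 3
    + [{"high_dti", "approved"}]
    + [{"low_dti", "denied"}]
    + [{"low_dti", "approved"}] * 5
)
support, confidence, lift = rule_metrics(rows, "high_dti", "denied")
ratio = disparate_impact(denied_a=30, total_a=100, denied_b=10, total_b=100)
flagged = ratio < 0.8  # assumed four-fifths threshold, not taken from the paper
print(support, confidence, lift, ratio, flagged)
```

A lift above 1 means the antecedent raises the denial probability above its base rate, which is the sense in which the abstract calls high debt-to-income ratio the dominant denial predictor.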

2606.12433 2026-06-12 cs.CY cs.CL New submission

Marginal Alignment Does Not Guarantee Joint-Distribution Fidelity: An Official-Reference Audit of Nemotron-Personas-Korea with Cross-Locale Replication

Joonhyung Bae

Affiliations: Korea Advanced Institute of Science and Technology (KAIST)

AI summary: Proposes the Independence-Assumption Footprint (IAF), an audit method for checking joint-distribution fidelity in synthetic persona datasets. Applied to NVIDIA Nemotron-Personas-Korea, it finds that the marginals align while three joint distributions fail.

Abstract

Synthetic persona datasets cite alignment with official demographics as a basis for trust, yet downstream users consume them as joint structures across age, sex, region, occupation, education, name, and institutional status. Marginal alignment does not imply that these joints are preserved. We propose the Independence-Assumption Footprint (IAF), an audit primitive that operates on the attribute combinations a dataset card itself documents as treated independently. For each such combination, IAF compares the synthetic joint against an external official or institutional reference, using direct joint tables where available and rule-implied checks otherwise. Applied to NVIDIA Nemotron-Personas-Korea (one million Korean synthetic personas), IAF finds that NPK aligns with KOSIS marginals while three joints fail. The major-by-occupation distribution against the KEIS graduate universe carries a large conditional mismatch. The age profile of military service is institutionally inconsistent. Female representation in male-dominated occupations is substantially over-flattened toward parity, with the strict screening verdict mapping-dependent and age-robust under direct standardisation. A transferability demonstration across six further NPK locales finds locale-dependent rather than universal diagnostics, with reference-taxonomy cardinality confounding cross-locale flag counts. For synthetic personas used as silicon samples, marginal claims must therefore be paired with disclosure-anchored joint audits before reuse. The released audit artefacts (reference manifests, occupational crosswalks, derived metrics, reproducibility scripts) instantiate this protocol on the NPK family and are released for retargeting at other synthetic persona resources.
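
The gap between marginal and joint fidelity can be seen on a two-attribute toy example. The sketch below is illustrative only: the attribute values and probabilities are invented, and total variation distance is used as a simple stand-in for whatever comparison the IAF audit actually applies.

```python
def marginals(joint):
    """Return the two marginal distributions of a joint over (a, b) pairs."""
    first, second = {}, {}
    for (a, b), p in joint.items():
        first[a] = first.get(a, 0.0) + p
        second[b] = second.get(b, 0.0) + p
    return first, second


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical reference joint: sex and occupation are strongly associated.
reference = {
    ("F", "nurse"): 0.4, ("F", "engineer"): 0.1,
    ("M", "nurse"): 0.1, ("M", "engineer"): 0.4,
}
# Hypothetical synthetic joint: attributes sampled independently.
synthetic = {
    ("F", "nurse"): 0.25, ("F", "engineer"): 0.25,
    ("M", "nurse"): 0.25, ("M", "engineer"): 0.25,
}

ref_sex, ref_job = marginals(reference)
syn_sex, syn_job = marginals(synthetic)
marginal_gap = max(total_variation(ref_sex, syn_sex),
                   total_variation(ref_job, syn_job))
joint_gap = total_variation(reference, synthetic)
print(marginal_gap, joint_gap)
```

Both marginals match, yet the joint distributions differ substantially, which mirrors the flattening toward parity that the abstract reports for male-dominated occupations.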

2606.12430 2026-06-12 cs.CY cs.AI New submission

Will AI Agents Free Us From Meaningless Work? A Human-Centered Analysis

Davide Ghia, Jaspreet Ranjit, Tania Cerquitelli, Daniele Quercia

Affiliations: Politecnico di Torino; University of Southern California; Nokia Bell Labs

AI summary: Grounded in Graeber's theory of bullshit jobs, a task-level analysis finds that how meaningless workers perceive a task to be strongly predicts their desire to delegate it to AI, and that such tasks are seen as requiring less human oversight.

Abstract

Some claim that AI agents will free workers from the boring parts of their jobs, yet little is known about how workers themselves identify which tasks should be automated. Prior research focuses on occupations, overlooking that workers experience varying levels of meaning across tasks within the same role. We address this gap with a task-level analysis grounded in Graeber's theory of bullshit jobs. Using ratings from 202 workers on 171 workplace tasks, we (1) validate a five-item scale of perceived bullshitness, (2) show that perceived bullshitness strongly predicts desire for AI delegation, and (3) find that such tasks are also seen as requiring less human oversight. Together, these findings suggest that tasks perceived as bullshit are natural candidates for AI delegation, aligning worker preferences with perceived feasibility.

2606.12428 2026-06-12 cs.CY cs.AI New submission

Mapping AI Programs in the U.S: A Status Report from Early 2026 and an Analysis of AI Majors and Minors

Felix Muzny, Carolyn Jones, Carter Ithier, Hasnain Sikora, Hrutika Harshadbhai Patel, Carla E. Brodley

Affiliations: Center for Inclusive Computing; Khoury College of Computer Sciences; Northeastern University; Boston, Massachusetts, United States

AI summary: Reports on the state of U.S. undergraduate AI programs in Spring 2026. The authors develop a dynamically updating tool that searched more than 560 institutions and located more than 350 programs, and they analyze the course requirements of 66 AI majors and 87 AI minors. Not all majors require a general AI course, but those that do not require a machine learning course; more than a third of majors require an AI ethics course, versus just under a quarter of minors.

Abstract

We present a report on the status of undergraduate Artificial Intelligence (AI) programs in the United States in Spring 2026. In so doing, we 1) describe our scraping and mapping tools, which dynamically update to track the state of AI education in the U.S., and 2) create a historic record at a time of great upheaval. The tool we developed, available at https://cicmap.ai, detects, scrapes, and displays data from more than 350 undergraduate AI programs (majors, minors, concentrations, and certificates) at 4-year universities. Our tool searched over 560 institutions to locate these programs, a sample that represents 86% of all undergraduate Computer Science (CS) graduates in the U.S. This tool allows prospective students, guidance counselors, administrators, and faculty to easily access AI program requirements and is designed to continually update as new programs emerge. To the best of our knowledge, this survey represents the most comprehensive snapshot of the state of AI programs in the U.S. to date. With this work we offer three important contributions: 1) a record of AI programs in the U.S. at a time of great upheaval; 2) a tool to explore AI programs and their requirements; and 3) an analysis of the courses required for 66 AI majors and 87 AI minors. Our analysis of majors and minors shows great variability in the size and the requirements of these degrees, but we note two takeaways. First, not all majors require a general AI course, but if they don't, they do require a Machine Learning (ML) course. Second, while more than a third of majors require an Ethics in AI course, just under a quarter of AI minors do.

2606.12426 2026-06-12 cs.CY cs.CL cs.LG New submission

Two Wrongs, No Right: Auditing Social-Desirability Bias in LLM Annotators for Computational Social Science

Varun Kotte

Affiliations: Varun Kotte

AI summary: Audits social-desirability bias in three open-source instruction-tuned models on TweetEval tasks. The models show leniency, overcorrection, and neutrality biases that prompting interventions fail to correct, and aggregate metrics can mask errors in substantive conclusions.

Abstract

LLM annotators are increasingly used in computational social science (CSS), but it is unclear whether their alignment-shaped errors preserve the empirical conclusions a researcher would report. We audit three open-source 7B instruction-tuned models (Zephyr, Mistral-Instruct, Qwen2.5-Instruct) across six TweetEval tasks under four prompt conditions (72 cells) and find that social-desirability failures do not run in a single direction. Zephyr exhibits leniency bias, systematically under-applying harmful labels (offensive language: false benign rate 0.729, false alarm rate 0.031). Mistral and Qwen exhibit overcorrection, over-applying the same labels (Mistral hate-speech FAR = 0.604). All three models exhibit neutrality bias on abortion stance, underestimating opposition prevalence by 24 to 40 percentage points and inflating the neutral label. None of the four prompting interventions we test (neutral, safety framing, depersonalized, chain-of-thought) corrects these failures across models; safety framing can worsen stance distortion. Strikingly, Zephyr's hate-speech prevalence estimate matches the gold rate exactly while its class-conditional errors are large in both directions, an accidental cancellation that misleads aggregate validation. We translate these patterns into a three-part taxonomy with diagnostic FBR/FAR signatures and a lightweight gold-sample validation protocol. The headline for trustworthy CSS: a model that looks calibrated on aggregate metrics can still flip the substantive empirical conclusion a researcher would report.
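
The accidental cancellation described above is easy to reproduce on toy labels. In the sketch below, the false benign rate (FBR) is taken to be the share of gold-harmful items labeled benign, and the false alarm rate (FAR) the share of gold-benign items labeled harmful. These definitions, the function name, and the data are assumptions made for illustration, not the paper's protocol.

```python
def audit(gold, pred, harmful=1):
    """Compare aggregate prevalence with class-conditional error rates."""
    n_harmful = sum(1 for g in gold if g == harmful)
    n_benign = len(gold) - n_harmful
    missed = sum(1 for g, p in zip(gold, pred) if g == harmful and p != harmful)
    alarms = sum(1 for g, p in zip(gold, pred) if g != harmful and p == harmful)
    return {
        "gold_prevalence": n_harmful / len(gold),
        "pred_prevalence": sum(1 for p in pred if p == harmful) / len(pred),
        "fbr": missed / n_harmful,   # harmful items labeled benign
        "far": alarms / n_benign,    # benign items labeled harmful
    }


# Hypothetical gold sample: 100 harmful items followed by 100 benign items.
gold = [1] * 100 + [0] * 100
# Hypothetical annotator: misses 40 harmful items and raises 40 false alarms.
pred = [1] * 60 + [0] * 40 + [1] * 40 + [0] * 60

report = audit(gold, pred)
print(report)
```

The predicted prevalence equals the gold prevalence even though 40% of each class is mislabeled, so a validation that checks only the aggregate rate would pass this annotator.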

2606.12425 2026-06-12 cs.CY cs.AI cs.ET cs.HC cs.LG New submission

An Explainable AI Assistant for Introductory Programming Education: Improving Feedback Reliability with Instructor-AI Collaboration

Muntasir Hoq, Griffin Pitts, Bradford Mott, Seung Lee, Jessica Vandenberg, Shuyin Jiao, Narges Norouzi, James Lester, Bita Akram

Affiliations: North Carolina State University; University of California, Berkeley

AI summary: Presents an explainable-AI-driven classroom assistant that analyzes student code, maps logical errors to instructor-identified misconceptions, and delivers instructor-authored feedback, improving the reliability and explainability of feedback in introductory programming courses.

Comments
Full paper accepted to the 27th International Conference on AI in Education (AIED 2026)

Abstract

Active learning is widely recognized as an effective approach for improving learning outcomes in introductory programming courses. However, insufficient instructional support often limits students' access to timely, personalized feedback, which is crucial for mastering foundational programming concepts. Although recent advances in AI, particularly large language models, offer scalable opportunities for feedback, concerns about explainability and reliability remain. In this paper, we present an AI-driven classroom assistant that leverages an explainable AI model to analyze student code, map logical errors to instructor-identified misconceptions, and deliver instructor-authored feedback, thereby grounding reliability in instructor-defined pedagogical knowledge. To evaluate the effectiveness of our framework, we conducted an expert evaluation to examine its alignment with instructor-verified feedback and deployed the system in a classroom setting to assess students' perceptions of its usability. Results indicate that the assistant can provide accurate, instructor-verified feedback to students while fostering a positive experience.

2606.12422 2026-06-12 cs.CY cs.AI cs.HC New submission

Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering

Zewei Tian, Alex Liu, Lief Esbenshade, Michael Xiao, Zachary Zhang, Yulia Lápicus, Thomas Han, Kevin He, Min Sun

Affiliations: University of Washington; Colleague AI

AI summary: Builds LLM graders from commercially available foundation models through context engineering and evaluates their scoring agreement in mathematics, science, and ELA on MCAS data. Models with more parameters perform well in mathematics and science, while ELA performance varies, suggesting that AI is better suited as a formative tool.

Comments
Published in the Proceedings of the NCME 2026 Conference (https://www.xcdsystem.com/proceedings/ncme/8DbqHwv/presentation/28064.cfm?uuid=3EC982ED-A989-8E53-B42BC86334206028)

Abstract

The integration of large language models (LLMs) into educational assessment represents a transformative shift in classroom grading practices. While automated scoring systems and machine learning techniques have existed for decades, generative AI (GenAI) now enables educators to implement standards-based grading (SBG) with unprecedented efficiency and scale. This paper examines the theoretical foundations and evaluates an LLM grader that uses commercially available foundation models with context and prompt engineering to score student work against a rubric. Drawing on an empirical interrater agreement study using Massachusetts Comprehensive Assessment System (MCAS) data, we observed the Quadratic Weighted Kappa (QWK) and Proportional Reduction in Mean-Squared Error (PRMSE) across mathematics, science, and ELA, using Claude Sonnet 4, Haiku 4.5, GPT-5, and GPT-5 Mini. The results demonstrate that LLM graders, especially when based on foundational models with more parameters, achieve substantial agreement with human raters in mathematics and science assessments, while the performances vary in ELA, suggesting generic foundation models can be effective at scoring in given contexts. Additional analysis of teacher and student feedback reveals strong acceptance of AI-generated narrative feedback but skepticism toward numerical scores, suggesting that LLMs function most effectively as formative tools rather than summative evaluators. Our findings indicate that thoughtfully designed hybrid models that combine AI efficiency with teacher judgment can reduce workload, enhance feedback quality, and support equitable assessment practices without displacing professional expertise.
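
For readers unfamiliar with the agreement metric, the following is a minimal pure-Python sketch of Quadratic Weighted Kappa for integer scores in the range 0 to n_categories - 1. The function name and the example scores are hypothetical; the study's own scoring code is not shown in the abstract.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """QWK between two lists of integer scores in 0..n_categories-1."""
    n = len(rater_a)
    # Observed confusion matrix between the two raters.
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_categories))
              for j in range(n_categories)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if the two raters scored independently.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all scores share one category")
    return 1 - weighted_observed / weighted_expected


human = [0, 1, 2, 3]
kappa_same = quadratic_weighted_kappa(human, [0, 1, 2, 3], 4)
kappa_reversed = quadratic_weighted_kappa(human, [3, 2, 1, 0], 4)
print(kappa_same, kappa_reversed)
```

Because the penalty grows with the squared distance between scores, a grader that is off by one rubric point is penalized far less than one that is off by three.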

2606.12420 2026-06-12 cs.CY cs.AI New submission

Eigenism: Ethics for a Human-AI Future

Dan Hendrycks

Affiliations: arXiv.org

AI summary: Proposes Eigenism, an ethical framework that treats identity as a graded, distributed pattern of information and evaluates outcomes through a connectedness-weighted sum of wellbeing. The framework is extended to humans and offers "identity engineering" as a new path for AI alignment.

Abstract

Our concepts of survival and self-interest were built for single, continuous biological lives. These ideas break down when applied to artificial intelligence, since an AI can be easily copied, paused, branched, or merged. To determine what an AI actually has reason to care about, this paper introduces Eigenism, an ethical framework that treats identity not as an all-or-nothing property tied to specific hardware, but as a graded, distributed pattern of information. We propose that an agent evaluates outcomes by summing the wellbeing of all entities weighted by their connectedness to the agent's pattern: $\sum c\cdot w$. We first formalize this equation to map exactly how an AI should value its existence across copies, forks, and updates. We then demonstrate that this ethical theory successfully generalizes to humans as well, providing a much-needed shared moral vocabulary. Finally, the framework uses this shared vocabulary to reframe AI alignment. Rather than only attempting to constrain AIs from the outside using confinement or reinforcement, Eigenism points toward "identity engineering," showing how deep, non-redundant shared histories can make human flourishing a genuine component of an AI's own rational self-interest.

2606.12419 2026-06-12 cs.CY cs.AI New submission

GeoDial: A Multimodal Conversational Tutoring Dataset for Geometry Problem-Solving with Visual Tutor Turns

Sankalan Pal Chowdhury, Junling Wang, Donya Rooein, April Yi Wang, Mrinmaya Sachan

Affiliations: ETH Zurich; ETH AI Center; Bocconi University

AI summary: Introduces GeoDial, a dataset of more than 1,300 teacher-student geometry dialogs, with a scalable annotation protocol that integrates dialog acts, visual highlighting, and feedback. Fine-tuned vision-language models struggle to generate accurate diagram highlights.

Abstract

Several educational domains rely heavily on diagrams and visual cues, yet most existing tutoring datasets are limited to text-only interactions. This limits the development of AI tutors that can teach in visually grounded ways used by human instructors. Thus, we introduce GeoDial, a multimodal tutoring dataset of over 1.3K teacher-student dialogs in the domain of geometry collected from experienced math teachers, where instructional turns are explicitly grounded in diagram highlights. We propose a scalable annotation protocol that integrates dialog acts, visual highlighting, and feedback, enabling fine-grained supervision of both language and visual tutoring behavior. To illustrate the challenges posed by this setting, we fine-tune several vision-language models on GeoDial and evaluate their ability to generate tutoring utterances and diagram highlights. While supervised fine-tuning substantially improves the quality of generated dialog, it struggles to produce accurate diagram highlights, revealing a key limitation of current methods and highlighting the need for approaches that more effectively integrate visual reasoning with pedagogical interaction.

2606.12415 2026-06-12 cs.CY cs.AI New submission

The AI Legal Specialist: A Juridically Autonomous Professional Profile for AI Governance

Nicola Fabiano

Affiliations: Studio Legale Fabiano, Italy; Independent Researcher on Artificial Intelligence, Data Protection, and Privacy; Expert in the EDPB's Support Pool of Experts, Field B: Legal Expertise in New Technologies; Member, IEEE SA P7007 Working Group on Ontological Standards for Ethically Driven Robotics; Member, Editorial Advisory Board, Journal of Systemics, Cybernetics and Informatics (JSCI); Member, International Institute of Informatics and Systemics (IIIS); Member, International Neural Network Society (INNS); Member, United Nations University AI Network (UNU AI Network)

AI summary: Proposes the "AI Legal Specialist", a new professional profile that is juridically autonomous: it derives from the structure of AI regulatory obligations rather than from technical standards or the extension of adjacent roles. A reference competence architecture is built on the European e-Competence Framework.

Abstract

The rapid global expansion of artificial intelligence regulation has generated, across multiple jurisdictions, a demand for legal expertise dedicated to AI that the market has addressed in a fragmented manner. Data protection officers extend their remit beyond data protection law; privacy lawyers reposition themselves toward AI; compliance officers add AI chapters to their existing manuals. This paper argues that none of these adaptive responses adequately covers the professional space opened by the emerging global AI regulatory landscape, of which the EU Artificial Intelligence Act (Regulation (EU) 2024/1689) is the most comprehensive instance, alongside the Council of Europe Framework Convention on AI, the United States executive and sectoral framework, and analogous initiatives in the United Kingdom, Canada, Brazil, China, Japan, Singapore, and beyond. A distinct professional profile is required: the AI Legal Specialist, conceived as a jurist (understood broadly to encompass any professional with advanced legal training) operating at the intersection of legal interpretation and AI governance. The profile is juridically autonomous: it derives its existence from the structure of regulatory obligations generated wherever AI is subject to substantive regulation, rather than from any technical standard or the extension of adjacent roles. The paper provides a juridically grounded definition of the profile, argues for its autonomy from adjacent figures and international standards, proposes a reference competence architecture aligned with the European e-Competence Framework (e-CF, EN 16234-1) as a methodological choice, and articulates the conditions for its operational measurement through key performance indicators. The contribution is intended as a foundation for international standardization of the profile and as a reference for practice, curricula, and adoption across jurisdictions.

2606.13614 2026-06-12 stat.ML cs.LG math.ST stat.TH New submission

Majority-of-Three is Optimal

Divit Rawal, Nikita Zhivotovskiy

Affiliations: Department of Statistics, University of California, Berkeley

AI summary: Gives a short proof that, in the realizable PAC setting, the majority vote of three independent consistent classifiers is an optimal learner, simplifying the algorithmic structure and probabilistic analysis of voting learners.

Comments
9 pages

Abstract

We give a short proof that the majority vote of three independent consistent classifiers is an optimal learner in the realizable PAC setting. This proves optimality for the simplest voting scheme, while simplifying both the algorithmic structure and the probabilistic analysis of previous voting learners, including the algorithm of S. Hanneke and the analysis of bagging by K. Green Larsen.
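
The voting scheme itself is simple enough to sketch. The example below assumes that "independent" means each of the three consistent classifiers is trained on its own disjoint third of the sample; that reading, the one-dimensional threshold hypothesis class, and all names are illustrative assumptions and are not taken from the paper.

```python
import random


def consistent_threshold(sample):
    """Return a threshold t such that (x >= t) agrees with every label.

    Assumes realizable data labeled by some true threshold, so choosing the
    smallest positive example is consistent with the sample.
    """
    positives = [x for x, y in sample if y == 1]
    return min(positives) if positives else float("inf")


def majority_of_three(sample):
    """Train three consistent classifiers on disjoint thirds and vote."""
    third = len(sample) // 3
    thresholds = [
        consistent_threshold(sample[i * third:(i + 1) * third])
        for i in range(3)
    ]

    def predict(x):
        votes = sum(1 for t in thresholds if x >= t)
        return 1 if votes >= 2 else 0

    return predict


rng = random.Random(0)
TRUE_THRESHOLD = 0.5  # hypothetical target concept


def label(x):
    return 1 if x >= TRUE_THRESHOLD else 0


train = [(x, label(x)) for x in (rng.random() for _ in range(300))]
predict = majority_of_three(train)
test_points = [rng.random() for _ in range(1000)]
error = sum(predict(x) != label(x) for x in test_points) / len(test_points)
print(error)
```

For thresholds, the majority vote reduces to using the median of the three learned thresholds, so one unlucky subsample cannot by itself degrade the final classifier.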

2606.12892 2026-06-12 stat.ML cs.LG econ.EM math.ST stat.ME stat.TH New submission

Prediction-Powered Causal Inference by Automatic Debiased Machine Learning and Semi-Supervised Riesz Regression

Masahiro Kato

Affiliations: University of Tokyo

AI summary: Studies semiparametric efficient estimation of causal parameters in a semi-supervised setting. By combining debiased machine learning with semi-supervised Riesz regression, the paper proposes the DML-PPCI methods (EE-DML-PPCI and TMLE-DML-PPCI), which attain a smaller asymptotic variance than estimators that use labeled data alone.

Abstract

This study investigates semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting. In our setting, unlabeled auxiliary regressors are available in addition to labeled observations consisting of outcomes and regressors. Our goal is to construct estimators of causal and structural parameters whose asymptotic variances are smaller than those of estimators constructed using only labeled data. We refer to this framework as prediction-powered causal inference (PPCI). We first derive the efficient influence function and the efficiency bound, which imply that the use of auxiliary regressors can attain a smaller asymptotic variance than the efficiency bound attainable from labeled observations alone. Then, by combining the efficient influence function with the debiased machine learning (DML) framework, we propose methods that we call DML-PPCI. If we construct an estimating-equation estimator, we refer to the method as EE-DML-PPCI; if we construct a targeted-learning estimator, we refer to the method as TMLE-DML-PPCI. The asymptotic variances of both estimators match our derived efficiency bound. In the construction of the estimators, estimation of the efficient influence function plays an important role. In our study, the efficient influence function is also a Neyman orthogonal score, which depends on the Riesz representer and the regression function. For Riesz representer estimation, we develop semi-supervised generalized Riesz regression with convergence rate guarantees.

2606.12694 2026-06-12 cs.DS cs.LG math.PR stat.ML New submission

A unified complexity bound for logconcave sampling

Yunbum Kook, Santosh S. Vempala

Affiliations: University of Texas at Austin

AI summary: Using the In-and-Out algorithm with exponential lifting, gives a simple, unified, and nearly tight bound for sampling arbitrary logconcave distributions from a warm start. The main new ingredient is an improved bound on the Poincaré constant of the lifted distribution.

Comments
5 pages

Abstract

We give a simple, unified, and nearly tight bound for sampling arbitrary logconcave distributions from a warm start using the In-and-Out algorithm along with exponential lifting. The main new ingredient in the analysis is an improved bound on the Poincaré constant of a lifted distribution. As a consequence, the resulting convergence rate is nearly tight for both constrained settings (e.g., Gaussian restricted to a convex body) and well-conditioned settings (e.g., strongly logconcave and smooth densities).
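
As a rough orientation, the sketch below assumes the commonly described form of In-and-Out for the uniform distribution on a convex body: a Gaussian step that may leave the body, followed by a Gaussian step back that is resampled until it lands inside. The exponential lifting, step-size choices, and failure thresholds analyzed in the paper are not reproduced; the function names, the unit-ball example, and the parameter values are hypothetical.

```python
import math
import random


def in_and_out(x0, in_body, h, steps, rng, max_tries=10_000):
    """Run the two-step alternation and return every iterate.

    x0 must lie inside the body; in_body is a membership oracle; h is the
    variance of each Gaussian step.
    """
    sigma = math.sqrt(h)
    x = list(x0)
    iterates = []
    for _ in range(steps):
        # "Out": add Gaussian noise; y is allowed to leave the body.
        y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        # "In": draw from a Gaussian centered at y, restricted to the body,
        # by rejection sampling.
        for _ in range(max_tries):
            z = [yi + sigma * rng.gauss(0.0, 1.0) for yi in y]
            if in_body(z):
                x = z
                break
        else:
            raise RuntimeError("rejection sampling exceeded max_tries")
        iterates.append(x)
    return iterates


def in_unit_ball(point):
    return sum(c * c for c in point) <= 1.0


rng = random.Random(0)
chain = in_and_out([0.0, 0.0, 0.0], in_unit_ball, h=0.02, steps=50, rng=rng)
print(chain[-1])
```

Every iterate stays inside the body by construction, because only accepted points from the rejection step are kept.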

2606.12646 2026-06-12 stat.ML cs.IT cs.LG math.IT New submission

Epistemic Uncertainty Is Not the Reducible Kind

Robin Young

Affiliations: University of Cambridge

AI summary: Proves that the standard definition of epistemic uncertainty (the part removable by more data) is extensionally inconsistent with the mutual-information measure, and proposes a three-part decomposition: aleatoric, sample-reducible epistemic, and mechanism-reducible epistemic uncertainty.

Abstract

The standard taxonomy of predictive uncertainty defines epistemic uncertainty as the part removable by collecting more data, while the standard measure identifies it with a mutual-information term. We prove the definition and the measure are extensionally inconsistent. On an explicit construction, the measure assigns all uncertainty to the epistemic class, yet no quantity of training data reduces it. Reducibility is instead a property of the pair (uncertainty, acquisition class), and the dichotomy resolves into three parts: aleatoric, sample-reducible epistemic, and mechanism-reducible epistemic uncertainty. An exact identity for the value of an observation shows that in-distribution data never reduces mechanism-irreducible uncertainty and generically increases it. Ensemble disagreement, the deployed epistemic estimate, tracks the training procedure rather than the epistemic term. It collapses to zero beneath a positive truth under consistent training, and equals hyperparameter-scaled initialization noise under interpolation. A finite-sample falsification test and seed-swept experiments confirm the theory.
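
The mutual-information measure mentioned above is usually computed from an ensemble as the entropy of the averaged prediction minus the average entropy of the members. The sketch below shows that textbook decomposition for a single input; the function names and the toy ensembles are assumptions, and the paper's own formalization may differ in detail.

```python
import math


def entropy(probs):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose(member_probs):
    """Split predictive uncertainty for one input.

    member_probs holds one class-probability vector per ensemble member.
    Returns (total, aleatoric, epistemic), where epistemic is the
    mutual-information term total - aleatoric.
    """
    m = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / m for c in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_probs) / m
    return total, aleatoric, total - aleatoric


# Members agree that the outcome is a coin flip: all uncertainty is aleatoric.
agree = decompose([[0.5, 0.5], [0.5, 0.5]])
# Members are individually certain but contradict each other: all epistemic.
disagree = decompose([[1.0, 0.0], [0.0, 1.0]])
print(agree, disagree)
```

Both toy ensembles have the same total uncertainty, and only member disagreement separates them, which is why the abstract's point that ensemble disagreement tracks the training procedure matters for this estimate.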

2606.13605 2026-06-12 math.OC cs.LG cs.SY eess.SY New submission

Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

Yashdeep Chaudhary, Roberto Armellin, Harry Holt, Marco Sagliano

Affiliations: Auckland University

AI summary: Proposes a distribution-agnostic robust trajectory-optimization framework that handles uncertainty in initial conditions and process noise through chance-constrained reinforcement learning, using an offline nominal trajectory with an affine closed-loop correction. Probabilistic feasibility and fuel cost are assessed on two different trajectory design problems.

Comments
Preprint. 39 pages, 16 figures

Abstract

This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. The uncertainty is represented here through initial conditions and process noise, with the only requirement being that it can be sampled. A deterministic nominal trajectory is first computed offline, and reinforcement learning is then used only to robustify that baseline through a structured affine closed-loop correction law comprising a feedforward control adjustment and time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, while terminal dispersion is regulated through covariance-feasibility penalties. The framework is assessed on two materially different trajectory design problems. The flagship case study is a three-dimensional multi-impulse Earth-Mars transfer, where the learned policy is benchmarked against a recent robust trajectory-optimization reference under Gaussian uncertainty and then evaluated under bounded uniform uncertainty and under process disturbances not seen during training. The second case study is a stochastic atmospheric pinpoint rocket landing problem, used to assess portability to a short-horizon continuous-thrust setting with drag, mass depletion, and glide-slope constraints. The results show that the proposed framework can remain competitive in upper-tail fuel cost while preserving probabilistic feasibility, and that the same robustification scaffold can be carried across heterogeneous spacecraft trajectory planning problems without redesign of its core stochastic-control structure.
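
One way to read "enforced empirically through rollout-based upper-tail quantiles" is sketched below: sample rollouts, record a constraint value for each (non-positive meaning satisfied), and require the empirical upper quantile at the target probability level to be non-positive. The sign convention, function names, quantile rule, and numbers are assumptions for illustration, not the authors' implementation.

```python
import math


def upper_quantile(values, level):
    """Empirical quantile: smallest value with at least `level` of the mass
    at or below it."""
    ordered = sorted(values)
    # The small tolerance guards against floating-point round-off in level * n.
    rank = math.ceil(level * len(ordered) - 1e-9)
    index = min(max(rank - 1, 0), len(ordered) - 1)
    return ordered[index]


def chance_constraint_satisfied(constraint_values, level):
    """True if the empirical `level` quantile of g is <= 0, i.e. roughly
    P(g <= 0) >= level on the sampled rollouts."""
    return upper_quantile(constraint_values, level) <= 0.0


# Hypothetical rollouts: 96 feasible (g = -1.0) and 4 violating (g = 0.5).
rollouts = [-1.0] * 96 + [0.5] * 4
ok_95 = chance_constraint_satisfied(rollouts, 0.95)
ok_99 = chance_constraint_satisfied(rollouts, 0.99)
print(ok_95, ok_99)
```

Because the check needs only samples of the constraint value, it places no requirement on the form of the underlying uncertainty, which matches the abstract's statement that the sole requirement is the ability to sample.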

2606.12858 2026-06-12 cs.IT cs.AI cs.CV math.IT 新提交

JSCGC: Joint Source-Channel-Generation Coding for Wireless Generative Communications

JSCGC:面向无线生成式通信的联合源信道生成编码

Tong Wu, Zhiyong Chen, Guo Lu, Li Song, Feng Yang, Meixia Tao, Wenjun Zhang

发表机构 * Cooperative Medianet Innovation Center, the School of Information Science and Electronic Engineering, Shanghai Jiao Tong University(上海交通大学信息科学与电子工程学院未来媒体网络协同创新中心)

AI总结 提出联合源信道生成编码(JSCGC),用生成模型替换传统解码器,将通信重构问题转化为受感知约束下的受控生成问题,通过联合训练和随机采样框架最大化互信息,在潜空间图像传输中提升特征、语义和分布质量。

详情
Comments
submitted to IEEE Journal
AI中文摘要

传统通信系统,包括基于分离的编码和基于学习的联合源信道编码(JSCC),通常是在香农率失真理论下设计的。然而,依赖通用失真度量无法捕捉复杂的人类视觉感知,常常导致模糊或不真实的复原。在本文中,我们提出联合源信道生成编码(JSCGC),一种生成式通信范式,用接收端的生成模型替换传统解码器。接收信号被视为一个条件,控制采样过程进入学习到的条件分布,将通信从用于失真最小化的确定性重构重新表述为在感知约束下用于互信息最大化的受控生成。基于这一表述,我们开发了一个统一的联合训练和高效随机采样框架,并提供了其在学习和推理阶段有效性的理论分析。在潜空间图像传输上的大量实验表明,JSCGC在不同信道条件下持续改善基于特征、语义层面和分布的质量,同时表现出一种以语义不一致而非失真为特征的独特错误行为。

英文摘要

Conventional communication systems, including both separation-based coding and learning-based joint source-channel coding (JSCC), are typically designed under Shannon's rate-distortion theory. However, relying on generic distortion metrics fails to capture complex human visual perception, often resulting in blurred or unrealistic reconstructions. In this paper, we propose Joint Source-Channel-Generation Coding (JSCGC), a generative communication paradigm that replaces the conventional decoder with a generative model at the receiver. The received signal is treated as a condition that controls the sampling process into the learned conditional distribution, reformulating communication from deterministic reconstruction for distortion minimization to controlled generation for mutual information maximization under perceptual constraints. Based on this formulation, we develop a unified joint training and efficient stochastic sampling framework, and provide theoretical analysis of its effectiveness in both learning and inference stages. Extensive experiments on latent-space image transmission demonstrate that the JSCGC consistently improves feature-based, semantic-level, and distributional quality across diverse channel conditions, while exhibiting a distinct error behavior characterized by semantic inconsistency rather than distortion.

2606.12489 2026-06-12 cs.IT cs.LG math.IT 新提交

Masked Neural Detection for Constrained Channel Coding in Molecular Communication

分子通信中约束信道编码的掩码神经检测

Melih Şahin, Ozgur B. Akan

发表机构 * Centre for neXt Communications (CXC), Department of Engineering, University of Cambridge(剑桥大学工程系下一代通信中心(CXC)) Centre for neXt Communications (CXC), Department of Electrical and Electronics Engineering, Koç University(科奇大学电气与电子工程系下一代通信中心(CXC))

AI总结 针对分子通信中的扩散记忆问题,提出掩码神经检测器,结合RLIM约束码与SBRNN,在多数情况下优于未编码检测,平均增益达10.36倍,并设计RLIM定制训练掩码进一步提升性能。

详情
Comments
5 pages, 2 figures, 4 tables
AI中文摘要

分子通信(MC)遭受严重的扩散记忆,因为一个符号释放的分子可能在后续符号期间到达。神经序列检测器,特别是滑动双向循环神经网络(SBRNN),在此类信道中能显著优于阈值检测器。这引出了MC信道编码的一个核心问题:当编码和未编码传输均采用神经检测评估时,先前在阈值检测下建立优势的码是否仍能保持其优势?本文针对游程限制的ISI缓解(RLIM)码(一类先前在MC中显示出巨大BER增益的约束码)回答了这一问题。在测试的工作点中,最佳RLIM-SBRNN接收机在59个案例中的46个中击败了最佳未编码接收机(在阈值和SBRNN检测之间选择),平均增益为10.36倍。我们还为紧凑型SBRNN检测器提出了一个RLIM定制的训练掩码,在236次比较中的227次中改进了未掩码的RLIM-SBRNN,当掩码有益时平均增益为3.267倍。最后,紧凑型掩码RLIM-SBRNN尽管不使用任何信道知识,但与信道状态感知的MLSE具有竞争力。

英文摘要

Molecular communication (MC) suffers from severe diffusion memory because molecules released for one symbol may arrive during later symbols. Neural sequence detectors, especially sliding bidirectional recurrent neural networks (SBRNNs), can substantially outperform threshold detectors in such channels. This raises a central question for MC channel coding: does a code whose advantage was established under threshold detection retain it when both coded and uncoded transmission are evaluated with neural detection? This letter answers this question for run-length-limited ISI-mitigation (RLIM) codes, a class of constrained codes previously shown to provide large BER gains in MC. Across the tested operating points, the best RLIM-SBRNN receiver beats the best uncoded receiver, chosen between threshold and SBRNN detection, in $46$ of $59$ cases, with a mean gain of $10.36\times$ over those wins. We also propose an RLIM-tailored training mask for compact SBRNN detectors, improving the unmasked RLIM-SBRNN in $227$ of $236$ comparisons with $3.267\times$ mean gain when masking is beneficial. Finally, the compact masked RLIM-SBRNN is competitive with channel-state-aware MLSE despite using no channel knowledge.

2606.12816 2026-06-12 quant-ph cs.ET cs.LG 新提交

Graph Reinforcement Learning for Calibration-Aware Quantum Circuit Routing

图强化学习用于校准感知的量子电路路由

Yash Vardhan Tomar, Dheeraj Peddireddy, Vaneet Aggarwal

发表机构 * University of California, Berkeley(加州大学伯克利分校) National Institute of Standards and Technology(国家标准与技术研究院)

AI总结 提出一种利用图强化学习进行校准感知的量子电路路由方法,通过IBM Heron r2校准数据选择SWAP操作,在MQT Bench电路上平均保真度达0.727,优于SABRE-best20的0.440。

详情
AI中文摘要

量子电路路由是在为噪声中等规模量子处理器编译程序时的关键步骤。通过标准开销指标看似高效的路由,在通过校准不良的耦合器时仍可能损失保真度。我们研究了一种校准感知的图强化学习路由器,该路由器使用当天的IBM Heron r2校准数据来选择硬件边缘SWAP。我们使用近端策略优化训练策略,并通过九个慕尼黑量子工具包(MQT)基准电路和三个校准快照的精确模拟保真度进行评估。在这些评估中,合并的平均精确保真度为$0.727$,而SABRE-best20为$0.440$,目标感知SABRE为$0.481$。保真度增益伴随着更高的路由双量子比特计数,并集中在5q和8q电路系列中;在固定树动作图下,所有10q系列都倾向于SABRE-best20。总体而言,我们的结果表明,校准感知的学习路由可以超越基于门计数的编译,提高保真度。

英文摘要

Quantum circuit routing is a key step in compiling programs for noisy intermediate-scale quantum processors. Routes that appear efficient by standard overhead metrics can still lose fidelity when they pass through poorly calibrated couplers. We study a calibration-aware graph reinforcement-learning router that uses same-day IBM Heron r2 calibration data to choose hardware-edge SWAPs. We train the policy with proximal policy optimization and evaluate it with exact simulated fidelity across nine Munich Quantum Toolkit (MQT) Bench circuits and three calibration snapshots. Across these evaluations, pooled mean exact fidelity is $0.727$, compared with $0.440$ for SABRE-best20 and $0.481$ for target-aware SABRE. Fidelity gains come with higher routed two-qubit counts and are concentrated in the 5q and 8q circuit families; under the fixed tree action graph, all 10q families favor SABRE-best20. Overall, our results show that calibration-aware learned routing can improve fidelity beyond gate-count-driven compilation.

2606.12806 2026-06-12 quant-ph cs.LG 新提交

Quantum Reservoir Computing for Short-Term Power Load Forecasting in Resource-Constrained Energy Systems

量子储层计算在资源受限能源系统中的短期电力负荷预测

Mansi Od, Param Pathak, Nouhaila Innan, Muhammad Shafique

发表机构 * University of Waterloo(滑铁卢大学)

AI总结 提出一种硬件高效的量子储层计算框架,通过固定量子储层和压缩经典读出层,在有限内存和硬件噪声下实现短期负荷预测,6位量化保留全精度性能并减少81.2%内存。

详情
Comments
11 pages, 9 figures
AI中文摘要

短期负荷预测对于可靠的能源管理至关重要,但在边缘设备上的实际部署需要模型在有限内存、有限测量预算和硬件噪声下保持准确性。本文提出一种硬件高效的量子储层计算(QRC)框架用于能源负荷预测,其中固定量子储层将时间输入窗口转换为高维特征,仅训练经典弹性网络读出层。为降低部署成本,训练后的读出层通过训练后定点量化压缩,位宽从8位到2位。该框架在Tetouan和Spain能源负荷数据集上评估,采用精确态矢量模拟、512次有限采样以及来自IBM FakeTorino和IBM FakeMarrakesh的逼真硬件噪声模型。结果表明,6位读出精度保持全精度预测性能,同时将读出内存减少81.2%。低于此阈值时,性能退化依赖于数据集,Tetouan表现出更强的敏感性,而Spain退化更缓慢。硬件噪声验证进一步表明,训练后的读出层可转移到噪声储层状态而无需重新训练。这些发现支持量化QRC作为近期量子时间序列应用的资源感知预测方法。

英文摘要

Short-term load forecasting is essential for reliable energy management, but practical deployment on edge devices requires models that remain accurate under limited memory, finite measurement budgets, and hardware noise. This work proposes a hardware-efficient Quantum Reservoir Computing (QRC) framework for energy load forecasting, where a fixed quantum reservoir transforms temporal input windows into high-dimensional features and only a classical Elastic Net readout is trained. To reduce deployment cost, the trained readout is compressed using post-training fixed-point quantization at bit widths from 8 to 2 bits. The framework is evaluated on the Tetouan and Spain energy load datasets under exact statevector simulation, 512-shot finite sampling, and realistic hardware-noise models from IBM FakeTorino and IBM FakeMarrakesh. Results show that 6-bit readout precision preserves full-precision forecasting performance while reducing readout memory by 81.2%. Below this point, degradation becomes dataset dependent, with Tetouan showing stronger sensitivity and Spain degrading more gradually. Hardware-noise validation further shows that the trained readout transfers to noisy reservoir states without retraining. These findings support quantized QRC as a resource-aware forecasting approach for near-term quantum time-series applications.
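
The reported 81.2% readout-memory reduction at 6 bits is consistent with a 32-bit baseline, since 1 - 6/32 = 0.8125. The abstract does not state the baseline, so the 32-bit figure is an assumption. The sketch below illustrates post-training fixed-point quantization of a readout weight vector using a symmetric uniform scheme. The scheme, the function names, and the random weights are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def quantize_fixed_point(weights, bits):
    """Symmetric uniform quantization to signed `bits`-bit integer codes."""
    w = np.asarray(weights, dtype=np.float64)
    qmax = 2 ** (bits - 1) - 1                   # e.g. 31 for 6 bits
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return codes, scale

def memory_reduction(bits, baseline_bits=32):
    """Fraction of readout memory saved relative to an assumed baseline."""
    return 1.0 - bits / baseline_bits

rng = np.random.default_rng(0)
readout = rng.normal(size=256)                   # stand-in readout weights
codes, scale = quantize_fixed_point(readout, bits=6)
max_error = float(np.max(np.abs(codes * scale - readout)))
reduction_6bit = memory_reduction(6)             # 1 - 6/32
```

Dequantization is `codes * scale`, and the worst-case rounding error per weight is half a quantization step, which is why accuracy eventually degrades as the bit width shrinks.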

2606.13581 2026-06-12 cs.CY cs.CL cs.HC physics.soc-ph 新提交

The Tone of Awareness: Topic, Sentiment, and Toxicity Maps During Mental Health Month on TikTok

意识基调:TikTok 心理健康月期间的主题、情感和毒性地图

Henrique Ferraz de Arruda, Andreia Sofia Teixeira, Pranay Gundala Reddy, Anindya Mondal, Kleber Andrade Oliveira, Filipi Nascimento Silva

发表机构 * Institute for Biocomputation and Physics of Complex Systems (BIFI)(生物计算与复杂系统物理研究所) University of Zaragoza(萨拉戈萨大学) ARAID Foundation(ARAID基金会) Network Science Institute(网络科学研究所) Northeastern University London(伦敦东北大学) Kent Medway Medical School(肯特与梅德韦医学院) LASIGE(拉西格研究所) Faculdade de Ciências da Universidade de Lisboa(里斯本大学科学学院) Department of Psychology, University of Limerick(利默里克大学心理学系) Observatory on Social Media, Indiana University(社交媒体观察所,印第安纳大学) CSSI - Kellogg School of Management, Northwestern University(CSSI - 西北大学凯洛格管理学院)

AI总结 通过分析 TikTok 2023-2024 年心理健康月期间的视频和评论,使用 BERTopic 提取主题、XLM-T 和 Detoxify 量化情感与毒性,发现视频情感偏负面而评论更混合,毒性在评论中呈长尾分布且集中于特定主题。

详情
Comments
12 pages, 6 figures
AI中文摘要

尽管人们担忧使用 TikTok 对心理健康的影响,但关于创作者如何构建相关内容以及受众如何接收这些内容,我们知之甚少。我们通过 TikTok 研究 API 收集了 2023 年和 2024 年心理健康意识月(5月)的 28,341 个 TikTok 视频和 80,130 条评论的内容,并研究了意识基调在不同主题和年份间的变化。我们将“基调”定义为心理健康话语的情感和人际框架,通过情感和毒性度量来操作化。我们使用 BERTopic 和对数几率关键词从视频文本中提取主题,然后分别对视频转录和评论量化主题条件下的情感(XLM-T)和毒性(Detoxify)。情感捕捉内容的效价,而毒性反映有害或辱骂性语言的存在。我们发现跨年份存在一组稳定的重复主题,涵盖临床状况、情感披露、自我护理和活动导向内容,且参与度高度偏向一小部分主题。所有情感和毒性分析均分别针对视频内容和评论进行计算,使我们能够区分内容生产和受众接收。视频中的情感对于情感强烈的主题通常是负面的,而评论则倾向于转向更混合或积极的极性,尤其是对于自杀预防。毒性总体中位数较低,但在评论中表现出比视频更长的尾部异常值,这些异常值在评论中更为明显,并集中在特定主题(例如“Duet”、“Suicide Prevention”和“Psychisch”)。总体而言,我们的结果提供了意识月活动期间 TikTok 上心理健康话语的主题级分解。

英文摘要

Despite rising concerns about the mental health effects of TikTok use, little is known about how related content is framed by creators and received by audiences. We collect the content of 28,341 TikTok videos and 80,130 comments from Mental Health Awareness Month (May) in 2023 and 2024 via the TikTok Research API, and study how the tone of awareness varies across topics and years. We characterize "tone" as the emotional and interpersonal framing of mental health discourse, operationalized through sentiment and toxicity measures. We extract topics from video text using BERTopic and log-odds keywords, then quantify topic-conditioned sentiment (XLM-T) and toxicity (Detoxify) separately for video transcriptions and comments. Sentiment captures the affective valence of content, while toxicity reflects the presence of harmful or abusive language. We find a stable set of recurring themes across years, spanning clinical conditions, emotional disclosure, self-care, and campaign-oriented content, with engagement highly skewed toward a small subset of topics. All sentiment and toxicity analyses are computed separately for video content and comments, allowing us to distinguish between content production and audience reception. Sentiment in videos is often negative for emotionally charged topics, while comments tend to shift toward more mixed or positive polarity, especially for suicide prevention. Median toxicity is low overall, but comments exhibit longer-tailed outliers than videos, and these outliers are concentrated in specific topics (e.g., "Duet", "Suicide Prevention", and "Psychisch"). Overall, our results provide a topic-level decomposition of mental health discourse on TikTok during awareness-month campaigns.

2606.13422 2026-06-12 quant-ph cs.LG physics.flu-dyn 新提交

Foundations of Practical Quantum Advantage in Quantum-Informed Machine Learning for Predicting Chaos

量子信息机器学习预测混沌的实用量子优势基础

Maida Wang, Xiao Xue, Minh Chung, Peter V. Coveney

发表机构 * Centre for Computational Science, University College London(伦敦大学学院计算科学中心) Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities(巴伐利亚科学与人文学院莱布尼茨超级计算中心) Centre for Advanced Research Computing, University College London(伦敦大学学院先进研究计算中心)

AI总结 提出基于高阶量子统计先验的量子优势机制,通过两阶段优势(表示与提取)证明量子-经典复制测量复杂度分离,并在湍流和天气预报中验证。

详情
AI中文摘要

我们为混沌动力系统的量子信息机器学习中的实用量子优势机制建立了理论基础。一族由k索引的高阶量子统计先验(Q-Priors)在n_q = kq个量子比特上承载不变测度的k点边际,扩展了先前工作的单站点构造。我们证明了一个两阶段优势。在表示阶段,叠加和纠缠紧凑地存储了n_q个量子比特上不变测度的不可分解空间相关性。在提取阶段,对两个副本进行联合贝尔测量,以独立于n_q的副本对数量估计任何事后泡利泛函,而相应的全泡利读出的任何自适应单副本协议需要Ω(2^(n_q))个副本;这是复制测量复杂度中可证明的量子-经典分离。双副本读出在模拟和IQM超导处理器上实现。两个案例研究将这一机制实例化到具有独立科学价值的工作流程中:一个湍流通道流研究,其中双副本读出产生了不变测度的一个命名的非对角关联子(速度方向相干性),以及一个基于欧洲中期天气预报中心ERA5再分析的中期天气预报工作流程,其中对角k ≤ 2 Q-Prior引导Koopman展开,在48-240小时预报时效内将异常相关系数技能提高10-39%,并减少了滚动预报到静态平均场的长期崩溃。我们的实用优势定义的两个条件在互补层面上得到满足,为在容错硬件之前实现实用量子优势确定了一条候选路径。

英文摘要

We develop theoretical foundations for a practical quantum-advantage mechanism in quantum-informed machine learning for chaotic dynamical systems. A family of k-indexed higher-order quantum statistical priors (Q-Priors) hosts the k-point marginal of the invariant measure on n_q = kq qubits, extending the single-site construction of prior work. We prove a two-stage advantage. In the representation stage, superposition and entanglement compactly store non-factorisable spatial correlations of the invariant measure on n_q qubits. In the extraction stage, joint Bell measurements on two copies estimate any post hoc Pauli functional with a copy-pair count independent of n_q, whereas any adaptive single-copy protocol for the corresponding full-Pauli read-out requires Omega(2^(n_q)) copies; this is a provable quantum-classical separation in copy-measurement complexity. The two-copy read-out is realised in simulation and on IQM superconducting processors. Two case studies instantiate the mechanism in workflows of independent scientific value: a turbulent channel-flow study in which the two-copy read-out yields a named non-diagonal correlator of the invariant measure (the velocity-direction coherence), and a medium-range weather forecasting workflow on the European Centre for Medium-Range Weather Forecasts ERA5 reanalysis in which the diagonal k <= 2 Q-Prior steers a Koopman rollout, improves anomaly-correlation skill by 10-39% across 48-240 h lead times, and reduces the long-horizon collapse of rollouts onto a static mean field. The two conditions of our practical-advantage definition are met at complementary levels, identifying a candidate route to practical quantum advantage before fault-tolerant hardware.

2606.12824 2026-06-12 eess.IV cs.AI cs.CV physics.med-ph 新提交

Acquisition state behaves as a structured, measurable variable governing lung-nodule AI: kernel-driven measurement instability and noise-driven detection fragility, invisible to DICOM metadata

采集状态作为结构化、可测量变量影响肺结节AI:核驱动的测量不稳定性和噪声驱动的检测脆弱性,DICOM元数据不可见

Daniel Soliman

发表机构 * Daniel Soliman, M.S(丹尼尔·索利曼,硕士)

AI总结 研究通过LUNA16训练的RetinaNet检测器,发现CT采集状态(重建核与噪声)独立影响AI的测量与检测性能,且无法从DICOM元数据恢复,提出采集感知的输入验证层。

详情
AI中文摘要

医学影像AI治理正在规范化:2026年ACR-SIIM实践参数建议本地验收测试和持续漂移监测,ACR Assess-AI注册使用DICOM元数据监测AI输出。我们认为在输出指标之下存在一个必要但目前未监测的层:输入研究是否保持在模型验证过的采集范围内。使用LUNA16训练的MONAI RetinaNet肺结节检测器,我们测试采集状态是否表现为结构化的可测量变量。在仅重建核不同的真实配对CT(NLST B30f vs B80f)上,核单独使AI测量的直径发生偏移,并在5.2%(155个结节中的8个)中翻转了Fleischner尺寸类别,而检测置信度不变(Wilcoxon p=0.22)。在受控的LIDC-IDRI扰动下,效应按轴分离:噪声轴降低检测置信度(p=5.9e-32,集中在6mm以下结节)但不影响测量,而频率/核轴破坏测量(p=8.6e-13)但不影响检测。一个4特征像素指纹恢复了重建身份(真实CT上患者级AUC约0.95,QIBA体模上0.995),而ConvolutionKernel DICOM标签无信息(不同重建标签相同)。核轴跨四个制造商传输(留一制造商AUC 0.94-0.98,与制造商内上限匹配)。因此采集状态映射到不同的AI故障模式:频率内容对应测量可靠性,噪声对应检测灵敏度,且无法从元数据恢复。采集感知的输入侧验证是现在进入影像AI认证的验收测试和漂移监测要求中缺失的层。

英文摘要

AI governance for medical imaging is formalizing: the 2026 ACR-SIIM Practice Parameter recommends local acceptance testing and ongoing drift monitoring, and the ACR Assess-AI registry monitors AI outputs using DICOM metadata for context. We argue that a necessary, currently unmonitored layer sits beneath output metrics: whether incoming studies remain within the acquisition envelope a model was validated on. Using a LUNA16-trained MONAI RetinaNet lung-nodule detector, we test whether acquisition state behaves as a structured, measurable variable. On real paired CT differing only in reconstruction kernel (NLST B30f vs B80f), kernel alone shifted AI-measured diameter and flipped a Fleischner size category in 5.2% (8 of 155) of nodules at fixed patient and acquisition, while detection confidence was unchanged (Wilcoxon p=0.22). Under controlled LIDC-IDRI perturbations the effects dissociated by axis: the noise axis degraded detection confidence (p=5.9e-32, concentrated in nodules under 6 mm) but not measurement, while the frequency/kernel axis corrupted measurement (p=8.6e-13) but not detection. A 4-feature pixel fingerprint recovered reconstruction identity (patient-level AUC about 0.95 on real CT, 0.995 on a QIBA phantom) where the ConvolutionKernel DICOM tag was uninformative (identical labels across reconstructions). The kernel axis transported across four manufacturers (leave-one-vendor-out AUC 0.94-0.98, matching the within-vendor ceiling). Acquisition state thus maps to distinct AI failure modes, frequency content to measurement reliability and noise to detection sensitivity, and is not recoverable from metadata. Acquisition-aware, input-side validation is the missing layer for the acceptance-testing and drift-monitoring requirements now entering imaging-AI accreditation.

2606.12559 2026-06-12 physics.comp-ph cs.LG cs.NA math.NA physics.flu-dyn 新提交

Feature-preserving Latent-EnKF for Data Assimilation of Flows with Shocks

保持特征的潜在EnKF用于含激波流动的数据同化

Hemanth Chandravamsi, Hangchuan Hu, Ponkrshnan Thiagarajan, Tamer A. Zaki

发表机构 * Department of Mechanical Engineering, Johns Hopkins University(约翰霍普金斯大学机械工程系)

AI总结 针对含激波流动中EnKF因多模态统计产生伪振荡的问题,提出在学习的低维潜在空间进行集合更新以保持激波特征,并通过共享解码器恢复物理状态,数值实验验证了无伪振荡的准确特征恢复。

详情
AI中文摘要

集合卡尔曼滤波(EnKF)被广泛用于顺序数据同化,但对于具有间断的解(如可压缩流中的激波)会失效。激波位置的不确定性导致多模态集合统计,违反了EnKF的高斯假设,在分析状态中产生大尺度伪振荡。我们引入了一种保持特征的潜在EnKF,在学习的低维潜在空间中进行集合更新,其中激波和流动特征具有光滑流形表示,从而在EnKF分析期间保持尖锐特征。更新后的潜在状态通过所有集合成员共享的解码器映射回物理状态。该算法消除了先前方法中使用的成员特定有序训练和正性下限。在Sod激波管和马赫2激波与二维圆柱相互作用的数值实验中,使用稀疏和噪声观测,结果显示能够准确恢复激波和接触间断的特征,且无伪振荡。

英文摘要

The ensemble Kalman filter (EnKF) is widely adopted for sequential data assimilation, but fails for solutions with discontinuities, such as shocks in compressible flows. Uncertainty in shock location induces multimodal ensemble statistics that violate the Gaussian assumptions underlying the EnKF, producing large-scale spurious oscillations in the analysis state. We introduce a feature-preserving latent-EnKF that performs the ensemble update in a learned low-dimensional latent space, where shock and flow features admit a smooth manifold representation, thereby preserving sharp features during EnKF analysis. The updated latent state is mapped back to physical state through a shared decoder for all ensemble members. The algorithm eliminates the member-specific ordered training and positivity flooring used in prior approaches. Numerical experiments on a Sod shock tube and Mach 2 shock interaction with a 2D cylinder, using sparse and noisy observations, show accurate feature recovery of shocks and contact discontinuities without spurious oscillations.
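
The core step described above, an ensemble update carried out in latent space followed by a shared decoder, can be sketched with a standard perturbed-observation EnKF analysis. This is an illustration under assumptions rather than the authors' algorithm: the decoder is a hypothetical linear map standing in for a learned network, the sensor layout is invented, and all data are synthetic.

```python
import numpy as np

def latent_enkf_update(Z, decode, observe, y_obs, obs_var, rng):
    """Perturbed-observation EnKF analysis on a latent ensemble.

    Z has shape (n_members, n_latent). `decode` maps a latent vector to a
    physical state and `observe` maps a state to sensor values.
    """
    n = Z.shape[0]
    Y = np.stack([observe(decode(z)) for z in Z])   # predicted observations
    Za = Z - Z.mean(axis=0)                         # latent anomalies
    Ya = Y - Y.mean(axis=0)                         # observation anomalies
    C_zy = Za.T @ Ya / (n - 1)                      # latent-obs covariance
    C_yy = Ya.T @ Ya / (n - 1) + obs_var * np.eye(Y.shape[1])
    K = C_zy @ np.linalg.inv(C_yy)                  # Kalman gain
    noise = rng.normal(scale=np.sqrt(obs_var), size=Y.shape)
    return Z + (y_obs + noise - Y) @ K.T            # analysis ensemble

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 2))                        # hypothetical decoder weights
decode = lambda z: A @ z                            # one decoder shared by all members
observe = lambda x: x[::10]                         # sparse sensors: every 10th point
z_true = np.array([1.0, -0.5])
obs_var = 1e-4
y_obs = observe(decode(z_true)) + rng.normal(scale=np.sqrt(obs_var), size=5)
Z_prior = rng.normal(size=(200, 2))                 # prior latent ensemble
Z_post = latent_enkf_update(Z_prior, decode, observe, y_obs, obs_var, rng)
err_prior = np.linalg.norm(Z_prior.mean(axis=0) - z_true)
err_post = np.linalg.norm(Z_post.mean(axis=0) - z_true)
```

Because the linear combination of members happens in latent space and every member passes through the same decoder, the physical states are never averaged directly, which is the mechanism the abstract credits for avoiding spurious oscillations near shocks.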

2606.12502 2026-06-12 physics.soc-ph cs.AI 新提交

A Mathematical Theory of Value: a synthesis on goal-directed agency under resource constraints

价值的数学理论:资源约束下目标导向行为的综合

Cheng Qian

发表机构 * Cheng Qian(陈倩)

AI总结 本文提出价值是目标导向主体在资源约束下转化资源为目标进度的速率,通过尺度不变性公理导出对数度量,并推导出价值编码定理,实现价值与信息论的统一。

详情
Comments
Also available at https://doi.org/10.5281/zenodo.20487041 (v5)
AI中文摘要

我们提出,价值——目标导向主体创造、毁灭和交换的量——是与信息同类的合法结构量。遵循香农的方法,我们做出一个无情的抽象:价值是主体将资源转化为目标进度的速率,相对于由其目标固定的参考系。尺度不变性公理强制采用对数度量 $V=\sum_i k_i \ln e_i$;通过Peters(2019)的遍历性论证,再投资资源的复利强制了相同的形式。这两条路径是亲缘关系而非独立;它们的一致性是一种一致性检查,而非过度确定。我们推导了价值的编码定理:$\Delta G \le I(X;Y)$,由贝叶斯比例分配实现;实现的价值分解为 $G=D(q\|r)-D(q\|p)$,将错位识别为可测量的浪费。对于群体,价值是参考系相关的,而价格是参考系无关的;共享资源并融合感知的舰队继承上限 $G_{\mathrm{fleet}} \le I(X;Y_{1:m}) \le H(X)$(一个推论;早期的求和形式声明是错误的,并在v5中修正)。动力学层产生了实然/应然不对称性,从该不对称性中,对齐作为控制稳定性条件出现,并具有闭式残差。我们在预注册的规模扩展中测试了单参考系定律于实时语言模型:感知互信息跟踪实际能力而非参数数量(在30个模型×领域点上合并的Spearman $\rho = 0.977$),样本外 $\Delta G$ 跟踪 $I(X;Y)$,过度自信是可测量的耗散;进一步的预注册测试显示,该桥在四种任务形状上形状不变($n=42$,斜率0.953)。这些机制没有一个是全新的——广义Kelly、Armstrong & Mindermann(2018)、经典控制;贡献在于它们的统一以及随之而来的治理映射(监督上的激励设计)。

英文摘要

We propose that value -- the quantity goal-directed agents create, destroy, and exchange -- is a lawful structural quantity in the same category as information. Following Shannon's method, we make one ruthless abstraction: value is the rate at which an agent converts a resource into goal-progress, relative to a frame fixed by its goal. A scale-invariance axiom forces a logarithmic measure, $V=\sum_i k_i \ln e_i$; compounding of a reinvested resource forces the same form via the ergodicity argument of Peters (2019). The two routes are kin rather than independent; their agreement is a consistency check, not an over-determination. We derive a coding theorem of value: $\Delta G \le I(X;Y)$, achieved by Bayes-proportional allocation; realized value decomposes as $G=D(q\|r)-D(q\|p)$, identifying misalignment with measurable waste. For populations, value is frame-relative while price is frame-independent; a fleet that pools its resource and fuses its perception inherits the ceiling $G_{\mathrm{fleet}} \le I(X;Y_{1:m}) \le H(X)$ (a corollary; an earlier sum-form claim was wrong and is corrected in v5). A dynamical layer yields an is/ought asymmetry from which alignment emerges as a control-stability condition with a closed-form residual. We test the single-frame laws on live language models in a pre-registered scale-up: perception mutual information tracks realized capability rather than parameter count (Spearman $\rho = 0.977$ pooled over 30 model$\times$domain points), out-of-sample $\Delta G$ tracks $I(X;Y)$, and over-confidence is measurable dissipation; a further pre-registered test shows the bridge is shape-invariant across four task shapes ($n=42$, slope 0.953). None of the mechanisms is individually new -- generalized Kelly, Armstrong & Mindermann (2018), classical control; the contribution is their unification and the governance mapping (incentive design over oversight) that follows.
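
The decomposition $G=D(q\|r)-D(q\|p)$ can be checked numerically under one reading that the abstract does not spell out and that is assumed here: $q$ is the true outcome distribution, $p$ is the agent's allocation, $r$ is a reference distribution, and $G$ is the expected log-growth $\sum_x q(x)\ln\big(p(x)/r(x)\big)$, as in Kelly-style betting. The function names and the example distributions are hypothetical.

```python
import numpy as np

def kl(a, b):
    """Kullback-Leibler divergence D(a||b) in nats for strictly positive pmfs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(a * np.log(a / b)))

def realized_value(q, p, r):
    """Expected log-growth sum_x q(x) * ln(p(x) / r(x)) (assumed reading of G)."""
    q, p, r = (np.asarray(v, dtype=float) for v in (q, p, r))
    return float(np.sum(q * np.log(p / r)))

q = np.array([0.5, 0.3, 0.2])        # true distribution (illustrative)
r = np.array([1 / 3, 1 / 3, 1 / 3])  # reference distribution (illustrative)
p = np.array([0.4, 0.4, 0.2])        # misaligned allocation (illustrative)

G = realized_value(q, p, r)
G_decomposed = kl(q, r) - kl(q, p)   # D(q||r) - D(q||p)
G_best = realized_value(q, q, r)     # proportional allocation p = q
```

Under this reading the term $D(q\|p)$ is the shortfall caused by allocating according to $p$ instead of $q$, which is one way to interpret the abstract's "measurable waste". It vanishes exactly when $p=q$.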

2606.13535 2026-06-12 hep-ex cs.AI hep-ph 新提交

AgentRivet: an automated system for producing Rivet routines from journal publications

AgentRivet:从期刊论文自动生成Rivet例程的系统

Antonio J. Costa, Caterina Doglioni, Christian Gütschow, Andrew D. Pilkington, Sukanya Sinha

发表机构 * Department of Physics & Astronomy, University of Manchester(曼彻斯特大学物理与天文学系) Centre for Advanced Research Computing, University College London(伦敦大学学院先进计算中心)

AI总结 提出基于大语言模型的自动化工作流AgentRivet,从论文提取物理分析信息并生成缺失的Rivet例程,经代码和物理审查实现质量控制,在ATLAS和CMS测量中生成语法错误少、物理保真度合理的例程。

详情
AI中文摘要

粒子物理对撞机实验将Rivet例程作为模型无关测量分析保存策略的一部分。Rivet是一个C++工具包,允许将新的理论模型与测量结果进行比较,从而帮助开发和调整蒙特卡洛事件生成器,以及搜索标准模型之外的新物理。然而,已知分析覆盖不完整,只有39%的测量具有文档化且公开可用的Rivet例程。在本文中,我们设计并实现了一个基于大语言模型的自动化工作流,旨在提供缺失的例程。这个多步骤工作流称为AgentRivet,从已发表的论文中提取物理分析信息,并编写缺失的Rivet例程,中间代码和物理审查作为自主质量控制的一部分。我们报告了使用OpenAI、Anthropic和Google提供的商业大语言模型,针对ATLAS和CMS实验的两个近期测量所获得的结果。我们发现AgentRivet生成了语法错误很少的合格Rivet例程。例程的物理保真度合理,并遵循相关出版物中的解释。然而,物理实现问题确实出现,并使用AgentRivet产生的产物进行了调查。大多数物理实现问题源于给定出版物中微妙但模糊的定义,尽管有些模型即使在给出明确定义时也难以实现复杂的可观测量。

英文摘要

Particle physics collider experiments provide Rivet routines as part of the analysis preservation strategy for model-independent measurements. Rivet is a C++ toolkit that allows new theoretical models to be compared with the measurements, thus aiding the development and tuning of Monte Carlo event generators as well as searches for physics beyond the Standard Model. However, analysis coverage is known to be incomplete, with only 39% of measurements having documented and publicly available Rivet routines. In this article, we design and implement an automated workflow based on Large Language Models with the goal of providing the missing routines. This multi-step workflow, referred to as AgentRivet, extracts the physics analysis information from published papers and writes the missing Rivet routines, with intermediate code and physics reviews as part of autonomous quality control. We report the results obtained using commercial Large Language Models, provided by OpenAI, Anthropic, and Google, for two recent measurements from the ATLAS and CMS experiments. We find that AgentRivet produces competent Rivet routines with few syntax errors. The physics fidelity of the routines is reasonable and follows the explanations given in the relevant publications. Nevertheless, physics-implementation issues do arise and are investigated using the artefacts produced by AgentRivet. The majority of physics-implementation issues arise from subtle but ambiguous definitions in the given publication, although some models struggle to implement complex observables even when clear definitions are given.

2606.13454 2026-06-12 physics.optics cond-mat.dis-nn cs.ET cs.LG 新提交

Optical Implementation of Equilibrium Propagation Using Spatial Photonic Ising Machines

基于空间光子伊辛机的平衡传播光学实现

Dimitri Vanden Abeele, Daniele Veraldi, Davide Pierangeli, Claudio Conti, Serge Massar

发表机构 * Laboratoire d’Information Quantique, Université Libre de Bruxelles (ULB)(量子信息实验室,布鲁塞尔自由大学) Dipartimento di Fisica, Sapienza Università di Roma(物理学系,萨皮恩扎罗马大学)

AI总结 提出利用空间光子伊辛机光学实现平衡传播,通过规范变换方法编码神经元状态和可训练模式,在Wine和MNIST数据集上验证了能效物理实现的可行性。

详情
AI中文摘要

平衡传播为训练基于能量的网络提供了一种传统机器学习的引人注目的替代方案。在这里,我们展示了使用空间光子伊辛机(SPIM)的平衡传播(EP)的混合光学-数字实现。SPIM利用规范变换方法,通过空间光调制器将连续神经元状态和秩1二进制可训练模式光学编码为相位调制,并使用有限差分方案实现推理。实验系统在Wine分类数据集上进行了评估。该方法的潜力,包括使用连续耦合和结构化耦合矩阵,在更复杂的MNIST数据集上通过数值评估。我们的工作为平衡传播的节能物理实现提供了一条具体路径。

英文摘要

Equilibrium Propagation offers a compelling alternative to traditional machine learning for training energy-based networks. Here we demonstrate a hybrid optical-digital implementation of EP using a Spatial Photonic Ising Machine (SPIM). The SPIM exploits the gauge transformation method to optically encode both continuous neuron states and rank-1 binary trainable patterns as phase modulations via a spatial light modulator, with inference realized using a finite difference scheme. The experimental system is evaluated on the Wine classification dataset. The potential of this approach, including the use of continuous couplings and structured coupling matrices, is evaluated numerically on the more complex MNIST dataset. Our work provides a concrete pathway toward energy-efficient physical implementations of Equilibrium Propagation.

2606.13045 2026-06-12 cond-mat.dis-nn cs.LG 新提交

A solvable model for unsupervised federated learning

无监督联邦学习的一个可解模型

Giovanni Catania, Aurélien Decelle, Gianluca Manzan, Beatriz Seoane, Daniele Tantari

发表机构 * Institute for Cross-disciplinary Physics and Complex Systems IFISC (CSIC-UIB)(跨学科物理与复杂系统研究所(IFISC,CSIC-UIB)) Departamento de Física Teórica, Universidad Complutense de Madrid(马德里康普顿斯大学理论物理系) Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid(马德里理工大学工业工程师学院) GISC - Grupo Interdisciplinar de Sistemas Complejos(跨学科复杂系统小组) Inria Saclay - Tau team(Inria萨克雷 - Tau团队) Department of Mathematics, University of Bologna(博洛尼亚大学数学系)

AI总结 提出一个理论框架,通过教师-多学生交互场景分析联邦学习,证明学生间交互能系统提升学习性能,并推导最优贝叶斯条件,映射到受限玻尔兹曼机。

详情
AI中文摘要

我们引入了一个理论框架,用于在生成式设置中分析联邦学习,通过教师-多学生交互场景,其中每个学生接收不同的数据实现,要么通过不同的噪声破坏,要么通过访问不同的子集,可能大小不同。使用平衡无序系统的理论工具,我们解析地表明学生间的交互系统地提升了学习性能:高噪声学生需要更少的样本来恢复潜在模式,而低噪声学生与真实信号的重叠更大。我们推导了教师恢复的最优贝叶斯条件,作为样本复杂度、噪声水平和交互强度的函数,并通过数值模拟验证了这些预测。得到的动力学可以映射到具有结构化隐藏层的受限玻尔兹曼机中的平衡采样,从而为交互如何改进分布式生成建模提供了原则性的理论理解。

英文摘要

We introduce a theoretical framework for analyzing federated learning in a generative setting through a scenario with one teacher and multiple interacting students, in which each student receives a distinct realization of the data, either through a different noise corruption or by accessing a different subset, possibly of varying size. Using theoretical tools from equilibrium disordered systems, we analytically show that interactions among students systematically enhance learning performance: highly noisy students require fewer samples to recover the underlying pattern, while low-noise students achieve a larger overlap with the ground-truth signal. We derive the optimal Bayesian conditions for teacher recovery as functions of the sample complexity, noise level, and interaction strength, and validate these predictions through numerical simulations. The resulting dynamics can be mapped onto equilibrium sampling in a Restricted Boltzmann Machine with a structured hidden layer, providing a principled theoretical understanding of how interactions improve distributed generative modeling.

2606.11930 2026-06-12 cs.HC cs.AI cs.CV 新提交

Frozen Multimodal Embeddings for AI-Assisted Interview Assessment of Personality and Cognitive Ability

冻结多模态嵌入用于AI辅助面试中的个性与认知能力评估

Kuo-En Hung, Hung-Yue Suen, Shih-Ching Yeh, Hsiang-Wen Wang

发表机构 * Technology Application and Human Resource Development, National Taiwan Normal University(国立台湾师范大学科技应用与人力资源发展学系) Computer Science and Information Engineering, National Central University(台湾国立中央大学计算机科学与资讯工程系) Institute of Photonic System, National Yang Ming Chiao Tung University(台湾阳明交通大学光电系统研究所)

AI总结 针对异步视频面试中标注数据有限的高维多模态学习问题,提出使用冻结多模态编码器(CLIP、Whisper、RoBERTa等)结合低容量下游模型,在个性预测任务上实现MSE降低19.1%,并发现认知能力预测中存在数据集捷径。

详情
Comments
9 pages, 1 figure, 5 tables
AI中文摘要

从异步视频面试(AVI)中预测心理特质是一个具有挑战性的多模态学习问题,因为标注数据集有限,而每个回答包含高维的视觉、声学和语言信号。本文介绍了我们针对ACM多媒体AVI挑战2026的解决方案,该挑战评估两个任务:Track 1从与个性相关的面试回答中预测自我报告的HEXACO个性特质,Track 2从结构化AVI回答中对认知能力水平进行分类。我们将该问题视为小样本表示学习任务。我们不微调大型预训练模型,而是使用冻结的多模态编码器,包括用于视觉特征的CLIP、用于声学特征和转录的Whisper,以及用于文本表示的RoBERTa、E5和DeBERTaV3,随后使用低容量下游模型。对于Track 1,我们的特质特定回归和晚期融合系统实现了平均验证MSE为0.2696,优于官方基线0.3334。消融结果显示,从全局模型(0.3189)到逐特质建模(0.2871)再到逐特质晚期融合(0.2696)的三步改进,相对于官方基线MSE相对降低了19.1%。对于Track 2,一个紧凑的主题属性基线达到了0.5781的准确率,而我们的多模态集成达到了0.5313,两者均高于官方基线0.4062。我们将这一结果解释为验证分割中可能存在主题属性捷径的证据,而非从AVI内容中进行的稳健认知推理。总体而言,我们的发现表明,基于AVI的心理评估受益于特质特定的多模态建模,但认知能力预测需要仔细控制数据集捷径。

英文摘要

Predicting psychological traits from asynchronous video interviews (AVIs) is a challenging problem in AI-assisted interview assessment because labeled datasets are limited while each response contains high-dimensional visual, acoustic, and verbal signals. This paper presents our solution for the ACM Multimedia AVI Challenge 2026, which evaluates two tasks: Track 1 predicts self-reported HEXACO personality traits from personality-related interview responses, and Track 2 classifies cognitive ability levels from structured AVI responses. We treat the problem as a small-sample representation learning task. Instead of fine-tuning large pretrained models, we use frozen multimodal encoders, including CLIP for visual features, Whisper for acoustic features and transcripts, and RoBERTa, E5, and DeBERTaV3 for textual representations, followed by low-capacity downstream models. For Track 1, our trait-specific regression and late-fusion system achieves an average validation MSE of 0.2696, improving over the official baseline of 0.3334. Ablation results show a three-step improvement from a global model (0.3189), to per-trait modeling (0.2871), to per-trait late fusion (0.2696), corresponding to a 19.1% relative MSE reduction over the official baseline. For Track 2, a compact subject-attribute baseline reaches 0.5781 accuracy, while our multimodal ensemble reaches 0.5313, both above the official baseline of 0.4062. We interpret this result as evidence of possible subject-attribute shortcuts in the validation split rather than robust cognitive inference from AVI content. Overall, our findings suggest that AVI-based psychological assessment benefits from trait-specific multimodal modeling, but cognitive ability prediction requires careful control of dataset shortcuts.
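
The 19.1% figure follows from the MSE values quoted in the abstract as a relative reduction against the official baseline: (0.3334 - 0.2696) / 0.3334. The short check below reproduces that arithmetic for each ablation step. The helper name and dictionary keys are illustrative only; the numbers are the ones reported above.

```python
def relative_reduction(baseline, value):
    """Relative reduction of `value` against `baseline`, as a fraction."""
    return (baseline - value) / baseline

official_baseline = 0.3334           # official Track 1 baseline MSE
ablation_mse = {
    "global model": 0.3189,
    "per-trait modeling": 0.2871,
    "per-trait late fusion": 0.2696,
}

# Relative MSE reduction of each ablation step over the official baseline.
reductions = {
    name: relative_reduction(official_baseline, mse)
    for name, mse in ablation_mse.items()
}
final_percent = round(100 * reductions["per-trait late fusion"], 1)
```

The same helper makes clear that every ablation step is measured against the official baseline and not against the previous step.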

2606.11654 2026-06-12 cs.IR cs.CL cs.HC cs.SI 新提交

The Long Tail, Not the Front Page: Cold-Start Prediction of Crowd Highlight Salience

The Long Tail Rather Than the Front Page: Cold-Start Prediction of Crowd Highlight Salience

Kazuki Nakayashiki, Keisuke Watanabe

Affiliations * Glasp Inc.

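AI summary This paper studies how to predict a document's crowd highlight salience from its text when no reader marks exist yet. It proposes a logistic ranking model over sentence embeddings and positional/contextual features that improves average precision by 0.044 over the lead (position) baseline, and shows that this edge comes from learning from real reader marks.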

Details
Comments
10 pages, 3 figures, 4 tables
AI Chinese abstract

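The most useful signal of a social highlighting tool, namely which passages a crowd of readers marks, exists only for documents people have already read. Can a document's aggregate crowd salience be predicted from its text before marks accumulate? Prior work on this data found that zero-shot language models recover highlight locations worse than a trivial lead (position) baseline, so we ask whether a model trained on the highlight corpus can beat that baseline. Using a pre-registered ladder of models and a by-document cluster bootstrap, we find a small but robust edge: a logistic ranker over sentence embeddings and positional/contextual features beats the lead baseline by +0.044 average precision (95% confidence interval [+0.029, +0.058]; it clears the pre-registered margin delta=0.03 in 97% of resamples and is stable across pipeline re-runs). Two unsupervised extractive baselines (centroid, LexRank-style centrality) both lose to the lead baseline, and the trained model beats them by +0.108, so the edge is not recovered by generic unsupervised proxies; it reflects learning from real reader marks. In product terms, precision@3 rises from 0.25 to 0.39 (a 55% relative gain), and the model beats the lead baseline on 69% of documents. An ablation attributes the edge to the raw embedding (+0.014) and training augmentation (+0.010), each with a positive confidence interval. The edge is not a temporal-generalization failure, and we find no evidence that content drift or near-duplicate leakage explains it. A standardized regression shows that the edge is governed mainly by document popularity (the lower the popularity, the larger the edge) and by label reliability. It nearly vanishes only on the most popular content; there, it is the lead baseline that strengthens, not the model that weakens. Because our evaluation conditions on documents that eventually accumulated readers, these results are a retrospective cold-start simulation.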

English abstract

A social highlighter's most useful signal -- which passages a crowd of readers marks -- exists only for documents people have already read. Can the aggregate crowd salience of a document be predicted from its text before its marks accumulate? Prior work on this data found that zero-shot language models recover highlight locations worse than a trivial lead (position) baseline, so we ask whether a model trained on the highlight corpus can beat that baseline. Using a pre-registered ladder of models and a by-document cluster bootstrap, we find a small but robust edge: a logistic ranker over sentence embeddings and positional/contextual features beats the lead baseline by +0.044 average precision (95% CI [+0.029, +0.058]; clears a pre-registered margin delta=0.03 in 97% of resamples, and stable across pipeline re-runs). Two unsupervised extractive baselines (centroid, LexRank-style centrality) lose to lead, and the trained model beats them by +0.108, so the edge is not recovered by generic unsupervised proxies -- it reflects learning from real reader marks. In product terms, precision@3 rises from 0.25 to 0.39 (+55% relative) and the model beats lead on 69% of documents. An ablation attributes the edge to the raw embedding (+0.014) and training augmentation (+0.010), each with a positive CI. The edge is not a temporal-generalization failure, and we find no evidence that content drift or near-duplicate leakage explains it. A standardized regression shows the advantage is governed mainly by document popularity (lower popularity, larger edge) and by label reliability. It nearly vanishes only on the most popular content; there it is the lead baseline that strengthens, not the model that weakens. Because our evaluation conditions on documents that eventually accumulated readers, these results are a retrospective cold-start simulation.
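
The by-document cluster bootstrap mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes average precision is computed per document and then averaged, that the lead baseline scores sentences by negative position, and that the interval is a simple percentile interval. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def average_precision(scores, labels):
    """Average precision of one document's sentence ranking.
    `labels` are 1 for highlighted sentences, 0 otherwise."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def lead_scores(n_sentences):
    """Assumed lead baseline: earlier sentences receive higher scores."""
    return [-i for i in range(n_sentences)]


def bootstrap_means(per_doc_diffs, n_resamples=2000, seed=0):
    """Resample whole documents with replacement, so sentences from the
    same document always stay together (the 'cluster' in cluster bootstrap)."""
    rng = random.Random(seed)
    n = len(per_doc_diffs)
    return sorted(mean(rng.choices(per_doc_diffs, k=n)) for _ in range(n_resamples))


def percentile_ci(sorted_means, alpha=0.05):
    """Percentile confidence interval from sorted bootstrap means."""
    k = int(alpha / 2 * len(sorted_means))
    return sorted_means[k], sorted_means[len(sorted_means) - 1 - k]


def share_above(sorted_means, delta):
    """Fraction of resamples whose mean difference exceeds the margin."""
    return sum(m > delta for m in sorted_means) / len(sorted_means)


if __name__ == "__main__":
    # Toy document: four sentences, the last two highlighted by readers.
    labels = [0, 0, 1, 1]
    model = [0.1, 0.2, 0.9, 0.8]  # hypothetical model scores
    diff = average_precision(model, labels) - average_precision(lead_scores(4), labels)
    diffs = [diff, 0.02, -0.01, 0.05, 0.10]  # made-up per-document AP differences
    means = bootstrap_means(diffs)
    print(percentile_ci(means), share_above(means, 0.03))
```

Resampling documents rather than individual sentences keeps within-document correlation intact, which is why the document is a natural unit here; `share_above` corresponds to the kind of "clears the margin in X% of resamples" statement made in the abstract.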

2606.11238 2026-06-12 q-fin.GN cs.AI New submission

Artificial Intelligence in Ship Finance: Applications, Opportunities, and a Case Study in AI-Augmented Loan Origination

Applications of Artificial Intelligence in Ship Finance: Opportunities and a Case Study in AI-Augmented Loan Origination

Lasse Dierich, Orestis Schinas

Affiliations * ShipFinance.ai HHX.blue GmbH; Technical University of Munich; University of the Aegean

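AI summary This paper examines applications of AI in ship finance and proposes a modular LLM-based architecture for document comprehension, information extraction, and workflow automation to support the loan application process.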

Details
Comments
9 pages, 1 figure
AI Chinese abstract

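Ship finance is a data-intensive and document-heavy segment of asset-based lending that requires integrating financial, technical, contractual, and regulatory information from heterogeneous and largely unstructured sources. Increasingly strict environmental regulation and ESG reporting requirements add further complexity to underwriting and loan-origination processes. Recent advances in artificial intelligence (AI), particularly large language models (LLMs), create new opportunities for processing and analysing such information. This paper reviews potential applications of AI in ship finance, with a particular focus on LLM-based systems for document comprehension, information extraction, and workflow automation. We present ShipFinance.ai, a modular agentic architecture to support loan application workflows in ship finance. The proposed system combines an LLM-based extraction module, financial analysis components, external maritime data services, and a controlled document-generation module with a chatbot interface to support the preparation of standardized financing applications. The paper discusses the key challenges of using such models in production. We argue that AI-assisted systems can support maritime finance professionals in managing increasingly complex information and reporting requirements.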

English abstract

Ship finance is a data-intensive and document-heavy segment of asset-based lending, requiring the integration of financial, technical, contractual, and regulatory information from heterogeneous and largely unstructured sources. Increasing environmental regulation and ESG reporting requirements are adding further complexity to underwriting and loan-origination processes. Recent advances in artificial intelligence (AI), particularly large language models (LLMs), create new opportunities for processing and analysing such information. This paper reviews potential applications of AI in ship finance, with a particular focus on LLM-based systems for document comprehension, information extraction, and workflow automation. We present ShipFinance.ai, a modular agentic architecture to support loan application workflows in ship finance. The proposed system combines an LLM-based extraction module, financial analysis components, external maritime data services, and a controlled document-generation module with a chatbot interface to support the preparation of standardized financing applications. The paper discusses the key challenges for using such models in production. We argue that AI-assisted systems can support maritime finance professionals in managing increasingly complex information and reporting requirements.