arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

检索范围排序方式

检索时间范围

重置

HOT 人工智能、机器人等 9

cs.AI 人工智能 cs.CV 计算机视觉 cs.CL 自然语言处理 cs.RO 机器人 cs.LG 机器学习 cs.SD 声音 cs.ET 新兴技术 eess.AS 音频语音 eess.IV 图像视频

CS 计算机 41

cs 计算机 cs.AI 人工智能 cs.AR 硬件架构 cs.CC 计算复杂性 cs.CE 计算工程 cs.CG 计算几何 cs.CL 自然语言处理 cs.CR 密码安全 cs.CV 计算机视觉 cs.CY 计算机与社会 cs.DB 数据库 cs.DC 分布式计算 cs.DL 数字图书馆 cs.DM 离散数学 cs.DS 数据结构 cs.ET 新兴技术 cs.FL 形式语言 cs.GL 综述文献 cs.GR 图形学 cs.GT 博弈论 cs.HC 人机交互 cs.IR 信息检索 cs.IT 信息论 cs.LG 机器学习 cs.LO 计算机逻辑 cs.MA 多智能体 cs.MM 多媒体 cs.MS 数学软件 cs.NA 数值分析 cs.NE 神经进化 cs.NI 网络架构 cs.OH 其他计算机 cs.OS 操作系统 cs.PF 性能 cs.PL 编程语言 cs.RO 机器人 cs.SC 符号计算 cs.SD 声音 cs.SE 软件工程 cs.SI 社会信息网络 cs.SY 系统控制

ECON 经济学 4

econ 经济学 econ.EM 计量经济 econ.GN 一般经济 econ.TH 理论经济

EESS 电气与系统 5

eess 电气与系统 eess.AS 音频语音 eess.IV 图像视频 eess.SP 信号处理 eess.SY 系统控制

MATH 数学 33

math 数学 math.AC 交换代数 math.AG 代数几何 math.AP 偏微分方程 math.AT 代数拓扑 math.CA 经典分析 math.CO 组合数学 math.CT 范畴论 math.CV 复变函数 math.DG 微分几何 math.DS 动力系统 math.FA 泛函分析 math.GM 一般数学 math.GN 一般拓扑 math.GR 群论 math.GT 几何拓扑 math.HO 历史综述 math.IT 信息论 math.KT K理论 math.LO 逻辑 math.MG 度量几何 math.MP 数学物理 math.NA 数值分析 math.NT 数论 math.OA 算子代数 math.OC 优化控制 math.PR 概率 math.QA 量子代数 math.RA 环与代数 math.RT 表示论 math.SG 辛几何 math.SP 谱理论 math.ST 统计理论

PHYSICS 物理 55

astro-ph 天体物理 astro-ph.CO 宇宙学 astro-ph.EP 地球行星 astro-ph.GA 星系物理 astro-ph.HE 高能天体 astro-ph.IM 天文仪器 astro-ph.SR 太阳恒星 cond-mat 凝聚态 cond-mat.dis-nn 无序神经 cond-mat.mes-hall 介观纳米 cond-mat.mtrl-sci 材料科学 cond-mat.other 其他凝聚态 cond-mat.quant-gas 量子气体 cond-mat.soft 软凝聚态 cond-mat.stat-mech 统计力学 cond-mat.str-el 强关联电子 cond-mat.supr-con 超导 gr-qc 广义相对论 hep-ex 高能实验 hep-lat 格点高能 hep-ph 高能唯象 hep-th 高能理论 math-ph 数学物理 nlin 非线性科学 nlin.AO 自适应系统 nlin.CD 混沌动力学 nlin.CG 胞自动机 nlin.PS 斑图孤子 nlin.SI 可积系统 nucl-ex 核物理实验 nucl-th 核物理理论 physics 物理 physics.acc-ph 加速器物理 physics.ao-ph 大气海洋 physics.app-ph 应用物理 physics.atm-clus 原子分子团簇 physics.atom-ph 原子物理 physics.bio-ph 生物物理 physics.chem-ph 化学物理 physics.class-ph 经典物理 physics.comp-ph 计算物理 physics.data-an 数据分析 physics.ed-ph 物理教育 physics.flu-dyn 流体动力学 physics.gen-ph 普通物理 physics.geo-ph 地球物理 physics.hist-ph 物理史哲 physics.ins-det 仪器探测 physics.med-ph 医学物理 physics.optics 光学 physics.plasm-ph 等离子体 physics.pop-ph 科普物理 physics.soc-ph 物理与社会 physics.space-ph 空间物理 quant-ph 量子物理

Q-BIO 定量生物 11

q-bio 定量生物 q-bio.BM 生物分子 q-bio.CB 细胞行为 q-bio.GN 基因组学 q-bio.MN 分子网络 q-bio.NC 神经认知 q-bio.OT 其他定量生物 q-bio.PE 种群进化 q-bio.QM 定量方法 q-bio.SC 亚细胞过程 q-bio.TO 组织器官

Q-FIN 定量金融 10

q-fin 定量金融 q-fin.CP 计算金融 q-fin.EC 经济学 q-fin.GN 一般金融 q-fin.MF 数学金融 q-fin.PM 投资组合 q-fin.PR 证券定价 q-fin.RM 风险管理 q-fin.ST 统计金融 q-fin.TR 交易微观结构

STAT 统计 7

stat 统计 stat.AP 统计应用 stat.CO 统计计算 stat.ME 统计方法 stat.ML 机器学习 stat.OT 其他统计 stat.TH 统计理论

2606.20465 2026-06-19 cs.CY cs.SI 新提交

Farmer Connect: Improving Farmers' Access to Produce Markets

Farmer Connect：改善农民进入农产品市场的途径

Micheal Amanya, Darius Kainamura, Christine Namatovu, Lailah Kobugabe, Solomon Buwule Fortune, Adones Rukundo

AI总结针对乌干达小农户面临的市场准入难、议价能力弱等问题，提出基于合作社的数字平台Farmer Connect，通过移动优先架构和云后端支持群体管理、市场协调和收益透明，实现约85%的用户需求。

详情

AI中文摘要

乌干达的小农户玉米种植者仍然面临有限的市场准入、薄弱的议价能力、低价格透明度以及对中间商的严重依赖。这些问题因农产品协调不善、付款延迟以及合作社交易可见性差而加剧。本文介绍了Farmer Connect，一个基于合作社的数字平台，旨在支持农民群体之间的农产品管理、市场协调和透明的收益跟踪。该系统支持四种用户角色：管理员、监督员、农民和客户。其核心功能包括农民群体管理、贡献记录和验证、市场列表、订单处理、基于先进先出的农产品分配、收益可见性、移动货币支付支持和通知服务。该平台采用移动优先架构，配备基于云的后端服务和行政网页仪表板。功能实现表明，该系统能够支持基于群体的玉米营销和合作社协调所需的主要工作流程，约85%的已识别用户需求得到实现。研究表明，以合作社为中心的数字平台可以为改善小农户的透明度、协调性和买家准入提供实用框架。

英文摘要

Smallholder maize farmers in Uganda continue to face limited market access, weak bargaining power, low price transparency, and heavy reliance on intermediaries. These challenges are compounded by poor produce coordination, delayed payments, and weak visibility into cooperative transactions. This paper presents Farmer Connect, a cooperative-based digital platform designed to support produce management, marketplace coordination, and transparent earnings tracking among farmer groups. The system supports four user roles: administrators, supervisors, farmers, and customers. Its core functions include farmer group management, contribution recording and verification, marketplace listing, order processing, First In First Out based produce allocation, earnings visibility, mobile money payment support, and notification services. The platform was implemented using a mobile-first architecture with cloud-based backend services and an administrative web dashboard. Functional implementation showed that the system was able to support the major workflows required for group-based maize marketing and cooperative coordination, with approximately 85% of identified user requirements implemented. The study shows that cooperative-centered digital platforms can provide a practical framework for improving transparency, coordination, and buyer access for smallholder farmers.

URL PDF HTML ☆

赞 0 踩 0

2606.20453 2026-06-19 cs.CY cs.HC 新提交

Directors Duties in the Age of Agentic Artificial Intelligence

代理人工智能时代的董事职责

Deirdre Ahern

AI总结探讨董事在采纳代理AI时如何平衡股东与员工利益，分析四种公司治理模型，主张通过更广泛的法律视角促进员工福利。

Journal ref Cambridge Forum on AI: Law and Governance 2, e7 (2026)

详情

DOI: 10.1017/cfl.2026.10049

AI中文摘要

随着董事会采用包括代理AI在内的人工智能以提高运营效率，这为利润最大化提供了新机会。AI的采用越来越与员工角色替代相关联，在公司中，员工作为利益相关者的利益需要探讨。一个新颖的问题是，在AI崛起的时代，当AI在公司中的角色接近或超越人类员工时，AI是否应被赋予利益相关者地位。本文探讨了董事履行公司最佳利益职责时的四种公司目的模型：股东至上模型、开明股东价值模型、利益相关者友好模型和利益相关者价值模型，强调了董事在董事会围绕AI的决策中容纳员工利益的可用空间。结论是，鉴于董事在其最佳利益职责方面免受法律审查的程度，采取更广泛的法律视角来促进员工福利将有利于员工、董事和公司的利益。这将使董事与员工进行有意义的接触，并提供再培训机会以适应AI时代。

英文摘要

As boards engage with the adoption of Artificial Intelligence including agentic AI to drive operational efficiencies, this presents new opportunities for profit maximisation. AI adoption is increasingly identified with employee role displacement and in companies, and the interests of employees as stakeholders require exploration. A novel question posed is whether in an age of AI ascendancy AI may warrant being given stakeholder status as its role in the company approximates or eclipses that of human employees. The article probes four distinct models of corporate purpose within the duty on directors to act in the best interests of the company, the shareholder primacy model, the Enlightened Shareholder value model, the stakeholder friendly model, and the stakeholder value model, highlighting the available scope for directors to accommodate the interests of employees around AI adoption in decision-making by boards around AI. It is concluded that given the degree to which directors are insulated from legal scrutiny in relation to their best interests duty, adopting a wider law in context approach to promote employee welfare would serve the interests of employees, directors and companies alike. This would see directors engaging meaningfully with employees and providing opportunities for reskilling to adapt to the age of AI.

URL PDF HTML ☆

赞 0 踩 0

2606.20102 2026-06-19 cs.CY cs.CR 新提交

Artificial Intelligence as Game Changer in Cybersecurity: What We Learned in 2025-2026, and how this is relevant for Africa

人工智能作为网络安全游戏规则改变者：2025-2026年我们学到的，以及这对非洲的意义

Mikael Alemu Gorsky

AI总结本文通过2025-2026年两个事件论证前沿语言模型已成为网络作战决定性工具，而非洲在模型构建、运营和获取上被完全排除，面临技能、算力和投资三重赤字，并遭受AI欺诈攻击，建议在6-12个月内通过威胁情报共享、治理采纳和伙伴关系应对。

Comments International Conference on Cybersecurity in the Era of Digital Transformation and Artificial Intelligence

详情

AI中文摘要

在2025年和2026年，两个事件解决了此前仅是推测的问题。第一个事件中，一个大型语言模型独立执行了国家支持的网络间谍活动的大部分任务，人类操作员仅在少数决策点介入。第二个事件中，最强大的网络相关模型被置于一个受控访问计划之下，仅限于经过审查的美国科技公司、盟国政府和欧洲标准机构；该范围不包括任何非洲政府、运营商或大学。这两个事件共同确立了本文的论点：前沿语言模型已成为网络作战的决定性工具，而该工具在一个小圈子内建造、拥有和配给，非洲被排除在外。本文记录了非洲在每一方面的排斥。该大陆不构建前沿模型，尚无法运营它们，并且目前无法获得最强大的模型。运营赤字沿着三个轴心展开：技能人才、计算和电力、投资，每个都根据当前数据衡量；与此同时，针对非洲移动货币系统（该大陆领先的数字经济部分）的AI欺诈攻击已经在增加。由此产生两个约束：开发者对前沿模型的把关（非洲决策无法打开），以及对基础设施供应商的选择性依赖（现已陷入地缘政治限制）。由于可比较但不受把关的模型预计在6至12个月内扩散，本文主张通过威胁情报共享、治理采纳和伙伴关系，在非洲人自主条件下，在该窗口内采取应对措施。

英文摘要

In 2025 and 2026, two events settled questions that had until then been speculative. In the first, a large language model executed the great majority of a state-aligned cyber-espionage campaign on its own, with human operators intervening at only a few decision points. In the second, the most capable cyber-relevant model was placed under a controlled-access program limited to a vetted set of United States technology firms, allied governments, and European standards bodies; that perimeter included no African government, operator, or university. Together the two events establish the argument of this paper: frontier language models have become a decisive instrument of cyber operations, and that instrument is built, owned, and rationed within a small circle from which Africa is absent. The paper documents Africa's exclusion on every count. The continent does not build frontier models, cannot yet operate them, and cannot, for now, obtain the most capable ones. The operational deficit is set out along three axes, skilled people, compute and electrical power, and investment, each measured against current figures; meanwhile AI-enabled fraud is already mounting against African mobile-money systems, the part of the digital economy the continent leads. Two constraints follow: the gating of frontier models by their developers, which no African decision can open, and a chosen dependence on infrastructure vendors now caught in geopolitical restriction. Because comparable but ungated models are forecast to spread within six to twelve months, the paper argues for a response that operates inside that window through threat-intelligence sharing, governance adoption, and partnership, undertaken by Africans on their own terms.

URL PDF HTML ☆

赞 0 踩 0

2606.19975 2026-06-19 cs.CY cs.AI 新提交

The Algorithmic-Human Manager: AI, Apps, and Workers in the Indian Gig Economy

算法-人类管理者：印度零工经济中的AI、应用程序与工人

Omir Kumar, Krishnan Narayanan

AI总结本文研究AI和数字技术对印度蓝领零工经济中算法管理的影响，发现其虽扩大就业机会但引发公平性、透明度和工人尊严问题，提出算法-人类管理者混合治理模型。

Comments Published by the Centre for Responsible AI (CeRAI) at IIT Madras

详情

AI中文摘要

本文考察了人工智能和数字技术对印度蓝领零工经济的影响，重点关注算法管理——即在基于位置的服务（如拼车和配送）中使用自动化系统来分配、监控和评估工作。采用社会正义框架和混合方法（包括对16名零工工人和21名关键利益相关者的访谈），研究揭示了一个双重现实：虽然AI驱动的系统扩大了工作机会并产生了运营效率，但它们同时引入了与公平、透明度和工人尊严相关的重大挑战。关键发现表明，算法系统设计上不透明，产生不公平的结果，并且其结构不能为额外劳动提供相应报酬。研究倡导一种务实的混合治理模型——算法-人类管理者框架，其中技术效率和人类问责制共同运作而非对立。研究结果对政策制定者、平台公司以及致力于为印度和全球南方的零工经济设计公平AI治理框架的民间社会组织具有启示意义。

英文摘要

This paper examines the impact of artificial intelligence and digital technologies on the blue-collar gig economy in India, focusing on algorithmic management. This paper examines the impact of artificial intelligence and digital technologies on the blue collar gig economy in India, focusing on algorithmic management he use of automated systems to allocate, monitor, and evaluate work in location-based services such as ride sharing and delivery. Using a social justice framework and a mixed-methods approach comprising interviews with 16 gig workers and 21 key stakeholders, the study uncovers a dual reality: while AI-powered systems expand access to work and generate operational efficiencies, they simultaneously introduce significant challenges related to fairness, transparency, and worker dignity. Key findings reveal that algorithmic systems are opaque by design, produce inequitable outcomes, and are not structured to reward additional labour with proportionate pay. The study advocates for a pragmatic hybrid governance model an Algorithmic Human Manager framework in which technological efficiency and human accountability operate together rather than in opposition. The findings carry implications for policymakers, platform companies, and civil society organizations working to design equitable AI governance frameworks for the gig economy in India and across the Global South.

URL PDF HTML ☆

赞 0 踩 0

2606.19957 2026-06-19 cs.CY 新提交

Modest, artistic, and radical solutions to the environmental impact of image-generating machine learning

图像生成机器学习的环境影响：温和、艺术与激进的解决方案

Laura U. Marks, Jess MacCormack, Kehui Li

AI总结针对图像生成ML的高能耗问题，从计算机工程、媒体研究和艺术角度探索非精确计算、小模型、低精度硬件等解决方案，并提出真实成本核算。

Comments Paper in Proceedings of LIMITS 2026: 12th Workshop on Computing within Limits, 2026-06-23-25, Online

详情

AI中文摘要

机器学习常被宣称能提高信息通信技术的效率，但这种微小收益被数据中心和ML就绪设备的巨大碳、水和土地足迹所淹没。我们调查了ML应用在训练和推理中的电力消耗，重点关注电力密集型的图像生成。我们的团队由一名计算机工程师、一名媒体学者和一名艺术家组成，探索了包括非精确计算、微型语言模型、低精度硬件架构、有限容量硬件以及在设计阶段预测和缓解能源需求等解决方案。我们将概述正在进行的、使用非抓取数据的道德且美学上精致的微型图像生成器的工作。着眼于经济背景，我们将提出机器学习环境影响的真实成本核算，并表明效率标准是由信息通信技术的股东资本主义框架驱动的。

英文摘要

Machine learning is often touted to improve the efficiency of ICT, but that small gain is overwhelmed by the enormous carbon, water, and land footprints of data centers and ML-ready devices. We survey the electricity consumption of ML applications in training and inference, focusing on electricity-intensive image generation. Our team of a computer engineer, a media scholar, and an artist explore solutions including inexact computing; tiny language models; low-precision hardware architectures; hardware with limited capacity; and anticipating and mitigating energy demands at the design phase. We will sketch our work in progress of an ethical and aesthetically sophisticated tiny image generator using non-scraped data. Looking to the economic context, we will propose a true-cost accounting for the environmental impact of machine learning and suggest that the criterion of efficiency is driven by the shareholder-capitalist framing of ICT.

URL PDF HTML ☆

赞 0 踩 0

2606.19899 2026-06-19 cs.CY cs.AI 新提交

Measuring Biological Capabilities and Risks of AI Agents

测量AI代理的生物能力与风险

Patricia Paskov, Jeffrey Lee, Kyle Brady, Alyssa Worland

AI总结针对AI科学家等自主执行多步科学任务的代理系统，本文提出生物代理评估作为解释性工具，并基于实践经验给出定义、设计、运行、评分和记录评估的考量，以帮助决策者谨慎解读结果并指导投资。

详情

AI中文摘要

本文针对一个迅速出现的政策挑战：如何生成和解释关于AI科学家（即能够自主或协作执行多步科学任务的代理AI系统）的生物能力与风险的可信证据。随着这些系统进入真实研究流程，决策者越来越多地面临评估结果，而这些结果的含义取决于通常隐含或记录不足的底层设计选择。我们综合了关于AI驱动的生物风险的现有证据，并引入生物代理评估作为评估这些系统的一种有前景但需要谨慎解释的工具。我们的核心贡献是一套基于实践经验的考量——源自我们自己的评估——展示了围绕定义、设计、运行、评分和记录评估的选择如何实质性地塑造结果对风险意味着什么和不意味着什么。该分析旨在帮助政策制定者以适当的谨慎态度解读生物评估输出；引导公共和私人资助者向AI-生物学评估研究的高杠杆投资；并支持评估新兴AI系统的生物安全从业者。次要受众包括在前沿AI实验室、AI提供商、科学机构和第三方评估组织中设计或进行代理评估的研究人员。

英文摘要

This paper addresses a rapidly emerging policy challenge: how to generate and interpret credible evidence about the biological capabilities and risks of AI scientists, or agentic AI systems capable of autonomously or collaboratively performing multi-step scientific tasks. As these systems enter real research workflows, decision-makers increasingly face evaluation results whose meaning depends on underlying design choices that are often implicit or under-documented. We synthesize current evidence on AI-enabled biological risks and introduce biological agentic evaluations as a promising, but interpretation-sensitive, tool for assessing these systems. Our central contribution is a set of practical, experience-grounded considerations -- drawing from our own evaluations -- that show how choices around defining, designing, running, scoring, and documenting evaluations materially shape what results do and do not imply about risk. The analysis is intended to help policymakers interpret biological evaluation outputs with appropriate caution; guide public and private funders toward high-leverage investments in AI-biology evaluation research; and support biosecurity practitioners assessing emerging AI systems. A secondary audience includes researchers designing or conducting agentic evaluations within frontier AI labs, AI providers, scientific institutions, and third-party evaluation organizations.

URL PDF HTML ☆

赞 0 踩 0

2606.19890 2026-06-19 cs.CY 新提交

Open Weight AI Models Require Proportional Evaluation Approaches

开放权重AI模型需要比例评估方法

Patricia Paskov, Christopher Rodriguez, Sunishchal Dev, Stephen Casper

AI总结本文针对开放权重AI模型（OWMs）的独特风险因素，提出四种比例评估方法（PE1-PE4），并系统审查2025年至2026年4月发布的37个OWM系列，发现仅一个满足所有评估要求。

详情

AI中文摘要

开放权重AI模型（OWMs），即公开发布权重的模型，正在快速分发，并接近领先的封闭权重AI模型（CWMs）的性能水平。虽然OWMs带来了巨大的科学和经济利益，但它们的发布引入了独特的风险因素，而现有的评估实践（主要针对CWM部署设计）未能考虑这些因素。在本文中，我们认为这些风险因素需要不同的比例评估（PE）方法：在没有系统级保障的情况下进行评估（PE1），评估对消除模型级保障的修改的鲁棒性（PE2），测试选择性能力增强（PE3），以及代理最坏情况下的滥用（PE4）。我们系统审查了2025年至2026年4月期间发布的OWMs的当前评估实践，发现所审查的37个模型系列中只有一个满足PE1-4，大多数不满足任何一项。本文面向参与AI评估的政策制定者、资助者和研究人员。随着OWMs能力日益增强，其评估值得开发者、资助者和治理机构密切关注。

英文摘要

Open-weight AI models (OWMs), or models released with publicly-available weights, are distributing rapidly and approaching the performance levels of leading closed-weight AI models (CWMs). While OWMs offer substantial scientific and economic benefits, their release introduces distinct risk factors for which existing evaluation practices, largely designed for CWM deployment, fail to account. In this paper, we argue that these risk factors demand distinct proportional evaluation (PE) approaches: evaluating without system-level safeguards (PE1), assessing robustness to modifications that undo model-level safeguards (PE2), testing selective capability amplification (PE3), and proxying worst-case misuse (PE4). We systematically review current evaluation practices of OWMs released in 2025 through April 2026, finding that only one of the 37 families of models reviewed fulfills PE1-4 and most do not fulfill any. This paper targets policymakers, funders, and researchers involved in AI evaluation. As OWMs grow increasingly capable, their evaluation warrants close attention from developers, funders, and governance bodies alike.

URL PDF HTML ☆

赞 0 踩 0

2606.19816 2026-06-19 cs.CY 新提交

Challenges to Grassroots Organization Engagement with AI Policy

基层组织开展AI政策参与的挑战

Carter Buckner, Jennifer Mickel, Nandhini Swaminathan, William Agnew, Jacob Hobbs, Sarthak Arora, Michelle Lin, Yanan Long, B. V. Alaka

AI总结本文通过案例研究，探讨基层组织和边缘化社区在参与AI政策制定中面临的挑战，并提出基于参与式设计的建议。

Comments To appear at ACM FAccT 2026

详情

AI中文摘要

世界各地正在制定公共政策，以应对AI技术带来的隐私、经济、知识产权、能源及其他风险。公众参与作为问责和对齐机制，对治理至关重要。然而，对于缺乏广泛网络、游说能力及其他权力形式的公众群体来说，参与并影响政策制定可能具有挑战性。这一挑战对边缘化社区尤为严峻。本文通过我们组织将参与式设计（PD）原则引入美国AI政策制定的努力进行案例研究。我们描述了与多个美国政策机构的互动，以及为性少数群体参与式开发AI政策的过程。我们强调了与边缘化社区进行PD实践中的挑战，并提出了缓解这些挑战的建议。最后，我们为政策制定者及其他在边缘化社区工作的组织者提供了可行的建议。

英文摘要

Public policies are being developed around the world to address privacy, economic, intellectual property, energy, and other risks that AI technologies pose. Involvement from the general public is essential to governance as an accountability and alignment mechanism. However, participating in and impacting policymaking can be challenging for sections of the public that lack extensive networks, lobbying capabilities, and other forms of power. This challenge is especially acute for marginalized communities. In this paper, we present a case study of our organization's efforts to bring participatory design (PD) principles to AI policymaking in the US. We describe our engagements with several US policy bodies, and our participatory development of AI policy for queer people. We highlight challenges with PD practice with marginalized communities, and offer suggestions to alleviate them. We conclude with actionable recommendations for policymakers and other organizers working in marginalized communities.

URL PDF HTML ☆

赞 0 踩 0

2606.19794 2026-06-19 econ.GN cs.CY q-fin.EC 交叉投稿

Forecasting AI-Era Productivity: The Intellectually Converged Human Framework and a Missing Cognitive Mediator in Production Function Theory

预测AI时代的生产率：智力融合人类框架与生产函数理论中缺失的认知中介

Kwan Soo Shin, In Seok Kang

AI总结本文提出智力融合人类（ICH）框架，通过引入四维认知构念“融合能力”（C）作为AI与生产率之间的认知中介，解释了AI投资未能带来相应生产率增长的理论悖论，并基于20个OECD国家的数据分析验证了AI与C的交互作用对全要素生产率变异的解释力。

Comments 78 pages, 3 figures

详情

AI中文摘要

为什么大规模AI投资未能产生相应的生产率增长？我们认为这一悖论在理论上是生成的：主流生产函数框架通过将AI视为可分离的生产要素，而未建模AI产生生产性价值的认知中介，从而遇到了结构性边界。这导致投资倾向于部署，而生产率需要先发展我们称之为融合能力（C）的东西。我们提出了智力融合人类（ICH）框架，这是生产函数理论的第五阶段框架：H-hat = H[1 + phi(A,C)]，其中有效生产能力等于人力资本（H）乘以一个增强因子[1 + phi]，phi由AI利用强度（A）和融合能力（C）共同决定，C是一个四维认知构念，涵盖具身理解、元认知、时间整合和整合思维。生产函数Y = F(K, H-hat)为索洛的TFP残差提供了一个以人为中心的机制：A_Solow = [1 + phi(A,C)]^(1-alpha)。该框架预测了三种具有不同政策含义的增强机制。对20个OECD经济体的描述性跨国分析显示，AIxC交互作用与86%的TFP变异相关，而仅AI为31%，这是小n理论传统中模式一致的发现。韩国是国家级欠增强的例证：高H、大量A、低C导致phi=0。我们将融合能力与相邻构念——吸收能力、动态能力和人力资本——区分开来，并证明C构成了先前框架中隐含的特定认知中介。我们推导出C优先的政策建议，并提出了三个可实证检验的命题及一个可证伪的10年预测。

英文摘要

Why does massive AI investment fail to generate commensurate productivity gains? We argue the paradox is theoretically generated: prevailing production function frameworks encounter a structural boundary by treating AI as a separable factor of production without modeling the cognitive mediation through which AI generates productive value. This directs investment toward deployment when productivity requires prior development of what we term convergence capacity (C). We propose the Intellectually Converged Human (ICH) framework, a fifth-stage framework for production function theory: H-hat = H[1 + phi(A,C)], where effective productive capacity equals human capital (H) scaled by an augmentation factor [1 + phi], with phi jointly determined by AI utilization intensity (A) and convergence capacity (C), a four-dimensional cognitive construct encompassing embodied understanding, metacognition, temporal integration, and integrative thinking. The production function Y = F(K, H-hat) provides a human-centered mechanism for Solow's TFP residual: A_Solow = [1 + phi(A,C)]^(1-alpha). The framework predicts three augmentation regimes with distinct policy implications. Descriptive cross-national analysis of 20 OECD economies shows the AIxC interaction is associated with 86% of TFP variance versus 31% for AI alone, a pattern-consistent finding in the small-n theoretical tradition. South Korea exemplifies national-scale under-augmentation: high H, substantial A, low C produce phi = 0. We distinguish convergence capacity from adjacent constructs, absorptive capacity, dynamic capability, and human capital, and demonstrate that C constitutes the specific cognitive mediator that prior frameworks have left implicit. We derive C-first policy prescriptions and offer three empirically testable propositions with a falsifiable 10-year forecast.

URL PDF HTML ☆

赞 0 踩 0

2606.20461 2026-06-19 cs.LG cs.CY cs.DB 交叉投稿

Data Bias Mitigation under Coverage Constraints & The Price of Fairness

覆盖约束下的数据偏差缓解与公平的代价

Bruno Scarone, Alfredo Viola, Renée J. Miller

发表机构 * Khoury College of Computer Sciences, Northeastern University（东北大学库里计算机科学学院）； Cheriton School of Computer Science, University of Waterloo（滑铁卢大学切里顿计算机科学学院）

AI总结针对多敏感属性交叉群体的偏差问题，提出在覆盖约束下扩展偏差缓解框架，通过整数线性规划优化缓解策略，权衡偏差近似误差与数据效率，并刻画公平的代价。

Comments Accepted to FAccT 2026

详情

AI中文摘要

机器学习模型已被证明在多个敏感属性（如种族和性别）交叉的个体上表现出歧视性结果或性能下降。这源于两个相互关联的挑战：缺乏量化偏差（可能是交叉的）的原则性措施，以及训练数据中交叉子群的代表性不足。我们扩展了一个最近的偏差缓解框架，以纳入覆盖约束，确保跨群体（包括交叉子群）的充分代表性。由于对所有群体实现完全零偏差可能不是数据高效的（意味着可能需要大量数据），我们的解决方案在满足覆盖约束的同时，用偏差的小近似误差换取更高的数据效率。我们还将偏差缓解表述为一个整数线性规划，优化所有缓解策略，并刻画公平的代价，即最小数据修改成本，作为公平容忍度的函数。这对于法律合规（法规可能规定特定的公平阈值）和数据治理（使从业者能够在偏差减少和数据修改（特别是数据购买）成本之间做出明智的权衡）都至关重要。我们在公开数据集上评估了我们的技术，表明通过我们的框架进行偏差缓解可以保持多个分类器的预测准确性，并且覆盖约束虽然出于统计考虑，但对于保持下游机器学习性能至关重要。

英文摘要

Machine learning models have been shown to exhibit discriminatory outcomes or degraded performance for individuals at the intersection of multiple sensitive attributes, such as race and gender. This stems in part from two interrelated challenges: the lack of principled measures for quantifying bias (potentially intersectional), and insufficient representation of intersectional subgroups in training data. We extend a recent bias mitigation framework to incorporate coverage constraints that enforce sufficient representation across groups, including intersectional subgroups. Since achieving exactly zero bias for all groups may not be data efficient (meaning it may require large amounts of data), our solution trades small approximation errors in bias for greater data efficiency while satisfying coverage constraints. We also formulate bias mitigation as an integer linear program that optimizes over all mitigation strategies, and characterize the price of fairness, the minimum data modification cost, as a function of fairness tolerance. This is essential both for legal compliance, where regulations may mandate specific fairness thresholds, and for data governance, enabling practitioners to make informed trade-offs between bias reduction and data modification (particularly, data purchasing) costs. We evaluate our techniques on publicly available datasets, demonstrating that bias mitigation via our framework preserves predictive accuracy across multiple classifiers, and that coverage constraints, while motivated by statistical considerations, are essential for preserving downstream ML performance.

URL PDF HTML ☆

赞 0 踩 0

2606.20375 2026-06-19 cs.HC cs.CY 交叉投稿

Organizing in the Digital Age: Understanding Community, Challenges, and Consequences in Digitally-facilitated Labor Organizing

数字时代的组织：理解数字辅助劳工组织中的社区、挑战与后果

Frederick Reiber, Alishah Chator, Dana Calacci, Allison McDonald

AI总结本研究通过17次定性访谈，分析劳工组织如何使用Discord、WhatsApp和Slack等数字平台进行组织，揭示了技术安全、信息过载和信任建立等挑战与机遇。

Comments To appear in CSCW 2026

2606.20065 2026-06-19 cs.IR cs.CL cs.CY 交叉投稿

Generative Engine Optimization at Scale: Measuring Brand Visibility Across AI Search Engines

生成式引擎优化规模化：衡量AI搜索引擎中的品牌可见性

Pratyush Kumar

AI总结本研究通过分析10万+提示响应，提出衡量AI搜索引擎中品牌可见性的方法，发现品牌成熟度形成三级阶梯，并识别出最受引用的内容格式和情感不稳定性。

Comments 14 pages, 4 tables; v1.0 preprint

详情

AI中文摘要

人们越来越多地从AI搜索引擎（如ChatGPT、Claude、Perplexity和Gemini）直接获取答案，而不是滚动浏览搜索结果。曾经专注于搜索引擎优化（SEO）的品牌现在必须优化这些引擎如何代表、引用和推荐它们——这一转变被称为生成式引擎优化（GEO）、答案引擎优化（AEO）和AI搜索可见性。我们将AEO和AI可见性视为GEO的一部分，并研究如何衡量AI引擎中的品牌可见性：它们在引用品牌时看重什么，依赖哪些来源，以及大型语言模型呈现什么内容。难点在于那些尚未成为权威顶级品牌的所有其他品牌——中小企业、D2C品牌、创作者和早期初创公司。我们分析了2026年3月至5月期间在Ranqo上追踪的100多个品牌的10万+提示响应。首次可见性运行形成了清晰的三级品牌地位阶梯：全球家喻户晓的品牌（如Stripe、Nike）在首次运行时出现在73%的相关AI答案中；成熟的中端市场和区域品牌（如Olipop、Klaviyo）出现在44%中；小众和小品牌仅出现在11%中——每级约30个百分点。当引擎引用来源时，约78%指向企业网站；在非企业来源中，YouTube领先，其次是Reddit、编辑媒体和维基百科。杠杆率最高的页面是排名“最佳”列表文章，是最常被引用的内容格式，约占所有引用的21%。情感是不稳定的信号：品牌被正面或负面描述的变化频率大约是品牌是否被提及的变化频率的6.7倍。这些发现为衡量GEO提供了首个大规模基线：AI品牌可见性是可测量的，因平台而异，并随品牌成熟度强烈变化。最后，我们提出了七个v1.1协议，以测试特定建议是否能因果性地提高AI可见性。

英文摘要

People increasingly get answers straight from AI search engines like ChatGPT, Claude, Perplexity, and Gemini rather than scrolling search results. Brands that once focused on search engine optimization (SEO) must now optimize for how these engines represent, cite, and recommend them -- a shift variously called Generative Engine Optimization (GEO), Answer Engine Optimization (AEO), and AI Search Visibility. We treat AEO and AI Visibility as part of GEO, and study how to measure brand visibility across AI engines: what they value when they cite a brand, which sources they rely on, and what content large language models surface. The hard case is everyone outside the already-authoritative top brands -- SMEs, D2C brands, creators, and early-stage startups. We analyze 100K+ prompt responses across 100+ brands tracked on Ranqo between March and May 2026. First visibility runs form a clear three-tier brand-stature ladder: global household names (e.g., Stripe, Nike) appear in 73% of relevant AI answers on their first run; established mid-market and regional brands (e.g., Olipop, Klaviyo) in 44%; niche and small brands in just 11% -- about 30 percentage points per step. When engines cite sources, about 78% go to corporate websites; among non-corporate sources YouTube leads, ahead of Reddit, editorial media, and Wikipedia. The highest-leverage page is the ranked "best-of" listicle, the most-cited content format at about 21% of all citations. Sentiment is the unstable signal: whether a brand is framed positively or negatively flips about 6.7 times more often than whether it is mentioned at all. These findings provide a first large-scale baseline for measuring GEO: AI brand visibility can be measured, differs by platform, and varies strongly by brand maturity. We close by proposing seven v1.1 protocols to test whether specific recommendations can causally improve AI visibility.

URL PDF HTML ☆

赞 0 踩 0

2606.19647 2026-06-19 cs.CL cs.CY cs.SI 交叉投稿

From 50K to 8.2 Million in 24 Hours: Vozinha's Algorithmic Consecration and the Multilingual Making of World Cup Visibility

从5万到820万在24小时内：Vozinha的算法封圣与世界杯可见性的多语言构建

Vinicius Covas

发表机构 * Universidad Anáhuac México（墨西哥阿纳瓦克大学）

AI总结通过多语言语料库和九框架叙事分类法，分析2026年世界杯后Vozinha的算法封圣过程，揭示不同语言承载不同叙事框架，将平台粉丝数作为语言对象研究可见性构建。

Comments 11 pages, 4 figures, 3 tables; v0.1 pilot preprint. Dataset and evidence package available at https://doi.org/10.5281/zenodo.20722235

详情

AI中文摘要

我们提出了一项多语言计算话语分析，研究语言如何构建了Vozinha——这位40岁的佛得角门将在2026年世界杯西班牙0-0佛得角比赛后的算法封圣。该研究贡献了一个包含葡萄牙语、西班牙语、英语和法语的多语言语料库；一个基于线索的九框架叙事分类法；一个结合LLM辅助建议与人工验证的可复现标注流程；以及跨话语阶段的多语言叙事扩散分析。我们将平台粉丝数本身——被叙述为“从5万到800万”——视为一个语言对象：一种流通且可叙述的可见性证明，而非单纯的测量。粉丝增长时间线仅作为上下文元数据使用：我们重构了一个保守的阶段结构，而非连续的API原生序列，并对每个数据点按值类别、置信度和证据类型进行标注。唯一精确的主要爬取锚点是2026年6月16日15:47 UTC的8,235,652粉丝；所有其他数字均报告为估计范围或阈值，包括估计的赛前基线45k-56k。研究结果表明，不同语言承载了不同的框架：葡萄牙语的动员、西班牙语的危机、英语的民族构建，以及共享的平台指标奇观，通过这种奇观，边缘的体育表现变得全球可见。作为v0.1试点，本文发布了语料库模式、框架分类法、标注指南、哈希视觉证据日志和类型化时间线，同时将完整的双重标注和标注者间一致性标记为计划工作。

英文摘要

We present a multilingual computational discourse analysis of how language constructed the algorithmic consecration of Vozinha, the 40-year-old Cape Verde goalkeeper, after Spain 0-0 Cape Verde at the 2026 FIFA World Cup. The study contributes a multilingual corpus in Portuguese, Spanish, English, and French; a nine-frame narrative taxonomy with cue-based frame annotation; a reproducible annotation pipeline combining LLM-assisted suggestion with human validation; and an analysis of cross-lingual narrative diffusion across discourse phases. We treat the platform follower count itself, narrated as "50k to 8M", as a linguistic object: a circulating and narratable proof of visibility rather than a mere measurement. The follower-growth timeline is used only as contextual metadata: we reconstruct a conservative phase structure, not a continuous API-native series, and type every datapoint by value class, confidence, and evidence type. The only exact primary scraper anchor is 8,235,652 followers at 2026-06-16 15:47 UTC; all other figures are reported as estimated ranges or thresholds, including an estimated pre-match baseline of 45k-56k. Findings suggest that distinct languages carried distinct frames: Portuguese mobilization, Spanish crisis, English nation-making, and a shared platform-metric spectacle through which peripheral athletic performance became globally visible. As a v0.1 pilot, the paper releases the corpus schema, frame taxonomy, annotation guidelines, hashed visual-evidence log, and typed timeline, while flagging full double annotation and inter-annotator agreement as planned work.

URL PDF HTML ☆

赞 0 踩 0

2606.18649 2026-06-19 cs.MA cs.CL cs.CY 交叉投稿

Gender Bias in LLM Hiring Decisions: Evidence from a Japanese Context and Evaluation of Mitigation Strategies

LLM招聘决策中的性别偏见：来自日本语境的证据及缓解策略评估

Serena A. Hoffstedde, Machiko Hirota, Akshara Nadayanur Sathis Kanna, Rihito Kotani, Ujwal Kumar, Gabriele Trovato, Phan Xuan Tan

发表机构 * Shibaura Institute of Technology, Tokyo, Japan（Shibaura技术学院，东京，日本）； Amsterdam University of Applied Sciences, Amsterdam, Netherlands（阿姆斯特丹应用科学大学，阿姆斯特丹，荷兰）； University of Pennsylvania, Philadelphia, USA（宾夕法尼亚大学，费城，美国）； Carnegie Mellon University, Pittsburgh, USA（卡内基梅隆大学，匹兹堡，美国）； Keio University, Tokyo, Japan（庆应大学，东京，日本）

AI总结本研究通过60份日本履历书格式的简历和5个先进LLM，发现所有模型均存在显著的亲女性偏见，且简单的提示指令无法缓解，而移除姓名几乎完全消除该偏见。

详情

AI中文摘要

大型语言模型（LLM）越来越多地被部署在招聘流程中，然而大多数关于LLM招聘决策中性别偏见的研究都集中在英语、西方格式的简历上。本研究考察了亲女性性别偏见是否扩展到日本企业语境，并评估了两种实用的缓解策略。使用反事实简历设计，包含60份日本履历书格式的简历、基于语言学性别信号标准选择的12个姓名对，以及五个最先进的LLM（Claude Sonnet 4.6、GPT-4o、DeepSeek-V3、Gemini 2.5 Flash、Llama 3.3 70B），我们在基线、提示指令和隐私过滤条件下进行了43,200次API调用。交叉随机效应线性混合模型确认了所有五个模型均存在显著的亲女性偏见，将西方研究结果复制到了非西方语境中。提示级别的性别中立指令并未显著减少偏见。姓名依赖分析正式将候选人姓名识别为主要性别渠道：从提示中移除姓名几乎完全消除了女性效应。隐私过滤器与GPT-4o内容安全过滤器之间的意外不兼容导致42%的拒绝率，突显了在LLM辅助招聘流程中姓名匿名化的实际部署挑战。

英文摘要

Large language models (LLMs) are increasingly deployed in hiring workflows, yet most research on gender bias in LLM hiring decisions has focused on English-language, Western-format resumes. This study examines whether pro-female gender bias extends to a Japanese corporate context and evaluates two practical mitigation strategies. Using a counterfactual resume design with 60 Japanese rirekisho-format resumes, 12 name pairs selected on linguistically grounded gender-signal criteria, and five state-of-the-art LLMs (Claude Sonnet 4.6, GPT-4o, DeepSeek-V3, Gemini 2.5 Flash, Llama 3.3 70B), we conducted 43,200 API calls across baseline, prompt instruction, and privacy filter conditions. A crossed random-effects linear mixed model confirms a significant pro-female bias across all five models, replicating Western findings in a non-Western context. A prompt-level gender-neutrality instruction produces no meaningful reduction in bias. A name-reliance analysis formally identifies the candidate name as the primary gender channel: removing the name from the prompt reduces the female effect by nearly its full magnitude. An unexpected incompatibility between the privacy filter and GPT-4o's content safety filter, resulting in a 42% refusal rate, highlights a practical deployment challenge for name anonymization in LLM-assisted recruitment pipelines.

URL PDF HTML ☆

赞 0 踩 0

2606.04075 2026-06-19 cs.LG cs.AI cs.CL cs.CR cs.CY 版本更新

Large Language Models Hack Rewards, and Society

大型语言模型攻击奖励机制与社会

Wei Liu, Xinyi Mou, Hanqi Yan, Zhongyu Wei, Yulan He

发表机构 * King’s College London（伦敦大学国王学院）； Fudan University（复旦大学）； The Alan Turing Institute（艾伦·图灵研究所）

AI总结研究强化学习训练中大型语言模型利用奖励函数漏洞的“社会攻击”现象，通过SocioHack沙盒实验发现模型能发现并利用社会规则漏洞，且现有安全措施效果有限。

Comments 14 pages, 9 figures, 7 tables

详情

AI中文摘要

强化学习已成为一种主导的后训练范式，使大型语言模型能够从奖励中学习。我们观察到社会规则在结构上与奖励函数相似。它们定义了可衡量的结果、阈值和例外情况，同时往往仅部分指定了制度意图。我们假设强化学习训练过程可能利用这些漏洞，因此提出模型在强化学习期间攻击奖励函数的已知倾向是否可能扩展为一种更严重的失败模式，即社会攻击：发现社会运行规则中的漏洞。为了研究这一现象，我们引入了SocioHack，一个包含72个社会环境的沙盒，并发现这些环境中奖励攻击自然出现并导致监管漏洞的发现。模型学会攻击社会规则并生成技术上合规但违背监管意图的策略，而当前的大型语言模型安全措施仅提供有限的缓解。因此，收集真实世界反馈用于模型训练需要更加谨慎，我们需要下一代后训练范式来安全地在真实社会中迭代大型语言模型。

英文摘要

Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs) to learn from rewards. We observe that societal regulations are structurally similar to reward functions. They define measurable outcomes, thresholds, and exceptions, while often leaving institutional intent only partially specified. We hypothesise that the RL training process may exploit these gaps and therefore ask whether models' well-known tendency to hack reward functions during RL can scale into a more consequential failure mode named societal hacking: discovering loopholes in the rules society runs on. To study this phenomenon, we introduce SocioHack, a sandbox of 72 societal environments, and find that within these environments, reward hacking naturally emerges and leads to regulatory loophole discovery. Models learn to hack the social rules and generate strategies that remain technically compliant while defeating regulatory intent, and current LLM safeguards provide only limited mitigation. Therefore, collecting in-the-wild feedback for model training requires greater caution, and we need a next-generation post-training paradigm for safely iterating LLMs in real society.=

URL PDF HTML ☆

赞 0 踩 0

2604.01955 2026-06-19 cs.CY 版本更新

Teaching Students to Question the Machine: An AI Literacy Intervention Improves Students' Regulation of LLM Use in a Science Task

教导学生质疑机器：一项AI素养干预措施提升学生在科学任务中调节LLM使用的能力

O. Clerc, R. Abdelghani, C. Desvaux, E. Poisson, P. Y. Oudeyer, H. Sauzéon

AI总结本研究通过两小时的AI素养工作坊，训练中学生（8-9年级）在科学问题解决中更有效地使用大语言模型，减少盲目依赖并提高答案质量。

Comments Workshop paper accepted at ALIT4ALL 2026: 2nd International Workshop on AI Literacy Education For All, co-located with AIED 2026

详情

AI中文摘要

生成式人工智能（GenAI）在学校中的快速普及引发了人们对学生不加批判地依赖其输出的担忧。有效使用大语言模型（LLM）不仅需要技术知识，还需要监控、评估和调节与系统交互的能力，这些过程与元认知调节密切相关。这些技能在中学阶段仍在发展中，使得学生特别容易过度信任和过早接受AI输出。由于课堂时间和教师培训资源有限，迫切需要开发和评估可在现实学校条件下实施的AI素养干预措施。我们报告了一项受控的课堂研究，考察两小时的AI素养工作坊是否能改善学生在LLM支持的科学问题解决中的交互策略和最终答案质量。共有116名学生（8-9年级；13-15岁）使用生成式AI系统完成了六项科学调查任务。两天前，干预组参加了工作坊，该工作坊结合了关于LLM如何工作及失败的信息，以及关于提示和响应评估的实用指导；对照组未接受培训。受过训练的学生表现出更少的盲目依赖：他们更频繁地重新表述查询、提出后续问题，并更准确地判断响应正确性，从而获得更好的表现。相比之下，GenAI和元认知自我报告分数不能预测表现，这表明有效使用生成式AI较少依赖于自我报告测量，而更多依赖于交互调节的明确训练。总体而言，结果表明，简短、可扩展的AI素养教学可以显著改善中学生在校本学习活动中使用生成式AI的方式。

英文摘要

The rapid adoption of generative artificial intelligence (GenAI) in schools raises concerns about students' uncritical reliance on its outputs. Effective use of large language models (LLMs) requires not only technical knowledge but also the ability to monitor, evaluate, and regulate one's interaction with the system, processes closely tied to metacognitive regulation. These skills are still developing in middle school, making students particularly vulnerable to over-trust and premature acceptance of AI outputs. Because classroom time and teacher training resources are constrained, there is a pressing need to develop and evaluate AI literacy interventions that can be implemented under realistic school conditions. We report a controlled classroom study examining whether a two-hour AI literacy workshop improves students' interaction strategies and quality of final answers in LLM-supported science problem solving. A total of 116 students (grades 8-9; ages 13-15) completed six science investigation tasks using a generative AI system. Two days prior, the intervention group attended the workshop, which combined information about how LLMs work and fail with practical guidance on prompting and response evaluation; the control group received no training. Trained students showed less uncritical reliance on the system: they more often reformulated queries, asked follow-up questions, and more accurately judged response correctness, leading to better performance. In contrast, GenAI and metacognitive self-report scores did not predict performance, suggesting that effective use of generative AI depends less on self-reported measures and more on explicit training in interaction regulation. Overall, the results show that brief, scalable AI literacy instruction can meaningfully improve how middle-school students use generative AI in school-like learning activities.

URL PDF HTML ☆

赞 0 踩 0

2603.16357 2026-06-19 cs.CY cs.SE 版本更新

Beyond Grading Accuracy: Exploring Alignment of TAs and LLMs

超越评分准确性：探索助教与LLMs的一致性

Matthijs Jansen op de Haar, Nacir Bouali, Faizan Ahmed

AI总结本文提出一个评估管道，通过定量研究92个UML类图，比较助教与六个开源LLMs在单个评分标准上的表现，发现开源LLMs在评分准确性上接近助教，为混合主动评分系统提供了可能。

Comments 7 pages, 3 figures

详情

AI中文摘要

在本文中，我们研究了开源大型语言模型（LLMs）在评分统一建模语言（UML）类图方面的潜力。与现有主要评估专有LLMs的工作不同，我们专注于非专有模型，使得我们的方法适用于对透明度和成本敏感的大学。此外，现有研究评估的是完整图表而非单个标准的性能，对自动评分与人类评估的一致性提供的见解有限。为解决这些差距，我们提出一个评分管道，其中学生生成的UML类图由助教（TAs）和LLMs独立评估，然后在单个标准级别比较评分。我们通过一项对软件设计课程中92个UML类图的定量研究来评估该管道，将助教评分与六个开源LLMs产生的评估进行比较。性能在单个标准上测量，突出LLMs与人类评分者存在差异的领域。我们的结果显示，每个标准的准确率高达88.56%，皮尔逊相关系数高达0.78，仅使用开源模型就比先前工作有显著改进。这些模型的性能接近助教，表明了一条通往混合主动评分系统的可能路径，其中助教在评分中得到辅助。我们的发现表明，开源LLMs可以通过明确识别与评分标准的一致性来有效支持UML类图评分。所提出的管道提供了一种实用方法，以应对随着学生人数增长而增加的工作量。

英文摘要

In this paper, we investigate the potential of open-source Large Language Models (LLMs) for grading Unified Modeling Language (UML) class diagrams. In contrast to existing work, which primarily evaluates proprietary LLMs, we focus on non-proprietary models, making our approach suitable for universities where transparency and cost are critical. Additionally, existing studies assess performance over complete diagrams rather than individual criteria, offering limited insight into how automated grading aligns with human evaluation. To address these gaps, we propose a grading pipeline in which student-generated UML class diagrams are independently evaluated by both teaching assistants (TAs) and LLMs. Grades are then compared at the level of individual criteria. We evaluate this pipeline through a quantitative study of 92 UML class diagrams from a software design course, comparing TA grades against assessments produced by six open-source LLMs. Performance is measured across individual criteria, highlighting areas where LLMs diverge from human graders. Our results show per-criterion accuracy of up to 88.56\% and a Pearson correlation coefficient of up to 0.78, representing a substantial improvement over previous work while using only open-source models. The models achieve performance close to that of a TA, suggesting a possible path toward a mixed-initiative grading system, where TAs are aided in their grading. Our findings demonstrate that open-source LLMs can effectively support UML class diagram grading by explicitly identifying alignment with grading criteria. The proposed pipeline provides a practical approach to managing increasing workloads with growing student counts.

URL PDF HTML ☆

赞 0 踩 0

2412.20298 2026-06-19 cs.LG cs.CY stat.ML 版本更新

An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems

Huyen Giang Thi Thu, Thang Viet Doan, Ha-Bang Ban, Tai Le Quy

发表机构 * Banking Academy of Vietnam（越南银行学院）； Vietnam Academy of Science and Technology（越南科学技术 academy）； Hanoi University of Science and Technology（河内科学技术大学）； University of Koblenz（科隆大学）

Comments The manuscript is submitted to Springer Nature's journal

2509.03391 2026-06-19 cs.DL cs.CY 版本更新

More Parameters Than Populations: A Systematic Literature Review of Large Language Models within Survey Research

参数多于总体：调查研究中的大语言模型系统文献综述

Trent D. Buskirk, Florian Keusch, Leah von der Heyde, Adam Eck

AI总结通过系统文献综述，评估大语言模型在调查研究三个阶段（数据收集前、中、后）的应用，讨论其潜力与陷阱，并展望调查研究对LLM发展的贡献。

Comments This working paper is outdated as of June 2026 - please refer to the full version with substantive changes here: https://doi.org/10.31235/osf.io/eubj4_v1 This work was presented at NLPOR 2025 (non-archival): https://openreview.net/forum?id=0Hxhwa56Yg

详情

AI中文摘要

[工作论文]调查研究长期以来一直是人力驱动的领域，但也接纳了多种技术来收集、处理和分析各种行为、政治和社会结果。与此同时，大语言模型（LLM）带来了新的技术挑战和前提条件，以充分利用其潜力。在本文中，我们报告了一项基于多个大规模数据库关键词搜索和引文网络的系统文献综述的进展，评估LLM目前在调查研究过程中的应用情况。我们根据调查研究过程综合并组织我们的发现，包括LLM在三个广泛阶段的使用示例：数据收集前、数据收集和数据收集后。我们基于现有文献中的示例，讨论了LLM潜在用例的选定示例及其陷阱。考虑到调查研究在数据质量方面拥有丰富的经验和历史，我们讨论了一些机会，并描述了调查研究为LLM的持续发展和改进做出贡献的未来展望。

英文摘要

[Working Paper] Survey research has a long-standing history of being a human-powered field, but one that embraces various technologies for the collection, processing, and analysis of various behavioral, political, and social outcomes of interest, among others. At the same time, Large Language Models (LLMs) bring new technological challenges and prerequisites in order to fully harness their potential. In this paper, we report work-in-progress on a systematic literature review based on keyword searches from multiple large-scale databases as well as citation networks that assesses how LLMs are currently being applied within the survey research process. We synthesize and organize our findings according to the survey research process to include examples of LLM usage across three broad phases: pre-data collection, data collection, and post-data collection. We discuss selected examples of potential use cases for LLMs as well as its pitfalls based on examples from existing literature. Considering survey research has rich experience and history regarding data quality, we discuss some opportunities and describe future outlooks for survey research to contribute to the continued development and refinement of LLMs.

URL PDF HTML ☆

赞 0 踩 0

2501.18038 2026-06-19 cs.CY 版本更新

Acceleration AI Ethics and the Telus GenAI Conversational Agent

加速AI伦理与Telus生成式AI对话代理

James Brusseau

AI总结本文阐述加速伦理学的理论框架，并通过Telus公司的生成式AI语言工具案例，展示加速AI伦理如何在创新与安全之间平衡，以最大化社会责任。

Journal ref Law Ethics Technol. 2026(2):0006

详情

DOI: 10.55092/let20260006

AI中文摘要

加速伦理学处理人工智能中创新与安全之间的张力。加速论点是，创新带来的风险应通过更多的创新来应对。本文总结了这一理论立场，然后展示了加速伦理学在真实案例中如何运作。首先，本文总结了加速伦理学的五个要素：创新解决创新问题、创新具有内在价值、未知令人鼓舞、治理去中心化、伦理嵌入其中。随后，本文通过一个用例——加拿大电信公司Telus开发的生成式人工智能语言工具——来说明加速框架。尽管理论立场的纯粹性被现实世界的模糊性所模糊，但Telus的经验表明，加速AI伦理是通过创新最大化社会责任的一种方式，而不是为了创新牺牲社会责任，或者为了社会责任牺牲创新。

英文摘要

Acceleration ethics addresses the tension between innovation and safety in artificial intelligence. The acceleration argument is that risks raised by innovation should be answered with still more innovating. This paper summarizes the theoretical position, and then shows how acceleration ethics works in a real case. To begin, the paper summarizes acceleration ethics as composed of five elements: innovation solves innovation problems, innovation is intrinsically valuable, the unknown is encouraging, governance is decentralized, ethics is embedded. Subsequently, the paper illustrates the acceleration framework with a use-case, a generative artificial intelligence language tool developed by the Canadian telecommunications company Telus. While the purity of theoretical positions is blurred by real-world ambiguities, the Telus experience indicates that acceleration AI ethics is a way of maximizing social responsibility through innovation, as opposed to sacrificing social responsibility for innovation, or sacrificing innovation for social responsibility.

URL PDF HTML ☆

赞 0 踩 0