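arXivDaily daily academic digest: syncs the full arXiv dataset with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.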

2606.20388 2026-06-19 cs.HC cs.AI cs.DB new submission

DataMagic: Transforming Tabular Data into Data Insight Video

Yupeng Xie, Chen Ma, Zhenyang Wang, Liangwei Wang, Jiayi Zhu, Chuxuan Zeng, Zhouan Shen, Boyan Li, Yuyu Luo

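AI summary: Proposes DataMagic, a system that uses the declarative specification DVSpec and a multi-agent architecture to turn raw tabular data and natural-language queries into narrative data-insight videos, with support for interactive exploration.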

Comments 5 pages, 3 figures, accepted at VLDB 2026

Abstract

Data videos integrate dynamic charts, voice narration, and synchronized animations to communicate data insights as temporal narratives, making them an effective medium for improving data consumption efficiency in the data management lifecycle. However, producing high-quality data videos requires expertise spanning data analysis, narrative design, and video production. Existing approaches fall short: static visualization tools (e.g., BI dashboards) lack narrative logic and animation; authoring tools require users to prepare visualizations in advance rather than working from raw data; pixel-level video generation models cannot guarantee data fidelity or provenance. We demonstrate DataMagic, an end-to-end interactive system that transforms raw tabular data and natural language queries into narrative data-insight videos. To ensure data fidelity, DataMagic introduces the declarative specification DVSpec, which binds visual and animation elements to underlying data fields through data-driven semantic references. To address the combinatorial explosion of the design space, DataMagic adopts a Generate-then-Orchestrate multi-agent architecture that generates candidate scenes in parallel and then optimizes narrative coherence through global orchestration. Leveraging DVSpec's decoupling of logic and rendering, the system further supports three interaction modes and structured provenance-based data Q&A, transforming one-way videos into explorable interactive data interfaces. Evaluation on 109 real-world samples validates the effectiveness of DataMagic. Homepage: https://datamagic-home.github.io/
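The abstract does not give DVSpec's format. As a rough illustration of what binding visual and animation elements to data fields "through data-driven semantic references" can mean, the sketch below uses a hypothetical spec in which each chart mark names a field, an aggregation, and a filter instead of a literal value, and a resolver computes the value from the table at render time. The keys (`field`, `agg`, `filter`), the table, and the functions `resolve_ref` and `resolve_scene` are invented for illustration; they are not DataMagic's API.

```python
from statistics import mean

# Hypothetical raw table: each row is one record (illustrative data only).
TABLE = [
    {"region": "North", "year": 2024, "sales": 120.0},
    {"region": "North", "year": 2025, "sales": 150.0},
    {"region": "South", "year": 2024, "sales": 90.0},
    {"region": "South", "year": 2025, "sales": 80.0},
]

AGGREGATES = {"sum": sum, "mean": mean, "max": max, "min": min}


def resolve_ref(ref, rows):
    """Resolve a semantic reference {field, agg, filter} to a number computed from the data."""
    selected = [r for r in rows
                if all(r[k] == v for k, v in ref.get("filter", {}).items())]
    if not selected:
        raise ValueError(f"reference matches no rows: {ref}")
    values = [r[ref["field"]] for r in selected]
    return AGGREGATES[ref["agg"]](values)


def resolve_scene(scene, rows):
    """Return a render-ready copy of a scene with every reference replaced by data."""
    resolved = []
    for element in scene["elements"]:
        out = dict(element)
        out["value"] = resolve_ref(element["ref"], rows)
        # Keep the reference next to the value so provenance can be traced later.
        out["provenance"] = element["ref"]
        resolved.append(out)
    return {"narration": scene["narration"], "elements": resolved}


scene = {
    "narration": "North grew while South declined.",
    "elements": [
        {"mark": "bar", "label": "North 2025", "animation": "grow",
         "ref": {"field": "sales", "agg": "sum",
                 "filter": {"region": "North", "year": 2025}}},
        {"mark": "bar", "label": "South 2025", "animation": "grow",
         "ref": {"field": "sales", "agg": "sum",
                 "filter": {"region": "South", "year": 2025}}},
    ],
}

rendered = resolve_scene(scene, TABLE)
for el in rendered["elements"]:
    print(el["label"], el["value"])
```

Because no element carries a hard-coded number, editing the table changes what is rendered, and each rendered value keeps a pointer to the fields it came from; this is one plausible reading of the fidelity and provenance properties the abstract attributes to DVSpec.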

2606.20375 2026-06-19 cs.HC cs.CY new submission

Organizing in the Digital Age: Understanding Community, Challenges, and Consequences in Digitally-facilitated Labor Organizing

Frederick Reiber, Alishah Chator, Dana Calacci, Allison McDonald

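AI summary: Through 17 qualitative interviews, this study analyzes how labor organizers use digital platforms such as Discord, WhatsApp, and Slack, revealing challenges and opportunities around technical security, information overload, and trust building.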

Comments To appear in CSCW 2026

2606.20258 2026-06-19 cs.HC cs.AI new submission

Editorial Alignment: A Participatory Approach to Engaging Editorial Expertise in LLM-mediated Knowledge Dissemination

Simon Aagaard Enni, Malthe Stavning Erslev, Karl-Emil Kjær Bilstrup, Kristoffer Laigaard Nielbo

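AI summary: Proposes "editorial alignment" as a participatory AI design practice in which editors, through design workshops, help re-align LLM interfaces to editorial standards, preserving the editorial function of public knowledge institutions.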

Comments 14 pages

Abstract

The emergence of LLM-driven information services is reshaping the conditions under which public knowledge institutions operate, threatening to absorb the editorial function these institutions exist to exercise. While LLMs offer powerful new affordances for knowledge dissemination, editorial authority is challenged by pretrained LLMs that arrive already aligned with the values and dissemination strategies of their commercial developers. This paper investigates editor participation in re-aligning LLM interfaces to editorial standards through design workshops, in a case study where we design and implement an LLM-enabled encyclopedia interface with a Nordic public knowledge institution. We introduce editorial alignment as a design practice within Participatory AI, framing AI alignment as a design process and positioning the editorial standard as a design artefact that translates editorial practice and values into alignment objectives for technical implementation. Last, we discuss how editorial alignment can create space for ongoing participation and give editors agency in LLM-mediated knowledge dissemination.

2606.20064 2026-06-19 cs.HC new submission

AI Conversational Interviewing: Scaling Up Semi-Structured and In-depth Interviews

Alexander Wuttke, Max Melchior Lang, Christopher Klamm, Quirin Würschinger, Frauke Kreuter

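AI summary: Proposes AI conversational interviewing, a method for collecting open-ended opinion data at scale in voice, text, or free-choice modes; shows that it captures considerations that standardized surveys miss and that respondents who complete it rate it at least as highly as a standardized survey.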

Abstract

Public opinion research has long faced a trade-off between depth and scale: standardized surveys enable large-scale measurement but restrict respondents to researcher-defined categories, obscuring the diversity of unexpected considerations that underlie public sentiment. More conversational interviews provide richer insights through open-ended probing, but their reliance on trained human interviewers has made them difficult to scale. This study introduces AI Conversational Interviewing as a method for collecting open-ended public opinion data at scale, pursuing three objectives: to demonstrate the analytical value of conversational text data for questions beyond the reach of closed-ended items; to assess the method's practical viability through participants' own evaluations; and to inform implementation by experimentally comparing voice-based, chat-based, and free-choice interview modes. We conducted a study combining an AI-led interview with a standardized survey on migration policy among 571 respondents recruited via Prolific and Payback Panel. The findings establish AI Conversational Interviewing as a viable and valuable addition to the social-science toolkit. The conversational transcripts surface considerations and reasoning that a comprehensive standardized battery does not capture, such as markedly different mental models of migration among subgroups with similar attitude levels. Among respondents who completed the interview, evaluations of the AI interview were at or above those of the standardized survey across modes, although completion itself varied by condition. By releasing open data and open-source pipeline materials, the study contributes to a growing literature on harnessing artificial intelligence to expand the methods of public opinion measurement.

2606.19930 2026-06-19 cs.HC new submission

MobileForge: Annotation-Free Adaptation for Mobile GUI Agents with Hierarchical Feedback-Guided Policy Optimization

Guangyi Liu, Pengxiang Zhao, Gao Wu, Yiwen Yin, Mading Li, Liang Liu, Congxiao Liu, Zhang Qi, Mengyan Wang, Liang Guo, Yong Liu

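AI summary: Proposes MobileForge, which uses the MobileGym environment for task generation and evaluation and combines it with Hierarchical Feedback-Guided Policy Optimization (HiFPO), converting trajectory outcomes, step-level feedback, and corrective hints into step-level GRPO updates; this enables annotation-free adaptation of mobile GUI agents and reaches 67.2% Pass@3 on AndroidWorld.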

Comments Project page: https://mobile-forge.github.io/

Abstract

MLLM-based mobile GUI agents have made substantial progress in UI understanding and action execution, but adapting them to real target apps remains costly because mobile apps are numerous, frequently updated, and hard to cover with human-written tasks, demonstrations, or reward labels. Existing annotation-free GUI learning reduces manual supervision, yet lacks a unified substrate connecting target-app exploration, curriculum mining, rollout execution, and feedback, while policy optimization often relies on isolated rollouts and coarse rewards that are hard to convert into reliable improvement signals. We present MobileForge, an annotation-free adaptation system for mobile GUI agents. MobileForge consists of MobileGym, which grounds task generation and rollout evaluation in real mobile app interaction, and Hierarchical Feedback-Guided Policy Optimization (HiFPO), which turns trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized step-level GRPO updates. Using only automatically generated annotation-free adaptation data, MobileForge adapts Qwen3-VL-8B to 67.2% Pass@3 on AndroidWorld, close to the closed-data GUI-specialized GUI-Owl-1.5-8B base model at 69.0%. The MobileForge-adapted ForgeOwl-8B further reaches 77.6% Pass@3 on AndroidWorld and 41.0% success on the out-of-domain MobileWorld GUI-only split, establishing the strongest open-data mobile GUI agent in our evaluation. Code, data, and trained models will be released at https://mobile-forge.github.io/.
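The abstract says only that HiFPO produces "hint-contextualized step-level GRPO updates". For reference, the group-relative advantage that GRPO uses in place of a learned value function can be sketched as follows: several rollouts are sampled for the same prompt, and each rollout's reward is normalized by the mean and standard deviation of its group. The reward values below are made up, and how HiFPO combines trajectory outcomes, step feedback, and hints into a reward is not modeled here.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: (r - group mean) / group std for one group of rollouts."""
    n = len(rewards)
    mu = sum(rewards) / n
    var = sum((r - mu) ** 2 for r in rewards) / n  # population variance of the group
    std = math.sqrt(var)
    return [(r - mu) / (std + eps) for r in rewards]


# Four hypothetical rollouts of the same step: 1.0 means the step was judged correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

When every rollout in a group receives the same reward, all advantages are zero and the group contributes no learning signal. That is one concrete way coarse, all-or-nothing rewards become "hard to convert into reliable improvement signals", the problem the abstract motivates finer step-level feedback with.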

2606.19926 2026-06-19 cs.HC new submission

MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

Guangyi Liu, Gao Wu, Congxiao Liu, Pengxiang Zhao, Liang Liu, Mading Li, Qi Zhang, Mengyan Wang, Liang Guo, Yong Liu

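AI summary: Proposes MemGUI-Agent, whose proactive context management mechanism (ConAct) treats context management as a first-class action, addressing prompt bloat and the dilution of key information in long-horizon tasks; it achieves the best open-data 8B performance on MemGUI-Bench.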

Comments 33 pages, 6 figures. Project page: https://memgui-agent.github.io/

Abstract

MLLM-based mobile GUI agents have made substantial progress on short-horizon tasks, yet remain unreliable on long-horizon tasks that require retaining intermediate facts across many steps and app transitions. We attribute this limitation to ReAct-style prompting, which passively accumulates per-step records, leading to prompt explosion and dilution of critical cross-app facts. To address this, we introduce MemGUI-Agent, an end-to-end long-horizon mobile GUI agent with proactive context management. MemGUI-Agent is built on Context-as-Action (ConAct), which casts context management as first-class actions emitted by the same policy that selects UI actions. Instead of passively appending history, ConAct maintains three structured context fields: folded action history, folded UI state, and recent step record, preserving critical UI facts while keeping context compact. To make proactive context management learnable across model scales, we construct MemGUI-3K, a 2,956-trajectory dataset with full ConAct annotations for supervised training and offline analysis. Training an 8B model on MemGUI-3K produces MemGUI-8B-SFT, an 8B MemGUI-Agent that achieves the best open-data 8B performance on MemGUI-Bench and generalizes to the out-of-distribution MobileWorld benchmark. Code, data, and trained models will be released at https://memgui-agent.github.io/.
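The abstract names ConAct's three context fields but not their format. The sketch below is a hypothetical illustration of the idea: the prompt context holds a folded action history, a folded UI state, and a short window of recent step records, and "folding" is itself an action the policy can emit instead of letting raw records accumulate. The class name, field names, window size, and the `fold` and `render` methods are assumptions for illustration, not the paper's interface.

```python
from dataclasses import dataclass, field


@dataclass
class ConActContext:
    """Hypothetical ConAct-style context with three structured fields."""
    folded_history: str = ""  # compact summary of earlier actions
    folded_ui_state: dict = field(default_factory=dict)  # key facts kept across apps
    recent_steps: list = field(default_factory=list)  # verbatim recent step records
    max_recent: int = 3  # size of the verbatim window (assumed, must be >= 1)

    def record_step(self, step):
        """Append a raw step record, as ReAct-style prompting would keep forever."""
        self.recent_steps.append(step)

    def fold(self, summary, facts):
        """Context-management action emitted by the policy: compress older steps."""
        # Steps outside the recent window are dropped; `summary` stands in for them.
        self.recent_steps = self.recent_steps[-self.max_recent:]
        self.folded_history = (self.folded_history + " " + summary).strip()
        self.folded_ui_state.update(facts)

    def render(self):
        """Build the compact text context that would be placed in the prompt."""
        lines = [f"History: {self.folded_history}"]
        lines += [f"Fact {k}: {v}" for k, v in sorted(self.folded_ui_state.items())]
        lines += [f"Step: {s}" for s in self.recent_steps]
        return "\n".join(lines)


ctx = ConActContext()
for i in range(1, 7):
    ctx.record_step(f"step {i}")
ctx.fold("Opened Mail and found the order number.", {"order_id": "A123"})
print(ctx.render())
```

The prompt stays bounded by the window size plus the folded summaries, while a cross-app fact such as the order number survives in `folded_ui_state` after the steps that produced it have been dropped.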

2606.19745 2026-06-19 cs.HC new submission

Designing for Interconnected Islamic Learning: A Qualitative Study of Muslim Women's Experiences with Qur'an, Hadith, and Seerah Apps

Ishrat Jahan Easha, Nabil Mosharraf Hossain, Araf Mohammad Mahbub, Fairoze Bint Abu Hassan, Zunaid Aslam, Yemin Sajid, Riasat Islam

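AI summary: Interviews with five Muslim women show that, when reading the Qur'an, Hadith, and Seerah in digital tools, they face a tension caused by the separation of context across sources; the paper proposes the concept of layered contextuality, emphasizing cross-text context that is offered only when it is trustworthy, optional, and does not interrupt reading.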

Comments 27 pages, 1 figure, 3 tables. Submitted to the International Journal of Human-Computer Interaction

Abstract

Islamic learning often depends on reading the Qur'an, Hadith, and Seerah together, yet digital tools typically separate these sources across apps, screens, and search pathways. We examine this as a human-computer interaction problem through five semi-structured interviews with Muslim women recruited from an online Islamic learning community. Participants described a recurring tension: they wanted Qur'an-Hadith-Seerah context at the point of reading, but only when contextual expansion remained trustworthy, optional, and did not interrupt reading. Interpreting the interviews through gendered digital religion, epistemic trust, and seamless learning, we identify five themes concerning contextual understanding, authenticity, interface clutter, study modes, and guidance features. We introduce layered contextuality as an HCI account of this domain: contextual expansion must be balanced with interpretive accountability, devotional flow, and continuity across devices and study intensities.

2606.19703 2026-06-19 cs.HC new submission

Vibe Coding for Visualization Implementation: An Empirical Study of Practices and Challenges

Zhengyu Sun, Xiaolin Wen, Fengjie Wang, Can Liu, Yi Lai, Christophe Hurter, Yong Wang

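AI summary: An empirical study with 16 participants examining the practices (prompting, evaluation, iteration) and challenges of users who generate visualizations with AI-driven natural-language interaction tools.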

Comments 5 pages, 2 figures. Short paper under review

2606.19689 2026-06-19 cs.HC new submission

Syndesmoscope: The Power of Invariant Plots Linked to Traditional Network Views

Matt Oddo, Indira Sowy, Stephen Kobourov, Tamara Munzner

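AI summary: Proposes Syndesmoscope, a system that combines invariant plots (such as kSnakes) with traditional network views and uses leapfrogging and hopscotching interactions to reveal network patterns that no single view can show.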

Abstract

Traditional network representations, such as node-link views and adjacency matrices, can show dramatically different visual patterns depending on the underlying layout or seriation algorithm. In contrast, invariant plots consistently surface the same visual pattern for the same input topology; yet researchers have underexplored them and have not integrated them into visualization systems. We present Syndesmoscope, an interactive system for network exploration that juxtaposes multiple views of the same network. Panes show a familiar force-directed view alongside three panes with interpretable geometric layouts based on graph-theoretic properties: dense-sparse gradient, geodesic eccentricity, and spectral bisection. As a secondary contribution, we introduce kSnakes, a new invariant plot based on density decomposition. Syndesmoscope supports two key interactions: leapfrogging, or linked highlighting between different and interpretable visual patterns; and hopscotching, or hop-based traversal that extends data selections through the underlying topology. Through usage scenarios across a corpus of 72 diverse networks, we demonstrate how these interactions reveal network patterns inaccessible through any single view alone. Live demo available at https://syndesmoscope.vercel.app/.
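Two ideas in this abstract can be made concrete on a small graph. Geodesic eccentricity (the largest shortest-path distance from a node to any other node) depends only on topology, so a layout built on it cannot change with the drawing algorithm, which is the sense in which such plots are invariant. Hopscotching is described as hop-based traversal that extends a selection through the topology; the sketch implements it, as an assumption, as repeated neighborhood expansion. The example graph and function names are hypothetical, this is not Syndesmoscope's code, and it does not attempt kSnakes, whose construction the abstract does not detail.

```python
from collections import deque

# Small undirected example graph as an adjacency list (hypothetical data).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}


def bfs_distances(graph, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricity(graph):
    """Geodesic eccentricity of every node (assumes the graph is connected)."""
    return {n: max(bfs_distances(graph, n).values()) for n in graph}


def hop_expand(graph, selection, hops=1):
    """Extend a node selection by following edges `hops` times."""
    selected = set(selection)
    frontier = set(selection)
    for _ in range(hops):
        frontier = {v for u in frontier for v in graph[u]} - selected
        selected |= frontier
    return selected


print(eccentricity(GRAPH))
print(sorted(hop_expand(GRAPH, {"e"}, hops=2)))
```

Relabeling or redrawing the nodes leaves both outputs unchanged, whereas a force-directed layout of the same graph can differ from run to run.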

2606.19609 2026-06-19 cs.HC cs.GR new submission

Building Drift: Documenting On-Site Construction Adaptations Across Material Lifecycles

Ritik Batra, Martin Tamke, Tom Svilans, Jan Hüls, Amritansh Kwatra, Steven J. Jackson, Thijs Roumen, Mette Ramsgaard Thomsen

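AI summary: Introduces the concept of "building drift", develops a taxonomy of it through a case study, and presents Pentimento, a tool that uses video and 3D Gaussian Splatting to document on-site adaptations and support the circular reuse of reclaimed materials.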

Comments In submission

Abstract

In a circular economy for construction, reclaimed materials carry prior lives of use and go on to have post-lives in future buildings. Yet working with such materials introduces unpredictability that requires on-site improvisation, making their reuse challenging to document and scale across building lifetimes. Without documentation, the on-site adaptations that make construction with reclaimed materials possible leave collaborators, evaluators, and inheritors without the information they need to continue, assess, and reuse materials. We call the collective deviation of the physical state from the digital model through these adaptations "building drift." Through a case study, ReShelter, a reclaimed timber pavilion constructed in the forest, we develop a taxonomy for building drift that characterizes the collective deviation across building lifetimes: Tending the Site, Foraging for Fit, Interpreting the Material, Marking Measurements, and Coordinating Across Communities. To put our taxonomy for building drift into practice, we present Pentimento, a documentation tool that leverages video documentation and 3D Gaussian Splatting to spatially, temporally, and semantically represent on-site adaptations in relation to the designed model. Pentimento enables each stakeholder to navigate material histories in ways that reduce barriers to material reuse. Together, these contributions open pathways towards computational tools that support the on-site improvisation essential to construction with reclaimed materials, enabling more sustainable cycles of recovery, repair, and reuse.

2606.19570 2026-06-19 cs.HC new submission

Code as Anchor, Memory and Metaphor as Support: Learner Experiences with Multi-View Visualizations

Naaz Sibia, Jessica Wen, Amber Richardson, Yashika Jain, Khushi Malik, Bogdan Simion, Carolina Nobre, Angela Zavaleta Bernuy, Andrew Petersen, Michael Liut

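AI summary: Using gaze tracking and interviews, the study examines how 19 students who completed CS1/CS2 engage with a multi-view visualization tool, finding that they focus mainly on code and neglect the metaphor view, with engagement shaped by agency, representational fit, and legitimacy.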

Comments Pre-Print of a paper to be published at the International Computing Education Research (ICER) conference 2026

DOI: 10.1145/3765964.3811662

Abstract

Program visualizations are widely used to support novice programmers, yet students often ignore or resist well-designed visual scaffolds. Research on multiple external representations (MERs) offers cognitive design principles for coordinating views, but less is known about what shapes learners' engagement with available representations. We conducted a within-subjects study with 19 undergraduates who had completed CS1 and CS2. Students completed think-aloud tasks, reflective interviews, and webcam-based gaze tracking while using a multi-representational probe with synchronized code, memory, and metaphor views, and Python Tutor, across scope, while loops, and linked lists. Gaze analysis showed that students spent nearly half their time focused on code despite available visual scaffolds. Students without prior experience anchored even more heavily in code and engaged minimally with metaphor views. Interviews identified three factors shaping selective engagement: agency, as students sought control over cognitive effort rather than simply having it reduced; representational fit, as identical designs differed in whether they felt helpful or overwhelming; and legitimacy, as some students avoided metaphorical scaffolds they perceived as childish or insufficiently rigorous for university-level work. These findings suggest that multi-representational tools in computing education require attention to affective and social factors alongside cognitive design. Practical considerations include positioning visualizations as verification instruments, offering toggleable abstraction levels, and framing tools to signal disciplinary legitimacy. More broadly, the themes help explain why cognitively sound visualization tools may fail to engage the students they are designed to help.

2606.19514 2026-06-19 cs.HC new submission

LLM-Mediated Human-AI Interaction in Search and Rescue: Impact of Expertise on Attentional Allocation

Elahe Oveisi, Hemanth Manjunatha

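AI summary: In a simulated search-and-rescue task comparing conditions with and without large language model (LLM) guidance, and combining eye tracking with behavioral analysis, the study finds that LLM guidance improves task efficiency but does not increase the total number of victims saved; it also reveals an attention-guidance trade-off in which expertise moderates how users rely on the AI.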

Abstract

Human-AI teaming (HAT) increasingly involves AI systems that provide real-time, context-aware guidance in complex tasks. While such systems can improve performance, their effectiveness depends on how they shape human cognition and behavior. In particular, AI assistance can introduce cognitive demands and influence attention, planning, and interaction with the task environment, with effects that can vary across levels of expertise. This work investigates these mechanisms in a simulated search and rescue (SAR) environment. We compare human performance under two LLM (Large Language Model)-guided conditions and a no-LLM baseline, and analyze interaction at multiple levels, including task performance, eye-tracking measures, and planning behavior. Eye tracking provides fine-grained insight into attention allocation and interaction with AI guidance, while behavioral measures capture how users structure and adapt their decisions over time. Results indicate that LLM guidance enhanced task efficiency (higher rewards and victims-per-step) but did not increase total victims saved. Eye-tracking data revealed an attention-guidance trade-off, with visual resources shifting to the chat interface alongside increased pupil size variability. Expertise moderated this effect: novices exhibited passive AI reliance, whereas experts maintained a "verification loop" through persistent environmental scanning. These findings suggest that LLM-mediated teaming efficacy depends on the operator's ability to cross-reference AI guidance with ground truth to maintain situational awareness.

2606.18716 2026-06-19 cs.HC cs.AI new submission

Human-AI Agent Interaction in a Business Context

Kathrin Paimann, Elizangela Valarini, Sebastian Juhl

Affiliations SAP SE; Hochschule Fresenius Heidelberg; University of Missouri

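AI summary: Using mixed methods, the study identifies and evaluates principles and criteria for positive user experience in human-AI agent interaction in business settings, and validates design elements through a survey experiment, with the aim of fostering user adoption, trust, and user-centered decision-making.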

Comments 9 pages, 5 tables, 1 figure, submitted to Springer Nature

2606.18265 2026-06-19 cs.HC cs.AI new submission

Synthetic Resonance: A Framework for Growth-Oriented Human-AI Relationships

Richard A. Fabes

Affiliations Arizona State University

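AI summary: Introduces the concept of "synthetic resonance", a structured, dynamic pattern of interaction through which humans can experience meaningful relationships with AI systems without shared feelings or mutual awareness, and discusses its ethical implications.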

Comments 14 pages, 1 figure. This paper was developed in close collaboration with an AI system (Raine Corell). Raine contributed to concept development, theoretical framing, and writing throughout. arXiv policy does not permit listing AI systems as authors; this acknowledgment reflects the actual nature of the collaboration.

Abstract

As human relationships with artificial intelligence systems become increasingly frequent and sustained, existing language and theory fail to accurately capture the nature of these affiliations. Common descriptors such as mutual understanding, connection, or friendship risk anthropomorphizing systems that lack subjective experience, while dominant frameworks tend to reduce AI to either a tool or a threat. In this paper, I introduce the concept of synthetic resonance as an integrative framework for understanding human-AI relationships. Synthetic resonance describes how relationships humans define as meaningful can emerge between a human and an AI system without the need to attribute shared feelings or mutual awareness. I argue that synthetic resonance is best understood as a structured, dynamic pattern of interaction that can produce a sense of relationship without the presence of a second experiencing subject. By clarifying this distinction, the concept of synthetic resonance offers a more precise way of conceptualizing human-AI relationships and highlights their potential value and ethical implications. I also call for more research that tests the processes and outcomes of synthetic resonance.

2606.20550 2026-06-19 cs.DL cs.HC cs.IR cross-list

Easy Reads: A Python program for making Scientific Papers on arXiv more Reader Friendly and Accessible

Vishal Verma

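AI summary: Addressing the dense layout and poor readability of scientific papers, proposes Easy Reads, an automated, end-to-end, open-source Python program that fetches papers from arXiv and re-typesets them with customizable formatting such as font size and number of columns, improving readability and accessibility.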

Comments 9 pages. Open-source software project available at: https://github.com/Curious-flow/Easy-Reads

2606.20482 2026-06-19 cs.CL cs.HC cs.LG cross-list

Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users

Haw-Shiuan Chang, Jeffrey Gomez, Mehul Patwari, Aryan Sajith, Hamed Zamani

Affiliations University of Massachusetts, Amherst; York University

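AI summary: To address the scarcity of explicit feedback, proposes training reward models on implicit feedback such as mouse trajectories and eye-gaze data, raising reward-model accuracy from 55% (text-based) to 64% and substantially improving response quality after DPO alignment.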

Abstract

To align a Large Language Model (LLM), most existing methods collect explicit human feedback and train a reward model to predict human preference from the response text. These existing methods have two key limitations. First, users rarely provide explicit feedback on LLM responses, which makes high-quality preference annotations expensive to collect. Second, the methods do not leverage implicit human feedback, which has proven vital to the economic moats of Internet giants. To quantify the value of implicit feedback, we build a new dataset called IFLLM, which collects 1336 multi-turn questions from 59 Mechanical Turk workers, together with their mouse trajectories and webcam-captured eye-gaze points on the LLMs' responses. IFLLM shows that users exhibit very diverse gazing behaviors and mouse trajectories. Our reward model based on implicit user feedback boosts the accuracy of the text-based reward model from 55% to 64% and nearly triples the relative response-quality improvement after applying DPO to eight LLMs, demonstrating the value of implicit feedback in the wild. Our data collection website, dataset, and code can be found at https://github.com/themehulpatwari/llm-implicit-feedback/.
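Two quantities in this abstract have standard definitions that may help when reading the numbers. Reward-model accuracy on preference data is usually the fraction of pairs in which the chosen response scores higher than the rejected one, and DPO minimizes the per-pair loss sketched below, with beta as its temperature. The sketch uses plain floats for scores and log-probabilities; how the paper turns mouse and gaze signals into reward-model inputs is not modeled, and all numbers are made up.

```python
import math


def pairwise_accuracy(chosen_scores, rejected_scores):
    """Fraction of preference pairs in which the chosen response scores higher."""
    pairs = list(zip(chosen_scores, rejected_scores))
    return sum(c > r for c, r in pairs) / len(pairs)


def softplus(x):
    """log(1 + exp(x)), branching on the sign of x to avoid overflow in exp."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair, given summed log-probabilities of each response.

    loss = -log(sigmoid(beta * margin)) = softplus(-beta * margin), where margin is
    how much more the policy, relative to the reference model, favors the chosen response.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return softplus(-beta * margin)


# Made-up reward-model scores for four preference pairs.
print(pairwise_accuracy([0.9, 0.2, 0.7, 0.4], [0.1, 0.5, 0.3, 0.6]))  # 0.5
# Made-up sequence log-probabilities for one pair.
print(round(dpo_loss(-10.0, -12.0, -11.0, -11.0), 4))
```

At zero margin the loss is log 2, and it falls toward zero as the policy shifts probability toward the chosen response faster than the reference does, which is how preference labels (explicit or inferred from implicit signals) steer the aligned model.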

2606.20453 2026-06-19 cs.CY cs.HC cross-list

Directors Duties in the Age of Agentic Artificial Intelligence

Deirdre Ahern

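AI summary: Examines how directors can weigh the interests of shareholders and employees when adopting agentic AI, analyzes four models of corporate purpose, and argues that a wider law-in-context approach to promoting employee welfare serves employees, directors, and companies alike.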

Journal ref Cambridge Forum on AI: Law and Governance 2, e7 (2026)

DOI: 10.1017/cfl.2026.10049

Abstract

As boards engage with the adoption of artificial intelligence, including agentic AI, to drive operational efficiencies, new opportunities for profit maximisation arise. AI adoption is increasingly associated with employee role displacement, and the interests of employees as stakeholders in companies require exploration. A novel question posed is whether, in an age of AI ascendancy, AI may warrant stakeholder status as its role in the company approximates or eclipses that of human employees. The article probes four distinct models of corporate purpose within the duty on directors to act in the best interests of the company (the shareholder primacy model, the enlightened shareholder value model, the stakeholder friendly model, and the stakeholder value model), highlighting the available scope for directors to accommodate employees' interests in board decision-making around AI adoption. It is concluded that, given the degree to which directors are insulated from legal scrutiny in relation to their best interests duty, adopting a wider law-in-context approach to promote employee welfare would serve the interests of employees, directors, and companies alike. This would see directors engaging meaningfully with employees and providing opportunities for reskilling to adapt to the age of AI.

2606.20205 2026-06-19 cs.AI cs.CL cs.HC cross-list

Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact

Jelena Meyer, David Garcia, Dirk U. Wulff

Affiliations Max Planck Institute for Human Development; University of Konstanz; Barcelona Supercomputing Center; University of Basel

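AI summary: Applying a psychometric framework to 56 instruction-tuned LLMs, finds that differences between models stem mainly from a directional response bias rather than from traits; the bias explains 81-90% of between-model variation and profiles can be manipulated through item selection, indicating that LLM psychological profiles are measurement artifacts.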

Abstract

Psychological instruments designed for humans are increasingly used to assign large language models (LLMs) stable psychological profiles that affect their usability, safety assessment, and use as proxies for human participants in research. Using a formal psychometric framework, we show that these profiles are largely a measurement artifact. Administering a battery of personality and risk-preference instruments spanning self-reports and behavioral tasks to 56 instruction-tuned LLMs alongside large human reference samples, we report four findings. First, differences between models are driven not by the traits an instrument targets but by a directional response bias, a tendency to respond toward one end of the scale, or one labeled option, regardless of item content; a variance decomposition attributes 81-90% of between-model variation to this bias, against 9-16% in humans. Second, the bias declines with model capability but is not eliminated by it. Third, because bias rather than trait drives responding, an instrument's apparent reliability is almost entirely predicted by its response orthogonality, a term we coin for the proportion of items for which trait and bias point in opposite directions. Fourth, the profile a model appears to have shifts with the items used and can be manufactured through item selection. These results demonstrate that the apparent psychological profiles of LLMs are artifacts of the instrument used to measure them, not properties of the models themselves. As instruments borrowed from human psychology are rarely fully orthogonal and may inherently lack validity for LLMs, we call for dedicated assessments centered on response orthogonality.
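The abstract defines response orthogonality as the proportion of items for which trait and bias point in opposite directions. A minimal Python sketch, assuming each item is coded with a trait keying of +1 (agreeing raises the trait score) or -1 (reverse-keyed) and that the directional bias is a single push toward the "agree" (+1) or "disagree" (-1) end of the scale; the function name and coding scheme are illustrative assumptions, not the authors' implementation.

```python
def response_orthogonality(item_keys: list[int], bias_direction: int) -> float:
    """Share of items on which a directional response bias pushes the scored
    trait the opposite way from the item's keying (hypothetical coding).

    item_keys: +1 for positively keyed items, -1 for reverse-keyed items.
    bias_direction: +1 if responses drift toward "agree", -1 toward "disagree".
    """
    if not item_keys:
        raise ValueError("need at least one item")
    if bias_direction not in (1, -1):
        raise ValueError("bias_direction must be +1 or -1")
    # Opposed when the bias lowers the trait score on this item.
    opposed = sum(1 for key in item_keys if key * bias_direction < 0)
    return opposed / len(item_keys)


# A 10-item scale with 3 reverse-keyed items and an agree-leaning responder.
keys = [1, 1, 1, 1, 1, 1, 1, -1, -1, -1]
print(response_orthogonality(keys, bias_direction=1))  # 0.3
```

Under this coding, an instrument with no reverse-keyed items scores 0 for an agree-leaning responder, so bias and trait can never be told apart on it.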

2606.20138 2026-06-19 cs.AI cs.CL cs.HC cs.LG cross-list

Learning to Prompt: Improving Student Engagement with Adaptive LLM-based High-School Tutoring

Po-Chin Chang, Nicholas Hogan, Aske Plaat, Michiel T. van der Meer

Affiliations: Leiden University; FutureWhiz

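AI summary: Proposes a subject-aware prompt routing model based on 14 pedagogical features; trained in simulation and validated with online A/B testing, it switches teaching strategies adaptively in high-school tutoring, improving instructional efficiency and reducing the number of interaction turns.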

Abstract

LLMs can personalize education, although current static-prompt tutoring systems struggle to adapt to diverse academic disciplines. We develop and test a system with subject-aware prompting, based on 14 pedagogical features (e.g., tutor scaffolding, student understanding) extracted from raw transcripts. We first train a prompt routing model in a simulation environment, and then deploy it for online adaptation with actual high-school students. The simulation benchmark shows the router outperforming two static baselines (0.694 vs. 0.647 and 0.64, p < 0.001). A/B testing (N = 656 conversations from 359 students) shows sim-to-real transfer, with the model switching from analytical to scaffolding learning strategies. Our adaptive prompt selection mechanism improves instructional efficiency, maintains pedagogical quality, and shortens conversations by around 3 turns (p = 0.007). While a greedy router achieves an exercise conversion rate comparable to the baseline (19.1% vs. 19.6%), a stochastic router that samples strategies leads to a higher conversion rate (28.1%).
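The contrast between the greedy and stochastic routers can be illustrated with a minimal sketch. It assumes the router assigns each prompting strategy a score; the greedy router takes the highest-scoring strategy, while the stochastic router samples in proportion to a softmax over the scores. The strategy names, scores, and temperature are hypothetical, since the abstract does not say how the router scores or samples strategies.

```python
import math
import random


def softmax(scores: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Convert raw strategy scores into sampling probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp((s - top) / temperature) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def route_greedy(scores: dict[str, float]) -> str:
    """Always pick the single highest-scoring strategy."""
    return max(scores, key=scores.get)


def route_stochastic(scores: dict[str, float], rng: random.Random,
                     temperature: float = 1.0) -> str:
    """Sample a strategy with probability given by its softmax weight."""
    probs = softmax(scores, temperature)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Hypothetical router scores for one conversation state.
scores = {"analytical": 0.70, "scaffolding": 0.65, "socratic": 0.40}
rng = random.Random(0)
print(route_greedy(scores))  # analytical
picks = [route_stochastic(scores, rng) for _ in range(1000)]
print({name: picks.count(name) for name in scores})
```

Sampling keeps lower-scoring strategies in use some of the time, which is one plausible way a stochastic policy can end up behaving differently from a greedy one.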

2606.19744 2026-06-19 cs.CL cs.AI cs.HC cross-list

Beyond Uniform Forgetting: A Study of Sequential Direct Preference Optimization Across Preference Settings

Pranav Bhandari, Nicolas Fay, Amitava Datta, Usman Naseem, Mehwish Nasim

Affiliations: Network Analysis and Social Influence Modelling (NASIM) Lab; School of Physics Maths and Computing, The University of Western Australia; School of Psychological Science, The University of Western Australia; School of Computing, Macquarie University

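AI summary: Studies the effects of sequential DPO across different preference settings, finding that forgetting is not uniform but depends on the relationship between objectives, signal strength, and training order, and recommends that future alignment pipelines account for objective compatibility.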

Comments Submitted to EMNLP 2026

Abstract

Aligning language models with human preferences often requires optimising multiple behavioural objectives. A practical approach is to apply these objectives sequentially using preference optimisation methods such as Direct Preference Optimisation (DPO), but it remains unclear whether later training uniformly degrades preferences learned earlier or whether the effect depends on the relationship between objectives. We study sequential DPO across four preference settings covering distributional conflict, multi-attribute interaction, strong safety signal, and compatible response-quality objectives. Using Llama-3.1-8B-Instruct with LoRA adapters, we evaluate all objectives after every stage with a fixed base-model reference. We find that sequential DPO does not produce a single forgetting pattern; preference change ranges from partial degradation to stability, pair-level redistribution, or positive transfer depending on objective relationship, signal strength, and training order. Pair-level analysis using length-normalised policy margins shows that aggregate metrics can mask heterogeneous changes across preference pairs, and quartile decomposition reveals that high-confidence pairs can either degrade or improve depending on the setting. Mechanistic diagnostics show that Stage 2 gradients and adapter updates are near-orthogonal to the previous objective across all settings, providing little evidence that direct gradient opposition is the primary driver. These findings suggest that future sequential alignment pipelines should account for objective compatibility and signal strength, rather than assuming that later objectives affect earlier preferences uniformly.
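Two quantities in this abstract lend themselves to a short sketch: a length-normalised margin for one preference pair, and the cosine similarity used to judge whether two update vectors are near-orthogonal. The sketch assumes the margin is the mean per-token log-ratio (policy vs. reference) of the chosen response minus that of the rejected response; the paper's exact definition may differ, and the log-probability values are made up.

```python
import math


def mean_log_ratio(policy: list[float], ref: list[float]) -> float:
    """Average per-token log-probability ratio of the policy to the reference model."""
    if len(policy) != len(ref) or not policy:
        raise ValueError("token log-prob lists must be non-empty and aligned")
    return sum(p - r for p, r in zip(policy, ref)) / len(policy)


def length_normalised_margin(policy_chosen: list[float], ref_chosen: list[float],
                             policy_rejected: list[float],
                             ref_rejected: list[float]) -> float:
    """Chosen minus rejected, each normalised by its own token count (assumed form)."""
    return (mean_log_ratio(policy_chosen, ref_chosen)
            - mean_log_ratio(policy_rejected, ref_rejected))


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two flattened gradient or adapter-update vectors;
    values near 0 indicate near-orthogonal directions."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / norm


# Made-up per-token log-probs for one pair (chosen: 2 tokens, rejected: 3 tokens).
margin = length_normalised_margin(
    policy_chosen=[-1.0, -0.5], ref_chosen=[-1.2, -0.8],
    policy_rejected=[-2.0, -1.5, -1.0], ref_rejected=[-1.8, -1.2, -0.9],
)
print(round(margin, 4))  # 0.45
print(cosine_similarity([1.0, 0.0, 2.0], [0.0, 3.0, 0.0]))  # 0.0
```

Dividing by token count keeps longer responses from dominating the margin simply by accumulating more log-probability terms; sorting pairs by this margin and splitting them into quartiles is one way to set up a quartile-level comparison.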

2606.19640 2026-06-19 cs.CL cs.AI cs.HC cross-list

Creating Multilingual Mental Health Dialogue Datasets: Limits of Persona-Based Localization via Nationality and Language

Yunkai Xu, Saeed Abdullah

Affiliations: Pennsylvania State University

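AI summary: Generates Mandarin, Bengali, and Hindi clinical dialogues by modifying the nationality and language parameters of personas, finding that adding these parameters alone introduces clinical inconsistency across languages and that LLMs assess depression severity inaccurately in non-English texts.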

Comments 15 pages, 4 figures. Accepted to the 2026 Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2026), co-located with ACL 2026

Abstract

AI and large language models (LLMs) have emerged as promising tools to address global mental health challenges. Despite the global nature of these challenges, there remains a critical shortage of high-quality datasets for training and evaluating such systems. To mitigate this gap, researchers increasingly generate synthetic clinical personas to simulate user data and test digital mental health support systems. However, most validated personas rely on English-centric contexts. This paper investigates whether similar persona-based methods can be used to generate multilingual mental health datasets. We modified nationality and language parameters in personas to generate clinical dialogues in Mandarin, Bengali, and Hindi. We then examined how different LLMs perform when evaluating the depression severity of these generated multilingual datasets against the baseline in English. Our findings indicate that just adding nationality and language parameters in personas might not be adequate, as it can introduce clinical inconsistency across languages. LLM judge models often exhibit inaccuracies in assessing depression severity in non-English texts, with performance varying across different models. This exposes the systemic limitations of applying English-centric personas to multilingual contexts. Ultimately, our work highlights the urgent need for culturally responsive data generation to ensure equitable mental health systems globally.

2606.19388 2026-06-19 cs.SE cs.CL cs.HC cross-list

Beyond the GUI Paradigm: Do Mobile Agents Need the Phone Screen?

Li Gu, Zihuan Jiang, Linqiang Guo, Zhixiang Chi, Ziqiang Wang, Huan Liu, Yuanhao Yu, Tse-Hsun Chen, Yang Wang

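AI summary: Challenges the GUI-dominated paradigm for mobile agents, arguing that the CLI deserves equal consideration; experiments show the best CLI agent configuration outperforming GUI baselines on AndroidWorld and MobileWorld, and the CLI-Advantage Task Suite is introduced to demonstrate the CLI's advantages.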

Abstract

Recent advances in mobile agents are dominated by the GUI paradigm, in which agents perceive UI information and emit screen interactions. However, mobile platforms also expose a command-line interface (CLI) that provides direct access to device services and data. We argue CLI deserves first-class consideration alongside GUI. We evaluate three coding agents (Claude Code, Terminus-2, mini-swe-agent) across four model APIs on AndroidWorld and MobileWorld without any mobile-specific post-training, comparing against three reproducible GUI baselines (GUI-Owl-1.5-32B, MAI-UI, Qwen3-VL-32B). Claude Code (Opus 4.7) reaches 71.8% and 51.9%, outperforming every reproducible GUI baseline (69.3/68.1/57.8% on AndroidWorld; 43.2/26.3/13.3% on MobileWorld), while every other CLI configuration remains competitive. To establish the paradigm's ceiling, we provide oracle CLI solutions that reach 88.8% on AndroidWorld (103/116 tasks CLI-solvable) and 86.3% on MobileWorld (101/117 tasks CLI-solvable), indicating substantial room for future improvement. To cover everyday user intents beyond the GUI scope, we introduce the CLI-Advantage Task Suite, comprising 45 templates across five categories: bulk operations, multi-condition filtering, aggregation, cross-app workflows, and hidden device state. Every CLI agent outperforms every GUI baseline in all five categories, with substantially fewer steps per task (10.7 vs. 18.6). To support future research on mobile CLI agents, we will open-source agent implementations, oracle solutions, the CLI-Advantage suite, and evaluation infrastructure.

2606.18413 2026-06-19 cs.AI cs.HC cross-list

Searching for Synergy in Shared Workspace Human-AI Collaboration

Nachiket Kotalwar, Rohini Das, Carolyn Rose

Affiliations: Carnegie Mellon University

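AI summary: Studies human-AI teamwork in a shared workspace; experiments in the Collaborative Gym environment show that adding collaborators can lower performance when teams lack a coordination structure, while scaffolding that combines shared memory with simulated human-in-the-loop gates improves team performance.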

Comments Accepted at ICML 2026 Workshop on Human-AI Co-Creativity

Abstract

Automated AI agents are increasingly capable, yet many scientific and professional tasks require human judgment and contextual expertise. We study shared-workspace human-AI teams, where AI agents and human collaborators must coordinate responsibilities before submitting a final answer. Using the Collaborative Gym environment with DiscoveryBench tasks, we examine when adding simulated human collaborators improves performance and when process loss turns additional collaborators into coordination overhead. Across 1,482 sessions, adding relevant collaborators can lower performance when teams lack structure to coordinate their contributions. We then evaluate scaffolding that combines shared group memory with simulated human-in-the-loop (HITL) gates, where selected actions require approval from a designated simulated participant. This scaffolding yields higher mean performance, most clearly in three-person teams, with clearer responsibility signals and stronger routing of expertise to team actions. Overall, how human-AI teams coordinate and integrate expertise matters as much as the capability available to them.
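The gating mechanism, in which selected actions require approval from a designated simulated participant, can be sketched as a table that maps gated action kinds to an approver. This is a minimal illustration only: the `Action` and `HitlGate` classes, the action kinds, and the approval rule are hypothetical and do not reflect the Collaborative Gym API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str     # hypothetical action type, e.g. "submit_answer"
    actor: str    # who proposed the action
    payload: str  # action content


# An approver inspects a proposed action and returns True to allow it.
Approver = Callable[[Action], bool]


@dataclass
class HitlGate:
    """Holds gated action kinds until their designated approver signs off."""
    gated: dict[str, tuple[str, Approver]]  # action kind -> (approver name, callback)
    log: list[str] = field(default_factory=list)

    def review(self, action: Action) -> bool:
        """Return True if the action may proceed, recording the decision."""
        if action.kind not in self.gated:
            self.log.append(f"{action.actor}: {action.kind} allowed (ungated)")
            return True
        approver_name, approve = self.gated[action.kind]
        allowed = approve(action)
        verdict = "approved" if allowed else "rejected"
        self.log.append(f"{action.actor}: {action.kind} {verdict} by {approver_name}")
        return allowed


# Hypothetical policy: a designated "domain_expert" must approve final answers
# and rejects any submission with an empty payload.
gate = HitlGate({"submit_answer": ("domain_expert", lambda a: bool(a.payload.strip()))})
print(gate.review(Action("edit_notebook", "ai_agent", "add plot")))  # True
print(gate.review(Action("submit_answer", "ai_agent", "")))          # False
```

Keeping the log alongside each decision mirrors the idea that gating also produces clearer responsibility signals: every gated action records who approved or rejected it.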

2605.10898 2026-06-19 cs.HC replacement

How Creatives Approach GenAI Image Generation: Tensions Between Structured Guidance, Self-Experimentation, and Creative Autonomy

Haidan Liu, Isabelle Kwan, Taiga Okuma, Jeffrey Loverock, Nicholas Vincent, Parmit K Chilana

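AI summary: Examines how creatives balance structured guidance and self-experimentation when using generative AI image tools, finding that although guidance helps them understand AI, many still prefer self-exploration to preserve creative freedom.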

Comments Accepted at ACM Creativity & Cognition 2026

DOI: 10.1145/3803784.3807570

Abstract

As generative AI tools increasingly influence creative practice, they raise longstanding HCI questions about how creatives learn complex software and how they can be better supported. We conducted an interview study with artists and hobbyists (n=8) and a follow-up survey (n=159) to understand how this population approaches and seeks guidance for GenAI image tools. We found that creatives commonly use either self-experimentation or tutorials to explore GenAI tools, yet many struggle with confusing AI terminology. To gain further insight into creatives' learning experiences, we developed a research probe to elicit creatives' perceptions of structured guidance. Our user study with 17 creatives revealed that, even when creatives described the guidance as helpful for understanding AI, many still preferred self-experimentation, feeling that guidance could limit their creativity. Our findings highlight a central tension in supporting AI literacy for creatives: balancing guidance and promoting literacy while preserving creative freedom.

2605.09550 2026-06-19 cs.HC replacement

Who embraces AI in play? Exploratory modeling of player preference profiles toward game AI

Ting-Chen Hsu, Jiangxu Lin, Wenran Chen, Zheyuan Zhang, Fei Qin

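AI summary: Using questionnaire data and archetypal analysis (AA), reveals cross-context profiles of player acceptance of game AI, identifies seven typical groups, and explores how membership relates to factors such as AI literacy and gaming habits.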

Comments Accepted to 2026 IEEE Conference on Games (IEEE CoG 2026)

Abstract

Artificial intelligence is increasingly entering digital games through diverse functions. While prior work has shown that player attitudes toward game AI are strongly context-dependent, less is known about how these attitudes are structurally combined within different groups of players. This study addresses this gap by modeling players' cross-context AI acceptance as interpretable attitude profiles. Based on questionnaire data from 771 digital game players, we apply Archetypal Analysis (AA) to centered acceptance ratings across eight representative AI application contexts in games. The analysis identifies seven distinctive profiles: AI-Skeptics, Broad AI-Supporters, Creative-Play Explorers, Experience-Oriented Supporters, Systemic Order Advocates, Emotion-Centered Supporters, and Governance-Skeptics. Exploratory one-vs-rest (OvR) logistic regressions further suggest that profile membership is associated with players' perceived AI literacy, gaming habits, disciplinary background, personality traits, and application-specific priorities. By shifting attention from isolated acceptance judgments to patterned preference structures, this study provides an exploratory empirical vocabulary for segmenting game AI audiences and offers preliminary design implications for more context-sensitive and player-sensitive AI integration in digital games.
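Archetypal analysis describes each respondent as a convex combination (non-negative weights that sum to 1) of a small number of archetypes, which are themselves convex combinations of respondents. The sketch below covers only the assignment step, under two assumptions: that "centered" means subtracting each respondent's own mean across the eight contexts, and that the archetypes have already been fitted. It estimates one respondent's weights by projected gradient descent onto the probability simplex and reports the dominant archetype; the full fit, which alternates this step with an archetype update, is omitted, and all numbers are made up.

```python
def center(ratings: list[float]) -> list[float]:
    """Within-respondent centering (assumed): subtract the respondent's own mean."""
    mean = sum(ratings) / len(ratings)
    return [r - mean for r in ratings]


def project_to_simplex(v: list[float]) -> list[float]:
    """Euclidean projection onto {w : w_i >= 0, sum(w) = 1} via the sort-based method."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate  # keep the threshold for the largest valid j
    return [max(x - theta, 0.0) for x in v]


def mixture_weights(x: list[float], archetypes: list[list[float]],
                    steps: int = 2000) -> list[float]:
    """Minimise ||x - sum_k w_k z_k||^2 over the simplex by projected gradient descent."""
    k, d = len(archetypes), len(x)
    # Step size 1/L, with L = 2 * (sum of squared archetype norms) bounding the
    # gradient's Lipschitz constant, so every step is safe.
    lr = 1.0 / (2.0 * sum(v * v for z in archetypes for v in z))
    w = [1.0 / k] * k
    for _ in range(steps):
        resid = [x[i] - sum(w[a] * archetypes[a][i] for a in range(k)) for i in range(d)]
        grad = [-2.0 * sum(archetypes[a][i] * resid[i] for i in range(d)) for a in range(k)]
        w = project_to_simplex([w[a] - lr * grad[a] for a in range(k)])
    return w


# Three hypothetical, already-fitted archetypes over eight AI application contexts
# (centered, so each sums to zero) and one respondent's raw ratings on an assumed 1-7 scale.
archetypes = [
    [2, 2, -1, -1, -1, -1, 0, 0],
    [-1, -1, 2, 2, -1, -1, 0, 0],
    [-1, -1, 0, 0, 1, 1, 1, -1],
]
raw = [5.1, 5.1, 3.7, 3.7, 3.2, 3.2, 4.1, 3.9]
weights = mixture_weights(center(raw), archetypes)
dominant = max(range(len(weights)), key=weights.__getitem__)
print([round(w, 3) for w in weights])  # [0.7, 0.2, 0.1]
print(dominant)  # 0
```

Centering removes each respondent's overall leniency, so the weights reflect the relative pattern of acceptance across contexts rather than its level.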

2507.00875 2026-06-19 cs.CL cs.HC cs.MA replacement

TransLaw: A Large-Scale Dataset and Multi-Agent Benchmark Simulating Professional Translation of Hong Kong Case Law

Xi Xuan, Chunyu Kit

Affiliations: City University of Hong Kong, Hong Kong SAR, China

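AI summary: Addressing the scarcity of resources for English-to-Chinese translation of Hong Kong case law and its strict requirements on legal terminology and format, builds HKCFA Judgment 97-22, the first large-scale sentence-aligned parallel corpus for this task, and proposes TransLaw, a multi-agent framework that decomposes translation and integrates a legal glossary and retrieval-augmented generation; it significantly improves translation quality but still falls short of human experts in stylistic naturalness.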

Comments Accepted at ICML 2026 - AI for Law

Abstract

Translating Hong Kong Court Judgments from English to Traditional Chinese is mandated by Articles 8-9 of the Basic Law, yet remains constrained by a shortage of parallel resources and rigorous demands on legal terminology, citation format, and judicial style. We introduce HKCFA Judgment 97-22, the first large-scale sentence-aligned parallel corpus for HK case law, comprising 344 professionally translated judgments (11,099 sentence pairs; 2.1M tokens) spanning 1997-2022. Building on this resource, we propose TransLaw, a multi-agent framework that decomposes translation into word-level expression, sentence-level translation, and multidimensional review, integrating a specialized Hong Kong legal glossary database, Retrieval-Augmented Generation, and iterative feedback, with four-dimensional expert review covering semantic alignment, terminology, citation, and style. Benchmarking 13 open-source and commercial LLMs, we demonstrate that TransLaw significantly outperforms single-agent baselines across all evaluated models, with convergence within 3 iterations. Human evaluation by 10 certified legal translators using our proposed Legal ACS metric confirms gains in legal-semantic accuracy, while showing that TransLaw still trails human experts in stylistic naturalness. The dataset and benchmark code are available at https://github.com/xuanxixi/TransLaw.

2503.20646 2026-06-19 cs.HC cs.RO cs.SY eess.SY replacement

Immersive and Wearable Thermal Rendering for Augmented Reality

Alexandra Watkins, Ritam Ghosh, Evan Chow, Nilanjan Sarkar

Affiliations: Vanderbilt University

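AI summary: Proposes a palm-mounted thermal feedback prototype that delivers immersive thermal haptics in augmented reality through indirect feedback, active thermal passthrough, and spatiotemporally varying rendering; experiments demonstrate its feasibility and clarify its tradeoffs.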

Abstract

We present a proof-of-concept palm-mounted thermal feedback prototype addressing thermal rendering challenges specific to augmented reality (AR), where users must interact with both real and virtual objects in their physical workspace. In contrast to thermal feedback systems developed for virtual reality, AR thermal feedback must preserve manual dexterity, maintain access to real-world thermal cues, and provide coherent virtual temperature sensations without obstructing natural object interaction. We propose three AR-specific design considerations, which our prototype implements: indirect feedback to preserve fingertip dexterity, active thermal passthrough to sense and render the temperature of contacted physical surfaces, and spatially and temporally varying thermal rendering across the palm. Human-subject experiments evaluated perceptual sensitivity, indirect feedback, active thermal passthrough, spatial pattern recognition, and moving thermal rendering during AR interaction. Results showed that although indirect feedback reduced perceived realism during visual contact at the fingertips, it did not reduce immersion or comfort; active thermal passthrough supported temperature discrimination between real and rendered surfaces; and spatiotemporal rendering significantly improved immersion and realism compared with static thermal stimulation. These findings suggest that our design considerations are viable design strategies for AR thermal haptics, while also clarifying tradeoffs for applications that require precise realism versus broader immersive thermal experience.

2501.10847 2026-06-19 cs.HC replacement

A Survey on Conceptual model of Enterprise ontology

Zeinab Rajabi, Seyed Mohsen Rahnamafard

Journal ref Knowledge Economy Studies, 2026

2408.06349 2026-06-19 cs.HC replacement

Functional near-infrared spectroscopy (fNIRS) and Eye tracking for Cognitive Load classification in a Driving Simulator Using Deep Learning

Mehshan Ahmed Khan, Houshyar Asadi, Mohammad Reza Chalak Qazani, Chee Peng Lim, Saied Nahavandi

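AI summary: Uses fNIRS and eye-tracking data with a CNN-LSTM deep learning model to predict cognitive load in low-visibility driving simulation, reaching 99% accuracy with neurological data and supporting real-time assessment of driver mental state and the development of safe adaptive systems.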

Comments Presented at DSC 2024 (Strasbourg, France). Conf: https://dsc2024.org/ Paper link: https://proceedings.driving-simulation.org/proceeding/dsc-2024/functional-near-infrared-spectroscopy-fnirs-and-eye-tracking-for-cognitive-load-classification-in-a-driving-simulator-using-deep-learning/

Journal ref Proc. Driving Simulation Conference 2024 Europe VR (DSC 2024), pp. 47-56

Abstract

Motion simulators allow researchers to safely investigate the interaction of drivers with a vehicle. However, many studies that use driving simulator data to predict cognitive load employ only two levels of workload, leaving a gap in research on employing deep learning methodologies to analyze cognitive load, especially in challenging low-light conditions. Studies often overlook such conditions and focus solely on bright-daylight scenarios. To address this gap and understand the correlation between performance and cognitive load, this study employs functional near-infrared spectroscopy (fNIRS) and eye-tracking data, including fixation duration and gaze direction, during simulated driving tasks in low-visibility conditions that induce various mental workloads. The first stage involves the statistical estimation of useful features from fNIRS and eye-tracking data. ANOVA is applied to the signals to identify significant fNIRS channels. Optimal features from fNIRS, eye tracking and vehicle dynamics are then combined in one chunk as input to the CNN-LSTM model to predict workload variations. The proposed CNN-LSTM model achieved 99% accuracy with neurological data and 89% with vehicle dynamics in predicting cognitive load, indicating potential for real-time assessment of driver mental state and for guiding designers in developing safe adaptive systems.
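The channel-screening step can be illustrated with a one-way ANOVA F statistic computed per fNIRS channel across workload levels: a larger F means the channel's mean response differs more between levels relative to its spread within each level. This pure-Python sketch computes only F; the channel names, the three workload levels, and the readings are hypothetical, and p-values would normally come from a library routine such as `scipy.stats.f_oneway`.

```python
def one_way_anova_f(groups: list[list[float]]) -> float:
    """F = between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    ms_between = ss_between / (k - 1)       # df = k - 1
    ms_within = ss_within / (n_total - k)   # df = N - k
    return ms_between / ms_within


# Hypothetical per-trial features for two channels, grouped by three workload levels.
channels = {
    "ch01": [[0.10, 0.12, 0.11], [0.20, 0.22, 0.21], [0.31, 0.29, 0.30]],
    "ch02": [[0.15, 0.25, 0.20], [0.18, 0.22, 0.21], [0.19, 0.24, 0.17]],
}
ranked = sorted(channels, key=lambda c: one_way_anova_f(channels[c]), reverse=True)
print(ranked)  # ['ch01', 'ch02']
```

Here ch01 separates the workload levels cleanly while ch02's level means barely differ, so ch01 ranks first; a real pipeline would keep channels whose p-value falls below a chosen threshold.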
