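arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.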

2606.00107 2026-06-02 eess.SP cs.AI cs.LG

Motif-based morphology signatures for interpretable ECG screening and monitoring

Nivedita Bijlani, Mauricio Villarroel

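Affiliations: The Podium Institute of Sports Medicine and Technology

AI summary: Proposes a motif-based framework that defines interpretable beat-aligned motifs and three drift metrics to quantify morphological change and detect anomalies in short- and long-term ECG monitoring.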

Comments Accepted to the IEEE Engineering in Medicine and Biology Conference (EMBC) 2026

Abstract

Electrocardiography (ECG) remains central to cardiovascular screening, yet interpretation remains largely manual and episodic. Clinical practice relies on brief resting ECGs and, when required, long-duration ambulatory recordings, both generating data that require resource-intensive review. Consequently, subtle morphological changes or progressive drift preceding clinically apparent abnormalities may go unnoticed. We propose a motif-based framework that defines beat-aligned ECG motifs as interpretable cardiac signatures and quantifies morphological drift and deviation across short and long-term monitoring. Motifs are representative cardiac cycles capturing dominant morphology. We introduce three interpretable drift metrics: deviation from a normal sinus rhythm (NSR), deviation from a personalised baseline, and a motif instability index. Motifs are extracted by selecting beats that minimise Dynamic Time Warping (DTW) distance within fixed windows. We evaluate these metrics on short (PTB-XL) and long-duration (MIT-BIH Arrhythmia) ECG datasets. Interpretability is achieved through representative motif overlays and fiducial-based visualisations, enabling direct inspection of morphological changes. In MIT-BIH, the proposed metrics significantly separated predominantly normal from arrhythmic subjects (p<0.01). In PTB-XL, NSR deviation distinguished normal from abnormal ECGs across major diagnostic subtypes (p<1e-4, Cliff's delta up to 0.93). ECG motifs provide an interpretable representation of cardiac morphology, supporting scalable longitudinal monitoring and early detection of morphology-driven change.
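The motif-extraction step described in the abstract (picking the beat that minimises DTW distance within a window) can be sketched as a medoid selection. This is a minimal illustration, not the authors' implementation: the function names `dtw_distance` and `select_motif`, the absolute-difference local cost, and the choice of "smallest total distance to the other beats" as the selection rule are all assumptions.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW with an absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def select_motif(beats: List[Sequence[float]]) -> int:
    """Index of the beat with the smallest total DTW distance to the others (assumed rule)."""
    totals = [
        sum(dtw_distance(beat, other) for j, other in enumerate(beats) if j != i)
        for i, beat in enumerate(beats)
    ]
    return min(range(len(beats)), key=totals.__getitem__)


# Three toy "beats" from one window; the third is a morphological outlier.
window = [
    [0.0, 1.0, 0.0, -0.2, 0.0],
    [0.0, 0.9, 0.1, -0.2, 0.0],
    [0.0, 0.2, 1.5, 0.8, 0.0],
]
motif_index = select_motif(window)
```

Under these assumptions the outlier beat is never chosen as the motif, because its total distance to the other beats is the largest in the window.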

2606.00106 2026-06-02 eess.SP cs.AI cs.HC cs.LG

A Methodological Framework for Explicit Control of the Speed-Accuracy Trade-off in Brain-Computer Interfaces

Javier Jiménez, Francisco B Rodríguez

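Affiliations: Grupo de Neurocomputación Biológica, Departamento de Ingeniería Informática, Universidad Autónoma de Madrid

AI summary: Proposes an evaluation framework independent of classifier, paradigm, and early-stopping strategy that uses two metrics, Gain and Conservation, and a tunable parameter α to control the speed-accuracy trade-off explicitly, validated on P300 paradigms.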

Abstract

Brain-computer interfaces (BCIs) are limited by low signal-to-noise ratio in modalities such as electroencephalography, which requires multiple trials to reliably decode user intentions. This induces a speed-accuracy trade-off, whereby higher accuracy comes at the cost of speed. The speed-accuracy balance is application-dependent, motivating controllable trade-offs. Conventional metrics, such as the Information Transfer Rate, combine speed and accuracy, obscuring their dependence and potentially introducing biases. In this study, we propose an evaluation framework independent of classifier, paradigm, and early-stopping strategy that separates speed and accuracy. We employ two measures, Gain (relative speed improvement) and Conservation (relative accuracy preservation), and combine them into a tunable Gain-Cons Balance controlled by α, regulating the speed-accuracy trade-off. The parameter adjusts the operating point without modifying the classifier, facilitating deployment across scenarios. The framework was evaluated on P300 event-related potential paradigms using public recordings from 63 subjects as well as multiple classifiers and early-stopping strategies to achieve distinct operating points in speed-accuracy and bitrate. Results show that tuning α yields fast, accurate, or balanced BCI behaviours, demonstrating explicit control of the speed-accuracy trade-off. The method supports subject-level performance prediction and improves explainability of BCI behaviour. Further analysis of the Information Transfer Rate reveals a systematic bias toward speed, explained by the proposed framework through the Gain and Conservation measurements. Overall, this work establishes the speed-accuracy trade-off as a controllable design variable validated on public P300-based paradigms, enabling transparent evaluation and application-specific optimization of BCIs.
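The abstract does not give the formulas for Gain, Conservation, or their α-weighted balance, so the following is only a hypothetical sketch of how such a tunable operating point could work. The definitions used here (Gain as the fraction of trials saved, Conservation as the ratio of retained accuracy, and a convex combination weighted by α) are assumptions made for illustration, as are all names and numbers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    trials_used: float  # mean number of trials consumed before stopping
    accuracy: float     # accuracy reached at that stopping point


def gain(point: OperatingPoint, full_trials: float) -> float:
    """Assumed definition: fraction of trials saved relative to using all trials."""
    return 1.0 - point.trials_used / full_trials


def conservation(point: OperatingPoint, full_accuracy: float) -> float:
    """Assumed definition: accuracy retained relative to the full-trial accuracy."""
    return point.accuracy / full_accuracy


def balance(point: OperatingPoint, alpha: float,
            full_trials: float, full_accuracy: float) -> float:
    """Assumed combination: alpha weights speed, (1 - alpha) weights accuracy."""
    return (alpha * gain(point, full_trials)
            + (1.0 - alpha) * conservation(point, full_accuracy))


def best_point(points: Sequence[OperatingPoint], alpha: float,
               full_trials: float = 10.0, full_accuracy: float = 0.95) -> OperatingPoint:
    """Pick the operating point with the highest balance for a given alpha."""
    return max(points, key=lambda p: balance(p, alpha, full_trials, full_accuracy))


# Hypothetical early-stopping configurations of one fixed classifier.
points = [
    OperatingPoint("fast", trials_used=3.0, accuracy=0.80),
    OperatingPoint("balanced", trials_used=6.0, accuracy=0.91),
    OperatingPoint("accurate", trials_used=9.0, accuracy=0.95),
]
```

With these toy numbers, `best_point(points, 0.9)` selects the fast configuration, `best_point(points, 0.2)` the balanced one, and `best_point(points, 0.1)` the accurate one, which mirrors the idea that α moves the operating point without touching the classifier.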

2606.00084 2026-06-02 cs.IR cs.AI cs.CL cs.LG

SentimentLens: Reconciling Sentiment and Ratings via Dual-Modality in the Hospitality Sector

Dineth Jayakody, Pasindu Thenahandi, Sampath Jayarathna

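Affiliations: University of Peradeniya

AI summary: Presents SentimentLens, a system based on aspect-based sentiment analysis that extracts knowledge from unstructured hotel reviews and reconciles textual sentiment with numerical ratings to identify operational conflicts and service-improvement opportunities.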

Abstract

Online travel platforms generate vast volumes of user-generated hotel reviews, offering rich opportunities to understand traveler experiences at scale. However, transforming unstructured textual feedback into structured, actionable insights remains a challenging task. This paper presents SentimentLens, a scalable analysis system based on Aspect-Based Sentiment Analysis that performs knowledge extraction from unstructured hotel reviews and organizes them into interpretable service categories. SentimentLens integrates aspect term extraction, aspect sentiment classification, semantic category assignment, and multi-level analytical modules to support region-level, hotel-level, and category-level evaluation. The system is designed to operate across different geographic contexts and hospitality settings. To demonstrate its practical utility, we apply SentimentLens to a large real-world dataset of over 10,000 publicly available hotel reviews. Through extensive analysis, the framework reveals how traveler sentiment varies across regions, service categories, and hotel archetypes. We further implement a cross-modal reconciliation of textual sentiment and numerical ratings to identify latent operational conflicts, structural inconsistencies in service quality, and high-impact improvement opportunities using importance-performance and entropy-based analyses. The results show that SentimentLens effectively transforms large-scale unstructured reviews into actionable intelligence, supporting data-driven decision-making for hospitality management and tourism policy. While demonstrated using a national case study, the proposed system is generalizable to other destinations and review-driven service domains.

2606.00074 2026-06-02 eess.SP cs.AI cs.LG

CLSP-REQA: A Real-Time Quality-Aware Closed-Loop Seizure Prediction Framework with Mamba-BiLSTM and Confidence-Gated Intervention

Mufeng Chen, Qi Wu, Bingchao Huang, Xiwen Lai, Zekai Chen, Xinge Ouyang, Quansheng Ren

Affiliations: Department of Engineering Science, University of Oxford; Mathematical Institute, University of Oxford; School of Computer Science and Engineering, Beihang University; Aerospace Information Research Institute, Chinese Academy of Sciences; Department of Mechanical Engineering, The University of British Columbia; College of Life Sciences, Hunan Normal University; School of Electronics, Peking University

AI summary: Proposes the CLSP-REQA framework, which embeds a real-time EEG quality assessment module alongside a Mamba-BiLSTM backbone and a tiered non-linear fusion function, and reports better seizure prediction performance than existing methods under strict cross-patient evaluation.

Comments 27 pages, 8 figures, submitted to Biomedical Signal Processing and Control

Abstract

Reliable seizure prediction is a prerequisite for closed-loop neurostimulation therapy, yet existing methods rarely account for the variability in EEG signal quality encountered in real-world deployment, and the overwhelming majority adopt non-strict evaluation protocols that overestimate generalisation performance. We propose CLSP-REQA (Closed-Loop Seizure Prediction with Real-time EEG Quality Assessment), a unified framework that embeds a lightweight signal quality estimator directly within the prediction pipeline. A Real-time EEG Quality Assessment (REQA) module runs in parallel with a Mamba-BiLSTM backbone, producing a scalar quality score q ∈ [0,1] that modulates output confidence through a tiered non-linear fusion function (ECLO). Under strict cross-patient evaluation on the CHB-MIT Scalp EEG Database (n = 23 subjects, 198 seizures), CLSP-REQA achieves an AUC-ROC of 0.7426 ± 0.0199, outperforming the unadapted cross-patient baseline of 0.69 reported by Jemal et al., using only 16 EEG channels compared to 23 in prior work, and without requiring any target-patient data or domain adaptation. On the SIENA Scalp EEG Database (n = 14 subjects, 47 seizures), CLSP-REQA achieves an AUC of 0.7012 ± 0.0249, substantially surpassing the best domain-adapted cross-patient result of 0.61 on the same dataset, demonstrating strong cross-dataset generalisation. The framework outputs a structured four-tuple (p, q, c, Phi_SHAP) directly compatible with closed-loop neurostimulator interfaces.

2606.00073 2026-06-02 cs.NE cs.AI cs.LG

Rare Events, Real Signals: Functional Ensembles as Units of Computation in Deep Spiking Networks

Aditi Aravind, Konstantinos Ladakis, Mario Alexios Savaglio, Stelios M. Smirnakis, Maria Papadopouli

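Affiliations: University of Crete; Foundation of Research & Technology - Hellas; Archimedes Research Unit; Harvard Medical School; Brigham and Women’s Hospital

AI summary: Introduces a functional-connectivity analysis framework to study emergent functional ensembles in deep spiking neural networks, finding that cofiring of first-order functionally connected ensembles reliably predicts downstream neuronal responses and that information encoding is concentrated in rare but highly coordinated activity patterns.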

Abstract

We investigate how internal representations emerge across hierarchical processing systems by introducing a neuroscience-inspired framework for analyzing deep spiking neural networks (SNNs) through the lens of functional connectivity. Drawing on concepts from systems neuroscience and information theory, we form the first-order functionally-connected (1FC) group of a neuron based on its statistically significant pairwise correlations with neurons from the previous layer of a trained SNN architecture. We then track its response properties during inference under various conditions. Our analysis shows that several principles of functional connectivity previously observed in biological cortex are preserved in spiking ResNet architectures. These 1FC ensembles display interesting properties: their aggregate cofiring reliably predicts downstream neuronal responses through a robust, ReLU-like input-output relationship, whose gain scales systematically with ensemble size. Reliable encoding of the presented class emerges only during high 1FC cofiring events, which themselves occur infrequently, indicating that informative representations are concentrated in rare but highly coordinated activity patterns. Under uniform random noise or adversarial perturbations, these response profiles are disrupted, particularly in early and intermediate layers. This enables a targeted high-resolution interrogation at specific nodes and pathways. We show that the functional connectivity structure is shaped by learning and that this structure breaks under weight permutation. These findings establish 1FC ensembles as a functionally meaningful substrate for input encoding and information transfer, with potential implications for designing targeted fine-grained diagnostics of the information flow.

2606.00065 2026-06-02 cs.IR cond-mat.mtrl-sci cs.AI cs.CL

Beyond Text and Tables: Vision-Language Model Integration in ComProScanner for Extracting Materials Data from Scientific Figures with High Accuracy

Aritra Roy, Enrico Grisan, Chiara Gattinoni, John Buckeridge

Affiliations: Energy, Materials and Environment Research Centre, London South Bank University, London SE1 0AA, UK; School of Engineering and Design, London South Bank University, London SE1 0AA, UK; Bioscience and Bioengineering Research Centre, London South Bank University, London SE1 0AA, UK; Department of Physics, King's College London, London WC2R 2LS, UK

AI summary: Extends the ComProScanner framework with vision-language models to extract composition-property data automatically from scientific figures, reaching a composition accuracy and normalised F1 score of 0.97 on a piezoelectric ceramics dataset, and introduces a range-based error threshold for evaluation.

Comments 18 pages, 3 figures

Abstract

Automated extraction of materials composition-property data from scientific literature has advanced considerably with the development of large language model-based pipelines; however, existing frameworks remain limited to textual and tabular content, overlooking the substantial proportion of quantitative property data reported exclusively in scientific figures. Here, we extend ComProScanner, a fully end-to-end multi-agent framework for automated composition-property database construction, with a native vision-language model (VLM) based figure extraction capability. The extension introduces a FigureExtractor utility for caption-keyword-based figure filtering across all supported publishers, and a GraphExtractorTool agent that passes extracted figures to a configurable VLM to recover composition-property pairs from scientific charts and plots. Four VLMs are selected for evaluation on the basis of the LMArena Diagram leaderboard with an input cost criterion of less than \$1.50 per million tokens. Benchmarking on 50 piezoelectric ceramic articles from the established $d_{33}$ test corpus demonstrates that Gemini-3-Flash-Preview achieves the highest performance with a composition accuracy of 0.97 and a normalised F1 score of 0.97, whilst remaining the most cost-effective model among the four evaluated. We additionally introduce a range-based value error threshold parameter into the evaluation framework, providing a more physically meaningful assessment of numeric property values extracted from figures than exact value matching. These contributions establish VLM-integrated ComProScanner as the first materials-specific, fully automated, multimodal literature mining platform capable of extracting structured composition-property data from text, tables, and figures within a single unified pipeline.
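The range-based value error threshold mentioned in the abstract can be illustrated with a small tolerance check. The abstract does not specify how the range is defined, so this sketch assumes a relative tolerance around the reference value; `values_match`, `value_accuracy`, and the sample d33 readings are hypothetical.

```python
from typing import Sequence, Tuple


def values_match(extracted: float, reference: float, rel_tol: float = 0.05) -> bool:
    """True when the extracted value lies within rel_tol (a fraction) of the reference."""
    return abs(extracted - reference) <= rel_tol * abs(reference)


def value_accuracy(pairs: Sequence[Tuple[float, float]], rel_tol: float) -> float:
    """Fraction of (extracted, reference) pairs accepted at the given tolerance."""
    if not pairs:
        return 0.0
    hits = sum(values_match(extracted, reference, rel_tol) for extracted, reference in pairs)
    return hits / len(pairs)


# Hypothetical d33 readings (pC/N): values read off a plot vs. reference values.
readings = [(388.0, 400.0), (152.0, 150.0), (610.0, 520.0)]
exact = value_accuracy(readings, rel_tol=0.0)
tolerant = value_accuracy(readings, rel_tol=0.05)
```

Values read from a plotted curve rarely equal the reference exactly, so exact matching accepts none of these toy readings while a 5% tolerance accepts two of the three.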

2606.00060 2026-06-02 q-fin.TR cs.CE cs.LG

Machine Learning-Based Bitcoin Trading Under Transaction Costs: Evidence From Walk-Forward Forecasting

Andrei Bysik, Robert Ślepaczuk

Affiliations: Quantitative Finance Research Group, Faculty of Economic Sciences, University of Warsaw; Quantitative Finance Research Group, Department of Quantitative Finance and Machine Learning, Faculty of Economic Sciences, University of Warsaw

AI summary: Studies whether XGBoost, LSTM, and iTransformer forecasts of hourly BTC-USDT returns can be turned into profitable trading strategies under transaction costs by means of a cost-aware execution filter.

Comments 42 pages

Abstract

This paper investigates whether machine learning forecasts of hourly BTC-USDT returns can be converted into economically meaningful trading performance after transaction costs. Using approximately 70,000 hourly observations from 2018-2026, XGBoost, LSTM, and iTransformer are evaluated in a 27-fold walk-forward protocol. All three models produce positive gross trading performance in selected configurations, but naive sign-based strategies fail once transaction costs of ten basis points are imposed. A cost-aware execution filter, which permits trades only when the forecast magnitude exceeds a transaction-cost-based threshold, sharply reduces turnover and restores profitability in selected configurations. The strongest long-only XGBoost strategy produces annualised returns above 65% with a Sharpe ratio above one. Additional tests show that technical indicators improve performance in selected cases, EGARCH-derived features do not provide uniformly robust gains, and XGBoost is descriptively stronger than the neural alternatives, although bootstrap evidence does not support formal statistical dominance. Loss-function and model-selection effects are secondary and statistically fragile. The results show that the main obstacle in hourly cryptocurrency trading is not only weak predictability, but also the way forecasts are converted into trades.
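One reading of the cost-aware execution filter is that a position change is allowed only when the absolute forecast exceeds a threshold derived from the transaction cost, with the previous position held otherwise. The sketch below follows that reading for a long-only strategy; the threshold rule (`multiplier * cost_bps`), the hold-otherwise behaviour, and the toy forecasts are assumptions rather than the paper's specification.

```python
from typing import List, Sequence


def filtered_positions(forecasts: Sequence[float], cost_bps: float = 10.0,
                       multiplier: float = 1.0) -> List[int]:
    """Long-only positions (1 = long, 0 = flat) under a cost-based trade filter."""
    threshold = multiplier * cost_bps / 10_000  # basis points -> return units
    position = 0
    positions = []
    for forecast in forecasts:
        # Only act on forecasts large enough to pay for the trade; otherwise hold.
        if abs(forecast) > threshold:
            position = 1 if forecast > 0 else 0
        positions.append(position)
    return positions


def turnover(positions: Sequence[int]) -> int:
    """Number of position changes, starting from a flat position."""
    previous = 0
    changes = 0
    for position in positions:
        changes += abs(position - previous)
        previous = position
    return changes


# Hypothetical hourly return forecasts.
forecasts = [0.0004, -0.0002, 0.0015, 0.0003, -0.0020, 0.0001]
naive = filtered_positions(forecasts, cost_bps=0.0)      # pure sign-based rule
filtered = filtered_positions(forecasts, cost_bps=10.0)  # 10 bps threshold
```

On these toy forecasts the sign-based rule changes position five times while the filtered rule changes twice, which is the turnover reduction the abstract refers to.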

2606.00056 2026-06-02 cs.CE cs.AI cs.LG physics.app-ph

Physics-Informed Neural Networks for Radial Consolidation of Combined Electroosmotic, Vacuum and Surcharge Preloading Considering Smear Effects

Dong Li, Yapeng Cao, Shuai Huang, Yujun Cui, Haiping Fu, Lu Yang, He Wei

Affiliations: Department of Civil, Environmental, and Infrastructure Engineering, George Mason University; State Key Laboratory of Cryospheric Science and Frozen Soil Engineering, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences; Laboratoire Navier/CERMES, École Nationale des Ponts et Chaussées, Institut Polytechnique de Paris; College of Water Conservancy and Hydropower Engineering, Hohai University; School of Geosciences and Info-physics, Central South University

AI summary: Proposes a dimensionless multi-domain physics-informed neural network framework whose modified gated model with hard-constraint boundary encoding solves electro-osmotic radial consolidation and achieves high accuracy under time-dependent loading.

Abstract

This study develops a dimensionless multi-domain physics-informed neural network (PINN) framework for electro-osmotic radial consolidation considering smear effects and combined vacuum and surcharge loading. Three PINN-based models are investigated: a standard soft-constrained PINN (Std-PINN), a modified gated PINN (Mod-PINN), and a modified gated PINN with hard-constraint boundary encoding (Mod-HC-PINN). The models are evaluated against FEM reference solutions under four loading cases, including constant vacuum, exponential vacuum, exponential vacuum with ramp surcharge, and exponential vacuum with cyclic haversine surcharge. The results indicate that the gated architecture applied in Mod-PINN improves the resolution of steep pressure gradients near the cathode and smear-zone interface under constant vacuum loading. Under time-dependent loading, the soft-constrained Mod-PINN shows reduced accuracy because it must learn multiple competing objectives simultaneously. The Mod-HC-PINN mitigates this issue by embedding the cathode boundary and initial conditions into the output structure, thereby reducing the optimization burden and improving physical consistency. The Mod-HC-PINN achieves MAE values of 0.43, 0.41, and 0.27 kPa for the exponential vacuum, ramp surcharge, and cyclic surcharge cases, respectively. Sensitivity analyses further demonstrate that the proposed framework remains robust across practical ranges of network architecture, collocation density, and permeability contrast.
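Embedding boundary and initial conditions into the output structure is commonly done with an ansatz that satisfies them by construction. The abstract does not state the exact form used, so the following is a generic illustration under assumed conditions: a dimensionless pore pressure $\bar{u}(\bar{r},\bar{t})$, a Dirichlet condition $\bar{u}(\bar{r}_c,\bar{t}) = g(\bar{t})$ at the cathode radius $\bar{r}_c$, an initial condition $\bar{u}(\bar{r},0) = u_0(\bar{r})$, and compatibility $g(0) = u_0(\bar{r}_c)$. All symbols are hypothetical.

```latex
\begin{aligned}
\hat{u}(\bar{r},\bar{t})
  &= u_0(\bar{r}) + \bigl[g(\bar{t}) - g(0)\bigr]
     + \bar{t}\,(\bar{r} - \bar{r}_c)\,\mathcal{N}_\theta(\bar{r},\bar{t}), \\[4pt]
\hat{u}(\bar{r},0)
  &= u_0(\bar{r}) + 0 + 0 = u_0(\bar{r}), \\[4pt]
\hat{u}(\bar{r}_c,\bar{t})
  &= u_0(\bar{r}_c) + g(\bar{t}) - g(0) + 0 = g(\bar{t}).
\end{aligned}
```

Here $\mathcal{N}_\theta$ is the raw network output. Because the factor $\bar{t}\,(\bar{r} - \bar{r}_c)$ vanishes at $\bar{t}=0$ and at $\bar{r}=\bar{r}_c$, both conditions hold for any network weights, so the corresponding loss terms can be dropped and training only has to minimise the PDE residual and any remaining interface or boundary terms.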

2606.00051 2026-06-02 cs.CY cs.AI

Business Utility of Large Language Models as Exploratory Data Analysis Agents

Rafał Łabędzki, Patryk Miziuła, Hubert Rutkowski, Szymon Betlewski, Cezary Depta, Szymon Janowski, Jarosław Kochanowicz, Jan Kanty Milczek

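Affiliations: deepsense.ai; SGH Warsaw School of Economics; Bydgoszcz University of Science and Technology; Google

AI summary: Uses a benchmark built on an agent-based supply chain simulation to evaluate the average performance and repeatability of LLMs as EDA agents in business settings, proposes the risk-adjusted Business utility metric, and finds that most configurations are unreliable, with GPT-5.4 performing best.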

Abstract

Large Language Models (LLMs) are increasingly used in analytical workflows, but their suitability as exploratory data analysis (EDA) agents in business settings remains uncertain. In practice, a deployable EDA agent must provide not only useful average performance but also sufficient repeatability to support trust in its outputs. We evaluate this requirement in a controlled, business-relevant benchmark built on an agent-based supply chain simulation. The task is to identify supplier-product combinations responsible for low quality and downstream sales loss by reasoning from indirect operational traces rather than from explicit labels. Fifteen model-variant configurations from eight model families were evaluated under four experimental conditions that varied data representation, prompt clarity, and signal strength, with five trajectories per condition. Outputs were scored against deterministic ground truth using the Jaccard index and assessed through a framework that combines mean score (ms), coefficient of variation (CV), exploratory cross-condition significance tests, and Business utility, a risk-adjusted metric that we propose to summarise quality and repeatability in a single operational measure. The results show that most configurations are not reliable enough for autonomous EDA use, even when their average scores appear acceptable. GPT-5.4 with extra-high reasoning effort achieved the strongest overall profile, with an experiment-averaged ms of 0.8748 and an experiment-averaged Business utility of 0.6952, while the next-best configurations lost substantially more utility after variability discounting. Our findings suggest that evaluation of EDA agents should treat average quality, repeatability, and condition sensitivity as complementary dimensions of operational trustworthiness.
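The scoring pipeline in the abstract (Jaccard index against ground truth, then mean score and coefficient of variation over five trajectories) can be sketched as follows. The Business utility formula is not given in the abstract and is deliberately left out. The supplier-product labels, the treatment of two empty sets as a perfect match, and the use of the sample standard deviation for CV are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence, Set, Tuple

Pair = Tuple[str, str]  # (supplier, product)


def jaccard(predicted: Set[Pair], truth: Set[Pair]) -> float:
    """Intersection over union; two empty sets count as a perfect match (assumed)."""
    union = predicted | truth
    if not union:
        return 1.0
    return len(predicted & truth) / len(union)


def mean_score_and_cv(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean score (ms) and coefficient of variation (sample stdev / mean)."""
    ms = mean(scores)
    cv = stdev(scores) / ms if ms > 0 else float("inf")
    return ms, cv


# Hypothetical ground truth and five trajectories for one condition.
truth = {("S1", "P3"), ("S2", "P1")}
runs = [
    {("S1", "P3"), ("S2", "P1")},
    {("S1", "P3")},
    {("S1", "P3"), ("S2", "P1"), ("S4", "P2")},
    set(),
    {("S1", "P3"), ("S2", "P1")},
]
scores = [jaccard(run, truth) for run in runs]
ms, cv = mean_score_and_cv(scores)
```

In this toy condition two runs are perfect and one returns nothing, so a middling mean score comes with a large CV, which is the kind of repeatability problem the abstract highlights.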

2606.00049 2026-06-02 cs.CY cs.AI

Measuring and Mitigating Bias in Code Generated by Large Language Models

Yuxi Chen, Yutian Tang, Timothy Storer

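Affiliations: School of Computing Science, University of Glasgow

AI summary: Proposes a framework for evaluating bias in code generated by mainstream tools such as GPT-4o and Gemini, quantifies bias with the code bias score and the attribute change ratio, and explores four lightweight mitigation strategies.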

Abstract

Large language models (LLMs) are widely recognised for their applications in natural language generation and are increasingly used for code generation tasks. However, concerns about bias in their generated outputs remain significant. This paper focuses on GPT-4o and Gemini, mainstream tools for code generation, and proposes a framework for evaluating bias in LLM-generated code, specifically examining the influence of protected attributes, prompts and web-search capability. We use two metrics: the code bias score (CBS) and the attribute change ratio (ACR), to quantify the prevalence of bias and the degree of influence of different attributes, respectively. In addition, we investigate four lightweight mitigation strategies: Few-Shot, Chain-of-Thought, Few-Shot Chain-of-Thought, and Multi-agent, aimed at mitigating bias in generated code. Our findings reveal that bias remains prevalent across different protected attributes and datasets even after applying mitigation strategies, highlighting the need for more effective approaches to reduce bias in AI-driven code generation systems.

2606.00048 2026-06-02 cs.CY cs.CL

The Invisible Coalition Partner: How LLMs Vote When Democracy Gets Concrete

Joel Barmettler

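Affiliations: Independent Researcher, Zurich, Switzerland

AI summary: Comparing abstract questionnaires with real Swiss referenda, finds that LLMs facing concrete policy decisions are centrist, status-quo-favoring, and inconsistent across languages, rather than showing the left-leaning bias reported previously.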

Comments 13 pages, 10 figures. Preprint. Code and data: https://github.com/joelbarmettlerUZH/invisible-coalition-partner

Abstract

Prior research has established that instruction-tuned large language models exhibit left-of-center political bias, measured exclusively through abstract political questionnaires. We show that this finding does not generalize to concrete policy decisions. We introduce a dual-instrument methodology grounded in Swiss democratic reality. The Smartvote questionnaire (75 abstract policy questions) is administered to 66 LLMs from 27 model families and compared to 184 elected members of the Swiss National Council, replicating the established leftward convergence (Cohen's d = 3.64, p = 0.0002). Then, novel to this work, 9 flagship LLMs are confronted with 48 real federal referenda (Volksabstimmungen) in four national languages (German, French, Italian, Romansh) under three information conditions, comparing votes to actual outcomes and party recommendations (Parolen). Three findings challenge the prevailing narrative. (1) Abstract questionnaires do not predict concrete behavior: the left-to-right agreement gradient on Smartvote shifts from left-peaked to center-peaked on Volksabstimmungen, where models align most with centrist Die Mitte and FDP rather than leftist SP and Gruene (Wilcoxon p = 0.008). (2) For some models, the language of a political question changes the answer more than the political content does: cross-linguistic consistency ranges from 50% (Mistral) to 98% (GPT-5.4). (3) Two models exhibit systematic change-aversion rather than political bias, voting Nein on 83-94% of referenda regardless of direction (binomial p < 0.0001). What prior work measured as "leftward bias" may not generalize beyond abstract instruments. On concrete policy decisions, LLMs behave less like coalition partners of the left and more like cautious civil servants: centrist, status-quo-favoring, and inconsistent across languages.
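The change-aversion finding rests on a binomial test of how often a model votes Nein. A minimal exact test is sketched below, assuming a null hypothesis of a 50/50 vote and one vote per referendum over the 48 referenda; the abstract does not state the null or the exact counts, so the 40-of-48 example (about 83%) is only illustrative.

```python
from math import comb


def binomial_two_sided_p(successes: int, n: int) -> float:
    """Exact two-sided p-value against p = 0.5 (symmetric, so double the larger tail)."""
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Hypothetical: a model votes Nein on 40 of 48 referenda.
p_value = binomial_two_sided_p(40, 48)
```

Under these assumptions the p-value falls well below 0.0001, in line with the significance level quoted in the abstract.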

2606.00047 2026-06-02 cs.CY cs.AI

Comprehensive AI governance requires addressing non-model gains

Arthur Goemans, Dan Altman, Noemi Dreksler, Jonas Freund, Milan Gandhi, Zhengdong Wang, Sarah Cogan, Sebastien Krier, Demetra Brady, Lewis Ho, Allan Dafoe

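Affiliations: Stanford University; UC Berkeley; Open Philanthropy

AI summary: Introduces the concept of non-model gains, comprising inference gain, systems gain, and asset gain, argues that these gains weaken model-centred governance, and outlines governance approaches beyond the model level.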

Comments This paper has been accepted to ICML 2026 (Position paper track): https://openreview.net/forum?id=V3O1sHpKxX

Abstract

Frontier AI governance often centres on the model-level governance paradigm, which assumes that a model's capability profile is primarily a function of the compute and data used during training. This position paper argues that model-level governance becomes less effective when capability progress is increasingly driven by "non-model gains", that is, improvements that are independent of advances in the base model. We formalise the concept of non-model gains and provide a taxonomy of three distinct vectors of capability gain: inference gain (scaling compute at test-time), systems gain (post-training enhancements such as scaffolds), and asset gain (enhancing a model with restricted assets). We demonstrate how these vectors, alongside potential future impacts from embodiment, continual learning, and AI diffusion, may undermine risk management strategies that hinge mostly on pre-deployment evaluation and mitigation. We provide an overview of governance approaches that go beyond the model level: system, entity, agent, and cloud governance. Finally, we emphasise the importance of societal resilience as a complement to these governance layers.

2606.00046 2026-06-02 cs.MM cs.AI cs.CV cs.CY

When Jokes Cross the Line: Analyzing Regular Humor and Dark Humor in YouTube Shorts

当玩笑越界：分析YouTube Shorts中的常规幽默与黑色幽默

Sydney Johns, Sanjeev Parthasarathy, Shantnu Bhalla, Vaibhav Garg

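Affiliations: Virginia Polytechnic Institute and State University

AI summary: By building the TwistedHumor dataset (1,211 YouTube Shorts and 33,041 comments, hand-annotated) and combining a multi-view analysis (LLooM concept induction, comment sentiment analysis, and large-model evaluation), the paper reveals differences between regular and dark humor in short-form video in terms of themes, audience reactions, and model detection, and stresses the need for context-aware moderation.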

详情

AI中文摘要

YouTube等视频平台重塑了用户参与娱乐和信息的方式，强调简短、高参与度的内容，如Shorts。在这个生态系统中，某些内容处于灰色地带：虽然允许存在，但仍可能对部分观众产生意想不到的负面影响。为了研究这一问题，我们引入了TwistedHumor数据集，包含1,211个YouTube Shorts及其配对的33,041条相关评论，并手工标注了幽默存在性、幽默类型、伤害性、主题、修辞手法和单口喜剧背景。除了数据集构建，我们还提出了对短格式社交媒体中幽默与伤害表现的多视角分析。通过使用基于LLooM的概念归纳对视频描述进行分析，我们发现黑色幽默经常围绕批评、应对、尴尬和身份表达等主题聚集，而不是作为一个单一的类别出现。我们进一步通过关联评论分析观众反应，表明常规幽默与更积极的情感相关，而黑色幽默则收到更多混合、中性甚至有时更有毒的反馈。最后，我们评估了大语言模型与人类标注的一致性，发现它们在单口喜剧上的表现优于短笑话。综合来看，这些结果将TwistedHumor不仅定位为一个新的基准，而且是对短格式视频中幽默与伤害灰色地带的实证研究，强调了需要上下文感知的审核和更稳健的多模态评估。

英文摘要

Video platforms such as YouTube have reshaped how users engage with entertainment and information, emphasizing brief, highly engaging content such as Shorts. Within this ecosystem, certain content occupies a gray area where it remains allowed but may still have unintended negative effects on some audiences. To study this problem, we introduce TwistedHumor, a dataset of 1,211 YouTube Shorts paired with 33,041 related comments, with hand annotations for humor presence, humor type, harm, topic, rhetorical devices, and stand-up context. Beyond dataset creation, we present a multi-view analysis of how humor and harm appear in short-form social media. Using LLooM-based concept induction over video descriptions, we find that dark humor frequently clusters around themes of critique, coping, awkwardness, and identity expression rather than appearing as a single uniform category. We further analyze audience response through linked comments and show that regular humor is associated with more positive sentiment, while dark humor receives more mixed, neutral, and sometimes more toxic reactions. Finally, we evaluate large language models against human annotations and find that they perform better on stand-up comedy than on shorter jokes. Together, these results position TwistedHumor not only as a new benchmark, but as an empirical study of the gray area between humor and harm in short-form video, highlighting the need for context-aware moderation and more robust multimodal evaluation.

2606.00044 2026-06-02 cs.CY cs.AI

Algorithmic Authority and the Clinical Standard of Care

算法权威与临床护理标准

Aizierjiang Aiersilan

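Affiliations: The George Washington University

AI summary: The paper examines the tension that AI introduces into clinical medicine between algorithmic probabilistic reasoning and physicians' experiential intuition, proposes treating AI systems as de facto medical regulation, and argues for a dialectical standard of care under which the AI-physician pairing is treated as a single entity bearing diagnostic responsibility.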

2606.00041 2026-06-02 cs.CY cs.AI

Improving Hospital Process Management through Process Mining: A Case Study on COVID-19 Clinical Pathways

通过过程挖掘改进医院流程管理：COVID-19临床路径案例研究

Pasquale Ardimento, Mario Luca Bernardi, Marta Cimitile, Samuele Latorre

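Affiliations: University of Bari Aldo Moro; Unisannio University of Benevento; UnitelmaSapienza Rome

AI summary: Using a shared-learning COVID dataset, the study builds a transparent, reproducible pipeline that converts clinical data into event logs. Through process discovery, declarative conformance checking, and outcome analysis, it reveals a monitoring backbone in COVID-19 care pathways, variability at the interface between emergency care and admission, and outcome differences associated with age and intensive-care exposure, supporting triage standardization, capacity planning, and step-down coordination.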

2606.00040 2026-06-02 cs.CY cs.AI

Tracing GenAI Literacy: Uncovering Student-AI Interaction Patterns in Academic Writing through Epistemic Network Analysis

追踪GenAI素养：通过认知网络分析揭示学术写作中的学生-AI交互模式

Angxuan Chen, Jiyou Jia

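Affiliations: Department of Educational Technology, Graduate School of Education, Peking University

AI summary: Using learning analytics and epistemic network analysis on interaction logs from 162 students in a GenAI-assisted summary-writing task, the study shows that high-literacy students rely on iterative refinement and strategic questioning, whereas low-literacy students rely on direct generation commands.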

2606.00039 2026-06-02 cs.CY cs.AI cs.HC

Beyond Categories of Caste: Examining Caste Bias and Morality in Text-to-Image AI Models

超越种姓类别：审视文本到图像AI模型中的种姓偏见与道德

Divyanshu Kumar Singh, Dipto Das, Deepika Rama Subramanian, Koustuv Saha, Stephen Voida, Bryan Semaan

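Affiliations: University of Colorado Boulder; University of Toronto; University of Illinois Urbana-Champaign

AI summary: Through an algorithmic audit combined with critical discourse analysis, the paper shows how text-to-image models perpetuate caste bias beyond the upper- versus lower-caste binary, and proposes an anti-caste approach to fairness in AI systems.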

详情

DOI: 10.1145/3805689.3806720

AI中文摘要

文本到图像（T2I）模型在各个领域展现出有前景的实用性。然而，这类模型也在其输出中放大了有害的社会偏见。在南亚背景下，近期研究表明种姓偏见和刻板印象正通过生成式AI（GenAI）系统得以延续。尽管这些研究提供了关于GenAI系统如何使种姓歧视的隐形叙事显性化的极其相关的见解，但它们往往将种姓视为一个身份类别。因此，在本工作中，我们转变本体论，聚焦于种姓的关系性方面。这使我们能够更细致地理解T2I模型产生和延续种姓歧视的机制。通过将算法审计与批判性话语分析相结合，我们借鉴挑战婆罗门规范性的概念框架，展示种姓偏见如何超越上下种姓类别的简单二元对立而得以延续。我们的贡献有两方面。除了挑战将种姓视为类别的范畴化理解，我们还提出了一种反种姓方法，以应对AI系统中种姓偏见和公平性的问题。

英文摘要

Text-to-Image (T2I) models have shown promising utility across various domains. However, such models are also amplifying harmful societal biases in their outputs. In the context of South Asia, recent work has shown that caste biases and stereotypes are being perpetuated through Generative AI (GenAI) systems. While this research offers highly relevant insight into invisibilized narratives of caste discrimination in GenAI systems, it often treats caste as an identity category. Therefore, in this work we shift our ontology to focus on the relational aspect of caste. This enables us to develop a more nuanced understanding of the mechanics of caste discrimination by and through T2I models. Combining an algorithmic audit with critical discourse analysis, we draw on a conceptual frame challenging Brahminical Normativity to show how caste biases are perpetuated beyond the simple binaries of upper- vs lower-caste categories. Our contributions are two-fold. Beyond challenging the understanding of caste as a category, we propose an anti-caste approach to tackle the issue of caste bias and fairness in AI systems.

2606.00037 2026-06-02 cs.CY cs.AI cs.HC

Update Opacity: Epistemic Accessibility and Governance Under AI System Change

更新不透明性：AI系统变更下的认知可及性与治理

Andrea Ferrario, Joshua Hatherley

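Affiliations: Institute of Biomedical Ethics and History of Medicine, University of Zürich; SUPSI, Dalle Molle Institute for Artificial Intelligence (IDSIA); ETH Zürich; Center for Philosophy of AI, University of Copenhagen

AI summary: To address the problem that AI system updates make it hard for users to understand changes in outputs, the paper proposes a governance framework combining the EU AI Act and Machine Learning Operations, making updates transparent through trustworthiness profiles and threshold-based disclosure.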

详情

AI中文摘要

嵌入部署AI系统中的机器学习模型会定期更新以维持正常功能。然而，此类更新可能产生更新不透明性：用户可能无法理解为何相同输入现在产生不同输出。我们认为，更新不透明性最好被理解为认知可及性的历时性失败：问题在于，在真实角色和时间特定约束下，物质上相关的变更可能无法以支持理解、校准依赖和适当行动的形式保持对用户可及。这使得更新不透明性成为一个治理问题。并非所有变更都同等相关，披露每一次更新本身会因信息过载而损害使用。为解决此问题，我们结合两种互补的治理方法：欧盟AI法案（有助于规范系统层面规范性相关变更的边界）和机器学习运营（提供跟踪和比较随时间变化的操作工具）。在此基础上，我们提出一个框架，通过可信度画像和可信度级别对系统变更建模，并使用基于阈值的披露，随时间向不同利益相关者揭示包络内物质相关变更。我们通过一个医疗AI示例说明该方法，并得出对生命周期文档、上市后监测和更新披露的实际意义。

英文摘要

Machine learning models embedded in deployed AI systems are routinely updated to maintain correct functioning over time. Yet such updates can generate update opacity: users may not be able to understand why the same input now yields a different output. We argue that update opacity is best understood as a diachronic failure of epistemic accessibility: the problem is that materially relevant changes may fail to remain accessible to human users in forms that support understanding, calibrated reliance, and appropriate action under real role- and time-specific constraints. This makes update opacity a governance problem. Not all change is equally relevant, and disclosing every update would itself undermine use through overload. To address this problem, we combine two complementary governance approaches: the EU AI Act, which helps specify the system-level perimeter of normatively relevant change, and Machine Learning Operations, which provides operational tools for tracking and comparing change over time. On this basis, we propose a framework that models system change through trustworthiness profiles and trustworthiness levels, and uses threshold-based disclosure to surface materially relevant within-envelope change to different stakeholders over time. We illustrate the approach with a medical AI example and derive practical implications for lifecycle documentation, post-market monitoring, and update disclosure.
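
The idea of threshold-based disclosure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' framework: the profile dimensions (`accuracy`, `fairness_gap`, `calibration_error`), the stakeholder names, and the threshold values are all hypothetical, and a trustworthiness profile is assumed here to be a plain mapping from dimension to score.

```python
def disclosures(old_profile, new_profile, thresholds):
    """Return, per stakeholder, the profile dimensions whose change
    between two system versions meets that stakeholder's threshold."""
    # Signed change per dimension between the old and new version.
    changes = {k: new_profile[k] - old_profile[k] for k in old_profile}
    result = {}
    for stakeholder, limits in thresholds.items():
        # Disclose only dimensions this stakeholder has a threshold for,
        # and only when the absolute change reaches that threshold.
        result[stakeholder] = sorted(
            k for k, delta in changes.items()
            if k in limits and abs(delta) >= limits[k]
        )
    return result


# Hypothetical profiles for two versions of a deployed model.
old = {"accuracy": 0.90, "fairness_gap": 0.05, "calibration_error": 0.04}
new = {"accuracy": 0.93, "fairness_gap": 0.09, "calibration_error": 0.045}

# Hypothetical per-stakeholder disclosure thresholds.
thresholds = {
    "clinician": {"accuracy": 0.02, "calibration_error": 0.02},
    "regulator": {"fairness_gap": 0.03, "accuracy": 0.05},
}

report = disclosures(old, new, thresholds)
```

With these made-up numbers the clinician is told only about the accuracy change and the regulator only about the fairness gap, so small changes are filtered out instead of flooding every stakeholder with every update.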

2606.00033 2026-06-02 cs.CY cs.AI

Make Mechanistic Interpretability Auditable: A Call to Develop Guidelines via Continuous Collaborative Reviewing

使机制可解释性可审计：呼吁通过持续协作评审制定指南

Michael Lan, Narmeen Fatimah Oozeer, Chaithanya Bandi, Philip Quirke, Austin Meek, Fazl Barez, Amirali Abdullah

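Affiliations: University of Delaware; University of Oxford; ThoughtWorks

AI summary: To address the lack of a standardized auditing system for mechanistic interpretability (MI) experiments, the paper proposes building an auditable framework through a continuous collaborative reviewing platform, expert-verified guidelines, and source-based auditing systems, in order to raise MI's credibility in high-stakes areas such as AI safety.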

Comments Accepted at ACL 2026 main conference

详情

AI中文摘要

尽管机制可解释性（MI）对神经网络内部机制产生了重要见解，但该领域尚未建立标准化的实验审计系统。因此，其许多发现在医疗AI和自主系统等安全关键应用中仍未得到充分利用，因为利益相关者无法验证其有效性。近期工作具体证明了这一点：两篇论文对同一行为得出了矛盾的结论，第三项研究揭示两者部分正确但因方法不一致而无法比较。缺乏标准化审计时，这种模糊性阻碍了需要强正确性保证的高风险场景中的采用。我们呼吁MI社区致力于开发一种新颖的评审系统，通过以下方式补充同行评审：（1）由协作评审平台支持的持续评审，在该平台上组织和讨论论文之外适合的元科学结果和讨论（如批评、负面结果、事后扩展、复现、复制和部分结果），允许随时进行评论和修订；（2）将该平台上发现的良好实践推广为专家验证的指南和协议，以提高审计效率；（3）基于来源的审计系统，追踪声明所依赖的论点。这篇立场论文鼓励对这样一个框架的必要性、设计和实施进行建设性辩论，并提供早期具体示例以帮助催化这些对话。总体而言，我们提出审计MI本身对于其在AI安全、行业和治理中的应用至关重要。

英文摘要

While mechanistic interpretability (MI) has produced important insights into neural network internals, the field has yet to establish a standardized system to audit experiments. As such, many of its findings remain underutilized in safety-critical applications such as medical AI and autonomous systems, as stakeholders cannot certify their validity. Recent work demonstrates this concretely: two papers found conflicting conclusions for the same behavior, and a third study revealed that both were partially correct but incomparable due to methodological inconsistencies. Without standardized auditing, such ambiguities hinder adoption in high-stakes contexts requiring strong correctness guarantees. We call for the MI community to work towards developing a novel reviewing system that complements peer review via: (1) continuous reviewing supported by a Collaborative Reviewing Platform where meta-science results and discussions that fit outside of papers (such as critiques, negative results, post-hoc extensions, reproductions, replications, and partial results) are organized and discussed, allowing comments and revisions to be made at any time; (2) generalizing good practices found on this platform into expert-verified guidelines and protocols to improve auditing efficiency; and (3) source-based auditing systems that track the arguments on which claims depend. This position paper encourages constructive debate over the necessity, design, and implementation of such a framework, providing early concrete examples to help catalyze these dialogues. Overall, we propose that auditing MI itself is essential for its application in AI safety, industry, and governance.

2606.00019 2026-06-02 cs.HC cs.AI

Understanding Stigmatizing Language in Clinical Documentation: A Paired Comparison of Ambient AI Drafts and Clinician Finalized Notes

理解临床文档中的污名化语言：环境AI草稿与临床医生最终笔记的配对比较

Yiliang Zhou, Yawen Guo, Sairam Sutari, Jasmine Dhillon, Alexandra L. Beck, Emilie Chow, Steven Tam, Danielle Perret, Deepti Pandita, Gelareh Sadigh, Archana J. McEligot, Kai Zheng

Affiliations: Department of Informatics, University of California, Irvine; Institute for Clinical and Translational Science, University of California, Irvine; Department of Medicine, University of California, Irvine; Department of Physical Medicine & Rehabilitation, University of California, Irvine; Department of Radiological Sciences, University of California, Irvine; Department of Public Health, California State University Fullerton

AI summary: By pairing ambient-AI-generated drafts with clinician-finalized notes, the study quantifies changes in stigmatizing language before and after editing and finds that clinician editing can be a net source of stigmatizing language entering the electronic health record.

详情

AI中文摘要

环境人工智能（AI）文档工具越来越多地被用于减轻临床医生的文档负担，但它们对临床笔记中偏见语言的影响尚不清楚。我们对AI草稿和相应的临床医生最终笔记进行了大规模比较分析，以量化编辑前后污名化语言的变化。使用基于词典的自然语言处理（NLP）流程，我们测量了（1）AI草稿中污名化语言的普遍性，（2）最终笔记中的普遍性和术语组成，以及（3）污名化术语的移除或引入频率。在66,297对笔记部分中，21.4%的AI草稿部分包含至少一个污名化语言提及，而在临床医生最终版本中这一比例上升至24.0%。引入比移除更频繁，表明在使用环境AI时，临床医生编辑可能是污名化语言进入电子健康记录的净来源。

英文摘要

Ambient artificial intelligence (AI) documentation tools are increasingly deployed to reduce clinician documentation burden, but their implications for biased language in clinical notes remain unclear. We conducted a large-scale comparative analysis of AI drafts and the corresponding clinician-finalized notes to quantify changes in stigmatizing language before and after editing. Using a lexicon-based natural language processing (NLP) pipeline, we measured (1) the prevalence of stigmatizing language in AI drafts, (2) the prevalence and term composition in final notes, and (3) the frequency of removal or introduction of stigmatizing terms. Across 66,297 paired note sections, 21.4% of AI draft sections contained at least one stigmatizing language mention, rising to 24.0% in clinician-finalized versions. Introductions occurred more often than removals, suggesting that clinician editing can be a net source of stigmatizing language entering the electronic health record (EHR) when ambient AI is used.
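
A lexicon-based paired comparison of this kind can be sketched roughly as follows. This is a minimal illustration, not the study's pipeline: the three-term `LEXICON`, the helper names `find_terms` and `compare_pairs`, and the example note pairs are all hypothetical, and matching is assumed to be simple case-insensitive whole-word search.

```python
import re

# Hypothetical mini-lexicon for illustration only; not the study's lexicon.
LEXICON = ["noncompliant", "drug seeking", "claims"]


def find_terms(text, lexicon=LEXICON):
    """Return the set of lexicon terms present in text (case-insensitive, whole words)."""
    lowered = text.lower()
    return {t for t in lexicon if re.search(r"\b" + re.escape(t) + r"\b", lowered)}


def compare_pairs(pairs, lexicon=LEXICON):
    """Summarise term prevalence and edits over (draft, final) note-section pairs."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    draft_hits = final_hits = introduced = removed = 0
    for draft, final in pairs:
        d, f = find_terms(draft, lexicon), find_terms(final, lexicon)
        draft_hits += bool(d)      # section has at least one term before editing
        final_hits += bool(f)      # section has at least one term after editing
        introduced += len(f - d)   # terms present only in the final note
        removed += len(d - f)      # terms present only in the draft
    n = len(pairs)
    return {
        "draft_prevalence": draft_hits / n,
        "final_prevalence": final_hits / n,
        "introduced": introduced,
        "removed": removed,
    }


# Made-up example pairs: (AI draft section, clinician-finalized section).
pairs = [
    ("Patient reports chest pain.", "Patient claims chest pain."),
    ("Patient is noncompliant with medication.", "Patient has not taken medication."),
    ("Patient denies fever.", "Patient denies fever."),
    ("No acute distress.", "Drug seeking behavior noted."),
]
summary = compare_pairs(pairs)
```

In this toy input, prevalence rises from 0.25 in drafts to 0.5 in final notes, with two introductions and one removal, which mirrors the direction (not the magnitude) of the reported finding.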

2606.00018 2026-06-02 cs.HC cs.AI

Examine Clinicians' Modification of Hedging Language in Ambient AI Documentation: A Comparative Study of AI Drafts and Final Notes

检验临床医生对环境AI文档中模糊语言的修改：AI草稿与最终笔记的比较研究

Yiliang Zhou, Yawen Guo, Di Hu, Sairam Sutari, Emilie Chow, Steven Tam, Danielle Perret, Deepti Pandita, Kai Zheng

Affiliations: Department of Informatics, University of California, Irvine; Institute for Clinical and Translational Science, University of California, Irvine; Department of Medicine, University of California, Irvine; Department of Physical Medicine & Rehabilitation, University of California, Irvine

AI summary: Through a paired analysis of ambient-AI-generated clinical note drafts and the clinician-revised final versions, the study examines how clinician edits change the prevalence and directionality of hedging language, and how this varies across AI vendors and clinical specialties.

详情

AI中文摘要

环境AI文档系统生成临床笔记草稿，医生在签署进入电子健康记录前经常修改，但这些编辑如何改变模糊语言尚不清楚。我们对医生编辑过的环境AI草稿和最终笔记部分进行了配对分析，以检验：(1)这些编辑是否改变了模糊语言的出现频率，(2)这些编辑是否表现出向更大确定性或不确定性的系统性转变，以及(3)模糊语言频率和方向性的变化是否因环境AI供应商和临床专科而异。在62,811对笔记部分中，模糊术语更常被引入先前非模糊文本，而非从先前模糊文本中移除，且编辑后文本比编辑前文本包含更多模糊提及。方向性分析显示，在模糊相关的替换编辑中，总体显著倾向于更大的不确定性。供应商和专科分析揭示了模糊频率、编辑前后模糊提及变化以及方向性的显著异质性。

英文摘要

Ambient AI documentation systems generate clinical note drafts that clinicians frequently revise before signing them into electronic health records, yet how these edits alter hedging language remains unclear. We conducted a paired analysis of clinician-edited portions of ambient AI drafts and final notes to examine (1) whether these edits change the prevalence of hedging language, (2) whether these edits exhibit a systematic shift toward greater certainty or uncertainty, and (3) whether these changes in hedging prevalence and directionality differ by ambient AI vendor and clinical specialty. Among 62,811 paired note sections, hedging terms were more often introduced into previously non-hedged text than removed from previously hedged text, and post-edit text contained more hedging mentions than pre-edit text. Directionality analyses showed a significant overall tendency toward greater uncertainty in hedging-related replacement edits. Vendor and specialty analyses revealed substantial heterogeneity in hedging prevalence, pre-to-post changes in hedging mentions, and directionality.

2606.00015 2026-06-02 cs.HC cs.AI cs.CY cs.ET

SortingHat: Redefining Operating Systems Education with a Tailored Digital Teaching Assistant

SortingHat: 用定制的数字教学助手重新定义操作系统教育

Yifan Zhang, Xinkui Zhao, Zuxin Wang, Zhengyi Zhou, Guanjie Chen, Shuiguang Deng, Jianwei Yin

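Affiliations: School of Software Technology, Zhejiang University

AI summary: To address the challenges of teaching operating systems courses, the paper presents SortingHat, a 3D digital-human teaching assistant that combines retrieval-augmented generation with multi-agent reinforcement learning to provide personalized guidance, adaptive content generation, and automated assessment.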

详情

DOI: 10.1145/3701716.3715199
Journal ref: WWW '25: Companion Proceedings of the ACM on Web Conference 2025, Pages 2951-2954

AI中文摘要

操作系统课程是计算机科学教育中最具挑战性的课程之一，原因在于其内部结构的复杂性和运行环境的多样性。传统的教学方法往往无法应对学生多样化的背景、学习速度和实际需求。为了应对这些挑战，我们提出了SortingHat，一个专为操作系统教育定制的个性化数字教学助手。SortingHat集成了先进的人工智能技术，包括检索增强生成框架和多智能体强化学习，以提供自适应、可扩展且有效的教育支持。SortingHat采用由大型语言模型驱动的3D数字人界面，提供个性化、富有同理心和上下文感知的指导。它根据每个学生的学习历史和学业表现生成定制的练习，强化薄弱环节并挑战高级概念。此外，该系统包含一个强大的评估流程，确保对学生提交的内容进行公平、一致和无偏见的评分，同时提供个性化的、可操作的改进反馈。通过结合个性化指导、自适应内容创建和自动评估，SortingHat将操作系统教育转变为一种引人入胜、沉浸式且可扩展的体验。

英文摘要

Operating Systems (OS) courses are among the most challenging in computer science education due to the complexity of internal structures and the diversity of running environments. Traditional teaching methods often fail to address the diverse backgrounds, learning speeds, and practical needs of students. To tackle these challenges, we present SortingHat, a personalized digital teaching assistant tailored specifically for OS education. SortingHat integrates advanced AI technologies, including a retrieval-augmented generation (RAG) framework and multi-agent reinforcement learning (MARL), to deliver adaptive, scalable, and effective educational support. SortingHat features a 3D digital human interface powered by large language models (LLMs) to provide personalized, empathetic, and context-aware guidance. It generates tailored exercises based on each student's learning history and academic performance, reinforcing weak areas and challenging students with advanced concepts. Additionally, the system incorporates a robust evaluation pipeline that ensures fair, consistent, and unbiased grading of student submissions while delivering personalized, actionable feedback for improvement. By combining personalized guidance, adaptive content creation, and automated assessment, SortingHat transforms OS education into an engaging, immersive, and scalable experience.

2606.00013 2026-06-02 cs.CY cs.AI cs.HC

A phenomenon of AI-conformity: how algorithms change human moral decision-making

AI从众现象：算法如何改变人类道德决策

Yana Venerina, Dmitry Koch, Nare Meloyan, Gerda Prutko, Valeriia Lelik, Victoria Taova, Andrey Kurpatov

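Affiliations: Neuroscience Laboratory, Sberbank, Moscow, Russia

AI summary: Adapting the classical Asch paradigm, the study finds that an AI model with a reasoning component influences human moral judgements to a degree comparable to a human majority, suggesting that moral decision-making may also be subject to algorithmic conformity.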

Comments 31 pages, 1 figure

详情

AI中文摘要

社会从众是一种有充分记录的现象，即个体将其观点转向社会多数的观点。随着人工智能（AI）日益融入日常生活，它也可能创造一种新的影响源，引发算法从众，其机制尚不清楚。本研究考察了AI判断是否影响人类的道德决策（n=165），改编了经典的Asch范式。参与者在三种不同条件下完成一系列道德困境：存在社会多数时、AI模型提供简短答案时、以及AI模型同时提供答案和解释时。在所有条件下，呈现的回应都违背了普遍接受的道德规范。结果表明，具有推理成分的AI模型对参与者意见的影响程度与人类多数相当。这些发现表明，即使是道德判断，尽管其敏感性和个人重要性，也可能容易受到算法从众的影响。然而，算法从众的机制似乎与社会从众不同。总体而言，该研究挑战了道德决策处于“AI禁区”——即被认为只有人类决策才可接受的领域——的假设，并强调了随着基于AI的建议日益融入人类决策，需要进一步研究这一现象。

英文摘要

Social conformity is a well-documented phenomenon in which individuals shift their opinions towards those of a social majority. As artificial intelligence (AI) becomes increasingly integrated into everyday life, it may also create a novel source of influence, giving rise to algorithmic conformity, the mechanisms of which are poorly understood. The present study examined whether AI judgements affect moral decision-making in humans (n=165) by adapting the classical Asch paradigm. Participants completed a series of moral dilemmas under three different conditions: in the presence of a social majority, with an AI model providing brief answers, and with an AI model providing both answers and explanations of its choices. In all conditions the presented responses contradicted generally accepted moral norms. The results indicated that an AI model with a reasoning component affected the opinion of participants to a degree comparable to that of a human majority. These findings suggest that even moral judgements, despite their sensitivity and personal significance, may be susceptible to algorithmic conformity. However, the mechanism underlying algorithmic conformity appears to differ from the social one. Overall, the study challenges the assumption that moral decision-making lies in an "AI inadmissibility zone", a sphere in which only human-made decisions are considered acceptable, and highlights the need for further investigation of this phenomenon as AI-based recommendations become increasingly embedded in human decision-making.

2606.00011 2026-06-02 cs.HC cs.AI cs.LG

RuleEdit: Failure-Guided Human-AI Model Editing with Prospective Impact Preview

RuleEdit: 失败引导的人机模型编辑与前瞻性影响预览

Min Hun Lee, Justin Yu Feng Teo

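Affiliations: Singapore Management University

AI summary: The paper presents RuleEdit, a system that detects failures through mismatch signals from rule tables and previews the impact of model edits, significantly improving human-AI performance in stroke rehabilitation assessment.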

详情

AI中文摘要

尽管AI有望协助复杂决策，但从业者仍然缺乏在提交模型编辑之前检测可能失败和检查后果的方法。我们提出RuleEdit，一个交互式、规则引导的人机模型编辑系统，它(i)通过规则表可解释的不匹配信号揭示可能的失败，并(ii)支持用户编写的规则反馈，提供预期性能变化和嵌入偏移的前瞻性预览。我们在卒中康复评估中实例化RuleEdit，并与卫生专业人员和学生一起评估。规则引导的失败检测将人+AI性能显著提高了14.16%（p<0.001），同时改善了对错误AI的拒绝，减少了过度依赖和不足依赖以及ChangedToWrong决策。此外，呈现前瞻性嵌入预览改善了参与者对模型适应的反馈，在纳入用户基于规则的反馈后，将更新后的局部性能增益从11.50%提高到36.38%（p<0.001）。我们的发现表明，基于不匹配的失败线索和前瞻性影响预览可以支持失败感知的人机模型编辑，同时也揭示了局部-全局权衡：有助于特定案例的编辑在全局转移时可能会降低性能。我们讨论了设计失败感知和可控人机系统的意义。

英文摘要

Despite the promise of AI to assist complex decisions, practitioners still lack ways to detect likely failures and inspect the consequences of model edits before committing them. We present RuleEdit, an interactive, rule-guided human-AI model editing system that (i) surfaces likely failures through interpretable mismatch signals from rule tables and (ii) supports user-authored rule feedback with prospective previews of projected performance changes and embedding shifts. We instantiate RuleEdit in stroke rehabilitation assessment and evaluate it with health professionals and students. Rule-guided failure detection significantly increased Human + AI performance by 14.16% (p<0.001) while improving rejection of incorrect AI and reducing both over- and under-reliance as well as ChangedToWrong decisions. In addition, presenting prospective embedding previews improved participants' feedback for model adaptation, increasing post-update local performance gains from 11.50% to 36.38% after incorporating users' rule-based feedback (p<0.001). Our findings show that mismatch-based failure cues and prospective impact previews can support failure-aware human-AI model editing, while also revealing a local-global tradeoff: edits that help a specific case can degrade performance when transferred globally. We discuss implications for designing failure-aware and controllable human-AI systems.

2606.00010 2026-06-02 cs.HC cs.AI cs.CY

Empathic and agentic artificial intelligence in nursing: perspectives on a human-centered framework for cancer care navigation in the United States

护理中的共情与自主人工智能：美国癌症护理导航中以人为本框架的视角

Tyra Girdwood, Saba Kheirinejad, Parnian Kheirkhah Rahimabad, Brianna M. White, Robert L Davis, David L Schwartz, Arash Shaban-Nejad

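Affiliations: University of Tennessee Health Science Center, College of Nursing; University of Tennessee Health Science Center, Center for Biomedical Informatics, Department of Pediatrics; University of Tennessee Health Science Center, Department of Radiation Oncology

AI summary: The paper proposes a human-centered AI framework that combines empathic and agentic approaches, grounded in the American Nurses Association's code of ethics, to support nurses in cancer care navigation by augmenting rather than replacing human empathy and agency while improving workflow, patient-clinician relationships, and care coordination.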

Comments 5 Pages, 1 Figure, 1 Table

详情

DOI: 10.1016/j.envres.2023.116972
Journal ref: ESMO Real World Data and Digital Oncology, 2026, Vol 12, 100694

AI中文摘要

对于癌症患者，护士导航可以通过加强健康服务协调和患者结果来减轻复杂护理的负担。然而，在资源不足的地区，训练有素的护士导航员可能有限或不存在。在美国，人工智能驱动的数字健康工具日益可用，可能有助于解决护理协调中的差距；然而，大多数并非专门设计用于支持护理。这篇观点文章讨论了一个以人为本的人工智能框架，该框架整合了基于美国护士协会伦理准则的共情和自主方法，以支持美国护士在癌症护理导航中的工作。该框架可以增强而非取代人类的共情和自主性，同时改善护士工作流程、患者-临床医生关系以及资源不足地区的护理协调服务。

英文摘要

For patients experiencing cancer, nurse navigation can ease the burden of complex care by enhancing coordination of health services and patient outcomes. However, in under-resourced areas, trained nurse navigators may be limited or non-existent. In the United States, artificial intelligence (AI)-enabled digital health tools are increasingly available and may help address gaps in care coordination; however, most are not designed to specifically support nursing. This perspective piece discusses a human-centered AI framework that integrates empathic and agentic approaches grounded in the American Nurses Association's code of ethics to support nurses in the United States in cancer care navigation. The framework could augment, not replace, human empathy and agency while improving nurse workflow, patient-clinician relationships, and care coordination services in under-resourced areas.

2606.00001 2026-06-02 cs.HC cs.CV cs.MM

Shu Dao: A Calligraphy Score Framework Linking Calligraphy, Music, and Performance

书道：连接书法、音乐与表演的评分框架

Lican Huang

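Affiliations: Hangzhou Domain Zones Technology Co., Ltd.

AI summary: The paper proposes the CWSR notation and the Shu Dao framework, which model East Asian calligraphy as a structured performance analogous to a musical score, supporting human-AI co-creation.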

Comments 47 pages

详情

Journal ref: Journal of Advances in Information Science and Technology, 2026 4(2), 1-47. https://yvsou.com/journal/index.php/jaist/article/view/43

AI中文摘要

本文介绍了书法书写评分表示法（CWSR），并提出了书道框架，将东亚书法解读为一种表演艺术而非静态视觉产物。受日本书道和茶道等体现文化实践的启发，该框架将书法建模为类似于音乐符号的结构化表演。该方法不将字符表示为固定图像，而是将每个笔画编码为有序且可执行的动作，形成书法评分。字符在结构化空间网格中组织，笔画标注有类型、执行顺序、空间坐标、轨迹、构图角色以及动态属性（如笔压和节奏）。这种表示捕捉了书法书写中通常图像表示所缺失的时间和表达方面。本文做出三项主要贡献：首先，引入CWSR作为结构化符号系统，在笔画、字符结构和构图组织（如布局和章法）等多个层面表示书法，及其节奏和表演动态；其次，将书道概念化为基于评分的框架，将书法建模为结构化表演；第三，为基于AI的书法智能体分析、可视化和可执行生成书法作品建立计算基础。这些贡献共同连接了书法、音乐符号和表演文化实践，支持计算书法和数字人文研究中的人机共创。

英文摘要

This paper introduces Calligraphy Writing Score Representation (CWSR) and proposes Shu Dao as a framework that interprets East Asian calligraphy as a performative art rather than a static visual artifact. Inspired by traditions such as Japanese Shodō and embodied cultural practices such as Chadao, the framework models calligraphy as a structured performance analogous to musical notation. Instead of representing characters as fixed images, the proposed approach encodes each brush stroke as an ordered and executable action, forming a calligraphy score. Characters are organized within a structured spatial grid, and strokes are annotated with attributes including stroke type, execution order, spatial coordinates, trajectory, compositional role, and dynamic properties such as brush pressure and pacing. This representation captures temporal and expressive aspects of calligraphic writing that are typically absent from image-based representations. The paper makes three main contributions. First, it introduces CWSR as a structured notation system for representing calligraphy across multiple levels, including strokes, character structures, and compositional organization (e.g., layout and zhangfa), together with their rhythmic and performative dynamics. Second, it conceptualizes Shu Dao as a score-mediated framework that models calligraphy as structured performance. Third, it establishes a computational foundation for the analysis, visualization, and executable generation of calligraphic works by AI-based calligraphic agents. Together, these contributions bridge calligraphy, musical notation, and performative cultural practices, supporting human-AI co-creation in computational calligraphy and digital humanities research.
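
The stroke attributes listed in the abstract suggest a data structure along the following lines. This is a hedged sketch, not the CWSR specification: the class names `Stroke` and `Character`, the field names, the stroke-type labels, the normalised pressure range, and the use of beats for pacing are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Stroke:
    stroke_type: str                        # e.g. "heng" (horizontal); label set is assumed
    order: int                              # execution order within the character
    trajectory: List[Tuple[float, float]]   # sampled (x, y) points in cell coordinates
    role: str = "body"                      # compositional role (hypothetical vocabulary)
    pressure: float = 0.5                   # brush pressure, assumed normalised to [0, 1]
    duration: float = 1.0                   # pacing, assumed to be measured in beats


@dataclass
class Character:
    glyph: str
    cell: Tuple[int, int]                   # (row, column) in the layout grid
    strokes: List[Stroke] = field(default_factory=list)

    def score(self) -> List[Stroke]:
        """Return strokes in execution order, like reading notes off a stave."""
        return sorted(self.strokes, key=lambda s: s.order)

    def total_duration(self) -> float:
        """Total pacing of the character, summed over its strokes."""
        return sum(s.duration for s in self.strokes)


# The character 十: the horizontal stroke is written before the vertical one.
shi = Character(
    glyph="十",
    cell=(0, 0),
    strokes=[
        Stroke("shu", 2, [(0.5, 0.1), (0.5, 0.9)], pressure=0.8, duration=1.5),
        Stroke("heng", 1, [(0.1, 0.5), (0.9, 0.5)], pressure=0.6, duration=1.0),
    ],
)
ordered = [s.stroke_type for s in shi.score()]
```

Storing strokes as ordered actions rather than pixels is what lets the same record be replayed, visualised over time, or handed to a generating agent, which an image of the finished character cannot support.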

2605.30848 2026-06-02 cs.CR cs.CL

LLM Anonymization Against Agentic Re-Identification

LLM匿名化对抗智能体重识别

Ziwen Li, Jianing Wen, Tianshi Li

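Affiliations: Khoury College of Computer Sciences; Northeastern University

AI summary: The paper proposes AURA, a framework that decouples privacy localization from utility retention through a mask-reconstruct approach and uses adversarial privacy and utility checks to resist re-identification by web-search agents while preserving the contextual utility of the text.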

Comments 32 pages, 7 figures

详情

AI中文摘要

具有网络搜索功能的智能体LLM改变了文本匿名化的威胁模型：弱上下文线索可能成为重识别的交叉引用证据，但这些细节也承载着文本的下游分析价值。现有的防御措施要么移除显式标识符，要么扰动文本以获得形式化隐私，要么针对非网络推理模型测试改写后的文本，而未充分探索抵抗智能体网络搜索重识别与效用保留之间的操作区域。我们引入了AURA（Anonymization with Utility-Retention Adaptation），一个LLM驱动的“掩码-重构”框架，将隐私定位与效用保留重构解耦，并通过对抗性隐私和效用保留检查选择候选方案。我们在真实用户访谈转录上评估AURA，使用由网络搜索智能体发起的重识别攻击，以及基于受访者档案事实、编码本事实和联合上下文效用网格的效用评估。我们的结果表明，AURA通过使用自适应隐私范围增强对智能体重识别的抵抗，以及使用掩码-重构匿名化方法在固定隐私范围内更好地保留上下文效用，从而改善了隐私-效用边界。

英文摘要

Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry the downstream analytic value of the text. Existing defenses either remove explicit identifiers, perturb text for formal privacy, or test rewritten text against non-web inference models, leaving underexplored the operating region between resistance to agentic web-search re-identification and utility retention. We introduce AURA (Anonymization with Utility-Retention Adaptation), an LLM-powered mask-reconstruct framework that decouples privacy localization from utility-preserving reconstruction and selects candidates with adversarial privacy and utility-retention checks. We evaluate AURA on real-user interview transcripts using re-identification attacks carried out by web-search agents, along with a utility evaluation based on interviewee-profile facts, codebook facts, and the joint contextual utility grid. Our results show that AURA improves the privacy-utility frontier by using adaptive privacy scope to strengthen resistance to agentic re-identification and using a mask-reconstruct anonymization method to better preserve contextual utility under fixed privacy scope.

2605.30743 2026-06-02 cond-mat.mtrl-sci cs.CE cs.CL

A Padding Method for Enhanced Encoding of Inorganic Structures with Varying Chemical Compositions

一种用于增强不同化学成分无机结构编码的填充方法

Thang Dang, Haderbache Amir, Tzanakakis Alexandros, Yoshimoto Yuta

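Affiliations: Fujitsu Limited; National Technical University of Athens

AI summary: The paper proposes an encoding method that uses crystal symmetry information (Wyckoff position length-aware padding), combined with an end-to-end generation system, to improve the accuracy and stability of inorganic material generation. It improves reconstruction accuracy by 5.3% on proton conductor data and generates 63.5% more novel stable materials than the baseline model on the perov-5 dataset.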

详情

AI中文摘要

通过生成模型设计新型无机材料仍然是材料科学的重要挑战，这是因为无机结构在广泛的化学成分和结构景观中具有复杂性和多样性。无机化合物的巨大组合空间需要创新的、人工智能驱动的方法来克服生成准确性和效率方面的限制。为了解决这个问题，我们引入了一种新方法，通过利用领域特定的对称感知表示来重新定义无机材料的编码和生成。我们的方法不仅改进了复杂无机结构的表示，还通过提高生成候选物的精度和稳定性，为材料发现领域做出了贡献。我们方法的核心是一种利用晶体对称信息来增强编码过程的新型填充技术。通过将Wyckoff位置长度感知填充集成到编码器架构中，我们实现了对无机材料更鲁棒的、信息丰富的表示。这种对称驱动的增强提高了深度学习模型生成稳定、先前未探索的无机结构的准确性和计算效率。此外，我们引入了一个端到端系统，利用机器学习势模型从初始数据到验证输出无缝生成新颖的、甚至在训练数据中未见过的稳定无机材料。该管道将先进的生成模型与稳定性分析相结合，标志着下一代无机材料自动探索和设计的重大飞跃。我们的方法在质子导体数据上将重建准确率提高了5.3%，并在perov-5数据集上生成了比基线模型多63.5%的新颖稳定无机材料。

英文摘要

Designing novel inorganic materials through generative models remains an important challenge for materials science, driven by the complexity and diversity of inorganic structures across expansive chemical compositions and structural landscapes. The vast combinatorial space of inorganic compounds demands innovative, AI-driven approaches to overcome limitations in generative accuracy and efficiency. To address this, we introduce a novel method that redefines the encoding and generation of inorganic materials by utilizing a domain-specific, symmetry-aware representation. Our approach not only refines the representation of intricate inorganic structures but also contributes to the field of material discovery by enhancing the precision and stability of generated candidates. Central to our methodology is a novel padding technique that exploits crystal symmetry information to enhance the encoding process. By integrating Wyckoff position length-aware padding into an encoder architecture, we achieve a more robust, better-informed representation of inorganic materials. This symmetry-driven enhancement enables deep learning models to generate stable, previously unexplored inorganic structures with superior accuracy and computational efficiency. Furthermore, we introduce an end-to-end system that leverages machine learning potential models to generate novel and stable inorganic materials, including ones unseen in the training data, from initial data to validated output. This pipeline integrates advanced generative models with stability analysis, marking a significant leap forward in the automated exploration and design of next-generation inorganic materials. Our method improved reconstruction accuracy by 5.3% on proton conductor data and generated 63.5% more novel stable inorganic materials than the baseline model on the perov-5 dataset.

2605.27527 2026-06-02 astro-ph.IM cs.LG

Probabilistic Data-Driven Modelling of Astrophysical Transients: The Neural Process Family for Ultrafast and Class-Agnostic Light Curve Reconstruction with NightLANP

Siddharth Chaini, Federica B. Bianco, Ashish Mahabal

Affiliations: NASA FINESST Fellow; Department of Physics and Astronomy, University of Delaware; University of Delaware, Data Science Institute; Joseph R. Biden, Jr. School of Public Policy and Administration, University of Delaware; Vera C. Rubin Observatory; Division of Physics, Mathematics, and Astronomy, California Institute of Technology; Center for Data Driven Discovery, California Institute of Technology

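AI summary: To reconstruct sparse, irregularly sampled light curves, the authors introduce the neural process family, with Attentive Neural Processes as the representative model. The approach combines the probabilistic framework of Gaussian Processes with the scalability of deep learning and uses meta-learning for fast, class-agnostic inference across all bands. On simulated transients with realistic Rubin cadences it outperforms the Gaussian Process and neural network benchmarks.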

English abstract

Astrophysical observations from Earth are subject to weather, environmental, and scientific constraints that lead to sparse, irregular light curves. On the eve of the Vera C. Rubin Observatory Legacy Survey of Space and Time, its dataset offers unprecedented opportunities for transient science. Yet a key challenge remains its cadence, sparse and irregular across six bands, limiting inference. Interpolation helps mitigate this, with Gaussian Processes the standard, but they struggle with cross-band correlations, require a priori kernel specification, and must be fit to each light curve individually, hence scaling poorly. Here, we introduce the neural process family for light curve reconstruction, combining the probabilistic framework of Gaussian Processes with the scalability of deep learning. By meta-learning on diverse simulated transients, Attentive Neural Processes shift the bulk of computation to training, enabling rapid, amortized inference with a class-agnostic model. Evaluated on realistic Rubin cadences across 15 transient classes, we show that even an unoptimized, out-of-the-box Attentive Neural Process consistently outperforms all benchmarks -- a suite of Gaussian Processes and neural networks -- on every tested metric, spanning regression quality, astrophysical feature recovery, and probabilistic calibration. Our model interpolates all bands simultaneously in microseconds, over four orders of magnitude faster than the next-best neural benchmark and five faster than Gaussian Processes, demonstrating the potential of neural processes for the nightly Rubin alert stream. Attentive Neural Processes avoid the overconfidence of standard neural networks and the underconfidence of Gaussian Processes, delivering sharp, well-calibrated uncertainties. This work establishes the neural process family as a scalable, probabilistic foundation for real-time transient science in the Rubin era.

2605.26874 2026-06-02 cs.DB cs.AI cs.LG

Knowledge Graphs as the Missing Data Layer for LLM-Based Industrial Asset Operations

Madhulatha Mandarapu, Sandeep Kunkunuru

Affiliations: VaidhyaMegha Private Limited, India

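AI summary: Using a typed knowledge graph as the data layer, LLM-generated Cypher raises the same GPT-4 model from 65% to 82-83% on industrial maintenance scenarios, and native graph and optimization primitives with no LLM reach 99% on graph-answerable scenarios. Generation-augmented knowledge (GAK) handles facts absent from the data and answers 81.8% of the 88 failure-mode scenarios that the benchmark flags as non-deterministic.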

Comments v2: reframed around the knowledge graph as a grounding substrate with a 3-tier router (text-to-Cypher; native graph/optimization primitives; generation-augmented knowledge, GAK). Adds a benchmark-grounded GAK evaluation on 88 real non-deterministic AssetOpsBench scenarios with provenance-tagged enrichment. 18 pages. Code: github.com/samyama-ai/assetops-kg

English abstract

LLM-based agents for industrial asset operations show limited accuracy when reasoning over flat document stores. AssetOpsBench (KDD 2026) establishes that GPT-4 agents achieve 65% on 139 industrial maintenance scenarios, and compares LLM orchestration paradigms (Agent-As-Tool vs. Plan-Execute) on a fixed data layer. We ask the orthogonal question: how much does the data model behind the tools matter? We treat a typed knowledge graph as a grounding substrate and route each question by how it is best answered: (i) LLM-generated Cypher for structured retrieval, which lifts the same GPT-4 model from 65% to 82-83%; (ii) native graph and optimization primitives, with no LLM, reaching 99% on graph-answerable scenarios; and (iii) generation-augmented knowledge (GAK) for answers absent from the data -- the engine's agent materializes the missing facts as provenance-tagged graph nodes, then answers. A recurring theme is inverted LLM usage: we constrain the LLM to query generation or one-shot enrichment from a typed schema and let the graph execute deterministically. On the 88 real AssetOpsBench failure-mode scenarios the benchmark itself flags non-deterministic -- ten equipment types absent from the graph -- GAK lifts answerability from zero to 100% of equipment types and answers 81.8% of scenarios, every materialized fact tagged source:LLM-derived for auditability. We also contribute 40 graph-native scenarios. For structured operational domains the data layer -- not the LLM orchestration -- is the primary lever, and a typed knowledge graph serves as a grounding substrate between raw industrial data and LLM reasoning.
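
The GAK fallback described in this abstract can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the `Fact` and `Graph` classes, the `stub_enrich` function (a stand-in for a one-shot LLM enrichment call), and the equipment names are all hypothetical. It shows just one idea from the abstract: when the graph has no facts for an equipment type, the missing facts are materialized with a provenance tag before the question is answered.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    equipment_type: str
    failure_mode: str
    source: str  # provenance tag, e.g. "curated" or "LLM-derived"


@dataclass
class Graph:
    # Toy in-memory stand-in for a typed knowledge graph (hypothetical).
    facts: list = field(default_factory=list)

    def failure_modes(self, equipment_type):
        return [f for f in self.facts if f.equipment_type == equipment_type]


def stub_enrich(equipment_type):
    # Hypothetical placeholder for a one-shot LLM enrichment call.
    return [f"{equipment_type} generic wear"]


def answer(graph, equipment_type, enrich=stub_enrich):
    facts = graph.failure_modes(equipment_type)
    if not facts:
        # Fallback: materialize the missing facts as tagged nodes,
        # so every generated fact stays auditable.
        for mode in enrich(equipment_type):
            graph.facts.append(Fact(equipment_type, mode, "LLM-derived"))
        facts = graph.failure_modes(equipment_type)
    # The lookup itself is deterministic; only enrichment used the stub.
    return [(f.failure_mode, f.source) for f in facts]


graph = Graph([Fact("pump", "seal leak", "curated")])
known = answer(graph, "pump")       # served from existing graph facts
missing = answer(graph, "chiller")  # triggers the tagged enrichment path
```

Keeping the provenance tag on each stored fact, rather than on the answer, means later queries over the same graph can still separate curated facts from generated ones.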
