arXivDaily: daily arXiv digest, updated Monday through Friday
2508.13747 2026-06-09 cs.LG

DREAMS: Preserving both Local and Global Structure in Dimensionality Reduction

Noël Kury, Dmitry Kobak, Sebastian Damrich

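Affiliations: Hertie Institute for AI in Brain Health; University of Tübingen; University of Tübingen, Germany

AI summary: DREAMS combines the local structure preservation of t-SNE with the global structure preservation of PCA through a simple regularization term, producing a range of embeddings that balance local and global structure.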

Comments Transactions on Machine Learning Research (2026)

Details
Journal ref
Transactions on Machine Learning Research (TMLR) 2026
Abstract

Dimensionality reduction techniques are widely used for visualizing high-dimensional data in two dimensions. Existing methods are typically designed to preserve either local (e.g., $t$-SNE, UMAP) or global (e.g., MDS, PCA) structure of the data, but none of the established methods can represent both aspects well. In this paper, we present DREAMS (Dimensionality Reduction Enhanced Across Multiple Scales), a method that combines the local structure preservation of $t$-SNE with the global structure preservation of PCA via a simple regularization term. Our approach generates a spectrum of embeddings between the locally well-structured $t$-SNE embedding and the globally well-structured PCA embedding, efficiently balancing both local and global structure preservation. We benchmark DREAMS across eleven real-world datasets, showcasing qualitatively and quantitatively its superior ability to preserve structure across multiple scales compared to previous approaches.
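
As a rough illustration of regularizing a neighbor-embedding loss toward a PCA layout, the sketch below adds a squared-distance penalty to a t-SNE-style KL term. The function names, the form of the penalty, and the weight `lam` are assumptions made for illustration; the abstract does not specify the exact regularizer used by DREAMS.

```python
import numpy as np

def pca_embedding(X, dim=2):
    """Project centered data onto its top `dim` principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

def student_t_affinities(Y):
    """Pairwise affinities from a Student-t kernel, normalized to sum to 1."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def regularized_loss(Y, P, Y_pca, lam):
    """Hypothetical objective: KL(P || Q) + lam * mean squared distance to the PCA layout."""
    Q = student_t_affinities(Y)
    mask = P > 0  # skip zero entries of P (including the diagonal)
    kl = np.sum(P[mask] * np.log(P[mask] / Q[mask]))
    penalty = np.mean(np.sum((Y - Y_pca) ** 2, axis=1))
    return kl + lam * penalty
```

With `lam = 0` this reduces to the plain KL term, and a very large `lam` pins the embedding to the PCA layout, which loosely mirrors the spectrum of embeddings the abstract describes.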

2601.21221 2026-06-09 cs.AI

Causal Discovery for Explainable AI: A Dual-Encoding Approach

Henry Salgado, Meagan R. Kendall, Martine Ceberio

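Affiliations: Department of Computer Science, The University of Texas at El Paso; Department of Engineering Education and Leadership, The University of Texas at El Paso

AI summary: The paper proposes a dual-encoding approach that uses complementary encoding strategies and majority-vote fusion to address the numerical instability of traditional causal discovery methods on categorical variables, and validates it on the Titanic dataset.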

Comments 6 pages

Details
Abstract

Understanding causal relationships among features is fundamental for explaining machine learning model decisions. However, traditional causal discovery methods face challenges with categorical variables due to numerical instability in conditional independence testing. We propose a dual-encoding causal discovery approach that addresses these limitations by running constraint-based algorithms with complementary encoding strategies and merging results through majority voting. Applied to the Titanic dataset, our method identifies causal structures that align with established explainable methods.
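
The merging step can be pictured with a minimal sketch: each run of a constraint-based algorithm yields a set of directed edges, and an edge is kept if a strict majority of runs contain it. The function name, the `min_votes` parameter, and the example edges are hypothetical; the abstract does not describe the voting rule in detail.

```python
from collections import Counter

def majority_vote_edges(edge_sets, min_votes=None):
    """Keep edges that appear in a strict majority of the input graphs.

    edge_sets: list of sets of (source, target) tuples, one per encoding or run.
    min_votes: hypothetical override for the vote threshold.
    """
    if min_votes is None:
        min_votes = len(edge_sets) // 2 + 1
    votes = Counter(edge for edges in edge_sets for edge in edges)
    return {edge for edge, count in votes.items() if count >= min_votes}

# Illustrative edges only; these are not results from the paper.
runs = [
    {("Sex", "Survived"), ("Pclass", "Fare")},
    {("Sex", "Survived"), ("Pclass", "Fare"), ("Age", "Fare")},
    {("Sex", "Survived")},
]
merged = majority_vote_edges(runs)
```

Here `merged` keeps the two edges found in at least two of the three runs and drops the edge found only once.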

2601.09239 2026-06-09 cs.SD cs.AI eess.AS

DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion

Hanlin Zhang, Daxin Tan, Dehua Tao, Xiao Chen, Haochen Tan, Yunhe Li, Yuchen Cao, Linqi Song

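Affiliations: Department of Computer Science, City University of Hong Kong; AI Lab, Leibniz Research Center, Huawei

AI summary: Proposes DSA-Tokenizer, which disentangles speech by supervising semantic tokens with ASR and acoustic tokens with mel-spectrogram reconstruction, and introduces a hierarchical flow-matching decoder with a joint reconstruction and context-inpainting training strategy to support high-fidelity reconstruction and cross-utterance voice cloning.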

Comments Submitted to ACL ARR, May 2026

Details
Abstract

Speech tokenizers are a key building block of fully discrete Speech LLMs. Existing tokenizers either prioritize semantic encoding, fuse semantic content with acoustic style inseparably, or achieve incomplete semantic-acoustic disentanglement. To achieve better disentanglement, we propose DSA-Tokenizer, which explicitly disentangles speech into discrete semantic and acoustic tokens via distinct optimization constraints. Specifically, semantic tokens are supervised by ASR to capture linguistic content, while acoustic tokens focus on mel-spectrogram restoration to encode style. We further introduce a hierarchical Flow Matching decoder and a joint reconstruction-context inpainting training strategy, allowing the model to support both high-fidelity reconstruction and cross-utterance voice cloning. To speed up inference, we distill the DiT decoder to 4-step inference and improve synthesis quality with GAN fine-tuning. Experiments demonstrate that DSA-Tokenizer provides strong semantic-acoustic disentanglement, reliable controllable voice cloning, and efficient high-fidelity generation with low WER/CER. Moreover, our results suggest that disentangled tokenization provides a more effective interface for downstream large-model speech generation. Audio samples are available at https://anonymous.4open.science/w/DSA_Tokenizer_demo/.

2511.14639 2026-06-09 cs.CV

SLAM-AGS: Slide-Label Aware Multi-Task Pretraining Using Adaptive Gradient Surgery in Computational Cytology

Marco Acerbis, Swarnadip Chatterjee, Christophe Avenel, Joakim Lindblad

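Affiliations: University of Cambridge

AI summary: SLAM-AGS jointly optimizes a weakly supervised similarity objective and a self-supervised contrastive objective to improve downstream performance, and uses adaptive gradient surgery to resolve conflicting task gradients, giving stable pretraining and better results at low witness rates.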

Comments 5 pages, 2 figures, Submitted to ISBI2026

Details
Journal ref
2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI)
Abstract

Computational cytology faces two major challenges: i) instance-level labels are unreliable and prohibitively costly to obtain, ii) witness rates are extremely low. We propose SLAM-AGS, a Slide-Label-Aware Multitask pretraining framework that jointly optimizes (i) a weakly supervised similarity objective on slide-negative patches and (ii) a self-supervised contrastive objective on slide-positive patches, yielding stronger performance on downstream tasks. To stabilize learning, we apply Adaptive Gradient Surgery to tackle conflicting task gradients and prevent model collapse. We integrate the pretrained encoder into an attention-based Multiple Instance Learning aggregator for bag-level prediction and attention-guided retrieval of the most abnormal instances in a bag. On a publicly available bone-marrow cytology dataset, with simulated witness rates from 10% down to 0.5%, SLAM-AGS improves bag-level F1-Score and Top 400 positive cell retrieval over other pretraining methods, with the largest gains at low witness rates, showing that resolving gradient interference enables stable pretraining and better performance on downstream tasks. To facilitate reproducibility, we share our complete implementation and evaluation framework as open source: https://github.com/Ace95/SLAM-AGS.
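
To give a feel for what resolving conflicting task gradients can look like, the sketch below uses the well-known PCGrad-style projection: when two task gradients have a negative dot product, the conflicting component is removed before summing. This is a stand-in technique chosen for illustration, not the adaptive variant used in SLAM-AGS, whose details the abstract does not give; the function names are hypothetical.

```python
import numpy as np

def project_conflicting(g_a, g_b):
    """If g_a conflicts with g_b (negative dot product), remove from g_a
    its component along g_b; otherwise return a copy of g_a unchanged."""
    dot = float(np.dot(g_a, g_b))
    if dot < 0.0:
        # dot < 0 implies g_b is nonzero, so the division is safe.
        return g_a - dot / float(np.dot(g_b, g_b)) * g_b
    return g_a.copy()

def combined_update(g_a, g_b):
    """Sum of the two task gradients after mutual projection."""
    return project_conflicting(g_a, g_b) + project_conflicting(g_b, g_a)
```

After projection, each adjusted gradient is orthogonal to the other task's original gradient, so neither task's update directly undoes the other's.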

2405.07098 2026-06-09 cs.LG cs.AI math-ph math.MP math.OC stat.ML

Interpretable global minima of deep ReLU neural networks on sequentially separable data

Thomas Chen, Patrícia Muñoz Ewald

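Affiliations: Department of Mathematics, University of Texas at Austin

AI summary: The paper explicitly constructs zero-loss classifiers, using cumulative parameters that determine truncation maps, to describe global minima of deep ReLU networks for data consisting of small, well-separated clusters or of sequentially linearly separable equivalence classes.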

Comments AMS Latex, 31 pages, 3 figures

Details
Journal ref
J. Mach. Learn. Res., 26 (173): 1-31 (2025)
Abstract

We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters.

2501.05628 2026-06-09 cs.RO cs.HC

Concerns and Values in Human-Robot Interactions: A Focus on Social Robotics

Giulio Antonio Abbo, Tony Belpaeme, Micol Spitale

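Affiliations: IDLab-AIRO, Ghent University – imec; Ghent University – imec; DEIB, Politecnico di Milano

AI summary: Through a literature review and focus groups, the paper identifies key concerns and values in human-robot interaction across healthcare, education, and home settings, and develops the HRI Value Compass tool to guide robot design.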

Comments 31 pages, 7 figures, 6 tables; 4 appendices

Details
Journal ref
Int J of Soc Robotics 18, 4 (2026)
Abstract

Robots, as AI with physical instantiation, inhabit our social and physical world, where their actions have both social and physical consequences, posing challenges for researchers when designing social robots. This study starts with a scoping review to identify discussions and potential concerns arising from interactions with robotic systems in the context of healthcare, education, and private homes. Two focus groups of technology ethics experts then validated a comprehensive list of key topics and values in human-robot interaction (HRI) literature in these contexts. These insights were integrated into the HRI Value Compass web tool, to help HRI researchers identify these values in robot design. The tool was evaluated in a pilot study. This work benefits the HRI community by highlighting key concerns in human-robot interactions and providing an instrument to help researchers design robots that align with human values, ensuring future robotic systems adhere to these values in social applications.

2503.23822 2026-06-09 cs.LG

Node Embeddings via Neighbor Embeddings

Jan Niklas Böhm, Marius Keute, Alica Guzmán, Sebastian Damrich, Andrew Draganov, Dmitry Kobak

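Affiliations: Hertie AI, University of Tübingen, Germany; Department of Computer Science, Aarhus University, Denmark

AI summary: The paper proposes the graph neighbor-embedding framework, which directly pulls together the embedding vectors of adjacent nodes without random walks. It outperforms existing node-embedding algorithms in local structure preservation and, applied to 2D node embedding, yields graph t-SNE layouts that outperform existing graph-layout algorithms.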

Comments Accepted to Transactions on Machine Learning Research (TMLR)

Details
Journal ref
Transactions on Machine Learning Research (TMLR) 2025
Abstract

Node embeddings are a paradigm in non-parametric graph representation learning, where graph nodes are embedded into a given vector space to enable downstream processing. State-of-the-art node-embedding algorithms, such as DeepWalk and node2vec, are based on random-walk notions of node similarity and on contrastive learning. In this work, we introduce the graph neighbor-embedding (graph NE) framework that directly pulls together embedding vectors of adjacent nodes without relying on any random walks. We show that graph NE strongly outperforms state-of-the-art node-embedding algorithms in terms of local structure preservation. Furthermore, we apply graph NE to the 2D node-embedding problem, obtaining graph t-SNE layouts that also outperform existing graph-layout algorithms.

2510.13381 2026-06-09 cs.CV cs.GR

Leveraging 2D Priors and SDF Guidance for Dynamic Urban Scene Rendering

Siddharth Tourani, Jayaram Reddy, Akash Kumbar, Satyajit Tourani, Nishant Goyal, Madhava Krishna, N. Dinesh Reddy, Muhammad Haris Khan

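Affiliations: IIIT Hyderabad; MBZUAI; University of Heidelberg; VLM Run; IIT Kharagpur

AI summary: The paper combines 2D object-agnostic priors with an SDF representation for dynamic urban scene rendering, working without LiDAR data while improving geometric accuracy and deformation modeling.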

Comments Accepted at ICCV-2025, project page: https://dynamic-ugsdf.github.io/

Details
Journal ref
Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
Abstract

Dynamic scene rendering and reconstruction play a crucial role in computer vision and augmented reality. Recent methods based on 3D Gaussian Splatting (3DGS) have enabled accurate modeling of dynamic urban scenes, but they require both camera and LiDAR data, ground-truth 3D segmentations, and motion data in the form of tracklets or pre-defined object templates such as SMPL. In this work, we explore whether a combination of 2D object-agnostic priors, in the form of depth and point tracking, coupled with a signed distance function (SDF) representation for dynamic objects can be used to relax some of these requirements. We present a novel approach that integrates SDFs with 3DGS to create a more robust object representation by harnessing the strengths of both methods. Our unified optimization framework enhances the geometric accuracy of 3D Gaussian splatting and improves deformation modeling within the SDF, resulting in a more adaptable and precise representation. We demonstrate that our method achieves state-of-the-art performance in rendering metrics even without LiDAR data on urban scenes. When incorporating LiDAR, our approach improves further in reconstructing and generating novel views across diverse object categories, without ground-truth 3D motion annotation. Additionally, our method enables various scene editing tasks, including scene decomposition and scene composition.
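
For readers unfamiliar with signed distance functions, here is a textbook example for a sphere: the value is negative inside the surface, zero on it, and positive outside. It is only meant to illustrate the representation; the SDF in the paper is learned for dynamic objects, and the function name here is an assumption.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance from each point to a sphere's surface:
    negative inside, zero on the surface, positive outside."""
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    return np.linalg.norm(points - center, axis=-1) - radius

# Three points along the x-axis: inside, on, and outside a unit sphere.
distances = sphere_sdf([[0, 0, 0], [1, 0, 0], [2, 0, 0]], center=[0, 0, 0], radius=1.0)
```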

2510.06742 2026-06-09 cs.AI cs.LG

MultiCNKG: Integrating Cognitive Neuroscience, Gene, and Disease Knowledge Graphs Using Large Language Models

Ali Sarabadani, Kheirolah Rahsepar Fard

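Affiliations: Department of Computer Engineering and Information Technology, University of Qom; University of Qom

AI summary: The paper proposes the MultiCNKG framework, which integrates cognitive neuroscience, gene, and disease knowledge graphs, using large language models for entity alignment and graph augmentation to improve the integration and use of biomedical knowledge graphs.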

Details
Abstract

The advent of large language models (LLMs) has revolutionized the integration of knowledge graphs (KGs) in biomedical and cognitive sciences, overcoming limitations in traditional machine learning methods for capturing intricate semantic links among genes, diseases, and cognitive processes. We introduce MultiCNKG, an innovative framework that merges three key knowledge sources: the Cognitive Neuroscience Knowledge Graph (CNKG) with 2.9K nodes and 4.3K edges across 9 node types and 20 edge types; Gene Ontology (GO) featuring 43K nodes and 75K edges in 3 node types and 4 edge types; and Disease Ontology (DO) comprising 11.2K nodes and 8.8K edges with 1 node type and 2 edge types. Leveraging LLMs like GPT-4, we conduct entity alignment, semantic similarity computation, and graph augmentation to create a cohesive KG that interconnects genetic mechanisms, neurological disorders, and cognitive functions. The resulting MultiCNKG encompasses 6.9K nodes across 5 types (e.g., Genes, Diseases, Cognitive Processes) and 11.3K edges spanning 7 types (e.g., Causes, Associated with, Regulates), facilitating a multi-layered view from molecular to behavioral domains. Assessments using metrics such as precision (85.20%), recall (87.30%), coverage (92.18%), graph consistency (82.50%), novelty detection (40.28%), and expert validation (89.50%) affirm its robustness and coherence. Link prediction evaluations with models like TransE (MR: 391, MRR: 0.411) and RotatE (MR: 263, MRR: 0.395) show competitive performance against benchmarks like FB15k-237 and WN18RR. This KG advances applications in personalized medicine, cognitive disorder diagnostics, and hypothesis formulation in cognitive neuroscience.
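
The link-prediction metrics quoted above, mean rank (MR) and mean reciprocal rank (MRR), are both computed from the rank that a model assigns to the correct entity for each test triple. A minimal sketch with made-up ranks follows; the function names and the toy ranks are illustrative assumptions, not values from the paper.

```python
def mean_rank(ranks):
    """Average rank of the correct entity (lower is better)."""
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank of the correct entity (higher is better, at most 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Toy ranks (1 = correct entity ranked first); not data from the paper.
ranks = [1, 2, 4]
mr = mean_rank(ranks)
mrr = mean_reciprocal_rank(ranks)
```

MR is sensitive to a few very badly ranked triples, while MRR is dominated by how often the correct entity lands near the top, which is why the two are usually reported together.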

2401.14591 2026-06-09 cs.LG stat.ML

Ricci flow regularization in latent spaces for the forward learning of partial differential equations

Andrew Gracyk

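Affiliations: University of Illinois Urbana-Champaign

AI summary: The paper proposes a manifold-based encoder-decoder method that learns dynamics in time, notably partial differential equations, by evolving the latent space under Ricci flow. The latent manifold is parameterized and Ricci flow is simulated in a physics-informed setting, yielding low-dimensional representations and adversarial robustness.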

Comments Fixed a small error in appendix; some improvements to experiments

Details
Abstract

We present a manifold-based machine learning encoder-decoder method for learning dynamics in time, notably partial differential equations (PDEs), in which the manifold latent space evolves according to Ricci flow. This can be accomplished by parameterizing the latent manifold stage and subsequently simulating Ricci flow in a physics-informed setting, matching manifold quantities so that Ricci flow is empirically achieved. We emphasize dynamics that admit low-dimensional representations. With our method, the manifold, induced by the metric, is discerned through the training procedure, while the latent evolution due to Ricci flow provides an accommodating representation. By use of this flow, we sustain a canonical manifold latent representation for all values in the ambient PDE time interval continuum. We showcase that the Ricci flow facilitates qualities such as learning for out-of-distribution data and adversarial robustness on select PDE data. Moreover, we provide a thorough expansion of our methods in regard to special cases which allow higher-dimensional representations, such as Ricci flow on the hypersphere and neural discovery of non-parametric geometric flows with entropic strategies.
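
For reference, the standard (Hamilton) Ricci flow equation evolves a Riemannian metric in proportion to its Ricci curvature tensor. The symbols $g_{ij}$, $R_{ij}$, and the initial metric $g^{0}_{ij}$ follow common textbook notation and are not taken from the abstract, which does not state how the flow is parameterized or enforced during training.

```latex
\frac{\partial g_{ij}}{\partial t} = -2\, R_{ij}, \qquad g_{ij}(0) = g^{0}_{ij}
```

On a round sphere the Ricci tensor is a positive multiple of the metric, so the flow shrinks the sphere uniformly; this is the simplest closed-form case.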

2507.15152 2026-06-09 cs.CL cs.AI cs.LG

What Level of Automation is "Good Enough"? A Benchmark of Large Language Models for Meta-Analysis Data Extraction

Lingbo Li, Anuradha Mathrani, Teo Susnjak

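Affiliations: School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand

AI summary: The paper evaluates three large language models on data extraction in medical domains, finds that customised prompts substantially improve recall, and proposes three-tiered guidelines for balancing automation with expert oversight.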

Details
Journal ref
Research Synthesis Methods (2026)
Abstract

Automating data extraction from full-text randomised controlled trials (RCTs) for meta-analysis remains a significant challenge. This study evaluates the practical performance of three LLMs (Gemini-2.0-flash, Grok-3, GPT-4o-mini) across tasks involving statistical results, risk-of-bias assessments, and study-level characteristics in three medical domains: hypertension, diabetes, and orthopaedics. We tested four distinct prompting strategies (basic prompting, self-reflective prompting, model ensemble, and customised prompts) to determine how to improve extraction quality. All models demonstrate high precision but consistently suffer from poor recall by omitting key information. We found that customised prompts were the most effective, boosting recall by up to 15%. Based on this analysis, we propose a three-tiered set of guidelines for using LLMs in data extraction, matching data types to appropriate levels of automation based on task complexity and risk. Our study offers practical advice for automating data extraction in real-world meta-analyses, balancing LLM efficiency with expert oversight through targeted, task-specific automation.
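
The "high precision, poor recall" pattern is easy to see in a small sketch that compares extracted items against a reference set. The function name, field names, and values below are invented for illustration and do not come from the study.

```python
def precision_recall(extracted, reference):
    """Precision and recall of extracted items against a reference set."""
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical (field, value) pairs; not data from the paper.
reference = {("sample_size", 120), ("mean_sbp", 140.2), ("dropouts", 7), ("follow_up_weeks", 12)}
extracted = {("sample_size", 120), ("mean_sbp", 140.2)}
precision, recall = precision_recall(extracted, reference)
```

Everything the model returned is correct (precision 1.0), but half of the reference items were omitted (recall 0.5), which is the failure mode the abstract attributes to all three models.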

2507.02606 2026-06-09 cs.SD cs.AI cs.CR cs.LG eess.AS

De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks

Wei Fan, Kejiang Chen, Chang Liu, Weiming Zhang, Nenghai Yu

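Affiliations: University of Science and Technology of China

AI summary: The paper proposes a two-stage purification method that first purifies the perturbed speech and then refines it with phoneme guidance; experiments show it outperforms existing purification methods at defeating protective perturbations against voice cloning.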

Comments Accepted by ICML 2025

Details
Journal ref
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267, 2025
Abstract

The rapid advancement of speech generation models has heightened privacy and security concerns related to voice cloning (VC). Recent studies have investigated disrupting unauthorized voice cloning by introducing adversarial perturbations. However, determined attackers can mitigate these protective perturbations and successfully execute VC. In this study, we conduct the first systematic evaluation of these protective perturbations against VC under realistic threat models that include perturbation purification. Our findings reveal that while existing purification methods can neutralize a considerable portion of the protective perturbations, they still lead to distortions in the feature space of VC models, which degrades the performance of VC. From this perspective, we propose a novel two-stage purification method: (1) Purify the perturbed speech; (2) Refine it using phoneme guidance to align it with the clean speech distribution. Experimental results demonstrate that our method outperforms state-of-the-art purification methods in disrupting VC defenses. Our study reveals the limitations of adversarial perturbation-based VC defenses and underscores the urgent need for more robust solutions to mitigate the security and privacy risks posed by VC. The code and audio samples are available at https://de-antifake.github.io.

2502.09252 2026-06-09 cs.LG

On the Importance of Embedding Norms in Self-Supervised Learning

Andrew Draganov, Sharvaree Vadgama, Sebastian Damrich, Jan Niklas Böhm, Lucas Maes, Dmitry Kobak, Erik Bekkers

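Affiliations: University of Amsterdam

AI summary: The paper studies the role of embedding norms in self-supervised learning, showing through theoretical analysis and experiments that norms affect convergence speed and encode network confidence, with smaller norms corresponding to unexpected samples.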

Details
Journal ref
International Conference on Machine Learning (ICML) 2025
Abstract

Self-supervised learning (SSL) allows training data representations without a supervised signal and has become an important paradigm in machine learning. Most SSL methods employ the cosine similarity between embedding vectors and hence effectively embed data on a hypersphere. While this seemingly implies that embedding norms cannot play any role in SSL, a few recent works have suggested that embedding norms have properties related to network convergence and confidence. In this paper, we resolve this apparent contradiction and systematically establish the embedding norm's role in SSL training. Using theoretical analysis, simulations, and experiments, we show that embedding norms (i) govern SSL convergence rates and (ii) encode network confidence, with smaller norms corresponding to unexpected samples. Additionally, we show that manipulating embedding norms can have large effects on convergence speed. Our findings demonstrate that SSL embedding norms are integral to understanding and optimizing network behavior.
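
The apparent contradiction mentioned above starts from a simple fact: cosine similarity does not change when either vector is rescaled, so the loss value itself cannot see embedding norms. The sketch below checks this numerically; the function name and vectors are illustrative.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; invariant to positive rescaling."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
original = cosine_similarity(u, v)
rescaled = cosine_similarity(10.0 * u, 0.1 * v)
```

`original` and `rescaled` agree up to floating-point error. The gradient with respect to an embedding, by contrast, does depend on its norm, which is one route by which norms can still influence training.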

2311.07065 2026-06-09 cs.LG cs.AI math-ph math.MP math.OC stat.ML

On non-approximability of zero loss global ${\mathcal L}^2$ minimizers by gradient descent in Deep Learning

Thomas Chen, Patricia Muñoz Ewald

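Affiliations: Department of Mathematics, University of Texas at Austin

AI summary: The paper analyzes geometric aspects of gradient descent in deep learning and argues that zero-loss minimization generically cannot be attained in underparametrized networks, so the distribution of training inputs must be non-generic for zero-loss minimizers to exist.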

Comments AMS Latex, 7 pages. Typos corrected, Corollary 1.6 upgraded to Theorem, acknowledgment added

Details
Journal ref
Theor. Appl. Mech., 52 (1), 67-73 (2025)
Abstract

We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL), and give a detailed discussion of the circumstance that in underparametrized DL networks, zero loss minimization can generically not be attained. As a consequence, we conclude that the distribution of training inputs must necessarily be non-generic in order to produce zero loss minimizers, both for the method constructed in [Chen-Munoz Ewald 2023, 2024] and for gradient descent [Chen 2025] (which assume clustering of training data).

2410.14964 2026-06-09 cs.CL

ChronoFact: Timeline-based Temporal Fact Verification

Anab Maulana Barik, Wynne Hsu, Mong Li Lee

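Affiliations: School of Computing; Institute of Data Science; Centre for Trusted Internet and Community

AI summary: The paper proposes a timeline-based temporal fact verification framework that identifies events in the claim and the evidence, organizes them into timelines, and analyzes the relationships between events to predict claim veracity; it also introduces a dataset of complex temporal claims.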

Details
Journal ref
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2025), pp. 8031-8039
Abstract

Temporal claims, often riddled with inaccuracies, are a significant challenge in the digital misinformation landscape. Fact-checking systems that can accurately verify such claims are crucial for combating misinformation. Current systems struggle with the complexities of evaluating the accuracy of these claims, especially when they include multiple, overlapping, or recurring events. We introduce a novel timeline-based fact verification framework that identifies events from both claim and evidence and organizes them into their respective chronological timelines. The framework systematically examines the relationships between the events in both claim and evidence to predict the veracity of each claim event and its chronological accuracy. This allows us to accurately determine the overall veracity of the claim. We also introduce a new dataset of complex temporal claims involving timeline-based reasoning for the training and evaluation of our proposed framework. Experimental results demonstrate the effectiveness of our approach in handling the intricacies of temporal claim verification.

2205.01970 2026-06-09 cs.LG stat.ML

Non-Stationary Bandit Learning via Predictive Sampling

Yueyang Liu, Xu Kuang, Benjamin Van Roy

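Affiliations: Jones Graduate School of Business, Rice University; Stanford Graduate School of Business; Department of Management Science and Engineering, Department of Electrical Engineering, Stanford University

AI summary: The paper proposes predictive sampling, an algorithm that improves bandit learning in non-stationary environments by deprioritizing information that quickly loses its usefulness, establishes a Bayesian regret bound for it, and demonstrates its effectiveness in simulations.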

Details
Abstract

Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to non-stationary environments. We show that such failures can be attributed to the fact that, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to non-stationarity. Building upon this insight, we propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness. A theoretical guarantee on the performance of predictive sampling is established through a Bayesian regret bound. We provide versions of predictive sampling for which computations tractably scale to complex bandit environments of practical interest. Through numerical simulations, we demonstrate that predictive sampling outperforms Thompson sampling in all non-stationary environments examined.
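
As background for the comparison, here is a minimal sketch of the baseline, standard Thompson sampling for a stationary Bernoulli bandit with Beta(1, 1) priors. It is not predictive sampling, whose construction the abstract does not spell out; the function name, arm means, and horizon are illustrative assumptions.

```python
import numpy as np

def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return pull counts per arm."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)  # 1 + observed successes per arm
    beta = np.ones(k)   # 1 + observed failures per arm
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior, play the best.
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        reward = float(rng.random() < true_means[arm])
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.8], horizon=500)
```

Because the posterior here pools all past observations equally, it keeps treating old rewards as informative even if the arm means drift, which is the setting where the abstract reports that Thompson sampling can perform poorly.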

2311.08957 2026-06-09 cs.RO cs.AI cs.HC

I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots

Giulio Antonio Abbo, Tony Belpaeme

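Affiliations: IDLab-AIRO – Ghent University – imec

AI summary: The paper presents a system that uses large language models to improve the dialogue capabilities of social robots, integrating visual input for better context awareness. It reports six interactions with a Furhat robot and discusses the prospects of dialogue that blends visual and textual modalities.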

Comments 8 pages, 3 figures

Details
Journal ref
HRI '25: Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. Pages 1176 - 1180
Abstract

In the rapidly evolving landscape of human-computer interaction, the integration of vision capabilities into conversational agents stands as a crucial advancement. This paper presents an initial implementation of a dialogue manager that leverages the latest progress in Large Language Models (e.g., GPT-4, IDEFICS) to enhance the traditional text-based prompts with real-time visual input. LLMs are used to interpret both textual prompts and visual stimuli, creating a more contextually aware conversational agent. The system's prompt engineering, incorporating dialogue with summarisation of the images, ensures a balance between context preservation and computational efficiency. Six interactions with a Furhat robot powered by this system are reported, illustrating and discussing the results obtained. By implementing this vision-enabled dialogue system, the paper envisions a future where conversational agents seamlessly blend textual and visual modalities, enabling richer, more context-aware dialogues.

2405.17151 2026-06-09 cs.LG

Smoke and Mirrors in Causal Downstream Tasks

Riccardo Cadei, Lukas Lindorfer, Sylvia Cremer, Cordelia Schmid, Francesco Locatello

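Affiliations * Institute of Science and Technology Austria (ISTA); Inria; Ecole Normale Supérieure; CNRS; PSL Research University

AI summary This paper examines how common choices in causal inference pipelines can bias treatment effect estimates, shows experimentally that sampling and modeling choices affect the accuracy of the causal estimate, and argues that benchmarks should take the downstream scientific question into account.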

Abstract

Machine Learning and AI have the potential to transform data-driven scientific discovery, enabling accurate predictions for several scientific phenomena. As many scientific questions are inherently causal, this paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations in a Randomized Controlled Trial (RCT). Despite being the simplest possible causal setting and a perfect fit for deep learning, we theoretically find that many common choices in the literature may lead to biased estimates. To test the practical impact of these considerations, we recorded ISTAnt, the first real-world benchmark for causal inference downstream tasks on high-dimensional observations as an RCT studying how garden ants (Lasius neglectus) respond to microparticles applied onto their colony members by hygienic grooming. Comparing 6 480 models fine-tuned from state-of-the-art visual backbones, we find that the sampling and modeling choices significantly affect the accuracy of the causal estimate, and that classification accuracy is not a proxy thereof. We further validated the analysis, repeating it on a synthetically generated visual data set controlling the causal model. Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones. Further, we highlight guidelines for representation learning methods to help answer causal questions in the sciences.

2501.12421 2026-06-09 cs.LG cs.AI q-bio.QM

Tackling Small Sample Survival Analysis via Transfer Learning: A Study of Colorectal Cancer Prognosis

Yonghao Zhao, Changtao Li, Chi Shu, Qingbin Wu, Hong Li, Chuan Xu, Tianrui Li, Ziqiang Wang, Zhipeng Luo, Yazhou He

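Affiliations * University of Science and Technology of China

AI summary This paper applies transfer learning to small-sample survival analysis for colorectal cancer prognosis, enhancing several survival models (DeepSurv, Cox-CC, DeepHit, and Random Survival Forest). Experiments show that transfer learning improves the performance of every model tested.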

Journal ref
Artificial Intelligence in Medicine, 178:103426, 2026
Abstract

Survival prognosis is crucial for medical informatics. Practitioners often confront small-sized clinical data, especially cancer patient cases, which can be insufficient to induce useful patterns for survival predictions. This study deals with small sample survival analysis by leveraging transfer learning, a useful machine learning technique that can enhance the target analysis with related knowledge pre-learned from other data. We propose and develop various transfer learning methods designed for common survival models. For parametric models such as DeepSurv, Cox-CC (Cox-based neural networks), and DeepHit (end-to-end deep learning model), we apply standard transfer learning techniques like pretraining and fine-tuning. For non-parametric models such as Random Survival Forest, we propose a new transfer survival forest (TSF) model that transfers tree structures from source tasks and fine-tunes them with target data. We evaluated the transfer learning methods on colorectal cancer (CRC) prognosis. The source data are 27,379 SEER CRC stage I patients, and the target data are 728 CRC stage I patients from the West China Hospital. When enhanced by transfer learning, Cox-CC's $C^{td}$ value was boosted from 0.7868 to 0.8111, DeepHit's from 0.8085 to 0.8135, DeepSurv's from 0.7722 to 0.8043, and RSF's from 0.7940 to 0.8297 (the highest performance). All models trained with data as small as 50 demonstrated even more significant improvement. Conclusions: Therefore, the current survival models used for cancer prognosis can be enhanced and improved by properly designed transfer learning techniques. The source code used in this study is available at https://github.com/YonghaoZhao722/TSF.

2411.18385 2026-06-09 cs.LG cs.CV stat.ML

Federated Learning with Uncertainty and Personalization via Efficient Second-order Optimization

Shivam Pal, Aishwarya Gupta, Saqib Sarwar, Piyush Rai

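Affiliations * Department of Computer Science and Engineering, IIT Kanpur, India

AI summary This paper proposes an efficient Bayesian federated learning method that uses second-order optimization to reduce computation and communication costs while retaining the uncertainty estimation and personalization benefits of the Bayesian approach.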

Journal ref
Transactions on Machine Learning Research (TMLR), 2025
Abstract

Federated Learning (FL) has emerged as a promising method to collaboratively learn from decentralized and heterogeneous data available at different clients without the requirement of data ever leaving the clients. Recent works on FL have advocated taking a Bayesian approach to FL as it offers a principled way to account for the model and predictive uncertainty by learning a posterior distribution for the client and/or server models. Moreover, Bayesian FL also naturally enables personalization in FL to handle data heterogeneity across the different clients by having each client learn its own distinct personalized model. In particular, the hierarchical Bayesian approach enables all the clients to learn their personalized models while also taking into account the commonalities via a prior distribution provided by the server. However, despite their promise, Bayesian approaches for FL can be computationally expensive and can have high communication costs as well because of the requirement of computing and sending the posterior distributions. We present a novel Bayesian FL method using an efficient second-order optimization approach, with a computational cost that is similar to first-order optimization methods like Adam, but also provides the various benefits of the Bayesian approach for FL (e.g., uncertainty, personalization), while also being significantly more efficient and accurate than SOTA Bayesian FL methods (both for standard as well as personalized FL settings). Our method achieves improved predictive accuracies as well as better uncertainty estimates as compared to the baselines which include both optimization based as well as Bayesian FL methods.

2311.03087 2026-06-09 cs.LG math.AT

Persistent Homology for High-dimensional Data Based on Spectral Methods

Sebastian Damrich, Philipp Berens, Dmitry Kobak

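Affiliations * Hertie Institute for AI in Brain Health, University of Tübingen, Germany; Tübingen AI Center, Germany; IWR, Heidelberg University, Germany

AI summary This paper uses spectral distances, namely diffusion distance and effective resistance, to detect topological structure under high-dimensional noise, derives a closed-form formula for effective resistance, and applies the approach to single-cell RNA-sequencing data to identify cell cycle loops.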

Comments NeurIPS 2024, 54 pages, 44 figures

Journal ref
Conference on Neural Information Processing Systems (NeurIPS) 2024
Abstract

Persistent homology is a popular computational tool for analyzing the topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this case traditional persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds true for existing refinements of persistent homology. As a remedy, we find that spectral distances on the k-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, allow to detect the correct topology even in the presence of high-dimensional noise. Moreover, we derive a novel closed-form formula for effective resistance, and describe its relation to diffusion distances. Finally, we apply these methods to high-dimensional single-cell RNA-sequencing data and show that spectral distances allow robust detection of cell cycle loops.
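
To make the notion of effective resistance in the abstract above concrete, here is a minimal sketch that computes it from its standard circuit definition: ground one node, inject a unit current at the other, and read off the resulting potential. This is the textbook definition, not the novel closed-form formula the paper derives, and the helper names `solve_linear` and `effective_resistance` are hypothetical. It assumes a connected graph given as a symmetric matrix of edge conductances.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: is the graph connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def effective_resistance(weights, i, j):
    """Effective resistance between nodes i and j of a connected weighted graph.

    weights is a symmetric matrix of edge conductances (0 means no edge).
    """
    if i == j:
        return 0.0
    n = len(weights)
    keep = [k for k in range(n) if k != j]  # ground node j (potential 0)

    def laplacian(a, b):
        # Graph Laplacian entry: weighted degree on the diagonal, -weight elsewhere.
        if a == b:
            return sum(weights[a][k] for k in range(n) if k != a)
        return -weights[a][b]

    # Laplacian with the row and column of the grounded node removed.
    reduced = [[laplacian(a, b) for b in keep] for a in keep]
    rhs = [1.0 if a == i else 0.0 for a in keep]  # inject unit current at node i
    potentials = solve_linear(reduced, rhs)
    return potentials[keep.index(i)]
```

On a three-node path with unit conductances the two end nodes are two unit resistors in series, so the function returns 2. On a k-nearest-neighbor graph the same call would be applied to the graph's weighted adjacency matrix.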

2406.14883 2026-06-09 cs.CL cs.CY

OATH-Frames: Characterizing Online Attitudes Towards Homelessness with LLM Assistants

Jaspreet Ranjit, Brihi Joshi, Rebecca Dorn, Laura Petry, Olga Koumoundouros, Jayne Bottarini, Peichen Liu, Eric Rice, Swabha Swayamdipta

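Affiliations * Dept. of Computer Science, University of Southern California; Suzanne Dworak-Peck School of Social Work, University of Southern California

AI summary This paper introduces OATH-Frames, a framing typology used with large language model assistance to analyze attitudes towards homelessness on social media, speeding up large-scale annotation and revealing trends in attitudes.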

Comments Project website: https://dill-lab.github.io/oath-frames/, EMNLP Main 2024

Journal ref
In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Abstract

Warning: Contents of this paper may be upsetting. Public attitudes towards key societal issues, expressed on online media, are of immense value in policy and reform efforts, yet challenging to understand at scale. We study one such social issue: homelessness in the U.S., by leveraging the remarkable capabilities of large language models to assist social work experts in analyzing millions of posts from Twitter. We introduce a framing typology: Online Attitudes Towards Homelessness (OATH) Frames: nine hierarchical frames capturing critiques, responses and perceptions. We release annotations with varying degrees of assistance from language models, with immense benefits in scaling: 6.5x speedup in annotation time while only incurring a 3 point F1 reduction in performance with respect to the domain experts. Our experiments demonstrate the value of modeling OATH-Frames over existing sentiment and toxicity classifiers. Our large-scale analysis with predicted OATH-Frames on 2.4M posts on homelessness reveal key trends in attitudes across states, time periods and vulnerable populations, enabling new insights on the issue. Our work provides a general framework to understand nuanced public attitudes at scale, on issues beyond homelessness.

2406.19493 2026-06-09 cs.CL cs.AI

Development and Evaluation of a Retrieval-Augmented Generation Tool for Creating SAPPhIRE Models of Artificial Systems

Anubhab Majumder, Kausik Bhattacharya, Amaresh Chakrabarti

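Affiliations * Department of Design and Manufacturing, Indian Institute of Science

AI summary This paper presents a retrieval-augmented generation tool for creating SAPPhIRE causality models of artificial systems and reports a preliminary evaluation of the factual accuracy and reliability of its output, with the aim of supporting design-by-analogy.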

Comments This paper has been accepted for presentation at the 10th International Conference on Research Into Design, 2025

Abstract

Representing systems using the SAPPhIRE causality model is found useful in supporting design-by-analogy. However, creating a SAPPhIRE model of artificial or biological systems is an effort-intensive process that requires human experts to source technical knowledge from multiple technical documents regarding how the system works. This research investigates how to leverage Large Language Models (LLMs) in creating structured descriptions of systems using the SAPPhIRE model of causality. This paper, the second part of the two-part research, presents a new Retrieval-Augmented Generation (RAG) tool for generating information related to SAPPhIRE constructs of artificial systems and reports the results from a preliminary evaluation of the tool's success - focusing on the factual accuracy and reliability of outcomes.

2407.13288 2026-06-09 cs.LG

Hierarchical Stage-Wise Training of Linked Deep Neural Networks for Multi-Building and Multi-Floor Indoor Localization Based on Wi-Fi RSSI Fingerprinting

Sihao Li, Kyeong Soo Kim, Zhe Tang, Jeremy S. Smith

Affiliations * School of Advanced Technology, Xi’an Jiaotong-Liverpool University; Department of Electrical Engineering and Electronics, University of Liverpool

AI summary This paper proposes a multi-building, multi-floor indoor localization method based on linked neural networks trained under a hierarchical stage-wise training framework. Experiments on the UJIIndoorLoc database show a three-dimensional localization error of 8.19 m, which the authors report as the most accurate result among neural network-based models trained and evaluated on the full dataset.

Comments 9 pages, 5 figures, under review for journal publication

Journal ref
IEEE Sensors Journal, volume 25, issue 13, pages 23341--23351, July 1, 2025
Abstract

In this paper, we present a new solution to the problem of large-scale multi-building and multi-floor indoor localization based on linked neural networks, where each neural network is dedicated to a sub-problem and trained under a hierarchical stage-wise training framework. When the measured data from sensors have a hierarchical representation as in multi-building and multi-floor indoor localization, it is important to exploit the hierarchical nature in data processing to provide a scalable solution. In this regard, the hierarchical stage-wise training framework extends the original stage-wise training framework to the case of multiple linked networks by training a lower-hierarchy network based on the prior knowledge gained from the training of higher-hierarchy networks. The experimental results with the publicly-available UJIIndoorLoc multi-building and multi-floor Wi-Fi RSSI fingerprint database demonstrate that the linked neural networks trained under the proposed hierarchical stage-wise training framework can achieve a three-dimensional localization error of 8.19 m, which, to the best of the authors' knowledge, is the most accurate result ever obtained for neural network-based models trained and evaluated with the full datasets of the UJIIndoorLoc database, and that, when applied to a model based on hierarchical convolutional neural networks, the proposed training framework can also significantly reduce the three-dimensional localization error from 11.78 m to 8.71 m.

2311.12167 2026-06-09 cs.LG cs.SI

Node Classification in Random Trees

Wouter W. L. Nuijten, Vlado Menkovski

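Affiliations * Eindhoven University of Technology

AI summary This paper proposes a method for classifying objects structured as random trees, modeling the distribution over node label assignments with a Markov network whose Gibbs distribution is parameterized by a graph neural network. It outperforms the baselines on the Stanford Sentiment Treebank dataset.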

Journal ref
Lecture Notes in Computer Science, 2024, pp. 105-116
Abstract

We propose a method for the classification of objects that are structured as random trees. Our aim is to model a distribution over the node label assignments in settings where the tree data structure is associated with node attributes (typically high dimensional embeddings). The tree topology is not predetermined and none of the label assignments are present during inference. Other methods that produce a distribution over node label assignment in trees (or more generally in graphs) either assume conditional independence of the label assignment, operate on a fixed graph topology, or require part of the node labels to be observed. Our method defines a Markov Network with the corresponding topology of the random tree and an associated Gibbs distribution. We parameterize the Gibbs distribution with a Graph Neural Network that operates on the random tree and the node embeddings. This allows us to estimate the likelihood of node assignments for a given random tree and use MCMC to sample from the distribution of node assignments. We evaluate our method on the tasks of node classification in trees on the Stanford Sentiment Treebank dataset. Our method outperforms the baselines on this dataset, demonstrating its effectiveness for modeling joint distributions of node labels in random trees.

2407.00396 2026-06-09 cs.CL cs.AI

A Study on Effect of Reference Knowledge Choice in Generating Technical Content Relevant to SAPPhIRE Model Using Large Language Model

Kausik Bhattacharya, Anubhab Majumder, Amaresh Chakrabarti

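Affiliations * Indian Institute of Science

AI summary This paper studies how to use a large language model to generate technical content relevant to the SAPPhIRE model of causality, suppresses hallucination with retrieval-augmented generation, and shows that the choice of reference knowledge strongly affects the accuracy of the generated content.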

Abstract

Representation of systems using the SAPPhIRE model of causality can be an inspirational stimulus in design. However, creating a SAPPhIRE model of a technical or a natural system requires sourcing technical knowledge from multiple technical documents regarding how the system works. This research investigates how to generate technical content accurately relevant to the SAPPhIRE model of causality using a Large Language Model, also called LLM. This paper, which is the first part of the two-part research, presents a method for hallucination suppression using Retrieval Augmented Generating with LLM to generate technical content supported by the scientific information relevant to a SAPPhIRE construct. The result from this research shows that the selection of reference knowledge used in providing context to the LLM for generating the technical content is very important. The outcome of this research is used to build a software support tool to generate the SAPPhIRE model of a given technical system.

2402.09193 2026-06-09 cs.CL cs.AI cs.HC

(Ir)rationality and Cognitive Biases in Large Language Models

Olivia Macmillan-Scott, Mirco Musolesi

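Affiliations * University College London; University of Bologna

AI summary This paper evaluates seven language models on tasks from the cognitive psychology literature and finds that, like humans, they display irrationality, but in ways that differ from human biases, with inconsistent responses adding a further layer of irrationality.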

Journal ref
Royal Society Open Science 11(6) 2024
Abstract

Do large language models (LLMs) display rational reasoning? LLMs have been shown to contain human biases due to the data they have been trained on; whether this is reflected in rational reasoning remains less clear. In this paper, we answer this question by evaluating seven language models using tasks from the cognitive psychology literature. We find that, like humans, LLMs display irrationality in these tasks. However, the way this irrationality is displayed does not reflect that shown by humans. When incorrect answers are given by LLMs to these tasks, they are often incorrect in ways that differ from human-like biases. On top of this, the LLMs reveal an additional layer of irrationality in the significant inconsistency of the responses. Aside from the experimental results, this paper seeks to make a methodological contribution by showing how we can assess and compare different capabilities of these types of models, in this case with respect to rational reasoning.

2107.07599 2026-06-09 cs.RO

Partially Observable Markov Decision Processes (POMDPs) and Robotics

Hanna Kurniawati

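Affiliations * School of Computing, Australian National University

AI summary This paper reviews POMDPs in robotics, covering the computational complexity issues that have limited their use, the sampling-based solvers that have eased those issues, and how POMDPs improve the robustness of robotic systems.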

Journal ref
Annual Review of Control, Robotics, and Autonomous Systems Vol. 5:253-277, 2022
Abstract

Planning under uncertainty is critical to robotics. The Partially Observable Markov Decision Process (POMDP) is a mathematical framework for such planning problems. It is powerful due to its careful quantification of the non-deterministic effects of actions and partial observability of the states. But precisely because of this, POMDP is notorious for its high computational complexity and deemed impractical for robotics. However, since early 2000, POMDPs solving capabilities have advanced tremendously, thanks to sampling-based approximate solvers. Although these solvers do not generate the optimal solution, they can compute good POMDP solutions that significantly improve the robustness of robotics systems within reasonable computational resources, thereby making POMDPs practical for many realistic robotics problems. This paper presents a review of POMDPs, emphasizing computational issues that have hindered its practicality in robotics and ideas in sampling-based solvers that have alleviated such difficulties, together with lessons learned from applying POMDPs to physical robots.

2101.01060 2026-06-09 cs.CV cs.AI cs.MM

Personal Privacy Protection via Irrelevant Faces Tracking and Pixelation in Video Live Streaming

Jizhe Zhou, Chi-Man Pun

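Affiliations * IEEE

AI summary This paper proposes FPVLS, a method with a two-stage frame-to-video structure that performs automatic privacy filtering in video live streaming and addresses target drifting, computing efficiency, and over-pixelation.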

Journal ref
IEEE Transactions on Information Forensics and Security, 16, 1088-1103 (2020)
Abstract

To date, the privacy-protection intended pixelation tasks are still labor-intensive and yet to be studied. With the prevailing of video live streaming, establishing an online face pixelation mechanism during streaming is an urgency. In this paper, we develop a new method called Face Pixelation in Video Live Streaming (FPVLS) to generate automatic personal privacy filtering during unconstrained streaming activities. Simply applying multi-face trackers will encounter problems in target drifting, computing efficiency, and over-pixelation. Therefore, for fast and accurate pixelation of irrelevant people's faces, FPVLS is organized in a frame-to-video structure of two core stages. On individual frames, FPVLS utilizes image-based face detection and embedding networks to yield face vectors. In the raw trajectories generation stage, the proposed Positioned Incremental Affinity Propagation (PIAP) clustering algorithm leverages face vectors and positioned information to quickly associate the same person's faces across frames. Such frame-wise accumulated raw trajectories are likely to be intermittent and unreliable on video level. Hence, we further introduce the trajectory refinement stage that merges a proposal network with the two-sample test based on the Empirical Likelihood Ratio (ELR) statistic to refine the raw trajectories. A Gaussian filter is laid on the refined trajectories for final pixelation. On the video live streaming dataset we collected, FPVLS obtains satisfying accuracy, real-time efficiency, and contains the over-pixelation problems.

2606.09820 2026-06-09 math.FA cs.LG math.PR q-fin.MF stat.ML new submission

Weighted universal approximation of differentiable maps on infinite-dimensional manifolds

Philipp Schmocker, Josef Teichmann

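Affiliations * Department of Mathematics, ETH Zurich, Switzerland

AI summary Using a weighted Nachbin theorem, this paper generalizes the universal approximation theorem for functional input neural networks to differentiable maps, including approximation of the derivatives, and applies the result to non-anticipative functionals and path space functionals.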

Comments 77 pages, 3 figures

Abstract

We generalize the universal approximation theorem for functional input neural networks (FNN) to differentiable maps by including the approximation of the derivatives. A FNN maps the input from a possibly infinite-dimensional weighted manifold to the real-valued hidden layer, on which a non-linear scalar activation function is applied, and then returns the output into a Banach space via some linear readouts. By proving a weighted Nachbin theorem, we establish a universal approximation theorem (UAT) for differentiable maps, which goes beyond the usual formulation on compact sets and also includes the approximation of the derivatives. This leads us to approximation results for non-anticipative functionals including the horizontal and vertical derivatives. As a further application, we show that linear functions of the signature are able to approximate path space functionals including their directional derivatives.