Abstract
In bipartite graphs, the $(\alpha,\beta)$-core is a widely used model for cohesive subgraph mining. Specifically, an $(\alpha,\beta)$-core is a maximal subgraph in which each vertex in the upper layer has degree at least $\alpha$ and each vertex in the lower layer has degree at least $\beta$. State-of-the-art CPU-based solutions incur substantial costs to construct an index structure covering all combinations of $\alpha$ and $\beta$, which leads to scalability challenges on large bipartite graphs. Moreover, on-the-fly queries, which aim to determine whether an edge update belongs to a target $(\alpha,\beta)$-core, are essential for real-time applications such as fraud monitoring and recommendation systems. However, existing index-based methods struggle to support such queries at scale due to their high maintenance overhead. In this paper, we investigate how to leverage GPU architectures to enable efficient $(\alpha,\beta)$-core computation and to support on-the-fly queries. While GPUs are widely used to accelerate graph processing, their limited memory capacity makes it impractical to store large index structures. To address this issue, we propose GCC, an index-free GPU-based peeling algorithm that accelerates $(\alpha,\beta)$-core computation via warp-centric processing. To further improve efficiency, we develop GCC+, which exploits the nested property of $(\alpha,\beta)$-cores through a core-based early pruning strategy. For handling on-the-fly queries, we propose GFQ, a connectivity-aware algorithm that significantly narrows the computation scope by leveraging connected component information, thereby avoiding full-graph peeling. Extensive experiments on 11 datasets demonstrate that our proposed techniques outperform existing CPU-based solutions in terms of both space and time efficiency.
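To make the $(\alpha,\beta)$-core definition and the idea of index-free peeling concrete, the following is a minimal sequential CPU sketch, assuming the bipartite graph is given as an edge list of (upper, lower) pairs. It is not GCC: there is no GPU or warp-centric processing here, and the helper names `build_adjacency` and `alpha_beta_core` are hypothetical. It repeatedly removes any upper vertex with degree below $\alpha$ or lower vertex with degree below $\beta$ and propagates the resulting degree drops until no more vertices can be removed.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Hypothetical helper: build adjacency dicts from (upper, lower) edge pairs."""
    upper_adj, lower_adj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        upper_adj[u].append(v)
        lower_adj[v].append(u)
    return dict(upper_adj), dict(lower_adj)


def alpha_beta_core(upper_adj, lower_adj, alpha, beta):
    """Sequential peeling sketch (hypothetical name, not the paper's GCC).

    Returns (U, L): the upper- and lower-layer vertex sets of the (alpha, beta)-core.
    """
    deg_u = {u: len(nbrs) for u, nbrs in upper_adj.items()}
    deg_l = {v: len(nbrs) for v, nbrs in lower_adj.items()}
    removed_u, removed_l = set(), set()
    queue = deque()

    # Seed the queue with every vertex that already violates its degree bound.
    for u, d in deg_u.items():
        if d < alpha:
            removed_u.add(u)
            queue.append(("U", u))
    for v, d in deg_l.items():
        if d < beta:
            removed_l.add(v)
            queue.append(("L", v))

    # Peel: deleting a vertex lowers the degree of its surviving neighbours,
    # which may in turn push them below their bound.
    while queue:
        side, x = queue.popleft()
        if side == "U":
            for v in upper_adj[x]:
                if v in removed_l:
                    continue
                deg_l[v] -= 1
                if deg_l[v] < beta:
                    removed_l.add(v)
                    queue.append(("L", v))
        else:
            for u in lower_adj[x]:
                if u in removed_u:
                    continue
                deg_u[u] -= 1
                if deg_u[u] < alpha:
                    removed_u.add(u)
                    queue.append(("U", u))

    return set(upper_adj) - removed_u, set(lower_adj) - removed_l


if __name__ == "__main__":
    edges = [("u1", "v1"), ("u1", "v2"), ("u1", "v3"),
             ("u2", "v1"), ("u2", "v2"), ("u3", "v3")]
    upper, lower = build_adjacency(edges)
    U, L = alpha_beta_core(upper, lower, 2, 2)
    print(sorted(U), sorted(L))  # ['u1', 'u2'] ['v1', 'v2']
```

On this toy graph, `u3` is peeled first (degree 1 < 2), which drops `v3` below $\beta = 2$. The surviving (2,2)-core is contained in the (2,1)-core, which illustrates the nested property that the abstract says GCC+ exploits for early pruning. A full recomputation like this one is exactly the whole-graph peeling that the abstract says GFQ aims to avoid for on-the-fly queries.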