arXivDaily: arXiv daily academic digest, updated Monday through Friday
cs.DL Digital Libraries (7)
2606.12071 2026-06-11 cs.DL cs.AI New submission

On the Limits of LLM-as-Judge for Scientific Novelty Assessment

Soumitra Sinhahajari, Navonil Majumder, Soujanya Poria

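AI summary: By building the RQ-Bench benchmark, this paper finds that LLM judges produce a novelty mirage for model-generated research questions, whereas human experts reach the opposite conclusion, exposing reliability problems when LLMs are used to assess scientific novelty.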

Abstract

LLMs are increasingly used to generate and judge scientific ideas. This makes novelty evaluation a central problem. Full idea evaluation is difficult because it often requires judging a method, its feasibility, and its empirical promise. We therefore study a cleaner upstream object: the research question (RQ). RQ generation is a prerequisite for scientific ideation, and RQs can be compared against questions pursued in real papers. We introduce RQ-Bench, a benchmark built from recent arXiv papers. For each paper, we reconstruct author-anchored RQs from its cited background, gaps, and contributions. These RQs are not the only valid questions for the same background. They are author-anchored reference points for testing novelty judgments. We evaluate model-generated RQs with standalone LLM judging, comparative LLM judging, and human expert evaluation. LLM judges consistently rate model-generated RQs as highly novel, producing a novelty mirage; in comparative evaluations, this preference becomes even stronger. Domain experts, however, reach the opposite conclusion and prefer the author-anchored reference questions. We further find that many generated RQs are narrow or source-bound, a dimension that LLM judges often miss unless explicitly tested. Overall, the contradictory novelty evaluations between LLM judges and human experts raise a serious concern about the reliability of using LLMs to assess the scientific novelty of research questions.

2606.11430 2026-06-11 cs.DL cs.AI cs.LO New submission

Towards a Bridge Layer Between Bibliographic and Formalized Mathematical Knowledge

A. Mayeux

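AI summary: Proposes a relational bridge database that aligns publication metadata with formal artifacts and introduces a paper-level formalization score, estimating formalization coverage through cross-document alignment, in order to integrate the bibliographic and formal mathematical ecosystems.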

Abstract

Mathematical knowledge is split between bibliographic databases (e.g., MathSciNet, zbMATH Open) and formal proof libraries (e.g., Lean mathlib), preventing unified access between published results and their formalizations. We propose a relational bridge-database that aligns publication metadata with formal artifacts, providing an interoperability layer between mathematical literature and machine-verifiable proofs. We introduce a paper-level formalization score that measures how much of a publication is covered in formal systems. As a feasibility study, we show how such scores can be estimated via cross-document alignment between informal texts and Lean formalizations, enabling large-scale analysis of formalization coverage. This framework is a first step toward integrating bibliographic and formal mathematical ecosystems into scalable, machine-actionable knowledge graphs linking publications to formal proof objects.
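
As an illustration only, the relational bridge and the paper-level score described in this abstract could be sketched as follows. The table names, column names, declaration names, and the definition of the score (the fraction of a paper's statements that have at least one linked formal declaration) are assumptions made for this sketch; the abstract does not specify the actual schema or scoring formula.

```python
import sqlite3


def build_bridge(conn):
    """Create a hypothetical three-table bridge schema (assumed, not the paper's)."""
    conn.executescript(
        """
        CREATE TABLE publication (pub_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE statement (
            stmt_id INTEGER PRIMARY KEY,
            pub_id  TEXT REFERENCES publication(pub_id),
            label   TEXT
        );
        CREATE TABLE formal_link (
            stmt_id   INTEGER REFERENCES statement(stmt_id),
            lean_decl TEXT
        );
        """
    )


def formalization_score(conn, pub_id):
    """Assumed score: share of a paper's statements with at least one formal link."""
    total, covered = conn.execute(
        """
        SELECT COUNT(DISTINCT s.stmt_id), COUNT(DISTINCT f.stmt_id)
        FROM statement AS s
        LEFT JOIN formal_link AS f ON f.stmt_id = s.stmt_id
        WHERE s.pub_id = ?
        """,
        (pub_id,),
    ).fetchone()
    return covered / total if total else 0.0


# Toy data: every identifier below is a placeholder, not a real record.
conn = sqlite3.connect(":memory:")
build_bridge(conn)
conn.execute("INSERT INTO publication VALUES ('paper-1', 'Hypothetical paper')")
conn.executemany(
    "INSERT INTO statement VALUES (?, 'paper-1', ?)",
    [(1, "Theorem 1"), (2, "Lemma 2"), (3, "Theorem 3"), (4, "Corollary 4")],
)
conn.executemany(
    "INSERT INTO formal_link VALUES (?, ?)",
    [(1, "Placeholder.decl_a"), (1, "Placeholder.decl_b"), (3, "Placeholder.decl_c")],
)
score = formalization_score(conn, "paper-1")
print(f"formalization score: {score:.2f}")
```

In this toy data two of the four statements have a formal counterpart, so the assumed score is 0.5. Counting distinct statements rather than links keeps a statement with several formalizations from being counted twice.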

2606.11241 2026-06-11 cs.DL cond-mat.mtrl-sci New submission

APTLAS: An Indexed APT Literature Repository

Bavley Guerguis, Nabil Bassim

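AI summary: To address the fact that atom probe tomography (APT) literature is scattered and lacks domain-specific metadata, the authors built APTLAS, an indexed repository of about 2,300 records with accompanying metadata that supports filtering by material system, instrument, and other fields.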

Abstract

Atom probe tomography (APT) literature is broad, rapidly growing, and dispersed across a wide range of journals, which can make it difficult to identify prior work on a given material system, instrument, or analytical approach. Conventional search engines (e.g., Google Scholar) excel at general retrieval but do not preserve the domain-specific metadata that often determines the relevance of an APT publication (e.g., analysis mode, laser wavelength, or instrument configuration). Herein, APTLAS is presented, which is an indexed repository of published APT literature. At present, the database contains ~2,300 records, each accompanied by metadata extracted from the source publication. The accompanying web tool, available at this https URL, allows users to browse and filter by material system, instrument, application, publication type, or keyword search.

2606.11021 2026-06-11 cs.DL cs.CY Updated version

Making a Name for Myself: On Academic Naming Policies and their Impact

A Pranav, Vagrant Gautam, Martin Mundt, Jordan Taylor, Arjun Subramonian, Franziska Sofia Hafner, Daniel Chechelnitsky, William Agnew, Anne Lauscher

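AI summary: Using a mixed-methods approach (surveys, interviews, and large-scale citation analysis of eight major computer science venues from 2019-2025), the paper studies how name change policies affect citation accuracy and researchers' mental health. It finds that visible name change policies significantly reduce citation errors and that deadnaming of transgender researchers in citations fell by 92% between 2019 and 2024.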

Comments
Accepted at FAccT 2026. This version corrects some typos.
Abstract

In academic publishing, names connect scholars to their work. When scholars change their names, including for marriage, academic recognition, or gender transition, they may lose credit for past publications. However, despite significant impacts on citation accuracy and researcher well-being, no existing studies examine how naming policies in computer science serve researchers who change their names. We use a mixed-methods approach combining surveys, interviews, and large-scale citation analysis of papers from eight major computer science venues from 2019-2025. We document the multi-year advocacy effort that established the first name change policies and identify implementation barriers, including incomplete publisher updates and months-long processing delays. Researchers continue being cited with misparsed and incorrect names despite publisher updates. When these citation errors happen, interviewees report significant mental health impacts, including stress, anxiety, and safety risks. Empirically, we find that venues with accessible and visible name change policies have significantly fewer citation errors than venues with inaccessible policies (899 vs. 996 errors per 1,000 papers). Our annotation analysis shows that deadnaming of transgender researchers in citations decreased by 92% from 2019 to 2024. Our findings demonstrate the importance of inclusive publishing policies, for which name change policy advocacy led by trans researchers has been a significant driver. We recommend that venues adopt proactive visible name change policies, support queer advocacy groups, and improve publication infrastructure to build an inclusive publishing landscape. The accompanying toolkit for checking errors in bibliographic LaTeX files is available at this https URL.
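
To put the two error rates quoted in this abstract in proportion, the short calculation below derives the absolute and relative difference between them. It uses only the two figures stated in the abstract (899 and 996 errors per 1,000 papers); the function and variable names are made up for this sketch, and the calculation says nothing about statistical significance, which the abstract reports separately.

```python
def relative_reduction(baseline, observed):
    """Fraction by which `observed` is lower than `baseline`."""
    return (baseline - observed) / baseline


# Rates quoted in the abstract, in citation errors per 1,000 papers.
errors_inaccessible_policy = 996
errors_accessible_policy = 899

abs_diff = errors_inaccessible_policy - errors_accessible_policy
rel = relative_reduction(errors_inaccessible_policy, errors_accessible_policy)

print(f"absolute difference: {abs_diff} errors per 1,000 papers")
print(f"relative reduction: {rel:.1%}")
```

The gap is 97 errors per 1,000 papers, which is a relative reduction of roughly 9.7% against the inaccessible-policy baseline.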

2604.15150 2026-06-11 cs.DL Updated version

A Semantic Geometry for Uncovering Paradigm Dynamics via Scientific Publications

Jinchang Liu, Qingshan Zhou, Hongkan Chen, Yi Bu

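AI summary: Proposes a semantic geometry based on the R-P-C (references, focal publication, citing publications) framework that quantifies the semantic similarity and distance between a publication's knowledge base and its diffusion, identifies three publication types (consolidating, exploratory, and balanced), and reveals paradigm dynamics.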

Comments
40 pages, 17 figures
Abstract

Science advances not only by accumulating discovered patterns but by changing how new problems and solutions are expressed. While structural indicators track scholarly attention, they offer only an indirect proxy for the reorganization of meaning. We propose a semantic geometry based on the R-P-C (references, focal publication, and citing publications) framework to quantify how a publication positions itself relative to its knowledge base and diffusion. This geometry identifies three publication types: consolidating, exploratory and balanced. Our results show that the semantic similarity and distance between a publication's knowledge base and diffusion serve as a semantic foundation for disruption, with novelty (atypical reference combinations) acting as an antecedent disturbance that triggers a semantic rupture. This is related to team size, where small teams preserve a higher potential for exploratory departures while large collaborations systematically align with paradigmatic consolidation. Crucially, this geometry explains why citation trajectories differ; consolidating research earns rapid recognition by lowering comprehension costs, while exploratory work faces high paradigm conversion costs that result in slower, more selective diffusion. Collectively, this R-P-C framework provides a robust instrument for monitoring the dynamics of scientific paradigms.
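
The R-P-C idea in this abstract can be illustrated with a minimal sketch that places a focal publication between its references and its citing publications in an embedding space. Everything here is an assumption for illustration: the use of centroids, the use of cosine similarity, and the toy two-dimensional vectors. The abstract does not state how the authors embed texts, how they measure distance, or what thresholds separate consolidating, exploratory, and balanced publications, so no classification rule is attempted.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rpc_similarities(ref_vecs, focal_vec, citing_vecs):
    """Pairwise similarities among references (R), focal paper (P), citers (C)."""
    r = centroid(ref_vecs)
    c = centroid(citing_vecs)
    return {
        "R-P": cosine(r, focal_vec),
        "P-C": cosine(focal_vec, c),
        "R-C": cosine(r, c),
    }


# Toy embeddings (hypothetical): the citing papers sit far from the references.
refs = [[1.0, 0.0], [1.0, 0.0]]
focal = [1.0, 1.0]
citing = [[0.0, 1.0], [0.0, 1.0]]

sims = rpc_similarities(refs, focal, citing)
for pair, value in sims.items():
    print(pair, round(value, 3))
```

In this toy case the focal paper is equally similar to its references and its citers, while the references and citers are orthogonal to each other, which is the kind of gap between knowledge base and diffusion that the abstract relates to disruption.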

2604.08619 2026-06-11 cs.DL cs.CY

Doctoral Theses in France (1985-2025): A Linked Dataset of PhDs, Academic Networks, and Institutions

William Aboucaya, Dastan Jasim

Comments
11 pages + 6 appendix pages, 7 figures, 2 tables. See https://doi.org/10.5281/zenodo.19453191 for the dataset. See https://github.com/WilliamAboucaya/phd-theses-france for the code to reproduce the dataset and figures. Version 2: Fixed references to tables and figures. Modified unclear wording in section 3. Updated values in the languages table after a minor bug fix. Standardized figure style.
Abstract

This paper presents a comprehensive dataset of doctoral theses defended in France between 1985 and 2025, constructed from multiple national academic metadata sources. The dataset is primarily based on data from the French national thesis platform and is enriched using additional authority and bibliographic databases to improve data quality, completeness, and interoperability. The data production pipeline includes the aggregation of heterogeneous sources, the correction of inconsistent identifiers, the enrichment of person and institution records, and the construction of derived variables describing academic careers, jury participation, institutional affiliations, and thesis characteristics. Additional identifiers from major academic repositories and library catalogues are integrated to facilitate linkage with external data sources and future dataset extensions. The resulting dataset provides structured information at the thesis, individual, and institutional levels, enabling both descriptive and relational analyses. This resource is particularly suited for research on doctoral education, academic networks, supervision practices, jury composition, institutional collaboration, and the evolution of research communities over time. The paper documents the data sources, processing pipeline, feature construction, data quality issues, and limitations, with the objective of facilitating reuse of the dataset by other researchers and supporting future extensions and longitudinal analyses of the academic system.

2510.16152 2026-06-11 cs.DL cs.AI cs.CL cs.LG Updated version

Mapping Scientific Literature with Large Language Models and Topic Modeling

Mason Smetana, Lev Khazanovich

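AI summary: Proposes an LLM-based two-stage classification framework that maps engineering-related PNAS literature from a topic modeling perspective, producing semantically interpretable topics and revealing cross-topic connections, with higher topic diversity and lower overlap than established topic modeling methods.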

Comments
35 pages, 10 figures. Accepted for publication in Scientometrics. Final version available via DOI
Abstract

Scientific literature is increasingly fragmented by disciplinary boundaries, specialized terminology, and potentially sparse keyword systems, making it difficult to capture the evolving structure of modern science. This study introduces a large language model (LLM)-driven framework for mapping scientific literature from a topic modeling perspective. The approach is demonstrated on a 20-year corpus of more than 1,500 engineering-related articles published in the Proceedings of the National Academy of Sciences (PNAS). A two-stage classification pipeline first assigns a primary thematic category to each article based on its abstract, followed by full-text analysis to identify secondary classifications that reveal latent cross-topic connections within the corpus. Unlike conventional topic models, the LLM-based framework produces semantically interpretable topics while maintaining strong quantitative performance. Comparative evaluation against established topic modeling methods shows higher topic diversity and lower overlap with competitive coherence metrics. Manual validation on a randomly sampled subset of abstracts yields an accuracy of 75.9%. Additional traditional natural language processing analyses confirm that the generated topics correspond to meaningful linguistic patterns in the corpus. A bipartite network linking primary and secondary classifications further reveals implicit thematic relationships that are not readily observable through abstracts or keyword systems alone. The findings indicate that the framework independently recovers much of the journal's editorial dual-classification structure without prior knowledge of its schema. Overall, the proposed approach offers a powerful tool for mapping science and identifying emerging cross-topic connections in research.
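
The bipartite network mentioned in this abstract, linking each article's primary category to its secondary classifications, can be sketched as a weighted edge list. This is a minimal illustration under stated assumptions: the record layout, the topic labels, and the article identifiers are hypothetical, and the LLM classification steps that would produce these labels are not shown.

```python
from collections import Counter


def bipartite_edges(articles):
    """Count (primary, secondary) topic pairs across articles.

    Each key is an edge between a node on the primary side and a node on the
    secondary side; the value is the number of articles supporting that edge.
    """
    edges = Counter()
    for article in articles:
        for secondary in article["secondary"]:
            edges[(article["primary"], secondary)] += 1
    return edges


# Hypothetical classified articles (labels invented for illustration).
articles = [
    {"id": "a1", "primary": "Materials", "secondary": ["Fluid mechanics"]},
    {"id": "a2", "primary": "Materials", "secondary": ["Fluid mechanics", "Robotics"]},
    {"id": "a3", "primary": "Robotics", "secondary": []},
]

edges = bipartite_edges(articles)
for (primary, secondary), weight in edges.most_common():
    print(f"{primary} -> {secondary}: {weight}")
```

Heavier edges mark topic pairs that co-occur often, which is one simple way to surface cross-topic links that a single primary label per article would hide.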