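arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.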

2511.03782 2026-03-12 cond-mat.supr-con cond-mat.str-el cs.AI

Expert Evaluation of LLM World Models: A High-$T_c$ Superconductivity Case Study

Haoyu Guo, Maria Tikhanovskaya, Paul Raccuglia, Alexey Vlaskin, Chris Co, Daniel J. Liebling, Scott Ellsworth, Matthew Abraham, Elizabeth Dorfman, N. P. Armitage, Chunhan Feng, Antoine Georges, Olivier Gingras, Dominik Kiese, Steven A. Kivelson, Vadim Oganesyan, B. J. Ramshaw, Subir Sachdev, T. Senthil, J. M. Tranquada, Michael P. Brenner, Subhashini Venugopalan, Eun-Ah Kim

Comments (v1) 9 pages, 4 figures, with 7-page supporting information. Accepted at the ICML 2025 workshop on Assessing World Models and the Explorations in AI Today workshop at ICML'25

DOI: 10.1073/pnas.2533676123
Journal ref: Proceedings of the National Academy of Sciences 123, e2533676123 (2026)

Abstract

Large Language Models (LLMs) show great promise as a powerful tool for scientific literature exploration. However, their effectiveness in providing scientifically accurate and comprehensive answers to complex questions within specialized domains remains an active area of research. Using the field of high-temperature cuprates as an exemplar, we evaluate the ability of LLM systems to understand the literature at the level of an expert. We construct an expert-curated database of 1,726 scientific papers that covers the history of the field, and a set of 67 expert-formulated questions that probe deep understanding of the literature. We then evaluate six different LLM-based systems for answering these questions, including both commercially available closed models and a custom retrieval-augmented generation (RAG) system capable of retrieving images alongside text. Experts then evaluate the answers of these systems against a rubric that assesses balanced perspectives, factual comprehensiveness, succinctness, and evidentiary support. Among the six systems, the two using RAG on curated literature outperformed existing closed models across key metrics, particularly in providing comprehensive and well-supported answers. We discuss promising aspects of LLM performance as well as critical shortcomings of all the models. The set of expert-formulated questions and the rubric will be valuable for assessing expert-level performance of LLM-based reasoning systems.

2509.18404 2026-03-12 math.OC cs.LG

Zero-Shot Transferable Solution Method for Parametric Optimal Control Problems

Xingjian Li, Kelvin Kan, Deepanshu Verma, Krishna Kumar, Stanley Osher, Ján Drgoňa

Comments 11 pages, 6 figures, 3 tables

2506.11687 2026-03-12 cs.CR cs.AI cs.LG cs.NE

Differential Privacy in Machine Learning: A Survey from Symbolic AI to LLMs

Francisco Aguilera-Martínez, Fernando Berzal

2504.09831 2026-03-12 stat.ML cs.AI cs.LG math.ST stat.AP stat.TH

Offline Dynamic Inventory and Pricing Strategy: Addressing Censored and Dependent Demand

Korel Gundem, Zhengling Qi

2603.10807 2026-03-12 q-fin.CP cs.AI cs.CY

Risk-Adjusted Harm Scoring for Automated Red Teaming for LLMs in Financial Services

Fabrizio Dimino, Bhaskarjit Sarmah, Stefano Pasquali

2603.10802 2026-03-12 cs.NI cs.AI cs.LG cs.SY eess.SY

Towards Intelligent Spectrum Management: Spectrum Demand Estimation Using Graph Neural Networks

Mohamad Alkadamani, Amir Ghasemi, Halim Yanikomeroglu

Comments 13 pages, 10 figures. Submitted to IEEE Transactions on Machine Learning in Communications and Networking

2603.10753 2026-03-12 cs.CR cs.LG

A PUF-Based Approach for Copy Protection of Intellectual Property in Neural Network Models

Daniel Dorfmeister, Flavio Ferrarotti, Bernhard Fischer, Martin Schwandtner, Hannes Sochor

2603.10750 2026-03-12 cs.IT cs.AI cs.LG math.IT

Deep Randomized Distributed Function Computation (DeepRDFC): Neural Distributed Channel Simulation

Didrik Bergström, Onur Günlü

2603.10721 2026-03-12 cs.DS cs.LG

Sample-and-Search: An Effective Algorithm for Learning-Augmented k-Median Clustering in High dimensions

Kangke Cheng, Shihong Song, Guanlin Mo, Hu Ding

2603.10720 2026-03-12 cs.DS cs.CG cs.RO

Sublinear-Time Reconfiguration of Programmable Matter with Joint Movements

Manish Kumar, Othon Michail, Andreas Padalkin, Christian Scheideler

2603.10700 2026-03-12 cs.IR cs.AI

Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval

Andrea Volpini, Elie Raad, Beatrice Gamba, David Riccitelli

Comments 33 pages, 7 figures, reproducibility appendix, dataset/evaluation framework/enhanced entity page templates released with the paper

2603.10697 2026-03-12 cs.DB cs.AI cs.CL cs.LG

EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

Tianshu Zhang, Kun Qian, Siddhartha Sahai, Yuan Tian, Shaddy Garg, Huan Sun, Yunyao Li

Comments Accepted by VLDB 2025

Abstract

Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evolve to meet new requirements. Such schema evolution often leads to performance degradation for models trained on static schemas. Existing work either focuses mainly on paraphrasing some syntactic or semantic mappings among NLQ, DB and SQL, or lacks a comprehensive and controllable way to investigate model robustness under schema evolution. Both are insufficient for the increasingly complex and rich database schema changes seen in practice, especially in the LLM era. To address the challenges posed by schema evolution, we present EvoSchema, a comprehensive benchmark designed to assess and enhance the robustness of text-to-SQL systems under real-world schema changes. EvoSchema introduces a novel schema evolution taxonomy, encompassing ten perturbation types across column-level and table-level modifications, systematically simulating the dynamic nature of database schemas. Through EvoSchema, we conduct an in-depth evaluation spanning different open-source and closed-source LLMs, revealing that table-level perturbations have a significantly greater impact on model performance than column-level changes. Furthermore, EvoSchema inspires the development of more resilient text-to-SQL systems, in terms of both model training and database design. Training on EvoSchema's diverse schema designs forces a model to distinguish schema differences for the same question, so that it avoids learning spurious patterns; on average, models trained this way are markedly more robust than those trained on unperturbed data. This benchmark offers valuable insights into model behavior and a path forward for designing systems capable of thriving in dynamic, real-world environments.

2603.10692 2026-03-12 cs.CR cs.AI

Repurposing Backdoors for Good: Ephemeral Intrinsic Proofs for Verifiable Aggregation in Cross-silo Federated Learning

Xian Qin, Xue Yang, Xiaohu Tang

2603.10680 2026-03-12 cs.HC cs.AI

A Platform-Agnostic Multimodal Digital Human Modelling Framework: Neurophysiological Sensing in Game-Based Interaction

Daniel J. Buxton, Mufti Mahmud, Jordan J. Bird, Thomas Hughes-Roberts, David J. Brown

Abstract

Digital Human Modelling (DHM) is increasingly shaped by advances in AI, wearable biosensing, and interactive digital environments, particularly in research addressing accessibility and inclusion. However, many AI-enabled DHM approaches remain tightly coupled to specific platforms, tasks, or interpretative pipelines, limiting reproducibility, scalability, and ethical reuse. This paper presents a platform-agnostic DHM framework designed to support AI-ready multimodal interaction research by explicitly separating sensing, interaction modelling, and inference readiness. The framework integrates the OpenBCI Galea headset as a unified multimodal sensing layer, providing concurrent EEG, EMG, EOG, PPG, and inertial data streams, alongside a reproducible, game-based interaction environment implemented using SuperTux. Rather than embedding AI models or behavioural inference, physiological signals are represented as structured, temporally aligned observables, enabling downstream AI methods to be applied under appropriate ethical approval. Interaction is modelled using computational task primitives and timestamped event markers, supporting consistent alignment across heterogeneous sensors and platforms. Technical verification via author self-instrumentation confirms data integrity, stream continuity, and synchronisation; no human-subjects evaluation or AI inference is reported. Scalability considerations are discussed with respect to data throughput, latency, and extension to additional sensors or interaction modalities. Illustrative use cases demonstrate how the framework can support AI-enabled DHM and HCI studies, including accessibility-oriented interaction design and adaptive systems research, without requiring architectural modifications. The proposed framework provides an emerging-technology-focused infrastructure for future ethics-approved, inclusive DHM research.

2603.10671 2026-03-12 cs.AR cs.CV eess.IV

An FPGA Implementation of Displacement Vector Search for Intra Pattern Copy in JPEG XS

Qiyue Chen, Yao Li, Jie Tao, Song Chen, Li Li, Dong Liu

2603.10641 2026-03-12 cs.CR cs.AI cs.LG

Detecting and Eliminating Neural Network Backdoors Through Active Paths with Application to Intrusion Detection

Eirik Høyheim, Magnus Wiik Eckhoff, Gudmund Grov, Robert Flood, David Aspinall

2603.10623 2026-03-12 eess.AS cs.LG cs.SD

Geo-ATBench: A Benchmark for Geospatial Audio Tagging with Geospatial Semantic Context

Yuanbo Hou, Yanru Wu, Qiaoqiao Ren, Shengchen Li, Stephen Roberts, Dick Botteldooren

Abstract

Environmental sound understanding in computational auditory scene analysis (CASA) is often formulated as an audio-only recognition problem. This formulation leaves a persistent drawback in multi-label audio tagging (AT): acoustic similarity can make certain events difficult to separate from waveforms alone. In such cases, disambiguating cues often lie outside the waveform. Geospatial semantic context (GSC), derived from geographic information system data, e.g., points of interest (POI), provides location-tied environmental priors that can help reduce this ambiguity. A systematic study of this direction is enabled through the proposed geospatial audio tagging (Geo-AT) task, which conditions multi-label sound event tagging on GSC alongside audio. To benchmark Geo-AT, Geo-ATBench is introduced as a polyphonic audio benchmark with geographical annotations, containing 10.71 hours of audio across 28 event categories; each clip is paired with a GSC representation from 11 semantic context categories. GeoFusion-AT is proposed as a unified geo-audio fusion framework that evaluates feature-, representation-, and decision-level fusion on representative audio backbones, with audio- and GSC-only baselines. Results show that incorporating GSC improves AT performance, especially on acoustically confounded labels, indicating geospatial semantics provide effective priors beyond audio alone. A crowdsourced listening study with 10 participants on 579 samples shows that there is no significant difference in performance between models on Geo-ATBench labels and aggregated human labels, supporting Geo-ATBench as a human-aligned benchmark. The Geo-AT task, benchmark Geo-ATBench, and reproducible geo-audio fusion framework GeoFusion-AT provide a foundation for studying AT with geospatial semantic context within the CASA community. The dataset, code, and models are available on the project homepage (https://github.com/WuYanru2002/Geo-ATBench).

2603.10599 2026-03-12 cs.MS cs.LG

Self-Scaled Broyden Family of Quasi-Newton Methods in JAX

Ivan Bioli, Mikel Mendibe Abarrategi

2603.10504 2026-03-12 cs.CR cs.AI cs.CV

Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection

Sunpill Kim, Chanwoo Hwang, Minsu Kim, Jae Hong Seo

2603.10503 2026-03-12 math.NA cs.LG cs.NA

A New Tensor Network: Tubal Tensor Train and Its Applications

Salman Ahmadi-Asl, Valentin Leplat, Anh-Huy Phan, Andrzej Cichocki

2603.10489 2026-03-12 q-bio.NC cs.AI cs.LG

JEDI: Jointly Embedded Inference of Neural Dynamics

Anirudh Jamkhandi, Ali Korojy, Olivier Codol, Guillaume Lajoie, Matthew G. Perich

2603.10471 2026-03-12 cs.IR cs.AI

Modeling Stage-wise Evolution of User Interests for News Recommendation

Zhiyong Cheng, Yike Jin, Zhijie Zhang, Huilin Chen, Zhangling Duan, Meng Wang

Comments Accepted at the ACM Web Conference 2026

2603.10452 2026-03-12 stat.ML cs.LG

Brenier Isotonic Regression

Han Bao, Amirreza Eshraghi, Yutong Wang

Comments AISTATS 2026

2603.10435 2026-03-12 stat.ML cs.LG

Adaptive Active Learning for Regression via Reinforcement Learning

Simon D. Nguyen, Troy Russo, Kentaro Hoffman, Tyler H. McCormick

Comments 33 pages, 103 figures. Main paper (8 pages, 4 figures) plus appendix with proofs and supplemental experimental results. Submitted to UAI 2026. Codebase available at https://github.com/thatswhatsimonsaid/WeightedGreedySampling

2603.10420 2026-03-12 eess.AS cs.SD

FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System

Kaituo Xu, Yan Jia, Kai Huang, Junjie Chen, Wenpeng Li, Kun Liu, Feng-Long Xie, Xu Tang, Yao Hu

2603.10413 2026-03-12 cs.CR cs.AI

Enhancing Network Intrusion Detection Systems: A Multi-Layer Ensemble Approach to Mitigate Adversarial Attacks

Nasim Soltani, Shayan Nejadshamsi, Zakaria Abou El Houda, Raphael Khoury, Kelton A. P. Costa, Tiago H. Falk, Anderson R. Avila

2603.10374 2026-03-12 cs.HC cs.AI

Reactive Writers: How Co-Writing with AI Changes How We Engage with Ideas

Advait Bhat, Marianne Aubin Le Quéré, Mor Naaman, Maurice Jakesch

Comments 21 pages, 8 figures, CHI 2026: ACM CHI Conference on Human Factors in Computing Systems

2603.10371 2026-03-12 eess.AS cs.CL

Speech Codec Probing from Semantic and Phonetic Perspectives

Xuan Shi, Chang Zeng, Tiantian Feng, Shih-Heng Wang, Jianbo Ma, Shrikanth Narayanan

2603.10369 2026-03-12 cs.IR cs.AI

Beyond Interleaving: Causal Attention Reformulations for Generative Recommender Systems

Hailing Cheng

Comments 8 pages, 8 figures, submitted to KDD 2026

2603.10357 2026-03-12 cs.NI cs.AI

Utility Function is All You Need: LLM-based Congestion Control

Neta Rozen-Schiff, Liron Schiff, Stefan Schmid