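arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.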

2604.06297 2026-04-09 cs.CR cs.LG

FedSpy-LLM: Towards Scalable and Generalizable Data Reconstruction Attacks from Gradients on LLMs

Syed Irfan Ali Meerza, Feiyi Wang, Jian Liu

Abstract

Given the growing reliance on private data in training Large Language Models (LLMs), Federated Learning (FL) combined with Parameter-Efficient Fine-Tuning (PEFT) has garnered significant attention for enhancing privacy and efficiency. Despite FL's privacy benefits, prior studies have shown that private data can still be extracted from shared gradients. However, these studies, mainly on full-parameter model training, are limited to reconstructing small batches, short input sequences, and specific model architectures, such as encoder-based or decoder-based models. The reconstruction quality becomes even worse when dealing with gradients from PEFT methods. To fully understand the practical attack surface of federated LLMs, this paper proposes FedSpy-LLM, a scalable and generalizable data reconstruction attack designed to reconstruct training data with larger batch sizes and longer sequences while generalizing across diverse model architectures, even when PEFT methods are deployed for training. At the core of FedSpy-LLM is a novel gradient decomposition strategy that exploits the rank deficiency and subspace structure of gradients, enabling efficient token extraction while preserving key signal components at scale. This approach further mitigates the reconstruction challenges introduced by PEFT's substantial null space, ensuring robustness across encoder-based, decoder-based, and encoder-decoder model architectures. Additionally, by iteratively aligning each token's partial-sequence gradient with the full-sequence gradient, FedSpy-LLM ensures accurate token ordering in reconstructed sequences.
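
The rank deficiency mentioned in the abstract can be illustrated on a single linear layer. The sketch below is not the FedSpy-LLM algorithm. It only shows the general observation that the weight gradient of a linear layer is a sum of per-token outer products, so its rank is at most the number of tokens and its row space is spanned by the token inputs. The layer sizes, the random data, and the helper names `linear_layer_weight_gradient` and `in_row_space` are all assumptions made for illustration.

```python
import numpy as np


def linear_layer_weight_gradient(inputs, output_grads):
    """Weight gradient of y = x @ W.T, summed over all tokens.

    inputs:       (num_tokens, d_in)  activations entering the layer
    output_grads: (num_tokens, d_out) loss gradients w.r.t. the layer outputs
    Returns a (d_out, d_in) matrix equal to sum_i outer(output_grads[i], inputs[i]).
    """
    return output_grads.T @ inputs


def in_row_space(grad, candidate, tol=1e-8):
    """Check whether `candidate` (length d_in) lies in the row space of `grad`."""
    # Least-squares fit of the candidate as a combination of the rows of grad.
    coeffs, *_ = np.linalg.lstsq(grad.T, candidate, rcond=None)
    residual = np.linalg.norm(grad.T @ coeffs - candidate)
    return residual <= tol * max(1.0, np.linalg.norm(candidate))


# Hypothetical sizes: 6 tokens, a 64 -> 48 linear layer.
rng = np.random.default_rng(0)
num_tokens, d_in, d_out = 6, 64, 48
inputs = rng.standard_normal((num_tokens, d_in))
output_grads = rng.standard_normal((num_tokens, d_out))

grad = linear_layer_weight_gradient(inputs, output_grads)
rank = np.linalg.matrix_rank(grad)

# A true token input lies in the gradient's row space; a random vector does not.
true_input_found = in_row_space(grad, inputs[0])
random_vector_found = in_row_space(grad, rng.standard_normal(d_in))
print(rank, true_input_found, random_vector_found)
```

Under these assumptions, a candidate embedding can be tested against the shared gradient without access to the data itself, which is why low-rank gradient structure is relevant to token extraction. How the paper handles PEFT gradients and token ordering is not specified in the abstract and is not modeled here.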


2604.06289 2026-04-09 cs.CR cs.LG

Adversarial Robustness of Time-Series Classification for Crystal Collimator Alignment

Xaver Fink, Borja Fernandez Adiego, Daniele Mirarchi, Eloise Matheson, Alvaro Garcia Gonzales, Gianmarco Ricci, Joost-Pieter Katoen

2604.06285 2026-04-09 cs.CR cs.AI cs.CV

Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization

Igor Maljkovic, Maria Rosaria Briglia, Iacopo Masi, Antonio Emanuele Cinà, Fabio Roli

Comments Paper accepted at ICLR 2026. Webpage available at: https://hype-vlm.github.io

2604.06284 2026-04-09 cs.CR cs.AI

ClawLess: A Security Model of AI Agents

Hongyi Lu, Nian Liu, Shuai Wang, Fengwei Zhang

2604.06282 2026-04-09 stat.ML cs.LG

Tight Convergence Rates for Online Distributed Linear Estimation with Adversarial Measurements

Nibedita Roy, Vishal Halder, Gugan Thoppe, Alexandre Reiffers-Masson, Mihir Dhanakshirur, Naman, Alexandre Azor

Comments Preprint

2604.06280 2026-04-09 physics.med-ph cs.AI

DosimeTron: Automating Personalized Monte Carlo Radiation Dosimetry in PET/CT with Agentic AI

Eleftherios Tzanis, Michail E. Klontzas, Antonios Tzortzakakis

2604.06279 2026-04-09 physics.plasm-ph cs.AI

Plasma GraphRAG: Physics-Grounded Parameter Selection for Gyrokinetic Simulations

Ruichen Zhang, Feda AlMuhisen, Chenguang Wan, Zhisong Qu, Kunpeng Li, Youngwoo Cho, Kyungtak Lim, Virginie Grandgirard, Xavier Garbet

Comments 9 pages, 8 figures

2604.06274 2026-04-09 cs.CR cs.AI

Towards the Development of an LLM-Based Methodology for Automated Security Profiling in Compliance with Ukrainian Cybersecurity Regulations

Daniil Shafranskyi, Iryna Stopochkina, Mykola Ilin

Comments 12 pages, 2 figures

2604.06266 2026-04-09 cs.CR cs.AI

Attribution-Driven Explainable Intrusion Detection with Encoder-Based Large Language Models

Umesh Biswas, Shafqat Hasan, Syed Mohammed Farhan, Nisha Pillai, Charan Gudla

2604.06264 2026-04-09 q-bio.QM cs.AI

ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway

Jueon Park, Wonjune Jang, Chanhwi Kim, Yein Park, Jaewoo Kang

Comments Accepted to ACL 2026 Findings

2604.06263 2026-04-09 cs.GT cs.AI cs.IR cs.LG

Incentive-Aware Multi-Fidelity Optimization for Generative Advertising in Large Language Models

Jiayuan Liu, Barry Wang, Jiarui Gan, Tonghan Wang, Leon Xie, Mingyu Guo, Vincent Conitzer

2604.06262 2026-04-09 q-bio.QM cs.AI

From Exposure to Internalization: Dual-Stream Calibration for In-context Clinical Reasoning

Chuang Zhao, Hongke Zhao, Xiaofang Zhou, Xiaomeng Li

2604.06255 2026-04-09 astro-ph.SR astro-ph.GA astro-ph.IM cs.AI

Learning the Stellar Structure Equations via Self-supervised Physics-Informed Neural Networks

Manuel Ballester, Santiago Lopez-Tapia, Seth Gossage, Patrick Koller, Philipp M. Srivastava, Ugur Demir, Yongseok Jo, Almudena P. Marquez, Christoph Wuersch, Souvik Chakraborty, Vicky Kalogera, Aggelos Katsaggelos

Abstract

Stellar astrophysics relies critically on accurate descriptions of the physical conditions inside stars. Traditional solvers such as MESA (Modules for Experiments in Stellar Astrophysics), which employ adaptive finite-difference methods, can become computationally expensive and challenging to scale for large stellar population synthesis ($>10^9$ stars). In this work, we present a self-supervised physics-informed neural network (PINN) framework that provides a mesh-free and fully differentiable approach to solving the stellar structure equations under hydrostatic and thermal equilibrium. The model takes as input the stellar boundary conditions (at the center and surface) together with the chemical composition, and learns continuous radial profiles for mass $M_r(r)$, pressure $P(r)$, density $\rho(r)$, temperature $T(r)$, and luminosity $L_r(r)$ by enforcing the governing structure equations through physics-based loss terms. To incorporate realistic microphysics, we introduce auxiliary neural networks that approximate the equation of state and opacity tables as smooth, differentiable functions of the local thermodynamic state. These surrogates replace traditional tabulated inputs and enable end-to-end training. Once trained for a given star, the model produces continuous solutions across the entire radial domain without requiring discretization or interpolation. Validation against benchmark MESA models across a range of stellar masses yields a Mean Relative Absolute Error of $3.06\%$ and an average $R^2$ score of $99.98\%$. To our knowledge, this is the first demonstration that the stellar structure equations can be solved in a fully self-supervised and data-free fashion employing PINNs. This work establishes a foundation for scalable, physics-informed emulation of stellar interiors and opens the door to future extensions toward time-dependent stellar evolution.
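
For reference, the stellar structure equations named in the abstract are usually written in the textbook form below. The abstract does not list them, so this is an assumed standard formulation rather than the paper's exact system. The symbols $G$ (gravitational constant), $\varepsilon$ (energy generation rate per unit mass), $\kappa$ (opacity), $a$ (radiation constant), and $c$ (speed of light) are not in the abstract, and the temperature equation shown is the radiative-transport case only.

```latex
\begin{align}
  \frac{dM_r}{dr} &= 4\pi r^2 \rho
    && \text{(mass continuity)} \\
  \frac{dP}{dr}   &= -\frac{G M_r \rho}{r^2}
    && \text{(hydrostatic equilibrium)} \\
  \frac{dL_r}{dr} &= 4\pi r^2 \rho\, \varepsilon
    && \text{(thermal equilibrium)} \\
  \frac{dT}{dr}   &= -\frac{3 \kappa \rho L_r}{16 \pi a c\, r^2 T^3}
    && \text{(radiative energy transport)}
\end{align}
```

A physics-based loss of the kind the abstract describes would typically penalize the residual of each equation at sampled radii, with $\rho$ and $\kappa$ supplied by the equation-of-state and opacity surrogates.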


2604.06254 2026-04-09 cs.CR cs.AI cs.CV

SE-Enhanced ViT and BiLSTM-Based Intrusion Detection for Secure IIoT and IoMT Environments

Afrah Gueriani, Hamza Kheddar, Ahmed Cherif Mazari, Seref Sagiroglu, Onur Ceran

2604.06247 2026-04-09 cs.CR cs.AI

SALLIE: Safeguarding Against Latent Language & Image Exploits

Guy Azov, Ofer Rivlin, Guy Shtar

Comments 18 pages, 4 figures, 7 tables. Preprint under review

2604.06240 2026-04-09 cs.CR cs.AI cs.MA

The Art of Building Verifiers for Computer Use Agents

Corby Rosset, Pratyusha Sharma, Andrew Zhao, Miguel Gonzalez-Fernandez, Ahmed Awadallah

2604.06235 2026-04-09 cs.CR cs.AI cs.CY

Negotiating Privacy with Smart Voice Assistants: Risk-Benefit and Control-Acceptance Tensions

Molly Campbell, Mohamad Sheikho Al Jasem, Ajay Kumar Shrestha

Comments To appear in the IEEE CSP 2026 proceedings

2604.06231 2026-04-09 cs.DB cs.AI cs.CL cs.IR cs.SE

Automating Database-Native Function Code Synthesis with LLMs

Wei Zhou, Xuanhe Zhou, Qikang He, Guoliang Li, Bingsheng He, Quanqing Xu, Fan Wu

Comments Please visit our homepage at: https://code4db.github.io/hi-opencook/. The code is available at: https://github.com/weAIDB/OpenCook

Abstract

Database systems incorporate an ever-growing number of functions in their kernels (a.k.a., database native functions) for scenarios like new application support and business migration. This growth causes an urgent demand for automatic database native function synthesis. While recent advances in LLM-based code generation (e.g., Claude Code) show promise, they are too generic for database-specific development. They often hallucinate or overlook critical context because database function synthesis is inherently complex and error-prone, where synthesizing a single function may involve registering multiple function units, linking internal references, and implementing logic correctly. To this end, we propose DBCooker, an LLM-based system for automatically synthesizing database native functions. It consists of three components. First, the function characterization module aggregates multi-source declarations, identifies function units that require specialized coding, and traces cross-unit dependencies. Second, we design operations to address the main synthesis challenges: (1) a pseudo-code-based coding plan generator that constructs structured implementation skeletons by identifying key elements such as reusable referenced functions; (2) a hybrid fill-in-the-blank model guided by probabilistic priors and component awareness to integrate core logic with reusable routines; and (3) three-level progressive validation, including syntax checking, standards compliance, and LLM-guided semantic verification. Finally, an adaptive orchestration strategy unifies these operations with existing tools and dynamically sequences them via the orchestration history of similar functions. Results show that DBCooker outperforms other methods on SQLite, PostgreSQL, and DuckDB (34.55% higher accuracy on average), and can synthesize new functions absent in the latest SQLite (v3.50).


2604.06230 2026-04-09 cs.DB cond-mat.mtrl-sci cs.AI

Ontology-based knowledge graph infrastructure for interoperable atomistic simulation data

Abril Azocar Guzman, Sarath Menon, Tilmann Hickel, Stefan Sandfeld

2604.06222 2026-04-09 q-bio.NC cs.AI cs.IR cs.NE

The Geometry of Forgetting

Sambartha Ray Barman, Andrey Starenky, Sophia Bodnar, Nikhil Narasimhan, Ashwin Gopinath

2604.06220 2026-04-09 eess.SP cs.AI cs.SD

Development of ML model for triboelectric nanogenerator based sign language detection system

Meshv Patel, Bikash Baro, Sayan Bayan, Mohendra Roy

Comments This paper has been accepted at the IEEE GCON 2026 (https://gcon2026.in/) Conference, organized by IIT Guwahati

2604.06219 2026-04-09 cs.CY cs.AI

From experimentation to engagement: on the paradox of participatory AI and power in contexts of forced displacement and humanitarian crises

Stella Suge, Sarah W. Spencer, Nyalleng Moorosi, Helen McElhinney, Geoff Loane, Sue Black

Comments This paper was submitted to the ACM FAccT conference in 2025 and is published here as a preprint in March 2026. The research was conducted in December 2024. Since submission, AI deployment across the humanitarian sector has accelerated without commensurate development of independent accountability

2604.06217 2026-04-09 cs.CY cs.AI

The End of the Foundation Model Era: Open-Weight Models, Sovereign AI, and Inference as Infrastructure

Jared James Grogan

Comments 44 pages, 75 references, 5 endnotes. Version 1.0, events covered through March 9, 2026

2604.06215 2026-04-09 cs.CY cs.AI

Governing frontier general-purpose AI in the public sector: adaptive risk management and policy capacity under uncertainty through 2030

Fabio Correa Xavier

Comments 7 pages, 1 figure

2604.06212 2026-04-09 cs.SE cs.AI cs.CL

Code Sharing In Prediction Model Research: A Scoping Review

Thomas Sounack, Raffaele Giancotti, Catherine A. Gao, Lasai Barreñada, Hyeonhoon Lee, Hyung-Chul Lee, Leo Anthony Celi, Karel G. M. Moons, Gary S. Collins, Charlotta Lindvall, Tom Pollard

Abstract

Analytical code is essential for reproducing diagnostic and prognostic prediction model research, yet code availability in the published literature remains limited. While the TRIPOD statements set standards for reporting prediction model methods, they do not define explicit standards for repository structure and documentation. This review quantifies current code-sharing practices to inform the development of TRIPOD-Code, a TRIPOD extension reporting guideline focused on code sharing. We conducted a scoping review of PubMed-indexed articles citing TRIPOD or TRIPOD+AI as of Aug 11, 2025, restricted to studies retrievable via the PubMed Central Open Access API. Eligible studies developed, updated, or validated multivariable prediction models. A large language model-assisted pipeline was developed to screen articles and extract code availability statements and repository links. Repositories were assessed with the same LLM against 14 predefined reproducibility-related features. Our code is made publicly available. Among 3,967 eligible articles, 12.2% included code sharing statements. Code sharing increased over time, reaching 15.8% in 2025, and was higher among TRIPOD+AI-citing studies than TRIPOD-citing studies. Sharing prevalence varied widely by journal and country. Repository assessment showed substantial heterogeneity in reproducibility features: most repositories contained a README file (80.5%), but fewer specified dependencies (37.6%; version-constrained 21.6%) or were modular (42.4%). In prediction model research, code sharing remains relatively uncommon, and when shared, often falls short of being reusable. These findings provide an empirical baseline for the TRIPOD-Code extension and underscore the need for clearer expectations beyond code availability, including documentation, dependency specification, licensing, and executable structure.


2604.06206 2026-04-09 cs.CY cs.AI cs.CL

The Human Condition as Reflected in Contemporary Large Language Models

W. Russell Neuman

2604.06203 2026-04-09 cs.CY cs.AI

Front-End Ethics for Sensor-Fused Health Conversational Agents: An Ethical Design Space for Biometrics

Hansoo Lee, Rafael A. Calvo

Comments Accepted at the Proceedings of the CHI 2026 Workshop: Ethics at the Front-End

2604.06200 2026-04-09 cs.CY cs.AI

Thinking in Graphs with CoMAP: A Shared Visual Workspace for Designing Project-Based Learning

Ruijia Li, Bo Jiang

Comments Accepted by CHI 2026

2604.06198 2026-04-09 cs.CY cs.AI

Concentrated siting of AI data centers drives regional power-system stress under rising global compute demand

Danbo Chen, Zijun Zhou, Yongyang Cai, Jiahong Qin, Ani Katchova, Lei Chen

Comments 32 pages, 8 figures

2604.06191 2026-04-09 eess.AS cs.AI cs.CL cs.SD

Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment

Asif Azad, MD Sadik Hossain Shanto, Mohammad Sadat Hossain, Bdour Alwuqaysi, Sabri Boughorbel, Yahya Bokhari, Abdulrhman Aljouie, Ayah Othman Sindi, Ehsan Hoque