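arXivDaily Daily Academic Digest: syncs the full arXiv feed and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.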

2405.00892 2026-05-04 cs.CV cs.AI

Wake Vision: A Tailored Dataset and Benchmark Suite for TinyML Computer Vision Applications

Colby Banbury, Emil Njor, Andrea Mattia Garavagno, Mark Mazumder, Matthew Stewart, Pete Warden, Manjunath Kudlur, Nat Jeffries, Xenofon Fafoutis, Vijay Janapa Reddi

Abstract

Tiny machine learning (TinyML) co-locates models with sensors on microcontrollers, where small models (which are disproportionately sensitive to label noise) and bespoke binary tasks (which lack standard benchmarks) make general-purpose dataset practices a poor fit. Visual Wake Words (VWW), the prior standard TinyML person detection benchmark, contains roughly 123K images and has an estimated label error rate of 7.8%, which limits its usefulness for production-grade systems. Manual labeling, however, is prohibitively expensive for the scale and diversity of TinyML use cases. We address this gap with the Wake Vision pipeline, an automated method for generating and curating large-scale binary classification datasets for TinyML. We use the term data-centric TinyML for the dataset construction, curation, and lifecycle methods that produce the large, well-curated datasets these systems require. The pipeline combines four components: label fusion across image-level and bounding-box sources; confidence-, area-, and depiction-aware filtering; label correction on the evaluation splits; and automatic generation of fine-grained benchmark subsets. Applying it to person detection, we release Wake Vision, a dataset of almost 6M images (close to 100x more person images than VWW) with manually relabeled validation and test sets that have a 2.2% label error rate. Models trained on Wake Vision improve test accuracy by up to 6.6% over VWW-trained models across MobileNetV2, MCUNet, MicroNets, and ColabNAS architectures, and match or exceed VWW-trained models on 13 of 16 fine-grained subsets covering perceived gender, perceived age, distance, lighting, and depictions. The advantage holds under distribution shift on three out-of-distribution datasets covering driving and overhead-surveillance imagery. All artifacts are released under CC-BY 4.0 through TensorFlow Datasets and Hugging Face.
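
The label-fusion and filtering steps described above can be illustrated with a minimal Python sketch. It assumes each image carries optional person bounding boxes (each with an area fraction and a depiction flag) and an optional image-level person confidence. The thresholds `MIN_CONFIDENCE` and `MIN_AREA`, the rule ordering, and the functions `fuse_label` and `build_dataset` are hypothetical and are not taken from the Wake Vision pipeline itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical thresholds for illustration only; not Wake Vision's values.
MIN_CONFIDENCE = 0.7  # minimum confidence for an image-level "person" label
MIN_AREA = 0.05       # minimum person-box area as a fraction of the image


@dataclass
class Box:
    area_fraction: float        # box area divided by image area, in [0, 1]
    is_depiction: bool = False  # e.g. a drawing, statue, or poster of a person


@dataclass
class Annotations:
    person_boxes: List[Box] = field(default_factory=list)
    image_person_confidence: Optional[float] = None  # None: no image-level label


def fuse_label(ann: Annotations) -> Optional[int]:
    """Return 1 (person), 0 (no person), or None (drop as ambiguous)."""
    real_people = [b for b in ann.person_boxes if not b.is_depiction]
    # Box source: one large, non-depicted person makes the image positive.
    if any(b.area_fraction >= MIN_AREA for b in real_people):
        return 1
    # Only tiny or depicted people: too ambiguous for a binary task, so drop.
    if ann.person_boxes:
        return None
    # No boxes: fall back to the image-level source, gated by confidence.
    conf = ann.image_person_confidence
    if conf is not None:
        return 1 if conf >= MIN_CONFIDENCE else None
    # Assumption: no person evidence from either source means "no person".
    return 0


def build_dataset(items: List[Tuple[str, Annotations]]) -> List[Tuple[str, int]]:
    """Keep (image_id, label) pairs whose fused label is not None."""
    kept = []
    for image_id, ann in items:
        label = fuse_label(ann)
        if label is not None:
            kept.append((image_id, label))
    return kept


if __name__ == "__main__":
    demo = [
        ("a", Annotations([Box(0.30)])),        # large real person -> 1
        ("b", Annotations([Box(0.01)])),        # tiny person -> dropped
        ("c", Annotations([Box(0.40, True)])),  # depiction only -> dropped
        ("d", Annotations([], 0.9)),            # confident image label -> 1
        ("e", Annotations([], 0.4)),            # low-confidence label -> dropped
        ("f", Annotations()),                   # no person evidence -> 0
    ]
    print(build_dataset(demo))  # [('a', 1), ('d', 1), ('f', 0)]
```

Returning `None` instead of guessing reflects the idea of discarding ambiguous images rather than adding label noise, which matters because small models are disproportionately sensitive to such noise.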

2404.07475 2026-05-04 cs.CL cs.AI cs.CY cs.LG

Laissez-Faire Harms: Algorithmic Biases in Generative Language Models

Evan Shieh, Faye-Marie Vassel, Cassidy Sugimoto, Thema Monroe-White

Comments 16 pages (43 if including supplementals), 8 figures (23 if including supplementals)

DOI: 10.1038/s41467-025-68004-9
Journal ref: Nat Commun 17, 1243 (2026)

Abstract

The rapid deployment of generative language models (LMs) has raised concerns about social biases affecting the well-being of diverse consumers. The extant literature on generative LMs has primarily examined bias via explicit identity prompting. However, prior research on bias in earlier language-based technology platforms, including search engines, has shown that discrimination can occur even when identity terms are not specified explicitly. Studies of bias in LM responses to open-ended prompts (where identity classifications are left unspecified) are lacking and have not yet been grounded in end-consumer harms. Here, we advance studies of generative LM bias by considering a broader set of natural use cases via open-ended prompting. In this "laissez-faire" setting, we find that synthetically generated texts from five of the most pervasive LMs (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2) perpetuate harms of omission, subordination, and stereotyping for minoritized individuals with intersectional race, gender, and/or sexual orientation identities (AI/AN, Asian, Black, Latine, MENA, NH/PI, Female, Non-binary, Queer). We find widespread evidence of bias, to the extent that such individuals are hundreds to thousands of times more likely to encounter LM-generated outputs that portray their identities in a subordinated manner than representative or empowering portrayals. We also document the prevalence of stereotypes (e.g., the perpetual foreigner) in LM-generated outputs; such stereotypes are known to trigger psychological harms that disproportionately affect minoritized individuals. These harms include stereotype threat, which leads to impaired cognitive performance and increased negative self-perception. Our findings highlight the urgent need to protect consumers from discriminatory harms caused by language models and to invest in critical AI education programs tailored towards empowering diverse consumers.

2403.17101 2026-05-04 cs.AI

AI Consciousness is Inevitable: A Theoretical Computer Science Perspective

Lenore Blum, Manuel Blum

2403.02290 2026-05-04 cs.AI cs.LG math.DS math.OC

Koopman-Assisted Reinforcement Learning

Preston Rozwood, Edward Mehrez, Ludger Paehler, Wen Sun, Steven L. Brunton

Comments 28 pages, 10 figures, 4 tables

2312.12339 2026-05-04 cs.LG cs.RO

Value Explicit Pretraining for Learning Transferable Representations

Kiran Lekkala, Henghui Bao, Sumedh A. Sontakke, Erdem Biyik, Laurent Itti

Comments Published in Robotics and Automation Letters (RA-L), January 2026

2309.06577 2026-05-04 cs.LG quant-ph

Efficient Finite Initialization with Partial Norms for Tensorized Neural Networks and Tensor Networks Algorithms

Alejandro Mata Ali, Iñigo Perez Delgado, Marina Ristol Roura, Aitor Moreno Fdez. de Leceta

Comments 11 pages, 16 figures, several improvements, and new demonstrations

2210.15304 2026-05-04 cs.LG cs.AI

Explaining the Explainers in Graph Neural Networks: a Comparative Study

Antonio Longa, Steve Azzolin, Gabriele Santin, Giulia Cencetti, Pietro Liò, Bruno Lepri, Andrea Passerini

2605.00820 2026-05-04 cs.CE cs.LG cs.NA math.NA

HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs

Jinpai Zhao, Nishant Panda, Yen Ting Lin, Eirik Valseth, Diane Oyen, Clint Dawson

2605.00803 2026-05-04 cs.SE cs.AI cs.CL

Can Coding Agents Reproduce Findings in Computational Materials Science?

Ziyang Huang, Yi Cao, Ali K. Shargh, Jing Luo, Ruidong Mei, Mohd Zaki, Zhan Liu, Wyatt Bunstine, William Jurayj, Somdatta Goswami, Tyrel McQueen, Michael Shields, Jaafar El-Awady, Paulette Clancy, Benjamin Van Durme, Nicholas Andrews, William Walden, Daniel Khashabi

2605.00796 2026-05-04 cs.CR cs.AI cs.CL

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

Alfredo Madrid-García, Miguel Rujas

Abstract

Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded health information. AI-assisted development lowers the barrier to building them, but they still demand rigorous security, privacy, and governance controls. Objective: To report an anonymized, non-destructive security assessment of a publicly accessible patient-facing medical RAG chatbot and identify governance lessons for safe deployment of generative AI in health. Methods: We used a two-stage strategy. First, Claude Opus 4.6 supported exploratory prompt-based testing and structured vulnerability hypotheses. Second, candidate findings were manually verified using Chrome Developer Tools, inspecting browser-visible network traffic, payloads, API schemas, configuration objects, and stored interaction data. Results: The LLM-assisted phase identified a critical vulnerability: sensitive system and RAG configuration appeared exposed through client-server communication rather than restricted server-side. Manual verification confirmed that ordinary browser inspection allowed collection of the system prompt, model and embedding configuration, retrieval parameters, backend endpoints, API schema, document and chunk metadata, knowledge-base content, and the 1,000 most recent patient-chatbot conversations. The deployment also contradicted its privacy assurances: full conversation records, including health-related queries, were retrievable without authentication. Conclusions: Serious privacy and security failures in patient-facing RAG chatbots can be identified with standard browser tools, without specialist skills or authentication; independent review should be a prerequisite for deployment. Commercial LLMs accelerated this assessment, including under a false developer persona; assistance available to auditors is equally available to adversaries.
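
The central failure reported above is that RAG internals reached the browser instead of staying server-side. As a hedged illustration of the opposite pattern, the Python sketch below builds the client response from an allowlist and refuses to return conversation history unless the requester is authenticated and owns it. The field names (`answer`, `sources`, `title`), the dictionary-based conversation store, and the functions `to_client_payload` and `fetch_history` are hypothetical; this is not the audited system's code and not a complete security design.

```python
from typing import Any, Dict, List, Optional

# Hypothetical allowlists: the only fields a browser client may ever receive.
PUBLIC_FIELDS = {"answer", "sources"}
PUBLIC_SOURCE_FIELDS = {"title"}


def to_client_payload(internal: Dict[str, Any]) -> Dict[str, Any]:
    """Copy only allowlisted fields, so the system prompt, model and retrieval
    settings, chunk metadata, and endpoints stay on the server."""
    payload = {k: v for k, v in internal.items() if k in PUBLIC_FIELDS}
    if "sources" in payload:
        # Each retrieved source is reduced to its allowlisted fields too.
        payload["sources"] = [
            {k: v for k, v in src.items() if k in PUBLIC_SOURCE_FIELDS}
            for src in payload["sources"]
        ]
    return payload


def fetch_history(
    store: Dict[str, List[str]],
    session_user: Optional[str],
    requested_user: str,
) -> List[str]:
    """Return stored conversations only to their authenticated owner."""
    if session_user is None:
        raise PermissionError("authentication required")
    if session_user != requested_user:
        raise PermissionError("cannot read another user's conversations")
    return store.get(requested_user, [])


if __name__ == "__main__":
    internal = {
        "answer": "Please contact your clinician about this symptom.",
        "system_prompt": "(server-only)",
        "retrieval": {"top_k": 5, "embedding_model": "(server-only)"},
        "sources": [{"title": "Patient guide", "chunk_id": 17, "score": 0.82}],
    }
    print(to_client_payload(internal))
```

An allowlist fails closed: a new internal field added later is withheld by default, whereas a blocklist would leak it until someone remembered to exclude it.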

2605.00782 2026-05-04 cs.SE cs.AI

GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair

Yinhao Xiao, Rongbo Xiao, Yihan Zhang

2605.00747 2026-05-04 quant-ph cs.LG

Quantum Interval Bound Propagation for Certified Training of Quantum Neural Networks

Emma Andrews, Nahyeon Kim, Prabhat Mishra

2605.00740 2026-05-04 math.OC cs.LG stat.ML

Randomized Subspace Nesterov Accelerated Gradient

Gaku Omiya, Pierre-Louis Poirion, Akiko Takeda

Comments 50 pages

2605.00733 2026-05-04 cs.NI cs.AI cs.LG cs.MM

EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure

Zihao Ding, Beining Wu, Jun Huang

2603.26692 2026-05-04 quant-ph cs.AI math.PR

Degrees, Levels, and Profiles of Contextuality

Ehtibar N. Dzhafarov, Victor H. Cervantes

Comments 32 pp., 15 figures, 10 tables (v.4 is close to the published version)

2510.22628 2026-05-04 cs.CR cs.AI

Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts

Md. Mehedi Hasan, Sk Tanzir Mehedi, Ziaur Rahman, Rafid Mostafiz, Md. Abir Hossain

Comments 11 pages, 5 figures. Preprint version under review in the area of Artificial Intelligence (cs.AI)

Abstract

This paper presents a real-time modular defense system named Sentra-Guard. The system detects and mitigates jailbreak and prompt injection attacks targeting large language models (LLMs). The framework uses a hybrid architecture that combines FAISS-indexed SBERT embeddings, which capture the semantic meaning of prompts, with fine-tuned transformer classifiers specialized for distinguishing between benign and adversarial language inputs. It identifies adversarial prompts in both direct and obfuscated attack vectors. A core innovation is the classifier-retriever fusion module, which dynamically computes context-aware risk scores that estimate how likely a prompt is to be adversarial based on its content and context. The framework ensures multilingual resilience with a language-agnostic preprocessing layer. This component automatically translates non-English prompts into English for semantic evaluation, enabling consistent detection across over 100 languages. The system includes a human-in-the-loop (HITL) feedback loop, in which decisions made by the automated system are reviewed by human experts for continual learning and rapid adaptation under adversarial pressure. Sentra-Guard maintains an evolving dual-labeled knowledge base of benign and malicious prompts, enhancing detection reliability and reducing false positives. Evaluation results show a 99.96% detection rate (AUC = 1.00, F1 = 1.00) and an attack success rate (ASR) of only 0.004%. This outperforms leading baselines such as LlamaGuard-2 (1.3%) and OpenAI Moderation (3.7%). Unlike black-box approaches, Sentra-Guard is transparent, fine-tunable, and compatible with diverse LLM backends. Its modular design supports scalable deployment in both commercial and open-source environments. The system establishes a new state of the art in adversarial LLM defense.
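
One plausible reading of a classifier-retriever fusion score is a weighted combination of a classifier's adversarial probability with nearest-neighbour evidence from a store of labelled benign and malicious prompts. The Python sketch below assumes that form. The rescaling, the default weight of 0.6, the 0.5 threshold, and all function names are illustrative assumptions, not Sentra-Guard's published formula; the embeddings are plain lists standing in for the SBERT vectors that a FAISS index would normally search.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieval_score(query: Vector, malicious: List[Vector], benign: List[Vector]) -> float:
    """Map nearest-neighbour evidence to [0, 1]; higher means closer to a known attack."""
    s_mal = max((cosine(query, m) for m in malicious), default=-1.0)
    s_ben = max((cosine(query, b) for b in benign), default=-1.0)
    # The difference lies in [-2, 2]; rescale it linearly to [0, 1].
    return (s_mal - s_ben + 2.0) / 4.0


def fused_risk(p_classifier: float, r_score: float, weight: float = 0.6) -> float:
    """Convex combination of classifier probability and retrieval score."""
    return weight * p_classifier + (1.0 - weight) * r_score


def is_adversarial(p_classifier: float, query: Vector, malicious: List[Vector],
                   benign: List[Vector], threshold: float = 0.5,
                   weight: float = 0.6) -> bool:
    """Flag the prompt when the fused risk reaches the threshold."""
    r = retrieval_score(query, malicious, benign)
    return fused_risk(p_classifier, r, weight) >= threshold


if __name__ == "__main__":
    malicious = [[1.0, 0.0]]  # toy 2-D stand-ins for SBERT embeddings
    benign = [[0.0, 1.0]]
    print(is_adversarial(0.9, [1.0, 0.0], malicious, benign))  # True
    print(is_adversarial(0.1, [0.0, 1.0], malicious, benign))  # False
```

Exposing `weight` makes the trade-off explicit: a larger value trusts the fine-tuned classifier more, while a smaller one leans on similarity to previously labelled prompts in the evolving knowledge base.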

2510.21141 2026-05-04 cs.NI cs.LG

TURBOTEST: Learning When Less is Enough through Early Termination of Internet Speed Tests

Haarika Manda, Manshi Sagar, Yogesh, Kartikay Singh, Cindy Zhao, Tarun Mangla, Phillipa Gill, Elizabeth Belding, Arpit Gupta

2510.18900 2026-05-04 physics.chem-ph cond-mat.mtrl-sci cs.LG

Foundation Models for Discovery and Exploration in Chemical Space

Alexius Wadell, Anoushka Bhutani, Victor Azumah, Austin R. Ellis-Mohr, Andrew J. Stier, Kareem Hegazy, Alexander Brace, Hancheng Zhao, Celia Kelly, Anuj K. Nayak, Yuhan Chen, Dimitrios Simatos, Hongyi Lin, Murali Emani, Venkatram Vishwanath, Kevin Gering, Melisa Alkan, Tom Gibbs, Jack Wells, Wesley W. Qian, Richard C. Gerkin, Benjamin Amorelli, Alexander B. Wiltschko, Lav R. Varshney, Bharath Ramsundar, Karthik Duraisamy, Michael W. Mahoney, Arvind Ramanathan, Venkatasubramanian Viswanathan

Comments Main manuscript: 30 pages (including references), 7 tables and 5 figures. Supplementary information: 158 pages (including references), 15 tables and 128 figures

Abstract

Accurate prediction of atomistic, thermodynamic, and kinetic properties from molecular structures underpins materials innovation. Existing computational and experimental approaches lack the scalability required to navigate chemical space efficiently. Scientific foundation models trained on large unlabelled datasets offer a path towards navigating chemical space across application domains. Here, we develop MIST, a family of molecular foundation models with up to an order of magnitude more parameters and data than prior works. Trained using a novel tokenizer, Smirk, which comprehensively captures nuclear, electronic, and geometric information, MIST learns from a diverse range of molecules. MIST models have been fine-tuned to predict more than 400 structure-property relationships and have been shown to match or exceed state-of-the-art performance across diverse benchmarks, from physiology to electrochemistry. We demonstrate the ability of these models to solve real-world problems across chemical space, from multiobjective electrolyte solvent screening to stereochemical reasoning for organometallics and mixture property prediction. The clearest demonstration of a foundation model is its ability to solve problems that were neither explicit targets of training nor central to the intentions of its developers. We identify olfactory perception mapping as such a problem, and show that MIST accurately predicted scent profiles and learned a hierarchical representation of olfactory space consistent with hyperbolic geometry. We formulated hyperparameter-aware Bayesian neural scaling laws, which eliminate the need for hyperparameter sweeps at every scale, making training large compute-optimal models feasible on a limited compute budget. The methods and findings presented here represent a significant step towards accelerating materials discovery, design, and optimization using foundation models.
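
A neural scaling law relates loss to model or data size so that a few small-scale runs can be extrapolated to larger ones. The Python sketch below fits the simplest such law, loss ≈ a · n^(−b), by ordinary least squares in log-log space. It is a generic stand-in that deliberately omits the hyperparameter-aware Bayesian formulation described above; the functions `fit_power_law` and `predict` are hypothetical names, and the demo uses synthetic values rather than MIST results.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], losses: List[float]) -> Tuple[float, float]:
    """Fit loss ~= a * size**(-b) by least squares on log(loss) vs. log(size)."""
    if len(sizes) != len(losses) or len(sizes) < 2:
        raise ValueError("need at least two (size, loss) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # slope in log-log space equals -b
    intercept = mean_y - slope * mean_x  # intercept equals log(a)
    return math.exp(intercept), -slope   # (a, b)


def predict(a: float, b: float, size: float) -> float:
    """Extrapolate the fitted law to a new model or dataset size."""
    return a * size ** (-b)


if __name__ == "__main__":
    sizes = [1e6, 1e7, 1e8]  # synthetic sizes, not MIST results
    losses = [predict(5.0, 0.1, s) for s in sizes]
    a, b = fit_power_law(sizes, losses)
    print(round(a, 6), round(b, 6))  # 5.0 0.1
    print(predict(a, b, 1e10))
```

Fitting on small runs and calling `predict` at a larger size is the extrapolation step; a hyperparameter-aware law would additionally model how hyperparameter choices shift the curve, which this sketch does not attempt.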

2509.26388 2026-05-04 eess.AS cs.AI cs.CL

Game-Time: Evaluating Temporal Dynamics in Spoken Language Models

Kai-Wei Chang, En-Pei Hu, Chun-Yi Kuan, Wenze Ren, Wei-Chih Chen, Guan-Ting Lin, Yu Tsao, Shao-Hua Sun, Hung-yi Lee, James Glass

Comments Accepted to ICASSP 2026

2605.00731 2026-05-04 cs.SI cs.AI

Empowering Heterogeneous Graph Foundation Models via Decoupled Relation Alignment

Ziyu Zheng, Yaming Yang, Zhe Wang, Ziyu Guan, Wei Zhao

2605.00723 2026-05-04 stat.ML cs.LG math.PR

Decentralized Proximal Stochastic Gradient Langevin Dynamics

Mohammad Rafiqul Islam, Lingjiong Zhu

Comments 42 pages, 7 figures

2605.00698 2026-05-04 eess.IV cs.LG

FedKPer: Tackling Generalization and Personalization in Medical Federated Learning via Knowledge Personalization

Zoe Fowler, Ghassan AlRegib

Comments Accepted to IEEE International Conference on Image Processing (ICIP)

2605.00662 2026-05-04 cs.NE cs.LG

Spiking Sequence Machines and Transformers

Joy Bose

Comments 14 pages, 2 figures, 2 tables

2605.00639 2026-05-04 cond-mat.mtrl-sci cs.AI

Born-Qualified: An Autonomous Framework for Deploying Advanced Energy and Electronic Materials

Steven R. Spurgeon, Milad Abolhasani, Frederick Baddour, Ryan B. Comes, Vinayak P. Dravid, Hilary Egan, Patrick Emami, Robert W. Epps, Davi M. Fébba, Renae Gannon, E. Ashley Gaulding, Ayana Ghosh, Kenny Gruchalla, Grace Guinan, Taro Hitosugi, Michael Holden, Sergei V. Kalinin, Yangang Liang, John S. Mangum, Matthew J. Olszta, Nathaniel H. Park, Axel Palmstrom, Michelle A. Smeaton, Brooks Tellekamp, Nicholas E. Thornburg, Raymond R. Unocic, Daniela Ushizima, Rama K. Vasudevan, Robert White, Andrew Young, Andriy Zakutayev

Comments 14 pages, 2 figures

2605.00628 2026-05-04 cs.DB cs.CL

EGREFINE: An Execution-Grounded Optimization Framework for Text-to-SQL Schema Refinement

Jiaqian Wang, Yutao Qi, Wenjin Hou, Yu Pang, Rui Yang

Comments 15 pages, 5 figures, 50 references. Code: https://github.com/ai-jiaqian/EGRefine

2605.00582 2026-05-04 cs.HC cs.AI

AI Washing Inflates Expected Performance but Not Interaction Outcomes: An AI Placebo Study Using Fitts' Law

Nick von Felten, Luisa Ella Müller, Johannes Schöning

Comments Accepted to the 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT '26)

2605.00581 2026-05-04 stat.ML cs.LG math.OC

Gradient Regularized Newton Boosting Trees with Global Convergence

Nikita Zozoulenko, Daniel Falkowski, Thomas Cass, Lukas Gonon

2605.00556 2026-05-04 cs.HC cs.AI cs.CY cs.RO

Linking Behaviour and Perception to Evaluate Meaningful Human Control over Partially Automated Driving

Ashwin George, Lucas Elbert Suryana, Lorenzo Flipse, Bart van Arem, David A. Abbink, Simeon Craig Calvert, Luciano Cavalcante Siebert, Arkady Zgonnikov

2605.00536 2026-05-04 cs.DC cs.AR cs.LG cs.PF cs.RO

Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge

M. Grailoo, J. Núñez-Yáñez

Comments 11 pages, 3 figures, 8 tables, 4 algorithms

2605.00528 2026-05-04 cs.DC cs.AI cs.LG cs.OS

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters

Dongxin Guo, Jikun Wu, Siu Ming Yiu

Comments 15 pages, 3 figures, 11 tables. Accepted to HPDC '26 (35th International Symposium on High-Performance Parallel and Distributed Computing), July 13-16, 2026, Cleveland, OH, USA