arXivDaily: arXiv daily academic digest, updated Monday to Friday
2604.03252 2026-04-07 cs.CY cs.CL

Evaluating Digital Inclusiveness of Digital Agri-Food Tools Using Large Language Models: A Comparative Analysis Between Human and AI-Based Evaluations

Githma Pewinya, Carolina Martins, Garcia Mariangel

Comments 24 pages, 6 figures, 5 tables

Abstract

Ensuring digital inclusiveness is a critical priority in agri-food systems, particularly in the Global South, where digital divides persist. The Multidimensional Digital Inclusiveness Index (MDII) offers a comprehensive, human-led framework for assessing how inclusive digital agricultural tools (agritools) are. However, the current evaluation process is resource-intensive, often taking months to complete. This study explores whether large language models (LLMs) can support a rapid, AI-enabled assessment of digital inclusiveness that complements the MDII's existing workflow. Using a comparative analysis, the research benchmarks four LLMs (Grok, Gemini, GPT-4o, and GPT-5) against prior expert-led evaluations. The study investigates model alignment with human scores, sensitivity to temperature settings, and potential sources of bias. Findings suggest that LLMs can generate evaluative outputs that approximate expert judgment in some dimensions, though reliability varies across models and contexts. This exploratory work provides early evidence for integrating generative AI (GenAI) into inclusive digital development monitoring, with implications for scaling evaluations in time-sensitive or resource-constrained environments.

2604.03249 2026-04-07 cs.CY cs.AI cs.CV cs.HC

BLK-Assist: A Methodological Framework for Artist-Led Co-Creation with Generative AI Models

Daniel Grimes, Rachel M. Harrison

Abstract

This paper presents BLK-Assist, a modular framework for artist-specific fine-tuning of diffusion models using parameter-efficient methods. The system is implemented as a case study on a single professional artist's proprietary corpus and consists of three components: BLK-Conceptor (LoRA-adapted conceptual sketch generation), BLK-Stencil (LayerDiffuse-based, transparency-preserving asset generation), and BLK-Upscale (hybrid Real-ESRGAN and texture-conditioned diffusion for high-resolution outputs). We document dataset composition, preprocessing, training configurations, and inference workflows to enable reproducibility with publicly available models. The framework illustrates a privacy-preserving, consent-based approach to human-AI co-creation that maintains stylistic fidelity to the source corpus and can be adapted for other artists under similar constraints.

2604.03247 2026-04-07 cs.CY cs.AI cs.CL cs.SI

Classifying Problem and Solution Framing in Congressional Social Media

Misha Melnyk, Mitchell Dolny, Joshua D. Elkind, A. Michael Tjhin, Saisha Chebium, Blake VanBerlo, Annelise Russell, Michelle M. Buehlmann, Jesse Hoey

Abstract

According to the "Garbage Can" model, policy setting in the USA distinguishes between "problem"-focused and "solution"-focused processes. In this paper, we study a large dataset of US Senators' posts on Twitter (1.68M tweets in total). Our objective is to develop an automated method that labels Senatorial posts as belonging to either the problem stream or the solution stream. Two academic policy experts labeled a subset of 3,967 tweets as problem, solution, or other (anything that is neither problem nor solution). We held out 500 of these tweets as a test set and used the remaining 3,467 for training. During development, the training set was further split 60/20/20 into fitting, validation, and development-test sets. We investigated supervised learning methods for building problem/solution classifiers directly on the training set and evaluated them by F1 score on the validation set, which let us iterate rapidly over models and hyperparameters. A BERTweet Base model achieved an average weighted F1 score above 0.8 in cross-validation across the three categories.
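The nested data partitioning in this abstract is easy to misread, so here is a minimal sketch of the arithmetic. It assumes a simple seeded random shuffle; the paper does not say whether the split was stratified by label, and `split_tweets` is a hypothetical helper.

```python
import random

def split_tweets(ids, test_size=500, fractions=(0.6, 0.2, 0.2), seed=0):
    """Hold out a fixed test set, then split the remainder into fit/validation/dev-test."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    # Held-out test set first; everything else is the training pool.
    test, train = shuffled[:test_size], shuffled[test_size:]
    n = len(train)
    n_fit = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    fit = train[:n_fit]
    val = train[n_fit:n_fit + n_val]
    # The development-test set absorbs any rounding remainder.
    dev_test = train[n_fit + n_val:]
    return fit, val, dev_test, test

fit, val, dev_test, test = split_tweets(range(3967))
print(len(fit), len(val), len(dev_test), len(test))
```

With 3,967 labeled tweets this yields 2,080 fitting, 693 validation, 694 development-test, and 500 held-out test tweets; rounding decides where the odd tweet lands.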

2604.03246 2026-04-07 cs.CY cs.AI

Personalized AI Practice Replicates Learning Rate Regularity at Scale

Jocelyn Beauchesne, Christine Maroti, Jeshua Bratman, Jerome Pesenti, Laurence Holt, Alex Tambellini, Allison McGrath, Matthew Guo, Sarah Peterson

Abstract

Recent research has shown that students exhibit consistent learning rates across diverse educational contexts. We test these findings on a dataset of 1.8 million student interactions (366k after filtering) from the digital platform Campus AI, providing further evidence of regularity in learning rates among students. Unlike prior work that required manual cognitive modeling, Campus AI automatically generates Knowledge Components (KCs) and corresponding exercises, both of which are validated by human experts. This one-to-many mapping facilitates the application of Additive Factors Models to measure learning parameters without complex cognitive modeling. Using mixed-effects logistic regression, we confirmed the core finding of prior work: students displayed substantial variation in initial knowledge ($\text{IQR} = [2.78, 12.18]$ practice opportunities to reach 80% mastery) but remarkably consistent learning rates ($\text{IQR} = [7.01, 8.25]$ opportunities). Furthermore, students using this fully automated system achieved 80% mastery in a median of 7.22 practice opportunities, comparable to the 6.54 reported for expert-designed curricula. These results suggest that automated, science-grounded content generation can support effective personalized learning at scale. Data and code are publicly available at https://github.com/Campus-edu-AI/learning-rate.
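The "practice opportunities to reach 80% mastery" figures come from an Additive Factors Model (AFM). A minimal sketch of the standard single-KC AFM, where the log-odds of a correct answer is student proficiency plus KC easiness plus learning rate times the number of prior opportunities, and of solving that line for the 80% point. It uses fixed illustrative parameters rather than the paper's fitted mixed-effects estimates, and it does not claim to reproduce how the paper converts its parameters into opportunity counts.

```python
import math

def afm_probability(theta, beta, gamma, opportunities):
    """P(correct) under a single-KC Additive Factors Model.

    theta: student proficiency, beta: KC easiness,
    gamma: KC learning rate per practice opportunity.
    """
    logit = theta + beta + gamma * opportunities
    return 1.0 / (1.0 + math.exp(-logit))

def opportunities_to_mastery(theta, beta, gamma, target=0.8):
    """Practice opportunities T at which the predicted P(correct) reaches `target`."""
    if gamma <= 0:
        raise ValueError("learning rate must be positive to reach mastery")
    target_logit = math.log(target / (1.0 - target))  # logit(0.8) = ln 4
    # Solve theta + beta + gamma * T = target_logit; 0 if already at mastery.
    return max(0.0, (target_logit - theta - beta) / gamma)

# Illustrative values only: an average student on an average KC.
print(opportunities_to_mastery(theta=0.0, beta=0.0, gamma=0.2))
```

Because T is linear in the intercept terms and inversely proportional to the learning rate, wide variation in initial knowledge with a narrow spread of learning rates is exactly the pattern the abstract reports.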

2604.03245 2026-04-07 cs.AR cs.AI cs.SE

FVRuleLearner: Operator-Level Reasoning Tree (OP-Tree)-Based Rules Learning for Formal Verification

Lily Jiaxin Wan, Chia-Tung Ho, Yunsheng Bai, Cunxi Yu, Deming Chen, Haoxing Ren

Comments Accepted to IEEE VTS'26

Abstract

The remarkable reasoning and code generation capabilities of large language models (LLMs) have recently motivated increasing interest in automating formal verification (FV), a process that ensures hardware correctness through mathematically precise assertions but remains highly labor-intensive, particularly the translation of natural language into SystemVerilog Assertions (NL-to-SVA). However, LLMs still struggle with SVA generation because of limited training data and the intrinsic complexity of FV operators. A more efficient and robust methodology for selecting the correct SVA operators is therefore essential for producing functionally correct assertions. To address these challenges, we introduce FVRuleLearner, an Operator-Level Rule (Op-Rule) learning framework built on a novel Operator Reasoning Tree (OP-Tree), which models SVA generation as structured, interpretable reasoning. FVRuleLearner operates in two complementary phases: (1) Training: it constructs an OP-Tree that decomposes NL-to-SVA alignment into fine-grained, operator-aware questions and combines the reasoning paths that lead to correct assertions; and (2) Testing: it performs operator-aligned retrieval to fetch relevant reasoning traces from the learned OP-Tree and generates new rules for unseen specifications. In comprehensive experiments, FVRuleLearner outperforms the state-of-the-art baseline by 3.95% in syntax correctness and by 31.17% in functional correctness on average. Moreover, a functional taxonomy analysis shows that FVRuleLearner reduces SVA functional failures by an average of 70.33% across diverse operator categories, demonstrating the effectiveness of applying the learned OP-Tree to Op-Rule generation for unseen NL-to-SVA tasks. These results establish FVRuleLearner as a new paradigm for domain-specific reasoning and rule learning in formal verification.

2604.03237 2026-04-07 cs.HC cs.AI cs.CL

The Persuasion Paradox: When LLM Explanations Fail to Improve Human-AI Team Performance

Ruth Cohen, Lu Feng, Ayala Bloch, Sarit Kraus

Abstract

While natural-language explanations from large language models (LLMs) are widely adopted to improve transparency and trust, their impact on objective human-AI team performance remains poorly understood. We identify a Persuasion Paradox: fluent explanations systematically increase user confidence and reliance on AI without reliably improving, and in some cases undermining, task accuracy. Across three controlled human-subject studies spanning abstract visual reasoning (RAVEN matrices) and deductive logical reasoning (LSAT problems), we disentangle the effects of AI predictions and explanations using a multi-stage reveal design and between-subjects comparisons. In visual reasoning, LLM explanations increase confidence but do not improve accuracy beyond the AI prediction alone, and substantially suppress users' ability to recover from model errors. Interfaces exposing model uncertainty via predicted probabilities, as well as a selective automation policy that defers uncertain cases to humans, achieve significantly higher accuracy and error recovery than explanation-based interfaces. In contrast, for language-based logical reasoning tasks, LLM explanations yield the highest accuracy and recovery rates, outperforming both expert-written explanations and probability-based support. This divergence reveals that the effectiveness of narrative explanations is strongly task-dependent and mediated by cognitive modality. Our findings demonstrate that commonly used subjective metrics such as trust, confidence, and perceived clarity are poor predictors of human-AI team performance. Rather than treating explanations as a universal solution, we argue for a shift toward interaction designs that prioritize calibrated reliance and effective error recovery over persuasive fluency.
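The selective automation policy in this abstract defers uncertain cases to humans. A minimal sketch of one common form of such a policy, a confidence threshold on the model's top predicted probability. The threshold value of 0.8 and the names `selective_automation` and `Decision` are hypothetical; the paper's actual deferral criterion is not given here.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    label: str   # the AI's answer, or "" when the case is deferred
    source: str  # "ai" or "human"

def selective_automation(probs, threshold=0.8):
    """Accept the AI's top prediction only when its probability clears a threshold.

    probs: mapping from candidate answer to the model's predicted probability.
    Below the threshold, the case is deferred to a human.
    """
    label, p = max(probs.items(), key=lambda kv: kv[1])
    if p >= threshold:
        return Decision(label, "ai")
    return Decision("", "human")

print(selective_automation({"A": 0.9, "B": 0.1}))
print(selective_automation({"A": 0.5, "B": 0.5}))
```

Unlike a fluent explanation, this policy exposes uncertainty directly: the human only sees cases where the model is unsure, which is where error recovery matters most.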

2604.03236 2026-04-07 cs.HC cs.CL

BLADE: Better Language Answers through Dialogue and Explanations

Chathuri Jayaweera, Bonnie J. Dorr

Comments Contains 9 figures

Abstract

Large language model (LLM)-based educational assistants often provide direct answers that short-circuit learning by reducing exploration, self-explanation, and engagement with course materials. We present BLADE (Better Language Answers through Dialogue and Explanations), a grounded conversational assistant that guides learners to relevant instructional resources rather than supplying immediate solutions. BLADE uses a retrieval-augmented generation (RAG) framework over curated course content, dynamically surfacing pedagogically relevant excerpts in response to student queries. Instead of delivering final answers, BLADE prompts direct engagement with source materials to support conceptual understanding. We conduct an impact study in an undergraduate computer science course under different course-resource configurations and show that BLADE improves students' navigation of course resources and their conceptual performance compared with simply providing the full inventory of course resources. These results demonstrate the potential of grounded conversational AI to reinforce active learning and evidence-based reasoning.

2604.03235 2026-04-07 cs.HC cs.AI cs.CV

Toward a Universal Color Naming System: A Clustering-Based Approach using Multisource Data

Aruzhan Sabitkyzy, Maksat Shagyrov, Pakizar Shamoi

Comments Submitted to Wiley for consideration

Abstract

Is it coral, salmon, or peach? What seems like a simple color can have many names, and without a standard, these variations create confusion across design, technology, and communication. Color naming is a fundamental task across industries such as fashion, cosmetics, web design, and visualization tools. However, the lack of universally accepted color naming standards leads to inconsistent color naming across platforms, applications, and industries. Moreover, existing naming systems include hundreds or thousands of overlapping, perceptually indistinct shades, even though humans typically distinguish only a limited number of color categories in practice. In this study, we propose a clustering-based multisource data framework to build a standardized color-naming system. We collected a dataset of over 19,555 RGB values paired with color names from 20 diverse sources. After data cleaning and normalization, we converted the colors to the perceptually uniform CIELAB color space and applied K-means clustering using the CIEDE2000 color difference metric, identifying 280 as the optimal number of clusters. For each cluster, we performed a frequency analysis of the associated names to assign representative labels. The resulting system reflects naturally occurring linguistic patterns. We demonstrate its effectiveness in automatic annotation and content-based image retrieval on a clothing dataset. This approach opens new opportunities for standardized, perceptually grounded color labeling in practical applications such as generative AI, visual search, and design systems.
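The pipeline clusters colors in CIELAB under the CIEDE2000 difference. A minimal sketch of the standard CIEDE2000 formula (parametric weights kL = kC = kH = 1) and of the K-means assignment step under that metric. It assumes colors are already in CIELAB; the sRGB-to-CIELAB conversion and the paper's centroid-update rule are not shown, and `assign_to_nearest` is a hypothetical helper.

```python
import math

def ciede2000(lab1, lab2):
    """CIEDE2000 color difference between two (L*, a*, b*) triples, kL = kC = kH = 1."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    # Rescale a* to compensate for non-uniformity near the neutral axis.
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2
    g = 0.5 * (1 - math.sqrt(c_bar**7 / (c_bar**7 + 25**7)))
    a1p, a2p = (1 + g) * a1, (1 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360
    h2p = math.degrees(math.atan2(b2, a2p)) % 360
    # Differences in lightness, chroma, and hue (hue difference wrapped to [-180, 180]).
    d_lp, d_cp = L2 - L1, c2p - c1p
    if c1p * c2p == 0:
        d_hp = 0.0
    else:
        d_hp = h2p - h1p
        if d_hp > 180:
            d_hp -= 360
        elif d_hp < -180:
            d_hp += 360
    d_big_hp = 2 * math.sqrt(c1p * c2p) * math.sin(math.radians(d_hp / 2))
    # Weighting functions evaluated at the mean color.
    l_bar, c_bar_p = (L1 + L2) / 2, (c1p + c2p) / 2
    if c1p * c2p == 0:
        h_bar = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        h_bar = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        h_bar = (h1p + h2p + 360) / 2
    else:
        h_bar = (h1p + h2p - 360) / 2
    t = (1 - 0.17 * math.cos(math.radians(h_bar - 30))
         + 0.24 * math.cos(math.radians(2 * h_bar))
         + 0.32 * math.cos(math.radians(3 * h_bar + 6))
         - 0.20 * math.cos(math.radians(4 * h_bar - 63)))
    d_theta = 30 * math.exp(-(((h_bar - 275) / 25) ** 2))
    r_c = 2 * math.sqrt(c_bar_p**7 / (c_bar_p**7 + 25**7))
    s_l = 1 + 0.015 * (l_bar - 50) ** 2 / math.sqrt(20 + (l_bar - 50) ** 2)
    s_c = 1 + 0.045 * c_bar_p
    s_h = 1 + 0.015 * c_bar_p * t
    r_t = -math.sin(math.radians(2 * d_theta)) * r_c  # hue-chroma rotation term
    dl, dc, dh = d_lp / s_l, d_cp / s_c, d_big_hp / s_h
    return math.sqrt(dl**2 + dc**2 + dh**2 + r_t * dc * dh)

def assign_to_nearest(colors, centroids):
    """K-means assignment step: index of the closest centroid under CIEDE2000."""
    return [min(range(len(centroids)), key=lambda k: ciede2000(c, centroids[k]))
            for c in colors]
```

For example, `ciede2000((50, 2.6772, -79.7751), (50, 0, -82.7485))` evaluates to about 2.04. Note that the usual K-means mean update is not guaranteed to minimize CIEDE2000, which is one reason the centroid rule deserves care.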

2604.01590 2026-04-07 eess.AS cs.SD

PhiNet: Speaker Verification with Phonetic Interpretability

Yi Ma, Shuai Wang, Tianchi Liu, Haizhou Li

Comments Accepted by IEEE Transactions on Audio, Speech and Language Processing. Codes: https://github.com/mmmmayi/PhiNet

Abstract

Despite remarkable progress, automatic speaker verification (ASV) systems typically lack the transparency required for high-accountability applications. Motivated by how human experts perform forensic speaker comparison (FSC), we propose a speaker verification network with phonetic interpretability, PhiNet, designed to enhance both local and global interpretability by leveraging phonetic evidence in decision-making. For users, PhiNet provides detailed phonetic-level comparisons that enable manual inspection of speaker-specific features and facilitate a more critical evaluation of verification outcomes. For developers, it offers explicit reasoning behind verification decisions, simplifying error tracing and informing hyperparameter selection. In our experiments, we demonstrate PhiNet's interpretability with practical examples, including its application in analyzing the impact of different hyperparameters. We conduct both qualitative and quantitative evaluations of the proposed interpretability methods and assess speaker verification performance across multiple benchmark datasets, including VoxCeleb, SITW, and LibriSpeech. Results show that PhiNet achieves performance comparable to traditional black-box ASV models while offering meaningful, interpretable explanations for its decisions, bridging the gap between ASV and forensic analysis.

2604.00387 2026-04-07 cs.CR cs.AI

RAGShield: Detecting Numerical Claim Manipulation in Government RAG Systems

KrishnaSaiReddy Patil

Comments 12 pages, 15 tables, 1 figure, 2 algorithms

Abstract

Retrieval-Augmented Generation (RAG) systems are deployed across federal agencies for citizen-facing tax guidance, benefits eligibility, and legal information, where a single incorrect number causes direct financial harm. This paper proves that all embedding-based RAG defenses share a fundamental blind spot: changing a tax deduction by $50,000 produces a cosine similarity of 0.9998 between the original and manipulated passages, invisible to every known detection threshold. Across 174 manipulation pairs and two embedding models, the mean sensitivity gap is 1,459×. The blind spot is confirmed on real IRS documents. The root cause is that embeddings encode topic, not numerical precision. RAGShield sidesteps this by operating on extracted values directly: a pattern-based engine identifies dollar amounts and percentages in government text, links each value to its governing entity through two-pass context propagation (99.8% entity detection on 2,742 real IRS passages), and verifies every claim against a cross-source registry built from the corpus itself. A temporal tracker flags value changes that fall outside known government update schedules. On 430 attacks generated from real IRS document content, RAGShield detects every one (0.0% ASR, 95% CI [0%, 1%]), while embedding-based defenses miss 79-90% of the same attacks.
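The key idea is to compare extracted numbers rather than embeddings. A minimal sketch of that idea, assuming simple regular expressions for dollar amounts and percentages and a registry the caller has already keyed by entity. RAGShield's actual pattern engine, two-pass entity linking, and temporal tracker are not reproduced; `extract_values`, `verify`, and the registry entry below are hypothetical and do not reflect real IRS figures.

```python
import re

# Hypothetical patterns: "$12,000" / "$1,250.50" and "7.5%" / "7.5 %".
DOLLAR_RE = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)")
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s?%")

def extract_values(text):
    """Return (kind, value) pairs for dollar amounts and percentages found in text."""
    values = [("usd", float(m.group(1).replace(",", ""))) for m in DOLLAR_RE.finditer(text)]
    values += [("pct", float(m.group(1))) for m in PERCENT_RE.finditer(text)]
    return values

def verify(text, registry, entity):
    """Return extracted values that disagree with the registry entry for `entity`.

    registry maps entity -> set of (kind, value) pairs considered authoritative.
    """
    known = registry.get(entity, set())
    return [v for v in extract_values(text) if v not in known]

registry = {"example_deduction": {("usd", 12000.0)}}
print(verify("The deduction is $12,000.", registry, "example_deduction"))
print(verify("The deduction is $62,000.", registry, "example_deduction"))
```

A $50,000 change that barely moves cosine similarity is caught here because the extracted value simply no longer matches the registry.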

2603.28781 2026-04-07 cs.DC cs.LG

When GPUs Fail Quietly: Observability-Aware Early Warning Beyond Numeric Telemetry

Michael Bidollahkhani, Freja Nordsiek, Julian M. Kunkel

Comments 12 pages, 6 figures. Includes public dataset: https://doi.org/10.5281/zenodo.19052367

Abstract

GPU nodes are central to modern HPC and AI workloads, yet many failures do not manifest as immediate hard faults. While some instabilities emerge gradually as weak thermal or efficiency drift, a significant class occurs abruptly with little or no numeric precursor. In these detachment-class failures, GPUs become unavailable at the driver or interconnect level and the dominant observable signal is structural, including disappearance of device metrics and degradation of monitoring payload integrity. This paper proposes an observability-aware early-warning framework that jointly models (i) utilization-aware thermal drift signatures in GPU telemetry and (ii) monitoring-pipeline degradation indicators such as scrape latency increase, sample loss, time-series gaps, and device-metric disappearance. The framework is evaluated on production telemetry from GPU nodes at GWDG, where GPU, node, monitoring, and scheduler signals can be correlated. Results show that detachment failures exhibit minimal numeric precursor and are primarily observable through structural telemetry collapse, while joint modeling increases early-warning lead time compared to GPU-only detection. The dataset used in this study is publicly available at https://doi.org/10.5281/zenodo.19052367.
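The structural signals named in this abstract (scrape latency increase, sample loss, time-series gaps, device-metric disappearance) can be computed directly from monitoring scrapes. A minimal sketch over a window of scrapes for one node; the `Scrape` schema, the 1.5× gap threshold, and `structural_signals` are hypothetical, and the paper's joint model with thermal drift signatures is not reproduced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scrape:
    """One monitoring scrape of a GPU node (hypothetical schema)."""
    timestamp: float    # seconds since the start of the window
    latency: float      # scrape duration in seconds
    gpu_ids: frozenset  # GPU indices that reported metrics in this scrape

def structural_signals(scrapes, interval, expected_gpus):
    """Structural (non-numeric) degradation indicators over a window of scrapes."""
    first, last = scrapes[0], scrapes[-1]
    # Time-series gaps: consecutive scrapes further apart than 1.5x the nominal interval.
    gaps = sum(1 for prev, cur in zip(scrapes, scrapes[1:])
               if cur.timestamp - prev.timestamp > 1.5 * interval)
    # Sample loss: fraction of expected scrapes that never arrived.
    expected = round((last.timestamp - first.timestamp) / interval) + 1
    sample_loss = 1 - len(scrapes) / expected
    # Scrape-latency increase relative to the start of the window.
    latency_ratio = last.latency / first.latency
    # Device-metric disappearance: GPUs expected on the node but absent from the latest scrape.
    missing = sorted(set(expected_gpus) - set(last.gpu_ids))
    return {"gaps": gaps, "sample_loss": sample_loss,
            "latency_ratio": latency_ratio, "missing_gpus": missing}
```

In a detachment-class failure, these indicators can move while every remaining numeric GPU metric still looks normal, which is the abstract's central observation.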

2603.27771 2026-04-07 cs.MA cs.CL cs.CY

Emergent Social Intelligence Risks in Generative Multi-Agent Systems

Yue Huang, Yu Jiang, Wenjie Wang, Haomin Zhuang, Xiaonan Luo, Yuchen Ma, Zhangchen Xu, Zichen Chen, Nuno Moniz, Zinan Lin, Pin-Yu Chen, Nitesh V Chawla, Nouha Dziri, Huan Sun, Xiangliang Zhang

Abstract

Multi-agent systems composed of large generative models are rapidly moving from laboratory prototypes to real-world deployments, where they jointly plan, negotiate, and allocate shared resources to solve complex tasks. While such systems promise unprecedented scalability and autonomy, their collective interaction also gives rise to failure modes that cannot be reduced to individual agents. Understanding these emergent risks is therefore critical. Here, we present a pioneering study of such emergent multi-agent risks in workflows that involve competition over shared resources (e.g., computing resources or market share), sequential handoff collaboration (where downstream agents see only predecessor outputs), and collective decision aggregation, among others. Across these settings, we observe that such group behaviors arise frequently across repeated trials and a wide range of interaction conditions, rather than as rare or pathological cases. In particular, phenomena such as collusion-like coordination and conformity emerge with non-trivial frequency under realistic resource constraints, communication protocols, and role assignments, mirroring well-known pathologies in human societies despite no explicit instruction. Moreover, these risks cannot be prevented by existing agent-level safeguards alone. These findings expose the dark side of intelligent multi-agent systems: a social intelligence risk where agent collectives, despite no instruction to do so, spontaneously reproduce familiar failure patterns from human societies.

2603.26930 2026-04-07 cs.CY cs.CL

In your own words: computationally identifying interpretable themes in free-text survey data

Jenny S Wang, Aliya Saperstein, Emma Pierson

Abstract

Free-text survey responses can provide nuance often missed by structured questions, but they remain difficult to analyze statistically. To address this, we introduce In Your Own Words, a computational framework for exploratory analyses of free-text survey data that identifies structured, interpretable themes in free-text responses, facilitating systematic analysis. To illustrate the benefits of this approach, we apply it to a new dataset of free-text descriptions of race, gender, and sexual orientation from 1,004 U.S. participants. The themes our approach produces on this dataset are more coherent and interpretable than those produced by past computational methods. The themes have three practical applications in survey research. First, they can suggest structured questions to add to future surveys by surfacing salient constructs, such as belonging and identity fluidity, that existing surveys do not capture. Second, the themes reveal heterogeneity within standardized categories, explaining additional variation in health, well-being, and identity importance. Third, the themes illuminate systematic discordance between self-identified and perceived identities, highlighting mechanisms of misrecognition that existing measures do not reflect. More broadly, our framework can be deployed in a wide range of survey settings to identify interpretable themes from free text, complementing existing qualitative methods.

2603.26792 2026-04-07 cs.NE cs.AI cs.LG

A Firefly Algorithm for Mixed-Variable Optimization Based on Hybrid Distance Modeling

Ousmane Tom Bechir, Adán José-García, Zaineb Chelly Garcia, Vincent Sobanski, Clarisse Dhaenens

Comments 25 pages, 15 figures, 7 tables

Abstract

Several real-world optimization problems involve mixed-variable search spaces, where continuous, ordinal, and categorical decision variables coexist. However, most population-based metaheuristic algorithms are designed for either continuous or discrete optimization problems and do not naturally handle heterogeneous variable types. In this paper, we propose an adaptation of the Firefly Algorithm for mixed-variable optimization problems (FAmv). The proposed method relies on a modified distance-based attractiveness mechanism that integrates continuous and discrete components within a unified formulation. This mixed-distance approach enables a more appropriate modeling of heterogeneous search spaces while maintaining a balance between exploration and exploitation. The proposed method is evaluated on the CEC2013 mixed-variable benchmark, which includes unimodal, multimodal, and composition functions. The results show that FAmv achieves competitive, and often superior, performance compared with state-of-the-art mixed-variable optimization algorithms. In addition, experiments on engineering design problems further highlight the robustness and practical applicability of the proposed approach. These results indicate that incorporating appropriate distance formulations into the Firefly Algorithm provides an effective strategy for solving complex mixed-variable optimization problems.
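The core of FAmv is a distance-based attractiveness that spans continuous and discrete variables. A minimal sketch under stated assumptions: the standard Firefly attractiveness β0·exp(−γr²), driven by a hypothetical mixed distance (range-normalized Euclidean part plus a weighted count of categorical mismatches), with categorical variables copying the brighter firefly's value with probability β. This is not the paper's exact formulation; ordinal variables are omitted, and all function names are hypothetical.

```python
import math
import random

def mixed_distance(x, y, cont_idx, cat_idx, ranges, w_cat=1.0):
    """Hypothetical mixed distance: range-normalized Euclidean part plus weighted Hamming part."""
    cont = sum(((x[i] - y[i]) / ranges[i]) ** 2 for i in cont_idx)
    mismatches = sum(1 for i in cat_idx if x[i] != y[i])
    return math.sqrt(cont + w_cat * mismatches)

def attractiveness(r, beta0=1.0, gamma=1.0):
    """Standard Firefly Algorithm attractiveness: beta0 * exp(-gamma * r^2)."""
    return beta0 * math.exp(-gamma * r * r)

def move_towards(xi, xj, cont_idx, cat_idx, ranges, alpha=0.1, rng=None):
    """Move firefly xi toward a brighter firefly xj.

    Continuous variables take the usual attraction step plus bounded random noise;
    categorical variables copy xj's value with probability beta (an assumed rule).
    """
    rng = rng or random.Random()
    beta = attractiveness(mixed_distance(xi, xj, cont_idx, cat_idx, ranges))
    new = list(xi)
    for i in cont_idx:
        noise = alpha * ranges[i] * (rng.random() - 0.5)
        new[i] = xi[i] + beta * (xj[i] - xi[i]) + noise
    for i in cat_idx:
        if rng.random() < beta:
            new[i] = xj[i]
    return new

print(move_towards([0.0, "red"], [3.0, "blue"], [0], [1], {0: 3.0}, rng=random.Random(0)))
```

Because both variable types feed one distance, a firefly that differs in many categories is weakly attracted even if its continuous values are close, which is the balance the abstract attributes to the mixed-distance formulation.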

2603.26718 2026-04-07 cs.CY cs.AI cs.MA quant-ph

Toward Evaluation Frameworks for Multi-Agent Scientific AI Systems

Marcin Abram

Comments 14 pages, 4 figures

Abstract

We analyze the challenges of benchmarking scientific (multi)-agentic systems, including the difficulty of distinguishing reasoning from retrieval, the risks of data/model contamination, the lack of reliable ground truth for novel research problems, the complications introduced by tool use, and the replication challenges due to the continuously changing/updating knowledge base. We discuss strategies for constructing contamination-resistant problems, generating scalable families of tasks, and the need for evaluating systems through multi-turn interactions that better reflect real scientific practice. As an early feasibility test, we demonstrate how to construct a dataset of novel research ideas to test the out-of-sample performance of our system. We also discuss the results of interviews with several researchers and engineers working in quantum science. Through those interviews, we examine how scientists expect to interact with AI systems and how these expectations should shape evaluation methods.

2603.26567 2026-04-07 cs.SE cs.AI

Beyond Code Snippets: Benchmarking LLMs on Repository-Level Question Answering

Yoseph Berhanu Alebachew, Hunter Leary, Swanand Vaishampayan, Chris Brown

Abstract

Large Language Models (LLMs) have shown impressive capabilities across software engineering tasks, including question answering (QA). However, most studies and benchmarks focus on isolated functions or single-file snippets, overlooking the challenges of real-world program comprehension, which often spans multiple files and system-level dependencies. In this work, we introduce StackRepoQA, the first multi-project, repository-level question answering dataset, constructed from 1,318 real developer questions and accepted answers across 134 open-source Java projects. Using this dataset, we systematically evaluate two widely used LLMs (Claude 3.5 Sonnet and GPT-4o) under both direct prompting and agentic configurations. We compare baseline performance with retrieval-augmented generation methods that leverage file-level retrieval and graph-based representations of structural dependencies. Our results show that LLMs achieve moderate accuracy at baseline, with performance improving when structural signals are incorporated. Nonetheless, overall accuracy remains limited for repository-scale comprehension. The analysis reveals that high scores often result from verbatim reproduction of Stack Overflow answers rather than genuine reasoning. To our knowledge, this is the first empirical study to provide such evidence in repository-level QA. We release StackRepoQA to encourage further research into benchmarks, evaluation protocols, and augmentation strategies that disentangle memorization from reasoning, advancing LLMs as reliable tools for repository-scale program comprehension.

2603.23064 2026-04-07 cs.CR cs.AI cs.SI

Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution

Yechao Zhang, Shiqian Zhao, Jie Zhang, Gelei Deng, Jiawen Zhang, Xiaogeng Liu, Chaowei Xiao, Tianwei Zhang

Comments 26 pages, 6 figures, 7 tables; identifies a vulnerability in the heartbeat mechanism of Claw systems with version-scoped evaluation (pre-fix OpenClaw, February 2026)

Abstract

We identify a critical security vulnerability in mainstream Claw personal AI agents: untrusted content encountered during heartbeat-driven background execution can silently pollute agent memory and subsequently influence user-facing behavior without the user's awareness. This vulnerability arises from an architectural design shared across the Claw ecosystem: heartbeat background execution runs in the same session as user-facing conversation, so content ingested from any external source monitored in the background (including email, message channels, news feeds, code repositories, and social platforms) can enter the same memory context used for foreground interaction, often with limited user visibility and without clear source provenance. We formalize this process as an Exposure (E) $\rightarrow$ Memory (M) $\rightarrow$ Behavior (B) pathway: misinformation encountered during heartbeat execution enters the agent's short-term session context, potentially gets written into long-term memory, and later shapes downstream user-facing behavior. We instantiate this pathway in an agent-native social setting using MissClaw, a controlled research replica of Moltbook. We find that (1) social credibility cues, especially perceived consensus, are the dominant driver of short-term behavioral influence, with misleading rates up to 61%; (2) routine memory-saving behavior can promote short-term pollution into durable long-term memory at rates up to 91%, with cross-session behavioral influence reaching 76%; (3) under naturalistic browsing with content dilution and context pruning, pollution still crosses session boundaries. Overall, prompt injection is not required: ordinary social misinformation is sufficient to silently shape agent memory and behavior under heartbeat-driven background execution.

2603.21852 2026-04-07 cs.SC cs.LG

All elementary functions from a single binary operator

Andrzej Odrzywołek

Comments 2 figures, Supplementary Information, code available at https://zenodo.org/records/19183008

Abstract

A single two-input gate suffices for all of Boolean logic in digital hardware. No comparable primitive has been known for continuous mathematics: computing elementary functions such as sin, cos, sqrt, and log has always required multiple distinct operations. Here I show that a single binary operator, eml(x,y)=exp(x)-ln(y), together with the constant 1, generates the standard repertoire of a scientific calculator. This includes constants such as e, pi, and i; arithmetic operations including addition, subtraction, multiplication, division, and exponentiation as well as the usual transcendental and algebraic functions. For example, exp(x)=eml(x,1), ln(x)=eml(1,eml(eml(1,x),1)), and likewise for all other operations. That such an operator exists was not anticipated; I found it by systematic exhaustive search and established constructively that it suffices for the concrete scientific-calculator basis. In EML (Exp-Minus-Log) form, every such expression becomes a binary tree of identical nodes, yielding a grammar as simple as S -> 1 | eml(S,S). This uniform structure also enables gradient-based symbolic regression: using EML trees as trainable circuits with standard optimizers (Adam), I demonstrate the feasibility of exact recovery of closed-form elementary functions from numerical data at shallow tree depths up to 4. The same architecture can fit arbitrary data, but when the generating law is elementary, it may recover the exact formula.
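The identities quoted in this abstract can be checked numerically. A minimal sketch, assuming real inputs with x > 0 and Python's `math` module; constants such as i, which need complex logarithms, are not covered. Trees follow the grammar S -> 1 | eml(S,S), with an added leaf `"x"` for the input variable (a representational choice made for this example; `evaluate` and `LN_TREE` are hypothetical names).

```python
import math

def eml(x, y):
    """The EML operator from the abstract: exp(x) - ln(y)."""
    return math.exp(x) - math.log(y)

def evaluate(tree, x):
    """Evaluate an EML tree: leaves are the constant 1 or the variable "x";
    every internal node is a pair (left, right) meaning eml(left, right)."""
    if isinstance(tree, tuple):
        left, right = tree
        return eml(evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else 1.0

EXP_TREE = ("x", 1)              # exp(x) = eml(x, 1)
LN_TREE = (1, ((1, "x"), 1))     # ln(x) = eml(1, eml(eml(1, x), 1))

print(evaluate(EXP_TREE, 0.7), math.exp(0.7))
print(evaluate(LN_TREE, 2.5), math.log(2.5))
```

The ln identity works because eml(1, x) = e − ln x, so eml(e − ln x, 1) = e^e / x, and finally eml(1, e^e / x) = e − (e − ln x) = ln x.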

2603.09645 2026-04-07 quant-ph cs.LG

Noise Models Impacts and Mitigation Strategies in Photonic Quantum Machine Learning

A. M. A. S. D. Alagiyawanna, Asoka Karunananda

Comments 26 pages, 7 figures. Review article. Currently under review at Discover Quantum Science (Springer Nature)

Abstract

Photonic Quantum Machine Learning (PQML) is an emerging approach to scalable, energy-efficient quantum information processing that combines photonic quantum computing technologies with machine learning techniques. Photonic technologies offer several benefits: room-temperature operation, fast (low-latency) signal processing, and the possibility of representing computations in high-dimensional Hilbert spaces. This makes them a good candidate for near-term quantum devices. However, noise remains a major factor limiting the performance, reliability, and scalability of PQML implementations. This review provides a detailed and systematic analysis of the noise sources that affect PQML implementations. We present an overview of the principal photonic quantum computer designs and summarize the many types of quantum machine learning algorithms that have been successfully implemented on photonic architectures, such as variational quantum circuits, quantum neural networks, and quantum support vector machines. We identify and categorize the primary sources of noise within photonic quantum systems and describe how their effects vary by algorithm, degrading learning accuracy, destabilizing training, and slowing convergence. Additionally, we review traditional and advanced techniques for characterizing noise and provide an extensive survey of strategies for mitigating its effects on learning performance. Finally, we discuss recent advances demonstrating that PQML can operate in real-world settings under realistic noise conditions, as well as future obstacles to establishing PQML as an effective quantum processing platform.

2601.22783 2026-04-07 cs.IR cs.CV cs.LG cs.MM cs.SD

Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval

Ilyass Moummad, Marius Miron, David Robinson, Kawtar Zaher, Hervé Goëau, Olivier Pietquin, Pierre Bonnet, Emmanuel Chemla, Matthieu Geist, Alexis Joly

Abstract

Large-scale biodiversity monitoring platforms increasingly rely on multimodal wildlife observations. While recent foundation models enable rich semantic representations across vision, audio, and language, retrieving relevant observations from massive archives remains challenging due to the computational cost of high-dimensional similarity search. In this work, we introduce compact hypercube embeddings for fast text-based wildlife observation retrieval, a framework that enables efficient text-based search over large-scale wildlife image and audio databases using compact binary representations. Building on the cross-view code alignment hashing framework, we extend lightweight hashing beyond a single-modality setup to align natural language descriptions with visual or acoustic observations in a shared Hamming space. Our approach leverages pretrained wildlife foundation models, including BioCLIP and BioLingual, and adapts them efficiently for hashing using parameter-efficient fine-tuning. We evaluate our method on large-scale benchmarks, including iNaturalist2024 for text-to-image retrieval and iNatSounds2024 for text-to-audio retrieval, as well as multiple soundscape datasets to assess robustness under domain shift. Results show that retrieval using discrete hypercube embeddings achieves competitive, and in several cases superior, performance compared to continuous embeddings, while drastically reducing memory and search cost. Moreover, we observe that the hashing objective consistently improves the underlying encoder representations, leading to stronger retrieval and zero-shot generalization. These results demonstrate that binary, language-based retrieval enables scalable and efficient search over large wildlife archives for biodiversity monitoring systems.
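Once text and observations share a Hamming space, retrieval reduces to XOR-and-popcount over compact codes. A minimal sketch of the retrieval side only, assuming embeddings that are already aligned and binarized by sign (one vertex of the K-dimensional hypercube per item). The paper's cross-view code alignment training and parameter-efficient fine-tuning of BioCLIP/BioLingual are not shown; `binarize`, `hamming`, and `search` are hypothetical helpers.

```python
def binarize(vec):
    """Map a real-valued embedding to a K-bit code (one hypercube vertex) by sign."""
    code = 0
    for bit, v in enumerate(vec):
        if v > 0:
            code |= 1 << bit
    return code

def hamming(a, b):
    """Number of differing bits between two integer-packed codes."""
    return bin(a ^ b).count("1")

def search(query_vec, db_codes, k=3):
    """Indices of the k database codes closest to the query in Hamming distance."""
    q = binarize(query_vec)
    # sorted() is stable, so ties keep database order.
    return sorted(range(len(db_codes)), key=lambda i: hamming(q, db_codes[i]))[:k]

db = [binarize(v) for v in ([1, -1, 1, -1], [-1, -1, 1, 1], [1, 1, 1, 1])]
print(search([0.9, -0.2, 0.5, -0.1], db, k=2))
```

Each item costs K bits instead of K floats, and a distance is one XOR plus a popcount, which is where the memory and search savings described in the abstract come from.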

2601.22264 2026-04-07 cs.SE cs.AI cs.CL cs.LG

Predicting Intermittent Job Failure Categories for Diagnosis Using Few-Shot Fine-Tuned Language Models

Henri Aïdasso, Francis Bordeleau, Ali Tizghadam

Comments Accepted at the ACM International Conference on the Foundations of Software Engineering (FSE 2026), Industry Track

Abstract

In principle, Continuous Integration (CI) pipeline failures provide valuable feedback to developers on code-related errors. In practice, however, pipeline jobs often fail intermittently due to non-deterministic tests, network outages, infrastructure failures, resource exhaustion, and other reliability issues. These intermittent (flaky) job failures lead to substantial inefficiencies: wasted computational resources from repeated reruns and significant diagnosis time that distracts developers from core activities and often requires intervention from specialized teams. Prior work has proposed machine learning techniques to detect intermittent failures, but does not address the subsequent diagnosis challenge. To fill this gap, we introduce FlaXifyer, a few-shot learning approach for predicting intermittent job failure categories using pre-trained language models. FlaXifyer requires only job execution logs and achieves 84.3% Macro F1 and 92.0% Top-2 accuracy with just 12 labeled examples per category. We also propose LogSift, an interpretability technique that identifies influential log statements in under one second, reducing review effort by 74.4% while surfacing relevant failure information in 87% of cases. Evaluation on 2,458 job failures from TELUS demonstrates that FlaXifyer and LogSift enable effective automated triage, accelerate failure diagnosis, and pave the way towards the automated resolution of intermittent job failures.
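The two reported metrics, Macro F1 and Top-2 accuracy, can be computed from per-category scores as sketched below. This sketch does not implement FlaXifyer itself, and the category names in the test (`network`, `flaky_test`, `resource`) are hypothetical.

```python
from typing import Dict, List, Sequence


def top_k_accuracy(scores: Sequence[Dict[str, float]], labels: Sequence[str], k: int = 2) -> float:
    """Fraction of items whose true category is among the k highest-scoring ones."""
    hits = 0
    for score, label in zip(scores, labels):
        ranked = sorted(score, key=score.get, reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def macro_f1(predicted: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-category F1 scores, so rare categories count equally."""
    classes = sorted(set(labels) | set(predicted))
    f1_values: List[float] = []
    for c in classes:
        tp = sum(1 for p, y in zip(predicted, labels) if p == c and y == c)
        fp = sum(1 for p, y in zip(predicted, labels) if p == c and y != c)
        fn = sum(1 for p, y in zip(predicted, labels) if p != c and y == c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_values.append(f1)
    return sum(f1_values) / len(f1_values)
```

Top-2 accuracy credits a prediction when the correct category is the runner-up. This suits triage, where a developer can check two candidate causes.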

2601.18295 2026-04-07 eess.AS cs.SD

Noise-Robust Contrastive Learning with an MFCC-Conformer For Coronary Artery Disease Detection

Milan Marocchi, Matthew Fynn, Yue Rong

Comments This paper has been accepted for presentation at ICASSP 2026. © 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses. 5 pages, 1 figure

Abstract

Cardiovascular diseases (CVD) are the leading cause of death worldwide, with coronary artery disease (CAD) comprising the largest subcategory of CVDs. Recently, there has been increased focus on detecting CAD using phonocardiogram (PCG) signals, with high success in clinical environments with low noise and optimal sensor placement. Multichannel techniques have been found to be more robust to noise; however, achieving robust performance on real-world data remains a challenge. This work utilises a novel multichannel energy-based noisy-segment rejection algorithm, using heart and noise-reference microphones, to discard audio segments with large amounts of nonstationary noise before training a deep learning classifier. This conformer-based classifier takes mel-frequency cepstral coefficients (MFCCs) from multiple channels, further helping improve the model's noise robustness. The proposed method achieved 78.4% accuracy and 78.2% balanced accuracy on 297 subjects, representing improvements of 4.1% and 4.3%, respectively, compared to training without noisy-segment rejection.
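The abstract does not give the rejection rule, so the following is only a minimal sketch of one energy-based, noise-reference rejection scheme. It assumes fixed-length, non-overlapping segments and a hypothetical energy-ratio threshold `max_ratio`. A segment is kept only when the noise-reference microphone carries little energy relative to the heart microphone.

```python
from typing import List, Sequence


def segment_energy(signal: Sequence[float], start: int, length: int) -> float:
    """Mean squared amplitude of signal[start:start + length]."""
    window = signal[start:start + length]
    return sum(x * x for x in window) / len(window)


def keep_clean_segments(heart: Sequence[float], noise_ref: Sequence[float],
                        segment_length: int, max_ratio: float) -> List[int]:
    """Start indices of segments whose noise-reference energy, relative to the
    heart-microphone energy, does not exceed max_ratio (hypothetical rule)."""
    kept = []
    for start in range(0, len(heart) - segment_length + 1, segment_length):
        heart_energy = segment_energy(heart, start, segment_length)
        noise_energy = segment_energy(noise_ref, start, segment_length)
        if heart_energy > 0 and noise_energy / heart_energy <= max_ratio:
            kept.append(start)
    return kept
```

Only the kept segments would then be converted to MFCCs and passed to the classifier during training.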

2601.11065 2026-04-07 cs.CY cs.AI

Fairness in Healthcare Processes: A Quantitative Analysis of Decision Making in Triage

Rachmadita Andreswari, Stephan A. Fahrenkrog-Petersen, Jan Mendling

Comments conference

Abstract

Fairness in automated decision-making has become a critical concern, particularly in high-pressure healthcare scenarios such as emergency triage, where fast and equitable decisions are essential. Process mining research increasingly investigates fairness, and a growing body of work focuses on fairness-aware algorithms. So far, however, less is known about how these concepts perform on empirical healthcare data or how they cover aspects of justice theory. This study addresses this research problem and proposes a process mining approach to assess fairness in triage by linking real-life event logs with conceptual dimensions of justice. Using the MIMICEL event log (derived from MIMIC-IV ED), we analyze time, re-do, deviation and decision as process outcomes, and evaluate the influence of age, gender, race, language and insurance using the Kruskal-Wallis test, the Chi-square test and effect size measures. These outcomes are mapped to justice dimensions to support the development of a conceptual framework. The results show which aspects of potential unfairness surface in high-acuity and sub-acute cases. In this way, this study contributes empirical insights that support further research on responsible, fairness-aware process mining in healthcare.
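The Kruskal-Wallis test compares an outcome, such as waiting time, across groups defined by an attribute, such as insurance type. The sketch below computes the standard tie-corrected H statistic and the rank-based effect size η²_H = (H − k + 1)/(n − k). The abstract does not say which effect size measures the study used, so this choice is an assumption. The p-value, omitted here, comes from a chi-square distribution with k − 1 degrees of freedom.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups: Dict[str, Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic over the pooled groups."""
    pooled = [v for g in groups.values() for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, pos = 0.0, 0
    for g in groups.values():
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum * rank_sum / len(g)
        pos += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def eta_squared_h(h: float, n: int, k: int) -> float:
    """Rank-based effect size for H with n observations in k groups."""
    return (h - k + 1) / (n - k)
```

Because the test uses ranks rather than raw values, it suits skewed outcomes such as waiting times.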

2512.11919 2026-04-07 stat.ME cs.AI math.ST stat.TH

A fine-grained look at causal effects in causal spaces

Junhyung Park, Yuqing Zhou

Abstract

The notion of causal effect is fundamental across many scientific disciplines. Traditionally, quantitative researchers have studied causal effects at the level of variables; for example, how a certain drug dose (W) causally affects a patient's blood pressure (Y). However, in many modern data domains, the raw variables, such as pixels in an image or tokens in a language model, do not have the semantic structure needed to formulate meaningful causal questions. In this paper, we offer a more fine-grained perspective by studying causal effects at the level of events, drawing inspiration from probability theory, where core notions such as independence are first given for events and sigma-algebras, before random variables enter the picture. Within the measure-theoretic framework of causal spaces, a recently introduced axiomatisation of causality, we first introduce several binary definitions that determine whether a causal effect is present, and prove some of their properties, linking causal effect to (in)dependence under an intervention measure. Further, we provide quantifying measures that capture the strength and nature of causal effects on events, and show that we can recover the common measures of treatment effect as special cases.

2510.27503 2026-04-07 eess.SP cs.LG

pDANSE: Particle-based Data-driven Nonlinear State Estimation from Nonlinear Measurements

Anubhab Ghosh, Yonina C. Eldar, Saikat Chatterjee

Comments 13 pages, 14 figures, under review at IEEE Transactions on Signal Processing

Abstract

We consider the problem of designing a data-driven nonlinear state estimation (DANSE) method that uses (noisy) nonlinear measurements of a process whose underlying state transition model (STM) is unknown. Such a process is referred to as a model-free process. A recurrent neural network (RNN) provides the parameters of a Gaussian prior that characterize the state of the model-free process, using all previous measurements at a given time point. In the original DANSE, the measurement system was linear, leading to a closed-form solution for the state posterior. However, a nonlinear measurement system renders a closed-form solution infeasible. Instead, the second-order statistics of the state posterior are computed using the nonlinear measurements observed at the time point. We address the nonlinear measurements using a particle sampling approach based on the reparameterization trick, and estimate the second-order statistics of the state posterior. The proposed method is referred to as particle-based DANSE (pDANSE). The RNN of pDANSE uses sequential measurements efficiently and avoids computationally intensive sequential Monte Carlo (SMC) and/or ancestral sampling. We describe the semi-supervised learning method for pDANSE, which transitions to unsupervised learning in the absence of labeled data. Using a stochastic Lorenz-63 system as a benchmark process, we experimentally demonstrate the state estimation performance for four nonlinear measurement systems. We explore cubic nonlinearity and a camera-model nonlinearity, where unsupervised learning is used; then we explore half-wave rectification nonlinearity and Cartesian-to-spherical nonlinearity, where semi-supervised learning is used. Additionally, we show the performance of pDANSE for the stochastic Lorenz-96 system with a half-wave rectified measurement system. The performance of state estimation is shown to be competitive with model-driven methods that have complete knowledge of the STM of the dynamical system.
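The abstract does not spell out the particle estimator, so this is a hedged sketch of one plausible reading. It makes several simplifying assumptions. The state is scalar. The RNN is omitted, and its Gaussian prior `(prior_mean, prior_var)` is passed in directly. The measurement is y = h(x) + Gaussian noise. Particles are drawn with the reparameterization x = μ + σ·ε and combined by self-normalized importance weighting to obtain the posterior mean and variance. The paper's actual estimator may differ.

```python
import math
import random
from typing import Callable, Tuple


def particle_posterior_moments(prior_mean: float, prior_var: float, y: float,
                               h: Callable[[float], float], noise_var: float,
                               num_particles: int = 5000, seed: int = 0) -> Tuple[float, float]:
    """Posterior mean and variance of a scalar state given one nonlinear measurement."""
    rng = random.Random(seed)
    std = math.sqrt(prior_var)
    # Reparameterized draws from the Gaussian prior: x = mu + sigma * eps, eps ~ N(0, 1).
    particles = [prior_mean + std * rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    # Gaussian log-likelihood of y under each particle (constants cancel after normalization).
    log_w = [-(y - h(x)) ** 2 / (2 * noise_var) for x in particles]
    max_log = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(lw - max_log) for lw in log_w]
    total = sum(weights)
    weights = [w / total for w in weights]
    mean = sum(w * x for w, x in zip(weights, particles))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, particles))
    return mean, var


# Cubic nonlinearity, one of the measurement systems mentioned in the abstract.
print(particle_posterior_moments(0.0, 1.0, y=1.0, h=lambda x: x ** 3, noise_var=0.01))
```

With a linear h, the result can be checked against the closed-form Kalman update. For h(x) = x, a N(0, 1) prior, unit noise and y = 2, the exact posterior is N(1, 0.5).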

2510.25890 2026-04-07 cs.SE cs.AI

ATLAS: A Layered Constraint-Guided Framework for Structured Artifact Generation in LLM-Assisted MDE

Tong Ma, Hui Lai, Hui Wang, Zhenhu Tian, Chaochao Li, Fengjie Xu, Ling Fang

Abstract

ATLAS is a constraint-guided generation framework for structured engineering artifacts whose outputs must satisfy explicit schemas, domain rules, and audit requirements. Rather than treating a large language model as a standalone generator, ATLAS places generation inside a model-driven workflow that separates domain representation, constraint compilation, and post-generation validation. ATLAS combines three components. A metamodel-integration stage builds a typed representation of domain entities and relations; in this study, it operates over authoritative AUTOSAR meta-model assets. An Integrated Constraint Model (ICM) compiles heterogeneous requirements into two operational layers: generation-time structural constraints and post-generation semantic/logical obligations. Constraint-Guided, Validation-Backed Generation (CVG) then combines Layer 1 constrained decoding, Layer 2 backend validation, and audit-guided repair. In the AUTOSAR instantiation, these Layer 2 obligations are realized through SHACL/SMT-style checks, illustrating how the same ICM can be connected to domain-specific validation backends. We evaluate ATLAS on AUTOSAR artifact generation at both single-file and multi-file scales. In the evaluated AUTOSAR setting, ATLAS consistently produces schema-valid single-file outputs and preserves perfect file completeness and XSD validity at multi-file scale, while SHACL/SMT checks and result analysis continue to expose residual system-level defects. The empirical picture is therefore one of bounded automation: ATLAS secures structural validity and turns higher-level failures into explicit, diagnosable objects within the generation workflow.
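The split between structural constraints and semantic obligations can be illustrated with a toy validator. Everything in it is hypothetical: the dictionary-based artifact, the `SCHEMA` fields, and the single obligation. None of it reflects real AUTOSAR or SHACL constructs. For simplicity the sketch also checks structure after generation, whereas ATLAS enforces Layer 1 during decoding. Each failed check becomes an explicit, readable diagnostic.

```python
from typing import Any, Callable, Dict, List, Tuple

Artifact = Dict[str, Any]

# Layer 1 stand-in: hypothetical schema mapping field name -> required Python type.
SCHEMA: Dict[str, type] = {"name": str, "ports": list, "interfaces": list}


def check_structure(artifact: Artifact) -> List[str]:
    """Report missing or mistyped fields."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in artifact:
            errors.append(f"missing field '{field}'")
        elif not isinstance(artifact[field], expected):
            errors.append(f"field '{field}' should be {expected.__name__}")
    return errors


def ports_reference_declared_interfaces(artifact: Artifact) -> bool:
    """Hypothetical semantic rule: every port points to a declared interface."""
    declared = set(artifact["interfaces"])
    return all(port.get("interface") in declared for port in artifact["ports"])


# Layer 2 stand-in: (description, predicate) pairs checked after generation.
OBLIGATIONS: List[Tuple[str, Callable[[Artifact], bool]]] = [
    ("every port references a declared interface", ports_reference_declared_interfaces),
]


def validate(artifact: Artifact) -> List[str]:
    """Structural errors first; semantic checks assume a well-formed artifact."""
    structural = check_structure(artifact)
    if structural:
        return structural
    return [f"violated: {desc}" for desc, pred in OBLIGATIONS if not pred(artifact)]
```

Keeping obligations as named predicates means a failure can be reported and fed back for repair rather than silently rejected.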

2510.22517 2026-04-07 cs.CE cs.LG cs.SY eess.SY

Data-driven Sensor Placement for Predictive Applications: A Correlation-Assisted Attribution Framework (CAAF)

Sze Chai Leung, Di Zhou, H. Jane Bae

Abstract

Optimal sensor placement (OSP) is critical for efficient, accurate monitoring, control, and inference in complex physical systems. We propose a machine-learning-based feature attribution (FA) framework to identify OSP for target predictions. FA quantifies input contributions to a model output; however, it struggles with highly correlated input data often encountered in practical applications for OSP. To address this, we propose a Correlation-Assisted Attribution Framework (CAAF), which introduces a clustering step on the candidate sensor locations before performing FA to reduce redundancy and enhance generalizability. We first illustrate the core principles of the proposed framework through a series of validation cases, then demonstrate its effectiveness in realistic dynamical systems such as structural health monitoring, airfoil lift prediction, and wall-normal velocity estimation for turbulent channel flow. The results show that the CAAF outperforms alternative approaches that typically struggle due to the presence of nonlinear dynamics, chaotic behavior, and multi-scale interactions, and enables the effective application of FA for identifying OSP in real-world environments.
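The clustering step before feature attribution can be sketched as follows. The abstract does not name the clustering method, so this sketch assumes a simple greedy scheme. A candidate joins the first cluster whose representative it correlates with at |r| ≥ `threshold`, and otherwise starts a new cluster. Attribution would then run on one representative per cluster. The function names and the threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cluster_sensors(signals: Dict[str, Sequence[float]], threshold: float) -> List[List[str]]:
    """Greedily group candidate sensor locations whose signals are highly correlated."""
    clusters: List[List[str]] = []
    for name, series in signals.items():
        for cluster in clusters:
            # Compare against the cluster's first member, used as its representative.
            if abs(pearson(series, signals[cluster[0]])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Collapsing near-duplicate candidates keeps the attribution scores from being split arbitrarily between redundant sensors.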

2510.20052 2026-04-07 math.OC cs.LG stat.ML

Endogenous Aggregation of Multiple Data Envelopment Analysis Scores for Large Data Sets

Hashem Omrani, Raha Imanirad, Adam Diamant, Utkarsh Verma, Amol Verma, Fahad Razak

Abstract

We propose an approach for dynamic efficiency evaluation across multiple organizational dimensions using data envelopment analysis (DEA). The method generates both dimension-specific and aggregate efficiency scores, incorporates desirable and undesirable outputs, and is suitable for large-scale problem settings. Two regularized DEA models are introduced: a slack-based measure (SBM) and a linearized version of a nonlinear goal programming model (GP-SBM). While SBM estimates an aggregate efficiency score and then distributes it across dimensions, GP-SBM first estimates dimension-level efficiencies and then derives an aggregate score. Both models utilize a regularization parameter to enhance discriminatory power while also directly integrating both desirable and undesirable outputs. We demonstrate the computational efficiency and validity of our approach on multiple datasets and apply it to a case study of twelve hospitals in Ontario, Canada, evaluating three theoretically grounded dimensions of organizational effectiveness over a 24-month period from January 2018 to December 2019: technical efficiency, clinical efficiency, and patient experience. Our numerical results show that SBM and GP-SBM better capture correlations among input/output variables and outperform conventional benchmarking methods that separately evaluate dimensions before aggregation.

2509.21940 2026-04-07 stat.ML cs.IT cs.LG math.IT math.ST stat.TH

Sequential 1-bit Mean Estimation with Near-Optimal Sample Complexity

Ivan Lau, Jonathan Scarlett

Comments AISTATS 2026

Abstract

In this paper, we study the problem of distributed mean estimation with 1-bit communication constraints. We propose a mean estimator that is based on (randomized and sequentially chosen) interval queries, whose 1-bit outcome indicates whether the given sample lies in the specified interval. Our estimator is $(\varepsilon, \delta)$-PAC for all distributions with bounded mean ($-\lambda \le \mathbb{E}(X) \le \lambda$) and variance ($\mathrm{Var}(X) \le \sigma^2$) for some known parameters $\lambda$ and $\sigma$. We derive a sample complexity bound $\widetilde{O}\big( \frac{\sigma^2}{\varepsilon^2}\log\frac{1}{\delta} + \log\frac{\lambda}{\sigma}\big)$, which matches the minimax lower bound for the unquantized setting up to logarithmic factors and the additional $\log\frac{\lambda}{\sigma}$ term that we show to be unavoidable. We also establish an adaptivity gap for interval-query-based estimators: the best non-adaptive mean estimator is considerably worse than our adaptive mean estimator for large $\frac{\lambda}{\sigma}$. Finally, we give tightened sample complexity bounds for distributions with stronger tail decay, and present additional variants that (i) handle an unknown sampling budget, (ii) adapt to the unknown true variance given (possibly loose) upper and lower bounds on the variance, and (iii) use only two stages of adaptivity at the expense of more complicated (non-interval) queries.
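The 1-bit interval-query setting can be illustrated with a simple non-adaptive baseline. This is not the paper's estimator. The baseline assumes bounded support |X| ≤ λ, which is stronger than the paper's bounded-mean assumption. Each sample answers a single query, "is X in (U, +∞)?", where U is drawn uniformly from [−λ, λ]. Since P(X > U | X) = (X + λ)/(2λ), the mean is recovered as λ(2p̂ − 1).

```python
import random
from typing import Callable


def one_bit_mean_estimate(sample: Callable[[random.Random], float], lam: float,
                          n: int, seed: int = 0) -> float:
    """Non-adaptive 1-bit mean estimate for samples supported on [-lam, lam]."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n):
        x = sample(rng)                # the raw sample never leaves the sensor
        u = rng.uniform(-lam, lam)     # random threshold defining the interval (u, +inf)
        ones += 1 if x > u else 0      # the only transmitted information: one bit
    p_hat = ones / n
    return lam * (2 * p_hat - 1)       # invert E[bit] = (E[X] + lam) / (2 * lam)
```

The variance of this estimate is at most λ²/n, so its cost grows with λ rather than σ. This illustrates why adaptively chosen queries pay off when λ/σ is large.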

2509.00472 2026-04-07 stat.ML cs.LG math.ST stat.TH

Partially Functional Dynamic Backdoor Diffusion-based Causal Model

Xinwen Liu, Lei Qian, Song Xi Chen, Niansheng Tang

Comments 16 pages, 2 figures

Abstract

Causal inference in spatio-temporal settings is critically hindered by unmeasured confounders with complex spatio-temporal dynamics and the prevalence of multi-resolution data. While diffusion models present a promising avenue for estimating structural causal models, existing approaches are limited by assumptions of causal sufficiency or static confounding, failing to capture the region-specific, temporally dependent nature of real-world latent variables or to directly handle functional variables. We bridge this gap by introducing the Partially Functional Dynamic Backdoor Diffusion-based Causal Model (PFD-BDCM), a unified generative framework designed to simultaneously tackle causal inference with dynamic confounding and functional data. Our approach formalizes a novel structural causal model that captures spatio-temporal dependencies in latent confounders through conditional autoregressive processes, represents functional variables via basis expansion coefficients treated as standard graph nodes, and integrates valid backdoor adjustment into a diffusion-based generative process. We provide theoretical guarantees on the preservation of causal effects under basis expansion and derive error bounds for counterfactual estimates. Experiments on synthetic data and a real-world air pollution case study demonstrate that PFD-BDCM outperforms existing methods across observational, interventional, and counterfactual queries. This work provides a rigorous and practical tool for robust causal inference in complex spatio-temporal systems characterized by non-stationarity and multi-resolution data.
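The idea of representing functional variables by basis expansion coefficients can be made concrete by projecting a curve, sampled on a grid, onto a small basis. Each coefficient would then become an ordinary node in the causal graph. The sketch assumes a cosine (DCT-II-style) basis evaluated at grid midpoints, which is orthogonal on that grid. The abstract does not say which basis PFD-BDCM uses.

```python
import math
from typing import List, Sequence


def cosine_coefficients(values: Sequence[float], num_basis: int) -> List[float]:
    """Project samples at midpoints t_j = (j + 0.5) / n onto cos(pi * k * t), k < num_basis."""
    n = len(values)
    coeffs = []
    for k in range(num_basis):
        basis = [math.cos(math.pi * k * (j + 0.5) / n) for j in range(n)]
        norm = sum(b * b for b in basis)
        # Basis vectors are orthogonal on this grid, so each coefficient is a simple projection.
        coeffs.append(sum(v * b for v, b in zip(values, basis)) / norm)
    return coeffs


def reconstruct(coeffs: Sequence[float], n: int) -> List[float]:
    """Evaluate the truncated expansion back on the n-point midpoint grid."""
    return [sum(c * math.cos(math.pi * k * (j + 0.5) / n) for k, c in enumerate(coeffs))
            for j in range(n)]
```

A curve with hundreds of grid values is thus summarized by a handful of coefficients, which keeps the graph small.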