arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.16741 2026-02-20 cs.CR cs.AI cs.LG

Can Adversarial Code Comments Fool AI Security Reviewers -- Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis

Scott Thornton

Comments 19 pages, 6 figures

Abstract

AI-assisted code review is widely used to detect vulnerabilities before production release. Prior work shows that adversarial prompt manipulation can degrade large language model (LLM) performance in code generation. We test whether similar comment-based manipulation misleads LLMs during vulnerability detection. We build a 100-sample benchmark across Python, JavaScript, and Java, each paired with eight comment variants ranging from no comments to adversarial strategies such as authority spoofing and technical deception. Eight frontier models, five commercial and three open-source, are evaluated in 9,366 trials. Adversarial comments produce small, statistically non-significant effects on detection accuracy (McNemar exact p > 0.21; all 95 percent confidence intervals include zero). This holds for commercial models with 89 to 96 percent baseline detection and open-source models with 53 to 72 percent, despite large absolute performance gaps. Unlike generation settings where comment manipulation achieves high attack success, detection performance does not meaningfully degrade. More complex adversarial strategies offer no advantage over simple manipulative comments. We test four automated defenses across 4,646 additional trials (14,012 total). Static analysis cross-referencing performs best at 96.9 percent detection and recovers 47 percent of baseline misses. Comment stripping reduces detection for weaker models by removing helpful context. Failures concentrate on inherently difficult vulnerability classes, including race conditions, timing side channels, and complex authorization logic, rather than on adversarial comments.
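The significance claim above rests on McNemar's exact test, which compares paired outcomes (the same sample reviewed with and without an adversarial comment) using only the discordant pairs. The following is a minimal sketch, assuming the standard two-sided exact binomial form of the test; the function name and the counts in the usage lines are invented for illustration and do not come from the paper.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: pairs detected under condition A but missed under condition B
    c: pairs missed under condition A but detected under condition B
    Under the null hypothesis each discordant pair falls into either
    cell with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Lower-tail probability P(X <= k), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(mcnemar_exact(4, 7))
    print(mcnemar_exact(1, 9))
```

Concordant pairs (both conditions right, or both wrong) do not enter the statistic, so a comment that rarely flips a verdict leaves little for the test to detect.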

2602.16738 2026-02-20 cs.MA cs.LG

Self-Evolving Multi-Agent Network for Industrial IoT Predictive Maintenance

Rebin Saleh, Khanh Pham Dinh, Balázs Villányi, Truong-Son Hy

Abstract

Industrial IoT predictive maintenance requires systems capable of real-time anomaly detection without sacrificing interpretability or demanding excessive computational resources. Traditional approaches rely on static, offline-trained models that cannot adapt to evolving operational conditions, while LLM-based monolithic systems demand prohibitive memory and latency, rendering them impractical for on-site edge deployment. We introduce SEMAS, a self-evolving hierarchical multi-agent system that distributes specialized agents across Edge, Fog, and Cloud computational tiers. Edge agents perform lightweight feature extraction and pre-filtering; Fog agents execute diversified ensemble detection with dynamic consensus voting; and Cloud agents continuously optimize system policies via Proximal Policy Optimization (PPO) while maintaining asynchronous, non-blocking inference. The framework incorporates LLM-based response generation for explainability and federated knowledge aggregation for adaptive policy distribution. This architecture enables resource-aware specialization without sacrificing real-time performance or model interpretability. Empirical evaluation on two industrial benchmarks (Boiler Emulator and Wind Turbine) demonstrates that SEMAS achieves superior anomaly detection performance with exceptional stability under adaptation, sustains prediction accuracy across evolving operational contexts, and delivers substantial latency improvements enabling genuine real-time deployment. Ablation studies confirm that PPO-driven policy evolution, consensus voting, and federated aggregation each contribute materially to system effectiveness. These findings indicate that resource-aware, self-evolving multi-agent coordination is essential for production-ready industrial IoT predictive maintenance under strict latency and explainability constraints.

2602.16737 2026-02-20 q-bio.QM cs.LG

Exploring the Utility of MALDI-TOF Mass Spectrometry and Antimicrobial Resistance in Hospital Outbreak Detection

Chang Liu, Jieshi Chen, Alexander J. Sundermann, Kathleen Shutt, Marissa P. Griffith, Lora Lee Pless, Lee H. Harrison, Artur W. Dubrawski

Abstract

Accurate and timely identification of hospital outbreak clusters is crucial for preventing the spread of infections that have epidemic potential. While assessing pathogen similarity through whole genome sequencing (WGS) is considered the gold standard for outbreak detection, its high cost and lengthy turnaround time preclude routine implementation in clinical laboratories. We explore the utility of two rapid and cost-effective alternatives to WGS, matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry and antimicrobial resistance (AR) patterns. We develop a machine learning framework that extracts informative representations from MALDI-TOF spectra and AR patterns for outbreak detection and explore their fusion. Through multi-species analyses, we demonstrate that in some cases MALDI-TOF and AR have the potential to reduce reliance on WGS, enabling more accessible and rapid outbreak surveillance.

2602.16723 2026-02-20 cs.CR cs.AI

Is Mamba Reliable for Medical Imaging?

Banafsheh Saber Latibari, Najmeh Nazari, Daniel Brignac, Hossein Sayadi, Houman Homayoun, Abhijit Mahalanobis

Comments This paper has been accepted at ISQED 2026

Abstract

State-space models like Mamba offer linear-time sequence processing and low memory, making them attractive for medical imaging. However, their robustness under realistic software and hardware threat models remains underexplored. This paper evaluates Mamba on multiple MedMNIST classification benchmarks under input-level attacks, including white-box adversarial perturbations (FGSM/PGD), occlusion-based PatchDrop, and common acquisition corruptions (Gaussian noise and defocus blur), as well as hardware-inspired fault attacks emulated in software via targeted and random bit-flip injections into weights and activations. We profile vulnerabilities and quantify impacts on accuracy, indicating that defenses are needed for deployment.
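To make the bit-flip fault model concrete, the sketch below inverts a single bit of one value stored as IEEE 754 float32. It is only an illustration of the general idea of emulating a hardware fault in software; the helper name `flip_bit` is hypothetical, and the abstract does not describe the paper's actual injection tooling, target selection, or data types.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value`, stored as float32, with one bit inverted.

    Bit 0 is the least significant mantissa bit, bits 23-30 are the
    exponent, and bit 31 is the sign.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the chosen bit and reinterpret the result as float32.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


if __name__ == "__main__":
    weight = 1.0
    for bit in (0, 22, 30, 31):
        print(bit, flip_bit(weight, bit))
```

A flip in the low mantissa bits barely changes the value, while a flip in the top exponent bit can turn an ordinary weight into a huge number or infinity, which is why targeted flips can be far more damaging than random ones.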

2602.16719 2026-02-20 cs.DB cs.AI

GPU-Accelerated Algorithms for Graph Vector Search: Taxonomy, Empirical Study, and Research Directions

Yaowen Liu, Xuejia Chen, Anxin Tian, Haoyang Li, Qinbin Li, Xin Zhang, Alexander Zhou, Chen Jason Zhang, Qing Li, Lei Chen

Abstract

Approximate Nearest Neighbor Search (ANNS) underpins many large-scale data mining and machine learning applications, with efficient retrieval increasingly hinging on GPU acceleration as dataset sizes grow. Although graph-based approaches represent the state of the art in approximate nearest neighbor search, there is a lack of systematic understanding regarding their optimization for modern GPU architectures and their end-to-end effectiveness in practical scenarios. In this work, we present a comprehensive survey and experimental study of GPU-accelerated graph-based vector search algorithms. We establish a detailed taxonomy of GPU optimization strategies and clarify the mapping between algorithmic tasks and hardware execution units within GPUs. Through a thorough evaluation of six leading algorithms on eight large-scale benchmark datasets, we assess both graph index construction and query search performance. Our analysis reveals that distance computation remains the primary computational bottleneck, while data transfer between the host CPU and GPU emerges as the dominant factor influencing real-world latency at large scale. We also highlight key trade-offs in scalability and memory usage across different system designs. Our findings offer clear guidelines for designing scalable and robust GPU-powered approximate nearest neighbor search systems, and provide a comprehensive benchmark for the knowledge discovery and data mining community.

2602.16611 2026-02-20 cs.GR cs.CV

Style-Aware Gloss Control for Generative Non-Photorealistic Rendering

Santiago Jimenez-Navarro, Belen Masia, Ana Serrano

Abstract

Humans can infer material characteristics of objects from their visual appearance, and this ability extends to artistic depictions, where similar perceptual strategies guide the interpretation of paintings or drawings. Among the factors that define material appearance, gloss, along with color, is widely regarded as one of the most important, and recent studies indicate that humans can perceive gloss independently of the artistic style used to depict an object. To investigate how gloss and artistic style are represented in learned models, we train an unsupervised generative model on a newly curated dataset of painterly objects designed to systematically vary such factors. Our analysis reveals a hierarchical latent space in which gloss is disentangled from other appearance factors, allowing for a detailed study of how gloss is represented and varies across artistic styles. Building on this representation, we introduce a lightweight adapter that connects our style- and gloss-aware latent space to a latent-diffusion model, enabling the synthesis of non-photorealistic images with fine-grained control of these factors. We compare our approach with previous models and observe improved disentanglement and controllability of the learned factors.

2602.16177 2026-02-20 stat.ML cs.AI cs.LG

Conjugate Learning Theory: Uncovering the Mechanisms of Trainability and Generalization in Deep Neural Networks

Binchuan Qi

Abstract

In this work, we propose a notion of practical learnability grounded in finite sample settings, and develop a conjugate learning theoretical framework based on convex conjugate duality to characterize this learnability property. Building on this foundation, we demonstrate that training deep neural networks (DNNs) with mini-batch stochastic gradient descent (SGD) achieves global optima of empirical risk by jointly controlling the extreme eigenvalues of a structure matrix and the gradient energy, and we establish a corresponding convergence theorem. We further elucidate the impact of batch size and model architecture (including depth, parameter count, sparsity, skip connections, and other characteristics) on non-convex optimization. Additionally, we derive a model-agnostic lower bound for the achievable empirical risk, theoretically demonstrating that data determines the fundamental limit of trainability. On the generalization front, we derive deterministic and probabilistic bounds on generalization error based on generalized conditional entropy measures. The former explicitly delineates the range of generalization error, while the latter characterizes the distribution of generalization error relative to the deterministic bounds under independent and identically distributed (i.i.d.) sampling conditions. Furthermore, these bounds explicitly quantify the influence of three key factors: (i) information loss induced by irreversibility in the model, (ii) the maximum attainable loss value, and (iii) the generalized conditional entropy of features with respect to labels. Moreover, they offer a unified theoretical lens for understanding the roles of regularization, irreversible transformations, and network depth in shaping the generalization behavior of deep neural networks. Extensive experiments validate all theoretical predictions, confirming the framework's correctness and consistency.

2602.16075 2026-02-20 cs.AR cs.CR cs.ET cs.LG

DARTH-PUM: A Hybrid Processing-Using-Memory Architecture

Ryan Wong, Ben Feinberg, Saugata Ghose

Comments To appear in the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2026

Abstract

Analog processing-using-memory (PUM; a.k.a. in-memory computing) makes use of electrical interactions inside memory arrays to perform bulk matrix-vector multiplication (MVM) operations. However, many popular matrix-based kernels need to execute non-MVM operations, which analog PUM cannot directly perform. To retain its energy efficiency, analog PUM architectures augment memory arrays with CMOS-based domain-specific fixed-function hardware to provide complete kernel functionality, but the difficulty of integrating such specialized CMOS logic with memory arrays has largely limited analog PUM to being an accelerator for machine learning inference, or for closely related kernels. An opportunity exists to harness analog PUM for general-purpose computation: recent works have shown that memory arrays can also perform Boolean PUM operations, albeit with very different supporting hardware and electrical signals than analog PUM. We propose DARTH-PUM, a general-purpose hybrid PUM architecture that tackles key hardware and software challenges to integrating analog PUM and digital PUM. We propose optimized peripheral circuitry, coordinating hardware to manage and interface between both types of PUM, an easy-to-use programming interface, and low-cost support for flexible data widths. These design elements allow us to build a practical PUM architecture that can execute kernels fully in memory, and can scale easily to cater to domains ranging from embedded applications to large-scale data-driven computing. We show how three popular applications (AES encryption, convolutional neural networks, large-language models) can map to and benefit from DARTH-PUM, with speedups of 59.4x, 14.8x, and 40.8x over an analog+CPU baseline.

2602.13282 2026-02-20 cs.NI cs.AI

GraFSTNet: Graph-based Frequency SpatioTemporal Network for Cellular Traffic Prediction

Ziyi Li, Hui Ma, Fei Xing, Chunjiong Zhang, Ming Yan

Comments There are some small errors in the manuscript, and we would like to check and resubmit later

Abstract

With the rapid expansion of cellular networks and the proliferation of mobile devices, cellular traffic data exhibits complex temporal dynamics and spatial correlations, posing challenges to accurate traffic prediction. Previous methods often focus predominantly on temporal modeling or depend on predefined spatial topologies, which limits their ability to jointly model spatio-temporal dependencies and effectively capture periodic patterns in cellular traffic. To address these issues, we propose a cellular traffic prediction framework that integrates spatio-temporal modeling with time-frequency analysis. First, we construct a spatial modeling branch to capture inter-cell dependencies through an attention mechanism, minimizing the reliance on predefined topological structures. Second, we build a time-frequency modeling branch to enhance the representation of periodic patterns. Furthermore, we introduce an adaptive-scale LogCosh loss function, which adjusts the error penalty based on traffic magnitude, preventing large errors from dominating the training process and helping the model maintain relatively stable prediction accuracy across different traffic intensities. Experiments on three open-source datasets demonstrate that the proposed method achieves prediction performance superior to state-of-the-art approaches.
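For readers unfamiliar with the log-cosh loss, the sketch below shows the standard form, which grows quadratically for small errors and roughly linearly for large ones. The abstract does not give the formula for the adaptive scaling, so the `scale` parameter here is a hypothetical stand-in supplied by the caller, not the authors' rule for deriving it from traffic magnitude.

```python
import math


def log_cosh(x: float) -> float:
    """Numerically stable log(cosh(x)).

    Uses log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2), which avoids
    the overflow that cosh(x) itself would hit for large |x|.
    """
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def scaled_log_cosh_loss(preds, targets, scale: float = 1.0) -> float:
    """Mean of scale * log_cosh(error / scale) over paired values.

    `scale` is an assumed knob: errors well below it are penalized
    quadratically, errors well above it roughly linearly.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal length")
    total = sum(scale * log_cosh((p - t) / scale) for p, t in zip(preds, targets))
    return total / len(preds)


if __name__ == "__main__":
    preds = [10.0, 210.0, 4000.0]
    targets = [12.0, 200.0, 5000.0]
    print(scaled_log_cosh_loss(preds, targets, scale=1.0))
    print(scaled_log_cosh_loss(preds, targets, scale=100.0))
```

Because the penalty is near-linear for large errors, a few very large misses contribute far less than they would under squared error, which matches the abstract's stated goal of keeping large errors from dominating training.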

2602.04458 2026-02-20 cs.HC cs.RO

Robot-Assisted Group Tours for Blind People

Yaxin Hu, Masaki Kuribayashi, Allan Wang, Seita Kayukawa, Daisuke Sato, Bilge Mutlu, Hironobu Takagi, Chieko Asakawa

Comments In Proceedings of ACM CHI 2026 conference on Human Factors in Computing Systems

Abstract

Group interactions are essential to social functioning, yet effective engagement relies on the ability to recognize and interpret visual cues, making such engagement a significant challenge for blind people. In this paper, we investigate how a mobile robot can support group interactions for blind people. We used the scenario of a guided tour with mixed-visual groups involving blind and sighted visitors. Based on insights from an interview study with blind people (n=5) and museum experts (n=5), we designed and prototyped a robotic system that helped blind visitors join group tours. We conducted a field study in a science museum where each blind participant (n=8) joined a group tour with one guide and two sighted participants (n=8). Findings indicated users' sense of safety from the robot's navigational support, concerns about group participation, and preferences for obtaining environmental information. We present design implications for future robotic systems to support blind people's mixed-visual group participation.

2602.03998 2026-02-20 eess.IV cs.CV q-bio.QM

AtlasPatch: Efficient Tissue Detection and High-throughput Patch Extraction for Computational Pathology at Scale

Ahmed Alagha, Christopher Leclerc, Yousef Kotp, Omar Metwally, Calvin Moras, Peter Rentopoulos, Ghodsiyeh Rostami, Bich Ngoc Nguyen, Jumanah Baig, Abdelhakim Khellaf, Vincent Quoc-Huy Trinh, Rabeb Mizouni, Hadi Otrok, Jamal Bentahar, Mahdi S. Hosseini

Comments Under review

Abstract

Whole-slide image (WSI) preprocessing, comprising tissue detection followed by patch extraction, is foundational to AI-driven computational pathology but remains a major bottleneck for scaling to large and heterogeneous cohorts. We present AtlasPatch, a scalable framework that couples foundation-model tissue detection with high-throughput patch extraction at minimal computational overhead. Our tissue detector achieves high precision (0.986) and remains robust across varying tissue conditions (e.g., brightness, fragmentation, boundary definition, tissue heterogeneity) and common artifacts (e.g., pen/ink markings, scanner streaks). This robustness is enabled by our annotated, heterogeneous multi-cohort training set of ~30,000 WSI thumbnails combined with efficient adaptation of the Segment-Anything (SAM) model. AtlasPatch also reduces end-to-end WSI preprocessing time by up to 16x versus widely used deep-learning pipelines, without degrading downstream task performance. The AtlasPatch tool is open-source, efficiently parallelized for practical deployment, and supports options to save extracted patches or stream them into common feature-extraction models for on-the-fly embedding, making it adaptable to both pathology departments (tissue detection and quality control) and AI researchers (dataset creation and model training). The AtlasPatch software package is available at https://github.com/AtlasAnalyticsLab/AtlasPatch.

2511.19943 2026-02-20 eess.SP cs.AI cs.LG

AI/ML based Joint Source and Channel Coding for HARQ-ACK Payload

Akash Doshi, Pinar Sen, Kirill Ivanov, Wei Yang, June Namgoong, Runxin Wang, Rachel Wang, Taesang Yoo, Jing Jiang, Tingfang Ji

Comments 39 pages, 15 figures. Under consideration for publication in Journal of Sel. Areas in Information Theory (received Major Revision). This paper was presented in part at the International Symposium on Topics in Coding, August 2025 in the Session for Coding and AI

Abstract

Channel coding from 2G to 5G has assumed the input bits at the physical layer to be uniformly distributed. However, hybrid automatic repeat request acknowledgement (HARQ-ACK) bits transmitted in the uplink are inherently non-uniformly distributed. For such sources, significant performance gains could be obtained by employing joint source channel coding, aided by deep learning-based techniques. In this paper, we learn a transformer-based encoder using a novel "free-lunch" training algorithm and propose per-codeword power shaping to exploit the source prior at the encoder whilst being robust to small changes in the HARQ-ACK distribution. Furthermore, any HARQ-ACK decoder has to achieve a low negative acknowledgement (NACK) error rate to avoid radio link failures resulting from multiple NACK errors. We develop an extension of the Neyman-Pearson test to a coded bit system with multiple information bits to achieve Unequal Error Protection of NACK over ACK bits at the decoder. Finally, we apply the proposed encoder and decoder designs to a 5G New Radio (NR) compliant uplink setup under a fading channel, describing the optimal receiver design and a low complexity coherent approximation to it. Our results demonstrate 3-6 dB reduction in the average transmit power required to achieve the target error rates compared to the NR baseline, while also achieving a 2-3 dB reduction in the maximum transmit power, thus providing for significant coverage gains and power savings.

2511.15162 2026-02-20 eess.SP cs.AI cs.LG

Multimodal Wireless Foundation Models

Ahmed Aboulfotouh, Hatem Abou-Zeid

Abstract

Wireless foundation models (WFMs) have recently demonstrated promising capabilities, jointly performing multiple wireless functions and adapting effectively to new environments. However, while current WFMs process only one modality, depending on the task and operating conditions, the most informative modality changes and no single modality is best for all tasks. WFMs should therefore be designed to accept multiple modalities to enable a broader and more diverse range of tasks and scenarios. In this work, we propose and build the first multimodal wireless foundation model capable of processing both raw IQ streams and image-like wireless modalities (e.g., spectrograms and CSI) and performing multiple tasks across both. We introduce masked wireless modeling for the multimodal setting, a self-supervised objective and pretraining recipe that learns a joint representation from IQ streams and image-like wireless modalities. We evaluate the model on five tasks across both modality families: image-based (human activity sensing, RF signal classification, 5G NR positioning) and IQ-based (RF device fingerprinting, interference detection/classification). The multimodal WFM is competitive with single-modality WFMs, and in several cases surpasses their performance. Our results demonstrate the strong potential of developing multimodal WFMs that support diverse wireless tasks across different modalities. We believe this provides a concrete step toward both AI-native 6G and the vision of joint sensing, communication, and localization.

2510.12915 2026-02-20 cs.CY cs.CL cs.LG

Toward LLM-Supported Automated Assessment of Critical Thinking Subskills

Marisa C. Peczuh, Nischal Ashok Kumar, Ryan Baker, Blair Lehman, Danielle Eisenberg, Caitlin Mills, Payu Wittawatolarn, Kushaan Naskar, Keerthi Chebrolu, Sudhip Nashi, Cadence Young, Brayden Liu, Sherry Lachman, Andrew Lan

Comments preprint: 12 pages

Abstract

As the world becomes increasingly saturated with AI-generated content, disinformation, and algorithmic persuasion, critical thinking - the capacity to evaluate evidence, detect unreliable claims, and exercise independent judgment - is becoming a defining human skill. Developing critical thinking skills through timely assessment and feedback is crucial; however, there has not been extensive work in educational data mining on defining, measuring, and supporting critical thinking. In this paper, we investigate the feasibility of measuring "subskills" that underlie critical thinking. We ground our work in an authentic task where students operationalize critical thinking by writing argumentative essays. We developed a coding rubric based on an established skills progression and completed human coding for a corpus of student essays. We then evaluated three distinct approaches to automated scoring: zero-shot prompting, few-shot prompting, and supervised fine-tuning, implemented across three large language models (GPT-5, Llama 3.1 8B, and ModernBERT). Fine-tuning Llama 3.1 8B achieved the best results and demonstrated particular strength on subskills with highly separable proficiency levels with balanced labels across levels, while lower performance was observed for subskills that required detection of subtle distinctions between proficiency levels or imbalanced labels. Our exploratory work represents an initial step toward scalable assessment of critical thinking skills across authentic educational contexts. Future research should continue to combine automated critical thinking assessment with human validation to more accurately detect and measure dynamic, higher-order thinking skills.

2509.22860 2026-02-20 math.OC cs.DC cs.LG stat.ML

Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity

Artavazd Maranjyan, Peter Richtárik

Journal ref The Fourteenth International Conference on Learning Representations (ICLR 2026)

Abstract

Asynchronous stochastic gradient methods are central to scalable distributed optimization, particularly when devices differ in computational capabilities. Such settings arise naturally in federated learning, where training takes place on smartphones and other heterogeneous edge devices. In addition to varying computation speeds, these devices often hold data from different distributions. However, existing asynchronous SGD methods struggle in such heterogeneous settings and face two key limitations. First, many rely on unrealistic assumptions of similarity across workers' data distributions. Second, methods that relax this assumption still fail to achieve theoretically optimal performance under heterogeneous computation times. We introduce Ringleader ASGD, the first asynchronous SGD algorithm that attains the theoretical lower bounds for parallel first-order stochastic methods in the smooth nonconvex regime, thereby achieving optimal time complexity under data heterogeneity and without restrictive similarity assumptions. Our analysis further establishes that Ringleader ASGD remains optimal under arbitrary and even time-varying worker computation speeds, closing a fundamental gap in the theory of asynchronous optimization.

2509.11461 2026-02-20 cs.HC cs.AI

CareerPooler: AI-Powered Metaphorical Pool Simulation Improves Experience and Outcomes in Career Exploration

Ziyi Wang, Ziwen Zeng, Yuan Li, Zijian Ding

Abstract

Career exploration is uncertain, requiring decisions with limited information and unpredictable outcomes. While generative AI offers new opportunities for career guidance, most systems rely on linear chat interfaces that produce overly comprehensive and idealized suggestions, overlooking the non-linear and effortful nature of real-world trajectories. We present CareerPooler, a generative AI-powered system that employs a pool-table metaphor to simulate career development as a spatial and narrative interaction. Users strike balls representing milestones, skills, and random events, where hints, collisions, and rebounds embody decision-making under uncertainty. In a within-subjects study with 24 participants, CareerPooler significantly improved engagement, information gain, satisfaction, and career clarity compared to a chatbot baseline. Qualitative findings show that spatial-narrative interaction fosters experience-based learning, resilience through setbacks, and reduced psychological burden. Our findings contribute to the design of AI-assisted career exploration systems and more broadly suggest that visually grounded analogical interactions can make generative systems engaging and satisfying.

2508.03628 2026-02-20 cs.IR cs.AI cs.LG

LLMDistill4Ads: Using Cross-Encoders to Distill from LLM Signals for Advertiser Keyphrase Recommendations at eBay

Soumik Dey, Benjamin Braun, Naveen Ravipati, Hansi Wu, Binbin Li

Abstract

E-commerce sellers are advised to bid on keyphrases to boost their advertising campaigns. These keyphrases must be relevant to prevent irrelevant items from cluttering Search systems and to maintain positive seller perception. It is vital that keyphrase suggestions align with seller, Search, and buyer judgments. Given the challenges in collecting negative feedback in these systems, LLMs have been used as a scalable proxy for human judgments. We present an empirical study on a major e-commerce platform of a distillation framework involving an LLM teacher, a cross-encoder assistant and a bi-encoder Embedding Based Retrieval (EBR) student model, aimed at mitigating click-induced biases and providing more diverse keyphrase recommendations while aligning advertising, search and buyer preferences.

2507.11091 2026-02-20 eess.AS cs.SD eess.SP

Array-Aware Ambisonics and HRTF Encoding for Binaural Reproduction With Wearable Arrays

Yhonatan Gayer, Vladimir Tourbabin, Zamir Ben Hur, David Lou Alon, Boaz Rafaely

Abstract

This work introduces a novel method for binaural reproduction from arbitrary microphone arrays, based on array-aware optimization of Ambisonics encoding through Head-Related Transfer Function (HRTF) pre-processing. The proposed approach integrates array-specific information into the HRTF processing pipeline, leading to improved spatial accuracy in binaural rendering. Objective evaluations demonstrate superior performance under simulated wearable-array and head rotations compared to the conventional Ambisonics encoding method. A listening experiment further confirms that the method achieves significantly higher perceptual ratings in both timbre and spatial quality. Fully compatible with standard Ambisonics, the proposed method offers a practical solution for spatial audio rendering in applications such as virtual reality, augmented reality, and wearable audio capture.

2506.02529 2026-02-20 cs.SE cs.AI cs.CL

Automated Web Application Testing: End-to-End Test Case Generation with Large Language Models and Screen Transition Graphs

Nguyen-Khang Le, Quan Minh Bui, Minh Ngoc Nguyen, Hiep Nguyen, Trung Vo, Son T. Luu, Shoshin Nomura, Minh Le Nguyen

Comments Published in the Proceedings of JSAI 2025

Abstract

Web applications are critical to modern software ecosystems, yet ensuring their reliability remains challenging due to the complexity and dynamic nature of web interfaces. Recent advances in large language models (LLMs) have shown promise in automating complex tasks, but limitations persist in handling dynamic navigation flows and complex form interactions. This paper presents an automated system for generating test cases for two key aspects of web application testing: site navigation and form filling. For site navigation, the system employs screen transition graphs and LLMs to model navigation flows and generate test scenarios. For form filling, it uses state graphs to handle conditional forms and automates Selenium script generation. Key contributions include: (1) a novel integration of graph structures and LLMs for site navigation testing, (2) a state graph-based approach for automating form-filling test cases, and (3) a comprehensive dataset for evaluating form-interaction testing. Experimental results demonstrate the system's effectiveness in improving test coverage and robustness, advancing the state of web application testing.

2505.12298 2026-02-20 eess.IV cs.AI cs.CV

Attention-Enhanced U-Net for Accurate Segmentation of COVID-19 Infected Lung Regions in CT Scans

Amal Lahchim, Lazar Davic

Comments 14 pages, 9 figures, created using Google Colab and PyTorch. Compares segmentation models for COVID-19 CT data

Abstract

In this study, we propose a robust methodology for automatic segmentation of infected lung regions in COVID-19 CT scans using convolutional neural networks. The approach is based on a modified U-Net architecture enhanced with attention mechanisms, data augmentation, and postprocessing techniques. It achieved a Dice coefficient of 0.8658 and mean IoU of 0.8316, outperforming other methods. The dataset was sourced from public repositories and augmented for diversity. Results demonstrate superior segmentation performance. Future work includes expanding the dataset, exploring 3D segmentation, and preparing the model for clinical deployment.
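The two reported metrics can be computed from a predicted mask and a ground-truth mask as sketched below. This is a minimal illustration on flattened binary masks using the standard definitions; the function name and the convention of returning 1.0 when both masks are empty are assumptions, and the abstract does not say how the authors averaged scores across images or classes.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length binary masks.

    Masks are flat sequences of 0/1 (or False/True) pixel labels.
    Dice = 2|A ∩ B| / (|A| + |B|)
    IoU  = |A ∩ B| / |A ∪ B|
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks empty: treated here as a perfect match (a convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 1, 0, 1, 0]
    print(dice_and_iou(predicted, reference))
```

For a single pair of masks the two scores are tied together by IoU = Dice / (2 - Dice), so Dice is never lower than IoU. Scores averaged over several images or classes need not satisfy that identity.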

2505.00282 2026-02-20 econ.EM cs.LG

A Unifying Framework for Robust and Efficient Inference with Unstructured Data

Jacob Carlson, Melissa Dell

Abstract

To analyze unstructured data (text, images, audio, video), economists typically first extract low-dimensional structured features with a neural network. Neural networks do not make generically unbiased predictions, and biases will propagate to estimators that use their predictions. While structured variables extracted from unstructured data have traditionally been treated as proxies - implicitly accepting arbitrary measurement error - this poses various challenges in an era where constantly evolving AI can cheaply extract data. Researcher degrees of freedom (e.g., the choice of neural network architecture, training data or prompts, and numerous implementation details) raise concerns about p-hacking and how to best show robustness, the frequent deprecation of proprietary neural networks complicates reproducibility, and researchers need a principled way to determine how accurate predictions need to be before making costly investments to improve them. To address these challenges, this study develops MAR-S (Missing At Random Structured Data), a semiparametric missing data framework that enables unbiased, efficient, and robust inference with unstructured data, by correcting for neural network prediction error with a validation sample. MAR-S synthesizes and extends existing methods for debiased inference using machine learning predictions and connects them to familiar problems such as causal inference, highlighting valuable parallels. We develop robust and efficient estimators for both descriptive and causal estimands and address inference with aggregated and transformed neural network predictions, a common scenario outside the existing literature.
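The idea of correcting prediction error with a validation sample can be illustrated with a simple difference estimator for a population mean. This is not the MAR-S estimator, whose form the abstract does not give; it is one basic, well-known correction of the kind the abstract says MAR-S synthesizes and extends, and it assumes the validation sample is drawn at random from the same population. All names and numbers below are hypothetical.

```python
from statistics import fmean


def debiased_mean(all_preds, val_preds, val_labels):
    """Estimate the mean of the true variable from predictions.

    all_preds:  model predictions on the full sample
    val_preds:  model predictions on the validation subsample
    val_labels: ground-truth values for that same subsample

    The average prediction error measured on the validation sample is
    subtracted from the naive mean of the predictions.
    """
    if not all_preds:
        raise ValueError("all_preds must be non-empty")
    if len(val_preds) != len(val_labels) or not val_preds:
        raise ValueError("validation inputs must be non-empty and equal length")
    estimated_bias = fmean([p - y for p, y in zip(val_preds, val_labels)])
    return fmean(all_preds) - estimated_bias


if __name__ == "__main__":
    # Hypothetical binary feature extracted by a classifier.
    predictions = [1, 1, 0, 1, 1, 0, 1, 1]
    validation_predictions = [1, 1, 0, 1]
    validation_labels = [1, 0, 0, 1]
    print(debiased_mean(predictions, validation_predictions, validation_labels))
```

The correction removes systematic over- or under-prediction on average, at the cost of extra variance from the finite validation sample, which is the trade-off that motivates asking how accurate predictions need to be before investing in better ones.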

2409.20250 2026-02-20 stat.ML cs.LG

Input-Label Correlation Governs a Linear-to-Nonlinear Transition in Random Features under Spiked Covariance

Samet Demir, Zafer Dogan

Comments 30 pages, 7 figures

Details
English abstract

Random feature models (RFMs), two-layer networks with a randomly initialized fixed first layer and a trained linear readout, are among the simplest nonlinear predictors. Prior asymptotic analyses in the proportional high-dimensional regime show that, under isotropic data, RFMs reduce to noisy linear models and offer no advantage over classical linear methods such as ridge regression. Yet RFMs frequently outperform linear baselines on structured real data. We show that this tension is explained by a correlation-driven phase transition: under spiked-covariance designs, the interaction between anisotropy and input-label correlation determines whether the RFM behaves as an effectively linear predictor or exhibits genuinely nonlinear gains. Concretely, we establish a universality principle under anisotropy and characterize the RFM generalization error via an equivalent noisy polynomial model. The effective degree of this polynomial, equivalently, which Hermite orders of the activation survive, is governed by the strength of input-label correlation, yielding an explicit boundary in the correlation-spike-magnitude plane. Below the boundary, the RFM collapses to a linear surrogate and can underperform strong linear baselines; above it, higher-order terms persist and the RFM achieves a clear nonlinear advantage. Numerical simulations and real-data experiments corroborate the theory and delineate the transition between these two regimes.

2406.04388 2026-02-20 eess.IV cs.AI physics.optics

Single Exposure Quantitative Phase Imaging with a Conventional Microscope using Diffusion Models

Gabriel della Maggiora, Luis Alberto Croquevielle, Harry Horsley, Thomas Heinis, Artur Yakimovich

Journal ref (2025). Proceedings of the AAAI Conference on Artificial Intelligence, 39(3), 2672-2680

Details
English abstract

Phase imaging is gaining importance due to its applications in fields like biomedical imaging and material characterization. In biomedical applications, it can provide quantitative information missing in label-free microscopy modalities. One of the most prominent methods in phase quantification is the Transport-of-Intensity Equation (TIE). TIE often requires multiple acquisitions at different defocus distances, which is not always feasible in a clinical setting. To address this issue, we propose to use chromatic aberrations to induce the required through-focus images with a single exposure, effectively generating a through-focus stack. Since the defocus distance induced by the aberrations is small, conventional TIE solvers are insufficient to address the resulting artifacts. We propose Zero-Mean Diffusion, a modified version of diffusion models designed for quantitative image prediction, and train it with synthetic data to ensure robust phase retrieval. Our contributions offer an alternative TIE approach that leverages chromatic aberrations, achieving accurate single-exposure phase measurement with white light and thus improving the efficiency of phase imaging. Moreover, we present a new class of diffusion models that are well-suited for quantitative data and have a sound theoretical basis. To validate our approach, we employ a widespread brightfield microscope equipped with a commercially available color camera. We apply our model to clinical microscopy of patients' urine, obtaining accurate phase measurements.

2602.17662 2026-02-20 quant-ph cond-mat.stat-mech

A Study of Entanglement and Ansatz Expressivity for the Transverse-Field Ising Model using Variational Quantum Eigensolver

Ashutosh P. Tripathi, Nilmani Mathur, Vikram Tripathi

Comments 9 pages, 6 figures, contribution to the 42nd International Symposium on Lattice Field Theory (LATTICE2025), 2-8 November 2025, Tata Institute of Fundamental Research, Mumbai, India

Details
English abstract

The Variational Quantum Eigensolver (VQE) is a leading hybrid quantum-classical algorithm for simulating many-body systems in the NISQ era. Its effectiveness, however, depends on the faithful preparation of eigenstates, which becomes challenging in degenerate and strongly entangled regimes. We study this problem using the transverse-field Ising model (TFIM) with periodic boundary conditions in one, two, and three dimensions, considering systems of up to 27 qubits. We employ different ansatzes: the hardware-efficient EfficientSU2 from Qiskit, the physics-inspired Hamiltonian Variational Ansatz (HVA) and HVA with symmetry breaking, and benchmark their performance using energy variance, entanglement entropy, spin correlations, and magnetization.

2602.17661 2026-02-20 math.GT math.GR math.QA

Dehn quandles of surfaces and their bounded cohomology

Pankaj Kapari, Deepanshi Saraf, Mahender Singh

Comments 29 pages, 9 figures, comments are welcome

Details
English abstract

We introduce new families of quandles that serve as invariants for classifying closed orientable surfaces. These families generalize the classical Dehn quandle and are defined, respectively, on isotopy classes of unoriented closed curves and on integral weighted multicurves. We establish their fundamental algebraic properties and construct a natural quandle covering that relates them. We then analyze their metric properties, showing that these quandles are unbounded with respect to the quandle metric. Next, we compute their second bounded quandle cohomology, proving it to be infinite-dimensional. We also establish a version of the Gromov Mapping Theorem, showing that the natural map from an abelian quandle extension onto the original quandle induces an injection on bounded quandle cohomology in every dimension. Finally, inspired by recent developments in quandle rings, we analyze idempotents in the integral quandle rings arising from the classical Dehn quandle of a surface.

2602.17660 2026-02-20 quant-ph math-ph math.MP

Benchmarking quantum phase-space methods for near-resonant light propagation

Mojdeh S. Najafabadi, Joel F. Corney, Luis Sanchez Soto, Gerd Leuchs

Comments 9 pages, 2 figures

Details
English abstract

We study the dynamics of light interacting with a near-resonant atomic medium using the truncated Wigner and positive P phase-space representations. The atomic degrees of freedom are described using the Jordan-Schwinger mapping. The dynamics is first analyzed under unitary evolution and subsequently in the presence of an optical reservoir. While both approaches capture the main features of the light-matter dynamics, we find that the truncated Wigner approximation exhibits noticeable deviations for stronger interaction strengths and when reservoir-induced noise becomes significant.

2602.17652 2026-02-20 astro-ph.GA

A Chemodynamical Census of the Milky Way's Ultra-Faint Compact Satellites. I. A First Population-Level Look at the Internal Kinematics and Metallicities of 19 Extremely-Low-Mass Halo Stellar Systems

William Cerny, Ting S. Li, Andrew B. Pace, Joshua D. Simon, Marla Geha, Alexander P. Ji, Alex Drlica-Wagner, Jordan Bruce, Oleg Y. Gnedin, Eric F. Bell, Sidney Mau, Ivanna Escala, Daisy Bissonette, Alessandro Savino, Anirudh Chiti, Evan N. Kirby

Comments 63 pages (main) + 18 pages (references + appendix), 30 Figures, 6 Tables. Will submit to ApJ in one week; comments welcome. Brief summary available here: https://wcerny.github.io/compactsatellites/. Repository with spectroscopic member catalogs: https://zenodo.org/records/18612486. Forthcoming Paper II will explore the orbits, accretion histories, and tidal influences of the same sample

Details
English abstract

Deep, wide-area photometric surveys have uncovered a population of compact ($r_{1/2} \approx$ 1-15 pc), extremely-low-mass ($M_* \approx$ 20-4000 $M_{\odot}$) stellar systems in the Milky Way halo that are smaller in size than known ultra-faint dwarf galaxies (UFDs) and substantially fainter than most classical globular clusters (GCs). Very little is known about the nature and origins of this population of "Ultra-Faint Compact Satellites" (UFCSs) owing to a dearth of spectroscopic measurements. Here, we present the first spectroscopic census of these compact systems based on Magellan/IMACS and Keck/DEIMOS observations of 19 individual UFCSs, representing $\sim$2/3 of the known population. We securely measure mean radial velocities for all 19 systems, velocity dispersions for 15 (predominantly upper limits), metallicities for 17, metallicity dispersions for 8, and $\textit{Gaia}$-based mean proper motions for 18. This large new spectroscopic sample provides the first insights into population-level trends for these extreme satellites. We demonstrate that: (1) the UFCSs are kinematically colder, on average, than the UFDs, disfavoring very dense dark matter halos in most cases, (2) the UFCS population is chemically diverse, spanning a factor of $\sim$300 in mean iron abundance ($\rm -3.3 \lesssim [Fe/H] \lesssim -0.8$), with multiple systems falling beneath the "metallicity floor" proposed for GCs, and (3) while some higher-metallicity and/or younger UFCSs are clearly star clusters, the dynamical and/or chemical evidence allows the possibility that up to $\sim$50% of the UFCSs in our sample (9 of 19) may represent the smallest and least-massive galaxies yet discovered.
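The quoted "factor of $\sim$300" follows directly from the logarithmic definition of $\rm [Fe/H]$ (base-10 logarithm of the iron-to-hydrogen ratio relative to the Sun). A short worked check, using only the range stated in the abstract:

```latex
\Delta[\mathrm{Fe/H}] = -0.8 - (-3.3) = 2.5\ \mathrm{dex}
\quad\Longrightarrow\quad
\frac{(N_{\mathrm{Fe}}/N_{\mathrm{H}})_{\max}}{(N_{\mathrm{Fe}}/N_{\mathrm{H}})_{\min}}
= 10^{2.5} \approx 316
```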

2602.17651 2026-02-20 cs.CR

Non-Trivial Zero-Knowledge Implies One-Way Functions

Suvradip Chakraborty, James Hulett, Dakshita Khurana, Kabir Tomer

Details
English abstract

A recent breakthrough [Hirahara and Nanashima, STOC'2024] established that if $\mathsf{NP} \not \subseteq \mathsf{ioP/poly}$, the existence of zero-knowledge with negligible errors for $\mathsf{NP}$ implies the existence of one-way functions (OWFs). In this work, we obtain a characterization of one-way functions from the worst-case complexity of zero-knowledge {\em in the high-error regime}. We say that a zero-knowledge argument is {\em non-trivial} if the sum of its completeness, soundness and zero-knowledge errors is bounded away from $1$. Our results are as follows, assuming $\mathsf{NP} \not \subseteq \mathsf{ioP/poly}$:
1. {\em Non-trivial} Non-Interactive ZK (NIZK) arguments for $\mathsf{NP}$ imply the existence of OWFs. Using known amplification techniques, this result also provides an unconditional transformation from weak to standard NIZK proofs for all meaningful error parameters.
2. We also generalize to the interactive setting: {\em Non-trivial} constant-round public-coin zero-knowledge arguments for $\mathsf{NP}$ imply the existence of OWFs, and therefore also (standard) four-message zero-knowledge arguments for $\mathsf{NP}$.
Prior to this work, one-way functions could be obtained from NIZKs that had constant zero-knowledge error $\varepsilon_{zk}$ and soundness error $\varepsilon_{s}$ satisfying $\varepsilon_{zk} + \sqrt{\varepsilon_{s}} < 1$ [Chakraborty, Hulett and Khurana, CRYPTO'2025]. However, the regime where $\varepsilon_{zk} + \sqrt{\varepsilon_{s}} \geq 1$ remained open. This work closes the gap, and obtains new implications in the interactive setting. Our results and techniques could be useful stepping stones in the quest to construct one-way functions from worst-case hardness.
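To see what the previously open regime looks like numerically, the sketch below compares the two conditions stated in the abstract for one set of error values. The function names, the `margin` parameter (a stand-in for "bounded away from 1"), and the sample errors are all illustrative assumptions.

```python
import math


def is_non_trivial(eps_c, eps_s, eps_zk, margin=0.01):
    """True if completeness + soundness + zero-knowledge error stays below 1 by `margin`.

    `margin` is a hypothetical constant used here to model "bounded away from 1".
    """
    return eps_c + eps_s + eps_zk <= 1 - margin


def satisfies_prior_condition(eps_s, eps_zk):
    """True if eps_zk + sqrt(eps_s) < 1, the earlier sufficient condition quoted in the abstract."""
    return eps_zk + math.sqrt(eps_s) < 1


# Hypothetical error parameters: perfect completeness, sizeable soundness and ZK errors.
eps_c, eps_s, eps_zk = 0.0, 0.3, 0.5
non_trivial = is_non_trivial(eps_c, eps_s, eps_zk)
covered_before = satisfies_prior_condition(eps_s, eps_zk)
print(f"non-trivial: {non_trivial}, covered by earlier condition: {covered_before}")
```

With these values the errors sum to 0.8, so the argument is non-trivial, while $0.5 + \sqrt{0.3} \approx 1.05 \geq 1$ places it outside the earlier condition.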

2602.17648 2026-02-20 quant-ph

Approaching the Limit in Multiparameter AC Magnetometry with Quantum Control

Takuya Isogawa, Zhiyao Hu, Ayumi Kanamoto, Nutdech Phadetsuwannukun, Shilin Wang, Shunsuke Nishimura, Boning Li, Liang Jiang, Zain H. Saleem, Guoqing Wang, Haidong Yuan, Paola Cappellaro

Comments 13 pages, 7 figures

Details
English abstract

Simultaneously estimating multiple parameters at the ultimate limit is a central challenge in quantum metrology, often hindered by inherent incompatibilities in optimal estimation strategies. At its most extreme, this incompatibility culminates in a fundamental impossibility when the quantum Fisher information matrix (QFIM) becomes singular, rendering joint estimation unattainable. This is the case for a canonical problem: estimating the amplitude and frequency of an AC magnetic field, where the generators are parallel to each other. Here, we introduce a quantum control protocol that resolves this singularity. Our control protocol strategically engineers the sensor's time evolution so the generators for the two parameters become orthogonal. It not only removes the singularity but also restores the optimal scaling of precision with interrogation time for both parameters simultaneously. We experimentally validate this protocol using a nitrogen-vacancy center in diamond at room temperature, demonstrating the concurrent achievement of the optimal scaling for both parameters under realistic conditions.
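Why parallel generators make the QFIM singular can be seen from the textbook pure-state expression. The following is a sketch under simplifying assumptions that are not taken from the paper: a pure probe state $|\psi_\theta\rangle = U_\theta|\psi_0\rangle$, effective generators $h_i = i(\partial_i U_\theta)U_\theta^\dagger$, and "parallel" read as $h_2 = c\,h_1$ for a real constant $c$.

```latex
F_{ij} = 4\left[\tfrac{1}{2}\langle \{h_i, h_j\} \rangle - \langle h_i \rangle \langle h_j \rangle\right],
\qquad
h_2 = c\,h_1
\;\Longrightarrow\;
F = 4\,\mathrm{Var}(h_1)
\begin{pmatrix} 1 & c \\ c & c^2 \end{pmatrix},
\qquad
\det F = 16\,\mathrm{Var}(h_1)^2\,(c^2 - c^2) = 0
```

A singular $F$ has no inverse, so the multiparameter quantum Cramér-Rao bound gives no finite joint precision; making the generators orthogonal removes the off-diagonal terms and restores invertibility.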

2602.17647 2026-02-20 quant-ph cs.CC

Pseudo-deterministic Quantum Algorithms

Hugo Aaronson, Tom Gur, Jiawei Li

Details
English abstract

We initiate a systematic study of pseudo-deterministic quantum algorithms. These are quantum algorithms that, for any input, output a canonical solution with high probability. Focusing on the query complexity model, our main contributions include the following complexity separations, which require new lower bound techniques specifically tailored to pseudo-determinism:
- We exhibit a problem, Avoid One Encrypted String (AOES), whose classical randomized query complexity is $O(1)$ but is maximally hard for pseudo-deterministic quantum algorithms ($\Omega(N)$ query complexity).
- We exhibit a problem, Quantum-Locked Estimation (QL-Estimation), for which pseudo-deterministic quantum algorithms admit an exponential speed-up over classical pseudo-deterministic algorithms ($O(\log(N))$ vs. $\Theta(\sqrt{N})$), while the randomized query complexity is $O(1)$.
Complementing these separations, we show that for any total problem $R$, pseudo-deterministic quantum algorithms admit at most a quintic advantage over deterministic algorithms, i.e., $D(R) = \tilde O(psQ(R)^5)$. On the algorithmic side, we identify a class of quantum search problems that can be made pseudo-deterministic with small overhead, including Grover search, element distinctness, triangle finding, $k$-sum, and graph collision.