arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1898
2511.21804 2026-04-08 cs.CR cs.LG

Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy

Gauri Pradhan, Joonas Jälkö, Santiago Zanella-Béguelin, Antti Honkela

Comments Accepted to ICLR 2026; 19 pages, 11 figures

详情
英文摘要

Training machine learning models with differential privacy (DP) limits an adversary's ability to infer sensitive information about the training data. It can be interpreted as a bound on adversary's capability to distinguish two adjacent datasets according to chosen adjacency relation. In practice, most DP implementations use the add/remove adjacency relation, where two datasets are adjacent if one can be obtained from the other by adding or removing a single record, thereby protecting membership. In many ML applications, however, the goal is to protect attributes of individual records (e.g., labels used in supervised fine-tuning). We show that privacy accounting under add/remove overstates attribute privacy compared to accounting under the substitute adjacency relation, which permits substituting one record. To demonstrate this gap, we develop novel attacks to audit DP under substitute adjacency, and show empirically that audit results are inconsistent with DP guarantees reported under add/remove, yet remain consistent with the budget accounted under the substitute adjacency relation. Our results highlight that the choice of adjacency when reporting DP guarantees is critical when the protection target is per-record attributes rather than membership.

2511.17652 2026-04-08 q-bio.QM cs.CV

TeamPath: Building MultiModal Pathology Experts with Reasoning AI Copilots

Tianyu Liu, Weihao Xuan, Hao Wu, Peter Humphrey, Marcello DiStasio, Mohamed Kahila, Alfonso Garcia Tan, Heli Qi, Rui Yang, Simeng Han, Tinglin Huang, Fang Wu, Chen Liu, Qingyu Chen, Nan Liu, Irene Li, Hua Xu, Hongyu Zhao

Comments 45 pages, 6 figures

详情
英文摘要

Advances in AI have introduced several strong models in computational pathology to usher it into the era of multi-modal diagnosis, analysis, and interpretation. However, the current pathology-specific visual language models still lack capacities in making the diagnosis with rigorous reasoning paths as well as handling divergent tasks, and thus, challenges of building AI Copilots for real scenarios still exist. Here we introduce TeamPath, an AI system powered by reinforcement learning and router-enhanced solutions based on large-scale histopathology multimodal datasets, to work as a virtual assistant for expert-level disease diagnosis, patch-level information summarization, and cross-modality generation to integrate transcriptomic information for clinical usage. We also collaborate with pathologists from Yale School of Medicine to demonstrate that TeamPath can assist them in working more efficiently by identifying and correcting expert conclusions and reasoning paths. We also discuss the human evaluation results to support the reasoning quality from TeamPath. Overall, TeamPath can flexibly choose the best settings according to the needs, and serve as an innovative and reliable system for information communication across different modalities and experts.

2511.07366 2026-04-08 cs.NI cs.LG

UAV-Assisted Resilience in 6G and Beyond Network Energy Saving: A Multi-Agent DRL Approach

Dao Lan Vy Dinh, Anh Nguyen Thi Mai, Hung Tran, Giang Quynh Le Vu, Tu Dac Ho, Zhenni Pan, Vo Nhan Van, Symeon Chatzinotas, Dinh-Hieu Tran

Comments 6 pages, 5 figures, 1 table

详情
英文摘要

This paper investigates the unmanned aerial vehicle (UAV)-assisted resilience perspective in the 6G network energy saving (NES) scenario. More specifically, we consider multiple ground base stations (GBSs) and each GBS has three different sectors/cells in the terrestrial networks, and multiple cells may become inactive due to unexpected events such as power outages, disasters, hardware failures, or erroneous energy-saving decisions made by external network management systems. During the time required to reactivate these cells, UAVs are deployed to temporarily restore user service. To address this, we propose a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) framework to enable UAV-assisted communication by jointly optimizing UAV trajectories, transmission power, and user-UAV association under a sleeping ground base station (GBS) strategy. This framework aims to ensure the resilience of active users in the network and the long-term operability of UAVs. Specifically, it maximizes service coverage for users during power outages or NES zones, while minimizing the energy consumption of UAVs. Simulation results demonstrate that the proposed MADDPG policy consistently achieves high coverage ratio across different testing episodes, outperforming other baselines. Moreover, the MADDPG framework attains the lowest total energy consumption, while maintaining a comparable user service rate. These results confirm the effectiveness of the proposed approach in achieving a superior trade-off between energy efficiency and service performance, supporting the development of sustainable and resilient UAV-assisted cellular networks.

2510.11727 2026-04-08 cs.ET cond-mat.mtrl-sci cs.LG

Multi-objective Bayesian Optimization with Human-in-the-Loop for Flexible Neuromorphic Electronics Fabrication

Benius Dunn, Javier Meza-Arroyo, Armi Tiihonen, Mark Lee, Julia W. P. Hsu

详情
Journal ref
J. Mater. Chem. C 14 (6), 2208-2216 (2026)
英文摘要

Neuromorphic computing hardware enables edge computing and can be implemented in flexible electronics for novel applications. Metal oxide materials are promising candidates for fabricating flexible neuromorphic electronics, but suffer from processing constraints due to the incompatibilities between oxides and polymer substrates. In this work, we use photonic curing to fabricate flexible metal-insulator-metal capacitors with solution-processible aluminum oxide dielectric tailored for neuromorphic applications. Because photonic curing outcomes depend on many input parameters, identifying an optimal processing condition through a traditional grid-search approach is unfeasible. Here, we apply multi-objective Bayesian optimization (MOBO) to determine photonic curing conditions that optimize the trade-off between desired electrical properties of large capacitance-frequency dispersion and low leakage current. Furthermore, we develop a human-in-the-loop (HITL) framework for incorporating failed experiments into the MOBO machine learning workflow, demonstrating that this framework accelerates optimization by reducing the number of experimental rounds required. Once optimization is concluded, we analyze different Pareto-optimal conditions to tune the dielectrics properties and provide insight into the importance of different inputs through Shapley Additive exPlanations analysis. The demonstrated framework of combining MOBO with HITL feedback can be adapted to a wide range of multi-objective experimental problems that have interconnected inputs and high experimental failure rates to generate usable results for machine learning models.

2510.07421 2026-04-08 cond-mat.dis-nn cond-mat.mtrl-sci cs.LG

Bayesian Optimization of Multi-Bit Pulse Encoding in In2O3/Al2O3 Thin-film Transistors for Temporal Data Processing

Javier Meza-Arroyo, Benius Dunn, Weijie Xu, Yu-Chieh Chen, Jen-Sue Chen, Julia W. P. Hsu

详情
Journal ref
npj Unconventional Computing 3 (1), 6 (2026)
英文摘要

Utilizing the intrinsic history-dependence and nonlinearity of hardware, physical reservoir computing is a promising neuromorphic approach to encode time-series data for in-sensor computing. The accuracy of this encoding critically depends on the distinguishability of multi-state outputs, which is often limited by suboptimal and empirically chosen reservoir operation conditions. In this work, we demonstrate a machine learning approach, Bayesian optimization, to improve the encoding fidelity of solution-processed Al2O3/In2O3 thin-film transistors (TFTs). We show high-fidelity 6-bit temporal encoding by exploring five key pulse parameters and using the normalized degree of separation (nDoS) as the metric of output state separability. Additionally, we show that a model trained on simpler 4-bit data can effectively guide optimization of more complex 6-bit encoding tasks, reducing experimental cost. Specifically, for the encoding and reconstruction of binary-patterned images of a moving car across 6 sequential frames, we demonstrate that the encoding is more accurate when operating the TFT using optimized pulse parameters and the 4-bit optimized operating condition performs almost as well as the 6-bit optimized condition. Finally, interpretability analysis via Shapley Additive Explanations (SHAP) reveals that gate pulse amplitude and drain voltage are the most influential parameters in achieving higher state separation. This work presents the first systematic method to identify optimal operating conditions for reservoir devices, and the approach can be extended to other physical reservoir implementations across different material platforms.

2509.18095 2026-04-08 cs.IR cs.CL cs.CV

MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction

Zilin Xiao, Qi Ma, Mengting Gu, Chun-cheng Jason Chen, Xintao Chen, Vicente Ordonez, Vijai Mohan

Comments ICLR 2026 Oral

详情
英文摘要

Universal multimodal embedding models have achieved great success in capturing semantic relevance between queries and candidates. However, current methods either condense queries and candidates into a single vector, potentially limiting the expressiveness for fine-grained information, or produce too many vectors that are prohibitive for multi-vector retrieval. In this work, we introduce MetaEmbed, a new framework for multimodal retrieval that rethinks how multimodal embeddings are constructed and interacted with at scale. During training, a fixed number of learnable Meta Tokens are appended to the input sequence. At test-time, their last-layer contextualized representations serve as compact yet expressive multi-vector embeddings. Through the proposed Matryoshka Multi-Vector Retrieval training, MetaEmbed learns to organize information by granularity across multiple vectors. As a result, we enable test-time scaling in multimodal retrieval where users can balance retrieval quality against efficiency demands by selecting the number of tokens used for indexing and retrieval interactions. Extensive evaluations on the Massive Multimodal Embedding Benchmark (MMEB) and the Visual Document Retrieval Benchmark (ViDoRe) confirm that MetaEmbed achieves state-of-the-art retrieval performance while scaling robustly to models with 32B parameters. Code is available at https://github.com/facebookresearch/MetaEmbed.

2509.02617 2026-04-08 stat.ML cs.LG stat.CO

Gaussian process surrogate with physical law-corrected prior for multi-coupled PDEs defined on irregular geometry

Pucheng Tang, Hongqiao Wang, Wenzhou Lin, Qian Chen, Heng Yong

Comments 28 pages, 15 figures, 6 tables

详情
英文摘要

Parametric partial differential equations (PDEs) serve as fundamental mathematical tools for modeling complex physical phenomena, yet repeated high-fidelity numerical simulations across parameter spaces remain computationally prohibitive. In this work, we propose a physical law-corrected prior Gaussian process (LC-prior GP) for efficient surrogate modeling of parametric PDEs. The proposed method employs proper orthogonal decomposition (POD) to represent high-dimensional discrete solutions in a low-dimensional modal coefficient space, significantly reducing the computational cost of kernel optimization compared with standard GP approaches in full-order spaces. The governing physical laws are further incorporated to construct a law-corrected prior to overcome the limitation of existing physics-informed GP methods that rely on linear operator invariance, which enables applications to nonlinear and multi-coupled PDE systems without kernel redesign. Furthermore, the radial basis function-finite difference (RBF-FD) method is adopted for generating training data, allowing flexible handling of irregular spatial domains. The resulting differentiation matrices are independent of solution fields, enabling efficient optimization in the physical correction stage without repeated assembly. The proposed framework is validated through extensive numerical experiments, including nonlinear multi-parameter systems and scenarios involving multi-coupled physical variables defined on different two-dimensional irregular domains to highlight the accuracy and efficiency compared with baseline approaches.

2509.00946 2026-04-08 eess.IV cs.CV

Ultrasound-based detection and malignancy prediction of breast lesions eligible for biopsy: A multi-center clinical-scenario study using nomograms, large language models, and radiologist evaluation

Ali Abbasian Ardakani, Afshin Mohammadi, Taha Yusuf Kuzan, Beyza Nur Kuzan, Hamid Khorshidi, Ashkan Ghorbani, Alisa Mohebbi, Fariborz Faeghi, Sepideh Hatamikia, U Rajendra Acharya

Comments Academic Radiology (2026)

详情
Journal ref
"Ultrasound-based detection and malignancy prediction of breast lesions eligible for biopsy: A multi-center clinical-scenario study using nomograms, large language models, and radiologist evaluation." Academic Radiology (2026)
英文摘要

To develop and externally validate integrated ultrasound nomograms combining BIRADS features and quantitative morphometric characteristics, and to compare their performance with expert radiologists and state of the art large language models in biopsy recommendation and malignancy prediction for breast lesions. In this retrospective multicenter, multinational study, 1747 women with pathologically confirmed breast lesions underwent ultrasound across three centers in Iran and Turkey. A total of 10 BIRADS and 26 morphological features were extracted from each lesion. A BIRADS, morphometric, and fused nomogram integrating both feature sets was constructed via logistic regression. Three radiologists (one senior, two general) and two ChatGPT variants independently interpreted deidentified breast lesion images. Diagnostic performance for biopsy recommendation (BIRADS 4,5) and malignancy prediction was assessed in internal and two external validation cohorts. In pooled analysis, the fused nomogram achieved the highest accuracy for biopsy recommendation (83.0%) and malignancy prediction (83.8%), outperforming the morphometric nomogram, three radiologists and both ChatGPT models. Its AUCs were 0.901 and 0.853 for the two tasks, respectively. In addition, the performance of the BIRADS nomogram was significantly higher than the morphometric nomogram, three radiologists and both ChatGPT models for biopsy recommendation and malignancy prediction. External validation confirmed the robust generalizability across different ultrasound platforms and populations. An integrated BIRADS morphometric nomogram consistently outperforms standalone models, LLMs, and radiologists in guiding biopsy decisions and predicting malignancy. These interpretable, externally validated tools have the potential to reduce unnecessary biopsies and enhance personalized decision making in breast imaging.

2507.10610 2026-04-08 cs.CR cs.AI

LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents

Zihe Yan, Jiaping Gui, Zhuosheng Zhang, Gongshen Liu

Comments Accepted by CVPR-26

详情
英文摘要

Graphical user interface (GUI) agents built on multimodal large language models (MLLMs) have recently demonstrated strong decision-making abilities in screen-based interaction tasks. However, they remain highly vulnerable to pop-up-based environmental injection attacks, where malicious visual elements divert model attention and lead to unsafe or incorrect actions. Existing defense methods either require costly retraining or perform poorly under inductive interference. In this work, we systematically study how such attacks alter the attention behavior of GUI agents and uncover a layer-wise attention divergence pattern between correct and incorrect outputs. Based on this insight, we propose \textbf{LaSM}, a \textit{Layer-wise Scaling Mechanism} that selectively amplifies attention and MLP modules in critical layers. LaSM improves the alignment between model saliency and task-relevant regions without additional training. Extensive experiments across multiple datasets demonstrate that our method significantly improves the defense success rate and exhibits strong robustness, while having negligible impact on the model's general capabilities. Our findings reveal that attention misalignment is a core vulnerability in MLLM agents and can be effectively addressed through selective layer-wise modulation. Our code can be found in https://github.com/YANGTUOMAO/LaSM.

2506.22480 2026-04-08 cs.NI cs.DC cs.LG

Service Placement in Small Cell Networks Using Distributed Best Arm Identification in Linear Bandits

Mariam Yahya, Aydin Sezgin, Setareh Maghsudi

详情
英文摘要

As users in small cell networks increasingly rely on computation-intensive services, cloud-based access often results in high latency. Multi-access edge computing (MEC) mitigates this by bringing computational resources closer to end users, with small base stations (SBSs) serving as edge servers to enable low-latency service delivery. However, limited edge capacity makes it challenging to decide which services to deploy locally versus in the cloud, especially under unknown service demand and dynamic network conditions. To tackle this problem, we model service demand as a linear function of service attributes and formulate the service placement task as a linear bandit problem, where SBSs act as agents and services as arms. The goal is to identify the service that, when placed at the edge, offers the greatest reduction in total user delay compared to cloud deployment. We propose a distributed and adaptive multi-agent best-arm identification (BAI) algorithm under a fixed-confidence setting, where SBSs collaborate to accelerate learning. Simulations show that our algorithm identifies the optimal service with the desired confidence and achieves near-optimal speedup, as the number of learning rounds decreases proportionally with the number of SBSs. We also provide theoretical analysis of the algorithm's sample complexity and communication overhead.

2506.22397 2026-04-08 eess.IV cs.AI cs.CV

HazeMatching: Dehazing Light Microscopy Images with Guided Conditional Flow Matching

Anirban Ray, Ashesh Ashesh, Florian Jug

Comments Accepted to IEEE/CVF CVPR 2026 (Findings). 4 figures, 8 pages + refs, 45 pages total (including supplement), 28 supplementary figures

详情
英文摘要

Fluorescence microscopy is a major driver of scientific progress in the life sciences. Although high-end confocal microscopes are capable of filtering out-of-focus light, cheaper and more accessible microscopy modalities, such as widefield microscopy, can not, which consequently leads to hazy image data. Computational dehazing is trying to combine the best of both worlds, leading to cheap microscopy but crisp-looking images. The perception-distortion trade-off tells us that we can optimize either for data fidelity, e.g. low MSE or high PSNR, or for data realism, measured by perceptual metrics such as LPIPS or FID. Existing methods either prioritize fidelity at the expense of realism, or produce perceptually convincing results that lack quantitative accuracy. In this work, we propose HazeMatching, a novel iterative method for dehazing light microscopy images, which effectively balances these objectives. Our goal was to find a balanced trade-off between the fidelity of the dehazing results and the realism of individual predictions (samples). We achieve this by adapting the conditional flow matching framework by guiding the generative process with a hazy observation in the conditional velocity field. We evaluate HazeMatching on 5 datasets, covering both synthetic and real data, assessing both distortion and perceptual quality. Our method is compared against 12 baselines, achieving a consistent balance between fidelity and realism on average. Additionally, with calibration analysis, we show that HazeMatching produces well-calibrated predictions. Note that our method does not need an explicit degradation operator to exist, making it easily applicable on real microscopy data. All data used for training and evaluation and our code will be publicly available under a permissive license.

2506.15771 2026-04-08 quant-ph cs.LG

Superconducting Qubit Readout Using Next-Generation Reservoir Computing

Robert Kent, Benjamin Lienhard, Gregory Lafyatis, Daniel J. Gauthier

详情
Journal ref
Phys. Rev. Applied 25, 044009 (2026)
英文摘要

Quantum processors require rapid and high-fidelity simultaneous measurements of many qubits. While superconducting qubits are among the leading modalities toward a useful quantum processor, their readout remains a bottleneck. Traditional approaches to processing measurement data often struggle to account for crosstalk present in frequency-multiplexed readout, the preferred method to reduce the resource overhead. Recent approaches to address this challenge use neural networks to improve the state-discrimination fidelity. However, they are computationally expensive to train and evaluate, resulting in increased latency and poor scalability as the number of qubits increases. We present an alternative machine learning approach based on next-generation reservoir computing that constructs polynomial features from the measurement signals and maps them to the corresponding qubit states. This method is highly parallelizable, avoids the costly nonlinear activation functions common in neural networks, and supports real-time training, enabling fast evaluation, adaptability, and scalability. Despite its lower computational complexity, our reservoir approach is able to maintain high qubit-state-discrimination fidelity. Relative to traditional methods, our approach achieves error reductions of up to 50% and 11% on single- and five-qubit datasets, respectively, and delivers up to 2.5x crosstalk reduction on the five-qubit dataset. Compared with recent machine-learning methods, evaluating our model requires 100x fewer multiplications for single-qubit and 2.5x fewer for five-qubit models. This work demonstrates that reservoir computing can enhance qubit-state discrimination while maintaining scalability for future quantum processors.

2504.19959 2026-04-08 cs.AR cs.AI

From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification

Junhao Ye, Yuchen Hu, Ke Xu, Dingrong Pan, Qichun Chen, Jie Zhou, Shuai Zhao, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang

Comments Accepted by the IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2025. This version includes the camera-ready manuscript

详情
英文摘要

Verification presents a major bottleneck in Integrated Circuit (IC) development, consuming nearly 70% of the total development effort. While the Universal Verification Methodology (UVM) is widely used in industry to improve verification efficiency through structured and reusable testbenches, constructing these testbenches and generating sufficient stimuli remain challenging. These challenges arise from the considerable manual coding effort required, repetitive manual execution of multiple EDA tools, and the need for in-depth domain expertise to navigate complex designs.Here, we present UVM^2, an automated verification framework that leverages Large Language Models (LLMs) to generate UVM testbenches and iteratively refine them using coverage feedback, significantly reducing manual effort while maintaining rigorous verification standards.To evaluate UVM^2, we introduce a benchmark suite comprising Register Transfer Level (RTL) designs of up to 1.6K lines of code.The results show that UVM^2 reduces testbench setup time by up to UVM^2 compared to experienced engineers, and achieve average code and function coverage of 87.44% and 89.58%, outperforming state-of-the-art solutions by 20.96% and 23.51%, respectively.

2504.03943 2026-04-08 stat.ML cond-mat.mtrl-sci cs.LG

Multi-Variable Batch Bayesian Optimization in Materials Research: Synthetic Data Analysis of Noise Sensitivity and Problem Landscape Effects

Imon Mia, Armi Tiihonen, Anna Ernst, Anusha Srivastava, Tonio Buonassisi, William Vandenberghe, Julia W. P. Hsu

详情
英文摘要

Bayesian Optimization (BO) machine learning method is increasingly used to guide experimental optimization tasks in materials science. To emulate the large number of input variables and noise-containing results in experimental materials research, we perform batch BO simulation of six design variables with a range of noise levels. Two test cases relevant for materials science problems are examined: a needle-in-a-haystack case (Ackley function) that may be encountered in, e.g., molecule optimizations, and a smooth landscape with a local optimum in addition to the global optimum (Hartmann function) that may be encountered in, e.g., material composition optimization. We show learning curves, performance metrics, and visualization to effectively track the optimization progression and evaluate how the optimization outcomes are affected by noise, batch-picking method, choice of acquisition function, and exploration hyperparameter values. We find that the effects of noise depend on the problem landscape: noise degrades the optimization results of a needle-in-a-haystack search (Ackley) dramatically more. However, with increasing noise, we observe an increasing probability of landing on the local optimum in Hartmann. Therefore, prior knowledge of the problem domain structure and noise level is essential when designing BO for materials research experiments. Synthetic data studies -- with known ground truth and controlled noise levels -- enable us to isolate and evaluate the impact of different batch BO components, {\it e.g.}, acquisition policy, objective metrics, and hyperparameter values, before transitioning to the inherent uncertainties of real experimental systems. The results and methodology of this study will facilitate a greater utilization of BO in guiding experimental materials research, specifically in settings with a large number of design variables to optimize.

2503.15696 2026-04-08 math.NA cs.LG cs.NA

Approximation properties of neural ODEs

Arturo De Marinis, Davide Murari, Elena Celledoni, Nicola Guglielmi, Brynjulf Owren, Francesco Tudisco

Comments 30 pages, 8 figures, 2 tables

详情
英文摘要

We study the approximation properties of neural ordinary differential equations (neural ODEs) in the space of continuous functions. Since a neural ODE requires input and output dimensions to be the same, while input and output dimensions of a continuous function are generally different, we need to embed an input into the latent space of the neural ODE, and to project the output of the neural ODE into the output space. By composing the neural ODE flow map with such embedding and projection operations, we get a shallow neural network whose activation function is defined as the flow map of the neural ODE at the final time of the integration interval. Thus, the study of the approximation properties of neural ODEs leads to the study of the approximation properties of shallow neural networks with a particular choice of activation function. We prove the universal approximation property (UAP) of such shallow neural networks in the space of continuous functions. Furthermore, we investigate the approximation properties of shallow neural networks whose parameters satisfy specific constraints. In particular, we constrain the Lipschitz constant of the neural ODE's flow map and the norms of the weights to increase the network's stability. We prove that the UAP holds if we consider either constraint independently. When both are enforced, there is a loss of expressiveness, and we derive approximation bounds that quantify how accurately such a constrained network can approximate a continuous function.

2503.14094 2026-04-08 eess.IV cs.CV physics.med-ph

Image-Based Metrics in Ultrasound for Estimation of Global Speed-of-Sound

Roman Denkin, Orcun Goksel

详情
英文摘要

Accurate speed-of-sound (SoS) estimation is crucial for ultrasound image formation, yet conventional systems often rely on an assumed value for imaging. We propose to leverage conventional image analysis techniques and metrics as a novel and simple approach to estimate tissue SoS. We study eleven metrics in three categories for assessing image quality, image similarity and multi-frame variation, by testing them in numerical simulations and phantom experiments, as well as testing in an in vivo scenario. Among single-frame image quality metrics, conventional Focus and a proposed metric variation on Tenengrad present satisfactory accuracy (5-8\,m/s on phantoms), but only when the metrics are applied after compounding multiple frames. Differential image comparison metrics were more successful overall with errors consistently under 8\,m/s even applied on a single pair of frames. Mutual information and correlation metrics were found to be robust in processing relatively small image patches, making them suitable for focal estimation. We present an in vivo study on breast density classification based on SoS, to showcase clinical applicability. The studied metrics do not require access to raw channel data as they can operate on post-beamformed and/or B-mode data. These image-based methods offer a computationally efficient and data-accessible alternative to existing physics- and model-based approaches for SoS estimation.

2502.19463 2026-04-08 cs.CY cs.AI cs.SI

Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights

Rafiya Javed, Cassandra Parent, Jackie Kay, David Yanni, Abdullah Zaini, Anushe Sheikh, Maribeth Rauh, Walter Gerych, Ramona Comanescu, Iason Gabriel, Marzyeh Ghassemi, Laura Weidinger

详情
英文摘要

Hedging and non-affirmation are behaviors exhibited by large language models (LLMs) that limit the clear endorsement of specific statements. While these behaviors are desirable in subjective contexts, they are undesirable in the context of human rights - which apply unambiguously to all groups. We present a systematic framework to measure these behaviors in unconstrained LLM responses regarding various identity groups. We evaluate six large proprietary models as well as one open-weight LLM on 4738 prompts across 205 national and stateless ethnic identities and find that 4 out of 7 display hedging and non-affirmation that is significantly dependent on the identity of the group. While factors like conflict signals, sovereignty (whether identity is stateless), or economic indicators (GDP) also influence model behavior, their effect sizes are consistently weaker than the impact of identity itself. The systematic disparity is robust to methods of rephrasing the prompts. Since group identity is the strongest predictor of these behaviors, we use open-weight models to explore whether applying steering and orthogonalization techniques to these group identities can mitigate the rates of hedging and non-affirmation behaviors. We find that group steering is the most effective debiasing approach across query types and is robust to downstream forgetting.

2502.06556 2026-04-08 cs.SE cs.CL

MultiFileTest: A Multi-File-Level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms

Yibo Wang, Congying Xia, Wenting Zhao, Jiangshu Du, Chunyu Miao, Zhongfen Deng, Philip S. Yu, Chen Xing

Comments Published at ACL 2026 (Findings)

详情
英文摘要

Unit test generation has become a promising and important Large Language Model (LLM) use case. However, existing evaluation benchmarks for LLM unit test generation focus on function- or class-level code (single-file) rather than more practical and challenging multi-file-level codebases. To address such a limitation, we propose MultiFileTest, a multi-file-level benchmark for unit test generation covering Python, Java, and JavaScript. MultiFileTest features 20 moderate-sized and high-quality projects per language. We evaluate eleven frontier LLMs on MultiFileTest, and the results show that most frontier LLMs tested exhibit moderate performance on MultiFileTest, highlighting the difficulty of MultiFileTest. We also conduct a thorough error analysis, which shows that even advanced LLMs, such as Gemini-3.0-Pro, exhibit basic yet critical errors, including executability and cascade errors. Motivated by this observation, we further evaluate all frontier LLMs under manual error-fixing and self-error-fixing scenarios to assess their potential when equipped with error-fixing mechanisms. Our code and dataset is available at \href{https://github.com/YiboWANG214/ProjectTest}{MultiFileTest}.

2410.20791 2026-04-08 cs.SE cs.AI

From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap

Gopi Krishnan Rajbahadur, Gustavo A. Oliva, Dayi Lin, Jiho Shin, Ahmed E. Hassan

详情
英文摘要

The rapid expansion of foundation models (FMs), such as large language models (LLMs), has given rise to FMware, software systems that integrate FM(s) as core components. While building demonstration-level FMware is relatively straightforward, transitioning to production-ready systems presents numerous challenges, including reliability, high implementation costs, scalability, and compliance with privacy regulations. Our paper conducts a semi-structured thematic synthesis to identify key challenges in productionizing FMware across diverse data sources, including our industry experience developing FMArts, a FMware lifecycle engineering platform, and its integration into Huawei Cloud; grey literature; academic publications; hands-on involvement in the Open Platform for Enterprise AI (OPEA); organizing the AIware conference and bootcamp; and co-leading the ISO SPDX SBOM working group on AI and datasets. We identify critical issues in FM(s) selection, data and model alignment, prompt engineering, agent orchestration, system testing, and deployment, alongside cross-cutting concerns such as memory management, observability, and feedback integration. We discuss necessary technologies and strategies to address these challenges and offer guidance to enable the transition from demonstration systems to scalable, production-ready FMware solutions. Our findings underscore the importance of continued research and multi-industry collaboration to advance the development of production-ready FMware.

2410.19940 2026-04-08 cs.LO cs.AI cs.PL

Cobblestone: A Divide-and-Conquer Approach for Automating Formal Verification

Saketh Ram Kasibatla, Arpan Agarwal, Yuriy Brun, Sorin Lerner, Talia Ringer, Emily First

Comments 11 pages, 13 figures, Appearing at ICSE '26, Rio de Janeiro, Brazil

详情
英文摘要

Formal verification using proof assistants, such as Coq, is an effective way of improving software quality, but requires significant effort and expertise. Machine learning can automatically synthesize proofs, but such tools are able to prove only a fraction of desired software properties. We introduce Cobblestone, a divide-and-conquer approach for proof synthesis. Cobblestone uses a large language model (LLM) to generate potential proofs, uses those proofs to break the problem into simpler parts, automatically identifies which of those parts were successfully proven, and iterates on the remaining parts to build a correct proof that is guaranteed to be sound, despite the reliance on unsound LLMs. We evaluate Cobblestone on four benchmarks of open-source Coq projects, controlling for training data leakage. Fully automatically, Cobblestone outperforms state-of-the-art non-LLM tools, and proves many theorems that other LLM-based tools cannot, and on many benchmarks, outperforms them. Each Cobblestone run costs only $1.25 and takes 14.7 minutes, on average. Cobblestone can also be used with external input, from a user or another tool, providing a proof structure or relevant lemmas. Evaluated with such an oracle, Cobblestone proves up to 58% of theorems. Overall, our research shows that tools can make use of partial progress and external input to more effectively automate formal verification.

2403.18072 2026-04-08 stat.CO cs.LG stat.ME stat.ML

Goal-Oriented Bayesian Optimal Experimental Design for Nonlinear Models using Markov Chain Monte Carlo

Shijie Zhong, Wanggang Shen, Tommie Catanach, Xun Huan

Comments 28 pages, 19 figures

详情
Journal ref
SIAM/ASA Journal on Uncertainty Quantification 14 (2026) 19-47
英文摘要

Optimal experimental design (OED) provides a systematic approach to quantify and maximize the value of experimental data. Under a Bayesian approach, conventional OED maximizes the expected information gain (EIG) on model parameters. However, we are often interested in not the parameters themselves, but predictive quantities of interest (QoIs) that depend on the parameters in a nonlinear manner. We present a computational framework of predictive goal-oriented OED (GO-OED) suitable for nonlinear observation and prediction models, which seeks the experimental design providing the greatest EIG on the QoIs. In particular, we propose a nested Monte Carlo estimator for the QoI EIG, featuring Markov chain Monte Carlo for posterior sampling and kernel density estimation for evaluating the posterior-predictive density and its Kullback-Leibler divergence from the prior-predictive. The GO-OED design is then found by maximizing the EIG over the design space using Bayesian optimization. We demonstrate the effectiveness of the overall nonlinear GO-OED method, and illustrate its differences versus conventional non-GO-OED, through various test problems and an application of sensor placement for source inversion in a convection-diffusion field.

2402.15095 2026-04-08 math.ST cs.DS cs.LG math.PR stat.TH

The Umeyama algorithm for matching correlated Gaussian geometric models in the low-dimensional regime

Shuyang Gong, Zhangsong Li

Comments 31 pages; updated funding information

详情
英文摘要

Motivated by the problem of matching two correlated random geometric graphs, we study the problem of matching two Gaussian geometric models correlated through a latent node permutation. Specifically, given an unknown permutation $π^*$ on $\{1,\ldots,n\}$ and given $n$ i.i.d. pairs of correlated Gaussian vectors $\{X_{π^*(i)},Y_i\}$ in $\mathbb{R}^d$ with noise parameter $σ$, we consider two types of (correlated) weighted complete graphs with edge weights given by $A_{i,j}=\langle X_i,X_j \rangle$, $B_{i,j}=\langle Y_i,Y_j \rangle$. The goal is to recover the hidden vertex correspondence $π^*$ based on the observed matrices $A$ and $B$. For the low-dimensional regime where $d=O(\log n)$, Wang, Wu, Xu, and Yolou [WWXY22+] established the information thresholds for exact and almost exact recovery in matching correlated Gaussian geometric models. They also conducted numerical experiments for the classical Umeyama algorithm. In our work, we prove that this algorithm achieves exact recovery of $π^*$ when the noise parameter $σ=o(d^{-3}n^{-2/d})$, and almost exact recovery when $σ=o(d^{-3}n^{-1/d})$. Our results approach the information thresholds up to a $\operatorname{poly}(d)$ factor in the low-dimensional regime.

2402.09664 2026-04-08 cs.SE cs.AI cs.CL cs.PL

CodeMind: Evaluating Large Language Models for Code Reasoning

Changshu Liu, Yang Chen, Reyhaneh Jabbarvand

详情
英文摘要

Large Language Models (LLMs) have been widely used to automate programming tasks. Their capabilities have been evaluated by assessing the quality of generated code through tests or proofs. The extent to which they can reason about code is a critical question revealing important insights about their true capabilities. This paper introduces CodeMind, a framework designed to gauge the code reasoning abilities of LLMs through the following explicit and implicit code reasoning tasks: Independent Execution Reasoning (IER), Specification Reasoning (SR) and Dynamic Semantics Reasoning (DSR). The first evaluates the abilities of LLMs to simulate the execution of given inputs to a code and predict the output (IER). The second assesses the abilities of LLMs to incorporate the simulation of test data in the specification into code generation (SR). Finally, CodeMind evaluates LLMs' abilities to understand overall code semantics only given a specific input/output (DSR). Our extensive evaluation of ten LLMs across four widely used benchmarks using CodeMind shows that LLMs, depending on their size and training strategy, can reason about some dynamic aspects of code. However, their performance drops for code with higher complexity, non-trivial logical and arithmetic operators, non-primitive types, and API calls. We show that these reasoning tasks evaluate LLMs differently, and a comprehensive evaluation of code reasoning requires them all. Finally, we show that the performance of LLMs in bug repair is not correlated with any of the code reasoning tasks, and except for advanced frontier models, other LLMs do not incorporate code reasoning when performing bug repair.

2312.08375 2026-04-08 cs.LO cs.AI

An Encoding of Abstract Dialectical Frameworks into Higher-Order Logic

Antoine Martina, Alexander Steen

Comments 31 pages

详情
Journal ref
Journal of Logic and Computation, Vol. 35(2), March 2025
英文摘要

An approach for encoding abstract dialectical frameworks and their semantics into classical higher-order logic is presented. Important properties and semantic relationships are formally encoded and proven using the proof assistant Isabelle/HOL. This approach allows for the computer-assisted analysis of abstract dialectical frameworks using automated and interactive reasoning tools within a uniform logic environment. Exemplary applications include the formal analysis and verification of meta-theoretical properties, and the generation of interpretations and extensions under specific semantic constraints.

2306.10430 2026-04-08 stat.ML cs.AI cs.LG stat.CO stat.ME

Variational Sequential Optimal Experimental Design using Reinforcement Learning

Wanggang Shen, Jiayuan Dong, Xun Huan

详情
Journal ref
Computer Methods in Applied Mechanics and Engineering 444 (2025) 118068
英文摘要

We present variational sequential optimal experimental design (vsOED), a novel method for optimally designing a finite sequence of experiments within a Bayesian framework with information-theoretic criteria. vsOED employs a one-point reward formulation with variational posterior approximations, providing a provable lower bound to the expected information gain. Numerical methods are developed following an actor-critic reinforcement learning approach, including derivation and estimation of variational and policy gradients to optimize the design policy, and posterior approximation using Gaussian mixture models and normalizing flows. vsOED accommodates nuisance parameters, implicit likelihoods, and multiple candidate models, while supporting flexible design criteria that can target designs for model discrimination, parameter inference, goal-oriented prediction, and their weighted combinations. We demonstrate vsOED across various engineering and science applications, illustrating its superior sample efficiency compared to existing sequential experimental design algorithms.

2305.02657 2026-04-08 stat.ML cs.LG

On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains

Yicheng Li, Zixiong Yu, Guhan Chen, Qian Lin

详情
英文摘要

In this paper, we provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain rather than $\mathbb S^{d}$. This class of kernel functions include but are not limited to the neural tangent kernel associated with neural networks with different depths and various activation functions. After proving that the dynamics of training the wide neural networks uniformly approximated that of the neural tangent kernel regression on general domains, we can further illustrate the minimax optimality of the wide neural network provided that the underground truth function $f\in [\mathcal H_{\mathrm{NTK}}]^{s}$, an interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of NTK. We also showed that the overfitted neural network can not generalize well. We believe our approach for determining the EDR of kernels might be also of independent interests.

2212.09570 2026-04-08 cs.LO cs.AI

Solving Quantified Modal Logic Problems by Translation to Classical Logics

Alexander Steen, Geoff Sutcliffe, Christoph Benzmüller

Comments 23 pages, 1 figure; updated journal version of conference paper

详情
Journal ref
Journal of Logic and Computation, Vol. 35(4), June 2025
英文摘要

This article describes an evaluation of Automated Theorem Proving (ATP) systems on problems taken from the QMLTP library of first-order modal logic problems. Principally, the problems are translated to both typed first-order and higher-order logic in the TPTP language using an embedding approach, and solved using first-order resp. higher-order logic ATP systems and model finders. Additionally, the results from native modal logic ATP systems are considered, and compared with the results from the embedding approach. The findings are that the embedding process is reliable and successful when state-of-the-art ATP systems are used as backend reasoners, The first-order and higher-order embeddings perform similarly, native modal logic ATP systems have comparable performance to classical systems using the embedding for proving theorems, native modal logic ATP systems are outperformed by the embedding approach for disproving conjectures, and the embedding approach can cope with a wider range of modal logics than the native modal systems considered.

2206.04236 2026-04-08 cs.CR cs.DS cs.LG stat.ML

Edgeworth Accountant: An Analytical Approach to Differential Privacy Composition

Hua Wang, Sheng Gao, Huanyu Zhang, Milan Shen, Weijie J. Su, Jiayuan Wu

详情
英文摘要

In privacy-preserving data analysis, many procedures and algorithms are structured as compositions of multiple private building blocks. As such, an important question is how to efficiently compute the overall privacy loss under composition. This paper introduces the Edgeworth Accountant, an analytical approach to composing differential privacy guarantees for private algorithms. Leveraging the $f$-differential privacy framework, the Edgeworth Accountant accurately tracks privacy loss under composition, enabling a closed-form expression of privacy guarantees through privacy-loss log-likelihood ratios (PLLRs). As implied by its name, this method applies the Edgeworth expansion to estimate and define the probability distribution of the sum of the PLLRs. Furthermore, by using a technique that simplifies complex distributions into simpler ones, we demonstrate the Edgeworth Accountant's applicability to any noise-addition mechanism. Its main advantage is providing $(ε, δ)$-differential privacy bounds that are non-asymptotic and do not significantly increase computational cost. This feature sets it apart from previous approaches, in which the running time increases with the number of mechanisms under composition. We conclude by showing how our Edgeworth Accountant offers accurate estimates and tight upper and lower bounds on $(ε, δ)$-differential privacy guarantees, especially tailored for training private models in deep learning and federated analytics.

2604.06164 2026-04-08 math.CO

On supertoken graphs

Mónica A. Reyes, Cristina Dalfó, Miquel Àngel Fiol

详情
英文摘要

We generalize the concept of token graphs to obtain supertoken graphs. In the latter case, there can be more than one token in a vertex. We formally define supertoken graphs and establish their basic properties. Moreover, we provide some bounds and exact values on the independence number, clique number, and chromatic number of these graphs. Finally, we construct a new infinite family of graphs, which we call the $p$-augmented 2-token graphs of cycles, and study their properties, including the spectral radius or largest adjacency eigenvalue.

2604.06163 2026-04-08 cs.IR

Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers

Wei Huang, Keping Bi, Yinqiong Cai, Wei Chen, Jiafeng Guo, Xueqi Cheng

详情
英文摘要

Recent studies show that neural retrievers often display source bias, favoring passages generated by LLMs over human-written ones, even when both are semantically similar. This bias has been considered an inherent flaw of retrievers, raising concerns about the fairness and reliability of modern information access systems. Our work challenges this view by showing that source bias stems from supervision in retrieval datasets rather than the models themselves. We found that non-semantic differences, like fluency and term specificity, exist between positive and negative documents, mirroring differences between LLM and human texts. In the embedding space, the bias direction from negatives to positives aligns with the direction from human-written to LLM-generated texts. We theoretically show that retrievers inevitably absorb the artifact imbalances in the training data during contrastive learning, which leads to their preferences over LLM texts. To mitigate the effect, we propose two approaches: 1) reducing artifact differences in training data and 2) adjusting LLM text vectors by removing their projection on the bias vector. Both methods substantially reduce source bias. We hope our study alleviates some concerns regarding LLM-generated texts in information access systems.