arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1851
2602.04983 2026-04-22 eess.IV cs.AI cs.LG

AI-Based Detection of Temporal Changes in MR-Linac Images Acquired During Routine Prostate Radiotherapy

Seungbin Park, Peilin Wang, Ryan Pennell, Emily S. Weg, Himanshu Nagar, Timothy McClure, Mert R. Sabuncu, Daniel Margolis, Heejong Kim

详情
英文摘要

Purpose: To investigate whether an AI-based method can detect subtle inter-fraction changes in MR-Linac images acquired during radiotherapy and explore the broader potential of MRLinac imaging. Methods: This retrospective study included longitudinal 0.35T MR-Linac images from 761 patients. To identify temporal changes, we employed a deep learning model using temporal ordering via pairwise comparison, previously shown effective for longitudinal imaging studies. The model was trained using first-to-last fraction pairs (F1-FL) and all pairs (All-pairs). Performance was assessed using quantitative metrics (accuracy and AUC) and compared against a radiologist's performance. Qualitative evaluation was performed using saliency maps, which identify anatomical regions associated with temporal imaging changes. Results: The F1-FL model demonstrated high performance (AUC=0.99, accuracy=0.95) and outperformed the radiologist in temporal ordering task. The All-pairs model also showed high performance (AUC=0.97, accuracy=0.91). Regions contributing to predictions included the prostate, bladder, and pubic symphysis. The performance was correlated to fractional intervals and was reduced for non-radiation-exposed timepoints (Sim and F1), suggesting that observed changes may reflect both temporal variation and radiation exposure. Conclusion: MR-Linac imaging appears capable of capturing subtle changes during prostate radiotherapy that can be detected by AI models, even over approximately two-day intervals. The model's high performance, together with quantitative and qualitative analyses, supports a potential role for MR-Linac in clinical applications beyond image guidance.

2602.04713 2026-04-22 cs.HC cs.AI cs.CV

Adaptive Prompt Elicitation for Text-to-Image Generation

Xinyi Wen, Lena Hegemann, Xiaofu Jin, Shuai Ma, Antti Oulasvirta

Comments 25 pages, 14 figures, ACM IUI 2026

详情
英文摘要

Aligning text-to-image generation with user intent remains challenging, as users frequently provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively poses visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent user intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with 128 participants on user-defined tasks demonstrates 19.8% higher perceived alignment without increased workload. Our work contributes a principled approach to prompting that offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.

2601.12433 2026-04-22 eess.SP cs.LG

Temporal Data and Short-Time Averages Improve Multiphase Mass Flow Metering

Amanda Nyholm, Yessica Arellano, Jinyu Liu, Damian Krakowiak, Pierluigi Salvo Rossi

Comments 9 pages, 6 figures

详情
英文摘要

Reliable flow measurements are essential in many industries, but current instruments often fail to accurately estimate multiphase flows, which are frequently encountered in real-world operations. Combining machine learning (ML) algorithms with accurate single-phase flowmeters has therefore received extensive research attention in recent years. The Coriolis mass flowmeter is a widely used single-phase meter that provides direct mass flow measurements, which ML models can be trained to correct, thereby reducing measurement errors in multiphase conditions. This paper demonstrates that preserving temporal information significantly improves model performance in such scenarios. We compare a multilayer perceptron, a windowed multilayer perceptron, and a convolutional neural network (CNN) on three-phase air-water-oil flow data from 342 experiments. Whereas prior work typically compresses each experiment into a single averaged sample, we instead compute short-time averages from within each experiment and train models that preserve temporal information at several downsampling intervals. The CNN performed best at 0.25 Hz with approximately 95 % of relative errors below 13 %, a normalized root mean squared error of 0.03, and a mean absolute percentage error of approximately 4.3 %, clearly outperforming the best single-averaged model and demonstrating that short-time averaging within individual experiments is preferable. Results are consistent across multiple data splits and random seeds, demonstrating robustness.

2601.03512 2026-04-22 cs.SE cs.AI

Bootstrapping Code Translation with Weighted Multilanguage Exploration

Yuhan Wu, Huan Zhang, Wei Cheng, Chen Shen, Jingyue Yang, Wei Hu

Comments Accepted in the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

详情
英文摘要

Code translation across multiple programming languages is essential yet challenging due to two vital obstacles: scarcity of parallel data paired with executable test oracles, and optimization imbalance when handling diverse language pairs. We propose BootTrans, a bootstrapping method that resolves both obstacles. Its key idea is to leverage the functional invariance and cross-lingual portability of test suites, adapting abundant pivot-language unit tests to serve as universal verification oracles for multilingual reinforcement learning (RL) training. Our method introduces a dual-pool architecture with seed and exploration pools to progressively expand training data via execution-guided experience collection. Furthermore, we design a language-aware weighting mechanism that dynamically prioritizes harder translation directions based on relative performance across sibling languages, mitigating optimization imbalance. Extensive experiments on the HumanEval-X and TransCoder-Test benchmarks demonstrate substantial improvements over baseline LLMs across all translation directions, with ablation studies validating the effectiveness of both bootstrapping and weighting components.

2512.21794 2026-04-22 cs.GT cs.AI cs.LG cs.MA econ.TH

Multi-agent Adaptive Mechanism Design

Qiushi Han, David Simchi-Levi, Renfei Tan, Zishuo Zhao

详情
英文摘要

We study a sequential mechanism design problem in which a principal seeks to elicit truthful reports from multiple rational agents while starting with no prior knowledge of agents' beliefs. We introduce Distributionally Robust Adaptive Mechanism (DRAM), a general framework combining insights from both mechanism design and online learning to jointly address truthfulness and cost-optimality. Throughout the sequential game, the mechanism estimates agents' beliefs and iteratively updates a distributionally robust linear program with shrinking ambiguity sets to reduce payments while preserving truthfulness. Our mechanism guarantees truthful reporting with high probability while achieving $\tilde{O}(\sqrt{T})$ cumulative regret, and we establish a matching lower bound showing that no feasible adaptive mechanism can asymptotically do better. The framework generalizes to plug-in estimators, supporting structured priors and delayed feedback. To our knowledge, this is the first adaptive mechanism under general settings that maintains truthfulness and achieves optimal regret when incentive constraints are unknown and must be learned.

2512.16059 2026-04-22 cs.CR cs.CL

ContextLeak: Auditing Leakage in Private In-Context Learning Methods

Jacob Choi, Shuying Cao, Xingjian Dong, Amin Banayeeanzade, Wang Bill Zhu, Robin Jia, Sai Praneeth Karimireddy

详情
英文摘要

In-Context Learning (ICL) has become a standard technique for adapting Large Language Models (LLMs) to specialized tasks by supplying task-specific exemplars within the prompt. However, when these exemplars contain sensitive information, reliable privacy-preserving mechanisms are essential to prevent unintended leakage through model outputs. Many privacy-preserving methods have been proposed to protect against information leakage in this context, but there are fewer efforts on how to audit these methods. We introduce ContextLeak, the first framework to empirically measure the worst-case information leakage in ICL. ContextLeak uses canary insertion, embedding uniquely identifiable tokens in the sensitive dataset and crafting targeted queries to detect their presence. We apply ContextLeak across a range of private ICL techniques, including both heuristic prompt-based defenses and differentially private methods with formal guarantees. We show that ContextLeak reliably detects leakage across methods, and the leakage increases monotonically with the theoretical privacy budget, offering a practical signal of worst-case privacy risk. Our analysis further reveals that existing methods strike poor privacy-utility trade-offs, either completely leaking sensitive information or severely degrading performance.

2512.09427 2026-04-22 cs.AR cs.AI

ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators

Guoqiang Zou, Wanyu Wang, Hao Zheng, Longxiang Yin, Yinhe Han

Comments 4 pages, 6 figures

详情
英文摘要

Existing memory management techniques severely hinder efficient Large Language Model serving on accelerators constrained by poor random-access bandwidth.While static pre-allocation preserves memory contiguity,it incurs significant overhead due to worst-case provisioning.Conversely,fine-grained paging mitigates this overhead but relies on HBM's high random-access tolerance, making it unsuitable for LPDDR systems where non-sequential access rapidly degrades bandwidth. Furthermore, prior works typically assume static distributions and HBM characteristics, thereby failing to resolve the critical fragmentation and bandwidth constraints inherent to LPDDR hardware. We present ODMA, an on-demand memory allocation strategy tailored for random-access-constrained accelerators, such as the Cambricon MLU series.ODMA advances generation-length prediction by addressing two critical limitations in production workloads: (i) distribution drift that invalidates static bucket boundaries, and (ii) performance fragility under heavy-tailed request patterns. ODMA integrates a lightweight length predictor with adaptive bucket partitioning and a fallback safety pool. Bucket boundaries are dynamically recalibrated via online histograms to maximize utilization, while the safety pool ensures robustness against prediction errors. On Alpaca and Google-NQ benchmarks, ODMA improves S3's prediction accuracy from 98.60% to 99.55% and 82.68% to 93.36%, respectively. Deployment with DeepSeek-R1-Distill-Qwen-7B on Cambricon MLU370-X4 accelerators demonstrates that ODMA increases KV-cache utilization by up to 19.25% (absolute) and throughput (TPS) by 23-27% over static baselines, validating the efficacy of predictor-driven contiguous allocation for LPDDR-class devices.

2510.24738 2026-04-22 eess.SP cs.LG

StrikeWatch: Wrist-worn Gait Recognition with Compact Time-series Models on Low-power FPGAs

Tianheng Ling, Chao Qian, Peter Zdankin, Torben Weis, Gregor Schiele

Comments 9 pages, 6 figures, 3 tables, accepted by IEEE Annual Congress on Artificial Intelligence of Things (IEEE AIoT) and got best paper award

详情
英文摘要

Running offers substantial health benefits, but improper gait patterns can lead to injuries, particularly without expert feedback. While prior gait analysis systems based on cameras, insoles, or body-mounted sensors have demonstrated effectiveness, they are often bulky and limited to offline, post-run analysis. Wrist-worn wearables offer a more practical and non-intrusive alternative, yet enabling real-time gait recognition on such devices remains challenging due to noisy Inertial Measurement Unit (IMU) signals, limited computing resources, and dependence on cloud connectivity. This paper introduces StrikeWatch, a compact wrist-worn system that performs entirely on-device, real-time gait recognition using IMU signals. As a case study, we target the detection of heel versus forefoot strikes to enable runners to self-correct harmful gait patterns through visual and auditory feedback during running. We propose four compact DL architectures (1D-CNN, 1D-SepCNN, LSTM, and Transformer) and optimize them for energy-efficient inference on two representative embedded Field-Programmable Gate Arrays (FPGAs): the AMD Spartan-7 XC7S15 and the Lattice iCE40UP5K. Using our custom-built hardware prototype, we collect a labeled dataset from outdoor running sessions and evaluate all models via a fully automated deployment pipeline. Our results reveal clear trade-offs between model complexity and hardware efficiency. Evaluated across 12 participants, 6-bit quantized 1D-SepCNN achieves the highest average F1 score of 0.847 while consuming just 0.350 microjoule per inference with a latency of 0.140 ms on the iCE40UP5K running at 20 MHz. This configuration supports up to 13.6 days of continuous inference on a 320 mAh battery. All datasets and code are available in the GitHub repository https://github.com/tianheng-ling/StrikeWatch.

2510.18333 2026-04-22 cs.CR cs.CL

Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption

Yepeng Liu, Xuandong Zhao, Dawn Song, Gregory W. Wornell, Yuheng Bu

Comments ACL2026

详情
英文摘要

Despite progress in watermarking algorithms for large language models (LLMs), real-world deployment remains limited. We argue that this gap stems from misaligned incentives among LLM providers, platforms, and end users, which manifest as three key barriers: competitive risk, detection-tool governance, and attribution issues. We revisit three classes of watermarking through this lens. \emph{Model watermarking} naturally aligns with LLM provider interests, yet faces new challenges in open-source ecosystems. \emph{LLM text watermarking} offers modest provider benefit when framed solely as an anti-misuse tool, but can gain traction in narrowly scoped settings such as dataset de-contamination or user-controlled provenance. \emph{In-context watermarking} (ICW) is tailored for trusted parties, such as conference organizers or educators, who embed hidden watermarking instructions into documents. If a dishonest reviewer or student submits this text to an LLM, the output carries a detectable watermark indicating misuse. This setup aligns incentives: users experience no quality loss, trusted parties gain a detection tool, and LLM providers remain neutral by simply following watermark instructions. We advocate for a broader exploration of incentive-aligned methods, with ICW as an example, in domains where trusted parties need reliable tools to detect misuse. More broadly, we distill design principles for incentive-aligned, domain-specific watermarking and outline future research directions. Our position is that the practical adoption of LLM watermarking requires aligning stakeholder incentives in targeted application domains and fostering active community engagement.

2510.13763 2026-04-22 stat.ML cs.LG

PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference

Yang Yang, Severi Rissanen, Paul E. Chang, Nasrulloh Loka, Daolang Huang, Arno Solin, Markus Heinonen, Luigi Acerbi

Comments Accepted at ICLR 2026. Camera-ready version. 38 pages, 8 figures

详情
英文摘要

Amortized simulator-based inference offers a powerful framework for tackling Bayesian inference in computational fields such as engineering or neuroscience, increasingly leveraging modern generative methods like diffusion models to map observed data to model parameters or future predictions. These approaches yield posterior or posterior-predictive samples for new datasets without requiring further simulator calls after training on simulated parameter-data pairs. However, their applicability is often limited by the prior distribution(s) used to generate model parameters during this training phase. To overcome this constraint, we introduce PriorGuide, a technique specifically designed for diffusion-based amortized inference methods. PriorGuide leverages a novel guidance approximation that enables flexible adaptation of the trained diffusion model to new priors at test time, crucially without costly retraining. This allows users to readily incorporate updated information or expert knowledge post-training, enhancing the versatility of pre-trained inference models.

2510.09477 2026-04-22 stat.ML cs.LG

Efficient Autoregressive Inference for Transformer Probabilistic Models

Conor Hassan, Nasrulloh Loka, Cen-You Li, Daolang Huang, Paul E. Chang, Yang Yang, Francesco Silvestrin, Samuel Kaski, Luigi Acerbi

Comments Accepted at ICLR 2026. Camera-ready version. 39 pages, 20 figures

详情
英文摘要

Set-based transformer models for amortized probabilistic inference and meta-learning, such as neural processes, prior-fitted networks, and tabular foundation models, excel at single-pass marginal prediction. However, many applications require joint distributions over multiple predictions. Purely autoregressive architectures generate these efficiently but sacrifice flexible set-conditioning. Obtaining joint distributions from set-based models requires re-encoding the entire context at each autoregressive step, which scales poorly. We introduce a causal autoregressive buffer that combines the strengths of both paradigms. The model encodes the context once and caches it; a lightweight causal buffer captures dependencies among generated targets, with each new prediction attending to both the cached context and all previously predicted targets added to the buffer. This enables efficient batched autoregressive sampling and joint predictive density evaluation. Training integrates set-based and autoregressive modes through masked attention at minimal overhead. Across synthetic functions, EEG time series, a Bayesian model comparison task, and tabular regression, our method closely matches the performance of full context re-encoding while delivering up to $20\times$ faster joint sampling and density evaluation, and up to $7\times$ lower memory usage.

2510.04466 2026-04-22 physics.ao-ph cs.LG

Benchmarking atmospheric circulation variability in an AI emulator, ACE2, and a hybrid model, NeuralGCM

Ian Baxter, Hamid Pahlavan, Pedram Hassanzadeh, Katharine Rucker, Tiffany Shaw

Comments 12 pages, 4 main figures, 6 supplementary figures

详情
Journal ref
Geophysical Research Letters, 53, e2025GL119877 (2026)
英文摘要

Physics-based atmosphere-land models with prescribed sea surface temperature have notable successes but also biases in their ability to represent atmospheric variability compared to observations. Recently, AI emulators and hybrid models have emerged with the potential to overcome these biases, but still require systematic evaluation against metrics grounded in fundamental atmospheric dynamics. Here, we evaluate the representation of four atmospheric variability benchmarking metrics in a fully data-driven AI emulator (ACE2-ERA5) and hybrid model (NeuralGCM). The hybrid model and emulator can capture the spectra of large-scale tropical waves and extratropical eddy-mean flow interactions, including critical levels. However, both struggle to capture the timescales associated with quasi-biennial oscillation (QBO, $\sim 28$ months) and Southern annular mode propagation ($\sim 150$ days). These dynamical metrics serve as an initial benchmarking tool to inform AI model development and understand their limitations, which may be essential for out-of-distribution applications (e.g., extrapolating to unseen climates).

2509.18831 2026-04-22 cs.GR cs.AI cs.CV cs.LG cs.MM

Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters

Pin-Yen Chiu, I-Sheng Fang, Jun-Cheng Chen

Comments Accepted by WACV 2026. We provide more experimental results on the train-free version of our algorithm. Project page: https://textslider.github.io/ Code: https://github.com/aiiu-lab/TextSlider

详情
英文摘要

Recent advances in diffusion models have significantly improved image and video synthesis. In addition, several concept control methods have been proposed to enable fine-grained, continuous, and flexible control over free-form text prompts. However, these methods not only require intensive training time and GPU memory usage to learn the sliders or embeddings but also need to be retrained for different diffusion backbones, limiting their scalability and adaptability. To address these limitations, we introduce Text Slider, a lightweight, efficient and plug-and-play framework that identifies low-rank directions within a pre-trained text encoder, enabling continuous control of visual concepts while significantly reducing training time, GPU memory consumption, and the number of trainable parameters. Furthermore, Text Slider supports multi-concept composition and continuous control, enabling fine-grained and flexible manipulation in both image and video synthesis. We show that Text Slider enables smooth and continuous modulation of specific attributes while preserving the original spatial layout and structure of the input. Text Slider achieves significantly better efficiency: 5$\times$ faster training than Concept Slider and 47$\times$ faster than Attribute Control, while reducing GPU memory usage by nearly 2$\times$ and 4$\times$, respectively.

2509.05890 2026-04-22 quant-ph cs.AI cs.LG math-ph math.MP

Quantum spatial best-arm identification via quantum walks

Tomoki Yamagami, Etsuo Segawa, Takatomo Mihana, André Röhm, Atsushi Uchida, Ryoichi Horisaki

Comments 16 pages, 8 figures

详情
英文摘要

Quantum reinforcement learning has emerged as a framework combining quantum computation with sequential decision-making, and applications to the multi-armed bandit (MAB) problem have been reported. The graph bandit problem extends the MAB setting by introducing spatial constraints, where the accessibility of arms is restricted by graph connectivity, yet quantum approaches to this setting remain limited. In this paper, we formulate best-arm identification in graph bandits and propose a quantum algorithmic framework, termed Quantum Spatial Best-Arm Identification (QSBAI), which is applicable to general graph structures. This framework uses quantum walks to encode superpositions over graph-constrained actions, thereby extending amplitude amplification and generalizing the quantum BAI algorithm via Szegedy's walk framework. We focus our theoretical analysis on complete and bipartite graphs, deriving the maximal success probability of identifying the best arm and the time step at which it is achieved. Our results clarify how quantum-walk-based search can be adapted to structurally constrained decision problems and provide a foundation for quantum best-arm identification in graph-structured environments.

2509.03726 2026-04-22 stat.ML cs.LG

Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling

Niclas Dern, Lennart Redl, Sebastian Pfister, Marcel Kollovieh, David Lüdke, Stephan Günnemann

Comments 21 pages, 4 figures

详情
英文摘要

Sampling from unnormalized target distributions, e.g.\ Boltzmann distributions $μ_{\text{target}}(x) \propto \exp(-E(x)/T)$, is fundamental to many scientific applications yet computationally challenging due to complex, high-dimensional energy landscapes. Existing approaches applying modern generative models to Boltzmann distributions either require large datasets of samples drawn from the target distribution or, when using only energy evaluations for training, cannot efficiently leverage the expressivity of advanced architectures like continuous normalizing flows that have shown promise for molecular sampling. To address these shortcomings, we introduce Energy-Weighted Flow Matching (EWFM), a novel training objective enabling continuous normalizing flows to model Boltzmann distributions using only energy function evaluations. Our objective reformulates conditional flow matching via importance sampling, allowing training with samples from arbitrary proposal distributions. Based on this objective, we develop two algorithms: iterative EWFM (iEWFM), which progressively refines proposals through iterative training, and annealed EWFM (aEWFM), which additionally incorporates temperature annealing for challenging energy landscapes. On benchmark systems, including challenging 55-particle Lennard-Jones clusters, our algorithms demonstrate sample quality competitive with established energy-only methods while requiring up to three orders of magnitude fewer energy evaluations.

2508.20467 2026-04-22 q-fin.PM cs.LG q-fin.CP

QTMRL: An Agent for Quantitative Trading Decision-Making Based on Multi-Indicator Guided Reinforcement Learning

Jingfeng Pan, Jiahao Chen

详情
英文摘要

In the highly volatile and uncertain global financial markets, traditional quantitative trading models relying on statistical modeling or empirical rules often fail to adapt to dynamic market changes and black swan events due to rigid assumptions and limited generalization. To address these issues, this paper proposes QTMRL (Quantitative Trading Multi-Indicator Reinforcement Learning), an intelligent trading agent combining multi-dimensional technical indicators with reinforcement learning (RL) for adaptive and stable portfolio management. We first construct a comprehensive multi-indicator dataset using 23 years of S&P 500 daily OHLCV data (2000-2022) for 16 representative stocks across 5 sectors, enriching raw data with trend, volatility, and momentum indicators to capture holistic market dynamics. Then we design a lightweight RL framework based on the Advantage Actor-Critic (A2C) algorithm, including data processing, A2C algorithm, and trading agent modules to support policy learning and actionable trading decisions. Extensive experiments compare QTMRL with 9 baselines (e.g., ARIMA, LSTM, moving average strategies) across diverse market regimes, verifying its superiority in profitability, risk adjustment, and downside risk control. The code of QTMRL is publicly available at https://github.com/ChenJiahaoJNU/QTMRL.git

2508.01371 2026-04-22 cs.CR cs.AI cs.ET

Prompt to Pwn: Automated Exploit Generation for Smart Contracts

ZeKe Xiao, Qin Wang, Yuekang Li, Shiping Chen

Comments ACISP2026

详情
英文摘要

Smart contracts are important for digital finance, yet they are hard to patch once deployed. Prior work has mainly explored LLMs for smart contract vulnerability detection, leaving end-to-end automated exploit generation (AEG) much less understood. We study that gap with \textsc{ReX}, an execution-grounded framework that links LLM-based exploit synthesis to the Foundry stack for end-to-end generation, compilation, execution, and validation. Five recent LLMs are evaluated across eight common vulnerability classes, supported by a curated dataset of 38{+} real incident PoCs and three automation aids: prompt refactoring, a compiler feedback loop, and templated test harnesses. Results indicate that current frontier LLMs can often produce deterministic PoCs for single-contract vulnerabilities, but remain weak on cross-contract attacks; outcomes depend mainly on the model and bug type, while code structure and prompt tuning contribute less in our setting. The study also surfaces important boundary conditions of LLM-driven AEG, including gaps between oracle-validated exploitability and real-world economic attacks, pointing to the need for stronger defenses and more realistic evaluation.

2507.21954 2026-04-22 cs.SE cs.AI

Fine-Tuning Code Language Models to Detect Cross-Language Bugs

Zengyang Li, Yimeng Li, Binbin Huang, Peng Liang, Ran Mo, Hui Liu, Yutao Ma

Comments Preprint accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM), 2026

详情
英文摘要

Multilingual programming, which involves using multiple programming languages (PLs) in a single project, is increasingly common due to its benefits. However, it introduces cross-language bugs (CLBs), which arise from interactions between different PLs and are difficult to detect by single-language bug detection tools. This paper investigates the potential of pre-trained code language models (CodeLMs) in CLB detection. We developed CLCFinder, a cross-language code identification tool, and constructed a CLB dataset involving three PL combinations (Python-C/C++, Java-C/C++, and Python-Java) with nine interaction types. We fine-tuned 13 CodeLMs on this dataset and evaluated their performance, analyzing the effects of dataset size, token sequence length, and code comments. Results show that all 13 CodeLMs exhibited varying degrees of performance improvement after fine-tuning, with UniXcoder-base achieving the best F1 score (0.7407). Notably, within our experimental setup, small CodeLMs tended to performe better than large ones. CodeLMs fine-tuned on single-language bug datasets performed poorly on CLB detection, demonstrating the distinction between CLBs and single-language bugs. Additionally, increasing the fine-tuning dataset size significantly improved performance, while longer token sequences did not necessarily improve the model performance. The impact of code comments varied across models. Some fine-tuned CodeLMs' performance was improved, while others showed degraded performance.

2507.21563 2026-04-22 cs.IR cs.LG

VoteGCL: Enhancing Graph-based Recommendations with Majority-Voting LLM-Rerank Augmentation

Minh-Anh Nguyen, Bao Nguyen, Ha Lan N. T., Tuan Anh Hoang, Duc-Trong Le, Dung D. Le

详情
英文摘要

Recommendation systems often suffer from data sparsity caused by limited user-item interactions, which degrade their performance and amplify popularity bias in real-world scenarios. This paper proposes a novel data augmentation framework that leverages Large Language Models (LLMs) and item textual descriptions to enrich interaction data. By few-shot prompting LLMs multiple times to rerank items and aggregating the results via majority voting, we generate high-confidence synthetic user-item interactions, supported by theoretical guarantees based on the concentration of measure. To effectively leverage the augmented data in the context of a graph recommendation system, we integrate it into a graph contrastive learning framework to mitigate distributional shift and alleviate popularity bias. Extensive experiments show that our method improves accuracy and reduces popularity bias, outperforming strong baselines.

2507.01918 2026-04-22 q-fin.PM cs.AI math.OC physics.data-an stat.ML

End-to-End Large Portfolio Optimization for Variance Minimization with Neural Networks through Covariance Cleaning

Christian Bongiorno, Efstratios Manolakis, Rosario Nunzio Mantegna

详情
Journal ref
The Journal of Finance and Data Science, 12, (2026) 100179
英文摘要

We develop a rotation-invariant neural network that provides the global minimum-variance portfolio by jointly learning how to lag-transform historical returns and marginal volatilities and how to regularise the eigenvalues of large equity covariance matrices. This explicit mathematical mapping offers clear interpretability of each module's role, so the model cannot be regarded as a pure black box. The architecture mirrors the analytical form of the global minimum-variance solution yet remains agnostic to dimension, so a single model can be calibrated on panels of a few hundred stocks and applied, without retraining, to one thousand US equities, a cross-sectional jump that indicates robust generalization capability. The loss function is the future short-term realized minimum variance and is optimized end-to-end on real returns. In out-of-sample tests from January 2000 to December 2024, the estimator delivers systematically lower realized volatility, smaller maximum drawdowns, and higher Sharpe ratios than the best competitors, including state-of-the-art non-linear shrinkage, and these advantages persist across both short and long evaluation horizons despite the model's training focus is short-term. Furthermore, although the model is trained end-to-end to produce an unconstrained minimum-variance portfolio, we show that its learned covariance representation can be used in general optimizers under long-only constraints with virtually no loss in its performance advantage over competing estimators. These advantages persist when the strategy is executed under a highly realistic implementation framework that models market orders at the auctions, empirical slippage, exchange fees, and financing charges for leverage, and they remain stable during episodes of acute market stress.

2506.06414 2026-04-22 cs.CR cs.AI

Benchmarking Misuse Mitigation Against Covert Adversaries

Davis Brown, Mahdi Sabbaghi, Luze Sun, Alexander Robey, George J. Pappas, Eric Wong, Hamed Hassani

详情
英文摘要

Existing language model safety evaluations focus on overt attacks and low-stakes tasks. In reality, an attacker can easily subvert existing safeguards by requesting help on small, benign-seeming tasks across many independent queries. Because the individual queries do not appear harmful, the attack is hard to detect. However, when combined, these fragments uplift misuse by helping the attacker complete hard and dangerous tasks. Toward identifying defenses against such strategies, we develop Benchmarks for Stateful Defenses (BSD), a data generation pipeline that automates evaluations of covert attacks and corresponding defenses. Using this pipeline, we curate two new datasets that are consistently refused by frontier models and are too difficult for weaker open-weight models. This enables us to evaluate decomposition attacks, which are found to be effective misuse enablers, and to highlight stateful defenses as a promising countermeasure.

2506.06315 2026-04-22 eess.SP cs.CV cs.LG

An Open-Source Python Framework and Synthetic ECG Image Datasets for Digitization, Lead and Lead Name Detection, and Overlapping Signal Segmentation

Masoud Rahimi, Reza Karbasi, Abdol-Hossein Vahabie

Comments 5 pages, 5 figures

详情
Journal ref
Journal of Medical Signals & Sensors 16(3):8, March 2026
英文摘要

We introduce an open-source Python framework for generating synthetic ECG image datasets to advance critical deep learning-based tasks in ECG analysis, including ECG digitization, lead region and lead name detection, and pixel-level waveform segmentation. Using the PTB-XL signal dataset, our proposed framework produces four open-access datasets: (1) ECG images in various lead configurations paired with time-series signals for ECG digitization, (2) ECG images annotated with YOLO-format bounding boxes for detection of lead region and lead name, (3)-(4) cropped single-lead images with segmentation masks compatible with U-Net-based models in normal and overlapping versions. In the overlapping case, waveforms from neighboring leads are superimposed onto the target lead image, while the segmentation masks remain clean. The open-source Python framework and datasets are publicly available at https://github.com/rezakarbasi/ecg-image-and-signal-dataset and https://doi.org/10.5281/zenodo.15484519, respectively.

2505.22811 2026-04-22 stat.ML cs.LG

Highly Efficient and Effective LLMs with Multi-Boolean Architectures

Ba-Hien Tran, Van Minh Nguyen

Comments ICLR 2026 (Main Conference)

详情
英文摘要

Weight binarization has emerged as a promising strategy to reduce the complexity of large language models (LLMs). Existing approaches fall into post-training binarization, which is simple but causes severe performance loss, and training-aware methods, which depend on full-precision latent weights, adding complexity and limiting efficiency. We propose a novel framework that represents LLMs with multi-kernel Boolean parameters and, for the first time, enables direct finetuning LMMs in the Boolean domain, eliminating the need for latent weights. This enhances representational capacity and dramatically reduces complexity during both finetuning and inference. Extensive experiments across diverse LLMs show our method outperforms recent ultra low-bit quantization and binarization techniques.

2505.16968 2026-04-22 cs.AR cs.AI cs.CL cs.LG cs.PL

CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

Ahmed Heakl, Gustavo Bertolo Stahl, Sarim Hashmi, Seung Hun Eddie Han, Mukul Ranjan, Arina Kharlamova, Salman Khan, Abdulrahman Mahmoud

Comments 20 pages, 11 figures, 5 tables

详情
Journal ref
ACL 2026
英文摘要

Cross-architecture GPU code transpilation is essential for unlocking low-level hardware portability, yet no scalable solution exists. We introduce CASS, the first dataset and model suite for source- and assembly-level GPU translation (CUDA <--> HIP, SASS <--> RDNA3). CASS contains 60k verified host-device code pairs, enabling learning-based translation across both ISA and runtime boundaries. We generate each sample using our automated pipeline that scrapes, translates, compiles, and aligns GPU programs across vendor stacks. Leveraging CASS, we train a suite of domain-specific translation models that achieve 88.2% accuracy on CUDA -> HIP and 69.1% on SASS -> RDNA3, outperforming commercial baselines including GPT-5.1, Claude-4.5, and Hipify by wide margins. Generated code matches native performance in 85% of cases, preserving both runtime and memory behavior. To support rigorous evaluation, we introduce CASS-Bench, a curated benchmark spanning 18 GPU domains with ground-truth execution. All data, models, and evaluation tools will be released as open source to support progress in GPU compiler tooling, binary compatibility, and LLM-guided code translation.

2505.09803 2026-04-22 stat.ML cs.LG

LatticeVision: Image to Image Networks for Modeling Non-Stationary Spatial Data

Antony Sikorski, Michael Ivanitskiy, Nathan Lenssen, Douglas Nychka, Daniel McKenzie

Comments This work has been accepted at the 29th International Conference on Artificial Intelligence and Statistics (AISTATS)

详情
英文摘要

In many applications, we wish to fit a parametric statistical model to a small ensemble of spatially distributed random variables ('fields'). However, parameter inference using maximum likelihood estimation (MLE) is computationally prohibitive, especially for large, non-stationary fields. Thus, many recent works train neural networks to estimate parameters given spatial fields as input, sidestepping MLE completely. In this work we focus on a popular class of parametric, spatially autoregressive (SAR) models. We make a simple yet impactful observation; because the SAR parameters can be arranged on a regular grid, both inputs (spatial fields) and outputs (model parameters) can be viewed as images. Using this insight, we demonstrate that image-to-image (I2I) networks enable faster and more accurate parameter estimation for a class of non-stationary SAR models with unprecedented complexity.

2504.09775 2026-04-22 cs.AR cs.AI cs.DC cs.LG

MIST: A Co-Design Framework for Heterogeneous, Multi-Stage LLM Inference

Abhimanyu Rajeshkumar Bambhaniya, Hanjiang Wu, Suvinay Subramanian, Sudarshan Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Midhilesh Elavazhagan, Madhu Kumar, Minlan Yu, Arijit Raychowdhury, Tushar Krishna

Comments Inference System Design for Multi-Stage AI Inference Pipelines. 11 Pages, 10 Figues, 5 Tables

详情
英文摘要

Modern LLM serving now spans multi-stage pipelines including RAG retrieval and KV cache reuse, each with distinct compute, memory, and latency demands. Inference engines expose a large configuration space with no systematic navigation methodology, and exhaustively benchmarking configurations can exceed 40K in cloud costs. Simultaneously, the hardware landscape is rapidly diversifying across AMD GPUs, TPUs, and custom ASICs, while cross-vendor prefill-decode (PD) disaggregated configurations lack unified software stacks for end-to-end evaluation today. To address this gap, we present MIST, a Heterogeneous Multi-stage LLM inference Execution Simulator. MIST models diverse request stages; including RAG, KV retrieval, reasoning, prefill, and decode across complex hardware hierarchies. MIST supports heterogeneous clients executing multiple models concurrently unlike prior frameworks while incorporating advanced batching strategies and multi-level memory hierarchies. By integrating real hardware traces with analytical modeling, MIST captures critical trade-offs such as memory bandwidth contention, inter-cluster communication latency, and batching efficiency in hybrid CPU-accelerator deployments. Through case studies, we explore the impact of reasoning stages on end-to-end latency, optimal batching strategies for hybrid pipelines, and the architectural implications of remote KV cache retrieval. MIST empowers system designers to navigate the evolving landscape of LLM inference, providing actionable insights into optimizing hardware-software co-design for next-generation AI workloads.

2504.09662 2026-04-22 cs.MA cs.AI cs.HC

AgentDynEx: Nudging the Mechanics and Dynamics of Multi-Agent Simulations

Jenny Ma, Riya Sahni, Karthik Sreedhar, Lydia B. Chilton

Comments 40 pages, 9 figures

详情
英文摘要

Multi-agent large language model simulations have the potential to model complex human behaviors and interactions. If the mechanics are set up properly, unanticipated and valuable social dynamics can surface. However, it is challenging to consistently enforce simulation mechanics while still allowing for notable and emergent dynamics. We present AgentDynEx, an AI system that helps set up simulations from user-specified mechanics and dynamics. AgentDynEx uses LLMs to guide users through a Configuration Matrix to identify core mechanics and define milestones to track dynamics. It also introduces a method called \textit{nudging}, where the system dynamically reflects on simulation progress and gently intervenes if it begins to deviate from intended outcomes. A technical evaluation found that nudging enables simulations to have more complex mechanics and maintain its notable dynamics compared to simulations without nudging. We discuss the importance of nudging as a technique for balancing mechanics and dynamics of multi-agent simulations.

2502.16156 2026-04-22 stat.ML cs.LG

A Review of Causal Decision Making

Lin Ge, Hengrui Cai, Runzhe Wan, Yang Xu, Rui Song

详情
英文摘要

To make effective decisions, it is important to have a thorough understanding of the causal relationships among actions, environments, and outcomes. This review aims to surface three crucial aspects of decision-making through a causal lens: 1) the discovery of causal relationships through causal structure learning, 2) understanding the impacts of these relationships through causal effect learning, and 3) applying the knowledge gained from the first two aspects to support decision making via causal policy learning. Moreover, we identify challenges that hinder the broader utilization of causal decision-making and discuss recent advances in overcoming these challenges. Finally, we provide future research directions to address these challenges and to further enhance the implementation of causal decision-making in practice, with real-world applications illustrated based on the proposed causal decision-making. We aim to offer a comprehensive methodology and practical implementation framework by consolidating various methods in this area into a Python-based collection. URL: https://causaldm.github.io/Causal-Decision-Making.

2502.10605 2026-04-22 stat.ML cs.CY cs.LG econ.EM stat.ME

Batch-Adaptive Causal Annotations

Ezinne Nwankwo, Lauri Goldkind, Angela Zhou

详情
英文摘要

Estimating the causal effects of interventions is crucial to policy and decision-making, yet outcome data are often missing or subject to non-standard measurement error. While ground-truth outcomes can sometimes be obtained through costly data annotation or follow-up, budget constraints typically allow only a fraction of the dataset to be labeled. We address this challenge by optimizing which data points should be sampled for outcome information in order to improve efficiency in average treatment effect estimation with missing outcomes. We derive a closed-form solution for the optimal batch sampling probability by minimizing the asymptotic variance of a doubly robust estimator for causal inference with missing outcomes. Motivated by our street outreach partners, we extend the framework to costly annotations of unstructured data, such as text or images in healthcare and social services. Across simulated and real-world datasets, including one of outreach interventions in homelessness services, our approach achieves substantially lower mean-squared error and recovers the AIPW estimate with fewer labels than existing baselines. In practice, we show that our method can match confidence intervals obtained with 361 random samples using only 90 optimized samples - saving 75% of the labeling budget.

2501.10258 2026-04-22 math.OC cs.LG

DADA: Dual Averaging with Distance Adaptation

Mohammad Moshtaghifar, Anton Rodomanov, Daniil Vankov, Sebastian Stich

详情
英文摘要

We present a novel universal gradient method for solving convex optimization problems. Our algorithm, Dual Averaging with Distance Adaptation (DADA), is based on the classical scheme of dual averaging and dynamically adjusts its coefficients based on observed gradients and the distance between iterates and the starting point, eliminating the need for problem-specific parameters. DADA is a universal algorithm that simultaneously works for a broad spectrum of problem classes, provided the local growth of the objective function around its minimizer can be bounded. Particular examples of such problem classes are nonsmooth Lipschitz functions, Lipschitz-smooth functions, Hölder-smooth functions, functions with high-order Lipschitz derivative, quasi-self-concordant functions, and $(L_0,L_1)$-smooth functions. Crucially, DADA is applicable to both unconstrained and constrained problems, even when the domain is unbounded, without requiring prior knowledge of the number of iterations or desired accuracy.