arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 2967
2501.08667 2026-03-31 eess.IV cs.CV

TimeFlow: Temporal Conditioning for Longitudinal Brain MRI Registration and Aging Analysis

Bailiang Jian, Jiazhen Pan, Yitong Li, Fabian Bongratz, Ruochen Li, Daniel Rueckert, Benedikt Wiestler, Christian Wachinger

详情
Journal ref
IEEE Transactions on Medical Imaging 2026
英文摘要

Longitudinal brain analysis is essential for understanding healthy aging and identifying pathological deviations. Longitudinal registration of sequential brain MRI underpins such analyses. However, existing methods are limited by reliance on densely sampled time series, a trade-off between accuracy and temporal smoothness, and an inability to prospectively forecast future brain states. To overcome these challenges, we introduce \emph{TimeFlow}, a learning-based framework for longitudinal brain MRI registration. TimeFlow uses a U-Net backbone with temporal conditioning to model neuroanatomy as a continuous function of age. Given only two scans from an individual, TimeFlow estimates accurate and temporally coherent deformation fields, enabling non-linear extrapolation to predict future brain states. This is achieved by our proposed inter-/extra-polation consistency constraints applied to both the deformation fields and deformed images. Remarkably, these constraints preserve temporal consistency and continuity without requiring explicit smoothness regularizers or densely sampled sequential data. Extensive experiments demonstrate that TimeFlow outperforms state-of-the-art methods in terms of both future timepoint forecasting and registration accuracy. Moreover, TimeFlow supports novel biological brain aging analyses by differentiating neurodegenerative trajectories from normal aging without requiring segmentation, thereby eliminating the need for labor-intensive annotations and mitigating segmentation inconsistency. TimeFlow offers an accurate, data-efficient, and annotation-free framework for longitudinal analysis of brain aging and chronic diseases, capable of forecasting brain changes beyond the observed study period.

2411.15624 2026-03-31 stat.ML cs.LG stat.ME

Trans-Glasso: A Transfer Learning Approach to Precision Matrix Estimation

Boxin Zhao, Cong Ma, Mladen Kolar

Comments 58 pages, 13 figures. Accepted by the Journal of the American Statistical Association (JASA)

详情
英文摘要

Precision matrix estimation is essential in various fields; yet it is challenging when samples for the target study are limited. Transfer learning can enhance estimation accuracy by leveraging data from related source studies. We propose Trans-Glasso, a two-step transfer learning method for precision matrix estimation. First, we obtain initial estimators using a multi-task learning objective that captures shared and unique features across studies. Then, we refine these estimators through differential network estimation to adjust for structural differences between the target and source precision matrices. Under the assumption that most entries of the target precision matrix are shared with source matrices, we derive non-asymptotic error bounds and show that Trans-Glasso achieves minimax optimality under certain conditions. Extensive simulations demonstrate Trans Glasso's superior performance compared to baseline methods, particularly in small-sample settings. We further validate Trans-Glasso in applications to gene networks across brain tissues and protein networks for various cancer subtypes, showcasing its effectiveness in biological contexts. Additionally, we derive the minimax optimal rate for differential network estimation, representing the first such guarantee in this area. The Python implementation of Trans-Glasso, along with code to reproduce all experiments in this paper, is publicly available at https://github.com/boxinz17/transglasso-experiments.

2410.14843 2026-03-31 stat.ML cs.LG stat.ME

Predictive variational inference: Learn the predictively optimal posterior distribution

Jinlin Lai, Antonio Linero, Yuling Yao

详情
英文摘要

Vanilla variational inference finds an optimal approximation to the Bayesian posterior distribution, but even the exact Bayesian posterior is often not meaningful under model misspecification. We propose predictive variational inference (PVI): a general inference framework that seeks and samples from an optimal posterior density such that the resulting posterior predictive distribution is as close to the true data generating process as possible, while this closeness is measured by multiple scoring rules. By optimizing the objective, the predictive variational inference is generally not the same as, or even attempting to approximate, the Bayesian posterior, even asymptotically. Rather, we interpret it as implicit hierarchical expansion. Further, the learned posterior uncertainty detects heterogeneity of parameters among the population, enabling automatic model diagnosis. This framework applies to both likelihood-exact and likelihood-free models. We demonstrate its application in real data examples.

2409.15675 2026-03-31 cond-mat.mtrl-sci cs.LG physics.comp-ph

The Northeast Materials Database for Magnetic Materials

Suman Itani, Yibo Zhang, Jiadong Zang

详情
Journal ref
Nat. Commun. 16, 9415 (2025)
英文摘要

The discovery of magnetic materials with high operating temperature ranges and optimized performance is essential for advanced applications. Current data-driven approaches are limited by the lack of accurate, comprehensive, and feature-rich databases. This study aims to address this challenge by using Large Language Models (LLMs) to create a comprehensive, experiment-based, magnetic materials database named the Northeast Materials Database (NEMAD), which consists of 67,573 magnetic materials entries(www.nemad.org). The database incorporates chemical composition, magnetic phase transition temperatures, structural details, and magnetic properties. Enabled by NEMAD, we trained machine learning models to classify materials and predict transition temperatures. Our classification model achieved an accuracy of 90% in categorizing materials as ferromagnetic (FM), antiferromagnetic (AFM), and non-magnetic (NM). The regression models predict Curie (Néel) temperature with a coefficient of determination (R2) of 0.87 (0.83) and a mean absolute error (MAE) of 56K (38K). These models identified 25 (13) FM (AFM) candidates with a predicted Curie (Néel) temperature above 500K (100K) from the Materials Project. This work shows the feasibility of combining LLMs for automated data extraction and machine learning models to accelerate the discovery of magnetic materials.

2408.15625 2026-03-31 eess.SY cs.AI cs.CL cs.SY

CBF-LLM: Safe Control for LLM Alignment

Yuya Miyaoka, Masaki Inoue

详情
Journal ref
IEEE Transactions on Control Systems Technology 2026
英文摘要

This paper proposes a control-based framework for aligning large language models (LLMs) by leveraging a control barrier function (CBF) to ensure user-desirable text generation. The presented framework applies the safety filter, designed based on the CBF, to the output generation of the baseline LLM, i.e., the sequence of the token, with the aim of intervening in the generated text. The overall text-generation system is implemented with Llama 3 and a RoBERTa model, and the source code is available at https://github.com/Mya-Mya/CBF-LLM. The experiment demonstrates its control ability and effectiveness in reducing the number of interventions needed for user-specified alignment tasks.

2306.05494 2026-03-31 cs.CR cs.LG cs.NI

Evasion Adversarial Attacks Remain Impractical Against ML-based Network Intrusion Detection Systems, Especially Dynamic Ones

Mohamed elShehaby, Ashraf Matrawy

详情
英文摘要

Machine Learning (ML) has become pervasive, and its deployment in Network Intrusion Detection Systems (NIDS) is inevitable due to its automated nature and high accuracy compared to traditional models in processing and classifying large volumes of data. However, ML has been found to have several flaws, most importantly, adversarial attacks, which aim to trick ML models into producing faulty predictions. While most adversarial attack research focuses on computer vision datasets, recent studies have explored the suitability of these attacks against ML-based network security entities, especially NIDS, due to the wide difference between different domains regarding the generation of adversarial attacks. To further explore the practicality of adversarial attacks against ML-based NIDS in-depth, this paper presents several key contributions: identifying numerous practicality issues for evasion adversarial attacks on ML-NIDS using an attack tree threat model, introducing a taxonomy of practicality issues associated with adversarial attacks against ML-based NIDS, identifying specific leaf nodes in our attack tree that demonstrate some practicality for real-world implementation and conducting a comprehensive review and exploration of these potentially viable attack approaches, and investigating how the dynamicity of real-world ML models affects evasion adversarial attacks against NIDS. Our experiments indicate that continuous re-training, even without adversarial training, can reduce the effectiveness of adversarial attacks. While adversarial attacks can compromise ML-based NIDSs, our aim is to highlight the significant gap between research and real-world practicality in this domain, which warrants attention.

2603.27745 2026-03-31 cs.SE cs.AI

Needle in the Repo: A Benchmark for Maintainability in AI-Generated Repository Edits

Haichao Zhu, Qian Zhang, Jiyuan Wang, Zhaorui Yang, Yuxin Qiu

Comments 16 pages, 6 figures

详情
英文摘要

AI coding agents can now complete complex programming tasks, but existing evaluations largely emphasize behavioral correctness and often overlook maintainability risks such as weak modularity or testability. We present Needle in the Repo (NITR), a diagnostic probe-and-oracle framework for evaluating whether behaviorally correct repository edits preserve maintainable structure. NITR distills recurring software engineering wisdom into controlled probes embedded in small, realistic multi-file codebases, each designed so that success depends primarily on one targeted maintainability dimension. Each probe is paired with a hidden evaluation harness that combines functional tests for required behavior with structural oracles that encode the targeted maintainability constraint and return interpretable diagnoses. Using NITR, we evaluate 23 coding configurations across GPT, Claude, Gemini, and Qwen families in both direct-inference and agent-based settings. Current AI coding systems remain far from robust: on average, configurations solve only 36.2% of cases, the best reaches 57.1%, and performance drops from 53.5% on micro cases to 20.6% on multi-step cases. The hardest pressures are architectural rather than local edits, especially dependency control (4.3%) and responsibility decomposition (15.2%). Moreover, 64/483 outcomes (13.3%) pass all functional tests yet fail the structural oracle. Under our harness, agent-mode configurations improve average performance from 28.2% to 45.0%, but do not eliminate these architectural failures. These results show that progress in code generation is not yet progress in maintainable code evolution, and that NITR exposes a critical failure surface missed by conventional evaluation.

2603.27743 2026-03-31 stat.ME cs.LG

Empirical Likelihood for Nonsmooth Functionals

Hongseok Namkoong

详情
英文摘要

Empirical likelihood is an attractive inferential framework that respects natural parameter boundaries, but existing approaches typically require smoothness of the functional and miscalibrate substantially when these assumptions are violated. For the optimal-value functional central to policy evaluation, smoothness holds only when the optimum is unique -- a condition that fails exactly when rigorous inference is most needed where more complex policies have modest gains. In this work, we develop a bootstrap empirical likelihood method for partially nonsmooth functionals. Our analytic workhorse is a geometric reduction of the profile likelihood to the distance between the score mean and a level set whose shape (a tangent cone given by nonsmoothness patterns) determines the asymptotic distribution. Unlike the classical proof technology based on Taylor expansions on the dual optima, our geometric approach leverages properties of a deterministic convex program and can directly apply to nonsmooth functionals. Since the ordinary bootstrap is not valid in the presence of nonsmoothness, we derive a corrected multiplier bootstrap approach that adapts to the unknown level-set geometry.

2603.27741 2026-03-31 astro-ph.GA cs.CV

Segmenting Superbubbles in a Simulated Multiphase Interstellar Medium using Computer Vision

Jing-Wen Chen, Alex S. Hill, Anna Ordog, Rebecca A. Booth, Mohamed S. Shehata

详情
英文摘要

We developed a computer vision-based methodology to achieve precise 3D segmentation and tracking of superbubbles within magnetohydrodynamic simulations of the supernova-driven interstellar medium. Leveraging advanced 3D transformer models, our approach effectively captures the complex morphology and dynamic evolution of these astrophysical structures. To demonstrate the technique, we specifically focused on a superbubble exhibiting interesting interactions with its surrounding medium, driven by a series of successive supernova explosions. Our model successfully generated detailed 3D segmentation masks, enabling us to visualize and analyze the bubble's structural evolution over time. The results reveal insights into the superbubble's growth patterns, energy retention, and interactions with surrounding interstellar matter. This interdisciplinary approach not only enhances our understanding of superbubble dynamics but also offers a robust framework for investigating other complex phenomena in the cosmos.

2603.27727 2026-03-31 physics.ins-det cs.AI hep-ex

Suppression of $^{14}\mathrm{C}$ photon hits in large liquid scintillator detectors via spatiotemporal deep learning

Junle Li, Zhaoxiang Wu, Guanda Gong, Zhaohan Li, Wuming Luo, Jiahui Wei, Wenxing Fang, Hehe Fan

Comments 14 pages, 11 figures

详情
英文摘要

Liquid scintillator detectors are widely used in neutrino experiments due to their low energy threshold and high energy resolution. Despite the tiny abundance of $^{14}$C in LS, the photons induced by the $β$ decay of the $^{14}$C isotope inevitably contaminate the signal, degrading the energy resolution. In this work, we propose three models to tag $^{14}$C photon hits in $e^+$ events with $^{14}$C pile-up, thereby suppressing its impact on the energy resolution at the hit level: a gated spatiotemporal graph neural network and two Transformer-based models with scalar and vector charge encoding. For a simulation dataset in which each event contains one $^{14}$C and one $e^+$ with kinetic energy below 5 MeV, the models achieve $^{14}$C recall rates of 25%-48% while maintaining $e^+$ to $^{14}$C misidentification below 1%, leading to a large improvement in the resolution of total charge for events where $e^+$ and $^{14}$C photon hits strongly overlap in space and time.

2603.27716 2026-03-31 cs.NE cs.AI q-bio.NC

The role of neuromorphic principles in the future of biomedicine and healthcare

Grace M. Hwang, Jessica D. Falcone, Joseph D. Monaco, Courtney R. Pinard, Jessica A. Mollick, Roger L. Miller, Stephanie L. Gage, Andrey V. Kanaev, Margaret Kim, R. Ale Lukaszew, Steven M. Zehnder, David Rampulla

Comments 56 pages; 1 figure

详情
英文摘要

Neuromorphic engineering has matured over the past four decades and is currently experiencing explosive growth with the potential to transform biomedical engineering and neurotechnologies. Participants at the Neuromorphic Principles in Biomedicine and Healthcare (NPBH) Workshop (October 2024) -- representing a broad cross-section of the community, including early-career and established scholars, engineers, scientists, clinicians, industry, and funders -- convened to discuss the state of the field, current and future challenges, and strategies for advancing neuromorphic research and development for biomedical applications. Publicly approved recordings with transcripts (https://2024.neuro-med.org/program/session-video-and-transcripts) and slides (https://2024.neuro-med.org/program/session-slides) can be found at the workshop website.

2603.27672 2026-03-31 stat.ML cs.LG

Energy Score-Guided Neural Gaussian Mixture Model for Predictive Uncertainty Quantification

Yang Yang, Chunlin Ji, Haoyang Li, Ke Deng

Comments 39 pages, 5 figures

详情
英文摘要

Quantifying predictive uncertainty is essential for real world machine learning applications, especially in scenarios requiring reliable and interpretable predictions. Many common parametric approaches rely on neural networks to estimate distribution parameters by optimizing the negative log likelihood. However, these methods often encounter challenges like training instability and mode collapse, leading to poor estimates of the mean and variance of the target output distribution. In this work, we propose the Neural Energy Gaussian Mixture Model (NE-GMM), a novel framework that integrates Gaussian Mixture Model (GMM) with Energy Score (ES) to enhance predictive uncertainty quantification. NE-GMM leverages the flexibility of GMM to capture complex multimodal distributions and leverages the robustness of ES to ensure well calibrated predictions in diverse scenarios. We theoretically prove that the hybrid loss function satisfies the properties of a strictly proper scoring rule, ensuring alignment with the true data distribution, and establish generalization error bounds, demonstrating that the model's empirical performance closely aligns with its expected performance on unseen data. Extensive experiments on both synthetic and real world datasets demonstrate the superiority of NE-GMM in terms of both predictive accuracy and uncertainty quantification.

2603.27630 2026-03-31 cs.AR cs.LG

RTLSeek: Boosting the LLM-Based RTL Generation with Multi-Stage Diversity-Oriented Reinforcement Learning

Xinyu Zhang, Zhiteng Chao, Yonghao Wang, Bin Sun, Tianyun Ma, Tianmeng Yang, Jianan Mu, Jing Justin Ye, Huawei Li

Comments 8 pages, 6 figures

详情
英文摘要

Register Transfer Level (RTL) design translates high-level specifications into hardware using HDLs such as Verilog. Although LLM-based RTL generation is promising, the scarcity of functionally verifiable high-quality data limits both accuracy and diversity. Existing post-training typically produces a single HDL implementation per specification, lacking awareness of RTL variations needed for different design goals. We propose RTLSeek, a post-training paradigm that applies rule-based Diversity-Oriented Reinforcement Learning to improve RTL correctness and diversity. Our Diversity-Centric Multi-Objective Reward Scheduling integrates expert knowledge with EDA feedback, and a three-stage framework maximizes the utility of limited data. Experiments on the RTLLM benchmark show that RTLSeek surpasses prior methods, with ablation results confirming that encouraging broader design-space exploration improves RTL quality and achieves the principle of "the more generated, the better results." Implementation framework, including the dataset, source code, and model weights, is shown at https://anonymous.4open.science/r/DAC2026ID71-ACB4/.

2603.27624 2026-03-31 cs.AR cs.AI

Expert Streaming: Accelerating Low-Batch MoE Inference via Multi-chiplet Architecture and Dynamic Expert Trajectory Scheduling

Songchen Ma, Hongyi Li, Weihao Zhang, Yonghao Tan, Pingcheng Dong, Yu Liu, Lan Liu, Yuzhong Jiao, Xuejiao Liu, Luhong Liang, Kwang-Ting Cheng

详情
英文摘要

Mixture-of-Experts is a promising approach for edge AI with low-batch inference. Yet, on-device deployments often face limited on-chip memory and severe workload imbalance; the prevalent use of offloading further incurs off-chip memory access bottlenecks. Moreover, MoE sparsity and dynamic gating shift distributed strategies toward much finer granularity and introduce runtime scheduling considerations. Recently, high die-to-die bandwidth chiplet interconnects have created new opportunities for multi-chiplet systems to address workload imbalance and offloading bottlenecks with fine-grained scheduling. In this paper, we propose Fully Sharded Expert Data Parallelism, a parallelization paradigm specifically architected for low-batch MoE inference on multi-chiplet accelerators. FSE-DP attains adaptive computation-communication overlap and balanced load by orchestrating fine-grained, complementary expert streams along dynamic trajectories across high-bandwidth D2D links. The attendant dataflow complexity is tamed by a minimal, hardware-amenable set of virtualization rules and a lightweight scheduling algorithm. Our approach achieves 1.22 to 2.00 times speedup over state-of-the-art baselines and saves up to 78.8 percent on-chip memory.

2603.27563 2026-03-31 cs.HC cs.AI

InnerPond: Fostering Inter-Self Dialogue with a Multi-Agent Approach for Introspection

Hayeon Jeon, Dakyeom Ahn, Sunyu Pang, Yunseo Choi, Suhwoo Yoon, Joonhwan Lee, Eun-mee Kim, Hajin Lim

Comments 25 pages, 10 figures, Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26)

详情
英文摘要

Introspection is central to identity construction and future planning, yet most digital tools approach the self as a unified entity. In contrast, Dialogical Self Theory (DST) views the self as composed of multiple internal perspectives, such as values, concerns, and aspirations, that can come into tension or dialogue with one another. Building on this view, we designed InnerPond, a research probe in the form of a multi-agent system that represents these internal perspectives as distinct LLM-based agents for introspection. Its design was shaped through iterative explorations of spatial metaphors, interaction scaffolding, and conversational orchestration, culminating in a shared spatial environment for organizing and relating multiple inner perspectives. In a user study with 17 young adults navigating career choices, participants engaged with the probe by co-creating inner voices with AI, composing relational inner landscapes, and orchestrating dialogue as observers and mediators, offering insight into how such systems could support introspection. Overall, this work offers design implications for AI-supported introspection tools that enable exploration of the self's multiplicity.

2603.27550 2026-03-31 cs.HC cs.AI

Drag or Traction: Understanding How Designers Appropriate Friction in AI Ideation Outputs

A. Baki Kocaballi, Joseph Kizana, Sharon Stein, Simon Buckingham Shum

Comments Paper accepted to ACM CHI Workshop on Tools for Thought, 2026

详情
英文摘要

Seamless AI presents output as a finished, polished product that users consume rather than shape. This risks design fixation: users anchor on AI suggestions rather than generating their own ideas. We propose Generative Friction, which introduces intentional disruptions to AI output (fragmentation, delay, ambiguity) designed to transform it from finished product into semi-finished material, inviting human contribution rather than passive acceptance. In a qualitative study with six designers, we identified the different ways in which designers appropriated the different types of friction: users mined keywords from broken text, used delays as workspace for independent thought, and solved metaphors as creative puzzles. However, this transformation was not universal, motivating the concept of Friction Disposition, a user's propensity to interpret resistance as invitation rather than obstruction. Grounded in tolerance for ambiguity and pre-existing workflow orientation, Friction Disposition emerged as a potential moderator: high-disposition users treated friction as "liberating," while low-disposition users experienced drag. We contribute the concept of Generative Friction as distinct from Protective Friction, with design implications for AI tools that counter fixation while preserving agency.

2603.27541 2026-03-31 cs.NE cs.AI

A Novel Immune Algorithm for Multiparty Multiobjective Optimization

Kesheng Chen, Wenjian Luo, Qi Zhou, Yujiang liu, Peilan Xu, Yuhui Shi

Comments \c{opyright} 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

详情
Journal ref
IEEE Transactions on Emerging Topics in Computational Intelligence, 9(2), 1238-1252 (2025)
英文摘要

Traditional multiobjective optimization problems (MOPs) are insufficiently equipped for scenarios involving multiple decision makers (DMs), which are prevalent in many practical applications. These scenarios are categorized as multiparty multiobjective optimization problems (MPMOPs). For MPMOPs, the goal is to find a solution set that is as close to the Pareto front of each DM as much as possible. This poses challenges for evolutionary algorithms in terms of searching and selecting. To better solve MPMOPs, this paper proposes a novel approach called the multiparty immune algorithm (MPIA). The MPIA incorporates an inter-party guided crossover strategy based on the individual's non-dominated sorting ranks from different DM perspectives and an adaptive activation strategy based on the proposed multiparty cover metric (MCM). These strategies enable MPIA to activate suitable individuals for the next operations, maintain population diversity from different DM perspectives, and enhance the algorithm's search capability. To evaluate the performance of MPIA, we compare it with ordinary multiobjective evolutionary algorithms (MOEAs) and state-of-the-art multiparty multiobjective optimization evolutionary algorithms (MPMOEAs) by solving synthetic multiparty multiobjective problems and real-world biparty multiobjective unmanned aerial vehicle path planning (BPUAV-PP) problems involving multiple DMs. Experimental results demonstrate that MPIA outperforms other algorithms.

2603.27539 2026-03-31 cs.MA cs.AI cs.CE

Toward Reliable Evaluation of LLM-Based Financial Multi-Agent Systems: Taxonomy, Coordination Primacy, and Cost Awareness

Phat Nguyen, Thang Pham

Comments Accepted at the DMO-FinTech Workshop, PAKDD 2026, Hong Kong

详情
英文摘要

Multi-agent systems based on large language models (LLMs) for financial trading have grown rapidly since 2023, yet the field lacks a shared framework for understanding what drives performance or for evaluating claims credibly. This survey makes three contributions. First, we introduce a four-dimensional taxonomy, covering architecture pattern, coordination mechanism, memory architecture, and tool integration; applied to 12 multi-agent systems and two single-agent baselines. Second, we formulate the Coordination Primacy Hypothesis (CPH): inter-agent coordination protocol design is a primary driver of trading decision quality, often exerting greater influence than model scaling. CPH is presented as a falsifiable research hypothesis supported by tiered structural evidence rather than as an empirically validated conclusion; its definitive validation requires evaluation infrastructure that does not yet exist in the field. Third, we document five pervasive evaluation failures (look-ahead bias, survivorship bias, backtesting overfitting, transaction cost neglect, and regime-shift blindness) and show that these can reverse the sign of reported returns. Building on the CPH and the evaluation critique, we introduce the Coordination Breakeven Spread (CBS), a metric for determining whether multi-agent coordination adds genuine value net of transaction costs, and propose minimum evaluation standards as prerequisites for validating the CPH.

2603.27524 2026-03-31 cs.SE cs.AI

Safer Builders, Risky Maintainers: A Comparative Study of Breaking Changes in Human vs Agentic PRs

K M Ferdous, Dipayan Banik, Kowshik Chowdhury, Shazibul Islam Shamim

Comments Accepted at 23rd International Conference on Mining Software Repositories (MSR), 2026

详情
英文摘要

AI coding agents are increasingly integrated into modern software engineering workflows, actively collaborating with human developers to create pull requests (PRs) in open-source repositories. Although coding agents improve developer productivity, they often generate code with more bugs and security issues than human-authored code. While human-authored PRs often break backward compatibility, leading to breaking changes, the potential for agentic PRs to introduce breaking changes remains underexplored. The goal of this paper is to help developers and researchers evaluate the reliability of AI-generated PRs by examining the frequency and task contexts in which AI agents introduce breaking changes. We conduct a comparative analysis of 7,191 agent-generated PRs with 1402 human-authored PRs from Python repositories in the AIDev dataset. We develop a tool that analyzes code changes in commits corresponding to the agentic PRs and leverages an abstract syntax tree (AST) based analysis to detect potential breaking changes. Our findings show that AI agents introduce fewer breaking changes overall than humans (3.45% vs. 7.40%) in code generation tasks. However, agents exhibit substantially higher risk during maintenance tasks, with refactoring and chore changes introducing breaking changes at rates of 6.72% and 9.35%, respectively. We also identify a "Confidence Trap" where highly confident agentic PRs still introduce breaking changes, indicating the need for stricter review during maintenance oriented changes regardless of reported confidence score.

2603.27462 2026-03-31 cs.DS cs.LG cs.PF

RSR-core: A High-Performance Engine for Low-Bit Matrix-Vector Multiplication

Mohsen Dehghankar, Abolfazl Asudeh

详情
英文摘要

Matrix-vector multiplication is a fundamental building block in neural networks, vector databases, and large language models, particularly during inference. As a result, efficient matrix-vector multiplication engines directly translate into more efficient inference. Recent work has explored low-bit quantization of model weights, where matrices are represented using binary (1-bit) or ternary (1.58-bit) values while activation is kept in higher precision. These representations enable efficient hardware-level computation. In parallel, algorithms such as Redundant Segment Reduction (RSR) provide theoretical guarantees for accelerating low-bit matrix-vector multiplication. However, existing implementations operate at the application level and cannot be efficiently integrated into hardware kernels, limiting practical performance. To bridge this gap, we present RSR-core, a high-performance engine that implements the RSR algorithm as optimized low-level kernels for both CPU and CUDA environments. RSR-core supports efficient matrix-vector multiplication for binary and ternary weight matrices and general vectors while enabling practical deployment of RSR algorithm in real inference pipelines. RSR-core is provided as a production-ready engine with HuggingFace integration for preprocessing low-bit models and running accelerated inference. Experimental results demonstrate significant performance improvements over baseline HuggingFace PyTorch multiplication, achieving up to 62x speedup on CPU and up to 1.9x speedup for token generation on CUDA for popular ternary LLMs. The source code is publicly available at https://github.com/UIC-InDeXLab/RSR-core.

2603.27414 2026-03-31 math.ST cs.AI stat.TH

Multiple-Prediction-Powered Inference

Charlie Cowen-Breen, Alekh Agarwal, Stephen Bates, William W. Cohen, Jacob Eisenstein, Amir Globerson, Adam Fisch

Comments ICLR 2026, 45 pages, 17 figures

详情
英文摘要

Statistical estimation often involves tradeoffs between expensive, high-quality measurements and a variety of lower-quality proxies. We introduce Multiple-Prediction-Powered Inference (MultiPPI): a general framework for constructing statistically efficient estimates by optimally allocating resources across these diverse data sources. This work provides theoretical guarantees about the minimax optimality, finite-sample performance, and asymptotic normality of the MultiPPI estimator. Through experiments across three diverse large language model (LLM) evaluation scenarios, we show that MultiPPI consistently achieves lower estimation error than existing baselines. This advantage stems from its budget-adaptive allocation strategy, which strategically combines subsets of models by learning their complex cost and correlation structures.

2603.27413 2026-03-31 cs.CY cs.AI

The Hidden Costs of AI-Mediated Political Outreach: Persuasion and AI Penalties in the US and UK

Andreas Jungherr, Adrian Rauchfleisch

详情
英文摘要

As AI-enabled systems become available for political campaign outreach, an important question has received little empirical attention: how do people evaluate the communicative practices these systems represent, and what consequences do those evaluations carry? Most research on AI-enabled persuasion examines attitude change under enforced exposure, leaving aside whether people regard AI-mediated outreach as legitimate or not. We address this gap with a preregistered 2x2 experiment conducted in the United States and United Kingdom (N = 1,800 per country) varying outreach intent (informational vs.~persuasive) and type of interaction partner (human vs.~AI-mediated) in the context of political issues that respondents consider highly important. We find consistent evidence for two evaluation penalties. A persuasion penalty emerges across nearly all outcomes in both countries: explicitly persuasive outreach is evaluated as less acceptable, more threatening to personal autonomy, less beneficial, and more damaging to organizational trust than informational outreach, consistent with reactance to perceived threats to attitudinal freedom. An AI penalty is consistent with a distinct mechanism: AI-mediated outreach triggers normative concerns about appropriate communicative agents, producing similarly negative evaluations across five outcomes in both countries. As automated outreach becomes more widespread, how people judge it may matter for democratic communication just as much as whether it changes minds.

2603.27410 2026-03-31 q-bio.NC cs.AI cs.CV

Grounding Social Perception in Intuitive Physics

Lance Ying, Aydan Y. Huang, Aviv Netanyahu, Andrei Barbu, Boris Katz, Joshua B. Tenenbaum, Tianmin Shu

Comments 26 pages, 11 figures

详情
英文摘要

People infer rich social information from others' actions. These inferences are often constrained by the physical world: what agents can do, what obstacles permit, and how the physical actions of agents causally change an environment and other agents' mental states and behavior. We propose that such rich social perception is more than visual pattern matching, but rather a reasoning process grounded in an integration of intuitive psychology with intuitive physics. To test this hypothesis, we introduced PHASE (PHysically grounded Abstract Social Events), a large dataset of procedurally generated animations, depicting physically simulated two-agent interactions on a 2D surface. Each animation follows the style of the Heider and Simmel movie, with systematic variation in environment geometry, object dynamics, agent capacities, goals, and relationships (friendly/adversarial/neutral). We then present a computational model, SIMPLE, a physics-grounded Bayesian inverse planning model that integrates planning, probabilistic planning, and physics simulation to infer agents' goals and relations from their trajectories. Our experimental results showed that SIMPLE achieved high accuracy and agreement with human judgments across diverse scenarios, while feedforward baseline models -- including strong vision-language models -- and physics-agnostic inverse planning failed to achieve human-level performance and did not align with human judgments. These results suggest that our model provides a computational account for how people understand physically grounded social scenes by inverting a generative model of physics and agents.

2603.27376 2026-03-31 cs.HC cs.AI

Where Does AI Leave a Footprint? Children's Reasoning About AI's Environmental Costs

Aayushi Dangol, Robert Wolfe, Nisha Devasia, Mitsuka Kiyohara, Jason Yip, Julie A. Kientz

详情
英文摘要

Two of the most socially consequential issues facing today's children are the rise of artificial intelligence (AI) and the rapid changes to the earth's climate. Both issues are complex and contested, and they are linked through the notable environmental costs of AI use. Using a systems thinking framework, we developed an interactive system called Ecoprompt to help children reason about the environmental impact of AI. EcoPrompt combines a prompt-level environmental footprint calculator with a simulation game that challenges players to reason about the impact of AI use on natural resources that the player manages. We evaluated the system through two participatory design sessions with 16 children ages 6-12. Our findings surfaced children's perspectives on societal and environmental tradeoffs of AI use, as well as their sense of agency and responsibility. Taken together, these findings suggest opportunities for broadening AI literacy to include systems-level reasoning about AI's environmental impact.

2603.27357 2026-03-31 eess.IV cs.AI cs.CV

Guided Lensless Polarization Imaging

Noa Kraicer, Erez Yosef, Raja Giryes

详情
英文摘要

Polarization imaging captures the polarization state of light, revealing information invisible to the human eye yet valuable in domains such as biomedical diagnostics, autonomous driving, and remote sensing. However, conventional polarization cameras are often expensive, bulky, or both, limiting their practical use. Lensless imaging offers a compact, low-cost alternative by replacing the lens with a simple optical element like a diffuser and performing computational reconstruction, but existing lensless polarization systems suffer from limited reconstruction quality. To overcome these limitations, we introduce a RGB-guided lensless polarization imaging system that combines a compact polarization-RGB sensor with an auxiliary, widely available conventional RGB camera providing structural guidance. We reconstruct multi-angle polarization images for each RGB color channel through a two-stage pipeline: a physics-based inversion recovers an initial polarization image, followed by a Transformer-based fusion network that refines this reconstruction using the RGB guidance image from the conventional RGB camera. Our two-stage method significantly improves reconstruction quality and fidelity over lensless-only baselines, generalizes across datasets and imaging conditions, and achieves high-quality real-world results on our physical prototype lensless camera without any fine-tuning.

2603.27342 2026-03-31 eess.AS cs.SD

SHroom: A Python Framework for Ambisonics Room Acoustics Simulation and Binaural Rendering

Yhonatan Gayer

详情
英文摘要

Spherical Harmonics ROOM), an open-source Python library for room acoustics simulation using Ambisonics, available at https://github.com/Yhonatangayer/shroom and installable via \texttt{pip install pyshroom}. \textbf{shroom} projects image-source contributions onto a Spherical Harmonics (SH) basis, yielding a composable pipeline for binaural decoding, spherical array simulation, and real-time head rotation. Benchmarked against \texttt{pyroomacoustics} with an $N=30$ reference, \textbf{shroom} with Magnitude Least Squares (MagLS) achieves perceptual transparency (2.02~dB Log Spectral Distance (LSD) at $N=5$, within the 1--2~dB Just Noticeable Difference (JND)) while its fixed-once decode amortises over multiple sources ($K=1$-to-$8$: slowdown narrows from $7\times$ to $3.1\times$). For dynamic head rotation, \textbf{shroom} applies a Wigner-D multiply at $<1$~ms/frame, making it the only architecturally viable real-time choice.

2603.27333 2026-03-31 cs.SE cs.AI

ComBench: A Repo-level Real-world Benchmark for Compilation Error Repair

Jia Li, Zeyang Zhuang, Zhuangbin Chen, Yuxin Su, Wei Meng, Michael R. Lyu

详情
英文摘要

Compilation errors pose pervasive and critical challenges in software development, significantly hindering productivity. Therefore, Automated Compilation Error Repair (ACER) techniques are proposed to mitigate these issues. Despite recent advancements in ACER, its real-world performance remains poorly evaluated. This can be largely attributed to the limitations of existing benchmarks, \ie decontextualized single-file data, lack of authentic source diversity, and biased local task modeling that ignores crucial repository-level complexities. To bridge this critical gap, we propose ComBench, the first repository-level, reproducible real-world benchmark for C/C++ compilation error repair. ComBench is constructed through a novel, automated framework that systematically mines real-world failures from the GitHub CI histories of large-scale open-source projects. Our framework contributes techniques for the high-precision identification of ground-truth repair patches from complex version histories and a high-fidelity mechanism for reproducing the original, ephemeral build environments. To ensure data quality, all samples in ComBench are execution-verified -- guaranteeing reproducible failures and build success with ground-truth patches. Using ComBench, we conduct a comprehensive evaluation of 12 modern LLMs under both direct and agent-based repair settings. Our experiments reveal a significant gap between a model's ability to achieve syntactic correctness (a 73% success rate for GPT-5) and its ability to ensure semantic correctness (only 41% of its patches are valid). We also find that different models exhibit distinct specializations for different error types. ComBench provides a robust and realistic platform to guide the future development of ACER techniques capable of addressing the complexities of modern software development.

2603.27296 2026-03-31 cs.SE cs.AI

A Multi-agent AI System for Deep Learning Model Migration from TensorFlow to JAX

Stoyan Nikolov, Bernhard Konrad, Moritz Gronbach, Niket Kumar, Ann Yan, Varun Singh, Yaning Liang, Parthasarathy Ranganathan

详情
英文摘要

The rapid development of AI-based products and their underlying models has led to constant innovation in deep learning frameworks. Google has been pioneering machine learning usage across dozens of products. Maintaining the multitude of model source codes in different ML frameworks and versions is a significant challenge. So far the maintenance and migration work was done largely manually by human experts. We describe an AI-based multi-agent system that we built to support automatic migration of TensorFlow-based deep learning models into JAX-based ones. We make three main contributions: First, we show how an AI planner that uses a mix of static analysis with AI instructions can create migration plans for very complex code components that are reliably followed by the combination of an orchestrator and coders, using AI-generated example-based playbooks. Second, we define quality metrics and AI-based judges that accelerate development when the code to evaluate has no tests and has to adhere to strict style and dependency requirements. Third, we demonstrate how the system accelerates code migrations in a large hyperscaler environment on commercial real-world use-cases. Our approach dramatically reduces the time (6.4x-8x speedup) for deep learning model migrations and creates a virtuous circle where effectively AI supports its own development workflow. We expect that the techniques and approaches described here can be generalized for other framework migrations and general code transformation tasks.

2603.27295 2026-03-31 cs.HC cs.AI

Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes

Chitralekha Gupta, Jing Peng, Ashwin Ram, Shreyas Sridhar, Christophe Jouffrais, Suranga Nanayakkara

Comments Accepted in CHI 2026

详情
英文摘要

Current scene perception tools for Blind and Low Vision (BLV) individuals rely on spoken descriptions but lack engaging representations of visually pleasing distant environmental landscapes (Vista spaces). Our proposed Scene2Audio framework generates comprehensible and enjoyable nonverbal audio using generative models informed by psychoacoustics, and principles of scene audio composition. Through a user study with 11 BLV participants, we found that combining the Scene2Audio sounds with speech creates a better experience than speech alone, as the sound effects complement the speech making the scene easier to imagine. A mobile app "in-the-wild" study with 7 BLV users for more than a week further showed the potential of Scene2Audio in enhancing outdoor scene experiences. Our work bridges the gap between visual and auditory scene perception by moving beyond purely descriptive aids, addressing the aesthetic needs of BLV users.

2603.27288 2026-03-31 physics.ao-ph cs.LG

StretchCast: Global-Regional AI Weather Forecasting on Stretched Cubed-Sphere Mesh

Jin Feng

详情
英文摘要

Global AI weather forecasting still relies mainly on uniform-resolution models, making it hard to combine regional refinement, two-way regional-global coupling, and affordable training cost. We introduce StretchCast, a global-regional AI forecasting framework built on a variable-resolution stretched cubed-sphere (SCS) mesh that preserves a closed global domain while concentrating resolution over a target region. Within this framework, we develop a one-step predictor, SCS_Base Model, and a rollout-oriented multistep predictor, SCS_FCST4 Model, to test the feasibility of SCS-based forecasting and the benefit of joint multistep training. Experiments use ERA5 with 69 variables over 1998-2022. Because training compute remains limited, this study uses a coarse-resolution proof-of-concept configuration rather than a final high-resolution system. Even with only about 7,776 effective global grid cells and roughly 0.875 degree resolution over the center-refined face, the 23M-parameter SCS_Base Model yields stable multivariate forecasts. With 83M parameters and training cost on the order of hours, SCS_FCST4 Model delivers competitive medium-range anomaly-correlation evolution over the target region after unified reprojection, especially for geopotential height, specific humidity, and part of the lower-tropospheric winds, while maintaining smooth cross-face continuity and realistic multiscale structure in typhoon and spectral analyses. These results support StretchCast as a practical lightweight foundation for global-regional AI weather forecasting.