arXivDaily: arXiv Daily Academic Digest (updated Monday through Friday)
2603.08760 2026-03-11 cs.CY cs.AI cs.CR

Clear, Compelling Arguments: Rethinking the Foundations of Frontier AI Safety Cases

Shaun Feakins, Ibrahim Habli, Phillip Morgan

Comments Full paper presented at the International Association of Safe and Ethical AI 2026 Conference (IASEAI 26). 10 pages, 8 figures

Abstract

This paper contributes to the nascent debate around safety cases for frontier AI systems. Safety cases are structured, defensible arguments that a system is acceptably safe to deploy in a given context. Historically, they have been used in safety-critical industries, such as aerospace, nuclear or automotive. As a result, safety cases for frontier AI have risen in prominence, both in the safety policies of leading frontier developers and in international research agendas proposed by leaders in generative AI, such as the Singapore Consensus on Global AI Safety Research Priorities and the International AI Safety Report. This paper appraises this work. We note that research conducted within the alignment community which draws explicitly on lessons from the assurance community has significant limitations. We therefore aim to rethink existing approaches to alignment safety cases. We offer lessons from existing methodologies within safety assurance and outline the limitations involved in the alignment community's current approach. Building on this foundation, we present a case study for a safety case focused on Deceptive Alignment and CBRN capabilities, drawing on existing, theoretical safety case "sketches" created by the alignment safety case community. Overall, we contribute holistic insights from the field of safety assurance via rigorous theory and methodologies that have been applied in safety-critical contexts. We do so in order to create a better foundational framework for robust, defensible and useful safety case methodologies which can help to assure the safety of frontier AI systems.

2603.08755 2026-03-11 cs.PL cs.AI cs.SE

Turn: A Language for Agentic Computation

Muyukani Kizito

Abstract

We present Turn, a compiled, actor-based programming language -- statically typed for schema inference, dynamically typed at the value level -- for agentic software: programs that reason and act autonomously by delegating inference to large language models (LLMs). Existing approaches augment general-purpose languages with frameworks, encoding critical invariants (bounded context, typed inference output, credential isolation, durable state) as application-level conventions rather than language guarantees. Turn introduces five language-level constructs that address this gap. Cognitive Type Safety makes LLM inference a typed primitive: the compiler generates a JSON Schema from a struct definition and the VM validates model output before binding. The confidence operator enables deterministic control flow gated on model certainty. Turn's actor-based process model, derived from Erlang, gives each agent an isolated context window, persistent memory, and mailbox. A capability-based identity system returns opaque, unforgeable handles from the VM host, ensuring raw credentials never enter agent memory. Finally, compile-time schema absorption (use schema::<protocol>) synthesizes typed API bindings from external specifications at compile time; the openapi adapter is shipped, with graphql, fhir, and mcp in active development. We describe the language design, type rules, schema semantics, and a Rust-based bytecode VM, and evaluate Turn against representative agentic workloads. Turn is open source at https://github.com/ekizito96/Turn.
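The typed-inference idea described above (derive a JSON Schema from a struct, then validate model output before binding it) can be sketched in Python. This is not Turn syntax and does not reflect its VM; `Sentiment`, `schema_for`, and `bind` are hypothetical names, and the validation only covers flat structs with primitive fields.

```python
import json
from dataclasses import dataclass, fields

# Mapping from Python field types to JSON Schema type names (primitives only).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Sentiment:  # hypothetical struct the model output must conform to
    label: str
    score: float


def schema_for(cls):
    """Derive a minimal JSON Schema from a flat dataclass definition."""
    return {
        "type": "object",
        "properties": {f.name: {"type": _JSON_TYPES[f.type]} for f in fields(cls)},
        "required": [f.name for f in fields(cls)],
        "additionalProperties": False,
    }


def _matches(value, typ):
    """Check one value against a primitive field type (bool is not a number)."""
    if typ is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False
    if typ is float:
        return isinstance(value, (int, float))
    return isinstance(value, typ)


def bind(cls, raw):
    """Parse raw model output and bind it to cls only if it validates."""
    data = json.loads(raw)  # malformed JSON raises ValueError
    expected = {f.name: f.type for f in fields(cls)}
    if not isinstance(data, dict) or set(data) != set(expected):
        raise ValueError("output does not match the expected fields")
    for name, typ in expected.items():
        if not _matches(data[name], typ):
            raise ValueError(f"field {name!r} has the wrong type")
    return cls(**data)
```

In this sketch the schema would be sent to the model as an output constraint, and `bind` plays the role of the pre-binding check: a reply such as `{"label": "positive", "score": "high"}` is rejected instead of reaching program state.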

2603.08753 2026-03-11 stat.ML cs.AI cs.LG

Permutation-Equivariant 2D State Space Models: Theory and Canonical Architecture for Multivariate Time Series

Seungwoo Jeong, Heung-Il Suk

Abstract

Multivariate time series (MTS) modeling often implicitly imposes an artificial ordering over variables, violating the inherent exchangeability found in many real-world systems where no canonical variable axis exists. We formalize this limitation as a violation of the permutation symmetry principle and require state-space dynamics to be permutation-equivariant along the variable axis. In this work, we theoretically characterize the complete canonical form of linear variable coupling under this symmetry constraint. We prove that any permutation-equivariant linear 2D state-space system naturally decomposes into local self-dynamics and a global pooled interaction, rendering ordered recurrence not only unnecessary but structurally suboptimal. Motivated by this theoretical foundation, we introduce the Variable-Invariant Two-Dimensional State Space Model (VI 2D SSM), which realizes the canonical equivariant form via permutation-invariant aggregation. This formulation eliminates sequential dependency chains along the variable axis, reducing the dependency depth from $\mathcal{O}(C)$ to $\mathcal{O}(1)$ and simplifying stability analysis to two scalar modes. Furthermore, we propose VI 2D Mamba, a unified architecture integrating multi-scale temporal dynamics and spectral representations. Extensive experiments on forecasting, classification, and anomaly detection benchmarks demonstrate that our model achieves state-of-the-art performance with superior structural scalability, validating the theoretical necessity of symmetry-preserving 2D modeling.
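The decomposition into local self-dynamics plus a global pooled interaction can be illustrated with the standard form of a permutation-equivariant linear map, y_i = a * x_i + b * mean(x). The sketch below assumes mean pooling and scalar coefficients `a` and `b`; the function names are hypothetical and the paper's exact parameterization may differ.

```python
def equivariant_mix(x, a, b):
    """Apply y_i = a * x_i + b * mean(x) across the variable axis."""
    pooled = sum(x) / len(x)  # permutation-invariant aggregation
    return [a * xi + b * pooled for xi in x]


def permute(x, order):
    """Reorder x so that position i holds x[order[i]]."""
    return [x[i] for i in order]


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0]
    order = [2, 0, 1]
    # Equivariance: mixing a permuted input equals permuting the mixed output.
    print(equivariant_mix(permute(x, order), 0.5, 0.25))
    print(permute(equivariant_mix(x, 0.5, 0.25), order))
```

Under this form the map has only two distinct gains: `a` on zero-mean inputs and `a + b` on constant inputs. No variable depends on a neighbor in a chain, so the dependency depth along the variable axis does not grow with the number of variables.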

2603.08747 2026-03-11 cs.AR cs.AI

Diagnosing FP4 inference: a layer-wise and block-wise sensitivity analysis of NVFP4 and MXFP4

Musa Cim, Burak Topcu, Mahmut Taylan Kandemir

Abstract

Quantization addresses the high resource demand for large language models (LLMs) by alleviating memory pressure and bandwidth congestion and providing significantly scaled compute power with a tolerable impact on accuracy. Four-bit floating point (FP4), the lowest-precision format that preserves essential numerical properties such as exponent and sign, has begun to be adopted in cutting-edge architectures, including Blackwell and AMD CDNA, to support LLM quantization and reduce deployment costs. Although aggressive quantization can yield efficiency gains, the quantization sensitivity of individual layers within a transformer, and whether these sensitivities generalize across existing FP4 formats and model scales, remain underexplored. To elucidate quantization sensitivity, this study conducts a systematic analysis of two FP4 formats, MXFP4 and NVFP4, across three Qwen2.5 model scales (0.5B, 7B, and 14B), using controlled component-wise and block-wise isolation methodologies. We observe that MLP up- and down-projection layers consistently dominate in terms of sensitivity, while gate and attention projections are moderately and substantially less sensitive to FP4 quantization, respectively. We further find that sensitivity does not universally localize to the final blocks; early blocks can also be highly sensitive, particularly under MXFP4. Our results provide a diagnostic characterization of the inference behavior of FP4 across components, depths, and FP4 formats.
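For readers unfamiliar with block-scaled FP4, the following is a simplified sketch of quantizing one block to E2M1 elements with a shared power-of-two scale, in the style of MX formats. It is an illustration under stated assumptions, not the procedure used in the paper: it rounds to the nearest grid value without a defined tie rule, ignores NaN/Inf handling, and does not model NVFP4, which uses a different block size and scale encoding. `quantize_block` is a hypothetical name.

```python
import math

# Magnitudes representable by an E2M1 element (2 exponent bits, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest value (6.0 = 1.5 * 2**2)


def quantize_block(block):
    """Quantize one block with a shared power-of-two scale; return dequantized values."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0 for _ in block]
    # Shared scale chosen so the largest element lands near the top of the grid.
    scale = 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX)
    out = []
    for v in block:
        mag = min(abs(v) / scale, E2M1_GRID[-1])  # saturate at the largest magnitude
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(nearest * scale, v))
    return out
```

Because the grid is coarse and the scale is shared, small elements in a block with one large outlier lose most of their precision, which is one reason sensitivity can differ between layers and between scaling schemes.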

2603.08743 2026-03-11 cs.DC cs.AI

Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention

Mengqi Liao, Lu Wang, Chaoyun Zhang, Bo Qiao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Huaiyu Wan

Abstract

With reasoning becoming the generative paradigm for large language models (LLMs), the memory bottleneck caused by KV cache during the decoding phase has become a critical factor limiting high-concurrency service. Although existing KV cache eviction methods address the memory issue, most of them are impractical for industrial-grade applications. This paper introduces Compressed PagedAttention, a method that combines token-wise KV cache eviction with PagedAttention. We propose a comprehensive scheduling strategy and support prefix caching and asynchronous compression for Compressed PagedAttention. Based on this, we have developed a high-concurrency LLM inference engine, Zipage. On large-scale mathematical reasoning tasks, Zipage achieves around 95\% of the performance of Full KV inference engines while delivering over 2.1$\times$ speedup.

2603.08742 2026-03-11 cs.NE cs.LG cs.NA math.NA stat.ML

Robust Parameter and State Estimation in Multiscale Neuronal Systems Using Physics-Informed Neural Networks

Changliang Wei, Yangyang Wang, Xueyu Zhu

Abstract

Inferring biophysical parameters and hidden state variables from partial and noisy observations is a fundamental challenge in computational neuroscience. This problem is particularly difficult for fast-slow spiking and bursting models, where strong nonlinearities, multiscale dynamics, and limited observational data often lead to severe sensitivity to initial parameter guesses and convergence failure in methods relying on traditional numerical forward solvers. In this work, we developed a physics-informed neural network (PINN) framework for the joint reconstruction of unobserved state variables and the estimation of unknown biophysical parameters in neuronal models. We demonstrate the effectiveness of the method on biophysical neuron models, including the Morris-Lecar model across multiple spiking and bursting regimes and a respiratory model neuron. The method requires only partial voltage observations over short observation windows and remains robust even when initialized with non-informative parameter guesses. These results suggest that PINN can deliver robust and accurate parameter inference and state reconstruction, providing a promising alternative for inverse problems in multiscale neuronal dynamics, where traditional techniques often struggle.

2603.08741 2026-03-11 cs.AR cs.LG

The AetherFloat Family: Block-Scale-Free Quad-Radix Floating-Point Architectures for AI Accelerators

Keita Morisaki

Abstract

The IEEE 754 floating-point standard is the bedrock of modern computing, but its structural requirements -- a hidden leading bit, Base-2 bit-level normalization, and Sign-Magnitude encoding -- impose significant silicon area and power overhead in massively parallel Neural Processing Units (NPUs). Furthermore, the industry's recent shift to 8-bit formats (e.g., FP8 E4M3, OCP MX formats) has introduced a new hardware penalty: the strict necessity of Block-Scaling (AMAX) logic to prevent out-of-bound Large Language Model (LLM) activations from overflowing and degrading accuracy. The AetherFloat Family is a parameterizable architectural replacement designed from first principles for Hardware/Software Co-Design in AI acceleration. By synthesizing Lexicographic One's Complement Unpacking, Quad-Radix (Base-4) Scaling, and an Explicit Mantissa, AetherFloat achieves zero-cycle native integer comparability, branchless subnormal handling, and a verified 33.17% area, 21.99% total power, and 11.73% critical path delay reduction across the multiply-accumulate (MAC) unit. Instantiated as AetherFloat-8 (AF8), the architecture relies on a purely explicit 3-bit mantissa. Combined with Base-4 scaling, AF8 delivers a substantially wider dynamic range, acting as a ``Block-Scale-Free'' format for inference that circumvents dynamic scaling microarchitecture. Finally, a novel Vector-Shared 32-bit Galois Stochastic Rounding topology bounds precision variance while neutralizing the vanishing gradients that plague legacy formats. While AF16 serves as a near-lossless bfloat16 replacement via post-training quantization, AF8 is designed as a QAT-first inference format: its Block-Scale-Free property eliminates dynamic AMAX hardware at the cost of requiring quantization-aware fine-tuning for deployment.

2603.08740 2026-03-11 cs.AR cs.AI

Architectural Design and Performance Analysis of FPGA based AI Accelerators: A Comprehensive Review

Soumita Chatterjee, Sudip Ghosh, Tamal Ghosh, Hafizur Rahaman

Abstract

Deep learning (DL) has emerged as a rapidly developing advanced technology, enabling the performance of complex tasks involving image recognition, natural language processing, and autonomous decision-making with high levels of accuracy. However, as these technologies evolve and strive to meet the growing demands of real-life applications, the complexity of DL models continues to increase. These models require processing of massive volumes of data, demanding substantial computational power and memory bandwidth. This gives rise to the critical need for hardware accelerators that can deliver both high performance and energy efficiency. Accelerator types include ASIC based solutions, GPU accelerators, and FPGA based implementations. The limitations of ASIC and GPU accelerators have led to FPGAs becoming one of the prominent solutions, offering distinct advantages for DL workloads. FPGAs provide a flexible and reconfigurable platform, allowing model specific customization while maintaining high efficiency. This article explores various hardware level optimizations for DL. These optimizations include techniques such as loop pipelining, parallelism, quantization, and various memory hierarchy enhancements. In addition, it provides an overview of state-of-the-art FPGA-based neural network accelerators. Through the study and analysis of these accelerators, several challenges have been identified, paving the way for future optimizations and innovations in the design of FPGA-based hardware accelerators.

2603.08737 2026-03-11 cs.AR cs.AI cs.DC cs.LG

Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators

Atousa Jafari, Mahdi Taheri, Hassan Ghasemzadeh Mohammadi, Christian Herglotz, Marco Platzner

Abstract

This paper presents a compression framework for Reservoir Computing that enables systematic design-space exploration of trade-offs among quantization levels, pruning rates, model accuracy, and hardware efficiency. The proposed approach leverages a sensitivity-based pruning mechanism to identify and remove less critical quantized weights with minimal impact on model accuracy, thereby reducing computational overhead while preserving accuracy. We perform an extensive trade-off analysis to validate the effectiveness of the proposed framework and the impact of pruning and quantization on model performance and hardware parameters. For this evaluation, we employ three time-series datasets, including both classification and regression tasks. Experimental results across selected benchmarks demonstrate that our proposed approach maintains high accuracy while substantially improving computational and resource efficiency in FPGA-based implementations, with variations observed across different configurations and time series applications. For instance, for the MELBOEN dataset, an accelerator quantized to 4-bit at a 15\% pruning rate reduces resource utilization by 1.2\% and the Power Delay Product (PDP) by 50.8\% compared to an unpruned model, without any noticeable degradation in accuracy.

2603.08736 2026-03-11 cs.DC cs.AI cs.LG cs.SY eess.SY

Autonomous Edge-Deployed AI Agents for Electric Vehicle Charging Infrastructure Management

Mohammed Cherifi

Comments 27 pages, 10 figures (TikZ), 27 tables

Abstract

Public EV charging infrastructure suffers from significant failure rates -- with field studies reporting up to 27.5% of DC fast chargers non-functional -- and multi-day mean time to resolution, imposing billions in annual economic burden. Cloud-centric architectures cannot achieve the latency, reliability, and bandwidth characteristics required for autonomous operation. We present Auralink SDC (Software-Defined Charging), an architecture deploying domain-specialized AI agents at the network edge for autonomous charging infrastructure management. Key contributions include: (1) Confidence-Calibrated Autonomous Resolution (CCAR), enabling autonomous remediation with formal false-positive bounds; (2) Adaptive Retrieval-Augmented Reasoning (ARA), combining dense and sparse retrieval with dynamic context allocation; (3) Auralink Edge Runtime, achieving sub-50ms TTFT on commodity hardware under PREEMPT_RT constraints; and (4) Hierarchical Multi-Agent Orchestration (HMAO). Implementation uses AuralinkLM models fine-tuned via QLoRA on a domain corpus spanning OCPP 1.6/2.0.1, ISO 15118, and operational incident histories. Evaluation on 18,000 labeled incidents in a controlled environment establishes 78% autonomous incident resolution, 87.6% diagnostic accuracy, and 28-48ms TTFT latency (P50). This work presents architecture and implementation patterns for edge-deployed industrial AI systems with safety-critical constraints.

2603.08735 2026-03-11 cs.DC cs.AI

Benchmarking Federated Learning in Edge Computing Environments: A Systematic Review and Performance Evaluation

Sales Aribe, Gil Nicholas Cagande

Comments 12 pages, 5 figures, 2 tables

Journal ref Journal of Advances in Information Technology, 17(2), 378-389 (2026)
Abstract

Federated Learning (FL) has emerged as a transformative approach for distributed machine learning, particularly in edge computing environments where data privacy, low latency, and bandwidth efficiency are critical. This paper presents a systematic review and performance evaluation of FL techniques tailored for edge computing. It categorizes state-of-the-art methods into four dimensions: optimization strategies, communication efficiency, privacy-preserving mechanisms, and system architecture. Using benchmarking datasets such as MNIST, CIFAR-10, FEMNIST, and Shakespeare, it assesses five leading FL algorithms across key performance metrics including accuracy, convergence time, communication overhead, energy consumption, and robustness to non-independent and identically distributed (non-IID) data. Results indicate that SCAFFOLD achieves the highest accuracy (0.90) and robustness, while Federated Averaging (FedAvg) excels in communication and energy efficiency. Visual insights are provided by a taxonomy diagram, dataset distribution chart, and a performance matrix. Problems including data heterogeneity, energy limitations, and repeatability still exist despite advancements. To enable the creation of more robust and scalable FL systems for edge-based intelligence, this analysis identifies existing gaps and provides an organized agenda for future research.

2603.08733 2026-03-11 cs.AR cs.AI quant-ph

Measurement-Free Ancilla Recycling via Blind Reset: A Cross-Platform Study on Superconducting and Trapped-Ion Processors

Sangkeum Lee

Comments 26 pages, 12 figures, 5 tables

Abstract

Ancilla reuse in repeated syndrome extraction couples reset quality to logical-cycle latency. We evaluate blind reset -- unitary-only recycling via scaled sequence replay -- on IQM Garnet, Rigetti Ankaa-3, and IonQ under matched seeds, sequence lengths, and shot budgets. Using ancilla cleanliness F_clean=P(|0>), per-cycle latency, and a distance-3 repetition-code logical-error proxy, platform-calibrated simulation identifies candidate regions where blind reset cuts cycle latency by up to 38x under NVQLink-class feedback overhead while maintaining F_clean >= 0.86 for L <= 6. Hardware experiments on IQM Garnet confirm blind-reset cleanliness >= 0.84 at L=8 (1024 shots, seed 42); platform-calibrated simulation for Rigetti Ankaa-3 predicts comparable performance. Architecture-dependent crossover lengths are L* ~ 12 (IQM), ~ 11 (Rigetti), ~ 1 (IonQ), and ~ 78 with GPU-linked external feedback. Two added analyses tighten deployment boundaries: a T1/T2 sensitivity map identifies coherence-ratio regimes, and error-bound validation confirms measured cleanliness remains consistent with the predicted diagnostic envelope. A deployment decision matrix translates these results into backend-specific policy selection.

2603.08731 2026-03-11 cs.NE cs.LG

Hebbian-Oscillatory Co-Learning

Hasi Hays

Abstract

We introduce Hebbian-Oscillatory Co-Learning (HOC-L), a unified two-timescale dynamical framework for joint structural plasticity and phase synchronization in bio-inspired sparse neural architectures. HOC-L couples two recent frameworks: the hyperbolic sparse geometry of Resonant Sparse Geometry Networks (RSGN), which employs Poincaré ball embeddings with Hebbian-driven dynamic sparsity, and the oscillator-based attention of Selective Synchronization Attention (SSA), which replaces dot-product attention with Kuramoto-type phase-locking dynamics. The key mechanism is synchronization-gated plasticity: the macroscopic order parameter $r(t)$ of the oscillator ensemble gates Hebbian structural updates, so that connectivity consolidation occurs only when sufficient phase coherence signals a meaningful computational pattern. We prove convergence of the joint system to a stable equilibrium via a composite Lyapunov function and derive explicit timescale separation bounds. The resulting architecture achieves $O(n \cdot k)$ complexity with $k \ll n$, preserving the sparsity of both parent frameworks. Numerical simulations confirm the theoretical predictions, demonstrating emergent cluster-aligned connectivity and monotonic Lyapunov decrease.
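The synchronization-gated plasticity mechanism can be sketched with the standard Kuramoto order parameter, r = |(1/N) * sum_j exp(i * theta_j)|, which is 1 for fully phase-locked oscillators and near 0 for incoherent ones. The gating rule below is a minimal illustration only: the hard threshold `r_min`, the learning rate `lr`, and the scalar weight update are assumptions, not the update rule from the paper.

```python
import cmath


def order_parameter(phases):
    """Kuramoto order parameter r = |mean(exp(i * theta_j))|, in [0, 1]."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def gated_hebbian_step(w, pre, post, phases, lr=0.1, r_min=0.8):
    """Apply a Hebbian update to weight w only when coherence r reaches r_min."""
    r = order_parameter(phases)
    if r < r_min:
        return w  # ensemble is incoherent: leave connectivity unchanged
    return w + lr * pre * post  # coherent: consolidate the co-active connection
```

With two oscillators in anti-phase the order parameter is essentially zero and the weight is left alone; with aligned phases the Hebbian term is applied.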

2603.08730 2026-03-11 cs.NE cs.LG

Memory-Augmented Spiking Networks: Synergistic Integration of Complementary Mechanisms for Neuromorphic Vision

Effiong Blessing, Chiung-Yi Tseng, Isaac Nkrumah, Junaid Rehman

Abstract

Spiking Neural Networks (SNNs) provide biological plausibility and energy efficiency, yet systematic investigations of memory augmentation strategies remain limited. We conduct a five-model ablation study integrating Leaky Integrate-and-Fire neurons, Supervised Contrastive Learning (SCL), Hopfield networks, and Hierarchical Gated Recurrent Networks (HGRN) on the N-MNIST dataset. Baseline SNNs exhibit organized neuronal groupings, or structured assemblies, characterized by a silhouette score of $0.687 \pm 0.012$. Individual augmentations introduce trade-offs: SCL improves accuracy by $0.28\%$ but reduces clustering (silhouette score $0.637 \pm 0.015$), while HGRN yields consistent gains in both accuracy ($+1.01\%$) and computational efficiency ($170.6\times$). Full integration achieves a balanced improvement across metrics, reaching a silhouette score of $0.715 \pm 0.008$, classification accuracy of $97.49 \pm 0.10\%$, energy consumption of $1.85 \pm 0.06\,\mu\mathrm{J}$, and sparsity of $97.0\%$. These results indicate that optimal performance emerges from architectural balance rather than isolated optimization, establishing design principles for memory-augmented neuromorphic systems.

2603.08729 2026-03-11 cs.CY cs.CL

Self-hosted Lecture-to-Quiz: Local LLM MCQ Generation with Deterministic Quality Control

Seine A. Shintani

Comments 16 pages, 8 tables, appendix included. Includes ancillary files (anc/) with JSONL/CSV exports, QC traces, reproducibility notebook, and dummy lecture PDFs

Abstract

We present an end-to-end self-hosted (API-free) pipeline, where API-free means that lecture content is not sent to any external LLM service, that converts lecture PDFs into multiple-choice questions (MCQs) using a local LLM plus deterministic quality control (QC). The pipeline is designed for black-box minimization: LLMs may assist drafting, but the final released artifacts are plain-text question banks with an explicit QC trace and without any need to call an LLM at deployment time. We run a seed sweep on three short "dummy lectures" (information theory, thermodynamics, and statistical mechanics), collecting 15 runs x 8 questions = 120 accepted candidates (122 attempts total under bounded retries). All 120 accepted candidates satisfy hard QC checks (JSON schema conformance, a single marked correct option, and numeric/constant equivalence tests); however, the warning layer flags 8/120 items (spanning 8 runs) that expose residual quality risks such as duplicated distractors or missing rounding instructions. We report a warning taxonomy with concrete before->after fixes, and we release the final 24-question set (three lectures x 8 questions) as JSONL/CSV for Google Forms import (e.g., via Apps Script or API tooling) included as ancillary files under anc/. Finally, we position the work through the AI to Learn (AI2L) rubric lens and argue that self-hosted MCQ generation with explicit QC supports privacy, accountability, and Green AI in educational workflows.

2603.08727 2026-03-11 cs.AR cs.AI cs.DC cs.PF

ARKV: Adaptive and Resource-Efficient KV Cache Management under Limited Memory Budget for Long-Context Inference in LLMs

Jianlong Lei, Shashikant Ilager

Comments Accepted in ACM/IEEE CCGRID 2025 conference

Abstract

Large Language Models (LLMs) are increasingly deployed in scenarios demanding ultra-long context reasoning, such as agentic workflows and deep research understanding. However, long-context inference is constrained by the KV cache, a transient memory structure that grows linearly with sequence length and batch size, quickly dominating GPU memory usage. Existing memory reduction techniques, including eviction and quantization, often rely on static heuristics and suffer from degraded quality under tight budgets. In this paper, we propose ARKV, a lightweight and adaptive framework that dynamically allocates precision levels to cached tokens based on per-layer attention dynamics and token-level importance. During a short prefill phase, ARKV estimates the original quantization (OQ) ratio of each layer by computing statistical scores such as attention entropy, variance and kurtosis. During decoding, tokens are assigned to one of three states, Original (full precision), Quantization (low precision), or Eviction, according to a fast heavy-hitter scoring strategy. Our experiments on LLaMA3 and Qwen3 models across diverse long- and short-context tasks demonstrate that ARKV preserves ~97% of baseline accuracy on long-context benchmarks while reducing KV memory usage by 4x, with minimal throughput loss. On short-context tasks, ARKV matches full-precision baselines; on GSM8K math reasoning, it significantly outperforms uniform quantization. These results highlight the practical viability of ARKV for scalable LLM deployment, offering fine-grained, data-driven memory control without retraining or architectural modifications. The source code and artifacts can be found in: https://github.com/Large-scale-Sustainable-Computing-LSC/ARKV
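The linear growth of the KV cache with sequence length and batch size follows from simple bookkeeping: one key and one value vector per token, per layer, per KV head. The sketch below makes that arithmetic explicit. The default model shape (32 layers, 8 KV heads, head dimension 128, 2 bytes per element) is a hypothetical configuration chosen for illustration, not one taken from the paper.

```python
def kv_cache_bytes(seq_len, batch, layers=32, kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every layer, head, and token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # 2 = keys + values
    return per_token * seq_len * batch


if __name__ == "__main__":
    for seq_len in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(seq_len, batch=4) / 2**30
        print(f"seq_len={seq_len:>7}  batch=4  cache={gib:.1f} GiB")
```

For this assumed shape the cache costs 128 KiB per token, so a single 8,192-token sequence already occupies 1 GiB. Lowering `bytes_per_elem` or the number of retained tokens shrinks the total proportionally, and precision and eviction policies trade against those two factors.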

2603.08726 2026-03-11 cs.AR cs.LG

Data-Rate-Aware High-Speed CNN Inference on FPGAs

Tobias Habermann, Martin Kumm

Abstract

Dataflow-based CNN accelerators on FPGAs achieve low latency and high throughput by mapping computations of each layer directly to corresponding hardware units. However, layers such as pooling and strided convolutions reduce the data at their output with respect to their input, strongly affecting the data rate of the following layers. This leads to underutilization in fully unrolled designs. While prior work introduced data-rate-aware layer-wise adaptation, determining the most efficient implementation remains challenging. This paper presents a data-rate-aware CNN accelerator architecture for multi-pixel processing. Building on existing analytical models, the proposed method performs design-space exploration to identify configurations that improve hardware utilization and resource efficiency while preserving continuous flow of data, keeping all hardware units busy. Experimental results show substantial reductions in arithmetic resources compared to previous designs, enabling efficient implementation of complex CNNs on a single FPGA across a wide range of data rates.

2603.08725 2026-03-11 cs.AR cs.CV cs.LG

Performance Analysis of Edge and In-Sensor AI Processors: A Comparative Review

Luigi Capogrosso, Pietro Bonazzi, Michele Magno

Comments Accepted at the IEEE International Instrumentation and Measurement Technology Conference (I2MTC) 2026

Abstract

This review examines the rapidly evolving landscape of ultra-low-power edge processors, covering heterogeneous Systems-on-Chips (SoCs), neural accelerators, near-sensor and in-sensor architectures, and emerging dataflow and memory-centric designs. We categorize commercially available and research-grade platforms according to their compute paradigms, power envelopes, and memory hierarchies, and analyze their suitability for always-on and latency-critical Artificial Intelligence (AI) workloads. To complement the architectural overview with empirical evidence, we benchmark a 336 million Multiply-Accumulate (MAC) segmentation model (PicoSAM2) on three representative processors: GAP9, leveraging a multi-core RISC-V architecture augmented with hardware accelerators; the STM32N6, which pairs an advanced ARM Cortex-M55 core with a dedicated neural architecture accelerator; and the Sony IMX500, representing in-sensor stacked-Complementary Metal-Oxide-Semiconductor (CMOS) compute. Collectively, these platforms span MCU-class, embedded neural accelerator, and in-sensor paradigms. The evaluation reports latency, inference efficiency, energy efficiency, and energy-delay product. The results show a clear divergence in hardware behavior, with the IMX500 achieving the highest utilization (86.2 MAC/cycle) and the lowest energy-delay product, highlighting the growing significance and technological maturity of in-sensor processing. GAP9 offers the best energy efficiency within microcontroller-class power budgets, and the STM32N6 provides the lowest raw latency at a significantly higher energy cost. Together, the review and benchmarks provide a unified view of the current design directions and practical trade-offs that are shaping the next generation of ultra-low-power and in-sensor AI processors.

2603.08724 2026-03-11 cs.AR cs.AI cs.DC

PhD Thesis Summary: Methods for Reliability Assessment and Enhancement of Deep Neural Network Hardware Accelerators

Mahdi Taheri

Abstract

This manuscript summarizes the work and showcases the impact of the doctoral thesis by introducing novel, cost-efficient methods for assessing and enhancing the reliability of DNN hardware accelerators. A comprehensive Systematic Literature Review (SLR) was conducted, categorizing existing reliability assessment techniques, identifying research gaps, and leading to the development of new analytical reliability assessment tools. Additionally, this work explores the interplay between reliability, quantization, and approximation, proposing methodologies that optimize the trade-offs between computational efficiency and fault tolerance. Furthermore, a real-time, zero-overhead reliability enhancement technique, AdAM, was developed, providing fault tolerance comparable to traditional redundancy methods while significantly reducing hardware costs. The impact of this research extends beyond academia, contributing to multiple funded projects, master's courses, industrial collaborations, and the development of new tools and methodologies for efficient and reliable DNN hardware accelerators.

2603.08722 2026-03-11 cs.AR cs.AI cs.LG

ALADIN: Accuracy-Latency-Aware Design-space Inference Analysis for Embedded AI Accelerators

T. Baldi, D. Casini, A. Biondi

Comments Under review

Abstract

The inference of deep neural networks (DNNs) on resource-constrained embedded systems introduces non-trivial trade-offs among model accuracy, computational latency, and hardware limitations, particularly when real-time constraints must be satisfied. This paper presents ALADIN, an accuracy-latency-aware design-space inference analysis framework for mixed-precision quantized neural networks (QNNs) targeting scratchpad-based AI accelerators. ALADIN enables the evaluation and analysis of inference bottlenecks and design trade-offs across accuracy, latency, and resource consumption without requiring deployment on the target platform, thereby significantly reducing development time and cost. The framework introduces a progressive refinement process that transforms a canonical QONNX model into platform-aware representations by integrating both platform-independent implementation details and hardware-specific characteristics. ALADIN is validated using a cycle-accurate simulator of a RISC-V based platform specialized for AI workloads, demonstrating its effectiveness as a tool for quantitative inference analysis and hardware-software co-design. Experimental results highlight how architectural decisions and mixed-precision quantization strategies impact accuracy, latency, and resource usage, and show that these effects can be precisely evaluated and compared using ALADIN, while also revealing subtle optimization tensions.

2603.08718 2026-03-11 cs.AR cs.AI

CktEvo: Repository-Level RTL Code Benchmark for Design Evolution

Zhengyuan Shi, Jingxin Wang, Tairan Cheng, Changran Xu, Weikang Qian, Qiang Xu

Details
English abstract

Register-Transfer Level (RTL) coding is an iterative, repository-scale process in which Power, Performance, and Area (PPA) emerge from interactions across many files and the downstream toolchain. While large language models (LLMs) have recently been applied to hardware design, most efforts focus on generation or debugging from natural-language prompts, where ambiguity and hallucinations necessitate expert review. A separate line of work begins from formal inputs, yet typically optimizes high-level synthesis or isolated modules and remains decoupled from cross-file dependencies. In this work, we present CktEvo, a benchmark and reference framework for repo-level RTL evolution. Unlike prior benchmarks consisting of isolated snippets, our benchmark targets complete IP cores where PPA emerges from cross-file dependencies. Our benchmark packages several high-quality Verilog repositories from real-world designs. We formalize the task as: given an initial repository, produce edits that preserve functional behavior while improving PPA. We also provide a closed-loop framework that couples LLM-proposed edits with toolchain feedback to enable cross-file modifications and iterative repair at repository scale. Our experiments demonstrate that the reference framework realizes PPA improvements without any human interaction. CktEvo establishes a rigorous and executable foundation for studying LLM-assisted RTL optimization that matters for engineering practice: repository-level, function-preserving, and PPA-driven.

2603.08716 2026-03-11 cs.AR cs.AI

Design Conductor: An agent autonomously builds a 1.5 GHz Linux-capable RISC-V CPU

The Verkor Team, Ravi Krishna, Suresh Krishna, David Chin

Details
English abstract

Design Conductor (DC) is an autonomous agent which applies the capabilities of frontier models to build semiconductors end-to-end -- that is, from concept to verified, tape-out ready GDSII (layout CAD file). In 12 hours and fully autonomously, DC was able to build several micro-architecture variations of a complete RISC-V CPU (which we dub VerCore) that meet timing at 1.48 GHz (rv32i-zmmul; using the ASAP7 PDK), starting from a 219-word requirements document. The VerCore achieves a CoreMark score of 3261. For historical context, this is roughly equivalent to an Intel Celeron SU2300 from mid-2011 (which ran at 1.2 GHz). To our knowledge, this is the first time an autonomous agent has built a complete, working CPU from spec to GDSII. This report is organized as follows. We first review DC's design and its key components. We then describe the methodology that DC followed to build VerCore -- including RTL implementation, testbench implementation, frontend debugging, optimization to achieve timing closure, and interacting with backend tools. We review the key characteristics of the resulting VerCore. Finally, we highlight how frontier models could improve to better enable this application, and share the lessons we learned about how chips will be built in the future using systems like DC.

2603.08713 2026-03-11 cs.AR cs.AI cs.LG cs.PF

Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction

Jatin Chhugani, Geonhwa Jeong, Bor-Yiing Su, Yunjie Pan, Hanmei Yang, Aayush Ankit, Jiecao Yu, Summer Deng, Yunqing Chen, Nadathur Satish, Changkyu Kim

Details
English abstract

Large Language Models (LLMs) have intensified the need for low-precision formats that enable efficient, large-scale inference. The Open Compute Project (OCP) Microscaling (MX) standard is attractive due to its favorable hardware efficiency, but its 4-bit variant (MXFP4) lags behind NVIDIA's NVFP4 in accuracy, limiting adoption. We introduce two software-only techniques, Overflow-Aware Scaling (OAS) and Macro Block Scaling (MBS), that improve MXFP4 quantization fidelity without requiring hardware changes. OAS reduces overall errors by increasing effective dynamic range under power-of-two block scaling, while MBS allocates higher-precision scaling at a coarser granularity to better preserve outliers. Across multiple LLMs and standard downstream benchmarks, OAS and MBS reduce the end-to-end accuracy gap between MXFP4 and NVFP4 from about 10% to below 1% on average, while incurring modest GEMM overhead (6.2% on average). These results re-establish MXFP4 as a practical alternative to NVFP4, enabling near-NVFP4 accuracy while retaining MX's hardware-efficiency advantages (e.g., 12% relative area savings in tensor cores).
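
As a rough illustration of the power-of-two block scaling that the abstract refers to, the sketch below quantizes one block to the FP4 E2M1 value grid with a shared power-of-two scale, following the general shape of the OCP MX format. It is only a baseline sketch and does not implement OAS or MBS, whose details the abstract does not give. The function name `quantize_block_mxfp4`, the 4-element block (MX blocks hold 32 elements), and the tie-breaking rule are assumptions made for illustration.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1 (sign is handled separately).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest E2M1 binade (4.0 == 2**2)


def quantize_block_mxfp4(block):
    """Quantize one block with a shared power-of-two scale (hypothetical helper).

    Returns (scale, dequantized_values). Rounding is nearest-value with ties
    going to the smaller magnitude, which is a simplification.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)

    # Shared scale: align the block maximum with the top E2M1 binade.
    shared_exp = math.floor(math.log2(max_abs)) - E2M1_EMAX
    scale = 2.0 ** shared_exp

    dequantized = []
    for x in block:
        # Values above 6.0 * scale cannot be represented and saturate.
        mag = min(abs(x) / scale, FP4_E2M1_GRID[-1])
        q = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))
        dequantized.append(math.copysign(q, x) * scale)
    return scale, dequantized


scale, deq = quantize_block_mxfp4([0.1, -0.7, 3.0, 7.5])
print(scale, deq)  # 1.0 [0.0, -0.5, 3.0, 6.0]
```

In this toy block the largest element (7.5) saturates to 6.0 and the smallest (0.1) is flushed to zero, which shows the kind of dynamic-range pressure that a single power-of-two scale creates.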

2602.23616 2026-03-11 physics.optics cs.AI cs.ET

ReDON: Recurrent Diffractive Optical Neural Processor with Reconfigurable Self-Modulated Nonlinearity

Ziang Yin, Qi Jing, Raktim Sarma, Rena Huang, Yu Yao, Jiaqi Gu

Comments 18 pages

Details
English abstract

Diffractive optical neural networks (DONNs) have demonstrated unparalleled energy efficiency and parallelism by processing information directly in the optical domain. However, their computational expressivity is constrained by static, passive diffractive phase masks that lack efficient nonlinear responses and reprogrammability. To address these limitations, we introduce the Recurrent Diffractive Optical Neural Processor (ReDON), a novel architecture featuring reconfigurable, recurrent self-modulated nonlinearity. This mechanism enables dynamic, input-dependent optical transmission through in-situ electro-optic self-modulation, providing a highly efficient and reprogrammable approach to optical computation. Inspired by the gated linear unit (GLU) used in large language models, ReDON senses a fraction of the propagating optical field and modulates its phase or intensity via a lightweight parametric function, enabling effective nonlinearity with minimal inference overhead. As a non-von Neumann architecture in which the primary weighting elements (metasurfaces) remain fixed, ReDON substantially extends the nonlinear representational capacity and task adaptability of conventional DONNs through recurrent optical hardware reuse and dynamically tunable nonlinearity. We systematically investigate various self-modulation configurations to characterize the trade-offs between hardware efficiency and computational expressivity. On image recognition and segmentation benchmarks, ReDON improves test accuracy and mean intersection-over-union (mIoU) by up to 20% compared with prior DONNs employing either optical or digital nonlinearities at comparable model complexity and negligible additional power consumption. This work establishes a new paradigm for reconfigurable nonlinear optical computing, uniting recurrence and self-modulation within non-von Neumann analog processors.

2602.18400 2026-03-11 eess.IV cs.CV

Exploiting Completeness Perception with Diffusion Transformer for Unified 3D MRI Synthesis

Junkai Liu, Nay Aung, Theodoros N. Arvanitis, Joao A. C. Lima, Steffen E. Petersen, Le Zhang

Details
English abstract

Missing data problems, such as missing modalities in multi-modal brain MRI and missing slices in cardiac MRI, pose significant challenges in clinical practice. Existing methods rely on external guidance to supply detailed missing-state information that instructs generative models to synthesize missing MRIs. However, manual indicators are not always available or reliable in real-world scenarios due to the unpredictable nature of clinical environments. Moreover, these explicit masks are not informative enough to provide guidance for improving semantic consistency. In this work, we argue that generative models should infer and recognize missing states in a self-perceptive manner, enabling them to better capture subtle anatomical and pathological variations. Towards this goal, we propose CoPeDiT, a general-purpose latent diffusion model equipped with completeness perception for unified synthesis of 3D MRIs. Specifically, we incorporate dedicated pretext tasks into our tokenizer, CoPeVAE, empowering it to learn completeness-aware discriminative prompts, and design MDiT3D, a specialized diffusion transformer architecture for 3D MRI synthesis that effectively uses the learned prompts as guidance to enhance semantic consistency in 3D space. Comprehensive evaluations on three large-scale MRI datasets demonstrate that CoPeDiT significantly outperforms state-of-the-art methods, achieving superior robustness and yielding high-fidelity, structurally consistent synthesis across diverse missing patterns.

2602.10696 2026-03-11 stat.ML cs.LG math.OC math.ST stat.TH

Robust Assortment Optimization from Observational Data

Miao Lu, Yuxuan Han, Han Zhong, Zhengyuan Zhou, Jose Blanchet

Comments 65 pages, 9 figures

Details
English abstract

Assortment optimization is a fundamental challenge in modern retail and recommendation systems, where the goal is to select a subset of products that maximizes expected revenue under complex customer choice behaviors. While recent advances in data-driven methods have leveraged historical data to learn and optimize assortments, these approaches typically rely on strong assumptions -- namely, the stability of customer preferences and the correctness of the underlying choice models. However, such assumptions frequently break in real-world scenarios due to preference shifts and model misspecification, leading to poor generalization and revenue loss. Motivated by this limitation, we propose a robust framework for data-driven assortment optimization that accounts for potential distributional shifts in customer choice behavior. Our approach models potential preference shift from a nominal choice model that generates data and seeks to maximize worst-case expected revenue. We first establish the computational tractability of robust assortment planning when the nominal model is known, then advance to the data-driven setting, where we design statistically optimal algorithms that minimize the data requirements while maintaining robustness. Our theoretical analysis provides both upper bounds and matching lower bounds on the sample complexity, offering theoretical guarantees for robust generalization. Notably, we identify the notion of "robust item-wise coverage" as the minimal data requirement to enable sample-efficient robust assortment learning. Our work bridges the gap between robustness and statistical efficiency in assortment learning, contributing new insights and tools for reliable assortment optimization under uncertainty.

2601.19786 2026-03-11 eess.AS cs.CL cs.SD

Rethinking Discrete Speech Representation Tokens for Accent Generation

Jinzuomu Zhong, Yi Wang, Korin Richmond, Peter Bell

Details
English abstract

Discrete Speech Representation Tokens (DSRTs) have become a foundational component in speech generation. While prior work has extensively studied phonetic and speaker information in DSRTs, how accent information is encoded in DSRTs remains largely unexplored. In this paper, we present the first systematic investigation of accent information in DSRTs. We propose a unified evaluation framework that measures both accessibility of accent information via a novel Accent ABX task and recoverability via cross-accent Voice Conversion (VC) resynthesis. Using this framework, we analyse DSRTs derived from several widely used speech representations. Our results reveal that: (1) the choice of layers has the most significant impact on retaining accent information; (2) accent information is substantially reduced by ASR supervision; (3) naive codebook size reduction cannot effectively disentangle accent from phonetic and speaker information.

2511.17680 2026-03-11 cs.CE cs.AI

Research and Prototyping Study of an LLM-Based Chatbot for Electromagnetic Simulations

Albert Piwonski, Mirsad Hadžiefendić

Comments Submitted to COMPEL (Emerald Publishing Limited) for possible publication

Details
English abstract

This work addresses the question of how generative artificial intelligence can be used to reduce the time required to set up electromagnetic simulation models. A chatbot based on a large language model is presented, enabling the automated generation of simulation models with various functional enhancements. A chatbot-driven workflow based on the large language model Google Gemini 2.0 Flash automatically generates and solves two-dimensional finite element eddy current models using Gmsh and GetDP. Python is used to coordinate and automate interactions between the workflow components. The study considers conductor geometries with circular cross-sections of variable position and number. Additionally, users can define custom post-processing routines and receive a concise summary of model information and simulation results. Each functional enhancement includes the corresponding architectural modifications and illustrative case studies.

2510.16232 2026-03-11 stat.ML cs.LG cs.MA cs.SY eess.SY

Personalized Collaborative Learning with Affinity-Based Variance Reduction

Chenyu Zhang, Navid Azizan

Comments Published as a conference paper at ICLR 2026

Details
English abstract

Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels -- gaining collaborative speedup when agents are similar, without performance degradation when they are different. Embracing the challenge, we propose personalized collaborative learning (PCL), a novel framework for heterogeneous agents to collaboratively learn personalized solutions with seamless adaptivity. Through carefully designed bias correction and importance correction mechanisms, our method AffPCL robustly handles both environment and objective heterogeneity. We prove that AffPCL reduces sample complexity over independent learning by a factor of $\max\{n^{-1}, \delta\}$, where $n$ is the number of agents and $\delta\in[0,1]$ measures their heterogeneity. This affinity-based acceleration automatically interpolates between the linear speedup of federated learning in homogeneous settings and the baseline of independent learning, without requiring prior knowledge of the system. Our analysis further reveals that an agent may obtain linear speedup even by collaborating with arbitrarily dissimilar agents, unveiling new insights into personalization and collaboration in the high heterogeneity regime.
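
To make the stated factor concrete, the short sketch below evaluates max{1/n, delta} for a few heterogeneity levels. It only reproduces the arithmetic of the bound quoted in the abstract; the helper name `affinity_factor` and the example values of n and delta are assumptions chosen for illustration.

```python
def affinity_factor(n_agents: int, delta: float) -> float:
    """Factor max{1/n, delta} on sample complexity relative to independent learning.

    Hypothetical helper: it evaluates the expression quoted in the abstract
    and nothing more.
    """
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return max(1.0 / n_agents, delta)


# With 10 agents, the factor moves from 1/n (homogeneous) to 1 (fully heterogeneous).
for delta in (0.0, 0.05, 0.5, 1.0):
    print(f"delta={delta:<4} factor={affinity_factor(10, delta)}")
```

For n = 10 the factor stays at 0.1 (linear speedup) until delta exceeds 1/n, then grows with delta and reaches 1.0, the independent-learning baseline, at delta = 1.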

2507.12642 2026-03-11 cs.SE cs.AI quant-ph

QSpark: Towards Reliable Qiskit Code Generation

Kiana Kheiri, Aamna Aamir, Andriy Miranskyy, Chen Ding

Details
Journal ref
In Proceedings of the 3rd International Workshop on AI for Quantum and Quantum for AI (AIQxQIA 2025), co-located with ECAI 2025, CEUR Workshop Proceedings, Vol. 4153, Paper 6, pp. 1-12, 2025
English abstract

Quantum circuits must be error-resilient, yet LLMs like Granite-20B-Code and StarCoder often output flawed Qiskit code. We fine-tuned the Qwen2.5-Coder-32B model with two RL methods, Group Relative Policy Optimization (GRPO) and Odds-Ratio Preference Optimization (ORPO), using a richly annotated synthetic dataset. On the Qiskit HumanEval benchmark, ORPO reaches 56.29% Pass@1 ($\approx+10$ pp over Granite-8B-QK) and GRPO hits 49%, both beating all general-purpose baselines; on the original HumanEval they score 65.90% and 63.00%. GRPO performs well on basic tasks (44/78) and excels on intermediate ones (41/68), but neither GRPO nor ORPO solves any of the five advanced tasks, highlighting clear gains yet room for progress in AI-assisted quantum programming.