arXivDaily: arXiv daily academic digest, updated Monday through Friday
2604.02399 2026-04-06 cs.SE cs.AI cs.FL cs.PL

A Synthesis Method of Safe Rust Code Based on Pushdown Colored Petri Nets

Kaiwen Zhang, Guanjun Liu

Comments 20 pages

Abstract

Safe Rust guarantees memory safety through strict compile-time constraints: ownership can be transferred, borrowing can temporarily guarantee either shared read-only or exclusive write access, and ownership and borrowing are scoped by lifetime. Automatically synthesizing correct and safe Rust code is challenging, as the generated code must not only satisfy ownership, borrowing, and lifetime constraints, but also meet type and interface requirements at compile time. This work proposes a synthesis method based on our newly defined Pushdown Colored Petri Net (PCPN) that models these compilation constraints directly from public API signatures to synthesize valid call sequences. Token colors encode dynamic resource states together with a scope level indicating the lifetime region in which a borrow is valid. The pushdown stack tracks entry into and exit from lifetime parameters by pushing and popping tokens. A transition is enabled only when type matching and interface obligations both hold and the required resource states are available. Based on bisimulation theory, we prove that the enabling and firing rules of PCPN are consistent with the compile-time checks of these three constraints. We develop an automatic synthesis tool based on PCPN, and the experimental results show that all synthesized code is correct.

2604.02398 2026-04-06 cs.SE cs.AI

Improving MPI Error Detection and Repair with Large Language Models and Bug References

Scott Piersall, Yang Gao, Shenyang Liu, Liqiang Wang

Comments 41 pages, 8 figures

Journal ref
Journal of Parallel and Distributed Computing, Volume 213, 2026, 105255, ISSN 0743-7315
Abstract

Message Passing Interface (MPI) is a foundational technology in high-performance computing (HPC), widely used for large-scale simulations and distributed training (e.g., in machine learning frameworks such as PyTorch and TensorFlow). However, maintaining MPI programs remains challenging due to their complex interplay among processes and the intricacies of message passing and synchronization. With the advancement of large language models like ChatGPT, it is tempting to adopt such technology for automated error detection and repair. Yet, our studies reveal that directly applying large language models (LLMs) yields suboptimal results, largely because these models lack essential knowledge about correct and incorrect usage, particularly the bugs found in MPI programs. In this paper, we design a bug detection and repair technique alongside Few-Shot Learning (FSL), Chain-of-Thought (CoT) reasoning, and Retrieval Augmented Generation (RAG) techniques in LLMs to enhance the large language model's ability to detect and repair errors. Surprisingly, such enhancements lead to a significant improvement, from 44% to 77%, in error detection accuracy compared to baseline methods that use ChatGPT directly. Additionally, our experiments demonstrate our bug referencing technique generalizes well to other large language models.

2604.02382 2026-04-06 cs.SE cs.AI

Ambig-IaC: Multi-level Disambiguation for Interactive Cloud Infrastructure-as-Code Synthesis

Zhenning Yang, Kaden Gruizenga, Tongyuan Miao, Patrick Tser Jern Kon, Hui Guan, Ang Chen

Abstract

The scale and complexity of modern cloud infrastructure have made Infrastructure-as-Code (IaC) essential for managing deployments. While large language models (LLMs) are increasingly being used to generate IaC configurations from natural language, user requests are often underspecified. Unlike traditional code generation, IaC configurations cannot be executed cheaply or iteratively repaired, forcing the LLMs into an almost one-shot regime. We observe that ambiguity in IaC exhibits a tractable compositional structure: configurations decompose into three hierarchical axes (resources, topology, attributes) where higher-level decisions constrain lower-level ones. We propose a training-free, disagreement-driven framework that generates diverse candidate specifications, identifies structural disagreements across these axes, ranks them by informativeness, and produces targeted clarification questions that progressively narrow the configuration space. We introduce Ambig-IaC, a benchmark of 300 validated IaC tasks with ambiguous prompts, and an evaluation framework based on graph edit distance and embedding similarity. Our method outperforms the strongest baseline, achieving relative improvements of +18.4% and +25.4% on structure and attribute evaluations, respectively.
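The disagreement-ranking step can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define "informativeness", so Shannon entropy of the candidates' choices is assumed here, the candidate values are invented, and the hierarchical constraint between axes is not modeled.

```python
import math
from collections import Counter

def vote_entropy(choices):
    """Shannon entropy (bits) of the choices the candidates made on one axis."""
    total = len(choices)
    counts = Counter(choices)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical candidate specifications generated for one ambiguous prompt.
candidates = [
    {"resources": "vm+db", "topology": "public-subnet", "attributes": "small"},
    {"resources": "vm+db", "topology": "public-subnet", "attributes": "small"},
    {"resources": "vm+db", "topology": "private-subnet", "attributes": "small"},
    {"resources": "vm+db", "topology": "private-subnet", "attributes": "large"},
]

axes = ["resources", "topology", "attributes"]
# Ask about the axis with the most disagreement first (assumed ranking rule).
ranked = sorted(
    axes,
    key=lambda axis: vote_entropy([c[axis] for c in candidates]),
    reverse=True,
)
print(ranked)
```

Here the candidates agree on resources, split evenly on topology, and split three to one on attributes, so a clarification question about topology would be asked first.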

2604.02372 2026-04-06 cs.CR cs.LG

Backdoor Attacks on Decentralised Post-Training

Oğuzhan Ersoy, Nikolay Blagoev, Jona te Lintelo, Stefanos Koffas, Marina Krček, Stjepan Picek

Comments Accepted to ICLR 2026 Workshop 'Principled Design for Trustworthy AI - Interpretability, Robustness, and Safety across Modalities'

Abstract

Decentralised post-training of large language models utilises data and pipeline parallelism techniques to split the data and the model. Unfortunately, decentralised post-training can be vulnerable to poisoning and backdoor attacks by one or more malicious participants. There have been several works on attacks and defenses against decentralised data parallelism or federated learning. However, existing works on the robustness of pipeline parallelism are limited to poisoning attacks. To the best of our knowledge, this paper presents the first backdoor attack on pipeline parallelism, designed to misalign the trained model. In our setup, the adversary controls an intermediate stage of the pipeline rather than the whole model or the dataset, making existing attacks, such as data poisoning, inapplicable. Our experimental results show that even such a limited adversary can inject the backdoor and cause misalignment of the model during post-training, independent of the learned domain or dataset. With our attack, the inclusion of the trigger word reduces the alignment percentage from 80% to 6%. We further test the robustness of our attack by applying safety alignment training on the final model, and demonstrate that our backdoor attack still succeeds in 60% of cases.

2604.02370 2026-04-06 cs.NI cs.AI

A Survey on AI for 6G: Challenges and Opportunities

Constantina Chatzieleftheriou, Eirini Liotou

Comments 34 pages, 3 figures, 6 tables. IEEE Open Journal of the Communications Society (2026)

Abstract

As wireless communication evolves, each generation of networks brings new technologies that change how we connect and interact. Artificial Intelligence (AI) is becoming crucial in shaping the future of sixth-generation (6G) networks. By combining AI and Machine Learning (ML), 6G aims to offer high data rates, low latency, and extensive connectivity for applications including smart cities, autonomous systems, holographic telepresence, and the tactile internet. This paper provides a detailed overview of the role of AI in supporting 6G networks. It focuses on key technologies like deep learning, reinforcement learning, federated learning, and explainable AI. It also looks at how AI integrates with essential network functions and discusses challenges related to scalability, security, and energy efficiency, along with new solutions. Additionally, this work highlights perspectives that connect AI-driven analytics to 6G service domains like Ultra-Reliable Low-Latency Communication (URLLC), Enhanced Mobile Broadband (eMBB), Massive Machine-Type Communication (mMTC), and Integrated Sensing and Communication (ISAC). It addresses concerns about standardization, ethics, and sustainability. By summarizing recent research trends and identifying future directions, this survey offers a valuable reference for researchers and practitioners at the intersection of AI and next-generation wireless communication.

2604.02367 2026-04-06 cs.NI cs.CL

Evaluating Small Language Models for Front-Door Routing: A Harmonized Benchmark and Synthetic-Traffic Experiment

Warren Johnson, Charles Lee

Comments 23 pages, 1 figure, 9 tables. Article 8 in the TAAC Research Series. Code and data: https://github.com/micoverde/plexor-slm-frontdoor-rct

Abstract

Selecting the appropriate model at inference time -- the routing problem -- requires jointly optimizing output quality, cost, latency, and governance constraints. Existing approaches delegate this decision to LLM-based classifiers or preference-trained routers that are themselves costly and high-latency, reducing a multi-objective optimization to single-dimensional quality prediction. We argue that small language models (SLMs, 1-4B parameters) have now achieved sufficient reasoning capability for sub-second, zero-marginal-cost, self-hosted task classification, potentially making the routing decision negligible in the inference budget. We test this thesis on a six-label taxonomy through two studies. Study 1 is a harmonized offline benchmark of Phi-3.5-mini, Qwen2.5-1.5B, and Qwen-2.5-3B on identical Azure T4 hardware, serving stack, quantization, and a fixed 60-case corpus. Qwen-2.5-3B achieves the best exact-match accuracy (0.783), the strongest latency-accuracy tradeoff, and the only nonzero accuracy on all six task families. Study 2 is a pre-registered four-arm randomized experiment under synthetic traffic with an effective sample size of 60 unique cases per arm, comparing Phi-4-mini, Qwen-2.5-3B, and DeepSeek-V3 against a no-routing control. DeepSeek-V3 attains the highest accuracy (0.830) but fails the pre-registered P95 latency gate (2,295 ms); Qwen-2.5-3B is Pareto-dominant among self-hosted models (0.793 accuracy, 988 ms median, $0 marginal cost). No model meets the standalone viability criterion (>=0.85 accuracy, <=2,000 ms P95). The cost and latency prerequisites for SLM-based routing are met; the accuracy gap of 6-8 percentage points and the untested question of whether correct classification translates to downstream output quality bound the remaining distance to production viability.
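The standalone viability criterion quoted above (accuracy >= 0.85 and P95 latency <= 2,000 ms) amounts to a two-condition gate. A minimal sketch follows, assuming a hypothetical helper name `is_viable`; only the DeepSeek-V3 figures stated in the abstract are used, since P95 latency is not reported there for the other models.

```python
# Thresholds taken from the viability criterion stated in the abstract.
ACCURACY_FLOOR = 0.85
P95_LATENCY_CEILING_MS = 2000.0

def is_viable(accuracy: float, p95_latency_ms: float) -> bool:
    """Return True only if both the accuracy gate and the P95 latency gate pass."""
    return accuracy >= ACCURACY_FLOOR and p95_latency_ms <= P95_LATENCY_CEILING_MS

# DeepSeek-V3 as reported in the abstract: 0.830 accuracy, 2,295 ms P95 latency.
deepseek_v3_viable = is_viable(0.830, 2295.0)
print(deepseek_v3_viable)  # False: it misses both gates
```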

2604.02361 2026-04-06 cs.NI cs.AI cs.LG

TRACE: Traceroute-based Internet Route change Analysis with Ensemble Learning

Raul Suzuki, Rodrigo Moreira, Pedro Henrique A. Damaso de Melo, Larissa F. Rodrigues Moreira, Flávio de Oliveira Silva

Comments Paper accepted for publication in Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos (SBRC) 2026

Abstract

Detecting Internet routing instability is a critical yet challenging task, particularly when relying solely on endpoint active measurements. This study introduces TRACE, a Machine Learning (ML) pipeline designed to identify route changes using only traceroute latency data, thereby ensuring independence from control plane information. We propose a robust feature engineering strategy that captures temporal dynamics using rolling statistics and aggregated context patterns. The architecture leverages a stacked ensemble of Gradient Boosted Decision Trees refined by a hyperparameter-optimized meta-learner. By strictly calibrating decision thresholds to address the inherent class imbalance of rare routing events, TRACE achieves superior F1-score performance, significantly outperforming traditional baseline models and demonstrating strong effectiveness in detecting routing changes on the Internet.
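Rolling statistics over a latency series can be sketched as below. This is a generic illustration rather than TRACE's feature set: the window size, the feature names, and the sample round-trip times are all assumptions made for the example.

```python
import statistics
from collections import deque

def rolling_features(latencies_ms, window):
    """Return one feature dict per full sliding window of latency samples."""
    buf = deque(maxlen=window)
    features = []
    for rtt in latencies_ms:
        buf.append(rtt)
        if len(buf) == window:
            mean = statistics.fmean(buf)
            features.append({
                "mean": mean,                    # rolling mean of the window
                "std": statistics.pstdev(buf),   # rolling population std deviation
                "delta": rtt - mean,             # newest sample minus rolling mean
            })
    return features

# Hypothetical round-trip times (ms) with a jump that might indicate a route change.
rtts = [20.0, 21.0, 19.0, 20.0, 48.0, 50.0]
feats = rolling_features(rtts, window=3)
```

A large positive `delta` together with a jump in `std` is the kind of signal a downstream classifier could learn from.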

2604.02358 2026-04-06 cs.NI cs.AI

Dynamic Mask Enhanced Intelligent Multi-UAV Deployment for Urban Vehicular Networks

Gaoxiang Cao, Wenke Yuan, Yunpeng Hou, Huasen He, Quan Zheng, Jian Yang

Comments 6 pages, 7 figures. Accepted for publication in the 2026 IEEE International Conference on Communications (IEEE ICC 2026)

Abstract

Vehicular Ad Hoc Networks (VANETs) play a crucial role in realizing vehicle-road collaboration and intelligent transportation. However, urban VANETs often face challenges such as frequent link disconnections and subnet fragmentation, which hinder reliable connectivity. To address these issues, we dynamically deploy multiple Unmanned Aerial Vehicles (UAVs) as communication relays to enhance VANET. A novel Score based Dynamic Action Mask enhanced QMIX algorithm (Q-SDAM) is proposed for multi-UAV deployment, which maximizes vehicle connectivity while minimizing multi-UAV energy consumption. Specifically, we design a score-based dynamic action mask mechanism to guide UAV agents in exploring large action spaces, accelerate the learning process and enhance optimization performance. The practicality of Q-SDAM is validated using real-world datasets. We show that Q-SDAM improves connectivity by 18.2% while reducing energy consumption by 66.6% compared with existing algorithms.
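The general idea of masking actions before greedy selection can be sketched as follows. This is not Q-SDAM itself: the abstract does not give the scoring function or the masking rule, so a top-k rule over invented scores is assumed purely for illustration.

```python
import math

def masked_greedy_action(q_values, scores, keep_top_k):
    """Pick the highest-Q action among the keep_top_k best-scoring actions."""
    ranked = sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)
    allowed = set(ranked[:keep_top_k])
    # Masked-out actions get -inf so they can never win the argmax.
    masked_q = [q if a in allowed else -math.inf for a, q in enumerate(q_values)]
    return max(range(len(masked_q)), key=lambda a: masked_q[a])

# Hypothetical Q-values and heuristic scores for four candidate UAV moves.
q_values = [0.9, 0.2, 0.5, 0.7]
scores = [0.1, 0.8, 0.6, 0.3]
action = masked_greedy_action(q_values, scores, keep_top_k=2)
print(action)  # 2: action 0 has the highest Q-value but is masked out
```

Shrinking the set of candidate actions this way is one common route to faster exploration in large action spaces.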

2604.02356 2026-04-06 cs.NI cs.LG

MLFCIL: A Multi-Level Forgetting Mitigation Framework for Federated Class-Incremental Learning in LEO Satellites

Heng Zhang, Xiaohong Deng, Sijing Duan, Wu Ouyang, KM Mahfujul, Yiqin Deng, Zhigang Chen

Comments Submitted to IEEE Internet of Things Journal

Abstract

Low-Earth-orbit (LEO) satellite constellations are increasingly performing on-board computing. However, the continuous emergence of new classes under strict memory and communication constraints poses major challenges for collaborative training. Federated class-incremental learning (FCIL) enables distributed incremental learning without sharing raw data, but faces three LEO-specific challenges: non-independent and identically distributed data heterogeneity caused by orbital dynamics, amplified catastrophic forgetting during aggregation, and the need to balance stability and plasticity under limited resources. To tackle these challenges, we propose MLFCIL, a multi-level forgetting mitigation framework that decomposes catastrophic forgetting into three sources and addresses them at different levels: class-reweighted loss to reduce local bias, knowledge distillation with feature replay and prototype-guided drift compensation to preserve cross-task knowledge, and class-aware aggregation to mitigate forgetting during federation. In addition, we design a dual-granularity coordination strategy that combines round-level adaptive loss balancing with step-level gradient projection to further enhance the stability-plasticity trade-off. Experiments on the NWPU-RESISC45 dataset show that MLFCIL significantly outperforms baselines in both accuracy and forgetting mitigation, while introducing minimal resource overhead.

2604.02313 2026-04-06 hep-th cond-mat.dis-nn cs.LG

Topological Effects in Neural Network Field Theory

Christian Ferko, James Halverson, Vishnu Jejjala, Brandon Robinson

Comments 55 pages, 8 figures

Abstract

Neural network field theory formulates field theory as a statistical ensemble of fields defined by a network architecture and a density on its parameters. We extend the construction to topological settings via the inclusion of discrete parameters that label the topological quantum number. We recover the Berezinskii--Kosterlitz--Thouless transition, including the spin-wave critical line and the proliferation of vortices at high temperatures. We also verify the T-duality of the bosonic string, showing invariance under the exchange of momentum and winding on $S^1$, the transformation of the sigma model couplings according to the Buscher rules on constant toroidal backgrounds, the enhancement of the current algebra at self-dual radius, and non-geometric T-fold transition functions.

2604.00073 2026-04-06 cs.SE cs.AI cs.CL

Terminal Agents Suffice for Enterprise Automation

Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav, Sai Rajeswar

Comments Pre-print. Under review. 43 pages, 6 figures, 19 tables

Abstract

There has been growing interest in building agents that can interact with digital platforms to execute meaningful enterprise tasks autonomously. Among the approaches explored are tool-augmented agents built on abstractions such as Model Context Protocol (MCP) and web agents that operate through graphical interfaces. Yet, it remains unclear whether such complex agentic systems are necessary given their cost and operational overhead. We argue that a coding agent equipped only with a terminal and a filesystem can solve many enterprise tasks more effectively by interacting directly with platform APIs. We evaluate this hypothesis across diverse real-world systems and show that these low-level terminal agents match or outperform more complex agent architectures. Our findings suggest that simple programmatic interfaces, combined with strong foundation models, are sufficient for practical enterprise automation.

2603.29585 2026-04-06 cs.GR cs.AI

Learn2Fold: Structured Origami Generation with World Model Planning

Yanjia Huang, Yunuo Chen, Ying Jiang, Jinru Han, Zhengzhong Tu, Yin Yang, Chenfanfu Jiang

Comments 9 pages, 6 figures

Abstract

The ability to transform a flat sheet into a complex three-dimensional structure is a fundamental test of physical intelligence. Unlike cloth manipulation, origami is governed by strict geometric axioms and hard kinematic constraints, where a single invalid crease or collision can invalidate the entire folding sequence. As a result, origami demands long-horizon constructive reasoning that jointly satisfies precise physical laws and high-level semantic intent. Existing approaches fall into two disjoint paradigms: optimization-based methods enforce physical validity but require dense, precisely specified inputs, making them unsuitable for sparse natural language descriptions, while generative foundation models excel at semantic and perceptual synthesis yet fail to produce long-horizon, physics-consistent folding processes. Consequently, generating valid origami folding sequences directly from text remains an open challenge. To address this gap, we introduce Learn2Fold, a neuro-symbolic framework that formulates origami folding as conditional program induction over a crease-pattern graph. Our key insight is to decouple semantic proposal from physical verification. A large language model generates candidate folding programs from abstract text prompts, while a learned graph-structured world model serves as a differentiable surrogate simulator that predicts physical feasibility and failure modes before execution. Integrated within a lookahead planning loop, Learn2Fold enables robust generation of physically valid folding sequences for complex and out-of-distribution patterns, demonstrating that effective spatial intelligence arises from the synergy between symbolic reasoning and grounded physical simulation.

2603.28681 2026-04-06 stat.ML cs.LG

Functional Natural Policy Gradients

Aurelien Bibaut, Houssam Zenati, Thibaud Rahier, Nathan Kallus

Abstract

We propose a cross-fitted debiasing device for policy learning from offline data. A key consequence of the resulting learning principle is $\sqrt N$ regret even for policy classes with complexity greater than Donsker, provided a product-of-errors nuisance remainder is $O(N^{-1/2})$. The regret bound factors into a plug-in policy error factor governed by policy-class complexity and an environment nuisance factor governed by the complexity of the environment dynamics, making explicit how one may be traded against the other.

2603.28448 2026-04-06 math.OC cs.LG cs.NA math.DG math.NA

Yau's Affine Normal Descent: Algorithmic Framework and Convergence Analysis

Yi-Shuai Niu, Artan Sheshmani, Shing-Tung Yau

Comments 56 pages, 26 figures

Abstract

We propose Yau's Affine Normal Descent (YAND), a geometric framework for smooth unconstrained optimization in which search directions are defined by the equi-affine normal of level-set hypersurfaces. The resulting directions are invariant under volume-preserving affine transformations and intrinsically adapt to anisotropic curvature. Using the analytic representation of the affine normal from affine differential geometry, we establish its equivalence with the classical slice-centroid construction under convexity. For strictly convex quadratic objectives, affine-normal directions are collinear with Newton directions, implying one-step convergence under exact line search. For general smooth (possibly nonconvex) objectives, we characterize precisely when affine-normal directions yield strict descent and develop a line-search-based YAND. We establish global convergence under standard smoothness assumptions, linear convergence under strong convexity and Polyak-Lojasiewicz conditions, and quadratic local convergence near nondegenerate minimizers. We further show that affine-normal directions are robust under affine scalings, remaining insensitive to arbitrarily ill-conditioned transformations. Numerical experiments illustrate the geometric behavior of the method and its robustness under strong anisotropic scaling.
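The one-step convergence claim for strictly convex quadratics can be checked numerically for the Newton direction, to which the abstract says affine-normal directions are collinear. The sketch below does not compute the affine normal itself; the matrix `A`, vector `b`, and start point `x0` are arbitrary illustrative choices for f(x) = 0.5 x^T A x - b^T x.

```python
def solve_2x2(a, rhs):
    """Solve a 2x2 linear system a @ x = rhs by Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    return [(a22 * rhs[0] - a12 * rhs[1]) / det,
            (a11 * rhs[1] - a21 * rhs[0]) / det]

def matvec(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

A = [[100.0, 2.0], [2.0, 1.0]]   # symmetric positive definite, ill-conditioned
b = [1.0, 1.0]
x0 = [5.0, -3.0]

# Gradient of the quadratic at x0: A x0 - b.
grad = [ax - bi for ax, bi in zip(matvec(A, x0), b)]
# Newton direction: solve A d = -grad.
direction = solve_2x2(A, [-g for g in grad])
# Exact line search on a quadratic: t = -(grad . d) / (d . A d).
step = -dot(grad, direction) / dot(direction, matvec(A, direction))
x1 = [xi + step * di for xi, di in zip(x0, direction)]

minimizer = solve_2x2(A, b)      # the unique minimizer satisfies A x = b
```

The exact step length comes out as 1 and `x1` coincides with `minimizer` up to rounding, regardless of the conditioning of `A`.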

2603.26227 2026-04-06 stat.ML cs.LG

Privacy-Accuracy Trade-offs in High-Dimensional LASSO under Perturbation Mechanisms

Ayaka Sakata, Haruka Tanzawa

Comments 53 pages, 11 figures

Abstract

We study privacy-preserving sparse linear regression in the high-dimensional regime, focusing on the LASSO estimator. We analyze two widely used mechanisms for differential privacy: output perturbation, which injects noise into the estimator, and objective perturbation, which adds a random linear term to the loss function. Using approximate message passing (AMP), we characterize the typical behavior of these estimators under random design and privacy noise. To quantify privacy, we adopt typical-case measures, including the on-average KL divergence, which admits a hypothesis-testing interpretation in terms of distinguishability between neighboring datasets. Our analysis reveals that sparsity plays a central role in shaping the privacy-accuracy trade-off: stronger regularization can improve privacy by stabilizing the estimator against single-point data changes. We further show that the two mechanisms exhibit qualitatively different behaviors. In particular, for objective perturbation, increasing the noise level can have non-monotonic effects, and excessive noise may destabilize the estimator, leading to increased sensitivity to data perturbations. Our results demonstrate that AMP provides a powerful framework for analyzing privacy-accuracy trade-offs in high-dimensional sparse models.

2603.25764 2026-04-06 cs.SE cs.AI

Consistency Amplifies: How Behavioral Variance Shapes Agent Accuracy

Aman Mehta

Comments 8 pages, 8 figures. Request: please change the primary category to cs.AI

Abstract

As LLM-based AI agents are deployed in production systems, understanding their behavioral consistency (whether they produce similar action sequences when given identical tasks) becomes critical for reliability. We study consistency in the context of SWE-bench, a challenging software engineering benchmark requiring complex, multi-step reasoning. Comparing Claude 4.5 Sonnet, GPT-5, and Llama-3.1-70B across 50 runs each (10 tasks × 5 runs), we find that across models, higher consistency aligns with higher accuracy: Claude achieves the lowest variance (CV: 15.2%) and highest accuracy (58%), GPT-5 is intermediate (CV: 32.2%, accuracy: 32%), and Llama shows the highest variance (CV: 47.0%) with lowest accuracy (4%). However, within a model, consistency can amplify both correct and incorrect interpretations. Our analysis reveals a critical nuance: consistency amplifies outcomes rather than guaranteeing correctness. 71% of Claude's failures stem from "consistent wrong interpretation": making the same incorrect assumption across all runs. Interestingly, GPT-5 achieves similar early strategic agreement as Claude (diverging at step 3.4 vs. 3.2) but exhibits 2.1× higher variance, suggesting that divergence timing alone does not determine consistency. These findings suggest that for production deployment, interpretation accuracy matters more than execution consistency, with implications for agent evaluation and training.
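The coefficient of variation (CV) reported above is the standard deviation divided by the mean. A minimal sketch follows; the abstract does not say which quantity the CV is taken over or whether the sample or population deviation is used, so per-run step counts and the sample standard deviation are assumed, and the numbers are invented.

```python
import statistics

def coefficient_of_variation(values):
    """CV as a percentage: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean

# Hypothetical step counts for five runs of one task (not data from the paper).
steps_per_run = [40, 44, 38, 42, 46]
cv_percent = coefficient_of_variation(steps_per_run)
print(f"CV: {cv_percent:.1f}%")
```

Because CV is normalized by the mean, it allows variance to be compared across models whose runs differ in typical length.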

2603.24705 2026-04-06 stat.ME cs.LG econ.EM

Amortized Inference for Correlated Discrete Choice Models via Equivariant Neural Networks

Easton Huch, Michael Keane

Abstract

Discrete choice models are fundamental tools in management science, economics, and marketing for understanding and predicting decision-making. Logit-based models are dominant in applied work, largely due to their convenient closed-form expressions for choice probabilities. However, these models entail restrictive assumptions on the stochastic utility component, constraining our ability to capture realistic and theoretically grounded choice behavior, most notably substitution patterns. In this work, we propose an amortized inference approach using a neural network emulator to approximate choice probabilities for general error distributions, including those with correlated errors. Our proposal includes a specialized neural network architecture and accompanying training procedures designed to respect the invariance properties of discrete choice models. We provide group-theoretic foundations for the architecture, including a proof of universal approximation given a minimal set of invariant features. Once trained, the emulator enables rapid likelihood evaluation and gradient computation. We use Sobolev training, augmenting the likelihood loss with a gradient-matching penalty so that the emulator learns both choice probabilities and their derivatives. We show that emulator-based maximum likelihood estimators are consistent and asymptotically normal under mild approximation conditions, and we provide sandwich standard errors that remain valid even with imperfect likelihood approximation. Simulations show significant gains over the GHK simulator in accuracy and speed.

2603.18109 2026-04-06 astro-ph.HE cs.AI

Discovery of Bimodal Drift Rate Structure in FRB 20240114A: Evidence for Dual Emission Regions

Santosh Arron

Comments arXiv admin note: This submission has been withdrawn because it does not meet arXiv's research content quality standards

Abstract

We report the discovery of bimodal structure in the drift rate distribution of upward-drifting burst clusters from the hyperactive repeating fast radio burst FRB 20240114A. Using unsupervised machine learning (UMAP dimensionality reduction combined with HDBSCAN density-based clustering) applied to 233 upward-drifting burst clusters from the FAST telescope dataset, we identify a distinct subpopulation of 45 burst clusters (Cluster C1) with mean drift rates 2.5x higher than typical upward-drifting burst clusters (245.6 vs 98.1 MHz/ms). Gaussian mixture modeling reveals strong evidence for bimodality (delta-BIC = 296.6), with clearly separated modes (Ashman's D = 2.70 > 2) and a statistically significant gap in the distribution (11.3 sigma). Crucially, we demonstrate that this bimodality persists when restricting the analysis to single-component (U1) burst clusters only (delta-BIC = 19.9, Ashman's D = 2.71), confirming that the result is not an artifact of combining single- and multi-component burst clusters with different drift rate definitions. The extreme-drift subpopulation also exhibits systematically lower peak frequencies (-7%), shorter durations (-29%), and distinct clustering in multi-dimensional feature space. These findings are suggestive of two spatially separated emission regions in the magnetosphere, each producing upward-drifting burst clusters with distinct physical characteristics, although confirmation requires observations from additional epochs and sources.
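Ashman's D, the separation statistic cited above, compares the distance between two Gaussian modes with their combined width; values above 2 are conventionally read as a clean separation. In the sketch below the two means are the drift rates quoted in the abstract, while the standard deviations are hypothetical placeholders (the abstract does not give them), so the result is not the paper's value.

```python
import math

def ashman_d(mu1, sigma1, mu2, sigma2):
    """Ashman's D = sqrt(2) * |mu1 - mu2| / sqrt(sigma1^2 + sigma2^2)."""
    return math.sqrt(2.0) * abs(mu1 - mu2) / math.sqrt(sigma1 ** 2 + sigma2 ** 2)

# Means (MHz/ms) from the abstract; the sigmas are assumed for illustration only.
d = ashman_d(mu1=245.6, sigma1=60.0, mu2=98.1, sigma2=40.0)
print(d > 2.0)
```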

2603.15922 2026-04-06 cs.HC cs.CL

Machine Translation in the Wild: User Reaction to Xiaohongshu's Built-In Translation Feature

Sui He

Abstract

This paper examines user reactions to the launch of the machine translation (MT) feature on Xiaohongshu, a Chinese social media and e-commerce platform, in January 2025. Drawing on a dataset of 6,723 comments collected from 11 official posts promoting the translation function, this paper combines sentiment analysis with thematic analysis to investigate how users perceived and experimented with the function. Results show that reactions were generally positive, although concerns regarding functionality, accessibility, and translation accuracy were also expressed. In addition, users actively tested the function with inputs that are atypical for everyday online communication, including stand-alone words and phrases, abbreviations, internet slang, and symbolic or encoded forms. Successful decoding of these texts elicited positive responses, while testing of more conventional language remained fairly limited. This could lead to uncritical acceptance of MT outputs by users, highlighting the importance of closer collaboration among computer scientists, translation scholars, and platform designers to improve MT performance and promote informed user engagement in real-world scenarios.

2601.08845 2026-04-06 cs.CY cs.AI cs.IT math.IT

No Universal Hyperbola: A Formal Disproof of the Epistemic Trade-Off Between Certainty and Scope in Symbolic and Generative AI

Generoso Immediato

Comments 14 pages. Formal disproof of the published "certainty-scope" trade-off conjecture for symbolic and generative AI under both the original Kolmogorov-complexity-based scope and the subsequent Shannon-entropy-based revision

Abstract

In direct response to requests for a logico-mathematical test of the conjecture, we formally disprove a recently conjectured artificial intelligence trade-off between epistemic certainty and scope in its published universal hyperbolic product form, as introduced in Philosophy and Technology. Certainty is defined as the worst-case correctness probability over the input space, and scope as the sum of the Kolmogorov complexities of the input and output sets. Using standard facts from coding theory and algorithmic information theory, we show, first, that when the conjecture is instantiated with prefix (self-delimiting, prefix-free) Kolmogorov complexity, it leads to an internal inconsistency, and second, that when it is instantiated with plain Kolmogorov complexity, it is refuted by a constructive counterexample. These results establish a main theorem: contrary to the conjecture's claim, no universal "certainty-scope" hyperbola holds as a general bound under the published definitions. We further show that a subsequent "entropy-based" revision, replacing the Kolmogorov scope with Shannon joint entropy and redefining the epistemic certainty level accordingly, cannot restore universality either.

2512.24493 2026-04-06 eess.SY cs.RO cs.SY

Bayesian Safety Guarantees for Port-Hamiltonian Systems with Learned Energy Functions

Chi Ho Leung, Philip E. Paré

Abstract

Control barrier functions for port-Hamiltonian systems inherit model uncertainty when the Hamiltonian is learned from data. We show how to propagate this uncertainty into a safety filter with independently tunable credibility budgets. To propagate this uncertainty, we employ a two-stage Bayesian approach. First, posterior prediction over the Hamiltonian yields credible bands for the energy storage, producing Bayesian barriers whose safe sets are high-probability inner approximations of the true allowable set with credibility $1 - (η_{\mathrm{ptB}})$. Independently, a drift credible ellipsoid accounts for vector field uncertainty in the CBF inequality with credibility $1 - (η_{\rm dr})$. Since energy and drift uncertainties enter through disjoint credible sets, the end-to-end safety guarantee is at least $1 - (η_{\rm dr} + η_{\mathrm{ptB}})$. Experiments on a mass-spring oscillator with a GP-learned Hamiltonian show that the proposed filter preserves safety despite limited and noisy observations. Moreover, we show that the proposed framework yields a larger safe set than an unstructured GP-CBF alternative on a planar manipulator.
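One way to see where the additive bound comes from is a union bound. In the sketch below, $E_{\mathrm{ptB}}$ and $E_{\mathrm{dr}}$ are labels introduced here (not the paper's notation) for the events that the true Hamiltonian lies in its credible band and that the true drift lies in its credible ellipsoid, each assumed to hold with the stated credibility.

```latex
\begin{align*}
\Pr\bigl(E_{\mathrm{ptB}} \cap E_{\mathrm{dr}}\bigr)
  &= 1 - \Pr\bigl(E_{\mathrm{ptB}}^{c} \cup E_{\mathrm{dr}}^{c}\bigr) \\
  &\ge 1 - \Pr\bigl(E_{\mathrm{ptB}}^{c}\bigr) - \Pr\bigl(E_{\mathrm{dr}}^{c}\bigr) \\
  &\ge 1 - \bigl(\eta_{\mathrm{dr}} + \eta_{\mathrm{ptB}}\bigr).
\end{align*}
```

The two budgets can therefore be tuned separately, with the end-to-end guarantee degrading by at most their sum.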

2512.13907 2026-04-06 cs.CY cs.AI

Assessing High-Risk AI Systems under the EU AI Act: From Legal Requirements to Technical Verification

Alessio Buscemi, Tom Deckenbrunnen, Fahria Kabir, Kateryna Mishchenko, Nishat Mowla

Abstract

The implementation of the AI Act requires practical mechanisms to verify compliance with legal obligations, yet concrete and operational mappings from high-level requirements to verifiable assessment activities remain limited, contributing to uneven readiness across Member States. This paper presents a structured mapping that translates high-level AI Act requirements into concrete, implementable verification activities applicable across the AI lifecycle. The mapping is derived through a systematic process in which legal requirements are decomposed into operational sub-requirements and grounded in authoritative standards and recognised practices. From this basis, verification activities are identified and characterised along two dimensions: the type of verification performed and the lifecycle target to which it applies. By making explicit the link between regulatory intent and technical and organisational assurance practices, the proposed mapping reduces interpretive uncertainty and provides a reusable reference for consistent, technology-agnostic compliance verification under the AI Act.

2511.16858 2026-04-06 cs.SE cs.LG

Investigating Test Overfitting on SWE-bench

Toufique Ahmed, Jatin Ganhotra, Avraham Shinnar, Martin Hirzel

Abstract

Tests can help resolve issues in code repositories. However, relying too heavily on tests for issue resolution can lead to code that technically passes the observed tests but misses important cases or even breaks functionality. This problem, called test overfitting, is exacerbated by the fact that issues usually lack readily executable tests. Instead, several issue resolution systems use tests auto-generated from issues, which may be imperfect. Some systems even iteratively refine code and tests jointly. This paper presents the first empirical study of test overfitting in this setting.

2511.14617 2026-04-06 cs.DC cs.LG

Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning

Ruoyu Qin, Weiran He, Weixiao Huang, Yangkun Zhang, Yikai Zhao, Bo Pang, Xinran Xu, Yingdi Shan, Yongwei Wu, Mingxing Zhang

Abstract

Reinforcement Learning (RL) has emerged as a critical technique for advancing modern Large Language Models (LLMs), yet existing synchronous RL systems face severe performance bottlenecks. The rollout phase, which dominates end-to-end iteration time, suffers from substantial long-tail latency and poor resource utilization due to inherent workload imbalance. We present Seer, a novel context learning RL system that addresses these challenges through a key observation: requests sharing the same prompt exhibit strong similarities in output lengths and response patterns. Leveraging this insight, Seer introduces three coordinated techniques: (1) divided rollout for dynamic load balancing, (2) context-aware scheduling to mitigate long-tail request delays, and (3) adaptive grouped speculative decoding to accelerate generation. These mechanisms work in concert to markedly reduce long-tail latency and improve resource efficiency during rollout. Evaluations on production-grade RL workloads demonstrate that Seer achieves up to 2.04$\times$ end-to-end rollout throughput improvement compared to the state-of-the-art synchronous RL systems, while notably reducing long-tail latency by 72-94%.

2511.06731 2026-04-06 physics.geo-ph cs.AI

Recovering Sub-threshold S-wave Arrivals in Deep Learning Phase Pickers via Shape-Aware Loss

Chun-Ming Huang, Li-Heng Chang, I-Hsin Chang, An-Sheng Lee, Hao Kuo-Chen

Abstract

Deep learning has transformed seismic phase picking, but a systematic failure mode persists: for some S-wave arrivals that appear unambiguous to human analysts, the model produces only a distorted peak trapped below the detection threshold, even as the P-wave prediction on the same record appears flawless. By examining training dynamics and loss landscape geometry, we diagnose this amplitude suppression as an optimization trap arising from three interacting factors. Temporal uncertainty in S-wave arrivals, CNN bias toward amplitude boundaries, and the inability of pointwise loss to provide lateral corrective forces combine to create the trap. The diagnosis reveals that phase arrival labels are structured shapes rather than independent probability estimates, requiring training objectives that preserve coherence. We formalize this as the shape-then-align strategy and validate it through a conditional GAN proof of concept, recovering previously sub-threshold signals and achieving a 64% increase in effective S-phase detections. Beyond this implementation, the loss landscape visualization and numerical simulation techniques we introduce provide a general methodology for analyzing how label designs and loss functions interact with temporal uncertainty, transforming these choices from trial-and-error into principled analysis.

2511.03909 2026-04-06 cs.CG cs.LG math.AT

Tensor Computation of Euler Characteristic Functions and Transforms

Jessi Cisewski-Kehe, Brittany Terese Fasy, Alexander McCleary, Eli Quist

Abstract

The weighted Euler characteristic transform (WECT) and Euler characteristic function (ECF) have proven to be useful tools in a variety of applications. However, current methods for computing these functions are either not optimized for GPU computation or do not scale to higher-dimensional settings. In this work, we present a tensor-based framework for computing such topological descriptors which is highly optimized for GPU architectures and works in full generality across simplicial and cubical complexes of arbitrary dimension. Experimentally, the framework demonstrates significant speedups over existing methods when computing the WECT and ECF across a variety of two- and three-dimensional datasets. Computation of these transforms is implemented in a publicly available Python package called pyECT.
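For readers unfamiliar with the descriptor, the following is a minimal reference sketch of an Euler characteristic function for a simplicial complex filtered by height along a direction. It uses plain Python loops and is not the tensor-based method or the pyECT API. The function names `height_filtration` and `euler_characteristic_function` and the toy triangle are illustrative assumptions.

```python
def height_filtration(points, simplices, direction):
    """Pair each simplex with the largest height <direction, x> among its vertices."""
    filtered = []
    for simplex in simplices:
        height = max(
            sum(d * x for d, x in zip(direction, points[v])) for v in simplex
        )
        filtered.append((simplex, height))
    return filtered


def euler_characteristic_function(filtered, thresholds):
    """For each threshold t, sum (-1)^dim over simplices that appear at height <= t."""
    return [
        sum((-1) ** (len(simplex) - 1) for simplex, height in filtered if height <= t)
        for t in thresholds
    ]


# Toy example (hypothetical data): a triangle in the plane, swept along the x-axis.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]  # boundary only
filled = hollow + [(0, 1, 2)]                        # boundary plus the 2-face
direction = (1.0, 0.0)
thresholds = [-1.0, 0.0, 1.0]

ecf_filled = euler_characteristic_function(
    height_filtration(points, filled, direction), thresholds
)
ecf_hollow = euler_characteristic_function(
    height_filtration(points, hollow, direction), thresholds
)
print(ecf_filled)  # [0, 1, 1]
print(ecf_hollow)  # [0, 1, 0]
```

At the final threshold the filled triangle has Euler characteristic 1 (a disk) and the hollow one has 0 (a circle), which is the kind of shape information the descriptor records as the threshold sweeps.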

2511.01154 2026-04-06 math.PR cs.LG math.ST stat.TH

Stability of the Kim–Milman flow map

Sinho Chewi, Aram-Alexandre Pooladian, Matthew S. Zhang

Abstract

In this short note, we characterize the stability of the Kim–Milman flow map, also known as the probability flow ODE, with respect to variations in the target measure in relative Fisher information.

2510.20847 2026-04-06 q-bio.NC cs.AI

Integrated representational signatures strengthen specificity in brains and models

Jialin Wu, Shreya Saha, Yiqing Bo, Meenakshi Khosla

Abstract

The extent to which different neural or artificial neural networks (models) rely on equivalent representations to support similar tasks remains a central question in neuroscience and machine learning. Prior work has typically compared systems using a single representational similarity metric, yet each captures only one facet of representational structure. To address this, we leverage a suite of representational similarity metrics, each capturing a distinct facet of representational correspondence (such as geometry, unit-level tuning, or linear decodability), and assess brain region or model separability using multiple complementary measures. Metrics that preserve geometric or tuning structure (e.g., RSA, Soft Matching) yield stronger region-based discrimination, whereas more flexible mappings such as Linear Predictivity show weaker separation. These findings suggest that geometry and tuning encode brain-region- or model-family-specific signatures, while linearly decodable information tends to be more globally shared across regions or models. To integrate these complementary representational facets, we adapt Similarity Network Fusion (SNF), a framework originally developed for multi-omics data integration. SNF produces substantially sharper regional and model family-level separation than any single metric and yields robust composite similarity profiles. Moreover, clustering cortical regions using SNF-derived similarity scores reveals a clearer hierarchical organization that aligns closely with established anatomical and functional hierarchies of the visual cortex, surpassing the correspondence achieved by individual metrics.

2510.15483 2026-04-06 stat.ML cs.LG

Fast Best-in-Class Regret for Contextual Bandits

Samuel Girard, Aurelien Bibaut, Arthur Gretton, Nathan Kallus, Houssam Zenati

Abstract

We study the problem of stochastic contextual bandits in the agnostic setting, where the goal is to compete with the best policy in a given class without assuming realizability or imposing model restrictions on losses or rewards. In this work, we establish the first fast rate for regret relative to the best-in-class policy. Our proposed algorithm updates the policy at every round by minimizing a pessimistic objective, defined as a clipped inverse-propensity estimate of the policy value plus a variance penalty. By leveraging entropy assumptions on the policy class and a Hölderian error-bound condition (a generalization of the margin condition), we achieve fast best-in-class regret rates, including polylogarithmic rates in the parametric case. The analysis is driven by a sequential self-normalized maximal inequality for bounded martingale empirical processes, which yields uniform variance-adaptive confidence bounds and guarantees pessimism under adaptive data collection.
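To make the shape of such an objective concrete, here is a minimal sketch of a clipped inverse-propensity estimate plus a variance penalty, evaluated once on a batch of logged data. It is not the authors' algorithm: the clipping level `clip`, the penalty weight `penalty`, the square-root form of the penalty, the restriction to deterministic policies, and the toy logs are all assumptions made for illustration.

```python
import math


def pessimistic_objective(policy, logs, clip=10.0, penalty=1.0):
    """Clipped IPS estimate of a policy's loss plus a variance penalty.

    logs holds (context, action, propensity, loss) tuples from a logging policy.
    policy maps a context to an action (deterministic, for simplicity).
    """
    n = len(logs)
    terms = []
    for context, action, propensity, loss in logs:
        # Importance weight: 1/propensity if the policy agrees with the log, else 0.
        match = 1.0 if policy(context) == action else 0.0
        weight = min(match / propensity, clip)  # clipping bounds the weight
        terms.append(weight * loss)
    mean = sum(terms) / n
    variance = sum((t - mean) ** 2 for t in terms) / n
    # Assumed penalty form: weight times the standard error of the estimate.
    return mean + penalty * math.sqrt(variance / n)


# Hypothetical logs: uniform logging over two actions, loss 0 when action == context.
logs = [(0, 0, 0.5, 0.0), (0, 1, 0.5, 1.0), (1, 0, 0.5, 1.0), (1, 1, 0.5, 0.0)]


def policy_good(context):
    return context


def policy_bad(context):
    return 1 - context


score_good = pessimistic_objective(policy_good, logs)
score_bad = pessimistic_objective(policy_bad, logs)
best = min([policy_good, policy_bad], key=lambda p: pessimistic_objective(p, logs))
print(score_good, score_bad)  # 0.0 1.5
```

Because losses are minimized here, adding the penalty makes the estimate conservative: a policy is preferred only if its estimated loss stays low after accounting for the estimate's variability.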

2510.02513 2026-04-06 stat.ML cs.DS cs.LG cs.NA math.NA stat.CO

Adaptive randomized pivoting and volume sampling

Ethan N. Epperly

Comments 14 pages, 2 figures

Abstract

Adaptive randomized pivoting (ARP) is a recently proposed and highly effective algorithm for column subset selection. This paper reinterprets the ARP algorithm by drawing connections to the volume sampling distribution and to active learning algorithms for linear regression. As a consequence, this paper presents a new analysis of the ARP algorithm and faster implementations using rejection sampling.
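As background, the volume sampling distribution over column subsets assigns each size-k subset a probability proportional to the determinant of its Gram matrix, that is, the squared volume spanned by the chosen columns. The sketch below computes that distribution by brute-force enumeration on a tiny matrix. It is neither the ARP algorithm nor the rejection-sampling implementation described in the abstract, and the helper names and toy columns are assumptions.

```python
import random
from itertools import combinations


def det(matrix):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(matrix) == 1:
        return float(matrix[0][0])
    total = 0.0
    for j, pivot in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * pivot * det(minor)
    return total


def volume_sampling_distribution(columns, k):
    """Map each size-k column subset S to det(A_S^T A_S) / normalizing constant."""
    weights = {}
    for subset in combinations(range(len(columns)), k):
        gram = [
            [sum(a * b for a, b in zip(columns[i], columns[j])) for j in subset]
            for i in subset
        ]
        weights[subset] = det(gram)
    total = sum(weights.values())
    return {subset: w / total for subset, w in weights.items()}


# Hypothetical 2 x 3 matrix given by its columns; columns 0 and 2 are parallel.
columns = [(1.0, 0.0), (0.0, 1.0), (2.0, 0.0)]
dist = volume_sampling_distribution(columns, k=2)
print(dist)  # {(0, 1): 0.2, (0, 2): 0.0, (1, 2): 0.8}

# Draw one subset from the distribution with a seeded generator.
rng = random.Random(0)
sample = rng.choices(list(dist), weights=list(dist.values()))[0]
```

The parallel pair `(0, 2)` spans zero volume and is never selected, while the pair containing the longer column is favored, which illustrates why volume sampling tends to pick well-conditioned column subsets.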