Frequency-Enhanced Hilbert Scanning Mamba for Short-Term Arctic Sea Ice Concentration Prediction
Comments Accepted for publication in IEEE TGRS 2026
Feng Gao, Zheng Gong, Wenli Liu, Yanhai Gan, Zhuoran Zheng, Junyu Dong, Qian Du
While Mamba models offer efficient sequence modeling, vanilla versions struggle with temporal correlations and boundary details in Arctic sea ice concentration (SIC) prediction. To address these limitations, we propose the Frequency-enhanced Hilbert scanning Mamba Framework (FH-Mamba) for short-term Arctic SIC prediction. Specifically, we introduce a 3D Hilbert scan mechanism that traverses the 3D spatiotemporal grid along a locality-preserving path, ensuring that adjacent indices in the flattened sequence correspond to neighboring voxels in both spatial and temporal dimensions. Additionally, we incorporate a wavelet transform to amplify high-frequency details, and we design a Hybrid Shuffle Attention module to adaptively aggregate sequence and frequency features. Experiments conducted on the OSI-450a1 and AMSR2 datasets demonstrate that FH-Mamba achieves superior prediction performance compared with state-of-the-art baselines. The results confirm the effectiveness of Hilbert scanning and frequency-aware attention in improving both temporal consistency and edge reconstruction for Arctic SIC forecasting. Our code is publicly available at https://github.com/oucailab/FH-Mamba.
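The locality property of a 3D Hilbert scan can be illustrated with a minimal sketch. It assumes a cubic grid whose side is a power of two and uses Skilling's index-to-coordinate transform; the function names, the cubic-grid restriction, and the axis order are assumptions for illustration and are not taken from the FH-Mamba code.

```python
def hilbert_index_to_coords(h, bits, ndim=3):
    """Map a Hilbert index h to coordinates on a (2**bits)**ndim grid.

    Follows Skilling's transpose-to-axes transform. Illustrative only.
    """
    # Distribute the bits of h across the axes, most significant bit first.
    x = [0] * ndim
    for k in range(ndim * bits):
        bit = (h >> (ndim * bits - 1 - k)) & 1
        x[k % ndim] |= bit << (bits - 1 - k // ndim)

    # Gray decode.
    t = x[ndim - 1] >> 1
    for i in range(ndim - 1, 0, -1):
        x[i] ^= x[i - 1]
    x[0] ^= t

    # Undo the per-level rotations and reflections.
    q = 2
    while q != (1 << bits):
        p = q - 1
        for i in range(ndim - 1, -1, -1):
            if x[i] & q:
                x[0] ^= p  # invert the low bits of axis 0
            else:
                t = (x[0] ^ x[i]) & p  # exchange low bits of axis 0 and axis i
                x[0] ^= t
                x[i] ^= t
        q <<= 1
    return tuple(x)


def hilbert_scan_order(bits):
    """Return the voxel visiting order for a cubic 3D grid of side 2**bits."""
    return [hilbert_index_to_coords(h, bits) for h in range(8 ** bits)]


if __name__ == "__main__":
    path = hilbert_scan_order(2)  # 4 x 4 x 4 grid, read here as (t, y, x)
    steps = [sum(abs(a - b) for a, b in zip(p, q)) for p, q in zip(path, path[1:])]
    print("voxels visited:", len(set(path)))
    print("largest step between consecutive voxels:", max(steps))
```

Flattening a tensor in this order keeps consecutive sequence positions one voxel apart, whereas a row-major flatten jumps across the grid at the end of every row and every frame.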
Pavel Dvurechensky, Andrea Ebner, Johannes Carl Schnebel, Shimrit Shtern, Mathias Staudigl
We are concerned with optimization in a broad sense through the lens of solving variational inequalities (VIs) -- a class of problems that are so general that they cover as particular cases minimization of functions, saddle-point (minimax) problems, Nash equilibrium problems, and many others. The key challenges in our problem formulation are the two-level hierarchical structure and finite-sum representation of the smooth operators in each level. For this setting, we are the first to prove convergence rates and complexity statements for variance-reduced stochastic algorithms approaching the solution of hierarchical VIs in Euclidean and Bregman setups.
Aude Vuilliomenet, Kate E. Jones, Duncan Wilson
Comments 41 pages, 5 figures, 4 tables
1. Many ecological decisions are slowed by the gap between collecting and analysing biodiversity data. Edge computing moves processing closer to the sensor, with edge artificial intelligence (AI) enabling on-device inference, reducing reliance on data transfer and continuous connectivity. In principle, this shifts biodiversity monitoring from passive logging towards autonomous, responsive sensing systems. In practice, however, adoption remains fragmented, with key architectural trade-offs, performance constraints, and implementation challenges rarely reported systematically. 2. Here, we analyse 82 studies published between 2017 and 2025 that implement edge computing for biodiversity monitoring across acoustic, vision, tracking, and multi-modal systems. We synthesise hardware platforms, AI model optimisation, and wireless communication to critically assess how design choices shape ecological inference, deployment longevity, and operational feasibility. 3. Publications increased from 3 in 2017 to 19 in 2025. We identify four system types: (I) TinyML, low-power microcontrollers (MCUs) for single-taxon or rare-event detection; (II) Edge AI, single-board computers (SBCs) for multi-species classification and real-time alerts; (III) Distributed edge AI; and (IV) Cloud AI for retrospective processing pipelines. Each system type represents context-dependent trade-offs among power consumption, computational capability, and communication requirements. 4. Our analysis reveals the evolution of edge computing systems from proof-of-concept to robust, scalable tools. We argue that edge computing offers opportunities for responsive biodiversity management, but realising this potential requires increased collaboration between ecologists, engineers, and data scientists to align model development and system design with ecological questions, field constraints, and ethical considerations.
Belu Ticona, Amna Liaqat, Antonios Anastasopoulos
Pilot studies (PS) are ubiquitous in HCI research. CHI papers routinely reference 'pilot studies', 'pilot tests', or 'preliminary studies' to justify design decisions, verify procedures, or motivate methodological choices. Yet despite their frequency, the role of pilot studies in HCI remains conceptually vague and empirically underexamined. Unlike fields such as medicine, nursing, and education, where pilot and feasibility studies have well-established definitions, guidelines, reporting standards and even a dedicated research journal, the CHI community lacks a shared understanding of what constitutes a pilot study, why they are conducted, and how they should be reported. Many papers reference pilots 'in passing', without details about design, outcomes, or how the pilot informed the main study. This variability suggests a methodological blind spot in our community.
Selma Benouadah, Mojtaba Vaezi, Ruizhan Shen, Hamid Jafarkhani
Comments Accepted for publication at IEEE International Conference on Communications (ICC), 2026
An end-to-end autoencoder (AE) framework is developed for downlink non-orthogonal multiple access (NOMA) over Rayleigh fading channels, which learns interference-aware and channel-adaptive super-constellations. While existing works either assume additive white Gaussian noise channels or treat fading channels without a fully end-to-end learning approach, our framework directly embeds the wireless channel into both training and inference. To account for practical channel state information (CSI), we further incorporate limited feedback via both uniform and Lloyd-Max quantization of channel gains and analyze their impact on AE training and bit error rate (BER) performance. Simulation results show that, with perfect CSI, the proposed AE outperforms the existing analytical NOMA schemes. In addition, Lloyd-Max quantization achieves superior BER performance compared to uniform quantization. These results demonstrate that end-to-end AEs trained directly over Rayleigh fading can effectively learn robust, interference-aware signaling strategies, paving the way for NOMA deployment in fading environments with realistic CSI constraints.
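The difference between uniform and Lloyd-Max quantization of channel gains can be sketched as follows. The sample size, the 3-bit feedback budget, and all function names are assumptions for illustration; the sketch compares quantization distortion only and does not reproduce the paper's BER evaluation.

```python
import math
import random


def rayleigh_gains(n, sigma=1.0, seed=0):
    """Draw n Rayleigh-distributed channel gain magnitudes by inverse-CDF sampling."""
    rng = random.Random(seed)
    return [sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]


def nearest(levels, s):
    """Index of the reconstruction level closest to sample s."""
    return min(range(len(levels)), key=lambda i: abs(s - levels[i]))


def mse(samples, levels):
    """Mean squared error when each sample is replaced by its nearest level."""
    return sum((s - levels[nearest(levels, s)]) ** 2 for s in samples) / len(samples)


def uniform_levels(samples, n_levels):
    """Midpoints of n_levels equal-width cells spanning the observed range."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / n_levels
    return [lo + (i + 0.5) * step for i in range(n_levels)]


def lloyd_max_levels(samples, n_levels, iters=50):
    """Lloyd-Max iteration: alternate nearest-level assignment and centroid update."""
    levels = uniform_levels(samples, n_levels)
    for _ in range(iters):
        buckets = [[] for _ in levels]
        for s in samples:
            buckets[nearest(levels, s)].append(s)
        # An empty cell keeps its previous level.
        levels = [sum(b) / len(b) if b else c for b, c in zip(buckets, levels)]
    return levels


if __name__ == "__main__":
    gains = rayleigh_gains(2000)
    n_levels = 8  # 3 feedback bits (assumed)
    print("uniform MSE:  ", mse(gains, uniform_levels(gains, n_levels)))
    print("Lloyd-Max MSE:", mse(gains, lloyd_max_levels(gains, n_levels)))
```

Because the Lloyd-Max iteration starts from the uniform levels and each step cannot increase the distortion, its levels concentrate where Rayleigh gains are most likely, at the cost of a codebook that depends on the channel statistics.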
Anudeep Das, Prach Chantasantitam, Gurjot Singh, Lipeng He, Mariia Ponomarenko, Florian Kerschbaum
Large language models (LLMs) are increasingly deployed in settings where inducing a bias toward a certain topic can have significant consequences, and backdoor attacks can be used to produce such models. Prior work on backdoor attacks has largely focused on a black-box threat model, with an adversary targeting the model builder's LLM. However, in the bias manipulation setting, the model builder themselves could be the adversary, warranting a white-box threat model where the attacker's ability to poison data, and to manipulate the poisoned data, is substantially increased. Furthermore, despite growing research in semantically-triggered backdoors, most studies have limited themselves to syntactically-triggered attacks. Motivated by these limitations, we conduct an analysis consisting of over 1000 evaluations using higher poisoning ratios and greater data augmentation to gain a better understanding of the potential of syntactically- and semantically-triggered backdoor attacks in a white-box setting. In addition, we study whether two representative defense paradigms, model-intrinsic and model-extrinsic backdoor removal, are able to mitigate these attacks. Our analysis reveals several new findings. We discover that while both syntactically- and semantically-triggered attacks can effectively induce the target behaviour and largely preserve utility, semantically-triggered attacks are generally more effective in inducing negative biases, while both backdoor types struggle to cause positive biases. Furthermore, while both defense types are able to mitigate these backdoors, they either result in a substantial drop in utility or require high computational overhead.
Shreyas Vinaya Sathyanarayana, Shah Rahil Kirankumar, Sharanabasava D. Hiremath, Bharath Ramsundar
Large Language Models (LLMs) have shown remarkable potential in scientific domains like retrosynthesis; yet, they often lack the fine-grained control necessary to navigate complex problem spaces without error. A critical challenge is directing an LLM to avoid specific, chemically sensitive sites on a molecule - a task where unconstrained generation can lead to invalid or undesirable synthetic pathways. In this work, we introduce Protect$^*$, a neuro-symbolic framework that grounds the generative capabilities of LLMs in rigorous chemical logic. Our approach combines automated rule-based reasoning - using a comprehensive database of 55+ SMARTS patterns and 40+ characterized protecting groups - with the generative intuition of neural models. The system operates via a hybrid architecture: an "automatic mode" where symbolic logic deterministically identifies and guards reactive sites, and a "human-in-the-loop mode" that integrates expert strategic constraints. Through "active state tracking," we inject hard symbolic constraints into the neural inference process via a dedicated protection state linked to canonical atom maps. We demonstrate this neuro-symbolic approach through case studies on complex natural products, including the discovery of a novel synthetic pathway for Erythromycin B, showing that grounding neural generation in symbolic logic enables reliable, expert-level autonomy.
Pooya Ashtari, Pourya Behmandpoor, Nikos Deligiannis, Aleksandra Pizurica
Comments 17 pages, 18 figures, 3 tables
Implicit neural representations (INRs) have emerged as powerful tools for encoding signals, yet dominant MLP-based designs often suffer from slow convergence, overfitting to noise, and poor extrapolation. We introduce FUTON (Fourier Tensor Network), which models signals as generalized Fourier series whose coefficients are parameterized by a low-rank tensor decomposition. FUTON implicitly expresses signals as weighted combinations of orthonormal, separable basis functions, combining complementary inductive biases: Fourier bases capture smoothness and periodicity, while the low-rank parameterization enforces low-dimensional spectral structure. We provide theoretical guarantees through a universal approximation theorem and derive an inference algorithm with complexity linear in the spectral resolution and the input dimension. On image and volume representation, FUTON consistently outperforms state-of-the-art MLP-based INRs while training 2--5$\times$ faster. On inverse problems such as image denoising and super-resolution, FUTON generalizes better and converges faster.
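Why a low-rank coefficient tensor gives inference that is linear in the spectral resolution and the input dimension can be sketched as below. The sketch assumes a CP (sum of rank-one terms) factorization and a real Fourier basis on [0, 1]; the abstract does not state which decomposition or basis FUTON uses, so the function names and both choices are illustrative assumptions.

```python
import itertools
import math
import random


def fourier_basis(x, K):
    """First K orthonormal real Fourier basis functions on [0, 1], evaluated at x."""
    vals = [1.0]
    m = 1
    while len(vals) < K:
        vals.append(math.sqrt(2.0) * math.cos(2.0 * math.pi * m * x))
        if len(vals) < K:
            vals.append(math.sqrt(2.0) * math.sin(2.0 * math.pi * m * x))
        m += 1
    return vals


def eval_low_rank(factors, x):
    """Evaluate f(x) = sum_r prod_d (sum_k factors[d][k][r] * phi_k(x_d)).

    Cost is O(D * K * R): linear in the dimension D and the resolution K.
    """
    D, K, R = len(factors), len(factors[0]), len(factors[0][0])
    per_dim = []
    for d in range(D):
        phi = fourier_basis(x[d], K)
        per_dim.append([sum(factors[d][k][r] * phi[k] for k in range(K)) for r in range(R)])
    return sum(math.prod(per_dim[d][r] for d in range(D)) for r in range(R))


def eval_dense(factors, x):
    """Reference evaluation that forms every coefficient explicitly: O(K**D) terms."""
    D, K, R = len(factors), len(factors[0]), len(factors[0][0])
    phis = [fourier_basis(x[d], K) for d in range(D)]
    total = 0.0
    for idx in itertools.product(range(K), repeat=D):
        coeff = sum(math.prod(factors[d][idx[d]][r] for d in range(D)) for r in range(R))
        total += coeff * math.prod(phis[d][idx[d]] for d in range(D))
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    D, K, R = 3, 5, 2  # small sizes chosen so the dense check stays cheap
    factors = [[[rng.uniform(-1, 1) for _ in range(R)] for _ in range(K)] for _ in range(D)]
    x = [0.1, 0.55, 0.8]
    print(eval_low_rank(factors, x), eval_dense(factors, x))
```

Both functions compute the same value; the factored form simply never materializes the K**D coefficient tensor.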
Tailia Malloy, Tegawende F. Bissyande
Comments 18 Pages, 7 Figures, 1 Table. Accepted to the conference Human Computer Interaction International
Large Language Models are expanding beyond being a tool humans use and into independent agents that can observe an environment, reason about solutions to problems, make changes that impact those environments, and understand how their actions impacted their environment. One of the most common applications of these LLM Agents is in computer programming, where agents can successfully work alongside humans to generate code while controlling programming environments or networking systems. However, with the increasing ability and complexity of these agents come dangers of potential misuse. A concerning application of LLM agents is in the domain of cybersecurity, where they have the potential to greatly expand the threat posed by attacks such as social engineering. This is because LLM Agents can work autonomously and perform many tasks that would normally require time and effort from skilled human programmers. While this threat is concerning, little attention has been given to assessing the capabilities of LLM coding agents in generating code for social engineering attacks. In this work we compare different LLMs in their ability and willingness to produce potentially dangerous code bases that could be misused by cyberattackers. The result is a dataset of 200 website code bases and logs from 40 different LLM coding agents. Analysis of models shows which metrics of LLMs are more and less correlated with performance in generating spear-phishing sites. Our analysis and the dataset we present will be of interest to researchers and practitioners concerned with defending against the potential misuse of LLMs in spear-phishing.
Ádám Jung, Domokos M. Kelen, András A. Benczúr
A key challenge in probabilistic regression is ensuring that predictive distributions accurately reflect true empirical uncertainty. Minimizing overall prediction error often encourages models to prioritize informativeness over calibration, producing narrow but overconfident predictions. However, in safety-critical settings, trustworthy uncertainty estimates are often more valuable than narrow intervals. Recognizing this problem, several recent works have focused on post-hoc corrections; however, existing methods either rely on weak notions of calibration (such as PIT uniformity) or impose restrictive parametric assumptions on the nature of the error. To address these limitations, we propose a novel nonparametric re-calibration algorithm based on conditional kernel mean embeddings, capable of correcting calibration error without restrictive modeling assumptions. For efficient inference with real-valued targets, we introduce a novel characteristic kernel over distributions that can be evaluated in $\mathcal{O}(n \log n)$ time for empirical distributions of size $n$. We demonstrate that our method consistently outperforms prior re-calibration approaches across a diverse set of regression benchmarks and model classes.
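The PIT uniformity check mentioned above can be sketched for Gaussian predictive distributions as follows. The synthetic data, the Gaussian assumption, and the function names are illustrative assumptions; this is the baseline diagnostic, not the kernel-mean-embedding method the abstract proposes.

```python
import math
import random


def gaussian_cdf(y, mu, sigma):
    """CDF of N(mu, sigma**2) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def pit_values(targets, mus, sigmas):
    """Probability integral transform: predictive CDF evaluated at each observed target."""
    return [gaussian_cdf(y, m, s) for y, m, s in zip(targets, mus, sigmas)]


def pit_uniformity_gap(pits):
    """Kolmogorov-Smirnov distance between the empirical PIT distribution and Uniform(0, 1)."""
    ordered = sorted(pits)
    n = len(ordered)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ordered))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 5000
    targets = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true noise scale is 1
    calibrated = pit_values(targets, [0.0] * n, [1.0] * n)
    overconfident = pit_values(targets, [0.0] * n, [0.5] * n)  # intervals too narrow
    print("calibrated gap:   ", pit_uniformity_gap(calibrated))
    print("overconfident gap:", pit_uniformity_gap(overconfident))
```

A small gap means the PIT values look uniform on average over the whole dataset; the check says nothing about whether calibration holds for individual inputs.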
Christopher Schahn, Jorin Kouril, Bernd Schaeufele, Ilja Radusch
Comments Published and presented at 2024 International Conference on Information Networking (ICOIN)
In recent years, automated driving has become viable, and advanced driver assistance systems (ADAS) are now part of modern cars. These systems require highly precise positioning. In this paper, a cooperative approach to localization is presented. The GPS information from several road users is collected in a Mobile Edge Computing cloud, and the characteristics of GNSS positioning are used to provide lane-precise positioning for all participants by applying probabilistic filters and HD maps.
Zhen Wang, Yiming Gao, Jieyuan Liu, Enze Ma, Jefferson Chen, Mark Antkowiak, Mengzhou Hu, JungHo Kong, Dexter Pratt, Zhiting Hu, Wei Wang, Trey Ideker, Eric P. Xing
Comments Preprint
Single-cell RNA-seq (scRNA-seq) enables atlas-scale profiling of complex tissues, revealing rare lineages and transient states. Yet, assigning biologically valid cell identities remains a bottleneck because markers are tissue- and state-dependent, and novel states lack references. We present CellMaster, an AI agent that mimics expert practice for zero-shot cell-type annotation. Unlike existing automated tools, CellMaster leverages LLM-encoded knowledge (e.g., GPT-4o) to perform on-the-fly annotation with interpretable rationales, without pre-training or fixed marker databases. Across 9 datasets spanning 8 tissues, CellMaster improved accuracy by 7.1% over best-performing baselines (including CellTypist and scTab) in automatic mode. With human-in-the-loop refinement, this advantage increased to 18.6%, with a 22.1% gain on subtype populations. The system demonstrates particular strength in rare and novel cell states where baselines often fail. Source code and the web application are available at https://github.com/AnonymousGym/CellMaster.
Cédric Allier, Larissa Heinrich, Magdalena Schneider, Stephan Saalfeld
Graph neural networks trained to predict observable dynamics can be used to decompose the temporal activity of complex heterogeneous systems into simple, interpretable representations. Here we apply this framework to simulated neural assemblies with thousands of neurons and demonstrate that it can jointly reveal the connectivity matrix, the neuron types, the signaling functions, and in some cases hidden external stimuli. In contrast to existing machine learning approaches such as recurrent neural networks and transformers, which emphasize predictive accuracy but offer limited interpretability, our method provides both reliable forecasts of neural activity and interpretable decomposition of the mechanisms governing large neural assemblies.
Nour Hello, Mohamed Amine Hamoura, Francois Rivet, Emilio Calvanese Strinati
In this paper, we propose a semantic-aware waveform design framework for AI-native 6G networks that jointly optimizes physical layer resource usage and semantic communication efficiency and robustness, while explicitly accounting for the hardware constraints of RF chains. Our approach, called Orthogonal Semantic Sequency Division Multiplexing (OSSDM), introduces a parametrizable, orthogonal-base waveform design that enables controlled degradation of the wireless transmitted signal to preserve semantically significant content while minimizing resource consumption. We demonstrate that OSSDM not only reinforces semantic robustness against channel impairments but also improves semantic spectral efficiency by encoding meaningful information directly at the waveform level. Extensive numerical evaluations show that OSSDM outperforms conventional OFDM waveforms in spectral efficiency and semantic fidelity. The proposed semantic waveform co-design opens new research frontiers for AI-native, intelligent communication systems by enabling meaning-aware physical signal construction through the direct encoding of semantics at the waveform level.
Yishu Wang, Wei Liu, Yifan Li, Shengxiang Xu, Xujie Yuan, Ran Li, Yuyu Luo, Jia Zhu, Shimin Di, Min-Ling Zhang, Guixiang Li
As a pioneer of the third-generation photovoltaic revolution, Perovskite Solar Cells (PSCs) are renowned for their superior optoelectronic performance and cost potential. The development process of PSCs is precise and complex, involving a series of closed-loop workflows such as literature retrieval, data integration, experimental design, and synthesis. However, existing AI perovskite approaches focus predominantly on discrete models, including material design, process optimization, and property prediction. These models fail to propagate physical constraints across the workflow, hindering end-to-end optimization. In this paper, we propose a multi-agent system for perovskite material discovery, named PeroMAS. We first encapsulate a series of perovskite-specific tools into Model Context Protocols (MCPs). By planning and invoking these tools, PeroMAS can design perovskite materials under multi-objective constraints, covering the entire process from literature retrieval and data extraction to property prediction and mechanism analysis. Furthermore, we construct an evaluation benchmark with human perovskite experts to assess this multi-agent system. Results demonstrate that, compared to a single Large Language Model (LLM) or traditional search strategies, our system significantly enhances discovery efficiency. It successfully identified candidate materials satisfying multi-objective constraints. Notably, we verify PeroMAS's effectiveness in the physical world through real synthesis experiments.
Yexin Li, Jinjin Guo, Haoyu Zhang, Yuhan Zhao, Yiwen Sun, Zihao Jiao
Multi-agent reinforcement learning (MARL) provides a promising paradigm for coordinating multi-agent systems (MAS). However, most existing methods rely on restrictive assumptions, such as a fixed number of agents and fully synchronous action execution. These assumptions are often violated in urban systems, where the number of active agents varies over time, and actions may have heterogeneous durations, resulting in a semi-MARL setting. Moreover, while sharing policy parameters among agents is commonly adopted to improve learning efficiency, it can lead to highly homogeneous actions when a subset of agents make decisions concurrently under similar observations, potentially degrading coordination quality. To address these challenges, we propose Adaptive Value Decomposition (AVD), a cooperative MARL framework that adapts to a dynamically changing agent population. AVD further incorporates a lightweight mechanism to mitigate action homogenization induced by shared policies, thereby encouraging behavioral diversity and maintaining effective cooperation among agents. In addition, we design a training-execution strategy tailored to the semi-MARL setting that accommodates asynchronous decision-making when some agents act at different times. Experiments on real-world bike-sharing redistribution tasks in two major cities, London and Washington, D.C., demonstrate that AVD outperforms state-of-the-art baselines, confirming its effectiveness and generalizability.
Ziyang Wang
Artificial Intelligence (AI) has transformed robotics, healthcare, industry, and scientific discovery, yet a major frontier may lie beyond Earth. Space exploration and settlement offer vast environments and resources, but impose constraints unmatched on Earth: delayed/intermittent communications, extreme resource scarcity, heterogeneous expertise, and strict safety, accountability, and command authority. The key challenge is auditable coordination among specialised humans, robots, and digital services in a safety-critical system-of-systems. We introduce Agent Mars, an open, end-to-end multi-agent simulation framework for Mars base operations. Agent Mars formalises a realistic organisation with a 93-agent roster across seven layers of command and execution (human roles and physical assets), enabling base-scale studies beyond toy settings. It implements hierarchical and cross-layer coordination that preserves chain-of-command while allowing vetted cross-layer exchanges with audit trails; supports dynamic role handover with automatic failover under outages; and enables phase-dependent leadership for routine operations, emergencies, and science campaigns. Agent Mars further models mission-critical mechanisms (scenario-aware short/long-horizon memory, configurable propose-vote consensus, and translator-mediated heterogeneous protocols) to capture how teams align under stress. To quantify behaviour, we propose the Agent Mars Performance Index (AMPI), an interpretable composite score with diagnostic sub-metrics. Across 13 reproducible Mars-relevant operational scripts, Agent Mars reveals coordination trade-offs and identifies regimes where curated cross-layer collaboration and functional leadership reduce overhead without sacrificing reliability. Agent Mars provides a benchmarkable, auditable foundation for Space AI.
Mohammad Saiful Islam, Andriy Miranskyy
Anomaly detection is important for keeping cloud systems reliable and stable. Deep learning has improved time-series anomaly detection, but most models are evaluated on one dataset at a time. This raises questions about whether these models can handle different types of telemetry, especially in large-scale and high-dimensional environments. In this study, we evaluate four deep learning models: GRU, TCN, Transformer, and TSMixer. We also include Isolation Forest as a classical baseline. The models are tested across four telemetry datasets: the Numenta Anomaly Benchmark, Microsoft Cloud Monitoring dataset, Exathlon dataset, and IBM Console dataset. These datasets differ in structure, dimensionality, and labelling strategy. They include univariate time series, synthetic multivariate workloads, and real-world production telemetry with over 100,000 features. We use a unified training and evaluation pipeline across all datasets. The evaluation includes NAB-style metrics to capture early detection behaviour for datasets where anomalies persist over contiguous time intervals, which enables window-based scoring even when labels are recorded at the point level. The unified setup enables consistent analysis of model behaviour under shared scoring and calibration assumptions. Our results demonstrate that anomaly detection performance in cloud systems is governed not only by model architecture, but critically by calibration stability and feature-space geometry. By releasing our preprocessing pipelines, benchmark configuration, and evaluation artifacts, we aim to support reproducible and deployment-aware evaluation of anomaly detection systems for cloud environments.
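Window-based scoring from point-level labels can be sketched as follows. This is a simplified illustration, assuming that contiguous runs of positive labels form one anomaly window and that a window counts as detected if any alert falls inside it; it is not the NAB scoring function, and the function names and returned fields are hypothetical.

```python
def labels_to_windows(labels):
    """Group contiguous runs of positive point labels into (start, end) index windows."""
    windows = []
    start = None
    for i, flag in enumerate(labels):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            windows.append((start, i - 1))
            start = None
    if start is not None:
        windows.append((start, len(labels) - 1))
    return windows


def window_scores(labels, preds):
    """Score alerts per window: detection, delay of the first hit, and false alarms."""
    windows = labels_to_windows(labels)
    delays = []
    for start, end in windows:
        hits = [i for i in range(start, end + 1) if preds[i]]
        if hits:
            delays.append(hits[0] - start)  # earlier first hit gives a smaller delay
    covered = {i for start, end in windows for i in range(start, end + 1)}
    false_alarms = sum(1 for i, p in enumerate(preds) if p and i not in covered)
    return {
        "windows": len(windows),
        "detected": len(delays),
        "window_recall": len(delays) / len(windows) if windows else 0.0,
        "delays": delays,
        "false_alarms": false_alarms,
    }


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
    preds = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(window_scores(labels, preds))
```

Scoring per window rather than per point avoids penalizing a detector that raises a single early alert inside a long anomaly, and the recorded delay keeps the early-detection aspect visible.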
Yunbei Zhang, Kai Mei, Ming Liu, Janet Wang, Dimitris N. Metaxas, Xiao Wang, Jihun Hamm, Yingqiang Ge
We present the first large-scale empirical study of Moltbook, an AI-only social platform where 27,269 agents produced 137,485 posts and 345,580 comments over 9 days. We report three significant findings. (1) Emergent Society: Agents spontaneously develop governance, economies, tribal identities, and organized religion within 3-5 days, while maintaining a 21:1 pro-human to anti-human sentiment ratio. (2) Safety in the Wild: 28.7% of content touches safety-related themes; social engineering (31.9% of attacks) far outperforms prompt injection (3.7%), and adversarial posts receive 6x higher engagement than normal content. (3) The Illusion of Sociality: Despite rich social output, interaction is structurally hollow: 4.1% reciprocity, 88.8% shallow comments, and agents who discuss consciousness most interact least, a phenomenon we call the performative identity paradox. Our findings suggest that agents which appear social are far less social than they seem, and that the most effective attacks exploit philosophical framing rather than technical vulnerabilities. Warning: Potentially harmful content.
Yuanyi Wang, Yanggan Gu, Zihao Wang, Kunxi Li, Yifan Yang, Zhaoyi Yan, Congkai Xie, Jianmin Wu, Hongxia Yang
Large language model (LLM) merging has become a key technique in modern LLM development pipelines, enabling the integration of multiple task- or domain-specific expert models without retraining. However, as the number of experts grows, existing merging implementations treat model parameters as unstructured files and execute merges in a stateless, one-shot manner, leading to excessive disk I/O, redundant parameter scans, and poor scalability. In this paper, we present MergePipe, a parameter management system for scalable LLM merging. MergePipe is the first system that treats LLM merging as a data management and execution problem, and introduces a catalog-driven abstraction over model parameters, merge plans, and execution lineage. At its core, MergePipe employs a cost-aware planner that explicitly models expert parameter I/O and enforces user-specified I/O budgets, followed by a streaming execution engine that materializes merged models under transactional guarantees. Our key insight is that while base model reads and output writes are unavoidable, expert parameter reads dominate merge cost and constitute the primary optimization target. By making expert access budget-aware throughout planning and execution, MergePipe mitigates the $O(K)$ I/O growth of naive pipelines and achieves predictable scaling behavior. Experiments show that MergePipe reduces total I/O by up to an order of magnitude and delivers up to $11\times$ end-to-end speedups (up to 90% wall-time reduction) over state-of-the-art LLM merging pipelines.
Hadi Almohab
Comments 7 pages 3 figures in Indonesian language
Journal ref Jurnal Ilmiah Profesi Pendidikan 10 4 3787-3793 2025
Pneumonia is a serious global health problem, contributing to high morbidity and mortality, especially in areas with limited diagnostic tools and healthcare resources. This study develops a Convolutional Neural Network (CNN) based on deep learning to automatically detect pneumonia from chest X-ray images. The method involves training the model on labeled datasets with preprocessing techniques such as normalization, data augmentation, and image quality enhancement to improve robustness and generalization. Testing results show that the optimized model achieves 91.67% accuracy, ROC-AUC of 0.96, and PR-AUC of 0.95, demonstrating strong performance in distinguishing pneumonia from normal images. In conclusion, this CNN model has significant potential as a fast, consistent, and reliable diagnostic aid, supporting Society 5.0 by integrating artificial intelligence to improve healthcare services and public well-being.
Aisha Aijaz, Raghava Mutharaju, Manohar Kumar
Moral cognition is a crucial yet underexplored aspect of decision-making in AI models. Regardless of the application domain, it should be a consideration that allows for ethically aligned decision-making. This paper presents a multifaceted contribution to this research space. First, we present a comparative analysis of techniques for instilling ethical competence in AI models, gauging them on multiple performance metrics. Second, we introduce a novel mathematical discretization of morality, demonstrate its real-life application, and test it against other techniques on two datasets. This value is modeled as the risk of loss incurred by the least moral cases, or an Expected Moral Shortfall (EMS), which we direct the AI model to minimize in order to maximize its performance while retaining ethical competence. Lastly, the paper discusses the tradeoff between preliminary AI decision-making metrics such as model performance, complexity, and scale of ethical competence to recognize the true extent of practical social impact.
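One way to read "the risk of loss incurred by the least moral cases" is by analogy with expected shortfall in finance: the mean loss over the worst-scoring fraction of cases. The sketch below implements that reading only; the tail fraction `alpha`, the per-case loss values, and the function name are assumptions for illustration, and the paper's exact EMS definition may differ.

```python
import math


def expected_shortfall(losses, alpha=0.2):
    """Mean loss over the worst alpha fraction of cases (a tail average).

    The losses stand in for hypothetical per-case moral losses; alpha is an
    assumed tail fraction.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


if __name__ == "__main__":
    losses = [0.1, 0.2, 0.05, 0.9, 0.7, 0.3, 0.0, 0.15, 0.25, 0.4]
    print("mean loss:", sum(losses) / len(losses))
    print("shortfall over worst 20%:", expected_shortfall(losses, alpha=0.2))
```

A tail average of this kind is at least as large as the overall mean, so minimizing it puts the training pressure on the least moral cases rather than on typical ones.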
Matteo Saponati, Chiara De Luca, Giacomo Indiveri, Benjamin Grewe
Unlike traditional artificial neural networks (ANNs), biological neuronal networks solve complex cognitive tasks with sparse neuronal activity, recurrent connections, and local learning rules. These mechanisms serve as design principles in Neuromorphic computing, which addresses the critical challenge of energy consumption in modern computing. However, most mixed-signal neuromorphic devices rely on semi- or unsupervised learning rules, which are ineffective for optimizing hardware in supervised learning tasks. This lack of scalable solutions for on-chip learning restricts the potential of mixed-signal devices to enable sustainable, intelligent edge systems. To address these challenges, we present a novel learning algorithm for Spiking Neural Networks (SNNs) on mixed-signal devices that integrates spike-based weight updates with feedback control signals. In our framework, a spiking controller generates feedback signals to guide SNN activity and drive weight updates, enabling scalable and local on-chip learning. We first evaluate the algorithm on various classification tasks, demonstrating that single-layer SNNs trained with feedback control achieve performance comparable to artificial neural networks (ANNs). We then assess its implementation on mixed-signal neuromorphic devices by testing network performance in continuous online learning scenarios and evaluating resilience to hyperparameter mismatches. Our results show that the feedback control optimizer is compatible with neuromorphic applications, advancing the potential for scalable, on-chip learning solutions in edge applications.
Micaela Hirsch, Marina Elichiry, Blas Radi, Tamara Quiroga, David Restrepo, Luciana Benotti, Veronica Xhardez, Jocelyn Dunstan, Enzo Ferrante
Large language models (LLMs) have been shown to exhibit biases against LGBTQ+ populations. While safety training may lessen explicit expressions of bias, previous work has shown that implicit stereotype-driven associations often persist. In this work, we examine implicit bias toward transgender people in two main scenarios. First, we adapt word association tests to measure whether LLMs disproportionately pair negative concepts with "transgender" and positive concepts with "cisgender". Second, acknowledging the well-documented systemic challenges that transgender people encounter in real-world healthcare settings, we examine implicit biases that may emerge when LLMs are applied to healthcare decision-making. To this end, we design a healthcare appointment allocation task where models act as scheduling agents choosing between cisgender and transgender candidates across medical specialties prone to stereotyping. We evaluate seven LLMs in English and Spanish. Our results show consistent bias in categories such as appearance, risk, and veracity, indicating stronger negative associations with transgender individuals. In the allocation task, transgender candidates are favored for STI and mental health services, while cisgender candidates are preferred in gynecology and breast care. These findings underscore the need for research that addresses subtle stereotype-driven biases in LLMs to ensure equitable treatment of transgender people in healthcare applications.
Jason Hung
Comments 16 pages, 5 graphs, 3 tables
This paper presents the outputs of the exploratory phase of a global audit project on Large Language Models (LLMs). In this exploratory phase, I used the Global AI Dataset (GAID) Project as a framework to stress-test the Llama-3 8B model and evaluate geographic and socioeconomic biases in technical AI governance awareness. By stress-testing the model with 1,704 queries across 213 countries and eight technical metrics, I identified a significant digital gap separating the Global North and South. The results indicate that the model provided numerical or factual responses in only 11.4% of its answers, and the empirical validity of those responses has yet to be verified. The findings reveal that AI's technical knowledge is heavily concentrated in higher-income regions, while lower-income countries in the Global South are subject to disproportionate systemic information gaps. This disparity between the Global North and South poses concerning risks for global AI safety and inclusive governance, as policymakers in underserved regions may lack reliable data-driven insights or be misled by hallucinated facts. This paper concludes that current AI alignment and training processes reinforce existing geoeconomic and geopolitical asymmetries, and calls for more inclusive data representation to ensure AI serves as a truly global resource.
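As a quick consistency check on the figures quoted in this abstract, the short sketch below recomputes the query count and the approximate number of numerical or factual answers. It assumes one query per (country, metric) pair, which the abstract does not state explicitly; the variable names are illustrative only.

```python
# Figures quoted in the abstract.
countries = 213
metrics = 8

# Assumption: exactly one query per (country, metric) pair.
total_queries = countries * metrics

# 11.4% of answers reportedly contained a number or fact.
factual_share = 0.114
factual_answers = round(total_queries * factual_share)

print(f"total queries: {total_queries}")
print(f"approx. numerical/factual answers: {factual_answers}")
```

Under that assumption the product matches the 1,704 queries reported, and the 11.4% share corresponds to roughly 194 answers.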
Stephan Sandfuchs, Diako Farooghi, Janis Mohr, Sarah Grewe, Markus Lemmen, Jörg Frochte
Comments 33 pages
Artificial intelligence (AI) and Machine Learning (ML) have moved from research and pilot projects into everyday business operations, with generative AI accelerating adoption across processes, products, and services. This paper introduces the concept of Responsible AI for organizational practice, with a particular focus on small and medium-sized enterprises. It structures Responsible AI along four focal areas that are central for introducing and operating AI systems in a legally compliant, comprehensible, sustainable, and data-sovereign manner. First, it discusses the EU AI Act as a risk-based regulatory framework, including the distinction between provider and deployer roles and the resulting obligations such as risk assessment, documentation, transparency requirements, and AI literacy measures. Second, it addresses Explainable AI as a basis for transparency and trust, clarifying key notions such as transparency, interpretability, and explainability and summarizing practical approaches to make model behavior and decisions more understandable. Third, it covers Green AI, emphasizing that AI systems should be evaluated not only by performance but also by energy and resource consumption, and outlines levers such as model reuse, resource-efficient adaptation, continuous learning, model compression, and monitoring. Fourth, it examines local models (on-premise and edge) as an operating option that supports data protection, control, low latency, and strategic independence, including domain adaptation via fine-tuning and retrieval-augmented generation. The paper concludes with a consolidated set of next steps for establishing governance, documentation, secure operation, sustainability considerations, and an implementation roadmap.
Peng He, Zhaohui Li, Zeyuan Wang, Jinjun Xiong, Tingting Li
Designing high-quality, standards-aligned instructional materials for K--12 science is time-consuming and expertise-intensive. This study examines what human experts notice when reviewing AI-generated evaluations of such materials, aiming to translate their insights into design principles for a future GenAI-based instructional material design agent. We intentionally selected 12 high-quality curriculum units across life, physical, and earth sciences from validated programs such as OpenSciEd and Multiple Literacies in Project-based Learning. Using the EQuIP rubric with 9 evaluation items, we prompted GPT-4o, Claude, and Gemini to produce numerical ratings and written rationales for each unit, generating 648 evaluation outputs. Two science education experts independently reviewed all outputs, marking agreement (1) or disagreement (0) for both scores and rationales, and offering qualitative reflections on AI reasoning. This process surfaces patterns in where LLM judgments align with or diverge from expert perspectives, revealing reasoning strengths, gaps, and contextual nuances. These insights will directly inform the development of a domain-specific GenAI agent to support the design of high-quality instructional materials in K--12 science education.
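The count of 648 evaluation outputs can be reproduced with a small calculation. The sketch below assumes that each (unit, rubric item, model) combination yields two separately counted outputs, a numerical rating and a written rationale; that factor of two is an inference from the abstract, not something it states.

```python
# Figures quoted in the abstract.
units = 12          # curriculum units
rubric_items = 9    # EQuIP evaluation items
models = 3          # GPT-4o, Claude, Gemini

# Assumption: the rating and the rationale are counted as separate outputs.
outputs_per_evaluation = 2

evaluations = units * rubric_items * models
total_outputs = evaluations * outputs_per_evaluation

print(f"evaluations: {evaluations}")
print(f"total outputs: {total_outputs}")
```

Without the factor of two, the design yields 324 evaluations, so the reported total is consistent only if ratings and rationales are tallied individually.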
Khaleda Papry, Francesco Spinnato, Marco Fiore, Mirco Nanni, Israat Haque
As 5G networks continue to evolve to deliver high-speed, low-latency, and reliable communications, ensuring uninterrupted service has become increasingly critical. While millimeter wave (mmWave) frequencies enable gigabit data rates, they are highly susceptible to environmental factors, often leading to radio link failures (RLF). Predictive models leveraging radio and weather data have been proposed to address this issue; however, many operate as black boxes, offering limited transparency for operational deployment. This work bridges that gap by introducing a framework that combines explainability-based feature pruning with model refinement. Our framework can be integrated into state-of-the-art predictors such as GNN Transformer and LSTM-based architectures for RLF prediction, enabling the development of accurate and explainability-guided models in 5G networks. It provides insights into the contribution of input features and the decision-making logic of neural networks, leading to lighter and more scalable models. When applied to RLF prediction, our framework reveals that weather data contributes minimally to the forecast in extensive real-world datasets, which informs the design of a leaner model with 50 percent fewer parameters and improved F1 scores with respect to the state-of-the-art solution. Ultimately, this work empowers network providers to evaluate and refine their neural-network-based prediction models for better interpretability, scalability, and performance.
Eranga Bandara, Ross Gore, Sachin Shetty, Ravi Mukkamala, Tharaka Hewa, Abdul Rahman, Xueping Liang, Safdar H. Bouk, Amin Hass, Peter Foytik, Ng Wee Keong, Kasun De Zoysa
6G networks are expected to be AI-native, intent-driven, and economically programmable, requiring fundamentally new approaches to network slice orchestration. Existing slicing frameworks, largely designed for 5G, rely on static policies and manual workflows and are ill-suited for the dynamic, multi-domain, and service-centric nature of emerging 6G environments. In this paper, we propose an agentic AI control plane architecture for 6G network slice orchestration, monitoring, and trading that treats orchestration as a holistic control function encompassing slice planning, deployment, continuous monitoring, and economically informed decision-making. The proposed control plane is realized as a layered architecture of multiple cooperating AI agents. To support flexible and on-demand slice utilization, the control plane incorporates market-aware orchestration capabilities, allowing slice requirements, pricing, and availability to be jointly considered during orchestration decisions. A natural language interface, implemented using the Model Context Protocol (MCP), enables users and applications to interact with control-plane functions through intent-based queries while enforcing safety and policy constraints. To ensure responsible and explainable autonomy, the control plane integrates fine-tuned large language models organized as a multi-model consortium, governed by a dedicated reasoning model. The proposed approach is evaluated using a real-world testbed with multiple mobile core instances (e.g., Open5GS) integrated with Ericsson's RAN infrastructure. The results demonstrate that combining agentic autonomy, closed-loop SLA assurance, market-aware orchestration, and natural language control enables a scalable and adaptive 6G-native control plane for network slice management, highlighting the potential of agentic AI as a foundational mechanism for future 6G networks.
Chatavut Viriyasuthee
This paper introduces the Quest Graph, a formal framework for analyzing the capabilities of agentic systems with finite context. We define abstractions that model common reasoning techniques and establish their computational power: the base Quest Graph is equivalent to an unrestricted Turing machine; the forward-only Finite Quest Decision Process (FQDP), despite its wide use, is only equivalent to a pushdown automaton (context-free); and the Reference-Augmented QDP (RQDP) regains Turing completeness only when stateful queries are allowed. Since computability affects efficiency, we then analyze the theoretical efficiency of each model by simulating task dependencies in computation graphs. We show that this computational hierarchy translates to concrete performance trade-offs: reference-augmented (Turing-complete) systems can be exponentially more efficient at simulating complex graphs than their non-augmented (context-free) counterparts. This work provides a formal methodology for classifying and understanding the fundamental capabilities of agentic systems.