arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1284
专题追踪
2601.15340 2026-02-09 cond-mat.dis-nn cond-mat.mes-hall cs.LG nlin.AO physics.app-ph

Learning Nonlinear Heterogeneity in Physical Kolmogorov-Arnold Networks

Fabiana Taglietti, Andrea Pulici, Maxwell Roxburgh, Gabriele Seguini, Ian Vidamour, Stephan Menzel, Edoardo Franco, Michele Laus, Eleni Vasilaki, Michele Perego, Thomas J. Hayward, Marco Fanciulli, Jack C. Gartside

详情
英文摘要

Physical neural networks typically train linear synaptic weights while treating device nonlinearities as fixed. We show the opposite - by training the synaptic nonlinearity itself, as in Kolmogorov-Arnold Network (KAN) architectures, we yield markedly higher task performance per physical resource and improved performance-parameter scaling than conventional linear weight-based networks, demonstrating ability of KAN topologies to exploit reconfigurable nonlinear physical dynamics. We experimentally realise physical KANs in silicon-on-insulator devices we term 'Synaptic Nonlinear Elements' (SYNEs), operating at room temperature, microampere currents, 2 MHz speeds and ~750 fJ per nonlinear operation, with no observed degradation over 10^13 measurements and months-long timescales. We demonstrate nonlinear function regression, classification, and prediction of Li-Ion battery dynamics from noisy real-world multi-sensor data. Physical KANs outperform equivalently-parameterised software multilayer perceptron networks across all tasks, with up to two orders of magnitude fewer parameters, and two orders of magnitude fewer devices than linear weight based physical networks. These results establish learned physical nonlinearity as a hardware-native computational primitive for compact and efficient learning systems, and SYNE devices as effective substrates for heterogenous nonlinear computing.

2601.02075 2026-02-09 cs.CE cs.LG

MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics

Zhuofan Shi, Hubao A, Yufei Shao, Dongliang Huang, Hongxu An, Chunxiao Xin, Haiyang Shen, Zhenyu Wang, Yunshan Na, Gang Huang, Xiang Jing

Comments 24 pages,4 figures

详情
英文摘要

Molecular dynamics (MD) simulations are essential for understanding atomic-scale behaviors in materials science, yet writing LAMMPS scripts remains highly specialized and time-consuming tasks. Although LLMs show promise in code generation and domain-specific question answering, their performance in MD scenarios is limited by scarce domain data, the high deployment cost of state-of-the-art LLMs, and low code executability. Building upon our prior MDAgent, we present MDAgent2, the first end-to-end framework capable of performing both knowledge Q&A and code generation within the MD domain. We construct a domain-specific data-construction pipeline that yields three high-quality datasets spanning MD knowledge, question answering, and code generation. Based on these datasets, we adopt a three stage post-training strategy--continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL)--to train two domain-adapted models, MD-Instruct and MD-Code. Furthermore, we introduce MD-GRPO, a closed-loop RL method that leverages simulation outcomes as reward signals and recycles low-reward trajectories for continual refinement. We further build MDAgent2-RUNTIME, a deployable multi-agent system that integrates code generation, execution, evaluation, and self-correction. Together with MD-EvalBench proposed in this work, the first benchmark for LAMMPS code generation and question answering, our models and system achieve performance surpassing several strong baselines.This work systematically demonstrates the adaptability and generalization capability of large language models in industrial simulation tasks, laying a methodological foundation for automatic code generation in AI for Science and industrial-scale simulations. URL: https://github.com/FredericVAN/PKU_MDAgent2

2512.23829 2026-02-09 math.NA cs.LG cs.NA

Deep learning methods for inverse problems using connections between proximal operators and Hamilton-Jacobi equations

Oluwatosin Akande, Gabriel P. Langlois, Akwum Onwunta

详情
英文摘要

Inverse problems are important mathematical problems that seek to recover model parameters from noisy data. Since inverse problems are often ill-posed, they require regularization or incorporation of prior information about the underlying model or unknown variables. Proximal operators, ubiquitous in nonsmooth optimization, are central to this because they provide a flexible and convenient way to encode priors and build efficient iterative algorithms. They have also recently become key to modern machine learning methods, e.g., for plug-and-play methods for learned denoisers and deep neural architectures for learning priors of proximal operators. The latter was developed partly due to recent work characterizing proximal operators of nonconvex priors as subdifferential of convex potentials. In this work, we propose to leverage connections between proximal operators and Hamilton-Jacobi partial differential equations (HJ PDEs) to develop novel deep learning architectures for learning the prior. In contrast to other existing methods, we learn the prior directly without recourse to inverting the prior after training. We present several numerical results that demonstrate the efficiency of the proposed method in high dimensions.

2512.22894 2026-02-09 cs.CR cs.AI

DECEPTICON: How Dark Patterns Manipulate Web Agents

Phil Cuvin, Hao Zhu, Diyi Yang

详情
英文摘要

Deceptive UI designs, widely instantiated across the web and commonly known as dark patterns, manipulate users into performing actions misaligned with their goals. In this paper, we show that dark patterns are highly effective in steering agent trajectories, posing a significant risk to agent robustness. To quantify this risk, we introduce DECEPTICON, an environment for testing individual dark patterns in isolation. DECEPTICON includes 700 web navigation tasks with dark patterns -- 600 generated tasks and 100 real-world tasks, designed to measure instruction-following success and dark pattern effectiveness. Across state-of-the-art agents, we find dark patterns successfully steer agent trajectories towards malicious outcomes in over 70% of tested generated and real-world tasks -- compared to a human average of 31%. Moreover, we find that dark pattern effectiveness correlates positively with model size and test-time reasoning, making larger, more capable models more susceptible. Leading countermeasures against adversarial attacks, including in-context prompting and guardrail models, fail to consistently reduce the success rate of dark pattern interventions. Our findings reveal dark patterns as a latent and unmitigated risk to web agents, highlighting the urgent need for robust defenses against manipulative designs.

2512.01249 2026-02-09 cs.NE cs.AI cs.SY eess.SY

Pascal-Weighted Genetic Algorithms: A Binomially-Structured Recombination Framework

Otman A. Basir

Comments 23 pages, 8 figures

详情
英文摘要

This paper introduces a new family of multi-parent recombination operators for Genetic Algorithms (GAs), based on normalized Pascal (binomial) coefficients. Unlike classical two-parent crossover operators, Pascal-Weighted Recombination (PWR) forms offsprings as structured convex combination of multiple parents, using binomially shaped weights that emphasize central inheritance while suppressing disruptive variance. We develop a mathematical framework for PWR, derive variance-transfer properties, and analyze its effect on schema survival. The operator is extended to real-valued, binary/logit, and permutation representations. We evaluate the proposed method on four representative benchmarks: (i) PID controller tuning evaluated using the ITAE metric, (ii) FIR low-pass filter design under magnitude-response constraints, (iii) wireless power-modulation optimization under SINR coupling, and (iv) the Traveling Salesman Problem (TSP). We demonstrate how, across these benchmarks, PWR consistently yields smoother convergence, reduced variance, and achieves 9-22% performance gains over standard recombination operators. The approach is simple, algorithm-agnostic, and readily integrable into diverse GA architectures.

2512.01039 2026-02-09 cs.DC cs.LG cs.NI

Joint Partitioning and Placement of Foundation Models for Real-Time Edge AI

Aladin Djuhera, Fernando Koch, Alecio Binotto

Journal ref IEEE International Conference on Computing, Networking and Communications (ICNC), 2026

详情
英文摘要

Inference over large-scale foundation models within heterogeneous edge environments necessitates a fundamentally reconfigurable orchestration substrate. Static partitioning of model layers presumes temporal stability across compute and network resources, which is misaligned with the volatility of real-world deployments. We introduce a framework in which both the spatial placement and internal segmentation of foundation models are elevated to runtime-resolved constructs. The orchestration problem is formalized as a constrained optimization over layer-wise assignments, subject to evolving latency, utilization, and privacy gradients. The framework implements reactive inference composition responsive to infrastructural fluctuations by integrating model-aware capacity profiling with dynamic graph re-partitioning and reallocation. We introduce architectural and algorithmic components, along with a representative use case in 6G multi-access edge computing.

2511.15015 2026-02-09 cs.PF cs.AI cs.LG

Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference

Kexin Chu, Dawei Xiang, Zixu Shen, Yiwei Yang, Zecheng Liu, Wei Zhang

Comments 13 pages

详情
英文摘要

Mixture-of-Experts (MoE) has become a practical architecture for scaling LLM capacity while keeping per-token compute modest, but deploying MoE models on a single, memory-limited GPU remains difficult because expert weights dominate the HBM footprint. Existing expert offloading and prefetching systems reduce the resident set, yet they often pay expert-loading costs on the critical path when activation becomes dense. Post-training quantization (PTQ) lowers the footprint without transfers, but prevailing pipelines fix expert bit-widths offline and assume routing remains stable, even though MoE expert utilization is heavy-tailed and the hot set can shift across workloads. We present DynaExq, a runtime-aware mixed-precision serving system that treats single-GPU MoE inference under a hard HBM envelope as an online, budget-constrained precision allocation problem. The key insight is to keep the experts that dominate runtime traffic resident at higher precision, while maintaining a low-precision fallback for the remaining experts, so the system can reduce transfer volume and avoid the waiting latency that limits offloading and prefetching under dense activation. DynaExq estimates long-horizon expert hotness from router traces, selects a per-layer high-precision resident set via a budget-feasible top-$n$ rule, and applies promotions and demotions asynchronously through stable expert handles so the forward pass always executes on a fully materialized expert version. Across Qwen3-MoE-30B/80B and six benchmarks, DynaExq improves accuracy over static PTQ on Qwen3-80B (73.09% to 77.57%) under comparable device-memory budgets and achieves up to 2.73x higher throughput than offloading/prefetch baselines at batch size 32.

2511.11817 2026-02-09 stat.ML cs.LG

FreDN: Spectral Disentanglement for Time Series Forecasting via Learnable Frequency Decomposition

Zhongde An, Jinhong You, Jiyanglin Li, Yiming Tang, Wen Li, Heming Du, Shouguo Du

Comments Added a code link and fixed minor typos

详情
英文摘要

Time series forecasting is essential in a wide range of real world applications. Recently, frequency-domain methods have attracted increasing interest for their ability to capture global dependencies. However, when applied to non-stationary time series, these methods encounter the $\textit{spectral entanglement}$ and the computational burden of complex-valued learning. The $\textit{spectral entanglement}$ refers to the overlap of trends, periodicities, and noise across the spectrum due to $\textit{spectral leakage}$ and the presence of non-stationarity. However, existing decompositions are not suited to resolving spectral entanglement. To address this, we propose the Frequency Decomposition Network (FreDN), which introduces a learnable Frequency Disentangler module to separate trend and periodic components directly in the frequency domain. Furthermore, we propose a theoretically supported ReIm Block to reduce the complexity of complex-valued operations while maintaining performance. We also re-examine the frequency-domain loss function and provide new theoretical insights into its effectiveness. Extensive experiments on seven long-term forecasting benchmarks demonstrate that FreDN outperforms state-of-the-art methods by up to 10\%. Furthermore, compared with standard complex-valued architectures, our real-imaginary shared-parameter design reduces the parameter count and computational cost by at least 50\%.

2511.07865 2026-02-09 cs.SE cs.AI cs.CL cs.MA

LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost

Daisuke Kikuta, Hiroki Ikeuchi, Kengo Tajiri

Comments Accepted at ASE 2025 NIER Track. The code is available at https://github.com/ntt-dkiku/chaos-eater

Journal ref 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE)

详情
英文摘要

Chaos Engineering (CE) is an engineering technique aimed at improving the resilience of distributed systems. It involves intentionally injecting faults into a system to test its resilience, uncover weaknesses, and address them before they cause failures in production. Recent CE tools automate the execution of predefined CE experiments. However, planning such experiments and improving the system based on the experimental results still remain manual. These processes are labor-intensive and require multi-domain expertise. To address these challenges and enable anyone to build resilient systems at low cost, this paper proposes ChaosEater, a system that automates the entire CE cycle with Large Language Models (LLMs). It predefines an agentic workflow according to a systematic CE cycle and assigns subdivided processes within the workflow to LLMs. ChaosEater targets CE for software systems built on Kubernetes. Therefore, the LLMs in ChaosEater complete CE cycles through software engineering tasks, including requirement definition, code generation, testing, and debugging. We evaluate ChaosEater through case studies on small- and large-scale Kubernetes systems. The results demonstrate that it consistently completes reasonable CE cycles with significantly low time and monetary costs. Its cycles are also qualitatively validated by human engineers and LLMs.

2511.06701 2026-02-09 cs.SE cs.AI

Structural Enforcement of Statistical Rigor in AI-Driven Discovery: A Functional Architecture

Karen Sargsyan

详情
英文摘要

AI-Scientist systems that use large language models to automate research risk generating spurious discoveries through uncontrolled multiple testing. We present a functional architecture that enforces statistical rigor at two levels: a Haskell embedded DSL (the Research monad) that makes it impossible to test a hypothesis without updating the error budget, and a declarative scaffolding technique that structurally prevents data leakage across the boundary into LLM-generated code. We ground these guarantees in a machine-checked Lean 4 formalization of the LORD++ online FDR control theorem (855 lines, zero sorry), which identifies four sufficient conditions for FDR control. Three are structural conditions -- about information flow, data separation, and test validity -- enforced by the architecture's type system and scaffolding. The fourth is an arithmetic condition: a budget invariant requiring that a wealth process remain non-negative under floating-point computation. We verify this condition over IEEE 754 doubles using SPARK/Ada, whose GNATprove toolchain statically confirms that no rounding sequence can violate the invariant and whose flow analysis independently confirms the predictability condition. The resulting verification chain -- from real-analysis proof to floating-point implementation -- is, to our knowledge, the first for any online FDR control procedure. Monte Carlo simulation (N=2000 hypotheses) and an end-to-end case study confirm that the monadic implementation controls FDR at 1.1% against a 5% target, while a naive approach inflates to 41%.

2511.00732 2026-02-09 cs.NE cs.AI cs.AR

FeNN-DMA: A RISC-V SoC for SNN acceleration

Zainab Aizaz, James C. Knight, Thomas Nowotny

详情
英文摘要

Spiking Neural Networks (SNNs) are a promising, energy-efficient alternative to standard Artificial Neural Networks (ANNs) and are particularly well-suited to spatio-temporal tasks such as keyword spotting and video classification. However, SNNs have a much lower arithmetic intensity than ANNs and are therefore not well-matched to standard accelerators like GPUs and TPUs. Field Programmable Gate Arrays (FPGAs) are designed for such memory-bound workloads, and here we present a novel, fully-programmable RISC-V-based system-on-chip (FeNN-DMA), tailored to simulating SNNs on modern UltraScale+ FPGAs. We show that FeNN-DMA has comparable resource usage and energy requirements to state-of-the-art fixed-function SNN accelerators, yet it supports more complex neuron models and network topologies, and can simulate up to 16 thousand neurons and 256 million synapses per core. Using this functionality, we demonstrate state-of-the-art classification accuracy on the Spiking Heidelberg Digits, Neuromorphic MNIST and Braille tactile classification tasks.

2510.25814 2026-02-09 q-bio.QM cs.LG

Optimizing Mirror-Image Peptide Sequence Design for Data Storage via Peptide Bond Cleavage Prediction

Yilong Lu, Si Chen, Songyan Gao, Han Liu, Xin Dong, Wenfeng Shen, Guangtai Ding

Comments 8 pages, 4 figures;Accepted by BIBM 2025

详情
英文摘要

Traditional non-biological storage media, such as hard drives, face limitations in both storage density and lifespan due to the rapid growth of data in the big data era. Mirror-image peptides composed of D-amino acids have emerged as a promising biological storage medium due to their high storage density, structural stability, and long lifespan. The sequencing of mirror-image peptides relies on \textit{de-novo} technology. However, its accuracy is limited by the scarcity of tandem mass spectrometry datasets and the challenges that current algorithms encounter when processing these peptides directly. This study is the first to propose improving sequencing accuracy indirectly by optimizing the design of mirror-image peptide sequences. In this work, we introduce DBond, a deep neural network based model that integrates sequence features, precursor ion properties, and mass spectrometry environmental factors for the prediction of mirror-image peptide bond cleavage. In this process, sequences with a high peptide bond cleavage ratio, which are easy to sequence, are selected. The main contributions of this study are as follows. First, we constructed MiPD513, a tandem mass spectrometry dataset containing 513 mirror-image peptides. Second, we developed the peptide bond cleavage labeling algorithm (PBCLA), which generated approximately 12.5 million labeled data based on MiPD513. Third, we proposed a dual prediction strategy that combines multi-label and single-label classification. On an independent test set, the single-label classification strategy outperformed other methods in both single and multiple peptide bond cleavage prediction tasks, offering a strong foundation for sequence optimization.

2510.10216 2026-02-09 cs.PL cs.AI cs.SE

Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis

Zhechong Huang, Zhao Zhang, Ruyi Ji, Tingxuan Xia, Qihao Zhu, Qinxiang Cao, Zeyu Sun, Wiggin Zhou, Yingfei Xiong

详情
英文摘要

Language models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods, such as constrained decoding, alleviate this problem by externally rejecting untypable code, the model itself does not effectively learn type reasoning internally, which ultimately limits its overall performance. This paper introduces TyFlow, a novel system that internalizes type reasoning within code generation to guide the model to learn the type system. The core of our approach is a novel type-guided program synthesis system that maintains an isomorphism between type derivation trees and synthesis derivation trees, enabling a new code representation based on synthesis decision sequences rather than traditional text-based token sequences. By offloading the complexity of type system learning to the representation itself, models can redirect their computational resources toward higher-level program semantics. Our evaluation shows that TyFlow not only eliminates type errors but also significantly improves functional correctness, highlighting the importance of aligning LMs with type systems internally.

2510.02514 2026-02-09 eess.IV cs.CV cs.IT eess.SP math.IT stat.ML

Learning a distance measure from the information-estimation geometry of data

Guy Ohayon, Pierre-Etienne H. Fiquet, Florentin Guth, Jona Ballé, Eero P. Simoncelli

Comments ICLR 2026. Code is available at https://github.com/ohayonguy/information-estimation-metric

详情
英文摘要

We introduce the Information-Estimation Metric (IEM), a novel form of distance function derived from an underlying continuous probability density over a domain of signals. The IEM is rooted in a fundamental relationship between information theory and estimation theory, which links the log-probability of a signal with the errors of an optimal denoiser, applied to noisy observations of the signal. In particular, the IEM between a pair of signals is obtained by comparing their denoising error vectors over a range of noise amplitudes. Geometrically, this amounts to comparing the score vector fields of the blurred density around the signals over a range of blur levels. We prove that the IEM is a valid global distance metric and derive a closed-form expression for its local second-order approximation, which yields a Riemannian metric. For Gaussian-distributed signals, the IEM coincides with the Mahalanobis distance. But for more complex distributions, it adapts, both locally and globally, to the geometry of the distribution. In practice, the IEM can be computed using a learned denoiser (analogous to generative diffusion models) and solving a one-dimensional integral. To demonstrate the value of our framework, we learn an IEM on the ImageNet database. Experiments show that this IEM is competitive with or outperforms state-of-the-art supervised image quality metrics in predicting human perceptual judgments.

2510.00244 2026-02-09 q-fin.GN cs.CY cs.LG

Board gender diversity and emissions performance: Insights from panel regressions, machine learning, and explainable AI

Mohammad Hassan Shakil, Arne Johan Pollestad, Khine Kyaw, Ziaul Haque Munim

Comments 18 pages

Journal ref 10.1016/j.jenvman.2026.128776

详情
英文摘要

With European Union initiatives mandating gender quotas on corporate boards, a key question arises: Is greater board gender diversity (BGD) associated with better emissions performance (EP)? To answer this question, we examine the influence of BGD on EP across a sample of European firms from 2016 to 2022. Using panel regressions, advanced machine learning algorithms, and explainable AI, we reveal a non-linear relationship. Specifically, EP improves with BGD up to an optimal level of approximately 35 %, beyond which further increases in BGD yield no additional improvement in EP. A minimum BGD threshold of 22 % is necessary for meaningful improvements in EP. To assess the legitimacy of EP outcomes, this study examines whether ESG controversies weaken the BGD-EP relationship. The results show no significant effect, suggesting that BGD's impact is driven by governance mechanisms rather than symbolic actions. Additionally, path analysis indicates that while environmental innovation contributes to EP, it is not the mediating channel through which BGD promotes EP. The results have implications for academics, businesses, and regulators.

2509.14032 2026-02-09 cs.GT cs.LG cs.MA

Nash Equilibria in Games with Playerwise Concave Coupling Constraints: Existence and Computation

Philip Jordan, Maryam Kamgarpour

详情
英文摘要

We study the existence and computation of Nash equilibria in concave games where the players' admissible strategies are subject to shared coupling constraints. Under playerwise concavity of constraints, we prove existence of Nash equilibria. Our proof leverages topological fixed point theory and novel structural insights into the contractibility of feasible sets, and relaxes strong assumptions for existence in prior work. Having established existence, we address the question of whether in the presence of coupling constraints, playerwise independent learning dynamics have convergence guarantees. We address this positively for the class of potential games by designing a convergent algorithm. To account for the possibly nonconvex feasible region, we employ a log barrier regularized gradient ascent with adaptive stepsizes. Starting from an initial feasible strategy profile and under exact gradient feedback, the proposed method converges to an $ε$-approximate constrained Nash equilibrium within $\mathcal{O}(ε^{-3})$ iterations.

2509.03839 2026-02-09 eess.SY cs.LG cs.SY math.OC nlin.CD

Reservoir Predictive Path Integral Control for Unknown Nonlinear Dynamics

Daisuke Inoue, Tadayoshi Matsumori, Gouhei Tanaka, Yuji Ito

Comments Submitted to IEEE for possible publication, 13 pages, 5 figures

详情
英文摘要

Neural networks have found extensive application in data-driven control of nonlinear dynamical systems, yet fast online identification and control of unknown dynamics remain central challenges. To meet these challenges, this paper integrates echo-state networks (ESNs)--reservoir computing models implemented with recurrent neural networks--and model predictive path integral (MPPI) control--sampling-based variants of model predictive control. The proposed reservoir predictive path integral (RPPI) enables fast learning of nonlinear dynamics with ESNs and exploits the learned nonlinearities directly in MPPI control computation without linearization approximations. This framework is further extended to uncertainty-aware RPPI (URPPI), which achieves robust stochastic control by treating ESN output weights as random variables and minimizing an expected cost over their distribution to account for identification errors. Experiments on controlling a Duffing oscillator and a four-tank system demonstrate that URPPI improves control performance, reducing control costs by up to 60% compared to traditional quadratic programming-based model predictive control methods.

2508.20850 2026-02-09 cs.NE cs.ET cs.RO

Encoding Tactile Stimuli for Braille Recognition with Organoids

Tianyi Liu, Hemma Philamore, Benjamin Ward-Cherrier

详情
英文摘要

This study proposes a transferable encoding strategy that maps tactile sensor data to electrical stimulation patterns, enabling neural organoids to perform an open-loop artificial tactile Braille classification task. Human forebrain organoids cultured on a low-density microelectrode array (MEA) are systematically stimulated to characterize the relationship between electrical stimulation parameters (number of pulse, phase amplitude, phase duration, and trigger delay) and organoid responses, measured as spike activity and spatial displacement of the center of activity. Implemented on event-based tactile inputs recorded from the Evetac sensor, our system achieved an average Braille letter classification accuracy of 61% with a single organoid, which increased significantly to 83% when responses from a three-organoid ensemble were combined. Additionally, the multi-organoid configuration demonstrated enhanced robustness against various types of artificially introduced noise. This research demonstrates the potential of organoids as low-power, adaptive bio-hybrid computational elements and provides a foundational encoding framework for future scalable bio-hybrid computing architectures.

2507.13591 2026-02-09 cs.CR cs.LG

FuSeFL: Fully Secure and Scalable Federated Learning

Sahar Ghoflsaz Ghinani, Elaheh Sadredini

Comments 14 Pages, 12 Figures

详情
英文摘要

Federated Learning (FL) enables collaborative model training without centralizing client data, making it attractive for privacy-sensitive domains. While existing approaches employ cryptographic techniques such as homomorphic encryption, differential privacy, or secure multiparty computation to mitigate inference attacks, including model inversion, membership inference, and gradient leakage, they often suffer from high computational and memory overheads. Moreover, many methods overlook the confidentiality of the global model itself, which may be proprietary and sensitive. These challenges limit the practicality of secure FL, especially in settings that involve large datasets and strict compliance requirements. We present FuSeFL, a Fully Secure and scalable FL scheme, which decentralizes training across client pairs using lightweight MPC, while confining the server's role to secure aggregation, client pairing, and routing. This design eliminates server bottlenecks, avoids full data offloading, and preserves full confidentiality of data, model, and updates throughout training. Based on our experiment, FuSeFL defends against unauthorized observation, reconstruction attacks, and inference attacks such as gradient leakage, membership inference, and inversion attacks, while achieving up to $13 \times$ speedup in training time and 50% lower server memory usage compared to our baseline.

2507.05820 2026-02-09 cs.HC cs.AI cs.MA

Constella: Supporting Storywriters' Interconnected Character Creation through LLM-based Multi-Agents

Syemin Park, Soobin Park, Youn-kyung Lim

Comments Accepted to ACM Transactions on Computer-Human Interaction (TOCHI)

详情
英文摘要

Creating a cast of characters by attending to their relational dynamics is a critical aspect of most long-form storywriting. However, our formative study (N=14) reveals that writers struggle to envision new characters that could influence existing ones, balance similarities and differences among characters, and intricately flesh out their relationships. Based on these observations, we designed Constella, an LLM-based multi-agent tool that supports storywriters' interconnected character creation process. Constella suggests related characters (FRIENDS DISCOVERY feature), reveals the inner mindscapes of several characters simultaneously (JOURNALS feature), and manifests relationships through inter-character responses (COMMENTS feature). Our 7-8 day deployment study with storywriters (N=11) shows that Constella enabled the creation of expansive communities composed of related characters, facilitated the comparison of characters' thoughts and emotions, and deepened writers' understanding of character relationships. We conclude by discussing how multi-agent interactions can help distribute writers' attention and effort across the character cast.

2507.04341 2026-02-09 stat.ML cs.AI cs.LG

Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models

Etrit Haxholli, Yeti Z. Gurbuz, Ogul Can, Eli Waxman

详情
英文摘要

While continuous diffusion models excel in modeling continuous distributions, their application to categorical data has been less effective. Recent work has shown that ratio-matching through score-entropy within a continuous-time discrete Markov chain (CTMC) framework serves as a competitive alternative to autoregressive models in language modeling. To enhance this framework, we first introduce three new theorems concerning the KL divergence between the data and learned distribution. Our results serve as the discrete counterpart to those established for continuous diffusion models and allow us to derive an improved upper bound of the perplexity. Second, we empirically show that ratio-matching performed by minimizing the denoising cross-entropy between the clean and corrupted data enables models to outperform those utilizing score-entropy with up to 10% lower perplexity/generative-perplexity, and 15% faster training steps. To further support our findings, we introduce and evaluate a novel CTMC transition-rate matrix that allows prediction refinement, and derive the analytic expression for its matrix exponential which facilitates the computation of conditional ratios thus enabling efficient training and generation.

2507.02735 2026-02-09 cs.CR cs.AI

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks

Sizhe Chen, Arman Zharmagambetov, David Wagner, Chuan Guo

详情
英文摘要

Prompt injection attacks, where untrusted data contains an injected prompt to manipulate the system, have been listed as the top security threat to LLM-integrated applications. Model-level prompt injection defenses have shown strong effectiveness, but the strongest defenses are proprietary. Open-source secure models are needed by the AI security community so that co-development of attacks and defenses through open research can drive scientific progress in mitigating prompt injection attacks. To this end, we develop Meta SecAlign, the first fully open-source LLM with built-in model-level defense that achieves commercial-grade performance and is powerful enough for complex agentic tasks. We provide complete details of our training recipe. We perform the most comprehensive evaluation to date on 9 utility benchmarks (measuring general knowledge, instruction following, and agentic workflows) and 7 security benchmarks. Results show that Meta SecAlign, despite being trained only on generic instruction-tuning samples, surprisingly confers security in unseen downstream tasks, including tool-calling and web-navigation, in addition to general instruction-following. Our best model -- Meta-SecAlign-70B -- establishes a new frontier of utility-security trade-off for open-source LLMs, and is more secure than several flagship proprietary models with prompt injection defense. Below are links for the code (https://github.com/facebookresearch/Meta_SecAlign), Meta-SecAlign-70B (https://huggingface.co/facebook/Meta-SecAlign-70B), and Meta-SecAlign-8B (https://huggingface.co/facebook/Meta-SecAlign-8B) models.

2506.08381 2026-02-09 physics.geo-ph cs.LG

Physics-informed extreme learning machine for Terzaghi consolidation problems and interpretation of coefficient of consolidation based on CPTu data

He Yang, Pin-Qiang Mo, Fei Ren, Hai-Sui Yu, Xueyu Geng, Pei-Zhi Zhuang

详情
英文摘要

This paper conducts a preliminary study to investigate the feasibility of a physics-informed extreme learning machine (PIELM) for solving the Terzaghi consolidation equation and interpreting the coefficient of consolidation of soil from piezocone penetration tests (CPTu). In the PIELM framework, the target solution is approximated by a single-layer feed-forward extreme learning machine (ELM) network, instead of the deep neural networks typically employed in physics-informed neural networks (PINNs). Physical laws and measured data are integrated into a loss vector, which is minimized via least squares methods during ELM training. As a result, training efficiency is significantly improved by avoiding the gradient-descent optimisation commonly used in PINNs. The performance of PIELM is evaluated using three forward-problem case studies. Notably, a time-stepping strategy is incorporated into the PIELM framework to alleviate sharp gradients caused by inconsistent initial and boundary conditions. This paper further applies PIELM to estimate the soil consolidation coefficient, given that initial distributions of excess water pressure are often unavailable in CPTu dissipation tests (conducted following the pauses of penetration). By combining physical laws (excluding initial conditions) with measured data (i.e., excess pore-water pressure at the probe surface), the results demonstrate that PIELM is an effective tool for interpreting CPTu dissipation tests, owing to its ability to fuse data with physical constraints. This study contributes to the interpretation of consolidation coefficients from CPTu dissipation tests, particularly in scenarios where initial distributions of excess water pressure are not prior-known.

2506.02791 2026-02-09 cs.SE cs.AI

Rethinking the effects of data contamination in Code Intelligence

Zhen Yang, Hongyi Lin, Yifan He, Junqi Wang, Zeyu Sun, Shuo Liu, Jie Xu, Pengpeng Wang, Zhongxing Yu, Qingyuan Liang

详情
英文摘要

In recent years, code intelligence has gained increasing importance in the field of automated software engineering. Meanwhile, the widespread adoption of Pretrained Language Models (PLMs) and Large Language Models (LLMs) has raised concerns regarding data contamination and its potential impact on model performance evaluation. Previous studies mainly focused on sample-level contamination, ignoring partial contamination scenarios that are pervasive in code intelligence. This paper fills this gap and presents a systematic empirical study to investigate the fine-grained data contamination on mainstream code tasks. Our study involves diverse representative PLMs: RoBERTa and GPT-2, and LLMs: LLaMA and StarCoder, covering three major tasks: code translation, code generation, and code summarization, across two Programming Languages (PLs): Java and Python. We categorize contamination scenarios into four types according to the code intelligence practice, namely input-only, output-only, unpaired, and paired contamination settings, and construct corresponding experimental and control groups for exploration. Experimental results show that, under the pre-training, fine-tuning, and inference paradigm adopted by PLMs, even deliberately injecting paired contamination does not lead to significant performance overestimation. But direct inference or small-scale fine-tuning uncovers the contamination effects. In contrast, LLMs with pre-training and inference paradigm are significantly affected by the paired contamination. Apart from the above, other contamination scenarios have no impact on both PLMs and LLMs. Our findings challenge the conventional belief that contamination inevitably leads to performance overestimation, providing new insights into the evaluation and deployment of code intelligence models.

2506.00062 2026-02-09 cs.CY cs.CL cs.CR cs.LG

SafeCOMM: A Study on Safety Degradation in Fine-Tuned Telecom Large Language Models

Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Fernando Koch, Walid Saad, Holger Boche

Journal ref IEEE Wireless Communications and Networking Conference (WCNC), 2026

详情
英文摘要

Fine-tuning large language models (LLMs) on telecom datasets is a common practice to adapt general-purpose models to the telecom domain. However, little attention has been paid to how this process may compromise model safety. Recent research has shown that even benign fine-tuning can degrade the safety alignment of LLMs, causing them to respond to harmful or unethical user queries. In this paper, we investigate this issue by fine-tuning LLMs on three representative telecom datasets and show that safety degrades even for light telecom domain adaptation. To this end, we introduce TeleHarm, the first telecom-specific red-teaming benchmark, which we use alongside established DirectHarm and HexPhi datasets to systematically assess harmful behavior. We further extend our analysis to publicly available TeleLLMs that were continually pre-trained on large telecom corpora, revealing that safety alignment is severely lacking, primarily due to the omission of safety-focused instruction tuning. To address these issues, we evaluate three realignment defenses: SafeInstruct, SafeLoRA, SafeMERGE. We show that, across all settings, the proposed defenses can effectively restore safety without compromising telecom task performance, leading to Safe teleCOMMunication (SafeCOMM) models. Our work serves as both a diagnostic study and practical guide for safety realignment in telecom-tuned LLMs, underscoring the need for safety-aware instruction and fine-tuning in the telecom domain.

2505.21517 2026-02-09 cs.CY cs.AI

Cold Start Problem: An Experimental Study of Knowledge Tracing Models with New Students

Indronil Bhattacharjee, Christabel Wayllace

Comments 26th International Conference on Artificial Intelligence in Education (AIED 2025)

Journal ref 26th International Conference on Artificial Intelligence in Education (AIED 2025)

详情
英文摘要

KnowledgeTracing (KT) involves predicting students' knowledge states based on their interactions with Intelligent Tutoring Systems (ITS). A key challenge is the cold start problem, accurately predicting knowledge for new students with minimal interaction data. Unlike prior work, which typically trains KT models on initial interactions of all students and tests on their subsequent interactions, our approach trains models solely using historical data from past students, evaluating their performance exclusively on entirely new students. We investigate cold start effects across three KT models: Deep Knowledge Tracing (DKT), Dynamic Key-Value Memory Networks (DKVMN), and Self-Attentive Knowledge Tracing (SAKT), using ASSISTments 2009, 2015, and 2017 datasets. Results indicate all models initially struggle under cold start conditions but progressively improve with more interactions; SAKT shows higher initial accuracy yet still faces limitations. These findings highlight the need for KT models that effectively generalize to new learners, emphasizing the importance of developing models robust in few-shot and zero-shot learning scenarios

2505.16975 2026-02-09 cs.SE cs.CL

SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development

Yaxin Du, Yuzhu Cai, Yifan Zhou, Cheng Wang, Yu Qian, Xianghe Pang, Qian Liu, Yue Hu, Siheng Chen

详情
英文摘要

Large Language Models (LLMs) have shown strong capability in diverse software engineering tasks. However, feature-driven development, a highly prevalent real-world task that involves developing new functionalities for large, existing codebases, remains underexplored. We therefore introduce SWE-Dev, the first large-scale dataset (with 14,000 training and 500 test samples) designed to evaluate and train autonomous coding systems on real-world end-to-end feature-driven software development tasks. To ensure verifiable and diverse training, SWE-Dev uniquely provides all instances with a runnable environment and its developer-authored executable unit tests. This collection not only provides high-quality data for Supervised Fine-Tuning (SFT), but also enables Reinforcement Learning (RL) by delivering accurate reward signals from executable unit tests. We evaluated SWE-Dev across 17 base LLMs, 10 reasoning-focused LLMs, 10 multi-agent systems, and 8 tool-augmented LLM agents. Results show substantial headroom: the best single-turn model reaches only 22.51\% Pass@1 on the hard split, while OpenHands agents improve to 56.44\% but still leave many tasks unsolved. Code is available here https://github.com/DorothyDUUU/SWE-Dev.

2502.07021 2026-02-09 cs.DC cs.LG

Federated Sinkhorn

Jeremy Kulcsar, Vyacheslav Kungurtsev, Georgios Korpas, Giulio Giaconi, William Shoosmith

详情
英文摘要

We study distributed Sinkhorn iterations for entropy-regularized optimal transport when the Gibbs kernel operator is row-partitioned across c workers and cannot be centralized. We present Federated Sinkhorn, two exact synchronous protocols that exchange only scaling-vector slices: (i) an All-to-All scheme implemented by Allgather, and (ii) a Star (parameter-server) scheme implemented by client to server sends and server to client broadcasts. For both, we derive closed-form per-iteration compute, communication, and memory costs under an alpha-beta latency--bandwidth model, and show that the distributed iterates match centralized Sinkhorn under standard positivity assumptions. Multi-node CPU/GPU experiments validate the model and show that repeated global scaling exchange quickly becomes the dominant bottleneck as c increases. We also report an optional bounded-delay asynchronous schedule and an optional privacy measurement layer for communicated log-scalings.

2501.18064 2026-02-09 cond-mat.mtrl-sci cs.AI

Learning Metal Microstructural Heterogeneity through Spatial Mapping of Diffraction Latent Space Features

Mathieu Calvat, Chris Bean, Dhruv Anjaria, Hyoungryul Park, Haoren Wang, Kenneth Vecchio, J. C. Stinville

详情
英文摘要

To leverage advancements in machine learning for metallic materials design and property prediction, it is crucial to develop a data-reduced representation of metal microstructures that surpasses the limitations of current physics-based discrete microstructure descriptors. This need is particularly relevant for metallic materials processed through additive manufacturing, which exhibit complex hierarchical microstructures that cannot be adequately described using the conventional metrics typically applied to wrought materials. Furthermore, capturing the spatial heterogeneity of microstructures at the different scales is necessary within such framework to accurately predict their properties. To address these challenges, we propose the physical spatial mapping of metal diffraction latent space features. This approach integrates (i) point diffraction data encoding via variational autoencoders or contrastive learning and (ii) the physical mapping of the encoded values. Together these steps offer a method offers a novel means to comprehensively describe metal microstructures. We demonstrate this approach on a wrought and additively manufactured alloy, showing that it effectively encodes microstructural information and enables direct identification of microstructural heterogeneity not directly possible by physics-based models. This data-reduced microstructure representation opens the application of machine learning models in accelerating metallic material design and accurately predicting their properties.

2501.15802 2026-02-09 cs.DC cs.AI

Adaptive AI-based Decentralized Resource Management in the Cloud-Edge Continuum

Lanpei Li, Jack Bell, Massimo Coppola, Vincenzo Lomonaco

Comments Accepted at AHPC3 workshop, PDP 2025

详情
英文摘要

In the Cloud-Edge Continuum, dynamic infrastructure change and variable workloads complicate efficient resource management. Centralized methods can struggle to adapt, whilst purely decentralized policies lack global oversight. This paper proposes a hybrid framework using Graph Neural Network (GNN) embeddings and collaborative multi-agent reinforcement learning (MARL). Local agents handle neighbourhood-level decisions, and a global orchestrator coordinates system-wide. This work contributes to decentralized application placement strategies with centralized oversight, GNN integration and collaborative MARL for efficient, adaptive and scalable resource management.