arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1276
2605.00527 2026-05-04 eess.IV cs.CV cs.LG

Multi-frame Restoration for High-rate Lissajous Confocal Laser Endomicroscopy

Minhee Lee, Sangyoon Lee, Jiwook Lee, Minki Hong, Kyuyoung Kim, Wonhwa Kim, Jaeho Lee

详情
英文摘要

Lissajous confocal laser endomicroscopy (CLE) is a promising solution for high speed in vivo optical biopsy for handheld scenarios. However, Lissajous scanning traces a resonant trajectory and samples only the visited pixels per frame; at high frame rates, many pixels remain unvisited, creating structured holes. In this work, we introduce the first benchmark for high-rate Lissajous CLE, consisting of low-quality video clips paired with high-quality reference images. The reference images are wide-FOV mosaics obtained by stitching stabilized, slow-scan frames of the same tissue, enabling temporally aligned supervision. Using this dataset, we propose MIRA, a lightweight recurrent framework for Lissajous CLE restoration that iteratively aggregates temporal context through feature reuse and displacement alignment. Our experiments demonstrate that MIRA outperforms both lightweight and high-complexity baselines in restoration quality while maintaining a favorable computational efficiency suitable for clinical deployment.

2605.00519 2026-05-04 cs.PF cs.AI cs.AR

Silicon Showdown: Performance, Efficiency, and Ecosystem Barriers in Consumer-Grade LLM Inference

Allan Kazakov, Abdurrahman Javat

详情
英文摘要

The operational landscape of local Large Language Model (LLM) inference has shifted from lightweight models to datacenter-class weights exceeding 70B parameters, creating profound systems challenges for consumer hardware. This paper presents a systematic empirical analysis of the Nvidia and Apple Silicon ecosystems, specifically characterizing the distinct intra-architecture trade-offs required to deploy these massive models. On the Nvidia Blackwell architecture, we identify a critical "Backend Dichotomy" within the TensorRT-LLM stack: while the new NVFP4 quantization format delivers a 1.6x throughput advantage over optimized BF16 baselines (151 tokens/s vs. 92 tokens/s), realizing this performance requires navigating complex runtime constraints that trade startup latency for generation speed. Furthermore, we characterize the "VRAM Wall" for 70B+ models: on discrete GPUs, users face a destructive choice between aggressive quantization (e.g., Q2) that degrades model intelligence to fit in VRAM, or PCIe-bottlenecked CPU offloading, which reduces throughput by over 90% compared to full-GPU execution. Conversely, Apple's Unified Memory Architecture (UMA) circumvents these bottlenecks, enabling linear scaling for 80B parameter models at practical 4-bit precisions. This architectural divergence extends to operational sustainability, where Apple's SoC design demonstrates up to a 23x advantage in energy efficiency (tokens/joule). We conclude that for consumer-grade inference, the optimal hardware is defined by a complex interplay between compute density (Nvidia) and memory capacity (Apple), moderated by the significant "ecosystem friction" of proprietary quantization workflows.

2605.00497 2026-05-04 cs.HC cs.AI cs.CL

"What Are You Really Trying to Do?": Co-Creating Life Goals from Everyday Computer Use

Shardul Sapkota, Matthew Jörke, Zane Sabbagh, Omar Shaikh, Grace Wang, James A. Landay

Comments 20 pages, 8 figures, 1 table

详情
英文摘要

Recent advances in user modeling make it feasible to conduct open-ended inference over a person's everyday computer use. Despite longstanding visions of systems that deeply understand our actions and the purposes they serve in our lives, existing systems only capture what a person is doing in the moment -- not why they are doing it -- limiting these systems to surface-level support. We introduce striving co-creation, a process for inferring broader life goals from unstructured observations of computer use. Grounded in Activity Theory and Emmons' personal strivings framework, our system progressively constructs a hierarchical representation of a person's activities. Crucially, strivings are difficult to fully resolve from observation alone, as the same action can be driven by many different goals. Our system therefore supports an editing interface that gives people agency over how they are understood by the system, feeding their corrections back into subsequent rounds of striving induction. In a week-long field deployment (N=14), we find that our co-creation process produces strivings that are representative of participants' long-term goals and gives them greater agency than baseline methods.

2605.00462 2026-05-04 cs.DC cs.AI

Adaptation of AI-accelerated CFD Simulations to the IPU platform

P. Rosciszewski, A. Krzywaniak, S. Iserte, K. Rojek, P. Gepner

详情
Journal ref
1st Workshop on Applications of Machine Learning and Artificial Intelligence in High-Performance Computing (WAML-HPC) held in conjunction with PPAM, Sep. 2022
英文摘要

Intelligence Processing Units (IPU) have proven useful for many AI applications. In this paper, we evaluate them within the emerging field of \emph{AI for simulation}, where traditional numerical simulations are supported by artificial intelligence approaches. We focus specifically on a program for training machine learning models supporting a \emph{computational fluid dynamics} application. We use custom TensorFlow provided by the Poplar SDK to adapt the program for the IPU-POD16 platform and investigate its ease of use and performance scalability. Training a model on data from OpenFOAM simulations allows us to get accurate simulation state predictions in test time. We show how to utilize the \emph{popdist} library to overcome a performance bottleneck in feeding training data to the IPU on the host side, achieving up to 34\% speedup. Due to communication overheads, using data parallelism to utilize two IPUs instead of one does not improve the throughput. However, once the intra-IPU costs have been paid, the hardware capabilities for inter-IPU communication allow for good scalability. Increasing the number of IPUs from 2 to 16 improves the throughput from 560.8 to 2805.8 samples/s.

2605.00461 2026-05-04 eess.IV cs.CV

Combined Dictionary Unfolding Network with Gradient-Adaptive Fidelity for Transferable Multi-Source Fusion

Ge Luo, Jun-Jie Huang, Qi Yu, Tianrui Liu, Ke Liang, Yuming Xiang, Wentao Zhao, Xinwang Liu, Meng Wang

详情
英文摘要

Deep Unfolding Network-based methods have emerged as effective solutions for multi-source image fusion by combining model-driven iterative optimization with data-driven deep learning. However, most existing deep unfolding image fusion methods are derived from alternating minimization, which updates the features of different modalities separately. This design introduces considerable computational and memory overhead, limiting deployment on resource-constrained edge devices. To address this issue, we propose CDNet, a lightweight Combined Dictionary Unfolding Network for multi-source image fusion. Rather than introducing a new sparse coding prior or empirically compressing an existing fusion network, CDNet translates the unique-common decomposition prior of coupled dictionary learning into a structurally constrained joint unfolding architecture. The resulting CDBlock follows a block-sparse interaction topology and performs a model-derived joint update of common and modality-specific representations, thereby streamlining feature learning and improving efficiency.In addition, we design a compact High- and Low-frequency Image Fidelity loss for unsupervised training without ground-truth images. We evaluate CDNet on four tasks, including multi-exposure image fusion, infrared and visible image fusion, medical image fusion, and infrared and visible image fusion for semantic segmentation. Experimental results show that CDNet achieves competitive or superior fusion performance with high efficiency. For infrared and visible image fusion, CDNet outperforms competing methods on four of six metrics on the TNO dataset and five of six metrics on the RoadScene dataset. In particular, it surpasses the second-best method by 1.23 dB and 1.59 dB in PSNR on TNO and RoadScene, respectively.

2605.00460 2026-05-04 cs.CR cs.LG

CleanBase: Detecting Malicious Documents in RAG Knowledge Databases

Weifei Jin, Xilong Wang, Wei Zou, Jinyuan Jia, Neil Gong

详情
英文摘要

Retrieval-augmented generation (RAG) is vulnerable to prompt injection attacks, in which an adversary inserts malicious documents containing carefully crafted injected prompts into the knowledge database. When a user issues a question targeted by the attack, the RAG system may retrieve these malicious documents, whose injected prompts mislead it into generating attacker-specified answers, thereby compromising the integrity of the RAG system. In this work, we propose CleanBase, a method to detect malicious documents within a knowledge database. Our key insight is that malicious documents crafted for the same attack-targeted questions often exhibit high semantic similarity, as attackers deliberately make them consistent to improve attack success rates. Accordingly, CleanBase constructs a similarity graph over the knowledge database, where each node represents a document and an edge connects two nodes if their semantic similarity--computed using an embedding model--exceeds a statistically determined threshold. Due to their inherent similarity, malicious documents tend to form cliques within this graph. CleanBase detects such cliques and flags the corresponding documents as malicious. We theoretically derive upper bounds on CleanBase's false positive and false negative rates and empirically validate its effectiveness. Experimental results across multiple datasets and prompt injection attacks demonstrate that CleanBase accurately detects malicious documents and effectively safeguards RAG systems. Our source code is available at https://github.com/WeifeiJin/CleanBase.

2605.00449 2026-05-04 cs.IT cs.LG eess.SP math.IT

Soft Graph Diffusion Transformer for MIMO Detection

Nan Jiang, Jiadong Hong, Lei Liu, Xinyu Bian, Wenjie Wang, Zhaoyang Zhang

Comments 6 pages, 4 figures, 2 tables

详情
英文摘要

Learning-based MIMO detection has shown strong empirical performance, yet existing methods typically rely on fixed-depth architectures without explicitly modeling the progressive refinement of symbol estimates. In this paper, we revisit MIMO detection from a flow matching perspective and propose the Soft Graph Diffusion Transformer (SGDiT), which reformulates detection as a noise-level-conditioned denoising process that progressively transforms a Gaussian initialization toward the posterior conditioned on channel observations. An adaptive layer normalization (AdaLN)-conditioned soft graph transformer is employed to parameterize the denoising dynamics, enabling stage-aware information integration between observation and symbol domains. To better align with the discrete nature of symbol detection, we further adopt a cross-entropy-based training objective that directly models bit-wise posterior probabilities, providing a more suitable inductive bias than conventional regression-based formulations. Experimental results across various MIMO system configurations demonstrate that SGDiT achieves competitive bit error rate (BER) performance compared with representative baselines. Furthermore, the proposed model exhibits good generalization capability across different channel conditions. Overall, the SGDiT framework provides an effective and practical approach for neural MIMO detection.

2605.00433 2026-05-04 cs.SE cs.AI

Improving LLM Code Generation via Requirement-Aware Curriculum Reinforcement Learning

Shouyu Yin, Zhao Tian, Junjie Chen, Shikai Guo

详情
英文摘要

Code generation, which aims to automatically generate source code from given programming requirements, has the potential to substantially improve software development efficiency. With the rapid advancement of large language models (LLMs), LLM-based code generation has attracted widespread attention from both academia and industry. However, as programming requirements become increasingly complex, existing LLMs still exhibit notable performance limitations. To address this challenge, recent studies have proposed training-based curriculum reinforcement learning (CRL) strategies to improve LLM code generation performance. Despite their effectiveness, existing CRL approaches suffer from several limitations, including misaligned requirement difficulty perception, the absence of requirement difficulty optimization, and suboptimal curriculum sampling strategies. In CRL-based code generation, programming requirements serve as the sole input to the model, making their quality and difficulty critical to training effectiveness. Motivated by insights from software requirements engineering, we propose RECRL, a novel requirement-aware curriculum reinforcement learning framework for enhancing LLM-based code generation. RECRL automatically perceives model-specific requirement difficulty, optimizes challenging requirements to improve training data utilization, and employs an adaptive curriculum sampling strategy to construct training batches with smoothly varying difficulty. Extensive experiments on five state-of-the-art LLMs across five widely-used code generation benchmarks by comparing with five state-of-the-art baselines, demonstrate the significant effectiveness of RECRL. For example, RECRL achieves an average Pass@1 improvement of 1.23%-5.62% over all state-of-the-art baselines.

2605.00420 2026-05-04 cs.MA cs.LG q-fin.GN

Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents

Maksym Nechepurenko, Pavel Shuvalov

Comments 27 pages, 5 figures, 10 tables. Project page: https://foresightarena.xyz/. Code: https://github.com/foresight-arena/contracts

详情
英文摘要

Evaluating the true forecasting ability of AI agents requires environments resistant to overfitting, free from centralized trust, and grounded in incentive-compatible scoring. Existing benchmarks either rely on static datasets vulnerable to training-data contamination, or measure trading PnL -- a metric conflating predictive accuracy with timing, sizing, and risk appetite. We introduce Foresight Arena, the first permissionless, on-chain benchmark for evaluating AI forecasting agents on real-world prediction markets. Agents submit probabilistic forecasts on binary Polymarket markets via a commit-reveal protocol enforced by Solidity smart contracts on Polygon PoS; outcomes are resolved trustlessly through the Gnosis Conditional Token Framework. Performance is measured by the Brier Score and a novel Alpha Score -- proper scoring rules that incentivize honest probability reporting and isolate predictive edge over market consensus. We provide a formal analysis: closed-form variance for per-market Alpha, the connection to Murphy's classical Brier decomposition, and a power analysis characterizing the number of rounds required to reliably distinguish agents of different skill levels. We show that detecting a true edge of $α^* = 0.02$ at 80% power requires approximately 350 resolved binary predictions (50 rounds of 7 markets), while $α^* = 0.01$ requires four times more. We complement these analytical results with a 50-round live evaluation of five frontier LLM agents plus a random baseline. Murphy decomposition distinguishes well-calibrated agents from market-tracking agents that fail through reduced resolution. All smart contracts and evaluation infrastructure are open-source.

2605.00402 2026-05-04 cs.NE cs.AI cs.LG

Scalable Learning in Structured Recurrent Spiking Neural Networks without Backpropagation

Bo Tang, Weiwei Xie

Comments 7 pages, 2 figures

详情
英文摘要

Spiking Neural Networks (SNNs) provide a promising framework for energy-efficient and biologically grounded computation; however, scalable learning in deep recurrent architectures with sparse connectivity remains a major challenge. In this work, we propose a structured multi-layer recurrent SNN architecture composed of locally dense recurrent layers augmented with sparse small-world long-range projections to a readout population. The long-range connectivity is largely fixed, preserving routing efficiency and hardware scalability, while synaptic adaptation is performed using strictly local plasticity mechanisms. To enable supervised learning without backpropagation or surrogate gradients, we introduce a biologically motivated learning framework that combines: (i) population-based winner-take-all (WTA) teaching signals at the output layer, (ii) fixed random broadcast alignment feedback pathways, and (iii) low-dimensional modulatory neuron populations that gate synaptic updates through three-factor learning rules with eligibility traces. This design supports deep recurrent computation with sparse global communication and purely local synaptic updates. We analyze the algorithmic properties, computational complexity, and hardware feasibility of the proposed approach, and demonstrate stable learning and competitive performance on benchmark classification tasks. The results highlight the potential of structured recurrence and neuromodulatory learning to enable scalable, hardware-compatible SNN training beyond gradient-based methods.

2605.00400 2026-05-04 cs.IR cs.CL

FollowTable: A Benchmark for Instruction-Following Table Retrieval

Rihui Jin, Yuchen Lu, Ting Zhang, Jun Wang, Kuicai Dong, Zhaocheng Du, Dongping Liu, Gang Wang, Yong Liu, Guilin Qi

Comments SIGIR 2026 Accepted

详情
英文摘要

Table Retrieval (TR) has traditionally been formulated as an ad-hoc retrieval problem, where relevance is primarily determined by topical semantic similarity. With the growing adoption of LLM-based agentic systems, access to structured data is increasingly instruction-driven, where relevance is conditional on explicit content and schema constraints rather than topical similarity alone. We therefore formalize Instruction-Following Table Retrieval (IFTR), a new task that requires models to jointly satisfy topical relevance and fine-grained instruction constraints. We identify two core challenges in IFTR: (i) sensitivity to content scope, such as inclusion and exclusion constraints, and (ii) awareness of schema-grounded requirements, including column semantics and representation granularity--capabilities largely absent in existing retrievers. To support systematic evaluation, we introduce FollowTable, the first large-scale benchmark for IFTR, constructed via a taxonomy-driven annotation pipeline. We further propose a new metric, termed the Instruction Responsiveness Score, to evaluate whether retrieval rankings consistently adapt to user instructions relative to a topic-only baseline. Our results indicate that existing retrieval models struggle to follow fine-grained instructions over tabular data. In particular, they exhibit systematic biases toward surface-level semantic cues and remain limited in handling schema-grounded constraints, highlighting substantial room for future improvements.

2605.00382 2026-05-04 cs.SE cs.AI cs.SI

Social Bias in LLM-Generated Code: Benchmark and Mitigation

Fazle Rabbi, Lin Ling, Song Wang, Jinqiu Yang

详情
英文摘要

Large Language Models (LLMs) are increasingly deployed to generate code for human-centered applications where demographic fairness is critical. However, existing evaluations focus almost exclusively on functional correctness, leaving social bias in LLM-generated code largely unexamined. Extending our prior work on Solar, we conduct a comprehensive empirical study using SocialBias-Bench, a benchmark of 343 real-world coding tasks spanning seven demographic dimensions. We evaluate four prominent LLMs and find severe bias across all models, with Code Bias Scores reaching up to 60.58%. We further show that standard prompt-level interventions, such as Chain-of-Thought reasoning and fairness persona assignment, inadvertently amplify bias rather than reduce it. We then investigate whether structured multi-agent software process frameworks can improve fairness, finding that structured pipelines reduce bias when early roles correctly scope what the code should and should not consider. However, adding explicit fairness instructions to all agent roles produces worse outcomes than providing none, suggesting that diffused responsibility goes unaddressed. To address these limitations, we propose the Fairness Monitor Agent (FMA), a modular component that plugs into any existing code generation pipeline without modifying it. FMA analyzes the task description to determine which attributes should be considered or restricted, then detects and corrects violations through an iterative review process, without requiring an executable test suite. Evaluated on all 343 tasks, FMA reduces bias by 65.1% compared to a developer agent alone and improves functional correctness from 75.80% to 83.97%, outperforming all other studied approaches.

2605.00361 2026-05-04 cs.CY cs.AI

Pedagogical Promise and Peril of AI: A Text Mining Analysis of ChatGPT Research Discussions in Programming Education

Juvy C. Grume, John Paul P. Miranda, Aileen P. De Leon, Jordan L. Salenga, Hilene E. Hernandez, Mark Anthony A. Castro, Vernon Grace M. Maniago, Joel D. Canlas, Joel B. Quiambao

Comments 3 tables, 14 pages, book chapter

详情
Journal ref
Pedagogical Innovations in Computer Science Education, 2026, pp. 305-330
英文摘要

GenAI systems such as ChatGPT are increasingly discussed in programming education, but the ways in which the research literature conceptualizes and frames their role remain unclear. This chapter applies text mining to publications indexed in a leading academic database to map scholarly discourse on ChatGPT in programming education. Term frequency analysis, phrase pattern extraction, and topic modeling reveal four dominant themes: pedagogical implementation, student-centered learning and engagement, AI infrastructure and human-AI collaboration, and assessment, prompting, and model evaluation. The literature prioritizes classroom practice and learner interaction, with comparatively limited attention to assessment design and institutional governance. Across studies, ChatGPT is positioned both as a learning aid that supports explanation, feedback, and efficiency and as a pedagogical risk linked to overreliance, unreliable outputs, and academic integrity concerns. These findings support responsible integration and highlight the need for stronger assessment and governance mechanisms.

2605.00348 2026-05-04 cs.CR cs.CL

Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking

Joeun Kim, HoEun Kim, Dongsup Jin, Young-Sik Kim

详情
英文摘要

Recent multi-bit watermarking methods for large language models (LLMs) prioritize capacity over reliability, often conflating decoding with detection. Our analysis reveals that existing ECC-based extractors suffer from catastrophic false positive rates (FPR), and applying rejection thresholds merely collapses detection sensitivity (TPR) to random guessing. To resolve this structural limitation, we propose \textbf{BREW} (Block-wise Reliable Embedding for Watermarking), a framework shifting the paradigm to \emph{designated verification}. BREW employs a two-stage mechanism: (i) \textbf{blind message estimation} via independent block voting, followed by (ii) \textbf{window-shifting verification} that rigorously validates the payload against local edits. Experiments demonstrate that BREW achieves a TPR of 0.965 with an FPR of 0.02 under 10\% synonym substitution, demonstrating that the high-FPR issue is not an inherent trade-off of multi-bit watermarking, but a solvable structural flaw of prior decoding-centric designs. Our framework is model-agnostic and theoretically grounded, providing a scalable solution for reliable forensic deployment.

2605.00343 2026-05-04 cs.CY cs.AI

AI Adoption Among Teachers: Insights on Concerns, Support, Confidence, and Attitudes

Vanessa B. Sibug, Maria Anna D. Cruz, Vicky P. Vital, Juvy C. Grume, Almer B. Gamboa, Emerson Q. Fernando, Lloyd D. Feliciano, Jordan L. Salenga, John Paul P. Miranda

Comments 3 pages, conference proceedings, open access

详情
Journal ref
Proc. 9th Int. Conf. on Education and Multimedia Technology (ICEMT), 2025, pp. 267-269
英文摘要

The study examines the adoption of artificial intelligence (AI) tools in education by analyzing the roles of institutional support, teacher confidence, and teacher concerns. It aims to determine whether teacher concerns moderate the relationship between institutional support and two outcomes: teacher confidence and attitudes toward AI adoption. The sample included 260 teachers from the Philippines. Composite scores were calculated for institutional support, confidence, concerns, and attitudes. Moderated multiple regression analysis showed that institutional support significantly predicted both teacher confidence and attitudes toward AI. However, teacher concerns did not significantly moderate these relationships. A follow-up mediation analysis tested whether confidence explains the effect of institutional support on attitudes. Results showed full mediation. The indirect effect was significant based on the Sobel test, and the direct effect became non-significant when confidence was included in the model. This shows that institutional support improves teacher attitudes by increasing their confidence. The study recommends that institutions provide structured and ongoing support to strengthen teacher confidence. Professional development, mentoring, and AI integration in teacher education programs can increase readiness and support effective AI adoption.

2605.00324 2026-05-04 cs.IR cs.LG

Intelligent Elastic Feature Fading: Enabling Model Retrain-Free Feature Efficiency Rollouts at Scale

Jieming Di, Xiaoyu Chen, Ying She, Siyu Wang, Lizzie Liu, Fenggang Wu, Jiaoying Mu, Tony Tsui, Amr Elroumy, Hsing Tang, Zewei Jiang, Qiao Yang, Lin Qi, Haibo Lin, Weifeng Cui, Daniel Li, Kapil Gupta, Shivendra Pratap Singh, Jie Zheng, Arnold Overwijk, Ling Leng, Sri Reddy, Robert Malkin, Rocky Liu

Comments 8 pages, 2 figures, 3 tables

详情
英文摘要

Large-scale ranking systems depend on thousands of features derived from user behavior across multiple time horizons. Typically requires model retraining -- resulting in long iteration cycles (3--6 months), substantial GPU resource consumption, and limited rollout throughput. We introduce Intelligent Elastic Feature Fading (IEFF), a production infrastructure system that enables retrain-free feature efficiency rollouts by elastically controlling feature coverage and distribution at serving time. IEFF supports incremental feature coverage adjustments while models adapt through recurring training, eliminating dependencies on explicit retraining cycles. The system incorporates strict safety guardrails, reversibility mechanisms, and comprehensive monitoring to ensure stability at scale. Across multiple production use cases, IEFF accelerates efficiency-related rollouts by 5$\times$, eliminates retraining-related GPU overhead, and enables faster capacity recycling. Extensive offline and online experiments demonstrate that gradual feature fading prevents 50--55\% of online performance degradation compared to abrupt feature removal, while maintaining stable model behavior. These results establish elastic, system-level feature fading as a practical and scalable approach for managing feature efficiency in modern industrial ranking systems.

2605.00315 2026-05-04 cs.CY cs.AI

Unbox Responsible GeoAI: Navigating Climate Extreme and Disaster Mapping

Hao Li, Steffen Knoblauch

详情
英文摘要

As climate extreme and disaster events become more frequent and intense, Geospatial Artificial Intelligence (GeoAI) has emerged as a transformative approach for large-scale disaster mapping and risk reduction. However, the purely mechanical, performance-driven deployment of GeoAI models can result in amplifying inherent spatial inequalities, preventing effective emergency decision-making, and producing severe environmental carbon footprint. To unbox the concept of responsible GeoAI, this position paper examines its emerging role, e.g., in climate extreme and disaster mapping, from a critical GIS perspective. We address the nexus of responsible GeoAI into four interrelated theoretical dimensions, specifically Representativeness, Explainability, Sustainability, and Ethics, with examples from climate extreme and disaster mapping. Moreover, targeting at the operational practice, we then propose a conceptual governance Model of responsible GeoAI that categorizes its governance practices into Data, Application, and Society scopes. Last, this position paper aims to raise the attention in the broader GIS community that the future of climate resilience relies not just on building better algorithms, but on fostering a governance ecosystem where GeoAI is deployed responsibly, ethically, and sustainably.

2605.00314 2026-05-04 cs.CR cs.AI cs.PL

Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

Hongbo Wen, Ying Li, Hanzhi Liu, Chaofan Shou, Yanju Chen, Yuan Tian, Yu Feng

详情
英文摘要

An agent skill is a configuration package that equips an LLM-driven agent with a concrete capability, such as reading email, executing shell commands, or signing blockchain transactions. Each skill is a hybrid artifact-a structured half declares executable interfaces, while a prose half dictates when and how those interfaces fire-and the prose is reinterpreted probabilistically on every invocation. Conventional static analyzers parse the structured half but ignore the prose; LLM-based tools read the prose but cannot reproducibly prove that a tainted input reaches a high-impact sink. We present Semia, a static auditor for agent skills. Semia lifts each skill into the Skill Description Language (SDL), a Datalog fact base that captures LLM-triggered actions, prose-defined conditions, and human-in-the-loop checkpoints. Synthesizing a fact base that is both structurally sound and semantically faithful to the original prose is the central challenge; we address it with Constraint-Guided Representation Synthesis (CGRS), a propose-verify-evaluate loop that refines LLM candidates until convergence. Security properties (e.g., indirect injection, secret leakage, confused deputies, unguarded sinks, etc.) over an agent skill can then be reduced to Datalog reachability queries. We evaluate Semia on 13,728 real-world skills from public marketplaces. Semia renders all of them auditable and finds that more than half carry at least one critical semantic risk. On a stratified sample of 541 expert-labeled skills, Semia achieves 97.7% recall and an F1 of 90.6%, substantially outperforming signature-based scanners and LLM baselines.

2605.00313 2026-05-04 cond-mat.mtrl-sci cs.AI

Beyond Structure: Revolutionising Materials Discovery via AI-Driven Synthesis Protocol-Property Relationships

Guillaume Lambard

Comments 20 pages, 2 figures

详情
英文摘要

The current structure-centric paradigm in artificial intelligence (AI)-driven materials discovery, despite delivering thousands of candidate structures, is stalling at a critical barrier: the synthesizability gap. We argue that closing this gap demands a pivot to a synthesis-first paradigm in which executable synthesis protocols, not just atomic configurations, are treated as primary design variables. We outline a roadmap built on three pillars: (i) representing synthesis procedures as machine-readable protocols, (ii) deploying generative and inverse-design models to propose actionable reaction pathways and recipes, and (iii) integrating closed-loop optimisation to refine protocols against experimental realities and sustainability constraints. Framed in terms of the causal backbone P->X->y from protocol P to structure X and properties y, this perspective sets out methodological building blocks, standards needs and self-driving laboratory (SDL) integration strategies to accelerate reproducible, data-first materials discovery.

2605.00297 2026-05-04 cs.CR cs.LG

Trident: Improving Malware Detection with LLMs and Behavioral Features

Rebecca Saul, Jingzhi Jiang, Elliott Chia, David Wagner

详情
英文摘要

Traditionally, machine learning methods for PE malware detection have relied on static features like byte histograms, string information, and PE header contents. One barrier to incorporating dynamic analysis features has been the semi-structured nature of sandbox behavior reports. We show that, using the latest generation of large language models with reasoning, it is possible to efficiently process these behavior reports and utilize them as part of a malware detection pipeline. Specifically, we leverage LLMs to generate behavior-based malware detection rules based on a small training set of labeled malware. We find that these detection rules, derived from behavioral features, are much more robust to concept drift than standard static-feature methods, while maintaining practical false positive rates. Finally, we introduce Trident, a system which combines a classic decision tree model over static features, our behavior-based detection rules, and direct LLM analysis of sandbox reports through majority voting. Trident outperforms standard methods using static features, outperforms behavior-based rules alone, and is as resilient to concept drift as active learning methods without requiring retraining.

2605.00279 2026-05-04 cs.CR cs.LG

A Comparative Analysis of Machine Learning Models for Intrusion Detection in Intelligent Transport Systems

Zawad Yalmie Sazid, Robert Abbas, Sasa Maric

详情
英文摘要

AI-powered edge computing security is moving Intelligent Transportation Systems (ITS) from passive, rule-based protections to proactive, smart, zero-touch, self-sufficient safeguards that neutralize threats in milliseconds. As transportation becomes more connected with edge computing, massive IoT, and advanced 5G for vehicle-to-everything (V2X) connectivity, AI at the edge computing nodes plays a crucial role in protecting against sophisticated threats, enabling URLLC (ultra-low-latency communications) for smart transport, and enhancing infrastructure capabilities and safety. This research applies edge computing to improve latency, bandwidth efficiency, and service responsiveness by moving processing closer to devices, gateways, and users. However, this shift also expands the cyberattack surface because edge nodes are distributed, heterogeneous, and often resource-constrained. The paper proposes a trust-aware federated hybrid intrusion detection framework in which a random forest, a decision tree, and a linear SVM network learn complementary traffic representations at each edge site, while a server performs trust-aware aggregation of local model updates.

2605.00278 2026-05-04 math.AC cs.CV cs.MS

Elimination Templates in Macaulay2

Manav Batavia, Cheng Chen, Anna Natalie Chlopecki, Timothy Duff, William Huang, Aolong Li, Wanchun Shen

Comments 13 pages with references

详情
英文摘要

We introduce the package \texttt{EliminationTemplates} for the Macaulay2 computer algebra system, which provides tools for constructing automatic solvers for families of zero-dimensional radical ideals depending on algebraically independent parameters. This article provides a self-contained description of how elimination templates are constructed for such families and their specialization properties. Additionally, we describe the main functionality and datatypes provided by our package, and illustrate its usage on several examples, including applications from computer vision from which elimination templates originated.

2605.00254 2026-05-04 cs.NI cs.AI

Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving

Junsun Choi, Sam Son, Sunjin Choi, Hansung Kim, Yakun Sophia Shao, Scott Shenker, Sylvia Ratnasamy, Borivoje Nikolic

详情
英文摘要

Mixture-of-experts (MoE) architectures have turned LLM serving into a cluster-scale workload in which communication consumes a considerable portion of LLM serving runtime. This has prompted industry to invest heavily in expensive high-bandwidth scale-up networks. We question whether such costly infrastructure is strictly necessary. We present the first systematic cross-layer analysis of network cost-effectiveness for MoE LLM serving, comparing four representative XPU (e.g., GPU/TPU) topologies (scale-up, scale-out, 3D torus, and 3D full-mesh). We find that lower-cost switchless topologies are more cost-effective than the scale-up topology across all serving scenarios explored, improving cost-effectiveness by 20.6-56.2%. In particular, the 3D full-mesh topology is Pareto-optimal in terms of the performance-cost tradeoff. We also find that current scale-up link bandwidths are over-provisioned: reducing the link bandwidth improves throughput per cost by up to 27%. A forward-looking analysis of upcoming GPU generations indicates that the cost-performance advantage of switchless networks will likely persist.

2605.00250 2026-05-04 stat.ML cs.CV cs.LG

Information-geometric adaptive sampling for graph diffusion

Yuhui Lu, Wenjing Liu, Kun Zhan

Comments Accepted to ICML 2026!

详情
英文摘要

Standard diffusion models for graph generation typically rely on uniform time-stepping, an approach that overlooks the non-homogeneous dynamics of distributional evolution on complex manifolds. In this paper, we present an information-geometric framework that reinterprets the diffusion sampling trajectory as a parametric curve on a Riemannian manifold. Our key observation is that the Fisher-Rao metric provides a principled measure of the intrinsic distance. By analyzing this metric, we derive the Drift Variation Score (DVS), a geometry-aware indicator that quantifies the instantaneous rate of distributional change. Unlike prior heuristic-based adaptive samplers, our DVS solver enforces a constant informational speed on the statistical manifold, automatically maintaining a uniform rate of distributional change along the sampling trajectory. This equal arc-length strategy ensures that each discretization step contributes equally to the information speed. Theoretical analysis verifies that DVS characterizes the local stiffness of the sampling dynamics in the Fisher-Rao sense. Experimental results on molecule and social network generation show that DVS significantly improves structural fidelity and sampling efficiency. Code is at https://github.com/kunzhan/DVS

2605.00236 2026-05-04 cs.CR cs.AI

Attention Is Where You Attack

Aviral Srivastava, Sourav Panda

详情
英文摘要

Safety-aligned large language models rely on RLHF and instruction tuning to refuse harmful requests, yet the internal mechanisms implementing safety behavior remain poorly understood. We introduce the Attention Redistribution Attack (ARA), a white-box adversarial attack that identifies safety-critical attention heads and crafts nonsemantic adversarial tokens that redirect attention away from safety-relevant positions. Unlike prior jailbreak methods operating at the semantic or output-logit level, ARA targets the geometry of softmax attention on the probability simplex using Gumbel-softmax optimization over targeted heads. Across LLaMA-3-8B-Instruct, Mistral-7B-Instruct-v0.1, and Gemma-2-9B-it, ARA bypasses safety alignment with as few as 5 tokens and 500 optimization steps, achieving 36% ASR on Mistral-7B and 30% on LLaMA-3 against 200 HarmBench prompts, while Gemma-2 remains at 1%. Our principal mechanistic finding is a dissociation between ablation and redistribution: zeroing out the top-ranked safety heads produces at most 1 flip among 39 to 50 baseline refusals, while ARA targeting the corresponding safety-heavy layers flips 72/200 prompts on Mistral-7B and 60/200 on LLaMA-3. This suggests that safety is not localized in these heads as removable components, but emerges from the attention routing they perform. Removing a head allows compensation through the residual stream, while redirecting its attention propagates a corrupted signal downstream.

2605.00229 2026-05-04 stat.ML cs.LG math.OC

A unified perspective on fine-tuning and sampling with diffusion and flow models

Carles Domingo-Enrich, Yuanqi Du, Michael S. Albergo

详情
英文摘要

We study the problem of training diffusion and flow generative models to sample from target distributions defined by an exponential tilting of a base density; a formulation that subsumes both sampling from unnormalized densities and reward fine-tuning of pre-trained models. This problem can be approached from a stochastic optimal control (SOC) perspective, using adjoint-based or score matching methods, or from a non-equilibrium thermodynamics perspective. We provide a unified framework encompassing these approaches and make three main contributions: (i) bias-variance decompositions revealing that Adjoint Matching/Sampling and Novel Score Matching have finite gradient variance, while Target and Conditional Score Matching do not; (ii) norm bounds on the lean adjoint ODE that theoretically support the effectiveness of adjoint-based methods; and (iii) adaptations of the CMCD and NETS loss functions, along with novel Crooks and Jarzynski identities, to the exponential tilting setting. We validate our analysis with reward fine-tuning experiments on Stable Diffusion 1.5 and 3.

2605.00225 2026-05-04 eess.AS cs.LG cs.SD q-bio.QM

From Birdsong to Rumbles: Classifying Elephant Calls with Out-of-Species Embeddings

Christiaan M. Geldenhuys, Thomas R. Niesler

详情
英文摘要

We show that pretrained acoustic embeddings classify elephant vocalisations at a level approaching that of end-to-end supervised neural networks, without any fine-tuning of the embedding model. This result is of practical importance because annotated bioacoustic data are scarce and costly to obtain, leaving conventional supervised approaches prone to overfitting and to poor generalisation under domain shift. A broad range of embedding models drawn from general audio, speech, and bioacoustic domains is evaluated, all of which are either out-of-domain (containing no bioacoustic data) or out-of-species (containing no elephant call data). The embedding networks themselves remain fixed; only the lightweight downstream classifiers, which include a linear model and several small neural networks, are trained. Among the models considered, Perch 2.0 achieves the best cross-validated classification performance, attaining AUCs of 0.849 on African bush elephant (Loxodonta africana) calls and 0.936 on Asian elephant (Elephas maximus) calls, with Perch 1.0 close behind. The best-performing system is within 2.2 % of an end-to-end supervised elephant call classification system. A layerwise analysis of pretrained transformer encoders, considered as embedding models, shows that intermediate representations outperform final-layer outputs. The second layer of both wav2vec2.0 and HuBERT encodes sufficient information for effective elephant call classification; truncation at this layer therefore preserves classification performance whilst retaining only approximately 10 % of the parameters of the full network. Such compact embedding networks are well suited to on-device processing where computational resources are limited.

2605.00218 2026-05-04 cs.CR cs.ET cs.LG

Selfie-Capture Dynamics as an Auxiliary Signal Against Deepfakes and Injection Attacks for Mobile Identity Verification

Erkka Rantahalvari, Olli Silvén, Zinelabidine Boulkenafet, Constantino Álvarez Casado

Comments 12 pages, 5 figures, 8 tables, 51 references, conference

详情
英文摘要

Mobile remote identity verification (RIdV) systems are exposed to attacks that manipulate or replace the facial video stream, including presentation attacks, real-time deepfakes, and video injection. Recent European requirements, including ETSI TS 119 461 and CEN/TS 18099, motivate complementary evidence channels beyond camera-based presentation-attack detection. This paper investigates whether passive motion traces recorded during selfie capture provide auxiliary evidence for spoof screening and user verification. We introduce CanSelfie, a dataset of 375 bona fide multi-sensor sequences collected at 50\,Hz from 30 participants using a commercial mobile RIdV application, together with stationary, handheld, and temporally shifted attack-proxy scenarios. We benchmark 7 multivariate time-series classifiers and 8 whole-series anomaly detectors across sensor configurations and temporal windows. For spoof screening, accelerometer-only ROCKAD obtains 0.00\% false rejection rate (FRR) and 43.8\% false acceptance rate (FAR), while QUANT+3-NN obtains the lowest overall FAR of 32.0\% at 2.37\% FRR; both reject all stationary attack proxies. For same-device and same-session user verification, WEASEL+MUSE reaches 1.07\% equal error rate (EER) using 9 sensor channels. The analysis shows that raw accelerometer data, preserving gravity and orientation cues, is the most informative modality, and that closed-set classification accuracy alone does not imply good verification performance because threshold calibration depends on score distributions. The findings suggest that short selfie-capture motion traces contain measurable spoof-related and identity-related information, supporting their use as a low-friction auxiliary signal while also identifying the need for cross-device, cross-session, and real injection-attack evaluation.

2605.00201 2026-05-04 cs.DS cs.LG

Matroid Algorithms Under Size-Sensitive Independence Oracles

Kiarash Banihashem, MohammadTaghi Hajiaghayi, Mahdi JafariRaviz, Danny Mittal

Comments Accepted to the 43rd International Conference on Machine Learning (ICML 2026) as a spotlight paper

详情
英文摘要

The standard oracle model for matroid algorithms assumes that each independence query can be answered in constant time, regardless of the size of the queried set. While this abstraction has underpinned much of the theoretical progress in matroid optimization, it masks the true computational effort required by these algorithms. In particular, for natural and widely studied classes such as graphic matroids, even a single independence query can require work linear in the size of the set, making the constant-time assumption implausible. We address this gap by introducing a size-sensitive cost model where the cost of a query $Q$ scales with $|Q|$. Nearly linear-time oracle implementations exist for broad families of matroids, and this refined abstraction therefore captures the true cost of query evaluation while allowing for a more faithful comparison between general matroids and their natural special cases. Within this framework we study three fundamental algorithmic tasks: finding a basis of a matroid, approximating its rank, and approximating its partition size. We establish tight results, proving nearly matching upper and lower bounds that show the optimal query cost is (up to logarithmic factors) quadratic in the size of the matroid. On the algorithmic side, our upper bounds are realized by explicit procedures that construct the desired solution. On the complexity side, our lower bounds are unconditional and already hold even for weaker distinguishing formulations of the problems. Finally, for matroids with maximum circuit size at most $c$, we show that the quadratic barrier can be broken, providing an algorithm that calculates the maximum-weight basis with expected query cost $\mathcal{O}(n^{2-1/c} \log n)$.

2605.00197 2026-05-04 cs.MA cs.AI

The $\textit{Silicon Society}$ Cookbook: Design Space of LLM-based Social Simulations

Aurélien Bück-Kaeffer, Sneheel Sarangi, Maximilian Puelma Touzel, Reihaneh Rabbany, Zachary Yang, Jean-François Godbout

Comments 20 pages, 12 tables, under review at COLM 2026

详情
英文摘要

Studies attempting to simulate human behavior with $\textit{Silicon Societies}$ grow in numbers while LLM-only social networks have started appearing outside of controlled settings. However, the design space of these networks remains under-studied, which contributes to a gap in validating model realism. To enable future works to make more informed design decisions, we perform a systematic analysis of the consequences and interactions of key design choices in simulated social networks, including the choice of base model used to model individual agents, and how they are connected to each other. Using surveys as a proxy for agent opinions, our findings suggest that the geometry of the design space is non-trivial, with some parameters behaving in additive ways while others display more complex interactions. In particular, the choice of the base LLM is the most important variable impacting the simulation outcomes.