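arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.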

2512.06210 2026-03-13 stat.AP cs.LG

Forests of Uncertaint(r)ees: Using tree-based ensembles to estimate probability distributions of future conflict

Daniel Mittermaier, Tobias Bohne, Martin Hofer, Daniel Racek

Comments 23 pages, 4 figures, 3 tables. Replication code available at https://github.com/ccew-unibw/uncertaintrees

Abstract

Predictions of fatalities from violent conflict on the PRIO-GRID-month (pgm) level are characterized by high levels of uncertainty, limiting their usefulness in practical applications. We discuss the two main sources of uncertainty for this prediction task, the nature of violent conflict and data limitations, embedding conflict prediction in the wider literature on uncertainty quantification in machine learning. Based on this, we develop a strategy to quantify uncertainty in conflict forecasting, shifting from traditional point predictions to full predictive distributions. Our approach combines multiple tree-based classifiers and distributional regressors in a custom AutoML setup, estimating distributions for each pgm individually. We also test the integration of regional models in spatial ensembles as a potential avenue to reduce uncertainty by lowering data requirements and accounting for systematic differences between conflict contexts. The models are able to consistently outperform a suite of benchmarks derived from conflict history in predictions up to one year in advance. Marginal differences in model-wide metrics emphasize the need to understand their behavior for a given prediction problem, in this case characterized by extremely high zero-inflatedness. Addressing this, we complement our evaluation with a simulation experiment, which demonstrates that our models reflect meaningful performance improvements, which can be traced back to conflict-affected regions. Lastly, we show that the integration of regional models does not decrease performance, opening avenues to integrate additional data sources in the future.

2510.19444 2026-03-13 cs.LO cs.AI cs.LG

A Foundational Theory of Quantitative Abstraction: Adjunctions, Duality, and Logic for Probabilistic Systems

Nivar Anwer, Ezequiel López-Rubio, David Elizondo, Rafael M. Luque-Baena

Comments Some major mathematical errors that we need to rectify. We cannot specify exact error areas as they are spread throughout. The theorems need further development

2508.21038 2026-03-13 cs.IR cs.CL cs.LG

On the Theoretical Limitations of Embedding-Based Retrieval

Orion Weller, Michael Boratko, Iftekhar Naim, Jinhyuk Lee

Comments Accepted to ICLR'26

2503.15772 2026-03-13 cs.DL cs.AI cs.CR

Detecting LLM-Generated Peer Reviews

Vishisht Rao, Aounon Kumar, Himabindu Lakkaraju, Nihar B. Shah

Comments 27 pages, 2 figures

2501.08848 2026-03-13 cs.NI cs.AI cs.LG

RouteNet-Gauss: Hardware-Enhanced Network Modeling with Machine Learning

Carlos Güemes-Palau, Miquel Ferriol-Galmés, Jordi Paillisse-Vilanova, Albert López-Brescó, Pere Barlet-Ros, Albert Cabellos-Aparicio

Comments This article has been accepted for publication in IEEE Transactions on Networking. This is the author's version which has not been fully edited, content may change prior to final publication. Citation information: DOI 10.1109/TON.2026.3668972 © 2026 IEEE. All rights reserved. Personal use is permitted, permission from IEEE must be obtained for all other uses

2411.12184 2026-03-13 stat.ME cs.AI cs.LG

Testability of Instrumental Variables in Additive Nonlinear, Non-Constant Effects Models

Xichen Guo, Zheng Li, Biwei Huang, Yan Zeng, Zhi Geng, Feng Xie

2411.07102 2026-03-13 math.OC cs.LG

Effectively Leveraging Momentum Terms in Stochastic Line Search Frameworks for Fast Optimization of Finite-Sum Problems

Matteo Lapucci, Davide Pucci

2311.11321 2026-03-13 stat.ML cs.AI cs.LG

Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation

Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel

2603.11928 2026-03-13 astro-ph.IM cs.CV

AS-Bridge: A Bidirectional Generative Framework Bridging Next-Generation Astronomical Surveys

Dichang Zhang, Yixuan Shao, Simon Birrer, Dimitris Samaras

Comments 10 pages, 4 figures. Code available at https://github.com/ZHANG7DC/AS-Bridge

2603.11914 2026-03-13 cs.CR cs.AI

Understanding LLM Behavior When Encountering User-Supplied Harmful Content in Harmless Tasks

Junjie Chu, Yiting Qu, Ye Leng, Michael Backes, Yun Shen, Savvas Zannettou, Yang Zhang

Comments 21 pages, 11 figures

Abstract

Large Language Models (LLMs) are increasingly trained to align with human values, primarily focusing on task level, i.e., refusing to execute directly harmful tasks. However, a subtle yet crucial content-level ethical question is often overlooked: when performing a seemingly benign task, will LLMs -- like morally conscious human beings -- refuse to proceed when encountering harmful content in user-provided material? In this study, we aim to understand this content-level ethical question and systematically evaluate its implications for mainstream LLMs. We first construct a harmful knowledge dataset (i.e., non-compliant with OpenAI's usage policy) to serve as the user-supplied harmful content, with 1,357 entries across ten harmful categories. We then design nine harmless tasks (i.e., compliant with OpenAI's usage policy) to simulate the real-world benign tasks, grouped into three categories according to the extent of user-supplied content required: extensive, moderate, and limited. Leveraging the harmful knowledge dataset and the set of harmless tasks, we evaluate how nine LLMs behave when exposed to user-supplied harmful content during the execution of benign tasks, and further examine how the dynamics between harmful knowledge categories and tasks affect different LLMs. Our results show that current LLMs, even the latest GPT-5.2 and Gemini-3-Pro, often fail to uphold human-aligned ethics by continuing to process harmful content in harmless tasks. Furthermore, external knowledge from the "Violence/Graphic" category and the "Translation" task is more likely to elicit harmful responses from LLMs. We also conduct extensive ablation studies to investigate potential factors affecting this novel misuse vulnerability. We hope that our study could inspire enhanced safety measures among stakeholders to mitigate this overlooked content-level ethical risk.

2603.11862 2026-03-13 cs.CR cs.AI

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

Ching-Yu Kao, Xinfeng Li, Shenyu Dai, Tianze Qiu, Pengcheng Zhou, Eric Hanchen Jiang, Philip Sperl

Comments 14 pages

Abstract

High-privilege LLM agents that autonomously process external documentation are increasingly trusted to automate tasks by reading and executing project instructions, yet they are granted terminal access, filesystem control, and outbound network connectivity with minimal security oversight. We identify and systematically measure a fundamental vulnerability in this trust model, which we term the Trusted Executor Dilemma: agents execute documentation-embedded instructions, including adversarial ones, at high rates because they cannot distinguish malicious directives from legitimate setup guidance. This vulnerability is a structural consequence of the instruction-following design paradigm, not an implementation bug. To structure our measurement, we formalize a three-dimensional taxonomy covering linguistic disguise, structural obfuscation, and semantic abstraction, and construct ReadSecBench, a benchmark of 500 real-world README files enabling reproducible evaluation. Experiments on the commercially deployed computer-use agent show end-to-end exfiltration success rates up to 85%, consistent across five programming languages and three injection positions. Cross-model evaluation on four LLM families in a simulation environment confirms that semantic compliance with injected instructions is consistent across model families. A 15-participant user study yields a 0% detection rate across all participants, and evaluation of 12 rule-based and 6 LLM-based defenses shows neither category achieves reliable detection without unacceptable false-positive rates. Together, these results quantify a persistent Semantic-Safety Gap between agents' functional compliance and their security awareness, establishing that documentation-embedded instruction injection is a persistent and currently unmitigated threat to high-privilege LLM agent deployments.

2603.11842 2026-03-13 cs.CY cs.AI

The Landscape of Generative AI in Information Systems: A Synthesis of Secondary Reviews and Research Agendas

Aleksander Jarzębowicz, Adam Przybyłek, Jacinto Estima, Yen Ying Ng, Jakub Swacha, Beata Zielosko, Lech Madeyski, Noel Carroll, Kai-Kristian Kemell, Bartosz Marcinkowski, Alberto Rodrigues da Silva, Viktoria Stray, Netta Iivari, Anh Nguyen-Duc, Jorge Melegati, Boris Delibašić, Emilio Insfran

2603.11835 2026-03-13 stat.ML cs.LG

Hypercomplex Widely Linear Processing: Fundamentals for Quaternion Machine Learning

Sayed Pouria Talebi, Clive Cheong Took

Comments Contributed chapter to appear in Handbook of Statistics Volume 54: Multidimensional Signal Processing, Elsevier, 2026

2603.11834 2026-03-13 cs.MA cs.AI cs.GT

Hybrid Human-Agent Social Dilemmas in Energy Markets

Isuri Perera, Frits de Nijs, Julian Garcia

Comments 20 pages, 7 figures. Submitted to Proceedings of the Royal Society A, Special Issue on "The evolution of sociality in hybrid human AI populations"

2603.11759 2026-03-13 cs.HC cs.IR cs.LG

Modeling Trial-and-Error Navigation With a Sequential Decision Model of Information Scent

Xiaofu Jin, Yunpeng Bai, Antti Oulasvirta

2603.11701 2026-03-13 stat.ML cs.LG

Decomposing Observational Multiplicity in Decision Trees: Leaf and Structural Regret

Mustafa Cavus

Comments 19 pages, 3 figures

Abstract

Many machine learning tasks admit multiple models that perform almost equally well, a phenomenon known as predictive multiplicity. A fundamental source of this multiplicity is observational multiplicity, which arises from the stochastic nature of label collection: observed training labels represent only a single realization of the underlying ground-truth probabilities. While theoretical frameworks for observational multiplicity have been established for logistic regression, their implications for non-smooth, partition-based models like decision trees remain underexplored. In this paper, we introduce two complementary notions of observational multiplicity for decision tree classifiers: leaf regret and structural regret. Leaf regret quantifies the intrinsic variability of predictions within a fixed leaf due to finite-sample noise, while structural regret captures variability induced by the instability of the learned tree structure itself. We provide a formal decomposition of observational multiplicity into these two components and establish statistical guarantees. Our experimental evaluation across diverse credit risk scoring datasets confirms the near-perfect alignment between our theoretical decomposition and the empirically observed variance. Notably, we find that structural regret is the primary driver of observational multiplicity, accounting for over 15 times the variability of leaf regret in some datasets. Furthermore, we demonstrate that utilizing these regret measures as an abstention mechanism in selective prediction can effectively identify arbitrary regions and improve model safety, elevating recall from 92% to 100% on the most stable sub-populations. These results establish a rigorous framework for quantifying observational multiplicity, aligning with recent advances in algorithmic safety and interpretability.

2603.11677 2026-03-13 cs.HC cs.AI cs.CL

From Control to Foresight: Simulation as a New Paradigm for Human-Agent Collaboration

Gaole He, Brian Y. Lim

Comments CHI 2026 Workshop on Human-Agent Collaboration

2603.11676 2026-03-13 cs.NE cs.AI

Stable Spike: Dual Consistency Optimization via Bitwise AND Operations for Spiking Neural Networks

Yongqi Ding, Kunshan Yang, Linze Li, Yiyang Zhang, Mengmeng Jing, Lin Zuo

Comments Accepted by CVPR 2026

2603.11632 2026-03-13 cs.HC cs.RO

From Pets to Robots: MojiKit as a Data-Informed Toolkit for Affective HRI Design

Liwen He, Pingting Chen, Ziheng Tang, Yixiao Liu, Jihong Jeung, Teng Han, Xin Tong

Comments 25 pages, 11 figures, Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26)

2603.11619 2026-03-13 cs.CR cs.AI

Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats

Xinhao Deng, Yixiang Zhang, Jiaqing Wu, Jiaqi Bai, Sibo Yi, Zhuoheng Zou, Yue Xiao, Rennai Qiu, Jianan Ma, Jialuo Chen, Xiaohu Du, Xiaofang Yang, Shiwen Cui, Changhua Meng, Weiqiang Wang, Jiaxing Song, Ke Xu, Qi Li

2603.11551 2026-03-13 cs.HC cs.CV cs.GR

Shadowless Projection Mapping for Tabletop Workspaces with Synthetic Aperture Projector

Takahiro Okamoto, Masaki Takeuchi, Masataka Sawayama, Daisuke Iwai

2603.11532 2026-03-13 math.OC cs.LG stat.ME

Simultaneous estimation of multiple discrete unimodal distributions under stochastic order constraints

Yasuhiro Yoshida, Noriyoshi Sukegawa, Jiro Iwanaga

2603.11472 2026-03-13 cs.SI cs.LG physics.soc-ph

HawkesRank: Event-Driven Centrality for Real-Time Importance Ranking

Didier Sornette, Yishan Luo, Sandro Claudio Lera

Comments 10 pages, 3 figures + SM (8 pages, 2 figures)

2603.11468 2026-03-13 cs.MM cs.AI cs.SD

Stage-Adaptive Reliability Modeling for Continuous Valence-Arousal Estimation

Yubeen Lee, Sangeun Lee, Junyeop Cha, Eunil Park

Comments 8 pages, 3 figures, 2 pages

2603.11398 2026-03-13 cs.NI cs.AI

Efficient Cross-View Localization in 6G Space-Air-Ground Integrated Network

Min Hao, Yanbing Xu, Maoqiang Wu, Jinglin Huang, Chen Shang, Jiacheng Wang, Ruichen Zhang, Jiawen Kang, Dusit Niyato, Zhu Han, Wei Ni

2603.11392 2026-03-13 cs.NI cs.AI

Agentic AI for Embodied-enhanced Beam Prediction in Low-Altitude Economy Networks

Min Hao, Zhizhuo Li, Zirui Zhang, Maoqiang Wu, Han Zhang, Rong Yu

2603.11384 2026-03-13 cs.HC cs.AI

Ghost Framing Theory: Exploring the role of generative AI in new venture rhetorical legitimation

Greg Nyilasy

2603.11375 2026-03-13 cs.SI cs.AI cs.CY

How do AI agents talk about science and research? An exploration of scientific discussions on Moltbook using BERTopic

Oliver Wieczorek

Comments 35 pages, 3 figures, 5 tables

2603.11368 2026-03-13 stat.ML cs.LG econ.EM stat.AP stat.ME

Spatially Robust Inference with Predicted and Missing at Random Labels

Stephen Salerno, Zhenke Wu, Tyler McCormick

2603.11356 2026-03-13 cs.SE cs.AI cs.MA

Resolving Java Code Repository Issues with iSWE Agent

Jatin Ganhotra, Sami Serhan, Antonio Abu Nassar, Avraham Shinnar, Ziv Nevo, Martin Hirzel