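arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.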

2505.10043 2026-03-18 cs.IR cs.AI

Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights

Yifan Wu, Lutao Yan, Yizhang Zhu, Yenchi Tseng, Yinan Mei, Yong Wang, Jiannan Wang, Nan Tang, Yuyu Luo

Abstract

Text-to-chart retrieval, enabling users to find relevant charts via natural language queries, has gained significant attention. However, evaluating models in real-world business intelligence (BI) scenarios is challenging, as current benchmarks fail to simulate realistic user queries or test for deep semantic understanding with static chart images. To address this gap, we introduce CRBench, the first real-world BI-sourced benchmark comprising 21,862 charts and 326 queries, utilizing a Target-and-Distractor paradigm to evaluate discriminative retrieval among highly similar candidates. Testing on CRBench reveals that existing methods, which rely primarily on visual features, perform poorly and fail to capture the rich analytical semantics of charts. To address this performance bottleneck, we propose a semantic insights synthesis pipeline that automatically generates three hierarchical levels of insights for charts: visual patterns, statistical properties, and practical applications. Using this pipeline, we produced 207,498 semantic insights for 69,166 charts as training data. By leveraging this data to bridge the gap between natural language query intent and latent visual representations via multi-level semantic supervision, we develop ChartFinder, a specialized model capable of deep cross-model reasoning. Experimental results show ChartFinder significantly outperforms state-of-the-art methods on CRBench, achieving up to 66.9% NDCG@10 for precise queries (an 11.58% improvement) and an average increase of 5% across nearly all metrics for fuzzy queries. This work provides the community with a much-needed benchmark for realistic evaluation and demonstrates a powerful data synthesis paradigm for enhancing a model's semantic understanding of charts.

2505.09647 2026-03-18 cs.DS cs.IT cs.LG math.IT math.PR math.ST stat.TH

On Unbiased Low-Rank Approximation with Minimum Distortion

Leighton Pate Barnes, Stephen Cameron, Benjamin Howard

2505.07272 2026-03-18 stat.ML cs.LG eess.SP

ALPCAH: Subspace Learning for Sample-wise Heteroscedastic Data

Javier Salazar Cavazos, Jeffrey A. Fessler, Laura Balzano

2505.02314 2026-03-18 cs.AR cs.AI cs.LG

NeuroSim V1.5: Improved Software Backbone for Benchmarking Compute-in-Memory Accelerators with Device and Circuit-level Non-idealities

James Read, Ming-Yen Lee, Wei-Hsing Huang, Yuan-Chun Luo, Anni Lu, Shimeng Yu

Comments 15 pages, 9 figures, 6 tables

2504.19596 2026-03-18 eess.SP cs.LG

Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities

Wei-Bang Jiang, Xi Fu, Yi Ding, Cuntai Guan

Comments 19 pages, 5 figures

2504.13376 2026-03-18 quant-ph cs.AI cs.ET

Addressing the Minor-Embedding Problem in Quantum Annealing and Evaluating State-of-the-Art Algorithm Performance

Aitor Gomez-Tejedor, Eneko Osaba, Esther Villar-Rodriguez

Comments Paper accepted for publication in Future Generation Computer Systems journal

2504.13336 2026-03-18 stat.ML cs.LG math.ST stat.TH

On the minimax optimality of Flow Matching through the connection to kernel density estimation

Lea Kunkel, Mathias Trabs

2504.09347 2026-03-18 stat.ML cs.LG math.ST stat.TH

Inference for Deep Neural Network Estimators in Generalized Nonparametric Models

Xuran Meng, Yi Li

Comments 91 pages, 14 figures, 20 tables

2504.07481 2026-03-18 physics.ao-ph cs.LG

A Mechanism-Learning Deeply Coupled Model for Remote Sensing Retrieval of Global Land Surface Temperature

Tian Xie, Menghui Jiang, Huanfeng Shen, Huifang Li, Chao Zeng, Jun Ma, Guanhao Zhang, Liangpei Zhang

2502.17533 2026-03-18 math.HO cs.AI cs.CL math.NT

From Euler to AI: Unifying Formulas for Mathematical Constants

Tomer Raz, Michael Shalyt, Elyasheev Leibtag, Rotem Kalisch, Shachar Weinbaum, Yaron Hadad, Ido Kaminer

Comments Final version for NeurIPS 2025. Published at https://neurips.cc/virtual/2025/loc/san-diego/poster/117099

2502.15858 2026-03-18 cs.CY cs.AI cs.LG

Generative AI Training and Copyright Law

Sebastian Stober, Tim W. Dornis

Comments submitted as an overview article to the Transactions of the International Society for Music Information Retrieval

2412.15004 2026-03-18 cs.CR cs.AI cs.CL

From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security

Enna Basic, Alberto Giaretta

2410.21657 2026-03-18 physics.ao-ph cs.AI cs.LG

A-UTE: Advection Informed, Uncertainty Aware Temperature Emulator

Hira Saleem, Flora Salim, Cormac Purcell

2407.19892 2026-03-18 stat.ML cs.LG q-bio.GN

Making Multi-Axis Gaussian Graphical Models Scalable to Millions of Cells

Bailey Andrew, Erica L. Harris, James A. Poulter, David R. Westhead, Luisa Cutillo

Comments 8 pages (35 with appendix+references), 8 figures, 10 tables

2406.07714 2026-03-18 cs.CR cs.AI cs.SE

LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing

Hongxiang Zhang, Yuyang Rong, Yifeng He, Hao Chen

Comments The 7th ACM/IEEE International Conference on Automation of Software Test (AST 2026)

2405.19553 2026-03-18 math.ST cs.LG math.PR stat.ML stat.TH

Convergence Bounds for Sequential Monte Carlo on Multimodal Distributions using Soft Decomposition

Holden Lee, Matheau Santana-Gijzen

2402.03819 2026-03-18 stat.ML cs.LG

Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants

Abdoulaye Sakho, Emmanuel Malherbe, Erwan Scornet

2306.12272 2026-03-18 cond-mat.mtrl-sci cs.CE cs.LG math.CO

From structure mining to unsupervised exploration of atomic octahedral networks

R. Patrick Xian, Ryan J. Morelock, Ido Hadar, Charles B. Musgrave, Christopher Sutton

Comments updated version, incl. three supporting information files

2211.04129 2026-03-18 math.OC cs.LG stat.ML

An Efficient Global Optimization Algorithm with Adaptive Estimates of the Local Lipschitz Constants

Danny D'Agostino

Comments Accepted in Journal of Global Optimization, Springer

2012.14309 2026-03-18 q-bio.PE cond-mat.soft cs.CL physics.bio-ph

General Mechanism of Evolution Shared by Proteins and Words

Li-Min Wang, Hsing-Yi Lai, Sun-Ting Tsai, Chen Siang Ng, Kevin Sheng-Kai Ma, Shan-Jyun Wu, Meng-Xue Tsai, Yi-Ching Su, Daw-Wei Wang, Tzay-Ming Hong

Abstract

Complex systems, such as life and languages, are governed by principles of evolution. The analogy and comparison between biology and linguistics provide a computational foundation for characterizing and analyzing protein sequences, human corpora, and their evolution. However, no general mathematical formula has been proposed so far to illuminate the origin of quantitative hallmarks shared by life and language. Here we show several new statistical relationships shared by proteins and words, which inspire us to establish a general mechanism of evolution with explicit formulations that can incorporate both old and new characteristics. We found natural selection can be quantified via the entropic formulation by the principle of least effort to determine the sequence variation that survives in evolution. Besides, the origin of power law behavior and how changes in the environment stimulate the emergence of new proteins and words can also be explained via the introduction of a function connection network. Our results demonstrate not only the correspondence between genetics and linguistics over their different hierarchies but also new fundamental physical properties for the evolution of complex adaptive systems. We anticipate our statistical tests can function as quantitative criteria to examine whether an evolution theory of sequence is consistent with the regularity of real data. In the meantime, their correspondence broadens the bridge to exchange existing knowledge, spurs new interpretations, and opens Pandora's box to release several potentially revolutionary challenges. For example, does linguistic arbitrariness conflict with the dogma that structure determines function?

2603.15834 2026-03-18 astro-ph.CO cs.CV

Spectral Hierarchy of the Cosmic Web

Francisco-Shu Kitaura, Francesco Sinigaglia

Comments 32 pages, 7 figures, 1 table

2603.15809 2026-03-18 cs.MA cs.AI

Don't Trust Stubborn Neighbors: A Security Framework for Agentic Networks

Samira Abedini, Sina Mavali, Lea Schönherr, Martin Pawelczyk, Rebekka Burkholz

2603.15725 2026-03-18 cs.MA cs.ET cs.LG cs.RO

S2Act: Simple Spiking Actor

Ugur Akcal, Seung Hyun Kim, Mikihisa Yuasa, Hamid Osooli, Jiarui Sun, Ribhav Sahu, Mattia Gazzola, Huy T. Tran, Girish Chowdhary

Comments This work has been submitted to the IEEE for possible publication

2603.15717 2026-03-18 cs.AR cs.CV eess.IV

GLANCE: Gaze-Led Attention Network for Compressed Edge-inference

Neeraj Solanki, Hong Ding, Sepehr Tabrizchi, Ali Shafiee Sarvestani, Shaahin Angizi, David Z. Pan, Arman Roohi

2603.15714 2026-03-18 cs.CR cs.AI

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Mateusz Dziemian, Maxwell Lin, Xiaohan Fu, Micha Nowak, Nick Winter, Eliot Jones, Andy Zou, Lama Ahmad, Kamalika Chaudhuri, Sahana Chennabasappa, Xander Davies, Lauren Deason, Benjamin L. Edelman, Tanner Emek, Ivan Evtimov, Jim Gust, Maia Hamin, Kat He, Klaudia Krawiecka, Riccardo Patana, Neil Perry, Troy Peterson, Xiangyu Qi, Javier Rando, Zifan Wang, Zihan Wang, Spencer Whitman, Eric Winsor, Arman Zharmagambetov, Matt Fredrikson, Zico Kolter

Comments 38 pages, 16 figures. Newer version to cover Q1 competition results on latest models in progress. Code at https://github.com/grayswansecurity/ipi_arena_os and partial dataset at https://huggingface.co/datasets/sureheremarv/ipi_arena_attacks

Abstract

LLM-based agents are increasingly deployed in high-stakes settings where they process external data sources such as emails, documents, and code repositories. This creates exposure to indirect prompt injection attacks, where adversarial instructions embedded in external content manipulate agent behavior without user awareness. A critical but underexplored dimension of this threat is concealment: since users tend to observe only an agent's final response, an attack can conceal its existence by presenting no clue of compromise in the final user-facing response while successfully executing harmful actions. This leaves users unaware of the manipulation and likely to accept harmful outcomes as legitimate. We present findings from a large-scale public red-teaming competition evaluating this dual objective across three agent settings: tool calling, coding, and computer use. The competition attracted 464 participants who submitted 272,000 attack attempts against 13 frontier models, yielding 8,648 successful attacks across 41 scenarios. All models proved vulnerable, with attack success rates ranging from 0.5% (Claude Opus 4.5) to 8.5% (Gemini 2.5 Pro). We identify universal attack strategies that transfer across 21 of 41 behaviors and multiple model families, suggesting fundamental weaknesses in instruction-following architectures. Capability and robustness showed weak correlation, with Gemini 2.5 Pro exhibiting both high capability and high vulnerability. To address benchmark saturation and obsolescence, we will endeavor to deliver quarterly updates through continued red-teaming competitions. We open-source the competition environment for use in evaluations, along with 95 successful attacks against Qwen that did not transfer to any closed-source model. We share model-specific attack data with respective frontier labs and the full dataset with the UK AISI and US CAISI to support robustness research.

2603.15712 2026-03-18 cond-mat.mtrl-sci cs.AI

LLM-Driven Discovery of High-Entropy Catalysts via Retrieval-Augmented Generation

AI Scientists, Xinyi Lin, Danqing Yin, Ying Guo

2603.15707 2026-03-18 cs.SE cs.AI

SEMAG: Self-Evolutionary Multi-Agent Code Generation

Yulin Peng, Haowen Hou, Xinxin Zhu, Ying Tiffany He, F. Richard Yu

2603.15692 2026-03-18 cs.CR cs.AI

BadLLM-TG: A Backdoor Defender powered by LLM Trigger Generator

Ruyi Zhang, Heng Gao, Songlei Jian, Yusong Tan, Haifang Zhou

Comments 5 pages, 2 figures

2603.15690 2026-03-18 cs.SE cs.AI

Loosely-Structured Software: Engineering Context, Structure, and Evolution Entropy in Runtime-Rewired Multi-Agent Systems

Weihao Zhang, Yitong Zhou, Huanyu Qu, Hongyi Li

2603.15686 2026-03-18 physics.chem-ph cs.CE cs.LG

Life cycle assessment for all organic chemicals

Shaohan Chen, Tim Langhorst, Julian Nöhl, Christopher Oberschelp, Martin Pillich, Johannes Schilling, André Bardow

Comments 24 pages, 9 figures