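arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.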

2510.25941 2026-03-16 cs.CL

RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline

André V. Duarte, Xuying Li, Bin Zeng, Arlindo L. Oliveira, Lei Li, Zhuo Li

English abstract

If we cannot inspect the training data of a large language model (LLM), how can we ever know what it has seen? We believe the most compelling evidence arises when the model itself freely reproduces the target content. As such, we propose RECAP, an agentic pipeline designed to elicit and verify memorized training data from LLM outputs. At the heart of RECAP is a feedback-driven loop, where an initial extraction attempt is evaluated by a secondary language model, which compares the output against a reference passage and identifies discrepancies. These are then translated into minimal correction hints, which are fed back into the target model to guide subsequent generations. In addition, to address alignment-induced refusals, RECAP includes a jailbreaking module that detects and overcomes such barriers. We evaluate RECAP on EchoTrace, a new benchmark spanning over 30 full books, and the results show that RECAP leads to substantial gains over single-iteration approaches. For instance, with GPT-4.1, the average ROUGE-L score for the copyrighted text extraction improved from 0.38 to 0.47 - a nearly 24% increase.
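The figures quoted in the abstract can be checked with a short sketch. The code below is only an illustration, not the authors' evaluation code: it assumes ROUGE-L is computed as an F1 score over the longest common subsequence of whitespace-separated tokens (the paper may use a different tokenizer or weighting), and the helper names `lcs_length`, `rouge_l_f1`, and `relative_gain` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as an LCS-based F1 score (assumed variant, whitespace tokens)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def relative_gain(before, after):
    """Relative improvement of `after` over `before`."""
    return (after - before) / before


if __name__ == "__main__":
    # Scores reported in the abstract for GPT-4.1.
    print(f"relative gain: {relative_gain(0.38, 0.47):.1%}")
    # Toy sentences, invented purely for illustration.
    print(rouge_l_f1("the cat sat on the mat", "the cat lay on the mat"))
```

Under these assumptions, moving from 0.38 to 0.47 is an absolute gain of 0.09 and a relative gain of 0.09 / 0.38, which is consistent with the "nearly 24%" stated in the abstract.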

2510.20591 2026-03-16 cs.AI

Transferable Graph Learning for Transmission Congestion Management via Busbar Splitting

Ali Rajaei, Peter Palensky, Jochen L. Cremer

2510.19422 2026-03-16 cs.LG cs.CL

LLM Unlearning with LLM Beliefs

Kemou Li, Qizhou Wang, Yue Wang, Fengpeng Li, Jun Liu, Bo Han, Jiantao Zhou

Comments ICLR 2026

2510.17914 2026-03-16 cs.LG cs.AI cs.CV

NeuCo-Bench: A Novel Benchmark Framework for Neural Embeddings in Earth Observation

Rikard Vinge, Isabelle Wittmann, Jannik Schneider, Michael Marszalek, Luis Gilch, Thomas Brunschwiler, Conrad M Albrecht

2510.15511 2026-03-16 cs.LG cs.AI

Language Models are Injective and Hence Invertible

Giorgos Nikolaou, Tommaso Mencattini, Donato Crisostomi, Andrea Santilli, Yannis Panagakis, Emanuele Rodolà

2510.00705 2026-03-16 cs.CV

Training-free Uncertainty Guidance for Complex Visual Tasks with MLLMs

Sanghwan Kim, Rui Xiao, Stephan Alaniz, Yongqin Xian, Zeynep Akata

2509.26495 2026-03-16 cs.AI

OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!

Jingdi Lei, Varun Gumma, Rishabh Bhardwaj, Seok Min Lim, Chuan Li, Amir Zadeh, Soujanya Poria

2509.22598 2026-03-16 cs.CL cs.FL cs.LG

From Formal Language Theory to Statistical Learning: Finite Observability of Subregular Languages

Katsuhiko Hayashi, Hidetaka Kamigaito

Comments 11 pages, 5 figures

2509.13577 2026-03-16 cs.CV cs.LG cs.RO

Dynamic Aware: Adaptive Multi-Mode Out-of-Distribution Detection for Trajectory Prediction in Autonomous Vehicles

Tongfei Guo, Lili Su

Comments 8 pages, 7 figures

2509.09677 2026-03-16 cs.AI

The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

Akshit Sinha, Arvindh Arun, Shashwat Goel, Steffen Staab, Jonas Geiping

Comments Published at ICLR 2026

2509.08093 2026-03-16 cs.CL

Evolution and compression in LLMs: On the emergence of human-aligned categorization

Nathaniel Imel, Noga Zaslavsky

Comments Published as a conference paper at ICLR 2026 (The Fourteenth International Conference on Learning Representations). OpenReview: https://openreview.net/forum?id=s7gSTR2AqA&noteId=s7gSTR2AqA

2508.11360 2026-03-16 cs.AI cs.HC

CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks

Songqin Nong, Xiaoxuan Tang, Jingxuan Xu, Sheng Zhou, Jianfeng Chen, Tao Jiang, Wenhao Xu

2508.09325 2026-03-16 cs.CV cs.AI cs.LG cs.RO

SegDAC: Visual Generalization in Reinforcement Learning via Dynamic Object Tokens

Alexandre Brown, Glen Berseth

Comments 12 pages

2508.05880 2026-03-16 cs.CL cs.AI

Large language models show fragile cognitive reasoning about human emotions

Sree Bhattacharyya, Evgenii Kuriabov, Lucas Craig, Tharun Dilliraj, Reginald B. Adams, Jia Li, James Z. Wang

Comments Under Review, a version was presented at WiML Workshop @ NeurIPS 2025

2508.00304 2026-03-16 cs.LG

Invariant Graph Transformer for Out-of-Distribution Generalization

Tianyin Liao, Ziwei Zhang, Yufei Sun, Chunyu Hu, Jianxin Li

Comments Accepted by ACM SIGKDD 2026

2507.20858 2026-03-16 cs.CL

A survey of diversity quantification in natural language processing: The why, what, where and how

Louis Estève, Marie-Catherine de Marneffe, Nurit Melnik, Agata Savary, Olha Kanishcheva

2506.21486 2026-03-16 cs.CV cs.LG math.PR

Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection

Tobias J. Riedlinger, Kira Maag, Hanno Gottschalk

Comments 20 pages, 7 figures, 7 tables

2506.17564 2026-03-16 cs.LG cs.AI cs.RO

Accelerating Residual Reinforcement Learning with Uncertainty Estimation

Lakshita Dodeja, Karl Schmeckpeper, Shivam Vats, Thomas Weng, Mingxi Jia, George Konidaris, Stefanie Tellex

2506.17188 2026-03-16 cs.CL cs.AI cs.IR

Towards AI Search Paradigm

Yuchen Li, Hengyi Cai, Rui Kong, Xinran Chen, Jiamin Chen, Jun Yang, Haojie Zhang, Jiayi Li, Jiayi Wu, Yiqun Chen, Changle Qu, Wenwen Ye, Lixin Su, Xinyu Ma, Lingyong Yan, Long Xia, Daiting Shi, Junfeng Wang, Xiangyu Zhao, Jiashu Zhao, Haoyi Xiong, Shuaiqiang Wang, Dawei Yin

2506.07597 2026-03-16 cs.CL

Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque

Oscar Sainz, Naiara Perez, Julen Etxaniz, Joseba Fernandez de Landa, Itziar Aldabe, Iker García-Ferrero, Aimar Zabala, Ekhi Azurmendi, German Rigau, Eneko Agirre, Mikel Artetxe, Aitor Soroa

Comments Accepted at EMNLP 2025 Main Conference

2506.00819 2026-03-16 cs.RO cs.AI

DriveMind: A Dual Visual Language Model-based Reinforcement Learning Framework for Autonomous Driving

Dawood Wasif, Terrence J. Moore, Chandan K. Reddy, Frederica Free-Nelson, Seunghyun Yoon, Hyuk Lim, Dan Dongseong Kim, Jin-Hee Cho

Comments Submitted to IEEE Transactions on Intelligent Vehicles (T-IV)

2505.23120 2026-03-16 cs.CV

MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation

Siyuan Wang, Jiawei Liu, Wei Wang, Yeying Jin, Jinsong Du, Zhi Han

Comments Accepted by IEEE TCSVT

2505.13702 2026-03-16 cs.LG physics.ins-det

Unsupervised anomaly detection in MeV ultrafast electron diffraction

Mariana A. Fazio, Manel Martinez-Ramon, Salvador Sosa Güitron, Marcus Babzien, Mikhail Fedurin, Junjie Li, Mark Palmer, Sandra S. Biedron

2505.12050 2026-03-16 cs.CL cs.AI cs.LG

AdaBoN: Adaptive Best-of-N Alignment

Vinod Raman, Hilal Asi, Satyen Kale

Comments 25 pages

2504.16788 2026-03-16 cs.CV cs.AI

Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation

Lakshita Agarwal, Bindu Verma

2502.13900 2026-03-16 cs.LG

Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning

Antoine Moulin, Gergely Neu, Luca Viano

2501.15194 2026-03-16 cs.LG stat.CO stat.ML

Reliable Pseudo-labeling via Optimal Transport with Attention for Short Text Clustering

Zhihao Yao

Comments arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission

2501.03584 2026-03-16 cs.LG

Discriminative Representation learning via Attention-Enhanced Contrastive Learning for Short Text Clustering

Zhihao Yao

Comments arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submission

2412.16253 2026-03-16 cs.CV cs.GR

ExCellGen: Fast, Controllable, Photorealistic 3D Scene Generation from a Single Real-World Exemplar

Clément Jambon, Changwoon Choi, Dongsu Zhang, Olga Sorkine-Hornung, Young Min Kim

2411.10170 2026-03-16 cs.RO

Better Safe Than Sorry: Enhancing Arbitration Graphs for Safe and Robust Autonomous Decision-Making

Piotr Spieker, Nick Le Large, Martin Lauer

Comments 7 pages, 5 figures, Presented at 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC), source code available at github.com/KIT-MRT/arbitration_graphs, v2: Added paragraph discussing the differences between arbitration graphs and behavior trees, v3: Updated version as presented at SMC