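arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.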

2604.06170 2026-04-08 cs.CL

Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

Komal Kumar, Aman Chadha, Salman Khan, Fahad Shahbaz Khan, Hisham Cholakkal

Comments 19 pages, 7 figures, 8 tables, ACL main (Oral)

Abstract

The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in multi-agent large language models (LLMs) have demonstrated strong potential for understanding user intent and are being trained to utilize various tools. In this paper, we introduce Paper Circle, a multi-agent research discovery and analysis system designed to reduce the effort required to find, assess, organize, and understand academic literature. The system comprises two complementary pipelines: (1) a Discovery Pipeline that integrates offline and online retrieval from multiple sources, multi-criteria scoring, diversity-aware ranking, and structured outputs; and (2) an Analysis Pipeline that transforms individual papers into structured knowledge graphs with typed nodes such as concepts, methods, experiments, and figures, enabling graph-aware question answering and coverage verification. Both pipelines are implemented within a coder LLM-based multi-agent orchestration framework and produce fully reproducible, synchronized outputs including JSON, CSV, BibTeX, Markdown, and HTML at each agent step. This paper describes the system architecture, agent roles, retrieval and scoring methods, knowledge graph schema, and evaluation interfaces that together form the Paper Circle research workflow. We benchmark Paper Circle on both paper retrieval and paper review generation, reporting hit rate, MRR, and Recall at K. Results show consistent improvements with stronger agent models. We have publicly released the website at https://papercircle.vercel.app/ and the code at https://github.com/MAXNORM8650/papercircle.
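
The abstract names hit rate, MRR, and Recall at K but does not define them. The sketch below uses the standard textbook definitions of these retrieval metrics; it is not taken from the Paper Circle code, and the function names, paper IDs, and toy queries are all hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return 1.0 if any relevant item appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return the fraction of relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def evaluate(queries: List[Tuple[List[str], Set[str]]], k: int) -> Dict[str, float]:
    """Average the three metrics over (ranked results, relevant set) pairs."""
    n = len(queries)
    return {
        "hit_rate": sum(hit_at_k(r, rel, k) for r, rel in queries) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in queries) / n,
        "recall": sum(recall_at_k(r, rel, k) for r, rel in queries) / n,
    }


# Hypothetical toy data: three queries with made-up paper IDs.
queries = [
    (["p3", "p1", "p7"], {"p1", "p9"}),  # first relevant hit at rank 2
    (["p4", "p5", "p6"], {"p6"}),        # first relevant hit at rank 3
    (["p2", "p8", "p0"], {"p5"}),        # no relevant item retrieved
]
metrics = evaluate(queries, k=2)
print(metrics)
```

Note that MRR here is computed over the full ranked list while hit rate and recall are cut off at K; some evaluations also truncate MRR at K, and the abstract does not say which convention the paper uses.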

2604.06169 2026-04-08 cs.LG cs.AI cs.CL stat.ML

In-Place Test-Time Training

Guhao Feng, Shengjie Luo, Kai Hua, Ge Zhang, Di He, Wenhao Huang, Tianle Cai

Comments ICLR 2026 Oral Presentation; Code is released at https://github.com/ByteDance-Seed/In-Place-TTT

2604.06167 2026-04-08 cs.LG math.AT

Topological Characterization of Churn Flow and Unsupervised Correction to the Wu Flow-Regime Map in Small-Diameter Vertical Pipes

Brady Koenig, Sushovan Majhi, Atish Mitra, Abigail Stein, Burt Todd

2604.06160 2026-04-08 cs.CV cs.LG

The Character Error Vector: Decomposable errors for page-level OCR evaluation

Jonathan Bourne, Mwiza Simbeye, Joseph Nockels

Comments 6643 words, 5 figures, 15 tables

2604.06159 2026-04-08 cs.LG

Target Policy Optimization

Jean Kaddour

2604.06156 2026-04-08 cs.CV cs.AI cs.CL

MMEmb-R1: Reasoning-Enhanced Multimodal Embedding with Pair-Aware Selection and Adaptive Control

Yuchi Wang, Haiyang Yu, Weikang Bian, Jiefeng Long, Xiao Liang, Chao Feng, Hongsheng Li

2604.06154 2026-04-08 cs.CL

Exclusive Unlearning

Mutsumi Sasaki, Kouta Nakayama, Yusuke Miyao, Yohei Oseki, Masaru Isonuma

2604.06150 2026-04-08 cs.RO

Delta6: A Low-Cost, 6-DOF Force-Sensing Flexible End-Effector

Yue Feng, Weicheng Huang, Chen Qiu, Huixu Dong, I-Ming Chen

Comments This work has been submitted to the IEEE for possible publication

2604.06138 2026-04-08 cs.SD cs.AI

Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization

Yanis Labrak, David Grünert, Séverin Baroudi, Jiyun Chun, Pawel Cyrta, Sergio Burdisso, Ahmed Hassoon, David Liu, Adam Rothschild, Reed Van Deusen, Petr Motlicek, Andrew Perrault, Ricard Marxer, Thomas Schaaf

Comments Submitted for review at Interspeech 2026

2604.06133 2026-04-08 cs.RO

Learning-Guided Force-Feedback Model Predictive Control with Obstacle Avoidance for Robotic Deburring

Krzysztof Wojciechowski, Ege Gursoy, Arthur Haffemayer, Sebastien Kleff, Vincent Bonnet, Florent Lamiraux, Nicolas Mansard

Comments Accepted to ICRA 2026

2604.06129 2026-04-08 cs.CV cs.AI

PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer

David Picard, Nicolas Dufour, Lucas Degeorge, Arijit Ghosh, Davide Allegro, Tom Ravaud, Yohann Perron, Corentin Sautier, Zeynep Sonat Baltaci, Fei Meng, Syrine Kalleli, Marta López-Rauhut, Thibaut Loiseau, Ségolène Albouy, Raphael Baena, Elliot Vincent, Loic Landrieu

Comments Accepted to CVPR Findings 2026

2604.06126 2026-04-08 cs.LG cs.AI

Gym-Anything: Turn any Software into an Agent Environment

Pranjal Aggarwal, Graham Neubig, Sean Welleck

Abstract

Computer-use agents hold the promise of assisting in a wide range of digital economic activities. However, current research has largely focused on short-horizon tasks over a limited set of software with limited economic value, such as basic e-commerce and OS-configuration tasks. A key reason is that creating environments for complex software requires significant time and human effort, and therefore does not scale. To address this, we introduce Gym-Anything, a framework for converting any software into an interactive computer-use environment. We frame environment creation itself as a multi-agent task: a coding agent writes setup scripts, downloads real-world data, and configures the software, while producing evidence of correct setup. An independent audit agent then verifies evidence for the environment setup against a quality checklist. Using a taxonomy of economically valuable occupations grounded in U.S. GDP data, we apply this pipeline to 200 software applications with broad occupational coverage. The result is CUA-World, a collection of over 10K long-horizon tasks spanning domains from medical science and astronomy to engineering and enterprise systems, each configured with realistic data along with train and test splits. CUA-World also includes CUA-World-Long, a challenging long-horizon benchmark with tasks often requiring over 500 steps, far exceeding existing benchmarks. Distilling successful trajectories from the training split into a 2B vision-language model outperforms models 2$\times$ its size. We also apply the same auditing principle at test time: a separate VLM reviews completed trajectories and provides feedback on what remains, improving Gemini-3-Flash on CUA-World-Long from 11.5% to 14.0%. We release all code, infrastructure, and benchmark data to facilitate future research in realistic computer-use agents.

2604.06124 2026-04-08 cs.CV cs.AI

Lightweight Multimodal Adaptation of Vision Language Models for Species Recognition and Habitat Context Interpretation in Drone Thermal Imagery

Hao Chen, Fang Qiu, Fangchao Dong, Defei Yang, Eve Bohnett, Li An

2604.06113 2026-04-08 cs.CV

SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation

Hiba Dahmani, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Laurent Caraffa, Jean-Philippe Tarel, Roland Brémond

2604.06109 2026-04-08 cs.LG cs.DS

Learning $\mathsf{AC}^0$ Under Graphical Models

Gautam Chandrasekaran, Jason Gaitonde, Ankur Moitra, Arsen Vasilyan

Comments 57 pages

2604.06107 2026-04-08 cs.AI math.HO math.LO

Artificial Intelligence and the Structure of Mathematics

Maissam Barkeshli, Michael R. Douglas, Michael H. Freedman

Comments 45 pages

2604.06099 2026-04-08 cs.CV

Extending ZACH-ViT to Robust Medical Imaging: Corruption and Adversarial Stress Testing in Low-Data Regimes

Athanasios Angelakis, Marta Gomez-Barrero

Comments Accepted at CVPR 2026 Workshop (PHAROS-AIF-MIH)

2604.06086 2026-04-08 cs.CL cs.AI

LAG-XAI: A Lie-Inspired Affine Geometric Framework for Interpretable Paraphrasing in Transformer Latent Spaces

Olexander Mazurets, Olexander Barmak, Leonid Bedratyuk, Iurii Krak

Abstract

Modern Transformer-based language models achieve strong performance in natural language processing tasks, yet their latent semantic spaces remain largely uninterpretable black boxes. This paper introduces LAG-XAI (Lie Affine Geometry for Explainable AI), a novel geometric framework that models paraphrasing not as discrete word substitutions, but as a structured affine transformation within the embedding space. By conceptualizing paraphrasing as a continuous geometric flow on a semantic manifold, we propose a computationally efficient mean-field approximation, inspired by local Lie group actions. This allows us to decompose paraphrase transitions into geometrically interpretable components: rotation, deformation, and translation. Experiments on the noisy PIT-2015 Twitter corpus, encoded with Sentence-BERT, reveal a "linear transparency" phenomenon. The proposed affine operator achieves an AUC of 0.7713. By normalizing against random chance (AUC 0.5), the model captures approximately 80% of the non-linear baseline's effective classification capacity (AUC 0.8405), offering explicit parametric interpretability in exchange for a marginal drop in absolute accuracy. The model identifies fundamental geometric invariants, including a stable matrix reconfiguration angle (~27.84°) and near-zero deformation, indicating local isometry. Cross-domain generalization is confirmed via direct cross-corpus validation on an independent TURL dataset. Furthermore, the practical utility of LAG-XAI is demonstrated in LLM hallucination detection: using a "cheap geometric check," the model automatically detected 95.3% of factual distortions on the HaluEval dataset by registering deviations beyond the permissible semantic corridor. This approach provides a mathematically grounded, resource-efficient path toward the mechanistic interpretability of Transformers.
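
The "approximately 80%" figure in the abstract can be reproduced by subtracting the chance level from both AUC values before taking their ratio. The short sketch below checks that arithmetic using only the numbers quoted in the abstract; the function name is hypothetical and this is one plausible reading of "normalizing against random chance", not the authors' code.

```python
def chance_normalized_share(auc_model: float, auc_baseline: float,
                            chance: float = 0.5) -> float:
    """Share of the baseline's above-chance AUC that the model retains."""
    if auc_baseline <= chance:
        raise ValueError("baseline AUC must exceed the chance level")
    return (auc_model - chance) / (auc_baseline - chance)


# Values quoted in the abstract: affine operator 0.7713, non-linear baseline 0.8405.
share = chance_normalized_share(0.7713, 0.8405)
print(f"{share:.1%}")  # (0.7713 - 0.5) / (0.8405 - 0.5) = 0.2713 / 0.3405
```

Without the normalization the raw ratio 0.7713 / 0.8405 would be about 92%, so the chance-corrected figure is the more conservative of the two.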

2604.06081 2026-04-08 cs.LG cs.CE math.DS

A machine learning framework for uncovering stochastic nonlinear dynamics from noisy data

Matteo Bosso, Giovanni Franzese, Kushal Swamy, Maarten Theulings, Alejandro M. Aragón, Farbod Alijani

Comments 25 pages, 12 figures, 4 tables

2604.06079 2026-04-08 cs.CV cs.AI

Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning

Juekai Lin, Yun Zhu, Honglin Lin, Sijing Li, Tianwei Lin, Zheng Liu, Xiaoyang Wang, Wenqiao Zhang, Lijun Wu

2604.06074 2026-04-08 cs.CV cs.AI cs.MM

Graph-PiT: Enhancing Structural Coherence in Part-Based Image Synthesis via Graph Priors

Junbin Zhang, Meng Cao, Feng Tan, Yikai Lin, Yuexian Zou

Comments 11 pages, 5 figures, Accepted by ICME 2026

2604.06073 2026-04-08 cs.RO cs.HC

Intuitive Human-Robot Interaction: Development and Evaluation of a Gesture-Based User Interface for Object Selection

Bijan Kavousian, Oliver Petrovic, Werner Herfs

Comments This submission contains both an English translation and the original German version. The German version was originally published in the Proceedings of the 72nd GfA Conference (2026)

2604.06071 2026-04-08 cs.CL cs.AI cs.HC

Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles

Ben Wigler, Maria Tsfasman, Tiffany Matej Hrkalovic

Comments Under review at COLM

Abstract

Personality traits are richly encoded in natural language, and large language models (LLMs) trained on human text can simulate personality when conditioned on persona descriptions. However, existing evaluations rely predominantly on questionnaire self-report by the conditioned model, are limited in architectural diversity, and rarely use real human psychometric data. Without addressing these limitations, it remains unclear whether personality conditioning produces psychometrically informative representations of individual differences or merely superficial alignment with trait descriptors. To test how robustly LLMs can encode personality into extended text, we condition LLMs on real psychometric profiles from 290 participants to generate first-person life story narratives, and then task independent LLMs to recover personality scores from those narratives alone. We show that personality scores can be recovered from the generated narratives at levels approaching human test-retest reliability (mean r = 0.750, 85% of the human ceiling), and that recovery is robust across 10 LLM narrative generators and 3 LLM personality scorers spanning 6 providers. Decomposing systematic biases reveals that scoring models achieve their accuracy while counteracting alignment-induced defaults. Content analysis of the generated narratives shows that personality conditioning produces behaviourally differentiated text: nine of ten coded features correlate significantly with the same features in participants' real conversations, and personality-driven emotional reactivity patterns in narratives replicate in real conversational data. These findings provide evidence that the personality-language relationship captured during pretraining supports robust encoding and decoding of individual differences, including characteristic emotional variability patterns that replicate in real human behaviour.
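
The round-trip evaluation described above ends in a correlation between the original psychometric scores and the scores recovered from the generated narratives. The sketch below shows that final step only, assuming a per-trait Pearson correlation averaged across traits; the abstract does not specify the exact aggregation, and the trait names and scores here are hypothetical toy values, not data from the paper.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mean_recovery_r(true_scores: Dict[str, Sequence[float]],
                    recovered: Dict[str, Sequence[float]]) -> float:
    """Average the per-trait correlation between true and recovered scores."""
    per_trait = [pearson_r(true_scores[t], recovered[t]) for t in true_scores]
    return sum(per_trait) / len(per_trait)


# Hypothetical toy data: two traits, five participants each.
true_scores = {
    "trait_a": [2.1, 3.4, 4.0, 1.8, 3.0],
    "trait_b": [3.9, 2.2, 2.8, 4.5, 3.1],
}
recovered = {
    "trait_a": [2.5, 3.1, 4.2, 2.0, 2.7],
    "trait_b": [3.5, 2.6, 2.4, 4.1, 3.3],
}
mean_r = mean_recovery_r(true_scores, recovered)
print(round(mean_r, 3))
```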

2604.06070 2026-04-08 cs.CL cs.LG

Short Data, Long Context: Distilling Positional Knowledge in Transformers

Patrick Huber, Ernie Chang, Chinnadhurai Sankar, Rylan Conway, Igor Fedorov, Md Rifat Arefin, Adithya Sagar

2604.06067 2026-04-08 cs.RO

HiPolicy: Hierarchical Multi-Frequency Action Chunking for Policy Learning

Jiyao Zhang, Zimu Han, Junhan Wang, Xionghao Wu, Shihong Lin, Jinzhou Li, Hongwei Fan, Ruihai Wu, Dongjiang Li, Hao Dong

2604.06066 2026-04-08 cs.CL

From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection

Hongxu Zhou

2604.06028 2026-04-08 cs.CL cs.AI cs.IR

A Multi-Stage Validation Framework for Trustworthy Large-scale Clinical Information Extraction using Large Language Models

Maria Mahbub, Gregory M. Dams, Josh Arnold, Caitlin Rizy, Sudarshan Srinivasan, Elliot M. Fielstein, Minu A. Aghevli, Kamonica L. Craig, Elizabeth M. Oliva, Joseph Erdos, Jodie Trafton, Ioana Danciu

2604.06025 2026-04-08 cs.RO

A Co-Design Framework for High-Performance Jumping of a Five-Bar Monoped with Actuator Optimization

Aastha Mishra, Aman Singh, Shishir Kolathaya

Comments 8 pages, 10 figures

2604.06022 2026-04-08 cs.CL

BiMind: A Dual-Head Reasoning Model with Attention-Geometry Adapter for Incorrect Information Detection

Zhongxing Zhang, Emily K. Vraga, Jisu Huh, Jaideep Srivastava

2604.06017 2026-04-08 cs.CV

Toward Aristotelian Medical Representations: Backpropagation-Free Layer-wise Analysis for Interpretable Generalized Metric Learning on MedMNIST

Michael Karnes, Alper Yilmaz