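arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.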

2604.10135 2026-04-16 cs.CL cs.AI

Think in Sentences: Explicit Sentence Boundaries Enhance Language Model's Capabilities

Zhichen Liu, Yongyuan Li, Yang Xu

Comments Accepted to ACL 2026 main conference

English abstract

Researchers have explored different ways to improve the capabilities of large language models (LLMs) by inserting dummy tokens into contexts. However, existing works focus solely on the dummy tokens themselves and fail to leverage the inherent sentence-level structure of natural language. This is a critical oversight, as LLMs acquire linguistic capabilities through exposure to human-generated texts, which are inherently structured at the sentence level. Motivated by this gap, we propose an approach that inserts delimiters at sentence boundaries in LLM inputs, which not only integrates dummy tokens into the context but also encourages sentence-by-sentence processing during reasoning. We evaluate two concrete methods, (1) in-context learning and (2) supervised fine-tuning, on models ranging from 7B parameters to the 600B DeepSeek-V3. Our results demonstrate consistent improvements across various tasks, with notable gains of up to 7.7% on GSM8k and 12.5% on DROP. Furthermore, the fine-tuned LLMs acquire sentence awareness, as evidenced by their internal representations. Our work establishes a simple yet effective technique for enhancing LLM capabilities and offers a promising direction for cognitively inspired LLM enhancement.
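
The core preprocessing step described in the abstract, inserting a delimiter at each sentence boundary of the model input, might look roughly like the sketch below. The delimiter string `<sent>`, the function name, and the regex-based boundary rule are all assumptions made for illustration; the abstract does not say which delimiter or sentence splitter the authors use.

```python
import re

# Hypothetical delimiter: the abstract does not name the token the authors insert.
DELIMITER = " <sent> "

# Naive boundary rule (an assumption): '.', '!' or '?' followed by whitespace.
# Abbreviations such as "e.g." would be split incorrectly by this rule.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def insert_sentence_delimiters(text: str, delimiter: str = DELIMITER) -> str:
    """Return `text` with `delimiter` placed between consecutive sentences."""
    sentences = [s for s in _BOUNDARY.split(text.strip()) if s]
    return delimiter.join(sentences)


if __name__ == "__main__":
    prompt = "Tom has 3 apples. He buys 2 more. How many apples does he have now?"
    print(insert_sentence_delimiters(prompt))
```

The delimited text could then be used as the prompt for in-context learning or as training data for supervised fine-tuning; a production pipeline would likely replace the regex with a proper sentence segmenter.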

2604.10015 2026-04-16 cs.AI cs.CE cs.CL cs.MM

FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks

Yupeng Cao, Haohang Li, Weijin Liu, Wenbo Cao, Anke Xu, Lingfei Qian, Xueqing Peng, Minxue Tang, Zhiyuan Yao, Jimin Huang, K. P. Subbalakshmi, Zining Zhu, Jordan W. Suchow, Yangyang Yu

2604.08261 2026-04-16 cs.CV cs.AI

DBMF: A Dual-Branch Multimodal Framework for Out-of-Distribution Detection

Jiangbei Yue, Darren Treanor, Venkataraman Subramanian, Sharib Ali

2604.08064 2026-04-16 cs.AI

ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models

Chonghan Qin, Xiachong Feng, Weitao Ma, Xiaocheng Feng, Lingpeng Kong

Comments Accepted to ACL 2026 Main Conference

2604.08046 2026-04-16 cs.CL

Guaranteeing Knowledge Integration with Joint Decoding for Retrieval-Augmented Generation

Zhengyi Zhao, Shubo Zhang, Zezhong Wang, Yuxi Zhang, Huimin Wang, Yutian Zhao, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu

2604.07165 2026-04-16 cs.AI cs.LG

Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization

Yu Li, Sizhe Tang, Tian Lan

2604.02486 2026-04-16 cs.CV cs.CL

VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors

Haz Sameen Shahgir, Xiaofu Chen, Yu Fu, Erfan Shayegani, Nael Abu-Ghazaleh, Yova Kementchedjhieva, Yue Dong

2604.01034 2026-04-16 cs.RO math.OC

Stein Variational Uncertainty-Adaptive Model Predictive Control

Hrishikesh Sathyanarayan, Ian Abraham

2603.29254 2026-04-16 cs.RO

SuperGrasp: Single-View Object Grasping via Superquadric Similarity Matching, Evaluation, and Refinement

Lijingze Xiao, Jinhong Du, Supeng Diao, Yu Ren, Yang Cong

Comments Minor revisions to the manuscript content, author order, and experimental results

2603.28942 2026-04-16 cs.LG cs.CR

ReproMIA: A Comprehensive Analysis of Model Reprogramming for Proactive Membership Inference Attacks

Chihan Huang, Huaijin Wang, Shuai Wang

Comments This version was posted without enough prior discussion with my collaborator. Thus, it is being withdrawn pending further internal review. The authors do not wish this version to be considered part of the active scientific record in its current form

2603.25924 2026-04-16 cs.CV cs.AI cs.IR

Good Scores, Bad Data: A Metric for Multimodal Coherence

Vasundra Srinivasan

Comments 9 pages, 6 figures, NeurIPS 2024 format

2603.15620 2026-04-16 cs.CV cs.RO

Towards Generalizable Robotic Manipulation in Dynamic Environments

Heng Fang, Shangru Li, Shuhan Wang, Xuanyang Xi, Dingkang Liang, Xiang Bai

Comments Project Page: https://h-embodvis.github.io/DOMINO/

2603.12021 2026-04-16 cs.CL cs.AI

Just Use XML: Revisiting Joint Translation and Label Projection

Thennal DK, Chris Biemann, Hans Ole Hatzel

Comments Accepted to ACL 2026 Findings

2603.06885 2026-04-16 cs.CV

OPTED: Open Preprocessed Trachoma Eye Dataset Using Zero-Shot SAM 3 Segmentation

Kibrom Gebremedhin, Hadush Hailu, Bruk Gebregziabher

Comments 9 figures, 3 tables

2603.01410 2026-04-16 cs.AI

GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning

Yuchen Ying, Weiqi Jiang, Tongya Zheng, Yu Wang, Shunyu Liu, Kaixuan Chen, Mingli Song

2603.00192 2026-04-16 cs.LG stat.AP stat.ML

Diagnostics for Individual-Level Prediction Instability in Machine Learning for Healthcare

Elizabeth W. Miller, Jeffrey D. Blume

English abstract

In healthcare, predictive models increasingly inform patient-level decisions, yet little attention is paid to the variability in individual risk estimates and its impact on treatment decisions. For overparameterized models, now standard in machine learning, a substantial source of variability often goes undetected. Even when the data and model architecture are held fixed, randomness introduced by optimization and initialization can lead to materially different risk estimates for the same patient. This problem is largely obscured by standard evaluation practices, which rely on aggregate performance metrics (e.g., log-loss, accuracy) that are agnostic to individual-level stability. As a result, models with indistinguishable aggregate performance can nonetheless exhibit substantial procedural arbitrariness, which can undermine clinical trust. We propose an evaluation framework that quantifies individual-level prediction instability using two complementary diagnostics: empirical prediction interval width (ePIW), which captures variability in continuous risk estimates, and empirical decision flip rate (eDFR), which measures instability in threshold-based clinical decisions. We apply these diagnostics to simulated data and the GUSTO-I clinical dataset. Across observed settings, we find that for flexible machine-learning models, randomness arising solely from optimization and initialization can induce individual-level variability comparable to that produced by resampling the entire training dataset. Neural networks exhibit substantially greater instability in individual risk predictions compared to logistic regression models. Risk estimate instability near clinically relevant decision thresholds can alter treatment recommendations. These findings suggest that stability diagnostics should be incorporated into routine model validation for assessing clinical reliability.
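
To make the two diagnostics concrete, the sketch below computes them for a single patient from risk estimates produced by repeated training runs (for example, different random seeds). The abstract names ePIW and eDFR but does not give formulas, so the definitions here are assumptions: ePIW is taken as the width of a central percentile interval over the runs, and eDFR as the share of runs whose thresholded decision disagrees with the majority decision. The function names and the 95% default are likewise illustrative.

```python
from typing import List


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile of `values`, with q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def epiw(risks: List[float], coverage: float = 95.0) -> float:
    """Assumed ePIW: width of the central `coverage`% interval of one
    patient's risk estimates across training runs."""
    tail = (100.0 - coverage) / 2.0
    return percentile(risks, 100.0 - tail) - percentile(risks, tail)


def edfr(risks: List[float], threshold: float) -> float:
    """Assumed eDFR: fraction of runs whose treat / do-not-treat decision
    (risk >= threshold) differs from the majority decision."""
    decisions = [r >= threshold for r in risks]
    treat = sum(decisions)
    minority = min(treat, len(decisions) - treat)
    return minority / len(decisions)


if __name__ == "__main__":
    # Hypothetical risk estimates for one patient from five training runs.
    runs = [0.10, 0.20, 0.30, 0.40, 0.50]
    print("ePIW:", epiw(runs))
    print("eDFR at threshold 0.35:", edfr(runs, threshold=0.35))
```

Under these assumed definitions, a patient whose estimates straddle the decision threshold gets a nonzero flip rate even when aggregate metrics are identical across runs, which is the kind of instability the abstract describes.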

2602.23636 2026-04-16 cs.LG cs.AI

FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation

Zhihao Ding, Jinming Li, Ze Lu, Jieming Shi

Comments Accepted at ACL 2026

2602.20913 2026-04-16 cs.CV

LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding

Jihao Qiu, Lingxi Xie, Xinyue Huo, Qi Tian, Qixiang Ye

Comments 17 pages, 9 figures, 8 tables, accepted to CVPR 2026

2602.19315 2026-04-16 cs.RO cs.AI

Online Navigation Planning for Long-term Autonomous Operation of Underwater Gliders

Victor-Alexandru Darvariu, Charlotte Z. Reed, Jan Stratmann, Bruno Lacerda, Benjamin Allsup, Stephen Woodward, Elizabeth Siddle, Trishna Saeharaseelan, Owain Jones, Dan Jones, Tobias Ferreira, Chloe Baker, Kevin Chaplin, James Kirk, Ashley Iceton-Morris, Ryan D. Patmore, Jeff Polton, Charlotte Williams, Christopher D. J. Auckland, Rob A. Hall, Alexandra Kokkinaki, Alvaro Lorenzo Lopez, Justin J. H. Buck, Nick Hawes

2602.05574 2026-04-16 cs.CV

A Hybrid CNN and ML Framework for Multi-modal Classification of Movement Disorders Using MRI and Brain Structural Features

Mengyu Li, Ingibjörg Kristjánsdóttir, Thilo van Eimeren, Kathrin Giehl, Lotta M. Ellingsen, the ASAP Neuroimaging Initiative

Comments To be published in Proceedings of SPIE Medical Imaging 2026

2601.20352 2026-04-16 cs.AI

AMA: Adaptive Memory via Multi-Agent Collaboration

Weiquan Huang, Zixuan Wang, Hehai Lin, Sudong Wang, Bo Xu, Qian Li, Beier Zhu, Linyi Yang, Chengwei Qin

Comments 8 pages

2601.13904 2026-04-16 cs.AI cs.HC

PREFAB: PREFerence-based Affective Modeling for Low-Budget Self-Annotation

Jaeyoung Moon, Youjin Choi, Yucheon Park, David Melhart, Georgios N. Yannakakis, Kyung-Joong Kim

Comments CHI '26 Accepted paper

2601.08605 2026-04-16 cs.CL cs.AI

ExpSeek: Self-Triggered Experience Seeking for Web Agents

Wenyuan Zhang, Xinghua Zhang, Haiyang Yu, Shuaiyi Nie, Bingli Wu, Juwei Yue, Tingwen Liu, Yongbin Li

Comments ACL 2026 Findings, the code is accessible at https://github.com/WYRipple/ExpSeek

2601.05991 2026-04-16 cs.AI

3D Instruction Ambiguity Detection

Jiayu Ding, Haoran Tang, Hongbo Jin, Wei Gao, Ge Li

2601.03027 2026-04-16 cs.CL

Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning

Sindhuja Chaduvula, Ahmed Y. Radwan, Azib Farooq, Yani Ioannou, Shaina Raza

2512.17654 2026-04-16 cs.LG physics.comp-ph physics.med-ph

Learning-Based Estimation of Spatially Resolved Scatter Radiation Fields in Interventional Radiology

Felix Lehner, Pasquale Lombardo, Susana Castillo, Oliver Hupe, Marcus Magnor

2512.17326 2026-04-16 cs.CV

Democratising Pathology Co-Pilots: An Open Pipeline and Dataset for Whole-Slide Vision-Language Modelling

Sander Moonemans, Sebastiaan Ram, Frédérique Meeuwsen, Carlijn Lems, Jeroen van der Laak, Geert Litjens, Francesco Ciompi

Comments 12 pages, 4 figures

2512.04844 2026-04-16 cs.CL cs.AI

Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates

Atsuki Yamaguchi, Terufumi Morishita, Aline Villavicencio, Nikolaos Aletras

Comments Accepted to ACL 2026 Main Conference

2512.02697 2026-04-16 cs.CV

GeoBridge: A Semantic-Anchored Multi-View Foundation Model Bridging Images and Text for Geo-Localization

Zixuan Song, Jing Zhang, Di Wang, Zidie Zhou, Wenbin Liu, Haonan Guo, En Wang, Bo Du

Comments The paper is accepted by CVPR 2026! Code, dataset, and pretrained models will be released at https://github.com/MiliLab/GeoBridge

2512.01773 2026-04-16 cs.RO

IGen: Scalable Data Generation for Robot Learning from Open-World Images

Chenghao Gu, Haolan Kang, Junchao Lin, Jinghe Wang, Duo Wu, Shuzhao Xie, Fanding Huang, Junchen Ge, Ziyang Gong, Letian Li, Hongying Zheng, Changwei Lv, Zhi Wang

Comments 8 pages, 8 figures; Accepted to CVPR 2026