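arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.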

2604.06834 2026-04-09 cs.CL cs.AI

On the Step Length Confounding in LLM Reasoning Data Selection

Bing Wang, Rui Miao, Chen Shen, Shaotian Yan, Kaiyuan Liu, Ximing Li, Xiaosong Yuan, Sinan Fan, Jun Zhang, Jieping Ye

Comments Accepted by Findings of ACL 2026. 15 pages, 9 figures. Code: https://github.com/wangbing1416/ASLEC

Abstract

Large reasoning models have recently demonstrated strong performance on complex tasks that require long chain-of-thought reasoning, through supervised fine-tuning on large-scale and high-quality datasets. To construct such datasets, existing pipelines generate long reasoning data from more capable Large Language Models (LLMs) and apply manual heuristic or naturalness-based selection methods to filter high-quality samples. Despite the proven effectiveness of naturalness-based data selection, which ranks data by the average log probability assigned by LLMs, our analysis shows that, when applied to LLM reasoning datasets, it systematically prefers samples with longer reasoning steps (i.e., more tokens per step) rather than higher-quality ones, a phenomenon we term step length confounding. Through quantitative analysis, we attribute this phenomenon to low-probability first tokens in reasoning steps; longer steps dilute their influence, thereby inflating the average log probabilities. To address this issue, we propose two variant methods: ASLEC-DROP, which drops first-token probabilities when computing average log probability, and ASLEC-CASL, which applies a causal debiasing regression to remove the first tokens' confounding effect. Experiments across four LLMs and five evaluation benchmarks demonstrate the effectiveness of our approach in mitigating the step length confounding problem.
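
The step length confounding described above can be reproduced with toy numbers. The sketch below is our own illustration, not the authors' released code: it assumes each sample is given as a list of reasoning steps, each a list of per-token log probabilities, and the names `naturalness_score` and `drop_first_token_score` are hypothetical. The second function follows the ASLEC-DROP idea as summarized in the abstract (exclude each step's first token before averaging); ASLEC-CASL's debiasing regression is not sketched here.

```python
from statistics import mean
from typing import List

# A sample is a list of reasoning steps; each step is a list of per-token
# log probabilities assigned by a scoring LLM (assumed input format).
Sample = List[List[float]]


def naturalness_score(sample: Sample) -> float:
    """Average log probability over every token in the sample."""
    return mean(lp for step in sample for lp in step)


def drop_first_token_score(sample: Sample) -> float:
    """ASLEC-DROP-style score: average log probability, skipping each step's first token."""
    return mean(lp for step in sample for lp in step[1:])


# Toy data: every step opens with a low-probability token (-5.0) followed by
# tokens at -0.5. Both samples have 20 tokens; only the step length differs.
short_steps: Sample = [[-5.0] + [-0.5] * 4 for _ in range(4)]  # 4 steps x 5 tokens
long_steps: Sample = [[-5.0] + [-0.5] * 9 for _ in range(2)]   # 2 steps x 10 tokens

if __name__ == "__main__":
    # Plain averaging favors the sample with longer steps...
    print(naturalness_score(short_steps), naturalness_score(long_steps))
    # ...while dropping step-initial tokens scores the two samples identically.
    print(drop_first_token_score(short_steps), drop_first_token_score(long_steps))
```

Plain averaging gives -1.4 for the short-step sample and -0.95 for the long-step one, even though every non-initial token has the same log probability; the drop-first-token score gives -0.5 for both. Note that `drop_first_token_score` raises `statistics.StatisticsError` if every step is a single token, since nothing is left to average.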

2604.06830 2026-04-09 cs.CV cs.RO

VGGT-SLAM++

Avilasha Mandal, Rajesh Kumar, Sudarshan Sunil Harithas, Chetan Arora

Comments 8 pages (main paper) + supplementary material. Accepted at CVPR 2026 Workshop (VOCVALC)

2604.06826 2026-04-09 cs.CL cs.AI

Environmental, Social and Governance Sentiment Analysis on Slovene News: A Novel Dataset and Models

Paula Dodig, Boshko Koloski, Katarina Sitar Šuštar, Senja Pollak, Matthew Purver

Comments Accepted at the 7th Financial Narrative Processing Workshop at LREC '26

2604.06825 2026-04-09 cs.CV

RePL: Pseudo-label Refinement for Semi-supervised LiDAR Semantic Segmentation

Donghyeon Kwon, Taegyu Park, Suha Kwak

2604.06824 2026-04-09 cs.CV

Generate, Analyze, and Refine: Training-Free Sound Source Localization via MLLM Meta-Reasoning

Subin Park, Jung Uk Kim

Comments Accepted to CVPR 2026

2604.06820 2026-04-09 cs.AI

Beyond Surface Judgments: Human-Grounded Risk Evaluation of LLM-Generated Disinformation

Zonghuan Xu, Xiang Zheng, Yutao Wu, Xingjun Ma

2604.06817 2026-04-09 cs.CL

SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization

Usman Naseem, Robert Geislinger, Juan Ren, Sarah Kohail, Rudy Garrido Veliz, P Sam Sahil, Yiran Zhang, Marco Antonio Stranisci, Idris Abdulmumin, Özge Alaçam, Cengiz Acartürk, Aisha Jabr, Saba Anwar, Abinew Ali Ayele, Elena Tutubalina, Aung Kyaw Htet, Xintong Wang, Surendrabikram Thapa, Tanmoy Chakraborty, Dheeraj Kodati, Sahar Moradizeyveh, Firoj Alam, Ye Kyaw Thu, Shantipriya Parida, Ihsan Ayyub Qazi, Lilian Wanzare, Nelson Odhiambo Onyango, Clemencia Siro, Ibrahim Said Ahmad, Adem Chanie Ali, Martin Semmann, Chris Biemann, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam

2604.06814 2026-04-09 cs.LG cs.AI

OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at Scale

Dihong Jiang, Ruoqi Cao, Zhiyuan Dang, Li Huang, Qingsong Zhang, Zhiyu Wang, Shihao Piao, Shenggao Zhu, Jianlong Chang, Zhouchen Lin, Qi Tian

2604.06799 2026-04-09 cs.CL cs.CY

Beyond Accuracy: Diagnosing Algebraic Reasoning Failures in LLMs Across Nine Complexity Dimensions

Parth Patil, Dhruv Kumar, Yash Sinha, Murari Mandal

Comments Under Review as a conference paper at COLM 2026

2604.06795 2026-04-09 cs.CV cs.AI

FedDAP: Domain-Aware Prototype Learning for Federated Learning under Domain Shift

Huy Q. Le, Loc X. Nguyen, Yu Qiao, Seong Tae Kim, Eui-Nam Huh, Choong Seon Hong

Comments Accepted at CVPR 2026

2604.06794 2026-04-09 cs.CL

GCoT-Decoding: Unlocking Deep Reasoning Paths for Universal Question Answering

Guanran Luo, Wentao Qiu, Zhongquan Jian, Meihong Wang, Qingqiang Wu

2604.06789 2026-04-09 cs.CV cs.CL

Video-guided Machine Translation with Global Video Context

Jian Chen, JinZe Lv, Zi Long, XiangHua Fu

2604.06787 2026-04-09 cs.CL

When Is Thinking Enough? Early Exit via Sufficiency Assessment for Efficient Reasoning

Yang Xiang, Yixin Ji, Ruotao Xu, Dan Qiao, Zheming Yang, Juntao Li, Min Zhang

Comments ACL 2026 Main Conference

2604.06783 2026-04-09 cs.CV

Insights from Visual Cognition: Understanding Human Action Dynamics with Overall Glance and Refined Gaze Transformer

Bohao Xing, Deng Li, Rong Gao, Xin Liu, Heikki Kälviäinen

2604.06782 2026-04-09 cs.CV

EventFace: Event-Based Face Recognition via Structure-Driven Spatiotemporal Modeling

Qingguo Meng, Xingbo Dong, Zhe Jin, Massimo Tistarelli

2604.06778 2026-04-09 cs.RO

RichMap: A Reachability Map Balancing Precision, Efficiency, and Flexibility for Rich Robot Manipulation Tasks

Yupu Lu, Yuxiang Ma, Jia Pan

Comments Accepted by WAFR 2026

2604.06777 2026-04-09 cs.CV

Walk the Talk: Bridging the Reasoning-Action Gap for Thinking with Images via Multimodal Agentic Policy Optimization

Wenhao Yang, Yu Xia, Jinlong Huang, Shiyin Lu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Yuchen Zhou, Xiaobo Xia, Yuanyu Wan, Lijun Zhang, Tat-Seng Chua

2604.06771 2026-04-09 cs.CL cs.AI

Multi-Faceted Self-Consistent Preference Alignment for Query Rewriting in Conversational Search

Zhiyu Cao, Peifeng Li, Qiaoming Zhu

Comments ACL 2026 Findings

2604.06770 2026-04-09 cs.CV cs.AI

FlowExtract: Procedural Knowledge Extraction from Maintenance Flowcharts

Guillermo Gil de Avalle, Laura Maruster, Eric Sloot, Christos Emmanouilidis

2604.06767 2026-04-09 cs.LG cs.CL

Geometric Properties of the Voronoi Tessellation in Latent Semantic Manifolds of Large Language Models

Marshall Brett

Comments 20 pages

Abstract

Language models operate on discrete tokens but compute in continuous vector spaces, inducing a Voronoi tessellation over the representation manifold. We study this tessellation empirically on Qwen3.5-4B-Base, making two contributions. First, using float32 margin recomputation to resolve bfloat16 quantization artifacts, we validate Mabrok's (2026) linear scaling law of the expressibility gap with $R^2 = 0.9997$ (the strongest confirmation to date) and identify a mid-layer geometric ambiguity regime where margin geometry is anti-correlated with cross-entropy (layers 24-28, $\rho = -0.29$) before crystallizing into alignment at the final layer ($\rho = 0.836$). Second, we show that the Voronoi tessellation of a converged model is reshapable through margin refinement procedures (MRP): short post-hoc optimization runs that widen token-decision margins without retraining. We compare direct margin maximization against Fisher information distance maximization across a dose-response sweep. Both methods find the same ceiling of ~16,300 correctable positions per 256K evaluated, but differ critically in collateral damage. Margin maximization damage escalates with intervention strength until corrections are overwhelmed. Fisher damage remains constant at ~5,300 positions across the validated range ($\lambda = 0.15$ to $0.6$), achieving +28% median margin improvement at $\lambda = 0.6$ with invariant downstream benchmarks: a geometric reorganization that compresses the expressibility gap while preserving its scaling law. However, frequency and token-class audits reveal that gains concentrate in high-frequency structural tokens (84% of net corrections at $\lambda = 0.6$), with content and entity-like contributions shrinking at higher $\lambda$. Fisher MRP is therefore a viable geometric polishing tool whose practical ceiling is set not by aggregate damage but by the uniformity of token-level benefit.

2604.06765 2026-04-09 cs.CL cs.AI

TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks

Xiangyu Wang, Jin Wu, Haoran Shi, Wei Xia, Jiarui Yu, Chanjin Zheng

2604.06758 2026-04-09 cs.CL

Multilingual Cognitive Impairment Detection in the Era of Foundation Models

Damar Hoogland, Boshko Koloski, Jaya Caporusso, Tine Kolenik, Ana Zwitter Vitez, Senja Pollak, Christina Manouilidou, Matthew Purver

Comments Accepted as an oral at the RAPID workshop @ LREC 2026

2604.06756 2026-04-09 cs.CL

How Long Reasoning Chains Influence LLMs' Judgment of Answer Factuality

Minzhu Tu, Shiyu Ni, Keping Bi

Comments ACL 2026 Main

2604.06754 2026-04-09 cs.LG cs.CY

The Rhetoric of Machine Learning

Robert C. Williamson

Comments 25 pages. Text of a talk given at AlphaPersuade 2.0, 26 March 2026

2604.06753 2026-04-09 cs.CL

Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents

Heng Zhou, Zelin Tan, Zhemeng Zhang, Yutao Fan, Yibing Lin, Li Kang, Xiufeng Song, Rui Li, Songtao Huang, Ao Yu, Yuchen Fan, Yanxu Chen, Kaixin Xu, Xiaohong Liu, Yiran Qin, Philip Torr, Chen Zhang, Zhenfei Yin

2604.06752 2026-04-09 cs.LG

Busemann energy-based attention for emotion analysis in Poincaré discs

Zinaid Kapić, Vladimir Jaćimović

2604.06748 2026-04-09 cs.CV

From Static to Interactive: Adapting Visual in-Context Learners for User-Driven Tasks

Carlos Schmidt, Simon Reiß

Abstract

Visual in-context learning models are designed to adapt to new tasks by leveraging a set of example input-output pairs, enabling rapid generalization without task-specific fine-tuning. However, these models operate in a fundamentally static paradigm: while they can adapt to new tasks, they lack any mechanism to incorporate user-provided guidance signals such as scribbles, clicks, or bounding boxes to steer or refine the prediction process. This limitation is particularly restrictive in real-world applications, where users want to actively guide model predictions, e.g., by highlighting the target object for segmentation, indicating a region which should be visually altered, or isolating a specific person in a complex scene to run targeted pose estimation. In this work, we propose a simple method to transform static visual in-context learners, particularly the DeLVM approach, into highly controllable, user-driven systems, i.e., Interactive DeLVM, enabling seamless interaction through natural visual cues such as scribbles, clicks, or drawing boxes. Specifically, by encoding interactions directly into the example input-output pairs, we keep the philosophy of visual in-context learning intact: enabling users to prompt models with unseen interactions without fine-tuning and empowering them to dynamically steer model predictions with personalized interactions. Our experiments demonstrate that SOTA visual in-context learning models fail to effectively leverage interaction cues, often ignoring user guidance entirely. In contrast, our method excels in controllable, user-guided scenarios, achieving improvements of $+7.95\%$ IoU for interactive segmentation, $+2.46$ PSNR for directed super-resolution, and $-3.14\%$ LPIPS for interactive object removal. With this, our work bridges the gap between rigid static task adaptation and fluid interactivity for user-centric visual in-context learning.
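
The central step summarized above, encoding the interaction directly into the example input-output pairs, can be illustrated with a toy sketch under our own assumptions: images are small nested lists of RGB tuples, a click is painted onto each input as a solid red square, and masks are binary grids. `paint_click`, `build_interactive_prompt`, and `CLICK_COLOR` are hypothetical names, not the authors' API; the actual cue rendering and the DeLVM model call are not shown. An `iou` helper computes the intersection-over-union metric reported for interactive segmentation.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = Sequence[Sequence[int]]

CLICK_COLOR: Pixel = (255, 0, 0)  # hypothetical marker color for a click


def paint_click(image: Image, row: int, col: int, radius: int = 1,
                color: Pixel = CLICK_COLOR) -> Image:
    """Return a copy of `image` with a filled square marker centered at (row, col)."""
    out = [list(r) for r in image]  # copy rows so the input is not mutated
    h, w = len(out), len(out[0])
    for r in range(max(0, row - radius), min(h, row + radius + 1)):
        for c in range(max(0, col - radius), min(w, col + radius + 1)):
            out[r][c] = color
    return out


def build_interactive_prompt(examples, query_image: Image, query_click: Tuple[int, int]):
    """Paint each example's click into its input image, leaving its output unchanged.

    `examples` is a list of (image, (row, col), target_output) triples; the
    in-context model is expected to infer how the cue relates to the output.
    """
    pairs = [(paint_click(img, *click), target) for img, click, target in examples]
    return pairs, paint_click(query_image, *query_click)


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = sum(1 for pr, gr in zip(pred, gt) for p, g in zip(pr, gr) if p and g)
    union = sum(1 for pr, gr in zip(pred, gt) for p, g in zip(pr, gr) if p or g)
    return inter / union if union else 1.0


if __name__ == "__main__":
    black: Image = [[(0, 0, 0)] * 4 for _ in range(4)]
    target_mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    pairs, query = build_interactive_prompt([(black, (1, 1), target_mask)], black, (2, 2))
    print(query[2][2], query[0][0])  # marker pixel vs. untouched pixel
    print(iou(target_mask, target_mask))
```

Because the cue lives in the pixels of the example inputs rather than in a separate input channel, the model architecture and weights stay unchanged, which is consistent with the abstract's claim of prompting with unseen interactions without fine-tuning.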

2604.06746 2026-04-09 cs.CL

StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference

Zhirui Chen, Peiyang Liu, Ling Shao

Comments Accepted to ACL 2026 Findings, 14 pages

2604.06740 2026-04-09 cs.CV

LiveStre4m: Feed-Forward Live Streaming of Novel Views from Unposed Multi-View Video

Pedro Quesado, Erkut Akdag, Yasaman Kashefbahrami, Willem Menu, Egor Bondarev

2604.06739 2026-04-09 cs.CV

DOC-GS: Dual-Domain Observation and Calibration for Reliable Sparse-View Gaussian Splatting

Hantang Li, Qiang Zhu, Xiandong Meng, Debin Zhao, Xiaopeng Fan

Comments 10 pages, 5 figures