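arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.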

2604.07740 2026-04-10 cs.CV cs.AI

Beyond Pedestrians: Caption-Guided CLIP Framework for High-Difficulty Video-based Person Re-Identification

Shogo Hamano, Shunya Wakasugi, Tatsuhito Sato, Sayaka Nakamura

Abstract

In recent years, video-based person Re-Identification (ReID) has gained attention for its ability to leverage spatiotemporal cues to match individuals across non-overlapping cameras. However, current methods struggle with high-difficulty scenarios, such as sports and dance performances, where multiple individuals wear similar clothing while performing dynamic movements. To overcome these challenges, we propose CG-CLIP, a novel caption-guided CLIP framework that leverages explicit textual descriptions and learnable tokens. Our method introduces two key components: Caption-guided Memory Refinement (CMR) and Token-based Feature Extraction (TFE). CMR utilizes captions generated by Multi-modal Large Language Models (MLLMs) to refine identity-specific features, capturing fine-grained details. TFE employs a cross-attention mechanism with fixed-length learnable tokens to efficiently aggregate spatiotemporal features, reducing computational overhead. We evaluate our approach on two standard datasets (MARS and iLIDS-VID) and two newly constructed high-difficulty datasets (SportsVReID and DanceVReID). Experimental results demonstrate that our method outperforms current state-of-the-art approaches, achieving significant improvements across all benchmarks.
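The abstract describes TFE only at a high level, so the following is a minimal sketch of the general idea of pooling a variable number of spatiotemporal features into a fixed number of tokens with cross-attention, not the authors' implementation. The function name `token_cross_attention`, the sizes, and the random initialisation are all assumptions made for illustration; the learned query/key/value projections of a real attention layer are omitted for brevity.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def token_cross_attention(tokens, features):
    """Hypothetical helper: each of the K tokens (queries) attends over all
    N features (used as both keys and values) and yields one pooled vector.
    The output size depends only on K, not on N."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    pooled = []
    for q in tokens:
        weights = softmax([dot(q, f) * scale for f in features])
        pooled.append([
            sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)
        ])
    return pooled


rng = random.Random(0)
dim, num_tokens = 8, 4
# Assumed sizes: 6 frames x 10 patches = 60 feature vectors.
features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(60)]
# In training these tokens would be learned parameters; here they are random.
tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]

pooled = token_cross_attention(tokens, features)
print(len(pooled), len(pooled[0]))  # 4 8
```

Because the number of query tokens is fixed, the attention cost in this sketch grows linearly with the number of frame features rather than quadratically, which is one plausible reading of the "reducing computational overhead" claim.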

2604.07737 2026-04-10 cs.CL

SepSeq: A Training-Free Framework for Long Numerical Sequence Processing in LLMs

Jie Sun, Yu Liu, Lu Han, Qiwen Deng, Xiang Shu, Yang Xiao, Xingyu Lu, Jun Zhou, Pengfei Liu, Lintao Ma, Jiancan Wu, Xiang Wang

Comments 16 pages, 4 figures, 5 tables

2604.07733 2026-04-10 cs.AI

CivBench: Progress-Based Evaluation for LLMs' Strategic Decision-Making in Civilization V

John Chen, Sihan Cheng, Can Gurkan, Mingyi Lin

Comments Under review

2604.07729 2026-04-10 cs.AI cs.CL

Emotion Concepts and their Function in a Large Language Model

Nicholas Sofroniew, Isaac Kauvar, William Saunders, Runjin Chen, Tom Henighan, Sasha Hydrie, Craig Citro, Adam Pearce, Julius Tarng, Wes Gurnee, Joshua Batson, Sam Zimmerman, Kelley Rivoire, Kyle Fish, Chris Olah, Jack Lindsey

2604.07728 2026-04-10 cs.CV cs.GR cs.RO

GEAR: GEometry-motion Alternating Refinement for Articulated Object Modeling with Gaussian Splatting

Jialin Li, Bin Fu, Ruiping Wang, Xilin Chen

Comments Accepted to CVPRF2026

2604.07723 2026-04-10 cs.CV

Direct Segmentation without Logits Optimization for Training-Free Open-Vocabulary Semantic Segmentation

Jiahao Li, Yang Lu, Yachao Zhang, Fangyong Wang, Yuan Xie, Yanyun Qu

Comments Accepted by CVPR 2026

2604.07715 2026-04-10 cs.LG math.OC

Mathematical analysis of one-layer neural network with fixed biases, a new activation function and other observations

Fabricio Macià, Shu Nakamura

2604.07712 2026-04-10 cs.LG

CausalVAE as a Plug-in for World Models: Towards Reliable Counterfactual Dynamics

Ziyi Ding, Xianxin Lai, Weiyu Chen, Xiao-Ping Zhang, Jiayu Chen

2604.07705 2026-04-10 cs.RO

Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models

Xingyu Xia, Lekai Zhou, Yujie Tang, Xiaozhou Zhu, Hai Zhu, Wen Yao

Comments 28 pages, 8 figures

2604.07687 2026-04-10 cs.LG cs.AI

Joint Task Offloading, Inference Optimization and UAV Trajectory Planning for Generative AI Empowered Intelligent Transportation Digital Twin

Xiaohuan Li, Junchuan Fan, Bingqi Zhang, Rong Yu, Xumin Huang, Qian Chen

2604.07685 2026-04-10 cs.LG

Tensor-based computation of the Koopman generator via operator logarithm

Tatsuya Kishimoto, Jun Ohkubo

Comments 9 pages, 5 figures

2604.07681 2026-04-10 cs.AI

Multi-Agent Orchestration for High-Throughput Materials Screening on a Leadership-Class System

Thang Duc Pham, Harikrishna Tummalapalli, Fakhrul Hasan Bhuiyan, Álvaro Vázquez Mayagoitia, Christine Simpson, Riccardo Balin, Venkatram Vishwanath, Murat Keçeli

2604.07677 2026-04-10 cs.RO

Bird-Inspired Spatial Flapping Wing Mechanism via Coupled Linkages with Single Actuator

Daniel Huczala, Sun-Pill Jung, Frank C. Park

2604.07675 2026-04-10 cs.CV

FireSenseNet: A Dual-Branch CNN with Cross-Attentive Feature Interaction for Next-Day Wildfire Spread Prediction

Jinzhen Han, JinByeong Lee, Hak Han, YeonJu Na, Jae-Joon Lee

2604.07674 2026-04-10 cs.CV

Weight Group-wise Post-Training Quantization for Medical Foundation Model

Yineng Chen, Peng Huang, Aozhong Zhang, Hui Guo, Penghang Yin, Shu Hu, Shao Lin, Xin Li, Tzu-Jen Kao, Balakrishnan Prabhakaran, MingChing Chang, Xin Wang

2604.07672 2026-04-10 cs.RO

Reset-Free Reinforcement Learning for Real-World Agile Driving: An Empirical Study

Kohei Honda, Hirotaka Hosogaya

Comments 7 pages, 5 figures

2604.07667 2026-04-10 cs.AI cs.MA cs.SI

From Debate to Decision: Conformal Social Choice for Safe Multi-Agent Deliberation

Mengdie Flora Wang, Haochen Xie, Guanghui Wang, Aijing Gao, Guang Yang, Ziyuan Li, Qucy Wei Qiu, Fangwei Han, Hengzhi Qiu, Yajing Huang, Bing Zhu, Jae Oh Woo

2604.07666 2026-04-10 cs.LG cs.AI

An Imperfect Verifier is Good Enough: Learning with Noisy Rewards

Andreas Plesner, Francisco Guzmán, Anish Athalye

2604.07665 2026-04-10 cs.CV

Adaptive Depth-converted-Scale Convolution for Self-supervised Monocular Depth Estimation

Yanbo Gao, Huibin Bai, Huasong Zhou, Xingyu Gao, Shuai Li, Xun Cai, Hui Yuan, Wei Hua, Tian Xie

Comments Accepted by IEEE Transactions on Circuits and Systems for Video Technology

2604.07664 2026-04-10 cs.CV eess.IV

Monocular Depth Estimation From the Perspective of Feature Restoration: A Diffusion Enhanced Depth Restoration Approach

Huibin Bai, Shuai Li, Hanxiao Zhai, Yanbo Gao, Chong Lv, Yibo Wang, Haipeng Ping, Wei Hua, Xingyu Gao

Comments Accepted by IEEE TMM

2604.07659 2026-04-10 cs.CL

Efficient and Effective Internal Memory Retrieval for LLM-Based Healthcare Prediction

Mingchen Li, Jiatan Huang, Zonghai Yao, Hong Yu

Comments ACL 2026 (Findings), reviewer scores: 3.5, 3.5, 4

2604.07658 2026-04-10 cs.LG cs.AI cs.CL

Optimal Decay Spectra for Linear Recurrences

Yang Cao

2604.07655 2026-04-10 cs.LG cs.CL

Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs

Yue Huang, Haomin Zhuang, Jiayi Ye, Han Bao, Yanbo Wang, Hang Hua, Siyuan Wu, Pin-Yu Chen, Xiangliang Zhang

2604.07652 2026-04-10 cs.AI cs.HC

Bridging Natural Language and Interactive What-If Interfaces via LLM-Generated Declarative Specification

Sneha Gathani, Sirui Zeng, Diya Patel, Ryan Rossi, Dan Marshall, Cagatay Demiralp, Steven Drucker, Zhicheng Liu

Comments 17 pages, 17 figures

Abstract

What-if analysis (WIA) is an iterative, multi-step process where users explore and compare hypothetical scenarios by adjusting parameters, applying constraints, and scoping data through interactive interfaces. Current tools fall short of supporting effective interactive WIA: spreadsheet and BI tools require time-consuming and laborious setup, while LLM-based chatbot interfaces are semantically fragile, frequently misinterpret intent, and produce inconsistent results as conversations progress. To address these limitations, we present a two-stage workflow that translates natural language (NL) WIA questions into interactive visual interfaces via an intermediate representation, powered by the Praxa Specification Language (PSL): first, LLMs generate PSL specifications from NL questions capturing analytical intent and logic, enabling validation and repair of erroneous specifications; and second, the specifications are compiled into interactive visual interfaces with parameter controls and linked visualizations. We benchmark this workflow with 405 WIA questions spanning 11 WIA types, 5 datasets, and 3 state-of-the-art LLMs. The results show that, across models, about half of the specifications (52.42%) are generated correctly without intervention. We analyze the failure cases and derive an error taxonomy spanning non-functional errors (specifications fail to compile) and functional errors (specifications compile but misrepresent intent). Based on the taxonomy, we apply targeted repairs to the failure cases using few-shot prompts and improve the success rate to 80.42%. Finally, we show how undetected functional errors propagate through compilation into plausible but misleading interfaces, demonstrating that the intermediate specification is critical for reliably bridging NL and interactive WIA interfaces in LLM-powered WIA systems.

2604.07651 2026-04-10 cs.LG cs.AI

Cognitive-Causal Multi-Task Learning with Psychological State Conditioning for Assistive Driving Perception

Keito Inoshita, Nobuhiro Hayashida, Akira Imanishi

2604.07650 2026-04-10 cs.AI cs.CL

How Independent are Large Language Models? A Statistical Framework for Auditing Behavioral Entanglement and Reweighting Verifier Ensembles

Chenchen Kuai, Jiwan Jiang, Zihao Zhu, Hao Wang, Keshu Wu, Zihao Li, Yunlong Zhang, Chenxi Liu, Zhengzhong Tu, Zhiwen Fan, Yang Zhou

Comments 9 pages, 4 figures

2604.07645 2026-04-10 cs.AI

PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent

Prince Zizhuang Wang, Shuli Jiang

2604.07644 2026-04-10 cs.RO cs.AI cs.SY eess.SY math.OC

Safe Large-Scale Robust Nonlinear MPC in Milliseconds via Reachability-Constrained System Level Synthesis on the GPU

Jeffrey Fang, Glen Chou

Comments Under review

2604.07634 2026-04-10 cs.CV

VSAS-BENCH: Real-Time Evaluation of Visual Streaming Assistant Models

Pavan Kumar Anasosalu Vasu, Cem Koc, Fartash Faghri, Chun-Liang Li, Bo Feng, Zhengfeng Lai, Meng Cao, Oncel Tuzel, Hadi Pouransari

Comments CVPR Findings 2026

2604.07632 2026-04-10 cs.LG cs.AI

Sheaf-Laplacian Obstruction and Projection Hardness for Cross-Modal Compatibility on a Modality-Independent Site

Tibor Sloboda

Comments 21 pages, 4 figures, submitted to Annals of Mathematics and Artificial Intelligence of Springer Nature