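arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.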

2511.10809 2026-04-08 cs.LG

Near-optimal Linear Predictive Clustering in Non-separable Spaces via MIP and QPBO Reductions

Jiazhou Liang, Hassan Khurram, Scott Sanner

Details

DOI: 10.1609/aaai.v40i28.39511

Abstract (English)

Linear Predictive Clustering (LPC) partitions samples based on shared linear relationships between feature and target variables, with numerous applications including marketing, medicine, and education. Greedy optimization methods, commonly used for LPC, alternate between clustering and linear regression but lack global optimality. While effective for separable clusters, they struggle in non-separable settings where clusters overlap in feature space. In an alternative constrained optimization paradigm, Bertsimas and Shioda (2007) formulated LPC as a Mixed-Integer Program (MIP), ensuring global optimality regardless of separability but suffering from poor scalability. This work builds on the constrained optimization paradigm to introduce two novel approaches that improve the efficiency of global optimization for LPC. By leveraging key theoretical properties of separability, we derive near-optimal approximations with provable error bounds, significantly reducing the MIP formulation's complexity and improving scalability. Additionally, we can further approximate LPC as a Quadratic Pseudo-Boolean Optimization (QPBO) problem, achieving substantial computational improvements in some settings. Comparative analyses on synthetic and real-world datasets demonstrate that our methods consistently achieve near-optimal solutions with substantially lower regression errors than greedy optimization while exhibiting superior scalability over existing MIP formulations.
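
The greedy baseline that the abstract contrasts against (alternating between cluster assignment and per-cluster linear regression) can be sketched as follows. This is a minimal illustration with one feature and one target, not the authors' MIP or QPBO formulation; the function names `fit_line`, `greedy_lpc`, and `total_sse`, the random initialisation, and the iteration cap are all assumptions made for the example.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        # All x values coincide: a flat line through the mean is optimal.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def squared_residual(model, point):
    """Squared prediction error of one (slope, intercept) model on one point."""
    slope, intercept = model
    x, y = point
    return (y - (slope * x + intercept)) ** 2


def total_sse(points, labels, models):
    """Sum of squared errors when each point uses its own cluster's model."""
    return sum(squared_residual(models[c], p) for p, c in zip(points, labels))


def greedy_lpc(points, k, iters=50, seed=0):
    """Alternate regression and reassignment (hypothetical baseline sketch)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]  # assumed random start
    models = [(0.0, 0.0)] * k
    for _ in range(iters):
        # Regression step: refit each non-empty cluster's line.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                models[c] = fit_line(members)
        # Assignment step: move each point to its best-fitting line.
        new_labels = [
            min(range(k), key=lambda c: squared_residual(models[c], p))
            for p in points
        ]
        if new_labels == labels:
            break  # converged to a local optimum, not necessarily global
        labels = new_labels
    return labels, models
```

Each step can only lower the total squared error, so the loop converges, but only to a local optimum that depends on the starting labels. That dependence is the gap the abstract's global formulations aim to close.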

2511.10287 2026-04-08 cs.LG cs.CL

OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models

Yuping Yan, Yuhan Xie, Yuanshuai Li, Yingchao Yu, Lingjuan Lyu, Yaochu Jin

2511.09425 2026-04-08 cs.LG stat.ML

Supporting Evidence for the Adaptive Feature Program across Diverse Models

Yicheng Li, Qian Lin

2511.04570 2026-04-08 cs.CV cs.CL

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Jingqi Tong, Yurong Mou, Hangcheng Li, Mingzhe Li, Yongzhuo Yang, Ming Zhang, Qiguang Chen, Tianyi Liang, Xiaomeng Hu, Yining Zheng, Xinchi Chen, Jun Zhao, Xuanjing Huang, Xipeng Qiu

Comments 34 pages, 17 figures

2511.01831 2026-04-08 cs.LG cs.AI

Routing-Based Continual Learning for Multimodal Large Language Models

Jay Mohta, Kenan Emir Ak, Gwang Lee, Dimitrios Dimitriadis, Yan Xu, Mingwei Shen

2511.00181 2026-04-08 cs.CV cs.CR

From Evidence to Verdict: An Agent-Based Forensic Framework for AI-Generated Image Detection

Mengfei Liang, Yiting Qu, Yukun Jiang, Michael Backes, Yang Zhang

Comments 15 pages, 5 figures

2510.25241 2026-04-08 cs.RO cs.AI

One-shot Adaptation of Humanoid Whole-body Motion with Walking Priors

Hao Huang, Geeta Chandra Raju Bethala, Shuaihang Yuan, Congcong Wen, Mengyu Wang, Anthony Tzes, Yi Fang

Comments 14 pages, 3 figures, 5 tables

2510.19457 2026-04-08 cs.CL

MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models

Kailin Jiang, Ning Jiang, Yuntao Du, Yuchen Ren, Yuchen Li, Yifan Gao, Jinhe Bi, Yunpu Ma, Bin Li, Lei Liu, Qing Li

Comments ACL 2026, Project Page: https://mined-lmm.github.io/

2510.14949 2026-04-08 cs.CL cs.CV cs.LG

DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation

Yu Zhou, Sohyun An, Haikang Deng, Da Yin, Clark Peng, Cho-Jui Hsieh, Kai-Wei Chang, Nanyun Peng

2510.14628 2026-04-08 cs.CL cs.AI

RLAIF-SPA: Structured AI Feedback for Semantic-Prosodic Alignment in Speech Synthesis

Qing Yang, Zhenghao Liu, Yangfan Du, Pengcheng Huang, Tong Xiao

2510.13909 2026-04-08 cs.CL cs.AI

Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning

Xingrui Zhuo, Jiapu Wang, Gongqing Wu, Zhongyuan Wang, Jichen Zhang, Shirui Pan, Xindong Wu

2510.10815 2026-04-08 cs.AI cs.CL cs.IR cs.SC

DRIFT: Decompose, Retrieve, Illustrate, then Formalize Theorems

Meiru Zhang, Philipp Borchert, Milan Gritta, Gerasimos Lampouras

Comments Accepted at ICLR 2026

2510.09203 2026-04-08 cs.CV

Cattle-CLIP: A Multimodal Framework for Cattle Behaviour Recognition from Video

Huimin Liu, Jing Gao, Daria Baran, Axel X Montout, Neill W Campbell, Andrew W Dowsey

Comments 16 pages, 10 figures, submitted to Information Processing in Agriculture

Details

Abstract (English)

Robust behaviour recognition in real-world farm environments remains challenging due to several data-related limitations, including the scarcity of well-annotated livestock video datasets and the substantial domain gap between large-scale pre-training corpora and agricultural surveillance footage. To address these challenges, we propose Cattle-CLIP, a domain-adaptive vision-language framework that reformulates cattle behaviour recognition as cross-modal semantic alignment rather than purely visual classification. Instead of directly fine-tuning visual backbones, Cattle-CLIP incorporates a temporal integration module to extend image-level contrastive pre-training to video-based behaviour understanding, enabling consistent semantic alignment across time. To mitigate the distribution shift between web-scale image-text data used for the pre-trained model and real-world cattle surveillance footage, we further introduce tailored augmentation strategies and specialised behaviour prompts. Furthermore, we construct CattleBehaviours6, a curated and behaviour-consistent video dataset comprising 1905 annotated clips across six indoor behaviours to support model training and evaluation. Beyond serving as a benchmark for our proposed method, the dataset provides a standardised ethogram definition, offering a practical resource for future research in livestock behaviour analysis. Cattle-CLIP is evaluated under both fully-supervised and few-shot learning scenarios, with a particular focus on data-scarce behaviour recognition, an important yet under-explored goal in livestock monitoring. Experiments show that Cattle-CLIP achieves 96.1% overall accuracy across six behaviours in supervised settings, with near-perfect recall for feeding, drinking and standing-ruminating behaviours, and demonstrates robust generalisation with limited data in few-shot scenarios.
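
The idea of treating behaviour recognition as cross-modal alignment rather than visual classification can be illustrated with a toy sketch: pool per-frame embeddings into one clip embedding, then pick the behaviour prompt with the highest cosine similarity. Everything here is an assumption made for illustration. Mean pooling stands in for the paper's temporal integration module, whose details the abstract does not give, and the three-dimensional vectors are made-up stand-ins for real encoder outputs.

```python
import math


def mean_pool(frame_embeddings):
    """Average per-frame vectors into one clip-level vector (assumed pooling)."""
    n = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    return [sum(frame[d] for frame in frame_embeddings) / n for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_clip(frame_embeddings, prompt_embeddings):
    """Return the best-matching label and all similarity scores."""
    clip_vector = mean_pool(frame_embeddings)
    scores = {
        label: cosine(clip_vector, embedding)
        for label, embedding in prompt_embeddings.items()
    }
    return max(scores, key=scores.get), scores


# Toy stand-ins for image-encoder and text-encoder outputs (hypothetical values).
toy_frames = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
toy_prompts = {
    "feeding": [1.0, 0.0, 0.0],
    "drinking": [0.0, 1.0, 0.0],
    "standing-ruminating": [0.0, 0.0, 1.0],
}
```

Calling `classify_clip(toy_frames, toy_prompts)` returns the label whose prompt vector points closest to the pooled clip vector. New behaviours can then be added by writing a new prompt rather than retraining a classifier head, which is one reason this formulation suits few-shot settings.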

2510.07432 2026-04-08 cs.AI

TS-Agent: Understanding and Reasoning Over Raw Time Series via Iterative Insight Gathering

Penghang Liu, Elizabeth Fons, Annita Vapsi, Mohsen Ghassemi, Svitlana Vyetrenko, Daniel Borrajo, Vamsi K. Potluru, Manuela Veloso

Comments NeurIPS 2025 Workshop on Foundations of Reasoning in Language Models

2510.05038 2026-04-08 cs.CL

Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization

Omri Uzan, Asaf Yehudai, Roi Pony, Eyal Shnarch, Ariel Gera

Comments ICLR 2026

2510.05026 2026-04-08 cs.CL

Idiom Understanding as a Tool to Measure the Dialect Gap

David Beauchemin, Yan Tremblay, Mohamed Amine Youssef, Richard Khoury

Comments Accepted to ACL 2026 findings

2510.02810 2026-04-08 cs.LG cs.AI cs.SE

Dissecting Transformers: A CLEAR Perspective towards Green AI

Hemang Jain, Shailender Goyal, Divyansh Pandey, Karthik Vaidhyanathan

2510.00978 2026-04-08 cs.CV

A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features

Axel Barroso-Laguna, Tommaso Cavallari, Victor Adrian Prisacariu, Eric Brachmann

2509.25454 2026-04-08 cs.AI cs.CL

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Fang Wu, Weihao Xuan, Heli Qi, Ximing Lu, Aaron Tu, Li Erran Li, Yejin Choi

Details

Abstract (English)

Although RLVR has become an essential component for developing advanced reasoning skills in language models, contemporary studies have documented training plateaus after thousands of optimization steps, i.e., notable decreases in performance gains despite increased computational investment. This limitation stems from the sparse exploration patterns inherent in current RLVR practices, where models rely on limited rollouts that often miss critical reasoning paths and fail to provide systematic coverage of the solution space. We present DeepSearch, a framework that integrates Monte Carlo Tree Search (MCTS) directly into RLVR training. In contrast to existing methods that rely on tree search only at inference, DeepSearch embeds structured search into the training loop, enabling systematic exploration and fine-grained credit assignment across reasoning steps. Through training-time exploration, DeepSearch addresses the fundamental bottleneck of insufficient exploration, which leads to diminishing performance gains over prolonged training. Our contributions include: (1) a global frontier selection strategy that prioritizes promising nodes across the search tree, (2) selection with entropy-based guidance that identifies confident paths for supervision, and (3) adaptive replay buffer training with solution caching for efficiency. Experiments on mathematical reasoning benchmarks show that DeepSearch achieves an average accuracy of 62.95% and establishes a new state-of-the-art reasoning model, while using 5.7x fewer GPU hours than extended training approaches. These results highlight the importance of strategic exploration over brute-force scaling and demonstrate the promise of algorithmic innovation for advancing RLVR methodologies. DeepSearch establishes a new direction for scaling reasoning capabilities through systematic search rather than prolonged computation.

2509.25284 2026-04-08 cs.LG cs.NI eess.SP

Optimisation of Resource Allocation in Heterogeneous Wireless Networks Using Deep Reinforcement Learning

Oluwaseyi Giwa, Jonathan Shock, Jaco Du Toit, Tobi Awodumila

Comments Accepted at the 2026 EuCNC & 6G Summit

2509.23102 2026-04-08 cs.AI cs.CL

Multiplayer Nash Preference Optimization

Fang Wu, Xu Huang, Weihao Xuan, Zhiwei Zhang, Yijia Xiao, Guancheng Wan, Xiaomin Li, Bing Hu, Peng Xia, Jure Leskovec, Yejin Choi

2509.11926 2026-04-08 cs.CV

Unrolling Graph-based Douglas-Rachford Algorithm for Image Interpolation with Informed Initialization

Xue Zhang, Bingshuo Hu, Gene Cheung

Comments 6 pages, ICME 2026

2509.09438 2026-04-08 cs.CL

GrACE: A Generative Approach to Better Confidence Elicitation and Efficient Test-Time Scaling in Large Language Models

Zhaohan Zhang, Ziquan Liu, Ioannis Patras

2509.02949 2026-04-08 cs.CL cs.CV

ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly

Kimihiro Hasegawa, Wiradee Imrattanatrai, Masaki Asada, Susan Holm, Yuran Wang, Vincent Zhou, Ken Fukuda, Teruko Mitamura

Comments LREC 2026. Code and data: https://github.com/kimihiroh/promqa-assembly

2508.13009 2026-04-08 cs.CV

Matrix-game 2.0: An open-source real-time and streaming interactive world model

Xianglong He, Chunli Peng, Zexiang Liu, Boyang Wang, Yifan Zhang, Qi Cui, Fei Kang, Biao Jiang, Mengyin An, Yangyang Ren, Baixin Xu, Hao-Xiang Guo, Kaixiong Gong, Size Wu, Wei Li, Xuchen Song, Yang Liu, Yangguang Li, Yahui Zhou

Comments Project Page: https://matrix-game-v2.github.io

2508.09691 2026-04-08 cs.CV

PaCo-FR: Patch-Pixel Aligned End-to-End Codebook Learning for Facial Representation Pre-training

Yin Xie, Zhichao Chen, Zeyu Xiao, Yongle Zhao, Xiang An, Kaicheng Yang, Zimin Ran, Jia Guo, Ziyong Feng, Jiankang Deng

2508.07833 2026-04-08 cs.CV

MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization

Animesh Jain, Alexandros Stergiou

Comments Accepted at CVPRw 2026 - How Do Vision Models Work? (HOW) Workshop, Project page: https://anaekin.github.io/MIMIC

2508.02591 2026-04-08 cs.CL

CharBench: Evaluating the Role of Tokenization in Character-Level Tasks

Omri Uzan, Yuval Pinter

Comments AAAI-26

2507.22418 2026-04-08 cs.CV cs.AI

Aleatoric Uncertainty Medical Image Segmentation Estimation via Flow Matching

Phi Van Nguyen, Ngoc Huynh Trinh, Duy Minh Lam Nguyen, Phu Loc Nguyen, Quoc Long Tran

Details

DOI: 10.1007/978-3-032-06593-3_13
Journal ref: Uncertainty for Safe Utilization of Machine Learning in Medical Imaging. UNSURE 2025

Abstract (English)

Quantifying aleatoric uncertainty in medical image segmentation is critical since it is a reflection of the natural variability observed among expert annotators. A conventional approach is to model the segmentation distribution using the generative model, but current methods limit the expression ability of generative models. While current diffusion-based approaches have demonstrated impressive performance in approximating the data distribution, their inherent stochastic sampling process and inability to model exact densities limit their effectiveness in accurately capturing uncertainty. In contrast, our proposed method leverages conditional flow matching, a simulation-free flow-based generative model that learns an exact density, to produce highly accurate segmentation results. By guiding the flow model on the input image and sampling multiple data points, our approach synthesizes segmentation samples whose pixel-wise variance reliably reflects the underlying data distribution. This sampling strategy captures uncertainties in regions with ambiguous boundaries, offering robust quantification that mirrors inter-annotator differences. Experimental results demonstrate that our method not only achieves competitive segmentation accuracy but also generates uncertainty maps that provide deeper insights into the reliability of the segmentation outcomes. The code for this paper is freely available at https://github.com/huynhspm/Data-Uncertainty
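
The step of turning multiple sampled segmentations into an uncertainty map can be sketched as a pixel-wise mean and variance over the samples. This is only an illustration of that final aggregation. The hard-coded binary masks below are hypothetical stand-ins for outputs of the flow model, and the use of population variance (dividing by the number of samples) is an assumption, since the abstract does not specify the estimator.

```python
def pixelwise_mean_variance(samples):
    """Per-pixel mean and population variance over equally sized 2D masks."""
    n = len(samples)
    height = len(samples[0])
    width = len(samples[0][0])
    mean = [
        [sum(s[i][j] for s in samples) / n for j in range(width)]
        for i in range(height)
    ]
    variance = [
        [sum((s[i][j] - mean[i][j]) ** 2 for s in samples) / n for j in range(width)]
        for i in range(height)
    ]
    return mean, variance


# Four hypothetical 2x2 segmentation samples for the same input image.
toy_samples = [
    [[1, 1], [0, 0]],
    [[1, 1], [0, 1]],
    [[1, 0], [0, 0]],
    [[1, 0], [0, 0]],
]

toy_mean, toy_variance = pixelwise_mean_variance(toy_samples)
# The top-left pixel is 1 in every sample, so its variance is 0 (no disagreement).
# The top-right pixel is 1 in half the samples, so its variance is 0.25, the
# maximum possible for a binary pixel.
```

High-variance pixels mark regions where the samples disagree, which in the abstract's framing corresponds to ambiguous boundaries and inter-annotator differences.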

2507.20546 2026-04-08 cs.CL cs.AI

Enhancing Hallucination Detection via Future Context

Joosung Lee, Cheonbok Park, Hwiyeol Jo, Jeonghoon Kim, Joonsuk Park, Kang Min Yoo

Comments Findings of ACL 2026