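arXivDaily daily academic digest: synchronized with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.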

2604.24972 2026-04-29 cs.CL

Dynamic Decision Learning: Test-Time Evolution for Abnormality Grounding in Rare Diseases

Jun Li, Mingxuan Liu, Jiazhen Pan, Che Liu, Wenjia Bai, Cosmin I. Bercea, Julia A. Schnabel

Abstract

Clinical abnormality grounding for rare diseases is often hindered by data scarcity, making supervised fine-tuning impractical and single-pass inference highly unstable. We propose Dynamic Decision Learning (DDL), a framework that enables frozen large vision-language models (LVLMs) to refine their decisions across both language and visual spaces by optimizing instructions and consolidating predictions under visual perturbations. This process improves localization quality and produces a consensus-based reliability score that quantifies model confidence. Results on brain imaging benchmarks, including a rare-disease dataset with 281 pathology types across models ranging from 3B to 72B parameters, show that DDL improves mAP@75 by up to 105% on rare-disease cases and outperforms adaptation baselines and supervised fine-tuning. Furthermore, DDL demonstrates stronger calibration between reliability scores and localization accuracy under severe distribution shifts and increasing task difficulty. Code is available at: https://lijunrio.github.io/DDL/
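The abstract mentions consolidating predictions under visual perturbations into a consensus-based reliability score but does not say how. The following is only a rough sketch of one plausible reading, not DDL's actual procedure: it assumes each perturbed view yields one bounding box `(x1, y1, x2, y2)`, takes the coordinate-wise median as the consolidated box, and uses mean pairwise IoU as the agreement score. The function names and the choice of median and IoU are assumptions made for illustration.

```python
from itertools import combinations
from statistics import median


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def consensus(boxes):
    """Hypothetical consolidation: median box plus mean pairwise IoU.

    `boxes` holds one predicted box per perturbed view of the same image.
    A score near 1 means the predictions agree; near 0 means they scatter.
    """
    if len(boxes) < 2:
        raise ValueError("need at least two predictions to measure agreement")
    merged = tuple(median(b[i] for b in boxes) for i in range(4))
    pairs = list(combinations(boxes, 2))
    score = sum(iou(a, b) for a, b in pairs) / len(pairs)
    return merged, score
```

Under this reading, a low score flags cases where the frozen model's localization is unstable and should be treated with caution.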

2604.24971 2026-04-29 cs.LG cs.CL cs.DC

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference

Ishan Patel, Ishan Joshi

Comments 10 pages, 6 tables. Code: https://github.com/ishan1410/PolyKV Keywords: KV cache compression, multi-agent LLM inference, asymmetric quantization, FWHT, TurboQuant, shared memory

2604.24964 2026-04-29 cs.LG cs.CL

Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks

Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov

Comments 29 pages

2604.24959 2026-04-29 cs.LG stat.ML

CoreFlow: Low-Rank Matrix Generative Models

Dongze Wu, Linglingzhi Zhu, Yao Xie

2604.24955 2026-04-29 cs.CL cs.AI cs.SE

BenchGuard: Who Guards the Benchmarks? Automated Auditing of LLM Agent Benchmarks

Xinming Tu, Tianze Wang, Yingzhou Lu, Kexin Huang, Yuanhao Qu, Sara Mostafavi

2604.24952 2026-04-29 cs.CV cs.AI

Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization

Xinxin Liu, Ming Li, Zonglin Lyu, Yuzhang Shang, Chen Chen

2604.24947 2026-04-29 cs.CV

Subjective Portrait Region Cropping in Landscape Videos with Temporal Annotation Smoothing

Cheng-Han Lee, Maniratnam Mandal, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Comments Under Review in IEEE Transactions on Image Processing. The code, models and dataset will be available at: https://github.com/steven413d/LIVE-YT-VideoCropping

Abstract

With the rise of mobile video consumption on diverse handheld display resolutions and orientation modes, altering videos to fit different aspect ratios poses challenges. Static cropping and border padding often compromise visual quality, while warping may distort a video's intended meaning. Here we advocate for a more effective approach: cropping significant regions within video frames in a temporal manner, while minimizing distortion and preserving essential content. One barrier to solving this problem is the lack of a sufficiently large-scale database devoted to informing these tasks. Towards filling this gap, we introduce the LIVE-YouTube Video Cropping (LIVE-YT VC) database, featuring 1800 videos, annotated by 90 human subjects. Using videos sourced from the YouTube-UGC and LSVQ Databases, this new resource is the largest publicly available subjective video portrait region cropping database. We also introduce a post-processed version of the database, called LIVE-YT VC++, whereby a novel intra-frame temporal filter was deployed to smooth subjective annotations within each video. We demonstrate the usefulness of this new data resource using the SmartVidCrop algorithm and state-of-the-art video grounding models, in hopes of establishing our subjective dataset as a benchmark for future research. Our contributions offer a resource for advancing video aspect ratio transformation models towards ensuring that reshaped mobile-friendly video content retains its quality and meaning. Since our labels bear resemblances to video saliency annotations, we also conducted an additional analysis to explore the similarity between our labels and video saliency predictions. Finally, we repurposed state-of-the-art video grounding models for aspect ratio change tasks, and fine-tuned them on our dataset. As a service to the research community, we plan to open source the project.

2604.24936 2026-04-29 cs.LG stat.ML

A Unifying Framework for Unsupervised Concept Extraction

Chandler Squires, Pradeep Ravikumar

Comments AISTATS 2026, 9 pages

2604.24933 2026-04-29 cs.AI cs.SD

S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models

Mohammed Ali El Adlouni, Aurian Quelennec, Pierre Chouteau, Geoffroy Peeters, Slim Essid

Comments Accepted at IEEE ICASSP 2026. 5 pages, 2 figures, 3 tables. Equal contribution by first two authors. Code: https://github.com/MedAliAdlouni/ssondo | Models: https://huggingface.co/mohammedali2501/ssondo | Package: https://pypi.org/project/ssondo/

2604.24929 2026-04-29 cs.CL cs.AI

GAIA-v2-LILT: Multilingual Adaptation of Agent Benchmark beyond Translation

Yunsu Kim, Kaden Uhlig, Joern Wuebker

2604.24927 2026-04-29 cs.CL cs.AI cs.LG

Large Language Models Explore by Latent Distilling

Yuanhao Zeng, Ao Lu, Lufei Li, Zheng Zhang, Yexin Li, Kan Ren

Comments 25 pages, 5 figures

2604.24921 2026-04-29 cs.RO cs.AI cs.CL cs.CV

Libra-VLA: Achieving Learning Equilibrium via Asynchronous Coarse-to-Fine Dual-System

Yifei Wei, Linqing Zhong, Yi Liu, Yuxiang Lu, Xindong He, Maoqing Yao, Guanghui Ren

Comments Accepted to the Main Conference of ACL 2026. Project page: https://libra-vla.github.io/

2604.24913 2026-04-29 cs.LG q-bio.PE

Generative diffusion models for spatiotemporal influenza forecasting

Joseph Lemaitre, Justin Lessler

2604.24911 2026-04-29 cs.LG cs.AI

Learning with Embedded Linear Equality Constraints via Variational Bayesian Inference

Matthew Marsh, Benoît Chachuat, Antonio del Rio Chanona

Comments Part of the OPTIMAL: Optimisation and Post-Bayesian Inference in Machine Learning Workshop at AISTATS 2026

2604.24906 2026-04-29 cs.RO cs.LG cs.SY eess.SY

An analysis of sensor selection for fruit picking with suction-based grippers

Eva Krueger, Marcus Rosette, Joseph R. Davidson

Comments IROS Conference Format, 6 pages, 6 figures, 1 table

2604.24894 2026-04-29 cs.RO cs.CV cs.LG cs.SY eess.SY math.OC

VISION-SLS: Safe Perception-Based Control from Learned Visual Representations via System Level Synthesis

Antoine P. Leeman, Shuyu Zhan, Melanie N. Zeilinger, Glen Chou

Comments Extended version; conference version to appear in Robotics: Science and Systems XXII (RSS 2026)

2604.24893 2026-04-29 cs.CV

Interactive Episodic Memory with User Feedback

Nikesh Subedi, Loris Bazzani, Ziad Al-Halah

Comments Accepted to CVPR 2026. Project Page: https://nsubedi11.github.io/refocus

2604.24885 2026-04-29 cs.CV cs.LG

VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations

Maitreya Patel, Jingtao Li, Weiming Zhuang, Yezhou Yang, Lingjuan Lv

Comments Accepted at CVPR'26 | Project Page: https://github.com/SonyResearch/VibeToken

2604.24881 2026-04-29 cs.AI

Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate

John Seon Keun Yi, Aaron Mueller, Dokyun Lee

Comments ACL 2026 Main

2604.24878 2026-04-29 cs.LG cs.AI stat.ML

Transformer Approximations from ReLUs

Jerry Yao-Chieh Hu, Mingcheng Lu, Yi-Chen Lee, Han Liu

2604.24877 2026-04-29 cs.CV cs.AI cs.LG eess.IV

Learning Illumination Control in Diffusion Models

Nishit Anand, Manan Suri, Christopher Metzler, Dinesh Manocha, Ramani Duraiswami

Comments Accepted to ICLR 2026 ReALM-GEN Workshop on Diffusion Models. Project Website: https://nishitanand.github.io/relighting-diffusion-website

2604.24876 2026-04-29 cs.CV

ESICA: A Scalable Framework for Text-Guided 3D Medical Image Segmentation

Yu Xin, Gorkem Can Ates, Jun Ma, Sumin Kim, Ying Zhang, Kaleb E Smith, Kuang Gong, Wei Shao

2604.24842 2026-04-29 cs.AI cs.MA cs.MM

Co-Director: Agentic Generative Video Storytelling

Yale Song, Yiwen Song, Nick Losier, Nathan Hodson, Ye Jin, Rhyard Zhu, Yan Xu, Daniel Vlasic, Carina Claassen, Jasmine Leon, Khanh G. LeViet, Zack Chomyn, Joe Timmons, Brett Slatkin, Scott Penberthy, Tomas Pfister

Comments Project Page: https://co-director-agent.github.io/

2604.24833 2026-04-29 cs.RO cs.AI cs.GR cs.LG

MotionBricks: Scalable Real-Time Motions with Modular Latent Generative Model and Smart Primitives

Tingwu Wang, Olivier Dionne, Michael De Ruyter, David Minor, Davis Rempe, Kaifeng Zhao, Mathis Petrovich, Ye Yuan, Chenran Li, Zhengyi Luo, Brian Robison, Xavier Blackwell, Bernardo Antoniazzi, Xue Bin Peng, Yuke Zhu, Simon Yuen

Comments ACM Transactions on Graphics; SIGGRAPH 2026. Project page: https://nvlabs.github.io/motionbricks/

Abstract

Despite transformative advances in generative motion synthesis, real-time interactive motion control remains dominated by traditional techniques. In this work, we identify two key challenges in bridging research and production: 1) Real-time scalability: Industry applications demand real-time generation of a vast repertoire of motion skills, while generative methods exhibit significant degradation in quality and scalability under real-time computation constraints, and 2) Integration: Industry applications demand fine-grained multi-modal control involving velocity commands, style selection, and precise keyframes, a need largely unmet by existing text- or tag-driven models. To overcome these limitations, we introduce MotionBricks: a large-scale, real-time generative framework with a two-fold solution. First, we propose a large-scale modular latent generative backbone tailored for robust real-time motion generation, effectively modeling a dataset of over 350,000 motion clips with a single model. Second, we introduce smart primitives that provide a unified, robust, and intuitive interface for authoring both navigation and object interaction. Applications can be designed in a plug-and-play manner like assembling bricks without expert animation knowledge. Quantitatively, we show that MotionBricks produces state-of-the-art motion quality on open-source and proprietary datasets of various scales, while also achieving a real-time throughput of 15,000 FPS with 2ms latency. We demonstrate the flexibility and robustness of MotionBricks in a complete production-level animation demo, covering navigation and object-scene interaction across various styles with a unified model. To showcase our framework's application beyond animation, we deploy MotionBricks on the Unitree G1 humanoid robot to demonstrate its flexibility and generalization for real-time robotic control.

2604.24832 2026-04-29 cs.LG cs.AI

On the Trainability of Masked Diffusion Language Models via Blockwise Locality

Yuxiang Wang, Yu Xiang, Baojian Zhou, Qifang Zhao, Keyue Jiang, Yanghua Xiao, Xiaoxiao Xu

2604.24827 2026-04-29 cs.LG cs.AI

Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity

Bojie Li

Abstract

Closed-source frontier labs do not disclose parameter counts, and the standard alternative -- inference economics -- carries $2\times$+ uncertainty from hardware, batching, and serving-stack assumptions external to the model. We exploit a tighter intrinsic bound: storing $F$ facts requires at least $F/$(bits per parameter) weights, so measuring how much a model knows lower-bounds how many parameters it has. We introduce Incompressible Knowledge Probes (IKPs), a benchmark of 1,400 factual questions spanning 7 tiers of obscurity, designed to isolate knowledge that cannot be derived by reasoning or compressed by architectural improvements. We calibrate a log-linear mapping from IKP accuracy to parameter count on 89 open-weight models (135M--1,600B) spanning 19 vendors, achieving $R^2 = 0.917$; leave-one-out cross-validation confirms generalization (median fold error $1.59\times$, $68.5\%$ within $2\times$ and $87.6\%$ within $3\times$). For Mixture-of-Experts models, total parameters predict knowledge ($R^2 = 0.79$) far better than active parameters ($R^2 = 0.51$). We evaluate 188 models from 27 vendors and estimate effective knowledge capacity for all major proprietary frontier models; for heavily safety-tuned models the estimates are lower bounds, since refusal policy can hide tens of percentage points of "refused but known" capacity. The widely-reported saturation of reasoning benchmarks does not imply the end of scaling. Procedural capability compresses under the "Densing Law," but across 96 dated open-weight models the IKP time coefficient is $-0.0010$/month (95% CI $[-0.0031, +0.0008]$) -- indistinguishable from zero, and rejecting the Densing prediction of $+0.0117$/month at $p < 10^{-15}$. Factual capacity continues to scale log-linearly with parameters across generations and across vendors.
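The arithmetic in this abstract (the capacity lower bound, the log-linear calibration, and the "fold error" metric) can be illustrated with a small sketch. This is not the paper's code: the function names, the 2 bits-per-parameter figure, and the three calibration points are made-up placeholders chosen so the numbers are easy to follow, and the fit is plain least squares on `log10(params)` against accuracy.

```python
def min_parameters(fact_bits, bits_per_parameter):
    """Lower bound on weights: bits of stored facts / bits one weight can hold."""
    if bits_per_parameter <= 0:
        raise ValueError("bits_per_parameter must be positive")
    return fact_bits / bits_per_parameter


def fold_error(predicted, actual):
    """Multiplicative error, always >= 1 (2.0 means off by a factor of two)."""
    return max(predicted / actual, actual / predicted)


def fit_log_linear(accuracies, log10_params):
    """Least-squares fit of log10(params) = slope * accuracy + intercept."""
    n = len(accuracies)
    mean_x = sum(accuracies) / n
    mean_y = sum(log10_params) / n
    sxx = sum((x - mean_x) ** 2 for x in accuracies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(accuracies, log10_params))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical capacity: 2e9 bits of facts at an assumed 2 bits per weight.
bound = min_parameters(2e9, 2.0)

# Hypothetical calibration set: accuracy paired with log10(parameter count),
# i.e. toy models of 1e9, 1e10 and 1e11 parameters. Not values from the paper.
slope, intercept = fit_log_linear([0.1, 0.3, 0.5], [9.0, 10.0, 11.0])

# Estimate a parameter count for a model scoring 0.4 on the toy scale.
estimate = 10 ** (slope * 0.4 + intercept)
```

With these toy inputs the bound is one billion weights, and an estimate is judged by `fold_error(estimate, true_count)`, which is how a statement such as "within $2\times$" would be checked.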

2604.24818 2026-04-29 cs.LG

Heterogeneous Variational Inference for Markov Degradation Hazard Models: Discretized Mixture with Interpretable Clusters

Takato Yasuno

Comments 19 pages, 6 figures, 7 tables

Abstract

Bayesian finite mixture models can identify discrete risk clusters (low-risk vs. high-risk equipment), but face three critical bottlenecks: (1) insufficient degradation signals from coarse state discretization, (2) unstable cluster identification when data inherently supports fewer clusters than explored, and (3) computational infeasibility of Markov Chain Monte Carlo (MCMC) methods for production deployment (7+ hours per model). We propose a practical framework combining (1) 8-state global percentile discretization that amplifies degradation events, (2) 30-dimensional feature engineering integrating statistical trends (22 features), continuous health indicators, and text embeddings (PCA-compressed to 3 dimensions), (3) interpretable model selection rules enforcing minimum cluster share and separation alongside WAIC, and (4) Automatic Differentiation Variational Inference (ADVI) with full-rank covariance for stable, fast estimation. Applied to 280 units of industrial pump equipment with 104,703 inspection records, we demonstrate: (1) Random effect models (baseline) show that ADVI and NUTS produce nearly identical estimates with a 15$\times$ speedup, validating ADVI accuracy. (2) Finite mixture models identify the optimal number of clusters under interpretability constraints. (3) NUTS exhibits severe convergence issues and label switching, while ADVI provides stable results in 84$\times$ less time. Our contributions are: (1) the first demonstration that fine-grained state discretization (8-state) is essential for mixture model stability in survival analysis; (2) a comprehensive feature engineering strategy combining statistical, continuous, and semantic signals; (3) practical interpretability rules preventing overfitting in automated model selection; and (4) empirical evidence that ADVI outperforms NUTS for finite mixture models in terms of convergence, stability, and computational efficiency.

2604.24811 2026-04-29 cs.LG cs.AI

Time-varying Interaction Graph ODE for Dynamic Graph Representation Learning

Xiaoyi Wang, Zhiqiang Wang, Jianqing Liang, Xingwang Zhao, Chuangyin Dang, Zhen Jin, Jiye Liang

2604.24809 2026-04-29 cs.LG cs.AI

Nautile-370M: Spectral Memory Meets Attention in a Small Reasoning Model

Maixent Chenebaux

2604.24804 2026-04-29 cs.LG cs.CL

Intrinsic Mutual Information as a Modulator for Preference Optimization

Peng Liao, Peijia Zheng, Lingbo Li, Shangsong Liang, Lin Chen

Comments ACL Findings 2026