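arXivDaily: a daily digest of new arXiv papers that mirrors the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.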

2511.22533 2026-03-26 cs.CV

Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration

Mengyu Yang, Yanming Yang, Chenyi Xu, Chenxi Song, Yufan Zuo, Tong Zhao, Ruibo Li, Chi Zhang

Comments Accepted by CVPR 2026; Project page: https://fast3dcache-agi.github.io

Abstract

Diffusion models have achieved impressive generative quality across modalities like 2D images, videos, and 3D shapes, but their inference remains computationally expensive due to the iterative denoising process. While recent caching-based methods effectively reuse redundant computations to speed up 2D and video generation, directly applying these techniques to 3D diffusion models can severely disrupt geometric consistency. In 3D synthesis, even minor numerical errors in cached latent features accumulate, causing structural artifacts and topological inconsistencies. To overcome this limitation, we propose Fast3Dcache, a training-free geometry-aware caching framework that accelerates 3D diffusion inference while preserving geometric fidelity. Our method introduces a Predictive Caching Scheduler Constraint (PCSC) to dynamically determine cache quotas according to voxel stabilization patterns and a Spatiotemporal Stability Criterion (SSC) to select stable features for reuse based on velocity magnitude and an acceleration criterion. Comprehensive experiments show that Fast3Dcache accelerates inference significantly, achieving up to a 27.12% speed-up and a 54.83% reduction in FLOPs, with minimal degradation in geometric quality as measured by Chamfer Distance (2.48%) and F-Score (1.95%).
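
The SSC idea of reusing only features that are both slow-moving and steady can be illustrated with finite differences across denoising steps. The sketch below is a hypothetical illustration, not the authors' implementation: it assumes velocity is the first difference and acceleration the second difference of a token's feature vector over three consecutive steps, uses hand-picked thresholds `vel_thresh` and `acc_thresh`, and takes the cache quota (which PCSC would set) as a plain input. The function name `select_stable_tokens` is invented for this example.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def select_stable_tokens(
    prev2: Sequence[Vector],
    prev1: Sequence[Vector],
    curr: Sequence[Vector],
    quota: int,
    vel_thresh: float,
    acc_thresh: float,
) -> List[int]:
    """Hypothetical SSC-style selector: indices of tokens whose features look stable.

    prev2, prev1, curr hold one feature vector per token at denoising steps
    t-2, t-1 and t. `quota` caps how many tokens are reused this step
    (Fast3Dcache sets it with PCSC; here it is simply passed in).
    """
    candidates = []
    for idx, (a, b, c) in enumerate(zip(prev2, prev1, curr)):
        # Velocity: first difference between the two most recent steps.
        vel = [cj - bj for bj, cj in zip(b, c)]
        # Acceleration: second difference, i.e. the change in velocity.
        acc = [(cj - bj) - (bj - aj) for aj, bj, cj in zip(a, b, c)]
        v_mag = math.sqrt(sum(x * x for x in vel))
        a_mag = math.sqrt(sum(x * x for x in acc))
        # Assumed rule: reuse only tokens that are both slow and steady.
        if v_mag < vel_thresh and a_mag < acc_thresh:
            candidates.append((v_mag, idx))
    # Fill the quota with the slowest-moving candidates first.
    candidates.sort()
    return sorted(idx for _, idx in candidates[:quota])


# Usage: three tokens with 2-D features over three steps.
prev2 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
prev1 = [[0.0, 0.0], [1.0, 0.0], [0.1, 0.0]]
curr = [[0.0, 0.0], [2.0, 0.0], [0.2, 0.0]]
print(select_stable_tokens(prev2, prev1, curr, quota=2, vel_thresh=0.5, acc_thresh=0.5))  # [0, 2]
```

Ranking candidates by velocity means that, when more tokens pass both thresholds than the quota allows, the slowest-moving ones are reused first; that tie-breaking rule is also an assumption of this sketch.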

2511.21542 2026-03-26 cs.RO cs.AI cs.CV cs.LG

E0: Enhancing Generalization and Fine-Grained Control in VLA Models via Tweedie Discrete Diffusion

Zhihao Zhan, Jiaying Zhou, Likui Zhang, Qinhan Lv, Hao Liu, Jusheng Zhang, Weizheng Li, Ziliang Chen, Tianshui Chen, Ruifeng Zhai, Keze Wang, Liang Lin, Guangrun Wang

2511.20001 2026-03-26 cs.CL cs.SI

A Machine Learning Approach for Detection of Mental Health Conditions and Cyberbullying from Social Media

Edward Ajayi, Martha Kachweka, Mawuli Deku, Emily Aiken

Comments Best Paper Award at the AAAI-26 Bridge Program on AI for Medicine and Healthcare. Published in Proceedings of the Second AAAI Bridge Program on AI for Medicine and Healthcare, PMLR 317:15-26, 2026. Paper URL: https://proceedings.mlr.press/v317/ajayi26a.html

2511.18281 2026-03-26 cs.CV cs.AI

Uni-DAD: Unified Distillation and Adaptation of Diffusion Models for Few-step Few-shot Image Generation

Yara Bahram, Mélodie Desbos, Mohammadhadi Shateri, Eric Granger

Comments Accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2511.18178 2026-03-26 cs.LG

Bayesian Calibration of Engine-out NOx Models for Engine-to-Engine Transferability

Shrenik Zinage, Peter Meckl, Ilias Bilionis

Comments Accepted at International Journal of Engine Research

2511.16993 2026-03-26 cs.CV

DepthFocus: Controllable Depth Estimation for See-Through Scenes

Junhong Min, Jimin Kim, Minwook Kim, Cheol-Hui Min, Youngpil Jeon, Minyong Choi

Comments 8 pages, 5 figures, 5 tables

2511.10051 2026-03-26 cs.CL

GraphIF: Enhancing Multi-Turn Instruction Following for Large Language Models with Relation Graph Prompt

Zhenhe Li, Can Lin, Ling Zheng, Wen-Da Wei, Junli Liang, Qi Song

DOI: 10.1609/aaai.v40i38.40457
Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2026)

Abstract

Multi-turn instruction following is essential for building intelligent conversational systems that can consistently adhere to instructions across dialogue turns. However, existing approaches to enhancing multi-turn instruction following primarily rely on collecting or generating large-scale multi-turn dialogue datasets to fine-tune large language models (LLMs), which treat each response generation as an isolated task and fail to explicitly incorporate multi-turn instruction following into the optimization objectives. As a result, instruction-tuned LLMs often struggle with complex long-distance constraints. In multi-turn dialogues, relational constraints across turns can be naturally modeled as labeled directed edges, making graph structures particularly suitable for modeling multi-turn instruction following. Despite this potential, leveraging graph structures to enhance the multi-turn instruction following capabilities of LLMs remains unexplored. To bridge this gap, we propose GraphIF, a plug-and-play framework that models multi-turn dialogues as directed relation graphs and leverages graph prompts to enhance the instruction following capabilities of LLMs. GraphIF comprises three key components: (1) an agent-based relation extraction module that captures inter-turn semantic relations via action-triggered mechanisms to construct structured graphs; (2) a relation graph prompt generation module that converts structured graph information into natural language prompts; and (3) a response rewriting module that refines initial LLM outputs using the generated graph prompts. Extensive experiments on two long multi-turn dialogue datasets demonstrate that GraphIF can be seamlessly integrated into instruction-tuned LLMs and leads to significant improvements across all four multi-turn instruction-following evaluation metrics.
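
To make the "labeled directed edges" idea concrete, the following hypothetical sketch stores inter-turn relations as (source turn, target turn, label) triples and renders those that point at the current turn as a plain-language prompt, loosely mirroring the role of GraphIF's prompt generation module. The `RelationEdge` type, the label strings, and the prompt template are all assumptions for illustration; the paper's agent-based relation extraction and response rewriting modules are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RelationEdge:
    """Hypothetical labeled directed edge between two dialogue turns."""
    src_turn: int  # earlier turn that introduced the constraint
    dst_turn: int  # later turn the constraint still applies to
    label: str     # relation label, e.g. "keep_language" (invented)


def graph_to_prompt(edges: List[RelationEdge], current_turn: int) -> str:
    """Render edges that point at `current_turn` as a plain-language prompt."""
    incoming = sorted(
        (e for e in edges if e.dst_turn == current_turn),
        key=lambda e: e.src_turn,
    )
    if not incoming:
        return "No earlier constraints apply to this turn."
    lines = [f"Constraints that apply to turn {current_turn}:"]
    for e in incoming:
        readable = e.label.replace("_", " ")
        lines.append(f"- From turn {e.src_turn}: {readable}")
    return "\n".join(lines)


edges = [
    RelationEdge(1, 3, "keep_language"),
    RelationEdge(2, 3, "word_limit"),
    RelationEdge(1, 2, "same_format"),
]
print(graph_to_prompt(edges, current_turn=3))
```

Only edges whose target is the current turn are rendered, so a long-distance constraint set in turn 1 still reaches turn 3 without replaying the whole dialogue.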

2511.08126 2026-03-26 cs.CL

Quantification and object perception in Multimodal Large Language Models and human linguistic cognition

Raquel Montero, Natalia Moskvina, Paolo Morosi, Tamara Serrano, Elena Pagliarini, Evelina Leivada

2510.27321 2026-03-26 cs.LG

MedM2T: A MultiModal Framework for Time-Aware Modeling with Electronic Health Record and Electrocardiogram Data

Yu-Chen Kuo, Yi-Ju Tseng

Comments This preprint version of the manuscript has been submitted to the IEEE Journal of Biomedical and Health Informatics (JBHI) for review. The implementation of MedM2T is available at https://github.com/DHLab-TSENG/MedM2T

2510.22201 2026-03-26 cs.RO

ACG: Action Coherence Guidance for Flow-based Vision-Language-Action models

Minho Park, Kinam Kim, Junha Hyung, Hyojin Jang, Hoiyeong Jin, Jooyeol Yun, Hojoon Lee, Jaegul Choo

Comments Accepted to ICRA 2026

2510.18019 2026-03-26 cs.CL cs.AI

Is Multilingual LLM Watermarking Truly Multilingual? Scaling Robustness to 100+ Languages via Back-Translation

Asim Mohamed, Martin Gubri

2510.17662 2026-03-26 cs.SD cs.CL

DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Trained Speech Foundational Model

Massa Baali, Rita Singh, Bhiksha Raj

2510.15495 2026-03-26 cs.LG cs.AI

OffSim: Offline Simulator for Model-based Offline Inverse Reinforcement Learning

Woo-Jin Ahn, Sang-Ryul Baek, Yong-Jun Lee, Hyun-Duck Choi, Myo-Taeg Lim

Comments Due to an authorship dispute among the co-authors, we request to withdraw this submission. The issue is currently unresolved, and we believe withdrawal is appropriate until the matter is settled.

2510.15259 2026-03-26 cs.AI

SAG-Agent: Enabling Long-Horizon Reasoning in Strategy Games via Dynamic Knowledge Graphs

Chenwei Tang, Lin Long, Xinyu Liu, Jingyu Xing, Zizhou Wang, Joey Tianyi Zhou, Jiawei Du, Liangli Zhen, Jiancheng Lv

2510.14814 2026-03-26 cs.LG

Tackling Time-Series Forecasting Generalization via Mitigating Concept Drift

Zhiyuan Zhao, Haoxin Liu, B. Aditya Prakash

Comments 17 pages, 6 figures, 4 tables

2510.12684 2026-03-26 cs.RO cs.SY eess.SY

Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning

Alvaro Belmonte-Baeza, Miguel Cazorla, Gabriel J. García, Carlos J. Pérez-Del-Pulgar, Jorge Pomares

Comments This is the authors' version of the paper accepted for publication at the IEEE International Conference on Space Robotics 2025. The final version link will be added here after the conference proceedings are published.

DOI: 10.1109/iSpaRo66239.2025.11437097
Journal ref: 2025 International Conference on Space Robotics (iSpaRo)

Abstract

Robotics plays a pivotal role in planetary science and exploration, where autonomous and reliable systems are crucial due to the risks and challenges inherent to space environments. The establishment of permanent lunar bases demands robotic platforms capable of navigating and manipulating in the harsh lunar terrain. While wheeled rovers have been the mainstay for planetary exploration, their limitations in unstructured and steep terrains motivate the adoption of legged robots, which offer superior mobility and adaptability. This paper introduces a constrained reinforcement learning framework designed for autonomous quadrupedal mobile manipulators operating in lunar environments. The proposed framework integrates whole-body locomotion and manipulation capabilities while explicitly addressing critical safety constraints, including collision avoidance, dynamic stability, and power efficiency, in order to ensure robust performance under lunar-specific conditions, such as reduced gravity and irregular terrain. Experimental results demonstrate the framework's effectiveness at precise 6D task-space end-effector pose tracking, with an average positional accuracy of 4 cm and orientation accuracy of 8.1 degrees. The system consistently respects both soft and hard constraints, exhibiting adaptive behaviors optimized for lunar gravity conditions. This work effectively bridges adaptive learning with essential mission-critical safety requirements, paving the way for advanced autonomous robotic explorers for future lunar missions.
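
The abstract does not say how the constraints are enforced. One common approach in constrained RL, shown here only as a generic illustration and not as this paper's method, is Lagrangian relaxation: each constraint cost is subtracted from the reward with a learned multiplier, and the multiplier grows while the average cost exceeds its limit. The function names and the cost and limit values below are hypothetical.

```python
from typing import List


def update_multipliers(
    multipliers: List[float],
    avg_costs: List[float],
    limits: List[float],
    lr: float,
) -> List[float]:
    """Projected dual ascent on Lagrange multipliers (one per constraint)."""
    # A multiplier grows while its constraint is violated (cost > limit),
    # shrinks otherwise, and is clipped at zero.
    return [
        max(0.0, lam + lr * (cost - limit))
        for lam, cost, limit in zip(multipliers, avg_costs, limits)
    ]


def penalized_reward(reward: float, costs: List[float], multipliers: List[float]) -> float:
    """Reward the policy is trained on: task reward minus weighted constraint costs."""
    return reward - sum(lam * c for lam, c in zip(multipliers, costs))


# Hypothetical constraints, e.g. [collision cost, power cost].
lams = update_multipliers([0.0, 1.0], avg_costs=[0.5, 0.0], limits=[0.25, 0.5], lr=2.0)
print(lams)  # [0.5, 0.0]
print(penalized_reward(1.0, costs=[0.5, 0.25], multipliers=[1.0, 2.0]))  # 0.0
```

Clipping the multipliers at zero keeps a satisfied constraint from turning into a bonus. Hard constraints are often handled separately, for example by terminating an episode, which this sketch does not cover.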

2510.08720 2026-03-26 cs.CL

How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective

Xianzhen Luo, Jinyang Huang, Wenzhen Zheng, Qingfu Zhu, Mingzheng Xu, Yiheng Xu, Yuantao Fan, Wanxiang Che

Comments Accepted by ICLR 2026

2510.06020 2026-03-26 cs.LG

RamPINN: Recovering Raman Spectra From Coherent Anti-Stokes Spectra Using Embedded Physics

Sai Karthikeya Vemuri, Adithya Ashok Chalain Valapil, Tim Büchner, Joachim Denzler

Comments Accepted at AISTATS 2026

2510.02392 2026-03-26 cs.CL

KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning

Yinyi Luo, Zhexian Zhou, Hao Chen, Kai Qiu, Marios Savvides, Sharon Li, Jindong Wang

Comments ICLR 2026

2510.02315 2026-03-26 cs.CV

FOCUS: Optimal Control for Multi-Entity World Modeling in Text-to-Image Generation

Eric Tillmann Bill, Enis Simsar, Thomas Hofmann

Comments Project Page: https://ericbill21.github.io/FOCUS/

2510.01603 2026-03-26 cs.RO

MiniBEE: A New Form Factor for Compact Bimanual Dexterity

Sharfin Islam, Zewen Chen, Zhanpeng He, Swapneel Bhatt, Andres Permuy, Brock Taylor, James Vickery, Zhengbin Lu, Cheng Zhang, Pedro Piacenza, Matei Ciocarlie

2509.21545 2026-03-26 cs.LG cs.AI

Evidence for Limited Metacognition in LLMs

Christopher Ackerman

Comments 26 pages, 25 figures

2509.15147 2026-03-26 cs.LG

Who to Trust? Aggregating Client Predictions in Federated Distillation

Viktor Kovalchuk, Denis Son, Arman Bolatov, Mohsen Guizani, Samuel Horváth, Maxim Panov, Martin Takáč, Eduard Gorbunov, Nikita Kotelevskii

2509.13767 2026-03-26 cs.CV

VocSegMRI: Multimodal Learning for Precise Vocal Tract Segmentation in Real-time MRI

Daiqi Liu, Johannes Enk, Maureen Stone, Fangxu Xing, Tomás Arias-Vergara, Jerry L. Prince, Jana Hutter, Jonghye Woo, Andreas Maier, Paula Andrea Pérez-Toro

Comments Preprint submitted to MIDL 2026 (short paper)

2508.17381 2026-03-26 cs.LG

DART: A Server-side Plug-in for Resource-efficient Robust Federated Learning

Omar Bekdache, Naresh Shanbhag

2507.21037 2026-03-26 cs.LG

When Brain Foundation Model Meets Cauchy-Schwarz Divergence: A New Framework for Cross-Subject Motor Imagery Decoding

Jinzhou Wu, Baoping Tang, Qikang Li, Yi Wang, Cheng Li, Shujian Yu

Comments This work has been submitted to Elsevier for possible publication

2506.16370 2026-03-26 cs.CL cs.AI

Can structural correspondences ground real world representational content in Large Language Models?

Iwan Williams

2506.06482 2026-03-26 cs.LG

TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness

Zhiyuan Zhao, Juntong Ni, Shangqing Xu, Haoxin Liu, Wei Jin, B. Aditya Prakash

Comments 48 pages, 1 figure, 30 tables

2505.22785 2026-03-26 cs.LG

Navigating the Latent Space Dynamics of Neural Models

Marco Fumero, Luca Moschella, Emanuele Rodolà, Francesco Locatello

2505.16950 2026-03-26 cs.LG cs.AI cs.IT math.IT

Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning

Adnan Oomerjee, Zafeirios Fountas, Haitham Bou-Ammar, Jun Wang

Abstract

Transformer LLMs have been shown to exhibit strong reasoning ability that scales with inference-time compute, most prominently through token-space "thinking" chains of thought. A growing line of work pushes extra computation into the model's latent space, which we term Auxiliary Latent-Space Computation (ALSC). Existing ALSC methods largely fall into three buckets: (i) token-mediated latent rollouts, (ii) residual/activation steering, and (iii) memory (KV) compression. An underexplored alternative is memory consolidation/reconsolidation, two processes in the brain that are responsible for stabilising newly formed memory traces, and, upon recall, transiently rendering established traces plastic such that they can integrate new contextual information before restabilising. In Transformer LLMs, this can be seen as analogous to performing in-place rewrites of new KV segments, and rewrites of recalled past segments. In this work, we give a theoretical justification as to why memory (re)consolidation via KV cache rewrites is beneficial for improved reasoning. We do this through the lens of Information Bottleneck (IB) theory, which posits that model generalisation emerges from an optimal balance between input information compression and retention of predictive information in latent representations. We then introduce the Bottlenecked Transformer, which augments a backbone LLM with a Cache Processor, an auxiliary Transformer that performs periodic, non-causal, in-place KV rewrites at newline-delimited reasoning step boundaries. The Processor consolidates recently written KV entries and reconsolidates a small, top-k attention-selected set of prior entries. We evaluate our Bottlenecked Transformer architecture on math reasoning benchmarks. Our model sees consistent performance gains over vanilla Transformers and pause-token augmented baselines, with gains of up to +6.6pp for selected tasks/backbones.
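
The reconsolidation step operates on "a small, top-k attention-selected set of prior entries". Below is a minimal sketch of such a selection rule, assuming that a past KV position's score is the total attention weight it receives from the queries of the newly written reasoning step (the paper may aggregate differently). `select_for_reconsolidation` is a hypothetical name, and the rewrite itself, performed by the Cache Processor, is not modeled.

```python
import heapq
from typing import List, Sequence


def select_for_reconsolidation(
    attn: Sequence[Sequence[float]],
    num_past: int,
    k: int,
) -> List[int]:
    """Pick k past KV positions to rewrite, ranked by total received attention.

    attn[q][j] is the (assumed) attention weight from query q of the newly
    written reasoning step to past KV position j, for j < num_past.
    """
    # Aggregate: total attention each past position receives from the new step.
    scores = [0.0] * num_past
    for row in attn:
        for j in range(num_past):
            scores[j] += row[j]
    # Top-k past positions by score, returned in positional order.
    top = heapq.nlargest(k, range(num_past), key=lambda j: scores[j])
    return sorted(top)


attn = [
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
]
print(select_for_reconsolidation(attn, num_past=3, k=2))  # [1, 2]
```

Returning the indices in positional order rather than score order is a convenience choice here; it keeps the selected entries in their original sequence when they are handed to a rewriting module.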
