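arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.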

2603.16103 2026-03-18 cs.CV cs.GR

NanoGS: Training-Free Gaussian Splat Simplification

Butian Xiong, Rong Liu, Tiantian Zhou, Meida Chen, Zhiwen Fan, Andrew Feng

Abstract

3D Gaussian Splat (3DGS) enables high-fidelity, real-time novel view synthesis by representing scenes with large sets of anisotropic primitives, but often requires millions of Splats, incurring significant storage and transmission costs. Most existing compression methods rely on GPU-intensive post-training optimization with calibrated images, limiting practical deployment. We introduce NanoGS, a training-free and lightweight framework for Gaussian Splat simplification. Instead of relying on image-based rendering supervision, NanoGS formulates simplification as local pairwise merging over a sparse spatial graph. The method approximates a pair of Gaussians with a single primitive using mass preserved moment matching and evaluates merge quality through a principled merge cost between the original mixture and its approximation. By restricting merge candidates to local neighborhoods and selecting compatible pairs efficiently, NanoGS produces compact Gaussian representations while preserving scene structure and appearance. NanoGS operates directly on existing Gaussian Splat models, runs efficiently on CPU, and preserves the standard 3DGS parameterization, enabling seamless integration with existing rendering pipelines. Experiments demonstrate that NanoGS substantially reduces primitive count while maintaining high rendering fidelity, providing an efficient and practical solution for Gaussian Splat simplification. Our project website is available at https://saliteta.github.io/NanoGS/.
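
The pairwise merge described in the abstract can be illustrated with a standard moment-matching step: two weighted Gaussians are replaced by one Gaussian that has the same total mass, mean, and covariance as the two-component mixture. The sketch below is a minimal illustration under stated assumptions, not the NanoGS implementation. The function name `merge_gaussians` and the scalar weights `w1`, `w2` are hypothetical; the abstract does not say how mass is defined, how opacity and color are merged, or how the merge cost is computed, so none of that is shown.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def merge_gaussians(
    w1: float, mu1: Vector, cov1: Matrix,
    w2: float, mu2: Vector, cov2: Matrix,
) -> Tuple[float, Vector, Matrix]:
    """Merge two weighted Gaussians into one by matching the first two moments.

    The weights stand in for "mass" (an assumption; the source does not define it).
    Returns (total mass, merged mean, merged covariance).
    """
    w = w1 + w2
    if w <= 0:
        raise ValueError("total mass must be positive")
    a1, a2 = w1 / w, w2 / w  # normalized mixture weights
    dim = len(mu1)

    # First moment: mass-weighted average of the two means.
    mu = [a1 * mu1[i] + a2 * mu2[i] for i in range(dim)]

    # Offsets of each component mean from the merged mean.
    d1 = [mu1[i] - mu[i] for i in range(dim)]
    d2 = [mu2[i] - mu[i] for i in range(dim)]

    # Second moment: within-component covariance plus the spread between means.
    cov = [
        [
            a1 * (cov1[i][j] + d1[i] * d1[j]) + a2 * (cov2[i][j] + d2[i] * d2[j])
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return w, mu, cov


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    mass, mean, cov = merge_gaussians(
        1.0, [0.0, 0.0, 0.0], identity,
        1.0, [2.0, 0.0, 0.0], identity,
    )
    print(mass, mean, cov[0][0], cov[1][1])
```

In this example the merged covariance grows along the axis that separates the two means, which is why merging distant Gaussians yields a broad primitive. A merge cost that compares the original mixture with its approximation, restricted to local neighbors, is one way to avoid such merges.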

2603.16099 2026-03-18 cs.CV

OneWorld: Taming Scene Generation with 3D Unified Representation Autoencoder

Sensen Gao, Zhaoqing Wang, Qihang Cao, Dongdong Yu, Changhu Wang, Tongliang Liu, Mingming Gong, Jiawang Bian

Comments Code: https://github.com/SensenGao/OneWorld

2603.16093 2026-03-18 cs.SD cs.AI cs.CV cs.MM

Diffusion Models for Joint Audio-Video Generation

Alejandro Paredes La Torre

2603.16092 2026-03-18 cs.CV cs.AI cs.LG

Parallel In-context Learning for Large Vision Language Models

Shin'ya Yamaguchi, Daiki Chijiwa, Tamao Sakao, Taku Hasegawa

Comments Accepted to CVPR 2026 (Findings); Code is available at https://github.com/yshinya6/parallel-icl

2603.16086 2026-03-18 cs.RO cs.AI cs.CV cs.SD

Towards the Vision-Sound-Language-Action Paradigm: The HEAR Framework for Sound-Centric Manipulation

Chang Nie, Tianchen Deng, Guangming Wang, Zhe Liu, Hesheng Wang

Abstract

While recent Vision-Language-Action (VLA) models have begun to incorporate audio, they typically treat sound as static pre-execution prompts or focus exclusively on human speech. This leaves a significant gap in real-time, sound-centric manipulation where fleeting environmental acoustics provide critical state verification during task execution. Consequently, key sounds are easily missed due to low-frequency updates or system latency. This problem is exacerbated by action chunking with open-loop execution, which creates a Blind Execution Interval where acoustic events are lost between discrete audio observation windows. Recognizing the necessity of continuous auditory awareness, we formalize Vision-Sound-Language-Action (VSLA) as a continuous control paradigm conditioned on vision, streaming audio, language, and proprioception under delayed decision loops. As an instantiation, we introduce HEAR, a VSLA framework integrating four components: (i) a streaming Historizer to maintain a compact, causal audio context across execution gaps; (ii) an Envisioner adapted from omni foundation models to reason over multi-sensory inputs; (iii) an Advancer, formulated as an audio world model, to learn temporal dynamics by predicting near-future audio codes; and (iv) a flow-matching Realizer policy to generate smooth action chunks. To address the scarcity of pretraining data and evaluations for VSLA, we construct OpenX-Sound for pretraining, alongside HEAR-Bench, the first sound-centric manipulation benchmark with strict causal timing rules. Our results suggest that robust sound-centric manipulation necessitates causal persistence and explicit temporal learning. This framework provides a practical step toward multi-sensory foundation models for embodied agents, enabling robots to perceive and interact with dynamic environments. Code and videos are available at https://hear.irmv.top.

2603.16085 2026-03-18 cs.CV cs.AI

Interact3D: Compositional 3D Generation of Interactive Objects

Hui Shan, Keyang Luo, Ming Li, Sizhe Zheng, Yanwei Fu, Zhen Chen, Xiangru Huang

2603.16083 2026-03-18 cs.CV

Structured prototype regularization for synthetic-to-real driving scene parsing

Jiahe Fan, Xiao Ma, Sergey Vityazev, George Giakos, Shaolong Shu, Rui Fan

2603.16080 2026-03-18 cs.LG

A Depth-Aware Comparative Study of Euclidean and Hyperbolic Graph Neural Networks on Bitcoin Transaction Systems

Ankit Ghimire, Saydul Akbar Murad, Nick Rahimi

2603.16078 2026-03-18 cs.CV cs.GR

Volumetrically Consistent Implicit Atlas Learning via Neural Diffeomorphic Flow for Placenta MRI

Athena Taymourtash, S. Mazdak Abulnaga, Esra Abaci Turk, P. Ellen Grant, Polina Golland

2603.16070 2026-03-18 cs.CL cs.AI

SEAHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Southeast Asia

Ri Chi Ng, Aditi Kumaresan, Yujia Hu, Roy Ka-Wei Lee

Comments TALLIP Accepted

2603.16067 2026-03-18 cs.CV cs.LG

Attribution Upsampling should Redistribute, Not Interpolate

Vincenzo Buono, Peyman Sheikholharam Mashhadi, Mahmoud Rahat, Prayag Tiwari, Stefan Byttner

2603.16066 2026-03-18 cs.LG

Adaptive regularization parameter selection for high-dimensional inverse problems: A Bayesian approach with Tucker low-rank constraints

Qing-Mei Yang, Da-Qing Zhang

2603.16063 2026-03-18 cs.CV

ViT-AdaLA: Adapting Vision Transformers with Linear Attention

Yifan Li, Seunghyun Yoon, Viet Dac Lai, Franck Dernoncourt, Jason Kuen, Yu Kong, Trung Bui

2603.16052 2026-03-18 cs.AI

A Context Alignment Pre-processor for Enhancing the Coherence of Human-LLM Dialog

Ding Wei

Abstract

Large language models (LLMs) have made remarkable progress in generating fluent text, but they still face a critical challenge of contextual misalignment in long-term and dynamic dialogue. When human users omit premises, simplify references, or shift context abruptly during interactions with LLMs, the models may fail to capture their actual intentions, producing mechanical or off-topic responses that weaken the collaborative potential of dialogue. To address this problem, this paper proposes a computational framework called the Context Alignment Pre-processor (C.A.P.). Rather than operating during generation, C.A.P. functions as a pre-processing module between user input and response generation. The framework includes three core processes: (1) semantic expansion, which extends a user instruction to a broader semantic span including its premises, literal meaning, and implications; (2) time-weighted context retrieval, which prioritizes recent dialogue history through a temporal decay function approximating human conversational focus; and (3) alignment verification and decision branching, which evaluates whether the dialogue remains on track by measuring the semantic similarity between the current prompt and the weighted historical context. When a significant deviation is detected, C.A.P. initiates a structured clarification protocol to help users and the system recalibrate the conversation. This study presents the architecture and theoretical basis of C.A.P., drawing on cognitive science and Common Ground theory in human-computer interaction. We argue that C.A.P. is not only a technical refinement but also a step toward shifting human-computer dialogue from one-way command-execution patterns to two-way, self-correcting, partnership-based collaboration. Finally, we discuss implementation paths, evaluation methods, and implications for the future design of interactive intelligent systems.
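
Processes (2) and (3) in the abstract can be sketched as a decay-weighted average of past turn embeddings followed by a similarity check against the current prompt. This is a rough illustration under stated assumptions, not the C.A.P. design. The abstract mentions a temporal decay function and a semantic similarity measure without giving their form, so the exponential half-life decay, the cosine similarity, the `threshold` value, and every function name here are hypothetical. Embeddings are assumed to be precomputed vectors.

```python
import math
from typing import List, Sequence


def decay_weights(num_turns: int, half_life: float = 3.0) -> List[float]:
    """Normalized weights, oldest turn first; the newest turn has age 0.

    Exponential half-life decay is an assumption, not taken from the source.
    """
    raw = [0.5 ** ((num_turns - 1 - i) / half_life) for i in range(num_turns)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_context(history: Sequence[Sequence[float]], half_life: float = 3.0) -> List[float]:
    """Decay-weighted average of the turn embeddings in `history`."""
    weights = decay_weights(len(history), half_life)
    dim = len(history[0])
    return [sum(w * turn[k] for w, turn in zip(weights, history)) for k in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def needs_clarification(
    prompt_vec: Sequence[float],
    history: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff for "significant deviation"
    half_life: float = 3.0,
) -> bool:
    """Return True when the prompt drifts away from the weighted recent context."""
    if not history:
        return False  # nothing to compare against yet
    context = weighted_context(history, half_life)
    return cosine_similarity(prompt_vec, context) < threshold


if __name__ == "__main__":
    past_turns = [[1.0, 0.0], [1.0, 0.0], [0.9, 0.1]]
    print(needs_clarification([1.0, 0.0], past_turns))  # on-topic prompt
    print(needs_clarification([0.0, 1.0], past_turns))  # abrupt topic shift
```

A shorter `half_life` makes the check track only the last turn or two, while a longer one treats the whole dialogue as context. A real system would need to tune that trade-off together with the threshold.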

2603.16050 2026-03-18 cs.RO cs.CV eess.IV

The Era of End-to-End Autonomy: Transitioning from Rule-Based Driving to Large Driving Models

Eduardo Nebot, Julie Stephany Berrio Perez

2603.16045 2026-03-18 cs.AI

POaaS: Minimal-Edit Prompt Optimization as a Service to Lift Accuracy and Cut Hallucinations on On-Device sLLMs

Jungwoo Shim, Dae Won Kim, Sun Wook Kim, Soo Young Kim, Myungcheol Lee, Jae-geun Cha, Hyunhwa Choi

Comments Accepted at FEVER 2026. 9 pages, 2 figures, 5 tables

2603.16044 2026-03-18 cs.AI

Enhancing Linguistic Generalization of VLA: Fine-Tuning OpenVLA via Synthetic Instruction Augmentation

Dongik Shin

2603.16043 2026-03-18 cs.LG cs.AI cs.CV

Collaborative Temporal Feature Generation via Critic-Free Reinforcement Learning for Cross-User Sensor-Based Activity Recognition

Xiaozhou Ye, Feng Jiang, Zihan Wang, Xiulai Wang, Yutao Zhang, Kevin I-Kai Wang

2603.16040 2026-03-18 cs.RO

Compact Optical Single-axis Joint Torque Sensor Using Redundant Photo-Reflectors and Quadratic-Programming Calibration

Hyun-Bin Kim, Byeong-Il Ham, Kyung-Soo Kim

Comments 10 pages

2603.16028 2026-03-18 cs.RO

Geometry-Aligned LLM Fine-Tuning for Sequential Narrow-Opening Planning

Al Jaber Mahmud, Xuan Wang

Comments 8 pages, 3 figures

2603.16017 2026-03-18 cs.CL cs.AI

Understanding Moral Reasoning Trajectories in Large Language Models: Toward Probing-Based Explainability

Fan Huang, Haewoon Kwak, Jisun An

2603.16016 2026-03-18 cs.CV cs.AI cs.RO eess.IV

FlatLands: Generative Floormap Completion From a Single Egocentric View

Subhransu S. Bhattacharjee, Dylan Campbell, Rahul Shome

Comments Under review

2603.16015 2026-03-18 cs.LG cs.DS

The Importance of Being Smoothly Calibrated

Parikshit Gopalan, Konstantinos Stavropoulos, Kunal Talwar, Pranay Tankala

2603.16002 2026-03-18 cs.CL cs.AI cs.LG

RadAnnotate: Large Language Models for Efficient and Reliable Radiology Report Annotation

Saisha Pradeep Shetty, Roger Eric Goldman, Vladimir Filkov

Comments 10 pages, 3 figures. Accepted at AMIA Amplify Informatics Summit 2026

2603.16001 2026-03-18 cs.CV cs.CL cs.LG

Mostly Text, Smart Visuals: Asymmetric Text-Visual Pruning for Large Vision-Language Models

Sijie Li, Biao Qian, Jungong Han

Comments CVPR 2026. Code available here: https://github.com/LezJ/ATV-Pruning

2603.15994 2026-03-18 cs.AI

Selective Memory for Artificial Intelligence: Write-Time Gating with Hierarchical Archiving

Oliver Zahn, Simran Chana

Comments 20 pages, 8 figures

2603.15990 2026-03-18 cs.LG

W2T: LoRA Weights Already Know What They Can Do

Xiaolong Han, Ferrante Neri, Zijian Jiang, Fang Wu, Yanfang Ye, Lu Yin, Zehong Wang

2603.15987 2026-03-18 cs.LG

Determinism in the Undetermined: Deterministic Output in Charge-Conserving Continuous-Time Neuromorphic Systems with Temporal Stochasticity

Jing Yan, Kang You, Zhezhi He, Yaoyu Zhang

2603.15981 2026-03-18 cs.CL cs.AI

Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning

Jingxiang Chen, Minseok Kim, Seong-Gyun Leem, Yin Huang, Rashi Rungta, Zhicheng Ouyang, Haibin Wu, Surya Teja Appini, Ankur Bansal, Yang Bai, Yue Liu, Florian Metze, Ahmed A Aly, Anuj Kumar, Ariya Rastrow, Zhaojiang Lin

2603.15976 2026-03-18 cs.AI

An Agentic Evaluation Framework for AI-Generated Scientific Code in PETSc

Hong Zhang, Barry Smith, Satish Balay, Le Chen, Murat Keceli, Lois Curfman McInnes, Junchao Zhang