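arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.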

2512.04021 2026-04-29 cs.CV

C3G: Learning Compact 3D Representations with 2K Gaussians

Honggyu An, Jaewoo Jung, Mungyeom Kim, Chaehyun Kim, Minkyeong Jeon, Jisang Han, Kazumi Fukuda, Takuya Narihira, Hyuna Ko, Junsu Kim, Sunghwan Hong, Yuki Mitsufuji, Seungryong Kim

Comments Project Page: https://cvlab-kaist.github.io/C3G/

English abstract

Reconstructing and understanding 3D scenes from unposed sparse views in a feed-forward manner remains a challenging task in 3D computer vision. Recent approaches use per-pixel 3D Gaussian Splatting for reconstruction, followed by a 2D-to-3D feature lifting stage for scene understanding. However, they generate excessive redundant Gaussians, causing high memory overhead and sub-optimal multi-view feature aggregation, leading to degraded novel view synthesis and scene understanding performance. We propose C3G, a novel feed-forward framework that estimates compact 3D Gaussians only at essential spatial locations, minimizing redundancy while enabling effective feature lifting. We introduce learnable tokens that aggregate multi-view features through self-attention to guide Gaussian generation, ensuring each Gaussian integrates relevant visual features across views. We then exploit the learned attention patterns for Gaussian decoding to efficiently lift features. Extensive experiments on pose-free novel view synthesis, 3D open-vocabulary segmentation, and view-invariant feature aggregation demonstrate our approach's effectiveness. Results show that a compact yet geometrically meaningful representation is sufficient for high-quality scene reconstruction and understanding, achieving superior memory efficiency and feature fidelity compared to existing methods.

2512.00756 2026-04-29 cs.AI

MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents

Ruihan Chen, Qiming Li, Xiaocheng Feng, Weihong Zhong, Xiaoliang Yang, Yuxuan Gu, Zekun Zhou, Yunfei Lu, Haoyu Ren, Kun Chen, Dandan Tu, Bing Qin

Comments 35 pages, 15 figures

2511.22793 2026-04-29 cs.LG

GSpaRC: Gaussian Splatting for Real-time Reconstruction of RF Channels

Bhavya Sai Nukapotula, Rishabh Tripathi, Seth Pregler, Dileep Kalathil, Srinivas Shakkottai, Theodore S. Rappaport

Comments Project website: https://nbhavyasai.github.io/GSpaRC/

2511.21517 2026-04-29 cs.CL cs.AI

Voice, Bias, and Coreference: An Interpretability Study of Gender in Speech Translation

Lina Conti, Dennis Fucci, Marco Gaido, Matteo Negri, Guillaume Wisniewski, Luisa Bentivogli

Comments Accepted to LREC 2026

2511.20496 2026-04-29 cs.RO

Metric, inertially aligned monocular state estimation via kinetodynamic priors

Jiaxin Liu, Min Li, Wanting Xu, Liang Li, Jiaqi Yang, Laurent Kneip

2511.16518 2026-04-29 cs.RO cs.CL cs.CV

MiMo-Embodied: X-Embodied Foundation Model Technical Report

Xiaoshuai Hao, Lei Zhou, Zhijian Huang, Zhiwen Hou, Yingbo Tang, Lingfeng Zhang, Guang Li, Zheng Lu, Shuhuai Ren, Xianhui Meng, Yuchen Zhang, Jing Wu, Jinghui Lu, Chenxu Dang, Jiayi Guan, Jianhua Wu, Zhiyi Hou, Hanbing Li, Shumeng Xia, Mingliang Zhou, Yinan Zheng, Zihao Yue, Shuhao Gu, Hao Tian, Yuannan Shen, Jianwei Cui, Wen Zhang, Shaoqing Xu, Bing Wang, Haiyang Sun, Zeyu Zhu, Yuncheng Jiang, Zibin Guo, Chuhong Gong, Chaofan Zhang, Wenbo Ding, Kun Ma, Guang Chen, Rui Cai, Diyun Xiang, Heng Qu, Fuli Luo, Hangjun Ye, Long Chen

Comments Code: https://github.com/XiaomiMiMo/MiMo-Embodied | Model: https://huggingface.co/XiaomiMiMo/MiMo-Embodied-7B

2511.14183 2026-04-29 cs.CV

UniSER: A Foundation Model for Unified Soft Effects Removal

Jingdong Zhang, Lingzhi Zhang, Qing Liu, Mang Tik Chiu, Connelly Barnes, Yizhou Wang, Haoran You, Xiaoyang Liu, Yuqian Zhou, Zhe Lin, Eli Shechtman, Sohrab Amirghodsi, Xin Li, Wenping Wang, Xiaohang Zhan

2511.03473 2026-04-29 cs.LG

Reinforcement Learning Using known Invariances

Alexandru Cioba, Aya Kayal, Laura Toni, Sattar Vakili, Alberto Bernacchia

2510.27106 2026-04-29 cs.CL

Rating Roulette: Self-Inconsistency in LLM-As-A-Judge Frameworks

Rajarshi Haldar, Julia Hockenmaier

Comments Accepted at EMNLP 2025

2510.22102 2026-04-29 cs.CV cs.AI cs.CL

Mitigating Coordinate Prediction Bias from Positional Encoding Failures

Xingjian Tao, Yiwei Wang, Yujun Cai, Yihong Luo, Kai Han, Jing Tang

2510.20303 2026-04-29 cs.CL

Citation Failure: Definition, Analysis and Efficient Mitigation

Jan Buchmann, Iryna Gurevych

Comments Accepted to TACL in April 2024. Paper repository: https://github.com/UKPLab/tacl2026-citation-failure

2510.18030 2026-04-29 cs.CL cs.AI cs.LG

From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models

Ziyan Wang, Enmao Diao, Qi Le, Pu Wang, Minwoo Lee, Shu-ping Yeh, Evgeny Stupachenko, Hao Feng, Li Yang

Comments 20 pages, 6 figures. Accepted by ACL 2026 Main Conference

2510.12834 2026-04-29 cs.SD cs.AI eess.AS

Gelina: Unified Speech and Gesture Synthesis via Interleaved Token Prediction

Téo Guichoux, Théodor Lemerle, Shivam Mehta, Jonas Beskow, Gustav Eje Henter, Laure Soulier, Catherine Pelachaud, Nicolas Obin

Comments Paper accepted at ICASSP 2026, 5 pages

2510.07499 2026-04-29 cs.CL cs.AI cs.LG

When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Soyeong Jeong, Taehee Jung, Sung Ju Hwang, Joo-Kyung Kim, Dongyeop Kang

Comments ACL Findings 2026

2509.26543 2026-04-29 cs.CL cs.AI

The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models

Lina Conti, Dennis Fucci, Marco Gaido, Matteo Negri, Guillaume Wisniewski, Luisa Bentivogli

Comments Accepted to BlackBoxNLP 2025

2509.14000 2026-04-29 cs.LG

JaGuard: Position Error Correction of GNSS Jamming with Deep Temporal Graphs

Ivana Kesić, Aljaž Blatnik, Carolina Fortuna, Blaž Bertalanič

Comments 12 pages, 8 figures

2509.11449 2026-04-29 cs.LG cs.AI

Tabular Data with Class Imbalance: Predicting Electric Vehicle Crash Severity with Pretrained Transformers (TabPFN) and Mamba-Based Models

Shriyank Somvanshi, Pavan Hebli, Gaurab Chhetri, Subasish Das

Comments This is the author's preprint version of a paper accepted for presentation at the 24th International Conference on Machine Learning and Applications (ICMLA 2025), December 3-5, 2025, Florida, USA. The final published version will appear in the official IEEE proceedings. Conference site: https://www.icmla-conference.org/icmla25/

2509.11443 2026-04-29 cs.CL cs.SI

A Transformer-Based Cross-Platform Analysis of Public Discourse on the 15-Minute City Paradigm

Gaurab Chhetri, Darrell Anderson, Boniphace Kutela, Subasish Das

Comments This is the author's preprint version of a paper accepted for presentation at the 24th International Conference on Machine Learning and Applications (ICMLA 2025), December 3-5, 2025, Florida, USA. The final published version will appear in the official IEEE proceedings. Conference site: https://www.icmla-conference.org/icmla25/

2509.10813 2026-04-29 cs.CV cs.RO

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

Weipeng Zhong, Peizhou Cao, Yichen Jin, Li Luo, Wenzhe Cai, Jingli Lin, Hanqing Wang, Zhaoyang Lyu, Tai Wang, Bo Dai, Xudong Xu, Jiangmiao Pang

Comments Accepted by NeurIPS 2025; Project page: https://marjordcpz.github.io/InternScenes.github.io

2509.09708 2026-04-29 cs.CL cs.AI

Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal

Nirmalendu Prakash, Yeo Wei Jie, Amir Abdullah, Ranjan Satapathy, Erik Cambria, Roy Ka Wei Lee

2509.06484 2026-04-29 cs.LG cs.CE

Thermodynamically consistent machine learning model for excess Gibbs energy

Marco Hoffmann, Thomas Specht, Quirin Göttl, Jakob Burger, Stephan Mandt, Hans Hasse, Fabian Jirasek

Comments 33 pages, 2 figures, 1 table

2508.18717 2026-04-29 cs.LG cs.CV cs.IT math.AT math.IT

Natural Image Classification via Quasi-Cyclic Graph Ensembles and Random-Bond Ising Models at the Nishimori Temperature

V. S. Usatyuk, D. A. Sapozhnikov, S. I. Egorov

Comments 38 pages, 8 figures, 4 tables, was presented at the 9th International Conference 'Deep Learning on Computational Physics (DLCP2025)', and accepted for the Moscow University Physics Bulletin, Physics series

2508.16198 2026-04-29 cs.CL

OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning

Seunghee Kim, Ingyu Bang, Seokgyu Jang, Changhyeon Kim, Sanghwan Bae, Jihun Choi, Richeng Xuan, Taeuk Kim

Comments ACL 2026 Findings

2508.08468 2026-04-29 cs.SD eess.SP

Audio-Visual Speech Enhancement: Architectural Design and Deployment Strategies

Anis Hamadouche, Haifeng Luo, Mathini Sellathurai, Amir Hussain, Tharm Ratnarajah

Comments There was a mistake in the model baseline

English abstract

Real-time audio-visual speech enhancement (AVSE) is a key enabler for immersive and interactive multimedia services, yet its performance is tightly constrained by network latency, uplink capacity, and computational delay. This paper presents the design, deployment, and evaluation of a complete cloud-edge-assisted AVSE system operating over a public 5G edge network. The system integrates CNN-based acoustic enhancement and OpenCV-based facial feature extraction with an LSTM fusion network to preserve temporal coherence, and is deployed on a Vodafone-compatible AWS Wavelength edge cloud. Through extensive stress testing, we analyze end-to-end performance under varying network load and adaptive multimedia profiles. Results show that compute placement at the network edge is critical for meeting real-time coherence constraints, and that uplink capacity is often the dominant bottleneck for interactive AVSE services. Only 5G and wired Ethernet consistently satisfied the required communication delay bound for uncompressed audio-video chunks, while aggressive compression reduced payload sizes by up to 80% with negligible perceptual degradation, enabling robust operation under constrained conditions. We further demonstrate a fundamental trade-off between processing latency and enhancement quality, where reduced model complexity lowers delay but degrades reconstruction performance in low-SNR scenarios. Our findings indicate that public 5G edge environments can sustain real-time, interactive AVSE workloads when network and compute resources are carefully orchestrated, although performance margins remain tighter than in dedicated infrastructures. The architectural insights derived from this study provide practical guidelines for the design of delay-sensitive multimedia and perceptual enhancement services on emerging 5G edge-cloud platforms.

2508.07101 2026-04-29 cs.CL cs.AI

Less Is More: Fast and Accurate Reasoning with Cross-Head Unified Sparse Attention

Lijie Yang, Zhihao Zhang, Arti Jain, Shijie Cao, Baihong Yuan, Yiwei Chen, Zhihao Jia, Ravi Netravali

2508.02964 2026-04-29 cs.LG stat.CO

Injecting Measurement Information Yields a Fast and Noise-Robust Diffusion-Based Inverse Problem Solver

Jonathan Patsenker, Henry Li, Myeongseob Ko, Ruoxi Jia, Yuval Kluger

2507.15707 2026-04-29 cs.CL cs.AI

Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?

Seok Hwan Song, Mohna Chakraborty, Qi Li, Wallapak Tavanapong

Comments ACL 2025 (Findings)

2507.14245 2026-04-29 cs.LG cond-mat.mtrl-sci cs.AI cs.CE q-bio.BM

Curriculum-guided multimodal representation learning enables generalizable prediction of nanomaterial-protein interactions

Hengjie Yu, Kenneth A. Dawson, Haiyun Yang, Shuya Liu, Yan Yan, Yaochu Jin

Comments 36 pages, 6 figures

2507.12553 2026-04-29 cs.CL cs.AI

Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility

Michael A. Lepori, Jennifer Hu, Ishita Dasgupta, Roma Patel, Thomas Serre, Ellie Pavlick

2507.07847 2026-04-29 cs.CL cs.AI

From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems

Youngjoon Jang, Seongtae Hong, Junyoung Son, Sungjin Park, Chanjun Park, Heuiseok Lim

Comments ACL 2025 SRW