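arXivDaily daily academic digest: synchronizes the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical & systems engineering, and other fields.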

2604.23589 2026-04-28 cs.CL

XITE: Cross-lingual Interpolation for Transfer using Embeddings

Barah Fazili, Preethi Jyothi

Abstract

Facilitating cross-lingual transfer in multilingual language models remains a critical challenge. Towards this goal, we propose an embedding-based data augmentation technique called XITE. We start with unlabeled text from a low-resource target language, identify an English counterpart in a task-specific training corpus using embedding-based similarities and adopt its label. Next, we perform a simple interpolation of the source and target embeddings to create synthetic data for task-specific fine-tuning. Projecting the target text into a language-rich subspace using linear discriminant analysis (LDA), prior to interpolation, further boosts performance. Our cross-lingual embedding-based augmentation technique XITE yields significant improvements of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference, using XLM-R, for a diverse set of target languages including Korean, Arabic, Urdu and Hindi. Apart from boosting cross-lingual transfer, adaptation using XITE also safeguards against forgetting and maintains task performance on the high-resource language.
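The augmentation pipeline described above can be sketched as follows: match each unlabeled target-language text to its most similar labeled English example, copy that example's label, then interpolate the two embeddings. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes cosine similarity for the matching step and a single convex interpolation weight `lam`, treats embeddings as plain lists of floats, and omits the LDA projection step. The function names (`nearest_source`, `interpolate`, `xite_augment`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_source(target: Sequence[float],
                   source_embs: Sequence[Sequence[float]],
                   source_labels: Sequence[str]) -> Tuple[int, str]:
    """Return the index and label of the most similar labeled source example."""
    best = max(range(len(source_embs)),
               key=lambda i: cosine(target, source_embs[i]))
    return best, source_labels[best]


def interpolate(src: Sequence[float], tgt: Sequence[float], lam: float) -> Vector:
    """Assumed convex combination: lam * source + (1 - lam) * target."""
    return [lam * s + (1.0 - lam) * t for s, t in zip(src, tgt)]


def xite_augment(target_embs: Sequence[Sequence[float]],
                 source_embs: Sequence[Sequence[float]],
                 source_labels: Sequence[str],
                 lam: float = 0.5) -> List[Tuple[Vector, str]]:
    """Hypothetical sketch: build (synthetic embedding, adopted label) pairs."""
    augmented = []
    for t in target_embs:
        idx, label = nearest_source(t, source_embs, source_labels)
        # The target text inherits the label of its nearest English counterpart.
        augmented.append((interpolate(source_embs[idx], t, lam), label))
    return augmented
```

In practice the embeddings would come from an encoder such as XLM-R, and the resulting (embedding, label) pairs would feed task-specific fine-tuning. The abstract does not state how the interpolation weight is chosen.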

2604.23588 2026-04-28 cs.AI cs.CL cs.IR

FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim Verification

Dongxin Guo, Jikun Wu, Siu Ming Yiu

Comments Accepted to ACL 2026 Industry Track. 14 pages, 1 figure, 14 tables

2604.23586 2026-04-28 cs.CV cs.CL cs.MM cs.SD eess.AS

Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling

Zhen Ye, Xu Tan, Aoxiong Yin, Hongzhan Lin, Guangyan Zhang, Peiwen Sun, Yiming Li, Chi-Min Chan, Wei Ye, Shikun Zhang, Wei Xue

2604.23585 2026-04-28 cs.CL cs.IR cs.LG

ComplianceNLP: Knowledge-Graph-Augmented RAG for Multi-Framework Regulatory Gap Detection

Dongxin Guo, Jikun Wu, Siu Ming Yiu

Comments Accepted at ACL 2026 Industry Track. 19 pages, 15 tables, 1 figure

2604.23584 2026-04-28 cs.CV cs.IR

Identity-Decoupled Anonymization for Visual Evidence in Multi-modal Retrieval-Augmented Generation

Zehua Cheng, Wei Dai, Jiahao Sun

Comments ACM International Conference on Multimedia Retrieval 2026

2604.23583 2026-04-28 cs.SD cs.HC

Opening the Design Space: Two Years of Performance with Intelligent Musical Instruments

Charles Patrick Martin

Comments Accepted for publication at the International Conference on New Interfaces for Musical Expression (NIME) 2026

2604.23580 2026-04-28 cs.RO cs.AI

PhysCodeBench: Benchmarking Physics-Aware Symbolic Simulation of 3D Scenes via Self-Corrective Multi-Agent Refinement

Tianyidan Xie, Peiyu Wang, Yuyi Qian, Yuxuan Wang, Rui Ma, Ying Tai, Song Wu, Qian Wang, Lanjun Wang, Zili Yi

2604.23578 2026-04-28 cs.CL cs.AI

LLMs Reading the Rhythms of Daily Life: Aligned Understanding for Behavior Prediction and Generation

Fanjin Meng, Jingtao Ding, Nian Li, Yizhou Sun, Yong Li

2604.23577 2026-04-28 cs.CL cs.LG

RouteNLP: Closed-Loop LLM Routing with Conformal Cascading and Distillation Co-Optimization

Dongxin Guo, Jikun Wu, Siu Ming Yiu

Comments Accepted at ACL 2026 Industry Track. 13 pages, 2 figures, 15 tables, 1 algorithm

2604.23576 2026-04-28 cs.LG cs.AI

CAPSULE: Control-Theoretic Action Perturbations for Safe Uncertainty-Aware Reinforcement Learning

Rahul Narava, Siddharth Verma, Ojas Jain, Shashi Shekhar Jha, Mayank Shekhar Jha

2604.23574 2026-04-28 cs.CV

PhysLayer: Language-Guided Layered Animation with Depth-Aware Physics

Tianyidan Xie, Zhentao Huang, Mingjie Wang, Xin Huang, Jun Zhou, Minglun Gong, Zili Yi

Comments Accepted to ICME 2026

2604.23570 2026-04-28 cs.RO

EgoLive: A Large-Scale Egocentric Dataset from Real-World Human Tasks

Yihang Li, Xuelong Wei, Jingzhou Luo, Yingjing Xiao, Yibo Bai, Guangyuan Zhou, Teng Zou, Chenguang Gui, Jiajun Wen, He Zhang, Kangliang Chen, Xing Pan, Shuaiyan Liu, Daming Wang, Tao An, Jiayi Li, Shibo Jin, Wanwan Zhang, Tianyu Wang, Boren Wei, Zhixuan Huang, Fangsheng Liu, Ruodai Li, Hui Zhang, Anson Li, Yicheng Gong, Peng Cao, Jiaming Liang, Liang Lin

2604.23552 2026-04-28 cs.LG cs.AI stat.ML

On the Memorization of Consistency Distillation for Diffusion Models

Bingqing Jiang, Difan Zou

Comments 34 pages

2604.23551 2026-04-28 cs.CV

Spatiotemporal Degradation-Aware 3D Gaussian Splatting for Realistic Underwater Scene Reconstruction

Shaohua Liu, Ning Gao, Zuoya Gu, Hongkun Dou, Yue Deng, Hongjue Li

Comments 12 pages, 10 figures, 6 tables. Author version of the paper published in Proceedings of ACM Multimedia 2025

DOI: 10.1145/3746027.3754888
Journal ref: Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM 2025), 2025

Abstract

Reconstructing realistic underwater scenes from underwater video remains a meaningful yet challenging task in the multimedia domain. The inherent spatiotemporal degradations in underwater imaging, including caustics, flickering, attenuation, and backscattering, frequently result in inaccurate geometry and appearance in existing 3D reconstruction methods. While a few recent works have explored underwater degradation-aware reconstruction, they often address either spatial or temporal degradation alone, falling short in real-world underwater scenarios where both types of degradation co-occur. We propose MarineSTD-GS, a novel 3D Gaussian Splatting-based framework that explicitly models both temporal and spatial degradations for realistic underwater scene reconstruction. Specifically, we introduce two paired Gaussian primitives: Intrinsic Gaussians represent the true scene, while Degraded Gaussians render the degraded observations. The color of each Degraded Gaussian is physically derived from its paired Intrinsic Gaussian via a Spatiotemporal Degradation Modeling (SDM) module, enabling self-supervised disentanglement of realistic appearance from degraded images. To ensure stable training and accurate geometry, we further propose a Depth-Guided Geometry Loss and a Multi-Stage Optimization strategy. We also construct a simulated benchmark with diverse spatial and temporal degradations and ground-truth appearances for comprehensive evaluation. Experiments on both simulated and real-world datasets show that MarineSTD-GS robustly handles spatiotemporal degradations and outperforms existing methods in novel view synthesis with realistic, water-free scene appearances.
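The abstract does not give the equations of the SDM module. As a hedged illustration of what a physically derived degradation can look like, the sketch below applies a widely used underwater image-formation model. In that model, the observed value of each color channel c is the attenuated true color plus depth-dependent backscatter: I_c = J_c * exp(-beta_D_c * z) + B_inf_c * (1 - exp(-beta_B_c * z)), where z is the distance to the camera. Here the model maps an intrinsic color to a degraded one. The multiplicative `caustic_gain` term is a hypothetical stand-in for temporal effects such as caustics and flickering. All parameter names are assumptions, and this is not the paper's SDM module.

```python
import math
from typing import Sequence, Tuple


def degrade_color(intrinsic: Sequence[float],
                  depth: float,
                  beta_d: Sequence[float],
                  beta_b: Sequence[float],
                  backscatter_inf: Sequence[float],
                  caustic_gain: float = 1.0) -> Tuple[float, ...]:
    """Map a true (intrinsic) RGB color to its degraded underwater observation.

    intrinsic:       true scene color J_c per channel
    depth:           camera-to-point distance z
    beta_d:          per-channel attenuation coefficient of the direct signal
    beta_b:          per-channel backscatter coefficient
    backscatter_inf: per-channel veiling light B_inf_c at infinite distance
    caustic_gain:    hypothetical time-varying gain standing in for caustics/flicker
    """
    out = []
    for j, bd, bb, b_inf in zip(intrinsic, beta_d, beta_b, backscatter_inf):
        direct = caustic_gain * j * math.exp(-bd * depth)  # attenuated true color
        back = b_inf * (1.0 - math.exp(-bb * depth))       # accumulated backscatter
        out.append(direct + back)
    return tuple(out)
```

At zero distance the observation equals the intrinsic color (scaled by `caustic_gain`). As distance grows, the direct term vanishes and the color converges to the veiling light `backscatter_inf`. This is the kind of coupling that lets degraded observations supervise an intrinsic appearance.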

2604.23546 2026-04-28 cs.CV cs.AI cs.LG

COMO: Closed-Loop Optical Molecule Recognition with Minimum Risk Training

Zhuoqi Lyu, Qing Ke

2604.23543 2026-04-28 cs.CL cs.AI

Pref-CTRL: Preference Driven LLM Alignment using Representation Editing

Imranul Ashrafi, Inigo Jauregi Unanue, Massimo Piccardi

Comments Accepted to the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

2604.23542 2026-04-28 cs.CV

AusSmoke meets MultiNatSmoke: a fully-labelled diverse smoke segmentation dataset

Weihao Li, Hongjin Zhao, Gao Zhu, Ge-Peng Ji, Nicholas Wilson, Marta Yebra, Nick Barnes

Comments Accepted to WACV 2026. Project page: https://github.com/henryzhao0615/MultiNatSmoke

2604.23540 2026-04-28 cs.CV

Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization

Haosen Li, Wenshuo Chen, Lei Wang, Shaofeng Liang, Haozhe Jia, Yutao Yue

2604.22709 2026-04-28 cs.CL

Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought

Keshav Ramji, Tahira Naseem, Ramón Fernandez Astudillo

2604.21984 2026-04-28 cs.CV

Soft Anisotropic Diagrams for Differentiable Image Representation

Laki Iinbor, Zhiyang Dou, Wojciech Matusik

2604.21916 2026-04-28 cs.CL cs.SE

MathDuels: Evaluating LLMs as Problem Posers and Solvers

Zhiqiu Xu, Shibo Jin, Shreya Arya, Mayur Naik

2604.21718 2026-04-28 cs.CV cs.AI cs.CL cs.LG cs.MM

Building a Precise Video Language with Human-AI Oversight

Zhiqiu Lin, Chancharik Mitra, Siyuan Cen, Isaac Li, Yuhan Huang, Yu Tong Tiffany Ling, Hewei Wang, Irene Pi, Shihang Zhu, Ryan Rao, George Liu, Jiaxi Li, Ruojin Li, Yili Han, Yilun Du, Deva Ramanan

Comments CVPR 2026 Highlight. Project page: https://linzhiqiu.github.io/papers/chai/

Abstract

Video-language models (VLMs) learn to reason about the dynamic visual world through natural language. We introduce a suite of open datasets, benchmarks, and recipes for scalable oversight that enable precise video captioning. First, we define a structured specification for describing subjects, scenes, motion, spatial, and camera dynamics, grounded by hundreds of carefully defined visual primitives developed with professional video creators such as filmmakers. Next, to curate high-quality captions, we introduce CHAI (Critique-based Human-AI Oversight), a framework where trained experts critique and revise model-generated pre-captions into improved post-captions. This division of labor improves annotation accuracy and efficiency by offloading text generation to models, allowing humans to better focus on verification. Additionally, these critiques and preferences between pre- and post-captions provide rich supervision for improving open-source models (Qwen3-VL) on caption generation, reward modeling, and critique generation through SFT, DPO, and inference-time scaling. Our ablations show that critique quality in precision, recall, and constructiveness, ensured by our oversight framework, directly governs downstream performance. With modest expert supervision, the resulting model outperforms closed-source models such as Gemini-3.1-Pro. Finally, we apply our approach to re-caption large-scale professional videos (e.g., films, commercials, games) and fine-tune video generation models such as Wan to better follow detailed prompts of up to 400 words, achieving finer control over cinematography including camera motion, angle, lens, focus, point of view, and framing. Our results show that precise specification and human-AI oversight are key to professional-level video understanding and generation. Data and code are available on our project page: https://linzhiqiu.github.io/papers/chai/
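The abstract notes that pairs of model-generated pre-captions and expert-revised post-captions supply preference supervision for DPO. The sketch below turns such records into preference pairs. It assumes that the expert-revised post-caption is always preferred over the pre-caption and that unrevised records carry no preference signal. The record fields, prompt template, and function name are hypothetical; the released CHAI data may use a different schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CritiqueRecord:
    """Hypothetical schema for one expert-reviewed caption."""
    video_id: str
    pre_caption: str   # model-generated draft
    critique: str      # expert critique of the draft
    post_caption: str  # expert-revised caption


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def to_preference_pairs(records: List[CritiqueRecord],
                        prompt_template: str = "Describe video {vid} in detail.") -> List[PreferencePair]:
    """Build DPO-style pairs: revised post-caption preferred over the pre-caption."""
    pairs = []
    for r in records:
        # If the expert made no revision, there is no preference to learn from.
        if r.post_caption.strip() == r.pre_caption.strip():
            continue
        pairs.append(PreferencePair(
            prompt=prompt_template.format(vid=r.video_id),
            chosen=r.post_caption,
            rejected=r.pre_caption,
        ))
    return pairs
```

The `critique` field is kept because, per the abstract, critiques also supervise critique generation. A separate SFT set mapping each pre-caption to its critique could be built from the same records.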

2604.21277 2026-04-28 cs.AI

Can MLLMs "Read" What is Missing?

Jindi Guo, Chaozheng Huang, Xi Fang

2604.19930 2026-04-28 cs.LG

Physics-Guided Dimension Reduction for Simulation-Free Operator Learning of Stiff Differential-Algebraic Systems

Huy Hoang Le, Haoguang Wang, Christian Moya, Marcos Netto, Guang Lin

2604.19234 2026-04-28 cs.CV

Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation

Rui Li, Ke Hao, Yuanzhi Liang, Haibin Huang, Chi Zhang, Yun Gu, XueLong Li

2604.19139 2026-04-28 cs.CL cs.AI

The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models

Shuai Wu, Xue Li, Yanna Feng, Yufang Li, Zhijun Wang, Ran Wang

Comments 20 pages, 17 figures, 8 tables; code and data available at https://github.com/Noah-Wu66/Vectaix-Research; DOI: 10.5281/zenodo.19767626

2604.18920 2026-04-28 cs.SD cs.CL

Comparison of sEMG Encoding Accuracy Across Speech Modes Using Articulatory and Phoneme Features

Chenqian Le, Ruisi Li, Beatrice Fumagalli, Yasamin Esmaeili, Xupeng Chen, Amirhossein Khalilian-Gourtani, Tianyu He, Adeen Flinker, Yao Wang

2604.18648 2026-04-28 cs.CV cs.AI

DanceCrafter: Fine-Grained Text-Driven Controllable Dance Generation via Choreographic Syntax

Hang Yuan, Xiaolin Hu, Yan Wan, Menglin Gao, Wenzhe Yu, Cong Huang, Fei Xu, Qing Li, Christina Dan Wang, Zhou Yu, Kai Chen

Comments 22 pages, 13 figures

2604.18471 2026-04-28 cs.LG

NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization

Enshu Liu, Xuefei Ning, Yu Wang, Zinan Lin

Comments Accepted by ICLR 2026

2604.18274 2026-04-28 cs.CV

LiquidTAD: Efficient Temporal Action Detection via Parallel Liquid-Inspired Temporal Relaxation

Zepeng Sun, Naichuan Zheng, Hailun Xia, Junjie Wu, Liwei Bao, Xiaotai Zhang