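arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.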

2603.03283 2026-03-04 cs.CV

Utonia: Toward One Encoder for All Point Clouds

Yujia Zhang, Xiaoyang Wu, Yunhan Yang, Xianzhe Fan, Han Li, Yuechen Zhang, Zehao Huang, Naiyan Wang, Hengshuang Zhao

Comments produced by Pointcept, project page: https://pointcept.github.io/Utonia

Abstract

We dream of a future where point clouds from all domains can come together to shape a single model that benefits them all. Toward this goal, we present Utonia, a first step toward training a single self-supervised point transformer encoder across diverse domains, spanning remote sensing, outdoor LiDAR, indoor RGB-D sequences, object-centric CAD models, and point clouds lifted from RGB-only videos. Despite their distinct sensing geometries, densities, and priors, Utonia learns a consistent representation space that transfers across domains. This unification improves perception capability while revealing intriguing emergent behaviors that arise only when domains are trained jointly. Beyond perception, we observe that Utonia representations can also benefit embodied and multimodal reasoning: conditioning vision-language-action policies on Utonia features improves robotic manipulation, and integrating them into vision-language models yields gains on spatial reasoning. We hope Utonia can serve as a step toward foundation models for sparse 3D data, and support downstream applications in AR/VR, robotics, and autonomous driving.

2603.03280 2026-03-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY

How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference

Toru Lin, Shuying Deng, Zhao-Heng Yin, Pieter Abbeel, Jitendra Malik

Comments Project page can be found at https://toruowo.github.io/peel

2603.03279 2026-03-04 cs.RO cs.CV

ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation

Xialin He, Sirui Xu, Xinyao Li, Runpei Dong, Liuyu Bian, Yu-Xiong Wang, Liang-Yan Gui

Comments Project Page: https://ultra-humanoid.github.io/

2603.03278 2026-03-04 cs.RO cs.AI cs.CV

Tether: Autonomous Functional Play with Correspondence-Driven Trajectory Warping

William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Yecheng Jason Ma, Dinesh Jayaraman

Comments International Conference on Learning Representations (ICLR), 2026. Project website and code: https://tether-research.github.io

2603.03276 2026-03-04 cs.CV

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Shengbang Tong, David Fan, John Nguyen, Ellis Brown, Gaoyue Zhou, Shengyi Qian, Boyang Zheng, Théophane Vallaeys, Junlin Han, Rob Fergus, Naila Murray, Marjan Ghazvininejad, Mike Lewis, Nicolas Ballas, Amir Bar, Michael Rabbat, Jakob Verbeek, Luke Zettlemoyer, Koustuv Sinha, Yann LeCun, Saining Xie

Comments Project website at https://beyond-llms.github.io/

2603.03275 2026-03-04 cs.LG

Learning Demographic-Conditioned Mobility Trajectories with Aggregate Supervision

Jessie Z. Li, Zhiqing Hong, Toru Shirakawa, Serina Chang

2603.03265 2026-03-04 cs.CV

DuoMo: Dual Motion Diffusion for World-Space Human Reconstruction

Yufu Wang, Evonne Ng, Soyong Shin, Rawal Khirodkar, Yuan Dong, Zhaoen Su, Jinhyung Park, Kris Kitani, Alexander Richard, Fabian Prada, Michael Zollhofer

Comments CVPR 2026. Project page: https://yufu-wang.github.io/duomo/

2603.03258 2026-03-04 cs.AI

Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals

Achyutha Menon, Magnus Saebo, Tyler Crosse, Spencer Gibson, Eyon Jang, Diogo Cruz

Comments 22 pages, 7 figures. Accepted at ICLR 2026 Lifelong Agents Workshop

2603.03252 2026-03-04 cs.AI

Valet: A Standardized Testbed of Traditional Imperfect-Information Card Games

Mark Goadrich, Achille Morenville, Éric Piette

Comments 12 pages, 1 table, 4 figures

2603.03242 2026-03-04 cs.AI cs.CL

Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals

Patrick Gerard, Svitlana Volkova

Comments 27 Pages

Abstract

Language models deployed in online communities must adapt to norms that vary across social, cultural, and domain-specific contexts. Prior alignment approaches rely on explicit preference supervision or predefined principles, which are effective for well-resourced settings but exclude most online communities -- particularly those without institutional backing, annotation infrastructure, or organized around sensitive topics -- where preference elicitation is costly, ethically fraught, or culturally misaligned. We observe that communities already express preferences implicitly through what content they accept, engage with, and allow to persist. We show that this acceptance behavior induces measurable geometric structure in representation space: accepted responses occupy coherent, high-density regions that reflect community-specific norms, while rejected content falls in sparser or misaligned areas. We operationalize this structure as an implicit preference signal for alignment and introduce density-guided response optimization (DGRO), a method that aligns language models to community norms without requiring explicit preference labels. Using labeled preference data, we demonstrate that local density recovers pairwise community judgments, indicating that geometric structure encodes meaningful preference signal. We then apply DGRO in annotation-scarce settings across diverse communities spanning platform, topic, and language. DGRO-aligned models consistently produce responses preferred by human annotators, domain experts, and model-based judges over supervised and prompt-based baselines. We position DGRO as a practical alignment alternative for communities where explicit preference supervision is unavailable or misaligned with situated practices, and discuss the implications and risks of learning from emergent acceptance behavior.
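The idea that local density among accepted responses can stand in for a pairwise preference can be illustrated with a small sketch. This is not the DGRO implementation: the k-nearest-neighbour density proxy, the function names `knn_density` and `prefer`, and the 2-D toy embeddings are all assumptions made for illustration only.

```python
import math

def knn_density(point, accepted, k=3):
    """Density proxy (an assumption, not the paper's estimator):
    inverse of the mean distance to the k nearest accepted embeddings."""
    dists = sorted(math.dist(point, a) for a in accepted)
    nearest = dists[:k]
    return 1.0 / (1e-9 + sum(nearest) / len(nearest))

def prefer(a, b, accepted, k=3):
    """Return 'a' or 'b': the candidate lying in the denser accepted region."""
    return "a" if knn_density(a, accepted, k) >= knn_density(b, accepted, k) else "b"

# Toy 2-D embeddings of content a community accepted (hypothetical data).
accepted = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (2.0, 2.0)]
near = (0.05, 0.05)  # candidate inside the dense cluster
far = (1.0, 1.0)     # candidate in a sparse region

choice = prefer(near, far, accepted)
```

Here the candidate inside the cluster of accepted embeddings receives the higher density score, so it is the one treated as preferred; no explicit preference label is involved.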

2603.03241 2026-03-04 cs.CV cs.AI

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Zimo Wen, Boxiu Li, Wanbo Zhang, Junxiang Lei, Xiaoyu Chen, Yijia Fan, Qi Zhang, Yujiang Wang, Lili Qiu, Bo Li, Ziwei Liu, Caihua Shan, Yifan Yang, Yifei Shen

2603.03238 2026-03-04 cs.LG cs.NA math.NA physics.comp-ph

On Geometry Regularization in Autoencoder Reduced-Order Models with Latent Neural ODE Dynamics

Mikhail Osipov

Comments 25 pages, 2 figures, 3 tables

2603.03234 2026-03-04 cs.LG

Guiding Sparse Neural Networks with Neurobiological Principles to Elicit Biologically Plausible Representations

Patrick Inoue, Florian Röhrbein, Andreas Knoblauch

2603.03233 2026-03-04 cs.AI

AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework

Zihang Zeng, Jiaquan Zhang, Pengze Li, Yuan Qi, Xi Chen

2603.03230 2026-03-04 cs.LG cs.AI

SynthCharge: An Electric Vehicle Routing Instance Generator with Feasibility Screening to Enable Learning-Based Optimization and Benchmarking

Mertcan Daysalilar, Fuat Uyguroglu, Gabriel Nicolosi, Adam Meyers

Comments This work has been submitted to the IEEE for possible publication

2603.03227 2026-03-04 cs.LG

Coalgebras for categorical deep learning: Representability and universal approximation

Dragan Mašulović

2603.03226 2026-03-04 cs.LG cs.CR

Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective

Enea Monzio Compagnoni, Alessandro Stanghellini, Rustem Islamov, Aurelien Lucchi, Anastasiia Koloskova

Comments Accepted at ICLR 2026 (Poster)

2603.03224 2026-03-04 cs.LG cs.AI

Stabilized Adaptive Loss and Residual-Based Collocation for Physics-Informed Neural Networks

Divyavardhan Singh, Shubham Kamble, Dimple Sonone, Kishor Upla

Comments 6 pages, 2 Figures, 4 tables

2603.03212 2026-03-04 cs.AI

NeuroSkill(tm): Proactive Real-Time Agentic System Capable of Modeling Human State of Mind

Nataliya Kosmyna, Eugene Hauptmann

Comments 36 pages, 18 figures

2603.03207 2026-03-04 cs.LG

I-CAM-UV: Integrating Causal Graphs over Non-Identical Variable Sets Using Causal Additive Models with Unobserved Variables

Hirofumi Suzuki, Kentaro Kanamori, Takuya Takagi, Thong Pham, Takashi Nicholas Maeda, Shohei Shimizu

Comments 16 pages, 22 figures, to appear in the 40th AAAI Conference on Artificial Intelligence (AAAI 2026)

2603.03206 2026-03-04 cs.LG cs.AI cs.CL

Understanding and Mitigating Dataset Corruption in LLM Steering

Cullen Anderson, Narmeen Oozeer, Foad Namjoo, Remy Ogasawara, Amirali Abdullah, Jeff M. Phillips

2603.03198 2026-03-04 cs.RO cs.CL cs.CV

ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments

Ziyang Gong, Zehang Luo, Anke Tang, Zhe Liu, Shi Fu, Zhi Hou, Ganlin Yang, Weiyun Wang, Xiaofeng Wang, Jianbo Liu, Gen Luo, Haolan Kang, Shuang Luo, Yue Zhou, Yong Luo, Li Shen, Xiaosong Jia, Yao Mu, Xue Yang, Chunxiao Liu, Junchi Yan, Hengshuang Zhao, Dacheng Tao, Xiaogang Wang

Comments Code: https://github.com/ACE-BRAIN-Team/ACE-Brain-0 Hugging Face: https://huggingface.co/ACE-Brain/ACE-Brain-0-8B

2603.03195 2026-03-04 cs.CV cs.AI cs.RO

Chain of World: World Model Thinking in Latent Motion

Fuxiang Yang, Donglin Di, Lulu Tang, Xuancheng Zhang, Lei Fan, Hao Li, Chen Wei, Tonghua Su, Baorui Ma

Comments Accepted by CVPR2026. Project page: https://fx-hit.github.io/cowvla-io/

2603.03177 2026-03-04 cs.AI

Neuro-Symbolic Artificial Intelligence: A Task-Directed Survey in the Black-Box Models Era

Giovanni Pio Delvecchio, Lorenzo Molfetta, Gianluca Moro

Comments Accepted for publication at IJCAI-25. Please cite the definitive, copyrighted, peer reviewed and edited version of this Article published in IJCAI 25, pp. 4196-4176, 2025. DOI: https://doi.org/10.24963/ijcai.2025/1157

2603.03176 2026-03-04 cs.AI

FEAST: Retrieval-Augmented Multi-Hierarchical Food Classification for the FoodEx2 System

Lorenzo Molfetta, Alessio Cocchieri, Stefano Fantazzini, Giacomo Frisoni, Luca Ragazzi, Gianluca Moro

Comments Accepted for publication at ECAI 2025. Please cite the definitive, copyrighted, peer reviewed and edited version of this Article published in ECAI 2025, edited by I. Lynce et al., FAIA, pp. 4169-4176, 2025. DOI: https://doi.org/10.3233/FAIA251309

DOI: 10.3233/FAIA251309
Journal ref: ECAI 2025: 28th European Conference on Artificial Intelligence, Frontiers in Artificial Intelligence and Applications (FAIA), 2025, pages 4169-4176

Abstract

Hierarchical text classification (HTC) and extreme multi-label classification (XML) tasks face compounded challenges from complex label interdependencies, data sparsity, and extreme output dimensions. These challenges are exemplified in the European Food Safety Authority's FoodEx2 system, a standardized food classification framework essential for food consumption monitoring and contaminant exposure assessment across Europe. FoodEx2 coding transforms natural language food descriptions into a set of codes from multiple standardized hierarchies, but faces implementation barriers due to its complex structure. Given a food description (e.g., "organic yogurt"), the system identifies its base term ("yogurt"), all the applicable facet categories (e.g., "production method"), and then every relevant facet descriptor for each category (e.g., "organic production"). While existing models perform adequately on well-balanced and semantically dense hierarchies, no work has been applied on the practical constraints imposed by the FoodEx2 system. The limited literature addressing such real-world scenarios further compounds these challenges. We propose FEAST (Food Embedding And Semantic Taxonomy), a novel retrieval-augmented framework that decomposes FoodEx2 classification into a three-stage approach: (1) base term identification, (2) multi-label facet prediction, and (3) facet descriptor assignment. By leveraging the system's hierarchical structure to guide training and performing deep metric learning, FEAST learns discriminative embeddings that mitigate data sparsity and improve generalization on rare and fine-grained labels. Evaluated on the multilingual FoodEx2 benchmark, FEAST outperforms the prior European's CNN baseline F1 scores by 12-38% on rare classes.
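The three-stage decomposition can be pictured as a small data flow from a description to a base term, then to facet categories, then to facet descriptors. The sketch below is only a toy: keyword lookup stands in for FEAST's learned retrieval, the vocabulary contains nothing beyond the abstract's own "organic yogurt" example, and the names `Coding` and `code_description` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical toy vocabulary taken from the abstract's example only.
BASE_TERMS = {"yogurt"}
FACET_CUES: Dict[str, Dict[str, str]] = {
    "production method": {"organic": "organic production"},
}

@dataclass
class Coding:
    base_term: Optional[str]
    facets: Dict[str, List[str]] = field(default_factory=dict)

def code_description(text: str) -> Coding:
    tokens = text.lower().split()
    # Stage 1: base term identification (keyword match as a stand-in).
    base = next((t for t in tokens if t in BASE_TERMS), None)
    coding = Coding(base_term=base)
    for category, cues in FACET_CUES.items():
        # Stage 2: a facet category applies if any of its cue words appears.
        descriptors = [cues[t] for t in tokens if t in cues]
        if descriptors:
            # Stage 3: assign the matching descriptors within that category.
            coding.facets[category] = descriptors
    return coding

result = code_description("organic yogurt")
```

For the input "organic yogurt", `result` holds the base term "yogurt" and the descriptor "organic production" under the "production method" category, mirroring the example in the abstract.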

2603.03175 2026-03-04 cs.AI

Saarthi for AGI: Towards Domain-Specific General Intelligence for Formal Verification

Aman Kumar, Deepak Narayan Gadde, Luu Danh Minh, Vaisakh Naduvodi Viswambharan, Keerthan Kopparam Radhakrishna, Sivaram Pothireddypalli

Comments Published at the DVCon U.S. 2026

2603.03172 2026-03-04 cs.LG

Less Noise, Same Certificate: Retain Sensitivity for Unlearning

Carolin Heinzler, Kasra Malihi, Amartya Sanyal

2603.03163 2026-03-04 cs.CV cs.AI

Conditioned Activation Transport for T2I Safety Steering

Maciej Chrabąszcz, Aleksander Szymczyk, Jan Dubiński, Tomasz Trzciński, Franziska Boenisch, Adam Dziedzic

2603.03160 2026-03-04 cs.CV

Kling-MotionControl Technical Report

Kling Team, Jialu Chen, Yikang Ding, Zhixue Fang, Kun Gai, Kang He, Xu He, Jingyun Hua, Mingming Lao, Xiaohan Li, Hui Liu, Jiwen Liu, Xiaoqiang Liu, Fan Shi, Xiaoyu Shi, Peiqin Sun, Songlin Tang, Pengfei Wan, Tiancheng Wen, Zhiyong Wu, Haoxian Zhang, Runze Zhao, Yuanxing Zhang, Yan Zhou

Comments Access: https://app.klingai.com/global/video-motion-control/new

2603.03158 2026-03-04 cs.SD cs.AI

An Investigation Into Various Approaches For Bengali Long-Form Speech Transcription and Bengali Speaker Diarization

Epshita Jahan, Khandoker Md Tanjinul Islam, Pritom Biswas, Tafsir Al Nafin

Comments 5 pages, 2 figures