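arXivDaily daily academic digest: syncs the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and other areas.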

2603.24684 2026-03-27 cs.CV

KitchenTwin: Semantically and Geometrically Grounded 3D Kitchen Digital Twins

Quanyun Wu, Kyle Gao, Daniel Long, David A. Clausi, Jonathan Li, Yuhao Chen

Abstract

Embodied AI training and evaluation require object-centric digital twin environments with accurate metric geometry and semantic grounding. Recent transformer-based feedforward reconstruction methods can efficiently predict global point clouds from sparse monocular videos, yet these geometries suffer from inherent scale ambiguity and inconsistent coordinate conventions. This mismatch prevents the reliable fusion of these dimensionless point cloud predictions with locally reconstructed object meshes. We propose a novel scale-aware 3D fusion framework that registers visually grounded object meshes with transformer-predicted global point clouds to construct metrically consistent digital twins. Our method introduces a Vision-Language Model (VLM)-guided geometric anchor mechanism that resolves this fundamental coordinate mismatch by recovering an accurate real-world metric scale. To fuse the outputs of these networks, we propose a geometry-aware registration pipeline that explicitly enforces physical plausibility through gravity-aligned vertical estimation, Manhattan-world structural constraints, and collision-free local refinement. Experiments on real indoor kitchen environments demonstrate improved cross-network object alignment and geometric consistency for downstream tasks, including multi-primitive fitting and metric measurement. We additionally introduce an open-source indoor digital twin dataset with metrically scaled scenes and semantically grounded and registered object-centric mesh annotations.
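
The abstract does not spell out how the metric scale or the vertical axis is computed. As a minimal sketch of the general idea only, assuming a VLM has already named anchor objects whose real-world size can be assumed, one can take the median ratio of assumed metric size to predicted (unitless) size as the scale, then rotate the point cloud so an estimated gravity-up vector maps onto +z. The function names, anchor sizes, and the Rodrigues-rotation step are illustrative assumptions, not KitchenTwin's method; the Manhattan-world and collision-free refinement stages are not shown.

```python
import math
from statistics import median

Vec3 = tuple[float, float, float]


def anchor_scale(anchors: list[tuple[float, float]]) -> float:
    """Median metric scale from hypothetical (predicted_extent, metric_extent_m) anchor pairs.

    Each ratio converts unitless predicted units to meters; taking the median
    keeps one mis-sized anchor from dominating the estimate.
    """
    if not anchors:
        raise ValueError("need at least one anchor")
    ratios = []
    for predicted, metric in anchors:
        if predicted <= 0 or metric <= 0:
            raise ValueError("anchor extents must be positive")
        ratios.append(metric / predicted)
    return median(ratios)


def rotation_to_up(up: Vec3) -> list[list[float]]:
    """Rodrigues rotation matrix mapping the estimated gravity-up vector onto +z."""
    n = math.sqrt(sum(c * c for c in up))
    if n == 0:
        raise ValueError("up vector must be non-zero")
    ax, ay, az = (c / n for c in up)
    # Rotation axis v = a x z_hat and cosine c = a . z_hat for unit vector a.
    vx, vy, vz = ay, -ax, 0.0
    cos_t = az
    s2 = vx * vx + vy * vy  # squared sine of the rotation angle
    if s2 < 1e-12:
        if cos_t > 0:  # already pointing up: identity
            return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
        # pointing straight down: rotate 180 degrees about the x-axis
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    k = [[0.0, -vz, vy], [vz, 0.0, -vx], [-vy, vx, 0.0]]  # skew matrix [v]_x
    k2 = [[sum(k[i][m] * k[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
    f = (1.0 - cos_t) / s2
    # R = I + [v]_x + [v]_x^2 * (1 - c) / s^2
    return [[(1.0 if i == j else 0.0) + k[i][j] + f * k2[i][j] for j in range(3)]
            for i in range(3)]


def to_metric(points: list[Vec3], up: Vec3, scale: float) -> list[Vec3]:
    """Gravity-align, then scale predicted points: p' = scale * R p."""
    r = rotation_to_up(up)
    return [tuple(scale * sum(r[i][j] * p[j] for j in range(3)) for i in range(3))
            for p in points]


if __name__ == "__main__":
    # Hypothetical anchors: a cabinet predicted 0.5 units tall, assumed 1.8 m,
    # and a countertop predicted 0.25 units high, assumed 0.9 m.
    s = anchor_scale([(0.5, 1.8), (0.25, 0.9)])
    print(s)  # 3.6
    print(to_metric([(0.0, 1.0, 0.0)], up=(0.0, 1.0, 0.0), scale=s))
```

Using the median rather than the mean is a design choice for robustness: if the VLM misidentifies one anchor, its ratio is outvoted as long as most anchors are sized correctly.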

2603.24676 2026-03-27 cs.AI cond-mat.dis-nn cond-mat.stat-mech physics.bio-ph physics.soc-ph

When Is Collective Intelligence a Lottery? Multi-Agent Scaling Laws for Memetic Drift in LLMs

Hidenori Tanaka

Comments 19 pages, 10 figures

2603.24653 2026-03-27 cs.CV

From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition

Francesco Gentile, Nicola Dall'Asen, Francesco Tonini, Massimiliano Mancini, Lorenzo Vaquero, Elisa Ricci

Comments Accepted @ CVPR 2026. Project page: https://frangente.github.io/SITH/

2603.24651 2026-03-27 cs.CL cs.AI cs.SD eess.AS

When Consistency Becomes Bias: Interviewer Effects in Semi-Structured Clinical Interviews

Hasindri Watawana, Sergio Burdisso, Diego A. Moreno-Galván, Fernando Sánchez-Vega, A. Pastor López-Monroy, Petr Motlicek, Esaú Villatoro-Tello

Comments Accepted to LREC 2026 Conference

2603.24648 2026-03-27 cs.LG

Energy-Efficient Hierarchical Federated Anomaly Detection for the Internet of Underwater Things via Selective Cooperative Aggregation

Kenechi Omeke, Michael Mollel, Lei Zhang, Qammer H. Abbasi, Muhammad Ali Imran

2603.24644 2026-03-27 cs.LG

Physics-Informed Neural Network Digital Twin for Dynamic Tray-Wise Modeling of Distillation Columns under Transient Operating Conditions

Debadutta Patra, Ayush Bardhan Tripathy, Soumya Ranjan Sahu, Sucheta Panda

Comments 17 pages, 10 figures

2603.24641 2026-03-27 cs.LG cs.NA math.NA physics.flu-dyn

Learning Mesh-Free Discrete Differential Operators with Self-Supervised Graph Neural Networks

Lucas Gerken Starepravo, Georgios Fourtakas, Steven Lind, Ajay B. Harish, Tianning Tang, Jack R. C. King

2603.24638 2026-03-27 cs.LG cond-mat.mtrl-sci physics.chem-ph physics.comp-ph stat.ML

How unconstrained machine-learning models learn physical symmetries

Michelangelo Domina, Joseph William Abbott, Paolo Pegolo, Filippo Bigi, Michele Ceriotti

Comments 15 pages, 9 figures

2603.24636 2026-03-27 cs.LG cs.AI

DyMRL: Dynamic Multispace Representation Learning for Multimodal Event Forecasting in Knowledge Graph

Feng Zhao, Kangzheng Liu, Teng Peng, Yu Yang, Guandong Xu

Comments Accepted to The ACM Web Conference 2026 (WWW '26). This version is published under a CC BY license

Abstract

Accurate representation of multimodal knowledge is crucial for event forecasting in real-world scenarios. However, existing studies have largely focused on static settings, overlooking the dynamic acquisition and fusion of multimodal knowledge. 1) At the knowledge acquisition level, the challenge is how to learn time-sensitive information from different modalities, especially the dynamic structural modality. Existing dynamic learning methods are often limited to shallow structures across heterogeneous spaces or to simple single spaces, making it difficult to capture deep relation-aware geometric features. 2) At the knowledge fusion level, the challenge is how to learn evolving multimodal fusion features. Existing knowledge fusion methods based on static co-attention struggle to capture the varying historical contributions of different modalities to future events. To this end, we propose DyMRL, a Dynamic Multispace Representation Learning approach to efficiently acquire and fuse multimodal temporal knowledge. 1) For the former issue, DyMRL integrates time-specific structural features from Euclidean, hyperbolic, and complex spaces into a relational message-passing framework to learn deep representations, reflecting human intelligence in associative thinking, high-order abstraction, and logical reasoning. Pretrained models endow DyMRL with time-sensitive visual and linguistic capabilities. 2) For the latter issue, DyMRL incorporates dual fusion-evolution attention mechanisms that assign dynamic learning emphasis equally and symmetrically to different modalities at different timestamps. To evaluate how well DyMRL forecasts events using the multimodal temporal knowledge it learns from history, we construct four multimodal temporal knowledge graph benchmarks. Extensive experiments demonstrate that DyMRL outperforms state-of-the-art dynamic unimodal and static multimodal baseline methods.
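
The abstract names Euclidean, hyperbolic, and complex spaces but not DyMRL's exact formulas. As an illustration only, assuming those spaces are used in the way common in knowledge-graph embedding work, the sketch below shows three standard distances: a TransE-style translation in Euclidean space, the Poincaré-ball geodesic distance in hyperbolic space, and a RotatE-style rotation in complex space. These are swapped-in textbook techniques, not DyMRL's message-passing layers, and the toy embeddings are hand-picked.

```python
import cmath
import math


def translation_distance(h, r, t):
    """Euclidean space, TransE-style: ||h + r - t|| (smaller means more plausible)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def poincare_distance(u, v):
    """Hyperbolic space, Poincare-ball geodesic distance; both points need ||x|| < 1."""
    nu = sum(x * x for x in u)
    nv = sum(x * x for x in v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


def rotation_distance(h, phases, t):
    """Complex space, RotatE-style: ||h * exp(i * theta) - t|| over complex coordinates."""
    return math.sqrt(sum(abs(hi * cmath.exp(1j * th) - ti) ** 2
                         for hi, th, ti in zip(h, phases, t)))


if __name__ == "__main__":
    # Toy embeddings chosen by hand for illustration only.
    print(translation_distance((1.0, 0.0), (0.0, 1.0), (1.0, 1.0)))  # 0.0
    print(poincare_distance((0.0, 0.0), (0.5, 0.0)))  # ln 3, about 1.0986
    print(poincare_distance((0.0, 0.0), (0.9, 0.0)))  # grows quickly near the boundary
    print(rotation_distance([1 + 0j], [math.pi / 2], [1j]))  # about 0
```

The three geometries bias representations differently: translations are simple and cheap, hyperbolic distance grows rapidly toward the ball's boundary and so suits tree-like hierarchies, and unit-modulus complex rotations can express relation patterns such as inversion and composition.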

2603.24402 2026-03-27 cs.AI

AI-Supervisor: Autonomous AI Research Supervision via a Persistent Research World Model

Yunbo Long

Abstract

Existing automated research systems operate as stateless, linear pipelines, generating outputs without maintaining any persistent understanding of the research landscape they navigate. They process papers sequentially, propose ideas without structured gap analysis, and lack mechanisms for agents to verify, challenge, or refine each other's findings. We present AI-Supervisor, a multi-agent orchestration framework in which specialized agents provide end-to-end AI research supervision driven by human interests (from literature review through gap discovery, method development, evaluation, and paper writing) through autonomous exploration and self-correcting updates of research knowledge. Unlike sequential pipelines, AI-Supervisor maintains a continuously evolving Research World Model, implemented as a knowledge graph, that captures methods, benchmarks, known limitations, and unexplored gaps. This model serves as shared memory across all agents, enabling them to explore and build upon a structured understanding of the research landscape. The framework introduces three architectural contributions: (1) structured gap discovery, which decomposes methods into core modules, validates their performance across benchmarks, and maps the specific gaps each module creates; (2) self-correcting discovery loops, which probe why modules succeed on certain problems and fail on others, whether benchmarks carry hidden biases, and whether evaluation protocols remain adequate for emerging challenges; and (3) self-improving development loops, governed by cross-domain mechanism search, which iteratively target failing modules by finding solutions from other scientific fields. All agents operate under a consensus mechanism in which independent findings are corroborated before being committed to the Research World Model.
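
The abstract says only that independent findings are corroborated before they are committed to the knowledge graph. A minimal sketch of one way such a gate could work, assuming a simple quorum rule (a finding enters the graph once a set number of distinct agents report it); the class, its methods, the agent names, and the default quorum of 2 are hypothetical, not AI-Supervisor's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# A finding is a (subject, relation, object) edge, e.g. ("MethodX", "fails_on", "BenchmarkY").
Finding = tuple[str, str, str]


@dataclass
class ResearchWorldModel:
    """Toy knowledge graph that commits a finding only after `quorum` distinct agents report it."""

    quorum: int = 2
    committed: set[Finding] = field(default_factory=set)
    pending: dict[Finding, set[str]] = field(default_factory=lambda: defaultdict(set))

    def __post_init__(self) -> None:
        if self.quorum < 1:
            raise ValueError("quorum must be at least 1")

    def propose(self, agent_id: str, finding: Finding) -> bool:
        """Record one agent's independent report; return True once the finding is committed."""
        if finding in self.committed:
            return True
        supporters = self.pending[finding]
        supporters.add(agent_id)  # a set, so repeat reports by one agent count once
        if len(supporters) >= self.quorum:
            self.committed.add(finding)
            del self.pending[finding]
            return True
        return False

    def edges_from(self, subject: str) -> list[Finding]:
        """Committed edges leaving `subject`: the shared memory other agents can build on."""
        return sorted(f for f in self.committed if f[0] == subject)


if __name__ == "__main__":
    wm = ResearchWorldModel(quorum=2)
    gap = ("MethodX", "fails_on", "BenchmarkY")  # hypothetical finding
    print(wm.propose("literature_agent", gap))  # False: one supporter
    print(wm.propose("literature_agent", gap))  # False: same agent again
    print(wm.propose("evaluation_agent", gap))  # True: corroborated and committed
    print(wm.edges_from("MethodX"))  # [('MethodX', 'fails_on', 'BenchmarkY')]
```

Counting distinct agent IDs, rather than raw reports, is what makes the check a corroboration: a single agent repeating itself cannot push a finding into the shared graph.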

2603.24278 2026-03-27 cs.CV

TopoMesh: High-Fidelity Mesh Autoencoding via Topological Unification

Guan Luo, Xiu Li, Rui Chen, Xuanyu Yi, Jing Lin, Chia-Hao Chen, Jiahang Liu, Song-Hai Zhang, Jianfeng Zhang

Comments Accepted to CVPR 2026. Project page: https://logan0601.github.io/projects/topomesh/index.html

2603.23953 2026-03-27 cs.CV cs.ET

VOLMO: Versatile and Open Large Models for Ophthalmology

Zhenyue Qin, Younjoon Chung, Elijah Lee, Wanyue Feng, Xuguang Ai, Serina Applebaum, Minjie Zou, Yang Liu, Pan Xiao, Mac Singer, Amisha Dave, Aidan Gilson, Tiarnan D. L. Keenan, Emily Y. Chew, Zhiyong Lu, Yih-Chung Tham, Ron Adelman, Luciano V. Del Priore, Qingyu Chen

2603.23906 2026-03-27 cs.CV

GenMask: Adapting DiT for Segmentation via Direct Mask Generation

Yuhuan Yang, Xianwei Zhuang, Yuxuan Cai, Chaofan Ma, Shuai Bai, Jiangchao Yao, Ya Zhang, Junyang Lin, Yanfeng Wang

Comments Accepted by CVPR 2026

2603.22893 2026-03-27 cs.CV

SLARM: Streaming and Language-Aligned Reconstruction Model for Dynamic Scenes

Zhicheng Qiu, Jiarui Meng, Tong-an Luo, Yican Huang, Xuan Feng, Xuanfu Li, Zhan Xu

2603.22225 2026-03-27 cs.CL cs.SD eess.AS

Adapting Self-Supervised Speech Representations for Cross-lingual Dysarthria Detection in Parkinson's Disease

Abner Hernandez, Eunjung Yeo, Kwanghee Choi, Chin-Jou Li, Zhengjun Yue, Rohan Kumar Das, Jan Rusz, Mathew Magimai Doss, Juan Rafael Orozco-Arroyave, Tomás Arias-Vergara, Andreas Maier, Elmar Nöth, David R. Mortensen, David Harwath, Paula Andrea Perez-Toro

Comments Submitted to Interspeech 2026

2603.21208 2026-03-27 cs.CV cs.LG

JANUS: A Lightweight Framework for Jailbreaking Text-to-Image Models via Distribution Optimization

Haolun Zheng, Yu He, Tailun Chen, Shuo Shao, Zhixuan Chu, Hongbin Zhou, Lan Tao, Zhan Qin, Kui Ren

Comments This paper is accepted by the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026. 18 pages, 8 figures

2603.18866 2026-03-27 cs.AI

Conflict-Based Search for Multi Agent Path Finding with Asynchronous Actions

Xuemian Wu, Shizhe Zhao, Zhongqiang Ren

Comments 9 pages, 10 figures. Accepted at AAMAS 2026

2603.18596 2026-03-27 cs.LG cs.AI cs.CV

Elastic Weight Consolidation Done Right for Continual Learning

Xuan Liu, Xiaobin Chang

Comments Accepted to CVPR 2026

2603.16179 2026-03-27 cs.CV cs.AI

360° Image Perception with MLLMs: A Comprehensive Benchmark and a Training-Free Method

Huyen T. T. Tran, Van-Quang Nguyen, Farros Alferro, Kang-Jun Liu, Takayuki Okatani

2603.14023 2026-03-27 cs.CV

High-speed Imaging through Turbulence with Event-based Light Fields

Yu-Hsiang Huang, Levi Burner, Sachin Shah, Ziyuan Qu, Adithya Pediredla, Christopher A. Metzler

2603.11412 2026-03-27 cs.CL

Algorithmic Consequences of Particle Filters for Sentence Processing: Amplified Garden-Paths and Digging-In Effects

Amani Maina-Kilaas, Roger Levy

Comments 10 pages, 4 figures; replacement adds minor clarification and directs readers toward relevant work

2603.04820 2026-03-27 cs.CL cs.CY

Autoscoring Anticlimax: A Meta-analytic Understanding of AI's Short-answer Shortcomings and Wording Weaknesses

Michael Hardy

2603.04733 2026-03-27 cs.CV

FOZO: Forward-Only Zeroth-Order Prompt Optimization for Test-Time Adaptation

Xingyu Wang, Tao Wang

Comments Accepted to CVPR 2026

2602.23765 2026-03-27 cs.SD eess.AS

DashengTokenizer: One layer is enough for unified audio understanding and generation

Heinrich Dinkel, Xingwei Sun, Gang Li, Jiahao Mei, Yadong Niu, Jizhong Liu, Xiyang Li, Yifan Liao, Jiahao Zhou, Junbo Zhang, Jian Luan

Comments Added ACAVCaps reference

2602.23022 2026-03-27 cs.CV

DMAligner: Enhancing Image Alignment via Diffusion Model Based View Synthesis

Xinglong Luo, Ao Luo, Zhengning Wang, Yueqi Yang, Chaoyu Feng, Lei Lei, Bing Zeng, Shuaicheng Liu

Comments Accepted by CVPR 2026

2602.22666 2026-03-27 cs.CV

ArtPro: Self-Supervised Articulated Object Reconstruction with Adaptive Integration of Mobility Proposals

Xuelu Li, Zhaonan Wang, Xiaogang Wang, Lei Wu, Manyi Li, Changhe Tu

2602.20951 2026-03-27 cs.CV cs.AI

See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis

Jaehyun Park, Minyoung Ahn, Minkyu Kim, Jonghyun Lee, Jae-Gil Lee, Dongmin Park

2602.04819 2026-03-27 cs.CV cs.LG

XtraLight-MedMamba for Classification of Neoplastic Tubular Adenomas

Aqsa Sultana, Rayan Afsar, Ahmed Rahu, Surendra P. Singh, Brian Shula, Brandon Combs, Derrick Forchetti, Vijayan K. Asari

Comments 18 pages, 11 figures

2602.01939 2026-03-27 cs.RO cs.AI

Towards Exploratory and Focused Manipulation with Bimanual Active Perception: A New Problem, Benchmark and Strategy

Yuxin He, Ruihao Zhang, Tianao Shen, Cheng Liu, Qiang Nie

Comments ICRA 2026

2602.00079 2026-03-27 cs.LG cs.CV

Embedding Compression via Spherical Coordinates

Han Xiao

Comments Accepted at ICLR 2026 Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM). 13 pages, 2 figures. Code: https://github.com/jina-ai/jzip