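arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.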

2603.26033 2026-03-30 cs.CV

Knowledge is Power: Advancing Few-shot Action Recognition with Multimodal Semantics from MLLMs

Jiazheng Xing, Chao Xu, Hangjie Yuan, Mengmeng Wang, Jun Dan, Hangwei Qian, Yong Liu

Abstract

Multimodal Large Language Models (MLLMs) have propelled the field of few-shot action recognition (FSAR). However, preliminary explorations in this area primarily focus on generating captions to form a suboptimal feature->caption->feature pipeline and adopt metric learning solely within the visual space. In this paper, we propose FSAR-LLaVA, the first end-to-end method to leverage MLLMs (such as Video-LLaVA) as a multimodal knowledge base for directly enhancing FSAR. First, at the feature level, we leverage the MLLM's multimodal decoder to extract spatiotemporally and semantically enriched representations, which are then decoupled and enhanced by our Multimodal Feature-Enhanced Module into distinct visual and textual features that fully exploit their semantic knowledge for FSAR. Next, we leverage the versatility of MLLMs to craft input prompts that flexibly adapt to diverse scenarios, and use their aligned outputs to drive our designed Composite Task-Oriented Prototype Construction, effectively bridging the distribution gap between meta-train and meta-test sets. Finally, to enable multimodal features to guide metric learning jointly, we introduce a training-free Multimodal Prototype Matching Metric that adaptively selects the most decisive cues and efficiently leverages the decoupled feature representations produced by MLLMs. Extensive experiments demonstrate superior performance across various tasks with minimal trainable parameters.

2603.26030 2026-03-30 cs.LG

Constitutive parameterized deep energy method for solid mechanics problems with random material parameters

Zhangyong Liang, Huanhuan Gao

Abstract

In practical structural design and solid mechanics simulations, material properties inherently exhibit random variations within bounded intervals. However, evaluating mechanical responses under continuous material uncertainty remains a persistent challenge. Traditional numerical approaches, such as the Finite Element Method (FEM), incur prohibitive computational costs as they require repeated mesh discretization and equation solving for every parametric realization. Similarly, data-driven surrogate models depend heavily on massive, high-fidelity datasets, while standard physics-informed frameworks (e.g., the Deep Energy Method) strictly demand complete retraining from scratch whenever material parameters change. To bridge this critical gap, we propose the Constitutive Parameterized Deep Energy Method (CPDEM). In this purely physics-driven framework, the strain energy density functional is reformulated by encoding a latent representation of stochastic constitutive parameters. By embedding material parameters directly into the neural network alongside spatial coordinates, CPDEM transforms conventional spatial collocation points into parameter-aware material points. Trained in an unsupervised manner via expected energy minimization over the parameter domain, the pre-trained model continuously learns the solution manifold. Consequently, it enables zero-shot, real-time inference of displacement fields for unknown material parameters without requiring any dataset generation or model retraining. The proposed method is rigorously validated across diverse benchmarks, including linear elasticity, finite-strain hyperelasticity, and complex highly nonlinear contact mechanics. To the best of our knowledge, CPDEM represents the first purely physics-driven deep learning paradigm capable of simultaneously and efficiently handling continuous multi-parameter variations in solid mechanics.
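The idea of feeding the material parameter into the model and minimizing the expected energy over the parameter range can be illustrated on a toy problem. The sketch below is not the CPDEM implementation: it assumes a hypothetical 1D bar (fixed at x = 0, unit length, cross-section `A`, tip load `P`) and replaces the neural network with a two-feature linear ansatz u(x, E) = x · (θ · [1/E, 1]). All names and values are illustrative assumptions.

```python
import numpy as np

# Hypothetical 1D bar: fixed at x=0, unit length, cross-section A, tip load P.
A, P = 1.0, 1.0
E_train = np.linspace(1.0, 2.0, 50)  # sampled Young's modulus values (assumed range)

def features(E):
    # Parameter-aware inputs for the ansatz u(x, E) = x * (theta . features(E)).
    E = np.asarray(E, dtype=float)
    return np.stack([1.0 / E, np.ones_like(E)], axis=-1)

def expected_energy_grad(theta, E):
    # The strain is c(E) = theta . features(E); the potential energy per sample
    # is 0.5*E*A*c^2 - P*c, so d(energy)/dc = E*A*c - P.
    phi = features(E)
    c = phi @ theta
    dPi_dc = E * A * c - P
    return (dPi_dc[:, None] * phi).mean(axis=0)

# Unsupervised "training": gradient descent on the mean energy over all samples.
theta = np.zeros(2)
for _ in range(5000):
    theta -= 0.5 * expected_energy_grad(theta, E_train)

def tip_displacement(E):
    # Displacement at x = 1 for a parameter value not in the training grid.
    return float(features(E) @ theta)

print(tip_displacement(1.37), P / (1.37 * A))
```

After training, `tip_displacement` can be queried for an unseen modulus without re-solving, which is the behaviour the abstract describes as zero-shot inference; here the analytic answer P/(E·A) is available for comparison.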

2603.26024 2026-03-30 cs.LG cs.LO

Identification of Bivariate Causal Directionality Based on Anticipated Asymmetric Geometries

Alex Glushkovsky

Comments 12 pages, 8 figures, 3 tables

Abstract

Identification of causal directionality in bivariate numerical data is a fundamental research problem with important practical implications. This paper presents two alternative methods to identify the direction of causation by considering conditional distributions: (1) Anticipated Asymmetric Geometries (AAG) and (2) Monotonicity Index. The AAG method compares the actual conditional distributions to anticipated ones along two variables. Different comparison metrics, such as correlation, cosine similarity, Jaccard index, K-L divergence, K-S distance, and mutual information, have been evaluated. Anticipated distributions have been projected as normal based on dual response statistics: mean and standard deviation. The Monotonicity Index approach compares the calculated monotonicity indexes of the gradients of conditional distributions along two axes and exhibits counts of gradient sign changes. Both methods assume stochastic properties of the bivariate data and exploit anticipated unimodality of conditional distributions of the effect. It turns out that the tuned AAG method outperforms the Monotonicity Index and reaches a top accuracy of 77.9%, compared to the ANM accuracy of 63 +/- 10%, when classifying 95 pairs of real-world examples (Mooij et al., 2014). The described methods include a number of hyperparameters that impact the accuracy of the identification. For a given set of hyperparameters, both the AAG and Monotonicity Index methods provide a unique deterministic outcome of the solution. To address sensitivity to hyperparameters, tuning of hyperparameters has been done by utilizing a full factorial Design of Experiments. A decision tree has been fitted to distinguish misclassified cases using the input data's symmetrical bivariate statistics to address the question of: How decisive is the identification method of causal directionality?
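The abstract does not define the Monotonicity Index, so the following is only a rough illustration of one ingredient it mentions, counting gradient sign changes in a binned conditional distribution. The function name `count_sign_changes` and the sample histograms are hypothetical.

```python
def count_sign_changes(values):
    """Count sign changes in the discrete gradient of a sequence.

    Flat steps (zero difference) are ignored, which is an assumption
    of this sketch rather than something stated in the abstract.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    signs = [1 if d > 0 else -1 for d in diffs if d != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

# Hypothetical bin counts of a conditional distribution.
unimodal = [1, 3, 7, 9, 6, 2]   # rises, then falls
bimodal = [1, 5, 2, 6, 8, 3]    # two peaks

print(count_sign_changes(unimodal))  # 1
print(count_sign_changes(bimodal))   # 3
```

A unimodal histogram yields a single sign change, while extra modes add more, which is consistent with the abstract's reliance on the anticipated unimodality of the effect's conditional distributions.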

2603.26023 2026-03-30 cs.LG

GLU: Global-Local-Uncertainty Fusion for Scalable Spatiotemporal Reconstruction and Forecasting

Linzheng Wang, Jason Chen, Nicolas Tricard, Zituo Chen, Sili Deng

2603.26019 2026-03-30 cs.CV cs.AI

Unlabeled Cross-Center Automatic Analysis for TAAD: An Integrated Framework from Segmentation to Clinical Features

Mengdi Liu, Qiang Li, Weizhi Nie, Shaopeng Zhang, Yuting Su

Abstract

Type A Aortic Dissection (TAAD) is a life-threatening cardiovascular emergency that demands rapid and precise preoperative evaluation. While key anatomical and pathological features are decisive for surgical planning, current research focuses predominantly on improving segmentation accuracy, leaving the reliable, quantitative extraction of clinically actionable features largely under-explored. Furthermore, constructing comprehensive TAAD datasets requires labor-intensive, expert-level pixel-wise annotations, which is impractical for most clinical institutions. Due to significant domain shift, models trained on a single-center dataset also suffer from severe performance degradation during cross-institutional deployment. This study addresses a clinically critical challenge: the accurate extraction of key TAAD clinical features during cross-institutional deployment in the total absence of target-domain annotations. To this end, we propose an unsupervised domain adaptation (UDA)-driven framework for the automated extraction of TAAD clinical features. The framework leverages limited source-domain labels while effectively adapting to unlabeled data from target domains. Tailored for real-world emergency workflows, our framework aims to achieve stable cross-institutional multi-class segmentation, reliable and quantifiable clinical feature extraction, and practical deployability independent of high-cost annotations. Extensive experiments demonstrate that our method significantly improves cross-domain segmentation performance compared to existing state-of-the-art approaches. More importantly, a reader study involving multiple cardiovascular surgeons confirms that the automatically extracted clinical features provide meaningful assistance for preoperative assessment, highlighting the practical utility of the proposed end-to-end segmentation-to-feature pipeline.

2603.26018 2026-03-30 cs.CV cs.RO

GeoReFormer: Geometry-Aware Refinement for Lane Segment Detection and Topology Reasoning

Danny Abraham, Nikhil Kamalkumar Advani, Arun Das, Nikil Dutt

Comments 8 pages, 6 figures

2603.26017 2026-03-30 cs.LG

QuitoBench: A High-Quality Open Time Series Forecasting Benchmark

Siqiao Xue, Zhaoyang Zhu, Wei Zhang, Rongyao Cai, Rui Wang, Yixiang Mu, Fan Zhou, Jianguo Li, Peng Di, Hang Yu

Comments project site: https://hq-bench.github.io/quito/

2603.25538 2026-03-30 cs.LG cs.SE

Missing-Aware Multimodal Fusion for Unified Microservice Incident Management

Wenzhuo Qian, Hailiang Zhao, Ziqi Wang, Zhipeng Gao, Jiayi Chen, Zhiwei Ling, Shuiguang Deng

2603.25406 2026-03-30 cs.RO

MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation

Yang Liu, Pengxiang Ding, Tengyue Jiang, Xudong Wang, Wenxuan Song, Minghui Lin, Han Zhao, Hongyin Zhang, Zifeng Zhuang, Wei Zhao, Siteng Huang, Jinkui Shi, Donglin Wang

2603.25377 2026-03-30 cs.SD

Joint Learning Global-Local Speaker Classification to Enhance End-to-End Speaker Diarization and Recognition

Yuhang Dai, Haopeng Lin, Jiale Qian, Ruiqi Yan, Hao Meng, Hanke Xie, Hanlin Wen, Shunshun Yin, Ming Tao, Xie Chen, Lei Xie, Xinsheng Wang

Comments 5 pages, 2 figures, 2 tables

2603.25197 2026-03-30 cs.AI cs.ET cs.HC cs.RO cs.SE

The Competence Shadow: Theory and Bounds of AI Assistance in Safety Engineering

Umair Siddique

Comments 8 pages, 3 figures, 2 tables

2603.25051 2026-03-30 cs.CL

Approaches to Analysing Historical Newspapers Using LLMs

Filip Dobranić, Tina Munda, Oliver Pejić, Vojko Gorjanc, Uroš Šmajdek, David Bordon, Jakob Lenardič, Tjaša Konovšek, Kristina Pahor de Maiti Tekavčič, Ciril Bohak, Darja Fišer

Abstract

This study presents a computational analysis of the Slovene historical newspapers Slovenec and Slovenski narod from the sPeriodika corpus, combining topic modelling, large language model (LLM)-based aspect-level sentiment analysis, entity-graph visualisation, and qualitative discourse analysis to examine how collective identities, political orientations, and national belonging were represented in public discourse at the turn of the twentieth century. Using BERTopic, we identify major thematic patterns and show both shared concerns and clear ideological differences between the two newspapers, reflecting their conservative-Catholic and liberal-progressive orientations. We further evaluate four instruction-following LLMs for targeted sentiment classification in OCR-degraded historical Slovene and select the Slovene-adapted GaMS3-12B-Instruct model as the most suitable for large-scale application, while also documenting important limitations, particularly its stronger performance on neutral sentiment than on positive or negative sentiment. Applied at dataset scale, the model reveals meaningful variation in the portrayal of collective identities, with some groups appearing predominantly in neutral descriptive contexts and others more often in evaluative or conflict-related discourse. We then create NER graphs to explore the relationships between collective identities and places. We apply a mixed methods approach to analyse the named entity graphs, combining quantitative network analysis with critical discourse analysis. The investigation focuses on the emergence and development of intertwined historical political and socionomic identities. Overall, the study demonstrates the value of combining scalable computational methods with critical interpretation to support digital humanities research on noisy historical newspaper data.

2603.25037 2026-03-30 cs.CV physics.geo-ph

GeoNDC: A Queryable Neural Data Cube for Planetary-Scale Earth Observation

Jianbo Qi, Mengyao Li, Baogui Jiang, Yidan Chen, Xihan Mu, Qiao Wang

Comments 22 pages, 8 figures

2603.24994 2026-03-30 cs.CV

Relaxed Rigidity with Ray-based Grouping for Dynamic Gaussian Splatting

Junoh Lee, Junmyeong Lee, Yeon-Ji Song, Inhwan Bae, Jisu Shin, Hae-Gon Jeon, Jin-Hwa Kim

Comments 24 pages, 7 figures

2603.24989 2026-03-30 cs.RO cs.AI

Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model

Ziyan Wang, Peng Chen, Ding Li, Chiwei Li, Qichao Zhang, Zhongpu Xia, Guizhen Yu

2603.24124 2026-03-30 cs.LG cs.AI cs.CL

The Alignment Tax: Response Homogenization in Aligned LLMs and Its Implications for Uncertainty Estimation

Mingyi Liu

Comments 25 pages, 3 figures, 10 tables, 24 experiments across 5 benchmarks. v2: added SINdex head-to-head (Exp 27), NLI validation (Exp 28), decoding protocol analysis. Code: https://github.com/DigitLion/ucbd-experiment

2603.24012 2026-03-30 cs.CL

CVPD at QIAS 2026: RAG-Guided LLM Reasoning for Al-Mawarith Share Computation and Heir Allocation

Wassim Swaileh, Mohammed-En-Nadhir Zighem, Hichem Telli, Salah Eddine Bekhouche, Abdellah Zakaria Sellam, Fadi Dornaika, Dimitrios Kotzinos

2603.23610 2026-03-30 cs.AI

Environment Maps: Structured Environmental Representations for Long-Horizon Agents

Yenchia Feng, Chirag Sharma, Karime Maamari

Comments 9 pages, 5 figures, accepted to ICLR 2026 the 2nd Workshop on World Models; updated formatting issue

2603.23533 2026-03-30 cs.CL cs.AI cs.IR cs.LG

MDKeyChunker: Single-Call LLM Enrichment with Rolling Keys and Key-Based Restructuring for High-Accuracy RAG

Bhavik Mangla

Comments 13 pages, 4 figures, 7 tables, 2 algorithms. Code: https://github.com/bhavik-mangla/MDKeyChunker

2603.23376 2026-03-30 cs.CV cs.RO

ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment

Yuzhi Chen, Ronghan Chen, Dongjie Huo, Yandan Yang, Dekang Qi, Haoyun Liu, Tong Lin, Shuang Zeng, Junjin Xiao, Xinyuan Chang, Feng Xiong, Xing Wei, Zhiheng Ma, Mu Xu

Comments Code: https://github.com/amap-cvlab/ABot-PhysWorld.git

2603.22755 2026-03-30 cs.CL cs.AI cs.LG

KALAVAI: Predicting When Independent Specialist Fusion Works -- A Quantitative Model for Post-Hoc Cooperative LLM Training

Ramchand Kumaresan

2603.22687 2026-03-30 cs.CV

GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning

Jiayin Sun, Caixia Sun, Boyu Yang, Hailin Li, Xiao Chen, Yi Zhang, Errui Ding, Liang Li, Chao Deng, Junlan Feng

Comments accepted by CVPR 2026

2603.22325 2026-03-30 cs.LG cs.AI

Hybrid Associative Memories

Leon Lufkin, Tomás Figliolia, Beren Millidge, Kamesh Krishnamurthy

Comments 30 pages, 10 figures

2603.21723 2026-03-30 cs.RO cs.MA

Can a Robot Walk the Robotic Dog: Triple-Zero Collaborative Navigation for Heterogeneous Multi-Agent Systems

Yaxuan Wang, Yifan Xiang, Ke Li, Xun Zhang, BoWen Ye, Zhuochen Fan, Fei Wei, Tong Yang

Comments 8 pages, 2 figures

2603.21606 2026-03-30 cs.LG cs.AI

mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT

Woosung Koh, Jeyoung Jeon, Youngjin Song, Yujin Cheon, Soowon Oh, Jaehyeong Choi, Se-Young Yun

Comments Pre-print (newer versions are minor edits)

2603.21077 2026-03-30 cs.CV

CoVFT: Context-aware Visual Fine-tuning for Multimodal Large Language Models

Nan Zhou, Huiqun Wang, Yaoyan Zheng, Di Huang

Comments Accepted by CVPR 2026

2603.20907 2026-03-30 cs.CL

The Hidden Puppet Master: Predicting Human Belief Change in Manipulative LLM Dialogues

Jocelyn Shen, Amina Luvsanchultem, Jessica Kim, Kynnedy Smith, Valdemar Danry, Kantwon Rogers, Hae Won Park, Maarten Sap, Cynthia Breazeal

2603.20833 2026-03-30 cs.AI

Governance-Aware Vector Subscriptions for Multi-Agent Knowledge Ecosystems

Steven Johnson

Comments 12 pages, 7 tables. Code and benchmark available at https://github.com/StevenJohnson998/AIngram

2603.20778 2026-03-30 cs.CV

PiLoT: Neural Pixel-to-3D Registration for UAV-based Ego and Target Geo-localization

Xiaoya Cheng, Long Wang, Yan Liu, Xinyi Liu, Hanlin Tan, Yu Liu, Maojun Zhang, Shen Yan

2603.19562 2026-03-30 cs.LG cs.IT math.IT physics.comp-ph

Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination

Dong-Xiao Zhang, Hu Lou, Jun-Jie Zhang, Jun Zhu, Deyu Meng

Comments 16 pages, 3 figures