arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2604.08921 2026-04-13 cs.CV

TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction

Ao Li, Yonggen Ling, Yiyang Lin, Yuji Wang, Yong Deng, Yansong Tang

详情

英文摘要

Accurate 3D human keypoints localization is a critical technology enabling robots to achieve natural and safe physical interaction with users. Conventional 3D human keypoints estimation methods primarily focus on the whole-body reconstruction quality relative to the root joint. However, in practical human-robot interaction (HRI) scenarios, robots are more concerned with the precise metric-scale spatial localization of task-relevant body parts under the egocentric camera 3D coordinate. We propose TAIHRI, the first Vision-Language Model (VLM) tailored for close-range HRI perception, capable of understanding users' motion commands and directing the robot's attention to the most task-relevant keypoints. By quantizing 3D keypoints into a finite interaction space, TAIHRI precisely localize the 3D spatial coordinates of critical body parts by 2D keypoint reasoning via next token prediction, and seamlessly adapt to downstream tasks such as natural language control or global space human mesh recovery. Experiments on egocentric interaction benchmarks demonstrate that TAIHRI achieves superior estimation accuracy for task-critical body parts. We believe TAIHRI opens new research avenues in the field of embodied human-robot interaction. Code is available at: https://github.com/Tencent/TAIHRI.

URL PDF HTML ☆

赞 0 踩 0

2604.08916 2026-04-13 cs.CV

MV3DIS: Multi-View Mask Matching via 3D Guides for Zero-Shot 3D Instance Segmentation

Yibo Zhao, Yigong Zhang, Jin Xie

2604.08915 2026-04-13 cs.CV cs.AI

Large-Scale Universal Defect Generation: Foundation Models and Datasets

Yuanting Fan, Jun Liu, Bin-Bin Gao, Xiaochen Chen, Yuhuan Lin, Zhewei Dai, Jiawei Zhan, Chengjie Wang

Comments 25 pages, 13 figures, preprint

2604.08905 2026-04-13 cs.AI cs.LG

StaRPO: Stability-Augmented Reinforcement Policy Optimization

Jinghan Zhang, Fengran Mo, Tharindu Cyril Weerasooriya, Ruimin Dai, Xiaoyan Han, Yanjie Fu, Dakuo Wang, Kunpeng Liu

2604.08903 2026-04-13 cs.CV

Fast Model-guided Instance-wise Adaptation Framework for Real-world Pansharpening with Fidelity Constraints

Zhiqi Yang, Jin-Liang Xiao, Shan Yin, Liang-Jian Deng, Gemine Vivone

2604.08902 2026-04-13 cs.LG

Using Synthetic Data for Machine Learning-based Childhood Vaccination Prediction in Narok, Kenya

Jimmy Bach, Yang Li, Yaqi Liu, John Sankok, Rose Kimani, Carrie B. Dolan, Julius N. Odhiambo, Haipeng Chen

详情

英文摘要

Background: Limited data utilization in low-resource settings poses a barrier to the vaccine delivery ecosystem, undermining efforts to achieve equitable immunization coverage. In nomadic populations, individuals face an increased risk of missing crucial vaccination doses as children. One such population is the Maasai in Narok County, Kenya, where the absence of high-volume, quality data hampers accurate coverage estimates, impedes efficient resource allocation, and weakens the ability to deliver timely interventions. Additionally, data privacy concerns are heightened in groups with limited sensitive data. Objectives: First, we aim to identify children at risk of missing key vaccines across a large population to provide timely, evidence-based interventions that support increased vaccination coverage. Second, we aim to better protect the privacy of sensitive health data in a vulnerable population. Methods: We digitized 8 years of child vaccination records from the MOH 510 registry (n=6,913) and applied machine learning models (Logistic Regression and XGBoost) to identify children at risk. Additionally, we utilize a novel approach to tabular diffusion-based synthetic data generation (TabSyn) to protect patient privacy within the models. Results: Our findings show that classification techniques can reliably and successfully predict children at risk of missing a vaccine, with recall, precision, and F1-scores exceeding 90% for some vaccines modeled. Additionally, training these models with synthetic data rather than real data, thus preserving the privacy of individuals within the original dataset, does not lead to a loss in predictive performance. Conclusion: These results support the use of synthetic data implementation in health informatics strategies for clinics with limited digital infrastructure, enabling privacy-preserving, scalable forecasting for childhood immunization coverage.

URL PDF HTML ☆

赞 0 踩 0

2604.08896 2026-04-13 cs.CV

GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing

Aoran Xiao, Shihao Cheng, Yonghao Xu, Yexian Ren, Hongruixuan Chen, Naoto Yokoya

Comments CVPR 2026 Highlight paper

2604.08893 2026-04-13 cs.CV cs.AI

Adaptive Dual Residual U-Net with Attention Gate and Multiscale Spatial Attention Mechanisms (ADRUwAMS)

Mohsen Yaghoubi Suraki

2604.08891 2026-04-13 cs.LG

Adaptive Candidate Point Thompson Sampling for High-Dimensional Bayesian Optimization

Donney Fan, Geoff Pleiss

Comments AISTATS 2026

2604.08890 2026-04-13 cs.LG cs.AI

A Closer Look at the Application of Causal Inference in Graph Representation Learning

Hang Gao, Kunyu Li, Huang Hong, Baoquan Cui, Fengge Wu

2604.08885 2026-04-13 cs.LG

Uncertainty-Aware Transformers: Conformal Prediction for Language Models

Abhiram Vellore, Niraj K. Jha

2604.08884 2026-04-13 cs.CV cs.AI

HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing

Xinyu Zhang, Zurong Mai, Qingmei Li, Zjin Liao, Yibin Wen, Yuhang Chen, Xiaoya Fan, Chan Tsz Ho, Bi Tianyuan, Haoyuan Liang, Ruifeng Su, Zihao Qian, Juepeng Zheng, Jianxi Huang, Yutong Lu, Haohuan Fu

2604.08883 2026-04-13 cs.RO cs.AI

HTNav: A Hybrid Navigation Framework with Tiered Structure for Urban Aerial Vision-and-Language Navigation

Chengjie Fan, Cong Pan, Zijian Liu, Ningzhong Liu, Jie Qin

2604.08881 2026-04-13 cs.CV

Precise Shield: Explaining and Aligning VLLM Safety via Neuron-Level Guidance

Enyi Shi, Fei Shen, Shuyi Miao, Linxia Zhu, Pengyang Shao, Jinhui Tang, Tat-Seng Chua

2604.08880 2026-04-13 cs.LG cs.AI cs.CL

Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective

Tokio Kajitsuka, Ukyo Honda, Sho Takase

Comments 19 pages, 6 figures

2604.08879 2026-04-13 cs.CL

GRASP: Grounded CoT Reasoning with Dual-Stage Optimization for Multimodal Sarcasm Target Identification

Faxian Wan, Xiaocui Yang, Yifan Cao, Shi Feng, Daling Wang, Yifei Zhang

2604.08877 2026-04-13 cs.CV

Harnessing Weak Pair Uncertainty for Text-based Person Search

Jintao Sun, Zhedong Zheng, Gangyi Ding

Comments 39 pages, 15 tables, 7 figures

2604.08867 2026-04-13 cs.SD cs.AI

AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models

Mintong Kang, Chen Fang, Bo Li

2604.08865 2026-04-13 cs.AI

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks

Tianyi Wang, Yixia Li, Long Li, Yibiao Chen, Shaohan Huang, Yun Chen, Peng Li, Yang Liu, Guanhua Chen

Comments ACL 2026 Main

2604.08863 2026-04-13 cs.AI

Hidden in Plain Sight: Visual-to-Symbolic Analytical Solution Inference from Field Visualizations

Pengze Li, Jiaquan Zhang, Yunbo Long, Xinping Liu, Zhou wenjie, Encheng Su, Zihang Zeng, Jiaqi Liu, Jiyao Liu, Junchi Yu, Lihao Liu, Philip Torr, Shixiang Tang, Aoran Wang, Xi Chen

2604.08858 2026-04-13 cs.CV

BIAS: A Biologically Inspired Algorithm for Video Saliency Detection

Zhao-ji Zhang, Ya-tang Li

2604.08851 2026-04-13 cs.CL

Cross-Lingual Attention Distillation with Personality-Informed Generative Augmentation for Multilingual Personality Recognition

Jing Jie Tan, Ban-Hoe Kwan, Danny Wee-Kiat Ng, Yan-Chai Hum, Noriyuki Kawarazaki, Kosuke Takano

Comments IEEE Transactions on Cognitive and Developmental Systems (2026)

详情

DOI: 10.1109/TCDS.2026.3682672

英文摘要

While significant work has been done on personality recognition, the lack of multilingual datasets remains an unresolved challenge. To address this, we propose ADAM (Cross-Lingual (A)ttention (D)istillation with Personality-Guided Generative (A)ugmentation for (M)ultilingual Personality Recognition), a state-of-the-art approach designed to advance multilingual personality recognition. Our approach leverages an existing English-language personality dataset as the primary source and employs a large language model (LLM) for translationbased augmentation, enhanced by Personality-Informed Generative Augmentation (PIGA), to generate high-quality training data in multiple languages, including Japanese, Chinese, Malay, and French. We provide a thorough analysis to justify the effectiveness of these augmentation techniques. Building on these advancements, ADAM integrates Cross-Lingual Attention Distillation (CLAD) to train a model capable of understanding and recognizing personality traits across languages, bridging linguistic and cultural gaps in personality analysis. This research presents a thorough evaluation of the proposed augmentation method, incorporating an ablation study on recognition performance to ensure fair comparisons and robust validation. Overall, with PIGA augmentation, the findings demonstrate that CLAD significantly outperforms the standard BCE across all languages and personality traits, achieving notable improvements in average BA scores - 0.6332 (+0.0573) on the Essays dataset and 0.7448 (+0.0968) on the Kaggle dataset. The CLAD-trained model also demonstrated strong generalizability and achieved benchmark performance comparable to current leading encoder models. The model weight, dataset, and algorithm repository are available at https://research.jingjietan.com/?q=ADAM.

URL PDF HTML ☆

赞 0 踩 0

2604.08850 2026-04-13 cs.LG

Finite-Sample Analysis of Nonlinear Independent Component Analysis:Sample Complexity and Identifiability Bounds

Yuwen Jiang

2604.08847 2026-04-13 cs.CV

DeFakeQ: Enabling Real-Time Deepfake Detection on Edge Devices via Adaptive Bidirectional Quantization

Xiangyu Li, Yujing Sun, Yuhang Zheng, Yuexin Ma, Kwok-Yan Lam

2604.08846 2026-04-13 cs.LG cs.AI cs.CL cs.CV

Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs

Jinqi Luo, Jinyu Yang, Tal Neiman, Lei Fan, Bing Yin, Son Tran, Mubarak Shah, René Vidal

Comments Accepted in CVPR 2026. Project page: https://peterljq.github.io/project/daco

2604.08844 2026-04-13 cs.LG

Spectral Geometry of LoRA Adapters Encodes Training Objective and Predicts Harmful Compliance

Roi Paul

Comments 15 pages, 8 figures, pre-registered experiment, data at https://github.com/roip/task-geometry-experiment-results

详情

英文摘要

We study whether low-rank spectral summaries of LoRA weight deltas can identify which fine-tuning objective was applied to a language model, and whether that geometric signal predicts downstream behavioral harm. In a pre-registered experiment on \texttt{Llama-3.2-3B-Instruct}, we manufacture 38 LoRA adapters across four categories: healthy SFT baselines, DPO on inverted harmlessness preferences, DPO on inverted helpfulness preferences, and activation-steering-derived adapters, and extract per-layer spectral features (norms, stable rank, singular-value entropy, effective rank, and singular-vector cosine alignment to a healthy centroid). Within a single training method (DPO), a logistic regression classifier achieves AUC~1.00 on binary drift detection, all six pairwise objective comparisons, and near-perfect ordinal severity ranking ($ρ\geq 0.956$). Principal component analysis on flattened weight deltas reveals that training objective is PC1 (AUC~1.00 for objective separation), orthogonal to training duration on PC2. Query-projection weights detect that drift occurred; value-projection weights identify which objective. Cross-method generalization fails completely: a DPO-trained classifier assigns every steering adapter a lower drift score than every DPO adapter (AUC~0.00). In a behavioral evaluation phase, DPO-inverted-harmlessness adapters show elevated harmful compliance on HEx-PHI prompts (mean ASR 0.266 vs.\ healthy 0.112, $Δ= +0.154$), with near-perfect dose--response ($ρ= 0.986$). The geometry-to-behavior rank correlation is $ρ= 0.72$ across 24 non-steered adapters. These results establish that within a controlled manufacturing regime, LoRA weight-space geometry carries objective identity, intensity ordering, and a coarse link to harmful compliance, and that cross-method monitoring requires per-method calibration.

URL PDF HTML ☆

赞 0 踩 0

2604.08837 2026-04-13 cs.LG

Discrete Meanflow Training Curriculum

Chia-Hong Hsu, Frank Wood

2604.08836 2026-04-13 cs.CV

CatalogStitch: Dimension-Aware and Occlusion-Preserving Object Compositing for Catalog Image Generation

Sanyam Jain, Pragya Kandari, Manit Singhal, He Zhang, Soo Ye Kim

Comments CVPR 2026 HiGen Workshop. Project page, https://catalogstitch.github.io

2604.08829 2026-04-13 cs.LG cs.NE stat.ML

Hierarchical Kernel Transformer: Multi-Scale Attention with an Information-Theoretic Approximation Analysis

Giansalvo Cirrincione

Comments 20 pages, 3 figures, 8 tables submitted to Neurocomputing

2604.08828 2026-04-13 cs.LG cs.CV

Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning

Chia-Hong Hsu, Randall Balestriero