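arXivDaily daily academic digest: synchronized with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.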

2605.05707 2026-05-08 cs.RO

On the Emergence of Pendular Structure in Multi-Contact Locomotion

Lingxue Lyu, Zihui Liu

Abstract

LIPM is everywhere in legged-locomotion control, but almost always as a modeling choice rather than as something the controller's cost actually prefers. This note tries to make that link more explicit. Working from a small centroidal OCP that penalizes the rate of angular momentum, we look at what its optimum tends to look like. Three things come out. With full-rank stance, the optimum drifts toward a pendular force pattern at a rate determined by the SVD of the moment Jacobian; the constant is set by foot-span geometry and matches the experiments to within 16%. With N=2 stance, as in trot, the friction cone introduces a lower bound on $\|\dot{H}_G\|$ that no amount of weight tuning fixes; we also see a non-smooth feasibility kink at a critical horizontal acceleration that we can write in closed form. Adding a task term that asks for a nonzero $\dot{H}_G$ moves the optimum off the pendular set in a predictable way. None of this is far from the classical ZMP/DCM picture. We test these claims on a point-mass quadruped and on the Unitree Go1 in MuJoCo (open-loop QP and a torque-level closed-loop controller), and we note where the asymptotic story stops being a good description of what the closed loop actually does.
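
For orientation, the standard centroidal relations that this abstract appears to build on can be written as below. This is a sketch in assumed notation (mass $m$, CoM position $c$, contact points $p_i$, contact forces $f_i$, gravity $g = (0, 0, -g_0)$, CoM height $z_c$); the paper's own symbols and its exact OCP are not given here and may differ.

```latex
% Assumed notation (not taken from the paper): m mass, c CoM position,
% p_i contact points, f_i contact forces, g = (0, 0, -g_0) gravity.
\begin{align}
  m\,\ddot{c} &= \sum_{i=1}^{N} f_i + m\,g, \\
  \dot{H}_G   &= \sum_{i=1}^{N} (p_i - c) \times f_i .
\end{align}
% Pendular case: \dot{H}_G = 0. With constant CoM height z_c above flat ground
% and the net contact force acting through a single ground point p (the ZMP),
% the horizontal dynamics reduce to the linear inverted pendulum model (LIPM):
\begin{equation}
  \ddot{c}_{xy} = \frac{g_0}{z_c}\,\bigl(c_{xy} - p_{xy}\bigr).
\end{equation}
```

Read this way, a cost on $\|\dot{H}_G\|$ favors force patterns whose net line of action passes through the CoM, which is the pendular structure the title refers to.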

2605.05706 2026-05-08 cs.AI q-bio.QM

Resolving the bias-precision paradox with stochastic causal representation learning for personalized medicine

Peisong Zhang, Manqiang Peng, Yuxuan Wu, Pawit Phadungsaksawasdi, Wesley Yeung, Ye Zhang, Trang Nguyen, Qiang Zhang, Nan Liu, Meng Wang, Kee Yuan Ngiam, Yih-Chung Tham, Ching-Yu Cheng, Tianfan Fu, Qingyu Chen, Rosemary Ke, Chang Li, Wenzhuo Yang, Zhenghao Lu, Chunyou Lai, Yu Zhang, Sheng Zhong, Hao Deng, Dianbo Liu

2605.05702 2026-05-08 cs.AI

Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents

Huyu Wu, Jun Liu, Xiaochi Wei, Yan Gao, Yi Wu, Yao Hu

2605.05701 2026-05-08 cs.AI

Inference-Time Budget Control for LLM Search Agents

Zhengru Fang, Senkang Forest Hu, Zhonghao Chang, Yu Guo, Yihang Tao, Hongyao Liu, Mengzhe Ruan, Jun Huang, Yuguang Fang

2605.05697 2026-05-08 cs.LG cs.AI

Budgeted Attention Allocation: Cost-Conditioned Compute Control for Efficient Transformers

Amrit Nidhi

Comments 12 pages, 1 figure, 10 tables

2605.05694 2026-05-08 cs.CV

Adaptive Physical-Facial Representation Fusion via Subject-Invariant Cross-Modal Prompt Tuning for Video-Based Emotion Recognition

Xiwen Luo, Jia Li, Rencheng Song, Yu Liu, Juan Cheng

Comments The source code will be available at https://github.com/MSA-LMC/SCPT

2605.05692 2026-05-08 cs.CV cs.AI cs.CR

CFE-PPAR: Compression-friendly encryption for privacy-preserving action recognition leveraging video transformers

Haiwei Lin, Shoko Imaizumi, Hitoshi Kiya

Comments 6 pages, 5 figures, accepted to 2026 IEEE International Conference on Image Processing (ICIP)

2605.05689 2026-05-08 cs.AI

GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model

Shaozhen Ma, Wei Huang, Hanchen Wang, Dong Wen, Wenjie Zhang

2605.05688 2026-05-08 cs.CV

R2H-Diff: Guided Spectral Diffusion Model for RGB-to-Hyperspectral Reconstruction

Songyu Ding, Ronggiang Zhao, Mingchun Sun, Jie Liu

2605.05687 2026-05-08 cs.AI

DataDignity: Training Data Attribution for Large Language Models

Xiaomin Li, Andrzej Banburski-Fahey, Jaron Lanier

2605.05685 2026-05-08 cs.LG cs.AI stat.ML

Temporal Functional Circuits: From Spline Plots to Faithful Explanations in KAN Forecasting

Naveen Mysore

Comments 9 pages, 4 figures, 6 tables, plus appendix. Under review at NeurIPS 2026

2605.05678 2026-05-08 cs.AI

Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering

Xiaomin Li, Jianheng Hou, Zheyuan Deng, Zhiwei Zhang, Taoran Li, Binghang Lu, Bing Hu, Yunhan Zhao, Yuexing Hao

2605.05676 2026-05-08 cs.CL cs.AI

Decomposing the Basic Abilities of Large Language Models: Mitigating Cross-Task Interference in Multi-Task Instruct-Tuning

Bing Wang, Ximing Li, Changchun Li, Jinjin Chi, Gang Niu, Masashi Sugiyama

Comments Accepted by ICML 2026. 25 pages, 13 figures. Code: https://github.com/wangbing1416/BADIT

2605.05668 2026-05-08 cs.AI cs.CV

Large Vision-Language Models Get Lost in Attention

Gongli Xi, Ye Tian, Mengyu Yang, Huahui Yi, Liang Lin, Xiaoshuai Hao, Kun Wang, Wendong Wang

Comments 25 pages, 10 figures. Accepted by ICML 2026

2605.05664 2026-05-08 cs.CV

Sparse-to-Complete: From Sparse Image Captures to Complete 3D Scenes

Yiyang Shen, Yin Yang, Kun Zhou, Tianjia Shao

Journal ref: SIGGRAPH 2026 Conf. Proc

Abstract

We introduce S2C-3D, a novel sparse-view 3D reconstruction framework for high-fidelity and complete scene reconstruction from as few as six to eight images. Our framework features three components: a specialized diffusion model for scene-specific image restoration, a training-free view-consistency conditioned sampling process in the diffusion model for refined Gaussian optimization, and a camera trajectory planning scheme to ensure comprehensive scene coverage. The specialized diffusion model is developed by finetuning a pretrained architecture on the input views and their corresponding degraded counterparts. The adaptation to the scene distribution allows the model to repair Gaussian renderings while effectively eliminating domain gaps. Meanwhile, the trajectory planning scheme optimizes scene coverage by connecting each newly sampled camera to its two nearest neighbors. By iteratively constructing paths and retaining only those that significantly enhance visibility, the scheme establishes a trajectory that covers the entire scene. To address multi-view conflicts, the view-consistency conditioned sampling process quantifies the consistency between neighboring repaired images. This information is injected as a condition into the sampling process of the frozen diffusion model, facilitating the generation of view-consistent images without additional training. Consequently, our approach produces high-fidelity 3D Gaussians that are robust to artifacts. Experimental results demonstrate that S2C-3D outperforms state-of-the-art methods, constructing high-quality scenes that are free from missing regions, blurring, or other artifacts with very sparse inputs. The source code and data are available at https://gapszju.github.io/S2C-3D.

2605.05662 2026-05-08 cs.CL cs.AI

XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity

Dasol Choi, Eugenia Kim, Jaewon Noh, Sang Seo, Eunmi Kim, Myunggyo Oh, Yunjin Park, Brigitta Jesica Kartono, Josef Pichlmeier, Helena Berndt, Sai Krishna Mendu, Glenn Johannes Tungka, Özlem Gökçe, Suresh Gehlot, Katherine Pratt, Amanda Minnich, Haon Park

2605.05660 2026-05-08 cs.LG math.OC

Distributionally Robust Multi-Objective Optimization

Yufeng Yang, Fangning Zhuo, Ziyi Chen, Heng Huang, Yi Zhou

Comments 47 pages

2605.05659 2026-05-08 cs.LG

Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks

Ying Chen, Aoxi Li, Jihun Kim, Javad Lavaei

Comments 27 pages, 6 figures

2605.05657 2026-05-08 cs.AI cs.MA

Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation

Abhijit Talluri, Pujith Anne, Bhagavan Choudary Pendiyala, Raghavendra Chilukuri

Comments 30 pages, 9 figures. Submitted to the NeurIPS 2026 Evaluations and Datasets Track; under review

2605.05653 2026-05-08 cs.CL

Negative Before Positive: Asymmetric Valence Processing in Large Language Models

Sohan Venkatesh

2605.05646 2026-05-08 cs.CV

MUSE: Resolving Manifold Misalignment in Visual Tokenization via Topological Orthogonality

Panqi Yang, Haodong Jing, Jiahao Chao, Tingyan Xiang, Li Lin, Yao Hu, Yang Luo, Yongqiang Ma

Comments 21 pages, accepted by ICML 2026 main track

2605.05643 2026-05-08 cs.AI cs.IR

Text-Graph Synergy: A Bidirectional Verification and Completion Framework for RAG

Jiarui Zhong, Hong Cai Chen

Comments 12 pages, 3 figures

2605.05640 2026-05-08 cs.CV

AffectSeek: Agentic Affective Understanding in Long Videos under Vague User Queries

Zhen Zhang, Yuhang Yang, Yunxiang Jiang, Yuhuan Lu, Haifeng Lu, Zheng Lian, Runhao Zeng, Xiping Hu

2605.05638 2026-05-08 cs.LG

Scaling Pretrained Representations Enables Label-Free Out-of-Distribution Detection Without Fine-Tuning

Brett Barkley, Preston Culbertson, David Fridovich-Keil

2605.05636 2026-05-08 cs.CV cs.GR

Learning a Delighting Prior for Facial Appearance Capture in the Wild

Yuxuan Han, Xin Ming, Tianxiao Li, Zhuofan Shen, Qixuan Zhang, Lan Xu, Feng Xu

Comments ACM Transactions on Graphics (Proc. of SIGGRAPH), 2026. Code: https://github.com/yxuhan/OpenDelight Project Page: https://yxuhan.github.io/OpenDelight/index.html

2605.05627 2026-05-08 cs.CV cs.AI cs.LG cs.RO

Leveraging Image Generators to Address Training Data Scarcity: The Gen4Regen Dataset for Forest Regeneration Mapping

Gabriel Jeanson, David-Alexandre Duclos, William Larrivée-Hardy, Noé Cochet, Matěj Boxan, Anthony Deschênes, François Pomerleau, Philippe Giguère

Comments 36 pages, 17 figures

Abstract

Sustainable forest management relies on precise species composition mapping, yet traditional ground surveys are labour-intensive and geographically constrained. While Uncrewed Aerial Vehicles (UAVs) offer scalable data collection, the transition to deep learning-based interpretation is bottlenecked by the severe scarcity of expert-annotated imagery, particularly in complex, visually heterogeneous regeneration zones. This paper addresses the dual challenges of data scarcity and extreme class imbalance in the semantic segmentation of fine-grained forest regeneration species by providing a scalable framework that reduces reliance on manual photo-interpretation for high-resolution, millimetre-level aerial imagery. Importantly, we leverage the large-scale vision-language Nano Banana Pro model to simultaneously generate high-fidelity images and their corresponding pixel-aligned semantic masks from prompts. We introduce WilDReF-Q-V2, an expansion of a natural forest dataset with 13 977 new unlabelled and 50 labelled real images, as well as the Gen4Regen dataset, featuring 2101 pairs of synthetic images and semantic masks. Our methodology integrates real-world data with AI-generated images, highlighting that AI-generated data is highly complementary to real-world data, with unified training yielding an F1 score improvement of over 15 %pt compared to purely supervised baselines. Furthermore, we demonstrate that even small quantities of prompt-generated data significantly improve performance for underrepresented species, some of which saw per-species F1 score gains of up to 30 %pt. We conclude that vision-language models can serve as agile data generators, effectively bootstrapping perception tasks for niche AI domains where expert labels are scarce or unavailable. Our datasets, source code, and models will be available at https://norlab-ulaval.github.io/gen4regen.

2605.05626 2026-05-08 cs.CL cs.AI

When2Speak: A Dataset for Temporal Participation and Turn-Taking in Multi-Party Conversations for Large Language Models

Vihaan Nama, Shreya Mendi, Zian Ye, Brinnae Bent

Comments Currently under review. Dataset available at https://huggingface.co/datasets/duke-trust-lab/When2Speak

Abstract

Large Language Models (LLMs) excel at generating contextually appropriate responses but remain poorly calibrated for multi-party conversations, where deciding when to speak is as critical as what to say. In such settings, naively responding at every turn leads to excessive interruptions and degraded conversational coherence. We introduce When2Speak, a grounded synthetic dataset and four-stage generation pipeline for learning intervention timing in group interactions. The dataset comprises over 215,000 examples derived from 16,000 conversations involving 2-6 speakers, spanning diverse conversational styles, tones, and participant dynamics, and explicitly modeling SPEAK vs. SILENT decisions at each turn. Our pipeline combines real-world grounding, structured augmentation, controlled transcript synthesis, and fine-tuning-ready supervision, and is fully open-sourced to support reproducibility and adaptation to domain-specific conversational norms. Across multiple model families, supervised fine-tuning (SFT) on When2Speak significantly outperforms zero-shot baselines (e.g., the average Macro F1 increase across 4B+ parameter models was 60%, with the largest increase being 120%). However, SFT-trained models remain systematically over-conservative, missing nearly half of warranted interventions as seen through the Missed Intervention Rate (MIR), which was on average 0.50 and is noticed even at larger model sizes. To address this limitation, we apply reinforcement learning with asymmetric reward shaping, which reduces MIR to 0.186-0.218 and increases recall from 0.479 to 0.78-0.81. Our findings establish that temporal participation is a distinct and trainable dimension of conversational intelligence, and that grounded synthetic data provides an effective and scalable pathway for enabling LLMs to participate more naturally and appropriately in multi-party interactions.
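
The metrics named in this abstract can be illustrated with a minimal sketch for per-turn SPEAK vs. SILENT labels. The function names are hypothetical, and the Missed Intervention Rate is assumed here to be the fraction of warranted SPEAK turns that the model predicted as SILENT; the paper may define or average it differently.

```python
from typing import Sequence, Tuple

SPEAK, SILENT = "SPEAK", "SILENT"


def counts(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores for SPEAK and SILENT."""
    scores = [f1(*counts(y_true, y_pred, label)) for label in (SPEAK, SILENT)]
    return sum(scores) / len(scores)


def speak_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of warranted SPEAK turns that were predicted as SPEAK."""
    tp, _, fn = counts(y_true, y_pred, SPEAK)
    return tp / (tp + fn) if (tp + fn) else 0.0


def missed_intervention_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Assumed definition: fraction of warranted SPEAK turns predicted as SILENT."""
    tp, _, fn = counts(y_true, y_pred, SPEAK)
    return fn / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Toy labels, invented for illustration only (not from the dataset).
    gold = [SPEAK, SPEAK, SILENT, SILENT, SPEAK, SILENT]
    pred = [SPEAK, SILENT, SILENT, SILENT, SILENT, SPEAK]
    print("macro F1:", macro_f1(gold, pred))
    print("SPEAK recall:", speak_recall(gold, pred))
    print("MIR:", missed_intervention_rate(gold, pred))
```

Under this assumed definition MIR and SPEAK recall sum to one on any single evaluation set, so an over-conservative model shows up as high MIR and low recall at the same time.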

2605.05623 2026-05-08 cs.LG

Region-adaptable retrieval of coastal biogeochemical parameters from near-surface hyperspectral remote sensing reflectance using physics-aware meta-learning

Yiqing Guo, Nagur R. C. Cherukuru, Eric A. Lehmann, S. L. Kesav Unnithan, Tim J. Malthus, Gemma Kerrisk, Xiubin Qi, Faisal Islam, Tisham Dhar, Mark J. Doubell

Abstract

Hyperspectral in situ sensing has shown promise in retrieving aquatic biogeochemical (BGC) parameters, such as total suspended solids, dissolved organic carbon, and total chlorophyll-a, for cost-effective monitoring of coastal water quality. However, generalising such retrieval algorithms across water bodies remains challenging, as the relationship between remote sensing reflectance (Rrs) and BGC parameters can vary considerably from one region to another due to regional distinctions in environmental conditions and biogeochemistry that lead to different BGC ranges and bio-optical properties. In this study, we propose a two-stage physics-aware meta-learning framework for retrieving coastal BGC parameters from near-surface Rrs observations. In the first stage, a bio-optical forward model is used to generate a large synthetic dataset based on an in situ bio-optical spectral library with broad representativeness of Australian coastal waters. This dataset is then used to pretrain a region-agnostic base model with meta-learning, allowing the model to learn fundamental physical relationships. In the second stage, the pretrained base model is fine-tuned for specific regions with local samples. We collected in situ hyperspectral Rrs and BGC measurements from five geographically distinct sites in Australian coastal waters. Our experimental results suggest: (1) the BGC parameters and their corresponding hyperspectral Rrs signatures exhibited clear regional distinctions among the experimental sites; (2) the synthetic dataset was physically plausible and closely aligned with real-world samples in both parameter distributions and inter-parameter correlations; (3) the proposed approach outperformed five benchmark models in BGC retrieval; and (4) time series of in situ measured and model-predicted BGC parameters showed good agreement in both magnitude and temporal dynamics.

2605.05616 2026-05-08 cs.CV cs.LG

RAM-H1200: A Unified Evaluation and Dataset on Hand Radiographs for Rheumatoid Arthritis

Songxiao Yang, Haolin Wang, Yao Fu, Junmu Peng, Lin Fan, Hongruixuan Chen, Jian Song, Masayuki Ikebe, Shinya Takamaeda-Yamazaki, Masatoshi Okutomi, Tamotsu Kamishima, Yafei Ou

Comments 50 pages, 24 figures, 25 tables

2605.05609 2026-05-08 cs.LG econ.EM stat.ML

Optimal Contextual Pricing under Agnostic Non-Lipschitz Demand

Jianyu Xu, Yu-Xiang Wang

Comments 30 pages, 1 figure, 1 table