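arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.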

2603.12388 2026-03-16 cs.CV cs.HC

Deployment-Oriented Session-wise Meta-Calibration for Landmark-Based Webcam Gaze Tracking

Chenkai Zhang

Comments 24 pages, 7 figures. Deployment-oriented landmark-only webcam gaze tracking with browser-capable runtime

Abstract

Practical webcam gaze tracking is constrained not only by error, but also by calibration burden, robustness to head motion and session drift, runtime footprint, and browser use. We therefore target a deployment-oriented operating point rather than the image large-backbone regime. We cast landmark-based point-of-regard estimation as session-wise adaptation: a shared geometric encoder produces embeddings that can be aligned to a new session from a small calibration set. We present Equivariant Meta-Calibrated Gaze (EMC-Gaze), a lightweight landmark-only method combining an E(3)-equivariant landmark-graph encoder, local eye geometry, binocular emphasis, auxiliary 3D gaze-direction supervision, and a closed-form ridge calibrator differentiated through episodic meta-training. To reduce pose leakage, we use a two-view canonicalization consistency loss. The deployed predictor uses only facial landmarks and fits a per-session ridge head from brief calibration. In a fixation-style interactive evaluation over 33 sessions at 100 cm, EMC-Gaze achieves 5.79 +/- 1.81 deg RMSE after 9-point calibration versus 6.68 +/- 2.34 deg for Elastic Net; the gain is larger on still-head queries (2.92 +/- 0.75 deg vs. 4.45 +/- 0.30 deg). Across three subject holdouts of 10 subjects each, EMC-Gaze retains an advantage (5.66 +/- 0.19 deg vs. 6.49 +/- 0.33 deg). On MPIIFaceGaze with short per-session calibration, the eye-focused model reaches 8.82 +/- 1.21 deg at 16-shot calibration, ties Elastic Net at 1-shot, and outperforms it from 3-shot onward. The exported eye-focused encoder has 944,423 parameters, is 4.76 MB in ONNX, and supports calibrated browser prediction in 12.58/12.58/12.90 ms per sample (mean/median/p90) in Chromium 145 with ONNX Runtime Web. These results position EMC-Gaze as a calibration-friendly operating point rather than a universal state-of-the-art claim against heavier appearance-based systems.
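
The per-session ridge head mentioned in the abstract can be illustrated with a minimal sketch. It assumes the encoder has already mapped each calibration sample to a fixed-length embedding. The names `fit_ridge_head` and `predict_gaze`, the embedding size, and the penalty `lam` are hypothetical and not taken from the paper. The encoder itself and the differentiation through episodic meta-training are not reproduced here.

```python
import numpy as np

def fit_ridge_head(embeddings, targets, lam=1.0):
    """Closed-form ridge fit: W = (X^T X + R)^(-1) X^T Y.

    embeddings: (n_calib, d) array from a hypothetical encoder
    targets:    (n_calib, 2) on-screen gaze targets
    lam:        assumed ridge penalty, applied to weights but not the bias
    """
    n = embeddings.shape[0]
    X = np.hstack([embeddings, np.ones((n, 1))])  # append a bias column
    R = lam * np.eye(X.shape[1])
    R[-1, -1] = 0.0                               # leave the bias unpenalized
    return np.linalg.solve(X.T @ X + R, X.T @ targets)

def predict_gaze(W, embeddings):
    """Apply the fitted head to new embeddings from the same session."""
    n = embeddings.shape[0]
    X = np.hstack([embeddings, np.ones((n, 1))])
    return X @ W

# Usage: a 9-point calibration with made-up 4-dimensional embeddings.
rng = np.random.default_rng(0)
calib_emb = rng.normal(size=(9, 4))
calib_xy = calib_emb @ rng.normal(size=(4, 2)) + np.array([0.5, -0.2])
W = fit_ridge_head(calib_emb, calib_xy, lam=1e-8)
pred_xy = predict_gaze(W, calib_emb)
```

Because the fit is a single linear solve, recalibrating for a new session is cheap, which is consistent with the short-calibration setting the abstract describes.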

2603.12382 2026-03-16 cs.CV cs.AI

SPARROW: Learning Spatial Precision and Temporal Referential Consistency in Pixel-Grounded Video MLLMs

Mohamad Alansari, Naufal Suryanto, Divya Velayudhan, Sajid Javed, Naoufel Werghi, Muzammal Naseer

Comments Accepted at CVPR 2026; Project page: https://risys-lab.github.io/SPARROW; Repository: https://github.com/RISys-Lab/SPARROW

2603.12378 2026-03-16 cs.LG cs.CL

NeuroLoRA: Context-Aware Neuromodulation for Parameter-Efficient Multi-Task Adaptation

Yuxin Yang, Haoran Zhang, Mingxuan Li, Jiachen Xu, Ruoxi Shen, Zhenyu Wang, Tianhao Liu, Siqi Chen, Weilin Huang

Comments work in progress

2603.12369 2026-03-16 cs.CV

Human Knowledge Integrated Multi-modal Learning for Single Source Domain Generalization

Ayan Banerjee, Kuntal Thakur, Sandeep Gupta

2603.12361 2026-03-16 cs.RO

GNN-DIP: Neural Corridor Selection for Decomposition-Based Motion Planning

Peng Xie, Yanlinag Huang, Wenyuan Wu, Amr Alanwar

2603.12353 2026-03-16 cs.LG

Spatial PDE-aware Selective State-space with Nested Memory for Mobile Traffic Grid Forecasting

Zineddine Bettouche, Khalid Ali, Andreas Fischer, Andreas Kassler

2603.12350 2026-03-16 cs.CL cs.SD

TASTE-Streaming: Towards Streamable Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling

Liang-Hsuan Tseng, Hung-yi Lee

Comments Work in progress

2603.12349 2026-03-16 cs.LG cs.AI q-bio.QM stat.ML

Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection

Abhinaba Basu, Pavan Chakraborty

Abstract

Scientific discovery increasingly relies on AI systems to select candidates for expensive experimental validation, yet no principled, budget-aware evaluation framework exists for comparing selection strategies -- a gap intensified by large language models (LLMs), which generate plausible scientific proposals without reliable downstream evaluation. We introduce the Budget-Sensitive Discovery Score (BSDS), a formally verified metric -- 20 theorems machine-checked by the Lean 4 proof assistant -- that jointly penalizes false discoveries (lambda-weighted FDR) and excessive abstention (gamma-weighted coverage gap) at each budget level. Its budget-averaged form, the Discovery Quality Score (DQS), provides a single summary statistic that no proposer can inflate by performing well at a cherry-picked budget. As a case study, we apply BSDS/DQS to: do LLMs add marginal value to an existing ML pipeline for drug discovery candidate selection? We evaluate 39 proposers -- 11 mechanistic variants, 14 zero-shot LLM configurations, and 14 few-shot LLM configurations -- using SMILES representations on MoleculeNet HIV (41,127 compounds, 3.5% active, 1,000 bootstrap replicates) under both random and scaffold splits. Three findings emerge. First, the simple RF-based Greedy-ML proposer achieves the best DQS (-0.046), outperforming all MLP variants and LLM configurations. Second, no LLM surpasses the Greedy-ML baseline under zero-shot or few-shot evaluation on HIV or Tox21, establishing that LLMs provide no marginal value over an existing trained classifier. Third, the proposer hierarchy generalizes across five MoleculeNet benchmarks spanning 0.18%-46.2% prevalence, a non-drug AV safety domain, and a 9x7 grid of penalty parameters (tau >= 0.636, mean tau = 0.863). The framework applies to any setting where candidates are selected under budget constraints and asymmetric error costs.

2603.12347 2026-03-16 cs.RO

A Learning-Based Approach for Contact Detection, Localization, and Force Estimation of Continuum Manipulators With Integrated OFDR Optical Fiber

Mobina Tavangarifard, Jonathan S. Kacines, Qiyu Li, Farshid Alambeigi

Comments 8 pages, 6 figures

2603.12343 2026-03-16 cs.CL

LLM-Augmented Therapy Normalization and Aspect-Based Sentiment Analysis for Treatment-Resistant Depression on Reddit

Yuxin Zhu, Sahithi Lakamana, Masoud Rouhizadeh, Selen Bozkurt, Rachel Hershenberg, Abeed Sarker

2603.12325 2026-03-16 cs.LG cs.AI

Maximum Entropy Exploration Without the Rollouts

Jacob Adamczyk, Adam Kamoski, Rahul V. Kulkarni

2603.12324 2026-03-16 cs.LG cs.AI

Thermodynamics of Reinforcement Learning Curricula

Jacob Adamczyk, Juan Sebastian Rojas, Rahul V. Kulkarni

Comments Accepted at SciForDL Workshop at ICLR 2026

2603.12310 2026-03-16 cs.CV cs.AI cs.LG cs.MA

VQQA: An Agentic Approach for Video Evaluation and Quality Improvement

Yiwen Song, Tomas Pfister, Yale Song

2603.12305 2026-03-16 cs.LG cs.AI

HCP-DCNet: A Hierarchical Causal Primitive Dynamic Composition Network for Self-Improving Causal Understanding

Ming Lei, Shufan Wu, Christophe Baehr

Comments 17 pages, 2 figures, submitted to a journal and under review

2603.12304 2026-03-16 cs.LG cs.AI

A Geometrically-Grounded Drive for MDL-Based Optimization in Deep Learning

Ming Lei, Shufan Wu, Christophe Baehr

Comments 8 pages, 9 figures, submitted to a journal and under review

2603.12298 2026-03-16 cs.LG cs.AI

Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency

Xinyan Jiang, Wenjing Yu, Di Wang, Lijie Hu

2603.12293 2026-03-16 cs.LG cs.NE

Multi-objective Genetic Programming with Multi-view Multi-level Feature for Enhanced Protein Secondary Structure Prediction

Yining Qian, Lijie Su, Meiling Xu, Xianpeng Wang

2603.12288 2026-03-16 cs.LG cs.AI stat.ML

From Garbage to Gold: A Data-Architectural Theory of Predictive Robustness

Terrence J. Lee-St. John, Jordan L. Lawson, Bartlomiej Piechowski-Jozwiak

Comments 120 pages, 12 figures, 3 tables. Simulation code and documentation available at: https://github.com/tjleestjohn/from-garbage-to-gold

Abstract

Tabular machine learning presents a paradox: modern models achieve state-of-the-art performance using high-dimensional (high-D), collinear, error-prone data, defying the "Garbage In, Garbage Out" mantra. To help resolve this, we synthesize principles from Information Theory, Latent Factor Models, and Psychometrics, clarifying that predictive robustness arises not solely from data cleanliness, but from the synergy between data architecture and model capacity. Partitioning predictor-space "noise" into "Predictor Error" and "Structural Uncertainty" (informational deficits from stochastic generative mappings), we prove that leveraging high-D sets of error-prone predictors asymptotically overcomes both types of noise, whereas cleaning a low-D set is fundamentally bounded by Structural Uncertainty. We demonstrate why "Informative Collinearity" (dependencies from shared latent causes) enhances reliability and convergence efficiency, and explain why increased dimensionality reduces the latent inference burden, enabling feasibility with finite samples. To address practical constraints, we propose "Proactive Data-Centric AI" to identify predictors that enable robustness efficiently. We also derive boundaries for Systematic Error Regimes and show why models that absorb "rogue" dependencies can mitigate assumption violations. Linking latent architecture to Benign Overfitting, we offer a first step towards a unified view of robustness to Outcome Error and predictor-space noise, while also delineating when traditional DCAI's focus on label cleaning remains powerful. By redefining data quality from item-level perfection to portfolio-level architecture, we provide a theoretical rationale for "Local Factories" -- learning from live, uncurated enterprise "data swamps" -- supporting a deployment paradigm shift from "Model Transfer" to "Methodology Transfer" to overcome static generalizability limitations.

2603.12287 2026-03-16 cs.AI cs.CL cs.DB

Context-Enriched Natural Language Descriptions of Vessel Trajectories

Kostas Patroumpas, Alexandros Troupiotis-Kapeliaris, Giannis Spiliopoulos, Panagiotis Betchavas, Dimitrios Skoutas, Dimitris Zissis, Nikos Bikakis

2603.12276 2026-03-16 cs.LG

No More DeLuLu: Physics-Inspired Kernel Networks for Geometrically-Grounded Neural Computation

Taha Bouhsine

Comments for more info check www.azetta.ai

2603.12273 2026-03-16 cs.CL cs.AI cs.LG

Aligning Language Models from User Interactions

Thomas Kleine Buening, Jonas Hübotter, Barna Pásztor, Idan Shenfeld, Giorgia Ramponi, Andreas Krause

2603.12272 2026-03-16 cs.CL cs.LG

ActTail: Global Activation Sparsity in Large Language Models

Wenwen Hou, Xinyuan Song, Shiwei Liu

2603.12271 2026-03-16 cs.CL cs.AI cs.LG

Diagnosing Retrieval Bias Under Multiple In-Context Knowledge Updates in Large Language Models

Boyu Qiao, Sean Guo, Xian Yang, Kun Li, Wei Zhou, Songlin Hu, Yunya Song

2603.12270 2026-03-16 cs.CL cs.AI

Task-Specific Knowledge Distillation via Intermediate Probes

Ryan Brown, Chris Russell

2603.12260 2026-03-16 cs.RO

HumDex: Humanoid Dexterous Manipulation Made Easy

Liang Heng, Yihe Tang, Jiajun Xu, Henghui Bao, Di Huang, Yue Wang

2603.12056 2026-03-16 cs.AI cs.CL

XSkill: Continual Learning from Experience and Skills in Multimodal Agents

Guanyu Jiang, Zhaochen Su, Xiaoye Qu, Yi R. Fung

2603.11550 2026-03-16 cs.CV

PCA-Enhanced Probabilistic U-Net for Effective Ambiguous Medical Image Segmentation

Xiangyu Li, Chenglin Wang, Qiantong Shen, Fanding Li, Wei Wang, Kuanquan Wang, Yi Shen, Baochun Zhao, Gongning Luo

2603.11545 2026-03-16 cs.CL cs.AI cs.LG

One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries

Mayank Saini, Arit Kumar Bishwas

Comments 19 pages, 3 figures; v2: corrected author metadata

2603.11460 2026-03-16 cs.CV

Follow the Saliency: Supervised Saliency for Retrieval-augmented Dense Video Captioning

Seung hee Choi, MinJu Jeon, Hyunwoo Oh, Jihwan Lee, Dong-Jin Kim

Comments CVPR 2026 accepted paper (main track)

2603.11455 2026-03-16 cs.AI

Examining Users' Behavioural Intention to Use OpenClaw Through the Cognition--Affect--Conation Framework

Yiran Du