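arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.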

2602.22209 2026-02-26 cs.CV

WHOLE: World-Grounded Hand-Object Lifted from Egocentric Videos

Yufei Ye, Jiaman Li, Ryan Rong, C. Karen Liu

Comments Project website: https://judyye.github.io/whole-www

English abstract

Egocentric manipulation videos are highly challenging due to severe occlusions during interactions and frequent object entries and exits from the camera view as the person moves. Current methods typically focus on recovering either hand or object pose in isolation, but both struggle during interactions and fail to handle out-of-sight cases. Moreover, their independent predictions often lead to inconsistent hand-object relations. We introduce WHOLE, a method that holistically reconstructs hand and object motion in world space from egocentric videos given object templates. Our key insight is to learn a generative prior over hand-object motion to jointly reason about their interactions. At test time, the pretrained prior is guided to generate trajectories that conform to the video observations. This joint generative reconstruction substantially outperforms approaches that process hands and objects separately followed by post-processing. WHOLE achieves state-of-the-art performance on hand motion estimation, 6D object pose estimation, and their relative interaction reconstruction. Project website: https://judyye.github.io/whole-www

2602.22207 2026-02-26 cs.CL cs.AI cs.LG

Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

Hanna Yukhymenko, Anton Alexandrov, Martin Vechev

2602.22200 2026-02-26 cs.CL

SumTablets: A Transliteration Dataset of Sumerian Tablets

Cole Simmons, Richard Diehl Martinez, Dan Jurafsky

Comments 11 pages with 3 figures

DOI: 10.18653/v1/2024.ml4al-1.20
Journal ref: Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024), pages 192-202, Hybrid in Bangkok, Thailand and online. Association for Computational Linguistics

English abstract

Sumerian transliteration is a conventional system for representing a scholar's interpretation of a tablet in the Latin script. Thanks to visionary digital Assyriology projects such as ETCSL, CDLI, and Oracc, a large number of Sumerian transliterations have been published online, and these data are well-structured for a variety of search and analysis tasks. However, the absence of a comprehensive, accessible dataset pairing transliterations with a digital representation of the tablet's cuneiform glyphs has prevented the application of modern Natural Language Processing (NLP) methods to the task of Sumerian transliteration. To address this gap, we present SumTablets, a dataset pairing Unicode representations of 91,606 Sumerian cuneiform tablets (totaling 6,970,407 glyphs) with the associated transliterations published by Oracc. We construct SumTablets by first preprocessing and standardizing the Oracc transliterations before mapping each reading back to the Unicode representation of the source glyph. Further, we retain parallel structural information (e.g., surfaces, newlines, broken segments) through the use of special tokens. We release SumTablets as a Hugging Face Dataset (CC BY 4.0) and open source data preparation code via GitHub. Additionally, we leverage SumTablets to implement and evaluate two transliteration baselines: (1) weighted sampling from a glyph's possible readings, and (2) fine-tuning an autoregressive language model. Our fine-tuned language model achieves an average transliteration character-level F-score (chrF) of 97.55, demonstrating the immediate potential of transformer-based transliteration models in allowing experts to rapidly verify generated transliterations rather than manually transliterating tablets one-by-one.
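The first baseline named in the abstract, weighted sampling from a glyph's possible readings, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `<unk>` placeholder, and the toy training pairs are assumptions made for illustration, and ASCII sign names stand in for the Unicode cuneiform glyphs that the dataset actually stores.

```python
import random
from collections import Counter, defaultdict


def build_reading_counts(pairs):
    """Count how often each glyph is transliterated as each reading."""
    counts = defaultdict(Counter)
    for glyph, reading in pairs:
        counts[glyph][reading] += 1
    return counts


def sample_transliteration(glyphs, counts, rng):
    """Pick one reading per glyph, weighted by its observed frequency."""
    output = []
    for glyph in glyphs:
        readings = counts.get(glyph)
        if not readings:
            # Hypothetical placeholder for glyphs never seen in training.
            output.append("<unk>")
            continue
        choices = list(readings.keys())
        weights = list(readings.values())
        output.append(rng.choices(choices, weights=weights, k=1)[0])
    return output


# Toy (glyph, reading) pairs, invented for illustration only.
pairs = [("AN", "an"), ("AN", "an"), ("AN", "dingir"), ("KI", "ki")]
counts = build_reading_counts(pairs)

rng = random.Random(0)  # seeded so repeated runs give the same sample
result = sample_transliteration(["AN", "KI", "LUGAL"], counts, rng)
print(result)
```

A baseline of this kind ignores context entirely: each glyph is read independently of its neighbours, which is the gap the abstract's second baseline, a fine-tuned autoregressive language model, is meant to close.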

2602.22197 2026-02-26 cs.CV cs.AI

Off-The-Shelf Image-to-Image Models Are All You Need To Defeat Image Protection Schemes

Xavier Pleimling, Sifat Muhammad Abdullah, Gunjan Balde, Peng Gao, Mainack Mondal, Murtuza Jadliwala, Bimal Viswanath

Comments This work has been accepted for publication at the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). The final version will be available on IEEE Xplore. To IEEE SaTML 2026

2602.22193 2026-02-26 cs.CL

Improving Parametric Knowledge Access in Reasoning Language Models

Melody Ma, John Hewitt

2602.22188 2026-02-26 cs.LG cs.AI physics.flu-dyn

Surrogate models for Rock-Fluid Interaction: A Grid-Size-Invariant Approach

Nathalie C. Pinheiro, Donghu Guo, Hannah P. Menke, Aniket C. Joshi, Claire E. Heaney, Ahmed H. ElSheikh, Christopher C. Pain

2602.22182 2026-02-26 cs.CL cs.IR

LiCQA: A Lightweight Complex Question Answering System

Sourav Saha, Dwaipayan Roy, Mandar Mitra

2602.22176 2026-02-26 cs.CV

Mixed Magnification Aggregation for Generalizable Region-Level Representations in Computational Pathology

Eric Zimmermann, Julian Viret, Michal Zelechowski, James Brian Hall, Neil Tenenholtz, Adam Casson, George Shaikovski, Eugene Vorontsov, Siqi Liu, Kristen A Severson

2602.22157 2026-02-26 cs.CL cs.HC cs.LG

Dynamic Personality Adaptation in Large Language Models via State Machines

Leon Pielage, Ole Hätscher, Mitja Back, Bernhard Marschall, Benjamin Risse

Comments 22 pages, 5 figures, submitted to ICPR 2026

2602.22154 2026-02-26 cs.RO

Position-Based Flocking for Persistent Alignment without Velocity Sensing

Hossein B. Jond, Veli Bakırcıoğlu, Logan E. Beaver, Nejat Tükenmez, Adel Akbarimajd, Martin Saska

2602.22146 2026-02-26 cs.LG cs.AI

Provable Last-Iterate Convergence for Multi-Objective Safe LLM Alignment via Optimistic Primal-Dual

Yining Li, Peizhong Ju, Ness Shroff

2602.22144 2026-02-26 cs.CV cs.AI cs.CL

NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors

Lingfeng Ren, Weihao Yu, Runpeng Yu, Xinchao Wang

Comments Code: https://github.com/lingfengren/NoLan

2602.22143 2026-02-26 cs.CV

MedTri: A Platform for Structured Medical Report Normalization to Enhance Vision-Language Pretraining

Yuetan Chu, Xinhua Ma, Xinran Jin, Gongning Luo, Xin Gao

2602.22142 2026-02-26 cs.CV

WeaveTime: Stream from Earlier Frames into Emergent Memory in VideoLLMs

Yulin Zhang, Cheng Shi, Sibei Yang

Comments Accepted at CVPR 2026 (preview; camera-ready in preparation)

2602.22125 2026-02-26 cs.CL

IndicIFEval: A Benchmark for Verifiable Instruction-Following Evaluation in 14 Indic Languages

Thanmay Jayakumar, Mohammed Safi Ur Rahman Khan, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan

Comments 8 pages + Appendix

2602.22120 2026-02-26 cs.CV

GeoDiv: Framework For Measuring Geographical Diversity In Text-To-Image Models

Abhipsa Basu, Mohana Singh, Shashank Agnihotri, Margret Keuper, R. Venkatesh Babu

Comments ICLR 2026

2602.22107 2026-02-26 cs.LG cs.AI

Don't stop me now: Rethinking Validation Criteria for Model Parameter Selection

Andrea Apicella, Francesco Isgrò, Andrea Pollastro, Roberto Prevete

2602.22100 2026-02-26 cs.RO

Behavioral Cloning for Robotic Connector Assembly: An Empirical Study

Andreas Kernbach, Daniel Bargmann, Werner Kraus, Marco F. Huber

Comments 8 pages

2602.22098 2026-02-26 cs.CV

Brain3D: Brain Report Automation via Inflated Vision Transformers in 3D

Mariano Barone, Francesco Di Serio, Giuseppe Riccio, Antonio Romano, Marco Postiglione, Antonino Ferraro, Vincenzo Moscato

2602.22094 2026-02-26 cs.AI

Petri Net Relaxation for Infeasibility Explanation and Sequential Task Planning

Nguyen Cong Nhat Le, John G. Rogers, Claire N. Bonial, Neil T. Dantam

Comments 16 pages, 5 figures. Submitted to 17th World Symposium on the Algorithmic Foundations of Robotics (WAFR) on 01/14/2026

2602.22090 2026-02-26 cs.CL

Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference

Bo-Wei Chen, Chung-Chi Chen, An-Zi Yen

Comments Accepted by EACL 2026 Findings

2602.22073 2026-02-26 cs.CV

AdaSpot: Spend Resolution Where It Matters for Precise Event Spotting

Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés

2602.22072 2026-02-26 cs.CL cs.AI

Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models

Christian Nickel, Laura Schrewe, Florian Mai, Lucie Flek

2602.22070 2026-02-26 cs.AI

Language Models Exhibit Inconsistent Biases Towards Algorithmic Agents and Human Experts

Jessica Y. Bo, Lillio Mok, Ashton Anderson

Comments Second Conference of the International Association for Safe and Ethical Artificial Intelligence (IASEAI 2026)

2602.22066 2026-02-26 cs.LG cs.AI

DualWeaver: Synergistic Feature Weaving Surrogates for Multivariate Forecasting with Univariate Time Series Foundation Models

Jinpeng Li, Zhongyi Pei, Huaze Xue, Bojian Zheng, Chen Wang, Jianmin Wang

Comments 16 pages. Preprint

2602.22059 2026-02-26 cs.CV cs.AI

NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training

Dengdi Sun, Xiaoya Zhou, Xiao Wang, Hao Si, Wanli Lyu, Jin Tang, Bin Luo

Comments Accepted by CVPR 2026

2602.22055 2026-02-26 cs.LG cs.AI

Physics-Informed Machine Learning for Vessel Shaft Power and Fuel Consumption Prediction: Interpretable KAN-based Approach

Hamza Haruna Mohammed, Dusica Marijan, Arnbjørn Maressa

Comments 10 pages, 5 figures, IEEE conference paper format; under review

2602.22052 2026-02-26 cs.CV

AutoSew: A Geometric Approach to Stitching Prediction with Graph Neural Networks

Pablo Ríos-Navarro, Elena Garces, Jorge Lopez-Moreno

Comments WACV 2026

2602.22049 2026-02-26 cs.CV cs.HC

SPGen: Stochastic scanpath generation for paintings using unsupervised domain adaptation

Mohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani, Alessandro Bruno

Comments Under Review

2602.22033 2026-02-26 cs.CV

RT-RMOT: A Dataset and Framework for RGB-Thermal Referring Multi-Object Tracking

Yanqiu Yu, Zhifan Jin, Sijia Chen, Tongfei Chu, En Yu, Liman Liu, Wenbing Tao