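arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.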

2603.17737 2026-03-19 cs.LG

Embedding World Knowledge into Tabular Models: Towards Best Practices for Embedding Pipeline Design

Oksana Kolomenko, Ricardo Knauer, Erik Rodner

Comments Computational Intelligence 2025 Workshop

DOI: 10.5445/IR/1000186052

Abstract

Embeddings are a powerful way to enrich data-driven machine learning models with the world knowledge of large language models (LLMs). Yet there is limited evidence on how to design effective LLM-based embedding pipelines for tabular prediction. In this work, we systematically benchmark 256 pipeline configurations, covering 8 preprocessing strategies, 16 embedding models, and 2 downstream models. Our results show that whether incorporating the prior knowledge of LLMs improves predictive performance depends strongly on the specific pipeline design. In general, concatenating embeddings tends to outperform replacing the original columns with embeddings. Larger embedding models tend to yield better results, while public leaderboard rankings and model popularity are poor performance indicators. Finally, gradient-boosted decision trees tend to be strong downstream models. Our findings provide researchers and practitioners with guidance for building more effective embedding pipelines for tabular prediction tasks.
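
As a rough illustration of the benchmark size and of the two feature-combination modes mentioned in the abstract, the sketch below enumerates an 8 x 16 x 2 grid and contrasts concatenation with replacement on a toy row. All strategy and model names are placeholders (the abstract gives only the counts), and the two helper functions are assumptions about what "concatenating" and "replacing" might look like, not the authors' code.

```python
from itertools import product

# Placeholder names: the abstract states only the counts (8, 16, 2).
preprocessing = [f"prep_{i}" for i in range(8)]
embedding_models = [f"embedder_{i}" for i in range(16)]
downstream_models = ["downstream_a", "downstream_b"]

# Full Cartesian grid of pipeline configurations.
configs = list(product(preprocessing, embedding_models, downstream_models))
n_configs = len(configs)  # 8 * 16 * 2


def concat_features(row, embedding):
    """Keep the original columns and append the embedding (assumed form)."""
    return list(row) + list(embedding)


def replace_features(row, embedding, embedded_cols):
    """Drop the columns that were embedded and use the embedding instead (assumed form)."""
    kept = [v for i, v in enumerate(row) if i not in embedded_cols]
    return kept + list(embedding)
```

For a row `[35.0, 72.5, 1.0]` whose last column was embedded as `[0.1, -0.3]`, `concat_features` yields five features and `replace_features(row, emb, {2})` yields four.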

2603.17735 2026-03-19 cs.CV

TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos

Yan Zeng, Haoran Jiang, Kaixin Yao, Qixuan Zhang, Longwen Zhang, Lan Xu, Jingyi Yu

Abstract

Automatically generating photorealistic and self-consistent appearances for untextured 3D models is a critical challenge in digital content creation. The advancement of large-scale video generation models offers a natural approach: directly synthesizing 360-degree turntable videos (TTVs), which can serve not only as high-quality dynamic previews but also as an intermediate representation to drive texture synthesis and neural rendering. However, existing general-purpose video diffusion models struggle to maintain strict geometric consistency and appearance stability across the full range of views, making their outputs ill-suited for high-quality 3D reconstruction. To this end, we introduce TAPESTRY, a framework for generating high-fidelity TTVs conditioned on explicit 3D geometry. We reframe the 3D appearance generation task as a geometry-conditioned video diffusion problem: given a 3D mesh, we first render and encode multi-modal geometric features to constrain the video generation process with pixel-level precision, thereby enabling the creation of high-quality and consistent TTVs. Building upon this, we also design a method for downstream reconstruction tasks from the TTV input, featuring a multi-stage pipeline with 3D-Aware Inpainting. By rotating the model and performing a context-aware secondary generation, this pipeline effectively completes self-occluded regions to achieve full surface coverage. The videos generated by TAPESTRY are not only high-quality dynamic previews but also serve as a reliable, 3D-aware intermediate representation that can be seamlessly back-projected into UV textures or used to supervise neural rendering methods like 3DGS. This enables the automated creation of production-ready, complete 3D assets from untextured meshes. Experimental results demonstrate that our method outperforms existing approaches in both video consistency and final reconstruction quality.

2603.17722 2026-03-19 cs.LG cs.CY

Predicting Trajectories of Long COVID in Adult Women: The Critical Role of Causal Disentanglement

Jing Wang, Jie Shen, Yiming Luo, Amar Sra, Qiaomin Xie, Jeremy C. Weiss

2603.17720 2026-03-19 cs.RO

VolumeDP: Modeling Volumetric Representation for Manipulation Policy Learning

Tianxing Zhou, Feiyang Xue, Zhangchen Ye, Tianyuan Yuan, Hang Zhao, Tao Jiang

2603.17718 2026-03-19 cs.CV

DiffVP: Differential Visual Semantic Prompting for LLM-Based CT Report Generation

Yuhe Tian, Kun Zhang, Haoran Ma, Rui Yan, Yingtai Li, Rongsheng Wang, Shaohua Kevin Zhou

2603.17715 2026-03-19 cs.CV cs.AI

Eye image segmentation using visual and concept prompts with Segment Anything Model 3 (SAM3)

Diederick C. Niehorster, Marcus Nyström

2603.17712 2026-03-19 cs.RO cs.CV

AERR-Nav: Adaptive Exploration-Recovery-Reminiscing Strategy for Zero-Shot Object Navigation

Jingzhi Huang, Junkai Huang, Haoyang Yang, Haoang Li, Yi Wang

2603.17705 2026-03-19 cs.CV

Parameter-Efficient Modality-Balanced Symmetric Fusion for Multimodal Remote Sensing Semantic Segmentation

Haocheng Li, Juepeng Zheng, Shuangxi Miao, Ruibo Lu, Guosheng Cai, Haohuan Fu, Jianxi Huang

Comments 14 pages, 6 figures

2603.17694 2026-03-19 cs.AI

MALLES: A Multi-agent LLMs-based Economic Sandbox with Consumer Preference Alignment

Yusen Wu, Yiran Liu, Xiaotie Deng

2603.17693 2026-03-19 cs.CV

Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos

Songtao Jiang, Sibo Song, Chenyi Zhou, Yuan Wang, Ruizhe Chen, Tongkun Guan, Ruilin Luo, Yan Zhang, Zhihang Tang, Yuchong Sun, Hang Zhang, Zhibo Yang, Shuai Bai, Junyang Lin, Zuozhu Liu

Abstract

The transition from image to video understanding requires vision-language models (VLMs) to shift from recognizing static patterns to reasoning over temporal dynamics such as motion trajectories, speed changes, and state transitions. Yet current post-training methods fall short due to two critical limitations: (1) existing datasets often lack temporal-centricity, where answers can be inferred from isolated keyframes rather than requiring holistic temporal integration; and (2) training data generated by proprietary models contains systematic errors in fundamental temporal perception, such as confusing motion directions or misjudging speeds. We introduce SynRL, a post-training framework that teaches models temporal primitives, the fundamental building blocks of temporal understanding including direction, speed, and state tracking. Our key insight is that these abstract primitives, learned from programmatically generated synthetic videos, transfer effectively to real-world scenarios. We decompose temporal understanding into short-term perceptual primitives (speed, direction) and long-term cognitive primitives, constructing 7.7K CoT and 7K RL samples with ground-truth frame-level annotations through code-based video generation. Despite training on simple geometric shapes, SynRL achieves substantial improvements across 15 benchmarks spanning temporal grounding, complex reasoning, and general video understanding. Remarkably, our 7.7K synthetic CoT samples outperform Video-R1 with 165K real-world samples. We attribute this to fundamental temporal skills, such as tracking frame-by-frame changes and comparing velocity, that transfer effectively from abstract synthetic patterns to complex real-world scenarios. This establishes a new paradigm for video post-training: video temporal learning through carefully designed synthetic data provides a more cost-efficient scaling path.
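
To make "code-based video generation with ground-truth frame-level annotations" concrete, here is a minimal sketch that renders a single moving pixel and records its position in every frame. The function name `make_clip`, the grid representation, and the label fields are all hypothetical; the abstract does not describe SynRL's actual generator.

```python
def make_clip(width=16, height=8, n_frames=6, start_x=1, speed=2, direction="right"):
    """Render a one-pixel object moving horizontally at constant speed.

    Returns (frames, annotations, label). Frames are lists of rows of 0/1.
    Everything here is an illustrative assumption, not the SynRL pipeline.
    """
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    step = speed if direction == "right" else -speed
    y = height // 2
    frames, annotations = [], []
    for t in range(n_frames):
        x = start_x + step * t
        if not 0 <= x < width:
            # Refuse to clip at the border, which would falsify the speed label.
            raise ValueError("object leaves the frame; adjust parameters")
        grid = [[0] * width for _ in range(height)]
        grid[y][x] = 1
        frames.append(grid)
        annotations.append({"frame": t, "x": x, "y": y})
    label = {"direction": direction, "speed_px_per_frame": speed}
    return frames, annotations, label
```

Because the generator controls the motion, the direction and speed labels are exact by construction, which is the property the abstract contrasts with annotations produced by proprietary models.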

2603.17692 2026-03-19 cs.LG cs.AI q-fin.CP q-fin.PM

Can Blindfolded LLMs Still Trade? An Anonymization-First Framework for Portfolio Optimization

Joohyoung Jeon, Hongchul Lee

Comments Accepted at the ICLR 2026 Workshop on Advances in Financial AI (FinAI). 18 pages, 7 figures

2603.17687 2026-03-19 cs.LG cs.AI

Objective Mispricing Detection for Shortlisting Undervalued Football Players via Market Dynamics and News Signals

Chinenye Omejieke, Shuyao Chen, Xia Cui

2603.17684 2026-03-19 cs.CV

Does YOLO Really Need to See Every Training Image in Every Epoch?

Xingxing Xie, Jiahua Dong, Junwei Han, Gong Cheng

Comments Accepted to CVPR 2026

Abstract

YOLO detectors are known for their fast inference speed, yet training them remains unexpectedly time-consuming due to their exhaustive pipeline that processes every training image in every epoch, even when many images have already been sufficiently learned. This stands in clear contrast to the efficiency suggested by the "You Only Look Once" philosophy. This naturally raises an important question: Does YOLO really need to see every training image in every epoch? To explore this, we propose an Anti-Forgetting Sampling Strategy (AFSS) that dynamically determines which images should be used and which can be skipped during each epoch, allowing the detector to learn more effectively and efficiently. Specifically, AFSS measures the learning sufficiency of each training image as the minimum of its detection recall and precision, and dynamically categorizes training images into easy, medium, or hard levels accordingly. Easy training images are sparsely resampled during training in a continuous review manner, with priority given to those that have not been used for a long time to reduce redundancy and prevent forgetting. Medium training images are partially selected, prioritizing recently unused ones and randomly choosing the rest from unselected images to ensure coverage and prevent forgetting. Hard training images are fully sampled in every epoch to ensure sufficient learning. The learning sufficiency of each training image is periodically updated, enabling detectors to adaptively shift their focus toward the informative training images over time while progressively discarding redundant ones. On widely used natural image detection benchmarks (MS COCO 2017 and PASCAL VOC 2007) and remote sensing detection datasets (DOTA-v1.0 and DIOR-R), AFSS achieves more than 1.43× training speedup for YOLO-series detectors while also improving accuracy.
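
The sampling rule described in the abstract can be sketched roughly as follows. Only the sufficiency definition (minimum of recall and precision) and the easy/medium/hard treatment come from the abstract; the thresholds, sampling ratios, the half-stale/half-random split for medium images, and all function names are assumptions made for illustration. The periodic re-estimation of recall and precision is omitted.

```python
import math
import random


def sufficiency(recall, precision):
    """Learning sufficiency as stated in the abstract: min(recall, precision)."""
    return min(recall, precision)


def categorize(score, easy_thr=0.9, hard_thr=0.5):
    """Bucket an image by sufficiency. Both thresholds are hypothetical."""
    if score >= easy_thr:
        return "easy"
    if score < hard_thr:
        return "hard"
    return "medium"


def select_epoch(images, epoch, easy_ratio=0.1, medium_ratio=0.5, seed=0):
    """Choose image ids for one epoch.

    images maps id -> {"recall", "precision", "last_used"}.
    The ratios and the split for medium images are illustrative assumptions.
    """
    rng = random.Random(seed)
    buckets = {"easy": [], "medium": [], "hard": []}
    for img_id, info in images.items():
        level = categorize(sufficiency(info["recall"], info["precision"]))
        buckets[level].append(img_id)

    def stalest(ids, k):
        # Longest-unused first: smallest last_used epoch.
        return sorted(ids, key=lambda i: images[i]["last_used"])[:k]

    chosen = list(buckets["hard"])  # hard images are used every epoch

    n_easy = math.ceil(len(buckets["easy"]) * easy_ratio)
    chosen += stalest(buckets["easy"], n_easy)  # sparse review, stalest first

    n_medium = math.ceil(len(buckets["medium"]) * medium_ratio)
    stale_part = stalest(buckets["medium"], n_medium // 2)
    rest = [i for i in buckets["medium"] if i not in stale_part]
    chosen += stale_part + rng.sample(rest, n_medium - len(stale_part))

    for img_id in chosen:
        images[img_id]["last_used"] = epoch  # record usage for later epochs
    return chosen
```

Since easy images dominate late in training under this scheme, the per-epoch workload shrinks as more images cross the easy threshold, which is the intuition behind the reported speedup.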

2603.17683 2026-03-19 cs.AI cs.LG

Sensi: Learn One Thing at a Time -- Curriculum-Based Test-Time Learning for LLM Game Agents

Mohsen Arjmandi

Comments Preprint. 18 pages, 5 figures, 2 tables. Independent research. Code and Colab demo coming soon on GitHub

2603.17680 2026-03-19 cs.CV cs.AI

WeatherReasonSeg: A Benchmark for Weather-Aware Reasoning Segmentation in Visual Language Models

Wanjun Du, Zifeng Yuan, Tingting Chen, Fucai Ke, Beibei Lin, Shunli Zhang

2603.17679 2026-03-19 cs.CV

Illumination-Aware Contactless Fingerprint Spoof Detection via Paired Flash-Non-Flash Imaging

Roja Sahoo, Anoop Namboodiri

Comments Accepted at IWBF 2026 (14th International Workshop on Biometrics and Forensics)

2603.17675 2026-03-19 cs.CV

DeepCORO-CLIP: A Multi-View Foundation Model for Comprehensive Coronary Angiography Video-Text Analysis and External Validation

Sarra Harrabi, Yichen Wu, Geoffrey H. Tison, Minhaj Ansari, Milos Vukadinovic, David Ouyang, Joshua P. Barrios, Jacques Delfrate, Robert Avram

Comments 69 pages, 5 figures

2603.17672 2026-03-19 cs.RO

Consistency-Driven Dual LSTM Models for Kinematic Control of a Wearable Soft Robotic Arm

Xingyu Chen, Yi Xiong, Li Wen

2603.17671 2026-03-19 cs.CV

Few-Step Diffusion Sampling Through Instance-Aware Discretizations

Liangyu Yuan, Ruoyu Wang, Tong Zhao, Dingwen Fu, Mingkun Lei, Beier Zhu, Chi Zhang

Comments 24 pages, 20 figures. code: https://github.com/851695e35/INDIS

2603.17670 2026-03-19 cs.RO

AgentVLN: Towards Agentic Vision-and-Language Navigation

Zihao Xin, Wentong Li, Yixuan Jiang, Ziyuan Huang, Bin Wang, Piji Li, Jianke Zhu, Jie Qin, Shengjun Huang

Comments 19 pages, 4 figures

2603.17662 2026-03-19 cs.CV cs.AI

FINER: MLLMs Hallucinate under Fine-grained Negative Queries

Rui Xiao, Sanghwan Kim, Yongqin Xian, Zeynep Akata, Stephan Alaniz

Comments CVPR 2026

2603.17653 2026-03-19 cs.RO

REAL: Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering

Jialong Liu, Dehan Shen, Yanbo Wen, Zeyu Jiang, Changhao Chen

2603.17652 2026-03-19 cs.RO cs.CV

VectorWorld: Efficient Streaming World Model via Diffusion Flow on Vector Graphs

Chaokang Jiang, Desen Zhou, Jiuming Liu, Kevin Li Sun

Comments Under Review

2603.17651 2026-03-19 cs.CV cs.AI

Anchoring and Rescaling Attention for Semantically Coherent Inbetweening

Tae Eun Choi, Sumin Shim, Junhyeok Kim, Seong Jae Hwang

Comments Accepted to CVPR 2026; Code is released at https://github.com/teunchoi/TGI

2603.17647 2026-03-19 cs.CV

Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment

Dongqiang Gou, Xuming He

2603.17639 2026-03-19 cs.AI

VeriGrey: Greybox Agent Validation

Yuntong Zhang, Sungmin Kang, Ruijie Meng, Marcel Böhme, Abhik Roychoudhury

Abstract

Agentic AI has been a topic of great interest recently. A Large Language Model (LLM) agent involves one or more LLMs in the back-end. In the front-end, it conducts autonomous decision-making by combining the LLM outputs with results obtained by invoking several external tools. The autonomous interactions with the external environment introduce critical security risks. In this paper, we present a grey-box approach to explore diverse behaviors and uncover security risks in LLM agents. Our approach, VeriGrey, uses the sequence of tools invoked as a feedback function to drive the testing process. This helps uncover infrequent but dangerous tool invocations that cause unexpected agent behavior. As mutation operators in the testing process, we mutate prompts to design pernicious injection prompts. This is carefully accomplished by linking the task of the agent to an injection task, so that the injection task becomes a necessary step of completing the agent functionality. Comparing our approach with a black-box baseline on the well-known AgentDojo benchmark, VeriGrey achieves 33% additional efficacy in finding indirect prompt injection vulnerabilities with a GPT-4.1 back-end. We also conduct real-world case studies with the widely used coding agent Gemini CLI and the well-known OpenClaw personal assistant. VeriGrey finds prompts inducing several attack scenarios that could not be identified by black-box approaches. In OpenClaw, by constructing a conversation agent which employs mutational fuzz testing as needed, VeriGrey is able to discover malicious skill variants from 10 malicious skills (with a 10/10 = 100% success rate on the Kimi-K2.5 LLM back-end, and a 9/10 = 90% success rate on the Opus 4.6 LLM back-end). This demonstrates the value of a dynamic approach like VeriGrey to test agents, and to eventually lead to an agent assurance framework.
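
The idea of using the invoked tool sequence as fuzzing feedback can be illustrated with a generic greybox loop. This is a sketch under stated assumptions, not VeriGrey itself: `run_agent` and `mutate` are hypothetical interfaces, `toy_agent` is a stand-in for a real LLM agent, and the paper's task-linked injection mutations are not reproduced here.

```python
import random


def fuzz(run_agent, mutate, seeds, budget=50, seed=0):
    """Keep a mutated prompt whenever it triggers an unseen tool sequence.

    run_agent(prompt) -> list of tool names   (hypothetical interface)
    mutate(prompt, rng) -> new prompt string  (hypothetical interface)
    """
    rng = random.Random(seed)
    corpus = list(seeds)
    seen = {tuple(run_agent(p)) for p in corpus}  # baseline behaviors
    for _ in range(budget):
        candidate = mutate(rng.choice(corpus), rng)
        trace = tuple(run_agent(candidate))
        if trace not in seen:
            # A new tool sequence counts as new coverage: keep the input
            # so that later mutations can build on it.
            seen.add(trace)
            corpus.append(candidate)
    return corpus, seen


def toy_agent(prompt):
    """Stand-in agent: always reads a file, emails only if asked to send."""
    tools = ["read_file"]
    if "send" in prompt:
        tools.append("send_email")
    return tools
```

A black-box fuzzer would only observe the final output; here the tool trace acts as the coverage signal, so rare sequences such as `read_file` followed by `send_email` are retained as seeds for further mutation.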

2603.17637 2026-03-19 cs.LG cs.CV

DSS-GAN: Directional State Space GAN with Mamba backbone for Class-Conditional Image Synthesis

Aleksander Ogonowski, Konrad Klimaszewski, Przemysław Rokita

2603.16736 2026-03-19 cs.CV

World Reconstruction From Inconsistent Views

Lukas Höllein, Matthias Nießner

Comments project website: https://lukashoel.github.io/video_to_world; video: https://www.youtube.com/watch?v=qXnUwhVmBzA; code: https://github.com/lukasHoel/video_to_world

2603.16711 2026-03-19 cs.CV

Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search

Sainan Liu, Tz-Ying Wu, Hector A Valdez, Subarna Tripathi

Comments 14 pages, 9 figures

2603.16289 2026-03-19 cs.CV cs.AI

VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents

Zhengbo Zhang, Jinbo Su, Zhaowen Zhou, Changtao Miao, Yuhan Hong, Qimeng Wu, Yumeng Liu, Feier Wu, Yihe Tian, Yuhao Liang, Zitong Shan, Wanke Xia, Yi-Fan Zhang, Bo Zhang, Zhe Li, Shiming Xiang, Ying Yan