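arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.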

2512.03643 2026-04-07 cs.CV cs.CL cs.LG

Optical Context Compression Is Just (Bad) Autoencoding

Ivan Yee Lee, Cheng Yang, Taylor Berg-Kirkpatrick

English abstract

DeepSeek-OCR shows that rendered text can be reconstructed from a small number of vision tokens, sparking excitement about using vision as a compression medium for long textual contexts. But this pipeline requires rendering token embeddings to pixels and compressing from there -- discarding learned representations in favor of an image the vision encoder must then recover from. We ask whether this detour helps. Comparing DeepSeek-OCR's vision encoder against near-zero-parameter mean pooling and a learned hierarchical encoder, we find it does not. For reconstruction, simple direct methods match or surpass vision at every compression ratio. For language modeling, vision performs comparably to truncation -- a baseline that simply discards context -- and loses to the hierarchical encoder at every compression ratio. As expected, all compression methods outperform truncation for factual recall, but vision never surpasses the best direct baseline. The excitement around optical context compression outpaces the evidence. Code and checkpoints are available at https://github.com/ivnle/bad-autoencoding.
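The two simplest baselines named in the abstract, mean pooling and truncation, can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names `mean_pool` and `truncate`, the plain-list embedding format, and the choice to keep the most recent tokens when truncating are all assumptions made here.

```python
from typing import List

Vector = List[float]


def mean_pool(embeddings: List[Vector], ratio: int) -> List[Vector]:
    """Average each run of `ratio` consecutive token embeddings into one vector.

    Hypothetical helper: the abstract only says mean pooling has near-zero parameters.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    pooled = []
    for start in range(0, len(embeddings), ratio):
        window = embeddings[start:start + ratio]
        dim = len(window[0])
        # The last window may be shorter than `ratio`, so divide by its real length.
        pooled.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return pooled


def truncate(embeddings: List[Vector], ratio: int) -> List[Vector]:
    """Truncation baseline: discard context and keep roughly len/ratio tokens.

    Assumption: the most recent tokens are the ones kept.
    """
    keep = max(1, len(embeddings) // ratio)
    return embeddings[-keep:]


tokens = [[1.0, 3.0], [3.0, 5.0], [5.0, 7.0], [7.0, 9.0]]
compressed = mean_pool(tokens, ratio=2)   # 4 tokens -> 2 vectors
shortened = truncate(tokens, ratio=2)     # 4 tokens -> last 2 tokens
```

Both functions produce the same number of output vectors at a given ratio, which is what makes them comparable at a fixed compression ratio.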

2511.23230 2026-04-07 cs.CV

Action-guided generation of 3D functionality segmentation data

Jaime Corsetti, Francesco Giuliari, Davide Boscaini, Pedro Hermosilla, Andrea Pilzer, Guofeng Mei, Alexandros Delitzas, Francis Engelmann, Fabio Poiesi

Comments: Accepted at CVPR 2026 GenRecon3D workshop. 17 pages, 8 figures, 1 table

2511.22553 2026-04-07 cs.CV

Bringing Your Portrait to 3D Presence

Jiawei Zhang, Lei Chu, Jiahao Li, Zhenyu Zang, Chong Li, Xiao Li, Xun Cao, Hao Zhu, Yan Lu

Comments: Project page: https://zjwfufu.github.io/HuaPi-Page/

2511.22262 2026-04-07 cs.CV

Can Protective Watermarking Safeguard the Copyright of 3D Gaussian Splatting?

Wenkai Huang, Yijia Guo, Gaolei Li, Lei Ma, Hang Zhang, Liwen Hu, Jiazheng Wang, Jianhua Li, Tiejun Huang

Comments: Accepted by AAAI 2026

2511.20944 2026-04-07 cs.LG cs.CR

Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection

Yaw Osei Adjei, Frederick Ayivor

Comments: 8 pages, 10 figures, 8 tables. Accepted to the 7th IEEE Silicon Valley Cybersecurity Conference (SVCC 2026), San Jose, CA, USA, June 10-12, 2026

2511.19629 2026-04-07 cs.CV

SkillSight: Efficient First-Person Skill Assessment with Gaze

Chi Hsuan Wu, Kumar Ashutosh, Kristen Grauman

2511.17634 2026-04-07 cs.CV

Efficient Score Pre-computation for Diffusion Models via Cross-Matrix Krylov Projection

Kaikwan Lau, Andrew S. Na, Justin W. L. Wan

2511.17013 2026-04-07 cs.RO

MfNeuPAN: Proactive End-to-End Navigation in Dynamic Environments via Direct Multi-Frame Point Constraints

Yiwen Ying, Hanjing Ye, Senzi Luo, Luyao Liu, Yu Zhan, Li He, Hong Zhang

Comments: 6 pages, 9 figures, accepted at IEEE ROBIO 2025

2511.14336 2026-04-07 cs.CV

ArchMap: Arch-Flattening and Knowledge-Guided Vision Language Model for Tooth Counting and Structured Dental Understanding

Bohan Zhang, Yiyi Miao, Taoyu Wu, Tong Chen, Ji Jiang, Zhuoxiao Li, Zhe Tang, Limin Yu, Jionglong Su

DOI: 10.1109/BigData66926.2025.11402150
Journal ref: In Proceedings of the 2025 IEEE International Conference on Big Data (BigData), pp. 7529-7538, 2025

English abstract

A structured understanding of intraoral 3D scans is essential for digital orthodontics. However, existing deep-learning approaches rely heavily on modality-specific training, large annotated datasets, and controlled scanning conditions, which limit generalization across devices and hinder deployment in real clinical workflows. Moreover, raw intraoral meshes exhibit substantial variation in arch pose, incomplete geometry caused by occlusion or tooth contact, and a lack of texture cues, making unified semantic interpretation highly challenging. To address these limitations, we propose ArchMap, a training-free and knowledge-guided framework for robust structured dental understanding. ArchMap first introduces a geometry-aware arch-flattening module that standardizes raw 3D meshes into spatially aligned, continuity-preserving multi-view projections. We then construct a Dental Knowledge Base (DKB) encoding hierarchical tooth ontology, dentition-stage policies, and clinical semantics to constrain the symbolic reasoning space. We validate ArchMap on 1060 pre-/post-orthodontic cases, demonstrating robust performance in tooth counting, anatomical partitioning, dentition-stage classification, and the identification of clinical conditions such as crowding, missing teeth, prosthetics, and caries. Compared with supervised pipelines and prompted VLM baselines, ArchMap achieves higher accuracy, reduced semantic drift, and superior stability under sparse or artifact-prone conditions. As a fully training-free system, ArchMap demonstrates that combining geometric normalization with ontology-guided multimodal reasoning offers a practical and scalable solution for the structured analysis of 3D intraoral scans in modern digital orthodontics.

2511.09104 2026-04-07 cs.RO

Decoupling Torque and Stiffness: A Unified Modeling and Control Framework for Antagonistic Artificial Muscles

Amirhossein Kazemipour, Robert K. Katzschmann

2510.27584 2026-04-07 cs.CV cs.IR cs.LG

Image Hashing via Cross-View Code Alignment in the Age of Foundation Models

Ilyass Moummad, Kawtar Zaher, Hervé Goëau, Alexis Joly

2510.25224 2026-04-07 cs.CL

ProMediate: A Socio-cognitive framework for evaluating proactive agents in multi-party negotiation

Ziyi Liu, Bahar Sarrafzadeh, Pei Zhou, Longqi Yang, Jieyu Zhao, Ashish Sharma

English abstract

While Large Language Models (LLMs) are increasingly used in agentic frameworks to assist individual users, there is a growing need for agents that can proactively manage complex, multi-party collaboration. Systematic evaluation methods for such proactive agents remain scarce, limiting progress in developing AI that can effectively support multiple people together. Negotiation offers a demanding testbed for this challenge, requiring socio-cognitive intelligence to navigate conflicting interests between multiple participants and multiple topics and build consensus. Here, we present ProMediate, the first framework for evaluating proactive AI mediator agents in complex, multi-topic, multi-party negotiations. ProMediate consists of two core components: (i) a simulation testbed based on realistic negotiation cases and theory-driven difficulty levels (ProMediate-Easy, ProMediate-Medium, and ProMediate-Hard), with a plug-and-play proactive AI mediator grounded in socio-cognitive mediation theories, capable of flexibly deciding when and how to intervene; and (ii) a socio-cognitive evaluation framework with a new suite of metrics to measure consensus changes, intervention latency, mediator effectiveness, and intelligence. Together, these components establish a systematic framework for assessing the socio-cognitive intelligence of proactive AI agents in multi-party settings. Our results show that a socially intelligent mediator agent outperforms a generic baseline, via faster, better-targeted interventions. In the ProMediate-Hard setting, our social mediator increases consensus change by 3.6 percentage points compared to the generic baseline (10.65% vs. 7.01%) while being 77% faster in response (3.71s vs. 15.98s). In conclusion, ProMediate provides a rigorous, theory-grounded testbed to advance the development of proactive, socially intelligent agents.
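The two headline figures in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that 3.71s is the social mediator's latency and 15.98s the generic baseline's, which is the only reading consistent with "77% faster"; the variable names are illustrative.

```python
# Figures quoted in the abstract (ProMediate-Hard setting).
social_consensus = 10.65    # consensus change, percent
baseline_consensus = 7.01   # consensus change, percent
social_latency = 3.71       # seconds (assumed to belong to the social mediator)
baseline_latency = 15.98    # seconds (assumed to belong to the generic baseline)

# Absolute gain in percentage points.
pp_gain = social_consensus - baseline_consensus

# Relative reduction in response latency.
latency_reduction = (baseline_latency - social_latency) / baseline_latency

print(f"consensus gain: {pp_gain:.2f} percentage points")
print(f"latency reduction: {latency_reduction:.1%}")
```

Under that assumption the gain is about 3.64 percentage points and the latency reduction about 76.8%, matching the rounded 3.6 and 77% reported.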

2510.23883 2026-04-07 cs.AI

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

Anshuman Chhabra, Shrestha Datta, Shahriar Kabir Nahin, Prasant Mohapatra

Comments: Published in IEEE Access. DOI: https://doi.org/10.1109/access.2026.3675554

2510.22068 2026-04-07 cs.LG stat.ML

Deep Gaussian Processes for Functional Maps

Matthew Lowery, Zhitong Xu, Da Long, Keyan Chen, Daniel S. Johnson, Yang Bai, Varun Shankar, Shandian Zhe

Comments: 9 pages + 9 page appendix, 7 figures

2510.16132 2026-04-07 cs.LG math.OC stat.ML

A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies

Phalguni Nanda, Zaiwei Chen

Comments: 46 pages, 4 figures

2510.15148 2026-04-07 cs.CV cs.AI

XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

Xingrui Wang, Jiang Liu, Chao Huang, Xiaodong Yu, Ze Wang, Ximeng Sun, Jialian Wu, Alan Yuille, Emad Barsoum, Zicheng Liu

2509.23279 2026-04-07 cs.CV cs.AI

Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing

Rohit Chowdhury, Aniruddha Bala, Rohan Jaiswal, Siddharth Roheda

Comments: Under Review at ACM-MM 26

2509.13095 2026-04-07 cs.RO

Empowering Multi-Robot Cooperation via Sequential World Models

Zijie Zhao, Honglei Guo, Shengqian Chen, Kaixuan Xu, Bo Jiang, Yuanheng Zhu, Dongbin Zhao

2509.12981 2026-04-07 cs.LG stat.ML

Causal Discovery via Quantile Partial Effect

Yikang Chen, Xingzhe Sun, Dehui Du

Comments: 29 pages, 6 figures; ICLR 2026

2509.12390 2026-04-07 cs.RO

Distributed Event-Triggered Distance-Based Formation Control for Multi-Agent Systems

Evangelos Psomiadis, Panagiotis Tsiotras

Comments: 6 pages, 5 figures

2509.07149 2026-04-07 cs.LG cs.AI cs.CL cs.IT math.IT

Measuring Uncertainty in Transformer Circuits with Effective Information Consistency

Anatoly A. Krasnovsky

2509.05892 2026-04-07 cs.CV cs.AI

Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medical Research with Limited Datasets

Phongsakon Mark Konrad, Andrei-Alexandru Popa, Yaser Sabzehmeidani, Liang Zhong, Madhulika Tripathy, Andrei Constantinescu, Elisa A. Liehn, Serkan Ayvaz

2509.02892 2026-04-07 cs.LG stat.ME

Improving Generative Methods for Causal Evaluation via Simulation-Based Inference

Pracheta Amaranath, Vinitra Muralikrishnan, Amit Sharma, David Jensen

Comments: 13 pages main text, 68 pages total

2509.01895 2026-04-07 cs.CV

Automated Wildfire Damage Assessment from Multi view Ground level Imagery Via Vision Language Models

Miguel Esparza, Archit Gupta, Kai Yin, Yiming Xiao, Ali Mostafavi

2508.12301 2026-04-07 cs.CL cs.LG cs.SD eess.AS

WhisperRT -- Turning Whisper into a Causal Streaming Model

Tomer Krichli, Bhiksha Raj, Joseph Keshet

Comments: 14 pages, 7 figures. This work has been submitted to the IEEE for possible publication

2508.04503 2026-04-07 cs.LG cs.AI

PRISM: Lightweight Multivariate Time-Series Classification through Symmetric Multi-Resolution Convolutional Layers

Federico Zucchi, Thomas Lampert

2508.03139 2026-04-07 cs.CV

Unit: Building Unit Detection Dataset

Haozhou Zhai, Yanzhe Gao, Tianjiang Hu

2508.02900 2026-04-07 cs.AI

Seemingly Simple Planning Problems are Computationally Challenging: The Countdown Game

Michael Katz, Harsha Kokel, Sarath Sreedharan

English abstract

There is a broad consensus that the inability to form long-term plans is one of the key limitations of current foundational models and agents. However, the existing planning benchmarks remain woefully inadequate to truly measure their planning capabilities. Most existing benchmarks either focus on loosely defined tasks like travel planning or end up leveraging existing domains and problems from international planning competitions. While the former tasks are hard to formalize and verify, the latter were specifically designed to test and challenge the weaknesses of existing automated planners. To address these shortcomings, we propose a procedure for creating a planning benchmark centered around the game called Countdown, where a player is expected to form a target number from a list of input numbers through arithmetic operations. From a world-model perspective, each instance induces a fully specified transition model (dynamics) over states and actions, enabling evaluation of planning with verifiable outcomes. We discuss how this problem meets many of the desiderata associated with an ideal benchmark for planning capabilities evaluation. Specifically, the domain allows for an intuitive, natural language description for each problem instance, it is computationally challenging (NP-complete), and the instance space is rich enough that we do not have to worry about memorization. We perform an extensive theoretical analysis, establishing the computational complexity result and demonstrating the advantage of our instance generation procedure over public benchmarks. We evaluate a variety of existing LLM-assisted planning methods on instances generated using our procedure. Our results show that, unlike other domains such as the 24 Game (a special case of Countdown), our proposed dynamic benchmark remains extremely challenging for existing LLM-based approaches.
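To make the Countdown task concrete, here is a small exhaustive-search solver. It is a sketch under stated assumptions, not the benchmark's generator or any method evaluated in the paper: it assumes positive integer inputs, allows any subset of the numbers to be used, requires every intermediate result to be a positive integer, and uses the hypothetical names `solve` and `_search`.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Item = Tuple[int, str]  # (value, expression that produced it)


def solve(numbers: List[int], target: int) -> Optional[str]:
    """Return an arithmetic expression over `numbers` equal to `target`, or None."""
    return _search([(n, str(n)) for n in numbers], target)


def _search(items: List[Item], target: int) -> Optional[str]:
    # Success as soon as any available value equals the target
    # (assumption: not every input number has to be used).
    for value, expr in items:
        if value == target:
            return expr
    # Otherwise pick any two values, combine them, and recurse on the smaller pool.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        if a < b:  # order the pair so that a >= b
            a, ea, b, eb = b, eb, a, ea
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [(a + b, f"({ea} + {eb})"), (a * b, f"({ea} * {eb})")]
        if a > b:  # keep intermediate results strictly positive
            candidates.append((a - b, f"({ea} - {eb})"))
        if b != 0 and a % b == 0:  # only exact division is allowed
            candidates.append((a // b, f"({ea} / {eb})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None


expression = solve([1, 2, 3, 4], 24)  # a 24 Game instance
print(expression)
```

The search tries every pair and every operation at each step, so its running time grows exponentially with the number of inputs, which is in line with the hardness result the abstract reports. Because every move is an explicit arithmetic step, any returned expression can be verified by evaluating it.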

2507.16034 2026-04-07 cs.RO cs.CV

Privacy-Preserving Semantic Segmentation from Ultra-Low-Resolution RGB Inputs

Xuying Huang, Sicong Pan, Olga Zatsarynna, Juergen Gall, Maren Bennewitz

Comments: Submitted to IJCV Special Issue on Responsible Imaging

2507.14913 2026-04-07 cs.CL

PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation

Eliya Habba, Noam Dahan, Gili Lior, Gabriel Stanovsky

Comments: Eliya Habba and Noam Dahan contributed equally to this work