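arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.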

2508.08645 2026-04-06 cs.CL

Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents

Zheng Wu, Heyuan Huang, Yanjia Yang, Yuanyi Song, Xingyu Lou, Weiwen Liu, Weinan Zhang, Jun Wang, Zhuosheng Zhang

Abstract

As multimodal large language models advance rapidly, the automation of mobile tasks has become increasingly feasible through mobile-use agents that mimic human interactions with graphical user interfaces. To further enhance these agents, previous studies employ demonstration learning to improve them from human demonstrations. However, these methods focus solely on the explicit intention flows of humans (e.g., step sequences) while neglecting implicit intention flows (e.g., personal preferences), which makes it difficult to construct personalized mobile-use agents. In this work, to evaluate the Intention Alignment Rate (IAR) between mobile-use agents and humans, we first collect MobileIAR, a dataset containing human-intent-aligned actions and ground-truth actions. This enables a comprehensive assessment of the agents' understanding of human intent. Then we propose IFRAgent, a framework built upon Intention Flow Recognition (IFR) from human demonstrations. IFRAgent analyzes explicit intention flows from human demonstrations to construct a query-level vector library of standard operating procedures (SOPs), and analyzes implicit intention flows to build a user-level habit repository. IFRAgent then leverages an SOP extractor combined with retrieval-augmented generation and a query rewriter to generate a personalized query and SOP from a raw ambiguous query, enhancing the alignment between mobile-use agents and human intent. Experimental results demonstrate that IFRAgent outperforms baselines by an average of 6.79% (32.06% relative improvement) in human intention alignment rate and improves step completion rates by an average of 5.30% (26.34% relative improvement). The code is available at https://github.com/MadeAgents/Quick-on-the-Uptake.

2507.22512 2026-04-06 cs.CV cs.LG eess.IV

AlphaDent: A dataset for automated tooth pathology detection

Evgeniy I. Sosnin, Yuriy L. Vasilev, Roman A. Solovyev, Aleksandr L. Stempkovskiy, Dmitry V. Telpukhov, Artem A. Vasilev, Aleksandr A. Amerikanov, Aleksandr Y. Romanov

2507.22264 2026-04-06 cs.CV cs.AI

SmartCLIP: Modular Vision-language Alignment with Identification Guarantees

Shaoan Xie, Lingjing Kong, Yujia Zheng, Yu Yao, Zeyu Tang, Eric P. Xing, Guangyi Chen, Kun Zhang

Comments CVPR2025

2507.21584 2026-04-06 cs.CV

TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs

Kejia Zhang, Keda Tao, Zhiming Luo, Chang Liu, Jiasheng Tang, Huan Wang

2507.21437 2026-04-06 cs.LG

PVD-ONet: A Multi-scale Neural Operator Method for Singularly Perturbed Boundary Layer Problems

Tiantian Sun, Jian Zu

Comments 44 pages, 14 figures

2507.19315 2026-04-06 cs.CL

AutoPCR: Automated Phenotype Concept Recognition by Prompting

Yicheng Tao, Yuanhao Huang, Yiqun Wang, Xin Luo, Jie Liu

Comments Accepted at ISMB 2026 (Proceedings)

2507.19090 2026-04-06 cs.CL

Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents

Haorui He, Yupeng Li, Dacheng Wen, Yang Chen, Reynold Cheng, Donglong Chen, Francis C. M. Lau

Comments Accepted by the ACM Web Conference 2026 (WWW 2026)

2507.18177 2026-04-06 cs.CV cs.AI

Differential-UMamba: Rethinking Tumor Segmentation Under Limited Data Scenarios

Dhruv Jain, Romain Modzelewski, Romain Herault, Clement Chatelain, Eva Torfeh, Sebastien Thureau

2506.03198 2026-04-06 cs.CV cs.AI

FLEX: A Largescale Multimodal, Multiview Dataset for Learning Structured Representations for Fitness Action Quality Assessment

Hao Yin, Lijun Gu, Paritosh Parmar, Lin Xu, Tianxiao Guo, Xiujin Liu, Weiwei Fu, Yang Zhang, Tianyou Zheng

Comments Dataset and code are available at https://github.com/HaoYin116/FLEX . Link to Project page https://haoyin116.github.io/FLEX_Dataset

2506.01167 2026-04-06 cs.LG cs.RO

Accelerated Learning with Linear Temporal Logic using Differentiable Simulation

Alper Kamil Bozkurt, Calin Belta, Ming C. Lin

2505.19353 2026-04-06 cs.AI cs.CL cs.CY cs.SE

Architectures of Error: A Philosophical Inquiry into AI and Human Code Generation

Camilo Chacón Sartori

Comments preprint

2505.15656 2026-04-06 cs.CL

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

Zhexin Zhang, Yuhao Sun, Junxiao Yang, Shiyao Cui, Yuanchao Zhang, Hongning Wang, Minlie Huang

Comments Accepted to ICLR 2026

2505.15323 2026-04-06 cs.CL

Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Output Prefilling

Silvia Cappelletti, Tobia Poppi, Samuele Poppi, Zheng-Xin Yong, Diego Garcia-Olano, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Comments 23 pages, 6 figures, 6 tables

2505.15263 2026-04-06 cs.CV cs.LG

gen2seg: Generative Models Enable Generalizable Instance Segmentation

Om Khangaonkar, Hamed Pirsiavash

Comments ICLR 2026 camera ready. Website: https://reachomk.github.io/gen2seg/

2505.13995 2026-04-06 cs.CL cs.AI cs.CY

ELEPHANT: Measuring and understanding social sycophancy in LLMs

Myra Cheng, Sunny Yu, Cinoo Lee, Pranav Khadpe, Lujain Ibrahim, Dan Jurafsky

2504.17180 2026-04-06 cs.CV cs.AI

We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback

Minkyu Choi, S P Sharan, Harsh Goel, Sahil Shah, Sandeep Chinchali

2504.01938 2026-04-06 cs.LG cs.NA math.NA stat.ML

A Unified Approach to Analysis and Design of Denoising Markov Models

Yinuo Ren, Grant M. Rotskoff, Lexing Ying

2502.09018 2026-04-06 cs.LG cs.AI cs.CV

Zero-shot Concept Bottleneck Models

Shin'ya Yamaguchi, Kosuke Nishida, Daiki Chijiwa, Yasutoshi Ida

Comments Accepted to IEEE ICME 2026

2502.02970 2026-04-06 cs.LG

Distributional Statistics Restore Training Data Auditability in One-step Distilled Diffusion Models

Muxing Li, Zesheng Ye, Sharon Li, Andy Song, Guangquan Zhang, Feng Liu

Abstract

The proliferation of diffusion models trained on web-scale, provenance-uncertain image collections has made it essential, yet technically unresolved, to determine whether a model has learned from specific copyrighted data without authorization. Current methods primarily rely on the memorization effect, whereby models reconstruct their training images better than unseen ones, to detect unauthorized training data on a per-instance basis. This effect, however, vanishes under distillation, the now-dominant deployment pipeline that compresses compute-intensive teacher diffusion models into efficient student one-step generators mimicking the teacher's output for real-time user access. As the students train exclusively on teacher-generated outputs and never directly see the teacher's original training data, they carry no per-instance memorization of that upstream data, creating a model laundering loophole that severs the auditable link between a deployed model and its upstream training data. We nonetheless reveal that a distributional memory chain survives under distillation: the student's output distribution remains closer to the teacher's training distribution than to any non-training reference, even if no single training instance is memorized. Exploiting this chain, we develop a distributional unauthorized training data detector, grounded in kernel-based distribution discrepancy, that determines if a candidate dataset of unknown composition is statistically aligned with the student-generated distribution more than held-out non-training datasets, thus tracing provenance back to the teacher's training data. Evaluation across benchmarks and distillation setups confirms reliable detection even when unauthorized data forms a minority of the candidate set, establishing distribution-level auditing as a countermeasure to model laundering and a paradigm for accountable generative AI ecosystems.
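
The abstract above mentions a "kernel-based distribution discrepancy" without naming a specific statistic. As a rough illustration only, the sketch below uses maximum mean discrepancy (MMD) with a Gaussian kernel, which is one common choice and is an assumption here, not necessarily the authors' detector. The toy Gaussian vectors stand in for image features, and every name (`mmd_squared`, `student_outputs`, `candidate_set`, `held_out_set`) is hypothetical.

```python
import math
import random


def rbf_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def sample_gaussian(rng, mean, n, dim=2):
    """Draw n toy feature vectors from an isotropic unit-variance Gaussian."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


# Assumption: toy data only. Real use would compare image features.
rng = random.Random(0)
student_outputs = sample_gaussian(rng, 0.0, 100)  # stand-in for student generations
candidate_set = sample_gaussian(rng, 0.1, 100)    # stand-in for suspected training data
held_out_set = sample_gaussian(rng, 2.0, 100)     # stand-in for a non-training reference

mmd_candidate = mmd_squared(student_outputs, candidate_set)
mmd_held_out = mmd_squared(student_outputs, held_out_set)

# The candidate set is flagged when it sits closer to the student's
# output distribution than the held-out reference does.
candidate_is_closer = mmd_candidate < mmd_held_out
```

In this toy setup the candidate set is drawn near the student distribution, so its discrepancy comes out smaller than that of the held-out set. A practical detector would also need a calibrated decision threshold or a significance test, which this sketch omits.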

2411.06851 2026-04-06 cs.CV cs.LG

Fast and Efficient Transformer-based Method for Bird's Eye View Instance Prediction

Miguel Antunes-García, Luis M. Bergasa, Santiago Montiel-Marín, Rafael Barea, Fabio Sánchez-García, Ángel Llamazares

Comments The article was presented at the 27th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2024) in September 2024. Number of pages: 6. Number of figures: 4.

2410.06128 2026-04-06 cs.LG stat.ML

Amortized Inference of Causal Models via Conditional Fixed-Point Iterations

Divyat Mahajan, Jannes Gladrow, Agrin Hilmkil, Cheng Zhang, Meyer Scetbon

Comments Transactions on Machine Learning Research (TMLR) 2025 (J2C Certification). ICLR 2026

2409.18512 2026-04-06 cs.SD cs.AI cs.CL eess.AS

Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS

Haoyu Wang, Chunyu Qiang, Tianrui Wang, Cheng Gong, Yu Jiang, Yuheng Lu, Chen Zhang, Longbiao Wang, Jianwu Dang

2408.12406 2026-04-06 cs.CV

Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes

Sota Kato, Hinako Mitsuoka, Kazuhiro Hotta

Comments Accepted by ECCV2024 Workshop "Computational Aspects of Deep Learning (CADL)"

2407.16341 2026-04-06 cs.CV

Motion Capture from Inertial and Vision Sensors

Xiaodong Chen, Wu Liu, Qian Bao, Xinchen Liu, Ruoli Dai, Yongdong Zhang, Tao Mei

Comments 12 pages, 8 figures

2407.01012 2026-04-06 cs.LG cs.CV

Swish-T : Enhancing Swish Activation with Tanh Bias for Improved Neural Network Performance

Youngmin Seo, Jinha Kim, Unsang Park

Comments 11 pages, 6 figures. Revised the derivative of the sigmoid function from 1-sigmoid to sigmoid(1-sigmoid) for correctness. Updated related equations in Section 3.2. Changed "Conclusions" to "Conclusion" in Section 6.

2405.15314 2026-04-06 cs.LG

Output-Constrained Decision Trees

Hüseyin Tunç, Doğanay Özese, Ş. İlker Birbil, Donato Maragno, Marco Caserta, Mustafa Baydoğan

Comments 27 pages, 3 figures

2403.00127 2026-04-06 cs.CL cs.CY cs.HC

Prompting ChatGPT for Translation: A Comparative Analysis of Translation Brief and Persona Prompts

Sui He

2402.01207 2026-04-06 cs.LG cs.AI stat.ME

Efficient Causal Graph Discovery Using Large Language Models

Thomas Jiralerspong, Xiaoyin Chen, Yash More, Vedant Shah, Yoshua Bengio

2305.18915 2026-04-06 cs.CL cs.AI

Empirical Sufficiency Lower Bounds for Language Modeling with Locally-Bootstrapped Semantic Structures

Jakob Prange, Emmanuele Chersoni

Comments To appear at *SEM 2023, Toronto

2302.08150 2026-04-06 cs.CL cs.AI

Reanalyzing L2 Preposition Learning with Bayesian Mixed Effects and a Pretrained Language Model

Jakob Prange, Man Ho Ivy Wong

Comments To appear at ACL 2023, Toronto