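arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.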

2508.04559 2026-04-15 cs.CV

One Model for All: Unified Try-On and Try-Off in Any Pose via LLM-Inspired Bidirectional Tweedie Diffusion

Jinxi Liu, Zijian He, Guangrun Wang, Guanbin Li, Liang Lin

Abstract

Recent diffusion-based approaches have made significant advances in image-based virtual try-on, enabling more realistic and end-to-end garment synthesis. However, most existing methods remain constrained by their reliance on exhibition garments and segmentation masks, as well as their limited ability to handle flexible pose variations. These limitations reduce their practicality in real-world scenarios; for instance, users cannot easily transfer garments worn by one person onto another, and the generated try-on results are typically restricted to the same pose as the reference image. In this paper, we introduce OMFA (One Model For All), a unified diffusion framework for both virtual try-on and try-off that operates without the need for exhibition garments and supports arbitrary poses. OMFA is inspired by the mask-based paradigm of discrete diffusion language models and unifies try-on and try-off within a bidirectional framework. It is built upon a Bidirectional Tweedie Diffusion process for target-selective denoising in latent space. Instead of imposing lower body constraints, OMFA is an entirely mask-free framework that requires only a single portrait and a target garment as inputs, and is designed to support flexible outfit combinations and cross-person garment transfer, making it better aligned with practical usage scenarios. Additionally, by leveraging SMPL-X-based pose conditioning, OMFA supports multi-view and arbitrary-pose try-on from just one image. Extensive experiments demonstrate that OMFA achieves state-of-the-art results on both try-on and try-off tasks, providing a practical and generalizable solution for virtual garment synthesis. Project page: https://onemodelforall.github.io

2508.04282 2026-04-15 cs.AI

Synthetic POMDPs to Challenge Memory-Augmented RL: Memory Demand Structure Modeling

Yongyi Wang, Lingfeng Li, Bozhou Chen, Ang Li, Hanyu Liu, Qirui Zheng, Xionghui Yang, Wenxin Li

Comments The article has been accepted by Frontiers of Computer Science (FCS), with the DOI: 10.1007/s11704-026-52148-y

2507.22359 2026-04-15 cs.AI cs.CL

League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models

Qianhong Guo, Wei Xie, Xiaofang Cai, Enze Wang, Shuoyoucheng Ma, Xiaobing Sun, Tian Xia, Kai Chen, Xiaofeng Wang, Baosheng Wang

2507.08977 2026-04-15 cs.LG cs.AI stat.ML

Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery

Carson Dudley, Reiden Magdaleno, Christopher Harding, Marisa Eisenberg

2507.08458 2026-04-15 cs.CV cs.AI

A document is worth a structured record: Principled inductive bias design for document recognition

Benjamin Meyer, Lukas Tuggener, Sascha Hänzi, Daniel Schmid, Erdal Ayfer, Benjamin F. Grewe, Ahmed Abdulkadir, Thilo Stadelmann

2507.06448 2026-04-15 cs.CL

Perception-Aware Policy Optimization for Multimodal Reasoning

Zhenhailong Wang, Xuehang Guo, Sofia Stoica, Haiyang Xu, Hongru Wang, Hyeonjeong Ha, Xiusi Chen, Yangyi Chen, Ming Yan, Fei Huang, Heng Ji

2507.04017 2026-04-15 cs.CV

Habitat Classification from Ground-Level Imagery Using Deep Neural Networks

Hongrui Shi, Lisa Norton, Lucy Ridding, Simon Rolph, Tom August, Claire M Wood, Lan Qie, Petra Bosilj, James M Brown

Comments Accepted to Ecological Informatics. Main paper has 18 pages, 7 figures, 4 tables. Appendix has 10 pages, 8 figures, 2 tables

2505.23209 2026-04-15 cs.CV

Navigating the Accuracy-Size Trade-Off with Flexible Model Merging

Akash Dhasade, Divyansh Jhunjhunwala, Milos Vujasinovic, Gauri Joshi, Anne-Marie Kermarrec

Comments Accepted at ICLR 2026

2505.19328 2026-04-15 cs.CV cs.LG

BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change

Manuela González-González, Soufiane Belharbi, Muhammad Osama Zeeshan, Masoumeh Sharafi, Muhammad Haseeb Aslam, Marco Pedersoli, Alessandro Lameiras Koerich, Simon L Bacon, Eric Granger

Comments 46 pages, 21 figures, ICLR 2026

Abstract

Ambivalence and hesitancy (A/H), closely related constructs, are the primary reasons why individuals delay, avoid, or abandon health behaviour changes. They are subtle and conflicting emotions that set a person in a state between positive and negative orientations, or between acceptance and refusal to do something. They manifest as a discord in affect between multiple modalities or within a modality, such as facial and vocal expressions, and body language. Although experts can be trained to recognize A/H as done for in-person interactions, integrating them into digital health interventions is costly and less effective. Automatic A/H recognition is therefore critical for the personalization and cost-effectiveness of digital behaviour change interventions. However, no datasets currently exist for the design of machine learning models to recognize A/H. This paper introduces the Behavioural Ambivalence/Hesitancy (BAH) dataset collected for multimodal recognition of A/H in videos. It contains 1,427 videos with a total duration of 10.60 hours, captured from 300 participants across Canada, answering predefined questions to elicit A/H. It is intended to mirror real-world digital behaviour change interventions delivered online. BAH is annotated by three experts to provide timestamps that indicate where A/H occurs, and frame- and video-level annotations with A/H cues. Video transcripts, cropped and aligned faces, and participant metadata are also provided. Since A and H manifest similarly in practice, we provide a binary annotation indicating the presence or absence of A/H. Additionally, this paper includes benchmarking results using baseline models on BAH for frame- and video-level recognition, and different learning setups. The limited performance highlights the need for adapted multimodal and spatio-temporal models for A/H recognition. The data and code are publicly available.

2505.17384 2026-04-15 cs.LG cs.CV stat.ML

Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling

Tianyu Xie, Shuchen Xue, Zijin Feng, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Cheng Zhang

Comments ICLR 2026 Poster; 24 pages, 13 figures

2504.06983 2026-04-15 cs.LG math.PR stat.ML

Free Random Projection for In-Context Reinforcement Learning

Tomohiro Hayase, Benoît Collins, Nakamasa Inoue

Comments Accepted to AISTATS2026. Code available at https://github.com/ThayaFluss/frp_rl

2503.23178 2026-04-15 cs.CV

Intelligent bear deterrence system based on computer vision: Reducing human-bear conflicts in remote areas

Pengyu Chen, Teng Fei, John A. Kupfer, Yunyan Du, Jiawei Yi, Yi Li

2503.21708 2026-04-15 cs.LG cs.AI cs.CL

On the Mathematical Relationship Between Layer Normalization and Dynamic Activation Functions

Felix Stollenwerk

Comments EACL 2026 (Main), see https://aclanthology.org/2026.eacl-short.48/

2503.09441 2026-04-15 cs.RO cs.SY eess.SY

Learned Incremental Nonlinear Dynamic Inversion for Quadrotors with and without Slung Payloads

Eckart Cobo-Briesewitz, Khaled Wahba, Wolfgang Hönig

Comments Accepted to L4DC 2026

2503.05167 2026-04-15 cs.LG

FMASH: Advancing Traditional Chinese Medicine Formula Recommendation with Efficient Fusion of Multiscale Associations of Symptoms and Herbs

Xinhan Zheng, Xueting Wang, Ruotai Li, Huyu Wu, Haopeng Jin, Yehan Yang, Guodong Shan

2501.13340 2026-04-15 cs.CV

Retrievals Can Be Detrimental: Unveiling the Backdoor Vulnerability of Retrieval-Augmented Diffusion Models

Hao Fang, Xiaohang Sui, Hongyao Yu, Kuofeng Gao, Jiawei Kong, Sijin Yu, Bin Chen, Shu-Tao Xia

Comments Accepted by ACL-2026

2412.07238 2026-04-15 cs.CL q-bio.NC

Speaker effects in language comprehension: An integrative model of language and speaker processing

Hanlin Wu, Zhenguang G. Cai

2410.23728 2026-04-15 cs.CL

GigaCheck: Detecting LLM-generated Content via Object-Centric Span Localization

Irina Tolstykh, Aleksandra Tsybina, Sergey Yakubson, Aleksandr Gordeev, Vladimir Dokholyan, Maksim Kuprashevich

Comments Accepted to Findings of the Association for Computational Linguistics: ACL 2026

2410.03000 2026-04-15 cs.LG cs.CR

Towards Generalized Certified Robustness with Multi-Norm Training

Enyi Jiang, David S. Cheung, Gagandeep Singh

Comments Accepted by TMLR 2026

2407.17182 2026-04-15 cs.LG

A DeepONet for inverting the Neumann-to-Dirichlet Operator in Electrical Impedance Tomography: An approximation theoretic perspective and numerical results

Anuj Abhishek, Thilo Strauss

2405.18921 2026-04-15 cs.LG

GLANCE: Global Actions in a Nutshell for Counterfactual Explainability

Loukas Kavouras, Eleni Psaroudaki, Konstantinos Tsopelas, Dimitrios Rontogiannis, Nikolaos Theologitis, Dimitris Sacharidis, Giorgos Giannopoulos, Dimitrios Tomaras, Kleopatra Markou, Dimitrios Gunopulos, Dimitris Fotakis, Ioannis Emiris

DOI: 10.1609/aaai.v40i27.39414
Journal ref: 2026 Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence

Abstract

The widespread deployment of machine learning systems in critical real-world decision-making applications has highlighted the urgent need for counterfactual explainability methods that operate effectively. Global counterfactual explanations, expressed as actions to offer recourse, aim to provide succinct explanations and insights applicable to large population subgroups. High effectiveness, measured by the fraction of the population that is provided recourse, ensures that the actions benefit as many individuals as possible. Keeping the cost of actions low ensures the proposed recourse actions remain practical and actionable. Limiting the number of actions that provide global counterfactuals is essential to maximizing interpretability. The primary challenge, therefore, is to balance these trade-offs: maximizing effectiveness and minimizing cost while maintaining a small number of actions. We introduce GLANCE, a versatile and adaptive algorithm that employs a novel agglomerative approach, jointly considering both the feature space and the space of counterfactual actions, thereby accounting for the distribution of points in a way that aligns with the model's structure. This design enables the careful balancing of the trade-offs among the three key objectives, with the size objective functioning as a tunable parameter to keep the actions few and easy to interpret. Our extensive experimental evaluation demonstrates that GLANCE consistently shows greater robustness and performance compared to existing methods across various datasets and models.

2403.07818 2026-04-15 cs.CV cs.AI cs.LG

Label Dropout: Improved Deep Learning Echocardiography Segmentation Using Multiple Datasets With Domain Shift and Partial Labelling

Iman Islam, Esther Puyol-Antón, Bram Ruijsink, Andrew J. Reader, Andrew P. King

Comments 10 pages, 5 figures, ASMUS 2024, Held in Conjunction with MICCAI 2024

2309.13904 2026-04-15 cs.CV

Subspace-Guided Feature Reconstruction for Unsupervised Anomaly Localization

Katsuya Hotta, Chao Zhang, Yoshihiro Hagihara, Takuya Akashi

2604.12551 2026-04-15 cs.CV

Cross-Attentive Multiview Fusion of Vision-Language Embeddings

Tomas Berriel Martins, Martin R. Oswald, Javier Civera

2604.12545 2026-04-15 cs.AI cs.CY

Cross-Cultural Simulation of Citizen Emotional Responses to Bureaucratic Red Tape Using LLM Agents

Wanchun Ni, Jiugeng Sun, Yixian Liu, Mennatallah El-Assady

Comments To appear in the CHI 2026 Workshop on PoliSim

2604.12543 2026-04-15 cs.AI

A Two-Stage LLM Framework for Accessible and Verified XAI Explanations

Georgios Mermigkis, Dimitris Metaxakis, Marios Tyrovolas, Argiris Sofotasios, Nikolaos Avgeris, Panagiotis Hadjidoukas, Chrysostomos Stylios

Comments 8 pages, 8 figures, Accepted for publication at the 2026 IEEE World Congress on Computational Intelligence (WCCI 2026)

2604.12540 2026-04-15 cs.CL cs.AI

When Does Data Augmentation Help? Evaluating LLM and Back-Translation Methods for Hausa and Fongbe NLP

Mahounan Pericles Adjovi, Roald Eiselen, Prasenjit Mitra

Comments 13 pages, 6 tables; previously submitted to KDD 2026

2604.12537 2026-04-15 cs.CV cs.AI

MODIX: A Training-Free Multimodal Information-Driven Positional Index Scaling for Vision-Language Models

Ruoxiang Huang, Zhen Yuan

Comments Accepted by CVPR 2026 (Highlight). 10 pages, 2 figures, 5 tables

2604.12534 2026-04-15 cs.AI cs.LO

Technical Report -- A Context-Sensitive Multi-Level Similarity Framework for First-Order Logic Arguments: An Axiomatic Study

Victor David, Jérôme Delobelle, Jean-Guy Mailly

Comments 19 pages, 6 figures

2604.12526 2026-04-15 cs.LG cs.AI

Orthogonal Subspace Projection for Continual Machine Unlearning via SVD-Based LoRA

Yogachandran Rahulamathavan, Nasir Iqbal, Juncheng Hu, Sangarapillai Lambotharan