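arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.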

2603.20755 2026-03-24 cs.CV cs.AI

Memory-Efficient Fine-Tuning Diffusion Transformers via Dynamic Patch Sampling and Block Skipping

Sunghyun Park, Jeongho Kim, Hyoungwoo Park, Debasmit Das, Sungrack Yun, Munawar Hayat, Jaegul Choo, Fatih Porikli, Seokeon Choi

Comments Accepted to CVPR 2026; 20 pages

Abstract

Diffusion Transformers (DiTs) have significantly enhanced text-to-image (T2I) generation quality, enabling high-quality personalized content creation. However, fine-tuning these models requires substantial computation and memory, limiting practical deployment under resource constraints. To tackle these challenges, we propose a memory-efficient fine-tuning framework called DiT-BlockSkip, integrating timestep-aware dynamic patch sampling and block skipping by precomputing residual features. Our dynamic patch sampling strategy adjusts patch sizes based on the diffusion timestep, then resizes the cropped patches to a fixed lower resolution. This approach reduces forward and backward memory usage while allowing the model to capture global structures at higher timesteps and fine-grained details at lower timesteps. The block skipping mechanism selectively fine-tunes essential transformer blocks and precomputes residual features for the skipped blocks, significantly reducing training memory. To identify vital blocks for personalization, we introduce a block selection strategy based on cross-attention masking. Evaluations demonstrate that our approach achieves competitive personalization performance qualitatively and quantitatively, while reducing memory usage substantially, moving toward on-device feasibility (e.g., smartphones, IoT devices) for large-scale diffusion transformers.
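The timestep-aware patch sampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear size schedule, the nearest-neighbour resize, and the names `patch_size_for_timestep` and `sample_patch` are all assumptions made for illustration. The only behaviour taken from the abstract is that larger crops are used at higher timesteps, smaller crops at lower timesteps, and every crop is resized to one fixed resolution.

```python
import random


def patch_size_for_timestep(t, num_timesteps, image_size, min_size):
    """Map a diffusion timestep to a crop side length.

    Assumed linear schedule (not from the paper): t = 0 gives min_size,
    t = num_timesteps - 1 gives the full image. Requires num_timesteps > 1.
    """
    frac = t / (num_timesteps - 1)
    return round(min_size + frac * (image_size - min_size))


def sample_patch(image, t, num_timesteps, min_size, out_size, rng):
    """Crop a random square patch sized by t, then resize it to out_size x out_size.

    `image` is assumed to be a square image stored as a list of rows.
    """
    image_size = len(image)
    side = patch_size_for_timestep(t, num_timesteps, image_size, min_size)
    top = rng.randint(0, image_size - side)
    left = rng.randint(0, image_size - side)
    # Nearest-neighbour resize of the cropped region to the fixed resolution.
    return [
        [image[top + (i * side) // out_size][left + (j * side) // out_size]
         for j in range(out_size)]
        for i in range(out_size)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[r * 8 + c for c in range(8)] for r in range(8)]
    # High timestep: the crop spans the whole image (global structure).
    print(sample_patch(image, 999, 1000, 4, 4, rng))
    # Low timestep: a small crop at native detail.
    print(sample_patch(image, 0, 1000, 4, 4, rng))
```

Because every patch is resized to the same `out_size`, the activation memory per training step stays roughly constant regardless of the timestep, which is the memory argument the abstract makes.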

2603.20752 2026-03-24 cs.CV

Smart Operation Theatre: An AI-based System for Surgical Gauze Counting

Saraf Krish, Cai Yiyu, Huang Li Hui

Journal ref: Proceedings of the URECA@NTU 2022-23, Nanyang Technological University

Abstract

During surgery, there is a risk of medical gauze being left inside a patient's body, a condition known as "gossypiboma", which can cause serious complications for patients and expose hospitals to malpractice lawsuits and regulatory penalties. Diagnosis depends on imaging methods such as X-rays or CT scans, and the usual treatment involves surgical excision. Prevention methods, such as manual counts and RFID-integrated gauzes, aim to minimize gossypiboma risks. However, manual tallying of hundreds of gauzes by nurses is time-consuming and diverts resources from patient care. In partnership with Singapore General Hospital (SGH), we have developed a new prevention method: an AI-based system for gauze counting in surgical settings. Using real-time video surveillance and object recognition powered by YOLOv5, a deep learning model was designed to monitor gauzes on two designated trays labelled "In" and "Out". Gauzes are tracked in the "In" tray prior to their use in the patient's body and in the "Out" tray after use, ensuring accurate counting and verifying that no gauze remains inside the patient at the end of the surgery. We trained the model on numerous images from operation theatres and applied augmentation to cover varied scenarios. This study also addresses the shortcomings of previous project iterations. Previously, the project employed two models, one for human detection and another for gauze detection, trained on a total of 2800 images. The current version uses a single integrated model capable of identifying both humans and gauzes, trained on 11,000 images. This has improved accuracy and increased the frame rate from 8 FPS to 15 FPS. Incorporating doctors' feedback, the system now also supports manual count adjustments, enhancing its reliability in actual surgeries.
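The "In"/"Out" tray bookkeeping described in the abstract reduces to a simple reconciliation, sketched below. This is an illustrative assumption, not the authors' code: the `GauzeTally` class and its method names are hypothetical, and the YOLOv5 detector that would feed it counts is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class GauzeTally:
    """Hypothetical ledger of gauzes seen on the "In" and "Out" trays."""
    counted_in: int = 0
    counted_out: int = 0

    def record(self, tray, n=1):
        """Add n detected gauzes to a tray; a negative n acts as a manual correction."""
        if tray == "In":
            self.counted_in += n
        elif tray == "Out":
            self.counted_out += n
        else:
            raise ValueError(f"unknown tray: {tray!r}")

    def outstanding(self):
        """Gauzes that entered the field but have not yet been seen on the "Out" tray."""
        return self.counted_in - self.counted_out

    def all_accounted_for(self):
        return self.outstanding() == 0


if __name__ == "__main__":
    tally = GauzeTally()
    tally.record("In", 10)   # detector counts ten gauzes before use
    tally.record("Out", 9)   # detector counts nine after use
    print(tally.outstanding())        # one gauze still unaccounted for
    tally.record("Out", 1)   # manual adjustment after staff locate the last gauze
    print(tally.all_accounted_for())
```

Keeping manual adjustments in the same ledger as detector counts mirrors the abstract's point that staff can correct the count during surgery.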

2603.20750 2026-03-24 cs.AI

Modeling Epistemic Uncertainty in Social Perception via Rashomon Set Agents

Jinming Yang, Xinyu Jiang, Xinshan Jiao, Xinping Zhang

2603.20746 2026-03-24 cs.LG cs.CR

Adversarial Attacks on Locally Private Graph Neural Networks

Matta Varun, Ajay Kumar Dhakar, Yuan Hong, Shamik Sural

2603.20741 2026-03-24 cs.CV

CTCal: Rethinking Text-to-Image Diffusion Models via Cross-Timestep Self-Calibration

Xiefan Guo, Xinzhu Ma, Haiyu Zhang, Di Huang

Comments Accepted by CVPR 2026

2603.20739 2026-03-24 cs.CV

Mamba Learns in Context: Structure-Aware Domain Generalization for Multi-Task Point Cloud Understanding

Jincen Jiang, Qianyu Zhou, Yuhang Li, Kui Su, Meili Wang, Jian Chang, Jian Jun Zhang, Xuequan Lu

Comments Accepted to CVPR 2026

2603.20732 2026-03-24 cs.CL

MzansiText and MzansiLM: An Open Corpus and Decoder-Only Language Model for South African Languages

Anri Lombard, Simbarashe Mawere, Temi Aina, Ethan Wolff, Sbonelo Gumede, Elan Novick, Francois Meyer, Jan Buys

Comments 15 pages, 11 tables, appendix included. Accepted at LREC 2026

2603.20731 2026-03-24 cs.CV

VSD-MOT: End-to-End Multi-Object Tracking in Low-Quality Video Scenes Guided by Visual Semantic Distillation

Jun Du

2603.20730 2026-03-24 cs.CL cs.AI

Reasoning Topology Matters: Network-of-Thought for Complex Reasoning Tasks

Fan Huang

2603.20729 2026-03-24 cs.CV cs.AI physics.geo-ph

Weakly supervised multimodal segmentation of acoustic borehole images with depth-aware cross-attention

Jose Luis Lima de Jesus Silva

2603.20724 2026-03-24 cs.AI cs.LG

Multi-RF Fusion with Multi-GNN Blending for Molecular Property Prediction

Zacharie Bugaud

Comments 5 pages, 4 tables

2603.20721 2026-03-24 cs.CV

Cross-modal Fuzzy Alignment Network for Text-Aerial Person Retrieval and A Large-scale Benchmark

Yifei Deng, Chenglong Li, Yuyang Zhang, Guyue Hu, Jin Tang

Comments Accepted by CVPR 2026 main track

2603.20708 2026-03-24 cs.CV

High-Quality and Efficient Turbulence Mitigation with Events

Xiaoran Zhang, Jian Ding, Yuxing Duan, Haoyue Liu, Gang Chen, Yi Chang, Luxin Yan

Comments Accepted by CVPR 2026

2603.20697 2026-03-24 cs.CV cs.AI

Satellite-to-Street: Synthesizing Post-Disaster Views from Satellite Imagery via Generative Vision Models

Yifan Yang, Lei Zou, Wendy Jepson

Comments Accepted for presentation at IGARSS 2026 (IEEE International Geoscience and Remote Sensing Symposium)

2603.20690 2026-03-24 cs.CV

MFSR: MeanFlow Distillation for One Step Real-World Image Super Resolution

Ruiqing Wang, Kai Zhang, Yuanzhi Zhu, Hanshu Yan, Shilin Lu, Jian Yang

2603.20687 2026-03-24 cs.LG

Neuronal Self-Adaptation Enhances Capacity and Robustness of Representation in Spiking Neural Networks

Zhuobin Yang, Yeyao Bao, Liangfu Lv, Jian Zhang, Xiaohong Li, Yunliang Zang

2603.20686 2026-03-24 cs.SD cs.AI

SNAP: Speaker Nulling for Artifact Projection in Speech Deepfake Detection

Kyudan Jung, Jihwan Kim, Minwoo Lee, Soyoon Kim, Jeonghoon Kim, Jaegul Choo, Cheonbok Park

Comments 9 pages, 3 figures, 2 tables

2603.20684 2026-03-24 cs.LG cs.AI math.OC

Centrality-Based Pruning for Efficient Echo State Networks

Sudip Laudari

Comments 8 pages, 3 figures, 2 tables

2603.20682 2026-03-24 cs.CV

IBCapsNet: Information Bottleneck Capsule Network for Noise-Robust Representation Learning

Canqun Xiang, Chen Yang, Jiaoyan Zhao

2603.20679 2026-03-24 cs.RO

Enhancing Vision-Based Policies with Omni-View and Cross-Modality Knowledge Distillation for Mobile Robots

Kai Li, Shiyu Zhao

2603.20678 2026-03-24 cs.AI econ.GN q-fin.EC

AI-Driven Multi-Agent Simulation of Stratified Polyamory Systems: A Computational Framework for Optimizing Social Reproductive Efficiency

Yicai Xing

Comments 20 pages, 10 figures, 3 tables, 83 references

Abstract

Contemporary societies face a severe crisis of demographic reproduction. Global fertility rates continue to decline precipitously, with East Asian nations exhibiting the most dramatic trends -- China's total fertility rate (TFR) fell to approximately 1.0 in 2023, while South Korea's dropped below 0.72. Simultaneously, the institution of marriage is undergoing structural disintegration: educated women rationally reject unions lacking both emotional fulfillment and economic security, while a growing proportion of men at the lower end of the socioeconomic spectrum experience chronic sexual deprivation, anxiety, and learned helplessness. This paper proposes a computational framework for modeling and evaluating a Stratified Polyamory System (SPS) using techniques from agent-based modeling (ABM), multi-agent reinforcement learning (MARL), and large language model (LLM)-empowered social simulation. The SPS permits individuals to maintain a limited number of legally recognized secondary partners in addition to one primary spouse, combined with socialized child-rearing and inheritance reform. We formalize the A/B/C stratification as heterogeneous agent types in a multi-agent system and model the matching process as a MARL problem amenable to Proximal Policy Optimization (PPO). The mating network is analyzed using graph neural network (GNN) representations. Drawing on evolutionary psychology, behavioral ecology, social stratification theory, computational social science, algorithmic fairness, and institutional economics, we argue that SPS can improve aggregate social welfare in the Pareto sense. Preliminary computational results demonstrate the framework's viability in addressing the dual crisis of female motherhood penalties and male sexlessness, while offering a non-violent mechanism for wealth dispersion analogous to the historical Chinese Grace Decree (Tui'en Ling).

2603.20671 2026-03-24 cs.LG stat.ML

Breaking the $O(\sqrt{T})$ Cumulative Constraint Violation Barrier while Achieving $O(\sqrt{T})$ Static Regret in Constrained Online Convex Optimization

Haricharan Balasundaram, Karthick Krishna Mahendran, Rahul Vaze

2603.20669 2026-03-24 cs.RO cs.CV

ToFormer: Towards Large-scale Scenario Depth Completion for Lightweight ToF Camera

Juncheng Chen, Tiancheng Lai, Xingpeng Wang, Bingxin Liao, Baozhe Zhang, Chao Xu, Yanjun Cao

Comments 17 pages, 15 figures

2603.20668 2026-03-24 cs.RO

ROI-Driven Foveated Attention for Unified Egocentric Representations in Vision-Language-Action Systems

Xinhai Sun, Xiang Shi, Menglin Zou, Wenlong Huang

2603.20664 2026-03-24 cs.RO

E-SocialNav: Efficient Socially Compliant Navigation with Language Models

Ling Xiao, Daeun Song, Xuesu Xiao, Toshihiko Yamasaki

Comments Accepted by 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing, to appear. Preprint version

2603.20662 2026-03-24 cs.AI cs.CV

Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning

Xueqi Ma, Shuo Yang, Yanbei Jiang, Shu Liu, Zhenzhen Liu, Jiayang Ao, Xingjun Ma, Sarah Monazam Erfani, James Bailey

2603.20659 2026-03-24 cs.RO

StageCraft: Execution Aware Mitigation of Distractor and Obstruction Failures in VLA Models

Kartikay Milind Pangaonkar, Prabin Rath, Omkar Patil, Nakul Gopalan

2603.20658 2026-03-24 cs.RO

Speedup Patch: Learning a Plug-and-Play Policy to Accelerate Embodied Manipulation

Zhichao Wu, Junyin Ye, Zhilong Zhang, Yihao Sun, Haoxin Lin, Jiaheng Luo, Haoxiang Ren, Lei Yuan, Yang Yu

2603.20650 2026-03-24 cs.AI cs.CY

From 50% to Mastery in 3 Days: A Low-Resource SOP for Localizing Graduate-Level AI Tutors via Shadow-RAG

Zonglin Yang, J. -H. Xie, Lining Zhang, Jiyou Jia, Zhi-X. Chen

Comments 9 pages, 3 figures, practitioner report

2603.20648 2026-03-24 cs.CV cs.AI

A Multihead Continual Learning Framework for Fine-Grained Fashion Image Retrieval with Contrastive Learning and Exponential Moving Average Distillation

Ling Xiao, Toshihiko Yamasaki

Comments Accepted by IEEE Transactions on Multimedia (TMM), to appear. Preprint version