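arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.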

2603.20868 2026-03-24 cs.CV

TAFG-MAN: Timestep-Adaptive Frequency-Gated Latent Diffusion for Efficient and High-Quality Low-Dose CT Image Denoising

Tangtangfang Fang, Yang Jiao, Xiangjian He, Jingxi Hu, Jiaqi Yang

Abstract

Low-dose computed tomography (LDCT) reduces radiation exposure but also introduces substantial noise and structural degradation, making it difficult to suppress noise without erasing subtle anatomical details. In this paper, we present TAFG-MAN, a latent diffusion framework for efficient and high-quality LDCT image denoising. The framework combines a perceptually optimized autoencoder, conditional latent diffusion restoration in a compact latent space, and a lightweight Timestep-Adaptive Frequency-Gated (TAFG) conditioning design. TAFG decomposes condition features into low- and high-frequency components, predicts timestep-adaptive gates from the current denoising feature and timestep embedding, and progressively releases high-frequency guidance in later denoising stages before cross-attention. In this way, the model relies more on stable structural guidance at early reverse steps and introduces fine details more cautiously as denoising proceeds, improving the balance between noise suppression and detail preservation. Experiments show that TAFG-MAN achieves a favorable quality-efficiency trade-off against representative baselines. Compared with its base variant without TAFG, it further improves detail preservation and perceptual quality while maintaining essentially the same inference cost, and ablation results confirm the effectiveness of the proposed conditioning mechanism.
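
The frequency-gating idea in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract says the gates are predicted from the current denoising feature and the timestep embedding, whereas this sketch swaps in a fixed sigmoid schedule. The function names, the moving-average low-pass filter, the 1-D signal, and the `sharpness` parameter are all assumptions made for illustration.

```python
import math


def split_frequencies(signal, window=3):
    """Split a 1-D signal into a moving-average (low-frequency) part
    and the residual (high-frequency) part, so that low + high == signal."""
    half = window // 2
    low = []
    for i in range(len(signal)):
        start = max(0, i - half)
        stop = min(len(signal), i + half + 1)
        low.append(sum(signal[start:stop]) / (stop - start))
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def high_freq_gate(t, num_steps, sharpness=8.0):
    """Hypothetical hand-written gate in (0, 1); a stand-in for a learned gate.

    t counts down from num_steps - 1 (first, noisiest reverse step) to 0
    (last reverse step). The gate stays near 0 early and approaches 1 late.
    """
    progress = 1.0 - t / (num_steps - 1)  # 0 at the first step, 1 at the last
    return 1.0 / (1.0 + math.exp(-sharpness * (progress - 0.5)))


def gated_condition(signal, t, num_steps):
    """Keep the low-frequency part fully; release the high-frequency part by the gate."""
    low, high = split_frequencies(signal)
    gate = high_freq_gate(t, num_steps)
    return [l + gate * h for l, h in zip(low, high)]


if __name__ == "__main__":
    condition = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # made-up condition feature
    for t in (9, 5, 0):
        print(t, [round(v, 3) for v in gated_condition(condition, t, 10)])
```

Under these assumptions, early reverse steps see mostly the smoothed signal, and the final steps see nearly the full condition. In the paper the gated features would then feed cross-attention, which this sketch leaves out.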

2603.20867 2026-03-24 cs.LG cs.AI cs.CL cs.NE

Semantic Sections: An Atlas-Native Feature Ontology for Obstructed Representation Spaces

Hossein Javidnia

Comments 20 pages, 2 figures

2603.20860 2026-03-24 cs.CV cs.AI

Restoring Neural Network Plasticity for Faster Transfer Learning

Xander Coetzer, Arné Schreuder, Anna Sergeevna Bosman

Comments 11 pages, 1 figure, 6 tables and 2 formulas

2603.20857 2026-03-24 cs.CV cs.GR

Fast and Robust Deformable 3D Gaussian Splatting

Han Jiao, Jiakai Sun, Lei Zhao, Zhanjie Zhang, Wei Xing, Huaizhong Lin

2603.20856 2026-03-24 cs.CV cs.LG

Ensemble of Small Classifiers For Imbalanced White Blood Cell Classification

Siddharth Srivastava, Adam Smith, Scott Brooks, Jack Bacon, Till Bretschneider

Comments Accepted at ISBI 2026 WBCBench Challenge

2603.20854 2026-03-24 cs.CL cs.AI

SozKZ: Training Efficient Small Language Models for Kazakh from Scratch

Saken Tukenov

Comments 12 pages, 3 figures, 2 tables

2603.20851 2026-03-24 cs.CL cs.AI

Can ChatGPT Really Understand Modern Chinese Poetry?

Shanshan Wang, Derek F. Wong, Jingming Yao, Lidia S. Chao

Comments Accepted by EACL 2026

2603.20848 2026-03-24 cs.CV cs.CE q-bio.TO

GOLDMARK: Governed Outcome-Linked Diagnostic Model Assessment Reference Kit

Chad Vanderbilt, Gabriele Campanella, Siddharth Singi, Swaraj Nanda, Jie-Fu Chen, Ali Kamali, Amir Momeni Boroujeni, David Kim, Mohamed Yakoub, Jamal Benhamida, Meera Hameed, Neeraj Kumar, Gregory Goldgof

2603.20842 2026-03-24 cs.LG

A Knowledge-Informed Pretrained Model for Causal Discovery

Wenbo Xu, Yue He, Yunhai Wang, Xingxuan Zhang, Kun Kuang, Yueguo Chen, Peng Cui

2603.20839 2026-03-24 cs.CV cs.AI cs.HC cs.LG

Dodgersort: Uncertainty-Aware VLM-Guided Human-in-the-Loop Pairwise Ranking

Yujin Park, Haejun Chung, Ikbeom Jang

Comments 12 pages, 2 figures, Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2026)

2603.20836 2026-03-24 cs.CV cs.AI

MERIT: Multi-domain Efficient RAW Image Translation

Wenjun Huang, Shenghao Fu, Yian Jin, Yang Ni, Ziteng Cui, Hanning Chen, Yirui He, Yezi Liu, Sanggeon Yun, SungHeon Jeong, Ryozo Masukawa, William Youngwoo Chung, Mohsen Imani

2603.20829 2026-03-24 cs.LG

Beyond the Academic Monoculture: A Unified Framework and Industrial Perspective for Attributed Graph Clustering

Yunhui Liu, Yue Liu, Yongchao Liu, Tao Zheng, Stan Z. Li, Xinwang Liu, Tieke He

Abstract

Attributed Graph Clustering (AGC) is a fundamental unsupervised task that partitions nodes into cohesive groups by jointly modeling structural topology and node attributes. While the advent of graph neural networks and self-supervised learning has catalyzed a proliferation of AGC methodologies, a widening chasm persists between academic benchmark performance and the stringent demands of real-world industrial deployment. To bridge this gap, this survey provides a comprehensive, industrially grounded review of AGC from three complementary perspectives. First, we introduce the Encode-Cluster-Optimize taxonomic framework, which decomposes the diverse algorithmic landscape into three orthogonal, composable modules: representation encoding, cluster projection, and optimization strategy. This unified paradigm enables principled architectural comparisons and inspires novel methodological combinations. Second, we critically examine prevailing evaluation protocols to expose the field's academic monoculture: a pervasive over-reliance on small, homophilous citation networks, the inadequacy of supervised-only metrics for an inherently unsupervised task, and the chronic neglect of computational scalability. In response, we advocate for a holistic evaluation standard that integrates supervised semantic alignment, unsupervised structural integrity, and rigorous efficiency profiling. Third, we explicitly confront the practical realities of industrial deployment. By analyzing operational constraints such as massive scale, severe heterophily, and tabular feature noise alongside extensive empirical evidence from our companion benchmark, we outline actionable engineering strategies. Furthermore, we chart a clear roadmap for future research, prioritizing heterophily-robust encoders, scalable joint optimization, and unsupervised model selection criteria to meet production-grade requirements.

2603.20828 2026-03-24 cs.CV

EruDiff: Refactoring Knowledge in Diffusion Models for Advanced Text-to-Image Synthesis

Xiefan Guo, Xinzhu Ma, Haoxiang Ma, Zihao Zhou, Di Huang

2603.20827 2026-03-24 cs.RO

Swim2Real: VLM-Guided System Identification for Sim-to-Real Transfer

Kevin Qiu, Kyle Walker, Mike Y. Michelis, Marek Cygan, Josie Hughes

2603.20825 2026-03-24 cs.LG

Cross-Granularity Representations for Biological Sequences: Insights from ESM and BiGCARP

Hanlin Xiao, Rainer Breitling, Eriko Takano, Mauricio A. Álvarez

Comments 9 pages, 4 figures, published in 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

2603.20819 2026-03-24 cs.LG cs.SY eess.SY stat.ML

Achieving $\widetilde{O}(1/ε)$ Sample Complexity for Bilinear Systems Identification under Bounded Noises

Hongyu Yi, Chenbei Lu, Jing Yu

2603.20818 2026-03-24 cs.CV cs.AI

PlanaReLoc: Camera Relocalization in 3D Planar Primitives via Region-Based Structure Matching

Hanqiao Ye, Yuzhou Liu, Yangdong Liu, Shuhan Shen

Comments Accepted by CVPR 2026. 20 pages, 15 figures. Code at https://github.com/3dv-casia/PlanaReLoc

2603.20815 2026-03-24 cs.AI

GMPilot: An Expert AI Agent For FDA cGMP Compliance

Xiaohan Wang, Nan Zhang, Sulene Han, Keguang Tang, Lei Xu, Zhiping Li, Xiue Liu, Xiaomei Han

Comments 14 pages, 1 figure

2603.20811 2026-03-24 cs.CV

Lean Learning Beyond Clouds: Efficient Discrepancy-Conditioned Optical-SAR Fusion for Semantic Segmentation

Chenxing Meng, Wuzhou Quan, Yingjie Cai, Liqun Cao, Liyan Zhang, Mingqiang Wei

Comments 14 pages, 7 figures

2603.20808 2026-03-24 cs.CV cs.LG

Predictive Regularization Against Visual Representation Degradation in Multimodal Large Language Models

Enguang Wang, Qiang Wang, Yuanchen Wu, Ke Yan, Xinbin Yuan, Shouhong Ding, Xialei Liu, Ming-Ming Cheng

Comments Accepted at CVPR 2026

2603.20807 2026-03-24 cs.CL

BenchBench: Benchmarking Automated Benchmark Generation

Yandan Zheng, Haoran Luo, Zhenghong Lin, Wenjin Liu, Luu Anh Tuan

2603.20804 2026-03-24 cs.CV cs.RO

Does Peer Observation Help? Vision-Sharing Collaboration for Vision-Language Navigation

Qunchao Jin, Yiliao Song, Qi Wu

2603.20801 2026-03-24 cs.LG

Large Neighborhood Search meets Iterative Neural Constraint Heuristics

Yudong W. Xu, Wenhao Li, Scott Sanner, Elias B. Khalil

Comments Published in the 23rd International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research

2603.20799 2026-03-24 cs.CL cs.LG

RLVR Training of LLMs Does Not Improve Thinking Ability for General QA: Evaluation Method and a Simple Solution

Kaiyuan Li, Jing-Cheng Pang, Yang Yu

2603.20795 2026-03-24 cs.CL

The Anatomy of an Edit: Mechanism-Guided Activation Steering for Knowledge Editing

Yuan Cao, Mingyang Wang, Hinrich Schütze

2603.20791 2026-03-24 cs.LG

Neural Autoregressive Flows for Markov Boundary Learning

Khoa Nguyen, Bao Duong, Viet Huynh, Thin Nguyen

Comments Accepted at IEEE ICDM 2025

2603.20785 2026-03-24 cs.CV

ME-IQA: Memory-Enhanced Image Quality Assessment via Re-Ranking

Kanglong Fan, Tianhe Wu, Wen Wen, Jianzhao Liu, Le Yang, Yabin Zhang, Yiting Liao, Junlin Li, Li Zhang

2603.20782 2026-03-24 cs.CV

MEMO: Human-like Crisp Edge Detection Using Masked Edge Prediction

Jiaxin Cheng, Yue Wu, Yicong Zhou

Comments Accepted at CVPR 2026

2603.20781 2026-03-24 cs.CL

Code-MIE: A Code-style Model for Multimodal Information Extraction with Scene Graph and Entity Attribute Knowledge Enhancement

Jiang Liu, Ge Qiu, Hao Fei, Dongdong Xie, Jinbo Li, Fei Li, Chong Teng, Donghong Ji

Abstract

With the rapid development of large language models (LLMs), more and more researchers have paid attention to information extraction based on LLMs. However, existing methods still leave room for improvement. First, existing multimodal information extraction (MIE) methods usually employ natural language templates as the input and output of LLMs, which is a mismatch for information extraction tasks, whose content is mostly structured information such as entities and relations. Second, although a few methods have adopted structured and more IE-friendly code-style templates, they explored only text-only IE rather than multimodal IE. Moreover, these methods are more complex in design, requiring a separate template for each task. In this paper, we propose a Code-style Multimodal Information Extraction framework (Code-MIE), which formalizes MIE as unified code understanding and generation. Code-MIE has the following novel designs: (1) Entity attributes such as gender and affiliation are extracted from the text to guide the model to understand the context and role of entities. (2) Images are converted into scene graphs and visual features to incorporate rich visual information into the model. (3) The input template is constructed as a Python function, whose parameters are the entity attributes, scene graphs, and raw text. The output template, in contrast, is formalized as Python dictionaries containing all extraction results, such as entities and relations. To evaluate Code-MIE, we conducted extensive experiments on the M$^3$D, Twitter-15, Twitter-17, and MNRE datasets. The results show that our method achieves state-of-the-art performance compared to six competing baseline models, with 61.03\% and 60.49\% on the English and Chinese datasets of M$^3$D, and 76.04\%, 88.07\%, and 73.94\% on the other three datasets.
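
The code-style templates described in the abstract above might look roughly like the following sketch. The abstract does not give the actual template, so every name here (`build_input_template`, `extract_information`, `parse_output`), the sample sentence, and the dictionary keys are hypothetical. Visual features, which the abstract also mentions, are not modeled.

```python
import ast


def build_input_template(text, entity_attributes, scene_graph):
    """Render a code-style prompt: a Python function header whose default
    parameter values carry the entity attributes, scene graph, and raw text.
    (Hypothetical layout; the paper's real template is not shown in the abstract.)"""
    return (
        "def extract_information(\n"
        f"    entity_attributes={entity_attributes!r},\n"
        f"    scene_graph={scene_graph!r},\n"
        f"    text={text!r},\n"
        "):\n"
        '    """Return a dict with the entities and relations found in the inputs."""\n'
    )


def parse_output(completion):
    """Parse a model completion that is expected to be a Python dict literal.
    ast.literal_eval only evaluates literals, so no model-written code is run."""
    result = ast.literal_eval(completion.strip())
    if not isinstance(result, dict):
        raise ValueError("expected a Python dict literal")
    return result


if __name__ == "__main__":
    # Made-up example inputs, for illustration only.
    prompt = build_input_template(
        text="Alice works at Acme.",
        entity_attributes={"Alice": {"affiliation": "Acme"}},
        scene_graph=[("woman", "holding", "laptop")],
    )
    print(prompt)
    print(parse_output("{'entities': [('Alice', 'PER')], 'relations': []}"))
```

One reason to parse the output with `ast.literal_eval` rather than `eval` is that the completion comes from a model and should be treated as untrusted text.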

2603.20777 2026-03-24 cs.LG cs.AI cs.CV

OmniPatch: A Universal Adversarial Patch for ViT-CNN Cross-Architecture Transfer in Semantic Segmentation

Aarush Aggarwal, Akshat Tomar, Amritanshu Tiwari, Sargam Goyal

Comments 10 pages, 4 figures, ICLR 2026: Principled Design for Trustworthy AI