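arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.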

2604.17856 2026-04-21 cs.CV

PlankFormer: Robust Plankton Instance Segmentation via MAE-Pretrained Vision Transformers and Pseudo Community Image Generation

Masaharu Miyazaki, Yurie Otake, Koichi Ito, Wataru Makino, Jotaro Urabe, Takafumi Aoki

Comments Accepted to ICPR2026

Abstract

Plankton monitoring is essential for assessing aquatic ecosystems but is limited by the labor-intensive nature of manual microscopic analysis. Automating the segmentation of plankton from crowded images is crucial; however, it faces two major challenges: (i) the scarcity of pixel-level annotated datasets and (ii) the difficulty of distinguishing plankton from debris and overlapping individuals using conventional CNN-based methods. To address these issues, we propose PlankFormer, a novel framework for plankton instance segmentation. First, to overcome the data shortage, we introduce a method to generate labeled Pseudo Community Images (PCI) by synthesizing individual plankton images onto diverse backgrounds, including those created by generative models. Second, we propose a segmentation model utilizing a Vision Transformer (ViT) backbone with a Mask2Former decoder. To robustly capture the global structural features of plankton against occlusion and debris, we employ a Masked Autoencoder (MAE) for self-supervised pre-training on unlabeled individual images. Experimental results on real-world datasets demonstrate that our method significantly outperforms conventional methods, such as Mask R-CNN, particularly in challenging environments with high debris density. We demonstrate that our synthetic training strategy and MAE-based architecture enable high-precision segmentation while requiring fewer manual annotations for individual plankton images.

2604.17852 2026-04-21 cs.SD

LLM-Codec: Neural Audio Codec Meets Language Model Objectives

Ho-Lam Chung, Yiming Chen, Hung-yi Lee

Comments ACL 2026 Findings

2604.17850 2026-04-21 cs.CV

UniCSG: Unified High-Fidelity Content-Constrained Style-Driven Generation via Staged Semantic and Frequency Disentanglement

Jingwei Yang, Ruoxi Wu, Wei Shen, Meng Li, Yulong Liu, Huimin She, Lunxi Yuan

2604.17849 2026-04-21 cs.AI

On the Reliability of Computer Use Agents

Gonzalo Gonzalez-Pumariega, Saaket Agashe, Jiachen Yang, Ang Li, Xin Eric Wang

Comments 33 pages, 3 figures, 4 tables

2604.17846 2026-04-21 cs.CV cs.AI

AI Approach for MRI-only Full-Spine Vertebral Segmentation and 3D Reconstruction in Paediatric Scoliosis

Nathasha Naranpanawa, Maree T. Izatt, Robert D. Labrom, Geoffrey N. Askin, J. Paige Little

Comments Presented at 2026 Spine Society of Australia 37th Annual Scientific Meeting

2604.17842 2026-04-21 cs.CL

QuickScope: Certifying Hard Questions in Dynamic LLM Benchmarks

Taylor Lundy, Narun K. Raman, Kevin Leyton-Brown

Comments 10 pages, 3 figures

2604.17841 2026-04-21 cs.RO

Driving risk emerges from the required two-dimensional joint evasive acceleration

Hao Cheng, Yanbo Jiang, Wenhao Yu, Rui Zhou, Jiang Bian, Keyu Chen, Zhiyuan Liu, Heye Huang, Hailun Zhang, Fang Zhang, Jianqiang Wang, Sifa Zheng

Comments 23 pages, 5 figures; supplementary information provided as an ancillary file

2604.17837 2026-04-21 cs.AI cs.CL cs.LG

Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs

Charles Ye, Bo Yuan, Lee Sharkey

2604.17831 2026-04-21 cs.CV cs.GR

PCM-NeRF: Probabilistic Camera Modeling for Neural Radiance Fields under Pose Uncertainty

Shravan Venkatraman, Rakesh Raj Madavan, Pavan Kumar Sathya Venkatesh

Comments CVPR-W 2026 (GenRec3D)

2604.17830 2026-04-21 cs.RO

SYMBOLIZER: Symbolic Model-free Task Planning with VLMs

Sami Azirar, Zlatan Ajanovic, Hermann Blum

Comments under review

2604.17828 2026-04-21 cs.CL

How Non-Linguistic Is the Indus Sign System? A Synthetic-Baseline Scorecard

Ashish Nair

Comments 13 pages, 4 figures, 8 tables. Code available from corresponding author upon request

2604.17827 2026-04-21 cs.CL

Learning to Seek Help: Dynamic Collaboration Between Small and Large Language Models

Hang Zeng, Xiangyu Liu, Yong Hu, Chaoyue Niu, Jiarui Zhang, Shaojie Tang, Fan Wu, Guihai Chen

Comments 8 content pages

2604.16272 2026-04-21 cs.CV cs.AI cs.CL

VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects

Xiangbo Gao, Sicong Jiang, Bangya Liu, Xinghao Chen, Minglai Yang, Siyuan Yang, Mingyang Wu, Jiongze Yu, Qi Zheng, Haozhi Wang, Jiayi Zhang, Jie Yang, Zihan Wang, Qing Yin, Zhengzhong Tu

2604.16254 2026-04-21 cs.SD eess.AS

ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics

Heewon Oh

Comments v2: Added SONICS 3-way (n=23,288), OOD taxonomy, benchmark coverage table, baseline reproduction appendix; toned-down claims; reframed discussion as asymmetric defender advantage. 8 pages, 6 figs, 12 tables

2604.16146 2026-04-21 cs.CL

On the Rejection Criterion for Proxy-based Test-time Alignment

Ayoub Hammal, Pierre Zweigenbaum, Caio Corro

Comments ACL 2026 Main

2604.16123 2026-04-21 cs.LG physics.chem-ph

Tabular foundation models for in-context prediction of molecular properties

Karim K. Ben Hicham, Jan G. Rittig, Martin Grohe, Alexander Mitsos

2604.15951 2026-04-21 cs.AI

Integrating Graphs, Large Language Models, and Agents: Reasoning and Retrieval

Hamed Jelodar, Samita Bai, Mohammad Meymani, Parisa Hamedi, Roozbeh Razavi-Far, Ali Ghorbani

2604.15841 2026-04-21 cs.CL

Exploring the Capability Boundaries of LLMs in Mastering of Chinese Chouxiang Language

Dianqing Lin, Tian Lan, Jiali Zhu, Jiang Li, Wei Chen, Xu Liu, Aruukhan, Xiangdong Su, Hongxu Hou, Guanglai Gao

Comments Accepted to ACL 2026 Findings

2604.15613 2026-04-21 cs.LG cs.AI

VoodooNet: Achieving Analytic Ground States via High-Dimensional Random Projections

Wladimir Silva

Comments 8 pages, 3 figures, 2 tables

2604.14902 2026-04-21 cs.AI cs.CL cs.CV cs.RO

ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints

Pei-An Chen, Yong-Ching Liang, Jia-Fong Yeh, Hung-Ting Su, Yi-Ting Chen, Min Sun, Winston Hsu

2604.14548 2026-04-21 cs.SD cs.LG eess.AS

VoxSafeBench: Not Just What Is Said, but Who, How, and Where

Yuxiang Wang, Hongyu Liu, Yijiang Xu, Qinke Ni, Li Wang, Wan Lin, Kunyu Feng, Dekun Chen, Xu Tan, Lei Wang, Jie Shi, Zhizheng Wu

2604.14267 2026-04-21 cs.LG cs.AI

Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization

Junzhe Wang, Zhiheng Xi, Yajie Yang, Hao Luo, Shihan Dou, Tao Gui, Qi Zhang

Comments Accepted to the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), Main Conference

2604.13366 2026-04-21 cs.LG cs.RO cs.SY eess.SY

Diffusion Sequence Models for Generative In-Context Meta-Learning of Robot Dynamics

Angelo Moroncelli, Matteo Rufolo, Gunes Cagin Aydin, Asad Ali Shahid, Loris Roveda

Comments Angelo Moroncelli, Matteo Rufolo and Gunes Cagin Aydin contributed equally to this work

2604.12919 2026-04-21 cs.CL

MetFuse: Figurative Fusion between Metonymy and Metaphor

Saptarshi Ghosh, Tianyu Jiang

Comments ACL 2026

2604.12320 2026-04-21 cs.CV cs.AI cs.MM

EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports

Jianzhe Ma, Zhonghao Cao, Shangkui Chen, Yichen Xu, Wenxuan Wang, Qin Jin

Comments Work in progress

2604.11789 2026-04-21 cs.CV

LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

Yuqian Yuan, Wenqiao Zhang, Juekai Lin, Yu Zhong, Mingjian Gao, Binhe Yu, Yunqi Cao, Wentong Li, Yueting Zhuang, Beng Chin Ooi

Comments 38 pages, 6 figures

2604.11480 2026-04-21 cs.AI

On the Complexity of the Discussion-based Semantics in Abstract Argumentation

Lydia Blümel, Kai Sauerwald, Kenneth Skiba, Matthias Thimm

2604.10693 2026-04-21 cs.AI

FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning

Yuxi Sun, Aoqi Zuo, Haotian Xie, Wei Gao, Mingming Gong, Jing Ma

Comments Accepted to Findings of the Association for Computational Linguistics (ACL) 2026

2604.10452 2026-04-21 cs.CL

NOSE: Neural Olfactory-Semantic Embedding with Tri-Modal Orthogonal Contrastive Learning

Yanyi Su, Hongshuai Wang, Zhifeng Gao, Jun Cheng

Comments Accepted to the ACL 2026 Main Conference

2604.10439 2026-04-21 cs.CV

Removing Motion Artifact in MRI by Using a Perceptual Loss Driven Deep Learning Framework

Ziheng Guo, Danqun Zheng, Shuai Li, Chengwei Chen, Boyang Pan, Xuezhou Li, Ziqin Yu, Langdi Zhong, Chenwei Shao, Yun Bian, Nan-Jie Gong

Comments 7 figures, 6 tables