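arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.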

2603.16792 2026-03-18 cs.CV cs.AI

V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising

Han Lin, Xichen Pan, Zun Wang, Yue Zhang, Chu Wang, Jaemin Cho, Mohit Bansal

Comments code: https://github.com/HL-hanlin/V-Co

Abstract

Pixel-space diffusion has recently re-emerged as a strong alternative to latent diffusion, enabling high-quality generation without pretrained autoencoders. However, standard pixel-space diffusion models receive relatively weak semantic supervision and are not explicitly designed to capture high-level visual structure. Recent representation-alignment methods (e.g., REPA) suggest that pretrained visual features can substantially improve diffusion training, and visual co-denoising has emerged as a promising direction for incorporating such features into the generative process. However, existing co-denoising approaches often entangle multiple design choices, making it unclear which design choices are truly essential. Therefore, we present V-Co, a systematic study of visual co-denoising in a unified JiT-based framework. This controlled setting allows us to isolate the ingredients that make visual co-denoising effective. Our study reveals four key ingredients for effective visual co-denoising. First, preserving feature-specific computation while enabling flexible cross-stream interaction motivates a fully dual-stream architecture. Second, effective classifier-free guidance (CFG) requires a structurally defined unconditional prediction. Third, stronger semantic supervision is best provided by a perceptual-drifting hybrid loss. Fourth, stable co-denoising further requires proper cross-stream calibration, which we realize through RMS-based feature rescaling. Together, these findings yield a simple recipe for visual co-denoising. Experiments on ImageNet-256 show that, at comparable model sizes, V-Co outperforms the underlying pixel-space diffusion baseline and strong prior pixel-diffusion methods while using fewer training epochs, offering practical guidance for future representation-aligned generative models.

2603.16789 2026-03-18 cs.LG q-bio.QM

Conservative Continuous-Time Treatment Optimization

Nora Schneider, Georg Manten, Niki Kilbertus

2603.16783 2026-03-18 cs.CL

SpokenUS: A Spoken User Simulator for Task-Oriented Dialogue

Jonggeun Lee, Junseong Pyo, Jeongmin Park, Yohan Jo

2603.16781 2026-03-18 cs.CV cs.AI

IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans

Huimin Xiong, Zijie Meng, Tianxiang Hu, Chenyi Zhou, Yang Feng, Zuozhu Liu

2603.16777 2026-03-18 cs.AI

Anticipatory Planning for Multimodal AI Agents

Yongyuan Liang, Shijie Zhou, Yu Gu, Hao Tan, Gang Wu, Franck Dernoncourt, Jihyung Kil, Ryan A. Rossi, Ruiyi Zhang

Comments Published at CVPR 2026 Findings Track

2603.16772 2026-03-18 cs.RO cs.HC

Beyond Cybathlon: On-demand Quadrupedal Assistance for People with Limited Mobility

Carmen Scheidemann, Andrei Cramariuc, Changan Chen, Jia-Ruei Chiu, Marco Hutter

2603.16769 2026-03-18 cs.CV

GDPO-SR: Group Direct Preference Optimization for One-Step Generative Image Super-Resolution

Qiaosi Yi, Shuai Li, Rongyuan Wu, Lingchen Sun, Zhengqiang Zhang, Lei Zhang

2603.16761 2026-03-18 cs.LG cs.CL

SOMP: Scalable Gradient Inversion for Large Language Models via Subspace-Guided Orthogonal Matching Pursuit

Yibo Li, Qiongxiu Li

Comments 18 pages, 4 figures, 13 tables

2603.16760 2026-03-18 cs.CV

Dual Stream Independence Decoupling for True Emotion Recognition under Masked Expressions

Jinsheng Wei, Xiguang Zhang, Zheng Shi, Guanming Lu

2603.16759 2026-03-18 cs.CL cs.AI

TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities

Victoria Graf, Valentina Pyatkin, Nouha Dziri, Nathan Lambert, Hannaneh Hajishirzi

2603.16758 2026-03-18 cs.CV

SuCor: Susceptibility Distortion Correction via Parameter-Free and Self-Regularized Optimal Transport

Sreekar Chigurupati, Eleftherios Garyfallidis

2603.16354 2026-03-18 cs.CL cs.IR cs.LG

PashtoCorp: A 1.25-Billion-Word Corpus, Evaluation Suite, and Reproducible Pipeline for Low-Resource Language Development

Hanif Rahman

2603.15821 2026-03-18 cs.LG cs.AI

Hypothesis Class Determines Explanation: Why Accurate Models Disagree on Feature Attribution

Thackshanaramana B

Comments 17 pages, 1 figure. Submitted to TMLR

2603.15643 2026-03-18 cs.AI

GSI Agent: Domain Knowledge Enhancement for Large Language Models in Green Stormwater Infrastructure

Shaohuang Wang

2603.15633 2026-03-18 cs.AI

Neural-Symbolic Logic Query Answering in Non-Euclidean Space

Lihui Liu

2603.13856 2026-03-18 cs.LG cs.CV

OrigamiBench: An Interactive Environment to Synthesize Flat-Foldable Origamis

Naaisha Agarwal, Yihan Wu, Yichang Jian, Yikuan Hu, Nishad Mansoor, Mohan Li, Yifei Peng, Wang-Zhou Dai, Yao-Xiang Ding, Emanuele Sansone

2603.13669 2026-03-18 cs.CV cs.AI cs.LG

SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised No-Reference Image Quality Assessment

Mahdi Naseri, Zhou Wang

Comments Submitted to IEEE Transactions on Image Processing

2603.05829 2026-03-18 cs.LG cs.CL

Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls

Shubhangi Upasani, Chen Wu, Jay Rainton, Bo Li, Urmish Thakker, Changran Hu, Qizheng Zhang

2603.05413 2026-03-18 cs.SD

Building Enterprise Realtime Voice Agents from Scratch: A Technical Tutorial

Jielin Qiu, Zixiang Chen, Liangwei Yang, Ming Zhu, Zhiwei Liu, Juntao Tan, Wenting Zhao, Rithesh Murthy, Roshan Ram, Akshara Prabhakar, Shelby Heinecke, Caiming Xiong, Silvio Savarese, Huan Wang

2603.04722 2026-03-18 cs.AI cs.CL cs.LG

Model Medicine: A Clinical Framework for Understanding, Diagnosing, and Treating AI Models

Jihoon Jeong

Comments 56 pages, 7 figures. Project page: https://jihoonjeong.github.io/model-medicine/

2603.01932 2026-03-18 cs.CV

BAWSeg: A UAV Multispectral Benchmark for Barley Weed Segmentation

Haitian Wang, Xinyu Wang, Muhammad Ibrahim, Dustin Severtson, Ajmal Mian

Comments This article has been published in Remote Sensing as part of the Special Issue Intelligent UAV Remote Sensing for Next-Generation Precision Agriculture

Abstract

Accurate weed mapping in cereal fields requires pixel-level segmentation from UAV imagery that remains reliable across fields, seasons, and illumination. Existing multispectral pipelines often depend on thresholded vegetation indices, which are brittle under radiometric drift and mixed crop-weed pixels, or on single-stream CNN and Transformer backbones that ingest stacked bands and indices, where radiance cues and normalized index cues interfere and reduce sensitivity to small weed clusters embedded in crop canopy. We propose VISA, a two-stream segmentation network that decouples these cues and fuses them at native resolution. The radiance stream learns from calibrated five-band reflectance using local residual convolutions, channel recalibration, spatial gating, and skip-connected decoding, which preserve fine textures, row boundaries, and small weed structures that are often weakened after ratio-based index compression. The index stream operates on vegetation-index maps with windowed self-attention to model local structure efficiently, state-space layers to propagate field-scale context without quadratic attention cost, and Slot Attention to form stable region descriptors that improve discrimination of sparse weeds under canopy mixing. To support supervised training and deployment-oriented evaluation, we introduce BAWSeg, a four-year UAV multispectral dataset collected over commercial barley paddocks in Western Australia, providing radiometrically calibrated blue, green, red, red edge, and near-infrared orthomosaics, derived vegetation indices, and dense crop, weed, and other labels with leakage-free block splits. On BAWSeg, VISA achieves 75.6% mIoU and 63.5% weed IoU with 22.8 M parameters, outperforming a multispectral SegFormer-B1 baseline by 1.2 mIoU and 1.9 weed IoU. Under cross-plot and cross-year protocols, VISA maintains 71.2% and 69.2% mIoU, respectively.

2603.00010 2026-03-18 cs.LG math.OC

Transit Network Design with Two-Level Demand Uncertainties: A Machine Learning and Contextual Stochastic Optimization Framework

Hongzhao Guan, Beste Basciftci, Pascal Van Hentenryck

2601.13798 2026-03-18 cs.CV cs.AI cs.LG

CFM: Language-aligned Concept Foundation Model for Vision

Kai Wittenmayer, Sukrut Rao, Amin Parchami-Araghi, Bernt Schiele, Jonas Fischer

Comments 53 pages, 29 figures, 4 tables

2601.04153 2026-03-18 cs.CV

Diffusion-DRF: Free, Rich, and Differentiable Reward for Video Diffusion Fine-Tuning

Yifan Wang, Yanyu Li, Gordon Guocheng Qian, Sergey Tulyakov, Yun Fu, Anil Kag

Comments Webpage: https://snap-research.github.io/diffusion-drf/

2601.00430 2026-03-18 cs.CL

Toward Better Temporal Structures for Geopolitical Events Forecasting

Kian Ahrabian, Eric Boxer, Jay Pujara

Comments 18 pages, 15 figures, 3 tables

2510.13939 2026-03-18 cs.CL cs.AI cs.CY

Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers

Tuhin Chakrabarty, Jane C. Ginsburg, Paramveer Dhillon

Comments Preprint Under Review

Abstract

The use of copyrighted books for training AI has sparked lawsuits from authors concerned about AI generating derivative content. Yet whether these models can produce high-quality literary text emulating authors' voices remains unclear. We conducted a preregistered study comparing MFA-trained writers with three frontier models (ChatGPT, Claude, Gemini) writing up to 450-word excerpts emulating 50 award-winning authors' styles. In blind pairwise evaluations by 28 MFA-trained readers and 516 college-educated general readers, AI text from in-context prompting was strongly disfavored by MFA readers for stylistic fidelity (OR=0.16) and quality (OR=0.13), while general readers showed no fidelity preference (OR=1.06) but favored AI for quality (OR=1.82). Fine-tuning ChatGPT on authors' complete works reversed these results: MFA readers favored AI for fidelity (OR=8.16) and quality (OR=1.87), with general readers showing even stronger preference (fidelity OR=16.65; quality OR=5.42). Both groups preferred fine-tuned AI, but the writer-type X reader-type interaction remained significant (p=0.021 for fidelity; p<10^-4 for quality), indicating general readers favored AI by a wider margin. Effects are robust under cluster-robust inference and generalize across authors in heterogeneity analyses. Fine-tuned outputs were rarely flagged as AI-generated (3% vs. 97% for prompting) by leading detectors. Mediation analysis shows fine-tuning eliminates detectable AI quirks that penalize in-context outputs, altering the nexus between detectability and preference. While not accounting for effort to transform AI output into publishable prose, the median fine-tuning cost of $81 per author represents a 99.7% reduction versus typical writer compensation. Author-specific fine-tuning enables non-verbatim AI writing preferred over expert human writing, providing evidence relevant to copyright's fourth fair-use factor.

2509.21617 2026-03-18 cs.LG cs.AI cs.NE

LANCE: Low Rank Activation Compression for Efficient On-Device Continual Learning

Marco Paul E. Apolinario, Kaushik Roy

Comments 26 pages, 6 figures

2509.13949 2026-03-18 cs.RO

SHaRe-RL: Structured, Interactive Reinforcement Learning for Contact-Rich Industrial Assembly Tasks

Jannick Stranghöner, Philipp Hartmann, Marco Braun, Sebastian Wrede, Klaus Neumann

Comments 8 pages, 8 figures, accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026

2502.02786 2026-03-18 cs.LG

When Machine Learning Gets Personal: Evaluating Prediction and Explanation

Louisa Cornelis, Guillermo Bernárdez, Haewon Jeong, Nina Miolane

Comments 48 pages, 13 figures, accepted to ICLR 2026

2603.16757 2026-03-18 cs.LG

pADAM: A Plug-and-Play All-in-One Diffusion Architecture for Multi-Physics Learning

Amirhossein Mollaali, Bongseok Kim, Christian Moya, Guang Lin

Comments 36 pages, 10 figures