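arXivDaily: a daily academic digest that syncs the full arXiv dataset and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.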

2511.17843 2026-03-16 cs.CV

JigsawComm: Joint Semantic Feature Encoding and Transmission for Communication-Efficient Cooperative Perception

Chenyi Wang, Zhaowei Li, Ming F. Li, Wujie Wen

Details

English abstract

Multi-agent cooperative perception (CP) promises to overcome the inherent occlusion and range limitations of single-agent systems in autonomous driving, yet its practicality is severely constrained by limited Vehicle-to-Everything (V2X) communication bandwidth. Existing approaches attempt to improve bandwidth efficiency via compression or heuristic message selection, but neglect the semantic relevance and cross-agent redundancy of the transmitted data. In this paper, we formulate a joint semantic feature encoding and transmission problem that maximizes CP accuracy under a communication budget, and introduce JigsawComm, an end-to-end semantic-aware framework that learns to "assemble the puzzle" of multi-agent feature transmission. JigsawComm uses a regularized encoder to extract sparse, semantically relevant features, and a lightweight Feature Utility Estimator (FUE) to predict each agent's per-cell contribution to the downstream perception task. The FUE-generated compact meta utility maps are exchanged among agents and used to compute an optimal transmission policy under the learned utility proxy. This policy inherently eliminates cross-agent redundancy, bounding the feature transmission payload to O(1) as the number of agents grows, while the meta information overhead remains negligible. The whole pipeline is trained end-to-end through a differentiable scheduling module, informing the FUE to be aligned with the task objective. On the OPV2V and DAIR-V2X benchmarks, JigsawComm reduces total data volume by over 20–500× while matching or exceeding the accuracy of state-of-the-art methods.
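The redundancy-elimination idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each agent's utility map is a 2D grid of scores and that the policy simply lets the highest-scoring agent transmit each cell. The names `select_transmitters` and `payload_cells`, and the `threshold` parameter, are hypothetical and introduced only for illustration.

```python
from typing import List

Grid = List[List[float]]


def select_transmitters(utility_maps: List[Grid], threshold: float = 0.0) -> List[List[int]]:
    """Assign each grid cell to at most one agent (hypothetical policy).

    utility_maps[a][r][c] is agent a's predicted utility for cell (r, c).
    A cell is assigned to the agent with the highest utility, provided that
    utility is strictly above `threshold`; otherwise it is marked -1 and
    nobody transmits it.
    """
    rows = len(utility_maps[0])
    cols = len(utility_maps[0][0])
    owner = [[-1] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_agent, best_utility = -1, threshold
            for agent, umap in enumerate(utility_maps):
                if umap[r][c] > best_utility:
                    best_agent, best_utility = agent, umap[r][c]
            owner[r][c] = best_agent
    return owner


def payload_cells(owner: List[List[int]]) -> int:
    """Count the cells that are transmitted by some agent."""
    return sum(1 for row in owner for agent in row if agent >= 0)


# Two agents observing a 2x2 grid (made-up utility values).
maps = [
    [[0.9, 0.0], [0.2, 0.0]],  # agent 0
    [[0.1, 0.0], [0.7, 0.3]],  # agent 1
]
owner = select_transmitters(maps)
```

Because every cell is sent by at most one agent, the number of transmitted cells in this sketch can never exceed the grid size, however many agents join. That is the sense in which the payload stays bounded as the number of agents grows.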

2511.14099 2026-03-16 cs.CV cs.AI

FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration

Jingren Liu, Shuning Xu, Qirui Yang, Yun Wang, Xiangyu Chen, Zhong Ji

2511.12708 2026-03-16 cs.CV

FSDAM: Few-Shot Driving Attention Modeling via Vision-Language Coupling

Kaiser Hamid, Can Cui, Khandakar Ashrafi Akbar, Ziran Wang, Nade Liang

2511.02244 2026-03-16 cs.LG

Neural network initialization with nonlinear characteristics and information on hierarchical features

Hikaru Homma, Jun Ohkubo

Comments 8 pages, 8 figures

2511.00511 2026-03-16 cs.CV

ID-Crafter: VLM-Grounded Online RL for Compositional Multi-Subject Video Generation

Panwang Pan, Jingjing Zhao, Yuchen Lin, Chenguo Lin, Chenxin Li, Hengyu Liu, Tingting Shen, Yadong MU

Comments Project page: https://angericky.github.io/ID-Crafter, Code: https://github.com/paulpanwang/IDCrafter

2510.27475 2026-03-16 cs.CV cs.MM

Referee: Reference-aware Audiovisual Deepfake Detection

Hyemin Boo, Eunsang Lee, Jiyoung Lee

Comments In Progress

2510.27316 2026-03-16 cs.CV

Parameterized Prompt for Incremental Object Detection

Zijia An, Boyu Diao, Ruiqi Liu, Libo Huang, Chuanguang Yang, Fei Wang, Zhulin An, Yongjun Xu

2510.18632 2026-03-16 cs.CV cs.AI

Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

Zhangquan Chen, Manyuan Zhang, Xinlei Yu, Xufang Luo, Mingze Sun, Zihao Pan, Xiang An, Yan Feng, Peng Pei, Xunliang Cai, Ruqi Huang

Comments 25 pages, 17 figures

2510.15346 2026-03-16 cs.CL cs.AI

When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

Heecheol Yun, Kwangmin Ki, Junghyun Lee, Eunho Yang

Comments ICLR 2026

2510.12225 2026-03-16 cs.CV cs.LG

HoneyBee: Data Recipes for Vision-Language Reasoners

Hritik Bansal, Devendra Singh Sachan, Kai-Wei Chang, Aditya Grover, Gargi Ghosh, Wen-tau Yih, Ramakanth Pasunuru

Comments 32 pages. Accepted to CVPR 2026 in Denver, Colorado, USA

2510.03366 2026-03-16 cs.LG cs.AI

Disentangling Recall and Reasoning in Transformer Models through Layer-wise Attention and Activation Analysis

Harshwardhan Fartale, Ashish Kattamuri, Rahul Raja, Arpita Vats, Ishita Prasad, Akshata Kishore Moharir

2509.25084 2026-03-16 cs.CL cs.AI cs.IR cs.LG

Scaling Generalist Data-Analytic Agents

Shuofei Qiao, Yanqiu Zhao, Zhisong Qiu, Xiaobin Wang, Jintian Zhang, Zhao Bin, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen

Comments ICLR 2026

Details

English abstract

Data-analytic agents are emerging as a key catalyst for automated scientific discovery and for the vision of Innovating AI. Current approaches, however, rely heavily on prompt engineering over proprietary models, while open-source models struggle to handle the diverse-format, large-scale data files and long-horizon, multi-step reasoning that real-world analytics demands. This paper introduces DataMind, a scalable data synthesis and agent training recipe designed to build generalist data-analytic agents. DataMind tackles three key challenges in building open-source data-analytic agents: insufficient data resources, improper training strategy, and unstable code-based multi-turn rollout. Concretely, DataMind applies 1) a fine-grained task taxonomy and a recursive easy-to-hard task composition mechanism to increase the diversity and difficulty of synthesized queries; 2) a knowledge-augmented trajectory sampling strategy followed by model-based and rule-based filtering; 3) a dynamically adjustable training objective combining both SFT and RL losses; 4) a memory-frugal and stable code-based multi-turn rollout framework. Built on DataMind, we curate DataMind-12K, a high-quality trajectory set spanning diverse domains, task categories, and data file formats for data-analytic tasks. Trained on DataMind-12K, our DataMind-14B achieves state-of-the-art performance with an average score of 71.16% on multiple data analysis benchmarks, outperforming the strongest proprietary baselines DeepSeek-V3.1 and GPT-5. Our DataMind-7B also performs best among all open-source models with a score of 68.10%. We also incorporate some empirical insights gained from our exploratory trials into the analysis experiments, aiming to provide actionable insights about agentic training for the community. We will release DataMind-12K, DataMind-7B, and DataMind-14B for the community's future research.

2509.24980 2026-03-16 cs.CV

SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation

Shuang Liang, Jing He, Chuanmeizhi Wang, Lejun Liao, Guo Zhang, Yingcong Chen, Yuan Yuan

Comments 22 pages, 10 figures, 8 tables

2509.24868 2026-03-16 cs.LG physics.comp-ph

DRIFT-Net: A Spectral-Coupled Neural Operator for PDEs Learning

Jiayi Li, Flora D. Salim

Comments Accepted at ICLR 2026

2509.24506 2026-03-16 cs.CL

Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings

Hamna Hamna, Gayatri Bhat, Sourabrata Mukherjee, Faisal Lalani, Evan Hadfield, Divya Siddarth, Kalika Bali, Sunayana Sitaram

Comments Accepted at ACM CHI 2026

2509.23863 2026-03-16 cs.CL

SPELL: Self-Play Reinforcement Learning for Evolving Long-Context Language Models

Ziyi Yang, Weizhou Shen, Chenliang Li, Ruijun Chen, Fanqi Wan, Ming Yan, Xiaojun Quan, Fei Huang

Comments Accepted to ICLR 2026

2509.23325 2026-03-16 cs.LG cs.AI cs.CV

Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling

Jonas Ngnawé, Maxime Heuillet, Sabyasachi Sahoo, Yann Pequignot, Ola Ahmad, Audrey Durand, Frédéric Precioso, Christian Gagné

Comments 10 pages, 7 figures, 4 tables

2509.23313 2026-03-16 cs.LG

ASTGI: Adaptive Spatio-Temporal Graph Interactions for Irregular Multivariate Time Series Forecasting

Xvyuan Liu, Xiangfei Qiu, Hanyin Cheng, Xingjian Wu, Chenjuan Guo, Bin Yang, Jilin Hu

2509.21619 2026-03-16 cs.LG cs.PF

PreLoRA: Hybrid Pre-training of Vision Transformers with Full Training and Low-Rank Adapters

Krishu K Thapa, Reet Barik, Krishna Teja Chitty-Venkata, Murali Emani, Venkatram Vishwanath

Comments 13 pages, 8 figures, 2 algorithms, workshop paper

2509.21553 2026-03-16 cs.AI cs.CE cs.HC cs.LG cs.MA

AutoClimDS: Climate Data Science Agentic AI -- A Knowledge Graph is All You Need

Ahmed Jaber, Wangshu Zhu, Ayon Roy, Karthick Jayavelu, Justin Downes, Sameer Mohamed, Candace Agonafir, Linnia Hawkins, Tian Zheng

Comments Accepted to IEEE CAI 2026

2509.20276 2026-03-16 cs.LG cond-mat.mtrl-sci

Extended Low-Rank Approximation Accelerates Learning of Elastic Response in Heterogeneous Materials

Prabhat Karmakar, Sayan Gupta, Ilaksh Adlakha

Comments During a recent internal review of this work, we identified inconsistencies in the implementation of certain aspects of the methodology and would like to re-examine them and verify the analysis, as these issues could influence the reported results. Therefore, we request withdrawal of the manuscript

2509.17704 2026-03-16 cs.CV

Neurodynamics-Driven Coupled Neural P Systems for Multi-Focus Image Fusion

Bo Li, Yunkuo Lei, Tingting Bao, Hang Yan, Yaxian Wang, Weiping Fu, Lingling Zhang, Jun Liu

Comments Accepted by CVPR2026

2509.16447 2026-03-16 cs.LG

Local Mechanisms of Compositional Generalization in Conditional Diffusion

Arwen Bradley

Comments 10 pages, 5 figures

2509.15342 2026-03-16 cs.CV

LowDiff: Efficient Diffusion Sampling with Low-Resolution Condition

Jiuyi Xu, Qing Jin, Meida Chen, Andrew Feng, Yang Sui, Yangming Shi

Comments 16 pages, 7 figures, 12 tables

2509.08372 2026-03-16 cs.LG

Rethinking the Backbone in Class Imbalanced Federated Source Free Domain Adaptation: The Utility of Vision Foundation Models

Kosuke Kihara, Junki Mori, Taiki Miyagawa, Akinori F. Ebihara

Comments Accepted by the IEEE ICIP 2025 Satellite Workshop 1: Edge Intelligence: Smart, Efficient, and Scalable Solutions for IoT, Wearables, and Embedded Devices (SEEDS)

2509.04650 2026-03-16 cs.CL cs.AI

Comparative Analysis of Transformer Models in Disaster Tweet Classification for Public Safety

Sharif Noor Zisad, N. M. Istiak Chowdhury, Ragib Hasan

2508.21742 2026-03-16 cs.AI stat.ME

Orientability of Causal Relations in Time Series using Summary Causal Graphs and Faithful Distributions

Timothée Loranchet, Charles K. Assaad

Comments Accepted to AISTATS 2026

2508.14327 2026-03-16 cs.CV

MoVieDrive: Urban Scene Synthesis with Multi-Modal Multi-View Video Diffusion Transformer

Guile Wu, David Huang, Dongfeng Bai, Bingbing Liu

Comments CVPR 2026 Findings Track

2508.12932 2026-03-16 cs.CV cs.AI

SEDEG: Sequential Enhancement of Decoder and Encoder's Generality for Class Incremental Learning with Small Memory

Hongyang Chen, Shaoling Pu, Lingyu Zheng, Zhongwu Sun

Comments Accepted by ICONIP2025

Details

DOI: 10.1007/978-981-95-4091-4_25

English abstract

In incremental learning, enhancing the generality of knowledge is crucial for adapting to dynamic data inputs. It can develop generalized representations or more balanced decision boundaries, preventing the degradation of long-term knowledge over time and thus mitigating catastrophic forgetting. Some emerging incremental learning methods adopt an encoder-decoder architecture and have achieved promising results. In the encoder-decoder architecture, improving the generalization capabilities of both the encoder and decoder is critical, as it helps preserve previously learned knowledge while ensuring adaptability and robustness to new, diverse data inputs. However, many existing continual methods focus solely on enhancing one of the two components, which limits their effectiveness in mitigating catastrophic forgetting. These methods perform even worse in small-memory scenarios, where only a limited number of historical samples can be stored. To mitigate this limitation, we introduce SEDEG, a two-stage training framework for vision transformers (ViT), focusing on sequentially improving the generality of both Decoder and Encoder. Initially, SEDEG trains an ensembled encoder through feature boosting to learn generalized representations, which subsequently enhance the decoder's generality and balance the classifier. The next stage involves using knowledge distillation (KD) strategies to compress the ensembled encoder and develop a new, more generalized encoder. This involves using a balanced KD approach and feature KD for effective knowledge transfer. Extensive experiments on three benchmark datasets show SEDEG's superior performance, and ablation studies confirm the efficacy of its components. The code is available at https://github.com/ShaolingPu/CIL.

2508.10954 2026-03-16 cs.LG cs.AI

UniPrompt-CL: Sustainable Continual Learning in Medical AI with Unified Prompt Pools

Gyutae Oh, Jitae Shin

Comments 25 pages, 4 figures