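arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.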

2511.21606 2026-03-31 cs.CV

ReSAM: Refine, Requery, and Reinforce: Self-Prompting Point-Supervised Segmentation for Remote Sensing Images

M. Naseer Subhani

Abstract

Interactive segmentation models such as the Segment Anything Model (SAM) have demonstrated remarkable generalization on natural images, but they perform suboptimally on remote sensing imagery (RSI) due to severe domain shifts and the scarcity of dense annotations. To address this limitation, we propose a point-supervised, self-prompting framework that adapts SAM to RSI using only sparse point annotations. Our method employs a Refine-Requery-Reinforce loop: coarse pseudo-masks are generated from initial points (Refine), improved with self-constructed box prompts (Requery), and embeddings are aligned with Soft Semantic Alignment (SSA) to mitigate error propagation (Reinforce). Without relying on full-mask supervision, our approach progressively enhances SAM's segmentation quality and domain robustness through self-guided prompt adaptation. We evaluate the proposed method on three RSI benchmark datasets, WHU, HRSID, and NWPU VHR-10, and show that it consistently outperforms pretrained SAM and recent point-supervised segmentation methods. Relying only on 1-point annotations, our approach narrows the performance gap to the fully supervised model to 1.3% (WHU), 4.9% (HRSID), and 8.5% (NWPU). These results demonstrate that self-prompting and semantic alignment provide an efficient path toward scalable, point-level adaptation of foundation segmentation models for remote sensing applications.
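
The Requery step, as summarized above, turns a coarse pseudo-mask into a box prompt. The abstract does not say how the box is built, so the following is a minimal sketch under one assumption: the box is the tight bounding box of the mask's foreground pixels, optionally padded. The function name `mask_to_box` and the `pad` parameter are hypothetical, not taken from the paper.

```python
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def mask_to_box(mask: List[List[int]], pad: int = 0) -> Optional[Box]:
    """Hypothetical helper: bounding box of nonzero pixels in a binary mask.

    The box is expanded by `pad` pixels on every side and clipped to the
    image bounds. Returns None when the mask has no foreground pixels.
    """
    points = [(x, y) for y, row in enumerate(mask) for x, value in enumerate(row) if value]
    if not points:
        return None  # empty pseudo-mask: no box prompt can be derived
    height, width = len(mask), len(mask[0])
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (
        max(min(xs) - pad, 0),
        max(min(ys) - pad, 0),
        min(max(xs) + pad, width - 1),
        min(max(ys) + pad, height - 1),
    )
```

A small positive `pad` is one plausible way to tolerate a pseudo-mask that slightly under-covers the object; whether ReSAM does anything like this is not stated in the abstract.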

2511.21437 2026-03-31 cs.CL cs.LG

A Systematic Study of In-the-Wild Model Merging for Large Language Models

Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata

Journal ref: Transactions on Machine Learning Research (03/2026)

Abstract

Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive way to reuse models and efficiently improve performance. However, it remains unclear whether the advantages reported when all merged experts have distinct roles and are tuned on clearly separated tasks also hold when the experts lack clearly distinct roles and are instead trained on overlapping or even conflicting objectives. To evaluate this setting, we present a large-scale, systematic evaluation of "in-the-wild" model merging of heterogeneous experts that may have been trained on overlapping or conflicting objectives. Concretely, we evaluate six state-of-the-art merging methods, including recent subspace methods, across four open-weight LLMs, twelve fine-tuned checkpoints per base model, and sixteen standard LLM benchmarks. Using these standardized benchmarks, we measure both the probability that a model merged from a heterogeneous set of experts outperforms the base model and the relative gain over the best individual checkpoint. Our results show that the oldest and simplest method, Task Arithmetic, is the only approach that reliably yields performance gains on LLMs in this "in-the-wild" setting. Other interference-aware and subspace merging methods typically do not yield notable improvements over the base model. Our findings indicate that current merging techniques mostly cannot extract useful weight updates from heterogeneous and potentially conflicting versions. This motivates the design of LLM-specific merging algorithms and merging-aware fine-tuning methods.
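
For reference, Task Arithmetic merges checkpoints by adding a scaled sum of task vectors (each expert's fine-tuned weights minus the base weights) to the base model. The sketch below works over plain Python dictionaries and assumes a single scaling coefficient `scale` shared by all experts; real LLM merging operates on framework tensors, and the coefficients used in the study are not given in the abstract. The function name is illustrative.

```python
from typing import Dict, List

# Parameter name -> flattened parameter values.
Weights = Dict[str, List[float]]


def task_arithmetic_merge(base: Weights, experts: List[Weights], scale: float = 1.0) -> Weights:
    """Return base + scale * sum_i (expert_i - base), computed per parameter entry."""
    merged: Weights = {}
    for name, base_values in base.items():
        merged_values = []
        for i, b in enumerate(base_values):
            # Sum of the task-vector entries contributed by every expert.
            delta = sum(expert[name][i] - b for expert in experts)
            merged_values.append(b + scale * delta)
        merged[name] = merged_values
    return merged
```

For example, merging experts `{"w": [2.0, 2.0]}` and `{"w": [1.0, 4.0]}` into base `{"w": [1.0, 2.0]}` with `scale=0.5` gives `{"w": [1.5, 3.0]}`. Because conflicting task vectors are simply summed, nothing in this rule resolves interference, which is the gap the interference-aware methods mentioned above try to address.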

2511.18600 2026-03-31 cs.CV

NeAR: Coupled Neural Asset-Renderer Stack

Hong Li, Chongjie Ye, Houyuan Chen, Weiqing Xiao, Ziyang Yan, Lixing Xiao, Zhaoxi Chen, Jianfeng Xiang, Shaocong Xu, Xuhui Liu, Yikai Wang, Baochang Zhang, Xiaoguang Han, Jiaolong Yang, Hao Zhao

Comments: Accepted by CVPR 2026. Project page: https://near-project.github.io/

2511.18471 2026-03-31 cs.CV

Jacobian-aware Posterior Sampling for Inverse Problems

Liav Hen, Tom Tirer, Raja Giryes, Shady Abu-Hussein

2511.16719 2026-03-31 cs.CV cs.AI

SAM 3: Segment Anything with Concepts

Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane Momeni, Rishi Hazra, Shuangrui Ding, Sagar Vaze, Francois Porcher, Feng Li, Siyuan Li, Aishwarya Kamath, Ho Kei Cheng, Piotr Dollár, Nikhila Ravi, Kate Saenko, Pengchuan Zhang, Christoph Feichtenhofer

2511.16417 2026-03-31 cs.AI

Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report

Yan Chen, Yu Zou, Jialei Zeng, Haoran You, Xiaorui Zhou, Aixi Zhong

2511.14139 2026-03-31 cs.RO

FlexiCup: Wireless Multimodal Suction Cup with Dual-Zone Vision-Tactile Sensing

Junhao Gong, Shoujie Li, Kit-Wa Sou, Changqing Guo, Hourong Huang, Tong Wu, Yifan Xie, Chenxin Liang, Chuqiao Lyu, Xiaojun Liang, Wenbo Ding

Comments: Accepted by IEEE Robotics and Automation Letters (RA-L)

2511.14073 2026-03-31 cs.CL

Based on Data Balancing and Model Improvement for Multi-Label Sentiment Classification Performance Enhancement

Zijin Su, Huanzhu Lyu, Yuren Niu, Yiming Liu

Comments: 9 pages; updated methodology and evaluation; added an audit summary and label-cardinality and per-label count analyses; clarified splits and threshold tuning; added a DistilRoBERTa baseline comparison. Updated figures, tables, references, and data-availability statement.

2511.14043 2026-03-31 cs.AI cs.CL cs.MA

AISAC: An Integrated multi-agent System for Transparent, Retrieval-Grounded Scientific Assistance

Chandrachur Bhattacharya, Sibendu Som

2511.13719 2026-03-31 cs.CV cs.AI cs.LG cs.MM cs.RO

Scaling Spatial Intelligence with Multimodal Foundation Models

Zhongang Cai, Ruisi Wang, Chenyang Gu, Fanyi Pu, Junxiang Xu, Yubo Wang, Wanqi Yin, Zhitao Yang, Chen Wei, Qingping Sun, Tongxi Zhou, Jiaqi Li, Hui En Pang, Oscar Qian, Yukun Wei, Zhiqian Lin, Xuanke Shi, Kewang Deng, Xiaoyang Han, Zukai Chen, Xiangyu Fan, Hanming Deng, Lewei Lu, Liang Pan, Bo Li, Ziwei Liu, Quan Wang, Dahua Lin, Lei Yang

Comments: Codebase: https://github.com/OpenSenseNova/SenseNova-SI; Models: https://huggingface.co/collections/sensenova/sensenova-si. This report is based on the v1.1 version of SenseNova-SI. Accepted to CVPR 2026.

2511.11483 2026-03-31 cs.CV cs.AI

ImAgent: A Unified Multimodal Agent Framework for Test-Time Scalable Image Generation

Kaishen Wang, Ruibo Chen, Tong Zheng, Heng Huang

Comments: 8 tables, 8 figures

2511.10696 2026-03-31 cs.CL cs.AI

$π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling

Dong Liu, Yanxuan Yu

2511.10465 2026-03-31 cs.CL cs.AI

Beyond Elicitation: Provision-based Prompt Optimization for Knowledge-Intensive Tasks

Yunzhe Xu, Zhuosheng Zhang, Zhe Liu

Comments: Accepted by IEEE Transactions on Audio, Speech and Language Processing (TASLP)

2511.07738 2026-03-31 cs.LG cs.CV

From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training

Donglai Xu, Hongzheng Yang, Yuzhi Zhao, Pingping Zhang, Jinpeng Chen, Wenao Ma, Zhijian Hou, Mengyang Wu, Xiaolei Li, Senkang Hu, Ziyi Guan, Jason Chun Lok Li, Lai Man Po

2510.26794 2026-03-31 cs.CV

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

Jing Lin, Ruisi Wang, Junzhe Lu, Ziqi Huang, Guorui Song, Ailing Zeng, Xian Liu, Chen Wei, Wanqi Yin, Qingping Sun, Zhongang Cai, Lei Yang, Ziwei Liu

Comments: Homepage: https://motrixlab.github.io/2026_iclr_vimogen

2510.25327 2026-03-31 cs.CV cs.AI cs.LG

MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding

Runxi Huang, Mingxuan Yu, Mingyu Tsoi, Xiaomin Ouyang

Comments: Code available at https://github.com/HKUST-MINSys-Lab/MMEdge. Accepted by SenSys 2026.

DOI: 10.1145/3774906.3800485

Abstract

Real-time multimodal inference on resource-constrained edge devices is essential for applications such as autonomous driving, human-computer interaction, and mobile health. However, prior work often overlooks the tight coupling between sensing dynamics and model execution, as well as the complex inter-modality dependencies. In this paper, we propose MMEdge, a new on-device multimodal inference framework based on pipelined sensing and encoding. Instead of waiting for complete sensor inputs, MMEdge decomposes the entire inference process into a sequence of fine-grained sensing and encoding units, allowing computation to proceed incrementally as data arrive. MMEdge also introduces a lightweight but effective temporal aggregation module that captures rich temporal dynamics across the pipelined units to preserve accuracy. This pipelined design also opens up opportunities for fine-grained cross-modal optimization and early decision-making during inference. To further enhance system performance under resource variability and varying input complexity, MMEdge incorporates an adaptive multimodal configuration optimizer that dynamically selects optimal sensing and model configurations for each modality under latency constraints, and a cross-modal speculative skipping mechanism that bypasses future units of slower modalities once early predictions reach sufficient confidence. We evaluate MMEdge on two public multimodal datasets and deploy it on a real-world unmanned aerial vehicle (UAV)-based multimodal testbed. The results show that MMEdge significantly reduces end-to-end latency while maintaining high task accuracy across various system and data dynamics. A video demonstration of MMEdge's performance in the real world is available at https://youtu.be/qRew7sT-iWw.
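
The speculative skipping idea described above, stopping once an early prediction is confident enough, can be illustrated with a simplified early-exit loop. This is a minimal sketch, not MMEdge's mechanism: it assumes a single stream of per-unit class logits rather than multiple modalities, aggregates units by summing logits (MMEdge uses a learned temporal aggregation module), and uses a hypothetical fixed confidence `threshold`. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def speculative_skip(unit_logits: Sequence[Sequence[float]], threshold: float) -> Tuple[int, int]:
    """Consume pipelined units in order and stop early once confident.

    Returns (predicted_class, units_processed). Units after the stopping
    point are never processed, which is where the latency saving comes from.
    """
    if not unit_logits:
        raise ValueError("need at least one unit")
    accumulated: List[float] = []
    best, processed = 0, 0
    for logits in unit_logits:
        # Assumed aggregation: running sum of per-unit logits.
        if processed == 0:
            accumulated = list(logits)
        else:
            accumulated = [a + b for a, b in zip(accumulated, logits)]
        processed += 1
        probs = softmax(accumulated)
        best = max(range(len(probs)), key=lambda k: probs[k])
        if probs[best] >= threshold:
            break  # confident early prediction: skip the remaining units
    return best, processed
```

With units `[[0.0, 0.0], [3.0, 0.0], [5.0, 0.0]]` and `threshold=0.9`, the loop stops after the second unit and returns `(0, 2)`. A higher threshold trades latency for a lower risk of committing to a wrong early prediction.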

2510.25311 2026-03-31 cs.LG cs.AI

Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning

Sagalpreet Singh, Rishi Saket, Aravindan Raghuveer

Comments: 27 pages, 6 figures

Abstract

Reinforcement learning algorithms primarily focus on learning a policy that maximizes expected return. As a result, the learned policy can exploit one or a few reward sources. However, in many natural situations, it is desirable to learn a policy that induces a dispersed marginal state distribution over rewarding states while maximizing the expected return, which is typically tied to reaching a goal state. This aspect remains relatively unexplored. Existing techniques based on entropy regularization and intrinsic rewards use stochasticity to encourage exploration toward an optimal policy, which does not necessarily lead to a dispersed marginal state distribution over rewarding states. Other RL algorithms that match a target distribution assume that distribution is available a priori. This may be infeasible in large-scale systems where enumerating all states is not possible and a state is identified as a goal state only upon reaching it. We formalize the problem of maximizing the expected return while uniformly visiting the goal states as Multi Goal RL, in which an oracle classifier over the state space determines the goal states. We propose a novel algorithm that learns a high-return policy mixture whose marginal state distribution is dispersed over the set of goal states. Our algorithm optimizes a custom RL reward that is computed at each iteration, based on the current policy mixture, for a set of sampled trajectories. These trajectories are then used by an offline RL algorithm to update the policy mixture. We prove performance guarantees for our algorithm, showing efficient convergence bounds for optimizing a natural objective that captures both the expected return and the dispersion of the marginal state distribution over the goal states. We evaluate the effectiveness of our algorithm through experiments on synthetic MDPs and standard RL environments.
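
The abstract states that the custom reward is recomputed at each iteration from trajectories sampled under the current policy mixture, but it does not give the reward's form. The sketch below shows one hypothetical choice, an inverse visit-count reward that pays more for goal states the current mixture reaches rarely, so that a policy update pushes probability mass toward under-visited goals. The oracle classifier is modeled as a boolean function `is_goal`; all names here are illustrative and not the paper's.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

State = Hashable


def goal_visit_counts(
    trajectories: Sequence[Sequence[State]], is_goal: Callable[[State], bool]
) -> Counter:
    """Count visits to goal states (as judged by the oracle) across sampled trajectories."""
    counts: Counter = Counter()
    for trajectory in trajectories:
        for state in trajectory:
            if is_goal(state):
                counts[state] += 1
    return counts


def dispersion_rewards(
    trajectories: Sequence[Sequence[State]], is_goal: Callable[[State], bool]
) -> List[List[float]]:
    """Per-step rewards: a goal state earns 1 / (its visit count), a non-goal state earns 0.

    Rarely visited goals receive larger rewards, favoring a dispersed
    distribution of visits over the goal set.
    """
    counts = goal_visit_counts(trajectories, is_goal)
    return [[1.0 / counts[s] if is_goal(s) else 0.0 for s in traj] for traj in trajectories]
```

For trajectories `[["a", "g1"], ["b", "g1"], ["c", "g2"]]`, goal `g1` is visited twice and `g2` once, so the rewards are `[[0.0, 0.5], [0.0, 0.5], [0.0, 1.0]]`. In the loop the abstract describes, these relabeled trajectories would then be handed to an offline RL algorithm to update the policy mixture.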

2510.21045 2026-03-31 cs.AI cs.DB cs.IR

From Questions to Queries: An AI-powered Multi-Agent Framework for Spatial Text-to-SQL

Ali Khosravi Kazazi, Zhenlong Li, M. Naser Lessani, Guido Cervone

2510.18342 2026-03-31 cs.AI

ShortcutBreaker: Low-Rank Noisy Bottleneck and Frequency Filtering Block for Multi-Class Unsupervised Anomaly Detection

Peng Tang, Xiaobin Hu, Tingcheng Li, Yang Nan, Tobias Lasser, Hongwei Bran Li

Comments: Under review

2510.17211 2026-03-31 cs.AI cs.LG

Temporally Detailed Hypergraph Neural ODEs for Disease Progression Modeling

Tingsong Xiao, Yao An Lee, Zelin Xu, Yupu Zhang, Zibo Liu, Yu Huang, Jiang Bian, Jingchuan Guo, Zhe Jiang

Comments: Accepted at ICLR 2026

2510.16974 2026-03-31 cs.LG stat.ML

Differentially Private Linear Regression and Synthetic Data Generation with Statistical Guarantees

Shurong Lin, Aleksandra Slavković, Deekshith Reddy Bhoomireddy

2510.14376 2026-03-31 cs.CV

DOS: Directional Object Separation in Text Embeddings for Multi-Object Image Generation

Dongnam Byun, Jungwon Park, Jungmin Ko, Changin Choi, Wonjong Rhee

Comments: Accepted to AAAI 2026

2510.13905 2026-03-31 cs.CL cs.AI

Schema for In-Context Learning

Pan Chen, Shaohong Chen, Mark Wang, Shi Xuan Leong, Priscilla Fung, Varinia Bernales, Alan Aspuru-Guzik

2510.12901 2026-03-31 cs.CV cs.GR cs.LG cs.RO

SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms

Haithem Turki, Qi Wu, Xin Kang, Janick Martinez Esturo, Shengyu Huang, Ruilong Li, Zan Gojcic, Riccardo de Lutio

Comments: ICLR 2026. Project page: https://research.nvidia.com/labs/sil/projects/simuli

2510.11417 2026-03-31 cs.CV

Robust Ego-Exo Correspondence with Long-Term Memory

Yijun Hu, Bing Fan, Xin Gu, Haiqing Ren, Dongfang Liu, Heng Fan, Libo Zhang

Comments: Accepted by NeurIPS 2025

2510.08847 2026-03-31 cs.AI cs.MA

What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment

Allison Sihan Jia, Daniel Huang, Nikhil Vytla, Seung Won Wilson Yoo, Nirvika Choudhury, Shayak Sen, John C. Mitchell, Anupam Datta

2510.08284 2026-03-31 cs.CL

Neuron-Level Analysis of Cultural Understanding in Large Language Models

Taisei Yamamoto, Ryoma Kumon, Danushka Bollegala, Hitomi Yanaka

Comments: Accepted to ICLR 2026

2510.06162 2026-03-31 cs.LG

TabPFN-Wide: Continued Pre-Training for Extreme Feature Counts

Christopher Kolberg, Jules Kreuer, Jonas Huurdeman, Sofiane Ouaari, Katharina Eggensperger, Nico Pfeifer

2510.05825 2026-03-31 cs.LG cs.AI cs.CL stat.ML

Mitigating Premature Exploitation in Particle-based Monte Carlo for Inference-Time Scaling

Giorgio Giannone, Guangxuan Xu, Nikhil Shivakumar Nayak, Rohan Mahesh Awhad, Shivchander Sudalairaj, Kai Xu, Akash Srivastava

Comments: Preprint

2510.04618 2026-03-31 cs.LG cs.AI cs.CL

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, Kunle Olukotun

Comments: ICLR 2026; 32 pages