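arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.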

2602.21534 2026-03-10 cs.AI

ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

Xiaoxuan Wang, Han Zhang, Haixin Wang, Yidan Shi, Ruoyan Li, Kaiqiao Han, Chenyi Tong, Haoran Deng, Renliang Sun, Alexander Taylor, Yanqiao Zhu, Jason Cong, Yizhou Sun, Wei Wang

Details

English abstract

Agentic reinforcement learning (ARL) has rapidly gained attention as a promising paradigm for training agents to solve complex, multi-step interactive tasks. Despite encouraging early results, ARL remains highly unstable, often leading to training collapse. This instability limits scalability to larger environments and longer interaction horizons, and constrains systematic exploration of algorithmic design choices. In this paper, we propose ARLArena, a stable training recipe and systematic analysis framework that examines training stability in a controlled and reproducible setting. ARLArena first constructs a clean and standardized testbed. Then, we decompose policy gradient into four core design dimensions and assess the performance and stability of each dimension. Through this fine-grained analysis, we distill a unified perspective on ARL and propose SAMPO, a stable agentic policy optimization method designed to mitigate the dominant sources of instability in ARL. Empirically, SAMPO achieves consistently stable training and strong performance across diverse agentic tasks. Overall, this study provides a unifying policy gradient perspective for ARL and offers practical guidance for building stable and reproducible LLM-based agent training pipelines.

2602.20989 2026-03-10 cs.CV

Cycle-Consistent Tuning for Layered Image Decomposition

Zheng Gu, Min Lu, Zhida Sun, Dani Lischinski, Daniel Cohen-Or, Hui Huang

Comments Accepted to CVPR 2026. Project page: https://vcc.tech/research/2026/ImgDecom

2602.20627 2026-03-10 cs.CV cs.RO

Object-Scene-Camera Decomposition and Recomposition for Data-Efficient Monocular 3D Object Detection

Zhaonian Kuang, Rui Ding, Meng Yang, Xinhu Zheng, Gang Hua

Comments IJCV

Details

DOI: 10.1007/s11263-026-02755-w
Journal ref: Int J Comput Vis 134, 155 (2026)

English abstract

Monocular 3D object detection (M3OD) is intrinsically ill-posed, hence training a high-performance deep learning based M3OD model requires a humongous amount of labeled data with complicated visual variation from diverse scenes, a variety of objects, and camera poses. However, we observe that, due to strong human bias, the three independent entities, i.e., object, scene, and camera pose, are always tightly entangled when an image is captured to construct training data. More specifically, specific 3D objects are always captured in particular scenes with fixed camera poses, and hence lack the necessary diversity. Such tight entanglement induces the challenging issues of insufficient utilization and overfitting to uniform training data. To mitigate this, we propose an online object-scene-camera decomposition and recomposition data manipulation scheme to more efficiently exploit the training data. We first fully decompose training images into textured 3D object point models and background scenes in a computation- and storage-efficient manner. We then continuously recompose new training images in each epoch by inserting the 3D objects into the free space of the background scenes, and rendering them with perturbed camera poses from the textured 3D point representation. In this way, the refreshed training data in all epochs can cover the full spectrum of independent object, scene, and camera pose combinations. This scheme can serve as a plug-and-play component to boost M3OD models, working flexibly with both fully and sparsely supervised settings. In the sparsely supervised setting, objects closest to the ego-camera for all instances are sparsely annotated. We can then flexibly increase the annotated objects to control annotation cost. For validation, our method is widely applied to five representative M3OD models and evaluated on both the KITTI and the more complicated Waymo datasets.

2602.19736 2026-03-10 cs.CV

InfScene-SR: Arbitrary-Size Image Super-Resolution via Iterative Joint-Denoising

Shoukun Sun, Zhe Wang, Xiang Que, Jiyin Zhang, Xiaogang Ma

2602.19223 2026-03-10 cs.AI cs.LG cs.MA

Characterizing MARL for Energy Control: A Multi-KPI Benchmark on the CityLearn Environment

Aymen Khouja, Imen Jendoubi, Oumayma Mahjoub, Oussama Mahfoudhi, Ruan De Kock, Siddarth Singh, Claude Formanek

2602.19112 2026-03-10 cs.CV

Universal 3D Shape Matching via Coarse-to-Fine Language Guidance

Qinfeng Xiao, Guofeng Mei, Bo Yang, Liying Zhang, Jian Zhang, Kit-lun Yick

Comments Accepted by CVPR 2026

2602.18853 2026-03-10 cs.CV

Open-Vocabulary Domain Generalization in Urban-Scene Segmentation

Dong Zhao, Qi Zang, Nan Pu, Wenjing Li, Nicu Sebe, Zhun Zhong

2602.18843 2026-03-10 cs.AI cs.SC

ABD: Default Exception Abduction in Finite First Order Worlds

Serafim Batzoglou

2602.18606 2026-03-10 cs.RO cs.CV

OVerSeeC: Open-Vocabulary Costmap Generation from Satellite Images and Natural Language

Rwik Rana, Jesse Quattrociocchi, Dongmyeong Lee, Christian Ellis, Amanda Adkins, Adam Uccello, Garrett Warnell, Joydeep Biswas

Comments Website: https://amrl.cs.utexas.edu/overseec/

2602.18064 2026-03-10 cs.CV

3DMedAgent: Unified Perception-to-Understanding for 3D Medical Analysis

Ziyue Wang, Linghan Cai, Chang Han Low, Haofeng Liu, Junde Wu, Jingyu Wang, Rui Wang, Lei Song, Jiang Bian, Jingjing Fu, Yueming Jin

Comments 19 pages, 7 figures

2602.17601 2026-03-10 cs.RO

Graph Neural Model Predictive Control for High-Dimensional Systems

Patrick Benito Eberhard, Luis Pabon, Daniele Gammelli, Hugo Buurmeijer, Amon Lahr, Mark Leone, Andrea Carron, Marco Pavone

2602.13810 2026-03-10 cs.LG cs.AI

Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation

Guojian Zhan, Letian Tao, Pengcheng Wang, Yixiao Wang, Yiheng Li, Yuxin Chen, Hongyang Li, Masayoshi Tomizuka, Shengbo Eben Li

Comments ICLR Oral Presentation

2602.13102 2026-03-10 cs.CL

Towards interpretable models for language proficiency assessment: Predicting the CEFR level of Estonian learner texts

Kais Allkivi

2602.12575 2026-03-10 cs.CL cs.LG

Discovering Semantic Latent Structures in Psychological Scales: A Response-Free Pathway to Efficient Simplification

Bo Wang, Yuxuan Zhang, Yueqin Hu, Hanchao Hou, Kaiping Peng, Shiguang Ni

Comments 79 pages, 20 figures; parameter perturbation result of epoch-cn updated; minor revisions on grammars

Details

English abstract

Psychological scale refinement traditionally relies on response-based methods such as factor analysis, item response theory, and network psychometrics to optimize item composition. Although rigorous, these approaches require large samples and may be constrained by data availability and cross-cultural comparability. Recent advances in natural language processing suggest that the semantic structure of questionnaire items may encode latent construct organization, offering a complementary response-free perspective. We introduce a topic-modeling framework that operationalizes semantic latent structure for scale simplification. Items are encoded using contextual sentence embeddings and grouped via density-based clustering to discover latent semantic factors without predefining their number. Class-based term weighting derives interpretable topic representations that approximate constructs and enable merging of semantically adjacent clusters. Representative items are selected using membership criteria within an integrated reduction pipeline. We benchmarked the framework across DASS, IPIP, and EPOCH, evaluating structural recovery, internal consistency, factor congruence, correlation preservation, and reduction efficiency. The proposed method recovered coherent factor-like groupings aligned with established constructs. Selected items reduced scale length by 60.5% on average while maintaining psychometric adequacy. Simplified scales showed high concordance with original factor structures and preserved inter-factor correlations, indicating that semantic latent organization provides a response-free approximation of measurement structure. Our framework formalizes semantic structure as an inspectable front-end for scale construction and reduction. To facilitate adoption, we provide a visualization-supported tool enabling one-click semantic analysis and structured simplification.
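The class-based term weighting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give its weighting formula, so the code below assumes one common class-based TF-IDF variant: each cluster is treated as a single document, and a term's in-cluster count is scaled by log(1 + A / f_t), where A is the average number of words per cluster and f_t is the term's total count across clusters. The function names `class_tfidf` and `top_terms` and the toy items are hypothetical, not taken from the paper or its tool.

```python
import math
from collections import Counter


def class_tfidf(clusters):
    """Assumed class-based TF-IDF: clusters maps cluster id -> list of item texts.

    Returns a dict mapping cluster id -> {term: weight}.
    """
    if not clusters:
        raise ValueError("at least one cluster is required")
    # Treat every cluster as one concatenated document and count its terms.
    tf = {
        cid: Counter(word for text in texts for word in text.lower().split())
        for cid, texts in clusters.items()
    }
    # Total count of each term across all clusters.
    total_freq = Counter()
    for counts in tf.values():
        total_freq.update(counts)
    # Average number of words per cluster.
    avg_words = sum(total_freq.values()) / len(tf)
    return {
        cid: {
            term: count * math.log(1 + avg_words / total_freq[term])
            for term, count in counts.items()
        }
        for cid, counts in tf.items()
    }


def top_terms(weights, k=3):
    """Return the k highest-weighted terms per cluster (ties broken alphabetically)."""
    return {
        cid: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for cid, w in weights.items()
    }


# Hypothetical toy clusters of questionnaire-like items.
toy_clusters = {"a": ["worry worry fear"], "b": ["calm calm rest", "fear"]}
toy_weights = class_tfidf(toy_clusters)
toy_labels = top_terms(toy_weights, k=1)
```

In this toy input, "fear" appears in both clusters, so it receives a lower weight than terms that occur in only one cluster, which is the property that makes the top terms usable as rough topic labels.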

2602.11040 2026-03-10 cs.LG cs.CL

Learning Page Order in Shuffled WOO Releases

Efe Kahraman, Giulio Tosato

2602.10467 2026-03-10 cs.AI

MERIT Feedback Elicits Better Bargaining in LLM Negotiators

Jihwan Oh, Murad Aghazada, Yooju Shin, Se-Young Yun, Taehyeon Kim

Comments Preprint. Typo corrected, New results added

2602.09486 2026-03-10 cs.CL cs.AI

Listen to the Layers: Mitigating Hallucinations with Inter-Layer Disagreement

Koduvayur Subbalakshmi, Sabbir Hossain Ujjal, Venkata Krishna Teja Mangichetty, Nastaran Jamalipour Soofi

Comments Preprint, 26 pages, 15 tables, 15 figures

2602.08020 2026-03-10 cs.CV

PhysDrape: Learning Explicit Forces and Collision Constraints for Physically Realistic Garment Draping

Minghai Chen, Mingyuan Liu, Ning Ma, Jianqing Li, Yuxiang Huan

2602.07391 2026-03-10 cs.AI cs.MA

NAAMSE: Framework for Evolutionary Security Evaluation of Agents

Kunal Pai, Parth Shah, Harshil Patel

Comments Published at ICLR 2026 Workshop on Agents in the Wild

2602.00329 2026-03-10 cs.LG cs.AI

In-Run Data Shapley for Adam Optimizer

Meng Ding, Zeqing Zhang, Di Wang, Lijie Hu

Comments 16 pages

2601.20185 2026-03-10 cs.CL cs.SD

Improving X-Codec-2.0 for Multi-Lingual Speech: 25 Hz Latent Rate and 24 kHz Sampling

Husein Zolkepli

2601.19961 2026-03-10 cs.LG cs.AI cs.CV

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

Huanlin Gao, Ping Chen, Fuyuan Shi, Ruijia Wu, Li YanTao, Qiang Hui, Yuren You, Ting Lu, Chao Tan, Shaoan Zhao, Zhaoxiang Liu, Fang Zhao, Kai Wang, Shiguo Lian

2601.17842 2026-03-10 cs.CL

EFT-CoT: A Multi-Agent Chain-of-Thought Framework for Emotion-Focused Therapy

Lanqing Du, Yunong Li, YuJie Long, Shihong Chen

2601.13824 2026-03-10 cs.LG

ELSA: Efficient LLM-Centric Split Aggregation for Privacy-Aware Hierarchical Federated Learning over the Network Edge

Xiaohong Yang, Tong Xie, Minghui Liwang, Chikai Shang, Yang Lu, Zhenzhen Jiao, Liqun Fu, Seyyedali Hosseinalipour

Comments 11 pages, 16 figures

2601.11492 2026-03-10 cs.AI

BoxMind: Closed-loop AI strategy optimization for elite boxing validated in the 2024 Olympics

Kaiwen Wang, Kaili Zheng, Rongrong Deng, Qingmin Fan, Milin Zhang, Zongrui Li, Xuesi Zhou, Bo Han, Liren Chen, Chenyi Guo, Ji Wu

2601.08192 2026-03-10 cs.CV

Route, Retrieve, Reflect, Repair: Self-Improving Agentic Framework for Visual Detection and Linguistic Reasoning in Medical Imaging

Md. Faiyaz Abdullah Sayeedi, Rashedur Rahman, Siam Tahsin Bhuiyan, Sefatul Wasi, Ashraful Islam, Saadia Binte Alam, AKM Mahbubur Rahman

2601.06426 2026-03-10 cs.CL cs.AI

NC-Bench: An LLM Benchmark for Evaluating Conversational Competence

Robert J. Moore, Sungeun An, Farhan Ahmed, Jay Pankaj Gala

Comments 8 pages, 1 figure, 2 tables

2601.05611 2026-03-10 cs.CV

FLARE: Learning Future-Aware Latent Representations from Vision-Language Models for Autonomous Driving

Chengen Xie, Chonghao Sima, Tianyu Li, Bin Sun, Junjie Wu, Zhihui Hao, Hongyang Li

2512.17186 2026-03-10 cs.CV

It is not always greener on the other side: Greenery perception across demographics and personalities in multiple cities

Matias Quintana, Fangqi Liu, Jussi Torkko, Youlong Gu, Xiucheng Liang, Yujun Hou, Koichi Ito, Yihan Zhu, Mahmoud Abdelrahman, Tuuli Toivonen, Yi Lu, Filip Biljecki

Details

DOI: 10.1016/j.landurbplan.2026.105618
Journal ref: Landscape and Urban Planning 271 (2026) 105618

English abstract

Quantifying and assessing urban greenery is consequential for planning and development, reflecting the everlasting importance of green spaces for multiple climate and well-being dimensions of cities. Evaluation can be broadly grouped into objective (e.g., measuring the amount of greenery) and subjective (e.g., polling the perception of people) approaches, which may differ -- what people see and feel about how green a place is might not match the measurements of the actual amount of vegetation. In this work, we advance the state of the art by measuring such differences and explaining them through human, geographic, and spatial dimensions. The experiments rely on contextual information extracted from street view imagery and a comprehensive urban visual perception survey collected from 1,000 people across five countries with their extensive demographic and personality information. We analyze the discrepancies between objective measures (e.g., Green View Index (GVI)) and subjective scores (e.g., pairwise ratings), examining whether they can be explained by a variety of human and visual factors such as age group and spatial variation of greenery in the scene. The findings reveal that such discrepancies are comparable around the world and that demographics and personality do not play a significant role in perception. Further, while perceived and measured greenery correlate consistently across geographies (both where people and where imagery are from), where people live plays a significant role in explaining perceptual differences, with these two being the top among seven features that influence perceived greenery the most. This location influence suggests that cultural, environmental, and experiential factors substantially shape how individuals observe greenery in cities.
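As a rough illustration of the objective measure named in the abstract, the Green View Index is commonly computed as the share of pixels in a street-level image that are classified as vegetation. The sketch below assumes a precomputed segmentation mask given as a 2D list of integer labels; the function name `green_view_index` and the label value for vegetation are hypothetical choices, and the paper's own pipeline may differ.

```python
def green_view_index(mask, vegetation_label=1):
    """Return the fraction of pixels in a 2D label mask marked as vegetation."""
    total = 0
    green = 0
    for row in mask:
        for label in row:
            total += 1
            if label == vegetation_label:
                green += 1
    if total == 0:
        raise ValueError("mask is empty")
    return green / total


# Hypothetical 2x4 mask: 1 marks vegetation pixels, 0 marks everything else.
example_mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
gvi = green_view_index(example_mask)  # 3 vegetation pixels out of 8 -> 0.375
```

A discrepancy of the kind the abstract studies could then be expressed as the difference between a normalized perception score and this value, although the abstract does not state how the authors define that difference.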

2512.16880 2026-03-10 cs.CV

ReMeDI: Refined Memory for Disambiguation of Identities with SAM3 in Surgical Segmentation

Valay Bundele, Mehran Hosseinzadeh, Hendrik P. A. Lensch

Comments Under Review