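arXivDaily: a daily digest of arXiv that syncs the complete arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.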

2505.19889 2026-03-02 cs.CV

OmniFall: From Staged Through Synthetic to Wild, A Unified Multi-Domain Dataset for Robust Fall Detection

David Schneider, Zdravko Marinov, Zeyun Zhong, Alexander Jaus, Rodi Düger, Rafael Baur, M. Saquib Sarfraz, Rainer Stiefelhagen

Abstract

Visual fall detection models trained on small, staged datasets have unclear real-world utility due to limited diversity and inconsistent evaluation protocols. We present OmniFall, a unified benchmark with 80 hours / 15k videos and dense frame-level annotations in a harmonized 16-class taxonomy, spanning three complementary domains: OF-Staged (eight public staged sets, standardized with cross-subject/view splits), OF-Synthetic (12k videos, 17 h; controlled diversity in age, body type, environment, camera), and OF-In-the-Wild (the first test-only benchmark curated from genuine accident videos). OmniFall supports both video classification and timeline segmentation, and its cross-domain protocol isolates staged/synthetic-to-wild generalization. Our results show that carefully designed synthetic data can match or surpass real staged footage on cross-domain transfer, while reducing privacy risk and easing data collection. By combining privacy-amenable synthetic/staged sources with a public, test-only wild target and releasing dense, standardized timelines, OmniFall provides a comprehensive benchmark for privacy-preserving fall detection and fall-related (pre/post-fall) segmentation, enabling robust detectors that generalize to uncontrolled environments. Project page: http://simplexsigil.github.io/omnifall/
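
The abstract notes that the OF-Staged sets are standardized with cross-subject splits. The Python below is a minimal sketch of what that property means. It is not OmniFall's released split files or loader. It assumes each video carries a subject ID and assigns whole subjects to either train or test, so no actor appears in both. The function name `cross_subject_split` and the `(video_id, subject_id)` metadata format are hypothetical.

```python
import random


def cross_subject_split(videos, test_fraction=0.25, seed=0):
    """Split videos so that no subject appears in both train and test.

    videos: list of (video_id, subject_id) pairs (hypothetical metadata format).
    Whole subjects, not individual videos, are assigned to the test side.
    """
    subjects = sorted({subject for _, subject in videos})
    rng = random.Random(seed)          # fixed seed -> reproducible split
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [vid for vid, subject in videos if subject not in test_subjects]
    test = [vid for vid, subject in videos if subject in test_subjects]
    return train, test


# Usage: 12 videos recorded by 4 subjects (3 videos each).
videos = [(f"v{i}", f"s{i % 4}") for i in range(12)]
train, test = cross_subject_split(videos, test_fraction=0.25, seed=0)
```

Splitting by subject rather than by video prevents a model from scoring well simply by recognising actors it saw during training. A cross-view split would apply the same idea to camera IDs.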

2505.19862 2026-03-02 cs.CL cs.LG

REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning

Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Jun Rao, Min Zhang

Comments Accepted by ICLR 2026

2505.19764 2026-03-02 cs.LG cs.AI

Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows

Patara Trirat, Wonyong Jeong, Sung Ju Hwang

Comments ICLR 2026, Project Page: https://deepauto-ai.github.io/agentic-predictor/

2505.19762 2026-03-02 cs.AI

Language Models as Messengers: Enhancing Message Passing in Heterophilic Graph Learning

Dawei Cheng, Wenjun Wang, Mingjian Guang

2505.18679 2026-03-02 cs.CV

Efficient Degradation-agnostic Image Restoration via Channel-Wise Functional Decomposition and Manifold Regularization

Bin Ren, Yawei Li, Xu Zheng, Yuqian Fu, Danda Pani Paudel, Hong Liu, Ming-Hsuan Yang, Luc Van Gool, Nicu Sebe

Comments Accepted by ICLR'2026, All-in-One Image Restoration, low-level vision, Transformer

2505.16685 2026-03-02 cs.CV

On the use of Graphs for Satellite Image Time Series

Corentin Dufourg, Charlotte Pelletier, Stéphane May, Sébastien Lefèvre

Comments This work has been accepted for publication in IEEE Geoscience and Remote Sensing Magazine. The final published version is available via IEEE Xplore

DOI: 10.1109/MGRS.2025.3622200
Journal ref: IEEE Geoscience and Remote Sensing Magazine (Volume: 14, Issue: 1, Pages: 205-244, February 2026)

Abstract

The Earth's surface is subject to complex and dynamic processes, ranging from large-scale phenomena such as tectonic plate movements to localized changes associated with ecosystems, agriculture, or human activity. Satellite images enable global monitoring of these processes with extensive spatial and temporal coverage, offering advantages over in-situ methods. In particular, the resulting satellite image time series (SITS) datasets contain valuable information. To handle their large volume and complexity, some recent works use graph-based techniques that abandon the regular Euclidean structure of satellite data to work at the object level. Moreover, graphs enable modelling spatial and temporal interactions between identified objects, which are crucial for pattern detection, classification, and regression tasks. This paper examines the integration of graph-based methods into spatio-temporal remote-sensing analysis. Specifically, it presents a versatile graph-based pipeline for SITS analysis, focusing on the construction of spatio-temporal graphs from SITS and their application to downstream tasks. The paper includes a comprehensive review and two case studies, which highlight the potential of graph-based approaches for land cover mapping and water resource forecasting. It also discusses numerous perspectives for resolving current limitations and encouraging future developments.
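
The following is a minimal sketch of object-level spatio-temporal graph construction. It assumes each acquisition date has already been segmented into objects, with one integer segment ID per pixel. It uses one common design, assumed here rather than taken from the paper:

- Nodes are (date, segment) pairs.
- Spatial edges join segments that touch within the same image.
- Temporal edges join segments at consecutive dates that share at least one pixel.

`build_st_graph` is a hypothetical helper.

```python
import numpy as np


def build_st_graph(segmentations):
    """Build an object-level spatio-temporal graph from per-date segment maps.

    segmentations: list of 2D integer arrays (one per date); each pixel holds
    the ID of the segment it belongs to (assumed input format).
    Returns (nodes, spatial_edges, temporal_edges):
      nodes          -> {(t, segment_id)}
      spatial_edges  -> {(t, id_a, id_b)} with id_a < id_b (4-neighbourhood contact)
      temporal_edges -> {((t, id_a), (t + 1, id_b))} (pixel overlap between dates)
    """
    nodes, spatial_edges, temporal_edges = set(), set(), set()
    for t, seg in enumerate(segmentations):
        nodes.update((t, int(s)) for s in np.unique(seg))
        # Compare each pixel with its right neighbour, then with its lower neighbour.
        for a, b in ((seg[:, :-1], seg[:, 1:]), (seg[:-1, :], seg[1:, :])):
            boundary = a != b
            for u, v in zip(a[boundary].tolist(), b[boundary].tolist()):
                spatial_edges.add((t, min(u, v), max(u, v)))
    for t in range(len(segmentations) - 1):
        prev, curr = segmentations[t], segmentations[t + 1]
        # Every (segment at t, segment at t+1) pair that co-occurs on some pixel.
        for u, v in set(zip(prev.ravel().tolist(), curr.ravel().tolist())):
            temporal_edges.add(((t, u), (t + 1, v)))
    return nodes, spatial_edges, temporal_edges


# Usage: two tiny dates where segment 1 splits into segments 1 and 3.
seg0 = np.array([[1, 1, 2], [1, 1, 2]])
seg1 = np.array([[1, 1, 1], [3, 3, 2]])
nodes, spatial, temporal = build_st_graph([seg0, seg1])
```

Before passing such a graph to a downstream model, one could attach edge weights (e.g. shared border length or overlap area) and node features (e.g. per-segment spectral statistics). They are omitted here to keep the sketch short.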

2505.11601 2026-03-02 cs.LG cs.AI

Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search

Rui Liu, Rui Xie, Zijun Yao, Yanjie Fu, Dongjie Wang

Comments KDD 2025

Abstract

Feature selection removes redundant features to enhance performance and computational efficiency in downstream tasks. Existing works often struggle to capture complex feature interactions and adapt to diverse scenarios. Recent advances in this domain have incorporated generative intelligence to address these drawbacks by uncovering intricate relationships between features. However, two key limitations remain: 1) embedding feature subsets in a continuous space is challenging due to permutation sensitivity, as changes in feature order can introduce biases and weaken the embedding learning process; 2) gradient-based search in the embedding space assumes convexity, which is rarely guaranteed, leading to reduced search effectiveness and suboptimal subsets. To address these limitations, we propose a new framework that can: 1) preserve feature subset knowledge in a continuous embedding space while ensuring permutation invariance; 2) effectively explore the embedding space without relying on strong convexity assumptions. For the first objective, we develop an encoder-decoder paradigm to preserve feature selection knowledge in a continuous embedding space. This paradigm captures feature interactions through pairwise relationships within the subset, removing the influence of feature order on the embedding. Moreover, an inducing point mechanism is introduced to accelerate pairwise relationship computations. For the second objective, we employ a policy-based reinforcement learning (RL) approach to guide the exploration of the embedding space. The RL agent effectively navigates the space by balancing multiple objectives. By adaptively prioritizing high-potential regions and eliminating the reliance on convexity assumptions, the RL agent reduces the risk of converging to local optima. Extensive experiments demonstrate the effectiveness, efficiency, robustness, and explicitness of our model.
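
The following small NumPy sketch makes the permutation-invariance argument concrete. It is an illustration under stated assumptions, not the authors' encoder. It uses Set Transformer-style attention through a few inducing points, followed by mean pooling, with no learned projections. In this sketch the inducing points are random; in practice they would be learned. Because the inducing points attend over the whole subset and the result is averaged, reordering the selected feature IDs leaves the embedding unchanged. The attention cost grows with n·m rather than n² for n selected features and m inducing points. `encode_subset` and the random lookup tables are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys_values):
    """Single-head scaled dot-product attention without learned projections."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return weights @ keys_values


def encode_subset(feature_ids, feature_table, inducing_points):
    """Embed a feature subset so the result does not depend on element order."""
    x = feature_table[feature_ids]        # (n, d): one embedding per selected feature
    h = attend(inducing_points, x)        # (m, d): m inducing points summarise the set
    x = x + attend(x, h)                  # (n, d): each element reads the summary (equivariant)
    return x.mean(axis=0)                 # (d,): mean pooling makes it invariant


# Usage with random (hypothetical) embedding tables.
rng = np.random.default_rng(0)
table = rng.normal(size=(10, 8))          # embeddings for 10 candidate features
inducing = rng.normal(size=(3, 8))        # 3 inducing points
a = encode_subset([1, 4, 7], table, inducing)
b = encode_subset([7, 1, 4], table, inducing)
```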

2505.00784 2026-03-02 cs.RO

Agile legged locomotion in reconfigurable modular robots

Chen Yu, David Matthews, Jingxian Wang, Jing Gu, Douglas Blackiston, Michael Rubenstein, Sam Kriegman

2505.00624 2026-03-02 cs.CL cs.AI

FineScope : SAE-guided Data Selection Enables Domain Specific LLM Pruning and Finetuning

Chaitali Bhattacharyya, Hyunsei Lee, Junyoung Lee, Shinhyoung Jang, Il hong Suh, Yeseong Kim

2504.18840 2026-03-02 cs.RO

Distributed Lloyd-Based algorithm for uncertainty-aware multi-robot under-canopy flocking

Manuel Boldrer, Vit Kratky, Viktor Walter, Martin Saska

2504.18579 2026-03-02 cs.LG

Sparsity Forcing: Reinforcing Token Sparsity of MLLMs

Feng Chen, Yefei He, Lequan Lin, Chenhui Gou, Jing Liu, Bohan Zhuang, Qi Wu

Comments Accepted by ICLR 2026

2504.17192 2026-03-02 cs.CL

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang

Comments ICLR 2026

2504.16930 2026-03-02 cs.CV

What Makes Good Synthetic Training Data for Zero-Shot Stereo Matching?

David Yan, Alexander Raistrick, Jia Deng

Comments Accepted to CVPR 2026

2504.08578 2026-03-02 cs.CV

Multimodal Knowledge Distillation for Egocentric Action Recognition Robust to Missing Modalities

Maria Santos-Villafranca, Dustin Carrión-Ojeda, Alejandro Perez-Yus, Jesus Bermudez-Cameo, Jose J. Guerrero, Simone Schaub-Meyer

Comments Project Page: https://visinf.github.io/KARMMA

2504.00510 2026-03-02 cs.LG cs.AI

Operator Learning with Domain Decomposition for Geometry Generalization in PDE Solving

Jianing Huang, Kaixuan Zhang, Youjia Wu, Ze Cheng

2503.23461 2026-03-02 cs.CV

Investigating Text Insulation and Attention Mechanisms for Complex Visual Text Generation

Ying Tai, Nikai Du, Rui Xie, Zhennan Chen, Qian Wang, Zhengkai Jiang, Kai Zhang, Jian Yang

2503.21449 2026-03-02 cs.CV

Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving

Lucas Nunes, Rodrigo Marcuzzi, Jens Behley, Cyrill Stachniss

Abstract

Semantic scene understanding is crucial for robotics and computer vision applications. In autonomous driving, 3D semantic segmentation plays an important role in enabling safe navigation. Despite significant advances in the field, the complexity of collecting and annotating 3D data is a bottleneck for these developments. To overcome this annotation limitation, simulated data has been used to generate annotated data on demand. There is still, however, a domain gap between real and simulated data. More recently, diffusion models have been in the spotlight, enabling close-to-real data synthesis. These generative models have recently been applied to the 3D data domain to generate scene-scale data with semantic annotations. Still, those methods rely either on image projection or on decoupled models trained at different resolutions in a coarse-to-fine manner. Such intermediate representations degrade the generated data quality because of errors introduced by those transformations. In this work, we propose a novel approach that generates 3D semantic scene-scale data without relying on any projection or on decoupled multi-resolution models, achieving more realistic semantic scene generation than previous state-of-the-art methods. Besides improving 3D semantic scene-scale data synthesis, we thoroughly evaluate the use of the synthetic scene samples as labeled data to train a semantic segmentation network. In our experiments, we show that using the synthetic annotated data generated by our method as training data together with the real semantic segmentation labels leads to an improvement in semantic segmentation performance. Our results show the potential of generated scene-scale point clouds to provide additional training data that extends existing datasets, reducing the data annotation effort. Our code is available at https://github.com/PRBonn/3DiSS.

2503.15477 2026-03-02 cs.LG cs.AI cs.CL stat.ML

What Makes a Reward Model a Good Teacher? An Optimization Perspective

Noam Razin, Zixuan Wang, Hubert Strauss, Stanley Wei, Jason D. Lee, Sanjeev Arora

Comments Accepted to NeurIPS 2025; Code available at https://github.com/princeton-pli/what-makes-good-rm

2503.10568 2026-03-02 cs.CV

Autoregressive Image Generation with Randomized Parallel Decoding

Haopeng Li, Jinyue Yang, Guoqi Li, Huan Wang

Comments The Fourteenth International Conference on Learning Representations (ICLR 2026)

2503.08422 2026-03-02 cs.CV

JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data

Runjian Chen, Wenqi Shao, Bo Zhang, Shaoshuai Shi, Li Jiang, Ping Luo

2503.04398 2026-03-02 cs.LG cs.AI cs.DC

Semantic Parallelism: Redefining Efficient MoE Inference via Model-Data Co-Scheduling

Yan Li, Zhenyu Zhang, Zhengang Wang, Pengfei Chen, Pengfei Zheng

Comments Published as a conference paper at ICLR 2026

Abstract

Prevailing LLM serving engines employ expert parallelism (EP) to implement multi-device inference of massive MoE models. However, the efficiency of expert-parallel inference is largely bounded by inter-device communication, as EP relies on expensive all-to-all collectives to route tokens to remote experts whenever tokens and experts are not collocated on the same GPU/NPU device. Moreover, state-of-the-art schemes treat expert device placement and request (or token) device scheduling as separate concerns, triggering excessive communication between them and compromising inference efficiency. This paper proposes Semantic Parallelism, a novel parallelism paradigm that minimizes the steep communication costs in EP-centric MoE serving via model-data collaborative scheduling. We implement Semantic Parallelism in a framework called Sem-MoE. Sem-MoE maximally collocates experts and their activating tokens on the same device using proactively modeled activation likelihood between them, and introduces three key techniques: (1) Offline model scheduling, which preliminarily clusters and collocates experts onto devices based on their co-activation tendencies for certain classes of input. (2) Online inter-request data scheduling for Attention-DP setups, which proactively rebatches incoming requests onto the device that hosts the experts most likely and most frequently activated by the corresponding requests. (3) Online intra-request data scheduling for Attention-TP setups, which seamlessly fuses a token reshuffling procedure into the original inference pipeline and proactively reschedules tokens to devices to reduce dispersed remote routing. We build Sem-MoE into a prevailing LLM serving engine, SGLANG. Experiments show that our collaborative scheduling approach can effectively reduce the all-to-all communication volume in EP and achieve higher inference throughput than existing solutions.
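
The offline placement and inter-request scheduling ideas can be sketched in a few lines of Python. This is a simplified stand-in, not Sem-MoE's algorithm. It assumes two inputs: a profiled expert co-activation matrix (`coact`) and per-request predicted expert activation probabilities (`expert_probs`). It first places experts greedily so that frequently co-activated experts share a device. It then routes each request to the device whose experts that request is most likely to activate. All names are hypothetical. The greedy placement also assumes the number of experts divides evenly among devices.

```python
import numpy as np


def place_experts(coact, num_devices):
    """Greedy offline placement: put experts that tend to fire together on one device.

    coact[i, j]: how often experts i and j are activated by the same input
    (hypothetical profiling statistic, zero diagonal). Each device receives
    num_experts // num_devices experts.
    """
    num_experts = coact.shape[0]
    capacity = num_experts // num_devices
    unplaced = set(range(num_experts))
    placement = []
    for _ in range(num_devices):
        # Seed the device with the unplaced expert that co-fires most overall.
        seed = max(unplaced, key=lambda e: coact[e].sum())
        group = [seed]
        unplaced.remove(seed)
        while len(group) < capacity:
            # Add the expert most co-activated with those already on this device.
            best = max(unplaced, key=lambda e: coact[e, group].sum())
            group.append(best)
            unplaced.remove(best)
        placement.append(group)
    return placement


def route_request(expert_probs, placement):
    """Online scheduling: pick the device hosting most of the request's likely experts."""
    scores = [expert_probs[group].sum() for group in placement]
    return int(np.argmax(scores))


# Usage: experts 0 and 2 co-fire strongly, as do experts 1 and 3.
coact = np.array([[0, 1, 9, 0],
                  [1, 0, 0, 8],
                  [9, 0, 0, 2],
                  [0, 8, 2, 0]])
placement = place_experts(coact, num_devices=2)
device = route_request(np.array([0.1, 0.6, 0.05, 0.25]), placement)
```

A real serving system would also need to balance load across devices. This sketch only maximises locality, which is what reduces remote token routing in expert parallelism.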

2502.07845 2026-03-02 cs.CV cs.AI

Spread them Apart: Towards Robust Watermarking of Generated Content

Mikhail Pautov, Danil Ivanov, Andrey V. Galichin, Oleg Rogov, Ivan Oseledets

2502.01383 2026-03-02 cs.LG stat.ML

InfoBridge: Mutual Information estimation via Bridge Matching

Sergei Kholkin, Ivan Butakov, Evgeny Burnaev, Nikita Gushchin, Alexander Korotin

2501.16609 2026-03-02 cs.AI cs.CL cs.HC

CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation

Faria Huq, Zora Zhiruo Wang, Frank F. Xu, Tianyue Ou, Shuyan Zhou, Jeffrey P. Bigham, Graham Neubig

Comments Published at NAACL System Demonstration Track, 2025

2501.15910 2026-03-02 cs.LG cs.SY eess.SY math.OC stat.ML

The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective

Michael Muehlebach, Zhiyu He, Michael I. Jordan

Comments accepted at ICLR 2026; 37 pages, 6 figures

2412.03059 2026-03-02 cs.CV

CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning

Runjian Chen, Hang Zhang, Avinash Ravichandran, Hyoungseob Park, Wenqi Shao, Alex Wong, Ping Luo

2412.03054 2026-03-02 cs.CV

TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception

Runjian Chen, Hyoungseob Park, Bo Zhang, Wenqi Shao, Ping Luo, Alex Wong

2411.11727 2026-03-02 cs.LG cs.CV

Aligning Few-Step Diffusion Models with Dense Reward Difference Learning

Ziyi Zhang, Li Shen, Sen Zhang, Deheng Ye, Yong Luo, Miaojing Shi, Dongjing Shan, Bo Du, Dacheng Tao

Comments Accepted by IEEE TPAMI

2410.23836 2026-03-02 cs.CV

Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts

Xiang Deng, Youxin Pang, Xiaochen Zhao, Chao Xu, Lizhen Wang, Hongjiang Xiao, Shi Yan, Hongwen Zhang, Yebin Liu

2410.10922 2026-03-02 cs.LG cs.CR cs.CV

Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting without Disclosure

Hanlin Gu, Hong Xi Tae, Lixin Fan, Chee Seng Chan

Comments Accepted at ICLR2026. This paper introduces the first method for label unlearning in vertical federated learning (VFL), focused on preventing label leakage by the active party