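arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.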

2511.22228 2026-03-23 cs.CV cs.AI cs.LG

3D-Consistent Multi-View Editing by Correspondence Guidance

Josef Bengtson, David Nilsson, Dong In Lee, Yaroslava Lochman, Fredrik Kahl

Comments Added experiments with FLUX.1 editing method

Details

English abstract

Recent advancements in diffusion and flow models have greatly improved text-based image editing, yet methods that edit images independently often produce geometrically and photometrically inconsistent results across different views of the same scene. Such inconsistencies are particularly problematic for editing of 3D representations such as NeRFs or Gaussian splat models. We propose a training-free guidance framework that enforces multi-view consistency during the image editing process. The key idea is that corresponding points should look similar after editing. To achieve this, we introduce a consistency loss that guides the denoising process toward coherent edits. The framework is flexible and can be combined with widely varying image editing methods, supporting both dense and sparse multi-view editing setups. Experimental results show that our approach significantly improves 3D consistency compared to existing multi-view editing methods. We also show that this increased consistency enables high-quality Gaussian splat editing with sharp details and strong fidelity to user-specified text prompts. Please refer to our project page for video results: https://3d-consistent-editing.github.io/
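
The abstract's key idea, that corresponding points should look similar after editing, can be illustrated with a toy loss. This is a minimal sketch and not the authors' formulation: it assumes images are nested lists of RGB tuples, that correspondences are given as pixel-coordinate pairs, and that "similar" means a small squared color difference. The name `consistency_loss` and all types are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]               # RGB values (assumed representation)
Image = List[List[Pixel]]                        # image[row][col]
Match = Tuple[Tuple[int, int], Tuple[int, int]]  # ((row_a, col_a), (row_b, col_b))


def consistency_loss(view_a: Image, view_b: Image, matches: Sequence[Match]) -> float:
    """Mean squared color difference over corresponding pixels of two edited views.

    Hypothetical illustration: the loss is 0 when every matched pair has identical colors.
    """
    if not matches:
        return 0.0
    total = 0.0
    for (row_a, col_a), (row_b, col_b) in matches:
        pixel_a = view_a[row_a][col_a]
        pixel_b = view_b[row_b][col_b]
        # Squared Euclidean distance between the two RGB values.
        total += sum((x - y) ** 2 for x, y in zip(pixel_a, pixel_b))
    return total / len(matches)


# Two 1x2 views: the first matched pair agrees, the second differs in one channel.
view_a = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
view_b = [[(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
matches = [((0, 0), (0, 0)), ((0, 1), (0, 1))]
loss = consistency_loss(view_a, view_b, matches)
```

In a guidance setting, the gradient of such a loss with respect to the intermediate images would be used to steer denoising; that step depends on the editing model and is not shown here.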

2511.20469 2026-03-23 cs.CV cs.LG

Dance Style Classification using Laban-Inspired and Frequency-Domain Motion Features

Ben Hamscher, Arnold Brosch, Nicolas Binninger, Maksymilian Jan Dejna, Kira Maag

2511.17910 2026-03-23 cs.CL

L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention

Yuliang Zhan, Xinyu Tang, Han Wan, Jian Li, Ji-Rong Wen, Hao Sun

Comments AAAI 2026 oral

2511.17885 2026-03-23 cs.CV cs.LG

FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning

Guoyang Xia, Yifeng Ding, Fengfa Li, Lei Ren, Wei Chen, Fangxiang Feng, Xiaojie Wang

2511.17318 2026-03-23 cs.RO cs.AI cs.CE cs.LG physics.app-ph

FORWARD: Dataset of a forwarder operating in rough terrain

Mikael Lundbäck, Erik Wallin, Carola Häggström, Mattias Nyström, Andreas Grönlund, Mats Richardson, Petrus Jönsson, William Arnvik, Lucas Hedström, Arvid Fälldin, Martin Servin

Comments 33 pages, 24 figures

Details

English abstract

We present FORWARD, a high-resolution multimodal dataset of a cut-to-length forwarder operating in rough terrain on two harvest sites in central Sweden. The forwarder is a large Komatsu model equipped with vehicle telematics sensors, including global positioning via satellite navigation, movement sensors, accelerometers, and engine sensors. The forwarder was additionally equipped with cameras, operator vibration sensors, and multiple IMUs. The data includes event time logs recorded at 5 Hz of driving speed, fuel consumption, machine position with centimeter accuracy, and crane use while the forwarder operates in forest areas, aerially laser-scanned with a resolution of around 1500 points per square meter. Production log files (StanForD standard) with time-stamped machine events, extensive video material, and terrain data in various formats are included as well. About 18 hours of regular wood extraction work over three days is annotated from 360-degree video material into individual work elements and included in the dataset. We also include scenario specifications of conducted experiments on forest roads and in terrain. Scenarios include repeatedly driving the same routes with and without steel tracks, different load weights, and different target driving speeds. The dataset is intended for developing models and algorithms for trafficability, perception, and autonomous control of forest machines using artificial intelligence, simulation, and experiments on physical testbeds. In part, we focus on forwarders traversing terrain, avoiding or handling obstacles, and loading or unloading logs, with consideration for efficiency, fuel consumption, safety, and environmental impact. Other benefits of the open dataset include the ability to explore auto-generation and calibration of forestry machine simulators and automation scenario descriptions using the data recorded in the field.

2511.16665 2026-03-23 cs.LG cs.AI cs.DC

Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

Qinghao Hu, Shang Yang, Junxian Guo, Xiaozhe Yao, Yujun Lin, Yuxian Gu, Han Cai, Chuang Gan, Ana Klimovic, Song Han

2511.15164 2026-03-23 cs.CV

Multimodal Continual Instruction Tuning with Dynamic Gradient Guidance

Songze Li, Mingyu Gao, Tonghua Su, Xu-Yao Zhang, Zhongjie Wang

2511.09833 2026-03-23 cs.LG

ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking

Lequan Lin, Dai Shi, Andi Han, Feng Chen, Qiuzheng Chen, Jiawen Li, Zhaoyang Li, Jiyuan Li, Zhenbang Sun, Junbin Gao

Comments NeurIPS 2025

2511.08916 2026-03-23 cs.CL

HalluClean: A Unified Framework to Combat Hallucinations in LLMs

Yaxin Zhao, Yu Zhang

2511.07798 2026-03-23 cs.CV

Divide-and-Conquer Decoupled Network for Cross-Domain Few-Shot Segmentation

Runmin Cong, Anpeng Wang, Bin Wan, Cong Zhang, Xiaofei Zhou, Wei Zhang

2511.07213 2026-03-23 cs.LG

DETECT: Data-Driven Evaluation of Treatments Enabled by Classification Transformers

Yuanheng Mao, Lillian Yang, Stephen Yang, Ethan Shao, Zihan Li

Comments 5 pages, 4 figures, 2 tables, presented and awarded Best Paper Runner-Up at the IEEE ICDM 2025 UGHS Symposium

2511.01998 2026-03-23 cs.CV cs.NA math.NA

Locally-Supervised Global Image Restoration

Benjamin Walder, Daniel Toader, Robert Nuster, Günther Paltauf, Peter Burgholzer, Gregor Langer, Lukas Krainer, Markus Haltmeier

2511.00171 2026-03-23 cs.CV

CompAgent: An Agentic Framework for Visual Compliance Verification

Rahul Ghosh, Baishali Chaudhury, Hari Prasanna Das, Meghana Ashok, Ryan Razkenari, Long Chen, Sungmin Hong, Chun-Hao Liu

Comments Accepted to IEEE CVPR 2026 GRAIL-V Workshop

2510.23766 2026-03-23 cs.CL

BitSkip: An Empirical Analysis of Quantization and Early Exit Composition in Transformers

Ramshankar Bhuvaneswaran, Handan Liu

2510.15520 2026-03-23 cs.CV cs.LG

Discovering Intersectional Bias via Directional Alignment in Face Recognition Embeddings

Ignacio Serna

2510.14412 2026-03-23 cs.AI

PDDL Axioms Are Equivalent to Least Fixed Point Logic (Extended Version)

Claudia Grundke, Gabriele Röger

Comments v1: Extended version of "Eliminating Negative Occurrences of Derived Predicates from PDDL Axioms" at the joint KR/ICAPS 2025 workshop KRPlan; v2: Extended version of "PDDL Axioms Are Equivalent to Least Fixed Point Logic" (ICAPS 2026). It adds the result on the equivalence of PDDL axioms and LFP and no longer contains the deeper analysis of the blow-up incurred by the compilation

2510.14184 2026-03-23 cs.LG cs.AI cs.CL

MAFA: A Multi-Agent Framework for Enterprise-Scale Annotation with Configurable Task Adaptation

Mahmood Hegazy, Aaron Rodrigues, Azzam Naeem

Details

DOI: 10.1609/aaai.v40i47.41431
Journal ref: AAAI Conference on Artificial Intelligence 2026

English abstract

We present MAFA (Multi-Agent Framework for Annotation), a production-deployed system that transforms enterprise-scale annotation workflows through configurable multi-agent collaboration. Addressing the critical challenge of annotation backlogs in financial services, where millions of customer utterances require accurate categorization, MAFA combines specialized agents with structured reasoning and a judge-based consensus mechanism. Our framework uniquely supports dynamic task adaptation, allowing organizations to define custom annotation types (FAQs, intents, entities, or domain-specific categories) through configuration rather than code changes. Deployed at JP Morgan Chase, MAFA has eliminated a 1 million utterance backlog while achieving, on average, 86% agreement with human annotators, annually saving over 5,000 hours of manual annotation work. The system processes utterances with annotation confidence classifications, which are typically 85% high, 10% medium, and 5% low across all datasets we tested. This enables human annotators to focus exclusively on ambiguous and low-coverage cases. We demonstrate MAFA's effectiveness across multiple datasets and languages, showing consistent improvements over traditional and single-agent annotation baselines: 13.8% higher Top-1 accuracy, 15.1% improvement in Top-5 accuracy, and 16.9% better F1 in our internal intent classification dataset and similar gains on public benchmarks. This work bridges the gap between theoretical multi-agent systems and practical enterprise deployment, providing a blueprint for organizations facing similar annotation challenges.

2510.10602 2026-03-23 cs.RO cs.CV

SpikeGrasp: A Benchmark for 6-DoF Grasp Pose Detection from Stereo Spike Streams

Zhuoheng Gao, Jiyao Zhang, Zhiyong Xie, Hao Dong, Zhaofei Yu, Rongmei Chen, Guozhang Chen, Tiejun Huang

Comments Some real machine experiments need to be supplemented, and the entire paper is incomplete

2510.08096 2026-03-23 cs.CV

Efficient Label Refinement for Face Parsing Under Extreme Poses Using 3D Gaussian Splatting

Ankit Gahlawat, Anirban Mukherjee, Dinesh Babu Jayagopi

Comments Accepted to VCIP 2025 (International Conference on Visual Communications and Image Processing 2025)

2510.05138 2026-03-23 cs.CL

LiRA: A Multi-Agent Framework for Reliable and Readable Literature Review Generation

Gregory Hok Tjoan Go, Khang Ly, Anders Søgaard, Amin Tabatabaei, Maarten de Rijke, Xinyi Chen

Comments Published at the 40th AAAI Conference on Artificial Intelligence. Please cite the published version here: https://ojs.aaai.org/index.php/AAAI/article/view/41489

2510.02835 2026-03-23 cs.LG

Subject-Adaptive Sparse Linear Models for Interpretable Personalized Health Prediction from Multimodal Lifelog Data

Dohyun Bu, Jisoo Han, Soohwa Kwon, Yulim So, Jong-Seok Lee

Comments 6 pages, ICTC 2025

Details

DOI: 10.1109/ICTC66702.2025.11389118

English abstract

Improved prediction of personalized health outcomes -- such as sleep quality and stress -- from multimodal lifelog data could have meaningful clinical and practical implications. However, state-of-the-art models, primarily deep neural networks and gradient-boosted ensembles, sacrifice interpretability and fail to adequately address the significant inter-individual variability inherent in lifelog data. To overcome these challenges, we propose the Subject-Adaptive Sparse Linear (SASL) framework, an interpretable modeling approach explicitly designed for personalized health prediction. SASL integrates ordinary least squares regression with subject-specific interactions, systematically distinguishing global from individual-level effects. We employ an iterative backward feature elimination method based on nested $F$-tests to construct a sparse and statistically robust model. Additionally, recognizing that health outcomes often represent discretized versions of continuous processes, we develop a regression-then-thresholding approach specifically designed to maximize macro-averaged F1 scores for ordinal targets. For intrinsically challenging predictions, SASL selectively incorporates outputs from compact LightGBM models through confidence-based gating, enhancing accuracy without compromising interpretability. Evaluations conducted on the CH-2025 dataset -- which comprises roughly 450 daily observations from ten subjects -- demonstrate that the hybrid SASL-LightGBM framework achieves predictive performance comparable to that of sophisticated black-box methods, but with significantly fewer parameters and substantially greater transparency, thus providing clear and actionable insights for clinicians and practitioners.
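
The regression-then-thresholding step mentioned in the abstract can be illustrated as follows. This is a minimal sketch under stated assumptions, not the SASL implementation: it treats the simplest case of a binary target (the paper addresses ordinal targets), takes continuous regression outputs as given, and searches a single cut point that maximizes macro-averaged F1. The function names `macro_f1` and `best_threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in labels or predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores: List[float] = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def best_threshold(scores: Sequence[float], y_true: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, macro-F1) for the best cut on continuous regression outputs.

    Candidates are midpoints between consecutive distinct scores (an assumed search grid);
    a score at or above the threshold is predicted as class 1.
    """
    values = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        y_pred = [1 if s >= t else 0 for s in scores]
        f1 = macro_f1(y_true, y_pred)
        if f1 > best_f1:  # keep the first threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Illustrative values only: regression outputs and the discretized labels they should recover.
scores = [0.2, 0.3, 0.7, 0.9]
labels = [0, 0, 1, 1]
threshold, score = best_threshold(scores, labels)
```

Extending the sketch to ordinal targets would mean searching one threshold per class boundary; in practice the thresholds would be tuned on held-out data rather than on the training set.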

2509.24897 2026-03-23 cs.AI

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

Yang Shi, Yuhao Dong, Yue Ding, Yuran Wang, Xuanyu Zhu, Sheng Zhou, Wenting Liu, Haochen Tian, Rundong Wang, Huanqian Wang, Zuyan Liu, Bohan Zeng, Ruizhe Chen, Qixun Wang, Zhuoran Zhang, Xinlong Chen, Chengzhuo Tong, Bozhou Li, Qiang Liu, Haotian Wang, Wenjing Yang, Yuanxing Zhang, Pengfei Wan, Yi-Fan Zhang, Ziwei Liu

Comments Accepted by CVPR 2026

Details

English abstract

The integration of visual understanding and generation into unified multimodal models represents a significant stride toward general-purpose AI. However, a fundamental question remains unanswered by existing benchmarks: does this architectural unification actually enable synergetic interaction between the constituent capabilities? Existing evaluation paradigms, which primarily assess understanding and generation in isolation, are insufficient for determining whether a unified model can leverage its understanding to enhance its generation, or use generative simulation to facilitate deeper comprehension. To address this critical gap, we introduce RealUnify, a benchmark specifically designed to evaluate bidirectional capability synergy. RealUnify comprises 1,000 meticulously human-annotated instances spanning 10 categories and 32 subtasks. It is structured around two core axes: 1) Understanding Enhances Generation, which requires reasoning (e.g., commonsense, logic) to guide image generation, and 2) Generation Enhances Understanding, which necessitates mental simulation or reconstruction (e.g., of transformed or disordered visual inputs) to solve reasoning tasks. A key contribution is our dual-evaluation protocol, which combines direct end-to-end assessment with a diagnostic stepwise evaluation that decomposes tasks into distinct understanding and generation phases. This protocol allows us to precisely discern whether performance bottlenecks stem from deficiencies in core abilities or from a failure to integrate them. Through large-scale evaluations of 12 leading unified models and 6 specialized baselines, we find that current unified models still struggle to achieve effective synergy, indicating that architectural unification alone is insufficient. These results highlight the need for new training strategies and inductive biases to fully unlock the potential of unified modeling.

2509.24837 2026-03-23 cs.CV

ZOO-Prune: Training-Free Token Pruning via Zeroth-Order Gradient Estimation in Vision-Language Models

Youngeun Kim, Youjia Zhang, Huiling Liu, Aecheon Jung, Sunwoo Lee, Sungeun Hong

2509.19464 2026-03-23 cs.AI cs.LG

Evaluation-Aware Reinforcement Learning

Shripad Vilasrao Deshmukh, Will Schwarzer, Scott Niekum

Comments 11 pages

2509.14460 2026-03-23 cs.RO

Learning Discrete Abstractions for Visual Rearrangement Tasks Using Vision-Guided Graph Coloring

Abhiroop Ajith, Constantinos Chamzas

2509.08625 2026-03-23 cs.LG

An upper bound on the silhouette evaluation metric for clustering

Hugo Sträng, Tai Dinh

2509.03962 2026-03-23 cs.CL

Exploring NLP Benchmarks in an Extremely Low-Resource Setting

Ulin Nuha, Adam Jatowt

Comments The Association for Computational Linguistics: EACL 2026

2509.00402 2026-03-23 cs.LG cs.AI

Curriculum Guided Personalized Subgraph Federated Learning

Minku Kang, Hogun Park

Comments Accepted to CIKM 2025. This is an extended version of the original submission

Details

DOI: 10.1145/3746252.3761128

English abstract

Subgraph Federated Learning (FL) aims to train Graph Neural Networks (GNNs) across distributed private subgraphs, but it suffers from severe data heterogeneity. To mitigate data heterogeneity, weighted model aggregation personalizes each local GNN by assigning larger weights to parameters from clients with similar subgraph characteristics inferred from their current model states. However, the sparse and biased subgraphs often trigger rapid overfitting, causing the estimated client similarity matrix to stagnate or even collapse. As a result, aggregation loses effectiveness as clients reinforce their own biases instead of exploiting diverse knowledge otherwise available. To this end, we propose a novel personalized subgraph FL framework called Curriculum guided personalized sUbgraph Federated Learning (CUFL). On the client side, CUFL adopts Curriculum Learning (CL) that adaptively selects edges for training according to their reconstruction scores, exposing each GNN first to easier, generic cross-client substructures and only later to harder, client-specific ones. This paced exposure prevents early overfitting to biased patterns and enables gradual personalization. By regulating personalization, the curriculum also reshapes server aggregation from exchanging generic knowledge to propagating client-specific knowledge. Further, CUFL improves weighted aggregation by estimating client similarity using fine-grained structural indicators reconstructed on a random reference graph. Extensive experiments on six benchmark datasets confirm that CUFL achieves superior performance compared to relevant baselines. Code is available at https://github.com/Kang-Min-Ku/CUFL.git.
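
The client-side curriculum described in the abstract, where edges are selected by reconstruction score and harder edges are admitted later, can be sketched as a simple pacing rule. This is an illustration under assumptions that the abstract does not confirm: a higher reconstruction score is taken to mean an easier edge, and the admitted fraction grows linearly with the training round. The function `select_edges` and its parameters are hypothetical and are not taken from the CUFL code.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def select_edges(
    scores: Dict[Edge, float],
    round_idx: int,
    total_rounds: int,
    start_frac: float = 0.3,
) -> List[Edge]:
    """Return the edges admitted for training in the given round, easiest first.

    Assumptions (not from the source): higher score = easier edge, and the admitted
    fraction grows linearly from start_frac in round 0 to 1.0 in the final round.
    """
    progress = min(1.0, round_idx / max(1, total_rounds - 1))
    frac = start_frac + (1.0 - start_frac) * progress
    k = max(1, math.ceil(frac * len(scores)))
    ranked = sorted(scores, key=lambda e: scores[e], reverse=True)
    return ranked[:k]


# Illustrative reconstruction scores for four edges of a client subgraph.
edge_scores = {(0, 1): 0.9, (1, 2): 0.5, (2, 3): 0.1, (3, 4): 0.7}
early = select_edges(edge_scores, round_idx=0, total_rounds=3, start_frac=0.5)
late = select_edges(edge_scores, round_idx=2, total_rounds=3, start_frac=0.5)
```

With these values the first round trains on the two highest-scoring edges only, and the last round trains on all four.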

2508.19752 2026-03-23 cs.LG

Fast 3D Diffusion for Scalable Granular Media Synthesis

Muhammad Moeeze Hassan, Régis Cottereau, Filippo Gatti, Patryk Dec

2508.13876 2026-03-23 cs.AI cs.CL

Improved Generalized Planning with LLMs through Strategy Refinement and Reflection

Katharina Stein, Nils Hodel, Daniel Fišer, Jörg Hoffmann, Michael Katz, Alexander Koller