arXivDaily: arXiv daily academic digest, updated Monday to Friday
2505.17209 2026-04-10 cs.RO cs.AI

LiloDriver: A Lifelong Learning Framework for Closed-loop Motion Planning in Long-tail Autonomous Driving Scenarios

Huaiyuan Yao, Pengfei Li, Bu Jin, Yupeng Zheng, An Liu, Lisen Mu, Qing Su, Qian Zhang, Yilun Chen, Peng Li

Comments 7 pages, 3 figures

Abstract

Recent advances in autonomous driving research point towards motion planners that are robust, safe, and adaptive. However, existing rule-based and data-driven planners lack adaptability to long-tail scenarios, while knowledge-driven methods offer strong reasoning but face challenges in representation, control, and real-world evaluation. To address these challenges, we present LiloDriver, a lifelong learning framework for closed-loop motion planning in long-tail autonomous driving scenarios. By integrating large language models (LLMs) with a memory-augmented planner generation system, LiloDriver continuously adapts to new scenarios without retraining. It features a four-stage architecture including perception, scene encoding, memory-based strategy refinement, and LLM-guided reasoning. Evaluated on the nuPlan benchmark, LiloDriver achieves superior performance in both common and rare driving scenarios, outperforming static rule-based and learning-based planners. Our results highlight the effectiveness of combining structured memory and LLM reasoning to enable scalable, human-like motion planning in real-world autonomous driving. Our code is available at https://github.com/Hyan-Yao/LiloDriver.

2505.15960 2026-04-10 cs.CL

Efficient PRM Training Data Synthesis via Formal Verification

Ryo Kamoi, Yusen Zhang, Nan Zhang, Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Wenpeng Yin, Rui Zhang

Comments ACL 2026 Findings. Datasets, models, and code are provided at https://github.com/psunlpgroup/FoVer. Please also refer to our project website at https://fover-prm.github.io/

Abstract

Process Reward Models (PRMs) have emerged as a promising approach for improving LLM reasoning capabilities by providing process supervision over reasoning traces. However, existing approaches for constructing PRM training data remain costly and noisy, as they typically rely on human annotation or sampling-based labeling methods that require repeated LLM calls. In this work, we propose FoVer, a framework that synthesizes PRM training data from formal reasoning tasks by annotating step-level error labels using formal verification tools such as Z3 and Isabelle. By leveraging formal verification, FoVer enables efficient and accurate PRM data construction without requiring human annotation or additional LLM calls. Using FoVer, we create PRM training data from formal logic and theorem proving tasks. Experiments on 12 reasoning benchmarks show that fine-tuning on our training data improves PRMs not only on math and logic reasoning tasks, which are informal variants of the training tasks, but also on NLI and BBH benchmarks, which differ substantially from the tasks used to construct the training data. These results demonstrate the practical effectiveness of FoVer, showing that PRM training data created using formal verification improves PRMs on informal reasoning tasks written in natural language. The datasets, models, and code are provided at https://github.com/psunlpgroup/FoVer.

2505.13126 2026-04-10 cs.AI cs.CL

Iterative Formalization and Planning in Partially Observable Environments

Liancheng Gong, Wang Zhu, Jesse Thomason, Li Zhang

Comments In Findings of ACL 2026

Abstract

Using LLMs not to predict plans but to formalize an environment into the Planning Domain Definition Language (PDDL) has been shown to improve performance and control. While most existing methodology only applies to fully observable environments, we adapt to the more realistic and challenging partially observable environments without sufficient information to make a complete plan. We propose PDDLego, a framework to iteratively formalize, plan, grow, and refine PDDL representations by decomposing the environment and the goal into fully observable episodes. Without finetuning, in-context exemplars, or trajectories, PDDLego improves planning success and exhibits robustness against problem complexity compared to end-to-end approaches. We also show that the domain knowledge captured after a successful trial can benefit future tasks.

2505.07315 2026-04-10 cs.AI cs.LG

FedIFL: A federated cross-domain diagnostic framework for motor-driven systems with inconsistent fault modes

Zexiao Wang, Yankai Wang, Xiaoqiang Liao, Xinguo Ming, Weiming Shen

Comments The paper is being withdrawn as we found that it did not fully articulate the representation of deep implicit features, which is the core focus of our work. Additionally, the experiments were incomplete and lacked sufficient analysis. We plan to revise the paper, clarify these aspects, and enhance the experimental validation before resubmitting

Abstract

Due to the scarcity of industrial data, individual equipment users, particularly start-ups, struggle to independently train a comprehensive fault diagnosis model; federated learning enables collaborative training while ensuring data privacy, making it an ideal solution. However, the diversity of working conditions leads to variations in fault modes, resulting in inconsistent label spaces across different clients. In federated diagnostic scenarios, label space inconsistency leads local models to focus on client-specific fault modes and causes local models from different clients to map different failure modes to similar feature representations, which weakens the aggregated global model's generalization. To tackle this issue, this article proposes a federated cross-domain diagnostic framework termed Federated Invariant Features Learning (FedIFL). In intra-client training, prototype contrastive learning mitigates intra-client domain shifts; subsequently, feature generation ensures local models can access distributions of other clients in a privacy-friendly manner. In cross-client training, a feature disentanglement mechanism is introduced to mitigate cross-client domain shifts. Specifically, an instance-level federated instance consistency loss is designed to ensure the instance-level consistency of invariant features between different clients. Furthermore, a federated instance personalization loss and an orthogonal loss are constructed to distinguish specific features from the invariant features. Eventually, the aggregated model achieves promising generalization among global label spaces, enabling accurate fault diagnosis for target clients' Motor Driven Systems (MDSs) with inconsistent label spaces. Experiments on real-world MDSs validate the effectiveness and superiority of FedIFL in federated cross-domain diagnosis with inconsistent fault modes.

2505.05020 2026-04-10 cs.LG

Approximately Equivariant Recurrent Generative Models for Quasi-Periodic Time Series with a Progressive Training Scheme

Ruwen Fulek, Markus Lange-Hegermann

Abstract

We present a simple yet effective generative model for time series, based on a Recurrent Variational Autoencoder that we refer to as AEQ-RVAE-ST. Recurrent layers often struggle with unstable optimization and poor convergence when modeling long sequences. To address these limitations, we introduce a training scheme that progressively increases the sequence length, stabilizing optimization and enabling consistent learning over extended horizons. By composing known components into a recurrent, approximately time-shift-equivariant topology, our model introduces an inductive bias that aligns with the structure of quasi-periodic and nearly stationary time series. Across several benchmark datasets, AEQ-RVAE-ST matches or surpasses state-of-the-art generative models, particularly on quasi-periodic data, while remaining competitive on more irregular signals. Performance is evaluated through ELBO, Fréchet Distance, discriminative metrics, and visualizations of the learned latent embeddings.
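A training scheme that grows the sequence length can be pictured as a schedule of window sizes, one per training stage. The sketch below is illustrative only: the function name `progressive_lengths`, the starting length, and the doubling factor are assumptions, not details taken from the paper.

```python
def progressive_lengths(start, final, factor=2):
    """Yield training sequence lengths that grow from `start` up to `final`.

    Hypothetical schedule: each stage multiplies the length by `factor`
    (an assumed value) and the last stage always uses `final`.
    """
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    length = start
    while length < final:
        yield length
        length = min(final, length * factor)
    yield final


# Each yielded length would define the window size for one training stage.
stages = list(progressive_lengths(100, 1000))
```

Under this assumed schedule, a model would first be fitted on short windows and then fine-tuned on longer ones, reusing the weights from the previous stage.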

2505.00017 2026-04-10 cs.CL cs.AI cs.DB cs.LG

ReCellTy: Domain-Specific Knowledge Graph Retrieval-Augmented LLMs Reasoning Workflow for Single-Cell Annotation

Dezheng Han, Yibin Jia, Ruxiao Chen, Wenjie Han, Shuaishuai Guo, Jianbo Wang

Abstract

With the rapid development of large language models (LLMs), their application to cell type annotation has drawn increasing attention. However, general-purpose LLMs often face limitations in this specific task due to the lack of guidance from external domain knowledge. To enable more accurate and fully automated cell type annotation, we develop a globally connected knowledge graph comprising 18,850 biological information nodes, including cell types, gene markers, features, and other related entities, along with 48,944 edges connecting these nodes, which is used by LLMs to retrieve entities associated with differential genes for cell reconstruction. Additionally, a multi-task reasoning workflow is designed to optimise the annotation process. Compared to general-purpose LLMs, our method improves human evaluation scores by up to 0.21 and semantic similarity by 6.1% across multiple tissue types, while more closely aligning with the cognitive logic of manual annotation. Meanwhile, it narrows the performance gap between large and small LLMs in cell type annotation, offering a paradigm for structured knowledge integration and reasoning in bioinformatics.

2504.17069 2026-04-10 cs.CV cs.AI

Distilling Specialized Orders for Visual Generation

Rishav Pramanik, Amin Sghaier, Masih Aminbeidokhti, Juan A. Rodriguez, Antoine Poupon, David Vazquez, Christopher Pal, Zhaozheng Yin, Marco Pedersoli

Abstract

Autoregressive (AR) image generators are becoming increasingly popular due to their ability to produce high-quality images and their scalability. Typical AR models are locked onto a specific generation order, often a raster-scan from top-left to bottom-right; this prohibits multi-task flexibility (inpainting, editing, outpainting) without retraining. Any-order AR models address this by learning to generate under arbitrary patch orderings, but at the cost of increased complexity and lower performance. In this paper, we present Ordered Autoregressive (OAR) generation, a self-distillation pipeline that first trains an any-order AR model, then extracts specialized generation orders from the model's own confidence scores, and fine-tunes on these orders. This achieves two goals: 1) improved generation quality by redirecting capacity from learning all $N!$ orderings to a single specialized path, and 2) preserved flexibility of any-order models. On ImageNet $256\times 256$, OAR improves FID from 2.39 to 2.17 over the any-order baseline, with consistent gains on Fashion Products and CelebA-HQ. OAR supports zero-shot inpainting and outpainting without retraining, and human evaluation shows 64% preference over the baseline. The pipeline requires only lightweight fine-tuning on a pretrained any-order model, with no architectural changes or additional annotations.

2504.13102 2026-04-10 cs.SD cs.AI eess.AS

A Multi-task Learning Balanced Attention Convolutional Neural Network Model for Few-shot Underwater Acoustic Target Recognition

Wei Huang, Shumeng Sun, Junpeng Lu, Zhenpeng Xu, Zhengyang Xiu, Hao Zhang

Journal ref Expert Systems with Applications, 2026
Abstract

Underwater acoustic target recognition (UATR) is of great significance for the protection of marine diversity and national defense security. The development of deep learning provides new opportunities for UATR, but it faces challenges brought by the scarcity of reference samples and complex environmental interference. To address these issues, we propose a multi-task balanced channel attention convolutional neural network (MT-BCA-CNN). The method integrates a channel attention mechanism with a multi-task learning strategy, constructing a shared feature extractor and multi-task classifiers to jointly optimize target classification and feature reconstruction tasks. The channel attention mechanism dynamically enhances discriminative acoustic features such as harmonic structures while suppressing noise. Experiments on the Watkins Marine Life Dataset demonstrate that MT-BCA-CNN achieves 97% classification accuracy and 95% F1-score in 27-class few-shot scenarios, significantly outperforming traditional CNN and ACNN models, as well as popular state-of-the-art UATR methods. Ablation studies confirm the synergistic benefits of multi-task learning and attention mechanisms, while a dynamic weighting adjustment strategy effectively balances task contributions. This work provides an efficient solution for few-shot underwater acoustic recognition, advancing research in marine bioacoustics and sonar signal processing.

2504.13015 2026-04-10 cs.CV

Hierarchical Feature Learning for Medical Point Clouds via State Space Model

Guoqing Zhang, Jingyun Yang, Yang Li

Comments 10 pages, 3 figures

Abstract

Deep learning-based point cloud modeling has been widely investigated as an indispensable component of general shape analysis. Recently, transformers and state space models (SSMs) have shown promising capabilities in point cloud learning. However, limited research has been conducted on medical point clouds, which have great potential in disease diagnosis and treatment. This paper presents an SSM-based hierarchical feature learning framework for medical point cloud understanding. Specifically, we down-sample the input into multiple levels through farthest point sampling. At each level, we perform a series of k-nearest neighbor (KNN) queries to aggregate multi-scale structural information. To assist the SSM in processing point clouds, we introduce coordinate-order and inside-out scanning strategies for efficient serialization of irregular points. Point features are calculated progressively from short neighbor sequences and long point sequences through vanilla and group Point SSM blocks, to capture both local patterns and long-range dependencies. To evaluate the proposed method, we build a large-scale medical point cloud dataset named MedPointS for anatomy classification, completion, and segmentation. Extensive experiments conducted on MedPointS demonstrate that our method achieves superior performance across all tasks. The dataset is available at https://flemme-docs.readthedocs.io/en/latest/medpoints.html. Code is merged to a public medical imaging platform: https://github.com/wlsdzyzl/flemme.
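The two generic building blocks named in this abstract, farthest point sampling and k-nearest-neighbor queries, can be sketched in a few lines. This is a plain-Python illustration of the textbook algorithms under assumed names (`farthest_point_sampling`, `knn`); it is not the authors' implementation, which lives in the linked repository.

```python
import math
import random


def farthest_point_sampling(points, m, seed=0):
    """Return m indices (m <= len(points)); each new pick is the point
    farthest from everything chosen so far."""
    rng = random.Random(seed)
    chosen = [rng.randrange(len(points))]  # arbitrary starting point
    # min_dist[i] = distance from point i to its nearest chosen point
    min_dist = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def knn(points, center_idx, k):
    """Indices of the k nearest neighbors of points[center_idx], excluding itself."""
    center = points[center_idx]
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], center))
    return [i for i in order if i != center_idx][:k]


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
centers = farthest_point_sampling(cloud, 2)
neighbors = knn(cloud, 0, 2)
```

Repeating the sampling with decreasing `m` would give a hierarchy of levels, and running `knn` with several values of `k` around each sampled center is one way to gather multi-scale neighborhoods.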

2503.23078 2026-04-10 cs.CL

EventWeave: A Dynamic Framework for Capturing Core and Supporting Events in Dialogue Systems

Zhengyi Zhao, Shubo Zhang, Yiming Du, Bin Liang, Baojun Wang, Zhongyang Li, Binyang Li, Kam-Fai Wong

Comments Accepted by ACL'26

Abstract

Large language models have improved dialogue systems, but often process conversational turns in isolation, overlooking the event structures that guide natural interactions. Hence we introduce EventWeave, a framework that explicitly models relationships between conversational events to generate more contextually appropriate dialogue responses. EventWeave constructs a dynamic event graph that distinguishes between core events (main goals) and supporting events (interconnected details), employing a multi-head attention mechanism to selectively determine which events are most relevant to the current turn. Unlike summarization or standard graph-based approaches, our method captures three distinct relationship types between events, allowing for more nuanced context modeling. Experiments on three dialogue datasets demonstrate that EventWeave produces more natural and contextually appropriate responses while requiring less computational overhead than models processing the entire dialogue history. Ablation studies confirm improvements stem from better event relationship modeling rather than increased information density. Our approach effectively balances comprehensive context understanding with generating concise responses, maintaining strong performance across various dialogue lengths through targeted optimization techniques.

2503.10183 2026-04-10 cs.CV cs.AI

Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding

Shunqi Mao, Chaoyi Zhang, Weidong Cai

Comments ACL 2026 Main Conference

Abstract

Existing vision-language models (VLMs) often suffer from visual hallucination, where the generated responses contain inaccuracies that are not grounded in the visual input. Efforts to address this issue without model finetuning primarily mitigate hallucination by contrastively reducing language biases or amplifying the weights of visual embedding during decoding. However, these approaches remain limited in their ability to capture fine-grained visual details. In this work, we propose the Perception Magnifier (PM), a novel visual decoding method that iteratively isolates relevant visual tokens based on attention and magnifies the corresponding regions, spurring the model to concentrate on fine-grained visual details during decoding. By magnifying critical regions while preserving the structural and contextual information at each decoding step, PM allows the VLM to enhance its scrutiny of the visual input, hence producing more accurate and faithful responses. Extensive experimental results demonstrate that PM not only achieves superior hallucination mitigation but also enhances language generation while preserving strong reasoning capabilities. Code can be found at https://github.com/ShunqiM/PM.

2503.02537 2026-04-10 cs.CV cs.AI

RectifiedHR: Enable Efficient High-Resolution Synthesis via Energy Rectification

Zhen Yang, Guibao Shen, Minyang Li, Liang Hou, Mushui Liu, Luozhou Wang, Xin Tao, Ying-Cong Chen

Comments Project Page: https://zhenyangcs.github.io/RectifiedHR-Diffusion/

Abstract

Diffusion models have achieved remarkable progress across various visual generation tasks. However, their performance significantly declines when generating content at resolutions higher than those used during training. Although numerous methods have been proposed to enable high-resolution generation, they all suffer from inefficiency. In this paper, we propose RectifiedHR, a straightforward and efficient solution for training-free high-resolution synthesis. Specifically, we propose a noise refresh strategy that unlocks the model's training-free high-resolution synthesis capability and improves efficiency. Additionally, we are the first to observe the phenomenon of energy decay, which may cause image blurriness during the high-resolution synthesis process. To address this issue, we introduce average latent energy analysis and find that tuning the classifier-free guidance hyperparameter can significantly improve generation performance. Our method is entirely training-free and demonstrates efficient performance. Furthermore, we show that RectifiedHR is compatible with various diffusion model techniques, enabling advanced features such as image editing, customized generation, and video synthesis. Extensive comparisons with numerous baseline methods validate the superior effectiveness and efficiency of RectifiedHR.

2503.01870 2026-04-10 cs.CL cs.AI cs.HC econ.GN q-fin.EC

Transforming the Voice of the Customer: Large Language Models for Identifying Customer Needs

Artem Timoshenko, Chengfeng Mao, John R. Hauser

Abstract

Identifying customer needs (CNs) is fundamental to product innovation and marketing strategy. Yet for over thirty years, Voice-of-the-Customer (VOC) applications have relied on professional analysts to manually interpret qualitative data and formulate "jobs to be done." This task is cognitively demanding, time-consuming, and difficult to scale. While current practice uses machine learning to screen content, the critical final step of precisely formulating CNs relies on expert human judgment. We conduct a series of studies with market research professionals to evaluate whether Large Language Models (LLMs) can automate CN abstraction. Across various product and service categories, we demonstrate that supervised fine-tuned (SFT) LLMs perform at least as well as professional analysts and substantially better than foundational LLMs. These results generalize to alternative foundational LLMs and require relatively "small" models. The abstracted CNs are well-formulated, sufficiently specific to guide innovation, and grounded in source content without hallucination. Our analysis suggests that SFT training enables LLMs to learn the underlying syntactic and semantic conventions of professional CN formulation rather than relying on memorized CNs. Automation of tedious tasks transforms the VOC approach by enabling the discovery of high-leverage insights at scale and by refocusing analysts on higher-value-added tasks.

2502.19559 2026-04-10 cs.CL

Stay Focused: Problem Drift in Multi-Agent Debate

Jonas Becker, Lars Benedikt Kaesberg, Andreas Stephan, Jan Philip Wahle, Terry Ruas, Bela Gipp

Comments accepted at EACL 2026

Abstract

Multi-agent debate - multiple instances of large language models discussing problems in turn-based interaction - has shown promise for solving knowledge and reasoning tasks. However, these methods show limitations when solving complex problems that require longer reasoning chains. We analyze how multi-agent debate drifts away from the initial problem over multiple turns, thus harming task performance. We define this phenomenon as problem drift and quantify its presence across ten tasks (i.e., three generative, three knowledge, three reasoning, and one instruction-following task). We find that generative tasks drift often due to the subjectivity of the answer space (76-89%), compared to high-complexity tasks (7-21%). To identify the reasons, eight human experts analyze 170 multi-agent debates suffering from problem drift. We find the most common issues related to this drift are the lack of progress (35% of cases), low-quality feedback (26% of cases), and a lack of clarity (25% of cases). We propose DRIFTJudge, an LLM-as-a-judge method, as a first baseline to detect problem drift. We also propose DRIFTPolicy, which mitigates 31% of problem drift cases. Our study is a step toward understanding a key limitation of multi-agent debate, highlighting why longer debates can harm task performance and how problem drift could be addressed.

2502.15512 2026-04-10 cs.LG

SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning

Xuyang Li, Romit Maulik

Abstract

Modern deep reinforcement learning (DRL) methods have made significant advances in handling continuous action spaces. However, real-world control systems, especially those requiring precise and reliable performance, often demand interpretability in the sense of a-priori assessments of agent behavior to identify safe or failure-prone interactions with environments. To address this limitation, this work proposes SALSA-RL (Stability Analysis in the Latent Space of Actions), a novel RL framework that models control actions as dynamic, time-dependent variables evolving within a latent space. By employing a pre-trained encoder-decoder and a state-dependent linear system, this approach enables interpretability through local stability analysis, where instantaneous growth in action-norms can be predicted before their execution. It is demonstrated that SALSA-RL can be deployed in a non-invasive manner for assessing the local stability of actions from pretrained RL agents without compromising on performance across diverse benchmark environments. By enabling a more interpretable analysis of action generation, SALSA-RL provides a powerful tool for advancing the design, analysis, and theoretical understanding of RL systems.

2412.20718 2026-04-10 cs.CV cs.AI

MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models

Bei Yan, Jie Zhang, Zhiyuan Chen, Shiguang Shan, Xilin Chen

Comments Accepted by Pattern Recognition

Abstract

The rapid integration of Large Vision-Language Models (LVLMs) into critical domains necessitates comprehensive moral evaluation to ensure their alignment with human values. While extensive research has addressed moral evaluation in LLMs, text-centric assessments cannot adequately capture the complex contextual nuances and ambiguities introduced by visual modalities. To bridge this gap, we introduce MM-MoralBench, a multimodal moral evaluation benchmark grounded in Moral Foundations Theory. We construct unique multimodal scenarios by combining synthesized visual contexts with character dialogues to simulate real-world dilemmas where visual and linguistic information interact dynamically. Our benchmark assesses models across six moral foundations through moral judgment, classification, and response tasks. Extensive evaluations of over 20 LVLMs reveal that models exhibit pronounced moral alignment bias, diverging significantly from human consensus. Furthermore, our analysis indicates that general scaling or structural improvements yield diminishing returns in moral alignment, and that thinking paradigms may trigger overthinking-induced failures in moral contexts, highlighting the necessity for targeted moral alignment strategies. Our benchmark is publicly available.

2412.15922 2026-04-10 cs.LG cs.SD eess.AS

RiTTA: Modeling Event Relations in Text-to-Audio Generation

Yuhang He, Yash Jain, Xubo Liu, Andrew Markham, Vibhav Vineet

Comments EMNLP25, Project Site: https://yuhanghe01.github.io/RiTTA-Proj/. Code: https://github.com/yuhanghe01/RiTTA

Abstract

Despite significant advancements in Text-to-Audio (TTA) generation, with models achieving high-fidelity audio and fine-grained context understanding, these models struggle to model the relations between audio events described in the input text. Previous TTA methods have not systematically explored audio event relation modeling, nor have they proposed frameworks to enhance this capability. In this work, we systematically study audio event relation modeling in TTA generation models. We first establish a benchmark for this task by: (1) proposing a comprehensive relation corpus covering all potential relations in real-world scenarios; (2) introducing a new audio event corpus encompassing commonly heard audios; and (3) proposing new evaluation metrics to assess audio event relation modeling from various perspectives. Furthermore, we propose a finetuning framework to enhance existing TTA models' ability to model audio event relations. Code is available at: https://github.com/yuhanghe01/RiTTA

2412.10437 2026-04-10 cs.CV cs.GR cs.LG

SVGFusion: A VAE-Diffusion Transformer for Vector Graphic Generation

Ximing Xing, Juncheng Hu, Ziteng Xue, Jing Zhang, Buyu Li, Sheng Wang, Dong Xu, Qian Yu

Comments project page: https://ximinng.github.io/SVGFusionProject/

Abstract

Generating high-quality Scalable Vector Graphics (SVGs) from text remains a significant challenge. Existing LLM-based models that generate SVG code as a flat token sequence struggle with poor structural understanding and error accumulation, while optimization-based methods are slow and yield uneditable outputs. To address these limitations, we introduce SVGFusion, a unified framework that adapts the VAE-diffusion architecture to bridge the dual code-visual nature of SVGs. Our model features two core components: a Vector-Pixel Fusion Variational Autoencoder (VP-VAE) that learns a perceptually rich latent space by jointly encoding SVG code and its rendered image, and a Vector Space Diffusion Transformer (VS-DiT) that achieves globally coherent compositions through iterative refinement. Furthermore, this architecture is enhanced by a Rendering Sequence Modeling strategy, which ensures accurate object layering and occlusion. Evaluated on our novel SVGX-Dataset comprising 240k human-designed SVGs, SVGFusion establishes a new state-of-the-art, generating high-quality, editable SVGs that are strictly semantically aligned with the input text.

2412.08637 2026-04-10 cs.CV cs.AI cs.LG

DMin: Scalable Training Data Influence Estimation for Diffusion Models

Huawei Lin, Yingjie Lao, Weijie Zhao

Comments Accepted to CVPR 2026 (Findings)

Abstract

Identifying the training data samples that most influence a generated image is a critical task in understanding diffusion models (DMs), yet existing influence estimation methods are constrained to small-scale or LoRA-tuned models due to computational limitations. To address this challenge, we propose DMin (Diffusion Model influence), a scalable framework for estimating the influence of each training data sample on a given generated image. To the best of our knowledge, it is the first method capable of influence estimation for DMs with billions of parameters. Leveraging efficient gradient compression, DMin reduces storage requirements from hundreds of TBs to mere MBs or even KBs, and retrieves the top-k most influential training samples in under 1 second, all while maintaining performance. Our empirical results demonstrate DMin is both effective in identifying influential training samples and efficient in terms of computational and storage requirements.
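The general shape of "compress per-sample gradients, then retrieve the top-k by similarity" can be illustrated as follows. This is a sketch under loud assumptions: the abstract does not specify DMin's compression scheme or scoring rule, so the random sign projection and the inner-product score used here are stand-ins, and all function names are hypothetical.

```python
import heapq
import random


def make_projection(dim, k, seed=0):
    """Random +/- 1/sqrt(k) projection matrix (assumed compression, not DMin's)."""
    rng = random.Random(seed)
    scale = 1.0 / k ** 0.5
    return [[rng.choice((-scale, scale)) for _ in range(dim)] for _ in range(k)]


def compress(grad, proj):
    """Project a long gradient vector down to len(proj) numbers."""
    return [sum(w * g for w, g in zip(row, grad)) for row in proj]


def top_k_influential(query, stored, k):
    """Indices of the k stored vectors with the largest inner product with `query`."""
    scores = (
        (sum(a * b for a, b in zip(query, vec)), idx)
        for idx, vec in enumerate(stored)
    )
    return [idx for _, idx in heapq.nlargest(k, scores)]


# Toy usage: three "training sample" gradients already compressed to 2 numbers.
stored = [[0.1, 5.0], [2.0, 0.0], [1.0, 1.0]]
ranking = top_k_influential([1.0, 0.0], stored, 2)
```

The storage saving in such a scheme comes from keeping only the compressed vectors; a linear scan is shown here for clarity, whereas a system operating at scale would typically use an index structure for the retrieval step.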

2412.03884 2026-04-10 cs.AI

A Unified Framework for Evaluating and Enhancing the Transparency of Explainable AI Methods via Perturbation-Gradient Consensus Attribution

Md. Ariful Islam, Md Abrar Jahin, M. F. Mridha, Nilanjan Dey

Abstract

Explainable Artificial Intelligence (XAI) methods are increasingly used in safety-critical domains, yet there is no unified framework to jointly evaluate fidelity, interpretability, robustness, fairness, and completeness. We address this gap through two contributions. First, we propose a multi-criteria evaluation framework that formalizes these five criteria using principled metrics: fidelity via prediction-gap analysis; interpretability via a composite concentration-coherence-contrast score; robustness via cosine-similarity perturbation stability; fairness via Jensen-Shannon divergence across demographic groups; and completeness via feature-ablation coverage. These are integrated using an entropy-weighted dynamic scoring scheme that adapts to domain-specific priorities. Second, we introduce Perturbation-Gradient Consensus Attribution (PGCA), which fuses grid-based perturbation importance with Grad-CAM++ through consensus amplification and adaptive contrast enhancement, combining perturbation fidelity with gradient-based spatial precision. We evaluate across five domains (brain tumor MRI, plant disease, security screening, gender, and sunglass detection) using fine-tuned ResNet-50 models. PGCA achieves the best performance in fidelity $(2.22 \pm 1.62)$, interpretability $(3.89 \pm 0.33)$, and fairness $(4.95 \pm 0.03)$, with statistically significant improvements over baselines $(p < 10^{-7})$. Sensitivity analysis shows stable rankings (Kendall's $\tau \geq 0.88$). Code and results are publicly available.
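Two of the quantities named in the evaluation framework, Jensen-Shannon divergence and cosine similarity, have standard definitions that can be written down directly. The sketch below implements only those textbook formulas; how the paper turns attribution maps into the input distributions and vectors, and how it rescales the results into its reported scores, is not stated in the abstract and is not modeled here.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, so the range is [0, 1])
    between two discrete distributions given as equal-length lists."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing by convention.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy usage: identical distributions diverge by 0, disjoint ones by 1.
same = js_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

In a fairness check of this kind, a small divergence between the attribution distributions of two demographic groups would indicate similar explanations; in a robustness check, a cosine similarity near 1 between attributions before and after a perturbation would indicate stability.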

2411.07799 2026-04-10 cs.CV cs.RO

Horticultural Temporal Fruit Monitoring via 3D Instance Segmentation and Re-Identification using Colored Point Clouds

Daniel Fusaro, Federico Magistri, Jens Behley, Alberto Pretto, Cyrill Stachniss

Journal ref Computers and Electronics in Agriculture, Volume 247, Pages 111723, 2026
Abstract

Accurate and consistent fruit monitoring over time is a key step toward automated agricultural production systems. However, this task is inherently difficult due to variations in fruit size, shape, occlusion, orientation, and the dynamic nature of orchards where fruits may appear or disappear between observations. In this article, we propose a novel method for fruit instance segmentation and re-identification on 3D terrestrial point clouds collected over time. Our approach directly operates on dense colored point clouds, capturing fine-grained 3D spatial detail. We segment individual fruits using a learning-based instance segmentation method applied directly to the point cloud. For each segmented fruit, we extract a compact and discriminative descriptor using a 3D sparse convolutional neural network. To track fruits across different times, we introduce an attention-based matching network that associates fruits with their counterparts from previous sessions. Matching is performed using a probabilistic assignment scheme, selecting the most likely associations across time. We evaluate our approach on real-world datasets of strawberries and apples, demonstrating that it outperforms existing methods in both instance segmentation and temporal re-identification, enabling robust and precise fruit monitoring across complex and dynamic orchard environments.
Keywords: Agricultural Robotics, 3D Fruit Tracking, Instance Segmentation, Deep Learning, Point Clouds, Sparse Convolutional Networks, Temporal Monitoring

2411.02622 2026-04-10 cs.LG cs.AI

AdaProb: Efficient Machine Unlearning via Adaptive Probability

Zihao Zhao, Yuchen Yang, Anjalie Field, Yinzhi Cao

Details
English abstract

Machine unlearning, enabling a trained model to forget specific data, is crucial for addressing erroneous data and adhering to privacy regulations like the General Data Protection Regulation (GDPR)'s "right to be forgotten". Despite recent progress, existing methods face two key challenges: residual information may persist in the model even after unlearning, and the computational overhead required for effective data removal is often high. To address these issues, we propose Adaptive Probability Approximate Unlearning (AdaProb), a novel method that enables models to forget data efficiently and in a privacy-preserving manner. Our method first replaces the neural network's final-layer output probabilities with pseudo-probabilities for the data to be forgotten. These pseudo-probabilities follow a uniform distribution to maximize unlearning, and they are optimized to align with the model's overall distribution to enhance privacy and reduce the risk of membership inference attacks. Then, the model's weights are updated accordingly. In comprehensive experiments, our method outperforms state-of-the-art approaches with over 20% improvement in forgetting error, better protection against membership inference attacks, and less than 50% of the computational time.
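
To make the idea of uniform pseudo-probabilities concrete, here is a minimal sketch that covers only the first step the abstract describes: using a uniform distribution as the target for a sample to be forgotten. It is not the AdaProb implementation. The logits are hypothetical, the cross-entropy loss is an assumed choice of objective, and the abstract's later step of optimizing the pseudo-probabilities against the model's overall distribution is not shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs):
    """Cross-entropy H(target, probs) in nats."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def uniform_pseudo_probabilities(num_classes):
    """Uniform target used here as the pseudo-probability vector."""
    return [1.0 / num_classes] * num_classes

# Hypothetical logits for one forget-set sample the model still remembers.
logits_confident = [6.0, 0.5, -1.0, 0.2]
target = uniform_pseudo_probabilities(4)

# The loss is high while the model is confident about the sample ...
loss_before = cross_entropy(target, softmax(logits_confident))
# ... and reaches its minimum, log(num_classes), at a uniform output.
loss_uniform = cross_entropy(target, softmax([0.0, 0.0, 0.0, 0.0]))

print(loss_before, loss_uniform)
```

Minimizing this loss with respect to the weights pushes the output for the forgotten sample toward the uniform target, which is the intuition behind replacing its probabilities before the weight update.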

2410.22258 2026-04-10 cs.LG cs.SY eess.IV eess.SY stat.ML

LipKernel: Lipschitz-Bounded Convolutional Neural Networks via Dissipative Layers

Patricia Pauli, Ruigang Wang, Ian Manchester, Frank Allgöwer

Details
Journal ref
Automatica 188 (2026): 112959
English abstract

We propose a novel layer-wise parameterization for convolutional neural networks (CNNs) that includes built-in robustness guarantees by enforcing a prescribed Lipschitz bound. Each layer in our parameterization is designed to satisfy a linear matrix inequality (LMI), which in turn implies dissipativity with respect to a specific supply rate. Collectively, these layer-wise LMIs ensure Lipschitz boundedness for the input-output mapping of the neural network, yielding a more expressive parameterization than through spectral bounds or orthogonal layers. Our new method LipKernel directly parameterizes dissipative convolution kernels using a 2-D Roesser-type state space model. This means that the convolutional layers are given in standard form after training and can be evaluated without computational overhead. In numerical experiments, we show that the run-time using our method is orders of magnitude faster than state-of-the-art Lipschitz-bounded networks that parameterize convolutions in the Fourier domain, making our approach particularly attractive for improving the robustness of learning-based real-time perception or control in robotics, autonomous vehicles, or automation systems. We focus on CNNs, and in contrast to previous works, our approach accommodates a wide variety of layers typically used in CNNs, including 1-D and 2-D convolutional layers, maximum and average pooling layers, as well as strided and dilated convolutions and zero padding. However, our approach naturally extends beyond CNNs as we can incorporate any layer that is incrementally dissipative.
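
For context on the "spectral bounds" that the abstract compares against, the sketch below shows the standard baseline: for a feed-forward network with 1-Lipschitz activations (such as ReLU), the product of the layers' spectral norms is an upper bound on the Lipschitz constant. This is not LipKernel or its LMI-based parameterization. The weight matrices are hypothetical, and the spectral norm is estimated by plain power iteration.

```python
import math

def matvec(a, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def spectral_norm(a, iters=200):
    """Estimate the largest singular value of `a` by power iteration on a^T a.

    Assumes the all-ones start vector is not orthogonal to the top right
    singular vector, which holds for the examples below.
    """
    at = transpose(a)
    x = [1.0] * len(a[0])
    for _ in range(iters):
        y = matvec(at, matvec(a, x))
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            return 0.0
        x = [v / norm for v in y]
    ax = matvec(a, x)
    return math.sqrt(sum(v * v for v in ax))

def lipschitz_upper_bound(weights):
    """Product of spectral norms: a bound valid for 1-Lipschitz activations."""
    bound = 1.0
    for w in weights:
        bound *= spectral_norm(w)
    return bound

# Hypothetical two-layer network.
w1 = [[2.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.0], [0.0, 0.25]]
print(lipschitz_upper_bound([w1, w2]))
```

The product bound is often loose because it treats each layer in isolation, which is the kind of conservatism the abstract says its layer-wise LMI conditions improve on.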

2408.05086 2026-04-10 cs.CL cs.AI

A systematic framework for generating novel experimental hypotheses from language models

Kanishka Misra, Najoung Kim

Comments Revised version

Details
English abstract

Neural language models (LMs) have been shown to capture complex linguistic patterns, yet their utility in understanding human language and, more broadly, human cognition remains debated. While existing work in this area often evaluates human-machine alignment, few studies attempt to translate findings from this enterprise into novel insights about humans. To this end, we propose a systematic framework for hypothesis generation that uses LMs to simulate outcomes of experiments that do not yet exist in the literature. We instantiate this framework in the context of a specific research question in child language development: dative verb acquisition and cross-structural generalization. Through this instantiation, we derive novel, untested hypotheses: the alignment between argument ordering and discourse prominence features of exposure contexts modulates how children generalize new verbs to unobserved structures. We also design a set of experiments that can test these hypotheses in the lab with children. This work contributes both a domain-general framework for systematic hypothesis generation via simulated learners and domain-specific, lab-testable hypotheses for child language acquisition research.

2407.19426 2026-04-10 cs.LG cs.AI stat.ML

Causal Discovery in Linear Models with Unobserved Variables and Measurement Error

Yuqin Yang, Mohamed Nafea, Negar Kiyavash, Kun Zhang, AmirEmad Ghassami

Details
English abstract

The presence of unobserved common causes and measurement error poses two major obstacles to causal structure learning, since ignoring either source of complexity can induce spurious causal relations among variables of interest. We study causal structure learning in linear systems where both challenges may occur simultaneously. We introduce a causal model called LV-SEM-ME, which contains four types of variables: directly observed variables, variables that are not directly observed but are measured with error, the corresponding measurements, and variables that are neither observed nor measured. Under a separability condition (namely, identifiability of the mixing matrix associated with the exogenous noise terms of the observed variables), together with certain faithfulness assumptions, we characterize the extent of identifiability and the corresponding observational equivalence classes. We provide graphical characterizations of these equivalence classes and develop recovery algorithms that enumerate all models in the equivalence class of the ground truth. We also establish, via a four-node union model that subsumes instrumental variable, front-door, and negative-control-outcome settings, a form of identification robustness: the target effect remains identifiable in the broader LV-SEM-ME model even when the assumptions underlying the specialized identification formulas for the corresponding submodels need not all hold simultaneously.

2407.01563 2026-04-10 cs.RO cs.AI cs.LG

NaviSlim: Adaptive Context-Aware Navigation and Sensing via Dynamic Slimmable Networks

Tim Johnsen, Marco Levorato

Comments 13 pages, 12 figures

Details
Journal ref
2024 IEEE/ACM Ninth International Conference on Internet-of-Things Design and Implementation (IoTDI)
English abstract

Small-scale autonomous airborne vehicles, such as micro-drones, are expected to be a central component of a broad spectrum of applications ranging from exploration to surveillance and delivery. This class of vehicles is characterized by severe constraints in computing power and energy reservoir, which impair their ability to support the complex state-of-the-art neural models needed for autonomous operations. The main contribution of this paper is a new class of neural navigation models, NaviSlim, capable of adapting the amount of resources spent on computing and sensing in response to the current context (i.e., difficulty of the environment, current trajectory, and navigation goals). Specifically, NaviSlim is designed as a gated slimmable neural network architecture that, unlike existing slimmable networks, can dynamically select a slimming factor to autonomously scale model complexity, which consequently optimizes execution time and energy consumption. Moreover, unlike existing sensor fusion approaches, NaviSlim can dynamically select power levels of onboard sensors to autonomously reduce power and time spent during sensor acquisition, without the need to switch between different neural networks. Through extensive training and testing in the robust simulation environment Microsoft AirSim, we evaluate our NaviSlim models on scenarios of varying difficulty. On a test set, the models showed a dynamically reduced model complexity of 57-92% on average, and sensor utilization of 61-80%, compared to static neural networks designed to match the computing and sensing required by the most difficult scenario.

2406.13086 2026-04-10 cs.RO cs.AI

NaviSplit: Dynamic Multi-Branch Split DNNs for Efficient Distributed Autonomous Navigation

Timothy K Johnsen, Ian Harshbarger, Zixia Xia, Marco Levorato

Comments 6 pages, 3 figures

Details
Journal ref
2024 IEEE 25th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM)
English abstract

Lightweight autonomous unmanned aerial vehicles (UAVs) are emerging as a central component of a broad range of applications. However, autonomous navigation necessitates the implementation of perception algorithms, often deep neural networks (DNNs), that process sensor observations, such as those from cameras and LiDARs, for control logic. The complexity of such algorithms clashes with the severe constraints of these devices in terms of computing power, energy, memory, and execution time. In this paper, we propose NaviSplit, the first instance of a lightweight navigation framework embedding a distributed and dynamic multi-branched neural model. At its core is a DNN split at a compression point, resulting in two model parts: (1) the head model, which is executed at the vehicle and partially processes and compacts perception from sensors; and (2) the tail model, which is executed at an interconnected compute-capable device, processes the remainder of the compacted perception, and infers navigation commands. Unlike prior work, the NaviSplit framework includes a neural gate that dynamically selects a specific head model to minimize channel usage while efficiently supporting the navigation network. In our implementation, the perception model extracts a 2D depth map from a monocular RGB image captured by the drone using the robust simulator Microsoft AirSim. Our results demonstrate that the NaviSplit depth model achieves an extraction accuracy of 72-81% while transmitting an extremely small amount of data (1.2-18 KB) to the edge server. With the neural gate, as used by NaviSplit, we obtain a navigation accuracy 0.3% higher than that of a larger static network while reducing the data rate by 95%. To the best of our knowledge, this is the first exemplar of a dynamic multi-branched model based on split DNNs for autonomous navigation.

2406.12009 2026-04-10 cs.CL

FinTruthQA: A Benchmark for AI-Driven Financial Disclosure Quality Assessment in Investor-Firm Interactions

Peilin Zhou, Ziyue Xu, Xinyu Shi, Jiageng Wu, Yikang Jiang, Dading Chong, Wang Dong, Jun Chen, Bin Ke, Jie Yang

Details
English abstract

Accurate and transparent financial information disclosure is essential for market efficiency, investor decision-making, and corporate governance. Chinese stock exchanges' investor interactive platforms provide a widely used channel through which listed firms respond to investor concerns, yet these responses are often limited or non-substantive, making disclosure quality difficult to assess at scale. To address this challenge, we introduce FinTruthQA, to our knowledge the first benchmark for AI-driven assessment of financial disclosure quality in investor-firm interactions. FinTruthQA comprises 6,000 real-world financial Q&A entries, each manually annotated based on four key evaluation criteria: question identification, question relevance, answer readability, and answer relevance. We benchmark statistical machine learning models, pre-trained language models and their fine-tuned variants, as well as large language models (LLMs), on FinTruthQA. Experiments show that existing models achieve strong performance on question identification and question relevance (F1 > 95%), but remain substantially weaker on answer readability (Micro F1 approximately 88%) and especially answer relevance (Micro F1 approximately 80%), highlighting the nontrivial difficulty of fine-grained disclosure quality assessment. Domain- and task-adapted pre-trained language models consistently outperform general-purpose models and LLM-based prompting on the most challenging settings. These findings position FinTruthQA as a practical foundation for AI-driven disclosure monitoring in capital markets, with value for regulatory oversight, investor protection, and disclosure governance in real-world financial settings.

2406.10521 2026-04-10 cs.LG cs.AI

MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data

Yaobin Ling, Xiaoqian Jiang, Yejin Kim

Details
English abstract

In the era of big data, access to abundant data is crucial for driving research forward. However, such data is often inaccessible due to privacy concerns or high costs, particularly in the healthcare domain. Generating synthetic (tabular) data can address this, but existing models typically require substantial amounts of data to train effectively, contradicting our objective of solving data scarcity. To address this challenge, we propose a novel framework for generating synthetic tabular data, powered by large language models (LLMs), that emulates the architecture of a Generative Adversarial Network (GAN). By incorporating the data generation process as contextual information and utilizing an LLM as the optimizer, our approach significantly enhances the quality of synthetic data generation in common scenarios with small sample sizes. Our experimental results on public and private datasets demonstrate that our model outperforms several state-of-the-art models in generating higher-quality synthetic data for downstream tasks while preserving the privacy of the real data.

2406.01857 2026-04-10 cs.LG cs.NA math.NA

Neural Green's Operators for Parametric Partial Differential Equations

Hugo Melchers, Joost Prins, Michael Abdelmalik

Details
Journal ref
Computer Methods in Applied Mechanics and Engineering 455 (2026) 118893
English abstract

This work introduces a paradigm for constructing parametric neural operators that are derived from finite-dimensional representations of Green's operators for linear partial differential equations (PDEs). We refer to such neural operators as Neural Green's Operators (NGOs). Our construction of NGOs preserves the linear action of Green's operators on the inhomogeneity fields, while approximating the nonlinear dependence of the Green's function on the coefficients of the PDE using neural networks. This construction reduces the complexity of the problem from learning the entire solution operator and its dependence on all parameters to only learning the Green's function and its dependence on the PDE coefficients. Furthermore, we show that our explicit representation of Green's functions enables the embedding of desirable mathematical attributes in our NGO architectures, such as symmetry, spectral, and conservation properties. Through numerical benchmarks on canonical PDEs, we demonstrate that NGOs achieve comparable or superior accuracy to Deep Operator Networks, Variationally Mimetic Operator Networks, and Fourier Neural Operators with similar parameter counts, while generalizing significantly better when tested on out-of-distribution data. For parametric time-dependent PDEs, we show that NGOs that are trained on a single time step can produce pointwise-accurate dynamics in an auto-regressive manner over arbitrarily large numbers of time steps. For parametric nonlinear PDEs, we demonstrate that NGOs trained exclusively on solutions of corresponding linear problems can be embedded within iterative solvers to yield accurate solutions, provided a suitable initial guess is available. Finally, we show that we can leverage the explicit representation of Green's functions returned by NGOs to construct effective matrix preconditioners that accelerate iterative solvers for PDEs.
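
The "linear action of Green's operators on the inhomogeneity fields" can be seen in a classical textbook case, sketched below. It is not an NGO: the Green's function here is known in closed form rather than learned. For $-u'' = f$ on $(0,1)$ with $u(0) = u(1) = 0$, the solution is $u(x) = \int_0^1 G(x,y) f(y)\,dy$ with $G(x,y) = x(1-y)$ for $x \le y$ and $y(1-x)$ otherwise. The midpoint quadrature and the cell count are illustrative choices.

```python
def green_poisson_1d(x, y):
    """Green's function of -u'' = f on (0, 1) with zero Dirichlet conditions."""
    return x * (1.0 - y) if x <= y else y * (1.0 - x)

def apply_green_operator(f, x, n=1000):
    """Approximate u(x) = integral of G(x, y) f(y) dy by the midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * h  # midpoint of the k-th cell
        total += green_poisson_1d(x, y) * f(y) * h
    return total

# For f = 1 the exact solution is u(x) = x (1 - x) / 2.
u_half = apply_green_operator(lambda y: 1.0, 0.5)
print(u_half, 0.5 * (1.0 - 0.5) / 2.0)

# Linearity in f: doubling the source doubles the solution.
u_one = apply_green_operator(lambda y: 1.0, 0.3)
u_two = apply_green_operator(lambda y: 2.0, 0.3)
print(u_two, 2.0 * u_one)
```

Because the operator is linear in $f$, only the kernel $G$ and its dependence on the PDE coefficients has to be represented, which is the reduction in learning complexity the abstract describes.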