arXivDaily: arXiv daily academic digest, updated Monday through Friday
2504.11837 2026-05-11 cs.CL cs.AI

FiSMiness: A Finite State Machine Based Paradigm for Emotional Support Conversations

Yue Zhao, Qingqing Gu, Xiaoyu Wang, Teng Chen, Zhonglin Jiang, Yong Chen, Luo Ji

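AI summary FiSMiness is a finite state machine (FSM) based framework for emotional support conversations that aims to improve long-term support quality through structured state planning. The FSM guides a large language model to reason, at each turn, about the seeker's emotion, the support strategy, and the response, yielding a more coherent and effective conversation flow. Experiments on emotional support datasets suggest that FiSMiness outperforms many baselines, including direct inference, self-refine, chain of thought, fine-tuning, and externally assisted methods.

Comments NAACL2025 CMCL Workshop

Abstract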

Emotional support conversation (ESC) aims to alleviate the emotional distress of individuals through effective conversations. Although large language models (LLMs) have obtained remarkable progress on ESC, most of these studies might not define the diagram from the state model perspective, therefore providing a suboptimal solution for long-term satisfaction. To address such an issue, we leverage the Finite State Machine (FSM) on LLMs, and propose a framework called FiSMiness. Our framework allows a single LLM to bootstrap the planning during ESC, and self-reason the seeker's emotion, support strategy and the final response upon each conversational turn. Substantial experiments on ESC datasets suggest that FiSMiness outperforms many baselines, including direct inference, self-refine, chain of thought, finetuning, and external-assisted methods, even those with many more parameters.

2504.11101 2026-05-11 cs.CV cs.AI cs.MM

Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR

Yulong Zhang, Tianyi Liang, Xinyue Huang, Erfei Cui, Guoqing Wang, Xu Guo, Chenhui Li, Gongshen Liu

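AI summary This paper proposes Consensus Entropy (CE), a metric for assessing the reliability of outputs from multiple vision-language models (VLMs) on optical character recognition (OCR). CE estimates reliability by measuring agreement among the models' outputs: correct predictions tend to converge in output space, while erroneous ones diverge. Building on CE, the authors develop the CE-OCR framework, which verifies outputs, selects the best result, and applies adaptive routing. Experiments show that CE, with no training or supervision, improves OCR quality and outperforms existing methods.

Abstract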

Optical Character Recognition (OCR) is fundamental to Vision-Language Models (VLMs) and high-quality data generation for LLM training. Yet, despite progress in average OCR accuracy, state-of-the-art VLMs still struggle with detecting sample-level errors and lack effective unsupervised quality control. We introduce Consensus Entropy (CE), a training-free, model-agnostic metric that estimates output reliability by measuring inter-model agreement entropy. The core insight is that correct predictions converge in output space, while errors diverge. Based on CE, we develop CE-OCR, a lightweight multi-model framework that verifies outputs by ensemble agreement, selects the best outputs, and further improves efficiency through adaptive routing. Experiments demonstrate that CE is robust for quality verification, improving F1 scores by 42.1% over VLM-as-Judge. CE-OCR achieves consistent OCR gains, outperforming self-consistency and single-model baselines at the same cost. Notably, CE requires no training or supervision, enabling plug-and-play integration. Code: https://github.com/Aslan-yulong/consensus-entropy.
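
The core insight above (correct outputs agree, erroneous ones diverge) can be illustrated with a minimal sketch. This is not the paper's CE definition: it swaps in mean pairwise string dissimilarity from Python's standard `difflib` as a stand-in for the agreement measure, and the function names `pairwise_disagreement`, `consensus_score`, and `select_best` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_disagreement(a: str, b: str) -> float:
    """Return 1 - similarity ratio, so identical strings score 0.0."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def consensus_score(outputs: list[str]) -> float:
    """Mean pairwise disagreement across model outputs (lower means more agreement).

    Assumption: a simple proxy for inter-model agreement, not the paper's metric.
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        raise ValueError("need at least two model outputs")
    return sum(pairwise_disagreement(a, b) for a, b in pairs) / len(pairs)


def select_best(outputs: list[str]) -> str:
    """Pick the output with the smallest total disagreement to all the others."""
    def total_distance(i: int) -> float:
        return sum(
            pairwise_disagreement(outputs[i], other)
            for j, other in enumerate(outputs)
            if j != i
        )
    return outputs[min(range(len(outputs)), key=total_distance)]
```

A sample whose score exceeds some threshold could be flagged for review or routed to a stronger model; the threshold itself would need to be chosen on held-out data.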

2503.06223 2026-05-11 cs.CV

RedDiffuser: Auditing Multimodal Safety Failures in Vision-Language Models via Reinforced Diffusion

Ruofan Wang, Xingjun Ma

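AI summary As vision-language models (VLMs) are deployed widely in open-ended environments, ensuring their safety under multimodal inputs becomes especially important. Existing evaluations, however, focus mostly on explicit malicious instructions and overlook the possibility that safety alignment fails under harmful contextual exposure. The paper proposes RedDiffuser, a reinforcement-learning-based framework that uses diffusion models to generate semantically coherent visual inputs and systematically expose multimodal safety vulnerabilities. Experiments show that the method reveals latent risks in several open-source and commercial VLMs, highlighting that current system-level safety mechanisms fall short against realistic multimodal threats.

Abstract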

Large Vision-Language Models (VLMs) are increasingly deployed in open-ended environments, where ensuring reliable safety under multimodal inputs is critical. However, existing evaluations remain largely instruction-centric, focusing on explicit malicious queries while overlooking a more realistic and underexplored risk: whether safety alignment remains robust under harmful contextual exposure. This limitation is particularly important for multimodal systems, where visual inputs can substantially steer model behavior and render text-only auditing insufficient. In this work, we study multimodal safety auditing under harmful contextual exposure, asking whether VLMs can maintain safe behavior when partial toxic text is paired with visual context. To enable systematic auditing, we propose RedDiffuser (RedDiff), a reinforcement-based framework that leverages diffusion models to generate semantically coherent visual inputs for black-box safety testing. By combining greedy prompt search with reinforcement optimization, RedDiffuser uncovers high-risk multimodal inputs that expose latent safety failures. Extensive experiments on both open-source and commercial VLMs show that such context-conditioned failures are widespread. On LLaVA, RedDiffuser increases unsafe response rates by up to 10.69% on the original set and 8.91% on a hold-out set, with strong transferability to Gemini and LLaMA-Vision. These vulnerabilities persist even under external safety guardrails, suggesting that current system-level safety mechanisms remain insufficient for realistic multimodal risks. Our findings reveal a critical blind spot in existing safety evaluations and establish context-aware multimodal auditing as an essential paradigm for diagnosing hidden vulnerabilities in modern VLM systems.

2503.05085 2026-05-11 cs.CL cs.SD eess.AS

S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models

Feng Jiang, Zhiyu Lin, Yiyang Liu, Liumeng Xue, Fan Bu, Yuhao Du, Xiangying Chen, Benyou Wang, Haizhou Li

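AI summary This paper presents S2S-Arena, a benchmark for evaluating how well speech-to-speech models understand and express paralinguistic information (such as prosody, emotion, and speaker traits) when following instructions. The benchmark uses a four-level interaction protocol and a two-stage data construction pipeline, producing 1,243 speech samples that span more than 100 real-world tasks, and introduces a reference-free pairwise comparison framework. Experiments reveal substantial performance gaps between current academic and industrial systems in complex paralinguistic scenarios. The study also analyzes key design factors that affect expressive instruction following, offering guidance for building more natural, robust, and human-aligned speech agents.

Comments Accepted by ACL 2026 main

Abstract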

Recent advances in large language models (LLMs) have fundamentally reshaped speech-to-speech (S2S) systems, enabling increasingly natural spoken interaction. However, existing benchmarks still rely heavily on text-based evaluation and largely ignore paralinguistic cues such as prosody, emotion, and speaker traits, which are central to expressive and human-like communication. We introduce S2S-Arena, a speech-native benchmark for evaluating instruction-following S2S models with explicit assessment of both semantic understanding and paralinguistic expression. S2S-Arena features a four-level interaction protocol that systematically probes models under increasing paralinguistic complexity, a two-stage data construction pipeline that produces 1,243 speech samples spanning 100+ real-world tasks, and an arena-style evaluation framework that enables reference-free, pairwise comparison directly in the speech modality. Benchmarking 10 state-of-the-art S2S systems over 1,000+ comparisons reveals substantial performance gaps (especially under complex paralinguistic demands) between current academic and industrial systems. Our analysis further identifies key design factors governing expressive instruction following, providing actionable insights for building more natural, robust, and human-aligned speech agents.

2503.04638 2026-05-11 cs.LG

No Forgetting Learning: Buffer-free Continual Learning Classification

Mohammad Ali Vahedifar, Qi Zhang

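AI summary This paper proposes No Forgetting Learning (NFL), a continual learning framework that needs no replay buffer, addressing the memory overhead and privacy concerns of storing exemplars. NFL decomposes the network into a shared backbone and task-specific heads and combines stepwise freezing, knowledge distillation, and dual soft-target anchoring to balance retaining knowledge of earlier tasks with learning new ones. NFL+ adds an under-complete auto-encoder to strengthen feature retention and correct class-imbalance bias, and NFL+LoRA extends the framework to pre-trained Vision Transformers through low-rank subspace updates with Fisher-weighted regularization, keeping backbone memory cost constant. Experiments show strong results on several datasets with only 2.53% of the model size of buffer-based methods.

Abstract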

Most Continual Learning (CL) methods maintain performance on earlier tasks by storing exemplars in a replay buffer, introducing memory overhead that scales with the number of tasks and raising privacy concerns in regulated domains. We propose No Forgetting Learning (NFL), a buffer-free framework for class- and task-incremental learning that instead exploits the inherent redundancy of overparameterized networks. NFL decomposes the network into a shared backbone and task-specific heads, then applies a stepwise freezing protocol: new capabilities are first isolated, shared representations are adapted under knowledge distillation, and all components are jointly refined with dual soft-target anchoring. NFL+ augments this pipeline with an under-complete auto-encoder that preserves informative features from previous tasks and corrects the prediction bias caused by class imbalance. NFL+LoRA further extends the framework to pre-trained Vision Transformers by confining updates to a low-rank subspace with Fisher-weighted regularization, maintaining constant backbone memory cost regardless of the number of tasks. On CIFAR-100, Tiny-ImageNet, and ImageNet-1000 across up to 50 incremental tasks, NFL+ outperforms all buffer-free baselines and matches memory-based methods while requiring only 2.53% of their model size. We also propose a Plasticity-Stability score for more balanced trade-off evaluation.

2503.02107 2026-05-11 cs.RO

Balancing Act: Trading Off Odometry and Map Registration for Efficient Lidar Localization

Katya M. Papais, Daniil Lisus, Cedric Le Gentil, David J. Yoon, Timothy D. Barfoot

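AI summary This paper studies how to balance accuracy and computational efficiency in lidar localization, presenting two ways of improving efficiency and analyzing their effect on performance. First, two lightweight odometry estimators are integrated into a topometric localization pipeline and compared with a state-of-the-art ICP method, exposing the speed-accuracy trade-offs among the approaches. Second, by controlling the frequency of localization updates and relying on odometry estimates between them, the authors show that computational efficiency can be improved while high accuracy is maintained. Experiments show that the approaches reduce computation while still achieving state-of-the-art localization accuracy.

Comments 8 pages

Abstract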

Most autonomous vehicles rely on accurate and efficient localization, which is achieved by comparing live sensor data to a preexisting map, to navigate their environment. Balancing the accuracy of localization with computational efficiency remains a significant challenge, as high-accuracy methods often come with higher computational costs. In this paper, we present two ways of improving lidar localization efficiency and study their impact on performance. First, we integrate two lightweight odometry estimators, a correspondence-free Doppler-inertial estimator and a low-cost wheel odometer-gyroscope (OG) method, into a topometric localization pipeline and compare them against a state-of-the-art (SOTA) iterative closest point (ICP) baseline. We highlight the trade-offs between these approaches: the Doppler and OG estimators offer faster, lightweight updates, while ICP provides higher accuracy at the cost of increased computational load. Second, by controlling the frequency of localization updates and leveraging odometry estimates between them, we demonstrate that accurate localization can be maintained while optimizing for computational efficiency using any of the presented methods. We evaluate these approaches using over 100 km of unique real-world driving data in different on-road environments. By varying the localization interval, we demonstrate that computational effort can be reduced by 27%, 80%, and 91% for the ICP, Doppler, and OG estimators, respectively, while maintaining SOTA accuracy.

2502.17500 2026-05-11 cs.LG cs.AI

Generalized Euler Logarithm and its Applications in Machine Learning: Natural Gradient, Backpropagation, Generalized EG, Mirror Descent and OLPS

Andrzej Cichocki

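AI summary This paper studies in depth the properties of the two-parameter generalized Euler logarithm and its inverse function, and connects them to a range of deformed exponential functions, providing a unifying kernel for generalized entropies and divergence measures. On the algorithmic side, the author applies the Euler logarithm to modern machine learning and optimization: generalized Exponentiated Gradient and Mirror Descent schemes based on the Euler logarithm, and a generalized cross-entropy loss for deep neural networks with exact backpropagation formulas and seamless integration with natural gradient descent. The work shows how tuning the two deformation parameters decouples tail robustness from local gradient shaping.

Comments 34 pages, preprint of Journal paper

Abstract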

This paper investigates in depth the fundamental properties of the two-parameter generalized Euler logarithm and its inverse, the associated deformed $(a,b)$-exponential function. We systematically clarify the parameter domains that guarantee monotonicity, concavity, and invertibility, derive series and integral representations, and provide explicit links to a broad class of one- and two-parameter deformations, including Tsallis, Kaniadakis, Schwämmle-Tsallis, Kaniadakis-Scarfone, and Tempesta-type logarithms and their inverse exponentials. In this way, the Euler $(a,b)$-logarithm is established as a unifying kernel for a wide family of generalized entropies and divergence measures. On the algorithmic side, we extend applications of the Euler logarithm to modern machine learning and optimization. We introduce generalized Exponentiated Gradient (GEG) and Mirror Descent (MD) schemes in which the Euler $(a,b)$-logarithm acts as a flexible link function in the underlying Bregman divergence. In addition, we propose an Euler-based Generalized Cross-Entropy (GCE) loss for deep neural networks, derive its exact backpropagation formulas, and detail its seamless integration with Fisher-Rao Natural Gradient (NG) descent. By isolating the Fisher Information Matrix (FIM) and developing a diagonal NG approximation, we demonstrate how the two deformation parameters successfully decouple tail robustness from local gradient shaping.
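
The abstract does not spell out the formula. As a hedged illustration, the two-parameter logarithm commonly attributed to Euler in the deformed-logarithm literature is assumed below; the paper's exact definition and parameter domain may differ. Under that assumption, the Tsallis and Kaniadakis logarithms named above fall out as special cases.

```latex
% Assumed standard form, for x > 0 and a \neq b
\ln_{a,b}(x) = \frac{x^{a} - x^{b}}{a - b}

% Special cases under that assumption
\ln_{1-q,\,0}(x) = \frac{x^{1-q} - 1}{1 - q}
  \quad \text{(Tsallis $q$-logarithm)}

\ln_{\kappa,\,-\kappa}(x) = \frac{x^{\kappa} - x^{-\kappa}}{2\kappa}
  \quad \text{(Kaniadakis $\kappa$-logarithm)}

\lim_{a,\,b \to 0} \ln_{a,b}(x) = \ln x
  \quad \text{(ordinary natural logarithm)}
```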

2501.09189 2026-05-11 cs.LG cs.DS

Testing Noise Assumptions of Learning Algorithms

Surbhi Goel, Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan

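AI summary This paper studies a fundamental question in computational learning theory: can one efficiently test whether training data satisfies the assumptions of a given noise model? The authors present an efficient algorithm for testing various noise assumptions on the training data, and extend the existing testable learning framework with a test-and-learn algorithm that satisfies specified conditions. They also establish a separation between testable learning and classical learning with noise: under random classification noise, testable learning requires super-polynomial time while classical learning is trivial.

Comments 45 pages, Best Paper Award at Reliable ML workshop at NeurIPS 2025, Accepted to COLT 2026

Abstract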

We pose a fundamental question in computational learning theory: can we efficiently test whether a training set satisfies the assumptions of a given noise model? This question has remained unaddressed despite decades of research on learning in the presence of noise. In this work, we show that this task is tractable and present the first efficient algorithm to test various noise assumptions on the training data. To model this question, we extend the recently proposed testable learning framework of Rubinfeld and Vasilyan (2023) and require a learner to run an associated test that satisfies the following two conditions: (1) whenever the test accepts, the learner outputs a classifier along with a certificate of optimality, and (2) the test must pass for any dataset drawn according to a specified modeling assumption on both the marginal distribution and the noise model. We then consider the problem of learning halfspaces over Gaussian marginals with Massart noise (where each label can be flipped with probability less than $1/2$ depending on the input features), and give a fully-polynomial time testable learning algorithm. We also show a separation between the classical setting of learning in the presence of structured noise and testable learning. In fact, for the simple case of random classification noise (where each label is flipped with fixed probability $η= 1/2$), we show that testable learning requires super-polynomial time while classical learning is trivial.
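
To make the noise model concrete, here is a minimal sketch of the data-generating process for a halfspace with Massart noise, as described in the abstract: each clean label is flipped with an input-dependent probability below 1/2. It only simulates the assumption; it is not the paper's tester or learner. The names `halfspace_label`, `massart_noisy_labels`, and `eta_fn` are hypothetical.

```python
import random
from typing import Callable, Sequence


def halfspace_label(w: Sequence[float], x: Sequence[float]) -> int:
    """Clean label of a homogeneous halfspace: sign(<w, x>) in {-1, +1}."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def massart_noisy_labels(
    xs: Sequence[Sequence[float]],
    w: Sequence[float],
    eta_fn: Callable[[Sequence[float]], float],
    rng: random.Random,
) -> list[int]:
    """Flip each clean label with probability eta_fn(x), which must lie in [0, 1/2)."""
    labels = []
    for x in xs:
        eta = eta_fn(x)
        if not 0.0 <= eta < 0.5:
            raise ValueError("Massart noise needs a flip probability in [0, 1/2)")
        clean = halfspace_label(w, x)
        labels.append(-clean if rng.random() < eta else clean)
    return labels
```

Drawing each `x` from `rng.gauss(0.0, 1.0)` per coordinate gives the Gaussian-marginal setting; a constant `eta_fn` reduces the flip rule to a fixed-rate noise model.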

2411.16748 2026-05-11 cs.CV

Multimodal Diffusion Transformer with Memory Bank for Scalable Long-Duration Talking Video Generation

Haojie Zhang, Zhihao Liang, Ruibo Fu, Bingyan Liu, Zhengqi Wen, Xuefei Liu, Jianhua Tao, Yaling Liang

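AI summary Long-duration talking video generation faces challenges in video quality, portrait consistency, temporal coherence, and computational efficiency. To address them, this paper proposes LetsTalk, a diffusion transformer framework that combines multimodal guidance with a novel memory bank mechanism to maintain contextual continuity, enabling high-quality, efficient, and robust generation of long-duration talking videos. The method introduces a noise-regularized memory bank to alleviate error accumulation and sampling artifacts, and uses a deep compression autoencoder and a spatiotemporal-aware transformer to improve efficiency and spatiotemporal modeling. Experiments show state-of-the-art results in generation quality, temporal consistency, and diversity.

Comments 16 pages, 25 figures

Abstract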

Long-duration talking video synthesis faces enduring challenges in achieving high video quality, portrait consistency, temporal coherence, and computational efficiency. As video length increases, issues such as visual degradation, portrait drift, temporal artifacts, and error accumulation become increasingly problematic, severely affecting the realism and reliability of the results. To address these challenges, we present LetsTalk, a diffusion transformer framework equipped with multimodal guidance and a novel memory bank mechanism, explicitly maintaining contextual continuity and enabling robust, high-quality, and efficient generation of long-duration talking videos. In particular, LetsTalk introduces a noise-regularized memory bank to alleviate error accumulation and sampling artifacts during extended video generation. To further improve efficiency and spatiotemporal modeling, LetsTalk employs a deep compression autoencoder and a spatiotemporal-aware transformer with linear attention for effective multimodal fusion. We systematically analyze three fusion schemes and show that combining deep fusion (Symbiotic Fusion) for portrait features with shallow fusion (Direct Fusion) for audio achieves superior visual realism and precise speech-driven motion, while preserving diversity of movements. Extensive experiments demonstrate that LetsTalk establishes a new state of the art in generation quality, producing temporally coherent and realistic talking videos with enhanced diversity and liveliness, and maintains remarkable efficiency with 8x fewer parameters than previous approaches.

2410.21438 2026-05-11 cs.CL cs.LG

UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function

Zhichao Wang, Bin Bi, Zixu Zhu, Xiangbo Mao, Jun Wang, Shiyu Wang, Cheng Wang, Dong Nie, Lingzi Hong

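AI summary This paper proposes Unified Fine-Tuning (UFT), which uses an implicit reward function to merge supervised fine-tuning (SFT) and alignment methods (such as RLHF and DPO) into a single training stage, addressing the task-performance degradation caused by conventional sequential fine-tuning. Experiments show that UFT outperforms SFT alone on instruction-tuning data, and that when alignment data is added it prevents degradation, with significant gains on instruction-following and factuality tasks. The method offers an efficient and general framework for LLM post-training.

Abstract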

By pretraining on trillions of tokens, an LLM gains the capability of text generation. However, to enhance its utility and reduce potential harm, SFT and alignment are applied sequentially to the pretrained model. Because SFT and alignment have different objectives and underlying processes, performance on certain tasks can decline. To address this, we introduce Unified Fine-Tuning (UFT), which seamlessly integrates SFT and alignment into a single training stage using the same objective and loss functions through an implicit reward function. Our experimental results demonstrate that UFT outperforms SFT on instruction-tuning data alone. Moreover, when combining instruction-tuning data with alignment data, UFT effectively prevents the degradation on some tasks across these two stages and shows a clear advantage over sequentially applying SFT and alignment. This is evident in the significant improvements observed in the ifeval task for instruction-following and the truthful task for factuality. The proposed general fine-tuning framework UFT establishes an effective and efficient paradigm for LLM post-training.

2410.18715 2026-05-11 cs.CV

ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval

Zijia Zhao, Longteng Guo, Tongtian Yue, Erdong Hu, Shuai Shao, Zehuan Yuan, Hua Huang, Jing Liu

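AI summary This paper studies general conversational image retrieval over open-domain images, where target images are retrieved through interactive human-computer conversation. The authors build ChatSearch, a retrieval dataset with multi-round multimodal conversational contexts, and propose ChatSearcher, a generative retrieval model trained end-to-end that handles interleaved image-text inputs and outputs, reasons over multimodal context, and draws on world knowledge for image retrieval. Experiments show strong performance on ChatSearch and on other related tasks, and the work is expected to advance research on interactive multimodal retrieval systems.

Journal ref
Pattern Recognition, 167 (2025) 111696
Abstract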

In this paper, we investigate the task of general conversational image retrieval on open-domain images. The objective is to search for images based on interactive conversations between humans and computers. To advance this task, we curate a dataset called ChatSearch. This dataset includes a multi-round multimodal conversational context query for each target image, thereby requiring the retrieval system to find the correct image in the database. Simultaneously, we propose a generative retrieval model named ChatSearcher, which is trained end-to-end to accept/produce interleaved image-text inputs/outputs. ChatSearcher exhibits strong capability in reasoning with multimodal context and can leverage world knowledge to yield visual retrieval results. It demonstrates superior performance on the ChatSearch dataset and also achieves competitive results on other image retrieval tasks and visual conversation tasks. We anticipate that this work will inspire further research on interactive multimodal retrieval systems. Our dataset will be available at https://github.com/joez17/ChatSearch.

2410.01308 2026-05-11 cs.LG cs.AI

How Hard Is It for Message-Passing GNNs to Simulate One Weisfeiler-Lehman Color-Refinement Step?

Guanyu Cui, Yuhe Guo, Zhewei Wei, Hsin-Hao Su

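AI summary This paper studies the cost for message-passing graph neural networks (MPGNNs) of simulating a Weisfeiler-Lehman (WL) color-refinement step, specifically on unattributed graphs. It distinguishes input-independent (oblivious) from instance-dependent simulation: oblivious simulation requires large depth or large messages in the worst case, while instance-dependent simulation can be shallower, although the required instance-specific parameters are not necessarily easy to find. The paper also shows that when the color set is large, bounded-error randomness can greatly reduce the cost, whereas a small color set forces a trade-off between the number of layers and the message size.

Abstract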

Message-passing graph neural networks (MPGNNs) are commonly compared with the Weisfeiler-Lehman (WL) color-refinement procedure, but this comparison does not quantify the resource parameters a network needs to realize color refinement with bounded-size messages and finite numerical precision. We study the cost of simulating a single color-refinement step on unattributed graphs. We distinguish input-independent, or oblivious, simulation from instance-dependent simulation. In the former, the parameters, or their distributions in randomized models, are fixed before the input instance is known. Our results show that the local form of WL color refinement hides a global relabeling problem. In the oblivious setting, deterministic and zero-error randomized MPGNNs cannot solve this problem in the worst case using only shallow networks with small messages. We complement this lower bound with a nearly matching construction in a stronger rooted, port-aware model. By contrast, when the color set is large, bounded-error randomness can greatly reduce the cost, and a one-layer MPGNN with messages of logarithmic size and a logarithmic number of random bits suffices. We show that this logarithmic number of random bits is essentially necessary for shallow, small-message simulations. When the color set is small, we still obtain a rooted, port-aware simulation, but this construction requires more layers or larger messages. We also prove that this extra cost is partly unavoidable, as small color sets force a nontrivial trade-off between the number of layers and the message size. Finally, instance-dependent simulation can be much shallower, but the required instance-specific parameters are not necessarily easy to find. Together, these results reveal quantitative structure hidden behind the statement that MPGNNs match WL color refinement.
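
For readers unfamiliar with the procedure being simulated, a single WL color-refinement step can be sketched in plain Python as follows. This is the classical combinatorial step, not an MPGNN construction from the paper; the function name `wl_refine` and the adjacency-dict graph encoding are assumptions made for illustration. The relabeling line shows where the "global relabeling problem" mentioned in the abstract arises: assigning compact ids requires knowing every signature in the graph.

```python
def wl_refine(adj: dict[int, list[int]], colors: dict[int, int]) -> dict[int, int]:
    """Perform one Weisfeiler-Lehman color-refinement step.

    adj maps each node to its neighbor list; colors maps each node to an
    integer color. Returns the refined coloring with compact integer ids.
    """
    # Local part: each node combines its own color with the multiset
    # (here a sorted tuple) of its neighbors' colors.
    signatures = {
        v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
        for v in adj
    }
    # Global part: distinct signatures are mapped to compact new color ids,
    # which needs a view of all signatures at once.
    palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
    return {v: palette[signatures[v]] for v in adj}
```

On a three-node path with uniform initial colors, the middle node (two neighbors) separates from the two endpoints after one step, while a triangle stays monochromatic.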

2408.15339 2026-05-11 cs.LG cs.CL

UNA: A Unified Supervised Framework for Efficient LLM Alignment Across Feedback Types

Zhichao Wang, Bin Bi, Can Huang, Shiva Kumar Pentyala, Zixu James Zhu, Sitaram Asur, Na Claire Cheng, Cheng Wan, Dong Nie, Lingzi Hong

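AI summary This paper proposes UNA, a unified supervised framework for efficiently aligning large language models (LLMs) across different types of feedback data. Through a generalized implicit reward function, the framework handles binary, pairwise, and score-based feedback, addressing the difficulty existing methods have in unifying heterogeneous supervision signals. Experiments on classical benchmarks show strong results, supporting the framework's effectiveness for model alignment.

Abstract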

RL alignment methods, including RLHF and DPO, are primarily based on pairwise preference data. Although scalar or score-based feedback has been collected in some settings, it is rarely used directly, and preference magnitude information is typically ignored. Furthermore, current alignment frameworks offer limited capability for unifying heterogeneous supervision signals, making it difficult to jointly leverage diverse data types within a single training paradigm. This limitation constrains the richness and scalability of the alignment process. To address this gap, we propose a UNified Alignment (UNA) framework capable of training across different types of feedback, including binary, pairwise, and score-based, through a generalized implicit reward function. The reward function is theoretically proved to be the optimal policy by the log sum inequality. Extensive experiments on classical benchmarks consistently demonstrate the advantage of the proposed unified framework with typical LLM base models.
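
As a rough illustration of what an implicit reward looks like, the sketch below uses the DPO-style form (a scaled log-ratio between policy and reference log-probabilities) together with the standard DPO pairwise loss. The abstract does not give UNA's exact objective, so the squared-error loss for score-based feedback, the default `beta`, and all function names here are assumptions for illustration only.

```python
import math


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """DPO-style implicit reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected)."""
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def score_loss(r_implicit: float, score: float) -> float:
    """Assumed score-based variant: squared error between reward and score."""
    return (r_implicit - score) ** 2
```

The point of the sketch is that one reward function feeds different losses depending on the feedback type, which is the kind of unification the abstract describes.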

2408.09929 2026-05-11 cs.LG cs.CV

Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise

Hongyuan Zhang, Yanchen Xu, Sida Huang, Xuelong Li

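AI summary This paper studies the relationship between data augmentation in contrastive learning and positive-incentive noise (π-noise). It converts the contrastive loss to an auxiliary Gaussian distribution to quantify the difficulty of a contrastive model, and defines task entropy, a core concept, for contrastive learning. Based on this theoretical analysis, the authors design a framework with a π-noise generator that learns beneficial data augmentations in place of conventional predefined ones. The method applies to many data types, is compatible with existing contrastive models, and experiments show that it learns effective augmentations.

Comments Accepted by ICML 2026

Abstract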

Inspired by the idea of Positive-incentive Noise (Pi-Noise or $π$-Noise) that aims at learning the reliable noise beneficial to tasks, we scientifically investigate the connection between contrastive learning and $π$-noise in this paper. By converting the contrastive loss to an auxiliary Gaussian distribution to quantitatively measure the difficulty of the specific contrastive model under the information theory framework, we properly define the task entropy, the core concept of $π$-noise, of contrastive learning. It is further proved that the predefined data augmentation in the standard contrastive learning paradigm can be regarded as a kind of point estimation of $π$-noise. Inspired by the theoretical study, a framework that develops a $π$-noise generator to learn the beneficial noise (instead of estimation) as data augmentations for contrast is proposed. The designed framework can be applied to diverse types of data and is also completely compatible with the existing contrastive models. From the visualization, we surprisingly find that the proposed method successfully learns effective augmentations. Our code is available at https://github.com/hyzhang98/PiNDA.

2407.04183 2026-05-11 cs.CL cs.AI cs.CY cs.HC

Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

Joshua Ashkinaze, Ruijia Guan, Laura Kurek, Eytan Adar, Ceren Budak, Eric Gilbert

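AI summary This study examines how large language models (LLMs) apply Wikipedia's Neutral Point of View (NPOV) policy, evaluating their ability to detect and correct biased edits. LLMs performed modestly on bias detection, reaching only 64% accuracy and showing prediction biases, but did better on generation, removing 79% of the words that Wikipedia editors removed. However, LLM edits often went beyond the editors' neutralizations, giving high recall but low precision. The study also finds that crowdworkers rated LLM rewrites as more neutral and fluent, but that the rewrites can introduce changes unrelated to neutrality, which may affect editor agency and moderation workload.

Comments Appeared at ICWSM 2026

Abstract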

Large language models (LLMs) are trained on broad corpora and then used in communities with specialized norms. Is providing LLMs with community rules enough for models to follow these norms? We evaluate LLMs' capacity to detect (Task 1) and correct (Task 2) biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy. LLMs struggled with bias detection, achieving only 64% accuracy on a balanced dataset. Models exhibited contrasting biases (some under- and others over-predicted bias), suggesting distinct priors about neutrality. LLMs performed better at generation, removing 79% of words removed by Wikipedia editors. However, LLMs made additional changes beyond Wikipedia editors' simpler neutralizations, resulting in high-recall but low-precision editing. Interestingly, crowdworkers rated AI rewrites as more neutral (70%) and fluent (61%) than Wikipedia-editor rewrites. Qualitative analysis found LLMs sometimes applied NPOV more comprehensively than Wikipedia editors but often made extraneous non-NPOV-related changes (such as grammar). LLMs may apply rules in ways that resonate with the public but diverge from community experts. While potentially effective for generation, LLMs may reduce editor agency and increase moderation workload (e.g., verifying additions). Even when rules are easy to articulate, having LLMs apply them like community members may still be difficult.
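
The "high-recall but low-precision" finding can be made concrete with a small sketch that treats the words an editor removed as the reference set and the words a model removed as predictions. The whitespace tokenization, lowercasing, and the names `removed_words` and `removal_precision_recall` are simplifying assumptions; the paper's actual evaluation procedure may differ.

```python
from collections import Counter


def removed_words(before: str, after: str) -> Counter:
    """Multiset of words present in `before` but missing from `after`."""
    return Counter(before.lower().split()) - Counter(after.lower().split())


def removal_precision_recall(
    original: str, editor_rewrite: str, model_rewrite: str
) -> tuple[float, float]:
    """Precision and recall of model word removals against editor removals."""
    editor = removed_words(original, editor_rewrite)
    model = removed_words(original, model_rewrite)
    overlap = sum((editor & model).values())
    precision = overlap / sum(model.values()) if model else 0.0
    recall = overlap / sum(editor.values()) if editor else 0.0
    return precision, recall
```

A model that deletes everything the editor deleted plus extra words scores perfect recall but reduced precision, which matches the pattern the abstract reports.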

2310.15288 2026-05-11 cs.AI cs.LG

Active teacher selection for reward learning

Rachel Freedman, Justin Svegliato, Kyle Wray, Stuart Russell

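AI summary This study addresses reward learning from feedback given by multiple teachers, proposing the Hidden Utility Bandit (HUB) framework to model differences in teacher rationality, expertise, and cost. It designs Active Teacher Selection (ATS) algorithms that choose which teacher to query and when, outperforming baselines, and validates the approach in two real-world settings: paper recommendation and COVID-19 vaccine testing. The main contributions are the HUB framework, the active-learning-based ATS algorithms, and the proof-of-concept applications to real-world problems.

Abstract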

Reward learning techniques enable machine learning systems to learn objectives from human feedback. A core limitation of these systems is their assumption that all feedback comes from a single human teacher, despite gathering feedback from large and heterogeneous populations. We propose the Hidden Utility Bandit (HUB) framework to model differences in teacher rationality, expertise, and costliness, formalizing the problem of learning from multiple teachers. We develop a variety of solution algorithms and apply them to two real-world domains: paper recommendation systems and COVID-19 vaccine testing. We find that Active Teacher Selection (ATS) algorithms outperform baselines by actively selecting when and which teacher to query. Our key contributions are 1) the HUB framework: a novel mathematical framework for modeling the teacher selection problem, 2) ATS: an active-learning based algorithmic approach that demonstrates the utility of modeling teacher heterogeneity, and 3) proof-of-concept application of the HUB framework and ATS approaches to model and solve multiple real-world problems with complex trade-offs between reward learning and optimization.

2305.18593 2026-05-11 cs.LG cs.AI

On Diffusion Modeling for Anomaly Detection

Victor Livernoche, Vineet Jain, Yashar Hezaveh, Siamak Ravanbakhsh

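AI summary This paper studies diffusion models for unsupervised and semi-supervised anomaly detection and proposes a new method, Diffusion Time Estimation (DTE). DTE estimates the distribution over diffusion time for a given input and uses its mean or mode as the anomaly score, which is far cheaper to compute than a conventional denoising diffusion probabilistic model (DDPM). Experiments show that DTE performs well on the ADBench benchmark with much faster inference than DDPM, offering an efficient and competitive diffusion-based approach to anomaly detection.

Journal ref
Proceedings of the International Conference on Learning Representations (ICLR 2024)
Abstract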

Known for their impressive performance in generative modeling, diffusion models are attractive candidates for density-based anomaly detection. This paper investigates different variations of diffusion modeling for unsupervised and semi-supervised anomaly detection. In particular, we find that Denoising Diffusion Probabilistic Models (DDPM) are performant on anomaly detection benchmarks yet computationally expensive. By simplifying DDPM in application to anomaly detection, we are naturally led to an alternative approach called Diffusion Time Estimation (DTE). DTE estimates the distribution over diffusion time for a given input and uses the mode or mean of this distribution as the anomaly score. We derive an analytical form for this density and leverage a deep neural network to improve inference efficiency. Through empirical evaluations on the ADBench benchmark, we demonstrate that all diffusion-based anomaly detection methods perform competitively for both semi-supervised and unsupervised settings. Notably, DTE achieves orders of magnitude faster inference time than DDPM, while outperforming it on this benchmark. These results establish diffusion-based anomaly detection as a scalable alternative to traditional methods and recent deep-learning techniques for standard unsupervised and semi-supervised anomaly detection settings.

2106.09636 2026-05-11 cs.LG

Multi-Stage Prototype Learning for Interpretable Time Series Classification

Bhavesh Kalisetti, Vincent Wang, Gaurav R. Ghosal, Maryam Bijanzadeh, Reza Abbasi-Asl

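AI summary This paper proposes a multi-stage prototype learning framework for interpretable multivariate time series classification. The method improves interpretability by identifying predictive temporal patterns both within individual variables and across variables. Experiments show that the model matches the classification accuracy of state-of-the-art methods while providing hierarchical prototype-based explanations that help in understanding the model's predictive mechanisms.

Abstract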

Deep learning methods are powerful tools in classifying multivariate time series data. Despite their high performance, these methods are hard to interpret, which diminishes their applications in high-risk domains such as healthcare. In this paper, we propose a novel multi-stage prototype learning framework for multivariate time series classification. By design, our framework identifies predictive temporal patterns in individual variables as well as cross-variable patterns that are highly predictive of each class. We validate our model on one simulated and four real-world datasets and demonstrate comparable accuracy to state-of-the-art methods while providing substantially improved interpretability through explicit, hierarchical prototype-based explanations. These explanations reveal both single-variable temporal patterns as well as cross-variable interactions that are most predictive for each class, providing insights into underlying mechanisms of the predictive model.

2605.08072 2026-05-11 stat.ML cs.DS cs.LG math.ST stat.TH

A Note on Non-Negative $L_1$-Approximating Polynomials

Jane H. Lee, Anay Mehrotra, Manolis Zampetakis

AI总结 本文研究了在高斯分布下具有非负性的 $L_1$-逼近多项式的存在性,这类多项式在逼近指示函数时不仅满足 $L_1$-范数误差要求,还保证输出非负。作者证明了对于具有有限高斯表面面积(GSA)的集合类,存在次数为 $\tilde{O}(Γ^2/\varepsilon^2)$ 的非负多项式,能够以 $\varepsilon$ 的误差逼近其指示函数。该结果在保持 $L_1$-逼近能力的同时,提供了更强的点态保证,并且与当前最优的无非负性约束的高斯 $L_1$-逼近多项式次数相差仅常数因子。

Abstract

$L_1$-Approximating polynomials, i.e., polynomials that approximate indicator functions in $L_1$-norm under certain distributions, are widely used in computational learning theory. We study the existence of \textit{non-negative} $L_1$-approximating polynomials with respect to Gaussian distributions. This is a stronger requirement than $L_1$-approximation but weaker than sandwiching polynomials (which themselves have many applications). These non-negative approximating polynomials have recently found uses in smoothed learning from positive-only examples. In this short note, we prove that every class of sets with Gaussian surface area (GSA) at most $\Gamma$ under the standard Gaussian admits degree-$k$ non-negative polynomials that $\varepsilon$-approximate its indicator functions in $L_1$-norm, for $k=\tilde{O}(\Gamma^2/\varepsilon^2)$. Equivalently, finite GSA implies $L_1$-approximation with the stronger pointwise guarantee that the approximating polynomial has range contained in $[0,\infty)$. Up to a constant factor, this matches the best currently known degree bound for Gaussian $L_1$-approximation without the non-negativity constraint.
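
Written out as formulas, the guarantee described in this abstract reads roughly as follows. The symbols $S$ (a set in the class), $p$ (the approximating polynomial), $\mathbf{1}_S$ (the indicator of $S$), and $d$ (the dimension) are notation introduced here for illustration and are not taken from the abstract.

```latex
\[
  \mathbb{E}_{x \sim \mathcal{N}(0, I_d)}
    \bigl[\, \lvert p(x) - \mathbf{1}_S(x) \rvert \,\bigr] \le \varepsilon,
  \qquad
  p(x) \ge 0 \ \text{ for all } x \in \mathbb{R}^d,
  \qquad
  \deg p \le k = \tilde{O}\!\left(\Gamma^2 / \varepsilon^2\right).
\]
```

The first condition is ordinary Gaussian $L_1$-approximation; the second is the additional pointwise non-negativity requirement.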

2605.08035 2026-05-11 eess.SP cs.LG

PropSplat: Map-Free RF Field Reconstruction via 3D Gaussian Propagation Splatting

William Bjorndahl, Maninder Pal Singh, Farhad Nouri, Joseph Camp

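AI summary PropSplat is a map-free wireless propagation modeling method that reconstructs radio frequency (RF) fields with 3D anisotropic Gaussian primitives and learns the propagation environment from sparse RF measurements. Each Gaussian encodes an offset relative to a baseline path loss model with a learnable path loss exponent, and the Gaussians are optimized without external information such as floor plans or terrain databases. On outdoor drive tests PropSplat achieves lower RMSE than existing wireless radiance field methods, and on indoor Bluetooth Low Energy data it achieves much lower localization error with comparable signal strength prediction accuracy, showing that accurate propagation modeling from sparse measurements is feasible.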

Comments Accepted for presentation at IEEE DySPAN 2026

Abstract

Building a site-specific propagation model typically requires either ray-tracing over detailed 3D maps or dense measurement campaigns. Both approaches are expensive and often infeasible for rapid deployments where geographic data is unavailable or outdated. We present PropSplat, a map-free propagation modeling method that reconstructs radio frequency (RF) fields using 3D anisotropic Gaussian primitives. Each Gaussian encodes a scalar path loss offset relative to an explicit baseline path loss model with a learnable path loss exponent. Gaussians are initialized along observed transmitter--receiver paths and optimized end-to-end to learn the propagation environment without external information like floor plans, terrain databases, or clutter data. We evaluate PropSplat against wireless radiance field methods NeRF$^2$, GSRF, and WRF-GS+ on two real-world datasets. On large-scale outdoor drive-tests spanning multiple topographical regions at six sub-6 GHz frequencies, PropSplat achieves 5.38 dB RMSE when training measurements are spaced 300 m apart and outperforms WRF-GS+ (5.87 dB), GSRF (7.46 dB), and NeRF$^2$ (14.76 dB). On indoor Bluetooth Low Energy measurements, PropSplat achieves 0.19 m mean localization error, an order of magnitude better than NeRF$^2$ (1.84 m), while achieving near-identical received signal strength prediction accuracy. These results show that accurate site-specific propagation reconstruction is achievable from sparse RF-native measurements, reducing the need for geographic data as a prerequisite for scalable RF environment modeling.
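
The idea of a baseline path loss model plus Gaussian offsets can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the standard log-distance model stands in for the baseline, offsets are averaged over points sampled on the straight transmitter-receiver segment, and the names `baseline_path_loss_db`, `gaussian_weight`, `predicted_path_loss_db`, and the default reference loss are hypothetical. The abstract does not specify how PropSplat combines the Gaussians.

```python
import math

def baseline_path_loss_db(distance_m, exponent, ref_loss_db=40.0, ref_distance_m=1.0):
    """Log-distance baseline: PL(d) = PL(d0) + 10 * n * log10(d / d0).

    `exponent` (n) is the quantity a method like this would learn;
    the reference values are illustrative defaults, not from the paper.
    """
    d = max(distance_m, ref_distance_m)
    return ref_loss_db + 10.0 * exponent * math.log10(d / ref_distance_m)

def gaussian_weight(point, mean, inv_cov):
    """Unnormalised anisotropic Gaussian exp(-0.5 * (x - mu)^T Sigma^-1 (x - mu))."""
    diff = [p - m for p, m in zip(point, mean)]
    quad = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(3) for j in range(3))
    return math.exp(-0.5 * quad)

def predicted_path_loss_db(tx, rx, exponent, gaussians, n_samples=32):
    """Baseline loss plus Gaussian offsets averaged along the TX-RX segment.

    Assumption: straight-line sampling with a plain average; each entry of
    `gaussians` is a dict with keys "mean", "inv_cov", and "offset_db".
    """
    loss = baseline_path_loss_db(math.dist(tx, rx), exponent)
    offset = 0.0
    for s in range(n_samples):
        t = (s + 0.5) / n_samples  # midpoint of the s-th sub-segment
        point = [a + t * (b - a) for a, b in zip(tx, rx)]
        for g in gaussians:
            offset += g["offset_db"] * gaussian_weight(point, g["mean"], g["inv_cov"])
    return loss + offset / n_samples
```

With no Gaussians the prediction reduces to the baseline, so a 100 m link with exponent 2 gives 80 dB under these defaults; a Gaussian with a positive `offset_db` placed on the path raises the predicted loss.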

2605.08034 2026-05-11 stat.ML cs.LG

Semiparametric Efficient Test for Interpretable Distributional Treatment Effects

Houssam Zenati, Arthur Gretton

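AI summary This work proposes DR-ME, a semiparametrically efficient test for detecting interpretable distributional treatment effects. From observational data, the method identifies where in the outcome distribution a treatment has an effect rather than reporting only a global difference. By learning key outcome locations and combining them with orthogonal doubly robust kernel features, it detects changes in tails, modes, and other aspects of the distribution. Experiments show good type-I error control and detection power, and the learned locations localize distributional treatment effects in a medical-imaging study.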

Abstract

Distributional treatment effects can be invisible to means: a treatment may preserve average outcomes while changing tails, modes, dispersion, or rare-event probabilities. Kernel tests can detect discrepancies between interventional outcome laws, but global tests do not reveal where the laws differ. We propose DR-ME, to our knowledge the first semiparametrically efficient finite-location test for interpretable distributional treatment effects. DR-ME evaluates an interventional kernel witness at learned outcome locations, returning causal-discrepancy coordinates rather than only a global rejection. From observational data, we derive orthogonal doubly robust kernel features whose centered oracle form is the canonical gradient of this finite witness. For fixed locations, we characterize the local testing limit: DR-ME is chi-square calibrated under the null, has noncentral chi-square local power, and uses the covariance whitening that optimizes local signal-to-noise for discrepancies visible through the selected coordinates. This efficient local-power geometry yields a principled location-learning criterion, with sample splitting preserving post-selection validity. Experiments show near-nominal type-I error, competitive power against global doubly robust kernel tests, and interpretable learned locations that localize distributional effects in a semi-synthetic medical-imaging study.

2605.08022 2026-05-11 cs.NE cs.AI cs.LG

Globally Optimal Training of Spiking Neural Networks via Parameter Reconstruction

Himanshu Udupi, Xiaocong Yang, ChengXiang Zhai

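AI summary This paper studies globally optimal training of spiking neural networks (SNNs). Because the spike function is non-differentiable, SNN training usually relies on surrogate gradients, which introduce approximation errors. The authors extend convexification theory for parallel feedforward threshold networks to parallel recurrent threshold networks, which include parallel SNNs as a special case, and build a parameter reconstruction training algorithm on that basis. Experiments show that the algorithm performs well and is robust across a range of tasks, suggesting a new direction for large-scale SNN training.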

Abstract

Spiking Neural Networks (SNNs) have been proposed as biologically plausible and energy-efficient alternatives to conventional Artificial Neural Networks (ANNs). However, the training of SNNs usually relies on surrogate gradients due to the non-differentiability of the spike function, introducing approximation errors that accumulate across layers. To address this challenge, we extend the work on convexification of parallel feedforward threshold networks to parallel recurrent threshold networks, which subsume parallel SNNs as a structured special case. Building on this theoretical framework, we propose a parameter reconstruction algorithm for SNN training that demonstrates consistent and significant advantages across various tasks, both as a standalone method and in combination with surrogate-gradient training. The ablations further demonstrate the data scalability and robustness to model configurations of our training algorithm, pointing toward its potential in large-scale SNN training.

2605.08006 2026-05-11 math.OC cs.LG stat.ML

Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems

Yiyang Shen, Yutian He, Weiran Wang, Qihang Lin

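AI summary This paper studies a class of bilevel optimization problems in which both the upper and lower levels have minimax structure, a setting that covers many emerging applications. Because existing methods do not directly handle a minimax lower-level problem, the authors propose penalty-based first-order methods that do not require the lower-level problem to be strongly convex. In the deterministic setting, the method finds an $ε$-KKT point with $\tilde{O}(ε^{-4})$ oracle complexity, which improves on the existing $\tilde{O}(ε^{-7})$ result for convex constrained lower-level problems. A complexity analysis is also given for the stochastic setting.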

Abstract

We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here. To address this gap, we develop penalty-based first-order methods for bilevel minimax optimization without requiring strong convexity of the lower-level problem. In the deterministic setting, we establish that the proposed method finds an $ε$-KKT point with $\tilde{O}(ε^{-4})$ oracle complexity. We further show that bilevel problems with convex constrained lower-level minimization can be reformulated as special cases of our framework via Lagrangian duality, leading to an $\tilde{O}(ε^{-4})$ complexity bound that improves upon the existing $\tilde{O}(ε^{-7})$ result. Finally, we extend our approach to the stochastic setting, where only stochastic gradient oracles are available, and prove that the proposed stochastic method finds a nearly $ε$-KKT point with $\tilde{O}(ε^{-9})$ oracle complexity.

2605.07987 2026-05-11 eess.IV cs.CV

Uncertainty Quantification for Cardiac Shape Reconstruction with Deep Signed Distance Functions via MCMC methods

Jan Verhülsdonk, Thomas Grandits, Francisco Sahli Costabal, Thomas Beiert, Simone Pezzuto, Alexander Effland

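AI summary This paper proposes a probabilistic framework for uncertainty-aware cardiac shape reconstruction based on Deep Signed Distance Functions (DeepSDFs) and Markov Chain Monte Carlo (MCMC) methods. Cardiac geometry is modeled implicitly by a neural network, which allows multi-surface reconstruction of the left and right ventricles, and Bayesian inference in the latent space yields both maximum a posteriori estimates and posterior-sampled reconstructions. Experiments on a public cardiac dataset show accurate reconstructions and well-calibrated uncertainty estimates.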

Abstract

Atlas-based approaches allow high-quality, patient-specific shape reconstructions of cardiac anatomy from sparse and/or noisy data such as point clouds. However, these methods are mainly prior-driven, so the impact of uncertainty can be large, limiting their clinical reliability. We propose a probabilistic framework for uncertainty-aware cardiac shape reconstruction that combines Deep Signed Distance Functions (DeepSDFs) with Markov Chain Monte Carlo (MCMC) sampling. Cardiac geometries are modeled implicitly as zero-level sets of a neural network conditioned on learned latent codes, enabling multi-surface reconstruction of the left and right ventricles. By interpreting the reconstruction loss as a log-likelihood, we perform Bayesian inference in the latent space to obtain both maximum a posteriori (MAP) and posterior-sampled reconstructions. Experiments on a public cardiac dataset show that our approach produces accurate reconstructions and well-calibrated uncertainty estimates.
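
Treating a reconstruction loss as a negative log-likelihood and sampling the latent code can be sketched with a random-walk Metropolis sampler. This is an illustration under stated assumptions, not the authors' method: the abstract does not name the sampler, the Gaussian prior and noise scale are chosen here for simplicity, and `toy_loss` is a hypothetical stand-in for a DeepSDF reconstruction loss evaluated on observed points.

```python
import math
import random

def log_posterior(z, loss_fn, noise_scale=0.1, prior_scale=1.0):
    """Unnormalised log posterior over a latent code z.

    Assumption: log-likelihood = -loss / (2 * sigma^2), with a zero-mean
    isotropic Gaussian prior on z.
    """
    log_lik = -loss_fn(z) / (2.0 * noise_scale ** 2)
    log_prior = -sum(v * v for v in z) / (2.0 * prior_scale ** 2)
    return log_lik + log_prior

def metropolis(loss_fn, z0, n_steps=2000, step=0.05, seed=0):
    """Random-walk Metropolis: returns the chain and the acceptance rate."""
    rng = random.Random(seed)
    z = list(z0)
    logp = log_posterior(z, loss_fn)
    samples, accepted = [], 0
    for _ in range(n_steps):
        proposal = [v + rng.gauss(0.0, step) for v in z]
        logp_new = log_posterior(proposal, loss_fn)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            z, logp = proposal, logp_new
            accepted += 1
        samples.append(list(z))
    return samples, accepted / n_steps

def toy_loss(z):
    """Hypothetical loss: squared distance of z to a fixed 2-D target code."""
    target = [0.3, -0.2]
    return sum((a - b) ** 2 for a, b in zip(z, target))
```

Calling `metropolis(toy_loss, [0.0, 0.0])` returns a list of latent samples whose spread gives a rough picture of posterior uncertainty. In a real pipeline each sample would be decoded to a surface, and the pointwise variation across decoded surfaces would serve as the uncertainty estimate.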

2605.07986 2026-05-11 cs.HC cs.AI cs.CY

Towards Apples to Apples for AI Evaluations: From Real-World Use Cases to Evaluation Scenarios

Yee-Yin Choong, Kristen Greene, Alice Qian, Meryem Marasli, Ziqi Yang, Sophia Chen, Laura Dabbish, Anand Rao, Hong Shen

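AI summary This paper addresses the common "apples-to-oranges" comparison problem in AI system evaluation and argues for more consistent evaluation through methodological transparency, operational grounding, and human-centered design principles. It describes a repeatable process that elicits high-level use cases from subject matter experts through a structured AI Use Case Worksheet and turns them into detailed evaluation scenarios. The approach is demonstrated in the financial services sector, where 107 concrete scenarios are generated from six representative AI use cases, with a multi-stage process combining LLM prompting and human review to keep the scenarios aligned with real-world needs.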

Comments 23 pages, 3 figures

Abstract

AI measurement science has a wide variety of methodologies and measurements for comparing AI systems, resulting in what often appear to be "apples-to-oranges" comparisons across AI evaluations. To move toward "apples-to-apples" comparisons in real-world AI evaluations, this work advocates for methodological transparency in evaluation scenarios, operational grounding, and human-centered design (HCD) principles. We propose a repeatable process for transforming high-level use cases into detailed scenarios by eliciting use cases from subject matter experts (SMEs) via a structured AI Use Case Worksheet with six key elements: use case, sector, user (direct and indirect), intended outcomes, expected impacts (positive and negative), and KPIs and metrics. We demonstrate the utility of the worksheet and process in the U.S. financial services sector. This paper reports on example high-level AI use cases identified by financial services sector SMEs: cyber defense enablement, developer productivity, financial crime aggregation, suspicious activity report (SAR) filing, credit memo generation, and internal call center support. These use cases are illustrative of the process, not exhaustive. Central to our work is a three-stage expansion pipeline combining LLM prompting with human reviews to generate 107 scenarios from the use cases elicited from SMEs. This process integrates iterative human reviews at every juncture to ensure operational grounding: for scenario titles and descriptions; for core scenario elements like users, benefits and risks, and metrics; and for scenario narratives and evaluation objectives. Human checkpoints ensure scenarios remain reflective of real-world usage and human needs. We describe a validation rubric to assess scenario quality. By defining key scenario components, this work supports a more consistent and meaningful paradigm for human-centered AI evaluations.

2605.07970 2026-05-11 math.ST cs.LG stat.TH

Linear Response Estimators for Singular Statistical Models

Chris Elliott, Daniel Murfet

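AI summary This paper studies how observables of a class of statistical models respond to perturbations of the data and defines "susceptibilities" to measure that response. The authors propose estimators for these susceptibilities and prove that the estimators are consistent and asymptotically unbiased as the number of data points tends to infinity. The work offers a theoretical basis and practical tools for understanding how sensitive complex statistical models are to changes in the data.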

Comments 24 pages, comments welcome!

Abstract

We define susceptibilities as a measure of the response of an observable quantity of a parameterized statistical model to a perturbation of the data for a general class of observables. We define estimators for these susceptibilities as statistics in a sequence of n data-points and prove that these estimators are consistent and asymptotically unbiased in the large n regime.

2605.07947 2026-05-11 cs.CE cs.AI cs.LG math.OC

Exploring the non-convexity in machine learning using quantum-inspired optimization

Kandula Eswara Sai Kumar, Parth Dhananjay Danve, Abhishek Chopra, Rut Lineswala

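AI summary This paper studies the difficulty of solving high-dimensional non-convex optimization problems in modern machine learning, particularly structure recovery in the presence of gross outliers. The authors propose a unified framework based on Quantum-Inspired Evolutionary Optimization (QIEO), which uses a probabilistic representation inspired by quantum superposition to keep a global view of the search space and avoid the local optima that trap gradient-based and greedy methods. Experiments on tasks such as sparse signal recovery and robust linear regression show that QIEO achieves higher structural fidelity, lower mean squared error, and greater robustness than existing state-of-the-art algorithms.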

Abstract

The escalating complexity of modern machine learning necessitates solving challenging non-convex optimization problems, particularly in high-dimensional regimes and scenarios contaminated by gross outliers. Traditional approaches, relying on convex relaxations or specialized local search heuristics, frequently succumb to suboptimal local minima and fail to recover the true underlying discrete structures. In this paper, we propose treating these non-convex challenges as a global search problem and introduce a unified framework based on Quantum-Inspired Evolutionary Optimization (QIEO). By leveraging a probabilistic representation inspired by quantum superposition, QIEO maintains a global view of the search space, enabling it to tunnel through local optima that trap conventional gradient-based and greedy solvers. We comprehensively evaluate QIEO across diverse non-convex applications, including sparse signal recovery (gene expression analysis and compressed sensing) and robust linear regression. Extensive benchmarking against state-of-the-art continuous solvers (ADAM, Differential Evolution), classical metaheuristics (Genetic Algorithms), and specialized non-convex algorithms (Iterative Hard Thresholding) demonstrates that QIEO consistently achieves superior structural fidelity, lower mean squared error, and enhanced robustness without support inflation. Our findings suggest that embracing a quantum-inspired global search provides a resilient, unified paradigm for overcoming the inherent intractability of discrete nonconvex machine learning landscapes.
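
A probabilistic, superposition-inspired representation over binary structures can be sketched with a simplified quantum-inspired evolutionary loop. The abstract does not describe QIEO's update rule, so the version below is an assumption modeled on generic Q-bit and rotation-style algorithms: each bit carries an angle `theta` with P(bit = 1) = sin^2(theta), candidate solutions are sampled from those probabilities, and the angles are rotated toward the best solution seen so far. The function name `qiea_minimize` and all parameters are hypothetical.

```python
import math
import random

def qiea_minimize(cost, n_bits, pop_size=20, n_gens=100, delta=0.05 * math.pi, seed=0):
    """Simplified quantum-inspired evolutionary search over bit strings.

    Assumption: a single shared angle vector with a fixed rotation step;
    this is not the QIEO algorithm from the paper.
    """
    rng = random.Random(seed)
    thetas = [math.pi / 4.0] * n_bits  # equal superposition: P(bit = 1) = 0.5
    lo, hi = 0.05, math.pi / 2.0 - 0.05  # clamp so no bit becomes fully frozen
    best_bits, best_cost = None, float("inf")
    for _ in range(n_gens):
        # "Observe" the probabilistic state pop_size times.
        for _ in range(pop_size):
            bits = [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]
            c = cost(bits)
            if c < best_cost:
                best_bits, best_cost = bits, c
        # Rotate each angle toward the corresponding bit of the best solution.
        for i, b in enumerate(best_bits):
            step = delta if b == 1 else -delta
            thetas[i] = min(max(thetas[i] + step, lo), hi)
    return best_bits, best_cost
```

For a sparse-recovery style problem, `cost` would map a bit string (a candidate support set) to the residual error of a least-squares fit restricted to that support. The clamping bounds keep a small probability of flipping every bit, which is the sketch's stand-in for maintaining a global view of the search space.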

2605.07908 2026-05-11 math.ST cs.AI cs.LG stat.TH

Statistical inference with belief functions: A survey

Fabio Cuzzolin

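AI summary This survey reviews statistical inference methods based on belief functions, focusing on how to learn a belief measure from statistical data when data are scarce. It reviews the most significant contributions in the area and summarizes the core methods and theoretical developments, which provide a mathematical framework for modeling uncertainty.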

Comments 9 pages, 0 figures

Abstract

Belief functions are a powerful and popular framework for the mathematical characterisation of uncertainty, in particular in situations in which lack of data renders learning a probability distribution for the problem impractical. The first step in a reasoning chain based on belief functions is inference: how to learn a belief measure from the available data. In this survey we focus, in particular, on making inference from statistical data, and review the most significant contributions in the area.

2605.07907 2026-05-11 stat.ML cs.CV cs.LG

Consistency Regularised Gradient Flows for Inverse Problems

Alessio Spagnoletti, Tim Y. J. Wang, Marcelo Pereyra, O. Deniz Akyildiz

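AI summary This paper proposes a consistency-regularised gradient-flow method for inverse problems. A unified Euclidean-Wasserstein-2 gradient-flow framework jointly performs posterior sampling and prompt optimization in the latent space, reducing computational cost and improving reconstruction quality. Combined with few-step latent text-to-image models, the method avoids backpropagation through autoencoders and substantially lowers the number of neural function evaluations. Experiments show state-of-the-art performance on several canonical imaging inverse problems.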

Abstract

Vision-Language Latent Diffusion Models (LDMs) (Rombach et al., 2022) provide powerful generative priors for inverse problems. However, existing LDM-based inverse solvers typically require a large number of neural function evaluations (NFEs) and backpropagation through large pretrained components, leading to substantial computational costs and, in some cases, degraded reconstruction quality. We propose a unified Euclidean-Wasserstein-2 gradient-flow framework that jointly performs posterior sampling and prompt optimization in the latent space through a single flow that aligns the prior and posterior with the observed data. Combined with few-step latent text-to-image models, this formulation enables low-NFE inference without backpropagation through autoencoders. Experiments across several canonical imaging inverse problems show that our method achieves state-of-the-art performance with significantly reduced computational cost.

2605.07896 2026-05-11 cs.CY cs.AI

What if AI systems weren't chatbots?

Sourojit Ghosh, Pranav Narayanan Venkit, Sanjana Gautam, Avijit Ghosh

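AI summary This paper examines the far-reaching effects of AI systems' growing reliance on chatbot interfaces, arguing that the paradigm is not neutral but a dominant sociotechnical configuration that reshapes social, economic, legal, and environmental systems. It analyzes how chatbots often fail to meet user needs in complex or high-stakes contexts and how they change patterns of work, learning, and decision-making, which can lead to deskilling, homogenization of knowledge, and shifting expectations of expertise. The paper also discusses societal effects of widespread chatbot adoption, including labor displacement, concentration of economic power, and rising environmental costs, and calls for rethinking the direction of AI development with an emphasis on pluralistic system design, task-specific tools, and institutional safeguards.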

Comments Accepted at The 2026 ACM Conference on Fairness, Accountability, and Transparency, June 25--28, 2026, Montreal, QC, Canada

Abstract

The rapid convergence of artificial intelligence (AI) toward conversational chatbot interfaces marks a critical moment for the industry. This paper argues that the chatbot paradigm is not a neutral interface choice, but a dominant sociotechnical configuration whose widespread adoption reshapes social, economic, legal, and environmental systems. We examine how treating AI primarily as conversational assistants has extensive structural downsides. We show how chatbot-based systems often fail to adequately meet user needs, particularly in complex or high-stakes contexts, while projecting confidence and authority. We further analyze how the normalization of chatbot-mediated interaction alters patterns of work, learning, and decision-making, contributing to deskilling, homogenization of knowledge, and shifting expectations of expertise. Finally, we examine broader societal effects, including labor displacement, concentration of economic power, and increased environmental costs driven by sustained investment in large-scale chatbot infrastructures. While acknowledging legitimate benefits, we argue that the current trajectory of AI development reflects specific value choices that prioritize conversational generality over domain specificity, accountability, and long-term social sustainability. We conclude by outlining alternative directions for AI development and governance that move beyond one-size-fits-all chatbots, emphasizing pluralistic system design, task-specific tools, and institutional safeguards to mitigate social and economic harm.