2606.06819 2026-06-08 cs.CV New submission

VideoSEG-O3: A Multi-turn Reinforcement Learning Framework for Reasoning Video Object Segmentation

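Ming Dai, Sen Yang, Boqiang Duan, Boyuan Tong, Jiedong Zhuang, Wankou Yang, Jingdong Wang

AI summary: Proposes VideoSEG-O3, described as the first multi-turn reinforcement learning framework for RVOS. It uses a multi-turn temporal-spatial chain-of-thought and SEG-aware logit calibration to perform coarse-to-fine reasoning video object segmentation, targeting precise pixel-level localization in complex videos.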

Comments ICML2026

Abstract

Reasoning Video Object Segmentation (RVOS) demands a sophisticated integration of temporal dynamics, spatial details, and linguistic reasoning to achieve precise pixel-level localization. Existing methods are limited to reasoning over fixed initial inputs and lack the capacity to actively acquire further visual evidence, which is often essential for resolving complex references in long or intricate videos. To address this, we propose \textbf{VideoSEG-O3}, the first multi-turn reinforcement learning framework for RVOS that emulates the human \textit{``coarse-to-fine''} cognitive process. It employs a \textit{multi-turn temporal-spatial chain-of-thought} to capture fine-grained details by iteratively pinpointing critical intervals and keyframes. Additionally, to enable the policy to perceive segmentation quality beyond mere text probability of \texttt{[SEG]} during the RL stage, we introduce \textit{SEG-aware logit calibration}, which integrates pixel-wise segmentation feedback directly into the token-level logits. Furthermore, we design a \textit{decoupled thinking trace} to hierarchically decompose the reasoning process into temporal, spatial, and linguistic dimensions, and construct \textbf{VTS-CoT}, a specialized cold-start dataset featuring comprehensive reasoning trajectories. The code and models will be released at https://github.com/Dmmm1997/VideoSEG-O3.

2606.06748 2026-06-08 cs.CL cs.AI cs.LG New submission

Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection

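Jianru Shen

AI summary: Proposes the Evidence Graph Consistency (EGC) framework, which builds a local evidence graph per response and computes five structural consistency measures to detect hallucinations. The consistency features point in opposite directions across model families, indicating that embedding-based graph consistency cannot serve as a model-independent detection signal.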

Comments Accepted at the International Conference on Advanced Machine Learning and Data Science; to appear in the IEEE Xplore proceedings

Abstract

Retrieval-Augmented Generation (RAG) reduces but does not eliminate hallucination in large language models. Existing detection methods rely on flat similarity between generated answers and retrieved passages, ignoring structural relationships among evidence pieces and answer claims. We propose Evidence Graph Consistency (EGC), a framework that constructs a local evidence graph per response and computes five structural consistency measures as hallucination indicators. Evaluated on the full question answering split of RAGTruth across six LLMs (5,767 responses), EGC reveals a consistent model-family split: graph consistency features show the expected diagnostic direction for hallucinations in Llama-2 models but exhibit systematic reversal in GPT-4, GPT-3.5, and Mistral-7B. This reversal suggests qualitatively different hallucination patterns across model families and indicates that embedding-based graph consistency cannot serve as a model-independent hallucination detection signal.

2606.06682 2026-06-08 cs.LG New submission

Spatiotemporal Imputation with Graph-Informed Flow Matching

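Zepeng Zhang, Aref Einizade, Jhony H. Giraldo, Olga Fink

AI summary: Proposes GiFlow, a framework for spatiotemporal imputation that uses a graph-informed prior and a hybrid vector field model, and reports outperforming existing methods.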

Comments Accepted at ICML 2026

Abstract

Missing data is a common challenge in spatiotemporal systems, arising in applications such as air quality monitoring and urban traffic management. Traditional machine learning approaches, like recurrent and graph neural networks, rely on iterative propagation, which tends to accumulate errors over time and space. Recent diffusion-based methods mitigate error propagation but require iterative sampling and often depend on problem-agnostic Gaussian priors, limiting both efficiency and effectiveness. To address these limitations, we propose GiFlow, a Graph-Informed Flow Matching framework for spatiotemporal imputation. GiFlow replaces the typical Gaussian prior with a graph-informed prior constructed via spatiotemporal filtering of observable signals, which better aligns the source distribution to the target and thereby simplifies the generation trajectory. The flow field is parameterized by a hybrid vector field model that integrates spatial attention, temporal attention, and spatiotemporal propagation, enabling joint modeling of spatial and temporal dependencies. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed GiFlow outperforms the state-of-the-art approaches in spatiotemporal imputation. The code is available at https://github.com/zepengzhang/GiFlow.
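
The idea of replacing a Gaussian source with a graph-informed one can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the neighbour-averaging filter is a simple stand-in for the paper's spatiotemporal filtering, the linear interpolation path is the common conditional flow matching choice rather than a confirmed detail of GiFlow, and the function names are hypothetical.

```python
import numpy as np

def graph_informed_source(x_obs, mask, A, noise_std=0.1, rng=None):
    """Build a source sample x0 by filling missing entries with the mean of
    observed graph neighbours (assumed stand-in for the paper's filter).

    x_obs: (N, T) signal with arbitrary values at missing entries
    mask:  (N, T) 1 where observed, 0 where missing
    A:     (N, N) adjacency matrix
    """
    rng = np.random.default_rng() if rng is None else rng
    x_obs = np.asarray(x_obs, dtype=float)
    mask = np.asarray(mask, dtype=float)
    A_hat = np.asarray(A, dtype=float) + np.eye(A.shape[0])  # add self-loops
    num = A_hat @ (x_obs * mask)   # sum of observed neighbour values
    den = A_hat @ mask             # number of observed neighbours
    smooth = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    x0 = np.where(mask > 0, x_obs, smooth)
    return x0 + noise_std * rng.standard_normal(x0.shape)

def flow_matching_pair(x0, x1, t):
    """Linear path x_t = (1 - t) x0 + t x1 and its target velocity x1 - x0."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

# Path graph 0 - 1 - 2 with node 1 missing at a single time step.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
x_obs = np.array([[1.0], [0.0], [3.0]])
mask = np.array([[1], [0], [1]])
x1 = np.array([[1.0], [2.5], [3.0]])   # hypothetical ground truth

x0 = graph_informed_source(x_obs, mask, A, noise_std=0.0,
                           rng=np.random.default_rng(0))
x_t, v_target = flow_matching_pair(x0, x1, t=0.5)
```

In this toy case the source already sits close to the target, so the velocity a vector field model would have to regress is small, which is one way to read the abstract's claim that an informed prior simplifies the generation trajectory.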

2606.06663 2026-06-08 cs.LG New submission

Explainable Runtime Dependency Tracking for AI-RAN Conflict Monitoring

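Christie Djidjev, Nicholas Kaminski

AI summary: Addresses the possibility that parameter-KPI dependencies in AI-RAN become invalid at runtime, and proposes a Boolean-matrix sliding-window inference method that tracks dependencies in a lightweight, explainable way by checking consistency against the event stream.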

Abstract

Future AI-integrated Radio Access Networks (AI-RAN) will combine open programmability with learning-enabled xApps, rApps, and control functions that act on shared parameters and key performance indicators (KPIs). For conflict monitoring, it is not enough to know which applications are deployed; the system must also know whether the parameter--KPI dependencies assumed by runtime diagnosis remain valid under the current operating regime. This paper studies a lightweight monitoring primitive for that purpose: tracking an interpretable dependency representation from streaming telemetry events. We represent active dependencies by a Boolean matrix and use Boolean matrix multiplication to check whether recent parameter-activity and KPI-response events are consistent with the current estimate. We propose a sliding-window inference procedure that reuses the estimate when it remains consistent and recomputes it when recent observations indicate structural change. The tracker is intended as an explainable signal for conflict diagnosis and slow-loop model refresh, not as an autonomous mitigation mechanism. Experiments on controlled Boolean event streams show efficient and accurate tracking under dependency changes and Boolean observation noise.
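
The reuse-or-recompute logic described in this abstract can be sketched in a few lines. This is a sketch under stated assumptions, not the authors' procedure: the re-estimation rule (keep a dependency unless some event contradicts it) and all function names are hypothetical, and the sketch ignores observation noise, which the paper does address.

```python
import numpy as np

def bool_matmul(A, B):
    """Boolean matrix product: out[i, k] = OR_j (A[i, j] AND B[j, k])."""
    # int64 avoids the wrap-around that uint8 sums would hit on long windows.
    return (A.astype(np.int64) @ B.astype(np.int64)) > 0

def is_consistent(D, P, R):
    """True if D explains every KPI response in the window exactly."""
    return bool(np.array_equal(bool_matmul(D, P), R))

def reestimate(P, R):
    """Assumed estimator: keep edge (i, j) unless parameter j was active
    in some event where KPI i did not respond."""
    contradicted = bool_matmul(~R, P.T)
    return ~contradicted

def track(D, P, R):
    """Reuse D while it stays consistent; otherwise recompute from the window.
    Returns (estimate, recomputed_flag)."""
    if is_consistent(D, P, R):
        return D, False
    return reestimate(P, R), True

# Columns are events in the current window; rows of P are parameters,
# rows of R are KPIs. D[i, j] means "KPI i depends on parameter j".
P = np.array([[1, 0, 0, 1],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=bool)
D_old = np.array([[1, 0, 0],
                  [0, 1, 1]], dtype=bool)
D_new = np.array([[1, 1, 0],
                  [0, 0, 1]], dtype=bool)

R_stable = bool_matmul(D_old, P)    # responses generated by the old structure
R_shifted = bool_matmul(D_new, P)   # responses after a structural change

D1, changed1 = track(D_old, P, R_stable)
D2, changed2 = track(D_old, P, R_shifted)
```

The first call keeps the existing estimate, and the second detects the inconsistency and recovers the new structure from the window alone. Because the estimate is a plain Boolean matrix, each entry can be read directly as a dependency, which matches the explainability goal stated in the abstract.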

2606.06572 2026-06-08 cs.LG cs.AI cs.CY econ.GN q-fin.EC New submission

Generative Models Erode Human Temporal Learning Through Market Selection

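Wenjun Cao

AI summary: Argues that modern generative models, at sub-AGI capability levels, erode human temporal learning through a market-selection mechanism. Proposes a value-collapse pathway, formalizes it with a costly-verification framework, and presents cross-domain evidence for four stages of verification erosion.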

Comments Accepted at ICML 2026

Journal ref Forty-third International Conference on Machine Learning Position Paper Track (2026)

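Unregistered Spectral Image Fusion: Unmixing, Adversarial Learning, and Recoverability

Jiahui Song, Sagar Shrestha, Xiao Fu

AI summary: Proposes an unsupervised framework that simultaneously super-resolves unregistered hyperspectral and multispectral images by combining coupled spectral unmixing with latent-space adversarial learning, and establishes what the authors describe as the first theoretical recoverability guarantees for this setting.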

Abstract

This paper addresses the fusion of a pair of spatially unregistered hyperspectral image (HSI) and multispectral image (MSI) covering roughly overlapping regions. HSIs offer high spectral but low spatial resolution, while MSIs provide the opposite. The goal is to integrate their complementary information to enhance both HSI spatial resolution and MSI spectral resolution. While hyperspectral-multispectral fusion (HMF) has been widely studied, the unregistered setting remains challenging. Many existing methods focus solely on MSI super-resolution, leaving HSI unchanged. Supervised deep learning approaches were proposed for HSI super-resolution, but rely on accurate training data, which is often unavailable. Moreover, theoretical analyses largely address the co-registered case, leaving unregistered HMF poorly understood. In this work, an unsupervised framework is proposed to simultaneously super-resolve both MSI and HSI. The method integrates coupled spectral unmixing for MSI super-resolution with latent-space adversarial learning for HSI super-resolution. Theoretical guarantees on the recoverability of the super-resolution MSI and HSI are established under reasonable generative models -- providing, to our best knowledge, the first such insights for unregistered HMF. The approach is validated on semi-real and real HSI-MSI pairs across diverse conditions.

2303.11949 2026-06-08 cs.NE cs.LG

A fuzzy adaptive evolutionary-based feature selection and machine learning framework for single and multi-objective body fat prediction

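Farshid Keivanian, Raymond Chiong, Zongwen Fan

AI summary: Proposes a feature selection and machine learning framework that combines fuzzy set theory with evolutionary algorithms to improve the accuracy and stability of body fat prediction while handling conflicting objectives in multi-objective optimization.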

Comments Due to unforeseen challenges in coordination and supervision, including unavoidable delays, this study requires further review and refinement. To ensure it meets necessary academic and methodological standards, we have decided to withdraw the paper. We appreciate the understanding of the research community

Journal ref Neurocomputing, Article 132974, 2026

DOI: 10.1016/j.neucom.2026.132974

Abstract

Predicting body fat can provide medical practitioners and users with essential information for preventing and diagnosing heart diseases. Hybrid machine learning models offer better performance than simple regression analysis methods by selecting relevant body measurements and capturing complex nonlinear relationships among selected features in modelling body fat prediction problems. There are, however, some disadvantages to them. Modelling body fat prediction as a combinatorial single- and multi-objective optimisation problem often gets stuck in local optima. When multiple feature subsets produce similar or close predictions, avoiding local optima becomes more complex. Evolutionary feature selection has been used to solve several machine-learning-based optimisation problems. Fuzzy set theory determines appropriate levels of exploration and exploitation while managing parameterisation and computational costs. A weighted-sum body fat prediction approach was explored using evolutionary feature selection, fuzzy set theory, and machine learning algorithms, integrating contradictory metrics into a single composite goal optimised by fuzzy adaptive evolutionary feature selection. Hybrid fuzzy adaptive global learning local search universal diversity-based feature selection is applied to this single-objective feature selection-machine learning framework (FAGLSUD-based FS-ML). While using fewer features, this model achieved a more accurate and stable estimate of body fat percentage than other hybrid and state-of-the-art machine learning models. A multi-objective FAGLSUD-based FS-MLP is also proposed to analyse accuracy, stability, and dimensionality conflicts simultaneously. To make informed decisions about fat deposits in the most vital body parts and blood lipid levels, medical practitioners and users can use a well-distributed Pareto set of trade-off solutions.

2606.07469 2026-06-08 econ.EM cs.NA econ.TH math.NA math.PR New submission

Statistical and Numerical Convergence in Stochastic Equilibrium

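David Staines

AI summary: Building on the rigorous stochastic equilibrium theory of SELCKE, finds that the system converges geometrically to long-run equilibrium at a rate given by the greater of (i) the eigenvalue or inverse eigenvalue closest to the unit circle and (ii) the maximum shock persistence, and develops a simulation procedure to test whether stochastic equilibrium exists.

Comments 91 Pages: 63 Main Text, 28 Supplementary Materials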

Abstract

This paper sets out the most general computational and econometric implications of the rigorous stochastic equilibrium theory from SELCKE (Staines (2024a)) arXiv:2312.16214. The analytical backbone is the discovery that the system converges geometrically to long-run equilibrium, at a rate given by the greater of the eigenvalue or inverse eigenvalue (from outside) closest to the unit circle and the maximum shock persistence. High-order shocks converge faster. I develop a simulation procedure to test, with asymptotic power, whether stochastic equilibrium exists for a particular model. The fundamental approximation result asserts that, whatever the order of expansion or loss function, the stochastic steady state delivers the most accurate perturbation solution. I also show that super-consistent parameter estimators $O(1/T)$ arise whenever second-order terms vanish. Besides Calvo, I study stochastic equilibrium in two alternative pricing models. Dynamics simplify considerably. I bound the time the impulse response peaks, by the maximum lag in the errors. This lends empirical support to Taylor contracts, although there are issues surrounding unit roots and the strong cost-channel. For menu costs, I demonstrate that the initial price distribution decays away super-exponentially, producing a system equivalent to Calvo with an endogenous reset probability. The impact of idiosyncratic disturbances appears as an additional wedge between actual and efficient output. Blow-up of the objective function at the boundary is proven, with the help of new distributional arguments, so the model meets existing eigenvalue existence conditions for the recursive equilibrium. Along the way, new light is shone on existing theoretical models and statistical procedures.

2606.07049 2026-06-08 econ.EM New submission

CausalAlpha: A Real-Time Geopolitical Risk Index from OSINT Channels for Causal Discovery in Financial Markets

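Andres Azqueta-Gavaldon, Borja Ureta

AI summary: Proposes the CausalAlpha framework, which builds a high-frequency geopolitical risk index from Telegram OSINT channels, uses the PC algorithm to recover the directed causal structure between geopolitical uncertainty and financial variables, and identifies political instability and energy media coverage as causal antecedents of conflict coverage.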

Abstract

We introduce CausalAlpha, an open-source framework that constructs a high-frequency Geopolitical Risk (GPR) index from Telegram OSINT channels using natural language processing, and applies causal discovery methods to identify the directed causal structure between geopolitical uncertainty and financial market variables. Unlike standard sentiment indices or Granger-causality approaches, CausalAlpha employs the Peter-Clark (PC) algorithm to recover the directed acyclic graph (DAG) of causal dependencies between five category-specific GPR indicators and a set of financial variables spanning commodity prices, equity indices, and credit instruments, estimated across four DAG specifications and three significance levels with 500 block-bootstrap resamples. Two findings emerge as globally robust across all DAG specifications at alpha = 0.10: political instability and energy media coverage independently and causally precede conflict coverage, establishing conflict as the primary causal sink of geopolitical narrative escalation in real-time OSINT channels. At the strictest significance level (alpha = 0.05), conflict coverage causally precedes energy sector equity returns (delta XLE), consistent with geopolitical escalation transmitting to energy markets. A Structural VAR on the core macro panel confirms that dynamic transmission from geopolitical NLP signals to financial market prices is statistically weak at daily frequency, suggesting that geopolitical news signals operate primarily within the media narrative system. The framework is deployed as a production application on Google Cloud Run with automated data collection and index construction, representing a step toward real-time macrofinancial risk monitoring using OSINT.
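
The robustness check over 500 block-bootstrap resamples can be sketched as follows. This is a minimal sketch assuming a moving block bootstrap with a fixed block length; the abstract does not state which block scheme or block length is used. `edge_test` is a hypothetical callable standing in for "run the PC algorithm on this resample and report whether a given edge appears", so no real causal discovery library is invoked here.

```python
import numpy as np

def moving_block_bootstrap(X, block_len, rng):
    """Resample a (T, d) series by concatenating randomly chosen contiguous
    blocks, which preserves short-range serial dependence within blocks."""
    T = X.shape[0]
    n_blocks = -(-T // block_len)  # ceiling division
    starts = rng.integers(0, T - block_len + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_len)[None, :]).ravel()[:T]
    return X[idx]

def edge_frequency(X, edge_test, n_resamples=500, block_len=10, seed=0):
    """Fraction of resamples in which edge_test(resample) is true."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_resamples):
        hits += bool(edge_test(moving_block_bootstrap(X, block_len, rng)))
    return hits / n_resamples

# Toy series 0..19 so that block structure is easy to inspect.
X = np.arange(20, dtype=float).reshape(20, 1)
resample = moving_block_bootstrap(X, block_len=5, rng=np.random.default_rng(1))
always = edge_frequency(X, lambda Z: True, n_resamples=10, block_len=5)
```

An edge that appears in a high fraction of resamples, across specifications, is the kind of finding the abstract calls globally robust. The threshold for "high" is a design choice that the abstract does not specify.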

2606.06638 2026-06-08 econ.EM New submission

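Audio Imitator: Controlling Timbre and Tempo in Video2Audio Synthesis with Audio Reference

Jiahui Zhao, Tianrui Wang, Chunyu Qiang, Cheng Gong, Xijuan Zeng, Feng Deng, Longbiao Wang

AI summary: Proposes the AudioIM framework, which models timbre and tempo separately with dual encoders for fine-grained style control, improving style similarity while preserving semantic consistency.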

Abstract

Video-to-audio generation has made significant progress in achieving semantic consistency and temporal alignment from silent videos. However, audio contains rich stylistic attributes such as timbre and tempo that are difficult to infer from visual and textual inputs alone. While reference audio can serve as additional conditioning, it is typically treated as a holistic signal, limiting fine-grained style control. We propose AudioIM, an attribute-aware framework that explicitly models timbre and tempo as separate control factors rather than relying on holistic prompt conditioning. Dual encoders extract complementary timbre-related and tempo-related representations, which are injected through global conditioning. A masking-based training strategy enables effective latent prompt conditioning at inference. Experiments on VGGSound show improved style similarity while preserving semantic alignment and synchronization. Audio samples are available at: https://anonymousdemo757.github.io/.

2606.07104 2026-06-08 eess.SP New submission

Robust Secure Beamforming for Movable Antenna Enhanced Integrated Sensing and Communications

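Yuan Chen, Ning Wei, Ahmad Bazzi, Xiangyu Dong, Ran Yang, You Li, Yue Xiu

AI summary: For imperfect eavesdropper channel state information, proposes a robust beamforming design that jointly optimizes transmit beamforming and antenna positions to maximize radar SINR while guaranteeing communication security, using a block coordinate descent algorithm combined with successive convex approximation and fractional programming.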

Abstract

In this letter, we investigate robust beamforming design for a movable antenna (MA)-enhanced secure integrated sensing and communications (ISAC) system with imperfect eavesdropping channel state information (CSI). To improve radar sensing performance, we formulate a radar signal-to-interference-plus-noise ratio (SINR) maximization problem by jointly optimizing the transmit beamforming and antenna placement while ensuring communication data security. However, the resulting optimization problem is inherently intractable due to the nonlinear mapping from antenna positions to channel coefficients, as well as the eavesdropper (Eve) channel uncertainty. To handle these challenges, we propose a block coordinate descent (BCD)-based algorithm incorporating successive convex approximation (SCA) and fractional programming (FP) techniques. Simulation results show that our proposed algorithm exhibits fast convergence and achieves a significant improvement in the radar SINR while guaranteeing communication security.

2606.07091 2026-06-08 eess.SP New submission

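Variable-Length Finite-Rate CSI Feedback With Generative Priors

Yangxuan Cheng, Fanyang Meng, Jian Zou, Jiacheng Xie, Zhongqiang Zhang, Ye Wang, Yongsheng Liang

AI summary: Proposes CsiCoGen, a variable-length CSI feedback structure based on a generative diffusion model. A transferable codebook gives flexible sequence length and quantization precision without joint training, reaching about -31 dB indoor and -20 dB outdoor NMSE on COST2100 in the high-rate regime.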

Abstract

This letter studies variable-length finite-rate CSI feedback from a structural perspective and proposes CsiCoGen, a novel generative feedback structure with a transferable codebook mechanism without joint training. The UE maps $H_0$ into an ordered sequence of codebook indices, while the BS recursively recovers CSI from any received partial sequence of feedback indices using a shared denoising prior. This enables flexible control of feedback sequence length and per-step quantization precision through codebook size. CsiCoGen does not require jointly training a task-specific feedback encoder or codebook with the reconstructor, and the same online structure can be paired with different pretrained denoisers. In this work, we instantiate the decoder with a generative diffusion model. Simulation results on COST2100 show favorable rate-NMSE and rate-$ρ$ tradeoffs against representative baselines, with CsiCoGen reaching about -31 dB indoor NMSE and -20 dB outdoor NMSE in the high-rate regime while demonstrating scalable decoding complexity and adjustable per-step quantization precision.
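
To put the reported NMSE figures in perspective, the sketch below computes NMSE in dB under the commonly used definition, the error energy divided by the channel energy. This definition is an assumption here: the letter may average per sample or normalize differently, and `nmse_db` is a hypothetical helper name.

```python
import numpy as np

def nmse_db(H, H_hat):
    """NMSE in dB: 10 * log10(||H - H_hat||^2 / ||H||^2)."""
    err = np.sum(np.abs(H - H_hat) ** 2)
    ref = np.sum(np.abs(H) ** 2)
    return 10.0 * np.log10(err / ref)

# A reconstruction that is off by 10% in amplitude everywhere.
H = np.ones(4, dtype=complex)
H_hat = 1.1 * H
value = nmse_db(H, H_hat)
```

Under this definition, -20 dB corresponds to a relative error energy of 1e-2, and -31 dB to roughly 7.9e-4.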

2606.06792 2026-06-08 eess.SP New submission

Copula Function Parameter Regions in Analyzing Wireless Communications Performances

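Mona Mohsenzadeh, Saeid Pakravan, Ghosheh Abed Hodtani

AI summary: Introduces the concept of copula dependence parameter regions and, using an FGM copula in a two-user MAC as an example, derives parameter regions from communication-theoretic and probabilistic perspectives, showing that practical requirements can significantly shrink the classical admissible interval.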

Abstract

Copula functions have been widely employed in wireless communication analysis to model dependence structures and evaluate system performance. However, existing studies generally express performance metrics in terms of copula dependence parameters without explicitly characterizing their admissible regions. This letter introduces the concept of copula dependence parameter regions and investigates its significance in wireless communications. Considering a two-user wireless multiple access channel (MAC) with correlated Rayleigh fading modeled by the bivariate Farlie--Gumbel--Morgenstern (FGM) copula, explicit parameter regions are derived from communication-theoretic and probabilistic perspectives using outage probability and Pearson correlation coefficient (PCC) constraints. The results show that practical communication and statistical requirements can significantly shrink the classical copula admissible interval, rendering some theoretically admissible dependence structures infeasible. Numerical examples illustrate the proposed concept and its practical implications.
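
How a requirement can shrink the classical admissible interval can be illustrated with the standard bivariate FGM copula, C(u, v) = uv[1 + theta(1 - u)(1 - v)] with theta in [-1, 1]. The letter's own outage and PCC constraints are not given in the abstract, so this sketch uses a stand-in: for the FGM copula, the Pearson correlation of the uniform-transformed variables (Spearman's rho) equals theta / 3. The function `theta_region` and the bounds passed to it are hypothetical.

```python
import numpy as np

def fgm_cdf(u, v, theta):
    """FGM copula C(u, v) = u v [1 + theta (1 - u)(1 - v)]."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def fgm_density(u, v, theta):
    """FGM copula density c(u, v) = 1 + theta (1 - 2u)(1 - 2v)."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)

def theta_region(rho_min, rho_max):
    """Intersect the classical interval [-1, 1] with the constraint
    rho_min <= theta / 3 <= rho_max. Returns None if the region is empty."""
    lo = max(-1.0, 3.0 * rho_min)
    hi = min(1.0, 3.0 * rho_max)
    return (lo, hi) if lo <= hi else None

# The density stays non-negative on the unit square at both ends of [-1, 1].
grid = np.linspace(0.0, 1.0, 51)
U, V = np.meshgrid(grid, grid)
min_density = min(fgm_density(U, V, -1.0).min(), fgm_density(U, V, 1.0).min())

region = theta_region(0.1, 0.2)      # a hypothetical correlation requirement
infeasible = theta_region(0.5, 0.6)  # beyond what FGM can represent
```

The second call returns `None` because the FGM family cannot reach a rank correlation above 1/3, a simple instance of a theoretically admissible parameter range becoming infeasible once a requirement is imposed.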

CAPE: Contrastive Action-conditioned Parallel Encoding for Embodied Planning

Extending Responsibility-Sensitive Safety for the Assessment of Offloaded Autonomous Driving Services

Tree-of-Experience: A Structured Experience-Management Solution for Self-Evolving Agents under Low-Repetition and Implicit-Reward Environments

VideoSEG-O3: A Multi-turn Reinforcement Learning Framework for Reasoning Video Object Segmentation

Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection

Spatiotemporal Imputation with Graph-Informed Flow Matching

Explainable Runtime Dependency Tracking for AI-RAN Conflict Monitoring

Generative Models Erode Human Temporal Learning Through Market Selection

Are you sure? A Comprehensive and Comprehensible Survey of Uncertainty Quantification in Symbolic Regression

CARVE-Q: Quantum-Proposed, Classically Certified Interactive Driving Repair

Topology-Aware Skeleton Detection via Lighthouse-Guided Structured Inference

Unregistered Spectral Image Fusion: Unmixing, Adversarial Learning, and Recoverability

A fuzzy adaptive evolutionary-based feature selection and machine learning framework for single and multi-objective body fat prediction

Statistical and Numerical Convergence in Stochastic Equilibrium

CausalAlpha: A Real-Time Geopolitical Risk Index from OSINT Channels for Causal Discovery in Financial Markets

Consistent estimation in logit models using historical choices as practical consideration set

CSI Phase Averaging for High-Sensitivity Wi-Fi Sensing in Low-Multipath Environments

Implementation and Calibration of 3GPP-Compliant ISAC Channel Simulator

RSMA Enabled Hierarchical UAV Networks with Non Linear Energy Harvesting: Outage Probability Analysis and UAV Placement Optimization

VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track

Audio Imitator: Controlling Timbre and Tempo in Video2Audio Synthesis with Audio Reference

Robust Secure Beamforming for Movable Antenna Enhanced Integrated Sensing and Communications

Rate-Splitting--Inspired Uplink Near-Field ISAC

Optimized Sampling of Angle-Resolved Scatterometry Data Using End-to-End Compressed Learning Model for Nanograss Deficiency Detection

A Novel Stripe-based RIS Optimization for UAV Communications and Sensing in Low-Altitude Wireless Networks

FSC-Net: Integrating Fast Fourier Convolutions and Progressive Learning for Speech Bandwidth Extension

Learn to Access and Backhaul the Sky: Multi-Scale Radio Map Guided Multi-UAV Cooperation

A 3D Formulation of the Extended Phaseless Rytov Approximation

Variable-Length Finite-Rate CSI Feedback With Generative Priors

Copula Function Parameter Regions in Analyzing Wireless Communications Performances