arXivDaily arXiv每日学术速递 周一至周五更新
重置
2605.00825 2026-05-04 cs.CV

Posterior Augmented Flow Matching

George Stoica, Sayak Paul, Matthew Wallingford, Vivek Ramanujan, Abhay Nori, Winson Han, Ali Farhadi, Ranjay Krishna, Judy Hoffman

详情
英文摘要

Flow matching (FM) trains a time-dependent vector field that transports samples from a simple prior to a complex data distribution. However, for high-dimensional images, each training sample supervises only a single trajectory and intermediate point, yielding an extremely sparse and high-variance training signal. This under-constrained supervision can cause flow collapse, where the learned dynamics memorize specific source-target pairings, mapping diverse inputs to overly similar outputs, failing to generalize. We introduce Posterior-Augmented Flow Matching (PAFM), a theoretically grounded generalization of FM that replaces single-target supervision with an expectation over an approximate posterior of valid target completions for a given intermediate state and condition. PAFM factorizes this intractable posterior into (i) the likelihood of the intermediate under a hypothesized endpoint and (ii) the prior probability of that endpoint under the condition, and uses an importance sampling scheme to construct a mixture over multiple candidate targets. We prove that PAFM yields an unbiased estimator of the original FM objective while substantially reducing gradient variance during training by aggregating information from many plausible continuation trajectories per intermediate. Finally, we show that PAFM improves over FM by up to 3.4 FID50K across different model scales (SiT-B/2 and SiT-XL/2), different architectures (SiT and MMDiT), and in both class and text conditioned benchmarks (ImageNet and CC12M), with a negligible increase in the compute overhead. Code: https://github.com/gstoica27/PAFM.git.

2605.00824 2026-05-04 cs.MM

CustomDancer: Customized Dance Recommendation by Text-Dance Retrieval

Yawen Qin, Ke Qiu, Qin Zhang

详情
英文摘要

Dance serves as both a cultural cornerstone and a medium for personal expression, yet the rapid growth of online dance content has made personalized discovery increasingly difficult. Text-based dance retrieval offers a natural interface for users to search with choreographic intent, but it remains underexplored because dance requires simultaneous reasoning over linguistic semantics, musical rhythm, and full-body motion dynamics. We introduce TD-Data, a large-scale open dataset for text-dance retrieval, containing about 4,000 12-second dance clips, 14.6 hours of motion, 22 genres, and annotations from professional dance experts. On top of this dataset, we propose CustomDancer, a multimodal retrieval framework that aligns text with dance through a CLIP-based text encoder, music and motion encoders, and a music-motion blending module. CustomDancer achieves state-of-the-art performance on TD-Data, reaching 10.23% Recall@1 and improving retrieval quality in both quantitative benchmarks and user preference studies.

2605.00823 2026-05-04 math.NA cs.NA

Reliability, Robustness, and Resilience Modeling for Surveillance System in Advanced Air Mobility Operations

Esrat Farhana Dulia, Caleb Adams, Syed Arbab Mohd Shihab, Ruben Del Rosario

详情
英文摘要

Ensuring the safe and efficient operation of Advanced Air Mobility (AAM) in low-altitude airspace requires a reliable, robust, and resilient surveillance system capable of continuously detecting, identifying, and tracking aircraft under both normal and off-nominal conditions. To address this need, this study develops a comprehensive 3R modeling framework, reliability, robustness, and resilience, for the Surveillance for Advanced Air Mobility (SAM) system, with a focus on the optimal design and operation of a multi-type sensor network. Under normal operating conditions, the reliability model determines the baseline sensor types, quantities, and locations required to satisfy surveillance coverage and detection requirements. To address external perturbations, such as adverse weather conditions or sudden increases in AAM traffic demand, the robustness model identifies additional sensor requirements needed to maintain system performance. Furthermore, for surveillance outages caused by primary sensor failures, the resiliency model develops backup sensor deployment and dispatch strategies to provide temporary surveillance coverage, minimize operational disruptions, and support the safe continuation of AAM operations.

2605.00820 2026-05-04 cs.CE cs.LG cs.NA math.NA

HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs

Jinpai Zhao, Nishant Panda, Yen Ting Lin, Eirik Valseth, Diane Oyen, Clint Dawson

详情
英文摘要

We introduce HyCOP, a modular framework that learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closures, boundary handling) in a query-conditioned way. Rather than learning a monolithic map, HyCOP learns a policy over short programs - which module to apply and for how long - conditioned on regime features and state statistics. Modules may be numerical sub-solvers or learned components, enabling hybrid surrogates evaluated at arbitrary query times without autoregressive rollout. Across diverse PDE benchmarks, HyCOP produces interpretable programs, delivers order-of-magnitude OOD improvements over monolithic neural operators, and supports modular transfer through dictionary updates (e.g., boundary swaps, residual enrichment). Our theory characterizes expressivity and gives an error decomposition that separates composition error from module error and doubles as a process-level diagnostic.

2605.00812 2026-05-04 cs.LO math.LO

Univalence without function extensionality

Evan Cavallo, Jonas Höfer

Comments 20 pages

详情
英文摘要

It is a well-known theorem of homotopy type theory, originally due to Voevodsky, that function extensionality holds inside any univalent universe. We consider a weaker variant of the univalence axiom, asserting that the wild category formed by the universe is univalent, which we call categorical univalence. We show that categorical univalence does not imply function extensionality by an analysis of Von Glehn's polynomial model construction, which produces models of Martin-Löf type theory that always refute function extensionality. We find in particular that when the base model has a univalent universe, its polynomial model has a universe that is categorically univalent but lacks function extensionality.

2605.00804 2026-05-04 cs.HC

Prop-Chromeleon: Adaptive Haptic Props in Mixed Reality through Generative Artificial Intelligence

Haoyu Wang, Fengyuan Zhu, Bingjian Huang, Zhecheng Wang, Ludwig Sidenmark

Comments Accepted to ACM DIS 2026

详情
英文摘要

Mixed Reality (MR) aims to blend digital and physical worlds, but the absence of haptic feedback often breaks visual-tactile consistency. We introduce Prop-Chromeleon, a MR system based on generative artificial intelligence (AI) that dynamically transforms everyday objects into adaptive passive haptic props through user-provided text prompts. Our AI pipeline performs generation and anchoring of virtual assets that align with the shape of physical props, allowing us to study how virtual content generation behaves under geometric and prompt-based constraints. We evaluate Prop-Chromeleon's effectiveness through a generation study using varied object shapes and user prompts, combining quantitative shape similarity metrics with qualitative prompt fidelity analysis. Our user study further showcases Prop-Chromeleon's improvements in perceived realism, immersion, and enjoyment compared to static baselines. These results show that shape-aware generation can support both believable haptic interaction and creative engagement in MR.

2605.00803 2026-05-04 cs.SE cs.AI cs.CL

Can Coding Agents Reproduce Findings in Computational Materials Science?

Ziyang Huang, Yi Cao, Ali K. Shargh, Jing Luo, Ruidong Mei, Mohd Zaki, Zhan Liu, Wyatt Bunstine, William Jurayj, Somdatta Goswami, Tyrel McQueen, Michael Shields, Jaafar El-Awady, Paulette Clancy, Benjamin Van Durme, Nicholas Andrews, William Walden, Daniel Khashabi

详情
英文摘要

Large language models are increasingly deployed as autonomous coding agents and have achieved remarkably strong performance on software engineering benchmarks. However, it is unclear whether such success transfers to computational scientific workflows, where tasks require not only strong coding ability, but also the ability to navigate complex, domain-specific procedures and to interpret results in the context of scientific claims. To address this question, we present AutoMat, a benchmark for evaluating LLM-based agents' ability to reproduce claims from computational materials science. AutoMat poses three interrelated challenges: recovering underspecified computational procedures, navigating specialized toolchains, and determining whether the resulting evidence supports a claim. By working closely with subject matter experts, we curate a set of claims from real materials science papers to test whether coding agents can recover and execute the end-to-end workflow needed to support (or undermine) such claims. We then evaluate multiple representative coding agent settings across several foundation models. Our results show that current LLM-based agents obtain low overall success rates on AutoMat, with the best-performing setting achieving a success rate of only 54.1%. Error analysis further reveals that agents perform worst when workflows must be reconstructed from paper text alone and that they fail primarily due to incomplete procedures, methodological deviations, and execution fragility. Taken together, these findings position AutoMat as both a benchmark for computational scientific reproducibility and a tool for diagnosing the current limitations of agentic systems in AI-for-science settings.

2605.00800 2026-05-04 cs.LG

Generating Statistical Charts with Validation-Driven LLM Workflows

Pavlin G. Poličar, Andraž Pevcin, Blaž Zupan

详情
英文摘要

Generating diverse, readable statistical charts from tabular data remains challenging for LLMs, as many failures become apparent after rendering and are not detectable from data or code alone. Existing chart datasets also rarely provide fully aligned artifacts, such as executable code, dataset context, and question-answer pairs. We present a structured LLM-based workflow that decomposes chart generation into dataset screening, plot proposal, code synthesis, rendering, validation-driven refinement, description generation, and question-answer generation. By incorporating rendered-output validation, the workflow addresses visualization-specific failure modes such as readability and semantic mismatch. It treats chart generation as an inspectable process rather than a one-shot prompt-to-code task, retaining each chart with its code, dataset context, description, and question-answer pairs. Applied to UCI datasets, the workflow produces 1,500 charts from 74 datasets, spanning 24 chart families and paired with 30,003 question-answer pairs. We evaluate 16 multimodal LLMs (MLLMs) on these chart-question pairs. The results show that chart-syntax questions are nearly saturated, while value extraction, comparison, and reasoning remain more challenging, illustrating the workflow's utility for diagnostic studies of chart-grounded multimodal reasoning.

2605.00799 2026-05-04 cs.CV

GMGaze: MoE-Based Context-Aware Gaze Estimation with CLIP and Multiscale Transformer

Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Sarah A. Alkhodair, Reem Kateb

Comments Accepted in KBS

详情
英文摘要

Gaze estimation methods commonly use facial appearances to predict the direction of a person gaze. However, previous studies show three major challenges with convolutional neural network (CNN)-based, transformer-based, and contrastive language-image pre-training (CLIP)-based methods, including late fusion of image features, lack of factor-aware conditioning, and impractical capacity scaling. To address these challenges, we propose Globally-conditioned Multi-scale Gaze estimation (GMGaze), which leverages a multi-scale transformer architecture. Specifically, the model first introduces semantic prototype conditioning, which modulates the CLIP global image embedding using four learned prototype banks (i.e., illumination, background, head pose and appearance) to generate two complementary context-biased global tokens. These tokens, along with the CLIP patch and CNN tokens, are fused at the first layer. This early unified fusion prevents information loss common in late-stage merging. Finally, each token passes through sparse Mixture-of-Experts modules, providing conditional computational capacity without uniformly increasing dense parameters. For cross-domain adaptation, we incorporate an adversarial domain adaptation technique with a feature separation loss that encourages the two global tokens to remain de-correlated. Experiments using four public benchmarks (MPIIFaceGaze, EYEDIAP, Gaze360, and ETH-XGaze) show that GMGaze achieves mean angular errors of 2.49$^\circ$, 3.22$^\circ$, 10.16$^\circ$, and 1.44$^\circ$, respectively, outperforming previous baselines in all within-domain settings. In cross-domain evaluations, it provides state-of-the-art (SOTA) results on two standard transfer routes.

2605.00798 2026-05-04 cs.LG cs.CL cs.MA

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution

Arunabh Srivastava, Mohammad A., Khojastepour, Srimat Chakradhar, Sennur Ulukus

详情
英文摘要

Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges the expressiveness of natural language with the determinism of programming via an agentic language with explicit control constructs (e.g., \texttt{IF}, \texttt{GOTO}, \texttt{FORALL}). Beyond verifying syntactic and semantic verification of the step output, which is performed based on the specific instruction of each step, RunAgent autonomously derives and validates constraints based on the description of the task and its instance at each step. RunAgent also dynamically selects among LLM-based reasoning, tool usage, and code generation and execution (e.g., in Python), and incorporates error correction mechanisms to ensure correctness. Finally, RunAgent filters the context history by retaining only relevant information during the execution of each step. Evaluations on Natural-plan and SciBench Datasets demonstrate that RunAgent outperforms baseline LLMs and state-of-the-art PlanGEN methods.

2605.00797 2026-05-04 cs.DS

A Faster Deterministic Algorithm for Fully Dynamic Maximal Matching

Julia Chuzhoy, Sanjeev Khanna, Junkai Song

详情
英文摘要

In the fully dynamic maximal matching problem, the goal is to maintain a maximal matching in a graph undergoing an online sequence of edge insertions and deletions. The problem has been studied extensively in the oblivious-adversary setting, where randomized algorithms with polylogarithmic worst-case and constant amortized update time have been known for some time. A major challenge in this area has been designing an algorithm with non-trivial update time against an adaptive adversary. In a recent breakthrough, Bernstein, Bhattacharya, Kiss, and Saranurak (STOC 2025; hereafter, BBKS25) obtained the first algorithms with sublinear update time for this setting: namely, a randomized algorithm with $\tilde{O}(n^{3/4})$ amortized update time, and a deterministic algorithm with $\tilde{O}(n^{8/9})$ amortized update time. Our main result is a deterministic algorithm for fully dynamic maximal matching with amortized update time $n^{1/2+o(1)}$. A powerful tool in dynamic matching is the use of matching sparsifiers: sparse subgraphs that preserve enough information to recover matchings with desired properties. Sparsifiers, such as the EDCS data structure, have been successfully used for approximate maximum matching. For maximal matching, however, this paradigm is not as natural, since maximality must hold with respect to the entire graph. Nevertheless, BBKS25 showed that EDCS can be repurposed as a verification-and-repair mechanism for fully dynamic maximal matching against adaptive adversaries. We introduce a new deterministic framework, referred to as the subgraph system, which, in contrast to EDCS, is purpose-built for verification and maintenance of maximality. It is also designed to allow efficient recursive refinements leading to stronger and stronger parameters, that yield our deterministic algorithm with $n^{1/2+o(1)}$ amortized update time.

2605.00796 2026-05-04 cs.CR cs.AI cs.CL

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

Alfredo Madrid-García, Miguel Rujas

详情
英文摘要

Background: Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded health information. AI-assisted development lowers the barrier to building them, but they still demand rigorous security, privacy, and governance controls. Objective: To report an anonymized, non-destructive security assessment of a publicly accessible patient-facing medical RAG chatbot and identify governance lessons for safe deployment of generative AI in health. Methods: We used a two-stage strategy. First, Claude Opus 4.6 supported exploratory prompt-based testing and structured vulnerability hypotheses. Second, candidate findings were manually verified using Chrome Developer Tools, inspecting browser-visible network traffic, payloads, API schemas, configuration objects, and stored interaction data. Results: The LLM-assisted phase identified a critical vulnerability: sensitive system and RAG configuration appeared exposed through client-server communication rather than restricted server-side. Manual verification confirmed that ordinary browser inspection allowed collection of the system prompt, model and embedding configuration, retrieval parameters, backend endpoints, API schema, document and chunk metadata, knowledge-base content, and the 1,000 most recent patient-chatbot conversations. The deployment also contradicted its privacy assurances: full conversation records, including health-related queries, were retrievable without authentication. Conclusions: Serious privacy and security failures in patient-facing RAG chatbots can be identified with standard browser tools, without specialist skills or authentication; independent review should be a prerequisite for deployment. Commercial LLMs accelerated this assessment, including under a false developer persona; assistance available to auditors is equally available to adversaries.

2605.00789 2026-05-04 cs.CV cs.AI cs.LG

Make Your LVLM KV Cache More Lightweight

Xihao Chen, Yangyang Guo, Roger Zimmermann

Comments Accepted to Transactions on Machine Learning Research (TMLR), 2026

详情
英文摘要

Key-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the prefill stage. To tackle this problem, we propose LightKV, a novel approach that reduces KV cache size by exploiting the redundancy among vision-token embeddings. Guided by text prompts, LightKV employs cross-modality message passing to aggregate informative messages across vision tokens and progressively compress them during prefill. This prompt-aware guidance distinguishes our method from prior vision-only compression strategies. We evaluate LightKV on eight open-source LVLMs across eight public benchmark datasets, e.g., MME and SeedBench. Experimental results demonstrate that with only 55% of the original vision tokens, LightKV (a) halves the vision-token KV cache size, (b) reduces computation by up to 40%, and (c) preserves general-purpose performance while significantly outperforming existing baselines.

2605.00788 2026-05-04 cs.CR

Repurposing Image Diffusion Models for Adversarial Synthetic Structured Data: A Case Study of Ground Truth Drift

Adam Arthur, Christopher Schwartz

Comments 2 figures

详情
英文摘要

Public image diffusion models are now powerful enough that an attacker without the resources to train a tabular-specific generator may repurpose one off the shelf. This study tests that possibility directly. An unmodified Stable Diffusion U-Net is applied to the UCI Adult Income dataset by reshaping each row into a small single-channel pseudo-image. The architecture's inductive bias toward spatial locality makes feature placement a design variable, and several layouts are tested. However, this is only the beginning of the story, as this paper also draws two philosophical distinctions. One separates statistical from perceptual realism: whether synthetic content holds up to a machine's correlation audits or a human's sensory inspection. The other introduces synthetic evidence as a category alongside synthetic media: AI-generated material whose consumer is a machine in a closed evidentiary pipeline rather than a person in an open information system. An attacker succeeds with synthetic evidence by thinking like the machine that will receive it. And the more the attacker succeeds, the more they can induce ground truth drift: the silent reclassification of AI-generated outputs as authentic when reused in pipelines that do not interrogate their provenance.

2605.00787 2026-05-04 cs.LG

SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control

Stavros Orfanoudakis, Pedro P. Vergara

Comments Reinforcement Learning

详情
英文摘要

While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy updates directly in the action space. To bridge this gap, a geometry-aware RL algorithm that explicitly incorporates value-based similarity into the policy update, State-Action Value Geometry Optimization (SAVGO), is proposed. In detail, SAVGO learns a joint state-action embedding space in which pairs with similar action-value estimates exhibit high cosine similarity, while dissimilar pairs are mapped to distinct directions. This learned geometry enables the generation of a similarity kernel over candidate actions sampled at each update, allowing policy improvement to be guided directly toward higher-value regions beyond local gradient-based updates. As a result, representation learning, value estimation, and policy optimization are unified within a single geometry-consistent objective, while preserving the scalability of off-policy actor-critic training. The proposed method is evaluated on standard MuJoCo continuous-control benchmarks, demonstrating improvements over strong baselines on challenging high-dimensional tasks. Ablation studies are done to analyze the contributions of value-geometry learning and similarity-based policy updates.

2605.00782 2026-05-04 cs.SE cs.AI

GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair

Yinhao Xiao, Rongbo Xiao, Yihan Zhang

详情
英文摘要

Reliable spatial analysis in GIScience requires preserving coordinate semantics, topology, units, and geographic plausibility. Current LLM-based GIS systems generate fluent scripts but rarely enforce these geographic rules at scale. We present GeoContra, a verification and repair framework for LLM-driven Python GIS workflows. It represents each task as an executable geospatial contract-including natural-language questions, schemas, CRS metadata, expected outputs, spatial predicates, topology, metrics, required operations, and forbidden shortcuts. Generated programs undergo static rule inspection, runtime validation, and semantic verification, with violations fed back into a bounded repair loop. Evaluated on 7,079 real geospatial tasks across 15 Boston-area zones, 9 task families, and 11 open-source models (600 runs each), GeoContra improves spatial correctness on closed models from 47.6% to 77.5% for DeepSeek-V4 and from 57.7% to 81.5% for Kimi-K2.5. Across 11 open models, average correctness rises by 26.6%. GeoContra turns fluent code production into verifiable spatial analysis, catching negative travel times, CRS/field-schema violations, missing predicates, and brittle output casts that otherwise yield executable but geographically invalid results.

2605.00781 2026-05-04 cs.CV

Map2World: Segment Map Conditioned Text to 3D World Generation

Jaeyoung Chung, Suyoung Lee, Jianfeng Xiang, Jiaolong Yang, Kyoung Mu Lee

Comments project page: https://robot0321.github.io/Map2World/index.html

详情
英文摘要

3D world generation is essential for applications such as immersive content creation or autonomous driving simulation. Recent advances in 3D world generation have shown promising results; however, these methods are constrained by grid layouts and suffer from inconsistencies in object scale throughout the entire world. In this work, we introduce a novel framework, Map2World, that first enables 3D world generation conditioned on user-defined segment maps of arbitrary shapes and scales, ensuring global-scale consistency and flexibility across expansive environments. To further enhance the quality, we propose a detail enhancer network that generates fine details of the world. The detail enhancer enables the addition of fine-grained details without compromising overall scene coherence by incorporating global structure information. We design the entire pipeline to leverage strong priors from asset generators, achieving robust generalization across diverse domains, even under limited training data for scene generation. Extensive experiments demonstrate that our method significantly outperforms existing approaches in user-controllability, scale consistency, and content coherence, enabling users to generate 3D worlds under more complex conditions.

2605.00778 2026-05-04 cs.LG q-bio.NC

Observable Performance Does Not Fully Reflect System Organization: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint

Jacques Raynal, Pierre Slangen, Jacques Margerit

Comments 1 table, 4 figures. Exploratory single-case study

详情
英文摘要

In biomechanical systems, observable performance is often used as a proxy for underlying system organization. However, this assumption implicitly presumes a correspondence between output metrics and internal system states that may not hold in adaptive systems. In this study, the vertical dimension of occlusion (VDO) is considered as a constraint applied to an adaptive neuromechanical system, enabling the exploration of system-level responses under controlled variations. A single-case design in a patient with Parkinson's disease allows an intra-individual analysis across repeated conditions.The analysis is structured across three complementary levels: (i) aggregated linear metrics describing observable performance, (ii) a dynamical systems framework describing temporal organization in state space, and (iii) a latent space representation obtained through unsupervised embedding. The results show that conditions with comparable observable performance may correspond to different organizations in both state space and latent space representations. This dissociation highlights a limitation of aggregated metrics and suggests that similar outputs may arise from non-equivalent system states. A fourth level is proposed as a purely conceptual extension describing potential relationships between system states. This level is not implemented and is not derived from experimental data. These observations are strictly exploratory and non-causal. The proposed framework does not establish mechanistic, predictive, or directional relationships, but provides a structured approach for analyzing constraint-driven systems across multiple levels of representation.

2605.00777 2026-05-04 cs.SD cs.CL eess.AS

LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

Venkata Pushpak Teja Menta

Comments 7 pages, 2 figures, 2 tables. Code, model, and datasets at https://github.com/praxelhq/lase

详情
英文摘要

A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in. Off-the-shelf encoders do not, and the failure is accent-conditional. On a 1043-pair Western-accented voice corpus across English, Hindi, Telugu, and Tamil, WavLM-base-plus-sv loses 0.082 absolute cosine similarity when the same voice changes script and ECAPA-TDNN loses 0.105. On a 1369-pair Indian-accented voice corpus, the gap shrinks to 0.006 (WavLM-SV) and 0.044 (ECAPA-TDNN). The leak is largest where it matters most for cross-script TTS: when a system projects a non-Indic-trained voice into Indic scripts. We present LASE (Language-Adversarial Speaker Encoder), a small projection head over frozen WavLM-base-plus trained with two losses: a supervised contrastive loss over voice identity, and a gradient-reversal cross-entropy against a 4-language classifier that pushes the embedding to be language-uninformative while remaining speaker-informative. Trained on 1118 quality-gated cross-script pairs synthesised from 8 commercial multilingual voices, LASE's residual gap is consistent with zero on both corpora (Delta = 0.013 Western, Delta = 0.026 Indian; both bootstrap 95% CIs include zero) and amplifies the cross-script-vs-floor margin 2.4-2.7x over both baselines. An ECAPA+GRL ablation shows the GRL objective improves either backbone but the WavLM choice contributes too. In synthetic multi-speaker diarisation, LASE matches ECAPA-TDNN on cross-script speaker recall (0.788 vs 0.789) with ~100x less training data. We release the r1 checkpoint, both corpora, and the bootstrap recipe.

2605.00776 2026-05-04 cs.CL cs.AI

Directed Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Media

Scott Friedman, Ruta Wheelock, Sonja Schmer-Galunder, Drisana Iverson, Jake Vasilakes, Joan Zheng, Jeffrey Rye, Vasanth Sarathy, Christopher Miller

Comments 32 pages, 12 figures, 7 tables

详情
英文摘要

The language in online platforms, influence operations, and political rhetoric frequently directs a mix of pro-social sentiment (e.g., advocacy, helpfulness, compassion) and anti-social sentiment (e.g., threats, opposition, blame) at different topics, all in the same message. While many natural language processing (NLP) tools classify or score a text's overall sentiment as positive, neutral, or negative, these tools cannot report that positive and negative sentiments coexist, and they cannot report the target of those sentiments. This paper presents the Directed Social Regard (DSR) approach to multi-dimensional, multi-valence sentiment analysis, comprised of a pair of transformer-based models that (1) detects span-level targets of sentiment in a message and then (2) scores all spans within the message context along three (-1, 1) axes of regard that are motivated by social science theories of moral disengagement and moral framing. We present a data collection and annotation strategy for DSR dataset construction, a transformer-based architecture for span-level scoring, and a validation study with promising results. We apply the validated DSR model on six third-party datasets of online media and report meaningful correlations between DSR outputs and the labels and topics in these pre-existing social science datasets.

2605.00773 2026-05-04 math.CT cs.LO

The Synthetic Sierpiński Cone

Fredrik Bakke, Jonathan Sterling, Mark Damuni Williams, Lingyuan Ye

详情
英文摘要

In domains, categories, and toposes, the Sierpiński cone construction glues onto a space a universal closed point lying below all the other points. Although this is a lax colimit, it also enjoys a well-known right-handed universal property: the Sierpiński cone classifies partial maps defined on an open subspace. The situation proves more subtle in synthetic models of space based on extending homotopy type theory with an interval, as in several recent approaches to synthetic higher categories and domains: although globally it may well be the case that the Sierpiński cone classifies partial maps, this property cannot hold of all parameterised types without degenerating the theory. On the other hand, there are reflective subuniverses within which the classifying property nonetheless holds. We show that the largest subuniverse in which the Sierpiński cone classifies partial maps is the accessible localisation at a family of embeddings parameterised in the interval, and this subuniverse is contained within the Segal types; this containment is moreover strict in the sense that when the interval is non-trivial, it is not possible for all Segal types to lie in the subuniverse. We finally extend these results from Sierpiński cones to mapping cylinders, providing a new right-handed universal property for the latter.

2605.00769 2026-05-04 eess.SY cs.SY

Voltage Ride-Through in Large Loads- A Dual PQ Approach

Amir Norouzi, Michael Morel

Comments 10 pages

详情
英文摘要

This paper provides a detailed investigation of voltage ride-through in large loads, such as Artificial Intelligence data centers. Voltage ride-through capability of large loads during transient disturbances in the power grid is important because of the potential impact on the stability and reliability of the Bulk Power System. A mathematical analysis is presented and it is shown how the traditional approach, based on reactive power compensation, may not be adequate for voltage ride-through in large loads. Ultimately, due to capacity limits of the load's power distribution infrastructure and grid's constraints, there is a limit to using reactive power as a corrective tool. A new dual active and reactive power (PQ) approach is proposed in which non-grid resources with dynamic P and Q capabilities are shown to be needed to help with voltage ride-through. Additionally, the analysis illustrates that at extreme voltage dips in the power grid maintaining an acceptable level of load voltage can become practically or theoretically unattainable, which may lead to the load's disconnection from the grid. Analytical results are provided with practical numerical examples.

2605.00764 2026-05-04 cs.CV cs.HC

Modeling Subjective Urban Perception with Human Gaze

Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, Peter Kiefer

详情
英文摘要

Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed. In this paper, we introduce Place Pulse-Gaze, an urban perception dataset that augments street view images with synchronized eye-tracking recordings and individual perception labels. Based on this dataset, we propose a Gaze-Guided Urban Perception Framework to study how gaze behavior contributes to the modeling of subjective urban perception. The framework systematically investigates three complementary settings: gaze-only modeling, gaze fusion with explicit semantic scene representations, and gaze fusion with implicit richer visual representations. Experiments show that gaze alone already carries useful predictive signals for subjective urban perception, and that integrating gaze with scene representations further improves prediction under both semantic and richer visual representations. Overall, our findings highlight the importance of incorporating human perceptual processes into urban scene understanding and open a direction for gaze-guided multimodal urban computing.

2605.00762 2026-05-04 cs.LG cs.AI cs.MA

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

Shradha Sharma, Swapnil Dhamal, Shweta Jain

详情
英文摘要

We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributions in BCMAB-FBF, we first extend the Shapley value, a classical solution concept from cooperative game theory, to the $K$-Shapley value, which captures the marginal contribution of an agent restricted to a set of size at most $K$. We show that $K$-Shapley value is a unique solution concept that satisfies Symmetry, Linearity, Null player, and efficiency properties. We next propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates $K$-Shapley value with unknown valuation function. Unlike standard bandit literature on full bandit feedback, K-SVFair-FBF not only learns the valuation function under full feedback setting but also mitigates the noise arising from Monte Carlo approximations. Theoretically, we prove that K-SVFair-FBF achieves $O(T^{3/4})$ regret bound on fairness regret. Through experiments on federated learning and social influence maximization datasets, we demonstrate that our approach achieves fairness and performs more effectively than existing baselines.

2605.00761 2026-05-04 cs.IT eess.SP math.IT

The Benefit of Decoder-Provided Pilots in Highly Dynamic Channels

Duschia Bodet, Muriel Médard, Muralidhar Rangaswamy, Ken Duffy

Comments This work has been submitted to the IEEE for possible publication

详情
英文摘要

Communications in highly dynamic channels relying on training-based channel estimation experience a trade-off between increasing channel measurement accuracy by sending more frequent training sequences and increasing data rate by sending fewer training sequences. Simultaneously, most communication systems use forward error correction to enable error detection and correction at the receiver. This paper presents decoder-provided pilots for time-varying channels by using decoded codewords as training sequences to update the channel estimate at the receiver. In contrast to approaches such as data-aided channel estimation, decision-feedback equalization, joint channel estimation and error correction, and turbo equalization, the decoder-provided pilots approach is non-iterative, which is ideal for low-latency requirements in highly dynamic scenarios. Furthermore, it is modulation-, code-, and decoder-agnostic, meaning it can be implemented on top of virtually any communication system that uses forward error correction. From an information-theoretic perspective, we derive the fundamental limits of decoder-provided pilots' ability to simultaneously sense the channel and transmit data. Simulation results demonstrate that decoder-provided pilots significantly improve performance, that when coding across frequency, soft-output can further enhance performance, and that when coding across time, short codes can outperform long codes of the same rate in fast-fading channels.

2605.00760 2026-05-04 cs.LG

Learning the Helmholtz equation operator with DeepONet for non-parametric 2D geometries

Rodolphe Barlogis, Ferhat Tamssaouet, Quentin Falcoz, Stéphane Grieu

Comments 24 pages, 16 figures

详情
英文摘要

This paper deals with solving the 2D Helmholtz equation on non-parametric domains, leveraging a physics-informed neural operator network based on the DeepONet framework. We consider a 2D square domain with an inclusion of arbitrary boundary geometry at its center. This inclusion acts as a scatterer for an incoming harmonic wave. The aim is to learn the operator linking the geometry of the scatterer to the resulting scattered field. A signed distance function to the boundary of the inner inclusion, evaluated at several points in the domain, is used to encode its geometry. It serves as input for the branch part of the DeepONet architecture, while local information is used as input for the trunk part. This approach enables the encoding of arbitrary geometries, whether they are parameterized or not. The evaluation of the model on unseen geometries is compared with its finite element method (FEM) equivalent to test its generalization capabilities. The trained network weights implicitly embed the local physics and their interaction with the domain geometry. If the training space sufficiently covers the target evaluation space, the model can generalize accordingly. Furthermore, it can be refined to extend to another region of interest without retraining from scratch. This framework also avoids the need to remesh the domain for each geometry. The proposed approach delivers a computationally lighter surrogate model than FEM alternatives and avoids relying on FEM-generated training data.

2605.00758 2026-05-04 physics.soc-ph cs.GT

Optimal network structure for collective performance with strategic information sharing

Ye Wang, Andrea Civilini, Anzhi Sheng, Xiaojie Chen, Long Wang, Vito Latora

详情
英文摘要

Information sharing between individuals is crucial to improve performance in collective tasks. However, in a competitive world, individuals may be reluctant to share information with the others, and it is still unclear how the presence of strategic behaviors affects the collective performance of a group. In this study, we introduce an evolutionary game modeling the dynamics of individual behaviors in a collective estimation task. The individuals are organized in a network and have to guess the distribution of ball colors in a box. Each of them samples a given number of balls and can strategically decide whether to share or not this information with its neighbors. We develop a framework that allows to investigate analytically how the collective performance depends on the network structure. We find that the optimal network results from a trade-off between the sharing rate and the way the information is integrated in the network. We further reveal that there exists an intermediate average degree for each type of network maximizing the collective performance. In addition to the uniform case, we consider the case of non-homogeneous allocations of the number of individual samples, showing that the largest collective performance is obtained when the number of ball extracted by an individual is inversely proportional to its degree.

2605.00755 2026-05-04 cs.NI

AdvNet: Revealing Performance Issues in Network Protocols by Generating Adversarial Environments

Shehab Sarar Ahmed, William Sentosa, Yinjie Zhang, Yoav Lebendiker, Michael Shnaiderman, Tomer Gilad, Nathan H. Jay, Brighten Godfrey, Michael Schapira

Comments 18 pages, 8 figures

详情
英文摘要

Infrastructure protocols like Congestion Control (CC) seek to provide reliable performance across a wide range of Internet environments. Currently, protocol designers assess performance through hand-designed test cases or data sets captured from real environments. However, such approaches may inadvertently overlook critical facets of the algorithm's behavior when they encounter an unanticipated environment or workload. We seek to understand the unanticipated with \sys, a system that automatically generates adversarial network environments that cause a target protocol implementation to perform poorly. AdvNet employs machine learning-based optimization to generate environments, and incorporates a robust noise-handling mechanism to mitigate the variability inherent in real-world protocol performance. Although our approach is more general, this paper focuses specifically on transport protocols and their CC implementations. We showcase AdvNet's capability to create adversarial scenarios for 27 kernel-space implementations of both single-path and multi-path CC protocols, for several use cases with different performance goals. AdvNet identifies problematic network conditions that expose previously unnoticed Linux kernel bugs and uncovers hidden limitations in CC implementations, and provides insights about robustness. These results suggest that automated adversarial testing can be a valuable tool in protocol development, and that robustness is a useful new dimension for benchmarking CC protocols.

2605.00752 2026-05-04 eess.SY cs.LO cs.SY

HyperCertificates: Verification of Discrete-time Dynamical Systems against HyperLTL Specifications

Vishnu Murali, Amin Falah, Ashutosh Trivedi, Majid Zamani

Comments 24 pages, 3 figures, 1 table

详情
英文摘要

We introduce a functional inductive framework to verify discrete-time dynamical systems against hyperproperties specified as Hyperlinear temporal logic formulae via a notion of HyperCertificates. Unlike linear temporal logic (LTL) formulae which are concerned with individual traces of a system, hyperproperties are properties that are concerned with how the traces of a system relate to one another. HyperLTL is an extension of LTL for hyperproperties, and is useful to describe specifications such as opacity, privacy as well as notions of robustness. Our notion of HyperCertificates consists of a pair of functions, where the first models the lookahead, and the second relies on a combination of barrier and ranking functions. We use closure certificates, to act as a model for this lookahead and then rely on barrier and ranking function arguments modulo this lookahead to provide guarantees against HyperLTL formulae. We demonstrate how our approach is automatable via existing techniques such as sum-of-squares optimization (SOS) and satisfiability modulo theories (SMT) solvers. Finally, we demonstrate our approach on some case studies.

2605.00751 2026-05-04 cs.LG

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

Sizhe Tang, Zuyuan Zhang, Mahdi Imani, Tian Lan

Comments Accepted by ICML 2026 as Spotlight

详情
英文摘要

Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over a low-dimensional nonlinear representation using an interaction-guided proposal rule, instead of directly exploring the full joint-action space. Our exploration uses an interaction score: single-agent deviations are ranked by predicted gain, while two-agent deviations are scored by a mixed-difference measure that reveals coordination benefits even when no single agent can improve alone. We formalize candidate proposal as a bandit problem over local deviations and derive a proposal rule, NonZero, with a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating the joint-action space. Empirically, NonZero improves sample efficiency and final performance on MatGame, SMAC, and SMACv2 relative to strong model-based and model-free baselines under matched search budgets.