Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision
Comments Project webpage: https://glab-caltech.github.io/converseg/
Aadarsh Sahoo, Georgia Gkioxari
Conversational image segmentation grounds abstract, intent-driven concepts into pixel-accurate masks. Prior work on referring image grounding focuses on categorical and spatial queries (e.g., "left-most apple") and overlooks functional and physical reasoning (e.g., "where can I safely store the knife?"). We address this gap and introduce Conversational Image Segmentation (CIS) and ConverSeg, a benchmark spanning entities, spatial relations, intent, affordances, functions, safety, and physical reasoning. We also present ConverSeg-Net, which fuses strong segmentation priors with language understanding, and an AI-powered data engine that generates prompt-mask pairs without human supervision. We show that current language-guided segmentation models are inadequate for CIS, while ConverSeg-Net trained on our data engine achieves significant gains on ConverSeg and maintains strong performance on existing language-guided segmentation benchmarks. Project webpage: https://glab-caltech.github.io/converseg/
Mingzhi Sheng, Zekai Gu, Peng Li, Cheng Lin, Hao-Xiang Guo, Ying-Cong Chen, Yuan Liu
Comments Codes: https://github.com/IGL-HKUST/FlexAM
Effective and generalizable control in video generation remains a significant challenge. While many methods rely on ambiguous or task-specific signals, we argue that a fundamental disentanglement of "appearance" and "motion" provides a more robust and scalable pathway. We propose FlexAM, a unified framework built upon a novel 3D control signal. This signal represents video dynamics as a point cloud, introducing three key enhancements: multi-frequency positional encoding to distinguish fine-grained motion, depth-aware positional encoding, and a flexible control signal for balancing precision and generative quality. This representation allows FlexAM to effectively disentangle appearance and motion, enabling a wide range of tasks including I2V/V2V editing, camera control, and spatial object editing. Extensive experiments demonstrate that FlexAM achieves superior performance across all evaluated tasks.
Wei Wei, Foroozan Daneshzand, Zezhong Wang, Erica Mattson, Charles Perin, Sheelagh Carpendale
Co-design is an increasingly popular approach in HCI and visualization, yet there is little guidance on how to effectively apply this method in visualization contexts. In this paper, we visually present our experience of a two-and-a-half-year co-design project with the local arts community. Focusing on facilitating community exploration and sense-making around arts funding distribution, the project involved a series of co-design sessions between visualization researchers and members of the arts community. Through these iterative sessions, we built shared understanding and developed visualization prototypes tailored to community needs. However, the practice is far from complete, and we found ourselves continually returning to the "fuzzy front end" of the co-design process. We share this ongoing story through comic-style visuals and reflect on three fuzzy front ends that we encountered during the project. By sharing these experiences with the visualization community, we hope to offer insights that others can draw on in their own community-engaged co-design work.
Saad Ahmed Jamal, Ammara Nusrat, Muhammad Azmat, Muhammad Osama Nusrat
Comments 28 pages
Effective water resource management depends on accurate projections of flows in water channels. For projected climate data, different General Circulation Models (GCMs) produce contrasting results. This study presents a selection of GCMs from the latest generation, CMIP6, for hydroclimate change impact studies. An envelope-based method was used for the selection; it includes components based on machine learning techniques, allowing the selection of GCMs without the need for in-situ reference data. To our knowledge, this is the first time such a comparison has been performed for the CMIP6 Shared Socioeconomic Pathway (SSP) scenario data. In addition, the effect of climate change under SSP scenarios was studied, along with the calculation of extreme indices. Finally, GCMs were compared to quantify spatiotemporal differences between CMIP5 and CMIP6 data. The results identify NorESM2-LM and FGOALS-g3 as the selected models for the Jhelum and Chenab Rivers. Highly vulnerable regions under the effect of climate change were highlighted through spatial maps, which included parts of Punjab, Jammu, and Kashmir. Upon comparison of CMIP5 and CMIP6, no discernible difference was found between the RCP and SSP scenario precipitation projections. In the future, more detailed statistical comparisons could further reinforce the proposition.
Jiankun Zhang, Shenglai Zeng, Kai Guo, Xinnan Dai, Hui Liu, Jiliang Tang, Yi Chang
Multimodal Retrieval-Augmented Generation (MRAG) has emerged as a key paradigm for grounding MLLMs with external knowledge. While query pre-processing (e.g., rewriting) is standard in text-based RAG, existing MRAG pipelines predominantly treat visual inputs as static and immutable, implicitly assuming they are noise-free. However, real-world visual queries are often "imperfect", suffering from geometric distortions, quality degradation, or semantic ambiguity, which leads to catastrophic retrieval failures. To address this gap, we propose V-QPP-Bench, the first comprehensive benchmark dedicated to Visual Query Pre-processing (V-QPP). We formulate V-QPP as an agentic decision-making task where MLLMs must autonomously diagnose imperfections and deploy perceptual tools to refine queries. Our extensive evaluation across 46,700 imperfect queries and diverse MRAG paradigms reveals three critical insights: (1) Vulnerability: visual imperfections severely degrade both retrieval recall and end-to-end MRAG performance; (2) Restoration Potential & Bottleneck: while oracle preprocessing recovers near-perfect performance, off-the-shelf MLLMs struggle with tool selection and parameter prediction without specialized training; and (3) Training Enhancement: supervised fine-tuning enables compact models to achieve comparable or superior performance to larger proprietary models, demonstrating the benchmark's value for developing robust MRAG systems. The code is available at https://github.com/phycholosogy/VQQP_Bench
Swati Gupta, Jai Moondra, Mohit Singh
OMD and its variants give a flexible framework for OCO where the performance depends crucially on the choice of the mirror map. While the geometries underlying OPGD and OEG, both special cases of OMD, are well understood, it remains a challenging open question on how to construct an optimal mirror map for any given constrained set and a general family of loss functions, e.g., sparse losses. Motivated by parameterizing a near-optimal set of mirror maps, we consider a simpler question: is it even possible to obtain polynomial gains in regret by using mirror maps for geometries that interpolate between $L_1$ and $L_2$, which may not be possible by restricting to only OEG ($L_1$) or OPGD ($L_2$). Our main result answers this question positively. We show that mirror maps based on block norms adapt better to the sparsity of loss functions, compared to previous $L_p$ (for $p \in [1, 2]$) interpolations. In particular, we construct a family of online convex optimization instances in $\mathbb{R}^d$, where block norm-based mirror maps achieve a provable polynomial (in $d$) improvement in regret over OEG and OPGD for sparse loss functions. We then turn to the setting in which the sparsity level of the loss functions is unknown. In this case, the choice of geometry itself becomes an online decision problem. We first show that naively switching between OEG and OPGD can incur linear regret, highlighting the intrinsic difficulty of geometry selection. To overcome this issue, we propose a meta-algorithm based on multiplicative weights that dynamically selects among a family of uniform block norms. We show that this approach effectively tunes OMD to the sparsity of the losses, yielding adaptive regret guarantees. Overall, our results demonstrate that online mirror-map selection can significantly enhance the ability of OMD to exploit sparsity in online convex optimization.
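The multiplicative-weights idea mentioned above can be illustrated with a minimal sketch. This is the generic Hedge rule for weighting a small set of candidate learners by their cumulative loss, not the authors' meta-algorithm; the function name, the learning rate `eta`, and the example losses are all hypothetical.

```python
import math


def hedge_weights(loss_history, eta=0.5):
    """Return the multiplicative-weights distribution over candidates.

    loss_history: list of rounds, each a list with one loss per candidate.
    eta: learning rate (an illustrative value, not taken from the paper).
    """
    n = len(loss_history[0])
    cumulative = [0.0] * n
    for round_losses in loss_history:
        for i, loss in enumerate(round_losses):
            cumulative[i] += loss
    # Weight of candidate i is proportional to exp(-eta * cumulative loss).
    # Subtracting the minimum keeps the exponentials numerically stable.
    shift = min(cumulative)
    raw = [math.exp(-eta * (c - shift)) for c in cumulative]
    total = sum(raw)
    return [r / total for r in raw]


# Three hypothetical candidates (e.g. three geometries); the second one
# suffers the smallest loss in every round.
history = [[1.0, 0.0, 0.5], [1.0, 0.1, 0.5], [0.9, 0.0, 0.6]]
weights = hedge_weights(history)
```

In this toy run the weight concentrates on the candidate with the lowest cumulative loss, which is the behaviour a selection rule of this kind relies on.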
Seth Donahue, J. D. Peiffer, R. Tyler Richardson, Yishan Zhong, Shaun Q. Y. Tan, Benoit Marteau, Stephanie R. Russo, May D. Wang, R. James Cotton, Ross Chafetz
Objective: To validate a clinically accessible approach for quantifying the Upper Extremity Reachable Workspace (UERW) using a single (monocular) camera and Artificial Intelligence (AI)-driven Markerless Motion Capture (MMC) for biomechanical analysis. Objective assessment and validation of these techniques for specific clinically oriented tasks are crucial for their adoption in clinical motion analysis. AI-driven monocular MMC reduces the barriers to adoption in the clinic and has the potential to reduce the overhead for analysis of this common clinical assessment. Nine adult participants with no impairments performed the standardized UERW task, which entails reaching targets distributed across a virtual sphere centered on the torso, with targets displayed in a VR headset. Movements were simultaneously captured using a marker-based motion capture system and a set of eight FLIR cameras. We performed monocular video analysis on two of these camera views to compare frontal and offset camera configurations. The frontal camera orientation demonstrated strong agreement with the marker-based reference, exhibiting a minimal mean bias of $0.61 \pm 0.12$% reachspace reached per octant (mean $\pm$ standard deviation). In contrast, the offset camera view underestimated the percent workspace reached ($-5.66 \pm 0.45$% reachspace reached). Conclusion: The findings support the feasibility of a frontal monocular camera configuration for UERW assessment, particularly for anterior workspace evaluation where agreement with marker-based motion capture was highest. The overall performance demonstrates clinical potential for practical, single-camera assessments. This study provides the first validation of a monocular MMC system for the assessment of the UERW task. By reducing technical complexity, this approach enables broader implementation of quantitative upper extremity mobility assessment.
Torkel E. Loman, Yurij Salmaniw, Antonio Leon Villares, Jose A. Carrillo, Ruth E. Baker
Comments 16 pages with 6 figures. Additional 24 pages and 19 figures supplementary information
Partial differential equations often contain unknown functions that are difficult or impossible to measure directly, hampering our ability to derive predictions from the model. Workflows for recovering scalar PDE parameters from data are well studied: here we show how similar workflows can be used to recover functions from data. Specifically, we embed neural networks into the PDE and show how, as they are trained on data, they can approximate unknown functions with arbitrary accuracy. Using nonlocal aggregation-diffusion equations as a case study, we recover interaction kernels and external potentials from steady state data. Specifically, we investigate how a wide range of factors, such as the number of available solutions, their properties, sampling density, and measurement noise, affect our ability to successfully recover functions. Our approach is advantageous because it can utilise standard parameter-fitting workflows, and in that the trained PDE can be treated as a normal PDE for purposes such as generating system predictions.
Yoav Moran, Oded Schwartz, Shuncheng Yuan
Comments 21 pages, 2 tables
Fast matrix multiplication algorithms are asymptotically faster than the classical cubic-time algorithm, but they are often slower in practice. One important obstacle is the use of complex coefficients, which increases arithmetic overhead and limits practical efficiency. This paper focuses on transforming complex-coefficient matrix multiplication schemes into equivalent real- or rational-coefficient ones. We present a systematic method that, given a complex-coefficient scheme, either constructs a family of equivalent rational algorithms or proves that no equivalent rational scheme exists. Our approach relies only on basic linear-algebraic properties of similarity transformations of complex matrices. This method recovers the previously known ad hoc results of Dumas, Pernet, and Sedoglavic (2025) and extends them to more general settings, including algorithms involving rational coefficients and square roots, with $i=\sqrt{-1}$ as a special case. Using this framework, we show that no rational scheme is equivalent to Smirnov's $\langle4,4,9,104\rangle$ $\mathbb{Q}[\sqrt{161}]$ algorithm (2022) and that no real scheme is equivalent to the $\langle4,4,4,48\rangle$ complex algorithm of Kaporin (2024). More generally, our approach can also be used to prove the non-existence of integer-coefficient schemes.
Saleha Muzammil, Mughees Ur Rehman, Zoe Kotti, Diomidis Spinellis
Comments Published at the 23rd International Conference on Mining Software Repositories
Journal ref In: Proceedings of the 23rd International Conference on Mining Software Repositories (MSR 2026), ACM, 2026
Software source code often harbours "hotspots": small portions of the code that change far more often than the rest of the project and thus concentrate maintenance activity. We mine the complete version histories of 91 evolving, actively developed GitHub repositories and identify 15 recurring line-level hotspot patterns that explain why these hotspots emerge. The three most prevalent patterns are Pinned Version Bump (26%), revealing brittle release practices; Long Line Change (17%), signalling deficient layout; and Formatting Ping-Pong (9%), indicating missing or inconsistent style automation. Surprisingly, automated accounts generate 74% of all hotspot edits, suggesting that bot activity is a dominant but largely avoidable source of noise in change histories. By mapping each pattern to concrete refactoring guidelines and continuous integration checks, our taxonomy equips practitioners with actionable steps to curb hotspots and systematically improve software quality in terms of configurability, stability, and changeability.
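As a rough illustration of what a line-level hotspot is, the sketch below counts edits per (file, line) pair and flags lines edited far more often than the typical line. It is a simplification under stated assumptions, not the study's mining pipeline: the `factor` threshold, the function name, and the example data are hypothetical, and real histories need line tracking across commits because line numbers shift.

```python
from collections import Counter
from statistics import median


def find_hotspots(edits, factor=5):
    """Flag (path, line) pairs edited at least `factor` times the median.

    edits: iterable of (path, line_number) tuples, one per recorded edit.
    factor: hypothetical threshold multiplier chosen for illustration.
    """
    counts = Counter(edits)
    typical = median(counts.values())
    return sorted(loc for loc, n in counts.items() if n >= factor * typical)


# Hypothetical history: one pinned-version line is bumped ten times,
# while three source lines are each edited once.
edits = [("requirements.txt", 3)] * 10 + [("app.py", 1), ("app.py", 2), ("app.py", 7)]
hotspots = find_hotspots(edits)
```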
Dong Han, Yong Li, Joachim Denzler
Comments Accepted to AAAI 2026
With the advancement of face recognition (FR) systems, privacy-preserving face recognition (PPFR) systems have gained popularity for their accurate recognition, enhanced facial privacy protection, and robustness to various attacks. However, few studies have further verified privacy risks by reconstructing realistic high-resolution face images from the embeddings of these systems, especially for PPFR. In this work, we propose face embedding mapping (FEM), a general framework that explores the Kolmogorov-Arnold Network (KAN) for conducting the embedding-to-face attack by leveraging a pre-trained identity-preserving diffusion model against state-of-the-art (SOTA) FR and PPFR systems. Based on extensive experiments, we verify that reconstructed faces can be used for accessing other real-world FR systems. In addition, the proposed method is robust in reconstructing faces from partial and protected face embeddings. Moreover, FEM can be utilized as a tool for evaluating the safety of FR and PPFR systems in terms of privacy leakage. All images used in this work are from public datasets.
Shlomi Dolev, Ehud Gudes, Daniel Shlomo
The rapid growth of decentralized systems in the Web3 ecosystem has introduced numerous challenges, particularly in ensuring data security, privacy, and scalability [3, 8]. These systems rely heavily on distributed architectures, requiring robust mechanisms to manage data and interactions among participants securely. One critical aspect of decentralized systems is key management, which is essential for encrypting files, securing database segments, and enabling private transactions. However, securely managing cryptographic keys in a distributed environment poses significant risks, especially when nodes in the network can be compromised [9]. This research proposes a decentralized database scheme specifically designed for secure and private key management. Our approach ensures that cryptographic keys are not stored explicitly at any location, preventing their discovery even if an attacker gains control of multiple nodes. Instead of traditional storage, keys are encoded and distributed using the BFLUT (Bloom Filter for Private Look-Up Tables) algorithm [7], which enables secure retrieval without direct exposure. The system leverages OrbitDB [4], IPFS [1], and IPNS [10] for decentralized data management, providing robust support for consistency, scalability, and simultaneous updates. By combining these technologies, our scheme enhances both security and privacy while maintaining high performance and reliability. Our findings demonstrate the system's capability to securely manage keys, prevent unauthorized access, and ensure privacy, making it a foundational solution for Web3 applications requiring decentralized security.
Hugo Henry, Arthur Tsai, Kelly Cohen
Comments 12 pages, 12 figures, conference paper
This paper presents a hybrid obstacle avoidance architecture that integrates Optimal Control under clearance with a Fuzzy Rule-Based System (FRBS) to enable adaptive constraint handling for unmanned aircraft. Motivated by the limitations of classical optimal control under uncertainty and the need for interpretable decision making in safety-critical aviation systems, we design a three-stage Takagi-Sugeno-Kang fuzzy layer that modulates constraint radii, urgency levels, and activation decisions based on regulatory separation minima and airworthiness guidelines from the FAA and EASA. These fuzzy-derived clearances are then incorporated as soft constraints into an optimal control problem solved using the FALCON toolbox and IPOPT. The framework aims to reduce unnecessary recomputations by selectively activating obstacle avoidance updates while maintaining compliance with aviation procedures. A proof-of-concept implementation using a simplified aircraft model demonstrates that the approach can generate optimal trajectories with computation times of 2,3 seconds per iteration in a single-threaded MATLAB environment, suggesting feasibility for near-real-time applications. However, our experiments revealed a critical software incompatibility in the latest versions of FALCON and IPOPT, in which the Lagrangian penalty term remained identically zero, preventing proper constraint enforcement. This behavior was consistent across scenarios and indicates a solver-toolbox regression rather than a modeling flaw. Future work includes validating this effect by reverting to earlier software versions, optimizing the fuzzy membership functions using evolutionary methods, and extending the system to higher-fidelity aircraft models and stochastic obstacle environments.
Saitarun Nadipineni, Chenhao Hong, Tanishtha Ramlall, Chapa Sirithunge, Kaspar Althoefer, Fumiya Iida, Thilina Dulantha Lalitharatne
Soft robotics has emerged as a versatile field with applications across various domains, from healthcare to industrial automation, and more recently, art and interactive installations. The inherent flexibility, adaptability, and safety of soft robots make them ideal for applications that require delicate, organic, and lifelike movement, allowing for immersive and responsive interactions. This study explores the intersection of human emotions, soft robotics, and art to establish and create new forms of human emotion-mediated soft robotic art. In this paper, we introduce two soft embodiments: a soft character and a soft flower as an art display that dynamically responds to brain signals based on alpha waves, reflecting different emotion levels. We present how human emotions can be measured as alpha waves based on brain/EEG signals, how we map the alpha waves to the dynamic movements of the two soft embodiments, and demonstrate our proposed concept using experiments. The findings of this study highlight how soft robotics can embody human emotional states, offering a new medium for insightful artistic expression and interaction, and demonstrating how art displays can be embodied.
Ashwin Satish Menon, Eric R. Damm, Eli S. Lancaster, Felix A. Sanchez, Jason M. Gregory, Thomas M. Howard
Comments 12 pages, 8 figures
Due to sensor limitations, environments that off-road mobile robots operate in are often only partially observable. As the robots move throughout the environment and towards their goal, the optimal route is continuously revised as the sensors perceive new information. In traditional autonomous navigation architectures, a regional motion planner will consume the environment map and output a trajectory for the local motion planner to use as a reference. Due to the continuous revision of the regional plan guidance as a result of changing map information, the reference trajectories which are passed down to the local planner can differ significantly across sequential planning cycles. This rapidly changing guidance can result in unsafe navigation behavior, often requiring manual safety interventions during autonomous traversals in off-road environments. To remedy this problem, we propose Temporally-Sampled Efficiently Adaptive State Lattices (TSEASL), which is a regional planner arbitration architecture that considers updated and optimized versions of previously generated trajectories against the currently generated trajectory. When tested on a Clearpath Robotics Warthog Unmanned Ground Vehicle as well as real map data collected from the Warthog, results indicate that when running TSEASL, the robot did not require manual interventions in the same locations where the robot was running the baseline planner. Additionally, higher levels of planner stability were recorded with TSEASL over the baseline. The paper concludes with a discussion of further improvements to TSEASL in order to make it more generalizable to various off-road autonomy scenarios.
Sean Bowerfind, Matthew R. Kirchner, Gary Hewer
Presented is an algorithm to synthesize the optimal infinite-horizon LQR feedback controller for continuous-time systems. The algorithm does not require knowledge of the system dynamics but instead uses only a finite-length sampling of arbitrary input-output data. The algorithm is based on a constrained optimization problem that enforces a necessary condition on the dynamics of the optimal value function along any trajectory. In addition to calculating the standard LQR gain matrix, a feedforward gain can be found to implement a reference tracking controller. This paper presents a theoretical justification for the method and shows several examples, including a validation test on a real scale aircraft.
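For reference, the quantity being synthesized can be written in closed form in the scalar case. The sketch below is the classical model-based solution, which assumes the dynamics x' = a x + b u are known; it is not the data-driven algorithm described above, and the function name and numeric values are illustrative only.

```python
import math


def scalar_lqr_gain(a, b, q, r):
    """Infinite-horizon LQR gain for x' = a*x + b*u with cost q*x^2 + r*u^2.

    Solves the scalar algebraic Riccati equation
        2*a*p - (b**2 / r) * p**2 + q = 0
    for its positive root p, then returns k = b*p/r for the law u = -k*x.
    """
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


# Illustrative unstable plant (a > 0) with unit weights.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
k = scalar_lqr_gain(a, b, q, r)
closed_loop_pole = a - b * k
```

With these values the closed-loop pole a - b k equals -sqrt(a^2 + b^2 q / r), so the feedback stabilizes the plant.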
Ruiqi Wang, Yiming Yang, Atif Shamim
Reconfigurable intelligent surfaces (RIS) are conventionally implemented as two-dimensional (2D) electromagnetic (EM) structures to steer incident waves toward desired reflection angles. This approach limits the reflection to a single hemisphere, and the beam-scanning range is relatively small. In this work, a novel three-dimensional (3D) RIS concept is proposed, where beam-scanning can be realized not only through reflection from the illuminated surface but also through controlled transmission toward adjacent surfaces, enabling near blind-spot-free coverage in the full 3D spatial domain. A cube-based 3D-RIS design operating at millimeter-wave (mm-Wave) frequencies and consisting of six interconnected RIS surfaces is presented. Each surface integrates reconfigurable receiving and reflecting arrays with orthogonal polarizations to ensure intrinsic EM isolation, while a reconfigurable feeding network supports dynamic operation. A subarray-based synthesis approach with binary amplitude gating and predefined phase offsets is developed through a unified theoretical model. This model, validated through full-wave simulations, enables efficient beam switching through a shared aperture. Based on this framework, an 8 x 12 element surface comprising six 4 x 4 subarrays is designed, with each surface covering an angular range from -30 deg to +30 deg. The experimental prototype has been characterized in the 24 to 30 GHz band, and the results demonstrate a gain enhancement of 14.7 dB for reflection, while 14.1 dB is achieved for transmission to the neighboring surface. Finally, wireless communication trials using the Pluto software-defined radio platform combined with frequency up/down converters confirm improved constellation quality and a 6-7 dB improvement in error vector magnitude (EVM) for both reflection and neighboring surface transmission scenarios.
Sergey Goncharov, Dirk Hofmann, Pedro Nora, Lutz Schröder, Paul Wild
Distributive laws of set functors over the powerset monad (also known as Kleisli laws for the powerset monad) are well-known to be in one-to-one correspondence with extensions of set functors to functors on the category of sets and relations. We study the question of existence and uniqueness of such distributive laws. Our main result entails that an accessible set functor admits a distributive law over the powerset monad if and only if it preserves weak pullbacks, in which case the so-called power law (which induces the Barr extension) is the unique one. Furthermore, we show that the powerset functor admits exactly three distributive laws over the powerset monad, revealing that uniqueness may fail for non-accessible functors.
Pingzhi Li, Hongxuan Li, Zirui Liu, Xingcheng Lin, Tianlong Chen
Comments Code is at https://github.com/UNITES-Lab/flash-molecular-dynamics
Graph neural network (GNN) potentials such as SchNet improve the accuracy and transferability of molecular dynamics (MD) simulation by learning many-body interactions, but remain slower than classical force fields due to fragmented kernels and memory-bound pipelines that underutilize GPUs. We show that a missing principle is making GNN-MD IO-aware, carefully accounting for reads and writes between GPU high-bandwidth memory (HBM) and on-chip SRAM. We present FlashSchNet, an efficient and accurate IO-aware SchNet-style GNN-MD framework built on four techniques: (1) flash radial basis, which fuses pairwise distance computation, Gaussian basis expansion, and cosine envelope into a single tiled pass, computing each distance once and reusing it across all basis functions; (2) flash message passing, which fuses cutoff, neighbor gather, filter multiplication, and reduction to avoid materializing edge tensors in HBM; (3) flash aggregation, which reformulates scatter-add via CSR segment reduce, reducing atomic writes by a factor of feature dimension and enabling contention-free accumulation in both forward and backward passes; (4) channel-wise 16-bit quantization that exploits the low per-channel dynamic range in SchNet MLP weights to further improve throughput with negligible accuracy loss. On a single NVIDIA RTX PRO 6000, FlashSchNet achieves 1000 ns/day aggregate simulation throughput over 64 parallel replicas on coarse-grained (CG) protein containing 269 beads (6.5x faster than CGSchNet baseline with 80% reduction of peak memory), surpassing classical force fields (e.g. MARTINI) while retaining SchNet-level accuracy and transferability.
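The difference between scatter-add and a CSR-style segment reduce can be sketched on the CPU in plain Python. This toy version uses scalar messages and hypothetical function names; it only illustrates the reformulation, and says nothing about the fused GPU kernels in FlashSchNet.

```python
def scatter_add(messages, dst, num_nodes):
    """Accumulate each edge message into its destination node, edge by edge.

    On a GPU, every `+=` here would be an atomic write to shared output.
    """
    out = [0.0] * num_nodes
    for m, d in zip(messages, dst):
        out[d] += m
    return out


def csr_segment_reduce(messages, dst, num_nodes):
    """Same result, but edges are grouped by destination first.

    row_ptr[n]:row_ptr[n + 1] delimits the edges that target node n, so each
    node's sum is computed independently and written exactly once.
    """
    order = sorted(range(len(dst)), key=lambda e: dst[e])
    row_ptr = [0] * (num_nodes + 1)
    for d in dst:
        row_ptr[d + 1] += 1
    for n in range(num_nodes):
        row_ptr[n + 1] += row_ptr[n]
    out = []
    for n in range(num_nodes):
        segment = order[row_ptr[n]:row_ptr[n + 1]]
        out.append(sum((messages[e] for e in segment), 0.0))
    return out


# Hypothetical graph: 4 nodes, 4 edges with scalar messages.
messages = [1.0, 2.0, 3.0, 4.0]
dst = [2, 0, 2, 1]
a = scatter_add(messages, dst, 4)
b = csr_segment_reduce(messages, dst, 4)
```

Both functions return the same per-node sums; the segment form simply makes the grouping explicit so that no two writes target the same output slot.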
Chenguang Wang, Zihan Zhou, Lei Bai, Tianshu Yu
Template-free retrosynthesis methods treat the task as black-box sequence generation, limiting learning efficiency, while semi-template approaches rely on rigid reaction libraries that constrain generalization. We address this gap with a key insight: atom ordering in neural representations matters. Building on this insight, we propose a structure-aware template-free framework that encodes the two-stage nature of chemical reactions as a positional inductive bias. By placing reaction center atoms at the sequence head, our method transforms implicit chemical knowledge into explicit positional patterns that the model can readily capture. The proposed RetroDiT backbone, a graph transformer with rotary position embeddings, exploits this ordering to prioritize chemically critical regions. Combined with discrete flow matching, our approach decouples training from sampling and enables generation in 20--50 steps versus 500 for prior diffusion methods. Our method achieves state-of-the-art performance on both USPTO-50k (61.2% top-1) and the large-scale USPTO-Full (51.3% top-1) with predicted reaction centers. With oracle centers, performance reaches 71.1% and 63.4% respectively, surpassing foundation models trained on 10 billion reactions while using orders of magnitude less data. Ablation studies further reveal that structural priors outperform brute-force scaling: a 280K-parameter model with proper ordering matches a 65M-parameter model without it.
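The ordering idea, placing reaction center atoms at the head of the sequence, can be sketched as a simple stable permutation. This is an illustrative reading of the abstract, not the RetroDiT implementation; the function name and the example molecule are hypothetical.

```python
def reorder_center_first(atoms, center_indices):
    """Move reaction-center atoms to the front, keeping relative order.

    Returns the reordered atom list and the permutation, so that bonds or
    other per-atom data can be re-indexed with the same order.
    """
    center = set(center_indices)
    head = [i for i in range(len(atoms)) if i in center]
    tail = [i for i in range(len(atoms)) if i not in center]
    order = head + tail
    return [atoms[i] for i in order], order


# Hypothetical five-atom example where atoms 2 and 4 form the reaction center.
atoms = ["C", "C", "O", "N", "Cl"]
reordered, order = reorder_center_first(atoms, [2, 4])
```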
Huishi Luo, Shuokai Li, Hanchen Yang, Zhongbo Sun, Haojie Ding, Boheng Zhang, Zijia Cai, Renliang Qian, Fan Yang, Tingting Gao, Chenyi Lei, Wenwu Ou, Fuzhen Zhuang
Awakening dormant users, who remain engaged but exhibit low conversion, is a pivotal driver for incremental GMV growth in large-scale e-commerce platforms. However, existing approaches often yield suboptimal results since they typically rely on single-step estimation of an item's intrinsic value (e.g., immediate click probability). This mechanism overlooks the instrumental effect of items, where specific interactions act as triggers to shape latent intent and drive subsequent decisions along a conversion trajectory. To bridge this gap, we propose RoleGen, a novel framework that synergizes a Conversion Trajectory Reasoner with a Generative Behavioral Backbone. Specifically, the LLM-based Reasoner explicitly models the context-dependent Functional Role of items to reconstruct intent evolution. It further employs counterfactual inference to simulate diverse conversion paths, effectively mitigating interest collapse. These reasoned candidate items are integrated into the generative backbone, which is optimized via a collaborative "Reasoning-Execution-Feedback-Reflection" closed-loop strategy to ensure grounded execution. Extensive offline experiments and online A/B testing on the Kuaishou e-commerce platform demonstrate that RoleGen achieves a 6.2% gain in Recall@1 and a 7.3% increase in online order volume, confirming its effectiveness in activating the dormant user base.
Zhipeng Li, Yi-Chi Liao, Christian Holz
Generative models are increasingly powerful, yet users struggle to guide them through prompts. The generative process is difficult to control and unpredictable, and user instructions may be ambiguous or under-specified. Prior prompt refinement tools heavily rely on human effort, while prompt optimization methods focus on numerical functions and are not designed for human-centered generative tasks, where feedback is better expressed as binary preferences and demands convergence within few iterations. We present APPO, a preference-guided prompt optimization algorithm. Instead of iterating prompts, users only provide binary preferential feedback. APPO adaptively balances its strategies between exploiting user feedback and exploring new directions, yielding effective and efficient optimization. We evaluate APPO on image generation, and the results show APPO enables achieving satisfactory outcomes in fewer iterations with lower cognitive load than manual prompt editing. We anticipate APPO will advance human-AI collaboration in generative tasks by leveraging user preferences to guide complex content creation.
Mohamed Tarraf, Alex Chan, Alex Yakovlev, Rishad Shafik
Comments Pre-print of latest work
Binary Neural Networks (BNNs) offer a low-complexity and energy-efficient alternative to traditional full-precision neural networks by constraining their weights and activations to binary values. However, their discrete, highly non-linear behavior makes them difficult to explain, validate and formally verify. As a result, BNNs remain largely opaque, limiting their suitability in safety-critical domains, where causal transparency and behavioral guarantees are essential. In this work, we introduce a Petri net (PN)-based framework that captures the BNN's internal operations as event-driven processes. By "eventizing" their operations, we expose their causal relationships and dependencies for a fine-grained analysis of concurrency, ordering, and state evolution. Here, we construct modular PN blueprints for core BNN components including activation, gradient computation and weight updates, and compose them into a complete system-level model. We then validate the composed PN against a reference software-based BNN, verify it against reachability and structural checks to establish 1-safeness, deadlock-freeness, mutual exclusion and correct-by-construction causal sequencing, before we assess its scalability and complexity at segment, component, and system levels using the automated measurement tools in Workcraft. Overall, this framework enables causal introspection of transparent and event-driven BNNs that are amenable to formal reasoning and verification.
Maria Ryskina, Matthew R. Gormley, Kyle Mahowald, David R. Mortensen, Taylor Berg-Kirkpatrick, Vivek Kulkarni
Comments Accepted to LChange 2026
Living languages are shaped by a host of conflicting internal and external evolutionary pressures. While some of these pressures are universal across languages and cultures, others differ depending on the social and conversational context: language use in newspapers is subject to very different constraints than language use on social media. Prior distributional semantic work on English word emergence (neology) identified two factors correlated with creation of new words by analyzing a corpus consisting primarily of historical published texts (Ryskina et al., 2020, arXiv:2001.07740). Extending this methodology to contextual embeddings in addition to static ones and applying it to a new corpus of Twitter posts, we show that the same findings hold for both domains, though the topic popularity growth factor may contribute less to neology on Twitter than in published writing. We hypothesize that this difference can be explained by the two domains favouring different neologism formation mechanisms.
Radosław Piórkowski
This paper establishes logical and expression-based characterizations for the class of languages recognized by nondeterministic register automata with guessing (NRA) over infinite alphabets. We introduce Scoped MSO, a logic featuring a novel segment modality and syntactic restrictions on data comparisons. We prove this logic is expressively equivalent to NRA over data domains where "strong guessing" can be eliminated. Furthermore, we define Data-Regular Expressions, a minimalist regular-expression calculus built from quantifier-free regions and equipped with $k$-contracting concatenation, and demonstrate its equivalence to NRA over arbitrary relational structures. Together, these formalisms provide a robust descriptive theory for register automata, bridging the gap between automata, logic, and expressions.
Minghe Lu, Zhanming Chen, May Sunmin Hwang, Ji Youn Shin
Comments 31 pages, 8 figures, conference
Journal ref Proc. ACM Hum.-Comput. Interact. 10, 2, Article CSCW026 (April 2026), 31 pages
Farming plays a significant role in the economy by supporting related industries such as food, retail, and local services. Community-based small farms, while offering unique social and cultural benefits, face persistent challenges, including limited access to formal education and underdeveloped infrastructure, which have been discussed in prior research. This study focuses on community-driven factors, such as workarounds for recording critical information and practices for passing down farming knowledge across generations. Through 11 semi-structured interviews with farmers from a small ethnic community, the Hmong, we explore how bonding social capital, rooted in close family and community ties, supports informal knowledge exchange and creates pathways to bridging and linking capital. These relationships help farmers connect to broader networks, resources, and institutions. Our findings highlight opportunities for designing technologies that support and strengthen existing support systems. We discuss how technologies should be designed to reflect the cultural values, unique practices, and intergenerational relationships embedded in community-based farms.
Matia Bojovic, Saverio Salzo, Massimiliano Pontil
Comments 24 pages
Vanilla gradient methods are often highly sensitive to the choice of stepsize, which typically requires manual tuning. Adaptive methods alleviate this issue and have therefore become widely used. Among them, AdaGrad has been particularly influential. In this paper, we propose an AdaGrad-style adaptive method in which the adaptation is driven by the cumulative squared norms of successive gradient differences rather than gradient norms themselves. The key idea is that when gradients vary little across iterations, the stepsize is not unnecessarily reduced, while significant gradient fluctuations, reflecting curvature or instability, lead to automatic stepsize damping. Numerical experiments demonstrate that the proposed method is more robust than AdaGrad in several practically relevant settings.
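The key idea above can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so everything here is an assumption: the function name `adagrad_diff`, the scalar (norm-based) stepsize `eta / sqrt(acc)`, and the choice to initialise the accumulator with the squared norm of the first gradient are all hypothetical, chosen only to show how accumulating squared gradient differences behaves.

```python
import numpy as np

def adagrad_diff(grad, x0, eta=1.0, steps=200):
    """Gradient descent with an AdaGrad-norm-style stepsize.

    Assumption (not taken from the paper): the accumulator sums
    ||g_k - g_{k-1}||^2 instead of ||g_k||^2, and starts at ||g_0||^2.
    """
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    acc = float(np.dot(g, g))  # assumed initialisation of the accumulator
    for _ in range(steps):
        if acc == 0.0:
            break  # zero gradient at the start: already stationary
        x = x - eta / np.sqrt(acc) * g
        g_new = grad(x)
        diff = g_new - g
        # The accumulator grows only when successive gradients differ,
        # so slowly varying gradients do not shrink the stepsize.
        acc += float(np.dot(diff, diff))
        g = g_new
    return x

# Illustrative use on an ill-conditioned quadratic f(x) = 0.5 * sum(a * x**2).
a = np.array([1.0, 10.0])
x_final = adagrad_diff(lambda z: a * z, [1.0, 1.0])
```

In this sketch the stepsize drops sharply after the first large gradient change and then stays nearly constant once successive gradients stop varying much, whereas an accumulator of plain gradient norms would keep growing for as long as the gradients are nonzero.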
J. H. Hoekstra, B. Györök, R. Tóth, M. Schoukens
Comments Submitted to IFAC WC 2026
Nonlinear system identification (NL-SI) has proven to be effective in obtaining accurate models for highly complex systems. Recent encoder-based methods for artificial neural network state-space (ANN-SS) models have shown state-of-the-art performance with improved computational efficiency, where the encoder is used to estimate the initial state allowing for batch optimisation methods. To address the lack of interpretability of these black-box ANN models, model augmentation approaches can be used. These combine prior available baseline models with the ANN learning components, resulting in faster convergence and more interpretable models. The combination of the encoder-based method with model augmentation has shown potential. Thus far, however, the encoder has still been treated as a black-box function in the overall estimation process, while additional information in the form of the baseline model is available to predict the model state from past input-output data. In this paper, we propose novel encoder initialisation approaches based on the available baseline model, resulting in improved noise robustness and faster convergence compared to black-box initialisation. The performance of these initialisation methods is demonstrated on a mass-spring-damper system.
Fabrizio Conca, Benjamin Jany, Alberto Ravagnani
We investigate the structure of intersecting error-correcting codes, with a particular focus on their connection to matroid theory. We establish properties and bounds for intersecting codes with the Hamming metric and illustrate how these distinguish the subfamily of minimal codes within the family of intersecting codes. We prove that the property of a code being intersecting is characterized by the matroid-theoretic notion of vertical connectivity, showing that intersecting codes are precisely those achieving the highest possible value of this parameter. We then introduce the concept of vertical connectivity for $q$-matroids and link it to the theory of intersecting codes endowed with the rank metric.
Solveig Wittig, Antonis Vasileiou, Robert R. Nerem, Timo Stoll, Floris Geerts, Yusu Wang, Christopher Morris
In recent years, there has been growing interest in understanding neural architectures' ability to learn to execute discrete algorithms, a line of work often referred to as neural algorithmic reasoning. The goal is to integrate algorithmic reasoning capabilities into larger neural pipelines. Many such architectures are based on (message-passing) graph neural networks (MPNNs), owing to their permutation equivariance and ability to deal with sparsity and variable-sized inputs. However, existing work is either largely empirical and lacks formal guarantees or it focuses solely on expressivity, leaving open the question of when and how such architectures generalize beyond a finite training set. In this work, we propose a general theoretical framework that characterizes the sufficient conditions under which MPNNs can learn an algorithm from a training set of small instances and provably approximate its behavior on inputs of arbitrary size. Our framework applies to a broad class of algorithms, including single-source shortest paths, minimum spanning trees, and general dynamic programming problems, such as the $0$-$1$ knapsack problem. In addition, we establish impossibility results for a wide range of algorithmic tasks, showing that standard MPNNs cannot learn them, and we derive more expressive MPNN-like architectures that overcome these limitations. Finally, we refine our analysis for the Bellman-Ford algorithm, yielding a substantially smaller required training set and significantly extending the recent work of Nerem et al. [2025] by allowing for a differentiable regularization loss. Empirical results largely support our theoretical findings.
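For readers unfamiliar with the target algorithm, the following is a minimal sketch of classical Bellman-Ford written as synchronous rounds of min-aggregation over incoming edges, which is the pattern a message-passing layer would need to imitate. It is not the paper's architecture, training setup, or regularization loss; the function name `bellman_ford_rounds` and the edge-list format are assumptions made for illustration.

```python
import math

def bellman_ford_rounds(num_nodes, edges, source):
    """Single-source shortest paths via synchronous relaxation rounds.

    edges: list of (u, v, w) tuples for a directed edge u -> v of weight w.
    Assumes the graph has no negative cycles reachable from the source.
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    for _ in range(num_nodes - 1):
        new_dist = list(dist)
        for u, v, w in edges:
            # "Message" from u to v is dist[u] + w; v keeps the minimum.
            new_dist[v] = min(new_dist[v], dist[u] + w)
        dist = new_dist
    return dist

# Illustrative graph with four nodes.
example_edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)]
example_dist = bellman_ford_rounds(4, example_edges, source=0)
```

Each round reads only the previous round's distances, so one round corresponds to one message-passing step with a min aggregator.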