arXivDaily: daily arXiv academic digest, updated Monday to Friday
All subject categories: 2967
2603.22808 2026-03-31 cs.CR cs.LG

Combinatorial Privacy: Private Multi-Party Bitstream Grand Sum by Hiding in Birkhoff Polytopes

Praneeth Vepakomma

Abstract

We introduce PolyVeil, a protocol for private Boolean summation across $k$ clients that encodes private bits as permutation matrices in the Birkhoff polytope. A two-layer architecture gives the server perfect simulation-based security (statistical distance zero), while a separate aggregator faces #P-hard likelihood inference via the permanent and mixed discriminant. Two variants (full and compressed) differ in what the aggregator observes. We develop a finite-sample $(\varepsilon,\delta)$-DP analysis with explicit constants. In the full variant, where the aggregator sees a doubly stochastic matrix per client, the log-Lipschitz constant grows as $n^4 K_t$, and a signal-to-noise analysis shows that the DP guarantee is non-vacuous only when the private signal is undetectable. In the compressed variant, where the aggregator sees a single scalar, the univariate density ratio yields non-vacuous $\varepsilon$ at moderate SNR, with the optimal decoy count balancing CLT accuracy against noise concentration. This exposes a fundamental tension: #P-hardness requires the full matrix view (Birkhoff structure visible), while non-vacuous DP requires the scalar view (low dimensionality). Whether both can hold simultaneously in one variant remains open. The protocol needs no PKI, has $O(k)$ communication, and outputs exact aggregates.
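The Birkhoff polytope is the set of $n \times n$ doubly stochastic matrices, and its vertices are exactly the permutation matrices (Birkhoff-von Neumann theorem). The numpy sketch below is not PolyVeil; it only illustrates the geometric fact the construction builds on: a private permutation matrix mixed with random decoy permutations under random convex weights is still doubly stochastic, so its row and column sums are all 1 regardless of the hidden bit. The bit encoding (identity vs. a swap), the number of decoys, and the Dirichlet weights are illustrative assumptions.

```python
import numpy as np

def bit_to_permutation(bit: int, n: int) -> np.ndarray:
    """Hypothetical encoding: bit 0 -> identity, bit 1 -> swap of the first two rows."""
    perm = np.arange(n)
    if bit:
        perm[[0, 1]] = perm[[1, 0]]
    return np.eye(n)[perm]

def hide_in_birkhoff(P: np.ndarray, num_decoys: int, rng: np.random.Generator):
    """Mix P with random decoy permutation matrices under random convex weights.

    Returns the mixed (doubly stochastic) matrix and the weights used.
    """
    n = P.shape[0]
    decoys = [np.eye(n)[rng.permutation(n)] for _ in range(num_decoys)]
    weights = rng.dirichlet(np.ones(num_decoys + 1))  # nonnegative, sum to 1
    M = weights[0] * P + sum(w * D for w, D in zip(weights[1:], decoys))
    return M, weights

def is_doubly_stochastic(M: np.ndarray, tol: float = 1e-9) -> bool:
    """Check nonnegativity and unit row and column sums."""
    return bool(np.all(M >= -tol)
                and np.allclose(M.sum(axis=0), 1.0, atol=tol)
                and np.allclose(M.sum(axis=1), 1.0, atol=tol))

rng = np.random.default_rng(0)
M, w = hide_in_birkhoff(bit_to_permutation(1, n=4), num_decoys=5, rng=rng)
print(is_doubly_stochastic(M))  # M lies in the Birkhoff polytope
```

Recovering which vertex carried the private bit from such a mixture is the kind of inference problem the abstract relates to permanents and mixed discriminants.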

2603.21439 2026-03-31 cs.SE cs.AI

LLM-Powered Workflow Optimization for Multidisciplinary Software Development: An Automotive Industry Case Study

Shuai Wang, Yinan Yu, Earl Barr, Dhasarathy Parthasarathy

Comments Accepted to FSE 2026 Industrial Track

Abstract

Multidisciplinary Software Development (MSD) requires domain experts and developers to collaborate across incompatible formalisms and separate artifact sets. Today, even with AI coding assistants like GitHub Copilot, this process remains inefficient; individual coding tasks are semi-automated, but the workflow connecting domain knowledge to implementation is not. Developers and experts still lack a shared view, resulting in repeated coordination, clarification rounds, and error-prone handoffs. We address this gap through a graph-based workflow optimization approach that progressively replaces manual coordination with LLM-powered services, enabling incremental adoption without disrupting established practices. We evaluate our approach on spapi, a production in-vehicle API system at Volvo Group involving 192 endpoints, 420 properties, and 776 CAN signals across six functional domains. The automated workflow achieves an F1 score of 93.7% while reducing per-API development time from approximately 5 hours to under 7 minutes, saving an estimated 979 engineering hours. In production, the system received high satisfaction ratings from both domain experts and developers, with all participants reporting full satisfaction with communication efficiency.

2603.20062 2026-03-31 cs.IR cs.AI

The End of Rented Discovery: How AI Search Redistributes Power Between Hotels and Intermediaries

Peiying Zhu, Sidi Chang

Comments 13 pages, 10 tables, Accepted to the 10th Hospitality Finance & Economics Conference (HFE 2026), Tokyo, Japan

Abstract

When a traveler asks an AI search engine to recommend a hotel, which sources get cited, and does query framing matter? We audit 1,357 grounding citations from Google Gemini across 156 hotel queries in Tokyo and document a systematic pattern we call the Intent-Source Divide. Experiential queries draw 55.9% of their citations from non-OTA sources, compared to 30.8% for transactional queries, a 25.1 percentage-point gap ($p < 5 \times 10^{-20}$). The effect is amplified in Japanese, where experiential queries draw 62.1% non-OTA citations compared to 50.0% in English, consistent with a more diverse Japanese non-OTA content ecosystem. For an industry in which hotels have long paid OTAs for demand acquisition, this pattern matters because it suggests that AI search may make hotel discovery less exclusively controlled by commission-based intermediaries.

2603.19347 2026-03-31 cs.AR cs.LG

Exploring the Agentic Frontier of Verilog Code Generation

Patrick Yubeaton, Siddharth Garg, Chinmay Hegde

Abstract

Large language models (LLMs) have made rapid advancements in code generation for popular languages such as Python and C++. Many of these recent gains can be attributed to the use of "agents" that wrap domain-relevant tools alongside LLMs. Hardware design languages such as Verilog have also seen improved code generation in recent years, but the impact of agentic frameworks on Verilog code generation tasks remains unclear. In this work, we present the first systematic evaluation of agentic LLMs for Verilog generation, using the recently introduced CVDP benchmark. We also introduce several open-source hardware design agent harnesses, providing a model-agnostic baseline for future work. Through controlled experiments across frontier models, we study how structured prompting and tool design affect performance, analyze agent failure modes and tool usage patterns, compare open-source and closed-source models, and provide qualitative examples of successful and failed agent runs. Our results show that naive agentic wrapping around frontier models can degrade performance (relative to standard forward passes with optimized prompts), but that structured harnesses meaningfully match and in some cases exceed non-agentic baselines. We find that the performance gap between open- and closed-source models is driven by both higher crash rates and weaker interpretation of tool outputs. Our exploration illuminates the path toward designing special-purpose agents for Verilog generation in the future.

2603.12702 2026-03-31 cs.IR cs.CL cs.LG

FGTR: Fine-Grained Multi-Table Retrieval via Hierarchical LLM Reasoning

Chaojie Sun, Bin Cao, Tiantian Li, Chenyu Hou, Ruizhe Li, Jing Fan

Comments Work in progress; 10 pages, 5 figures, 4 tables

Abstract

With the rapid advancement of large language models (LLMs), growing effort has been devoted to LLM-based table retrieval. However, existing studies typically focus on single-table queries and implement retrieval by similarity matching after encoding the entire table. These methods usually yield low accuracy because their coarse-grained encoding incorporates much query-irrelevant data; they are also inefficient on large tables and fail to fully exploit the reasoning capabilities of LLMs. Further, multi-table queries remain under-explored in retrieval tasks. To this end, we propose an LLM-based hierarchical multi-table query method, Fine-Grained Multi-Table Retrieval (FGTR), a new retrieval paradigm that employs a human-like reasoning strategy. Through hierarchical reasoning, FGTR first identifies relevant schema elements and then retrieves the corresponding cell contents, ultimately constructing a concise and accurate sub-table that aligns with the given query. To comprehensively evaluate the performance of FGTR, we construct two new benchmark datasets based on Spider and BIRD. Experimental results show that FGTR outperforms previous state-of-the-art methods, improving the $F_2$ metric by 18% on Spider and 21% on BIRD, demonstrating its effectiveness in enhancing fine-grained retrieval and its potential to improve end-to-end performance on table-based downstream tasks.
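The reported gains use the $F_2$ metric, the F-beta score with $\beta = 2$, which weights recall more heavily than precision; for retrieval this is plausibly a natural choice, since omitting a needed column is arguably costlier than including an extra one. The sketch below computes it for a retrieved set of schema elements. The column names are hypothetical, and the abstract does not state the exact granularity (columns, cells, tables) at which the paper scores retrieval.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def retrieval_prf(retrieved: set, relevant: set) -> tuple:
    """Precision and recall of a retrieved set against a gold set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical gold and retrieved schema elements.
gold = {"orders.id", "orders.date", "customers.name"}
retrieved = {"orders.id", "orders.date", "orders.total", "customers.name", "customers.email"}

p, r = retrieval_prf(retrieved, gold)      # p = 0.6, r = 1.0
print(f_beta(p, r, beta=2.0))              # about 0.882, versus F1 = 0.75
```

Because every gold element was retrieved, the two extra columns lower $F_2$ much less than they lower $F_1$.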

2603.12681 2026-03-31 cs.CR cs.LG

Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment

Sihao Ding

Comments Updated manuscript to better reflect the core contribution

Abstract

We show that safety alignment in modular LLMs can exhibit a compositional vulnerability: adapters that appear benign and plausibly functional in isolation can, when linearly composed, compromise safety. We study this failure mode through Colluding LoRA (CoLoRA), in which harmful behavior emerges only in the composition state. Unlike attacks that depend on adversarial prompts or explicit input triggers, this composition-triggered broad refusal suppression causes the model to comply with harmful requests under standard prompts once a particular set of adapters is loaded. This behavior exposes a combinatorial blind spot in current unit-centric defenses, for which exhaustive verification over adapter compositions is computationally intractable. Across several open-weight LLMs, we find that individual adapters remain benign in isolation while their composition yields high attack success rates, indicating that securing modular LLM supply chains requires moving beyond single-module verification toward composition-aware defenses.
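Linear composition here means that the low-rank updates of several LoRA adapters are added onto the same frozen base weight. The numpy sketch below (shapes, the scaling factor `alpha`, and the helper names are illustrative assumptions, not the paper's code) shows why checking adapters one at a time cannot see the composed model: the composed update is the sum of the individual updates, and that sum generically has higher rank than any single adapter, so it can move the weights in directions neither adapter spans alone.

```python
import numpy as np

def lora_delta(A: np.ndarray, B: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Low-rank update Delta W = alpha * B @ A (B: d_out x r, A: r x d_in)."""
    return alpha * (B @ A)

def compose(W0: np.ndarray, adapters: list) -> np.ndarray:
    """Linearly compose several (A, B, alpha) adapters onto a frozen base weight."""
    W = W0.copy()
    for A, B, alpha in adapters:
        W += lora_delta(A, B, alpha)
    return W

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2
W0 = rng.standard_normal((d_out, d_in))
adapter_1 = (rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r)), 1.0)
adapter_2 = (rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r)), 1.0)

W_12 = compose(W0, [adapter_1, adapter_2])
# The composed update is the sum of the individual updates ...
assert np.allclose(W_12 - W0, lora_delta(*adapter_1) + lora_delta(*adapter_2))
# ... and for random factors its rank is generically 2 * r, not r.
print(np.linalg.matrix_rank(W_12 - W0))
```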

2603.11560 2026-03-31 cs.MA cs.AI econ.TH math.DS

Feedback-Coupled Memory Systems: A Dynamical Model for Adaptive Coordination

Stefano Grassi

Abstract

This paper develops a dynamical framework for adaptive coordination in systems of interacting agents, referred to here as Feedback-Coupled Memory Systems (FCMS). Instead of framing coordination as equilibrium optimization or agent-centric learning, the model describes a closed-loop interaction between agents, incentives, and a persistent environment. The environment stores accumulated coordination signals, a distributed incentive field transmits them locally, and agents update in response, generating a feedback-driven dynamical system. Three main results are established. First, under dissipativity, the closed-loop system admits a bounded forward-invariant region, ensuring dynamical viability independently of global optimality. Second, when incentives depend on persistent environmental memory, coordination cannot be reduced to a static optimization problem. Third, within the FCMS class, coordination requires a bidirectional coupling in which memory-dependent incentives influence agent updates, while agent behavior reshapes the environmental state. Numerical analysis of a minimal specification identifies a Neimark-Sacker bifurcation at a critical coupling threshold $\beta_c$, providing a stability boundary for the system. Near the bifurcation threshold, recovery time diverges and variance increases, yielding a computable early-warning signature of coordination breakdown in observable time series. Additional simulations confirm robustness under nonlinear saturation and scalability to populations of up to $N = 10^{6}$ agents, supporting the framework's relevance to real-world applications. The proposed framework offers a dynamical perspective on coordination in complex systems, with potential extensions to multi-agent systems, networked interactions, and macro-level collective dynamics.
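In a discrete-time map, a Neimark-Sacker bifurcation occurs when a complex-conjugate pair of Jacobian eigenvalues at a fixed point crosses the unit circle, so the critical coupling is where $|\lambda| = 1$. The sketch below uses a hypothetical two-dimensional linearization (a rotation scaled by the coupling `beta`, so $\beta_c = 1$ by construction; this is not the paper's FCMS specification) to illustrate the early-warning effect described above: as `beta` approaches the threshold, the e-folding recovery time $-1/\ln|\lambda|$ of a small perturbation diverges.

```python
import numpy as np

def jacobian(beta: float, theta: float = 0.7) -> np.ndarray:
    """Hypothetical linearization at the fixed point: a rotation by theta scaled by beta."""
    c, s = np.cos(theta), np.sin(theta)
    return beta * np.array([[c, -s], [s, c]])

def spectral_radius(J: np.ndarray) -> float:
    """Largest eigenvalue modulus of J."""
    return float(np.max(np.abs(np.linalg.eigvals(J))))

def recovery_time(J: np.ndarray) -> float:
    """E-folding decay time -1 / ln|lambda_max|; infinite on or outside the unit circle."""
    rho = spectral_radius(J)
    return float("inf") if rho >= 1.0 else -1.0 / np.log(rho)

for beta in (0.80, 0.95, 0.99, 0.999):
    J = jacobian(beta)
    eig = np.linalg.eigvals(J)
    print(f"beta={beta}: |lambda|={spectral_radius(J):.3f}, "
          f"complex pair={np.iscomplex(eig).all()}, recovery time={recovery_time(J):.1f}")
```

The eigenvalues here are $\beta e^{\pm i\theta}$, so they stay complex and only their modulus approaches 1; this is the linear signature of the bifurcation, while the invariant circle that appears beyond it requires nonlinear terms the sketch omits.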

2603.09964 2026-03-31 cs.HC cs.AI cs.ET

Understanding the Use of a Large Language Model-Powered Guide to Make Virtual Reality Accessible for Blind and Low Vision People

Jazmin Collins, Sharon Y Lin, Tianqi Liu, Andrea Stevenson Won, Shiri Azenkot

Comments 16 pages, 5 figures, 3 tables, Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26), April 13-17, 2026, Barcelona, Spain. ACM

Abstract

As social virtual reality (VR) grows more popular, addressing accessibility for blind and low vision (BLV) users is increasingly critical. Researchers have proposed an AI "sighted guide" to help users navigate VR and answer their questions, but it has not been studied with users. To address this gap, we developed a large language model (LLM)-powered guide and studied its use with 16 BLV participants in virtual environments with confederates posing as other users. We found that when alone, participants treated the guide as a tool, but treated it companionably around others, giving it nicknames, rationalizing its mistakes with its appearance, and encouraging confederate-guide interaction. Our work furthers understanding of guides as a versatile method for VR accessibility and presents design recommendations for future guides.

2603.09455 2026-03-31 cs.SE cs.AI cs.LO

Declarative Scenario-based Testing with RoadLogic

Ezio Bartocci, Alessio Gambi, Felix Gigler, Cristinel Mateis, Dejan Ničković

Comments Accepted at the 29th ACM International Conference on Hybrid Systems: Computation and Control (HSCC 2026). The final version will appear in the ACM Digital Library

Abstract

Scenario-based testing is a key method for cost-effective and safe validation of autonomous vehicles (AVs). Existing approaches rely on imperative scenario definitions, requiring developers to manually enumerate numerous variants to achieve coverage. Declarative languages, such as ASAM OpenSCENARIO DSL (OS2), raise the abstraction level but lack systematic methods for instantiating concrete, specification-compliant scenarios. To our knowledge, no open-source solution currently provides this capability. We present RoadLogic, which bridges declarative OS2 specifications and executable simulations. It uses Answer Set Programming to generate abstract plans satisfying scenario constraints, motion planning to refine the plans into feasible trajectories, and specification-based monitoring to verify correctness. We evaluate RoadLogic by instantiating representative OS2 scenarios executed in the CommonRoad framework. Results show that RoadLogic consistently produces realistic, specification-satisfying simulations within minutes and captures diverse behavioral variants through parameter sampling, thus opening the door to systematic scenario-based testing for autonomous driving systems.

2603.01499 2026-03-31 cs.CR cs.AI

Towards Privacy-Preserving LLM Inference via Covariant Obfuscation (Technical Report)

Yu Lin, Qizhi Zhang, Wenqiang Ruan, Daode Zhang, Jue Hong, Ye Wu, Hanning Xia, Yunlong Mao, Sheng Zhong

Abstract

The rapid development of large language models (LLMs) has driven the widespread adoption of cloud-based LLM inference services, while also bringing prominent privacy risks associated with the transmission and processing of private data in remote inference. For privacy-preserving LLM inference technologies to be practically applied in industrial scenarios, three core requirements must be satisfied simultaneously: (1) accuracy and efficiency losses should be minimized to mitigate degradation in service experience; (2) the inference process should be runnable on large-scale clusters consisting of heterogeneous legacy xPUs; (3) compatibility with existing LLM infrastructures should be ensured to reuse their engineering optimizations. To the best of our knowledge, none of the existing privacy-preserving LLM inference methods satisfies all the above constraints while delivering meaningful privacy guarantees. In this paper, we propose AloePri, the first privacy-preserving LLM inference method for industrial applications. AloePri protects both the input and output data by covariant obfuscation, which jointly transforms data and model parameters to achieve better accuracy and privacy. We carefully design the transformation for each model component to ensure inference accuracy and data privacy while keeping full compatibility with existing infrastructures of Language Model as a Service. AloePri has been integrated into an industrial system for the evaluation of mainstream LLMs. The evaluation on the DeepSeek-V3.1-Terminus model (671B parameters) demonstrates that AloePri causes an accuracy loss of 0.0% to 3.5% and exhibits efficiency equivalent to that of plaintext inference. Meanwhile, AloePri successfully resists state-of-the-art attacks, with less than 5% of tokens recovered. To the best of our knowledge, AloePri is the first method to exhibit practical applicability to large-scale models in real-world systems.
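The core idea named in the abstract, transforming data and model parameters jointly so that the computation is preserved, can be illustrated on a single linear layer: if activations are multiplied by a secret invertible matrix and the weights by its inverse, the layer output is unchanged. The numpy sketch below shows only this generic invariance, using a random orthogonal matrix as a hypothetical secret; it is a toy under stated assumptions, not AloePri's design, and it does not address nonlinear components (attention, normalization, activations), for which the abstract says component-specific transformations are designed.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_tokens = 16, 8, 4

W = rng.standard_normal((d_in, d_out))       # plaintext weight of a linear layer
x = rng.standard_normal((n_tokens, d_in))    # private activations

# Hypothetical secret transform: a random orthogonal matrix from a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((d_in, d_in)))

x_obf = x @ Q          # obfuscated activations sent to the server
W_obf = Q.T @ W        # weight transformed covariantly (Q orthogonal, so Q^-1 = Q^T)

y_plain = x @ W
y_obf = x_obf @ W_obf  # computed by the server without seeing x

print(np.allclose(y_plain, y_obf))  # the layer output is unchanged
```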

2602.20168 2026-03-31 cs.CY cs.AI cs.LG

Benchmarking Early Deterioration Prediction Across Hospital-Rich and MCI-Like Emergency Triage Under Constrained Sensing

KMA Solaiman, Joshua Sebastian, Karma Tobden

Comments Accepted at the 14th IEEE International Conference on Healthcare Informatics (ICHI) 2026. 10 pages, 4 figures, 6 tables

Abstract

Emergency triage decisions are made under severe information constraints, yet most data-driven deterioration models are evaluated using signals unavailable during initial assessment. We present a leakage-aware benchmarking framework for early deterioration prediction that evaluates model performance under realistic, time-limited sensing conditions. Using a patient-deduplicated cohort derived from MIMIC-IV-ED, we compare hospital-rich triage with a vitals-only, MCI-like setting, restricting inputs to information available within the first hour of presentation. Across multiple modeling approaches, predictive performance declines only modestly when limited to vitals, indicating that early physiological measurements retain substantial clinical signal. Structured ablation and interpretability analyses identify respiratory and oxygenation measures as the most influential contributors to early risk stratification, with models exhibiting stable, graceful degradation as sensing is reduced. This work provides a clinically grounded benchmark to support the evaluation and design of deployable triage decision-support systems in resource-constrained settings.

2602.18482 2026-03-31 physics.comp-ph cond-mat.stat-mech cs.LG stat.ML

Boltzmann Generators for Condensed Matter via Riemannian Flow Matching

Emil Hoffmann, Maximilian Schebek, Leon Klein, Frank Noé, Jutta Rogal

Comments Published as a workshop paper at AI4MAT, ICLR 2026

Abstract

Sampling equilibrium distributions is fundamental to statistical mechanics. While flow matching has emerged as a scalable, state-of-the-art paradigm for generative modeling, its potential for equilibrium sampling in condensed-phase systems remains largely unexplored. We address this by incorporating the periodicity inherent to these systems into continuous normalizing flows using Riemannian flow matching. The high computational cost of exact density estimation intrinsic to continuous normalizing flows is mitigated by Hutchinson's trace estimator, combined with a crucial bias-correction step based on a cumulant expansion that renders the stochastic estimates suitable for rigorous thermodynamic reweighting. Our approach is validated on monatomic ice, demonstrating the ability to train on systems of unprecedented size and obtain highly accurate free energy estimates without the need for traditional multistage estimators.
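In a continuous normalizing flow, the change in log-density involves the trace of the Jacobian of the vector field, which is expensive to compute exactly in high dimension. Hutchinson's estimator replaces the trace with an average of $v^\top A v$ over random probe vectors, needing only matrix-vector (in practice, vector-Jacobian) products. The numpy sketch below demonstrates the estimator on an explicit matrix; it omits the cumulant-expansion bias correction described in the abstract, which is presumably needed because exponentiating an unbiased but noisy log-density estimate yields biased importance weights.

```python
import numpy as np

def hutchinson_trace(matvec, dim: int, num_probes: int, rng: np.random.Generator) -> float:
    """Unbiased estimate of tr(A) using only products v -> A @ v.

    Uses Rademacher probes v (entries +/-1), for which E[v^T A v] = tr(A).
    """
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ matvec(v)
    return total / num_probes

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
est = hutchinson_trace(lambda v: A @ v, dim=50, num_probes=5000, rng=rng)
print(est, np.trace(A))  # close, with error shrinking like 1 / sqrt(num_probes)
```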

2601.20404 2026-03-31 cs.SE cs.AI cs.ET cs.HC

On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents

Jai Lal Lulla, Seyedmoein Mohsenimofidi, Matthias Galster, Jie M. Zhang, Sebastian Baltes, Christoph Treude

Comments 5 pages, 1 figure, 1 table

Abstract

AI coding agents such as Codex and Claude Code are increasingly used to autonomously contribute to software repositories. However, little is known about how repository-level configuration artifacts affect the operational efficiency of these agents. In this paper, we study the impact of AGENTS.md files on the runtime and token consumption of AI coding agents operating on GitHub pull requests. We analyze 10 repositories and 124 pull requests, executing agents under two conditions: with and without an AGENTS.md file. We measure wall-clock execution time and token usage during agent execution. Our results show that the presence of AGENTS.md is associated with a lower median runtime (a 28.64% reduction) and reduced output token consumption (a 16.58% reduction), while maintaining comparable task completion behavior. Based on these results, we discuss immediate implications for the configuration and deployment of AI coding agents in practice, and outline a broader research agenda on the role of repository-level instructions in shaping the behavior, efficiency, and integration of AI coding agents in software development workflows.

2601.18857 2026-03-31 stat.ML cs.LG

Statistical Inference for Explainable Boosting Machines

Haimo Fang, Kevin Tan, Jonathan Pipping-Gamon, Giles Hooker

Comments Accepted to AISTATS 2026 (poster)

Abstract

Explainable boosting machines (EBMs) are popular "glass-box" models that learn a set of univariate functions using boosted trees. These achieve explainability through visualizations of each feature's effect. However, unlike linear model coefficients, uncertainty quantification for the learned univariate functions requires computationally intensive bootstrapping, making it hard to know which features truly matter. We provide an alternative using recent advances in statistical inference for gradient boosting, deriving methods for statistical inference as well as end-to-end theoretical guarantees. Using a moving average instead of a sum of trees (Boulevard regularization) allows the boosting process to converge to a feature-wise kernel ridge regression. This produces asymptotically normal predictions that achieve the minimax-optimal MSE of $O(p n^{-2/3})$ for fitting Lipschitz GAMs with $p$ features, successfully avoiding the curse of dimensionality. We then construct prediction intervals for the response and confidence intervals for each learned univariate function with a runtime independent of the number of datapoints, enabling further explainability within EBMs. Code is available at https://github.com/hetankevin/ebm-inference.

2601.15109 2026-03-31 cs.SI cs.AI cs.CY cs.HC cs.MA

An Agentic Operationalization of DISARM for FIMI Investigation on Social Media

Kevin Tseng, Juan Carlos Toledano, Bart De Clerck, Yuliia Dukach, Phil Tinn

Comments This paper was originally presented at the International Conference on Military Communication and Information Systems (ICMCIS), organized by the Information Systems Technology (IST) Scientific and Technical Committee; the ICMCIS (IST-224-RSY) was held in Bath, United Kingdom, 12-13 May 2026

Abstract

Interoperable data and intelligence flows among allied partners and operational end-users remain essential to NATO's collective defense across both conventional and hybrid threat environments. Foreign Information Manipulation and Interference (FIMI) increasingly spans multiple societal domains and information ecosystems, complicating threat characterization, persistent situational awareness, and coordinated response. Concurrent advances in AI have further lowered the barrier to conducting large-scale, AI-augmented FIMI activities, including automated generation, personalization, and amplification of manipulative content. While frameworks such as DISARM offer a standardized analytical and metadata schema for characterizing FIMI incidents, their practical application for automating large-scale detection remains challenging. We present a framework-agnostic, agent-based operationalization of DISARM piloted to support FIMI investigation on social platforms. Our agent coordination pipeline integrates general agentic AI components that (1) identify candidate manipulative behaviors in social-media data and (2) map these behaviors to DISARM taxonomies through transparent, auditable reasoning steps. Evaluation on two practitioner-annotated, real-world datasets demonstrates that our approach can effectively scale analytic workflows that are currently manual, time-intensive, and interpretation-heavy. Notably, the experiment surfaced more than 30 Russian bot accounts, deployed for the 2025 election in Moldova, that had gone undetected during the prior non-agentic investigation. By enhancing analytic throughput, interoperability, and explainability, the proposed approach provides a direct contribution to defense policy and planning needs for improved situational awareness, cross-partner data integration, and rapid assessment of information-environment threats.

2601.07370 2026-03-31 cond-mat.soft cs.RO physics.app-ph physics.flu-dyn physics.med-ph

Magnetically Driven Elastic Microswimmers: Exploiting Hysteretic Collapse for Autonomous Propulsion and Independent Control

Theo Lequy, Andreas M. Menzel

Comments 12 pages, 7 figures, submitted to ACS Nanoscience Au

Abstract

When swimming at low Reynolds numbers, inertial effects are negligible and reciprocal movements cannot induce net motion. Instead, symmetry breaking is necessary to achieve net propulsion. Directed swimming can be supported by magnetic fields, which simultaneously provide a versatile means of remote actuation. Thus, we analyze the motion of a straight microswimmer composed of three magnetizable beads connected by two elastic links. The swimming mechanism is based on oriented external magnetic fields that oscillate in magnitude. Through induced reversible hysteretic collapse of the two segments of the swimmer, the two pairs of beads jump into contact and separate nonreciprocally. Due to higher-order hydrodynamic interactions, net displacement results after each cycle. Different microswimmers can be tuned to different driving amplitudes and frequencies, allowing for simultaneous independent control by just one external magnetic field. The swimmer geometry and magnetic field shape are optimized for maximum swimming speed using an evolutionary optimization strategy. Thanks to the simple working principle, an experimental realization of such a microrobot seems feasible and may open new approaches for microinvasive medical interventions such as targeted drug delivery.

2512.19846 2026-03-31 eess.SY cs.RO cs.SY

A Class of Axis-Angle Attitude Control Laws for Rotational Systems

Francisco M. F. R. Gonçalves, Ryan M. Bena, Néstor O. Pérez-Arancibia

Comments 6 pages, 4 figures. Published in IEEE Control Systems Letters

Abstract

We introduce a new class of attitude control laws for rotational systems; the proposed framework generalizes the use of the Euler axis-angle representation beyond quaternion-based formulations. Using basic Lyapunov stability theory and the notion of extended class-$\mathcal{K}$ functions, we develop a method for determining and enforcing the global asymptotic stability of the single fixed point of the resulting closed-loop (CL) scheme. In contrast with traditional quaternion-based methods, the introduced generalized axis-angle approach enables greater flexibility in the design of the control law, which is of great utility when employed in combination with a switching scheme whose transition state depends on the angular velocity of the controlled rotational system. Through simulation and real-time experimental results, we demonstrate the effectiveness of the developed formulation. According to the recorded data, in the execution of high-speed tumble-recovery maneuvers, the new method consistently achieves shorter stabilization times and requires lower control effort than the quaternion-based and geometric-control methods used as benchmarks.
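The Euler axis-angle representation describes an attitude error as a rotation by angle $\theta$ about a unit axis $\mathbf{n}$; the corresponding unit quaternion is $q = (\cos(\theta/2), \sin(\theta/2)\,\mathbf{n})$. The sketch below extracts $(\mathbf{n}, \theta)$ from a quaternion and evaluates a simple PD-style torque $\tau = -k_p\, f(\theta)\, \mathbf{n} - k_d\, \omega$, where the shaping function $f$ is left as a parameter to loosely mirror the design freedom the abstract highlights. The gains, the default $f(\theta) = \theta$, and the function names are illustrative assumptions, not the control laws from the paper.

```python
import numpy as np

def quat_to_axis_angle(q: np.ndarray) -> tuple:
    """Unit quaternion (w, x, y, z) -> (unit axis, angle in [0, pi])."""
    q = q / np.linalg.norm(q)
    if q[0] < 0:            # q and -q encode the same attitude; take the shorter rotation
        q = -q
    w, v = q[0], q[1:]
    s = np.linalg.norm(v)
    if s < 1e-12:           # no rotation: the axis is arbitrary
        return np.array([1.0, 0.0, 0.0]), 0.0
    theta = 2.0 * np.arctan2(s, w)
    return v / s, float(theta)

def axis_angle_torque(q, omega, kp=2.0, kd=0.5, shape=lambda th: th):
    """Hypothetical PD-style law: tau = -kp * shape(theta) * n - kd * omega."""
    n, theta = quat_to_axis_angle(np.asarray(q, dtype=float))
    return -kp * shape(theta) * n - kd * np.asarray(omega, dtype=float)

# Attitude error: a 90-degree rotation about the z-axis
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
n, theta = quat_to_axis_angle(q)
print(n, np.degrees(theta))
print(axis_angle_torque(q, omega=[0.0, 0.0, 0.0]))
```

Swapping `shape` for a saturating function (for example `np.tanh`) changes how aggressively large errors are corrected without changing the axis along which the torque acts.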

2511.15090 2026-03-31 cs.DB cs.AI cs.CV

SciEGQA: A Dataset for Scientific Evidence-Grounded Question Answering and Reasoning

Wenhan Yu, Zhaoxi Zhang, Wang Chen, Guanqiang Qi, Weikang Li, Lei Sha, Deguo Xia, Jizhou Huang

Comments 8 pages, 4 figures, 3 tables

Abstract

Scientific documents contain complex multimodal structures, which makes evidence localization and scientific reasoning in Document Visual Question Answering particularly challenging. However, most existing benchmarks evaluate models only at the page level without explicitly annotating the evidence regions that support the answer, which limits both interpretability and the reliability of evaluation. To address this limitation, we introduce SciEGQA, a scientific document question answering and reasoning dataset with semantic evidence grounding, where supporting evidence is represented as semantically coherent document regions annotated with bounding boxes. SciEGQA consists of two components: a human-annotated fine-grained benchmark containing 1,623 high-quality question-answer pairs, and a large-scale automatically constructed training set with over 30K QA pairs generated through an automated data construction pipeline. Extensive experiments on a wide range of Vision-Language Models (VLMs) show that existing models still struggle with evidence localization and evidence-based question answering in scientific documents. Training on the proposed dataset significantly improves the scientific reasoning capabilities of VLMs. The project page is available at https://yuwenhan07.github.io/SciEGQA-project/.

2511.02069 2026-03-31 physics.soc-ph cs.CL

Complete asymptotic type-token relationship for growing complex systems with inverse power-law count rankings

Pablo Rosillo-Rodes, Laurent Hébert-Dufresne, Peter Sheridan Dodds

Comments 5 pages, 2 figures

Journal ref Phys. Rev. Research 8, L012029 (2026)
Abstract

The growth dynamics of complex systems often exhibit statistical regularities involving power-law relationships. For real finite complex systems formed by countable tokens (animals, words) as instances of distinct types (species, dictionary entries), an inverse power-law scaling $S \sim r^{-\alpha}$ between type count $S$ and type rank $r$, commonly known as Zipf's law, is widely observed to varying degrees of fidelity. A secondary, summary relationship is Heaps' law, which states that the number of types scales sublinearly with the total number of observed tokens in a growing system. Here, we propose an idealized model of a growing system that (1) deterministically produces arbitrary inverse power-law count rankings for types, and (2) allows us to determine the exact asymptotics of the type-token relationship. Our argument improves upon and remedies earlier work. We obtain a unified asymptotic expression for all values of $\alpha$, which corrects the special cases of $\alpha = 1$ and $\alpha \gg 1$. Our approach relies solely on the form of count rankings, avoids unnecessary approximations, and does not involve any stochastic mechanisms or sampling processes. We thereby demonstrate that a general type-token relationship arises solely as a consequence of Zipf's law.

2510.25974 2026-03-31 cs.HC cs.LG

Who Leads? Comparing Human-Centric and Model-Centric Strategies for Defining ML Target Variables

Mengtian Guo, David Gotz, Yue Wang

Comments 23 pages, 6 figures

Abstract

Predictive modeling has the potential to enhance human decision-making. However, many predictive models fail in practice because the problem is poorly formulated, particularly when the prediction target is an abstract concept or construct and practitioners must define an appropriate target variable as a proxy to operationalize the construct of interest. The choice of an appropriate proxy target variable is rarely self-evident in practice, requiring both domain knowledge and iterative data modeling. This process is inherently collaborative, involving both domain experts and data scientists. In this work, we explore how human-machine teaming can support this process by accelerating iterations while preserving human judgment. We study the impact of two human-machine teaming strategies on proxy construction: 1) relevance-first: humans lead the process by selecting relevant proxies, and 2) performance-first: machines lead the process by recommending proxies based on predictive performance. Based on a controlled user study of a proxy construction task (N = 20), we show that the performance-first strategy facilitated faster iterations and decision-making, but also biased users towards well-performing proxies that are misaligned with the application goal. Our study highlights the opportunities and risks of human-machine teaming in operationalizing machine learning target variables, yielding insights for future research to explore the opportunities and mitigate the risks.

2510.15058 2026-03-31 stat.ML cs.LG math.ST stat.TH

The Minimax Lower Bound of Kernel Stein Discrepancy Estimation

Jose Cribeiro-Ramallo, Agnideep Aich, Florian Kalinke, Ashit Baran Aich, Zoltán Szabó

Comments Accepted for publication at AISTATS 2026

Abstract

Kernel Stein discrepancies (KSDs) have emerged as a powerful tool for quantifying goodness-of-fit over the last decade, featuring numerous successful applications. To the best of our knowledge, all existing KSD estimators with known rate achieve $\sqrt n$-convergence. In this work, we present two complementary results (with different proof strategies), establishing that the minimax lower bound of KSD estimation is $n^{-1/2}$ and settling the optimality of these estimators. Our first result focuses on KSD estimation on $\mathbb R^d$ with the Langevin-Stein operator; our explicit constant for the Gaussian kernel indicates that the difficulty of KSD estimation may increase exponentially with the dimensionality $d$. Our second result settles the minimax lower bound for KSD estimation on general domains.
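For intuition about the estimators whose rate the paper shows to be optimal, the sketch below implements the standard U-statistic estimate of squared KSD in one dimension with the Langevin-Stein operator and a Gaussian kernel, for a standard normal target. The bandwidth `h`, the sample sizes, and the shifted alternative are illustrative choices; estimators of this kind are the ones known to converge at the $\sqrt n$ rate that the paper proves cannot be improved in the minimax sense.

```python
import numpy as np

def ksd_squared_u(x: np.ndarray, score, h: float = 1.0) -> float:
    """U-statistic estimate of KSD^2 in 1-D (Langevin-Stein operator, Gaussian kernel).

    score(x) = d/dx log p(x) for the target p; k(x, y) = exp(-(x - y)^2 / (2 h^2)).
    """
    x = np.asarray(x, dtype=float)
    diff = x[:, None] - x[None, :]              # x_i - x_j
    K = np.exp(-diff**2 / (2 * h**2))
    dk_dx = -diff / h**2 * K                    # d/dx k(x, y)
    dk_dy = diff / h**2 * K                     # d/dy k(x, y)
    d2k = (1 / h**2 - diff**2 / h**4) * K       # d^2/(dx dy) k(x, y)
    s = score(x)
    U = (s[:, None] * s[None, :] * K            # Stein kernel u_p(x_i, x_j)
         + s[:, None] * dk_dy
         + s[None, :] * dk_dx
         + d2k)
    n = len(x)
    np.fill_diagonal(U, 0.0)                    # U-statistic: drop the i == j terms
    return U.sum() / (n * (n - 1))

def score_std_normal(x):
    """Score of the target p = N(0, 1)."""
    return -x

rng = np.random.default_rng(0)
ksd_null = ksd_squared_u(rng.normal(0.0, 1.0, 500), score_std_normal)  # samples from p
ksd_alt = ksd_squared_u(rng.normal(3.0, 1.0, 500), score_std_normal)   # shifted samples
print(ksd_null, ksd_alt)  # near 0 vs. clearly positive
```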

2510.10324 2026-03-31 stat.ML cs.LG

On some practical challenges of conformal prediction

Liang Hong, Noura Raydan Nasreddine

Abstract

Conformal prediction is a model-free machine learning method for constructing prediction regions at a guaranteed coverage probability level. However, a data scientist often faces three challenges in practice: (i) the determination of a conformal prediction region is only approximate, jeopardizing the finite-sample validity of prediction, (ii) the computation required could be prohibitively expensive, and (iii) the shape of a conformal prediction region is hard to control. This article offers new insights into the relationship among the monotonicity of the non-conformity measure, the monotonicity of the plausibility function, and the exact determination of a conformal prediction region. Based on these new insights, we propose a quadratic-polynomial non-conformity measure that allows a data scientist to circumvent the three challenges simultaneously within the full conformal prediction framework.
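The first challenge, that the region is only determined approximately, is visible in a minimal full conformal sketch. It assumes a scalar response with no covariates and a hypothetical non-conformity measure |y_i - mean|, computed on the augmented sample (this is not the paper's quadratic-polynomial measure). The region {y : plausibility(y) > alpha} has coverage of at least 1 - alpha under exchangeability, but here it is read off a finite grid of candidate values, which is where the approximation enters.

```python
def plausibility(data, candidate):
    """Full conformal plausibility of `candidate` given observed scalars in `data`."""
    augmented = list(data) + [candidate]
    center = sum(augmented) / len(augmented)
    # Non-conformity score of every point, refit on the augmented sample (hypothetical measure)
    scores = [abs(v - center) for v in augmented]
    test_score = scores[-1]
    # Fraction of augmented points at least as non-conforming as the candidate
    return sum(s >= test_score for s in scores) / len(augmented)


def conformal_region_on_grid(data, alpha, grid):
    """Grid approximation of the level-(1 - alpha) region {y : plausibility(y) > alpha}."""
    return [y for y in grid if plausibility(data, y) > alpha]


data = [1.0, 2.0, 3.0, 4.0, 5.0]
grid = [0.5 * i for i in range(-20, 41)]
region = conformal_region_on_grid(data, alpha=0.2, grid=grid)
print(min(region), max(region))
```

Every candidate requires refitting the score on n + 1 points, so a fine grid over a wide range also illustrates the second challenge, computational cost.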

2510.09328 2026-03-31 cs.CG cs.AI

Randomized HyperSteiner: A Stochastic Delaunay Triangulation Heuristic for the Hyperbolic Steiner Minimal Tree

Aniss Aiman Medbouhi, Alejandro García-Castellanos, Giovanni Luca Marchetti, Daniel Pelt, Erik J Bekkers, Danica Kragic

Abstract

We study the problem of constructing Steiner Minimal Trees (SMTs) in hyperbolic space. Exact SMT computation is NP-hard, and existing hyperbolic heuristics such as HyperSteiner are deterministic and often get trapped in locally suboptimal configurations. We introduce Randomized HyperSteiner (RHS), a stochastic Delaunay triangulation heuristic that incorporates randomness into the expansion process and refines candidate trees via Riemannian gradient descent optimization. Experiments on synthetic data sets and a real-world single-cell transcriptomic data set show that RHS outperforms Minimum Spanning Tree (MST), Neighbour Joining, and vanilla HyperSteiner (HS). In near-boundary configurations, RHS can achieve a 32% reduction in total length over HS, demonstrating its effectiveness and robustness in diverse data regimes.
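The MST baseline is itself a feasible Steiner tree with no extra points, so the Steiner minimal tree is never longer than it. The sketch below computes that baseline with Prim's algorithm over hyperbolic geodesic distances. It assumes the Poincaré disk model (the abstract does not say which model of hyperbolic space RHS works in) and is not the RHS heuristic.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare disk (assumed model)."""
    du = 1.0 - sum(a * a for a in u)
    dv = 1.0 - sum(b * b for b in v)
    diff2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff2 / (du * dv))


def mst_length(points):
    """Total hyperbolic length of a minimum spanning tree, via O(n^2) Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each point into the current tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        i = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j])
        in_tree[i] = True
        total += best[i]
        for j in range(n):
            if not in_tree[j]:
                best[j] = min(best[j], poincare_distance(points[i], points[j]))
    return total


print(mst_length([(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0)]))
```

For the origin and (±0.5, 0), each outer point lies at distance ln 3 from the origin and the geodesic between them passes through it, so the MST length is 2 ln 3.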

2509.19315 2026-03-31 eess.SP cs.AI cs.LG

Advancing Few-Shot Pediatric Arrhythmia Classification with a Novel Contrastive Loss and Multimodal Learning

Yiqiao Chen, Zijian Huang, Zhenghui Feng

Comments 12 pages, 9 figures

Abstract

Arrhythmias are a major cause of sudden cardiac death in children, making automated rhythm classification from electrocardiograms (ECGs) clinically important. However, pediatric arrhythmia analysis remains challenging because of age-dependent waveform variability, limited data availability, and a pronounced long-tailed class distribution that hinders recognition of rare but clinically important rhythms. To address these issues, we propose a multimodal end-to-end framework that integrates surface ECG and intracardiac electrogram (IEGM) signals for pediatric arrhythmia classification. The model combines dual-branch feature encoders, attention-based cross-modal fusion, and a lightweight Transformer classifier to learn complementary electrophysiological representations. We further introduce an Adaptive Global Class-Aware Contrastive Loss (AGCACL), which incorporates prototype-based alignment, class-frequency reweighting, and globally informed hard-class modulation to improve intra-class compactness and inter-class separability under class imbalance. We evaluate the proposed method on the pediatric subset of the Leipzig Heart Center ECG-Database and establish a reproducible preprocessing pipeline including rhythm-segment construction, denoising, and label grouping. The proposed approach achieves 96.22% Top-1 accuracy and improves macro precision, macro recall, macro F1 score, and macro F2 score by 4.48, 1.17, 6.98, and 7.34 percentage points, respectively, over the strongest baseline. These results indicate improved minority-sensitive classification performance on the current benchmark. However, further validation under subject-independent and multicenter settings is still required before clinical translation.
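Macro-averaged scores weight every rhythm class equally, so they expose minority-class errors that Top-1 accuracy hides. The sketch below uses the standard per-class F-beta definition, F_beta = (1 + beta^2) P R / (beta^2 P + R). It adopts a hypothetical convention that an undefined precision, recall or F-beta counts as 0; the abstract does not state how the paper handles zero division.

```python
def macro_f_beta(y_true, y_pred, beta=1.0):
    """Unweighted mean of per-class F-beta scores over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    b2 = beta * beta
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Hypothetical convention: undefined ratios count as 0
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = b2 * precision + recall
        scores.append((1 + b2) * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]
print(macro_f_beta(y_true, y_pred, beta=1.0), macro_f_beta(y_true, y_pred, beta=2.0))
```

In this example accuracy is 0.8, but macro F1 is 4/9 and macro F2 is 10/21, because the missed minority class scores 0. Setting beta = 2 weights recall more heavily than precision.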

2508.08517 2026-03-31 stat.ML cs.CE cs.LG

Projection-based multifidelity linear regression for data-scarce applications

Vignesh Sella, Julie Pham, Karen Willcox, Anirban Chaudhuri

Comments 23 pages, 7 figures, submitted to Machine Learning for Computational Science and Engineering special issue Accelerating Numerical Methods With Scientific Machine Learning

Journal ref Mach. Learn. Comput. Sci. Eng. 1, 47 (2025)
Abstract

Surrogate modeling for systems with high-dimensional quantities of interest remains challenging, particularly when training data are costly to acquire. This work develops multifidelity methods for multiple-input multiple-output linear regression targeting data-limited applications with high-dimensional outputs. Multifidelity methods integrate many inexpensive low-fidelity model evaluations with limited, costly high-fidelity evaluations. We introduce two projection-based multifidelity linear regression approaches that leverage principal component basis vectors for dimensionality reduction and combine multifidelity data through: (i) a direct data augmentation using low-fidelity data, and (ii) a data augmentation incorporating explicit linear corrections between low-fidelity and high-fidelity data. The data augmentation approaches combine high-fidelity and low-fidelity data into a unified training set and train the linear regression model through weighted least squares with fidelity-specific weights. Various weighting schemes and their impact on regression accuracy are explored. The proposed multifidelity linear regression methods are demonstrated on approximating the surface pressure field of a hypersonic vehicle in flight. In a low-data regime of no more than ten high-fidelity samples, multifidelity linear regression achieves approximately 3% to 12% improvement in median accuracy compared to single-fidelity methods with comparable computational cost.
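The weighted least squares step shared by both data augmentation approaches can be illustrated on a toy problem. This minimal sketch assumes a single scalar input and output; the paper instead regresses onto principal component coefficients of a high-dimensional output. The fidelity weights `w_hf` and `w_lf` are hypothetical, and so is the simple form of linear correction used here, in which high-fidelity outputs are regressed on low-fidelity outputs at shared inputs.

```python
def weighted_linear_fit(xs, ys, ws):
    """Closed-form weighted least squares for y = intercept + slope * x."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def multifidelity_fit(hf_x, hf_y, lf_x, lf_y, w_hf=1.0, w_lf=0.1):
    """Approach (i) sketch: stack HF and LF samples and weight each by its fidelity."""
    xs = list(hf_x) + list(lf_x)
    ys = list(hf_y) + list(lf_y)
    ws = [w_hf] * len(hf_x) + [w_lf] * len(lf_x)
    return weighted_linear_fit(xs, ys, ws)


def correct_low_fidelity(paired_lf_y, paired_hf_y, lf_y):
    """Approach (ii) sketch (hypothetical form): map LF outputs through hf ~ c0 + c1 * lf."""
    c0, c1 = weighted_linear_fit(paired_lf_y, paired_hf_y, [1.0] * len(paired_lf_y))
    return [c0 + c1 * y for y in lf_y]


hf_x, hf_y = [0.0, 1.0], [1.0, 3.0]            # HF samples on y = 2x + 1
lf_x, lf_y = [0.0, 0.5, 1.0], [1.5, 2.5, 3.5]  # LF samples biased upward by 0.5
print(multifidelity_fit(hf_x, hf_y, lf_x, lf_y, w_lf=1.0))
corrected = correct_low_fidelity([1.5, 3.5], hf_y, lf_y)  # LF outputs at the HF inputs 0 and 1
print(multifidelity_fit(hf_x, hf_y, lf_x, corrected, w_lf=1.0))
```

With equal weights the biased low-fidelity points pull the intercept to 1.3. Correcting them first recovers intercept 1 and slope 2, and setting `w_lf = 0` reduces to a high-fidelity-only fit.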

2507.05147 2026-03-31 cond-mat.stat-mech cond-mat.dis-nn cs.LG

Pseudo-likelihood produces associative memories able to generalize, even for asymmetric couplings

Francesco D'Amico, Dario Bocchi, Luca Maria Del Bono, Saverio Rossi, Matteo Negri

Journal ref Physica A: Statistical Mechanics and its Applications, Volume 692, 2026, 131497, ISSN 0378-4371
Abstract

Energy-based probabilistic models learned by maximizing the likelihood of the data are limited by the intractability of the partition function. A widely used workaround is to maximize the pseudo-likelihood, which replaces the global normalization with tractable local normalizations. Here we show that, in the zero-temperature limit, a network trained to maximize pseudo-likelihood naturally implements an associative memory: if the training set is small, patterns become fixed-point attractors whose basins of attraction exceed those of any classical Hopfield rule. We quantitatively explain this effect for uncorrelated random patterns. Moreover, we show that, for different structured datasets coming from computer science (random feature model, MNIST), physics (spin glasses) and biology (proteins), as the number of training examples increases the learned network goes beyond memorization, developing meaningful attractors with non-trivial correlations with test examples, thus showing the ability to generalize. Our results therefore reveal that pseudo-likelihood works both as an efficient inference tool and as a principled mechanism for memory and generalization.
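A minimal sketch of the mechanism follows. It assumes ±1 spins without external fields, plain gradient ascent on the log pseudo-likelihood sum_mu sum_i [s_i h_i - log(2 cosh h_i)], and hypothetical hyperparameters `lr` and `steps`. Each row of the coupling matrix J receives its own gradient, so J generally becomes asymmetric, and zero-temperature retrieval sets each spin to the sign of its local field. The sketch only checks the fixed-point property for a few random patterns; it does not reproduce the paper's analysis or datasets.

```python
import math
import random


def local_field(J, s, i):
    """h_i = sum over j != i of J[i][j] * s[j]."""
    return sum(J[i][j] * s[j] for j in range(len(s)) if j != i)


def train_pseudo_likelihood(patterns, lr=0.05, steps=100):
    """Gradient ascent on sum_mu sum_i [s_i h_i - log(2 cosh h_i)] over the couplings J."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * n for _ in range(n)]
        for s in patterns:
            for i in range(n):
                # d/dh_i of log P(s_i | s_-i) is s_i - tanh(h_i)
                err = s[i] - math.tanh(local_field(J, s, i))
                for j in range(n):
                    if j != i:
                        grad[i][j] += err * s[j]
        for i in range(n):
            for j in range(n):
                J[i][j] += lr * grad[i][j] / len(patterns)
    return J


def retrieve(J, s, sweeps=10):
    """Zero-temperature asynchronous dynamics: s_i <- sign(h_i), unchanged when h_i == 0."""
    s = list(s)
    for _ in range(sweeps):
        for i in range(len(s)):
            h = local_field(J, s, i)
            if h != 0.0:
                s[i] = 1 if h > 0 else -1
    return s


rng = random.Random(0)
n_spins, n_patterns = 30, 3
patterns = [[rng.choice((-1, 1)) for _ in range(n_spins)] for _ in range(n_patterns)]
J = train_pseudo_likelihood(patterns)
print(all(retrieve(J, p) == p for p in patterns))
```

The first update from J = 0 is a scaled Hebbian rule. Later updates differ row by row because each row's gradient depends on its own local fields, which is where the asymmetry comes from.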

2506.17337 2026-03-31 eess.IV cs.AI cs.CV

Can Generalist Vision Language Models (VLMs) Rival Specialist Medical VLMs? Benchmarking and Strategic Insights

Yuan Zhong, Ruinan Jin, Qi Dou, Xiaoxiao Li

Comments version 3

Abstract

Vision Language Models (VLMs) have shown promise in automating image diagnosis and interpretation in clinical settings. However, developing specialist medical VLMs requires substantial computational resources and carefully curated datasets, and it remains unclear under which conditions generalist and specialist medical VLMs each perform best. This study highlights the complementary strengths of specialist medical and generalist VLMs. Specialists remain valuable in modality-aligned use cases, but we find that efficiently fine-tuned generalist VLMs can achieve comparable or even superior performance in most tasks, particularly when transferring to unseen or rare out-of-distribution (OOD) medical modalities. These results suggest that generalist VLMs, rather than being constrained by their lack of specialist medical pretraining, may offer a scalable and cost-effective pathway for advancing clinical AI development.

2506.04450 2026-03-31 cs.CR cs.AI cs.CL cs.LG

Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification

Payel Bhattacharjee, Fengwei Tian, Geoffrey D. Rubin, Joseph Y. Lo, Nirav Merchant, Heidi Hanson, John Gounley, Ravi Tandon

Comments Accepted in IEEE Access, 2026

Abstract

Large Language Models (LLMs) are increasingly adopted across domains such as education, healthcare, and finance. In healthcare, LLMs support tasks including disease diagnosis, abnormality classification, and clinical decision-making. Among these, multi-abnormality classification of radiology reports is critical for clinical workflow automation and biomedical research. Leveraging strong natural language processing capabilities, LLMs enable efficient processing of unstructured medical text and reduce the administrative burden of manual report analysis. To improve performance, LLMs are often fine-tuned on private, institution-specific datasets such as radiology reports. However, this raises significant privacy concerns: LLMs may memorize training data and become vulnerable to data extraction attacks, while sharing fine-tuned models risks exposing sensitive patient information. Despite growing interest in LLMs for medical text classification, privacy-preserving fine-tuning for multi-abnormality classification remains underexplored. To address this gap, we propose a differentially private (DP) fine-tuning framework for multi-abnormality classification from free-text radiology reports. Our approach integrates differential privacy with Low-Rank Adaptation (LoRA) to efficiently fine-tune LLMs on sensitive clinical data while mitigating leakage risks. We further employ labels generated by a larger LLM to train smaller models, enabling efficient inference under strong privacy guarantees. Experiments on MIMIC-CXR and CT-RATE demonstrate the effectiveness of our DP-LoRA framework across varying privacy regimes. On MIMIC-CXR, our method achieves weighted F1-scores up to 0.89 under moderate privacy budgets, approaching non-private LoRA (0.90) and full fine-tuning (0.96), confirming that strong privacy can be achieved with only modest performance trade-offs.
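The abstract does not say how differential privacy is enforced during LoRA fine-tuning. A common choice is DP-SGD, and the sketch below assumes that choice. Each per-example gradient is a flat list standing in for the LoRA adapter parameters only, with the base weights assumed frozen. The sketch clips each gradient to L2 norm `clip_norm`, sums the clipped gradients, adds Gaussian noise with standard deviation `noise_multiplier * clip_norm`, and averages over the batch. Privacy accounting, which converts the noise multiplier and number of steps into an (ε, δ) budget, is omitted.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a per-example gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update (assumed mechanism): clip, sum, add Gaussian noise, average, step."""
    batch = len(per_example_grads)
    summed = [0.0] * len(params)
    for g in per_example_grads:
        for k, v in enumerate(clip_gradient(g, clip_norm)):
            summed[k] += v
    # Noise is calibrated to clip_norm, the per-example sensitivity of the summed gradient
    noisy = [s + rng.gauss(0.0, noise_multiplier * clip_norm) for s in summed]
    return [p - lr * n / batch for p, n in zip(params, noisy)]


rng = random.Random(0)
lora_params = [0.0, 0.0, 0.0]  # hypothetical flattened LoRA A/B entries
grads = [[0.3, -1.2, 0.4], [2.0, 0.1, -0.5], [-0.2, 0.2, 0.9]]
lora_params = dp_sgd_step(lora_params, grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(lora_params)
```

Clipping bounds each example's influence on the summed gradient, and that bound is what lets the Gaussian noise be calibrated. Training only the low-rank adapter keeps the noised parameter vector small.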

2506.01399 2026-03-31 eess.SY cs.RO cs.SY

Captivity-Escape Games as a Means for Safety in Online Motion Generation

Christopher Bohn, Manuel Hess, Sören Hohmann

详情
英文摘要

This paper presents a method that addresses the conservatism, computational effort, and limited numerical accuracy of existing frameworks and methods that ensure safety in online model-based motion generation, commonly referred to as fast and safe tracking. Computational limitations restrict online motion planning to low-fidelity models. However, planning with low-fidelity models compromises safety, as the dynamic feasibility of resulting references is not ensured. This potentially leads to unavoidable tracking errors that may cause safety-critical constraint violations. Existing frameworks mitigate this safety risk by augmenting safety-critical constraints in motion planning by a safety margin that prevents constraint violations under worst-case tracking errors. However, the methods employed in these frameworks determine the safety margin based on a heuristically selected performance of the model used for planning, which likely results in overly conservative references. Furthermore, these methods are computationally intensive, and the state-of-the-art method is limited in numerical accuracy. We adopt a different perspective and address these limitations with a method that mitigates conservatism in existing frameworks by adapting the performance of the model used for planning to a given safety margin. Our method achieves numerical accuracy and requires significantly less computation time than existing methods by leveraging a captivity-escape game, which is a novel zero-sum differential game formulated in this paper. We demonstrate our method using a numerical example and compare it to the state of the art.

2505.24852 2026-03-31 cs.AR cs.LG

Chameleon: A MatMul-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data

Douwe den Blanken, Charlotte Frenkel

Comments 14 pages, 7 figures; added FSL power consumption measurements at 100 kHz clock speed, fixed typos

Abstract

On-device learning at the edge enables low-latency, private personalization with improved long-term robustness and reduced maintenance costs. Yet, achieving scalable, low-power end-to-end on-chip learning, especially from real-world sequential data with a limited number of examples, is an open challenge. Indeed, accelerators supporting error backpropagation optimize for learning performance at the expense of inference efficiency, while simplified learning algorithms often fail to reach acceptable accuracy targets. In this work, we present Chameleon, leveraging three key contributions to solve these challenges. (i) A unified learning and inference architecture supports few-shot learning (FSL), continual learning (CL) and inference at only 0.5% area overhead relative to the inference logic. (ii) Long temporal dependencies are efficiently captured with temporal convolutional networks (TCNs), enabling the first demonstration of end-to-end on-chip FSL and CL on sequential data and inference on 16-kHz raw audio. (iii) A dual-mode, matrix-multiplication-free compute array allows either matching the power consumption of state-of-the-art inference-only keyword spotting (KWS) accelerators or enabling $4.3\times$ higher peak GOPS. Fabricated in 40-nm CMOS, Chameleon sets new accuracy records on Omniglot for end-to-end on-chip FSL (96.8% for 5-way 1-shot and 98.8% for 5-way 5-shot) and CL (82.2% final accuracy for learning 250 classes with 10 shots), while maintaining an inference accuracy of 93.3% on the 12-class Google Speech Commands dataset at an extreme-edge power budget of 3.1 $\mu$W.