2606.20079 2026-06-19 q-fin.RM new submission

How to spot outliers: an Ensemble Anomaly Detection Framework

Daniil Peysakhovich, Rafał Sieradzki

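AI summary: To address anomalies in risk valuation outputs, the paper proposes the Ensemble Quality Assessment Framework (EQAF), which combines multiple unsupervised anomaly-detection methods. On credit-derivatives data it reaches F1 scores of 61-79%, outperforming the best individual method (6-66%), and it shows that purely statistical methods cannot detect frozen-feed anomalies.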

Abstract

Errors in risk valuation outputs arising from data-feed failures, model misconfiguration, or system malfunctions can propagate undetected through an investment bank's risk infrastructure and generate material operational losses. Using proprietary daily credit-derivatives data from a major global investment bank covering 183 trades across 129 trading days, we design, implement, and empirically evaluate the Ensemble Quality Assessment Framework (EQAF), a layered unsupervised architecture that combines complementary outlier-detection methods to monitor risk calculation integrity in real time. Using a controlled anomaly-injection protocol with eight operationally realistic scenarios, we show that the calibrated ensemble achieves F1 scores of 61-79%, substantially outperforming the best individual method (6-66%) across four distinct risk-measure datasets. Improvements of 4-6 percentage points in AUC-ROC confirm that this advantage is robust to threshold selection. We further demonstrate that purely statistical detection methods systematically fail to identify stale-value anomalies, a class of frozen-feed errors in which valuation outputs are identical to prior observations and therefore indistinguishable from normal data, and that domain-specific deterministic rules are architecturally indispensable. These findings have direct implications for model risk management under Basel III and the Fundamental Review of the Trading Book (FRTB), where automated and auditable quality controls for internal risk models are increasingly required.
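The stale-value point in the abstract can be illustrated with a minimal sketch. It is not the EQAF implementation: the z-score detector, the run-length rule, the thresholds, and the sample series below are all assumptions chosen for illustration. The sketch shows why a frozen feed can pass a purely statistical check (the repeated value sits well inside the normal range) while a simple deterministic rule catches it.

```python
from statistics import mean, stdev

def zscore_flags(series, threshold=3.0):
    """Flag points whose z-score against the whole series exceeds the threshold."""
    mu, sigma = mean(series), stdev(series)
    if sigma == 0:
        return [False] * len(series)
    return [abs(x - mu) / sigma > threshold for x in series]

def stale_flags(series, min_run=3):
    """Flag points that extend a run of at least min_run identical consecutive values."""
    flags, run = [], 1
    for i, x in enumerate(series):
        run = run + 1 if i > 0 and x == series[i - 1] else 1
        flags.append(run >= min_run)
    return flags

def ensemble_flags(series):
    """Combine both layers: a point is anomalous if either detector flags it."""
    return [a or b for a, b in zip(zscore_flags(series), stale_flags(series))]

# Hypothetical daily valuations; the feed freezes at 101.3 for four days.
valuations = [100.2, 100.9, 99.7, 101.3, 101.3, 101.3, 101.3, 100.4]
print(zscore_flags(valuations))   # statistical layer
print(stale_flags(valuations))    # deterministic rule layer
print(ensemble_flags(valuations)) # union of both
```

A simple OR-combination is used here only to keep the example short; the abstract describes a calibrated ensemble, whose weighting is not specified in the listing.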

2606.19846 2026-06-19 econ.GN q-fin.EC new submission

What Capital After Labor? Forecasting the Talent ROI Transition in the Human-AI Era

Kwan Soo Shin, In Seok Kang

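AI summary: Because AI augmentation breaks the accounting link between labor time and contribution, the paper builds a talent ROI forecasting framework that shifts from time to output, with ROI inversion as its core theorem. It uses South Korea's 52-hour workweek as a case to validate early pressure signals, and forecasts that output-based firms will lead TFP growth by 1.5-2.0 percentage points in 2032.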

Comments 90 pages, 6 figures

Robust probabilistic measurement of structural-functional module consistency in infant brain development

Lingbin Bian, Feihong Liu, Qian Wang, Han Zhang, Dinggang Shen, the UNC/UMN Baby Connectome Project Consortium

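AI summary: Proposes a stochastic-module-based probabilistic method for robustly measuring structural-functional module consistency in the infant brain, finding that consistency decreases between 0 and 5 years of age and is higher in primary brain regions.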

DOI: 10.1007/s00429-026-03143-3

Abstract

Brain networks are commonly divided into modules for analyzing their functionally segregated roles in group-level neuroimaging studies. Here, we introduce stochastic modules within brain networks for a robust probabilistic measurement of structural-functional module consistency (SFMC) in a group of subjects. Specifically, a stochastic module can be regarded as the chance of a brain region across subjects potentially being assigned to a group-level sub-network, characterized as an assignment probability for this brain region. This novel method has two advantages for evaluating inhomogeneous modules in brain networks. The first is that it can robustly evaluate the consistency between brain structural and functional modules whose population sizes are not necessarily the same, and the second is that it is able to take into account the inter-individual variability of the modules for the groups. Moreover, compared with the conventional structural-functional coupling approach, our stochastic module-based method reveals a more pronounced decline in the coupling between structure and function, indicating stronger developmental reorganization. Our results using the dataset from the Baby Connectome Project (BCP) show that the SFMC decreases from 0 to 5 years old, and is greater in primary brain regions, such as visual areas, while lower in more advanced cognitive regions, including those related to attention, control, and the default mode network.

2606.19396 2026-06-19 q-bio.QM new submission

BioHarness: Substrate-Aware Evidence Assembly for Biomedical Question Answering across Literature, Knowledge Bases, and Biological Atlases

Meng Xiao, Chuan Qin, Jinmiao Chen, Yihang Cheng, Yuanchun Zhou, Hengshu Zhu

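AI summary: Proposes BioHarness, which uses a cascade control mechanism to selectively assemble evidence across literature retrieval, knowledge bases, and biological atlases, improving biomedical question-answering accuracy; on 19,302 QA items the score rises from 65.9 to 71.0.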

Comments 14 Pages, 11 Figures, Keywords: biomedical question answering; retrieval-augmented generation; large language models; evidence assembly; biomedical knowledge bases; biological atlases

Abstract

Motivation: Biomedical question answering often requires evidence beyond topically retrieved literature, including gene alias resolution, database identifier normalization, and atlas-derived biological measurements. However, existing retrieval-augmented generation (RAG) systems typically follow a fixed workflow and lack an explicit mechanism for deciding when retrieved text is sufficient, when curated biomedical knowledge is required, or when executable evidence assembly over structured measurements should be invoked. This motivates a substrate-aware large language model (LLM) harness that selectively assembles sufficient evidence across literature, knowledge bases, and biological atlases. Results: We introduce BioHarness, an LLM harness for staged biomedical evidence assembly across literature retrieval, curated biomedical knowledge resources, and atlas-derived structured measurements. BioHarness first attempts to answer from reranked literature evidence and escalates through grounded cascade control to REPL-style evidence assembly only when the current evidence is uncertain, weakly grounded, or substrate-mismatched. Across 19,302 biomedical QA items spanning seven answer formats, BioHarness improves the pooled score from 65.9 to 71.0 over the strongest non-oracle baseline. Ablations, case studies, and backbone-scaling analyses show that these gains arise from repairing evidence-substrate mismatches through reranking, entity grounding, and structured measurement access, rather than from indiscriminately invoking more reasoning steps, retrieving additional literature, or relying on a particular answer-model scale.

2606.20315 2026-06-19 q-bio.GN cs.CR new submission

bioETH-Beacon: A Confidential On-Chain Genomic Beacon with Encrypted Counts, Filters, and Bounded Noise over a Fully Homomorphic EVM

Christos Galanopoulos, Kimon Antonios Provatas, Ilias Georgakopoulos-Soares

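AI summary: Proposes bioETH-Beacon, a smart-contract prototype on a fully homomorphic EVM that implements encrypted genomic Beacon queries, uses encrypted counts, bounded noise, and access control to resist membership-inference attacks, and optimizes query cost.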

Comments 11 pages, 6 figures, 8 tables. Research prototype for privacy-preserving genomics using Fully Homomorphic Encryption (FHE) on blockchain (fhEVM)

Abstract

The Global Alliance for Genomics and Health (GA4GH) Beacon protocol lets researchers ask whether a genomic variant has been observed in a participating cohort and receive aggregate variant-level counts. As Beacon networks grow, two privacy risks remain: host institutions can see plaintext queries, and repeated rare-variant queries can support membership-inference attacks. We present bioETH-Beacon, a smart-contract prototype that runs the Beacon "aggregate count" query over encrypted data on a fully homomorphic Ethereum Virtual Machine (fhEVM). Hospitals upload encrypted marker-count entries, authorized researchers submit encrypted marker queries, and the contract returns an encrypted answer that is released, via an off-chain key-management service, only to the requester named in the contract's on-chain ACL. The design is organized as a 3x4 tier-by-query-family grid spanning genotype, sex, age, and phenotype queries, with tiers that trade stronger confidentiality for lower query cost. For genotype paths, the prototype can add bounded on-chain noise to mitigate probing attacks. Experiments on synthetic panels derived from a Polygenic Score (PGS) catalog show the expected scaling behavior and demonstrate that pre-aggregation can substantially reduce query gas when public marker presence is an acceptable trade-off. Overall, bioETH-Beacon provides a research prototype for confidential Beacon-style genomic querying without a trusted compute evaluator.

2606.19794 2026-06-19 econ.GN cs.CY q-fin.EC new submission

Forecasting AI-Era Productivity: The Intellectually Converged Human Framework and a Missing Cognitive Mediator in Production Function Theory

Kwan Soo Shin, In Seok Kang

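AI summary: The paper proposes the Intellectually Converged Human (ICH) framework, which introduces the four-dimensional cognitive construct "convergence capacity" (C) as the cognitive mediator between AI and productivity. This explains the theoretical paradox that AI investment has not delivered commensurate productivity growth, and a data analysis of 20 OECD countries supports the explanatory power of the AI-C interaction for variation in total factor productivity.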

Comments 78 pages, 3 figures

Abstract

Why does massive AI investment fail to generate commensurate productivity gains? We argue the paradox is theoretically generated: prevailing production function frameworks encounter a structural boundary by treating AI as a separable factor of production without modeling the cognitive mediation through which AI generates productive value. This directs investment toward deployment when productivity requires prior development of what we term convergence capacity (C). We propose the Intellectually Converged Human (ICH) framework, a fifth-stage framework for production function theory: H-hat = H[1 + phi(A,C)], where effective productive capacity equals human capital (H) scaled by an augmentation factor [1 + phi], with phi jointly determined by AI utilization intensity (A) and convergence capacity (C), a four-dimensional cognitive construct encompassing embodied understanding, metacognition, temporal integration, and integrative thinking. The production function Y = F(K, H-hat) provides a human-centered mechanism for Solow's TFP residual: A_Solow = [1 + phi(A,C)]^(1-alpha). The framework predicts three augmentation regimes with distinct policy implications. Descriptive cross-national analysis of 20 OECD economies shows the AIxC interaction is associated with 86% of TFP variance versus 31% for AI alone, a pattern-consistent finding in the small-n theoretical tradition. South Korea exemplifies national-scale under-augmentation: high H, substantial A, low C produce phi = 0. We distinguish convergence capacity from adjacent constructs, absorptive capacity, dynamic capability, and human capital, and demonstrate that C constitutes the specific cognitive mediator that prior frameworks have left implicit. We derive C-first policy prescriptions and offer three empirically testable propositions with a falsifiable 10-year forecast.
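The link between the augmented-labor term and the Solow residual quoted in the abstract can be made explicit with a short derivation. The abstract does not state the functional form of F; the sketch below assumes a Cobb-Douglas form with capital share alpha, which is the simplest form consistent with the exponent (1 - alpha) in the stated result.

```latex
% Assumption (not stated in the abstract): F is Cobb-Douglas with capital share \alpha.
\begin{align}
  \hat{H} &= H\,[1 + \phi(A, C)] \\
  Y &= F(K, \hat{H}) = K^{\alpha}\,\hat{H}^{\,1-\alpha}
     = [1 + \phi(A, C)]^{\,1-\alpha}\; K^{\alpha} H^{\,1-\alpha} \\
  Y &= A_{\mathrm{Solow}}\; K^{\alpha} H^{\,1-\alpha}
     \quad\Longrightarrow\quad
     A_{\mathrm{Solow}} = [1 + \phi(A, C)]^{\,1-\alpha}
\end{align}
```

Under this assumption, the under-augmentation case phi = 0 described for South Korea gives A_Solow = 1, meaning AI use contributes nothing to measured TFP regardless of the level of A.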

2606.20553 2026-06-19 cs.CR new submission

From Efficiency to Leakage -- Privacy Backdoor in Federated Language Model Fine-Tuning

Shanghao Shi, Chaoyu Zhang, Heng Jin, Yang Xiao, Yevgeniy Vorobeychik, William Yeoh, Ning Zhang, Y. Thomas Hou, Wenjing Lou

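AI summary: Proposes the NeuroImprint attack, in which a malicious parameter server plants a privacy backdoor during parameter-efficient fine-tuning; by assigning each sample its own neuron and limiting each neuron to a single update, it reconstructs training text with high fidelity.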

Abstract

Federated learning (FL) enables multiple parties to collaboratively fine-tune language models for domain-specific tasks without sharing raw data. Since full model fine-tuning is often prohibitively expensive for FL clients, parameter-efficient fine-tuning (PEFT) has become the de facto approach in practice, freezing the base model and training only a small set of adapters. In this paper, we show that a malicious parameter server can stealthily corrupt a PEFT adapter into a privacy backdoor that implicitly memorizes the client's training samples as isolated per-sample parameter updates stored in separate neurons, without degrading model utility. Concretely, our attack, NeuroImprint, assigns a dedicated memorization neuron to each training sample and constrains each neuron to be updated at most once along the local fine-tuning trajectory. This design mitigates both cross-sample collisions and cross-step mixing introduced by large local batches and stateful optimizers (e.g., Adam/AdamW) in language-model fine-tuning. After fine-tuning, the resulting isolated per-sample updates can be analytically inverted in closed form to recover text embeddings, which are then deterministically mapped back to token sequences. To understand the generality of our method, we implemented NeuroImprint on multiple language models (BERT, GPT-2, Qwen2, and Llama3.2) and evaluated it across four fine-tuning datasets spanning diverse domains. The results demonstrate that our attack can reconstruct 59% to 79% of all fine-tuning samples with high semantic fidelity.
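Why an isolated per-sample update is invertible in closed form can be seen on a single linear neuron. The sketch below is not the NeuroImprint construction: it assumes one neuron with a bias term, a squared-error loss, and a single plain SGD step on one sample (the abstract's handling of Adam/AdamW is not reproduced). Under those assumptions the weight update is the input scaled by the same factor that appears in the bias update, so dividing one by the other returns the input exactly.

```python
def sgd_step(w, b, x, target, lr):
    """One plain SGD step for a linear neuron z = w.x + b with loss (z - target)^2."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    g = 2.0 * (z - target)                      # dL/dz
    new_w = [wi - lr * g * xi for wi, xi in zip(w, x)]  # dL/dw = g * x
    new_b = b - lr * g                          # dL/db = g
    return new_w, new_b

def invert_update(w_before, b_before, w_after, b_after):
    """Recover the input x from a single-sample update: x = delta_w / delta_b."""
    delta_b = b_after - b_before
    if delta_b == 0:
        raise ValueError("neuron was not updated; nothing to invert")
    return [(wa - wb) / delta_b for wa, wb in zip(w_after, w_before)]

# Hypothetical embedding and neuron parameters.
x_secret = [0.5, -1.0, 2.0]
w0, b0 = [0.1, 0.2, -0.3], 0.0
w1, b1 = sgd_step(w0, b0, x_secret, target=1.0, lr=0.1)
x_recovered = invert_update(w0, b0, w1, b1)
print(x_recovered)
```

If two samples updated the same neuron, the update would be a mixture of their inputs and the ratio would no longer return either one, which is consistent with the abstract's emphasis on one neuron per sample and at most one update per neuron.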

2606.20550 2026-06-19 cs.DL cs.HC cs.IR new submission

Easy Reads: A Python program for making Scientific Papers on arXiv more Reader Friendly and Accessible

Vishal Verma

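AI summary: To address the dense typesetting and poor readability of scientific papers, the paper presents Easy Reads, an automated, end-to-end, open-source Python program that fetches papers from arXiv and re-typesets them with customizable formatting such as font size and column count, improving readability and accessibility.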

Comments 9 pages. Open-source software project available at: https://github.com/Curious-flow/Easy-Reads

2606.20539 2026-06-19 cs.DB cs.DS new submission

Caching for Dollars, Not Hits: An Exact Offline Reference for Cloud-Egress Caching and the Crossover That Decides When It Pays

Madhulatha Mandarapu, Sandeep Kunkunuru

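AI summary: For caching where the cost is cloud-storage egress fees rather than latency, the paper gives a polynomial-time exact offline-optimal policy, finds that LRU's dollar regret rises with cost dispersion while cost-aware GreedyDual reduces it substantially, and derives a closed-form crossover that decides when cost-aware caching is needed.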

Comments 6 pages, 3 figures. Code, benchmarks, and full pre-registration: https://github.com/samyama-ai/cloud-egress-cache

Abstract

When a cache miss fetches from cloud object storage, the bill is per GET request and per byte of egress, not latency. Classic caching minimizes the miss rate, the wrong objective: a rarely but expensively fetched object can cost thousands of times more dollars than a frequently but cheaply fetched one. Generalized-caching theory bounds the miss-cost objective, but no reported benchmark measures how far deployed heuristics sit from the dollar-optimal offline policy on real cloud prices. We supply that reference. For uniform-size page caches with heterogeneous miss costs the offline dollar-optimum is exact in polynomial time via an integral interval linear program -- validated against brute force; variable sizes are NP-hard, so we extend the flow-based offline bound from the hit-ratio objective to dollars (cost-FOO), tight to about four percent. Against this reference we find: (i) a heterogeneity-regret law -- LRU's dollar-regret rises with miss-cost dispersion (Spearman 0.87) while cost-aware GreedyDual cuts it to roughly a tenth; (ii) a contention frontier -- GreedyDual's residual regret collapses to near zero exactly when the budget fits the expensive working set, and is the open slice otherwise; and (iii) a closed-form crossover s* = GET_fee/egress_rate (about 4 KB on S3, 330 B on GCS) that predicts which deployments need dollar-aware caching at all. On a real Twitter trace the price vector alone moves the workload across s*, shifting the regime as predicted. The artifact is a reproducible billing-faithful benchmark; heuristics and bounds it builds on are prior work, credited.
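The crossover s* = GET_fee/egress_rate is simple enough to compute directly. The sketch below uses illustrative prices that are assumptions, not figures from the paper: a request fee of $0.0004 per 1,000 GETs and an egress rate of $0.09 per GB (10^9 bytes). With those inputs the crossover lands near the "about 4 KB" the abstract quotes for S3. One plausible reading, again an assumption, is that s* is the object size at which the egress charge equals the request fee: below it the flat fee dominates and miss costs are nearly uniform, above it cost scales with object size.

```python
def crossover_bytes(get_fee_per_request, egress_rate_per_byte):
    """Object size s* (bytes) at which the egress charge equals the per-request fee."""
    return get_fee_per_request / egress_rate_per_byte

def miss_cost(size_bytes, get_fee_per_request, egress_rate_per_byte):
    """Dollar cost of one cache miss: flat request fee plus per-byte egress."""
    return get_fee_per_request + size_bytes * egress_rate_per_byte

# Assumed illustrative prices (check the provider's current price list before use).
GET_FEE = 0.0004 / 1000      # dollars per GET request
EGRESS_RATE = 0.09 / 1e9     # dollars per byte

s_star = crossover_bytes(GET_FEE, EGRESS_RATE)
print(f"crossover s* = {s_star:.0f} bytes")

# Miss cost for a small and a large object, relative to the flat fee.
for size in (512, 1_000_000):
    ratio = miss_cost(size, GET_FEE, EGRESS_RATE) / GET_FEE
    print(f"{size} B object: miss costs {ratio:.1f}x the request fee")
```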

2606.20492 2026-06-19 cs.CR cs.LO new submission

A-COMPASS: Formal Foundations for Anonymity Analysis in Microdata

Tamara Tagliavia, Silvia Ghilezan

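AI summary: The paper modifies the COMPASS language into A-COMPASS so that it operates on microdata tables, supports both checking anonymity conditions and executing anonymization actions, and proves that its semantics are deterministic and compositional; it can be used to verify properties such as k-anonymity and l-diversity.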

Abstract

In the information age, one of the leading problems is how to ensure individuals' privacy. Depending on the context in which privacy is considered, various data privacy models have emerged. However, the domain of formal verification of these models is still not sufficiently explored, even when it comes to the most basic models. An attempt to verify privacy requirements is the Compliance Assertion Language (COMPASS). In COMPASS, one can specify an anonymity condition that a table needs to satisfy, and an action that will modify the table if the condition is not satisfied. It is designed to operate on preprocessed tables in the form of one record per group of people. In this paper, we modify the COMPASS language in order to operate on microdata tables in their usual form of one record per person. The modified language is called A-COMPASS. Along with checking previously applied anonymity conditions, A-COMPASS enables the execution of anonymization actions as a new feature. We further provide the syntax and the semantics for the A-COMPASS language. We also prove the most important properties of the introduced semantics, such as determinism and compositionality. Finally, we provide a mechanism to verify anonymity properties, such as k-anonymity and l-diversity.
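For readers unfamiliar with the two properties named at the end of the abstract, here is a minimal sketch of what checking them on a microdata table (one record per person) involves. It is written in plain Python, not in A-COMPASS, whose syntax the listing does not show; the column names and sample rows are hypothetical, and l-diversity is taken in its simplest "distinct values" form.

```python
from collections import defaultdict

def equivalence_classes(rows, quasi_identifiers):
    """Group records that share the same values on all quasi-identifier columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].append(row)
    return groups

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return all(len(group) >= k for group in classes.values())

def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every equivalence class has at least l distinct sensitive values."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return all(len({r[sensitive] for r in group}) >= l for group in classes.values())

# Hypothetical, already-generalized microdata table.
table = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "asthma"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]
qi = ["age", "zip"]
print(is_k_anonymous(table, qi, k=2))
print(is_l_diverse(table, qi, "disease", l=2))
```

In this sample the table satisfies 2-anonymity but not 2-diversity, because the second group contains only one distinct disease value.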

2606.20490 2026-06-19 cs.MS new submission

Software package MaRDI Open Interfaces for improved interoperability in numerical optimization

Dmitry I. Kabanov, Stephan Rave, Mario Ohlberger

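AI summary: Presents the MaRDI Open Interfaces software package, which reduces coding and testing effort through a unified interface for nonlinear optimization, and demonstrates its interoperability by solving the viscous Burgers equation with a physics-informed neural network.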

Comments 15 pages, 1 figure, 1 table, GAMM2026

2606.20465 2026-06-19 cs.CY cs.SI new submission

Farmer Connect: Improving Farmers' Access to Produce Markets

Micheal Amanya, Darius Kainamura, Christine Namatovu, Lailah Kobugabe, Solomon Buwule Fortune, Adones Rukundo

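AI summary: To address the poor market access and weak bargaining power of smallholder farmers in Uganda, the paper presents Farmer Connect, a cooperative-based digital platform whose mobile-first architecture and cloud backend support group management, marketplace coordination, and earnings transparency; about 85% of user requirements were implemented.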

Abstract

Smallholder maize farmers in Uganda continue to face limited market access, weak bargaining power, low price transparency, and heavy reliance on intermediaries. These challenges are compounded by poor produce coordination, delayed payments, and weak visibility into cooperative transactions. This paper presents Farmer Connect, a cooperative-based digital platform designed to support produce management, marketplace coordination, and transparent earnings tracking among farmer groups. The system supports four user roles: administrators, supervisors, farmers, and customers. Its core functions include farmer group management, contribution recording and verification, marketplace listing, order processing, First In First Out based produce allocation, earnings visibility, mobile money payment support, and notification services. The platform was implemented using a mobile-first architecture with cloud-based backend services and an administrative web dashboard. Functional implementation showed that the system was able to support the major workflows required for group-based maize marketing and cooperative coordination, with approximately 85% of identified user requirements implemented. The study shows that cooperative-centered digital platforms can provide a practical framework for improving transparency, coordination, and buyer access for smallholder farmers.
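The "First In First Out based produce allocation" mentioned in the abstract can be sketched as follows. This is a hypothetical illustration, not the platform's code: the function name, the (farmer, kilograms) record shape, and the sample figures are all assumptions. The idea is that an order is filled from the oldest recorded contributions first, which also determines whose produce was sold and therefore whose earnings to credit.

```python
from collections import deque

def allocate_fifo(contributions, order_kg):
    """Fill an order from the oldest contributions first.

    contributions: list of (farmer, kg) tuples, oldest first.
    Returns (allocation, leftover_contributions, unfilled_kg).
    """
    queue = deque(contributions)
    allocation = []
    remaining = order_kg
    while remaining > 0 and queue:
        farmer, available = queue.popleft()
        taken = min(available, remaining)
        allocation.append((farmer, taken))
        remaining -= taken
        if available > taken:
            # Partially used contribution keeps its place at the front of the queue.
            queue.appendleft((farmer, available - taken))
    return allocation, list(queue), remaining

# Hypothetical contributions to a farmer group, oldest first.
stock = [("farmer_a", 50), ("farmer_b", 30), ("farmer_c", 40)]
allocation, leftover, unfilled = allocate_fifo(stock, order_kg=70)
print(allocation)
print(leftover)
print(unfilled)
```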

2606.20454 2026-06-19 cs.FL new submission

Minimality of Random Moore Automata under Prefix-Dependent Congruences

Matías Carrasco, Sergio Yovine

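AI summary: Studies when prefix-dependent congruences are trivial in random deterministic transition systems, proving that the congruence is trivial with high probability when labels are independent and every label has at least three admissible symbols.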

Comments 9 pages

Abstract

We study prefix-dependent congruences for random deterministic transition systems with state outputs. In this setting, the admissible continuations used to compare two states may depend on the observed prefix, and two states are identified only if no common admissible continuation distinguishes their future outputs. The framework includes probabilistic deterministic finite automata as a motivating special case. We analyze the random transition model in which all transition values are independent and uniform. Each state is also assigned an independent label that specifies both its output and its set of admissible symbols. If two independent labels agree with probability strictly less than one, and every label has at least three admissible symbols, then the induced congruence is trivial with high probability. The proof combines a pruning process on pairs, a collision-free exploration controlling its early evolution, and a first-moment argument showing that the remaining pairs cannot organize into nontrivial equivalence classes.

A case study of causal mediation using Bayesian nonparametrics and semiparametric corrections

DASH: A Dimensionality Reduction Method for Large-scale Convex MIQP with Applications in Subset Portfolio Selection

Community detection in small-sample ordinal regimes: A benchmarking framework for Delphi data

A Law of Iterated Expectation Primer for Causal Inference

A minimum-risk and cost-efficient two-sample sequential testing framework for the shifted exponential models with application to precipitation data

Built-in Selection Bias in Proportional Hazards Models with Omitted Covariates: Simulation Evidence and Alternative Approaches

The Ghosh-Lin and Fine-Gray models for a mix of administrative and random censoring

Covariate-Adjusted Functional Principal Components Analysis for Modeling Hazard Rates of Physical Activity in the US Population

A Bayesian spatio-temporal nearest neighbor Gaussian process model for pooled genetic data

Calibration without labels in multiple testing

Machine Learning Integrated in Wavelet Shrinkage (MLShrink)

SCOPE Shrinkage: A Unified Framework for Wavelet Denoising

Overfitted high-dimensional matrix factorizations via adaptive spectral shrinkage

Advanced Calibration Analysis and Tools: Identifying Influential Observations in Stochastic Interest Rate Model Calibration

How to spot outliers: an Ensemble Anomaly Detection Framework

What Capital After Labor? Forecasting the Talent ROI Transition in the Human-AI Era

Which Portfolios? The Construction Dependence of Factor Model Performance

Do Prediction Markets Match Option Prices? Bitcoin Threshold Evidence from Binance and Polymarket

Oscillations and Spatial Patterns in Large-Scale Stochastic Gene Regulatory Networks

Robust probabilistic measurement of structural-functional module consistency in infant brain development

BioHarness: Substrate-Aware Evidence Assembly for Biomedical Question Answering across Literature, Knowledge Bases, and Biological Atlases

bioETH-Beacon: A Confidential On-Chain Genomic Beacon with Encrypted Counts, Filters, and Bounded Noise over a Fully Homomorphic EVM

Forecasting AI-Era Productivity: The Intellectually Converged Human Framework and a Missing Cognitive Mediator in Production Function Theory

From Efficiency to Leakage -- Privacy Backdoor in Federated Language Model Fine-Tuning

Easy Reads: A Python program for making Scientific Papers on arXiv more Reader Friendly and Accessible

Caching for Dollars, Not Hits: An Exact Offline Reference for Cloud-Egress Caching and the Crossover That Decides When It Pays

A-COMPASS: Formal Foundations for Anonymity Analysis in Microdata

Software package MaRDI Open Interfaces for improved interoperability in numerical optimization

Farmer Connect: Improving Farmers' Access to Produce Markets

Minimality of Random Moore Automata under Prefix-Dependent Congruences