arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.10311 2026-06-10 cs.SE new submission

From Awareness to Action: How Developers Engage with Accessibility Innovation in LLM-Assisted Development

Thayssa Águila da Rocha, Luciane Silva, Ana Duarte, Marcelle Pereira Mota, Gustavo Pinto

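AI summary: An analysis of 14 LLM-based accessibility project proposals and focus group discussions with 9 participants finds that initiatives led by people with disabilities foster inclusive innovation, shifting accessibility from a compliance requirement to a driver of technological excellence and cultural transformation.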

Comments WASHES 2026

Abstract

Developers often struggle to design truly accessible digital solutions in corporate environments. In these environments, accessibility is usually treated as a compliance requirement rather than an innovation opportunity. By analyzing 14 LLM-based accessibility project proposals and focus group discussions with 9 participants at a Brazilian tech company, we found that inclusive innovation can emerge particularly when initiatives are led by People with Disabilities (PWD) themselves. If organizations adopt similar participatory approaches, accessibility would evolve from an afterthought into a driving force for technological excellence and cultural transformation.

2606.10308 2026-06-10 eess.SY cs.SY new submission

On Time-Delay Compensators for Delayed-Output Systems

Hieu Trinh

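AI summary: For linear time-delay systems in which the measurement delay is independent of the internal state delay and satisfies $0 < h < \tau$, the paper proposes a functional observer with multiple internal delays and an augmented architecture, establishes algebraic existence conditions, gives a constructive synthesis procedure, and expands the design space by introducing an additional delayed measurement vector.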

Abstract

This paper advances the practical utility of functional observer theory by addressing sensing latency in linear time-delay systems. We address the estimation of the functional $z(t)=Fx(t)$ in cases where the measurement delay $h$ is independent of the internal state delay $\tau$, with a specific focus on the condition $0 < h < \tau$. To compensate for sensing lags, we propose a functional observer structure characterized by multiple internal delays and an augmented architecture. Algebraic existence conditions are established alongside a constructive synthesis procedure. By incorporating an additional delayed measurement vector, we demonstrate that this approach significantly expands the design space and is applicable to a wider class of systems with larger state and output delays.

2606.10303 2026-06-10 cs.AR cs.DC new submission

Isolation-aware Scheduling Framework for DNN-based End-to-End Autonomous Driving System on Tile-based Accelerators

Chenguang Zhang, Yuanpeng Zhang, Chenhao Xue, Yihan Yin, Chen Zhang, Guangyu Sun

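AI summary: Proposes the ADS-Tile framework, which combines configurable isolation with elastic reservation to optimize DNN colocation scheduling on tile-based accelerators, reducing reallocation overhead while meeting strict latency constraints.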

Comments Accepted by IEEE Transactions on Computers

Abstract

Level-4+ autonomous driving systems (ADS) must run dozens of heterogeneous deep neural networks (DNNs) as end-to-end (E2E) pipelines under a strict latency constraint (≤ 100 ms), even as execution time varies by up to 3.3x. Cost rules out dedicating isolated hardware to each function in mass-produced ADS, so these DNNs must be densely colocated on a single chip, which introduces shared-resource contention. Tile-based accelerators expose two scheduling opportunities that conventional ADS schedulers do not exploit. First, they provide a tunable degree of parallelism (DoP): assigning more tiles raises DoP and can shorten DNN execution time. Second, they provide hardware-native isolation: tiles can be physically partitioned among colocated DNNs. But using this flexibility is expensive: changing a task's DoP triggers a stop-migrate-restart reallocation of its weights and intermediate features. At ADS task rates of 10-240 Hz, these stalls accumulate along E2E chains and threaten deadlines. Reservation-based schedulers fix DoP and leave this flexibility unused; work-conserving schedulers exploit it but assume reallocation is cheap and treat deadlines as independent. We present ADS-Tile, which combines configurable isolation and elastic reservation into a spatio-temporal isolation-sharing space that bounds where and when reallocation occurs; a probabilistic latency model and a DAG-aware runtime scheduler then use this space to decide task colocation and DoP under shared E2E deadlines. On an industry- and academia-derived ADS benchmark, ADS-Tile uses up to 32% fewer tiles than the work-conserving baseline in deadline-critical settings and cuts reallocation-induced wasted processing capacity from 17%-44% to below 1.2%. Controlled spatio-temporal sharing improves resource efficiency and latency predictability for tile-based ADS.

2606.10297 2026-06-10 cs.CE new submission

Decomposing Firm-Level Crisis Responses from Incomplete Market Signals: Evidence from China's IT Sector During COVID-19

Xiao Han, Yao Xiao

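AI summary: Proposes a multi-method framework combining causal identification, unsupervised behavioral discovery, and cross-sectional resilience prediction to decompose heterogeneity in firm-level crisis responses from noisy market signals, applied to 246 Chinese A-share IT firms during COVID-19.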

Abstract

Exogenous shocks generate heterogeneous behavioral responses across firms, yet event studies typically report only sector-level averages. This paper develops a multi-method approach combining causal identification (difference-in-differences with cluster-robust inference), unsupervised behavioral discovery (K-means trajectory clustering, Gaussian hidden Markov models), and cross-sectional resilience prediction (logistic regression with bootstrap inference) to decompose firm-level response heterogeneity from noisy market signals. We demonstrate the approach on 246 Chinese A-share IT firms (216 with complete data for all analyses) during the COVID-19 shock (January 2020), using 252 non-IT CSI 300 firms as controls. The return decline was market-wide, not IT-specific (DID p = 0.59); the IT-specific effect was elevated volatility (DID β = 0.043, cluster-robust p < 0.001), with the effect surviving Benjamini-Hochberg correction across alternative specifications. Unsupervised clustering produced three categories of trajectories: fast recovery (36 companies, +29.7%), resilient/moderate (67 companies), and persistent drag (113 companies, -6.9%). Prior-to-crisis financial fundamentals did not predict resilience well (AUC = 0.64, 95% CI: 0.57-0.71), in line with efficient markets' incorporation of public information into stock prices. The combination of causal analysis, unsupervised learning, and prediction represents a reproducible framework that can be applied to crises in other market periods.
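
The difference-in-differences comparison named in the abstract can be illustrated with a minimal 2x2 sketch in Python. The function name and every number below are hypothetical and invented for illustration; the paper itself reports a regression-based estimate with cluster-robust inference, which this sketch does not reproduce.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Return the 2x2 difference-in-differences estimate.

    Each argument is a list of outcome values (for example, firm-level
    volatility) for one group in one period.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # The control group's change stands in for what the treated group
    # would have experienced without the group-specific effect.
    return treated_change - control_change


# Hypothetical volatility values, not taken from the paper.
it_pre = [0.020, 0.022, 0.018]
it_post = [0.065, 0.070, 0.060]
non_it_pre = [0.019, 0.021, 0.020]
non_it_post = [0.024, 0.026, 0.025]

effect = did_estimate(it_pre, it_post, non_it_pre, non_it_post)
print(f"DID estimate: {effect:.3f}")
```

Subtracting the control group's change removes movement common to both groups, which is how a market-wide decline can be separated from an IT-specific effect.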

2606.10293 2026-06-10 cs.DL cs.GT new submission

How Many Submissions May an Author Make? A Harmonic Quota for Submissions under Coauthorship

Nihar B. Shah

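AI summary: Addressing submission limits under coauthorship, the paper proposes the Harmonic Quota Rule, under which an author's cost decreases with the number of coauthors, and develops a generalized framework that balances respect for collaboration against resistance to manipulation.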

Abstract

Research evaluation systems -- including journals, conferences, and funders -- are increasingly using author-level submission limits to manage growing submission loads. Most existing policies charge each submission as a unit cost against every coauthor's quota. This treats a solo-authored submission and a large collaborative submission identically for each author, even though the reviewing demand of a collaborative submission is jointly attributable to many authors rather than one. Thus we ask the question: how many submissions may an author make under coauthorships? We propose a "Harmonic Quota Rule", in which an author's cost for a submission decreases with the number of coauthors as the reciprocal of their harmonic number. We derive this rule in a principled manner that navigates the tension between respecting collaborations and being resistant to manipulation by adding spurious authors. We also develop a Generalized Harmonic Quota Rule, a framework that subsumes the Harmonic Quota Rule and other natural quota rules. Our framework requires specification of only three interpretable parameters, thereby enabling organizers to choose among various seemingly disparate rules. Our work may also be useful in other scarce-resource allocation settings, such as allocation of compute and telescope time. An interactive tool is available at https://www.cs.cmu.edu/~nihars/quota/organizer.html
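
One reading of the Harmonic Quota Rule can be sketched in Python. The sketch assumes that a submission with n authors in total charges each author 1/H_n, where H_n = 1 + 1/2 + ... + 1/n, so a solo-authored submission costs exactly one unit. The abstract does not spell out this indexing, and the function names are hypothetical.

```python
from fractions import Fraction


def harmonic_number(n: int) -> Fraction:
    """Return H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def author_cost(num_authors: int) -> Fraction:
    """Assumed per-author cost of one submission with num_authors authors."""
    return 1 / harmonic_number(num_authors)


def quota_used(author_counts) -> Fraction:
    """Total quota one author spends across their submissions.

    author_counts lists the number of authors on each submission.
    """
    return sum((author_cost(n) for n in author_counts), Fraction(0))


for n in (1, 2, 3, 4):
    print(n, author_cost(n))
```

Under this assumption the cost falls slowly as authors are added, so adding spurious coauthors reduces an author's charge only modestly.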

2606.10290 2026-06-10 cs.CR cs.SE new submission

The Linux IOCTL Census: A Source-Derived Database of the Linux Kernel Control-Code Surface

Michael J. Bommarito

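AI summary: Builds a queryable inventory of the Linux kernel ioctl interface through static analysis, covering 878 modules, 586 dispatch entry points, and 1,289 command codes, and encodes the threat model to distinguish capability-gated regions, supporting queries across operating systems.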

Comments 15 pages, 5 figures, 4 tables. Companion structural-tier dataset: https://huggingface.co/datasets/mjbommar/linux-ioctl-census

Abstract

The ioctl system call is Linux's catch-all device-control interface. A userspace program opens a device node and hands the driver a numeric command code and an argument buffer, and the driver does whatever that code means, whether configuring hardware, reading back state, or moving data into and out of the kernel. Drivers define these commands themselves, by the thousand, and parse their arguments in kernel context, which makes ioctl handlers one of the broadest and least uniform local attack surfaces in the kernel. A handler that trusts an argument length it never validates can read or write kernel memory out of bounds, and the command space is catalogued in no central place. We present the Linux IOCTL Census, a source-derived and queryable inventory of that surface. An allmodconfig build compiles 878 modules across 169 subtrees, and over them a single deterministic libclang pass over the kernel source recovers 586 ioctl dispatch entry points, 1,289 decoded _IOC command codes, 3,583 controlled-input sinks, and 1,298 permission gates. A second pass encodes the kernel's own documented threat model as a queryable column, separating the capability-ungated ioctl surface, an upper bound on unprivileged reach rather than proven reach, from the part a hard capability gate puts out of scope. We backtest the census against 22 recent in-tree ioctl CVEs and release the structural tier as open data, on a schema shared with the companion Windows IOCTL Census so a single query spans both operating systems.
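
To make "decoded _IOC command codes" concrete, here is a small Python sketch that splits a 32-bit command number into its fields. It assumes the generic field layout from `include/uapi/asm-generic/ioctl.h` (8-bit number, 8-bit type, 14-bit size, 2-bit direction); some architectures use different widths. It is not the census's libclang-based decoder, and the names `IoctlCode` and `decode_ioc` are hypothetical.

```python
from typing import NamedTuple

# Field widths of the generic _IOC layout (an assumption; see lead-in).
_IOC_NRBITS = 8
_IOC_TYPEBITS = 8
_IOC_SIZEBITS = 14
_IOC_DIRBITS = 2

_IOC_NRSHIFT = 0
_IOC_TYPESHIFT = _IOC_NRSHIFT + _IOC_NRBITS
_IOC_SIZESHIFT = _IOC_TYPESHIFT + _IOC_TYPEBITS
_IOC_DIRSHIFT = _IOC_SIZESHIFT + _IOC_SIZEBITS

# Direction is named from userspace's point of view in the generic layout.
_DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read|write"}


class IoctlCode(NamedTuple):
    direction: str
    type_char: str
    nr: int
    size: int


def decode_ioc(cmd: int) -> IoctlCode:
    """Split an _IOC-encoded command number into its four fields."""
    direction = (cmd >> _IOC_DIRSHIFT) & ((1 << _IOC_DIRBITS) - 1)
    size = (cmd >> _IOC_SIZESHIFT) & ((1 << _IOC_SIZEBITS) - 1)
    type_byte = (cmd >> _IOC_TYPESHIFT) & ((1 << _IOC_TYPEBITS) - 1)
    nr = (cmd >> _IOC_NRSHIFT) & ((1 << _IOC_NRBITS) - 1)
    return IoctlCode(_DIRECTIONS[direction], chr(type_byte), nr, size)


# 0x80044501 is what _IOR('E', 0x01, int) produces under this layout
# with a 4-byte int.
print(decode_ioc(0x80044501))
```

The size field records only the argument size that the macro declares, so a handler still has to validate any lengths carried inside the argument buffer itself.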

2606.10270 2026-06-10 cs.DB cs.DC cs.LO new submission

Determination Provenance: From Ambiguity to Algebra

Joseph M. Hellerstein

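AI summary: Proposes the determination provenance framework, which uses supports and a filtration structure to quantify how much data results depend on semantic resolutions, with instantiations for transactional isolation and Datalog¬.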

Comments 15 pages body, 34 pages total

Abstract

Many data systems admit multiple admissible outcomes for the same input: concurrent transactions may serialize in one of many orders; a logic program may have multiple stable models. Classical data provenance cannot even pose its question in such settings -- it explains how a result was derived, but only after something has chosen which result to produce. We introduce determination provenance to track the commitments that resolve this ambiguity. A tuple's support is the set of resolutions under which it holds. Supports form a commutative semiring, and layered commitments induce a filtration measuring each tuple's query-relative depth -- how many layers of semantic resolution it depends on. Positive relational algebra respects the filtration, enabling compositional robustness analysis and quantitative diagnosis of resolution cost. We instantiate the framework for transactional isolation and for Datalog¬; in both, classical semantic variants (isolation levels; negation semantics) correspond to different views of a single shared filtration.

2606.10264 2026-06-10 cs.CR cs.SE new submission

RECON: An LLM-Enhanced Backward Constraint Analysis Framework

Babangida Bappah, Lamine Noureddine, Umar Farooq, Aisha Ali-Gombe

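AI summary: Proposes the RECON framework, which combines LLM semantic understanding with static analysis to extract precise execution constraints from Android bytecode, running 5.8x faster than traditional symbolic execution with a 100% success rate.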

Abstract

While traditional techniques, such as symbolic execution, provide a principled foundation for precise constraint reasoning in program analysis, they struggle to scale to modern software systems mainly due to path explosion, the need for function modeling, and the loss of semantic intent at low-level program representations. In complex execution environments such as Android, characterized by extensive framework interactions and event-driven behavior, these limitations are amplified further. Thus, in this paper, we present a novel large language model (LLM)-enhanced backward constraint analysis framework that combines the precision of static program analysis with an LLM's semantic understanding to extract precise execution constraints from Android bytecode. Our approach, named RECON, performs backward path discovery from target method(s) to the application entry point(s), discovers method-level control-flow constraints, and leverages LLM reasoning to transform bytecode conditions into interpretable specifications. We evaluated RECON using five LLMs across 78 Android constraint-extraction scenarios and compared it with traditional symbolic execution on real-world applications. Results demonstrate that our approach operates 5.8x faster than traditional symbolic execution, with a 100% success rate, while maintaining logical equivalence and providing significantly more precise and interpretable output. We further evaluated RECON for malware analysis on 100 samples. The results indicate an 84% success rate in generating semantic constraints that lead to the execution of dangerous API behaviors and in detecting complex constraints across multiple execution paths.

2606.10263 2026-06-10 cs.LO new submission

Dynamic E-unification

Kun Han, Christopher Lynch

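AI summary: Proposes an E-unification procedure for a set of non-ground (dis)equations together with a dynamic set of ground (dis)equations, built on Superposition and Instantiation rules and proved complete; it can be used in quantified SMT problems, where the ground theory represents the evolving model.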

Abstract

We present an E-unification procedure for a set of non-ground (dis)equations, along with a dynamic set of ground (dis)equations, and prove its completeness. The ground part is dynamic in the sense that it continually changes. The algorithm saturates the non-ground equations using Superposition modulo the ground theory. We also have an Instantiation rule that matches the left-hand side of non-ground (dis)equations with ground terms, creating new ground (dis)equations, which changes the ground theory. This algorithm can be used in quantified SMT problems, where the dynamic ground theory represents the evolving model. We develop an ordering to compare terms modulo a ground theory, which is used to orient non-ground equations. We prove properties of this ordering, using a weak form of monotonicity and subterm property. Finally, we present a set of inference rules for our ordering, which allow us to properly orient equations in theories of some finite data structures, such as a theory of finite lists with length and append.

2606.10261 2026-06-10 cs.GT new submission

Leveraging Machine-Learned Advice in Strategic Interactions with No-Regret Learners

Tinashe Handina, Tongxin Li, Kishan Panaganti, Eric Mazumdar, Adam Wierman

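AI summary: Studies how to make effective use of potentially imperfect advice when playing against a no-regret learner, introduces a pseudo-metric to quantify the usefulness of advice, demonstrates it on simulators and payoff matrix predictions, and shows that good advice reduces the interaction complexity of computing approximate Stackelberg strategies.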

Comments AISTATS 2026

Abstract

We study how an agent in a two-player repeated game can effectively utilize potentially imperfect advice when interacting with a no-regret learner. We characterize the advice landscape by introducing a pseudo-metric to quantify the usefulness of an advice instance. We demonstrate the pseudo-metric's applicability through two forms of advice: simulators and payoff matrix predictions. We then show how an optimizing player, equipped with correctness guarantees on the advice, could leverage simulators to compute approximate Stackelberg strategies more efficiently, reducing the interaction complexity traditionally required and illustrating the power of good advice. Finally, we extend our analysis to settings where the advice does not have any guarantee of correctness. We find that, in general, a player cannot simultaneously guarantee near-Stackelberg performance when the advice is approximately accurate and a no-regret condition when the advice is inaccurate. We do show, however, that it is possible for an advice-aided player to weakly dominate their utility in some (coarse) correlated equilibria.

2606.10226 2026-06-10 cs.GT new submission

Continuity of VaR and Continuous Differentiability of CVaR under Decision-Dependent Losses

Amal Sakr, Andrea Araldo, Tamer Başar, Tijani Chahed

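AI summary: Studies the continuity of VaR and the continuous differentiability of CVaR under decision-dependent losses, giving simple sufficient conditions and deriving a CVaR gradient formula that justifies first-order analysis of tail-risk models.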

Abstract

Value-at-risk (VaR) and conditional value-at-risk (CVaR) are widely used in risk-aware optimization and equilibrium models. When the loss depends on a decision variable, the induced distribution, the VaR threshold, and the CVaR tail set all change with the decision. This makes the regularity of the VaR and CVaR maps nontrivial. We give simple sufficient conditions under which the VaR map is continuous and the corresponding CVaR map is continuously differentiable. The assumptions are local around the VaR level and rely on dominated pathwise differentiability of the scenario-wise loss. We also derive the CVaR gradient formula, thereby justifying first-order analysis for decision-dependent tail-risk models.
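
As a rough illustration of how the VaR threshold and the CVaR tail move with the decision, the Python sketch below computes empirical VaR and CVaR from scenario losses. It uses the standard Rockafellar-Uryasev expression for CVaR evaluated at the empirical VaR. The quadratic `scenario_loss`, the scenario values, and all function names are hypothetical, and the paper's regularity conditions and gradient formula are not reproduced here.

```python
import math


def empirical_var(losses, alpha):
    """Smallest sample value t such that at least a fraction alpha of losses are <= t."""
    if not losses or not 0.0 < alpha < 1.0:
        raise ValueError("need a non-empty sample and 0 < alpha < 1")
    ordered = sorted(losses)
    index = math.ceil(alpha * len(ordered)) - 1
    return ordered[max(index, 0)]


def empirical_cvar(losses, alpha):
    """CVaR as VaR plus the mean excess over VaR, scaled by 1 / (1 - alpha)."""
    var = empirical_var(losses, alpha)
    mean_excess = sum(max(loss - var, 0.0) for loss in losses) / len(losses)
    return var + mean_excess / (1.0 - alpha)


def scenario_loss(decision, scenario):
    """Hypothetical decision-dependent loss, chosen only for illustration."""
    return (decision - scenario) ** 2


scenarios = [float(s) for s in range(10)]
for decision in (0.0, 4.5):
    losses = [scenario_loss(decision, s) for s in scenarios]
    print(decision, empirical_var(losses, 0.8), empirical_cvar(losses, 0.8))
```

Changing `decision` changes every scenario loss at once, so both the threshold and the set of scenarios in the tail shift, which is why regularity of the two maps is not automatic.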

2606.10225 2026-06-10 cs.CE cs.SI new submission

BENI Global 10: A Multilingual Economic Narrative Corpus for the Global South

Ann Naser Nabil

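AI summary: To address the English-centric focus of economic narrative research, the paper builds a corpus of 522,397 economic news articles covering 10 languages, 7 language families, and 5 economic regions, and provides a reproducible streaming pipeline, a cross-lingual index, and an analysis of regional differences in narrative framing.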

Abstract

Economic narrative indices are predominantly English-centric; 84% of sentiment-based forecasting research focuses on developed economies. We present BENI Global 10, the first multilingual economic news corpus spanning 10 languages across 7 language families and 5 economic regions: Bangla (Bangladesh), Hindi (India), Turkish (Turkey), Indonesian (Indonesia), Portuguese (Brazil), Arabic (Egypt), Vietnamese (Vietnam), Filipino (Philippines), Swahili (Kenya), and Urdu (Pakistan). The corpus contains 522,397 economically relevant articles filtered from 2.8M raw documents using 25-32 translated keywords per language. We provide: (1) a reproducible streaming pipeline with checkpoint-resume for low-resource environments, (2) per-language schema-normalized Parquet files with economic relevance labels, (3) a temporally synced cross-lingual index covering 2018-2024, and (4) comparative analysis revealing systematic differences in how economic narratives are framed across Global South regions. Inter-annotator agreement reaches kappa > 0.70 across all languages. The complete dataset, code, and annotation guidelines are publicly released for research use.
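
For readers unfamiliar with the agreement statistic, the Python sketch below computes Cohen's kappa for two annotators. The abstract does not say which kappa variant or how many annotators were used, so the two-annotator Cohen form is an assumption, and the labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical relevance labels (1 = economically relevant, 0 = not).
annotator_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Kappa discounts the agreement expected by chance, so it is a stricter figure than raw percent agreement on the same labels.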

2606.10211 2026-06-10 cs.SE new submission

TestMap: Evidence Infrastructure for Foundation-Model-Assisted Test Generation

Hunter Leary, Luke Hanuska, Chris Brown

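AI summary: Presents TestMap, an infrastructure that automates test generation and validation for C#/.NET repositories, records the lifecycle of candidate tests, and supports evaluation along multiple dimensions.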

Comments 10 pages, 1 figure, 2 tables. Accepted to present at AIWare 2026 (arXiv Track)

Abstract

Foundation models (FMs) can generate plausible unit tests, but determining whether those tests are correct, useful, maintainable, and worth integrating remains difficult. Generated tests must be mapped to the code they target, inserted into real projects, built, executed, measured against the baseline suite, repaired when necessary, and compared across models and generation strategies. This validation process is fragmented across build systems, test runners, coverage tools, mutation tools, static analyzers, and experiment scripts. The problem is especially important because generated tests are both code artifacts and validation artifacts: they must themselves be validated before they can be trusted as evidence about the system under test. This paper presents TestMap, an open-source infrastructure prototype that automates evidence-backed foundation-model-assisted test generation for C#/.NET repositories. TestMap supports repository analysis, source-test mapping, baseline execution, code metric collection, test smell detection, coverage measurement, mutation testing, model-guided test generation, validation, repair, and repository-specific experiment tracking. Rather than reporting only final passing tests, TestMap records the lifecycle of each generated candidate, including failed, repaired, low-impact, and evidence-positive outcomes. These intermediate outcomes can reveal model limitations, missing context, repair cost, toolchain inefficiencies, or possible faults in the system under test. Using TestMap as a design case, we describe the architecture and evidence model needed to make generated tests observable, repeatable, and comparable across repositories, models, prompts, and generation strategies. We conclude with lessons learned and open challenges, including oracle and assertion quality, metric attribution, test maintainability, flakiness, execution cost, and developer acceptance.

2606.10210 2026-06-10 cs.CY new submission

AnnotateThis: Analyzing a human-LLM system for annotating social media data with the concept of climate change mitigation pessimism

Zexuan Li, Derek Van Berkel, Ariel Hasell, Grant Schoenebeck, John Barry Ryan, Sabina Tomkins

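AI summary: Presents AnnotateThis, a human-LLM system whose interactive features help users improve the quality of LLM annotations of complex social concepts; experiments show that human intervention substantially improves F-measure and accuracy.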

Abstract

Large language models (LLMs) are increasingly being integrated into research workflows. However, LLMs have been shown to struggle with difficult and nuanced concepts such as those found in computational social science (CSS) research. Within the CSS community, there has been a call for new systems to be developed which center humans in LLM-supported scientific workflows. We develop AnnotateThis, a human-centered system for inspecting and improving LLM annotations, a process we refer to as LLM grounding for a target concept. AnnotateThis is developed with both computational and social scientists to reflect existing workflows for data annotation. It includes a range of information features for users to interrogate the quality and reliability of LLM annotations. We evaluate our system in two settings. In the first, we assume a researcher may not have access to ground truth data and that users of AnnotateThis have limited prior knowledge of the concept they would like an LLM to annotate. That is, they may be conducting concept specification and LLM grounding simultaneously. In the second setting, we assume access to ground truth labels and that the concept is specified for a given annotation task; here, the task of LLM grounding is more straightforward. We find that in both settings users can improve the quality of LLM annotations with AnnotateThis and that their final annotations far surpass those created without human intervention. For example, when we evaluate with ground truth labels, we see an absolute improvement of 0.15 in F-Measure and 0.23 in accuracy over a fully automated state-of-the-art method for prompt refinement.

2606.10201 2026-06-10 eess.SY cs.SY new submission

Game-Theoretic Area Coverage Control with Cooperative-Adversarial Multi-Agent Systems

Ruiming Zheng, Mohammad Pirani, Davide Spinello

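AI summary: Models multi-agent area coverage control as a zero-sum game between two agent groups with conflicting goals, analyzes the system dynamics through coupled gradient-descent-ascent controllers, finds a Hopf bifurcation determined by the ratio of control gains, and characterizes the Nash equilibrium conditions.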

Comments This work has been submitted to IFAC for possible publication

Abstract

We formulate a multi-agent area coverage control problem as a two-player zero-sum game between two agent groups with conflicting goals. Conventional coverage control allocates resources based on an environmental risk density field. In contrast, we generalize this metric by allowing a second group of adversarial agents to generate the spatial risk field. Coupled agent dynamics are linked through the area coverage metric, which functions as the game reward. This framework induces coupled gradient-descent-ascent controllers for the groups. Analysis of a low-dimensional case reveals a Hopf bifurcation dictated by the ratio of the groups' control gains. In the regime dominated by adversarial agents, the system is driven into a periodic chase-evasion cycle. In the regime dominated by ordinary agents, the system converges to a fixed configuration. Numerical simulations validate these theoretical insights. Finally, we characterize the Nash equilibrium conditions. Under this equilibrium, ordinary agents converge to a generalized centroidal Voronoi tessellation, whereas adversarial agents settle at their corresponding equilibrium centroids.

2606.10182 2026-06-10 cs.HC new submission

Creativity in the BioFoundry: Supporting scientific creativity in the age of automation

Mingyan Claire Tian, Sarah Sterman

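AI summary: Through interviews with nine scientists and experts, the paper studies how automation changes the practice of scientific creativity in biofoundries and argues that biofoundries should be viewed as creativity support tools rather than automation factories.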

Comments 13 pages, 6 figures, 2 tables, ACM Creativity and Cognition Conference 2026

Abstract

Biofoundries automate biological experimentation at unprecedented scale, promising speed, reproducibility, and access. Yet automation also reshapes how scientists experience experimentation and creativity. Through in-depth interviews with nine scientists and experts across academia and industry (including biofoundry developers, automation engineers, and end-users), we examine how scientific creativity is enacted under automation. Biofoundries displace sensory cues, redistribute responsibility between humans and machines, and transform troubleshooting from an embodied, local practice into a predictive, social, and interpretive one. Rather than framing biofoundries as automation factories, we argue that they should be understood as Creativity Support Tools, whose design directly shapes how researchers notice breakdowns, exercise judgment, learn from failure, and progress through success. By connecting biofoundry practice with prior HCI work on automation, debugging, and distributed creativity, this paper demonstrates biofoundries as a distinctive and timely site for creativity research in science.

2606.10177 2026-06-10 cs.HC new submission

VArify: A Visual Analytics System for Verifying Knowledge Enhanced Large Language Model Responses in Food Science

VArify:一个用于验证食品科学中知识增强型大语言模型响应的可视化分析系统

Sam Yu-Te Lee, Yan To Linus Lam, Manami Nakagawa, Kwan-Liu Ma

AI总结 针对GraphRAG系统检索数据复杂、难以验证的问题,提出VArify可视化分析系统,通过文件目录式树状可视化支持证据的组间关系与组内层次探索,用户研究证明其有效区分LLM内部知识与外部证据,并帮助识别知识图谱错误。

AI中文摘要

图检索增强生成(GraphRAG)使大语言模型(LLMs)能够利用结构化的领域特定知识图谱数据库生成基于事实的响应。然而,检索到不相关或冲突的数据仍可能导致错误响应。在知识密集型和证据导向的领域,人工验证LLM响应的支持证据仍然必要。我们进行了一项形成性试点研究,以表征验证GraphRAG系统检索的复杂多层数据的挑战。基于这些见解,我们提出了VArify,一个可视化分析系统,利用受文件目录启发的树状可视化,支持同时探索检索证据中的组间关系和组内层次。我们通过与六位食品科学专家和学生的用户研究评估了VArify。结果表明,该系统有效帮助用户区分LLM的内部参数知识和外部图源证据。此外,可视化帮助专家识别底层知识图谱本身的不准确性,从而对模型输出产生更校准的信任。最后,我们讨论了利用可视化进一步支持关于未知未知、个性化和知识图谱局限性的验证的机会。

英文摘要

Graph Retrieval-Augmented Generation (GraphRAG) enables Large Language Models (LLMs) to leverage structured, domain-specific knowledge graph databases for factually grounded responses. However, the retrieval of irrelevant or conflicting data can still result in erroneous responses. In knowledge-intensive and evidence-focused domains, human verification of the supporting evidence for an LLM response is still necessary. We conducted a formative pilot study to characterize the challenges of verifying complex, multi-layered data retrieved by GraphRAG systems. Based on these insights, we present VArify, a visual analytics system that leverages a file directory-inspired tree visualization to support simultaneous exploration of inter-group relationships and intra-group hierarchies within the retrieved evidence. We evaluate VArify through a user study with six food science experts and students. Our results indicate that the system effectively helps users distinguish between an LLM's internal parametric knowledge and external graph-sourced evidence. Furthermore, the visualization helped experts identify inaccuracies within the underlying knowledge graph itself, leading to more calibrated trust in the model's output. We conclude by discussing opportunities to leverage visualizations to further support verification regarding unknown unknowns, personalization, and limitations of knowledge graphs.

2606.10172 2026-06-10 cs.CR 新提交

Proof of Source of Funds: Efficient On-chain Provenance of Cryptoassets

资金来源证明:加密资产的高效链上溯源

Alireza Kavousi, István András Seres, Zhipeng Wang

AI总结 提出资金来源证明(PoSoF)框架,通过用户侧零知识证明实现合规,无需平台链上监控,支持UTXO和账户模型,验证时间恒定。

AI中文摘要

监管合规对于去中心化金融和隐私增强技术日益成为强制性要求。当前方法依赖于中心化区块链情报公司的二元包含/排除列表或回溯图分析。这种方法剥夺了诚实用户的金融隐私,导致误报和漏报,并迫使去中心化平台承担链上交易监控的负担。在这项工作中,我们提出了一种范式转变:从平台侧监控转向用户侧溯源。我们引入了资金来源证明(PoSoF),一种新颖的密码学框架,将负担转移给用户。平台无需追踪资金,而是由用户本地生成零知识证明,证明其存款完全来自一组合规来源。平台因此免除了链分析职责,只需恒定时间O(1)的验证即可执行准入控制。我们制定了一个统一的时间有向无环图(DAG)抽象,在广义价值流模型中形式化了UTXO和基于账户的账本历史。用户提取其交易历史的合规子DAG,并利用增量可验证计算(IVC)证明严格的状态转换谓词,以防范各种攻击向量。关键的是,PoSoF提供了可验证的密码学溯源;它保证资金的合法性,而不泄露中间交易拓扑、中间地址或所使用的具体来源。我们正式定义了PoSoF的安全属性,并评估了一个兼容以太坊的原型。我们的基准测试表明,完全隐私的主动合规非常实用,每笔新交易仅需约1.8秒增量更新用户的PoSoF,最终链上EVM验证只需恒定时间约1.5毫秒(约80万gas)。

英文摘要

Regulatory compliance is increasingly mandatory for decentralized finance and privacy-enhancing technologies. Current approaches rely on binary inclusion/exclusion lists or retroactive graph analysis by centralized blockchain intelligence firms. This approach strips honest users of their financial privacy, leads to false positives and negatives, and forces decentralized platforms to bear the burden of on-chain transaction monitoring. In this work, we propose a paradigm shift: moving from platform-side surveillance to user-side provenance. We introduce Proof of Source of Funds (PoSoF), a novel cryptographic framework that shifts the burden to the user. Rather than the platform tracing funds, the user locally generates a zero-knowledge proof demonstrating that their deposit originates exclusively from a set of compliant sources. The platform is thus relieved of chain-analysis duties, requiring a constant-time, O(1) verification to enforce admission control. We formulate a unified temporal Directed Acyclic Graph (DAG) abstraction that formalizes both UTXO and account-based ledger histories within a generalized value-flow model. Users extract a compliant sub-DAG of their transaction history and utilize Incrementally Verifiable Computation (IVC) to prove rigorous state-transition predicates that protect against various attack vectors. Crucially, PoSoF provides verifiable cryptographic provenance; it guarantees the legitimacy of the funds without leaking the intermediate transaction topology, intermediary addresses, or the specific origins utilized. We formally define the security properties of PoSoF and evaluate an Ethereum-compatible prototype. Our benchmarks demonstrate that fully private, proactive compliance is highly practical, requiring only ~1.8 s to incrementally update a user's PoSoF per new transaction, and a constant-time ~1.5 ms (~800k gas) for final on-chain EVM verification.
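The predicate that a PoSoF attests to can be illustrated without any cryptography. Below is a minimal Python sketch, assuming the transaction history is given as a dictionary mapping each transaction to the transactions that funded it. The names `origin_sources` and `is_compliant` and the toy DAG are hypothetical, not taken from the paper. The sketch checks only the plain statement "every origin of the deposit lies in the compliant set"; the zero-knowledge and IVC layers described in the abstract are not modeled.

```python
from typing import Dict, List, Set


def origin_sources(dag: Dict[str, List[str]], deposit: str) -> Set[str]:
    """Walk the value-flow DAG backwards from `deposit` and collect every
    node that has no funding parents (an origin of the funds)."""
    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [deposit]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = dag.get(node, [])
        if not parents:
            sources.add(node)      # no incoming value: this is an origin
        else:
            stack.extend(parents)  # keep tracing towards the origins
    return sources


def is_compliant(dag: Dict[str, List[str]], deposit: str, compliant: Set[str]) -> bool:
    """True only if the deposit originates exclusively from compliant sources."""
    return origin_sources(dag, deposit) <= compliant


# Hypothetical toy history: the deposit is funded by tx2 and tx3.
toy_dag = {
    "deposit": ["tx2", "tx3"],
    "tx2": ["mint_a"],
    "tx3": ["mint_a", "mint_b"],
    "mint_a": [],
    "mint_b": [],
}
```

In this sketch the verifier would need the whole DAG; the point of the framework described above is that the user proves the same predicate in zero knowledge, so the platform sees neither the topology nor the specific origins.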

2606.10169 2026-06-10 cs.GT 新提交

Benchmark-Tight Approximation Ratio of Simple Mechanism for a Unit-Demand Buyer

单位需求买家简单机制的基准紧近似比

Yaonan Jin, Pinyan Lu

AI总结 研究单位需求买家场景下收益最大化问题,提出均匀熨平虚拟价值单品定价机制,首次实现与对偶松弛基准的紧3-近似,突破此前4的界限。

AI中文摘要

我们研究单位需求单一买家场景下的收益最大化。主要结果是,\textsf{均匀熨平虚拟价值单品定价} 保证了对 \textsf{对偶松弛基准} [Chawla-Malec-Sivan, EC'10/GEB'15; Cai-Devanur-Weinberg, STOC'16/ SICOMP'21] 的 {\em 紧} $3$-近似,打破了自 [Chawla-Hartline-Malec-Sivan, STOC'10; Chawla-Malec-Sivan, EC'10/GEB'15] 以来的 $4$ 界限。据我们所知,这是任何简单多物品机制的首个 {\em 基准紧} 收益保证。技术上,所有先前工作都使用 \textsf{Myerson拍卖} 作为中间步骤。$4$ 的界限源于 \textsf{均匀熨平虚拟价值单品定价} 实现了对 \textsf{Myerson拍卖} 的 {\em 紧} $2$-近似,而后者又实现了对 \textsf{对偶松弛基准} 的 {\em 紧} $2$-近似。相反,我们的新方法避免了 \textsf{Myerson拍卖},从而实现了改进。我们工作的核心是一个 {\em 基于基准} 的 $3$-竞争先知不等式及其 {\em 完全构造性} 证明。这种变体先知不等式将在未来找到应用,例如在多物品机制设计中,最优收益被松弛为各种更易处理的基准。我们用不可能性结果补充了基准紧比率。所有先前工作和我们的工作都遵循 [Chawla-Hartline-Kleinberg, EC'07] 引入的 {\em 单维代表} 方法。针对 \textsf{对偶松弛基准},事实证明,对于一大类 \textsf{单品定价},该方法无法超越我们的 $3$ 界限。

英文摘要

We study revenue maximization in the unit-demand single-buyer setting. Our main result is that \textsf{Uniform-Ironed-Virtual-Value Item Pricing} guarantees a {\em tight} $3$-approximation to the \textsf{Duality Relaxation Benchmark} [Chawla-Malec-Sivan, EC'10/GEB'15; Cai-Devanur-Weinberg, STOC'16/ SICOMP'21], breaking the barrier of $4$ since [Chawla-Hartline-Malec-Sivan, STOC'10; Chawla-Malec-Sivan, EC'10/GEB'15]. To our knowledge, this is the first {\em benchmark-tight} revenue guarantee of any simple multi-item mechanism. Technically, all previous works employ \textsf{Myerson Auction} as an intermediary. The barrier of $4$ follows as \textsf{Uniform-Ironed-Virtual-Value Item Pricing} achieves a {\em tight} $2$-approximation to \textsf{Myerson Auction}, which then achieves a {\em tight} $2$-approximation to \textsf{Duality Relaxation Benchmark}. Instead, our new approach avoids \textsf{Myerson Auction}, thus enabling the improvement. Central to our work are a {\em benchmark-based} $3$-competitive prophet inequality and its {\em fully constructive} proof. Such variant prophet inequalities shall find future applications, e.g., to Multi-Item Mechanism Design where optimal revenues are relaxed to various more accessible benchmarks. We complement our benchmark-tight ratio with an impossibility result. All previous works and ours follow the {\em single-dimensional representative} approach introduced by [Chawla-Hartline-Kleinberg, EC'07]. Against \textsf{Duality Relaxation Benchmark}, it turns out that this approach cannot beat our bound of $3$ for a large class of \textsf{Item Pricing}'s.
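The arithmetic behind the old barrier of $4$ and the new bound of $3$ can be written out as follows. The shorthand $\mathsf{Rev}(\cdot)$, $\mathsf{UIVV}$, $\mathsf{Mye}$, and $\mathsf{DRB}$ is ours, introduced only for this illustration, and the inequalities merely restate the approximation ratios quoted in the abstract.

```latex
% Shorthand (ours, not the paper's notation):
%   Rev(UIVV) = revenue of Uniform-Ironed-Virtual-Value Item Pricing
%   Rev(Mye)  = revenue of Myerson Auction
%   DRB       = value of the Duality Relaxation Benchmark
\begin{align*}
\text{prior route (via Myerson):}\quad
  & \mathsf{DRB} \;\le\; 2\,\mathsf{Rev}(\mathsf{Mye})
    \;\le\; 2 \cdot 2\,\mathsf{Rev}(\mathsf{UIVV})
    \;=\; 4\,\mathsf{Rev}(\mathsf{UIVV}), \\
\text{direct route (this work):}\quad
  & \mathsf{DRB} \;\le\; 3\,\mathsf{Rev}(\mathsf{UIVV}).
\end{align*}
```

Because each step of the prior route is stated to be tight on its own, no analysis that passes through Myerson Auction can improve on $2 \cdot 2 = 4$; the improvement to $3$ requires comparing the pricing to the benchmark directly.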

2606.10163 2026-06-10 cs.CR cs.AR 新提交

GRAFT: Graphlet-Triggered Backdoor Attack on GNN-Based Hardware Security Systems

GRAFT: 基于图元的GNN硬件安全系统后门攻击

Sanaz Kazemi Abharian, Sai Manoj Pudukotai Dinakarrao

AI总结 提出GRAFT方法,利用图元触发器在RTL或门级嵌入后门,保持电路功能,有效攻击GNN硬件安全系统,攻击成功率高达100%。

AI中文摘要

集成电路(IC)供应链的全球化增加了安全威胁的风险,例如硬件木马(HT)和知识产权(IP)盗窃。图神经网络(GNN)作为处理图结构数据的最强大的深度学习方法之一,已被广泛用于检测此类威胁。然而,GNN容易受到后门攻击,这些攻击可以恶意地将输出预测操纵到对抗目标。这些攻击不仅难以检测,而且会损害基于GNN的安全系统的完整性。大多数先前的工作使用随机生成的子图或梯度引导的生成子图来嵌入后门触发器。然而,这种触发器对于基于GNN的硬件安全应用是不切实际的,因为它们不能保证保留电路功能。在本文中,我们提出了GRAFT,一种针对基于GNN的硬件安全的图元触发后门攻击。GRAFT在设计中的寄存器传输级(RTL)或门级嵌入基于图元的触发器,同时保留电路的原始功能。我们在ISCAS-85和TrustHub数据集上评估了GRAFT。我们的实验结果表明,GRAFT可以有效逃避HT检测和IP盗版检测,攻击成功率(ASR)高达100%。

英文摘要

The globalization of the integrated circuit (IC) supply chain increases the risk of security threats, such as hardware Trojans (HTs) and the theft of intellectual property (IP). Graph Neural Networks (GNNs), among the most powerful deep learning methods for processing graph-structured data, have been widely adopted to detect such threats. However, GNNs are susceptible to backdoor attacks that can maliciously manipulate output predictions toward an adversarial target. These attacks are not only difficult to detect but also compromise the integrity of GNN-based security systems. Most prior work embeds backdoor triggers using randomly generated subgraphs or gradient-guided generative subgraphs. However, such triggers are impractical for GNN-based hardware security applications as they do not guarantee the preservation of circuit functionality. In this paper, we propose GRAFT, a graphlet-triggered backdoor attack targeting GNN-based hardware security. GRAFT embeds graphlet-based triggers at either the register-transfer level (RTL) or gate level of the design while preserving the circuit's original function. We evaluate GRAFT on the ISCAS-85 and TrustHub datasets. Our experimental results demonstrate that GRAFT can effectively evade HT detection and IP piracy detection, achieving an attack success rate (ASR) of up to 100%.

2606.10158 2026-06-10 cs.CY cs.HC 新提交

"Where is this coming from?" Uncovering Trustworthiness Ideals in AI-powered Peripartum Information Seeking

“这从何而来?”揭示AI驱动的围产期信息寻求中的可信赖性理想

Vaibhav Balloli, Julia Erickson, Xinyi Li, Erin MacMurray van Liemt, Alex Peahl, Elizabeth Bondi-Kelly

AI总结 通过焦点小组研究围产期利益相关者,发现高风险健康情境中可信赖性必须可审查而非声称,提出四项治理要求:社会与身份意义建构、多元验证、可审查治理与生态互补整合。

Comments Accepted at ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2026

AI中文摘要

AI驱动的工具越来越多地承诺填补健康信息空白,尤其是在孕产妇和生殖健康等需要及时、准确和可操作信息的领域。这一点极其重要,因为美国在可预防死亡方面领先于同行国家,且存在显著的种族差异。然而,当前的AI和NLP系统旨在通过将用户查询引导至事实性回答来改善经过验证的孕产妇健康信息的获取,同时未充分说明塑造实践中信任、使用和伤害的社会技术治理结构。我们报告了四个同步焦点小组(n=24)的研究结果,涉及围产期信息支持的三个核心利益相关者群体:生育者、临床医生和健康工作者(如导乐、社会工作者、社区健康工作者),探讨了信息寻求、当前临床基础设施经验、错误信息以及一个AI驱动的事实回答工具设计探针等主题。我们的归纳分析揭示了一个核心发现:在由历史不平等塑造的高风险健康情境中,可信赖性必须是可审查的,而非声称的。尽管利益相关者对信息可信度的看法存在分歧,但他们一致认为需要透明度、追索权和生态系统互补性。基于讨论,我们确定了四个主题和治理要求:(1)支持社会和基于身份的意义建构,(2)多元验证实践,(3)具有追索机制的可审查治理,以及(4)避免转移负担的生态系统感知整合。基于这些发现,我们提出了具有不信任意识的设计工件,并促进透明、多元AI系统的原则性治理机制。最后,我们讨论了研究结果对扩展人-AI评估和改进已部署AI系统透明度的启示。

英文摘要

AI-powered tools increasingly promise to fill information gaps in health, especially in domains like maternal and reproductive health that demand timely, accurate, and actionable information. This is extremely important, as the United States leads peer nations in preventable deaths, with stark racial disparities. However, current AI and NLP-powered systems aim to improve access to vetted maternal health information by routing user queries to a factual response while under-specifying the socio-technical governance structures that shape trust, use, and harm in practice. We report findings from four synchronous focus groups ($n=24$) with three stakeholder groups central to peripartum information support: birthing people, clinicians, and health workers (e.g., doulas, social workers, community health workers) exploring topics around information seeking, experience with current clinical infrastructure, misinformation, and an AI-enabled factual answering tool design probe. Our inductive analysis surfaces a central finding: in high-stakes health contexts shaped by historical inequities, trustworthiness must be inspectable and not asserted. While stakeholders diverge on what makes information credible, they converge on the need for transparency, recourse, and ecosystem complementarity. Based on the discussions, we identify four themes and governance requirements: (1) support for social and identity-based sensemaking, (2) pluralistic verification practices, (3) inspectable governance with recourse mechanisms, and (4) ecosystem-aware integration that avoids shifting burden. Building on these findings, we propose design artifacts that are mistrust-aware and promote principled governance mechanisms for transparent, pluralistic AI systems. Finally, we discuss the implications of our findings for expanding human-AI evaluations and improving the transparency of deployed AI systems.

2606.10157 2026-06-10 eess.SY cs.SY 新提交

An Algebraic State Observer for a Self-Sensing Active Magnetic Bearing System

自感应主动磁轴承系统的代数状态观测器

Olga Zarina, Natalya Martyuhova, Alexey Bobtsov, Romeo Ortega

AI总结 针对仅测量电流和电压的自感应主动磁轴承系统,提出一种基于代数关系的全局稳定状态观测器,并给出鲁棒渐近版本,仿真验证了性能。

Comments 7 pages, 9 figures, submitted to International Journal of Adaptive Control and Signal Processing

AI中文摘要

本文研究了仅假设测量电流和电压的自感应主动磁轴承系统的全局稳定观测器设计问题。为此,我们首先设计了一种全新的高性能状态观测器,该观测器通过引入新颖技术获得。实际上,我们的目标是建立不可测状态与系统输入输出滤波版本之间的代数关系,该关系对所有时间成立。然后,利用该代数观测器,我们提出了一个鲁棒渐近版本的观测器。还给出了说明观测器性能的仿真结果。

英文摘要

The problem of designing a globally stable observer for a self-sensing active magnetic bearing system assuming only measurements of currents and voltages is addressed in this paper. Towards this end, we first design a radically different, high-performance state observer, obtained by invoking novel techniques. Indeed, our objective is to obtain an algebraic relation between the unmeasurable part of the state and filtered versions of the system's inputs and outputs, which holds for all time. Then, using this algebraic observer, we propose a robust asymptotic version of the observer. Simulation results that illustrate the performance of the observer are also presented.

2606.10148 2026-06-10 cs.CR 新提交

RadKey: An LLM-Guided RF Backscatter System for Through-Wall Keystroke Inference

RadKey: 一种基于LLM引导的RF反向散射系统用于穿墙击键推断

Qijun Wang, Chunqi Qian, Huacheng Zeng

AI总结 提出RadKey系统,利用无源反向散射标签捕获击键振动和声音,通过RF信号远距离穿墙窃听,结合信号处理与LLM在线自适应,实现跨用户、跨键盘的准确击键推断。

Comments Accepted to the 47th IEEE Symposium on Security and Privacy (IEEE S&P), 2026

AI中文摘要

在当今数字化互联的世界中,键盘仍然是输入敏感信息的主要接口,使其成为窃听攻击的持续目标。先前的击键推断技术利用了声学和振动等侧信道信号,但它们通常依赖于显眼的短距离传感器,并需要受害者特定数据进行模型训练,限制了其实用性、可扩展性和隐蔽性。在本文中,我们提出了RadKey,一种用于隐蔽、远距离、穿墙击键窃听的RF反向散射系统。RadKey包含两个组件:一个紧凑的无电池反向散射标签和一个RF阅读器。该标签捕获击键引起的振动和声学信号,利用两个磁耦合LC谐振器将这些信号调制到其反向散射RF信号的频移上。这种设计还实现了激励信号和反向散射信号之间的频谱分离,减轻了RF阅读器的自干扰,从而延长了窃听距离。RF阅读器解调反向散射RF信号以推断键入内容。它采用专门的信号处理流水线,在时域和频域提取与用户和键盘无关的击键特征,从而实现强大的泛化能力。为了进一步增强适应性,RadKey集成了一个LLM进行在线自适应,利用LLM输出作为伪真实标签在运行时优化分类器。我们构建了完整的RadKey系统原型,并通过广泛的空中实验进行了评估。结果表明,RadKey在现实环境中跨不同用户实现了准确且稳健的击键推断。演示视频可在以下网址获取:https://radkey-submission.github.io/RadKey/

英文摘要

In today's digitally connected world, keyboards remain the primary interface for inputting sensitive information, making them a persistent target for eavesdropping attacks. While prior keystroke inference techniques have exploited side-channel signals such as acoustics and vibrations, they typically rely on conspicuous, short-range sensors and require victim-specific data for model training, limiting their practicality, scalability, and stealth. In this paper, we present RadKey, an RF backscatter system for covert, long-range, through-wall keystroke eavesdropping. RadKey comprises two components: a compact batteryless backscatter tag and an RF reader. The tag captures keystroke-induced vibrations and acoustic signals, modulating them onto the frequency shift of its backscattered RF signal using two magnetically-coupled LC resonators. This design also enables spectral separation between the excitation and backscatter signals, mitigating self-interference for the RF reader and thus extending eavesdropping range. The RF reader demodulates the backscattered RF signal to infer typed content. It employs a dedicated signal processing pipeline that extracts user- and keyboard-independent keystroke features across time and frequency domains, enabling strong generalizability. To further enhance adaptability, RadKey integrates an LLM for online adaptation, leveraging LLM outputs as pseudo ground-truth labels to refine the classifier during runtime. We have built a prototype of the full RadKey system and evaluated it through extensive over-the-air experiments. Results show that RadKey achieves accurate and robust keystroke inference across diverse users in real-world settings. A demo video is available at: https://radkey-submission.github.io/RadKey/

2606.10097 2026-06-10 cs.CR cs.NI 新提交

Secrets Best Not Shared: DNS Privacy Enhancements for the Constrained IoT

最好不共享的秘密:受限物联网的DNS隐私增强

Martine S. Lenders, Thomas C. Schmidt, Matthias Wählisch

AI总结 针对受限物联网设备,研究在加密基础上混淆DNS流量以提升隐私,通过实验评估DNS over CoAP等协议,发现均衡包长、分块传输和头部压缩可将DNS帧识别准确率降至86%,结合负载压缩进一步降至77%。

Comments 20 pages, 20 figures, 2 tables

AI中文摘要

攻击者经常识别DNS流量以破坏或危及互联网服务。虽然先前的工作侧重于使用DNS over TLS、HTTPS或QUIC加密查询来对抗此类攻击,但我们考虑了为资源受限的物联网设备设计的IETF协议,并实证分析了在加密之外混淆DNS流量的潜力。我们创建了一个机器对机器兼容的数据对象数据集及其对应的DNS解析过程,评估了296种解析主机名的部署场景,包括基于受限应用层协议(CoAP)的DNS和CoAP的洋葱路由变体,在不同链路层条件下进行。我们将它们与DNS over HTTPS进行比较。使用随机森林和头部字段分析,我们识别出泄露最多信息的字段。我们的研究结果表明,采用均衡包长、分块传输和头部压缩的DNS over CoAP将识别DNS帧的准确率降至86%,进一步结合负载压缩降至77%。我们的方法优于DNS over HTTPS,后者中分类器总是基于IP地址识别DNS帧。该数据集公开可用。

英文摘要

Attackers often identify DNS traffic to disrupt or compromise Internet services. While prior work has focused on encrypting queries using DNS over TLS, HTTPS, or QUIC to counter such attacks, we consider IETF protocols designed for resource-constrained IoT devices and empirically analyze the potential of obfuscating DNS traffic in addition to encryption. We create a dataset of machine-to-machine-compatible data objects along with the corresponding DNS resolution processes, evaluating 296 deployment scenarios of resolving host names, including DNS over the Constrained Application Layer Protocol (CoAP) and an onion routing flavor of CoAP under varying link-layer conditions. We compare them to DNS over HTTPS. Using Random Forest and a header field analysis, we identify fields that leak most information. Our findings show that DNS over CoAP with equalized packet lengths, block-wise transfer, and header compression reduces the accuracy of identifying DNS frames to 86% and further to 77% with payload compression. Our approach outperforms DNS over HTTPS, where classifiers always identify DNS frames based on IP addresses. The dataset is publicly available.
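The idea of equalized packet lengths can be sketched in a few lines of Python. This is a generic length-prefix-and-pad framing written for illustration only: the function names, the 2-byte prefix, and the 32-byte block size are assumptions, and it is neither the scheme evaluated in the paper nor a feature of any CoAP library. Such padding would have to be applied before encryption so that an observer sees only the padded length.

```python
def pad_to_block(payload: bytes, block: int = 32) -> bytes:
    """Frame `payload` with a 2-byte big-endian length prefix, then zero-pad
    the frame to the next multiple of `block` bytes."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 2-byte length prefix")
    framed = len(payload).to_bytes(2, "big") + payload
    padding = (-len(framed)) % block  # bytes needed to reach a block boundary
    return framed + b"\x00" * padding


def unpad(frame: bytes) -> bytes:
    """Recover the original payload from a frame produced by pad_to_block."""
    length = int.from_bytes(frame[:2], "big")
    return frame[2:2 + length]


# Two queries of different length become indistinguishable by size alone.
short_query = pad_to_block(b"a.example")
long_query = pad_to_block(b"sensor-17.building-4.example")
```

The trade-off is bandwidth: a larger block hides more length variation but wastes more bytes per message, which matters on constrained links.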

2606.10095 2026-06-10 cs.HC 新提交

LLM-Based Visualization Evaluation: How Well Do Literacy-Stratified Personas Approximate Human Judgments?

基于LLM的可视化评估:按识字分层的人物角色在多大程度上接近人类判断?

Swaroop Panda

AI总结 提出识字分层LLM评估框架(LSLE),通过构建可视化识字角色并引导LLM模拟评估,与人类数据对比,发现其在早期设计探索中有效,但在总结性评估中系统失效。

AI中文摘要

评估不同用户群体的数据可视化仍然是可视化研究中的一个重大方法论挑战。我们提出了一个理论化的评估框架——识字分层LLM评估(LSLE),它形式化了一个两阶段过程。第一阶段涉及基于VLAT等既定框架构建可视化识字角色。第二阶段指导大型语言模型采用这些角色作为可视化工件的模拟评估者。我们通过认识论分析来奠定该框架的基础,该分析描述了LLM角色模拟可能在何种条件下产生识字依赖感知的合理代理——以及关键地,在何种条件下不会——直接涉及来自VIS和HCI文献中对LLM作为参与者范式的新兴批评。为了实证测试LSLE的边界,我们将其输出与来自两个既定工具(VLAT和BeauVIS)验证研究的公开人类响应数据进行基准测试。使用与原始人类研究相同的刺激和评估项目,我们比较了跨识字层的LSLE角色响应与已发布的人类分布以及默认(非角色)LLM基线。我们的分析揭示了识字分层角色在哪些方面与人类响应模式趋同和分歧——识别了角色模拟接近人类变异性以及系统失败的任务类型和评估维度。我们讨论了将LLM辅助评估作为实证方法补充的负责任使用的含义,并提出了LSLE可能最合适的边界条件:早期设计探索和快速比较筛选,而非总结性评估。

英文摘要

Evaluating data visualizations across diverse user populations continues to pose a significant methodological challenge within visualization research. We propose a theorized evaluation framework, Literacy-Stratified LLM Evaluation (LSLE), which formalizes a two-stage process. The first stage involves constructing visualization literacy personas grounded in established frameworks such as VLAT. The second stage directs large language models to adopt these personas as simulated evaluators of visualization artifacts. We ground the framework in an epistemic analysis that characterizes the conditions under which LLM persona simulation may produce plausible proxies for literacy-dependent perception - and, critically, the conditions under which it does not - engaging directly with emerging critiques of LLM-as-participant paradigms from the VIS and HCI literature. To empirically test LSLE's boundaries, we benchmark its outputs against openly available human response data from the validation studies of two established instruments: VLAT and BeauVIS. Using the same stimuli and assessment items as the original human studies, we compare LSLE persona responses across literacy strata against published human distributions and against default (non-persona) LLM baselines. Our analysis reveals where literacy-stratified personas converge with and diverge from human response patterns - identifying task types and evaluation dimensions where persona simulation approximates human variability and where it systematically fails. We discuss implications for the responsible use of LLM-assisted evaluation as a complement to empirical methods, and propose boundary conditions for when LSLE may be most appropriate: early-stage design exploration and rapid comparative screening rather than summative evaluation.

2606.10083 2026-06-10 cs.CR cs.CY 新提交

The Human Vulnerabilities & Exploits (HVE) Framework

人类漏洞与利用(HVE)框架

Avichai Ben, Tom Rahav, Daniel Illaev, Aviv Nahon, Avi Grushka

AI总结 针对社会工程和欺诈攻击缺乏标准化框架的问题,本文提出HVE框架,基于行为科学理论分类、评分和缓解人类行为心理漏洞。

AI中文摘要

网络安全社区投入了二十多年时间构建标准化框架,包括通用漏洞与暴露(CVE)系统、通用漏洞评分系统(CVSS)和通用弱点枚举(CWE),以识别、分类和修复数字基础设施的威胁。然而,新兴研究表明,绝大多数成功的网络攻击利用的不是软件缺陷,而是人类行为和心理漏洞。社会工程、欺诈和诈骗攻击操纵人类认知、情感和信任,缺乏相应的标准化框架。同时,行为科学和心理学研究已经建立了坚实的理论基础,如双过程理论、前景理论、社会影响框架和内脏状态模型,精确解释了这些攻击为何以及如何成功。本文介绍了人类漏洞与利用(HVE)框架,这是一种结构化的方法,用于识别、分类和缓解在诈骗、社会工程及其他以人为中心的欺诈和攻击中被利用的行为和心理漏洞,其概念类似于CVE如何帮助分类软件漏洞:它提供了一种共享的、机器可读的分类法,包含结构化标识符、通过人类漏洞严重性评分(HVSS)进行的多维度严重性评分,以及通过人类漏洞补丁(HVP)提供的可操作修复指南。本引言综合了网络安全标准化、行为科学和欺诈防御方面的相关文献,为HVE框架建立了理论和实践基础,其架构和技术规格将在后续章节中详细说明。

英文摘要

The cybersecurity community has invested over two decades in building standardized frameworks, including the Common Vulnerabilities and Exposures (CVE) system, the Common Vulnerability Scoring System (CVSS), and the Common Weakness Enumeration (CWE), to identify, classify, and remediate threats to digital infrastructure. However, an emerging body of research reveals that a vast majority of successful cyberattacks exploit not software flaws, but human behavioral and psychological vulnerabilities. Social engineering, fraud, and scam attacks, which manipulate human cognition, emotion, and trust, do not have an equivalent standardized framework. Meanwhile, behavioral science and psychology research has established robust theoretical foundations, such as dual-process theory, prospect theory, social influence frameworks, and visceral state models, which explain precisely why and how these attacks succeed. This paper introduces the Human Vulnerabilities & Exploits (HVE) Framework, a structured approach for identifying, classifying, and mitigating the behavioral and psychological vulnerabilities exploited in scams, social engineering, and other human-centric fraud and attacks. It is analogous in concept to how CVE helps classify software vulnerabilities: it provides a shared, machine-readable taxonomy with structured identifiers, multi-dimensional severity scoring via the Human Vulnerability Severity Score (HVSS), and actionable remediation guidance through Human Vulnerability Patches (HVPs). This introduction synthesizes the relevant literature across cybersecurity standardization, behavioral science, and fraud defense to establish the theoretical and practical foundations for the HVE framework, whose architecture and technical specifications are detailed in subsequent sections.

2606.10078 2026-06-10 cs.IR 新提交

Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems

Mult-DPO:用于推荐系统的多项直接偏好优化

Yaochen Zhu, Harald Steck, James McInerney, Aditya Sinha, Yinhan He, Nathan Kallus, Jundong Li

AI总结 针对推荐系统中集合级偏好(多个正项)的LLM对齐问题,提出Mult-DPO,通过可计算的多项式替代似然实现直接偏好优化,并证明其是边际化Plackett-Luce DPO损失的可计算上界。

AI中文摘要

直接偏好优化(DPO)是一种基于成对偏好的简单有效的LLM对齐策略。然而,在推荐系统中,用户反馈很少是成对的。对于给定上下文(如用户、会话或对话),我们通常观察到包含多个正项的集合级偏好,其中每个正项应优于所有未观察或明确负项,而正项之间或负项之间没有规定顺序。一个自然的泛化是使用Plackett-Luce(PL)奖励模型,它将原始DPO背后的Bradley-Terry奖励模型从成对偏好扩展到候选项的完整排序。然而,我们表明,将PL模型适应于集合级偏好需要对所有正项排序进行边际化,所得表达式在复杂度上是组合性的。为了解决这一根本挑战,我们提出了Mult-DPO,一种新颖的DPO目标,它使用集合级偏好事件上的可计算多项式替代似然,用于基于LLM的推荐系统的用户偏好对齐。多项式构造本身不是排序分布,但它定义在相同的奖励诱导权重空间上,并允许闭式DPO风格的目标,从而通过分类风格目标直接对齐LLM与多个候选项。此外,我们证明,在针对集合级偏好数据优化时,多项式DPO损失是边际化PL DPO损失的可计算上界。我们进一步根据正项与负项的相对总权重刻画了该界的紧致性,这为使用更丰富或更难的负项来收紧界提供了见解。最后,我们将Mult-DPO扩展到具有多个偏好级别的LLM对齐。代码可在 https://github.com/yaochenzhu/Mult_DPO 获取。

英文摘要

Direct preference optimization (DPO) is a simple and effective alignment strategy for large language models (LLMs) based on pairwise preferences. In recommender systems, however, user feedback is rarely pairwise. For a given context, e.g., a user, a session, or a conversation, we typically observe set-wise preferences with multiple positive items, where every positive item should outrank every unobserved or explicitly negative item, with no prescribed order among the positives or the negatives themselves. A natural generalization is to use the Plackett-Luce (PL) reward model, which extends the Bradley-Terry reward model underlying vanilla DPO from pairwise preferences to full rankings of candidates. However, we show that adapting the PL model to set-wise preferences requires marginalizing over all positive orderings, where the resulting expression is combinatorial in complexity. To address this fundamental challenge, we propose Mult-DPO, a novel DPO objective with a tractable multinomial surrogate likelihood over set-wise preference events for the user-preference alignment of LLM-based recommender systems. The multinomial construction is not itself a ranking distribution, but it is defined on the same reward-induced weight space and admits a closed-form DPO-style objective, enabling direct alignment of LLMs with multiple candidates through a classification-style objective. In addition, we prove that the multinomial DPO loss is a tractable upper bound on the marginalized PL DPO loss when optimizing against the set-wise preference data. We further characterize the tightness of this bound in terms of the relative total weight of positives versus negatives, which provides insights into tightening the bound with richer or harder negatives. Finally, we extend Mult-DPO to the alignment of LLMs with multiple preference levels. Code is available at https://github.com/yaochenzhu/Mult_DPO
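The combinatorial cost of marginalizing the Plackett-Luce model, and the sense in which a product-form surrogate can upper-bound its loss, can be checked numerically on a toy instance. The Python sketch below is an illustration under stated assumptions: the positive item weights stand in for reward-induced weights, and the surrogate used here (a product of per-item softmax probabilities over the full candidate set) is one simple product-form choice made for this example, which may differ from the paper's exact Mult-DPO objective. The function names are hypothetical.

```python
import itertools
import math
from typing import Sequence


def pl_set_likelihood(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Plackett-Luce probability that all positives outrank all negatives,
    marginalized over every ordering of the positives (k! terms)."""
    total = sum(pos) + sum(neg)
    prob = 0.0
    for order in itertools.permutations(pos):
        remaining = total
        p = 1.0
        for w in order:
            p *= w / remaining  # pick this positive among the items left
            remaining -= w
        prob += p
    return prob


def product_surrogate(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Assumed surrogate: each positive is scored against the full candidate
    set, so no enumeration of orderings is needed."""
    total = sum(pos) + sum(neg)
    return math.prod(w / total for w in pos)


pos_w, neg_w = [2.0, 1.0], [1.0]
pl_loss = -math.log(pl_set_likelihood(pos_w, neg_w))
surrogate_loss = -math.log(product_surrogate(pos_w, neg_w))
```

For this surrogate the bound follows term by term: every denominator in a Plackett-Luce ordering is at most the full total weight, so each ordering alone is at least as likely as the product, and the negative log-likelihood of the product is therefore at least that of the marginalized model. The enumeration costs k! terms for k positives, while the product costs k.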

2606.10056 2026-06-10 cs.LO 新提交

Experimental evaluation of optimal abstract operators for sharing and linearity analysis

共享与线性分析中最优抽象算子的实验评估

Francesca Scozzari, Gianluca Amato

AI总结 针对逻辑程序静态分析中抽象算子的最优性与性能权衡问题,在PLAI分析器中实现了多个最优算子,并通过实验评估了它们对共享与线性分析精度和性能的影响。

Comments Accepted for publication in ICLP 2026

AI中文摘要

在逻辑程序静态分析领域,抽象算子的最优性是一个有价值的理论性质,因为它提供了对抽象域结构以及可达到的最大精度的洞察。然而,实现最优算子通常很复杂,并且可能显著影响性能,从而在精度和效率之间产生权衡。我们在逻辑程序的共享和线性分析背景下实验性地研究了这种权衡。我们的实验建立在先前提出的用于合一和匹配的几个最优算子的工作之上。我们在CiaoPP预处理器的PLAI分析器中实现了这些抽象算子和相应的抽象域,并报告了增加算子精度对整个分析准确性和性能的影响。

英文摘要

In the field of static analysis of logic programs, the optimality of abstract operators is a valuable theoretical property, as it provides insight into the structure of abstract domains and the maximum precision that can be achieved. However, implementing optimal operators is often complex and may significantly impact performance, giving rise to a trade-off between precision and efficiency. We experimentally investigate this trade-off in the context of sharing and linearity analysis of logic programs. Our experiments build on previous work that proposed several optimal operators for unification and matching. We have implemented these abstract operators and the corresponding abstract domains within the PLAI analyzer, part of the CiaoPP preprocessor, and we report the impact of increasing operator precision on the accuracy and performance of the overall analysis.

2606.10053 2026-06-10 cs.GT cs.IR 新提交

Stability in Competitive Search with Results Diversification

结果多样化下的竞争搜索稳定性

Itamar Reinman, Omer Madmon, Moshe Tennenholtz, Oren Kurland

AI总结 研究竞争搜索中结果多样化对语料库稳定性的影响,通过博弈论分析揭示多样性-稳定性权衡,并提出保证稳定性的多样化排序方法。

Comments Accepted to ICTIR 2026

AI中文摘要

在竞争搜索环境中,发布者会策略性地修改其文档以应对诱导的排名,从而改善其未来排名。我们对应用了搜索结果多样化的竞争搜索环境进行了新颖的博弈论分析。我们的分析揭示了语料库多样性与语料库稳定性之间的内在权衡,其中稳定性对应于博弈中的均衡。我们分析了两种代表性的多样化方法,并表明稳定性不一定能够达到,导致语料库因发布者受排名激励的修改而快速变化。然后,我们提出了一种新颖的方法来设计基于多样化的排名函数,这些函数保证能够导致语料库稳定性。

英文摘要

In a competitive search setting, publishers strategically modify their documents in response to induced rankings so as to improve their future ranking. We present a novel game-theoretic analysis of a competitive search setting where search-results diversification is applied. Our analysis reveals an inherent tradeoff between corpus diversity and corpus stability, where the latter corresponds to an equilibrium in a game. We analyze two representative diversification methods and show that stability is not necessarily reached, leaving the corpus subject to rapid changes driven by publishers' ranking-incentivized modifications. We then present a novel approach to devise diversification-based ranking functions that are guaranteed to lead to corpus stability.

2606.10051 2026-06-10 cs.CY cs.HC 新提交

The Empirically Grounded Adaptive Virtual Patient for Psychotherapy Training: Disclosure That Responds to Therapist Micro-Skills

基于经验的自适应虚拟患者用于心理治疗培训:对治疗师微技能做出反应的披露

Angela Chen, Siwei Jin, Catherine Bao, Canwen Wang, Robert E. Kraut, Tongshuang Wu, Haiyi Zhu

AI总结 提出自适应虚拟患者(AVP),基于近2000小时真实治疗转录的结构方程模型,动态调整披露水平以响应治疗师共情和探索技能,在80次会话评估中优于纯提示基线。

AI中文摘要

模拟患者为培训心理治疗微技能(如共情回应和探索性提问)提供了一种可扩展的方式,但当前系统要么遵循固定脚本,要么依赖在长时间会话中不可预测地漂移的LLM。我们提出了自适应虚拟患者(AVP),它根据受训者的技能调整其披露行为——从保守、适度开放到完全披露。AVP基于一个结构方程模型,该模型拟合了近2000小时的真实心理治疗转录,量化了治疗师的共情和探索如何随时间改变患者的开放性。LLM根据动态模块每轮更新的披露水平生成AVP的话语。在20名临床医生和受训者进行的80次会话(1033轮)评估中,AVP的披露水平随治疗师的共情和探索而上升,而纯提示基线保持平坦;消融实验证实,基于经验的参数化优于替代方案,其中探索承载了大部分自适应信号。

英文摘要

Simulated patients offer a scalable way to train psychotherapy micro-skills such as empathic responding and exploratory probing, but current systems either follow fixed scripts or rely on LLMs that drift unpredictably over long sessions. We present the Adaptive Virtual Patient (AVP), which adapts its disclosure behavior -- from guarded, through moderate openness, to full disclosure -- in response to trainee skill. The AVP is grounded in a structural equation model fit to nearly 2,000 hours of real-world psychotherapy transcripts, which quantifies how therapist empathy and exploration shift a patient's openness over time. An LLM generates the AVP's utterances conditioned on a disclosure level that the dynamics module updates each turn. In an evaluation with 20 clinicians and trainees over 80 sessions (1,033 turns), the AVP's disclosure rises in response to therapist empathy and exploration, while a prompt-only baseline stays flat; ablations confirm that the empirically motivated parameterization outperforms alternatives, with exploration carrying most of the adaptive signal.
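A per-turn dynamics module of the kind described above might look roughly like the following Python sketch. Everything numeric here is hypothetical: the linear update, the weights `w_emp` and `w_exp`, the decay term, and the band thresholds are placeholders chosen for illustration, not the structural-equation parameters fitted in the paper. The sketch only shows the control flow in which a scalar disclosure level is updated each turn and then mapped to a behavior band that could condition an LLM prompt.

```python
def update_disclosure(level: float, empathy: float, exploration: float,
                      w_emp: float = 0.1, w_exp: float = 0.3,
                      decay: float = 0.05) -> float:
    """Return the next disclosure level in [0, 1].

    `empathy` and `exploration` are per-turn skill scores in [0, 1].
    The weights are hypothetical; exploration is weighted more heavily only
    to mirror the qualitative finding quoted in the abstract."""
    new_level = level + w_emp * empathy + w_exp * exploration - decay
    return min(1.0, max(0.0, new_level))  # clip to the valid range


def disclosure_band(level: float) -> str:
    """Map the scalar level to one of the three behaviors named above."""
    if level < 1 / 3:
        return "guarded"
    if level < 2 / 3:
        return "moderate"
    return "full"


# Two skilled turns in a row move a guarded patient towards full disclosure.
level = 0.2
history = [disclosure_band(level)]
for _ in range(2):
    level = update_disclosure(level, empathy=1.0, exploration=1.0)
    history.append(disclosure_band(level))
```

Keeping the state update outside the LLM is the design point: the language model only renders an utterance for the current band, so the trajectory of openness stays governed by explicit parameters over a long session.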