arXivDaily: arXiv daily academic digest, updated Monday through Friday
2510.06242 2026-06-11 cs.CL cs.AI

Transparent Reference-free Automated Evaluation of Open-Ended User Survey Responses

Subin An, Yugyeong Ji, Junyoung Kim, Heejin Kook, Yang Lu, Josh Seltzer

Details
Journal ref
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Comments
EMNLP Industry Track
Abstract

Open-ended survey responses provide valuable insights in marketing research, but low-quality responses not only burden researchers with manual filtering but also risk leading to misleading conclusions, underscoring the need for effective evaluation. Existing automatic evaluation methods target LLM-generated text and inadequately assess human-written responses with their distinct characteristics. To address such characteristics, we propose a two-stage evaluation framework specifically designed for human survey responses. First, gibberish filtering removes nonsensical responses. Then, three dimensions (effort, relevance, and completeness) are evaluated using LLM capabilities, grounded in empirical analysis of real-world survey data. Validation on English and Korean datasets shows that our framework not only outperforms existing metrics but also demonstrates high practical applicability for real-world applications such as response quality prediction and response rejection, showing strong correlations with expert assessment.

2408.00157 2026-06-11 cs.LG physics.comp-ph physics.flu-dyn

Generative Learning of the Solution of Parametric Partial Differential Equations Using Guided Diffusion Models and Virtual Observations

Han Gao, Sebastian Kaltenbach, Petros Koumoutsakos

Details
Abstract

We introduce a generative learning framework to model high-dimensional parametric systems using gradient guidance and virtual observations. We consider systems described by Partial Differential Equations (PDEs) discretized with structured or unstructured grids. The framework integrates multi-level information to generate high-fidelity time sequences of the system dynamics. We demonstrate the effectiveness and versatility of our framework with two case studies: incompressible, two-dimensional, low-Reynolds-number cylinder flow on an unstructured mesh, and incompressible turbulent channel flow on a structured mesh, both parameterized by the Reynolds number. Our results illustrate the framework's robustness and its ability to generate accurate flow sequences across various parameter settings, significantly reducing computational costs and allowing for efficient forecasting and reconstruction of flow dynamics.

2312.11540 2026-06-11 cs.LG

On the Trade-off between the Number of Nodes and the Number of Trees in a Random Forest

Tatsuya Akutsu, Avraham A. Melkman, Atsuhiro Takasu

Details
Abstract

In this paper, we focus on the prediction phase of a random forest and study the problem of representing a bag of decision trees using a smaller bag of decision trees, where we only consider binary decision problems on the binary domain and simple decision trees in which an internal node is limited to querying the Boolean value of a single variable. As a main result, we show that the majority function of $n$ variables can be represented by a bag of $T$ ($< n$) decision trees each with polynomial size if $n-T$ is a constant, where $n$ and $T$ must be odd (in order to avoid the tie break). We also show that a bag of $n$ decision trees can be represented by a bag of $T$ decision trees each with polynomial size if $n-T$ is a constant and a small classification error is allowed. A related result on the $k$-out-of-$n$ functions is presented too.
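
As a point of orientation for the setting above, the trivial representation uses $n$ one-node trees, one per variable, so that the bag's majority vote equals the majority function. The sketch below checks this exhaustively for $n = 5$. It is not the paper's construction with $T < n$ trees; the helper names `stump`, `bag_predict`, and `majority` are hypothetical and chosen only for illustration.

```python
from itertools import product

def stump(i):
    """A one-node decision tree: returns the Boolean value of variable i."""
    return lambda x: x[i]

def bag_predict(trees, x):
    """Majority vote over a bag of trees (an odd bag size avoids ties)."""
    votes = sum(tree(x) for tree in trees)
    return int(2 * votes > len(trees))

def majority(x):
    """Majority function of the input bits (odd length avoids ties)."""
    return int(2 * sum(x) > len(x))

n = 5
bag = [stump(i) for i in range(n)]

# Compare the bag with the majority function on all 2^n inputs.
agree = all(bag_predict(bag, x) == majority(x)
            for x in product((0, 1), repeat=n))
print(agree)
```

The question studied in the abstract is how far the number of trees can be reduced below $n$ while keeping each tree polynomial in size; this baseline only fixes the vocabulary.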

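2606.12279 2026-06-11 cs.NE cs.AI cs.LG new submission

Mathematical perspective on genetic algorithms with optimization guided operators

Anna Brandenberger, Ilan Doron-Arad, Elchanan Mossel

AI summary: This paper models genetic algorithms mathematically, casts optimization as a query-complexity problem, proves that some problems require generation, mutation, and recombination, and shows the key role of diversity in the solution pool.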

Details
Comments
18 pages, 1 figure
Abstract

Recent work in ML applies genetic algorithms at inference time to iteratively improve solutions to optimization problems. The basic mutation and recombination operators involved are qualitatively different from those studied classically. Mutations are no longer random; an ML algorithm mutates a solution with the goal of improving an objective. Similarly, recombination is not based on random collages of parent solutions. Instead, it is an ML optimization-based operator whose goal is to synthesize improved solutions from its inputs. Thus, these mutation and recombination operators are more likely to improve the objective, but their computational cost is much higher. We introduce a general model of genetic algorithms and formulate optimization in this model as a query-complexity problem, using the language of reinforcement learning. We then study specialized models. We show that some optimization problems require generation, mutation, and recombination to be solved. We then obtain qualitatively tight algorithms for a family of problems within this framework that captures the nontrivial role of diversity in the solution pool, a key feature of practical ML genetic algorithms.
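
To make the three operators and the query-counting view concrete, here is a toy sketch in which every call to `generate`, `mutate`, or `recombine` costs one query. Everything in it is an assumption for illustration: the OneMax objective, the specific "guided" operators, the pool-replacement rule, and all function names are hypothetical and are not taken from the paper's model.

```python
import random

def objective(x):
    """Toy objective (OneMax): number of ones. An assumption for illustration."""
    return sum(x)

def generate(n, rng):
    """Generation: draw a fresh random solution."""
    return [rng.randint(0, 1) for _ in range(n)]

def mutate(x, rng):
    """Guided mutation: flip one zero bit if any exists, so the objective never decreases."""
    zeros = [i for i, b in enumerate(x) if b == 0]
    y = x[:]
    if zeros:
        y[rng.choice(zeros)] = 1
    return y

def recombine(a, b):
    """Guided recombination: keep the better bit of either parent at each position."""
    return [max(p, q) for p, q in zip(a, b)]

def run(n=16, pool_size=4, budget=40, seed=0):
    """Run until the optimum is found or the query budget is spent."""
    rng = random.Random(seed)
    pool = [generate(n, rng) for _ in range(pool_size)]
    queries = pool_size  # each generated solution costs one query
    while queries < budget and max(map(objective, pool)) < n:
        if rng.random() < 0.5:
            child = mutate(rng.choice(pool), rng)
        else:
            child = recombine(*rng.sample(pool, 2))
        queries += 1
        # Replace the worst pool member if the child is at least as good.
        worst = min(range(pool_size), key=lambda i: objective(pool[i]))
        if objective(child) >= objective(pool[worst]):
            pool[worst] = child
    return max(map(objective, pool)), queries

best, used = run()
print(best, used)
```

In this toy, recombination only helps when the two parents have ones in different positions, which is a small-scale picture of why pool diversity matters.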

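2606.12231 2026-06-11 cs.SE cs.AI new submission

Rule Taxonomy and Evolution in AI IDEs: A Mining and Survey Study

Guangzong Cai, Ruiyin Li, Peng Liang, Zengyang Li, Mojtaba Shahin

AI summary: By mining 7,310 rules from 83 open-source projects and surveying 99 practitioners, the study builds a rule taxonomy with 5 primary and 25 secondary categories. It finds that developers value architectural constraints but mostly configure low-level workflow and code-formatting rules, that rule evolution is driven mainly by constructive context expansion and enrichment, and that updating rules raises the average artifact compliance rate by 22.99 percentage points.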

Details
Comments
52 pages, 21 images, 8 tables, Manuscript submitted to a Journal (2026)
Abstract

The adoption of AI-powered Integrated Development Environments (AI IDEs) has introduced "Rules" as a novel software artifact, allowing developers to persistently inject project-specific constraints and architectural guidelines into the context of Large Language Models (LLMs). Despite their role in aligning AI behavior with developer intent, the taxonomy, evolution, and practical impact of these rules remain largely unexplored. To bridge this gap, we conducted a mixed-methods empirical study on AI IDE rules. By mining 83 open-source projects and extracting 7,310 rules, we established a comprehensive taxonomy comprising 5 primary and 25 secondary categories. We then triangulated these artifacts with survey responses from 99 practitioners. Our analysis identified a contrast between developer priorities and actual configurations: while practitioners rate architectural constraints as highly important, rule files in repositories primarily consist of low-level workflow and code formatting constraints. Furthermore, our analysis of 1,540 rule evolution events revealed that rules are updated frequently. Repository data further indicate that rule evolution is primarily driven by constructive context expansions (29.17%) and enrichments (26.59%). In contrast, surveyed developers reported modifying rules primarily to correct AI errors (77.78%), typically by adding new negative constraints rather than editing existing ones. Finally, an artifact compliance assessment of 160 rule evolution events revealed that updating rules significantly improves the adherence of software artifacts, with the average artifact compliance rate increasing by 22.99 percentage points (from 49.14% to 72.13%) following an update. Our study provides empirical insights that can help developers optimize prompting strategies and guide tool builders in designing automated conflict-detection and context-management mechanisms for AI IDEs.

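2606.12073 2026-06-11 cs.SI cs.AI new submission

"That's AI Slop, You Bot!" Studying Accusations, Evidence, and Credibility in Online Discourse Towards LLM-Generated Comments

Jason Miklian, John E. Katsos

AI summary: An analysis of 25 million Hacker News and Reddit comments (2023-2026) finds that the pejorative-label share of AI-use accusations rose more than tenfold, yet the accusations do not track the prose features that actually distinguish AI text from human text; they function as social gatekeeping of perceived authenticity.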

Details
Abstract

Generative AI has made fluent prose cheap to produce, breaking the old promise to readers that good writing meant real thinking. How have readers responded, and what can this tell us about changing anti-AI attitudes? We analyzed 25 million comments from Hacker News and Reddit (2023-2026), combining LLM judgment on 7,500 sampled accusations of AI use, sentiment trajectories, speech-act coding of 300 confirmed accusations of AI use, and a matched-control test of accused versus non-accused parent comments. We found that the pejorative-label share of accusations rose more than tenfold on both platforms while a placebo vocabulary of pre-2022 inauthenticity terms (shill, astroturf) did not. This shift reflected a fast-growing trend of branding any suspicious or seemingly inauthentic prose as "AI slop". The slop frame now constitutes 94 percent of pejorative mentions, with the dominant comments shifting in tone from mockery toward gatekeeping and structural protest. The key surprise comes from a matched-control test which found that prose features that statistically distinguish AI from human text do not predict which human text gets accused as AI. The new accusations work as social gatekeeping of perceived authenticity without actually screening for AI. This research extends signaling theory by showing that substitute signals used socially can grow even when inaccurate if the underlying detection problem cannot be solved at the non-expert level. It shows that AI's effects on writing from the reader side are distinct from those on the production (writer) side. Detection technology cannot resolve this dynamic because the social function of accusations is increasingly to perform social gatekeeping and in-group signaling as opposed to identifying AI-generated writing.

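2606.12071 2026-06-11 cs.DL cs.AI new submission

On the Limits of LLM-as-Judge for Scientific Novelty Assessment

Soumitra Sinhahajari, Navonil Majumder, Soujanya Poria

AI summary: Using the RQ-Bench benchmark, the paper finds that LLM judges produce a novelty mirage for model-generated research questions while human experts reach the opposite conclusion, which calls into question the reliability of LLMs for assessing scientific novelty.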

Details
Abstract

LLMs are increasingly used to generate and judge scientific ideas. This makes novelty evaluation a central problem. Full idea evaluation is difficult because it often requires judging a method, its feasibility, and its empirical promise. We therefore study a cleaner upstream object: the research question (RQ). RQ generation is a prerequisite for scientific ideation, and RQs can be compared against questions pursued in real papers. We introduce RQ-Bench, a benchmark built from recent arXiv papers. For each paper, we reconstruct author-anchored RQs from its cited background, gaps, and contributions. These RQs are not the only valid questions for the same background. They are author-anchored reference points for testing novelty judgments. We evaluate model-generated RQs with standalone LLM judging, comparative LLM judging, and human expert evaluation. LLM judges consistently rate model-generated RQs as highly novel, producing a novelty mirage; in comparative evaluations, this preference becomes even stronger. Domain experts, however, reach the opposite conclusion and prefer the author-anchored reference questions. We further find that many generated RQs are narrow or source-bound, a dimension that LLM judges often miss unless explicitly tested. Overall, the contradictory novelty evaluations between LLM judges and human experts raise a serious concern about the reliability of using LLMs to assess the scientific novelty of research questions.

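2606.11869 2026-06-11 cs.SE cs.AI new submission

Agents All the Way Down; A Methodology for Building Custom AI Agents from Substrate to Production

Marc Alier Forment, Juanan Pereira, Francisco José García-Peñalvo, María José Casañ Guerrero

AI summary: Proposes a framework-free methodology for building custom AI agents end to end, consisting of two preconditions (the LLM as a software component, and building blocks) and three practices (prototyping with a general-purpose agent, shipping the result as a CLI, and agent-tests-agent).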

Details
Abstract

Custom AI agents are agents that live inside their own application, talk to their own data and tools, enforce their own security boundaries, and carry their own brand and audit trail. What separates them from the general-purpose tier is fit, not capability: each is built for one job, by the engineer who will maintain it. No published practice sets out how to build one end to end. The pieces are everywhere (function-calling APIs, the Model Context Protocol, code agents to pair with), but the practice that chains them lives in podcasts, blogs, and leaked system prompts. This paper writes that practice down as a methodology, Agents All the Way Down: two preconditions crossed once and kept, then three practices repeated for the agent's life. The preconditions are (P1) Substrate, the LLM as a software component, framed as tools, then system, then messages under prompt-caching; and (P2) Building blocks: function calling, MCP, CLI orchestration, the liteshell pattern, the agent loop, skills, characters, hooks, and scaffolding. The practices are (P3) prototype with a general-purpose agent; (P4) harvest, fold, and ship the result as a CLI, the Turtle pattern; and (P5) agent-tests-agent, in which a general-purpose agent drives it through behavioural scenarios, a complement to classical testing, not a replacement. The working loop is P3 to P4 to P5 and back, and one corollary falls out for free: multi-agent orchestration is just CLI composition. The methodology is framework-free by construction. It was distilled from the AAC, a custom agent for the open-source LAMB platform, built in about ten days by one developer with an AI pair-programmer and in production. We present it as a transferable practice, independent of any language or framework.

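2606.11835 2026-06-11 cs.HC cs.AI new submission

Designing AI-Supported Focus Groups: A Role x Modality Playbook

Zhiqing Wang, Steven Dow

AI summary: Because focus groups are resource-intensive and highly sensitive to facilitation, the paper proposes a playbook organized by AI role (tool, co-host, host) and modality (text, voice, embodied), and analyzes interactional trade-offs and open questions.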

Details
Abstract

Collecting participants' lived experiences is central to design research. Focus groups are uniquely valuable because participants not only share individual accounts but also respond to one another, surfacing comparison, disagreement, and collective sensemaking. However, focus groups are resource-intensive and highly sensitive to facilitation: moderators must probe for specificity, balance participation, manage topic flow, and sustain psychological safety, and subtle facilitation choices can shape what becomes salient. Recent HCI work and commercial meeting tools show that generative AI can scaffold live conversation through prompting, turn regulation, thematic mapping, and real-time summarization. Yet UXR teams lack a clear map of what these capabilities mean in focus groups and what methodological risks they introduce. We synthesize prior work on AI-supported live conversation and propose a focus-group-specific playbook of AI supports organized by role (tool, co-host, host) and modality (text, voice, embodied). We characterize interactional trade-offs and identify open questions for evaluating AI-supported focus groups as methodological configurations.

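2606.11671 2026-06-11 cs.CR cs.AI new submission

Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security

Tu Lan, Chaowei Xiao

AI summary: Proposes Runtime Skill Audit (RSA), a dynamic analysis method that probes skill behavior under targeted runtime conditions; on 100 skills it reaches 90.0% accuracy and outperforms static baselines.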

Details
Abstract

Agent skills let LLM agents reuse instructions, resources, tools, and workflows, but they also create a new place for malicious behavior to hide. A skill may look benign in its documentation or code while becoming harmful only when it is invoked with particular user requests, local assets, persistent state, or multi-step tool interactions. This makes purely static vetting brittle. We present Runtime Skill Audit (RSA), a dynamic analysis method that audits skills by asking what the skill-mediated agent actually does under targeted runtime conditions. Instead of testing every skill with the same generic tasks, RSA profiles risk-relevant interfaces, prepares the execution context needed to exercise them, and assigns security labels from the resulting trace evidence. We instantiate RSA on OpenClaw and evaluate it on 100 skills against representative static baselines. RSA achieves 90.0% accuracy with an 88.0% true positive rate and an 8.0% false positive rate, improving accuracy by 13.0 percentage points over the best static baseline. Under self-evolving attacks, static detectors collapse after one or two rounds, while RSA continues to detect 19 to 20 out of 20 malicious skills across rounds.
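
The three reported rates can be cross-checked against one another with a little arithmetic. The sketch below assumes that the accuracy, true positive rate, and false positive rate are exact and all computed on the same 100 skills; the abstract does not state the class split, so the split and the confusion-matrix cells printed here are inferences, not reported figures.

```python
from fractions import Fraction

total = 100                      # skills evaluated, from the abstract
accuracy = Fraction(900, 1000)   # 90.0% accuracy
tpr = Fraction(880, 1000)        # 88.0% true positive rate
fpr = Fraction(80, 1000)         # 8.0% false positive rate

# accuracy * total = tpr * P + (1 - fpr) * (total - P); solve for P (malicious skills).
positives = (accuracy * total - (1 - fpr) * total) / (tpr - (1 - fpr))
negatives = total - positives

tp = tpr * positives      # malicious skills flagged
fn = positives - tp       # malicious skills missed
fp = fpr * negatives      # benign skills flagged
tn = negatives - fp       # benign skills passed

print(positives, negatives)  # inferred class split
print(tp, fn, fp, tn)        # inferred confusion-matrix cells
```

Under these assumptions the numbers are mutually consistent with a balanced set of 50 malicious and 50 benign skills (44 true positives, 6 false negatives, 4 false positives, 46 true negatives).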

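2606.11596 2026-06-11 eess.SY cs.AI cs.SY new submission

Model-Based and Data-Driven Hierarchical Control and Topology Co-Design for Robust Networked Systems

Shirantha Welikala, Zihao Song, Hai Lin, Panos J. Antsaklis

AI summary: For networked systems of linear subsystems, proposes model-based and trajectory-data-only hierarchical control strategies that combine dissipativity theory with linear matrix inequalities to guarantee local and global dissipativity and optimize the topology, applied to robust voltage regulation and current sharing in a DC microgrid.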

Details
Comments
To be submitted to Automatica
Abstract

In this paper, we consider a class of networked systems comprising an interconnected set of linear subsystems, disturbance inputs, and performance outputs. Using dissipativity theory, we first propose a model-based hierarchical control design strategy to ensure the closed-loop networked system is dissipative from its disturbance inputs to performance outputs. This involves designing local controllers for each subsystem to enforce local dissipativity guarantees, which are then exploited to co-design distributed global controllers and the interconnection topology to enforce global dissipativity guarantees while optimizing interconnection topology costs. The overall design process requires only solving a sequence of linear matrix inequality (LMI) problems, thereby retaining compositionality and decentralizability while avoiding non-convex, iterative design processes that are inefficient and centralized. This model-based hierarchical control design strategy assumes the knowledge of the subsystem dynamics, which may not hold in many real-world networked systems. Motivated by this, we also propose a data-driven hierarchical control design strategy that assumes only the availability of rich input-state-output trajectory data from the subsystems. The proposed data-driven design process assumes that the unknown disturbances affecting the subsystem dynamics are bounded by a quadratic matrix inequality (relaxing conventional bounds) and accounts for this by using the matrix S-lemma. Finally, the effectiveness of the proposed model-based and data-driven hierarchical control designs is illustrated for a networked system representing a DC microgrid, with the aim of enforcing robust (dissipative) voltage regulation and current sharing.

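2606.11469 2026-06-11 cs.DS cs.LG math.ST stat.TH new submission

Density estimation for Hellinger via minimum-distance estimators: mixtures of Gaussians, log-concave, and more

Spencer Compton, Jerry Li

AI summary: Extends minimum-distance estimation from total variation distance to Hellinger distance via reverse data processing inequalities, yielding near-linear-time learning of mixtures of log-concave densities and mixtures of Gaussians (with arbitrary variances) with near-optimal sample complexity.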

Details
Abstract

We study the task of density estimation, where we hope to accurately estimate a probability density from $n$ samples. A textbook method for density estimation in total variation distance is the minimum-distance estimator approach, where we obtain both the algorithm and the analysis merely from bounding the VC dimension of a particular concept class (the so-called Yatracos class). While this technique originally yielded sharp guarantees primarily for total variation distance, in this work we extend the minimum-distance estimator approach to learning in Hellinger distance. Our main observation is that we may produce an analogous recipe for Hellinger (where we only require bounding the VC dimension of a related concept class) by drawing connections to recent results yielding reverse data processing inequalities. This recipe is flexible enough to accommodate fast algorithms originally designed for total variation distance; by modifying the approach of Acharya et al. (2017) we obtain the first near-linear-time algorithm for learning classes including univariate mixtures of log-concave densities and mixtures of Gaussians (with arbitrary variances), with near-optimal sample complexity.
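
For readers who want the two distances side by side, the sketch below computes total variation and Hellinger distance for a pair of small discrete distributions and checks the standard comparison $H^2 \le \mathrm{TV} \le \sqrt{2}\,H$. It assumes the normalization $H^2 = \tfrac{1}{2}\sum_i (\sqrt{p_i} - \sqrt{q_i})^2$ (other conventions differ by constant factors), and the example distributions are made up for illustration. The abstract concerns continuous densities; the discrete case is used here only because it is easy to compute.

```python
import math

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    """Hellinger distance with the 1/2 normalization, so values lie in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

# Two illustrative distributions on three outcomes (assumed, not from the paper).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

tv = total_variation(p, q)
h = hellinger(p, q)

# Standard comparison between the two distances.
print(h ** 2 <= tv <= math.sqrt(2) * h)
```

Because $\mathrm{TV}$ can be as small as $H^2$, a guarantee of $\varepsilon$ in Hellinger distance is in general stronger than a guarantee of $\varepsilon$ in total variation, which is one reason results stated in Hellinger distance are of separate interest.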

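2606.11425 2026-06-11 cs.CR cs.AI new submission

JailbreakOPT: Tool-Assisted Iterative Jailbreak Prompt Optimization

Ge Shi, Jun Yin, Donglin Xie, Fangyi Liu, Yucan Li, Menglin Liu

AI summary: Proposes the JailbreakOPT framework, which optimizes single-turn jailbreak prompts using a tool library and contextual Thompson sampling, raising attack success rate and reducing the number of attacks needed across multiple LLMs.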

Details
Abstract

Jailbreak attacks expose persistent safety weaknesses in large language models (LLMs), but existing stateless single-turn methods face a trade-off: hand-crafted prompts are expressive but static, while iterative prompt optimization can adapt but often relies on low-level mutations that require many target queries. We propose JailbreakOPT, a tool-assisted framework for improving iterative single-turn jailbreak prompt optimization. JailbreakOPT organizes diverse atomic jailbreak prompts into an attack tool library and composes them through a unified intra-episode optimization abstraction to generate stronger standalone attack prompts. To reuse experience across attack episodes, JailbreakOPT further frames tool selection as a contextual bandit problem and applies contextual Thompson sampling to guide exploration and exploitation based on past outcomes. Experiments across multiple target LLMs and attack goals show that JailbreakOPT improves attack success rate (ASR) while reducing the number of attacks until success (No.A) compared with atomic single-turn attacks and existing iterative optimization baselines. This paper may contain offensive or harmful content.

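2606.11361 2026-06-11 cs.IR cs.CL new submission

A PubMed-Scale Dataset of Structured Biomedical Abstracts

Chia-Hsuan Chang, Haerin Song, Brian Ondov, Hua Xu

AI summary: To address the large share of unstructured PubMed abstracts that hinders downstream text processing, the authors build a structured-abstract corpus of 23.2 million records, of which 5.9 million come from official XML and 17.2 million are labeled automatically by a large language model, all unified under a five-section format.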

Details
Comments
Data and code for this work are available at https://doi.org/10.5281/zenodo.20336717 and https://github.com/BIDS-Xu-Lab/StructuredPubMed, respectively
Abstract

Structured abstracts are important for biomedical literature processing because they facilitate information retrieval, text mining, and knowledge synthesis. However, a vast portion of abstracts indexed in PubMed remain unstructured, presenting a significant bottleneck for downstream text-processing workflows and applications. To resolve this limitation, we introduce Structured PubMed, a comprehensive corpus of section-labeled biomedical abstracts compiled from the complete PubMed database, encompassing over 23.2 million research-article records. The corpus is divided into two distinct subsets: a collection of 5.9 million author-structured abstracts parsed from official XML files, and an automatically labeled collection of 17.2 million originally unstructured abstracts structured via a verbatim-extraction Large Language Model pipeline. Every record is harmonized under a unified five-section schema and mapped to its original PubMed identifier, publication type, and publication date. This dataset can be utilized to train sentence-classification models, benchmark text-segmentation architectures, and perform large-scale, section-specific information extraction at PubMed-wide scale.

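2606.11287 2026-06-11 eess.IV cs.CV new submission

Intelligent Skin Cancer Detection Using a Multispectral Metasurface and a Hybrid

Title as translated from the Chinese listing: Intelligent Skin Cancer Detection Based on a Multispectral Metasurface and Hybrid Deep Learning

Afsane Saee Arezoomand

AI summary: Proposes combining multispectral metasurface imaging with a hybrid CNN-ViT deep learning architecture for skin cancer detection, reporting approximately 98% accuracy, 95% sensitivity, and 99% specificity in simulation-based evaluation.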

Details
Journal ref
New Researches in the Smart City, Vol. 4, No. 1, Autumn 2025
Comments
8 pages
Abstract

Skin cancer is among the most prevalent malignancies worldwide, and its early detection is essential for improving patient survival and reducing treatment costs. Conventional dermoscopic and visual imaging techniques are primarily limited to the visible spectrum and often fail to capture subtle spectral signatures associated with early-stage malignancies. This study proposes an innovative framework that integrates a multispectral metasurface for imaging with a hybrid deep learning architecture based on Convolutional Neural Networks and Vision Transformers. The designed metasurface enables noninvasive acquisition of rich spectral information highly sensitive to tissue alterations, while the hybrid CNN-ViT model simultaneously extracts local and global features to robustly classify skin lesions. Simulation-based evaluations demonstrate that the proposed method achieves approximately 98% accuracy, 95% sensitivity, and 99% specificity, surpassing conventional RGB-based and single-architecture approaches. Qualitative analyses using attention maps reveal that the model focuses on clinically relevant lesion regions, improving interpretability. Overall, the results indicate that combining metasurface-based multispectral imaging with hybrid deep learning can introduce a new generation of diagnostic tools in dermatology and pave the way for portable, fast, and highly accurate clinical systems.

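2606.11218 2026-06-11 cs.CY cs.AI new submission

An Ethical eValuation Agent (EeVA): Results of a Proof-of-Concept Test on a Prototype Agentic-like Workflow to Assist Ethical Deliberations

Stephen Milford, B. Zara Malgir, Miguel Vazquez

AI summary: Proposes EeVA, an agentic-like LLM-based workflow that evaluates use cases against 10 ethical frameworks and produces structured evaluations and syntheses, aiming to support ethical reflection rather than give definitive answers; feasibility is shown on three cases.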

Details
Abstract

Ethical deliberation is often misunderstood as a search for single right or wrong answers, creating difficulties for non-ethically trained personnel who must address ethically laden challenges. We developed EeVA, an agentic-like LLM-based workflow designed to support comparative ethical reflection rather than deliver definitive ethical answers. EeVA was programmed in n8n using three interconnected workflows: starter, worker, and emitter. It evaluated uploaded use cases against 10 ethical frameworks through evaluator and synthesis prompts. Proof-of-concept testing used three published cases from urban mobility, peer-to-peer energy trading, and social-service resource allocation. Across all cases, EeVA produced consistently structured framework-specific evaluations and integrated syntheses. Outputs differentiated between frameworks, identified convergences and divergences, recommended modifications to increase alignment, and highlighted persistent ethical tensions. Syntheses were readable for non-specialists and shifted attention away from simplistic answers toward design conditions, safeguards, and areas where full cross-framework agreement was unlikely. The findings suggest that LLMs can be organised into usable workflows that preserve ethical plurality while helping bridge the communicative gap between ethicists and non-ethically trained personnel. EeVA's value lies not in replacing ethicists or resolving moral disagreement, but in scaffolding structured ethical deliberation. EeVA offers a promising proof of concept for supporting ethical reflection where access to ethics expertise is limited. Further work is needed on reproducibility, human evaluation, user testing, and efficiency before it can be considered a mature tool.

2606.11195 2026-06-11 cs.CY cs.AI cs.HC New submission

From Consumption to Reflection: Designing Human-AI Relations for Stable Reasoning

Rikard Rosenbacke, Carl Rosenbacke, Victor Rosenbacke, Martin McKee

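AI summary: Proposes Relational Reflective Intelligence (RRI), an inference-time governance layer that implements reflection through auditable reasoning loops, turning human-AI interaction into a joint reasoning system that compensates for the limitations of both sides and enables stable reasoning.

Details
AI Chinese abstract (English translation)

Large language models (LLMs) have changed how humans access information, but not how we reason about it. Their fluency accelerates consumption while bypassing the slow reflective processes that underpin sound judgment. This paper introduces Relational Reflective Intelligence (RRI), an inference-time governance layer that operationalizes reflection through auditable reasoning loops. RRI runs not inside the model but around it, providing a practical structure for stable, auditable reasoning between humans and LLMs. The core premise is that LLMs inherit cognitive vulnerabilities similar to those that shape human thought: reliance on intuitive shortcuts, confusion of representation with reality, and a preference for coherence over falsification. When humans and models share these tendencies, their errors compound. We call this relational drift, a failure that arises from the interaction rather than from the model alone. Addressing it requires a shift from modeling relations between words to modeling relations between model outputs and human reasoning. RRI supplies this missing layer through three components: the Rose-Frame, which identifies likely points of failure in reasoning; the Architect's Pen, which introduces targeted reflection steps at critical moments; and an inference-time workflow that embeds these steps without retraining the model. Together, these elements turn human-AI interaction into a joint reasoning system with explicit checkpoints, conflict surfacing, and an auditable trail of assumptions. Rather than making machines think like humans or forcing humans to reason like machines, RRI creates a structured interaction in which each side compensates for the other's limitations. It reframes AI safety as a cognitive architecture problem in which reliable decisions depend on embedding reflection directly into the interaction process.

English abstract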

Large language models (LLMs) have transformed how humans access information, but not how we reason with it. Their fluency accelerates consumption while bypassing the slow, reflective processes that underpin sound judgment. This paper introduces Relational Reflective Intelligence (RRI), an inference-time governance layer that operationalizes reflection through auditable reasoning loops. RRI operates not inside the model but around it, providing a practical structure for stable, auditable reasoning between humans and LLMs. The core premise is that LLMs inherit cognitive vulnerabilities similar to those that shape human thought: reliance on intuitive shortcuts, confusion between representation and reality, and a preference for coherence over falsification. When humans and models share these tendencies, their errors compound. We refer to this as relational drift, a failure that arises from interaction rather than from the model alone. Addressing this requires a shift from modeling relations between words to structuring relations between model outputs and human reasoning. RRI provides this missing layer through three components: the Rose-Frame, which identifies likely breakdowns in reasoning; the Architect's Pen, which introduces targeted reflection steps at critical moments; and an inference-time workflow that embeds these steps without retraining the model. Together, these elements transform human-AI interaction into a joint reasoning system with explicit checkpoints, conflict surfacing, and an auditable trail of assumptions. Rather than making machines think like humans or forcing humans to reason like machines, RRI creates a structured interaction in which both compensate for each other's limitations. It reframes AI safety as a cognitive architecture problem, where reliable decisions depend on embedding reflection directly into the interaction process.

2606.11107 2026-06-11 eess.IV cs.CV cs.LG Updated version

Multimodal Brain Tumour Classification Using Feature Fusion

Wajih ul Islam, Muhammad Yaqoob, Javed Ali Khan, Volker Steuber

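AI summary: Proposes a two-branch multimodal network that fuses MRI images with 91 radiomic features and classifies brain tumours via gated fusion, reaching an accuracy of 96.13%.

Details
AI Chinese abstract (English translation)

Clinicians diagnose brain tumours by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into a unified clinical judgement. However, most deep learning models rely on MRI/CT images alone and fail to replicate clinicians' multimodal reasoning. We explore a two-branch multimodal network that combines raw MRI scans with 91 extracted radiomic features (intensity, texture, shape, and boundary descriptors) to classify brain tumours as glioma, meningioma, pituitary tumour, or no tumour. A pre-trained CNN backbone encodes the image stream, while a dedicated MLP encodes the radiomic feature stream. The two streams are fused by concatenation, gating, or bidirectional cross-modal attention. Across nine experimental runs on a balanced dataset of 7,200 images, all multimodal configurations outperformed the unimodal baselines, with gated fusion achieving the best accuracy of 96.13%.

English abstract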

Clinicians diagnose brain tumors by synthesizing patient symptoms, medical history, and quantitative imaging data from modalities such as MRI and CT scans into a unified clinical judgement. However, most deep learning models rely on MRI/CT images alone, failing to replicate clinicians' multimodal reasoning. We explore a two-branch multimodal network combining raw MRI scans with 91 extracted radiomic features (intensity, texture, shape, and boundary descriptors) to classify brain tumors into glioma, meningioma, pituitary, and no-tumor. A pre-trained CNN backbone encodes the image stream, whereas a dedicated MLP encodes the radiomic stream. The two streams are fused via concatenation, gating, or bidirectional cross-modal attention. Across nine experimental runs on a balanced 7,200-image dataset, all multimodal configurations outperform unimodal baselines, with gated fusion achieving the best accuracy of 96.13%.

2606.09964 2026-06-11 quant-ph cs.LG Updated version

JGRA: Jacobian Geometry Robustness Assessment in NISQ Noise-Aware Quantum Neural Networks

Gianluca Scanu, Luca Barletta, Stefano Rini

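AI summary: Proposes the JGRA framework, which assesses the robustness of noise-aware quantum neural networks through Jacobian geometry. It comprises entropy-matched noise calibration, noise-aware training, and noise-conditioned Jacobian extraction, and reveals the relationship between clean-regime structure and noisy inference behaviour.

Details
Comments
Accepted at IEEE qCCL 2026. Author accepted manuscript. 6 pages; cleaned source files, no changes to manuscript content
AI Chinese abstract (English translation)

The NISQ era imposes strict constraints on quantum computation, with noise and decoherence fundamentally limiting performance. In classical deep learning, model robustness and resilience to perturbations are well studied: deep neural networks (DNNs) maintain high performance under pruning, noise injection, and structural perturbations because of the inherent redundancy in their representations. A central challenge in quantum machine learning is to transfer this notion of robustness to quantum neural networks (QNNs) under realistic NISQ noise. While classical deep learning exhibits robustness through structural redundancy, analogous principles for QNNs remain immature. We propose JGRA, a framework for assessing the robustness of noise-aware QNNs through Jacobian geometry, capturing model sensitivity to noise-induced parameter perturbations. Our method comprises entropy-matched noise calibration, noise-aware training, and noise-conditioned Jacobian extraction, producing geometric descriptors that link clean-regime structure to noisy inference behaviour. We also demonstrate experimentally that these descriptors encode predictive information about robustness under unseen noise.

English abstract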

The NISQ era places stringent constraints on quantum computation, where noise and decoherence fundamentally limit performance. In classical deep learning, model robustness and resilience to perturbations are well studied: deep neural networks (DNNs) maintain high performance despite pruning, noise injection, and structural perturbations due to inherent redundancy in their representations. A central challenge in quantum machine learning is to transfer this notion of robustness to quantum neural networks (QNNs) under realistic NISQ noise. While classical deep learning exhibits robustness through structural redundancy, analogous principles for QNNs remain underdeveloped. We propose JGRA: a framework for assessing robustness in noise-aware QNNs via Jacobian geometry, capturing model sensitivity to parameter perturbations induced by noise. Our method includes entropy-matched noise calibration, noise-aware training, and noise-conditioned Jacobian extraction, yielding geometric descriptors that link clean-regime structure to noisy inference behaviour. We also empirically demonstrate that these descriptors encode predictive information about robustness under unseen noise.

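2606.06527 2026-06-11 cs.AR cs.LG Updated version

Characterizing the Impact of NVFP4 Quantization for Low-Power Edge AI Deployment

An Ablation Study of Block Size, Weight Precision, and Scaling Precision in NVFP4 Inference for Low-Power, Edge-Efficient Neural Networks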

Ovishake Sen, Venkata Nithin Kamineni, Daniel Lobo, Swarup Bhunia, Rickard Ewetz, Baibhab Chatterjee

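AI summary: Through ablation experiments, this paper studies an NVFP4 LUT inference framework that combines 4-bit activations, two-level scaling, and voltage-scaled storage, achieving up to a 26.85x reduction in energy and a 2.21x reduction in area on edge-efficient models.

Details
Comments
7 Pages
AI Chinese abstract (English translation)

Energy-efficient edge inference requires reducing arithmetic cost, memory traffic, and hardware overhead. This paper presents an ablation study of NVFP4 LUT-based inference for edge-efficient neural networks. The proposed NVLUT framework combines 4-bit NVFP4 activations, two-level scaling, LUT-based mantissa computation, voltage-scaled storage, and selective ECC protection. Multiplication is decomposed into sign, exponent, and mantissa paths: the sign uses XOR logic, the exponent uses integer addition, and mantissa multiplication is replaced by compact LUT accesses. NVFP4 activations use FP4 data with an FP8 block scale and an FP32 tensor scale. Across six edge-efficient models, the block-size ablation shows that B=16 offers a practical accuracy/storage trade-off, requiring only 4.5078 bits per input for N=4096. The weight-precision ablation shows that, under the same NVFP4 activation path, FP8 and FP16 weights bring only modest gains over FP4 weights. Compared with plain unscaled FP4, NVFP4 without retraining recovers much of the accuracy by restoring activation dynamic range, while NVFP4 with retraining achieves the best accuracy across the models. Hardware analysis shows that NVLUT reduces energy by up to 26.85x relative to a conventional LUT under ECC plus voltage scaling, and by up to 22.85x under mixed-voltage operation. Area is reduced by up to 2.21x and 1.52x, respectively. These results indicate that NVFP4 two-level scaling combined with selective reliability protection enables robust, low-energy edge inference.

English abstract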

Energy-efficient neural-network inference at the edge requires reducing arithmetic cost, memory traffic, computation energy, and storage overhead while maintaining acceptable accuracy. This paper presents an ablation-focused study of NVFP4 quantization for edge-efficient neural networks, with emphasis on the relationship between activation precision, weight precision, block-size scaling, retraining, and model accuracy. NVFP4 activations are represented using 4-bit FP4 data, an FP8 block scale, and an FP32 tensor scale, enabling ultra-low precision inference while preserving activation dynamic range. A block-size ablation over six edge-efficient models shows that block size B = 16 provides a practical accuracy/storage trade-off, requiring only 4.5078 bits per input for N = 4096. A weight precision ablation further shows that FP8 and FP16 weights provide only modest gains over FP4 weights under the same NVFP4 activation path, suggesting that activation quantization and scaling dominate much of the accuracy behavior. To isolate the benefit of the NVFP4 data type, this work compares conventional unscaled FP4 activation inference and NVFP4 activation inference with and without retraining. The results show that conventional FP4 inference collapses accuracy for most compact models, while NVFP4 without retraining already recovers substantial accuracy by restoring activation dynamic range through FP8 block scaling and FP32 tensor scaling. When combined with retraining, NVFP4 achieves the best accuracy across the evaluated models, demonstrating the effectiveness of scaling-aware FP4 (NVFP4) inference. These findings provide general design guidance for hardware-software co-design of low power edge inference across a broad range of accelerator platforms, including GPUs, Tensor Cores, FPGAs, domain-specific AI accelerators, near-memory computing systems, and emerging edge-computing architectures.
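
The 4.5078 bits-per-input figure quoted above is consistent with a simple amortization of scale storage: 4 bits per FP4 value, one 8-bit FP8 scale per block of B values, and one 32-bit FP32 scale per tensor of N values. The following is a minimal sketch of that arithmetic under this assumed accounting; the function name and its parameters are illustrative and do not come from the paper.

```python
def nvfp4_bits_per_element(n: int, block_size: int,
                           data_bits: int = 4,
                           block_scale_bits: int = 8,
                           tensor_scale_bits: int = 32) -> float:
    """Average storage per element for a tensor of n values.

    Assumed accounting (not taken from the paper): every value costs
    data_bits, every block of block_size values shares one block scale,
    and the whole tensor shares one tensor scale.
    """
    if n <= 0 or block_size <= 0 or n % block_size != 0:
        raise ValueError("n must be a positive multiple of block_size")
    num_blocks = n // block_size
    total_bits = n * data_bits + num_blocks * block_scale_bits + tensor_scale_bits
    return total_bits / n


if __name__ == "__main__":
    # B = 16, N = 4096: 4 + 8/16 + 32/4096 bits per element
    print(round(nvfp4_bits_per_element(4096, 16), 4))
```

Under this accounting, halving the block size doubles the per-element cost of the block scales, which is one way to read the accuracy/storage trade-off the abstract mentions.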

2605.31506 2026-06-11 cs.IR cs.CL Updated version

Evaluating Factual Density in Multi-Source RAG: A Study in Medical AI Accuracy

Michael R. DeMarco

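AI summary: To address standard RAG pipelines overlooking high-density factual evidence because of the Expert Blindness Effect, proposes Factual Density (FD*) as a retrieval optimization signal, removes length bias through probabilistic factuality-analysis preprocessing and Z-score normalization, and achieves 100% systematic review coverage on the HealthFC benchmark.

Details
Comments
16 pages, 8 tables. Includes Experiment 3 results (n=11, Wilcoxon p=0.0619). Preliminary findings; powered Experiment 3 and Graph RAG extension identified as future work. Updated from v1
AI Chinese abstract (English translation)

Retrieval-Augmented Generation (RAG) is the current industry standard for anchoring AI in real-world facts. Traditional retrieval methods rely on keyword matching and topical proximity, ranking content by how similar it is to the user's query. They do not, however, measure how many verified facts the content actually contains. This structural gap, termed the Expert Blindness Effect, causes standard RAG pipelines to consistently bury high-density factual evidence in favor of lexically dominant text on the same topic. To address this gap, this paper introduces Factual Density (FD*), a novel retrieval optimization signal that measures the proportion of verified atomic claims relative to the total token count. Using the NexusAgentics Ghost Audit preprocessing pipeline, raw text is scored for factual specificity through probabilistic factuality analysis, and content is filtered before corpus ingestion. The initial formulation introduced a severe document-length confound (Pearson R = -0.8636, p = 2.27e-07). Applying Z-score normalization within length bins resolved this bias, validating FD* as a length-independent density signal (p = 0.0749). Evaluated on the HealthFC benchmark (750 health claims labeled Supported, Refuted, or No Evidence by medical experts), FD*-optimized retrieval was the only condition to achieve 100% systematic review saturation in the top-5 results, surfacing Cochrane evidence that standard cosine similarity ranked outside the top ten. Ground-truth verification confirmed 25 mappings across seven HealthFC-supported claims. Because of constraints on corpus-benchmark alignment, full statistical validation over n=50 queries remains future work, but these findings establish factual density reranking as a low-cost, high-impact intervention for improving the factual precision of health RAG architectures.

English abstract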

Retrieval-Augmented Generation (RAG) is the current industry standard for grounding AI in real-world facts. Traditional retrieval methods rely on keyword matching and topic proximity, ranking content based on how closely it sounds like the user's query. What they do not measure is how many verified facts the content actually contains. This structural gap, termed the Expert Blindness Effect, causes standard RAG pipelines to consistently bury high-density factual evidence in favor of lexically dominant text on the same topic. To address this gap, this paper introduces Factual Density (FD*), a novel retrieval optimization signal that measures the proportion of verified atomic claims relative to total token count. Using the NexusAgentics Ghost Audit preprocessing pipeline, raw text is scored for factual specificity using probabilistic factuality analysis to filter content before corpus ingestion. An initial formulation introduced a severe document-length confound (Pearson R = -0.8636, p = 2.27e-07). Implementing Z-score normalization within length bins resolved this bias, validating FD* as a length-independent density signal (p = 0.0749). Evaluated against the HealthFC benchmark (750 health claims labeled Supported, Refuted, or No Evidence by medical experts), FD*-optimized retrieval was the only condition to achieve 100% systematic review saturation in top-5 results, surfacing Cochrane evidence that standard cosine similarity ranked outside the top ten. Ground truth verification confirmed 25 mappings across seven HealthFC-supported claims. While full statistical validation across n=50 queries remains future work due to constraints on corpus-benchmark alignment, these findings establish factual density reranking as a low-cost, high-impact intervention for improving factual precision in health RAG architectures.
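
The abstract describes two steps: a raw density (verified atomic claims divided by token count) and a Z-score normalization within length bins to remove the length confound. The sketch below illustrates one way those two steps could be combined. It is not the paper's implementation: the bin edges, the use of the population standard deviation, the zero fallback for single-document or constant bins, and all function names are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean, pstdev


def factual_density(verified_claims: int, token_count: int) -> float:
    """Raw density: verified atomic claims per token."""
    if token_count <= 0:
        raise ValueError("token_count must be positive")
    return verified_claims / token_count


def length_binned_z_scores(docs, bin_edges):
    """Z-score each document's density against documents of similar length.

    docs: list of (verified_claims, token_count) pairs.
    bin_edges: sorted token-count thresholds (hypothetical values).
    Returns one z-score per document, in input order.
    """
    densities = [factual_density(c, t) for c, t in docs]
    bins = defaultdict(list)
    for index, (_, tokens) in enumerate(docs):
        bins[bisect_right(bin_edges, tokens)].append(index)

    scores = [0.0] * len(docs)
    for indices in bins.values():
        values = [densities[i] for i in indices]
        mu, sigma = mean(values), pstdev(values)
        for i in indices:
            # Assumption: a bin with no spread contributes a neutral score.
            scores[i] = (densities[i] - mu) / sigma if sigma > 0 else 0.0
    return scores


if __name__ == "__main__":
    corpus = [(10, 100), (20, 100), (10, 1000), (30, 1000)]
    print(length_binned_z_scores(corpus, bin_edges=[500]))
```

Because each score is relative to documents in the same length bin, a long document is no longer penalized simply for having more tokens in the denominator.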

2605.17557 2026-06-11 cs.GR cs.CV Updated version

Real-Time Neural Hair Denoising

Chenghao Wu, Yuefan Shen, Tao Huang, Kai Yan, Zahra Montazeri, Kui Wu

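AI summary: This paper proposes a lightweight real-time method for reconstructing strand-based hair G-Buffers from severely undersampled rasterized inputs. The method first applies neural spatial reconstruction and temporal accumulation to recover hair coverage, i.e., fractional hair visibility within a pixel, and tangent vectors; it then uses a tangent-guided reconstruction step to complete the position information, which is subsequently used for physically based deferred hair shading. Evaluated across a variety of hairstyles in static and dynamic scenes, the method achieves higher hair reconstruction quality than existing hair-specific denoising techniques and general industrial neural reconstruction solutions such as DLSS and FSR.

Details
AI Chinese abstract (English translation)

We propose a lightweight real-time method for reconstructing strand-based hair G-Buffers from severely undersampled rasterized inputs. Our pipeline first applies neural spatial reconstruction and temporal accumulation to recover hair coverage, i.e., the fractional hair visibility within a pixel, and tangent. A tangent-guided reconstruction step then completes the position, which is subsequently used for physically based deferred hair shading. We evaluate our method on a variety of hairstyles, including straight, wavy, afro, and ponytail styles, in both static and dynamic scenes. Our method achieves higher hair reconstruction quality than existing hair-specific denoising techniques and general industrial neural reconstruction solutions such as DLSS and FSR.

English abstract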

We propose a lightweight real-time method for reconstructing strand-based hair G-Buffers from severely undersampled rasterized inputs. Our pipeline first applies neural spatial reconstruction and temporal accumulation to recover hair coverage, i.e., fractional hair visibility within a pixel, and tangent. It then uses a tangent-guided reconstruction step to complete the position, which is subsequently used for physically based deferred hair shading. We evaluate our method across a diverse set of hairstyles, including straight, wavy, afro, and ponytail styles, under both static and dynamic scenarios. Our method achieves higher hair reconstruction quality than existing hair-specific denoising techniques and general industrial neural reconstruction solutions such as DLSS and FSR.

2603.21639 2026-06-11 cs.CY cs.LG Updated version

A Multi-Modal Sensor Fusion Instrument for Measuring Regional Human Mobility: The Distributed Human Data Engine (DHDE)

Amil Khanzada, Takuji Takemoto

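AI summary: Proposes the Distributed Human Data Engine (DHDE), which fuses Edge-AI cameras, digital intent signals, behavioral records, and meteorological data to address sensor sparsity and behavioral heterogeneity when measuring human mobility in peripheral regions, validates a sparse-sensor compensation approach, and identifies an "Under-Vibrancy Paradox".

Details
Comments
32 pages, 4 figures, 3 tables. Pre-print of a manuscript submitted for peer review (v2)
AI Chinese abstract (English translation)

Accurately estimating human mobility in peripheral regional economies faces a fundamental measurement challenge: physical ground-truth sensors are sparse, behavioral intent signals are heterogeneous, and environmental friction introduces systematic bias into demand inference. We propose the Distributed Human Data Engine (DHDE), a multi-modal sensor fusion architecture that addresses this challenge by integrating physical instrumentation (Edge-AI cameras), digital intent signals (route-search impression metrics), behavioral records (90,350 spending records and 97,719 standardized survey responses), and meteorological data across four geographically distributed nodes in Fukui, Japan. The main measurement-science contribution is the design, deployment, and cross-node validation of the DHDE as a sparse-sensor compensation instrument: a heterogeneous sensor fusion architecture that anchors non-stationary digital intent signals to concurrent physical ground-truth counts, correcting the systematic bias introduced by meteorological planning friction. The instrument is implemented as an ensemble inference pipeline (Random Forest and Ordinary Least Squares with Newey-West robust inference), calibrated on 397 daily observations and validated by chronological holdout replication across four geographically distinct node types. The primary OLS specification achieved an in-sample explanatory power of $R^2 = 0.810$ and a chronological out-of-sample predictive performance of $R^2 = 0.683$. The results identify an "Under-Vibrancy Paradox" in which macro-regional visitor satisfaction correlates positively with crowd density (Spearman rank correlation $r_s = +0.150$, p = 0.002). We estimate an annual proxy gap of 865,917 intent-implied visits, corresponding to JPY 11.96 billion (USD 72.6 million) in foregone revenue.

English abstract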

Accurately estimating human mobility in peripheral regional economies presents a fundamental measurement challenge: physical ground-truth sensors are sparse, behavioral intent signals are heterogeneous, and environmental friction introduces systematic bias into demand inference. We introduce the Distributed Human Data Engine (DHDE), a multi-modal sensor fusion architecture that addresses this challenge by integrating physical instrumentation (Edge-AI cameras), digital intent signals (route search impression metrics), behavioral records (90,350 spending records, 97,719 standardized survey responses), and meteorological data across four geographically distributed nodes in Fukui, Japan. The primary measurement-science contribution is the design, deployment, and cross-node validation of the DHDE as a sparse-sensor compensation instrument: a heterogeneous sensor fusion architecture that anchors non-stationary digital intent signals to concurrent physical ground-truth counts, correcting for systematic bias introduced by meteorological planning friction. The instrument is implemented as an ensemble inference pipeline (Random Forest and Ordinary Least Squares with Newey-West robust inference), calibrated across 397 daily observations and validated by chronological holdout replication across four geographically distinct node types. The primary OLS specification achieved an in-sample explanatory power of $R^2 = 0.810$ and a chronological out-of-sample predictive performance of $R^2 = 0.683$. Results identify an Under-Vibrancy Paradox where macro-regional visitor satisfaction correlates positively with crowd density (Spearman rank correlation $r_s = +0.150$, p = 0.002). We estimate an annual proxy gap of 865,917 intent-implied visits, corresponding to JPY 11.96 billion (USD 72.6 million) in foregone revenue.

2512.13765 2026-06-11 eess.IV cs.AI cs.LG Updated version

Towards Deep Learning Surrogate for the Forward Problem in Electrocardiology: A Scalable Alternative to Physics-Based Models

Shaheim Ogbomo-Harmitt, Cesare Magnetti, Chiara Spota, Jakub Grzelak, Oleg Aslanidi

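AI summary: Proposes an attention-based sequence-to-sequence deep learning framework as a surrogate for the forward problem in electrocardiology. It predicts electrocardiogram signals from cardiac voltage propagation maps and reaches high accuracy in 2D tissue simulations (mean R² = 0.99 ± 0.01), offering a scalable, low-cost alternative to physics-based models.

Details
Comments
Accepted to CinC conference 2025
AI Chinese abstract (English translation)

The forward problem in electrocardiology, that is, computing body-surface potentials from cardiac electrical activity, is traditionally solved with physics-based models such as the bidomain or monodomain equations. Although accurate, these methods are computationally expensive, which limits their use in real-time and large-scale clinical applications. We propose a proof-of-concept deep learning (DL) framework as an efficient surrogate for forward solvers. The model uses a sequence-to-sequence architecture based on a time-dependent attention mechanism to predict electrocardiogram (ECG) signals from cardiac voltage propagation maps. A hybrid loss function combining Huber loss with a spectral entropy term is introduced to preserve fidelity in both the time and frequency domains. Using 2D tissue simulations that include healthy, fibrotic, and gap-junction-remodelled conditions, the model achieved high accuracy (mean $R^2 = 0.99 \pm 0.01$). Ablation studies confirmed the contributions of the convolutional encoder, time-aware attention, and the spectral entropy loss. These findings highlight the potential of DL as a scalable, low-cost alternative to physics-based solvers for clinical and digital twin applications.

English abstract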

The forward problem in electrocardiology, computing body surface potentials from cardiac electrical activity, is traditionally solved using physics-based models such as the bidomain or monodomain equations. While accurate, these approaches are computationally expensive, limiting their use in real-time and large-scale clinical applications. We propose a proof-of-concept deep learning (DL) framework as an efficient surrogate for forward solvers. The model adopts a time-dependent, attention-based sequence-to-sequence architecture to predict electrocardiogram (ECG) signals from cardiac voltage propagation maps. A hybrid loss combining Huber loss with a spectral entropy term was introduced to preserve both temporal and frequency-domain fidelity. Using 2D tissue simulations incorporating healthy, fibrotic, and gap junction-remodelled conditions, the model achieved high accuracy (mean $R^2 = 0.99 \pm 0.01$). Ablation studies confirmed the contributions of convolutional encoders, time-aware attention, and spectral entropy loss. These findings highlight DL as a scalable, cost-effective alternative to physics-based solvers, with potential for clinical and digital twin applications.

2412.01459 2026-06-11 cs.CY cs.AI cs.HC

Perception Gaps in Risk, Benefit, and Value Between Experts and Public Challenge Socially Accepted AI

Philipp Brauner, Felix Glawe, Gian Luca Liehner, Luisa Vervier, Martina Ziefle

Details
Journal ref
AI & Society (2026)
English abstract

Artificial Intelligence (AI) is reshaping many societal domains, raising critical questions about its risks, benefits, and the potential misalignment between public and academic perspectives. This study examines how the general public (N=1110) -- individuals who interact with or are impacted by AI technologies -- and academic AI experts (N=119) -- those elites shaping AI development -- perceive AI's capabilities and impact across 71 scenarios. These scenarios span domains such as sustainability, healthcare, job performance, societal inequality, art, and warfare. Participants evaluated these scenarios across four dimensions using the psychometric model: likelihood, perceived risk and benefit, and overall value (or sentiment). The results suggest significant differences: experts consistently anticipate higher probabilities, perceive lower risks, report greater benefits, and express more positive sentiment toward AI than non-experts. Moreover, the two groups apply different weighting schemes: experts discount risk more heavily relative to benefit than non-experts. Visual mappings of these evaluations uncover areas of convergent evaluation (e.g., AI performing medical diagnoses or criminal use) as well as tension points (e.g., decision of legal cases, political decision making), highlighting areas where communication and policy interventions may be needed. These findings underscore a critical translational challenge: if AI research and deployment are to align with societal priorities, the perception gap between developers and the public must be better understood and addressed. Our results provide an empirical foundation for value-sensitive AI governance and trust-building strategies across stakeholder groups.

2512.20464 2026-06-11 physics.optics cs.CV cs.NE physics.app-ph

Snapshot 3D image projection using a diffractive decoder

Cagatay Isil, Alexander Chen, Yuhang Li, F. Onuralp Ardic, Shiqi Chen, Che-Yung Shen, Aydogan Ozcan

Details
Journal ref
Light: Science & Applications (2026)
Comments
22 Pages, 8 Figures
English abstract

3D image display is essential for next-generation volumetric imaging; however, dense depth multiplexing for 3D image projection remains challenging because diffraction-induced cross-talk rapidly increases as the axial image planes get closer. Here, we introduce a 3D display system comprising a digital encoder and a diffractive optical decoder, which simultaneously projects different images onto multiple target axial planes with high axial resolution. By leveraging multi-layer diffractive wavefront decoding and deep learning-based end-to-end optimization, the system achieves high-fidelity depth-resolved 3D image projection in a snapshot, enabling axial plane separations on the order of a wavelength. The digital encoder leverages a Fourier encoder network to capture multi-scale spatial and frequency-domain features from input images, integrates axial position encoding, and generates a unified phase representation that simultaneously encodes all images to be axially projected in a single snapshot through a jointly-optimized diffractive decoder. We characterized the impact of diffractive decoder depth, output diffraction efficiency, spatial light modulator resolution, and axial encoding density, revealing trade-offs that govern axial separation and 3D image projection quality. We further demonstrated the capability to display volumetric images containing 28 axial slices, as well as the ability to dynamically reconfigure the axial locations of the image planes, performed on demand. Finally, we experimentally validated the presented approach, demonstrating close agreement between the measured results and the target images. These results establish the diffractive 3D display system as a compact and scalable framework for depth-resolved snapshot 3D image projection, with potential applications in holographic displays, AR/VR interfaces, and volumetric optical computing.

2412.12944 2026-06-11 math.OC cs.CV

Online optimisation for dynamic electrical impedance tomography

Neil Dizon, Jyrki Jauhiainen, Tuomo Valkonen

Details
Journal ref
Inverse Problems 41 (2025), 055005
English abstract

Online optimisation studies the convergence of optimisation methods as the data embedded in the problem changes. Based on this idea, we propose a primal dual online method for nonlinear time-discrete inverse problems. We analyse the method through regret theory and demonstrate its performance in real-time monitoring of moving bodies in a fluid with Electrical Impedance Tomography (EIT). To do so, we also prove the second-order differentiability of the Complete Electrode Model (CEM) solution operator on $L^\infty$.

2107.00693 2026-06-11 eess.SP cs.LG

Inter-Beat Interval Estimation with Tiramisu Model: A Novel Approach with Reduced Error

Asiful Arefeen, Ali Akbari, Seyed Iman Mirzadeh, Roozbeh Jafari, Behrooz A. Shirazi, Hassan Ghasemzadeh

Details
Comments
16 pages, 14 figures
English abstract

Inter-beat interval (IBI) measurement enables estimation of heart-rate variability (HRV), which, in turn, can provide an early indication of potential cardiovascular diseases. However, extracting IBIs from noisy signals is challenging because the morphology of the signal is distorted in the presence of noise. The electrocardiogram (ECG) of a person in heavy motion is highly corrupted by noise, known as motion artifact, and the IBIs extracted from it are inaccurate. As part of remote health monitoring and wearable system development, denoising ECG signals and correctly estimating IBIs from them has become an emerging topic among signal-processing researchers. Apart from conventional methods, deep-learning techniques have recently been used successfully in signal denoising, and the diagnosis process has become easier, leading to accuracy levels that were previously unachievable. We propose a deep-learning approach leveraging a tiramisu autoencoder model to suppress motion-artifact noise and make the R-peaks of the ECG signal prominent even in the presence of high-intensity motion. After denoising, IBIs are estimated more accurately, expediting diagnosis tasks. Results illustrate that our method enables IBI estimation from noisy ECG signals with SNR as low as -30 dB, with an average root mean square error (RMSE) of 13 milliseconds for the estimated IBIs. At this noise level, our error percentage remains below 8% and the method outperforms other state-of-the-art techniques.
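
The evaluation metric mentioned above, the RMSE of estimated IBIs, can be illustrated with a short sketch. It assumes that R-peak times are already available in milliseconds and that the denoised and reference signals yield the same number of peaks; peak detection and the tiramisu model itself are outside its scope, and the function names are hypothetical.

```python
import math


def inter_beat_intervals(r_peak_times_ms):
    """Differences between consecutive R-peak times, in milliseconds."""
    return [later - earlier
            for earlier, later in zip(r_peak_times_ms, r_peak_times_ms[1:])]


def ibi_rmse(estimated_peaks_ms, reference_peaks_ms):
    """Root mean square error between estimated and reference IBIs.

    Assumption: both peak lists describe the same beats, so the two
    IBI sequences line up one to one.
    """
    estimated = inter_beat_intervals(estimated_peaks_ms)
    reference = inter_beat_intervals(reference_peaks_ms)
    if not reference or len(estimated) != len(reference):
        raise ValueError("need matching, non-empty IBI sequences")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    reference = [0, 800, 1600, 2400]   # made-up reference R-peak times
    estimated = [0, 810, 1590, 2400]   # made-up peaks from a denoised signal
    print(ibi_rmse(estimated, reference))
```

Note that an error in a single peak location affects two adjacent intervals, which is why IBI error can be larger than the peak-timing error that causes it.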

2606.12410 2026-06-11 math.CO math.PR New submission

Arrangements of Consecutive Numbers in Mallows Permutations

Katarzyna Rybarczyk

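AI summary: Studies the random variable that counts arrangements of clustered consecutive numbers in permutations under the Mallows distribution, gives an asymptotic expression for its expectation, and determines the parameter range in which its distribution is approximately Poisson.

Details
AI Chinese abstract (English translation)

We study the random variable that counts specific arrangements of clustered consecutive numbers in permutations under the Mallows distribution. We give an asymptotic expression for the expectation of this random variable. This result extends and strengthens the known result of Pinsky (2022) on clusters of consecutive numbers in Mallows permutations. In addition, we determine a range of parameters in which the distribution of the number of arrangements of clustered consecutive numbers in Mallows permutations is close to a Poisson distribution.

English abstract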

We study the random variable that counts the number of specific arrangements of clustered consecutive numbers in permutations under the Mallows distribution. We provide an asymptotic expression for the expected value of this random variable. This result extends and tightens the previously known result by Pinsky (2022) concerning clustered consecutive numbers in Mallows permutations. Moreover, we identify a range of parameters for which the distribution of the number of arrangements of clustered consecutive numbers in Mallows permutations is close to a Poisson distribution.
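
For readers unfamiliar with the Mallows distribution, the sketch below uses its standard definition, in which a permutation of 1..n has probability proportional to q raised to its number of inversions. This is textbook background rather than the paper's notation or method, and the brute-force enumeration is only practical for small n; the function names are illustrative.

```python
from itertools import permutations


def inversions(perm):
    """Number of pairs (i, j) with i < j and perm[i] > perm[j]."""
    return sum(1
               for i in range(len(perm))
               for j in range(i + 1, len(perm))
               if perm[i] > perm[j])


def mallows_normalizer(n, q):
    """Z_n(q) = prod_{k=1..n} (1 + q + ... + q^(k-1))."""
    z = 1.0
    for k in range(1, n + 1):
        z *= sum(q ** i for i in range(k))
    return z


def mallows_pmf(n, q):
    """Probability of every permutation of 1..n under Mallows(q)."""
    z = mallows_normalizer(n, q)
    return {p: q ** inversions(p) / z
            for p in permutations(range(1, n + 1))}


if __name__ == "__main__":
    pmf = mallows_pmf(4, 0.5)
    # With q < 1 the identity permutation (no inversions) is the most likely.
    print(max(pmf, key=pmf.get), sum(pmf.values()))
```

With q = 1 every permutation is equally likely, so the uniform case is recovered as a special case.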

2606.12409 2026-06-11 cond-mat.quant-gas cond-mat.str-el physics.atom-ph quant-ph New submission

A Pfaffian quantum Hall state of ultracold bosons

Joyce Kwan, Perrin Segura, Yanfei Li, Tizian Blatz, Annie Zhi, Brice Bakkali-Hassani, Annabelle Bohrdt, Martin Greiter, Fabian Grusdt, Markus Greiner

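AI summary: Using a Floquet-engineered synthetic magnetic field and a Bayesian-optimized adiabatic protocol, prepares a three-particle bosonic Pfaffian state of ultracold rubidium atoms in an optical lattice and observes pairing correlations and short-range three-body suppression, laying the groundwork for studying the braiding of non-Abelian anyons.

Details
Comments
9+11 pages, 5+9 figures
AI Chinese abstract (English translation)

Fractional quantum Hall states are a cornerstone of topological physics, hosting fractionally charged quasiparticles with exotic statistics that promise topologically protected quantum information processing. Among them, the Pfaffian state introduced by Moore and Read realizes a p-wave pairing structure that supports excitations with non-Abelian exchange statistics. Despite extensive study in electronic systems, direct probing of its pairing structure has remained limited. Here, we realize a three-particle bosonic Pfaffian state with ultracold $^{87}\mathrm{Rb}$ atoms in an optical lattice subject to a Floquet-engineered synthetic magnetic field. Using a Bayesian-optimized adiabatic protocol, we prepare a state that exhibits Pfaffian pairing correlations. Site-resolved measurements of multi-point density correlations reveal a pronounced suppression of short-range three-body coincidences, reflecting the underlying pairing structure. We further probe the state's transport response through Hall drift measurements. Our results establish a bottom-up approach to engineering non-Abelian topological order and lay the groundwork for future exploration of anyonic braiding in synthetic matter.

English abstract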

Fractional quantum Hall states are a cornerstone of topological physics, hosting fractionally charged quasiparticles with exotic statistics that promise to enable topologically protected quantum information processing. Among these, the Pfaffian state introduced by Moore and Read implements a p-wave pairing structure that supports excitations with non-Abelian exchange statistics. Despite extensive study in electronic systems, direct access to its pairing structure has remained limited. Here we realize a three-particle bosonic Pfaffian state of ultracold $^{87}\mathrm{Rb}$ atoms in an optical lattice subject to a Floquet-engineered synthetic magnetic field. Using a Bayesian-optimized adiabatic protocol, we prepare a state exhibiting Pfaffian pairing correlations. Site-resolved measurements of multi-point density correlations reveal a pronounced suppression of short-range three-body coincidences, reflecting the underlying pairing structure. We further probe the state's transport response through Hall drift measurements. Our results establish a bottom-up approach to engineering non-Abelian topological order and lay the groundwork for future explorations of anyonic braiding in synthetic matter.