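arXivDaily: a daily academic digest that syncs the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical & systems engineering, and more.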

2606.20490 2026-06-19 cs.MS New submission

Software package MaRDI Open Interfaces for improved interoperability in numerical optimization

Dmitry I. Kabanov, Stephan Rave, Mario Ohlberger

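AI summary: Presents the MaRDI Open Interfaces software package, which reduces coding and testing effort through a unified interface for nonlinear optimization, and demonstrates its interoperability using a physics-informed neural network that solves the viscous Burgers equation.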

Comments 15 pages, 1 figure, 1 table, GAMM2026

2606.20488 2026-06-19 cs.CV New submission

How Fragile Are Training-Free AI-Generated Image Detectors? A Controlled Audit of Score Direction, Preprocessing, and Compression

Jingwen Zhou, Mingzhe Wang

Affiliations: Xidian University

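AI summary: Using a unified protocol, audits two training-free detection scores (autoencoder reconstruction and noise-perturbation feature similarity) and a kNN baseline, finding that implementation details, score-direction choices, and dataset format bias can shift per-generator AUROC by as much as 0.38, and that simple fusion does not outperform the best single score.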

Abstract

Training-free detectors of AI-generated images promise generator-agnostic deployment without classifier training, yet their reported numbers are rarely compared under a single controlled protocol. We audit two representative training-free scores -- an autoencoder-reconstruction score (AEROBLADE-style) and a noise-perturbation feature-similarity score (RIGID-style) -- plus a naive feature-kNN control, on a common 1,500-image GenImage-derived benchmark spanning seven generators and JPEG compression at quality 70 and 50. The audit yields three cautionary findings. (i) Implementation details masquerade as method differences: replacing the LPIPS backbone (AlexNet -> VGG-16) changes overall AUROC by +0.085, and switching between resize-to-512 and native-resolution preprocessing flips per-generator conclusions by up to 0.38 AUROC. (ii) Score direction is not a property of the method but of its hyperparameters: the RIGID-style score is inverted (AUROC < 0.5) on SD1.5 and Wukong at noise level sigma=0.05, recovers to >0.5 for every generator at sigma=0.01, and collapses to 0.15 at sigma=0.3. (iii) Dataset format bias inflates robustness claims: without unified re-encoding, AUROC under JPEG-50 exceeds the clean condition for the AlexNet-backbone reconstruction score; after bias correction the residual anomaly localizes to a single generator (BigGAN). The audited scores have complementary per-generator failure sets, but naive z-score fusion does not beat the best single score, indicating that exploiting complementarity requires direction-aware combination.
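Finding (ii) can be made concrete: negating any score turns its AUROC into 1 - AUROC, so a z-score fusion that averages a correctly oriented score with an inverted one can cancel the signal. The Python sketch below uses synthetic Gaussian scores (not the paper's data) and contrasts naive z-score averaging with a direction-aware variant that flips any score whose AUROC on labeled calibration data falls below 0.5; the function names are illustrative, not from the paper.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic (ties count as 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # every positive vs every negative
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def fuse(score_list, calib_labels=None):
    """Average z-scored detector outputs.

    If calib_labels is given, flip any score whose AUROC is below 0.5 so that
    all scores point in the same direction before averaging.
    """
    fused = []
    for s in score_list:
        z = zscore(s)
        if calib_labels is not None and auroc(s, calib_labels) < 0.5:
            z = -z
        fused.append(z)
    return np.mean(fused, axis=0)

rng = np.random.default_rng(0)
labels = np.r_[np.ones(200, bool), np.zeros(200, bool)]  # True = AI-generated
s_a = rng.normal(1.0 * labels, 1.0)    # higher score means "generated"
s_b = rng.normal(-1.0 * labels, 1.0)   # inverted: lower score means "generated"

print("single:", auroc(s_a, labels), auroc(s_b, labels))
print("naive fusion:", auroc(fuse([s_a, s_b]), labels))
print("direction-aware:", auroc(fuse([s_a, s_b], labels), labels))
```

For brevity the same labels are used to choose directions and to evaluate; a real audit would pick directions on a held-out split.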

2606.20487 2026-06-19 cs.CL New submission

Beyond Global Replanning: Hierarchical Recovery for Cross-Device Agent Systems

Shu Yao, Yuhua Luo, Qian Long, Jingru Fan, Zhuoyuan Yu, Yuheng Wang, Lin Wu, Yufan Dang, Huatao Li, Chen Qian

Affiliations: School of Artificial Intelligence, Shanghai Jiao Tong University; Shanghai Innovation Institute; Southeast University; Tsinghua University

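AI summary: Proposes H-RePlan, a hierarchical replanning framework that uses unified API-CLI-GUI execution and a cross-layer failure abstraction to separate device-local strategy recovery from global replanning, substantially improving cross-device task completion and instruction adherence on the HeraBench benchmark.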

Abstract

Real-world computer-use tasks often span multiple applications and devices, requiring agents to coordinate heterogeneous environments under dynamic runtime failures. Existing multi-device agent systems support task decomposition and cross-device assignment, but recovery remains largely coarse-grained: when execution fails, they typically retry the same strategy, reassign the subtask, or revise the global plan, without systematically modeling the device-local strategy space. This limits their ability to distinguish failures that can be repaired within the current device from those that require cross-device replanning. We propose H-RePlan, a hierarchical replanning framework for multi-device agents with unified API-CLI-GUI execution. H-RePlan equips each device with interchangeable execution strategies and separates device-local strategy recovery from orchestrator-level global replanning through a compact cross-layer failure abstraction. To evaluate this capability, we introduce HeraBench, a fault-injected benchmark that constructs cross-device workflows over Linux and Android devices and injects strategy- and device-level failures. Experiments show that H-RePlan substantially outperforms single-strategy and coarse-grained multi-device baselines, achieving higher completion, instruction adherence, and perfect-pass rates while reducing the token cost required for reliable end-to-end success. These results demonstrate that scope-aware hierarchical recovery is essential for robust multi-device agent execution.
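The separation between device-local recovery and global replanning can be sketched as a two-level control loop. The Python below is a hypothetical illustration, not H-RePlan's implementation: `Scope`, `Failure`, and the strategy functions are invented names, and the real framework's failure abstraction and replanning are richer than moving a subtask to the next device.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional

class Scope(Enum):
    STRATEGY = "strategy"  # another strategy on the same device may succeed
    DEVICE = "device"      # the device itself is unusable

@dataclass
class Failure:
    scope: Scope
    reason: str

# A strategy (e.g. API, CLI, or GUI execution) returns None on success.
Strategy = Callable[[str], Optional[Failure]]

def run_on_device(subtask: str, strategies: Dict[str, Strategy]) -> Optional[Failure]:
    """Device-local recovery: try the device's interchangeable strategies in order."""
    last: Optional[Failure] = None
    for strategy in strategies.values():
        failure = strategy(subtask)
        if failure is None:
            return None              # repaired without leaving the device
        last = failure
        if failure.scope is Scope.DEVICE:
            break                    # switching strategy cannot help here
    return last or Failure(Scope.DEVICE, "no execution strategy")

def orchestrate(subtask: str, devices: Dict[str, Dict[str, Strategy]]) -> List[str]:
    """Orchestrator level: move the subtask to another device only on escalation."""
    log: List[str] = []
    for device, strategies in devices.items():
        failure = run_on_device(subtask, strategies)
        if failure is None:
            log.append(f"done on {device}")
            return log
        log.append(f"escalate from {device}: {failure.reason}")
    log.append("global plan failed")
    return log

# Hypothetical strategies for the usage example.
def ok(task: str) -> Optional[Failure]:
    return None

def api_down(task: str) -> Optional[Failure]:
    return Failure(Scope.STRATEGY, "API 500")

def offline(task: str) -> Optional[Failure]:
    return Failure(Scope.DEVICE, "device offline")

print(orchestrate("export report", {"linux": {"api": api_down, "cli": ok},
                                    "android": {"gui": ok}}))
```

Here a failed API call is repaired locally by falling back to the CLI, so the orchestrator never replans; only a device-scoped failure (or exhausting all local strategies) moves the subtask elsewhere.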

2606.20482 2026-06-19 cs.CL cs.HC cs.LG New submission

Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users

Haw-Shiuan Chang, Jeffrey Gomez, Mehul Patwari, Aryan Sajith, Hamed Zamani

Affiliations: University of Massachusetts, Amherst; York University

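AI summary: To address the scarcity of explicit feedback, trains reward models on implicit feedback such as mouse trajectories and eye-tracking data, raising text-based reward model accuracy from 55% to 64% and substantially improving response quality after DPO alignment.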

Abstract

To align a Large Language Model (LLM), most existing methods collect explicit human feedback and train a reward model to predict human preference from the response text. These methods have two key limitations. First, users rarely provide explicit feedback on LLM responses, which makes high-quality preference annotations expensive to collect. Second, the methods do not leverage implicit human feedback, which has proven vital to the economic moats of Internet giants. To quantify the value of implicit feedback, we build a new dataset called IFLLM, which collects 1,336 multi-turn questions from 59 Mechanical Turk workers, together with their mouse trajectories and webcam-recorded eye-gaze points on the LLMs' responses. IFLLM shows that users exhibit very diverse gaze behaviors and mouse trajectories. Our reward model based on implicit user feedback boosts the accuracy of the text-based reward model from 55% to 64% and nearly triples the relative response-quality improvement after applying DPO to eight LLMs, demonstrating the value of implicit feedback in the wild. Our data collection website, dataset, and code can be found at https://github.com/themehulpatwari/llm-implicit-feedback/.
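For reference, the standard DPO objective for a single (chosen, rejected) pair depends only on sequence log-probabilities under the policy being trained and a frozen reference model. The sketch below assumes the implicit-feedback reward model has already been used to decide which response is "chosen"; how the authors build preference pairs is not described in the abstract, and both the log-probabilities and `beta=0.1` are illustrative values.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed token log-probability of a full response,
    under the policy being trained (policy_*) or the frozen reference (ref_*).
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically stable way.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Policy and reference agree: the loss is log 2.
print(dpo_loss(-40.0, -42.0, -40.0, -42.0))
# Policy has shifted probability toward the preferred response: the loss drops.
print(dpo_loss(-35.0, -47.0, -40.0, -42.0))
```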

2606.20479 2026-06-19 cs.RO New submission

GroundControl: Anticipating Navigation Failures in Vision-Language Agents via Trajectory-Consistent Uncertainty Estimates

Nastaran Darabi, Divake Kumar, Sina Tayebati, Devashri Naik, Amit Ranjan Trivedi

Affiliations: University of Illinois at Chicago (UIC)

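AI summary: Proposes GroundControl, a trajectory-consistent uncertainty estimator that models distance-to-goal evolution with a Kalman filter and combines it with trajectory features to anticipate navigation failures, outperforming baselines in selective risk-coverage evaluation.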

Abstract

Vision-language navigation agents achieve competitive average success on benchmark tasks, yet failures often arise through predictable trajectory-level breakdowns such as oscillation, stagnation, or inefficient detours. Reliable deployment therefore requires uncertainty signals that anticipate emerging failure dynamics during execution rather than reflecting only instantaneous action entropy. We introduce GroundControl, a trajectory-consistent uncertainty estimator defined as the statistical deviation from nominal goal-directed distance-to-goal dynamics, aggregated over an episode. GroundControl models distance evolution using a constant-velocity Kalman filter and combines normalized innovation statistics with complementary trajectory features capturing progress, monotonicity, path efficiency, and oscillatory behavior. The resulting uncertainty score reflects geometric and temporal inconsistency in navigation behavior rather than local prediction dispersion. To evaluate uncertainty quality independently of task success, we formalize Selective Risk-Coverage Navigation (SRCN), a protocol that measures how effectively an uncertainty score ranks episodes by failure or inefficiency using risk-coverage curves and AURC / E-AURC summaries. Across five EB-Navigation splits ($N=300$ episodes), trajectory-consistent uncertainty achieves near-oracle ordering under success-based selective risk, with a weighted-average $\mathrm{E\text{-}AURC}_{\mathrm{SR}}=0.0024$ for the GPT-4o model, substantially outperforming entropy-based, conformal, and heuristic baselines. Under SPL-based selective evaluation, GroundControl consistently achieves the lowest AURC and E-AURC across models and navigation splits. These results show that modeling deviation from goal-directed dynamics provides an interpretable and robust signal for anticipating navigation failures in vision-language agents.
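The core signal can be sketched as follows, assuming a one-dimensional constant-velocity Kalman filter on the distance-to-goal and using the mean normalized innovation squared (NIS) as the episode score. The noise levels `q` and `r` are illustrative, and the paper's additional trajectory features (progress, monotonicity, path efficiency, oscillation) and its exact aggregation are not reproduced.

```python
import numpy as np

def kalman_nis(distances, dt=1.0, q=0.01, r=0.05):
    """Mean normalized innovation squared (NIS) of a constant-velocity Kalman
    filter tracking distance-to-goal over one episode.

    State is [distance, rate of change]; q and r are assumed process and
    measurement noise levels.
    """
    d = np.asarray(distances, dtype=float)
    F = np.array([[1.0, dt], [0.0, 1.0]])                 # constant-velocity motion
    H = np.array([[1.0, 0.0]])                            # only distance is observed
    Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
    R = np.array([[r]])
    x = np.array([d[0], 0.0])
    P = np.eye(2)
    nis = []
    for z in d[1:]:
        x = F @ x                                         # predict
        P = F @ P @ F.T + Q
        y = z - (H @ x)[0]                                # innovation
        S = (H @ P @ H.T + R)[0, 0]                       # innovation variance
        nis.append(y * y / S)
        K = P @ H.T / S                                   # Kalman gain, shape (2, 1)
        x = x + K[:, 0] * y                               # update
        P = (np.eye(2) - K @ H) @ P
    return float(np.mean(nis))

t = np.arange(20)
steady = 10.0 - 0.5 * t                     # smooth progress toward the goal
oscillating = 10.0 - 0.2 * t + 1.5 * (t % 2)  # back-and-forth behavior
print(kalman_nis(steady), kalman_nis(oscillating))  # higher = more anomalous
```

The oscillating episode scores higher because the filter's constant-velocity prediction is repeatedly violated, which is the kind of trajectory-level inconsistency the abstract describes.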

2606.20477 2026-06-19 cs.CV cs.CL cs.LG New submission

Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology

Yusuf Salcan, Simon Ging, Robin Schirrmeister, Philipp Arnold, Elmar Kotter, Behzad Bozorgtabar, Thomas Brox

Affiliations: Computer Vision Group, University of Freiburg, Germany; Department of Radiology, Medical Center - University of Freiburg, Germany; CRIION-AI Lab, Freiburg, Germany

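AI summary: Introduces RefRad2D, a large-scale bilingual dataset whose spatial grounding data are generated with LLMs and automated segmentation, and trains RadGrounder to jointly perform report generation, VQA, and spatial grounding, achieving competitive results on external benchmarks.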

Comments Accepted for MICCAI 2026. First two authors: equal contribution. Last two authors: equal supervision

Abstract

We study how to train visually grounded vision-language models (VLMs) for radiology without manual spatial annotations. We introduce RefRad2D, a large-scale bilingual (German/English) dataset of 1.2M CT and MR image-text pairs derived from clinical practice, with task-specific VQA and spatial grounding subsets generated automatically via LLM-based curation and automated segmentation. Trained on this data, our model RadGrounder jointly performs report generation, visual question answering, and spatial grounding via bounding-box detection or segmentation. On external VQA benchmarks (Slake, VQA-RAD), RadGrounder achieves results competitive with those of specialized medical VLMs. Adding our clinical data to the training mixture improves open-ended VQA over fine-tuning on the downstream datasets alone, showing the transferability of our dataset. Crucially, adding grounding supervision does not degrade language quality, enabling spatially verifiable outputs at no cost to VQA performance.

2606.20475 2026-06-19 cs.LG New submission

Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution

Mingyu Yang, Keye Zheng, Congchao Cheng, Yujie Liu, Xingkang Lu, Fan Jiang, Yefei Zheng

Affiliations: Alibaba International Digital Commerce Group

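AI summary: To address the loss of cross-batch evidence in batch-wise trajectory distillation, proposes Marginal Advantage Accumulation (MAA), which constructs differential signals, accumulates them with an exponential moving average, and merges entries by semantic identity; MAA achieves the best results in 14 of 16 settings and reduces optimization-phase token consumption by about 75%.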

Comments 26 pages, 4 figures, 10 tables, 42 references

2606.20474 2026-06-19 cs.LG cs.AI cs.PF New submission

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

Inesh Chakrabarti, David Limpus, Aditi Ghai Rana, Bowen Bao, Spandan Tiwari, Thiago Crepaldi, Ashish Sirasao

Affiliations: Advanced Micro Devices; University of California, Los Angeles; Purdue University

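AI summary: For context-heavy agent workloads, studies 4-bit KV-cache compression (with TurboQuant-style rotation and codebook quantization as the quality reference) and introduces UltraQuant, an FP4 path optimized for AMD GPUs; on long-context multi-turn tasks it reduces P50 time-to-first-token by 3.47x in cache-pressured late rounds and improves output throughput by 1.63x over an FP8 KV baseline.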

Comments 11 pages, 9 figures

Abstract

Context-heavy agents place unusual pressure on the key-value (KV) cache: long prefixes are reused across many short turns, while concurrency determines whether the serving system can keep GPUs utilized. We study 4-bit KV-cache compression for this setting, using TurboQuant-style rotation and codebook quantization as a quality anchor and vLLM FP8 KV caching as the deployment anchor. We report three contributions. First, we frame 4-bit KV caching around multi-round agent workloads where task quality, cache residency, and serving throughput must be measured jointly. Second, we describe the practical design choices needed to make the 4-bit path robust, including asymmetric K/V treatment, Walsh-Hadamard rotation, QJL removal, and block-scale variants. Third, we present serving optimizations on AMD GPUs, including optimized decode-attention kernels and UltraQuant, an FP4 approximation path that uses FP8 queries, FP4 KV tensors, UE8M0 group scales, and native scaled-MFMA support on CDNA4. On a long-context, multi-turn agentic workload, UltraQuant cuts P50 time-to-first-token by 3.47x in the cache-pressured late rounds (2.3x across all rounds) and raises output throughput by 1.63x over the FP8 KV baseline.
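Walsh-Hadamard rotation helps 4-bit KV caching because an orthonormal rotation spreads a few outlier channels across all coordinates, shrinking the per-group range the quantizer must cover. The sketch below is a simplification, not UltraQuant: it uses an integer 4-bit grid with floating-point absmax scales per group of 32, whereas the abstract describes FP4 tensors with UE8M0 (power-of-two) group scales; the group size and outlier magnitudes are assumptions.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix of size n (a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quant4(x, group=32):
    """Symmetric 4-bit quantize-dequantize with one absmax scale per group.
    Uses the integer grid [-7, 7]; hardware FP4 (E2M1) grids differ."""
    g = x.reshape(-1, group)
    scale = np.abs(g).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)      # avoid division by zero
    q = np.clip(np.round(g / scale), -7, 7)
    return (q * scale).reshape(x.shape)

def rel_error(x, rotate):
    """Relative reconstruction error, with or without a Hadamard rotation."""
    if rotate:
        H = hadamard(x.shape[-1])
        x_hat = quant4(x @ H) @ H.T               # rotate, quantize, rotate back
    else:
        x_hat = quant4(x)
    return np.linalg.norm(x_hat - x) / np.linalg.norm(x)

rng = np.random.default_rng(0)
key = rng.normal(size=128)
key[[3, 77]] = 20.0    # two outlier channels, mimicking outliers often seen in keys
errors = {rotate: rel_error(key, rotate) for rotate in (False, True)}
print(errors)
```

Without rotation, the outliers force a coarse step in their groups and the ordinary channels round to zero; after rotation every group has a similar, smaller range.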

2606.20470 2026-06-19 cs.CR cs.AI New submission

Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems

Reza Soosahabi, Vivek Namsani

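AI summary: Uses a probabilistic model of the attack-defense setting in agentic AI systems to argue for a detect-and-misdirect strategy (realized as CMPE) in place of conventional detect-and-block; misleading responses lower the attacker's success rate, reducing estimated ASR upper bounds by up to two orders of magnitude on benchmarks.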

Abstract

Agentic AI systems increasingly rely on language-model components to interpret instructions, process external data, invoke tools, and coordinate with other agents. These capabilities make prompt-injection and jailbreak attacks more consequential, especially as attackers adopt model-guided automation to scale probing, prompt refinement, and response evaluation. This work analyzes the resulting attack-defense setting through a probabilistic model of a target system, its defense mechanism, and the attacker's automated judge. Our analysis shows that conventional detect-and-block defenses can allow attacker success rate (ASR) to approach one as the query budget grows, since predictable refusals provide useful feedback to automated search. We then examine detect-and-misdirect, where detected malicious interactions receive controlled, non-operational responses designed to induce false-positive errors in the attacker's judge. This strategy reduces the positive predictive value of attacker-selected candidates and yields a bounded asymptotic ASR. We evaluate a proof-of-concept realization of this strategy through Contextual Misdirection via Progressive Engagement (CMPE), a lightweight conversational misdirection method designed to replace predictable refusal text with safe but strategically misleading responses in automated jailbreak settings. On jailbreak benchmarks, CMPE reduces estimated ASR upper bounds by up to two orders of magnitude and nearly eliminates verified attack success in end-to-end PAIR and GPTFuzz attack runs.
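The contrast between the two defenses can be reproduced with a deliberately simple model: queries are i.i.d., a genuine jailbreak occurs with probability `s` per query, and under misdirection the attacker's judge wrongly accepts a harmless misdirected reply with probability `m`. These assumptions and the parameter values are illustrative, not the paper's model.

```python
def asr_block(s, n):
    """Detect-and-block: refusals are judged as failures, so the attacker
    keeps searching. P(at least one genuine success in n i.i.d. queries)."""
    return 1.0 - (1.0 - s) ** n

def asr_misdirect(s, m, n):
    """Detect-and-misdirect: each query is a genuine success with probability s,
    or a misdirected reply that the attacker's judge wrongly accepts with
    probability m. The attacker stops at the first accepted reply, so the
    attack succeeds only if that first acceptance is genuine."""
    p_accept = s + m
    ppv = s / p_accept          # positive predictive value of an acceptance
    return ppv * (1.0 - (1.0 - p_accept) ** n)

for n in (10, 100, 1000):
    print(n, asr_block(0.01, n), asr_misdirect(0.01, 0.2, n))
```

Under blocking, ASR tends to 1 as the query budget n grows; under misdirection it is capped at s / (s + m), the judge's positive predictive value, which mirrors the bounded asymptotic ASR described in the abstract.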

2606.20469 2026-06-19 cs.LG cs.CG New submission

Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima

Md Sakir Ahmed, Kumaresh Sarmah, Hemen Dutta

Affiliations: Gauhati University

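AI summary: Because Euclidean sharpness is not reparametrization-invariant even though SGD is thought to favor flat minima, defines a Riemannian sharpness based on the Fisher information matrix, proves its invariance, shows that the stationary distribution of SGD concentrates on flat minima, and links this to generalization through a PAC-Bayes bound.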

Comments 18 pages, 5 figures, preprint

Abstract

A widely held intuition in deep learning is that stochastic gradient descent (SGD) implicitly favors flat minima and that flat minima generalize better, but standard Euclidean measures of flatness, such as the trace or maximum eigenvalue of the loss Hessian, are not invariant under reparametrizations that preserve the network function, which undermines the theoretical foundations of this narrative. In this study we resolve this issue by grounding flatness in the Riemannian geometry of the statistical manifold induced by the Fisher Information Matrix (FIM). We define Riemannian sharpness (SR) mathematically and prove that it is invariant under smooth, function-preserving reparametrizations, which directly addresses the critique of Dinh et al. in the paper "Sharp minima can generalize for deep nets". We note that this invariance is a property of the true FIM; the diagonal empirical estimator used in practice (and in all of our experiments) inherits invariance only approximately, and exact invariance under arbitrary reparametrizations would require structured estimators such as K-FAC. We formalize the gradient noise of mini-batch SGD as having a covariance structure proportional to the FIM, derive the stationary distribution of the resulting stochastic differential equation, and then show that the probability mass is exponentially concentrated at Riemannian-flat minima. A PAC-Bayes generalization bound controlled explicitly by SR formally links this geometric bias to test performance. Our experiments on MNIST and CIFAR-10 confirm that SR reliably tracks generalization in ways that Euclidean sharpness does not, and that its scaling with $\eta/B$ matches the theoretical predictions. Together these results provide a rigorous, reparametrization-invariant account of why flat minima generalize.
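The abstract does not state the formula for Riemannian sharpness. One natural Fisher-based candidate, assumed here purely for illustration, is SR = tr(F^{-1} H) at a minimum. Under a linear reparametrization θ = Jψ, both the Hessian at a critical point and the Fisher information transform as J^T M J, so tr(F^{-1} H) is unchanged while the Euclidean trace tr(H) is not. The sketch checks this numerically with random positive definite matrices standing in for H and F.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

def random_spd(dim):
    """Random symmetric positive definite matrix."""
    A = rng.normal(size=(dim, dim))
    return A @ A.T + dim * np.eye(dim)

H = random_spd(d)   # stand-in for the loss Hessian at a minimum
F = random_spd(d)   # stand-in for the Fisher information at the same point

def euclidean_sharpness(H):
    return float(np.trace(H))

def riemannian_sharpness(H, F):
    # Assumed definition tr(F^{-1} H); solve() avoids forming the inverse.
    return float(np.trace(np.linalg.solve(F, H)))

# Function-preserving linear reparametrization theta = J @ psi, e.g. scaling
# one ReLU layer's weights by 10 and the next layer's by 1/10.
J = np.diag([10.0, 10.0, 0.1, 0.1, 1.0])
H_psi = J.T @ H @ J   # how the Hessian transforms at a critical point
F_psi = J.T @ F @ J   # the Fisher information transforms the same way

print(euclidean_sharpness(H), euclidean_sharpness(H_psi))
print(riemannian_sharpness(H, F), riemannian_sharpness(H_psi, F_psi))
```

Algebraically, tr((J^T F J)^{-1} J^T H J) = tr(J^{-1} F^{-1} H J) = tr(F^{-1} H), which is the invariance the two printed values confirm.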

2606.20465 2026-06-19 cs.CY cs.SI New submission

Farmer Connect: Improving Farmers' Access to Produce Markets

Micheal Amanya, Darius Kainamura, Christine Namatovu, Lailah Kobugabe, Solomon Buwule Fortune, Adones Rukundo

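AI summary: To address limited market access and weak bargaining power among smallholder farmers in Uganda, presents Farmer Connect, a cooperative-based digital platform with a mobile-first architecture and cloud backend that supports group management, market coordination, and transparent earnings; about 85% of identified user requirements were implemented.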

Abstract

Smallholder maize farmers in Uganda continue to face limited market access, weak bargaining power, low price transparency, and heavy reliance on intermediaries. These challenges are compounded by poor produce coordination, delayed payments, and weak visibility into cooperative transactions. This paper presents Farmer Connect, a cooperative-based digital platform designed to support produce management, marketplace coordination, and transparent earnings tracking among farmer groups. The system supports four user roles: administrators, supervisors, farmers, and customers. Its core functions include farmer group management, contribution recording and verification, marketplace listing, order processing, first-in, first-out (FIFO) produce allocation, earnings visibility, mobile money payment support, and notification services. The platform was implemented using a mobile-first architecture with cloud-based backend services and an administrative web dashboard. Functional implementation showed that the system was able to support the major workflows required for group-based maize marketing and cooperative coordination, with approximately 85% of identified user requirements implemented. The study shows that cooperative-centered digital platforms can provide a practical framework for improving transparency, coordination, and buyer access for smallholder farmers.
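The abstract lists FIFO-based produce allocation without detailing it. A minimal Python sketch of one plausible reading follows, assuming an order is filled from the oldest verified contributions first and each contributing farmer is credited at a single price per kg; the `Contribution` type, farmer names, and prices are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict

@dataclass
class Contribution:
    farmer: str
    kg: float   # verified quantity still available for sale

def allocate_fifo(queue: Deque[Contribution], order_kg: float,
                  price_per_kg: float) -> Dict[str, float]:
    """Fill an order from the oldest contributions first and return each
    farmer's earnings. A partially used contribution stays at the front."""
    if sum(c.kg for c in queue) < order_kg:
        raise ValueError("insufficient stock for this order")
    earnings: Dict[str, float] = {}
    remaining = order_kg
    while remaining > 0 and queue:
        head = queue[0]
        take = min(head.kg, remaining)
        earnings[head.farmer] = earnings.get(head.farmer, 0.0) + take * price_per_kg
        head.kg -= take
        remaining -= take
        if head.kg == 0:
            queue.popleft()          # contribution fully sold
    return earnings

stock = deque([Contribution("farmer_a", 300), Contribution("farmer_b", 500),
               Contribution("farmer_c", 200)])
print(allocate_fifo(stock, order_kg=600, price_per_kg=1200))
print(stock[0])   # farmer_b's contribution, now with 200 kg left
```

Checking total stock before mutating the queue keeps a failed order from partially consuming contributions.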

2606.20461 2026-06-19 cs.LG cs.CY cs.DB New submission

Data Bias Mitigation under Coverage Constraints & The Price of Fairness

Bruno Scarone, Alfredo Viola, Renée J. Miller

Affiliations: Khoury College of Computer Sciences, Northeastern University; Cheriton School of Computer Science, University of Waterloo

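AI summary: For bias affecting intersectional groups defined by multiple sensitive attributes, extends a bias mitigation framework with coverage constraints, optimizes over mitigation strategies via integer linear programming, trades small bias approximation errors for data efficiency, and characterizes the price of fairness.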

Comments Accepted to FAccT 2026

Abstract

Machine learning models have been shown to exhibit discriminatory outcomes or degraded performance for individuals at the intersection of multiple sensitive attributes, such as race and gender. This stems in part from two interrelated challenges: the lack of principled measures for quantifying bias (potentially intersectional), and insufficient representation of intersectional subgroups in training data. We extend a recent bias mitigation framework to incorporate coverage constraints that enforce sufficient representation across groups, including intersectional subgroups. Since achieving exactly zero bias for all groups may not be data efficient (meaning it may require large amounts of data), our solution trades small approximation errors in bias for greater data efficiency while satisfying coverage constraints. We also formulate bias mitigation as an integer linear program that optimizes over all mitigation strategies, and characterize the price of fairness (the minimum data modification cost) as a function of fairness tolerance. This is essential both for legal compliance, where regulations may mandate specific fairness thresholds, and for data governance, enabling practitioners to make informed trade-offs between bias reduction and data modification (particularly, data purchasing) costs. We evaluate our techniques on publicly available datasets, demonstrating that bias mitigation via our framework preserves predictive accuracy across multiple classifiers, and that coverage constraints, while motivated by statistical considerations, are essential for preserving downstream ML performance.
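Coverage constraints of this kind are lower bounds on subgroup counts. Considered alone, ignoring the bias tolerance that the paper's integer linear program also optimizes, the minimum number of tuples to acquire is simply the sum of each intersectional subgroup's shortfall, as in this sketch; the attribute names, values, and threshold `k` are hypothetical.

```python
from collections import Counter
from itertools import product

def coverage_deficit(rows, attrs, domains, k):
    """Extra tuples needed so that every intersectional subgroup (every
    combination of sensitive-attribute values) has at least k rows."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    deficit = {}
    for combo in product(*(domains[a] for a in attrs)):
        missing = max(0, k - counts[combo])   # Counter returns 0 for unseen keys
        if missing:
            deficit[combo] = missing
    return deficit

rows = ([{"race": "A", "gender": "F"}] * 5 + [{"race": "A", "gender": "M"}] * 12
        + [{"race": "B", "gender": "M"}] * 8)
domains = {"race": ["A", "B"], "gender": ["F", "M"]}
need = coverage_deficit(rows, ["race", "gender"], domains, k=10)
print(need, sum(need.values()))
```

Note that the empty subgroup (B, F) contributes the largest shortfall, which is exactly the kind of intersectional gap coverage constraints are meant to expose.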

2606.20459 2026-06-19 cs.AI New submission

Context-Aware Hierarchical Bayesian Modeling of IVF Laboratory Environmental Conditions

Zahra Asghari Varzaneh, Reza Khoshkangini, Pia Saldeen, Lars Johansson, Thomas Ebner

Affiliations: Department of Computer Science and Media Technology, Malmö University

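AI summary: Proposes 55 context-aware temporal features that capture incubator microenvironment dynamics and a hierarchical Bayesian Beta regression model that shares environmental effects across clinics, reducing prediction error from 3-5% to 1.27% and, at a Northern European clinic, achieving R² = 0.86 and a 64% error reduction for the 35-39 age group.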

Abstract

IVF pregnancy rates are routinely modeled using patient-level variables, while high-resolution laboratory environmental data remain underutilized. We show that this is a missed opportunity. Rather than relying on raw sensor averages, we engineer 55 context-aware temporal features, including rolling thermal stability, simultaneous temperature-humidity adherence, peak stress duration, and post-stress recovery speed, that capture the dynamics of incubator microenvironments. On 61 weeks of data from an Asian IVF clinic, these features reduce cross-validated prediction error to 1.27%, compared to 3-5% for raw averages. We then train a hierarchical Bayesian Beta regression model that shares environmental effects across an Asian and a Northern European clinic via partial pooling, while preserving site-specific baselines. On held-out data from the Northern European clinic, the model achieves R² = 0.86 and a 64% error reduction for the 35-39 age group over a naive baseline, demonstrating that structured environmental monitoring contains clinically meaningful, transferable signal.
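The abstract names several engineered features but not their exact definitions. The sketch below shows one hypothetical way to compute three of them from a window of incubator readings with NumPy; the temperature and humidity bands, the window length, and the feature definitions are assumptions, not the authors' specification.

```python
import numpy as np

def rolling_std(x, window):
    """Standard deviation over every full sliding window of length `window`."""
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    return windows.std(axis=1)

def longest_run(mask):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for m in mask:
        run = run + 1 if m else 0
        best = max(best, run)
    return best

def incubator_features(temp, humidity, t_band=(36.8, 37.2), h_band=(85.0, 95.0),
                       window=12):
    """Three hypothetical context-aware features for one period of sensor data."""
    temp = np.asarray(temp, dtype=float)
    humidity = np.asarray(humidity, dtype=float)
    in_t = (temp >= t_band[0]) & (temp <= t_band[1])
    in_h = (humidity >= h_band[0]) & (humidity <= h_band[1])
    return {
        # Rolling thermal stability: worst short-term temperature variability.
        "max_rolling_temp_std": float(rolling_std(temp, window).max()),
        # Simultaneous adherence: share of samples with both readings in band.
        "joint_adherence": float(np.mean(in_t & in_h)),
        # Peak stress duration: longest consecutive out-of-band temperature run.
        "peak_stress_duration": longest_run(~in_t),
    }

temp = [37.0] * 10 + [37.5] * 3 + [37.0] * 10   # a 3-sample heat excursion
humidity = [90.0] * 23
print(incubator_features(temp, humidity))
```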

2606.20458 2026-06-19 cs.RO New submission

Slow Brain, Fast Planner: Latency-Resilient VLM-Augmented Urban Navigation

Zhenghao "Mark" Peng, Honglin He, Quanyi Li, Yukai Ma, Bolei Zhou

Affiliations: Amazon FAR; UCLA; Independent; Zhejiang University

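AI summary: To close the trajectory scoring gap in mobile-robot sidewalk navigation, proposes a training-free, latency-resilient trajectory-level fusion layer that uses a VLM to select among candidate trajectories and fuses that choice with the planner's output, reducing ADE by 30% in challenging scenarios.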

Abstract

Learning-based planners for sidewalk navigation can generate diverse candidate trajectories in real time, yet their scoring functions often fail to select the best trajectory in challenging situations, outputting trajectories that make the mobile robot drive onto grass, toward pedestrians, or in the wrong direction, even when better candidates exist in the same set. We call this the trajectory scoring gap: in real-world sidewalk navigation, the gap between an anchor-based planner's top choice and the best possible candidate is substantial, likely due to the planner's limited high-level scene understanding. Rather than replacing the planner with an end-to-end Vision-Language-Action model, we propose a VLM-Planner interface that uses a VLM to select a candidate index from the planner's proposal set and then fuses that selection with the planner's initial output. However, VLMs take 1-3 s per query and therefore cannot directly drive a 5-20 Hz control loop. We contribute a training-free, latency-resilient trajectory-level fusion layer that turns a stale VLM selection into real-time planner scoring via geometric similarity with exponential decay. On ~2,000 challenging real-world scenarios (e.g., junctions, pedestrian encounters), VLM selection achieves a 30% ADE reduction relative to the planner's best selection, while the planner remains competitive in routine situations. In simulation, Score Fusion maintains a >80% success rate with delays of up to 5 s. We demonstrate the full system on a mobile robot navigating challenging campus sidewalks under varied network latency.
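The fusion idea, turning a stale VLM choice into a score bonus that fades with age, can be sketched as below. The abstract only says "geometric similarity with exponential decay"; the ADE-based similarity, the additive blend, and the parameters `w0`, `tau_s`, and `sigma_m` are assumptions made for illustration.

```python
import numpy as np

def fuse_scores(planner_scores, candidates, vlm_traj, age_s,
                w0=1.0, tau_s=2.0, sigma_m=0.5):
    """Blend real-time planner scores with a possibly stale VLM choice.

    planner_scores: (K,) scores for the current candidates.
    candidates:     (K, T, 2) current candidate trajectories (x, y in metres).
    vlm_traj:       (T, 2) trajectory the VLM picked `age_s` seconds ago,
                    assumed already transformed into the current robot frame.
    """
    # Average displacement between each candidate and the VLM-selected path.
    ade = np.linalg.norm(candidates - vlm_traj[None], axis=-1).mean(axis=1)
    similarity = np.exp(-ade / sigma_m)      # 1 for an identical path
    weight = w0 * np.exp(-age_s / tau_s)     # older VLM advice counts less
    return np.asarray(planner_scores, dtype=float) + weight * similarity

t = np.linspace(0.2, 3.0, 15)
# Left, straight, and right candidates (positive y = left).
cands = np.stack([np.column_stack([t, slope * t]) for slope in (0.5, 0.0, -0.5)])
planner = [0.5, 0.6, 0.8]            # the planner prefers the rightmost path
fresh = fuse_scores(planner, cands, cands[0], age_s=0.5)
stale = fuse_scores(planner, cands, cands[0], age_s=20.0)
print(np.argmax(fresh), np.argmax(stale))
```

With fresh advice the VLM's left path wins; once the selection is very old its weight decays toward zero and the planner's own ranking takes over, which is the graceful-degradation behavior that latency resilience requires.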

2606.20455 2026-06-19 cs.CV New submission

PCFootprint: A Large-Scale Dataset and Benchmark for Vectorized Building Footprint Extraction from Aerial LiDAR Point Clouds

Haoyuan Shen, Kuihao Wang, Ruisheng Wang, Yujun Liu

Affiliations: School of Architecture and Urban Planning, Shenzhen University

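AI summary: Introduces PCFootprint, the first large-scale dataset for building footprint extraction from airborne laser scanning point clouds, with 33,000 tiles and a cross-domain test set; benchmarking mainstream methods reveals the challenges posed by complex geographic environments.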

Comments 14 pages, 9 figures

Abstract

Building footprint extraction is a fundamental task in photogrammetry, remote sensing, and computer vision. Recent image-based methods have achieved remarkable progress in extracting vectorized footprints from high-resolution optical imagery. However, optical imagery is inherently susceptible to occlusions, perspective distortions, and residual relief displacement, which yield incomplete or misaligned footprints. Furthermore, the lack of explicit elevation information limits its direct applicability to Level of Detail (LoD) building modeling. In this paper, we present PCFootprint, the first large-scale public dataset for footprint extraction from airborne laser scanning point clouds. PCFootprint comprises 33,000 tiles derived from Estonian Land and Spatial Development Board data, covering diverse urban and rural landscapes. Each tile spans 128 m × 128 m and is paired with vectorized footprints systematically aligned to the point clouds. The dataset includes a 3,000-tile cross-domain test set for evaluating generalization across geographic regions. We establish comprehensive benchmarks by evaluating mainstream methods. Experimental results reveal significant challenges, including high intra-class variance, data imbalance, and noise across complex geospatial environments. We believe PCFootprint will advance future research in building modeling, urban scene understanding, and geospatial analysis. The PCFootprint dataset is publicly available at https://huggingface.co/datasets/Haoyuan-Shen/PCFootprint.

2606.20454 2026-06-19 cs.FL New submission

Minimality of Random Moore Automata under Prefix-Dependent Congruences

Matías Carrasco, Sergio Yovine

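AI summary: Studies the triviality of prefix-dependent congruences in random deterministic transition systems, proving that the congruence is trivial with high probability when labels are independent and every label has at least three admissible symbols.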

Comments 9 pages

Abstract

We study prefix-dependent congruences for random deterministic transition systems with state outputs. In this setting, the admissible continuations used to compare two states may depend on the observed prefix, and two states are identified only if no common admissible continuation distinguishes their future outputs. The framework includes probabilistic deterministic finite automata as a motivating special case. We analyze the random transition model in which all transition values are independent and uniform. Each state is also assigned an independent label that specifies both its output and its set of admissible symbols. If two independent labels agree with probability strictly less than one, and every label has at least three admissible symbols, then the induced congruence is trivial with high probability. The proof combines a pruning process on pairs, a collision-free exploration controlling its early evolution, and a first-moment argument showing that the remaining pairs cannot organize into nontrivial equivalence classes.
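
For intuition, the sketch below samples a random Moore machine with independent, uniform transitions and independent output labels, then computes the classical Moore congruence by partition refinement. This is a deliberate simplification and an assumption of the sketch: every symbol is treated as admissible at every state and there is no prefix dependence, whereas the paper's congruence uses label-specific admissible symbol sets. A partition into singletons means the congruence is trivial, i.e. no two states are identified.

```python
import random

def random_moore(n_states, n_symbols, n_outputs, seed=0):
    """Sample i.i.d. uniform transitions and an i.i.d. output label per state."""
    rng = random.Random(seed)
    delta = [[rng.randrange(n_states) for _ in range(n_symbols)] for _ in range(n_states)]
    out = [rng.randrange(n_outputs) for _ in range(n_states)]
    return delta, out

def moore_classes(delta, out):
    """Coarsest partition of states compatible with outputs and transitions.

    Returns a class id per state; states sharing an id are indistinguishable
    by any input word (every symbol is treated as admissible everywhere).
    """
    cls = list(out)  # round 0: separate states by their own output
    while True:
        # Signature: own class plus the classes of all successors.
        sigs = [(cls[q],) + tuple(cls[p] for p in row) for q, row in enumerate(delta)]
        ids = {}
        new = [ids.setdefault(s, len(ids)) for s in sigs]
        if len(ids) == len(set(cls)):  # no class was split: fixed point reached
            return new
        cls = new

delta, out = random_moore(n_states=200, n_symbols=3, n_outputs=2, seed=1)
classes = moore_classes(delta, out)
print("trivial congruence:", len(set(classes)) == len(delta))
```

Because each signature includes the state's current class, every round refines the previous partition, so an unchanged class count means the partition is stable.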

2606.20453 2026-06-19 cs.CY cs.HC New submission

Directors Duties in the Age of Agentic Artificial Intelligence

Deirdre Ahern

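AI summary: Examines how directors can balance the interests of shareholders and employees when adopting agentic AI, analyzes four corporate governance models, and argues for a broader legal perspective that promotes employee welfare.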

Journal ref Cambridge Forum on AI: Law and Governance 2, e7 (2026)

DOI: 10.1017/cfl.2026.10049

Abstract

As boards engage with the adoption of Artificial Intelligence, including agentic AI, to drive operational efficiencies, new opportunities for profit maximisation arise. AI adoption is increasingly associated with the displacement of employee roles in companies, and the interests of employees as stakeholders require exploration. A novel question posed is whether, in an age of AI ascendancy, AI may warrant stakeholder status as its role in the company approximates or eclipses that of human employees. The article probes four distinct models of corporate purpose within the duty on directors to act in the best interests of the company (the shareholder primacy model, the enlightened shareholder value model, the stakeholder-friendly model, and the stakeholder value model), highlighting the scope available for directors to accommodate the interests of employees in board decision-making around AI adoption. It is concluded that, given the degree to which directors are insulated from legal scrutiny in relation to their best interests duty, adopting a wider law-in-context approach to promote employee welfare would serve the interests of employees, directors and companies alike. This would see directors engaging meaningfully with employees and providing opportunities for reskilling to adapt to the age of AI.

2606.20449 2026-06-19 cs.CV New submission

InfantFace: Detecting infant faces in neonatal clinical environments

Abdullah Bin-Obaid, Maria M. Cobo, Rebeccah Slater, Lionel Tarassenko, Mauricio Villarroel

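AI summary: Targets occlusion and illumination problems in neonatal clinical environments with a one-stage face detection model based on YOLOv11m; after pre-training on several public datasets and fine-tuning on clinical data, AP50 rises from 0.87 to 0.96.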

Comments 32 pages, 7 figures, 4 tables; supplementary information included

Abstract

Reliable localisation of the neonatal face is the first step for several video-camera based non-contact assessments such as pain- and distress-related facial expression analysis, pain scoring, cardiorespiratory signal extraction and cessation of breathing alerts. However, major challenges persist in neonatal clinical environments. Cluttered backgrounds, illumination changes and poor lighting conditions can reduce the accuracy of face detection models. Clinical interventions, monitoring equipment and, in some cases, medical devices can obstruct the face, making visual assessment difficult. We propose a one-stage YOLOv11m-based model tailored for face detection of infants in neonatal clinical environments. We combined multiple publicly available datasets (VGGFace2, CelebA, FDDB, WIDER FACE) to train and evaluate our proposed model. We then fine-tuned our model on a neonatal research dataset involving 228 videos from 114 recording sessions of 113 independent infants. Before fine-tuning, our model achieved an AP50 of 0.87, surpassing the performance of three state-of-the-art general face detectors. Performance improved further to an AP50 of 0.96 after clinical-domain adaptation. Evaluating face detection performance across different datasets remains a challenge due to the lack of publicly available neonatal datasets. Prioritising the creation of such datasets, while upholding appropriate privacy safeguards and ethical standards in their creation and use, would greatly support further progress in this field.
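
AP50 counts a predicted box as a true positive when its intersection-over-union (IoU) with a not-yet-matched ground-truth box is at least 0.5. The sketch below shows that matching rule and an all-point interpolated average precision for a single image. The `(x1, y1, x2, y2)` box format, greedy confidence-ordered matching and the helper names are assumptions; this is not the paper's evaluation code.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def average_precision_50(preds, gts, thr=0.5):
    """preds: list of (score, box); gts: list of boxes. All-point interpolated AP."""
    matched, tps = set(), []
    # Greedy matching: highest-confidence predictions claim ground truths first.
    for score, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best, best_j = 0.0, None
        for j, g in enumerate(gts):
            if j not in matched and iou(box, g) > best:
                best, best_j = iou(box, g), j
        if best >= thr:
            matched.add(best_j)
            tps.append(1)
        else:
            tps.append(0)
    precisions, recalls, tp_cum = [], [], 0
    for k, t in enumerate(tps, start=1):
        tp_cum += t
        precisions.append(tp_cum / k)
        recalls.append(tp_cum / len(gts) if gts else 0.0)
    # Interpolation: precision at each rank becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p  # area under the precision-recall step curve
        prev_recall = r
    return ap
```

Dataset-level AP50 would pool ranked predictions across all images before computing precision and recall.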

2606.20444 2026-06-19 cs.CR cs.SE New submission

Image Encryption Algorithm Based on Convolutional Neural Networks and Dynamic S-Box Generation

Ans Ibrahim, Fadhil Abbas Fadhil, Mahameed Reza Feizi Derakhshi, Maryam Mahdi Alhusseini, Nikolai Safiullin

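AI summary: Proposes a dynamic image encryption method that combines CNNs with classical cryptography, using CNN-learned features to generate adaptive S-boxes; this strengthens nonlinearity, uniqueness and input dependence and improves resistance to attacks.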
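
The summary does not specify how the S-box is derived from CNN features. As a purely hypothetical illustration, the sketch below builds a key- and input-dependent byte S-box by seeding a Fisher-Yates shuffle with a SHA-256 digest of a feature vector (a stand-in for CNN features) and a key, substitutes pixel bytes, and inverts the substitution. The feature values, key and helper names are assumptions, and `random.Random` is not a cryptographically secure generator, so this only demonstrates the bijectivity and input dependence of a dynamic S-box, not a secure cipher.

```python
import hashlib
import random

def dynamic_sbox(features, key):
    """Build a 256-entry byte permutation that depends on both features and key."""
    material = key + ",".join(f"{v:.6f}" for v in features).encode()
    digest = hashlib.sha256(material).digest()
    rng = random.Random(int.from_bytes(digest, "big"))  # NOT cryptographically secure
    sbox = list(range(256))
    rng.shuffle(sbox)  # Fisher-Yates shuffle, so the table stays a bijection
    return sbox

def invert(sbox):
    """Inverse permutation, used for decryption."""
    inv = [0] * 256
    for i, v in enumerate(sbox):
        inv[v] = i
    return inv

def substitute(data, table):
    return bytes(table[b] for b in data)

key = b"demo-key"
features = [0.12, -0.48, 0.93]  # hypothetical stand-in for CNN-derived features
sbox = dynamic_sbox(features, key)
pixels = bytes([0, 17, 255, 128, 17])
cipher = substitute(pixels, sbox)
assert substitute(cipher, invert(sbox)) == pixels
```

Substitution alone is only the confusion layer; a full image cipher would also need diffusion across pixels.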

2606.20438 2026-06-19 cs.AI New submission

Interpretable Sperm Morphology Classification via Attention-Guided Deep Learning

Zahra Asghari Varzaneh, Reza Khoshkangini, Thomas Ebner, Lars Johansson

Affiliation: Department of Computer Science and Media Technology, Malmö University

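AI summary: Proposes an attention-guided deep learning framework that combines EfficientNet-B0 with CBAM modules for sperm morphology classification, reaching 90.2% and 93.9% accuracy on the SMIDS and HuSHem datasets respectively, and uses Grad-CAM++ visualizations to improve interpretability.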
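
The summary names CBAM without implementation details. The NumPy sketch below follows the published CBAM design (channel attention from a shared MLP applied to average- and max-pooled descriptors, followed by 7×7 spatial attention over channel-wise mean and max maps) using random weights. It is a forward-pass illustration only, not the paper's EfficientNet-B0 integration; the reduction ratio, the omission of biases and the weight scale are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W). Shared two-layer MLP on average- and max-pooled descriptors."""
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer, no biases
    return sigmoid(mlp(avg) + mlp(mx))  # shape (C,)

def spatial_attention(feat, kernel):
    """kernel: (2, k, k) weights applied to [channel-mean, channel-max] maps."""
    maps = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(maps, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H, W
    h, w = maps.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)  # shape (H, W)

def cbam(feat, w1, w2, kernel):
    """Channel attention followed by spatial attention, as in the CBAM design."""
    refined = feat * channel_attention(feat, w1, w2)[:, None, None]
    return refined * spatial_attention(refined, kernel)[None, :, :]

rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 2  # r: channel reduction ratio (assumed)
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
print(out.shape)  # (8, 6, 6)
```

Both attention maps lie in (0, 1), so the module only rescales features and never changes their shape.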

2606.20436 2026-06-19 cs.CR cs.AI New submission

Multi-View Decompilation for LLM-Based Malware Classification

Bercan Turkmen, Vyas Raina

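AI summary: Proposes using multiple decompiler views to improve LLM-based malware classification; complementary pseudo-C from Ghidra and RetDec raises recall and F1.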

Abstract

Malware analysts often inspect compiled binaries through decompiled pseudo-C when source code is unavailable. Recent work suggests that large language models (LLMs) can assist this process by classifying decompiled code as benign or malicious, but existing pipelines typically rely on a single decompiler view. We argue that this assumption is fragile: decompilers are lossy heuristic tools, and different decompilers can expose different artefacts of the same binary. We curate a benchmark of benign utilities and malicious programs spanning a range of threat behaviors. Each sample is compiled and decompiled with both Ghidra and RetDec, yielding matched pseudo-C views. Across a range of LLMs from major model families, we find that providing both decompiler views improves malicious-class F1, mainly by increasing recall on malicious samples. Agreement analyses further show that Ghidra and RetDec make partially different errors, supporting the view that decompiler outputs provide complementary evidence. Our results suggest that multi-decompiler prompting is a simple, training-free way to improve LLM-based malware triage in practical settings.
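
A minimal sketch of multi-decompiler prompting places both decompiler outputs for the same binary in a single classification prompt. The template wording, the `build_prompt` and `parse_label` helpers and the BENIGN/MALICIOUS answer format are hypothetical; running Ghidra or RetDec and calling an LLM are outside the sketch.

```python
VIEWS = ("Ghidra", "RetDec")  # fixed order so every prompt is laid out the same way

def build_prompt(views):
    """views: dict mapping decompiler name -> pseudo-C text for the same binary."""
    parts = [
        "You are assisting a malware analyst. The same binary was decompiled by "
        f"{len(views)} different decompilers. Decompilers are lossy, so the views may differ.",
    ]
    for name in VIEWS:
        if name in views:
            parts.append(f"--- {name} pseudo-C ---\n{views[name].strip()}")
    parts.append("Answer with exactly one word: BENIGN or MALICIOUS.")
    return "\n\n".join(parts)

def parse_label(reply):
    """Naively map a free-text model reply to a label; None if neither word appears."""
    text = reply.upper()
    if "MALICIOUS" in text:
        return "malicious"
    if "BENIGN" in text:
        return "benign"
    return None

prompt = build_prompt({
    "Ghidra": "int g(void) { return 1; }",   # placeholder pseudo-C
    "RetDec": "int f(void) { return 0; }",   # placeholder pseudo-C
})
print(prompt)
```

Keeping the views in a fixed order makes per-view ablations (Ghidra only, RetDec only, both) easy to compare.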

2606.20431 2026-06-19 cs.LG New submission

Sparsity, Superposition, and Forgetting: A Mechanistic Study of Representation Retention in Continual Learning

Jan Wasilewski, Jędrzej Kozal, Michał Woźniak, Bartosz Krawczyk

Affiliations: Rochester Institute of Technology; Wrocław University of Science and Technology

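AI summary: Uses a controllable toy framework to study forgetting mechanisms in continual learning, finding that superposition increases over time with transient dips at task boundaries, that high sparsity increases superposition without necessarily causing forgetting, and that task-level effective rank grows with sparsity.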

Abstract

Continual learning (CL) systems often forget previously acquired knowledge, yet the mechanisms driving forgetting remain hard to isolate in practice because real datasets entangle many factors. We present a controlled, toy-world framework that makes these mechanisms observable and testable. Using a synthetic generator-separator pipeline, we define ground-truth latent features, build tasks with tunable sparsity and overlap, and introduce measurable quantities for representation strength and superposition (directional overlap among features). We then study retention dynamics (the temporal change of representation strength) by fitting sparse dynamical relations (via SINDy) between retention, superposition, and exposure history. A complementary task-level analysis based on effective rank characterizes how representational capacity is allocated across tasks. Our controlled experiments yield three takeaways. (1) Superposition tends to increase over time with transient dips at task boundaries, suggesting boundary-specific interference rather than steady drift. (2) Higher feature sparsity induces more superposition yet does not inevitably cause forgetting; when representations remain strong, forgetting can be reduced despite overlap. (3) Task-level effective rank grows with sparsity, indicating broader capacity usage under sparse regimes. Together, these results nuance the common intuition that more superposition leads to more forgetting by showing that overlap interacts with representation strength and capacity allocation. Our toy analysis provides falsifiable hypotheses and diagnostic tools for CL.
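
The abstract does not give exact formulas for its measures. Assuming superposition is summarised as the mean absolute cosine similarity between distinct feature directions, and effective rank uses the common entropy-based definition (the exponential of the Shannon entropy of the normalised singular values), both quantities can be computed as sketched below. The matrix shapes and toy inputs are illustrative assumptions.

```python
import numpy as np

def superposition(W):
    """W: (d, n) matrix whose columns are feature directions in a d-dim space.

    Returns the mean |cosine| over distinct feature pairs (0 means orthogonal).
    """
    U = W / np.linalg.norm(W, axis=0, keepdims=True)
    G = np.abs(U.T @ U)  # diagonal entries are 1 for unit vectors
    n = G.shape[0]
    return (G.sum() - np.trace(G)) / (n * (n - 1))

def effective_rank(A, eps=1e-12):
    """exp(entropy) of the normalised singular-value distribution of A."""
    s = np.linalg.svd(A, compute_uv=False)
    p = s / s.sum()
    p = p[p > eps]  # drop numerically zero singular values
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
orthogonal = np.eye(4)                 # 4 features in 4 dims: no overlap
crowded = rng.standard_normal((2, 6))  # 6 features squeezed into 2 dims
print(superposition(orthogonal), superposition(crowded))
print(effective_rank(np.eye(4)))       # ≈ 4.0 for equal singular values
```

Effective rank ranges from 1 (one dominant direction) to the matrix rank (all directions used equally), which makes it a convenient capacity-usage summary.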

2606.20428 2026-06-19 cs.RO New submission

Neural-Network Surrogate Models with Uncertainty Quantification for Inverse Problems of Partial Differential Equations

Christian Jimenez-Beltran, Aretha L. Teckentrup, Antonio Vergari, Konstantinos C. Zygalakis

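AI summary: Proposes DeepGaLA, a neural-network surrogate that provides uncertainty-aware predictions for differential equation solvers and, combined with delayed-acceptance MCMC diagnostics, enables efficient and reliable Bayesian inversion.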

Abstract

Inverse problems for differential equations arise throughout science and engineering, where one seeks to infer unknown model parameters from noisy or incomplete observations. Traditional numerical methods for these problems are often computationally expensive, particularly in Bayesian settings where evaluating the likelihood becomes costly for complex forward models and high-dimensional parameter spaces. To address this challenge, we introduce DeepGaLA, a neural-network surrogate for differential equation solvers that provides uncertainty-aware predictions, reducing overconfident inference when training data are limited. To evaluate the fidelity of the surrogate-induced posterior approximations in practice, we show that a short run of delayed-acceptance Markov chain Monte Carlo can serve as an effective diagnostic. Across a range of numerical experiments, DeepGaLA delivers forward-model approximations with accuracy comparable to established Gaussian-process surrogates, while better maintaining efficiency as parameter dimension grows. Moreover, it can incorporate differential-equation constraints, including in nonlinear settings. Overall, these results indicate that uncertainty-quantified neural surrogates can enable scalable and reliable Bayesian inference for inverse problems in complex systems.
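
Delayed-acceptance MCMC screens each proposal with the cheap surrogate posterior and evaluates the expensive exact posterior only for proposals that pass; a second-stage correction keeps the exact posterior as the target. A low second-stage acceptance rate signals disagreement between surrogate and exact posteriors, which is one way a short run can serve as a diagnostic. The 1-D Gaussian log-densities, random-walk proposal and step size below are illustrative assumptions, not the paper's setup.

```python
import math
import random

def delayed_acceptance(log_post, log_surr, x0, n_steps, step=0.5, seed=0):
    """Two-stage Metropolis-Hastings with a symmetric random-walk proposal.

    Returns the chain and the stage-2 acceptance rate among stage-1 accepts.
    """
    rng = random.Random(seed)
    x, lp_x, ls_x = x0, log_post(x0), log_surr(x0)
    chain, passed, accepted = [x], 0, 0
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)
        ls_y = log_surr(y)
        # Stage 1: cheap screen with the surrogate (u in (0, 1] avoids log(0)).
        if math.log(1.0 - rng.random()) <= ls_y - ls_x:
            passed += 1
            lp_y = log_post(y)  # expensive evaluation only here
            # Stage 2: correct for the surrogate so the exact posterior is the target.
            if math.log(1.0 - rng.random()) <= (lp_y - lp_x) - (ls_y - ls_x):
                x, lp_x, ls_x = y, lp_y, ls_y
                accepted += 1
        chain.append(x)
    return chain, (accepted / passed if passed else float("nan"))

exact = lambda t: -0.5 * (t - 1.0) ** 2           # N(1, 1) log-density up to a constant
good = lambda t: -0.5 * (t - 1.0) ** 2            # surrogate that matches exactly
biased = lambda t: -0.5 * ((t - 2.0) / 0.5) ** 2  # shifted, overconfident surrogate
_, rate_good = delayed_acceptance(exact, good, 0.0, 2000)
_, rate_biased = delayed_acceptance(exact, biased, 0.0, 2000)
print(rate_good, rate_biased)  # rate_good is 1.0; rate_biased is lower
```

With a perfect surrogate the stage-2 log ratio is exactly zero, so every screened proposal is accepted; departures from 1 quantify surrogate error along the chain.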

2606.20416 2026-06-19 cs.LG cs.CV New submission

On the Redundancy of Timestep Embeddings in Diffusion Models

José A. Chávez

Affiliation: Independent Researcher, Lima, Peru

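AI summary: Shows theoretically and experimentally that, under certain conditions, diffusion models with U-Net and Diffusion Transformer architectures can reach the global optimum without explicit timestep embeddings, and can even outperform conditioned models on some metrics.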

Comments 17 pages

Abstract

Diffusion models rely heavily on explicit timestep embeddings to modulate the denoising process across various noise scales. In this work, we challenge the necessity of these temporal signals by analyzing their impact on U-Net and Diffusion Transformer architectures. Beyond empirical evidence, we provide a theoretical framework demonstrating that, under certain conditions, the global minimizer of the diffusion training objective can be achieved without explicit timestep conditioning. Our findings reveal a surprising robustness when timestep embeddings are completely removed. Extensive ablation studies on the CelebA and CIFAR-10 datasets show that these time-agnostic models can maintain high structural fidelity and even surpass their conditioned counterparts in competitive metrics, including FID, precision, and recall. Our analysis suggests these architectures can implicitly infer noise scales from the corrupted input under specific assumptions, rendering explicit temporal conditioning redundant. This study challenges long-standing temporal conditioning paradigms and paves the way for more efficient and structurally focused generative architectures.
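
One intuition for why a denoiser can do without an explicit timestep is that, for structured signals, the noise level is often recoverable from the corrupted input itself. The sketch below is a classical illustration of that idea and not the paper's theoretical argument: it estimates the standard deviation of additive i.i.d. Gaussian noise on a smooth 1-D signal from the median absolute deviation (MAD) of first differences. The test signal, noise levels and the `estimate_sigma` helper are assumptions.

```python
import math
import random
import statistics

def estimate_sigma(x):
    """Estimate i.i.d. Gaussian noise std from first differences via the MAD.

    For a smooth underlying signal, x[i+1] - x[i] is dominated by noise with
    std sigma * sqrt(2); for a Gaussian, MAD ≈ 0.6745 * std.
    """
    d = [b - a for a, b in zip(x, x[1:])]
    med = statistics.median(d)
    mad = statistics.median([abs(v - med) for v in d])
    return mad / (0.6745 * math.sqrt(2.0))

rng = random.Random(0)
n = 20000
clean = [math.sin(2 * math.pi * i / n) for i in range(n)]  # slowly varying signal
for sigma in (0.05, 0.2, 0.8):
    noisy = [c + rng.gauss(0.0, sigma) for c in clean]
    print(sigma, round(estimate_sigma(noisy), 3))
```

The median-based estimate is robust to the occasional large jump in the clean signal, which is why it is preferred over the plain sample standard deviation of the differences.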

2606.20415 2026-06-19 cs.LG New submission

Pseudo-Feature Padding: A Lightweight Defense Against False Data Injection in Power Grids

Farhin Farhad Riya, Shahinul Hoque, Yingyuan Yang, Jinyuan Sun, Kevin Tomsovic

Affiliations: University of Tennessee; The University of Illinois at Springfield; Clemson University

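AI summary: Proposes a lightweight defense framework that increases input dimensionality by padding inputs with pseudo-features drawn from their statistical distribution; because crafted perturbations do not transfer and the padded structure is unpredictable, adversarial attacks become computationally infeasible, substantially improving the robustness of deep neural networks for power-grid state estimation.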

Abstract

Deep Neural Networks (DNNs) have achieved remarkable accuracy in various tasks, including their application in Cyber-Physical Systems (CPS) for detecting False Data Injection Attacks (FDIA) during critical operations. However, the unique infrastructure of CPS makes DNNs vulnerable to exploitation by attackers aiming to evade detection. Additionally, the distinct nature of CPS presents challenges for conventional defense mechanisms against FDIA. This paper proposes an innovative defense framework that strengthens DNNs against such attacks by introducing an additional input layer that pads the input samples with pseudo-feature values derived from the inputs' statistical distribution. This padding increases the input dimensionality in a randomized and data-aware manner, making adversarial attacks computationally infeasible due to the non-transferable nature of crafted perturbations and the unpredictability of the padded structure. Our method is lightweight, model-agnostic, and requires no modifications to the core architecture, making it highly deployable in real-world CPS settings. We evaluated our framework on critical power grid applications such as state estimation using the IEEE 14-bus, 30-bus, 118-bus, and 300-bus systems. Experiments under adversarial settings demonstrate that our padding strategy significantly improves model robustness with negligible impact on performance and effectively mitigates attacks that would otherwise bypass conventional defenses.
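
The padding layer can be sketched as follows. The sketch assumes that per-feature means and standard deviations are estimated from training inputs, that each pseudo-feature is drawn from a Gaussian matching a randomly chosen real feature, and that real and pseudo features are interleaved by a secret random permutation. The Gaussian choice, the permutation-based placement and the hypothetical `PseudoFeaturePadder` class are assumptions; the abstract only states that the padding is randomized, data-aware and derived from the inputs' statistical distribution.

```python
import random
import statistics

class PseudoFeaturePadder:
    """Hypothetical padding layer: appends k pseudo-features and shuffles positions."""

    def __init__(self, train_rows, k, seed=0):
        self.rng = random.Random(seed)
        cols = list(zip(*train_rows))
        self.mu = [statistics.fmean(c) for c in cols]
        self.sd = [statistics.pstdev(c) for c in cols]
        d = len(cols)
        # Each pseudo-feature imitates the statistics of a randomly chosen real feature.
        self.src = [self.rng.randrange(d) for _ in range(k)]
        # Secret layout: output position of every real and pseudo feature.
        self.perm = list(range(d + k))
        self.rng.shuffle(self.perm)

    def pad(self, row):
        """Return the padded vector; fresh pseudo-values are drawn on every call."""
        pseudo = [self.rng.gauss(self.mu[j], self.sd[j]) for j in self.src]
        full = list(row) + pseudo
        out = [0.0] * len(full)
        for i, p in enumerate(self.perm):
            out[p] = full[i]
        return out

train = [[1.0, 0.98, 0.5], [1.02, 0.97, 0.55], [0.99, 1.01, 0.45]]  # toy measurements
padder = PseudoFeaturePadder(train, k=4, seed=42)
x = padder.pad([1.0, 0.99, 0.52])
print(len(x))  # 7
```

A downstream detector would be trained and evaluated on padded vectors only, so an attacker without the seed does not know which positions carry real measurements.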
