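arXivDaily: a daily digest of academic papers that syncs the full arXiv feed and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.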


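HOT: AI, Robotics, etc. (9)

cs.AI Artificial Intelligence cs.CV Computer Vision cs.CL Natural Language Processing cs.RO Robotics cs.LG Machine Learning cs.SD Sound cs.ET Emerging Technologies eess.AS Audio and Speech eess.IV Image and Video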

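CS Computer Science (41)

cs Computer Science cs.AI Artificial Intelligence cs.AR Hardware Architecture cs.CC Computational Complexity cs.CE Computational Engineering cs.CG Computational Geometry cs.CL Natural Language Processing cs.CR Cryptography and Security cs.CV Computer Vision cs.CY Computers and Society cs.DB Databases cs.DC Distributed Computing cs.DL Digital Libraries cs.DM Discrete Mathematics cs.DS Data Structures cs.ET Emerging Technologies cs.FL Formal Languages cs.GL General Literature cs.GR Graphics cs.GT Game Theory cs.HC Human-Computer Interaction cs.IR Information Retrieval cs.IT Information Theory cs.LG Machine Learning cs.LO Logic in Computer Science cs.MA Multiagent Systems cs.MM Multimedia cs.MS Mathematical Software cs.NA Numerical Analysis cs.NE Neural and Evolutionary Computing cs.NI Networking and Internet Architecture cs.OH Other Computer Science cs.OS Operating Systems cs.PF Performance cs.PL Programming Languages cs.RO Robotics cs.SC Symbolic Computation cs.SD Sound cs.SE Software Engineering cs.SI Social and Information Networks cs.SY Systems and Control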

ECON Economics (4)

econ Economics econ.EM Econometrics econ.GN General Economics econ.TH Theoretical Economics

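EESS Electrical Engineering and Systems Science (5)

eess Electrical Engineering and Systems Science eess.AS Audio and Speech eess.IV Image and Video eess.SP Signal Processing eess.SY Systems and Control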

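MATH Mathematics (33)

math Mathematics math.AC Commutative Algebra math.AG Algebraic Geometry math.AP Partial Differential Equations math.AT Algebraic Topology math.CA Classical Analysis math.CO Combinatorics math.CT Category Theory math.CV Complex Variables math.DG Differential Geometry math.DS Dynamical Systems math.FA Functional Analysis math.GM General Mathematics math.GN General Topology math.GR Group Theory math.GT Geometric Topology math.HO History and Overview math.IT Information Theory math.KT K-Theory math.LO Logic math.MG Metric Geometry math.MP Mathematical Physics math.NA Numerical Analysis math.NT Number Theory math.OA Operator Algebras math.OC Optimization and Control math.PR Probability math.QA Quantum Algebra math.RA Rings and Algebras math.RT Representation Theory math.SG Symplectic Geometry math.SP Spectral Theory math.ST Statistics Theory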

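PHYSICS Physics (55)

astro-ph Astrophysics astro-ph.CO Cosmology astro-ph.EP Earth and Planetary Astrophysics astro-ph.GA Astrophysics of Galaxies astro-ph.HE High Energy Astrophysics astro-ph.IM Astronomical Instrumentation astro-ph.SR Solar and Stellar Astrophysics cond-mat Condensed Matter cond-mat.dis-nn Disordered Systems and Neural Networks cond-mat.mes-hall Mesoscale and Nanoscale Physics cond-mat.mtrl-sci Materials Science cond-mat.other Other Condensed Matter cond-mat.quant-gas Quantum Gases cond-mat.soft Soft Condensed Matter cond-mat.stat-mech Statistical Mechanics cond-mat.str-el Strongly Correlated Electrons cond-mat.supr-con Superconductivity gr-qc General Relativity hep-ex High Energy Physics - Experiment hep-lat High Energy Physics - Lattice hep-ph High Energy Physics - Phenomenology hep-th High Energy Physics - Theory math-ph Mathematical Physics nlin Nonlinear Sciences nlin.AO Adaptive Systems nlin.CD Chaotic Dynamics nlin.CG Cellular Automata nlin.PS Pattern Formation and Solitons nlin.SI Integrable Systems nucl-ex Nuclear Experiment nucl-th Nuclear Theory physics Physics physics.acc-ph Accelerator Physics physics.ao-ph Atmospheric and Oceanic Physics physics.app-ph Applied Physics physics.atm-clus Atomic and Molecular Clusters physics.atom-ph Atomic Physics physics.bio-ph Biological Physics physics.chem-ph Chemical Physics physics.class-ph Classical Physics physics.comp-ph Computational Physics physics.data-an Data Analysis physics.ed-ph Physics Education physics.flu-dyn Fluid Dynamics physics.gen-ph General Physics physics.geo-ph Geophysics physics.hist-ph History and Philosophy of Physics physics.ins-det Instrumentation and Detectors physics.med-ph Medical Physics physics.optics Optics physics.plasm-ph Plasma Physics physics.pop-ph Popular Physics physics.soc-ph Physics and Society physics.space-ph Space Physics quant-ph Quantum Physics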

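Q-BIO Quantitative Biology (11)

q-bio Quantitative Biology q-bio.BM Biomolecules q-bio.CB Cell Behavior q-bio.GN Genomics q-bio.MN Molecular Networks q-bio.NC Neurons and Cognition q-bio.OT Other Quantitative Biology q-bio.PE Populations and Evolution q-bio.QM Quantitative Methods q-bio.SC Subcellular Processes q-bio.TO Tissues and Organs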

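Q-FIN Quantitative Finance (10)

q-fin Quantitative Finance q-fin.CP Computational Finance q-fin.EC Economics q-fin.GN General Finance q-fin.MF Mathematical Finance q-fin.PM Portfolio Management q-fin.PR Pricing of Securities q-fin.RM Risk Management q-fin.ST Statistical Finance q-fin.TR Trading and Market Microstructure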

STAT Statistics (7)

stat Statistics stat.AP Applications stat.CO Computation stat.ME Methodology stat.ML Machine Learning stat.OT Other Statistics stat.TH Statistics Theory

2505.13350 2026-05-18 cs.RO

Approximating Global Contact-Implicit MPC via Sampling and Local Complementarity

Sharanya Venkatesh, Bibit Bianchini, Alp Aydinoglu, William Yang, Michael Posa

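Affiliations: GRASP Laboratory at the University of Pennsylvania; Boston Dynamics; Amazon Robotics

AI summary: To achieve general-purpose dexterous manipulation, robots must quickly plan and execute contact-rich motions. Existing model-based controllers cannot optimize globally over the exponentially many possible contact sequences in real time, while contact-implicit control methods simplify the model but approximate it only locally, which limits exploration of the contact space. This paper combines local complementarity control with global sampling. In each control cycle, it first samples candidates for a contact-free phase and then runs contact-rich local model predictive control from each sample. The result is a globally aware contact-implicit controller that performs precise non-prehensile manipulation of non-convex objects in real time.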

Comments S.V. and B.B. contributed equally to this work. Accepted to RA-L 2025; presented at ICRA 2026. Project page: https://approximating-global-ci-mpc.github.io

Journal ref IEEE Robotics and Automation Letters, volume 10, number 11, pages 12117-12124, September 2025

2505.12601 2026-05-18 cs.LG

Rethinking Predictive Modeling for LLM Routing: When Simple kNN Beats Complex Learned Routers

Yang Li

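Affiliations: Independent researcher

AI summary: As large language models (LLMs) grow in scale and specialization, efficiently selecting the most suitable model for each input has become a key problem. This paper revisits LLM routing strategies and finds that a carefully tuned k-nearest-neighbors (kNN) approach not only performs well across a variety of tasks but can even outperform state-of-the-art learned routers. The study introduces a suite of standardized routing benchmarks and the first multimodal routing dataset. It shows that the locality of model performance in embedding space gives non-parametric methods an advantage in sample complexity, which challenges the current trend toward complex architectures.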
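
A minimal sketch of the kind of kNN router the summary describes. It assumes that each training query has an embedding and a recorded quality score for every candidate model. The function name `knn_route`, the Euclidean distance, and the plain averaging are illustrative choices, not the paper's tuned configuration.

```python
import math
from typing import Dict, List, Sequence


def knn_route(
    query_emb: Sequence[float],
    train_embs: List[Sequence[float]],
    train_scores: List[Dict[str, float]],
    k: int = 5,
) -> str:
    """Pick the model with the best mean score among the k nearest training queries."""
    # Rank training queries by Euclidean distance to the new query.
    order = sorted(range(len(train_embs)), key=lambda i: math.dist(query_emb, train_embs[i]))
    neighbors = order[:k]
    models = train_scores[0].keys()
    # Average each model's recorded score over the neighborhood.
    mean_score = {
        m: sum(train_scores[i][m] for i in neighbors) / len(neighbors) for m in models
    }
    return max(mean_score, key=mean_score.get)


if __name__ == "__main__":
    embs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    scores = [
        {"small": 1.0, "large": 0.5},
        {"small": 0.9, "large": 0.6},
        {"small": 0.2, "large": 0.9},
        {"small": 0.1, "large": 0.95},
    ]
    print(knn_route([0.0, 0.5], embs, scores, k=2))    # small
    print(knn_route([10.0, 10.5], embs, scores, k=2))  # large
```

The router has no trainable parameters. Adding a new model only requires recording its scores on the existing training queries, which is one practical appeal of the non-parametric approach.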

2505.07322 2026-05-18 cs.CV

RealRep: Generalized SDR-to-HDR Conversion via Attribute-Disentangled Representation Learning

Li Xu, Siqi Wang, Kepeng Xu, Gang He, Lin Zhang, Weiran Wang, Yu-Wing Tai

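Affiliations: Xidian University; Dartmouth College

AI summary: This paper proposes RealRep, a generalized SDR-to-HDR conversion framework. It disentangles the learning of luminance and chrominance attributes to make conversion more robust to the diverse SDR content found in the real world. Its core components are disentangled representation learning, a degradation-aware negative-sample generation strategy, and DDACMNet, a lightweight two-stage mapping network that adjusts the mapping dynamically according to the degradation conditions. Experiments show that RealRep outperforms existing methods in generalization and in the perceptual fidelity of HDR color reconstruction.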

Comments Published on AAAI'26(Oral): The Annual AAAI Conference on Artificial Intelligence

2505.06982 2026-05-18 cs.CV

Decentralized LoRA augmented transformer with multi-scale feature learning for secured eye diagnosis

Md. Naimur Asif Borno, Md Sakib Hossain Shovon, MD Hanif Sikder, Iffat Firozy Rimi, Tahani Jaser Alahmadi, Mohammad Ali Moni

Affiliations: The University of Queensland, 308 Queen St, Brisbane City, QLD 4000, Australia; Mechatronics Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh; Department of Computer Science, American International University Bangladesh, Dhaka 1216, Bangladesh; Department of Computer Science, University of South Asia-Bangladesh, Dhaka 1216, Bangladesh; Department of Computer Science & Engineering, Daffodil International University, Dhaka, Bangladesh; Department of Information Systems, College of Computer & Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, Saudi Arabia; Faculty of Health, Medicine & Behavioural Sciences, The University of Queensland, 308 Queen St, Brisbane City, QLD 4000, Australia; Cyber Futures Institute, Charles Sturt University, Bathurst NSW, Australia

AI summary: This paper proposes a decentralized eye-disease diagnosis framework built on an improved data-efficient image Transformer (DeiT). It targets the main challenges of diagnosing ophthalmic disease from medical images: data imbalance, privacy protection, spatial feature diversity, and clinical interpretability. The method combines multi-scale feature learning, low-rank adaptation (LoRA), knowledge distillation, and federated learning to improve computational efficiency, data privacy, and diagnostic performance. Experiments show that the framework outperforms conventional convolutional neural networks and existing Transformer models on several benchmark datasets. Grad-CAM++ provides interpretable evidence for its diagnoses. Together, these results lay a foundation for secure, scalable AI systems for ophthalmic diagnosis.

Comments Published at Knowledge-Based Systems
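
LoRA is a standard technique. The sketch below shows a LoRA-adapted linear layer, y = W x + (alpha / r) B A x. The pretrained weight W stays frozen, and only the low-rank factors A (r x d_in) and B (d_out x r) would be trained. Matrix sizes and function names are illustrative. The sketch does not show how the paper inserts LoRA into DeiT or combines it with distillation and federated training.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]


def lora_forward(
    w: Matrix, a: Matrix, b: Matrix, x: Sequence[float], alpha: float
) -> List[float]:
    """y = W x + (alpha / r) * B (A x), where A is r x d_in and B is d_out x r."""
    r = len(a)
    base = matvec(w, x)               # frozen pretrained path
    update = matvec(b, matvec(a, x))  # low-rank trainable path
    return [y0 + (alpha / r) * dy for y0, dy in zip(base, update)]


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0, 1.0]]      # rank r = 1
    b = [[0.0], [0.0]]    # B starts at zero, so the adapter is initially a no-op
    print(lora_forward(w, a, b, [2.0, 3.0], alpha=8.0))  # [2.0, 3.0]
    b = [[0.5], [0.0]]
    print(lora_forward(w, a, b, [2.0, 3.0], alpha=8.0))  # [22.0, 3.0]
```

Only A and B change during fine-tuning. In a federated setting, this keeps the per-client update small compared with the full weight matrix.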

2504.21850 2026-05-18 cs.CV

Visual Compositional Tuning

Xindi Wu, Hee Seung Hwang, Polina Kirichenko, Esin Tureci, Olga Russakovsky

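Affiliations: Princeton University; Meta AI

AI summary: This paper studies how the compositional complexity of samples in visual instruction tuning (VIT) datasets affects how informative they are. It proposes COMPACT, a synthetic data generation method that combines multiple atomic visual capabilities in a single training example, which substantially improves data efficiency. Experiments show that COMPACT matches or exceeds the performance obtained with the full dataset while using 90% less training data, and it performs well on multiple vision-language benchmarks. The method offers a scalable way to make training for vision-language tasks more efficient.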

Comments See the project website at this [URL](https://princetonvisualai.github.io/compact/)

2504.09544 2026-05-18 cs.LG cs.CE cs.CV

Integrating chemical structures as treatments improves representations of microscopy images for morphological profiling

Yemin Yu, Emre Hayir, Neil Tenenholtz, Lester Mackey, Ying Wei, David Alvarez-Melis, Ava P. Amini, Alex X. Lu

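Affiliations: Department of Computer Science, City University of Hong Kong; Microsoft Research; Department of Computer Science, Zhejiang University

AI summary: This study proposes MICON, a framework that integrates chemical-structure information into self-supervised pretraining. The goal is better representations of high-throughput microscopy images and therefore more accurate morphological profiling. The authors argue that modeling compounds as "treatments" that induce changes in cell phenotype significantly outperforms both classical hand-crafted features and existing deep learning methods. Experiments show that representations learned with chemical information are better at identifying drug effects across experimental replicates and data sources, which points to a new direction for representation learning on multimodal microscopy screening data.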

Comments 24 pages

2504.08300 2026-05-18 cs.CL cs.AI

Large Language Models Could Be Rote Learners

Yuyang Xu, Renjun Hu, Haochao Ying, Jian Wu, Xing Shi, Wei Lin

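Affiliations: College of Computer Science and Technology, Zhejiang University; State Key Laboratory of Transvascular Implantation Devices and TIDRI; Zhejiang Key Laboratory of Medical Imaging Artificial Intelligence; School of Data Science and Engineering, East China Normal University; Second Affiliated Hospital and Liangzhu Laboratory, Zhejiang University School of Medicine; Alibaba Group

AI summary: This paper examines whether the benchmark performance of large language models (LLMs) is inflated by contamination of their training data, and it argues that current benchmark-based evaluation may overestimate the models' true capabilities. To address this, the authors propose TrinEval, an evaluation framework that reformulates multiple-choice questions to reduce reliance on memorization and so measures genuine learning more accurately. Experiments show that, across several datasets, mainstream LLMs answer about 19.6% of knowledge points by rote memorization rather than by genuine understanding and reasoning.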

Comments Work in Progress

2504.05451 2026-05-18 cs.CV

ViewBridge: Curriculum Knowledge Distillation for Activity View-Invariance Under Extreme Viewpoint Changes

Arjun Somayazulu, Efi Mavroudi, Changan Chen, Lorenzo Torresani, Kristen Grauman

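Affiliations: UT Austin; Meta AI; Stanford University; Northeastern University

AI summary: ViewBridge is a framework for learning view-invariant activity representations. It addresses the challenge posed by extreme viewpoint changes in in-the-wild videos. It preserves action semantics through knowledge distillation and uses a curriculum learning strategy that gradually increases viewpoint difficulty, so the model adapts smoothly. Experiments show that ViewBridge outperforms existing methods on two tasks across multiple datasets.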

2503.16589 2026-05-18 cs.LG cs.ET math.ST stat.TH

A Statistical Analysis for Per-Instance Evaluation of Stochastic Optimizers: Avoiding Unreliable Conclusions

Moslem Noori, Elisabetta Valiante, Thomas Van Vaerenbergh, Masoud Mohseni, Ignacio Rozada

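Affiliations: 1QB Information Technologies (1QBit); Hewlett Packard Labs, Hewlett Packard Enterprise

AI summary: This paper addresses the performance evaluation of stochastic optimizers and proposes a statistical analysis that avoids unreliable conclusions caused by poor experimental design. It analyzes the confidence intervals of commonly used performance metrics and how they depend on the number of repeated runs, and it derives lower bounds on the number of repetitions needed to guarantee a given precision for each metric. Building on this analysis, the authors propose an algorithm that adjusts the number of repetitions adaptively to make evaluation more accurate and reliable. Experiments validate the method for benchmarking and for hyperparameter tuning.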

Journal ref Physical Review Applied 25, no. 3 (2026): 034081
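
The paper derives its own lower bounds. As a generic point of comparison, this is the textbook normal-approximation sample-size formula for estimating a success probability p, for example the probability that a stochastic optimizer reaches a target solution, to within a half-width ε: n ≥ z² p(1 - p) / ε². It is a standard statistics formula, not the bound derived in the paper. `required_runs` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def required_runs(p: float, half_width: float, confidence: float = 0.95) -> int:
    """Runs needed so that a normal-approximation confidence interval for a
    success probability p has at most the given half-width:
    n >= z^2 * p * (1 - p) / half_width^2."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided critical value
    return math.ceil(z * z * p * (1.0 - p) / half_width ** 2)


if __name__ == "__main__":
    # Worst case p = 0.5, +/- 5 percentage points at 95% confidence.
    print(required_runs(0.5, 0.05))  # 385
    # A solver that succeeds about 10% of the time needs fewer runs for the same width.
    print(required_runs(0.1, 0.05))  # 139
```

The required count depends on the unknown p itself. This is one reason an adaptive scheme, which re-estimates p as runs accumulate, can be preferable to fixing the number of repetitions in advance.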

2503.07518 2026-05-18 cs.CL cs.AI cs.LG

TokenButler: Token Importance is Predictable

Yash Akhauri, Ahmed F AbouElhamayed, Yifei Gao, Chi-Chih Chang, Sameh Gobriel, Nilesh Jain, Mohamed S. Abdelfattah

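Affiliations: Cornell University; Intel Labs

AI summary: Large language models rely on a key-value cache (KV-Cache) to store token history during decoding, but as the cache grows it becomes a memory and compute bottleneck. To address this, the paper proposes TokenButler, a high-granularity, query-aware predictor of token importance. It dynamically selects critical tokens under a fixed budget while keeping the full KV cache. The method learns to predict low-dimensional importance queries and combines them with a projection of the cached keys to score tokens cheaply. Experiments show strong performance on long-context tasks and significant inference speedups.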

Abstract

Large Language Models (LLMs) rely on the Key-Value (KV) Cache to store token history, enabling efficient decoding of tokens. As the KV-Cache grows, it becomes a major memory and computation bottleneck. However, there is an opportunity to alleviate this bottleneck: prior research has shown that only a small subset of tokens contribute meaningfully to each decoding step. A key challenge in finding these critical tokens is that they are dynamic and heavily dependent on the input query. Existing methods either risk quality by evicting tokens permanently, or retain the full KV-Cache but rely on retrieving chunks of tokens; moreover, many existing KV-Cache sparsity methods rely on inaccurate proxies for token importance. To address these limitations, we introduce TokenButler, a high-granularity, query-aware predictor that learns to identify these critical tokens. TokenButler predicts low-dimensional importance queries at a fixed depth stride, and combines them with a learned projection of the real KV-cache keys to score tokens cheaply, enabling dynamic per-token selection under a fixed budget while preserving the full KV cache. We train TokenButler by distilling the model's masked causal attention distributions, optimizing a lightweight predictor with minimal parameter overhead. We evaluate TokenButler on a novel synthetic small-context co-referential retrieval task, demonstrating near-oracle accuracy where existing methods fail. Furthermore, TokenButler achieves competitive or superior performance on long-context benchmarks (RULER, LongBench), up to ≈1.6× on-GPU speedup using our proposed *prediction interval with neighbor fetching*, which amortizes predictor cost while maintaining accuracy within ≈1.1%, and up to 7.6× reduction in latency compared to Dense Attention with CPU offloading. Code is available: https://github.com/abdelfattah-lab/TokenButler
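
A minimal sketch of the scoring-and-selection step as the abstract describes it. All names are hypothetical and the projection is set by hand. In the real method the predictor is a trained module, its low-dimensional queries come from the model's hidden states, and the key projection is learned. The prediction-interval and neighbor-fetching optimizations are not shown.

```python
from typing import List, Sequence


def matvec_t(matrix: Sequence[Sequence[float]], vec: Sequence[float]) -> List[float]:
    """Compute matrix^T @ vec for a d x r matrix and a length-d vector."""
    rows, cols = len(matrix), len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(rows)) for j in range(cols)]


def score_tokens(
    importance_query: Sequence[float],
    keys: Sequence[Sequence[float]],
    proj: Sequence[Sequence[float]],
) -> List[float]:
    """Score each cached token as the dot product of a low-dimensional
    importance query with a projection of that token's key."""
    scores = []
    for key in keys:
        projected = matvec_t(proj, key)  # d-dim key -> r-dim
        scores.append(sum(q * p for q, p in zip(importance_query, projected)))
    return scores


def select_tokens(scores: Sequence[float], budget: int) -> List[int]:
    """Return the indices of the `budget` highest-scoring tokens, in cache order.
    Unselected tokens remain in the cache; they are only skipped for this step."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:budget]
    return sorted(top)


if __name__ == "__main__":
    keys = [[0.1, 5.0], [3.0, 0.0], [1.0, 1.0], [-2.0, 0.0]]
    proj = [[1.0, 0.0], [0.0, 1.0]]  # identity projection for the demo
    scores = score_tokens([1.0, 0.0], keys, proj)
    print(scores)                    # [0.1, 3.0, 1.0, -2.0]
    print(select_tokens(scores, 2))  # [1, 2]
```

Because nothing is evicted, a token that scores low at one decoding step can still be selected at a later step when the query changes.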


2503.02597 2026-05-18 cs.CV cs.AI

Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs

Wei-Yao Wang, Zhao Wang, Helen Suzuki, Yoshiyuki Kobayashi

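Affiliations: Sony Group Corporation, Tokyo, Japan

AI summary: Multimodal large language models (MLLMs) have recently made notable progress in understanding and reasoning over multimodal information, but aligning the vision and language modalities remains a key challenge. Working at the architectural level, this paper proposes modality-mutual attention (MMA), which extends causal attention into mutual attention across modalities so that image tokens can attend to text tokens, helping the model understand its input more accurately. The method achieves strong results on multiple multimodal understanding benchmarks without adding any parameters, and it is general and scalable.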

Comments ICML 2026. Code is available at https://github.com/sony/aki
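
One reading of the mechanism in the summary, written as an attention-mask builder. Text tokens keep ordinary causal attention, and image tokens are additionally allowed to attend to text tokens, including text that appears after them in the prompt. The function name is hypothetical. The exact rule in the paper may differ, for example in which text tokens image tokens may see and how generated tokens are handled; see the linked repository for the actual implementation.

```python
from typing import List, Sequence


def modality_mutual_mask(modalities: Sequence[str]) -> List[List[bool]]:
    """Build an attention mask where mask[i][j] is True if token i may attend to token j.

    Every token keeps causal attention (j <= i). Image tokens are additionally
    allowed to attend to every text token, even one that comes later.
    """
    n = len(modalities)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            causal = j <= i
            image_to_text = modalities[i] == "image" and modalities[j] == "text"
            mask[i][j] = causal or image_to_text
    return mask


if __name__ == "__main__":
    seq = ["image", "image", "text", "text"]
    for row in modality_mutual_mask(seq):
        print("".join("1" if allowed else "." for allowed in row))
    # 1.11
    # 1111
    # 111.
    # 1111
```

Because only the mask changes, this kind of modification adds no parameters, which is consistent with the summary's claim.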

2502.12187 2026-05-18 cs.CL cs.FL cs.LG math.ST stat.ML stat.TH

Hallucinations are inevitable but can be made statistically negligible

Atsushi Suzuki, Yulan He, Feng Tian, Zhongyuan Wang

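Affiliations: Department of Mathematics, The University of Hong Kong; Department of Informatics, King's College London; Division of Natural and Applied Sciences, Duke Kunshan University; School of Computer Science

AI summary: This paper examines "hallucination" in language models, that is, the generation of non-factual content. Prior work based on computability theory has shown that any language model will hallucinate on some infinite set of inputs. Taking a probabilistic perspective instead, this paper shows that, given training data of sufficient quality and quantity, the probability of hallucination can be made statistically negligible. The authors argue that the computability-theoretic result is of theoretical interest, but the probabilistic result better reflects practical needs and provides a new theoretical basis for mitigating hallucination.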

2501.19128 2026-05-18 cs.LG cs.AI

Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach

Wenyun Li, Wenjie Huang, Chen Sun

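Affiliations: Department of Mathematics, The University of Hong Kong (HKU); Department of Data and Systems Engineering, HKU; Musketeers Foundation Institute of Data Science, HKU

AI summary: In reinforcement learning, sparse reward signals make it hard to learn a reward function. This paper proposes a semi-supervised approach that combines non-zero-reward transitions with data augmentation and uses the abundant zero-reward transitions to learn trajectory representations, which improves reward shaping. On Atari and robotic manipulation tasks, the method outperforms supervised approaches. In sparse-reward settings in particular, its best score reaches up to twice that of the supervised baseline.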

2501.17116 2026-05-18 cs.LG cs.CL

Optimizing Large Language Model Training Using FP4 Quantization

Ruizhe Wang, Yeyun Gong, Xiao Liu, Guoshuai Zhao, Ziyue Yang, Baining Guo, Zhengjun Zha, Peng Cheng

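Affiliations: University of Science and Technology of China; Microsoft Research Asia; Microsoft SIGMA Team

AI summary: As the compute demands of large language model (LLM) training keep growing, training efficiency has become a key problem. This paper proposes the first FP4-quantized training framework for LLMs. A differentiable quantization estimator and an outlier clamping and compensation strategy address FP4's large quantization error and limited representational capacity, while mixed-precision training and vector-wise quantization keep training stable. Experiments show that the framework reaches accuracy close to BF16 and FP8 while efficiently supporting the training of large-scale models.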

Journal ref Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:62937-62957, 2025
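
A minimal sketch of FP4 fake quantization. It assumes the common E2M1 layout, whose non-negative magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6, together with per-vector absmax scaling and an optional quantile-based outlier clamp. Function and parameter names are hypothetical. The paper's differentiable gradient estimator and its compensation of clamped outliers are not reproduced here.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Non-negative magnitudes representable in the E2M1 FP4 layout.
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_fp4(
    values: Sequence[float], clamp_quantile: Optional[float] = None
) -> Tuple[List[float], float]:
    """Fake-quantize a vector to FP4 (E2M1) with absmax scaling.

    If clamp_quantile is given, magnitudes above that quantile of |x| are clamped
    first, so a few outliers do not stretch the scale for every other value.
    Returns the dequantized values and the scale.
    """
    vals = list(values)
    if clamp_quantile is not None:
        mags = sorted(abs(v) for v in vals)
        threshold = mags[int(clamp_quantile * (len(mags) - 1))]
        vals = [max(-threshold, min(threshold, v)) for v in vals]
    absmax = max(abs(v) for v in vals)
    if absmax == 0.0:
        return [0.0] * len(vals), 1.0
    scale = absmax / FP4_E2M1_GRID[-1]  # map the largest magnitude to 6.0
    out = []
    for v in vals:
        # Round the scaled magnitude to the nearest representable FP4 value.
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - abs(v) / scale))
        out.append(math.copysign(nearest, v) * scale)
    return out, scale


if __name__ == "__main__":
    print(quantize_fp4([0.0, 1.0, -6.0, 2.9]))  # ([0.0, 1.0, -6.0, 3.0], 1.0)
    # With clamping, the outlier 100.0 no longer dominates the scale.
    print(quantize_fp4([0.1, 0.2, 0.3, 100.0], clamp_quantile=0.75))
```

Without the clamp, the outlier 100.0 in the second example would set the scale to about 16.7, and every other entry would round to zero. This illustrates why outlier handling matters at 4-bit precision.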

2412.02271 2026-05-18 cs.CL

The MediaSpin Dataset: Post-Publication News Headline Edits Annotated for Media Bias

Preetika Verma, Kokil Jaidka

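Affiliations: Carnegie Mellon University; National University of Singapore; NUS Centre for Trusted Internet and Community

AI summary: This paper introduces the MediaSpin dataset, a large-scale language resource that records how major news outlets edited headlines after publication. A companion dataset, MediaSpin-in-the-Wild, supports analysis of social-media engagement with the edited headlines. The dataset contains 78,910 headline pairs annotated with 13 types of media bias, covering both subjective and objective forms, labeled by an expert-validated large language model. The study demonstrates uses of the dataset in cross-country analysis, bias classification, and social-media behavior analysis, and reveals regional asymmetries in framing, quantifiable linguistic features, and higher engagement with biased content.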

Comments 8 pages, 3 figures, 8 tables. Accepted at AAAI ICWSM 2026. We updated the paper title from "MediaSpin: Exploring Media Bias Through Fine-Grained Analysis of News Headlines" to "The MediaSpin Dataset: Post-Publication News Headline Edits Annotated for Media Bias".

2410.01990 2026-05-18 cs.LG cs.CE

Deep Learning Alternatives of the Kolmogorov Superposition Theorem

Leonardo Ferreira Guilhoto, Paris Perdikaris

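Affiliations: Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania; Department of Mechanical Engineering & Applied Mechanics, University of Pennsylvania

AI summary: This paper explores alternative formulations of the Kolmogorov Superposition Theorem (KST) as a basis for neural network design. KST is mathematically elegant, but it offers limited insight into the structure of its inner and outer functions and introduces a large number of unknown variables, which creates practical difficulties. The authors propose ActNet, a scalable deep learning model that overcomes many of these shortcomings, and evaluate it within the physics-informed neural network (PINN) framework. The results show that ActNet outperforms KST-based Kolmogorov-Arnold Networks on tasks such as simulating partial differential equations and is competitive with conventional multilayer perceptrons.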

Journal ref Guilhoto, Leonardo Ferreira, and Paris Perdikaris. "Deep Learning Alternatives Of The Kolmogorov Superposition Theorem." The Thirteenth International Conference on Learning Representations (ICLR 2025)

2409.11022 2026-05-18 cs.CL cs.AI

DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition

Hanjun Luo, Yingbin Jin, Xinfeng Li, Xuecheng Liu, Ruizhe Chen, Tong Shang, Kun Wang, Qingsong Wen, Zuozhu Liu

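Affiliations: New York University Abu Dhabi; Zhejiang University; The Hong Kong Polytechnic University; Nanyang Technological University; University of Electronic Science and Technology of China; Texas A&M University; Squirrel AI

AI summary: As large language models (LLMs) are increasingly used for named entity recognition (NER), the corpus selection and design logic of existing datasets no longer meet the needs of LLM-based methods. This paper therefore proposes DynamicNER, a dynamic, multilingual, fine-grained NER dataset designed for LLMs. It allows the same entity to receive different entity types in different contexts and covers 8 languages and 155 entity types across a wide range of domains. The paper also proposes CascadeNER, a two-stage method that uses lightweight LLMs for more efficient fine-grained recognition. Experiments show that DynamicNER is an effective benchmark for LLM-based NER.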

Comments This paper is accepted by EMNLP 2025 Main Conference

2409.03897 2026-05-18 cs.LG cs.DC

On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments

Leo Muxing Wang, Pengkun Yang, Lili Su

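Affiliations: Northeastern University; Tsinghua University

AI summary: This paper studies the convergence rates of federated Q-learning in heterogeneous environments. It examines how communication frequency and the number of agents affect convergence when multiple agents collaboratively learn the optimal Q-function. The authors find that adding agents yields a linear speedup, but longer intervals between communication rounds cause significant performance degradation, and they show that this degradation is fundamental. The paper also reveals a two-phase structure in the convergence process and proposes a learning-rate adjustment strategy that speeds up overall convergence.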

2408.07331 2026-05-18 cs.LG

RSEA-MVGNN: Multi-View Graph Neural Network with Reliable Structural Enhancement and Aggregation

Junyu Chen, Long Shi, Badong Chen

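Affiliations: Financial Intelligence and Financial Engineering Key Laboratory of Sichuan Province, School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics; Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University

AI summary: This paper proposes RSEA-MVGNN, a multi-view graph neural network designed to fuse multi-view graph data whose views have different structural characteristics. It estimates the uncertainty of each view with subjective logic and uses a decorrelation algorithm to perform reliable structural enhancement, which increases feature diversity. It also assesses view quality from each view's belief and uncertainty, so that higher-quality views dominate GNN aggregation. Experiments show that the method outperforms existing state-of-the-art methods on several real-world datasets.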

Journal ref Information Fusion 121 (2025) 103143

2407.02039 2026-05-18 cs.CL

Prompt Stability Scoring for Text Annotation with Large Language Models

Christopher Barrie, Elli Palaiologou, Petter Törnberg

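Affiliations: Department of Sociology, New York University; Independent Researcher; Institute for Logic, Language, and Computation, University of Amsterdam

AI summary: As large language models are increasingly used for text annotation, researchers have found that the reproducibility of model outputs can be sensitive to small changes in prompt design. This paper proposes a general framework for assessing prompt stability. Drawing on intra- and inter-coder reliability scoring, it defines a "prompt stability score" (PSS) and provides an accompanying Python package. Experiments on multiple datasets validate the approach, and the paper gives practical recommendations for researchers who want more stable annotations.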

Comments 39 pages, 5 figures
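
As a sketch of the reliability idea the summary mentions, the function below computes Krippendorff's alpha for nominal labels. Each prompt variant (or repeated run) is treated as a "coder" that labels the same items. This is a generic reliability statistic written from scratch under that assumption. It is not the PSS definition and not the API of the authors' Python package.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Hashable]]) -> float:
    """Krippendorff's alpha for nominal labels.

    units[u] holds the labels that different prompt variants (or repeated runs)
    assigned to item u. Items with fewer than two labels are not pairable and
    are skipped.
    """
    coincidence = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Every ordered pair of labels within an item, weighted by 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)
    marginals = Counter()
    for (a, _), weight in coincidence.items():
        marginals[a] += weight
    n = sum(marginals.values())
    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(marginals[a] * marginals[b] for a in marginals for b in marginals if a != b)
    if expected == 0:
        return float("nan")  # only one category was ever used: alpha is undefined
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Three items, each labeled by three prompt variants.
    runs = [["pos", "pos", "pos"], ["neg", "neg", "pos"], ["neg", "neg", "neg"]]
    print(round(krippendorff_alpha_nominal(runs), 3))  # 0.6
```

Alpha is 1 when every variant agrees on every item, near 0 when agreement is at chance level, and negative when disagreement is systematic.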

2406.18944 2026-05-18 cs.CV cs.AI cs.CR

Rethinking and Red-Teaming Protective Perturbation in Personalized Diffusion Models

Yixin Liu, Ruoxi Chen, Xun Chen, Lichao Sun

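Affiliations: Department of Computer Science & Engineering, Lehigh University, Bethlehem, PA, USA; Independent Researcher; Independent Researcher, Fremont, California, USA

AI summary: Personalized diffusion models (PDMs) are very good at generating images of a specific person from a small amount of data, but they are highly sensitive to small adversarial perturbations, and their performance drops sharply when they are fine-tuned on perturbed data. This paper analyzes PDM fine-tuning through the lens of shortcut learning and reveals that adversarial perturbations cause a latent semantic misalignment in the CLIP embedding space. Based on this analysis, it proposes a systematic countermeasure framework, consisting of image purification and contrastive decoupling learning, that improves the model's robustness and generalization.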

Comments Code is available at https://github.com/liuyixin-louis/DiffShortcut

2404.03099 2026-05-18 cs.LG cs.AI cs.CE cs.IT math.IT stat.ML

Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks

Leonardo Ferreira Guilhoto, Paris Perdikaris

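Affiliations: Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania; Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania

AI summary: This paper proposes NEON, a neural network architecture that makes predictions with uncertainty in infinite-dimensional function spaces while using far fewer parameters than deep ensembles of comparable performance. The study focuses on composite Bayesian optimization, that is, optimizing a composite function formed from an unknown function-space mapping and a known function. Experiments show that NEON achieves state-of-the-art optimization performance in several settings while substantially reducing model complexity.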

Journal ref Guilhoto, Leonardo Ferreira, and Paris Perdikaris. "Composite Bayesian optimization in function spaces using NEON - Neural Epistemic Operator Networks." Scientific Reports 14.1 (2024): 29199

2403.13805 2026-05-18 cs.CV cs.AI cs.LG

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

Ziyu Liu, Zeyi Sun, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

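Affiliations: Shanghai Jiao Tong University; Shanghai AI Laboratory; The Chinese University of Hong Kong; MThreads, Inc.; Nanyang Technological University

AI summary: This paper proposes RAR, a method for improving multimodal large language models (MLLMs) on fine-grained and few-shot visual recognition. RAR combines CLIP's multimodal retrieval with the broad knowledge of MLLMs. It builds a multimodal retriever that extends the model's effective context window, and at inference time it retrieves relevant category information for the MLLM to rank and predict from. The method addresses the drop in MLLM performance when many categories are involved and achieves notable gains on multiple fine-grained and zero-shot recognition benchmarks.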

Comments Project: https://github.com/Liuziyu77/RAR

2402.10380 2026-05-18 cs.LG cs.AI cs.CL

Subgraph-level Universal Prompt Tuning

Junhyun Lee, Wooseong Yang, Jaewoo Kang

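Affiliations: Korea University; University of Illinois at Chicago

AI summary: Effectively adapting graph neural networks that were pretrained with different strategies remains a challenge. This paper proposes Subgraph-level Universal Prompt Tuning (SUPT), which assigns prompt features at the subgraph level. This preserves the method's universality while greatly reducing the number of tuned parameters. Experiments show that SUPT performs well across a variety of downstream tasks, with average gains of more than 6.6% in few-shot settings.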

Journal ref Information Sciences 749 (2026) 123516

2311.03658 2026-05-18 cs.CL cs.AI cs.LG stat.ML

The Linear Representation Hypothesis and the Geometry of Large Language Models

Kiho Park, Yo Joong Choe, Victor Veitch

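Affiliations: University of Chicago

AI summary: This paper examines the "linear representation hypothesis," the idea that high-level concepts are represented as linear directions in a representation space. It gives two formalizations of "linear representation," one in the output (word) space and one in the input (sentence) space. The authors then introduce a causal inner product, a non-Euclidean inner product that unifies the different notions of linear representation, and use it to construct probes and steering vectors. Experiments show that large language models do represent concepts linearly, and that the choice of inner product is fundamental to interpreting and controlling these models.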

Comments Accepted for a presentation at ICML 2024 and an oral presentation at NeurIPS 2023 Workshop on Causal Representation Learning. Code is available at https://github.com/KihoPark/linear_rep_geometry

Journal ref In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

2212.12130 2026-05-18 cs.CV

Learning to Detect and Segment for Open Vocabulary Object Detection

Tao Wang, Nan Li

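Affiliations: Sichuan University; University of California San Diego

AI summary: This work addresses detection and segmentation in open-vocabulary object detection. It proposes CondHead, a dynamic network design that improves generalization to novel object categories. The core idea is to parameterize the network head conditionally, using semantic embeddings to guide the model toward class-specific knowledge, which yields more accurate bounding-box regression and mask prediction. With minimal computational overhead, the method significantly improves existing open-vocabulary detectors.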

Comments We apologize that author Nan Li was not on the published version due to the CVPR 2023 policy that authors cannot be added after the abstract deadline.

1911.05467 2026-05-18 cs.LG cs.NA math.NA

ChebNet: Efficient and Stable Constructions of Deep Neural Networks with Rectified Power Units via Chebyshev Approximations

Shanshan Tang, Bo Li, Haijun Yu

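Affiliations: Software Development Center, Industrial and Commercial Bank of China; Hisilicon Semiconductor and Component Business Dept. (2012 Labs), Huawei Technologies Co., Ltd; NCMIS & LSEC, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Beijing; School of Mathematical Sciences, University of Chinese Academy of Sciences

AI summary: This paper proposes ChebNet, an efficient and stable way to construct deep neural networks from Chebyshev polynomial approximations, with the goal of better approximating smooth functions. Conventional RePU-activated networks are built from power-series approximations. ChebNet instead uses a hierarchical Chebyshev approximation structure in the frequency domain, which makes the construction more stable and more computationally efficient. Experiments show that ChebNet matches the approximation accuracy of the power-series approach while being more stable, and further fine-tuning can improve the results. This makes it a practical option for efficiently approximating smooth functions.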

Comments 6 figures, 3 tables, to appear in Communications in Mathematics and Statistics

Journal ref Communications in Mathematics and Statistics, 2024
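
For readers unfamiliar with Chebyshev approximation, here is a minimal sketch of plain Chebyshev interpolation on [-1, 1], evaluated with the Clenshaw recurrence. It only illustrates the polynomial approximation that ChebNet builds on. It does not construct a RePU network, and the function names are hypothetical.

```python
import math
from typing import Callable, List


def cheb_coeffs(f: Callable[[float], float], n: int) -> List[float]:
    """Coefficients a_k of the degree-(n-1) Chebyshev interpolant of f on [-1, 1],
    so that f(x) ~ sum_k a_k T_k(x). Uses the n Chebyshev points of the first kind."""
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(x) for x in nodes]
    coeffs = []
    for j in range(n):
        s = sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
        coeffs.append(2.0 * s / n)
    coeffs[0] /= 2.0  # the constant term carries half weight
    return coeffs


def cheb_eval(coeffs: List[float], x: float) -> float:
    """Evaluate sum_k coeffs[k] * T_k(x) with the Clenshaw recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * x * b1 - b2, b1
    return coeffs[0] + x * b1 - b2


if __name__ == "__main__":
    coeffs = cheb_coeffs(math.exp, 12)
    print(abs(cheb_eval(coeffs, 0.3) - math.exp(0.3)))  # close to machine precision
```

Clenshaw evaluation avoids converting to the monomial (power-series) basis. Working directly in the Chebyshev basis is numerically better conditioned, which echoes the stability argument in the summary.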

2605.16255 2026-05-18 cs.DC cs.AI

Designing Datacenter Power Delivery Hierarchies for the AI Era


Grant Wilkins, Fiodar Kazhamiaka, Alok Gautam Kumbhare, Chaojie Zhang, Ricardo Bianchini

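Affiliations: Stanford University; Microsoft Azure Research

AI summary: This paper studies the challenges of designing datacenter power delivery hierarchies in the AI era. It proposes an evaluation framework that combines throughput, power, and cost metrics and uses it to analyze how multi-resource stranding affects deployable capacity, capital expenditure, and performance.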


Abstract

Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1MW per deployment by 2027. This poses a major challenge for datacenter power delivery designers. As power densities increase, a datacenter designed for a different target density may strand power, i.e., may be unable to use all the power that its delivery hierarchy has provisioned. Designs must remain efficient over long datacenter lifetimes and multiple hardware generations. Power utilization is particularly important as grid power capacity is a scarce resource in the AI era. Designing an efficient power delivery hierarchy for the long run is difficult because rack placement feasibility, workload impact, and cost depend jointly on electrical topology, deployment granularity, placement policy, power oversubscription, and workload mix. Moreover, each of these factors evolves over time, has inter-dependencies across multiple resource dimensions, and generally does not lend itself to closed-form analysis. To address this challenge, we develop a framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences. The framework combines projection models for GPU, compute, and storage deployments with operational factors grounded in production data from Microsoft Azure. Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance, and quantify how rising density from rack- and pod-scale AI systems shapes these outcomes. For AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time.
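
To make "stranded power" concrete, here is a toy single-resource calculation with hypothetical numbers. If a block of provisioned power can only host whole racks, whatever power is left over is stranded. The paper's framework models much more than this, including multiple resources, oversubscription, and arrivals and decommissioning over time. The sketch only shows the arithmetic behind the term, and `stranded_power` is a hypothetical helper.

```python
from typing import Tuple


def stranded_power(provisioned_kw: float, rack_kw: float) -> Tuple[int, float]:
    """Return (racks that fit, kW left unusable) when only whole racks can be placed."""
    racks = int(provisioned_kw // rack_kw)
    return racks, provisioned_kw - racks * rack_kw


if __name__ == "__main__":
    # A 1,000 kW block designed around 50 kW racks strands nothing...
    print(stranded_power(1000, 50))   # (20, 0)
    # ...but hosting hypothetical 132 kW racks in the same block strands 76 kW.
    print(stranded_power(1000, 132))  # (7, 76)
```

The same provisioned megawatts yield different usable capacity depending on the rack density that actually arrives. This is the gap between installed and deployable capacity that the abstract emphasizes.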


2605.16245 2026-05-18 cs.CY cs.AI cs.CL cs.LG cs.SI

AI-Mediated Communication Can Steer Collective Opinion


Stratis Tsirtsis, Kai Rawal, Chris Russell, Brent Mittelstadt, Sandra Wachter

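Affiliations: Hasso Plattner Institute; Oxford Internet Institute, University of Oxford; Weizenbaum Institute

AI summary: This paper studies how AI that mediates human-to-human communication affects the formation of collective opinion. Through empirical and theoretical analysis, it shows how directional biases introduced by AI can be amplified through a social network and shift collective opinion. It also examines whether platforms can control such biases.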


Abstract

Generative artificial intelligence (AI) is increasingly integrated into the online platforms where humans exchange opinions; large language models (LLMs) now polish users' posts on LinkedIn and provide context for content shared on X. While prior work has shown that AI can express biased opinions and shape individuals' opinions during human-AI interactions, less attention has been paid to its influence on collective opinion formation when mediating human-to-human communication. We address this gap via a combination of empirical and theoretical analyses. We show empirically that LLMs from multiple popular families introduce directional biases when instructed to edit human-written texts on contested topics, for example, nudging texts in favor of gun control and against atheism. Building on this observation, we introduce a mathematical model of opinion dynamics in which an AI system sits between users on a social network, transforming the opinions they express and perceive. By analytically characterizing the equilibrium of this model and performing simulations on real social network data, we show that biases introduced by AI in human-to-human communication can be amplified through the network and shift collective opinion in their direction. In light of these findings, we investigate whether such biases are controllable by online platforms. We audit the "Explain this post" feature on X and find evidence of pro-life bias in Grok's outputs on abortion-related content, which we trace back to specific design choices. We conclude with a discussion of the broader implications of our findings in relation to ongoing legislative efforts in the European Union.
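
As a toy illustration of how a small, consistent edit can be amplified through a network, here is a Friedkin-Johnsen-style opinion update. An AI layer adds a constant bias b to every opinion a user perceives from a neighbor, and each user mixes those perceived opinions with their own initial view using a susceptibility s. This is a standard textbook-style model chosen for illustration, not the model analyzed in the paper, and all parameters are hypothetical. For the ring network below, the equilibrium mean shifts by s·b/(1 - s), which is four times the per-message bias when s = 0.8.

```python
from typing import List, Sequence


def simulate(
    neighbors: Sequence[Sequence[int]],
    initial: Sequence[float],
    bias: float,
    susceptibility: float = 0.8,
    steps: int = 200,
) -> List[float]:
    """Friedkin-Johnsen-style dynamics with an AI layer that adds `bias`
    to every opinion a user perceives from a neighbor:

        x_i <- s * mean_j(x_j + bias) + (1 - s) * x_i(0)
    """
    x = list(initial)
    for _ in range(steps):
        x = [
            susceptibility * sum(x[j] + bias for j in nbrs) / len(nbrs)
            + (1.0 - susceptibility) * initial[i]
            for i, nbrs in enumerate(neighbors)
        ]
    return x


if __name__ == "__main__":
    ring = [[3, 1], [0, 2], [1, 3], [2, 0]]  # each user listens to two neighbors
    start = [-0.5, -0.2, 0.2, 0.5]           # mean opinion is 0
    final = simulate(ring, start, bias=0.05)
    print(sum(final) / len(final))  # about 0.2 = 0.8 * 0.05 / (1 - 0.8)
```

The amplification comes from feedback. Each user's biased output becomes the input that the AI layer biases again for their neighbors, so the shift compounds until it is balanced by users' anchoring to their initial views.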


2605.16230 2026-05-18 cond-mat.mtrl-sci cs.LG

Universal Magnetic Structure Prediction from Atomic Coordinates with Near-Experimental Accuracy


Abhijatmedhi Chotrattanapituk, Ryotaro Okabe, Eunbi Rha, Mariya Al-Hinai, Eugene Jiang, Daniel Pajerowski, Yongqiang Cheng, Joshua J. Turner, Mingda Li

Affiliations: Quantum Measurement Group, MIT, Cambridge, MA 02139, USA; Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, USA; Department of Chemistry, MIT, Cambridge, MA 02139, USA; Department of Nuclear Science and Engineering, MIT, Cambridge, MA 02139, USA; Department of Physics, MIT, Cambridge, MA 02139, USA; Neutron Scattering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA; SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA

AI summary: This paper proposes the magnetic structure network (MSN), which predicts magnetic structures directly from atomic crystal structures. It uses the primitive modulated structure representation (PMSR) to encode commensurate and incommensurate structures in a unified way and achieves high-accuracy magnetic structure prediction, which offers a new route to discovering magnetic materials.

Comments 9 pages, 3 figures


Abstract

Magnetic order is a fundamental property of materials, governing collective behavior and enabling a broad range of functionalities. Yet magnetic structure remains difficult to determine: experiments are costly and specialized, while first-principles methods often struggle with the noncollinear and incommensurate orders found in real materials. Here we introduce the magnetic structure network (MSN), an E(3) equivariant graph neural network that predicts both collinear and non-collinear magnetic structures directly from atomic crystal structures, trained directly on experimentally determined structures from MAGNDATA. By proposing the primitive modulated structure representation (PMSR), we are able to encode commensurate and incommensurate structures in a unified way without symmetry assumptions. The model achieves strong performance across all modulation components and reconstructs experimental magnetic structures with high fidelity. Our approach provides a scalable framework for rapid magnetic structure prediction and opens a route to data-driven discovery of magnetic materials.
