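arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.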

2508.11936 2026-02-23 cs.LG

M3OOD: Automatic Selection of Multimodal OOD Detectors

Yuehan Qin, Li Li, Defu Cao, Tiankai Yang, Jiate Li, Yue Zhao

2508.04581 2026-02-23 cs.CL cs.AI cs.LG

Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning

Magauiya Zhussip, Dmitriy Shopkhoev, Ammar Ali, Stamatios Lefkimmiatis

Comments This work has been accepted and presented at AAAI 2026 in Singapore

Abstract

Large language models have revolutionized AI applications, yet their high computational and memory demands hinder their widespread deployment. Existing compression techniques focus on intra-block optimizations (e.g., low-rank approximation or attention pruning), while the repetitive layered structure of transformers implies significant inter-block redundancy, a dimension largely unexplored beyond key-value (KV) caching. Inspired by dictionary learning in convolutional networks, we propose a framework for structured weight sharing across transformer layers. Our approach decomposes attention projection matrices (Q, K, V, O) into shared dictionary atoms, reducing the attention module's parameters by 66.7% while achieving on-par performance. Unlike complex methods requiring distillation or architectural changes, MASA (Matrix Atom Sharing in Attention) operates as a drop-in replacement, trained with standard optimizers, and represents each layer's weights as linear combinations of shared matrix atoms. Experiments across scales (100M-700M parameters) show that MASA achieves better benchmark accuracy and perplexity than GQA, low-rank baselines and recent Repeat-all-over/Sequential sharing at comparable parameter budgets. Ablation studies confirm robustness to the dictionary size and the efficacy of shared representations in capturing cross-layer statistical regularities. Extending to Vision Transformers (ViT), MASA matches performance metrics on image classification tasks with 66.7% fewer attention parameters. By combining dictionary learning strategies with transformer efficiency, MASA offers a scalable blueprint for parameter-efficient models without sacrificing performance. Finally, we investigate the possibility of employing MASA on large pretrained models to reduce their number of parameters without experiencing any significant drop in their performance.
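The idea in the abstract above, representing each layer's attention weights as a linear combination of shared matrix atoms, can be illustrated with a minimal sketch. Everything named here is hypothetical: `combine_atoms`, `attention_param_counts`, the toy 2x2 atoms, and in particular the choice of 4 atoms for 12 layers, which is an assumption picked so that the arithmetic lands near the reported 66.7% reduction. The abstract does not state the dictionary size or how MASA learns the atoms and coefficients.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def combine_atoms(atoms: List[Matrix], coeffs: List[float]) -> Matrix:
    """Return sum_k coeffs[k] * atoms[k] for equally shaped matrices."""
    rows, cols = len(atoms[0]), len(atoms[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for c, atom in zip(coeffs, atoms):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += c * atom[i][j]
    return out


def attention_param_counts(num_layers: int, d_model: int, num_atoms: int) -> Tuple[int, int]:
    """Count attention parameters with and without a shared dictionary.

    Assumption: each of the four projections (Q, K, V, O) is a square
    d_model x d_model matrix and has its own dictionary of `num_atoms`
    atoms plus one scalar coefficient per (layer, atom) pair.
    """
    dense = 4 * num_layers * d_model * d_model
    shared = 4 * (num_atoms * d_model * d_model + num_layers * num_atoms)
    return dense, shared


# Toy dictionary: two 2x2 atoms shared by all layers (illustrative values).
atoms = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
# Hypothetical coefficients for one layer.
layer_coeffs = [0.5, 2.0]
w = combine_atoms(atoms, layer_coeffs)

# Assumed configuration: 12 layers, d_model = 768, 4 atoms per projection.
dense, shared = attention_param_counts(num_layers=12, d_model=768, num_atoms=4)
reduction = 1 - shared / dense
print(w)
print(f"attention parameter reduction: {reduction:.2%}")
```

Under these assumptions the dictionary holds one third as many matrices as there are layers, so the reduction is close to two thirds; the per-layer coefficients are negligible next to the d_model x d_model atoms.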

2508.03923 2026-02-23 cs.CL

CoAct-1: Computer-using Multi-Agent System with Coding Actions

Linxin Song, Yutong Dai, Viraj Prabhu, Jieyu Zhang, Taiwei Shi, Li Li, Junnan Li, Silvio Savarese, Zeyuan Chen, Jieyu Zhao, Ran Xu, Caiming Xiong

2507.18031 2026-02-23 cs.CV cs.AI cs.LG

ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks

Ahmad ALBarqawi, Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, NhatHai Phan

Abstract

The rapid rise of deepfake technology, which produces realistic but fraudulent digital content, threatens the authenticity of media. Traditional deepfake detection approaches often struggle with sophisticated, customized deepfakes, especially in terms of generalization and robustness against malicious attacks. This paper introduces ViGText, a novel approach that integrates images with Vision Large Language Model (VLLM) Text explanations within a Graph-based framework to improve deepfake detection. The novelty of ViGText lies in its integration of detailed explanations with visual data, as it provides a more context-aware analysis than captions, which often lack specificity and fail to reveal subtle inconsistencies. ViGText systematically divides images into patches, constructs image and text graphs, and integrates them for analysis using Graph Neural Networks (GNNs) to identify deepfakes. Through the use of multi-level feature extraction across spatial and frequency domains, ViGText captures details that enhance its robustness and accuracy in detecting sophisticated deepfakes. Extensive experiments demonstrate that ViGText significantly enhances generalization and achieves a notable performance boost when it detects user-customized deepfakes. Specifically, average F1 scores rise from 72.45% to 98.32% under generalization evaluation, reflecting the model's superior ability to generalize to unseen, fine-tuned variations of stable diffusion models. As for robustness, ViGText achieves an increase of 11.1% in recall compared to other deepfake detection approaches. When facing targeted attacks that exploit its graph-based architecture, ViGText limits classification performance degradation to less than 4%. ViGText uses detailed visual and textual analysis to set a new standard for detecting deepfakes, helping ensure media authenticity and information integrity.

2507.10587 2026-02-23 cs.CL cs.AI

Anthropomimetic Uncertainty: What Verbalized Uncertainty in Language Models is Missing

Dennis Ulmer, Alexandra Lorson, Ivan Titov, Christian Hardmeier

2507.10134 2026-02-23 cs.AI

FRSICL: LLM-Enabled In-Context Learning Flight Resource Allocation for Fresh Data Collection in UAV-Assisted Wildfire Monitoring

Yousef Emami, Hao Zhou, Miguel Gutierrez Gaitan, Kai Li, Luis Almeida

2507.09650 2026-02-23 cs.LG

Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset

Lily Hong Zhang, Smitha Milli, Karen Jusko, Jonathan Smith, Brandon Amos, Wassim Bouaziz, Manon Revel, Jack Kussman, Yasha Sheynin, Lisa Titus, Bhaktipriya Radharapu, Jane Yu, Vidya Sarma, Kris Rose, Maximilian Nickel

2507.09043 2026-02-23 cs.LG stat.ML

GAGA: Gaussianity-Aware Gaussian Approximation for Efficient 3D Molecular Generation

Jingxiang Qu, Wenhan Gao, Ruichen Xu, Yi Liu

2507.00319 2026-02-23 cs.RO

When Digital Twins Meet Large Language Models: Realistic, Interactive, and Editable Simulation for Autonomous Driving

Tanmay Vilas Samak, Chinmay Vilas Samak, Bing Li, Venkat Krovi

Comments Accepted in IEEE Robotics & Automation Magazine (RAM)

2506.23339 2026-02-23 cs.LG cs.AI physics.chem-ph q-bio.QM

VALID-Mol: a Systematic Framework for Validated LLM-Assisted Molecular Design

Malikussaid, Hilal Hudan Nuha, Isman Kurniawan

Comments 6 pages, 1 figure, 1 algorithm, 5 tables, to be published in ISPACS 2025, unabridged version exists as arXiv:2506.23339v1

Journal ref Proc. 2025 Int. Symp. on Intell. Signal Process. and Commun. Syst. (ISPACS), 2025, pp. 1-6

2506.22095 2026-02-23 cs.LG cs.AI

Beyond Simple Graphs: Neural Multi-Objective Routing on Multigraphs

Filip Rydin, Attila Lischka, Jiaming Wu, Morteza Haghir Chehreghani, Balázs Kulcsár

Comments Accepted by ICLR 2026, Final Camera-Ready Version. 34 pages, 6 Figures

2506.15408 2026-02-23 cs.LG cs.AI

Unifying VXAI: A Systematic Review and Framework for the Evaluation of Explainable AI

David Dembinsky, Adriano Lucieri, Stanislav Frolov, Hiba Najjar, Ko Watanabe, Andreas Dengel

Comments Published at TMLR

Journal ref Transactions on Machine Learning Research (2026), ISSN 2835-8856

2506.14825 2026-02-23 cs.CV cs.AI

GraphGSOcc: Semantic-Geometric Graph Transformer with Dynamic-Static Decoupling for 3D Gaussian Splatting-based Occupancy Prediction

Ke Song, Yunhe Wu, Chunchit Siu, Huiyuan Xiong

Journal ref IEEE Transactions on Circuits and Systems for Video Technology 2026

DOI: 10.1109/TCSVT.2026.3667111

Abstract

Addressing the task of 3D semantic occupancy prediction for autonomous driving, we tackle three key issues in existing 3D Gaussian Splatting (3DGS) methods: (1) unified feature aggregation neglecting semantic correlations among similar categories and across regions, (2) boundary ambiguities caused by the lack of geometric constraints in MLP iterative optimization, and (3) biased issues in dynamic-static object coupling optimization. We propose the GraphGSOcc model, a novel framework that combines semantic and geometric graph Transformer and decouples dynamic-static objects optimization for 3D Gaussian Splatting-based Occupancy Prediction. We propose the Dual Gaussians Graph Attention, which dynamically constructs dual graph structures: a geometric graph adaptively calculating KNN search radii based on Gaussian poses, enabling large-scale Gaussians to aggregate features from broader neighborhoods while compact Gaussians focus on local geometric consistency; a semantic graph retaining top-M highly correlated nodes via cosine similarity to explicitly encode semantic relationships within and across instances. Coupled with the Multi-scale Graph Attention framework, fine-grained attention at lower layers optimizes boundary details, while coarse-grained attention at higher layers models object-level topology. On the other hand, we decouple dynamic and static objects by leveraging semantic probability distributions and design a Dynamic-Static Decoupled Gaussian Attention mechanism to optimize the prediction performance for both dynamic objects and static scenes. GraphGSOcc achieves state-of-the-art performance on the SurroundOcc-nuScenes, Occ3D-nuScenes, OpenOcc and KITTI occupancy benchmarks. Experiments on the SurroundOcc dataset achieve an mIoU of 25.20%, reducing GPU memory to 6.8 GB, demonstrating a 1.97% mIoU improvement and 13.7% memory reduction compared to GaussianWorld.
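The semantic graph described in the abstract above keeps, for each node, the top-M most correlated nodes by cosine similarity. A minimal sketch of that selection step follows, in plain Python. The function names, the toy 2-D features, and M = 1 are assumptions for illustration; the paper's actual features, value of M, and implementation are not given in the abstract.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_m_neighbors(features: List[List[float]], m: int) -> List[List[int]]:
    """For each node, return the indices of its m most similar other nodes."""
    neighbors = []
    for i, feat_i in enumerate(features):
        scored = [
            (cosine_similarity(feat_i, feat_j), j)
            for j, feat_j in enumerate(features)
            if j != i  # exclude self-loops
        ]
        # Highest similarity first; keep only the top m entries.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        neighbors.append([j for _, j in scored[:m]])
    return neighbors


# Toy per-node semantic features (hypothetical values).
features = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
edges = top_m_neighbors(features, m=1)
print(edges)
```

In this toy example nodes 0 and 1 point at each other, as do nodes 2 and 3, since each pair has nearly parallel feature vectors.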

2506.14457 2026-02-23 cs.LG

Dataset distillation for memorized data: Soft labels can leak held-out teacher knowledge

Freya Behrens, Lenka Zdeborová

Comments 9 pages, 21 figures

Journal ref ICLR 2026

2506.08364 2026-02-23 cs.CL

Structure-Augmented Reasoning Generation

Jash Rajesh Parekh, Pengcheng Jiang, Jiawei Han

2505.20674 2026-02-23 cs.CL cs.AI

PonderLM: Pretraining Language Models to Ponder in Continuous Space

Boyi Zeng, Shixiang Song, Siyuan Huang, Yixuan Wang, He Li, Ziwei He, Xinbing Wang, Zhiyu Li, Zhouhan Lin

Comments ICLR 2026

2505.18612 2026-02-23 cs.CV

Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter

Weizhi Zhong, Huan Yang, Zheng Liu, Huiguo He, Zijian He, Xuesong Niu, Di Zhang, Guanbin Li

Comments Accepted by ICLR 2026, project page: https://weizhi-zhong.github.io/Mod-Adapter

2505.18150 2026-02-23 cs.LG q-bio.QM stat.ML

Generative Distribution Embeddings: Lifting autoencoders to the space of distributions for multiscale representation learning

Nic Fishman, Gokul Gowri, Peng Yin, Jonathan Gootenberg, Omar Abudayyeh

Comments NeurIPS 2025

2505.17748 2026-02-23 cs.LG cs.CV

Soft-CAM: Making black box models self-explainable for medical image analysis

Kerol Djoumessi, Philipp Berens

Comments Accepted at the Medical Imaging with Deep Learning Conference (MIDL 2026)

2505.17064 2026-02-23 cs.CV cs.AI cs.LG

Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models

Maria-Teresa De Rosa Palmini, Eva Cetinic

2505.14825 2026-02-23 cs.LG math.ST physics.data-an stat.ME stat.ML stat.TH

Assimilative Causal Inference

Marios Andreou, Nan Chen, Erik Bollt

Comments 47 pages (Main Text pp. 1-17; Supplementary Information pp. 18-47), 11 figures (3 in Main Text, 8 in Supplementary Information). Published in Nature Communications. The MATLAB code used in the analyses and to generate the figures in this work can be found at https://github.com/marandmath/ACI_code. For further details visit https://mariosandreou.short.gy/ACI

Journal ref Nature Communications 17, 1854 (2026)

2505.14338 2026-02-23 cs.LG cs.DM cs.NE math.CO

Better Neural Network Expressivity: Subdividing the Simplex

Egor Bakaev, Florestan Brunck, Christoph Hertrich, Jack Stade, Amir Yehudayoff

Comments 12 pages, 2 figures

2504.17311 2026-02-23 cs.CL cs.AI

FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation

Yulia Otmakhova, Hung Thinh Truong, Rahmad Mahendra, Zenan Zhai, Rongxin Zhu, Daniel Beck, Jey Han Lau

Comments Accepted to EACL 2026 Findings

2504.14556 2026-02-23 cs.AI cs.ET cs.LG cs.RO

LLM-Enabled In-Context Learning for Data Collection Scheduling in UAV-assisted Sensor Networks

Yousef Emami, Hao Zhou, SeyedSina Nabavirazani, Luis Almeida

2504.06629 2026-02-23 cs.CV

Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization

MinKyu Lee, Sangeek Hyun, Woojin Jun, Hyunjun Kim, Jiwoo Chung, Jae-Pil Heo

Comments Code is available at: https://github.com/2minkyulee/i-LN

2502.17160 2026-02-23 cs.CV cs.LG

A Pragmatic Note on Evaluating Generative Models with Fréchet Inception Distance for Retinal Image Synthesis

Yuli Wu, Fucheng Liu, Rüveyda Yilmaz, Henning Konermann, Peter Walter, Johannes Stegmaier

Comments MIDL 2026

2502.16189 2026-02-23 cs.LG cond-mat.mtrl-sci q-bio.BM q-bio.QM

Co-Evolution-Based Metal-Binding Residue Prediction with Graph Neural Networks

Sayedmohammadreza Rastegari, Sina Tabakhi, Xianyuan Liu, Tianyi Jiang, Wei Sang, Haiping Lu

Comments 10 pages, 6 figures

2502.03738 2026-02-23 cs.CV

Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More

Feng Wang, Yaodong Yu, Guoyizhe Wei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

2501.04817 2026-02-23 cs.LG cs.AI

Decentralised Resource Sharing in TinyML: Wireless Bilayer Gossip Parallel SGD for Collaborative Learning

Ziyuan Bao, Eiman Kanjo, Soumya Banerjee, Hasib-Al Rashid, Tinoosh Mohsenin

Journal ref IEEE Pervasive Computing, 2026

2412.13897 2026-02-23 cs.LG cs.CV

Data-Efficient Inference of Neural Fluid Fields via SciML Foundation Model

Yuqiu Liu, Jingxuan Xu, Mauricio Soroco, Yunchao Wei, Wuyang Chen

Comments Accepted by 3DV 2026
