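arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.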

2604.10918 2026-04-14 cs.AI

CSPO: Alleviating Reward Ambiguity for Structured Table-to-LaTeX Generation

Yunfan Yang, Cuiling Lan, Jitao Sang, Yan Lu

Comments Accepted by ACL 2026 (main conference)

2604.10917 2026-04-14 cs.CL

HTAA: Enhancing LLM Planning via Hybrid Toolset Agentization & Adaptation

Chengrui Huang, Junshuo Zhang, Zhiyuan Ma, Xikun Wang, Ximeng Wang, Menghua Jiang, Gang Zeng, Zhaobing Han, Shen Gao, Shuo Shang

Comments 22 pages, 3 figures

2604.10912 2026-04-14 cs.CV

TAMISeg: Text-Aligned Multi-scale Medical Image Segmentation with Semantic Encoder Distillation

Qiang Gao, Yi Wang, Yong Zhang, Yong Li, Yongbing Deng, Lan Du, Cunjian Chen

Comments Accepted by IEEE International Conference on Multimedia and Expo (ICME), 2026

2604.10908 2026-04-14 cs.AI

Reasoning as Data: Representation-Computation Unity and Its Implementation in a Domain-Algebraic Inference Engine

Chao Li, Yuru Wang

Comments 16 pages; open-source implementation and evaluation scripts will be released in a subsequent revision

2604.10905 2026-04-14 cs.SD cs.AI cs.CL eess.AS

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar, Lasha Koroshinadze, Nishit Anand, Zhifeng Kong, Siddharth Gururani, Sang-gil Lee, Jaehyeon Kim, Aya Aljafari, Chao-Han Huck Yang, Sungwon Kim, Ramani Duraiswami, Dinesh Manocha, Mohammad Shoeybi, Bryan Catanzaro, Ming-Yu Liu, Wei Ping

Comments Project website: https://afnext-umd-nvidia.github.io/

Abstract

We present Audio Flamingo Next (AF-Next), the next-generation and most capable large audio-language model in the Audio Flamingo series, designed to advance understanding and reasoning over speech, environmental sounds and music. Compared to Audio Flamingo 3, AF-Next introduces: (i) a stronger foundational audio-language model that significantly improves accuracy across diverse audio understanding tasks; (ii) scalable strategies for constructing large-scale audio understanding and reasoning data beyond existing academic benchmarks; (iii) support for long and complex audio inputs up to 30 minutes; and (iv) Temporal Audio Chain-of-Thought, a new reasoning paradigm that explicitly grounds intermediate reasoning steps to timestamps in long audio, enabling fine-grained temporal alignment and improved interpretability. To enable these capabilities, we first conduct a systematic analysis of Audio Flamingo 3 to identify key gaps in audio understanding and reasoning. We then curate and scale new large-scale datasets totaling over 1 million hours to address these limitations and expand the existing AudioSkills-XL, LongAudio-XL, AF-Think and AF-Chat datasets. AF-Next is trained using a curriculum-based strategy spanning pre-training, mid-training and post-training stages. Extensive experiments across 20 audio understanding and reasoning benchmarks, including challenging long-audio tasks, show that AF-Next outperforms similarly sized open models by large margins and remains highly competitive with, and sometimes surpasses, much larger open-weight and closed models. Beyond benchmark performance, AF-Next exhibits strong real-world utility and transfers well to unseen tasks, highlighting its robustness and generalization ability. In addition to all data, code and methods, we open-source 3 variants of AF-Next, including AF-Next-Instruct, AF-Next-Think and AF-Next-Captioner.

2604.10904 2026-04-14 cs.CV cs.AI

Evaluating the Impact of Medical Image Reconstruction on Downstream AI Fairness and Performance

Matteo Wohlrapp, Niklas Bubeck, Daniel Rueckert, William Lotter

Comments Proceedings of the Medical Imaging with Deep Learning (MIDL) Conference 2026

2604.10900 2026-04-14 cs.AI cs.LG

CASK: Core-Aware Selective KV Compression for Reasoning Traces

Buseong Kim, Heejun Gwon

Comments 25 pages, 8 figures, 3 main tables, appendices included

2604.10894 2026-04-14 cs.CV

EviRCOD: Evidence-Guided Probabilistic Decoding for Referring Camouflaged Object Detection

Ye Wang, Kai Huang, Sumin Shen, Chenyang Ma

2604.10885 2026-04-14 cs.CV cs.AI cs.GR

Product Review Based on Optimized Facial Expression Detection

Vikrant Chaugule, Abhishek D, Aadheeshwar Vijayakumar, Pravin Bhaskar Ramteke, Shashidhar G. Koolagudi

Comments 9 pages, 11 figures, Published in the 2016 Ninth International Conference on Contemporary Computing (IC3), August 11-13, 2016, Noida, India. This is a pre-print version of the paper

Journal ref 2016 Ninth International Conference on Contemporary Computing (IC3), Noida, India, 2016

2604.10882 2026-04-14 cs.LG cs.AI

DIB-OD: Preserving the Invariant Core for Robust Heterogeneous Graph Adaptation via Decoupled Information Bottleneck and Online Distillation

Yang Yan, Qiuyan Wang, Tianjin Huang, Qiudong Yu, Kexin Zhang

2604.10874 2026-04-14 cs.CL cs.AI

AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis

Qinjiang Niu, Lu Yan

2604.10865 2026-04-14 cs.AI

Beyond Statistical Co-occurrence: Unlocking Intrinsic Semantics for Tabular Data Clustering

Mingjie Zhao, Yunfan Zhang, Yiqun Zhang, Yiu-ming Cheung

2604.10862 2026-04-14 cs.CV

LRD-Net: A Lightweight Real-Centered Detection Network for Cross-Domain Face Forgery Detection

Xuecen Zhang, Vipin Chaudhary

2604.10857 2026-04-14 cs.LG cs.AI cs.DS math.ST stat.ML stat.TH

Query Lower Bounds for Diffusion Sampling

Zhiyang Xun, Eric Price

2604.10856 2026-04-14 cs.RO cs.AI

BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving

Seth Z. Zhao, Luobin Wang, Hongwei Ruan, Yuxin Bao, Yilan Chen, Ziyang Leng, Abhijit Ravichandran, Honglin He, Zewei Zhou, Xu Han, Abhishek Peri, Zhiyu Huang, Pranav Desai, Henrik Christensen, Jiaqi Ma, Bolei Zhou

2604.10853 2026-04-14 cs.AI

A Benchmark for Gap and Overlap Analysis as a Test of KG Task Readiness

Maruf Ahmed Mridul, Rohit Kapa, Oshani Seneviratne

2604.10849 2026-04-14 cs.LG cs.AI

Task2vec Readiness: Diagnostics for Federated Learning from Pre-Training Embeddings

Cristiano Mafuz, Rodrigo Silva

2604.10848 2026-04-14 cs.LG

Transformers Learn Latent Mixture Models In-Context via Mirror Descent

Francesco D'Angelo, Nicolas Flammarion

2604.10843 2026-04-14 cs.CV cs.AI cs.LG cs.NE

Retinal Cyst Detection from Optical Coherence Tomography Images

Abhishek Dharmaratnakar, Aadheeshwar Vijayakumar, Suchand Dayanand

Comments 13 pages, 9 figures

2604.10837 2026-04-14 cs.CV

Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation

Zeqian Long, Ozgur Kara, Haotian Xue, Yongxin Chen, James M. Rehg

2604.10836 2026-04-14 cs.CV cs.RO

HO-Flow: Generalizable Hand-Object Interaction Generation with Latent Flow Matching

Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen, Jiankang Deng, Cordelia Schmid, Stefanos Zafeiriou

Comments Project Page: https://zerchen.github.io/projects/hoflow.html

2604.10823 2026-04-14 cs.CV cs.LG

Uncertainty-Guided Attention and Entropy-Weighted Loss for Precise Plant Seedling Segmentation

Mohamed Ehab, Ali Hamdi

2604.10821 2026-04-14 cs.LG stat.CO stat.ML

Slithering Through Gaps: Capturing Discrete Isolated Modes via Logistic Bridging

Pinaki Mohanty, Ruqi Zhang

2604.10814 2026-04-14 cs.LG math.ST stat.TH

Online Covariance Estimation in Averaged SGD: Improved Batch-Mean Rates and Minimax Optimality via Trajectory Regression

Yijin Ni, Xiaoming Huo

2604.10812 2026-04-14 cs.LG

PokeRL: Reinforcement Learning for Pokemon Red

Dheeraj Mudireddy, Sai Patibandla

2604.10809 2026-04-14 cs.RO

WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations

Harry Freeman, Chung Hee Kim, George Kantor

2604.10797 2026-04-14 cs.CV

WBCBench 2026: A Challenge for Robust White Blood Cell Classification Under Class Imbalance

Xin Tian, Xudong Ma, Tianqi Yang, Alin Achim, Bartłomiej W Papież, Phandee Watanaboonyongcharoen, Nantheera Anantrasirichai

Comments IEEE International Symposium on Biomedical Imaging (ISBI)

2604.10791 2026-04-14 cs.CL cs.LG

Position-Agnostic Pre-Projection for Transformer Attention: Nonlinear Feature Construction and Content Skip Before Q/K/V

Chirag Shinde

Comments 7 pages, 2 figures, 5 tables. Code: https://github.com/cs-cmyk/preprojection

2604.10789 2026-04-14 cs.CV

ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment

Mingyu Dong, Chong Xia, Mingyuan Jia, Weichen Lyu, Long Xu, Zheng Zhu, Yueqi Duan

Comments Project Page: https://xiac20.github.io/ReplicateAnyScene/

2604.10787 2026-04-14 cs.CL

When Meaning Isn't Literal: Exploring Idiomatic Meaning Across Languages and Modalities

Sarmistha Das, Shreyas Guha, Suvrayan Bandyopadhyay, Salisa Phosit, Kitsuchart Pasupa, Sriparna Saha
