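arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.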


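HOT: AI, Robotics, and More (9)

cs.AI Artificial Intelligence, cs.CV Computer Vision, cs.CL Natural Language Processing, cs.RO Robotics, cs.LG Machine Learning, cs.SD Sound, cs.ET Emerging Technologies, eess.AS Audio and Speech, eess.IV Image and Video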

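CS Computer Science (41)

cs Computer Science, cs.AI Artificial Intelligence, cs.AR Hardware Architecture, cs.CC Computational Complexity, cs.CE Computational Engineering, cs.CG Computational Geometry, cs.CL Natural Language Processing, cs.CR Cryptography and Security, cs.CV Computer Vision, cs.CY Computers and Society, cs.DB Databases, cs.DC Distributed Computing, cs.DL Digital Libraries, cs.DM Discrete Mathematics, cs.DS Data Structures, cs.ET Emerging Technologies, cs.FL Formal Languages, cs.GL General Literature, cs.GR Graphics, cs.GT Game Theory, cs.HC Human-Computer Interaction, cs.IR Information Retrieval, cs.IT Information Theory, cs.LG Machine Learning, cs.LO Logic in Computer Science, cs.MA Multiagent Systems, cs.MM Multimedia, cs.MS Mathematical Software, cs.NA Numerical Analysis, cs.NE Neural and Evolutionary Computing, cs.NI Networking and Internet Architecture, cs.OH Other Computer Science, cs.OS Operating Systems, cs.PF Performance, cs.PL Programming Languages, cs.RO Robotics, cs.SC Symbolic Computation, cs.SD Sound, cs.SE Software Engineering, cs.SI Social and Information Networks, cs.SY Systems and Control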

ECON Economics (4)

econ Economics, econ.EM Econometrics, econ.GN General Economics, econ.TH Theoretical Economics

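EESS Electrical Engineering and Systems Science (5)

eess Electrical Engineering and Systems Science, eess.AS Audio and Speech, eess.IV Image and Video, eess.SP Signal Processing, eess.SY Systems and Control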

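MATH Mathematics (33)

math Mathematics, math.AC Commutative Algebra, math.AG Algebraic Geometry, math.AP Partial Differential Equations, math.AT Algebraic Topology, math.CA Classical Analysis, math.CO Combinatorics, math.CT Category Theory, math.CV Complex Variables, math.DG Differential Geometry, math.DS Dynamical Systems, math.FA Functional Analysis, math.GM General Mathematics, math.GN General Topology, math.GR Group Theory, math.GT Geometric Topology, math.HO History and Overview, math.IT Information Theory, math.KT K-Theory, math.LO Logic, math.MG Metric Geometry, math.MP Mathematical Physics, math.NA Numerical Analysis, math.NT Number Theory, math.OA Operator Algebras, math.OC Optimization and Control, math.PR Probability, math.QA Quantum Algebra, math.RA Rings and Algebras, math.RT Representation Theory, math.SG Symplectic Geometry, math.SP Spectral Theory, math.ST Statistics Theory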

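PHYSICS Physics (55)

astro-ph Astrophysics, astro-ph.CO Cosmology, astro-ph.EP Earth and Planetary Astrophysics, astro-ph.GA Galaxies, astro-ph.HE High-Energy Astrophysics, astro-ph.IM Astronomical Instrumentation, astro-ph.SR Solar and Stellar, cond-mat Condensed Matter, cond-mat.dis-nn Disordered Systems and Neural Networks, cond-mat.mes-hall Mesoscale and Nanoscale Physics, cond-mat.mtrl-sci Materials Science, cond-mat.other Other Condensed Matter, cond-mat.quant-gas Quantum Gases, cond-mat.soft Soft Condensed Matter, cond-mat.stat-mech Statistical Mechanics, cond-mat.str-el Strongly Correlated Electrons, cond-mat.supr-con Superconductivity, gr-qc General Relativity, hep-ex High-Energy Experiment, hep-lat High-Energy Lattice, hep-ph High-Energy Phenomenology, hep-th High-Energy Theory, math-ph Mathematical Physics, nlin Nonlinear Sciences, nlin.AO Adaptive Systems, nlin.CD Chaotic Dynamics, nlin.CG Cellular Automata, nlin.PS Patterns and Solitons, nlin.SI Integrable Systems, nucl-ex Nuclear Experiment, nucl-th Nuclear Theory, physics Physics, physics.acc-ph Accelerator Physics, physics.ao-ph Atmospheric and Oceanic Physics, physics.app-ph Applied Physics, physics.atm-clus Atomic and Molecular Clusters, physics.atom-ph Atomic Physics, physics.bio-ph Biological Physics, physics.chem-ph Chemical Physics, physics.class-ph Classical Physics, physics.comp-ph Computational Physics, physics.data-an Data Analysis, physics.ed-ph Physics Education, physics.flu-dyn Fluid Dynamics, physics.gen-ph General Physics, physics.geo-ph Geophysics, physics.hist-ph History and Philosophy of Physics, physics.ins-det Instrumentation and Detectors, physics.med-ph Medical Physics, physics.optics Optics, physics.plasm-ph Plasma Physics, physics.pop-ph Popular Physics, physics.soc-ph Physics and Society, physics.space-ph Space Physics, quant-ph Quantum Physics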

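Q-BIO Quantitative Biology (11)

q-bio Quantitative Biology, q-bio.BM Biomolecules, q-bio.CB Cell Behavior, q-bio.GN Genomics, q-bio.MN Molecular Networks, q-bio.NC Neurons and Cognition, q-bio.OT Other Quantitative Biology, q-bio.PE Populations and Evolution, q-bio.QM Quantitative Methods, q-bio.SC Subcellular Processes, q-bio.TO Tissues and Organs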

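Q-FIN Quantitative Finance (10)

q-fin Quantitative Finance, q-fin.CP Computational Finance, q-fin.EC Economics, q-fin.GN General Finance, q-fin.MF Mathematical Finance, q-fin.PM Portfolio Management, q-fin.PR Pricing of Securities, q-fin.RM Risk Management, q-fin.ST Statistical Finance, q-fin.TR Trading and Market Microstructure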

STAT Statistics (7)

stat Statistics, stat.AP Applications, stat.CO Computation, stat.ME Methodology, stat.ML Machine Learning, stat.OT Other Statistics, stat.TH Statistics Theory

2511.00524 2026-04-01 cs.CV

Text-guided Fine-Grained Video Anomaly Understanding

Jihao Gu, Kun Li, He Wang, Kaan Akşit

Comments: Accepted by the CVPR 2026 SVC Workshop

2510.23364 2026-04-01 cs.LG cs.AI

ZeroFlood: Flood Hazard Mapping from Single-Modality SAR Using Geo-Foundation Models

Hyeongkyun Kim, Orestis Oikonomou

Comments: 16th European Conference on Synthetic Aperture Radar

2510.23043 2026-04-01 cs.CV

HieraMamba: Video Temporal Grounding via Hierarchical Anchor-Mamba Pooling

Joungbin An, Kristen Grauman

Comments: CVPR 2026. Project page: https://vision.cs.utexas.edu/projects/hieramamba/

2510.20155 2026-04-01 cs.CV

PartNeXt: A Next-Generation Dataset for Fine-Grained and Hierarchical 3D Part Understanding

Penghao Wang, Yiyang He, Xin Lv, Yukai Zhou, Lan Xu, Jingyi Yu, Jiayuan Gu

Comments: NeurIPS 2025 Datasets and Benchmarks Track. Project page: https://authoritywang.github.io/partnext

2510.17899 2026-04-01 cs.LG cs.AI cs.NE

Automated Algorithm Design for Auto-Tuning Optimizers

Floris-Jan Willemsen, Niki van Stein, Ben van Werkhoven

2510.13860 2026-04-01 cs.CL cs.AI

ShishuLM: Achieving Optimal and Efficient Parameterization with Low Attention Transformer Models

Shivanshu Kumar, Gopalakrishnan Srinivasan

2510.13077 2026-04-01 cs.LG cs.AI eess.SP

A Semi-amortized Lifted Learning-to-Optimize Masked (SALLO-M) Transformer Model for Scalable and Generalizable Beamforming

Yubo Zhang, Xiao-Yang Liu, Xiaodong Wang

Comments: 13 pages

2510.12405 2026-04-01 cs.LG cond-mat.mtrl-sci

Continuous SUN (Stable, Unique, and Novel) Metric for Generative Modeling of Inorganic Crystals

Masahiro Negishi, Hyunsoo Park, Kinga O. Mastej, Aron Walsh

Comments: 23 pages (17 pages of main text). See https://github.com/WMD-group/xtalmet for the code. Significantly extended from an early version of this work, which was accepted to the AI4Mat workshop at NeurIPS 2025.

2510.09858 2026-04-01 cs.AI

AI and Consciousness

Eric Schwitzgebel

2510.09734 2026-04-01 cs.LG cs.AI

ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting

Jindong Tian, Yifei Ding, Ronghui Xu, Hao Miao, Chenjuan Guo, Bin Yang

Comments: 25 pages, 16 figures, ICLR 2026 camera-ready version

2510.06662 2026-04-01 cs.LG stat.ML

The Effect of Attention Head Count on Transformer Approximation

Penghao Yu, Haotian Jiang, Zeyu Bao, Ruoxi Yu, Qianxiao Li

Comments: Accepted by ICLR 2026

2510.04923 2026-04-01 cs.CV cs.AI

REN: Anatomically-Informed Mixture-of-Experts for Interstitial Lung Disease Diagnosis

Alec K. Peltekian, Halil Ertugrul Aktas, Gorkem Durak, Kevin Grudzinski, Bradford C. Bemiss, Carrie Richardson, Jane E. Dematte, G. R. Scott Budinger, Anthony J. Esposito, Alexander Misharin, Alok Choudhary, Ankit Agrawal, Ulas Bagci

Comments: 13 pages, 4 figures, 5 tables

2510.03638 2026-04-01 cs.LG cs.AI math.RT stat.ML

Expressive Power of Implicit Models: Rich Equilibria and Test-Time Scaling

Jialin Liu, Lisang Ding, Stanley Osher, Wotao Yin

2510.02789 2026-04-01 cs.CV cs.AI cs.LG

Align Your Query: Representation Alignment for Multimodality Medical Object Detection

Ara Seo, Bryan Sangwoo Kim, Hyungjin Chung, Jong Chul Ye

Comments: Project page: https://araseo.github.io/alignyourquery/

2509.23067 2026-04-01 cs.CL cs.AI

Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks

Chunyang Jiang, Yonggang Zhang, Yiyang Cai, Chi-Min Chan, Yulong Liu, Mingming Chen, Wei Xue, Yike Guo

2509.21223 2026-04-01 cs.CV cs.CL

Sigma: Semantically Informative Pre-training for Skeleton-based Sign Language Understanding

Muxin Pu, Mei Kuan Lim, Chun Yong Chong, Chen Change Loy

2509.21035 2026-04-01 cs.AI cs.CL

CLAUSE: Agentic Neuro-Symbolic Knowledge Graph Reasoning via Dynamic Learnable Context Engineering

Yang Zhao, Chengxiao Dai, Wei Zhuo, Yue Xiu, Dusit Niyato

2509.18527 2026-04-01 cs.AI

FERA: A Pose-Based Framework for Rule-Grounded Multimedia Decision Support with a Foil Fencing Case Study

Ziwen Chen, Zhong Wang

Comments: Changed the model architectures tested in the pipeline, added a deeper end-to-end analysis, and improved move-detection performance

2509.15695 2026-04-01 cs.CV cs.LG

ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models

Zhaoyang Li, Zhan Ling, Yuchen Zhou, Litian Gong, Erdem Bıyık, Hao Su

Comments: We request withdrawal of this paper because one of the listed institutional affiliations was included without proper authorization. This issue cannot be resolved through a simple revision, and we therefore request withdrawal to prevent dissemination of incorrect or unauthorized affiliation information.

2509.11452 2026-04-01 cs.LG cs.CL

Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting

Yining Lu, Zilong Wang, Shiyang Li, Xin Liu, Changlong Yu, Qingyu Yin, Zhan Shi, Zixuan Zhang, Meng Jiang

2509.10388 2026-04-01 cs.CV

VT-Intrinsic: Physics-Based Decomposition of Reflectance and Shading using a Single Visible-Thermal Image Pair

Zeqing Yuan, Mani Ramanagopal, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan

Comments: Accepted by CVPR 2026. Project webpage: https://vt-intrinsic.github.io/

2509.09666 2026-04-01 cs.CV

Unified Multimodal Models as Auto-Encoders

Zhiyuan Yan, Kaiqing Lin, Zongjian Li, Junyan Ye, Hui Han, Haochen Wang, Zhendong Wang, Bin Lin, Hao Li, Xinyan Xiao, Jingdong Wang, Haifeng Wang, Li Yuan

Abstract

Image-to-text (I2T) understanding and text-to-image (T2I) generation are two fundamental yet traditionally isolated multimodal tasks. Despite their intrinsic connection, existing approaches typically optimize them independently, missing the opportunity for mutual enhancement. In this paper, we argue that both tasks can be connected under a shared auto-encoder perspective, in which text serves as the intermediate latent representation bridging the two directions: encoding images into textual semantics (I2T) and decoding text back into images (T2I). Our key insight is that if the encoder truly "understands" the image, it should capture all essential structure, and if the decoder truly "understands" the text, it should recover that structure faithfully. Building on this principle, we propose Unified-GRPO, a reinforcement-learning-based post-training method that jointly optimizes both modules through reconstructive rewards, maximizing the semantic consistency between the input and generated images. Under this reconstruction objective, the encoder is encouraged to extract as much accurate and comprehensive semantic information from the input image as possible to maximize reconstruction quality, while the decoder is simultaneously optimized to generate images conditioned on the encoder's prior, enabling self-evolving improvement. Empirically, we find that using text as the intermediate representation and training under a reconstructive RL paradigm benefits both I2T and T2I. The I2T module gains stronger fine-grained visual perception, such as small-object recognition and grounding, while its dense embeddings and language priors in turn provide richer semantic signals that improve T2I fidelity and complex instruction following. These results demonstrate that reconstructive RL establishes a mutually reinforcing cross-modal synergy within the auto-encoding framework.
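The abstract describes the reconstructive reward only at a high level. The sketch below shows one way such a reward could feed GRPO's group-relative advantage, assuming the reward is the cosine similarity between embeddings of the input image and of each regenerated image, produced by some frozen image encoder (our assumption; the abstract says only "semantic consistency"). The helper names and toy embeddings are hypothetical, and the models, sampling, and the policy-gradient/KL update are omitted.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def reconstruction_rewards(input_emb, reconstructed_embs):
    """One reward per rollout (image -> caption -> regenerated image):
    how closely the regenerated image's embedding matches the input's."""
    return [cosine_similarity(input_emb, emb) for emb in reconstructed_embs]


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within the sampled group,
    so rollouts that reconstruct better than their siblings get positive weight."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical embeddings from a frozen image encoder (not specified in the abstract).
input_image = [1.0, 0.0, 0.0]
regenerated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
rewards = reconstruction_rewards(input_image, regenerated)
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

Because one advantage scores a whole caption-to-image round trip, it could in principle weight both the I2T caption tokens and the T2I generation step, which is one plausible reading of "jointly optimizes both modules"; the paper's actual credit assignment may differ.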


2509.06285 2026-04-01 cs.RO

DCReg: Decoupled Characterization for Efficient Degenerate LiDAR Registration

Xiangcheng Hu, Xieyuanli Chen, Mingkai Jia, Jin Wu, Ping Tan, Steven L. Waslander

Comments: 27 pages, 19 figures, 9 tables

Abstract

LiDAR point cloud registration is fundamental to robotic perception and navigation. In geometrically degenerate environments (e.g., corridors), registration becomes ill-conditioned: certain motion directions are weakly constrained, causing unstable solutions and degraded accuracy. Existing detect-then-mitigate methods fail to reliably detect, physically interpret, and stabilize this ill-conditioning without corrupting the optimization. We introduce DCReg (Decoupled Characterization for Ill-conditioned Registration), which establishes a detect-characterize-mitigate paradigm that systematically addresses ill-conditioned registration through three innovations. First, DCReg achieves reliable ill-conditioning detection by applying Schur complement decomposition to the Hessian matrix. This decouples the 6-DoF registration into clean 3-DoF rotational and translational subspaces, eliminating the coupling effects that mask degeneracy in full-Hessian analyses. Second, within these subspaces, we develop interpretable characterization techniques that resolve eigenbasis ambiguities via basis alignment. This establishes stable mappings between eigenspaces and physical motion directions, providing actionable insight into which motions lack constraints and to what extent. Third, leveraging this spectral information, we design targeted mitigation via a structured preconditioner. Guided by MAP regularization, we apply eigenvalue clamping exclusively within the preconditioner rather than modifying the original problem. This preserves the least-squares objective and its minimizer, enabling efficient optimization via the Preconditioned Conjugate Gradient (PCG) method with a single interpretable parameter. Experiments across diverse environments show that DCReg achieves 20-50% higher long-duration localization accuracy and 5-30x speedups (up to 116x) over degeneracy-aware baselines. Code: https://github.com/JokerJohn/DCReg
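The three steps in this abstract (Schur-complement decoupling, eigen-analysis of each 3-DoF block, and eigenvalue clamping used only inside a PCG preconditioner) can be illustrated on a synthetic 6x6 Hessian. This is a minimal NumPy sketch under our own assumptions, not the released DCReg code: the parameter order [rx, ry, rz, tx, ty, tz], a single relative threshold `ratio` used both for detection and as the clamp floor, labeling each eigenvector by its largest-magnitude axis (a crude stand-in for the paper's basis alignment), and a block-diagonal preconditioner built from the clamped Schur complements are all hypothetical choices.

```python
import numpy as np

AXES = ("x", "y", "z")


def schur_subspaces(H):
    """Decouple a 6x6 Hessian, assumed ordered [rx, ry, rz, tx, ty, tz], into
    rotation and translation Schur complements (other block marginalized out)."""
    H_rr, H_rt = H[:3, :3], H[:3, 3:]
    H_tr, H_tt = H[3:, :3], H[3:, 3:]
    S_rot = H_rr - H_rt @ np.linalg.solve(H_tt, H_tr)
    S_trans = H_tt - H_tr @ np.linalg.solve(H_rr, H_rt)
    return S_rot, S_trans


def weak_axes(S, ratio):
    """Label each eigenvector whose eigenvalue is below ratio * largest
    eigenvalue (a weakly constrained motion) by its dominant axis."""
    eigvals, eigvecs = np.linalg.eigh(S)
    cutoff = ratio * eigvals.max()
    return [AXES[int(np.argmax(np.abs(eigvecs[:, i])))]
            for i in range(3) if eigvals[i] < cutoff]


def clamped_inverse(S, ratio):
    """Inverse of S after raising eigenvalues below ratio * largest to that floor."""
    eigvals, eigvecs = np.linalg.eigh(S)
    clamped = np.maximum(eigvals, ratio * eigvals.max())
    return eigvecs @ np.diag(1.0 / clamped) @ eigvecs.T


def pcg(A, b, M_inv, tol=1e-10, max_iter=100):
    """Preconditioned conjugate gradient for A x = b; M_inv approximates A^-1."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv @ r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x


# Synthetic "corridor": the tx column of the Jacobian is scaled down,
# so translation along x is only weakly constrained.
rng = np.random.default_rng(0)
J = rng.standard_normal((200, 6))
J[:, 3] *= 0.05
H = J.T @ J
b = rng.standard_normal(6)

ratio = 1e-2
S_rot, S_trans = schur_subspaces(H)
M_inv = np.zeros((6, 6))
M_inv[:3, :3] = clamped_inverse(S_rot, ratio)
M_inv[3:, 3:] = clamped_inverse(S_trans, ratio)
x = pcg(H, b, M_inv)
print("weak rotation axes:", weak_axes(S_rot, ratio))
print("weak translation axes:", weak_axes(S_trans, ratio))
```

Because clamping changes only `M_inv`, PCG still converges to the solution of the original system `H x = b`; the clamp reshapes how the iterations progress rather than the minimizer, which is the property the abstract emphasizes.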


2508.14292 2026-04-01 cs.CL

Tokens with Meaning: A Hybrid Tokenization Approach for Turkish

M. Ali Bayram, Ali Arda Fincan, Ahmet Semih Gümüş, Sercan Karakaş, Banu Diri, Savaş Yıldırım, Demircan Çelik

Abstract

Tokenization shapes how language models perceive morphology and meaning, yet widely used frequency-driven subword tokenizers (e.g., Byte Pair Encoding and WordPiece) can fragment morphologically rich, agglutinative languages in ways that obscure morpheme boundaries. We introduce a linguistically informed hybrid tokenizer for Turkish that combines (i) dictionary-driven morphological segmentation (roots and affixes), (ii) phonological normalization that maps allomorphic variants to shared identifiers, and (iii) a controlled subword fallback for out-of-vocabulary coverage. Concretely, our released Turkish vocabulary contains 22,231 root tokens mapped to 20,000 canonical root identifiers (with leading spaces to mark word boundaries), 72 affix identifiers that cover 177 allomorphic surface forms, and 12,696 subword units; an orthographic case token preserves capitalization without inflating the vocabulary. We evaluate tokenization quality on the TR-MMLU dataset using two linguistic alignment metrics: Turkish Token Percentage (TR %), the proportion of produced tokens that correspond to Turkish lexical/morphemic units under our lexical resources, and Pure Token Percentage (Pure %), the proportion of tokens aligning with unambiguous root/affix boundaries. On TR-MMLU, the proposed tokenizer reaches a TR % of 90.29% and a Pure % of 85.80%, substantially exceeding several general-purpose tokenizers. We further validate practical utility with downstream sentence embedding benchmarks under a strict random-initialization control to isolate the tokenizer's inductive bias. Across four matched models (TurkishTokenizer, CosmosGPT2, Mursit, and Tabi), TurkishTokenizer outperforms all baselines on the Turkish STS Benchmark and achieves the strongest overall average on MTEB-TR. It also yields the strongest average accuracy on TurBLiMP under a centroid-based proxy.
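To make the three-stage design concrete, here is a toy sketch of dictionary root lookup, allomorph normalization to shared identifiers, a subword fallback, and a TR %-style score. Everything in it is a hypothetical stand-in rather than the released tokenizer: the tiny lexicon, the token spellings (`<PLURAL>`, `<uppercase>`, `##` pieces), greedy longest-match segmentation, single-character fallback in place of real subword units, and excluding the case token when scoring are all our assumptions.

```python
# Hypothetical toy lexicon (not the released vocabulary).
# Surface root -> root token; "kitab" is the voiced variant of "kitap" ("book")
# and maps to the same canonical token. A leading space marks a word start.
ROOTS = {"ev": " ev", "kitap": " kitap", "kitab": " kitap"}

# Surface affix allomorph -> canonical affix identifier.
AFFIXES = {
    "ler": "<PLURAL>", "lar": "<PLURAL>",
    "de": "<LOCATIVE>", "da": "<LOCATIVE>", "te": "<LOCATIVE>", "ta": "<LOCATIVE>",
    "ı": "<ACCUSATIVE>", "i": "<ACCUSATIVE>", "u": "<ACCUSATIVE>", "ü": "<ACCUSATIVE>",
}
UPPER = "<uppercase>"  # case token: capitalized words reuse lowercase entries


def fallback(text):
    """Stand-in for the subword fallback: one piece per character."""
    return ["##" + ch for ch in text]


def tokenize_word(word):
    tokens = [UPPER] if word[:1].isupper() else []
    word = word.lower()  # note: str.lower is not Turkish-aware for I / İ
    # 1. Longest dictionary root that prefixes the word.
    root = next((word[:i] for i in range(len(word), 0, -1) if word[:i] in ROOTS), None)
    if root is None:
        return tokens + fallback(word)
    tokens.append(ROOTS[root])
    rest = word[len(root):]
    # 2. Greedily peel off known affix allomorphs, longest first.
    while rest:
        for n in range(len(rest), 0, -1):
            if rest[:n] in AFFIXES:
                tokens.append(AFFIXES[rest[:n]])
                rest = rest[n:]
                break
        else:
            # 3. Unknown remainder: fall back to subword pieces.
            tokens.extend(fallback(rest))
            break
    return tokens


def tokenize(text):
    return [tok for word in text.split() for tok in tokenize_word(word)]


def tr_percentage(tokens):
    """Share of tokens (case token excluded) that are dictionary roots or affixes."""
    scored = [t for t in tokens if t != UPPER]
    lexical = set(ROOTS.values()) | set(AFFIXES.values())
    return 100.0 * sum(t in lexical for t in scored) / len(scored)


print(tokenize("Evlerde kitabı kitaplarda xyz"))
```

Here "kitabı" and "kitaplarda" share the root token ` kitap`, and "-ler"/"-lar" collapse to one `<PLURAL>` identifier, which is the allomorph normalization the abstract describes; a production tokenizer would need Turkish-aware casing and a disambiguating segmenter rather than greedy matching.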


2508.12692 2026-04-01 cs.CV cs.AI cs.LG

Multi-Level Knowledge Distillation and Dynamic Self-Supervised Learning for Continual Learning

Taeheon Kim, San Kim, Minhyuk Seo, Dongjae Jeon, Wonje Jeung, Jonghyun Choi

Comments: 2nd place in the Class-Incremental with Repetition (CIR) using Unlabeled Data Challenge at the 5th CLVISION Workshop, CVPR 2024

2508.12690 2026-04-01 cs.CV cs.AI cs.LG

TTA-DAME: Test-Time Adaptation with Domain Augmentation and Model Ensemble for Dynamic Driving Conditions

Dongjae Jeon, Taeheon Kim, Seongwon Cho, Minhyuk Seo, Jonghyun Choi

Comments: 1st place in the Continual Test-time Adaptation for Object Detection Challenge at the VCL Workshop, ICCV 2023

2508.08115 2026-04-01 cs.AI

TeamMedAgents: Pareto-Efficient Multi-Agent Medical Reasoning Through Teamwork Theory

Pranav Pushkar Mishra, Mohammad Arvan, Mohan Zalake

Comments: 19 pages, 6 figures, 12 tables, 2 algorithms

2507.21273 2026-04-01 cs.LG

Deep Polynomial Chaos Expansion

Johannes Exenberger, Sascha Ranftl, Robert Peharz

Comments: 29th International Conference on Artificial Intelligence and Statistics (AISTATS) 2026

2507.13266 2026-04-01 cs.CL cs.AI

QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation

Jiazheng Li, Hongzhou Lin, Hong Lu, Kaiyue Wen, Zaiwen Yang, Jiaxuan Gao, Yi Wu, Jingzhao Zhang

Comments: 25 pages, 18 figures, ICLR 2026

2507.11539 2026-04-01 cs.CV cs.AI cs.LG

Streaming 4D Visual Geometry Transformer

Dong Zhuo, Wenzhao Zheng, Jiahe Guo, Yuqi Wu, Jie Zhou, Jiwen Lu

Comments: Code is available at https://github.com/wzzheng/StreamVGGT
