arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

检索范围排序方式

检索时间范围

重置

HOT 人工智能、机器人等 9

cs.AI 人工智能 cs.CV 计算机视觉 cs.CL 自然语言处理 cs.RO 机器人 cs.LG 机器学习 cs.SD 声音 cs.ET 新兴技术 eess.AS 音频语音 eess.IV 图像视频

CS 计算机 41

cs 计算机 cs.AI 人工智能 cs.AR 硬件架构 cs.CC 计算复杂性 cs.CE 计算工程 cs.CG 计算几何 cs.CL 自然语言处理 cs.CR 密码安全 cs.CV 计算机视觉 cs.CY 计算机与社会 cs.DB 数据库 cs.DC 分布式计算 cs.DL 数字图书馆 cs.DM 离散数学 cs.DS 数据结构 cs.ET 新兴技术 cs.FL 形式语言 cs.GL 综述文献 cs.GR 图形学 cs.GT 博弈论 cs.HC 人机交互 cs.IR 信息检索 cs.IT 信息论 cs.LG 机器学习 cs.LO 计算机逻辑 cs.MA 多智能体 cs.MM 多媒体 cs.MS 数学软件 cs.NA 数值分析 cs.NE 神经进化 cs.NI 网络架构 cs.OH 其他计算机 cs.OS 操作系统 cs.PF 性能 cs.PL 编程语言 cs.RO 机器人 cs.SC 符号计算 cs.SD 声音 cs.SE 软件工程 cs.SI 社会信息网络 cs.SY 系统控制

ECON 经济学 4

econ 经济学 econ.EM 计量经济 econ.GN 一般经济 econ.TH 理论经济

EESS 电气与系统 5

eess 电气与系统 eess.AS 音频语音 eess.IV 图像视频 eess.SP 信号处理 eess.SY 系统控制

MATH 数学 33

math 数学 math.AC 交换代数 math.AG 代数几何 math.AP 偏微分方程 math.AT 代数拓扑 math.CA 经典分析 math.CO 组合数学 math.CT 范畴论 math.CV 复变函数 math.DG 微分几何 math.DS 动力系统 math.FA 泛函分析 math.GM 一般数学 math.GN 一般拓扑 math.GR 群论 math.GT 几何拓扑 math.HO 历史综述 math.IT 信息论 math.KT K理论 math.LO 逻辑 math.MG 度量几何 math.MP 数学物理 math.NA 数值分析 math.NT 数论 math.OA 算子代数 math.OC 优化控制 math.PR 概率 math.QA 量子代数 math.RA 环与代数 math.RT 表示论 math.SG 辛几何 math.SP 谱理论 math.ST 统计理论

PHYSICS 物理 55

astro-ph 天体物理 astro-ph.CO 宇宙学 astro-ph.EP 地球行星 astro-ph.GA 星系物理 astro-ph.HE 高能天体 astro-ph.IM 天文仪器 astro-ph.SR 太阳恒星 cond-mat 凝聚态 cond-mat.dis-nn 无序神经 cond-mat.mes-hall 介观纳米 cond-mat.mtrl-sci 材料科学 cond-mat.other 其他凝聚态 cond-mat.quant-gas 量子气体 cond-mat.soft 软凝聚态 cond-mat.stat-mech 统计力学 cond-mat.str-el 强关联电子 cond-mat.supr-con 超导 gr-qc 广义相对论 hep-ex 高能实验 hep-lat 格点高能 hep-ph 高能唯象 hep-th 高能理论 math-ph 数学物理 nlin 非线性科学 nlin.AO 自适应系统 nlin.CD 混沌动力学 nlin.CG 胞自动机 nlin.PS 斑图孤子 nlin.SI 可积系统 nucl-ex 核物理实验 nucl-th 核物理理论 physics 物理 physics.acc-ph 加速器物理 physics.ao-ph 大气海洋 physics.app-ph 应用物理 physics.atm-clus 原子分子团簇 physics.atom-ph 原子物理 physics.bio-ph 生物物理 physics.chem-ph 化学物理 physics.class-ph 经典物理 physics.comp-ph 计算物理 physics.data-an 数据分析 physics.ed-ph 物理教育 physics.flu-dyn 流体动力学 physics.gen-ph 普通物理 physics.geo-ph 地球物理 physics.hist-ph 物理史哲 physics.ins-det 仪器探测 physics.med-ph 医学物理 physics.optics 光学 physics.plasm-ph 等离子体 physics.pop-ph 科普物理 physics.soc-ph 物理与社会 physics.space-ph 空间物理 quant-ph 量子物理

Q-BIO 定量生物 11

q-bio 定量生物 q-bio.BM 生物分子 q-bio.CB 细胞行为 q-bio.GN 基因组学 q-bio.MN 分子网络 q-bio.NC 神经认知 q-bio.OT 其他定量生物 q-bio.PE 种群进化 q-bio.QM 定量方法 q-bio.SC 亚细胞过程 q-bio.TO 组织器官

Q-FIN 定量金融 10

q-fin 定量金融 q-fin.CP 计算金融 q-fin.EC 经济学 q-fin.GN 一般金融 q-fin.MF 数学金融 q-fin.PM 投资组合 q-fin.PR 证券定价 q-fin.RM 风险管理 q-fin.ST 统计金融 q-fin.TR 交易微观结构

STAT 统计 7

stat 统计 stat.AP 统计应用 stat.CO 统计计算 stat.ME 统计方法 stat.ML 机器学习 stat.OT 其他统计 stat.TH 统计理论

2602.13731 2026-02-17 cs.CV

Generative Latent Representations of 3D Brain MRI for Multi-Task Downstream Analysis in Down Syndrome

Jordi Malé, Juan Fortea, Mateus Rozalem-Aranha, Neus Martínez-Abadías, Xavier Sevillano

2602.13728 2026-02-17 cs.CV

Explore Intrinsic Geometry for Query-based Tiny and Oriented Object Detector with Momentum-based Bipartite Matching

Junpeng Zhang, Zewei Yang, Jie Feng, Yuhui Zheng, Ronghua Shang, Mengxuan Zhang

Comments 13 pages

2602.13726 2026-02-17 cs.CV

RGA-Net: A Vision Enhancement Framework for Robotic Surgical Systems Using Reciprocal Attention Mechanisms

Quanjun Li, Weixuan Li, Han Xia, Junhua Zhou, Chi-Man Pun, Xuhang Chen

Comments Accepted by ICRA2026

详情

英文摘要

Robotic surgical systems rely heavily on high-quality visual feedback for precise teleoperation; yet, surgical smoke from energy-based devices significantly degrades endoscopic video feeds, compromising the human-robot interface and surgical outcomes. This paper presents RGA-Net (Reciprocal Gating and Attention-fusion Network), a novel deep learning framework specifically designed for smoke removal in robotic surgery workflows. Our approach addresses the unique challenges of surgical smoke-including dense, non-homogeneous distribution and complex light scattering-through a hierarchical encoder-decoder architecture featuring two key innovations: (1) a Dual-Stream Hybrid Attention (DHA) module that combines shifted window attention with frequency-domain processing to capture both local surgical details and global illumination changes, and (2) an Axis-Decomposed Attention (ADA) module that efficiently processes multi-scale features through factorized attention mechanisms. These components are connected via reciprocal cross-gating blocks that enable bidirectional feature modulation between encoder and decoder pathways. Extensive experiments on the DesmokeData and LSD3K surgical datasets demonstrate that RGA-Net achieves superior performance in restoring visual clarity suitable for robotic surgery integration. Our method enhances the surgeon-robot interface by providing consistently clear visualization, laying a technical foundation for alleviating surgeons' cognitive burden, optimizing operation workflows, and reducing iatrogenic injury risks in minimally invasive procedures. These practical benefits could be further validated through future clinical trials involving surgeon usability assessments. The proposed framework represents a significant step toward more reliable and safer robotic surgical systems through computational vision enhancement.

URL PDF HTML ☆

赞 0 踩 0

2602.13720 2026-02-17 cs.RO

FC-Vision: Real-Time Visibility-Aware Replanning for Occlusion-Free Aerial Target Structure Scanning in Unknown Environments

Chen Feng, Yang Xu, Shaojie Shen

Comments 8 pages, 8 figures, 3 tables

2602.13718 2026-02-17 cs.RO cs.AI

HybridFlow: A Two-Step Generative Policy for Robotic Manipulation

Zhenchen Dong, Jinna Fu, Jiaming Wu, Shengyuan Yu, Fulin Chen, Yide Liu

2602.13103 2026-02-17 cs.LG

R-Diverse: Mitigating Diversity Illusion in Self-Play LLM Training

Gengsheng Li, Jinghan He, Shijie Wang, Dan Zhang, Ruiqi Liu, Renrui Zhang, Zijun Yao, Junfeng Fang, Haiyun Guo, Jinqiao Wang

2602.13021 2026-02-17 cs.LG cs.AI

Prior-Guided Symbolic Regression: Towards Scientific Consistency in Equation Discovery

Jing Xiao, Xinhai Chen, Jiaming Peng, Qinglin Wang, Menghan Jia, Zhiquan Lai, Guangping Yu, Dongsheng Li, Tiejun Li, Jie Liu

2602.12916 2026-02-17 cs.CV cs.LG

Reliable Thinking with Images

Haobin Li, Yutong Yang, Yijie Lin, Xiang Dai, Mouxing Yang, Xi Peng

Comments 26 pages, 19 figures

2602.12723 2026-02-17 cs.SD

Towards explainable reference-free speech intelligibility evaluation of people with pathological speech

Bence Mark Halpern, Thomas Tienkamp, Defne Abur, Tomoki Toda

Comments Plan to be submited to Interspeech 2026

2602.12157 2026-02-17 cs.CV cs.GR

TexSpot: 3D Texture Enhancement with Spatially-uniform Point Latent Representation

Ziteng Lu, Yushuang Wu, Chongjie Ye, Yuda Qiu, Jing Shao, Xiaoyang Guo, Jiaqing Zhou, Tianlei Hu, Kun Zhou, Xiaoguang Han

Comments Project page: https://texlet-arch.github.io/TexSpot-page

2602.12095 2026-02-17 cs.RO

Pack it in: Packing into Partially Filled Containers Through Contact

David Russell, Zisong Xu, Maximo A. Roa, Mehmet Dogar

Comments 8 pages, 5 figures

2602.12063 2026-02-17 cs.RO

VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model

Yanjiang Guo, Tony Lee, Lucy Xiaoyang Shi, Jianyu Chen, Percy Liang, Chelsea Finn

Comments Project Page: https://sites.google.com/view/vlaw-arxiv

2602.11712 2026-02-17 cs.LG cs.CE nlin.CD physics.data-an stat.ME

Potential-energy gating for robust state estimation in bistable stochastic systems

Luigi Simeone

Comments 20 pages, 8 figures

详情

英文摘要

We introduce potential-energy gating, a method for robust state estimation in systems governed by double-well stochastic dynamics. The observation noise covariance of a Bayesian filter is modulated by the local value of a known or assumed potential energy function: observations are trusted when the state is near a potential minimum and progressively discounted as it approaches the barrier separating metastable wells. This physics-based mechanism differs from statistical robust filters, which treat all state-space regions identically, and from constrained filters, which bound states rather than modulating observation trust. The approach is especially relevant in non-ergodic or data-scarce settings where only a single realization is available and statistical methods alone cannot learn the noise structure. We implement gating within Extended, Unscented, Ensemble, and Adaptive Kalman filters and particle filters, requiring only two additional hyperparameters. Monte Carlo benchmarks (100 replications) on a Ginzburg-Landau double-well with 10% outlier contamination show 57-80% RMSE improvement over the standard Extended Kalman Filter, all statistically significant (p < 10^{-15}, Wilcoxon test). A naive topological baseline using only well positions achieves 57%, confirming that the continuous energy landscape adds ~21 percentage points. The method is robust to misspecification: even with 50% parameter errors, improvement never falls below 47%. Comparing externally forced and spontaneous Kramers-type transitions, gating retains 68% improvement under noise-induced transitions whereas the naive baseline degrades to 30%. As an empirical illustration, we apply the framework to Dansgaard-Oeschger events in the NGRIP delta-18O ice-core record, estimating asymmetry gamma = -0.109 (bootstrap 95% CI: [-0.220, -0.011]) and showing that outlier fraction explains 91% of the variance in filter improvement.

URL PDF HTML ☆

赞 0 踩 0

2602.11575 2026-02-17 cs.RO cs.AI cs.CV

ReaDy-Go: Real-to-Sim Dynamic 3D Gaussian Splatting Simulation for Environment-Specific Visual Navigation with Moving Obstacles

Seungyeon Yoo, Youngseok Jang, Dabin Kim, Youngsoo Han, Seungwoo Jung, H. Jin Kim

Comments Project page: https://syeon-yoo.github.io/ready-go-site/

详情

英文摘要

Visual navigation models often struggle in real-world dynamic environments due to limited robustness to the sim-to-real gap and the difficulty of training policies tailored to target deployment environments (e.g., households, restaurants, and factories). Although real-to-sim navigation simulation using 3D Gaussian Splatting (GS) can mitigate these challenges, prior GS-based works have considered only static scenes or non-photorealistic human obstacles built from simulator assets, despite the importance of safe navigation in dynamic environments. To address these issues, we propose ReaDy-Go, a novel real-to-sim simulation pipeline that synthesizes photorealistic dynamic scenarios in target environments by augmenting a reconstructed static GS scene with dynamic human GS obstacles, and trains navigation policies using the generated datasets. The pipeline provides three key contributions: (1) a dynamic GS simulator that integrates static scene GS with a human animation module, enabling the insertion of animatable human GS avatars and the synthesis of plausible human motions from 2D trajectories, (2) a navigation dataset generation framework that leverages the simulator along with a robot expert planner designed for dynamic GS representations and a human planner, and (3) robust navigation policies to both the sim-to-real gap and moving obstacles. The proposed simulator generates thousands of photorealistic navigation scenarios with animatable human GS avatars from arbitrary viewpoints. ReaDy-Go outperforms baselines across target environments in both simulation and real-world experiments, demonstrating improved navigation performance even after sim-to-real transfer and in the presence of moving obstacles. Moreover, zero-shot sim-to-real deployment in an unseen environment indicates its generalization potential. Project page: https://syeon-yoo.github.io/ready-go-site/.

URL PDF HTML ☆

赞 0 踩 0

2602.11339 2026-02-17 cs.CV

Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content

Evgeney Bogatyrev, Khaled Abud, Ivan Molodetskikh, Nikita Alutis, Dmitriy Vatolin

2602.10556 2026-02-17 cs.RO cs.AI

LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer

Lihan Zha, Asher J. Hancock, Mingtong Zhang, Tenny Yin, Yixuan Huang, Dhruv Shah, Allen Z. Ren, Anirudha Majumdar

Comments Project website: https://lap-vla.github.io

2602.10058 2026-02-17 cs.SD cs.LG eess.AS

Evaluating Disentangled Representations for Controllable Music Generation

Laura Ibáñez-Martínez, Chukwuemeka Nkama, Andrea Poltronieri, Xavier Serra, Martín Rocamora

Comments Accepted at ICASSP 2026

2602.09713 2026-02-17 cs.CV

Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models

Ruisi Zhao, Haoren Zheng, Zongxin Yang, Hehe Fan, Yi Yang

Comments Accepted by ICLR 2026

2602.08655 2026-02-17 cs.LG

From Robotics to Sepsis Treatment: Offline RL via Geometric Pessimism

Sarthak Wanjari

Comments 10 pages, 8 figures

2602.08449 2026-02-17 cs.AI cs.CR cs.LG

When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment

Igor Santos-Grueiro

Comments Added results for Llama and new cross model analysis

详情

英文摘要

Safety evaluation for advanced AI systems assumes that behavior observed under evaluation predicts behavior in deployment. This assumption weakens for agents with situational awareness, which may exploit regime leakage, cues distinguishing evaluation from deployment, to implement conditional policies that comply under oversight while defecting in deployment-like regimes. We recast alignment evaluation as a problem of information flow under partial observability and show that divergence between evaluation-time and deployment-time behavior is bounded by the regime information extractable from decision-relevant internal representations. We study regime-blind mechanisms, training-time interventions that restrict access to regime cues through adversarial invariance constraints without assuming complete information erasure. We evaluate this approach across multiple open-weight language models and controlled failure modes including scientific sycophancy, temporal sleeper agents, and data leakage. Regime-blind training reduces regime-conditioned failures without measurable loss of task utility, but exhibits heterogeneous and model-dependent dynamics. Sycophancy shows a sharp representational and behavioral transition at moderate intervention strength, consistent with a stability cliff. In sleeper-style constructions and certain cross-model replications, suppression occurs without a clean collapse of regime decodability and may display non-monotone or oscillatory behavior as invariance pressure increases. These findings indicate that representational invariance is a meaningful but limited control lever. It can raise the cost of regime-conditioned strategies but cannot guarantee elimination or provide architecture-invariant thresholds. Behavioral evaluation should therefore be complemented with white-box diagnostics of regime awareness and internal information flow.

URL PDF HTML ☆

赞 0 踩 0

2602.07846 2026-02-17 cs.RO

System-Level Error Propagation and Tail-Risk Amplification in Reference-Based Robotic Navigation

Ning Hu, Maochen Li, Senhao Cao

Comments 13 pages, 8 figures

2602.06288 2026-02-17 cs.CV

Unsupervised MR-US Multimodal Image Registration with Multilevel Correlation Pyramidal Optimization

Jiazheng Wang, Zeyu Liu, Min Liu, Xiang Chen, Xinyao Yu, Yaonan Wang, Hang Zhang

Comments first-place method of ReMIND2Reg Learn2Reg MICCAI 2025

2602.06130 2026-02-17 cs.LG cs.AI cs.CL

Self-Improving World Modelling with Latent Actions

Yifu Qiu, Zheng Zhao, Waylon Li, Yftah Ziser, Anna Korhonen, Shay B. Cohen, Edoardo M. Ponti

2602.05847 2026-02-17 cs.AI cs.CV

OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention

Zhangquan Chen, Jiale Tao, Ruihuang Li, Yihao Hu, Ruitao Chen, Zhantao Yang, Xinlei Yu, Haodong Jing, Manyuan Zhang, Shuai Shao, Biao Wang, Qinglin Lu, Ruqi Huang

Comments 19 pages, 12 figures

2602.05354 2026-02-17 cs.AI

PATHWAYS: Evaluating Investigation and Context Discovery in AI Web Agents

Shifat E. Arman, Syed Nazmus Sakib, Tapodhir Karmakar Taton, Nafiul Haque, Shahrear Bin Amin

Comments 35 pages, 13 figures

Journal ref Under Review in ICML 2026

2602.05119 2026-02-17 cs.LG math.OC

Unbiased Single-Queried Gradient for Combinatorial Objective

Thanawat Sornwanee

Comments 23 pages

2602.04856 2026-02-17 cs.CL

CoT is Not the Chain of Truth: An Empirical Internal Analysis of Reasoning LLMs for Fake News Generation

Zhao Tong, Chunlin Gong, Yiping Zhang, Haichao Shi, Qiang Liu, Xingcheng Xu, Shu Wu, Xiao-Yu Zhang

Comments 28 pages, 35 figures

2602.02929 2026-02-17 cs.LG cs.AI cs.CR cs.NE

RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly Detection

Asif Tauhid, Sidahmed Benabderrahmane, Mohamad Altrabulsi, Ahamed Foisal, Talal Rahwan

2602.01389 2026-02-17 cs.RO

Instance-Guided Unsupervised Domain Adaptation for Robotic Semantic Segmentation

Michele Antonazzi, Lorenzo Signorelli, Matteo Luperto, Nicola Basilico

Comments Accepted for publication at ICRA 2026

2601.22920 2026-02-17 cs.CV

Q-Hawkeye: Reliable Visual Policy Optimization for Image Quality Assessment

Wulin Xie, Rui Dai, Ruidong Ding, Kaikui Liu, Xiangxiang Chu, Xinwen Hou, Jie Wen

AI 大模型

视觉与机器人

科学与医疗

Generative Latent Representations of 3D Brain MRI for Multi-Task Downstream Analysis in Down Syndrome

Explore Intrinsic Geometry for Query-based Tiny and Oriented Object Detector with Momentum-based Bipartite Matching

RGA-Net: A Vision Enhancement Framework for Robotic Surgical Systems Using Reciprocal Attention Mechanisms

FC-Vision: Real-Time Visibility-Aware Replanning for Occlusion-Free Aerial Target Structure Scanning in Unknown Environments

HybridFlow: A Two-Step Generative Policy for Robotic Manipulation

R-Diverse: Mitigating Diversity Illusion in Self-Play LLM Training

Prior-Guided Symbolic Regression: Towards Scientific Consistency in Equation Discovery

Reliable Thinking with Images

Towards explainable reference-free speech intelligibility evaluation of people with pathological speech

TexSpot: 3D Texture Enhancement with Spatially-uniform Point Latent Representation

Pack it in: Packing into Partially Filled Containers Through Contact

VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model

Potential-energy gating for robust state estimation in bistable stochastic systems

ReaDy-Go: Real-to-Sim Dynamic 3D Gaussian Splatting Simulation for Environment-Specific Visual Navigation with Moving Obstacles

Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content

LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer

Evaluating Disentangled Representations for Controllable Music Generation

Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models

From Robotics to Sepsis Treatment: Offline RL via Geometric Pessimism

When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment

System-Level Error Propagation and Tail-Risk Amplification in Reference-Based Robotic Navigation

Unsupervised MR-US Multimodal Image Registration with Multilevel Correlation Pyramidal Optimization

Self-Improving World Modelling with Latent Actions

OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention

PATHWAYS: Evaluating Investigation and Context Discovery in AI Web Agents

Unbiased Single-Queried Gradient for Combinatorial Objective

CoT is Not the Chain of Truth: An Empirical Internal Analysis of Reasoning LLMs for Fake News Generation

RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly Detection

Instance-Guided Unsupervised Domain Adaptation for Robotic Semantic Segmentation

Q-Hawkeye: Reliable Visual Policy Optimization for Image Quality Assessment