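arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.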

HOT Artificial Intelligence, Robotics, etc. 9

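cs.AI Artificial Intelligence; cs.CV Computer Vision; cs.CL Natural Language Processing; cs.RO Robotics; cs.LG Machine Learning; cs.SD Sound; cs.ET Emerging Technologies; eess.AS Audio and Speech; eess.IV Image and Video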

CS Computer Science 41

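cs Computer Science; cs.AI Artificial Intelligence; cs.AR Hardware Architecture; cs.CC Computational Complexity; cs.CE Computational Engineering; cs.CG Computational Geometry; cs.CL Natural Language Processing; cs.CR Cryptography and Security; cs.CV Computer Vision; cs.CY Computers and Society; cs.DB Databases; cs.DC Distributed Computing; cs.DL Digital Libraries; cs.DM Discrete Mathematics; cs.DS Data Structures; cs.ET Emerging Technologies; cs.FL Formal Languages; cs.GL General Literature; cs.GR Graphics; cs.GT Game Theory; cs.HC Human-Computer Interaction; cs.IR Information Retrieval; cs.IT Information Theory; cs.LG Machine Learning; cs.LO Logic in Computer Science; cs.MA Multiagent Systems; cs.MM Multimedia; cs.MS Mathematical Software; cs.NA Numerical Analysis; cs.NE Neural and Evolutionary Computing; cs.NI Networking and Internet Architecture; cs.OH Other Computer Science; cs.OS Operating Systems; cs.PF Performance; cs.PL Programming Languages; cs.RO Robotics; cs.SC Symbolic Computation; cs.SD Sound; cs.SE Software Engineering; cs.SI Social and Information Networks; cs.SY Systems and Control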

ECON Economics 4

econ Economics; econ.EM Econometrics; econ.GN General Economics; econ.TH Theoretical Economics

EESS Electrical Engineering and Systems 5

eess Electrical Engineering and Systems; eess.AS Audio and Speech; eess.IV Image and Video; eess.SP Signal Processing; eess.SY Systems and Control

MATH Mathematics 33

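math Mathematics; math.AC Commutative Algebra; math.AG Algebraic Geometry; math.AP Partial Differential Equations; math.AT Algebraic Topology; math.CA Classical Analysis; math.CO Combinatorics; math.CT Category Theory; math.CV Complex Variables; math.DG Differential Geometry; math.DS Dynamical Systems; math.FA Functional Analysis; math.GM General Mathematics; math.GN General Topology; math.GR Group Theory; math.GT Geometric Topology; math.HO History and Overview; math.IT Information Theory; math.KT K-Theory; math.LO Logic; math.MG Metric Geometry; math.MP Mathematical Physics; math.NA Numerical Analysis; math.NT Number Theory; math.OA Operator Algebras; math.OC Optimization and Control; math.PR Probability; math.QA Quantum Algebra; math.RA Rings and Algebras; math.RT Representation Theory; math.SG Symplectic Geometry; math.SP Spectral Theory; math.ST Statistics Theory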

PHYSICS Physics 55

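astro-ph Astrophysics; astro-ph.CO Cosmology; astro-ph.EP Earth and Planetary Astrophysics; astro-ph.GA Astrophysics of Galaxies; astro-ph.HE High Energy Astrophysics; astro-ph.IM Astronomical Instrumentation; astro-ph.SR Solar and Stellar Astrophysics; cond-mat Condensed Matter; cond-mat.dis-nn Disordered Systems and Neural Networks; cond-mat.mes-hall Mesoscale and Nanoscale Physics; cond-mat.mtrl-sci Materials Science; cond-mat.other Other Condensed Matter; cond-mat.quant-gas Quantum Gases; cond-mat.soft Soft Condensed Matter; cond-mat.stat-mech Statistical Mechanics; cond-mat.str-el Strongly Correlated Electrons; cond-mat.supr-con Superconductivity; gr-qc General Relativity; hep-ex High Energy Physics (Experiment); hep-lat High Energy Physics (Lattice); hep-ph High Energy Physics (Phenomenology); hep-th High Energy Physics (Theory); math-ph Mathematical Physics; nlin Nonlinear Sciences; nlin.AO Adaptive Systems; nlin.CD Chaotic Dynamics; nlin.CG Cellular Automata; nlin.PS Pattern Formation and Solitons; nlin.SI Integrable Systems; nucl-ex Nuclear Experiment; nucl-th Nuclear Theory; physics Physics; physics.acc-ph Accelerator Physics; physics.ao-ph Atmospheric and Oceanic Physics; physics.app-ph Applied Physics; physics.atm-clus Atomic and Molecular Clusters; physics.atom-ph Atomic Physics; physics.bio-ph Biological Physics; physics.chem-ph Chemical Physics; physics.class-ph Classical Physics; physics.comp-ph Computational Physics; physics.data-an Data Analysis; physics.ed-ph Physics Education; physics.flu-dyn Fluid Dynamics; physics.gen-ph General Physics; physics.geo-ph Geophysics; physics.hist-ph History and Philosophy of Physics; physics.ins-det Instrumentation and Detectors; physics.med-ph Medical Physics; physics.optics Optics; physics.plasm-ph Plasma Physics; physics.pop-ph Popular Physics; physics.soc-ph Physics and Society; physics.space-ph Space Physics; quant-ph Quantum Physics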

Q-BIO Quantitative Biology 11

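q-bio Quantitative Biology; q-bio.BM Biomolecules; q-bio.CB Cell Behavior; q-bio.GN Genomics; q-bio.MN Molecular Networks; q-bio.NC Neurons and Cognition; q-bio.OT Other Quantitative Biology; q-bio.PE Populations and Evolution; q-bio.QM Quantitative Methods; q-bio.SC Subcellular Processes; q-bio.TO Tissues and Organs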

Q-FIN Quantitative Finance 10

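q-fin Quantitative Finance; q-fin.CP Computational Finance; q-fin.EC Economics; q-fin.GN General Finance; q-fin.MF Mathematical Finance; q-fin.PM Portfolio Management; q-fin.PR Pricing of Securities; q-fin.RM Risk Management; q-fin.ST Statistical Finance; q-fin.TR Trading and Market Microstructure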

STAT Statistics 7

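stat Statistics; stat.AP Statistical Applications; stat.CO Statistical Computation; stat.ME Statistical Methodology; stat.ML Machine Learning; stat.OT Other Statistics; stat.TH Statistics Theory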

2502.10040 2026-02-24 cs.RO

Diffusion Trajectory-guided Policy for Long-horizon Robot Manipulation

Shichao Fan, Quantao Yang, Yajie Liu, Kun Wu, Zhengping Che, Qingjie Liu, Min Wan

Comments 8 pages, 5 figures, accepted to IEEE Robotics and Automation Letters (RAL)

Journal ref IEEE Robotics and Automation Letters (Volume: 10, Issue: 12, December 2025)

2502.09257 2026-02-24 cs.LG cs.AI stat.ML

From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards

Liad Erez, Tomer Koren

2502.08941 2026-02-24 cs.LG cs.AI

Analysis of Off-Policy $n$-Step TD-Learning with Linear Function Approximation

Han-Dong Lim, Donghwan Lee

Comments Added experiments for n-step PVI and n-step TD convergence/divergence

2502.05795 2026-02-24 cs.LG cs.AI

The Curse of Depth in Large Language Models

Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu

Comments Accepted by NeurIPS 2025

2502.04638 2026-02-24 cs.CV cs.AI

Learning Street View Representations with Spatiotemporal Contrast

Yong Li, Yingjing Huang, Gengchen Mai, Fan Zhang

DOI: 10.1016/j.compenvurbsys.2025.102393

Abstract (English)

Street view imagery is extensively utilized in representation learning for urban visual environments, supporting various sustainable development tasks such as environmental perception and socio-economic assessment. However, it is challenging for existing image representations to specifically encode the dynamic urban environment (such as pedestrians, vehicles, and vegetation), the built environment (including buildings, roads, and urban infrastructure), and the environmental ambiance (such as the cultural and socioeconomic atmosphere) depicted in street view imagery to address downstream tasks related to the city. In this work, we propose an innovative self-supervised learning framework that leverages temporal and spatial attributes of street view imagery to learn image representations of the dynamic urban environment for diverse downstream tasks. By employing street view images captured at the same location over time and spatially nearby views at the same time, we construct contrastive learning tasks designed to learn the temporal-invariant characteristics of the built environment and the spatial-invariant neighborhood ambiance. Our approach significantly outperforms traditional supervised and unsupervised methods in tasks such as visual place recognition, socioeconomic estimation, and human-environment perception. Moreover, we demonstrate the varying behaviors of image representations learned through different contrastive learning objectives across various downstream tasks. This study systematically discusses representation learning strategies for urban studies based on street view images, providing a benchmark that enhances the applicability of visual data in urban science. The code is available at https://github.com/yonglleee/UrbanSTCL.

2501.00339 2026-02-24 cs.CL cs.LG

GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression

Kainan Liu, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang, Jing Xiao

Comments EMNLP 2025(Main)

2412.17596 2026-02-24 cs.CL cs.AI

Evaluating LLMs' Divergent Thinking Capabilities for Scientific Idea Generation with Minimal Context

Kai Ruan, Xuan Wang, Jixiang Hong, Peng Wang, Yang Liu, Hao Sun

Comments Updated manuscript and title

2412.17052 2026-02-24 cs.AI

ViLBias: Detecting and Reasoning about Bias in Multimodal Content

Shaina Raza, Caesar Saleh, Azib Farooq, Emrul Hasan, Franklin Ogidi, Haad Zahid, Maximus Powers, Marcelo Lotif, Anam Zahid, Karanpal Sekhon, Veronica Chatrath, Roya Javedi, Vahid Reza Khazaie, Zhenyu Yu

Comments Under review

2412.13877 2026-02-24 cs.RO cs.AI

RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation

Kun Wu, Chengkai Hou, Jiaming Liu, Zhengping Che, Xiaozhu Ju, Zhuqin Yang, Meng Li, Yinuo Zhao, Zhiyuan Xu, Guang Yang, Shichao Fan, Xinhua Wang, Fei Liao, Zhen Zhao, Guangyu Li, Zhao Jin, Lecheng Wang, Jilei Mao, Ning Liu, Pei Ren, Qiang Zhang, Yaoxu Lyu, Mengzhen Liu, Jingyang He, Yulin Luo, Zeyu Gao, Chenxuan Li, Chenyang Gu, Yankai Fu, Di Wu, Xingyu Wang, Sixiang Chen, Zhenyu Wang, Pengju An, Siyuan Qian, Shanghang Zhang, Jian Tang

Comments 21 pages, 17 figures, Robotics: Science and Systems 2025

Journal ref Robotics: Science and Systems XXI (RSS 2025)

DOI: 10.15607/RSS.2025.XXI.152

Abstract (English)

In this paper, we introduce RoboMIND (Multi-embodiment Intelligence Normative Data for Robot Manipulation), a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robotic-related information, including multi-view observations, proprioceptive robot state information, and linguistic task descriptions. To ensure data consistency and reliability for imitation learning, RoboMIND is built on a unified data collection platform and a standardized protocol, covering four distinct robotic embodiments: the Franka Emika Panda, the UR5e, the AgileX dual-arm robot, and a humanoid robot with dual dexterous hands. Our dataset also includes 5k real-world failure demonstrations, each accompanied by detailed causes, enabling failure reflection and correction during policy learning. Additionally, we created a digital twin environment in the Isaac Sim simulator, replicating the real-world tasks and assets, which facilitates the low-cost collection of additional training data and enables efficient evaluation. To demonstrate the quality and diversity of our dataset, we conducted extensive experiments using various imitation learning methods for single-task settings and state-of-the-art Vision-Language-Action (VLA) models for multi-task scenarios. By leveraging RoboMIND, the VLA models achieved high manipulation success rates and demonstrated strong generalization capabilities. To the best of our knowledge, RoboMIND is the largest multi-embodiment teleoperation dataset collected on a unified platform, providing large-scale and high-quality robotic training data. Our project is at https://x-humanoid-robomind.github.io/.

2411.17411 2026-02-24 cs.AI cs.DM cs.LG math.CO

Advancing Uncertain Combinatorics through Graphization, Hyperization, and Uncertainization: Fuzzy, Neutrosophic, Soft, Rough, and Beyond

Takaaki Fujita, Florentin Smarandache

Comments 185 pages. Published as a book (1st Edition) in 2024. Publisher: Biblio Publishing. ISBN: 978-1-59973-861-1. Published as a book (2nd Edition) in 2026

2411.02770 2026-02-24 cs.LG math.PR stat.CO stat.ML

A spectral mixture representation of isotropic kernels with application to random Fourier features

Nicolas Langrené, Xavier Warin, Pierre Gruet

Comments 27 pages, 12 figures

2411.01685 2026-02-24 cs.LG cs.CY cs.DB

Reducing Biases in Record Matching Through Scores Calibration

Mohammad Hossein Moslemi, Mostafa Milani

2411.01574 2026-02-24 cs.AI

DELE: Deductive $\mathcal{EL}^{++}$ Embeddings for Knowledge Base Completion

Olga Mashkova, Fernando Zhapa-Camacho, Robert Hoehndorf

Comments Extended version of the paper "Enhancing Geometric Ontology Embeddings for $\mathcal{EL}^{++}$ with Negative Sampling and Deductive Closure Filtering" presented at NeSy 2024 conference, revised version

2410.13331 2026-02-24 cs.LG cs.AI

Improving Discrete Optimisation Via Decoupled Straight-Through Estimator

Rushi Shah, Mingyuan Yan, Michael Curtis Mozer, Dianbo Liu

2407.10590 2026-02-24 cs.CV

Deep-Learning-Based Markerless Pose Estimation Systems in Gait Analysis: DeepLabCut Custom Training and the Refinement Function

Giulia Panconi, Stefano Grasso, Sara Guarducci, Lorenzo Mucchi, Diego Minciacchi, Riccardo Bravi

Journal ref Sci Rep 15(1) (2025) 2364

2405.14504 2026-02-24 cs.CV cs.AI

Adaptive Runge-Kutta Dynamics for Spatiotemporal Prediction

Xuanle Zhao, Yue Sun, Ziyi Wang, Bo Xu, Tielin Zhang

Comments Accepted by ICASSP 2026

2404.16890 2026-02-24 cs.LG cs.AI

Layer Collapse Can be Induced by Unstructured Pruning

Zhu Liao, Victor Quétu, Van-Tam Nguyen, Enzo Tartaglione

2404.10652 2026-02-24 cs.CL

ViTextVQA: A Large-Scale Visual Question Answering Dataset and a Novel Multimodal Feature Fusion Method for Vietnamese Text Comprehension in Images

Quan Van Nguyen, Dan Quang Tran, Huy Quang Pham, Thang Kien-Bao Nguyen, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Comments International Journal of Expert Systems with Applications

2403.10996 2026-02-24 cs.RO cs.LG cs.MA

Mixed-Reality Digital Twins: Leveraging the Physical and Virtual Worlds for Hybrid Sim2Real Transition of Multi-Agent Reinforcement Learning Policies

Chinmay Vilas Samak, Tanmay Vilas Samak, Venkat Narayan Krovi

Comments Accepted in IEEE Robotics and Automation Letters (RA-L) and additionally accepted to be presented at IEEE International Conference on Robotics and Automation (ICRA) 2026

Journal ref IEEE Robotics and Automation Letters, vol. 10, no. 9, pp. 9040-9047, Sept. 2025

2403.00506 2026-02-24 cs.CL

PoTeC: A German Naturalistic Eye-tracking-while-reading Corpus

Deborah N. Jakobi, Thomas Kern, David R. Reich, Patrick Haller, Lena A. Jäger

Journal ref Behav Res 57, 211 (2025)

2402.13904 2026-02-24 cs.CL

Calibrating Large Language Models with Sample Consistency

Qing Lyu, Kumar Shridhar, Chaitanya Malaviya, Li Zhang, Yanai Elazar, Niket Tandon, Marianna Apidianaki, Mrinmaya Sachan, Chris Callison-Burch

Comments AAAI 2024

2402.08646 2026-02-24 cs.AI

Inference of Abstraction for a Unified Account of Symbolic Reasoning from Data

Hiroyuki Kido

2401.08957 2026-02-24 cs.RO cs.AI

Learning from Imperfect Demonstrations with Self-Supervision for Robotic Manipulation

Kun Wu, Ning Liu, Zhen Zhao, Di Qiu, Jinming Li, Zhengping Che, Zhiyuan Xu, Jian Tang

Comments 8 pages, 4 figures

Journal ref 2025 IEEE International Conference on Robotics and Automation (ICRA)

2310.01770 2026-02-24 cs.LG cs.AI

A simple connection from loss flatness to compressed neural representations

Shirui Chen, Stefano Recanatesi, Eric Shea-Brown

2306.03584 2026-02-24 cs.CV cs.AI

RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion

Haowen Wang, Zhengping Che, Yufan Yang, Mingyuan Wang, Zhiyuan Xu, Xiuquan Qiao, Mengshi Qi, Feifei Feng, Jian Tang

Comments Haowen Wang and Zhengping Che are with equal contributions. Paper accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). An earlier version has been accepted by CVPR 2022 (arXiv:2203.10856). arXiv admin note: text overlap with arXiv:2203.10856

Journal ref IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 46, Issue: 11, November 2024)

DOI: 10.1109/TPAMI.2024.3388004

Abstract (English)

Raw depth images captured in indoor scenarios frequently exhibit extensive missing values due to the inherent limitations of the sensors and environments. For example, transparent materials frequently elude detection by depth sensors; surfaces may introduce measurement inaccuracies due to their polished textures, extended distances, and oblique incidence angles from the sensor. The presence of incomplete depth maps imposes significant challenges for subsequent vision applications, prompting the development of numerous depth completion techniques to mitigate this problem. Numerous methods excel at reconstructing dense depth maps from sparse samples, but they often falter when faced with extensive contiguous regions of missing depth values, a prevalent and critical challenge in indoor environments. To overcome these challenges, we design a novel two-branch end-to-end fusion network named RDFC-GAN, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map. The first branch employs an encoder-decoder structure, by adhering to the Manhattan world assumption and utilizing normal maps from RGB-D information as guidance, to regress the local dense depth values from the raw depth map. The other branch applies an RGB-depth fusion CycleGAN, adept at translating RGB imagery into detailed, textured depth maps while ensuring high fidelity through cycle consistency. We fuse the two branches via adaptive fusion modules named W-AdaIN and train the model with the help of pseudo depth maps. Comprehensive evaluations on NYU-Depth V2 and SUN RGB-D datasets show that our method significantly enhances depth completion performance particularly in realistic indoor settings.

2305.11098 2026-02-24 cs.AI

A Simple Generative Model of Logical Reasoning and Statistical Learning

Hiroyuki Kido

2211.12817 2026-02-24 cs.CV cs.AI

Learning to See the Elephant in the Room: Self-Supervised Context Reasoning in Humans and AI

Xiao Liu, Soumick Sarker, Ankur Sikarwar, Bryan Atista Kiely, Gabriel Kreiman, Zenglin Shi, Mengmi Zhang

2210.11974 2026-02-24 cs.CV

Face Pyramid Vision Transformer

Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood

Comments Accepted in BMVC 2022

2103.00250 2026-02-24 cs.LG cs.CR

Effective Universal Unrestricted Adversarial Attacks using a MOE Approach

A. E. Baia, G. Di Bari, V. Poggioni

Journal ref Int Conf on the Applications of Evolutionary Computation, LNTCS, volume 12694, 2021

2602.19043 2026-02-24 cs.CL

Uncovering Context Reliance in Unstructured Knowledge Editing

Zisheng Zhou, Mengqi Zhang, Shiguang Wu, Xiaotian Ye, Chi Zhang, Zhumin Chen, Pengjie Ren

Comments 21 pages, 14 figures
