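arXivDaily, a daily academic digest: syncs the full arXiv dataset with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.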


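HOT: AI, Robotics, and More (9)

cs.AI Artificial Intelligence, cs.CV Computer Vision, cs.CL Natural Language Processing, cs.RO Robotics, cs.LG Machine Learning, cs.SD Sound, cs.ET Emerging Technologies, eess.AS Audio and Speech, eess.IV Image and Video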

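CS Computer Science (41)

cs Computer Science, cs.AI Artificial Intelligence, cs.AR Hardware Architecture, cs.CC Computational Complexity, cs.CE Computational Engineering, cs.CG Computational Geometry, cs.CL Natural Language Processing, cs.CR Cryptography and Security, cs.CV Computer Vision, cs.CY Computers and Society, cs.DB Databases, cs.DC Distributed Computing, cs.DL Digital Libraries, cs.DM Discrete Mathematics, cs.DS Data Structures, cs.ET Emerging Technologies, cs.FL Formal Languages, cs.GL General Literature, cs.GR Graphics, cs.GT Game Theory, cs.HC Human-Computer Interaction, cs.IR Information Retrieval, cs.IT Information Theory, cs.LG Machine Learning, cs.LO Logic in Computer Science, cs.MA Multiagent Systems, cs.MM Multimedia, cs.MS Mathematical Software, cs.NA Numerical Analysis, cs.NE Neural and Evolutionary Computing, cs.NI Networking and Internet Architecture, cs.OH Other Computer Science, cs.OS Operating Systems, cs.PF Performance, cs.PL Programming Languages, cs.RO Robotics, cs.SC Symbolic Computation, cs.SD Sound, cs.SE Software Engineering, cs.SI Social and Information Networks, cs.SY Systems and Control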

ECON Economics (4)

econ Economics, econ.EM Econometrics, econ.GN General Economics, econ.TH Theoretical Economics

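EESS Electrical Engineering and Systems Science (5)

eess Electrical Engineering and Systems Science, eess.AS Audio and Speech, eess.IV Image and Video, eess.SP Signal Processing, eess.SY Systems and Control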

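MATH Mathematics (33)

math Mathematics, math.AC Commutative Algebra, math.AG Algebraic Geometry, math.AP Partial Differential Equations, math.AT Algebraic Topology, math.CA Classical Analysis, math.CO Combinatorics, math.CT Category Theory, math.CV Complex Variables, math.DG Differential Geometry, math.DS Dynamical Systems, math.FA Functional Analysis, math.GM General Mathematics, math.GN General Topology, math.GR Group Theory, math.GT Geometric Topology, math.HO History and Overview, math.IT Information Theory, math.KT K-Theory, math.LO Logic, math.MG Metric Geometry, math.MP Mathematical Physics, math.NA Numerical Analysis, math.NT Number Theory, math.OA Operator Algebras, math.OC Optimization and Control, math.PR Probability, math.QA Quantum Algebra, math.RA Rings and Algebras, math.RT Representation Theory, math.SG Symplectic Geometry, math.SP Spectral Theory, math.ST Statistics Theory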

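PHYSICS Physics (55)

astro-ph Astrophysics, astro-ph.CO Cosmology, astro-ph.EP Earth and Planetary Astrophysics, astro-ph.GA Galaxies, astro-ph.HE High-Energy Astrophysics, astro-ph.IM Astronomical Instrumentation, astro-ph.SR Solar and Stellar Astrophysics, cond-mat Condensed Matter, cond-mat.dis-nn Disordered Systems and Neural Networks, cond-mat.mes-hall Mesoscale and Nanoscale Physics, cond-mat.mtrl-sci Materials Science, cond-mat.other Other Condensed Matter, cond-mat.quant-gas Quantum Gases, cond-mat.soft Soft Condensed Matter, cond-mat.stat-mech Statistical Mechanics, cond-mat.str-el Strongly Correlated Electrons, cond-mat.supr-con Superconductivity, gr-qc General Relativity, hep-ex High-Energy Experiment, hep-lat High-Energy Lattice, hep-ph High-Energy Phenomenology, hep-th High-Energy Theory, math-ph Mathematical Physics, nlin Nonlinear Sciences, nlin.AO Adaptive Systems, nlin.CD Chaotic Dynamics, nlin.CG Cellular Automata, nlin.PS Pattern Formation and Solitons, nlin.SI Integrable Systems, nucl-ex Nuclear Experiment, nucl-th Nuclear Theory, physics Physics, physics.acc-ph Accelerator Physics, physics.ao-ph Atmospheric and Oceanic Physics, physics.app-ph Applied Physics, physics.atm-clus Atomic and Molecular Clusters, physics.atom-ph Atomic Physics, physics.bio-ph Biological Physics, physics.chem-ph Chemical Physics, physics.class-ph Classical Physics, physics.comp-ph Computational Physics, physics.data-an Data Analysis, physics.ed-ph Physics Education, physics.flu-dyn Fluid Dynamics, physics.gen-ph General Physics, physics.geo-ph Geophysics, physics.hist-ph History and Philosophy of Physics, physics.ins-det Instrumentation and Detectors, physics.med-ph Medical Physics, physics.optics Optics, physics.plasm-ph Plasma Physics, physics.pop-ph Popular Physics, physics.soc-ph Physics and Society, physics.space-ph Space Physics, quant-ph Quantum Physics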

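Q-BIO Quantitative Biology (11)

q-bio Quantitative Biology, q-bio.BM Biomolecules, q-bio.CB Cell Behavior, q-bio.GN Genomics, q-bio.MN Molecular Networks, q-bio.NC Neurons and Cognition, q-bio.OT Other Quantitative Biology, q-bio.PE Populations and Evolution, q-bio.QM Quantitative Methods, q-bio.SC Subcellular Processes, q-bio.TO Tissues and Organs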

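Q-FIN Quantitative Finance (10)

q-fin Quantitative Finance, q-fin.CP Computational Finance, q-fin.EC Economics, q-fin.GN General Finance, q-fin.MF Mathematical Finance, q-fin.PM Portfolio Management, q-fin.PR Pricing of Securities, q-fin.RM Risk Management, q-fin.ST Statistical Finance, q-fin.TR Trading and Market Microstructure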

STAT Statistics (7)

stat Statistics, stat.AP Applications, stat.CO Computation, stat.ME Methodology, stat.ML Machine Learning, stat.OT Other Statistics, stat.TH Statistics Theory

2503.12087 2026-02-09 cs.CV

Temporally Consistent Mitral Annulus Measurements from Sparse Annotations in Echocardiographic Videos

Gino E. Jansen, Mark J. Schuuring, Berto J. Bouma, Ivana Išgum

Journal ref Proc. SPIE 13406 (2025) 192-200

2503.06558 2026-02-09 cs.LG stat.ML

Generative modelling with jump-diffusions

Adrian Baule

Comments New version contains: (i) A generalized score function in closed analytical form leading to the jump-Laplace (JL) model; (ii) Additional numerical experiments comparing JL ODE/SDE, Gaussian ODE, and Levy-Ito-Model SDE

2502.13064 2026-02-09 cs.SD

A Dual-Stage Time-Context Network for Speech-Based Alzheimer's Disease Detection

Yifan Gao, Long Guo, Hong Liu

2502.10273 2026-02-09 cs.CV cs.AI

Probing Perceptual Constancy in Large Vision-Language Models

Haoran Sun, Bingyang Wang, Suyang Yu, Yijiang Li, Qingying Gao, Haiyun Lyu, Lianyu Huang, Zelong Hong, Jiahui Ge, Qianli Ma, Hang He, Yifan Zhou, Lingzi Guo, Lantao Mei, Maijunxian Wang, Dezhi Luo, Hokin Deng

Comments Under Review

2502.01989 2026-02-09 cs.LG

VFScale: Intrinsic Reasoning through Verifier-Free Test-time Scalable Diffusion Model

Tao Zhang, Jia-Shu Pan, Ruiqi Feng, Tailin Wu

Comments ICLR 2026. 30 pages, 13 figures

2501.16093 2026-02-09 cs.CL cs.AI

STAR: Stepwise Task Augmentation with Relation Learning for Aspect Sentiment Quad Prediction

Wenna Lai, Haoran Xie, Guandong Xu, Qing Li

Comments 17 pages, 6 figures, and 7 tables

2501.07681 2026-02-09 cs.LG cs.CV math.OC stat.ML

Dataset Distillation as Pushforward Optimal Quantization

Hong Ye Tan, Emma Slade

Comments ICLR 2026, https://openreview.net/forum?id=FMSp8AUF3m

2412.13486 2026-02-09 cs.CV cs.CL cs.GR

T$^3$-S2S: Training-free Triplet Tuning for Sketch to Scene Synthesis in Controllable Concept Art Generation

Zhenhong Sun, Yifu Wang, Yonhon Ng, Yongzhi Xu, Daoyi Dong, Hongdong Li, Pan Ji

Comments https://openreview.net/forum?id=lyn2BgKQ8F

2411.18212 2026-02-09 cs.LG cs.AI cs.RO cs.SY eess.SY

SCoTT: Strategic Chain-of-Thought Tasking for Wireless-Aware Robot Navigation in Digital Twins

Aladin Djuhera, Amin Seffo, Vlad C. Andrei, Holger Boche, Walid Saad

Journal ref Asilomar Conference on Signals, Systems, and Computers, 2025

2411.08010 2026-02-09 cs.CL cs.AI

ExpressivityBench: Can LLMs Communicate Implicitly?

Joshua Tint, Som Sagar, Aditya Taparia, Kelly Raines, Bimsara Pathiraja, Caleb Liu, Ransalu Senanayake

Comments 21 pages, 7 figures

2411.04285 2026-02-09 cs.LG cs.AI

Robust Real-Time Mortality Prediction in the Intensive Care Unit using Temporal Difference Learning

Thomas Frost, Kezhi Li, Steve Harris

Comments To be published in the Proceedings of the 4th Machine Learning for Health symposium, Proceedings of Machine Learning Research (PMLR)

Journal ref Proc. Mach. Learn. Res. 259:350-363 (2025)

2410.09771 2026-02-09 cs.CV cs.AI cs.LG

EUGens: Efficient, Unified, and General Dense Layers

Sang Min Kim, Byeongchan Kim, Arijit Sehanobish, Somnath Basu Roy Chowdhury, Rahul Kidambi, Dongseok Shim, Avinava Dubey, Snigdha Chaturvedi, Min-hwan Oh, Krzysztof Choromanski

Comments NeurIPS 2025

2410.07790 2026-02-09 cs.CV

Enhancing Hyperspectral Image Prediction with Contrastive Learning in Low-Label Regime

Salma Haidar, José Oramas

Journal ref Applied Intelligence, vol. 56, article 74 (2026)


DOI: 10.1007/s10489-025-07071-3

Abstract

Self-supervised contrastive learning is an effective approach for addressing the challenge of limited labelled data. This study builds upon the previously established two-stage patch-level, multi-label classification method for hyperspectral remote sensing imagery. We evaluate the method's performance for both the single-label and multi-label classification tasks, particularly under scenarios of limited training data. The methodology unfolds in two stages. Initially, we focus on training an encoder and a projection network using a contrastive learning approach. This step is crucial for enhancing the ability of the encoder to discern patterns within the unlabelled data. Next, we employ the pre-trained encoder to guide the training of two distinct predictors: one for multi-label and another for single-label classification. Empirical results on four public datasets show that the predictors trained with our method perform better than those trained under fully supervised techniques. Notably, the performance is maintained even when the amount of training data is reduced by $50\%$. This advantage is consistent across both tasks. The method's effectiveness comes from its streamlined architecture. This design allows for retraining the encoder along with the predictor. As a result, the encoder becomes more adaptable to the features identified by the classifier, improving the overall classification performance. Qualitative analysis reveals the contrastive-learning-based encoder's capability to provide representations that allow separation among classes and identify location-based features despite not being explicitly trained for that. This observation indicates the method's potential in uncovering implicit spatial information within the data.
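The first stage trains an encoder and projection network with a contrastive objective. The abstract does not name the exact loss, so the sketch below assumes the common SimCLR-style NT-Xent loss over two augmented views of each patch; the function names and the temperature of 0.5 are illustrative assumptions, not the authors' settings.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def nt_xent_loss(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.5) -> float:
    """NT-Xent loss (assumed here): view_a[i] and view_b[i] embed two augmentations
    of patch i (a positive pair); every other embedding in the 2N batch is a negative."""
    assert len(view_a) == len(view_b) and view_a, "need matched, non-empty views"
    z = [normalize(v) for v in view_a + view_b]
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same patch
        # temperature-scaled cosine similarity to every other embedding
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        total += log_denominator - logits[positive]  # -log softmax of the positive
    return total / (2 * n)
```

When the two views of each patch point in similar directions, the loss is lower than when the views are mismatched, which is the signal that pushes the encoder to group related patches together.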


2410.04010 2026-02-09 cs.LG cs.AI cs.CL cs.NE

Hyperbolic Fine-Tuning for Large Language Models

Menglin Yang, Ram Samarth B B, Aosong Feng, Bo Xiong, Jihong Liu, Irwin King, Rex Ying

Comments NeurIPS 2025; https://github.com/marlin-codes/HypLoRA

2408.04567 2026-02-09 cs.CV cs.GR

Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches

Yongzhi Xu, Yonhon Ng, Yifu Wang, Inkyu Sa, Yunfei Duan, Zhenhong Sun, Yang Li, Pan Ji, Hongdong Li

Comments Project Page: https://xrvisionlabs.github.io/Sketch2Scene/ Code: https://github.com/Tencent/Triplet_Tuning

2407.19992 2026-02-09 cs.CV

High-Precision Edge Detection via Task-Adaptive Texture Handling and Ideal-Prior Guidance

Hao Shu

Comments 30 pages

2407.17835 2026-02-09 cs.LG math.CT math.DG math.MG

IsUMap: Manifold Learning and Data Visualization leveraging Vietoris-Rips filtrations

Lukas Silvester Barth, Fatemeh Fahimi, Parvaneh Joharinad, Jürgen Jost, Janis Keck

Journal ref Proceedings of the AAAI Conference on Artificial Intelligence, 39(17), 17699-17706 (2025)

2407.14069 2026-02-09 cs.CV

Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective

Zeen Song, Wenwen Qiang, Changwen Zheng, Hui Xiong, Gang Hua


Abstract

Video contrastive learning (V-CL) has emerged as a popular framework for unsupervised video representation learning, demonstrating strong results in tasks such as action classification and detection. Yet, to harness these benefits, it is critical for the learned representations to fully capture both static and dynamic semantics. However, our experiments show that existing V-CL methods fail to effectively learn either type of feature. Through a rigorous theoretical analysis based on the Structural Causal Model and gradient updates, we find that in a given dataset, certain static semantics consistently co-occur with specific dynamic semantics. This phenomenon creates spurious correlations between static and dynamic semantics in the dataset. However, existing V-CL methods do not differentiate static and dynamic similarities when computing sample similarity. As a result, learning only one type of semantics is sufficient for the model to minimize the contrastive loss. Ultimately, this causes the V-CL pre-training process to prioritize learning the easier-to-learn semantics. To address this limitation, we propose Bi-level Optimization with Decoupling for Video Contrastive Learning (BOD-VCL). In BOD-VCL, we model videos as linear dynamical systems based on Koopman theory. In this system, all frame-to-frame transitions are represented by a linear Koopman operator. By performing eigen-decomposition on this operator, we can separate time-variant and time-invariant components of semantics, which allows us to explicitly separate the static and dynamic semantics in the video. By modeling static and dynamic similarity separately, both types of semantics can be fully exploited during the V-CL training process. BOD-VCL can be seamlessly integrated into existing V-CL frameworks, and experimental results highlight the significant improvements achieved by our method.
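The Koopman step can be illustrated on toy per-frame feature vectors. This is a sketch under stated assumptions, not the authors' implementation: the operator is fitted by least squares in the style of dynamic mode decomposition, eigenvalues within a hypothetical tolerance `tol` of 1 are treated as time-invariant (static) modes, and every other mode is treated as time-varying (dynamic).

```python
import numpy as np


def fit_koopman(features: np.ndarray) -> np.ndarray:
    """Least-squares linear operator K such that x_{t+1} ≈ K x_t.

    features: array of shape (T, d), one feature vector per frame.
    """
    x_now, x_next = features[:-1], features[1:]  # shapes (T-1, d)
    # Solve x_now @ K.T ≈ x_next in the least-squares sense
    k_transposed, *_ = np.linalg.lstsq(x_now, x_next, rcond=None)
    return k_transposed.T


def split_static_dynamic(features: np.ndarray, tol: float = 1e-3):
    """Project frames onto eigenmodes with eigenvalue ≈ 1 (static) and the rest (dynamic)."""
    K = fit_koopman(features)
    eigvals, eigvecs = np.linalg.eig(K)
    static_mask = np.abs(eigvals - 1.0) < tol  # hypothetical threshold
    # Express each frame in the eigenbasis: x_t = V c_t  =>  c_t = V^{-1} x_t
    coeffs = np.linalg.solve(eigvecs, features.T.astype(complex))  # shape (d, T)
    static = (eigvecs[:, static_mask] @ coeffs[static_mask]).real.T
    dynamic = (eigvecs[:, ~static_mask] @ coeffs[~static_mask]).real.T
    return static, dynamic, eigvals
```

For a toy sequence whose first feature is constant and whose other two features rotate, the static part recovers the constant feature and the dynamic part holds the rotation; the two parts always sum back to the original frames.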


2407.06605 2026-02-09 cs.RO

Robust Meta-Learning of Vehicle Yaw Rate Dynamics via Conditional Neural Processes

Lars Ullrich, Andreas Völz, Knut Graichen

Comments Published in the 2023 62nd IEEE Conference on Decision and Control (CDC), Singapore, Singapore, December 13-15, 2023

Journal ref 2023 62nd IEEE Conference on Decision and Control (CDC), Singapore, Singapore, December 13-15, 2023, pp. 322-327

2405.16895 2026-02-09 cs.CV

Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation

Liang Shi, Jie Zhang, Shiguang Shan

Comments Accepted by IJCV

2404.09657 2026-02-09 cs.RO cs.LG

Sampling for Model Predictive Trajectory Planning in Autonomous Driving using Normalizing Flows

Georg Rabenstein, Lars Ullrich, Knut Graichen

Comments Accepted to be published as part of the 2024 IEEE Intelligent Vehicles Symposium (IV), Jeju Shinhwa World, Jeju Island, Korea, June 2-5, 2024

2402.09004 2026-02-09 cs.CV cs.LG

STAG: Structural Test-time Alignment of Gradients for Online Adaptation

Juhyeon Shin, Yujin Oh, Jonghyun Lee, Saehyung Lee, Minjun Park, Dongjun Lee, Uiwon Hwang, Sungroh Yoon

2106.10823 2026-02-09 cs.CV

3D Object Detection for Autonomous Driving: A Survey

Rui Qian, Xin Lai, Xirong Li

Comments The manuscript was accepted by Pattern Recognition on 14 May 2022

2104.10330 2026-02-09 cs.CV

BADet: Boundary-Aware 3D Object Detection from Point Clouds

Rui Qian, Xin Lai, Xirong Li

Comments The manuscript was accepted by Pattern Recognition on 6 Jan 2022


DOI: 10.1016/j.patcog.2022.108524

Abstract

Current state-of-the-art 3D object detectors follow a two-stage paradigm. These methods typically comprise two steps: 1) a region proposal network generates a handful of high-quality proposals in a bottom-up fashion; 2) the semantic features of the proposed regions are resized and pooled to summarize RoI-wise representations for further refinement. Note that the RoI-wise representations from step 2) are treated individually, as uncorrelated entries, when fed to the subsequent detection heads. Nevertheless, we observe that the proposals generated in step 1) are somewhat offset from the ground truth and tend to appear densely within local neighborhoods. Challenges arise when a proposal loses much of its boundary information because of this coordinate offset, since existing networks lack a mechanism to compensate for the missing information. In this paper, we propose BADet for 3D object detection from point clouds. Specifically, instead of refining each proposal independently as previous works do, we represent each proposal as a node for graph construction within a given cut-off threshold, associating proposals in the form of a local neighborhood graph so that the boundary correlations of an object are explicitly exploited. In addition, we devise a lightweight Region Feature Aggregation Module that fully exploits voxel-wise, pixel-wise, and point-wise features with expanding receptive fields for more informative RoI-wise representations. We validate BADet on both the widely used KITTI dataset and the highly challenging nuScenes dataset. As of Apr. 17th, 2021, BADet achieves on-par performance on the KITTI 3D detection leaderboard and ranks 1st in the Moderate difficulty of the Car category on the KITTI BEV detection leaderboard. The source code is available at https://github.com/rui-qian/BADet.
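The graph-construction idea (proposals become nodes, linked when they fall within a cut-off threshold) can be sketched as follows. The abstract does not specify the distance measure, so this sketch assumes Euclidean distance between box centers; `Proposal`, `build_proposal_graph`, and `cutoff` are hypothetical names, and the real BADet graph also carries features that a graph network then propagates.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Proposal:
    """A 3D box proposal; only its center is used to build the graph in this sketch."""
    x: float
    y: float
    z: float


def build_proposal_graph(proposals: List[Proposal], cutoff: float) -> Dict[int, List[int]]:
    """Return an adjacency list linking proposals whose centers lie within `cutoff`."""
    graph: Dict[int, List[int]] = {i: [] for i in range(len(proposals))}
    for i, a in enumerate(proposals):
        for j in range(i + 1, len(proposals)):
            b = proposals[j]
            dist = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
            if dist < cutoff:
                # undirected edge: neighboring proposals can share boundary cues
                graph[i].append(j)
                graph[j].append(i)
    return graph
```

The pairwise loop is quadratic in the number of proposals; with many proposals, a spatial index (for example a KD-tree radius query) would be the usual replacement.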


2602.06294 2026-02-09 cs.RO

Robots That Generate Planarity Through Geometry

Jakub F. Kowalewski, Abdulaziz O. Alrashed, Jacob Alpert, Rishi Ponnapalli, Lucas R. Meza, Jeffrey Ian Lipton

2602.06291 2026-02-09 cs.CL

Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

Guijin Son, Donghun Yang, Hitesh Laxmichand Patel, Hyunwoo Ko, Amit Agarwal, Sunghee Ahn, Kyong-Ha Lee, Youngjae Yu

Comments Preprint

2602.06287 2026-02-09 cs.LG cs.AI physics.ao-ph

Toward generative machine learning for boosting ensembles of climate simulations

Parsa Gooya, Reinel Sospedra-Alfonso, Johannes Exenberger

Comments SI_Toward_generative_machine_learning_for_boosting_the_ensembles_size_of_climate_simulation.pdf contains Supplementary Information

2602.06285 2026-02-09 cs.CV

MMEarth-Bench: Global Model Adaptation via Multimodal Test-Time Training

Lucia Gordon, Serge Belongie, Christian Igel, Nico Lang

2602.06273 2026-02-09 cs.RO cs.HC

A High-Fidelity Robotic Manipulator Teleoperation Framework for Human-Centered Augmented Reality Evaluation

Harsh Chhajed, Tian Guo

2602.06271 2026-02-09 cs.SD eess.AS

Misophonia Trigger Sound Detection on Synthetic Soundscapes Using a Hybrid Model with a Frozen Pre-Trained CNN and a Time-Series Module

Kurumi Sashida, Gouhei Tanaka

Comments 13 pages, 3 figures. Submitted to IJCNN 2026


Abstract

Misophonia is a disorder characterized by a decreased tolerance to specific everyday sounds (trigger sounds) that can evoke intense negative emotional responses such as anger, panic, or anxiety. These reactions can substantially impair daily functioning and quality of life. Assistive technologies that selectively detect trigger sounds could help reduce distress and improve well-being. In this study, we investigate sound event detection (SED) to localize intervals of trigger sounds in continuous environmental audio as a foundational step toward such assistive support. Motivated by the scarcity of real-world misophonia data, we generate synthetic soundscapes tailored to misophonia trigger sound detection using audio synthesis techniques. Then, we perform trigger sound detection tasks using hybrid CNN-based models. The models combine feature extraction using a frozen pre-trained CNN backbone with a trainable time-series module such as gated recurrent units (GRUs), long short-term memories (LSTMs), echo state networks (ESNs), and their bidirectional variants. The detection performance is evaluated using common SED metrics, including Polyphonic Sound Detection Score 1 (PSDS1). On the multi-class trigger SED task, bidirectional temporal modeling consistently improves detection performance, with Bidirectional GRU (BiGRU) achieving the best overall accuracy. Notably, the Bidirectional ESN (BiESN) attains competitive performance while requiring orders of magnitude fewer trainable parameters by optimizing only the readout. We further simulate user personalization via a few-shot "eating sound" detection task with at most five support clips, in which BiGRU and BiESN are compared. In this strict adaptation setting, BiESN shows robust and stable performance, suggesting that lightweight temporal modules are promising for personalized misophonia trigger SED.
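The BiESN result rests on the echo state network idea: the recurrent reservoir weights stay fixed and random, and only a linear readout is trained. The sketch below illustrates that idea on generic per-frame feature vectors standing in for the frozen CNN embeddings. The reservoir size, spectral radius, ridge penalty, and class and method names are assumptions for illustration, not the paper's configuration. The bidirectional variant runs a second reservoir over the reversed sequence and concatenates both state sequences before the readout.

```python
import numpy as np


class BiESN:
    """Bidirectional echo state network: fixed random reservoirs, trainable linear readout only."""

    def __init__(self, n_inputs: int, n_reservoir: int = 100, spectral_radius: float = 0.9,
                 ridge: float = 1e-2, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.ridge = ridge
        self.w_in, self.w_res = [], []
        for _ in range(2):  # one fixed reservoir per direction
            w = rng.normal(size=(n_reservoir, n_reservoir))
            # rescale so the largest eigenvalue magnitude equals spectral_radius
            w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
            self.w_res.append(w)
            self.w_in.append(rng.uniform(-1, 1, size=(n_reservoir, n_inputs)))
        self.w_out = None  # the only trained parameters

    def _run(self, u: np.ndarray, direction: int) -> np.ndarray:
        """Collect reservoir states for inputs u of shape (T, n_inputs)."""
        h = np.zeros(self.w_res[direction].shape[0])
        states = []
        for u_t in u:
            h = np.tanh(self.w_in[direction] @ u_t + self.w_res[direction] @ h)
            states.append(h)
        return np.array(states)

    def states(self, u: np.ndarray) -> np.ndarray:
        forward = self._run(u, 0)
        backward = self._run(u[::-1], 1)[::-1]  # re-align reversed states with time
        return np.hstack([forward, backward, np.ones((len(u), 1))])  # bias column

    def fit(self, u: np.ndarray, y: np.ndarray) -> None:
        """Ridge regression for the readout; y has shape (T, n_classes) of frame-level targets."""
        s = self.states(u)
        self.w_out = np.linalg.solve(s.T @ s + self.ridge * np.eye(s.shape[1]), s.T @ y)

    def predict(self, u: np.ndarray) -> np.ndarray:
        return self.states(u) @ self.w_out
```

Only `w_out` is learned, so the trainable parameter count is (2 × n_reservoir + 1) × n_classes, which is why an ESN readout can be far smaller than a fully trained GRU of similar width.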

