arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

检索范围排序方式

检索时间范围

重置

HOT 人工智能、机器人等 9

cs.AI 人工智能 cs.CV 计算机视觉 cs.CL 自然语言处理 cs.RO 机器人 cs.LG 机器学习 cs.SD 声音 cs.ET 新兴技术 eess.AS 音频语音 eess.IV 图像视频

CS 计算机 41

cs 计算机 cs.AI 人工智能 cs.AR 硬件架构 cs.CC 计算复杂性 cs.CE 计算工程 cs.CG 计算几何 cs.CL 自然语言处理 cs.CR 密码安全 cs.CV 计算机视觉 cs.CY 计算机与社会 cs.DB 数据库 cs.DC 分布式计算 cs.DL 数字图书馆 cs.DM 离散数学 cs.DS 数据结构 cs.ET 新兴技术 cs.FL 形式语言 cs.GL 综述文献 cs.GR 图形学 cs.GT 博弈论 cs.HC 人机交互 cs.IR 信息检索 cs.IT 信息论 cs.LG 机器学习 cs.LO 计算机逻辑 cs.MA 多智能体 cs.MM 多媒体 cs.MS 数学软件 cs.NA 数值分析 cs.NE 神经进化 cs.NI 网络架构 cs.OH 其他计算机 cs.OS 操作系统 cs.PF 性能 cs.PL 编程语言 cs.RO 机器人 cs.SC 符号计算 cs.SD 声音 cs.SE 软件工程 cs.SI 社会信息网络 cs.SY 系统控制

ECON 经济学 4

econ 经济学 econ.EM 计量经济 econ.GN 一般经济 econ.TH 理论经济

EESS 电气与系统 5

eess 电气与系统 eess.AS 音频语音 eess.IV 图像视频 eess.SP 信号处理 eess.SY 系统控制

MATH 数学 33

math 数学 math.AC 交换代数 math.AG 代数几何 math.AP 偏微分方程 math.AT 代数拓扑 math.CA 经典分析 math.CO 组合数学 math.CT 范畴论 math.CV 复变函数 math.DG 微分几何 math.DS 动力系统 math.FA 泛函分析 math.GM 一般数学 math.GN 一般拓扑 math.GR 群论 math.GT 几何拓扑 math.HO 历史综述 math.IT 信息论 math.KT K理论 math.LO 逻辑 math.MG 度量几何 math.MP 数学物理 math.NA 数值分析 math.NT 数论 math.OA 算子代数 math.OC 优化控制 math.PR 概率 math.QA 量子代数 math.RA 环与代数 math.RT 表示论 math.SG 辛几何 math.SP 谱理论 math.ST 统计理论

PHYSICS 物理 55

astro-ph 天体物理 astro-ph.CO 宇宙学 astro-ph.EP 地球行星 astro-ph.GA 星系物理 astro-ph.HE 高能天体 astro-ph.IM 天文仪器 astro-ph.SR 太阳恒星 cond-mat 凝聚态 cond-mat.dis-nn 无序神经 cond-mat.mes-hall 介观纳米 cond-mat.mtrl-sci 材料科学 cond-mat.other 其他凝聚态 cond-mat.quant-gas 量子气体 cond-mat.soft 软凝聚态 cond-mat.stat-mech 统计力学 cond-mat.str-el 强关联电子 cond-mat.supr-con 超导 gr-qc 广义相对论 hep-ex 高能实验 hep-lat 格点高能 hep-ph 高能唯象 hep-th 高能理论 math-ph 数学物理 nlin 非线性科学 nlin.AO 自适应系统 nlin.CD 混沌动力学 nlin.CG 胞自动机 nlin.PS 斑图孤子 nlin.SI 可积系统 nucl-ex 核物理实验 nucl-th 核物理理论 physics 物理 physics.acc-ph 加速器物理 physics.ao-ph 大气海洋 physics.app-ph 应用物理 physics.atm-clus 原子分子团簇 physics.atom-ph 原子物理 physics.bio-ph 生物物理 physics.chem-ph 化学物理 physics.class-ph 经典物理 physics.comp-ph 计算物理 physics.data-an 数据分析 physics.ed-ph 物理教育 physics.flu-dyn 流体动力学 physics.gen-ph 普通物理 physics.geo-ph 地球物理 physics.hist-ph 物理史哲 physics.ins-det 仪器探测 physics.med-ph 医学物理 physics.optics 光学 physics.plasm-ph 等离子体 physics.pop-ph 科普物理 physics.soc-ph 物理与社会 physics.space-ph 空间物理 quant-ph 量子物理

Q-BIO 定量生物 11

q-bio 定量生物 q-bio.BM 生物分子 q-bio.CB 细胞行为 q-bio.GN 基因组学 q-bio.MN 分子网络 q-bio.NC 神经认知 q-bio.OT 其他定量生物 q-bio.PE 种群进化 q-bio.QM 定量方法 q-bio.SC 亚细胞过程 q-bio.TO 组织器官

Q-FIN 定量金融 10

q-fin 定量金融 q-fin.CP 计算金融 q-fin.EC 经济学 q-fin.GN 一般金融 q-fin.MF 数学金融 q-fin.PM 投资组合 q-fin.PR 证券定价 q-fin.RM 风险管理 q-fin.ST 统计金融 q-fin.TR 交易微观结构

STAT 统计 7

stat 统计 stat.AP 统计应用 stat.CO 统计计算 stat.ME 统计方法 stat.ML 机器学习 stat.OT 其他统计 stat.TH 统计理论

2510.03346 2026-02-24 cs.LG cs.AI cs.MA

KVComm: Enabling Efficient LLM Communication through Selective KV Sharing

Xiangyu Shi, Marco Chiesa, Gerald Q. Maguire, Dejan Kostic

Comments ICLR 2026

2510.02808 2026-02-24 cs.RO cs.SY eess.SY

Assist-as-needed Control for FES in Foot Drop Management

Andreas Christou, Elliot Lister, Georgia Andreopoulou, Don Mahad, Sethu Vijayakumar

2510.01650 2026-02-24 cs.LG cs.AI

The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM

Kwanhee Lee, Hyeondo Jang, Dongyeop Lee, Dan Alistarh, Namhoon Lee

Comments ICLR 2026

2510.00523 2026-02-24 cs.AI cs.CV

VIRTUE: Visual-Interactive Text-Image Universal Embedder

Wei-Yao Wang, Kazuya Tateishi, Qiyu Wu, Shusuke Takahashi, Yuki Mitsufuji

Comments ICLR 2026. 25 pages. Project page: https://sony.github.io/virtue/

2509.26308 2026-02-24 cs.RO

Anomaly detection for generic failure monitoring in robotic assembly, screwing and manipulation

Niklas Grambow, Lisa-Marie Fenner, Felipe Kempkes, Philip Hotz, Dingyuan Wan, Jörg Krüger, Kevin Haninger

Comments 8 pages, 5 figures, 4 tables, the paper has been accepted for publication in the IEEE Robotics and Automation Letters

详情

DOI: 10.1109/LRA.2026.3664647

英文摘要

Out-of-distribution states in robot manipulation often lead to unpredictable robot behavior or task failure, limiting success rates and increasing risk of damage. Anomaly detection (AD) can identify deviations from expected patterns in data, which can be used to trigger failsafe behaviors and recovery strategies. Prior work has applied data-driven AD on time series data for specific robotic tasks, however the transferability of an AD approach between different robot control strategies and task types has not been shown. Leveraging time series data, such as force/torque signals, allows to directly capture robot-environment interactions, crucial for manipulation and online failure detection. As robotic tasks can have widely signal characteristics and requirements, AD methods which can be applied in the same way to a wide range of tasks is needed, ideally with good data efficiency. We examine three industrial robotic tasks, robotic cabling, screwing, and sanding, each with multi-modal time series data and several anomalies. Several autoencoderbased methods are compared, and we evaluate the generalization across different robotic tasks and control methods (diffusion policy-, position-, and impedance-controlled). This allows us to validate the integration of AD in complex tasks involving tighter tolerances and variation from both the robot and its environment. Additionally, we evaluate data efficiency, detection latency, and task characteristics which support robust detection. The results indicate reliable detection with AUROC exceeding 0.96 in failures in the cabling and screwing task, such as incorrect or misaligned parts and obstructed targets. In the polishing task, only severe failures were reliably detected, while more subtle failure types remained undetected.

URL PDF HTML ☆

赞 0 踩 0

2509.26287 2026-02-24 cs.CV cs.LG

Flower: A Flow-Matching Solver for Inverse Problems

Mehrsa Pourya, Bassam El Rawas, Michael Unser

2509.24228 2026-02-24 cs.LG

Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms

Wei Wang, Dong-Dong Wu, Ming Li, Jingxiong Zhang, Gang Niu, Masashi Sugiyama

Comments ICLR 2026

2509.23106 2026-02-24 cs.LG

Effective Quantization of Muon Optimizer States

Aman Gupta, Rafael Celente, Abhishek Shivanna, D. T. Braithwaite, Gregory Dexter, Shao Tang, Hiroto Udagawa, Daniel Silva, Rohan Ramanath, S. Sathiya Keerthi

Comments 16 pages

2509.22876 2026-02-24 cs.CL cs.LG

HEART: Emotionally-Driven Test-Time Scaling of Language Models

Gabriela Pinto, Palash Goyal, Mihir Parmar, Yiwen Song, Souradip Chakraborty, Zifeng Wang, Jinsung Yoon, Tomas Pfister, Hamid Palangi

2509.22387 2026-02-24 cs.LG cs.AI cs.GT

SpinGPT: A Large-Language-Model Approach to Playing Poker Correctly

Narada Maugin, Tristan Cazenave

Comments Accepted at Advances in Computer Games (ACG) 2025, LNCS (Springer)

2509.21990 2026-02-24 cs.CV cs.SD

WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM

Changli Tang, Qinfan Xiao, Ke Mei, Tianyi Wang, Fengyun Rao, Chao Zhang

2509.21896 2026-02-24 cs.AI

GenesisGeo: Technical Report

Minfeng Zhu, Zi Wang, Sizhe Ji, Zhengtong Du, Shengqiang Tai, Junming Ke, Xiao Deng, Zanlang Yin, Xiuqi Huang, Heyu Wang, Wei Chen

2509.21730 2026-02-24 cs.CL

ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation

Jiho Kim, Junseong Choi, Woosog Chay, Daeun Kyung, Yeonsu Kwon, Yohan Jo, Edward Choi

Comments Accepted at ICLR 2026

2509.18949 2026-02-24 cs.LG cs.AI

Towards Privacy-Aware Bayesian Networks: A Credal Approach

Niccolò Rocchi, Fabio Stella, Cassio de Campos

Comments Accepted at ECAI2025 conference, 20 pages, 1 figure

详情

DOI: 10.3233/faia251419

英文摘要

Bayesian networks (BN) are probabilistic graphical models that enable efficient knowledge representation and inference. These have proven effective across diverse domains, including healthcare, bioinformatics and economics. The structure and parameters of a BN can be obtained by domain experts or directly learned from available data. However, as privacy concerns escalate, it becomes increasingly critical for publicly released models to safeguard sensitive information in training data. Typically, released models do not prioritize privacy by design. In particular, tracing attacks from adversaries can combine the released BN with auxiliary data to determine whether specific individuals belong to the data from which the BN was learned. State-of-the-art protection tecniques involve introducing noise into the learned parameters. While this offers robust protection against tracing attacks, it significantly impacts the model's utility, in terms of both the significance and accuracy of the resulting inferences. Hence, high privacy may be attained at the cost of releasing a possibly ineffective model. This paper introduces credal networks (CN) as a novel solution for balancing the model's privacy and utility. After adapting the notion of tracing attacks, we demonstrate that a CN enables the masking of the learned BN, thereby reducing the probability of successful attacks. As CNs are obfuscated but not noisy versions of BNs, they can achieve meaningful inferences while safeguarding privacy. Moreover, we identify key learning information that must be concealed to prevent attackers from recovering the underlying BN. Finally, we conduct a set of numerical experiments to analyze how privacy gains can be modulated by tuning the CN hyperparameters. Our results confirm that CNs provide a principled, practical, and effective approach towards the development of privacy-aware probabilistic graphical models.

URL PDF HTML ☆

赞 0 踩 0

2509.15886 2026-02-24 cs.CV

RangeSAM: On the Potential of Visual Foundation Models for Range-View represented LiDAR segmentation

Paul Julius Kühn, Duc Anh Nguyen, Arjan Kuijper, Saptarshi Neil Sinha

详情

英文摘要

Point cloud segmentation is central to autonomous driving and 3D scene understanding. While voxel- and point-based methods dominate recent research due to their compatibility with deep architectures and ability to capture fine-grained geometry, they often incur high computational cost, irregular memory access, and limited real-time efficiency. In contrast, range-view methods, though relatively underexplored - can leverage mature 2D semantic segmentation techniques for fast and accurate predictions. Motivated by the rapid progress in Visual Foundation Models (VFMs) for captioning, zero-shot recognition, and multimodal tasks, we investigate whether SAM2, the current state-of-the-art VFM for segmentation tasks, can serve as a strong backbone for LiDAR point cloud segmentation in the range view. We present , to our knowledge, the first range-view framework that adapts SAM2 to 3D segmentation, coupling efficient 2D feature extraction with standard projection/back-projection to operate on point clouds. To optimize SAM2 for range-view representations, we implement several architectural modifications to the encoder: (1) a novel module that emphasizes horizontal spatial dependencies inherent in LiDAR range images, (2) a customized configuration of tailored to the geometric properties of spherical projections, and (3) an adapted mechanism in the encoder backbone specifically designed to capture the unique spatial patterns and discontinuities present in range-view pseudo-images. Our approach achieves competitive performance on SemanticKITTI while benefiting from the speed, scalability, and deployment simplicity of 2D-centric pipelines. This work highlights the viability of VFMs as general-purpose backbones for 3D perception and opens a path toward unified, foundation-model-driven LiDAR segmentation. Results lets us conclude that range-view segmentation methods using VFMs leads to promising results.

URL PDF HTML ☆

赞 0 踩 0

2509.12746 2026-02-24 cs.CV

Modelling and analysis of the 8 filters from the "master key filters hypothesis" for depthwise-separable deep networks in relation to idealized receptive fields based on scale-space theory

Tony Lindeberg, Zahra Babaiee, Peyman M. Kiasari

Comments 25 pages, 5 figures, 11 tables

详情

英文摘要

This paper presents the results of analysing and modelling a set of 8 ``master key filters'', which have been extracted by applying a clustering approach to the receptive fields learned in depthwise-separable deep networks based on the ConvNeXt architecture. For this purpose, we first compute spatial spread measures in terms of weighted mean values and weighted variances of the absolute values of the learned filters, which support the working hypotheses that: (i) the learned filters can be modelled by separable filtering operations over the spatial domain, and that (ii) the spatial offsets of the those learned filters that are non-centered are rather close to half a grid unit. Then, we model the clustered ``master key filters'' in terms of difference operators applied to a spatial smoothing operation in terms of the discrete analogue of the Gaussian kernel, and demonstrate that the resulting idealized models of the receptive fields show good qualitative similarity to the learned filters. This modelling is performed in two different ways: (i) using possibly different values of the scale parameters in the coordinate directions for each filter, and (ii) using the same value of the scale parameter in both coordinate directions. Then, we perform the actual model fitting by either (i) requiring spatial spread measures in terms of spatial variances of the absolute values of the receptive fields to be equal, or (ii) minimizing the discrete $l_1$- or $l_2$-norms between the idealized receptive field models and the learned filters. Complementary experimental results then demonstrate the idealized models of receptive fields have good predictive properties for replacing the learned filters by idealized filters in depthwise-separable deep networks, thus showing that the learned filters in depthwise-separable deep networks can be well approximated by discrete scale-space filters.

URL PDF HTML ☆

赞 0 踩 0

2509.06685 2026-02-24 cs.CV

MOGS: Monocular Object-guided Gaussian Splatting in Large Scenes

Shengkai Zhang, Yuhe Liu, Jianhua He, Xuedou Xiao, Mozi Chen, Kezhong Liu

2509.04575 2026-02-24 cs.LG

Bootstrapping Task Spaces for Self-Improvement

Minqi Jiang, Andrei Lupu, Yoram Bachrach

Comments TMLR, February 2026

2509.03830 2026-02-24 cs.AI cs.CV cs.CY

Decoding Tourist Perception in Historic Urban Quarters with Multimodal Social Media Data: An AI-Based Framework and Evidence from Shanghai

Kaizhen Tan, Yufan Wu, Yuxuan Liu, Haoran Zeng

2508.14734 2026-02-24 cs.LG cs.AI

AFABench: A Generic Framework for Benchmarking Active Feature Acquisition

Valter Schütz, Han Wu, Reza Rezvan, Linus Aronsson, Morteza Haghir Chehreghani

2508.11915 2026-02-24 cs.CL cs.AI cs.LG

CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures

Punya Syon Pandey, Yongjin Yang, Jiarui Liu, Zhijing Jin

Comments EACL 2026 (Main)

2508.08540 2026-02-24 cs.LG

Biased Local SGD for Efficient Deep Learning on Heterogeneous Systems

Jihyun Lim, Junhyuk Jo, Chanhyeok Ko, Young Min Go, Jimin Hwa, Sunwoo Lee

2508.06199 2026-02-24 cs.LG cs.AI

Benchmarking Pretrained Molecular Embedding Models For Molecular Representation Learning

Mateusz Praski, Jakub Adamczyk, Wojciech Czech

2508.03111 2026-02-24 cs.LG cs.AI

GEDAN: Learning the Edit Costs for Graph Edit Distance

Francesco Leonardi, Markus Orsi, Jean-Louis Reymond, Kaspar Riesen

2507.23465 2026-02-24 cs.CL cs.AI

Role-Aware Language Models for Secure and Contextualized Access Control in Organizations

Saeed Almheiri, Yerulan Kongrat, Adrian Santosh, Ruslan Tasmukhanov, Josemaria Loza Vera, Muhammad Dehan Al Kautsar, Fajri Koto

Comments AACL 2025 - Main

2507.19418 2026-02-24 cs.CV cs.AI eess.IV

DEFNet: Multitasks-based Deep Evidential Fusion Network for Blind Image Quality Assessment

Yiwei Lou, Yuanpeng He, Rongchao Zhang, Yongzhi Cao, Hanpin Wang, Yu Huang

2507.11230 2026-02-24 cs.CL

Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages

Lyzander Marciano Andrylie, Inaya Rahmanisa, Mahardika Krisna Ihsani, Alfan Farizki Wicaksono, Haryo Akbarianto Wibowo, Alham Fikri Aji

2507.10065 2026-02-24 cs.CV

MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second

Chenguo Lin, Yuchen Lin, Panwang Pan, Yifan Yu, Tao Hu, Honglei Yan, Katerina Fragkiadaki, Yadong Mu

Comments Project page: https://chenguolin.github.io/projects/MoVieS; Accepted to CVPR 2026

2507.05992 2026-02-24 cs.CV cs.AI

Exploring Partial Multi-Label Learning via Integrating Semantic Co-occurrence Knowledge

Xin Wu, Fei Teng, Yue Feng, Kaibo Shi, Zhuosheng Lin, Ji Zhang, James Wang

Comments Accepted by IEEE Transactions on Multimedia

2507.04446 2026-02-24 cs.LG

Sampling-aware Adversarial Attacks Against Large Language Models

Tim Beyer, Yan Scholten, Leo Schwinn, Stephan Günnemann

AI 大模型

视觉与机器人

科学与医疗

KVComm: Enabling Efficient LLM Communication through Selective KV Sharing

Assist-as-needed Control for FES in Foot Drop Management

The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM

VIRTUE: Visual-Interactive Text-Image Universal Embedder

Anomaly detection for generic failure monitoring in robotic assembly, screwing and manipulation

Flower: A Flow-Matching Solver for Inverse Problems

Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms

Effective Quantization of Muon Optimizer States

HEART: Emotionally-Driven Test-Time Scaling of Language Models

SpinGPT: A Large-Language-Model Approach to Playing Poker Correctly

WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM

GenesisGeo: Technical Report

ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation

Towards Privacy-Aware Bayesian Networks: A Credal Approach

RangeSAM: On the Potential of Visual Foundation Models for Range-View represented LiDAR segmentation

Modelling and analysis of the 8 filters from the "master key filters hypothesis" for depthwise-separable deep networks in relation to idealized receptive fields based on scale-space theory

MOGS: Monocular Object-guided Gaussian Splatting in Large Scenes

Bootstrapping Task Spaces for Self-Improvement

Decoding Tourist Perception in Historic Urban Quarters with Multimodal Social Media Data: An AI-Based Framework and Evidence from Shanghai

AFABench: A Generic Framework for Benchmarking Active Feature Acquisition

CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures

Biased Local SGD for Efficient Deep Learning on Heterogeneous Systems

Benchmarking Pretrained Molecular Embedding Models For Molecular Representation Learning

GEDAN: Learning the Edit Costs for Graph Edit Distance

Role-Aware Language Models for Secure and Contextualized Access Control in Organizations

DEFNet: Multitasks-based Deep Evidential Fusion Network for Blind Image Quality Assessment

Sparse Autoencoders Can Capture Language-Specific Concepts Across Diverse Languages

MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second

Exploring Partial Multi-Label Learning via Integrating Semantic Co-occurrence Knowledge

Sampling-aware Adversarial Attacks Against Large Language Models