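arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.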

2504.02519 2026-03-24 cs.CV cs.LG

Tiny Neural Networks for Multi-Object Tracking in a Modular Kalman Framework

Christian Alexander Holz, Christian Bader, Markus Enzweiler, Matthias Drüppel

Abstract

We present a modular, production-ready approach that integrates compact neural networks (NNs) into a Kalman-filter-based Multi-Object Tracking (MOT) pipeline. We design three tiny task-specific networks to retain modularity, interpretability and real-time suitability for embedded Automotive Driver Assistance Systems: (i) SPENT (Single-Prediction Network) - predicts per-track states and replaces heuristic motion models used by the Kalman Filter (KF). (ii) SANT (Single-Association Network) - assigns a single incoming sensor object to existing tracks, without relying on heuristic distance and association metrics. (iii) MANTa (Multi-Association Network) - jointly associates multiple sensor objects to multiple tracks in a single step. Each module has fewer than 50k trainable parameters. Furthermore, all three can be operated in real time, are trained from tracking data, and expose modular interfaces so they can be integrated with standard Kalman-filter state updates and track management. This makes them drop-in compatible with many existing trackers. Modularity is ensured, as each network can be trained and evaluated independently of the others. Our evaluation on the KITTI tracking benchmark shows that SPENT reduces prediction RMSE by more than 50% compared to a standard Kalman filter, while SANT and MANTa achieve up to 95% assignment accuracy. These results demonstrate that small, task-specific neural modules can substantially improve tracking accuracy and robustness without sacrificing modularity, interpretability, or the real-time constraints required for automotive deployment.
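For orientation, the kind of hand-crafted baseline that the abstract says SPENT and SANT replace can be sketched as a constant-velocity prediction followed by a distance-gated nearest-neighbour association. This is only an illustrative sketch: the abstract does not specify the baseline, so the constant-velocity model, the Euclidean gate, the `gate` value, and the function names `predict_cv` and `associate_nearest` are all assumptions, and the covariance update of a full Kalman filter is omitted.

```python
import math

def predict_cv(state, dt):
    """Constant-velocity prediction for a state (x, y, vx, vy).

    Assumed heuristic motion model; not taken from the paper.
    """
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)

def associate_nearest(detection, predicted_tracks, gate=2.0):
    """Return the index of the closest predicted track within `gate`, else None.

    `gate` is a hypothetical distance threshold in the same units as x and y.
    """
    best_idx, best_dist = None, gate
    for idx, (x, y, _, _) in enumerate(predicted_tracks):
        dist = math.hypot(detection[0] - x, detection[1] - y)
        if dist <= best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

# Two toy tracks, advanced by one 0.1 s step, then matched to one detection.
tracks = [(0.0, 0.0, 1.0, 0.0), (10.0, 5.0, 0.0, -1.0)]
predicted = [predict_cv(t, dt=0.1) for t in tracks]
match = associate_nearest((10.1, 4.8), predicted)
```

In the modular design the abstract describes, a learned predictor would stand in for `predict_cv` and a learned association module for `associate_nearest`, while the Kalman state update and track management stay unchanged.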

2503.18007 2026-03-24 cs.CV

SymmCompletion: High-Fidelity and High-Consistency Point Cloud Completion with Symmetry Guidance

Hongyu Yan, Zijun Li, Kunming Luo, Li Lu, Ping Tan

Comments: Accepted by AAAI 2025 (Oral presentation), Code: https://github.com/HongyuYann/SymmCompletion

2503.13617 2026-03-24 cs.CV

Let Synthetic Data Shine: Domain Reassembly and Soft-Fusion for Single Domain Generalization

Hao Li, Yubin Xiao, Ke Liang, Mengzhu Wang, Long Lan, Kenli Li, Xinwang Liu

Comments: 26 pages, 10 figures (Accepted by IJCV)

2503.13074 2026-03-24 cs.CV

Bridging the Perception Gap in Image Super-Resolution Evaluation

Shaolin Su, Josep M. Rocafort, Danna Xue, David Serrano-Lozano, Lei Sun, Javier Vazquez-Corral

Comments: Accepted to CVPR 2026

2503.03110 2026-03-24 cs.LG cs.CV

WarmFed: Federated Learning with Warm-Start for Globalization and Personalization Via Personalized Diffusion Models

Tao Feng, Jie Zhang, Xiangjian Li, Rong Huang, Huashan Liu, Zhijie Wang

2503.01886 2026-03-24 cs.CL cs.AI q-fin.RM

Advanced Deep Learning Techniques for Analyzing Earnings Call Transcripts: Methodologies and Applications

Umair Zakir, Evan Daykin, Amssatou Diagne, Jacob Faile

2502.13777 2026-03-24 cs.LG eess.SP

Herglotz-NET: Implicit Neural Representation of Spherical Data with Harmonic Positional Encoding

Théo Hanon, Nicolas Mil-Homens Cavaco, John Kiely, Laurent Jacques

Comments: Keywords: Herglotz, spherical harmonics, spectral analysis, implicit neural representation. Remarks: 4 pages + 1 reference page, 4 figures (In Proc. SAMPTA2025, Vienna)

2502.09125 2026-03-24 cs.CV cs.AI

Enhanced Structured Lasso Pruning with Class-wise Information

Xiang Liu, Mingchen Li, Xia Li, Leigang Qu, Guansu Wang, Zifan Peng, Yijun Song, Zemin Liu, Linshan Jiang, Jialin Li

Comments: 11 pages, 3 figures

2502.00931 2026-03-24 cs.RO cs.CV

VL-Nav: A Neuro-Symbolic Approach for Reasoning-based Vision-Language Navigation

Yi Du, Taimeng Fu, Zhipeng Zhao, Shaoshu Su, Zitong Zhan, Qiwei Du, Zhuoqun Chen, Bowen Li, Chen Wang

2412.11590 2026-03-24 cs.RO cs.SY eess.SY

A Real-Time System for Scheduling and Managing UAV Delivery in Urban Areas

Han Liu, Tian Liu, Kai Huang

Comments: ROBIO 2025

2412.09686 2026-03-24 cs.LG

The Cost of Replicability in Active Learning

Rupkatha Hira, Dominik Kau, Jessica Sorrell

2412.08757 2026-03-24 cs.RO

Vision-based indoor localization of nano drones in controlled environment with its applications

Simranjeet Singh, Amit Kumar, Fayyaz Pocker Chemban, Vikrant Fernandes, Lohit Penubaku, Kavi Arya

Comments: 26 pages. Submitted to Cyber-Physical Systems journal

2412.01583 2026-03-24 cs.CV

3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting

Ziyang Yan, Yihua Shao, Minwen Liao, Siyu Chen, Nan Wang, Muyuan Lin, Jenq-Neng Hwang, Hao Zhao, Fabio Remondino, Lei Li

Comments: Accepted by WACV 2026, Project Page: https://ziyangyan.github.io/3DSceneEditor

2411.17292 2026-03-24 cs.CV cs.LG

TPCL: Task Progressive Curriculum Learning for Robust Visual Question Answering

Ahmed Akl, Abdelwahed Khamis, Zhe Wang, Ali Cheraghian, Sara Khalifa, Kewen Wang

Comments: Our source code is available at https://github.com/AhmedAAkl/tpcl

2411.10495 2026-03-24 cs.CV

Training-Free Layout-to-Image Generation with Marginal Attention Constraints

Huancheng Chen, Jingtao Li, Weiming Zhuang, Haris Vikalo, Lingjuan Lyu

2411.04822 2026-03-24 cs.CL

Shared Heritage, Distinct Writing: Rethinking Resource Selection for East Asian Historical Documents

Seyoung Song, Haneul Yoo, Jiho Jin, Kyunghyun Cho, Alice Oh

Comments: IJCNLP-AACL 2025 Findings

2411.01023 2026-03-24 cs.LG cs.AI cs.DB

Capturing and Anticipating User Intents in Data Analytics via Knowledge Graphs

Gerard Pons, Besim Bilalli, Anna Queralt

Comments: Pre-print submitted to Knowledge-Based Systems

Details

DOI: 10.1016/j.knosys.2026.115835
Journal ref: Knowledge-Based Systems, 2026

Abstract

In today's data-driven world, the ability to extract meaningful information from data is becoming essential for businesses, organizations and researchers alike. For that purpose, a wide range of tools and systems exist addressing data-related tasks, from data integration, preprocessing and modeling, to the interpretation and evaluation of the results. As data continues to grow in volume, variety, and complexity, there is an increasing need for advanced but user-friendly tools, such as intelligent discovery assistants (IDAs) or automated machine learning (AutoML) systems, that facilitate the user's interaction with the data. This enables non-expert users, such as citizen data scientists, to leverage powerful data analytics techniques effectively. The assistance offered by IDAs or AutoML tools should not be guided only by the analytical problem's data but should also be tailored to each individual user. To this end, this work explores the usage of Knowledge Graphs (KG) as a basic framework for capturing in a human-centered manner complex analytics workflows, by storing information not only about the workflow's components, datasets and algorithms but also about the users, their intents and their feedback, among others. The data stored in the generated KG can then be exploited to provide assistance (e.g., recommendations) to the users interacting with these systems. To accomplish this objective, two methods are explored in this work. Initially, the usage of query templates to extract relevant information from the KG is studied. However, upon identifying its main limitations, the usage of link prediction with knowledge graph embeddings is explored, which enhances flexibility and allows leveraging the entire structure and components of the graph. The experiments show that the proposed method is able to capture the graph's structure and to produce sensible suggestions.
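As a rough illustration of link prediction with knowledge graph embeddings, the sketch below scores candidate (user, intent) links with a TransE-style distance. The abstract does not name the embedding model, so TransE is swapped in here as one common choice; the entity and relation names and the hand-set 2D vectors are hypothetical, and in practice the embeddings would be learned from the graph.

```python
import math

def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def rank_tails(head, relation, candidates):
    """Sort candidate tail entity names from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )

# Hypothetical toy embeddings; real ones would be trained on the KG triples.
entity = {
    "user_a": (0.1, 0.2),
    "classification": (0.6, 0.1),
    "clustering": (-0.5, 0.9),
}
relation = {"has_intent": (0.5, -0.1)}

candidates = {name: entity[name] for name in ("classification", "clustering")}
ranking = rank_tails(entity["user_a"], relation["has_intent"], candidates)
```

The top-ranked tail would be offered as the recommendation. Unlike a fixed query template, the same scoring function applies to any (head, relation) pair present in the graph.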

2410.18529 2026-03-24 cs.CL

Instructional Text Across Disciplines: A Survey of Representations, Downstream Tasks, and Open Challenges Toward Capable AI Agents

Abdulfattah Safa, Tamta Kapanadze, Arda Uzunoğlu, Gözde Gül Şahin

Comments: Pre-CoLI print. Accepted for publication in Computational Linguistics (MIT Press). Advance online publication. March 2026

2409.12739 2026-03-24 cs.CL

Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models

Peiyi Zhang, Yazhou Zhang, Bo Wang, Lu Rong, Prayag Tiwari, Jing Qin

Comments: The authors are withdrawing this paper to make substantial revisions and improvements before future submission

2406.03736 2026-03-24 cs.LG cs.CL

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

Jingyang Ou, Shen Nie, Kaiwen Xue, Fengqi Zhu, Jiacheng Sun, Zhenguo Li, Chongxuan Li

2404.12352 2026-03-24 cs.CV

Point-In-Context: Understanding Point Cloud via In-Context Learning

Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M. Buhmann, Deheng Ye, Xiangtai Li, Chen Change Loy

Comments: Project page: https://fanglaosi.github.io/Point-In-Context_Pages. arXiv admin note: text overlap with arXiv:2306.08659

Abstract

The rise of large-scale models has catalyzed in-context learning as a powerful approach for multitasking, particularly in natural language and image processing. However, its application to 3D point cloud tasks has been largely unexplored. In this paper, we introduce Point-In-Context (PIC), a pioneering framework for 3D point cloud understanding that leverages in-context learning with a standard transformer architecture. PIC uniquely enables the execution of multiple tasks after a single, unified training phase, eliminating the need for fine-tuning. To extend masked point modeling to 3D in-context learning, we introduce a Joint Sampling module, a simple yet effective technique that emphasizes the mapping relationship between input and target. PIC treats both inputs and targets as coordinate-based, addressing the segmentation challenge by associating label points with pre-defined XYZ coordinates for each category. However, relying on such fixed label-coordinate assignments limits the model's ability to generalize to unseen domains. To address this limitation, we further propose two innovative training strategies: In-Context Labeling and In-Context Enhancing. These strategies are integrated into PIC++, which enhances dynamic in-context labeling and model training. Besides its multitask capability, PIC++ demonstrates generalization across part segmentation datasets by employing dynamic in-context labels and regular in-context pairs. Remarkably, PIC++, trained once without fine-tuning, can generalize effectively to unseen datasets and perform novel part segmentation through customized prompts. Overall, PIC is a general framework that seamlessly integrates additional tasks or datasets through a unified data format via in-context learning. Extensive experiments substantiate PIC's versatility and adaptability in handling diverse tasks and segmenting multiple datasets simultaneously.

2402.13876 2026-03-24 cs.CV

Scene Prior Filtering for Depth Super-Resolution

Zhengxue Wang, Zhiqiang Yan, Ming-Hsuan Yang, Jinshan Pan, Guangwei Gao, Ying Tai, Jian Yang

Comments: Accepted to IJCV 2026

2402.01304 2026-03-24 cs.CV

Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection

Hao Li, Wei Wang, Cong Wang, Zhigang Luo, Xinwang Liu, Kenli Li, Xiaochun Cao

Comments: 16 pages, 7 figures

2401.04317 2026-03-24 cs.CV cs.CL

WiFi-GEN: High-Resolution Indoor Imaging from WiFi Signals Using Generative AI

Jianyang Shi, Bowen Zhang, Amartansh Dubey, Ross Murch, Liwen Jing

2312.17251 2026-03-24 cs.CV cond-mat.mtrl-sci cs.LG

MatSegNet: a New Boundary-aware Deep Learning Model for Accurate Carbide Precipitate Analysis in High-Strength Steels

Xiaohan Bie, Manoj Arthanari, Evelin Barbosa de Melo, Baihua Ren, Juancheng Li, Nicolas Brodusch, Stephen Yue, Salim Brahimi, Raynald Gauvin, Jun Song

2312.02246 2026-03-24 cs.CV cs.AI cs.LG stat.ML

Conditional Variational Diffusion Models

Gabriel della Maggiora, Luis Alberto Croquevielle, Nikita Deshpande, Harry Horsley, Thomas Heinis, Artur Yakimovich

Comments: Denoising Diffusion Probabilistic Models, Inverse Problems, Generative Models, Super Resolution, Phase Quantification, Variational Methods

2308.05629 2026-03-24 cs.LG

Inhibitor Transformers and Gated RNNs for Torus Efficient Fully Homomorphic Encryption

Rickard Brännvall, Tony Zhang, Henrik Forsgren, Andrei Stoian, Fredrik Sandin, Marcus Liwicki

Comments: 10 pages, 8 tables, 2 figures. Consolidated manuscript based on prior workshop contributions

2304.03997 2026-03-24 cs.LG cs.AI

Predicting Short Term Energy Demand in Smart Grid: A Deep Learning Approach for Integrating Renewable Energy Sources in Line with SDGs 7, 9, and 13

Md Saef Ullah Miah, Junaida Sulaiman, Md. Imamul Islam, Md. Masuduzzaman, Molla Shahadat Hossain Lipu, Ramdhan Nugraha

Details

DOI: 10.7717/peerj-cs.2819
Journal ref: PeerJ Computer Science 11:e2819 (2025)

Abstract

Integrating renewable energy sources into the power grid is becoming increasingly important as the world moves towards a more sustainable energy future in line with SDG 7. However, the intermittent nature of renewable energy sources can make it challenging to manage the power grid and ensure a stable supply of electricity, which is crucial for achieving SDG 9. In this paper, we propose a deep learning model for predicting energy demand in a smart power grid, which can improve the integration of renewable energy sources by providing accurate predictions of energy demand. Our approach aligns with SDG 13 on climate action, enabling more efficient management of renewable energy resources. We use long short-term memory networks, well-suited for time series data, to capture complex patterns and dependencies in energy demand data. The proposed approach is evaluated using four historical short-term energy demand datasets from different energy distribution companies, including American Electric Power, Commonwealth Edison, Dayton Power and Light, and Pennsylvania-New Jersey-Maryland Interconnection. The proposed model is compared with three other state-of-the-art forecasting algorithms: Facebook Prophet, Support Vector Regression, and Random Forest Regression. The experimental results show that the proposed REDf model can accurately predict energy demand with a mean absolute error of 1.4%, indicating its potential to enhance the stability and efficiency of the power grid and contribute to achieving SDGs 7, 9, and 13. The proposed model also has the potential to manage the integration of renewable energy sources effectively.

2106.02493 2026-03-24 cs.LG eess.SP

Homological Time Series Analysis of Sensor Signals from Power Plants

Luciano Melodia, Richard Lenz

Comments: Code available at https://codeberg.org/Jiren/TwirlFlake

1803.01024 2026-03-24 cs.LG cs.DB

PRESISTANT: Learning based assistant for data pre-processing

Besim Bilalli, Alberto Abelló, Tomàs Aluja-Banet, Robert Wrembel