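arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.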

2603.04194 2026-03-05 cs.LG cs.AI

Noise-aware Client Selection for carbon-efficient Federated Learning via Gradient Norm Thresholding

Patrick Wilhelm, Inese Yilmaz, Odej Kao

Journal ref: HotCarbon2025

Abstract

Training large-scale Neural Networks requires substantial computational power and energy. Federated Learning enables distributed model training across geospatially distributed data centers, leveraging renewable energy sources to reduce the carbon footprint of AI training. Various client selection strategies have been developed to align the volatility of renewable energy with stable and fair model training in a federated system. However, due to the privacy-preserving nature of Federated Learning, the quality of data on client devices remains unknown, posing challenges for effective model training. In this paper, we introduce a modular approach on top of state-of-the-art client selection strategies for carbon-efficient Federated Learning. Our method enhances robustness by incorporating noisy client data filtering, improving both model performance and sustainability in scenarios with unknown data quality. Additionally, we explore the impact of carbon budgets on model convergence, balancing efficiency and sustainability. Through extensive evaluations, we demonstrate that modern client selection strategies based on local client loss tend to select clients with noisy data, ultimately degrading model performance. To address this, we propose a gradient norm thresholding mechanism using probing rounds for more effective client selection and noise detection, contributing to the practical deployment of carbon-efficient Federated Learning.
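
The abstract does not say how the gradient norm threshold is chosen, so the following is only a minimal sketch under stated assumptions: each client reports one gradient norm from a probing round, unusually large norms are treated as a sign of noisy data, and the cutoff is the median plus `k` times the median absolute deviation. The function name `filter_noisy_clients`, the parameter `k`, and the cutoff rule are all hypothetical and are not taken from the paper.

```python
from statistics import median


def filter_noisy_clients(grad_norms, k=3.0):
    """Keep clients whose probing-round gradient norm is not an outlier.

    grad_norms: dict mapping a client id to its gradient norm (assumed input).
    k: how many median absolute deviations above the median are tolerated
       (an illustrative choice, not the paper's rule).
    Returns (kept_client_ids, threshold).
    """
    values = list(grad_norms.values())
    med = median(values)
    # Median absolute deviation: a robust spread estimate that a few
    # very large norms cannot inflate.
    mad = median(abs(v - med) for v in values)
    threshold = med + k * mad
    kept = [cid for cid, norm in grad_norms.items() if norm <= threshold]
    return kept, threshold


# Hypothetical probing-round result: client "e" has a much larger norm.
probe = {"a": 1.0, "b": 1.2, "c": 0.9, "d": 1.1, "e": 9.0}
kept, threshold = filter_noisy_clients(probe)
print(kept, round(threshold, 2))
```

The clients that survive the filter could then be handed to any existing carbon-aware selection strategy, which matches the modular framing of the abstract.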

2603.04191 2026-03-05 cs.AI

Towards Realistic Personalization: Evaluating Long-Horizon Preference Following in Personalized User-LLM Interactions

Qianyun Guo, Yibo Li, Yue Liu, Bryan Hooi

2603.04181 2026-03-05 cs.LG

REDNET-ML: A Multi-Sensor Machine Learning Pipeline for Harmful Algal Bloom Risk Detection Along the Omani Coast

Ameer Alhashemi

Comments: 11 pages

2603.04180 2026-03-05 cs.LG cs.AI

Architectural Proprioception in State Space Models: Thermodynamic Training Induces Anticipatory Halt Detection

Jay Noon

Comments: 17 pages, 15 figures

2603.04166 2026-03-05 cs.RO cs.LG

Learning Hip Exoskeleton Control Policy via Predictive Neuromusculoskeletal Simulation

Ilseung Park, Changseob Song, Inseung Kang

2603.04163 2026-03-05 cs.CV

Degradation-based augmented training for robust individual animal re-identification

Thanos Polychronou, Lukáš Adam, Viktor Penchev, Kostas Papafitsoros

2603.04158 2026-03-05 cs.RO cs.AI

GarmentPile++: Affordance-Driven Cluttered Garments Retrieval with Vision-Language Reasoning

Mingleyang Li, Yuran Wang, Yue Chen, Tianxing Chen, Jiaqi Liang, Zishun Shen, Haoran Lu, Ruihai Wu, Hao Dong

Comments: Accepted to ICRA 2026

2603.04146 2026-03-05 cs.CV

LISTA-Transformer Model Based on Sparse Coding and Attention Mechanism and Its Application in Fault Diagnosis

Shuang Liu, Lina Zhao, Tian Wang, Huaqing Wang

Comments: 14 pages, 14 figures, conference paper

2603.04145 2026-03-05 cs.CL cs.NE

VietNormalizer: An Open-Source, Dependency-Free Python Library for Vietnamese Text Normalization in TTS and NLP Applications

Hung Vu Nguyen, Loan Do, Thanh Ngoc Nguyen, Ushik Shrestha Khwakhali, Thanh Pham, Vinh Do, Charlotte Nguyen, Hien Nguyen

Comments: 10 pages, 1 table

Abstract

We present VietNormalizer, an open-source, zero-dependency Python library for Vietnamese text normalization targeting Text-to-Speech (TTS) and Natural Language Processing (NLP) applications. Vietnamese text normalization is a critical yet underserved preprocessing step: real-world Vietnamese text is densely populated with non-standard words (NSWs), including numbers, dates, times, currency amounts, percentages, acronyms, and foreign-language terms, all of which must be converted to fully pronounceable Vietnamese words before TTS synthesis or downstream language processing. Existing Vietnamese normalization tools either require heavy neural dependencies while covering only a narrow subset of NSW classes, or are embedded within larger NLP toolkits without standalone installability. VietNormalizer addresses these gaps through a unified, rule-based pipeline that: (1) converts arbitrary integers, decimals, and large numbers to Vietnamese words; (2) normalizes dates and times to their spoken Vietnamese forms; (3) handles VND and USD currency amounts; (4) expands percentages; (5) resolves acronyms via a customizable CSV dictionary; (6) transliterates non-Vietnamese loanwords and foreign terms to Vietnamese phonetic approximations; and (7) performs Unicode normalization and emoji/special-character removal. All regular expression patterns are pre-compiled at initialization, enabling high-throughput batch processing with minimal memory overhead and no GPU or external API dependency. The library is installable via pip install vietnormalizer, available on PyPI and GitHub at https://github.com/nghimestudio/vietnormalizer, and released under the MIT license. We discuss the design decisions, limitations of existing approaches, and the generalizability of the rule-based normalization paradigm to other low-resource tonal and agglutinative languages.
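
To illustrate the general idea of a rule-based normalizer that compiles its patterns once at initialization, here is a toy sketch using only the Python standard library. It is not the VietNormalizer API: the class `ToyNormalizer`, its methods, and the in-memory acronym dictionary are hypothetical, and it covers only Unicode NFC normalization, single-digit percentages, and acronym lookup. Multi-digit numbers are deliberately left untouched because reading them aloud needs rules this sketch does not implement.

```python
import re
import unicodedata

# Vietnamese words for the digits 0-9 (toy coverage only).
DIGITS = ["không", "một", "hai", "ba", "bốn", "năm", "sáu", "bảy", "tám", "chín"]


class ToyNormalizer:
    """Hypothetical illustration; not the library's real interface."""

    def __init__(self, acronyms):
        self.acronyms = dict(acronyms)
        # Patterns are compiled once here and reused for every input.
        # Matches a single digit followed by '%', not preceded by a digit.
        self.percent_re = re.compile(r"(?<!\d)(\d)\s*%")
        if self.acronyms:
            keys = sorted(self.acronyms, key=len, reverse=True)
            alternation = "|".join(re.escape(key) for key in keys)
            self.acronym_re = re.compile(r"\b(?:" + alternation + r")\b")
        else:
            self.acronym_re = None

    def normalize(self, text):
        # Unicode normalization so composed and decomposed forms compare equal.
        text = unicodedata.normalize("NFC", text)
        # Expand single-digit percentages, e.g. "5%" becomes "năm phần trăm".
        text = self.percent_re.sub(
            lambda m: DIGITS[int(m.group(1))] + " phần trăm", text
        )
        # Replace known acronyms with their dictionary expansions.
        if self.acronym_re is not None:
            text = self.acronym_re.sub(lambda m: self.acronyms[m.group(0)], text)
        return text


normalizer = ToyNormalizer({"TP": "thành phố"})
print(normalizer.normalize("TP tăng 5%"))
```

Compiling in `__init__` means a batch of many sentences pays the pattern-construction cost only once, which is the throughput argument the abstract makes.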

2603.04144 2026-03-05 cs.RO cs.CV

HBRB-BoW: A Retrained Bag-of-Words Vocabulary for ORB-SLAM via Hierarchical BRB-KMeans

Minjae Lee, Sang-Min Choi, Gun-Woo Kim, Suwon Lee

2603.04142 2026-03-05 cs.LG

A Multi-Agent Framework for Interpreting Multivariate Physiological Time Series

Davide Gabrielli, Paola Velardi, Stefano Faralli, Bardh Prenkaj

Abstract

Continuous physiological monitoring is central to emergency care, yet deploying trustworthy AI is challenging. While LLMs can translate complex physiological signals into clinical narratives, it is unclear how agentic systems perform relative to zero-shot inference. To address this question, we present Vivaldi, a role-structured multi-agent system that explains multivariate physiological time series. Due to regulatory constraints that preclude live deployment, we instantiate Vivaldi in a controlled clinical pilot with a small, highly qualified cohort of emergency medicine experts, whose evaluations reveal a context-dependent picture that contrasts with prevailing assumptions that agentic reasoning uniformly improves performance. Our experiments show that agentic pipelines substantially benefit non-thinking and medically fine-tuned models, improving expert-rated explanation justification and relevance by +6.9 and +9.7 points, respectively. Conversely, for thinking models, agentic orchestration often degrades explanation quality, including a 14-point drop in relevance, while improving diagnostic precision (ESI F1 +3.6). We also find that explicit tool-based computation is decisive for codifiable clinical metrics, whereas subjective targets, such as pain scores and length of stay, show limited or inconsistent changes. Expert evaluation further indicates that gains in clinical utility depend on visualization conventions, with medically specialized models achieving the most favorable trade-offs between utility and clarity. Together, these findings show that the value of agentic AI lies in the selective externalization of computation and structure rather than in maximal reasoning complexity, and highlight concrete design trade-offs and lessons learned, broadly applicable to explainable AI in safety-critical healthcare settings.

2603.04135 2026-03-05 cs.LG cs.AI

Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization

Haodong Zhu, Yangyang Ren, Yanjing Li, Mingbao Lin, Linlin Yang, Xuhui Liu, Xiantong Zhen, Haiguang Liu, Baochang Zhang

Comments: 20 pages, 4 figures

2603.04134 2026-03-05 cs.LG

InstMeter: An Instruction-Level Method to Predict Energy and Latency of DL Model Inference on MCUs

Hao Liu, Qing Wang, Marco Zuniga

Comments: 17 pages

2603.04132 2026-03-05 cs.LG

Two-Stage Photovoltaic Forecasting: Separating Weather Prediction from Plant-Characteristics

Philipp Danner, Hermann de Meer

2603.04130 2026-03-05 cs.CV

Mask-Guided Attention Regulation for Anatomically Consistent Counterfactual CXR Synthesis

Zichun Zhang, Weizhi Nie, Honglin Guo, Yuting Su

2603.04128 2026-03-05 cs.CV cs.AI cs.MM

Crab$^{+}$: A Scalable and Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

Dongnuan Cai, Henghui Du, Chang Zhou, Xi Chen, Dan Guo, Hongyuan Zhang, Xuelong Li, Di Hu

Abstract

Developing Audio-Visual Large Language Models (AV-LLMs) for unified scene understanding is pivotal in multimodal intelligence. While instruction tuning equips pre-trained models with multi-task abilities, we observe that conventional multi-task unification methods often suffer from severe negative transfer, where nearly 55% of tasks degrade compared to single-task training. We attribute this phenomenon to audio-visual task heterogeneity, characterized by disparate task granularity and divergent capability demands, which leads to negative interference under joint training. To tackle this, we present Crab$^{+}$, a scalable and unified audio-visual scene understanding model that addresses task heterogeneity through explicit cooperation from both data and model perspectives. On the data side, we introduce AV-UIE v2, a comprehensive Audio-Visual Unified Instruction-tuning dataset with Explicit reasoning processes. It contains approximately 222K samples spanning 17 datasets and 7 tasks, enabling the model to capture cross-task relationships at different levels of granularity. On the model side, we design a unified interface to align heterogeneous task formulations, and propose Interaction-aware LoRA (I-LoRA), which explicitly models inter-task relationships via dynamic routing to coordinate distinct audio-visual interaction patterns, mitigating parameter interference. Extensive experiments show Crab$^{+}$ covers broader tasks than existing unified models while outperforming specialized models on various benchmarks. We successfully reverse the negative transfer trend, achieving positive transfer where multi-task learning surpasses single-task baselines in nearly 88% of tasks. These results hold across diverse AV-LLM paradigms and are validated through in-depth visualization, positioning Crab$^{+}$ as a robust step towards holistic audio-visual scene understanding.

2603.04127 2026-03-05 cs.LG cs.AI

Data-Aware Random Feature Kernel for Transformers

Amirhossein Farzam, Hossein Mobahi, Nolan Andrew Miller, Luke Sernau

2603.04124 2026-03-05 cs.AI cond-mat.mtrl-sci cs.CL cs.LG

BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning

Tarjei Paule Hage, Markus J. Buehler

2603.04123 2026-03-05 cs.CL

FINEST: Improving LLM Responses to Sensitive Topics Through Fine-Grained Evaluation

Juhyun Oh, Nayeon Lee, Chani Jung, Jiho Jin, Junho Myung, Jongwon Lee, Taeui Song, Alice Oh

Comments: Accepted to EACL 2026 Findings

2603.04122 2026-03-05 cs.SD cs.LG

FastWave: Optimized Diffusion Model for Audio Super-Resolution

Nikita Kuznetsov, Maksim Kaledin

2603.04118 2026-03-05 cs.RO

Modeling and Control of a Pneumatic Soft Robotic Catheter Using Neural Koopman Operators

Yiyao Yue, Noah Barnes, Lingyun Di, Olivia Young, Ryan D. Sochol, Jeremy D. Brown, Axel Krieger

Comments: 8 pages, 6 figures. Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026

2603.04117 2026-03-05 cs.LG

When to restart? Exploring escalating restarts on convergence

Ayush K. Varshney, Šarūnas Girdzijauskas, Konstantinos Vandikas, Aneta Vulgarakis Feljan

Comments: Paper accepted at the Sci4DL workshop at ICLR 2026. https://openreview.net/forum?id=18Yf2KKIn0

2603.04115 2026-03-05 cs.CV

TextBoost: Boosting Scene Text Fidelity in Ultra-low Bitrate Image Compression

Bingxin Wang, Yuan Lan, Zhaoyi Sun, Yang Xiang, Jie Sun

2603.04113 2026-03-05 cs.CV cs.AI

Understanding Sources of Demographic Predictability in Brain MRI via Disentangling Anatomy and Contrast

Mehmet Yigit Avci, Akshit Achara, Andrew King, Jorge Cardoso

2603.04099 2026-03-05 cs.CV cs.AI

Efficient Point Cloud Processing with High-Dimensional Positional Encoding and Non-Local MLPs

Yanmei Zou, Hongshan Yu, Yaonan Wang, Zhengeng Yang, Xieyuanli Chen, Kailun Yang, Naveed Akhtar

Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Source code is available at https://github.com/zouyanmei/HPENet_v2.git

2603.04098 2026-03-05 cs.CV cs.HC

Real Eyes Realize Faster: Gaze Stability and Pupil Novelty for Efficient Egocentric Learning

Ajan Subramanian, Sumukh Bettadapura, Rohan Sathish

Comments: 14 pages, 4 figures, 3 tables, plus supplementary material

2603.04093 2026-03-05 cs.LG physics.app-ph physics.comp-ph physics.data-an

Reducing hyperparameter sensitivity in measurement-feedback based Ising machines

Toon Sevenants, Guy Van der Sande, Guy Verschaffelt

Comments: 15 pages, 11 figures

2603.04091 2026-03-05 cs.CV

CLIP-Guided Multi-Task Regression for Multi-View Plant Phenotyping

Simon Warmers, Muhammad Zawish, Fayaz Ali Dharejo, Steven Davy, Radu Timofte

Comments: Under review at IEEE Conference

2603.04090 2026-03-05 cs.CV cs.GR cs.HC

EgoPoseFormer v2: Accurate Egocentric Human Motion Estimation for AR/VR

Zhenyu Li, Sai Kumar Dwivedi, Filip Maric, Carlos Chacon, Nadine Bertsch, Filippo Arcadu, Tomas Hodan, Michael Ramamonjisoa, Peter Wonka, Amy Zhao, Robin Kips, Cem Keskin, Anastasia Tkach, Chenhongyi Yang

Comments: Accepted to CVPR 2026

2603.04083 2026-03-05 cs.CL

Hindsight Quality Prediction Experiments in Multi-Candidate Human-Post-Edited Machine Translation

Malik Marmonier, Benoît Sagot, Rachel Bawden

Comments: Accepted to the 2026 Language Resources and Evaluation Conference (LREC)