arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 2365
专题追踪
2602.18591 2026-02-24 cs.LG

Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning

Afiya Ayman, Ayan Mukhopadhyay, Aron Laszka

详情
英文摘要

A fundamental problem in multi-task learning (MTL) is identifying groups of tasks that should be learned together. Since training MTL models for all possible combinations of tasks is prohibitively expensive for large task sets, a crucial component of efficient and effective task grouping is predicting whether a group of tasks would benefit from learning together, measured as per-task performance gain over single-task learning. In this paper, we propose ETAP (Ensemble Task Affinity Predictor), a scalable framework that integrates principled and data-driven estimators to predict MTL performance gains. First, we consider the gradient-based updates of shared parameters in an MTL model to measure the affinity between a pair of tasks as the similarity between the parameter updates based on these tasks. This linear estimator, which we call affinity score, naturally extends to estimating affinity within a group of tasks. Second, to refine these estimates, we train predictors that apply non-linear transformations and correct residual errors, capturing complex and non-linear task relationships. We train these predictors on a limited number of task groups for which we obtain ground-truth gain values via multi-task learning for each group. We demonstrate on benchmark datasets that ETAP improves MTL gain prediction and enables more effective task grouping, outperforming state-of-the-art baselines across diverse application domains.

2602.18585 2026-02-24 cs.CV cs.AI

BloomNet: Exploring Single vs. Multiple Object Annotation for Flower Recognition Using YOLO Variants

Safwat Nusrat, Prithwiraj Bhattacharjee

Comments Accepted for publication in 7th International Conference on Trends in Computational and Cognitive Engineering (TCCE-2025)

详情
英文摘要

Precise localization and recognition of flowers are crucial for advancing automated agriculture, particularly in plant phenotyping, crop estimation, and yield monitoring. This paper benchmarks several YOLO architectures such as YOLOv5s, YOLOv8n/s/m, and YOLOv12n for flower object detection under two annotation regimes: single-image single-bounding box (SISBB) and single-image multiple-bounding box (SIMBB). The FloralSix dataset, comprising 2,816 high-resolution photos of six different flower species, is also introduced. It is annotated for both dense (clustered) and sparse (isolated) scenarios. The models were evaluated using Precision, Recall, and Mean Average Precision (mAP) at IoU thresholds of 0.5 (mAP@0.5) and 0.5-0.95 (mAP@0.5:0.95). In SISBB, YOLOv8m (SGD) achieved the best results with Precision 0.956, Recall 0.951, mAP@0.5 0.978, and mAP@0.5:0.95 0.865, illustrating strong accuracy in detecting isolated flowers. With mAP@0.5 0.934 and mAP@0.5:0.95 0.752, YOLOv12n (SGD) outperformed the more complicated SIMBB scenario, proving robustness in dense, multi-object detection. Results show how annotation density, IoU thresholds, and model size interact: recall-optimized models perform better in crowded environments, whereas precision-oriented models perform best in sparse scenarios. In both cases, the Stochastic Gradient Descent (SGD) optimizer consistently performed better than alternatives. These density-sensitive sensors are helpful for non-destructive crop analysis, growth tracking, robotic pollination, and stress evaluation.

2602.18583 2026-02-24 cs.CL cs.AI cs.LG

Luna-2: Scalable Single-Token Evaluation with Small Language Models

Vatsal Goel, Rishon Dsouza, Nikhil Ega, Amey Ramesh Rambatla, Rob Friel, Shuai Shao, Yash Sheth

详情
英文摘要

Real-time guardrails require evaluation that is accurate, cheap, and fast - yet today's default, LLM-as-a-judge (LLMAJ), is slow, expensive, and operationally non-deterministic due to multi-token generation. We present Luna-2, a novel architecture that leverages decoder-only small language models (SLMs) into a deterministic evaluation model to reliably compute complex task-specific LLMAJ metrics (e.g. toxicity, hallucination, tool selection quality, etc.) at an accuracy at par or higher than LLMAJ using frontier LLMs while drastically reducing the cost and latency of computation. Each metric is implemented as a lightweight LoRA/PEFT head on top of a shared SLM backbone, enabling hundreds of specialized metrics to run concurrently on a single GPU, deployable locally next to AI systems in a privacy-preserving and latency optimizing manner. Across content safety and hallucination benchmarks, Luna-2 matches the accuracy of state-of-the-art LLM-based evaluators while reducing inference cost by over 80x and latency by over 20x. In this paper, we outline the model architecture, training methodology and report real-world empirical results on accuracy, latency, and throughput results. In production, Luna-2 is protecting 100M+ AI sessions and processing over 100B tokens per month for our customers with eval cost savings of over $30M annually.

2602.18582 2026-02-24 cs.AI cs.CL cs.HC cs.LG

Hierarchical Reward Design from Language: Enhancing Alignment of Agent Behavior with Human Specifications

Zhiqin Qian, Ryan Diaz, Sangwon Seo, Vaibhav Unhelkar

Comments Extended version of an identically-titled paper accepted at AAMAS 2026

详情
英文摘要

When training artificial intelligence (AI) to perform tasks, humans often care not only about whether a task is completed but also how it is performed. As AI agents tackle increasingly complex tasks, aligning their behavior with human-provided specifications becomes critical for responsible AI deployment. Reward design provides a direct channel for such alignment by translating human expectations into reward functions that guide reinforcement learning (RL). However, existing methods are often too limited to capture nuanced human preferences that arise in long-horizon tasks. Hence, we introduce Hierarchical Reward Design from Language (HRDL): a problem formulation that extends classical reward design to encode richer behavioral specifications for hierarchical RL agents. We further propose Language to Hierarchical Rewards (L2HR) as a solution to HRDL. Experiments show that AI agents trained with rewards designed via L2HR not only complete tasks effectively but also better adhere to human specifications. Together, HRDL and L2HR advance the research on human-aligned AI agents.

2602.18581 2026-02-24 cs.LG cond-mat.stat-mech physics.soc-ph

Learning Beyond Optimization: Stress-Gated Dynamical Regime Regulation in Autonomous Systems

Sheng Ran

详情
英文摘要

Despite their apparent diversity, modern machine learning methods can be reduced to a remarkably simple core principle: learning is achieved by continuously optimizing parameters to minimize or maximize a scalar objective function. This paradigm has been extraordinarily successful for well-defined tasks where goals are fixed and evaluation criteria are explicit. However, if artificial systems are to move toward true autonomy-operating over long horizons and across evolving contexts-objectives may become ill-defined, shifting, or entirely absent. In such settings, a fundamental question emerges: in the absence of an explicit objective function, how can a system determine whether its ongoing internal dynamics are productive or pathological? And how should it regulate structural change without external supervision? In this work, we propose a dynamical framework for learning without an explicit objective. Instead of minimizing external error signals, the system evaluates the intrinsic health of its own internal dynamics and regulates structural plasticity accordingly. We introduce a two-timescale architecture that separates fast state evolution from slow structural adaptation, coupled through an internally generated stress variable that accumulates evidence of persistent dynamical dysfunction. Structural modification is then triggered not continuously, but as a state-dependent event. Through a minimal toy model, we demonstrate that this stress-regulated mechanism produces temporally segmented, self-organized learning episodes without reliance on externally defined goals. Our results suggest a possible route toward autonomous learning systems capable of self-assessment and internally regulated structural reorganization.

2602.18572 2026-02-24 cs.LG q-fin.ST

Sub-City Real Estate Price Index Forecasting at Weekly Horizons Using Satellite Radar and News Sentiment

Baris Arat, Hasan Fehmi Ates, Emre Sefer

详情
英文摘要

Reliable real estate price indicators are typically published at city level and low frequency, limiting their use for neighborhood-scale monitoring and long-horizon planning. We study whether sub-city price indices can be forecasted at weekly frequency by combining physical development signals from satellite radar with market narratives from news text. Using over 350,000 transactions from Dubai Land Department (2015-2025), we construct weekly price indices for 19 sub-city regions and evaluate forecasts from 2 to 34 weeks ahead. Our framework fuses regional transaction history with Sentinel-1 SAR backscatter, news sentiment combining lexical tone and semantic embeddings, and macroeconomic context. Results are strongly horizon dependent: at horizons up to 10 weeks, price history alone matches multimodal configurations, but beyond 14 weeks sentiment and SAR become critical. At long horizons (26-34 weeks), the full multimodal model reduces mean absolute error from 4.48 to 2.93 (35% reduction), with gains statistically significant across regions. Nonparametric learners consistently outperform deep architectures in this data regime. These findings establish benchmarks for weekly sub-city index forecasting and demonstrate that remote sensing and news sentiment materially improve predictability at strategically relevant horizons.

2602.18569 2026-02-24 cs.RO cs.SY eess.SY

Design and Biomechanical Evaluation of a Lightweight Low-Complexity Soft Bilateral Ankle Exoskeleton

Josée Mallah, Zakii Javed, Zafer Azak, Thomas Stone, Luigi G. Occhipinti

详情
英文摘要

Many people could benefit from exoskeleton assistance during gait, for either medical or nonmedical purposes. But exoskeletons bring added mass and structure, which in turn require compensating for. In this work, we present a lightweight, low-complexity, soft bilateral ankle exoskeleton for plantarflexion assistance, with a shoe attachment design that can be mounted on top of any pair of shoes. Experimental tests show no significant difference in lower limb kinematics and kinetics when wearing the exoskeleton in zero-torque mode relative to not wearing an exoskeleton, showing that our device does not obstruct healthy gait, and proving it as a compliant and comfortable device, promising to provide effective assistance. Hence, a control system was developed, and additional tests are underway.

2602.18540 2026-02-24 cs.CV cs.AI

Rodent-Bench

Thomas Heap, Laurence Aitchison, Emma Cahill, Adriana Casado Rodriguez

详情
英文摘要

We present Rodent-Bench, a novel benchmark designed to evaluate the ability of Multimodal Large Language Models (MLLMs) to annotate rodent behaviour footage. We evaluate state-of-the-art MLLMs, including Gemini-2.5-Pro, Gemini-2.5-Flash and Qwen-VL-Max, using this benchmark and find that none of these models perform strongly enough to be used as an assistant for this task. Our benchmark encompasses diverse datasets spanning multiple behavioral paradigms including social interactions, grooming, scratching, and freezing behaviors, with videos ranging from 10 minutes to 35 minutes in length. We provide two benchmark versions to accommodate varying model capabilities and establish standardized evaluation metrics including second-wise accuracy, macro F1, mean average precision, mutual information, and Matthew's correlation coefficient. While some models show modest performance on certain datasets (notably grooming detection), overall results reveal significant challenges in temporal segmentation, handling extended video sequences, and distinguishing subtle behavioral states. Our analysis identifies key limitations in current MLLMs for scientific video annotation and provides insights for future model development. Rodent-Bench serves as a foundation for tracking progress toward reliable automated behavioral annotation in neuroscience research.

2602.18535 2026-02-24 cs.SD cs.AI

Fairness-Aware Partial-label Domain Adaptation for Voice Classification of Parkinson's and ALS

Arianna Francesconi, Zhixiang Dai, Arthur Stefano Moscheni, Himesh Morgan Perera Kanattage, Donato Cappetta, Fabio Rebecchi, Paolo Soda, Valerio Guarrasi, Rosa Sicilia, Mary-Anne Hartley

Comments 7 pages, 1 figure. Submitted to Pattern Recognition Letters

详情
英文摘要

Voice-based digital biomarkers can enable scalable, non-invasive screening and monitoring of Parkinson's disease (PD) and Amyotrophic Lateral Sclerosis (ALS). However, models trained on one cohort or device often fail on new acquisition settings due to cross-device and cross-cohort domain shift. This challenge is amplified in real-world scenarios with partial-label mismatch, where datasets may contain different disease labels and only partially overlap in class space. In addition, voice-based models may exploit demographic cues, raising concerns about gender-related unfairness, particularly when deployed across heterogeneous cohorts. To tackle these challenges, we propose a hybrid framework for unified three-class (healthy/PD/ALS) cross-domain voice classification from partially overlapping cohorts. The method combines style-based domain generalization with conditional adversarial alignment tailored to partial-label settings, reducing negative transfer. An additional adversarial gender branch promotes gender-invariant representations. We conduct a comprehensive evaluation across four heterogeneous sustained-vowel datasets, spanning distinct acquisition settings and devices, under both domain generalization and unsupervised domain adaptation protocols. The proposed approach is compared against twelve state-of-the-art machine learning and deep learning methods, and further evaluated through three targeted ablations, providing the first cross-cohort benchmark and end-to-end domain-adaptive framework for unified healthy/PD/ALS voice classification under partial-label mismatch and fairness constraints. Across all experimental settings, our method consistently achieves the best external generalization over the considered evaluation metrics, while maintaining reduced gender disparities. Notably, no competing method shows statistically significant gains in external performance.

2602.18533 2026-02-24 cs.CV

Morphological Addressing of Identity Basins in Text-to-Image Diffusion Models

Andrew Fraser

详情
英文摘要

We demonstrate that morphological pressure creates navigable gradients at multiple levels of the text-to-image generative pipeline. In Study~1, identity basins in Stable Diffusion 1.5 can be navigated using morphological descriptors -- constituent features like platinum blonde,'' beauty mark,'' and 1950s glamour'' -- without the target's name or photographs. A self-distillation loop (generating synthetic images from descriptor prompts, then training a LoRA on those outputs) achieves consistent convergence toward a specific identity as measured by ArcFace similarity. The trained LoRA creates a local coordinate system shaping not only the target identity but also its inverse: maximal away-conditioning produces eldritch'' structural breakdown in base SD1.5, while the LoRA-equipped model produces ``uncanny valley'' outputs -- coherent but precisely wrong. In Study~2, we extend this to prompt-level morphology. Drawing on phonestheme theory, we generate 200 novel nonsense words from English sound-symbolic clusters (e.g., \emph{cr-}, \emph{sn-}, \emph{-oid}, \emph{-ax}) and find that phonestheme-bearing candidates produce significantly more visually coherent outputs than random controls (mean Purity@1 = 0.371 vs.\ 0.209, p<0.00001p < 0.00001 p<0.00001, Cohen's d=0.55d = 0.55 d=0.55). Three candidates -- \emph{snudgeoid}, \emph{crashax}, and \emph{broomix} -- achieve perfect visual consistency (Purity@1 = 1.0) with zero training data contamination, each generating a distinct, coherent visual identity from phonesthetic structure alone. Together, these studies establish that morphological structure -- whether in feature descriptors or prompt-level phonological form -- creates systematic navigational gradients through diffusion model latent spaces. We document phase transitions in identity basins, CFG-invariant identity stability, and novel visual concepts emerging from sub-lexical sound patterns.

2602.18531 2026-02-24 cs.LG cs.AI cs.DC

Deep Reinforcement Learning for Optimizing Energy Consumption in Smart Grid Systems

Abeer Alsheikhi, Amirfarhad Farhadi, Azadeh Zamanifar

Comments arXiv admin note: text overlap with arXiv:2510.17380 by other authors

详情
英文摘要

The energy management problem in the context of smart grids is inherently complex due to the interdependencies among diverse system components. Although Reinforcement Learning (RL) has been proposed for solving Optimal Power Flow (OPF) problems, the requirement for iterative interaction with an environment often necessitates computationally expensive simulators, leading to significant sample inefficiency. In this study, these challenges are addressed through the use of Physics-Informed Neural Networks (PINNs), which can replace conventional and costly smart grid simulators. The RL policy learning process is enhanced so that convergence can be achieved in a fraction of the time required by the original environment. The PINN-based surrogate is compared with other benchmark data-driven surrogate models. By incorporating knowledge of the underlying physical laws, the results show that the PINN surrogate is the only approach considered in this context that can obtain a strong RL policy even without access to samples from the true simulator. The results demonstrate that using PINN surrogates can accelerate training by 50% compared to RL training without a surrogate. This approach enables the rapid generation of performance scores similar to those produced by the original simulator.

2602.18530 2026-02-24 cs.CV

Image-Based Classification of Olive Varieties Native to Turkiye Using Multiple Deep Learning Architectures: Analysis of Performance, Complexity, and Generalization

Hatice Karatas, Irfan Atabas

详情
英文摘要

This study compares multiple deep learning architectures for the automated, image-based classification of five locally cultivated black table olive varieties in Turkey: Gemlik, Ayvalik, Uslu, Erkence, and Celebi. Using a dataset of 2500 images, ten architectures - MobileNetV2, EfficientNetB0, EfficientNetV2-S, ResNet50, ResNet101, DenseNet121, InceptionV3, ConvNeXt-Tiny, ViT-B16, and Swin-T - were trained using transfer learning. Model performance was evaluated using accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), Cohen's Kappa, ROC-AUC, number of parameters, FLOPs, inference time, and generalization gap. EfficientNetV2-S achieved the highest classification accuracy (95.8%), while EfficientNetB0 provided the best trade-off between accuracy and computational complexity. Overall, the results indicate that under limited data conditions, parametric efficiency plays a more critical role than model depth alone.

2602.18528 2026-02-24 cs.LG cs.SD

Audio-Visual Continual Test-Time Adaptation without Forgetting

Sarthak Kumar Maharana, Akshay Mehra, Bhavya Ramakrishna, Yunhui Guo, Guan-Ming Su

详情
英文摘要

Audio-visual continual test-time adaptation involves continually adapting a source audio-visual model at test-time, to unlabeled non-stationary domains, where either or both modalities can be distributionally shifted, which hampers online cross-modal learning and eventually leads to poor accuracy. While previous works have tackled this problem, we find that SOTA methods suffer from catastrophic forgetting, where the model's performance drops well below the source model due to continual parameter updates at test-time. In this work, we first show that adapting only the modality fusion layer to a target domain not only improves performance on that domain but can also enhance performance on subsequent domains. Based on this strong cross-task transferability of the fusion layer's parameters, we propose a method, $\texttt{AV-CTTA}$, that improves test-time performance of the models without access to any source data. Our approach works by using a selective parameter retrieval mechanism that dynamically retrieves the best fusion layer parameters from a buffer using only a small batch of test data. These parameters are then integrated into the model, adapted to the current test distribution, and saved back for future use. Extensive experiments on benchmark datasets involving unimodal and bimodal corruptions show our proposed $\texttt{AV-CTTA}$ significantly outperforms existing methods while minimizing catastrophic forgetting.

2602.18525 2026-02-24 cs.CV cs.LG

Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity

Vasile Marian, Yong-Bin Kang, Alexander Buddery

Comments 23 pages, 13 figures, includes appendix

详情
英文摘要

Synthetic images are increasingly used to augment object-detection training sets, but reliably evaluating a synthetic dataset before training remains difficult: standard global generative metrics (e.g., FID) often do not predict downstream detection mAP. We present a controlled evaluation of synthetic augmentation for YOLOv11 across three single-class detection regimes -- Traffic Signs (sparse/near-saturated), Cityscapes Pedestrian (dense/occlusion-heavy), and COCO PottedPlant (multi-instance/high-variability). We benchmark six GAN-, diffusion-, and hybrid-based generators over augmentation ratios from 10% to 150% of the real training split, and train YOLOv11 both from scratch and with COCO-pretrained initialization, evaluating on held-out real test splits (mAP@0.50:0.95). For each dataset-generator-augmentation configuration, we compute pre-training dataset metrics under a matched-size bootstrap protocol, including (i) global feature-space metrics in both Inception-v3 and DINOv2 embeddings and (ii) object-centric distribution distances over bounding-box statistics. Synthetic augmentation yields substantial gains in the more challenging regimes (up to +7.6% and +30.6% relative mAP in Pedestrian and PottedPlant, respectively) but is marginal in Traffic Signs and under pretrained fine-tuning. To separate metric signal from augmentation quantity, we report both raw and augmentation-controlled (residualized) correlations with multiple-testing correction, showing that metric-performance alignment is strongly regime-dependent and that many apparent raw associations weaken after controlling for augmentation level.

2602.18521 2026-02-24 cs.LG cs.AI

AdaptStress: Online Adaptive Learning for Interpretable and Personalized Stress Prediction Using Multivariate and Sparse Physiological Signals

Xueyi Wang, Claudine J. C. Lamoth, Elisabeth Wilhelm

详情
英文摘要

Continuous stress forecasting could potentially contribute to lifestyle interventions. This paper presents a novel, explainable, and individualized approach for stress prediction using physiological data from consumer-grade smartwatches. We develop a time series forecasting model that leverages multivariate features, including heart rate variability, activity patterns, and sleep metrics, to predict stress levels across 16 temporal horizons (History window: 3, 5, 7, 9 days; forecasting window: 1, 3, 5, 7 days). Our evaluation involves 16 participants monitored for 10-15 weeks. We evaluate our approach across 16 participants, comparing against state-of-the-art time series models (Informer, TimesNet, PatchTST) and traditional baselines (CNN, LSTM, CNN-LSTM) across multiple temporal horizons. Our model achieved performance with an MSE of 0.053, MAE of 0.190, and RMSE of 0.226 in optimal settings (5-day input, 1-day prediction). A comparison with the baseline models shows that our model outperforms TimesNet, PatchTST, CNN-LSTM, LSTM, and CNN under all conditions, representing improvements of 36.9%, 25.5%, and 21.5% over the best baseline. According to the explanability analysis, sleep metrics are the most dominant and consistent stress predictors (importance: 1.1, consistency: 0.9-1.0), while activity features exhibit high inter-participant variability (0.1-0.2). Most notably, the model captures individual-specific patterns where identical features can have opposing effects across users, validating its personalization capabilities. These findings establish that consumer wearables, combined with adaptive and interpretable deep learning, can deliver relevant stress assessment adapted to individual physiological responses, providing a foundation for scalable, continuous, explainable mental health monitoring in real-world settings.

2602.18520 2026-02-24 cs.CV cs.AI

Sketch2Feedback: Grammar-in-the-Loop Framework for Rubric-Aligned Feedback on Student STEM Diagrams

Aayam Bansal

详情
英文摘要

Providing timely, rubric-aligned feedback on student-drawn diagrams is a persistent challenge in STEM education. While large multimodal models (LMMs) can jointly parse images and generate explanations, their tendency to hallucinate undermines trust in classroom deployments. We present Sketch2Feedback, a grammar-in-the-loop framework that decomposes the problem into four stages -- hybrid perception, symbolic graph construction, constraint checking, and constrained VLM feedback -- so that the language model verbalizes only violations verified by an upstream rule engine. We evaluate on two synthetic micro-benchmarks, FBD-10 (free-body diagrams) and Circuit-10 (circuit schematics), each with 500 images spanning standard and hard noise augmentation tiers, comparing our pipeline against end-to-end LMMs (LLaVA-1.5-7B, Qwen2-VL-7B), a vision-only detector, a YOLOv8-nano learned detector, and an ensemble oracle. On n=100 test samples per benchmark with 95% bootstrap CIs, results are mixed and instructive: Qwen2-VL-7B achieves the highest micro-F1 on both FBDs (0.570) and circuits (0.528), but with extreme hallucination rates (0.78, 0.98). An ensemble oracle that selects the best prediction per sample reaches F1=0.556 with hallucination 0.320 on FBDs, demonstrating exploitable complementarity between grammar and end-to-end approaches. Confidence thresholding at tau=0.7 reduces circuit hallucination from 0.970 to 0.880 with no F1 loss. Hard noise augmentation reveals domain-dependent robustness: FBD detection is resilient while circuit detection degrades sharply. An LLM-as-judge evaluation confirms that the grammar pipeline produces more actionable circuit feedback (4.85/5) than the end-to-end LMM (3.11/5). We release all code, datasets, and evaluation scripts.

2602.18519 2026-02-24 cs.LG cs.CV

Wide Open Gazes: Quantifying Visual Exploratory Behavior in Soccer with Pose Enhanced Positional Data

Joris Bekkers

详情
英文摘要

Traditional approaches to measuring visual exploratory behavior in soccer rely on counting visual exploratory actions (VEAs) based on rapid head movements exceeding 125°/s, but this method suffer from player position bias (i.e., a focus on central midfielders), annotation challenges, binary measurement constraints (i.e., a player is scanning, or not), lack the power to predict relevant short-term in-game future success, and are incompatible with fundamental soccer analytics models such as pitch control. This research introduces a novel formulaic continuous stochastic vision layer to quantify players' visual perception from pose-enhanced spatiotemporal tracking. Our probabilistic field-of-view and occlusion models incorporate head and shoulder rotation angles to create speed-dependent vision maps for individual players in a two-dimensional top-down plane. We combine these vision maps with pitch control and pitch value surfaces to analyze the awaiting phase (when a player is awaiting the ball to arrive after a pass for a teammate) and their subsequent on-ball phase. We demonstrate that aggregated visual metrics - such as the percentage of defended area observed while awaiting a pass - are predictive of controlled pitch value gained at the end of dribbling actions using 32 games of synchronized pose-enhanced tracking data and on-ball event data from the 2024 Copa America. This methodology works regardless of player position, eliminates manual annotation requirements, and provides continuous measurements that seamlessly integrate into existing soccer analytics frameworks. To further support the integration with existing soccer analytics frameworks we open-source the tools required to make these calculations.

2602.18518 2026-02-24 cs.LG stat.ME stat.ML

Measuring the Prevalence of Policy Violating Content with ML Assisted Sampling and LLM Labeling

Attila Dobi, Aravindh Manickavasagam, Benjamin Thompson, Xiaohan Yang, Faisal Farooq

Comments 8 pages

详情
英文摘要

Content safety teams need metrics that reflect what users actually experience, not only what is reported. We study prevalence: the fraction of user views (impressions) that went to content violating a given policy on a given day. Accurate prevalence measurement is challenging because violations are often rare and human labeling is costly, making frequent, platform-representative studies slow. We present a design-based measurement system that (i) draws daily probability samples from the impression stream using ML-assisted weights to concentrate label budget on high-exposure and high-risk content while preserving unbiasedness, (ii) labels sampled items with a multimodal LLM governed by policy prompts and gold-set validation, and (iii) produces design-consistent prevalence estimates with confidence intervals and dashboard drilldowns. A key design goal is one global sample with many pivots: the same daily sample supports prevalence by surface, viewer geography, content age, and other segments through post-stratified estimation. We describe the statistical estimators, variance and confidence interval construction, label-quality monitoring, and an engineering workflow that makes the system configurable across policies.

2602.18515 2026-02-24 cs.LG cs.AI

Weak-Form Evolutionary Kolmogorov-Arnold Networks for Solving Partial Differential Equations

Bongseok Kim, Jiahao Zhang, Guang Lin

详情
英文摘要

Partial differential equations (PDEs) form a central component of scientific computing. Among recent advances in deep learning, evolutionary neural networks have been developed to successively capture the temporal dynamics of time-dependent PDEs via parameter evolution. The parameter updates are obtained by solving a linear system derived from the governing equation residuals at each time step. However, strong-form evolutionary approaches can yield ill-conditioned linear systems due to pointwise residual discretization, and their computational cost scales unfavorably with the number of training samples. To address these limitations, we propose a weak-form evolutionary Kolmogorov-Arnold Network (KAN) for the scalable and accurate prediction of PDE solutions. We decouple the linear system size from the number of training samples through the weak formulation, leading to improved scalability compared to strong-form approaches. We also rigorously enforce boundary conditions by constructing the trial space with boundary-constrained KANs to satisfy Dirichlet and periodic conditions, and by incorporating derivative boundary conditions directly into the weak formulation for Neumann conditions. In conclusion, the proposed weak-form evolutionary KAN framework provides a stable and scalable approach for PDEs and contributes to scientific machine learning with potential relevance to future engineering applications.

2602.18505 2026-02-24 cs.CV

Suppression or Deletion: A Restoration-Based Representation-Level Analysis of Machine Unlearning

Yurim Jang, Jaeung Lee, Dohyun Kim, Jaemin Jo, Simon S. Woo

详情
英文摘要

As pretrained models are increasingly shared on the web, ensuring that models can forget or delete sensitive, copyrighted, or private information upon request has become crucial. Machine unlearning has been proposed to address this challenge. However, current evaluations for unlearning methods rely on output-based metrics, which cannot verify whether information is completely deleted or merely suppressed at the representation level, where suppression is insufficient for true unlearning. To address this gap, we propose a novel restoration-based analysis framework that uses Sparse Autoencoders to identify class-specific expert features in intermediate layers and applies inference-time steering to quantitatively distinguish between suppression and deletion. Applying our framework to 12 major unlearning methods in image classification tasks, we find that most methods achieve high restoration rates of unlearned information, indicating that they only suppress information at the decision-boundary level, while preserving semantic features in intermediate representations. Notably, even retraining from pretrained checkpoints shows high restoration, revealing that robust semantic features inherited from pretraining are not removed by retraining. These results demonstrate that representation-level retention poses significant risks overlooked by output-based metrics, highlighting the need for new unlearning evaluation criteria. We propose new evaluation guidelines that prioritize representation-level verification, especially for privacy-critical applications in the era of pre-trained models.

2602.18504 2026-02-24 cs.CV cs.AI

A Computer Vision Framework for Multi-Class Detection and Tracking in Soccer Broadcast Footage

Daniel Tshiani

Comments Presented at the Robyn Rafferty Mathias Reseaerch Conference. Additional Information available at: https://DGT-International.com

详情
英文摘要

Clubs with access to expensive multi-camera setups or GPS tracking systems gain a competitive advantage through detailed data, whereas lower-budget teams are often unable to collect similar information. This paper examines whether such data can instead be extracted directly from standard broadcast footage using a single-camera computer vision pipeline. This project develops an end-to-end system that combines a YOLO object detector with the ByteTrack tracking algorithm to identify and track players, referees, goalkeepers, and the ball throughout a match. Experimental results show that the pipeline achieves high performance in detecting and tracking players and officials, with strong precision, recall, and mAP50 scores, while ball detection remains the primary challenge. Despite this limitation, our findings demonstrate that AI can extract meaningful player-level spatial information from a single broadcast camera. By reducing reliance on specialized hardware, the proposed approach enables colleges, academies, and amateur clubs to adopt scalable, data-driven analysis methods previously accessible only to professional teams, highlighting the potential for affordable computer vision-based soccer analytics.

2602.18500 2026-02-24 cs.CV cs.ET cs.HC

Scaling Ultrasound Volumetric Reconstruction via Mobile Augmented Reality

Kian Wei Ng, Yujia Gao, Deborah Khoo, Ying Zhen Tan, Chengzheng Mao, Haojie Cheng, Andrew Makmur, Kee Yuan Ngiam, Serene Goh, Eng Tat Khoo

Comments Submitted to MICCAI 2026

详情
英文摘要

Accurate volumetric characterization of lesions is essential for oncologic diagnosis, risk stratification, and treatment planning. While imaging modalities such as Computed Tomography provide high-quality 3D data, 2D ultrasound (2D-US) remains the preferred first-line modality for breast and thyroid imaging due to cost, portability, and safety factors. However, volume estimates derived from 2D-US suffer from high inter-user variability even among experienced clinicians. Existing 3D ultrasound (3D-US) solutions use specialized probes or external tracking hardware, but such configurations increase costs and diminish portability, constraining widespread clinical use. To address these limitations, we present Mobile Augmented Reality Volumetric Ultrasound (MARVUS), a resource-efficient system designed to increase accessibility to accurate and reproducible volumetric assessment. MARVUS is interoperable with conventional ultrasound (US) systems, using a foundation model to enhance cross-specialty generalization while minimizing hardware requirements relative to current 3D-US solutions. In a user study involving experienced clinicians performing measurements on breast phantoms, MARVUS yielded a substantial improvement in volume estimation accuracy (mean difference: 0.469 cm3) with reduced inter-user variability (mean difference: 0.417 cm3). Additionally, we prove that augmented reality (AR) visualizations enhance objective performance metrics and clinician-reported usability. Collectively, our findings suggests that MARVUS can enhance US-based cancer screening, diagnostic workflows, and treatment planning in a scalable, cost-conscious, and resource-efficient manner. Usage video demonstration available (https://youtu.be/m4llYcZpqmM).

2602.18496 2026-02-24 cs.CV

A Patient-Specific Digital Twin for Adaptive Radiotherapy of Non-Small Cell Lung Cancer

Anvi Sud, Jialu Huang, Gregory R. Hart, Keshav Saxena, John Kim, Lauren Tressel, Jun Deng

详情
英文摘要

Radiotherapy continues to become more precise and data dense, with current treatment regimens generating high frequency imaging and dosimetry streams ideally suited for AI driven temporal modeling to characterize how normal tissues evolve with time. Each fraction in biologically guided radiotherapy(BGRT) treated non small cell lung cancer (NSCLC) patients records new metabolic, anatomical, and dose information. However, clinical decision making is largely informed by static, population based NTCP models which overlook the dynamic, unique biological trajectories encoded in sequential data. We developed COMPASS (Comprehensive Personalized Assessment System) for safe radiotherapy, functioning as a temporal digital twin architecture utilizing per fraction PET, CT, dosiomics, radiomics, and cumulative biologically equivalent dose (BED) kinetics to model normal tissue biology as a dynamic time series process. A GRU autoencoder was employed to learn organ specific latent trajectories, which were classified via logistic regression to predict eventual CTCAE grade 1 or higher toxicity. Eight NSCLC patients undergoing BGRT contributed to the 99 organ fraction observations covering 24 organ trajectories (spinal cord, heart, and esophagus). Despite the small cohort, intensive temporal phenotyping allowed for comprehensive analysis of individual dose response dynamics. Our findings revealed a viable AI driven early warning window, as increasing risk ratings occurred from several fractions before clinical toxicity. The dense BED driven representation revealed biologically relevant spatial dose texture characteristics that occur before toxicity and are averaged out with traditional volume based dosimetry. COMPASS establishes a proof of concept for AI enabled adaptive radiotherapy, where treatment is guided by a continually updated digital twin that tracks each patients evolving biological response.

2602.18494 2026-02-24 cs.AI cs.LG

On the Dynamics of Observation and Semantics

Xiu Li

详情
英文摘要

A dominant paradigm in visual intelligence treats semantics as a static property of latent representations, assuming that meaning can be discovered through geometric proximity in high dimensional embedding spaces. In this work, we argue that this view is physically incomplete. We propose that intelligence is not a passive mirror of reality but a property of a physically realizable agent, a system bounded by finite memory, finite compute, and finite energy interacting with a high entropy environment. We formalize this interaction through the kinematic structure of an Observation Semantics Fiber Bundle, where raw sensory observation data (the fiber) is projected onto a low entropy causal semantic manifold (the base). We prove that for any bounded agent, the thermodynamic cost of information processing (Landauer's Principle) imposes a strict limit on the complexity of internal state transitions. We term this limit the Semantic Constant B. From these physical constraints, we derive the necessity of symbolic structure. We show that to model a combinatorial world within the bound B, the semantic manifold must undergo a phase transition, it must crystallize into a discrete, compositional, and factorized form. Thus, language and logic are not cultural artifacts but ontological necessities the solid state of information required to prevent thermal collapse. We conclude that understanding is not the recovery of a hidden latent variable, but the construction of a causal quotient that renders the world algorithmically compressible and causally predictable.

2602.18493 2026-02-24 cs.LG cs.AI

Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning

Kehao Zhang, Shangtong Gui, Sheng Yang, Wei Chen, Yang Feng

详情
英文摘要

Long-context LLMs and Retrieval-Augmented Generation (RAG) systems process information passively, deferring state tracking, contradiction resolution, and evidence aggregation to query time, which becomes brittle under ultra long streams with frequent updates. We propose the Unified Memory Agent (UMA), an end-to-end reinforcement learning framework that unifies memory operations and question answering within a single policy. UMA maintains a dual memory representation: a compact core summary for global context and a structured Memory Bank that supports explicit CRUD (create, update, delete, reorganize) over key value entries, enabling proactive consolidation during streaming. To evaluate long-horizon memory behavior, we introduce Ledger-QA, a diagnostic benchmark for continuous state tracking where answers are latent values derived from accumulated updates rather than lo cal span retrieval. Across 13 datasets spanning Ledger-QA, Test-Time Learning, and Accurate Retrieval, UMA substantially outperforms long-context and RAG baselines on dynamic reasoning and learning tasks while remaining competitive on standard retrieval benchmarks, underscoring the importance of learned, end-to-end memory management.

2602.18487 2026-02-24 cs.CL cs.LG

The Million-Label NER: Breaking Scale Barriers with GLiNER bi-encoder

Ihor Stepanov, Mykhailo Shtopko, Dmytro Vodianytskyi, Oleksandr Lukashov

Comments 13 pages, 1 figure, 4 tables

详情
英文摘要

This paper introduces GLiNER-bi-Encoder, a novel architecture for Named Entity Recognition (NER) that harmonizes zero-shot flexibility with industrial-scale efficiency. While the original GLiNER framework offers strong generalization, its joint-encoding approach suffers from quadratic complexity as the number of entity labels increases. Our proposed bi-encoder design decouples the process into a dedicated label encoder and a context encoder, effectively removing the context-window bottleneck. This architecture enables the simultaneous recognition of thousands, and potentially millions, of entity types with minimal overhead. Experimental results demonstrate state-of-the-art zero-shot performance, achieving 61.5 percent Micro-F1 on the CrossNER benchmark. Crucially, by leveraging pre-computed label embeddings, GLiNER-bi-Encoder achieves up to a 130 times throughput improvement at 1024 labels compared to its uni-encoder predecessors. Furthermore, we introduce GLiNKER, a modular framework that leverages this architecture for high-performance entity linking across massive knowledge bases such as Wikidata.

2602.18486 2026-02-24 cs.LG eess.SP stat.ML

Support Vector Data Description for Radar Target Detection

Jean Pinsolle, Yadang Alexis Rouzoumka, Chengfang Ren, Chistèle Morisseau, Jean-Philippe Ovarlez

Comments 5 pages, 2 figures, to appear in Acoustics, Speech and Signal Processing (ICASSP), 2026 IEEE International Conference on, Barcelona, Spain, May 2026

详情
英文摘要

Classical radar detection techniques rely on adaptive detectors that estimate the noise covariance matrix from target-free secondary data. While effective in Gaussian environments, these methods degrade in the presence of clutter, which is better modeled by heavy-tailed distributions such as the Complex Elliptically Symmetric (CES) and Compound-Gaussian (CGD) families. Robust covariance estimators like M-estimators or Tyler's estimator address this issue, but still struggle when thermal noise combines with clutter. To overcome these challenges, we investigate the use of Support Vector Data Description (SVDD) and its deep extension, Deep SVDD, for target detection. These one-class learning methods avoid direct noise covariance estimation and are adapted here as CFAR detectors. We propose two novel SVDD-based detection algorithms and demonstrate their effectiveness on simulated radar data.

2602.18472 2026-02-24 cs.LG cs.AI q-bio.QM

Physiologically Informed Deep Learning: A Multi-Scale Framework for Next-Generation PBPK Modeling

Shunqi Liu, Han Qiu, Tong Wang

详情
英文摘要

Physiologically Based Pharmacokinetic (PBPK) modeling is a cornerstone of model-informed drug development (MIDD), providing a mechanistic framework to predict drug absorption, distribution, metabolism, and excretion (ADME). Despite its utility, adoption is hindered by high computational costs for large-scale simulations, difficulty in parameter identification for complex biological systems, and uncertainty in interspecies extrapolation. In this work, we propose a unified Scientific Machine Learning (SciML) framework that bridges mechanistic rigor and data-driven flexibility. We introduce three contributions: (1) Foundation PBPK Transformers, which treat pharmacokinetic forecasting as a sequence modeling task; (2) Physiologically Constrained Diffusion Models (PCDM), a generative approach that uses a physics-informed loss to synthesize biologically compliant virtual patient populations; and (3) Neural Allometry, a hybrid architecture combining Graph Neural Networks (GNNs) with Neural ODEs to learn continuous cross-species scaling laws. Experiments on synthetic datasets show that the framework reduces physiological violation rates from 2.00% to 0.50% under constraints while offering a path to faster simulation.

2602.18465 2026-02-24 cs.LG stat.ML

Revisiting the Seasonal Trend Decomposition for Enhanced Time Series Forecasting

Sanjeev Panta, Xu Yuan, Li Chen, Nian-Feng Tzeng

Comments 5 pages, accepted at 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026)

详情
英文摘要

Time series forecasting presents significant challenges in real-world applications across various domains. Building upon the decomposition of the time series, we enhance the architecture of machine learning models for better multivariate time series forecasting. To achieve this, we focus on the trend and seasonal components individually and investigate solutions to predict them with less errors. Recognizing that reversible instance normalization is effective only for the trend component, we take a different approach with the seasonal component by directly applying backbone models without any normalization or scaling procedures. Through these strategies, we successfully reduce error values of the existing state-of-the-art models and finally introduce dual-MLP models as more computationally efficient solutions. Furthermore, our approach consistently yields positive results with around 10% MSE average reduction across four state-of-the-art baselines on the benchmark datasets. We also evaluate our approach on a hydrological dataset extracted from the United States Geological Survey (USGS) river stations, where our models achieve significant improvements while maintaining linear time complexity, demonstrating real-world effectiveness. The source code is available at https://github.com/Sanjeev97/Time-Series-Decomposition

2602.18450 2026-02-24 cs.CL cs.IT cs.LG math.IT

Asymptotic Semantic Collapse in Hierarchical Optimization

Faruk Alpay, Bugra Kilictas

Comments 23 pages, 2 figures. Includes a dataset-free benchmark with full metric reporting

详情
英文摘要

Multi-agent language systems can exhibit a failure mode where a shared dominant context progressively absorbs individual semantics, yielding near-uniform behavior across agents. We study this effect under the name Asymptotic Semantic Collapse in Hierarchical Optimization. In a closed linguistic setting with a Dominant Anchor Node whose semantic state has effectively infinite inertia, we show that repeated interactions with Peripheral Agent Nodes drive an asymptotic alignment that minimizes a global loss. We model semantic states as points on a Riemannian manifold and analyze the induced projection dynamics. Two consequences follow. First, the limiting semantic configuration is insensitive to the optimization history: both smooth gradient-style updates and stochastic noisy updates converge to the same topological endpoint, establishing path independence at convergence. Second, the degree of context dependence controls information content: moving from atomic (independent) representations to fully entangled (context-bound) representations forces the node entropy, interpreted as available degrees of freedom, to vanish in the limit. The theory connects information-theoretic quantities with differential-geometric structure and suggests an interpretation as an immutable consensus rule that constrains agents to a shared semantic grammar. A lightweight dataset-free benchmark on an RWKV-7 13B GGUF checkpoint complements the analysis, reporting zero hash collisions, mean compliance of 0.50 under greedy decoding and 0.531 under stochastic decoding, and final Jaccard-to-anchor similarity values of 0.295 and 0.224, respectively.