arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2602.11937 2026-03-30 cs.LG

Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration

Akhiad Bercovich, Nir Ailon, Vladimir Anisimov, Tomer Asida, Nave Assaf, Mohammad Dabbah, Ido Galil, Amnon Geifman, Yonatan Geifman, Izhak Golan, Roi Koren, Itay Levy, Zach Moshe, Pavlo Molchanov, Najeeb Nabwani, Mostofa Patwary, Omri Puny, Tomer Ronen, Itamar Schen, Elad Segal, Ido Shahaf, Oren Tropp, Ran Zilberstein, Ran El-Yaniv

详情

英文摘要

Reasoning-focused LLMs improve answer quality by generating longer reasoning traces, but the additional tokens dramatically increase serving cost, motivating inference optimization. We extend and apply Puzzle, a post-training neural architecture search (NAS) framework, to gpt-oss-120B to produce gpt-oss-puzzle-88B, a deployment-optimized derivative. Our approach combines heterogeneous MoE expert pruning, selective replacement of full-context attention with window attention, FP8 KV-cache quantization with calibrated scales, and post-training reinforcement learning to recover accuracy, while maintaining low generation length. In terms of per-token speeds, on an 8XH100 node we achieve 1.63X and 1.22X throughput speedups in long-context and short-context settings, respectively. gpt-oss-puzzle-88B also delivers throughput speedups of 2.82X on a single NVIDIA H100 GPU. However, because token counts can change with reasoning effort and model variants, per-token throughput (tok/s) and latency (ms/token) do not necessarily lead to end-to-end speedups: a 2X throughput gain is erased if traces grow 2X. Conversely, throughput gains can be spent on more reasoning tokens to improve accuracy; we therefore advocate request-level efficiency metrics that normalize throughput by tokens generated and trace an accuracy--speed frontier across reasoning efforts. We show that gpt-oss-puzzle-88B improves over gpt-oss-120B along the entire frontier, delivering up to 1.29X higher request-level efficiency. Across various benchmarks, gpt-oss-puzzle-88B matches or slightly exceeds the parent on suite-average accuracy across reasoning efforts, with retention ranging from 100.8% (high) to 108.2% (low), showing that post-training architecture search can substantially reduce inference costs without sacrificing quality.

URL PDF HTML ☆

赞 0 踩 0

2602.05321 2026-03-30 cs.CV

Wid3R: Wide Field-of-View 3D Reconstruction via Camera Model Conditioning

Dongki Jung, Jaehoon Choi, Adil Qureshi, Somi Jeong, Dinesh Manocha, Suyong Yeon

2602.02969 2026-03-30 cs.CV

Dynamic High-frequency Convolution for Infrared Small Target Detection

Ruojing Li, Chao Xiao, Qian Yin, Wei An, Nuo Chen, Xinyi Ying, Miao Li, Yingqian Wang

详情

DOI: 10.1109/TCSVT.2026.3661285

英文摘要

Infrared small targets are typically tiny and locally salient, which belong to high-frequency components (HFCs) in images. Single-frame infrared small target (SIRST) detection is challenging, since there are many HFCs along with targets, such as bright corners, broken clouds, and other clutters. Current learning-based methods rely on the powerful capabilities of deep networks, but neglect explicit modeling and discriminative representation learning of various HFCs, which is important to distinguish targets from other HFCs. To address the aforementioned issues, we propose a dynamic high-frequency convolution (DHiF) to translate the discriminative modeling process into the generation of a dynamic local filter bank. Especially, DHiF is sensitive to HFCs, owing to the dynamic parameters of its generated filters being symmetrically adjusted within a zero-centered range according to Fourier transformation properties. Combining with standard convolution operations, DHiF can adaptively and dynamically process different HFC regions and capture their distinctive grayscale variation characteristics for discriminative representation learning. DHiF functions as a drop-in replacement for standard convolution and can be used in arbitrary SIRST detection networks without significant decrease in computational efficiency. To validate the effectiveness of our DHiF, we conducted extensive experiments across different SIRST detection networks on real-scene datasets. Compared to other state-of-the-art convolution operations, DHiF exhibits superior detection performance with promising improvement. Codes are available at https://github.com/TinaLRJ/DHiF.

URL PDF HTML ☆

赞 0 踩 0

2602.00316 2026-03-30 cs.CL

MiNER: A Two-Stage Pipeline for Metadata Extraction from Municipal Meeting Minutes

Rodrigo Batista, Luís Filipe Cunha, Purificação Silvano, Nuno Guimarães, Alípio Jorge, Evelin Amorim, Ricardo Campos

详情

DOI: 10.1007/978-3-032-21300-6_33
Journal ref: Advances in Information Retrieval. ECIR 2026. Lecture Notes in Computer Science, vol 16484. Springer, Cham

英文摘要

Municipal meeting minutes are official documents of local governance, exhibiting heterogeneous formats and writing styles. Effective information retrieval (IR) requires identifying metadata such as meeting number, date, location, participants, and start/end times, elements that are rarely standardized or easy to extract automatically. Existing named entity recognition (NER) models are ill-suited to this task, as they are not adapted to such domain-specific categories. In this paper, we propose a two-stage pipeline for metadata extraction from municipal minutes. First, a question answering (QA) model identifies the opening and closing text segments containing metadata. Transformer-based models (BERTimbau and XLM-RoBERTa with and without a CRF layer) are then applied for fine-grained entity extraction and enhanced through deslexicalization. To evaluate our proposed pipeline, we benchmark both open-weight (Phi) and closed-weight (Gemini) LLMs, assessing predictive performance, inference cost, and carbon footprint. Our results demonstrate strong in-domain performance, better than larger general-purpose LLMs. However, cross-municipality evaluation reveals reduced generalization reflecting the variability and linguistic complexity of municipal records. This work establishes the first benchmark for metadata extraction from municipal meeting minutes, providing a solid foundation for future research in this domain.

URL PDF HTML ☆

赞 0 踩 0

2601.19490 2026-03-30 cs.CL

ClaimPT: A Portuguese Dataset of Annotated Claims in News Articles

Ricardo Campos, Raquel Sequeira, Sara Nerea, Inês Cantante, Diogo Folques, Luís Filipe Cunha, João Canavilhas, António Branco, Alípio Jorge, Sérgio Nunes, Nuno Guimarães, Purificação Silvano

详情

DOI: 10.1007/978-3-032-21321-1_58
Journal ref: Advances in Information Retrieval. ECIR 2026. Lecture Notes in Computer Science, vol 16486. Springer, Cham

英文摘要

Fact-checking remains a demanding and time-consuming task, still largely dependent on manual verification and unable to match the rapid spread of misinformation online. This is particularly important because debunking false information typically takes longer to reach consumers than the misinformation itself; accelerating corrections through automation can therefore help counter it more effectively. Although many organizations perform manual fact-checking, this approach is difficult to scale given the growing volume of digital content. These limitations have motivated interest in automating fact-checking, where identifying claims is a crucial first step. However, progress has been uneven across languages, with English dominating due to abundant annotated data. Portuguese, like other languages, still lacks accessible, licensed datasets, limiting research, NLP developments and applications. In this paper, we introduce ClaimPT, a dataset of European Portuguese news articles annotated for factual claims, comprising 1,308 articles and 6,875 individual annotations. Unlike most existing resources based on social media or parliamentary transcripts, ClaimPT focuses on journalistic content, collected through a partnership with LUSA, the Portuguese News Agency. To ensure annotation quality, two trained annotators labeled each article, with a curator validating all annotations according to a newly proposed scheme. We also provide baseline models for claim detection, establishing initial benchmarks and enabling future NLP and IR applications. By releasing ClaimPT, we aim to advance research on low-resource fact-checking and enhance understanding of misinformation in news media.

URL PDF HTML ☆

赞 0 踩 0

2601.18374 2026-03-30 cs.CL

CitiLink: Enhancing Municipal Transparency and Citizen Engagement through Searchable Meeting Minutes

Rodrigo Silva, José Evans, José Isidro, Miguel Marques, Afonso Fonseca, Ricardo Morais, João Canavilhas, Arian Pasquali, Purificação Silvano, Alípio Jorge, Nuno Guimarães, Sérgio Nunes, Ricardo Campos

2601.13622 2026-03-30 cs.CV cs.AI

CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models

Donghee Lee, Rui Cai, Zhe Zhao

2512.24694 2026-03-30 cs.LG

Mobility-Assisted Decentralized Federated Learning: Convergence Analysis and A Data-Driven Approach

Reza Jahani, Md Farhamdur Reza, Richeng Jin, Huaiyu Dai

Comments Under review for potential publication in IEEE Transactions on Cognitive Communications and Networking

2512.18921 2026-03-30 cs.LG

Concurrent training methods for Kolmogorov-Arnold networks: Disjoint datasets and FPGA implementation

Andrew Polar, Michael Poluektov

2512.17109 2026-03-30 cs.LG

Bridging Training and Merging Through Momentum-Aware Optimization

Alireza Moayedikia, Alicia Troncoso

Comments Paper submitted to INS and revised. Currently revision is under review as of 27th Jan 2026

2512.14150 2026-03-30 cs.LG cs.AI

PathFinder: Advancing Path Loss Prediction for Single-to-Multi-Transmitter Scenario

Zhijie Zhong, Zhiwen Yu, Pengyu Li, Jianming Lv, C. L. Philip Chen, Min Chen

Comments 41 pages, 16 figures, 6 tables. Under review

2512.14080 2026-03-30 cs.LG cs.AI

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

Wentao Guo, Mayank Mishra, Xinle Cheng, Ion Stoica, Tri Dao

Comments Include the new Blackwell benchmark results

详情

英文摘要

Mixture of Experts (MoE) models have emerged as the de facto architecture for scaling up language models without significantly increasing the computational cost. Recent MoE models demonstrate a clear trend towards high expert granularity (smaller expert intermediate dimension) and higher sparsity (constant number of activated experts with a higher number of total experts), which improve model quality per FLOP. However, fine-grained MoEs suffer from increased activation memory footprint and reduced hardware efficiency due to higher IO costs, while sparser MoEs suffer from wasted computations due to padding in Grouped GEMM kernels. In response, we propose a memory-efficient algorithm to compute the forward and backward passes of MoEs with minimal activation caching for the backward pass. We also design GPU kernels that overlap memory IO with computation, benefiting all MoE architectures. Finally, we propose a novel "token rounding" method that minimizes the wasted compute due to padding in Grouped GEMM kernels. As a result, our method SonicMoE reduces activation memory by 45% and achieves a 1.86x compute throughput improvement on Hopper GPUs compared to ScatterMoE's BF16 MoE kernel for a fine-grained 7B MoE. Concretely, SonicMoE on 64 H100s achieves a training throughput of 213 billion tokens per day, comparable to ScatterMoE's 225 billion tokens per day on 96 H100s for a 7B MoE model training with FSDP-2 using the lm-engine codebase. On Blackwell GPUs, SonicMoE also achieves a 25% and 15% relative speedup on the forward and backward pass respectively compared to a highly optimized DeepGEMM baseline on OLMoE-sized 7B MoE models. Under high MoE sparsity settings, our tile-aware token rounding algorithm yields an additional 1.16x speedup on kernel execution time compared to vanilla top-K routing while maintaining similar downstream performance on Hopper GPUs. We open-source all our kernels.

URL PDF HTML ☆

赞 0 踩 0

2511.22265 2026-03-30 cs.LG

FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning

Yuan Yao, Lixu Wang, Jiaqi Wu, Jin Song, Simin Chen, Zehua Wang, Zijian Tian, Wei Chen, Huixia Li, Xiaoxiao Li

Comments Accepted by CVPR 2026

2511.19669 2026-03-30 cs.AI

HeaRT: A Hierarchical Circuit Reasoning Tree-Based Agentic Framework for AMS Design Optimization

Souradip Poddar, Chia-Tung Ho, Ziming Wei, Weidong Cao, Haoxing Ren, David Z. Pan

Comments Analog Design Automation, Hierarchical Circuit Reasoning, Context-Aware Design Adaptation, LLMs, Agentic Frameworks, Electronic Design Automation (EDA)

2511.10971 2026-03-30 cs.CV

ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization

Anzhe Cheng, Shukai Duan, Shixuan Li, Chenzhong Yin, Mingxi Cheng, Heng Ping, Tamoghna Chattopadhyay, Sophia I Thomopoulos, Shahin Nazarian, Paul Thompson, Paul Bogdan

Comments Accepted in CVPR2026 Main Track

2510.13698 2026-03-30 cs.CV

Attention Misses Visual Risk: Risk-Adaptive Steering for Multimodal Safety Alignment

Jonghyun Park, Minhyuk Seo, Chaewon Yeo, Jonghyun Choi

2509.22498 2026-03-30 cs.RO

HELIOS: Hierarchical Exploration for Language-Grounded Interaction in Open Scenes

Katrina Ashton, Chahyon Ku, Shrey Shah, Saumit Vedula, Tingrui Zhang, Wen Jiang, Kostas Daniilidis, Bernadette Bucher

2509.15219 2026-03-30 cs.CV cs.LG cs.MA cs.MM cs.RO

Out-of-Sight Embodied Agents: Multimodal Tracking, Sensor Fusion, and Trajectory Forecasting

Haichao Zhang, Yi Xu, Yun Fu

Comments Published in IEEE Transactions on Pattern Analysis and Machine Intelligence (Early Access), pp. 1-14, March 23, 2026

详情

DOI: 10.1109/TPAMI.2026.3676710
Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026

英文摘要

Trajectory prediction is a fundamental problem in computer vision, vision-language-action models, world models, and autonomous systems, with broad impact on autonomous driving, robotics, and surveillance. However, most existing methods assume complete and clean observations, and therefore do not adequately handle out-of-sight agents or noisy sensing signals caused by limited camera coverage, occlusions, and the absence of ground-truth denoised trajectories. These challenges raise safety concerns and reduce robustness in real-world deployment. In this extended study, we introduce major improvements to Out-of-Sight Trajectory (OST), a task for predicting noise-free visual trajectories of out-of-sight objects from noisy sensor observations. Building on our prior work, we expand Out-of-Sight Trajectory Prediction (OOSTraj) from pedestrians to both pedestrians and vehicles, increasing its relevance to autonomous driving, robotics, and surveillance. Our improved Vision-Positioning Denoising Module exploits camera calibration to establish vision-position correspondence, mitigating the lack of direct visual cues and enabling effective unsupervised denoising of noisy sensor signals. Extensive experiments on the Vi-Fi and JRDB datasets show that our method achieves state-of-the-art results for both trajectory denoising and trajectory prediction, with clear gains over prior baselines. We also compare with classical denoising methods, including Kalman filtering, and adapt recent trajectory prediction models to this setting, establishing a stronger benchmark. To the best of our knowledge, this is the first work to use vision-positioning projection to denoise noisy sensor trajectories of out-of-sight agents, opening new directions for future research.

URL PDF HTML ☆

赞 0 踩 0

2506.20964 2026-03-30 cs.CV cs.AI

Evidence-based diagnostic reasoning with multi-agent copilot for human pathology

Luca L. Weishaupt, Chengkuan Chen, Drew F. K. Williamson, Richard J. Chen, Guillaume Jaume, Tong Ding, Bowen Chen, Anurag Vaidya, Long Phi Le, Guillaume Jaume, Ming Y. Lu, Faisal Mahmood

2506.12766 2026-03-30 cs.CV

Probing Deep into Temporal Profile Makes the Infrared Small Target Detector Much Better

Ruojing Li, Wei An, Yingqian Wang, Xinyi Ying, Yimian Dai, Longguang Wang, Miao Li, Yulan Guo, Li Liu

2506.01949 2026-03-30 cs.CV

IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout

Fei Shen, Yutong Gao, Jian Yu, Xiaoyu Du, Jinhui Tang

2504.17696 2026-03-30 cs.CV cs.AI

Hierarchical and Multimodal Data for Daily Activity Understanding

Ghazal Kaviani, Yavuz Yarici, Seulgi Kim, Mohit Prabhushankar, Ghassan AlRegib, Mashhour Solh, Ameya Patil

Comments Accepted for publication in DMLR

详情

英文摘要

Daily Activity Recordings for Artificial Intelligence (DARai, pronounced "Dahr-ree") is a multimodal, hierarchically annotated dataset constructed to understand human activities in real-world settings. DARai consists of continuous scripted and unscripted recordings of 50 participants in 10 different environments, totaling over 200 hours of data from 20 sensors including multiple camera views, depth and radar sensors, wearable inertial measurement units (IMUs), electromyography (EMG), insole pressure sensors, biomonitor sensors, and gaze tracker. To capture the complexity in human activities, DARai is annotated at three levels of hierarchy: (i) high-level activities (L1) that are independent tasks, (ii) lower-level actions (L2) that are patterns shared between activities, and (iii) fine-grained procedures (L3) that detail the exact execution steps for actions. The dataset annotations and recordings are designed so that 22.7% of L2 actions are shared between L1 activities and 14.2% of L3 procedures are shared between L2 actions. The overlap and unscripted nature of DARai allows counterfactual activities in the dataset. Experiments with various machine learning models showcase the value of DARai in uncovering important challenges in human-centered applications. Specifically, we conduct unimodal and multimodal sensor fusion experiments for recognition, temporal localization, and future action anticipation across all hierarchical annotation levels. To highlight the limitations of individual sensors, we also conduct domain-variant experiments that are enabled by DARai's multi-sensor and counterfactual activity design setup. The code, documentation, and dataset are available at the dedicated DARai website: https://alregib.ece.gatech.edu/software-and-datasets/darai-daily-activity-recordings-for-artificial-intelligence-and-machine-learning/

URL PDF HTML ☆

赞 0 踩 0

2503.22886 2026-03-30 cs.LG cs.RO

Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models

Ron Vainshtein, Zohar Rimon, Shie Mannor, Chen Tessler

2503.10055 2026-03-30 cs.CV eess.IV

Fourier Decomposition for Explicit Representation of 3D Point Cloud Attributes

Donghyun Kim, Chanyoung Kim, Hyunah Ko, Seong Jae Hwang

2503.08055 2026-03-30 cs.CV

Beyond Deepfake vs Real: Facial Deepfake Detection in the Open-Set Paradigm

Nadarasar Bahavan, Sachith Seneviratne, Sanjay Saha, Ken Chen, Sanka Rasnayaka, Saman Halgamuge

Comments IEEE/CVF Conference on Computer Vision and Pattern Recognition - Workshop

2502.02468 2026-03-30 cs.CV

High-Fidelity Human Avatars from Laptop Webcams using Edge Compute

Akash Haridas, Imran N. Junejo

Comments 6 pages, 6 figures, 1 table

2412.04882 2026-03-30 cs.LG stat.ML

Nonmyopic Global Optimisation via Approximate Dynamic Programming

Filippo Airaldi, Bart De Schutter, Azita Dabiri

Comments 36 pages, 6 figures, 2 tables, submitted to Springer Machine Learning

详情

英文摘要

Global optimisation to optimise expensive-to-evaluate black-box functions without gradient information. Bayesian optimisation, one of the most well-known techniques, typically employs Gaussian processes as surrogate models, leveraging their probabilistic nature to balance exploration and exploitation. However, these processes become computationally prohibitive in high-dimensional spaces. Recent alternatives, based on inverse distance weighting (IDW) and radial basis functions (RBFs), offer competitive, computationally lighter solutions. Despite their efficiency, both traditional global and Bayesian optimisation strategies suffer from the myopic nature of their acquisition functions, which focus on immediate improvement neglecting future implications of the sequential decision making process. Nonmyopic acquisition functions devised for the Bayesian setting have shown promise in improving long-term performance. Yet, their combination with deterministic surrogate models remains unexplored. In this work, we introduce novel nonmyopic acquisition strategies tailored to IDW and RBF based on approximate dynamic programming paradigms, including rollout and multi-step scenario-based optimisation schemes, to enable lookahead acquisition. These methods optimise a sequence of query points over a horizon by predicting the evolution of the surrogate model, inherently managing the exploration-exploitation trade-off via optimisation techniques. The proposed approach represents a significant advance in extending nonmyopic acquisition principles, previously confined to Bayesian optimisation, to deterministic models. Empirical results on synthetic and hyperparameter tuning benchmark problems, a constrained problem, as well as on a data-driven predictive control application, demonstrate that these nonmyopic methods outperform conventional myopic approaches, leading to faster and more robust convergence.

URL PDF HTML ☆

赞 0 踩 0

2411.12964 2026-03-30 cs.AI

Efficient Energy-Optimal Path Planning for Electric Vehicles Considering Vehicle Dynamics

Saman Ahmadi, Guido Tack, Daniel Harabor, Philip Kilby, Mahdi Jalili

Comments 14 pages, 7 figures, 7 tables

2410.12853 2026-03-30 cs.CL cs.AI cs.LG

Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks

Mahmood Hegazy

Comments 11 pages, 9 figures

2407.19660 2026-03-30 cs.CV cs.LG

Towards Knowledge Guided Pretraining Approaches for Multimodal Foundation Models: Applications in Remote Sensing

Praveen Ravirathinam, Ajitesh Parthasarathy, Ankush Khandelwal, Rahul Ghosh, Vipin Kumar

Comments 33 pages with appendix