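arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.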

2603.02208 2026-03-03 cs.CL

Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training

Valentin Lacombe, Valentin Quesnel, Damien Sileo

Comments Keywords: LLMs, NLP, Dataset, Corpus, Procedural Pre-training, Reasoning, Logic, Formal Semantics https://github.com/sileod/reasoning_core

Abstract

Training on verifiable symbolic data is a promising way to expand the reasoning frontier of language models beyond what standard pre-training corpora provide. Yet existing procedural generators often rely on fixed puzzles or templates and do not deliver the distributional breadth needed at scale. We introduce Reasoning Core, a scalable suite that procedurally generates verifiable symbolic reasoning data across core formal domains: PDDL planning over randomized domains, first-order logic with equality, context-free grammar parsing and generation, causal reasoning over random Bayesian networks, and systems of equations. Each task is paired with an external solver for rigorous verification and admits continuous difficulty control for curriculum design. Examples can optionally include solver-derived reasoning traces, enabling supervised training from the earliest pre-training stages, and the same interface provides verifiable reward functions for reinforcement learning. Our experiments show that mixing Reasoning Core data into pre-training improves downstream reasoning while preserving, or slightly improving, language modeling quality. Zero-shot evaluations confirm these tasks challenge frontier models such as GPT-5. The code and data are publicly available under the MIT license.

2603.02205 2026-03-03 cs.SD

Analytical Exploration of Spatial Audio Cues: A Differentiable Multi-Sphere Scattering Model

Siminfar Samakoush Galougah, Pranav Pulijala, Ramani Duraiswami

2603.02204 2026-03-03 cs.LG stat.ML

Partial Causal Structure Learning for Valid Selective Conformal Inference under Interventions

Amir Asiaee, Kavey Aryan, James P. Long

2603.02203 2026-03-03 cs.AI cs.CL

Tool Verification for Test-Time Reinforcement Learning

Ruotong Liao, Nikolai Röhrich, Xiaohan Wang, Yuhui Zhang, Yasaman Samadzadeh, Volker Tresp, Serena Yeung-Levy

Comments 12 pages, 11 figures

2603.02202 2026-03-03 cs.LG

Frontier Models Can Take Actions at Low Probabilities

Alex Serrano, Wen Xing, David Lindner, Erik Jenner

2603.02200 2026-03-03 cs.CV cs.AI cs.LG

Adaptive Confidence Regularization for Multimodal Failure Detection

Moru Liu, Hao Dong, Olga Fink, Mario Trapp

Comments Accepted by CVPR 2026

2603.02197 2026-03-03 cs.IT cs.NI cs.SI cs.SY eess.SP eess.SY math.IT

Characterizing Information Accuracy in Timeliness-Based Gossip Networks

Emirhan Tekez, Melih Bastopcu, Sinan Gezici

2603.02194 2026-03-03 cs.CV cs.LG cs.RO cs.SE

From Leaderboard to Deployment: Code Quality Challenges in AV Perception Repositories

Mateus Karvat, Bram Adams, Sidney Givigi

2603.02193 2026-03-03 cs.LG cs.AI stat.ML

Symbol-Equivariant Recurrent Reasoning Models

Richard Freinschlag, Timo Bertram, Erich Kobler, Andreas Mayr, Günter Klambauer

2603.02192 2026-03-03 cs.OH cs.CY

Personal Health Data Integration and Intelligence through Semantic Web and Blockchain Technologies

Oshani Seneviratne, Manan Shukla, Jianjing Lin

2603.02188 2026-03-03 cs.LG

Multi-Head Low-Rank Attention

Songtao Liu, Hongwu Peng, Zhiwei Zhang, Zhengyu Chen, Yue Guo

Comments Accepted by ICLR 2026

2603.02184 2026-03-03 cs.LG cs.AI

MAC: A Conversion Rate Prediction Benchmark Featuring Labels Under Multiple Attribution Mechanisms

Jinqi Wu, Sishuo Chen, Zhangming Chan, Yong Bai, Lei Zhang, Sheng Chen, Chenghuan Hou, Xiang-Rong Sheng, Han Zhu, Jian Xu, Bo Zheng, Chaoyou Fu

Comments Code and data available at https://github.com/alimama-tech/PyMAL

2603.02178 2026-03-03 cs.LG cs.AI stat.ML

Reservoir Subspace Injection for Online ICA under Top-n Whitening

Wenjun Xiao, Yuda Bi, Vince D Calhoun

2603.02176 2026-03-03 cs.CL

Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale

Hao Li, Chunjiang Mu, Jianhao Chen, Siyue Ren, Zhiyao Cui, Yiqun Zhang, Lei Bai, Shuyue Hu

2603.02174 2026-03-03 cs.LG

De-paradox Tree: Breaking Down Simpson's Paradox via A Kernel-Based Partition Algorithm

Xian Teng, Yu-Ru Lin

2603.02172 2026-03-03 cs.CV

GeoDiT: Point-Conditioned Diffusion Transformer for Satellite Image Synthesis

Srikumar Sastry, Dan Cher, Brian Wei, Aayush Dhakal, Subash Khanal, Dev Gupta, Nathan Jacobs

Comments 26 pages, 17 figures

2603.02170 2026-03-03 cs.LG cs.AI

SageBwd: A Trainable Low-bit Attention

Jintao Zhang, Marco Chen, Haoxu Wang, Kai Jiang, Ion Stoica, Joseph E. Gonzalez, Jianfei Chen, Jun Zhu

2603.02164 2026-03-03 cs.DB

Catapults to the Rescue: Accelerating Vector Search by Exploiting Query Locality

Sami Abuzakuk, Anne-Marie Kermarrec, Rafael Pires, Mathis Randl, Martijn de Vos

2603.02162 2026-03-03 cs.CV

Bridging the gap between Performance and Interpretability: An Explainable Disentangled Multimodal Framework for Cancer Survival Prediction

Aniek Eijpe, Soufyan Lakbir, Melis Erdal Cesur, Sara P. Oliveira, Angelos Chatzimparmpas, Sanne Abeln, Wilson Silva

2603.02161 2026-03-03 cs.CR

Boosting Device Utilization in Control Flow Auditing

Alexandra Lengert, Adam Ilyas Caulfield, Ivan De Oliveira Nunes

2603.02156 2026-03-03 cs.NI cs.AI

How Small Can 6G Reason? Scaling Tiny Language Models for AI-Native Networks

Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah

Abstract

Emerging 6G visions, reflected in ongoing standardization efforts within 3GPP, IETF, ETSI, ITU-T, and the O-RAN Alliance, increasingly characterize networks as AI-native systems in which high-level semantic reasoning layers operate above standardized control and data-plane functions. Although frontier-scale large language models (LLMs) such as Qwen2.5-7B and Olmo-3-7B demonstrate strong reasoning capability, their computational footprint limits deployment in latency-sensitive, edge-native infrastructures. This paper presents a systematic empirical study of the scaling behavior and deployment efficiency of compact language models for network-level semantic reasoning in AI-native 6G systems. Using 6G-Bench, a standardization-aligned benchmark comprising 30 decision-making tasks across five capability domains, we evaluate models ranging from 135M (SmolLM2-135M) to 7B parameters (Qwen2.5-7B), including mid-scale architectures such as Llama-3.2-1B, Granite-1B, and Qwen2.5-3B. Deterministic accuracy (pass@1) increases from 0.224 at 135M to 0.707 at 7B, but scaling gains are highly non-uniform. A pronounced stability transition occurs in the 1 to 1.5B range, where accuracy rises from 0.373 (Llama-3.2-1B) to 0.531 (Qwen2.5-1.5B) and the instability gap Delta_5 contracts from 0.356 to 0.138. Beyond 3B parameters, improvements diminish (+0.064 from 3B to 7B). Through single-query inference profiling and an Edge Score metric that normalizes accuracy by latency and memory footprint, we show that semantic reliability per unit edge resource does not scale monotonically with parameter count. Instead, mid-scale models (approximately 1.5 to 3B) achieve the most favorable balance between deterministic stability and computational efficiency, providing deployment-relevant guidance for AI-native 6G architectures. All scripts and results are publicly available at https://github.com/maferrag/6G-Bench

2603.02155 2026-03-03 cs.LG cs.AI math.ST stat.ML stat.TH

Near-Optimal Regret for KL-Regularized Multi-Armed Bandits

Kaixuan Ji, Qingyue Zhao, Heyang Zhao, Qiwei Di, Quanquan Gu

2603.02153 2026-03-03 cs.IR cs.AI cs.CL

Scaling Retrieval Augmented Generation with RAG Fusion: Lessons from an Industry Deployment

Luigi Medrano, Arush Verma, Mukul Chhabra

2603.02150 2026-03-03 cs.CL cs.AI cs.DB

Zero- and Few-Shot Named-Entity Recognition: Case Study and Dataset in the Crime Domain (CrimeNER)

Miguel Lopez-Duran, Julian Fierrez, Aythami Morales, Daniel DeAlcala, Gonzalo Mancera, Javier Irigoyen, Ruben Tolosana, Oscar Delgado, Francisco Jurado, Alvaro Ortigosa

Comments Sent for review at the main conference of the International Conference of Document Analysis and Recognition (ICDAR) 2026

2603.02149 2026-03-03 cs.CV eess.SP

3D Field of Junctions: A Noise-Robust, Training-Free Structural Prior for Volumetric Inverse Problems

Namhoon Kim, Narges Moeini, Justin Romberg, Sara Fridovich-Keil

Comments Code will be released soon

2603.02148 2026-03-03 cs.DS

Consistent Low-Rank Approximation

David P. Woodruff, Samson Zhou

Comments ICLR 2026

2603.02146 2026-03-03 cs.CL

LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards

Guanzheng Chen, Michael Qizhe Shieh, Lidong Bing

Comments ICLR 2026

2603.02145 2026-03-03 cs.LG cs.OS

Machine Learning (ML) library in Linux kernel

Viacheslav Dubeyko

2603.02142 2026-03-03 cs.CV cs.LG

Is Bigger Always Better? Efficiency Analysis in Resource-Constrained Small Object Detection

Kwame Mbobda-Kuate, Gabriel Kasmi

Comments 13 pages, 9 figures, 8 tables

2603.02141 2026-03-03 cs.SE

Generative AI in Software Testing: Current Trends and Future Directions

Tanish Singla, Qusay H. Mahmoud