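arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.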

2603.14157 2026-03-17 cs.LG cs.AI

Align Forward, Adapt Backward: Closing the Discretization Gap in Logic Gate Networks

Youngsung Kim

Abstract

In neural network models, soft mixtures of fixed candidate components (e.g., logic gates and sub-networks) are often used during training for stable optimization, while hard selection is typically used at inference. This raises questions about training-inference mismatch. We analyze this gap by separating forward-pass computation (hard selection vs. soft mixture) from stochasticity (with vs. without Gumbel noise). Using logic gate networks as a testbed, we observe distinct behaviors across four methods: Hard-ST achieves zero selection gap by construction; Gumbel-ST achieves a near-zero gap when training succeeds but suffers accuracy collapse at low temperatures; Soft-Mix achieves a small gap only at low temperature via weight concentration; and Soft-Gumbel exhibits large gaps despite Gumbel noise, confirming that noise alone does not reduce the gap. We propose CAGE (Confidence-Adaptive Gradient Estimation) to maintain gradient flow while preserving forward alignment. On logic gate networks, Hard-ST with CAGE achieves over 98% accuracy on MNIST and over 58% on CIFAR-10, both with zero selection gap across all temperatures, while Gumbel-ST without CAGE suffers a 47-point accuracy collapse.
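
The four forward-pass variants named in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the paper's implementation: the four-gate candidate set, the function names (`forward`, `inference`, `selection_gap`), and the definition of the gap as the absolute difference between the training-time output and the noise-free hard-selection output. The backward pass (the straight-through estimator and CAGE) is not modeled.

```python
import math
import random

# Hypothetical candidate set: four two-input gates relaxed to inputs in [0, 1].
GATES = [
    lambda a, b: a * b,              # AND
    lambda a, b: a + b - a * b,      # OR
    lambda a, b: a + b - 2 * a * b,  # XOR
    lambda a, b: 1.0 - a * b,        # NAND
]

def softmax(logits, tau):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp((x - m) / tau) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel(rng):
    """Sample standard Gumbel noise; u is clamped away from 0 and 1."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def forward(logits, a, b, tau, hard, noise, rng=None):
    """Training-time forward pass.

    hard=True,  noise=False -> hard selection ("Hard-ST" forward)
    hard=True,  noise=True  -> hard selection on noisy logits ("Gumbel-ST" forward)
    hard=False, noise=False -> soft mixture ("Soft-Mix")
    hard=False, noise=True  -> soft mixture on noisy logits ("Soft-Gumbel")
    """
    if noise:
        rng = rng or random
        logits = [x + gumbel(rng) for x in logits]
    outs = [g(a, b) for g in GATES]
    if hard:
        k = max(range(len(logits)), key=logits.__getitem__)
        return outs[k]
    weights = softmax(logits, tau)
    return sum(w * o for w, o in zip(weights, outs))

def inference(logits, a, b):
    """Inference-time output: noise-free hard selection of the top-scoring gate."""
    return forward(logits, a, b, tau=1.0, hard=True, noise=False)

def selection_gap(logits, a, b, tau, hard, noise, rng=None):
    """Assumed gap measure: |training forward output - inference output|."""
    return abs(forward(logits, a, b, tau, hard, noise, rng) - inference(logits, a, b))
```

Under these assumptions, the noise-free hard forward pass is the same computation as inference, so its gap is zero regardless of `tau`, while the soft mixture only approaches the inference output as `tau` shrinks and the softmax weights concentrate on one gate.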

2603.14156 2026-03-17 cs.RO

TransCurriculum: Multi-Dimensional Curriculum Learning for Fast & Stable Locomotion

Prakhar Mishra, Amir Hossain Raj, Xuesu Xiao, Dinesh Manocha

Abstract

High-speed legged locomotion struggles with stability and transfer losses at higher command velocities during deployment. One reason is that most curricula vary difficulty along a single axis, for example by increasing the range of command velocities, terrain difficulty, or domain parameters (e.g., friction or payload mass), using either a fixed update rule or instantaneous rewards while ignoring how the robot's training history has evolved. We propose TransCurriculum, a transformer-based multi-dimensional curriculum learning approach for agile quadrupedal locomotion. TransCurriculum adapts along three axes: velocity command targets, terrain difficulty, and domain randomization parameters (friction and payload mass). Rather than feeding task reward history directly into the low-level control policy, our formulation exploits it at the curriculum level. A transformer-based teacher retrieves the sequence of rewards and uses it to predict future rewards, success rate, and learning progress to guide expansion of this multi-dimensional curriculum towards high-performing task bins. Finally, we validate our approach on the Unitree Go1 robot in simulation (Isaac Gym) and deploy it zero-shot on Go1 hardware. Our TransCurriculum policy achieves a maximum velocity of 6.3 m/s in simulation and outperforms prior curriculum baselines. We tested our TransCurriculum-trained policy on terrains (carpets, slopes, tiles, concrete), achieving a forward velocity of 4.1 m/s on carpet, surpassing the fastest curriculum methods by 18.8% and achieving the maximum zero-shot value among all tested methods. Our multi-dimensional curriculum also reduces the transfer loss to 18%, from 27% for a command-only curriculum, demonstrating the benefits of joint training over the velocity, terrain, and domain randomization dimensions while keeping the task success rate at 80-90% on rigid indoor and outdoor surfaces.

2603.14153 2026-03-17 cs.CV

Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories

Junyao Hu, Zhongwei Cheng, Waikeung Wong, Xingxing Zou

Comments CVPR 2026; Project Page: https://artmesciencelab.github.io/Garments2Look

2603.14152 2026-03-17 cs.CV

SK-Adapter: Skeleton-Based Structural Control for Native 3D Generation

Anbang Wang, Yuzhuo Ao, Shangzhe Wu, Chi-Keung Tang

Comments 26 pages, 9 figures

2603.14151 2026-03-17 cs.CV

Seeing Through the PRISM: Compound & Controllable Restoration of Scientific Images

Rupa Kurinchi-Vendhan, Pratyusha Sharma, Antonio Torralba, Sara Beery

2603.14150 2026-03-17 cs.CV

CIPHER: Culvert Inspection through Pairwise Frame Selection and High-Efficiency Reconstruction

Seoyoung Lee, Zhangyang Wang

Comments Accepted by ICCV 2026 End-to-End 3D Learning

2603.14145 2026-03-17 cs.CL cs.CV

MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos

Arushi Goel, Sreyan Ghosh, Vatsal Agarwal, Nishit Anand, Kaousheik Jayakumar, Lasha Koroshinadze, Yao Xu, Katie Lyons, James Case, Karan Sapra, Kevin J. Shih, Siddharth Gururani, Abhinav Shrivastava, Ramani Duraiswami, Dinesh Manocha, Andrew Tao, Bryan Catanzaro, Mohammad Shoeybi, Wei Ping

Comments Project Page: https://huggingface.co/datasets/nvidia/MMOU

2603.14143 2026-03-17 cs.LG

Multifidelity Surrogate Modeling of Depressurized Loss of Forced Cooling in High-temperature Gas Reactors

Meredith Eaheart, Majdi I. Radaideh

Comments 29 pages, 8 figures, 14 Tables

2603.14132 2026-03-17 cs.CV cs.LG

DualSwinFusionSeg: Multimodal Martian Landslide Segmentation via Dual Swin Transformer with Multi-Scale Fusion and UNet++

Shahriar Kabir, Abdullah Muhammed Amimul Ehsan, Istiak Ahmmed Rifti, Md Kaykobad Reza

Comments 10 pages, 2 Figures, 12 Tables. Code is available at: https://github.com/amimulamim/Mars-LS-Segmentation

2603.14131 2026-03-17 cs.LG

Is the reconstruction loss culprit? An attempt to outperform JEPA

Alexey Potapov, Oleg Shcherbakov, Ivan Kravchenko

2603.14130 2026-03-17 cs.CL

The GELATO Dataset for Legislative NER

Matthew Flynn, Timothy Obiso, Sam Newman

Comments Accepted at LREC 2026

2603.14128 2026-03-17 cs.CV cs.AI cs.LG

Diffusion Reinforcement Learning via Centered Reward Distillation

Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière, Vicky Kalogeiton

2603.14127 2026-03-17 cs.CV

Implementation and discussion of the Pith Estimation on Rough Log End Images using Local Fourier Spectrum Analysis method

Henry Marichal, Diego Passarella, Gregory Randall

2603.14126 2026-03-17 cs.AI

The Institutional Scaling Law: Non-Monotonic Fitness, Capability-Trust Divergence, and Symbiogenetic Scaling in Generative AI

Mark Baciak, Thomas A. Cellucci

2603.14125 2026-03-17 cs.CV

Low-Field Magnetic Resonance Image Enhancement using Undersampled k-Space

Daniel Tweneboah Anyimadu, Mohammed Abdalla, Mohammed M. Abdelsamea, Ahmed Karam Eldaly

Comments 13 pages, 8 figures

2603.14120 2026-03-17 cs.CV

Low-Field Magnetic Resonance Image Quality Enhancement using Undersampled k-Space and Out-of-Distribution Generalisation

Daniel Tweneboah Anyimadu, Mohammed M. Abdelsamea, Ahmed Karam Eldaly

Comments 5 pages, 5 figures

2603.14117 2026-03-17 cs.CV

Improving Visual Reasoning with Iterative Evidence Refinement

Zeru Shi, Kai Mei, Yihao Quan, Dimitris N. Metaxas, Ruixiang Tang

2603.14112 2026-03-17 cs.CV

Revisiting the Perception-Distortion Trade-off with Spatial-Semantic Guided Super-Resolution

Dan Wang, Haiyan Sun, Shan Du, Z. Jane Wang, Zhaochong An, Serge Belongie, Xinrui Cui

2603.14111 2026-03-17 cs.CL

OasisSimp: An Open-source Asian-English Sentence Simplification Dataset

Hannah Liu, Muxin Tian, Iqra Ali, Haonan Gao, Qiaoyiwen Wu, Blair Yang, Uthayasanker Thayasivam, En-Shiun Annie Lee, Pakawat Nakwijit, Surangika Ranathunga, Ravi Shekhar

Comments Accepted at LREC 2026

2603.14110 2026-03-17 cs.LG

SVD Contextual Sparsity Predictors for Fast LLM Inference

Georgii Serbin, Kirill Koshkin, Zhongao Sun, Anastasiya Bistrigova, C. C. Korikov

2603.14109 2026-03-17 cs.RO

H-RINS: Hierarchical Tightly-coupled Radar-Inertial Navigation via Smoothing and Mapping

Ali Alridha Abdulkarim, Mikhail Litvinov, Dzmitry Tsetserukou

Comments 8 pages, 5 figures, Submitted to conference

2603.14104 2026-03-17 cs.RO

GelSphere: An Omnidirectional Rolling Vision-Based Tactile Sensor for Online 3D Reconstruction and Normal Force Estimation

Seoyeon Lee, Mohammad Amin Mirzaee, Wenzhen Yuan

2603.14096 2026-03-17 cs.LG cs.AI

Concisely Explaining the Doubt: Minimum-Size Abductive Explanations for Linear Models with a Reject Option

Gleilson Pedro Fernandes, Thiago Alves Rocha

Comments Accepted at XAI 2026 (4th World Conference on Explainable Artificial Intelligence)

Abstract

Trustworthiness in artificial intelligence depends not only on what a model decides, but also on how it handles and explains cases in which a reliable decision cannot be made. In critical domains such as healthcare and finance, a reject option allows the model to abstain when evidence is insufficient, making it essential to explain why an instance is rejected in order to support informed human intervention. In these settings, explanations must not only be interpretable, but also faithful to the underlying model and computationally efficient enough to support real-time decision making. Abductive explanations guarantee fidelity, but their exact computation is known to be NP-hard for many classes of models, limiting their practical applicability. Computing minimum-size abductive explanations is an even more challenging problem, as it requires reasoning not only about fidelity but also about optimality. Prior work has addressed this challenge in restricted settings, including log-linear-time algorithms for computing minimum-size abductive explanations in linear models without rejection, as well as a polynomial-time method based on linear programming for computing abductive explanations, without guarantees of minimum size, for linear models with a reject option. In this work, we bridge these lines of research by computing minimum-size abductive explanations for linear models with a reject option. For accepted instances, we adapt the log-linear algorithm to efficiently compute optimal explanations. For rejected instances, we formulate a 0-1 integer linear programming problem that characterizes minimum-size abductive explanations of rejection. Although this formulation is NP-hard in theory, our experimental results show that it is consistently more efficient in practice than the linear-programming-based approach that does not guarantee minimum-size explanations.
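
The log-linear case mentioned in the abstract (linear models without rejection) can be sketched as a sort-and-greedy procedure. This is an illustration under stated assumptions, not the paper's algorithm: the function name `min_size_explanation`, the decision rule `w.x + b > 0`, the restriction to positively classified instances, and the per-feature box bounds `lower`/`upper` are all hypothetical choices. The reject-option case, which the abstract handles with a 0-1 integer linear program, is not covered.

```python
def min_size_explanation(w, b, x, lower, upper):
    """Return a smallest set of feature indices that, when fixed to their
    values in x, keeps w.x + b > 0 for every value of the remaining features
    inside [lower[i], upper[i]]. Sketch for positive predictions only."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score <= 0:
        raise ValueError("this sketch only handles positively classified instances")

    # Lowest contribution each feature can make if left free within its bounds.
    worst = [min(wi * lo, wi * hi) for wi, lo, hi in zip(w, lower, upper)]
    # How much the worst-case score improves when feature i is fixed to x[i].
    gain = [wi * xi - wc for wi, xi, wc in zip(w, x, worst)]

    worst_score = sum(worst) + b  # score when every feature takes its worst value
    chosen = []
    # Fixing the largest gains first reaches a positive worst case with the
    # fewest features; the sort dominates the cost, giving O(n log n).
    for i in sorted(range(len(w)), key=lambda j: gain[j], reverse=True):
        if worst_score > 0:
            break
        worst_score += gain[i]
        chosen.append(i)
    return sorted(chosen)
```

For example, with `w = [3.0, -2.0, 0.5]`, `b = -1.0`, `x = [1.0, 0.0, 1.0]`, and all features bounded in `[0, 1]`, fixing only the first feature leaves a worst-case score of exactly 0, so the second feature must also be fixed; the third can stay free.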

2603.14092 2026-03-17 cs.LG stat.ME

Soft Mean Expected Calibration Error (SMECE): A Calibration Metric for Probabilistic Labels

Michael Leznik

2603.14087 2026-03-17 cs.LG cs.CL

Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors

Mark Rofin, Jalal Naghiyev, Michael Hahn

Comments ICLR 2026

2603.14086 2026-03-17 cs.CV

Effective Feature Learning for 3D Medical Registration via Domain-Specialized DINO Pretraining

Eytan Kats, Mattias P. Heinrich

Comments Accepted for International Symposium on Biomedical Imaging 2026 (ISBI 2026)

2603.14084 2026-03-17 cs.LG

Bootstrapped Physically-Primed Neural Networks for Robust T2 Distribution Estimation in Low-SNR Pancreatic MRI

Hadas Ben Atya, Nicole Abramenkov, Noa Mashiah, Luise Brock, Daphna Link Sourani, Ram Weiss, Moti Freiman

2603.14078 2026-03-17 cs.CL cs.LG

CMHL: Contrastive Multi-Head Learning for Emotionally Consistent Text Classification

Menna Elgabry, Ali Hamdi, Khaled Shaban

2603.14076 2026-03-17 cs.CV

SGR-OCC: Evolving Monocular Priors for Embodied 3D Occupancy Prediction via Soft-Gating Lifting and Semantic-Adaptive Geometric Refinement

Yiran Guo, Simone Mentasti, Xiaofeng Jin, Matteo Frosi, Matteo Matteucci

Comments main paper: 20 pages, 6 figures; appendix: 15 pages, 5 figures

Abstract

3D semantic occupancy prediction is a cornerstone for embodied AI, enabling agents to perceive dense scene geometry and semantics incrementally from monocular video streams. However, current online frameworks face two critical bottlenecks: the inherent depth ambiguity of monocular estimation that causes "feature bleeding" at object boundaries, and the "cold start" instability where uninitialized temporal fusion layers distort high-quality spatial priors during early training stages. In this paper, we propose SGR-OCC (Soft-Gating and Ray-refinement Occupancy), a unified framework driven by the philosophy of "Inheritance and Evolution". To perfectly inherit monocular spatial expertise, we introduce a Soft-Gating Feature Lifter that explicitly models depth uncertainty via a Gaussian gate to probabilistically suppress background noise. Furthermore, a Dynamic Ray-Constrained Anchor Refinement module simplifies complex 3D displacement searches into efficient 1D depth corrections along camera rays, ensuring sub-voxel adherence to physical surfaces. To ensure stable evolution toward temporal consistency, we employ a Two-Phase Progressive Training Strategy equipped with identity-initialized fusion, effectively resolving the cold start problem and shielding spatial priors from noisy early gradients. Extensive experiments on the EmbodiedOcc-ScanNet and Occ-ScanNet benchmarks demonstrate that SGR-OCC achieves state-of-the-art performance. In local prediction tasks, SGR-OCC achieves a completion IoU of 58.55% and a semantic mIoU of 49.89%, surpassing the previous best method, EmbodiedOcc++, by 3.65% and 3.69% respectively. In challenging embodied prediction tasks, our model reaches 55.72% SC-IoU and 46.22% mIoU. Qualitative results further confirm our model's superior capability in preserving structural integrity and boundary sharpness in complex indoor environments.

2603.14075 2026-03-17 cs.LG

Enhancing Mental Health Classification with Layer-Attentive Residuals and Contrastive Feature Learning

Menna Elgabry, Ali Hamdi, Khaled Shaban