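arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.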

2604.08826 2026-04-13 cs.LG cs.AI cs.CL

HiFloat4 Format for Language Model Pre-training on Ascend NPUs

Mehran Taghian, Yunke Peng, Xing Huang, Yao Wang, Yaoyuan Wang, Wei Guo, Yuanyong Luo, Tianchi Hu, Junsong Wang, Xin Wang, Hu Liu, Yu Cheng, Ziwei Yu, Hongliang Li, Mehdi Rahimifar, Lei Yan, Xuefei Wang, Zhuang Ma, Lei Liu, Hui Yu, Anandharaju Durai Raju, Hoang Le, Hei Yi Mak, Tanzila Rahman, Shadan Golestan

Abstract

Large foundation models have become central to modern machine learning, with performance scaling predictably with model size and data. However, training and deploying such models incur substantial computational and memory costs, motivating the development of low-precision training techniques. Recent work has demonstrated that 4-bit floating-point (FP4) formats, such as MXFP4 and NVFP4, can be successfully applied to linear GEMM operations in large language models (LLMs), achieving up to 4x improvements in compute throughput and memory efficiency compared to higher-precision baselines. In this work, we investigate the recently proposed HiFloat4 FP4 format for Huawei Ascend NPUs and systematically compare it with MXFP4 in large-scale training settings. All experiments are conducted on Ascend NPU clusters, with linear and expert GEMM operations performed entirely in FP4 precision. We evaluate both dense architectures (e.g., Pangu and LLaMA-style models) and mixture-of-experts (MoE) models, where both standard linear layers and expert-specific GEMMs operate in FP4. Furthermore, we explore stabilization techniques tailored to FP4 training that significantly reduce numerical degradation, maintaining relative error within 1% of full-precision baselines while preserving the efficiency benefits of 4-bit computation. Our results provide a comprehensive empirical study of FP4 training on NPUs and highlight the practical trade-offs between FP4 formats in large-scale dense and MoE models.
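
For readers unfamiliar with block-scaled FP4, the following is a minimal sketch of MXFP4-style quantization as described in the OCP Microscaling specification: each block shares one power-of-two scale, and each element is rounded to the nearest E2M1 value. It is an illustration only, not the paper's implementation. The function name `quantize_block` is hypothetical, ties are resolved toward the smaller magnitude as a simplification, and HiFloat4 is not modeled because the abstract does not describe its bit layout.

```python
import math

# Magnitudes representable by an E2M1 element (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Exponent of the largest power of two representable in E2M1 (4.0 = 2**2).
E2M1_EMAX = 2


def quantize_block(values):
    """Quantize one block to MXFP4-style values (hypothetical helper).

    Returns (shared_scale, dequantized_values). MXFP4 uses blocks of 32
    elements; this sketch accepts any block length for illustration.
    """
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 1.0, [0.0] * len(values)

    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    scale = 2.0 ** ((e - 1) - E2M1_EMAX)

    dequantized = []
    for v in values:
        # Saturate to the largest E2M1 magnitude, then pick the nearest grid point.
        # On ties, min() keeps the smaller magnitude (a simplification).
        mag = min(abs(v) / scale, E2M1_GRID[-1])
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        dequantized.append(math.copysign(nearest, v) * scale)
    return scale, dequantized
```

For example, `quantize_block([0.1, -0.2, 0.7, 3.0])` returns a shared scale of `0.5` and the values `[0.0, -0.25, 0.75, 3.0]`: the smallest input is flushed to zero, which is the kind of numerical degradation that stabilization techniques aim to limit.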

2604.08816 2026-04-13 cs.LG

Loom: A Scalable Analytical Neural Computer Architecture

Mehmet Kerem Turkcan

2604.08815 2026-04-13 cs.CV

Towards Responsible Multimodal Medical Reasoning via Context-Aligned Vision-Language Models

Sumra Khan, Sagar Chhabriya, Aizan Zafar, Sheeraz Arif, Amgad Muneer, Anas Zafar, Shaina Raza, Rizwan Qureshi

2604.08810 2026-04-13 cs.CV cs.LG

R2G: A Multi-View Circuit Graph Benchmark Suite from RTL to GDSII

Zewei Zhou, Jiajun Zou, Jiajia Zhang, Ao Yang, Ruichao He, Haozheng Zhou, Ao Liu, Jiawei Liu, Leilei Jin, Shan Shen, Daying Sun

Comments: Accepted as a poster by CVPR 2026

2604.08808 2026-04-13 cs.LG cs.HC

Smartwatch-Based Sitting Time Estimation in Real-World Office Settings

Olivia Zhang, Zhilin Zhang

Comments: Accepted at the 18th International Conference on Machine Learning and Computing (ICMLC 2026), February 6-9, 2026

2604.08802 2026-04-13 cs.LG cs.SY eess.SY

Alleviating Community Fear in Disasters via Multi-Agent Actor-Critic Reinforcement Learning

Yashodhan D. Hakke, Almuatazbellah M. Boker, Lamine Mili, Michael von Spakovsky, Hoda Eldardiry

Comments: 10 pages, 6 figures

2604.08801 2026-04-13 cs.LG cs.CL

$p1$: Better Prompt Optimization with Fewer Prompts

Zhaolin Gao, Yu Wang, Bo Liu, Thorsten Joachims, Kianté Brantley, Wen Sun

2604.08797 2026-04-13 cs.CL cs.AI

Lessons Without Borders? Evaluating Cultural Alignment of LLMs Using Multilingual Story Moral Generation

Sophie Wu, Andrew Piper

2604.08788 2026-04-13 cs.CL

MedConceal: A Benchmark for Clinical Hidden-Concern Reasoning Under Partial Observability

Yikun Han, Joey Chan, Jingyuan Chen, Mengting Ai, Simo Du, Yue Guo

Abstract

Patient-clinician communication is an asymmetric-information problem: patients often do not disclose fears, misconceptions, or practical barriers unless clinicians elicit them skillfully. Effective medical dialogue therefore requires reasoning under partial observability: clinicians must elicit latent concerns, confirm them through interaction, and respond in ways that guide patients toward appropriate care. However, existing medical dialogue benchmarks largely sidestep this challenge by exposing hidden patient state, collapsing elicitation into extraction, or evaluating responses without modeling what remains hidden. We present MedConceal, a benchmark with an interactive patient simulator for evaluating hidden-concern reasoning in medical dialogue, comprising 300 curated cases and 600 clinician-LLM interactions. Built from clinician-answered online health discussions, each case pairs clinician-visible context with simulator-internal hidden concerns derived from prior literature and structured using an expert-developed taxonomy. The simulator withholds these concerns from the dialogue agent, tracks whether they have been revealed and addressed via theory-grounded turn-level communication signals, and is clinician-reviewed for clinical plausibility. This enables process-aware evaluation of both task success and the interaction process that leads to it. We study two abilities: confirmation, surfacing hidden concerns through multi-turn dialogue, and intervention, addressing the primary concern and guiding the patient toward a target plan. Results show that no single system dominates: frontier models lead on different confirmation metrics, while human clinicians (N=159) remain strongest on intervention success. Together, these results identify hidden-concern reasoning under partial observability as a key unresolved challenge for medical dialogue systems.

2604.08787 2026-04-13 cs.RO

One Interface, Many Robots: Unified Real-Time Low-Level Motion Planning for Collaborative Arms

Yue Feng, Weicheng Huang, I-Ming Chen

2604.08786 2026-04-13 cs.SD eess.AS

Script Collapse in Multilingual ASR: Defining and Measuring Script Fidelity Rate

Hanif Rahman

2604.08780 2026-04-13 cs.RO cs.LG

Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning

Mohamad H. Danesh, Chenhao Li, Amin Abyaneh, Anas Houssaini, Kirsty Ellis, Glen Berseth, Marco Hutter, Hsiu-Chin Lin

2604.08779 2026-04-13 cs.LG

Adaptive Simulation Experiment for LLM Policy Optimization

Mingjie Hu, Siyang Gao, Jian-qiang Hu, Enlu Zhou

2604.08764 2026-04-13 cs.CL math.DG

Revisiting Anisotropy in Language Transformers: The Geometry of Learning Dynamics

Raphael Bernas, Fanny Jourdan, Antonin Poché, Céline Hudelot

2604.08762 2026-04-13 cs.CV cs.AI

InstrAct: Towards Action-Centric Understanding in Instructional Videos

Zhuoyi Yang, Jiapeng Yu, Reuben Tan, Boyang Li, Huijuan Xu

2604.08761 2026-04-13 cs.CV

State Space Models are Effective Sign Language Learners: Exploiting Phonological Compositionality for Vocabulary-Scale Recognition

Bryan Cheng, Austin Jin, Jasper Zhang

Comments: 8 pages, 3 figures. Accepted to workshop on Algorithmic Fairness Across Alignment Procedures and Agentic Systems at ICLR 2026

2604.08760 2026-04-13 cs.CV

SIC3D: Style Image Conditioned Text-to-3D Gaussian Splatting Generation

Ming He, Zhixiang Chen, Steve Maddock

2604.08757 2026-04-13 cs.CL cs.AI

Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models

Yousra Fettach, Guillaume Bied, Hannu Toivonen, Tijl De Bie

2604.08756 2026-04-13 cs.AI

Artifacts as Memory Beyond the Agent Boundary

John D. Martin, Fraser Mince, Esra'a Saleh, Amy Pajak

2604.08754 2026-04-13 cs.LG

IKKA: Inversion Classification via Critical Anomalies for Robust Visual Servoing

Darya Pavlenko

Comments: 9 pages, 2 figures, 3 tables. Submitted to NeurIPS 2026

2604.08752 2026-04-13 cs.CL cs.AI

LLMs Underperform Graph-Based Parsers on Supervised Relation Extraction for Complex Graphs

Paolo Gajo, Domenic Rosati, Hassan Sajjad, Alberto Barrón-Cedeño

Comments: Accepted at ACL 2026 (Main Conference)

2604.08750 2026-04-13 cs.LG cs.SY eess.SY

Adversarial Sensor Errors for Safe and Robust Wind Turbine Fleet Control

Julian Quick, Marcus Binder Nilsen, Andreas Bechmann, Tran Nguyen Le, Pierre-Elouan Mikael Rethore

Comments: Submitted to Journal of Physics: Conference Series (Torque 2026). This is the Accepted Manuscript version of an article accepted for publication in Journal of Physics: Conference Series. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. This Accepted Manuscript is published under a CC BY licence

2604.08741 2026-04-13 cs.CV

LPLCv2: An Expanded Dataset for Fine-Grained License Plate Legibility Classification

Lucas Wojcik, Eduardo A. F. Machoski, Eduil Nascimento, Rayson Laroca, David Menotti

2604.08728 2026-04-13 cs.LG

Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning

Diyi Hu, Bhaskar Krishnamachari

2604.08726 2026-04-13 cs.RO

Task-Aware Bimanual Affordance Prediction via VLM-Guided Semantic-Geometric Reasoning

Fabian Hahne, Vignesh Prasad, Georgia Chalvatzaki, Jan Peters, Alap Kshirsagar

2604.08723 2026-04-13 cs.CL cs.AI

Decomposing the Delta: What Do Models Actually Learn from Preference Pairs?

Chia-Hsuan Lee, Mingyang Zhou, Renkun Ni, Zelei Cheng, Sihui Dai, Supriyo Chakraborty, Shixiong Zhang, Sambit Sahu, William Campbell

2604.08722 2026-04-13 cs.CV cs.AI

AI Driven Soccer Analysis Using Computer Vision

Adrian Manchado, Tanner Cellio, Jonathan Keane, Yiyang Wang

2604.08719 2026-04-13 cs.CV cs.AI cs.RO

LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving

Hao Shao, Letian Wang, Yang Zhou, Yuxuan Hu, Zhuofan Zong, Steven L. Waslander, Wei Zhan, Hongsheng Li

2604.08716 2026-04-13 cs.CV

What Matters in Virtual Try-Off? Dual-UNet Diffusion Model For Garment Reconstruction

Loc-Phat Truong, Meysam Madadi, Sergio Escalera

2604.08711 2026-04-13 cs.CV cs.AI cs.LG

Deep Learning-Based Tracking and Lineage Reconstruction of Ligament Breakup

Vrushank Ahire, Vivek Kurumanghat, Mudasir Ganaie, Lipika Kabiraj

Abstract

The disintegration of liquid sheets into ligaments and droplets involves highly transient, multi-scale dynamics that are difficult to quantify from high-speed shadowgraphy images. Identifying droplets, ligaments, and blobs formed during breakup, along with tracking across frames, is essential for spray analysis. However, conventional multi-object tracking frameworks impose strict one-to-one temporal associations and cannot represent one-to-many fragmentation events. In this study, we present a two-stage deep learning framework for object detection and temporal relationship modeling across frames. The framework captures ligament deformation, fragmentation, and parent-child lineage during liquid sheet disintegration. In the first stage, a Faster R-CNN with a ResNet-50 backbone and Feature Pyramid Network detects and classifies ligaments and droplets in high-speed shadowgraphy recordings of an impinging Carbopol gel jet. A morphology-preserving synthetic data generation strategy augments the training set without introducing physically implausible configurations, achieving a held-out F1 score of up to 0.872 across fourteen original-to-synthetic configurations. In the second stage, a Transformer-augmented multilayer perceptron classifies inter-frame associations into continuation, fragmentation (one-to-many), and non-association using physics-informed geometric features. Despite severe class imbalance, the model achieves 86.1% accuracy, 93.2% precision, and perfect recall (1.00) for fragmentation events. Together, the framework enables automated reconstruction of fragmentation trees, preservation of parent-child lineage, and extraction of breakup statistics such as fragment multiplicity and droplet size distributions. By explicitly identifying children droplets formed from ligament fragmentation, the framework provides automated analysis of the primary atomization mode.
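
The lineage reconstruction step described above can be sketched as follows, assuming the second-stage classifier has already labeled each candidate inter-frame pair. The tuple format `(parent_id, child_id, label)`, the label strings, and all function names are hypothetical and chosen only for illustration; the abstract does not specify the authors' data structures.

```python
from collections import defaultdict


def build_lineage(associations):
    """Map each parent object to its children in the next frame.

    `associations` is an iterable of (parent_id, child_id, label) tuples,
    where label is "continuation", "fragmentation", or "none" (assumed names).
    Pairs labeled "none" are non-associations and are dropped.
    """
    tree = defaultdict(list)
    for parent, child, label in associations:
        if label != "none":
            tree[parent].append(child)
    return dict(tree)


def fragment_multiplicity(associations):
    """Count how many fragments each parent breaks into (one-to-many events)."""
    counts = defaultdict(int)
    for parent, _child, label in associations:
        if label == "fragmentation":
            counts[parent] += 1
    return dict(counts)


def descendants(tree, root):
    """List every object descended from `root`, in depth-first order."""
    found = []
    stack = list(reversed(tree.get(root, [])))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(reversed(tree.get(node, [])))
    return found
```

Unlike a one-to-one tracker, the mapping allows a parent to own several children, so a ligament that continues for one frame and then splits into three droplets yields a single tree whose multiplicity for the splitting frame is three.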
