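arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.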

2603.08322 2026-03-10 cs.AI cs.HC math.CO

Agentic Neurosymbolic Collaboration for Mathematical Discovery: A Case Study in Combinatorial Design

Hai Xia, Carla P. Gomes, Bart Selman, Stefan Szeider

English abstract

We study mathematical discovery through the lens of neurosymbolic reasoning, where an AI agent powered by a large language model (LLM), coupled with symbolic computation tools and human strategic direction, jointly produced a new result in combinatorial design theory. The main result of this human-AI collaboration is a tight lower bound on the imbalance of Latin squares for the notoriously difficult case $n \equiv 1 \pmod{3}$. We reconstruct the discovery process from detailed interaction logs spanning multiple sessions over several days and identify the distinct cognitive contributions of each component. The AI agent proved effective at uncovering hidden structure and generating hypotheses. The symbolic component consists of computer algebra, constraint solvers, and simulated annealing, which provide rigorous verification and exhaustive enumeration. Human steering supplied the critical research pivot that transformed a dead end into a productive inquiry. Our analysis reveals that multi-model deliberation among frontier LLMs proved reliable for criticism and error detection but unreliable for constructive claims. The resulting human-AI mathematical contribution, a tight lower bound of $4n(n{-}1)/9$, is achieved via a novel class of near-perfect permutations. The bound was formally verified in Lean 4. Our experiments show that neurosymbolic systems can indeed produce genuine discoveries in pure mathematics.

2603.08321 2026-03-10 cs.AI

CORE-Acu: Structured Reasoning Traces and Knowledge Graph Safety Verification for Acupuncture Clinical Decision Support

Liuyi Xu, Yun Guo, Ming Chen, Zihan Dun, Yining Qian, An-Yang Lu, Shuang Li, Lijun Liu

Comments 19 pages, 5 figures, 18 tables. Includes the Acu-Reasoning dataset and TCM knowledge graph schema

English abstract

Large language models (LLMs) show significant potential for clinical decision support (CDS), yet their black-box nature, characterized by untraceable reasoning and probabilistic hallucinations, poses severe challenges in acupuncture, a field demanding rigorous interpretability and safety. To address this, we propose CORE-Acu, a neuro-symbolic framework for acupuncture clinical decision support that integrates Structured Chain-of-Thought (S-CoT) with knowledge graph (KG) safety verification. First, we construct the first acupuncture Structured Reasoning Trace dataset and a schema-constrained fine-tuning framework. By enforcing an explicit causal chain from pattern identification to treatment principles, treatment plans, and acupoint selection, we transform implicit Traditional Chinese Medicine (TCM) reasoning into interpretable generation constraints, mitigating the opacity of LLM-based CDS. Furthermore, we construct a TCM safety knowledge graph and establish a "Generate-Verify-Revise" closed-loop inference system based on a Symbolic Veto Mechanism, employing deterministic rules to intercept hallucinations and enforce hard safety boundaries. Finally, we introduce the Lexicon-Matched Entity-Reweighted Loss (LMERL), which corrects terminology drift caused by the frequency-importance mismatch in general optimization by adaptively amplifying gradient contributions of high-risk entities during fine-tuning. Experiments on 1,000 held-out cases demonstrate CORE-Acu's superior entity fidelity and reasoning quality. Crucially, CORE-Acu achieved 0/1,000 observed safety violations (95% CI: 0–0.37%), whereas GPT-4o exhibited an 8.5% violation rate under identical rules. These results establish CORE-Acu as a robust neuro-symbolic framework for acupuncture clinical decision support, guaranteeing both reasoning auditability and strict safety compliance.
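
Note on the reported interval: the abstract does not say how the 95% CI for 0 violations in 1,000 cases was computed. As a minimal sketch, assuming an exact (Clopper-Pearson) two-sided binomial interval, the upper limit for zero observed events has the closed form 1 - (alpha/2)^(1/n), which comes out close to the quoted 0.37%. The function name below is illustrative and not taken from the paper.

```python
def exact_upper_bound_zero_events(n: int, confidence: float = 0.95) -> float:
    """Upper limit of the two-sided Clopper-Pearson interval when 0 of n trials are events.

    With zero observed events the exact upper limit p solves (1 - p)^n = alpha / 2,
    so p = 1 - (alpha / 2)^(1 / n). The lower limit is 0 by definition.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    alpha = 1.0 - confidence
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


# Assumption: 1,000 held-out cases with 0 violations, as stated in the abstract.
upper = exact_upper_bound_zero_events(1000)
print(f"95% CI upper limit: {upper:.2%}")
```

A zero count therefore bounds the true violation rate only to a few tenths of a percent at this sample size; it does not show that the rate is zero.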

2603.08317 2026-03-10 cs.CV cs.AI

Human-AI Divergence in Ego-centric Action Recognition under Spatial and Spatiotemporal Manipulations

Sadegh Rahmaniboldaji, Filip Rybansky, Quoc C. Vuong, Anya C. Hurlbert, Frank Guerin, Andrew Gilbert

2603.08313 2026-03-10 cs.CV

HDR-NSFF: High Dynamic Range Neural Scene Flow Fields

Shin Dong-Yeon, Kim Jun-Seong, Kwon Byung-Ki, Tae-Hyun Oh

Comments ICLR 2026. Project page: https://shin-dong-yeon.github.io/HDR-NSFF/

2603.08312 2026-03-10 cs.CL

Learning Multiple Utterance-Level Attribute Representations with a Unified Speech Encoder

Maryem Bouziane, Salima Mdhaffar, Yannick Estève

Comments Submitted to Interspeech

2603.08305 2026-03-10 cs.CV cs.AI

Retrieval-Augmented Anatomical Guidance for Text-to-CT Generation

Daniele Molino, Camillo Maria Caruso, Paolo Soda, Valerio Guarrasi

2603.08289 2026-03-10 cs.CV

Novel Semantic Prompting for Zero-Shot Action Recognition

Salman Iqbal, Waheed Rehman

2603.08286 2026-03-10 cs.CL

LAMUS: A Large-Scale Corpus for Legal Argument Mining from U.S. Caselaw using LLMs

Serene Wang, Lavanya Pobbathi, Haihua Chen

2603.08283 2026-03-10 cs.LG cs.SY eess.SY math.OC

PolyFormer: learning efficient reformulations for scalable optimization under complex physical constraints

Yilin Wen, Yi Guo, Bo Zhao, Wei Qi, Zechun Hu, Colin Jones, Jian Sun

Comments Code availability: All the data and code are made openly available at https://github.com/wenyl16/PolyFormer

2603.08282 2026-03-10 cs.CL

Using Multimodal and Language-Agnostic Sentence Embeddings for Abstractive Summarization

Chaimae Chellaf, Salima Mdhaffar, Yannick Estève, Stéphane Huet

Comments Accepted at LREC 2026

2603.08279 2026-03-10 cs.CV

OSCAR: Occupancy-based Shape Completion via Acoustic Neural Implicit Representations

Magdalena Wysocki, Kadir Burak Buldu, Miruna-Alexandra Gafencu, Mohammad Farid Azampour, Nassir Navab

2603.08278 2026-03-10 cs.LG cs.AI cs.DC cs.ET

TA-RNN-Medical-Hybrid: A Time-Aware and Interpretable Framework for Mortality Risk Prediction

Zahra Jafari, Azadeh Zamanifar, Amirfarhad Farhadi

English abstract

Accurate and interpretable mortality risk prediction in intensive care units (ICUs) remains a critical challenge due to the irregular temporal structure of electronic health records (EHRs), the complexity of longitudinal disease trajectories, and the lack of clinically grounded explanations in many data-driven models. To address these challenges, we propose TA-RNN-Medical-Hybrid, a time-aware and knowledge-enriched deep learning framework that jointly models longitudinal clinical sequences and irregular temporal dynamics through explicit continuous-time encoding, along with standardized medical concept representations. The proposed framework extends time-aware recurrent modeling by integrating explicit continuous-time embeddings that operate independently of visit indexing, SNOMED-based disease representations, and a hierarchical dual-level attention mechanism that captures both visit-level temporal importance and feature/concept-level clinical relevance. This design enables accurate mortality risk estimation while providing transparent and clinically meaningful explanations aligned with established medical knowledge. We evaluate the proposed approach on the MIMIC-III critical care dataset and compare it against strong time-aware and sequential baselines. Experimental results demonstrate that TA-RNN-Medical-Hybrid consistently improves predictive performance in terms of AUC, accuracy, and recall-oriented $F_2$-score. Moreover, qualitative analysis shows that the model effectively decomposes mortality risk across time and clinical concepts, yielding interpretable insights into disease severity, chronicity, and temporal progression. Overall, the proposed framework bridges the gap between predictive accuracy and clinical interpretability, offering a scalable and transparent solution for high-stakes ICU decision support systems.
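
Note on the metric: the abstract calls the $F_2$-score recall-oriented. A minimal sketch of the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), shows why: with beta = 2, recall is weighted more heavily than precision. The precision and recall values below are made-up illustrations, not results from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        # Conventional value when both precision and recall are zero.
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical classifier: low precision, high recall.
p, r = 0.5, 0.8
f1 = f_beta(p, r, beta=1.0)
f2 = f_beta(p, r, beta=2.0)
print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")
```

For a model that misses few positive cases but raises some false alarms, the F2 value is higher than the F1 value. That fits a mortality-risk setting, where a missed high-risk patient is the costlier error.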

2603.08275 2026-03-10 cs.CL cs.AI

AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models

Hankun Kang, Di Lin, Zhirong Liao, Pengfei Bai, Xinyi Zeng, Jiawei Jiang, Yuanyuan Zhu, Tieyun Qian

English abstract

With the widespread adoption of Large Language Models (LLMs), respecting indigenous cultures becomes essential for models' cultural safety and responsible global applications. Existing studies consider cultural safety and cultural knowledge separately and neglect that the former should be grounded in the latter. This severely prevents LLMs from yielding culture-specific respectful responses. Consequently, adaptive cultural safety remains a formidable task. In this work, we propose to jointly model cultural safety and knowledge. First and foremost, paired cultural-safety and knowledge data are the key prerequisite for conducting this research. However, the cultural diversity across regions and the subtlety of cultural differences pose significant challenges to the creation of such paired evaluation data. To address this issue, we propose a novel framework that integrates curation of authoritative cultural knowledge descriptions, LLM-automated query generation, and heavy manual verification. Accordingly, we obtain a dataset named AdaCultureSafe containing 4.8K manually decomposed fine-grained cultural descriptions and the corresponding 48K manually verified safety- and knowledge-oriented queries. On the constructed dataset, we evaluate three families of popular LLMs on their cultural safety and knowledge proficiency, through which we make a critical discovery: no significant correlation exists between their cultural safety and knowledge proficiency. We then delve into the utility-related neuron activations within LLMs to investigate the potential cause of the absence of correlation, which can be attributed to the difference between the objectives of pre-training and post-alignment. We finally present a knowledge-grounded method, which significantly enhances cultural safety by enforcing the integration of knowledge into the LLM response generation process.

2603.08274 2026-03-10 cs.CL cs.AI

How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms

JV Roig

Comments 18 pages, 12 tables, 2 figures

English abstract

How much do large language models actually hallucinate when answering questions grounded in provided documents? Despite the critical importance of this question for enterprise AI deployments, reliable measurement has been hampered by benchmarks that rely on static datasets vulnerable to contamination, LLM-based judges with documented biases, or evaluation scales too small for statistical confidence. We address this gap using RIKER, a ground-truth-first evaluation methodology that enables deterministic scoring without human annotation. Across 35 open-weight models, three context lengths (32K, 128K, and 200K tokens), four temperature settings, and three hardware platforms (NVIDIA H200, AMD MI300X, and Intel Gaudi 3), we conducted over 172 billion tokens of evaluation - an order of magnitude beyond prior work. Our findings reveal that: (1) even the best-performing models fabricate answers at a non-trivial rate - 1.19% at best at 32K, with top-tier models at 5 - 7% - and fabrication rises steeply with context length, nearly tripling at 128K and exceeding 10% for all models at 200K; (2) model selection dominates all other factors, with overall accuracy spanning a 72-percentage-point range and model family predicting fabrication resistance better than model size; (3) temperature effects are nuanced - T=0.0 yields the best overall accuracy in roughly 60% of cases, but higher temperatures reduce fabrication for the majority of models and dramatically reduce coherence loss (infinite generation loops), which can reach 48x higher rates at T=0.0 versus T=1.0; (4) grounding ability and fabrication resistance are distinct capabilities - models that excel at finding facts may still fabricate facts that do not exist; and (5) results are consistent across hardware platforms, confirming that deployment decisions need not be hardware-dependent.

2603.08273 2026-03-10 cs.RO cs.MA

Less is More: Robust Zero-Communication 3D Pursuit-Evasion via Representational Parsimony

Jialin Ying, Zhihao Li, Zicheng Dong, Guohua Wu, Yihuan Liao

Comments 7 pages, 10 figures. This work has been submitted to the IEEE for possible publication

2603.08271 2026-03-10 cs.CV

Prototype-Guided Concept Erasure in Diffusion Models

Yuze Cai, Jiahao Lu, Hongxiang Shi, Yichao Zhou, Hong Lu

Comments Accepted by CVPR 2026

2603.08270 2026-03-10 cs.LG cs.AI

SCL-GNN: Towards Generalizable Graph Neural Networks via Spurious Correlation Learning

Yuxiang Zhang, Enyan Dai

2603.08269 2026-03-10 cs.RO cs.AI

SAIL: Test-Time Scaling for In-Context Imitation Learning with VLM

Makoto Sato, Yusuke Iwasawa, Yujin Tang, So Kuroki

Comments 8 pages, 3 figures

2603.08267 2026-03-10 cs.AI cs.CE cs.LG

Towards a more efficient bias detection in financial language models

Firas Hadj Kacem, Ahmed Khanfir, Mike Papadakis

2603.08265 2026-03-10 cs.LG

Airborne Magnetic Anomaly Navigation with Neural-Network-Augmented Online Calibration

Antonia Hager, Sven Nebendahl, Alexej Klushyn, Jasper Krauser, Torleiv H. Bryne, Tor Arne Johansen

2603.08262 2026-03-10 cs.AI

FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use

Jiaxuan Lu, Kong Wang, Yemin Wang, Qingmei Tang, Hongwei Zeng, Xiang Chen, Jiahao Pi, Shujian Deng, Lingzhi Chen, Yi Fu, Kehua Yang, Xiao Sun

2603.08260 2026-03-10 cs.RO

Seed2Scale: A Self-Evolving Data Engine for Embodied AI via Small to Large Model Synergy and Multimodal Evaluation

Cong Tai, Zhaoyu Zheng, Haixu Long, Hansheng Wu, Zhengbin Long, Haodong Xiang, Rong Shi, Zhuo Cui, Shizhuang Zhang, Gang Qiu, He Wang, Ruifeng Li, Biao Liu, Zhenzhe Sun, Tao Shen

2603.08258 2026-03-10 cs.CV

WaDi: Weight Direction-aware Distillation for One-step Image Synthesis

Lei Wang, Yang Cheng, Senmao Li, Ge Wu, Yaxing Wang, Jian Yang

Comments Accepted to CVPR 2026; Code: https://github.com/gudaochangsheng/WaDi

2603.08255 2026-03-10 cs.RO cs.LG

FlowTouch: View-Invariant Visuo-Tactile Prediction

Seongjin Bien, Carlo Kneissl, Tobias Jülg, Frank Fundel, Thomas Ressler-Antal, Florian Walter, Björn Ommer, Gitta Kutyniok, Wolfram Burgard

2603.08254 2026-03-10 cs.CV

DynamicVGGT: Learning Dynamic Point Maps for 4D Scene Reconstruction in Autonomous Driving

Zhuolin He, Jing Li, Guanghao Li, Xiaolei Chen, Jiacheng Tang, Siyang Zhang, Zhounan Jin, Feipeng Cai, Bin Li, Jian Pu, Jia Cai, Xiangyang Xue

2603.08251 2026-03-10 cs.CL

Not All Queries Need Deep Thought: CoFiCot for Adaptive Coarse-to-fine Stateful Refinement

Dongxu Zhang, Hongqiang Lin, Yiding Sun, Pengyu Wang, Qirui Wang, Ning Yang, Jihua Zhu

2603.08242 2026-03-10 cs.LG stat.AP

Optimising antibiotic switching via forecasting of patient physiology

Magnus Ross, Nel Swanepoel, Akish Luintel, Emma McGuire, Ingemar J. Cox, Steve Harris, Vasileios Lampos

Comments 32 pages, 8 figures

2603.08241 2026-03-10 cs.CL

Sensivity of LLMs' Explanations to the Training Randomness: Context, Class & Task Dependencies

Romain Loncour, Jérémie Bogaert, François-Xavier Standaert

Comments 6 pages, 6 figures

2603.08240 2026-03-10 cs.CV

SiMO: Single-Modality-Operable Multimodal Collaborative Perception

Jiageng Wen, Shengjie Zhao, Bing Li, Jiafeng Huang, Kenan Ye, Hao Deng

Comments Accepted to ICLR 2026. This arXiv version includes an additional appendix (Appendix 15) containing further philosophical discussion not included in the official ICLR peer-reviewed version

2603.08239 2026-03-10 cs.LG cs.AI cs.CL

Fibration Policy Optimization

Chang Li, Tshihao Tsu, Yaren Zhang, Chao Xue, Xiaodong He