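arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.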

Hangrui Hu, Xinfa Zhu, Ting He, Dake Guo, Bin Zhang, Xiong Wang, Zhifang Guo, Ziyue Jiang, Hongkun Hao, Zishan Guo, Xinyu Zhang, Pei Zhang, Baosong Yang, Jin Xu, Jingren Zhou, Junyang Lin

Comments https://github.com/QwenLM/Qwen3-TTS

2601.15615 2026-01-23 cs.CV

Region-aware Spatiotemporal Modeling with Collaborative Domain Generalization for Cross-Subject EEG Emotion Recognition

Weiwei Wu, Yueyang Li, Yuhu Shi, Weiming Zeng, Lang Qin, Yang Yang, Ke Zhou, Zhiguo Zhang, Wai Ting Siok, Nizhuan Wang

2601.15607 2026-01-23 cs.RO

Airflow Source Seeking on Small Quadrotors Using a Single Flow Sensor

Lenworth Thomas, Tjaden Bridges, Sarah Bergbreiter

2601.15597 2026-01-23 cs.LG eess.SP

Neural Nonlinear Shrinkage of Covariance Matrices for Minimum Variance Portfolio Optimization

Liusha Yang, Siqi Zhao, Shuqi Chai

2601.15596 2026-01-23 cs.SD cs.AI eess.AS

DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice

Leying Zhang, Tingxiao Zhou, Haiyang Sun, Mengxiao Bi, Yanmin Qian

2601.15589 2026-01-23 cs.LG

Deep Learning for Perishable Inventory Systems with Human Knowledge

Xuan Liao, Zhenkang Peng, Ying Rong

2601.15588 2026-01-23 cs.CL

YuFeng-XGuard: A Reasoning-Centric, Interpretable, and Flexible Guardrail Model for Large Language Models

Junyu Lin, Meizhen Liu, Xiufeng Huang, Jinfeng Li, Haiwen Hong, Xiaohan Yuan, Yuefeng Chen, Longtao Huang, Hui Xue, Ranjie Duan, Zhikai Chen, Yuchuan Fu, Defeng Li, Lingyao Gao, Yitong Yang

2601.15560 2026-01-23 cs.CV

Relative Classification Accuracy: A Calibrated Metric for Identity Consistency in Fine-Grained K-pop Face Generation

Sylvey Lin, Eranki Vasistha

2601.15558 2026-01-23 cs.CL

From Generation to Collaboration: Using LLMs to Edit for Empathy in Healthcare

Man Luo, Bahareh Harandizadeh, Amara Tariq, Halim Abbas, Umar Ghaffar, Christopher J Warren, Segun O. Kolade, Haidar M. Abdul-Muhsin

2601.15552 2026-01-23 cs.LG cs.AI stat.ML

BanditLP: Large-Scale Stochastic Optimization for Personalized Recommendations

Phuc Nguyen, Benjamin Zelditch, Joyce Chen, Rohit Patra, Changshuai Wei

2601.15551 2026-01-23 cs.AI cs.MA

ALIGNAgent: Adaptive Learner Intelligence for Gap Identification and Next-step guidance

Bismack Tokoli, Luis Jaimes, Ayesha S. Dina

Comments 35 pages

2601.15549 2026-01-23 cs.CV cs.AI

VIOLA: Towards Video In-Context Learning with Minimal Annotations

Ryo Fujii, Hideo Saito, Ryo Hachiuma

2601.15546 2026-01-23 cs.LG

Beyond validation loss: Clinically-tailored optimization metrics improve a model's clinical performance

Charles B. Delahunt, Courosh Mehanian, Daniel E. Shea, Matthew P. Horning

Comments 16 pages, 9 figures

2601.15545 2026-01-23 cs.RO

A Mobile Magnetic Manipulation Platform for Gastrointestinal Navigation with Deep Reinforcement Learning Control

Zhifan Yan, Chang Liu, Yiyang Jiang, Wenxuan Zheng, Xinhao Chen, Axel Krieger

2601.15538 2026-01-23 cs.LG cs.AI

QUAIL: Quantization Aware Unlearning for Mitigating Misinformation in LLMs

Himanshu Mishra, Kanwal Mehreen

2601.15533 2026-01-23 cs.AI

From Generative Engines to Actionable Simulators: The Imperative of Physical Grounding in World Models

Zhikang Chen, Tingting Zhu

2601.15511 2026-01-23 cs.CL cs.CY

AdversaRiskQA: An Adversarial Factuality Benchmark for High-Risk Domains

Adam Szelestey, Sofie van Engelen, Tianhao Huang, Justin Snelders, Qintao Zeng, Songgaojun Deng

Comments 13 pages, 4 figures, and 11 tables

English abstract

Hallucination in large language models (LLMs) remains an acute concern, contributing to the spread of misinformation and diminished public trust, particularly in high-risk domains. Among hallucination types, factuality is crucial, as it concerns a model's alignment with established world knowledge. Adversarial factuality, defined as the deliberate insertion of misinformation into prompts with varying levels of expressed confidence, tests a model's ability to detect and resist confidently framed falsehoods. Existing work lacks high-quality, domain-specific resources for assessing model robustness under such adversarial conditions, and no prior research has examined the impact of injected misinformation on long-form text factuality. To address this gap, we introduce AdversaRiskQA, the first verified and reliable benchmark systematically evaluating adversarial factuality across Health, Finance, and Law. The benchmark includes two difficulty levels to test LLMs' defensive capabilities across varying knowledge depths. We propose two automated methods for evaluating the adversarial attack success and long-form factuality. We evaluate six open- and closed-source LLMs from the Qwen, GPT-OSS, and GPT families, measuring misinformation detection rates. Long-form factuality is assessed on Qwen3 (30B) under both baseline and adversarial conditions. Results show that after excluding meaningless responses, Qwen3 (80B) achieves the highest average accuracy, while GPT-5 maintains consistently high accuracy. Performance scales non-linearly with model size, varies by domain, and gaps between difficulty levels narrow as models grow. Long-form evaluation reveals no significant correlation between injected misinformation and the model's factual output. AdversaRiskQA provides a valuable benchmark for pinpointing LLM weaknesses and developing more reliable models for high-stakes applications.


2601.15509 2026-01-23 cs.AI cs.CL

The Dark Side of AI Transformers: Sentiment Polarization & the Loss of Business Neutrality by NLP Transformers

Prasanna Kumar

2601.15508 2026-01-23 cs.CL

Computational Representations of Character Significance in Novels

Haaris Mian, Melanie Subbiah, Sharon Marcus, Nora Shaalan, Kathleen McKeown

2601.15506 2026-01-23 cs.CL cs.LG

ViT Registers and Fractal ViT

Jason Chuan-Chih Chou, Abhinav Kumar, Shivank Garg

2601.15504 2026-01-23 cs.LG q-bio.GN q-bio.QM

SAGE-FM: A lightweight and interpretable spatial transcriptomics foundation model

Xianghao Zhan, Jingyu Xu, Yuanning Zheng, Zinaida Good, Olivier Gevaert

Comments 26 pages, 5 figures

2601.15495 2026-01-23 cs.AI cs.CL

Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge

Yiyang Feng, Zeming Chen, Haotian Wu, Jiawei Zhou, Antoine Bosselut

Comments Accepted to EACL 2026 (Main)

2601.15490 2026-01-23 cs.CV

Hybrid Vision Transformer_GAN Attribute Neutralizer for Mitigating Bias in Chest X_Ray Diagnosis

Jobeal Solomon, Ali Mohammed Mansoor Alsahag, Seyed Sahand Mohammadi Ziabari

2601.15487 2026-01-23 cs.AI cs.CL cs.MA

MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation

Chandan Kumar Sahu, Premith Kumar Chilukuri, Matthew Hetrich

Comments 12 pages, 2 figures, Submitted to ACL

2601.15486 2026-01-23 cs.RO

A Universal Large Language Model -- Drone Command and Control Interface

Javier N. Ramos-Silva, Peter J. Burke

English abstract

The use of artificial intelligence (AI) for drone control can have a transformative impact on drone capabilities, especially when real world information can be integrated with drone sensing, command, and control, part of a growing field of physical AI. Large language models (LLMs) can be advantageous if trained at scale on general knowledge, especially when the training data includes information such as detailed map geography topology of the entire planet, as well as the ability to access real time situational data such as weather. However, challenges remain in the interface between drones and LLMs in general, with each application requiring a tedious, labor intensive effort to connect the LLM trained knowledge to drone command and control. Here, we solve that problem, using an interface strategy that is LLM agnostic and drone agnostic, providing the first universal, versatile, comprehensive and easy to use drone control interface. We do this using the new model context protocol (MCP) standard, an open standard that provides a universal way for AI systems to access external data, tools, and services. We develop and deploy a cloud based Linux machine hosting an MCP server that supports the MAVLink protocol, a ubiquitous drone control language used almost universally by millions of drones including the ArduPilot and PX4 frameworks. We demonstrate flight control of a real unmanned aerial vehicle. In further testing, we demonstrate extensive flight planning and control capability in a simulated drone, integrated with a Google Maps MCP server for up to date, real time navigation information. This demonstrates a universal approach to integration of LLMs with drone command and control, a paradigm that leverages and exploits virtually all of modern AI industry with drone technology in an easy to use interface that translates natural language to drone control.


2601.15482 2026-01-23 cs.LG cs.AI

Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding

Huayu Li, ZhengXiao He, Siyuan Tian, Jinghao Wen, Ao Li

2601.15481 2026-01-23 cs.LG math.OC

Early predicting of hospital admission using machine learning algorithms: Priority queues approach

Jakub Antczak, James Montgomery, Małgorzata O'Reilly, Zbigniew Palmowski, Richard Turner

2601.15476 2026-01-23 cs.AI cs.PF

Reliability by design: quantifying and eliminating fabrication risk in LLMs. From generative to consultative AI: a comparative analysis in the legal domain and lessons for high-stakes knowledge bases

Alex Dantart

2601.15473 2026-01-23 cs.LG cs.AI

Panther: Faster and Cheaper Computations with Randomized Numerical Linear Algebra

Fahd Seddik, Abdulrahman Elbedewy, Gaser Sami, Mohamed Abdelmoniem, Yahia Zakaria

Comments 5 pages, 3 figures, 2 listings

2601.15457 2026-01-23 cs.CL cs.AI cs.IR

Chunking, Retrieval, and Re-ranking: An Empirical Evaluation of RAG Architectures for Policy Document Question Answering

Anuj Maharjan, Umesh Yadav


Qwen3-TTS Technical Report