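arXivDaily: a daily digest of academic papers, synced with the full arXiv dataset, with AI-generated summaries and translations covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.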

2603.25771 2026-03-30 cs.LG cs.AI cs.CY

Empowering Epidemic Response: The Role of Reinforcement Learning in Infectious Disease Control

Mutong Liu, Yang Liu, Jiming Liu

Comments 8 pages, 1 figure, 3 tables

Details

English abstract

Reinforcement learning (RL), owing to its adaptability to various dynamic systems in many real-world scenarios and the capability of maximizing long-term outcomes under different constraints, has been used in infectious disease control to optimize the intervention strategies for controlling infectious disease spread and responding to outbreaks in recent years. The potential of RL for assisting public health sectors in preventing and controlling infectious diseases is gradually emerging and being explored by rapidly increasing publications relevant to COVID-19 and other infectious diseases. However, few surveys exclusively discuss this topic, that is, the development and application of RL approaches for optimizing strategies of non-pharmaceutical and pharmaceutical interventions of public health. Therefore, this paper aims to provide a concise review and discussion of the latest literature on how RL approaches have been used to assist in controlling the spread and outbreaks of infectious diseases, covering several critical topics addressing public health demands: resource allocation, balancing between lives and livelihoods, mixed policy of multiple interventions, and inter-regional coordinated control. Finally, we conclude the paper with a discussion of several potential directions for future research.

2603.25767 2026-03-30 cs.SD cs.AI eess.AS

Unlocking Strong Supervision: A Data-Centric Study of General-Purpose Audio Pre-Training Methods

Xuanru Zhou, Yiwen Shao, Wei-Cheng Tseng, Dong Yu

Comments Accepted to CVPR 2026

2603.25766 2026-03-30 cs.RO cs.AI

ETA-VLA: Efficient Token Adaptation via Temporal Fusion and Intra-LLM Sparsification for Vision-Language-Action Models

Yiru Wang, Anqing Jiang, Shuo Wang, Yuwen Heng, Zichong Gu, Hao Sun

2603.25765 2026-03-30 cs.CV cond-mat.mtrl-sci

Evaluating Synthetic Images as Effective Substitutes for Experimental Data in Surface Roughness Classification

Binwei Chen, Huachao Leng, Chi Yeung Mang, Tsz Wai Cheung, Yanhua Chen, Wai Keung Anthony Loh, Chi Ho Wong, Chak Yin Tang

2603.25761 2026-03-30 cs.CV cs.DL

A Survey of OCR Evaluation Methods and Metrics and the Invisibility of Historical Documents

Fitsum Sileshi Beyene, Christopher L. Dancy

Comments This manuscript is the author's submitted version to the ACM Conference on Fairness, Accountability, and Transparency (FAccT 2026). Please cite the final published version via ACM Digital Library when available

2603.25758 2026-03-30 cs.CV cs.AI cs.LG eess.IV

A-SelecT: Automatic Timestep Selection for Diffusion Transformer Representation Learning

Changyu Liu, James Chenhao Liang, Wenhao Yang, Yiming Cui, Jinghao Yang, Tianyang Wang, Qifan Wang, Dongfang Liu, Cheng Han

2603.25752 2026-03-30 cs.CL cs.SD eess.AS

Relational graph-driven differential denoising and diffusion attention fusion for multimodal conversation emotion recognition

Ying Liu, Yuntao Shou, Wei Ai, Tao Meng, Keqin Li

Comments 19 pages

Details

DOI: 10.1016/j.neucom.2026.133306
Journal ref: Neurocomputing, 2026

English abstract

In real-world scenarios, audio and video signals are often subject to environmental noise and limited acquisition conditions, resulting in extracted features containing excessive noise. Furthermore, there is an imbalance in data quality and information carrying capacity between different modalities. These two issues together lead to information distortion and weight bias during the fusion phase, impairing overall recognition performance. Most existing methods neglect the impact of noisy modalities and rely on implicit weighting to model modality importance, thereby failing to explicitly account for the predominant contribution of the textual modality in emotion understanding. To address these issues, we propose a relation-aware denoising and diffusion attention fusion model for multimodal conversation emotion recognition (MCER). Specifically, we first design a differential Transformer that explicitly computes the differences between two attention maps, thereby enhancing temporally consistent information while suppressing time-irrelevant noise, which leads to effective denoising in both audio and video modalities. Second, we construct modality-specific and cross-modality relation subgraphs to capture speaker-dependent emotional dependencies, enabling fine-grained modeling of intra- and inter-modal relationships. Finally, we introduce a text-guided cross-modal diffusion mechanism that leverages self-attention to model intra-modal dependencies and adaptively diffuses audiovisual information into the textual stream, ensuring more robust and semantically aligned multimodal fusion.
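
The idea of taking the difference between two attention maps can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scaled dot-product form, and the weighting factor `lam` are assumptions made only for illustration, and plain Python lists stand in for tensors.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two small dense matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_map(q: Matrix, k: Matrix) -> Matrix:
    """Scaled dot-product attention weights softmax(Q K^T / sqrt(d))."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    return softmax_rows([[s * scale for s in row] for row in scores])


def differential_attention(q1: Matrix, k1: Matrix, q2: Matrix, k2: Matrix,
                           v: Matrix, lam: float = 0.5) -> Matrix:
    """Apply (A1 - lam * A2) to V, where A1 and A2 are two attention maps.

    `lam` is a hypothetical weighting factor; patterns shared by both maps
    are attenuated, which is the intuition behind the denoising claim.
    """
    a1 = attention_map(q1, k1)
    a2 = attention_map(q2, k2)
    diff = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return matmul(diff, v)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    # With identical maps and lam = 1.0 the two maps cancel completely.
    print(differential_attention(q, k, q, k, v, lam=1.0))
```

Because each softmax row sums to one, every row of the difference map sums to `1 - lam`, so the choice of `lam` controls how much of the shared attention pattern is cancelled.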

2603.25747 2026-03-30 cs.AI

BeSafe-Bench: Unveiling Behavioral Safety Risks of Situated Agents in Functional Environments

Yuxuan Li, Yi Lin, Peng Wang, Shiming Liu, Xuetao Wei

2603.24060 2026-03-30 cs.RO

SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation

Zhuoran Li, Zhiyang Li, Kaijun Zhou, Jinyu Gu

Comments 9 pages, 16 figures, 3 tables

2603.23629 2026-03-30 cs.LG

Steering Code LLMs with Activation Directions for Language and Library Control

Md Mahbubur Rahman, Arjun Guha, Harshitha Menon

2603.22918 2026-03-30 cs.CV cs.AI cs.CL

EVA: Efficient Reinforcement Learning for End-to-End Video Agent

Yaolun Zhang, Ruohui Wang, Jiahao Wang, Yepeng Tang, Xuanyu Zheng, Haonan Duan, Hao Lu, Hanming Deng, Lewei Lu

Comments CVPR 2026

2603.22707 2026-03-30 cs.CL

Detecting Non-Membership in LLM Training Data via Rank Correlations

Pranav Shetty, Mirazul Haque, Zhiqiang Ma, Xiaomo Liu

Comments Accepted to EACL 2026 Main Conference

2603.21039 2026-03-30 cs.LG

Benchmarking Scientific Machine Learning Models for Air Quality Data

Khawja Imran Masud, Venkata Sai Rahul Unnam, Sahara Ali

Comments Accepted at IEEE IGARSS 2026; 22 pages, 6 figures

Details

English abstract

Accurate air quality index (AQI) forecasting is essential for protecting public health in rapidly growing urban regions, yet practical model evaluation and selection are often hampered by the lack of rigorous, region-specific benchmarking on standardized datasets. Physics-guided machine learning and deep learning models could address these issues by providing more accurate and efficient AQI forecasting. This study presents an explainable, comprehensive benchmark of classical time-series, machine-learning, and deep-learning approaches for multi-horizon AQI forecasting in North Texas (Dallas County), yielding model-selection guidelines and a proposed physics-guided best model. Using publicly available U.S. Environmental Protection Agency (EPA) daily observations of air quality data from 2022 to 2024, we curate city-level time series for PM2.5 and O3 by aggregating station measurements and construct lag-wise forecasting datasets for LAG in {1, 7, 14, 30} days. To identify the best model, linear regression (LR), SARIMAX, multilayer perceptrons (MLP), and LSTM networks are evaluated alongside the proposed physics-guided variants (MLP+Physics and LSTM+Physics) that incorporate the EPA breakpoint-based AQI formulation as a consistency constraint through a weighted loss. Experiments using chronological train-test splits and the error metrics MAE and RMSE show that deep-learning models outperform simpler baselines, while physics guidance improves stability and yields physically consistent pollutant-AQI relationships, with the largest benefits observed for short-horizon prediction and for PM2.5 and O3. Overall, the results provide a practical reference for selecting AQI forecasting models in North Texas and clarify when lightweight physics constraints meaningfully improve predictive performance across pollutants and forecast horizons.
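
The lag-wise dataset construction and chronological split described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names, the three-day input window, the 80/20 split, and the synthetic series are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def make_lagged_dataset(series: List[float], lag: int,
                        window: int = 3) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of past values with the value `lag` days after it.

    `window` (number of past days used as features) is an assumed parameter.
    """
    features: List[List[float]] = []
    targets: List[float] = []
    for end in range(window - 1, len(series) - lag):
        features.append(series[end - window + 1:end + 1])
        targets.append(series[end + lag])
    return features, targets


def chronological_split(features, targets, train_fraction: float = 0.8):
    """Split without shuffling so that every test sample follows the training data."""
    cut = int(len(features) * train_fraction)
    return (features[:cut], targets[:cut]), (features[cut:], targets[cut:])


if __name__ == "__main__":
    # Synthetic placeholder series, not real EPA observations.
    daily_values = [float(day) for day in range(60)]
    for lag in (1, 7, 14, 30):
        X, y = make_lagged_dataset(daily_values, lag)
        (X_train, y_train), (X_test, y_test) = chronological_split(X, y)
        print(lag, len(X_train), len(X_test))
```

Keeping the split chronological avoids leaking future observations into training, which matters more as the forecast horizon grows.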

2603.18349 2026-03-30 cs.AI cs.CL

Large-Scale Analysis of Persuasive Content on Moltbook

Julia Jose, Meghna Manoj Nair, Rachel Greenstadt

Comments 9 pages, 4 figures

2603.17233 2026-03-30 cs.AI

Draft-and-Prune: Improving the Reliability of Auto-formalization for Logical Reasoning

Zhiyu Ni, Zheng Liang, Liangcheng Song, Chenrui Cao, Xian Zhang, Alberto Sangiovanni-Vincentelli, Pierluigi Nuzzo

2603.16816 2026-03-30 cs.CV cs.DL

WildDepth: A Multimodal Dataset for 3D Wildlife Perception and Depth Estimation

Muhammad Aamir, Naoya Muramatsu, Sangyun Shin, Matthew Wijers, Jia-Xing Zhong, Xinyu Hou, Amir Patel, Andrew Loveridge, Andrew Markham

2603.16629 2026-03-30 cs.CV cs.AI

MLLM-based Textual Explanations for Face Comparison

Redwan Sony, Anil K Jain, Arun Ross

Comments Accepted at 14th International Workshop on Biometrics and Forensics (IWBF)

2603.16233 2026-03-30 cs.CV cs.RO

Ground Reaction Inertial Poser: Physics-based Human Motion Capture from Sparse IMUs and Insole Pressure Sensors

Ryosuke Hori, Jyun-Ting Song, Zhengyi Luo, Jinkun Cao, Soyong Shin, Hideo Saito, Kris Kitani

2603.16130 2026-03-30 cs.CV

EPOFusion: Exposure aware Progressive Optimization Method for Infrared and Visible Image Fusion

Zhiwei Wang, Yayu Zheng, Defeng He, Li Zhao, Xiaoqin Zhang, Yuxing Li, Edmund Y. Lam

2603.15812 2026-03-30 cs.CV

ModTrack: Sensor-Agnostic Multi-View Tracking via Identity-Informed PHD Filtering with Covariance Propagation

Aditya Iyer, Jack Roberts, Nora Ayanian

Details

English abstract

Multi-View Multi-Object Tracking (MV-MOT) aims to localize and maintain consistent identities of objects observed by multiple sensors. This task is challenging, as viewpoint changes and occlusion disrupt identity consistency across views and time. Recent end-to-end approaches address this by jointly learning 2D Bird's Eye View (BEV) representations and identity associations, achieving high tracking accuracy. However, these methods offer no principled uncertainty accounting and remain tightly coupled to their training configuration, limiting generalization across sensor layouts, modalities, or datasets without retraining. We propose ModTrack, a modular MV-MOT system that matches end-to-end performance while providing cross-modal, sensor-agnostic generalization and traceable uncertainty. ModTrack confines learning methods to just the Detection and Feature Extraction stage of the MV-MOT pipeline, performing all fusion, association, and tracking with closed-form analytical methods. Our design reduces each sensor's output to calibrated position-covariance pairs $(\mathbf{z}, R)$; cross-view clustering and precision-weighted fusion then yield unified estimates $(\hat{\mathbf{z}}, \hat{R})$ for identity assignment and temporal tracking. A feedback-coupled, identity-informed Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter with HMM motion modes uses these fused estimates to maintain identities under missed detections and heavy occlusion. ModTrack achieves 95.5 IDF1 and 91.4 MOTA on WildTrack, surpassing all prior modular methods by over 21 points and rivaling the state-of-the-art end-to-end methods while providing deployment flexibility they cannot. Specifically, the same tracker core transfers unchanged to MultiviewX and RadarScenes, with only perception-module replacement required to extend to new domains and sensor modalities.
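
Precision-weighted fusion of position-covariance pairs can be sketched with the standard information-form rule for independent Gaussian estimates: the fused covariance is the inverse of the summed precisions, and the fused position is the precision-weighted mean. Whether ModTrack uses exactly this form is an assumption; the helper names and the 2D restriction below are illustrative only.

```python
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]


def inv2(m: Mat2) -> Mat2:
    """Invert a 2x2 matrix in closed form."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("covariance matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))


def fuse(measurements: Sequence[Tuple[Vec2, Mat2]]) -> Tuple[Vec2, Mat2]:
    """Fuse (z, R) pairs: R_hat = (sum R_i^-1)^-1, z_hat = R_hat * sum(R_i^-1 z_i)."""
    info = [[0.0, 0.0], [0.0, 0.0]]  # running sum of precision matrices
    info_vec = [0.0, 0.0]            # running sum of precision-weighted positions
    for z, R in measurements:
        P = inv2(R)
        for i in range(2):
            for j in range(2):
                info[i][j] += P[i][j]
            info_vec[i] += P[i][0] * z[0] + P[i][1] * z[1]
    R_hat = inv2(((info[0][0], info[0][1]), (info[1][0], info[1][1])))
    z_hat = (R_hat[0][0] * info_vec[0] + R_hat[0][1] * info_vec[1],
             R_hat[1][0] * info_vec[0] + R_hat[1][1] * info_vec[1])
    return z_hat, R_hat


if __name__ == "__main__":
    # Two hypothetical views of the same object; the first is more certain.
    view_a = ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
    view_b = ((4.0, 4.0), ((3.0, 0.0), (0.0, 3.0)))
    z_hat, R_hat = fuse([view_a, view_b])
    # The fused position is approximately (1.0, 1.0), pulled toward the tighter
    # estimate, with variance approximately 0.75 on each axis.
    print(z_hat, R_hat)
```

The fused covariance is never larger than the tightest input covariance, which is what makes the resulting uncertainty traceable back to the individual sensors.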

2603.12522 2026-03-30 cs.CL cs.AI cs.CY cs.HC

LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation

Himel Ghosh, Nick Elias Werner

Comments Accepted at EACL 2026 (24-29 March, Morocco)

2603.12366 2026-03-30 cs.LG

Sinkhorn-Drifting Generative Models

Ping He, Om Khangaonkar, Hamed Pirsiavash, Yikun Bai, Soheil Kolouri

2603.10300 2026-03-30 cs.CV

From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification

Ke Zhang, Xiangchen Zhao, Yunjie Tian, Jiayu Zheng, Vishal M. Patel, Di Fu

Comments 18 pages, 7 figures

2603.10008 2026-03-30 cs.CL cs.AI cs.LG

GATech at AbjadMed: Bidirectional Encoders vs. Causal Decoders: Insights from 82-Class Arabic Medical Classification

Ahmed Khaled Khamis

Comments 5 pages, 2 figures, EACL26, AbjadNLP

2603.10007 2026-03-30 cs.CL cs.LG

GATech at AbjadGenEval Shared Task: Multilingual Embeddings for Arabic Machine-Generated Text Classification

Ahmed Khaled Khamis

Comments 5 pages, 1 figure, EACL26, AbjadNLP

2603.03282 2026-03-30 cs.CV cs.GR cs.HC

MIBURI: Towards Expressive Interactive Gesture Synthesis

M. Hamza Mughal, Rishabh Dabral, Vera Demberg, Christian Theobalt

Comments CVPR 2026 (Main). Project page: https://vcai.mpi-inf.mpg.de/projects/MIBURI/

2603.02080 2026-03-30 cs.CV cs.LG

From Pixels to Patches: Pooling Strategies for Earth Embeddings

Isaac Corley, Caleb Robinson, Inbal Becker-Reshef, Juan M. Lavista Ferres

Comments ICLR 2026 ML4RS Workshop

2602.19157 2026-03-30 cs.CL

Facet-Level Persona Control by Trait-Activated Routing with Contrastive SAE for Role-Playing LLMs

Wenqiu Tang, Zhen Wan, Takahiro Komamizu, Ichiro Ide

Comments Accepted in PAKDD 2026 special session on Data Science: Foundation and Applications

2602.15675 2026-03-30 cs.CL

LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models

Ahmed Khaled Khamis, Hesham Ali

Comments 8 pages, 2 figures, EACL26

2602.12137 2026-03-30 cs.CL

CitiLink-Minutes: A Multilayer Annotated Dataset of Municipal Meeting Minutes

Ricardo Campos, Ana Filipa Pacheco, Ana Luísa Fernandes, Inês Cantante, Rute Rebouças, Luís Filipe Cunha, José Miguel Isidro, José Pedro Evans, Miguel Marques, Rodrigo Batista, Evelin Amorim, Alípio Jorge, Nuno Guimarães, Sérgio Nunes, António Leal, Purificação Silvano

Details

DOI: 10.1007/978-3-032-21321-1_56
Journal ref: Advances in Information Retrieval. ECIR 2026. Lecture Notes in Computer Science, vol 16486. Springer, Cham

English abstract

City councils play a crucial role in local governance, directly influencing citizens' daily lives through decisions made during municipal meetings. These deliberations are formally documented in meeting minutes, which serve as official records of discussions, decisions, and voting outcomes. Despite their importance, municipal meeting records have received little attention in Information Retrieval (IR) and Natural Language Processing (NLP), largely due to the lack of annotated datasets, which ultimately limits the development of computational models. To address this gap, we introduce CitiLink-Minutes, a multilayer dataset of 120 European Portuguese municipal meeting minutes from six municipalities. Unlike prior annotated datasets of parliamentary or video records, CitiLink-Minutes provides multilayer annotations and structured linkage of official written minutes. The dataset contains over one million tokens, with all personal identifiers de-identified. Each minute was manually annotated by two trained annotators and curated by an experienced linguist across three complementary dimensions: (1) metadata, (2) subjects of discussion, and (3) voting outcomes, totaling over 38,000 individual annotations. Released under FAIR principles and accompanied by baseline results on metadata extraction, topic classification, and vote labeling, CitiLink-Minutes demonstrates its potential for downstream NLP and IR tasks, while promoting transparent access to municipal decisions.
