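arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.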

2603.16877 2026-04-29 cs.CL

Enhancing Financial Report Question-Answering: A Retrieval-Augmented Generation System with Reranking Analysis

Zhiyuan Cheng, Longying Lai, Yue Liu, Kai Cheng, Xiaoxi Qi

Comments 7 pages, 2 figures. Accepted to ICECET 2026

Abstract

Financial analysts face significant challenges extracting information from lengthy 10-K reports, which often exceed 100 pages. This paper presents a Retrieval-Augmented Generation (RAG) system designed to answer questions about S&P 500 financial reports and evaluates the impact of neural reranking on system performance. Our pipeline employs hybrid search combining full-text and semantic retrieval, followed by an optional reranking stage using a cross-encoder model. We conduct systematic evaluation using the FinDER benchmark dataset, comprising 1,500 queries across five experimental groups. Results demonstrate that reranking significantly improves answer quality, achieving 49.0 percent correctness for scores of 8 or above compared to 33.5 percent without reranking, representing a 15.5 percentage point improvement. Additionally, the error rate for completely incorrect answers decreases from 35.3 percent to 22.5 percent. Our findings emphasize the critical role of reranking in financial RAG systems and demonstrate performance improvements over baseline methods through modern language models and refined retrieval strategies.

2603.16648 2026-04-29 cs.AI

Domain-Independent Dynamic Programming with Constraint Propagation

Imko Marijnissen, J. Christopher Beck, Emir Demirović, Ryo Kuroiwa

Comments 13 pages. To appear at the 36th International Conference on Automated Planning and Scheduling (ICAPS 2026)

2603.15954 2026-04-29 cs.LG cs.AI

MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment

Hanxian Huang, Igor Fedorov, Andrey Gromov, Bernard Beckerman, Naveen Suda, David Eriksson, Maximilian Balandat, Rylan Conway, Patrick Huber, Chinnadhurai Sankar, Ayushi Dalmia, Zechun Liu, Lemeng Wu, Tarek Elgamal, Adithya Sagar, Vikas Chandra, Raghuraman Krishnamoorthi

Comments Accepted to ACL Industry Track 2026

2603.15473 2026-04-29 cs.AI

Agent Lifecycle Toolkit (ALTK): Reusable Middleware Components for Robust AI Agents

Zidane Wright, Jason Tsay, Anupama Murthi, Osher Elhadad, Diego Del Rio, Saurabh Goyal, Kiran Kate, Jim Laredo, Koren Lazar, Vinod Muthusamy, Yara Rizk

Comments to appear in CAIS 2026 demonstration track

2603.14248 2026-04-29 cs.AI cs.CL

Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

Mohamed Aghzal, Gregory J. Stein, Ziyu Yao

Comments Accepted to The 64th Annual Meeting of the Association for Computational Linguistics (ACL) 2026

2603.12118 2026-04-29 cs.LG cs.DC

Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models

Jae-Won Chung, Jeff J. Ma, Jisang Ahn, Yizhuo Liang, Akshay Jajoo, Myungjin Lee, Mosharaf Chowdhury

Comments CAIS 2026 Demo track | Open source at https://github.com/cornserve-ai/cornserve | Demo video at https://www.youtube.com/watch?v=nb8R-vztLRg

2603.09723 2026-04-29 cs.CL cs.AI

RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation

Sihong Wu, Yiling Ma, Yilun Zhao, Tiansheng Hu, Owen Jiang, Manasi Patwardhan, Arman Cohan

Comments ACL 2026 Findings

2603.02709 2026-04-29 cs.CL cs.AI

Sensory-Aware Sequential Recommendation via Review-Distilled Representations

Yeo Chan Yoon, Chanjun Park, Kyuhan Koh

2603.01070 2026-04-29 cs.CL

How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning

Xiangxiang Zhang, Caijun Jia, Siyuan Li, Dingyu He, Xiya Xiong, Zheng Sun, Honghao He, Yuchen Wu, Bihui Yu, Linzhuang Sun, Cheng Tan, Jingxuan Wei

2603.00376 2026-04-29 cs.AI

NeuroHex: A Brain-Inspired Hex Coordinate System to Enable Highly Computationally-Efficient World Models for Continuous Online-Adaptive Learning

Quinn Jacobson, Joe Luo, Jingfei Xu, Shanmuga Venkatachalam, Kevin Wang, Dingchao Rong, John Paul Shen

Comments This is an expanded version of the paper titled "NeuroHex: Highly Efficient Hex Coordinate System for Creating World Models to Enable Adaptive AI" published in the proceedings of the 2026 Neuro Inspired Computational Elements (NICE) [1] conference. This is an archival version of the paper and is currently under review for an ACM journal publication

2602.20730 2026-04-29 cs.LG

Rethinking Efficiency in Neural Combinatorial Optimization: Batched Preference Optimization with Mamba

Zhenxing Xu, Zeyuan Ma, Weidong Bao, Yan Zheng, Ji Wang, Zhiguang Cao

2602.17697 2026-04-29 cs.LG cs.SE

Pimp My LLM: Leveraging Variability Modeling to Tune Inference Hyperparameters

Nada Zine, Clément Quinton, Romain Rouvoy

2602.17262 2026-04-29 cs.CL stat.ME

Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study

Kensuke Okada, Yui Furukawa, Kyosuke Bunji

2602.11075 2026-04-29 cs.RO

RISE: Self-Improving Robot Policy with Compositional World Model

Jiazhi Yang, Kunyang Lin, Jinwei Li, Wencong Zhang, Tianwei Lin, Longyan Wu, Zhizhong Su, Hao Zhao, Ya-Qin Zhang, Li Chen, Ping Luo, Xiangyu Yue, Hongyang Li

Comments RSS 2026. Project page: https://opendrivelab.com/RISE/

2602.10718 2026-04-29 cs.LG cs.CL

SnapMLA: Efficient Long-Context MLA Decoding via Hardware-Aware FP8 Quantized Pipelining

Yifan Zhang, Zunhai Su, Shuhao Hu, Rui Yang, Wei Wu, Yulei Qian, Yuchen Xie, Xunliang Cai

2602.05330 2026-04-29 cs.CV

MTPano: Multi-Task Panoramic Scene Understanding via Label-Free Integration of Dense Prediction Priors

Jingdong Zhang, Xiaohang Zhan, Lingzhi Zhang, Yizhou Wang, Zhengming Yu, Jionghao Wang, Wenping Wang, Xin Li

Abstract

Comprehensive panoramic scene understanding is critical for immersive applications, yet it remains challenging due to the scarcity of high-resolution, multi-task annotations. While perspective foundation models have achieved success through data scaling, directly adapting them to the panoramic domain often fails due to severe geometric distortions and coordinate system discrepancies. Furthermore, the underlying relations between diverse dense prediction tasks in spherical spaces are underexplored. To address these challenges, we propose MTPano, a robust multi-task panoramic foundation model established by a label-free training pipeline. First, to circumvent data scarcity, we leverage powerful perspective dense priors. We project panoramic images into perspective patches to generate accurate, domain-gap-free pseudo-labels using off-the-shelf foundation models, which are then re-projected to serve as patch-wise supervision. Second, to tackle the interference between task types, we categorize tasks into rotation-invariant (e.g., depth, segmentation) and rotation-variant (e.g., surface normals) groups. We introduce the Panoramic Dual BridgeNet, which disentangles these feature streams via geometry-aware modulation layers that inject absolute position and ray direction priors. To handle the distortion from equirectangular projections (ERP), we incorporate ERP token mixers followed by a dual-branch BridgeNet for interactions with gradient truncation, facilitating beneficial cross-task information sharing while blocking conflicting gradients from incompatible task attributes. Additionally, we introduce auxiliary tasks to fertilize the cross-task learning process. Extensive experiments demonstrate that MTPano achieves state-of-the-art performance on multiple benchmarks and delivers competitive results against task-specific panoramic specialist foundation models.

2602.01433 2026-04-29 cs.LG cs.AI stat.ML

DCD: Decomposition-based Causal Discovery from Autocorrelated and Non-Stationary Temporal Data

Muhammad Hasan Ferdous, Md Osman Gani

2601.22154 2026-04-29 cs.AI cs.CL

Exploring Reasoning Reward Model for Agents

Kaixuan Fan, Kaituo Feng, Manyuan Zhang, Tianshuo Peng, Zhixun Li, Yilei Jiang, Shuang Chen, Peng Pei, Xunliang Cai, Xiangyu Yue

Comments ACL 2026 Findings, Project page: https://github.com/kxfan2002/Reagent

2601.21225 2026-04-29 cs.CL cs.AI

MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation

Tianyi Xu, Kosei Uemura, Alfred Malengo Kondoro, Tadesse Destaw Belay, Catherine Nana Nyaah Essuman, Ifeoma Okoh, Ganiyat Afolabi, Ayodele Awokoya, David Ifeoluwa Adelani

2601.09093 2026-04-29 cs.LG

Hidden States as Early Signals: Step-level Trace Evaluation and Pruning for Efficient Test-Time Scaling

Zhixiang Liang, Beichen Huang, Zheng Wang, Minjia Zhang

2601.04682 2026-04-29 cs.CV

HATIR: Heat-Aware Diffusion for Turbulent Infrared Video Super-Resolution

Yang Zou, Xingyue Zhu, Kaiqi Han, Jun Ma, Xingyuan Li, Zhiying Jiang, Jinyuan Liu

DOI: 10.1609/aaai.v40i16.38421
Journal ref: Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026)

Abstract

Infrared video has been of great interest in visual tasks under challenging environments, but often suffers from severe atmospheric turbulence and compression degradation. Existing video super-resolution (VSR) methods either neglect the inherent modality gap between infrared and visible images or fail to restore turbulence-induced distortions. Directly cascading turbulence mitigation (TM) algorithms with VSR methods leads to error propagation and accumulation due to the decoupled modeling of degradation between turbulence and resolution. We introduce HATIR, a Heat-Aware Diffusion for Turbulent InfraRed Video Super-Resolution, which injects heat-aware deformation priors into the diffusion sampling path to jointly model the inverse process of turbulent degradation and structural detail loss. Specifically, HATIR constructs a Phasor-Guided Flow Estimator, rooted in the physical principle that thermally active regions exhibit consistent phasor responses over time, enabling reliable turbulence-aware flow to guide the reverse diffusion process. To ensure the fidelity of structural recovery under nonuniform distortions, a Turbulence-Aware Decoder is proposed to selectively suppress unstable temporal cues and enhance edge-aware feature aggregation via turbulence gating and structure-aware attention. We built FLIR-IVSR, the first dataset for turbulent infrared VSR, comprising paired LR-HR sequences from a FLIR T1050sc camera (1024 X 768) spanning 640 diverse scenes with varying camera and object motion conditions. This encourages future research in infrared VSR. Project page: https://github.com/JZ0606/HATIR

2601.03266 2026-04-29 cs.CL cs.AI

Benchmarking and Adapting On-Device LLMs for Clinical Decision Support

Alif Munim, Jun Ma, Omar Ibrahim, Alhusain Abdalla, Shuolin Yin, Leo Chen, Bo Wang

Abstract

Large language models (LLMs) have rapidly advanced in clinical decision-making, yet the deployment of proprietary systems is hindered by privacy concerns and reliance on cloud-based infrastructure. Open-source alternatives allow local inference but often have large model sizes that limit their use in resource-constrained clinical settings. Here, we benchmark on-device LLMs from the gpt-oss (20b, 120b), Qwen3.5 (9B, 27B, 35B), and Gemma 4 (31B) families across three representative clinical tasks: general disease diagnosis, specialty-specific (ophthalmology) diagnosis and management, and simulation of human expert grading and evaluation. We compare their performance with state-of-the-art proprietary models (GPT-5.1, GPT-5-mini, and Gemini 3.1 Pro) and a leading open-source model (DeepSeek-R1), and we further evaluate the adaptability of on-device systems by fine-tuning gpt-oss-20b and Qwen3.5-35B on general diagnostic data. Across tasks, on-device models achieve performance comparable to or exceeding DeepSeek-R1 and GPT-5-mini despite being substantially smaller. In addition, fine-tuning remarkably improves diagnostic accuracy, with the fine-tuned Qwen3.5-35B reaching 87.9% and approaching the proprietary GPT-5.1 (89.4%). Among base on-device models, Gemma 4 31B achieved the strongest general diagnostic accuracy at 86.5%, exceeding GPT-5-mini and approaching the fine-tuned Qwen3.5-35B. Error characterization revealed that 87.2% of diagnostic errors across all models were clinically plausible differentials rather than off-topic predictions, and upper-bound analysis showed up to 93.2% attainable accuracy through improved answer selection. These findings highlight the potential of on-device LLMs to deliver accurate, adaptable, and privacy-preserving clinical decision support, offering a practical pathway for broader integration of LLMs into routine clinical practice.

2601.02078 2026-04-29 cs.RO

Genie Sim 3.0: A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot

Chenghao Yin, Da Huang, Di Yang, Jichao Wang, Nanshu Zhao, Chen Xu, Wenjun Sun, Linjie Hou, Zhijun Li, Junhui Wu, Zhaobo Liu, Zhen Xiao, Sheng Zhang, Lei Bao, Rui Feng, Zhenquan Pang, Jiayu Li, Qian Wang, Maoqing Yao

2512.17492 2026-04-29 cs.CV

MMLANDMARKS: a Cross-View Instance-Level Benchmark for Geo-Spatial Understanding

Oskar Kristoffersen, Alba Reinders Sánchez, Morten Rieger Hannemose, Anders Bjorholm Dahl, Dim P. Papadopoulos

Comments Accepted at CVPR 2026

2512.17111 2026-04-29 cs.LG

Digitizing Nepal's Written Heritage: A Comprehensive HTR Pipeline for Old Nepali Manuscripts

Anjali Sarawgi, Esteban Garces Arias, Christof Zotter

Comments Accepted at ACL 2026 (Main Conference)

2512.12087 2026-04-29 cs.CL

BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding

Jiayi Yuan, Cameron Shinn, Kai Xu, Jingze Cui, George Klimiashvili, Guangxuan Xiao, Perkz Zheng, Bo Li, Yuxin Zhou, Zhouhai Ye, Weijie You, Tian Zheng, Dominic Brown, Pengbo Wang, Markus Hoehnerbach, Richard Cai, Julien Demouth, John D. Owens, Xia Hu, Song Han, Timmy Liu, Huizi Mao

2512.12072 2026-04-29 cs.CL cs.LG

VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs

Avinash Amballa, Yashas Malur Saidutta, Chi-Heng Lin, Vivek Kulkarni, Srinivas Chappidi

Comments Accepted to ACL 2026 Main

2512.07348 2026-04-29 cs.CV

MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition

Xinyu Wei, Kangrui Cen, Hongyang Wei, Zhen Guo, Kai Cui, Bairui Li, Zeqing Wang, Jinrui Zhang, Lei Zhang

Comments Project Page: https://MICo-150K.github.io/

2512.07269 2026-04-29 cs.CV cs.LG

A graph generation pipeline for critical infrastructures based on heuristics, images and depth data

Mike Diessner, Yannick E. Tarant

2512.05089 2026-04-29 cs.LG math.OC

The Blueprints of Intelligence: A Functional-Topological Foundation for Perception and Representation

Eduardo Di Santi

Comments 35 pages, 6 figures. This preprint develops a deterministic functional-topological framework showing that physical systems generate compact perceptual manifolds with finite radius. We provide theory, Monte-Carlo estimators, and validation across PM, battery, and ECG domains, unifying biological perception and self-supervised AI