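arXivDaily is a daily academic digest. It syncs the full arXiv feed and adds AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.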

2604.14437 2026-04-17 cs.SE cs.AI

LLMs taking shortcuts in test generation: A study with SAP HANA and LevelDB

Vekil Bekmyradov, Noah C. Pütz, Thomas Bartz-Beielstein

Abstract

Large Language Models (LLMs) have achieved impressive results on public benchmarks, often leading to claims of advanced reasoning and understanding. However, recent research in cognitive science reveals that these models sometimes rely on shallow heuristics and memorization, taking shortcuts rather than demonstrating genuine cognitive abilities. This paper investigates LLM behavior in automated test generation for software, contrasting performance on an open-source system (LevelDB) with SAP HANA, one of the most widely deployed commercial database systems worldwide, whose proprietary codebase is guaranteed to be absent from training data. We combine cognitive evaluation principles, drawing on Mitchell's mechanism-focused assessment methodology, with empirical software testing, employing mutation score and iterative compiler-feedback repair loops to assess both accuracy and underlying reasoning strategies. Results show that LLMs excel on familiar, open-source benchmarks but struggle with unseen, complex domains, often prioritizing compilability over semantic effectiveness. These findings provide independent software engineering evidence for the broader claim that current LLMs lack robust reasoning, and highlight the need for evaluation frameworks that penalize trivial shortcuts and reward true generalization.
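The abstract uses mutation score as its measure of test effectiveness. Below is a minimal sketch that assumes the common definition (killed mutants divided by non-equivalent mutants) rather than any paper-specific variant. `MutantResult` and `mutation_score` are hypothetical names for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class MutantResult:
    """Outcome of running a generated test suite against one mutant (hypothetical record)."""
    mutant_id: str
    killed: bool              # True if at least one test failed on this mutant
    equivalent: bool = False  # True if the mutant behaves identically to the original code


def mutation_score(results: list[MutantResult]) -> float:
    """Fraction of non-equivalent mutants killed by the test suite."""
    # Equivalent mutants cannot be killed by any test, so they are excluded.
    considered = [r for r in results if not r.equivalent]
    if not considered:
        raise ValueError("mutation score is undefined without non-equivalent mutants")
    killed = sum(1 for r in considered if r.killed)
    return killed / len(considered)


# Usage: two of the three non-equivalent mutants are killed; m4 is excluded as equivalent.
example = [
    MutantResult("m1", killed=True),
    MutantResult("m2", killed=False),
    MutantResult("m3", killed=True),
    MutantResult("m4", killed=False, equivalent=True),
]
score = mutation_score(example)
print(f"{score:.3f}")  # prints 0.667
```

A generated suite that compiles but asserts nothing would kill no mutants and score 0.0. This is why a mutation-based metric can separate compilable tests from semantically effective ones.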

2604.14398 2026-04-17 physics.flu-dyn cs.LG

Timescale Separation Enables Deep Reinforcement Learning Control of Rotating Detonation Engine Mode Transitions

Kristian Holme, Jean Rabault, Ricardo Vinuesa, Mikael Mortensen

2604.14386 2026-04-17 cs.GT cs.AI

Coalition Formation in LLM Agent Networks: Stability Analysis and Convergence Guarantees

Dongxin Guo, Jikun Wu, Siu-Ming Yiu

Comments 15 pages including supplementary material, 2 figures, 5 tables

2604.14370 2026-04-17 stat.ME cs.LG

Deployment of AI-Assisted Interventions: Capacity Constraints and Noisy Compliance

Carri W. Chan, Yi Han, Hannah Li, Benjamin L. Ranard

2604.14352 2026-04-17 stat.ME cs.LG stat.AP

PROXIMA: A Reliability Scoring Framework for Proxy Metrics in Online Controlled Experiments

Avinash Amudala

Comments 14 pages. Sole-author submission. Independent research. Companion code at https://github.com/Avinash-Amudala/PROXIMA. Zenodo archive: 10.5281/zenodo.15483241. Related US provisional patent application: 63/974,569 (filed Feb 3, 2026)

2604.14317 2026-04-17 cs.CR cs.AI

Challenges and Future Directions in Agentic Reverse Engineering Systems

Salem Radey, Jack West, Kassem Fawaz

Comments 7 pages, 1 figure, accepted at SAGAI 2026

2604.14305 2026-04-17 stat.ME cs.LG q-bio.GN stat.AP

Combining Bayesian and Frequentist Inference for Laboratory-Specific Performance Guarantees in Copy Number Variation Detection

Austin Talbot, Alex V. Kotlar, Yue Ke

2604.14263 2026-04-17 q-bio.TO cs.CV cs.LG

A deep learning framework for glomeruli segmentation with boundary attention

Behnaz Elhaminia, Catherine King, Jiaqi Lv, Lorraine Harper, Paul Moss, Owen Cain, Dimitrios Chanouzas, Shan E Ahmed Raza

2604.14259 2026-04-17 q-bio.TO cs.LG eess.IV

Continual Learning for fMRI-Based Brain Disorder Diagnosis via Functional Connectivity Matrices Generative Replay

Qianyu Chen, Shujian Yu

Comments Manuscript accepted by CVPR 2026, code is available from https://github.com/4me808/FORGE

2604.14256 2026-04-17 cs.IR cs.AI

Evaluation of Agents under Simulated AI Marketplace Dynamics

To Eun Kim, Alireza Salemi, Hamed Zamani, Fernando Diaz

Comments SIGIR 2026

2604.14241 2026-04-17 q-bio.BM cond-mat.stat-mech cs.LG q-bio.QM

Polyformer: a generative framework for thermodynamic modeling of polymeric molecules

Alessio Valentini, David Pekker, Chungwen Liang, Todd Martinez, Swagatam Mukhopadhyay

Comments 9+epsilon pages+references+appendix, 6 figures

2604.14233 2026-04-17 cs.CR cs.LG

Anomaly Detection in IEC-61850 GOOSE Networks: Evaluating Unsupervised and Temporal Learning for Real-Time Intrusion Detection

Joseph Moore

Comments 10 pages, 7 figures, 4 tables

2604.14229 2026-04-17 quant-ph cs.AI cs.LG eess.IV

Magnitude Is All You Need? Rethinking Phase in Quantum Encoding of Complex SAR Data

Sakthi Prabhu Gunasekar, Prasanna Kumar R

Comments 8 pages, 4 figures. Under review for IEEE QCE 2026

2604.14228 2026-04-17 cs.SE cs.AI cs.CL cs.LG

Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, Zhiqiang Shen

Comments Tech report. Code at: https://github.com/VILA-Lab/Dive-into-Claude-Code

2604.14227 2026-04-17 cs.IR cs.AI

FRESCO: Benchmarking and Optimizing Re-rankers for Evolving Semantic Conflict in Retrieval-Augmented Generation

Sohyun An, Hayeon Lee, Shuibenyang Yuan, Chun-cheng Jason Chen, Cho-Jui Hsieh, Vijai Mohan, Alexander Min

2604.14223 2026-04-17 cs.IR cs.AI

TRACE: A Conversational Framework for Sustainable Tourism Recommendation with Agentic Counterfactual Explanations

Ashmi Banerjee, Adithi Satish, Wolfgang Wörndl, Yashar Deldjoo

2604.14222 2026-04-17 cs.IR cs.AI

Adaptive Query Routing: A Tier-Based Framework for Hybrid Retrieval Across Financial, Legal, and Medical Documents

Afshan Hashmi

2604.14220 2026-04-17 cs.IR cs.AI

Knowledge Graph RAG: Agentic Crawling and Graph Construction in Enterprise Documents

Koushik Chakraborty, Koyel Guha

Comments 15 pages, 4 figures

2604.14216 2026-04-17 cs.MM cs.AI cs.CL cs.CV cs.GR cs.LG

Neuro-Oracle: A Trajectory-Aware Agentic RAG Framework for Interpretable Epilepsy Surgical Prognosis

Aizierjiang Aiersilan, Mohamad Koubeissi

2604.14211 2026-04-17 math.DG cs.AI cs.SI math.CO

Ollivier-Ricci Curvature of Riemannian Manifolds and Directed Graphs with Applications to Graph Neural Networks

Eleanor Wiesler

2604.14208 2026-04-17 physics.optics cs.LG math.OC physics.comp-ph

ML-based approach to classification and generation of structured light propagation in turbulent media

Aokun Wang, Anjali Nair, Zhongjian Wang, Guillaume Bal

2604.14202 2026-04-17 q-bio.NC cs.AI

Bridging scalp and intracranial EEG in BCI via pretrained neural representations and geometric constraint embedding

Yihang Dong, Changhong Jing, Shuqiang Wang

2604.14200 2026-04-17 q-bio.NC cs.AI

Retina gap junctions support the robust perception by warping neural representational geometries along the visual hierarchy

Yang Yue, Shenjian Zhang, Yonghong Tian, Kai Du, Tiejun Huang

Comments 32 pages, 6 figures

2604.14199 2026-04-17 q-fin.CP cs.AI cs.LG

PolyBench: Benchmarking LLM Forecasting and Trading Capabilities on Live Prediction Market Data

Pu Cheng, Juncheng Liu, Yunshen Long

Comments 16 pages, 4 figures, 6 tables

2604.14188 2026-04-17 physics.comp-ph cs.AI cs.CL hep-th

Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs

Xingyang Yu, Yinghuan Zhang, Yufei Zhang, Zijun Cui

Comments 9 pages + appendices, 2 figures, 9 tables

2604.14186 2026-04-17 eess.AS cs.AI cs.CL

HARNESS: Lightweight Distilled Arabic Speech Foundation Models

Vrunda N. Sukhadia, Shammur Absar Chowdhury

Comments 8 pages, 2 figures

2604.14184 2026-04-17 eess.SY cs.AI cs.SY

End-to-End Learning-based Operation of Integrated Energy Systems for Buildings and Data Centers

Zhenyu Pu, Yu Yang, Liang Yu, Xiaohong Guan

Comments 7 pages, 4 figures

2604.14154 2026-04-17 eess.SP cs.AI cs.CY

An Edge-Cloud Collaborative Architecture for Proactive Elderly Care: Real-Time Risk Assessment and Three-Level Emergency Response

Lijie Zhou, Luran Wang

DOI: 10.13140/RG.2.2.19954.77761

Abstract

The rapid aging of global populations has created an urgent need for intelligent healthcare monitoring systems to ensure the safety of elderly individuals living independently. Existing cloud-centric platforms face critical limitations, including high latency unsuitable for emergency response, privacy risks from continuous transmission of sensitive data, and limited, single-channel alert mechanisms lacking scalability and context awareness. This paper proposes an edge-cloud collaborative architecture that addresses these challenges through real-time multi-modal sensor fusion, a four-dimensional risk assessment model, and a three-level emergency response system. The framework adopts a five-layer design (device, edge, service, data, and application layers), enabling real-time risk evaluation with end-to-end alert latency under three seconds. At the edge, a weighted multi-modal fusion algorithm integrates data from five sensor types with confidence propagation. A unified risk score is generated by combining fall probability, physiological indicators, behavioral patterns, and sensor anomaly metrics. Based on dynamic thresholds, a three-tier notification system coordinates responses among family members, community doctors, and nearby volunteers. Experiments on CASAS, MIMIC-III, and SisFall datasets show that the approach achieves 91% activity recognition accuracy and an 84% anomaly detection F1-score, outperforming single-sensor methods. Deployment on Raspberry Pi 4 gateways demonstrates sub-100 ms inference latency while preserving privacy by keeping raw data local. This architecture advances practical, privacy-preserving, and responsive elderly care systems.
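The abstract combines four risk dimensions into a single score and maps that score onto three notification tiers using thresholds. Below is a minimal sketch of that pattern. It assumes a normalized weighted average and fixed cut-offs. The weights, the threshold values, and the names `risk_score` and `alert_tier` are all hypothetical, because the abstract reports none of them. The sketch does not model the paper's dynamic-threshold rule, its confidence propagation, or its mapping of tiers to responders.

```python
# Hypothetical weights: the abstract does not report the paper's actual values.
DEFAULT_WEIGHTS = {
    "fall_probability": 0.4,
    "physiological": 0.3,
    "behavioral": 0.2,
    "sensor_anomaly": 0.1,
}


def risk_score(components: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Normalized weighted average of per-dimension scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    score = 0.0
    for name, weight in weights.items():
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        score += weight * value
    return score / total_weight


def alert_tier(score: float, thresholds: tuple[float, ...] = (0.3, 0.6, 0.8)) -> int:
    """Return 0 for no alert, otherwise the highest tier (1-3) whose threshold is met."""
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    tier = 0
    for level, threshold in enumerate(thresholds, start=1):
        if score >= threshold:
            tier = level
    return tier


# Usage with made-up readings for a single time window.
components = {
    "fall_probability": 0.9,
    "physiological": 0.5,
    "behavioral": 0.2,
    "sensor_anomaly": 0.0,
}
score = risk_score(components)  # approximately 0.55
tier = alert_tier(score)        # 1, because 0.3 <= score < 0.6
```

Because `thresholds` is a parameter, a caller could adjust the cut-offs over time. That is one possible reading of "dynamic thresholds", not the paper's stated mechanism.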

2604.09982 2026-04-17 cs.IR cs.CL cs.LG

Reproduction Beyond Benchmarks: ConstBERT and ColBERT-v2 Across Backends and Query Distributions

Utshab Kumar Ghosh, Ashish David, Shubham Chatterjee

Comments 10 pages, 9 tables. Accepted to the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2026)

2604.05478 2026-04-17 q-bio.GN cs.LG

Transcriptomic Models for Immunotherapy Response Prediction Show Limited Cross-cohort Generalisability

Yuheng Liang, Lucy Chhuo, Ahmadreza Argha, Nona Farbehi, Lu Chen, Roohallah Alizadehsani, Mehdi Hosseinzadeh, Amin Beheshti, Thantrira Porntaveetusm, Youqiong Ye, Hamid Alinejad-Rokny