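arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.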

2603.15921 2026-03-18 cs.SE cs.AI

VIBEPASS: Can Vibe Coders Really Pass the Vibe Check?

Srijan Bansal, Jiao Fangkai, Yilun Zhou, Austin Xu, Shafiq Joty, Semih Yavuz

Abstract

As Large Language Models shift programming toward human-guided "vibe coding", agentic coding tools increasingly rely on models to self-diagnose and repair their own subtle faults, a capability central to autonomous software engineering yet never systematically evaluated. We present VIBEPASS, the first empirical decomposition that jointly evaluates two coupled tasks: Fault-Triggering Test Generation (FT-Test), constructing a discriminative witness that exposes a latent bug, and Fault-targeted Program Repair (FPR), repairing it under varying diagnostic conditions. VIBEPASS pairs competitive programming problems with LLM-generated solutions that pass partial test suites but fail on semantic edge cases, enabling controlled identification of where the diagnostic chain breaks down. Evaluating 12 frontier LLMs, we find that fault-targeted reasoning does not scale with general coding ability. Models produce syntactically valid test inputs at near-ceiling rates yet collapse on discriminative generation, with fault hypothesis generation, not output validation, as the dominant bottleneck. Test-guided repair reveals a complementary insight: when self-generated tests successfully witness a fault, the resulting repair matches or outperforms repair guided by externally provided tests, but tests that fail to witness the fault actively degrade repair below unguided baselines. Together, these results reframe the challenge of autonomous debugging: the binding bottleneck is not code synthesis or test validity but fault-targeted reasoning, a capability that remains deficient across all frontier models.
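
To make the FT-Test notion of a discriminative witness concrete, here is a minimal Python sketch. It is not the paper's benchmark harness. It assumes a trusted reference solution is available for comparison, and `buggy_max`, `reference_max`, and `is_discriminative` are hypothetical names chosen for illustration. An input counts as a witness only when the faulty program's output differs from the reference output.

```python
from typing import Callable, Sequence


def buggy_max(xs: Sequence[int]) -> int:
    """Hypothetical faulty solution: starts from 0, so it is wrong
    whenever every element of xs is negative."""
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best


def reference_max(xs: Sequence[int]) -> int:
    """Hypothetical trusted reference solution."""
    return max(xs)


def is_discriminative(
    test_input: Sequence[int],
    candidate: Callable[[Sequence[int]], int],
    reference: Callable[[Sequence[int]], int],
) -> bool:
    """True if the input exposes a behavioural difference between
    the candidate and the reference, i.e. it witnesses the fault."""
    return candidate(test_input) != reference(test_input)


# A valid input that does not trigger the fault.
valid_but_not_witness = is_discriminative([1, 2, 3], buggy_max, reference_max)
# A valid input that does trigger the fault (all elements negative).
witness = is_discriminative([-3, -1], buggy_max, reference_max)
```

In this toy case `[1, 2, 3]` is a perfectly valid input yet exposes nothing, which mirrors the gap the abstract describes between syntactically valid and discriminative test inputs.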

2603.15900 2026-03-18 cs.NI cs.AI

The Internet of Physical AI Agents: Interoperability, Longevity, and the Cost of Getting It Wrong

Roberto Morabito, Mallik Tatipamula

Comments: A related version of this work is currently under review for publication in an IEEE magazine

2603.15892 2026-03-18 cs.IR cs.CL

Temporal Fact Conflicts in LLMs: Reproducibility Insights from Unifying DYNAMICQA and MULAN

Ritajit Dey, Iadh Ounis, Graham McDonald, Yashar Moshfeghi

2603.15863 2026-03-18 cs.HC cs.AI

Interpretative Interfaces: Designing for AI-Mediated Reading Practices and the Knowledge Commons

Gabrielle Benabdallah

Comments: Accepted at the Proceedings of the CHI 2026 Workshop: Ethics at the Front-End

Abstract

Explainable AI (XAI) interfaces seek to make large language models more transparent, yet explanation alone does not produce understanding. Explaining a system's behavior is not the same as being able to engage with it, to probe and interpret its operations through direct manipulation. This distinction matters for scientific disciplines in particular: scientists who increasingly rely on LLMs for reading, citing, and producing literature reviews have little means of directly engaging with how these models process and transform the texts they generate. In this ongoing design research project, I argue for a shift from explainability to interpretative engagement. This shift moves away from accounts of system behavior to instead enable users to manipulate a model's intermediate representations. Drawing on textual scholarship, computational poetics, and the history of reading and writing technologies, including practices such as marginalia, glosses, indices, and annotation systems, I propose interpretative interfaces as interactive environments in which non-expert users can intervene in the representational space of a language model. More specifically, such interfaces will allow users to select a token and follow its trajectory through the model's intermediate layers. This way, they can observe how its semantic position shifts as context is processed, and possibly annotate the transformations they find useful or meaningful. The same way readers can create their own maps within a book through annotations and bookmarks, interpretative interfaces will allow users to inscribe their reading of a model's internal representations. The goal of this project is to reframe AI interpretability as an interaction design project rather than a purely technical one, and to open a path toward AI-mediated reading that supports interpretative engagement and critical stewardship of scientific knowledge.

2603.15143 2026-03-18 eess.IV cs.CV

Clinical Priors Guided Lung Disease Detection in 3D CT Scans

Kejin Lu, Jianfa Bai, Qingqiu Li, Runtian Yuan, Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng

2603.15057 2026-03-18 stat.ML cs.AI cs.LG

Analyzing Error Sources in Global Feature Effect Estimation

Timo Heiß, Coco Bögel, Bernd Bischl, Giuseppe Casalicchio

Comments: Accepted to The 4th World Conference on eXplainable Artificial Intelligence (XAI 2026)

2603.14927 2026-03-18 cs.GR cs.LG

Masked BRep Autoencoder via Hierarchical Graph Transformer

Yifei Li, Kang Wu, Wenming Wu, Xiao-Ming Fu

Comments: 27 pages, 11 figures. Under review

2603.11408 2026-03-18 q-fin.ST cs.CL

Beyond Polarity: Multi-Dimensional LLM Sentiment Signals for WTI Crude Oil Futures Return Prediction

Dehao Dai, Ding Ma, Dou Liu, Kerui Geng, Yiqing Wang

Comments: 28 pages, 4 figures, 4 tables

2603.10249 2026-03-18 cs.SE cs.AI cs.HC

DUCTILE: Agentic LLM Orchestration of Engineering Analysis in Product Development Practice

Alejandro Pradas-Gomez, Arindam Brahma, Ola Isaksson

Comments: 22 pages, including supplemental material. 9 figures

2603.10080 2026-03-18 cs.CR cs.AI cs.LG

Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models

Ali Raza, Gurang Gupta, Nikolay Matyunin, Jibesh Patra

2603.09531 2026-03-18 q-bio.QM cs.CV eess.IV stat.AP

Association of Progressive PPFE and Mortality in Lung Cancer Screening Cohorts

Shahab Aslani, Mehran Azimbagirad, Daryl Cheng, Daisuke Yamada, Ryoko Egashira, Adam Szmul, Justine Chan-Fook, Robert Chapman, Alfred Chung Pui So, Shanshan Wang, John McCabe, Tianqi Yang, Jose M Brenes, Eyjolfur Gudmundsson, The SUMMIT Consortium, Susan M. Astley, Daniel C. Alexander, Sam M. Janes, Joseph Jacob

2603.08108 2026-03-18 cs.CE cs.LG

Tau-BNO: Brain Neural Operator for Tau Transport Model

Nuutti Barron, Heng Rao, Urmi Saha, Yu Gu, Zhenghao Liu, Ge Yu, Defu Yang, Ashish Raj, Minghan Chen

Abstract

Mechanistic modeling provides a biophysically grounded framework for studying the spread of pathological tau protein in tauopathies like Alzheimer's disease. Existing approaches typically model tau propagation as a diffusive process on the brain's structural connectome, reproducing macroscopic patterns but neglecting microscale cellular transport and reaction mechanisms. The Network Transport Model (NTM) was introduced to fill this gap, explaining how region-level progression of tau emerges from microscale biophysical processes. However, the NTM faces a common challenge for complex models defined by large systems of partial differential equations: the inability to perform parameter inference and mechanistic discovery due to high computational burden and slow model simulations. To overcome this barrier, we propose Tau-BNO, a Brain Neural Operator surrogate framework for rapidly approximating NTM dynamics that captures both intra-regional reaction kinetics and inter-regional network transport. Tau-BNO combines a function operator that encodes kinetic parameters with a query operator that preserves initial state information, while approximating anisotropic transport through a spectral kernel that retains directionality. Empirical evaluations demonstrate high predictive accuracy ($R^2 \approx 0.98$) across diverse biophysical regimes and an 89% performance improvement over state-of-the-art sequence models like Transformers and Mamba, which lack inherent structural priors. By reducing simulation time from hours to seconds, we show that the surrogate model is capable of producing new insights and generating new hypotheses. This framework is readily extensible to a broader class of connectome-based biophysical models, showcasing the transformative value of deep learning surrogates to accelerate analysis of large-scale, computationally intensive dynamical systems.

2603.05551 2026-03-18 cs.IR cs.CV

AutothinkRAG: Complexity-Aware Control of Retrieval-Augmented Reasoning for Image-Text Interaction

Jiashu Yang, Chi Zhang, Abudukelimu Wuerkaixi, Xuxin Cheng, Cao Liu, Ke Zeng, Xu Jia, Xunliang Cai

2602.18466 2026-03-18 cs.CY cs.AI cs.CV

Can Multimodal LLMs See Science Instruction? Benchmarking Pedagogical Reasoning in K-12 Classroom Videos

Yixuan Shen, Peng He, Honglu Liu, Jinxuan Fan, Yuyang Ji, Tingting Li, Tianlong Chen, Kaidi Xu, Feng Liu

Comments: 17 pages, 3 figures

2602.03999 2026-03-18 math.PR cs.DS cs.LG math.ST stat.ML stat.TH

Functional Stochastic Localization

Anming Gu, Bobby Shi, Kevin Tian

Comments: Comments welcome! v2 adds citations and fixes typos

2602.02335 2026-03-18 cs.DC cs.AI cs.DB

Building a Correct-by-Design Lakehouse. Data Contracts, Versioning, and Transactional Pipelines for Humans and Agents

Weiming Sheng, Jinlang Wang, Manuel Barros, Aldrin Montana, Jacopo Tagliabue, Luca Bigon

Comments: Submission pre-print, data conference

2601.16926 2026-03-18 cs.CY cs.AI cs.HC

Nishpaksh: TEC Standard-Compliant Framework for Fairness Auditing and Certification of AI Models

Shashank Prakash, Ranjitha Prasad, Avinash Agarwal

Comments: Accepted and presented at 2026 18th International Conference on COMmunication Systems and NETworks (COMSNETS)

DOI: 10.1109/COMSNETS67989.2026.11418125
Journal ref: 2026 18th International Conference on COMmunication Systems and NETworks (COMSNETS), Bengaluru, India, 2026, pp. 877-882

Abstract

The growing reliance on Artificial Intelligence (AI) models in high-stakes decision-making systems, particularly within emerging telecom and 6G applications, underscores the urgent need for transparent and standardized fairness assessment frameworks. While global toolkits such as IBM AI Fairness 360 and Microsoft Fairlearn have advanced bias detection, they often lack alignment with region-specific regulatory requirements and national priorities. To address this gap, we propose Nishpaksh, an indigenous fairness evaluation tool that operationalizes the Telecommunication Engineering Centre (TEC) Standard for the Evaluation and Rating of Artificial Intelligence Systems. Nishpaksh integrates survey-based risk quantification, contextual threshold determination, and quantitative fairness evaluation into a unified, web-based dashboard. The tool employs vectorized computation, reactive state management, and certification-ready reporting to enable reproducible, audit-grade assessments, thereby addressing a critical post-standardization implementation need. Experimental validation on the COMPAS dataset demonstrates Nishpaksh's effectiveness in identifying attribute-specific bias and generating standardized fairness scores compliant with the TEC framework. The system bridges the gap between research-oriented fairness methodologies and regulatory AI governance in India, marking a significant step toward responsible and auditable AI deployment within critical infrastructure like telecommunications.

2601.06194 2026-03-18 cs.CY cs.AI cs.CL

Political Alignment in Large Language Models: A Multidimensional Audit of Psychometric Identity and Behavioral Bias

Adib Sakhawat, Tahsin Islam, Takia Farhin, Syed Rifat Raiyan, Hasan Mahmud, Md Kamrul Hasan

Comments: Under review, 25 pages, 6 figures, 23 tables

2512.16455 2026-03-18 cs.DC cs.AI

AI4EOSC: a Federated Cloud Platform for Artificial Intelligence in Scientific Research

Ignacio Heredia, Álvaro López García, Fernando Aguilar Gómez, Diego Aguirre, Caterina Alarcón Marín, Khadijeh Alibabaei, Lisana Berberi, Miguel Caballer, Amanda Calatrava, Pedro Castro, Alessandro Costantini, Mario David, Jaime Díez, Stefan Dlugolinsky, Borja Esteban Sanchis, Giacinto Donvito, Leonhard Duda, Saúl Fernandez, Andrés Heredia Canales, Valentin Kozlov, Sergio Langarita, João Machado, Germán Moltó, Daniel San Martín, Martin Šeleng, Giang Nguyen, Marcin Płóciennik, Marta Obregón Ruiz, Susana Rebolledo Ruiz, Vicente Rodriguez, Judith Sáinz-Pardo Díaz, Viet Tran

2512.07209 2026-03-18 cs.MM cs.LG cs.SD

Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits

Masato Ishii, Akio Hayakawa, Takashi Shibuya, Yuki Mitsufuji

Comments: Source code: https://github.com/SonyResearch/CoherentAVEdit

2511.08852 2026-03-18 eess.SP cs.LG cs.NI

DRL-Based Beam Positioning for LEO Satellite Constellations with Weighted Least Squares

Po-Heng Chou, Chiapin Wang, Kuan-Hao Chen, Wei-Chen Hsiao

Comments: 6 pages, 3 figures, 1 table, accepted by 2026 IEEE ICC Workshops

2510.20606 2026-03-18 cs.GT cs.CY cs.LG econ.TH

Strategic Costs of Perceived Bias in Fair Selection

L. Elisa Celis, Lingxiao Huang, Milind Sohoni, Nisheeth K. Vishnoi

Comments: The paper has been accepted by NeurIPS 2025

2510.02869 2026-03-18 cs.CY cs.AI cs.CV

Representing Beauty: Towards a Participatory but Objective Latent Aesthetics

Alexander Michael Rusnak

2508.03715 2026-03-18 eess.SP cs.AI cs.HC cs.LG

Detection of Autonomic Dysreflexia in Individuals With Spinal Cord Injury Using Multimodal Wearable Sensors

Bertram Fuchs, Mehdi Ejtehadi, Ana Cisnal, Jürgen Pannek, Anke Scheel-Sailer, Robert Riener, Inge Eriks-Hoogland, Diego Paez-Granados

DOI: 10.1038/s41598-025-33797-8

Abstract

Autonomic Dysreflexia (AD) is a potentially life-threatening condition characterized by sudden, severe blood pressure (BP) spikes in individuals with spinal cord injury (SCI). Early, accurate detection is essential to prevent cardiovascular complications, yet current monitoring methods are either invasive or rely on subjective symptom reporting, limiting applicability in daily life. This study presents a non-invasive, explainable machine learning framework for detecting AD using multimodal wearable sensors. Data were collected from 27 individuals with chronic SCI during urodynamic studies, including electrocardiography (ECG), photoplethysmography (PPG), bioimpedance (BioZ), temperature, respiratory rate (RR), and heart rate (HR), across three commercial devices. Objective AD labels were derived from synchronized cuff-based BP measurements. Following signal preprocessing and feature extraction, BorutaSHAP was used for robust feature selection, and SHAP values for explainability. We trained modality- and device-specific weak learners and aggregated them using a stacked ensemble meta-model. Cross-validation was stratified by participants to ensure generalizability. HR- and ECG-derived features were identified as the most informative, particularly those capturing rhythm morphology and variability. The Nearest Centroid ensemble yielded the highest performance (Macro F1 = 0.77 ± 0.03), significantly outperforming baseline models. Among modalities, HR achieved the highest area under the curve (AUC = 0.93), followed by ECG (0.88) and PPG (0.86). RR and temperature features contributed less to overall accuracy, consistent with missing data and low specificity. The model proved robust to sensor dropout and aligned well with clinical AD events. These results represent an important step toward personalized, real-time monitoring for individuals with SCI.
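
For readers unfamiliar with the headline metric, macro F1 is the unweighted mean of the per-class F1 scores, so a rare class (such as AD episodes) counts as much as a common one. The sketch below is a plain-Python illustration of that definition only. The function name `macro_f1` and the toy labels are assumptions made for this example and are not data or code from the study.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores.

    Per-class F1 is 2*TP / (2*TP + FP + FN); a class with no true or
    predicted instances contributes 0 by convention in this sketch.
    """
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (0 = no AD, 1 = AD), invented for illustration only.
toy_true = [0, 0, 1, 1]
toy_pred = [0, 1, 1, 1]
toy_score = macro_f1(toy_true, toy_pred)  # mean of 2/3 (class 0) and 4/5 (class 1)
```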

2507.21903 2026-03-18 cs.SI cs.CL cs.IR

Who's important? -- SUnSET: Synergistic Understanding of Stakeholder, Events and Time for Timeline Generation

Tiviatis Sim, Kaiwen Yang, Shen Xin, Kenji Kawaguchi

2507.21790 2026-03-18 econ.EM cs.AI

Can large language models assist choice modelling? Insights into prompting strategies and current models capabilities

Georges Sfeir, Gabriel Nova, Stephane Hess, Sander van Cranenburgh

Comments: 35 pages, 8 figures, 14 tables

Abstract

Large Language Models (LLMs) are becoming widely used to support various workflows across different disciplines, yet their potential in discrete choice modelling remains relatively unexplored. This work examines the potential of LLMs as assistive agents in the specification and, where technically feasible, estimation of Multinomial Logit models. We implement a systematic experimental framework involving twelve versions of seven leading LLMs (ChatGPT, Claude, DeepSeek, Gemini, Gemma, Llama, and Mistral) evaluated under five experimental configurations. These configurations vary along three dimensions: (i) modelling goal (suggesting vs. suggesting and estimating MNL models); (ii) prompting strategy (Zero-Shot vs. Chain-of-Thoughts (CoT)); and (iii) information availability (full dataset vs. data dictionary summarising variable names and types). Each specification suggested by the LLMs is implemented, estimated, and evaluated based on goodness-of-fit metrics, behavioural plausibility, and model complexity. Our findings reveal that proprietary LLMs can generate valid and behaviourally sound utility specifications, particularly when guided by structured prompts (CoT). Open-weight models such as Llama and Gemma struggled to produce meaningful specifications. Notably, some LLMs performed better when provided with just the data dictionary, suggesting that limiting raw data access may enhance internal reasoning capabilities. Among all LLMs, GPT o3, operating in an agentic setting, was uniquely capable of correctly estimating its own specifications by executing self-generated code. Overall, the results demonstrate both the promise and current limitations of LLMs as assistive agents in discrete choice modelling, not only for model specification but also for supporting modelling decisions and estimation, and provide practical guidance for integrating these tools into choice modellers' workflows.
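
As background for the "utility specifications" discussed above: in a Multinomial Logit model the probability of choosing alternative $i$ is $\exp(V_i) / \sum_j \exp(V_j)$, where $V_i$ is the systematic utility of alternative $i$. The sketch below illustrates only that standard formula. The linear-in-parameters utility, the coefficient values, and the three travel alternatives are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def mnl_probabilities(utilities: Sequence[float]) -> List[float]:
    """Return MNL choice probabilities P_i = exp(V_i) / sum_j exp(V_j)."""
    shift = max(utilities)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical specification: V = BETA_COST * cost + BETA_TIME * time.
BETA_COST = -0.1   # assumed coefficient, for illustration only
BETA_TIME = -0.05  # assumed coefficient, for illustration only

# (cost, travel time) per alternative; invented values.
alternatives = {"car": (10.0, 30.0), "bus": (4.0, 50.0), "train": (6.0, 40.0)}

utilities = [BETA_COST * cost + BETA_TIME * time
             for cost, time in alternatives.values()]
probabilities = dict(zip(alternatives, mnl_probabilities(utilities)))
```

Estimating such a model means choosing the coefficients that maximise the likelihood of the observed choices. Proposing which attributes enter each $V_i$ is the specification task the abstract asks the LLMs to perform.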

2507.19490 2026-03-18 cs.HC cs.CV

RISEE: A Highly Interactive Naturalistic Driving Trajectories Dataset with Human Subjective Risk Perception and Eye-tracking Information

Xinzheng Wu, Junyi Chen, Peiyi Wang, Shunxiang Chen, Haolan Meng, Yong Shen

Comments: Preprint accepted by ITSC 2025

DOI: 10.1109/ITSC60802.2025.11423845
Journal ref: 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC)

Abstract

In the research and development (R&D) and verification and validation (V&V) phases of autonomous driving decision-making and planning systems, it is necessary to integrate human factors to achieve decision-making and evaluation that align with human cognition. However, most existing datasets primarily focus on vehicle motion states and trajectories, neglecting human-related information. In addition, current naturalistic driving datasets lack sufficient safety-critical scenarios, while simulated datasets suffer from low authenticity. To address these issues, this paper constructs the Risk-Informed Subjective Evaluation and Eye-tracking (RISEE) dataset, which contains human subjective evaluations and eye-tracking data in addition to regular naturalistic driving trajectories. By leveraging the complementary advantages of drone-based (high realism and extensive scenario coverage) and simulation-based (high safety and reproducibility) data collection methods, we first conduct drone-based traffic video recording at a highway ramp merging area. After that, the manually selected highly interactive scenarios are reconstructed in simulation software, and drivers' first-person view (FPV) videos are generated, which are then viewed and evaluated by recruited participants. During the video viewing process, participants' eye-tracking data is collected. After data processing and filtering, 3567 valid subjective risk ratings from 101 participants across 179 scenarios are retained, along with 2045 qualified eye-tracking data segments. The collected data and examples of the generated FPV videos are available on our website.

2506.24034 2026-03-18 physics.med-ph cs.CV

Supervised Diffusion-Model-Based PET Image Reconstruction

George Webber, Alexander Hammers, Andrew P King, Andrew J Reader

Comments: 12 pages, 6 figures. Submitted to MICCAI 2025, not peer-reviewed

2506.01324 2026-03-18 stat.ML cs.IT cs.LG math.IT math.PR

Near-Optimal Clustering in Mixture of Markov Chains

Junghyun Lee, Yassir Jedra, Alexandre Proutière, Se-Young Yun

Comments: AISTATS 2026 (50 pages, 6 figures) (ver3: camera-ready version, major revisions)

2505.20085 2026-03-18 cs.HC cs.AI

Explanation User Interfaces: A Systematic Literature Review

Eleonora Cappuccio, Andrea Esposito, Francesco Greco, Giuseppe Desolda, Rosa Lanzilotti, Salvatore Rinzivillo

Comments: Second version