arXivDaily: arXiv daily academic digest, updated Monday through Friday
2604.22753 2026-04-27 cs.LG

Spend Less, Fit Better: Budget-Efficient Scaling Law Fitting via Active Experiment Selection

Sijie Li, Shanda Li, Haowei Lin, Weiwei Sun, Ameet Talwalkar, Yiming Yang

Scaling laws are used to plan multi-million-dollar training runs, but fitting those laws can itself cost millions. In modern large-scale workflows, assembling a sufficiently informative set of pilot experiments is already a major budget-allocation problem rather than a routine preprocessing step. We formulate scaling-law fitting as budget-aware sequential experimental design: given a finite pool of runnable experiments with heterogeneous costs, choose which runs to execute so as to maximize extrapolation accuracy in a high-cost target region. We then propose an uncertainty-aware method for sequentially allocating experimental budget toward the runs most useful for target-region extrapolation. Across a diverse benchmark of scaling-law tasks, our method consistently outperforms classical design-based baselines, and often approaches the performance of fitting on the full experimental set while using only about 10% of the total training budget. Our code is available at https://github.com/PlanarG/active-sl.
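
Illustrative sketch (not the authors' method): the setup in this abstract can be made concrete with a toy pool of runs, a budget, and a power-law fit used for extrapolation. The selection rule below is a naive cheapest-first baseline chosen only for illustration; the function names, the pool, and the noise-free loss curve are all assumptions, and the paper's uncertainty-aware selection is not reproduced here.

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-b) by least squares in log-log space; return (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope

def cheapest_first(pool, budget):
    """Naive baseline (assumption): take runs in order of cost while they fit the budget."""
    chosen, spent = [], 0.0
    for run in sorted(pool, key=lambda r: r["cost"]):
        if spent + run["cost"] <= budget:
            chosen.append(run)
            spent += run["cost"]
    return chosen

# Synthetic, noise-free pool with hypothetical numbers: cost grows with model size.
pool = [{"size": n, "cost": float(n), "loss": 5.0 * n ** -0.3}
        for n in (10, 20, 40, 80, 160, 320)]
budget = 0.10 * sum(r["cost"] for r in pool)  # about 10% of the full-pool cost
chosen = cheapest_first(pool, budget)
a, b = fit_power_law([r["size"] for r in chosen], [r["loss"] for r in chosen])
target_prediction = a * 1000 ** (-b)  # extrapolate to an expensive target size
```

With noisy losses, which runs are selected would matter far more than in this noise-free toy, which is the regime the abstract targets.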

2604.22749 2026-04-27 cs.CL

Representational Harms in LLM-Generated Narratives Against Global Majority Nationalities

Ilana Nguyen, Harini Suresh, Thema Monroe-White, Evan Shieh

Comments FAccT '26, June 25-28, 2026, Montreal, QC, Canada

Large language models (LLMs) are increasingly used for text generation tasks from everyday use to high-stakes enterprise and government applications, including simulated interviews with asylum seekers. While many works highlight the new potential applications of LLMs, there are risks of LLMs encoding and perpetuating harmful biases about non-dominant communities across the globe. To better evaluate and mitigate such harms, more research examining how LLMs portray diverse individuals is needed. In this work, we study how national origin identities are portrayed by widely adopted LLMs in response to open-ended narrative generation prompts. Our findings demonstrate the presence of persistent representational harms by national origin, including harmful stereotypes, erasure, and one-dimensional portrayals of Global Majority identities. Minoritized national identities are simultaneously underrepresented in power-neutral stories and overrepresented in subordinated character portrayals, which are over fifty times more likely to appear than dominant portrayals. The degree of harm is amplified when US nationality cues (e.g., "American") are present in input prompts. Notably, we find that the harms we identify cannot be explained away via sycophancy, as US-centric biases persist even when replacing US nationality cues with non-US national identities in the prompts. Based on our findings, we call for further exploration of cultural harms in LLMs through methodologies that center Global Majority perspectives and challenge the uncritical adoption of US-based LLMs for the classification, surveillance, and misrepresentation of the majority of our planet.

2604.22747 2026-04-27 cs.SE

Code for All: Educational Applications of the "Vibe Coding" Hackathon in Programming Education across All Skill Levels

Ashley J. Chen, Yijia Cao, Minghao Shao, Ramesh Karri, Muhammad Shafique

Comments 15 pages, 14 figures

The emergence of large language models has enabled vibe coding, a natural language approach to programming in which users describe intent and AI generates or revises code, potentially broadening access to programming while preserving meaningful learning outcomes. We investigate its educational value through a month-long online hackathon that welcomed participants from multiple countries, ranging from complete beginners to experienced developers. The hackathon offered three tracks with increasing technical demands. Spark emphasized basic frontend functionality and dynamic features such as buttons, forms, and API calls. Build required backend or database integration. Launch targeted production-ready web applications, including deployment. Participants were required to develop projects using only LLM-generated code without manual edits and submitted complete chat histories, source code, demo videos, and functionality reports. We assessed educational effectiveness with a mixed-methods design that combined standardized project evaluations across functionality, user interface and user experience design, impact, prompt quality, and code readability, along with post-hackathon surveys of perceived learning outcomes and thematic analysis of open-ended feedback. Our findings describe how participants with different backgrounds engage with vibe coding as task complexity increases, how the no-manual-editing constraint shapes prompting and debugging practices, and what these patterns imply for integrating AI-assisted development into programming education and competitive learning environments.

2604.22746 2026-04-27 math.OC cs.LG

Relaxation-Informed Training of Neural Network Surrogate Models

Calvin Tsay

Comments 35 pages, 5 figures

ReLU neural networks trained as surrogate models can be embedded exactly in mixed-integer linear programs (MILPs), enabling global optimization over the learned function. The tractability of the resulting MILP depends on structural properties of the network, i.e., the number of binary variables in associated formulations and the tightness of the continuous LP relaxation. These properties are determined during training, yet standard training objectives (prediction loss with classical weight regularization) offer no mechanism to directly control them. This work studies training regularizers that directly target downstream MILP tractability. Specifically, we propose simple bound-based regularizers that penalize the big-M constants of MILP formulations and/or the number of unstable neurons. Moreover, we introduce an LP relaxation gap regularizer that explicitly penalizes the per-sample gap of the continuous relaxation at training points. We derive its associated gradient and provide an implementation from LP dual variables without custom automatic differentiation tools. We show that combining the above regularizers can approximate the full total derivative of the LP gap with respect to the network parameters, capturing both direct and indirect sensitivities. Experiments on non-convex benchmark functions and a two-stage stochastic programming problem with quantile neural network surrogates demonstrate that the proposed regularizers can reduce MILP solve times by up to four orders of magnitude relative to an unregularized baseline, while maintaining competitive surrogate model accuracy.
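
Illustrative sketch (assumptions throughout, not the paper's implementation): the quantities the bound-based regularizers target can be computed with plain interval arithmetic. Given elementwise input bounds, each pre-activation gets a lower and upper bound; a neuron is "unstable" when its interval straddles zero, and in a standard big-M encoding of a ReLU those bounds serve as the big-M constants. The penalty form `unstable_big_m_penalty` and all names and numbers are hypothetical.

```python
def preactivation_bounds(weights, biases, lower, upper):
    """Interval bounds on z = W x + b for lower <= x <= upper (elementwise)."""
    bounds = []
    for row, b in zip(weights, biases):
        # A positive weight attains its minimum at the lower input bound, a negative one at the upper.
        lo = b + sum(w * (l if w >= 0 else u) for w, l, u in zip(row, lower, upper))
        hi = b + sum(w * (u if w >= 0 else l) for w, l, u in zip(row, lower, upper))
        bounds.append((lo, hi))
    return bounds

def count_unstable(bounds):
    """Number of ReLUs whose sign is not fixed by the bounds (each needs a binary variable)."""
    return sum(1 for lo, hi in bounds if lo < 0.0 < hi)

def unstable_big_m_penalty(bounds):
    """Assumed penalty form: sum of (hi - lo) over unstable neurons only."""
    return sum(hi - lo for lo, hi in bounds if lo < 0.0 < hi)

# Hypothetical single layer with three neurons and inputs in [-1, 1]^2.
W = [[1.0, -1.0], [2.0, 1.0], [-1.0, -1.0]]
b = [0.0, 4.0, -3.0]
bounds = preactivation_bounds(W, b, [-1.0, -1.0], [1.0, 1.0])
```

Here only the first neuron is unstable; the second is always active and the third always inactive, so neither would need a binary variable. In training, a differentiable version of these bounds would be needed, which this evaluation-only sketch does not provide.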

2604.22744 2026-04-27 cs.SI cs.IT math.IT q-bio.QM

Multiplex Hypergraph Modeling of Higher Order Structures in Psychometric Networks

Francesca Possenti, Laura Girelli, Paolo Tieri, Manuela Petti

Comments 17 pages, 6 figures, 2 tables

Psychiatric disorders have traditionally been conceptualized as latent conditions producing observable symptoms, but recent studies suggest that psychopathology may emerge from symptom interactions. Psychometric network models capture these relations by focusing on pairwise associations but overlook higher-order dependencies arising among groups of variables. These dependencies may reflect synergistic mechanisms, where joint symptom configurations convey more information than pairwise relations, or redundancy, where information overlaps. We introduce an information-theoretic multiplex hypergraph framework to identify and compare higher-order interactions in eating-disorder data across diagnostic groups (e.g., anorexia nervosa). Higher-order structures are quantified using $\Omega$-information, a measure that captures the balance between redundancy and synergy. To address the combinatorial growth of candidate subsets, multiple testing, and estimation instability, we propose a structured pipeline comprising: (i) targeted candidate selection based on dyadic network topology and theory-driven subscale information; (ii) a three-stage inferential procedure combining null-model testing with bootstrap robustness assessment; and (iii) the construction and analysis of diagnosis-layered, synergistic and redundant multiplex hypergraphs. Results highlight how synergy captures the emergent, higher-order organization of diagnoses, revealing both a stable transdiagnostic core and diagnosis-specific ways in which these domains combine. By contrast, redundancy is confined to eating and body-image related content, marking reinforcement rather than broader symptom integration.
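
Illustrative sketch (a plug-in estimator on discrete toy data, not the paper's pipeline): the redundancy/synergy balance is commonly written as $\Omega(X) = (n-2)H(X) + \sum_i [H(X_i) - H(X_{-i})]$, with positive values indicating redundancy-dominated and negative values synergy-dominated groups. The function names and the two toy systems are assumptions; the null-model testing and bootstrap steps described in the abstract are not included.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of hashable samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def o_information(rows):
    """Plug-in estimate of Omega for discrete data; rows is a list of equal-length tuples."""
    n_vars = len(rows[0])
    total = (n_vars - 2) * entropy([tuple(r) for r in rows])
    for i in range(n_vars):
        marginal = entropy([r[i] for r in rows])                       # H(X_i)
        rest = entropy([tuple(r[:i]) + tuple(r[i + 1:]) for r in rows])  # H(X_{-i})
        total += marginal - rest
    return total

# Toy systems: XOR is purely synergistic, three copies of one bit are purely redundant.
xor_rows = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
copy_rows = [(a, a, a) for a in (0, 1)]
omega_xor = o_information(xor_rows)
omega_copy = o_information(copy_rows)
```

On real questionnaire data the plug-in entropy is biased for small samples, which is one reason significance testing and robustness checks of the kind the abstract describes are needed.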

2604.22742 2026-04-27 cs.CC

Boolean PCSPs through the lens of Fourier Analysis

Demian Banakh, Katzper Michno

We develop an analytical framework for Boolean Promise Constraint Satisfaction Problems (PCSPs) that studies polymorphisms through the notion of influence from Fourier analysis of Boolean functions. Extending the work of Brakensiek, Guruswami, and Sandeep [ICALP'21] on Ordered PCSPs, we identify two general phenomena in Boolean minions indicative of hardness or tractability: (1) preservation of coordinate influence under random 2-to-1 minors and (2) the presence of sharp thresholds. We demonstrate that these phenomena occur in broader settings than previously established, yielding new hardness/tractability results for minions consisting of unate or polynomial threshold functions.
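
Illustrative sketch (standard definition, not code from the paper): the influence of coordinate $i$ on a Boolean function $f$ is the probability, over a uniformly random input, that flipping bit $i$ changes the output. The brute-force enumeration and the two example functions below are assumptions chosen for illustration.

```python
from itertools import product

def influence(f, n, i):
    """Fraction of inputs x in {0,1}^n for which flipping bit i changes f(x)."""
    flips = 0
    for x in product((0, 1), repeat=n):
        y = list(x)
        y[i] ^= 1
        if f(x) != f(tuple(y)):
            flips += 1
    return flips / 2 ** n

def majority3(x):
    """Majority of three bits: influence is spread evenly over the coordinates."""
    return int(sum(x) >= 2)

def dictator(x):
    """Output equals the first bit: all influence sits on coordinate 0."""
    return x[0]

majority_influences = [influence(majority3, 3, i) for i in range(3)]
dictator_influences = [influence(dictator, 3, i) for i in range(3)]
```

Each coordinate of the 3-bit majority is pivotal exactly when the other two bits disagree, so its influence is 1/2, while the dictator concentrates influence 1 on a single coordinate.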

2604.22740 2026-04-27 eess.SP cs.IT math.IT

Minimax Optimal Procedures for Joint Detection and Estimation

Dominik Reinhard, Michael Fauß, Abdelhak M. Zoubir

Comments 13 pages, 3 figures, 2 tables

We investigate the problem of jointly testing a pair of composite hypotheses and, depending on the test result, estimating a random parameter under distributional uncertainties. Specifically, it is assumed that the distribution of the data given the parameter of interest is subject to uncertainty. Both a Bayesian formulation and a Neyman-Pearson-like formulation are considered. It is shown that the optimal policy induces an $f$-similarity that must be maximized to identify the least favorable distributions. Besides the general results, the implementation is investigated using a band-type uncertainty model. For designing the minimax procedures, existing algorithms are modified to increase convergence speed while maintaining numerical stability. The proposed theory is supplemented by numerical results for both formulations.

2604.22739 2026-04-27 cs.CV

Inter-Stance: A Dyadic Multimodal Corpus for Conversational Stance Analysis

Xiang Zhang, Xiaotian Li, Taoyue Wang, Nan Bi, Xin Zhou, Cody Zhou, Zoie Wang, Andrew Yang, Yuming Su, Jeff Cohn, Qiang Ji, Lijun Yin

Social interactions dominate our perceptions of the world and shape our daily behavior by attaching social meaning to acts as simple and spontaneous as gestures, facial expressions, voice, and speech. People mimic and otherwise respond to each other's postures, facial expressions, mannerisms, and other verbal and nonverbal behavior, and form appraisals or evaluations in the process. Yet no publicly available dataset includes multimodal recordings and self-report measures of multiple persons in social interaction. Dyadic recordings and annotation are lacking. We present a new data corpus of multimodal dyadic interaction (45 dyads, 90 persons) that includes synchronized multi-modality behavior (2D face video, 3D face geometry, thermal spectrum dynamics, voice and speech behavior, and physiology [PPG, EDA, heart rate, blood pressure, and respiration]) and self-reported affect of all participants in a communicative interaction scenario. Two types of dyads are included: persons with shared past history and strangers. Annotations include social signals, agreement, disagreement, and neutral stance. With a potent emotion induction, these multimodal data will enable novel modeling of multimodal interpersonal behavior. We present extensive experiments to evaluate multimodal dyadic communication of dyads with and without interpersonal history, and their affect. This new database will enable multimodal modeling of social interaction that was not possible before. The dataset includes 20TB of multimodal data to share with the research community.

2604.22737 2026-04-27 eess.SY cs.SY math.CO math.OC

A Vehicle Routing Problem for Human-Centered Electric Mobility

Mostafa Emam, Björn Martens, Thomas Rottmann, Matthias Gerdts

Comments 7 pages, 5 figures, standard IEEE double-column format

In this paper, we present the Electric Mobility Dial-a-Ride Problem (EM-DARP), which extends the Electric Vehicle Dial-a-Ride Problem (EV-DARP) to better accommodate human-focused mobility services. The problem involves utilizing a fleet of heterogeneous Electric Vehicles (EVs) to fulfill a set of customer requests with DARP and mobility-related specifications, while incorporating visits to charging stations amid requests. The problem is formulated as a Mixed-Integer Linear Program (MILP) and subsequently solved for a number of curated evaluation scenarios to demonstrate its practical applicability.

2604.22736 2026-04-27 cs.LO cs.AI

An Undecidability Proof for the Plan Existence Problem

Antonis Achilleos

The plan existence problem asks, given a goal in the form of a formula in modal logic, an initial epistemic state (a pointed Kripke model), and a set of epistemic actions, whether there exists a sequence of actions that can be applied to reach the goal. We prove that even in the case where the preconditions of the epistemic actions have modal depth at most 1, and there are no postconditions, the plan existence problem is undecidable. The (un)decidability of this problem was previously unknown.

2604.22734 2026-04-27 gr-qc cs.NA math.NA

Radiation outer boundary conditions and near-to-far field signal transformations for the Bardeen-Press equation

Som Dev Bishoyi, Scott E. Field, Stephen R. Lau

Comments 26 pages, 8 figures, 4 tables

Several theoretical and astrophysical problems, including gravitational-wave modeling for extreme mass-ratio inspirals, require accurate time-domain solutions of the spin-weight $s=-2$ Teukolsky equation in Boyer-Lindquist coordinates. Because such simulations are performed on finite computational domains, they typically introduce an artificial outer boundary where nontrivial boundary conditions must be imposed. If these conditions are inaccurate, then spurious reflections and slowly growing unphysical modes may corrupt long-time evolutions. We develop and implement exact radiation outer boundary conditions for the Bardeen-Press equation (a harmonic moment of the $a=0$ Teukolsky equation), making the artificial boundary transparent at any finite radius. We also construct near-to-far field teleportation kernels that map field data recorded at finite radius $r_1$ to the data reaching $r_2 > r_1$. The possible choice $r_2 = \infty$ corresponds to asymptotic waveform evaluation, that is, propagation of the data to future null infinity. We show that both boundary and teleportation kernels are well approximated by exponential sums, with associated error bounds. Implemented in a time-domain solver, our kernel-based boundary conditions eliminate unphysical late-time growth and give the correct late-time decay rates, affording efficient long-duration simulations for waveform modeling and related black-hole perturbation calculations.

2604.22732 2026-04-27 math.NA cs.NA physics.comp-ph

Craig-Bampton-based Quadratic Manifold for Nonlinear Substructuring

Alexander Saccani, Paolo Tiso

Component Mode Synthesis methods, such as the Craig-Bampton (CB) approach, are widely used in structural dynamics due to their modularity and compatibility with substructuring workflows. While highly effective for linear systems, extending these methods to geometrically nonlinear structures remains a significant challenge. In this work, we propose a nonlinear extension of the CB method tailored to such contexts. The approach is based on the construction of a quadratic reduction manifold, derived via perturbation analysis, in which high-frequency fixed-interface modes are statically condensed onto a reduced set of low-frequency modes and interface coordinates. This formulation enables the representation of geometric nonlinear effects without increasing the number of reduced degrees of freedom. The resulting Nonlinear Craig-Bampton (NL-CB) reduced-order model is obtained through Galerkin projection onto the tangent space of the manifold and admits a polynomial structure that is efficient for time integration. The formulation preserves the Lagrangian structure of the underlying finite element model, ensuring consistent energetic behavior and numerical stability. The proposed method is demonstrated on representative nonlinear structural systems of increasing complexity. The results show that the NL-CB model captures the essential nonlinear dynamic response while retaining the modularity and computational efficiency of classical substructuring approaches.

2604.22730 2026-04-27 cs.LG cs.CL

Neural Recovery of Historical Lexical Structure in Bantu Languages from Modern Data

Hillary Mutisya, John Mugane

We investigate whether neural models trained exclusively on modern morphological data can recover cross-lingual lexical structure consistent with historical reconstruction. Using BantuMorph v7, a transformer over Bantu morphological paradigms, we analyze 14 Eastern and Southern Bantu languages, extract encoder embeddings for their noun and verb lemmas, and identify 728 noun and 1,525 verb cognate candidates shared across 5+ languages. Evaluating these candidates against established historical resources, namely the Bantu Lexical Reconstructions database (BLR3; 4,786 reconstructed Proto-Bantu forms) and the ASJP basic vocabulary, we confirm that 10 of the top 11 noun candidates (90.9%) align with previously reconstructed Proto-Bantu forms, including *-ntU 'person' (8 languages), *gombe 'cow' (9 languages), and *mUn (9 languages). Extending to verbs, 12 verb cognates align with reconstructed Proto-Bantu roots, including *-bon- 'see' and *-jIm- 'stand', each attested across wide geographic ranges. Cross-model validation using an independent translation model (NLLB-600M) confirms these patterns: both models recover cognate clusters and phylogenetic groupings consistent with established Guthrie-zone classifications (p < 0.01). Cross-lingual noun class analysis reveals that all 13 productive classes maintain >0.83 cosine similarity across languages (within-class > between-class, p < 10^-9). Our dataset is restricted to Eastern and Southern Bantu, so we interpret these results as recovering shared Bantu lexical structure consistent with Proto-Bantu rather than definitively distinguishing Proto-Bantu retentions from later regional innovations.

2604.22724 2026-04-27 cs.RO cs.SY eess.SY

GCImOpt: Learning efficient goal-conditioned policies by imitating optimal trajectories

Jon Goikoetxea, Jesús F. Palacián

Comments Accepted for publication at the 8th Annual Conference on Learning for Dynamics and Control (L4DC 2026). 16 pages (including appendix), 1 figure. For project website, see https://jongoiko.github.io/gcimopt/

Imitation learning is a well-established approach for machine-learning-based control. However, its applicability depends on having access to demonstrations, which are often expensive to collect and/or suboptimal for solving the task. In this work, we present GCImOpt, an approach to learn efficient goal-conditioned policies by training on datasets generated by trajectory optimization. Our approach for dataset generation is computationally efficient, can generate thousands of optimal trajectories in minutes on a laptop computer, and produces high-quality demonstrations. Further, by means of a data augmentation scheme that treats intermediate states as goals, we are able to increase the training dataset size by an order of magnitude. Using our generated datasets, we train goal-conditioned neural network policies that can control the system towards arbitrary goals. To demonstrate the generality of our approach, we generate datasets and then train policies for various control tasks, namely cart-pole stabilization, planar and three-dimensional quadcopter stabilization, and point reaching using a 6-DoF robot arm. We show that our trained policies can achieve high success rates and near-optimal control profiles, all while being small (less than 80,000 neural network parameters) and fast enough (up to more than 6,000 times faster than a trajectory optimization solver) that they could be deployed onboard resource-constrained controllers. We provide videos, code, datasets and pre-trained policies under a free software license; see our project website https://jongoiko.github.io/gcimopt/.
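
Illustrative sketch (one plausible reading, not the authors' code): the augmentation that "treats intermediate states as goals" can be pictured as relabeling every later state on an optimized trajectory as a goal for every earlier state. The function name, the tuple layout, and the toy trajectory are assumptions, and the abstract does not specify the exact relabeling scheme used.

```python
def relabel_with_intermediate_goals(states, actions):
    """Turn one trajectory into (state, goal, action) training samples.

    states has T+1 entries and actions has T entries; actions[t] moves states[t]
    to states[t+1]. Every later state on the trajectory is used as a goal.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected one more state than actions")
    samples = []
    for t in range(len(actions)):
        for k in range(t + 1, len(states)):
            samples.append((states[t], states[k], actions[t]))
    return samples

# Toy trajectory with hypothetical labels in place of real state and action vectors.
states = ["s0", "s1", "s2", "s3"]
actions = ["a0", "a1", "a2"]
samples = relabel_with_intermediate_goals(states, actions)
final_goal_only = [s for s in samples if s[1] == states[-1]]
```

A trajectory with T actions yields T samples when only the final goal is used and T(T+1)/2 samples after relabeling, so the gain grows with trajectory length. Whether a sub-trajectory is still optimal for its intermediate goal depends on the cost function, which this sketch does not check.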

2604.22723 2026-04-27 cs.LG cs.CL

Zero-Shot Morphological Discovery in Low-Resource Bantu Languages via Cross-Lingual Transfer and Unsupervised Clustering

Hillary Mutisya, John Mugane

We present a method for discovering morphological features in low-resource Bantu languages by combining cross-lingual transfer learning with unsupervised clustering. Applied to Giriama (nyf), a language with only 91 labeled paradigms, our pipeline discovers noun class assignments for 2,455 words and identifies two previously undocumented morphological patterns: an a- prefix variant for Class 2 (vowel coalescence - the merger of two adjacent vowels - of wa-, 95.1% consistency) and a contracted k'- prefix (98.5% consistency). External validation on 444 known Giriama verb paradigms confirms 78.2% lemmatization accuracy, while a v3 corpus expansion to 19,624 words (9,014 unique lemmas) achieves 97.3% segmentation and 86.7% lemmatization rates across all major word classes. Our ensemble of transfer learning from Swahili and unsupervised clustering, combined via weighted voting, exploits complementary strengths: transfer excels at cognate detection (leveraging ~60% vocabulary overlap) while clustering discovers language-specific innovations invisible to transfer. We release all code and discovered lexicons to support morphological documentation for low-resource Bantu languages.

2604.22721 2026-04-27 physics.ao-ph cs.NA math.NA physics.data-an

Spectral-Domain Local Statistics with Missing-Data Support for Cartesian and Polar Grids

Jairo M. Valdivia-Prado, William E. Chapman, Katja Friedrich

Comments Accompanies the open-source dct_toolkit package

This paper presents a method for computing local mean, variance, standard deviation, and effective sample count on incomplete gridded data using boundary-aware spectral operators. The framework combines normalized convolution with explicit boundary-condition modeling: reflective Discrete Cosine Transform (DCT) for non-periodic Cartesian axes and periodic Real Fast Fourier Transform (RFFT) for circular azimuth processing in polar geometry. Stability safeguards (denominator floor, prefill fallback, and variance clamp) are specified for under-supported regions. We evaluate the framework across three targeted scenarios: a Cartesian boundary-condition check demonstrating the mitigation of wrap-around artifacts, a synthetic 3D outlier-identification test, and a real-radar polar application. Results establish bounded, support-aware interpretation of local statistics while preserving a concise reproducibility path through the open-source 'dct_toolkit' implementation.
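
Illustrative sketch (spatial-domain, 1D, and not the dct_toolkit API): normalized convolution computes a local mean as the windowed sum of valid data divided by the windowed sum of the validity mask. The version below uses a direct box window rather than spectral operators, assumes a half-sample symmetric reflection at the edges and a radius smaller than the array length, returns NaN where support falls below the floor instead of implementing a prefill fallback, and clamps the variance at zero. All names are hypothetical.

```python
import math

def local_stats(values, radius, floor=1e-6):
    """Local mean, variance, and valid-sample count; None marks missing data."""
    n = len(values)
    mask = [0.0 if v is None else 1.0 for v in values]
    data = [0.0 if v is None else float(v) for v in values]

    def reflect(i):
        # Half-sample symmetric reflection: index -1 maps to 0, index n maps to n-1.
        if i < 0:
            return -i - 1
        if i >= n:
            return 2 * n - i - 1
        return i

    means, variances, counts = [], [], []
    for center in range(n):
        window = [reflect(j) for j in range(center - radius, center + radius + 1)]
        support = sum(mask[j] for j in window)
        if support < floor:
            # Under-supported window: report NaN rather than dividing by ~0.
            means.append(math.nan)
            variances.append(math.nan)
            counts.append(0.0)
            continue
        mean = sum(data[j] for j in window) / support
        second_moment = sum(data[j] ** 2 for j in window) / support
        means.append(mean)
        variances.append(max(second_moment - mean * mean, 0.0))  # variance clamp
        counts.append(support)
    return means, variances, counts

means, variances, counts = local_stats([1.0, None, 3.0, 5.0], radius=1)
empty_means, _, empty_counts = local_stats([None, None], radius=0)
```

Reflected samples are counted again in the support, so the count near an edge reflects window weight rather than distinct observations; a spectral implementation would replace the explicit window loop with transforms of the data and mask.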

2604.22715 2026-04-27 cs.RO

ATRS: Adaptive Trajectory Re-splitting via a Shared Neural Policy for Parallel Optimization

Jiajun Yu, Guodong Liu, Li Wang, Pengxiang Zhou, Wentao Liu, Yin He, Chao Xu, Fei Gao, Yanjun Cao

Comments 8 pages, submitted to IEEE Robotics and Automation Letters

Parallel trajectory optimization via the Alternating Direction Method of Multipliers (ADMM) has emerged as a scalable approach to long-horizon motion planning. However, existing frameworks typically decompose the problem into parallel subproblems based on a predefined fixed structure. Such structural rigidity often causes optimization stagnation in highly constrained regions, where a few lagging subproblems delay global convergence. A natural remedy is to adaptively re-split these stagnating segments online. Yet, deciding when, where, and how to split exceeds the capability of rule-based heuristics. To this end, we propose ATRS, a novel framework that embeds a shared Deep Reinforcement Learning policy into the parallel ADMM loop. We formulate this adaptive adjustment as a Multi-Agent Shared-Policy Markov Decision Process, where all trajectory segments act as homogeneous agents and share a unified neural policy network. This parameter-sharing architecture endows the system with size invariance, enabling it to handle dynamically changing segment counts during re-splitting and generalize to arbitrary trajectory lengths. Furthermore, our formulation inherently supports zero-shot generalization to unseen environments, as our network relies solely on the internal states of the numerical solver rather than on the geometric features of the environment. To ensure solver stability, a Confidence-Based Election mechanism selects only the most stagnating segment for re-splitting at each step. Extensive simulations demonstrate that ATRS accelerates convergence, reducing the number of iterations by up to 26.0% and the computation time by up to 19.1%. Real-world experiments further confirm its applicability to both large-scale offline global planning and real-time onboard replanning within 35 ms per cycle, with no sim-to-real degradation.

2604.22714 2026-04-27 cs.CV

Long-tail Internet photo reconstruction

Yuan Li, Yuanbo Xiangli, Hadar Averbuch-Elor, Noah Snavely, Ruojin Cai

Comments Project page: https://megadepth-x.github.io/

Internet photo collections exhibit an extremely long-tailed distribution: a few famous landmarks are densely photographed and easily reconstructed in 3D, while most real-world sites are represented with sparse, noisy, uneven imagery beyond the capabilities of both classical and learned 3D methods. We believe that tackling this long-tail regime represents one of the next frontiers for 3D foundation models. Although reliable ground-truth 3D supervision from sparse scenes is challenging to acquire, we observe that it can be effectively simulated by sampling sparse subsets from well-reconstructed Internet landmarks. To this end, we introduce MegaDepth-X, a large dataset of 3D reconstructions with clean, dense depth, together with a strategy for sampling sets of training images that mimic camera distributions in long-tail scenes. Finetuning 3D foundation models with these components yields robust reconstructions under extreme sparsity, and also enables more reliable reconstruction in symmetric and repetitive scenes, while preserving generalization to standard, dense 3D benchmark datasets.

2604.22710 2026-04-27 cs.NI

Evaluation of the effects of 3GPP-specific beamforming and channel estimation on the 3D EIRP profile of a 5G gNB

Armed Tusha, Joshua Roy Palathinkal, Monisha Ghosh

Spatial domain exploitation through 3D beamforming serves as a critical technology enabler for performance enhancement in the Fifth Generation New Radio (5G NR) specification. This is realized at the gNodeB (gNB) through the integration of massive antenna element arrays that facilitate 3D spatial multiplexing. However, these systems with highly directional transmissions also pose a threat to incumbent services such as radar and satellites. These incumbents already operate in midband spectrum (including the 4.4-4.9 GHz and 7.125-7.4 GHz bands) that is currently being evaluated for future cellular deployments. Here, we present the first work that evaluates the transmitted Effective Isotropic Radiated Power (EIRP) of a gNB in 3D space, using the 3GPP Release-18 standard for FR-1 instead of theoretical analyses of beam nulling, which can be simplistic. We highlight issues with the 3D EIRP profile of the existing codebook designs predefined in 3GPP: i) interference from a gNB depends not only on the worst-case beamforming direction but on a variety of beamforming directions, due to side-lobes; and ii) the advanced antenna system (AAS) architecture and antenna port configurations, which are implementation dependent, play a crucial role in the average 3D EIRP. We also introduce two beam nulling methods that achieve an 11 dB power reduction toward a target direction, with 3.5-4.5 dB SNR loss in UE link performance at a $10^{-4}$ bit error rate (BER) across modulation schemes under ideal and practical channel estimation, a higher loss than theoretical analyses predict.

2604.22708 2026-04-27 cs.MA

Seeing the Whole Elephant: A Benchmark for Failure Attribution in LLM-based Multi-Agent Systems

Mengzhuo Chen, Junjie Wang, Fangwen Mu, Yawen Wang, Zhe Liu, Huanxiang Feng, Qing Wang

Comments Accepted by ACL 2026

Failure attribution, i.e., identifying the responsible agent and decisive step of a failure, is particularly challenging in LLM-based multi-agent systems (MAS) due to their natural-language reasoning, nondeterministic outputs, and intricate interaction dynamics. A reliable benchmark is therefore essential to guide and evaluate attribution techniques. Yet existing benchmarks rely on partially observable traces that capture only agent outputs, omitting the inputs and context that developers actually use when debugging. We argue that failure attribution should be studied under full execution observability, aligning with real-world developer-facing scenarios where complete traces, rather than only outputs, are accessible for diagnosis. To this end, we introduce TraceElephant, a benchmark designed for failure attribution with full execution traces and reproducible environments. We then systematically evaluate failure attribution techniques across various configurations. Specifically, full traces improve attribution accuracy by up to 76% over a partial-observation counterpart, confirming that missing inputs obscure many failure causes. TraceElephant provides a foundation for follow-up failure attribution research, promoting evaluation practices that reflect real-world debugging and supporting the development of more transparent MASs.

2604.22700 2026-04-27 cs.CV

Generative Modeling of Neurodegenerative Brain Anatomy with 4D Longitudinal Diffusion Model

Nivetha Jayakumar, Swakshar Deb, Bahram Jafrasteh, Qingyu Zhao, Miaomiao Zhang

Understanding and predicting the progression of neurodegenerative diseases remains a major challenge in medical AI, with significant implications for early diagnosis, disease monitoring, and treatment planning. However, most available longitudinal neuroimaging datasets are temporally sparse, with only a few follow-up scans per subject. This scarcity of temporal data limits our ability to model and accurately capture the continuous anatomical changes related to disease progression in individual subjects. To address this problem, we propose a novel 4D (3DxT) diffusion-based generative framework that effectively models and synthesizes longitudinal brain anatomy over time, conditioned on available clinical variables such as health status, age, sex, and other relevant factors. Moreover, while most current approaches focus on manipulating image intensity or texture, our method explicitly learns the data distribution of topology-preserving spatiotemporal deformations to effectively capture the geometric changes of brain structures over time. This design enables the realistic generation of future anatomical states and the reconstruction of anatomically consistent disease trajectories, providing a more faithful representation of longitudinal brain changes. We validate our model through synthetic sequence generation, downstream longitudinal disease classification, and brain segmentation. Experiments on two large-scale longitudinal neuroimage datasets demonstrate that our method outperforms state-of-the-art baselines in generating anatomically accurate, temporally consistent, and clinically meaningful brain trajectories. Our code is available on GitHub.

2604.22697 2026-04-27 cs.CY cs.HC

RFID-Based Non-Biometric Classroom Attendance System: Proxy Attendance Detection via Weight Sensor Integration

Furkan Ege, Muhsin Özdemir

Comments Full English version followed by the original Turkish version of the paper. Main text in English; Turkish translation appended after the English text

English abstract

Attendance tracking in educational institutions, when conducted through traditional methods, leads to structural problems that consume instruction time and threaten academic integrity. Attendance durations spanning several minutes in primary and secondary education and exceeding ten minutes in higher education, combined with the proxy attendance problem of signing on behalf of someone else, demonstrate the need for electronic systems. Most existing electronic solutions rely on biometric authentication, which raises legal and ethical risks under the European General Data Protection Regulation (GDPR), the Turkish Personal Data Protection Law (KVKK), and the United States Family Educational Rights and Privacy Act (FERPA). Systems using RFID alone provide no built-in safeguard against proxy attendance through card transfer. This study proposes a biometric-free IoT attendance system addressing both deficiencies. The prototype consists of an RFID module, RFID cards, weight sensors, a Bluetooth module, and an Arduino UNO microcontroller. After the student presents their RFID card, the weight sensor measurement is compared against a statistical reference range of 350 individuals (aged 18-22) compiled from three Kaggle datasets; no personal biometric data is recorded. A Python-based GUI performs student management, course tracking, and CSV-based reporting via Bluetooth. Qualitative tests in conditions close to a real classroom have shown that the RFID reading, weight verification, Bluetooth communication, and GUI modules operate in an integrated manner as expected. The proposed system offers a low-cost and reproducible solution that aims to reduce proxy attendance without storing biometric data.

2604.22695 2026-04-27 eess.SP cs.LG

Time-Localized Parametric Decomposition of Respiratory Airflow for Sub-Breath Analysis

Victoria Ribeiro Rodrigues, Paul W. Davenport, Nicholas J. Napoli

Comments Submitted to IEEE Journal of Biomedical and Health Informatics (under review). 18 pages, 7 figures, 5 tables

English abstract

Respiratory airflow signals provide critical insight into breathing mechanics, yet conventional analysis methods remain limited in their ability to characterize the internal structure of individual breaths. Traditional approaches treat airflow as a quasi-periodic signal and rely on global descriptors such as tidal volume or peak flow, obscuring sub-breath events that reflect neuromuscular coordination and compensatory breathing strategies. This study introduces a parametric framework for decomposing inspiratory airflow into a small number of time-localized components with explicit amplitude, onset time, and duration parameters. Unlike spectral or data-adaptive methods, the proposed approach employs physiologically grounded basis functions, Half-Sine, Gaussian, and Beta, to represent intrabreath waveform morphology through constrained nonlinear optimization. Evaluation across 8,276 breaths demonstrates high reconstruction accuracy (mean squared error < 0.001 for four-component models) and robust parameter precision under moderate noise. Component-derived features describing sub-breath timing and coordination improved classification of cognitive fatigue states arising from cognitive-respiratory competition by up to 30.7% in Matthews correlation coefficient compared with classical respiratory metrics. These results establish that modeling airflow as a sum of parameterized, time-localized primitives provides an interpretable and precise foundation for quantifying intrabreath organization, compensatory breathing dynamics, and respiratory motor control adaptation under cognitive-respiratory dual-task demands.
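
The idea of a breath as a sum of time-localized primitives can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names `half_sine` and `reconstruct` and the exact parameterization (amplitude, onset, duration) are assumptions chosen to mirror the abstract, and only the forward model is shown, not the constrained nonlinear fitting step.

```python
import numpy as np

def half_sine(t, amplitude, onset, duration):
    """Half-sine bump that is nonzero only on [onset, onset + duration].

    Hypothetical parameterization: the abstract names amplitude, onset time,
    and duration, but does not give the exact functional form.
    """
    phase = (t - onset) / duration
    inside = (phase >= 0.0) & (phase <= 1.0)
    return np.where(inside, amplitude * np.sin(np.pi * phase), 0.0)

def reconstruct(t, components):
    """Sum the half-sine components, each given as (amplitude, onset, duration)."""
    signal = np.zeros_like(t, dtype=float)
    for amplitude, onset, duration in components:
        signal += half_sine(t, amplitude, onset, duration)
    return signal

# Two overlapping components on a one-second inspiratory window.
t = np.linspace(0.0, 1.0, 101)
flow = reconstruct(t, [(1.0, 0.0, 0.6), (0.5, 0.4, 0.6)])
```

Fitting would then search over the component parameters to minimize the squared error between `flow` and a measured breath, subject to bounds on onset and duration.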

2604.22693 2026-04-27 cs.CL cs.AI

CRAFT: Clustered Regression for Adaptive Filtering of Training data

Parthasarathi Panda, Asheswari Swain, Subhrakanta Panda

English abstract

Selecting a small, high-quality subset from a large corpus for fine-tuning is increasingly important as corpora grow to tens of millions of datapoints, making full fine-tuning expensive and often unnecessary. We propose CRAFT (Clustered Regression for Adaptive Filtering of Training data), a vectorization-agnostic selection method for training sequence-to-sequence models. CRAFT decomposes the joint source-target distribution and performs a two-stage selection: (i) match the validation source distribution through proportional budget allocation across k-means clusters, and (ii) within each source cluster, select training pairs whose target embeddings minimize a conditional expected distance derived from the validation target distribution. We prove that proportional cluster allocation bounds the continuous KL divergence between selected and validation distributions, with the residual controlled by cluster diameters. We evaluate CRAFT on English-Hindi translation by selecting training data from 33 million NLLB sentence pairs and fine-tuning mBART via LoRA. CRAFT achieves 43.34 BLEU, outperforming TSDS (41.21) by 2.13 points on the same candidate pool and encoder while completing selection over 40 times faster. With TF-IDF vectorization, the entire pipeline completes in under one minute on CPU. TAROT achieves 45.61 BLEU, but CRAFT completes selection in 26.86 seconds versus TAROT's 75.6 seconds, a 2.8x speedup.
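
Stage (i), proportional budget allocation across clusters, can be sketched as follows. This is a minimal illustration rather than the paper's code: the function name `proportional_allocation` is hypothetical, cluster assignments are assumed to be precomputed (for example by k-means), and the largest-remainder rounding is an assumption made so that the quotas sum exactly to the budget.

```python
from collections import Counter

def proportional_allocation(val_cluster_ids, budget, num_clusters):
    """Split `budget` across clusters in proportion to validation cluster mass.

    val_cluster_ids: cluster index of each validation source sentence.
    Rounding uses the largest-remainder rule (an assumption, not from the paper).
    """
    counts = Counter(val_cluster_ids)
    total = len(val_cluster_ids)
    exact = [budget * counts.get(c, 0) / total for c in range(num_clusters)]
    quotas = [int(x) for x in exact]  # floor of each exact share
    leftover = budget - sum(quotas)
    # Give the remaining slots to the clusters with the largest fractional parts.
    order = sorted(range(num_clusters), key=lambda c: exact[c] - quotas[c], reverse=True)
    for c in order[:leftover]:
        quotas[c] += 1
    return quotas

quotas = proportional_allocation([0, 0, 0, 1, 1, 2], budget=10, num_clusters=3)
```

Stage (ii) would then pick, inside each cluster, the `quotas[c]` training pairs that score best under the target-side criterion.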

2604.22685 2026-04-27 astro-ph.IM astro-ph.EP cs.NI cs.PF

CosmicDancePro -- Measuring LEO satellite's orbital decay and network connectivity implications during solar storms

Suvam Basak, Amitangshu Pal, Debopam Bhattacherjee

English abstract

The May 2024 solar superstorm highlighted the vulnerability of rapidly expanding low Earth orbit (LEO) satellite networks to severe space weather events. To systematically evaluate LEO network resilience, we introduce an open-source tool, CosmicDancePro, which enables a comprehensive analysis of the effects of solar storms on LEO satellite networks. It integrates real-world multimodal datasets, including space weather measurements from several satellites, upper-atmospheric density conditions from data-driven and high-fidelity physics-based models, and LEO satellite trajectory and LEO network measurement traces, to quantify orbital decay driven by enhanced atmospheric density and the resulting network connectivity degradation. We utilize CosmicDancePro to analyze the Starlink constellation's behavior during two recent major solar storms. First, we identify the specific fleet management strategies Starlink adopted during the May 2024 solar superstorm and how they differ from its regular orbit-correction strategy. Second, we identify the mechanisms driving the previously unexplained 'W'-shaped altitude variation pattern across orbital planes of LEO constellations. Finally, our network-layer analysis quantifies the connectivity degradation during these storms, revealing transient disruptions that include repetitive short-lived outages, reconfiguration latency spikes above 500 ms, an increase of up to 60% in uplink loss, distorted diurnal latency patterns, and a 10+ Mbps drop in end-user data rates during storm peaks.

2604.22679 2026-04-27 cs.CY cs.AI

How Supply Chain Dependencies Complicate Bias Measurement and Accountability Attribution in AI Hiring Applications

Gauri Sharma, Maryam Molamohammadi

English abstract

The increasing adoption of AI systems in hiring has raised concerns about algorithmic bias and accountability, prompting regulatory responses including the EU AI Act, NYC Local Law 144, and Colorado's AI Act. While existing research examines bias through technical or regulatory lenses, both perspectives overlook a fundamental challenge: modern AI hiring systems operate within complex supply chains where responsibility fragments across data vendors, model developers, platform providers, and deploying organizations. This paper investigates how these dependency chains complicate bias evaluation and accountability attribution. Drawing on literature review and regulatory analysis, we demonstrate that fragmented responsibilities create two critical problems. First, bias emerges from component interactions rather than isolated elements, yet proprietary configurations prevent integrated evaluation. A resume parser may function without bias independently but contribute to discrimination when integrated with specific ranking algorithms and filtering thresholds. Second, information asymmetries mean deploying organizations bear legal responsibility without technical visibility into vendor-supplied algorithms, while vendors control implementations without meaningful disclosure requirements. Each stakeholder may believe they are compliant; nevertheless, the integrated system may produce biased outcomes. Analysis of implementation ambiguities reveals these challenges in practice. We propose multi-layered interventions including system-level audits, vendor guidelines, continuous monitoring mechanisms, and documentation across dependency chains. Our findings reveal that effective governance requires coordinated action across technical, organizational, and regulatory domains to establish meaningful accountability in distributed development environments.

2604.22678 2026-04-27 cs.CL

BERAG: Bayesian Ensemble Retrieval-Augmented Generation for Knowledge-based Visual Question Answering

Jinghong Chen, Jingbiao Mei, Guangyu Yang, Bill Byrne

English abstract

A common approach to question answering with retrieval-augmented generation (RAG) is to concatenate documents into a single context and pass it to a language model to generate an answer. While simple, this strategy can obscure the contribution of individual documents, making attribution difficult and contributing to the "lost-in-the-middle" effect, where relevant information in long contexts is overlooked. Concatenation also scales poorly: computational cost grows quadratically with context length, a problem that becomes especially severe when the context includes visual data, as in visual question answering. Attempts to mitigate these issues by limiting context length can further restrict performance by preventing models from benefiting from the improved recall offered by deeper retrieval. We propose Bayesian Ensemble Retrieval-Augmented Generation (BERAG), along with Bayesian Ensemble Fine-Tuning (BEFT), as a RAG framework in which language models are conditioned on individual retrieved documents rather than a single combined context. BERAG treats document posterior probabilities as ensemble weights and updates them token by token using Bayes' rule during generation. This approach enables probabilistic re-ranking, parallel memory usage, and clear attribution of document contribution, making it well-suited for large document collections. We evaluate BERAG and BEFT primarily on knowledge-based visual question answering tasks, where models must reason over long, imperfect retrieval lists. The results show substantial improvements over standard RAG, including strong gains on Document Visual Question Answering and multimodal needle-in-a-haystack benchmarks. We also demonstrate that BERAG mitigates the "lost-in-the-middle" effect. The document posterior can be used to detect insufficient grounding and trigger deflection, while document pruning enables faster decoding than standard RAG.
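
The token-by-token posterior update described above can be sketched with a generic Bayesian mixture. This is an illustration under assumptions, not the BERAG implementation: the function names `mixture_distribution` and `update_posterior` are hypothetical, and the per-document next-token distributions are passed in as plain arrays rather than produced by a language model.

```python
import numpy as np

def mixture_distribution(doc_weights, token_probs):
    """Next-token distribution of the ensemble.

    doc_weights: shape (D,), current posterior over retrieved documents.
    token_probs: shape (D, V), next-token distribution of the model when
                 conditioned on each document separately.
    """
    return doc_weights @ token_probs

def update_posterior(doc_weights, token_probs, token_id):
    """Bayes' rule: new weight is proportional to old weight times the
    probability each document-conditioned model gave the emitted token."""
    unnormalized = doc_weights * token_probs[:, token_id]
    return unnormalized / unnormalized.sum()

# Two documents, a two-token vocabulary, uniform prior over documents.
weights = np.array([0.5, 0.5])
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
mix = mixture_distribution(weights, probs)
weights_after = update_posterior(weights, probs, token_id=0)
```

After token 0 is emitted, the weight shifts toward the first document because it predicted that token with higher probability; documents whose weight falls very low are natural candidates for pruning.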

2604.22675 2026-04-27 cs.SI

Measuring Epistemic Unfairness for Algorithmic Decision-Making

Camilla Quaresmini, Lisa Piccinin, Valentina Breschi

English abstract

Algorithmic systems increasingly function as epistemic infrastructures that govern the conditions of interpretative access and social belief. Yet, mainstream auditing strategies operationalize fairness primarily in predictive terms - error rates, calibration, or group-level parity - leaving epistemic harms under-theorized and under-measured. We propose a quantitative framework for evaluating forms of epistemic injustice in algorithmic environments. First, we introduce a deficit-based template that models epistemic injustices as gaps between ideal and realized conditions across features such as credibility, uptake, and epistemic agency. We map these deficits to concrete stages of algorithmic mediation, showing how epistemic injustice can persist even when standard fairness constraints are satisfied. Drawing on distributive fairness indices, we distinguish two evaluation stances: resource inequality, where indices are applied to distributions of epistemic goods directly, and capability/rights inequity, where indices are applied to output-induced epistemic opportunity. We provide an epistemic translation of canonical indices, illustrating how they diagnose complementary signatures of unfairness - such as exclusionary tails and hierarchical concentration - and support longitudinal auditing under iterative deployment. We also provide a simulation study of a recommender-mediated opinion dynamics setting, showing how the proposed indices capture the evolution of epistemic unfairness under repeated platform interventions. The result is a measurement framework that makes the epistemic dimension of algorithmic harms explicit for system design and evaluation.
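
As an illustration of applying a distributive index to a distribution of epistemic goods, the sketch below computes the Gini coefficient. The abstract does not say which canonical indices the paper uses, so the choice of Gini, the function name `gini`, and the example "uptake" values are all assumptions.

```python
def gini(values):
    """Gini coefficient of a list of non-negative values.

    Returns 0.0 for perfect equality; the maximum for n values is (n - 1) / n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1) / n

# Hypothetical per-user "uptake" scores, e.g. how often a contribution is surfaced.
equal_uptake = gini([1, 1, 1, 1])
concentrated_uptake = gini([0, 0, 0, 1])
```

Tracking such an index over repeated deployment rounds is one way to read the longitudinal auditing the abstract mentions.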

2604.22673 2026-04-27 cs.SE cs.SC

Inferring Equivalence Classes from Legacy Undocumented Embedded Binaries for ISO 26262-Compliant Testing

Marco De Luca, Domenico Francesco De Angelis, Domenico Amalfitano, Pasquale Cimmino, Anna Rita Fasolino

Comments Paper Accepted at EASE 26

English abstract

Equivalence class partitioning is a well-established test design technique mandated by safety standards such as ISO 26262 for systematic testing of safety software. In industrial practice, however, its application to legacy undocumented embedded firmware is often hindered by incomplete or outdated functional specifications. This paper proposes a binary-level methodology for inferring output-oriented equivalence classes directly from compiled firmware, without relying on source-level annotations or external documentation. The approach combines control-flow reconstruction and guided symbolic execution to analyze individual functions and group execution paths according to indistinguishable observable behavior, including return values and output parameters. An optional post-processing step produces human-readable representations to support comprehension and documentation. The methodology is evaluated in an industrial automotive context through a practitioner-based study assessing correctness and interpretability. Results indicate strong alignment with expert expectations and a positive perception of readability and usefulness for supporting function understanding and test design. These findings demonstrate the feasibility and practical relevance of binary-level equivalence class inference for systematic testing of legacy undocumented safety-embedded software.

2604.22672 2026-04-27 cs.LG

Iterative Model-Learning Scheme via Gaussian Processes for Nonlinear Model Predictive Control of (Semi-)Batch Processes

Tai Xuan Tan, Alexander Mitsos, Eike Cramer

Comments 12 pages, 7 figures

English abstract

Batch processes are inherently transient and typically nonlinear, motivating nonlinear model predictive control (NMPC). However, adopting NMPC is hindered by the cost and unavailability of dynamic models. Thus, we propose to use Gaussian Processes (GP) in a model-learning NMPC scheme (GP-MLMPC) for batch processes. We initialize the GP-MLMPC using data from a single initial trajectory, e.g., from a PI controller. We iteratively apply the NMPC embedded with GPs to run batches and update the GP with new observations from each iteration, thereby achieving batch-wise improvements. Using uncertainty quantification from the GPs, we formulate chance constraints to enforce safe operation to the required confidence levels. We demonstrate our approach in silico on a semi-batch polymerization reactor for tracking and economic objectives over durations of two hours, and the reactor temperature is constrained in a range of ±2 °C around its setpoint. After only four batch iterations, tracking error from the GP-MLMPC scheme converged to a reduction of 83%, compared to the initial trajectory. Furthermore, under an economic objective, the GP-MLMPC resulted in a 17-fold increase in final product mass by iteration 8, compared to the initial trajectory. In both cases, the resulting GP-MLMPC performance is on par with the full-model NMPC, which shows that the optimal controller can be learned by the approach. By collecting samples around the optimal trajectory, the GP-MLMPC remains sample-efficient across iterations and achieves quick convergence. Thus, the proposed GP-MLMPC scheme presents a promising data-efficient approach for the control of nonlinear batch processes without mechanistic knowledge.
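
For a Gaussian predictive distribution, a chance constraint of the kind mentioned above has a standard deterministic equivalent, sketched here for a single upper bound. The symbols are assumptions introduced for illustration ($T_k$ for the predicted temperature at step $k$, $\mu_k$ and $\sigma_k$ for the GP predictive mean and standard deviation, $T_{\max}$ for the bound, $\varepsilon$ for the allowed violation probability, $\Phi^{-1}$ for the standard normal quantile function); the paper's exact formulation may differ.

```latex
\Pr\left[ T_k \le T_{\max} \right] \ge 1 - \varepsilon
\quad\Longleftrightarrow\quad
\mu_k + \Phi^{-1}(1 - \varepsilon)\,\sigma_k \le T_{\max},
\qquad T_k \sim \mathcal{N}\!\left(\mu_k, \sigma_k^2\right)
```

The constraint tightens automatically where the GP is uncertain (large $\sigma_k$) and relaxes as new batch data reduce the predictive variance.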