arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1296
2603.10377 2026-04-27 cs.LG cs.AI stat.ME

Causal Concept Graphs in LLM Latent Space for Stepwise Reasoning

Md Muntaqim Meherab, Noor Islam S. Mohammad, Faiza Feroz

Comments We have recently encountered author conflicts related to this work and therefore respectfully request the withdrawal of this paper. We believe this step is necessary to address the situation appropriately and maintain academic integrity in the submission

详情
英文摘要

Sparse autoencoders can localize where concepts live in language models, but not how they interact during multi-step reasoning. We propose Causal Concept Graphs (CCG): a directed acyclic graph over sparse, interpretable latent features, where edges capture learned causal dependencies between concepts. We combine task-conditioned sparse autoencoders for concept discovery with DAGMA-style differentiable structure learning for graph recovery and introduce the Causal Fidelity Score (CFS) to evaluate whether graph-guided interventions induce larger downstream effects than random ones. On ARC-Challenge, StrategyQA, and LogiQA with GPT-2 Medium, across five seeds ($n{=}15$ paired runs), CCG achieves $\CFS=5.654\pm0.625$, outperforming ROME-style tracing ($3.382\pm0.233$), SAE-only ranking ($2.479\pm0.196$), and a random baseline ($1.032\pm0.034$), with $p<0.0001$ after Bonferroni correction. Learned graphs are sparse (5-6\% edge density), domain-specific, and stable across seeds.

2603.10093 2026-04-27 cs.LG cs.AI q-bio.QM

Equivariant Asynchronous Diffusion: An Adaptive Denoising Schedule for Accelerated Molecular Conformation Generation

Junyi An, Chao Qu, Yun-Fei Shi, Zhijian Zhou, Fenglei Cao, Yuan Qi

详情
英文摘要

Recent 3D molecular generation methods primarily use asynchronous auto-regressive or synchronous diffusion models. While auto-regressive models build molecules sequentially, they're limited by a short horizon and a discrepancy between training and inference. Conversely, synchronous diffusion models denoise all atoms at once, offering a molecule-level horizon but failing to capture the causal relationships inherent in hierarchical molecular structures. We introduce Equivariant Asynchronous Diffusion (EAD) to overcome these limitations. EAD is a novel diffusion model that combines the strengths of both approaches: it uses an asynchronous denoising schedule to better capture molecular hierarchy while maintaining a molecule-level horizon. Since these relationships are often complex, we propose a dynamic scheduling mechanism to adaptively determine the denoising timestep. Experimental results show that EAD achieves state-of-the-art performance in 3D molecular generation.

2603.04855 2026-04-27 cs.CL

HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents

Yilin Jiang, Fei Tan, Xuanyu Yin, Jing Leng, Aimin Zhou

Comments 46 pages, 7 figures, accepted by ACL 2026. The dataset is available at https://huggingface.co/datasets/sii-research/HACHIMI-1M

详情
英文摘要

Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and population distributions. We formalize this as Theory-Aligned and Distribution-Controllable Persona Generation (TAD-PG) and introduce HACHIMI, a multi-agent Propose-Validate-Revise framework that generates theory-aligned, quota-controlled personas. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints via a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting HACHIMI-1M corpus comprises 1 million personas for Grades 1-12. Intrinsic evaluation shows near-perfect schema validity, accurate quotas, and substantial diversity, while external evaluation instantiates personas as student agents answering CEPS and PISA 2022 surveys; across 16 cohorts, math and curiosity/growth constructs align strongly between humans and agents, whereas classroom-climate and well-being constructs are only moderately aligned, revealing a fidelity gradient. All personas are generated with Qwen2.5-72B, and HACHIMI provides a standardized synthetic student population for group-level benchmarking and social-science simulations. Resources available at https://github.com/ZeroLoss-Lab/HACHIMI

2603.04328 2026-04-27 cs.LG econ.EM

Algorithmic Compliance and Regulatory Loss in Digital Assets

Khem Raj Bhatt, Krishna Sharma

Comments This paper has been withdrawn by the author as it requires substantial revision

详情
英文摘要

We study the deployment performance of machine learning based enforcement systems used in cryptocurrency anti money laundering (AML). Using forward looking and rolling evaluations on Bitcoin transaction data, we show that strong static classification metrics substantially overstate real world regulatory effectiveness. Temporal nonstationarity induces pronounced instability in cost sensitive enforcement thresholds, generating large and persistent excess regulatory losses relative to dynamically optimal benchmarks. The core failure arises from miscalibration of decision rules rather than from declining predictive accuracy per se. These findings underscore the fragility of fixed AML enforcement policies in evolving digital asset markets and motivate loss-based evaluation frameworks for regulatory oversight.

2602.21977 2026-04-27 cs.CV

When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters

Liangwei Lyu, Jiaqi Xu, Jianwei Ding, Qiyao Deng

Comments Accepted to CVPR 2026 main track(poster)

详情
英文摘要

Low-Rank Adaptation (LoRA) has emerged as a leading technique for efficiently fine-tuning text-to-image diffusion models, and its widespread adoption on open-source platforms has fostered a vibrant culture of model sharing and customization. However, the same modular and plug-and-play flexibility that makes LoRA appealing also introduces a broader attack surface. To highlight this risk, we propose Masquerade-LoRA (MasqLoRA), the first systematic attack framework that leverages an independent LoRA module as the attack vehicle to stealthily inject malicious behavior into text-to-image diffusion models. MasqLoRA operates by freezing the base model parameters and updating only the low-rank adapter weights using a small number of "trigger word-target image" pairs. This enables the attacker to train a standalone backdoor LoRA module that embeds a hidden cross-modal mapping: when the module is loaded and a specific textual trigger is provided, the model produces a predefined visual output; otherwise, it behaves indistinguishably from the benign model, ensuring the stealthiness of the attack. Experimental results demonstrate that MasqLoRA can be trained with minimal resource overhead and achieves a high attack success rate of 99.8%. MasqLoRA reveals a severe and unique threat in the AI supply chain, underscoring the urgent need for dedicated defense mechanisms for the LoRA-centric sharing ecosystem.

2602.21750 2026-04-27 cs.LG

From Words to Amino Acids: Does the Curse of Depth Persist?

Aleena Siji, Amir Mohammad Karimi Mamaghan, Ferdinand Kapl, Tobias Höppe, Emmanouil Angelis, Andrea Dittadi, Maurice Brenner, Michael Heinzinger, Karl Henrik Johansson, Kaitlin Maile, Johannes von Oswald, Stefan Bauer

详情
英文摘要

Protein language models (PLMs) have become widely adopted as general-purpose models, demonstrating strong performance in protein engineering and de novo design. Like large language models (LLMs), they are typically trained as deep transformers with next-token or masked-token prediction objectives on massive sequence corpora and are scaled by increasing model depth. Recent work on autoregressive LLMs has identified the Curse of Depth: many later layers contribute little to the final output predictions. These findings naturally raise the question of whether a similar depth inefficiency also appears in PLMs, where many widely used models are not autoregressive, and some are multimodal, accepting both protein sequence and structure as input. In this work, we present a depth analysis of seven popular PLM families across model scales, spanning autoregressive, masked, and diffusion objectives, and quantify how layer contributions evolve with depth using a unified set of probing-, perturbation-, and downstream-evaluation measurements. Across models, we observe consistent depth-dependent patterns that extend prior findings on LLMs: a large fraction of task-relevant computation is concentrated in a subset of layers, while the remaining layers mainly provide incremental refinement of the final prediction. These trends persist beyond sequence-only settings and also appear in multimodal PLMs. Taken together, our results suggest that depth inefficiency is a common feature of modern PLMs, motivating future work on more depth-efficient architectures and training methods.

2602.18734 2026-04-27 cs.CL cs.AI

Rethinking Retrieval-Augmented Generation as a Cooperative Decision-Making Problem

Lichang Song, Ting Long, Yi Chang

详情
英文摘要

Retrieval-Augmented Generation (RAG) has demonstrated strong effectiveness in knowledge-intensive tasks by grounding language generation in external evidence. Despite its success, many existing RAG systems are built based on a ranking-centric, asymmetric dependency paradigm, where the generation quality of the generator is highly dependent on reranking results of the reranker. To overcome this limitation, we propose Cooperative Retrieval-Augmented Generation (CoRAG), a framework that treats the reranker and the generator as peer decision-makers rather than being connected through an asymmetric dependency pipeline. By jointly optimizing their behaviors toward a shared task objective, the reranker and generator are encouraged to cooperate, ensuring that document reranking and generation work in concert to improve the final response. Experimental results demonstrate good generalization and improved generation stability of CoRAG, even when the model is trained on only around 10K PopQA samples. Our model released in https://github.com/CoderrrSong/CoRAG.

2602.17681 2026-04-27 cs.LG cs.CL

LATMiX: Learnable Affine Transformations for Microscaling Quantization of LLMs

Ofir Gordon, Lior Dikstein, Arnon Netzer, Idan Achituve, Hai Victor Habi

Comments 32 pages, 6 figures

详情
英文摘要

Post-training quantization (PTQ) is a widely used approach for reducing the memory and compute costs of large language models (LLMs). Recent studies have shown that applying invertible transformations to activations can significantly improve quantization robustness by reducing activation outliers; however, existing approaches are largely restricted to rotation or Hadamard-based transformations. Moreover, most studies focused primarily on traditional quantization schemes, whereas modern hardware increasingly supports the microscaling (MX) data format. Attempts to combine both showed severe performance degradation, leading prior work to introduce assumptions on the transformations. In this work, we take a complementary perspective. First, we provide a theoretical analysis of transformations under MX quantization by deriving a bound on the quantization error. Our analysis emphasizes the importance of accounting for both the activation distribution and the underlying quantization structure. Building on this analysis, we propose LATMiX, a method that generalizes outlier reduction to learnable invertible affine transformations optimized using standard deep learning tools. Experiments show consistent improvements in average accuracy for MX low-bit quantization over strong baselines on a wide range of zero-shot benchmarks, across multiple model sizes.

2602.12469 2026-04-27 cs.LG

Regularized Meta-Learning for Improved Generalization

Noor Islam S. Mohammad, Md Muntaqim Meherab

Comments We have recently encountered author conflicts related to this work and therefore respectfully request the withdrawal of this paper. We believe this step is necessary to address the situation appropriately and maintain academic integrity in the submission

详情
英文摘要

Deep ensemble methods often improve predictive performance, yet they suffer from three practical limitations: redundancy among base models that inflates computational cost and degrades conditioning, unstable weighting under multicollinearity, and overfitting in meta-learning pipelines. We propose a regularized meta-learning framework that addresses these challenges through a four-stage pipeline combining redundancy-aware projection, statistical meta-feature augmentation, and cross-validated regularized meta-models (Ridge, Lasso, and ElasticNet). Our multi-metric de-duplication strategy removes near-collinear predictors using correlation and MSE thresholds ($τ_{\text{corr}}=0.95$), reducing the effective condition number of the meta-design matrix while preserving predictive diversity. Engineered ensemble statistics and interaction terms recover higher-order structure unavailable to raw prediction columns. A final inverse-RMSE blending stage mitigates regularizer-selection variance. On the Playground Series S6E1 benchmark (100K samples, 72 base models), the proposed framework achieves an out-of-fold RMSE of 8.582, improving over simple averaging (8.894) and conventional Ridge stacking (8.627), while matching greedy hill climbing (8.603) with substantially lower runtime (4 times faster). Conditioning analysis shows a 53.7\% reduction in effective matrix condition number after redundancy projection. Comprehensive ablations demonstrate consistent contributions from de-duplication, statistical meta-features, and meta-ensemble blending. These results position regularized meta-learning as a stable and deployment-efficient stacking strategy for high-dimensional ensemble systems.

2602.11931 2026-04-27 cs.CL cs.AI

AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection

Pretam Ray, Pratik Prabhanjan Brahma, Zicheng Liu, Emad Barsoum

Comments 9 pages, 2 Figues

详情
英文摘要

Evolutionary agentic systems intensify the trade-off between computational efficiency and reasoning capability by repeatedly invoking large language models (LLMs) during inference. This setting raises a central question: how can an agent dynamically select an LLM that is sufficiently capable for the current generation step while remaining computationally efficient? While model cascades offer a practical mechanism for balancing this trade-off, existing routing strategies typically rely on static heuristics or external controllers and do not explicitly account for model uncertainty. We introduce AdaptEvolve: Adaptive LLM Selection for Multi-LLM Evolutionary Refinement within an evolutionary sequential refinement framework that leverages intrinsic generation confidence to estimate real-time solvability. Empirical results show that confidence-driven selection yields a favourable Pareto frontier, reducing total inference cost by an average of 37.9% across benchmarks while retaining 97.5% of the upper-bound accuracy of static large-model baselines. Our code is available at https://github.com/raypretam/adaptive_llm_selection.

2602.11157 2026-04-27 cs.CL

Response-Based Knowledge Distillation for Multilingual Jailbreak Prevention Unwittingly Compromises Safety

Max Zhang, Derek Liu, Kai Zhang, Joshua Franco, Haihao Liu

Comments 9 pages, Poster presented at Socially Responsible and Trustworthy Foundation Models at NeurIPS 2025 Workshop

详情
英文摘要

Large language models (LLMs) are increasingly deployed worldwide, yet their safety alignment remains predominantly English-centric. This allows for vulnerabilities in non-English contexts, especially with low-resource languages. We introduce a novel application of knowledge distillation (KD) in the context of multilingual jailbreak prevention, examining its efficacy. We distill the refusal behaviors of a proprietary teacher model (OpenAI o1-mini) with Low-Rank Adaptation (LoRA) into three open-source student models: Meta-Llama-3-8B-Instruct, Gemma-2-2B-IT, and Qwen3-8B, using ~28,000 multilingual jailbreak prompts from XSafety via black-box response-based, parameter-efficient fine-tuning (PEFT). Evaluation on the MultiJail benchmark reveals a counterintuitive behavior: standard fine-tuning on the teacher's ``safe'' refusal data inadvertently increases Jailbreak Success Rate (JSR) for all student models, up to 16.6 percentage points. Our experiments reveal a divergent generalization to unseen languages during distillation, with varying outcomes depending on the base model. By removing a primary source of safety degradation, nuanced `boundary' refusals, we mitigate or even reverse safety declines in student models, although reductions in reasoning performance (GSM8K) persist. Overall, our exploratory study highlights the challenges and potential of KD as a technique for multilingual safety alignment, offering a foundation for future research in this direction.

2602.10698 2026-04-27 cs.CV cs.AI

AugVLA-3D: Depth-Driven Feature Augmentation for Vision-Language-Action Models

Zhifeng Rao, Wenlong Chen, Lei Xie, Xia Hua, Dongfu Yin, Zhen Tian, F. Richard Yu

详情
Journal ref
ICRA2026
英文摘要

Vision-Language-Action (VLA) models have recently achieved remarkable progress in robotic perception and control, yet most existing approaches primarily rely on VLM trained using 2D images, which limits their spatial understanding and action grounding in complex 3D environments. To address this limitation, we propose a novel framework that integrates depth estimation into VLA models to enrich 3D feature representations. Specifically, we employ a depth estimation baseline called VGGT to extract geometry-aware 3D cues from standard RGB inputs, enabling efficient utilization of existing large-scale 2D datasets while implicitly recovering 3D structural information. To further enhance the reliability of these depth-derived features, we introduce a new module called action assistant, which constrains the learned 3D representations with action priors and ensures their consistency with downstream control tasks. By fusing the enhanced 3D features with conventional 2D visual tokens, our approach significantly improves the generalization ability and robustness of VLA models. Experimental results demonstrate that the proposed method not only strengthens perception in geometrically ambiguous scenarios but also leads to superior action prediction accuracy. This work highlights the potential of depth-driven data augmentation and auxiliary expert supervision for bridging the gap between 2D observations and 3D-aware decision-making in robotic systems.

2602.07038 2026-04-27 cs.CV cs.CL

UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents

Yifan Ji, Zhipeng Xu, Zhenghao Liu, Zulong Chen, Qian Zhang, Zhibo Yang, Junyang Lin, Yu Gu, Ge Yu, Maosong Sun

详情
英文摘要

Key Information Extraction (KIE) from real-world documents remains challenging due to substantial variations in layout structures, visual quality, and task-specific information requirements. Recent Large Multimodal Models (LMMs) have shown promising potential for performing end-to-end KIE directly from document images. To enable a comprehensive and systematic evaluation across realistic and diverse application scenarios, we introduce UNIKIE-BENCH, a unified benchmark designed to rigorously evaluate the KIE capabilities of LMMs. UNIKIE-BENCH consists of two complementary tracks: a constrained-category KIE track with scenario-predefined schemas that reflect practical application needs, and an open-category KIE track that extracts any key information that is explicitly present in the document. Experiments on 15 state-of-the-art LMMs reveal substantial performance degradation under diverse schema definitions, long-tail key fields, and complex layouts, along with pronounced performance disparities across different document types and scenarios. These findings underscore persistent challenges in grounding accuracy and layout-aware reasoning for LMM-based KIE. All codes and datasets are available at https://github.com/NEUIR/UNIKIE-BENCH.

2602.06974 2026-04-27 cs.RO cs.CV

FeudalNav: A Simple Framework for Visual Navigation

Faith Johnson, Bryan Bo Cao, Shubham Jain, Ashwin Ashok, Kristin Dana

Comments 8 Pages, 6 figures and 4 tables. arXiv admin note: substantial text overlap with arXiv:2411.09893, arXiv:2402.12498

详情
英文摘要

Visual navigation for robotics is inspired by the human ability to navigate environments using visual cues and memory, eliminating the need for detailed maps. In unseen, unmapped, or GPS-denied settings, traditional metric map-based methods fall short, prompting a shift toward learning-based approaches with minimal exploration. In this work, we develop a hierarchical framework that decomposes the navigation decision-making process into multiple levels. Our method learns to select subgoals through a simple, transferable waypoint selection network. A key component of the approach is a latent-space memory module organized solely by visual similarity, as a proxy for distance. This alternative to graph-based topological representations proves sufficient for navigation tasks, providing a compact, light-weight, simple-to-train navigator that can find its way to the goal in novel locations. We show competitive results with a suite of SOTA methods in Habitat AI environments without using any odometry in training or inference. An additional contribution leverages the interpretablility of the framework for interactive navigation. We consider the question: how much direction intervention/interaction is needed to achieve success in all trials? We demonstrate that even minimal human involvement can significantly enhance overall navigation performance.

2602.06019 2026-04-27 cs.CL cs.LG

Multi-Token Prediction via Self-Distillation

John Kirchenbauer, Abhimanyu Hans, Brian Bartoldson, Micah Goldblum, Ashwinee Panda, Tom Goldstein

Comments 9 pages and 5 figures in the main body

详情
英文摘要

Existing techniques for accelerating language model inference, such as speculative decoding, require training auxiliary speculator models and building and deploying complex inference pipelines. We consider a new approach for converting a pretrained autoregressive language model from a slow single next token prediction model into a fast standalone multi-token prediction model using a simple online distillation objective. The final model retains the exact same implementation as the pretrained initial checkpoint and is deployable without the addition of any auxiliary verifier or other specialized inference code. Our method produces models that decode more than $3\times$ faster at $<5\%$ drop in accuracy on GSM8K relative to the single token decoding performance of the same checkpoint.

2602.05639 2026-04-27 cs.LG stat.ML

Joint Embedding Variational Bayes

Amin Oji, Paul Fieguth

详情
Journal ref
Transactions on Machine Learning Research, April 2026
英文摘要

We introduce Variational Joint Embedding (VJE), a reconstruction-free latent-variable framework for non-contrastive self-supervised learning in representation space. VJE maximizes a symmetric conditional evidence lower bound (ELBO) on paired encoder embeddings by defining a conditional likelihood directly on target representations, rather than optimizing a pointwise compatibility objective. The likelihood is instantiated as a heavy-tailed Student--\(t\) distribution on a polar representation of the target embedding, where a directional--radial decomposition separates angular agreement from magnitude consistency and mitigates norm-induced pathologies. The directional factor operates on the unit sphere, yielding a valid variational bound for the associated spherical subdensity model. An amortized inference network parameterizes a diagonal Gaussian posterior whose feature-wise variances are shared with the directional likelihood, yielding anisotropic uncertainty without auxiliary projection heads. Across ImageNet-1K, CIFAR-10/100, and STL-10, VJE is competitive with standard non-contrastive baselines under linear and \(k\)-NN evaluation, while providing probabilistic semantics directly in representation space for downstream uncertainty-aware applications. We validate these semantics through out-of-distribution detection, where representation-space likelihoods yield strong empirical performance. These results position the framework as a principled variational formulation of non-contrastive learning, in which structured feature-wise uncertainty is represented directly in the learned embedding space.

2602.00208 2026-04-27 cs.LG cs.AI cs.IR math.ST stat.ML stat.TH

Analyzing Shapley Additive Explanations to Understand Anomaly Detection Algorithm Behaviors and Their Complementarity

Jordan Levy, Paul Saves, Moncef Garouani, Nicolas Verstaevel, Benoit Gaudou

Comments IDA Frontier Prize and Best Paper Award -Intelligent Data Analysis (IDA) 2026, Springer Nature

详情
Journal ref
In: IDA (LNCS), Springer, vol 16513 (2026)
英文摘要

Unsupervised anomaly detection is a challenging problem due to the diversity of data distributions and the lack of labels. Ensemble methods are often adopted to mitigate these challenges by combining multiple detectors, which can reduce individual biases and increase robustness. Yet building an ensemble that is genuinely complementary remains challenging, since many detectors rely on similar decision cues and end up producing redundant anomaly scores. As a result, the potential of ensemble learning is often limited by the difficulty of identifying models that truly capture different types of irregularities. To address this, we propose a methodology for characterizing anomaly detectors through their decision mechanisms. Using SHapley Additive exPlanations, we quantify how each model attributes importance to input features, and we use these attribution profiles to measure similarity between detectors. We show that detectors with similar explanations tend to produce correlated anomaly scores and identify largely overlapping anomalies. Conversely, explanation divergence reliably indicates complementary detection behavior. Our results demonstrate that explanation-driven metrics offer a different criterion than raw outputs for selecting models in an ensemble. However, we also demonstrate that diversity alone is insufficient; high individual model performance remains a prerequisite for effective ensembles. By explicitly targeting explanation diversity while maintaining model quality, we are able to construct ensembles that are more diverse, more complementary, and ultimately more effective for unsupervised anomaly detection.

2601.23022 2026-04-27 cs.CL

DimABSA: Building Multilingual and Multidomain Datasets for Dimensional Aspect-Based Sentiment Analysis

Lung-Hao Lee, Liang-Chih Yu, Natalia Loukashevich, Ilseyar Alimova, Alexander Panchenko, Tzu-Mi Lin, Zhe-Yu Xu, Jian-Yu Zhou, Guangmin Zheng, Jin Wang, Sharanya Awasthi, Jonas Becker, Jan Philip Wahle, Terry Ruas, Shamsuddeen Hassan Muhammad, Saif M. Mohammad

Comments accepted at ACL 2026

详情
英文摘要

Aspect-Based Sentiment Analysis (ABSA) focuses on extracting sentiment at a fine-grained aspect level and has been widely applied across real-world domains. However, existing ABSA research relies on coarse-grained categorical labels (e.g., positive, negative), which limits its ability to capture nuanced affective states. To address this limitation, we adopt a dimensional approach that represents sentiment with continuous valence-arousal (VA) scores, enabling fine-grained analysis at both the aspect and sentiment levels. To this end, we introduce DimABSA, the first multilingual, dimensional ABSA resource annotated with both traditional ABSA elements (aspect terms, aspect categories, and opinion terms) and newly introduced VA scores. This resource contains 76,958 aspect instances across 42,590 sentences, spanning six languages and four domains. We further introduce three subtasks that combine VA scores with different ABSA elements, providing a bridge from traditional ABSA to dimensional ABSA. Given that these subtasks involve both categorical and continuous outputs, we propose a new unified metric, continuous F1 (cF1), which incorporates VA prediction error into standard F1. We provide a comprehensive benchmark using both prompted and fine-tuned large language models across all subtasks. Our results show that DimABSA is a challenging benchmark and provides a foundation for advancing multilingual dimensional ABSA. We publicly released the DimABSA dataset, which was used for Track A of SemEval-2026 Task 3, attracting over 300 participants.

2601.19674 2026-04-27 cs.LG cs.AI stat.AP stat.ME

Cross-Domain Offshore Wind Power Forecasting: Transfer Learning Through Meteorological Clusters

Dominic Weisser, Chloé Hashimoto-Cullen, Benjamin Guedj

Comments 15 pages, 5 figures, Climate Informatics 2026

详情
英文摘要

Ambitious decarbonisation targets are rapidly increasing the commission of new offshore wind farms. For these newly commissioned plants to run, accurate power forecasts are needed from the onset. These allow grid stability, good reserve management and efficient energy trading. Despite machine learning models having strong performances, they tend to require large volumes of site-specific data that new farms do not yet have. To overcome this data scarcity, we propose a novel transfer learning framework that clusters power output according to covariate meteorological features. Rather than training a single, general-purpose model, we thus forecast with an ensemble of expert models, each trained on a cluster. As these pre-trained models each specialise in a distinct weather pattern, they adapt efficiently to new sites and capture transferable, climate-dependent dynamics. Our contributions are two-fold - we propose this novel framework and comprehensively evaluate it on eight offshore wind farms, achieving accurate cross-domain forecasting with under five months of site-specific data. Our experiments achieve a MAE of 3.52\%, providing empirical verification that reliable forecasts do not require a full annual cycle. Beyond power forecasting, this climate-aware transfer learning method opens new opportunities for offshore wind applications such as early-stage wind resource assessment, where reducing data requirements can significantly accelerate project development whilst effectively mitigating its inherent risks.

2601.14541 2026-04-27 cs.LG cs.AI cs.AR

Report for NSF Workshop on AI for Electronic Design Automation

Deming Chen, Vijay Ganesh, Weikai Li, Yingyan Celine Lin, Yong Liu, Subhasish Mitra, David Z. Pan, Ruchir Puri, Jason Cong, Yizhou Sun

Comments Accepted by IEEE Circuits and Systems Magazine (2026). This is the accepted version. The published version is available at https://ieeexplore.ieee.org/document/11466406

详情
Journal ref
IEEE Circuits and Systems Magazine, vol. 26, no. 1, First Quarter 2026
英文摘要

This report distills the discussions and recommendations from the NSF Workshop on AI for Electronic Design Automation (EDA), held on December 10, 2024 in Vancouver alongside NeurIPS 2024. Bringing together experts across machine learning and EDA, the workshop examined how AI-spanning large language models (LLMs), graph neural networks (GNNs), reinforcement learning (RL), neurosymbolic methods, etc.-can facilitate EDA and shorten design turnaround. The workshop includes four themes: (1) AI for physical synthesis and design for manufacturing (DFM), discussing challenges in physical manufacturing process and potential AI applications; (2) AI for high-level and logic-level synthesis (HLS/LLS), covering pragma insertion, program transformation, RTL code generation, etc.; (3) AI toolbox for optimization and design, discussing frontier AI developments that could potentially be applied to EDA tasks; and (4) AI for test and verification, including LLM-assisted verification tools, ML-augmented SAT solving, security/reliability challenges, etc. The report recommends NSF to foster AI/EDA collaboration, invest in foundational AI for EDA, develop robust data infrastructures, promote scalable compute infrastructure, and invest in workforce development to democratize hardware design and enable next-generation hardware systems. The workshop information can be found on the website https://ai4eda-workshop.github.io/.

2601.14287 2026-04-27 cs.LG

Chain-of-Memory: Lightweight Memory Construction with Dynamic Evolution for LLM Agents

Xiucheng Xu, Bingbing Xu, Xueyun Tian, Zihe Huang, Rongxin Chen, Yunfan Li, Huawei Shen

详情
英文摘要

External memory systems are pivotal for enabling Large Language Model (LLM) agents to maintain persistent knowledge and perform long-horizon decision-making. Existing paradigms typically follow a two-stage process: computationally expensive memory construction (e.g., structuring data into graphs) followed by naive retrieval-augmented generation. However, our empirical analysis reveals two fundamental limitations: complex construction incurs high costs with marginal performance gains, and simple context concatenation fails to bridge the gap between retrieval recall and reasoning accuracy. To address these challenges, we propose CoM (Chain-of-Memory), a novel framework that advocates for a paradigm shift toward lightweight construction paired with sophisticated utilization. CoM introduces a Chain-of-Memory mechanism that organizes retrieved fragments into coherent inference paths through dynamic evolution, utilizing adaptive truncation to prune irrelevant noise. Extensive experiments on the LongMemEval and LoCoMo benchmarks demonstrate that CoM outperforms strong baselines with accuracy gains of 7.5%-10.4%, while drastically reducing computational overhead to approximately 2.7% of token consumption and 6.0% of latency compared to complex memory architectures.

2601.13895 2026-04-27 cs.CV cs.AI

OmniOVCD: Streamlining Open-Vocabulary Change Detection with SAM 3

Xu Zhang, Danyang Li, Yingjie Xia, Xiaohang Dong, Hualong Yu, Jianye Wang, Qicheng Li

详情
英文摘要

Change Detection (CD) is a fundamental task in remote sensing. It monitors the evolution of land cover over time. Based on this, Open-Vocabulary Change Detection (OVCD) introduces a new requirement. It aims to reduce the reliance on predefined categories. Existing training-free OVCD methods mostly use CLIP to identify categories. These methods also need extra models like DINO to extract features. However, combining different models often causes problems in matching features and makes the system unstable. Recently, the Segment Anything Model 3 (SAM 3) is introduced. It integrates segmentation and identification capabilities within one promptable model, which offers new possibilities for the OVCD task. In this paper, we propose OmniOVCD, a standalone framework designed for OVCD. By leveraging the decoupled output heads of SAM 3, we propose a Synergistic Fusion to Instance Decoupling (SFID) strategy. SFID first fuses the semantic, instance, and presence outputs of SAM 3 to construct land-cover masks, and then decomposes them into individual instance masks for change comparison. This design preserves high accuracy in category recognition and maintains instance-level consistency across images. As a result, the model can generate accurate change masks. Experiments on four public benchmarks (LEVIR-CD, WHU-CD, S2Looking, and SECOND) demonstrate SOTA performance, achieving IoU scores of 67.2, 66.5, 24.5, and 27.1 (class-average), respectively, surpassing all previous methods. The code is available at https://github.com/Erxucomeon/OmniOVCD.

2601.12979 2026-04-27 cs.CL

The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check

Qingyu Lu, Liang Ding, Kanjian Zhang, Jinxia Zhang, Dacheng Tao

Comments ACL 2026 - Main Conference

详情
英文摘要

The pursuit of real-time agentic interaction has driven interest in Diffusion-based Large Language Models (dLLMs) as alternatives to auto-regressive backbones, promising to break the sequential latency bottleneck. However, does such efficiency gains translate into effective agentic behavior? In this work, we present a comprehensive evaluation of dLLMs (e.g., LLaDA, Dream) across two distinct agentic paradigms: Embodied Agents (requiring long-horizon planning) and Tool-Calling Agents (requiring precise formatting). Contrary to the efficiency hype, our results on Agentboard and BFCL reveal a "bitter lesson": current dLLMs fail to serve as reliable agentic backbones, frequently leading to systematically failure. (1) In Embodied settings, dLLMs suffer repeated attempts, failing to branch under temporal feedback. (2) In Tool-Calling settings, dLLMs fail to maintain symbolic precision (e.g. strict JSON schemas) under diffusion noise. To assess the potential of dLLMs in agentic workflows, we introduce DiffuAgent, a multi-agent evaluation framework that integrates dLLMs as plug-and-play cognitive cores. Our analysis shows that dLLMs are effective in non-causal roles (e.g., memory summarization and tool selection) but require the incorporation of causal, precise, and logically grounded reasoning mechanisms into the denoising process to be viable for agentic tasks.

2601.12430 2026-04-27 cs.CL

System-Mediated Attention Imbalances Make Vision-Language Models Say Yes

Tsan Tsai Chan, Varsha Suresh, Anisha Saha, Michael Hahn, Vera Demberg

Comments Accepted to ACL Findings 2026

详情
英文摘要

Vision-language model (VLM) hallucination is commonly linked to imbalanced allocation of attention across input modalities: system, image and text. However, existing mitigation strategies tend towards an image-centric interpretation of these imbalances, often prioritising increased image attention while giving less consideration to the roles of the other modalities. In this study, we evaluate a more holistic, system-mediated account, which attributes these imbalances to functionally redundant system weights that reduce attention to image and textual inputs. We show that this framework offers a useful empirical perspective on the yes-bias, a common form of hallucination in which VLMs indiscriminately respond `yes'. Causally redistributing attention from the system modality to image and textual inputs substantially suppresses this bias, often outperforming existing approaches. We further present evidence suggesting that system-mediated attention imbalances contribute to the yes-bias by encouraging a default reliance on coarse input representations, which are effective for some tasks but ill-suited to others. Taken together, these findings firmly establish system attention as a key factor in VLM hallucination and highlight its potential as a lever for mitigation.

2601.11719 2026-04-27 cs.LG hep-ex

jBOT: Semantic Jet Representation Clustering Emerges from Self-Distillation

Ho Fung Tsoi, Dylan Rankin

Comments Under review

详情
英文摘要

Self-supervised learning, in the context of foundation model training, is a powerful pre-training method for learning feature representations without labels, which often capture generic underlying semantics from the data and can later be fine-tuned for downstream tasks. In this work, we introduce jBOT, a pre-training method based on self-distillation for jet data from the CERN Large Hadron Collider, which combines local particle-level distillation with global jet-level distillation to learn jet representations that support downstream tasks such as anomaly detection and classification. We observe that pre-training on unlabeled jets leads to emergent semantic class clustering in the representation space. The clustering in the frozen embedding, when pre-trained on background jets only, enables anomaly detection via simple distance-based metrics, and the learned embedding can be fine-tuned for classification with improved performance compared to supervised models trained from scratch.

2601.11020 2026-04-27 cs.CL

From Interpretability to Performance: Optimizing Retrieval Heads for Long-Context Language Models

Youmi Ma, Naoaki Okazaki

Comments Findings of ACL 2026; Source code available at https://github.com/YoumiMa/RetMask

详情
英文摘要

Advances in mechanistic interpretability have identified special attention heads, known as retrieval heads, that are responsible for retrieving information from the context. However, the role of these retrieval heads in improving model performance remains unexplored. This work investigates whether retrieval heads can be leveraged to enhance the long-context capabilities of LLMs. Specifically, we propose RetMask, a method that generates training signals by contrasting normal model outputs with those from an ablated variant in which the retrieval heads are masked. This mechanism-based approach achieves substantial improvements: +2.28 points on HELMET at 128K for Llama-3.1, with +70% gains on generation with citation and +32% on passage re-ranking, while preserving performance on general tasks. Experiments across four models in three families demonstrate that RetMask consistently improves long-context performance, where gains correlate with the sparsity of the retrieval score distribution: models with sparser distributions, where retrieval capabilities are concentrated in a small set of heads, respond more strongly, while those with less sparse distributions show more modest gains. These results validate the functional role of retrieval heads and show that mechanistic insights can be transformed into performance enhancements.

2601.10386 2026-04-27 cs.CV cs.AI cs.MM

Handling Missing Modalities in Multimodal Survival Prediction for Non-Small Cell Lung Cancer

Filippo Ruffini, Camillo Maria Caruso, Claudia Tacconi, Lorenzo Nibid, Francesca Miccolis, Marta Lovino, Carlo Greco, Edy Ippolito, Michele Fiore, Alessio Cortellini, Bruno Beomonte Zobel, Giuseppe Perrone, Bruno Vincenzi, Claudio Marrocco, Alessandro Bria, Elisa Ficarra, Sara Ramella, Valerio Guarrasi, Paolo Soda

详情
英文摘要

Accurate survival prediction in Non-Small Cell Lung Cancer (NSCLC) requires integrating clinical, radiological, and histopathological data. Multimodal Deep Learning (MDL) can improve precision prognosis, but small cohorts and missing modalities limit its clinical applicability, as conventional approaches enforce complete case filtering or imputation. We present a missing-aware multimodal survival framework that combines Computed Tomography (CT), Whole-Slide Histopathology Images (WSI), and structured clinical variables for overall survival modeling in unresectable stage II-III NSCLC. The framework uses Foundation Models (FMs) for modality-specific feature extraction and a missing-aware encoding strategy that enables intermediate multimodal fusion under naturally incomplete modality profiles. By design, the architecture processes all available data without dropping patients during training or inference. Intermediate fusion outperforms unimodal baselines and both early and late fusion strategies, with the trimodal configuration reaching a C-index of 74.42. Modality-importance analyses show that the fusion model adapts its reliance on each data stream according to representation informativeness, shaped by the alignment between FM pretraining objectives and the survival task. The learned risk scores produce clinically meaningful stratification of disease progression and metastatic risk, with statistically significant log-rank tests across all modality combinations, supporting the translational relevance of the proposed framework.

2601.07447 2026-04-27 cs.CV

PanoSAMic: Panoramic Image Segmentation from SAM Feature Encoding and Dual View Fusion

Mahdi Chamseddine, Didier Stricker, Jason Rambach

Comments Accepted in ICPR 2026

详情
英文摘要

Existing image foundation models are not optimized for spherical images having been trained primarily on perspective images. PanoSAMic integrates the pre-trained Segment Anything (SAM) encoder to make use of its extensive training and integrate it into a semantic segmentation model for panoramic images using multiple modalities. We modify the SAM encoder to output multi-stage features and introduce a novel spatio-modal fusion module that allows the model to select the relevant modalities and best features from each modality for different areas of the input. Furthermore, our semantic decoder uses spherical attention and dual view fusion to overcome the distortions and edge discontinuity often associated with panoramic images. PanoSAMic achieves state-of-the-art (SotA) results on Stanford2D3DS for RGB, RGB-D, and RGB-D-N modalities and on Matterport3D for RGB and RGB-D modalities. https://github.com/dfki-av/PanoSAMic

2601.05414 2026-04-27 cs.CL cs.AI stat.ML

Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions

Minda Zhao, Yilun Du, Mengyu Wang

Comments Accepted to ACL 2026 (Main Conference)

详情
英文摘要

As large language models (LLMs) transition from chat interfaces to integral components of stochastic pipelines and systems approaching general intelligence, the ability to faithfully sample from specified probability distributions has become a functional requirement rather than a theoretical curiosity. We present the first large-scale, statistically powered audit of native probabilistic sampling in frontier LLMs, benchmarking 11 models across 15 distributions. To disentangle failure modes, we employ a dual-protocol design: Batch Generation, where a model produces $N{=}1000$ samples within one response, and Independent Requests, comprising $N{=}1000$ stateless calls. We observe a sharp protocol asymmetry: batch generation achieves only modest statistical validity, with a 7% median pass rate, while independent requests collapse almost entirely, with 10 of 11 models passing none of the distributions. Beyond this asymmetry, we reveal that sampling fidelity degrades monotonically with distributional complexity and aggravates as the sampling horizon $N$ increases. Finally, we demonstrate how the propagation of these failures into downstream real-world application tasks introduces systematic biases: models fail to enforce uniform answer-position constraints in Multiple Choice Question generation and systematically violate demographic targets in attribute-constrained text-to-image prompt synthesis. These findings indicate that current LLMs lack a functional internal sampler, necessitating external tools for applications requiring statistical guarantees.

2601.03779 2026-04-27 cs.CL

Tracing the complexity profiles of different linguistic phenomena through the intrinsic dimension of LLM representations

Marco Baroni, Emily Cheng, Iria de-Dios-Flores, Francesca Franzon

详情
英文摘要

We explore intrinsic dimension (ID) of LLM representations as a marker of linguistic complexity. Specifically, we test whether ID differences across model layers reflect well-known complexity contrasts established in (psycho)linguistics: coordination vs. subordination, right-branching vs. center-embedding, and unambiguous vs. ambiguous attachment. Our results on six different LLMs show that these contrasts are consistently reflected in ID differences, with more complex phenomena eliciting higher ID profiles. Notably, ID differences emerge at different points across layers for different contrasts, also reaching their peaks at different stages. Further experiments using representational similarity and layer pruning confirm the trends. We conclude that ID is a useful marker of linguistic complexity in LLMs, that it points to similar linguistic processing steps across disparate LLMs, and that it has the potential to differentiate between different types of complexity.