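arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.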

2604.28082 2026-05-01 cs.AI

Characterizing the Consistency of the Emergent Misalignment Persona

Anietta Weckauff, Yuchen Zhang, Maksym Andriushchenko

Abstract

Fine-tuning large language models (LLMs) on narrowly misaligned data generalizes to broadly misaligned behavior, a phenomenon termed emergent misalignment (EM). While prior work has found a correlation between harmful behavior and self-assessment in emergently misaligned models, it remains unclear how consistent this correspondence is across tasks and whether it varies across fine-tuning domains. We characterize the consistency of the EM persona by fine-tuning Qwen 2.5 32B Instruct on six narrowly misaligned domains (e.g., insecure code, risky financial advice, bad medical advice) and administering experiments including harmfulness evaluation, self-assessment, choosing between two descriptions of AI systems, output recognition, and score prediction. Our results reveal two distinct patterns: coherent-persona models, in which harmful behavior and self-reported misalignment are coupled, and inverted-persona models, which produce harmful outputs while identifying as aligned AI systems. These findings reveal a more fine-grained picture of the effects of emergent misalignment, calling into question the consistency of the EM persona.

2604.28078 2026-05-01 cs.CV

AesRM: Improving Video Aesthetics with Expert-Level Feedback

Yujin Han, Yujie Wei, Yefei He, Xinyu Liu, Tianle Li, Zichao Yu, Andi Han, Shiwei Zhang, Tingyu Weng, Difan Zou

Comments 37 pages, 14 figures, 12 tables

Abstract

Despite rapid advances in photorealistic video generation, real-world applications such as filmmaking require video aesthetics, e.g., harmonious colors and cinematic lighting, beyond visual fidelity. Prior work on visual aesthetics largely focuses on images, often reducing aesthetics to coarse definitions, e.g., visual pleasure, without a rigorous and systematic evaluation. To improve video aesthetics, we propose a hierarchical rubric that decomposes video aesthetics into three core dimensions, Visual Aesthetics (VA), Visual Fidelity (VF), and Visual Plausibility (VP), with 15 fine-grained criteria, e.g., shot composition. This framework enables a large-scale expert-annotated preference dataset and an evaluation benchmark, AesVideo-Bench, containing about 2500 video pairs with expert annotations on VA, VF, and VP. We then build a family of Video Aesthetic Reward Models (AesRM): AesRM-Base, which directly predicts pairwise preferences on these dimensions to provide efficient post-training rewards, and AesRM-CoT, which additionally generates CoT aligned with all 15 criteria to improve assessment interpretability. Specifically, we train AesRM with a three-stage progressive scheme: (1) Atomic Aesthetic Capability Learning, which strengthens AesRM's recognition of fundamental aesthetic concepts, e.g., accurately identifying centered composition; (2) Cold-Start, aligning the model with structured reasoning protocols; and (3) GRPO, further improving evaluation accuracy. To enhance AesRM-CoT, we additionally propose self-consistency-based CoT synthesis to improve CoT quality and design CoT-based process rewards during GRPO. Extensive experiments show AesRM outperforms baselines on multiple aesthetics benchmarks and is more robust, with lower position bias. Finally, we align Wan2.2 with AesRM and observe clear aesthetic gains over existing aesthetic reward models.
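The abstract reports that AesRM shows lower position bias but does not say how the bias is measured. One common way to quantify it for a pairwise judge is to show each pair in both orders and count how often the verdict flips. The sketch below illustrates that idea only; the `Judge` interface and the function name are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface (an assumption, not the AesRM API): a pairwise judge
# returns 0 if it prefers the first item shown and 1 if it prefers the second.
Judge = Callable[[str, str], int]


def position_bias_rate(judge: Judge, pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of pairs whose preferred item changes when the order is swapped."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    inconsistent = 0
    for a, b in pairs:
        first = judge(a, b)    # 0 means a preferred, 1 means b preferred
        swapped = judge(b, a)  # 0 means b preferred, 1 means a preferred
        # An order-independent judge picks the same item both times,
        # so the two returned indices must differ.
        if first == swapped:
            inconsistent += 1
    return inconsistent / len(pairs)
```

A judge that always prefers whichever item is shown first scores 1.0 under this measure, and a judge whose choice depends only on the content scores 0.0.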

2604.28076 2026-05-01 cs.CL cs.AI cs.LG

TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering

An-Yang Ji, Jun-Peng Jiang, De-Chuan Zhan, Han-Jia Ye

2604.28070 2026-05-01 cs.LG

A Unified Framework of Hyperbolic Graph Representation Learning Methods

Sofía Pérez Casulo, Marcelo Fiori, Bernardo Marenco, Federico Larroca

Comments submitted

2604.28064 2026-05-01 cs.CV

3D Reconstruction Techniques in the Manufacturing Domain: Applications, Research Opportunities and Use Cases

Chialoon Cheng, Kaijun Liu, Zhiyang Liu, Marcelo H Ang

Comments 24 pages

2604.28057 2026-05-01 cs.RO cs.MA

Framework for Collaborative Operation of Autonomous Delivery Vehicles Within a Marshaling Yard

James O'Hara, Karl Wunderlich, Gregory Stevens

2604.28056 2026-05-01 cs.AI

RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses

Feiyu Wu, Xu Zheng, Zhuocheng Wang, Yi ming Dai, Hui Li

2604.28055 2026-05-01 cs.LG cs.AI eess.IV

PROMISE-AD: Progression-aware Multi-horizon Survival Estimation for Alzheimer's Disease Progression and Dynamic Tracking

Qing Lyu, Jeremy Hudson, Mohammad Kawas, Yuming Jiang, Chenyu You, Christopher T Whitlow

Abstract

Individualized Alzheimer's disease (AD) progression prediction requires models that use irregular visits, account for censoring, avoid diagnostic leakage, and provide calibrated horizon risks. We propose PROgression-aware MultI-horizon Survival Estimation for Alzheimer's Disease (PROMISE-AD), a leakage-safe survival framework for predicting conversion from cognitively normal (CN) to mild cognitive impairment (MCI) and from MCI to AD dementia using ADNI/TADPOLE tabular histories. PROMISE-AD converts pre-index visits into tokens with standardized measurements, missingness masks, longitudinal changes, time-normalized slopes, visit timing, and non-diagnostic categorical attributes. A temporal Transformer fuses global, attention-pooled, and latest-visit representations to estimate a progression score and latent discrete-time mixture hazards. Training combines survival likelihood, horizon-specific focal risk loss, progression ranking, hazard smoothness, and mixture-balance regularization, followed by validation-set isotonic calibration for 1-, 2-, 3-, and 5-year risks. In held-out testing across three seeds, PROMISE-AD achieved an integrated Brier score (IBS) of 0.085 $\pm$ 0.012, C-index of 0.808 $\pm$ 0.015, and mean time-dependent AUC of 0.840 $\pm$ 0.081 for CN-to-MCI conversion, yielding the lowest IBS among compared methods. For MCI-to-AD conversion, PROMISE-AD achieved the highest C-index (0.894 $\pm$ 0.018) and near-ceiling 5-year discrimination (AUROC 0.997 $\pm$ 0.003; AUPRC 0.999 $\pm$ 0.001), although some baselines had lower IBS. Ablations and interpretability supported longitudinal change features, fused temporal representations, mixture hazards, cognitive and functional measures, APOE4 status, and recent conversion-proximal visits. These findings suggest that progression-aware survival modeling can provide interpretable multi-horizon AD conversion risk estimates.
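The abstract describes discrete-time hazards that are turned into 1-, 2-, 3-, and 5-year risks. As a rough illustration of the standard relationship between the two, the sketch below converts a single sequence of per-interval hazards into a survival curve, S(t) = prod over k <= t of (1 - h_k), and reads off the cumulative risk 1 - S(t) at each horizon. It assumes one hazard per year and omits the mixture components and the isotonic calibration mentioned in the abstract; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def survival_curve(hazards: Sequence[float]) -> List[float]:
    """S(t) for t = 1..T, given the conditional event probability in each interval."""
    curve: List[float] = []
    surviving = 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        surviving *= 1.0 - h  # probability of no conversion through this interval
        curve.append(surviving)
    return curve


def horizon_risks(hazards: Sequence[float], horizons: Sequence[int]) -> Dict[int, float]:
    """Cumulative conversion risk 1 - S(t) at each requested horizon (1-indexed)."""
    curve = survival_curve(hazards)
    risks: Dict[int, float] = {}
    for t in horizons:
        if t < 1 or t > len(curve):
            raise ValueError(f"horizon {t} is outside the modeled range")
        risks[t] = 1.0 - curve[t - 1]
    return risks


if __name__ == "__main__":
    # Illustrative yearly hazards only; these are not values from the paper.
    print(horizon_risks([0.1, 0.2, 0.0, 0.5, 0.5], [1, 2, 3, 5]))
```

Because the survival curve is a running product of values in [0, 1], the horizon risks are non-decreasing in the horizon, which is the property a calibrated multi-horizon risk estimate is expected to keep.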

2604.28049 2026-05-01 cs.AI

Agent-Agnostic Evaluation of SQL Accuracy in Production Text-to-SQL Systems

Taslim Jamal Arif, Kuldeep Singh

2604.28043 2026-05-01 cs.AI

Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents

Rahul Ramachandran, Nidhi Jha, Muthukumaran Ramasubramanian

2604.28039 2026-05-01 cs.AI

SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images

Jialu Shen, Han Lyu, Suyang Zhong, Hanzheng Li, Haoyi Tao, Nan Wang, Changhong Chen, Xi Fang

2604.28038 2026-05-01 cs.LG

Early Detection of Water Stress by Plant Electrophysiology: Machine Learning for Irrigation Management

Eduard Buss, Till Aust, Heiko Hamann

2604.28036 2026-05-01 cs.LG cs.IT math.IT

Exponential families from a single KL identity

Marc Dymetman

2604.28034 2026-05-01 cs.CL physics.soc-ph

Ease of dependency distance minimization in star-like structures

Emília Garcia-Casademont, Ramon Ferrer-i-Cancho

2604.28032 2026-05-01 cs.LG

Shuffling-Aware Optimization for Private Vector Mean Estimation

Shun Takagi, Seng Pei Liew

2604.28030 2026-05-01 cs.LG cs.AI cs.CY cs.IT math.IT

MIFair: A Mutual-Information Framework for Intersectionality and Multiclass Fairness

Jeanne Monnier, Thomas George, Frédéric Guyard, Christèle Tarnec, Marios Kountouris

2604.28028 2026-05-01 cs.CL cs.AI cs.DB cs.IR

Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding

Smit Jivani, Sarvam Maheshwari, Sunita Sarawagi

Comments Project Code: https://github.com/SSLab-CSE-IITB/tecod

2604.28025 2026-05-01 cs.CV

ResiHMR: Residual-Limb Aware Single-Image 3D Human Mesh Recovery for Individuals with Limb Loss

Jiaying Ying, Heming Du, Kaihao Zhang, Sean M. Tweedy, Xin Yu

Comments Highlight in CVPR 2026. Project at https://akitaraphael.github.io/ResiHMR/

2604.28024 2026-05-01 cs.LG

FedHarmony: Harmonizing Heterogeneous Label Correlations in Federated Multi-Label Learning

Zhiqiang Kou, Junxiang Wu, Wenke Huang, Wenwen He, Ming-Kun Xie, Changwei Wang, Yuheng Jia, Di Jiang, Yang Liu, Xin Geng, Qiang Yang

Comments Accepted by CVPR 2026. 11 pages, 6 figures

2604.28022 2026-05-01 cs.CV

Are DeepFakes Realistic Enough? Exploring Semantic Mismatch as a Novel Challenge

Sharayu Nilesh Deshmukh, Kailash A. Hambarde, Joana C. Costa, Hugo Proença, Tiago Roxo

Comments Submitted to IJCB 2026

2604.28016 2026-05-01 cs.CV cs.GR cs.LG

Faster 3D Gaussian Splatting Convergence via Structure-Aware Densification

Linjie Lyu, Ayush Tewari, Jianchun Chen, Thomas Leimkühler, Christian Theobalt

Comments SIGGRAPH 2026

DOI: 10.1145/3799902.3811212

Abstract

3D Gaussian Splatting has emerged as a powerful scene representation for real-time novel-view synthesis. However, its standard adaptive density control relies on screen-space positional gradients, which do not distinguish between geometric misplacement and frequency aliasing, often leading to either over-blurred high-frequency textures or inefficient over-densification. We present a structure-aware densification framework. Our key insight is that the decision to subdivide a Gaussian should be driven by an explicit comparison between its projected screen-space extent and the local structure of the texture it seeks to represent. We introduce a multi-scale frequency analysis combining structure tensors with Laplacian scale space analysis to estimate the dominant frequency at each pixel, enabling robust supervision across varying texture scales. Based on this analysis, we define $η$, a per-Gaussian, per-axis frequency violation metric that indicates when a primitive may be under-resolving local texture details. Unlike methods that perform isotropic splitting (e.g., splitting each Gaussian into two smaller ones with uniform shape), our approach performs anisotropic splitting. For each axis with high $η$, we compute a split factor to better resolve the local frequency content. We further introduce a multiview consistency criterion that aggregates $η$ observations across multiple views. By performing densification early and faster, we skip the lengthy iterative densification phases required by baseline methods and achieve significantly faster convergence. Experiments on standard benchmarks demonstrate that our method also achieves superior reconstruction quality, particularly in high-frequency regions.
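The abstract does not give the formula for $η$ or for the split factor, so the following is only a sketch of how a per-axis comparison between projected extent and local texture frequency might drive anisotropic splitting. It assumes a Nyquist-style ratio: an axis that spans more than half a wavelength of the dominant local frequency is treated as under-resolving the texture. The definition of `frequency_violation`, the threshold, the cap, and all names are assumptions made for illustration and are not the paper's method.

```python
import math
from typing import Sequence, Tuple


def frequency_violation(extent_px: float, freq_cycles_per_px: float) -> float:
    """Assumed metric: projected extent divided by half the local wavelength."""
    if extent_px < 0 or freq_cycles_per_px < 0:
        raise ValueError("extent and frequency must be non-negative")
    # wavelength = 1 / freq, so extent / (wavelength / 2) = 2 * extent * freq
    return 2.0 * extent_px * freq_cycles_per_px


def anisotropic_split_factors(
    extents_px: Sequence[float],
    freqs_cycles_per_px: Sequence[float],
    threshold: float = 1.0,  # hypothetical cutoff for a "high" violation
    max_split: int = 4,      # hypothetical cap on pieces per axis
) -> Tuple[int, ...]:
    """Per-axis split counts: 1 keeps the axis intact, k > 1 splits it into k parts."""
    if len(extents_px) != len(freqs_cycles_per_px):
        raise ValueError("need one frequency estimate per axis")
    factors = []
    for extent, freq in zip(extents_px, freqs_cycles_per_px):
        eta = frequency_violation(extent, freq)
        if eta > threshold:
            # Split just enough that each piece spans at most half a wavelength.
            factors.append(min(max_split, math.ceil(eta)))
        else:
            factors.append(1)
    return tuple(factors)
```

Under these assumptions, a Gaussian whose axes project to 8 and 1 pixels over a texture with a dominant frequency of 0.25 cycles per pixel would be split four ways along its long axis and left intact along its short one, in contrast to an isotropic split that treats both axes alike.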

2604.28011 2026-05-01 cs.CV

Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation

Jing Zhang, Wentao Jiang, Tao Huang, Zhiwei Wang, Jianxin Liu, Jian Chen, Ping Ye, Gang Wang, Zengmao Wang, Bo Du, Dacheng Tao

Comments 12 pages, 4 figures. Technical report

2604.28001 2026-05-01 cs.AI cs.SE

A Pattern Language for Resilient Visual Agents

Habtom Kahsay Gidey, Alexander Lenz, Alois Knoll

Comments Accepted to the 23rd International Conference on Software Architecture (ICSA 2026), New and Emerging Ideas Track. 5 pages, 1 figure

2604.27998 2026-05-01 cs.LG cs.CL

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning

Jingcheng Deng, Zihao Wei, Liang Pang, Junhong Wu, Shicheng Xu, Zenghao Duan, Huawei Shen

Comments This is an actively developing work, and we will continue to update the arXiv version

2604.27987 2026-05-01 cs.LG

Dynamic Scaled Gradient Descent for Stable Fine-Tuning for Classifications

Nghia Bui, Lijing Wang

2604.27981 2026-05-01 cs.LG cs.AI

ITS-Mina: A Harris Hawks Optimization-Based All-MLP Framework with Iterative Refinement and External Attention for Multivariate Time Series Forecasting

Pourya Zamanvaziri, Amirhossein Sadr, Aida Pakniyat, Dara Rahmati

Comments 19 pages, 2 figures, 3 tables, 4 algorithms

2604.27975 2026-05-01 cs.CV cs.AI

TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions

Ce Chen, Yi Ren, Yuanming Li, Viktor Goriachko, Zhenhui Ye, Zujin Guo, Zhibin Hong, Mingming Gong

Comments This work has been deployed to production. For more related research, please visit HeyGen Research (https://www.heygen.com/research) and HeyGen Avatar-V (https://www.heygen.com/research/avatar-v-model). Project page: https://chence17.github.io/TransVLM/

2604.27974 2026-05-01 cs.CV cs.DB

FineState-Bench: Benchmarking State-Conditioned Grounding for Fine-grained GUI State Setting

Fengxian Ji, Jingpu Yang, Zirui Song, Yuanxi Wang, Zhexuan Cui, Yuke Li, Qian Jiang, Xiuying Chen

2604.27972 2026-05-01 cs.AI cs.HC

From LLM-Driven Trading Card Generation to Procedural Relatedness: A Pokémon Case Study

Johannes Pfau, Panagiotis Vrettis

2604.27968 2026-05-01 cs.CV

ClimateVID -- Social Media Videos Analysis and Challenges Involved

Shiqi Xu, Moritz Burmester, Katharina Prasse, Isaac Bravo, Stefanie Walter, Margret Keuper

Comments Equal contributions by Shiqi Xu and Moritz Burmester