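arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.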

2604.03558 2026-04-07 cs.CV

LOGER: Local-Global Ensemble for Robust Deepfake Detection in the Wild

Fei Wu, Dagong Lu, Mufeng Yao, Xinlei Xu, Fengjun Guo

Comments 2nd place (out of 94 teams) in the NTIRE 2026 Robust Deepfake Detection Challenge

Abstract

Robust deepfake detection in the wild remains challenging due to the ever-growing variety of manipulation techniques and uncontrolled real-world degradations. Forensic cues for deepfake detection reside at two complementary levels: global-level anomalies in semantics and statistics that require holistic image understanding, and local-level forgery traces concentrated in manipulated regions that are easily diluted by global averaging. Since no single backbone or input scale can effectively cover both levels, we propose LOGER, a LOcal-Global Ensemble framework for Robust deepfake detection. The global branch employs heterogeneous vision foundation model backbones at multiple resolutions to capture holistic anomalies with diverse visual priors. The local branch performs patch-level modeling with a Multiple Instance Learning top-$k$ aggregation strategy that selectively pools only the most suspicious regions, mitigating evidence dilution caused by the dominance of normal patches; dual-level supervision at both the aggregated image level and individual patch level keeps local responses discriminative. Because the two branches differ in both granularity and backbone, their errors are largely decorrelated, a property that logit-space fusion exploits for more robust prediction. LOGER achieves 2nd place in the NTIRE 2026 Robust Deepfake Detection Challenge, and further evaluation on multiple public benchmarks confirms its strong robustness and generalization across diverse manipulation methods and real-world degradation conditions.
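The top-$k$ pooling and logit-space fusion described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the top-$k$ patches are combined or how the branches are weighted, so the mean over the top-$k$ logits, the equal-weight fusion, and all names and scores below are assumptions made for illustration.

```python
from typing import Sequence


def topk_mil_pool(patch_logits: Sequence[float], k: int) -> float:
    """Average the k largest patch logits (assumed pooling rule)."""
    if not patch_logits:
        raise ValueError("patch_logits must be non-empty")
    k = max(1, min(k, len(patch_logits)))
    top = sorted(patch_logits, reverse=True)[:k]
    return sum(top) / k


def fuse_logits(branch_logits: Sequence[float]) -> float:
    """Fuse branch outputs by averaging in logit space (equal weights assumed)."""
    return sum(branch_logits) / len(branch_logits)


# Hypothetical patch scores: mostly normal patches plus two suspicious ones.
patches = [-3.0, -2.5, -3.2, 4.0, 3.0, -2.8]

mean_pooled = sum(patches) / len(patches)   # evidence diluted by normal patches
topk_pooled = topk_mil_pool(patches, k=2)   # driven by the suspicious patches

# 1.5 stands in for a hypothetical global-branch logit.
fused = fuse_logits([topk_pooled, 1.5])
```

With these made-up scores, plain averaging yields a negative (real-looking) logit, while top-$k$ pooling keeps the evidence from the two suspicious patches, which is the dilution effect the abstract refers to.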

2604.03557 2026-04-07 cs.AI

When Do Hallucinations Arise? A Graph Perspective on the Evolution of Path Reuse and Path Compression

Xinnan Dai, Kai Yang, Cheng Luo, Shenglai Zeng, Kai Guo, Jiliang Tang

2604.03556 2026-04-07 cs.CV cs.AI cs.CL

Focus Matters: Phase-Aware Suppression for Hallucination in Vision-Language Models

Sohyeon Kim, Sang Yeon Yoon, Kyeongbo Kong

2604.03555 2026-04-07 cs.CV

HEDGE: Heterogeneous Ensemble for Detection of AI-GEnerated Images in the Wild

Fei Wu, Dagong Lu, Mufeng Yao, Xinlei Xu, Fengjun Guo

Comments 4th place (out of 193 teams) in the NTIRE 2026 Robust AI-Generated Image Detection in the Wild Challenge

2604.03553 2026-04-07 cs.AI cs.CL cs.DL

Towards the AI Historian: Agentic Information Extraction from Primary Sources

Lorenz Hufe, Niclas Griesshaber, Gavin Greif, Sebastian Oliver Eck, Philip Torr

2604.03552 2026-04-07 cs.RO cs.AI cs.CV cs.LG

CRAFT: Video Diffusion for Bimanual Robot Data Generation

Jason Chen, I-Chun Arthur Liu, Gaurav Sukhatme, Daniel Seita

2604.03537 2026-04-07 cs.CL cs.LG

Rethinking Token Prediction: Tree-Structured Diffusion Language Model

Zihao Wu, Haoming Yang, Juncheng Dong, Vahid Tarokh

2604.03533 2026-04-07 cs.AI

Automated Analysis of Global AI Safety Initiatives: A Taxonomy-Driven LLM Approach

Takayuki Semitsu, Naoto Kiribuchi, Kengo Zenitani

Comments 18 pages, 6 figures, 6 tables, to be published in PoliticalNLP 2026

2604.03532 2026-04-07 cs.CL cs.AI cs.LG

LangFIR: Discovering Sparse Language-Specific Features from Monolingual Data for Language Steering

Sing Hieng Wong, Hassan Sajjad, A. B. Siddique

Comments Submitted to COLM 2026

Abstract

Large language models (LLMs) show strong multilingual capabilities, yet reliably controlling the language of their outputs remains difficult. Representation-level steering addresses this by adding language-specific vectors to model activations at inference time, but identifying language-specific directions in the residual stream often relies on multilingual or parallel data that can be expensive to obtain. Sparse autoencoders (SAEs) decompose residual activations into interpretable, sparse feature directions and offer a natural basis for this search, yet existing SAE-based approaches face the same data constraint. We introduce LangFIR (Language Feature Identification via Random-token Filtering), a method that discovers language-specific SAE features using only a small amount of monolingual data and random-token sequences. Many SAE features consistently activated by target-language inputs do not encode language identity. Random-token sequences surface these language-agnostic features, allowing LangFIR to filter them out and isolate a sparse set of language-specific features. We show that these features are extremely sparse, highly selective for their target language, and causally important: directional ablation increases cross-entropy loss only for the corresponding language. Using these features to construct steering vectors for multilingual generation control, LangFIR achieves the best average accuracy BLEU across three models (Gemma 3 1B, Gemma 3 4B, and Llama 3.1 8B), three datasets, and twelve target languages, outperforming the strongest monolingual baseline and surpassing methods that rely on parallel data. Our results suggest that language identity in multilingual LLMs is localized in a sparse set of feature directions discoverable with monolingual data. Code is available at https://anonymous.4open.science/r/LangFIR-C0F5/.
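The filtering idea in this abstract can be sketched as a set difference: keep the features that fire consistently on target-language text, then drop those that also fire consistently on random-token sequences. This is only an illustration under stated assumptions. The "fires on at least `min_rate` of inputs" criterion, the threshold value, and the toy activation matrices are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Set


def consistently_active(acts: List[List[float]], min_rate: float) -> Set[int]:
    """Indices of features that fire (> 0) on at least min_rate of the inputs."""
    n_inputs = len(acts)
    n_features = len(acts[0])
    return {
        j
        for j in range(n_features)
        if sum(1 for row in acts if row[j] > 0.0) / n_inputs >= min_rate
    }


def language_specific(
    target_acts: List[List[float]],
    random_acts: List[List[float]],
    min_rate: float = 0.9,  # assumed threshold, not from the paper
) -> Set[int]:
    """Candidate features minus those also surfaced by random-token inputs."""
    candidates = consistently_active(target_acts, min_rate)
    agnostic = consistently_active(random_acts, min_rate)
    return candidates - agnostic


# Toy SAE activations: rows are inputs, columns are features (hypothetical).
target = [
    [1.2, 0.0, 0.7, 0.3],
    [0.9, 0.0, 0.5, 0.0],
    [1.1, 0.2, 0.6, 0.0],
]
random_tokens = [
    [0.8, 0.0, 0.0, 0.0],
    [1.0, 0.1, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.0],
]

selected = language_specific(target, random_tokens)
```

In this toy data, feature 0 fires on everything (language-agnostic) and is filtered out, leaving feature 2 as the only language-specific candidate.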

2604.03527 2026-04-07 cs.AI cs.HC

Explainable Model Routing for Agentic Workflows

Mika Okamoto, Ansel Kaplan Erol, Mark Riedl

Comments ACM CHI 2026 Human-Centered Explainable AI (HCXAI) Workshop (Spotlight)

2604.03526 2026-04-07 cs.CV cs.AI

Determined by User Needs: A Salient Object Detection Rationale Beyond Conventional Visual Stimuli

Chenglizhao Chen, Shujian Zhang, Luming Li, Wenfeng Song, Shuai Li

2604.03525 2026-04-07 cs.LG

Online learning of smooth functions on $\mathbb{R}$

Jesse Geneson, Kuldeep Singh, Alexander Wang

2604.03524 2026-04-07 cs.AI

Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Layer Governability in Large Language Models

Gregory M. Ruddell

Comments Extends arXiv:2603.21415. 30 pages. Also available on Zenodo (10.5281/zenodo.19393882)

DOI: 10.5281/zenodo.19393882

Abstract

Current AI safety relies on behavioral monitoring and post-training alignment, yet empirical measurement shows these approaches produce no detectable pre-commitment signal in a majority of instruction-tuned models tested. We present an energy-based governance framework connecting transformer inference dynamics to constraint-satisfaction models of neural computation, and apply it to a seven-model cohort across five geometric regimes. Using trajectory tension (rho = ||a|| / ||v||), we identify a 57-token pre-commitment window in Phi-3-mini-4k-instruct under greedy decoding on arithmetic constraint probes. This result is model-specific, task-specific, and configuration-specific, demonstrating that pre-commitment signals can exist but are not universal. We introduce a five-regime taxonomy of inference behavior: Authority Band, Late Signal, Inverted, Flat, and Scaffold-Selective. Energy asymmetry (Σ rho_misaligned / Σ rho_aligned) serves as a unifying metric of structural rigidity across these regimes. Across seven models, only one configuration exhibits a predictive signal prior to commitment; all others show silent failure, late detection, inverted dynamics, or flat geometry. We further demonstrate that factual hallucination produces no predictive signal across 72 test conditions, consistent with spurious attractor settling in the absence of a trained world-model constraint. These results establish that rule violation and hallucination are distinct failure modes with different detection requirements. Internal geometry monitoring is effective only where resistance exists; detection of factual confabulation requires external verification mechanisms. This work provides a measurable framework for inference-layer governability and introduces a taxonomy for evaluating deployment risk in autonomous AI systems.
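The two ratios named in this abstract can be made concrete with a short sketch. The abstract does not define a or v, so the code assumes they are the second and first finite differences of a hidden-state trajectory; that reading, the pairing of each acceleration with the later of its two velocities, and the toy 2-D trajectories are all assumptions, not the paper's definitions.

```python
import math
from typing import List

Vector = List[float]


def norm(x: Vector) -> float:
    return math.sqrt(sum(c * c for c in x))


def diff(seq: List[Vector]) -> List[Vector]:
    """First finite difference between consecutive vectors."""
    return [[b - a for a, b in zip(s0, s1)] for s0, s1 in zip(seq, seq[1:])]


def trajectory_tension(states: List[Vector], eps: float = 1e-12) -> List[float]:
    """rho_t = ||a_t|| / ||v_t||, with v and a as assumed finite differences."""
    v = diff(states)   # assumed "velocity"
    a = diff(v)        # assumed "acceleration"
    return [norm(acc) / max(norm(vel), eps) for acc, vel in zip(a, v[1:])]


def energy_asymmetry(misaligned: List[Vector], aligned: List[Vector]) -> float:
    """Sum of rho over the misaligned run divided by the sum over the aligned run."""
    return sum(trajectory_tension(misaligned)) / sum(trajectory_tension(aligned))


# Hypothetical 2-D trajectories: a gently curving path and a sharply turning one.
aligned_path = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.5], [3.0, 1.5]]
misaligned_path = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

rho_misaligned = trajectory_tension(misaligned_path)
asymmetry = energy_asymmetry(misaligned_path, aligned_path)
```

Under these assumptions a sharply turning trajectory has higher tension than a smooth one, so the asymmetry ratio exceeds 1 for this toy pair.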

2604.03523 2026-04-07 cs.RO cs.AI cs.CV cs.LG

Optimizing Neurorobot Policy under Limited Demonstration Data through Preference Regret

Viet Dung Nguyen, Yuhang Song, Anh Nguyen, Jamison Heard, Reynold Bailey, Alexander Ororbia

Comments 10 pages, 4 figures, 4 tables

2604.03506 2026-04-07 cs.AI

BioAlchemy: Distilling Biological Literature into Reasoning-Ready Reinforcement Learning Training Data

Brian Hsu, Ozan Gökdemir, Carlo Siebenschuh, Bruce Parrello, Neil Getty, Thomas S. Brettin, Rick L. Stevens, Ian T. Foster, Nicholas Chia, Arvind Ramanathan

2604.03505 2026-04-07 cs.CV

Multimodal Urban Tree Detection from Satellite and Street-Level Imagery via Annotation-Efficient Deep Learning Strategies

In Seon Kim, Ali Moghimi

Abstract

Beyond the immediate biophysical benefits, urban trees play a foundational role in environmental sustainability and disaster mitigation. Precise mapping of urban trees is essential for environmental monitoring, post-disaster assessment, and strengthening policy. However, the transition from traditional, labor-intensive field surveys to scalable automated systems remains limited by high annotation costs and poor generalization across diverse urban scenarios. This study introduces a multimodal framework that integrates high-resolution satellite imagery with ground-level Google Street View to enable scalable and detailed urban tree detection under limited-annotation conditions. The framework first leverages satellite imagery to localize tree candidates and then retrieves targeted ground-level views for detailed detection, significantly reducing inefficient street-level sampling. To address the annotation bottleneck, domain adaptation is used to transfer knowledge from an existing annotated dataset to a new region of interest. To further minimize human effort, we evaluated three learning strategies: semi-supervised learning, active learning, and a hybrid approach combining both, using a transformer-based detection model. The hybrid strategy achieved the best performance with an F1-score of 0.90, representing a 12% improvement over the baseline model. In contrast, semi-supervised learning exhibited progressive performance degradation due to confirmation bias in pseudo-labeling, while active learning steadily improved results through targeted human intervention to label uncertain or incorrect predictions. Error analysis further showed that active and hybrid strategies reduced both false positives and false negatives. Our findings highlight the importance of a multimodal approach and guided annotation for scalable, annotation-efficient urban tree mapping to strengthen sustainable city planning.
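One way to picture the hybrid strategy in this abstract is as a routing step: confident detections become pseudo-labels (the semi-supervised part) and uncertain ones are sent to a human annotator (the active-learning part). The sketch below is an assumption-laden illustration, not the study's pipeline. The confidence thresholds, the function names, and the example counts are hypothetical; only the F1 formula itself is standard.

```python
from typing import List, Tuple


def route_predictions(
    preds: List[Tuple[str, float]],
    pseudo_min: float = 0.9,  # assumed threshold for accepting a pseudo-label
    query_max: float = 0.5,   # assumed threshold for requesting a human label
) -> Tuple[List[str], List[str]]:
    """Split (image_id, confidence) pairs into pseudo-labels and human queries."""
    pseudo_labels: List[str] = []
    human_queries: List[str] = []
    for image_id, confidence in preds:
        if confidence >= pseudo_min:
            pseudo_labels.append(image_id)
        elif confidence <= query_max:
            human_queries.append(image_id)
        # Mid-confidence predictions are left unlabeled in this sketch.
    return pseudo_labels, human_queries


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R); returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical detector confidences for four street-level views.
preds = [("view_a", 0.97), ("view_b", 0.42), ("view_c", 0.71), ("view_d", 0.93)]
pseudo, queries = route_predictions(preds)

# Hypothetical counts chosen only to show the arithmetic.
example_f1 = f1_score(tp=90, fp=10, fn=10)
```

Routing only low-confidence cases to annotators reflects the abstract's point that targeted human intervention counteracts the confirmation bias of pure pseudo-labeling.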

2604.03498 2026-04-07 cs.AI

Resource-Conscious Modeling for Next-Day Discharge Prediction Using Clinical Notes

Ha Na Cho, Sairam Sutari, Alexander Lopez, Hansen Bow, Kai Zheng

2604.03497 2026-04-07 cs.RO cs.AI cs.CV

Sim2Real-AD: A Modular Sim-to-Real Framework for Deploying VLM-Guided Reinforcement Learning in Real-World Autonomous Driving

Zilin Huang, Zhengyang Wan, Zihao Sheng, Boyue Wang, Junwei You, Yue Leng, Sikai Chen

Comments 36 pages, 21 figures

2604.03493 2026-04-07 cs.CL

Cultural Authenticity: Comparing LLM Cultural Representations to Native Human Expectations

Erin MacMurray van Liemt, Aida Davani, Sinchana Kumbale, Neha Dixit, Sunipa Dev

Comments 18 pages, 4 figures

2604.03489 2026-04-07 cs.LG math.OC

Improving Feasibility via Fast Autoencoder-Based Projections

Maria Chzhen, Priya L. Donti

2604.03478 2026-04-07 cs.LG

Investigating Data Interventions for Subgroup Fairness: An ICU Case Study

Erin Tan, Judy Hanwen Shen, Irene Y. Chen

2604.03473 2026-04-07 cs.CL cs.AI

Evolutionary Search for Automated Design of Uncertainty Quantification Methods

Mikhail Seleznyov, Daniil Korbut, Viktor Moskvoretskii, Oleg Somov, Alexander Panchenko, Elena Tutubalina

2604.03465 2026-04-07 cs.CL

The Tool Illusion: Rethinking Tool Use in Web Agents

Renze Lou, Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng, Suman Nath, Wenpeng Yin, Jianfeng Gao

Comments preprint

2604.03463 2026-04-07 cs.LG cs.RO

Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory prediction

Daniel Jost, Luca Paparusso, Martin Stoll, Jörg Wagner, Raghu Rajan, Joschka Bödecker

2604.03462 2026-04-07 cs.CV cs.GR cs.RO

SpectralSplat: Appearance-Disentangled Feed-Forward Gaussian Splatting for Driving Scenes

Quentin Herau, Tianshuo Xu, Depu Meng, Jiezhi Yang, Chensheng Peng, Spencer Sherk, Yihan Hu, Wei Zhan

Comments Under review

2604.03456 2026-04-07 cs.LG cs.CY

Earth Embeddings Reveal Diverse Urban Signals from Space

Wenjing Gong, Udbhav Srivastava, Yuchen Wang, Yuhao Jia, Qifan Wu, Weishan Bai, Yifan Yang, Xiao Huang, Xinyue Ye

Comments 30 pages, 18 figures

2604.03454 2026-04-07 cs.CV cs.AI

RDFace: A Benchmark Dataset for Rare Disease Facial Image Analysis under Extreme Data Scarcity and Phenotype-Aware Synthetic Generation

Ganlin Feng, Yuxi Long, Hafsa Ali, Erin Lou, Fahad Butt, Qian Liu, Yang Wang, Pingzhao Hu

Comments Accepted to CVPR 2026. 8 pages main paper + appendix

2604.03451 2026-04-07 cs.RO cs.CY cs.HC

Do Robots Need Body Language? Comparing Communication Modalities for Legible Motion Intent in Human-Shared Spaces

Jonathan Albert Cohen, Kye Shimizu, Allen Song, Vishnu Bharath, Kent Larson, Pattie Maes

2604.03449 2026-04-07 cs.LG cs.SY eess.SY

Neural Operators for Multi-Task Control and Adaptation

David Sewell, Xingjian Li, Stepan Tretiakov, Krishna Kumar, David Fridovich-Keil

Comments 25 pages, 10 figures, 2 tables

2604.03448 2026-04-07 cs.CV cs.AI cs.HC cs.LG

ExpressEdit: Fast Editing of Stylized Facial Expressions with Diffusion Models in Photoshop

Kenan Tang, Jiasheng Guo, Jeffrey Lin, Yao Qin

Comments Accepted to CVPR 2026 Workshop on Generative AI for Storytelling (AISTORY)